xIUA
To make ICU implementation easier, we need a clean migration method. Developers often have large systems that are under active development and cannot be frozen during the conversion to ICU. This is why we developed xIUA. It gives users access to the richness of ICU with minimal changes to their existing applications. If appropriate, they can later choose to convert portions of the code, a section at a time, to use UTF-16 to improve performance. It gives them a framework so that they can migrate without tears. They can write a single piece of code, usable throughout the application, that dynamically adapts to codepage, UTF-8, UTF-16 or UTF-32 data.

It also allows them to add ICU support anywhere. For example, a utility function may be nested many layers deep in the code and used throughout the application. How does one pass information such as locale without changing a significant amount of code? We cannot ask developers to change all their APIs just to add globalization. This was the first major problem that X.Net's clients had. To solve it, we stored this information in thread-local storage on systems that support multithreading and simulated thread-local storage on those that don't. Once that mechanism was working across platforms, we had a structure in which we could also pass converters. We could then convert on the fly, so that we did not have to convert all data to UTF-16 but could convert as needed. To make the locale context complete, we added time zone information, because it may also vary by thread.

Converting on the fly and calling ICU as needed created another problem. The conversion code needs work areas. Having each API calculate the needed storage is too big a burden on developers. Using malloc/free for every call has high overhead, so you need a routine that reuses the storage but does not hold on to large blocks of storage.
The xIUA code also provides this storage management to enhance performance.

To really support operating in a mixed environment, you also need functions like strtok, which must have separate implementations for codepage, UTF-8, UTF-16 and UTF-32. Some applications need to process several contexts at once. For example, a PHP script may be in Shift_JIS while the user's browser is running in EUC, the database uses UTF-8, and ICU needs UTF-16. You have to be able to switch contexts, convert between contexts, and have callable functions that operate on the data based on the current context. By using a middle layer you can improve performance by putting the smarts in a single place. For example, if the browser is using a charset of UTF-8 and your database has a context of UTF-8, the layer recognizes that it does not need a converter and can just move the data. This approach also gives you better platform independence. You can have an OS context that reflects UTF-16, UTF-32 or UTF-8 Unicode support. It also lets you transform between UTF-16, UTF-32 and UTF-8 without an ICU converter if that is all that is needed.

Another issue is that we believe that to do i18n right you have to understand the internationalization issues. Most programmers, for example, don't understand collation. Giving them a collator with all the bells and whistles is a disaster. Different programmers will produce different sequences, and things will not match. The differences can be subtle, so that the code passes the QA tests but mysteriously does not work in the field. We believe that you need to provide customer-tailored APIs that meet the requirements of the facility and that programmers can use consistently. This also isolates people from changes. For example, ICU 1.6, 1.7 and 1.8 could produce different results, but by changing the calling parameters you could get the code to produce the same results. xIUA allows you to make those changes in one place for faster, easier, and more reliable ICU upgrades.
For example, in ICU 2.8 one of the C APIs was dropped, and the only way to emulate the function was in C++. Making this kind of change throughout an application can be a real problem. With xIUA it took only a couple of hours to redo the code.
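
The per-thread context idea described earlier can be illustrated with a minimal sketch. This is not the xIUA API: the type `I18nContext` and the functions `ctx_set`, `ctx_get` and `needs_conversion_to` are hypothetical names chosen for illustration, and the code assumes a C11 compiler that supports `_Thread_local`. It shows how a utility nested deep in the call stack can consult the locale, encoding and time zone of the current thread without any change to its callers' signatures, and how a same-encoding check can decide that no converter is needed.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-thread context. Field and function names are
   illustrative only and are not taken from xIUA. */
typedef struct {
    char locale[32];    /* e.g. "ja_JP" */
    char encoding[32];  /* e.g. "Shift_JIS" or "UTF-8" */
    char timezone[32];  /* e.g. "Asia/Tokyo" */
} I18nContext;

/* C11 thread-local storage: every thread gets its own copy,
   initialized to these defaults. */
static _Thread_local I18nContext g_ctx = { "en_US", "UTF-8", "UTC" };

/* Copy a string into a fixed-size field; snprintf truncates if
   needed and always NUL-terminates. */
static void copy_field(char *dst, size_t cap, const char *src) {
    snprintf(dst, cap, "%s", src);
}

/* Set the context for the calling thread only. */
void ctx_set(const char *locale, const char *encoding, const char *tz) {
    copy_field(g_ctx.locale, sizeof g_ctx.locale, locale);
    copy_field(g_ctx.encoding, sizeof g_ctx.encoding, encoding);
    copy_field(g_ctx.timezone, sizeof g_ctx.timezone, tz);
}

/* Read-only view of the calling thread's context. */
const I18nContext *ctx_get(void) {
    return &g_ctx;
}

/* A utility that could sit many layers deep: it takes no locale or
   encoding parameter, yet it can tell whether data in the current
   context must be converted before it is handed to a target that
   expects target_encoding. Simplification: names are compared
   exactly, with no alias or case handling. */
int needs_conversion_to(const char *target_encoding) {
    return strcmp(ctx_get()->encoding, target_encoding) != 0;
}
```

A request handler would call `ctx_set("ja_JP", "Shift_JIS", "Asia/Tokyo")` once on entry. Any function further down the call chain can then call `needs_conversion_to("UTF-8")` to decide whether to invoke a converter or simply move the data. On platforms without thread-local storage, the same interface could be kept and the storage simulated behind `ctx_get`.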