
Vietnamese Support In Unicode


Presentation Transcript


    1. Vietnamese Support In Unicode
       Chu Vu~, Development Manager, Microsoft Office Complex Scripts

    2. Goals
       - Provide information on how Microsoft uses Unicode to support multiple languages
       - Provide programming techniques for supporting Vietnamese using Unicode
       - Provide technical information on how to move from ANSI to Unicode
       - Show examples through demos and case studies

    3. Not covered in this talk
       - This event does not provide technical information about Unicode itself
       - It talks about globalization in general
       - Enabling: covered
       - Locale: not covered
       - Localization: not covered

    4. Globalization
       - User interface: windows, menus, dialog boxes, layout/mirroring, etc.
       - Locale: date/time/calendar, currency, paper size, etc.
       - Application: universal-aware app, multi-language UI, language specifics (justification, type/replace, etc.)
       - Input method: simple keyboard layout, IME/Telex, etc.
       - Localization: translation of UI, help, etc.
       - Fonts: bitmap, vector, TrueType, OpenType, ClearType

    5. Why move to Unicode?
       - Support multiple languages: don't limit your application to one language
       - New languages will not have a codepage (e.g. Indic scripts such as Hindi and Tamil)
       - The question is not "How much does it cost to move to Unicode?" but "When?"
       - It took many years for Microsoft to move to Unicode with the introduction of Windows NT
       - Future operating systems and applications are built as Unicode

    6. Vietnamese support in Microsoft products
       - Windows 2000 or later (based on the NT platform)
       - Office 2000 SA; Office XP or later
       - IE 4.1 + Vietnamese language pack
       - Most Microsoft products use the combining method for keyboard input and storage
       - Most Microsoft core fonts support both combining and precomposed characters (e.g. Arial, Courier New, Times New Roman)
       - CharMap applet
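
The difference between the combining and precomposed methods can be seen with a few lines of code. This is a minimal sketch in Python rather than Win32 C, chosen only so it runs anywhere; the `describe` helper is hypothetical, and the NFC/NFD normalization forms come from the Unicode standard, not from the talk.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


precomposed = "\u00E0"  # "à" stored as one precomposed code point
combining = "a\u0300"   # "a" followed by COMBINING GRAVE ACCENT (huyền)

# The two strings look identical on screen but differ code point by code point.
same_raw = precomposed == combining

# Normalizing to a common form makes them comparable.
same_nfc = unicodedata.normalize("NFC", combining) == precomposed
same_nfd = unicodedata.normalize("NFD", precomposed) == combining

if __name__ == "__main__":
    print(describe(precomposed))
    print(describe(combining))
    print(same_raw, same_nfc, same_nfd)
```

An application that accepts text from both kinds of source would normally pick one form and normalize before comparing or searching.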

    7. Unicode
       - 16-bit international character encoding
       - Windows 2000 uses Unicode version 2.1

    8. Vietnamese ranges in Unicode
       Vietnamese characters are scattered across the Unicode table:
       - 0x0041-0x005A: A-Z, uppercase Latin
       - 0x0061-0x007A: a-z, lowercase Latin
       - 0x00C2..0x00F4: Â, â, .., Ô, ô, Latin extended
       - 0x0300-0x0323: huyền, hỏi, ngã, sắc, nặng (combining tone marks)
       - 0x0102..0x01B0: Ă, ă, .., Ư, ư
       - 0x1EA0-0x1EF9: Ạ, ạ, .., Ỹ, ỹ
       - 0x20AB: đồng (₫)
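
To make the ranges concrete, the sketch below looks up which of them a character falls into. It is written in Python for portability; the table is copied from the slide, while the names `VIETNAMESE_RANGES`, `classify`, and `ranges_used` and the shortened labels are illustrative assumptions. The ranges are outer bounds, so a hit does not prove a character is Vietnamese.

```python
# Ranges copied from the slide; labels are shortened for illustration only.
VIETNAMESE_RANGES = [
    (0x0041, 0x005A, "uppercase Latin"),
    (0x0061, 0x007A, "lowercase Latin"),
    (0x00C2, 0x00F4, "Latin extended"),
    (0x0102, 0x01B0, "Latin Extended-A/B"),
    (0x0300, 0x0323, "combining tone marks"),
    (0x1EA0, 0x1EF9, "precomposed Vietnamese letters"),
    (0x20AB, 0x20AB, "dong sign"),
]


def classify(ch):
    """Return the label of the first range containing ch, or None."""
    cp = ord(ch)
    for low, high, label in VIETNAMESE_RANGES:
        if low <= cp <= high:
            return label
    return None


def ranges_used(text):
    """Return the distinct range labels used by text, in order of first use."""
    labels = []
    for ch in text:
        label = classify(ch)
        if label is not None and label not in labels:
            labels.append(label)
    return labels


if __name__ == "__main__":
    print(ranges_used("Vi\u1EC7t"))  # the word "Việt", precomposed form
```

Even a four-letter word draws on three separate ranges, which is why a single contiguous block test does not work for Vietnamese.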

    9. Advantages of Unicode
       - Unicode makes multilingual computing possible
       - Data sharing between platforms
       - Any language version of an application can run on any version of Windows 2000/XP
       - The behavior of non-Unicode applications depends on the user's settings

    10. Relatives of Unicode
       - UTF-7: 7-bit transformation format, seldom used
       - UTF-8: 8-bit transformation format
         - For transmission over unknown lines, e.g. Web pages
         - Codepage number CP_UTF8 = 65001
       - UTF-16 or UCS-2: the standard 16-bit Unicode
       - UTF-32 or UCS-4
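
The practical difference between these formats shows up in how many bytes the same text occupies. The following is a small Python sketch, not part of the original talk; `encoded_sizes` is a hypothetical helper, and the little-endian codec names are used so that no byte order mark is counted.

```python
TEXT = "Vi\u1EC7t"  # "Việt": four characters, one of them outside Latin-1

# Little-endian variants are named explicitly so no BOM is prepended.
ENCODINGS = ["utf-7", "utf-8", "utf-16-le", "utf-32-le"]


def encoded_sizes(text):
    """Return a dict mapping each encoding name to the encoded byte length."""
    return {name: len(text.encode(name)) for name in ENCODINGS}


def round_trips(text):
    """Check that every encoding decodes back to the original text."""
    return all(text.encode(name).decode(name) == text for name in ENCODINGS)


if __name__ == "__main__":
    for name, size in encoded_sizes(TEXT).items():
        print(f"{name:10s} {size} bytes")
```

All four formats carry the same characters without loss; they differ only in byte layout, which is why the choice can be made per transport rather than per application.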

    11. Implementing Unicode apps
       The Win32 APIs and the C run-time library make it easy to port your existing apps to Unicode. There are four possibilities, depending on your design:
       - Pure Unicode
       - Dual compile path
       - Support Unicode internally
       - Generic Unicode

    12. Option 1: pure Unicode
       Create a pure Unicode application.
       - Advantage: easy to implement
       - Disadvantage: works only on Windows NT

    13. Option 2: dual compile path
       Create two binaries:
       - Default compile for Windows 9x
       - Compile with -DUNICODE and -D_UNICODE for NT
       Advantages:
       - Runs on both platforms
       - Easy to implement
       Disadvantages:
       - Maintenance of two binaries is messy
       - No Unicode support on Windows 9x

    14. Option 3: support Unicode internally
       Always register as an ANSI application; convert to/from Unicode as needed.
       Advantages:
       - Moderate engineering effort
       - Uses Unicode on Windows 9x and Windows NT
       Disadvantages:
       - Does not support new scripts (Devanagari, Tamil, Armenian, Georgian), even when on NT
       - Multi-script support is more difficult

    15. Option 4: generic Unicode
       Use Unicode everywhere with a single binary and two code paths:
       - On Windows NT, use W entry points
       - On Windows 9x, convert Unicode to ANSI and use A entry points
       Advantages:
       - Full Unicode support when on Windows NT
       - Use Unicode uniformly in all Win32 apps
       Disadvantages:
       - Substantial engineering effort

    16. Converting between ANSI & Unicode
       - MultiByteToWideChar for codepage to Unicode
       - WideCharToMultiByte for Unicode to codepage
       - Codepage can be:
         - Any legal codepage number
         - Predefined: CP_ACP, CP_SYMBOL, CP_UTF8, etc.
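
The two conversion directions can be tried out without a Windows toolchain. The sketch below uses Python's built-in `cp1258` (Vietnamese) codec as a stand-in for the Win32 calls; `to_unicode`, `to_codepage`, and `fits_codepage` are hypothetical names, and Python raising an error on unmappable text is an assumption of this sketch, since the Win32 call may substitute a default character instead.

```python
def to_unicode(data, codepage="cp1258"):
    """Code page bytes -> Unicode text (the MultiByteToWideChar direction)."""
    return data.decode(codepage)


def to_codepage(text, codepage="cp1258"):
    """Unicode text -> code page bytes (the WideCharToMultiByte direction)."""
    return text.encode(codepage)


def fits_codepage(text, codepage="cp1258"):
    """Return True if every character of text exists in the code page."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


COMBINING = "Vi\u00EA\u0323t"  # "ê" followed by combining dot below (nặng)
PRECOMPOSED = "Vi\u1EC7t"      # "ệ" as a single precomposed code point

if __name__ == "__main__":
    raw = to_codepage(COMBINING)
    print(raw, to_unicode(raw) == COMBINING)
    print(fits_codepage(COMBINING), fits_codepage(PRECOMPOSED))
```

The combining spelling survives the round trip through the single-byte code page, while the precomposed spelling has no slot in it. This is one reason the combining method pairs naturally with code page 1258.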

    17. Behavior of W and A routines in Win32
       On Windows NT, the A routines are wrappers that:
       - Convert ANSI text to Unicode
       - Call the corresponding W routine
       - Use the system code page for conversion
       On Windows 9x:
       - A routines are native
       - Most W routines fail and set the last error to ERROR_CALL_NOT_IMPLEMENTED
       On Windows 2000/XP, you can:
       - Set the system locale to any supported value
       - Reboot
       - Emulate a localized Windows 9x system
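
Because the A wrappers convert with the system code page, the same bytes can turn into different text on differently configured machines. The Python sketch below imitates that wrapper pattern; `set_title_w` and `set_title_a` are hypothetical stand-ins, not real Win32 entry points, and the code page is passed in explicitly instead of being read from the system.

```python
def set_title_w(title):
    """Hypothetical Unicode ("W") routine: receives text, returns what it stored."""
    return title


def set_title_a(raw, system_codepage):
    """Hypothetical ANSI ("A") wrapper: convert with the system code page, call W."""
    return set_title_w(raw.decode(system_codepage))


RAW = b"\xc3n"  # bytes produced by a non-Unicode application

vietnamese = set_title_a(RAW, "cp1258")  # system locale set to Vietnamese
western = set_title_a(RAW, "cp1252")     # system locale set to Western European

if __name__ == "__main__":
    print(vietnamese, western)
```

Byte 0xC3 becomes U+0102 (Ă) under code page 1258 but U+00C3 (Ã) under code page 1252, so a non-Unicode application displays correctly only when the system locale matches the data.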

    18. Generic text mapping routines: C run time extensions

    19. Best practices
       - Use dual path constants/macros
       - Generic data types and function prototypes
       - Explicit types: LPBYTE for byte pointers
       - Replace p++/p-- with CharNext/CharPrev
       - Compute string length in bytes by: NumChars * sizeof(TCHAR)
       - Compile: default to ANSI application, or -DUNICODE and -D_UNICODE for Unicode
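
The byte-length rule is easy to check with real buffers. This Python sketch uses `ctypes`, where `c_char` and `c_wchar` play the roles of the ANSI and Unicode flavors of TCHAR; `buffer_bytes` is a hypothetical helper, and the wide character size is platform dependent (2 bytes on Windows, commonly 4 elsewhere).

```python
import ctypes

NUM_CHARS = 16


def buffer_bytes(num_chars, char_type):
    """Size in bytes of a buffer holding num_chars elements of char_type."""
    return num_chars * ctypes.sizeof(char_type)


# One buffer per build flavor: narrow for ANSI, wide for Unicode.
narrow = ctypes.create_string_buffer(NUM_CHARS)
wide = ctypes.create_unicode_buffer(NUM_CHARS)

if __name__ == "__main__":
    print("narrow:", ctypes.sizeof(narrow), "bytes")
    print("wide:  ", ctypes.sizeof(wide), "bytes")
```

Code that passes the character count where a byte count is expected works in the ANSI build and silently truncates in the Unicode build, which is why the multiplication belongs in one shared expression.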

    20. Encoding options in Web pages
       - ANSI codepages or ISO character encodings
         - Disadvantage: monolingual or restricted to one script
       - Raw Unicode (UTF-16)
         - OK for intranets on Windows NT networks
         - May not work for Internet pages
       - Numeric entities (e.g. &#2325; for क)
         - OK for occasional use, such as inserting characters not in the main script of the page
         - Not good for large amounts of multilingual text
       - UTF-8: recommended encoding
         - Works just about everywhere
         - Supported by IE 4.0+ and Netscape 4.0+
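
The numeric-entity option can be illustrated in a few lines. This Python sketch relies on the standard `xmlcharrefreplace` error handler and `html.unescape`; the function name `to_numeric_entities` is a hypothetical choice for this example.

```python
import html


def to_numeric_entities(text):
    """Return ASCII-only markup, writing non-ASCII characters as &#NNNN;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def entity_overhead(text):
    """Compare byte counts: (numeric entities, UTF-8) for the same text."""
    return len(to_numeric_entities(text)), len(text.encode("utf-8"))


if __name__ == "__main__":
    sample = "Vi\u1EC7t"  # "Việt"
    markup = to_numeric_entities(sample)
    print(markup)
    print(html.unescape(markup) == sample)
    print(entity_overhead("\u0915"))  # Devanagari letter KA
```

A single Devanagari letter costs 7 bytes as an entity against 3 bytes in UTF-8, which matches the advice to reserve entities for occasional characters.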

    21. To set encoding in Web pages
       - HTML/DHTML: tag in the head of the document
         <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=<value>">
       - XML:
         <?xml version="1.0" encoding="<value>"?>
       - ASP: defaults to the system codepage (IIS 6.0 not Unicode!). Specify the charset using ASP directives:
         - Per session: <%Session.CodePage=<charset>%>
         - Per page: <%@CODEPAGE=<charset>%>
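
The declared charset only helps if the bytes actually sent use the same encoding. The sketch below builds a page around the META tag from the slide and encodes it with the charset it declares; it is written in Python for illustration, and `build_page` is a hypothetical helper rather than anything from the talk.

```python
import html


def build_page(body_text, charset="utf-8"):
    """Return page bytes whose declared charset matches the real encoding."""
    markup = (
        "<html><head>\n"
        f'<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset={charset}">\n'
        "</head><body>\n"
        f"{html.escape(body_text)}\n"
        "</body></html>\n"
    )
    # Encode with the same charset that the META tag announces.
    return markup.encode(charset)


PAGE = build_page("Ti\u1EBFng Vi\u1EC7t")  # "Tiếng Việt"

if __name__ == "__main__":
    print(PAGE.decode("utf-8"))
```

Keeping the declaration and the encoding step in one function avoids the common mismatch where a page announces one charset and is saved in another.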

    22. Universally encoded Web page

    23. Resources
       - General guidelines on internationalization: http://www.microsoft.com/globaldev
       - Information on Unicode implementation: http://www.microsoft.com/msj/0499/multilangUnicode/multilangUnicodetop.htm
