Worldwide typography (and how to apply JIS-X-4051-1995 to Unicode)



Presentation Transcript


  1. Worldwide typography (and how to apply JIS-X-4051-1995 to Unicode). Michel Suignard, Microsoft Corporation

  2. Objectives
     • Worldwide single binary
     • Multilingual
     • DTP level on all writing systems
     • Line breaking
     • Font selection
     • Word breaking
     • Line justification

  3. Challenges
     • Asian typography is not as well known as Western typography
     • Conflicting requirements
        • Vertical versus horizontal layout
        • Latin word wrap off
        • Ideographic word wrap on
     • Size of the Unicode repertoire (35K and growing)

  4. JIS-X-4051
     • First published in March 1993
        • Does not address the Unicode repertoire
        • Limited description of character classification
     • 2nd edition in October 1995
        • Based on JIS-X-0221 (ISO 10646-1)
        • More detailed character classification (20 classes)
        • Covers line breaking, line composition rules, Ruby positioning, horizontal-in-vertical setting, …

  5. Issues with JIS-X-4051
     • Still covers only a subset of Unicode
     • Character class contents overlap (they rely on contextual information not available to general-purpose software)
     • Single behavior class
     • Half-width/full-width characters not covered (user-defined)
     • Not aligned with most font designs (narrow versus wide symbols)
     • Lacks some useful features (like line break analysis across white space)

  6. Character classification
     • The Unicode space is decomposed into partitions (sets of character ranges)
     • Each partition shares a common behavior across all covered typographic rules
     • Partitions are mapped to classes specific to each rule (e.g. line breaking, font selection, etc.)
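The partition idea can be sketched as a range table plus one small mapping per typographic rule. This is only an illustration: the partition names, the choice of ranges, and the class names in both mappings are assumptions, not the tables from the presentation.

```python
from bisect import bisect_right

# Hypothetical partitions: (first code point, last code point, partition name).
# The ranges are real Unicode ranges; the grouping itself is illustrative.
PARTITIONS = [
    (0x0041, 0x005A, "latin_upper"),
    (0x0061, 0x007A, "latin_lower"),
    (0x3041, 0x3096, "hiragana"),
    (0x4E00, 0x9FFF, "cjk_ideograph"),
]
_STARTS = [first for first, _, _ in PARTITIONS]

# One table per rule, each mapping a partition to that rule's own class.
# Class names are assumptions made for this sketch.
LINE_BREAK_CLASS = {
    "latin_upper": "alpha",
    "latin_lower": "alpha",
    "hiragana": "ideographic",
    "cjk_ideograph": "ideographic",
}
FONT_CLASS = {
    "latin_upper": "latin",
    "latin_lower": "latin",
    "hiragana": "japanese",
    "cjk_ideograph": "cjk",
}


def partition_of(ch):
    """Return the partition name for a character, or None if uncovered."""
    cp = ord(ch)
    i = bisect_right(_STARTS, cp) - 1  # last partition starting at or before cp
    if i >= 0:
        _, last, name = PARTITIONS[i]
        if cp <= last:
            return name
    return None


def classify(ch, rule_table):
    """Map a character to its class under one typographic rule."""
    return rule_table.get(partition_of(ch))
```

Adding a new rule then means adding one more partition-to-class table; the range lookup is shared by every rule.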

  7. Typical usage
     [Figure: behavior table indexed by "Before behavior class" and "After behavior class"]

  8. Line breaking
     Sample text: 何語を話しますか。「私は英語を話します。」
     • Kinsoku rules, to avoid bad breaks such as a line that starts with a closing character (。 or 」) or ends with an opening character (「)
     • Stricter rules for small kana (like in フェ)
     • Keep numeric expressions together, including postfix and prefix symbols
     • Allows French typography rules (no break between the last word and ‘:;?!’, even if separated by a space character)
     • Disable Latin word wrap
     • Keep ideographic characters together

  9. Line breaking classes
     Partitions are mapped into 15 classes:
      1. Opening characters
      2. Closing characters
      3. No start ideographic
      4. Exclamation/interrogation
      5. Inseparable
      6. Prefix
      7. Postfix
      8. Ideographic
      9. Numeral sequence
     10. Alpha space
     11. Alpha characters/symbols
     12. Glue characters
     13. Slash
     14. Quotation characters
     15. Numeric separators

  10. Line breaking behavior table
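A behavior table of this kind can be sketched as a lookup keyed by the pair (class of the character before the break, class of the character after it). The sketch below uses only five of the fifteen classes. Both the character-to-class assignments and the table entries are assumptions based on general kinsoku practice (no break after an opening character, none before a closing character or a small kana, none inside a Latin word); they are not the entries of the table from the presentation.

```python
from enum import Enum


class LB(Enum):
    # Subset of the 15 line breaking classes, numbered as on the class list.
    OPENING = 1
    CLOSING = 2
    NO_START = 3
    IDEOGRAPHIC = 8
    ALPHA = 11


def lb_class(ch):
    """Hypothetical classifier; a real one would go through the partitions."""
    if ch in "「『（(":
        return LB.OPENING
    if ch in "」』）)。、":  # assumption: full stop and comma behave as closing
        return LB.CLOSING
    if ch in "ぁぃぅぇぉっゃゅょァィゥェォッャュョ":
        return LB.NO_START
    if ch.isascii() and ch.isalnum():
        return LB.ALPHA
    return LB.IDEOGRAPHIC  # simplification: everything else


# Pair table: BREAK_ALLOWED[(before, after)] says whether a line may break
# between two adjacent characters of those classes.
BREAK_ALLOWED = {(b, a): True for b in LB for a in LB}
for c in LB:
    BREAK_ALLOWED[(LB.OPENING, c)] = False   # never break after an opener
    BREAK_ALLOWED[(c, LB.CLOSING)] = False   # never start a line with a closer
    BREAK_ALLOWED[(c, LB.NO_START)] = False  # never start a line with small kana
BREAK_ALLOWED[(LB.ALPHA, LB.ALPHA)] = False  # keep Latin words together


def break_opportunities(text):
    """Return the indices i where a break between text[i-1] and text[i] is allowed."""
    return [
        i
        for i in range(1, len(text))
        if BREAK_ALLOWED[(lb_class(text[i - 1]), lb_class(text[i]))]
    ]
```

Because the rules live in `BREAK_ALLOWED` rather than in branching code, switching an option such as Latin word wrap amounts to changing table entries.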

  11. Width modification and auto-spacing
     • Width modification (contextual kerning): ( (text) ) becomes ((text))
     • Auto-spacing (add space between ideographic text and Western or numeric text): 漢字western text漢字 becomes 漢字 western text 漢字
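The auto-spacing rule can be sketched as a pass over adjacent character pairs. The two predicates are rough assumptions (kana and CJK Unified Ideographs on one side, ASCII letters and digits on the other), and inserting a space character is a stand-in: a layout engine would more likely add a fractional-em gap between glyphs without changing the text.

```python
def is_ideographic(ch):
    # Assumption: kana (U+3040..U+30FF) and CJK Unified Ideographs only.
    return "\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff"


def is_western(ch):
    # Assumption: ASCII letters and digits only.
    return ch.isascii() and ch.isalnum()


def auto_space(text, gap=" "):
    """Insert `gap` wherever ideographic text meets Western or numeric text."""
    out = []
    for i, ch in enumerate(text):
        if i > 0:
            prev = text[i - 1]
            if (is_ideographic(prev) and is_western(ch)) or (
                is_western(prev) and is_ideographic(ch)
            ):
                out.append(gap)
        out.append(ch)
    return "".join(out)


print(auto_space("漢字western text漢字"))  # 漢字 western text 漢字
```

Text that already has a space at the boundary is left alone, because a space is neither ideographic nor Western under these predicates.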

  12. Font selection scenario
     A new font is applied to a large multilingual selection of text:
     あの映画は日本の映画ですか。Is that movie a Japanese movie? ええ、そうです。Yes, it is.
     Assume we want to change the font of the English text while still selecting the whole text. When we apply the ‘Haettenschweiler’ font to the selection, it is desirable to affect only the Latin text. The situation is similar when we apply an Asian face (like HG) to the selection: only the Japanese text should change.
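One way to get this behavior is to split the selection into runs by writing system and apply the requested font only to the runs it covers. The sketch below makes several assumptions: the two-way `script_of` classifier is deliberately crude (all ASCII counts as Latin), the `FONT_COVERAGE` entries are invented for the example, and "HG" stands in for whichever Asian face is applied.

```python
from itertools import groupby

# Hypothetical coverage table: which writing systems each font should affect.
FONT_COVERAGE = {
    "Haettenschweiler": {"latin"},
    "HG": {"japanese"},
}


def script_of(ch):
    # Crude assumption: ASCII (including spaces and punctuation) is Latin,
    # everything else is Japanese.
    return "latin" if ch.isascii() else "japanese"


def split_runs(text):
    """Split text into (script, run) pairs of consecutive same-script characters."""
    return [(script, "".join(chars)) for script, chars in groupby(text, key=script_of)]


def apply_font(text, font, current_font="default"):
    """Return (run, font) pairs; only runs covered by `font` change face."""
    covered = FONT_COVERAGE[font]
    return [
        (run, font if script in covered else current_font)
        for script, run in split_runs(text)
    ]


selection = "日本の映画ですか。Is it? ええ。Yes."
for run, face in apply_font(selection, "Haettenschweiler"):
    print(face, repr(run))
```

Applying "HG" to the same selection changes only the Japanese runs and leaves the Latin runs in their current face.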

  13. Font selection based on character code point and context
     • There are no global Unicode fonts (fonts usually cover a group of writing systems)
     • Language is an important context selector for determining the appropriate font (CJK context, ASCII symbols, narrow versus wide Greek and Cyrillic characters)
     • Some writing systems require several glyphs per character and are better handled by specialized fonts (Arabic, Hindi)
     • Many punctuation marks are shared among writing systems whose typefaces cannot be shared (e.g. the period ‘.’ between Latin and Armenian)
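The shared-punctuation problem can be illustrated with a small context pass: characters with a clear writing system keep it, and shared punctuation inherits the writing system of the nearest preceding strong character (or the following one at the start of the text). The set of neutral characters, the script names, and the inheritance direction are all assumptions chosen for this sketch.

```python
# Assumption: a small set of punctuation treated as shared between scripts.
NEUTRAL = set(" .,:;!?'\"()-")


def strong_script(ch):
    """Return a script name for characters with a clear script, else None."""
    if ch in NEUTRAL:
        return None
    if ch.isascii():
        return "latin"
    if "\u0530" <= ch <= "\u058f":  # Armenian block
        return "armenian"
    return "other"


def resolve_scripts(text):
    """Assign a script to every character, resolving neutrals from context."""
    scripts = [strong_script(ch) for ch in text]
    last = None
    for i, s in enumerate(scripts):  # forward pass: inherit from the left
        if s is None:
            scripts[i] = last
        else:
            last = s
    following = None
    for i in range(len(scripts) - 1, -1, -1):  # leading neutrals: look right
        if scripts[i] is None:
            scripts[i] = following
        else:
            following = scripts[i]
    return scripts
```

With this pass the period after a Latin word is set in the Latin font and the period after an Armenian word in the Armenian font, even though both are the same code point.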

  14. Ruby overhanging
     • Ruby is the commonly used name for pronunciation characters attached to base characters.
     • The Ruby sequence may be allowed to overhang the characters preceding or following the base characters, as long as this does not introduce confusion.
     • The classification makes it possible to determine in which manner characters can be overhung:
        • No overhanging (e.g. CJK ideographs)
        • Allowed only Before (e.g. open quotes)
        • Allowed only After (e.g. close quotes)
        • Allowed in both cases (e.g. Hiragana)
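The four overhang behaviors map naturally onto a small flag type. In this sketch the character assignments follow the examples on the slide, but the reading of "Before" and "After" is an assumption: "Before" is taken to mean that the neighboring character may be overhung by Ruby whose base text comes before it, and "After" the reverse.

```python
from enum import Flag


class Overhang(Flag):
    NONE = 0
    BEFORE = 1   # may be overhung by Ruby whose base precedes this character
    AFTER = 2    # may be overhung by Ruby whose base follows this character
    BOTH = 3


def overhang_class(ch):
    """Hypothetical classifier built from the slide's four examples."""
    if ch in "「『（":
        return Overhang.BEFORE          # open quotes
    if ch in "」』）":
        return Overhang.AFTER           # close quotes
    if "\u3041" <= ch <= "\u3096":
        return Overhang.BOTH            # Hiragana
    return Overhang.NONE                # e.g. CJK ideographs


def may_overhang(neighbor, base_is_before):
    """Can Ruby overhang `neighbor`, given on which side its base text sits?"""
    needed = Overhang.BEFORE if base_is_before else Overhang.AFTER
    return bool(overhang_class(neighbor) & needed)
```

A layout engine would call `may_overhang` for the characters on either side of the base text before deciding whether a Ruby string that is wider than its base needs extra space.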

  15. Conclusion / Findings
     • A detailed analysis of the Unicode repertoire along common behavior is a powerful tool to construct sophisticated typographical effects.
     • Typographic complexity should be expressed as much as possible in tables and properties, not in code.
     • Many behaviors are correlated, allowing the usage of a limited number of Unicode partitions for many behavior descriptions.
