Worldwide typography (and how to apply JIS-X-4051-1995 to Unicode)

Michel Suignard, Microsoft Corporation

Objectives
• Worldwide single binary
• Multilingual
• DTP-level (desktop publishing) quality on all writing systems:
  • Line breaking
  • Font selection
  • Word breaking
  • Line justification

Challenges
• Asian typography is less widely known than Western typography
• Conflicting requirements:
  • Vertical versus horizontal layout
  • Latin word wrap off
  • Ideographic word wrap on
• Size of the Unicode repertoire (35K characters and growing)

JIS-X-4051
• First edition published in March 1993
  • Does not address the Unicode repertoire
  • Limited description of character classification
• Second edition published in October 1995
  • Based on JIS-X-0221 (ISO 10646-1)
  • More detailed character classification (20 classes)
  • Covers line breaking, line composition rules, ruby positioning, horizontal-in-vertical text,…

Issues with JIS-X-4051
• Still covers only a subset of Unicode
• Character class contents overlap (relying on contextual information not available to general-purpose software)
• Single behavior class
• Half-width/full-width characters are not covered (left user-defined)
• Not aligned with most font designs (narrow versus wide symbols)
• Lacks some useful features (such as line-break analysis across white space)

Character classification
• The Unicode space is decomposed into partitions (sets of character ranges)
• Each partition shares a common behavior across all covered typographic rules
• Partitions are mapped to classes specific to each rule (e.g. line breaking, font selection)
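
The partition scheme above can be sketched in Python, assuming a sorted table of non-overlapping code point ranges searched with `bisect`. The ranges, partition names, and the two per-rule class tables are hypothetical placeholders, not the presentation's actual data; the sketch only shows how one shared range table can feed several small partition-to-class tables.

```python
import bisect

# Hypothetical partition table: sorted, non-overlapping (start, end, partition).
PARTITIONS = [
    (0x0020, 0x0020, "space"),
    (0x0021, 0x0021, "exclamation"),
    (0x0028, 0x0028, "open_paren"),
    (0x0029, 0x0029, "close_paren"),
    (0x0030, 0x0039, "digit"),
    (0x0041, 0x005A, "latin_letter"),
    (0x0061, 0x007A, "latin_letter"),
    (0x3041, 0x309F, "hiragana"),  # a real table would split small kana out
    (0x4E00, 0x9FFF, "cjk_ideograph"),
]
_STARTS = [start for start, _, _ in PARTITIONS]

def partition_of(ch: str) -> str:
    """Binary-search the range table for the partition containing ch."""
    cp = ord(ch)
    i = bisect.bisect_right(_STARTS, cp) - 1
    if i >= 0 and PARTITIONS[i][0] <= cp <= PARTITIONS[i][1]:
        return PARTITIONS[i][2]
    return "other"

# Each typographic rule maps partitions onto its own, smaller set of classes.
LINE_BREAK_CLASS = {
    "open_paren": "Opening",
    "close_paren": "Closing",
    "exclamation": "Exclamation/interrogation",
    "digit": "Numeral sequence",
    "latin_letter": "Alpha characters/symbols",
    "space": "Alpha space",
    "hiragana": "Ideographic",
    "cjk_ideograph": "Ideographic",
}
FONT_CLASS = {
    "latin_letter": "western", "digit": "western",
    "space": "neutral", "open_paren": "neutral",
    "close_paren": "neutral", "exclamation": "neutral",
    "hiragana": "east_asian", "cjk_ideograph": "east_asian",
}

def classes_of(ch: str) -> tuple:
    """Look up the partition once, then read each rule's class from its table."""
    p = partition_of(ch)
    return LINE_BREAK_CLASS.get(p), FONT_CLASS.get(p)
```

Adding another rule (for example ruby overhang) would only require one more partition-to-class dictionary; the range table stays shared.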

Typical usage [figure: lookup table with axes "Before behavior class" and "After behavior class"]

Line breaking
Example text: 何語を話しますか。「私は英語を話します。」 ("What language do you speak? 'I speak English.'")
• Kinsoku rules, to avoid breaks that put a closing mark (such as 。 or 」) at the start of a line, or an opening mark (such as 「) at the end of a line
• Stricter rules for small kana (as in フェ)
• Keep numeric expressions together, including prefix and postfix symbols
• Allow French typography rules (no break between the last word and ':;?!', even when separated by a space character)
• Disable Latin word wrap
• Keep ideographic characters together
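
The kinsoku, small-kana, and French rules listed above can be approximated with two character sets: characters that may not start a line and characters that may not end one. This is a minimal sketch; the sets below are illustrative assumptions, not the full JIS-X-4051 lists, and every inter-character position is treated as a candidate break (as when Latin word wrap is disabled).

```python
# Illustrative, non-exhaustive sets. Small kana and the prolonged sound mark
# model the stricter small-kana rule; ":;?!" models the French rule.
NO_LINE_START = set("、。，．）」』】〕！？ぁぃぅぇぉっゃゅょァィゥェォッャュョー:;?!)")
NO_LINE_END = set("（「『【〔(")

def break_allowed(text: str, i: int) -> bool:
    """True if a line may break between text[i - 1] and text[i]."""
    if not 0 < i < len(text):
        return False
    return text[i] not in NO_LINE_START and text[i - 1] not in NO_LINE_END

def break_opportunities(text: str) -> list:
    # Every position is a candidate here, as when Latin word wrap is off.
    return [i for i in range(1, len(text)) if break_allowed(text, i)]
```

With the example sentence, no break is allowed before 。 or after 「, while the position between 。 and 「 remains a legal break.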

Line breaking classes
Partitions are mapped into 15 classes:
1. Opening characters
2. Closing characters
3. No start ideographic
4. Exclamation/interrogation
5. Inseparable
6. Prefix
7. Postfix
8. Ideographic
9. Numeral sequence
10. Alpha space
11. Alpha characters/symbols
12. Glue characters
13. Slash
14. Quotation characters
15. Numeric separators
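
The 15 classes can be represented as an enumeration, which makes rules such as "keep numeric expressions together" easy to express as patterns over classes. In this sketch the character-to-class assignments in `SAMPLE` are hypothetical guesses for illustration, and `numeric_runs` is an assumed helper that finds spans of the form prefix? numeral (separator numeral)* postfix?.

```python
from enum import IntEnum

class LineBreakClass(IntEnum):
    OPENING = 1
    CLOSING = 2
    NO_START_IDEOGRAPHIC = 3
    EXCLAMATION_INTERROGATION = 4
    INSEPARABLE = 5
    PREFIX = 6
    POSTFIX = 7
    IDEOGRAPHIC = 8
    NUMERAL_SEQUENCE = 9
    ALPHA_SPACE = 10
    ALPHA = 11
    GLUE = 12
    SLASH = 13
    QUOTATION = 14
    NUMERIC_SEPARATOR = 15

L = LineBreakClass
# Hypothetical sample assignments, for illustration only.
SAMPLE = {
    "(": L.OPENING, "「": L.OPENING, ")": L.CLOSING, "」": L.CLOSING,
    "ぁ": L.NO_START_IDEOGRAPHIC, "ッ": L.NO_START_IDEOGRAPHIC,
    "!": L.EXCLAMATION_INTERROGATION, "?": L.EXCLAMATION_INTERROGATION,
    "…": L.INSEPARABLE, "$": L.PREFIX, "%": L.POSTFIX,
    " ": L.ALPHA_SPACE, "\u00a0": L.GLUE, "/": L.SLASH, '"': L.QUOTATION,
    ",": L.NUMERIC_SEPARATOR, ".": L.NUMERIC_SEPARATOR,
}
SAMPLE.update({d: L.NUMERAL_SEQUENCE for d in "0123456789"})

def classify(ch: str) -> LineBreakClass:
    if ch in SAMPLE:
        return SAMPLE[ch]
    if 0x4E00 <= ord(ch) <= 0x9FFF:  # CJK Unified Ideographs
        return L.IDEOGRAPHIC
    return L.ALPHA

def numeric_runs(text: str) -> list:
    """(start, end) spans that must not be broken: prefix? digits (sep digits)* postfix?"""
    cls = [classify(ch) for ch in text]
    runs, i = [], 0
    while i < len(cls):
        j = i + 1 if cls[i] == L.PREFIX else i
        if j < len(cls) and cls[j] == L.NUMERAL_SEQUENCE:
            # Consume digits; a separator is kept only if a digit follows it.
            while j < len(cls) and (
                cls[j] == L.NUMERAL_SEQUENCE
                or (cls[j] == L.NUMERIC_SEPARATOR
                    and j + 1 < len(cls)
                    and cls[j + 1] == L.NUMERAL_SEQUENCE)
            ):
                j += 1
            if j < len(cls) and cls[j] == L.POSTFIX:
                j += 1
            runs.append((i, j))
            i = j
        else:
            i += 1
    return runs
```

For "Price: $1,234.50 or 15%" the protected spans are "$1,234.50" and "15%"; a trailing sentence period is not absorbed because no digit follows it.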

Line breaking behavior table
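
The behavior table itself is not preserved here. A pair table of this kind is typically indexed by the class of the character before a candidate position and the class of the character after it. The sketch below uses a six-class subset, and every table value and character assignment is a hypothetical placeholder rather than the presentation's table.

```python
from enum import IntEnum

class BreakClass(IntEnum):
    OP = 0  # opening characters
    CL = 1  # closing characters
    NS = 2  # no start ideographic (small kana, etc.)
    ID = 3  # ideographic
    AL = 4  # alpha characters/symbols
    SP = 5  # alpha space

# BREAK[before][after] == 1: a break is allowed between a character of class
# `before` and a following character of class `after`. Illustrative values.
BREAK = [
    # OP CL NS ID AL SP    <- class after
    [0, 0, 0, 0, 0, 0],  # OP before
    [1, 0, 0, 1, 1, 0],  # CL
    [1, 0, 0, 1, 1, 0],  # NS
    [1, 0, 0, 1, 1, 0],  # ID
    [0, 0, 0, 1, 0, 0],  # AL (Latin words stay together)
    [1, 0, 0, 1, 1, 0],  # SP (break after the space)
]

def char_class(ch: str) -> BreakClass:
    if ch in "(「":
        return BreakClass.OP
    if ch in ")」、。":
        return BreakClass.CL
    if ch in "ぁぃぅぇぉっゃゅょァィゥェォッャュョー":
        return BreakClass.NS
    if ch == " ":
        return BreakClass.SP
    cp = ord(ch)
    if 0x3040 <= cp <= 0x30FF or 0x4E00 <= cp <= 0x9FFF:  # kana, CJK ideographs
        return BreakClass.ID
    return BreakClass.AL

def break_positions(text: str) -> list:
    """Indices i where a break may occur between text[i - 1] and text[i]."""
    classes = [char_class(ch) for ch in text]
    return [i for i in range(1, len(text)) if BREAK[classes[i - 1]][classes[i]]]
```

Keeping the behavior in a table rather than in branching code means a new rule, or a user option such as enabling Latin word wrap, becomes a change to table entries.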

Width modification and auto-spacing
• Width modification (contextual kerning): ( (text) ) becomes ((text))
• Auto-spacing (adding space between ideographic text and Western or numeric text): 漢字western text漢字 becomes 漢字 western text 漢字
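
Auto-spacing can be sketched as finding every boundary where an ideographic character meets a Western letter or digit. In a real layout engine the extra space would be a width adjustment rather than an inserted character; here a `spacer` string (an assumption of this sketch, defaulting to U+0020) stands in for it, and the character ranges are simplified to kana plus CJK Unified Ideographs.

```python
def is_ideographic(ch: str) -> bool:
    cp = ord(ch)
    return 0x3040 <= cp <= 0x30FF or 0x4E00 <= cp <= 0x9FFF  # kana, CJK ideographs

def is_western(ch: str) -> bool:
    return ch.isascii() and ch.isalnum()  # ASCII letters and digits only

def auto_space_positions(text: str) -> list:
    """Indices i where extra space is added between text[i - 1] and text[i]."""
    return [
        i for i in range(1, len(text))
        if (is_ideographic(text[i - 1]) and is_western(text[i]))
        or (is_western(text[i - 1]) and is_ideographic(text[i]))
    ]

def auto_space(text: str, spacer: str = " ") -> str:
    """Insert `spacer` at each ideographic/Western boundary."""
    parts, prev = [], 0
    for i in auto_space_positions(text):
        parts.append(text[prev:i])
        prev = i
    parts.append(text[prev:])
    return spacer.join(parts)
```

Passing a narrower spacer such as U+2005 (FOUR-PER-EM SPACE) approximates a fractional-em gap when only plain text can be produced.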

Font selection scenario
A new font is applied to a large multilingual selection of text:
あの映画は日本の映画ですか。Is that movie a Japanese movie? ええ、そうです。Yes, it is.
• Suppose we want to change the font of the English text while still selecting the whole text. When we apply the 'Haettenschweiler' font to the selection, it is desirable that only the Latin text be affected.
• The situation is similar when we want to apply an Asian typeface (such as HG) to the Japanese text: only the Japanese text should change.
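
One way to model this scenario is to split the selection into same-script runs and apply the new font only to runs of the script it targets. In this sketch the script detection is a deliberately coarse assumption (ASCII counts as Latin; a few CJK blocks count as East Asian), and the current font names in the usage test are arbitrary examples.

```python
def script_of(ch: str) -> str:
    cp = ord(ch)
    if 0x3000 <= cp <= 0x30FF or 0x4E00 <= cp <= 0x9FFF or 0xFF00 <= cp <= 0xFFEF:
        return "east_asian"  # CJK punctuation, kana, ideographs, full-width forms
    if ch.isascii():
        return "latin"       # includes ASCII spaces and punctuation
    return "other"

def runs(text: str) -> list:
    """Split text into maximal same-script runs: [(script, substring)]."""
    result = []
    for ch in text:
        s = script_of(ch)
        if result and result[-1][0] == s:
            result[-1] = (s, result[-1][1] + ch)
        else:
            result.append((s, ch))
    return result

def apply_font(text: str, current: dict, new_font: str, target_script: str) -> list:
    """Apply new_font to the whole selection, but only runs in target_script change."""
    return [(s, new_font if script == target_script else current[script])
            for script, s in runs(text)]
```

The Japanese runs keep their existing font even though the whole selection received the new one.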

Font selection based on character code point and context
• Needed because there are no global Unicode fonts (fonts usually cover a group of writing systems)
• Language is an important context for determining the appropriate font (CJK context, ASCII symbols, narrow versus wide Greek and Cyrillic characters)
• Some writing systems require several glyphs per character and are better handled by specialized fonts (Arabic, Hindi)
• Many punctuation characters are shared among writing systems whose typefaces cannot be shared (e.g. the period '.' between Latin and Armenian)
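
The points above suggest a per-character decision that combines the code point with the language context. In this sketch, Greek and Cyrillic go to the CJK font in a CJK language context (where wide forms are expected) and to the Latin font otherwise, while shared punctuation, digits, and spaces inherit the category of the preceding character. The categories, range checks, and the `fonts` mapping are simplifying assumptions for illustration.

```python
CJK_LANGS = {"ja", "zh", "ko"}

def category(ch: str) -> str:
    cp = ord(ch)
    if 0x3040 <= cp <= 0x30FF or 0x4E00 <= cp <= 0x9FFF:
        return "cjk"
    if 0x0370 <= cp <= 0x04FF:  # Greek and Cyrillic blocks
        return "greek_cyrillic"
    if 0x0530 <= cp <= 0x058F:  # Armenian block
        return "armenian"
    if cp < 0x0250 and ch.isalpha():  # Latin letters (simplified)
        return "latin"
    return "neutral"  # shared punctuation, digits, spaces

def select_fonts(text: str, lang: str, fonts: dict) -> list:
    """Return one font name per character of text."""
    prev = "cjk" if lang in CJK_LANGS else "latin"  # default before any letter
    result = []
    for ch in text:
        cat = category(ch)
        if cat == "greek_cyrillic":
            # Wide forms in a CJK context, narrow forms otherwise.
            cat = "cjk" if lang in CJK_LANGS else "latin"
        elif cat == "neutral":
            cat = prev  # shared punctuation follows the preceding script
        prev = cat
        result.append(fonts[cat])
    return result
```

The period after an Armenian letter is drawn with the Armenian font, and the same period after a Latin letter with the Latin font, so the shared character never forces a mismatched typeface.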

Ruby overhanging
• "Ruby" is the commonly used name for pronunciation characters associated with base characters.
• The ruby sequence may be allowed to overhang the characters preceding or following the base characters, as long as it does not introduce confusion.
• The classification determines in which manner characters can be overhung:
  • No overhanging (e.g. CJK ideographs)
  • Allowed only before (e.g. open quotes)
  • Allowed only after (e.g. close quotes)
  • Allowed in both cases (e.g. Hiragana)
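
The four overhang classes map naturally onto a two-bit flag. This sketch assumes that "before" means a character's leading side may be covered and "after" its trailing side, so ruby can extend over the preceding character only if that character allows "after", and over the following character only if it allows "before". That interpretation and the small character assignments are assumptions; how far the ruby may extend is a separate parameter not modeled here.

```python
from enum import Flag
from typing import Optional

class Overhang(Flag):
    NONE = 0
    BEFORE = 1  # the character's leading side may be covered by ruby
    AFTER = 2   # the character's trailing side may be covered by ruby
    BOTH = BEFORE | AFTER

def overhang_class(ch: str) -> Overhang:
    if ch in "「『（":
        return Overhang.BEFORE  # open quotes/brackets (assumed blank on leading side)
    if ch in "」』）":
        return Overhang.AFTER   # close quotes/brackets (assumed blank on trailing side)
    if 0x3041 <= ord(ch) <= 0x309F:
        return Overhang.BOTH    # Hiragana
    return Overhang.NONE        # CJK ideographs and anything unlisted

def allowed_overhang(prev_char: Optional[str], next_char: Optional[str]) -> tuple:
    """(left, right): may the ruby extend over the neighbor before / after the base?"""
    left = prev_char is not None and bool(overhang_class(prev_char) & Overhang.AFTER)
    right = next_char is not None and bool(overhang_class(next_char) & Overhang.BEFORE)
    return left, right
```

For a ruby base between 」 and 「 both sides may be overhung, while an adjacent ideograph blocks overhang on its side.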

Conclusion / Findings
• A detailed analysis of the Unicode repertoire according to common behavior is a powerful tool for constructing sophisticated typographical effects.
• Typographic complexity should be expressed as much as possible in tables and properties, not in code.
• Many behaviors are correlated, allowing a limited number of Unicode partitions to serve many behavior descriptions.
