Dynamic Glyph Generation

Based on a variable-length encoding scheme. Yap Cheah Shen, eForth Technology. Glyph & Typesetting Workshop, Kyoto, 29 Nov 2003.

Presentation Transcript


  1. Dynamic Glyph Generation
     Based on a variable-length encoding scheme
     Yap Cheah Shen, eForth Technology
     Glyph & Typesetting Workshop, Kyoto, 29 Nov 2003

  2. Outline of Presentation
     • Morpheme: Latin vs. Han
     • Latin text encoding
     • Missing characters in Chinese text
     • Solution
     • Implementation details
     • Glyph decomposition database
     • Topological conversion of strokes
     • Automatic frame calculation
     • Integrating into an existing OS
     • Other issues

  3. Morpheme: Latin vs. Han
     • A morpheme is the smallest meaningful unit in a language.
     • For Latin text, it is the word.
     • For Chinese text, it is the Hanzi or Kanji.
     • Because morphemes represent real-world ideas, they keep changing over time.
     • Morphemes form an open set.

  4. Latin Text Encoding
     • The letters of an alphabet form a fixed set of symbols.
     • All words can be represented as sequences of letters.
     • Letters are the ideal encoding unit for Latin text, as in ASCII.
     • There is no “missing word” encoding problem.

  5. Missing Characters in Chinese Text
     • Not all existing Hanzi are encoded.
     • Hanzi form an open set: theoretically, historically, and practically.
     • Existing encoding schemes rest on wrong assumptions and designs.
     • An unending loop: assign code points, update the OS, ship new fonts, ship new input method tables. The industry is happy; users suffer.

  6. Solution-1
     • Parts or components as the encoding unit:
       日 月 金 木 水 火 土 人 心 手 口 女 艹 疒 犭
     • Most characters can be represented by a finite set of basic parts.
     • Strokes are used to construct rarely used parts (thousands of parts appear only once or twice).

  7. Solution-2
     • A closed set of basic parts and strokes as the encoding unit.
     • 3 joining operators: horizontal, vertical, and enclosing.
     • 1 shielding operator: for hiding a stroke.
     • Prefix notation: allows recursive composition.
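Prefix notation with binary joining operators can be read by a short recursive routine. The following is a minimal sketch, not the presenter's implementation: it assumes the Unicode ideographic description characters ⿰, ⿱ and ⿴ stand in for the three joining operators, ignores the shielding operator, and uses the hypothetical names `Node`, `parse` and `JOIN_OPS`.

```python
from dataclasses import dataclass
from typing import Union

# Assumption: Unicode ideographic description characters stand in for the
# three binary joining operators named on the slide.
JOIN_OPS = {"⿰": "horizontal", "⿱": "vertical", "⿴": "enclosing"}


@dataclass
class Node:
    """One joining operator applied to two sub-expressions."""
    op: str
    first: "Tree"
    second: "Tree"


Tree = Union[str, Node]  # a leaf is a single part or stroke symbol


def _parse_at(expr: str, i: int):
    """Parse one complete sub-expression starting at index i."""
    if i >= len(expr):
        raise ValueError("expression ended while an operand was expected")
    sym = expr[i]
    if sym in JOIN_OPS:
        # An operator is followed by exactly two operands, each of which
        # may itself be an operator expression (recursive composition).
        first, i = _parse_at(expr, i + 1)
        second, i = _parse_at(expr, i)
        return Node(sym, first, second), i
    return sym, i + 1


def parse(expr: str) -> Tree:
    """Parse a prefix glyph expression into a tree."""
    tree, end = _parse_at(expr, 0)
    if end != len(expr):
        raise ValueError("trailing symbols after a complete expression")
    return tree


# 萌 written as 艹 above (日 beside 月)
tree = parse("⿱艹⿰日月")
```

Because the operator comes first, no brackets are needed: the parser always knows how many operands to read next.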

  8. Solution-3
     • Ordinary CJK fixed-length encoding scheme: a numeric value as the character code.
     • Input method table
       • Converts input keystrokes to character codes.
     • Static font file
       • Glyph data is pre-designed.
       • Glyph data is accessed by character code.
     • Text file
       • A sequence of character codes.

  9. Solution-4
     • Additional features of a variable-length-encoding CJK environment.
     • Input
       • Characters can be sorted and filtered by parts.
       • Compatible with any existing input method.
     • Display
       • The font file stores commonly used characters and parts.
       • Glyphs are generated on the fly from glyph descriptive sequences.
     • Storage and data exchange
       • Compatible with Unicode.
       • Ideographic description sequences.
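Filtering characters by part can be sketched with a small lookup table. The table below is a made-up illustration (it is not the Academia Sinica database), and `DECOMP`, `parts_of` and `filter_by_part` are hypothetical names; the sketch also assumes the table contains no circular definitions.

```python
OPS = set("⿰⿱⿴")  # assumption: Unicode description characters as operators

# Hypothetical decomposition table: character -> prefix expression over
# its immediate components.
DECOMP = {
    "明": "⿰日月",
    "萌": "⿱艹明",
    "花": "⿱艹化",
    "漁": "⿰氵魚",
}


def parts_of(char, table):
    """Return every part reachable from char, including nested ones."""
    found = set()
    for sym in table.get(char, ""):
        if sym not in OPS:
            found.add(sym)
            found |= parts_of(sym, table)  # expand components recursively
    return found


def filter_by_part(part, table):
    """List the characters in the table that contain the given part."""
    return sorted(c for c in table if part in parts_of(c, table))


with_sun = filter_by_part("日", DECOMP)
with_grass = filter_by_part("艹", DECOMP)
```

Here 萌 is found under 日 even though 日 only appears inside its component 明, which is the benefit of expanding parts recursively.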

  10. Dynamic Glyph Generator
     • Input: various types of variable-length descriptive character code sequences
       • 構字式 of Academia Sinica
       • 組字式 of CBETA
       • Unicode ideographic description characters
     • Output: display & print
       • TrueType-compatible outline
       • Rasterized bitmap
       • Macromedia Flash, SVG
     • The task: a layout problem, fitting a one-dimensional sequence into a two-dimensional square.
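The layout task can be illustrated by walking a prefix expression and handing each part a rectangle inside the unit square. This is a sketch under stated assumptions, not the generator described in the talk: boxes are split evenly, the enclosing operator simply insets its second operand, y grows downward, and `layout` and `_place` are hypothetical names.

```python
def _place(expr, i, box, placed):
    """Place the sub-expression starting at index i into box; return next index."""
    if i >= len(expr):
        raise ValueError("expression ended while an operand was expected")
    x, y, w, h = box
    sym = expr[i]
    if sym == "⿰":    # horizontal: left half, right half
        a, b = (x, y, w / 2, h), (x + w / 2, y, w / 2, h)
    elif sym == "⿱":  # vertical: top half, bottom half
        a, b = (x, y, w, h / 2), (x, y + h / 2, w, h / 2)
    elif sym == "⿴":  # enclosing: outer keeps the box, inner is inset
        a, b = box, (x + w / 4, y + h / 4, w / 2, h / 2)
    else:             # a part or stroke: it fills the box it was given
        placed.append((sym, box))
        return i + 1
    i = _place(expr, i + 1, a, placed)
    return _place(expr, i, b, placed)


def layout(expr, box=(0.0, 0.0, 1.0, 1.0)):
    """Return [(part, (x, y, width, height)), ...] for a prefix expression."""
    placed = []
    end = _place(expr, 0, box, placed)
    if end != len(expr):
        raise ValueError("trailing symbols after a complete expression")
    return placed


boxes = layout("⿱艹⿰日月")
```

An even split is the simplest possible rule; it makes complex parts look cramped, which is why the proportions need to depend on the parts themselves.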

  11. Implementation-1
     The system consists of 3 major parts:
     • Glyph decomposition database
       • Courtesy of Prof. Hsieh from Academia Sinica, Taiwan: http://www.sinica.edu.tw/~cdp/
     • Outlines of strokes and components
       • Beijing ZhongYi Co., a professional outline font vendor: http://www.zhongyicts.com.cn/
     • The eForth system: putting everything together, hardware-software co-engineering.

  12. Implementation-2
     • Glyph decomposition database
       • All CJK glyphs defined by Unicode 4.0, 71000+ in total.
       • 549 basic parts; stroke sequences are preserved.
       • 3996 parts in total.
       • Total part frequency: 165122
       • Accumulated frequency:
         • Top 50: 51389 = 31%
         • Top 200: 87381 = 53%
         • Top 1000: 129393 = 78%
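The percentages are each accumulated count divided by the total part frequency. A quick check of that arithmetic, using only the figures from the slide (the variable names are illustrative):

```python
TOTAL_PART_FREQUENCY = 165122

# Accumulated frequency of the N most frequent parts, as given on the slide.
accumulated = {50: 51389, 200: 87381, 1000: 129393}

# Share of all part occurrences covered by the top N parts, in whole percent.
coverage = {
    n: round(100 * count / TOTAL_PART_FREQUENCY)
    for n, count in accumulated.items()
}
```

The steep curve is what makes it practical for a font file to store only the commonly used parts and build the rest from strokes.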

  13. Implementation-3
     • Strokes are described as an outline with a skeletal line.
     • Both the outline and the skeletal line are quadratic Bézier curves.
     • Outline points are recalculated according to the scaled skeletal line.
     • Result:
       • Stroke data is highly reusable.
       • Stroke weights are adjustable.
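One way to picture recalculating an outline from a scaled skeleton is sketched here. The data layout is an assumption made for illustration only: each skeleton control point carries a hypothetical half-width offset, and the two outline sides are rebuilt around the scaled skeleton, so the stroke keeps its thickness when the skeleton is squeezed. `quad_bezier` and `rescale_stroke` are hypothetical names.

```python
def quad_bezier(p0, p1, p2, t):
    """Point at parameter t (0..1) on the quadratic Bezier curve p0, p1, p2."""
    u = 1.0 - t
    return (u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0],
            u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1])


def rescale_stroke(skeleton, offsets, sx, sy, weight=1.0):
    """Scale the skeleton, then rebuild both outline sides around it.

    skeleton: (x, y) control points of the skeletal curve.
    offsets:  assumed per-control-point (dx, dy) half-widths.
    weight:   multiplier applied to the half-widths (stroke weight).
    """
    scaled = [(x * sx, y * sy) for x, y in skeleton]
    side_a = [(x + weight * dx, y + weight * dy)
              for (x, y), (dx, dy) in zip(scaled, offsets)]
    side_b = [(x - weight * dx, y - weight * dy)
              for (x, y), (dx, dy) in zip(scaled, offsets)]
    return scaled, side_a, side_b


# A horizontal stroke squeezed to half its length; thickness is unchanged.
skeleton = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
offsets = [(0.0, 0.1)] * 3
scaled, side_a, side_b = rescale_stroke(skeleton, offsets, sx=0.5, sy=1.0)

# The same stroke at double weight.
_, bold_a, _ = rescale_stroke(skeleton, offsets, sx=0.5, sy=1.0, weight=2.0)

# Midpoint of a sample curve, e.g. for rasterizing.
mid = quad_bezier((0.0, 0.0), (1.0, 2.0), (2.0, 0.0), 0.5)
```

Scaling the outline directly would thin or thicken the stroke along with the glyph; keeping the skeleton separate is what lets the weight stay adjustable.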

  14. Implementation-4
     • Automatic frame calculation
       • An algorithm estimates the complexity of each part to decide its proportion in the resulting glyph.
       • 漁: 氵 25%, 魚 70%, roughly.
       • 觀: 雚 55%, 見 40%, roughly.
     • Result:
       • Clear glyph descriptive expressions
       • Search-engine friendly
       • Human readable
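The slides do not say how complexity is estimated. As a purely illustrative stand-in, the sketch below uses stroke count as the complexity measure and damps it with an exponent so that a simple part is not squeezed too thin. The function name, the damping value and the use of stroke counts are all assumptions, and the output is not meant to reproduce the figures above.

```python
def proportions(complexities, damping=0.5):
    """Split one dimension of the frame among parts by damped complexity.

    complexities: one positive number per part (here: stroke counts).
    damping:      exponent below 1 flattens the differences between parts;
                  1.0 gives a share directly proportional to complexity.
    """
    weights = [c ** damping for c in complexities]
    total = sum(weights)
    return [w / total for w in weights]


# 漁 = 氵 (3 strokes) beside 魚 (11 strokes)
shares = proportions([3, 11])
linear = proportions([3, 11], damping=1.0)
```

With damping the simpler part receives a larger share than its raw stroke count would give it, which is the general direction of the 漁 example.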

  15. Integrating into an Existing OS/GUI
     • String manipulation library
       • Number of characters: -1 for operators, +1 for characters
       • Character width
     • Graphics subsystem
       • Drawing a text line (e.g. ExtTextOut)
     • Text handling widgets
       • Awareness of glyph expressions for caret, selection, and delete/backspace.
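The counting rule works because every joining operator is binary: it consumes two operands and yields one glyph, so each operator cancels one symbol. The sketch below applies that rule and also finds where each glyph expression starts and ends, which is what caret movement and backspace would need. It assumes the Unicode ideographic description characters as operators, ignores the shielding operator, and uses hypothetical function names.

```python
JOIN_OPS = set("⿰⿱⿴")  # assumption: binary joining operators only


def glyph_count(text):
    """Count displayed glyphs: +1 per part or character, -1 per operator."""
    return sum(-1 if ch in JOIN_OPS else 1 for ch in text)


def glyph_spans(text):
    """Return (start, end) index pairs, one per displayed glyph."""
    spans = []
    start = 0
    pending = 1  # operands still needed to finish the current glyph
    for i, ch in enumerate(text):
        pending += 1 if ch in JOIN_OPS else -1
        if pending == 0:
            spans.append((start, i + 1))
            start, pending = i + 1, 1
    if start != len(text):
        raise ValueError("incomplete glyph expression at end of text")
    return spans


text = "漁⿱艹⿰日月人"
count = glyph_count(text)
spans = glyph_spans(text)
```

A text widget built on `glyph_spans` would move the caret from one span boundary to the next, so that backspace removes a whole expression rather than a single part.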

  16. Other Issues
     • Quality of the glyph
       • Trade-off with space: more part outlines give better quality.
     • Speed of generation
       • No problem on an IBM PC, since glyph generation is rare.
       • For handheld devices, hardware acceleration is recommended.

  17. Examples
     ⿱ vertical combination
     ⿰ horizontal combination
     ⿴ enclosing
     - hide
     • 盟 = ⿱明皿 or ⿱⿰日月皿
     • 李世民 = 民-5 (hide the 5th stroke)
     • 玄燁 = 玄-5
     • 丘-4 = U+20009

  18. Thank You
