Review

layne


  1. Review • What is multilingual computing? • Bilingual, trilingual, vs. multilingual • What are the fundamental issues in multilingual computing? • Representation of each language in a computer • Ways to distinguish different scripts • How can a system be designed so that it can be used for different languages with minimal changes? • How can a system be designed so that it can be used for multiple languages?

  2. Characteristics of different scripts • What is a script? • What are the different types of scripts, and what are examples of each? • Token-based/alphabet-based scripts • Phonetic-based scripts • Ideographs • What is a phonetic transcription system, and what are examples? • What is Romanization?

  3. Characteristics of Chinese • Graphemics • Variant writing (e.g. 教, 都) • Phonetics (the sound, 音) • Types of phonemes • Semantics (the meaning, 義) • Independence of meaning

  4. Computer representation of characters • Selection of a finite set of characters → character set • Uniqueness → each character/symbol • Design of a coded character set → codeset • Uniqueness → each codepoint assignment • Different coding length → different codesets • What do the following terms mean? • Codepoint • Length of a codepoint • Code space • Size of a code space • Code range • Order of characters (in a character set vs. a codeset)

  5. What are the different numerical notations? • Decimal notation • Binary notation • Hexadecimal notation • Scalar value • Characteristics of the ASCII codeset • What is row-cell notation? • What are character subsets and why are they needed? • Character set comparison operations • Codeset comparison operations • Character set • Codepoint assignment • Compatibility
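
As a small illustration of the notations listed on this slide, the sketch below shows one ASCII codepoint in decimal, binary and hexadecimal. The helper name `notations` is made up for this example and is not part of the original slides.

```python
def notations(ch):
    """Return the codepoint of a single character in three notations."""
    cp = ord(ch)  # scalar value of the character
    return {
        "decimal": str(cp),
        "binary": format(cp, "07b"),  # ASCII is a 7-bit codeset, so pad to 7 bits
        "hex": format(cp, "02X"),
    }


print(notations("A"))
```

The same value is written three ways; only the notation changes, not the codepoint itself.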

  6. What is an encoding method and why do we need it? • What is the so-called high-bit-on scheme? • What are the characteristics of GB 2312? • No. of rows, no. of columns → code space • Code range? • Major subsets? • Full-width characters vs. half-width characters • What are the characteristics of Big5 and ETen Big5? • Rows, columns → code space • Major subsets? • What are UDAs and VDAs for? • HKSCS
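
The link between row-cell notation and the high-bit-on scheme can be sketched for GB 2312 as follows. This assumes the common EUC-CN byte form, where 0xA0 is added to both the row and the cell number so that each byte has its top bit set. The function name `rowcell_to_euc` is hypothetical.

```python
def rowcell_to_euc(row, cell):
    """Map a GB 2312 row-cell position (each 1..94) to two high-bit-on bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1..94")
    # Adding 0xA0 puts both bytes in 0xA1..0xFE, so neither collides with ASCII.
    return bytes([0xA0 + row, 0xA0 + cell])


encoded = rowcell_to_euc(16, 1)   # row 16, cell 01: the first hanzi in GB 2312
print(encoded.hex().upper())
print(encoded.decode("gb2312"))
```

Because both bytes lie above 0x7F, a decoder can tell a two-byte character from a one-byte ASCII character by inspecting the high bit alone.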

  7. Other codesets using high-bit-on schemes? • Encodings using designation (指定)? • ISO 2022 • Extended Unix Code (EUC) • What is a charset registry and why is it needed? • Problems with different codesets? • Compatibility → wrong interpretation of data • Solutions: codeset announcement (using designation) and conversion → conversion problems

  8. ISO 10646 and Unicode • What are the design principles of ISO 10646? • What are the different coding structures in ISO 10646? • What is the structure of UCS-4? • What are the characteristics of the BMP? • What is the structure of the BMP? • What is UCS-2? • What is the compatibility zone for? • What is the difference between ISO 10646 and Unicode? • Big-endian vs. little-endian notation: FEFF vs. FFFE
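
The FEFF vs. FFFE point can be illustrated with a minimal sketch that inspects the first two bytes of a 16-bit encoded stream. It assumes the data is UCS-2/UTF-16 only (a 32-bit little-endian stream also starts with FF FE and would be misreported), and `detect_byte_order` is a name invented for this example.

```python
def detect_byte_order(data):
    """Guess the byte order of 16-bit Unicode data from a leading U+FEFF."""
    if data[:2] == b"\xfe\xff":
        return "big-endian"      # U+FEFF stored most significant byte first
    if data[:2] == b"\xff\xfe":
        return "little-endian"   # same character, bytes swapped
    return "unknown"             # no byte order mark present


text = "\ufeff中"
print(detect_byte_order(text.encode("utf-16-be")))
print(detect_byte_order(text.encode("utf-16-le")))
```

The check works because U+FFFE is a noncharacter, so reading FFFE at the start of a stream signals that the bytes are swapped relative to the reader's assumption.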

  9. What are Extension A and Extension B? • Where are they coded? • What are surrogate pairs, why are they needed, and how do they work? • What is UTF, what is its purpose, and how does UTF-8 work? • What is the difference between a character and a glyph? • What is the difference between a multi-byte character and a wide character?
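
As a worked illustration of the surrogate-pair and UTF-8 questions, the sketch below computes both by hand and compares the UTF-8 result with Python's built-in codec. The function names are hypothetical, and for brevity `utf8_encode` does not reject the surrogate range D800 to DFFF, which a conforming encoder would.

```python
def to_surrogates(cp):
    """Split a codepoint above the BMP into a (high, low) surrogate pair."""
    if not (0x10000 <= cp <= 0x10FFFF):
        raise ValueError("only codepoints U+10000..U+10FFFF need surrogates")
    v = cp - 0x10000                      # 20-bit offset into the supplementary planes
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def utf8_encode(cp):
    """Encode one codepoint as UTF-8 (1 to 4 bytes)."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


high, low = to_surrogates(0x20000)        # first codepoint of Extension B
print(f"{high:04X} {low:04X}")
print(utf8_encode(0x4E2D).hex().upper())  # 中
print(utf8_encode(0x4E2D) == "中".encode("utf-8"))
```

The lead byte of a UTF-8 sequence announces the sequence length, and every continuation byte starts with the bits 10, which is why a decoder can resynchronise from any position in the stream.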

  10. Input Methods • What is an input method (IM) and why do we need it? • What are the different types of input methods? • What is a keyboard-based input method? • How is an IM designed? • What is the basic requirement? • What are the limitations? • What information can be used in IM design? • Who are the main users? • Efficiency considerations? • What are the two types of IM? • Applicability and limitations • What is keyboard arrangement and why do we need it?

  11. Software L10N and I18N • What is L10N and why do we need it? • What is I18N and why do we need it? • What are the principles of I18N? • How are I18N programs designed? • What is POSIX and what is its purpose? • What is the name of the POSIX facility for a specific region? • What are the components of a POSIX NLS package? • What is a locale and what are the classes in each locale?

  12. POSIX provides a set of interface functions. How are their behaviors defined, and where? • What are the major files in each locale? • If POSIX were never developed, could you still develop an I18N program on top of an operating system? • What is a symbolic name and where is it used? • How do we know the binary code of a symbolic name? • Programming using the wide character data type vs. multi-byte characters • What is collation and how does it work?

  13. Open systems • What is an open system? • Why do we want open systems? • What are the measurements of an open system? • What is an open specification? • What are the two types of portability issues? • What mechanisms can be used to improve portability or how can we write portable programs?

  14. Output • What are characters, glyphs and fonts? • What are their relationships and/or differences? • Internal representation vs. external representation • What is the difference between a character box and a bounding box? • Why should there be space between the character box and the bounding box? • What does rendering mean? • What are the two different glyph/font representations?

  15. What are the characteristics of bitmap fonts and outline fonts? • Representations, scaling (distortion), space requirement, compression • How to deal with distortion in the scaling of bitmap fonts? • Ad hoc smoothing algorithms • Smoothing spline and interpolation • Understanding of Bézier cubic curves • Control points and the equations • Why is bitmap-to-outline conversion needed? • How does erosion work?
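
For the "control points and the equations" item, a cubic Bézier curve with control points P0..P3 is B(t) = (1-t)^3 P0 + 3(1-t)^2 t P1 + 3(1-t) t^2 P2 + t^3 P3 for t in [0, 1]. The sketch below evaluates that formula directly for 2D points; the function name `cubic_bezier` and the sample control points are chosen only for illustration.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t (0 <= t <= 1)."""
    u = 1.0 - t
    # Bernstein basis weights; they always sum to 1.
    b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)


control = [(0, 0), (0, 1), (1, 1), (1, 0)]
# Sample the curve at five evenly spaced parameter values.
for i in range(5):
    print(cubic_bezier(*control, i / 4))
```

The curve passes through P0 and P3 but is only pulled toward P1 and P2, which is why an outline font can describe a smooth stroke with a handful of points and scale without the distortion of a bitmap.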

  16. Unicode on different platforms • On what platforms is Unicode supported, and in what forms? • Unix, Windows, Mac, Linux • What is a code page? • Can Unicode be used if the operating system is not coded using Unicode? • Why would the encoding need to be specified when compiling a Java program? • What are the data structures supporting multi-byte characters and Unicode in Java?

  17. I18N vs. multilingual applications • What is the difference between an I18N program and a multilingual application? • Can a multilingual application be designed/implemented using I18N? • What needs to be considered separately in the design of multilingual applications? • What is the relationship between multilingual applications and Unicode?

  18. IDCs and the IDS • What are ideographic description characters (IDCs)? • Different types of IDCs • Why were IDCs introduced? • What is an ideographic description sequence (IDS)? • How is an IDS expressed? • For a given character, is its IDS unique? • For a given IDS, does it uniquely define a character?

  19. Information retrieval • Differences between an IRS and a database system • Basic components of an IRS • What is the purpose of the VSM? What data are associated with a VSM? • What are the similarity functions for? • What is term selection for, and what methods are used for it? • What kinds of information can be used as weights for the VSM?
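
One common similarity function for the vector space model is the cosine of the angle between two term vectors. The sketch below is a minimal version that assumes raw term frequency as the weight and whitespace tokenisation; both are simplifying assumptions (whitespace splitting does not segment Chinese text, for instance), and the function names are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Build a sparse term-frequency vector from whitespace-separated terms."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


query = tf_vector("unicode font")
doc = tf_vector("unicode code")
print(cosine(query, doc))
```

Other weights, such as inverse document frequency, can be substituted for the raw counts without changing the `cosine` function.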
