External Tools Not Only for ArabTeX Documents



  1. External Tools Not Only for ArabTeX Documents
     Karel Mokrý, Otakar Smrž
     Faculty of Mathematics and Physics, Charles University in Prague
     Processing of Arabic at FLM

  2. … which include
     • ArabCode: nontrivial conversion of encoding standards of Arabic script
     • ArabSpell: rule-driven spelling system suited especially for vocalized Arabic encoded in ArabTeX notation
     • acolor.sty: package for control over coloring in ArabTeX and LaTeX typesetting systems

  3. ArabTeX encoding concept
     • Lower ASCII, human-readable, rather phonetic
     • Algorithmic determination of several phenomena of Arabic script
     • Evaluation of context, parametric interpretation
     • Contemporary and historical orthography
     <iqra’ h_a_dAan-na.s.sa bi-intibAhiN>
     versus
     Aiqora>o h`*aA{ln~aS~a bi{notibaAhK

  4. Ordinary graphemic approach
     • Unicode / Unicode Transformation Format (UTF) with great descriptive scope
         Ux0639 / 0xD8 0xB9 (Arabic `ayn)   0000 0110 0011 1001 / 1101 1000 1011 1001
         Ux004C / 0x4C (Latin L)            0000 0000 0100 1100 / 0100 1100
     • Windows CP 1256, ISO 8859-6, ASMO 449 etc.
     • Buckwalter Transliteration using lower ASCII
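The code point and UTF-8 byte values on slide 4 can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper name `describe` is hypothetical, and the legacy code pages are reached through Python's bundled `cp1256` and `iso8859_6` codecs (ASMO 449 is left out because the standard library has no codec for it).

```python
def describe(char):
    """Return the code point and UTF-8 bytes of one character, in hex and binary."""
    code_point = ord(char)
    utf8 = char.encode("utf-8")
    return {
        "code_point": f"U+{code_point:04X}",
        "code_point_bits": f"{code_point:016b}",
        "utf8_hex": " ".join(f"0x{b:02X}" for b in utf8),
        "utf8_bits": " ".join(f"{b:08b}" for b in utf8),
    }

# The two characters used on the slide: Arabic `ayn and Latin L.
for ch in ("\u0639", "L"):
    print(describe(ch))

# Single-byte legacy encodings of `ayn, using codecs shipped with Python.
for codec in ("cp1256", "iso8859_6"):
    print(codec, "\u0639".encode(codec).hex())
```

The two-byte UTF-8 form of `ayn against the single byte for Latin L shows why UTF and the single-byte code pages are not interchangeable without an explicit conversion step.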

  5. ArabCode solution
     • Set of subroutines and scripts in Perl
     • Complex ArabTeX  UTF / Unicode
     • Documented Unicode  UTF
     • Quite easy UTF / Unicode  Windows  ISO  ASMO  Buckwalter  etc.
     • Currently ArabTeX  Windows and Windows  UTF  ISO  ASMO  Buckwalter

  6. ArabCode method
     • Considering problem ArabTeX  UTF / Unicode
     • Present:
       • Regular expressions: system tool, fast and safe
       • Rules hard-wired in the code: hard to maintain, inflexible …
     • Future:
       • Finite-state transducer: most adequate, though use of our own implementation may slow computation down
       • External grammar: clear and extensible rules
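The regular-expression approach from slide 6 can be sketched as follows. This is a toy illustration in Python rather than the Perl of ArabCode, and it is not the tool's actual rule set: the table `CONSONANTS` and the function `convert_consonants` are hypothetical names, the table covers only a handful of consonant codes, and vowels, hamza carriers and every context-dependent rule are ignored.

```python
import re

# Toy excerpt of a mapping from ArabTeX consonant codes to Unicode characters.
# Assumption: an illustrative subset only, not ArabCode's real conversion table.
CONSONANTS = {
    "b": "\u0628", "t": "\u062A", "_t": "\u062B", "^g": "\u062C",
    ".h": "\u062D", "_h": "\u062E", "d": "\u062F", "_d": "\u0630",
    "r": "\u0631", "k": "\u0643", "l": "\u0644", "m": "\u0645",
}

# Longer codes come first so that "_t" is not read as "_" followed by "t".
PATTERN = re.compile(
    "|".join(re.escape(code) for code in sorted(CONSONANTS, key=len, reverse=True))
)

def convert_consonants(text):
    """Replace every known consonant code; anything else passes through unchanged."""
    return PATTERN.sub(lambda match: CONSONANTS[match.group(0)], text)

print(convert_consonants("ktb"))
```

Keeping the mapping in a data table instead of in the substitution code is a small step toward the external grammar that the slide lists under future work.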

  7. ArabSpell motivation
     • Spell-checking of entries of human-edited lexical database
     • Supervision over misuse of notation, document consistency requirement
     • Trial and error way of teaching it
     • One version already applied to educational purpose documents and a book of Arabic proverbs

  8. ArabSpell novel concept
     • Separation of the definition of the language and the response from the spell-checking engine
     • Right Linear Grammar and convenient syntax
         source :<code>:<text>target <text>
     • Nondeterministic Finite Automaton and its construction from the grammar
         [Diagram: automaton from source to target; transition labels t, e, x, t and “”, annotated with :<code>:]
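The construction named on slide 8 can be sketched in Python under stated assumptions. Each rule is taken to be a triple (source, text, target), where a target of `None` stands for a rule with no following nonterminal. The names `build_nfa`, `accepts` and `FINAL` are hypothetical, and the `:<code>:` part of a rule is left out entirely, so this shows only the textbook right-linear-grammar-to-NFA step, not ArabSpell's engine.

```python
from collections import defaultdict

FINAL = "<final>"  # hypothetical name for the single accepting state

def build_nfa(rules):
    """Build transitions from rules (source, text, target); target None ends a derivation."""
    delta = defaultdict(set)  # (state, symbol) -> next states; symbol "" is an empty move
    fresh = 0
    for source, text, target in rules:
        goal = FINAL if target is None else target
        if not text:
            delta[(source, "")].add(goal)
            continue
        state = source
        for index, char in enumerate(text):
            if index == len(text) - 1:
                nxt = goal
            else:
                fresh += 1
                nxt = ("q", fresh)  # intermediate state inside a multi-character text
            delta[(state, char)].add(nxt)
            state = nxt
    return delta

def closure(delta, states):
    """Extend a set of states by everything reachable through empty moves."""
    seen, stack = set(states), list(states)
    while stack:
        for nxt in delta.get((stack.pop(), ""), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def accepts(delta, start, word):
    """Simulate the automaton by tracking the set of currently active states."""
    current = closure(delta, {start})
    for char in word:
        moved = set()
        for state in current:
            moved |= delta.get((state, char), set())
        current = closure(delta, moved)
    return FINAL in current

# Toy grammar: S -> "ab" S | "c"
nfa = build_nfa([("S", "ab", "S"), ("S", "c", None)])
print(accepts(nfa, "S", "ababc"), accepts(nfa, "S", "ab"))
```

Tracking a set of active states keeps the simulation linear in the input length without first determinizing the automaton.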

  9. Grammar of Arabic syllable
     • Nonterminal generative rules
         syllable :< "Unruly input!" >: [C][V][C+empty]syllable
                                        [C][V][C+empty]
                                        [C][ending]
     • Cluster definition rules …
         [C] :<>: <'><b><t><_t><^g><.h><_h><d><_d><r><z><s><^s><.s><.d><.t><.z><`><.g><f><q><k><l><m><n><h><w><y>
         [V] :<>: <a><i><u><A><I><U>:<>:

  10. … continuation
         <_a>:< "Dagger 'alif occurred." >:
         <aa>:< "Use <A> instead!" >:
         <iy>:< "Use <I> instead!" >:
         <uw>:< "Use <U> instead!" >:
         [ending]:< "Invalid ending?" >:<uN> <iN> <aN><aNY><Y>:<>:
                                        <aNA><UA><aW> <aWA>:< "Silent 'alif enforced." >:
         [empty]:<>:<> # see [C+empty] above
      • Multi-functionality of the :<>: operator
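A rough Python approximation of the syllable grammar from slides 9 and 10 is given below. It is a simplification under several assumptions: only the plain vowels and the listed endings are modelled, the flagged vowel forms and all per-item messages are omitted, and a backtracking regular expression stands in for the nondeterministic automaton. The names `group`, `SYLLABLES` and `check` are hypothetical.

```python
import re

# Cluster contents copied from the grammar on the slides.
C = ["'", "b", "t", "_t", "^g", ".h", "_h", "d", "_d", "r", "z", "s", "^s",
     ".s", ".d", ".t", ".z", "`", ".g", "f", "q", "k", "l", "m", "n", "h", "w", "y"]
V = ["a", "i", "u", "A", "I", "U"]
ENDING = ["uN", "iN", "aN", "aNY", "Y", "aNA", "UA", "aW", "aWA"]

def group(items):
    """Turn a cluster into one non-capturing regex alternation, longest codes first."""
    ordered = sorted(items, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(item) for item in ordered) + ")"

c, v, e = group(C), group(V), group(ENDING)

# syllable -> [C][V][C+empty] syllable | [C][V][C+empty] | [C][ending]
SYLLABLES = re.compile(f"(?:{c}{v}{c}?)*(?:{c}{v}{c}?|{c}{e})")

def check(word):
    """Return None for a well-formed word, else the message of the syllable rule."""
    return None if SYLLABLES.fullmatch(word) else "Unruly input!"

for word in ("kataba", "kitAbuN", "ktb"):
    print(word, check(word))
```

Because `fullmatch` backtracks, a consonant can be tried first as the coda of one syllable and then as the onset of the next, which mirrors the nondeterminism of the grammar.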

  11. ArabSpell features
      • Clusters enable eminent network optimization
      • Spelling :< Perl subroutines >: extend the class of languages beyond regular ones
        • Bracket matching, word repetition
        • Control over long-distance dependencies
        • Easy counting, e.g. word and sentence length
        • Reports in different language versions
      • Detailed yet flexible grammar for Arabic, models of other formalizable languages
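Bracket matching is a standard example of a check that no finite automaton can perform alone but that becomes easy once transitions may run code. The sketch below illustrates that idea in Python rather than Perl; the function `check_brackets` and its report wording are hypothetical and are not taken from ArabSpell.

```python
def check_brackets(text):
    """Scan text once, running a small action on a shared counter for each bracket."""
    depth = 0      # state kept outside the automaton, updated by the actions
    reports = []
    for position, char in enumerate(text):
        if char == "(":
            depth += 1
        elif char == ")":
            if depth == 0:
                reports.append(f"Unmatched ')' at position {position}")
            else:
                depth -= 1
    if depth:
        reports.append(f"{depth} unclosed '('")
    return reports

print(check_brackets("(a (b) c)"))
print(check_brackets("a)"))
```

The counter is the only extra memory needed, and the same pattern extends to the counting of word and sentence lengths mentioned on the slide.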

  12. Using acolor.sty
      • Typesetting Arabic script in color with ArabTeX
      • Text marking, hide-and-check of diacritics
      • Primers, textbooks, educational purposes
      • Coloring commands combined with original ArabTeX vocalization control
      • No modification of the input data themselves

  13. … for any diacritics
         \coldia{red}\fullvocalize\accentshigh
         \nocolshadda\colother{blue}\vocalize
         \nocolall\colhamza{green}\vocalize

  14. … for other marking
         \nocolall\colbeginning{blue}\novocalize
         \nocolall\colshadda{white}\novocalize
         \colisolated{red}\vocalize\accentslow

  15. Acknowledgement
      • Arabic script displays in this presentation were typeset using the ArabTeX package for TeX and LaTeX by Prof. Dr. Klaus Lagally of the University of Stuttgart. The existence of this system was the principal inspiration for our work.
