
Finding Entries in an On-line Arabic Dictionary



27 May 2010

27th Annual HCIL Symposium

Sarah C. Wayland, C. Anton Rytting, David Zajic, Timothy Buckwalter, Jason White, Corey Miller, Jeffrey Carnes, Nathanael Lynn, Paul Rodrigues, Michael Maxwell, Evelyn Browne


Arabic is not English

  • Different sounds (e.g., voiceless uvular /q/, retroflex /l/, voiced velar fricative /gh/, glottal stop / ‘ /)

  • Different letters (‏مباريات)

  • Different morphology (templatic vs. affixative)

  • Written form doesn’t reflect spoken dialect

  • Keyboard has different layout/letters
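
To make the "templatic" point in the list above concrete, here is a minimal sketch of root-and-pattern word formation. The function name `interdigitate`, the `C`-slot notation, and the rough Latin transliteration are assumptions chosen for illustration; they are not taken from the presentation.

```python
def interdigitate(root: str, pattern: str) -> str:
    """Fill each 'C' slot of a pattern with the next root consonant, in order."""
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in pattern)


# The consonantal root k-t-b is associated with writing.
root = "ktb"

# Hypothetical slot notation: 'C' marks a root-consonant slot, and every
# other character is copied from the pattern unchanged.
for pattern in ("CaCaCa", "CiCaaC", "maCCaC", "CaaCiC"):
    print(pattern, "->", interdigitate(root, pattern))
```

The four patterns give kataba ("he wrote"), kitaab ("book"), maktab ("office"), and kaatib ("writer"). The root consonants are spread through the word rather than sitting in one contiguous stem, so stripping prefixes and suffixes alone does not recover a dictionary headword.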


Many informal texts diverge from Modern Standard Arabic

Texts differ from classroom Arabic in orthography, morphology, and lexical content.


Orthographic differences are based on dialect pronunciations, typographical errors, and ... “style.”



Morphologically Complex

* (the only forms listed in the dictionary)



The Arabic keyboard makes difficult-to-detect typos likely

Adjacent letters are often visually similar


Adjacent letters also often sound similar:

  • with contrasts not found in English

  • with contrasts subject to place assimilation

  • particularly so in some dialect pronunciations


Putting “Did You Mean…?” (DYM) together

[Figure: example query letter ح (transliterated H)]

  • A query is checked by composing a single-string finite state automaton (FSA) with:

    • weighted keyboard, visual, and sound-based FSTs

    • a dictionary FSA (with weights for dialect variants)

  • The n-best paths yielding unique strings are calculated

  • The corresponding strings are displayed to the user
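
The steps above can be approximated in a few lines of Python. This is a rough sketch, not the authors' implementation: it replaces true FSA/FST composition with brute-force enumeration of single-letter substitutions, which gives the same ranking only for substitution-only errors on short queries. The confusion tables, their costs, the toy lexicon, the dialect-variant weight, and the Latin transliteration symbols are all hypothetical values chosen for illustration.

```python
import heapq
from itertools import product

# Hypothetical weighted confusion tables: letter -> {alternative: cost}.
# Lower cost means a more plausible confusion. All values are illustrative.
CHANNELS = {
    "keyboard": {"H": {"x": 1.0, "j": 1.0}},
    "visual": {"H": {"x": 0.4, "j": 0.6}},
    "sound": {"H": {"h": 0.7}},
}

# Toy lexicon: headword -> weight (a nonzero weight stands in for a
# dialect-variant penalty; the values are assumptions).
LEXICON = {"Hrb": 0.0, "xrb": 0.0, "jrb": 0.0, "hrb": 0.3}


def alternatives(ch):
    """Return {candidate letter: cheapest cost} across all channels."""
    alts = {ch: 0.0}  # keeping the typed letter is free
    for table in CHANNELS.values():
        for alt, cost in table.get(ch, {}).items():
            alts[alt] = min(cost, alts.get(alt, float("inf")))
    return alts


def did_you_mean(query, n=3):
    """Return the n cheapest distinct lexicon entries reachable from query."""
    best = {}
    per_position = [alternatives(ch).items() for ch in query]
    for combo in product(*per_position):
        word = "".join(letter for letter, _ in combo)
        if word in LEXICON:  # stands in for composing with the dictionary FSA
            cost = sum(c for _, c in combo) + LEXICON[word]
            best[word] = min(cost, best.get(word, float("inf")))
    return heapq.nsmallest(n, best.items(), key=lambda item: item[1])


for word, cost in did_you_mean("Hrb"):
    print(word, cost)
```

Enumeration grows exponentially with query length, which is one reason a weighted finite-state composition with an n-best shortest-path search is the more practical design for a real dictionary.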

[Figure labels: visual, keyboard, and sound-based FSTs; example candidate strings: HARB, ?ARB, OARB, ....; interface buttons: “Show non-verbs” and “Show verbs”]



Arabic is not English!

  • One user interface for all languages will not work

  • We must customize the user interface to take into account the unique structure of each language


