machine translation the Wiki way

Presentation Transcript


  1. Машинен превод Strojový překlad Maskinoversættelse Maschinelle Übersetzung Maŝintradukado Traducción automática Itzulpengintza automatiko ترجمه ماشینی Konekäännin Traduction automatique תרגום מכונה Strojno prevođenje Gépi fordítás 機械翻訳 기계 번역 Terjemahan mesin Computervertaling Maskinoversettelse Tłumaczenie maszynowe Tradução automática Traducere automată Машинный перевод Maskinöversättning การแปลภาษาอัตโนมัติ 机器翻译
     machine translation the Wiki way
     Bittlingmayer Adam Mathias
     27 February 2007
     University of Washington
     LING 575 – Machine Translation

  2. machine translation the Wiki way
     - introduction to Wikipedia
     - technical details and editing
     - low-density languages
     - parallelness of corpora
     - named entities
     - other entities
     - disambiguation
     - categorization
     - problems
     - papers

  3. introduction to Wikipedia en.wikipedia.org

  4. introduction to Wikipedia en.wikipedia.org Wikipedia (IPA: /ˌwiːkiːˈpiːdi.ə/ or /ˌwɪːkiːˈpiːdi.ə/) is a multilingual, Web-based, free content encyclopedia project. Wikipedia is written collaboratively by volunteers; its articles can be edited by anyone with access to the Web site.

  5. introduction to Wikipedia en.wikipedia.org
     - the Wiki family
     - lots of languages, unevenly distributed
     - lots of topics, unevenly distributed
     - growing fast
     - respectability

  6. technical details and editing technical details structure layout content rules tags and templates redirect and disambiguation markup

  7. technical details and editing: editing
     - anyone
     - locking and blocking
     - disputes
     - version control

  8. technical details and editing Fei_Xia example

  9. low-density languages
     - predictably lacking
     - X-English / English-X usually good
     - using related languages

  10. parallelness of corpora
      - degrees
      - determinants of parallelness
      - mapping

  11. named entities
      - article titles
      - abbreviations and acronyms
      - place names
      - company names
      - personal names

  12. other entities
      - events
      - dates
      - titles
      - technical terms

  13. disambiguation

  14. categorization

  15. problems
      - incompleteness
      - inconsistency
      - foreign words
      - moving target

  16. papers
      - monolingual
        - semantics
        - errors and reliability
        - WordNet
        - using Wikipedia’s structure
      - multilingual
        - named entities
        - parallel sentence generation

  17. papers: parallel sentence generation
      1. compare with Babelfished version
         - create aligned sentences with Babelfish
         - pair off with best scoring sentence from the Wiki article
      2. bootstrap from article titles
         - create aligned sentences by replacing linked words with equivalent
         - translate the rest by throwing shrinking N-grams into Wiki search
         - pair off with best scoring sentence from the Wiki article

  18. conclusions
      - seed or bootstrap with traditional methods
      - fill holes with Wikipedia
      - hybrid systems
      - lots of research to be done

  19. questions
      - general
      - Chinese
      - company names
      - cn/hk/tw issues
      - abbreviations/acronyms
      - many languages with one writing system
      - using links to find word divisions
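
The second method on slide 17 (bootstrap from article titles, translate the rest with shrinking N-grams, then pair off with the best-scoring sentence) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the presenters' implementation. The `titles` dictionary is a hypothetical stand-in for both linked-word equivalents and Wiki search hits, `overlap_score` is an assumed Jaccard token overlap (the slides do not say how sentences are scored), and every function name and the `threshold` and `max_n` parameters are made up for illustration.

```python
from typing import Dict, List, Tuple


def tokens(sentence: str) -> List[str]:
    """Lowercase whitespace tokenization (a deliberately crude assumption)."""
    return sentence.lower().split()


def overlap_score(a: str, b: str) -> float:
    """Jaccard overlap of token sets; an assumed score, not taken from the slides."""
    ta, tb = set(tokens(a)), set(tokens(b))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def translate_by_shrinking_ngrams(
    sentence: str, titles: Dict[str, str], max_n: int = 4
) -> str:
    """At each position try the longest N-gram first and shrink on a miss.

    `titles` maps lowercased source-language phrases to target-language
    equivalents; here it stands in for a Wiki search (hypothetical).
    """
    words = tokens(sentence)
    out: List[str] = []
    i = 0
    while i < len(words):
        for n in range(min(max_n, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in titles:
                out.append(titles[phrase])
                i += n
                break
        else:
            # No hit at any length: keep the source word untranslated.
            out.append(words[i])
            i += 1
    return " ".join(out)


def pair_off(
    rough: List[str], article: List[str], threshold: float = 0.3
) -> List[Tuple[int, int, float]]:
    """Pair each rough translation with its best-scoring article sentence.

    Returns (rough_index, article_index, score) for pairs at or above
    the assumed threshold.
    """
    pairs: List[Tuple[int, int, float]] = []
    if not article:
        return pairs
    for i, r in enumerate(rough):
        j, score = max(
            ((j, overlap_score(r, t)) for j, t in enumerate(article)),
            key=lambda item: item[1],
        )
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


# Toy German-to-English example with made-up data.
titles = {
    "maschinelle übersetzung": "machine translation",
    "teilgebiet": "subfield",
    "computerlinguistik": "computational linguistics",
}
source = "Maschinelle Übersetzung ist ein Teilgebiet der Computerlinguistik"
article = [
    "Machine translation is a subfield of computational linguistics",
    "Wikipedia is a free encyclopedia",
]
rough = translate_by_shrinking_ngrams(source, titles)
pairs = pair_off([rough], article)
```

The first method on the slide differs only in where the rough translation comes from: the output of an MT system (Babelfish in the slides) would replace `translate_by_shrinking_ngrams`, and `pair_off` would be used unchanged.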
