Developing affordable technologies for resource-poor languages

Presentation Transcript

  1. Developing affordable technologies for resource-poor languages
     Ariadna Font Llitjós
     Language Technologies Institute, Carnegie Mellon University
     September 22, 2004

  2. (World map: each dot = a language) AMTA 2002

  3. Motivation
     Resource-poor scenarios:
     • Indigenous communities have difficulty accessing crucial information that directly affects their lives (such as land laws, health warnings, etc.)
     • Formalize a potentially endangered language
     Affordable technologies, such as:
     • spell-checkers,
     • on-line dictionaries,
     • Machine Translation (MT) systems,
     • computer-assisted tutoring

  4. AVENUE Partners

  5. Mapudungun for the Mapuche
     Chile
     Official language: Spanish
     Population: ~15 million, including ~1/2 million Mapuche people
     Language: Mapudungun

  6. What’s Machine Translation (MT)?
     (Diagram: a Japanese sentence translated into a Swahili sentence)

  7. Speech-to-Speech MT

  8. Why Machine Translation for resource-poor (indigenous) languages?
     • Commercial MT is economically feasible for only a handful of major languages with large resources (corpora, human developers)
     • Benefits include:
       • Better government access to indigenous communities (epidemics, crop failures, etc.)
       • Better participation by indigenous communities in information-rich activities (health care, education, government) without giving up their languages
       • Language preservation
       • Civilian and military applications (disaster relief)

  9. MT for resource-poor languages: Challenges
     • Minimal amount of parallel text (oral tradition)
     • Possibly competing standards for orthography/spelling
     • Often relatively few trained linguists
     • Access to native informants possible
     • Need to minimize development time and cost

  10. Machine Translation Pyramid
      (Diagram: interlingua at the apex, transfer rules in the middle, corpus-based methods at the base; analysis and interpretation on the source side, generation on the target side. Example: "I saw you" → "Yo vi tú")

  11. AVENUE MT system overview

      Transfer rule {VP,3}:
      VP::VP : [VP NP] -> [VP NP]
      (
        (X1::Y1)
        (X2::Y2)
        ((x2 case) = acc)
        ((x0 obj) = x2)
        ((x0 agr) = (x1 agr))
        (y2 == (y0 obj))
        ((y0 tense) = (x0 tense))
        ((y0 agr) = (y1 agr))
      )

      Lexical rule:
      V::V |: [stayed] -> [quedó]
      (
        (X1::Y1)
        ((x0 form) = stay)
        ((x0 actform) = stayed)
        ((x0 tense) = past-pp)
        ((y0 agr pers) = 3)
        ((y0 agr num) = sg)
      )

      Example sentence:
      \spa Una mujer se quedó en casa
      \map Kiñe domo mlewey ruka mew
      \eng One woman stayed at home.

  12. Interactive and Automatic Refinement of Translation Rules
      Or: how to recycle corrections of MT output back into the MT system by adjusting and adapting the grammar and lexical rules

  13. Error correction by non-expert bilingual users

  14. Interactive elicitation of MT errors
      Assumptions:
      • non-expert bilingual users can reliably detect and minimally correct MT errors, given:
        • the SL sentence (I saw you)
        • the TL sentence (Yo vi tú)
        • word-to-word alignments (I-yo, saw-vi, you-tú)
        • (context)
      • using an online GUI: the Translation Correction Tool (TCTool)
      Goal:
      • make the MT correction task as simple as possible
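The three pieces of information given to the bilingual user (SL sentence, TL sentence, word-to-word alignments) can be modeled as one small record; a minimal sketch in Python, where the class and field names are illustrative rather than the TCTool's actual schema:

```python
from dataclasses import dataclass

@dataclass
class CorrectionInstance:
    """One MT output presented to a bilingual user for correction."""
    sl: list[str]                      # source-language sentence, tokenized
    tl: list[str]                      # machine-translated sentence, tokenized
    alignments: list[tuple[int, int]]  # word-to-word (SL index, TL index) pairs

# The slide's running example: "I saw you" -> "Yo vi tú"
inst = CorrectionInstance(
    sl=["I", "saw", "you"],
    tl=["Yo", "vi", "tú"],
    alignments=[(0, 0), (1, 1), (2, 2)],
)

# Look up which TL word each SL word is aligned to.
tl_of = {s: t for s, t in inst.alignments}
print(inst.tl[tl_of[1]])  # prints "vi", the TL word aligned to "saw"
```

Keeping the alignments as explicit index pairs (rather than a dict) allows one SL word to align to several TL words, which crossing or one-to-many translations require.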

  15. Translation Correction Tool (TCTool). Actions:

  16. SL + best TL picked by user

  17. Changing “grande” into “gran”

  18. (Screenshot only)

  19. (Screenshot only)

  20. Automatic Rule Refinement Framework
      • Find the best RR operations given:
        • a grammar (G),
        • a lexicon (L),
        • a (set of) source-language sentence(s) (SL),
        • a (set of) target-language sentence(s) (TL),
        • its parse tree (P), and
        • a minimal correction of TL (TL'),
        such that TQ2 > TQ1
      • Which can also be expressed as:
        max TQ(TL | TL', P, SL, RR(G, L))
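The maximization on this slide can be read as a search over candidate refinement operations, keeping an operation only when it raises translation quality (the TQ2 > TQ1 condition). A minimal greedy sketch, where the toy `tq_demo` scorer and the set-of-rules grammar are illustrative stand-ins for the real framework:

```python
def refine(grammar, lexicon, sl, tl_corrected, candidate_ops, tq):
    """Greedily apply refinement operations while they improve translation quality.

    candidate_ops: functions mapping (grammar, lexicon) -> (grammar', lexicon')
    tq: scores the system's output for `sl` against the correction `tl_corrected`
    """
    best = tq(grammar, lexicon, sl, tl_corrected)
    improved = True
    while improved:
        improved = False
        for op in candidate_ops:
            g2, l2 = op(grammar, lexicon)
            score = tq(g2, l2, sl, tl_corrected)
            if score > best:              # keep only if TQ2 > TQ1
                grammar, lexicon, best = g2, l2, score
                improved = True
    return grammar, lexicon

# Toy demo: the grammar is a set of rule names, and TQ simply counts how many
# of the "needed" rules it contains (purely illustrative, not a real metric).
target = {"agr", "case", "tense"}

def tq_demo(g, l, sl, tl):
    return len(g & target)

ops = [lambda g, l, f=f: (g | {f}, l) for f in ("agr", "case", "tense", "junk")]

g, l = refine(set(), {}, ["I", "saw", "you"], ["Yo", "te", "vi"], ops, tq_demo)
print(sorted(g))  # prints ['agr', 'case', 'tense']; "junk" never improved TQ
```

A greedy loop is only one possible search strategy; the point is that each refinement is accepted or rejected by comparing translation quality before and after, exactly as the slide's inequality states.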

  21. Types of RR operations
      • Grammar:
        • R0 → R0 + R1 [= R0' + constr]    Cov[R0] → Cov[R0, R1]
        • R0 → R1 [= R0 + constr]          Cov[R0] → Cov[R1]
        • R0 → R1 [= R0 + constr = −] + R2 [= R0' + constr = c+]    Cov[R0] → Cov[R1, R2]
      • Lexicon:
        • Lex0 → Lex0 + Lex1 [= Lex0 + constr]
        • Lex0 → Lex1 [= Lex0 + constr]
        • Lex0 → Lex1 [= Lex0 + Δ TLword]
        • ∅ → Lex1 (adding a lexical item)
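The second grammar operation (specializing R0 into R1 by adding a constraint) can be sketched on a toy rule representation; the dict layout and constraint tuples below are an assumed encoding for illustration, not the AVENUE rule format:

```python
import copy

def add_constraint(rule, constraint):
    """R0 -> R1 [= R0 + constr]: a copy of the rule with one more constraint."""
    r1 = copy.deepcopy(rule)
    r1["constraints"].append(constraint)
    return r1

# Hypothetical noun-phrase rule with a single existing constraint.
r0 = {
    "name": "NP,2",
    "lhs": ["NP"],
    "rhs": ["DET", "N", "ADJ"],
    "constraints": [("y1 def", "=", "+")],
}

# Force noun-adjective gender agreement: the kind of fix suggested by a
# correction like *"el auto roja" -> "el auto rojo".
r1 = add_constraint(r0, ("y3 gen", "=", "(y2 gen)"))

print(len(r0["constraints"]), len(r1["constraints"]))  # prints "1 2"
```

Copying rather than mutating R0 matters for the first operation in the list, where both the general rule R0 and its specialization R1 are kept in the grammar side by side.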

  22. Questions & Discussion. Thanks!

  23. Formalizing Error Information
      Wi  = error word
      Wi' = correction
      Wc  = clue word
      Example:
        SL:  the red car
        TL:  *el auto roja
        TL': el auto rojo
        Wi = roja, Wi' = rojo, Wc = auto

  24. Finding Triggering Features
      Once we have the user's correction (Wi'), we can compare it with Wi at the feature level and find the triggering feature.
      If the Δ set is empty, we need to postulate a new binary feature.
      Delta function:
        Δ(Wi, Wi') = { f : Wi.f ≠ Wi'.f }
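Comparing Wi and Wi' at the feature level amounts to a set difference over their feature structures; a minimal sketch, where the flat feature dicts and the roja/rojo gender values are illustrative assumptions:

```python
def delta(fi: dict, fi_corr: dict) -> set:
    """Features whose values differ between the error word and its correction.

    An empty result means no existing feature explains the correction,
    so a new binary feature must be postulated.
    """
    keys = fi.keys() | fi_corr.keys()
    return {f for f in keys if fi.get(f) != fi_corr.get(f)}

# roja vs. rojo: only gender differs, so gender is the triggering feature.
print(delta({"pos": "adj", "gen": "fem", "num": "sg"},
            {"pos": "adj", "gen": "masc", "num": "sg"}))  # prints {'gen'}

# Identical feature structures: empty delta -> postulate a new binary feature.
print(delta({"pos": "n"}, {"pos": "n"}))  # prints set()
```

Taking the union of both key sets means a feature present on only one of the two words also shows up in the delta, rather than being silently ignored.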