
Avenue Architecture


Presentation Transcript


  1. Avenue Architecture
     [Architecture diagram. Stages: Elicitation, Morphology, Rule Learning, Run-Time System, Rule Refinement. Components: Elicitation Tool, Elicitation Corpus, Word-Aligned Parallel Corpus, Morphology Analyzer, Learning Module, Handcrafted rules, Learned Transfer Rules, Lexical Resources, Run Time Transfer System, Decoder, Translation Correction Tool, Rule Refinement Module. Flow: INPUT TEXT to OUTPUT TEXT.]

  2. Interactive and Automatic Refinement of Translation Rules
     • Problem: Improve Machine Translation quality.
     • Proposed Solution: Put bilingual speakers back into the loop; use their corrections to detect the source of the error and automatically improve the lexicon and the grammar.
     • Approach: Automate post-editing efforts by feeding them back into the MT system.
     • Go beyond post-editing: automatically refine the translation rules that caused an error.
     • Goal: Improve MT coverage and overall quality.

  3. Technical Challenges
     • Automatically refine and expand translation rules (manually written or automatically learned) minimally
     • Automatic evaluation of the refinement process
     • Elicit minimal MT information from non-expert users

  4. Error Typology for Automatic Rule Refinement (simplified)
     Interactive elicitation of error information
     • Missing word
     • Extra word
     • Wrong word order: local vs. long distance; word vs. phrase; + word change
     • Incorrect word: sense; form; selectional restrictions; idiom
     • Wrong agreement: missing constraint; extra constraint

  5. TCTool (Demo)
     Interactive elicitation of error information
     Actions:
     • Add a word
     • Delete a word
     • Modify a word
     • Change word order
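The four correction actions can be pictured as edits on a token list. The following is a minimal sketch only; the `Correction` record, its field names, and `apply_log` are hypothetical and do not reflect the actual TCTool log format. It replays the two corrections from the Gaudí example used later in the talk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Correction:
    """One user action (hypothetical record, not the real TCTool log format)."""
    action: str                         # "add", "delete", "modify" or "reorder"
    position: int                       # index of the affected word
    word: Optional[str] = None          # new word for "add" / "modify"
    new_position: Optional[int] = None  # target index for "reorder"


def apply_correction(tokens: List[str], c: Correction) -> List[str]:
    """Return a new token list with a single correction applied."""
    out = list(tokens)
    if c.action == "add":
        out.insert(c.position, c.word)
    elif c.action == "delete":
        del out[c.position]
    elif c.action == "modify":
        out[c.position] = c.word
    elif c.action == "reorder":
        out.insert(c.new_position, out.pop(c.position))
    else:
        raise ValueError(f"unknown action: {c.action}")
    return out


def apply_log(tokens: List[str], log: List[Correction]) -> List[str]:
    """Apply corrections in the order the user made them."""
    for c in log:
        tokens = apply_correction(tokens, c)
    return tokens


mt_output = "Gaudí era un artista grande".split()
log = [
    Correction("modify", 4, word="gran"),      # grande -> gran
    Correction("reorder", 4, new_position=3),  # move the adjective before the noun
]
corrected = apply_log(mt_output, log)
print(" ".join(corrected))
```

Because each action is applied to the result of the previous one, positions in later corrections refer to the already-edited sentence, which is one reason the order of corrections matters.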

  6. Automatic Rule Adaptation: Types of Refinement Operations
     1. Refine a translation rule: R0 → R1 (change R0 to make it more specific or more general)
        R0: NP [DET ADJ N] -> NP [DET N ADJ]
            a nice house → *una casa bonito
        R1: R0 plus the constraint N gender = ADJ gender
            a nice house → una casa bonita
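To illustrate the Refine operation, here is a minimal sketch in which a rule is just a name plus a tuple of constraint strings. The toy lexicon, the `NPRule` class, and `translate_np` are assumptions made for illustration; they are not the AVENUE transfer formalism.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Toy English-to-Spanish lexicon (illustrative entries only).
DETS = {"a": {"masc": "un", "fem": "una"}}
NOUNS = {"house": ("casa", "fem")}
ADJECTIVES = {"nice": [("bonito", "masc"), ("bonita", "fem")]}

AGREEMENT = "N gender = ADJ gender"


@dataclass(frozen=True)
class NPRule:
    name: str
    constraints: Tuple[str, ...] = ()


def refine(rule: NPRule, new_name: str, constraint: str) -> NPRule:
    """Refine: replace the rule with a more specific version."""
    return NPRule(new_name, rule.constraints + (constraint,))


def translate_np(rule: NPRule, det: str, adj: str, noun: str) -> List[str]:
    """Generate every [DET N ADJ] candidate the rule licenses."""
    n_form, n_gender = NOUNS[noun]
    d_form = DETS[det][n_gender]
    candidates = []
    for a_form, a_gender in ADJECTIVES[adj]:
        if AGREEMENT in rule.constraints and a_gender != n_gender:
            continue  # blocked by the agreement constraint
        candidates.append(f"{d_form} {n_form} {a_form}")
    return candidates


r0 = NPRule("R0")
r1 = refine(r0, "R1", AGREEMENT)
print(translate_np(r0, "a", "nice", "house"))
print(translate_np(r1, "a", "nice", "house"))
```

Under these assumptions R0 overgenerates (it also licenses the ungrammatical "una casa bonito"), while R1 only produces the agreeing form.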

  7. Automatic Rule Adaptation: Types of Refinement Operations
     2. Bifurcate a translation rule: R0 → R0 (same, general rule) + R1 (add a new, more specific rule)
        R0: NP [DET ADJ N] -> NP [DET N ADJ]
            a nice house → una casa bonita
        R1: NP [DET ADJ N] -> NP [DET ADJ N], with ADJ type: pre-nominal
            a great artist → un gran artista

  8. Automatic Rule Adaptation: A concrete example
     Error Information Elicitation
       SL: Gaudí was a great artist
       MT system output (TL): *Gaudí era un artista grande
       User correction: Gaudí era un gran artista
       Error: Change word order
       Annotated in the example: correction, clue word
     Refinement Operation Typology

  9. Automatic Rule Adaptation
     Lexical entries:
       ADJ::ADJ |: [great] -> [grande]
       ((X1::Y1)
        ((x0 form) = great)
        ((y0 agr num) = sg)
        ((y0 agr gen) = masc))

       ADJ::ADJ |: [great] -> [gran]
       ((X1::Y1)
        ((x0 form) = great)
        ((y0 agr num) = sg)
        ((y0 agr gen) = masc))
     Finding Triggering Feature(s): comparing (error word, corrected word) yields no distinguishing feature value (both entries above carry the same features), so we need to postulate a new binary feature: feat1
     Blame assignment (from MT system output):
       tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>
     Grammar: S,1 … NP,1 … NP,8 …
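The triggering-feature step can be sketched as a comparison of the two lexical entries' feature sets. This is an illustrative reading of the slide, not the module's actual code: `feature_delta` and `postulate_feature` are hypothetical names, and features are flattened into a plain dictionary.

```python
from typing import Dict, Set, Tuple

Features = Dict[str, str]


def feature_delta(error: Features, corrected: Features) -> Set[str]:
    """Names of the features whose values differ between the two words."""
    keys = set(error) | set(corrected)
    return {k for k in keys if error.get(k) != corrected.get(k)}


def postulate_feature(
    error: Features, corrected: Features, name: str = "feat1"
) -> Tuple[Features, Features]:
    """If no existing feature tells the words apart, add a new binary one."""
    if feature_delta(error, corrected):
        return error, corrected  # an existing feature already distinguishes them
    return {**error, name: "-"}, {**corrected, name: "+"}


# Target-side features of the two entries for "great", as on the slide.
grande = {"agr num": "sg", "agr gen": "masc"}
gran = {"agr num": "sg", "agr gen": "masc"}

print(feature_delta(grande, gran))  # empty: nothing distinguishes the entries
new_grande, new_gran = postulate_feature(grande, gran)
print(new_grande, new_gran)
```

The error word receives the negative value and the corrected word the positive one, matching the refined lexical entries shown later in the talk.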

  10. Automatic Rule Adaptation: Refining Rules
      • Bifurcate NP,8 → NP,8 (R0) + NP,8’ (R1) (flip order of ADJ-N)
      {NP,8’}
      NP::NP : [DET ADJ N] -> [DET ADJ N]
      (
        (X1::Y1) (X2::Y2) (X3::Y3)
        ((x0 def) = (x1 def))
        (x0 = x3)
        ((y1 agr) = (y3 agr)) ; det-noun agreement
        ((y2 agr) = (y3 agr)) ; adj-noun agreement
        (y2 = x3)
        ((y2 feat1) =c +)
      )

  11. Automatic Rule Adaptation: Refining Lexical Entries
      ADJ::ADJ |: [great] -> [grande]
      ((X1::Y1)
       ((x0 form) = great)
       ((y0 agr num) = sg)
       ((y0 agr gen) = masc)
       ((y0 feat1) = -))

      ADJ::ADJ |: [great] -> [gran]
      ((X1::Y1)
       ((x0 form) = great)
       ((y0 agr num) = sg)
       ((y0 agr gen) = masc)
       ((y0 feat1) = +))
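Putting the bifurcated rule and the refined lexical entries together, the effect on generation can be sketched as follows. The classes and the `generate` function are hypothetical simplifications: the `=c` constraint is treated here simply as a check that the adjective carries `feat1` with value `+`, and the general rule R0 is left unchanged, as on the Refining Rules slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LexEntry:
    form: str
    feats: Dict[str, str]


@dataclass
class NPRule:
    name: str
    target_order: Tuple[str, ...]  # constituent order on the Spanish side
    adj_must_have: Dict[str, str] = field(default_factory=dict)


def bifurcate(r0: NPRule, new_order: Tuple[str, ...],
              constraint: Dict[str, str]) -> List[NPRule]:
    """Keep the general rule R0 and add a more specific rule R1."""
    r1 = NPRule(r0.name + "'", new_order, dict(constraint))
    return [r0, r1]


def generate(rules: List[NPRule], det: str,
             adj_entries: List[LexEntry], noun: str) -> List[str]:
    """Enumerate every NP candidate licensed by the rules and lexicon."""
    out = []
    for rule in rules:
        for adj in adj_entries:
            if any(adj.feats.get(k) != v for k, v in rule.adj_must_have.items()):
                continue  # adjective fails the rule's feature constraint
            words = {"DET": det, "ADJ": adj.form, "N": noun}
            out.append(" ".join(words[c] for c in rule.target_order))
    return out


# Refined entries for "great": feat1 separates grande (-) from gran (+).
great = [LexEntry("grande", {"feat1": "-"}), LexEntry("gran", {"feat1": "+"})]

r0 = NPRule("NP,8", ("DET", "N", "ADJ"))
rules = bifurcate(r0, ("DET", "ADJ", "N"), {"feat1": "+"})
lattice = generate(rules, "un", great, "artista")
print(lattice)
```

In this sketch the new rule contributes "un gran artista" and blocks "un grande artista", while the untouched R0 still produces the post-nominal candidates. Whether R0 should also be constrained is a separate design decision not settled here.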

  12. Automatic Rule Adaptation: Evaluating Improvement
      • Given the initial and final Translation Lattices, the Rule Refinement module needs to take into account whether the following are present:
        • Corrected Translation Sentence
        • Original Translation Sentence (labelled as incorrect by the user)
      un artista gran
      un gran artista
      un grande artista
      *un artista grande

  13. Automatic Rule Adaptation: Evaluating Improvement
      • Given the initial and final Translation Lattices, the Rule Refinement module needs to take into account whether the following are present:
        • Corrected Translation Sentence
        • Original Translation Sentence (labelled as incorrect by the user)
      *un artista gran
      un gran artista
      *un grande artista
      *un artista grande
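The presence check described above can be sketched as a comparison of two sets of candidate strings. The function name, the report keys, and the example lattice contents below are assumptions made for illustration; the slides do not specify how the module scores a refinement.

```python
from typing import Dict, Set


def evaluate_refinement(initial: Set[str], final: Set[str],
                        corrected: str, original: str) -> Dict[str, bool]:
    """Report which of the two key sentences each lattice contains."""
    return {
        "corrected_in_initial": corrected in initial,
        "corrected_in_final": corrected in final,
        "original_in_final": original in final,
        # Improvement: the user's correction is now reachable but was not before.
        "improved": corrected in final and corrected not in initial,
    }


# Hypothetical lattice contents before and after a refinement.
initial_lattice = {"un artista grande", "un artista gran"}
final_lattice = {"un artista grande", "un artista gran", "un gran artista"}

report = evaluate_refinement(
    initial_lattice,
    final_lattice,
    corrected="un gran artista",
    original="un artista grande",
)
print(report)
```

A stricter criterion could additionally require that the sentence the user marked as incorrect no longer appears in the final lattice; the `original_in_final` flag exposes that information without committing to either policy.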

  14. Challenges and future work
      • Credit and blame assignment from TCTool log files and the Xfer engine’s trace
      • Order of corrections matters: explore rule interactions
      • Explore the space between batch mode and a fully interactive system
      • Online TCTool always running to collect corrections from bilingual speakers → make it into a game with rewards for the best users

  15. Publications
      • Font Llitjós, A., J.G. Carbonell and A. Lavie. "A Framework for Interactive and Automatic Refinement of Transfer-based Machine Translation". EAMT 10th Annual Conference, 30-31 May 2005, Budapest, Hungary.
      • Font Llitjós, A., R. Aranovich and L. Levin. "Building Machine translation systems for indigenous languages". Second Conference on the Indigenous Languages of Latin America (CILLA II), 27-29 October 2005, Texas, USA.
      • Font Llitjós, A., K. Probst and J.G. Carbonell. "Error Analysis of Two Types of Grammar for the Purpose of Automatic Rule Refinement". AMTA, 2004, Washington, USA.
      • Font Llitjós, A. and J.G. Carbonell. "The Translation Correction Tool: English-Spanish user studies". LREC, 2004, Lisbon, Portugal.

  16. QuechuaSpanish MT • V-Unit: funded Summer project in Cusco (Peru) June-August 2005 [preparations and data collection started earlier] • Intensive Quechua course in Centro Bartolome de las Casas (CBC) • Worked together with two Quechua native and one non-native speakers on developing infrastructure (correcting elicited translations, segmenting and translating list of most frequent words)

  17. Quechua  Spanish prototype MT system Stem Lexicon (semi-automatically generated): 753 lexical entries Suffix lexicon:21 suffixes (150 Cusihuaman) Quechua morphology analyzer 25 translation rules Spanish morphology generation module User-Studies: 10 sentences, 3 users (2 native, 1 non-native)
