
Towards Interactive and Automatic Refinement of Translation Rules




  1. Towards Interactive and Automatic Refinement of Translation Rules. PhD Thesis Proposal, Ariadna Font Llitjós, 5 November 2004.

  2. Outline • Introduction • Related Work • Technical Approach • Interactive elicitation of error information • A framework for automatic rule adaptation • Preliminary Research • Proposed Research • Contributions and Thesis Timeline Interactive and Automatic Rule Refinement

  3. How to recycle corrections of MT output back into the system by adjusting and adapting the grammar and lexical rules

  4. The Problem General • MT output still requires post-editing. • Current systems do not recycle post-editing efforts back into the system, beyond adding them as new training data. Avenue specific • Resource-poor scenarios: lack of a manual grammar, or a very small initial grammar. • Need to validate the elicitation corpus and the automatically learned translation rules.

  5. Motivation General • It is very costly and time-consuming to refine and extend translation rule sets manually with trained computational linguists who know both languages. Resource-poor scenarios • Indigenous communities have difficulty accessing crucial information that directly affects their lives (such as land laws, plagues, health warnings, etc.). • Preservation of language and culture.

  6. MT Output
  SL: Mary and Anna are falling
  TL: María y Ana están cayendo
  TL’: María y Ana se están cayendo
  SL: Gaudi was a great artist
  TL: Gaudi estaba un artista grande
  TL: Gaudi era un artista grande
  TL’: Gaudi era un gran artista
  SL: You saw the woman
  TL: Viste la mujer
  TL’: Viste a la mujer
  TL: Vió la mujer
  SL: I used my elbow to push the button
  TL: Usé mi codo que apretar el botón
  TL’: Usé mi codo para apretar el botón
  SL: We are building new bridges in the city
  TL: Nosotros estamos construyendo nuevo puentes dentro la ciudad
  TL’: Nosotros estamos construyendo nuevo puentes dentro de la ciudad

  7-11. Resource-poor scenarios
  • No e-data available (often spoken tradition), so no SMT or EBMT
  • No computational linguists to write a grammar
  So how can we even start to think about MT?
  • That’s what AVENUE is all about: Elicitation Corpus + Automatic Rule Learning
  What do we usually have available in resource-poor scenarios? Bilingual users

  12. Avenue overview [architecture diagram; component labels: Elicitation Tool, Elicitation Corpus, Morphological analyzer, Word-Aligned Parallel Corpus, Rule Learning Module, Handcrafted rules, Transfer Rules, Run-Time Transfer System, Lexical Resources, Lattice, Translation Correction Tool, Rule Refinement Module]

  13. Avenue overview: my thesis [same architecture diagram, highlighting the components this thesis addresses: the Translation Correction Tool and the Rule Refinement Module]

  14. Thesis Statement - Given a rule-based Transfer MT system, we can extract useful information from non-expert bilingual speakers about the corrections required to make MT output acceptable.

  15. Thesis Statement - Given a rule-based Transfer MT system, we can extract useful information from non-expert bilingual speakers about the corrections required to make MT output acceptable. - We can automatically refine translation rules, given corrected and aligned translation pairs and some error information, so as to improve coverage and overall MT quality.

  16. [figure slide, no transcript text]

  17. Outline • Introduction • Related Work • Technical Approach • Interactive elicitation of error information • A framework for automatic rule adaptation • Preliminary Research • Proposed Research • Contributions and Thesis Timeline

  18. Related Work • Post-editing to improve MT systems • minimal post-editing [Allen, 2003] • include user feedback in the MT loop [Callison-Burch, 2004], [Allen & Hogan, 2000], [Su et al., 1995], [Menezes & Richardson, 2001] and [Imamura et al., 2003] • MT error information and classification • [Flanagan, 1994], [White et al., 1994], [Allen, 2003], [Niessen et al., 2000]

  19. Related Work++ • Rule Adaptation • POS tagging: [Lin et al., 1994] • parsing: [Lehman, 1989], [Brill, 2003] • NLU: [Gavaldà, 2000] • MT: [Corston-Oliver & Gamon, 2003]: decision trees to correct binary features of the logical form (LF) to reduce noise; [Yamada, 1995]: structural comparison between machine translations and manual translations to adapt an MT system to a new domain; [Naruedomkul, 2001]: modify the HPSG-like semantic representation of the TL until it is acceptably similar to the SL.

  20. Outline • Introduction • Related Work • Technical Approach • Interactive elicitation of error information • A framework for automatic rule adaptation • Preliminary Research • Proposed Research • Contributions and Thesis Timeline

  21. Interactive elicitation of MT errors Assumptions: • non-expert bilingual users can reliably detect and minimally correct MT errors, given: • the SL sentence (I saw you) • the TL sentence (Yo vi tú) • word-to-word alignments (I-yo, saw-vi, you-tú) • (context) • using an online GUI: the Translation Correction Tool (TCTool) Goal: • simplify the MT correction task as much as possible

  22. MT error typology for RR (simplified) • missing word • extra word • word order (local vs long-distance, word vs phrase, word change) • incorrect word (sense, form, selectional restrictions, idiom, ...) • agreement (missing constraint, extra agreement constraint)
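The simplified typology above can be captured as a small enumeration. This is a hedged sketch: the class and member names are illustrative, not taken from the AVENUE codebase.

```python
from enum import Enum

# Illustrative encoding of the simplified MT error typology for rule
# refinement; names are hypothetical, not from the real system.
class MTError(Enum):
    MISSING_WORD = "missing word"
    EXTRA_WORD = "extra word"
    # local vs. long-distance, word vs. phrase, word change
    WORD_ORDER = "word order"
    # sense, form, selectional restrictions, idiom, ...
    INCORRECT_WORD = "incorrect word"
    # missing constraint or extra agreement constraint
    AGREEMENT = "agreement"
```

An elicited correction could then be tagged with one of these five classes before being handed to the refinement module.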

  23. Outline • Motivation and Goals • Related Work • Technical Approach • Interactive elicitation of error information • A framework for automatic rule adaptation • Work to Date • Proposed Research • Contributions and Open Questions

  24. Automatic Rule Refinement Framework • Find the best RR operations given: • a grammar (G), • a lexicon (L), • a (set of) source language sentence(s) (SL), • a (set of) target language sentence(s) (TL), • its parse tree (P), and • a minimal correction of TL (TL’), such that translation quality improves (TQ2 > TQ1) • This can also be expressed as: max TQ(TL | TL’, P, SL, RR(G, L))
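The maximization above can be read as a search over candidate refinement operations. A minimal sketch, assuming a hypothetical translation-quality scorer `tq` and RR operations represented as functions from a (grammar, lexicon) pair to a refined pair; none of these names are the real AVENUE APIs.

```python
# Illustrative search for the best rule-refinement operation: apply each
# candidate op to (G, L) and keep the one whose refined system scores
# highest on translation quality. `tq` is a hypothetical stand-in.
def best_refinement(grammar, lexicon, sl, tl_corrected, operations, tq):
    best_op, best_score = None, float("-inf")
    for op in operations:
        g2, l2 = op(grammar, lexicon)          # apply RR(G, L)
        score = tq(g2, l2, sl, tl_corrected)   # TQ of the refined system
        if score > best_score:
            best_op, best_score = op, score
    return best_op, best_score
```

In practice TQ would be estimated against the user-corrected TL’, e.g. via an edit-distance or n-gram score; the scorer is deliberately left abstract here.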

  25. Types of RR operations • Grammar: • refine: R0 → R0 + R1 [= R0’ + constr]; Cov[R0] ⊂ Cov[R0, R1] • refine: R0 → R1 [= R0 + constr]; Cov[R0] ⊃ Cov[R1] • bifurcate: R0 → R1 [= R0 + constr = −] + R2 [= R0’ + constr =c +]; Cov[R0] = Cov[R1, R2] • Lexicon: • Lex0 → Lex0 + Lex1 [= Lex0 + constr] • Lex0 → Lex1 [= Lex0 + constr] • Lex0 → Lex0 + Lex1 [= Lex0 + Δ TLword] • ∅ → Lex1 (adding a lexical item)

  26. Formalizing Error Information
  Wi = error word
  Wi’ = correction
  Wc = clue word
  Example: SL: the red car - TL: *el auto roja - TL’: el auto rojo
  Wi = roja, Wi’ = rojo, Wc = auto (roja needs to agree with auto)
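The notation above maps naturally onto a small record type. A sketch assuming nothing beyond the three variables just defined; the `ErrorInfo` name and field layout are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

# Error information elicited from the user, in the slide's notation:
# Wi = error word, Wi' = the user's minimal correction, Wc = clue word.
@dataclass
class ErrorInfo:
    w_i: str                    # incorrect word in the TL sentence
    w_i_prime: str              # minimally corrected word
    w_c: Optional[str] = None   # clue word, if the user identified one

# The "el auto roja" example: roja must agree with the clue word auto.
err = ErrorInfo(w_i="roja", w_i_prime="rojo", w_c="auto")
```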

  27. Finding Triggering Features Once we have the user’s correction (Wi’), we can compare it with Wi at the feature level and find the triggering feature(s). If the Δ set is empty, we need to postulate a new binary feature. Delta function: Δ(Wi, Wi’) = the set of features whose values differ between Wi and Wi’.
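One plausible reading of the Δ function, with feature structures modeled as plain dicts; the dict representation and the fallback feature name `feat1` are assumptions, not the real formalism.

```python
# Delta function: return the features on which Wi and Wi' disagree.
def delta(feats_wi, feats_wi_prime):
    keys = set(feats_wi) | set(feats_wi_prime)
    return {f for f in keys if feats_wi.get(f) != feats_wi_prime.get(f)}

def triggering_features(feats_wi, feats_wi_prime, new_feature="feat1"):
    diff = delta(feats_wi, feats_wi_prime)
    # Empty delta: the lexicon cannot distinguish Wi from Wi', so a new
    # binary feature must be postulated (feat1 in the grande/gran example).
    return diff if diff else {new_feature}
```

For rojo/roja the delta would contain the gender feature; for grande/gran, whose feature structures are identical, the empty delta forces the new binary feature.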

  28. Outline • Introduction • Related Work • Technical Approach • Interactive elicitation of error information • A framework for automatic rule adaptation • Preliminary Research • Proposed Research • Contributions and Thesis Timeline

  29. TCTool v0.1 Interactive elicitation of error information Actions: Add a word Delete a word Modify a word Change word order

  30. TCTool v0.1 specs Interactive elicitation of error information • First five translations from the lattice produced by the transfer engine. • Asks users to pick the correct translation or, failing that, the best incorrect translation (i.e. the one requiring the fewest corrections). • Provides translation correction and error classification help (static tutorial + error example page). • CGI scripts in Perl • Correction interface in JavaScript (Kenneth Sim and Patrick Milholland)

  31. 1st Eng2Spa user study Interactive elicitation of error information [LREC 2004] • Manual grammar: 12 rules + 442 lexical entries • MT error classification (v0.0): 9 linguistically motivated classes: word order, sense, agreement error (number, person, gender, tense), form, incorrect word and no translation • Test set: 32 sentences from the AVENUE Elicitation Corpus (4 correct / 28 incorrect)

  32. Data Analysis Interactive elicitation of error information • Interested in high precision, even at the expense of lower recall • Users did not always fix a translation in the same way • Most of the time, when the final translation did not match the gold standard, it was still correct or even better (stylistically)

  33. Rule Refinement Operations Automatic Rule Adaptation • Organized according to the type of actions users can perform to correct a sentence with the TCTool • And according to what error information is available (Wc, alignments, …)

  34. Automatic Rule Adaptation

  35. Rule Refinement Simulation I Automatic Rule Adaptation Change word order 1. Run the SL sentence through the transfer engine: Gaudí was a great artist. 2. Input the SL sentence and up to 5 alternative translations with alignments to the Translation Correction Tool. 3. Input the user correction log file with the transfer engine output to the RR module → variable instantiation. 4. Determine the appropriate RR operations that need to apply. 5. Modify the grammar and lexicon by applying the RR ops. 6. Run the MT system again with the refined grammar and lexicon.
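The six steps above can be strung together as a driver loop. Every function name here is a hypothetical placeholder for the corresponding AVENUE component, not its real API:

```python
# Hedged sketch of the refinement simulation: each argument is a callable
# standing in for one component (transfer engine, TCTool, RR module).
def refine_once(sl, grammar, lexicon, transfer, tctool, rr_module):
    candidates = transfer(sl, grammar, lexicon)     # 1. run the transfer engine
    correction_log = tctool(sl, candidates[:5])     # 2. user corrects top-5 in TCTool
    ops = rr_module(correction_log, candidates)     # 3-4. instantiate variables, pick RR ops
    for op in ops:                                  # 5. apply the RR operations
        grammar, lexicon = op(grammar, lexicon)
    return transfer(sl, grammar, lexicon)           # 6. re-run with the refined rules
```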

  36. Automatic Rule Adaptation

  37. Automatic Rule Adaptation SL + best TL picked by user

  38. Automatic Rule Adaptation Changing “grande” into “gran”

  39. Automatic Rule Adaptation

  40. Input to RR module • User correction log file • Transfer engine output (+ parse tree): sl: Gaudi was a great artist tl: GAUDI ERA UN ARTISTA GRANDE tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>

  41. Variable instantiation from log file Correction Actions: 1. Word order change (artista grande → grande artista): Wi = grande 2. Edited “grande” into “gran”: Wi’ = gran; the user identified “artist” as the clue word: Wc = artist. In this case, even if the user had not identified Wc, the refinement process would have been the same.

  42. Retrieve relevant lexical entries ADJ::ADJ |: [great] -> [grande] ((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc)) N::N |: [artist] -> [artista] ((X1::Y1) ((x0 agr pers) = 3) ((x0 agr num) = sg) ((x0 form) = artist) ((x0 semtype) = human)) Interactive and Automatic Rule Refinement

  43. Add lexical entry for “gran” Duplicate the lexical entry great-grande and change the TL side: ADJ::ADJ |: [great] -> [gran] ((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc)) Even if we had a morphological analyzer available, it would show no difference between them: grande → grande AQ0CS0, grande NCCS000; gran → gran AQ0CS0. Lex0 → Lex0 + Lex1[Lex0 + Δ TLword]
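The Lex0 → Lex0 + Lex1 operation amounts to a deep copy with a new TL side. A sketch assuming lexical entries are stored as dicts; the layout is illustrative, not the real transfer-rule format.

```python
import copy

# Duplicate an existing lexical entry and substitute the TL side, keeping
# all feature constraints (the Lex0 -> Lex0 + Lex1 operation).
def add_tl_variant(lexicon, entry, new_tl):
    variant = copy.deepcopy(entry)
    variant["tl"] = new_tl          # e.g. "grande" -> "gran"
    lexicon.append(variant)
    return variant

lex = [{"pos": "ADJ", "sl": "great", "tl": "grande",
        "constraints": {"y0 agr num": "sg", "y0 agr gen": "masc"}}]
gran = add_tl_variant(lex, lex[0], "gran")
```

All feature constraints of great-grande carry over to great-gran; a later step then differentiates the two entries via the postulated feat1.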

  44. Finding triggering feature(s) Feature Δ function: Δ(Wi, Wi’) = ∅ → need to postulate a new binary feature: feat1. Blame assignment: tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>

  45. Refining the rules Wi = grande → POSi = ADJ → Y3, y3; Wc = artist → POSc = N → Y2, y2 {NP,8} NP::NP : [DET ADJ N] -> [DET N ADJ] ( (X1::Y1) (X2::Y3) (X3::Y2) ((x0 def) = (x1 def)) (x0 = x3) ((y1 agr) = (y2 agr)) ; det-noun agreement ((y3 agr) = (y2 agr)) ; adj-noun agreement (y2 = x3) )

  46. Refining the rules {NP,1008} NP::NP : [DET ADJ N] -> [DET ADJ N] ( (X1::Y1) (X2::Y2) (X3::Y3) ((x0 def) = (x1 def)) (x0 = x3) ((y1 agr) = (y3 agr)) ; det-noun agreement ((y2 agr) = (y3 agr)) ; adj-noun agreement (y2 = x3) ((y2 feat1) =c + ) )

  47. Refining the lexical entries ADJ::ADJ |: [great] -> [grande] ((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc) ((y0 feat1) = -)) ADJ::ADJ |: [great] -> [gran] ((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc) ((y0 feat1) = +))

  48. Done? Not yet • Right now we have just increased ambiguity in the grammar: the translation candidate list has more than doubled, since both “grande” and “gran” can unify with {NP,8} and “gran” now also unifies with {NP,1008}. • Need to restrict application of the general rule to just post-nominal ADJs: R0 → R1[= R0 + constr = −] = NP,8 (general rule) + R2[= R0’ + constr =c +] = NP,1008 (specific rule); Cov[R0] = Cov[R1, R2]
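The bifurcation can be sketched as producing the two sibling rules from R0: the general rule gets a blocking "= -" constraint and the specific rule a value constraint "=c +" on the new feature. The dict representation is an assumption, and the constituent reordering of the specific rule (DET ADJ N vs. DET N ADJ) is omitted for brevity.

```python
import copy

# Split R0 into a general rule R1 (blocks the new variant) and a specific
# rule R2 (requires it), so "gran" only fires pre-nominally and "grande"
# post-nominally. Constraints are modeled as (feature, constraint) pairs.
def bifurcate(rule, feature):
    general = copy.deepcopy(rule)
    general["constraints"].append((feature, "= -"))    # blocking constraint
    specific = copy.deepcopy(rule)
    specific["constraints"].append((feature, "=c +"))  # value constraint
    return general, specific

np8 = {"id": "NP,8", "constraints": []}
r1, r2 = bifurcate(np8, "y3 feat1")
```

Together the two rules cover exactly what R0 covered (Cov[R0] = Cov[R1, R2]) without admitting the ungrammatical combinations.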

  49. Add blocking constraint {NP,8} NP::NP : [DET ADJ N] -> [DET N ADJ] ( (X1::Y1) (X2::Y3) (X3::Y2) ((x0 def) = (x1 def)) (x0 = x3) ((y1 agr) = (y2 agr)) ; det-noun agreement ((y3 agr) = (y2 agr)) ; adj-noun agreement (y2 = x3) ((y3 feat1) = - ) )

  50. Refined MT output sl: Gaudi was a great artist tl: GAUDI ERA UN ARTISTA GRANDE tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )> tl: GAUDI ERA UNA ARTISTA GRANDE tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,2:3 "UNA") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )> tl: GAUDI ERA UN GRAN ARTISTA tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,1008 (DET,0:3 "UN") (ADJ,6:4 "GRAN") (N,4:5 "ARTISTA") ) ) ) )> tl: GAUDI ERA UNA GRAN ARTISTA tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,1008 (DET,2:3 "UNA") (ADJ,6:4 "GRAN") (N,4:5 "ARTISTA") ) ) ) )> … [same for estaba]
