
Towards Interactive and Automatic Refinement of Translation Rules




  1. Towards Interactive and Automatic Refinement of Translation Rules. PhD Thesis Proposal, Ariadna Font Llitjós, 5 November 2004.

  2. Outline • Introduction • Related Work • Technical Approach • Interactive elicitation of error information • A framework for automatic rule adaptation • Preliminary Research • Proposed Research • Contributions and Thesis Timeline Interactive and Automatic Rule Refinement

  3. How to recycle corrections of MT output back into the system by adjusting and adapting the grammar and lexical rules

  4. The Problem General • MT output still requires post-editing. • Current systems do not recycle post-editing efforts back into the system, beyond adding them as new training data. Avenue specific • Resource-poor scenarios: lack of a manual grammar, or a very small initial grammar. • Need to validate the elicitation corpus and the automatically learned translation rules.

  5. Motivation General • It is very costly and time-consuming to refine and extend translation rule sets manually with trained computational linguists who know both languages. Resource-poor scenarios • Indigenous communities have difficulty accessing crucial information that directly affects their lives (such as land laws, plagues, health warnings, etc.). • Preservation of language and culture.

  6. MT Output
  SL: Mary and Anna are falling
  TL: María y Ana están cayendo
  TL’: María y Ana se están cayendo
  SL: Gaudi was a great artist
  TL: Gaudi estaba un artista grande
  TL: Gaudi era un artista grande
  TL’: Gaudi era un gran artista
  SL: You saw the woman
  TL: Viste la mujer
  TL’: Viste a la mujer
  TL: Vió la mujer
  SL: I used my elbow to push the button
  TL: Usé mi codo que apretar el botón
  TL’: Usé mi codo para apretar el botón
  SL: We are building new bridges in the city
  TL: Nosotros estamos construyendo nuevo puentes dentro la ciudad
  TL’: Nosotros estamos construyendo nuevo puentes dentro de la ciudad

  7-11. Resource-poor scenarios
  • No e-data available (often spoken tradition), so no SMT or EBMT
  • No computational linguists to write a grammar
  So how can we even start to think about MT?
  • That’s what AVENUE is all about: Elicitation Corpus + Automatic Rule Learning
  What do we usually have available in resource-poor scenarios? Bilingual users

  12. Avenue overview [architecture diagram; component labels: Elicitation Tool, Elicitation Corpus, Morphological analyzer, Word-Aligned Parallel Corpus, Rule Learning Module, Handcrafted rules, Transfer Rules, Run-Time Transfer System, Lexical Resources, Lattice, Translation Correction Tool, Rule Refinement Module]

  13. Avenue overview: my thesis [same architecture diagram, highlighting the components this thesis addresses: the Translation Correction Tool and the Rule Refinement Module]

  14. Thesis Statement - Given a rule-based Transfer MT system, we can extract useful information from non-expert bilingual speakers about the corrections required to make MT output acceptable.

  15. Thesis Statement - Given a rule-based Transfer MT system, we can extract useful information from non-expert bilingual speakers about the corrections required to make MT output acceptable. - We can automatically refine translation rules, given corrected and aligned translation pairs and some error information, so as to improve coverage and overall MT quality.

  16. [figure slide, no transcript text]

  17. Outline • Introduction • Related Work • Technical Approach • Interactive elicitation of error information • A framework for automatic rule adaptation • Preliminary Research • Proposed Research • Contributions and Thesis Timeline

  18. Related Work • Post-editing to improve MT systems • minimal post-editing [Allen, 2003] • include user feedback in the MT loop [Callison-Burch, 2004], [Allen & Hogan, 2000], [Su et al., 1995], [Menezes & Richardson, 2001] and [Imamura et al., 2003] • MT error information and classification • [Flanagan, 1994], [White et al., 1994], [Allen, 2003], [Niessen et al., 2000]

  19. Related Work++ • Rule Adaptation • POS tagging: [Lin et al., 1994] • parsing: [Lehman, 1989], [Brill, 2003] • NLU: [Gavaldà, 2000] • MT: [Corston-Oliver & Gamon, 2003]: decision trees to correct binary features of the logical form (LF) to reduce noise; [Yamada, 1995]: structural comparison between machine translations and manual translations to adapt an MT system to a new domain; [Naruedomkul, 2001]: modify the HPSG-like semantic representation of the TL until it is acceptably similar to the SL.

  20. Outline • Introduction • Related Work • Technical Approach • Interactive elicitation of error information • A framework for automatic rule adaptation • Preliminary Research • Proposed Research • Contributions and Thesis Timeline

  21. Interactive elicitation of MT errors Assumptions: • non-expert bilingual users can reliably detect and minimally correct MT errors, given: • the SL sentence (I saw you) • the TL sentence (Yo vi tú) • word-to-word alignments (I-yo, saw-vi, you-tú) • (context) • using an online GUI: the Translation Correction Tool (TCTool) Goal: • simplify the MT correction task as much as possible

  22. MT error typology for RR (simplified) • missing word • extra word • word order (local vs long-distance, word vs phrase, word change) • incorrect word (sense, form, selectional restrictions, idiom, ...) • agreement (missing constraint, extra agreement constraint)
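The simplified typology above can be captured as a small enumeration. This is a hedged sketch: the class and member names are illustrative, not taken from the AVENUE codebase.

```python
from enum import Enum

# Illustrative encoding of the simplified MT error typology for rule
# refinement; names are hypothetical, not from the real system.
class MTError(Enum):
    MISSING_WORD = "missing word"
    EXTRA_WORD = "extra word"
    # local vs. long-distance, word vs. phrase, word change
    WORD_ORDER = "word order"
    # sense, form, selectional restrictions, idiom, ...
    INCORRECT_WORD = "incorrect word"
    # missing constraint or extra agreement constraint
    AGREEMENT = "agreement"
```

An elicited correction could then be tagged with one of these five classes before being handed to the refinement module.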

  23. Outline • Motivation and Goals • Related Work • Technical Approach • Interactive elicitation of error information • A framework for automatic rule adaptation • Work to Date • Proposed Research • Contributions and Open Questions

  24. Automatic Rule Refinement Framework • Find the best RR operations given: • a grammar (G), • a lexicon (L), • a (set of) source language sentence(s) (SL), • a (set of) target language sentence(s) (TL), • its parse tree (P), and • a minimal correction of TL (TL’), such that translation quality improves (TQ2 > TQ1) • This can also be expressed as: max TQ(TL | TL’, P, SL, RR(G, L))
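The maximization above can be read as a search over candidate refinement operations. A minimal sketch, assuming a hypothetical translation-quality scorer `tq` and RR operations represented as functions from a (grammar, lexicon) pair to a refined pair; none of these names are the real AVENUE APIs.

```python
# Illustrative search for the best rule-refinement operation: apply each
# candidate op to (G, L) and keep the one whose refined system scores
# highest on translation quality. `tq` is a hypothetical stand-in.
def best_refinement(grammar, lexicon, sl, tl_corrected, operations, tq):
    best_op, best_score = None, float("-inf")
    for op in operations:
        g2, l2 = op(grammar, lexicon)          # apply RR(G, L)
        score = tq(g2, l2, sl, tl_corrected)   # TQ of the refined system
        if score > best_score:
            best_op, best_score = op, score
    return best_op, best_score
```

In practice TQ would be estimated against the user-corrected TL’, e.g. via an edit-distance or n-gram score; the scorer is deliberately left abstract here.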

  25. Types of RR operations • Grammar: • refine: R0 → R0 + R1 [= R0’ + constr]; Cov[R0] ⊂ Cov[R0, R1] • refine: R0 → R1 [= R0 + constr]; Cov[R0] ⊃ Cov[R1] • bifurcate: R0 → R1 [= R0 + constr = −] + R2 [= R0’ + constr =c +]; Cov[R0] = Cov[R1, R2] • Lexicon: • Lex0 → Lex0 + Lex1 [= Lex0 + constr] • Lex0 → Lex1 [= Lex0 + constr] • Lex0 → Lex0 + Lex1 [= Lex0 + Δ TLword] • ∅ → Lex1 (adding a lexical item)

  26. Formalizing Error Information
  Wi = error word
  Wi’ = correction
  Wc = clue word
  Example: SL: the red car - TL: *el auto roja - TL’: el auto rojo
  Wi = roja, Wi’ = rojo, Wc = auto (roja needs to agree with auto)
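The notation above maps naturally onto a small record type. A sketch assuming nothing beyond the three variables just defined; the `ErrorInfo` name and field layout are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

# Error information elicited from the user, in the slide's notation:
# Wi = error word, Wi' = the user's minimal correction, Wc = clue word.
@dataclass
class ErrorInfo:
    w_i: str                    # incorrect word in the TL sentence
    w_i_prime: str              # minimally corrected word
    w_c: Optional[str] = None   # clue word, if the user identified one

# The "el auto roja" example: roja must agree with the clue word auto.
err = ErrorInfo(w_i="roja", w_i_prime="rojo", w_c="auto")
```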

  27. Finding Triggering Features Once we have the user’s correction (Wi’), we can compare it with Wi at the feature level and find the triggering feature(s). If the Δ set is empty, we need to postulate a new binary feature. Delta function: Δ(Wi, Wi’) = the set of features whose values differ between Wi and Wi’.
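One plausible reading of the Δ function, with feature structures modeled as plain dicts; the dict representation and the fallback feature name `feat1` are assumptions, not the real formalism.

```python
# Delta function: return the features on which Wi and Wi' disagree.
def delta(feats_wi, feats_wi_prime):
    keys = set(feats_wi) | set(feats_wi_prime)
    return {f for f in keys if feats_wi.get(f) != feats_wi_prime.get(f)}

def triggering_features(feats_wi, feats_wi_prime, new_feature="feat1"):
    diff = delta(feats_wi, feats_wi_prime)
    # Empty delta: the lexicon cannot distinguish Wi from Wi', so a new
    # binary feature must be postulated (feat1 in the grande/gran example).
    return diff if diff else {new_feature}
```

For rojo/roja the delta would contain the gender feature; for grande/gran, whose feature structures are identical, the empty delta forces the new binary feature.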

  28. Outline • Introduction • Related Work • Technical Approach • Interactive elicitation of error information • A framework for automatic rule adaptation • Preliminary Research • Proposed Research • Contributions and Thesis Timeline

  29. TCTool v0.1 Interactive elicitation of error information Actions: Add a word Delete a word Modify a word Change word order

  30. TCTool v0.1 specs Interactive elicitation of error information • First five translations from the lattice produced by the transfer engine. • Asks users to pick the correct translation or, failing that, the best incorrect translation (i.e. the one requiring the fewest corrections). • Provides translation correction and error classification help (static tutorial + error example page). • CGI scripts in Perl • Correction interface in JavaScript (Kenneth Sim and Patrick Milholland)

  31. 1st Eng2Spa user study Interactive elicitation of error information [LREC 2004] • Manual grammar: 12 rules + 442 lexical entries • MT error classification (v0.0): 9 linguistically motivated classes: word order, sense, agreement error (number, person, gender, tense), form, incorrect word and no translation • Test set: 32 sentences from the AVENUE Elicitation Corpus (4 correct / 28 incorrect)

  32. Data Analysis Interactive elicitation of error information • Interested in high precision, even at the expense of lower recall • Users did not always fix a translation in the same way • Most of the time, when the final translation did not match the gold standard, it was still correct or even better (stylistically)

  33. Rule Refinement Operations Automatic Rule Adaptation • Organized according to the type of actions users can perform to correct a sentence with the TCTool • And according to what error information is available (Wc, alignments, …)

  34. Automatic Rule Adaptation

  35. Rule Refinement Simulation I Automatic Rule Adaptation Change word order 1. Run the SL sentence through the transfer engine: Gaudí was a great artist. 2. Input the SL sentence and up to 5 alternative translations with alignments to the Translation Correction Tool. 3. Input the user correction log file with the transfer engine output to the RR module → variable instantiation. 4. Determine the appropriate RR operations that need to apply. 5. Modify the grammar and lexicon by applying the RR ops. 6. Run the MT system again with the refined grammar and lexicon.
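The six steps above can be strung together as a driver loop. Every function name here is a hypothetical placeholder for the corresponding AVENUE component, not its real API:

```python
# Hedged sketch of the refinement simulation: each argument is a callable
# standing in for one component (transfer engine, TCTool, RR module).
def refine_once(sl, grammar, lexicon, transfer, tctool, rr_module):
    candidates = transfer(sl, grammar, lexicon)     # 1. run the transfer engine
    correction_log = tctool(sl, candidates[:5])     # 2. user corrects top-5 in TCTool
    ops = rr_module(correction_log, candidates)     # 3-4. instantiate variables, pick RR ops
    for op in ops:                                  # 5. apply the RR operations
        grammar, lexicon = op(grammar, lexicon)
    return transfer(sl, grammar, lexicon)           # 6. re-run with the refined rules
```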

  36. Automatic Rule Adaptation

  37. Automatic Rule Adaptation SL + best TL picked by user

  38. Automatic Rule Adaptation Changing “grande” into “gran”

  39. Automatic Rule Adaptation

  40. Input to RR module • User correction log file • Transfer engine output (+ parse tree): sl: Gaudi was a great artist tl: GAUDI ERA UN ARTISTA GRANDE tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>

  41. Variable instantiation from log file Correction Actions: 1. Word order change (artista grande → grande artista): Wi = grande 2. Edited “grande” into “gran”: Wi’ = gran; the user identified “artist” as the clue word: Wc = artist. In this case, even if the user had not identified Wc, the refinement process would have been the same.

  42. Retrieve relevant lexical entries ADJ::ADJ |: [great] -> [grande] ((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc)) N::N |: [artist] -> [artista] ((X1::Y1) ((x0 agr pers) = 3) ((x0 agr num) = sg) ((x0 form) = artist) ((x0 semtype) = human)) Interactive and Automatic Rule Refinement

  43. Add lexical entry for “gran” Duplicate the lexical entry great-grande and change the TL side: ADJ::ADJ |: [great] -> [gran] ((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc)) Even if we had a morphological analyzer available, it would show no difference between them: grande → grande AQ0CS0, grande NCCS000; gran → gran AQ0CS0. Lex0 → Lex0 + Lex1[Lex0 + Δ TLword]
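The Lex0 → Lex0 + Lex1 operation amounts to a deep copy with a new TL side. A sketch assuming lexical entries are stored as dicts; the layout is illustrative, not the real transfer-rule format.

```python
import copy

# Duplicate an existing lexical entry and substitute the TL side, keeping
# all feature constraints (the Lex0 -> Lex0 + Lex1 operation).
def add_tl_variant(lexicon, entry, new_tl):
    variant = copy.deepcopy(entry)
    variant["tl"] = new_tl          # e.g. "grande" -> "gran"
    lexicon.append(variant)
    return variant

lex = [{"pos": "ADJ", "sl": "great", "tl": "grande",
        "constraints": {"y0 agr num": "sg", "y0 agr gen": "masc"}}]
gran = add_tl_variant(lex, lex[0], "gran")
```

All feature constraints of great-grande carry over to great-gran; a later step then differentiates the two entries via the postulated feat1.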

  44. Finding triggering feature(s) Feature Δ function: Δ(Wi, Wi’) = ∅ → need to postulate a new binary feature: feat1. Blame assignment: tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>

  45. Refining the rules Wi = grande → POSi = ADJ → Y3, y3; Wc = artist → POSc = N → Y2, y2 {NP,8} NP::NP : [DET ADJ N] -> [DET N ADJ] ( (X1::Y1) (X2::Y3) (X3::Y2) ((x0 def) = (x1 def)) (x0 = x3) ((y1 agr) = (y2 agr)) ; det-noun agreement ((y3 agr) = (y2 agr)) ; adj-noun agreement (y2 = x3) )

  46. Refining the rules {NP,1008} NP::NP : [DET ADJ N] -> [DET ADJ N] ( (X1::Y1) (X2::Y2) (X3::Y3) ((x0 def) = (x1 def)) (x0 = x3) ((y1 agr) = (y3 agr)) ; det-noun agreement ((y2 agr) = (y3 agr)) ; adj-noun agreement (y2 = x3) ((y2 feat1) =c + ) )

  47. Refining the lexical entries ADJ::ADJ |: [great] -> [grande] ((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc) ((y0 feat1) = -)) ADJ::ADJ |: [great] -> [gran] ((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc) ((y0 feat1) = +))

  48. Done? Not yet • Right now we have just increased ambiguity in the grammar: the translation candidate list has more than doubled, since both “grande” and “gran” can unify with {NP,8} and “gran” now also unifies with {NP,1008}. • Need to restrict application of the general rule to just post-nominal ADJs: R0 → R1[= R0 + constr = −] = NP,8 (general rule) + R2[= R0’ + constr =c +] = NP,1008 (specific rule); Cov[R0] = Cov[R1, R2]
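The bifurcation can be sketched as producing the two sibling rules from R0: the general rule gets a blocking "= -" constraint and the specific rule a value constraint "=c +" on the new feature. The dict representation is an assumption, and the constituent reordering of the specific rule (DET ADJ N vs. DET N ADJ) is omitted for brevity.

```python
import copy

# Split R0 into a general rule R1 (blocks the new variant) and a specific
# rule R2 (requires it), so "gran" only fires pre-nominally and "grande"
# post-nominally. Constraints are modeled as (feature, constraint) pairs.
def bifurcate(rule, feature):
    general = copy.deepcopy(rule)
    general["constraints"].append((feature, "= -"))    # blocking constraint
    specific = copy.deepcopy(rule)
    specific["constraints"].append((feature, "=c +"))  # value constraint
    return general, specific

np8 = {"id": "NP,8", "constraints": []}
r1, r2 = bifurcate(np8, "y3 feat1")
```

Together the two rules cover exactly what R0 covered (Cov[R0] = Cov[R1, R2]) without admitting the ungrammatical combinations.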

  49. Add blocking constraint {NP,8} NP::NP : [DET ADJ N] -> [DET N ADJ] ( (X1::Y1) (X2::Y3) (X3::Y2) ((x0 def) = (x1 def)) (x0 = x3) ((y1 agr) = (y2 agr)) ; det-noun agreement ((y3 agr) = (y2 agr)) ; adj-noun agreement (y2 = x3) ((y3 feat1) = - ) )

  50. Refined MT output sl: Gaudi was a great artist tl: GAUDI ERA UN ARTISTA GRANDE tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )> tl: GAUDI ERA UNA ARTISTA GRANDE tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,2:3 "UNA") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )> tl: GAUDI ERA UN GRAN ARTISTA tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,1008 (DET,0:3 "UN") (ADJ,6:4 "GRAN") (N,4:5 "ARTISTA") ) ) ) )> tl: GAUDI ERA UNA GRAN ARTISTA tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,1008 (DET,2:3 "UNA") (ADJ,6:4 "GRAN") (N,4:5 "ARTISTA") ) ) ) )> … [same for estaba]
