  1. Caroline Lavecchia, Kamel Smaïli and David Langlois, LORIA / Groupe Parole, Vandoeuvre-Lès-Nancy, France. Phrase-Based Machine Translation based on Simulated Annealing. LREC 2008, Marrakech, 29 May 2008

  2. Outline
  • Statistical Machine Translation (SMT)
  • Concept of inter-lingual triggers
  • Our SMT system based on inter-lingual triggers
    • Word-based approach
    • Phrase-based approach using Simulated Annealing algorithm (SA)
  • Experiments
  • Conclusion

  3. Statistical Machine Translation: introduction
  • Given a source sentence S, find the best target sentence T* which maximizes the probability P(T|S)
  • Noisy channel approach: T* = argmax_T P(T|S) = argmax_T P(T) * P(S|T), where P(T) is the language model and P(S|T) is the translation model
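
To make the noisy-channel decomposition concrete, here is a minimal sketch in Python (not from the slides): `lm_prob` and `tm_prob` are hypothetical stand-ins for a language model P(T) and a translation model P(S|T), and decoding is reduced to scoring a fixed list of candidate translations.

```python
import math

def noisy_channel_best(source, candidates, lm_prob, tm_prob):
    """Return the candidate T* maximizing P(T) * P(S|T), computed in log space."""
    best, best_score = None, float("-inf")
    for target in candidates:
        score = math.log(lm_prob(target)) + math.log(tm_prob(source, target))
        if score > best_score:
            best, best_score = target, score
    return best
```

A real decoder such as Pharaoh searches the space of possible target sentences rather than scoring a pre-built candidate list.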

  4. Statistical Machine Translation: approaches
  • Word-based approach
    • The translation process is done word by word
    • IBM models (Brown et al., 1993)
  • Phrase-based approach (Och et al., 1999), (Yamada and Knight, 2001), (Marcu and Wong, 2002)
    • Better MT system quality
    • Advantages:
      • Explicitly models lexical units, e.g. rat de bibliothèque → bookworm
      • Easily captures local reordering, e.g. Tour Eiffel → Eiffel Tower

  5. A new translation model based on inter-lingual triggers
  • Current translation models are complex
  • Their estimation needs a lot of time and memory
  • We propose a new translation model based on a simple concept: triggers.

  6. Concept of inter-lingual triggers: triggers in statistical language modeling
  • A trigger is a set composed of a word and its best correlated words.
  • Triggers are determined by computing Mutual Information (MI) between words on a monolingual corpus.
  • Example: Gary Kasparov is a chess champion
  • In statistical language modeling, triggers make it possible to increase the probability of triggered words given a triggering word.
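
The transcript does not reproduce the MI formula from the slide; a commonly used form of the mutual information between two words a and b, estimated from co-occurrence counts, is the following (the exact variant used in the paper may differ):

```latex
\mathrm{MI}(a,b) \;=\; P(a,b)\,\log_2\!\frac{P(a,b)}{P(a)\,P(b)}
```

The triggers of a are then the words b with the highest MI(a, b).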

  7. Concept of inter-lingual triggers: inter-lingual triggers
  • An inter-lingual trigger is a set composed of a source unit s and its best correlated target units t1, ..., tn.
  • Inter-lingual triggers are determined by computing Mutual Information (MI) between units on a bilingual aligned corpus.
  • Example: Gary Kasparov est un champion d'échecs ↔ Gary Kasparov is a chess champion
  • We hope to find possible translations of s among the set of its triggered target units t1, ..., tn.
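
As an illustrative sketch only (not the authors' implementation), 1-To-1 inter-lingual triggers can be estimated from a sentence-aligned corpus by counting co-occurrences inside aligned sentence pairs, scoring each source/target word pair with the MI formula above, and keeping the k best-correlated target words per source word:

```python
import math
from collections import Counter, defaultdict

def interlingual_triggers(aligned_pairs, k=10):
    """aligned_pairs: list of (source_tokens, target_tokens) sentence pairs."""
    n = len(aligned_pairs)
    src_count, tgt_count, joint_count = Counter(), Counter(), Counter()
    for src, tgt in aligned_pairs:
        src_set, tgt_set = set(src), set(tgt)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        for s in src_set:
            for t in tgt_set:
                joint_count[(s, t)] += 1

    # Group co-occurring target words by source word and score each pair with MI.
    scored = defaultdict(list)
    for (s, t), c in joint_count.items():
        p_st = c / n
        mi = p_st * math.log2(p_st / ((src_count[s] / n) * (tgt_count[t] / n)))
        scored[s].append((mi, t))

    # Keep the k best-correlated target words for each source word.
    return {s: [t for _, t in sorted(pairs, reverse=True)[:k]]
            for s, pairs in scored.items()}
```

On the slide's example pair, the French word échecs would ideally end up triggering chess among its top target words.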

  8. Concept of inter-lingual triggers: 1-To-1 and n-To-m triggers
  • Source: Gary Kasparov est un champion d'échecs
  • Target: Gary Kasparov is a chess champion
  • 1-To-1 triggers: one source word triggers one target word.
  • n-To-m triggers: n source words trigger m target words.

  9. SMT based on inter-lingual triggers
  • How to make good use of inter-lingual triggers in order to estimate a translation model?
    • Word-based translation model using 1-To-1 triggers
    • Phrase-based translation model using n-To-m triggers

  10. SMT based on inter-lingual triggers: word-based translation model using 1-To-1 triggers
  • For each source word, we keep its k best 1-To-1 triggers. We hope these constitute its potential translations.
  • Translation model: we assign to each inter-lingual trigger a probability, calculated as follows:
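
The probability formula shown on the original slide is not reproduced in the transcript. A plausible choice, consistent with trigger-based translation models but stated here only as an assumption, is to normalize the mutual information of each trigger over the k triggered target words:

```latex
p(t_i \mid s) \;=\; \frac{\mathrm{MI}(s, t_i)}{\sum_{j=1}^{k} \mathrm{MI}(s, t_j)}
```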

  11. SMT based on inter-lingual triggers: phrase-based translation model using n-To-m triggers
  • Motivations:
    • Most methods for learning phrase translations require word alignments
    • All phrase pairs that are consistent with this word alignment are collected
      → phrases with no linguistic motivation
      → noisy phrases

  12. SMT based on inter-lingual triggers: method for learning phrase translations
  • Extract phrases from the source corpus
  • Determine potential translations of the source phrases by using n-To-m triggers
  • Start with 1-To-1 triggers to set a baseline MT system
  • Select an optimal subset of n-To-m triggers with the Simulated Annealing algorithm

  13. Method for learning phrase translations: phrase extraction
  • Iterative process which selects phrases by grouping words with high Mutual Information (Zitouni et al., 2003)
  • Only those which improve the perplexity on the source corpus are kept → pertinent source phrases
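
A rough sketch of the iterative grouping idea (not the exact procedure of Zitouni et al., 2003): bigrams with high mutual information are candidate phrases, and a merge is kept only if it improves the perplexity of the source corpus. The `perplexity` argument is a hypothetical helper that scores a tokenized corpus with a language model.

```python
import math
from collections import Counter

def merge_sentence(sent, a, b):
    """Join every occurrence of the bigram (a, b) into a single token."""
    out, i = [], 0
    while i < len(sent):
        if i + 1 < len(sent) and sent[i] == a and sent[i + 1] == b:
            out.append(a + "_" + b)
            i += 2
        else:
            out.append(sent[i])
            i += 1
    return out

def extract_phrases(corpus, perplexity, top_candidates=100):
    """corpus: list of token lists. Returns the rewritten corpus and the kept phrases."""
    n = sum(len(sent) for sent in corpus)
    unigrams = Counter(tok for sent in corpus for tok in sent)
    bigrams = Counter((a, b) for sent in corpus for a, b in zip(sent, sent[1:]))

    def mi(a, b):
        p_ab = bigrams[(a, b)] / n
        return p_ab * math.log2(p_ab / ((unigrams[a] / n) * (unigrams[b] / n)))

    kept, current_ppl = [], perplexity(corpus)
    # Examine the bigrams with the highest mutual information first.
    for a, b in sorted(bigrams, key=lambda ab: mi(*ab), reverse=True)[:top_candidates]:
        merged = [merge_sentence(sent, a, b) for sent in corpus]
        new_ppl = perplexity(merged)
        if new_ppl < current_ppl:          # keep the phrase only if perplexity improves
            corpus, current_ppl = merged, new_ppl
            kept.append((a, b))
    return corpus, kept
```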

  14. Method for learning phrase translations: learning potential phrase translations
  • A source phrase can be translated by different target sequences of variable sizes.
  • Assumption: each source phrase of l words can be translated by a sequence of j target words, where j ∈ [l-Δl, l+Δl]
  • For each source phrase of length l, the potential translations are the sets of n-To-m triggers with n = l and m ∈ [l-Δl, l+Δl]
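
An illustrative enumeration of this length constraint (names are ours, not from the slides): for a source phrase of l words, the candidate target sequences are the n-grams of the aligned target sentence whose length j lies in [l-Δl, l+Δl].

```python
def candidate_target_sequences(target_tokens, source_len, delta=1):
    """Enumerate target n-grams whose length j satisfies source_len - delta <= j <= source_len + delta."""
    candidates = []
    for j in range(max(1, source_len - delta), source_len + delta + 1):
        for start in range(len(target_tokens) - j + 1):
            candidates.append(tuple(target_tokens[start:start + j]))
    return candidates

# For a 2-word source phrase and delta = 1, this yields all 1-, 2- and 3-word
# sequences of the target sentence, which can then be ranked by mutual information.
```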

  15. Method for learning phrase translations: example
  • Source phrase: porter plainte (l = 2)
  • We assume that porter plainte can be translated by sequences of 1, 2 or 3 target words (Δl = 1).

  16. Method for learning phrase translations: general case
  • All source phrases associated with their k potential translations constitute the set of n-To-m triggers.
  • We have to select the pertinent translations among the n-To-m triggers and discard the noisy ones.
  • Our problem: find an optimal subset of phrase translations which leads to the best MT performance. It is unreasonable to try all possibilities!
  • Proposed method: use the Simulated Annealing algorithm

  17. Method for learning phrase translations: Simulated Annealing
  • Technique applied to find an optimal solution to a combinatorial problem.
  • [Flowchart] Choose an initial temperature and an initial configuration; perturb the configuration; if the new configuration is accepted, update the current configuration; adjust the temperature; if the termination criterion is met, stop the search, otherwise perturb again.

  18. Method for learning phrase translations: the algorithm applied to SMT
  1. Start with a high temperature T and a baseline word-based MT system using 1-To-1 triggers
  2. Do:
     • Perturb the system from state i to state j by randomly adding a subset of n-To-m triggers to the current SMT system
     • Evaluate the performance Ej of the new system
     • If Ej > Ei, move from state i to state j; otherwise accept state j with probability e^(-(Ei-Ej)/T), i.e. if a random number P ∈ [0, 1] satisfies P < e^(-(Ei-Ej)/T)
     until the performance of our SMT system stops increasing
  3. Decrease the temperature and go to step 2 until the performance of the system stops increasing
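
A condensed sketch of the annealing loop described on this slide, under assumptions of ours: `evaluate_bleu` is a hypothetical helper that builds an MT system from a trigger set and returns its Bleu score, and the schedule constants (cooling factor, perturbations per temperature) are illustrative rather than the authors' settings.

```python
import math
import random

def anneal_trigger_selection(base_triggers, candidate_triggers, evaluate_bleu,
                             t0=1e-4, cooling=0.5, steps_per_temp=100, max_stalls=3):
    """Select a subset of n-To-m triggers, using the Bleu score as the quantity to maximize."""
    candidates = list(candidate_triggers)
    current = set(base_triggers)                      # baseline system built from 1-To-1 triggers
    e_current = evaluate_bleu(current)
    temperature, stalls = t0, 0
    while stalls < max_stalls and candidates:
        improved = False
        for _ in range(steps_per_temp):
            # Perturbation: randomly add a small batch of candidate n-To-m triggers.
            batch = random.sample(candidates, min(10, len(candidates)))
            proposal = current | set(batch)
            e_new = evaluate_bleu(proposal)
            if e_new > e_current:                     # always keep an improvement
                current, e_current, improved = proposal, e_new, True
            elif random.random() < math.exp(-(e_current - e_new) / temperature):
                current, e_current = proposal, e_new  # occasionally accept a degradation
        stalls = 0 if improved else stalls + 1
        temperature *= cooling                        # cool down before the next round
    return current, e_current
```

The acceptance rule mirrors the Metropolis criterion on the slide; in the talk, each perturbation adds 10 potential translations of 10 source phrases (see slide 21).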

  19. [Diagram] The initial system (decoder + language model + translation model built from 1-To-1 triggers) translates the text input into a text output and is scored with Bleu_initial. The current system is perturbed by adding a subset of n-To-m triggers, producing a new system scored with Bleu_new. If Bleu_new > Bleu_current, the new system replaces the current one (Bleu_current ← Bleu_new); if Bleu_new ≤ Bleu_current, the current system is kept (or the new one is accepted with the SA probability).

  20. Experiments: corpora
  • Subtitle parallel corpora built using the Dynamic Time Warping algorithm (Lavecchia et al., 2007)

  21. Experiments: SA algorithm parameters
  • 1-To-1 triggers: each source word associated with its 50 best target words
  • n-To-m triggers:
    • 15,860 source phrases
    • each source phrase associated with its 30 best n-To-1, n-To-2 and n-To-3 inter-lingual triggers
  • Initial temperature: 10^-4
  • System perturbation: adding 10 potential translations of 10 source phrases

  22. Experiments: initial system
  • Pharaoh decoder: text input → text output
  • Word translation model (Brown et al., 1993)
  • Language model: trigram model

  23. Experiments: final system
  • Pharaoh decoder: text input → text output
  • Phrase translation model (Och, 2002)
  • Language model: trigram model

  24. Experiments: evaluation of the final system
  • The lead of n-To-m triggers over 1-To-1 triggers is not corroborated on the test corpus
  • Explanations:
    • Over-fitting due to the small amount of data
    • Corpora of different movie styles
  • The impact of over-fitting is more important on state-of-the-art systems.

  25. Conclusion and future work
  • A new method for learning phrase translations:
    • Extract source phrases
    • Find phrase translations using inter-lingual triggers
    • Select the pertinent ones using the SA algorithm
    • Advantages: no word alignment needed + more pertinent phrase translations
  • Experiments on movie subtitle corpora:
    • More robust on sparse data than a state-of-the-art approach
    • Better translation quality in terms of Bleu score (+7 points on dev., +4 points on test)
  • Improvements of our system:
    • Classify movies
    • Integrate linguistic knowledge into the translation process → consider inter-lingual triggers not only on word surface forms
