
CSA4050: Advanced Techniques in NLP

Presentation Transcript


  1. CSA4050: Advanced Techniques in NLP. Machine Translation III: Statistical MT

  2. Statistical Translation • Robust • Domain-independent • Extensible • Does not require language specialists • Uses the noisy channel model of translation

  3. Noisy Channel Model: Sentence Translation (Brown et al. 1990) [Figure: a source sentence passes through a noisy channel to produce the observed target sentence; decoding recovers the source sentence]

  4. The Problem of Translation • Given a sentence T of the target language, seek the sentence S from which a translator produced T, i.e. find the S that maximises P(S|T) • By Bayes' theorem, P(S|T) = P(S) * P(T|S) / P(T), whose denominator is independent of S • Hence it suffices to maximise P(S) * P(T|S)
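A minimal sketch of this decision rule in Python, assuming hypothetical lm_prob and tm_prob functions standing in for a real language model P(S) and translation model P(T|S):

```python
# Sketch only: lm_prob and tm_prob are hypothetical stand-ins for a real
# language model P(S) and translation model P(T|S).
def best_source(candidate_sources, t, lm_prob, tm_prob):
    # P(T) is the same for every candidate S, so it can be dropped:
    # argmax_S P(S|T) = argmax_S P(S) * P(T|S)
    return max(candidate_sources, key=lambda s: lm_prob(s) * tm_prob(t, s))
```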

  5. A Statistical MT System [Figure: the source language model supplies P(S) and the translation model supplies P(T|S); given an observed target sentence T, the decoder searches for the source sentence S that maximises P(S) * P(T|S), i.e. P(S|T)]

  6. The Three Components of a Statistical MT Model • Method for computing language model probabilities P(S) • Method for computing translation probabilities P(T|S) • Method for searching amongst source sentences for one that maximises P(S) * P(T|S)

  7. Probabilistic Language Models • General: P(s1 s2 ... sn) = P(s1) * P(s2|s1) * ... * P(sn|s1 ... s(n-1)) • Trigram: P(s1 s2 ... sn) = P(s1) * P(s2|s1) * P(s3|s1,s2) * ... * P(sn|s(n-2),s(n-1)) • Bigram: P(s1 s2 ... sn) = P(s1) * P(s2|s1) * ... * P(sn|s(n-1))
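The bigram case can be made concrete with a small count-based sketch (unsmoothed, with an assumed <s> start symbol; a real model would need smoothing for unseen bigrams):

```python
from collections import defaultdict

def train_bigram_lm(sentences):
    # Collect counts c(w) and c(w, w') from tokenised sentences.
    unigram, bigram = defaultdict(int), defaultdict(int)
    for sent in sentences:
        words = ["<s>"] + sent
        for prev, cur in zip(words, words[1:]):
            unigram[prev] += 1
            bigram[(prev, cur)] += 1
    return unigram, bigram

def bigram_prob(sentence, unigram, bigram):
    # P(s1 ... sn) = P(s1|<s>) * P(s2|s1) * ... * P(sn|s(n-1))
    p = 1.0
    words = ["<s>"] + sentence
    for prev, cur in zip(words, words[1:]):
        if unigram[prev] == 0:
            return 0.0
        p *= bigram[(prev, cur)] / unigram[prev]
    return p
```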

  8. A Simple Alignment-Based Translation Model • Assumption: the target sentence is generated from the source sentence word by word • S: John loves Mary • T: Jean aime Marie

  9. Sentence Translation Probability • According to this model, the translation probability of the sentence is just the product of the translation probabilities of the words • P(T|S) = P(Jean aime Marie|John loves Mary) = P(Jean|John) * P(aime|loves) * P(Marie|Mary)
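The same computation as a short sketch, assuming a one-to-one alignment and a hypothetical word-translation table t_prob; the individual probabilities below are invented for illustration:

```python
# Word-by-word model under a one-to-one alignment; probabilities are made up.
def sentence_translation_prob(target_words, source_words, t_prob):
    p = 1.0
    for t_word, s_word in zip(target_words, source_words):
        p *= t_prob.get((t_word, s_word), 0.0)
    return p

t_prob = {("Jean", "John"): 0.9, ("aime", "loves"): 0.8, ("Marie", "Mary"): 0.9}
print(sentence_translation_prob(["Jean", "aime", "Marie"],
                                ["John", "loves", "Mary"], t_prob))  # 0.648
```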

  10. More Realistic Example • S: The proposal will not now be implemented • T: Les propositions ne seront pas mises en application maintenant

  11. Some Further Parameters • Word translation probability: P(t|s) • Fertility: the number of words in the target that are paired with each source word (0 to N) • Distortion: the difference in sentence position between the source word and the target word, modelled as P(i|j,l), the probability that the word in source position j is paired with target position i in a target sentence of length l
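These parameters are typically stored as simple lookup tables. The sketch below uses the estimated values for English "not" shown on a later slide; the distortion entry is invented for illustration:

```python
# P(t|s): word translation probabilities (values from the "not" slide below)
translation = {("pas", "not"): 0.469, ("ne", "not"): 0.460, ("non", "not"): 0.024}

# P(n|s): fertility probabilities, i.e. how many target words "not" produces
fertility = {("not", 2): 0.758, ("not", 0): 0.133, ("not", 1): 0.106}

# P(i|j,l): probability that the source word in position j pairs with target
# position i, given target sentence length l (this entry is invented)
distortion = {(5, 4, 9): 0.3}
```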

  12. Searching • Maintain a list of hypotheses. Initial hypothesis: (Jean aime Marie | *) • Search proceeds iteratively. At each iteration we extend the most promising hypotheses with additional words: Jean aime Marie | John(1) * ; Jean aime Marie | * loves(2) * ; Jean aime Marie | * Mary(3) * ; Jean aime Marie | Jean(1) *
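A rough, simplified sketch of the search loop, assuming a hypothetical score(partial_source, target) that combines the language-model and translation-model probabilities of a partial hypothesis (the real search also tracks which source words are paired with which target positions):

```python
import heapq

def beam_search(target_words, source_vocab, score, beam_size=5):
    # Start from the empty partial source sentence ("... | *").
    beam = [()]
    for _ in range(len(target_words)):
        # Extend every hypothesis in the beam with each possible source word,
        # then keep only the most promising extensions.
        extended = [hyp + (word,) for hyp in beam for word in source_vocab]
        beam = heapq.nlargest(beam_size, extended,
                              key=lambda hyp: score(hyp, target_words))
    return max(beam, key=lambda hyp: score(hyp, target_words))
```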

  13. Parameter Estimation • In general, large quantities of data are required • For the language model, we need only source language text • For the translation model, we need pairs of sentences that are translations of each other • Use the EM algorithm (Baum 1972) to optimise the model parameters
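A minimal sketch of the EM loop for the word-translation probabilities alone (in the spirit of the simplest of the Brown et al. models); the full training also re-estimates fertility and distortion parameters:

```python
from collections import defaultdict

def em_train(sentence_pairs, iterations=10):
    # sentence_pairs: list of (source_words, target_words) tuples
    # Start from a uniform table: every P(t|s) gets the same initial value.
    t_prob = defaultdict(lambda: 1.0)
    for _ in range(iterations):
        counts, totals = defaultdict(float), defaultdict(float)
        # E-step: collect expected alignment counts under the current table.
        for src, tgt in sentence_pairs:
            for t_word in tgt:
                norm = sum(t_prob[(t_word, s_word)] for s_word in src)
                for s_word in src:
                    c = t_prob[(t_word, s_word)] / norm
                    counts[(t_word, s_word)] += c
                    totals[s_word] += c
        # M-step: re-estimate P(t|s) from the expected counts.
        t_prob = defaultdict(float, {pair: counts[pair] / totals[pair[1]]
                                     for pair in counts})
    return t_prob
```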

  14. Experiment 1 (Brown et al. 1990) • Hansard: 40,000 pairs of sentences, approx. 800,000 words in each language • Considered the 9,000 most common words in each language • Assumptions (initial parameter values): each of the 9,000 target words equally likely as a translation of each source word; each of the fertilities from 0 to 25 equally likely for each of the 9,000 source words; each target position equally likely given each source position and target length
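The uniform starting point described above can be written down directly; this is only a sketch of the initial parameter values, not of the full model:

```python
TARGET_VOCAB = 9000           # 9,000 most common words in each language
MAX_FERTILITY = 25

t_init = 1.0 / TARGET_VOCAB                   # every target word equally likely per source word
fertility_init = 1.0 / (MAX_FERTILITY + 1)    # fertilities 0..25 equally likely

def distortion_init(i, j, target_length):
    # Each target position equally likely, given any source position and length.
    return 1.0 / target_length
```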

  15. English: the • French translation probabilities: le .610, la .178, l’ .083, les .023, ce .013, il .012, de .009, à .007, que .007 • Fertility probabilities: 1 .871, 0 .124, 2 .004

  16. English: not • French translation probabilities: pas .469, ne .460, non .024, pas du tout .003, faux .003, plus .002, ce .002, que .002, jamais .002 • Fertility probabilities: 2 .758, 0 .133, 1 .106

  17. English: hear • French translation probabilities: bravo .992, entendre .005, entendu .002, entends .001 • Fertility probabilities: 0 .584, 1 .416

  18. Bajada 2003/4 • 400 sentence pairs from the Malta/EU accession treaty • Three different types of alignment • Paragraph (precision 97%, recall 97%) • Sentence (precision 91%, recall 95%) • Word: 2 translation models • Model 1: distortion-independent • Model 2: distortion-dependent

  19. Bajada 2003/4

  20. Experiment 2 • Perform translation using the 1,000 most frequent words in the English corpus • The 1,700 most frequently used French words in translations of sentences completely covered by the 1,000-word English vocabulary • 117,000 pairs of sentences completely covered by both vocabularies • Parameters of the English language model estimated from 570,000 sentences in the English part of the corpus

  21. Experiment 2 contd • 73 French sentences tested from elsewhere in the corpus. Results were classified as • Exact – same as the actual translation • Alternate – same meaning • Different – legitimate translation but different meaning • Wrong – could not be interpreted as a translation • Ungrammatical – grammatically deficient • Corrections to the last three categories were made and keystrokes were counted

  22. Results

  23. Results - Discussion • According to Brown et al., the system performed successfully 48% of the time (first three categories) • 776 keystrokes were needed to repair the system output, compared with 1,916 keystrokes to generate all 73 translations from scratch • According to the authors, the system therefore reduces work by about 60% (1 - 776/1916 ≈ 0.595)

  24. Bibliography • Brown et al., A Statistical Approach to Machine Translation, Computational Linguistics 16(2), 1990, pp. 79-85 (search the ACL Anthology)
