Machine Translation

Om Damani

(Ack: Material taken from Jurafsky & Martin, 2nd Ed.; Brown et al. 1993)

State of the Art

The spirit is willing but the flesh is weak

English-Russian Translation System

Дух охотно готов но плоть слаба

Russian-English Translation System

The vodka is good, but the meat is rotten

Babelfish: Spirit is willingly ready but flesh it is weak

Google: The spirit is willing but the flesh is week

State of the Art (English-Hindi) – March 19, 2009

The spirit is willing but the flesh is weak

Google English-Hindi Translation System

आत्मा पर शरीर दुर्बल है

Google Hindi-English Translation System

Spirit on the flesh is weak

Is State of the Art (English-Hindi) so bad?

Is state of the art so bad

Google English-Hindi Translation System

कला की हालत इतनी खराब है

Google Hindi-English Translation System

The state of the art is so bad

State of the English-Hindi translation is not so bad

State of the english hindi translation is not so bad

Google English-Hindi Translation System

राज्य के अंग्रेज़ी हिन्दी अनुवाद का इतना बुरा नहीं है

Google Hindi-English Translation System

State of the English translation of English is not so bad

OK. Maybe it is __ bad.

State of the English-Hindi translation is not so bad

State of the English Hindi translation is not so bad

Google English-Hindi Translation System

राज्य में अंग्रेजी से हिंदी अनुवाद का इतना बुरा नहीं है

राज्य के अंग्रेज़ी हिन्दी अनुवाद का इतना बुरा नहीं है

Google Hindi-English Translation System

English to Hindi translation in the state is not so bad

OK. Maybe it is __ __ bad.

Direct Transfer: Limitations

कई बंगाली कवियों ने इस भूमि के गीत गाए हैं

Kai Bangali kaviyon ne is bhoomi ke geet gaaye hain

Morph: कई बंगाली कवि-PL,OBL ने इस भूमि के गीत {गाए है}-PrPer,Pl

Kai Bangali kavi-PL,OBL ne is bhoomi ke geet {gaaye hai}-PrPer,Pl

Lexical Transfer: Many Bengali poet-PL,OBL this land of songs {sing has}- PrPer,Pl

Local Reordering: Many Bengali poet-PL,OBL of this land songs {has sing}- PrPer,Pl

Final: Many Bengali poets of this land songs have sung

Many Bengali poets have sung songs of this land

Syntax Transfer (Analysis-Transfer-Generation)

Here phrases NP, VP etc. can be arbitrarily large

Syntax Transfer Limitations

He went to Patna -> Vah Patna gaya

He went to Patil -> Vah Patil ke pas gaya

Translation of went depends on the semantics of the object of went

Fatima eats salad with spoon – what happens if you change spoon

Semantic properties need to be included in transfer rules

– Semantic Transfer

Interlingua Based Transfer

For this, you contact the farmers of Manchar region or of Khatav taluka.

In theory: N analysis and N generation modules instead of N^2 transfer modules

In practice: Amazingly complex system to tackle N^2 language pairs

Difficulties in Translation – Language Divergence (Concepts from Dorr 1993, Text/Figures from Dave, Parikh and Bhattacharyya 2002)

Constituent Order

Prepositional Stranding

Null Subject

Conflational Divergence

Categorial Divergence

Lost in Translation: We are talking mostly about syntax, not semantics, or pragmatics


You: Could you give me a glass of water

Robot: Yes.

….wait..wait..nothing happens..wait…

…Aha, I see…

You: Will you give me a glass of water


CheckPoint

  • State of the Art

  • Different Approaches

  • Translation Difficulty

  • Need for a novel approach

Statistical Machine Translation: Most ridiculous idea ever

Consider all possible partitions of a sentence.

For a given partition,

Consider all possible translations of each part.

Consider all possible combinations of all possible translations

Consider all possible permutations of each combination

And somehow select the best partition/translation/permutation

कई बंगाली कवियों ने इस भूमि के गीत गाए हैं

Kai Bangali kaviyon ne is bhoomi ke geet gaaye hain

To this space have sung songs of many poets from Bangal

How many combinations are we talking about

Number of choices for a N word sentence

N=20 ??

Number of possible chess games
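A rough sense of scale, as a sketch: the count below considers only contiguous partitions of the sentence and reorderings of the resulting phrases. It ignores the choice of translation for each phrase, so it understates the search space described above. The function names are illustrative.

```python
from math import comb, factorial


def count_partitions(n):
    """Number of ways to split an n-word sentence into contiguous phrases."""
    # Each of the n-1 gaps between words is either a phrase boundary or not.
    return 2 ** (n - 1)


def count_partitions_with_reordering(n):
    """Partitions into k phrases, times the k! orderings of those phrases."""
    # comb(n-1, k-1) picks the k-1 boundaries that produce exactly k phrases.
    return sum(comb(n - 1, k - 1) * factorial(k) for k in range(1, n + 1))


if __name__ == "__main__":
    for n in (3, 10, 20):
        print(n, count_partitions(n), count_partitions_with_reordering(n))
```

Multiplying in the number of candidate translations per phrase makes the total grow faster still, which is why exhaustive enumeration is not an option.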

How do we get the Phrase Table

इसके लिए आप मंचर क्षेत्र के किसानों से संपर्क कीजिए

For this you contact the farmers of Manchar region

Collect a large amount of bilingual parallel text.

For each sentence pair,

Consider all possible partitions of both sentences

For a given partition pair,

Consider all possible mappings between parts (phrases) on the two sides

Somehow assign a probability to each phrase pair

Data Sparsity Problems in Creating Phrase Table

Sunil is eating mango -> Sunil aam khata hai

Noori is eating banana -> Noori kela khati hai

Sunil is eating banana ->

We need examples of everyone eating everything !!

We want to figure out that eating can be either khata hai or khati hai

And let the Language Model select from ‘Sunil kela khata hai’ and ‘Sunil kela khati hai’

Select well-formed sentences among all candidates using LM

Formulating the Problem

  • A language model to compute P(E)

  • A translation model to compute P(F|E)

  • A decoder, which is given F and produces the most probable E
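The three components fit together through Bayes' rule. Since P(F) is fixed for a given input sentence, the decoder's objective can be written as:

```latex
\hat{E} = \arg\max_{E} P(E \mid F)
        = \arg\max_{E} \frac{P(F \mid E)\, P(E)}{P(F)}
        = \arg\max_{E} P(F \mid E)\, P(E)
```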

P(F|E) vs. P(E|F)

P(F|E) is the translation probability – we need to look at the generation process by which the <F,E> pair is obtained.

Parts of F correspond to parts of E. With suitable independence assumptions, P(F|E) measures whether all parts of E are covered by F.

E can be quite ill-formed.

It is OK if P(F|E) for an ill-formed E is greater than P(F|E) for a well-formed E. Multiplication by P(E) should hopefully take care of it.

We do not have that luxury in estimating P(E|F) directly – we will need to ensure that well-formed E score higher.

Summary: For computing P(F|E), we may make several independence assumptions that are not valid. P(E) compensates for that.

We need to estimate P(It is raining | बारिश हो रही है) vs. P(rain is happening | बारिश हो रही है)

P(बारिश हो रही है|It is raining) = .02

P(बरसात आ रही है| It is raining) = .03

P(बारिश हो रही है|rain is happening) = .420
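To make the comparison concrete, here is a minimal sketch that scores the two English candidates for बारिश हो रही है. The translation probabilities are the ones listed above; the language-model values in `lm` are invented purely for illustration and do not come from the slides.

```python
# P(F|E) for F = "बारिश हो रही है", taken from the figures above.
tm = {"It is raining": 0.02, "rain is happening": 0.42}

# P(E): hypothetical values, chosen only to show a fluent sentence
# receiving far more language-model mass than an awkward one.
lm = {"It is raining": 1e-3, "rain is happening": 1e-6}


def decode(candidates, tm, lm):
    """Return the candidate E that maximizes P(F|E) * P(E)."""
    return max(candidates, key=lambda e: tm[e] * lm[e])


best_by_tm = max(tm, key=tm.get)       # translation model alone
best = decode(list(tm), tm, lm)        # translation model times language model
print(best_by_tm)
print(best)
```

With these assumed numbers the translation model alone prefers the literal candidate, and the product with P(E) reverses the ranking, which is the compensation effect described above.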

CheckPoint

  • From a parallel corpus, generate probabilistic phrase table

  • Given a sentence, generate various candidate translations using the phrase table

  • Evaluate the candidates using Translation and Language Models

What is the meaning of Probability of Translation

  • What is the meaning of P(F|E)

  • By Magic: you simply know P(F|E) for every (E,F) pair – counting in a parallel corpus

  • Or, each word in E generates one word of F, independent of every other word in E or F

  • Or, we need a ‘random process’ to generate F from E

  • A semantic graph G is generated from E and F is generated from G

    • We are no better off. We now have to estimate P(G|E) and P(F|G) for various G and then combine them – How?

    • We may have a deterministic procedure to convert E to G, in which case we still need to estimate P(F|G)

  • A parse tree TE is generated from E; TE is transformed to TF; finally TF is converted into F

    • Can you write the mathematical expression

The Generation Process

  • Partition: Think of all possible partitions of the source language

  • Lexicalization: For a given partition, translate each phrase into the foreign language

  • Spurious insertion: add foreign words that are not attributable to any source phrase

  • Reordering: permute the set of all foreign words - words possibly moving across phrase boundaries

    Try writing the probability expression for the generation process

    We need the notion of alignment

Generation Example: Alignment

Alignment

A function from target position to source position:

The alignment sequence is: 2,3,4,5,6,6,6

Alignment function A: A(1) = 2, A(2) = 3 ..

A different alignment function will give the sequence: 1,2,1,2,3,4,3,4 for A(1), A(2)..

To allow spurious insertion, allow alignment with word 0 (NULL)

No. of possible alignments: (I+1)^J, since each of the J target words aligns to one of the I source words or to NULL


IBM Model 1: Generative Process

IBM Model 1: Basic Formulation

IBM Model 1: Details

  • No assumptions. Above formula is exact.

  • Choosing length: P(J|E) = P(J|E,I) = P(J|I) =

  • Choosing Alignment: all alignments equiprobable

  • Translation Probability
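For reference, a sketch of the formulas these bullets point to, assuming the standard Model 1 presentation of Brown et al. (1993), in which the length probability is a small constant ε. The first line is the exact chain-rule decomposition; the second applies the three Model 1 choices (constant length probability, uniform alignments over I+1 positions, word-to-word translation probability t).

```latex
P(F, A \mid E) = P(J \mid E) \prod_{j=1}^{J}
    P(a_j \mid a_1^{j-1}, f_1^{j-1}, J, E)\;
    P(f_j \mid a_1^{j}, f_1^{j-1}, J, E)

P(F, A \mid E) = \frac{\epsilon}{(I+1)^{J}} \prod_{j=1}^{J} t(f_j \mid e_{a_j}),
\qquad
P(F \mid E) = \sum_{A} P(F, A \mid E)
```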

HMM Alignment

  • All alignments are not equally likely

  • Can you guess what properties an alignment has?

  • Alignments tend to be locality preserving – neighboring words tend to get aligned together

  • We would like P(a_j) to depend on a_{j-1}

HMM Alignment: Details

  • P(F,A|J,E) decomposed as P(A|J,E)*P(F|A,J,E) in Model 1

  • Now we will decompose it differently

    • (J is implicit, not mentioned in conditional expressions)

  • Alignment Assumption (Markov): Alignment probability of the jth word, P(a_j), depends only on the alignment of the previous word, a_{j-1}

  • Translation assumption: probability of the foreign word f_j depends only on the aligned English word e_{a_j}
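Under these two assumptions the joint probability factorizes as follows. This is a sketch in the notation of the bullets above, with a_0 standing for a start state:

```latex
P(F, A \mid E) = P(J \mid I) \prod_{j=1}^{J}
    P(a_j \mid a_{j-1}, I)\; P(f_j \mid e_{a_j}),
\qquad a_0 = \text{start}
```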

Computing the Alignment Probability

  • P(a_j|a_{j-1}, I) is written as P(i|i’, I)

  • Assume the probability does not depend on absolute word positions but on the jump-width (i-i’) between words: P(4 | 6, 17) = P(5 | 7, 17)

    • Note: Denominator counts are collected over sentences of all lengths. But the sum is performed over only those jump-widths relevant to (i,i’) – for i’=6 and I=17, -5 to 11 is relevant
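A minimal sketch of the jump-width estimate described above: P(i | i', I) is taken to be the count of jump i − i' divided by the summed counts of all jumps that land inside a sentence of length I. The `jump_counts` table is hypothetical toy data, not drawn from any corpus, and the function names are illustrative.

```python
from collections import Counter


def relevant_jumps(i_prev, I):
    """Jump widths from position i_prev that land on a valid position 1..I."""
    return range(1 - i_prev, I - i_prev + 1)


def jump_prob(i, i_prev, I, jump_counts):
    """P(i | i_prev, I) from jump-width counts c(i - i_prev)."""
    denom = sum(jump_counts[d] for d in relevant_jumps(i_prev, I))
    return jump_counts[i - i_prev] / denom if denom else 0.0


# Hypothetical counts: short forward jumps are the most frequent.
jump_counts = Counter({-2: 5, -1: 10, 0: 15, 1: 50, 2: 15, 3: 5})

jumps = relevant_jumps(6, 17)
print(jumps[0], jumps[-1])
print(jump_prob(4, 6, 17, jump_counts))
```

Because the normalizer only sums over jumps that stay inside the sentence, the probabilities over i = 1..I add up to one for every previous position i'.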

HMM Model - Example

P(F,A|E) = P(J=10|I=9)*P(2|start,9)*P(इसके|this)*P(-1|2,9)


Enhancing the HMM model

  • Add NULL words in the English to which foreign words can align

  • Condition the alignment on the word class of the previous English word

  • Other suggestions ??

  • What is the problem in making more realistic assumptions

    • How to estimate the parameters of the model

Checkpoint

  • Generative Process is important for computing probability expressions

  • Model1 and HMM model

  • What about Phrase Probabilities

Training Alignment Models

  • Given a parallel corpus, for each (F,E) learn the best alignment A and the component probabilities:

    • t(f|e) for Model 1

    • lexicon probability P(f|e) and alignment probability P(a_j|a_{j-1},I) for the HMM model

  • How will you compute these probabilities if all you have is a parallel corpus

Intuition: Interdependence of Probabilities

  • If you knew which words are probable translations of each other then you can guess which alignment is probable and which one is improbable

  • If you were given alignments with probabilities then you can compute translation probabilities

  • Looks like a chicken and egg problem

  • Can you write equations expressing one in terms of other

Computing Alignment Probabilities

  • Alignment prob. in terms of translation prob.:


  • Compute P(A) in terms of P(A,F)

    • Note: Prior Prob. for all Alignments are equal. We are interested in posterior probabilities.

  • Can you specify translation prob. in terms of align. prob.
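One way to write the first relation, as a sketch under Model 1, where the uniform alignment prior and the length term cancel in the normalization:

```latex
P(A \mid F, E) = \frac{P(F, A \mid E)}{\sum_{A'} P(F, A' \mid E)}
              = \frac{\prod_{j=1}^{J} t(f_j \mid e_{a_j})}
                     {\sum_{A'} \prod_{j=1}^{J} t(f_j \mid e_{a'_j})}
```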

Computing Translation Probabilities

What if alignments had probabilities

P(संपर्क | contact) = (.5*1+.3*1+.9*0)/(.5*3+.3*2+.9*1)

Note: It is not .5*1/3 + .3*1/2 + .9*0
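The weighted count can be checked with a few lines of code. In this sketch each tuple holds an alignment's probability, the number of times contact is aligned to संपर्क under it, and the total count that appears for that alignment in the denominator above. Reading the 3, 2, 1 as totals for contact is an assumption, and the variable names are illustrative.

```python
# (alignment probability, count(contact -> संपर्क), total count for contact)
observations = [(0.5, 1, 3), (0.3, 1, 2), (0.9, 0, 1)]

# Ratio of expected counts: weight the numerator and denominator separately.
numerator = sum(p * hit for p, hit, _ in observations)
denominator = sum(p * total for p, _, total in observations)
p_weighted = numerator / denominator

# Summing weighted per-alignment ratios instead gives a different number.
p_ratio_sum = sum(p * hit / total for p, hit, total in observations)

print(round(p_weighted, 4), round(p_ratio_sum, 4))
```

The first quantity is a ratio of expected counts, which is what the maximum likelihood estimate calls for; the second is not normalized the same way.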

Computing Translation Probabilities – Maximum Likelihood Estimate

Expectation Maximization (EM) Algorithm

  • Used when we want the maximum likelihood estimate of the parameters of a model when the model depends on hidden variables

  • In the present case, the parameters are translation probabilities, and the hidden variables are alignment probabilities

Init: Start with an arbitrary estimate of parameters

E-step: compute the expected value of hidden variables

M-step: Recompute the parameters that maximize the likelihood of data given the expected value of the hidden variables from E-step

Working out alignments for a simplified Model 1

  • Ignore the NULL words

  • Assume that every English word aligns with some foreign word (just to reduce the number of alignments for the illustration)

Example of EM

Green house

Casa verde

The house

La casa

Init: Assume that any word can generate any word with equal prob:

P(la|house) = 1/3

E-Step

M-Step

E-Step again

Repeat till convergence
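The iterations can be traced with a short program. This is a minimal sketch of the simplified model described above (no NULL, one-to-one alignments, so each sentence pair has one alignment per permutation); it assumes both sentences in a pair have the same length, and the names `em_iteration`, `t0`, `t1`, `t2` are illustrative.

```python
from collections import defaultdict
from itertools import permutations

corpus = [(["green", "house"], ["casa", "verde"]),
          (["the", "house"], ["la", "casa"])]


def em_iteration(t, corpus):
    """One E-step plus M-step over explicitly enumerated one-to-one alignments."""
    counts = defaultdict(float)
    totals = defaultdict(float)
    for e_words, f_words in corpus:
        # a[j] is the English position aligned to foreign position j.
        alignments = list(permutations(range(len(e_words))))
        joint = []
        for a in alignments:
            p = 1.0
            for j, f in enumerate(f_words):
                p *= t[(f, e_words[a[j]])]
            joint.append(p)
        z = sum(joint)
        # E-step: posterior of each alignment, spread as fractional counts.
        for a, p in zip(alignments, joint):
            for j, f in enumerate(f_words):
                e = e_words[a[j]]
                counts[(f, e)] += p / z
                totals[e] += p / z
    # M-step: normalize the fractional counts per English word.
    return {(f, e): c / totals[e] for (f, e), c in counts.items()}


# Init: any English word generates any of the 3 foreign words with prob. 1/3.
f_vocab = {f for _, f_words in corpus for f in f_words}
e_vocab = {e for e_words, _ in corpus for e in e_words}
t0 = {(f, e): 1.0 / len(f_vocab) for f in f_vocab for e in e_vocab}

t1 = em_iteration(t0, corpus)
t2 = em_iteration(t1, corpus)
print(t1[("casa", "house")], t2[("casa", "house")])
```

Each further call to `em_iteration` moves more probability mass onto the pairs that co-occur consistently, such as casa and house.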

Computing Translation Probabilities in Model 1

  • The EM algorithm is fine, but it requires exponential computation

    • For each alignment we recompute alignment probability

    • Translation probability is computed from all alignment probabilities

  • We need an efficient algorithm
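A sketch of the usual way around the exponential sum, assuming full Model 1 without NULL, where each foreign word may align to any English word (so this is not the one-to-one simplification used for the illustration). Because the alignment positions are independent, the posterior that f aligns to e is t(f|e) divided by the sum of t(f|e') over the English words in the sentence, and expected counts can be collected word pair by word pair. Function and variable names are illustrative.

```python
from collections import defaultdict


def model1_em(corpus, iterations=10):
    """Estimate t(f|e) with EM in O(I * J) time per sentence pair per iteration."""
    f_vocab = {f for _, f_words in corpus for f in f_words}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # Init: every pair equally likely

    for _ in range(iterations):
        counts = defaultdict(float)  # expected count of (f, e) links
        totals = defaultdict(float)  # expected count of links from e
        # E-step: link posteriors, computed independently per foreign word.
        for e_words, f_words in corpus:
            for f in f_words:
                z = sum(t[(f, e)] for e in e_words)
                for e in e_words:
                    post = t[(f, e)] / z
                    counts[(f, e)] += post
                    totals[e] += post
        # M-step: renormalize the expected counts per English word.
        t = defaultdict(float)
        for (f, e), c in counts.items():
            t[(f, e)] = c / totals[e]
    return t


corpus = [(["green", "house"], ["casa", "verde"]),
          (["the", "house"], ["la", "casa"])]
t = model1_em(corpus)
for pair in [("casa", "house"), ("verde", "green"), ("la", "the")]:
    print(pair, round(t[pair], 3))
```

No alignment is ever enumerated: the inner loops touch each (foreign word, English word) pair of a sentence once per iteration.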

Checkpoint

  • Use of EM algorithm for estimating phrase probabilities under IBM Model-1

  • An example

  • And an efficient algorithm

Generating Bi-directional Alignments

  • Existing models only generate uni-directional alignments

  • Combine two uni-directional alignments to get many-to-many bi-directional alignments

Eng-Hindi Alignment

Hindi-Eng Alignment

Combining Alignments


P=2/3=.67, R=2/7=.3



A Different Heuristic from Moses-Site


neighboring = ((-1,0),(0,-1),(1,0),(0,1),(-1,-1),(-1,1),(1,-1),(1,1))
alignment = intersect(e2f,f2e);
GROW-DIAG(); FINAL(e2f); FINAL(f2e);

GROW-DIAG():
  iterate until no new points added
    for english word e = 0 ... en
      for foreign word f = 0 ... fn
        if ( e aligned with f )
          for each neighboring point ( e-new, f-new ):
            if ( ( e-new, f-new ) in union( e2f, f2e ) and
                 ( e-new not aligned and f-new not aligned ) )
              add alignment point ( e-new, f-new )

FINAL(a):
  for english word e-new = 0 ... en
    for foreign word f-new = 0 ... fn
      if ( ( ( e-new, f-new ) in alignment a ) and
           ( e-new not aligned or f-new not aligned ) )
        add alignment point ( e-new, f-new )
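The pseudocode above can be sketched in runnable form. This version follows the conditions exactly as written here ("and" in the grow step, "or" in the final step) and represents each directional alignment as a set of (e, f) index pairs; that representation and the helper names are assumptions made for the example.

```python
NEIGHBORS = [(-1, 0), (0, -1), (1, 0), (0, 1),
             (-1, -1), (-1, 1), (1, -1), (1, 1)]


def grow_diag_final(e2f, f2e):
    """Symmetrize two directional alignments given as sets of (e, f) pairs."""
    union = e2f | f2e
    alignment = set(e2f & f2e)  # start from the intersection

    def e_aligned(e):
        return any(pe == e for pe, _ in alignment)

    def f_aligned(f):
        return any(pf == f for _, pf in alignment)

    # GROW-DIAG: add union points next to existing points while both
    # of their words are still unaligned; repeat until nothing changes.
    added = True
    while added:
        added = False
        for e, f in sorted(alignment):
            for de, df in NEIGHBORS:
                e_new, f_new = e + de, f + df
                if ((e_new, f_new) in union
                        and not e_aligned(e_new) and not f_aligned(f_new)):
                    alignment.add((e_new, f_new))
                    added = True

    # FINAL: add leftover directional points that cover an unaligned word.
    for directional in (e2f, f2e):
        for e_new, f_new in sorted(directional):
            if not e_aligned(e_new) or not f_aligned(f_new):
                alignment.add((e_new, f_new))
    return alignment


e2f = {(0, 0), (1, 1)}
f2e = {(0, 0), (1, 0)}
result = grow_diag_final(e2f, f2e)
print(sorted(result))
```

In this toy input the point (1, 1) is adopted by the grow step because both of its words are free, after which (1, 0) is rejected by the final step because both of its words are already covered.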

Proposed Changes:

After growing diagonal, align the shorter sentence first, and use alignments only from the corresponding directional alignment

Generating Phrase Alignments

premier beach vacation

a premier beach vacation destination

एक प्रमुख समुद्र-तटीय गंतव्य है

Phrase Alignment Probabilities

  • We have been dealing with just one sentence pair.

  • In fact, we have been dealing with just one alignment – the most probable alignment

  • Such an alignment can easily have mistakes, and generate garbage phrases

  • Compute phrase alignment probabilities over entire corpus
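One common way to do this, offered here as an assumption rather than as the method of these slides, is relative-frequency estimation: pool the phrase pairs extracted from every aligned sentence pair, then divide each pair's count by the total count of its English phrase. The extracted pairs below are hypothetical.

```python
from collections import Counter, defaultdict


def phrase_table(extracted_pairs):
    """Relative-frequency estimate P(f_phrase | e_phrase) from (e, f) pairs."""
    pair_counts = Counter(extracted_pairs)
    e_totals = Counter(e for e, _ in extracted_pairs)
    table = defaultdict(dict)
    for (e, f), c in pair_counts.items():
        table[e][f] = c / e_totals[e]
    return table


# Hypothetical phrase pairs pooled from several aligned sentence pairs.
pairs = [
    ("the farmers", "kisanon"),
    ("the farmers", "kisanon"),
    ("the farmers", "kisanon se"),
    ("this land", "is bhoomi"),
]
table = phrase_table(pairs)
print(table["the farmers"])
```

Pooling counts over the whole corpus means a garbage phrase produced by one bad alignment ends up with a small probability instead of being trusted outright.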

IBM Model 3

Model 1 Generative Story

  • Model 1 story seems bizarre

  • Who would first choose the sentence length, then align, and then generate?

  • A more likely case is: generate a translation for each word and then reorder

Model 3 Generative Story

Model 3 Formula

  • Ignore NULL for a moment

  • Choosing Fertility:

  • Generating words:

  • Aligning words:
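For reference, a sketch of the three factors with NULL ignored, assuming the standard Model 3 parameters of Brown et al. (1993): fertility n(φ|e), translation t(f|e), and distortion d(j|i, I, J), where φ_i is the number of foreign words generated by e_i.

```latex
P(F, A \mid E) =
    \underbrace{\prod_{i=1}^{I} n(\phi_i \mid e_i)\, \phi_i!}_{\text{choosing fertility}}
    \;\times\;
    \underbrace{\prod_{j=1}^{J} t(f_j \mid e_{a_j})}_{\text{generating words}}
    \;\times\;
    \underbrace{\prod_{j=1}^{J} d(j \mid a_j, I, J)}_{\text{aligning words}}
```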

Generating Spurious Words

  • Instead of using n(2|NULL) or n(1|NULL)

  • With probability p1, generate a spurious word every time a valid word is generated

  • Ensures that longer sentences generate more spurious words

इसके लिए आप मंचर क्षेत्र के किसानों से संपर्क कीजिए

For this you contact the farmers of Manchar region

इसके लिए किसानों से मिलिये

For this you contact the farmers

OchNey03 Heuristic: Intuition

  • Decide the intersection

  • Extend it by adding alignments from the union if both the words in union alignment are not already aligned in the final alignment

  • Then add an alignment only if:

    • It already has an adjacent alignment in the final alignment, and,

    • Adding it will not cause any final alignment to have both horizontal and vertical neighbors as final alignments

SMT Example

कई बंगाली कवियों ने इस भूमि के गीत गाए हैं

Kai Bangali kaviyon ne is bhoomi ke geet gaaye hain

To this space have sung songs of many poets from Bangal

Translation Model - Notations

  • F: f1, f2,..,fJ ; E: e1, e2,..,eI

  • P(F|E) is not the same as P (f1..fJ | e1..eI)

    • What is P(फातिमा चावल खाती है| Fatima eats rice)

  • P(F|E) = P (J, f1..fJ | I, e1..eI)

    • We explicitly mention I and J only when needed

  • We will work with above formulation instead of the alternative

    • P(F|E) = P (w1(F)=f1..wJ(F)=fJ w[J+1](F)=$ | …)