
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 17 – Alignment in SMT)






Presentation Transcript


  1. CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 17 – Alignment in SMT). Pushpak Bhattacharyya, CSE Dept., IIT Bombay. 14th Feb, 2011

  2. Language Divergence Theory: Lexico-Semantic Divergences (ref: Dave, Parikh, Bhattacharyya, Journal of MT, 2002)
  • Conflational divergence: F: vomir; E: to be sick. E: stab; H: churaa se maaranaa (knife-with hit). S: utrymningsplan; E: escape plan
  • Structural divergence: E: SVO; H: SOV
  • Categorial divergence: the change is in POS category (many examples discussed)
  • Head-swapping divergence: E: Prime Minister of India; H: bhaarat ke pradhaan mantrii (India-of Prime Minister)
  • Lexical divergence: E: advise; H: paraamarsh denaa (advice give): noun incorporation, a very common Indian-language phenomenon

  3. Language Divergence Theory: Syntactic Divergences
  • Constituent-order divergence: E: Singh, the PM of India, will address the nation today; H: bhaarat ke pradhaan mantrii, Singh, … (India-of PM, Singh, …)
  • Adjunction divergence: E: She will visit here in the summer; H: vah yahaa garmii meM aayegii (she here summer-in will come)
  • Preposition-stranding divergence: E: Who do you want to go with?; H: kisake saath aap jaanaa chaahate ho? (who with …)
  • Null-subject divergence: E: I will go; H: jaauMgaa (subject dropped)
  • Pleonastic divergence: E: It is raining; H: baarish ho rahii hai (rain happening is: no translation of "it")

  4. Alignment
  • Completely aligned: E: Your answer is right; F: Votre réponse est juste
  • Problematic alignment: E: We first met in Paris; F: Nous nous sommes rencontrés pour la première fois à Paris

  5. The Statistical MT model: notation
  • Source language: F
  • Target language: E
  • Source-language sentence: f
  • Target-language sentence: e
  • Source-language word: w_f
  • Target-language word: w_e

  6. The Statistical MT model. To translate f:
  • Assume that all sentences in E are translations of f, with some probability!
  • Choose the translation with the highest probability (formalized below)
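This decision rule is the standard noisy-channel formulation, stated here for completeness; it is implicit in slides 6 and 7:

\[
\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} \frac{P(e)\,P(f \mid e)}{P(f)} = \arg\max_{e} P(e)\,P(f \mid e)
\]

since P(f) is fixed for the given input. P(e) is the language model and P(f|e) the translation model.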

  7. SMT Model. What is a good translation?
  • Faithful to the source (faithfulness, captured by the translation model P(f|e))
  • Fluent in the target (fluency, captured by the language model P(e))

  8. Language Modeling
  • Task: to find P(e), i.e., to assign probabilities to sentences
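By the chain rule (a standard identity, stated here to make the task concrete):

\[
P(e) = P(w_1 w_2 \ldots w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1})
\]

The N-gram approximation on the next slide truncates each conditioning history to the previous N-1 words.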

  9. Language Modeling: The N-gram approximation
  • Approximate the probability of a word given all previous words by its probability given only the previous N-1 words
  • N=2: bigram approximation; N=3: trigram approximation
  • Bigram approximation: P(e) ≈ ∏ P(w_i | w_{i-1}) (toy sketch below)
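A toy sketch of a count-based bigram model (maximum-likelihood estimates, no smoothing; the function name and corpus are illustrative, not from the lecture):

    from collections import defaultdict

    def train_bigram(sentences):
        """Estimate P(w2 | w1) by maximum likelihood from raw counts."""
        unigram = defaultdict(int)   # counts of history words w1
        bigram = defaultdict(int)    # counts of adjacent pairs (w1, w2)
        for sent in sentences:
            words = ["<s>"] + sent.split() + ["</s>"]  # sentence boundaries
            for w1, w2 in zip(words, words[1:]):
                unigram[w1] += 1
                bigram[(w1, w2)] += 1
        def prob(w1, w2):
            return bigram[(w1, w2)] / unigram[w1] if unigram[w1] else 0.0
        return prob

    p = train_bigram(["your answer is right"])
    # P(e) is approximated by the product of bigram probabilities:
    print(p("your", "answer") * p("answer", "is") * p("is", "right"))  # 1.0 here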

  10. Translation Modeling
  • Task: to find P(f|e)
  • Cannot use raw counts of the sentences f and e: whole sentences are far too sparse even in a large corpus
  • Approximation: estimate P(f|e) as a product of word translation probabilities (IBM Model 1; see the formula below)
  • Problem: how to calculate the word translation probabilities? We do not have the counts: the training corpus is sentence-aligned, not word-aligned
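Concretely, in the standard Brown et al. form of IBM Model 1 (stated here for orientation; the derivation comes in slides 31-34):

\[
P(f \mid e) = \frac{\varepsilon}{(l+1)^m} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)
\]

where m and l are the lengths of f and e, and the t(f_j | e_i) are the word translation probabilities that training must estimate.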

  11. Word-alignment example
  E: Ram(1) has(2) an(3) apple(4)
  H: राम(1) के(2) पास(3) एक(4) सेब(5) है(6)
  Note that "has" corresponds to the discontinuous के पास … है, so the alignment is not one-to-one.

  12. Expectation Maximization for the translation model

  13. Expectation-Maximization algorithm
  • Start with uniform word translation probabilities
  • Use these probabilities to find the (fractional) expected counts
  • Use these new counts to recompute the word translation probabilities
  • Repeat the above steps till the values converge (code sketch below)
  This works because words that are actually translations of each other systematically co-occur across sentence pairs. It can be proven that EM converges.
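A minimal runnable sketch of this loop for IBM Model 1, ignoring the NULL word and the constant ε/(l+1)^m (neither affects the t(f|e) updates); the function and variable names are illustrative, not from the lecture:

    from collections import defaultdict

    def train_ibm1(corpus, iterations=5):
        """corpus: list of (english_tokens, foreign_tokens) sentence pairs."""
        f_vocab = {f for _, fs in corpus for f in fs}
        # Step 1: uniform word translation probabilities t[(f, e)] = t(f | e)
        t = defaultdict(lambda: 1.0 / len(f_vocab))
        for _ in range(iterations):
            count = defaultdict(float)   # fractional counts c(f | e)
            total = defaultdict(float)   # per-English-word normalizer
            # Step 2 (E-step): collect expected counts from every sentence pair
            for es, fs in corpus:
                for f in fs:
                    z = sum(t[(f, e)] for e in es)   # t(f|e_0) + ... + t(f|e_l)
                    for e in es:
                        c = t[(f, e)] / z
                        count[(f, e)] += c
                        total[e] += c
            # Step 3 (M-step): renormalize the counts into probabilities
            for f, e in count:
                t[(f, e)] = count[(f, e)] / total[e]
            # Step 4: repeat (a fixed iteration budget stands in for convergence)
        return t

    # The two-sentence corpus of slide 17 (three rabbits / rabbits of Grenoble):
    corpus = [("a b".split(), "w x".split()),
              ("b c d".split(), "x y z".split())]
    t = train_ibm1(corpus)
    print(t[("x", "b")])   # the b-x (rabbits-lapins) binding strengthens per iteration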

  14. The counts in IBM Model 1
  EM works by maximizing P(f|e) over the entire corpus. For IBM Model 1 the expected count of a word pair in a sentence pair comes out as:

  c(e→f; e→f) = [ t(e→f) / ( t(e→f_1) + … + t(e→f_m) ) ] × #(e in e) × #(f in f)

  where f_1 … f_m are the words of the foreign sentence f; slide 20 instantiates this formula.

  15. The translation probabilities in IBM Model 1
  The revised probabilities renormalize the expected counts over all foreign words, summed over every sentence pair in the corpus:

  t_revised(e→f) = Σ_pairs c(e→f) / Σ_{f'} Σ_pairs c(e→f')

  Slide 22 instantiates this for t_revised(a→w).

  16. English-French example of alignment
  • Completely aligned: Your(1) answer(2) is(3) right(4); Votre(1) réponse(2) est(3) juste(4). Alignment: 1→1, 2→2, 3→3, 4→4
  • Problematic alignment: We(1) first(2) met(3) in(4) Paris(5); Nous(1) nous(2) sommes(3) rencontrés(4) pour(5) la(6) première(7) fois(8) à(9) Paris(10). Alignment: 1→(1,2), 2→(5,6,7,8), 3→4, 4→9, 5→10
  • Fertility?: yes (a single English word can produce several French words)

  17. EM for word alignment from sentence alignment: example
  English: (1) three rabbits → a b; (2) rabbits of Grenoble → b c d
  French: (1) trois lapins → w x; (2) lapins de Grenoble → x y z

  18. Initial Probabilities: each cell denotes t(a→w), t(a→x), etc.
  With four French words {w, x, y, z}, every cell starts uniform at 1/4.

  19. The counts in IBM Model 1 (slide 14 repeated for reference)
  EM works by maximizing P(f|e) over the entire corpus, using the expected-count relationship given on slide 14.

  20. Example of expected count

  c(a→w; (a b)→(w x)) = [ t(a→w) / ( t(a→w) + t(a→x) ) ] × #(a in ‘a b’) × #(w in ‘w x’)
                      = [ (1/4) / (1/4 + 1/4) ] × 1 × 1 = 1/2

  21. “counts”
  [Table of fractional counts for every co-occurring word pair: by the computation above, each count within the pair (a b)→(w x) is 1/2 and each count within (b c d)→(x y z) is 1/3.]

  22. Revised probability: example

  t_revised(a→w) = (1/2) / [ (1/2 + 1/2 + 0 + 0) from (a b)→(w x) + (0 + 0 + 0 + 0) from (b c d)→(x y z) ] = 1/2

  23. Revised probabilities table
  [Table of t(·→·) after the first iteration; not reproduced in the transcript.]

  24. “revised counts”
  [Table of fractional counts recomputed from the revised probabilities; not reproduced in the transcript.]

  25. Re-revised probabilities table
  Continue until convergence; notice that the (b, x) binding gets progressively stronger. [Table not reproduced in the transcript.]

  26. Another Example
  A four-sentence corpus (two sentence pairs):
  a b ↔ x y (illustrated book ↔ livre illustré)
  b c ↔ x z (book shop ↔ livre magasin)
  Assuming no null alignments, each pair has two possible alignments:
  (a b, x y): {a→x, b→y} or {a→y, b→x}
  (b c, x z): {b→x, c→z} or {b→z, c→x}
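Working this example by hand, under uniform initialization over the French vocabulary {x, y, z} (every t = 1/3) and the per-English-word normalization used on slide 20, the first expected count is

\[
c(b \to x) = \frac{t(b \to x)}{t(b \to x) + t(b \to y)} = \frac{1/3}{2/3} = \frac{1}{2}
\]

in the first pair, and likewise 1/2 in the second, so c(b→x) totals 1; normalizing gives t(b→x) = 1/2 and t(b→y) = t(b→z) = 1/4. Iterating, t(b→x) rises 1/3 → 1/2 → 2/3 → 4/5 → …, approaching 1: the book↔livre correspondence is learned from co-occurrence alone.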

  27. Iteration 1
  [Count and probability tables for iteration 1; not reproduced in the transcript.]

  28. Iteration 2
  [Count and probability tables for iteration 2; not reproduced in the transcript.]

  29. Normalized probabilities: after iteration 2
  [Table not reproduced in the transcript.]

  30. Normalized probabilities: after iteration 3
  [Table not reproduced in the transcript.]

  31. Translation Model: Exact expression
  • Five models for estimating the parameters in the expression [2]: Model-1, Model-2, Model-3, Model-4, Model-5
  • The generative story behind the expression: choose the length of the foreign-language string given e; choose the alignment given e and m; choose the identity of each foreign word given e, m, and a
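In the standard Brown et al. form (reproduced here for reference), those three choices define:

\[
P(f \mid e) = \sum_{a} P(f, a \mid e), \qquad
P(f, a \mid e) = P(m \mid e) \prod_{j=1}^{m} P(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e)\; P(f_j \mid a_1^{j}, f_1^{j-1}, m, e)
\]

where m is the length of f and a = (a_1, …, a_m) is the alignment of foreign positions to English positions.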

  32. Proof of Translation Model: Exact expression

  P(f | e) = Σ_m P(f, m | e)          (marginalization over m)
           = Σ_m Σ_a P(f, a, m | e)   (marginalization over a)

  m is fixed for a particular f, hence the outer sum contributes only one term:

  P(f | e) = Σ_a P(f, a, m | e)

  33. Model-1
  • Simplest model
  • Assumptions:
  • P(m|e) is independent of m and e, and is equal to ε
  • The alignment of each foreign-language word (FLW) depends only on the length of the English sentence: every English position is equally likely, with probability (l+1)^-1, where l is the length of the English sentence
  • The likelihood function then takes the form below; maximize it subject to the constraint Σ_f t(f|e) = 1 for each English word e
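Under these two assumptions the likelihood takes the standard Model-1 form:

\[
P(f \mid e) = \frac{\varepsilon}{(l+1)^m} \sum_{a} \prod_{j=1}^{m} t(f_j \mid e_{a_j})
            = \frac{\varepsilon}{(l+1)^m} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)
\]

The second equality, exchanging the sum over alignments with the product over positions, is what makes Model-1 training tractable.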

  34. Model-1: Parameter estimation
  • Using a Lagrange multiplier for the constrained maximization, the solution for the Model-1 parameters is

  t(f|e) = λ_e^-1 c(f|e; f, e)

  • λ_e: normalization constant; c(f|e; f, e): expected count; δ(f, f_j) is 1 if f and f_j are the same word, zero otherwise:

  c(f|e; f, e) = [ t(f|e) / ( t(f|e_0) + … + t(f|e_l) ) ] Σ_{j=1..m} δ(f, f_j) Σ_{i=0..l} δ(e, e_i)

  • Estimate t(f|e) using the Expectation-Maximization (EM) procedure
