
CS224N Section 2: PA2 & EM



Presentation Transcript


    1. CS224N Section 2: PA2 & EM Shrey Gupta, January 21, 2011

    2. Outline for today: interactive session! Brief review of MT, examples, and a brief EM review.

    3. Statistical Machine Translation: P(e|f) = P(f|e) * P(e) / P(f), so argmax_e P(e|f) = argmax_e P(f|e) * P(e). Language models (P(e)) help alleviate shortcomings of P(f|e).
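
As a minimal sketch of this decision rule in code: the denominator P(f) can be dropped because it is constant across candidate translations, and the two model scores are combined in log space. The scoring functions here are hypothetical stand-ins, not part of PA2:

    def decode(f, candidates, tm_logprob, lm_logprob):
        """Pick the candidate e maximizing P(f|e) * P(e).

        Works in log space; P(f) is ignored since it does not depend on e.
        tm_logprob and lm_logprob are hypothetical stand-ins for a trained
        translation model and language model.
        """
        return max(candidates, key=lambda e: tm_logprob(f, e) + lm_logprob(e))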

    4. Concepts: translation probabilities (t), distortion probabilities (d), fertility (φ), and the NULL word.
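
One possible way to lay these parameters out in code (the names and shapes below are illustrative assumptions, not the PA2 starter-code API):

    from collections import defaultdict

    t = defaultdict(float)  # t[(f_word, e_word)]: translation prob t(f|e)
    d = defaultdict(float)  # d[bucket]: distortion prob per bucket (Model 2)
    # Fertility is a Model 3+ concept; in Models 1 and 2 the NULL word is
    # handled simply by prepending a NULL token to every English sentence.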

    5. PA2 Requirements: naïve model, IBM Model 1, IBM Model 2, and integration with the decoder.

    6. IBM Model 1: the simplest of the IBM models. It does not consider word order (a bag-of-words approach) and does not model one-to-many alignments, but it is computationally inexpensive and useful for producing parameter estimates that are passed on to more elaborate models.

    7. IBM Model 1: we learn only the translation probabilities.
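
For reference, in the standard formulation (Brown et al., 1993), with the NULL word as e_0, an English sentence of length l, and a foreign sentence of length m, the Model 1 likelihood depends only on t:

    P(f \mid e) = \frac{\epsilon}{(l + 1)^m} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)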

    8. IBM Model 1 Steps: initialize the probabilities uniformly; E-step; M-step; calculate the likelihood to track progress; repeat until convergence. Let's do an example (see the sketch below).
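
A minimal sketch of these steps in Python, assuming a corpus of (foreign, English) word-list pairs. This illustrates the algorithm itself, not the PA2 starter code:

    from collections import defaultdict

    def train_model1(corpus, n_iters=10):
        """EM training of IBM Model 1 translation probabilities t(f|e)."""
        # Prepend the NULL word to every English sentence.
        corpus = [(f, ["<NULL>"] + e) for f, e in corpus]
        # Initialize t(f|e) uniformly over the foreign vocabulary.
        f_vocab = {w for f, _ in corpus for w in f}
        t = defaultdict(lambda: 1.0 / len(f_vocab))

        for _ in range(n_iters):
            count = defaultdict(float)  # expected count(f, e)
            total = defaultdict(float)  # expected count(e)
            # E-step: collect fractional counts over all alignments.
            for f_sent, e_sent in corpus:
                for fw in f_sent:
                    z = sum(t[(fw, ew)] for ew in e_sent)  # normalizer
                    for ew in e_sent:
                        c = t[(fw, ew)] / z  # posterior weight of this link
                        count[(fw, ew)] += c
                        total[ew] += c
            # M-step: re-estimate t(f|e) from the fractional counts.
            for (fw, ew), c in count.items():
                t[(fw, ew)] = c / total[ew]
        return t

    # Example usage on a toy corpus:
    # t = train_model1([(["la", "maison"], ["the", "house"]),
    #                   (["la", "fleur"], ["the", "flower"])])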

    9. IBM Model 2: in Model 2 we learn the translation probabilities and also the distortion probabilities.

    10. IBM Model 2 tries to learn the alignment probabilities in addition to the translation probabilities. The alignment probabilities are handled at an abstract level, by grouping alignment pairs into buckets. Let the number of buckets be N (indexed 0 to N-1). For an alignment pair, compute a bucket index n from the aligned positions; the pair is placed in bucket n if n < N-1, and in bucket N-1 otherwise.
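
The slide leaves the exact bucketing formula open; as one plausible choice (an assumption on our part, not the slide's definition), the sketch below buckets by the distance between relative positions and clamps overflow into the last bucket:

    def bucket(i, j, l_e, l_f, n_buckets):
        """Map an alignment pair (English position i, foreign position j)
        to a distortion bucket. The distance measure here is an assumed
        example, not the formula from the slides."""
        n = abs(int(i * l_f / l_e) - j)
        return min(n, n_buckets - 1)  # n >= N-1 falls into the last bucket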

    11. IBM Model 2: during the E-step we also collect fractional counts for each bucket, and in the M-step we normalize them to obtain a true probability distribution. Many implementations are possible: a variable number of buckets, signed buckets, or hand-fixed weights.
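
Extending the Model 1 sketch above, the E-step additionally accumulates a fractional count per bucket, and the M-step renormalizes those counts. Again an illustrative sketch, reusing the hypothetical bucket() helper from above:

    # In the E-step, alongside the translation counts:
    #     d_count[bucket(i, j, l_e, l_f, n_buckets)] += c
    # In the M-step, renormalize so the buckets form a distribution:
    def normalize(d_count):
        z = sum(d_count.values())
        return {b: c / z for b, c in d_count.items()}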

    12. EM Revisited: similar to k-means, but with soft counts vs. hard counts. Demo applets: http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html and http://lcn.epfl.ch/tutorial/english/gaussian/html/index.html
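
To illustrate the distinction with toy numbers of our own choosing: k-means gives the single best cluster a hard count of 1, while EM spreads a fractional count across clusters in proportion to their posteriors:

    posteriors = {"c1": 0.6, "c2": 0.3, "c3": 0.1}  # illustrative values
    best = max(posteriors, key=posteriors.get)
    hard = {c: float(c == best) for c in posteriors}  # k-means-style count
    soft = {c: p / sum(posteriors.values()) for c, p in posteriors.items()}  # EM-style count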

    13. Tips: start early, read Knight's tutorial, and plan your approach before you start.

    14. Questions?
