
CS 479, section 1: Natural Language Processing


Presentation Transcript


  1. This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. CS 479, section 1: Natural Language Processing. Lecture #35: Word Alignment Models (cont.). Content by Eric Ringger, partially based on earlier slides from Dan Klein of U.C. Berkeley.

  2. Announcements
  • Project #4: Your insights into treebank grammars?
  • Project #5: Model 2 discussed today!
  • Propose-your-own: no presentation required, unless you really want to give one!
  • Check the schedule: plan enough time to succeed! Don't get or stay blocked. Get your questions answered early. Get the help you need to keep moving forward.
  • No late work accepted after the last day of instruction.

  3. Announcements (2)
  • Project Report: early deadline Wednesday; due Friday
  • Homework 0.4: due today
  • Reading Report #14: phrase-based MT paper; due next Monday (online again)

  4. EM Revisited
  • What are the four steps of the Expectation Maximization (EM) algorithm?
  • Think of document clustering and/or training IBM Model 1!
  • What are the two primary purposes of EM?
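As a refresher (not taken from the slide), the overall EM loop for models like IBM Model 1 follows a fixed pattern. In the sketch below, `e_step` and `m_step` are hypothetical callables standing in for the model-specific computations.

```python
# Minimal, generic EM skeleton (illustrative; `e_step` and `m_step` are hypothetical).
def run_em(data, init_params, e_step, m_step, iterations=10):
    """Alternate expectation and maximization steps for a fixed number of iterations."""
    params = init_params                         # 1. initialize parameters
    for _ in range(iterations):                  # 4. repeat (here: a fixed number of iterations)
        expected_counts = e_step(data, params)   # 2. E-step: posteriors -> expected counts
        params = m_step(expected_counts)         # 3. M-step: re-estimate parameters from counts
    return params
```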

  5. Objectives
  • Observe problems with IBM Model 1
  • Model ordering issues with IBM Model 2!

  6. “Monotonic Translation”
  Japan shaken by two new quakes NULL
  Le Japon secoué par deux nouveaux séismes
  How would you implement a monotone decoder (to translate the French)?

  7. MT System
  • You could now build a simple MT system using:
  • English language model
  • English-to-French alignment model (IBM Model 1)
  • Canadian Hansard data
  • Monotone decoder: greedy or Viterbi
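The monotone decoder question from slide 6 can be answered in a few lines. The sketch below is a minimal greedy monotone decoder under assumptions: `t_table` is a nested dict of Model 1 translation probabilities t(f | e), and `lm_score` is a hypothetical bigram language-model log-probability function; neither name comes from the course materials.

```python
import math

def greedy_monotone_decode(french_words, t_table, lm_score):
    """Translate French to English left to right, one position at a time (no reordering).

    t_table: dict mapping a French word to {english_word: t(f | e)}  (illustrative layout).
    lm_score: function(prev_english_word, english_word) -> log P(e_i | e_{i-1}).
    """
    english = ["<s>"]                      # sentence-start symbol for the bigram LM
    for f in french_words:
        candidates = t_table.get(f, {"<unk>": 1.0})
        # Greedy choice: combine channel (translation) and language-model scores.
        best = max(
            candidates,
            key=lambda e: math.log(candidates[e]) + lm_score(english[-1], e),
        )
        english.append(best)
    return english[1:]                     # drop the start symbol
```

A Viterbi variant would instead keep, for each French position, the best-scoring partial translation per language-model state rather than committing greedily.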

  8. IBM Model 1
  [Slide figure: the Model 1 alignment of the target and source sentences.]
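The slide's equation is not preserved in the transcript; for reference, the standard IBM Model 1 alignment probability, in which each French position j picks an English position a_j uniformly (with e_0 = NULL) and I, J are the English and French sentence lengths, is:

```latex
% Standard IBM Model 1 (stated for reference; not recovered from the slide image).
P(\mathbf{f}, \mathbf{a} \mid \mathbf{e})
  \;=\; \prod_{j=1}^{J} \frac{1}{I+1}\, t(f_j \mid e_{a_j})
```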

  9. One-to-Many Alignments
  But there are other problems to think about, as the following examples will show:

  10. Problem: Many-to-One Alignments

  11. Problem: Many-to-Many Alignments

  12. Problem: Local Order Change
  English: Japan is at the junction of four tectonic plates
  French: Le Japon est au confluent de quatre plaques tectoniques
  “Distortions”

  13. Problem: More Distortions
  English: The earthquake killed 39 and wounded 3,183.
  French: Le tremblement de terre a fait 39 morts et 3,183 blessés.

  14. Insights
  • How to include “distortion” in the model?
  • How to prefer nearby distortions over long-distance distortions?

  15. IBM Model 2
  • Reminder: Model 1
  • Could model distortions, without any strong assumptions about where they occur, as a distribution over target-language positions:
  • Could build a model as a distribution over distortion distances:
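The slide's equations are likewise missing from the transcript; the usual IBM Model 2 form keeps Model 1's translation table and adds a distortion distribution over target-language positions:

```latex
% Standard IBM Model 2 (stated for reference; not recovered from the slide image).
% q(i | j, I, J): probability that French position j aligns to English position i,
% given English length I and French length J.
P(\mathbf{f}, \mathbf{a} \mid \mathbf{e})
  \;=\; \prod_{j=1}^{J} q(a_j \mid j, I, J)\; t(f_j \mid e_{a_j})
```

The distance-based alternative in the last bullet replaces q(i | j, I, J) with a distribution over a relative offset such as i - j*I/J; the exact parameterization varies.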

  16. Matrix View of an Alignment

  17. Preference for the Diagonal
  • But alignments for some language pairs tend toward the diagonal in general:
  • Can use a normal distribution for the distortion model
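One concrete way to encode the diagonal preference, sketched here under assumptions rather than as the slide's exact parameterization (the function names are illustrative), is to center a normal density on the "diagonal" English position j*I/J and normalize over positions:

```python
import math

def diagonal_distortion_score(i, j, I, J, sigma=2.0):
    """Unnormalized Gaussian preference for English position i given French position j.

    The mean is the diagonal position j * I / J; sigma controls how strongly
    near-diagonal alignments are preferred. (Illustrative parameterization.)
    """
    mean = j * I / J
    return math.exp(-((i - mean) ** 2) / (2.0 * sigma ** 2))

def normalized_distortion(j, I, J, sigma=2.0):
    """Normalize over English positions 0..I (position 0 = NULL) to get q(i | j, I, J)."""
    scores = [diagonal_distortion_score(i, j, I, J, sigma) for i in range(I + 1)]
    total = sum(scores)
    return [s / total for s in scores]
```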

  18. EM for Model 2
  • Model 2 parameters:
  • Translation probabilities t(f | e): initialize with Model 1
  • Distortion parameters q(i | j, I, J): initialize as uniform
  • E-step: for each pair of sentences (e, f), for each French position j:
  1. Calculate the posterior over English positions i (see the expression below).
  2. Increment the count of word f_j with word e_i by these amounts.
  3. Similarly, for each English position i, update the distortion counts.
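The posterior in step 1 is not shown in the transcript; for Model 2 it is the standard normalized product of the distortion and translation terms:

```latex
% E-step posterior: probability that French position j aligns to English position i,
% under the current Model 2 parameters.
P(a_j = i \mid \mathbf{f}, \mathbf{e})
  \;=\; \frac{q(i \mid j, I, J)\, t(f_j \mid e_i)}
             {\sum_{i'=0}^{I} q(i' \mid j, I, J)\, t(f_j \mid e_{i'})}
```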

  19. EM for Model 2 (cont.)
  • M-step:
  • Re-estimate q(i | j, I, J) by normalizing these counts: one conditional distribution for each context (j, I, J)
  • Re-estimate t(f | e) by normalizing the earlier counts: one conditional distribution per word e
  • Iterate until convergence of the likelihood, or just for a handful of iterations
  • See the directions for Project #5 on the course wiki for a more detailed version of this EM algorithm, including implementation tips.
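Putting slides 18 and 19 together, one EM iteration for Model 2 might look like the sketch below. This is an illustrative implementation under assumptions (plain dict parameter tables, dense loops over positions, english[0] treated as NULL); it is not the Project #5 reference code.

```python
from collections import defaultdict

def em_iteration_model2(corpus, t, q):
    """One E-step + M-step for IBM Model 2 (illustrative sketch).

    corpus: list of (english_words, french_words) pairs, with english_words[0] == "NULL".
    t: dict, t[(f, e)] = translation probability t(f | e).
    q: dict, q[(i, j, I, J)] = distortion probability.
    Returns updated (t, q).
    """
    count_t = defaultdict(float)   # expected counts for (f, e)
    total_t = defaultdict(float)   # normalizers per English word e
    count_q = defaultdict(float)   # expected counts for (i, j, I, J)
    total_q = defaultdict(float)   # normalizers per context (j, I, J)

    # E-step: accumulate expected counts from the posterior over alignments.
    for english, french in corpus:
        I, J = len(english) - 1, len(french)
        for j, f in enumerate(french, start=1):
            scores = [q[(i, j, I, J)] * t[(f, english[i])] for i in range(I + 1)]
            z = sum(scores)
            for i in range(I + 1):
                p = scores[i] / z                     # posterior P(a_j = i | f, e)
                count_t[(f, english[i])] += p
                total_t[english[i]] += p
                count_q[(i, j, I, J)] += p
                total_q[(j, I, J)] += p

    # M-step: normalize expected counts into new conditional distributions.
    new_t = {(f, e): c / total_t[e] for (f, e), c in count_t.items()}
    new_q = {(i, j, I, J): c / total_q[(j, I, J)]
             for (i, j, I, J), c in count_q.items()}
    return new_t, new_q
```

Running this repeatedly, for a handful of iterations or until the corpus log-likelihood stops improving, completes the training loop described in the slides.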

  20. Next
  • Even better alignment models
  • Evaluating alignment models
  • Evaluating translation end-to-end!
