
CS 479, section 1: Natural Language Processing


Presentation Transcript


  1. This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. CS 479, section 1: Natural Language Processing. Lecture #35: Word Alignment Models (cont.). Content by Eric Ringger, partially based on earlier slides from Dan Klein of U.C. Berkeley.

  2. Announcements
  • Project #4: Your insights into treebank grammars?
  • Project #5: Model 2 discussed today!
  • Propose-your-own: no presentation required, unless you really want to give one!
  • Check the schedule: plan enough time to succeed! Don't get or stay blocked. Get your questions answered early. Get the help you need to keep moving forward.
  • No late work accepted after the last day of instruction.

  3. Announcements (2)
  • Project Report: early deadline Wednesday; due Friday
  • Homework 0.4: due today
  • Reading Report #14: phrase-based MT paper; due next Monday (online again)

  4. EM Revisited
  • What are the four steps of the Expectation Maximization (EM) algorithm?
  • Think of document clustering and/or training IBM Model 1!
  • What are the two primary purposes of EM?
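As a refresher (not taken from the slide), the overall EM loop for models like IBM Model 1 follows a fixed pattern. In the sketch below, `e_step` and `m_step` are hypothetical callables standing in for the model-specific computations.

```python
# Minimal, generic EM skeleton (illustrative; `e_step` and `m_step` are hypothetical).
def run_em(data, init_params, e_step, m_step, iterations=10):
    """Alternate expectation and maximization steps for a fixed number of iterations."""
    params = init_params                         # 1. initialize parameters
    for _ in range(iterations):                  # 4. repeat (here: a fixed number of iterations)
        expected_counts = e_step(data, params)   # 2. E-step: posteriors -> expected counts
        params = m_step(expected_counts)         # 3. M-step: re-estimate parameters from counts
    return params
```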

  5. Objectives
  • Observe problems with IBM Model 1
  • Model ordering issues with IBM Model 2!

  6. “Monotonic Translation”
  Japan shaken by two new quakes NULL
  Le Japon secoué par deux nouveaux séismes
  How would you implement a monotone decoder (to translate the French)?

  7. MT System
  • You could now build a simple MT system using:
  • English language model
  • English-to-French alignment model (IBM Model 1)
  • Canadian Hansard data
  • Monotone decoder: greedy or Viterbi
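The monotone decoder question from slide 6 can be answered in a few lines. The sketch below is a minimal greedy monotone decoder under assumptions: `t_table` is a nested dict of Model 1 translation probabilities t(f | e), and `lm_score` is a hypothetical bigram language-model log-probability function; neither name comes from the course materials.

```python
import math

def greedy_monotone_decode(french_words, t_table, lm_score):
    """Translate French to English left to right, one position at a time (no reordering).

    t_table: dict mapping a French word to {english_word: t(f | e)}  (illustrative layout).
    lm_score: function(prev_english_word, english_word) -> log P(e_i | e_{i-1}).
    """
    english = ["<s>"]                      # sentence-start symbol for the bigram LM
    for f in french_words:
        candidates = t_table.get(f, {"<unk>": 1.0})
        # Greedy choice: combine channel (translation) and language-model scores.
        best = max(
            candidates,
            key=lambda e: math.log(candidates[e]) + lm_score(english[-1], e),
        )
        english.append(best)
    return english[1:]                     # drop the start symbol
```

A Viterbi variant would instead keep, for each French position, the best-scoring partial translation per language-model state rather than committing greedily.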

  8. IBM Model 1
  [Slide figure: the Model 1 alignment of the target and source sentences.]
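The slide's equation is not preserved in the transcript; for reference, the standard IBM Model 1 alignment probability, in which each French position j picks an English position a_j uniformly (with e_0 = NULL) and I, J are the English and French sentence lengths, is:

```latex
% Standard IBM Model 1 (stated for reference; not recovered from the slide image).
P(\mathbf{f}, \mathbf{a} \mid \mathbf{e})
  \;=\; \prod_{j=1}^{J} \frac{1}{I+1}\, t(f_j \mid e_{a_j})
```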

  9. One-to-Many Alignments
  But there are other problems to think about, as the following examples will show:

  10. Problem: Many-to-One Alignments

  11. Problem: Many-to-Many Alignments

  12. Problem: Local Order Change
  English: Japan is at the junction of four tectonic plates
  French: Le Japon est au confluent de quatre plaques tectoniques
  “Distortions”

  13. Problem: More Distortions
  English: The earthquake killed 39 and wounded 3,183.
  French: Le tremblement de terre a fait 39 morts et 3,183 blessés.

  14. Insights
  • How to include “distortion” in the model?
  • How to prefer nearby distortions over long-distance distortions?

  15. IBM Model 2
  • Reminder: Model 1
  • Could model distortions, without any strong assumptions about where they occur, as a distribution over target-language positions:
  • Could build a model as a distribution over distortion distances:
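The slide's equations are likewise missing from the transcript; the usual IBM Model 2 form keeps Model 1's translation table and adds a distortion distribution over target-language positions:

```latex
% Standard IBM Model 2 (stated for reference; not recovered from the slide image).
% q(i | j, I, J): probability that French position j aligns to English position i,
% given English length I and French length J.
P(\mathbf{f}, \mathbf{a} \mid \mathbf{e})
  \;=\; \prod_{j=1}^{J} q(a_j \mid j, I, J)\; t(f_j \mid e_{a_j})
```

The distance-based alternative in the last bullet replaces q(i | j, I, J) with a distribution over a relative offset such as i - j*I/J; the exact parameterization varies.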

  16. Matrix View of an Alignment

  17. Preference for the Diagonal
  • But alignments for some language pairs tend toward the diagonal in general:
  • Can use a normal distribution for the distortion model
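One concrete way to encode the diagonal preference, sketched here under assumptions rather than as the slide's exact parameterization (the function names are illustrative), is to center a normal density on the "diagonal" English position j*I/J and normalize over positions:

```python
import math

def diagonal_distortion_score(i, j, I, J, sigma=2.0):
    """Unnormalized Gaussian preference for English position i given French position j.

    The mean is the diagonal position j * I / J; sigma controls how strongly
    near-diagonal alignments are preferred. (Illustrative parameterization.)
    """
    mean = j * I / J
    return math.exp(-((i - mean) ** 2) / (2.0 * sigma ** 2))

def normalized_distortion(j, I, J, sigma=2.0):
    """Normalize over English positions 0..I (position 0 = NULL) to get q(i | j, I, J)."""
    scores = [diagonal_distortion_score(i, j, I, J, sigma) for i in range(I + 1)]
    total = sum(scores)
    return [s / total for s in scores]
```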

  18. EM for Model 2
  • Model 2 parameters:
  • Translation probabilities t(f | e): initialize with Model 1
  • Distortion parameters q(i | j, I, J): initialize as uniform
  • E-step: for each pair of sentences (e, f), for each French position j:
  1. Calculate the posterior over English positions i (see the expression below).
  2. Increment the count of word f_j with word e_i by these amounts.
  3. Similarly, for each English position i, update the distortion counts.
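The posterior in step 1 is not shown in the transcript; for Model 2 it is the standard normalized product of the distortion and translation terms:

```latex
% E-step posterior: probability that French position j aligns to English position i,
% under the current Model 2 parameters.
P(a_j = i \mid \mathbf{f}, \mathbf{e})
  \;=\; \frac{q(i \mid j, I, J)\, t(f_j \mid e_i)}
             {\sum_{i'=0}^{I} q(i' \mid j, I, J)\, t(f_j \mid e_{i'})}
```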

  19. EM for Model 2 (cont.)
  • M-step:
  • Re-estimate q(i | j, I, J) by normalizing these counts: one conditional distribution for each context (j, I, J)
  • Re-estimate t(f | e) by normalizing the earlier counts: one conditional distribution per word e
  • Iterate until convergence of the likelihood, or just for a handful of iterations
  • See the directions for Project #5 on the course wiki for a more detailed version of this EM algorithm, including implementation tips.
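Putting slides 18 and 19 together, one EM iteration for Model 2 might look like the sketch below. This is an illustrative implementation under assumptions (plain dict parameter tables, dense loops over positions, english[0] treated as NULL); it is not the Project #5 reference code.

```python
from collections import defaultdict

def em_iteration_model2(corpus, t, q):
    """One E-step + M-step for IBM Model 2 (illustrative sketch).

    corpus: list of (english_words, french_words) pairs, with english_words[0] == "NULL".
    t: dict, t[(f, e)] = translation probability t(f | e).
    q: dict, q[(i, j, I, J)] = distortion probability.
    Returns updated (t, q).
    """
    count_t = defaultdict(float)   # expected counts for (f, e)
    total_t = defaultdict(float)   # normalizers per English word e
    count_q = defaultdict(float)   # expected counts for (i, j, I, J)
    total_q = defaultdict(float)   # normalizers per context (j, I, J)

    # E-step: accumulate expected counts from the posterior over alignments.
    for english, french in corpus:
        I, J = len(english) - 1, len(french)
        for j, f in enumerate(french, start=1):
            scores = [q[(i, j, I, J)] * t[(f, english[i])] for i in range(I + 1)]
            z = sum(scores)
            for i in range(I + 1):
                p = scores[i] / z                     # posterior P(a_j = i | f, e)
                count_t[(f, english[i])] += p
                total_t[english[i]] += p
                count_q[(i, j, I, J)] += p
                total_q[(j, I, J)] += p

    # M-step: normalize expected counts into new conditional distributions.
    new_t = {(f, e): c / total_t[e] for (f, e), c in count_t.items()}
    new_q = {(i, j, I, J): c / total_q[(j, I, J)]
             for (i, j, I, J), c in count_q.items()}
    return new_t, new_q
```

Running this repeatedly, for a handful of iterations or until the corpus log-likelihood stops improving, completes the training loop described in the slides.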

  20. Next
  • Even better alignment models
  • Evaluating alignment models
  • Evaluating translation end-to-end!
