Learn about statistical approaches to translation, IBM Model 1, and word alignment in natural language processing. Understand Expectation Maximization algorithm steps and training translation models. Dive into decoding and improving translation systems.
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
CS 479, section 1: Natural Language Processing
Lecture #34: Machine Translation, Word Alignment Models
Thanks to Dan Klein of UC Berkeley and Chris Manning of Stanford for all of the materials used in this lecture.
Announcements
• Project #4
  • Note the clarification about horizontal markovization *order 2* in the instructions
• Project #5
  • Help session: today at 4pm in the CS Conference Room (3350 TMCB)
• Propose-your-own
  • Keep moving forward
• Project report
  • Early: Wednesday after Thanksgiving
  • Due: Friday after Thanksgiving
• Homework 0.4
  • See end of lecture
  • Due: Monday
Quiz – take 2 • What are the four steps of the Expectation Maximization (EM) algorithm? • Think of the document clustering example, if that helps • What is the primary purpose of EM?
Objectives • Understand the role of alignment in statistical approaches to translation • Understand statistical word alignment • Define IBM Model 1, and understand how to train it using EM
The Coding View • “One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’ ” • Warren Weaver (1955:18, quoting a letter he wrote in 1947)
Learning Correspondence • What would you do, if I asked you to align these two strings? • What if I asked you to learn a translation lexicon from this pair? • What is the missing data here?
What if you had more pairs? (Three example sentence pairs were shown on the slide.)
MT System Components
• Language Model (the source): P(e)
• Translation Model (the channel): P(f|e)
• Decoder: given the observed foreign sentence f, find the best English sentence e*
  e* = argmax_e P(e|f) = argmax_e P(f|e) P(e)
• This finds an English translation that is both fluent and semantically faithful to the original foreign-language sentence.
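As a rough illustration of the noisy-channel factorization above, here is a minimal sketch (not the course decoder) that scores a fixed list of candidate English translations by log P(f|e) + log P(e) and returns the argmax; the scoring functions lm_logprob and tm_logprob are hypothetical placeholders, not APIs from the course projects.

```python
import math

def decode(f, candidates, lm_logprob, tm_logprob):
    """Noisy-channel decoding sketch: pick the English candidate e that
    maximizes log P(f|e) + log P(e) over a fixed candidate list.
    A real decoder searches the (huge) space of translations instead."""
    best_e, best_score = None, -math.inf
    for e in candidates:
        score = tm_logprob(f, e) + lm_logprob(e)  # log P(f|e) + log P(e)
        if score > best_score:
            best_e, best_score = e, score
    return best_e
```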
Simple MT
• The components of a simple MT system:
  • You already know about the LM
  • Word-alignment-based Translation Models (TMs)
    • IBM Models 1 and 2 – Assignment #0.4 and Project #5!
  • A simple decoder
• Next few classes, as time permits:
  • More complex word-level and phrase-level TMs
  • More sophisticated decoders
A Word-Level TM?
• What might a model of P(f|e) look like?
• How would we estimate it?
• What can go wrong here?
A Word-Level TM?
• Can we break the model down to a finer granularity to overcome the trouble posed by sparsity?
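To make the sparsity worry concrete, here is a small sketch (my own illustration, not from the slides) of the naive alternative: estimating P(f|e) by relative frequency over whole sentence pairs. Because almost every sentence pair occurs at most once in a real corpus, nearly every new sentence gets probability zero, which is what motivates moving down to the word level.

```python
from collections import Counter, defaultdict

def sentence_level_tm(pairs):
    """Estimate P(f|e) by relative frequency over whole sentences.
    pairs: list of (french_sentence, english_sentence) strings."""
    counts = defaultdict(Counter)
    for f, e in pairs:
        counts[e][f] += 1
    return {e: {f: c / sum(cs.values()) for f, c in cs.items()}
            for e, cs in counts.items()}

# With realistic corpora, a new English sentence e almost never appears in
# training, so P(f|e) is undefined or zero -- hence word-level models.
```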
IBM Model 1 (Brown et al., 1993)
• Alignment: a hidden vector specifying which English source word (including a special NULL word) is responsible for each French target word
• How do we get from an unaligned sentence pair to such an alignment?
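The quantities used in the next two slides can be written out explicitly; this is the standard Model 1 formulation from Brown et al. (1993), where t(f|e) are the word translation probabilities, l = |e|, m = |f|, and ε is a constant accounting for the French sentence length.

```latex
% IBM Model 1: joint probability of French sentence f and alignment a given English e
P(f, a \mid e) = \frac{\epsilon}{(l+1)^{m}} \prod_{j=1}^{m} t(f_j \mid e_{a_j})
% Marginalizing over alignments (each a_j chosen independently and uniformly):
P(f \mid e) = \frac{\epsilon}{(l+1)^{m}} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)
```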
EM for Model 1
• Model 1 parameters:
  • Translation probabilities t(f|e)
  • Start with uniform t(f|e) (or random), including t(f|NULL)
• Top: Initialize count(f, e) = 0 for all words f and e
• (E-step) For each pair of sentences (f, e) in the parallel corpus:
  • For each French position j
    • For each English position i (including the NULL position)
      • Calculate the posterior probability: P(a_j = i | f, e) = t(f_j|e_i) / Σ_i' t(f_j|e_i')
      • Increment the count of word f_j with word e_i by this amount (a "partial count"): count(f_j, e_i) += P(a_j = i | f, e)
EM for Model 1 (part 2)
• (M-step)
  • For each English word e that appears in at least one English sentence (including NULL)
    • For each French word f that appears in at least one French sentence paired with e
      • Re-estimate t(f|e) by normalizing the count: t(f|e) = count(f, e) / Σ_f' count(f', e)
• Repeat at step "Top:" until
  • convergence of t, or
  • a pre-specified number of iterations
• Result: a "translation table" t(·|e) for each value of e (a code sketch follows below)
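A compact sketch of this training loop, assuming the parallel corpus is given as a list of (French token list, English token list) pairs; this follows the steps above but is my own illustrative code, not the Project #5 starter.

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """EM for IBM Model 1. pairs: list of (french_tokens, english_tokens).
    Returns t[e][f] = P(f | e), with "NULL" as the empty English word."""
    pairs = [(f, ["NULL"] + e) for f, e in pairs]
    f_vocab = {w for f, _ in pairs for w in f}
    t = defaultdict(lambda: defaultdict(lambda: 1.0 / len(f_vocab)))  # uniform init

    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))   # Top: zero the counts
        for f_sent, e_sent in pairs:                       # E-step
            for fj in f_sent:
                norm = sum(t[ei][fj] for ei in e_sent)
                for ei in e_sent:
                    count[ei][fj] += t[ei][fj] / norm      # partial count P(a_j = i | f, e)
        for ei, cs in count.items():                       # M-step: renormalize per English word
            total = sum(cs.values())
            for fj, c in cs.items():
                t[ei][fj] = c / total
    return t
```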
Assignment #0.4
• Objective: to work with and understand IBM Model 1 and EM
• Data:
  • I like it | Me gusta
  • You like it | Te gusta
• Result:
  • An IBM Model 1 translation table t(·|e) for each English word e, including NULL
• See course wiki for details
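If you were to run the illustrative train_model1 sketch from the EM slide on this toy corpus, the call would look like the following (Spanish here plays the role of the "French" target side); the actual probability values depend on the number of iterations and are not shown.

```python
# Toy corpus from the assignment, as (foreign_tokens, english_tokens) pairs
pairs = [
    ("Me gusta".split(), "I like it".split()),
    ("Te gusta".split(), "You like it".split()),
]
t = train_model1(pairs, iterations=20)   # train_model1 defined in the earlier sketch
for e_word in ["NULL", "I", "You", "like", "it"]:
    print(e_word, dict(t[e_word]))       # one row of the translation table per English word
```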
Next • Trouble with Model 1 • Improvement to Model 1: Model 2! • Happy Thanksgiving!