
CS 479, section 1: Natural Language Processing


Presentation Transcript


  1. This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
  CS 479, section 1: Natural Language Processing
  Lecture #34: Machine Translation, Word Alignment Models
  Thanks to Dan Klein of UC Berkeley and Chris Manning of Stanford for all of the materials used in this lecture.

  2. Announcements
  • Project #4
    • Note the clarification about horizontal markovization *order 2* in the instructions
  • Project #5
    • Help session: today at 4pm in CS Conference Room (3350 TMCB)
  • Propose-your-own
    • Keep moving forward
  • Project Report:
    • Early: Wednesday after Thanksgiving
    • Due: Friday after Thanksgiving
  • Homework 0.4
    • See end of lecture
    • Due: Monday

  3. Quiz – take 2
  • What are the four steps of the Expectation Maximization (EM) algorithm?
    • Think of the document clustering example, if that helps
  • What is the primary purpose of EM?

  4. Objectives
  • Understand the role of alignment in statistical approaches to translation
  • Understand statistical word alignment
  • Define IBM Model 1, and understand how to train it using EM

  5. The Coding View
  • “One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’ ”
  • Warren Weaver (1955:18, quoting a letter he wrote in 1947)

  6. Learning Correspondence
  • What would you do if I asked you to align these two strings? (example string pair shown on the slide)
  • What if I asked you to learn a translation lexicon from this pair?
  • What is the missing data here?

  7. What if you had more pairs? (three example sentence pairs shown on the slide)

  8. MT System Components
  • Noisy channel view: the language model P(e) is the channel source that generates an English sentence e; the translation model P(f|e) is the channel that produces the observed foreign sentence f.
  • The decoder finds the best translation e*: e* = argmax_e P(e|f) = argmax_e P(f|e) P(e)
  • This finds an English translation that is both fluent and semantically faithful to the original foreign-language sentence.
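To spell out the argmax on this slide, the standard noisy-channel derivation behind it is just Bayes' rule, with P(f) constant with respect to e:

```latex
e^{*} \;=\; \arg\max_{e} \, P(e \mid f)
      \;=\; \arg\max_{e} \, \frac{P(f \mid e)\, P(e)}{P(f)}
      \;=\; \arg\max_{e} \, P(f \mid e)\, P(e)
```

The P(e) factor rewards fluent English and the P(f|e) factor rewards faithfulness to the foreign sentence, matching the last bullet on the slide.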

  9. Simple MT
  • The components of a simple MT system:
    • You already know about the LM
    • Word-alignment based Translation Models (TMs)
      • IBM models 1 and 2 – Assignment #0.4 and Project #5!
    • A simple decoder
  • Next few classes, as time permits:
    • More complex word-level and phrase-level TMs
    • More sophisticated decoders

  10. A Word-Level TM?
  • What might a model of P(f|e) look like?
  • How to estimate this?
  • What can go wrong here?
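The transcript drops the formula on this slide; one plausible reading (my own gloss, not stated in the deck) is the naive whole-sentence relative-frequency estimate, which makes the "what can go wrong" question concrete:

```latex
P_{\mathrm{MLE}}(f \mid e) \;=\; \frac{\mathrm{count}(e, f)}{\mathrm{count}(e)}
```

Treating whole sentences as atomic units, almost every (e, f) pair occurs zero or one time in any parallel corpus, so the estimates are hopelessly sparse; this is what motivates breaking the model down to the word level on the next slide.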

  11. A Word-Level TM?
  • Can we break down the granularity of the model even further to overcome the trouble posed by sparsity?

  12. IBM Model 1 (Brown et al., 93)
  • Alignment: a hidden vector specifying which English source word is responsible for each French target word, including a NULL English word for French words with no English counterpart

  13. IBM Model 1 (Brown et al., 93)
  • Alignment: a hidden vector specifying which English source word is responsible for each French target word, including a NULL English word
  • How do we get from here? (that is, from unaligned sentence pairs to these hidden alignments)
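For reference, the standard Model 1 generative probability from Brown et al. (1993), written with the translation-table notation t(f | e) used on the EM slides below (the uniform alignment constant is part of the standard model even though it is not visible in the transcript):

```latex
P(f, a \mid e) \;=\; \frac{\epsilon}{(I+1)^{J}} \prod_{j=1}^{J} t\!\left(f_{j} \mid e_{a_{j}}\right),
\qquad a_{j} \in \{0, 1, \ldots, I\}, \quad e_{0} = \mathrm{NULL}
```

Here J is the length of the French sentence, I the length of the English sentence, and each alignment variable a_j picks the English position (possibly NULL) responsible for French word f_j.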

  14. EM for Model 1
  • Model 1 parameters:
    • Translation probabilities: t(f | e)
  • Start with uniform t(f | e) (or random), including t(f | NULL)
  • Top: Initialize count(f, e) = 0 for all French words f and English words e.
  • (E-step) For each pair of sentences (f, e) in the parallel corpus:
    • For each French position j = 1, …, J
      • For each English position i = 0, …, I (where i = 0 is the NULL position),
        • Calculate the posterior probability: P(a_j = i | f, e) = t(f_j | e_i) / Σ_{i′=0..I} t(f_j | e_i′)
        • Increment the count of word f_j with word e_i by this amount (as a “partial count”): count(f_j, e_i) += P(a_j = i | f, e)

  15. EM for Model 1 (part 2)
  • (M-step)
    • For each English word e that appears in at least one English sentence (including NULL)
      • For each French word f that appears in at least one French sentence
        • Re-estimate t(f | e) by normalizing the count: t(f | e) = count(f, e) / Σ_{f′} count(f′, e)
  • Repeat at step “Top:” until
    • convergence of the parameters t(f | e), or
    • a pre-specified number of times
  • Result: a “translation table” t(f | e) for each value of e
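A minimal Python sketch of the procedure on slides 14–15, assuming tokenized, lowercased sentence pairs; the function and variable names here are illustrative, not taken from the assignment spec:

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """Train IBM Model 1 translation probabilities t(f | e) with EM.

    pairs: list of (french_tokens, english_tokens) sentence pairs.
    Returns a dict t[(f, e)] approximating P(f | e); 'NULL' is added to every English sentence.
    """
    # Add the NULL word to the English side of every sentence pair
    pairs = [(f, ["NULL"] + e) for f, e in pairs]

    # Start with uniform t(f | e) over the French vocabulary, including t(f | NULL)
    french_vocab = {fw for f, _ in pairs for fw in f}
    t = defaultdict(lambda: 1.0 / len(french_vocab))

    for _ in range(iterations):
        # "Top": zero out the partial counts
        count = defaultdict(float)   # count(f, e)
        total = defaultdict(float)   # sum over f of count(f, e), used in the M-step

        # E-step: accumulate expected ("partial") counts
        for f_sent, e_sent in pairs:
            for fw in f_sent:                        # each French position j
                norm = sum(t[(fw, ew)] for ew in e_sent)
                for ew in e_sent:                    # each English position i, incl. NULL
                    posterior = t[(fw, ew)] / norm   # P(a_j = i | f, e)
                    count[(fw, ew)] += posterior
                    total[ew] += posterior

        # M-step: re-estimate t(f | e) by normalizing the counts
        t = defaultdict(float)
        for (fw, ew), c in count.items():
            t[(fw, ew)] = c / total[ew]

    return t
```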

  16. Assignment #0.4
  • Objective: To work with and understand IBM Model 1 and EM
  • Data:
    • I like it | Me gusta
    • You like it | Te gusta
  • Result:
    • An IBM Model 1 translation table t(f | e) for each English word e, including NULL
  • See course wiki for details
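As a quick sanity check, the sketch above could be run on the assignment's toy corpus like this (a hypothetical illustration; the required input and output formats are on the course wiki):

```python
pairs = [
    ("me gusta".split(), "i like it".split()),
    ("te gusta".split(), "you like it".split()),
]
t = train_model1(pairs, iterations=20)

# "gusta" co-occurs with "like" in both sentence pairs, so after a few
# iterations t[("gusta", "like")] should dominate the table for "like".
for (fw, ew), p in sorted(t.items(), key=lambda kv: -kv[1]):
    if p > 0.05:
        print(f"t({fw} | {ew}) = {p:.3f}")
```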

  17. Next
  • Trouble with Model 1
  • Improvement to Model 1: Model 2!
  • Happy Thanksgiving!
