
CS 479, section 1: Natural Language Processing




Presentation Transcript


  1. This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. CS 479, section 1: Natural Language Processing. Lecture #37: Decoding in Machine Translation. Lecture content by Eric Ringger, Dan Klein of UC Berkeley, and Uli Germann, formerly of ISI.

  2. Announcements • Check the schedule • Plan enough time to succeed! • Don’t get or stay blocked. • Get your questions answered early. • Get the help you need to keep moving forward. • No late work accepted after the last day of instruction. • Project Report: • Early: Friday • Due: Monday • Reading Report #14 • Phrase-based MT paper • Due: next Wednesday (online again)

  3. Objectives • Understand the complexity of decoding in translation • Discuss possible decoders • Discuss metrics for end-to-end MT

  4. Decoding • Goal: Find a good translation • Use word alignment models and language models • Brown et al. (1993) left out the decoding algorithm! • They left that for an IBM patent (1995)
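  As a reminder of the objective being optimized: under the noisy-channel formulation of Brown et al. (1993), decoding a French sentence f means searching for

    ê = argmax_e P(e | f) = argmax_e P(f | e) · P(e)

  where P(f | e) comes from the word alignment (translation) model and P(e) from the language model. Decoding is the search for that argmax.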

  5. Start with Bag “Generation” • Given a bag (multiset) of words, produce a fluent ordering: 1. as as give me please possible response soon your 2. disadvantages let me mention now of some the • (e.g., bag 1 reorders to “please give me your response as soon as possible”)

  6. Bag “Generation” (Decoding)

  7. TSP Reduced to Bag Generation • Imagine bag generation with a bigram LM: • Words are vertices • Edge weights are bigram scores (e.g., log P(w_j | w_i)) • Valid sentences are Hamiltonian paths • Not the best news for word-based MT!
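  To make the connection concrete, here is a minimal bag-generation sketch in Python, assuming a toy table of bigram log-probabilities (the values are hypothetical, purely for illustration). It scores every permutation of the bag, i.e., every Hamiltonian path, which is exactly the exponential search the reduction warns about.

    import itertools

    # Toy bigram log-probabilities (hypothetical values, illustration only).
    # In the TSP view: words are vertices, score(u, v) is the edge weight.
    BIGRAM_LOGPROB = {
        ("<s>", "please"): -1.0, ("please", "give"): -0.5, ("give", "me"): -0.3,
        ("me", "your"): -0.8, ("your", "response"): -1.2, ("response", "as"): -1.5,
        ("as", "soon"): -0.4, ("soon", "as"): -0.2, ("as", "possible"): -0.6,
        ("possible", "</s>"): -0.9,
    }
    UNSEEN = -10.0  # back-off penalty for bigrams not in the table

    def bigram_score(words):
        """Sum of bigram log-probs along one ordering (one Hamiltonian path)."""
        path = ["<s>"] + list(words) + ["</s>"]
        return sum(BIGRAM_LOGPROB.get(pair, UNSEEN) for pair in zip(path, path[1:]))

    def best_ordering(bag):
        """Brute-force bag generation: try every permutation (n! of them)."""
        return max(itertools.permutations(bag), key=bigram_score)

    bag = ["as", "as", "give", "me", "please", "possible", "response", "soon", "your"]
    print(" ".join(best_ordering(bag)))

  With nine words this already enumerates 9! = 362,880 orderings; real sentence lengths make exhaustive search hopeless, hence the decoders that follow.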

  8. Why “Decoding”? Decoding: Inferring something useful from a signal. Often (always in CS 479) by means of optimization.

  9. TSP Reduced to Translation Source: French CE NE EST PAS CLAIR . Target: English • Rules: • Visit all French cities in any legal order. • Produce one English word for each city OR Produce a word on the border for bordered cities. • Apply your LM and TM

  10. Solution Source: French Target: English

  11. Ingredients • What must a decoder do? • Optimize: find the best translation, according to your models • Fluency: Choose target word order • Fidelity: For each source word (or group of words), choose a target word • If decoding is state-space search, what does the state space consist of? • Partial hypotheses (e.g., left-to-right prefixes) and whole sentences • How should we search through that space? • Incremental state construction • Prefer high-scoring states
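  A minimal sketch of what such a search state might hold, assuming a word-based model; the field names are illustrative, not from the lecture.

    from dataclasses import dataclass, field

    @dataclass(order=True)
    class Hypothesis:
        """One state in the decoder's search space (a sketch, not a full decoder).

        score   : model score so far (TM + LM), used to prefer high-scoring states
        covered : which source positions have been translated (fidelity)
        prefix  : the English words chosen so far, in order (fluency)
        """
        score: float
        covered: frozenset = field(compare=False, default=frozenset())
        prefix: tuple = field(compare=False, default=())

        def extend(self, src_pos, eng_word, delta):
            """Incremental state construction: translate one more source word."""
            return Hypothesis(self.score + delta,
                              self.covered | {src_pos},
                              self.prefix + (eng_word,))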

  12. Simple Decoders • Simplest possible decoder: • Enumerate sentences, score each with TM and LM • Monotone decoder: • Greedy monotone: assign each French word its most likely English translation • Given f = f_1 … f_n, let ê_i = argmax_e t(f_i | e) • Notice: only uses TM! • Better monotone: Viterbi • Uses LM and TM
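  A sketch of both monotone decoders. It assumes tm[f] is a dict mapping candidate English words to translation scores for French word f, and lm(prev, word) returns a bigram log-probability; both interfaces are hypothetical, chosen to keep the example short.

    import math

    def greedy_monotone(french, tm):
        """Greedy monotone decoder: for each French word, pick the English word
        with the best translation score, ignoring the LM entirely."""
        return [max(tm[f], key=tm[f].get) for f in french]

    def viterbi_monotone(french, tm, lm, k=5):
        """Better monotone decoder: keep word order fixed, but run Viterbi over
        the top-k translation candidates per position, scoring with TM and a
        bigram LM."""
        # chart[i] maps a candidate English word at position i -> (best log score, backpointer)
        chart = [dict() for _ in french]
        for i, f in enumerate(french):
            cands = sorted(tm[f], key=tm[f].get, reverse=True)[:k]
            for e in cands:
                tm_lp = math.log(tm[f][e])
                if i == 0:
                    chart[0][e] = (tm_lp + lm("<s>", e), None)
                else:
                    prev, (ps, _) = max(chart[i - 1].items(),
                                        key=lambda kv: kv[1][0] + lm(kv[0], e))
                    chart[i][e] = (ps + tm_lp + lm(prev, e), prev)
        # Follow backpointers from the best final word.
        best = max(chart[-1], key=lambda e: chart[-1][e][0])
        out = [best]
        for i in range(len(french) - 1, 0, -1):
            best = chart[i][best][1]
            out.append(best)
        return list(reversed(out))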

  13. Stack Decoding (IBM) • More commonly called: A* • One “stack” (priority queue) per • candidate sentence length, or • subset of input words • Employ heuristic estimates for completion cost • (Which admissible heuristics have people tried?) • Beam Search • Limit size of priority queues
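  A beam-search sketch in that spirit, reusing the Hypothesis class above and assuming a hypothetical expand(hyp) generator that yields children covering one more source word. An A* variant would additionally add an admissible completion-cost estimate to each score before comparing hypotheses.

    import heapq

    def beam_decode(french, expand, beam_size=100):
        """Stack decoding sketch: one "stack" per number of covered source words.
        Each stack is pruned to beam_size (beam search) rather than kept as a
        full priority queue."""
        n = len(french)
        stacks = [[] for _ in range(n + 1)]
        stacks[0].append(Hypothesis(0.0))
        for i in range(n):
            for hyp in stacks[i]:
                for child in expand(hyp):
                    stacks[i + 1].append(child)
            # Prune: keep only the beam_size best hypotheses covering i+1 words.
            stacks[i + 1] = heapq.nlargest(beam_size, stacks[i + 1])
        return max(stacks[n]) if stacks[n] else None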

  14. Other Algorithms • Dynamic programming: possible if we make assumptions about the set of allowable permutations • Integer Linear Programming (IP) • Complete • Optimal

  15. Greedy Search Decoder • Start: with monotone decoding • Apply operators: • Change a word translation • Change two word translations • Insert a word into the English (zero-fertile French word) • Remove a word from the English (null-generated French word) • Swap two adjacent English words • Pick highest scoring state, according to LM and TM. • Do hill-climbing (move to higher scoring states) • Stop when no improvement is to be had • Could do annealing or genetic algorithm or something even smarter
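  A hill-climbing sketch of that loop, assuming hypothetical neighbors(hyp) and score(hyp) functions that implement the operators and the LM+TM scoring described above.

    def greedy_hillclimb(hyp, neighbors, score):
        """Start from the monotone hypothesis, repeatedly move to the
        best-scoring neighbor produced by the operators (change one or two
        translations, insert, remove, swap), and stop when no neighbor
        improves the score."""
        current, current_score = hyp, score(hyp)
        while True:
            best, best_score = current, current_score
            for cand in neighbors(current):
                s = score(cand)
                if s > best_score:
                    best, best_score = cand, s
            if best is current:        # local optimum: no operator improved the score
                return current
            current, current_score = best, best_score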

  16. Greedy Decoding Example Given French source sentence

  17. Greedy Decoding Example Start with the English hypothesis from the monotone decoder, given the French source sentence

  18. Greedy Decoding Example Think of this as the start state for the Greedy Decoder

  19. Greedy Decoding Example Think of this as the start state for the Greedy Decoder It has many possible child states, determined by possible operators

  20. Greedy Decoding Example One of those child states has the best score according to LM and TM

  21. Greedy Decoding Example Here’s the best child state and the operator that produced it; the numbers denote English positions.

  22. Greedy Decoding Example Here’s the best child state in the third generation

  23. Greedy Decoding Example

  24. Greedy Decoding Example

  25. Experimental Results • From [Germann et al., 2001]

  26. Major Focus of Ongoing Research • Better models • Keeping phrases together • Using syntactic constraints to keep words syntactically close and coherent

  27. Evaluating Translation • How do we measure translation quality? • It’s difficult! • Option: ask human judges • Option: use reference translations • NIST • BLEU • others

  28. BLEU • Geometric mean of modified n-gram precisions, with a brevity penalty • Multiple reference sentences • Used everywhere now • WANTED: Better metrics
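  A minimal BLEU sketch: n-gram counts are clipped against the references, the modified precisions are combined by geometric mean over n = 1..4, and a brevity penalty punishes short candidates. This is an illustration, not the official scoring script.

    import math
    from collections import Counter

    def bleu(candidate, references, max_n=4):
        """candidate is a list of tokens; references is a list of token lists."""
        log_prec = 0.0
        for n in range(1, max_n + 1):
            cand_ngrams = Counter(tuple(candidate[i:i + n])
                                  for i in range(len(candidate) - n + 1))
            max_ref = Counter()
            for ref in references:
                ref_ngrams = Counter(tuple(ref[i:i + n])
                                     for i in range(len(ref) - n + 1))
                max_ref = max_ref | ref_ngrams   # clip by the best-matching reference
            clipped = sum(min(c, max_ref[g]) for g, c in cand_ngrams.items())
            total = max(sum(cand_ngrams.values()), 1)
            log_prec += math.log(max(clipped, 1e-9) / total) / max_n
        # Brevity penalty: punish candidates shorter than the closest reference length.
        ref_len = min((len(r) for r in references),
                      key=lambda l: abs(l - len(candidate)))
        bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / max(len(candidate), 1))
        return bp * math.exp(log_prec)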

  29. Next • Phrase-based Translation • Last lecture! • Co-reference resolution • Could do bonus lecture sometime 1:1
