
  1. Statistical Machine Translation: Phrase-Based Models. Advanced Signal Processing 05/06, Reinisch Bernhard

  2. Overview • The quality of MT systems has improved with the use of phrase translation • Phrases from word-based alignments • Syntactic phrases • Phrases from phrase alignments • IBM word-based statistical MT systems enhanced with phrase translation • What is the best way to extract phrase translation pairs? • Evaluation framework / outcome

  3. Word-based approaches • Try to model word-to-word correspondences • The models are often restricted: one source word -> exactly one target word • Similar to Hidden Markov Models in speech recognition • Enhanced to a “one-to-many” alignment model • This solves lexical problems like “Zahnarzttermin” -> “dentist’s appointment” • Changes in word order are handled by the alignment

  4. Statistical machine translation (1) • Fundamental equation (Bayes decision rule): ê_1^I = argmax_{e_1^I} Pr(e_1^I) · Pr(f_1^J | e_1^I) • argmax … search/decoding problem (generation of the output sentence) • Pr(e_1^I) … language model • Pr(f_1^J | e_1^I) … translation model
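
As a toy illustration of this decision rule, the sketch below scores candidate translations in log space. Both scoring functions and the candidate list are hypothetical stubs invented for this example; a real decoder searches a vastly larger hypothesis space.

```python
def lm_logprob(e):
    """Hypothetical stand-in for the language model score log Pr(e)."""
    return -2.0 * len(e.split())

def tm_logprob(f, e):
    """Hypothetical stand-in for the translation model score log Pr(f | e)."""
    return -1.5 * abs(len(f.split()) - len(e.split()))

def decode(f, candidates):
    """Pick argmax_e Pr(e) * Pr(f | e), computed in log space."""
    return max(candidates, key=lambda e: lm_logprob(e) + tm_logprob(f, e))

print(decode("das haus ist klein",
             ["the house is small", "is the house small"]))
```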

  5. Statistical machine translation (2) Taken from [2]

  6. Learning translation lexica • The following describes methods for learning single-word and phrase-based translation lexica • Statistical alignment models, used for learning word alignments • Symmetrization • Bilingual phrases • Alignment templates

  7. Statistical alignment models (1) • The alignment model introduces a “hidden” parameter a • a describes the mapping from each source position j to a target position a_j • a can be represented as a matrix with binary values • 1 entry … words are aligned • 0 entry … words are not aligned • A source word aligned to no target word maps to the empty word e_0
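
A minimal sketch of how such a binary alignment matrix might be represented; the encoding (1-based positions, 0 for the empty word e_0) is an assumption for illustration, not code from the papers.

```python
def alignment_matrix(a, target_len):
    """Binary matrix for an alignment a, where a[j] is the (1-based)
    target position of source word j and 0 denotes the empty word e_0."""
    A = [[0] * target_len for _ in a]
    for j, aj in enumerate(a):
        if aj > 0:            # 1 entry ... words are aligned
            A[j][aj - 1] = 1  # 0 entries remain: words are not aligned
    return A

# "das haus" -> "the house": a_1 = 1, a_2 = 2
for row in alignment_matrix([1, 2], 2):
    print(row)   # [1, 0] then [0, 1]
```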

  8. Statistical alignment models (2) • In general the model depends on a set of unknown parameters θ • Several different specific statistical alignment models exist • First compute word alignments, e.g. with IBM Model 4 • The hidden parameters θ are trained on a bilingual corpus • The alignment with the highest probability is called the Viterbi alignment
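
One concrete case: for the simpler IBM Models 1 and 2 the Viterbi alignment can be computed exactly, because the model factorizes over source positions (for Model 4 it must be approximated). The lexicon t below is a made-up toy table.

```python
def viterbi_alignment_model1(f_words, e_words, t):
    """For IBM Model 1, each source word f_j is independently aligned to
    the target position (or the empty word e_0) maximizing t(f_j | e_i)."""
    e_with_null = ["<e0>"] + e_words          # position 0 is the empty word
    a = []
    for f in f_words:
        best_i = max(range(len(e_with_null)),
                     key=lambda i: t.get((f, e_with_null[i]), 1e-12))
        a.append(best_i)
    return a

# Toy lexicon t(f | e); values are illustrative only.
t = {("das", "the"): 0.7, ("haus", "house"): 0.8, ("haus", "the"): 0.05}
print(viterbi_alignment_model1(["das", "haus"], ["the", "house"], t))  # [1, 2]
```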

  9. Symmetrization (1) • The baseline alignment model (e.g. Model 4) does not allow one source word to align to multiple target words • “Zahnarzttermin” -> “dentist’s appointment” • The outcome should be an alignment matrix like the one below Taken from [2]

  10. Symmetrization (2) • To solve this problem, training is performed in both translation directions • For a sentence pair this yields two Viterbi alignments • Both alignment tables A1 and A2 then have to be combined (symmetrized) • Simplest method: the union of both tables (more refined methods exist) • The result is used to train single-word-based translation lexica
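
A sketch of the union-based symmetrization, assuming the two Viterbi alignments are given as 1-based position lists with 0 for the empty word; the intersection variant would use & instead of |.

```python
def symmetrize(a_src2tgt, a_tgt2src):
    """Combine the two directional Viterbi alignments into one link set.
    Inputs are 1-based alignment lists; 0 means aligned to the empty word.
    Returns 0-based (source, target) links."""
    A1 = {(j, aj - 1) for j, aj in enumerate(a_src2tgt) if aj > 0}
    A2 = {(ai - 1, i) for i, ai in enumerate(a_tgt2src) if ai > 0}
    return A1 | A2   # simple union; A1 & A2 would be the intersection

# "Zahnarzttermin" (1 word) vs. "dentist's appointment" (2 words):
# only the target-to-source direction can align both target words
# to the single source word, so the union recovers both links.
print(symmetrize([1], [1, 1]))  # {(0, 0), (0, 1)}
```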

  11. Symmetrization (3) • The single-word lexicon is then computed from relative frequencies: p(e|f) = N(e, f) / N(f) • N(e, f) … how many times e and f are aligned • N(f) … how many times the word f occurs
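
A sketch of this relative-frequency estimation over a symmetrized, word-aligned corpus; the data layout (word lists plus 0-based link sets) is assumed for illustration.

```python
from collections import Counter

def train_lexicon(corpus):
    """p(e | f) = N(e, f) / N(f): N(e, f) counts how often e and f are
    aligned in the symmetrized alignments, N(f) how often f occurs."""
    n_ef, n_f = Counter(), Counter()
    for f_words, e_words, links in corpus:
        n_f.update(f_words)
        for j, i in links:                  # 0-based (source, target) links
            n_ef[(e_words[i], f_words[j])] += 1
    return {(e, f): n / n_f[f] for (e, f), n in n_ef.items()}

corpus = [(["das", "haus"], ["the", "house"], {(0, 0), (1, 1)})]
print(train_lexicon(corpus))  # {('the', 'das'): 1.0, ('house', 'haus'): 1.0}
```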

  12. Bilingual phrases • Now we need an algorithm that finds relationships between whole phrases of the source sentence and the target sentence • The “phrase-extract” algorithm takes the alignment matrix A as input Taken from [2]
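
A sketch of the core consistency check behind phrase extraction: a source span and the minimal target span covering its links form a phrase pair if no link escapes the pair. The full algorithm in [2] also handles unaligned boundary words, which this sketch omits.

```python
def phrase_extract(f_words, e_words, links, max_len=4):
    """Extract all phrase pairs consistent with the alignment links:
    every link inside the source span must land inside the target span
    and vice versa."""
    pairs = []
    J = len(f_words)
    for j1 in range(J):
        for j2 in range(j1, min(J, j1 + max_len)):
            # minimal target span covered by links from [j1, j2]
            tgt = [i for (j, i) in links if j1 <= j <= j2]
            if not tgt:
                continue
            i1, i2 = min(tgt), max(tgt)
            # consistency: no link from [i1, i2] may leave [j1, j2]
            if all(j1 <= j <= j2 for (j, i) in links if i1 <= i <= i2):
                if i2 - i1 < max_len:
                    pairs.append((" ".join(f_words[j1:j2 + 1]),
                                  " ".join(e_words[i1:i2 + 1])))
    return pairs

links = {(0, 0), (1, 1)}  # das-the, haus-house
print(phrase_extract(["das", "haus"], ["the", "house"], links))
# [('das', 'the'), ('das haus', 'the house'), ('haus', 'house')]
```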

  13. Alignment templates (1) • A more systematic approach that considers whole phrases • A whole group of adjacent words in the source maps to a whole group of words in the target • The context of words has greater influence • Changes in word order can be learned • The idea is to model two different alignment levels • Word-level alignments • Phrase-level alignments

  14. Alignment templates (2) • An alignment template z = (F, E, A) • F … source class sequence • E … target class sequence • A … the alignment between source and target • F and E are sequences of word classes rather than words • The advantage is better generalization (see the sketch below)
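
A sketch of how such a template might be represented and why it generalizes: two different date phrases collapse to the same class-level template. The word-class map here is hypothetical; in [2] the classes are learned automatically.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AlignmentTemplate:
    """z = (F, E, A): source class sequence, target class sequence,
    and the word alignment within the template."""
    F: tuple       # source class sequence
    E: tuple       # target class sequence
    A: frozenset   # 0-based (source_pos, target_pos) links

# Hypothetical word -> class map, invented for this example.
word_class = {"montag": "DAY", "dienstag": "DAY", "am": "PREP",
              "monday": "DAY", "tuesday": "DAY", "on": "PREP"}

def to_template(f_phrase, e_phrase, links):
    """Generalize a word-level phrase pair to the class level."""
    return AlignmentTemplate(tuple(word_class[w] for w in f_phrase),
                             tuple(word_class[w] for w in e_phrase),
                             frozenset(links))

# Both date phrases map to the same template -> better generalization.
z1 = to_template(["am", "montag"], ["on", "monday"], {(0, 0), (1, 1)})
z2 = to_template(["am", "dienstag"], ["on", "tuesday"], {(0, 0), (1, 1)})
print(z1 == z2)  # True
```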

  15. Alignment templates (3) Taken from [2]

  16. Alignment templates (4) • For training we need the probability of applying an alignment template • The “phrase-extract” algorithm has to be modified accordingly • This probability can be estimated by relative frequencies • This completes the “learning translation lexica” task

  17. Translation model (1) • Notation: we decompose the sentences • f_1^J … source sentence • e_1^I … target sentence • Both are segmented into a sequence of phrases (k = 1, …, K) • For the following considerations only one segmentation is assumed

  18. Translation model (2) • The model has to allow reordering of the phrases (see the distortion sketch below)
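
One common way to score phrase reordering is a distance-based distortion penalty, as used in phrase-based systems like [1]; the sketch below assumes that formulation (alpha is a tunable parameter) rather than the exact phrase alignment model of [2].

```python
import math

def reordering_logscore(segmentation, alpha=0.75):
    """Distance-based distortion penalty: each phrase k contributes
    alpha^|start_k - end_{k-1} - 1| to the score (here in log space).
    `segmentation` lists (src_start, src_end) spans in target order."""
    score, prev_end = 0.0, -1
    for start, end in segmentation:
        score += abs(start - prev_end - 1) * math.log(alpha)
        prev_end = end
    return score

# Monotone phrase order pays nothing; swapping two phrases is penalized.
print(reordering_logscore([(0, 0), (1, 1)]))  # 0.0
print(reordering_logscore([(1, 1), (0, 0)]))  # negative (penalized)
```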

  19. Translation model (3) Taken from [2]

  20. Translation model (4) Taken from [2]

  21. Alignment template approach: results • The approach is evaluated on a translation task (the “Verbmobil” task) • Additional preprocessing • word joining • word splitting Taken from [2] Taken from [2]

  22. Alignment template approach: conclusions • Overall we see better performance • It is therefore important to model word groups in source and target language • Two abstraction levels are used • Phrase-level alignments • Word-level alignments • -> the context has greater influence, and changes in word order can be learned explicitly

  23. Syntactic phrases (1) • A collection of all phrase pairs will also include non-intuitive phrases • “Okay, the”, “house the”, etc. • Intuitively such phrases do not help • Idea: restrict extraction to syntactically motivated phrases • Based on syntactic trees, with phrases as subtrees

  24. Syntactic phrases (2) • The input sentence is preprocessed by a syntactic parser • Different operations will be performed on each node • reordering child nodes • inserting extra words at each node • translating leaf words

  25. Syntactic phrases (3) Taken from [4]

  26. Syntactic phrases (4) Taken from [6]

  27. Syntactic phrases (5) • Reordering • Every child sequence of a node has a reordering probability (N child nodes -> N! possible reorderings) • The reordering probability is given by a model table • Inserting • An extra word can be inserted to the left or right of a node • Another table holds the insertion probabilities • Translating • Applied to every leaf • Assumption: this operation depends only on the word itself (see the sketch below)
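
A sketch of the three channel operations as one recursive pass over a toy parse tree. All three probability tables are invented for illustration; in [4] they are learned with EM from a parsed parallel corpus.

```python
import random

# Hypothetical probability tables, invented for this example.
reorder_table = {("NP", "VP"): {(0, 1): 0.3, (1, 0): 0.7}}   # child permutations
insert_table = {"S": {None: 0.6, ("right", "ga"): 0.4}}      # extra-word insertion
translate_table = {"house": {"haus": 0.8, "gebaeude": 0.2}}  # leaf translation

def choose(dist):
    keys = list(dist)
    return random.choices(keys, weights=[dist[k] for k in keys])[0]

def transform(node):
    """Apply the three channel operations to one node of a parse tree.
    Inner nodes are (label, [children]); leaves are (label, word)."""
    label, content = node
    if isinstance(content, str):                         # leaf: translate word
        return (label, choose(translate_table.get(content, {content: 1.0})))
    labels = tuple(child[0] for child in content)
    identity = tuple(range(len(content)))
    perm = choose(reorder_table.get(labels, {identity: 1.0}))  # reorder children
    children = [transform(content[i]) for i in perm]
    extra = choose(insert_table.get(label, {None: 1.0}))       # maybe insert
    return (label, children, extra)

print(transform(("S", [("NP", "ich"), ("VP", "house")])))
```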

  28. Experiments • We now have three models • [1] builds a system to compare them and measures performance under different aspects • Weighting syntactic phrases • Maximum phrase length • Setup • Freely available Europarl corpus • German to English • Performance measured using the BLEU score (see the sketch below)
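
For reference, a simplified sentence-level BLEU: the geometric mean of modified n-gram precisions times a brevity penalty. Real BLEU is computed at corpus level and is usually smoothed, so treat this only as an illustration of the idea.

```python
from collections import Counter
import math

def bleu(candidate, reference, max_n=4):
    """Simplified, unsmoothed sentence-level BLEU for illustration."""
    c, r = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand = Counter(tuple(c[i:i + n]) for i in range(len(c) - n + 1))
        ref = Counter(tuple(r[i:i + n]) for i in range(len(r) - n + 1))
        overlap = sum(min(cnt, ref[g]) for g, cnt in cand.items())
        total = max(sum(cand.values()), 1)
        # modified n-gram precision, clipped away from zero
        log_prec += math.log(max(overlap, 1e-9) / total) / max_n
    bp = min(1.0, math.exp(1 - len(r) / len(c)))   # brevity penalty
    return bp * math.exp(log_prec)

print(bleu("the house is small", "the house is small"))  # 1.0
```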

  29. Comparison of core methods • AP … alignment template approach • M4 … IBM Model 4 for word-based translation • Syn … syntactic phrases • Training corpus size [sentences] Taken from [1] Taken from [1]

  30. Weighting syntactic phrases (1) • The restriction to syntactic phrases is harmful, because too many useful phrases are eliminated • Intuitively this is surprising • Variants tried: improvements in data collection, changes during translation, penalizing non-syntactic phrases • Results suggest • Collecting only syntactically motivated phrases does not improve performance • But it yields smaller phrase-table sizes

  31. Weighting syntactic phrases (2) • Example: “es gibt” literally translates to “it gives” but really means “there is” • This is not a syntactic relationship • Likewise “with regard to” and “note that” are syntactically complex but easy to translate

  32. Maximum phrase length • How long do phrases have to be to achieve high performance? • All experiments use the “phrases from word-based alignments” approach Taken from [1] Taken from [1]

  33. Simpler underlying word-based models (1) • The core of this framework is IBM Model 4 for collecting phrase pairs • Model 4 is computationally expensive, and its parameters can only be estimated approximately • What about IBM Models 1-3? • They are faster and easier to implement • Models 1 and 2 allow word alignments to be computed efficiently

  34. Simpler underlying word-based models (2) • How much is performance affected if the base word alignment uses these simpler models? • Model 1 gives the worst performance • But Models 2 and 3 provide performance similar to Model 4 Taken from [1]

  35. Conclusions • As intuition suggests, phrase-based approaches give better performance than word-based approaches • The experiments also show that • straightforward syntax-based models have disadvantages • the best results are obtained with fairly short phrases • the phrase extraction method and the alignment heuristic have a great influence

  36. References • [1] Philipp Koehn, Franz Josef Och, Daniel Marcu; Statistical Phrase-Based Translation • [2] Franz Josef Och, Hermann Ney; The Alignment Template Approach to Statistical Machine Translation • [3] Franz Josef Och, Christoph Tillmann, Hermann Ney; Improved Alignment Models for Statistical Machine Translation • [4] Kenji Yamada, Kevin Knight; A Syntax-based Translation Model • [5] Daniel Marcu, William Wong; A Phrase-Based, Joint Probability Model for Statistical Machine Translation • [6] Amitabha Mukerjee, Ankit Soni and Achla M. Raina; Detecting Complex Predicates in Hindi using POS Projection across Parallel Corpora • [7] www.sbox.tugraz.at/home/b/brein/061120_TranslationModelPhraseBased.zip

  37. Statistical Machine Translation: Phrase-Based Models. Advanced Signal Processing 05/06, Reinisch Bernhard
