
Human Language Technology


Presentation Transcript


1. Human Language Technology: Statistical MT

2. Approaching MT
• There are many different ways of approaching the problem of MT.
• The choice of approach is complex and depends upon:
• Task requirements
• Human resources
• Linguistic resources

3. Criterial Issues
• Do we want a translation system for one language pair or for many language pairs?
• Can we assume a constrained vocabulary, or do we need to deal with arbitrary text?
• What resources exist for the languages that we are dealing with?
• How long will it take to develop the necessary resources, and with what human resources?

4. Parallel Data
• Lots of translated text is available: hundreds of millions of words of translated text for some language pairs.
• A book has a few hundred thousand words.
• An educated person may read 10,000 words a day: 3.5 million words a year, 300 million in a lifetime.
• Computers can therefore see more translated text than a human reads in a lifetime.
• A machine can learn how to translate foreign languages. [Koehn 2006]

5. Statistical Translation
• Robust
• Domain independent
• Extensible
• Does not require language specialists
• Does require parallel texts
• Uses the noisy channel model of translation

6. Noisy Channel Model: Sentence Translation (Brown et al. 1990)
[Diagram: a source sentence passes through a noisy channel and is observed as the target sentence]

7. Statistical Modelling
• Observation = T. Learn P(T|S) from a parallel corpus.
• There is not sufficient data to estimate P(T|S) directly. [from Koehn 2006]

8. The Problem of Translation
• Given a sentence T of the target language, seek the source sentence S from which a translator produced T, i.e. find S that maximises P(S|T).
• By Bayes' theorem, P(S|T) = P(S) × P(T|S) / P(T), whose denominator is independent of S.
• Hence it suffices to maximise P(S) × P(T|S).

9. The Three Components of a Statistical MT Model
• A method for computing language model probabilities P(S).
• A method for computing translation probabilities P(T|S).
• A method for searching amongst source sentences for one that maximises P(S) × P(T|S).
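
Slides 8 and 9 amount to a one-line decision rule: pick the S with the best combined score. A minimal Python sketch, with invented toy probability tables (the lm and tm values below are not from the slides), scores candidates in log space and keeps the argmax:

import math

# Toy noisy-channel decision rule. All probability values are invented
# for illustration; a real system derives them from trained models.
lm = {"John loves Mary": 1e-6, "John love Mary": 1e-9}          # P(S)
tm = {("Jean aime Marie", "John loves Mary"): 0.05,             # P(T|S)
      ("Jean aime Marie", "John love Mary"): 0.06}

def best_source(target, candidates):
    # argmax over S of log P(S) + log P(T|S)
    return max(candidates, key=lambda s: math.log(lm[s]) + math.log(tm[(target, s)]))

print(best_source("Jean aime Marie", list(lm)))
# -> John loves Mary: the language model outweighs the slightly higher
#    translation probability of the ungrammatical candidate

The ungrammatical candidate has the higher translation probability, but the language model overrules it: exactly the division of labour the noisy channel model is designed to exploit.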

10. A Statistical MT System
[Diagram: the language model supplies P(S) and the translation model supplies P(T|S); given T, the decoder outputs the S that maximises P(S) × P(T|S)]

11. Three Kinds of Model

12. Language Models Based on N-Grams of Words
• General: P(s1 s2 ... sn) = P(s1) × P(s2|s1) × ... × P(sn|s1 ... s(n-1))
• Trigram: P(s1 s2 ... sn) = P(s1) × P(s2|s1) × P(s3|s1 s2) × ... × P(sn|s(n-2) s(n-1))
• Bigram: P(s1 s2 ... sn) = P(s1) × P(s2|s1) × ... × P(sn|s(n-1))
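
As a concrete instance of the bigram formula, here is a minimal unsmoothed bigram model in Python, trained on the toy corpus that appears on slide 14 (the <s> start symbol and the corpus choice are our own; a real LM would need smoothing for unseen bigrams):

from collections import Counter

# Minimal unsmoothed bigram language model (sketch only).
corpus = ["the cat sleeps", "the dog sleeps", "the horse eats"]

context, bigrams = Counter(), Counter()
for sentence in corpus:
    words = ["<s>"] + sentence.split()
    context.update(words[:-1])              # counts of each conditioning word
    bigrams.update(zip(words, words[1:]))

def p_bigram(sentence):
    # P(s1 ... sn) = P(s1|<s>) * P(s2|s1) * ... * P(sn|s(n-1))
    words = ["<s>"] + sentence.split()
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= bigrams[(prev, cur)] / context[prev]
    return p

print(p_bigram("the cat sleeps"))   # 1.0 * 1/3 * 1.0 = 0.333...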

13. Word-Based Translation Models
• The translation process is decomposed into smaller steps.
• Each step is tied to individual words.
• Based on the IBM Models [Brown et al., 1993]. [from Koehn 2006]

14. Word TM Derived from a Bitext
SOURCE            TARGET
the cat sleeps    le chat dort
the dog sleeps    le chien dort
the horse eats    le cheval mange

15. le chat dort (T) / the cat sleeps (S)

16. le chien dort (T) / the dog sleeps (S)

17. le cheval mange (T) / the horse eats (S)
Word translation probabilities P(t|s) estimated from the nine alignment links:
P(le|the) = 3/9, P(chat|cat) = 1/9, P(chien|dog) = 1/9, P(cheval|horse) = 1/9, P(dort|sleeps) = 2/9, P(mange|eats) = 1/9
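
The probabilities above can be reproduced by counting alignment links across the three sentence pairs. A short sketch, assuming the obvious positional one-to-one alignments from slides 15-17:

from collections import Counter

# Estimate P(t|s) as relative alignment-link frequency, as on the slide.
bitext = [("the cat sleeps", "le chat dort"),
          ("the dog sleeps", "le chien dort"),
          ("the horse eats", "le cheval mange")]

links = Counter()
for src, tgt in bitext:
    links.update(zip(src.split(), tgt.split()))   # positional alignment

total = sum(links.values())                        # 9 links in total
for (s, t), n in links.items():
    print(f"P({t}|{s}) = {n}/{total}")
# P(le|the) = 3/9, P(dort|sleeps) = 2/9, all other pairs 1/9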

18. Parameter Estimation
• Based on counting occurrences within monolingual and bilingual data.
• For the language model, we need only source language text.
• For the translation model, we need pairs of sentences that are translations of each other.
• Use the EM (Expectation Maximisation) algorithm (Baum 1972) to optimise the model parameters.

19. EM Algorithm
• Word alignments for the sentence pair ("a b c", "x y z") are formed from arbitrary pairings of words from the two sentences, and include (a.x, b.y, c.z), (a.z, b.y, c.x), etc.
• There is a large number of possible alignments, since we also allow pairings such as (ab.x, 0.y, c.z).

20. EM Algorithm
• Make an initial estimate of the parameters. This can be used to compute the probability of any possible word alignment.
• Re-estimate the parameters by ranking each possible alignment by its probability according to the initial guess.
• Repeated iterations assign ever greater probability to the set of sentences actually observed. The algorithm leads to a local maximum of the probability of the observed sentence pairs as a function of the model parameters.
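
Below is a compact sketch of this EM loop for IBM Model 1, i.e. the word translation probabilities only, without the fertility and distortion parameters of the full model; the toy bitext and variable names are ours:

from collections import defaultdict
from itertools import product

# EM for IBM Model 1 on the toy bitext: t[(f, e)] approximates P(f|e).
bitext = [("the cat sleeps".split(), "le chat dort".split()),
          ("the dog sleeps".split(), "le chien dort".split()),
          ("the horse eats".split(), "le cheval mange".split())]

e_vocab = {e for es, _ in bitext for e in es}
f_vocab = {f for _, fs in bitext for f in fs}
t = {(f, e): 1.0 / len(f_vocab) for e, f in product(e_vocab, f_vocab)}  # uniform initial estimate

for _ in range(20):
    count, total = defaultdict(float), defaultdict(float)
    for es, fs in bitext:                 # E-step: expected link counts
        for f in fs:
            z = sum(t[(f, e)] for e in es)
            for e in es:
                count[(f, e)] += t[(f, e)] / z
                total[e] += t[(f, e)] / z
    t = {fe: c / total[fe[1]] for fe, c in count.items()}   # M-step

print(round(t[("chat", "cat")], 3))   # rises towards 1.0 over the iterations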

21. Parameters for the IBM Translation Model
• Word translation probability P(t|s): the probability that source word s is translated as target word t.
• Fertility P(n|s): the probability that source word s is translated by n target words (0 ≤ n ≤ 25).
• Distortion P(i|j,l): the probability that the source word at position j is translated by the target word at position i in a target sentence of length l.

22. Experiment 1 (Brown et al. 1990)
• Hansard corpus: 40,000 pairs of sentences, approx. 800,000 words in each language.
• Considered the 9,000 most common words in each language.
• Assumptions (initial parameter values):
• each of the 9,000 target words is equally likely as a translation of each of the source words;
• each fertility from 0 to 25 is equally likely for each of the 9,000 source words;
• each target position is equally likely given each source position and target length.

23. English: the
French   Probability        Fertility   Probability
le       .610               1           .871
la       .178               0           .124
l'       .083               2           .004
les      .023
ce       .013
il       .012
de       .009
à        .007
que      .007

24. English: not
French        Probability   Fertility   Probability
pas           .469          2           .758
ne            .460          0           .133
non           .024          1           .106
pas du tout   .003
faux          .003
plus          .002
ce            .002
que           .002
jamais        .002

25. English: hear
French     Probability      Fertility   Probability
bravo      .992             0           .584
entendre   .005             1           .416
entendu    .002
entends    .001

26. Sentence Translation Probability
• Given a translation model for words, we can compute the translation probability of a sentence, taking the parameters into account.
• P(Jean aime Marie | John loves Mary) =
P(Jean|John) × P(1|John) × P(1|1,3) ×
P(aime|loves) × P(1|loves) × P(2|2,3) ×
P(Marie|Mary) × P(1|Mary) × P(3|3,3)
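
Multiplying the product out with hypothetical parameter values (the slides give no numbers for this example) looks like this:

# Hypothetical parameter values for the slide's worked product; each
# factor is word translation * fertility * distortion, as on slide 21.
p_word = {("Jean", "John"): 0.9, ("aime", "loves"): 0.8, ("Marie", "Mary"): 0.9}   # P(t|s)
p_fert = {("John", 1): 0.9, ("loves", 1): 0.85, ("Mary", 1): 0.9}                  # P(n|s)
p_dist = {(1, 1, 3): 0.8, (2, 2, 3): 0.8, (3, 3, 3): 0.8}                          # P(i|j,l)

p = (p_word[("Jean", "John")]  * p_fert[("John", 1)]  * p_dist[(1, 1, 3)]
   * p_word[("aime", "loves")] * p_fert[("loves", 1)] * p_dist[(2, 2, 3)]
   * p_word[("Marie", "Mary")] * p_fert[("Mary", 1)]  * p_dist[(3, 3, 3)])

print(p)   # P(Jean aime Marie | John loves Mary) under these toy values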

27. Flaws in Word-Based Translation
• The model handles many:one translations P(t t t|s), but not one:many translations P(t|s s s), e.g.
• Zeitmangel erschwert das Problem.
• (gloss: lack-of-time makes-more-difficult the problem.)
• Correct translation: Lack of time makes the problem more difficult.
• MT output: Time makes the problem. [from Koehn 2006]

28. Flaws in Word-Based Translation (2)
• Phrasal translation P(t t t|s s s s), e.g. erübrigt sich / there is no point in:
• Eine Diskussion erübrigt sich demnach.
• (gloss: a discussion is-made-unnecessary itself therefore.)
• Correct translation: Therefore, there is no point in a discussion.
• MT output: A debate turned therefore. [from Koehn 2006]

29. Flaws in Word-Based Translation (3)
• Syntactic transformations, e.g. object/subject reordering:
• Den Vorschlag lehnt die Kommission ab.
• (gloss: the proposal rejects the commission off.)
• Correct translation: The commission rejects the proposal.
• MT output: The proposal rejects the commission. [from Koehn 2006]

30. Phrase-Based Translation Models
• Foreign input is segmented into phrases.
• Phrases are any sequences of words, not necessarily linguistically motivated.
• Each phrase is translated into English.
• The translated phrases are reordered. [from Koehn 2006]
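
A minimal sketch of these steps on the sentence used in the Pharaoh demo later in the deck; the phrase table entries and the fixed segmentation are invented for illustration:

# Segment, translate phrase-by-phrase, then (trivially) reorder.
phrase_table = {"das ist": "this is", "ein": "a", "kleines haus": "small house"}

foreign = "das ist ein kleines haus"
segmentation = ["das ist", "ein", "kleines haus"]   # any segmentation is allowed
assert " ".join(segmentation) == foreign

english = [phrase_table[p] for p in segmentation]
# here the monotone order is already correct; in general the decoder
# also scores permutations of the translated phrases
print(" ".join(english))   # this is a small house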

31. Syntax Based Translation Models

32. Word-Based Decoding: Searching for the Best Translation (Brown 1990)
• Maintain a list of hypotheses.
• Initial hypothesis: (Jean aime Marie | *)
• Search proceeds iteratively. At each iteration we extend the most promising hypotheses with additional words:
Jean aime Marie | John(1) *
Jean aime Marie | * loves(2) *
Jean aime Marie | * Mary(3) *
Jean aime Marie | Jean(1) *
• Parenthesised numbers indicate the corresponding position in the target sentence.

33. Phrase-Based Decoding
• Build the translation left to right:
• select foreign word(s) to be translated
• find an English phrase translation
• add the English phrase to the end of the partial translation [Koehn 2006]

34. Decoding Process
• one to many translation (N.B. unlike word based) [Koehn 2006]

35. Decoding Process
• many to one translation [Koehn 2006]

36. Decoding Process
• translation finished [Koehn 2006]

37. Hypothesis Expansion
• Start with the empty hypothesis:
• e: no English words
• f: no foreign words covered
• p: probability 1 [Koehn 2006]
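
In code, this state can be captured by a small record; the field names and the expand helper below are our own sketch, not Pharaoh's internals:

from dataclasses import dataclass

@dataclass(frozen=True)
class Hypothesis:
    e: str                # partial English translation produced so far
    f: frozenset          # positions of foreign words already covered
    p: float              # probability accumulated so far

def expand(hyp, span, phrase, phrase_prob):
    # extend a hypothesis by translating the foreign positions in span
    assert not (span & hyp.f), "each foreign word is translated only once"
    return Hypothesis((hyp.e + " " + phrase).strip(), hyp.f | span, hyp.p * phrase_prob)

empty = Hypothesis(e="", f=frozenset(), p=1.0)        # the empty hypothesis
h = expand(empty, frozenset({0, 1}), "this is", 0.6)
h = expand(h, frozenset({2}), "a", 0.8)
print(h.e, h.p)   # this is a 0.48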

38. Hypothesis Expansion

39. Hypothesis Expansion
• further hypothesis expansion [Koehn 2006]

40. Decoding Process
• Adding more hypotheses leads to an explosion of the search space. [Koehn 2006]

41. Hypothesis Recombination
• Sometimes different hypothesis choices lead to the same translation result.
• Such paths can be combined. [Koehn 2006]

42. Hypothesis Recombination
• Drop the weaker path.
• Keep a pointer from the weaker path. [Koehn 2006]

43. Pruning
• Hypothesis recombination is not sufficient: heuristically discard weak hypotheses early.
• Organise hypotheses in stacks, e.g. by:
• same foreign words covered
• same number of foreign words covered (Pharaoh does this)
• same number of English words produced
• Compare hypotheses within a stack and discard bad ones:
• histogram pruning: keep the top n hypotheses in each stack (e.g. n = 100)
• threshold pruning: keep hypotheses that are at most α times the cost of the best hypothesis in the stack (e.g. α = 0.001)
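
Both pruning rules fit in a few lines. Note the slide states the threshold in terms of cost; working with probabilities, the equivalent test keeps hypotheses within a factor α of the best. A sketch over a stack represented as (hypothesis, probability) pairs:

# Histogram and threshold pruning over one decoder stack (sketch).
def prune(stack, n=100, alpha=0.001):
    stack = sorted(stack, key=lambda h: h[1], reverse=True)
    best = stack[0][1]
    survivors = [h for h in stack if h[1] >= alpha * best]   # threshold pruning
    return survivors[:n]                                      # histogram pruning

stack = [("this is", 0.5), ("this", 0.4), ("it", 0.0000001)]
print(prune(stack, n=2))   # the weak hypothesis is discarded by both rules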

44. Hypothesis Stacks
• Organisation of hypotheses into stacks, here based on the number of foreign words translated.
• During translation, all hypotheses from one stack are expanded.
• Expanded hypotheses are placed into stacks. [Koehn 2006]

45. Comparing Hypotheses Covering the Same Number of Foreign Words
• A hypothesis that covers the easy part of the sentence is preferred.
• We therefore need to consider the future cost of the uncovered parts.
• This should take account of one to many translation. [Koehn 2006]

46. Future Cost Estimation
• Use future cost estimates when pruning hypotheses.
• For each maximal contiguous uncovered span, look up its future cost and add it to the cost actually accumulated so far, for pruning purposes. [Koehn 2006]
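
A sketch of the bookkeeping: precompute the cheapest cost of every contiguous span by dynamic programming, then charge each hypothesis for its maximal uncovered spans. The per-word costs are invented and stand in for negative log probabilities:

# Future cost table over spans [i, j) of the foreign sentence (sketch).
word_cost = [1.2, 0.7, 2.1, 0.9]          # cheapest translation cost per word
n = len(word_cost)

future = [[0.0] * (n + 1) for _ in range(n + 1)]
for length in range(1, n + 1):
    for i in range(n - length + 1):
        j = i + length
        if length == 1:
            future[i][j] = word_cost[i]
        else:
            # best split into two cheaper spans; a real decoder would also
            # consider multi-word phrase options covering [i, j)
            future[i][j] = min(future[i][k] + future[k][j] for k in range(i + 1, j))

def estimate(accumulated, uncovered_spans):
    # accumulated cost plus future cost of each maximal uncovered span
    return accumulated + sum(future[i][j] for i, j in uncovered_spans)

print(estimate(3.0, [(0, 1), (2, 4)]))    # 3.0 + 1.2 + (2.1 + 0.9) = 7.2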

47. Pharaoh
• A beam search decoder for phrase-based models:
• works with various phrase-based models
• beam search algorithm
• time complexity roughly linear in input length
• good quality takes about 1 second per sentence
• Very good performance in the DARPA/NIST evaluation.
• Freely available for researchers: http://www.isi.edu/licensed-sw/pharaoh/
• Coming soon: an open source version of Pharaoh.

48. Pharaoh Demo
% echo 'das ist ein kleines haus' | pharaoh -f pharaoh.ini > out
Pharaoh v1.2.9, written by Philipp Koehn
a beam search decoder for phrase-based statistical machine translation models
(c) 2002-2003 University of Southern California
(c) 2004 Massachusetts Institute of Technology
(c) 2005 University of Edinburgh, Scotland
loading language model from europarl.srilm
loading phrase translation table from phrase-table, stored 21, pruned 0, kept 21
loaded data structures in 2 seconds
reading input sentences
translating 1 sentences. translated 1 sentences in 0 seconds
% cat out
this is a small house

49. Brown Experiment 2
• Translation was performed using the 1,000 most frequent words in the English corpus.
• The French vocabulary comprised the 1,700 French words used most frequently in translations of sentences completely covered by the 1,000-word English vocabulary.
• 117,000 pairs of sentences were completely covered by both vocabularies.
• Parameters of the English language model were estimated from 570,000 sentences in the English part of the corpus.

50. Experiment 2 (contd)
• 73 French sentences from elsewhere in the corpus were tested. Results were classified as:
• Exact: same as the actual translation
• Alternate: same meaning
• Different: a legitimate translation, but with a different meaning
• Wrong: could not be interpreted as a translation
• Ungrammatical: grammatically deficient
• Corrections to the last three categories were made and keystrokes were counted.
