

  1. Direct MT, Example-based MT, Statistical MT

  2. Issues in Machine Translation • Orthography • Writing from left-to-right vs right-to-left • Character sets (alphabetic, logograms, pictograms) • Segmentation into word/word-like units • Morphology • Lexical: Word senses • bank → "river bank", "financial institution" • Syntactic: Word order • Subject-verb-object → subject-object-verb • Semantic: meaning • "ate pasta with a spoon", "ate pasta with marinara", "ate pasta with John" • Pragmatic: world knowledge • "Can you pass me the salt?" • Social: conversational norms • pronoun usage depends on the conversational partner • Cultural: idioms and phrases • "out of the ballpark", "came from leftfield" • Contextual • In addition, for speech translation: • Prosody: JOHN eats bananas; John EATS bananas; John eats BANANAS • Pronunciation differences • Speech recognition errors • In a multilingual environment: • Code switching: use of the linguistic apparatus of one language to express ideas in another language.

  3. MT Approaches: Different levels of meaning transfer • [Diagram: depth of analysis increases from Direct MT at the bottom, through Transfer-based MT (parsing the source into a syntactic structure, syntactic transfer, syntactic generation), up to an Interlingua reached by semantic interpretation and left by semantic generation, connecting Source to Target]

  4. Direct Machine Translation • Words are replaced using a dictionary • Some amount of morphological processing • Word reordering is limited • Quality depends on the size of the dictionary and the closeness of the languages • Spanish: ajá quiero usar mi tarjeta de crédito • English: yeah I wanna use my credit card • Alignment: 1 3 4 5 7 0 6 • English: I need to make a collect call • Japanese: 私は コレクト コールを かける 必要があります • Alignment: 1 5 0 3 0 2 4
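
To make the word-replacement plus limited-reordering idea concrete, here is a minimal Python sketch; the toy dictionary and the precomputed alignment (written as a list of source positions in target order) are illustrative assumptions, not part of the example above.

    # Minimal sketch of direct MT: dictionary lookup plus alignment-driven reordering.
    # The dictionary and the alignment are illustrative toy data, not a real lexicon.
    toy_dict = {"mi": "my", "tarjeta": "card", "de": "of", "credito": "credit"}

    def direct_translate(source_tokens, dictionary, target_order):
        """Gloss each source word via the dictionary, then emit glosses in target
        order (a precomputed alignment); source positions not listed are dropped."""
        glosses = [dictionary.get(w, w) for w in source_tokens]  # unknown words pass through
        return [glosses[i] for i in target_order]

    # "mi tarjeta de credito" -> "my credit card": the alignment swaps "tarjeta"
    # and "credito" and drops the function word "de".
    print(direct_translate("mi tarjeta de credito".split(), toy_dict, [0, 3, 1]))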

  5. Translation Memory • The idea is to reuse translations that were done in the past • Useful for technical terminology • Ideally used for sub-language translation • The system helps in matching new instances against previously translated instances • Choices are presented to a human translator through a GUI • The human translator selects and "stitches" the available options to cover the source-language sentence • If no match is found, the translator introduces a new translation pair into the translation memory. • Pros: • Maintains consistency in translation across multiple translators • Improves efficiency of the translation process • Issues: How is the matching done? • Word-level matching, morphological-root matching • Determines the robustness of the translation memory
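
A translation memory lookup is essentially fuzzy matching over stored pairs. The sketch below is one way to do word-level matching with Python's difflib; the stored pairs reuse the Offset/Save example from the Adaptability slide further down, and the 0.6 threshold is an arbitrary assumption.

    # Sketch of translation-memory lookup: fuzzy word-level matching against stored pairs.
    from difflib import SequenceMatcher

    memory = [
        ("Use the Offset Command to specify the spacing between the shapes.",
         "Mit der Option Abstand legen Sie den Abstand zwischen den Formen fest."),
        ("Use the Save Option to save your changes to disk.",
         "Mit der Option Speichern koennen Sie Ihre Aenderungen auf Diskette speichern."),
    ]

    def tm_lookup(source, memory, threshold=0.6):
        """Return (similarity, stored source, stored target) tuples above the
        threshold, best match first, for a human translator to adapt."""
        src = source.lower().split()
        scored = [(SequenceMatcher(None, src, s.lower().split()).ratio(), s, t)
                  for s, t in memory]
        return sorted([m for m in scored if m[0] >= threshold], reverse=True)

    print(tm_lookup("Use the Offset Command to increase the spacing between the shapes.", memory))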

  6. Example-based MT • [Diagram: MATCHING (analysis) of the Source, ALIGNMENT (transfer), and RECOMBINATION (generation) into the Target; an exact match amounts to direct translation] • Translation-by-analogy: • A collection of source/target text pairs • A matching metric • A word- or phrase-level alignment • A method for recombination • ATR EBMT System (E. Sumita, H. Iida, 1991); CMU Pangloss EBMT (R. Brown, 1996)

  7. Example run of EBMT English-Japanese • Examples in the Corpus: • He buys a notebook → Kare wa noto o kau • I read a book on international politics → Watashi wa kokusai seiji nitsuite kakareta hon o yomu • Translation Input: He buys a book on international politics • Translation Output: Kare wa kokusai seiji nitsuite kakareta hon o kau • Challenge: Finding a good matching metric • He bought a notebook • A book was bought • I read a book on world politics

  8. Variations in EBMT • Database of Sentence Aligned corpus • Analysis of the SL • Depends on how the database is stored • Full sentences, sentence fragments, tree fragments • Matching metric: idea is to arrive at a semantic closeness • Exact match • N-gram match • Fuzzy match • Similarity-based match • Matching with variables • Regeneration of the TL • Depends on how the database produces the output

  9. Issues in EBMT • Parallel corpora • Granularity of examples • Size of example base • Does accuracy improve by growing the example base? • Suitability of examples • Diversity and consistency of examples • Contradictory examples • Exceptional examples • (a) Watashi wa komputa o kyoyosuru → I share the use of a computer • (b) Watashi wa kuruma o tsukau → I use a car • (c) Watashi wa dentaku o shiyosuru → I share the use of a calculator / I use a calculator

  10. Issues in EBMT • How are examples stored? • Context-based examples • "OK" depends on dialog context: • "wakarimashita (I understand)"; • "iidesu yo (I agree)"; • or "ijo desu (let's change the subject)" • Annotated tree structures • E.g. Kanojo wa kami ga nagai (She has long hair) • Trees with linking nodes • Multi-level lattices with typographic, orthographic, lexical, syntactic and other information • POS information, predicate-argument structure, chunks, dependency trees • Generalized examples • Tokenize: dates, names, cities, gender, number, and tense are replaced by generalized tokens • Precision-recall tradeoff • A continuum from plain strings to context-sensitive rules

  11. Issues in EBMT • String based • Sochira ni okeru → We will send it to you • Sochira wa jimukyoku desu → This is the office • Generalized string • X o onegai shimasu → may I speak to the X • X o onegai shimasu → please give me the X • Template format • N1 N2 N3 → N2' N3' for N1' • (N1 = sanka "participation", N2 = moshikomi "application", N3 = yoshi "form") • Distance in a thesaurus is used to select the method.

  12. Issues in EBMT • Matching: • Metric used to measure the similarity of the SL input to the SL sentences in the example database • Exact character-based matching • Edit-distance based matching • Word-based matching • Thesaurus/WordNet-based similarity • A man eats vegetables → Hito wa yasai o taberu • Acid eats metal → San wa kinzoku o okasu • He eats potatoes → Kare wa jagaimo o taberu • Sulphuric acid eats iron → Ryusan wa tetsu o okasu • Thesaurus-free similarity matching based on distributional clustering • Annotated word-based matching • POS-based matching • Relaxation techniques • Exact match → match with deletions and insertions → word-order differences → morphological variants → POS differences
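
Word-level edit distance is the simplest of these metrics to implement. Below is a minimal sketch that retrieves the closest example by Levenshtein distance over tokens; the example base reuses the toy English-Japanese pairs from the EBMT run above.

    # Word-level edit-distance matching for EBMT example retrieval (minimal sketch).
    def edit_distance(a, b):
        """Levenshtein distance between two token sequences."""
        dp = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
              for i in range(len(a) + 1)]
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                dp[i][j] = min(dp[i - 1][j] + 1,                           # deletion
                               dp[i][j - 1] + 1,                           # insertion
                               dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]))  # substitution
        return dp[-1][-1]

    examples = [
        ("he buys a notebook", "kare wa noto o kau"),
        ("i read a book on international politics",
         "watashi wa kokusai seiji nitsuite kakareta hon o yomu"),
    ]

    def closest_example(source, examples):
        src = source.lower().split()
        return min(examples, key=lambda ex: edit_distance(src, ex[0].split()))

    print(closest_example("He buys a book on international politics", examples))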

  13. Matching in EBMT (cont'd) • Structure-based matching • Tree-based edit distance • Case-frame based matching • Partial matching • The entire input need not match an example in the database • Chunks, substrings, and fragments can match • Assembling the TL output is more challenging.

  14. Adaptability and Recombination in EBMT • Problem: • a. Identify which portion of the associated translation corresponds to the matched portion of the source text (adaptability) • b. Recombine the portions in an appropriate manner • Alignment: can be done using statistical techniques or using bilingual dictionaries • Boundary friction problem: for English-Japanese, translations of noun phrases can be reused regardless of whether they are subjects or objects • The handsome boy entered the room • The handsome boy ate his breakfast • I saw the handsome boy • Not in German: • Der schöne Junge aß sein Frühstück • Ich sah den schönen Jungen

  15. Adaptability • Example retrieval can be scored on two counts: • the closeness of the match between the input text and the example, and • the adaptability of the example, on the basis of the relationship between the representations of the example and its translation. • Use the Offset Command to increase the spacing between the shapes. • a. Use the Offset Command to specify the spacing between the shapes. • b. Mit der Option Abstand legen Sie den Abstand zwischen den Formen fest. • a. Use the Save Option to save your changes to disk. • b. Mit der Option Speichern können Sie Ihre Änderungen auf Diskette speichern.

  16. Recombination options are ranked using an n-gram model • a. Ich sah den schönen Jungen. • b. * Ich sah der schöne Junge.
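
A bigram language model is enough to show how these two candidates are ranked. In the sketch below the bigram probabilities are invented stand-ins for a model estimated from target-language text; only their relative order matters.

    # Sketch: rank recombination candidates with a bigram language model.
    import math

    bigram_prob = {  # illustrative values, not real estimates
        ("ich", "sah"): 0.1, ("sah", "den"): 0.05, ("den", "schönen"): 0.04,
        ("schönen", "jungen"): 0.06, ("sah", "der"): 0.001,
        ("der", "schöne"): 0.04, ("schöne", "junge"): 0.05,
    }

    def lm_score(sentence, bigrams, floor=1e-6):
        """Sum of log bigram probabilities; unseen bigrams get a small floor value."""
        words = sentence.lower().split()
        return sum(math.log(bigrams.get(pair, floor)) for pair in zip(words, words[1:]))

    candidates = ["Ich sah den schönen Jungen", "Ich sah der schöne Junge"]
    print(max(candidates, key=lambda c: lm_score(c, bigram_prob)))
    # the grammatical (accusative) candidate wins because its bigrams are more probable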

  17. Flavors of EBMT • EBMT used as a component in an MT system which also has more traditional elements: • EBMT may be used • in parallel with these other “engines”, • or just for certain classes of problems • when some other component cannot deliver a result. • EBMT may be better suited to some kinds of applications than others. • Dividing line between EBMT and so-called “traditional” rule-based approaches may not be obvious.

  18. When to apply EBMT • When one of the following conditions holds true for a linguistic phenomenon, [rule-based] MT is less suitable than EBMT. • (a) Translation rule formation is difficult. • (b) The general rule cannot accurately describe [the] phenomen[on] because it represents a special case. • (c) Translation cannot be made in a compositional way from target words.

  19. Learning translation patterns • Kare wa kuruma o kuji de ateru. • HE-topic CAR-obj LOTTERY-inst STRIKES • (a) Literal: 'He strikes a car with the lottery.' • (b) He wins a car as a prize in the lottery. • Learn pattern (c) to correct (a) to be like (b)

  20. Generation of Translation Templates • "Two phase" EBMT methodology: "learning" of templates (i.e. transfer rules) from a corpus. • Parse the translation pairs; align the syntactic units with the help of a bilingual dictionary. • Generalized by replacing the coupled units with variables marked for syntactic category. • a. X[NP] no nagasa wa saidai 512 baito de aru. → The maximum length of X[NP] is 512 bytes. • b. X[NP] no nagasa wa saidai Y[N] baito de aru. → The maximum length of X[NP] is Y[N] bytes. • Any coupled unit pair can be replaced by variables. Refine templates which give rise to a conflict: • a. play baseball → yakyu o suru • b. play tennis → tenisu o suru • c. play X[NP] → X[NP] o suru • a. play the piano → piano o hiku • b. play the violin → baiorin o hiku • c. play X[NP] → X[NP] o hiku • "Refined" by the addition of "semantic categories": • a. play X[NP/sport] → X[NP] o suru • b. play X[NP/instrument] → X[NP] o hiku • Also, automatic generalization techniques from paired strings

  21. Statistical Machine Translation • Can all the steps of EBMT technique be induced from a parallel corpus? • What are the parameters of such a model? • What are the components of SMT? Slides adapted from Dorr and Monz, Knight, Schafer and Smith

  22. Word-Level Alignments • Given a parallel sentence pair we can link (align) words or phrases that are translations of each other: • Where do we get the sentence pairs from?

  23. Parallel Resources • Newswire: DE-News (German-English), Hong Kong News, Xinhua News (Chinese-English) • Government: Canadian Hansards (French-English), Europarl (Danish, Dutch, English, Finnish, French, German, Greek, Italian, Portuguese, Spanish, Swedish), UN Treaties (Russian, English, Arabic, . . . ) • Manuals: PHP, KDE, OpenOffice (all from OPUS, many languages) • Web pages: STRAND project (Philip Resnik)

  24. Sentence Alignment • If document De is a translation of document Df, how do we find the translation of each sentence? • The n-th sentence in De is not necessarily the translation of the n-th sentence in document Df • In addition to 1:1 alignments, there are also 1:0, 0:1, 1:n, and n:1 alignments • Approximately 90% of sentence alignments are 1:1

  25. Sentence Alignment (cont'd) • There are several sentence alignment algorithms: • Align (Gale & Church): aligns sentences based on their character length (shorter sentences tend to have shorter translations than longer sentences). Works astonishingly well • Char-align (Church): aligns based on shared character sequences. Works fine for similar languages or technical domains • K-Vec (Fung & Church): induces a translation lexicon from the parallel texts based on the distribution of foreign-English word pairs.
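
The length-based idea behind Gale & Church can be sketched as a per-pair cost that is small when character lengths are proportional and large otherwise; a dynamic program over candidate 1:1, 1:0, 0:1, 2:1, and 1:2 beads would then minimise the summed cost. The expansion ratio c and the squared z-score below are simplified stand-ins for the published model.

    # Sketch of length-based sentence-alignment scoring in the spirit of Gale & Church.
    import math

    def length_cost(len_e, len_f, c=1.0, s2=6.8):
        """Cost of aligning a source span of len_e characters with a target span of
        len_f characters: small when the lengths are proportional, large otherwise."""
        mean = (len_e + len_f / c) / 2.0
        delta = (len_f - c * len_e) / math.sqrt(s2 * mean)
        return delta * delta  # a squared z-score stands in for the full -log probability

    print(length_cost(42, 45))   # similar lengths -> low cost
    print(length_cost(42, 90))   # very different lengths -> high cost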

  26. Computing Translation Probabilities • Given a parallel corpus we can estimate P(e | f). The maximum likelihood estimate of P(e | f) is freq(e, f) / freq(f) • Way too specific to get any reasonable frequencies! The vast majority of unseen data will have zero counts! • P(e | f) could instead be re-defined over individual words rather than whole sentences • Problem: the English words maximizing P(e | f) might not result in a readable sentence
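
As a concrete illustration, the sketch below computes the same frequency-ratio estimate over word co-occurrences within sentence pairs rather than over whole sentences (a deliberate simplification of the word-level re-definition); the toy pairs reuse the la maison / the house corpus from the slides that follow.

    # Word-level MLE estimate P(e | f) = freq(e, f) / freq(f) from co-occurrence counts.
    from collections import Counter

    pairs = [("the house", "la maison"), ("the blue house", "la maison bleue"),
             ("the flower", "la fleur")]

    cooc = Counter()     # freq(e, f): e occurs in a pair whose foreign side contains f
    f_count = Counter()  # freq(f)
    for e_sent, f_sent in pairs:
        for f in f_sent.split():
            f_count[f] += 1
            for e in e_sent.split():
                cooc[(e, f)] += 1

    def p_mle(e, f):
        return cooc[(e, f)] / f_count[f]

    print(p_mle("house", "maison"))  # 2/2 = 1.0
    print(p_mle("the", "la"))        # 3/3 = 1.0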

  27. Decoding • The decoder combines the evidence from P(e) and P(f | e) to find the sequence e that is the best translation: ê = argmax_e P(e | f) = argmax_e P(e) · P(f | e) • The choice of word e' as the translation of f' depends on the translation probability P(f' | e') and on the context, i.e. the other English words preceding e'
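
A toy rescoring loop makes the decoder's job concrete: add the log language-model and log translation-model scores and keep the best candidate. All numbers below are invented placeholders, not trained values.

    # Sketch of noisy-channel rescoring: pick e maximising P(e) * P(f | e).
    import math

    candidates = {
        # candidate English : (log P(e) from the language model, log P(f | e) from the translation model)
        "that is fair":    (math.log(0.020), math.log(0.30)),
        "that is just":    (math.log(0.015), math.log(0.45)),
        "that is correct": (math.log(0.025), math.log(0.05)),
    }

    def channel_score(log_pe, log_pfe):
        return log_pe + log_pfe  # log [ P(e) * P(f | e) ]

    best = max(candidates, key=lambda e: channel_score(*candidates[e]))
    print(best)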

  28. Noisy Channel Model for Translation

  29. Translation Modeling • Determines the probability that the foreign word f is a translation of the English word e • How to compute P(f | e) from a parallel corpus? • Statistical approaches rely on the co-occurrence of e and f in the parallel data: If e and f tend to co-occur in parallel sentence pairs, they are likely to be translations of one another

  30. Finding Translations in a Parallel Corpus • Into which foreign words f, . . . , f’ does e translate? • Commonly, four factors are used: • How often do e and f co-occur? (translation) • How likely is a word occurring at position i to translate into a word occurring at position j? (distortion) For example: English is a verb-second language, whereas German is a verb-final language • How likely is e to translate into more than one word? (fertility) For example: defeated can translate into eine Niederlage erleiden • How likely is a foreign word to be spuriously generated? (null translation)

  31. Translation Model? • Generative approach: Mary did not slap the green witch → source-language morphological analysis → source parse tree → semantic representation → generate target structure → Maria no dió una bofetada a la bruja verde

  32. Translation Model? • Generative story: Mary did not slap the green witch → source-language morphological analysis → source parse tree → semantic representation → generate target structure → Maria no dió una bofetada a la bruja verde • What are all the possible moves and their associated probability tables?

  33. The Classic Translation Model: Word Substitution/Permutation [IBM Model 3, Brown et al., 1993] • Generative approach: Mary did not slap the green witch • n(3|slap): Mary not slap slap slap the green witch • P-Null: Mary not slap slap slap NULL the green witch • t(la|the): Maria no dió una bofetada a la verde bruja • d(j|i): Maria no dió una bofetada a la bruja verde • Probabilities can be learned from raw bilingual text.
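
The generative story can be written down almost literally. The sketch below walks through fertility and lexical translation for the slide's example; the fertility and lexicon tables are hand-written stand-ins for learned parameters, NULL insertion is omitted, and the distortion step is left as the identity permutation.

    # Sketch of the IBM Model 3 generative story: fertility n(k|e), translation t(f|e),
    # distortion d(j|i). All tables are illustrative stand-ins, not learned parameters.
    fertility = {"Mary": 1, "not": 1, "slap": 3, "the": 1, "green": 1, "witch": 1}
    lexicon = {"Mary": ["Maria"], "not": ["no"], "slap": ["dió", "una", "bofetada"],
               "the": ["la"], "green": ["verde"], "witch": ["bruja"]}

    def model3_generate(english):
        translated = []
        for w in english:
            options = lexicon.get(w, [w])
            # 1. Fertility: copy the word n(k | e) times.
            # 2. Translation: each copy picks a target word, so "slap slap slap"
            #    becomes "dió una bofetada".
            for k in range(fertility.get(w, 1)):
                translated.append(options[k % len(options)])
        # 3. NULL insertion of spurious words (the "a" in the slide) is omitted here.
        # 4. Distortion d(j | i) would permute positions; identity permutation here.
        return translated

    print(" ".join(model3_generate("Mary not slap the green witch".split())))
    # Maria no dió una bofetada la verde bruja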

  34. Statistical Machine Translation … la maison … la maison bleue … la fleur … … the house … the blue house … the flower … All word alignments equally likely All P(french-word | english-word) equally likely

  35. Statistical Machine Translation … la maison … la maison bleue … la fleur … … the house … the blue house … the flower … “la” and “the” observed to co-occur frequently, so P(la | the) is increased.

  36. Statistical Machine Translation … la maison … la maison bleue … la fleur … … the house … the blue house … the flower … “house” co-occurs with both “la” and “maison”, but P(maison | house) can be raised without limit, to 1.0, while P(la | house) is limited because of “the” (pigeonhole principle)

  37. Statistical Machine Translation … la maison … la maison bleue … la fleur … … the house … the blue house … the flower … settling down after another iteration

  38. Statistical Machine Translation … la maison … la maison bleue … la fleur … … the house … the blue house … the flower … • Inherent hidden structure revealed by EM training! • For details, see: • "A Statistical MT Tutorial Workbook" (Knight, 1999). • "The Mathematics of Statistical Machine Translation" (Brown et al., 1993) • Software: GIZA++

  39. Statistical Machine Translation … la maison … la maison bleue … la fleur … … the house … the blue house … the flower … • Given a new French sentence: P(juste | fair) = 0.411, P(juste | correct) = 0.027, P(juste | right) = 0.020, … • Possible English translations, to be rescored by the language model

  40. IBM Models 1–5 • Model 1: Bag of words • Unique local maxima • Efficient EM algorithm (Model 1–2) • Model 2: General alignment: • Model 3: fertility: n(k | e) • No full EM, count only neighbors (Model 3–5) • Deficient (Model 3–4) • Model 4: Relative distortion, word classes • Model 5: Extra variables to avoid deficiency

  41. IBM Model 1 • Given an English sentence e1 . . . el and a foreign sentence f1 . . . fm • We want to find the 'best' alignment a, where a is a set of pairs of the form {(i, j), . . . , (i', j')}, • 0 ≤ i, i' ≤ l and 1 ≤ j, j' ≤ m • Note that if (i, j) and (i', j) are in a, then i equals i', i.e. no many-to-one alignments are allowed • Note that we add a spurious NULL word to the English sentence at position 0 • In total there are (l + 1)^m different alignments A • Allowing many-to-many alignments results in (2^l)^m possible alignments A

  42. IBM Model 1 • Simplest of the IBM models • Does not consider word order (bag-of-words approach) • Does not model one-to-many alignments • Computationally inexpensive • Useful for parameter estimations that are passed on to more elaborate models

  43. IBM Model 1 • Translation probability in terms of alignments: P(f | e) = Σ_a P(f, a | e) • where: P(f, a | e) = P(a | e) · Π_{j=1..m} P(f_j | e_{a_j}) • and: P(a | e) = 1 / (l + 1)^m (uniform over all alignments)

  44. IBM Model 1 • We want to find the most likely alignment: â = argmax_a P(f, a | e) • Since P(a | e) is the same for all a: â = argmax_a Π_{j=1..m} P(f_j | e_{a_j}) • Problem: We still have to enumerate all alignments

  45. IBM Model 1 • Since P(f_j | e_i) is independent of P(f_j' | e_i'), we can find the maximum alignment by looking at the individual translation probabilities only • For each a_j: a_j = argmax_{0 ≤ i ≤ l} P(f_j | e_i) • The best alignment can be computed in a quadratic number of steps: (l + 1) · m
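
Because the positions are independent under Model 1, the best alignment is just a per-position argmax. A minimal sketch, with an invented translation table:

    # Best IBM Model 1 alignment: a_j = argmax_i P(f_j | e_i) for each foreign position j,
    # computed in (l + 1) * m steps. The translation table is an illustrative placeholder.
    t = {  # P(f | e)
        ("la", "the"): 0.7, ("la", "NULL"): 0.2, ("la", "house"): 0.1,
        ("maison", "house"): 0.8, ("maison", "the"): 0.1, ("maison", "NULL"): 0.1,
    }

    def best_alignment(english, foreign, t):
        english = ["NULL"] + english  # position 0 holds the spurious NULL word
        alignment = []
        for j, f in enumerate(foreign, start=1):
            best_i = max(range(len(english)), key=lambda i: t.get((f, english[i]), 0.0))
            alignment.append((best_i, j))
        return alignment

    print(best_alignment(["the", "house"], ["la", "maison"], t))
    # [(1, 1), (2, 2)]: "la" aligns to "the", "maison" to "house"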

  46. Computing Model 1 Parameters • How to compute translation probabilities for model 1 from a parallel corpus? • Step 1: Determine candidates. For each English word e collect all foreign words f that co-occur at least once with e • Step 2: Initialize P(f | e) uniformly, i.e. P(f | e) = 1/(no of co-occurring foreign words)

  47. Computing Model 1 Parameters • Step 3: Iteratively refine translation probabilities:

    for n iterations:
        set tc to zero
        for each sentence pair (e, f) of lengths (l, m):
            for j = 1 to m:
                total = 0
                for i = 1 to l:
                    total += P(fj | ei)
                for i = 1 to l:
                    tc(fj | ei) += P(fj | ei) / total
        for each word e:
            total = 0
            for each word f s.t. tc(f | e) is defined:
                total += tc(f | e)
            for each word f s.t. tc(f | e) is defined:
                P(f | e) = tc(f | e) / total
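
The pseudocode translates almost line for line into Python. The sketch below follows the same steps (candidate collection, uniform initialization, iterative refinement with expected counts tc), with a NULL word added to each English sentence as in the worked example that follows.

    # Runnable sketch of IBM Model 1 EM training, mirroring the pseudocode above.
    from collections import defaultdict

    def train_model1(pairs, iterations=5):
        pairs = [(["NULL"] + e.split(), f.split()) for e, f in pairs]
        # Steps 1+2: collect candidates and initialize P(f | e) uniformly.
        candidates = defaultdict(set)
        for e_sent, f_sent in pairs:
            for e in e_sent:
                candidates[e].update(f_sent)
        p = {(f, e): 1.0 / len(fs) for e, fs in candidates.items() for f in fs}
        # Step 3: iteratively refine with expected counts tc.
        for _ in range(iterations):
            tc = defaultdict(float)
            for e_sent, f_sent in pairs:
                for f in f_sent:
                    total = sum(p[(f, e)] for e in e_sent)
                    for e in e_sent:
                        tc[(f, e)] += p[(f, e)] / total
            for e, fs in candidates.items():
                norm = sum(tc[(f, e)] for f in fs)
                for f in fs:
                    p[(f, e)] = tc[(f, e)] / norm
        return p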

  48. IBM Model 1 Example • Parallel ‘corpus’: the dog :: le chien the cat :: le chat • Step 1+2 (collect candidates and initialize uniformly): P(le | the) = P(chien | the) = P(chat | the) = 1/3 P(le | dog) = P(chien | dog) = P(chat | dog) = 1/3 P(le | cat) = P(chien | cat) = P(chat | cat) = 1/3 P(le | NULL) = P(chien | NULL) = P(chat | NULL) = 1/3

  49. IBM Model 1 Example • Step 3: Iterate • NULL the dog :: le chien • j=1 total = P(le | NULL)+P(le | the)+P(le | dog)= 1 tc(le | NULL) += P(le | NULL)/1 = 0 += .333/1 = 0.333 tc(le | the) += P(le | the)/1 = 0 += .333/1 = 0.333 tc(le | dog) += P(le | dog)/1 = 0 += .333/1 = 0.333 • j=2 total = P(chien | NULL)+P(chien | the)+P(chien | dog)=1 tc(chien | NULL) += P(chien | NULL)/1 = 0 += .333/1 = 0.333 tc(chien | the) += P(chien | the)/1 = 0 += .333/1 = 0.333 tc(chien | dog) += P(chien | dog)/1 = 0 += .333/1 = 0.333

  50. IBM Model 1 Example • NULL the cat :: le chat • j=1 total = P(le | NULL)+P(le | the)+P(le | cat)=1 tc(le | NULL) += P(le | NULL)/1 = 0.333 += .333/1 = 0.666 tc(le | the) += P(le | the)/1 = 0.333 += .333/1 = 0.666 tc(le | cat) += P(le | cat)/1 = 0 += .333/1 = 0.333 • j=2 total = P(chat | NULL)+P(chat | the)+P(chat | cat)=1 tc(chat | NULL) += P(chat | NULL)/1 = 0 += .333/1 = 0.333 tc(chat | the) += P(chat | the)/1 = 0 += .333/1 = 0.333 tc(chat | cat) += P(chat | cat)/1 = 0 += .333/1 = 0.333
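
Running the Python sketch from the Step 3 slide on this toy corpus reproduces the iteration above; after a few more iterations the probabilities concentrate, with chien claimed by dog, chat by cat, and le settling under the and NULL.

    # Run the train_model1 sketch on the toy corpus from the example slides.
    pairs = [("the dog", "le chien"), ("the cat", "le chat")]
    p = train_model1(pairs, iterations=10)
    for (f, e), prob in sorted(p.items(), key=lambda kv: (kv[0][1], -kv[1])):
        print(f"P({f} | {e}) = {prob:.3f}")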
