
Introduction to MT



  1. Introduction to MT Ling 580 Fei Xia Week 1: 1/03/06

  2. Outline • Course overview • Introduction to MT • Major challenges • Major approaches • Evaluation of MT systems • Overview of word-based SMT

  3. Course overview

  4. General info • Course website: • Syllabus (incl. slides and papers): updated every week. • Message board • ESubmit • Office hour: Fri: 10:30am-12:30pm. • Prerequisites: • Ling570 and Ling571. • Programming: C or C++, Perl is a plus. • Introduction to probability and statistics

  5. Expectations • Reading: • Papers are online • Finish reading before class. Bring your questions to class. • Grade: • Leading discussion (1-2 papers): 50% • Project: 40% • Class participation: 10% • No quizzes, exams

  6. Leading discussion • Indicate your choice via EPost by Jan 8. • You might want to read related papers. • Make slides with PowerPoint. • Email me your slides by 3:30am on the Monday before your presentation. • Present the paper in class and lead the discussion: 40-50 minutes.

  7. Project • Details will be available soon. • Project presentation: 3/7/06 • Final report: due on 3/12/06 • Pongo account will be ready soon.

  8. Introduction to MT

  9. A brief history of MT (Based on work by John Hutchins) • Before the computer: In the mid 1930s, a French-Armenian Georges Artsrouni and a Russian Petr Troyanskii applied for patents for ‘translating machines’. • The pioneers (1947-1954): the first public MT demo was given in 1954 (by IBM and Georgetown University). • The decade of optimism (1954-1966): ALPAC (Automatic Language Processing Advisory Committee) report in 1966: "there is no immediate or predictable prospect of useful machine translation."

  10. A brief history of MT (cont) • The aftermath of the ALPAC report (1966-1980): a virtual end to MT research • The 1980s: Interlingua, example-based MT • The 1990s: Statistical MT • The 2000s: Hybrid MT

  11. Where are we now? • Huge potential/need due to the internet, globalization and international politics. • Quick development time due to SMT, the availability of parallel data and computers. • Translation quality is reasonable for language pairs with a large amount of resources. • Systems are starting to include more “minor” languages.

  12. What is MT good for? • Rough translation: web data • Computer-aided human translation • Translation for limited domain • Cross-lingual IR • Machines are better than humans in: • Speed: much faster than humans • Memory: can easily memorize millions of word/phrase translations. • Manpower: machines are much cheaper than humans • Fast learner: it takes minutes or hours to build a new system. Erasable memory. • Never complain, never get tired, …

  13. Major challenges in MT

  14. Translation is hard • Novels • Word play, jokes, puns, hidden messages • Concept gaps: go Greek, bei fen • Other constraints: lyrics, dubbing, poem, …

  15. Major challenges • Getting the right words: • Choosing the correct root form • Getting the correct inflected form • Inserting “spontaneous” words • Putting the words in the correct order: • Word order: SVO vs. SOV, … • Unique constructions: • Divergence

  16. Lexical choice • Homonymy/Polysemy: bank, run • Concept gap: no corresponding concepts in another language: go Greek, go Dutch, fen sui, lame duck, … • Coding (Concept → lexeme mapping) differences: • More distinction in one language: e.g., kinship vocabulary. • Different division of conceptual space:

  17. Choosing the appropriate inflection • Inflection: gender, number, case, tense, … • Ex: • Number: Ch-Eng: all the concrete nouns: ch_book → book, books • Gender: Eng-Fr: all the adjectives • Case: Eng-Korean: all the arguments • Tense: Ch-Eng: all the verbs: ch_buy → buy, bought, will buy

  18. Inserting spontaneous words • Function words: • Determiners: Ch-Eng: ch_book → a book, the book, the books, books • Prepositions: Ch-Eng: … ch_November → … in November • Relative pronouns: Ch-Eng: … ch_buy ch_book de ch_person → the person who bought /book/ • Possessive pronouns: Ch-Eng: ch_he ch_raise ch_hand → He raised his hand(s) • Conjunction: Eng-Ch: Although S1, S2 → ch_although S1, ch_but S2 • …

  19. Inserting spontaneous words (cont) • Content words: • Dropped argument: Ch-Eng: ch_buy le ma → Has Subj bought Obj? • Chinese First name: Eng-Ch: Jiang … → ch_Jiang ch_Zemin … • Abbreviation, Acronyms: Ch-Eng: ch_12 ch_big → the 12th National Congress of the CPC (Communist Party of China) • …

  20. Major challenges • Getting the right words: • Choosing the correct root form • Getting the correct inflected form • Inserting “spontaneous” words • Putting the words in the correct order: • Word order: SVO vs. SOV, … • Unique construction: • Structural divergence

  21. Word order • SVO, SOV, VSO, … • VP + PP → PP + VP • VP + AdvP → AdvP + VP • Adj + N → N + Adj • NP + PP → PP + NP • NP + S → S + NP • P + NP → NP + P

  22. “Unique” Constructions • Overt wh-movement: Eng-Ch: • Eng: Why do you think that he came yesterday? • Ch: you why think he yesterday come ASP? • Ch: you think he yesterday why come? • Ba-construction: Ch-Eng • She ba homework finish ASP → She finished her homework. • He ba wall dig ASP CL hole → He dug a hole in the wall. • She ba orange peel ASP skin → She peeled the orange’s skin.

  23. Translation divergences • Source and target parse trees (dependency trees) are not identical. • Example: I like Mary → Spanish: María me gusta a mí (‘Mary pleases me’) • More discussion next time.

  24. Major approaches

  25. How do humans do translation? • Learn a foreign language (training stage): • Memorize word translations (cf. translation lexicon) • Learn some patterns (cf. templates, transfer rules) • Exercise (cf. reinforced learning? reranking?): • Passive activity: read, listen • Active activity: write, speak • Translation (decoding stage): • Understand the sentence (cf. parsing, semantic analysis?) • Clarify or ask for help (optional) (cf. interactive MT?) • Translate the sentence (cf. word-level? phrase-level? generate from meaning?)

  26. What kinds of resources are available to MT? • Translation lexicon: • Bilingual dictionary • Templates, transfer rules: • Grammar books • Parallel data, comparable data • Thesaurus, WordNet, FrameNet, … • NLP tools: tokenizer, morph analyzer, parser, … → More resources for major languages, fewer for “minor” languages.

  27. Major approaches • Transfer-based • Interlingua • Example-based (EBMT) • Statistical MT (SMT) • Hybrid approach

  28. The MT triangle [Figure: triangle with Meaning (interlingua) at the apex and sides labeled Analysis and Synthesis; levels from apex to base: Transfer-based; Phrase-based SMT, EBMT; Word-based SMT, EBMT; base: word to word]

  29. Transfer-based MT • Analysis, transfer, generation: • Parse the source sentence • Transform the parse tree with transfer rules • Translate source words • Get the target sentence from the tree • Resources required: • Source parser • A translation lexicon • A set of transfer rules • An example: Mary bought a book yesterday.

  30. Transfer-based MT (cont) • Parsing: linguistically motivated grammar or formal grammar? • Transfer: • context-free rules? A path on a dependency tree? • Apply at most one rule at each level? • How are rules created? • Translating words: word-to-word translation? • Generation: using LM or other additional knowledge? • How to create the needed resources automatically?

  31. Interlingua • For n languages, translating directly between every pair requires n(n-1) MT systems. • Interlingua uses a language-independent representation. • Conceptually, Interlingua is elegant: we only need n analyzers and n generators. • Resources needed: • A language-independent representation • Sophisticated analyzers • Sophisticated generators
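
The counting argument on this slide can be checked with a few lines of Python. This is only an illustrative sketch; the function names are made up for this example and do not come from the lecture.

```python
def direct_systems(n: int) -> int:
    """One system per ordered (source, target) language pair."""
    return n * (n - 1)


def interlingua_components(n: int) -> int:
    """One analyzer plus one generator per language."""
    return 2 * n


# Compare the two counts for a few values of n.
for n in (2, 3, 10, 20):
    print(n, direct_systems(n), interlingua_components(n))
```

The direct count grows quadratically while the interlingua count grows linearly, so the gap widens quickly as languages are added.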

  32. Interlingua (cont) • Questions: • Does language-independent meaning representation really exist? If so, what does it look like? • It requires deep analysis: how to get such an analyzer: e.g., semantic analysis • It requires non-trivial generation: How is that done? • It forces disambiguation at various levels: lexical, syntactic, semantic, discourse levels. • It cannot take advantage of similarities between a particular language pair.

  33. Example-based MT • Basic idea: translate a sentence by using the closest match in parallel data. • First proposed by Nagao (1981). • Ex: • Training data: • w1 w2 w3 w4 → w1’ w2’ w3’ w4’ • w5 w6 w7 → w5’ w6’ w7’ • w8 w9 → w8’ w9’ • Test sent: • w1 w2 w6 w7 w9 → w1’ w2’ w6’ w7’ w9’

  34. EBMT (cont) • Types of EBMT: • Lexical (shallow) • Morphological / POS analysis • Parse-tree based (deep) • Types of data required by EBMT systems: • Parallel text • Bilingual dictionary • Thesaurus for computing semantic similarity • Syntactic parser, dependency parser, etc.

  35. EBMT (cont) • Word alignment: using dictionary and heuristics → exact match • Generalization: • Clusters: dates, numbers, colors, shapes, etc. • Clusters can be built by hand or learned automatically. • Ex: • Exact match: 12 players met in Paris last Tuesday → 12 Spieler trafen sich letzten Dienstag in Paris • Templates: $num players met in $city $time → $num Spieler trafen sich $time in $city
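
A minimal sketch of how the template on this slide could generalize beyond the exact match. The cluster contents (`CITY`, `TIME`), the regular expression, and the function name are hypothetical and hand-built for illustration; a real EBMT system would learn or curate far larger clusters.

```python
import re

# Hypothetical hand-built clusters mapping source items to target items.
CITY = {"Paris": "Paris", "Berlin": "Berlin"}
TIME = {"last Tuesday": "letzten Dienstag"}

# Source side of the template: $num players met in $city $time
SRC_PATTERN = re.compile(
    r"^(?P<num>\d+) players met in (?P<city>\w+) (?P<time>.+)$"
)
# Target side of the template: $num Spieler trafen sich $time in $city
TGT_TEMPLATE = "{num} Spieler trafen sich {time} in {city}"


def translate_with_template(sentence):
    """Return the templated translation, or None if the template does not apply."""
    match = SRC_PATTERN.match(sentence)
    if match is None:
        return None
    city, time = match.group("city"), match.group("time")
    if city not in CITY or time not in TIME:
        return None  # slot filler is not in a known cluster
    return TGT_TEMPLATE.format(
        num=match.group("num"), time=TIME[time], city=CITY[city]
    )


print(translate_with_template("12 players met in Paris last Tuesday"))
print(translate_with_template("7 players met in Berlin last Tuesday"))
```

The second call shows the point of generalization: the sentence never appeared in the training data, but the template still covers it because each slot filler belongs to a known cluster.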

  36. Statistical MT • Basic idea: learn all the parameters from parallel data. • Major types: • Word-based • Phrase-based • Strengths: • Easy to build, and it requires no human knowledge • Good performance when a large amount of training data is available. • Weaknesses: • How to express linguistic generalization?

  37. Comparison of resource requirement

  38. Hybrid MT • Basic idea: combine strengths of different approaches: • Syntax-based: generalization at syntactic level • Interlingua: conceptually elegant • EBMT: memorizing translation of n-grams; generalization at various levels. • SMT: fully automatic; using LM; optimizing some objective functions. • Types of hybrid MT: • Borrowing concepts/methods: • SMT from EBMT: phrase-based SMT; Alignment templates • EBMT from SMT: automatically learned translation lexicon • Transfer-based from SMT: automatically learned translation lexicon, transfer rules; using LM • … • Using two MTs in a pipeline: • Using transfer-based MT as a preprocessor of SMT • Using multiple MTs in parallel, then adding a re-ranker.

  39. Evaluation of MT

  40. Evaluation • Unlike many NLP tasks (e.g., tagging, chunking, parsing, IE, pronoun resolution), there is no single gold standard for MT. • Human evaluation: accuracy, fluency, … • Problem: expensive, slow, subjective, non-reusable. • Automatic measures: • Edit distance • Word error rate (WER), Position-independent WER (PER) • Simple string accuracy (SSA), Generation string accuracy (GSA) • BLEU

  41. Edit distance • The edit distance (a.k.a. Levenshtein distance) is defined as the minimal cost of transforming str1 into str2, using three operations (substitution, insertion, deletion). • It is computed with dynamic programming (DP) in O(m*n) time, where m and n are the lengths of str1 and str2.
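
A short sketch of the dynamic program, assuming unit cost for each of the three operations (the slide does not state the costs). It works on any two sequences, so it can compare strings character by character or sentences token by token.

```python
def edit_distance(src, tgt):
    """Levenshtein distance between two sequences, with unit costs."""
    m, n = len(src), len(tgt)
    # dist[i][j] = cost of turning src[:i] into tgt[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i  # delete all i items
    for j in range(n + 1):
        dist[0][j] = j  # insert all j items
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub_cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
                dist[i - 1][j - 1] + sub_cost,  # substitution or match
            )
    return dist[m][n]


print(edit_distance("w1 w2 w3 w4".split(), "w1 w3 w2".split()))
```

The table has (m+1)*(n+1) cells and each cell is filled in constant time, which gives the O(m*n) bound.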

  42. WER, PER, and SSA • WER (word error rate) is edit distance, divided by |Ref|. • PER (position-independent WER): same as WER but disregards word ordering • SSA (Simple string accuracy) = 1 - WER • Previous example: • Sys: w1 w2 w3 w4 • Ref: w1 w3 w2 • Edit distance = 2 • WER=2/3 • PER=1/3 • SSA=1/3
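
The three measures can be sketched as follows. The edit distance is passed in as a number, and PER is computed from bag-of-words overlap with a penalty for surplus system words; that PER formula is a common formulation and is an assumption here, since the slide gives only the informal definition.

```python
from collections import Counter
from fractions import Fraction


def wer(distance, ref_len):
    """Word error rate: edit distance divided by reference length."""
    return Fraction(distance, ref_len)


def per(sys_tokens, ref_tokens):
    """Position-independent error rate (assumed bag-of-words formulation)."""
    matches = sum((Counter(sys_tokens) & Counter(ref_tokens)).values())
    surplus = max(0, len(sys_tokens) - len(ref_tokens))
    return 1 - Fraction(matches - surplus, len(ref_tokens))


def ssa(distance, ref_len):
    """Simple string accuracy: 1 - WER."""
    return 1 - wer(distance, ref_len)


sys_tokens = "w1 w2 w3 w4".split()
ref_tokens = "w1 w3 w2".split()
print(wer(2, len(ref_tokens)), per(sys_tokens, ref_tokens), ssa(2, len(ref_tokens)))
```

Exact fractions are used so the results can be compared directly with the values on the slide.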

  43. Generation string accuracy (GSA) • Example: • Ref: w1 w2 w3 w4 • Sys: w2 w3 w4 w1 • Del=1, Ins=1 → SSA=1/2 • Move=1, Del=0, Ins=0 → GSA=3/4
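
The arithmetic behind this example can be sketched with one helper that takes the operation counts directly. The formula, one minus the total number of operations over the reference length with a paired deletion and insertion of the same word counted as a single move for GSA, is assumed from the usual definition; finding the moves automatically is not attempted here.

```python
from fractions import Fraction


def string_accuracy(ref_len, ins=0, dele=0, sub=0, moves=0):
    """1 - (moves + insertions + deletions + substitutions) / |Ref|."""
    return 1 - Fraction(moves + ins + dele + sub, ref_len)


# SSA: moving w1 to the end costs one deletion plus one insertion.
print(string_accuracy(4, ins=1, dele=1))
# GSA: the same change is counted as a single move.
print(string_accuracy(4, moves=1))
```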

  44. BLEU • Proposed by Papineni et al. (2002) • Most widely used in MT community. • BLEU is a weighted geometric mean of the n-gram precisions (pn) between system output and all references, multiplied by a brevity penalty (BP).

  45. N-gram precision • N-gram precision: the percent of n-grams in the system output that are correct. • Clipping: • Sys: the the the the the the • Ref: the cat sat on the mat • Unigram precision: 6/6 without clipping, 2/6 with clipping (“the” occurs only twice in the reference) • Max_Ref_count: the max number of times an n-gram occurs in any single reference translation.

  46. N-gram precision i.e. the percent of n-grams in the system output that are correct (after clipping).
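
A small sketch of clipped n-gram precision, following the Max_Ref_count definition above: each system n-gram is credited at most as many times as it occurs in any single reference. The function names are illustrative, and the input is assumed to be already tokenized and lowercased.

```python
from collections import Counter
from fractions import Fraction


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(sys_tokens, references, n):
    """Clipped n-gram precision of one system sentence against its references."""
    sys_counts = ngrams(sys_tokens, n)
    total = sum(sys_counts.values())
    if total == 0:
        return Fraction(0)  # sentence shorter than n
    max_ref = Counter()
    for ref in references:
        max_ref |= ngrams(ref, n)  # Counter union keeps the max count per n-gram
    clipped = sum((sys_counts & max_ref).values())  # intersection keeps the min
    return Fraction(clipped, total)


sys_tokens = "the the the the the the".split()
refs = ["the cat sat on the mat".split()]
print(clipped_precision(sys_tokens, refs, 1))
```

For the clipping example, the six occurrences of "the" are clipped to two, so the unigram precision is 2/6, which the exact fraction reduces to 1/3.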

  47. Brevity Penalty • For each sent si in system output, find closest matching reference ri (in terms of length). • Longer system output is already penalized by the n-gram precision measure.
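
The slide's formula did not survive in this transcript. The sketch below assumes the standard definition from the BLEU paper: with c the total system length and r the total length of the closest-matching references, BP = 1 if c > r and exp(1 - r/c) otherwise. Breaking length ties toward the shorter reference is a further assumption.

```python
import math


def closest_ref_length(sys_len, ref_lens):
    """Reference length closest to sys_len (ties go to the shorter one: an assumption)."""
    return min(ref_lens, key=lambda r: (abs(r - sys_len), r))


def brevity_penalty(sys_sents, refs_per_sent):
    """Corpus-level brevity penalty over tokenized sentences."""
    c = sum(len(s) for s in sys_sents)
    r = sum(
        closest_ref_length(len(s), [len(ref) for ref in refs])
        for s, refs in zip(sys_sents, refs_per_sent)
    )
    if c == 0:
        return 0.0
    return 1.0 if c > r else math.exp(1 - r / c)


sys_sents = ["the cat was on the mat".split()]
refs_per_sent = [["the cat sat on a mat".split(),
                  "there was a cat on the mat".split()]]
print(brevity_penalty(sys_sents, refs_per_sent))
```

Only output that is shorter than its references is penalized here, which matches the remark that overly long output is already penalized through the n-gram precisions.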

  48. An example • Sys: The cat was on the mat • Ref1: The cat sat on a mat • Ref2: There was a cat on the mat • Assuming N=3 • p1=5/6, p2=3/5, p3=1/4, BP=1 → BLEU=0.50 • What if N=4?
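
The numbers in this example can be reproduced with a compact sentence-level sketch. It assumes uniform weights 1/N, lowercased tokens (p1 = 5/6 only comes out if "The" and "the" are treated as the same word), and the standard brevity penalty; it is an illustration, not a reference implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(sys_tokens, references, max_n):
    """Sentence-level BLEU with uniform weights 1/max_n and no smoothing."""
    log_sum = 0.0
    for n in range(1, max_n + 1):
        sys_counts = ngrams(sys_tokens, n)
        max_ref = Counter()
        for ref in references:
            max_ref |= ngrams(ref, n)  # max count in any single reference
        clipped = sum((sys_counts & max_ref).values())
        if clipped == 0:
            return 0.0  # one zero precision makes the geometric mean zero
        log_sum += math.log(clipped / sum(sys_counts.values())) / max_n
    c = len(sys_tokens)
    # Closest reference length; ties go to the shorter one (an assumption).
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_sum)


sys_tokens = "the cat was on the mat".split()
refs = ["the cat sat on a mat".split(), "there was a cat on the mat".split()]
print(round(bleu(sys_tokens, refs, 3), 2))  # N=3
print(bleu(sys_tokens, refs, 4))            # N=4
```

With N=3 the product of the precisions is (5/6)(3/5)(1/4) = 1/8, whose cube root is 0.5. With N=4 no system 4-gram appears in either reference, so p4 = 0 and the unsmoothed score drops to 0.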

  49. Summary • Course overview • Major challenges in MT • Choose the right words (root form, inflection, spontaneous words) • Put them in right positions (word order, unique constructions, divergences)

  50. Summary (cont) • Major approaches • Transfer-based MT • Interlingua • Example-based MT • Statistical MT • Hybrid MT • Evaluation of MT systems • Edit distance • WER, PER, SSA, GSA • BLEU
