
Learning Translation Lexicons from Comparable Corpora

Ling 575 Presentation, Ankit K. Srivastava. Outline: comparable corpora (definition, examples, applications in machine translation); translation lexicons (definition, examples, how to learn one, state of the art); link to the primary paper.



Presentation Transcript


  1. Learning Translation Lexicons from Comparable Corpora. Ling 575 Presentation, Ankit K. Srivastava

  2. Comparable Corpora • Definition • Examples • Applications in Machine Translation. Translation Lexicon - Ankit

  3. Translation Lexicon • Definition • Examples • How to learn a translation lexicon (TL)? • State of the art

  4. Link to Paper • Primary paper (July 2002): “Learning a Translation Lexicon from Monolingual Corpora”, Philipp Koehn & Kevin Knight

  5. Contents of Koehn & Knight (2002) • Introduction • Clues • Experiments • Conclusion

  6. KK02 - Introduction: SWOT analysis (strengths, weaknesses, opportunities, threats) of statistical machine translation (SMT)

  7. KK02 - Introduction: Objective • Generally: “Build a translation lexicon solely from monolingual corpora.” • Specifically: “Automatically generate a one-to-one mapping of German & English nouns.”

  8. KK02 - Introduction: Data & Evaluation Metrics • Corpora: English, Wall Street Journal (1990-1992); German, German news wire (1995-1996) • Verification: a bilingual lexicon of 9,206 German and 10,645 English nouns

  9. Contents of Koehn & Knight (2002) • Introduction • Clues • Experiments • Conclusion

  10. KK02 - Clues: Finding word mappings in the corpora • Identical words • Similar spelling • Similar context • Similar words • Frequent words

  11. KK02 - Clues: Identical Words • Used to build a seed lexicon • Words with identical spellings in both languages, or • Words adapted through well-established transformation rules • Together, these two strategies found 976 + 363 word mappings

  12. KK02 - Clues: Identical Words • Identical translations vs. word length • Restricting mappings to words of length 6 or more yields 622 word pairs with 96% overall accuracy
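The identical-word clue can be illustrated with a short sketch. It assumes lowercase English vocabulary and capitalized German nouns. The function name `build_seed_lexicon`, its `rules` parameter, and the example rule `("k", "c")` are hypothetical, because the slides do not list the transformation rules that were actually used. The `min_length=6` default mirrors the length threshold reported on slide 12.

```python
from typing import Dict, Iterable, Sequence, Tuple

def build_seed_lexicon(
    german_vocab: Iterable[str],
    english_vocab: Iterable[str],
    min_length: int = 6,
    rules: Sequence[Tuple[str, str]] = (),
) -> Dict[str, str]:
    """Pair German words with English words spelled the same, optionally
    after applying one (hypothetical) spelling transformation rule."""
    english = set(english_vocab)
    seed: Dict[str, str] = {}
    for german in german_vocab:
        if len(german) < min_length:
            continue  # short identical strings are less reliable translations
        candidate = german.lower()  # German nouns are capitalized
        if candidate in english:
            seed[german] = candidate
            continue
        # Try each rule (german_substring -> english_substring) in turn.
        for src, dst in rules:
            transformed = candidate.replace(src, dst)
            if transformed != candidate and transformed in english:
                seed[german] = transformed
                break
    return seed

german = ["System", "Computer", "Hand", "Kamera", "Tisch"]
english = ["system", "computer", "hand", "camera", "table"]
print(build_seed_lexicon(german, english))
print(build_seed_lexicon(german, english, rules=[("k", "c")]))
```

With the default threshold, "Hand" is skipped even though it matches "hand". This reflects the accuracy-versus-coverage trade-off on slide 12.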

  13. KK02 - Clues: Similar Spelling • Common language roots & adopted words • Goes beyond the verbatim (identical) words above • Cognates • Pairs matched in a greedy fashion • Longest Common Subsequence Ratio (LCSR) • Limitations • Other approaches

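  14. KK02 - Clues: Similar Spelling • SIM = (number of letters the two words share in sequence) / (length of the longer word), i.e. the longest common subsequence ratio: $\mathrm{SIM}(w_1, w_2) = \frac{|\mathrm{LCS}(w_1, w_2)|}{\max(|w_1|, |w_2|)}$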
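The spelling-similarity score on slides 13 and 14 is the longest common subsequence ratio. Below is a minimal sketch that computes it with the standard dynamic program. The function names are illustrative, and lowercasing the inputs is left to the caller.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.
    Classic O(len(a) * len(b)) dynamic program, keeping one row at a time."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)  # extend the diagonal match
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip a letter
        prev = curr
    return prev[-1]

def lcsr(a: str, b: str) -> float:
    """SIM = letters common in sequence / length of the longer word."""
    longer = max(len(a), len(b))
    return lcs_length(a, b) / longer if longer else 0.0

print(lcsr("haus", "house"))    # "hus" in common: 3 / 5 = 0.6
print(lcsr("kamera", "camera")) # "amera" in common: 5 / 6
```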

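  15. KK02 - Clues: Similar Context • Context vectors are built from the frequencies of context words in the surrounding positions (a window around the word) • The context paradigm is a three-step process • Step 2 is a chicken-and-egg problem: mapping context words across languages already requires a translation lexicon, so a SEED lexicon is used • Without a seed, the time complexity is high • This approach • Other approaches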
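Below is a minimal sketch of the context clue. It assumes the three steps are: (1) collect context-word counts in a window, (2) translate the context dimensions through the seed lexicon, and (3) compare the vectors. The window size of 2, raw counts, and cosine similarity are assumptions, because the slides do not state which weighting or similarity measure the paper used. The sketch also pools all window positions into one bag of words rather than keeping position-specific counts.

```python
import math
from collections import Counter
from typing import Dict, List

def context_vector(tokens: List[str], target: str, window: int = 2) -> Counter:
    """Step 1: count words within `window` positions of each occurrence of target."""
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[tokens[j]] += 1
    return counts

def project(vector: Counter, seed: Dict[str, str]) -> Counter:
    """Step 2: rename context-word dimensions through the seed lexicon.
    Context words missing from the seed are dropped."""
    projected: Counter = Counter()
    for word, count in vector.items():
        if word in seed:
            projected[seed[word]] += count
    return projected

def cosine(u: Counter, v: Counter) -> float:
    """Step 3: cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

german = "der hund bellt laut im garten".split()
english = "the dog barks loudly in the garden".split()
seed = {"bellt": "barks", "laut": "loudly", "garten": "garden"}
hund = project(context_vector(german, "hund"), seed)
print(cosine(hund, context_vector(english, "dog")))     # about 0.816
print(cosine(hund, context_vector(english, "garden")))  # 0.0
```

The toy corpora show the role of the seed. Only context words that the seed lexicon can translate contribute to the comparison.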

  16. KK02 - Clues: Context (greedy example)

  17. KK02 - Clues: Similarity • Words that are similar to each other in one language should have translations that are similar to each other in the other language • Example: days of the week • Strategies to measure word similarity
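The slides do not show which similarity strategy was used, so the following is only one possible realization of this clue, not the paper's method. Each word is described by its similarities to seed words within its own language. Because the seed pairs align those profiles across languages, the two profiles can be compared directly. The dense vectors, the function names, and the toy data are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def profile(word: str, vectors: Dict[str, Sequence[float]], anchors: List[str]) -> List[float]:
    """Similarity of `word` to each anchor word, measured inside one language."""
    return [cosine(vectors[word], vectors[a]) for a in anchors]

def similarity_clue(german: str, english: str,
                    de_vectors: Dict[str, Sequence[float]],
                    en_vectors: Dict[str, Sequence[float]],
                    seed_pairs: List[Tuple[str, str]]) -> float:
    """Compare the two words' similarity profiles over aligned seed pairs."""
    de_anchors = [g for g, _ in seed_pairs]
    en_anchors = [e for _, e in seed_pairs]
    return cosine(profile(german, de_vectors, de_anchors),
                  profile(english, en_vectors, en_anchors))

# Toy monolingual vectors; the two languages use unrelated vector spaces.
de = {"montag": [1, 0, 0], "dienstag": [0.9, 0.1, 0], "birne": [0, 0.1, 0.9]}
en = {"monday": [0, 1], "tuesday": [0.1, 0.9], "apple": [1, 0], "pear": [0.9, 0.1]}
seeds = [("dienstag", "tuesday"), ("birne", "pear")]
print(similarity_clue("montag", "monday", de, en, seeds))  # high
print(similarity_clue("montag", "apple", de, en, seeds))   # low
```

In line with the days-of-the-week example, "montag" resembles the seed "dienstag" in German. Its best English match is therefore the word that resembles "tuesday".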

  18. KK02 - Clues: Frequency • In comparable corpora, the same concepts should be used with similar frequencies • Not sequential order • Clue: the ratio of word frequencies, normalized by corpus sizes
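Here is a minimal sketch of the frequency clue, assuming raw word counts and corpus sizes in tokens. Each count is normalized by its corpus size. The ratio is then folded as min/max so that the score lies in (0, 1], with 1 meaning identical relative frequency. The folding is an assumption made for illustration; the slide only says "ratio of word frequencies normalized by corpus sizes".

```python
def frequency_clue(count_de: int, size_de: int, count_en: int, size_en: int) -> float:
    """Compare relative frequencies of a German and an English word.
    Returns min/max of the normalized frequencies (1.0 = equally frequent)."""
    f_de = count_de / size_de  # normalize by German corpus size
    f_en = count_en / size_en  # normalize by English corpus size
    if f_de == 0 or f_en == 0:
        return 0.0  # an unseen word gives no frequency evidence
    return min(f_de, f_en) / max(f_de, f_en)

# The same relative frequency in corpora of different sizes scores 1.0.
print(frequency_clue(10, 1_000, 20, 2_000))
print(frequency_clue(10, 1_000, 5, 1_000))  # half as frequent: 0.5
```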

  19. Contents of Koehn & Knight (2002) • Introduction • Clues • Experiments • Conclusion

  20. KK02 - Experiments: Testing Grounds • Greedy search is preferred over exploring all O(n!) possible one-to-one mappings • Evaluation 1: the number of correct word-pair mappings • Evaluation 2: against a word-level translation
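Greedy search avoids enumerating all O(n!) one-to-one mappings. It sorts the candidate pairs by score and accepts a pair only if neither word has been used yet. Below is a minimal sketch of that idea plus Evaluation 1 (counting correct pairs against a reference lexicon). It assumes the clue scores are already combined into one number per pair. How the paper combines the clues is not shown on the slides, and the function names are hypothetical.

```python
from typing import Dict, Set, Tuple

def greedy_match(scores: Dict[Tuple[str, str], float]) -> Dict[str, str]:
    """Build a one-to-one German->English mapping by repeatedly taking
    the highest-scoring pair whose words are both still unused."""
    mapping: Dict[str, str] = {}
    used_english: Set[str] = set()
    for (german, english), _score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if german in mapping or english in used_english:
            continue
        mapping[german] = english
        used_english.add(english)
    return mapping

def count_correct(mapping: Dict[str, str], gold: Dict[str, Set[str]]) -> int:
    """Evaluation 1: number of mapped pairs found in the reference lexicon."""
    return sum(1 for g, e in mapping.items() if e in gold.get(g, set()))

scores = {("Haus", "house"): 0.9, ("Haus", "home"): 0.8,
          ("Heim", "home"): 0.7, ("Heim", "house"): 0.85}
mapping = greedy_match(scores)
print(mapping)  # {'Haus': 'house', 'Heim': 'home'}
print(count_correct(mapping, {"Haus": {"house"}, "Heim": {"home"}}))  # 2
```

Greedy matching is not guaranteed to find the highest-scoring overall assignment. An exact alternative is the Hungarian algorithm (for example, SciPy's `linear_sum_assignment`), at roughly cubic cost in the vocabulary size.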

  21. Contents of Koehn & Knight (2002) • Introduction • Clues • Experiments • Conclusion

  22. KK02 - Conclusion: Remarks • Identical words • Similar spelling • Similar context • Similar words • Frequent words
