Learning Translation Lexicons from Comparable Corpora

Ling 575 Presentation,

Ankit K. Srivastava


Comparable Corpora

  • Definition

  • Examples

  • Applications in Machine Translation

Translation Lexicon - Ankit


Translation Lexicon

  • Definition

  • Examples

  • How to learn a TL?

  • State-of-the-Art


Primary Paper

July 2002

“Learning a Translation Lexicon from Monolingual Corpora”

Philipp Koehn & Kevin Knight


Contents of KoehnKnight2002

  • Introduction

  • Clues

  • Experiments

  • Conclusion


KK02 - Introduction

SWOT analysis of statistical machine translation (SMT)


KK02 - Introduction

Objective

Generally

“Build a translation lexicon solely from monolingual corpora.”

Specifically

“Automatically generate a one-to-one mapping of German & English nouns.”


KK02 - Introduction

Data & Evaluation Metrics

Corpora

  • English: Wall Street Journal, 1990-1992

  • German: German news wire, 1995-1996

Verification

  • Bilingual lexicon of 9,206 German & 10,645 English nouns


KK02 - Clues

Find mappings in corpora

  • Identical words

  • Similar spelling

  • Similar context

  • Similar words

  • Frequent words


KK02 - Clues

Identical Words

  • To build a seed lexicon

  • Words with identical spellings, OR

  • Words adapted through well-established transformation rules

  • Together, these two strategies found 976 + 363 word mappings
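
A minimal sketch of how such a seed lexicon could be collected, assuming lowercased word lists. The rewrite rules and the toy vocabularies are hypothetical illustrations, not the rules or data used in the paper.

```python
def build_seed_lexicon(german_words, english_words, rules=()):
    """Pair words that are spelled identically, or that become identical
    after applying simple substring rewrite rules to the German word."""
    english = set(english_words)
    seed = {}
    for g in german_words:
        if g in english:
            seed[g] = g  # identical spelling
            continue
        candidate = g
        for src, dst in rules:
            candidate = candidate.replace(src, dst)
        if candidate != g and candidate in english:
            seed[g] = candidate  # matched only after rewriting
    return seed


# Hypothetical rules and toy vocabularies, for illustration only.
RULES = [("k", "c"), ("z", "c")]
german = ["internet", "produkt", "konzert", "haus"]
english = ["internet", "product", "concert", "house"]
seed = build_seed_lexicon(german, english, RULES)
print(seed)
```

Here "haus" stays unmatched because no rule applies to it, which is why further clues are needed beyond the seed.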


KK02 - Clues

Identical Words

Identical translations & Length of the word

Restricting mappings to words of length >= 6 yields 622 word pairs at 96% accuracy


KK02 - Clues

Similar Spelling

  • Common language roots & Adopted words

  • Different from non-verbatim words above

  • Cognates

  • Greedy fashion

  • Longest Common Subsequence Ratio

  • Limitations

  • Other approaches


KK02 - Clues

Similar Spelling

SIM = (# letters common in sequence) / (length of the longer word)
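
The ratio can be sketched with a standard dynamic-programming longest common subsequence. The function names are illustrative, and returning 0.0 for two empty strings is an assumption made here.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a, b):
    """Longest common subsequence ratio: LCS length over the longer word's length."""
    longer = max(len(a), len(b))
    return lcs_length(a, b) / longer if longer else 0.0


print(lcsr("abcde", "ace"))  # 3 shared letters in sequence / length 5 = 0.6
```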


KK02 - Clues

Context

  • Similar context: compare the frequency of context words in the positions surrounding each word.

  • The context paradigm is a 3-step process.

  • Step 2 is a chicken-and-egg problem, so a seed lexicon is needed.

  • Without a seed lexicon, time complexity is high.

  • This approach

  • Other approaches
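
One way to picture the context clue is the sketch below: count seed words near each target word, map the German counts into English through the seed lexicon, and compare vectors. The window size, the toy data, and the use of cosine similarity are all assumptions for illustration; the slides do not state which vector comparison the paper uses.

```python
import math
from collections import Counter


def context_vector(tokens, target, seed_words, window=2):
    """Count seed words occurring within `window` positions of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in seed_words:
                counts[tokens[j]] += 1
    return counts


def translate_vector(vec, seed):
    """Map a German context vector onto English dimensions via the seed lexicon."""
    out = Counter()
    for word, n in vec.items():
        out[seed[word]] += n
    return out


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (an assumed metric)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Toy data, for illustration only.
seed = {"bank": "bank", "produkt": "product"}
de_tokens = ["bank", "wagen", "produkt"]
en_tokens = ["bank", "car", "product", "house"]

de_vec = translate_vector(context_vector(de_tokens, "wagen", set(seed)), seed)
en_seed_words = set(seed.values())
score_car = cosine(de_vec, context_vector(en_tokens, "car", en_seed_words))
score_house = cosine(de_vec, context_vector(en_tokens, "house", en_seed_words))
print(score_car > score_house)
```

The translation step is where the seed lexicon comes in: without it, the German and English vectors have no shared dimensions to compare.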


KK02 - Clues

Context – Greedy Example


KK02 - Clues

Similarity

  • Similar words in one language are similar in another language

  • Example: days of the week

  • Strategies to measure word similarity


KK02 - Clues

Frequency

  • In comparable corpora, same concepts used with similar frequency

  • Not sequential Order

  • Ratio of word frequencies normalized by corpus sizes.
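
A small sketch of a frequency-based score, assuming raw counts and corpus sizes in tokens. Taking the smaller of the two ratios so that the score lies in [0, 1] is an assumption made here, not something the slide specifies.

```python
def frequency_score(count_a, size_a, count_b, size_b):
    """Ratio of two relative frequencies, oriented so the result is in [0, 1]."""
    rel_a = count_a / size_a
    rel_b = count_b / size_b
    if rel_a == 0 or rel_b == 0:
        return 0.0
    return min(rel_a / rel_b, rel_b / rel_a)


# Same relative frequency in corpora of different sizes scores 1.0.
print(frequency_score(50, 1000, 100, 2000))
```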


KK02 - Experiments

Testing Grounds

  • Greedy search is preferred to exhaustively exploring the O(n!) possible traversals.

  • Evaluation 1: # correct word-pair mappings

  • Evaluation 2: Against a word-level translation
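
The greedy search can be sketched as follows: sort all candidate pairs by score and accept a pair only if neither word is already taken. The scores and words are toy values, and how the clue scores are combined into one number is left out as unspecified here.

```python
def greedy_match(scores):
    """scores maps (german, english) pairs to a score.
    Returns a one-to-one mapping built from the best-scoring pairs first."""
    mapping = {}
    used_english = set()
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for (g, e), _score in ranked:
        if g in mapping or e in used_english:
            continue  # one side is already matched
        mapping[g] = e
        used_english.add(e)
    return mapping


# Toy scores, for illustration only.
scores = {
    ("hund", "dog"): 0.9,
    ("hund", "cat"): 0.8,
    ("katze", "dog"): 0.7,
    ("katze", "cat"): 0.6,
}
mapping = greedy_match(scores)
print(mapping)
```

Sorting dominates the cost, so this runs in O(m log m) for m candidate pairs, at the price of possibly missing the globally best assignment.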


KK02 - Conclusion

Remarks

  • Identical words

  • Similar spelling

  • Similar context

  • Similar words

  • Frequent words
