Learning Translation Lexicons from Comparable Corpora


Ling 575 Presentation,

Ankit K. Srivastava

Comparable Corpora
  • Definition
  • Examples
  • Applications in Machine Translation

Translation Lexicon - Ankit

Translation Lexicon
  • Definition
  • Examples
  • How to learn a TL?
  • State-of-the-Art



Primary Paper

July 2002

“Learning a Translation Lexicon from Monolingual Corpora”

Philipp KOEHN & Kevin KNIGHT

Contents of KoehnKnight2002
  • Introduction
  • Clues
  • Experiments
  • Conclusion

KK02 - Introduction

SWOT analysis of SMT

KK02 - Introduction

Objective

Generally

“Build a translation lexicon solely from monolingual corpora.”

Specifically

“Automatically generate a one-to-one mapping of German & English nouns.”

KK02 - Introduction

Data & Evaluation Metrics

CORPORA

  • ENG: Wall Street Journal, 1990-1992
  • GER: German News Wire, 1995-1996

VERIFY

  • Bilingual Lexicon of 9,206 German & 10,645 English nouns

Contents of KoehnKnight2002
  • Introduction
  • Clues
  • Experiments
  • Conclusion

KK02 - Clues

Find mappings in corpora

  • Identical words
  • Similar spelling
  • Similar context
  • Similar words
  • Frequent words

KK02 - Clues

Identical Words

  • To build a seed lexicon
  • Words have identical spellings, OR
  • Words are adapted through well-established transformation rules
  • Together, these two strategies found 976 + 363 word mappings
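
A minimal sketch of how such a seed lexicon might be assembled. The function names, the toy word lists, and the three rewrite rules shown here are illustrative assumptions, not the rule set or data used in the paper.

```python
def apply_rules(word, rules):
    """Apply each (old, new) substring rewrite in order."""
    for old, new in rules:
        word = word.replace(old, new)
    return word


def build_seed_lexicon(german_nouns, english_nouns, rules=()):
    """Pair German nouns with English nouns that are spelled identically,
    either verbatim or after applying the rewrite rules (case-insensitive)."""
    english = {w.lower() for w in english_nouns}
    seed = {}
    for g in german_nouns:
        key = g.lower()
        if key in english:
            seed[g] = key
        else:
            rewritten = apply_rules(key, rules)
            if rewritten in english:
                seed[g] = rewritten
    return seed


# Hypothetical rules and toy vocabularies, for illustration only.
RULES = [("tät", "ty"), ("k", "c"), ("z", "c")]
german = ["Computer", "Elektrizität", "Haus"]
english = ["computer", "electricity", "house"]
seed = build_seed_lexicon(german, english, RULES)
print(seed)
```

Here "Computer" matches verbatim, "Elektrizität" matches only after the rewrites, and "Haus" is left for the other clues.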

KK02 - Clues

Identical Words

Identical translations & Length of the word

Restricting mappings to words of length >= 6 results in 622 word pairs at 96% accuracy
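
The effect of a length threshold can be sketched as follows. The helper name `accuracy_by_min_length`, the gold-lexicon layout (a dict of sets), and the toy pairs are assumptions made for this example; the figures it produces are unrelated to those reported on the slide.

```python
def accuracy_by_min_length(pairs, gold, min_length):
    """Keep candidate pairs whose German word has at least min_length
    letters, and report (kept, correct, accuracy) against a gold lexicon."""
    kept = [(g, e) for g, e in pairs if len(g) >= min_length]
    correct = sum(1 for g, e in kept if e in gold.get(g, set()))
    accuracy = correct / len(kept) if kept else 0.0
    return len(kept), correct, accuracy


# Toy data (hypothetical): short identical spellings are often false friends.
pairs = [("Computer", "computer"), ("Internet", "internet"),
         ("Gift", "gift"), ("Bad", "bad")]
gold = {"Computer": {"computer"}, "Internet": {"internet"},
        "Gift": {"poison"}, "Bad": {"bath"}}
all_words = accuracy_by_min_length(pairs, gold, 1)
long_words = accuracy_by_min_length(pairs, gold, 6)
print(all_words, long_words)
```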

KK02 - Clues

Similar Spelling

  • Common language roots & Adopted words
  • Different from non-verbatim words above
  • Cognates
  • Greedy fashion
  • Longest Common Subsequence Ratio
  • Limitations
  • Other approaches

KK02 - Clues

Similar Spelling

$$\mathrm{SIM} = \frac{\#\,\text{letters common in sequence}}{\text{length of the longer word}}$$
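
The ratio above is the longest common subsequence ratio named on the previous slide. A small sketch of one way to compute it, using standard dynamic programming; the function names and the lower-casing step are choices made for this example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a, b):
    """Letters common in sequence divided by the length of the longer word."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 0.0


print(lcsr("Haus", "house"))
```

For "Haus" and "house" the common subsequence is "hus" (3 letters) and the longer word has 5 letters, so the score is 3/5.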

KK02 - Clues

Context

  • Similar Context window based on frequency of context words in surrounding positions.
  • Context paradigm is a 3 STEP process.
  • Step 2 is a chicken-and-egg problem => needs a SEED lexicon
  • Without a SEED lexicon, time complexity is high
  • This approach
  • Other approaches
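
A rough sketch of the context clue, assuming the three steps are: (1) build a context vector per word, (2) translate that vector through the seed lexicon, (3) compare vectors across languages. The step breakdown, the position-independent window, the cosine measure, and all names and toy sentences are assumptions for illustration; the slide does not spell them out.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, target, window=2):
    """Step 1: count the words occurring within `window` positions of target."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


def translate_vector(vector, seed):
    """Step 2: map context words into the other language via the seed
    lexicon, dropping words the seed lexicon does not cover."""
    translated = Counter()
    for word, count in vector.items():
        if word in seed:
            translated[seed[word]] += count
    return translated


def cosine(u, v):
    """Step 3: compare two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


# Toy corpora and seed lexicon (hypothetical).
german_tokens = "der hund jagt die katze".split()
english_tokens = "the dog chases the cat".split()
seed = {"hund": "dog", "katze": "cat"}

translated = translate_vector(context_vector(german_tokens, "jagt"), seed)
score_chases = cosine(translated, context_vector(english_tokens, "chases"))
score_dog = cosine(translated, context_vector(english_tokens, "dog"))
print(score_chases, score_dog)
```

In this toy case the translated context of "jagt" overlaps with the context of "chases" but not with that of "dog", so "chases" scores higher. Step 2 is where the seed lexicon is needed: without it the two vectors have no shared dimensions.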

KK02 - Clues

Context – Greedy Example

KK02 - Clues

Similarity

  • Similar words in one language are similar in another language
  • Example: days of the week
  • Strategies to measure word similarity

KK02 - Clues

Frequency

  • In comparable corpora, same concepts used with similar frequency
  • Not sequential Order
  • Ratio of word frequencies normalized by corpus sizes.
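
One possible reading of the frequency clue, as a sketch. Dividing the smaller relative frequency by the larger one, so that the score lies in [0, 1], is an assumption of this example, as are the function name and the counts used.

```python
def frequency_score(count_a, size_a, count_b, size_b):
    """Ratio of corpus-size-normalized frequencies, always in [0, 1].
    1.0 means the two words are equally frequent relative to corpus size."""
    rel_a = count_a / size_a
    rel_b = count_b / size_b
    if rel_a == 0 or rel_b == 0:
        return 0.0
    return min(rel_a, rel_b) / max(rel_a, rel_b)


# Hypothetical counts: 50 hits in a 1,000-token corpus vs 100 hits in 2,000.
same_rate = frequency_score(50, 1000, 100, 2000)
different_rate = frequency_score(10, 1000, 100, 2000)
print(same_rate, different_rate)
```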

Contents of KoehnKnight2002
  • Introduction
  • Clues
  • Experiments
  • Conclusion

KK02 - Experiments

Testing Grounds

  • Greedy search preferred to O(n!) possible traversals.
  • Evaluation 1: # correct word-pair mappings
  • Evaluation 2: Against a word-level translation
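
The greedy search and the first evaluation can be sketched as below. The score dictionary, the gold-lexicon layout, and the function names are assumptions for this example; how the individual clue scores are combined is left out.

```python
def greedy_match(scores):
    """scores: dict mapping (german, english) -> combined clue score.
    Repeatedly take the best-scoring remaining pair, then retire both words."""
    used_g, used_e, lexicon = set(), set(), {}
    for (g, e), _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if g not in used_g and e not in used_e:
            lexicon[g] = e
            used_g.add(g)
            used_e.add(e)
    return lexicon


def count_correct(lexicon, gold):
    """Evaluation 1: number of induced pairs found in the gold lexicon."""
    return sum(1 for g, e in lexicon.items() if e in gold.get(g, set()))


# Toy scores and gold lexicon (hypothetical).
scores = {("Hund", "dog"): 0.9, ("Hund", "cat"): 0.4,
          ("Katze", "dog"): 0.5, ("Katze", "cat"): 0.3}
gold = {"Hund": {"dog"}, "Katze": {"cat"}}
lexicon = greedy_match(scores)
print(lexicon, count_correct(lexicon, gold))
```

Sorting the candidate pairs once and scanning them costs far less than enumerating every one-to-one assignment, at the price of possibly missing the globally best mapping.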

Contents of KoehnKnight2002
  • Introduction
  • Clues
  • Experiments
  • Conclusion

KK02 - Conclusion

Remarks

  • Identical words
  • Similar spelling
  • Similar context
  • Similar words
  • Frequent words
