Machine Translation: Challenges and Approaches

Presentation Transcript


Invited Lecture, CS 4705: Introduction to Natural Language Processing, Fall 2004

Machine Translation: Challenges and Approaches

Nizar Habash, Post-doctoral Fellow

Center for Computational Learning Systems

Columbia University


Sounds like Faulkner?

  • Faulkner

  • Machine Translation

http://www.ee.ucla.edu/~simkin/sounds_like_faulkner.html


Progress in MT: Statistical MT example

From a talk by Charles Wayne, DARPA


Road Map

  • Why Machine Translation (MT)?

  • Multilingual Challenges for MT

  • MT Approaches

  • MT Evaluation


Why (Machine) Translation?

  • Languages in the world

    • 6,800 living languages

    • 600 with written tradition

    • 95% of the world population speaks 100 languages

  • Translation Market

    • $8 Billion Global Market

    • Doubling every five years

  • (Donald Barabé, invited talk, MT Summit 2003)


Why Machine Translation?

  • Full Translation

    • Domain specific

      • Weather reports

  • Machine-aided Translation

    • Translation dictionaries

    • Translation memories

    • Requires post-editing

  • Cross-lingual NLP applications

    • Cross-language IR

    • Cross-language Summarization


Road Map

  • Why Machine Translation (MT)?

  • Multilingual Challenges for MT

    • Orthographic variations

    • Lexical ambiguity

    • Morphological variations

    • Translation divergences

  • MT Paradigms

  • MT Evaluation


Multilingual Challenges

  • Orthographic Variations

    • Ambiguous spelling

      • كتب الاولاد اشعارا (undiacritized) vs. كَتَبَ الأوْلادُ اشعَاراً (diacritized: "the boys wrote poems")

    • Ambiguous word boundaries

  • Lexical Ambiguity

    • Bank → بنك (financial) vs. ضفة (river)

    • Eat → essen (human) vs. fressen (animal)


Multilingual Challenges: Morphological Variations

[Diagram: a single word segmented into conjunction, article, noun, and plural-affix morphemes]

  • Affixation vs. Root+Pattern

  • Tokenization


Multilingual Challenges: Translation Divergences

  • How languages map semantics to syntax

  • Divergences occur in 35% of sentences in the TREC El Norte Corpus (Dorr et al. 2002)

  • Divergence Types

    • Categorial (X tener hambre → X be hungry) [98%]

    • Conflational (X dar puñaladas a Z → X stab Z) [83%]

    • Structural (X entrar en Y → X enter Y) [35%]

    • Head Swapping (X cruzar Y nadando → X swim across Y) [8%]

    • Thematic (X gustar a Y → Y like X) [6%]


Translation Divergences: Conflation

[Diagram: Arabic لست conflates negation, copula, and subject into a single word, where French spells them out separately (ne ... pas, suis)]

لست هنا
I-am-not here
I am not here

Je ne suis pas ici
I not-be-not here


Translation Divergences: Categorial, Thematic, and Structural

[Diagram: dependency structures for the Arabic, English, Spanish, and Hebrew versions of "I am cold"]

انا بردان
I cold
I am cold

tengo frio
I-have cold

קר לי
cold for-me


Translation Divergences: Head Swap and Categorial

[Diagram: English and Arabic dependency structures for the example below]

I swam across the river quickly

اسرعت عبور النهر سباحة
I-sped crossing the-river swimming


Translation Divergences: Head Swap and Categorial

[Diagram: English and Hebrew dependency structures for the example below]

חציתי את הנהר בשחיה במהירות
I-crossed obj river in-swim speedily

I swam across the river quickly


Translation Divergences: Head Swap and Categorial

[Diagram: the same English, Arabic, and Hebrew structures annotated with parts of speech (verb, noun, preposition, adverb), highlighting the head swap and category changes]


Translation Divergences: Orthography + Morphology + Syntax

[Diagram: semantic structure "car possessed-by mom"]

mom's car

妈妈的车 (mama de che)

سيارة ماما (sayyArat mama)

la voiture de maman


Road Map

  • Why Machine Translation (MT)?

  • Multilingual Challenges for MT

  • MT Approaches

    • Gisting / Transfer / Interlingua

    • Statistical / Symbolic / Hybrid

    • Practical Considerations

  • MT Evaluation


MT Approaches: MT Pyramid

[Diagram: the MT pyramid. Analysis climbs from source word to source syntax to source meaning; generation descends from target meaning to target syntax to target word. Gisting translates directly at the word level.]


MT Approaches: Gisting Example

Source: Sobre la base de dichas experiencias se estableció en 1988 una metodología.

Gisting output: Envelope her basis out speak experiences them settle at 1988 one methodology.

Human translation: On the basis of these experiences, a methodology was arrived at in 1988.


MT Approaches: MT Pyramid

[Diagram: the MT pyramid, now with Transfer crossing at the syntax level in addition to Gisting at the word level]


MT Approaches: Transfer Example

  • Transfer Lexicon

    • Map SL structure to TL structure

[Diagram: transfer rule mapping the Spanish structure poner(:subj X, :obj mantequilla, :mod en(:obj Y)) to the English structure butter(:subj X, :obj Y)]

X puso mantequilla en Y
X buttered Y
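
As a concrete (hypothetical) illustration of applying such a transfer-lexicon entry, the sketch below matches the Spanish structure and emits the English one; the rule format and node representation are invented for this example, not the lecture's system.

```python
# Minimal sketch of applying one transfer-lexicon entry.
# Nodes are plain dicts; the matching logic is illustrative only.

def transfer(node):
    # Rule: poner(:subj X, :obj mantequilla, :mod en(:obj Y)) -> butter(:subj X, :obj Y)
    if (node.get("head") == "poner"
            and node.get("obj", {}).get("head") == "mantequilla"
            and node.get("mod", {}).get("head") == "en"):
        return {
            "head": "butter",
            "subj": node["subj"],
            "obj": node["mod"]["obj"],
        }
    return node  # no rule matched; leave the structure unchanged

spanish = {
    "head": "poner",
    "subj": {"head": "X"},
    "obj": {"head": "mantequilla"},
    "mod": {"head": "en", "obj": {"head": "Y"}},
}
print(transfer(spanish))
# {'head': 'butter', 'subj': {'head': 'X'}, 'obj': {'head': 'Y'}}
```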


MT Approaches: MT Pyramid

[Diagram: the MT pyramid, with Interlingua at the top (meaning level), Transfer at the syntax level, and Gisting at the word level]


MT Approaches: Interlingua Example: Lexical Conceptual Structure

(Dorr, 1993)




MT Approaches: MT Pyramid and Resources

[Diagram: the MT pyramid annotated with the resources needed at each level: dictionaries and parallel corpora at the word level, transfer lexicons at the syntax level, interlingual lexicons at the meaning level]


MT Approaches: Statistical vs. Symbolic

[Diagram: the MT pyramid, contrasting where statistical and symbolic approaches operate]


MT Approaches: Noisy Channel Model

Portions from http://www.clsp.jhu.edu/ws03/preworkshop/lecture_yamada.pdf
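
The slide's diagram is not reproduced here; the model it describes is the standard noisy-channel formulation of statistical MT: to translate a foreign sentence f, choose the English sentence e maximizing P(e | f) ∝ P(e) · P(f | e), where P(e) is an English language model (rewarding fluent output) and P(f | e) is a translation model estimated from parallel text (rewarding faithful output).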


MT Approaches: IBM Model (Word-based Model)

http://www.clsp.jhu.edu/ws03/preworkshop/lecture_yamada.pdf
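
The slide itself defers to the linked tutorial; as a rough, self-contained illustration of the word-based idea, here is a minimal sketch of IBM Model 1 training with EM on a toy corpus (assumed simplifications: no NULL word, no smoothing, made-up data).

```python
# Minimal sketch of IBM Model 1 training (word-based translation model)
# on a toy parallel corpus. Illustrative only; real systems add a NULL
# word, smoothing, and train on millions of sentence pairs.
from collections import defaultdict

corpus = [
    ("la casa", "the house"),
    ("la casa verde", "the green house"),
    ("casa verde", "green house"),
]

# Uniform initialization of t(f|e): probability that source word f
# is the translation of target word e.
src_vocab = {w for s, _ in corpus for w in s.split()}
t = defaultdict(lambda: 1.0 / len(src_vocab))

for _ in range(10):                      # EM iterations
    count = defaultdict(float)           # expected counts c(f, e)
    total = defaultdict(float)           # expected counts c(e)
    for src, tgt in corpus:
        for f in src.split():
            # E-step: distribute f's alignment probability over tgt words.
            z = sum(t[(f, e)] for e in tgt.split())
            for e in tgt.split():
                p = t[(f, e)] / z
                count[(f, e)] += p
                total[e] += p
    for (f, e), c in count.items():      # M-step: re-estimate t(f|e)
        t[(f, e)] = c / total[e]

print(round(t[("casa", "house")], 2))    # typically converges toward 1.0
```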


MT Approaches: Statistical vs. Symbolic vs. Hybrid

[Diagram: the MT pyramid, contrasting statistical, symbolic, and hybrid approaches]




MT Approaches: Hybrid Example: GHMT

  • Generation-Heavy Hybrid Machine Translation

  • Lexical transfer but NO structural transfer

Maria puso la mantequilla en el pan.

[Diagram: the Spanish dependency structure poner(:subj Maria, :obj mantequilla, :mod en(:obj pan)), with each word fanned out to its English translation candidates: poner → lay, locate, place, put, render, set, stand; mantequilla → butter, bilberry; en → on, in, into, at; pan → bread, loaf]


MT Approaches: Hybrid Example: GHMT

  • LCS-driven Expansion

  • Conflation Example

[Diagram: two [CAUSE GO] lexical conceptual structures. PUT_V takes Agent MARIA, Theme BUTTER_N, and Goal BREAD; through categorial variation it is conflated into BUTTER_V taking Agent MARIA and Goal BREAD.]


MT Approaches: Hybrid Example: GHMT

  • Structural Overgeneration

[Diagram: many overgenerated English candidate structures combining verbs (put, lay, render, butter), prepositions (into, on, at), and arguments (Maria, butter, bread, loaf)]


MT Approaches: Hybrid Example: GHMT (Target Statistical Resources)

[Diagram: dependency tree buy(John, car(a, red)), the kind of structure scored by the structural n-gram model]

  • Structural N-gram Model

    • Long-distance

    • Lexemes

  • Surface N-gram Model

    • Local

    • Surface-forms

[Diagram: surface string "John bought a red car", the kind of string scored by the surface n-gram model]


MT Approaches: Hybrid Example: GHMT (Linearization and Ranking)

Maria buttered the bread -47.0841

Maria butters the bread -47.2994

Maria breaded the butter -48.7334

Maria breads the butter -48.835

Maria buttered the loaf -51.3784

Maria butters the loaf -51.5937

Maria put the butter on bread -54.128
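
The ranked list above comes from GHMT's statistical models; as a toy illustration of how a surface n-gram model ranks alternative linearizations, here is a minimal sketch with made-up bigram log-probabilities (not the actual models or scores).

```python
# Minimal sketch of ranking candidate linearizations with a surface
# bigram language model. The log-probabilities are invented for
# illustration; GHMT's scores come from models trained on large corpora.

# Toy bigram log-probabilities; unseen bigrams get a floor value.
BIGRAM_LOGPROB = {
    ("maria", "buttered"): -2.0,
    ("buttered", "the"): -1.0,
    ("the", "bread"): -1.5,
    ("maria", "breaded"): -6.0,
    ("breaded", "the"): -3.0,
    ("the", "butter"): -2.5,
}
FLOOR = -10.0  # log-probability assigned to unseen bigrams

def score(sentence):
    words = sentence.lower().split()
    return sum(BIGRAM_LOGPROB.get(pair, FLOOR)
               for pair in zip(words, words[1:]))

candidates = [
    "Maria buttered the bread",
    "Maria breaded the butter",
]
for cand in sorted(candidates, key=score, reverse=True):
    print(f"{score(cand):8.2f}  {cand}")
```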


MT Approaches: Practical Considerations

  • Resources Availability

    • Parsers and Generators

      • Input/Output compatibility

    • Translation Lexicons

      • Word-based vs. Transfer/Interlingua

    • Parallel Corpora

      • Domain of interest

      • Bigger is better

  • Time Availability

    • Statistical training, resource building


MT Approaches: Resource Poverty

  • No Parser?

  • No Translation Dictionary?

  • Parallel Corpus

    • Align with rich language

      • Extract dictionary (see the sketch after this list)

      • Parse rich side

        • Infer parses

          • Build a statistical parser
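
As a toy illustration of the dictionary-extraction step in this recipe, here is a minimal sketch that reads word alignments from a tiny, made-up parallel corpus and keeps the most frequently aligned word as the translation; real pipelines obtain the alignments automatically (e.g., from IBM-model training) and work at much larger scale.

```python
# Minimal sketch: induce a rough translation dictionary from a
# word-aligned parallel corpus. Data and alignments are made up.
from collections import Counter, defaultdict

# Each item: (resource-poor sentence, resource-rich sentence,
#             list of (poor_index, rich_index) word alignments).
aligned_corpus = [
    ("casa verde", "green house", [(0, 1), (1, 0)]),
    ("la casa", "the house", [(0, 0), (1, 1)]),
]

cooc = defaultdict(Counter)
for poor, rich, alignment in aligned_corpus:
    p_words, r_words = poor.split(), rich.split()
    for i, j in alignment:
        cooc[p_words[i]][r_words[j]] += 1

# Keep the most frequently aligned translation for each word.
dictionary = {w: counts.most_common(1)[0][0] for w, counts in cooc.items()}
print(dictionary)   # {'casa': 'house', 'verde': 'green', 'la': 'the'}
```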


Road Map

  • Why Machine Translation (MT)?

  • Multilingual Challenges for MT

  • MT Approaches

  • MT Evaluation


MT Evaluation

  • More art than science

  • Wide range of Metrics/Techniques

    • interface, …, scalability, …, faithfulness, ... space/time complexity, … etc.

  • Automatic vs. Human-based

    • Dumb Machines vs. Slow Humans


MT Evaluation Metrics

(Church and Hovy 1993)

  • System-based Metrics: count internal resources (size of lexicon, number of grammar rules, etc.)

    • easy to measure

    • not comparable across systems

    • not necessarily related to utility


MT Evaluation Metrics

  • Text-based Metrics

    • Sentence-based Metrics

      • Quality: Accuracy, Fluency, Coherence, etc.

      • 3-point scale to 100-point scale

    • Comprehensibility Metrics

      • Comprehension, Informativeness,

      • x-point scales, questionnaires

      • most related to utility

      • hard to measure


MT Evaluation Metrics

  • Text-based Metrics (cont’d)

    • Amount of Post-Editing

      • number of keystrokes per page

      • not necessarily related to utility

  • Cost-based Metrics

    • Cost per page

    • Time per page


Human-based Evaluation Example: Accuracy Criteria


Human-based Evaluation Example: Fluency Criteria


Fluency vs. Accuracy

[Plot: MT output plotted along fluency and accuracy axes, with regions labeled FAHQ MT, con MT, Prof. MT, and Info. MT]


Automatic Evaluation Example: Bleu Metric

  • Bleu

    • BiLingual Evaluation Understudy (Papineni et al 2001)

    • Modified n-gram precision with length penalty

    • Quick, inexpensive and language independent

    • Correlates highly with human evaluation

    • Bias against synonyms and inflectional variations


Automatic Evaluation Example: Bleu Metric

Test Sentence

colorless green ideas sleep furiously

Gold Standard References

all dull jade ideas sleep irately

drab emerald concepts sleep furiously

colorless immature thoughts nap angrily


Automatic Evaluation Example: Bleu Metric

Test Sentence

colorless green ideas sleep furiously

Gold Standard References

all dull jade ideas sleep irately

drab emerald concepts sleep furiously

colorless immature thoughts nap angrily

Unigram precision = 4/5


Automatic Evaluation Example: Bleu Metric

Test Sentence

colorless green ideas sleep furiously

Gold Standard References

all dull jade ideas sleep irately

drab emerald concepts sleep furiously

colorless immature thoughts nap angrily

Unigram precision = 4 / 5 = 0.8

Bigram precision = 2 / 4 = 0.5

Bleu Score = (a1 × a2 × … × an)^(1/n) = (0.8 × 0.5)^(1/2) = 0.6325 → 63.25
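
The arithmetic above can be reproduced with a short script. This is a minimal sketch of the modified n-gram precision and geometric mean only; full Bleu (Papineni et al. 2001) also applies a brevity penalty and aggregates clipped counts over a whole test set.

```python
# Minimal sketch of the modified n-gram precision / geometric-mean part
# of Bleu, on the slide's single-sentence example.
from collections import Counter

def ngrams(words, n):
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def modified_precision(candidate, references, n):
    cand = ngrams(candidate.split(), n)
    counts = Counter(cand)
    # Clip each n-gram count by its maximum count in any single reference.
    max_ref = Counter()
    for ref in references:
        for g, c in Counter(ngrams(ref.split(), n)).items():
            max_ref[g] = max(max_ref[g], c)
    clipped = sum(min(c, max_ref[g]) for g, c in counts.items())
    return clipped / len(cand)

candidate = "colorless green ideas sleep furiously"
references = [
    "all dull jade ideas sleep irately",
    "drab emerald concepts sleep furiously",
    "colorless immature thoughts nap angrily",
]

p1 = modified_precision(candidate, references, 1)   # 4/5 = 0.8
p2 = modified_precision(candidate, references, 2)   # 2/4 = 0.5
score = (p1 * p2) ** 0.5                             # geometric mean
print(p1, p2, round(score, 4))                       # 0.8 0.5 0.6325
```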

