Machine translation challenges and approaches

Invited Lecture, CS 4705: Introduction to Natural Language Processing, Fall 2004. Machine Translation: Challenges and Approaches. Nizar Habash, Post-doctoral Fellow, Center for Computational Learning Systems, Columbia University.



Machine translation challenges and approaches

Invited Lecture, CS 4705: Introduction to Natural Language Processing, Fall 2004

Machine Translation: Challenges and Approaches

Nizar Habash, Post-doctoral Fellow

Center for Computational Learning Systems

Columbia University


Sounds like faulkner

Sounds like Faulkner?

Faulkner

Machine Translation

http://www.ee.ucla.edu/~simkin/sounds_like_faulkner.html


Progress in mt statistical mt example

Progress in MT: Statistical MT example

From a talk by Charles Wayne, DARPA


Road map

Road Map

  • Why Machine Translation (MT)?

  • Multilingual Challenges for MT

  • MT Approaches

  • MT Evaluation


Why machine translation

Why (Machine) Translation?

  • Languages in the world

    • 6,800 living languages

    • 600 with written tradition

    • 95% of world population speaks 100 languages

  • Translation Market

    • $8 Billion Global Market

    • Doubling every five years

  • (Donald Barabé, invited talk, MT Summit 2003)


Why machine translation1

Why Machine Translation?

  • Full Translation

    • Domain specific

      • Weather reports

  • Machine-aided Translation

    • Translation dictionaries

    • Translation memories

    • Requires post-editing

  • Cross-lingual NLP applications

    • Cross-language IR

    • Cross-language Summarization


Road map1

Road Map

  • Why Machine Translation (MT)?

  • Multilingual Challenges for MT

    • Orthographic variations

    • Lexical ambiguity

    • Morphological variations

    • Translation divergences

  • MT Paradigms

  • MT Evaluation


Multilingual challenges

Multilingual Challenges

  • Orthographic Variations

    • Ambiguous spelling

      • كتب الاولاد اشعارا (undiacritized) vs. كَتَبَ الأوْلادُ اشعَاراً (diacritized)

    • Ambiguous word boundaries

  • Lexical Ambiguity

    • Bank → بنك (financial) vs. ضفة (river)

    • Eat → essen (human) vs. fressen (animal)


Multilingual challenges morphological variations

Multilingual Challenges: Morphological Variations

[Figure: example annotated with the labels conj, noun, article, plural]

  • Affixation vs. Root+Pattern

  • Tokenization


Multilingual challenges translation divergences

Multilingual Challenges: Translation Divergences

  • How languages map semantics to syntax

  • 35% of sentences in TREC El Norte Corpus (Dorr et al 2002)

  • Divergence Types

    • Categorial (X tener hambre ↔ X be hungry) [98%]

    • Conflational (X dar puñaladas a Z ↔ X stab Z) [83%]

    • Structural (X entrar en Y ↔ X enter Y) [35%]

    • Head Swapping (X cruzar Y nadando ↔ X swim across Y) [8%]

    • Thematic (X gustar a Y ↔ Y like X) [6%]


Machine translation challenges and approaches

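Translation Divergences: conflation

[Figure: dependency trees for the three sentences; node labels: ليس, انا, هنا (Arabic); be, I, not, here (English); etre, Je, ne, pas, ici (French)]

  • Arabic: لست هنا (gloss: I-am-not here)

  • English: I am not here

  • French: Je ne suis pas ici (gloss: I not be not here)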


Translation divergences categorial thematic and structural

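Translation Divergences: categorial, thematic and structural

[Figure: dependency trees for the four sentences; node labels: *, انا, بردان (Arabic); be, I, cold (English); tener, Yo, frio (Spanish); *, קר, ל, אני (Hebrew)]

  • Arabic: انا بردان (gloss: I cold)

  • English: I am cold

  • Spanish: tengo frio (gloss: I-have cold)

  • Hebrew: קר לי (gloss: cold for-me)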


Translation divergences head swap and categorial

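Translation Divergences: head swap and categorial

[Figure: aligned dependency trees; English nodes: swim, I, across, quickly, river; Arabic nodes: اسرع, انا, عبور, سباحة, نهر]

  • English: I swam across the river quickly

  • Arabic: اسرعت عبور النهر سباحة (gloss: I-sped crossing the-river swimming)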


Translation divergences head swap and categorial1

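Translation Divergences: head swap and categorial

[Figure: aligned dependency trees; English nodes: swim, I, across, quickly, river; Hebrew nodes: חצה, אני, את, ב, ב, נהר, שחיה, מהירות]

  • Hebrew: חציתי את הנהר בשחיה במהירות (gloss: I-crossed obj river in-swim speedily)

  • English: I swam across the river quickly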


Translation divergences head swap and categorial2

Translation Divergences: head swap and categorial

[Figure: the Arabic, English, and Hebrew dependency trees for "I swam across the river quickly" shown together, with part-of-speech labels on the nodes (verb, noun, prep, adverb)]


Translation divergences orthography morphology syntax

Translation Divergences: Orthography+Morphology+Syntax

[Figure: shared structure with the labels car, possessed-by, mom]

  • English: mom’s car

  • Chinese: 妈妈的车 (mama de che)

  • Arabic: سيارة ماما (sayyArat mama)

  • French: la voiture de maman


Road map2

Road Map

  • Why Machine Translation (MT)?

  • Multilingual Challenges for MT

  • MT Approaches

    • Gisting / Transfer / Interlingua

    • Statistical / Symbolic / Hybrid

    • Practical Considerations

  • MT Evaluation


Mt approaches mt pyramid

MT Approaches: MT Pyramid

[Figure: MT pyramid. Analysis side: Source word, Source syntax, Source meaning. Generation side: Target meaning, Target syntax, Target word. Approach labeled: Gisting]


Mt approaches gisting example

MT Approaches: Gisting Example

Sobre la base de dichas experiencias se estableció en 1988 una metodología.

Envelope her basis out speak experiences them settle at 1988 one methodology.

On the basis of these experiences, a methodology was arrived at in 1988.
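
The gisted line above reads like the result of replacing each source word independently with one dictionary entry. A minimal sketch of that idea follows; the `LEXICON` table is hypothetical, and each entry is read off the slide's output by word position, which is an assumption about how the words align.

```python
# Hypothetical word-for-word lexicon; each entry is read off the slide's gisted
# output by position, which is an assumption about how the words align.
LEXICON = {
    "sobre": "envelope",
    "la": "her",
    "base": "basis",
    "de": "out",
    "dichas": "speak",
    "experiencias": "experiences",
    "se": "them",
    "estableció": "settle",
    "en": "at",
    "una": "one",
    "metodología": "methodology",
}


def gist(sentence: str) -> str:
    """Translate word by word, with no reordering and no disambiguation."""
    body = sentence.rstrip(".")
    # Unknown tokens (such as "1988") are copied through unchanged.
    words = [LEXICON.get(w.lower(), w) for w in body.split()]
    out = " ".join(words)
    # Restore sentence-initial capitalization and the final period.
    return out[0].upper() + out[1:] + "."


source = "Sobre la base de dichas experiencias se estableció en 1988 una metodología."
print(gist(source))
```

Because every word is looked up in isolation, the sketch picks the wrong sense for ambiguous words such as "sobre" and keeps the Spanish word order, which is what makes the gisted output hard to read.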


Mt approaches mt pyramid1

MT Approaches: MT Pyramid

[Figure: MT pyramid. Analysis side: Source word, Source syntax, Source meaning. Generation side: Target meaning, Target syntax, Target word. Approaches labeled: Gisting, Transfer]


Mt approaches transfer example

MT Approaches: Transfer Example

  • Transfer Lexicon

    • Map SL structure to TL structure

[Figure: source tree poner with :subj X, :obj mantequilla, and :mod en, which has :obj Y; target tree butter with :subj X and :obj Y]

X puso mantequilla en Y

X buttered Y
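
A transfer lexicon entry of this kind can be sketched as a rewrite over dependency trees. The nested-dictionary tree format and the function name below are hypothetical, and the source structure (poner with a subject, the object mantequilla, and an en modifier) is an assumed reading of the slide's diagram.

```python
def transfer_poner_mantequilla(tree: dict) -> dict:
    """Rewrite poner(:subj X, :obj mantequilla, :mod en(:obj Y)) as
    butter(:subj X, :obj Y); return the tree unchanged if it does not match."""
    mod = tree.get("mod", {})
    matches = (
        tree.get("head") == "poner"
        and tree.get("obj", {}).get("head") == "mantequilla"
        and mod.get("head") == "en"
        and "obj" in mod
        and "subj" in tree
    )
    if not matches:
        return tree
    # X keeps its subject role; Y is promoted from object of "en" to direct object.
    return {"head": "butter", "subj": tree["subj"], "obj": mod["obj"]}


source_tree = {
    "head": "poner",
    "subj": {"head": "Maria"},
    "obj": {"head": "mantequilla"},
    "mod": {"head": "en", "obj": {"head": "pan"}},
}
print(transfer_poner_mantequilla(source_tree))
```

The rule is specific to one source structure, which hints at why transfer systems need a large lexicon of such mappings for each language pair.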


Mt approaches mt pyramid2

MT Approaches: MT Pyramid

[Figure: MT pyramid. Analysis side: Source word, Source syntax, Source meaning. Generation side: Target meaning, Target syntax, Target word. Approaches labeled: Gisting, Transfer, Interlingua]


Mt approaches interlingua example lexical conceptual structure

MT Approaches: Interlingua Example: Lexical Conceptual Structure

(Dorr, 1993)


Mt approaches mt pyramid3

MT Approaches: MT Pyramid

[Figure: MT pyramid. Analysis side: Source word, Source syntax, Source meaning. Generation side: Target meaning, Target syntax, Target word. Approaches labeled: Gisting, Transfer, Interlingua]


Mt approaches mt pyramid4

MT Approaches: MT Pyramid

[Figure: MT pyramid. Analysis side: Source word, Source syntax, Source meaning. Generation side: Target meaning, Target syntax, Target word. Resources labeled: Dictionaries/Parallel Corpora, Transfer Lexicons, Interlingual Lexicons]


Mt approaches statistical vs symbolic

MT Approaches: Statistical vs. Symbolic

[Figure: MT pyramid. Analysis side: Source word, Source syntax, Source meaning. Generation side: Target meaning, Target syntax, Target word]


Mt approaches noisy channel model

MT Approaches: Noisy Channel Model

Portions from http://www.clsp.jhu.edu/ws03/preworkshop/lecture_yamada.pdf
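
In its usual textbook form, the noisy channel model chooses the target sentence that maximizes the product of a language model and a translation model. The symbols below are assumptions rather than notation from the slide: f is the source sentence and e a candidate target sentence.

```latex
\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(e)\, P(f \mid e)
```

Here P(e) is the target language model and P(f | e) is the translation model; P(f) is constant for a given input and drops out of the maximization.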


Mt approaches ibm model word based model

MT Approaches: IBM Model (Word-based Model)

http://www.clsp.jhu.edu/ws03/preworkshop/lecture_yamada.pdf


Mt approaches statistical vs symbolic vs h y b r i d

MT Approaches: Statistical vs. Symbolic vs. Hybrid

[Figure: MT pyramid. Analysis side: Source word, Source syntax, Source meaning. Generation side: Target meaning, Target syntax, Target word]


Mt approaches statistical vs symbolic vs h y b r i d1

MT Approaches: Statistical vs. Symbolic vs. Hybrid

[Figure: MT pyramid. Analysis side: Source word, Source syntax, Source meaning. Generation side: Target meaning, Target syntax, Target word]


Mt approaches hybrid example ghmt

MT Approaches: Hybrid Example: GHMT

  • Generation-Heavy Hybrid Machine Translation

  • Lexical transfer but NO structural transfer

Maria puso la mantequilla en el pan.

[Figure: dependency tree of the sentence, poner with :subj Maria, :obj mantequilla, and :mod en, which has :obj pan; each word is annotated with its candidate translations]

  • poner: lay, locate, place, put, render, set, stand

  • mantequilla: butter, bilberry

  • en: on, in, into, at

  • pan: bread, loaf

  • Maria: Maria


Mt approaches hybrid example ghmt1

MT Approaches: Hybrid Example: GHMT

  • LCS-driven Expansion

  • Conflation Example

[Figure: two LCS structures rooted at [CAUSE GO]. PUT (verb): Agent MARIA, Theme BUTTER (noun), Goal BREAD. BUTTER (verb): Agent MARIA, Goal BREAD. The two are linked by the label Categorial Variation]


Mt approaches hybrid example ghmt2

MT Approaches: Hybrid Example: GHMT

  • Structural Overgeneration

[Figure: several candidate target trees; node labels include put, lay, render, butter, bread, loaf, Maria, and the prepositions into, on, at]


Mt approaches hybrid example ghmt target statistical resources

MT Approaches: Hybrid Example: GHMT Target Statistical Resources

  • Structural N-gram Model

    • Long-distance

    • Lexemes

  • Surface N-gram Model

    • Local

    • Surface-forms

[Figure: a dependency tree over the lexemes buy, John, car, a, red, and the surface word sequence "John bought a red car"]


Mt approaches hybrid example ghmt linearization ranking

MT Approaches: Hybrid Example: GHMT Linearization & Ranking

Maria buttered the bread -47.0841

Maria butters the bread -47.2994

Maria breaded the butter -48.7334

Maria breads the butter -48.835

Maria buttered the loaf -51.3784

Maria butters the loaf -51.5937

Maria put the butter on bread -54.128
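
The scores above look like log-probabilities, with the highest (least negative) candidate ranked first. As a rough illustration of surface n-gram ranking only, the sketch below scores candidates with an add-one smoothed bigram model. The training sentences, the smoothing choice, and all names are assumptions; this is not the GHMT system's actual model or data.

```python
from collections import Counter
from math import log

# Tiny made-up training text; a real surface n-gram model would be trained on a
# large target-language corpus.
CORPUS = [
    "maria buttered the bread",
    "john buttered the toast",
    "maria put the butter on the bread",
    "john breaded the fish",
]

unigrams = Counter()
bigrams = Counter()
for line in CORPUS:
    tokens = ["<s>"] + line.split() + ["</s>"]
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

VOCAB_SIZE = len(unigrams)


def log_prob(sentence: str) -> float:
    """Add-one smoothed bigram log-probability of a sentence."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += log((bigrams[(prev, word)] + 1) / (unigrams[prev] + VOCAB_SIZE))
    return total


candidates = [
    "Maria breaded the butter",
    "Maria buttered the bread",
    "Maria put the butter on bread",
]
# Highest log-probability first, as in the ranked list on the slide.
ranked = sorted(candidates, key=log_prob, reverse=True)
for sentence in ranked:
    print(f"{sentence}  {log_prob(sentence):.4f}")
```

Longer candidates multiply in more probabilities below one, so a model like this tends to favor shorter outputs unless scores are length-normalized.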


Mt approaches practical considerations

MT Approaches: Practical Considerations

  • Resources Availability

    • Parsers and Generators

      • Input/Output compatibility

    • Translation Lexicons

      • Word-based vs. Transfer/Interlingua

    • Parallel Corpora

      • Domain of interest

      • Bigger is better

  • Time Availability

    • Statistical training, resource building


Mt approaches resource poverty

MT Approaches: Resource Poverty

  • No Parser?

  • No Translation Dictionary?

  • Parallel Corpus

    • Align with rich language

      • Extract dictionary

      • Parse rich side

        • Infer parses

          • Build a statistical parser


Road map3

Road Map

  • Why Machine Translation (MT)?

  • Multilingual Challenges for MT

  • MT Approaches

  • MT Evaluation


Mt evaluation

MT Evaluation

  • More art than science

  • Wide range of Metrics/Techniques

    • interface, …, scalability, …, faithfulness, ... space/time complexity, … etc.

  • Automatic vs. Human-based

    • Dumb Machines vs. Slow Humans


Mt evaluation metrics

MT Evaluation Metrics

(Church and Hovy 1993)

  • System-based Metrics: count internal resources (size of lexicon, number of grammar rules, etc.)

    • easy to measure

    • not comparable across systems

    • not necessarily related to utility


Mt evaluation metrics1

MT Evaluation Metrics

  • Text-based Metrics

    • Sentence-based Metrics

      • Quality: Accuracy, Fluency, Coherence, etc.

      • 3-point scale to 100-point scale

    • Comprehensibility Metrics

      • Comprehension, Informativeness,

      • x-point scales, questionnaires

      • most related to utility

      • hard to measure


Mt evaluation metrics2

MT Evaluation Metrics

  • Text-based Metrics (cont’d)

    • Amount of Post-Editing

      • number of keystrokes per page

      • not necessarily related to utility

  • Cost-based Metrics

    • Cost per page

    • Time per page


Machine translation challenges and approaches

Human-based Evaluation Example: Accuracy Criteria


Machine translation challenges and approaches

Human-based Evaluation Example: Fluency Criteria


Fluency vs accuracy

Fluency vs. Accuracy

[Figure: plot with axes Fluency and Accuracy; labeled regions: FAHQ MT, conMT, Prof. MT, Info. MT]


Automatic evaluation example bleu metric

Automatic Evaluation Example: Bleu Metric

  • Bleu

    • BiLingual Evaluation Understudy (Papineni et al 2001)

    • Modified n-gram precision with length penalty

    • Quick, inexpensive and language independent

    • Correlates highly with human evaluation

    • Bias against synonyms and inflectional variations


Automatic evaluation example bleu metric1

Automatic Evaluation Example: Bleu Metric

Test Sentence

colorless green ideas sleep furiously

Gold Standard References

all dull jade ideas sleep irately

drab emerald concepts sleep furiously

colorless immature thoughts nap angrily


Automatic evaluation example bleu metric2

Automatic Evaluation Example: Bleu Metric

Test Sentence

colorless green ideas sleep furiously

Gold Standard References

all dull jade ideas sleep irately

drab emerald concepts sleep furiously

colorless immature thoughts nap angrily

Unigram precision = 4/5


Automatic evaluation example bleu metric3

Automatic Evaluation Example: Bleu Metric

Test Sentence

colorless green ideas sleep furiously

Gold Standard References

all dull jade ideas sleep irately

drab emerald concepts sleep furiously

colorless immature thoughts nap angrily

Unigram precision = 4 / 5 = 0.8

Bigram precision = 2 / 4 = 0.5

Bleu Score = $(a_1 a_2 \cdots a_n)^{1/n}$

= $(0.8 \times 0.5)^{1/2} = 0.6325$ → 63.25
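
The arithmetic above can be reproduced with a short sketch of modified (clipped) n-gram precision and the geometric mean. The function names are hypothetical, and the length penalty mentioned earlier is deliberately left out, as it is in the slide's worked example; with a reference of the same length as the test sentence it would equal 1 here.

```python
from collections import Counter
from math import prod


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram counts at most as often
    as it appears in the single reference where it is most frequent."""
    cand_counts = Counter(ngrams(candidate, n))
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def simple_bleu(candidate, references, max_n):
    """Geometric mean of the precisions for n = 1..max_n, as on the slide.
    The brevity (length) penalty is deliberately left out."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    return prod(precisions) ** (1 / max_n)


test = "colorless green ideas sleep furiously".split()
refs = [
    "all dull jade ideas sleep irately".split(),
    "drab emerald concepts sleep furiously".split(),
    "colorless immature thoughts nap angrily".split(),
]

print(modified_precision(test, refs, 1))     # 0.8
print(modified_precision(test, refs, 2))     # 0.5
print(round(simple_bleu(test, refs, 2), 4))  # 0.6325
```

Only exact token matches count, so "ideas" earns no credit against "concepts" or "thoughts", which illustrates the bias against synonyms noted earlier.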

