Improving SMT with Phrase to Phrase Translations

Joy Ying Zang, Ashish Venugopal,

Stephan Vogel, Alex Waibel

Carnegie Mellon University

Project: Mega-RADD



CMU Mega RADD

The Mega-RADD Team:
SMT: Stephan Vogel, Alex Waibel, John Lafferty
EBMT: Ralf Brown, Bob Frederking
Chinese: Joy Ying Zang, Ashish Venugopal, Bing Zhao, Fei Huang
Arabic: Alicia Tribble, Ahmed Badran



Overview

  • Goals:

    • Develop Data-Driven General Purpose MT Systems

    • Train on Large and Small Corpora, Evaluate to test Portability

  • Approaches

    • Two Data-driven Approaches: Statistical, Example-Based

    • Also Grammar based Translation System

    • Multi-Engine Translation

  • Languages: Chinese and Arabic

  • Statistical Translation:

    • Exploit Structure in Language: Phrases

    • Determine Phrases from Mono- and Bi-Lingual Co-occurrences

    • Determine Phrases from Lexical and Alignment Information



Arabic: Initial System

  • 1 million words of UN data, 300 sentences for testing

  • Preprocessing: separation of punctuation marks, lower case for English, correction of corrupted numbers

  • Adding human knowledge: cleaning the statistical lexicon for the 100 most frequent words; building lists of names, simple date expressions, and numbers (total: 1000 entries; total effort: two part-timers * 4 weeks)

  • Alignment: IBM1 plus HMM training, lexicon plus phrase translations

  • Language Model: trained on 1m sub-corpus

  • Results (20 May 2002):
    - UN test data (300 sentences): Bleu = 0.1176
    - NIST devtest (203 sentences): Bleu = 0.0242, NIST = 2.0608



Arabic: Portability to a New Language

  • Training on subset of UN corpus chosen to cover vocabulary of test data

  • Training English to Arabic for extraction of phrase translations

  • Minimalist morphology: strip/add suffixes for ~200 unknown words. NIST: 5.5368 → 5.6700

  • Adapting LM: select stories from 2 years of English Xinhua stories according to an 'Arabic' keyword list (280 entries); size 6.9m words. NIST: 5.5368 → 5.9183

  • Results:
    - 20 May (devtest): 2.0608
    - 13 June (devtest): 6.5805
    - 14 June (evaltest): 5.4662 (final training not completed)
    - 17 June (evaltest): 6.4499 (after completed training)
    - 19 July (devtest): 7.0482



Two Approaches

  • Determine Phrases from Mono- and Bi-Lingual Co-occurrences

    • Joy

  • Determine Phrases from Lexical and Alignment Information

    • Ashish



Why phrases?

  • Mismatch between languages: word to word translation doesn’t work

  • Phrases encapsulate the context of words, e.g. verb tense



Why phrases? (Cont.)

  • Local reordering, e.g. Chinese relative clause

  • Using phrases to mitigate word segmentation errors



Utilizing bilingual information

  • Given a sentence pair (S,T),

    S=<s1,s2,…,si,…sm>

    T=<t1,t2,…,tj,…,tn>, where si/tj are source/target words.

  • Given an m*n matrix B, where B(i,j) = co-occurrence(si,tj) is an association score computed from the 2×2 contingency counts a, b, c, d of si and tj, with N = a+b+c+d.
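The co-occurrence formula itself did not survive the slide export; a χ² association score over the 2×2 contingency counts a, b, c, d is one standard choice, sketched here as an assumption rather than as the authors' exact measure:

```python
def chi_square(a, b, c, d):
    """Chi-square association score from a 2x2 contingency table.

    a: sentence pairs containing both si and tj
    b: pairs containing si but not tj
    c: pairs containing tj but not si
    d: pairs containing neither
    """
    n = a + b + c + d  # N = a + b + c + d, as on the slide
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom
```

A perfectly associated pair (co-occurs in every sentence pair, never apart) gets the maximal score N, while an independent pair scores 0.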



Utilizing bilingual information (Cont.)

  • Goal: find a partition over matrix B, under the constraint that one src/tgt word can only align to one tgt/src word or phrase (adjacent word sequence)

[Figures: a legal segmentation with an imperfect alignment vs. an illegal segmentation with a perfect alignment]



Utilizing bilingual information (Cont.)

For each sentence pair in the training data:

While (some row or column is not yet aligned) {

Find cell[i,j] where B(i,j) is the maximum over all available (not yet aligned) cells;

Expand cell[i,j] with similarity threshold sim_thresh to region[RowStart,RowEnd; ColStart,ColEnd];

Mark all cells in the region as aligned;

}

Output the aligned regions as phrases

-----------------------------------------------------

Sub expand cell[i,j] with sim_thresh {

Initial aligned region: region[RowStart=i, RowEnd=i; ColStart=j, ColEnd=j]

While (still ok to expand) {

if, for all cells[m,n] with m=RowStart-1 and ColStart<=n<=ColEnd, B(m,n) is similar to B(i,j), then RowStart--; //expand to north

if, for all cells[m,n] with m=RowEnd+1 and ColStart<=n<=ColEnd, B(m,n) is similar to B(i,j), then RowEnd++; //expand to south

… //expand to east

… //expand to west

}

}

Define similar(x,y) = true if abs((x-y)/y) < 1 - sim_thresh
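A runnable sketch of this partition-and-expand procedure; the matrix values and the exact stopping details are assumptions where the slide leaves them open:

```python
import numpy as np

def similar(x, y, sim_thresh):
    # similar(x, y) = true if abs((x - y) / y) < 1 - sim_thresh
    return y != 0 and abs((x - y) / y) < 1 - sim_thresh

def expand(B, i, j, aligned, sim_thresh):
    """Grow a rectangular region around seed cell [i, j], taking in a
    neighboring row/column only if all its cells are free and similar
    to the seed score B[i, j]."""
    r0, r1, c0, c1 = i, i, j, j
    grew = True
    while grew:
        grew = False
        if r0 > 0 and all(not aligned[r0 - 1, n] and similar(B[r0 - 1, n], B[i, j], sim_thresh)
                          for n in range(c0, c1 + 1)):
            r0 -= 1; grew = True  # expand to north
        if r1 < B.shape[0] - 1 and all(not aligned[r1 + 1, n] and similar(B[r1 + 1, n], B[i, j], sim_thresh)
                                       for n in range(c0, c1 + 1)):
            r1 += 1; grew = True  # expand to south
        if c1 < B.shape[1] - 1 and all(not aligned[m, c1 + 1] and similar(B[m, c1 + 1], B[i, j], sim_thresh)
                                       for m in range(r0, r1 + 1)):
            c1 += 1; grew = True  # expand to east
        if c0 > 0 and all(not aligned[m, c0 - 1] and similar(B[m, c0 - 1], B[i, j], sim_thresh)
                          for m in range(r0, r1 + 1)):
            c0 -= 1; grew = True  # expand to west
    return r0, r1, c0, c1

def partition(B, sim_thresh=0.8):
    """Greedily partition co-occurrence matrix B into aligned regions,
    looping while some row or column is still unaligned."""
    B = np.asarray(B, dtype=float)
    aligned = np.zeros(B.shape, dtype=bool)
    row_done = np.zeros(B.shape[0], dtype=bool)
    col_done = np.zeros(B.shape[1], dtype=bool)
    regions = []
    while not (row_done.all() and col_done.all()):
        free = np.where(aligned, -np.inf, B)  # mask already-aligned cells
        i, j = np.unravel_index(np.argmax(free), B.shape)
        r0, r1, c0, c1 = expand(B, i, j, aligned, sim_thresh)
        aligned[r0:r1 + 1, c0:c1 + 1] = True
        row_done[r0:r1 + 1] = True
        col_done[c0:c1 + 1] = True
        regions.append((r0, r1, c0, c1))
    return regions
```

On a toy 2×2 matrix with two strong diagonal scores, this yields two word-to-word regions; when two adjacent columns have similar scores against one row, they merge into a one-to-two phrase region.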



Utilizing bilingual information (Cont.)

[Figure: expanding the aligned region to the north, south, east, and west]



Integrating monolingual information

  • Motivation:

    • Use more information in the alignment

    • Easier for aligning phrases

    • There is much more monolingual data than bilingual data

[Figure: example place names — Santa Clarita, Union town, Pittsburgh, Los Angeles, Corona, Somerset, Santa Monica]



Integrating monolingual information (Cont.)

  • Given a sentence pair (S,T),

    S=<s1,s2,…,si,…sm> and T=<t1,t2,…,tj,…,tn>, where si/tj are source/target words.

  • Construct m*m matrix A, where A(i,j) = collocation(si, sj); only A(i,i-1) and A(i,i+1) have values.

  • Construct n*n matrix C, where C(i,j) = collocation(ti, tj); only C(j-1,j) and C(j+1,j) have values.

  • Construct m*n matrix B, where B(i,j)= co-occurrence(si, tj).



Integrating monolingual information (Cont.)

  • Normalize A so that:

  • Normalize C so that:

  • Normalize B so that:

  • Calculate the new src-tgt matrix B' = A * B * C
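A minimal sketch of the combination step. The normalization constraints were formulas on the slide that did not survive export, so row-normalization is an assumption here:

```python
import numpy as np

def normalize_rows(M):
    """Scale each row of M to sum to 1 (all-zero rows are left as zeros)."""
    M = np.asarray(M, dtype=float)
    sums = M.sum(axis=1, keepdims=True)
    return np.divide(M, sums, out=np.zeros_like(M), where=sums != 0)

def combine(A, B, C):
    """B' = A * B * C: smooth the bilingual co-occurrence matrix B (m x n)
    with the source-side collocation matrix A (m x m) and the
    target-side collocation matrix C (n x n)."""
    return normalize_rows(A) @ normalize_rows(B) @ normalize_rows(C)
```

With identity collocation matrices, B' reduces to the normalized B; non-zero off-diagonal entries in A and C spread co-occurrence mass to adjacent words, which is what makes adjacent-word phrases easier to align.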



Discussion and Results

  • Simple

  • Efficient

    • Partitioning the matrix is linear O(min(m,n)).

    • The construction of A*B*C is O(m*n), since A and C have values only for adjacent word pairs.

  • Effective

    • Improved the translation quality from baseline (NIST= 6.3775, Bleu=0.1417 ) to (NIST= 6.7405, Bleu=0.1681) on small data track dev-test



Utilizing alignment information: Motivation

  • Alignment model associates words and their translations on the sentence level.

  • Context and co-occurrence are represented when considering a set of sentence level alignments.

  • Extract phrase relations from the alignment information.



Processing Alignments

  • Identification – selection of target phrase candidates for each source phrase.

  • Scoring – Assigning a score to each candidate phrase pair to create a ranking.

  • Pruning – Reducing the set of candidate translations to a computationally tractable number.



Identification

  • Extraction from sentence level alignments.

  • For each source phrase, identify the sentences in which it occurs and load the sentence alignments

  • Form a sliding/expanding window in the alignment to identify candidate translations.



Identification Example - I



Identification Example - II

  • is

  • is in step with the

  • is in step with the establishment

  • is in step with the establishment of

  • is in step with the establishment of its

  • is in step with the establishment of its legal

  • is in step with the establishment of its legal system

  • the

  • the establishment

  • the establishment of

  • ……

  • the establishment of its legal system

  • ……

  • establishment

  • establishment of

  • establishment of its

  • ….



Scoring - I

  • This candidate set H needs to be scored and ranked before pruning.

  • Alignment based scores.

  • Similarity clustering

    • Assume that the hypothesis set contains several similar phrases (across several sentences) and several noisy phrases.

    • SimScore(h) = Mean over h' in H of ( EditDistance(h, h') / AvgLen(h, h') )
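A sketch of this similarity clustering score, using a standard dynamic-programming edit distance over word tokens (the slide does not say whether the distance is over words or characters, so word-level is an assumption; lower scores mean a hypothesis is closer to the rest of the set):

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[m][n]

def sim_score(h, H):
    """Mean normalized edit distance from h to the other hypotheses in H."""
    others = [hp for hp in H if hp is not h]
    return sum(edit_distance(h, hp) / ((len(h) + len(hp)) / 2)
               for hp in others) / len(others)
```

A noisy candidate that shares no words with the rest of the set gets a clearly higher (worse) score than one that overlaps heavily with the other hypotheses.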



Scoring Example



Scoring - II

  • Lexicon augmentation

    • Weight each point in alignment scoring by their lexical probability.

      • P( si | tj ), where I, J represent the area of the translation hypothesis being considered. Only the pairs of words where there is an alignment are considered.

    • Calculate translation probability of hypothesis

      • ΣiΠj P( si | tj ) All words in the hypothesis are considered.
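A sketch of the lexical scoring, with the Model 1 lexicon stood in for by a plain dictionary (the Σi Πj form follows the slide as written; the floor value for unseen word pairs is an assumption, not part of the original method):

```python
def lexical_score(src_words, tgt_words, lexicon, floor=1e-10):
    """Sigma_i Pi_j P(s_i | t_j): for each source word, multiply its
    lexicon probabilities over all target words in the hypothesis,
    then sum over the source words."""
    total = 0.0
    for s in src_words:
        prod = 1.0
        for t in tgt_words:
            prod *= lexicon.get((s, t), floor)  # assumed floor for unseen pairs
        total += prod
    return total
```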



Combining Scores

  • Final Score(h) = Πj Scorej(h) for each scoring method.

  • Due to additional morphology present in English as compared to Chinese, a length model is used to adjust the final score to prefer longer phrases.

  • DiffRatio = (I - J) / J if I > J

  • FinalScore(h) = FinalScore(h) * (1.0 + c * e^(-1.0 * DiffRatio))

    • c is an experimentally determined constant
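The length adjustment can be sketched directly from the two formulas above; the value c = 0.5 used here is only a placeholder for the experimentally determined constant:

```python
import math

def length_adjusted_score(score, i_len, j_len, c=0.5):
    """Apply the length bonus FinalScore * (1 + c * e^(-DiffRatio)),
    where DiffRatio = (I - J) / J if I > J, else 0."""
    diff_ratio = (i_len - j_len) / j_len if i_len > j_len else 0.0
    return score * (1.0 + c * math.exp(-diff_ratio))
```

The multiplier is always greater than 1, so every pair gets some bonus; the bonus is largest when the two sides have equal length and decays as the English side grows much longer than the Chinese side.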



Pruning

  • This large candidate list is now sorted by score and is ready for pruning.

  • Difficult to pick a threshold that will work across different phrases. We need a split point that separates the useful and the noisy candidates.

  • Split point = argmax_p { MeanScore(h < p) – MeanScore(h >= p) }, where h ranges over the hypotheses in the ordered set H.
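The split-point search is a one-line scan over the sorted score list; a sketch, assuming scores are already sorted in descending order:

```python
def split_point(scores):
    """argmax over p of MeanScore(scores[:p]) - MeanScore(scores[p:]),
    i.e. the cut maximizing the gap between the mean score above and
    below the split."""
    best_p, best_gap = 1, float("-inf")
    for p in range(1, len(scores)):
        gap = sum(scores[:p]) / p - sum(scores[p:]) / (len(scores) - p)
        if gap > best_gap:
            best_p, best_gap = p, gap
    return best_p
```

Pruning then keeps only the candidates above the split, e.g. `kept = candidates[:split_point(scores)]`, without having to hand-tune a global threshold per phrase.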



Experiments

  • Alignment model – experimented with one-way (EF) and two-way (EF-FE union/intersection) for IBM Models 1-4.

    • Best results found using union (high recall model) from model 4.

  • Both lexical augmentation (using model 1 lexicon) scores and length bonus were applied.



Results and Thoughts

NIST scores:

                          Small Track   Large Track
Baseline (IBM1+LDC-Dic)   6.3775        6.52
+ Phrases                 6.7405        7.316

  • More effective pruning techniques would significantly reduce the experimentation cycle
  • Improved alignment models that better combine bi-directional alignment information



Combining Methods

Small Data Track (Dec-01 data), NIST scores:

                            Segmentation
                            standard   improved
Baseline (IBM1+LDC-Dic)     6.2381     6.3775
+ Phrases Joy               6.5624     6.7987
+ Phrases Ashish            6.5295     6.7405
+ Phrases Joy & Ashish      6.6427     6.8790

