Improving SMT with Phrase to Phrase Translations

Joy Ying Zhang, Ashish Venugopal,

Stephan Vogel, Alex Waibel

Carnegie Mellon University

Project: Mega-RADD


The Mega-RADD Team:

SMT: Stephan Vogel, Alex Waibel, John Lafferty

EBMT: Ralf Brown, Bob Frederking

Chinese: Joy Ying Zhang, Ashish Venugopal, Bing Zhao, Fei Huang

Arabic: Alicia Tribble, Ahmed Badran


  • Goals:

    • Develop Data-Driven General Purpose MT Systems

    • Train on Large and Small Corpora, Evaluate to test Portability

  • Approaches

    • Two Data-driven Approaches: Statistical, Example-Based

    • Also a Grammar-Based Translation System

    • Multi-Engine Translation

  • Languages: Chinese and Arabic

  • Statistical Translation:

    • Exploit Structure in Language: Phrases

    • Determine Phrases from Mono- and Bi-Lingual Co-occurrences

    • Determine Phrases from Lexical and Alignment Information

Arabic: Initial System

  • 1 million words of UN data, 300 sentences for testing

  • Preprocessing: separation of punctuation marks, lower case for English, correction of corrupted numbers

  • Adding human knowledge: cleaning the statistical lexicon for the 100 most frequent words; building lists of names, simple date expressions, and numbers (total: 1,000 entries; total effort: two part-timers × 4 weeks)

  • Alignment: IBM1 plus HMM training, lexicon plus phrase translations

  • Language Model: trained on 1m sub-corpus

  • Results (20 May 2002):
    • UN test data (300 sentences): Bleu = 0.1176
    • NIST devtest (203 sentences): Bleu = 0.0242, NIST = 2.0608

Arabic: Portability to a New Language

  • Training on subset of UN corpus chosen to cover vocabulary of test data

  • Training English to Arabic for extraction of phrase translations

  • Minimalist Morphology: strip/add suffixes for ~200 unknown words. NIST: 5.5368 → 5.6700

  • Adapting LM: select stories from 2 years of English Xinhua stories according to an 'Arabic' keyword list (280 entries); size 6.9M words. NIST: 5.5368 → 5.9183

  • Results:
    - 20 May (devtest): 2.0608
    - 13 June (devtest): 6.5805
    - 14 June (evaltest): 5.4662 (final training not completed)
    - 17 June (evaltest): 6.4499 (after completed training)
    - 19 July (devtest): 7.0482

Two Approaches

  • Determine Phrases from Mono- and Bi-Lingual Co-occurrences

    • Joy

  • Determine Phrases from Lexical and Alignment Information

    • Ashish

Why phrases?

  • Mismatch between languages: word to word translation doesn’t work

  • Phrases encapsulate the context of words, e.g. verb tense

Why phrases? (Cont.)

  • Local reordering, e.g. Chinese relative clause

  • Using phrases to mitigate word segmentation failures

Utilizing bilingual information

  • Given a sentence pair (S,T), S=<s1,s2,…,si,…,sm> and T=<t1,t2,…,tj,…,tn>, where si/tj are source/target words.

  • Given an m×n matrix B, where B(i,j) = co-occurrence(si,tj), computed from the 2×2 contingency counts over sentence pairs: a (si and tj both occur), b (si occurs without tj), c (tj occurs without si), d (neither occurs), where N = a+b+c+d. A standard choice is the χ² statistic:

    χ²(si,tj) = N·(ad − bc)² / ((a+b)·(a+c)·(b+d)·(c+d))
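The contingency-table score can be sketched in Python; using χ² as the co-occurrence statistic is an assumption filled in where the slide's formula image was lost:

```python
def chi_square(a, b, c, d):
    """Chi-square association score from a 2x2 contingency table:
    a = sentence pairs where both si and tj occur,
    b = si occurs without tj, c = tj without si, d = neither."""
    n = a + b + c + d
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom
```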

Utilizing bilingual information (Cont.)

  • Goal: find a partition over matrix B, under the constraint that one src/tgt word can only align to one tgt/src word or phrase (adjacent word sequence)

Legal segmentation, imperfect alignment

Illegal segmentation, perfect alignment

Utilizing bilingual information (Cont.)

For each sentence pair in the training data:

    while (some row or column is not yet aligned) {
        find cell[i,j] where B(i,j) is the maximum over all available (not yet aligned) cells;
        expand cell[i,j] with similarity sim_thresh into region[RowStart, RowEnd; ColStart, ColEnd];
        mark all cells in the region as aligned;
    }
    output the aligned regions as phrases;

    sub expand cell[i,j] with sim_thresh {
        current aligned region: region[RowStart=i, RowEnd=i; ColStart=j, ColEnd=j];
        while (still ok to expand) {
            if all cells[m,n] with m = RowStart-1, ColStart <= n <= ColEnd have B(m,n) similar to B(i,j)
                then RowStart = RowStart - 1;    // expand to north
            if all cells[m,n] with m = RowEnd+1, ColStart <= n <= ColEnd have B(m,n) similar to B(i,j)
                then RowEnd = RowEnd + 1;        // expand to south
            ...                                  // expand to east
            ...                                  // expand to west
        }
    }

Define similar(x,y) = true if |(x − y) / y| < 1 − sim_thresh
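The partition-and-expand procedure can be sketched as runnable Python with NumPy; the tie-breaking on equal scores, the stopping condition (every row and column covered), and the default threshold are assumptions filled in around the pseudocode:

```python
import numpy as np

def similar(x, y, sim_thresh):
    """Cells are 'similar' when their relative difference is small."""
    return y != 0 and abs((x - y) / y) < 1 - sim_thresh

def expand(B, aligned, i, j, sim_thresh):
    """Grow a rectangular region around seed cell (i, j), one row or
    column at a time, while every new cell is unaligned and similar
    to the seed value B[i, j]."""
    r0 = r1 = int(i)
    c0 = c1 = int(j)
    grew = True
    while grew:
        grew = False
        if r0 > 0 and all(not aligned[r0 - 1, n] and similar(B[r0 - 1, n], B[i, j], sim_thresh)
                          for n in range(c0, c1 + 1)):
            r0 -= 1; grew = True                 # expand to north
        if r1 < B.shape[0] - 1 and all(not aligned[r1 + 1, n] and similar(B[r1 + 1, n], B[i, j], sim_thresh)
                                       for n in range(c0, c1 + 1)):
            r1 += 1; grew = True                 # expand to south
        if c0 > 0 and all(not aligned[m, c0 - 1] and similar(B[m, c0 - 1], B[i, j], sim_thresh)
                          for m in range(r0, r1 + 1)):
            c0 -= 1; grew = True                 # expand to west
        if c1 < B.shape[1] - 1 and all(not aligned[m, c1 + 1] and similar(B[m, c1 + 1], B[i, j], sim_thresh)
                                       for m in range(r0, r1 + 1)):
            c1 += 1; grew = True                 # expand to east
    return r0, r1, c0, c1

def partition(B, sim_thresh=0.5):
    """Greedily partition co-occurrence matrix B into aligned
    rectangular regions; each region is one phrase pair."""
    aligned = np.zeros(B.shape, dtype=bool)
    regions = []
    # loop until every row and every column has at least one aligned cell
    while not (aligned.any(axis=1).all() and aligned.any(axis=0).all()):
        masked = np.where(aligned, -np.inf, B)
        i, j = np.unravel_index(np.argmax(masked), B.shape)
        r0, r1, c0, c1 = expand(B, aligned, i, j, sim_thresh)
        aligned[r0:r1 + 1, c0:c1 + 1] = True
        regions.append((r0, r1, c0, c1))
    return regions
```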

Utilizing bilingual information (Cont.)

[Figure: expanding the aligned region to the north, south, east, and west]

Integrating monolingual information

[Figure: monolingual collocation examples such as "Santa Clarita", "Union town", "Los Angeles"]

Integrating monolingual information

  • Motivation:

    • Use more information in the alignment

    • Easier for aligning phrases

    • There is much more monolingual data than bilingual data


Integrating monolingual information (Cont.)

  • Given a sentence pair (S,T),

    S=<s1,s2,…,si,…sm> and T=<t1,t2,…,tj,…,tn>, where si/tj are source/target words.

  • Construct an m×m matrix A, where A(i,j) = collocation(si, sj); only A(i,i-1) and A(i,i+1) have values.

  • Construct an n×n matrix C, where C(i,j) = collocation(ti, tj); only C(j-1,j) and C(j+1,j) have values.

  • Construct an m×n matrix B, where B(i,j) = co-occurrence(si, tj).

Integrating monolingual information (Cont.)

  • Normalize A so that each row sums to 1: Σj A(i,j) = 1

  • Normalize C so that each column sums to 1: Σi C(i,j) = 1

  • Normalize B so that all entries sum to 1: Σi,j B(i,j) = 1

  • Calculate the new src–tgt matrix B' = A · B · C
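A minimal NumPy sketch of the B' = A·B·C construction; adding the identity to A and C so that each word keeps some of its own mass, and the row-versus-column normalization directions, are assumptions:

```python
import numpy as np

def smooth_cooccurrence(A, B, C):
    """Combine monolingual collocation matrices (A: source, C: target)
    with the bilingual co-occurrence matrix B via B' = A @ B @ C.
    The product redistributes co-occurrence mass to adjacent words."""
    m, n = B.shape
    # assumption: each word also keeps its own mass (add the identity)
    A = A + np.eye(m)
    C = C + np.eye(n)
    A = A / A.sum(axis=1, keepdims=True)   # rows of A sum to 1
    C = C / C.sum(axis=0, keepdims=True)   # columns of C sum to 1
    B = B / B.sum()                        # entries of B sum to 1
    return A @ B @ C
```

With zero collocation matrices this reduces to the normalized B, which is a useful sanity check on the construction.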



Discussion and Results

  • Simple

  • Efficient

    • Partitioning the matrix is linear O(min(m,n)).

    • The construction of A*B*C is O(m*n);

  • Effective

    • Improved the translation quality from baseline (NIST = 6.3775, Bleu = 0.1417) to (NIST = 6.7405, Bleu = 0.1681) on the small-data-track dev-test

Utilizing alignment information: Motivation

  • Alignment model associates words and their translations on the sentence level.

  • Context and co-occurrence are represented when considering a set of sentence level alignments.

  • Extract phrase relations from the alignment information.

Processing Alignments

  • Identification – Selection of target phrase candidates for each source phrase.

  • Scoring – Assigning a score to each candidate phrase pair to create a ranking.

  • Pruning – Reducing the set of candidate translations to a computationally tractable number.


Identification

  • Extraction from sentence-level alignments.

  • For each source phrase, identify the sentences in which it occurs and load the sentence alignment.

  • Form a sliding/expanding window in the alignment to identify candidate translations.
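The sliding/expanding window can be sketched as below, under the assumption that candidates are contiguous target spans grown around the alignment points of the source phrase; the `max_slack` parameter and the enumeration order are illustrative choices, not from the slides:

```python
def candidate_targets(links, src_start, src_end, tgt_len, max_slack=2):
    """Enumerate contiguous target spans as candidate translations of
    the source span [src_start, src_end] (inclusive).
    `links` is a set of (src_pos, tgt_pos) alignment points."""
    hit = [j for (i, j) in links if src_start <= i <= src_end]
    if not hit:
        return []
    lo, hi = min(hit), max(hit)
    spans = []
    # expand the window by up to max_slack words on either side
    for a in range(max(0, lo - max_slack), lo + 1):
        for b in range(hi, min(tgt_len - 1, hi + max_slack) + 1):
            spans.append((a, b))
    return spans
```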

Identification Example - II

  • - is

  • is in step with the

  • is in step with the establishment

  • is in step with the establishment of

  • is in step with the establishment of its

  • is in step with the establishment of its legal

  • is in step with the establishment of its legal system

  • the

  • the establishment

  • the establishment of

  • ……

  • the establishment of its legal system

  • ……

  • establishment

  • establishment of

  • establishment of its

  • ….

Scoring i
Scoring - I

  • This candidate set H needs to be scored and ranked before pruning.

  • Alignment based scores.

  • Similarity clustering

    • Assume that the hypothesis set contains several similar phrases (across several sentences) and several noisy phrases.

    • SimScore(h) = Mean(EditDistance(h, h’)/AvgLen(h,h’)) for h,h’ in H
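SimScore can be sketched as below; interpreting AvgLen(h, h′) as the mean of the two phrase lengths is an assumption:

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between token lists a and b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def sim_score(h, H):
    """Mean length-normalized edit distance of hypothesis h to the
    other hypotheses in H (lower = more typical of the set)."""
    others = [h2 for h2 in H if h2 is not h]
    return sum(edit_distance(h, h2) / ((len(h) + len(h2)) / 2.0)
               for h2 in others) / len(others)
```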

Scoring - II

  • Lexicon augmentation

    • Weight each point in alignment scoring by their lexical probability.

      • P(si | tj), where I, J represent the area of the translation hypothesis being considered. Only word pairs connected by an alignment point are considered.

    • Calculate translation probability of hypothesis

      • Σi Πj P(si | tj). All words in the hypothesis are considered.
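A sketch of the lexicon-augmented hypothesis score, implementing the Σi Πj form exactly as written on the slide; the `lex` dictionary interface and the floor probability for unseen word pairs are assumptions:

```python
def lexical_score(src_words, tgt_words, lex):
    """Sum over source words of the product over target words of
    P(s | t), per the slide's formula.  `lex` maps (s, t) pairs to
    a Model-1 lexicon probability (assumed interface)."""
    total = 0.0
    for s in src_words:
        prod = 1.0
        for t in tgt_words:
            prod *= lex.get((s, t), 1e-6)  # floor for unseen pairs
        total += prod
    return total
```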

Combining Scores

  • FinalScore(h) = Πj Scorej(h), the product over the scoring methods.

  • Due to additional morphology present in English as compared to Chinese, a length model is used to adjust the final score to prefer longer phrases.

  • DiffRatio = (I − J) / J, if I > J

  • FinalScore(h) = FinalScore(h) · (1.0 + c · e^(−DiffRatio))

    • c is an experimentally determined constant
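The length adjustment can be sketched as below; the default value of c is a placeholder for the experimentally determined constant, and taking I and J to be the English and Chinese hypothesis lengths is an assumption:

```python
import math

def length_adjusted(score, len_i, len_j, c=0.5):
    """Apply the slide's length bonus: when one side (I) is longer
    than the other (J), multiply the score by 1 + c * exp(-DiffRatio)."""
    if len_i > len_j:
        diff_ratio = (len_i - len_j) / len_j
        score *= 1.0 + c * math.exp(-diff_ratio)
    return score
```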


Pruning

  • This large candidate list is now sorted by score and is ready for pruning.

  • It is difficult to pick a single threshold that works across different phrases; we need a split point that separates the useful from the noisy candidates.

  • Split point = argmax_p {MeanScore(h < p) − MeanScore(h >= p)}, where h ranges over the ordered hypothesis set H.
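The split-point rule can be sketched as below, assuming the scores are already sorted in descending order:

```python
def split_point(scores):
    """Return the index p that maximizes the gap between the mean
    score of the kept head (indices < p) and the mean score of the
    pruned tail (indices >= p).  `scores` is sorted descending."""
    best_p, best_gap = 1, float("-inf")
    for p in range(1, len(scores)):
        head = sum(scores[:p]) / p
        tail = sum(scores[p:]) / (len(scores) - p)
        if head - tail > best_gap:
            best_p, best_gap = p, head - tail
    return best_p
```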


  • Alignment model – experimented with one-way (EF) and two-way (EF–FE union/intersection) alignments for IBM Models 1–4.

    • Best results found using union (high recall model) from model 4.

  • Both lexical augmentation (using model 1 lexicon) scores and length bonus were applied.

Results and Thoughts

[Table: Small Track and Large Track scores for the Baseline (IBM1 + LDC-Dic) vs. + Phrases]

- More effective pruning techniques would significantly reduce the experimentation cycle

- Improved alignment models that better combine bi-directional alignment information

Combining Methods

[Table: Small Data Track (Dec-01 data) scores for + Phrases (Joy), + Phrases (Ashish), and + Phrases (Joy & Ashish)]