
Alignment Entropy as an Automated Predictor of Bitext Fidelity for Statistical Machine Translation








  1. Alignment Entropy as an Automated Predictor of Bitext Fidelity for Statistical Machine Translation • Shankar Ananthakrishnan, Rohit Prasad, Prem Natarajan • Speech and Language Processing Unit, BBN Technologies, Cambridge, MA

  2. Talk progress • Statistical machine translation • Word alignment • Alignment entropy • Alignment error analysis • Bitext translation quality • Translation quality analysis • Conclusion and future directions

  3. Statistical machine translation (SMT) • Start with a large bitext • Parallel corpora or “sentence pairs” • Lots (thousands/millions) of translation pairs! • Align sentence pairs at the word level • Extract phrase pairs or translation rules • Constrained by word alignments • Decode source with extracted phrases/rules • In conjunction with a language model

  4. Talk progress • Statistical machine translation • Word alignment • Alignment entropy • Alignment error analysis • Bitext translation quality • Translation quality analysis • Conclusion and future directions

  5. Word alignment • Link corresponding words in sentence pairs • Forms basis of almost all SMT architectures • Statistical word alignment [Brown93, Vogel96] • Probabilistic noisy-channel-based translation model Tm • Estimated using expectation-maximization (EM) • Choose most likely (Viterbi) alignment Av • (Figure: example alignment of Arabic “tjAr bDAEp” to “commodity traders”, with a NULL token)
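The EM-trained lexical model and Viterbi alignment on this slide can be sketched with a toy IBM Model 1 trainer. This is an illustrative reimplementation under simplified assumptions, not BBN's system or GIZA++; the function names `ibm1_em` and `viterbi_align` and the tiny French-English bitext are invented for the example.

```python
from collections import defaultdict

def ibm1_em(bitext, iterations=10):
    """Toy IBM Model 1: estimate lexical probabilities t(f|e) by EM over
    a list of (source_words, target_words) sentence pairs. The target
    side is augmented with a NULL token, as in the slide's example."""
    t = defaultdict(lambda: 1.0)  # uniform-ish initialization
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in bitext:
            tgt = ["NULL"] + list(tgt)
            for f in src:
                z = sum(t[(f, e)] for e in tgt)   # normalizer over links
                for e in tgt:
                    c = t[(f, e)] / z             # expected link count
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize expected counts into a new lexical table
        t = defaultdict(lambda: 1e-9,
                        {fe: c / total[fe[1]] for fe, c in count.items()})
    return t

def viterbi_align(src, tgt, t):
    """Most likely (Viterbi) alignment under Model 1: each source word
    links to its single best target word (index 0 is NULL)."""
    tgt = ["NULL"] + list(tgt)
    return [max(range(len(tgt)), key=lambda k: t[(f, tgt[k])]) for f in src]
```

Even on a three-pair corpus, EM disambiguates the co-occurrences: "la" settles on "the" and "maison" on "house".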

  6. Word alignment quality • Errors in alignment are caused by • Data sparsity (low-resource languages) • Translation errors • Paraphrasing, non-literal translations • Alignment errors affect translation quality [Fraser07] • Correcting or discarding bad alignments may help • How do we identify poorly aligned constituents? • Need automated alignment quality metric • Unsupervised: no manual intervention • Correlates with supervised measures (e.g. AER) • Scales up from the word- to the corpus-level

  7. An obvious candidate metric • Length-normalized Viterbi alignment score • Monotonic function of p(Av | Tm) • By-product of alignment process • Benefits • Readily available unsupervised metric • Intuitive, easy to understand • Drawbacks • A low probability alignment need not be incorrect • Poor granularity: only sentence-level alignment quality
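As a concrete (hypothetical) illustration of this candidate metric, the length-normalized Viterbi score under a Model-1-style lexical table `t` is just the average best-link log-probability per source word; note it yields only one number per sentence pair, which is the granularity drawback the slide mentions.

```python
import math

def normalized_viterbi_score(src, tgt, t):
    """Length-normalized Viterbi alignment score: average per-word
    log-probability of the best link under lexical table t[(f, e)].
    A by-product of alignment, but sentence-level only."""
    tgt = ["NULL"] + list(tgt)  # allow unaligned source words
    log_p = sum(math.log(max(t[(f, e)] for e in tgt)) for f in src)
    return log_p / len(src)
```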

  8. Talk progress • Statistical machine translation • Word alignment • Alignment entropy • Alignment error analysis • Bitext translation quality • Translation quality analysis • Conclusion and future directions

  9. Alignment entropy • Uncertainty of a link in the Viterbi alignment • Higher uncertainty implies poorer alignment? • Basis for automated alignment quality metric • Need a probability distribution over alignments • Different contexts for a given sentence pair • Estimate a multinomial distribution over word alignments • Bootstrapping simulates different contexts • Resample original bitext with replacement
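The bootstrap step described above can be sketched in a few lines; `bootstrap_bags` is an assumed helper name, and the fixed seed is only for reproducibility of the sketch.

```python
import random

def bootstrap_bags(bitext, num_bags=100, seed=0):
    """Simulate different training contexts by resampling the bitext
    with replacement: each 'bag' has the same size as the original
    corpus but repeats some sentence pairs and omits others."""
    rng = random.Random(seed)
    n = len(bitext)
    return [[bitext[rng.randrange(n)] for _ in range(n)]
            for _ in range(num_bags)]
```

Re-aligning each bag independently then yields, for every source word, a sample of Viterbi links from which a multinomial can be estimated.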

  10. Defining alignment entropy • f_ij: jth word of the ith source sentence • B_i: set of resampled bitexts in which the ith sentence pair occurs • δ_ijk(l) = 1 iff f_ij is aligned to target word e_ik in the l-th bag, 0 otherwise • k indexes the target word to which f_ij is aligned; the sum iterates over all target words (including NULL) • p_ijk = (1 / |B_i|) Σ_{l ∈ B_i} δ_ijk(l) • H_ij = − Σ_k p_ijk log p_ijk
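Given the per-bag Viterbi link choices for one source word, the entropy defined on this slide reduces to the entropy of the empirical link distribution. A minimal sketch (the function name and input format are assumptions):

```python
import math
from collections import Counter

def alignment_entropy(link_choices):
    """Alignment entropy of one source word f_ij. link_choices is the
    list of target indices k (0 = NULL) that f_ij linked to in each
    resampled bag containing sentence pair i. The fraction of bags
    choosing k gives the multinomial p_ijk, and the returned value is
    H_ij = -sum_k p_ijk * log(p_ijk)."""
    n = len(link_choices)
    return -sum((c / n) * math.log(c / n)
                for c in Counter(link_choices).values())
```

A word that links identically in every bag gets zero entropy; a word that flips between two targets gets the maximum two-way entropy, log 2.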

  11. Evaluating alignment entropy

  12. Notes on alignment entropy • Measures variability of alignments across bags • Defined only for IBM model alignments • Each source word linked to exactly one target word • Unidirectional: defined for source-target links • Reverse alignment for target-source alignment entropy • Combine the two for bidirectional alignment entropy • Sentence-pair specific • Not fixed for a given source vocabulary word • Defined for each source word in every sentence pair

  13. Talk progress • Statistical machine translation • Word alignment • Alignment entropy • Alignment error analysis • Bitext translation quality • Translation quality analysis • Conclusion and future directions

  14. Alignment error analysis • IBM-4 alignments using GIZA++ [Al-Onaizan99] • English/Arabic: 129,126 pairs (ca. 1.5M words) • 100 training contexts (1 original, 99 resampled) • Bidirectional sentence-level alignment entropy • Bin into (H)ighest, (L)ow, and (Z)ero entropy sets • Select ca. 250 sentence pairs from each set • Length-normalized Viterbi alignment score • Pool sentence pair sets selected above • Re-rank by normalized Viterbi alignment score • Pick ca. 250 pairs with worst scores (A) • Gold-standard manual alignments for each set • Precision, recall, AER, balanced F-measure
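The H/L/Z binning step might look like the following sketch; `entropy_bins` and the exact tie-breaking are assumptions, since the slide only specifies zero-entropy, low-entropy, and highest-entropy sets of roughly 250 pairs each.

```python
def entropy_bins(scored_pairs, n=250):
    """scored_pairs: (sentence_pair, entropy) tuples. Returns up to n
    pairs each of (Z)ero entropy, lowest nonzero (L) entropy, and
    (H)ighest entropy, mirroring the slide's experimental setup."""
    ranked = sorted(scored_pairs, key=lambda ph: ph[1])
    zero = [p for p, h in ranked if h == 0.0][:n]
    nonzero = [(p, h) for p, h in ranked if h > 0.0]
    low = [p for p, h in nonzero[:n]]
    high = [p for p, h in nonzero[-n:]]
    return zero, low, high
```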

  15. Alignment error analysis • Table 1: Alignment entropy vs. alignment quality

  16. Notes on alignment error analysis • Results support our hypothesis • Higher alignment entropy indicates poorer quality • Superior to normalized Viterbi alignment score • AER(H-set) > AER(A-set) by 6.8% absolute

  17. Talk progress • Statistical machine translation • Word alignment • Alignment entropy • Alignment error analysis • Bitext translation quality • Translation quality analysis • Conclusion and future directions

  18. Bitext translation quality • Human translations often contain errors • Non-native speakers of one language • Some constructs difficult to translate (e.g. idioms) • Oversight, inadequate quality control • Predicting problems in human translations • Semantic errors, missing chunks, etc. • Non-literality (paraphrasing) • Use alignment entropy to identify problems? • Is it correlated with translation quality?

  19. Measuring bitext translation quality • TER/HTER analysis of existing translations [Snover06] • Against carefully prepared gold-standard translations • Translation Edit Rate (TER) • # insertions, deletions, substitutions, and shifts • Lexically-based, no notion of semantic equivalence • Human-targeted Translation Edit Rate (HTER) • Human expert produces targeted references • Minimally edit hypotheses for semantic equivalence to untargeted gold-standard references • HTER = TER evaluated against targeted references • Minimizes impact of lexical choice
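The TER computation above can be illustrated with a simplified sketch. Real TER also allows block shifts at unit cost; this version counts only insertions, deletions, and substitutions, so it upper-bounds shift-aware TER, and `ter_no_shifts` is an invented name, not the Snover et al. tool.

```python
def ter_no_shifts(hyp, ref):
    """Simplified Translation Edit Rate: minimum insertions, deletions,
    and substitutions to turn hyp into ref, divided by reference
    length. Omits the block-shift edits of full TER."""
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                       # delete all of hyp[:i]
    for j in range(n + 1):
        d[0][j] = j                       # insert all of ref[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution/match
    return d[m][n] / n
```

HTER then reuses the same computation, but against targeted references edited for semantic equivalence.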

  20. Talk progress • Statistical machine translation • Word alignment • Alignment entropy • Alignment error analysis • Bitext translation quality • Translation quality analysis • Conclusion and future directions

  21. Translation quality analysis • Existing translations are the “hypotheses” • New gold-standard references for H-, L-, and Z-sets • Minimal paraphrasing, as literal as possible • Thoroughly checked for quality • Evaluate TER between “hypotheses” and gold-standard • Measure of translation literality • HTER evaluation • Targeted references from hypotheses and gold-standard • HTER = TER of hypotheses w.r.t. targeted references • Measure of semantic translation correctness

  22. Translation quality analysis • Table 2: Alignment entropy vs. translation quality

  23. Notes on translation quality analysis • Predicting translation literality • Higher alignment entropy produces higher TER • Indicative of paraphrasing • Semantic correctness of translation pairs • Excellent equivalence in zero/low-entropy pairs • Significant errors in highest alignment entropy pairs

  24. Talk progress • Statistical machine translation • Word alignment • Alignment entropy • Alignment error analysis • Bitext translation quality • Translation quality analysis • Conclusion and future directions

  25. Conclusion • Alignment entropy is an excellent predictor of alignment quality • Fine-grained, extensible, word-level measure • Superior to normalized Viterbi alignment score • Serves as a measure of translation literality • Identifies translation pairs with gross errors • Useful tool for validating human translations

  26. Future directions • Bootstrapped phrase confidence for SMT [ongoing] • Consistency of phrase pairs across resampled bitexts • Integrated as a phrase level feature (tuned with MERT) • Modest BLEU improvements (0.7-1.0 point) • Online human translation validation [planned] • Identify potential translation errors on the fly • Assist human translators for rapid SMT development • Enriched machine translation [planned] • Project features across high-confidence alignments • Availability of a fine-grained measure is key

  27. References
  [Brown93] Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2):263-311.
  [Vogel96] Stephan Vogel, Hermann Ney, and Christoph Tillmann. 1996. HMM-based Word Alignment in Statistical Translation. In Proceedings of the 16th Conference on Computational Linguistics, pp. 836-841, Morristown, NJ.
  [Fraser07] Alexander Fraser and Daniel Marcu. 2007. Measuring Word Alignment Quality for Statistical Machine Translation. Computational Linguistics, 33(3):293-303.
  [Al-Onaizan99] Yaser Al-Onaizan, Jan Curin, Michael Jahr, Kevin Knight, John Lafferty, Dan Melamed, Franz Josef Och, David Purdy, Noah A. Smith, and David Yarowsky. 1999. Statistical Machine Translation: Final Report. Technical Report, JHU Summer Workshop.
  [Snover06] Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. 2006. A Study of Translation Edit Rate with Targeted Human Annotation. In Proceedings of AMTA, pp. 223-231.

  28. Thank you!

  29. Supervised alignment quality • Annotate “sure” (S) and “possible” (P) links • Evaluate against hypothesis alignment A
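Given gold "sure" (S) and "possible" (P) link sets with S ⊆ P and a hypothesis alignment A, the standard supervised measures (precision, recall, and the alignment error rate reported earlier in the talk) can be computed as below; the formulas are the usual ones from the word-alignment literature, and the function name is an assumption.

```python
def alignment_error_rate(A, S, P):
    """Supervised alignment quality against gold sure (S) and possible
    (P) link sets, S a subset of P:
      precision = |A ∩ P| / |A|
      recall    = |A ∩ S| / |S|
      AER       = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|)
    Links are (source_index, target_index) pairs."""
    A, S, P = set(A), set(S), set(P)
    precision = len(A & P) / len(A)
    recall = len(A & S) / len(S)
    aer = 1 - (len(A & S) + len(A & P)) / (len(A) + len(S))
    return precision, recall, aer
```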
