
Alignment Entropy as an Automated Predictor of Bitext Fidelity for Statistical Machine Translation








  1. Alignment Entropy as an Automated Predictor of Bitext Fidelity for Statistical Machine Translation • Shankar Ananthakrishnan, Rohit Prasad, Prem Natarajan • Speech and Language Processing Unit, BBN Technologies, Cambridge, MA

  2. Talk progress • Statistical machine translation • Word alignment • Alignment entropy • Alignment error analysis • Bitext translation quality • Translation quality analysis • Conclusion and future directions

  3. Statistical machine translation (SMT) • Start with a large bitext • Parallel corpora or “sentence pairs” • Lots (thousands/millions) of translation pairs! • Align sentence pairs at the word level • Extract phrase pairs or translation rules • Constrained by word alignments • Decode source with extracted phrases/rules • In conjunction with a language model

  4. Talk progress • Statistical machine translation • Word alignment • Alignment entropy • Alignment error analysis • Bitext translation quality • Translation quality analysis • Conclusion and future directions

  5. Word alignment • Link corresponding words in sentence pairs • Forms basis of almost all SMT architectures • Statistical word alignment [Brown93, Vogel96] • Probabilistic noisy-channel-based translation model Tm • Estimated using expectation-maximization (EM) • Choose most likely (Viterbi) alignment Av • (Figure: example alignment of Arabic “tjAr bDAEp” to “commodity traders”, with a NULL token)
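The EM-trained lexical model and Viterbi alignment on this slide can be sketched with a toy IBM Model 1 trainer. This is an illustrative reimplementation under simplified assumptions, not BBN's system or GIZA++; the function names `ibm1_em` and `viterbi_align` and the tiny French-English bitext are invented for the example.

```python
from collections import defaultdict

def ibm1_em(bitext, iterations=10):
    """Toy IBM Model 1: estimate lexical probabilities t(f|e) by EM over
    a list of (source_words, target_words) sentence pairs. The target
    side is augmented with a NULL token, as in the slide's example."""
    t = defaultdict(lambda: 1.0)  # uniform-ish initialization
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in bitext:
            tgt = ["NULL"] + list(tgt)
            for f in src:
                z = sum(t[(f, e)] for e in tgt)   # normalizer over links
                for e in tgt:
                    c = t[(f, e)] / z             # expected link count
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize expected counts into a new lexical table
        t = defaultdict(lambda: 1e-9,
                        {fe: c / total[fe[1]] for fe, c in count.items()})
    return t

def viterbi_align(src, tgt, t):
    """Most likely (Viterbi) alignment under Model 1: each source word
    links to its single best target word (index 0 is NULL)."""
    tgt = ["NULL"] + list(tgt)
    return [max(range(len(tgt)), key=lambda k: t[(f, tgt[k])]) for f in src]
```

Even on a three-pair corpus, EM disambiguates the co-occurrences: "la" settles on "the" and "maison" on "house".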

  6. Word alignment quality • Errors in alignment are caused by • Data sparsity (low-resource languages) • Translation errors • Paraphrasing, non-literal translations • Alignment errors affect translation quality [Fraser07] • Correcting or discarding bad alignments may help • How do we identify poorly aligned constituents? • Need automated alignment quality metric • Unsupervised: no manual intervention • Correlates with supervised measures (e.g. AER) • Scales up from the word- to the corpus-level

  7. An obvious candidate metric • Length-normalized Viterbi alignment score • Monotonic function of p(Av | Tm) • By-product of alignment process • Benefits • Readily available unsupervised metric • Intuitive, easy to understand • Drawbacks • A low probability alignment need not be incorrect • Poor granularity: only sentence-level alignment quality
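As a concrete (hypothetical) illustration of this candidate metric, the length-normalized Viterbi score under a Model-1-style lexical table `t` is just the average best-link log-probability per source word; note it yields only one number per sentence pair, which is the granularity drawback the slide mentions.

```python
import math

def normalized_viterbi_score(src, tgt, t):
    """Length-normalized Viterbi alignment score: average per-word
    log-probability of the best link under lexical table t[(f, e)].
    A by-product of alignment, but sentence-level only."""
    tgt = ["NULL"] + list(tgt)  # allow unaligned source words
    log_p = sum(math.log(max(t[(f, e)] for e in tgt)) for f in src)
    return log_p / len(src)
```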

  8. Talk progress • Statistical machine translation • Word alignment • Alignment entropy • Alignment error analysis • Bitext translation quality • Translation quality analysis • Conclusion and future directions

  9. Alignment entropy • Uncertainty of a link in the Viterbi alignment • Higher uncertainty implies poorer alignment? • Basis for automated alignment quality metric • Need a probability distribution over alignments • Different contexts for a given sentence pair • Estimate a multinomial distribution over word alignments • Bootstrapping simulates different contexts • Resample original bitext with replacement
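The bootstrap step described above can be sketched in a few lines; `bootstrap_bags` is an assumed helper name, and the fixed seed is only for reproducibility of the sketch.

```python
import random

def bootstrap_bags(bitext, num_bags=100, seed=0):
    """Simulate different training contexts by resampling the bitext
    with replacement: each 'bag' has the same size as the original
    corpus but repeats some sentence pairs and omits others."""
    rng = random.Random(seed)
    n = len(bitext)
    return [[bitext[rng.randrange(n)] for _ in range(n)]
            for _ in range(num_bags)]
```

Re-aligning each bag independently then yields, for every source word, a sample of Viterbi links from which a multinomial can be estimated.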

  10. Defining alignment entropy • f_ij: jth word of the ith source sentence • B_i: set of resampled bitexts in which the ith sentence pair occurs • δ_ijk(l) = 1 iff f_ij is aligned to target word e_ik in the l-th bag, 0 otherwise • k indexes the target word to which f_ij is aligned; the sum iterates over all target words (including NULL) • p_ijk = (1 / |B_i|) Σ_{l ∈ B_i} δ_ijk(l) • H_ij = − Σ_k p_ijk log p_ijk
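Given the per-bag Viterbi link choices for one source word, the entropy defined on this slide reduces to the entropy of the empirical link distribution. A minimal sketch (the function name and input format are assumptions):

```python
import math
from collections import Counter

def alignment_entropy(link_choices):
    """Alignment entropy of one source word f_ij. link_choices is the
    list of target indices k (0 = NULL) that f_ij linked to in each
    resampled bag containing sentence pair i. The fraction of bags
    choosing k gives the multinomial p_ijk, and the returned value is
    H_ij = -sum_k p_ijk * log(p_ijk)."""
    n = len(link_choices)
    return -sum((c / n) * math.log(c / n)
                for c in Counter(link_choices).values())
```

A word that links identically in every bag gets zero entropy; a word that flips between two targets gets the maximum two-way entropy, log 2.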

  11. Evaluating alignment entropy

  12. Notes on alignment entropy • Measures variability of alignments across bags • Defined only for IBM model alignments • Each source word linked to exactly one target word • Unidirectional: defined for source-target links • Reverse alignment for target-source alignment entropy • Combine the two for bidirectional alignment entropy • Sentence-pair specific • Not fixed for a given source vocabulary word • Defined for each source word in every sentence pair

  13. Talk progress • Statistical machine translation • Word alignment • Alignment entropy • Alignment error analysis • Bitext translation quality • Translation quality analysis • Conclusion and future directions

  14. Alignment error analysis • IBM-4 alignments using GIZA++ [Al-Onaizan99] • English/Arabic: 129,126 pairs (ca. 1.5M words) • 100 training contexts (1 original, 99 resampled) • Bidirectional sentence-level alignment entropy • Bin into (H)ighest, (L)ow, and (Z)ero entropy sets • Select ca. 250 sentence pairs from each set • Length-normalized Viterbi alignment score • Pool sentence pair sets selected above • Re-rank by normalized Viterbi alignment score • Pick ca. 250 pairs with worst scores (A) • Gold-standard manual alignments for each set • Precision, recall, AER, balanced F-measure
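The H/L/Z binning step might look like the following sketch; `entropy_bins` and the exact tie-breaking are assumptions, since the slide only specifies zero-entropy, low-entropy, and highest-entropy sets of roughly 250 pairs each.

```python
def entropy_bins(scored_pairs, n=250):
    """scored_pairs: (sentence_pair, entropy) tuples. Returns up to n
    pairs each of (Z)ero entropy, lowest nonzero (L) entropy, and
    (H)ighest entropy, mirroring the slide's experimental setup."""
    ranked = sorted(scored_pairs, key=lambda ph: ph[1])
    zero = [p for p, h in ranked if h == 0.0][:n]
    nonzero = [(p, h) for p, h in ranked if h > 0.0]
    low = [p for p, h in nonzero[:n]]
    high = [p for p, h in nonzero[-n:]]
    return zero, low, high
```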

  15. Alignment error analysis • Table 1: Alignment entropy vs. alignment quality

  16. Notes on alignment error analysis • Results support our hypothesis • Higher alignment entropy indicates poorer quality • Superior to normalized Viterbi alignment score • AER(H-set) > AER(A-set) by 6.8% absolute

  17. Talk progress • Statistical machine translation • Word alignment • Alignment entropy • Alignment error analysis • Bitext translation quality • Translation quality analysis • Conclusion and future directions

  18. Bitext translation quality • Human translations often contain errors • Non-native speakers of one language • Some constructs difficult to translate (e.g. idioms) • Oversight, inadequate quality control • Predicting problems in human translations • Semantic errors, missing chunks, etc. • Non-literality (paraphrasing) • Use alignment entropy to identify problems? • Is it correlated with translation quality?

  19. Measuring bitext translation quality • TER/HTER analysis of existing translations [Snover06] • Against carefully prepared gold-standard translations • Translation Edit Rate (TER) • # insertions, deletions, substitutions, and shifts • Lexically-based, no notion of semantic equivalence • Human-targeted Translation Edit Rate (HTER) • Human expert produces targeted references • Minimally edit hypotheses for semantic equivalence to untargeted gold-standard references • HTER = TER evaluated against targeted references • Minimizes impact of lexical choice
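The TER computation above can be illustrated with a simplified sketch. Real TER also allows block shifts at unit cost; this version counts only insertions, deletions, and substitutions, so it upper-bounds shift-aware TER, and `ter_no_shifts` is an invented name, not the Snover et al. tool.

```python
def ter_no_shifts(hyp, ref):
    """Simplified Translation Edit Rate: minimum insertions, deletions,
    and substitutions to turn hyp into ref, divided by reference
    length. Omits the block-shift edits of full TER."""
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                       # delete all of hyp[:i]
    for j in range(n + 1):
        d[0][j] = j                       # insert all of ref[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution/match
    return d[m][n] / n
```

HTER then reuses the same computation, but against targeted references edited for semantic equivalence.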

  20. Talk progress • Statistical machine translation • Word alignment • Alignment entropy • Alignment error analysis • Bitext translation quality • Translation quality analysis • Conclusion and future directions

  21. Translation quality analysis • Existing translations are the “hypotheses” • New gold-standard references for H-, L-, and Z-sets • Minimal paraphrasing, as literal as possible • Thoroughly checked for quality • Evaluate TER between “hypotheses” and gold-standard • Measure of translation literality • HTER evaluation • Targeted references from hypotheses and gold-standard • HTER = TER of hypotheses w.r.t. targeted references • Measure of semantic translation correctness

  22. Translation quality analysis • Table 2: Alignment entropy vs. translation quality

  23. Notes on translation quality analysis • Predicting translation literality • Higher alignment entropy produces higher TER • Indicative of paraphrasing • Semantic correctness of translation pairs • Excellent equivalence in zero/low-entropy pairs • Significant errors in highest alignment entropy pairs

  24. Talk progress • Statistical machine translation • Word alignment • Alignment entropy • Alignment error analysis • Bitext translation quality • Translation quality analysis • Conclusion and future directions

  25. Conclusion • Alignment entropy is an excellent predictor of alignment quality • Fine-grained, extensible, word-level measure • Superior to normalized Viterbi alignment score • Serves as a measure of translation literality • Identifies translation pairs with gross errors • Useful tool for validating human translations

  26. Future directions • Bootstrapped phrase confidence for SMT [ongoing] • Consistency of phrase pairs across resampled bitexts • Integrated as a phrase level feature (tuned with MERT) • Modest BLEU improvements (0.7-1.0 point) • Online human translation validation [planned] • Identify potential translation errors on the fly • Assist human translators for rapid SMT development • Enriched machine translation [planned] • Project features across high-confidence alignments • Availability of a fine-grained measure is key

  27. References
  [Brown93] Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2):263-311.
  [Vogel96] Stephan Vogel, Hermann Ney, and Christoph Tillmann. 1996. HMM-based Word Alignment in Statistical Translation. In Proceedings of the 16th Conference on Computational Linguistics, pp. 836-841, Morristown, NJ.
  [Fraser07] Alexander Fraser and Daniel Marcu. 2007. Measuring Word Alignment Quality for Statistical Machine Translation. Computational Linguistics, 33(3):293-303.
  [Al-Onaizan99] Yaser Al-Onaizan, Jan Curin, Michael Jahr, Kevin Knight, John Lafferty, Dan Melamed, Franz Josef Och, David Purdy, Noah A. Smith, and David Yarowsky. 1999. Statistical Machine Translation: Final Report. Technical Report, JHU Summer Workshop.
  [Snover06] Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. 2006. A Study of Translation Edit Rate with Targeted Human Annotation. In Proceedings of AMTA, pp. 223-231.

  28. Thank you!

  29. Supervised alignment quality • Annotate “sure” (S) and “possible” (P) links • Evaluate against hypothesis alignment A
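Given gold "sure" (S) and "possible" (P) link sets with S ⊆ P and a hypothesis alignment A, the standard supervised measures (precision, recall, and the alignment error rate reported earlier in the talk) can be computed as below; the formulas are the usual ones from the word-alignment literature, and the function name is an assumption.

```python
def alignment_error_rate(A, S, P):
    """Supervised alignment quality against gold sure (S) and possible
    (P) link sets, S a subset of P:
      precision = |A ∩ P| / |A|
      recall    = |A ∩ S| / |S|
      AER       = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|)
    Links are (source_index, target_index) pairs."""
    A, S, P = set(A), set(S), set(P)
    precision = len(A & P) / len(A)
    recall = len(A & S) / len(S)
    aer = 1 - (len(A & S) + len(A & P)) / (len(A) + len(S))
    return precision, recall, aer
```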
