
Semantic Evaluation of Machine Translation


Presentation Transcript


  1. Semantic Evaluation of Machine Translation Billy Wong, City University of Hong Kong 21st May 2010

  2. Introduction
  • Surface text similarity is not a reliable indicator in automatic MT evaluation
  • Insensitive to translation variation
  • Deeper linguistic analysis is preferred
  • WordNet is widely used for matching synonyms
  • E.g. METEOR (Banerjee & Lavie 2005), TERp (Snover et al. 2009), ATEC (Wong & Kit 2010), etc.
  • Is the word similarity between MT outputs and references fully captured?

  3. Motivation
  • WordNet: sense distinctions are highly fine-grained
  • Word pairs not in the same synset: [mom vs mother], [safeguard vs security], [expansion vs extension], [journey vs tour], [impact vs influence], etc.
  • Such pairs are close in meaning, so ignoring them in evaluation is problematic
  • What is needed is a word similarity measure
  • Proposal: utilize word similarity measures in automatic MT evaluation (see the sketch below)
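The word pairs listed above can be checked directly against WordNet. The following is a minimal sketch (an illustration added here, not part of the original slides) using NLTK's WordNet interface: it tests whether each pair shares a synset, which is what a pure synonym matcher requires, and also reports the best Wu-Palmer score over their noun senses. Exact scores depend on the WordNet version bundled with NLTK, and nltk.download('wordnet') is required.

# Sketch: strict synonym test vs. a graded similarity measure for the word
# pairs cited on the Motivation slide. Scores vary with the WordNet version.
from nltk.corpus import wordnet as wn

pairs = [("mom", "mother"), ("safeguard", "security"),
         ("expansion", "extension"), ("journey", "tour"),
         ("impact", "influence")]

for w1, w2 in pairs:
    same_synset = bool(set(wn.synsets(w1)) & set(wn.synsets(w2)))  # synonym test
    best_wup = max((s1.wup_similarity(s2) or 0.0)                  # graded similarity
                   for s1 in wn.synsets(w1, pos=wn.NOUN)
                   for s2 in wn.synsets(w2, pos=wn.NOUN))
    print(f"{w1}/{w2}: same synset = {same_synset}, best wup = {best_wup:.2f}")

Pairs like mom/mother typically fall outside any shared synset yet score highly under wup, which is exactly the gap the proposal targets.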

  4. Word Similarity Measures
  • Knowledge-based (WordNet): Wup (Wu & Palmer 1994), Res (Resnik 1995), Jcn (Jiang & Conrath 1997), Hso (Hirst & St-Onge 1998), Lch (Leacock & Chodorow 1998), Lin (Lin 1998), Lesk (Banerjee & Pedersen 2002)
  • Corpus-based: LSA (Landauer et al. 1998)
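Most of the knowledge-based measures above are available through NLTK's WordNet similarity API. A brief sketch for orientation (again an addition, not from the slides): hso and lesk are not part of NLTK's similarity module and are omitted, and LSA would instead require a corpus-based model.

# Computing several of the listed WordNet measures with NLTK. The information
# content (IC) based measures (res, jcn, lin) need corpus frequencies; the
# Brown-corpus IC file ships with NLTK (nltk.download('wordnet_ic')).
from nltk.corpus import wordnet as wn, wordnet_ic

ic = wordnet_ic.ic('ic-brown.dat')
s1, s2 = wn.synset('journey.n.01'), wn.synset('tour.n.01')

print("wup:", s1.wup_similarity(s2))      # Wu & Palmer 1994
print("lch:", s1.lch_similarity(s2))      # Leacock & Chodorow 1998
print("res:", s1.res_similarity(s2, ic))  # Resnik 1995
print("jcn:", s1.jcn_similarity(s2, ic))  # Jiang & Conrath 1997
print("lin:", s1.lin_similarity(s2, ic))  # Lin 1998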

  5. Experiment
  • Three questions:
  • To what extent are two words considered similar?
  • Which word similarity measure(s) are more appropriate to use?
  • How much performance gain can an MT evaluation metric obtain by incorporating word similarity measures?

  6. Setting
  • Data: MetricsMATR08 development data (1992 MT outputs, 8 MT systems, 4 references)
  • Evaluation metric: unigram matching (exact match / synonym / semantically similar, all with the same weight)
  • Three variants: precision (p), recall (r) and F-measure (f), where c is the MT output and t the reference translation (a reconstruction of the metric is sketched below)
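The slide's formulas are not preserved in this transcript, so the following is a plausible reconstruction under the stated setting: a candidate token counts as matched if it has an exact, synonym, or semantically similar counterpart in the reference, all equally weighted, and precision, recall and F-measure are computed over the match counts. The wup-based similar() test and its 0.8 threshold are assumptions for illustration, not the thresholds tuned in the experiment.

# Sketch of the metric described on slide 6: unigram matching over exact,
# synonym, and semantically similar words, then precision / recall / F-measure.
from nltk.corpus import wordnet as wn

def similar(w1, w2, threshold=0.8):
    """Exact match, shared synset, or best noun-sense wup score above a threshold."""
    if w1 == w2 or set(wn.synsets(w1)) & set(wn.synsets(w2)):
        return True
    scores = [s1.wup_similarity(s2) or 0.0
              for s1 in wn.synsets(w1, pos=wn.NOUN)
              for s2 in wn.synsets(w2, pos=wn.NOUN)]
    return bool(scores) and max(scores) >= threshold

def unigram_prf(c, t):
    """c: MT output tokens, t: reference translation tokens (cf. slide 6)."""
    matched_c = sum(any(similar(w, r) for r in t) for w in c)
    matched_t = sum(any(similar(r, w) for w in c) for r in t)
    p = matched_c / len(c) if c else 0.0
    r = matched_t / len(t) if t else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

print(unigram_prf("the journey was long".split(), "the tour was long".split()))

With multiple references, as in MetricsMATR08, one natural extension is to score against each reference and keep the best value, though the slides do not state that detail.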

  7. Result (1) • Correlation thresholds of each measure

  8. Result (2) • Correlation of the metric

  9. Conclusion
  • Semantically similar words are important in automatic MT evaluation
  • Two word similarity measures, wup and LSA, perform relatively better
  • Remaining problems:
  • Semantic similarity vs. semantic relatedness, e.g. [committee vs chairman] (LSA)
  • Most WordNet similarity measures work on nouns and verbs only

  10. Thank you
