
Linguistically Targeted Test Suites


Presentation Transcript


  1. Linguistically Targeted Test Suites November 2, 2012 Lori Levin Jason Baldridge Chris Dyer Vijay John Kyle Jerro

  2. Linguistic Core evaluation for Linguistic Core MT • Corpus of naturally occurring sentences in Kinyarwanda and Malagasy • Sentences are annotated with tags showing constructions of interest (relative clauses, passives, etc.; a sketch of one annotated item follows) • Example tags: conditional, relative clause, headless relative clause, VOS, voice alternation, proximity, adjectival predicate • English: To really increase farmers’ representation in national politics, it is not enough to increase the number of delegates elected by farmers. • Malagasy: Tsy ampy ny mampitombo ny isan'ny solontena fidian'ny tantsaha raha tiana ny hampitomboana ny solontenan'ny tantsaha eo amin'ny sehatra nasionaly.
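As a concrete sketch of the annotation, here is how one such test item might be stored and filtered by construction tag in Python. The field names and tag spellings are hypothetical, not the project's actual scheme.

```python
# A minimal sketch of one annotated test item; field names and tag
# spellings are illustrative, not the project's actual annotation scheme.
test_item = {
    "id": "mlg-0001",
    "source": ("Tsy ampy ny mampitombo ny isan'ny solontena fidian'ny "
               "tantsaha raha tiana ny hampitomboana ny solontenan'ny "
               "tantsaha eo amin'ny sehatra nasionaly."),
    "reference": ("To really increase farmers' representation in national "
                  "politics, it is not enough to increase the number of "
                  "delegates elected by farmers."),
    "tags": ["conditional", "relative_clause", "headless_relative_clause",
             "VOS", "voice_alternation", "proximity", "adjectival_predicate"],
}

def items_with_tag(corpus, tag):
    """Select every test item annotated with a given construction tag."""
    return [item for item in corpus if tag in item["tags"]]

print(len(items_with_tag([test_item], "relative_clause")))  # -> 1
```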

  3. Lexical similarity is not always a good measure of translation quality • Good translations get low scores when higher-order n-grams don’t match (see the sketch below) • Bad translations persist when function words are undervalued: errors in tense, definiteness, and negation persist • Lack of error analysis: well-understood constructions like relative clauses and passive voice are not modelled
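To see why reordering hurts, here is a minimal sketch of modified n-gram precision, the per-order quantity BLEU combines: a fluent paraphrase keeps most of its unigram credit but loses most of its 3- and 4-gram credit. The example sentences are invented for illustration.

```python
from collections import Counter

def ngram_precision(hyp, ref, n):
    """Modified n-gram precision: clipped n-gram overlap over hypothesis n-grams."""
    hyp_ngrams = Counter(zip(*(hyp[i:] for i in range(n))))
    ref_ngrams = Counter(zip(*(ref[i:] for i in range(n))))
    overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
    return overlap / max(1, sum(hyp_ngrams.values()))

ref = "several rockets and mortar shells fell in southern israel".split()
hyp = "in southern israel several rockets and mortar shells landed".split()

# Precision decays sharply with n even though the wording barely differs:
for n in range(1, 5):
    print(n, round(ngram_precision(hyp, ref, n), 2))  # 0.89, 0.75, 0.57, 0.33
```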

  4. Underrating good translations From Giménez and Màrquez, 2010, page 212. HYP: On Tuesday several missiles and mortar shells fell in southern Israel, but there were no casualties. R1: Several Qassam rockets and mortar shells were fired on southern Israel today Tuesday without victims. R2: Several Qassam rockets and mortars hit southern Israel today without causing any casualties. R3: A number of Quassam rockets and Howitzer missiles fell over southern Israel today, Tuesday, without causing any casualties. R4: Several Qassam rockets and mortar shells fell today, Tuesday on southern Israel without causing any victim. R5: Several Qassam rockets and mortar shells fell today, Tuesday, in southern Israel without causing any casualties. Acceptable to human translators, but a low BLEU score because there are no higher-order n-gram matches.
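For reference, here is a sketch of how one might reproduce this kind of sentence-level, multi-reference comparison with NLTK's BLEU implementation; exact numbers depend on tokenization and smoothing choices, so treat the setup as illustrative.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

hyp = ("on tuesday several missiles and mortar shells fell in southern "
       "israel , but there were no casualties .").split()
refs = [r.lower().split() for r in [
    "Several Qassam rockets and mortar shells were fired on southern Israel today Tuesday without victims .",
    "Several Qassam rockets and mortars hit southern Israel today without causing any casualties .",
    "A number of Quassam rockets and Howitzer missiles fell over southern Israel today , Tuesday , without causing any casualties .",
    "Several Qassam rockets and mortar shells fell today , Tuesday on southern Israel without causing any victim .",
    "Several Qassam rockets and mortar shells fell today , Tuesday , in southern Israel without causing any casualties .",
]]

# Smoothing avoids a hard zero when no 4-gram matches any reference.
smooth = SmoothingFunction().method1
print(sentence_bleu(refs, hyp, smoothing_function=smooth))
```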

  5. Underrating good translations • From our Malagasy-English system: • Low BLEU score: 0.0149826 • HYP: many held for many months but have no right to a lawyer . • REF: many got arrested for months without any right to have access to any lawyer . • High BLEU score: 0.510864 • HYP: in a long post , called for freedom for other members of the committee for zon'oombelona i koohyargoodarzi . • REF: koohyargoodarzi in a long post asked for freedom for other members of the committee of human rights .
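One symptom the high-BLEU hypothesis shows is untranslated source-language material (compare its "zon'oombelona" with the reference's "human rights"). A crude, illustrative check flags output tokens outside a target-language vocabulary; the vocabulary below is a stand-in, and a real check would also whitelist proper names like "koohyargoodarzi".

```python
# Stand-in English vocabulary; a real check would use a large wordlist
# plus a whitelist of proper names.
english_vocab = {"in", "a", "long", "post", ",", "called", "for", "freedom",
                 "other", "members", "of", "the", "committee", "i", "."}

hyp = ("in a long post , called for freedom for other members of the "
       "committee for zon'oombelona i koohyargoodarzi .").split()

# Tokens not in the vocabulary: untranslated words and unlisted names.
unknown = [tok for tok in hyp if tok not in english_vocab]
print(unknown)  # -> ["zon'oombelona", 'koohyargoodarzi']
```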

  6. Overrating bad translations: lack of focus on function words • Google Translate, October 31, 2012 • Chinese to English: lost tense and missed preposition • I saw the person you talked to • 我看到了你交談的人 • I see the person you are talking • English to Japanese: trouble with the negative determiner “no” • No students bought books. • いかなる生徒は本を買った。 • Any student bought a book.
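A linguistically targeted suite can test exactly this failure mode. Below is a minimal sketch that assumes annotation tells us the source is negated and checks the (back-)translated English output for a negation cue; the cue list is illustrative, not exhaustive.

```python
# Illustrative negation cues; not an exhaustive list.
NEGATION_CUES = {"no", "not", "n't", "none", "never", "nothing"}

def preserves_negation(source_is_negated, hypothesis):
    """A negated source should yield an output containing some negation cue."""
    if not source_is_negated:
        return True
    return any(tok in NEGATION_CUES for tok in hypothesis.lower().split())

# "No students bought books." came back as "Any student bought a book.":
print(preserves_negation(True, "Any student bought a book ."))  # -> False
```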

  7. Not identifying the source of errors • In mature MT systems, many systematic errors occur in well-understood linguistic constructions: • Relative clauses (non-subject gaps) • Google Translate, October 31, 2012 • I saw the person you gave a book to. • 私はあなたに本をくれた人を見た。 • I saw a man who gave me the book for you.
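One way a targeted suite can localize such errors is to pair each construction with probe sentences and catalogued bad outputs that signal the error pattern. A minimal sketch; the structure and tag name are hypothetical.

```python
# An illustrative targeted test case for relative clauses with
# non-subject gaps; the structure and tag name are hypothetical.
test_case = {
    "construction": "relative_clause_nonsubject_gap",
    "source": "I saw the person you gave a book to.",
    # Back-translations showing the gap's grammatical role was mis-assigned:
    "known_bad": ["I saw a man who gave me the book for you."],
}

def flags_known_error(case, back_translation):
    """True if the output matches a catalogued error pattern for this case."""
    return back_translation.strip() in case["known_bad"]

print(flags_known_error(test_case, "I saw a man who gave me the book for you."))
```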

  8. Linguistic Evaluation • Evaluation based on syntactic or semantic roles is not reliable in the early stages of development, when the output cannot be parsed well. • Och et al. 2003; Giménez and Màrquez 2010

  9. Early Stage Linguistic Core Evaluation • How well are we translating specific constructions? • Preliminary list of constructions of interest in Kinyarwanda and Malagasy: • Relative clauses • Passives and other non-active voices • Clefts and focus constructions • Conditional sentences • Comparatives • VOS word order • Causatives • Applicatives
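Given such a tag inventory, a first sanity check is how many test sentences cover each construction. A minimal sketch over a toy annotated corpus; IDs and tags are invented.

```python
from collections import Counter

# Toy annotated corpus; IDs and tags are invented for illustration.
corpus = [
    {"id": "kin-0001", "tags": ["relative_clause", "applicative"]},
    {"id": "mlg-0002", "tags": ["VOS", "passive", "relative_clause"]},
    {"id": "mlg-0003", "tags": ["cleft", "conditional"]},
]

# Tally how many test sentences exercise each construction.
coverage = Counter(tag for item in corpus for tag in item["tags"])
for tag, count in coverage.most_common():
    print(f"{tag}: {count} test sentences")
```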

  10. Early stage linguistically targeted evaluation • Automatic measures of lexical similarity • Which constructions correlate with low scores? • Error analysis conducted by human system developers
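A sketch of the first step: group sentence-level scores by construction tag and report the mean, so constructions that correlate with low scores surface first. The scores and tags below are invented.

```python
from collections import defaultdict
from statistics import mean

# Invented sentence-level scores, tagged with the constructions each
# sentence exhibits.
scored = [
    {"tags": ["relative_clause"], "bleu": 0.08},
    {"tags": ["VOS", "passive"], "bleu": 0.12},
    {"tags": ["conditional"], "bleu": 0.31},
    {"tags": ["relative_clause", "cleft"], "bleu": 0.05},
]

by_tag = defaultdict(list)
for item in scored:
    for tag in item["tags"]:
        by_tag[tag].append(item["bleu"])

# Lowest-scoring constructions first: candidates for human error analysis.
for tag, scores in sorted(by_tag.items(), key=lambda kv: mean(kv[1])):
    print(f"{tag}: mean BLEU {mean(scores):.2f} over {len(scores)} sentences")
```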

  11. Plans • More constructions • Possessives, rates, questions, tense, mood, aspect, etc. • Evaluation metrics based on linguistic structure • Such as lexical similarity of syntactic and semantic functions
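As a sketch of what "lexical similarity of syntactic and semantic functions" could look like, compare head words per grammatical relation rather than over the raw string. The (relation, head word) pairs would come from a parser in practice; they are hand-written here for illustration.

```python
def role_overlap(hyp_roles, ref_roles):
    """Fraction of reference (relation, head word) pairs the hypothesis recovers."""
    matched = sum(1 for pair in ref_roles if pair in hyp_roles)
    return matched / max(1, len(ref_roles))

# Hand-written stand-ins for parser output on a reference and a hypothesis.
ref_roles = {("subj", "rockets"), ("verb", "fell"), ("loc", "israel")}
hyp_roles = {("subj", "missiles"), ("verb", "fell"), ("loc", "israel")}

print(round(role_overlap(hyp_roles, ref_roles), 2))  # -> 0.67 (2 of 3 roles)
```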
