Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy


Presentation Transcript


  1. May 26th 2014. Continuous Operational Evaluation of Evolving Proprietary MT Solution’s Adequacy. Ekaterina Stambolieva, ekaterina.stambolieva@euroscript.lu

  2. Outline • Why? • MT Adequacy? • What? • Evaluation • Findings • Conclusion & Future Work

  3. WHY? • impending industry problem: MTE, May 26th 2014

  4. WHY? • impending industry problem: How do we compare MT systems over time?

  5. WHY? • impending industry problem: How do we compare MT systems over time? • We measure MT quality continuously

  6. WHY? • impending industry problem: How do we compare MT systems over time? • We measure MT quality continuously • BLEU?

  7. WHY? • impending industry problem: How do we compare MT systems over time? • We measure MT quality continuously • BLEU? • We want adequate translations

  8. Outline • Why? • MT Adequacy? • What? • Evaluation • Findings • Conclusion & Future Work

  9. ADEQUACY How do we define MT adequacy in business?

  10. ADEQUACY How do we define MT adequacy in business? • accelerate time-to-delivery • reduce translation costs • achieve near-native fluency

  11. ADEQUACY adequacy

  12. ADEQUACY adequacy: improving MT output’s acceptance for the task of post-editing

  13. WHAT • We aim to evaluate our MT systems continuously and compare results over time

  14. WHAT • We aim to evaluate our MT systems continuously and compare results over time • We design our system’s improvements based on human end-user feedback

  15. WHAT • We aim to evaluate our MT systems continuously and compare results over time • We design our system’s improvements based on human end-user feedback • We do not directly evaluate translation quality; instead, we assess MT output improvement over time

  16. WHAT • We aim to evaluate our MT systems continuously and compare results over time • We design our system’s improvements based on human end-user feedback • We do not directly evaluate translation quality; instead, we assess MT output improvement over time • No annotation effort required

  17. Outline • Why? • MT Adequacy? • What? • Evaluation (Edit Distance) • Findings • Conclusion & Future Work

  18. THE EXAMPLE • We compare the results of 2 English<->Danish MT systems

  19. THE EXAMPLE • We compare the results of 2 English<->Danish MT systems • BLEU, system 1: EN->DA 59.22; DA->EN 64.26

  20. THE EXAMPLE • We compare the results of 2 English<->Danish MT systems • BLEU, system 1: EN->DA 59.22; DA->EN 64.26 • BLEU, system 2: EN->DA 58.84; DA->EN 63.98

  21. CATEGORIES • 3 objective categories to evaluate MT output: • Does the MT output look better than before? • Does the MT output look worse than before? • Is it difficult for you to judge whether the MT output is better or not?

  22. EVALUATION • We will present MT output evaluation based on the Edit Distance (ED) score

  23. EVALUATION • We will present MT output evaluation based on the Edit Distance (ED) score • We compute how many edits it takes to transform the MT output into the human translation of the same source segment
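
As an illustration of the ED score described on slide 23, here is a minimal Python sketch of a Levenshtein edit distance. It is not the presenter's implementation: the slides do not say whether edits are counted over words or characters, so whitespace tokenization is an assumption, and the function name `edit_distance` and the example segments are invented for this sketch.

```python
from typing import Sequence


def edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn `hyp` into `ref` (Levenshtein distance)."""
    # prev[j] = distance between the items of hyp seen so far and ref[:j]
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # turning hyp[:i] into an empty reference takes i deletions
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,           # delete h
                            curr[j - 1] + 1,       # insert r
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


# Invented example: output of an older and a newer system for the same
# source segment, compared against the same human translation.
# Whitespace tokenization is an assumption of this sketch.
human = "press the button to start the machine".split()
mt_old = "press the button for start".split()
mt_new = "press the button to start".split()

ed_old = edit_distance(mt_old, human)
ed_new = edit_distance(mt_new, human)
print(ed_old, ed_new)
```

A lower ED for the newer system's output against the same human translation suggests less post-editing effort, which is the kind of over-time comparison the talk describes.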

  24. Outline • Why? • MT Adequacy? • What? • Evaluation • Findings • Conclusion & Future Work

  25. FINDINGS

  26. FINDINGS

  27. FINDINGS Improved MT acceptance for the task of post-editing

  28. FINDINGS The length variance of MT output, compared between the new and old systems, does not affect MT acceptance

  29. Outline • Why? • MT Adequacy? • What? • Evaluation • Findings • Conclusion & Future Work

  30. FUTURE WORK • Modify ED to take the number of UNK words into consideration • Modify the metric so that it detects small improvements in the system, such as number isolation and tag protection • Take segment character length into consideration, so as not to penalize shorter segments too much
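
To make the segment-length concern on slide 30 concrete, the following Python sketch shows how a length-normalized, character-level ED weighs the same single edit much more heavily in a short segment. The slides do not say how the ED score is normalized, so dividing by the longer segment's character length is an assumption, and `normalized_ed` and the example strings are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_ed(hyp: str, ref: str) -> float:
    """ED divided by the longer segment's character length.
    This normalization is an assumption, not taken from the slides."""
    longest = max(len(hyp), len(ref))
    return edit_distance(hyp, ref) / longest if longest else 0.0


# One missing character in a short and in a long invented segment.
short_score = normalized_ed("Stop", "Stopp")
long_score = normalized_ed("Press the button to start",
                           "Press the button to startt")
print(short_score, long_score)
```

Both pairs differ by exactly one character, yet the short segment receives a much higher normalized score, which is one plausible reading of why the talk proposes taking character length into account.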


  33. Thank you