
Post-editing, linguistic expertise and QA for MT at eBay

Lucie Le Naour and Costantis Galatis, eBay Machine Translation Linguistic Team, June 2015. Outline: 1. Introduction to MT at eBay and challenges; 2. Post-editing best practices; 3. QA processes; 4. Specifics of MT corpora for EN/DE <> FR/DE; 5. Q&A.





Presentation Transcript


  1. Post-editing, linguistic expertise and QA for MT at eBay. Lucie Le Naour and Costantis Galatis, eBay Machine Translation Linguistic Team, June 2015

  2. Outline: 1. Introduction to MT at eBay and challenges; 2. Post-editing best practices; 3. QA processes; 4. Specifics of MT corpora for EN/DE <> FR/DE; 5. Q&A

  3. Introduction to MT at eBay. MT at eBay is mission-critical for promoting cross-border trade, connecting buyers and sellers worldwide and increasing revenue. • 152M+ active users • 800M+ total listings (80% new) • Enabled Commerce Volume (ECV) in Q3 2014 was $63 billion • Cross-border trade grew 27%, representing $14 billion of ECV • Why MT? - User expectation: >80% of users in FR and >89% in IT would recommend translating all item titles on eBay - Volume in Q3 2014: >72M queries and >30M titles per day. eBay MT


  5. Users expect a localized page. Cross-border buyers want to see the whole website in their own language, and poor-quality translations lead to disappointed users. At worst, poor translations cause “drop-offs”, meaning users leave for competitors that offer services with native linguistic standards. At best, the user still completes the transaction, but their opinion of eBay may be negatively affected.


  7. Introduction to MT at eBay. MT at eBay in numbers: • Word count for all languages: 5,186,400 in 2014; 1,270,000 in Q1 2015 • Word count for FR<>DE: 313,000 in 2014; 200,000 in Q1 2015. User feedback related to quality: • >88% of FR and >94% of IT buyers rate the translation quality as medium or higher • >85% agreed or were neutral that the presence of international products improved their shopping experience • >91% agreed or were neutral that they could find what they were looking for • >83% agreed or were neutral that they were not opposed to buying products from outside their country

  8. Introduction to MT at eBay. What is translated by MT: statistical MT engines are currently used at eBay to translate user-generated content (UGC), namely search queries (QT), item descriptions (ITD) and item titles (ITT). Example: search queries

  9. Item titles

  10. Item descriptions

  11. Challenges: user-generated content. Linguistic problems: user-generated content is often linguistically and grammatically imperfect, and therefore harder to translate correctly. Lack of context is an additional issue.

  12. Challenges: non-normalized content. Colloquial language in UGC vs. formal language in the corpora, e.g. spelling/grammar ("their" vs. "there") and terminology ("phone case" vs. "mobile enclosure").

  13. Challenges: corpus- and category-specific language. There are more than 4,000 categories on eBay.com, which means our MT engine has to learn the terminology of just as many domains.

  14. Post-editing best practices. Differences between general and eBay-specific post-editing: many companies use post-editing to produce usable translations faster. The content we post-edit is used to train the MT engine; in other words, our texts are written for a machine, not a human reader. Training cycle (diagram): source, MT translation, post-processing, vendor post-editing, MTLS review.

  15. Experimentation and evaluation cycle (diagram): train, tune, test, evaluate (eval1, eval2, …), production, then track, measure and feed back. Human judgment and feedback catch obvious errors; internal hill-climbing relies on automatic measures.

  16. Test and evaluation data: human translations or post-editing, depending on the content type (e.g. language ID, segmentation). Selection criteria: avoid repetition; cover a range of categories, products and styles; favor content with high visibility or engagement (e.g. impressions, content relevancy).
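
A minimal sketch of the selection criteria on slide 16 (avoid repetition, cover a range of categories), assuming each candidate segment arrives as a hypothetical (category, text) pair. The normalization used for duplicate detection and the round-robin strategy are illustrative assumptions, not eBay's documented procedure, and visibility/engagement weighting is left out.

```python
from collections import defaultdict
from itertools import zip_longest


def select_test_set(candidates, size):
    """Pick up to `size` unique segments, spreading picks across categories.

    candidates: iterable of (category, text) pairs (hypothetical input format).
    """
    if size <= 0:
        return []
    seen = set()
    by_category = defaultdict(list)
    for category, text in candidates:
        key = " ".join(text.lower().split())  # case- and whitespace-insensitive duplicate check
        if key in seen:
            continue  # avoid repetition
        seen.add(key)
        by_category[category].append(text)

    selected = []
    # Round-robin over categories so that no single category dominates the test set.
    for row in zip_longest(*by_category.values()):
        for text in row:
            if text is not None:
                selected.append(text)
                if len(selected) == size:
                    return selected
    return selected


candidates = [
    ("phones", "iPhone 6 case"),
    ("phones", "iphone 6  case"),  # duplicate after normalization
    ("phones", "Galaxy S5 cover"),
    ("stamps", "Bund 1949 **"),
    ("watches", "Rolex Submariner"),
]
print(select_test_set(candidates, 3))  # ['iPhone 6 case', 'Bund 1949 **', 'Rolex Submariner']
```

Round-robin favors breadth: one segment per category is taken before any category contributes a second one.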

  17. Post-editing best practices. Guidelines for vendors: guidelines and training are essential components of our post-editing (PE) process. The current guidelines introduce our project to the vendors; they differ for each type of content and give clear post-editing instructions. Statistical QA approach: we measure quality statistically, applying quality thresholds during corpus review and Named Entity Recognition (NER).
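
Slide 17 describes a statistical QA approach with quality thresholds but gives no numbers. The sketch below shows one common shape of such a check, sample-based acceptance: review a random sample of a delivered batch and accept the batch if its error rate is within a threshold. The 10% sample rate and 5% error threshold are placeholder assumptions.

```python
import random


def sample_for_review(segments, sample_rate=0.1, seed=0):
    """Draw a reproducible random sample of post-edited segments for linguist review."""
    k = min(len(segments), max(1, round(len(segments) * sample_rate)))
    return random.Random(seed).sample(segments, k)


def batch_passes(review_flags, max_error_rate=0.05):
    """Accept a batch if the share of sampled segments with errors is within the threshold.

    review_flags: one boolean per reviewed segment, True = the reviewer found an error.
    """
    if not review_flags:
        raise ValueError("empty review sample")
    error_rate = sum(review_flags) / len(review_flags)
    return error_rate <= max_error_rate


batch = [f"segment {i}" for i in range(200)]
sample = sample_for_review(batch)       # 20 segments go to review
print(batch_passes([False] * 20))       # no errors found: batch accepted
```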

  18. Evaluation template

  19. QA: processes and linguistic expertise. Prior post-processing: translation data is post-processed with regular expressions (regex) before vendor post-editing.
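
The regex post-processing step on slide 19 can be pictured as an ordered list of pattern/replacement rules applied to each MT output segment. The three rules below are hypothetical examples chosen for illustration; eBay's actual rule list is not given in the slides.

```python
import re

# Hypothetical rule list: (compiled pattern, replacement), applied in order.
POST_PROCESSING_RULES = [
    (re.compile(r"\s+"), " "),                         # collapse runs of whitespace
    (re.compile(r" ([,.])"), r"\1"),                   # drop the space before a comma or period
    (re.compile(r"(\d)\s*[xX×]\s*(?=\d)"), r"\1 x "),  # dimensions: 10x20 -> 10 x 20
]


def post_process(segment, rules=POST_PROCESSING_RULES):
    """Apply each regex rule in turn to an MT output segment."""
    for pattern, replacement in rules:
        segment = pattern.sub(replacement, segment)
    return segment.strip()


print(post_process("Tasche  10x20 cm , neu"))  # Tasche 10 x 20 cm, neu
```

The lookahead in the dimension rule leaves the second number unconsumed, so chained dimensions such as `1x2x3` are normalized in a single pass.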

  20. QA: processes and linguistic expertise. Requirements: • Blacklist: unusable expressions • Acronym lists: compulsory and non-compulsory • Do-not-translate (DNT) lists: brands/products • Regex lists: for post-processing • Spellchecker rule list
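
Two of the lists on slide 20 lend themselves to automatic checks on post-edited output. This sketch assumes a blacklist is checked as whole-word, case-insensitive matches, and that a DNT term present in the source must appear unchanged in the target. The function name and the example blacklist entry "pas cher" are hypothetical.

```python
import re


def check_segment(source, target, blacklist, dnt_terms):
    """Return QA issues for one post-edited segment (illustrative checks only).

    blacklist: expressions that must not appear in the target (whole words, any case)
    dnt_terms: brand/product names that must be copied unchanged from source to target
    """
    issues = []
    for expr in blacklist:
        if re.search(r"\b" + re.escape(expr) + r"\b", target, re.IGNORECASE):
            issues.append(f"blacklisted expression: {expr}")
    for term in dnt_terms:
        # Case-sensitive: "Iphone" in the target does not satisfy the DNT term "iPhone".
        if term in source and term not in target:
            issues.append(f"DNT term missing or altered: {term}")
    return issues


print(check_segment("Apple iPhone 6 case", "Coque Apple Iphone 6", ["pas cher"], ["Apple", "iPhone"]))
```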

  21. QA: processes and linguistic expertise. Named entity handling: • named entities represent at least 50% of the words in item titles • examples of named entities: brand and product name, color, material, dimension, operating system, type, style, size, etc.
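
The "at least 50% of the words" figure on slide 21 is a token share. The sketch below measures it with a tiny hypothetical gazetteer, a simplification chosen for brevity: a production NER component would handle multi-word entities and context, which single-token lookup does not.

```python
# Hypothetical gazetteer: lowercase token -> entity type (types taken from the slide's list).
GAZETTEER = {
    "samsung": "brand",
    "galaxy": "product name",
    "s5": "product name",
    "schwarz": "color",
    "leder": "material",
}


def tag_entities(title, gazetteer=GAZETTEER):
    """Return (token, entity type or None) for each whitespace-separated token."""
    return [(tok, gazetteer.get(tok.lower())) for tok in title.split()]


def entity_share(title, gazetteer=GAZETTEER):
    """Fraction of title tokens that are named entities."""
    tags = tag_entities(title, gazetteer)
    if not tags:
        return 0.0
    return sum(etype is not None for _, etype in tags) / len(tags)


print(entity_share("Samsung Galaxy S5 Hülle schwarz Leder"))  # 5 of 6 tokens
```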

  22. QA: processes and linguistic expertise. Human evaluation of raw MT output and error analysis. Search queries: translation rated acceptable/not acceptable. Item titles and item descriptions:

  23. QA: processes and linguistic expertise. Query evaluation: search results are rated relevant/not relevant, and queries returning null (zero) results are located.
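
Both checks on slide 23 can be summarized from the same judgment data. This sketch assumes a hypothetical input format: each translated query maps to one relevant/not-relevant flag per returned result, with an empty list marking a null-result query.

```python
def evaluate_queries(judgments):
    """Summarize query evaluation.

    judgments: dict mapping each translated query to a list of booleans, one per
    returned search result (True = relevant). An empty list means a null result.
    Returns (share of relevant results, sorted list of null-result queries).
    """
    all_results = [rel for results in judgments.values() for rel in results]
    relevance = sum(all_results) / len(all_results) if all_results else 0.0
    null_queries = sorted(q for q, results in judgments.items() if not results)
    return relevance, null_queries


print(evaluate_queries({
    "montre homme": [True, True, False, True],
    "Handyhülle": [True],
    "timbre ** 1949": [],  # translated query found nothing
}))
```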

  24. Specifics of MT corpora for EN/DE <> FR/DE: edit distance issues
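
Slide 24 does not say which edit-distance measure is used. As a reference point, the sketch below computes a word-level Levenshtein distance between raw MT output and its post-edited version, normalized by the post-edited length (similar to TER, but without TER's shift operation). One known limitation is illustrated in the usage line: moving a single word counts as a deletion plus an insertion, which may or may not be the issue the slide refers to.

```python
def word_edit_distance(hyp, ref):
    """Minimum number of word insertions, deletions and substitutions turning hyp into ref."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))  # distances from the empty prefix of hyp
    for i, hw in enumerate(h, start=1):
        curr = [i] + [0] * len(r)
        for j, rw in enumerate(r, start=1):
            curr[j] = min(prev[j] + 1,               # delete hw
                          curr[j - 1] + 1,           # insert rw
                          prev[j - 1] + (hw != rw))  # substitute (free if the words match)
        prev = curr
    return prev[-1]


def normalized_edit_distance(mt, post_edited):
    """Edit distance divided by the post-edited length (TER-like, but without shifts)."""
    n = len(post_edited.split())
    return word_edit_distance(mt, post_edited) / n if n else 0.0


# Moving "schwarz" to the front costs 2 edits (delete + insert), i.e. 0.4 of 5 words.
print(normalized_edit_distance("Hülle für iPhone 6 schwarz", "Schwarze Hülle für iPhone 6"))
```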

  25. Specifics of MT corpora for EN/DE <> FR/DE: compound issues in German (compound splitter). Unsplit, Aktionsplan = "actionplan"; Aktion + s + Plan = "action plan" (correct split); Akt + ion + s + Plan = "act ion plan" (over-split).
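
The over-splitting on slide 25 appears as soon as the splitter's vocabulary contains short fragments such as "akt" and "ion". This sketch enumerates every segmentation into vocabulary words, allowing a German linking "s" (Fugen-s) between parts, and then prefers the segmentation with the fewest items. The vocabulary and the fewest-parts heuristic are assumptions for illustration; practical splitters usually also weight candidates by corpus frequency, and this is not necessarily the splitter eBay used.

```python
from functools import lru_cache


def compound_splits(word, vocab, linkers=("s",)):
    """Enumerate every segmentation of `word` into vocabulary parts.

    A linking element (German Fugenelement, e.g. the "s" in Aktion-s-plan) may
    follow any part except the last; linkers are kept as separate items.
    """
    word = word.lower()

    @lru_cache(maxsize=None)
    def split(start):
        if start == len(word):
            return [[]]
        results = []
        for end in range(start + 1, len(word) + 1):
            part = word[start:end]
            if part not in vocab:
                continue
            for rest in split(end):
                results.append([part] + rest)
            for link in linkers:
                after = end + len(link)
                if word.startswith(link, end) and after < len(word):
                    for rest in split(after):
                        results.append([part, link] + rest)
        return results

    return split(0)


def best_split(word, vocab, linkers=("s",)):
    """Prefer the segmentation with the fewest items; keep the word whole if none exists."""
    splits = compound_splits(word, vocab, linkers)
    return min(splits, key=len) if splits else [word.lower()]


vocab = {"aktion", "akt", "ion", "plan"}
print(compound_splits("Aktionsplan", vocab))  # includes the over-split akt + ion + s + plan
print(best_split("Aktionsplan", vocab))       # ['aktion', 's', 'plan']
```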

  26. Specifics of MT corpora for EN/DE <> FR/DE: special characters related to language. Inch signs (") converted to quotation marks.
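
One way to avoid the inch-sign problem on slide 26 is to protect inch signs before any step that localizes quotation marks, then restore them afterwards. This sketch assumes that a straight double quote directly after a digit is an inch sign; the placeholder token and the naive German quote converter are hypothetical helpers for the demonstration.

```python
import re

# Assumption: a straight double quote directly after a digit is an inch sign (15" laptop).
INCH_SIGN = re.compile(r'(?<=\d)"')
PLACEHOLDER = "__INCH__"  # hypothetical token that later steps leave untouched


def protect_inches(text):
    return INCH_SIGN.sub(PLACEHOLDER, text)


def restore_inches(text):
    return text.replace(PLACEHOLDER, '"')


def localize_quotes_de(text):
    """Naive German quote conversion: each pair of straight quotes becomes „…“."""
    return re.sub(r'"([^"]*)"', lambda m: "\u201e" + m.group(1) + "\u201c", text)


title = 'MacBook Pro 15" "Retina" Display'
print(localize_quotes_de(title))                                  # inch sign treated as a quote
print(restore_inches(localize_quotes_de(protect_inches(title))))  # inch sign kept
```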

  27. Specifics of MT corpora for EN/DE <> FR/DE: accents in French

  28. Specifics of MT corpora for EN/DE <> FR/DE: umlauts and ß in German
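
Slides 27 and 28 concern missing French accents and German umlauts/ß. A QA step can flag tokens that match a known word only after diacritics are removed. This sketch indexes a hypothetical per-language lexicon under two spellings, with accents stripped (Größe as "grosse") and with German umlaut transliteration (Größe as "groesse"), so both common user spellings are caught.

```python
import unicodedata


def strip_accents(word):
    """Lowercase, replace ß with ss, and drop combining marks (é -> e, ü -> u)."""
    word = unicodedata.normalize("NFC", word.lower()).replace("ß", "ss")
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def transliterate_umlauts(word):
    """German fallback spelling: ä -> ae, ö -> oe, ü -> ue, then strip other accents."""
    word = unicodedata.normalize("NFC", word.lower())
    for umlaut, spelled in (("ä", "ae"), ("ö", "oe"), ("ü", "ue")):
        word = word.replace(umlaut, spelled)
    return strip_accents(word)


def suggest_diacritics(tokens, lexicon):
    """Map tokens that differ from a lexicon word only by missing accents, umlauts
    or ß to the lexicon spelling. `lexicon` is a hypothetical per-language word list."""
    known = {w.lower() for w in lexicon}
    index = {}
    for w in lexicon:
        index.setdefault(strip_accents(w), w)          # Größe -> grosse
        index.setdefault(transliterate_umlauts(w), w)  # Größe -> groesse
    suggestions = {}
    for tok in tokens:
        if tok.lower() in known:
            continue
        candidate = index.get(strip_accents(tok))
        if candidate is not None:
            suggestions[tok] = candidate
    return suggestions


print(suggest_diacritics(["Groesse", "Grosse", "ecran", "Hülle", "neu"],
                         ["Größe", "écran", "Hülle"]))
# {'Groesse': 'Größe', 'Grosse': 'Größe', 'ecran': 'écran'}
```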

  29. Specifics of MT corpora for EN/DE <> FR/DE: special characters related to categories. In philately, "*" indicates the condition of a stamp.

  30. Specifics of MT corpora for EN/DE <> FR/DE: acronyms (compulsory, only one meaning)

  31. Specifics of MT corpora for EN/DE <> FR/DE: acronyms (several meanings possible). RAF = UK: Royal Air Force; DE: Rote Armee Fraktion (Red Army Faction). Dt. = DE: German (deutsch); EN: delta t (time interval).
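
Slides 30 and 31 distinguish acronyms with a single meaning from acronyms whose meaning depends on where they occur. The sketch below resolves an acronym's sense by source site and flags it for a linguist when no sense is known for that site. The list contents for RAF and Dt. come from slide 31; the compulsory examples (USB, HDMI, LED), the site keys and the function name are hypothetical.

```python
# Hypothetical acronym lists. Compulsory acronyms have a single meaning on every site;
# ambiguous ones are resolved by the source site, or flagged for a linguist.
COMPULSORY = {"USB", "HDMI", "LED"}
AMBIGUOUS = {
    "RAF": {"UK": "Royal Air Force", "DE": "Rote Armee Fraktion (Red Army Faction)"},
    "Dt.": {"DE": "German (deutsch)", "EN": "delta t (time interval)"},
}


def acronym_sense(token, site):
    """Return (meaning or None, needs_review) for a token found on the given site."""
    if token in COMPULSORY:
        return token, False  # one meaning everywhere: keep as-is
    senses = AMBIGUOUS.get(token)
    if senses is None:
        return None, False   # not on any acronym list
    if site in senses:
        return senses[site], False
    return None, True        # ambiguous, and no sense is known for this site


print(acronym_sense("RAF", "DE"))
print(acronym_sense("RAF", "FR"))  # flagged for review
```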

  32. Specifics of MT corpora for EN/DE <> FR/DE: localization issues. Artistic titles in queries are localized to avoid mistranslations.

  33. Thank you!
