Post-Editing of MT Output in a Production Setting: Experiences at the Pan American Health Organization
Julia Aymerich & Hermes Camelo

Presentation Transcript


  1. Post-Editing of MT Output in a Production Setting: Experiences at the Pan American Health Organization
     Julia Aymerich & Hermes Camelo
     Automated Post-Editing Workshop, Cambridge, MA – 12 August 2006

  2. MT Post-Editing
     • “Correction of machine translation output by human linguists/editors” (Dorothy Senez, 1998)
     • Can it be automated?

  3. Automated Post-Editing
     • Detection of intelligibility errors
     • Categorization of intelligibility errors
     • Automated correction of intelligibility errors
     • Techniques for automated post-editing
     • Techniques for evaluating post-edited improvement, including human acceptance ratings of automatically post-edited output

  4. PAHO Translation Services
     • Provide translation services into English, Spanish, Portuguese, and French for HQ units
     • 8 staff members:
        • 1 chief
        • 1 Spanish translator
        • 1 English translator
        • 2 computational linguists
        • 3 office assistants
     • Roster of over 100 freelance translators
     • MT licensed to over 180 sites throughout the world
     • MT developed and used in-house for over 25 years

  5. PAHOMTS® History
     • 1976: First contract (SPANAM)
     • 1979: In-house development began
     • 1980: SPANAM on mainframe; post-editing macros for the Wang word processor; feedback provided in writing on side-by-side printout
     • 1985: ATN parser created; ENGSPAN on mainframe

  6. PAHOMTS® History
     • 1991: Post-editing macros for WordPerfect
     • 1992: From mainframe to PC on the PAHO LAN
     • 1996: Post-editing macros for Microsoft Word
     • 2000: 32-bit Windows version
     • 2003: Portuguese-English; English-Portuguese
     • 2004: Portuguese-Spanish; Spanish-Portuguese; feedback provided electronically on side-by-side file

  7. PAHOMTS® History
     • 2004: Aligned corpus started
     • 2005: Post-editing macros for PowerPoint; editing macros from WHO
     • 2006: Synchronized post-editing; enhanced corpus alignment
     • 2007: Automatic feedback from tri-column side-by-side

  8. Workflow
     • Translation request from HQ unit → Job assignment → Processable with MT? TM available?
        • NO → Human translation
        • YES → MT/TM translation → MT processing → Post-editing → Feedback/Synchronization → In-house revision → Delivery
     • Other components in the workflow diagram: MT enhancements, bilingual corpus, Translation Tracking System

  9. Processing: Is MT Appropriate?
     • Appropriate in 90% of cases:
        • manuals, reports, proposals, abstracts, scientific articles, position papers, PowerPoint presentations
     • Not appropriate if:
        • Target language is French
        • Source document cannot be converted (e.g., PDF file with graphics only)
        • Hard copy only, and quality not good enough for OCR
        • Text is too idiomatic: personal correspondence, dialogue, scripts, poetry, personal chat
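The "not appropriate if" conditions on slide 9 amount to a short decision procedure. The sketch below is a hypothetical Python rendering, not part of PAHOMTS®; the `Document` fields and the genre labels are assumptions chosen to mirror the slide.

```python
from dataclasses import dataclass

# Genres the slide lists as too idiomatic for MT (labels are illustrative).
IDIOMATIC_GENRES = {"personal correspondence", "dialogue", "script", "poetry", "personal chat"}


@dataclass
class Document:
    """Hypothetical description of an incoming translation request."""
    target_language: str  # ISO 639-1 code, e.g. "en", "es", "pt", "fr"
    convertible: bool     # False for, e.g., a PDF file with graphics only
    hard_copy_only: bool
    ocr_quality_ok: bool  # only consulted for hard-copy documents
    genre: str


def is_mt_appropriate(doc: Document) -> bool:
    """Apply the slide's 'not appropriate if' conditions in order."""
    if doc.target_language == "fr":
        return False  # the slide excludes French as a target language
    if not doc.convertible:
        return False  # nothing machine-readable to translate
    if doc.hard_copy_only and not doc.ocr_quality_ok:
        return False  # scan quality too poor for OCR
    if doc.genre in IDIOMATIC_GENRES:
        return False  # text too idiomatic for useful MT output
    return True
```

For example, `is_mt_appropriate(Document("es", True, False, True, "report"))` returns `True`, while the same request with a `"poetry"` genre returns `False`.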

  10. MT Processing: Checklist
     • Perform a spelling check:
        • A macro spellchecks the document using the PAHOMTS® dictionaries. If too many words are not in the dictionaries, they are added before the job is run.
        • For large documents, before running the job, we extract terminology from the bilingual corpus and feed it into the dictionaries.
     • Check the language code
     • Incorrect hard returns
     • Punctuation in lists
     • Text boxes, embedded objects, and drawings
     • No pre-editing!
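The dictionary-based spelling check boils down to listing the words the MT dictionaries do not know. A minimal sketch, assuming the dictionary is available as a set of lowercase word forms; the function name is hypothetical, and real PAHOMTS® lookup would also handle inflected forms, which this exact-match version does not.

```python
import re

# Runs of letters, including accented letters such as "ó" or "ç";
# digits and underscores are excluded.
WORD_RE = re.compile(r"[^\W\d_]+")


def not_found_words(text: str, dictionary: set[str]) -> list[str]:
    """Return words from `text` that are missing from `dictionary`.

    Matching is case-insensitive; each missing word is reported once,
    in the order it first appears.
    """
    seen: set[str] = set()
    missing: list[str] = []
    for word in WORD_RE.findall(text):
        key = word.lower()
        if key not in dictionary and key not in seen:
            seen.add(key)
            missing.append(word)
    return missing
```

The resulting list is what an assistant would review and, if it is long, hand to the dictionary coders before running the job.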

  11. MT Processing: Checklist
     • Sections blocked from translation:
        • bibliographic references
        • text in another language
        • lists of names/addresses
        • tables with numbers only
        • independent words (The SMILE program)
     • Check consistency: if the document mixes two different styles and vocabularies, divide it into two jobs or activate different grammars
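One of the blocked categories, tables with numbers only, can be detected mechanically. A minimal sketch, assuming a table is available as a list of cell strings; the function names are hypothetical, and the other categories (bibliographic references, names and addresses) would need their own heuristics.

```python
import re

# A segment counts as "numbers only" if it consists solely of digits, whitespace,
# and common numeric punctuation (decimal marks, percent signs, signs, brackets).
NUMERIC_ONLY_RE = re.compile(r"[\d\s.,:;%+\-/()]+")


def is_numbers_only(segment: str) -> bool:
    """Return True for non-empty segments with no alphabetic content."""
    stripped = segment.strip()
    return bool(stripped) and NUMERIC_ONLY_RE.fullmatch(stripped) is not None


def segments_to_translate(cells: list[str]) -> list[str]:
    """Drop numbers-only cells so that only text is sent to the MT engine."""
    return [cell for cell in cells if not is_numbers_only(cell)]
```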

  12. Machine Translation Processing
     • Rules for particular domains or styles are selected before the MT process begins
     • Type of grammar:
        • Abstract, letter, manual, report, resolution, survey, speech, post description, news article, summary record
        • Only one type of grammar can be selected
     • Specialized vocabulary:
        • Medical research, financial, environment, equipment, agriculture, patient education, legal, computer science, United Nations, radiation, pharmaceutical, statistics, European variety
        • Several microglossaries may be activated, in order of priority
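The slide says several microglossaries can be active at once, in order of priority. One plausible reading, sketched below under that assumption, is that the first active microglossary containing a term supplies its gloss and the general dictionary is the fallback. The data structures and entries are hypothetical; PAHOMTS® dictionary entries carry far more than a single string.

```python
from typing import Optional


def lookup(term: str,
           microglossaries: list[dict[str, str]],
           general: dict[str, str]) -> Optional[str]:
    """Return the gloss for `term`, trying microglossaries in priority order.

    `microglossaries` is ordered from highest to lowest priority; the general
    dictionary is consulted only when no active microglossary has the term.
    """
    for glossary in microglossaries:
        if term in glossary:
            return glossary[term]
    return general.get(term)


# Illustrative English-Spanish entries (hypothetical).
GENERAL = {"delivery": "entrega", "report": "informe"}
MEDICAL = {"delivery": "parto"}
```

With the medical microglossary active, `lookup("delivery", [MEDICAL], GENERAL)` yields "parto" (childbirth); without it, the general gloss "entrega" is used.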

  13. MT Processing (cont.)
     • Assistant verifies that MT did an acceptable job:
        • If there are too many not-found words, they are added to the dictionaries and the document is retranslated.
        • If the percentage of complete parses is too low (less than 60%), the document is rechecked for formatting and/or spelling errors.
        • Occasionally, very poorly written or formatted documents are returned to the requesting unit.
     • If the output is acceptable, post-edit it!
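The assistant's check can be sketched as a small decision function. The 60% complete-parse threshold comes from the slide; `max_not_found` is an assumed, site-specific limit, since the slide only says "too many". The `MTRunStats` type is hypothetical, and returning a document to the requesting unit remains a human judgment that the sketch does not model.

```python
from dataclasses import dataclass

MIN_COMPLETE_PARSE_RATE = 0.60  # threshold stated on the slide


@dataclass
class MTRunStats:
    """Hypothetical summary of one MT run over a document."""
    sentences: int
    complete_parses: int
    not_found_words: int


def next_step(stats: MTRunStats, max_not_found: int) -> str:
    """Decide what the assistant does after an MT run."""
    if stats.not_found_words > max_not_found:
        return "update dictionaries and retranslate"
    rate = stats.complete_parses / stats.sentences if stats.sentences else 0.0
    if rate < MIN_COMPLETE_PARSE_RATE:
        return "recheck formatting and spelling"
    return "post-edit"
```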

  14. Post-Editing
     • Get the big picture first, on screen
     • Freelancers don’t have access to PAHOMTS®, only to the output files.
     • Done directly on the RAW file or on the side-by-side file
     • Using the side-by-side file as a reference and to provide feedback
     • Making use of the MS Word / PowerPoint post-editing macros

  15. Post-Editing: Training Translators
     • Post-edit on screen, not on paper
     • Insist on referring to the original document
     • Use the side-by-side file and mark:
        • errors
        • not-found words
        • preferred translations
     • When a translation has been researched, mark it as highly reliable

  16. Post-Editing: Non-Translators
     • Teach them what to fix in the source text. Give them a list of trouble spots to look for:
        • text that should not be translated
        • acronyms
        • formatting problems
     • Insist on careful post-editing:
        • non-translators tend to trust the raw output too much and overlook errors
     • Give them examples of embarrassing MT errors left uncorrected
     • Insist on providing terminological feedback

  17. Feedback
     • Stylistic preferences should be entered only in the actual translation.
     • Post-editors work with dictionary coders
     • Provide feedback in the side-by-side file
     • Good feedback:
        • All new words
        • Official names of organizations (with reference)
        • Erroneous dictionary entries
        • Preferred alternate glosses
     • Wordfast: translators provide feedback via e-mail using the source and target segments.

  18. Seriousness of Errors
     • Easily fixed lexical errors
     • Some target constructions not generated correctly
     • Some source constructions not parsed correctly
     • Total lack of parsing

  19. Techniques for Automated Post-Editing
     • Post-editing macros
     • Complete on-screen editing
     • Linked source and target segments
     • Editing and style macros from WHO

  20. PAHOMTS® Editing Tools
     • Microsoft Word / PowerPoint macros:

  21. PAHOMTS® Editing Tools
     • Microsoft Word / PowerPoint macros:
        • Width adjustment
        • Search & Replace
        • Browse PAHOMTS® dictionary
        • Move word left
        • Move word right
        • Delete word
        • Lowercase
        • Uppercase
        • Document cleanup

  22. PAHOMTS® Editing Tools
     • Microsoft Word / PowerPoint macros:
        • Delete definite article
        • Change the next “its” found to “their”
        • Create possessive
        • Delete and switch
        • Create Noun-Noun compound
        • Undelete “of”
        • Serial comma
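The “Create Noun-Noun compound” macro presumably targets the literal “X of Y” renderings that Spanish “X de Y” produces, as in “control of malaria”. The sketch below assumes that reading and is not the PAHO macro itself: it rewrites the first single-word “X of Y” as “Y X”, leaving the post-editor to decide where to apply it, since many “of” phrases should not be compounded. The function name is hypothetical.

```python
import re

# "X of Y" where X and Y are single words; any definite article is assumed to
# have been removed already (e.g., with the "Delete definite article" macro).
OF_PHRASE_RE = re.compile(r"\b([A-Za-z]+) of ([A-Za-z]+)\b")


def noun_noun_compound(text: str) -> str:
    """Rewrite the first 'X of Y' in `text` as the compound 'Y X'."""
    return OF_PHRASE_RE.sub(r"\2 \1", text, count=1)
```

For instance, "the control of malaria in the Americas" becomes "the malaria control in the Americas".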

  23. PAHOMTS® Editing Tools
     • Microsoft Word / PowerPoint macros:
        • Pluralize and go to next
        • Singularize and go to next
        • Feminine/Masculine
        • Smart delete of next definite article
        • Adjective > Adverb in -mente
        • Add diacritic

  24. PAHOMTS® Editing Tools
     • Microsoft Word / PowerPoint macros:
        • Pluralize and go to next
        • Singularize and go to next
        • Feminine/Masculine
        • Smart delete of next definite article
        • Adjective > Adverb in -mente
        • Add diacritic
        • Clitic movement
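The “Adjective > Adverb in -mente” macro applies a regular Spanish derivation: the adverb is built on the feminine singular form of the adjective, and the adjective keeps its written accent. A minimal sketch of that rule, with a hypothetical function name; the few adjectives that form a distinct feminine from a consonant ending (e.g., español → española) are not handled.

```python
def adjective_to_adverb(adjective: str) -> str:
    """Form a Spanish -mente adverb from a masculine singular adjective.

    Final -o becomes -a (the feminine form); adjectives ending in -e or in a
    consonant are used unchanged. Any written accent is kept as-is.
    """
    if adjective.endswith("o"):
        adjective = adjective[:-1] + "a"
    return adjective + "mente"
```

So "rápido" gives "rápidamente" and "fácil" gives "fácilmente".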

  25. PAHOMTS® Editing Tools
     • Microsoft Word / PowerPoint macros:
        • Display source segment
        • Provide feedback
        • Transfer segments from translated document into side-by-side (and vice versa)
        • Clean up synchronization marks
        • Create tri-column side-by-side

  26. Synchronized Post-Editing
     • Source and target segments are linked
     • Post-editor can easily provide feedback after each segment, if appropriate
     • Parallel tri-column side-by-side:
        • source text
        • MT output
        • post-edited text

  27. Synchronized Post-Editing
     • Side effects:
        • Parallel text can be used to extract editing rules to train post-editors
        • RAW and Final columns can be used to extract grammar fixes and dictionary entries to enhance the MT output
        • Source and Final columns are perfectly aligned segment pairs that can be automatically exported into any translation memory / bilingual corpus
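Because the Source and Final columns are aligned segment by segment, exporting them is mostly a matter of serialization. The sketch below writes them as TMX 1.4, a standard translation-memory exchange format; it assumes the columns are already available as a list of (source, final) string pairs, and the `creationtool` values are placeholders rather than anything PAHOMTS® emits.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the reserved xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def to_tmx(pairs: list[tuple[str, str]], srclang: str = "es", tgtlang: str = "en") -> str:
    """Serialize aligned (source, final) segment pairs as a TMX 1.4 document."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tricolumn-export-sketch",  # placeholder tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, final in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per aligned pair
        for lang, text in ((srclang, source), (tgtlang, final)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

The RAW column is deliberately left out: only human-approved text should reach the translation memory.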

  28. Tri-Column Side-by-Side (handouts)
     Statistics for Document SE0403 – Agenda de salud (Health Agenda)
     • Preliminary classification of changes:
        • Done manually
        • 65% of the segments were changed
        • 73% of these changes can be fixed in the MT dictionaries and/or MT program
     • Automatic classification of changes:
        • We’re still working on it
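The share of changed segments reported here comes from comparing the MT output with the post-edited text. With the RAW and Final columns of the tri-column file in hand, that share can be computed directly. The sketch assumes the columns are equal-length lists of strings and treats segments that differ only in leading or trailing whitespace as unchanged; classifying why a segment changed is not attempted.

```python
def change_rate(raw: list[str], final: list[str]) -> float:
    """Fraction of segments whose post-edited text differs from the RAW MT output."""
    if len(raw) != len(final):
        raise ValueError("RAW and Final columns must have the same number of segments")
    if not raw:
        return 0.0
    changed = sum(1 for r, f in zip(raw, final) if r.strip() != f.strip())
    return changed / len(raw)
```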

  29. Tri-Column Side-by-Side (handouts)
     Preliminary Classification of Changes (Document SE0403)
     • Can be fixed: 73%
        • Lexical Changes (45%)
        • Articles (13%)
        • NN Compounds (9%)
        • Word Order (5%)
        • Verb Tense (1%)
     • Cannot be fixed: 27%
        • Stylistic Changes (11%)
        • Deletions (5%)
        • POS Changes (4%)
        • Punctuation (3%)
        • Phrase Order (3%)
        • Sentence Split (1%)

  30. Editing Macros from WHO
     • Editing is not only for MT output
     • Style guides for authors and translators
     • Search for terms or expressions on the Internet (Google, WHOLIS) and in desktop applications (Oxford SuperLex, dtSearch) from Word
     • Search for synonyms or related terms recorded in an institutional database
     • Provide feedback to dictionary coders
     • Detect strings repeated in nearby text
     • Fast word find

  31. Improving MT Quality
     • Where can a problem be solved?
        • Source text: author, typist, staff
        • Dictionary: subject-area specialists, dictionary coders
        • Post-editing: translator, subject-area specialist, TM, SMT on the translation
        • Algorithm: computational linguists

  32. MT Enhancements
     • Daily:
        • Lists of not-found words
        • Lexical issues pointed out by translators (side-by-side, or SBS, file)
        • SBS files examined by computational linguists to improve parsing/synthesis
     • Occasionally:
        • Incorporation of bilingual glossaries using the Import or Merge utilities
        • Research into specific linguistic issues, always using the bilingual corpus (e.g., definite articles, questions)
        • Automatic terminology extraction from the aligned corpus using MultiTrans

  33. So, Does Post-Editing Work?
     • It works for us!
     • Post-editing (= improving MT output) happens at many different stages of machine translation:
        • input document
        • translation options that change MT output
        • automated tools for post-editing
        • human feedback
        • automatically extracted feedback
     • It never ends!

  34. What’s Next?
     • Fine-tuning of the algorithm that makes full use of the tri-column side-by-side
     • Automatic terminology extraction from the bilingual corpus

  35. Questions?
