
IRF Symposium 2007, Vienna, Austria, November 8-9, 2007, Marriott Hotel

Presentation: Machine Translation Chinese-English: Some experiments. Dr. Barrou DIALLO, Head of Research, EPO.


Presentation Transcript


  1. IRF Symposium 2007, Vienna, Austria, November 8-9, 2007, Marriott Hotel. Presentation: Machine Translation Chinese-English: Some experiments. Dr. Barrou DIALLO, Head of Research, EPO

  2. EPO Research: The case of Machine Translation • Our Vision & Mission • MT versus Patents • The Chinese language case • Our Experiments • Our Accomplishments • Perspectives

  3. Our Vision & Mission (1/3) Our Vision: Turning Technology into IP Business. R&D center as a source of Efficiency: • Efficient Reading • Accurate Searching • Fast Granting

  4. Our Vision & Mission (2/3): The EPO Research Department • Merged in March 2007 into a new Information Management structure; became "horizontal" • Located in The Hague, Netherlands • Large portfolio of academic contacts (labs, universities) • Entry point for testing and evaluating industrial solutions since 1990 • Partnerships with international institutions (WIPO, EC) • Strong background in mathematics, algorithms, and data structures • Network of active users and testers inside the EPO

  5. Our Vision & Mission (3/3) Helping to address challenges: • Coordinating research initiatives across departments • Technology watch and green-field research • Performing quantitative analysis • Identifying and communicating business opportunities • Providing users with sensible options (courses of action) • Ensuring a smooth transition from research to development • Communicating practices and experiences • Reporting on technical solutions and advising decision-makers

  6. EPO Research: The case of Machine Translation • Our Vision & Mission • MT versus Patents • The Chinese language case • Our Experiments • Our Accomplishments • Perspectives

  7. MT versus Patents: A Strategic Domain foreseen 5 years ago. Lessons learned from the European Machine Translation Programme: • Needs less investment than expected • Can re-use existing data and knowledge • Mature enough to improve efficiency • Satisfies patent professionals • Offers a key technology for future language challenges

  8. EPO Research: The case of Machine Translation • Our Vision & Mission • MT versus Patents • The Chinese language case • Our Experiments • Our Accomplishments • Perspectives

  9. Chinese language case (1/) • Issue 1: Sentence + Word Segmentation • Issue 2: Text Reordering • Issue 3: Alignment + System training • Issue 4: Translation with proper terms • Issue 5: Regeneration
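Issue 1 arises because written Chinese has no spaces between words, so a segmenter must decide where words begin and end before any alignment or translation can happen. The slides do not say which segmenter was used; as a minimal sketch only, here is forward maximum matching, a classic dictionary-based baseline that greedily takes the longest lexicon entry at each position. The function `fmm_segment`, the `max_len` limit and the toy lexicon are hypothetical.

```python
def fmm_segment(text, lexicon, max_len=4):
    """Forward maximum matching: at each position, take the longest lexicon
    word (up to max_len characters); unknown characters become 1-char tokens."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match (or 1 char).
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += length
                break
    return tokens

# Hypothetical toy lexicon covering the re-ordering example sentence.
LEXICON = {"澳洲", "北韩", "邦交", "少数", "国家", "之一"}

print(fmm_segment("澳洲是与北韩有邦交的少数国家之一", LEXICON))
# ['澳洲', '是', '与', '北韩', '有', '邦交', '的', '少数', '国家', '之一']
```

Greedy matching is fast but fails on ambiguous strings where the longest dictionary word is the wrong split, which is one reason statistical segmenters are generally preferred for MT training data.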

  10. Example: The Re-ordering Issue. Many years of research on the subject: • [Brown & al. 93] set the foundations of the SMT approach (use of Bayes' theorem). • The Model 3 approach to word re-ordering described in [Knight 99] brings some improvement in the target sentence, but it is rather oriented towards French or English structures. • [Chiang 05] proposes to handle re-ordering of Chinese sentences with hierarchical phrase pairs, i.e. phrases that contain subphrases; this produces better results than the traditional phrase-based approach.
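The "use of Bayes' theorem" in [Brown & al. 93] is the noisy-channel formulation. Writing $f$ for the source (here Chinese) sentence and $e$ for a candidate English sentence, the decoder searches for

$$\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} \frac{P(e)\,P(f \mid e)}{P(f)} = \arg\max_{e} P(e)\,P(f \mid e)$$

where $P(e)$ is the language model, $P(f \mid e)$ is the translation model, and $P(f)$ can be dropped because it does not depend on $e$. Word re-ordering enters through the translation model; in Model 3 it is handled by a distortion component, which is where the re-ordering limitations discussed above show up.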

  11. The Re-ordering Issue • Re-ordering with the phrase-based approach: "Australia is diplomatic relations with North Korea is one of the few countries"

  12. Re-ordering: Hierarchical-phrase approach (1/2) [Figure: Step 1, Step 2]

  13. Re-ordering: Hierarchical-phrase approach (2/2) [Figure: Step 3] "Australia is one of the few countries that have diplomatic relations with North Korea."
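The three steps can be sketched as a synchronous derivation: each rule pairs a Chinese side and an English side that share numbered slots (X1, X2), and the English side may place those slots in a different order. The rules below follow the spirit of those Chiang (2005) uses for this same sentence, but the Chinese tokenization, the class names and the helper `lex` are our own hypothetical choices. The sketch only shows how a given derivation yields the re-ordered output; it does not score rules or search for the best derivation.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Rule:
    source: list[str]  # Chinese side; "X1", "X2" mark linked slots
    target: list[str]  # English side; same slots, possibly re-ordered

@dataclass
class Node:
    rule: Rule
    children: dict[str, Node] = field(default_factory=dict)  # slot -> subtree

def realize(node: Node, side: str) -> list[str]:
    """Expand a derivation into tokens on one side ("source" or "target")."""
    tokens = node.rule.source if side == "source" else node.rule.target
    out: list[str] = []
    for tok in tokens:
        if tok in node.children:
            out.extend(realize(node.children[tok], side))  # fill the slot
        else:
            out.append(tok)  # terminal word
    return out

def lex(zh: str, en: str) -> Node:
    """Hypothetical helper: a lexical (slot-free) phrase pair."""
    return Node(Rule(zh.split(), en.split()))

north_korea = lex("北韩", "North Korea")
relations = lex("邦交", "diplomatic relations")
few_countries = lex("少数 国家", "few countries")

# "yu X1 you X2" -> "have X2 with X1": swaps the two slots
have_with = Node(Rule(["与", "X1", "有", "X2"], ["have", "X2", "with", "X1"]),
                 {"X1": north_korea, "X2": relations})
# "X1 de X2" -> "the X2 that X1": the modifier moves after the noun
that_clause = Node(Rule(["X1", "的", "X2"], ["the", "X2", "that", "X1"]),
                   {"X1": have_with, "X2": few_countries})
# "X1 zhiyi" -> "one of X1"
one_of = Node(Rule(["X1", "之一"], ["one", "of", "X1"]), {"X1": that_clause})
top = Node(Rule(["澳洲", "是", "X1"], ["Australia", "is", "X1"]), {"X1": one_of})

print(" ".join(realize(top, "source")))
print(" ".join(realize(top, "target")))
```

Because each rule carries its own slot order, the nested relative clause ends up after "the few countries" on the English side, which a purely phrase-based model with limited re-ordering tends to miss.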

  14. Solution? A semi-automatic approach • Computer-Assisted Translation (CAT) • Using high-quality, manually aligned texts drawn from the bi-text repositories and translation memories of international organizations • Using a bilingual ontology to align words or phrases that are not present in the training corpora: • ontologies of patent vocabulary are available in English; • a manual Chinese translation of the central concepts could be gradually added, IPC category by IPC category • Using syntactic rules to improve lexical choices and collocation processing, e.g. University of Geneva (Chomskyan syntactic parser for English) • A process to guarantee a well-formed final English sentence
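In a CAT workflow, a translation memory is queried for previously translated segments similar to the new one, and the closest match is offered to the translator. The slide names translation memories but not a matching method, so this is a minimal sketch under stated assumptions: similarity is `difflib.SequenceMatcher.ratio()` over characters, the 0.7 threshold is an arbitrary choice, and `best_tm_match` and the two memory entries are hypothetical.

```python
from difflib import SequenceMatcher

def best_tm_match(segment, memory, threshold=0.7):
    """Return (source, target, score) for the most similar memory entry,
    or None if no entry reaches the threshold."""
    best = None
    for src, tgt in memory.items():
        # ratio() = 2 * matched characters / total characters of both strings
        score = SequenceMatcher(None, segment, src).ratio()
        if best is None or score > best[2]:
            best = (src, tgt, score)
    if best is None or best[2] < threshold:
        return None
    return best

# Hypothetical memory of Chinese -> English segments.
TM = {
    "与北韩有邦交": "have diplomatic relations with North Korea",
    "少数国家之一": "one of the few countries",
}

print(best_tm_match("与北韩有邦交的国家", TM))  # score 0.8 against the first entry
```

Character-level matching suits Chinese because it does not depend on word segmentation; production TM tools typically use more refined edit-distance measures.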

  15. EPO Research: The case of Machine Translation • Our Vision & Mission • MT versus Patents • The Chinese language case • Our Experiments • Our Accomplishments • Perspectives

  16. Comparison of MT systems: An empirical approach (1/3) 3 systems on the test bench: • Rule-based system (Systran) • Statistical system (Language Weaver) • Hybrid system (CCID prototype) 1 evaluation grid: • Scores of 1-4 on usability & readability criteria

  17. Comparison of MT systems (2/3) [Chart: ratings Poor (1), Medium (2), Good (3), Excellent (4) for Rule-based MT, Hybrid MT and Statistical MT]

  18. Comparison of MT systems: An empirical approach (3/3) • No MT system performs adequately; CAT (Computer-Aided Translation) seems necessary • The hybrid system seems more promising • Are post-editors needed to check the outputs? No statistically significant result can be reported; further investigation is needed!

  19. Readability Tests on Human Translations: Flesch et al. • Designed to indicate how difficult a reading passage is to understand. • There are two tests: • Flesch Reading Ease • Flesch–Kincaid Grade Level. These tests have become a standard and are bundled with popular word-processing programs.

  20. Readability Tests on Human Translations: Flesch et al. • Flesch Reading Ease score: • 206.835 - (1.015 × ASL) - (84.6 × ASW) • Rates text on a 100-point scale; the higher the score, the easier the document is to understand (60 to 70 for standard documents). • Flesch–Kincaid Grade Level score: • (0.39 × ASL) + (11.8 × ASW) - 15.59 • Rates text on a U.S. school grade level: a score of 8.0 means that an eighth grader can understand the document (7.0 to 8.0 for standard documents). Where: ASL = average sentence length (# words / # sentences); ASW = average number of syllables per word (# syllables / # words)
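Both formulas can be computed directly from ASL and ASW. The sketch below is a hedged illustration: the syllable counter is a crude vowel-group heuristic (it overcounts words with a silent final "e", for instance), so its scores will differ somewhat from those reported by word processors. The function names are hypothetical.

```python
import re

def count_syllables(word):
    """Heuristic: count runs of consecutive vowels (including y), minimum 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_scores(text):
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    asl = len(words) / len(sentences)                           # words per sentence
    asw = sum(count_syllables(w) for w in words) / len(words)  # syllables per word
    reading_ease = 206.835 - 1.015 * asl - 84.6 * asw
    grade_level = 0.39 * asl + 11.8 * asw - 15.59
    return reading_ease, grade_level

ease, grade = flesch_scores("The cat sat on the mat.")
print(round(ease, 3), round(grade, 2))  # 116.145 -1.45
```

Neither formula is clamped, so scores can fall outside the nominal ranges: a very short, monosyllabic sentence scores above 100, while a single long patent claim with one final full stop drives ASL up enough to give a negative Reading Ease and a very high grade level, as in the claim example that follows.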

  21. Human Translation Assessment: Example (1/2) • CN1926077: The Making and Using Methods of Plant/Soil Activated Liquid. Abstract: "In the mineral composition ion water of concentrated sulfuric acid, which add the vegetal leavening confected by enzyme and microbe used to produce enzyme and the muscovado made by sugarcane together, under the aerobic condition, the selective preference is, do the commensalisms cultivation at about 25 Centigrade. After decomposing the sugar, before rot and ferment, the selective preference is, spreading on the leaf surface or pouring in the soil during the alcohol fermenting stage." What's important? Figures or comments? Flesch Reading Ease score: 13/100. Flesch-Kincaid Grade Level: 17. Score: 7/10. Comments: The abstract and parts of the claims are convoluted or badly structured and contain some spelling mistakes.

  22. Human Translation Assessment: Example (2/2) • CN2354381 • Claims • 1. A time switch of gas appliances, composing of mechanical gear timer and fuel gas valve, wherein it also comprises round upper cover board subassembly and lower cover board subassembly, a valve switch knob (4) fixed on the upper end of the valve switch spigot shaft (7) is installed on the front of the upper cover board, the valve switch spigot shaft (7) penetrates through the upper cover board (6) and the lower cover board (29), a timer hollow shaft (8) is installed out of the valve switch spigot shaft (7), the timer hollow shaft (8) penetrates through uthe pper cover board (6), a round time knob (5) is installed between the upper end valve switch knob of the timer hollow shaft and the upper cover board (6), a time indicating dial (3) interlocking with the timer hollow shaft (8) is installed between the round time knob (5) and the upper cover board (6); a mechanical gear timer is installed on the reverse side of the upper cover board (6), an unlocking cam(9) is installed out of the timer hollow shaft (8) in the central part; Flesch-Kincaid Grade Level: 49. Flesch Reading Ease score: -45. Score: 9/10. Comments: Long, convoluted sentences. Diagrammatical explanations. Minor grammatical and typo errors.

  23. Human vs machine: unfair competition? • Original text: 一种利用相位锁定一捷变频率调制输出信号到梳式发生器形成输出的任何选定信道的装置和方法。跟踪输入信号的相位误差,该输入信号被调制成载波输出频率,和该调制过的输出频率,利用减去该输入信号的方法锁定到梳式发生器输出,并消除该相位误差。 • Systran output: "One kind to combs the type generator using a phase lock agility frequency modulation output signal to form the output any to designate channel's installment and the method. The track input signal's phase error, this input signal is modulated the carrier output frequency, with should modulate the output frequency, the use subtracts this input signal the method to lock combs the type generator output, and eliminates this phase error" • Human translation: "An apparatus and method is disclosed which phase locks a frequency-agile modulated output signal to any selected channel of a comb generated output. The phase error of an input signal is tracked, the input signal is modulated up to a carrier output frequency, and the modulated output frequency is locked to the comb generator output by subtracting the input signal and negating the phase error." Is such an MT useful?

  24. EPO Research: The case of Machine Translation • Our Vision & Mission • MT versus Patents • The Chinese language case • Our Experiments • Our Accomplishments • Perspectives

  25. Our Accomplishments: Chinese patents showing priority documents • 105,000 CN documents with US priorities • 15,000 CN documents with EP priorities • 15,000 CN documents with GB priorities • 15,000 CN documents with EP priorities • 400 CN documents with WO priorities (June 2006). A sufficient source for starting up an alignment? [Chart: # of aligned sentences]

  26. Manual Data Cleaning: Dirty texts generate XML failures • CN86103346 • Spherical particles of vinyl resins having high bulk density can be prepared by the suspension polymerization process by using as a dispersant an alkyl hydroxy cellulose having a viscosity of from about 1000 to about 100,000 cps. A suitable dispersant is a hydroxypropyl methyl cellulose polymer having the formula: <IMAGE> +TR <IMAGE> where n is from about 300 to about 1500. • Use of XMLSpy Professional to check text
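The failure in this example comes from the literal `<IMAGE>` placeholders: once the text is wrapped in XML they are parsed as unclosed tags. The presentation used XMLSpy to find such problems; as a hedged, programmatic alternative (not the EPO's procedure), one can escape the raw text before wrapping it and check well-formedness with Python's standard library. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def is_well_formed(xml_text):
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

def to_segment(text):
    """Wrap raw patent text in a <seg> element, escaping &, < and >."""
    return "<seg>" + escape(text) + "</seg>"

raw = ("A suitable dispersant is a hydroxypropyl methyl cellulose polymer "
       "having the formula: <IMAGE> +TR <IMAGE> where n is from about 300 "
       "to about 1500.")

print(is_well_formed("<seg>" + raw + "</seg>"))  # False: <IMAGE> read as an open tag
print(is_well_formed(to_segment(raw)))            # True
```

Escaping preserves the placeholder as visible text; if the placeholders carry no value for alignment, stripping them before wrapping is an equally valid choice.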

  27. Methodology of Word Alignment • [OCH93]

  28. First Example of alignment

  29. Second example of alignment
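The alignment slides cite [OCH93] but give no details of the method. As a hedged illustration only, here is EM training of IBM Model 1, the simplest of the word-alignment models introduced in [Brown & al. 93] and the usual first stage of GIZA++-style aligners. Simplifications (assumptions): no NULL word, a fixed number of iterations, and a hypothetical three-sentence toy corpus.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """EM for IBM Model 1 lexical probabilities t[(e, f)] = P(e | f).

    pairs: list of (foreign_tokens, english_tokens); no NULL word, for brevity.
    """
    english_vocab = {e for _, es in pairs for e in es}
    t = defaultdict(lambda: 1.0 / len(english_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected link counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        # E-step: share each English token among the foreign tokens of its pair
        for fs, es in pairs:
            for e in es:
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: renormalize expected counts into probabilities
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t

def viterbi_align(fs, es, t):
    """Link each English token to its most probable foreign token."""
    return [(e, max(fs, key=lambda f: t[(e, f)])) for e in es]

# Hypothetical toy corpus of (Chinese tokens, English tokens).
TOY_PAIRS = [
    (["大", "房子"], ["big", "house"]),
    (["大", "书"], ["big", "book"]),
    (["小", "书"], ["small", "book"]),
]

t = train_ibm_model1(TOY_PAIRS)
print(viterbi_align(["大", "房子"], ["big", "house"], t))
# [('big', '大'), ('house', '房子')]
```

No sentence pair links any word on its own, yet because 大 co-occurs with "big" twice and with every other word once, EM gradually concentrates probability on the consistent pairings.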

  30. TMX Formatting of aligned texts (provides compatibility with industry standards):
      <?xml version="1.0" ?>
      <!DOCTYPE tmx SYSTEM "tmx14.dtd">
      <tmx version="1.4">
      <header creationtoolversion="1.0.0" datatype="plaintext" segtype="sentence" adminlang="EN-US" srclang="EN" o-tmf="txt" creationtool="MetaReadAlign" >
      </header>
      <body>
      <tu>
      <tuv xml:lang="EN"><seg> In a preferred embodiment, a low-band isolator network, coupled to the antenna element, provides signal isolation between high-band and low-band signal paths during high-band operation.</seg></tuv>
      <tuv xml:lang="ZH"><seg>NOT DISPLAYABLE </seg></tuv>
      </tu>
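Aligned sentence pairs can be serialized into the same TMX 1.4 layout with Python's standard library. This is a minimal sketch, not MetaReadAlign's implementation: the header values mirror the excerpt above except `creationtool`, which is a hypothetical placeholder, and the output omits the XML declaration and the `<!DOCTYPE tmx SYSTEM "tmx14.dtd">` line, which would need to be prepended for DTD validation.

```python
import xml.etree.ElementTree as ET

# ElementTree writes attributes in this namespace with the reserved "xml:" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def build_tmx(pairs, srclang="EN", tgtlang="ZH"):
    """Return a TMX 1.4 document (as a string) with one <tu> per aligned pair."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tmx-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "datatype": "plaintext",
        "segtype": "sentence",
        "adminlang": "EN-US",
        "srclang": srclang,
        "o-tmf": "txt",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((srclang, src), (tgtlang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text  # <, >, & escaped on output

    return ET.tostring(tmx, encoding="unicode")

print(build_tmx([("have diplomatic relations with North Korea", "与北韩有邦交")]))
```

Building the tree rather than concatenating strings means characters such as the `<IMAGE>` placeholders are escaped automatically, so the output is always well-formed.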

  31. QUALITY CONTROL PANEL BEFORE ALIGNMENT [Screenshot of the evaluation interface, annotated] • Allows browsing • Evaluation record: CN85108669; record status: evaluated / not evaluated • Radio buttons, multiple entries possible (e.g. partial translation, 100% match), default value "100% match"; entries saved on server • Options: 100% match • >70% match • <50% match • partial translation • bad translation • total mismatch • "Record Evaluated, Proceed with next" • "Save Status": saves the selected buttons for this record for next time and jumps to the next record • "Transmit Evaluation" • "Reset": resets the complete evaluation process (everything is reset and lost) • Welcome EvaluatorX

  32. EPO Research: The case of Machine Translation • Our Vision & Mission • MT versus Patents • The Chinese language case • Our Experiments • Our Accomplishments • Perspectives

  33. Acknowledgments: EPO staff experts in Research & Development (Jan Mannekens, Betty Yang) • CrossLanguage • Metaread • University of Geneva. Questions? Bdiallo@epo.org

  34. References • [Brown & al. 93] P. F. Brown, S. A. Della Pietra, V. J. Della Pietra, R. L. Mercer: The Mathematics of Statistical Machine Translation: Parameter Estimation, Computational Linguistics, vol. 19, no. 2, 1993 • [Knight 99] K. Knight: A Statistical MT Tutorial Workbook, April 1999 • [Chiang 05] D. Chiang: A Hierarchical Phrase-Based Model for Statistical Machine Translation, Proceedings of the 43rd Annual Meeting of the ACL, 2005
