Machine Translation (Level 2)


  1. Machine Translation (Level 2) Anna Sågvall Hein GSLT Course, September 2004

  2. Translation ”substitute the text material of one language (SL) by the equivalent text material of another language (TL)” (Catford 1965: 20) ”Translation consists in producing in the target language the closest natural equivalent of the text material of the source language, in the first hand concerning meaning, in the second hand concerning style” (Nida 1975: 32) ”Translation is in theory impossible, but in practice fairly possible” (Mounin 1967) Catford, J. C. (1965), A Linguistic Theory of Translation, Oxford University Press, England. Mounin, G. (1967), Les problèmes théoriques de la traduction, Paris. Nida, E. (1975), A Framework for the Analysis and Evaluation of Theories of Translation, in Brislin, R. W. (ed.) (1975), Translation: Applications and Research, Gardner Press, New York.

  3. Equivalence • form • meaning • style • effect

  4. Formal and dynamic equivalence • Formal equivalence focuses attention on the message itself, in both form and content. It aims to allow the reader to understand as much of the SL context as possible. • Dynamic equivalence is based on the principle of equivalent effect, i.e. that the relationship between receiver and message should aim at being the same as that between the original receivers and the SL message. (Nida 1975)

  5. Can computers translate? • Not a simple yes or no; it depends on the purpose of the translation and the required quality.

  6. Classical problems with MT • unrealistic expectations • bad translations • difficulties in integrating MT in the work flow • the Ericsson case

  7. Feasibility of machine translation • quality in relation to purpose • control of the source language • human-machine interaction • re-use of translations • evaluation

  8. Quality • publishing quality • editing quality • browsing quality

  9. Translation related tasks • translation • browsing • gisting • drafting • message dissemination • cross-language information searches • cross-language interchanges

  10. MT as a cross-language communication tool MT is used not only for pure translation purposes but also for writing in a foreign language and for browsing (Hutchins 2001) Hutchins, J., 2001, Towards a new vision for MT, Introductory speech at MT Summit VIII conference, 18-22 September 2001 (http://ourworld.compuserve.com/homepages/WJHutchins/MTS-2001.pdf)

  11. Control of the source language • spell checked and grammar checked SL • sublanguage • Domain • Text type • controlled language

  12. Spell checking and grammar checking • If there are spelling errors or typos in the SL, dictionary search will fail • If there are grammatical errors in the SL, grammatical analysis will fail • Where and how should spell and grammar checking be accounted for? Before or during the process?

  13. Controlled language • consistent authoring of source texts • reduction of ambiguity • full linguistic coverage • controlled vocabulary • full lexical coverage • controlled grammar • full grammatical coverage • controlled language checking • e.g. Scania Checker

  14. Examples of controlled languages • Simplified English • KANT controlled English • Scania Swedish • Scania checker

  15. Human intervention • before • language checking • during • e.g. ambiguity resolution • after • post-editing

  16. Re-use of translations • translation memories • translation dictionaries incl. terminologies • lexicalistic translation • statistical machine translation • example-based translation
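
The idea of a translation memory can be sketched as a store of previously translated sentence pairs that is consulted before any other translation step. The sketch below is only an illustration under stated assumptions: the class name `TranslationMemory`, the character-based similarity from Python's `difflib`, and the threshold of 0.8 are not taken from the slides.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy translation memory: exact match first, then fuzzy match."""

    def __init__(self):
        self.units = {}  # source sentence -> stored target sentence

    def add(self, source, target):
        self.units[source] = target

    def lookup(self, sentence, threshold=0.8):
        """Return (target, score) for the best match, or None below threshold."""
        if sentence in self.units:
            return self.units[sentence], 1.0
        best_source, best_score = None, 0.0
        for source in self.units:
            # Character-level similarity in [0, 1]; an assumed, simple measure.
            score = SequenceMatcher(None, sentence, source).ratio()
            if score > best_score:
                best_source, best_score = source, score
        if best_source is not None and best_score >= threshold:
            return self.units[best_source], best_score
        return None


tm = TranslationMemory()
tm.add("Fyll på olja i växellådan.", "Fill gearbox with oil.")
print(tm.lookup("Fyll på olja i växellådan."))
print(tm.lookup("Fyll på olja i växellådan"))
```

A fuzzy match returns a stored translation that may need post-editing, since the new source sentence differs from the stored one.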

  17. Evaluation of MT • human • automatic • using a gold standard • coverage (recall) • quality (precision) • global similarity measures • merge of recall and precision • BLEU, NIST
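
Automatic evaluation against a gold standard can be illustrated with n-gram overlap. The sketch below computes clipped n-gram precision, recall and their harmonic mean; it is not BLEU or NIST, which add further components such as combining several n-gram orders and a length penalty. The function names and the reference translation in the usage lines are assumptions made for illustration; only the candidate sentence comes from the Systran example later in this transcript.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_scores(candidate, reference, n=1):
    """Return (precision, recall, f1) of candidate n-grams against a reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection clips each n-gram to its count in the reference.
    matches = sum((cand & ref).values())
    precision = matches / sum(cand.values()) if cand else 0.0
    recall = matches / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matches else 0.0
    return precision, recall, f1


candidate = "When the villages were contacted had they not even been exposed to flu."
# Assumed gold-standard translation, written for this sketch.
reference = "When the villages were contacted they had not even been exposed to flu."
print(overlap_scores(candidate, reference, n=1))
print(overlap_scores(candidate, reference, n=2))
```

The two sentences contain the same words, so unigram scores cannot see the word order error; bigram scores drop because three bigrams around the misplaced subject do not occur in the reference.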

  18. Why machine translation? • cheaper • faster • more consistent • when it succeeds …

  19. What is MT proper? To be considered as MT, a system should provide • minimally correct morphology • minimal syntactic processing • minimal semantic processing • handle and produce full sentences Hutchins, J., 2000, The IAMT Certification initiative and defining translation system categories (http://nl.ijs.si/eamt00/proc/Hutchins.pdf)

  20. Examples of MT products • Systran (http://babelfish.altavista.com/) • Comprendium (based on Metal) • ProMT (http://www.translate.ru/eng) • ESTeam See further: http://ourworld.compuserve.com/homepages/WJHutchins/Compendium-4.pdf , http://www.foreignword.com/Technology/mt/mt.htm

  21. Basic strategies • direct translation • rule-based translation • transfer • interlingua • example-based translation • statistical translation • hybrids

  22. Direct translation • no complete intermediary sentence structure • translation proceeds in a number of steps, each step dedicated to a specific task • the most important component is the bilingual dictionary • typically general language • problems with • ambiguity • inflection • word order and other structural shifts

  23. Simplistic approach • sentence splitting • tokenisation • handling capital letters • dictionary look-up and lexical substitution incl. some heuristics for handling ambiguities • copying unknown words, digits, signs of punctuation etc. • formal editing
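
The steps of the simplistic approach can be sketched as a small word-for-word pipeline. This is a minimal illustration, not a reconstruction of any real system: the dictionary entries, the function names and the regular expressions are assumptions, and no ambiguity heuristics are included.

```python
import re

# Toy Swedish-English dictionary; the entries are illustrative assumptions.
TOY_DICTIONARY = {
    "fyll": "fill",
    "på": "on",
    "olja": "oil",
    "i": "in",
    "växellådan": "the gearbox",
}


def split_sentences(text):
    """Sentence splitting: break after sentence-final punctuation."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenise(sentence):
    """Tokenisation: separate words and digits from punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def translate_sentence(sentence, dictionary):
    out = []
    for token in tokenise(sentence):
        # Capital letters: look up the lower-cased form.
        # Unknown words, digits and punctuation are copied unchanged.
        out.append(dictionary.get(token.lower(), token))
    text = " ".join(out)
    # Formal editing: no space before punctuation, capitalised first letter.
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)
    return text[:1].upper() + text[1:]


def translate(text, dictionary=TOY_DICTIONARY):
    return " ".join(translate_sentence(s, dictionary) for s in split_sentences(text))


print(translate("Fyll på olja i växellådan."))
```

With this toy dictionary the particle verb fyll på is translated word by word, which illustrates the structural problems listed for direct translation.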

  24. Advanced classical approach (Tucker 1987) • Source text dictionary look-up and morphological analysis • Identification of homographs • Identification of compound nouns • Identification of noun and verb phrases • Processing of idioms

  25. Advanced approach, cont. • processing of prepositions • subject-predicate identification • syntactic ambiguity identification • synthesis and morphological processing of target text • rearrangement of words and phrases in target text

  26. Feasibility of the direct translation strategy Is it possible to carry out the direct translation steps as suggested by Tucker with sufficient precision without relying on a complete sentence structure?

  27. Assignment 1: manual direct translation Sv. Ytterst handlar kampen för sysselsättning om att hålla samman Sverige. En. Ultimately, the fight for full employment concerns the cohesion of Swedish society. (from Statement of Government Policy 1996) • Define an algorithm and a dictionary (based on Norstedts) for simplistic translation of the example. • Present the model and the result.

  28. Assignment 1, cont. • Improve the result stepwise in accordance with the advanced direct translation strategy. • Specify each step carefully and demonstrate its effect on the translation. • Evaluate and discuss the final result. • Translate the example using Systran (http://kwic.systran.fr/systran/svdemo) and discuss the differences in an evaluative way. • Report the assignment and upload it to the web (041001).

  29. Current trends in direct translation • re-use of translations • translation memories of sentences and sub-sentence units such as words, phrases and larger units • lexicalistic translation • example-based translation • statistical translation Will re-use of translations overcome the problems with the direct translation approach that were discussed above? If so, how can they be handled?

  30. Systran • System Translation • developed in the US by Peter Toma • first version 1969 (Ru-En) • EC bought the rights to Systran in 1976 • currently 18 language pairs • demo version sv-en in 2003 (http://kwic.systran.fr/systran/svdemo) • http://babelfish.altavista.com/

  31. Systran, cont. • more than 1,600,000 dictionary units • 20 domain dictionaries • daily use by EC translators and administrators of the European institutions • originally a direct translation strategy • see H&S • today more of a transfer-based strategy

  32. Ex. 1: fairly good translation / Systran sv-en • "Enskilda företagare som inte bildat bolag klassificeras hit." • "Individual entrepreneurs that have not formed companies are classified here." • The system has recognised bildat as a perfect form and correctly translates the tense form as have formed, with the negation not in the right place.

  33. Ex. 2: word order problem / Systran sv-en • "När byarna kontaktades hade de inte ens utsatts för influensa." • "When the villages were contacted had they not even been exposed to flu." • The system has not identified the subject and the predicate and therefore produces the wrong word order.

  34. Ex. 3: ambiguity problem / Systran sv-en • "Vad kan vi lära av Arrawetestammen?" • "What can we faith of the Arawete?" • The system does not find the connection between kan and lära and therefore does not see that lära is a verb.

  35. Ex. 4: ambiguity problem / Systran sv-en • "Extrapoleringen går till så här." • "The extrapolation goes to so here." • The system does not know the particle verb gå till and therefore incorrectly translates word for word.

  36. Systran Linguistic Resources • Dictionaries • POS Definitions • Inflection Tables • Decomposition Tables • Segmentation Dictionaries • Disambiguation Rules • Analysis Rules

  37. Systran Processing Steps • Analysis • Lookup • Compound Decomposition • Disambiguation • Syntactic Analysis • Compound Expansion • Sentence Transfer • Initial Target Structure • Lookup • Default Transfer of Attributes • Structure Transformation

  38. Systran Processing Steps (cont) • Sentence Synthesis • Structure Transformation • Inflection lookup • Surface Transformation

  39. Motivations for transfer-based translation • lexical ambiguity • structural differences See further Ingo 91

  40. Example 1 Sv. Fyll på olja i växellådan. → En. Fill gearbox with oil. (from the Scania corpus) • fyll på → fill • obj → adv • adv → obj

  41. Example 2 Sv. I oljefilterhållaren sitter en överströmningsventil. → En. The oil filter retainer has an overflow valve. (from the Scania corpus) • sitter → has • adv → subj • subj → obj

  42. Transfer-based translation • intermediary sentence structure • basic processes • analysis • transfer • generation (synthesis) • language modules • dictionary and grammar of SL • transfer dictionary and transfer rules • dictionary and grammar of TL
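
The three basic processes can be sketched on the Scania sentence Fyll på olja i växellådan. Everything below is a toy under stated assumptions: `analyse` is a hand-written stand-in for a parser that covers this one sentence, and the feature names, the lexicon `TRANSFER_LEXICON` and the fixed English word order in `generate` are hypothetical, not part of any system named in the slides.

```python
def analyse(sentence):
    """Analysis (SL module): return an intermediary structure for the sentence."""
    known = {
        "Fyll på olja i växellådan.": {
            "pred": "fylla_på",
            "obj": "olja",
            "adv": "växellåda",
        }
    }
    return known[sentence]


# Transfer dictionary: SL lemma -> TL lemma (illustrative entries).
TRANSFER_LEXICON = {"fylla_på": "fill", "olja": "oil", "växellåda": "gearbox"}


def transfer(sl):
    """Transfer: lexical substitution plus one structural rule."""
    tl = {"pred": TRANSFER_LEXICON[sl["pred"]]}
    if sl["pred"] == "fylla_på":
        # Structural shift for this verb: obj -> adv and adv -> obj.
        tl["obj"] = TRANSFER_LEXICON[sl["adv"]]
        tl["adv"] = TRANSFER_LEXICON[sl["obj"]]
    else:
        tl["obj"] = TRANSFER_LEXICON[sl["obj"]]
        tl["adv"] = TRANSFER_LEXICON[sl["adv"]]
    return tl


def generate(tl):
    """Generation (TL module): imperative order verb, object, with-adverbial."""
    sentence = " ".join([tl["pred"], tl["obj"], "with", tl["adv"]]) + "."
    return sentence[0].upper() + sentence[1:]


print(generate(transfer(analyse("Fyll på olja i växellådan."))))
```

Because the structural shift is stated on the intermediary structure rather than on the word string, the word order of the output is decided entirely by the TL module.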

  43. [Figure: diagram of translation strategies between SL and TL, with the labels Direct translation, Transfer, Interlingua, Metal and Multra]

  44. Levels of intermediary structure • cf. J&M, Chapter 21 • word order

  45. Metal • See H&S

  46. MULTRA Multilingual Support for Translation and Writing • translation engine • transfer-based • shake-and-bake • modular • unification-based • preference machinery • traceable

  48. Analysis • chart parser (Lisp → C) • procedural formalism • unification and other kinds of operations • sentence structure • feature structure • grammatical relations • surface order implicit via grammatical relations See further Sågvall Hein & Starbäck (99), Weijnitz (02), Dahllöf (89)

  49. Transfer • unification-based • declarative formalism • Multra transfer formalism (Beskow 93) • lexical and structural rules • rules are partially ordered • a more specific rule takes precedence over a less specific one • specificity in terms of number of transfer equations • all applicable rules are applied • written in Prolog
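
The principle that a more specific rule takes precedence, with specificity counted as the number of transfer equations, can be sketched as follows. This is not the Multra transfer formalism, which is unification-based and written in Prolog; it is a simplified Python illustration in which a rule is a dictionary of attribute-value equations, and the rule names and the feature `subj_animate` are hypothetical.

```python
def applicable(rule, structure):
    """A rule applies when every one of its equations holds in the structure."""
    return all(structure.get(attr) == value
               for attr, value in rule["equations"].items())


def rank_rules(rules, structure):
    """Return all applicable rules, most specific (most equations) first."""
    matching = [rule for rule in rules if applicable(rule, structure)]
    return sorted(matching, key=lambda rule: len(rule["equations"]), reverse=True)


# Hypothetical lexical transfer rules for the Swedish verb sitta.
RULES = [
    {"name": "sitta-default",
     "equations": {"pred": "sitta"},
     "target": "sit"},
    {"name": "sitta-inanimate-subject",
     "equations": {"pred": "sitta", "subj_animate": False},
     "target": "have"},
]

structure = {"pred": "sitta", "subj_animate": False}
for rule in rank_rules(RULES, structure):
    print(rule["name"], "->", rule["target"])
```

Counting equations gives only a partial order: two rules with the same number of equations are not ranked against each other, and this sketch simply keeps them in their listed order.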

  50. Generation • syntactic generation • Multra syntactic generation formalism (Beskow 97a) • PATR-like style • unification • concatenation • typed features • morphological generation (Beskow 97b) • lexical insertion rules • morphological realisation and phonological finish in Prolog • written in Prolog
