Web-Based Machine Translation

Presentation Transcript


  1. Web-Based Machine Translation Andy Way School of Computing Email: away@computing.dcu.ie URL: www.computing.dcu.ie/~away Room: L245 Phone: (700)5644

  2. Plan of Attack (1) • What is MT? • Why do we do it? How much is it used? How much more could it be used? • Is it any good? What exactly is it good for? What is it not good for? • What MT methods are there? • Do on-line MT systems translate word-for-word? How might we be able to tell?

  3. Plan of Attack (2) • Do pairs of on-line MT systems work the same in both directions? • How can we help these MT systems help us? • The Future (?!) • Further Reading/More Information

  4. What is MT? • MT = FAHQMT • MAHT (on-line dictionaries, termbanks, TM etc …) • CAT • HAMT (resolving ambiguity etc …)

  5. Why do we do MT? • To communicate in other languages than the ones we know … • (If we’re a company) To increase/maintain market share • To speed up the translation process • etc etc ...

  6. How much is it used? • In 2000, MT specialist Scott Bennett said “Altavista's BabelFish... initiated in late 1997, is now used a million times per day”. • In 2001, Softissimo announced that the Internet translation request volume processed by its Reverso translation engine (www.reverso.net) had reached several million translation requests (of Web pages, e-mail, short texts and results of search engine requests) per month on its mail translation portal and the portals of its Internet partners. • V.d. Meer (2003): “Every day, portals like Altavista and Google process nearly 10 million requests for automatic translation.”

  7. How much more could it be used? • Volume of text required to be translated currently exceeds translators’ capacity (demand outstrips supply). This imbalance will only get worse, cf. accession of new Member States in the EU. • NB, also Official Languages Act 2003. Solution: automation (the only solution).

  8. How much more could it be used? • The translation and localisation industry has focussed on product documentation, which represents probably less than 20% of all text-based information repositories that need to be localised. • Time: five times the volume of text needs to be translated in practically no time. Corporate decision makers will have to begin supporting multilingual communication initiatives and strategies.

  9. How much more could it be used? • GIL market growing from $4.2 billion in 2001 to $8.9 billion in 2006, an annual growth rate of 16.3%. Localisation and translation services form by far the largest part of this market with 69.8% of the total, i.e. $2.9 billion in 2001 and $5.8 billion in 2006, an annual growth rate of 14.6%. • Crosslingual applications are expected to grow from less than 1% of the total market in 2001 ($42 million) to $193 million in 2006, 35% annual growth.
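
The annual growth rates quoted on this slide can be sanity-checked with the usual compound annual growth rate formula. The sketch below is only an illustration: the function name `cagr` is an assumption, and because the dollar amounts on the slide are rounded, the computed rates come out close to, but not exactly equal to, the quoted percentages.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


# Figures as quoted on the slide (2001 -> 2006, i.e. 5 years).
figures = {
    "total market ($bn)": (4.2, 8.9),
    "localisation and translation services ($bn)": (2.9, 5.8),
    "crosslingual applications ($m)": (42, 193),
}

for label, (start, end) in figures.items():
    print(f"{label}: {cagr(start, end, 5):.1%} per year")
```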

  10. Is MT any good? (1) Depends … on what you want to use it for and how you use it!! [Diagram labels: Cost, Input, MT, Output]

  11. Is MT any good? (2) • No pre-editing → Lots of post-editing! • Lots of pre-editing → No(t much) post-editing! GARBAGE IN, GARBAGE OUT!!!

  12. Is MT any good? (3) • Sometimes no pre-editing is required: • for gisting; • for company-internal circulation; • etc etc … • What it’s not good for is literary translation, i.e. won’t take translators’ jobs - will free them up for new (more interesting) tasks and create new niche markets

  13. MT Developers • So MT is of use, and will become used much more than it is currently, so … • … we need people out there who can improve current systems and develop new ones. → Let’s look at how people currently “design” MT systems …

  14. MT Methods • Rule-Based MT: Transfer, Interlingua • Data-Driven MT: EBMT, SMT

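  15. The Vauquois Pyramid for MT [Diagram: pyramid with Interlingua at the apex; Analysis rising from $_source on one side and Generation descending to $_target on the other; Transfer across the middle; Direct translation along the base]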

  16. Examples of MT methods: Transfer English SVO, Irish VSO, Japanese SOV. So translation between them is complicated by facts about word order. But at a ‘deeper’ level, the languages are more similar ...

  17. Transfer (cont’d) e.g. John saw Mary → Chonaic Seán Máire [Diagram: two parallel structures. English S: HEAD see, SUBJ John, OBJ Mary. Irish S: GOV feic, SUBJ Seán, OBJ Máire]

  18. Examples of MT methods: Transfer e.g. John (SUBJ) likes Mary (OBJ) → Marie (SUBJ) plaît à Jean (IOBJ). Rule: like(A1,A2) → plaire(A2’,A1’), i.e. arguments are switched.
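
A minimal sketch of how the argument-switching rule on this slide might be applied in code. Everything here is illustrative: the names `LEXICON`, `transfer_like` and `generate_fr` are assumptions, the lexicon holds only the three words from the example, and the generator handles only this one verb form.

```python
# Toy bilingual lexicon limited to the words in the slide's example.
LEXICON = {"like": "plaire", "John": "Jean", "Mary": "Marie"}

# Third-person singular form needed for the example sentence.
VERB_FORMS = {"plaire": "plaît"}


def transfer_like(pred: str, a1: str, a2: str) -> tuple[str, str, str]:
    """Apply like(A1, A2) -> plaire(A2', A1'): translate and swap arguments."""
    if pred != "like":
        raise ValueError(f"no transfer rule for predicate {pred!r}")
    return (LEXICON[pred], LEXICON[a2], LEXICON[a1])


def generate_fr(structure: tuple[str, str, str]) -> str:
    """Realise plaire(SUBJ, IOBJ) as 'SUBJ plaît à IOBJ'."""
    verb, subj, iobj = structure
    return f"{subj} {VERB_FORMS[verb]} à {iobj}"


print(generate_fr(transfer_like("like", "John", "Mary")))
```

The swap happens at the level of the predicate's arguments rather than the surface words, which is the point of transfer: the rule is stated once for the verb pair instead of once per sentence.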

  19. Examples of MT methods: Interlingua John likes Mary → Marie plaît à Jean [Diagram: shared representation with predicate lex=like/plaire; argument sem=Experiencer, lex=John/Jean; argument sem=Patient, lex=Mary/Marie]

  20. Examples of MT methods: EBMT Data-driven, compiles probabilities for translations … Needs: • bilingual aligned corpora; • find best match(es) of $_source; • establish translational equivalents; • recombine to generate $_target.

  21. EBMT - translation chunks • Sentence aligned: The man swims → L’homme nage. The woman laughs → La femme rit. • Sub-sententially aligned: the man → L’homme, swims → nage, the → l’, man → homme, the → la, woman → femme, laughs → rit ...

  22. EBMT: deriving translations Let’s now translate The man laughs … Best matches: • the man → L’homme • laughs → rit Combined together, we get: L’homme rit. Great, can you see any problems?! We can fix these by looking on the Web …
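
The match-and-recombine step can be sketched as a greedy longest-match lookup over the chunk table from the previous slide. This is a toy illustration, not how any particular EBMT system works: the names `CHUNKS` and `translate_ebmt` are assumptions, and the ambiguous standalone entry for "the" (l’ / la) is deliberately left out of the table.

```python
# Chunk table built from the aligned examples on the slides.
# Keys are tuples of lower-cased source words.
CHUNKS = {
    ("the", "man"): "l'homme",
    ("the", "woman"): "la femme",
    ("man",): "homme",
    ("woman",): "femme",
    ("swims",): "nage",
    ("laughs",): "rit",
}


def translate_ebmt(sentence: str) -> str:
    """Cover the input left to right with the longest known chunks."""
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest span starting at position i first.
        for n in range(len(words) - i, 0, -1):
            key = tuple(words[i:i + n])
            if key in CHUNKS:
                output.append(CHUNKS[key])
                i += n
                break
        else:
            raise KeyError(f"no chunk covers {words[i]!r}")
    text = " ".join(output)
    return text[:1].upper() + text[1:]


print(translate_ebmt("The man laughs"))
```

Because the chunks are simply concatenated, nothing checks that neighbouring chunks agree with each other, which is the kind of boundary problem the slide hints at.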

  23. Web Validation of Translations Input string: the personal computers Chunks retrieved: • personal computers → ordinateurs personnels • the → le / la / l’ / les Via Altavista, we get: • Les ordinateurs personnels: 980 hits • L’ ordinateurs personnels: 0 hits • La ordinateurs personnels: 0 hits • Le ordinateurs personnels: 0 hits
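
The selection step on this slide amounts to ranking candidate strings by how often they occur on the Web. The sketch below hard-codes the hit counts reported on the slide instead of querying a search engine; `HIT_COUNTS`, `hit_count` and `best_candidate` are hypothetical names, and a real system would replace the lookup with a call to some search service.

```python
# Hit counts as reported on the slide (Altavista); stands in for a live query.
HIT_COUNTS = {
    "les ordinateurs personnels": 980,
    "l' ordinateurs personnels": 0,
    "la ordinateurs personnels": 0,
    "le ordinateurs personnels": 0,
}


def hit_count(phrase: str) -> int:
    """Look up the recorded number of hits for a candidate phrase."""
    return HIT_COUNTS.get(phrase.lower(), 0)


def best_candidate(determiners: list[str], noun_phrase: str) -> str:
    """Combine each determiner with the noun phrase and keep the most attested one."""
    candidates = [f"{det} {noun_phrase}" for det in determiners]
    return max(candidates, key=hit_count)


print(best_candidate(["le", "la", "l'", "les"], "ordinateurs personnels"))
```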

  24. Examples of MT methods: SMT Needs: • bilingual aligned corpora; • statistical models of languages and translation. Works by assuming that French is like English in a noisy channel, i.e. in code! cf. Speech Processing models!
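
The noisy-channel idea is usually written as follows; the symbols are assumptions not used on the slide. Here f is the observed French sentence, e ranges over candidate English sentences, P(e) is the language model and P(f | e) is the translation model, i.e. the two statistical models listed under "Needs".

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \frac{P(e)\, P(f \mid e)}{P(f)}
        \;=\; \arg\max_{e} P(e)\, P(f \mid e)
```

The denominator P(f) can be dropped because it is the same for every candidate e.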

  25. Examples of MT methods: Hybridity Rule-based Methods: • generate good translations (if it works!); • encode rule-based phenomena: sent(Num) --> nounphrase(Num), verbphrase(Num).

  26. Examples of MT methods: Hybridity Statistical Methods: • are robust; • can get a lot right automatically; • don’t need specialised linguistic knowledge of source, target, and how they relate to one another. So let’s choose the best bits from each ...

  27. Do MT systems translate word-for-word? translate([], []). translate([Head1|Tail1], [Head2|Tail2]) :- biling_lex(Head1, Head2), translate(Tail1, Tail2). biling_lex(john, jean). biling_lex(swims, nage). etc etc …. Well, the MT systems we’re using are a black box (as opposed to a glass box), so we can’t look at the rules to tell definitively …

  28. Translating word-for-word How can we tell then? Compare the input and the output for a suite of test sentences and try and work out what’s going on …

  29. Translating word-for-word If on-line MT systems did translate word-for-word, they would: • pick the most likely translation of each word each time (i.e. no translational variation ever); • we could build up the translation of the sentence compositionally. • Let’s see if this is what happens by looking at some real systems ...

  30. Translating word-for-word Let’s translate We have just finished reading this book into French. Word-for-word we get (from Babelfish): we: nous, have: ayez, just: juste, finished: fini, reading: lecture, this: ceci, book: livre. Model 0 Translation: Nous ayez juste fini lecture ceci livre - hopeless!
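
The "Model 0" translation above is just a dictionary lookup applied to each word in turn. A small sketch, using only the word pairs listed on this slide (the names `BILING_LEX` and `translate_word_for_word` are assumptions):

```python
# Single-word translations as returned by Babelfish on the slide.
BILING_LEX = {
    "we": "nous",
    "have": "ayez",
    "just": "juste",
    "finished": "fini",
    "reading": "lecture",
    "this": "ceci",
    "book": "livre",
}


def translate_word_for_word(sentence: str) -> str:
    """Replace each source word by its single dictionary translation."""
    words = sentence.lower().split()
    text = " ".join(BILING_LEX[word] for word in words)
    return text[:1].upper() + text[1:]


print(translate_word_for_word("We have just finished reading this book"))
```

If an on-line system really worked this way, its output for the full sentence would match this string; comparing the two is the test applied on the following slides.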

  31. Translating word-for-word Let’s give the MT system larger chunks: • we have: nous avons • just finished reading: lecture finie just • this book: ce livre • have just finished reading: ont juste fini la lecture • have just … this book: ont juste … ce livre

  32. Translating word-for-word Typing in the whole sentence, we get: nous avons juste fini de lire ce livre, not bad! Capitalizing the ‘we’ and adding a full stop makes no difference to the translation here. Oracle translation: nous venons de finir de lire ce livre, so you can see Babelfish hasn’t done too badly here ...

  33. Translating word-for-word Let’s try another sentence, The thief was kicking the policeman. Word-for-word we get (from Reverso): the: le, thief: Voleur, was: Était, kicking: coup de pied, policeman: policier. Model 0 Translation: le Voleur Était coup de pied le Policier, not very good!

  34. Translating word-for-word Building the translation up compositionally: the thief:Le voleur, was kicking:Donnait un coup de pied, the policeman:Le policier Final translation: Le voleur donnait un coup de pied le policier, pretty good!

  35. EN→FR = FR→EN?! • That is, do both components use the same rules and dictionaries? • Are the translation components reversible? • Are the structural and lexical rules bidirectional? Only one way to find out … let’s see!

  36. EN→FR = FR→EN?! For our 2 strings, we get: Babelfish: Nous venons de finir de lire ce livre Reverso: Nous venons de finir de lire ce livre --------------------------------------------------------------- Reverso: Le voleur donnait un coup de pied au policier Babelfish: Le voleur donnait un coup de pied le policier

  37. EN→FR = FR→EN?! Let’s see the pairwise translations. Babelfish: • We have just finished reading this book → Nous avons juste fini de lire ce livre • Nous venons de finir de lire ce livre → We have just finished reading this book Aha!

  38. EN→FR = FR→EN?! Babelfish, 2nd sentence pair: • The thief was kicking the policeman → Le voleur donnait un coup de pied le policier • Le voleur donnait un coup de pied au policier → The robber gave a kick to the police officer Aha!

  39. EN→FR = FR→EN?! Reverso, 1st sentence pair: • We have just finished reading this book → Nous venons de finir de lire ce livre • Nous venons de finir de lire ce livre → We have just stopped reading this book Aha!

  40. EN→FR = FR→EN?! Reverso, 2nd sentence pair: • The thief was kicking the policeman → Le voleur donnait un coup de pied au policier • Le voleur donnait un coup de pied au policier → The thief kicked the policeman Aha!
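
One way to read the four sentence-pair slides is as a simple reversibility check: if a system used the same rules in both directions, the two French strings would match exactly when the two English strings match. The sketch below encodes the outputs shown on the slides and flags each pair; the labels "consistent", "asymmetric" and "inconclusive" and the names `OBSERVATIONS`, `norm` and `verdict` are assumptions made for this illustration.

```python
def norm(text: str) -> str:
    """Ignore case and spacing differences when comparing strings."""
    return " ".join(text.lower().split())


# (system, English input, French output, French input, English output),
# copied from the sentence-pair slides.
OBSERVATIONS = [
    ("Babelfish", "We have just finished reading this book",
     "Nous avons juste fini de lire ce livre",
     "Nous venons de finir de lire ce livre",
     "We have just finished reading this book"),
    ("Babelfish", "The thief was kicking the policeman",
     "Le voleur donnait un coup de pied le policier",
     "Le voleur donnait un coup de pied au policier",
     "The robber gave a kick to the police officer"),
    ("Reverso", "We have just finished reading this book",
     "Nous venons de finir de lire ce livre",
     "Nous venons de finir de lire ce livre",
     "We have just stopped reading this book"),
    ("Reverso", "The thief was kicking the policeman",
     "Le voleur donnait un coup de pied au policier",
     "Le voleur donnait un coup de pied au policier",
     "The thief kicked the policeman"),
]


def verdict(en_in: str, fr_out: str, fr_in: str, en_out: str) -> str:
    """Classify one EN->FR / FR->EN pair of observations."""
    same_fr = norm(fr_out) == norm(fr_in)
    same_en = norm(en_in) == norm(en_out)
    if same_fr and same_en:
        return "consistent"    # both directions agree
    if same_fr != same_en:
        return "asymmetric"    # one side matches, the other does not
    return "inconclusive"      # different strings on both sides


for system, *pair in OBSERVATIONS:
    print(system, "->", verdict(*pair))
```

On this tiny test suite none of the pairs comes out as consistent, which is in line with the "Aha!" on each slide.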
