Web-Based Machine Translation Andy Way School of Computing Email: away@computing.dcu.ie URL: www.computing.dcu.ie/~away Room: L245 Phone: (700)5644
Plan of Attack (1) • What is MT? • Why do we do it? How much is it used? How much more could it be used? • Is it any good? What exactly is it good for? What is it not good for? • What MT methods are there? • Do on-line MT systems translate word-for-word? How might we be able to tell?
Plan of Attack (2) • Do pairs of on-line MT systems work the same in both directions? • How can we help these MT systems help us? • The Future (?!) • Further Reading/More Information
What is MT? MT = FAHQMT (fully automatic high-quality MT) MAHT (machine-aided human translation: on-line dictionaries, termbanks, TM etc …) CAT (computer-aided translation) HAMT (human-aided MT: resolving ambiguity etc …)
Why do we do MT? • To communicate in languages other than the ones we know … • (If we’re a company) To increase/maintain market share • To speed up the translation process • etc etc ...
How much is it used? • In 2000, MT specialist Scott Bennett said “Altavista's BabelFish... initiated in late 1997, is now used a million times per day”. • In 2001, Softissimo announced that the Internet translation request volume processed by its Reverso translation engine (www.reverso.net) has now reached several million translation requests (of Web pages, e-mail, short texts and results of search engine requests) per month on its mail translation portal and the portals of its Internet partners. • V.d. Meer (2003): "Every day, portals like Altavista and Google process nearly 10 million requests for automatic translation."
How much more could it be used? • Volume of text required to be translated currently exceeds translators’ capacity (demand outstrips supply). This imbalance will only get worse, cf. accession of new Member states in EU. • NB, also Official Languages Act 2003 Solution: automation (the only solution).
How much more could it be used? • translation and localisation industry have focussed on product documentation which represents probably less than 20% of all text-based information repositories that need to be localised • time: five times the volume of text needs to be translated in practically no time. Corporate decision makers will have to begin supporting multilingual communication initiatives and strategies.
How much more could it be used? • GIL market growing from $4.2 billion in 2001 to $8.9 billion in 2006, an annual growth rate of 16.3%. Localisation and translation services form by far the largest part of this market with 69.8% of the total, i.e. $2.9 billion in 2001 and $5.8 billion in 2006, an annual growth rate of 14.6%. • W.r.t. crosslingual applications, expected to grow from less than 1% of the total market in 2001 ($42 million) to $193 million in 2006, 35% annual growth.
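The annual growth rates quoted above can be sanity-checked from the endpoint figures. The sketch below assumes plain compound growth over the five years from 2001 to 2006; the function name `cagr` and the dictionary labels are illustrative, not from the source. Under that assumption the implied rates land close to, though not exactly on, the quoted percentages, presumably because the source figures are rounded.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    return (end / start) ** (1 / years) - 1


# Endpoint figures quoted on the slide (2001 -> 2006 = five compounding years).
MARKETS = {
    "GIL market ($bn)": (4.2, 8.9),
    "Localisation and translation services ($bn)": (2.9, 5.8),
    "Crosslingual applications ($m)": (42, 193),
}

for name, (v2001, v2006) in MARKETS.items():
    print(f"{name}: {cagr(v2001, v2006, 5):.1%} per year")
```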
Is MT any good? (1) Depends … what you want to use it for and how you use it!! [Diagram labels: Cost, Input, MT, Output]
Is MT any good? (2) • No pre-editing → Lots of post-editing! • Lots of pre-editing → No(t much) post-editing! GARBAGE IN, GARBAGE OUT!!!
Is MT any good? (3) • Sometimes no pre-editing is required: • for gisting; • for company-internal circulation; • etc etc … • What it’s not good for is literary translation, i.e. won’t take translators’ jobs - will free them up for new (more interesting) tasks and create new niche markets
MT Developers • So MT is of use, and will be used much more than it is currently, so … • … we need people out there who can improve current systems and develop new ones. Let’s look at how people currently “design” MT systems …
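MT Methods • Rule-Based MT: Transfer, Interlingua • Data-Driven MT: EBMT, SMT
The Vauquois Pyramid for MT [Diagram: Interlingua at the apex; Analysis up the $_source side; Generation down the $_target side; Transfer across the middle; Direct translation along the base from $_source to $_target]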
Examples of MT methods: Transfer English SVO, Irish VSO, Japanese SOV. So translation between them is complicated by facts about word order. But at a ‘deeper’ level, the languages are more similar ...
Transfer (cont’d) e.g. John saw Mary ↔ Chonaic Seán Máire. English tree: S with HEAD see, SUBJ John, OBJ Mary. Irish tree: S with GOV feic, SUBJ Seán, OBJ Máire.
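A minimal sketch of the idea, assuming three-word clauses only: analyse the source into a head/subject/object structure, swap the words via a bilingual lexicon, then generate using the target language's word order. The names `Clause`, `WORD_ORDER` and `LEXICON` are illustrative, and the lexicon maps surface forms directly (saw to chonaic) rather than going through the lemmas see/feic shown in the trees.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    head: str  # the verb (HEAD/GOV in the trees)
    subj: str
    obj: str


# Toy English -> Irish lexicon built from the slide's example (assumption:
# surface forms are mapped directly, with no tense or inflection handling).
LEXICON = {"saw": "chonaic", "John": "Seán", "Mary": "Máire"}

# Surface word order per language, as a sequence of Clause field names.
WORD_ORDER = {
    "en": ("subj", "head", "obj"),  # SVO
    "ga": ("head", "subj", "obj"),  # VSO
    "ja": ("subj", "obj", "head"),  # SOV
}


def analyse(sentence: str, lang: str) -> Clause:
    """Assign roles by position; only works for three-word clauses."""
    return Clause(**dict(zip(WORD_ORDER[lang], sentence.split())))


def transfer(clause: Clause, lexicon: dict) -> Clause:
    """Lexical transfer: the structure is unchanged, only the words differ."""
    return Clause(lexicon[clause.head], lexicon[clause.subj], lexicon[clause.obj])


def generate(clause: Clause, lang: str) -> str:
    """Linearise the structure in the target language's word order."""
    text = " ".join(getattr(clause, role) for role in WORD_ORDER[lang])
    return text[:1].upper() + text[1:]


def translate(sentence: str, src: str, tgt: str) -> str:
    return generate(transfer(analyse(sentence, src), LEXICON), tgt)


print(translate("John saw Mary", "en", "ga"))
```

Word order lives entirely in `WORD_ORDER`, so the transfer step never has to mention it; that is the sense in which the languages are more similar at the ‘deeper’ level.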
Examples of MT methods: Transfer e.g. John (SUBJ) likes Mary (OBJ) ↔ Marie (SUBJ) plaît à Jean (IOBJ). Rule: like(A1,A2) ↔ plaire(A2’,A1’), i.e. arguments are switched.
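The argument-switching rule can be sketched as a table entry that names the target predicate and says which source argument fills each target slot. This is only an illustration: `TRANSFER_RULES` and `ARG_LEXICON` are made-up names, and the `see` entry is a hypothetical contrast case that is not on the slide.

```python
# Each rule: source predicate -> (target predicate, argument permutation).
# The permutation lists, for each target slot, the index of the source argument.
TRANSFER_RULES = {
    "like": ("plaire", (1, 0)),  # like(A1, A2) -> plaire(A2', A1')
    "see": ("voir", (0, 1)),     # hypothetical rule: argument order preserved
}

# Toy lexicon for the arguments themselves (A1 -> A1', A2 -> A2').
ARG_LEXICON = {"John": "Jean", "Mary": "Marie"}


def transfer(pred: str, args: tuple) -> tuple:
    """Apply the structural rule, then translate each argument."""
    target_pred, order = TRANSFER_RULES[pred]
    return target_pred, tuple(ARG_LEXICON[args[i]] for i in order)


print(transfer("like", ("John", "Mary")))
```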
Examples of MT methods: Interlingua John likes Mary ↔ Marie plaît à Jean. Shared representation: predicate lex=like/plaire; sem=Experiencer, lex=John/Jean; sem=Patient, lex=Mary/Marie.
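One way to picture the interlingua approach: a single language-neutral structure keyed by semantic role, from which each language generates its own sentence. In this sketch the concept labels (`LIKE`, `JOHN`, `MARY`) and the per-language templates are assumptions made for illustration only.

```python
# Language-neutral meaning: a predicate plus role-labelled participants.
MEANING = {"pred": "LIKE", "experiencer": "JOHN", "patient": "MARY"}

# Per-language lexicalisation of each concept.
LEX = {
    "en": {"LIKE": "likes", "JOHN": "John", "MARY": "Mary"},
    "fr": {"LIKE": "plaît à", "JOHN": "Jean", "MARY": "Marie"},
}

# Per-language realisation: English makes the Experiencer the subject,
# French 'plaire' makes the Patient the subject.
TEMPLATES = {
    "en": "{experiencer} {pred} {patient}",
    "fr": "{patient} {pred} {experiencer}",
}


def generate(meaning: dict, lang: str) -> str:
    """Lexicalise every role, then slot the words into the language template."""
    words = {role: LEX[lang][concept] for role, concept in meaning.items()}
    return TEMPLATES[lang].format(**words)


print(generate(MEANING, "en"))
print(generate(MEANING, "fr"))
```

Unlike the transfer rule, nothing here mentions a language pair: adding a third language means adding one lexicon and one template.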
Examples of MT methods: EBMT Data-driven, compiles probabilities for translations … Needs: • bilingual aligned corpora; • find best match(es) of $_source; • establish translational equivalents; • recombine to generate $_target.
EBMT - translation chunks • Sentence aligned: The man swims ↔ L’homme nage. The woman laughs ↔ La femme rit. • Sub-sententially aligned: the man ↔ L’homme, swims ↔ nage, the ↔ l’, man ↔ homme, the ↔ la, woman ↔ femme, laughs ↔ rit ...
EBMT: deriving translations Let’s now translate The man laughs … Best matches: • the man ↔ L’homme • laughs ↔ rit Combined together, we get: L’homme rit Great, can you see any problems?! We can fix these by looking on the Web …
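The match-and-recombine step can be sketched as a greedy longest-match lookup over the aligned chunks. This is an illustration under simplifying assumptions, not how any particular EBMT system works: the chunk table `CHUNKS` is limited to unambiguous entries (the ambiguous the ↔ l’/la pairs are left out), and chunks are simply concatenated in source order.

```python
# Aligned chunks from the two example sentence pairs (ambiguous 'the' omitted).
CHUNKS = {
    ("the", "man"): "l'homme",
    ("the", "woman"): "la femme",
    ("man",): "homme",
    ("woman",): "femme",
    ("swims",): "nage",
    ("laughs",): "rit",
}


def translate(sentence: str) -> str:
    """Cover the input left to right, always taking the longest known chunk."""
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        for j in range(len(words), i, -1):  # try the longest span first
            chunk = tuple(words[i:j])
            if chunk in CHUNKS:
                out.append(CHUNKS[chunk])
                i = j
                break
        else:  # no span starting at position i is in the chunk table
            raise KeyError(f"no chunk covers {words[i]!r}")
    text = " ".join(out)
    return text[:1].upper() + text[1:]


print(translate("The man laughs"))
```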
Web Validation of Translations Input string: the personal computers Chunks retrieved: • personal computers ↔ ordinateurs personnels • the ↔ le / la / l’ / les Via Altavista, we get: • Les ordinateurs personnels: 980 hits • L’ ordinateurs personnels: 0 hits • La ordinateurs personnels: 0 hits • Le ordinateurs personnels: 0 hits
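The selection step amounts to picking the candidate with the most hits. In the sketch below the search engine is deliberately stubbed out: `hit_count` just looks up the counts reported on the slide, and both it and `validate` are hypothetical names rather than any real search API.

```python
# Hit counts reported on the slide; a live system would query a search engine.
HIT_COUNTS = {
    "les ordinateurs personnels": 980,
    "l' ordinateurs personnels": 0,
    "la ordinateurs personnels": 0,
    "le ordinateurs personnels": 0,
}


def hit_count(phrase: str) -> int:
    """Stand-in for a web search; phrases not in the table count as 0 hits."""
    return HIT_COUNTS.get(phrase.lower(), 0)


def validate(alternatives: list, rest: str) -> str:
    """Build one candidate per alternative and keep the best-attested one."""
    candidates = [f"{alt} {rest}" for alt in alternatives]
    return max(candidates, key=hit_count)


print(validate(["le", "la", "l'", "les"], "ordinateurs personnels"))
```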
Examples of MT methods: SMT Needs: • bilingual aligned corpora; • statistical models of languages and translation. Works by assuming that French is like English in a noisy channel, i.e. in code! cf. Speech Processing models!
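The noisy-channel idea is usually written as follows, with $f$ the French sentence we observe and $e$ a candidate English sentence (the symbols are the conventional ones, not taken from the slide). $P(e)$ is the language model and $P(f \mid e)$ the translation model, i.e. the two statistical models listed under "Needs":

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \frac{P(e)\,P(f \mid e)}{P(f)}
        \;=\; \arg\max_{e} P(e)\,P(f \mid e)
```

The denominator $P(f)$ can be dropped because it is the same for every candidate $e$.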
Examples of MT methods: Hybridity Rule-based Methods: • generate good translations (if they work!); • encode rule-based phenomena: sent(Num) --> nounphrase(Num), verbphrase(Num).
Examples of MT methods: Hybridity Statistical Methods: • are robust; • can get a lot right automatically; • don’t need specialised linguistic knowledge of source, target, and how they relate to one another. So let’s choose the best bits from each ...
Do MT systems translate word-for-word?
% base case: an empty sentence translates to an empty sentence
translate([], []).
% translate the first word via the lexicon, then the rest of the sentence
translate([Head1|Tail1], [Head2|Tail2]) :- biling_lex(Head1, Head2), translate(Tail1, Tail2).
biling_lex(john, jean).
biling_lex(swims, nage).
etc etc ….
Well, the MT systems we’re using are a black box (as opposed to a glass box), so we can’t look at the rules to tell definitively …
Translating word-for-word How can we tell then? Compare the input and the output for a suite of test sentences and try and work out what’s going on …
Translating word-for-word If on-line MT systems did translate word-for-word, then: • they would pick the most likely translation of each word each time (i.e. no translational variation ever); • we could build up the translation of the sentence compositionally. • Let’s see if this is what happens by looking at some real systems ...
Translating word-for-word Let’s translate We have just finished reading this book into French. Word-for-word we get (from Babelfish): we:nous, have:ayez, just:juste, finished:fini, reading:lecture, this:ceci, book:livre Model 0 Translation: Nous ayez juste fini lecture ceci livre - hopeless!
Translating word-for-word Let’s give the MT system larger chunks: we have:nous avons, just finished reading:lecture finie just, this book:ce livre; have just finished reading:ont juste fini la lecture; have just … this book:ont juste … ce livre
Translating word-for-word Typing in the whole sentence, we get: nous avons juste fini de lire ce livre, not bad! Capitalizing the ‘we’ and adding a full stop makes no difference to the translation here. Oracle translation: nous venons de finir de lire ce livre, so you can see Babelfish hasn’t done too badly here ...
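The black-box test used here can be sketched as follows: build the "Model 0" translation from the single-word outputs, then compare it with what the system returns for the whole sentence. The lexicon and the whole-sentence output are the Babelfish results reported above; the function name `model0` and the pass-through of unknown words are assumptions of this sketch.

```python
# Single-word outputs reported for Babelfish.
WORD_LEXICON = {
    "we": "nous", "have": "ayez", "just": "juste", "finished": "fini",
    "reading": "lecture", "this": "ceci", "book": "livre",
}

# What the system returned when given the whole sentence.
WHOLE_SENTENCE_OUTPUT = "nous avons juste fini de lire ce livre"


def model0(sentence: str, lexicon: dict) -> str:
    """Word-for-word translation; unknown words are passed through unchanged."""
    return " ".join(lexicon.get(word, word) for word in sentence.lower().split())


word_for_word = model0("We have just finished reading this book", WORD_LEXICON)
print(word_for_word)
print("word-for-word?", word_for_word == WHOLE_SENTENCE_OUTPUT)
```

If the two strings differed only occasionally we could not conclude much, but here the whole-sentence output changes the verb form, inserts de and picks a different determiner, which a pure word-for-word system could not do.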
Translating word-for-word Let’s try another sentence, The thief was kicking the policeman Word-for-word we get (from Reverso): the:le, thief:Voleur, was:Était, kicking:coup de pied, policeman:policier Model 0 Translation: le Voleur Était coup de pied le Policier, not very good!
Translating word-for-word Building the translation up compositionally: the thief:Le voleur, was kicking:Donnait un coup de pied, the policeman:Le policier Final translation: Le voleur donnait un coup de pied le policier, pretty good!
EN→FR = FR→EN?! • That is, do both components use the same rules and dictionaries? • Are the translation components reversible? • Are the structural and lexical rules bidirectional? Only one way to find out … let’s see!
EN→FR = FR→EN?! For our 2 strings, we get: • Sentence 1: Babelfish: Nous avons juste fini de lire ce livre; Reverso: Nous venons de finir de lire ce livre • Sentence 2: Reverso: Le voleur donnait un coup de pied au policier; Babelfish: Le voleur donnait un coup de pied le policier
EN→FR = FR→EN?! Let’s see the pairwise translations. Babelfish: • EN→FR: We have just finished reading this book → Nous avons juste fini de lire ce livre • FR→EN: Nous venons de finir de lire ce livre → We have just finished reading this book Aha!
EN→FR = FR→EN?! Babelfish, 2nd sentence pair: • EN→FR: The thief was kicking the policeman → Le voleur donnait un coup de pied le policier • FR→EN: Le voleur donnait un coup de pied au policier → The robber gave a kick to the police officer Aha!
EN→FR = FR→EN?! Reverso, 1st sentence pair: • EN→FR: We have just finished reading this book → Nous venons de finir de lire ce livre • FR→EN: Nous venons de finir de lire ce livre → We have just stopped reading this book Aha!
EN→FR = FR→EN?! Reverso, 2nd sentence pair: • EN→FR: The thief was kicking the policeman → Le voleur donnait un coup de pied au policier • FR→EN: Le voleur donnait un coup de pied au policier → The thief kicked the policeman Aha!
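The four comparisons can be summarised with a small check: a system looks reversible on a sentence pair only if EN→FR maps the English onto the reference French and FR→EN maps that French back onto the English. The data below are the outputs reported on these slides; the `Probe` structure and the function names are illustrative.

```python
from typing import NamedTuple

S1_EN = "We have just finished reading this book"
S1_FR = "Nous venons de finir de lire ce livre"
S2_EN = "The thief was kicking the policeman"
S2_FR = "Le voleur donnait un coup de pied au policier"


class Probe(NamedTuple):
    system: str
    english: str   # English test sentence
    french: str    # reference French, used as input for FR->EN
    en_to_fr: str  # system output for the English sentence
    fr_to_en: str  # system output for the reference French


PROBES = [
    Probe("Babelfish", S1_EN, S1_FR,
          "Nous avons juste fini de lire ce livre", S1_EN),
    Probe("Babelfish", S2_EN, S2_FR,
          "Le voleur donnait un coup de pied le policier",
          "The robber gave a kick to the police officer"),
    Probe("Reverso", S1_EN, S1_FR, S1_FR,
          "We have just stopped reading this book"),
    Probe("Reverso", S2_EN, S2_FR, S2_FR,
          "The thief kicked the policeman"),
]


def forward_ok(p: Probe) -> bool:
    return p.en_to_fr == p.french


def backward_ok(p: Probe) -> bool:
    return p.fr_to_en == p.english


def is_reversible(p: Probe) -> bool:
    return forward_ok(p) and backward_ok(p)


for p in PROBES:
    print(f"{p.system}: EN->FR ok={forward_ok(p)}, FR->EN ok={backward_ok(p)}")
```

On this tiny test suite neither system is symmetric on either pair, which suggests (but with four probes cannot prove) that the two directions do not share one set of bidirectional rules.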