CS460/IT632 Natural Language Processing/Language Technology for the Web
Guest Lecture (31/03/06)
Prof. Niladri Chatterjee, IIT Delhi
Guest Lecture on Machine Translation

Machine Translation
[Figure: Source Language -> Machine Translation System (Understanding) -> Target Language]
31/03/06 Prof. Pushpak Bhattacharyya, IIT Bombay

Problems in Machine Translation (MT)
• "I take rice with dal." / "I take rice with my friend."
  • Same syntax but different semantics
  • Polysemy: "with" has a different sense in each sentence
• "The computer prints data. It is fast." / "The computer prints data. It is numeric."
  • "It" refers to something different in each case: the computer in the first, the data in the second.

Problem with Multilingual MT Systems
Suppose we have a multilingual MT system with N languages.
• Translating directly between every pair of languages requires O(N^2) translators.
• Interlingua (IL): an intermediate language that captures the semantics.
• The translation is: SL -> IL -> TL
• The number of translators required drops to 2N, i.e. O(N): one into the IL and one out of the IL per language.

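The difference between the two counts can be checked with a few lines of Python. This is only an illustrative sketch: the function names are made up here, and the direct count assumes one translator per ordered language pair (so English -> Hindi and Hindi -> English are counted separately).

```python
def direct_translators(n: int) -> int:
    """Direct approach: one translator per ordered pair of distinct languages."""
    return n * (n - 1)


def interlingua_translators(n: int) -> int:
    """Interlingua approach: one analyser (SL -> IL) and one generator (IL -> TL) per language."""
    return 2 * n


if __name__ == "__main__":
    print("N  direct  interlingua")
    for n in (2, 5, 10, 20):
        print(n, direct_translators(n), interlingua_translators(n))
```

For N = 10 the direct approach needs 90 translators against 20 with an interlingua; the gap widens quadratically as languages are added.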
Other Approaches for MT
• Word Based Approach
• Rule Based Approach
• Statistical Approach
• Generation-Heavy Approach
• Example Based Approach

Example Based Approach
• Maintain a knowledge base of translation examples.
• Given an input, apply a similarity metric to pick a close match.
• Adapt the retrieved translation to suit the current requirement.

Example: English to Bengali translation using the Example Based Approach
- Ram goes to school
  Ram bidyalaya jaay
- Ram goes home
  Ram bari jaay
- Sita goes to school
  ? (guess to get a feel)

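The retrieve-then-adapt idea behind this example can be sketched in Python. Everything beyond the two example pairs is an assumption made for illustration: the word-overlap similarity (via the standard library's `difflib.SequenceMatcher`), the function names, and the tiny `LEXICON`, which assumes proper names carry over unchanged. It is not the lecturer's actual system.

```python
from difflib import SequenceMatcher

# Example base taken from the slide: (English source, Bengali target).
EXAMPLES = [
    ("Ram goes to school", "Ram bidyalaya jaay"),
    ("Ram goes home", "Ram bari jaay"),
]

# Hypothetical bilingual lexicon; assumes names are carried over as-is.
LEXICON = {"Ram": "Ram", "Sita": "Sita"}


def similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1] (an assumed metric, not from the slides)."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def retrieve(sentence: str, examples=EXAMPLES):
    """Return the stored example whose source side is closest to the input."""
    return max(examples, key=lambda ex: similarity(sentence, ex[0]))


def adapt_by_replacement(sentence: str, example, lexicon=LEXICON) -> str:
    """Word replacement: swap target words for source words that differ."""
    src, tgt = example
    src_words, new_words = src.split(), sentence.split()
    if len(src_words) != len(new_words):
        raise ValueError("this sketch only handles same-length sentences")
    out = tgt.split()
    for old, new in zip(src_words, new_words):
        if old != new:
            out = [lexicon[new] if w == lexicon[old] else w for w in out]
    return " ".join(out)


if __name__ == "__main__":
    query = "Sita goes to school"
    match = retrieve(query)
    print(match[0], "->", adapt_by_replacement(query, match))
```

Under these assumptions the query retrieves "Ram goes to school" (three shared words) rather than "Ram goes home" (one shared word), and replacing the subject yields "Sita bidyalaya jaay".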
Some considerations
• Which similarity measure should be used?
• What are the adaptation strategies?

Typical Techniques Used
• Word Deletion
  • Ram eats rice with spoon.
    Ram chamach diye bhaat khaaye
  • Ram eats rice
    ? (guess it, given that the dictionary gives "chamach" as the Bengali word for spoon)
• Word Addition
• Word Replacement
• Word Swapping

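Word deletion, the first technique above, can be sketched as follows. The `ALIGNMENT` table is hypothetical: the slide only states that the dictionary gives "chamach" for "spoon", so the other word pairs (in particular "with" -> "diye") are assumptions added to make the sketch runnable.

```python
# Example pair from the slide.
SOURCE = "Ram eats rice with spoon"
TARGET = "Ram chamach diye bhaat khaaye"

# Hypothetical word alignment between the example's source and target words.
# Only spoon -> chamach is given on the slide; the rest is assumed.
ALIGNMENT = {
    "Ram": "Ram",
    "eats": "khaaye",
    "rice": "bhaat",
    "with": "diye",
    "spoon": "chamach",
}


def adapt_by_deletion(new_sentence: str, source: str = SOURCE,
                      target: str = TARGET, alignment=ALIGNMENT) -> str:
    """Drop target words aligned to source words absent from the new sentence."""
    new_words = new_sentence.split()
    missing = [w for w in source.split() if w not in new_words]
    drop = {alignment[w] for w in missing}
    return " ".join(w for w in target.split() if w not in drop)


if __name__ == "__main__":
    print(adapt_by_deletion("Ram eats rice"))
```

With this alignment, removing "with" and "spoon" from the source removes "diye" and "chamach" from the retrieved translation, leaving "Ram bhaat khaaye". Word addition, replacement and swapping would need the same kind of alignment information.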
A simple assumption
"Sentences of similar structure in the source language have a similar structure in the target language."

Problems with the assumption
• Translation Divergence
  • It is running
    Wah bhaag raha hai
  • It is raining
    Baarish ho rahi hai
• Structural Divergence
  • Ram will attend the meeting
    Ram sabha mein jayegaa
  • Ram will go to school
    Ram school jayegaa

Problems (contd.)
• Promotional Divergence
  • The fan is on [adverb]
    Pankha chal [verb] raha hai
  • The fan is good [adjective]
    Pankha achcha [adjective] hai
• Conflational Divergence (conflate: to merge into one)
  • A single SL word needs several TL words to convey the same meaning.
  • Ram killed Ravana
    Ram ne Ravan ko mara => no divergence
  • Ram stabbed Ravana
    Ram ne Ravan ko chaku se mara => divergence

Problems (contd.)
• Categorial Divergence
  • She is hungry
    Use bhookh lagi hai
  • She is beautiful
    Wah sundar hai
• Divergence occurs in approximately 12% of sentences.

Solution to Divergence
• Classify each input as a standard or a divergent translation.
• Measure the similarity of the input sentence against two example databases.
• Example
  • She is in panic
  • She is in trouble
  • She is in pain
• Present all the solutions to the user.

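One possible reading of this two-database scheme is sketched below. The split into a "standard" and a "divergent" example base, the word-overlap similarity, the `margin` threshold and all function names are assumptions for illustration; the example pairs are taken from the earlier divergence slides.

```python
from difflib import SequenceMatcher

# Two small example bases built from sentence pairs shown on earlier slides.
STANDARD = [
    ("She is beautiful", "Wah sundar hai"),
    ("The fan is good", "Pankha achcha hai"),
]
DIVERGENT = [
    ("She is hungry", "Use bhookh lagi hai"),
    ("The fan is on", "Pankha chal raha hai"),
]


def similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1] (an assumed metric)."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def best_match(sentence: str, examples):
    """Return (score, source, target) of the closest example."""
    return max((similarity(sentence, src), src, tgt) for src, tgt in examples)


def candidates(sentence: str, margin: float = 0.1):
    """Label the input; if both bases match about equally well, return both."""
    std = best_match(sentence, STANDARD)
    div = best_match(sentence, DIVERGENT)
    if abs(std[0] - div[0]) <= margin:
        return [("standard", std), ("divergent", div)]  # leave the choice to the user
    return [("standard", std)] if std[0] > div[0] else [("divergent", div)]


if __name__ == "__main__":
    for s in ("The fan is on", "She is in pain"):
        print(s, "->", [label for label, _ in candidates(s)])
```

In this sketch "The fan is on" is labelled divergent, while "She is in pain" is equally close to "She is beautiful" and "She is hungry", so both candidates are returned, which mirrors the last bullet: when surface similarity cannot decide, present all the solutions to the user.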
Adaptation Problem
• There is more morphological variation in Hindi than in English.

Divergence Identification
• Seven types of divergence between Hindi and English are defined.
• Based on 7,000-8,000 sentences

Word Sense Disambiguation
• I saw the man with binoculars
• Preserve the ambiguity in the translation rather than resolving it.