
Cross-Lingual Image Search on the Web

Kobi Reiter, Stephen Soderland, Oren Etzioni. Turing Center, Computer Science and Engineering, University of Washington.

Presentation Transcript


  1. Cross-Lingual Image Search on the Web. Kobi Reiter, Stephen Soderland, Oren Etzioni, Turing Center, Computer Science and Engineering, University of Washington

  2. Limitations to Monolingual Image Search • Limited Resource Languages: Slovenian query ‘grenivka’ (grapefruit)

  3. Slovenian query: ‘grenivka’ (grapefruit). Only 24 results, of which 9 show grapefruit

  4. Limitations to Monolingual Image Search • Limited Resource Languages: Slovenian query ‘grenivka’ (grapefruit) • Cross-Cultural Images: Search for images of ‘food’ in different cultures

  5. English query: ‘food’. Finds hamburgers, hot dogs, etc.

  6. Zulu query: ‘ukudla’ (food). Toto, I don’t think we’re in Kansas anymore.

  7. Limitations to Monolingual Image Search • Limited Resource Languages: Slovenian query ‘grenivka’ (grapefruit) • Cross-Cultural Images: Search for images of ‘food’ in different cultures • Cross-Lingual Homonyms: Search for images with the Hungarian word for tooth

  8. Hungarian query: ‘fog’ (tooth). Can’t see any teeth in all this fog.

  9. Limitations to Monolingual Image Search • Limited Resource Languages: Slovenian query ‘grenivka’ (grapefruit) • Cross-Cultural Images: Search for images of ‘food’ in different cultures • Cross-Lingual Homonyms: Search for images with the Hungarian word for tooth • Word Sense Ambiguity: Search in English for ‘spring’ (flexible coil)

  10. English query: ‘spring’. Does not find the intended sense (flexible coil)

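  11. The Solution: PanImages [Figure: the PanImages compiler builds a translation graph from dictionaries. At query time, (1) the user sends a query to the PanImages query processor, (2) the processor returns translations, and (3) the user selects a translation; the translated query then retrieves images.]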

  12. PanImages for Slovenian ‘grenivka’

  13. Translations of ‘grenivka’: the French and English query returns over 40,000 images

  14. Select the Intended Sense of ‘spring’

  15. Translated query for ‘spring’: French ‘ressort’ is unambiguous

  16. Outline of Talk • Overview of PanImages • Building a translation graph • Merging entries from multiple dictionaries • Computing translation probabilities • Image search with a translation graph • Experimental Results • Conclusions

  17. Input from Machine Readable Dictionaries • Multilingual dictionaries: • Each entry has translations in multiple languages • Distinguishes different senses of the word • “Wiktionaries” for 171 languages created by Web volunteers www.wiktionary.org • Esperanto dictionary purl.org/net/voko/revo • Bilingual dictionaries: • Each entry has translations into a single language • May mix together different senses of the word • freedict.org has 64 open source dictionaries

  18. Translation Graph • Nodes in the graph are ordered pairs (w, l), where w is a word in language l • Edges in the graph indicate translations between words • Each edge is labeled with a word sense ID [Figure: edges from ‘spring’ (English) in an English dictionary. Sense 1 links to printemps (French) and primavera (Spanish); sense 2 links to ressort (French) and пружина (Russian).]
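The translation graph on this slide can be sketched as an adjacency map from (word, language) nodes to sense-labeled edges. This is a minimal sketch, not PanImages code: the class name `TranslationGraph`, its methods, and the choice to store every translation edge in both directions are assumptions made for illustration. The example loads the two senses of ‘spring’ shown in the figure.

```python
from collections import defaultdict

# A node is an ordered pair (word, language), as on the slide
Node = tuple[str, str]


class TranslationGraph:
    """Hypothetical sketch of a translation graph with sense-labeled edges."""

    def __init__(self) -> None:
        # node -> set of (neighbor node, sense ID)
        self.adj: dict[Node, set[tuple[Node, int]]] = defaultdict(set)

    def add_entry(self, head: Node, sense_id: int, translations: list[Node]) -> None:
        """Add one dictionary sense: an edge from the headword to each translation."""
        for t in translations:
            # Assumption: translations are symmetric, so store both directions
            self.adj[head].add((t, sense_id))
            self.adj[t].add((head, sense_id))

    def translations(self, node: Node, sense_id: int) -> set[Node]:
        """Words linked to `node` by an edge labeled `sense_id`."""
        return {n for n, s in self.adj.get(node, set()) if s == sense_id}


g = TranslationGraph()
spring = ("spring", "English")
g.add_entry(spring, 1, [("printemps", "French"), ("primavera", "Spanish")])  # season
g.add_entry(spring, 2, [("ressort", "French"), ("пружина", "Russian")])      # coil
print(g.translations(spring, 2))
```

Keeping the sense ID on each edge, rather than on the node, is what lets entries from different dictionaries attach different sense labels to the same word.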

  19. Merging English and French Dictionaries [Figure: merged translation graph for ‘spring’. Season sense: spring (English), printemps (French), primavera (Spanish), ربيع (Arabic), udaherri (Basque), koanga (Maori), with edges labeled 1 and 3. Coil sense: spring (English), ressort (French), veer (Dutch), рысора (Belarusian), vzmet (Slovenian), пружина (Russian), with edges labeled 2 and 4.]

  20. Inferring Word Sense Equivalence • Compute $P(s_i = s_j)$, where $s_i$ and $s_j$ are word senses • Case 1: $s_i$ and $s_j$ are each from dictionaries that • have translations into multiple languages • distinguish word senses • Case 2: $s_i$ or $s_j$ is from a dictionary that either • has translations into a single language • or mixes together word senses

  21. Word Sense Equivalence • Multilingual dictionaries: $P(s_i = s_j)$ is proportional to the degree of overlap between $\mathrm{nodes}(s_i)$ and $\mathrm{nodes}(s_j)$, where nodes(s) is the set of nodes with edges labeled s.
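A minimal sketch of the multilingual case, assuming the overlap is measured as the number of shared nodes divided by the size of the smaller node set. The slide says only that the probability is proportional to the overlap, so this normalization, the function name `overlap_score`, and the example node sets are hypothetical.

```python
Node = tuple[str, str]


def overlap_score(nodes_a: set[Node], nodes_b: set[Node]) -> float:
    """Hypothetical estimate of P(s_i = s_j) for two senses from multilingual,
    sense-distinguished dictionaries: shared nodes divided by the size of the
    smaller node set (the normalization is an assumption)."""
    if not nodes_a or not nodes_b:
        return 0.0
    return len(nodes_a & nodes_b) / min(len(nodes_a), len(nodes_b))


# Hypothetical nodes(s) sets for three senses in a merged graph
season_en = {("spring", "English"), ("printemps", "French"),
             ("primavera", "Spanish"), ("koanga", "Maori")}
season_fr = {("printemps", "French"), ("spring", "English"),
             ("primavera", "Spanish"), ("ربيع", "Arabic"), ("udaherri", "Basque")}
coil_fr = {("ressort", "French"), ("spring", "English"),
           ("пружина", "Russian"), ("vzmet", "Slovenian")}

print(overlap_score(season_en, season_fr))  # 3 shared / 4 = 0.75
print(overlap_score(season_en, coil_fr))    # 1 shared / 4 = 0.25
```

Two season senses that share most of their translations score high, while the season and coil senses share only the headword ‘spring’ and score low.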

  22. Word Sense Equivalence • Bilingual dictionaries (or dictionaries that do not distinguish senses): given a triangle formed by three dictionary entries, $P(s_i = s_j)$ is high. Estimate this probability empirically. Example triangle: xuân (Vietnamese), spring (English), printemps (French)
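One way to detect this triangle evidence is sketched below. It assumes each edge records the dictionary entry it came from (the entry IDs are hypothetical labels) and reports every set of three words that are pairwise linked by three distinct entries. Turning a triangle into a probability is left out, since the slide says that probability is estimated empirically.

```python
from itertools import combinations, product

Node = tuple[str, str]


def find_triangles(edges: list[tuple[Node, Node, str]]) -> list[set[Node]]:
    """Return every set of three words that are pairwise linked by edges
    taken from three distinct dictionary entries."""
    # Unordered word pair -> IDs of the dictionary entries that link the pair
    links: dict[frozenset[Node], set[str]] = {}
    for a, b, entry in edges:
        links.setdefault(frozenset((a, b)), set()).add(entry)

    words = sorted({w for a, b, _ in edges for w in (a, b)})
    triangles = []
    for x, y, z in combinations(words, 3):
        pairs = [frozenset((x, y)), frozenset((y, z)), frozenset((x, z))]
        if not all(p in links for p in pairs):
            continue
        # Require three different entries, one per side of the triangle
        if any(len({e1, e2, e3}) == 3
               for e1, e2, e3 in product(*(links[p] for p in pairs))):
            triangles.append({x, y, z})
    return triangles


# Hypothetical entry IDs of the form "dictionary:headword#sense"
edges = [
    (("xuân", "Vietnamese"), ("spring", "English"), "vi-en:xuân"),
    (("spring", "English"), ("printemps", "French"), "en:spring#1"),
    (("printemps", "French"), ("xuân", "Vietnamese"), "fr:printemps#3"),
    (("spring", "English"), ("ressort", "French"), "en:spring#2"),
]
print(find_triangles(edges))
```

The coil sense (ressort) is not part of any triangle here, so only the season triangle is reported.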

  23. Computing Translation Probabilities • Probability decreases each time the word sense ID changes • Probability increases with multiple distinct paths [Figure: paths in the merged ‘spring’ graph, linking spring (English) to printemps (French), primavera (Spanish), ربيع (Arabic), udaherri (Basque), koanga (Maori), and пружина (Russian).]
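The first rule can be illustrated with a small function. This is a sketch under an assumption the slide does not spell out: a path's probability is taken as the product of the sense-equivalence probabilities P(s_i = s_j) at each point where the sense ID changes, and following edges of the same sense costs nothing. The function name and the equivalence values are hypothetical.

```python
def path_probability(senses: list[int], equiv: dict[frozenset[int], float]) -> float:
    """Hypothetical path score.

    senses: the sense IDs labeling the edges along a path, in order.
    equiv:  maps an unordered pair of sense IDs to an estimated P(s_i = s_j).
    Each change of sense ID multiplies in an equivalence probability, so the
    score can only stay the same or decrease as the path switches senses.
    """
    p = 1.0
    for s, t in zip(senses, senses[1:]):
        if s != t:
            p *= equiv.get(frozenset((s, t)), 0.0)  # unknown pairs: assume 0
    return p


# Hypothetical equivalence estimates between senses of the 'spring' graph
equiv = {frozenset((1, 3)): 0.8, frozenset((1, 4)): 0.1, frozenset((3, 4)): 0.05}

print(path_probability([1, 1], equiv))     # same sense throughout: 1.0
print(path_probability([1, 3], equiv))     # one sense change: 0.8
print(path_probability([1, 3, 4], equiv))  # two changes: 0.8 * 0.05
```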

  24. Using PanImages, Step 1: Select a Language and Word • Select from 50 source languages • Automatic word completion

  25. Step 2: Select a Word Sense and Translation • Select a word sense • Select one or more translations

  26. Step 3: Get Images

  27. Graph Statistics • Translation graph from 17 dictionaries: • English Wiktionary: 19,500 words with translations • French Wiktionary: 12,700 words with translations • Esperanto dictionary: 23,000 words with translations • 14 bilingual dictionaries: average 90,000 words each • Graph has: • 1.4 million words • 957 languages • 60 languages with over 1,000 words

  28. Experiment 1: Sense Merging • Graph from 3 dictionaries: • English Wiktionary • French Wiktionary • Esperanto dictionary • Select 1,000 random English words from the graph • Compare the number of Hebrew or Russian translations • No sense merging: direct edges in graph • Merging: translation paths where • Measure precision of translations gained by merging

  29. Results of Experiment 1 • Sense merging gained: • 90% increase in translations for Hebrew, with precision .83 • 79% increase in translations for Russian, with precision .68

      Language   No merging   Merged   Recall gain  Precision
      Hebrew             80      152           90%       0.83
      Russian           503      900           79%       0.68

  Error analysis: strong assumption that dictionaries distinguish word senses. The probability formulas fail when this assumption is violated.
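The recall-gain column is the relative increase from the no-merging count to the merged count. A quick check of that arithmetic on the slide's counts (the helper name `recall_gain` is illustrative):

```python
def recall_gain(no_merging: int, merged: int) -> float:
    """Relative increase in the number of translations found."""
    return (merged - no_merging) / no_merging


counts = {"Hebrew": (80, 152), "Russian": (503, 900)}  # counts from the slide
for language, (before, after) in counts.items():
    print(f"{language}: {recall_gain(before, after):.0%}")
```

Hebrew gives exactly 90% (72 / 80); Russian gives 397 / 503, which rounds to 79%.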

  30. Experiment 2: Image Search • Graph from 3 dictionaries as in Experiment 1 • 10 concepts with distinctive images: ant, clown, fig, lake, sky, train, eat, run, happy, tired • 100 random non-English terms • 10 terms for each concept • Compare results of Google Image search • using non-English term as search query • using PanImages default translation as search query • Metrics: • Number of results • Precision of first page of results

  31. Results of Experiment 2 • 38-fold increase in the number of images returned • Increases precision from .50 to .69

                Untranslated query     PanImages translation
      Concept    Results  Precision       Results  Precision
      ant          7,415       0.49       653,000       0.72
      clown       91,271       0.80       444,000       1.00
      fig         11,905       0.40     3,210,000       0.33
      lake        53,826       0.61     3,380,000       1.00
      sky        121,373       0.50     2,290,000       0.83
      train       34,168       0.84     1,790,000       1.00
      eat         28,833       0.51       876,000       0.67
      run         30,823       0.23     2,520,000       0.44
      happy       37,056       0.21     3,120,000       0.39
      tired       65,042       0.44       222,000       0.56
      Average     48,171       0.50     1,850,500       0.69
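The Average row and the 38-fold figure can be recomputed from the per-concept rows. This sketch assumes the averages are unweighted means over the 10 concepts, which reproduces the reported values; the names `ROWS` and `column_means` are illustrative.

```python
# (untranslated results, untranslated precision, PanImages results, PanImages precision)
ROWS = {
    "ant": (7_415, 0.49, 653_000, 0.72),
    "clown": (91_271, 0.80, 444_000, 1.00),
    "fig": (11_905, 0.40, 3_210_000, 0.33),
    "lake": (53_826, 0.61, 3_380_000, 1.00),
    "sky": (121_373, 0.50, 2_290_000, 0.83),
    "train": (34_168, 0.84, 1_790_000, 1.00),
    "eat": (28_833, 0.51, 876_000, 0.67),
    "run": (30_823, 0.23, 2_520_000, 0.44),
    "happy": (37_056, 0.21, 3_120_000, 0.39),
    "tired": (65_042, 0.44, 222_000, 0.56),
}


def column_means(rows: dict[str, tuple[float, ...]]) -> list[float]:
    """Unweighted mean of each column over all concepts."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows.values())]


means = column_means(ROWS)
fold = means[2] / means[0]  # ratio of average result counts
print([round(m, 2) for m in means], f"{fold:.1f}-fold")
```

The ratio of average result counts is about 38.4, which matches the slide's "38-fold" claim.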

  32. Conclusions • PanImages: a fully implemented cross-lingual image search system for the Web: www.cs.washington.edu/research/panimages • PanImages boosts recall 38-fold and raises precision (from .50 to .69 in Experiment 2) for non-English search queries • We introduced the translation graph, which • combines multiple machine-readable dictionaries • merges word senses probabilistically across dictionaries • infers translations not found in any source dictionary

  33. Thank you. Demo and poster in the afternoon in room CSE 609, or try it yourself: www.cs.washington.edu/research/panimages

  34. Computing Translation Probabilities • CLISE can find translations that are not in any single source dictionary • Translation probability decreases with each transition to a new word sense ID along a path P from one word to another …

  35. Probability from Multiple Paths • The probability that one word is translated as another in sense s increases when there are multiple paths between them • Paths are combined with a “noisy-or” probability model
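A noisy-or model treats each path as independent evidence: the combination fails only if every path fails, giving 1 - prod(1 - p_k) over the path probabilities p_k. Below is a minimal sketch of that standard formula on hypothetical path probabilities. The slide's own formula was not preserved, so whether PanImages applies it in exactly this form is an assumption.

```python
import math


def noisy_or(path_probs: list[float]) -> float:
    """Standard noisy-or: 1 minus the probability that every path fails,
    treating the paths as independent."""
    return 1.0 - math.prod(1.0 - p for p in path_probs)


print(noisy_or([0.6]))       # a single path keeps its own probability
print(noisy_or([0.6, 0.5]))  # two paths: 1 - 0.4 * 0.5, about 0.8
```

Because each extra factor (1 - p) is at most 1, adding a path can never lower the combined probability, which matches the slide's claim that multiple paths increase it.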
