
Multilingual Information Access in a Digital Library



Presentation Transcript


  1. Multilingual Information Access in a Digital Library
     Vamshi Ambati, Rohini U, Pramod, N Balakrishnan and Raj Reddy
     International Institute of Information Technology, Hyderabad, India

  2. Context
     • Digital Library of India
       • 155,000 English books
       • 145,000 books in other languages
     • Population of literates
       • 20% of India understand English
       • 80% cannot
     IIIT Hyderabad - http://dli.iiit.ac.in

  3. Multilingual Access to Information
     • Retrieve a book
       • By metadata
       • By keyword / content
       • Cross Lingual Information Retrieval
     • Read a book
       • Help understand sentences in a language
       • Help understand sentences across languages
       • Machine Translation

  4. Approaches to Multilingual Access
     • Cross Lingual Retrieval
       • Translate Query to Document Language
       • Translate Document to Query Language
     • Machine Translation
       • Knowledge Based Approaches
       • Corpus Based Approaches
       • Hybrid Approaches
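The first retrieval approach above, translating the query into the document language, can be illustrated with a minimal sketch. This is not the DLI implementation: the romanized Hindi dictionary entries, the sample book titles, and the function names are all hypothetical, and a plain in-memory inverted index stands in for a real search engine.

```python
from collections import defaultdict

# Hypothetical toy bilingual dictionary (romanized Hindi -> English).
# A real system would draw on a far larger resource.
TOY_DICT = {
    "itihas": ["history"],
    "bharat": ["india"],
    "pustak": ["book"],
}

# Hypothetical English-language book titles keyed by an invented id.
DOCS = {
    "b1": "A history of India",
    "b2": "Cooking for beginners",
    "b3": "India travel guide",
}


def translate_query(query, dictionary):
    """Replace each query term by its dictionary translations.

    Terms missing from the dictionary are passed through unchanged.
    """
    translated = []
    for term in query.lower().split():
        translated.extend(dictionary.get(term, [term]))
    return translated


def build_index(docs):
    """Build a simple inverted index: token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(terms, index):
    """Rank documents by how many query terms they contain."""
    scores = defaultdict(int)
    for term in terms:
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    index = build_index(DOCS)
    english_terms = translate_query("bharat itihas", TOY_DICT)
    print(english_terms, search(english_terms, index))
```

Translating the query keeps the cost low, since only a few words are translated per search, whereas translating documents would require processing the whole collection ahead of time.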

  5. Challenges in Multilingual Access
     • Corpus Based Approaches
       • Unavailability of Parallel Corpus for pairs of languages
       • Unavailability of Computational Linguistics Resources
     • Dictionary Based Approaches
       • Unavailability of multiple bilingual dictionaries

  6. Resources
     • Universal Dictionary
       • Conceived and implemented by Michael Shamos at CMU, USA
     • ITRANS
       • A transcription scheme and associated tool built by IISc, IIIT and CMU
     • Corpus
       • Data Entry by TTD and DLI project
       • TIDES project

  7. Universal Dictionary

  8. How are we doing it?
     • Cross Lingual Search (Identify Information)
       • Dictionary lookup
       • User feedback based
       • Lucene Search Engine
     • Machine Translation (Understand Information)
       • Corpus based technique (EBMT)
       • Dictionary based word-word lookup
       • Good-enough translation vs Perfect translation
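The two translation techniques named on this slide can be combined in a rough sketch: look for a close match in a store of example translations, and fall back to word-by-word dictionary lookup otherwise. This is a heavy simplification (full EBMT systems align and recombine fragments of several examples), and the example pair, the word dictionary, the Jaccard similarity measure, and the 0.8 threshold are all assumptions made for illustration.

```python
# Hypothetical parallel examples (romanized Hindi -> English).
EXAMPLES = {
    "yah ek pustak hai": "this is a book",
}

# Hypothetical word-level dictionary used for the fallback.
WORD_DICT = {
    "yah": "this",
    "ek": "a",
    "pustak": "book",
    "hai": "is",
    "ghar": "house",
}


def jaccard(a, b):
    """Token-set overlap between two token lists, in the range [0, 1]."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def translate(sentence, examples, word_dict, threshold=0.8):
    """Return (translation, method).

    If a stored example is similar enough to the input, reuse its
    translation; otherwise gloss the sentence word by word, leaving
    unknown words untranslated and keeping the source word order.
    """
    tokens = sentence.lower().split()
    best_src, best_score = None, 0.0
    for src in examples:
        score = jaccard(tokens, src.split())
        if score > best_score:
            best_src, best_score = src, score
    if best_src is not None and best_score >= threshold:
        return examples[best_src], "example"
    gloss = " ".join(word_dict.get(t, t) for t in tokens)
    return gloss, "word-by-word"


if __name__ == "__main__":
    print(translate("yah ek pustak hai", EXAMPLES, WORD_DICT))
    print(translate("yah ek ghar hai", EXAMPLES, WORD_DICT))
```

The fallback output keeps the source word order ("this a house is"), which is the kind of good-enough rather than perfect translation the slide refers to: ungrammatical, but often sufficient for a reader to grasp the gist.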

  9. Cross Lingual Retrieval

  10. Cross Lingual Retrieval

  11. Reading Assistant System

  12. Reading Assistant

  13. Status Today
      • CLIR for 6 languages
      • MT for 3 languages
        • Shakti (a knowledge based MT system)
        • Parallel Corpus for Hindi-Eng
      • UDICT
        • About 40 Foreign Languages
        • 6 Indian Languages

  14. What more is needed?
      • UDICT
        • Improving coverage of existing languages
        • Adding new languages
      • Machine Translation
        • Corpus acquisition
        • State-of-the-art techniques applied to Indian Languages
        • Multi-way parallel corpus development
      • Textual format for the books
        • Books are currently in image formats
        • OCR should be developed for textual content

  15. Thank You. Questions?
