Evaluation of Hindi→English, Marathi→English and English→Hindi CLIR at FIRE 2008

Center for Indian Language Technologies (CFILT)

Department of CSE

IIT Bombay

Evaluation of Hindi→English, Marathi→English and English→Hindi CLIR at FIRE 2008

Nilesh Padariya, Manoj Chinnakotla, Ajay Nagesh and Om P. Damani

CLIR System Architecture

Evaluation of Hindi, Marathi CLIR at FIRE 2008

System Flow Example

  • Query: युरोमधील वाढ (Marathi; roughly "increase in the Euro")
  • Marathi Stemmer → Stemmed Query: यूरो वाढ
  • Dictionary Lookup
    • यूरो: Translation Not Found → Transliteration → Euro
    • वाढ: Found → Translation Options: Inflation, rise, increase
  • Translation Disambiguation → Final Translated Query: Euro Inflation
  • English IR Engine
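
The flow on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `translate_query`, the toy stemmer table, the toy dictionary and the toy transliteration table are all assumptions made up for the example. It shows the one decision the slide describes: look each stemmed word up in the bilingual dictionary, and fall back to transliteration when no translation is found.

```python
from typing import Callable, Dict, List


def translate_query(
    query: str,
    stem: Callable[[str], str],
    dictionary: Dict[str, List[str]],
    transliterate: Callable[[str], List[str]],
) -> List[List[str]]:
    """Return one list of candidate English terms per source query word."""
    candidates = []
    for word in query.split():
        stemmed = stem(word)
        options = dictionary.get(stemmed)
        if options:
            # Translation found: keep every option for later disambiguation.
            candidates.append(list(options))
        else:
            # Translation not found: fall back to transliteration.
            candidates.append(transliterate(stemmed))
    return candidates


# Toy stand-ins (hypothetical data, only covering the slide's example).
toy_stems = {"युरोमधील": "यूरो", "वाढ": "वाढ"}
toy_dictionary = {"वाढ": ["inflation", "rise", "increase"]}
toy_transliterations = {"यूरो": ["euro"]}

result = translate_query(
    "युरोमधील वाढ",
    stem=lambda w: toy_stems.get(w, w),
    dictionary=toy_dictionary,
    transliterate=lambda w: toy_transliterations.get(w, [w]),
)
print(result)
```

The candidate lists returned here would then go to the translation disambiguation step, which picks one term per word before the query reaches the English IR engine.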

First Participation in CLEF 2007
  • Developed basic Query Translation system for Hindi to English and Marathi to English
  • Transliteration Algorithm
    • Simple rule-based system
    • Edit-distance-based index lookup to retrieve index tokens
    • Accuracy: ~65% at top 20
  • Translation Disambiguation
    • Many parameters were chosen by intuition
    • E.g. disambiguation strategy, number of translation candidates
  • Performance at CLEF 2007
    • Hindi to English: 67.06% of monolingual
    • Marathi to English: 56.09% of monolingual
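
The "edit-distance based index-lookup" mentioned above can be illustrated with a short sketch. The slides do not say how the lookup was implemented, so everything here is an assumption: a plain Levenshtein distance, a linear scan over a small hypothetical list of index tokens, and the helper names `edit_distance` and `lookup_index_tokens`. The idea is that a rule-based transliteration is rarely spelled exactly as in the English collection, so the closest tokens that actually occur in the index are retrieved instead.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def lookup_index_tokens(
    candidate: str, index_tokens: List[str], top_k: int = 20
) -> List[Tuple[str, int]]:
    """Return the top_k index tokens closest to a transliteration candidate."""
    scored = [(tok, edit_distance(candidate.lower(), tok.lower()))
              for tok in index_tokens]
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:top_k]


# Hypothetical index vocabulary and a slightly-off transliteration.
index_tokens = ["amarnath", "amaranth", "yatra", "euro", "europe"]
matches = lookup_index_tokens("amarnat", index_tokens, top_k=2)
print(matches)
```

A linear scan is fine for a sketch; a real index vocabulary would likely need a faster candidate filter before computing distances.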

Failure Analysis for CLEF 2007

Transliteration
  • Collection of a parallel list of names for evaluation
    • Available datasets are too small
    • They do not contain a good mix of native and loan words
    • Our current dataset: around 25K words
  • Algorithmic Improvements
    • Hindi to English: Rule-Based Transliteration with improved pruning and ranking strategies
    • English to Hindi: Substring-Based Transliteration
  • Current accuracy figures
    • Hindi to English: 80% accuracy at rank 5
    • English to Hindi: Evaluation to be done

Translation Disambiguation
  • Empirical study on translation disambiguation strategies and parameter choices
  • Choice of disambiguation strategy
    • Best Pair
    • Best cohesion
    • Best sequence
    • Iterative
  • Various parameters to the iterative disambiguation algorithm
    • Number of final candidates to choose
    • Use of weights?
    • Similarity measure
  • Datasets used: TREC AP, CLEF 2007
  • Best choice: iterative strategy, Dice coefficient as similarity measure, 1 final translation candidate; weights bring little improvement
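
To make the iterative strategy with the Dice coefficient concrete, here is a minimal sketch. It loosely follows the iterative scheme of Monz and Dorr (listed in the references); the exact update rule and stopping criterion the authors used are not given in the slides, so the fixed iteration count, the function names and the co-occurrence counts below are all assumptions. Each candidate starts with a uniform weight, gains support from the candidates of the other query words in proportion to their Dice association, and is renormalised per query word; the single best candidate per word is returned.

```python
from typing import Dict, FrozenSet, List


def dice(a: str, b: str,
         term_freq: Dict[str, int],
         pair_freq: Dict[FrozenSet[str], int]) -> float:
    """Dice coefficient: 2 * f(a, b) / (f(a) + f(b))."""
    joint = pair_freq.get(frozenset((a, b)), 0)
    total = term_freq.get(a, 0) + term_freq.get(b, 0)
    return 2.0 * joint / total if total else 0.0


def iterative_disambiguation(
    candidates: List[List[str]],
    term_freq: Dict[str, int],
    pair_freq: Dict[FrozenSet[str], int],
    iterations: int = 10,
) -> List[str]:
    """Pick one translation per query word (each list must be non-empty)."""
    # Start from uniform weights over each word's candidates.
    weights = [{t: 1.0 / len(opts) for t in opts} for opts in candidates]
    for _ in range(iterations):
        updated = []
        for i, opts in enumerate(candidates):
            new = {}
            for t in opts:
                # Support from candidates of all *other* query words.
                support = sum(
                    dice(t, other, term_freq, pair_freq) * w
                    for j, other_weights in enumerate(weights) if j != i
                    for other, w in other_weights.items()
                )
                new[t] = weights[i][t] + support
            norm = sum(new.values())
            updated.append({t: v / norm for t, v in new.items()})
        weights = updated
    return [max(w, key=w.get) for w in weights]


# Hypothetical corpus counts, chosen only to exercise the code.
term_freq = {"euro": 100, "inflation": 80, "rise": 200, "increase": 300}
pair_freq = {
    frozenset(("euro", "inflation")): 40,
    frozenset(("euro", "rise")): 10,
    frozenset(("euro", "increase")): 12,
}
best = iterative_disambiguation(
    [["euro"], ["inflation", "rise", "increase"]], term_freq, pair_freq
)
print(best)
```

With these toy counts, "inflation" has the strongest association with "euro" and is selected, which mirrors the "Euro Inflation" outcome of the system flow example.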

Only Transliteration on Query?
  • Motivation
    • Actual Hindi words are commonly used in English documents in Indian domains
    • Examples:
      • “Amarnath Travel” is referred to as “Amarnath Yatra”
      • “Ghar ka Khana” is the name of a restaurant in Bangalore
      • “Bhumi Pujan”
    • NEs crucial for fetching relevant documents
  • Experiments
    • Transliterate whole query
    • Transliterate only NEs, no translation

Overall Results (Title Only)

P-R Curves for English Target

P-R Curves for Hindi Target

Results of Transliteration Experiment

P-R Curves for Transliteration Expt.

Conclusion
  • Improved transliteration and translation disambiguation modules based on CLEF 2007 analysis
  • Hindi to English CLIR performance is 75% of monolingual and Marathi to English is 64% of monolingual
  • The results need further investigation, especially the monolingual baselines (Hindi, Marathi and English)
  • Only transliteration achieves around 35% of monolingual performance in Hindi and 25% in Marathi
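
For readers unfamiliar with the "% of monolingual" figures, the sketch below shows how such a ratio is typically computed. The slides do not state which retrieval metric underlies the percentages; the use of mean average precision (MAP) is an assumption, and the two scores in the example are hypothetical placeholders, not results from this evaluation.

```python
def percent_of_monolingual(clir_score: float, monolingual_score: float) -> float:
    """Cross-lingual score expressed as a percentage of the monolingual score."""
    if monolingual_score <= 0:
        raise ValueError("monolingual score must be positive")
    return 100.0 * clir_score / monolingual_score


# Hypothetical MAP values, used only to show the arithmetic.
example = percent_of_monolingual(clir_score=0.2, monolingual_score=0.4)
print(f"{example:.2f}% of monolingual")
```

Because the figure is a ratio, it is sensitive to the monolingual baseline itself, which is why the baselines are flagged above for further investigation.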

Acknowledgements
  • The second author is supported by the Infosys Fellowship Award
  • Project linguists at CFILT, IIT Bombay

References
  • T. Sherif and G. Kondrak, Substring-Based Transliteration, In Proceedings of ACL, 2007.
  • F. Huang, Cluster-Specific Named Entity Transliteration, In HLT ’05, pages 435–442, Morristown, NJ, USA, 2005.
  • I. Ounis, G. Amati, V. Plachouras, B. He, C. Macdonald, and D. Johnson, Terrier Information Retrieval Platform, In Proceedings of ECIR 2005, volume 3408 of Lecture Notes in Computer Science, pages 517–519, Springer, 2005.
  • C. Monz and B. J. Dorr, Iterative Translation Disambiguation for Cross-Language Information Retrieval, In SIGIR ’05, pages 520–527, New York, USA, ACM Press, 2005.
  • N. Bertoldi and M. Federico, Statistical Models for Monolingual and Bilingual Information Retrieval, Information Retrieval, 7 (1-2): 53–72, 2004.

References (Contd.)
  • M. Braschler and C. Peters, Cross-Language Evaluation Forum: Objectives, Results, Achievements, Information Retrieval, 7 (1-2): 7–31, 2004.
  • R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Pearson Education, 2005.
  • D. Gusfield, Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997.

Thanks!
