1 / 16

Robust – Word Sense Disambiguation exercise

UBC : Eneko Agirre, Arantxa Otegi UNIPD: Giorgio Di Nunzio UH : Thomas Mandl. Robust – Word Sense Disambiguation exercise. Introduction. Robust: emphasize difficult topics using non-linear combination of topic results (GMAP) WSD: also automatic word sense annotation:

duyen
Download Presentation

Robust – Word Sense Disambiguation exercise

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. UBC: Eneko Agirre, Arantxa Otegi UNIPD: Giorgio Di Nunzio UH: Thomas Mandl Robust – Word Sense Disambiguation exercise CLEF 2009 - Kerkyra

  2. CLEF 2009 - Kerkyra Introduction Robust: emphasize difficult topics using non-linear combination of topic results (GMAP) WSD: also automatic word sense annotation: English documents and topics (English WordNet) Spanish topics (Spanish WordNet - closely linked to the English WordNet) Participants explore how the word senses (plus the semantic information in wordnets) can be used in IR and CLIR This is the second edition of Robust-WSD

  3. CLEF 2009 - Kerkyra Documents News collection: LA Times 94, Glasgow Herald 95 Sense information added to all content words Lemma Part of speech Weight of each sense in WordNet 1.6 XML with DTD provided Two leading WSD systems: National University of Singapore University of the Basque Country Significant effort (100Mword corpus) Special thanks to Hwee Tou Ng and colleagues from NUS and Oier Lopez de Lacalle from UBC

  4. CLEF 2009 - Kerkyra Documents: example XML

  5. CLEF 2009 - Kerkyra Topics We used existing CLEF topics in English and Spanish: 2001; 41-90; LA 94 2002; 91-140; LA 94 2004; 201-250; GH 95 2003; 141-200; LA 94, GH 95 2005; 251-300; LA 94, GH 95 2006; 301-350; LA 94, GH 95 First three as training (plus relevance judg.) Last three for testing

  6. CLEF 2009 - Kerkyra Topics: WSD English topics were disambiguated by both NUS and UBC systems Spanish topics: no large-scale WSD system available, so we used the first-sense heuristic Word sense codes are shared between Spanish and English wordnets Sense information added to all content words Lemma Part of speech Weight of each sense in WordNet 1.6 XML with DTD provided

  7. CLEF 2009 - Kerkyra Topics: WSD example

  8. CLEF 2009 - Kerkyra Evaluation Reused relevance assessments from previous years Relevance assessment for training topics were provided alongside the training topics MAP and GMAP Participants had to send at least one run which did not use WSD and one run which used WSD

  9. Participation • 10 official participants • 58 monolingual runs • 31 bilingual runs CLEF 2009 - Kerkyra

  10. Monolingual results • MAP: non-WSD best, 2 participants improve using WSD • GMAP: non-WSD best, 3 participants improve using WSD CLEF 2009 - Kerkyra

  11. CLEF 2009 - Kerkyra Monolingual: using WSD • Darmstadt: combination of several indexes, including monolingual translation model • No improvement using WSD • Reina: • UNINE: synset indexes, combine with results from other indexes • Improvement in GMAP • UCM: query expansion using structured queries • Improvement in MAP and GMAP • IXA: use semantic relatedness to expand documents • No improvement using WSD • GENEVA: synset indexes, expanding to synonyms and hypernyms • No improvement, except for some topics • UFRGS: only use lemmas (plus multiwords) • Improvement in MAP and GMAP

  12. CLEF 2009 - Kerkyra Monolingual: using WSD UNIBA: combine synset indexes (best sense) Improvements in MAP Univ. of Alicante: expand to all synonyms of best sense Improvement on train / decrease on test Univ. of Jaen: combine synset indexes (best sense) No improvement, except for some topics

  13. Bilingual results • MAP and GMAP: best results for non-WSD • 2 participants increase GMAP using WSD, 2 increase MAP. Improvements are rather small. CLEF 2009 - Kerkyra

  14. CLEF 2009 - Kerkyra Bilingual: using WSD IXA: wordnets as the sole sources for translation Improvement in MAP UNIGE: translation of topic for baseline No improvement UFRGS: association rules from parallel corpora, plus use of lemmas (no WSD) No improvement UNIBA: wordnets as the sole sources for translation Improvement in both MAP and GMAP

  15. CLEF 2009 - Kerkyra Conclusions and future Successful participation 10 participants Use of word senses allows small improvements on some stop scoring systems Further analysis ongoing: Manual analysis of topics which get significant improvement with WSD Significance tests (WSD non-WSD) No need of another round: All necessary material freely available http://ixa2.si.ehu.es/clirwsd Topics, documents (no word order, Lucene indexes), relevance assesments, WSD tags

  16. Thank you! Robust – Word Sense Disambiguation exercise CLEF 2009 - Kerkyra

More Related