
Similarity Measures for Query Expansion in TopX

Caroline Gherbaoui. Universität des Saarlandes, Naturwissenschaftlich-Technische Fak. I, Fachrichtung 6.2 - Informatik. Max-Planck-Institut für Informatik, AG 5 - Datenbanken und Informationssysteme, Prof. Dr. Gerhard Weikum.


Presentation Transcript


  1. Similarity Measures for Query Expansion in TopX • Caroline Gherbaoui • Universität des Saarlandes, Naturwissenschaftlich-Technische Fak. I, Fachrichtung 6.2 - Informatik • Max-Planck-Institut für Informatik, AG 5 - Datenbanken und Informationssysteme, Prof. Dr. Gerhard Weikum

  2. Overview • background knowledge • similarity measures for query expansion • evaluation of the computed similarity values • changes in TopX • conclusion

  3. Background • top-k query processing: provides the k most relevant results • query expansion: extends the source query terms • word sense disambiguation: extracts the correct meaning • ontology: a collection of terms with their meanings and semantic relations
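To make "provides the k most relevant results" concrete, here is a minimal sketch that selects the k highest-scoring documents from an already scored list. The document IDs and scores are made up for illustration, and this is not TopX's actual processing strategy, only a picture of what a top-k result is.

```python
import heapq

def top_k(scored_docs, k):
    """Return the k (doc_id, score) pairs with the highest scores, best first."""
    return heapq.nlargest(k, scored_docs, key=lambda pair: pair[1])

# Hypothetical relevance scores for a query.
scored = [("d1", 0.42), ("d2", 0.91), ("d3", 0.15), ("d4", 0.77)]
print(top_k(scored, 2))
```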

  4. Word Sense Disambiguation • query: "java, coffee" • possible senses of "java": "island", "coffee", "programming language", …

  5. Query Expansion • "coffee" is expanded with "drink, espresso"

  6. TopX • top-k retrieval engine • text and XML data • word sense disambiguation • query expansion • ontology

  7. TopX – WordNet Ontology • lexicon for the English language • hierarchical relations • one relation → one direction • ~160,000 words • ~120,000 synsets • ~210,000 relations

  8. TopX – YAGO Ontology • Wikipedia and WordNet • hierarchical and non-hierarchical relations • one relation → two directions • ~2,100,000 words • ~2,200,000 concepts • ~6,000,000 relations

  9. Similarity Measures • Dice similarity: the measure already used in TopX • NAGA similarity: the measure applied to YAGO • Best WordNet similarity: the measure with the best result among the WordNet measures

  10. Dice Similarity Measure • measures the intersection of two regions
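The formula on this slide did not survive in the transcript. As a minimal sketch, the standard Dice coefficient for two sets A and B is 2·|A ∩ B| / (|A| + |B|). Treating the "two regions" as the sets of documents in which each term occurs is an assumption made for this example; the function name and the sample data are hypothetical.

```python
def dice_similarity(a: set, b: set) -> float:
    """Standard Dice coefficient: 2 * |A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return 2 * len(a & b) / (len(a) + len(b))

# Assumed interpretation: each set holds the IDs of documents containing a term.
docs_with_coffee = {1, 2, 3, 4}
docs_with_espresso = {3, 4, 5}
print(dice_similarity(docs_with_coffee, docs_with_espresso))
```

The value is 1.0 for identical non-empty sets and 0.0 for disjoint ones, so terms that co-occur often score close to 1.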

  11. NAGA Similarity Measure • combination of the confidence of a relation and the informativeness of a relation

  12. Best WordNet Similarity Measure • product of the transfer function of the path length and the transfer function of the concept depth
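The slide's formula is missing from the transcript. One well-known measure of this shape is the one by Li, Bandar and McLean, which multiplies an exponentially decaying function of the path length by a tanh-shaped function of the depth of the common subsuming concept. Whether this is exactly the measure used in the talk is an assumption; the sketch below uses that form, with alpha = 0.2 and beta = 0.6 as illustrative parameter values.

```python
import math

def path_depth_similarity(path_length: float, depth: float,
                          alpha: float = 0.2, beta: float = 0.6) -> float:
    """Product of two transfer functions (assumed Li/Bandar/McLean form).

    path_length: length of the shortest path between the two concepts
    depth:       depth of their common subsumer in the hierarchy
    """
    f_path = math.exp(-alpha * path_length)  # shorter path -> closer to 1
    f_depth = math.tanh(beta * depth)        # deeper subsumer -> closer to 1
    return f_path * f_depth

print(path_depth_similarity(path_length=2, depth=5))
```

Both factors lie in [0, 1], so the result does too: it decreases as the path grows and increases as the shared ancestor becomes more specific.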

  13. Evaluation

  14. Evaluation • Dice measure → applicable, also on the YAGO ontology • NAGA measure → applicable when the forward direction is omitted • Best WordNet measure → not applicable, due to the density of YAGO

  15. Changes for TopX • tuning of some procedures • Dijkstra algorithm • word sense disambiguation • query expansion • extension of configuration file
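The slides do not say how the Dijkstra algorithm is used. A plausible reading, assumed here, is that query expansion searches the ontology graph for concepts whose best path from the source term has a high similarity, where a path's similarity is the product of its edge similarities. The graph layout, the function name `expand_term`, and the threshold are hypothetical.

```python
import heapq

def expand_term(graph, source, threshold):
    """Dijkstra variant that maximizes the product of edge similarities.

    graph: dict mapping a term to {neighbor: similarity}, similarities in (0, 1].
    Returns {term: best path similarity} for terms reachable with similarity
    >= threshold, excluding the source itself.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap via negated similarity
    while heap:
        neg_sim, node = heapq.heappop(heap)
        sim = -neg_sim
        if sim < best.get(node, 0.0):
            continue  # stale entry, a better path was already found
        for neighbor, edge_sim in graph.get(node, {}).items():
            candidate = sim * edge_sim
            if candidate >= threshold and candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                heapq.heappush(heap, (-candidate, neighbor))
    del best[source]
    return best

# Made-up similarity values for illustration.
ontology = {
    "coffee": {"drink": 0.8, "espresso": 0.9},
    "espresso": {"drink": 0.95},
    "drink": {"liquid": 0.5},
}
print(expand_term(ontology, "coffee", threshold=0.5))
```

Because every edge similarity is at most 1, path similarity never increases along a path, which is what allows the greedy Dijkstra order to be used with products instead of sums.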

  16. Conclusion • larger knowledge base • more flexibility • increased complexity • a further measure for the similarity computation → NAGA similarity

  17. Questions?
