
Entity Ranking Using Wikipedia as a Pivot

Entity Ranking Using Wikipedia as a Pivot (CIKM '10). Rianne Kaptein, Pavel Serdyukov, Arjen de Vries, Jaap Kamps. 2010/12/14, Yu-wen Hsu. Outline: Introduction; From Wikipedia Entities to Web Entities and back; Entity Ranking on Wikipedia; Entity Ranking on Web; Conclusion.


Presentation Transcript


  1. Entity Ranking Using Wikipedia as a Pivot (CIKM '10) Rianne Kaptein, Pavel Serdyukov, Arjen de Vries, Jaap Kamps 2010/12/14 Yu-wen Hsu

  2. Outline • Introduction • From Wikipedia Entities to Web Entities and back • Entity Ranking on Wikipedia • Entity Ranking on Web • Conclusion

  3. Introduction • Entity ranking is the task of finding documents that represent entities of the correct type and are relevant to a query. • The goal is to present a ranked list of entities directly, rather than a list of web pages with relevant but potentially redundant information about these entities.

  4. Entity ranking differs from document retrieval on at least three points: • i) returned documents have to represent an entity • ii) this entity should belong to a specified entity type • iii) to create a diverse result list, an entity should only be returned once.

  5. Main Goal • To Rank Web entities • 1. Associate target entity types with the query • 2. Rank Wikipedia pages according to their similarity with the query and target entity types • 3. Find web entities corresponding to the Wikipedia entities

  6. Using Wikipedia as a pivot • entities: Wikipedia pages • the name of the entity: the title of the page • the content of the page: the representation of the entity • Each Wikipedia page is assigned to a number of categories: topical, type, and administrative categories.

  7. From Wikipedia Entities to Web Entities and back • From Web to Wikipedia • Do these repositories provide enough clues to find the corresponding entities on the Web? • Do they contain enough entities to cover the complete range of entities needed to satisfy all kinds of information needs?

  8. From Wikipedia to Web • Use the external links of Wikipedia pages

  9. Entity Ranking on Wikipedia: Entity Types • Entity type assignment • exploit the existing Wikipedia categorization of documents • Pseudo-relevance feedback over the top retrieved documents • we extract the categories that are most frequently assigned to them • we take the top 10 results and look at the 2 most frequently occurring categories belonging to these documents
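The pseudo-relevance feedback step on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `assign_target_categories`, the input shapes, and the alphabetical tie-breaking are assumptions; only the cutoffs (top 10 results, 2 categories) come from the slide.

```python
from collections import Counter


def assign_target_categories(ranked_docs, doc_categories, top_n=10, num_categories=2):
    """Return the categories most frequently assigned to the top-ranked documents.

    ranked_docs: document ids ordered by baseline retrieval score (assumed input).
    doc_categories: mapping from document id to its Wikipedia categories (assumed input).
    """
    counts = Counter()
    for doc_id in ranked_docs[:top_n]:
        # Count each category at most once per document.
        counts.update(set(doc_categories.get(doc_id, ())))
    # Sort by descending frequency; break ties alphabetically (an assumption)
    # so the result is deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [category for category, _ in ordered[:num_categories]]


ranked = ["d1", "d2", "d3"]
categories = {
    "d1": ["Countries", "Islands"],
    "d2": ["Countries"],
    "d3": ["Islands", "Countries", "Volcanoes"],
}
print(assign_target_categories(ranked, categories))
```

The selected categories then serve as the target entity types for the query.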

  10. Entity Types: Scoring Entities • Notation covers the query terms, the document, the entire Wikipedia document collection, the name of the category, and the category itself • estimate background probabilities • smooth the probabilities of a term occurring in a category name with the background collection
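The transcript does not preserve the formulas for this slide. As a rough illustration of smoothing a category-name term probability with the background collection, the sketch below uses linear interpolation (Jelinek-Mercer smoothing). The choice of interpolation, the weight `lam`, and all names are assumptions, not taken from the slides.

```python
from collections import Counter


def smoothed_term_prob(term, category_name_terms, collection_counts, collection_size, lam=0.9):
    """Interpolate P(term | category name) with P(term | collection).

    lam weights the category-name estimate and (1 - lam) the background
    estimate; both the form and the default value are assumptions.
    """
    name_counts = Counter(category_name_terms)
    if category_name_terms:
        p_name = name_counts[term] / len(category_name_terms)
    else:
        p_name = 0.0
    # Background probability: relative frequency of the term in the collection.
    p_background = collection_counts.get(term, 0) / collection_size
    return lam * p_name + (1 - lam) * p_background


name_terms = ["national", "parks"]
background = {"national": 10, "parks": 5, "river": 5}
total_terms = sum(background.values())
p_national = smoothed_term_prob("national", name_terms, background, total_terms, lam=0.5)
p_river = smoothed_term_prob("river", name_terms, background, total_terms, lam=0.5)
```

Smoothing gives terms that never occur in the category name (here "river") a small non-zero probability, which keeps later comparisons between categories well defined.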

  11. Similarity between two categories • The entity type score for a document in relation to a query topic • Score Normalization
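The three steps on this slide are listed without their formulas in the transcript. One plausible reading, sketched below under stated assumptions: category similarity as the negative KL divergence between smoothed term distributions, the entity type score of a document as the best similarity over all pairs of document and query categories, and min-max normalization of the resulting scores. All function names and these specific choices are assumptions for illustration.

```python
import math


def category_similarity(p_doc_cat, p_query_cat):
    """Negative KL divergence between two term distributions (assumed measure).

    Both arguments map terms to probabilities. Every term with non-zero
    probability in p_doc_cat must also be non-zero in p_query_cat, which
    smoothing with a background collection guarantees.
    """
    return -sum(
        p * math.log(p / p_query_cat[term])
        for term, p in p_doc_cat.items()
        if p > 0
    )


def entity_type_score(doc_cat_models, query_cat_models):
    """Best similarity between any document category and any query category."""
    return max(
        category_similarity(doc_cat, query_cat)
        for doc_cat in doc_cat_models
        for query_cat in query_cat_models
    )


def min_max_normalize(scores):
    """Rescale scores to [0, 1] so they can be combined with other scores."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {key: 0.0 for key in scores}
    return {key: (value - low) / (high - low) for key, value in scores.items()}


uniform = {"a": 0.5, "b": 0.5}
skewed = {"a": 0.9, "b": 0.1}
best = entity_type_score([uniform], [skewed, uniform])
normalized = min_max_normalize({"x": 2.0, "y": 4.0, "z": 3.0})
```

With this measure, identical categories score 0 and all other pairs score below 0, so a higher score means a closer match.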

  12. Entity Ranking on Wikipedia: Experimental Setup • Data sets: • INEX: specific entity types, e.g. countries, national parks • TREC: people, organizations, products • Advantage: clear, few options, can be easily selected • Disadvantage: cover only a small part of all possible entity ranking queries • more specific entity types are manually assigned

  13. Rerank the top 2,500 results of the baseline • Manually assigned (author) • Automatically assigned (PRF) • Evaluation • TREC 2009: P@10 and NDCG@20 • INEX: P@10 and MAP • INEX 2006-2008: 79 topics • INEX 2009: a selection of 55 topics from the 2006-2008 topics • only the so-called ‘primary’ pages are counted
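The slide names P@10 and NDCG@20 without defining them. Below is a minimal sketch of both measures, using one common NDCG formulation (gain divided by log2 of rank + 1). The official TREC and INEX evaluation tools may differ in detail, and the function names are hypothetical.

```python
import math


def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc_id in ranking[:k] if doc_id in relevant) / k


def ndcg_at_k(ranking, gains, k=20):
    """Normalized discounted cumulative gain over the top-k results.

    gains maps a document id to its relevance grade; missing ids count as 0.
    """
    dcg = sum(
        gains.get(doc_id, 0) / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranking[:k], start=1)
    )
    # Ideal DCG: the same gains placed in the best possible order.
    ideal_gains = sorted(gains.values(), reverse=True)[:k]
    ideal_dcg = sum(
        gain / math.log2(rank + 1)
        for rank, gain in enumerate(ideal_gains, start=1)
    )
    return dcg / ideal_dcg if ideal_dcg > 0 else 0.0


ranking = ["a", "b", "c"]
gains = {"a": 1, "c": 1}
p_at_2 = precision_at_k(ranking, set(gains), k=2)
ndcg = ndcg_at_k(ranking, gains, k=3)
```

In this toy ranking the second relevant entity sits at rank 3 instead of rank 2, so NDCG falls below 1 while P@2 is 0.5.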

  14. Entity Ranking on the Web • We use three approaches for finding web pages associated with Wikipedia pages. • 1. External links: • the External links section of the Wikipedia page • 2. Anchor text: • use the Wikipedia page title as the query • retrieve pages from the anchor text index • 3. Combined: • not all Wikipedia pages have external links • not all external links of Wikipedia pages are part of the ClueWeb collection • if fewer than 3 web pages are found, we fill up the results to 3 pages using the top pages retrieved using anchor text
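The combined approach can be sketched as a simple fallback: keep the external links that exist in the web collection, then top up with anchor text results until 3 pages are available. The function `find_web_pages`, its parameters, and the `anchor_text_search` callable are hypothetical stand-ins; only the threshold of 3 pages comes from the slide.

```python
def find_web_pages(wiki_title, external_links, collection_urls, anchor_text_search, min_pages=3):
    """Return web pages for a Wikipedia entity, filling up to min_pages.

    external_links: URLs from the page's External links section.
    collection_urls: set of URLs present in the web collection.
    anchor_text_search: callable returning ranked URLs for a query string,
        standing in for retrieval from an anchor text index (assumed).
    """
    # Keep only external links that are part of the collection.
    pages = [url for url in external_links if url in collection_urls]
    if len(pages) < min_pages:
        for url in anchor_text_search(wiki_title):
            if url not in pages:
                pages.append(url)
            if len(pages) >= min_pages:
                break
    return pages


def fake_anchor_search(query):
    # Toy stand-in for an anchor text index lookup.
    return ["u1", "a1", "a2", "a3"]


result = find_web_pages("Example entity", ["u1", "u2"], {"u1", "a1", "a2", "a3"}, fake_anchor_search)
print(result)
```

Here "u2" is dropped because it is not in the collection, and the duplicate "u1" from the anchor text results is skipped before the list is filled to 3 pages.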

  15. Conclusion • Our experiments show that our Wikipedia-as-a-pivot approach outperforms a baseline of full-text search. • Both external links on Wikipedia pages and searching an anchor text index of the web are effective approaches to find homepages for entities represented by Wikipedia pages.
