
Kshitij: A Search and Page Recommendation System for Wikipedia

Kshitij: A Search and Page Recommendation System for Wikipedia. Phanikumar Bhamidipati, Kamalakar Karlapalem Center for Data Engineering International Institute of Information Technology, Hyderabad, India COMAD 2008. Nam, Kwang-hyun Intelligent Database Systems Lab


Presentation Transcript


  1. Kshitij: A Search and Page Recommendation System for Wikipedia Phanikumar Bhamidipati, Kamalakar Karlapalem Center for Data Engineering International Institute of Information Technology, Hyderabad, India COMAD 2008 Nam, Kwang-hyun Intelligent Database Systems Lab School of Computer Science & Engineering Seoul National University, Seoul, Korea Center for E-Business Technology Seoul National University Seoul, Korea

  2. Contents • Motivation • Problem statement • Kshitij • Overview • Graph Model • Architecture • Algorithms • CBR, LBR, YBR, AR • Results • Conclusion & Future Work • Discussion

  3. Motivation • New paradigms in Search • Increased interest after PageRank and HITS (Hyperlink-Induced Topic Search) algorithms • Wikipedia • Powerful online collaborative encyclopedia • Vast knowledge, available in structured format • The links in each page represent some kind of relation with the base page • Both the semantics and the data can be mined from Wikipedia • Need for systems that leverage Wikipedia knowledge in recommendations

  4. Kshitij • A generic recommendation system based on Wikipedia semantics • Provides two services • Search Recommendations • Page Recommendations • Uses Yago as the stored knowledge base • Extracts additional knowledge dynamically from the Wiki pages.

  5. Search Recommendations (figure; labels: Keyword as input, Result from Search Engine, Kshitij Recommendations)

  6. Page Recommendations • When the user visits a page, its identifier is sent as input to the algorithms to obtain recommendations • The most relevant aggregated results • Displayed as hyperlinks

  7. Kshitij - Overview • Leverages the structured model powered by Wikis • Categories • Links • YAGO • An ontology compiled from Wikipedia • The static source of knowledge

  8. The Graph Structure (figure; node labels: Search, Atari 7800, Atari Jaguar, Atari Jaguar II, Jaguar, Felidae, Jaguar Cars, Black Panther, William Lyons, Automobile, Mammal)

  9. Kshitij – Architecture

  10. Kshitij - Algorithms • Three individual recommendations that explore different semantics • CBR • LBR • YBR • A link based aggregator (AR) • Combines the three into a single set of recommendations

  11. Category Based Recommendations (CBR) • Key idea • If two pages belong to multiple categories together, the probability that they belong to the same topic increases • London and Berlin in Capitals In Europe and Host cities of the Summer Olympic Games • Algorithm • Starts with a set of pages (search output) • Explores category structure to obtain candidate pages • Prunes the list based on similarity values calculated from shared categories, using thresholds T1 and T2
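The CBR steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the slide does not give the similarity formula or say what T1 and T2 measure, so the sketch assumes Jaccard similarity over category sets, treats `t1` as a minimum number of shared categories, and treats `t2` as a minimum similarity. The function name, parameters, and toy data are all hypothetical.

```python
def cbr_candidates(seed_pages, page_categories, t1=2, t2=0.3):
    """Rank pages that share categories with the seed pages.

    Assumptions (not from the slides): Jaccard similarity, t1 = minimum
    number of shared categories, t2 = minimum similarity value.
    """
    # Invert the mapping so each category lists its member pages.
    category_pages = {}
    for page, cats in page_categories.items():
        for cat in cats:
            category_pages.setdefault(cat, set()).add(page)

    seeds = set(seed_pages)
    scores = {}
    for seed in seeds:
        seed_cats = page_categories.get(seed, set())
        # Candidate pages are all pages in any category of the seed.
        candidates = set()
        for cat in seed_cats:
            candidates |= category_pages[cat]
        candidates -= seeds
        for cand in candidates:
            cand_cats = page_categories[cand]
            shared = seed_cats & cand_cats
            similarity = len(shared) / len(seed_cats | cand_cats)
            # Prune with both (assumed) thresholds.
            if len(shared) >= t1 and similarity >= t2:
                scores[cand] = max(scores.get(cand, 0.0), similarity)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy category data, loosely following the London/Berlin example.
toy_categories = {
    "London": {"Capitals in Europe", "Host cities of the Summer Olympic Games"},
    "Berlin": {"Capitals in Europe", "Host cities of the Summer Olympic Games"},
    "Paris": {"Capitals in Europe", "Host cities of the Summer Olympic Games",
              "Cities in France"},
    "Madrid": {"Capitals in Europe"},
}
print([name for name, _ in cbr_candidates(["London"], toy_categories)])
# → ['Berlin', 'Paris']
```

Madrid is dropped because it shares only one category with London, which is the "multiple categories together" condition in the key idea.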

  12. Link Based Recommendations (LBR) • Key idea • If two pages are referred to together from the same set of pages, they can be considered related • Competing sportspersons, countries in the same alliance • Algorithm • Start with search results and output of CBR • Identify frequent item sets • Support from search results is weighted higher than support from CBR output
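A rough sketch of the co-reference idea behind LBR, under stated assumptions: it only counts frequent pairs (2-item sets) rather than running a full frequent item set miner, it ignores the different weighting of search results and CBR output, and `outlinks`, `min_support`, and the toy page names are hypothetical.

```python
from collections import Counter


def lbr_candidates(seed_pages, outlinks, min_support=2):
    """Rank pages that are linked together with a seed page.

    The support of a candidate is the number of referring pages that link
    to both the candidate and at least one seed page (a simplification of
    frequent item set mining to pairs).
    """
    seeds = set(seed_pages)
    support = Counter()
    for _source, targets in outlinks.items():
        targets = set(targets)
        if targets & seeds:
            for target in targets - seeds:
                support[target] += 1
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(page, count) for page, count in ranked if count >= min_support]


# Toy link data: each key is a referring page, each value its outgoing links.
toy_outlinks = {
    "Rivalry page": ["Player A", "Player B"],
    "Tournament page": ["Player A", "Player B", "Player C"],
    "Country page": ["Player A", "Player D"],
}
print(lbr_candidates(["Player A"], toy_outlinks))
# → [('Player B', 2)]
```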

  13. Yago Based Recommendations (YBR) • Set of facts in triplet form <E1, R, E2> • <New Delhi, Is Capital Of, India> • Prune the relation types • Key idea • To find a prioritized set of entities that are related to a given set of Wikipedia pages • Algorithm • Start with search output • Retrieve entities related to these pages based on the weight measure • Merge the lists and identify the related pages
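The YBR steps can be illustrated with a small sketch over triples of the form <E1, R, E2>. The slides do not define the weight measure, so this assumes a hypothetical per-relation weight table: relation types missing from the table are treated as pruned, and an entity's score is the sum of the weights of the facts linking it to a seed page. Apart from the New Delhi fact from the slide, the facts, relation names, and weights are toy values.

```python
def ybr_candidates(seed_pages, facts, relation_weights):
    """Rank entities related to the seed pages through weighted facts.

    facts: iterable of (e1, relation, e2) triples.
    relation_weights: assumed weight per kept relation type; relation
    types not listed here are pruned.
    """
    seeds = set(seed_pages)
    scores = {}
    for e1, relation, e2 in facts:
        weight = relation_weights.get(relation)
        if weight is None:
            continue  # pruned relation type
        if e1 in seeds and e2 not in seeds:
            scores[e2] = scores.get(e2, 0.0) + weight
        elif e2 in seeds and e1 not in seeds:
            scores[e1] = scores.get(e1, 0.0) + weight
    # Merge into one prioritized list, highest score first.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


toy_facts = [
    ("New Delhi", "Is Capital Of", "India"),
    ("New Delhi", "Is Located In", "Asia"),
    ("New Delhi", "Has Label", "some label"),  # relation type pruned below
]
toy_weights = {"Is Capital Of": 1.0, "Is Located In": 0.5}
print(ybr_candidates(["New Delhi"], toy_facts, toy_weights))
# → [('India', 1.0), ('Asia', 0.5)]
```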

  14. Diversity of the algorithms • Each explores a different knowledge space • The graph is explored along edges of a specific color • Recommendations of individual algorithms differ • Need for aggregation • Combines and prioritizes the results

  15. Aggregated Recommendations (AR) • Groups the recommendations based on the topic each result belongs to • A link based approach • Algorithm • Start with search results and an aggregated list of CBR, LBR and YBR (Cumulative List (CL)) • Explore the neighborhood of each search result to find how many pages in CL are reachable • Apply a threshold T on the nearness value to filter the related pages • Represent each result page as a point in k-dimensional space (one dimension per page in CL) • Run Agglomerative Nesting (AGNES, a hierarchical clustering algorithm) to obtain clusters of result pages
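The AR pipeline above can be sketched in plain Python. Several details are assumptions, since the slides do not specify them: nearness is taken to be 1/d for a page reachable in d link hops (0 if unreachable within `max_hops`), values below the threshold `t` are zeroed, and the agglomerative step uses single linkage with Euclidean distance. The link graph is toy data built from the node labels on the graph slide.

```python
from collections import deque


def nearness_vector(page, links, cumulative_list, max_hops=2, t=0.4):
    """One coordinate per CL page: assumed nearness 1/hops, filtered by t."""
    dist = {page: 0}
    queue = deque([page])
    while queue:  # breadth-first search, limited to max_hops
        cur = queue.popleft()
        if dist[cur] == max_hops:
            continue
        for nxt in links.get(cur, ()):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    vector = []
    for target in cumulative_list:
        hops = dist.get(target)
        near = 1.0 / hops if hops else 0.0  # unreachable or the page itself
        vector.append(near if near >= t else 0.0)
    return vector


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def agnes(vectors, num_clusters):
    """Bottom-up clustering: repeatedly merge the two closest clusters."""
    clusters = [[name] for name in sorted(vectors)]
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(euclidean(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters.pop(j)
    return clusters


# Toy link graph (hypothetical edges between the slide's node labels).
toy_links = {
    "Jaguar": ["Felidae", "Mammal"],
    "Black Panther": ["Felidae"],
    "Felidae": ["Mammal"],
    "Jaguar Cars": ["William Lyons", "Automobile"],
    "Atari Jaguar": ["Atari 7800"],
}
toy_cl = ["Felidae", "Mammal", "William Lyons", "Automobile", "Atari 7800"]
toy_results = ["Jaguar", "Black Panther", "Jaguar Cars", "Atari Jaguar"]
toy_vectors = {p: nearness_vector(p, toy_links, toy_cl) for p in toy_results}
print(agnes(toy_vectors, num_clusters=3))
# → [['Atari Jaguar'], ['Black Panther', 'Jaguar'], ['Jaguar Cars']]
```

In this toy run the two animal pages end up in one cluster because both reach Felidae and Mammal, while the car and console pages stay separate.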

  16. Results: Evaluation • Mean Absolute Error (MAE) • To evaluate the effectiveness of a recommendation system
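For reference, Mean Absolute Error is the average absolute difference between predicted and actual values, MAE = (1/N) Σ |p_i − a_i|. A minimal sketch follows; how the authors obtained the predicted and actual relevance values is not stated on the slide, so the example inputs are purely illustrative.

```python
def mean_absolute_error(predicted, actual):
    """Average of |predicted_i - actual_i| over all rated items."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Illustrative values only: 1 = relevant, 0 = not relevant.
print(mean_absolute_error([1, 0, 1, 1], [1, 1, 1, 0]))
# → 0.5
```

A lower MAE means the system's predictions are closer to the users' judgments.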

  17. Results: Search Recommendations • A value of 0.4 for T balances fetching a moderate number of recommendations against keeping good quality

  18. Results: Search Recommendations • Keyword: jaguar

  19. Results: Search Recommendations • Keyword: amazon

  20. Results: Page Recommendations

  21. Conclusion & Future Work • Good quality recommendations can be obtained from annotated knowledge bases using only semantic information • Future work: use more Wikipedia structures • Templates, References, Info-Boxes, History • Currently, the system calculates recommendations on demand • Plan to come up with a strategy that pre-calculates and stores recommendation sets

  22. Discussion • Pros • Presents a generic recommendation system that utilizes both stored and dynamically extracted semantics from Wikipedia • Good examples • Cons • The figures and tables are not placed in sequence. • No comparison with other recommendation systems • However, the authors mention that there is no existing recommendation system with which they can directly compare theirs.
