

  1. Captain Nemo: a Metasearch Engine with Personalized Hierarchical Search Space (http://www.dblab.ntua.gr/~stef/nemo) Stefanos Souldatos, Theodore Dalamagas, Timos Sellis (National Technical University of Athens, Greece)

  2. INTRODUCTION

  3. Metasearching Metasearch engines can reach a large part of the Web. (Diagram: the metasearch engine forwards each query to Search Engine 1, Search Engine 2 and Search Engine 3.)

  4. Personalization Personalization is an emerging need on the Web.

  5. Personalization in Metasearching Personalization can be applied in all 3 stages of metasearching: Result Retrieval, Result Presentation, Result Administration.

  6. Personalization in Metasearching Personal Retrieval Model: search engines, #pages, timeout. Personalization can be applied in all 3 stages of metasearching: Result Retrieval, Result Presentation, Result Administration.

  7. Personalization in Metasearching Personal Presentation Style: grouping, ranking, appearance. Personalization can be applied in all 3 stages of metasearching: Result Retrieval, Result Presentation, Result Administration.

  8. Personalization in Metasearching Thematic Classification of Results: k-Nearest Neighbor, Support Vector Machines, Naive Bayes, Neural Networks, Decision Trees, Regression Models. Personalization can be applied in all 3 stages of metasearching: Result Retrieval, Result Presentation, Result Administration.

  9. Hierarchical Classification: Flat Model vs. Hierarchical Model. (Diagram: ROOT splits into ART (fine arts) and SPORTS (athlete, score, referee); ART into CINEMA (movie, film, actor) and PAINTING (painter, canvas, gallery); SPORTS into BASKETBALL (basket, nba, game) and FOOTBALL (ground, ball, match).)
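The hierarchical model above can be sketched as a small tree of categories with their keyword descriptions. The nested-dict layout and the helper below are illustrative assumptions, not Captain Nemo's actual data structure; the category names and keywords come from the slide.

```python
# Topic hierarchy from the slide; "_keywords" holds each category's
# descriptive terms (an assumed representation for illustration).
hierarchy = {
    "ROOT": {
        "ART": {
            "_keywords": ["fine", "arts"],
            "CINEMA": {"_keywords": ["movie", "film", "actor"]},
            "PAINTING": {"_keywords": ["painter", "canvas", "gallery"]},
        },
        "SPORTS": {
            "_keywords": ["athlete", "score", "referee"],
            "BASKETBALL": {"_keywords": ["basket", "nba", "game"]},
            "FOOTBALL": {"_keywords": ["ground", "ball", "match"]},
        },
    }
}

def categories(node):
    """Yield (name, keywords) for every category below this node."""
    for key, child in node.items():
        if key == "_keywords":
            continue
        yield key, child.get("_keywords", [])
        yield from categories(child)

print(sorted(name for name, _ in categories(hierarchy["ROOT"])))
# → ['ART', 'BASKETBALL', 'CINEMA', 'FOOTBALL', 'PAINTING', 'SPORTS']
```

Unlike the flat model, the tree lets a classifier narrow the search top-down: first ART vs. SPORTS, then the subcategories.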

  10. RELATED WORK

  11. Personalization in Metasearch Engines Result Retrieval Result Presentation Result Administration

  12. Personalization in Metasearch Engines Result Retrieval Result Presentation Result Administration The user defines the search engines to be used (Search, Ixquick, Infogrid, Mamma, Profusion, WebCrawler, Query Server).

  13. Personalization in Metasearch Engines Result Retrieval Result Presentation Result Administration The user defines the timeout option, i.e. the maximum time to wait for search results (Infogrid, Mamma, Profusion, Query Server).

  14. Personalization in Metasearch Engines Result Retrieval Result Presentation Result Administration The user defines the number of pages to be retrieved by each search engine (Profusion, Query Server).

  15. Personalization in Metasearch Engines Result Retrieval Result Presentation Result Administration

  16. Personalization in Metasearch Engines Result Retrieval Result Presentation Result Administration Results can be grouped by the search engine that retrieved them (Dogpile, WebCrawler, MetaCrawler).

  17. Personalization in Metasearch Engines Result Retrieval Result Presentation Result Administration

  18. Personalization in Metasearch Engines Result Retrieval Result Presentation Result Administration Northern Light organizes search results into dynamic custom folders. Inquirus2 recognizes thematic categories and improves queries towards a category. Buntine et al. (2004) present a topic-based open-source search engine.

  19. CAPTAIN NEMO

  20. Personal Retrieval Model Result Retrieval Result Presentation Result Administration

  21. Personal Retrieval Model Result Retrieval Result Presentation Result Administration The user sets, for each search engine: the search engines to use, the number of results, a timeout, and a weight. Example: Search Engine 1: 20 results, timeout 6, weight 7; Search Engine 2: 30 results, timeout 8, weight 10; Search Engine 3: 10 results, timeout 4, weight 5.
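The per-engine settings above can be captured in a small configuration sketch. The class and field names are assumptions for illustration (not Captain Nemo's code); the numbers mirror the slide's example, reading the three rows as results, timeout and weight per engine.

```python
# Illustrative per-engine retrieval profile; names are assumptions.
from dataclasses import dataclass

@dataclass
class EngineConfig:
    name: str
    max_results: int   # pages to request from this engine
    timeout_s: float   # max seconds to wait for its results
    weight: float      # user-assigned reliability weight

profile = [
    EngineConfig("Search Engine 1", max_results=20, timeout_s=6, weight=7),
    EngineConfig("Search Engine 2", max_results=30, timeout_s=8, weight=10),
    EngineConfig("Search Engine 3", max_results=10, timeout_s=4, weight=5),
]
```

A profile like this would drive both retrieval (how much to fetch, how long to wait) and ranking (the weight reappears in the weighted Borda-fuse formula later in the talk).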

  22. Personal Presentation Style Result Retrieval Result Presentation Result Administration

  23. Personal Presentation Style Result Retrieval Result Presentation Result Administration • Result Grouping: merged in a single list; grouped by search engine; grouped by relevant topic of interest • Result Content: title; title and URL; title, URL and description

  24. Personal Presentation Style Result Retrieval Result Presentation Result Administration • Look ‘n’ Feel • Color Themes (XSL Stylesheets) • Page Layout • Font Size

  25. Topics of Personal Interest Result Retrieval Result Presentation Result Administration

  26. Topics of Personal Interest Result Retrieval Result Presentation Result Administration • Administration of topics of personal interest • The user defines a hierarchy of topics of personal interest (i.e. thematic categories). • Each thematic category has a name and a description of 10-20 words. • The system offers an environment for the administration of the thematic categories and their content.

  27. Topics of Personal Interest Result Retrieval Result Presentation Result Administration • Hierarchical classification of results • The system proposes the most appropriate thematic category for each result (Nearest Neighbor). • The user can save each result in the proposed category or in any other category.
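A hedged sketch of the nearest-neighbour step: each category is represented by its short textual description (slide 26), and a result is assigned to the category whose description shares the most terms with it. The term-overlap measure is an illustrative stand-in; Captain Nemo's actual similarity function is not specified here.

```python
# Assign a result to the "nearest" thematic category by term overlap
# between the result text and each category's description (illustrative).
def tokens(text):
    return set(text.lower().split())

def nearest_category(result_text, categories):
    """categories: {name: description}; returns the best-matching name."""
    result = tokens(result_text)
    return max(categories, key=lambda name: len(result & tokens(categories[name])))

cats = {
    "BASKETBALL": "basket nba game",
    "CINEMA": "movie film actor",
}
print(nearest_category("Michael Jordan nba game highlights", cats))
# → BASKETBALL
```

The proposed category is then only a suggestion; as the slide says, the user may override it when saving the result.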

  28. ROOT ART fine arts SPORTS athlete score referee CINEMA movie film actor PAINTING painter camvas gallery BASKETBALL basket nba game FOOTBALL ground ball match Classification Example • Query: “Michael Jordan” • Results in user’s topics of interest: 3 3 2 8

  29. METASEARCH RANKING

  30. Two Ranking Approaches: Using Initial Scores of Search Engines; Not Using Initial Scores of Search Engines.

  31. Using Initial Scores • Rasolofo et al. (2001) argue that the initial scores of the search engines can be exploited. • Normalization is required in order to achieve a common measure of comparison. • A weight factor incorporates the reliability of each search engine. Search engines that return more Web pages should receive a higher weight, on the assumption that the number of relevant Web pages retrieved is proportional to the total number of Web pages retrieved as relevant.
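A minimal sketch of this idea, assuming simple min-max normalization per engine followed by multiplication with the engine's weight; the exact formula of Rasolofo et al. may differ.

```python
# Rescale one engine's scores to [0, 1], then apply its weight so that
# scores from different engines become comparable (illustrative scheme).
def normalized(scores):
    lo, hi = min(scores), max(scores)
    if hi == lo:                      # all scores equal: treat as top-ranked
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def weighted_scores(scores, engine_weight):
    return [engine_weight * s for s in normalized(scores)]

print(weighted_scores([9, 5, 1], engine_weight=2.0))
# → [2.0, 1.0, 0.0]
```

After this step, results from all engines can be merged into one list and sorted by the weighted, normalized score.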

  32. Not Using Initial Scores • The scores of various search engines are not compatible and comparable even when normalized. • Towell et al. (1995) note that the same document receives different scores in various search engines. • Gravano and Papakonstantinou (1998) point out that the comparison is not feasible even among engines using the same ranking algorithm. • Dumais (1994) concludes that scores depend on the document collection used by a search engine.

  33. Aslam and Montague (2001) • Bayes-fuse uses probabilistic theory to calculate the probability that a result is relevant to a query. • Borda-fuse is based on democratic voting. Each search engine casts votes for the results it returns (N votes for the first result, N-1 for the second, etc.). The metasearch engine gathers the votes, and the ranking is determined democratically by summing them up.

  34. Aslam and Montague (2001) • Weighted borda-fuse: a weighted variant of borda-fuse in which search engines are not treated equally; their votes are weighted according to the reliability of each search engine.

  35. Weighted Borda-Fuse • V(ri,j) = wj * (maxk(rk) - i + 1) • V(ri,j): votes for the result at rank i of search engine j • wj: weight of search engine j (set by the user) • maxk(rk): maximum number of results returned by any engine • Example: w1 = 7 (SE1), w2 = 10 (SE2), w3 = 5 (SE3)
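The formula above can be sketched directly: engine j gives the result at rank i the votes wj * (maxk(rk) - i + 1), and votes for the same document are summed across engines. The engine weights follow the slide's example (w1 = 7, w2 = 10, w3 = 5); the document lists are made up for illustration.

```python
# Weighted Borda-fuse: merge per-engine ranked lists by summing
# weighted rank-based votes per document.
from collections import defaultdict

def weighted_borda_fuse(ranked_lists, weights):
    """ranked_lists: one ranked list of doc ids per engine."""
    n = max(len(lst) for lst in ranked_lists)  # max_k(r_k)
    votes = defaultdict(float)
    for lst, w in zip(ranked_lists, weights):
        for i, doc in enumerate(lst, start=1):
            votes[doc] += w * (n - i + 1)      # V(r_i,j)
    return sorted(votes, key=votes.get, reverse=True)

se1 = ["a", "b", "c"]   # hypothetical result lists
se2 = ["b", "a"]
se3 = ["c", "b", "a"]
print(weighted_borda_fuse([se1, se2, se3], weights=[7, 10, 5]))
# → ['b', 'a', 'c']
```

Here "b" wins (14 + 30 + 10 = 54 votes) because the most reliable engine (w = 10) ranked it first, even though two engines preferred other documents.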

  36. Captain Nemo http://www.dblab.ntua.gr/~stef/nemo

  37. Links Introduction Related work Captain Nemo Metasearch Ranking
