
Jacob Kalakal Joseph CS 572 (Spring 2011) | Class Presentation | June 21, 2011

Hypersearching the Web. Soumen Chakrabarti, Byron Dom, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins.


Presentation Transcript


  1. Hypersearching the Web. Soumen Chakrabarti, Byron Dom, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins. Jacob Kalakal Joseph, CS 572 (Spring 2011) | Class Presentation | June 21, 2011

  2. Outline • Characteristics of the WWW • Motivation for building search engines • Traditional SEs and the challenges • Improvements and the associated problems • CLEVER • Power of hyperlinks • Hubs and Authorities • Algorithm • Evaluate CLEVER • Future scope • Answer questions and class discussion CS572-Joseph

  3. WWW ~ Universe

  4. Motivation for search engines

  5. Initial Attempts • Ranking functions based on simple heuristics
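A minimal sketch of what a ranking function based on a simple heuristic might look like, assuming the heuristic is "count how often the query terms occur in the page". The slide does not say which heuristics were used, so the heuristic, the function names `term_count_score` and `rank`, and the sample pages are all illustrative assumptions.

```python
def term_count_score(query, document):
    """Score a document by how many times the query terms occur in it."""
    terms = query.lower().split()
    words = document.lower().split()
    return sum(words.count(term) for term in terms)


def rank(query, documents):
    """Return document names ordered from highest to lowest score."""
    return sorted(
        documents,
        key=lambda name: term_count_score(query, documents[name]),
        reverse=True,
    )


# Hypothetical pages, made up for this example.
docs = {
    "airline": "official airline site book flights and tickets",
    "blog": "how to find cheap tickets for your next trip",
    "stuffed": "cheap tickets cheap tickets cheap tickets cheap tickets",
}

ranking = rank("cheap tickets", docs)
print(ranking)
```

In this toy example the page that merely repeats the query scores highest, which is the weakness that the spamming slide illustrates.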

  6. Challenges: Synonymy

  7. Challenges: Polysemy

  8. Challenges: Spamming • “Cheap air tickets” repeated over and over • White font on white background

  9. Improvements • Semantic networks: help synonymy but worsen polysemy • Human selectors: impractical

  10. Hyperlinks - What a CLEVER idea!

  11. Hubs & Authorities
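The slide gives only the title, so as a hedged reminder: in the basic, unweighted formulation from Kleinberg's paper (listed in the references), every page $p$ carries an authority score $a(p)$ and a hub score $h(p)$ that reinforce each other. The notation $q \to p$ for "page $q$ links to page $p$" is an assumption of this sketch.

```latex
\begin{align}
  a(p) &\leftarrow \sum_{q \,:\, q \to p} h(q)
    && \text{(good authorities are pointed to by good hubs)} \\
  h(p) &\leftarrow \sum_{q \,:\, p \to q} a(q)
    && \text{(good hubs point to good authorities)}
\end{align}
% After each round, both score vectors are rescaled so that
% \sum_p a(p)^2 = 1 and \sum_p h(p)^2 = 1.
```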

  12. How it works
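The slide body did not survive extraction, so the following is only a sketch of the basic hubs-and-authorities iteration as described in Kleinberg's paper, not necessarily what the slide showed or what CLEVER implements. The function names `hits` and `_unit` and the toy link graph are hypothetical. The default of five rounds mirrors the convergence figure quoted on the "Pros" slide.

```python
import math


def _unit(scores):
    """Rescale scores so that their squares sum to 1 (no-op if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=5):
    """Basic hubs-and-authorities iteration.

    links maps each page to the list of pages it links to.
    Returns (hub_scores, authority_scores).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages linking in.
        new_auth = {page: 0.0 for page in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub update: sum the (new) authority scores of the pages linked to.
        new_hub = {
            page: sum(new_auth[dst] for dst in links.get(page, ()))
            for page in pages
        }
        auth = _unit(new_auth)
        hub = _unit(new_hub)

    return hub, auth


# Hypothetical link graph: three link-collection pages pointing at a, b, c.
links = {
    "hub1": ["a", "b"],
    "hub2": ["a", "b", "c"],
    "hub3": ["a"],
}

hub, auth = hits(links)
print("best authority:", max(auth, key=auth.get))
print("best hub:", max(hub, key=hub.get))
```

In this toy graph, page `a` receives the most links from well-connected pages and ends up as the top authority, while `hub2`, which points to every authority, ends up as the top hub.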

  13. Clever vs. Google • Google’s faster! • Clever also looks backward along links

  14. Pros • Rapid convergence (5 iterations for a root set of 3000 pages) • Independent of the initial H, A scores • Get info even before we actually crawl
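The independence from the initial H, A scores can be checked on a small example. The sketch below runs the basic hubs-and-authorities iteration from two different positive starting hub vectors and compares the resulting authority scores. The graph, the starting values, and the names `hits_authorities` and `_unit` are hypothetical. The sketch also assumes positive starting scores and a graph with a single dominant solution, which holds for this toy graph but is not guaranteed for every graph.

```python
import math


def _unit(scores):
    """Rescale scores so that their squares sum to 1 (no-op if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits_authorities(links, initial_hub, iterations=20):
    """Run the basic iteration from the given starting hub scores.

    Pages missing from initial_hub start with a hub score of 1.0.
    Returns the normalized authority scores.
    """
    pages = set(links) | {dst for targets in links.values() for dst in targets}
    hub = {page: initial_hub.get(page, 1.0) for page in pages}
    auth = {page: 0.0 for page in pages}
    for _ in range(iterations):
        # Authority update from the current hub scores.
        auth = {page: 0.0 for page in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub update from the new authority scores, then rescale both.
        hub = _unit(
            {page: sum(auth[dst] for dst in links.get(page, ())) for page in pages}
        )
        auth = _unit(auth)
    return auth


links = {
    "hub1": ["a", "b"],
    "hub2": ["a", "b", "c"],
    "hub3": ["a"],
}

uniform = hits_authorities(links, {})
skewed = hits_authorities(links, {"hub1": 10.0, "hub2": 0.1, "hub3": 3.0})

largest_gap = max(abs(uniform[page] - skewed[page]) for page in uniform)
print("largest difference between the two runs:", largest_gap)
```

Both runs settle on the same authority ranking even though the second one starts by heavily favoring `hub1`.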

  15. Segregation of web into clusters

  16. Cons • The underlying assumption – “Web links confer authority” – could be incorrect! Links may also be created for: • Navigation • Advertisement • Disapproval

  17. Cons • Ignores the anchor text • Not every page is necessarily a hub or an authority • Universally popular websites like Wikipedia will be an authority on almost everything • May return a general result for a narrow topic search

  18. What’s next?

  19. References • S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins. Hypersearching the Web. Scientific American, June 1999. • CLEVER project (http://www.almaden.ibm.com/projects/clever.shtml) • J. Kleinberg. Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998. • S. Brin, L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, Vol. 30, No. 1-7, pp. 107-117, 1998. • WordNet Project (http://wordnet.princeton.edu/)

  20. Group Discussion
