
Lecture 4

Lecture 4. CS492 Special Topics in Computer Science: Distributed Algorithms and Systems. “The PageRank Citation Ranking: Bringing Order to the Web”. L. Page, S. Brin, R. Motwani, T. Winograd, 1998. Origin of “Google”. Hostnames Active. http://news.netcraft.com. Googol 10^100


Presentation Transcript


  1. Lecture 4 CS492 Special Topics in Computer Science Distributed Algorithms and Systems

  2. “The PageRank Citation Ranking: Bringing Order to the Web” L. Page, S. Brin, R. Motwani, T. Winograd, 1998 Fall 2008 CS492

  3. Origin of “Google” (figure: active hostnames, source http://news.netcraft.com) • Googol • 10^100 • Motivation behind • Human-maintained indices such as Yahoo! • Explosive growth

  4. Design Goals of Google • Improved search quality • In 1997, only 1 of the top 4 commercial search engines found itself (returned its own page for a query on its own name) • High precision in finding relevant documents was necessary • Academic search engine research • Search engine technology went commercial: a black art • To build systems that a good number of people could use • To build an architecture to support novel research on large-scale Web data

  5. Weakness of Existing Approaches • Calculate similarities • Based on a flat, vector-space model of each page • Prone to cheating (Web spamming or search engine persuasion)

  6. Basic Idea of PageRank: exploit the topological structure of hypertextual systems

  7. Simple Example (figure: three-page graph with ranks A = 0.4, B = 0.2, C = 0.4)

  8. Related Work • Academic citation analysis • Similarities • Graph structure: paper = node and citation = link; web page = node and URL = link • “Node” authority independent of “node” content • Differences • Uniform unit of information (paper) versus great variability in quality, usage, citations, and length • Equal link weight versus variable importance: a backlink from Yahoo! vs. one from a friend

  9. Which Page Should Be Ranked Higher? (figure labels: JohnDoe, A, B)

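  10. Simple Expression $R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v}$, where $R(u)$ is the page rank of $u$, $B_u$ is the set of pages pointing at $u$, and $N_v$ is the out-degree of $v$. Question: role of $c$? Answer: it keeps the total rank of all web pages constant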
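
A minimal sketch of the simple expression, assuming the form R(u) = c · Σ R(v)/N_v over the pages v that point at u, as in the PageRank paper. The link structure A→B, A→C, B→C, C→A is an assumption chosen to match the ranks 0.4, 0.2, 0.4 of the simple example; the slides do not list the edges. Renormalizing after every pass plays the role of c.

```python
# Hypothetical link structure (an assumption; the slides only show the ranks).
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}


def simple_pagerank(links, iterations=100):
    """Iterate R(u) = c * sum(R(v) / N_v), choosing c so that the ranks sum to 1."""
    pages = sorted(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: 0.0 for p in pages}
        for v, targets in links.items():
            share = rank[v] / len(targets)  # v splits its rank evenly over its out-links
            for u in targets:
                new_rank[u] += share
        total = sum(new_rank.values())
        rank = {p: r / total for p, r in new_rank.items()}  # normalization = c
    return rank


ranks = simple_pagerank(links)
print({p: round(r, 3) for p, r in ranks.items()})
```

On this assumed graph the ranks settle close to A = 0.4, B = 0.2, C = 0.4. The sketch expects every page to appear as a key and to have at least one out-link.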

  11. Dangling links • Pages without outgoing pointers • Example: Pages not yet downloaded • Do not affect the calculation much • Remove them, calculate ranks, and add them back
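
The removal step for dangling links can be sketched as follows. The function name `remove_dangling` and the dictionary-of-lists graph format are assumptions made for illustration. Removing a page can leave its predecessors without out-links, so the sketch repeats until no dangling page remains; adding the pages back after the ranks are computed is not shown.

```python
def remove_dangling(links):
    """Repeatedly drop pages with no outgoing links.

    Returns (pruned graph, list of removed pages in removal order).
    """
    graph = {page: set(targets) for page, targets in links.items()}
    # Pages that appear only as link targets get an empty out-link set.
    for targets in list(graph.values()):
        for t in targets:
            graph.setdefault(t, set())
    removed = []
    while True:
        dangling = [p for p, targets in graph.items() if not targets]
        if not dangling:
            break
        for p in dangling:
            del graph[p]
            removed.append(p)
        # Drop links that pointed at the removed pages; this may create new dangling pages.
        for targets in graph.values():
            targets.difference_update(dangling)
    return graph, removed


# Hypothetical example: D has no out-links, so it is pruned before ranking.
pruned, removed = remove_dangling({"A": ["B", "C"], "B": ["C"], "C": ["A", "D"], "D": []})
print(pruned, removed)
```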

  12. Loop (figure: pages A, B, and C) Question: ranks of A, B, and C? Answer: infinite! (rank sink)

  13. Basic Algorithm (equation not recovered; symbols defined on the slide: page rank of $u$, set of pages pointing at $u$, out-degree of $v$, damping factor)
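
A minimal sketch of a damped update, assuming the common normalized form R(u) = (1 − d)/n + d · Σ R(v)/N_v with d = 0.85. The exact formula on the slide is not available, so this form, the function name `damped_pagerank`, and the example graph are all assumptions. The example graph is a rank sink: A and B link only to each other, and X links into the loop.

```python
def damped_pagerank(links, d=0.85, iterations=100):
    """Iterate R(u) = (1 - d)/n + d * sum(R(v) / N_v) over pages v that link to u.

    Assumes every page appears as a key in `links` and has at least one out-link.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same base share, regardless of its in-links.
        new_rank = {p: (1.0 - d) / n for p in pages}
        for v, targets in links.items():
            for u in targets:
                new_rank[u] += d * rank[v] / len(targets)
        rank = new_rank
    return rank


# Hypothetical rank-sink graph: the loop A <-> B has no way out.
sink_graph = {"X": ["A"], "A": ["B"], "B": ["A"]}
ranks = damped_pagerank(sink_graph)
print({p: round(r, 4) for p, r in ranks.items()})
```

With damping, X keeps the base share (1 − d)/n even though nothing links to it, and the loop cannot absorb all of the rank.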

  14. Matrix Representation (equations not recovered) Question: Where to start?

  15. Iterative Algorithm (equations not recovered) Question: Will it converge?
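
One way to picture the iteration and its stopping test, assuming the update has the form x ← xP for a row-stochastic matrix P (the slide's own equations are not available, so this form is an assumption). The matrix below is a hypothetical three-page example, and the uniform start vector is one possible choice. The loop stops when the L1 change between successive vectors falls below a tolerance.

```python
def power_iteration(P, tol=1e-10, max_iter=1000):
    """Iterate x <- x P until the L1 change is below tol; return (x, iterations used)."""
    n = len(P)
    x = [1.0 / n] * n  # uniform start vector (an assumed choice)
    for k in range(1, max_iter + 1):
        # Entry j of x P: rank flowing into page j from every page i.
        x_next = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]
        change = sum(abs(a - b) for a, b in zip(x_next, x))
        x = x_next
        if change < tol:
            return x, k
    return x, max_iter


# Hypothetical transition matrix: row i holds the probabilities of moving from page i.
P = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
x, steps = power_iteration(P)
print([round(v, 4) for v in x], steps)
```

Convergence from any start distribution is guaranteed when the underlying Markov chain is irreducible and aperiodic, which holds for this small example.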

  16. Example [LM04]

  17. Turn the Problem into a Markov Process [LM04]

  18. Evenly Split Rank of Dangling Links [LM04]
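
A sketch of the even split, assuming it means replacing the all-zero row of each dangling page with the uniform row 1/n so that every row of the transition matrix sums to 1. The helper name `transition_matrix` and the example adjacency matrix are hypothetical.

```python
def transition_matrix(adjacency):
    """Row-normalize a 0/1 adjacency matrix; rows of dangling pages become uniform 1/n."""
    n = len(adjacency)
    P = []
    for row in adjacency:
        out_degree = sum(row)
        if out_degree == 0:
            # Dangling page: spread its rank evenly over all n pages.
            P.append([1.0 / n] * n)
        else:
            P.append([value / out_degree for value in row])
    return P


# Hypothetical example: page 1 has no out-links.
adjacency = [
    [0, 1, 1],
    [0, 0, 0],
    [1, 0, 0],
]
P = transition_matrix(adjacency)
for row in P:
    print(row)
```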

  19. Final Solution: Eigenvector of P = steady-state rank
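
As a worked check of the steady-state idea, assume the three-page simple example has the links A→B, A→C, B→C, C→A (the edges are an assumption; the slides give only the ranks). With the corresponding row-stochastic $P$ and $\pi^T = (0.4, 0.2, 0.4)$, one multiplication leaves the vector unchanged, so it is a left eigenvector of $P$ for eigenvalue 1:

```latex
\pi^T P =
\begin{pmatrix} 0.4 & 0.2 & 0.4 \end{pmatrix}
\begin{pmatrix} 0 & 0.5 & 0.5 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}
= \begin{pmatrix} 0.4 & 0.2 & 0.4 \end{pmatrix}
= \pi^T
```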

  20. Spam Rank [BGS05]

  21. Questions • Where to start? • Find a nondegenerate start vector • What if there are two pages that point to each other and to no one else, and there is a page that points to one of them? • Role of the damping factor: guarantees no rank sink

  22. References [BP98] Sergey Brin, Lawrence Page, “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” Computer Networks and ISDN Systems, Vol. 30, 1998. [BGS05] Monica Bianchini, Marco Gori, Franco Scarselli, “Inside PageRank,” ACM Transactions on Internet Technology, Vol. 5, No. 1, Feb. 2005. [LM04] Amy N. Langville, Carl Meyer, “Deeper Inside PageRank,” Internet Mathematics, Vol. 1, No. 3, 2004. [K99] Jon Kleinberg, “Authoritative Sources in a Hyperlinked Environment,” Journal of the ACM, Vol. 46, No. 5, 1999.
