Social Networks 101

Social Networks 101. Prof. Jason Hartline and Prof. Nicole Immorlica. Lecture Ten: The web and PageRank.

Presentation Transcript


  1. Social Networks 101 Prof. Jason Hartline and Prof. Nicole Immorlica

  2. Lecture Ten: The web and PageRank.

  3. The internet vs the web. The internet: Nodes = machines, Edges = wires. The world wide web: Nodes = webpages, Edges = hyperlinks.

  4. The web is a directed graph. Example pages and their hyperlinks: Cows links to Dairy and Meat; Dairy links to Cheese and Milk; Meat links to Cow and Lamb.

  5. Directed graphs. [Figure: nodes a and b.] Edge (a,b) = edge from a to b.
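
One way to hold such a graph in code is an adjacency mapping from each page to the pages it links to. The sketch below is only an illustration: it assumes the page names from slide 4, read as "Cows links to Dairy and Meat" and so on, and the helper names `has_edge` and `edges` are hypothetical.

```python
# Hypothetical adjacency mapping: page -> list of pages it links to.
# The link structure is an assumed reading of the slide 4 figure.
web = {
    "Cows": ["Dairy", "Meat"],
    "Dairy": ["Cheese", "Milk"],
    "Meat": ["Cow", "Lamb"],
    "Cheese": [], "Milk": [], "Cow": [], "Lamb": [],
}

def has_edge(graph, a, b):
    """Return True if there is a directed edge (a, b), i.e. a links to b."""
    return b in graph.get(a, [])

def edges(graph):
    """List every directed edge as an (a, b) pair."""
    return [(a, b) for a, targets in graph.items() for b in targets]
```

Because edges are directed, `has_edge(web, "Cows", "Dairy")` holds while `has_edge(web, "Dairy", "Cows")` does not.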

  6. Directed paths. [Figure: nodes v1, v2, v3, v4.] Path (v1, v2, v3, v4). Definition: A directed path from v1 to vk is a sequence of nodes (v1, …, vk) such that for every adjacent pair vi and vi+1, there's an edge from vi to vi+1.
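
The definition translates almost directly into a check over adjacent pairs. This is a minimal sketch; the function name `is_directed_path` and the four-node example graph are assumptions, since the edges of the slide's figure are not recoverable from the transcript.

```python
def is_directed_path(graph, nodes):
    """True if every adjacent pair (v_i, v_{i+1}) in nodes is an edge of graph."""
    return all(b in graph.get(a, ()) for a, b in zip(nodes, nodes[1:]))

# Hypothetical graph containing only the edges of the path (v1, v2, v3, v4).
graph = {"v1": ["v2"], "v2": ["v3"], "v3": ["v4"], "v4": []}
```

Here `is_directed_path(graph, ["v1", "v2", "v3", "v4"])` is true, while the reversed sequence is not a directed path.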

  7. Strongly connected components. [Figure: one graph that is not strongly connected, one that is.] Definition: A strongly connected component is a maximal subset of nodes {v1, …, vk} such that for every ordered pair vi and vj in the set, there's a directed path from vi to vj.
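
A simple (not the most efficient) way to test whether a whole graph is strongly connected is to check that every node can reach every other node. The names `reachable` and `is_strongly_connected` and the two example graphs are assumptions for illustration, and every node is assumed to appear as a key.

```python
def reachable(graph, start):
    """Set of nodes reachable from start along directed paths (includes start)."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def is_strongly_connected(graph):
    """True if every node can reach every other node."""
    nodes = set(graph)
    return all(reachable(graph, v) >= nodes for v in nodes)

cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}  # strongly connected
chain = {"a": ["b"], "b": ["c"], "c": []}     # not: c cannot reach a
```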

  8. What does the web look like? Strongly connected component: 56 million nodes.

  9. What does the web look like? [Figure labels: strongly connected component, In, Out, tubes, tendrils, disconnected components.]

  10. Searching the web Q. How can Google answer your questions without understanding them? A. It uses the hyperlink structure.

  11. Basic ideas • A link to a page is an endorsement of that page’s quality. • Links from high quality pages are better than links from low quality pages.

  12. First attempt Initialize: Each page has equal rank (“tokens”). Repeat: Each page divides its tokens equally among all out-going links.

  13. Initialization. [Figure: five nodes, each holding 1/5 of the tokens.]

  14. First round. [Figure: the five nodes now hold 4/15, 3/15, 1/15, 3/15 and 4/15 of the tokens.]
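
A round of the first attempt can be sketched as follows. The five-node graph is hypothetical: the figure's actual edges are not in the transcript, and this one was chosen only because one round on it happens to produce the same fractions as slide 14. The sketch assumes every page has at least one out-going link.

```python
from fractions import Fraction

def token_round(graph, tokens):
    """One round: each page splits its tokens equally among its out-going links."""
    new = {page: Fraction(0) for page in graph}
    for page, links in graph.items():
        for target in links:
            new[target] += tokens[page] / len(links)
    return new

# Hypothetical graph (not taken from the slides' figure).
graph = {"A": ["B"], "B": ["A", "C", "E"], "C": ["A"], "D": ["E"], "E": ["D"]}
tokens = {page: Fraction(1, len(graph)) for page in graph}  # equal start: 1/5 each
after_one = token_round(graph, tokens)
# after_one: A = 4/15, B = 3/15, C = 1/15, D = 3/15, E = 4/15
```

Exact fractions are used so the token total stays exactly 1 from round to round.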

  15. What could go wrong? Some node eventually collects all tokens.

  16. What could go wrong? Some node eventually collects all tokens.
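
The failure is easy to reproduce on a small example. In this hypothetical three-page chain, page "c" links only to itself, so under the first-attempt rule it never gives tokens away; the graph and the name `token_round` are assumptions for illustration.

```python
from fractions import Fraction

def token_round(graph, tokens):
    """One round: each page splits its tokens equally among its out-going links."""
    new = {page: Fraction(0) for page in graph}
    for page, links in graph.items():
        for target in links:
            new[target] += tokens[page] / len(links)
    return new

# Hypothetical chain a -> b -> c, where c links only to itself.
graph = {"a": ["b"], "b": ["c"], "c": ["c"]}
tokens = {page: Fraction(1, 3) for page in graph}
for _ in range(2):
    tokens = token_round(graph, tokens)
# After two rounds every token sits on page "c".
```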

  17. PageRank. Initialize: Each page has equal rank ("tokens"). Repeat: Each page divides (1) an s fraction of its tokens equally among all out-going links, and (2) a (1-s) fraction equally among all nodes.
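
The rule on this slide can be sketched as below. This is a minimal illustration, not a production implementation: the value s = 0.85 is an assumption (the slides do not fix s), the function names are hypothetical, and every page is assumed to have at least one out-going link.

```python
def pagerank_round(graph, tokens, s):
    """One round: each page sends an s fraction of its tokens along its
    out-going links (split equally) and a (1 - s) fraction equally to all nodes."""
    n = len(graph)
    new = {page: 0.0 for page in graph}
    for page, links in graph.items():
        for target in links:
            new[target] += s * tokens[page] / len(links)
        for target in graph:
            new[target] += (1 - s) * tokens[page] / n
    return new

def pagerank(graph, s=0.85, rounds=100):
    """Start from equal tokens and repeat the round a fixed number of times."""
    tokens = {page: 1.0 / len(graph) for page in graph}
    for _ in range(rounds):
        tokens = pagerank_round(graph, tokens, s)
    return tokens

# Hypothetical chain a -> b -> c, where c links only to itself.
graph = {"a": ["b"], "b": ["c"], "c": ["c"]}
ranks = pagerank(graph)
```

On this chain, page "c" still ends up with the most tokens, but the (1-s) share keeps "a" and "b" from dropping to zero.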

  18. Important properties of PageRank • It converges (the PageRank of a page is the number of tokens it owns in the limit). • The initialization doesn’t matter.
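
Both properties can be observed numerically, though a numerical check is an illustration and not a proof. The sketch below assumes s = 0.85 and a hypothetical three-page graph, and runs the PageRank rule from two different starting allocations.

```python
def iterate(graph, tokens, s=0.85, rounds=200):
    """Repeat the PageRank round from a given starting allocation of tokens."""
    n = len(graph)
    for _ in range(rounds):
        # Every node receives an equal share of the (1 - s) fraction of all tokens.
        new = {page: (1 - s) * sum(tokens.values()) / n for page in graph}
        # The s fraction of each page's tokens is split among its out-going links.
        for page, links in graph.items():
            for target in links:
                new[target] += s * tokens[page] / len(links)
        tokens = new
    return tokens

# Hypothetical graph in which every page has at least one out-going link.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
equal_start = iterate(graph, {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3})
skewed_start = iterate(graph, {"a": 1.0, "b": 0.0, "c": 0.0})
gap = max(abs(equal_start[p] - skewed_start[p]) for p in graph)
```

After 200 rounds the two runs agree to within floating-point noise, so `gap` is essentially zero.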

  19. Random walks and PageRank Randy browses the web randomly.

  20. Start at an arbitrary node. With prob. s, follow a random out-going link; with prob. (1-s), jump to a random node. Repeat forever and ever.
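
Randy's walk can be simulated directly. This is a rough sketch under stated assumptions: s = 0.85, a fixed random seed, a finite number of steps in place of "forever", and a hypothetical three-page graph in which every page has an out-going link.

```python
import random

def random_walk(graph, s=0.85, steps=200_000, seed=0):
    """Simulate the walk and return the fraction of steps spent on each page."""
    rng = random.Random(seed)
    pages = list(graph)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)  # arbitrary starting node
    for _ in range(steps):
        if rng.random() < s:
            current = rng.choice(graph[current])  # follow a random out-going link
        else:
            current = rng.choice(pages)           # jump to a random node
        visits[current] += 1
    return {page: count / steps for page, count in visits.items()}

# Hypothetical chain a -> b -> c, where c links only to itself.
graph = {"a": ["b"], "b": ["c"], "c": ["c"]}
freq = random_walk(graph)
```

For this chain the token rule with s = 0.85 settles at 0.05, 0.0925 and 0.8575 for "a", "b" and "c", and the simulated visit frequencies should land close to those values.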

  21. Important properties of Randy's walk: 1. It converges: the probability Randy is on any given page approaches a fixed number in the limit. 2. It doesn't matter where he starts.

  22. Randy’s walk = PageRank The probability Randy is on a given page is proportional to that page’s PageRank.

  23. Extensions: anchor text, click probabilities, link/click spam.

  24. Next time TBA
