
9 Algorithms: PageRank


adamma




Presentation Transcript


  1. 9 Algorithms: PageRank

  2. Ranking • After matching pages to a query, the results still have to be ranked:

  3. Index Based Ranking • Strategies we could (do) use: • Frequency • Position • Metadata

  4. Missing Ingredient • Index lacks inter-page information (how pages link to one another)

  5. Link Quality • Simply rewarding pages for having more links is easy to abuse (figure: spam link pages)

  6. Link Quality • Not all links are equal • Who do you trust? • CS Prof • World Famous Chef

  7. Identifying Authority • Links into a page give it authority • Page value = sum of authorities of pages linking to it
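A minimal sketch of the slide's rule, assuming we already know an authority score for each linking page. The page names and scores below are made up for illustration. It shows how one link from a highly trusted source can outweigh several weaker ones:

```python
# Hypothetical authority scores for pages that link to recipe pages.
# These numbers are illustrative assumptions, not values from the slides.
authority = {
    "cs_prof": 1.0,
    "world_famous_chef": 5.0,
    "random_blog": 0.5,
}

# links_into[p] lists the pages that link INTO page p (hypothetical).
links_into = {
    "bobs_recipe": ["cs_prof", "random_blog"],
    "alices_recipe": ["world_famous_chef"],
}

def page_value(page: str) -> float:
    """Page value = sum of the authorities of the pages linking to it."""
    return sum(authority[src] for src in links_into.get(page, []))

for page in links_into:
    print(page, page_value(page))  # bobs_recipe 1.5, alices_recipe 5.0
```

A single link from the chef gives Alice's page more value than two links give Bob's page.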

  8. Issues • Spam Links • Discourage with negative weight (figure: spam link pages, each link weighted -1)
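The same idea can be extended with the slide's negative weights. This sketch assumes spam pages have already been identified somehow; the flagging step, page names and scores are all hypothetical. Each link from a flagged page subtracts 1 instead of adding authority:

```python
# Hypothetical graph: the pages that link INTO "target".
links_into = {"target": ["spam1", "spam2", "spam3", "good_site"]}

# Assumption: these pages were flagged as spam by some earlier step
# that is outside the scope of this sketch.
flagged_spam = {"spam1", "spam2", "spam3"}
base_authority = {"spam1": 1.0, "spam2": 1.0, "spam3": 1.0, "good_site": 2.0}

def link_weight(src: str) -> float:
    # A link from a flagged spam page counts -1 instead of its authority.
    return -1.0 if src in flagged_spam else base_authority[src]

def page_value(page: str) -> float:
    return sum(link_weight(src) for src in links_into.get(page, []))

print(page_value("target"))  # -1.0  (three spam links at -1, plus 2.0)
```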

  9. Issues • Spam Links • Discourage with negative weight • Cycles:

  10. Issues • Spam Links • Discourage with negative weight • Cycles:

  11. Issues • Spam Links • Discourage with negative weight • Cycles: …
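This sketch shows why cycles trouble the naive "sum of authorities" rule. It uses a hypothetical three-page graph (pages A, B, C are made up) in which every link passes on the linking page's full value. Authority keeps flowing back around the cycle, so the total never settles:

```python
# Hypothetical graph with a cycle A -> B -> C -> A plus an extra link A -> C.
# out_links[p] lists the pages that p links to.
out_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

def naive_round(values: dict) -> dict:
    """One round of 'value = sum of values of pages linking to it'."""
    new = {page: 0.0 for page in values}
    for src, targets in out_links.items():
        for dst in targets:
            new[dst] += values[src]  # full value is copied along every link
    return new

values = {"A": 1.0, "B": 1.0, "C": 1.0}
totals = []
for _ in range(6):
    values = naive_round(values)
    totals.append(sum(values.values()))

print(totals)  # [4.0, 5.0, 7.0, 9.0, 12.0, 16.0]
```

Page A hands its full value to both B and C, and that value eventually returns to A through the cycle. Each round therefore adds value instead of redistributing it.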

  12. Random Surfer • Simulating a web surfing session • Start at a random page • At each page, by chance, either • Pick a random link on the page to follow • Or jump to a completely random page
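This is a minimal Monte Carlo sketch of the random surfer, under several assumptions: the four-page web is hypothetical, the jump probability of 0.15 is a commonly quoted value rather than one given on the slides, and one long session stands in for many separate ones. It counts visits and reports them as percentages. With enough steps, the percentages stop moving much:

```python
import random
from collections import Counter

# Hypothetical four-page web; out_links[p] lists the pages p links to.
out_links = {
    "home": ["recipes", "about"],
    "recipes": ["home", "chef"],
    "about": ["home"],
    "chef": ["recipes"],
}

def random_surfer(out_links, steps, jump_prob=0.15, seed=0):
    """Simulate one long surfing session; return each page's share of visits.

    jump_prob is an assumed value: the slides only say the surfer has
    "a chance" to jump to a completely random page.
    """
    rng = random.Random(seed)
    pages = list(out_links)
    page = rng.choice(pages)              # start at a random page
    visits = Counter()
    for _ in range(steps):
        visits[page] += 1
        links = out_links[page]
        if not links or rng.random() < jump_prob:
            page = rng.choice(pages)      # jump to a completely random page
        else:
            page = rng.choice(links)      # follow a random link on this page
    return {p: visits[p] / steps for p in pages}

shares = random_surfer(out_links, steps=100_000)
for page, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{page:8s} {share:6.1%}")
```

Pages that many others link to ("home", "recipes") collect the largest share of visits.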

  13. Results • Results of many random sessions:

  14. Results • Expressed as percentages, results stabilize • Law of large numbers

  15. Cycle Buster • Random surfer not fazed by cycles:

  16. Random Surfer In Use • The recipe pages visited by random surfers:

  17. Simulator • PageRank Simulator: http://caccio.blogdns.net/software/pagerank-simulator

  18. The Real Math • Markov Chains • Set of states • Each state has a probability of leading to each other state • Represent as a matrix
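For reference, here is the commonly cited normalized PageRank equation. It is not written out on the slides and expresses this Markov chain: with probability $d$ (the damping factor, usually quoted as 0.85) the surfer follows one of the $L(q)$ outgoing links on the current page $q$; otherwise it jumps to one of the $N$ pages uniformly at random. The sketch assumes every page has at least one outgoing link. The ranks $\mathbf{r}$ are then the chain's stationary distribution, i.e. the long-run percentages the random surfer settles on:

```latex
PR(p) = \frac{1-d}{N} + d \sum_{q \to p} \frac{PR(q)}{L(q)}

\mathbf{r} = \Big( d\,M + \frac{1-d}{N}\,\mathbf{1}\mathbf{1}^{\top} \Big)\,\mathbf{r},
\qquad
M_{pq} =
\begin{cases}
  1/L(q) & \text{if } q \to p \\
  0      & \text{otherwise}
\end{cases},
\qquad
\mathbf{1}^{\top}\mathbf{r} = 1
```

Because $\mathbf{1}^{\top}\mathbf{r} = 1$, the matrix form reduces to $\mathbf{r} = d\,M\mathbf{r} + \frac{1-d}{N}\mathbf{1}$, which is the per-page equation written for all pages at once.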

  19. Excel Simulation • Three pages:
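The slide's spreadsheet did not survive extraction, so here is a minimal spreadsheet-style sketch on a hypothetical three-page web. The links A→B, A→C, B→C and C→A are assumptions, as is the damping factor d = 0.85. Each loop iteration plays the role of one spreadsheet row. Every page receives (1 − d)/N, plus d times its share of each linking page's current rank:

```python
# Hypothetical three-page web; out_links[p] lists the pages p links to.
# Every page has at least one outgoing link.
out_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
d = 0.85                      # assumed damping factor: chance of following a link
pages = sorted(out_links)
n = len(pages)

rank = {p: 1 / n for p in pages}   # row 0: every page starts equal
for row in range(1, 51):           # each iteration is one spreadsheet row
    new = {p: (1 - d) / n for p in pages}
    for src, targets in out_links.items():
        share = rank[src] / len(targets)   # split src's rank across its links
        for dst in targets:
            new[dst] += d * share
    rank = new
    if row <= 3 or row == 50:
        print(row, {p: round(r, 4) for p, r in rank.items()})
```

The ranks sum to 1 on every row and settle after a few dozen rows (roughly A ≈ 0.388, B ≈ 0.215, C ≈ 0.397), even though the graph contains the cycle A → B → C → A.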

  20. Limitations • Still have issues/room for growth • Link Spam • Context of link • Where the link is on the page • "Bob's recipe is terrible" vs "Bob's recipe is great" • Lack of semantic knowledge • A page's authority should not be the same for all domains

  21. Power • Controlling search is power: http://www.bitsbook.com/ "If you're not paying for the product, you are the product."
