
Study of Page Rank and HITS

Study of Page Rank and HITS. By: Ankit Sethi, FSU. How do search engines work? Motive and goals of the study: becoming familiar with Page Rank and HITS, two popular algorithms used by search engines to rank web pages.



Presentation Transcript


  1. Study of Page Rank and HITS • By: Ankit Sethi, FSU

  2. How do Search Engines work?

  3. Motive • Goals of the study: • Become familiar with Page Rank and HITS, two popular algorithms used by search engines to rank web pages. • Provide a future direction in the area of efficient algorithms for ranking web pages.

  4. Do we really need Page Ranking? • The name “Page Rank” comes from Larry Page. • In earlier days, search engines ranked pages by keyword density, but is that a good measure? • The number and importance of the links pointing to a given web page determine its Page Rank.

  5. Importance of Page Rank • The Page Rank algorithm computes a web page’s importance. • Larry Page concluded that the pages with the most links pointing to them are the most important.

  6. Link Structure of the Web • Backlinks and forward links: • A and B are C’s backlinks (both link to C) • C is A’s and B’s forward link • Generally, a web page is important if it has many backlinks, but this is not always true!
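The backlink/forward-link relationship above can be sketched in a few lines. This is only an illustration: the dictionary `forward_links` and the helper `backlinks_of` are hypothetical names, and the graph is the three-page A, B, C example from the slide.

```python
# Toy graph from the slide: A and B each link to C (names are illustrative).
forward_links = {
    "A": ["C"],
    "B": ["C"],
    "C": [],
}

def backlinks_of(forward):
    """Invert a forward-link map into a backlink map."""
    back = {page: [] for page in forward}
    for source, targets in forward.items():
        for target in targets:
            back.setdefault(target, []).append(source)
    return back

backlinks = backlinks_of(forward_links)
print(backlinks["C"])  # ['A', 'B']
```

Storing forward links and deriving backlinks on demand mirrors how a crawler sees the web: a page lists its own outgoing links, while its backlinks have to be collected from other pages.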

  7. Simplified Algorithm of Page Rank • Example 1 • Suppose there are 4 web pages: A, B, C and D, each starting with a Page Rank of 0.25. • Assume B, C and D each link only to A. Each link then transfers 0.25 PR to A. • PR(A) = PR(B) + PR(C) + PR(D) = 0.75 • Remember that 0.25 is just an assumed starting value (1/N with N = 4 pages)!

  8. Page Rank Algo Contd.. • Example 2 • Now, let's say page B links to pages C and A, page C links to page A, and page D links to all 3 other pages. • PR(A) = PR(B)/2 + PR(C)/1 + PR(D)/3 • Page B contributes 0.25/2 = 0.125 to each of pages A and C, page C still contributes 0.25 to A, and page D contributes 0.25/3 ≈ 0.083 to A. • So the Page Rank of A = 0.125 + 0.25 + 0.083 = 0.458
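Example 2 can be checked with a short sketch. The names `links`, `rank` and `simplified_rank` are illustrative, and the code performs only a single update step without damping, exactly as in the slide's arithmetic.

```python
# Example 2 from the slide: B -> {A, C}, C -> {A}, D -> {A, B, C}.
links = {
    "A": [],
    "B": ["A", "C"],
    "C": ["A"],
    "D": ["A", "B", "C"],
}

# Every page starts with the same rank, 1/N = 0.25.
rank = {page: 1 / len(links) for page in links}

def simplified_rank(page, links, rank):
    """Sum PR(v)/L(v) over every page v that links to `page` (no damping)."""
    return sum(rank[v] / len(out) for v, out in links.items() if page in out)

pr_a = simplified_rank("A", links, rank)
print(round(pr_a, 3))  # 0.458
```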

  9. Page Rank: Damping Factor • Damping factor d: the probability that a user keeps clicking links; with probability 1 - d the user stops and requests another random page. It was originally set to 0.85 (85%). • PR(A) = (1 - d)/N + d (PR(B)/L(B) + PR(C)/L(C) + PR(D)/L(D) + …) • where N is the number of documents in the collection and L(X) is the number of outbound links on page X.
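A minimal sketch of the damped formula, applied repeatedly until the values settle, might look like the following. The function name `pagerank`, the fixed iteration count and the four-page graph are assumptions made for illustration; the sketch also assumes every page has at least one outbound link, since pages without outbound links need extra handling that the slides do not cover.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(p) = (1 - d)/N + d * sum(PR(v)/L(v)) over pages v linking to p."""
    n = len(links)
    rank = {page: 1 / n for page in links}  # uniform start, 1/N each
    for _ in range(iterations):
        new_rank = {}
        for page in links:
            incoming = sum(
                rank[v] / len(out) for v, out in links.items() if page in out
            )
            new_rank[page] = (1 - d) / n + d * incoming
        rank = new_rank
    return rank

# Hypothetical graph in which every page has at least one outbound link.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(value, 3))
```

Page D has no backlinks in this graph, so its rank stays at the floor value (1 - d)/N, while C, which collects links from three pages, ends up on top.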

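  10. Page rank: Matrix form • R is the solution of the equation R = (1 - d)/N · 1 + d · M · R, where R is the column vector of Page Rank values, 1 is the all-ones column vector and M is the matrix with entries M[i][j] = l(pi, pj). • Adjacency function: l(pi, pj) = 0 if page pj doesn’t link to pi; otherwise 1/L(pj), so that each column of M sums to 1, matching the division by L in the damping-factor formula.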
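The matrix view can be sketched in plain Python as follows. It assumes the adjacency function is normalized by the number of outbound links, l(pi, pj) = 1/L(pj) when pj links to pi, so that it agrees with the damping-factor formula; the graph and the names `M`, `step` and `residual` are illustrative only.

```python
pages = ["A", "B", "C", "D"]
# Hypothetical graph; every page has at least one outbound link.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
d = 0.85
n = len(pages)

# M[i][j] = l(p_i, p_j): 1/L(p_j) if p_j links to p_i, else 0.
M = [
    [(1 / len(links[pj]) if pi in links[pj] else 0.0) for pj in pages]
    for pi in pages
]

def step(R):
    """One application of R <- (1 - d)/N * 1 + d * M * R."""
    return [
        (1 - d) / n + d * sum(M[i][j] * R[j] for j in range(n))
        for i in range(n)
    ]

R = [1 / n] * n
for _ in range(200):
    R = step(R)

# At a solution, applying the equation once more leaves R unchanged.
residual = max(abs(a - b) for a, b in zip(R, step(R)))
print(dict(zip(pages, (round(x, 3) for x in R))), residual < 1e-9)
```

Repeatedly applying the equation is the simplest way to approach its solution; a linear solver would also work, but the iteration maps directly onto the per-page formula.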

  11. Page Rank Summary • Query Independent • A global ranking of all web pages based on their locations in the web graph structure • Uses information which is external to the web pages – backlinks • Backlinks from important pages are more significant than backlinks from average pages

  12. Interesting Points about Page Rank • Aaron Wall: “Page Rank is certainly important to driving indexing, but it [is] nowhere near as important as it once was in terms of delivering top rankings…” • Do sites have Page Rank? • No, they don’t. It applies to individual pages. • It is just one of several algorithms used by Google for ranking web pages!

  13. HITS Algorithm • Stands for Hyperlink-Induced Topic Search. • Used by Twitter • 1. Authorities are pages containing useful information, e.g. course home pages, home pages of auto manufacturers • 2. Hubs are pages that link to authorities, e.g. a course bulletin, a list of US auto manufacturers

  14. HITS Contd.. • A good hub links to many good authorities • A good authority is linked from many good hubs • Authority Update: set each node’s authority score to the sum of the hub scores of all nodes that point to it. • Hub Update: set each node’s hub score to the sum of the authority scores of all nodes that it points to.

  15. Hub Score and Authority Score • Start with each node having a hub score and an authority score of 1. • Run the Authority update rule • Run the Hub update rule • Normalize: divide each authority score by the square root of the sum of squares of all authority scores, and each hub score by the square root of the sum of squares of all hub scores. • Repeat the update and normalization steps as many times as needed
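The steps above can be sketched as follows. This is an illustrative version rather than a production implementation: the function name `hits`, the fixed iteration count and the tiny auto-manufacturer graph are all assumptions.

```python
from math import sqrt

def hits(links, iterations=20):
    """Return (hub, authority) scores for a graph given as page -> linked pages."""
    nodes = list(links)
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of the nodes pointing to each node.
        auth = {t: sum(hub[v] for v in nodes if t in links[v]) for t in nodes}
        # Hub update: sum of authority scores of the nodes each node points to.
        hub = {v: sum(auth[t] for t in links[v]) for v in nodes}
        # Normalize each score vector by the square root of its sum of squares.
        a_norm = sqrt(sum(x * x for x in auth.values())) or 1.0
        h_norm = sqrt(sum(x * x for x in hub.values())) or 1.0
        auth = {t: x / a_norm for t, x in auth.items()}
        hub = {v: x / h_norm for v, x in hub.items()}
    return hub, auth

# Hypothetical graph: two list pages (hubs) linking to manufacturer home pages.
graph = {
    "list1": ["ford", "toyota"],
    "list2": ["toyota"],
    "ford": [],
    "toyota": [],
}
hub, auth = hits(graph)
print(max(hub, key=hub.get), max(auth, key=auth.get))  # list1 toyota
```

The `or 1.0` guard only avoids a division by zero for graphs without any links; it is a convenience of this sketch, not part of the procedure described on the slide.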

  16. Interesting Points about HITS! • Query Dependent • Executed at query time, and not at indexing time • Not widely used • Computes two scores per document
