
Page Ranking Techniques In Search Engines


rianna

Presentation Transcript


  1. Page Ranking Techniques In Search Engines

  2. Introduction • Need: the need for search engines keeps increasing, and search results should be ordered by relevancy and importance. • What is page ranking?

  3. Algorithms • HITS (Hyperlink-Induced Topic Search), e.g. AltaVista • PageRank, e.g. Google

  4. Definition – PageRank. We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor, which can be set between 0 and 1; we usually set d to 0.85. … C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows: PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)). Ref: Sergey Brin and Lawrence Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine", http://www-db.stanford.edu/~backrub/google.html
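The formula can be sketched in Python as a repeated update over a small link graph. This is a minimal sketch, not Google's implementation: the `pagerank` function and the `links` dictionary (each page mapped to the list of pages it links to, so C(T) is the length of that list) are hypothetical names chosen for illustration, and each iteration recomputes every page from the previous iteration's values.

```python
def pagerank(links, d=0.85, iterations=100, initial=1.0):
    """Repeatedly apply PR(A) = (1-d) + d * sum(PR(T)/C(T)) to every page A.

    `links` maps each page to the list of pages it links to, so
    C(T) = len(links[T]). Pages that only appear as link targets are included.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = dict.fromkeys(pages, initial)
    for _ in range(iterations):
        # Sum PR(T)/C(T) over every page T that links to each target page.
        incoming = dict.fromkeys(pages, 0.0)
        for source, targets in links.items():
            for target in targets:
                incoming[target] += pr[source] / len(targets)
        # Apply the damping factor to get the next iteration's values.
        pr = {page: (1 - d) + d * incoming[page] for page in pages}
    return pr


# Two pages that point to each other, as in the next slides.
print(pagerank({"A": ["B"], "B": ["A"]}))
```

For this two-page graph, both values approach 1.0 whatever initial value is used.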

  5. How to use the formula: e.g. two pages, A and B, that point to each other (A → B and B → A).

  6. Start with PR(A) = PR(B) = 1: PR(A) = (1-d) + d * (PR(B)/C(B)) = (1-0.85) + 0.85 * (1/1) = 1; PR(B) = (1-d) + d * (PR(A)/C(A)) = (1-0.85) + 0.85 * (1/1) = 1. The values do not change, so they are already stable.

  7. Let's start with PR(A) = PR(B) = 10. After the 1st iteration: PR(A) = (1-d) + d*(PR(B)/C(B)) = 0.15 + 0.85 * (10/1) = 8.65; PR(B) = (1-d) + d*(PR(A)/C(A)) = 0.15 + 0.85 * (8.65/1) ≈ 7.50 (PR(B) already uses the new value of PR(A)).

  8. After the 2nd iteration: PR(A) = (1-d) + d*(PR(B)/C(B)) = 0.15 + 0.85 * (7.5025/1) ≈ 6.527; PR(B) = (1-d) + d*(PR(A)/C(A)) = 0.15 + 0.85 * (6.527/1) ≈ 5.698. And so on… until when?
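The hand calculations on slides 6 to 8 can be replayed with a short loop. This is a sketch; `two_page_iterations` is a hypothetical helper, and it follows the slides in computing PR(B) from the PR(A) value produced in the same iteration.

```python
def two_page_iterations(start, d=0.85, steps=2):
    """Repeat the slides' calculation for pages A and B that point to each other.

    C(A) = C(B) = 1 because each page has exactly one outbound link. As on the
    slides, PR(B) is computed from the PR(A) value produced in the same iteration.
    Returns a list of (PR(A), PR(B)) pairs, one per iteration.
    """
    c_a = c_b = 1
    pr_a = pr_b = start
    history = []
    for _ in range(steps):
        pr_a = (1 - d) + d * (pr_b / c_b)
        pr_b = (1 - d) + d * (pr_a / c_a)
        history.append((pr_a, pr_b))
    return history


for start in (1.0, 10.0):
    for i, (a, b) in enumerate(two_page_iterations(start), start=1):
        print(f"start {start}: after iteration {i}, PR(A) = {a:.4f}, PR(B) = {b:.4f}")
```

Starting from 1.0 the values stay at 1.0. Starting from 10.0, the first iteration gives 8.65 and 7.5025 and the second gives about 6.5271 and 5.6981, which matches the slides up to rounding.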

  9. Answer: the iterations should be repeated until the PR values converge. In this example, that is when PR(A) = PR(B) = 1, since 1 is the only value x that satisfies x = 0.15 + 0.85 * x. Thus we can start with any PR values and repeat the iterations until they converge, i.e. until they no longer change significantly.
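"Don't change too much" can be made concrete with a tolerance. In this sketch the helper name, the tolerance of 1e-6 and the cap of 1000 iterations are illustrative assumptions, not values from the slides; the loop stops once neither PR value moves by more than the tolerance.

```python
def iterate_until_converged(start, d=0.85, tolerance=1e-6, max_iterations=1000):
    """Iterate the two-page example until neither value changes by more than `tolerance`.

    Returns the final PR(A), PR(B) and the number of iterations used.
    """
    pr_a = pr_b = start
    for iteration in range(1, max_iterations + 1):
        new_a = (1 - d) + d * pr_b   # C(B) = 1
        new_b = (1 - d) + d * new_a  # C(A) = 1; uses the fresh PR(A), as on the slides
        converged = abs(new_a - pr_a) < tolerance and abs(new_b - pr_b) < tolerance
        pr_a, pr_b = new_a, new_b
        if converged:
            return pr_a, pr_b, iteration
    return pr_a, pr_b, max_iterations


for start in (0.0, 1.0, 10.0, 1000.0):
    a, b, n = iterate_until_converged(start)
    print(f"start {start:>7}: PR(A) = {a:.4f}, PR(B) = {b:.4f} after {n} iterations")
```

Whatever the starting value, both pages end up at 1.0; only the number of iterations needed changes.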

  10. Difference between the result of the PR calculation and Google toolbar values.

  11. Examples. Assumption: we take the initial PR value of each page as 1.0.

  12. Example 1 (pages A and B, with no links between them): PR(A) = (1-d) + d (0) = 0.15; PR(B) = (1-d) + d (0) = 0.15. For practicing examples on PageRank, use the calculator: www.webworkshop.net/pagerank_calculator.php?lnks=2,10,15&iblprs=0.15,0.15,0.15,0.15&pgnms=&pgs=2&initpr=1&its=100&type=simple

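  13. Example 2 (B links to A): in the first iteration, PR(A) = (1-d) + d (PR(B)/C(B)) = 0.15 + 0.85 (1/1) = 1 and PR(B) = (1-d) + d (0) = 0.15. Because PR(B) stays at 0.15, later iterations give PR(A) = 0.15 + 0.85 * 0.15 = 0.2775. Dangling links are links that point to pages that have no outbound links (here, B → A). Orphan pages are pages that have no inbound links (here, B).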
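Examples 1 and 2 can be checked with the same iterative formula. This sketch uses hypothetical helper names (`pagerank`, `orphan_pages`, `dangling_links`), starts every page at 1.0 as assumed on slide 11, and follows the simple formula: a page with no outbound links passes its PR to nobody.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate the slides' PageRank formula, starting every page at 1.0."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = dict.fromkeys(pages, 1.0)
    for _ in range(iterations):
        incoming = dict.fromkeys(pages, 0.0)
        for source, targets in links.items():
            for target in targets:
                incoming[target] += pr[source] / len(targets)
        pr = {page: (1 - d) + d * incoming[page] for page in pages}
    return pr


def orphan_pages(links):
    """Pages with no inbound links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    linked_to = {t for targets in links.values() for t in targets}
    return pages - linked_to


def dangling_links(links):
    """Links whose target page has no outbound links."""
    return [(source, target)
            for source, targets in links.items()
            for target in targets
            if not links.get(target)]


example_1 = {"A": [], "B": []}  # two pages, no links
example_2 = {"B": ["A"]}        # B links to A; A has no outbound links
print(pagerank(example_1))
print(pagerank(example_2), orphan_pages(example_2), dangling_links(example_2))
```

In Example 1 both pages settle at 0.15. In Example 2, PR(B) settles at 0.15, PR(A) at 0.2775, B is the orphan page and B → A is the dangling link.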

  14. Example 3. From here onwards, I'll show the final PR values (after a sufficient number of iterations) inside each page. (Two diagrams of pages A, B and C; in both, PR(A) = PR(B) = PR(C) = 1.0.)
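The link structures in the two Example 3 diagrams are not recoverable from this transcript, so this sketch assumes two plausible ones: a cycle A → B → C → A, and a structure where every page links to both others. The graph names are hypothetical. In both, every page receives exactly 1.0 in total from its in-links, so every PR value stays at 1.0.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate the slides' PageRank formula, starting every page at 1.0."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = dict.fromkeys(pages, 1.0)
    for _ in range(iterations):
        incoming = dict.fromkeys(pages, 0.0)
        for source, targets in links.items():
            for target in targets:
                incoming[target] += pr[source] / len(targets)
        pr = {page: (1 - d) + d * incoming[page] for page in pages}
    return pr


# Assumed structures; the original diagrams may differ.
cycle = {"A": ["B"], "B": ["C"], "C": ["A"]}
mesh = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
for name, graph in (("cycle", cycle), ("mesh", mesh)):
    values = pagerank(graph)
    print(name, {page: round(pr, 3) for page, pr in sorted(values.items())})
```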

  15. Example 4. Observation: we can channel a large proportion of a site's PR to a particular page. (Diagram: PR(A) = 1.85, PR(B) = 0.575, PR(C) = 0.575.)
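The Example 4 diagram's links are also not recoverable, so this sketch assumes a hub structure: A links to B and C, and B and C each link only back to A. Under that assumption, a single iteration from 1.0 gives A = 1.85 and B = C = 0.575, while iterating further settles near A ≈ 1.46 and B = C ≈ 0.77. Either way, A holds the largest share of the site's PR.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate the slides' PageRank formula, starting every page at 1.0."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = dict.fromkeys(pages, 1.0)
    for _ in range(iterations):
        incoming = dict.fromkeys(pages, 0.0)
        for source, targets in links.items():
            for target in targets:
                incoming[target] += pr[source] / len(targets)
        pr = {page: (1 - d) + d * incoming[page] for page in pages}
    return pr


# Assumed hub structure: B and C pass all of their PR to A.
hub = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
print("after 1 iteration:", pagerank(hub, iterations=1))
print("after 100 iterations:", pagerank(hub, iterations=100))
```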

  16. Example 5. Observation: we can reduce PR leak by adding more internal links. (First diagram: A = 1.0, B = 0.575, C = 0.575, External Site 1 = 1.0, External Site 2 = 0.638. Second diagram: A = 2.6, B = 1.255, C = 1.255, External Site 1 = 1.0, External Site 2 = 1.215.)

  17. Example 5 (cont.). (Diagram: A = 2.146, B = 1.549, C = 1.720, External Site 1 = 1.0, External Site 2 = 1.215.)
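The PR-leak observation can be illustrated with two made-up structures. They are not the slides' diagrams, so the numbers will not match them. In both, A links to B, C and an external page, and the external page links nowhere, so PR sent there leaves the site. The second structure adds internal links between B and C. The graph and helper names are hypothetical.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate the slides' PageRank formula, starting every page at 1.0."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = dict.fromkeys(pages, 1.0)
    for _ in range(iterations):
        incoming = dict.fromkeys(pages, 0.0)
        for source, targets in links.items():
            for target in targets:
                incoming[target] += pr[source] / len(targets)
        pr = {page: (1 - d) + d * incoming[page] for page in pages}
    return pr


def site_total(pr, site_pages):
    """Total PR held by the site's own pages."""
    return sum(pr[page] for page in site_pages)


site = ("A", "B", "C")
# Assumed structure 1: B and C link only back to A.
sparse = {"A": ["B", "C", "External Site 1"], "B": ["A"], "C": ["A"]}
# Assumed structure 2: the same, plus internal links between B and C.
dense = {"A": ["B", "C", "External Site 1"], "B": ["A", "C"], "C": ["A", "B"]}
for name, graph in (("sparse", sparse), ("dense", dense)):
    pr = pagerank(graph)
    print(f"{name}: site total = {site_total(pr, site):.3f}, "
          f"external = {pr['External Site 1']:.3f}")
```

With these assumed structures, the site's pages hold about 1.52 in total without the B and C cross-links and about 1.79 with them, while the external page drops from about 0.37 to about 0.33.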

  18. How to increase PR? • Adding spam pages • Joining forums • Submitting to search engine directories • Reciprocal links • Content

  19. Adding spam pages. (Diagram: pages A and B plus spam pages Spam 1, Spam 2, …, Spam 1000; PR(A) = 331.0, PR(B) = 281.6, and Spam 1, Spam 2 and Spam 1000 are each 0.39.)
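The spam-page effect can be sketched with an assumed structure: A and B link to each other, and every spam page links only to A. The slide's spam pages sit at 0.39 rather than 0.15, so they must receive some links in the original diagram. That means this structure differs, and its numbers will not match the slide. Each orphan spam page settles at 0.15 and passes 0.85 * 0.15 = 0.1275 to A, so PR(A) grows linearly with the number of spam pages. The `spam_graph` helper is hypothetical.

```python
def pagerank(links, d=0.85, iterations=200):
    """Iterate the slides' PageRank formula, starting every page at 1.0."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = dict.fromkeys(pages, 1.0)
    for _ in range(iterations):
        incoming = dict.fromkeys(pages, 0.0)
        for source, targets in links.items():
            for target in targets:
                incoming[target] += pr[source] / len(targets)
        pr = {page: (1 - d) + d * incoming[page] for page in pages}
    return pr


def spam_graph(n_spam):
    """Assumed structure: A and B link to each other; each spam page links only to A."""
    links = {"A": ["B"], "B": ["A"]}
    for i in range(1, n_spam + 1):
        links[f"Spam {i}"] = ["A"]
    return links


for n in (0, 10, 100, 1000):
    pr = pagerank(spam_graph(n))
    print(f"{n:>4} spam pages: PR(A) = {pr['A']:.1f}, PR(B) = {pr['B']:.1f}")
```

Solving the formula for this structure gives PR(A) = 1 + 0.1275 * n / 0.2775, i.e. roughly 0.46 extra PR for each spam page.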

  20. Conclusion. Even though the formula for calculating PageRank seems difficult, it is easy to understand. But when a simple calculation is applied hundreds of times, the results can seem complicated, and we cannot predict the outcome of these iterations in advance. More practice will surely yield more observations. PageRank is an important factor in Google's ranking, but it is only one of several important factors; for example, nowadays Google pays a lot of attention to a link's anchor text when deciding the relevancy of the target page. Still, because PageRank is one of the important factors, one should be well aware of it while designing a website.

  21. References. • http://www.webworkshop.net/pagerank.html • http://www.iprcom.com/papers/pagerank/ • http://www-db.stanford.edu/~backrub/google.html • http://www.google.com/intl/en/technology/ • http://www.google-watch.org/pagerank.html

  22. ?

  23. Thanks
