
Extrapolation Methods for Accelerating PageRank Computations

Sepandar D. Kamvar, Taher H. Haveliwala, Christopher D. Manning, Gene H. Golub, Stanford University


Presentation Transcript


  1. Extrapolation Methods for Accelerating PageRank Computations. Sepandar D. Kamvar, Taher H. Haveliwala, Christopher D. Manning, Gene H. Golub, Stanford University

  2. Motivation [Figure: two search results pages for "Giants", one topped by "The Official Site of the San Francisco Giants", the other by "The Official Site of the New York Giants".] • Problem: • Speed up PageRank. • Motivation: • Personalization • "Freshness" Note: PageRank computations don't get faster as computers do.

  3. Outline • Definition of PageRank • Computation of PageRank • Convergence Properties • Outline of Our Approach • Empirical Results

  4. Link Counts [Figure: a link graph with Taher's Home Page, Sep's Home Page, CS361, DB Pub Server, CNN, and Yahoo!; one page is linked by 2 unimportant pages, another by 2 important pages.]

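  5. Definition of PageRank • The importance of a page is given by the importance of the pages that link to it: $x_i = \sum_{j \in B_i} \frac{x_j}{N_j}$, where $x_i$ is the importance of page i, $x_j$ the importance of page j, $N_j$ the number of outlinks from page j, and $B_i$ the set of pages j that link to page i.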

  6. Definition of PageRank [Figure: the example graph (Sep, Taher, DB Pub Server, CNN, Yahoo!) with link weights such as 1/2 and 1 and page importances such as 0.05, 0.25, and 0.1.]

  7. PageRank Diagram • Initialize all nodes to rank 1/3 (0.333, 0.333, 0.333).

  8. PageRank Diagram • Propagate ranks across links, multiplying by link weights (link values 0.167, 0.333, 0.333, 0.167).

  9. PageRank Diagram (node ranks 0.5, 0.333, 0.167)

  10. PageRank Diagram (link values 0.167, 0.5, 0.167, 0.167)

  11. PageRank Diagram (node ranks 0.333, 0.5, 0.167)

  12. PageRank Diagram • After a while, the ranks settle at 0.4, 0.4, 0.2.

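  13. Computing PageRank • Initialize: $x_i^{(0)} = 1/n$ • Repeat until convergence: $x_i^{(k+1)} = \sum_{j \in B_i} \frac{x_j^{(k)}}{N_j}$, where $x_i$ is the importance of page i, $x_j$ the importance of page j, $N_j$ the number of outlinks from page j, and $B_i$ the set of pages j that link to page i.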
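The update above can be sketched in a few lines of Python. The graph is an assumption: slides 7 to 12 show only rank values, and a hypothetical 3-node graph (node 0 links to nodes 1 and 2, node 1 links to node 0, node 2 links to node 1) happens to reproduce those numbers, up to the order in which the diagram lists them. The function name `pagerank_iterate` and the L1 stopping tolerance are illustrative choices, not taken from the talk.

```python
# Hypothetical 3-node graph (an assumption; the slides show only rank values):
# node 0 links to nodes 1 and 2, node 1 links to node 0, node 2 links to node 1.
out_links = {0: [1, 2], 1: [0], 2: [1]}


def pagerank_iterate(out_links, n, tol=1e-10, max_iter=1000):
    """Repeat x_i <- sum over j in B_i of x_j / N_j until the ranks stop changing."""
    x = [1.0 / n] * n  # initialize every page to rank 1/n
    for _ in range(max_iter):
        new_x = [0.0] * n
        for j, targets in out_links.items():
            share = x[j] / len(targets)  # page j splits its rank over its N_j outlinks
            for i in targets:
                new_x[i] += share
        # stop once the total (L1) change between iterates is below tol
        if sum(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    return x


ranks = pagerank_iterate(out_links, 3)
print([round(r, 3) for r in ranks])  # [0.4, 0.4, 0.2]
```

The first two passes of the loop give node ranks 0.333, 0.5, 0.167 and then 0.5, 0.333, 0.167, matching the intermediate diagrams.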

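  14. Matrix Notation: $x^{(k+1)} = P^T x^{(k)}$ [Figure: a numerical example of this product with a sparse matrix and a rank vector; the entries are not recoverable from the transcript.]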

  15. Matrix Notation • Find x that satisfies: $x = P^T x$ [Figure: the same numerical example, now as a fixed point.]
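In matrix form, with $P_{ji} = 1/N_j$ when page j links to page i, the componentwise update becomes one matrix-vector product per iteration. A minimal numpy check, assuming a small hypothetical graph (edges 0→1, 0→2, 1→0, 2→1) in which every page has at least one outlink, that the vector 0.4, 0.4, 0.2 satisfies $x = P^T x$:

```python
import numpy as np

# Hypothetical 3-node graph, edges written as (j, i) for "page j links to page i".
edges = [(0, 1), (0, 2), (1, 0), (2, 1)]
n = 3

# P[j, i] = 1 / N_j when page j links to page i (assumes every page has an outlink).
P = np.zeros((n, n))
for j, i in edges:
    P[j, i] = 1.0
P /= P.sum(axis=1, keepdims=True)

# All pages updated at once: x^(k+1) = P^T x^(k). The fixed point satisfies x = P^T x.
x = np.array([0.4, 0.4, 0.2])
print(np.allclose(P.T @ x, x))  # True
```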

  16. Power Method • Initialize: $x^{(0)} = (1/n, \dots, 1/n)^T$ • Repeat until convergence: $x^{(k+1)} = P^T x^{(k)}$

  17. A side note • PageRank doesn't actually use $P^T$. Instead, it uses $A = cP^T + (1-c)E^T$, where E models random jumps between pages (in the uniform case, every entry of E is 1/n). • So the PageRank problem is really: find x that satisfies $x = Ax$, not $x = P^T x$.
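A minimal sketch of building A, assuming a uniform E (every entry 1/n), no pages without outlinks, and c = 0.85, a commonly used value. The helper name `google_matrix` and the 3-node graph are illustrative. Because every column of $P^T$ and of $E^T$ sums to 1, so does every column of A, which guarantees that 1 is an eigenvalue of A.

```python
import numpy as np


def google_matrix(P, c=0.85):
    """Return A = c P^T + (1 - c) E^T for a uniform E (every entry 1/n).

    Assumes every row of P sums to 1, i.e. no page is without outlinks.
    """
    n = P.shape[0]
    E = np.full((n, n), 1.0 / n)  # uniform random-jump matrix (an assumption)
    return c * P.T + (1 - c) * E.T


# Hypothetical 3-node graph: 0 -> 1, 2; 1 -> 0; 2 -> 1.
P = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
A = google_matrix(P, c=0.85)
print(A.sum(axis=0))  # every column sums to 1 (up to rounding)
```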

  18. Power Method • And the algorithm is really... • Initialize: $x^{(0)} = (1/n, \dots, 1/n)^T$ • Repeat until convergence: $x^{(k+1)} = A x^{(k)}$
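The Power Method loop can be sketched as follows. The stopping rule (L1 change below `tol`) is an assumption, since the slide only says "until convergence"; the 3-node graph (0→1, 0→2, 1→0, 2→1) and c = 0.85 are also illustrative.

```python
import numpy as np


def power_method(A, tol=1e-8, max_iter=10_000):
    """Power Method for x = A x: start from the uniform vector and repeat x <- A x.

    Stops when the L1 change between successive iterates drops below tol.
    """
    n = A.shape[0]
    x = np.full(n, 1.0 / n)
    for k in range(1, max_iter + 1):
        x_next = A @ x
        if np.abs(x_next - x).sum() < tol:
            return x_next, k
        x = x_next
    return x, max_iter


# Hypothetical 3-node graph (0 -> 1, 2; 1 -> 0; 2 -> 1) with c = 0.85 (assumed).
P = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
c, n = 0.85, 3
A = c * P.T + (1 - c) / n * np.ones((n, n))
x, iterations = power_method(A)
print(np.round(x, 4), iterations)
```

Each step costs one matrix-vector product, which for a sparse web matrix is proportional to the number of links.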

  19. Outline • Definition of PageRank • Computation of PageRank • Convergence Properties • Outline of Our Approach • Empirical Results

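  20. Power Method • Express $x^{(0)}$ in terms of the eigenvectors of A: $x^{(0)} = u_1 + \alpha_2 u_2 + \alpha_3 u_3 + \alpha_4 u_4 + \alpha_5 u_5$

  21. Power Method: $x^{(1)} = u_1 + \alpha_2\lambda_2 u_2 + \alpha_3\lambda_3 u_3 + \alpha_4\lambda_4 u_4 + \alpha_5\lambda_5 u_5$

  22. Power Method: $x^{(2)} = u_1 + \alpha_2\lambda_2^2 u_2 + \alpha_3\lambda_3^2 u_3 + \alpha_4\lambda_4^2 u_4 + \alpha_5\lambda_5^2 u_5$

  23. Power Method: $x^{(k)} = u_1 + \alpha_2\lambda_2^k u_2 + \alpha_3\lambda_3^k u_3 + \alpha_4\lambda_4^k u_4 + \alpha_5\lambda_5^k u_5$

  24. Power Method: as k grows, the coefficients on $u_2, \dots, u_5$ go to 0, so $x^{(k)} \to u_1$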

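  25. Why does it work? • Imagine our n × n matrix A has n linearly independent eigenvectors $u_i$. • Then you can write any n-dimensional vector as a linear combination of the eigenvectors of A, e.g. $x^{(0)} = u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n$ (scaling $u_1$ so that its coefficient is 1).

  26. Why does it work? • From the last slide: $x^{(0)} = u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n$. • To get the first iterate, multiply $x^{(0)}$ by A: $x^{(1)} = Ax^{(0)} = \lambda_1 u_1 + \alpha_2\lambda_2 u_2 + \cdots + \alpha_n\lambda_n u_n$. • The first eigenvalue is 1, and $|\lambda_i| < 1$ for all $i \ge 2$. • Therefore: $x^{(k)} = A^k x^{(0)} = u_1 + \alpha_2\lambda_2^k u_2 + \cdots + \alpha_n\lambda_n^k u_n \to u_1$.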
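The expansion can be checked numerically. A minimal sketch, assuming a hypothetical 3-node graph and c = 0.85: it computes the eigenvectors of A, writes $x^{(0)}$ in that basis, and confirms that $A^k x^{(0)}$ equals $\sum_i \alpha_i \lambda_i^k u_i$. numpy returns unit-length eigenvectors, so here the coefficient on $u_1$ is whatever `alpha[0]` comes out to rather than 1.

```python
import numpy as np

# Hypothetical 3-node graph (0 -> 1, 2; 1 -> 0; 2 -> 1) with c = 0.85 (assumed).
P = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
c, n = 0.85, 3
A = c * P.T + (1 - c) / n * np.ones((n, n))

vals, vecs = np.linalg.eig(A)        # columns of vecs are the eigenvectors u_i
order = np.argsort(-np.abs(vals))    # put lambda_1 = 1 first
vals, vecs = vals[order], vecs[:, order]

x0 = np.full(n, 1.0 / n)
alpha = np.linalg.solve(vecs, x0)    # x0 = sum_i alpha_i u_i

k = 20
direct = np.linalg.matrix_power(A, k) @ x0
via_eigen = vecs @ (alpha * vals**k)  # sum_i alpha_i lambda_i^k u_i
print(np.abs(vals))                   # lambda_1 = 1; the others are smaller in magnitude
print(np.allclose(direct, via_eigen.real))  # True
```

In this example $\lambda_2$ and $\lambda_3$ are a complex-conjugate pair, which is why the eigen-expansion is computed in complex arithmetic and only its real part is compared.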

  27. u1 1 u2 a22 u3 a33 u4 a44 u5 a55 u1 1 u2 a222 u3 a332 u4 a442 u5 a552 Power Method u1 1 u2 a2 u3 a3 u4 a4 u5 a5

  28. Convergence • The smaller l2, the faster the convergence of the Power Method. u1 1 u2 a22k u3 a33k u4 a44k u5 a55k
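Since the error is dominated by the $u_2$ term, a rough estimate of the iteration count is the smallest k with $|\lambda_2|^k < \text{tol}$. The sketch below assumes $|\lambda_2|$ sits at its upper bound c and a tolerance of 1e-8 (both assumptions; `iterations_needed` is an illustrative helper). It shows why high values such as c = 0.99 make the plain Power Method much slower.

```python
import math


def iterations_needed(lambda2, tol):
    """Smallest k with |lambda2|**k < tol: a rough Power Method iteration count.

    Assumes the error is dominated by the u_2 term, which shrinks like |lambda2|**k.
    """
    return math.ceil(math.log(tol) / math.log(abs(lambda2)))


for c in (0.85, 0.99):
    # Taking |lambda2| = c, the upper bound for the PageRank matrix A.
    print(c, iterations_needed(c, 1e-8))
# 0.85 114
# 0.99 1833
```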

  29. Our Approach • Estimate the components of the current iterate in the directions of the second and third eigenvectors ($u_2$ and $u_3$), and eliminate them.

  30. Why this approach? • For traditional problems: • A is smaller and often dense. • $|\lambda_2|$ is often close to $\lambda_1$, making the Power Method slow. • In our problem: • A is huge and sparse. • More importantly, $|\lambda_2|$ is small¹ (at most c). • Therefore, the Power Method is actually much faster than other methods. ¹ "The Second Eigenvalue of the Google Matrix," dbpubs.stanford.edu/pub/2003-20.

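  31. Using Successive Iterates [Figure: the starting iterate $x^{(0)}$ drawn against the eigenvector directions $u_1, \dots, u_5$.]

  32. Using Successive Iterates [Figure: $x^{(0)}$ and $x^{(1)}$ drawn against $u_1, \dots, u_5$.]

  33. Using Successive Iterates [Figure: $x^{(0)}$, $x^{(1)}$, and $x^{(2)}$ drawn against $u_1, \dots, u_5$.]

  34. Using Successive Iterates [Figure: $x^{(0)}$, $x^{(1)}$, and $x^{(2)}$ drawn against $u_1, \dots, u_5$.]

  35. Using Successive Iterates [Figure: the successive iterates combined into an extrapolated iterate $x' = u_1$.]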

  36. How do we do this? • Assume $x^{(k)}$ can be written as a linear combination of the first three eigenvectors ($u_1$, $u_2$, $u_3$) of A. • Compute an approximation to the components of $x^{(k)}$ along $u_2$ and $u_3$, and subtract it from $x^{(k)}$ to get $x^{(k)\prime}$.

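  37. Assume • Assume $x^{(k)}$ can be represented by the first 3 eigenvectors of A: $x^{(k)} = u_1 + \alpha_2 u_2 + \alpha_3 u_3$. • Then the next three iterates are $x^{(k+1)} = u_1 + \alpha_2\lambda_2 u_2 + \alpha_3\lambda_3 u_3$, $x^{(k+2)} = u_1 + \alpha_2\lambda_2^2 u_2 + \alpha_3\lambda_3^2 u_3$, and $x^{(k+3)} = u_1 + \alpha_2\lambda_2^3 u_2 + \alpha_3\lambda_3^3 u_3$.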

  38. Linear Combination • Let's take some linear combination of these 3 iterates: $\beta_1 x^{(k+1)} + \beta_2 x^{(k+2)} + \beta_3 x^{(k+3)}$.

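  39. Rearranging Terms • We can rearrange the terms to get: $(\beta_1 + \beta_2 + \beta_3)\,u_1 + \alpha_2\lambda_2(\beta_1 + \beta_2\lambda_2 + \beta_3\lambda_2^2)\,u_2 + \alpha_3\lambda_3(\beta_1 + \beta_2\lambda_3 + \beta_3\lambda_3^2)\,u_3$ • Goal: Find $\beta_1, \beta_2, \beta_3$ so that the coefficients of $u_2$ and $u_3$ are 0 and the coefficient of $u_1$ is 1.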
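The goal has a closed-form solution when $\lambda_2$ and $\lambda_3$ are known: the polynomial $\beta(t) = \beta_1 + \beta_2 t + \beta_3 t^2$ must vanish at $\lambda_2$ and $\lambda_3$ and equal 1 at $t = 1$, so $\beta(t) = \frac{(t-\lambda_2)(t-\lambda_3)}{(1-\lambda_2)(1-\lambda_3)}$. The check below uses a synthetic matrix with made-up eigenvalues and eigenvectors (not PageRank data) to confirm that this combination of $x^{(k+1)}, x^{(k+2)}, x^{(k+3)}$ returns exactly $u_1$. In a real computation $\lambda_2$ and $\lambda_3$ are unknown and have to be estimated from the iterates.

```python
import numpy as np

# Synthetic example (made-up numbers): a 3x3 matrix with eigenvalues 1, lambda2, lambda3.
rng = np.random.default_rng(0)
U = rng.random((3, 3)) + np.eye(3)      # columns are u1, u2, u3
lam = np.array([1.0, 0.6, -0.3])        # lambda1 = 1, lambda2, lambda3
A = U @ np.diag(lam) @ np.linalg.inv(U)

alpha2, alpha3 = 0.7, -0.4
x_k = U @ np.array([1.0, alpha2, alpha3])  # x^(k) = u1 + alpha2 u2 + alpha3 u3
x_k1 = A @ x_k
x_k2 = A @ x_k1
x_k3 = A @ x_k2                            # the next three iterates

# beta(t) = (t - lambda2)(t - lambda3) / ((1 - lambda2)(1 - lambda3))
l2, l3 = lam[1], lam[2]
d = (1 - l2) * (1 - l3)
beta1, beta2, beta3 = l2 * l3 / d, -(l2 + l3) / d, 1 / d

combo = beta1 * x_k1 + beta2 * x_k2 + beta3 * x_k3
print(np.allclose(combo, U[:, 0]))  # True: the u2 and u3 terms cancel, leaving u1
```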

  40. Summary • We make an assumption about the current iterate. • We solve for the dominant eigenvector as a linear combination of the next three iterates. • We use a few iterations of the Power Method to "clean it up".
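One way to turn this summary into an algorithm, sketched under assumptions: since $\lambda_2$ and $\lambda_3$ are unknown, the sketch fits the cubic $p(t) = (t-1)(t-\lambda_2)(t-\lambda_3)$ from the iterates by least squares (using $p(A)x^{(k)} = 0$), divides out the factor $(t-1)$ to get the β's, and then continues with ordinary power steps. This is our reading of the approach; the paper linked on the final slide gives the authors' exact algorithm. The function names, the schedule parameters `every` and `max_extrapolations`, the tolerance, and the random test graph are all hypothetical.

```python
import numpy as np


def quadratic_extrapolation(x_k, x_k1, x_k2, x_k3):
    """Estimate u1 from x^(k) and the next three iterates x^(k+1), x^(k+2), x^(k+3).

    Assumes x^(k) lies in span{u1, u2, u3}. The unknown lambda2, lambda3 are
    handled by fitting p(t) = g0 + g1 t + g2 t^2 + t^3 with p(A) x^(k) = 0.
    """
    # Since p(1) = 0, g0 = -(g1 + g2 + 1), so only differences from x^(k) matter.
    y1, y2, y3 = x_k1 - x_k, x_k2 - x_k, x_k3 - x_k
    Y = np.column_stack([y1, y2])
    (g1, g2), *_ = np.linalg.lstsq(Y, -y3, rcond=None)  # g1*y1 + g2*y2 + y3 ~ 0
    # p(t) / (t - 1) = beta1 + beta2 t + beta3 t^2, which vanishes at lambda2, lambda3.
    beta1, beta2, beta3 = g1 + g2 + 1.0, g2 + 1.0, 1.0
    x = beta1 * x_k1 + beta2 * x_k2 + beta3 * x_k3
    return x / x.sum()  # rescale so the entries sum to 1, like a PageRank vector


def pagerank_quadratic_extrapolation(A, tol=1e-8, max_iter=10_000,
                                     every=10, max_extrapolations=5):
    """Power Method on x = A x with an occasional quadratic extrapolation step."""
    n = A.shape[0]
    recent = [np.full(n, 1.0 / n)]  # up to the last three iterates, oldest first
    used = 0
    for k in range(1, max_iter + 1):
        x = A @ recent[-1]  # ordinary power step
        if np.abs(x - recent[-1]).sum() < tol:
            return x, k
        if k % every == 0 and used < max_extrapolations and len(recent) == 3:
            x = quadratic_extrapolation(recent[0], recent[1], recent[2], x)
            used += 1
        recent = (recent + [x])[-3:]
    return recent[-1], max_iter


# Usage on a hypothetical random graph (not data from the paper), with c = 0.99.
rng = np.random.default_rng(1)
n, c = 200, 0.99
P = np.zeros((n, n))
for j in range(n):
    targets = rng.choice(n, size=rng.integers(1, 6), replace=False)
    P[j, targets] = 1.0 / len(targets)
A = c * P.T + (1 - c) / n * np.ones((n, n))
ranks, iterations = pagerank_quadratic_extrapolation(A)
print(iterations, np.abs(A @ ranks - ranks).sum())
```

Capping the number of extrapolations leaves the final "clean-up" to ordinary power steps: even if one extrapolation lands badly, the remaining iterations still converge because every rescaled iterate keeps a $u_1$ coefficient of 1.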

  41. Outline • Definition of PageRank • Computation of PageRank • Convergence Properties • Outline of Our Approach • Empirical Results

  42. Results Quadratic Extrapolation speeds up convergence. Extrapolation was only used 5 times!

  43. Results • Extrapolation dramatically speeds up convergence for high values of c (c = 0.99).

  44. Take-home message • Speeds up PageRank by a fair amount, but not by enough for true Personalized PageRank. • Ideas are useful for further speedup algorithms. • Quadratic Extrapolation can be used for a whole class of problems.

  45. The End • Paper available at http://dbpubs.stanford.edu/pub/2003-16
