1 / 1

Updating PageRank by Iterative Aggregation

Updating PageRank by Iterative Aggregation. Amy N. Langville and Carl D. Meyer Mathematics Department North Carolina State University {anlangvi,meyer}@ncsu.edu. The PageRank Problem. Convergence of the Iterative Aggregation Algorithm.  T =  T P. Solve. _.

Download Presentation

Updating PageRank by Iterative Aggregation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Updating PageRank by Iterative Aggregation Amy N. Langville and Carl D. Meyer Mathematics Department North Carolina State University {anlangvi,meyer}@ncsu.edu The PageRank Problem Convergence of the Iterative Aggregation Algorithm  T = TP Solve _ i = importance of page i • Iterative aggregation converges to PageRank vector for all partitions S = G G. • There always exists a partition such that the asymptotic rate of convergence is strictly less than • the convergence rate of PageRank power method. P = Performance of the Iterative Aggregation Algorithm PageRank Power Iterative Aggregation Residual plot for Good Partition Iterations Time | G | Iterations Time Solution: Power Method NCState.dat  (k+1)T = (k)TP 10,000 pages 101,118 links The Updating Problem ~ ~ Given T, P,P, find T Residual plot for Bad Partition PageRank Power Iterative Aggregation Iterations Time | G | Iterations Time ~ P = Calif.dat 9,664 pages 16,150 links ~ Naïve Solution: full recomputation, power method on P on monthly basis Advantage The Iterative Aggregation Solution to Updating _ • This iterative aggregation algorithm can be combined with other PageRank acceleration • techniques to achieve even greater speedups. • Partition Nodes into two sets, G and G PageRank Power Power + Quad(10) Iterative Aggregation Iter. Aggregation + Quad(10) Iterations Time Iterations Time | G | Iterations Time Iterations Time _ • Aggregate: lump nodes in G into one supernode A = • Solve small |G+1|  | G+1| chain called A Residual Plot of 4 solution methods applied to NCState.dat using Quad(10), |G|=1000 • Disaggregate to get approximation to full-sized PageRank • Iterate Problems • Algorithm is very sensitive topartition. Much more theoretical work must be done to determine • which nodes go into G. • We need faster machines with more memory to test on larger datasets, >500K pages. Testing • requires storage of more vectors and matrices, such as stochastic complements and • censored vectors. • We need actual datasets that vary over time. Currently, we are creating artificial updates to datasets.

More Related