
Distributed PageRank Computation Based on Iterative Aggregation-Disaggregation Methods

Yangbo Zhu, Shaozhi Ye and Xing Li

Tsinghua University, Beijing, China

ACM CIKM 2005, Bremen

Outline
  • Quick Review of PageRank
  • Distributed PageRank Computation
    • Motivation
    • Basic Idea
    • Algorithm
  • Experiments
  • Conclusion and Future Work
PageRank - Background

Ranking Web pages

  • Content-based methods
  • Link-based methods
    • PageRank [Page & Brin, 1998]
    • HITS [Kleinberg, 1998]
    • SALSA [Lempel & Moran, 2000]
PageRank - Intuition
  • A link from page A to page B means that the author of A recommends B.
  • A page is of high quality if it is
    • referred to by many other pages
    • referred to by pages of high quality
PageRank - Model
  • Random Surfer - Markov Chain
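The random-surfer Markov chain can be illustrated with a plain power iteration. This is a minimal sketch, not the authors' code; the damping factor alpha = 0.85, the uniform teleport distribution, and the uniform jump from dangling pages are assumptions of the sketch, and the function name is hypothetical.

```python
def pagerank(links, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power-method PageRank for a link graph given as {page: [out-links]}.

    With probability alpha the surfer follows a random out-link; otherwise
    (and always from a dangling page) it jumps to a uniformly random page.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = dict.fromkeys(pages, 1.0 / n)
    for _ in range(max_iter):
        # Rank held by dangling pages is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = dict.fromkeys(pages, (1 - alpha) / n + alpha * dangling / n)
        for p, outs in links.items():
            for q in outs:
                new[q] += alpha * rank[p] / len(outs)
        diff = sum(abs(new[p] - rank[p]) for p in pages)  # L1 change
        rank = new
        if diff < tol:
            break
    return rank


print(pagerank({"a": ["c"], "b": ["c"], "c": ["a"]}))
```

In the usage line, page c is recommended by both a and b, so it ends up with the highest score, while b, which nobody links to, keeps only teleport mass.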
Motivation
  • Compass search engine confederation
Basic Idea
  • Divide and conquer
  • Make use of the natural block structure of web graphs
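The block structure can be made concrete by grouping pages by site, as the experiments do, and measuring how many links stay inside a site. A minimal sketch, assuming pages are identified by URL and the lower-cased host name is the block key; all helper names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def site_of(url):
    """Block key for a page: its lower-cased host name."""
    return urlsplit(url).netloc.lower()


def group_by_site(links):
    """Partition every page that appears in `links` into per-site blocks."""
    pages = set(links) | {q for outs in links.values() for q in outs}
    blocks = defaultdict(set)
    for url in pages:
        blocks[site_of(url)].add(url)
    return dict(blocks)


def intra_site_fraction(links):
    """Share of hyperlinks whose source and target are on the same site."""
    total = same = 0
    for src, outs in links.items():
        for dst in outs:
            total += 1
            if site_of(src) == site_of(dst):
                same += 1
    return same / total if total else 0.0
```

A high intra-site fraction is what makes divide and conquer attractive: most of the work can be done inside blocks, and only the remaining cross-site links need coordination.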
DPC Algorithm
  • Step 1 - Initialization

Local nodes compute local PageRank vectors.
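Step 1 can be sketched as each local node running ordinary PageRank on its own block. This assumes that links leaving the block are simply dropped and that a page left without intra-block out-links jumps uniformly within the block; the paper may handle these cases differently, and the function names are hypothetical.

```python
def local_pagerank(links, pages, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power-method PageRank of one block, using only links inside `pages`."""
    pages = sorted(pages)
    inside = set(pages)
    # Keep only intra-block out-links; inter-block links are ignored here.
    local = {p: [q for q in links.get(p, []) if q in inside] for p in pages}
    n = len(pages)
    rank = dict.fromkeys(pages, 1.0 / n)
    for _ in range(max_iter):
        # Pages with no intra-block out-links spread their rank uniformly.
        dangling = sum(rank[p] for p in pages if not local[p])
        new = dict.fromkeys(pages, (1 - alpha) / n + alpha * dangling / n)
        for p in pages:
            for q in local[p]:
                new[q] += alpha * rank[p] / len(local[p])
        diff = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if diff < tol:
            break
    return rank


def initialize(links, blocks):
    """Step 1: each local node ranks the pages of its own block."""
    return {name: local_pagerank(links, pages) for name, pages in blocks.items()}


# Usage: two sites with one cross-site link (a1 -> b1) that Step 1 ignores.
links = {"a1": ["a2", "b1"], "a2": ["a1"], "b1": ["b2"], "b2": ["b1"]}
blocks = {"A": {"a1", "a2"}, "B": {"b1", "b2"}}
print(initialize(links, blocks))
```

Each local vector sums to 1 within its block, which is what lets it serve as the within-block weighting in the aggregation step.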

DPC Algorithm (cont.)
  • Step 2 - Aggregation

Central node computes the NodeRank vector.
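Step 2 can be sketched in the classic iterative aggregation style: the central node collapses each block into a single state, weights the block's pages by their local PageRank, and takes the stationary vector of the resulting n-state chain as the NodeRank vector. Assumptions: P is the Google matrix with damping alpha = 0.85, uniform teleport, and uniform jumps from dangling pages; the slide does not spell out these weights. The function name is hypothetical.

```python
def aggregate(links, blocks, local, alpha=0.85, tol=1e-12, max_iter=1000):
    """Step 2 sketch: block-to-block transition matrix A and its NodeRank.

    A[I][J] = sum over pages i in I of local[I][i] * P(i -> any page in J).
    Every link target must belong to some block.
    """
    names = sorted(blocks)
    owner = {p: b for b in names for p in blocks[b]}
    total = len(owner)
    A = {I: dict.fromkeys(names, 0.0) for I in names}
    for I in names:
        for i in blocks[I]:
            w = local[I].get(i, 0.0)
            outs = links.get(i, [])
            # Random-jump mass from page i that lands anywhere in block J.
            jump = 1.0 if not outs else 1 - alpha
            for J in names:
                A[I][J] += w * jump * len(blocks[J]) / total
            # Hyperlink mass, attributed to the block that owns the target.
            for j in outs:
                A[I][owner[j]] += w * alpha / len(outs)
    # Power iteration on the small n x n chain yields the NodeRank vector.
    node_rank = dict.fromkeys(names, 1.0 / len(names))
    for _ in range(max_iter):
        new = {J: sum(node_rank[I] * A[I][J] for I in names) for J in names}
        diff = sum(abs(new[J] - node_rank[J]) for J in names)
        node_rank = new
        if diff < tol:
            break
    return node_rank, A
```

Because each local vector sums to 1, every row of A sums to 1, so the aggregated chain is a proper n-state Markov chain that is cheap to solve centrally.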

DPC Algorithm (cont.)
  • Step 3 - Disaggregation

Local nodes compute extended local PageRank vectors.

X: External nodes
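One plausible reading of the extended local chain in Step 3 is the block's own pages plus a single node X that lumps together every external page. In the sketch below, X's outgoing probabilities are the external rows of P weighted by the current global estimate pi. This construction, the dense Google matrix, and the label for X are assumptions for illustration, not necessarily the paper's exact formulation.

```python
EXTERNAL = "<X>"  # hypothetical label for the lumped external node X


def google_matrix(links, pages, alpha=0.85):
    """Dense random-surfer matrix P as nested dicts (fine for toy graphs)."""
    n = len(pages)
    P = {}
    for i in pages:
        outs = links.get(i, [])
        # Teleport mass; a dangling page jumps uniformly with probability 1.
        row = dict.fromkeys(pages, (1 - alpha) / n if outs else 1.0 / n)
        for j in outs:
            row[j] += alpha / len(outs)
        P[i] = row
    return P


def extended_local_rank(P, block, pi, tol=1e-12, max_iter=10000):
    """Step 3 sketch: stationary vector of `block` plus one external node X.

    X lumps all pages outside the block; its outgoing probabilities are the
    outside rows of P weighted by the current global estimate `pi`.
    """
    inside = sorted(block)
    outside = [k for k in P if k not in block]
    out_mass = sum(pi[k] for k in outside)
    T = {i: {j: P[i][j] for j in inside} for i in inside}
    for i in inside:
        T[i][EXTERNAL] = sum(P[i][k] for k in outside)
    T[EXTERNAL] = {j: sum(pi[k] * P[k][j] for k in outside) / out_mass for j in inside}
    T[EXTERNAL][EXTERNAL] = 1.0 - sum(T[EXTERNAL].values())
    states = inside + [EXTERNAL]
    x = dict.fromkeys(states, 1.0 / len(states))
    for _ in range(max_iter):
        new = {s: sum(x[r] * T[r][s] for r in states) for s in states}
        diff = sum(abs(new[s] - x[s]) for s in states)
        x = new
        if diff < tol:
            break
    # Entries for the block's own pages are its new global PageRank estimates.
    return {i: x[i] for i in inside}
```

A useful sanity property of this construction: if pi is already the exact global PageRank vector, the extended chain's stationary vector reproduces pi on the block's pages, so the exact solution is a fixed point of this step.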

DPC Algorithm (cont.)
  • Step 4 - Central node computes the L1 distance between the current global PageRank vector and the previous one.
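The convergence check in Step 4 reduces to an L1 comparison between successive global vectors. A minimal sketch; merging the per-block estimates by renormalizing them to sum to 1 and the threshold tol = 1e-8 are assumptions, and the helper names are hypothetical.

```python
def l1_distance(x, y):
    """L1 distance between two vectors stored as {page: score}."""
    return sum(abs(x.get(p, 0.0) - y.get(p, 0.0)) for p in set(x) | set(y))


def assemble(extended_local):
    """Merge per-block estimates into one global vector, renormalized to 1."""
    merged = {p: v for block in extended_local.values() for p, v in block.items()}
    total = sum(merged.values())
    return {p: v / total for p, v in merged.items()}


def should_stop(current, previous, tol=1e-8):
    """Step 4: stop once the global vector has (almost) stopped changing."""
    return l1_distance(current, previous) < tol
```

If the test fails, the central node would start another aggregation/disaggregation round with the newly assembled vector.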
Advantages
  • DPC consists mainly of standard PageRank computations.
  • Small matrices fit into main memory.
  • Low communication overhead.
Experimental Setup
  • Simulation on a single Linux box.
  • Group web pages by sites.
  • For comparison
    • Classic power method
    • LPR-Ref-2 algorithm in [Wang, VLDB 2004]
Data Sets
  • ST01/03 - crawled in 2001/2003 by the Stanford WebBase Project
  • CN04 - crawled in 2004 from web sites in China.
Evaluation Metrics
  • L1 distance: ‖x − y‖₁ = Σᵢ |xᵢ − yᵢ|
  • Kendall's τ-distance, built from the pairwise indicator K_ij = 1 if pages i and j are in different order in the two ranking lists, and K_ij = 0 otherwise
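Kendall's τ-distance can be computed by counting the pairs with K_ij = 1. In this sketch, normalizing that count by the number of pairs n(n−1)/2 and treating ties as "not in different order" are assumptions; the function name is hypothetical. The quadratic pair loop is fine for illustration but not for web-scale rankings.

```python
from itertools import combinations


def kendall_tau_distance(rank_a, rank_b):
    """Fraction of page pairs ordered differently by two score vectors."""
    pages = sorted(set(rank_a) & set(rank_b))
    pairs = list(combinations(pages, 2))
    if not pairs:
        return 0.0
    # K_ij = 1 when the two score differences have opposite signs.
    discordant = sum(
        1 for i, j in pairs
        if (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j]) < 0
    )
    return discordant / len(pairs)
```

A distance of 0 means both vectors induce the same ordering, and 1 means one ordering is the exact reverse of the other.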

Convergence Rate

Number of iteration

for convergence

( )

Conclusion
  • A distributed PageRank computation algorithm based on iterative aggregation-disaggregation (IAD) methods with Block Jacobi smoothing.
  • Experiments on real web graphs show that DPC outperforms LPR-Ref-2 [Wang, VLDB'04] and converges 5 to 7 times faster than the power method.
Future Work
  • Implement DPC in a distributed system and integrate it with the Compass search engine confederation.
  • How can PageRank vectors be updated efficiently within the DPC framework?
IAD Method - Notations
  • Aggregation matrix (n × N), where N is the number of pages and n the number of blocks
  • Disaggregation matrix (N × n)
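The two matrices can be built explicitly for a small example. This sketch assumes the textbook IAD choice suggested by the stated dimensions (column-vector convention): R has R[I][i] = 1 when page i lies in block I, and S has S[i][I] equal to the local PageRank of page i within block I. With P written column-stochastic, the aggregated matrix R P S is then column-stochastic as well, because every column of S sums to 1. The function names are hypothetical.

```python
def aggregation_matrices(pages, blocks, local):
    """Return R (n x N) and S (N x n) as nested lists.

    R[I][i] = 1 if page i lies in block I; S[i][I] = local PageRank of
    page i within block I (0 if i is not in I).
    """
    names = sorted(blocks)
    R = [[1.0 if p in blocks[b] else 0.0 for p in pages] for b in names]
    S = [[local[b].get(p, 0.0) if p in blocks[b] else 0.0 for b in names]
         for p in pages]
    return R, S


def matmul(X, Y):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]
```

Multiplying R by S gives the n × n identity whenever each local vector sums to 1, which is the consistency condition between aggregation and disaggregation.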
DPC - Convergence Analysis
  • The global convergence of IAD methods is still an open problem.
  • The difficulty stems partly from the fact that the disaggregation step is nonlinear.
  • The paper proves the global convergence of the Block Jacobi method in the PageRank setting when n > 2.
Experiments - Basic Facts
  • Distribution of the number of pages hosted by sites of different sizes
  • Distribution of site sizes
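Both distributions can be tabulated from a list of crawled URLs. A minimal sketch, assuming a site is identified by its lower-cased host name and that duplicate URLs are counted once; the function name is hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def site_size_distributions(urls):
    """Return (sites_per_size, pages_per_size) for a crawl given as URLs.

    sites_per_size[k] = number of sites hosting exactly k pages;
    pages_per_size[k] = number of pages hosted by such sites (k * count).
    """
    pages_per_site = Counter(urlsplit(u).netloc.lower() for u in set(urls))
    sites_per_size = Counter(pages_per_site.values())
    pages_per_size = Counter({k: k * c for k, c in sites_per_size.items()})
    return sites_per_size, pages_per_size
```

The first counter describes how many sites there are of each size; the second shows where the pages actually live, which matters for how evenly work spreads across local nodes.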
Experiments - Communication Overhead

  • Pos(·): number of positive elements
  • L/U: block strictly lower/upper triangular part of P

[Table: communication overhead, Power vs. LPR-Ref-2 / DPC]
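Pos(L) and Pos(U) can be counted directly from the link structure once pages are assigned to ordered blocks. This sketch assumes P is the sparse hyperlink matrix (in a dense Google matrix every entry would be positive) and that duplicate links count once; it computes only these two quantities, not the per-method overhead expressions. The function and argument names are hypothetical.

```python
def block_off_diagonal_counts(links, block_of, order):
    """Return (Pos(L), Pos(U)) for the hyperlink matrix P.

    `order` lists blocks in the order used to lay out P. A link i -> j is
    in L when block(i) comes after block(j), and in U when it comes before;
    links inside one block fall on the block diagonal and are not counted.
    """
    pos = {b: k for k, b in enumerate(order)}
    lower = upper = 0
    for i, outs in links.items():
        for j in set(outs):  # each positive entry P[i][j] counts once
            bi, bj = pos[block_of[i]], pos[block_of[j]]
            if bi > bj:
                lower += 1
            elif bi < bj:
                upper += 1
    return lower, upper
```

Pos(L) + Pos(U) is simply the number of distinct inter-block links, the part of the graph that forces communication between local nodes.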