
Link Analysis Ranking Algorithms on the World Wide Web



  1. Link Analysis Ranking Algorithms on the World Wide Web • Allan Borodin, Computer Science Department, University of Toronto, and Gammasite • Gareth O. Roberts, Department of Mathematics and Statistics, Lancaster University • Jeffrey S. Rosenthal, Department of Statistics, University of Toronto • Panayiotis Tsaparas, Computer Science Department, University of Toronto

  2. Link Analysis Ranking on the Web • View the Web as a graph • Each web page is a node • Each hyperlink is a directed edge • Underlying Intuition: • A link from node i to node j denotes endorsement of node j as an authority on a topic • The Problem: Mine the Web Graph Secrets • Discover good authorities • Rank nodes according to their authority weight

  3. Roadmap • Previous Work • Extensions of existing algorithms • A novel Bayesian Algorithm • Experimental Results • A Theoretical Framework • The Grand Finale

  4. Previous Algorithms • PageRank [Brin and Page 1997] • Query independent • Random Surfer Model • Hubs and Authorities [Kleinberg 1997] • Query dependent • Kleinberg [Kleinberg 1997] • SALSA [Lempel and Moran 2000] • Other • [Henzinger and Bharat 1998] • [Rafiei and Mendelzon 2000] • PHITS [Cohn and Chang 2001]

  5. Hubs and Authorities • Create Root Set from text-based search engine. • Expand to Base Set • Construct underlying Graph • Remove intra-domain links

  6. Hubs and Authorities • Pages with double identity (hubs, authorities) • Good hubs point to many good authorities • Good authorities are pointed to by many good hubs • Assign each page i an authority weight $a_i$ and a hub weight $h_i$ • Target: Find good authorities

  7. Kleinberg Algorithm • Initialize all weights to 1 • Repeat until convergence • I operation: authorities “collect” the hub weights • O operation: hubs “collect” the authority weights • Normalize weights under some norm
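The iteration on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `kleinberg`, the edge-list representation, the fixed iteration count (standing in for a convergence test), and the choice of the L1 norm are all assumptions made here.

```python
def kleinberg(edges, n, iterations=50):
    """Iterate the I and O operations on a directed graph with nodes 0..n-1.

    edges is a list of (i, j) pairs meaning "page i links to page j".
    A fixed iteration count is an assumed stand-in for a convergence test.
    """
    auth = [1.0] * n
    hub = [1.0] * n
    for _ in range(iterations):
        # I operation: each authority sums the hub weights of pages pointing to it.
        new_auth = [0.0] * n
        for i, j in edges:
            new_auth[j] += hub[i]
        # O operation: each hub sums the authority weights of pages it points to.
        new_hub = [0.0] * n
        for i, j in edges:
            new_hub[i] += new_auth[j]
        # Normalize under the L1 norm (an arbitrary choice among possible norms).
        a_sum = sum(new_auth) or 1.0
        h_sum = sum(new_hub) or 1.0
        auth = [a / a_sum for a in new_auth]
        hub = [h / h_sum for h in new_hub]
    return auth, hub


# Tiny example: pages 0 and 1 are hubs, pages 2 and 3 are authorities.
edges = [(0, 2), (1, 2), (1, 3)]
auth, hub = kleinberg(edges, 4)
```

In this example page 2 ends up with the larger authority weight (two in-links) and page 1 with the larger hub weight (it points to both authorities).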

  8. Kleinberg Algorithm (cont.) • Equivalent to the singular value decomposition (SVD) of the adjacency matrix A • Authority weights converge to the principal eigenvector of $A^T A$ • Hub weights converge to the principal eigenvector of $A A^T$
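The connection to these eigenvectors can be seen by writing the two operations in matrix form. The sketch below assumes the convention that $A_{ij} = 1$ when page i links to page j, and writes $a^{(t)}$, $h^{(t)}$ for the weight vectors after step t (notation introduced here for illustration). Normalization is omitted because it only rescales the vectors.

```latex
\begin{align*}
a^{(t)} &= A^{T} h^{(t-1)} && \text{(I operation)} \\
h^{(t)} &= A\, a^{(t)} && \text{(O operation)} \\
\Rightarrow\quad a^{(t+1)} &= A^{T} A\, a^{(t)}, \qquad h^{(t+1)} = A A^{T} h^{(t)}
\end{align*}
```

This is the power method applied to $A^T A$ and $A A^T$, which converges to a principal eigenvector as long as the starting vector is not orthogonal to the principal eigenspace.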

  9. SALSA • Replace the I and O operations with • I’ operation: authorities average the hub weights • O’ operation: hubs average the authority weights

  10. SALSA • Equivalent to a random walk on the bipartite graph • For a connected component the stationary distribution satisfies • For the whole graph, pick starting point uniformly at random. The authority weight of node i in component j

  11. pSALSA • Pick the initial node with probability proportional to the popularity (in-degree) of the node. • Perform the same random walk as in SALSA • Stationary distribution
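A standard property of this kind of random walk is that authority weights proportional to in-degree are left unchanged by a step of the walk. The sketch below checks that numerically on a toy graph. The function names are hypothetical, and the averaged operations are read here as follows (an assumption about their exact form): each authority splits its weight evenly among the hubs pointing to it, then each hub splits its weight evenly among the authorities it points to.

```python
def psalsa(edges, n):
    """Authority weights proportional to in-degree (popularity)."""
    indegree = [0] * n
    for _, j in edges:
        indegree[j] += 1
    return [d / len(edges) for d in indegree]


def salsa_walk_step(edges, n, auth):
    """One authority -> hub -> authority step of the walk (assumed form)."""
    indegree = [0] * n
    outdegree = [0] * n
    for i, j in edges:
        outdegree[i] += 1
        indegree[j] += 1
    # Each authority j passes an equal share of its weight to every hub linking to it.
    hub = [0.0] * n
    for i, j in edges:
        hub[i] += auth[j] / indegree[j]
    # Each hub i passes an equal share of its weight to every authority it links to.
    new_auth = [0.0] * n
    for i, j in edges:
        new_auth[j] += hub[i] / outdegree[i]
    return new_auth


edges = [(0, 2), (1, 2), (1, 3)]
weights = psalsa(edges, 4)
stepped = salsa_walk_step(edges, 4, weights)
```

Here `weights` and `stepped` agree up to floating-point error, so the in-degree distribution is stationary for this walk.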

  12. Kleinberg Algorithm and Random Walks • pSALSA is equivalent to a single I operation. • The nth step of the Kleinberg algorithm gives weight • The stationary distribution of a random walk with transition probabilities

  13. Roadmap • Previous Work • Extensions of existing algorithms • A novel Bayesian Algorithm • Experimental Results • A Theoretical Framework • The Grand Finale

  14. Hub-Averaging Algorithm • Asymmetric view of Hubs, Authorities • Good hubs point only to good authorities • Algorithm • Perform I operation of Kleinberg • Perform O’ operation of SALSA
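The combination described above can be sketched as follows. This is an illustrative reading, not the authors' code: the O’ operation is taken to mean that a hub's weight is the mean authority weight of the pages it points to, and the function name, fixed iteration count, and L1 normalization are assumptions.

```python
def hub_averaging(edges, n, iterations=50):
    """Kleinberg's I operation combined with an averaging O' operation."""
    outdegree = [0] * n
    for i, _ in edges:
        outdegree[i] += 1
    auth = [1.0] * n
    hub = [1.0] * n
    for _ in range(iterations):
        # I operation: authorities sum the hub weights of pages pointing to them.
        new_auth = [0.0] * n
        for i, j in edges:
            new_auth[j] += hub[i]
        # O' operation: hubs take the average authority weight of pages they point to.
        new_hub = [0.0] * n
        for i, j in edges:
            new_hub[i] += new_auth[j] / outdegree[i]
        a_sum = sum(new_auth) or 1.0
        h_sum = sum(new_hub) or 1.0
        auth = [a / a_sum for a in new_auth]
        hub = [h / h_sum for h in new_hub]
    return auth, hub


# Hub 0 points only to the stronger authority (page 2); hub 1 points to both.
edges = [(0, 2), (1, 2), (1, 3)]
auth, hub = hub_averaging(edges, 4)
```

On this graph hub 0 scores higher than hub 1, because averaging rewards pointing only to good authorities; under the summing O operation the order of the two hubs is reversed.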

  15. Threshold Algorithms • Hub Threshold Algorithm • I operation: Keep only the hub weights above average • Authority Threshold Algorithm • O operation: Keep only the top K authority weights • Full Threshold Algorithm • Apply Thresholds to both I and O operations
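A single step of the threshold variants might look like the sketch below. Two points are assumptions made here: "above average" is read as "at least the mean hub weight among the hubs pointing to that page", and the step returns unnormalized weights, leaving normalization to the caller. The function and parameter names are hypothetical.

```python
def threshold_step(edges, n, hub, k=None, hub_threshold=True):
    """One I step and one O step with optional thresholds (weights unnormalized).

    hub_threshold=True applies the Hub Threshold rule in the I operation.
    k (an int) applies the Authority Threshold rule in the O operation.
    Setting both corresponds to the Full Threshold variant.
    """
    back = [[] for _ in range(n)]  # back[j]: pages pointing to j
    fwd = [[] for _ in range(n)]   # fwd[i]: pages that i points to
    for i, j in edges:
        back[j].append(i)
        fwd[i].append(j)

    auth = [0.0] * n
    for j in range(n):
        weights = [hub[i] for i in back[j]]
        if weights and hub_threshold:
            average = sum(weights) / len(weights)
            # Keep only hub weights at or above the average (assumed reading).
            weights = [w for w in weights if w >= average]
        auth[j] = sum(weights)

    new_hub = [0.0] * n
    for i in range(n):
        weights = sorted((auth[j] for j in fwd[i]), reverse=True)
        if k is not None:
            weights = weights[:k]  # keep only the top K authority weights
        new_hub[i] = sum(weights)
    return auth, new_hub


edges = [(0, 3), (1, 3), (2, 3), (0, 4)]
start_hub = [3.0, 1.0, 2.0, 0.0, 0.0]
auth, top1_hub = threshold_step(edges, 5, start_hub, k=1)
_, plain_hub = threshold_step(edges, 5, start_hub, k=None)
```

For page 3 the hub weights 3, 1, 2 have mean 2, so only 3 and 2 are kept. With `k=1`, hub 0 counts only its best authority.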

  16. Breadth First Search Algorithm • pSALSA: weights according to 1-neighborhood popularity • Kleinberg: weights according to global structure • BFS: Combine • Assign weights according to n-neighborhood popularity. • Visit neighbors in a BFS fashion, alternating between B and F steps • Apply exponentially decreasing weighting
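The BFS idea can be sketched as below. Details not stated on the slide are assumptions: the walk starts with a B (backward) step, each newly reached node is counted once, the start node itself is never counted, and level l (counting from 0) is weighted by 1/2^l. The function name is hypothetical.

```python
def bfs_weight(edges, n, start, depth):
    """Authority weight of `start` from its neighborhood up to `depth` levels."""
    back = [[] for _ in range(n)]  # back[j]: pages pointing to j
    fwd = [[] for _ in range(n)]   # fwd[i]: pages that i points to
    for i, j in edges:
        back[j].append(i)
        fwd[i].append(j)

    visited = {start}
    frontier = [start]
    weight = 0.0
    for level in range(depth):
        # Alternate: even levels follow links backward (B), odd levels forward (F).
        neighbors = back if level % 2 == 0 else fwd
        next_frontier = []
        for u in frontier:
            for v in neighbors[u]:
                if v not in visited:
                    visited.add(v)
                    next_frontier.append(v)
        # Exponentially decreasing weighting: 1, 1/2, 1/4, ...
        weight += len(next_frontier) / 2 ** level
        frontier = next_frontier
    return weight


edges = [(0, 2), (1, 2), (1, 3)]
w2 = bfs_weight(edges, 4, start=2, depth=2)
w3 = bfs_weight(edges, 4, start=3, depth=2)
```

With `depth=1` this reduces to the in-degree, which is the 1-neighborhood popularity that pSALSA uses; larger depths bring in more of the surrounding structure.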

  17. Roadmap • Previous Work • Extensions of existing algorithms • A novel Bayesian Algorithm • Experimental Results • A Theoretical Framework • The Grand Finale

  18. Bayesian Algorithm: The Model • Assign to each page i parameters $a_i$, $h_i$, $e_i$ • ($e_i$: “link tendency” parameter) • Probability of a link between i and j • Simplified Bayesian: • Assign prior distributions to the parameters
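One way to make such a model concrete is a logistic link probability. The functional forms and the symbols $a_j$ (authority), $h_i$ (hub), and $e_i$ (link tendency) below are assumptions made here for illustration, chosen to match the slide's description of the parameters; they should not be read as the exact formulas from the talk.

```latex
\begin{align*}
P(i \to j) &= \frac{\exp(a_j h_i + e_i)}{1 + \exp(a_j h_i + e_i)} && \text{(full model, assumed form)} \\
P(i \to j) &= \frac{a_j h_i}{1 + a_j h_i} && \text{(simplified model, assumed form)}
\end{align*}
```

In both forms, for non-negative $a_j$ and $h_i$, the probability of a link from i to j grows with the product of the hub parameter of i and the authority parameter of j.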

  19. Bayesian Algorithm: The Algorithm • Condition on the observed adjacency matrix A • Obtain the posterior distribution using Bayes' rule • Compute the conditional means using the Metropolis algorithm • Output the conditional means of the authority parameters

  20. Roadmap • Previous Work • Extensions of existing algorithms • A novel Bayesian Algorithm • Experimental Results • A Theoretical Framework • The Grand Finale • http://www.cs.toronto.edu/~tsap/experiments

  21. Experimental Results • No undisputed “best” algorithm • No algorithm performs consistently well • There are queries where no algorithm performs well • There are queries where all algorithms perform well • Some algorithms are more “focused”, others more “spread” • Some algorithms are more prone to topic drift • The construction of the Base Set Graph is very important

  22. Experimental Results • Kleinberg • Converges to the most Tightly Knit Community (TKC phenomenon) • Prone to topic drift • pSALSA • Spreads the authority weight over different communities • May introduce spurious authorities • The two ends of the spectrum (east vs. west) (genetic)

  23. Comparative Evaluation of Algorithms • [Figure labels: HThresh, AThresh, Hub-Avg, BFS, SBayesian, FThresh, Bayesian] • Similarity: intersection over top ten • Simplified Bayesian very close to pSALSA • Threshold algorithms close to Kleinberg • Other algorithms range in the middle
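The similarity measure used here, the overlap between two algorithms' top-ten lists, is simple to compute. A minimal sketch follows; the function name and the tie-breaking by page index are assumptions.

```python
def top_k_overlap(weights_a, weights_b, k=10):
    """Number of pages that appear in the top k of both weight vectors."""
    def top(weights):
        # sorted() is stable, so ties are broken by the lower page index.
        order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
        return set(order[:k])
    return len(top(weights_a) & top(weights_b))


a = [0.5, 0.3, 0.2, 0.0]
b = [0.1, 0.4, 0.3, 0.2]
shared = top_k_overlap(a, b, k=2)
```

In this toy example the top two of `a` are pages 0 and 1, the top two of `b` are pages 1 and 2, and only page 1 is shared.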

  24. Roadmap • Previous Work • Extensions of existing algorithms • A Bayesian Approach • Experimental Results • A Theoretical Framework • The Grand Finale

  25. A Theoretical Framework • Link Analysis Ranking algorithm A • A(G)[j]: authority weight of jth page. • L-algorithm A: vector A(G) is normalized under L norm • Unnormalized algorithm: no normalization at any step • e.g. unnormalized pSALSA

  26. Monotonicity • Definition: An algorithm A is monotone if for any two nodes, j,k: • All algorithms we consider are monotone

  27. Similarity: Distance Measures • L norm: • Distance between weight vectors • Distance between algorithms

  28. Similarity: Distance Measures • Rank Distance (counts the number of swapped pairs) • Distance between weight vectors • Distance between algorithms
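A rank distance that counts swapped pairs can be sketched as follows. The normalization by the total number of pairs and the treatment of ties (a pair tied in either vector is not counted as swapped) are assumptions made here, and the function name is hypothetical.

```python
from itertools import combinations


def rank_distance(w1, w2):
    """Fraction of node pairs ordered one way by w1 and the opposite way by w2."""
    n = len(w1)
    if n < 2:
        return 0.0
    swapped = 0
    for i, j in combinations(range(n), 2):
        # A negative product means the two vectors disagree on the order of i and j.
        if (w1[i] - w1[j]) * (w2[i] - w2[j]) < 0:
            swapped += 1
    return swapped / (n * (n - 1) / 2)


same = rank_distance([3, 2, 1], [30, 20, 10])
reversed_order = rank_distance([3, 2, 1], [1, 2, 3])
one_swap = rank_distance([3, 2, 1], [3, 1, 2])
```

The distance depends only on the orderings, so rescaling either weight vector leaves it unchanged.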

  29. Similarity • Definition: Two L-algorithms are similar if • Definition: Two algorithms are rank similar if • Definition: Two algorithms are rank matching if

  30. Similarity: Results • Hub-Averaging and pSALSA are neither similar nor rank similar • Kleinberg and pSALSA are neither similar nor rank similar • Kleinberg and Hub-Averaging are neither similar nor rank similar

  31. Stability • Intuition: An algorithm is stable if small changes to the graph have a small effect on the output • Let • Definition: An L-algorithm A is stable if • Definition: An algorithm A is rank stable if

  32. Stability: Results • Kleinberg and Hub-Averaging are neither stable nor rank stable • pSALSA is stable and rank stable

  33. Locality • Let • Local: • Pairwise local: • Rank Local:

  34. Locality: Results • Unnormalized pSALSA is local • pSALSA is rank local and pairwise local • Theorem (Uniqueness of pSALSA): Any algorithm that is monotone, label independent, and local is rank matching with pSALSA

  35. Future Work • Investigate the use of other statistical and machine learning techniques for link analysis • Expand and explore the Theoretical Framework • Investigate the similarity of Simplified Bayesian and pSALSA
