Fast Counting of triangles in large networks: Algorithms and laws

Charalampos (Babis) Tsourakakis, School of Computer Science, Carnegie Mellon University, http://www.cs.cmu.edu/~ctsourak. RPI Theory Seminar, 24 November 2008.


Presentation Transcript


  1. Charalampos (Babis) Tsourakakis, School of Computer Science, Carnegie Mellon University, http://www.cs.cmu.edu/~ctsourak • Fast Counting of triangles in large networks: Algorithms and laws • RPI Theory Seminar, 24 November 2008

  2. Counting Triangles • Given an undirected, simple graph G(V,E), a triangle is a set of 3 vertices such that any two of them are connected by an edge of the graph. • Related Problems: a) Decide if a graph is triangle-free. b) Count the total number of triangles δ(G). c) Count the number of triangles δ(v) that each vertex v participates in. d) List the triangles that each vertex v participates in. • Our focus: the counting problems b) and c).

  3. Why is triangle counting important*? • Social Network Analysis: “Friends of friends are friends” [WF94] • Web Spam Detection [BPCG08] • Hidden Thematic Structure of the Web [EM02] • Motif Detection, e.g. in biological networks [YPSB05] *A few indicative reasons, from the graph mining perspective.

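  4. Why is triangle counting important? Furthermore, two often used metrics are: • Clustering Coefficient: $C(G) = \frac{1}{n} \sum_{v \in V} \frac{\delta(v)}{\binom{d_v}{2}}$, where $d_v$ is the degree of vertex v and $\binom{d_v}{2}$ is the number of triples at node v. • Transitivity Ratio: $T(G) = \frac{3\,\delta(G)}{\sum_{v \in V} \binom{d_v}{2}}$, where the denominator is the total number of triples. A triple at node v is a path of length two centered at v; a triangle closes it with a third edge.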
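The two metrics can be illustrated on a toy graph. This is a minimal sketch that assumes the standard definitions: the clustering coefficient averages δ(v) divided by the number of triples at v, and the transitivity ratio is 3δ(G) divided by the total number of triples. The function names and the adjacency-dict representation are illustrative choices, and counting vertices of degree below 2 as contributing 0 is an assumed convention.

```python
from itertools import combinations

def local_triangles(adj):
    """Return {v: number of triangles containing v} for a dict of neighbor sets."""
    return {
        v: sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
        for v, nbrs in adj.items()
    }

def clustering_coefficient(adj):
    """Average over all vertices of delta(v) / C(d_v, 2)."""
    tri = local_triangles(adj)
    total = 0.0
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d >= 2:  # assumed convention: vertices with d < 2 contribute 0
            total += tri[v] / (d * (d - 1) / 2)
    return total / len(adj)

def transitivity_ratio(adj):
    """3 * (number of triangles) / (number of triples)."""
    tri = local_triangles(adj)
    triangles = sum(tri.values()) // 3  # each triangle is seen from 3 vertices
    triples = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    return 3 * triangles / triples if triples else 0.0

# Toy graph: triangle 0-1-2 plus a pendant vertex 3 attached to vertex 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(clustering_coefficient(toy), transitivity_ratio(toy))
```

On this toy graph the two metrics differ (7/12 versus 3/5), which shows that they weight high-degree vertices differently.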

  5. Outline • Related Work • Proposed Method • Experiments • Triangle-related Laws • Triangles in Kronecker Graphs • Future Work & Open Problems

  6. Counting methods • Dense graphs • Sparse graphs

  7. Outline • Related Work • Proposed Method • Experiments • Triangle-related Laws • Triangles in Kronecker Graphs • Future Work & Open Problems

  8. Outline of the Proposed Method • EigenTriangle theorem • EigenTriangleLocal theorem • EigenTriangle algorithm • EigenTriangleLocal algorithm • Efficiency & Complexity • Power law degree distributions • Gershgorin discs • Real world network spectra

  9. Theorem [EigenTriangle] • Theorem: The number of triangles δ(G) in an undirected, simple graph G(V,E) is given by $\delta(G) = \frac{1}{6} \sum_{i=1}^{n} \lambda_i^3$, where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of the adjacency matrix of graph G.

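  10. Proof • Call A the adjacency matrix of the graph. Consider the i-th diagonal element of $A^3$, $\alpha_{ii}$. This element equals the number of closed walks of length 3 that start and end at vertex i, which is twice the number of triangles vertex i participates in, since each such triangle is traversed both as i-j-k and as i-k-j. So the trace of $A^3$ is 6δ(G), because each triangle is counted 6 times (3 participating vertices, 2 directions each). Furthermore, if $Ax = \lambda x$, then $\lambda^3$ is an eigenvalue of $A^3$ (*), and vice versa: if λ is an eigenvalue of $A^3$, then $\lambda^{1/3}$ is an eigenvalue of A. Since the trace equals the sum of the eigenvalues, $6\,\delta(G) = \sum_i \lambda_i^3$. * $A^3 x = AAAx = AA\lambda x = \lambda AAx = \lambda A \lambda x = \lambda^2 Ax = \lambda^3 x$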
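The counting argument in the proof can be checked numerically on a small graph. The sketch below uses plain Python lists and hypothetical helper names; it compares the trace of the cubed adjacency matrix against a brute-force triangle count on the complete graph K4.

```python
from itertools import combinations

def matmul(X, Y):
    """Multiply two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace_of_cube(A):
    """Return trace(A^3), i.e. the number of closed walks of length 3."""
    A3 = matmul(matmul(A, A), A)
    return sum(A3[i][i] for i in range(len(A)))

def brute_force_triangles(A):
    """Count vertex triples whose three pairs are all connected."""
    n = len(A)
    return sum(1 for i, j, k in combinations(range(n), 3)
               if A[i][j] and A[j][k] and A[i][k])

# K4: the complete graph on 4 vertices, which has C(4,3) = 4 triangles.
K4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
print(trace_of_cube(K4), 6 * brute_force_triangles(K4))  # both are 24
```

Each diagonal entry of the cube is 6 here, twice the 3 triangles that every vertex of K4 participates in.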

  11. Theorem [EigenTriangleLocal] • Theorem: The number of triangles δ(i) vertex i participates in is equal to $\delta(i) = \frac{1}{2} \sum_{j=1}^{n} \lambda_j^3 u_{j,i}^2$, where $u_{j,i}$ is the i-th entry of the j-th eigenvector. • Proof [Sketch]: Follows from the previous theorem and the fact that A is symmetric, therefore diagonalizable as $A = U \Lambda U^T$ with orthonormal eigenvectors, and also $A^3 = U \Lambda^3 U^T$.

  12. EigenTriangle Algorithm

  13. EigenTriangleLocal Algorithm • Why are these two algorithms efficient?
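The pseudocode of the two algorithm slides is not preserved in this transcript. What follows is only a sketch of the underlying idea, assuming the algorithms keep the k eigenvalues of largest magnitude and plug them into the two theorems. The function names are hypothetical, and the dense `numpy.linalg.eigh` decomposition is a stand-in for the Lanczos iteration that the talk relies on for large sparse graphs.

```python
import numpy as np

def eigen_triangle(A, k):
    """Estimate the total triangle count from the k largest-magnitude eigenvalues."""
    lam = np.linalg.eigvalsh(A)               # dense stand-in for Lanczos
    top = lam[np.argsort(-np.abs(lam))[:k]]   # keep the k dominant eigenvalues
    return float(np.sum(top ** 3) / 6.0)

def eigen_triangle_local(A, k):
    """Estimate per-vertex triangle counts from the k dominant eigenpairs."""
    lam, U = np.linalg.eigh(A)                # columns of U are eigenvectors
    idx = np.argsort(-np.abs(lam))[:k]
    # delta(i) ~ 1/2 * sum_j lam_j^3 * U[i, j]^2 over the kept eigenpairs
    return (U[:, idx] ** 2) @ (lam[idx] ** 3) / 2.0

# K4 has eigenvalues 3, -1, -1, -1 and exactly 4 triangles.
K4 = np.ones((4, 4)) - np.eye(4)
print(eigen_triangle(K4, 1))        # rank 1 gives 27/6 = 4.5
print(eigen_triangle(K4, 4))        # full rank recovers 4 up to rounding
print(eigen_triangle_local(K4, 4))  # about 3 triangles per vertex
```

With all eigenpairs the estimates are exact up to floating-point rounding; the interest of the method lies in how few eigenpairs are needed on real networks.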

  14. Skewed Degree Distributions • Skewed degree distributions are ubiquitous in nature! They have been termed “the signature of human activity” [FKP02], but they also appear in all other kinds of networks, e.g. biological ones. See [N05][M04] for generative models of power law distributions. • They are typically referred to as power laws (even if we sometimes abuse the strict definition of a power law, i.e. $p(x) \propto x^{-\alpha}$).

  15. Examples of power laws • Newman [N05] demonstrated how often power laws appear, using many different types of networks, ranging from word frequencies to populations of cities. • Many cities have a small population; few cities have a huge population.

  16. Gershgorin’s Discs • Theorem: Let B be an arbitrary n×n matrix. Then the eigenvalues λ of B are located in the union of the n discs $|\lambda - b_{ii}| \le \sum_{j \ne i} |b_{ij}|$, $i = 1, \dots, n$. • For a proof see Demmel [D97], p. 82.
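For an adjacency matrix the discs are easy to compute: every center is 0 and the radius of disc i is the degree of vertex i. The sketch below, with hypothetical function names, computes the discs for a small star graph.

```python
def gershgorin_discs(B):
    """Return one (center, radius) pair per row of the square matrix B."""
    return [
        (row[i], sum(abs(x) for j, x in enumerate(row) if j != i))
        for i, row in enumerate(B)
    ]

def eigenvalue_bound(B):
    """Upper bound on |lambda| implied by the union of the discs."""
    return max(abs(center) + radius for center, radius in gershgorin_discs(B))

# Star graph: vertex 0 is connected to three leaves.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print(gershgorin_discs(star))  # [(0, 3), (0, 1), (0, 1), (0, 1)]
print(eigenvalue_bound(star))  # 3
```

The largest eigenvalue of this star is the square root of 3, roughly 1.73, so even here the bound of 3 is noticeably loose.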

  17. Gershgorin Discs • Bounds on the airports network (observe how loose they are)

  18. Typical real world spectra • Political blogs • Airports

  19. Top Eigenvalues • Zooming in on the top eigenvalues and plotting the rank vs. the eigenvalue in log-log scale reveals that the top eigenvalues follow a power law [FFF99]. • Some years later, Mihail & Papadimitriou [MP02] and Chung, Lu and Vu [CLV03] proved this fact.

  20. Our idea • Simple & clear: Use a low-rank approximation of $A^3$ to estimate the diagonal elements and the trace. • It also suggests a way of thinking: Take advantage of special properties (e.g. power laws) to reduce the complexity of certain computational tasks in real-world networks.

  21. Summing up: Why does it work? • The first main reason: apart from the top eigenvalues, the bulk of the spectrum is almost symmetric around 0, so the cubes of the bulk eigenvalues nearly cancel in the sum. • Cubing strongly amplifies this phenomenon, since the few top eigenvalues dominate the sum of cubes.

  22. Complexity Analysis • The main computational bottleneck, which determines the complexity, is the Lanczos method. • Lanczos runs in time linear in the number of non-zero entries of the matrix, i.e. the edges, assuming that we compute a small constant number of eigenvalues. • Convergence of Lanczos is fast due to the eigenvalue power law (see Kaniel-Paige theory [GL89]).

  23. Outline • Related Work • Proposed Method • Experiments • Triangle-related Laws • Triangles in Kronecker Graphs • Future Work & Open Problems

  24. Datasets

  25. Competitor: Node Iterator • The Node Iterator algorithm considers one node at a time, looks at its neighbors, and counts how many pairs of them are connected to each other. • Complexity: $O(n\,d_{\max}^2)$, where $d_{\max}$ is the maximum degree. • We report the results as the speedup that the EigenTriangle algorithm gives compared to the running time of the Node Iterator.
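The Node Iterator baseline is short enough to sketch directly. This version assumes the graph is given as a dict mapping each vertex to the set of its neighbors; the function name and the representation are illustrative, not the implementation used in the experiments.

```python
from itertools import combinations

def node_iterator(adj):
    """Return (total triangles, {v: triangles containing v})."""
    per_vertex = {}
    for v, nbrs in adj.items():
        # Check every pair of neighbors of v: about d_v^2 / 2 lookups per vertex.
        per_vertex[v] = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    # Every triangle is found once from each of its 3 vertices.
    total = sum(per_vertex.values()) // 3
    return total, per_vertex

# Toy graph: triangle 0-1-2 plus a pendant vertex 3 attached to vertex 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(node_iterator(graph))  # (1, {0: 1, 1: 1, 2: 1, 3: 0})
```

The pair check inside the loop is what makes the running time grow with the square of the degree, which is costly on graphs with a few very high-degree vertices.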

  26. Results: #Eigenvalues vs. Speedup

  27. Results: #Edges vs. Speedup

  28. Main points • Some interesting facts about the two scatterplots: • The mean approximation rank required for at least 95% accuracy is 6.2. • Speedups are between 33.7x and 1159x. • The mean speedup is 250x. • Notice the increasing speedup as the size of the network grows.

  29. Zooming in • Zooming in on this point

  30. Evaluating the Local Counting Method • Pearson’s correlation coefficient ρ • Relative Reconstruction Error (RRE) • Political Blogs: RRE = $7 \times 10^{-4}$, ρ = 99.97%

  31. #Eigenvalues vs. ρ for three networks • Observe how a low rank gives almost optimal results. This holds for surprisingly many real world networks.

  32. Outline • Related Work • Proposed Method • Experiments • Triangle-related Laws • Triangles in Kronecker Graphs • Future Work & Open Problems

  33. Triangle Participation Law • Plots the number of triangles δ (x-axis) vs. the count of vertices that participate in δ triangles. • Panels: a) EPINIONS, who-trusts-whom network; b) ASN, social network; c) HEP_TH, collaboration network.

  34. Degree Triangle Law • Plots the degree $d_i$ (x-axis) vs. the mean number of triangles that nodes with degree $d_i$ participate in. • Panels: Epinions, ASN.

  35. Outline • Related Work • Proposed Method • Experiments • New Triangle-related Laws • Triangles in Kronecker Graphs • Future Work & Open Problems

  36. Kronecker Graphs • This model was introduced in [LCKF05]. It is based on the simple operation of the Kronecker product to generate graphs that mimic real world networks. • Deterministic Kronecker Graphs: Kronecker product of the adjacency matrix at the current step k with the initiator adjacency matrix (typically small). • Stochastic Kronecker Graphs: Kronecker product of the matrix at the current step k with the initiator matrix. The initiator matrix contains probabilities. For more details see [LF07].

  37. Triangles in Kronecker Graphs • Some notation first: A: n×n initiator adjacency matrix of the undirected, simple graph $G_A$; $B = A^{[k]}$: k-th Kronecker product; $\lambda = (\lambda_1, \dots, \lambda_n)$: the eigenvalues of A; $\Delta(G_A)$, $\Delta(G_B)$: #triangles of $G_A$, $G_B$. • Theorem [KroneckerTRC]: $\Delta(G_B) = 6^{k-1} \Delta(G_A)^k$.

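  38. Proof • We use induction on the number of recursion steps k. For k=1 (B = A) the theorem trivially holds. Assume now that KroneckerTRC holds for some $r \ge 1$. Call $C = A^{[r]}$, $D = A^{[r+1]}$ and the eigenvalues of C $[\mu_i]_{i=1..s}$. By the assumption and the EigenTriangle theorem, $\sum_{i=1}^{s} \mu_i^3 = 6\,\Delta(G_C) = 6^{r} \Delta(G_A)^{r}$. The eigenvalues of D are given by the Kronecker product $\lambda \otimes \mu$, i.e. all products $\lambda_i \mu_j$. By the EigenTriangle theorem, the number of triangles in D is given by: $\Delta(G_D) = \frac{1}{6} \sum_{i=1}^{n} \sum_{j=1}^{s} (\lambda_i \mu_j)^3 = \frac{1}{6} \Big(\sum_{i=1}^{n} \lambda_i^3\Big) \Big(\sum_{j=1}^{s} \mu_j^3\Big) = \frac{1}{6} \cdot 6\,\Delta(G_A) \cdot 6^{r} \Delta(G_A)^{r} = 6^{r} \Delta(G_A)^{r+1}$.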

  39. Proof • Therefore KroneckerTRC holds for all $k \ge 1$. Q.E.D.
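The statement of KroneckerTRC is not legible in this transcript. The sketch below assumes it takes the form Δ(G_B) = 6^(k-1) Δ(G_A)^k, which is what the EigenTriangle theorem yields when the eigenvalues of a Kronecker product are the pairwise products of the factors' eigenvalues. It checks that form on small deterministic Kronecker graphs using exact integer arithmetic; the helper names are hypothetical.

```python
def kron(X, Y):
    """Kronecker product of two square 0/1 matrices given as lists of lists."""
    n, m = len(X), len(Y)
    return [[X[i // m][j // m] * Y[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def triangles(A):
    """Count triangles as trace(A^3) / 6 with integer arithmetic."""
    n = len(A)
    A2 = [[sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    # trace(A^3) = sum over i, j of (A^2)[i][j] * A[j][i]
    return sum(A2[i][j] * A[j][i] for i in range(n) for j in range(n)) // 6

def complete_graph(n):
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]

K3, K4 = complete_graph(3), complete_graph(4)
print(triangles(K3), triangles(kron(K3, K3)), triangles(kron(kron(K3, K3), K3)))
print(triangles(K4), triangles(kron(K4, K4)))
```

For the initiator K3 (1 triangle) the counts at k = 1, 2, 3 are 1, 6 and 36; for K4 (4 triangles) the count at k = 2 is 96, matching 6 · 4².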

  40. Outline • Related Work • Proposed Method • Experiments • New Triangle-related Laws • Triangles in Kronecker Graphs • Future Work & Open Problems

  41. Theoretical Challenge I: Spectra of real world networks • Can we prove things about the distribution of the eigenvalues, adopting a random graph model such as the expected degree model G(w) [CLV03]? • An analog to Wigner’s semicircle law for random Erdős-Rényi graphs (see Füredi-Komlós [FK81]) • Figure: spectrum over 100000 iterations [S07]

  42. Theoretical Challenge I: Spectra of real world networks • Empirically, the rest of the spectrum follows a triangular-like distribution [FDBV01]. • Can we prove something about this empirical observation?

  43. Theoretical Challenge II: Eigenvectors of real world networks • Things are even “worse” than in the case of spectra: very little is known about the eigenvectors. • Related work: see [P08] for random graphs.

  44. Theoretical Challenge III: Degree Triangle Law • Prove the pattern we saw using the expected degree random graph model G(w) (see [S04]). • Conjecture: The relationship we observed probably appears for some values of the slope of the degree distribution. Recent further experiments showed that for some graphs this pattern does not hold.

  45. Experimental Challenge I: Compare with Streaming Methods • Streaming or semi-streaming methods perform one or O(1) passes over the graph [YKS02][BFLSS06][BPCG08]. • Common underlying idea: sophisticated sampling methods. • Implement and compare.

  46. Practical Challenge I: Triangles in Large Scale Graph Mining • Many gigabyte- and petabyte-sized graphs. How to handle these graphs? HADOOP • EigenTriangle algorithms are based just on simple matrix-vector multiplications, which are easy to parallelize on all sorts of architectures (distributed memory, shared memory). See [DHV93] for the details.

  47. PEGASUS: Peta-Graph Mining from the Triangle perspective • Ongoing work with U Kang and Christos Faloutsos in collaboration with Yahoo! Research. • Among others: Implement EigenTriangle algorithms in HADOOP and compare to other methods. • Find outliers with respect to triangles in graphs with many billions of edges. • Soon… Stay tuned!

  48. Curious about:

  49. Acknowledgements • Christos Faloutsos • Yiannis Koutis • For the helpful discussions

  50. Acknowledgements • Maria Tsiarli • For the PEGASUS logo
