DOULION: Counting Triangles in Massive Graphs with a Coin

This paper presents a sampling approach called DOULION for efficiently estimating the number of triangles in massive graphs. It provides an outline of the motivation, related work, proposed method, results, and conclusion.

Presentation Transcript


  1. DOULION: Counting Triangles in Massive Graphs with a Coin. Charalampos (Babis) Tsourakakis, Carnegie Mellon University. KDD '09, Paris. Joint work with: U Kang, Gary L. Miller, Christos Faloutsos. DOULION, KDD 09

  2. Outline: Motivation, Related Work, Proposed Method, Results, Conclusion, Extra.

  3. Why is Triangle Counting important? Social Network Analysis fact: “Friends of friends are friends” [WF94] (figure: triangle on nodes A, B, C). Clustering coefficient. Transitivity ratio. • Hidden Thematic Structure of the Web (Eckmann et al., PNAS [EM02]) • Motif Detection (e.g., [YPSB05]) • Web Spam Detection (Becchetti et al., KDD ’08 [BBCG08])

  4. Personal Motivation [CET08]. Political Blogs: eigenvalues of the adjacency matrix, i-th eigenvector. Keep only 3!

  5. Outline: Motivation, Related Work, Proposed Method, Results, Conclusion, Extra.

  6. Counting methods. Sparse graphs. Matrix Multiplication: not practical. M. Latapy, Theory and Experiments. Dense graphs.

  7. Naive Sampling: take r independent samples of three distinct vertices. X=1 if the sampled triple is a triangle (T3); X=0 otherwise (T0, T1, T2).

  8. Naive Sampling: r independent samples of three distinct vertices. Then the following holds: with probability at least 1-δ. Works. Prohibitive for graphs with T3 = o(n^2), e.g., T3 n^2 log n.
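
The naive sampling scheme on slides 7 and 8 can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `naive_triangle_estimate`, the adjacency-dictionary representation, and the toy graph are assumptions made here. The estimate is the fraction of sampled triples that are triangles, scaled by the total number of vertex triples.

```python
import random


def naive_triangle_estimate(adj, r, rng):
    """Estimate the triangle count T3 from r uniformly sampled vertex triples.

    adj maps each vertex to the set of its neighbors (undirected, simple graph).
    """
    nodes = list(adj)
    n = len(nodes)
    hits = 0
    for _ in range(r):
        a, b, c = rng.sample(nodes, 3)  # three distinct vertices
        # X = 1 only when all three edges are present, i.e. the triple is a triangle.
        if b in adj[a] and c in adj[a] and c in adj[b]:
            hits += 1
    total_triples = n * (n - 1) * (n - 2) // 6
    # The fraction of sampled triples that are triangles estimates T3 / C(n, 3).
    return hits / r * total_triples


# Usage: a 4-cycle with one chord contains two triangles.
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
print(naive_triangle_estimate(graph, 1000, random.Random(0)))
```

When triangles are rare among all triples, most samples return X=0, which is why many samples are needed on sparse graphs.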

  9. Buriol, Frahling, Leonardi, Marchetti-Spaccamela, Sohler [BFLSS06]: Sample uniformly at random an edge (i,j) and a node k in V-{i,j}. Check if edges (i,k) and (j,k) exist in E(G).
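
The edge-plus-node sampling step on slide 9 can be sketched in memory as below. This is only an illustration of the sampling idea, not the streaming algorithm of Buriol et al.; the function name `edge_node_sampling_estimate` and the scaling step are assumptions made here. The scaling uses the fact that each triangle corresponds to exactly 3 of the m(n-2) possible (edge, node) pairs.

```python
import random


def edge_node_sampling_estimate(edges, r, rng):
    """Estimate the triangle count from r sampled (edge, node) pairs.

    edges is an iterable of undirected edges (u, v); vertices are taken from
    the edge list, so isolated vertices are ignored (an assumption of this sketch).
    Requires at least three vertices.
    """
    edge_list = sorted({tuple(sorted(e)) for e in edges})
    edge_set = set(edge_list)
    nodes = sorted({v for e in edge_list for v in e})
    m, n = len(edge_list), len(nodes)

    def has_edge(a, b):
        return (min(a, b), max(a, b)) in edge_set

    hits = 0
    for _ in range(r):
        i, j = rng.choice(edge_list)
        k = rng.choice(nodes)
        while k == i or k == j:  # resample until k lies in V - {i, j}
            k = rng.choice(nodes)
        if has_edge(i, k) and has_edge(j, k):
            hits += 1
    # Each triangle is hit by 3 of the m * (n - 2) equally likely pairs.
    return hits / r * m * (n - 2) / 3


# Usage: two triangles sharing the edge (0, 2).
print(edge_node_sampling_estimate([(0, 1), (1, 2), (0, 2), (2, 3), (0, 3)], 2000, random.Random(1)))
```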

  10. Outline: Motivation, Related Work, Proposed Method, Results, Conclusion, Extra.

  11. Our Sampling Approach: toss a coin for each edge of G(V,E). HEADS! (i,j) “survives” with weight 1/p.

  12. Our Sampling Approach, G(V,E): TAILS! (k,m) “dies”.

  13. Sampling approach
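
The coin-toss scheme on slides 11 and 12 can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the released HADOOP, MATLAB, or JAVA code: the names `count_triangles` and `doulion_estimate` are hypothetical, edges are assumed unique with no self-loops, and each surviving triangle is scaled by 1/p^3 (one factor of 1/p per surviving edge).

```python
import random


def count_triangles(edges):
    """Exact triangle count for a list of unique undirected edges (no self-loops)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Every triangle contributes one common neighbor to each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def doulion_estimate(edges, p, rng):
    """Keep each edge independently with probability p, then rescale the count."""
    kept = [e for e in edges if rng.random() < p]  # heads: the edge survives
    # A triangle survives with probability p**3, so divide by p**3.
    return count_triangles(kept) / p ** 3


# Usage on the complete graph K6, which has 20 triangles.
k6 = [(a, b) for a in range(6) for b in range(a + 1, 6)]
print(count_triangles(k6), doulion_estimate(k6, 0.5, random.Random(3)))
```

Any exact counter can be used on the sparsified graph; the intersection-based one here was chosen only to keep the sketch short.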

  14. Our Sampling Approach on K_n: K_n (initially) → G_{n,0.5} (weighted), compared in expectation.

  15. Mean and Variance: Δ = #triangles = k + (Δ-k), where k is the number of non-edge-disjoint triangles. X is a r.v., our estimate. E[X] = Δ.
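
The mean stated on slide 15 follows from linearity of expectation. The derivation below is a sketch assuming independent coin tosses with heads probability p. The indicator δ_i (triangle i survives) and the symbol κ (the number of unordered pairs of triangles that share an edge) are notation introduced here for illustration; κ is not necessarily the same quantity as the slide's k.

```latex
\begin{align}
X &= \frac{1}{p^{3}} \sum_{i=1}^{\Delta} \delta_i,
  \qquad \Pr[\delta_i = 1] = p^{3}, \\
\mathbb{E}[X] &= \frac{1}{p^{3}} \sum_{i=1}^{\Delta} p^{3} = \Delta, \\
\operatorname{Cov}(\delta_i, \delta_j) &=
  \begin{cases}
    p^{5} - p^{6} & \text{if triangles } i \neq j \text{ share an edge,} \\
    0 & \text{if they are edge-disjoint,}
  \end{cases} \\
\operatorname{Var}(X) &= \frac{\Delta\,(p^{3} - p^{6}) + 2\kappa\,(p^{5} - p^{6})}{p^{6}}.
\end{align}
```

Two distinct triangles in a simple graph share at most one edge, so their union has five edges, which gives the p^5 term.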

  16. Outline: Motivation, Related Work, Proposed Method, Results, Conclusion, Extra.

  17. Doulion and NodeIterator: Sparsify first and then use Node Iterator to count triangles. Node Iterator: consider each node and count how many edges exist among its neighbors.
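
Node Iterator, as described on slide 17, can be sketched like this. The function name `node_iterator` and the adjacency-dictionary input are assumptions of this sketch. Each triangle is found once at each of its three vertices, hence the division by 3.

```python
from itertools import combinations


def node_iterator(adj):
    """Count triangles by checking, for each node, every pair of its neighbors.

    adj maps each vertex to the set of its neighbors (undirected, simple graph).
    """
    closed_pairs = 0
    for v, neighbors in adj.items():
        for a, b in combinations(neighbors, 2):
            if b in adj[a]:  # the edge (a, b) closes a triangle through v
                closed_pairs += 1
    return closed_pairs // 3


# Usage: one triangle (0, 1, 2) with a pendant vertex 3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(node_iterator(graph))
```

The work done is one membership check per pair of neighbors, so it grows with the sum over nodes of (degree choose 2).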

  18. Expected Speedup: 1/p^2. Proof: Let R be the running time of Node Iterator after the sparsification. Therefore, the expected speedup is 1/p^2.
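
One way to recover the 1/p^2 figure on slide 18 is sketched below; the slide's own equations may have been stated differently. The sketch assumes that the cost of Node Iterator is proportional to the number of neighbor pairs it checks, writes d_v for the original degree of node v and D_v for its degree after sparsification (notation introduced here), and reads "expected speedup" as the original cost divided by the expected cost after sparsification.

```latex
\begin{align}
R &= \sum_{v \in V} \binom{D_v}{2},
  \qquad D_v \sim \mathrm{Binomial}(d_v, p), \\
\mathbb{E}\!\left[\binom{D_v}{2}\right] &= p^{2} \binom{d_v}{2}
  \quad \text{(each pair of incident edges survives with probability } p^{2}\text{)}, \\
\mathbb{E}[R] &= p^{2} \sum_{v \in V} \binom{d_v}{2}, \\
\frac{\sum_{v \in V} \binom{d_v}{2}}{\mathbb{E}[R]} &= \frac{1}{p^{2}}.
\end{align}
```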

  19. Some results (I): ~3M, ~35M; ~400K, ~2.1M.

  20. Some results (II): ~3.1M, ~37M; ~3.6M, ~42M.

  21. Outline: Motivation, Related Work, Proposed Method, Results, Conclusion, Extra.

  22. Conclusions: New sampling approach that counts triangles approximately. Basic analysis of the estimate (expectation, variance, expected speedup). Experimentation on many real-world datasets, where we showed that for p = constant we get high-quality estimates and 1/p^2 constant speedups.

  23. Question: Can p be smaller than constant? How small can we afford p to be and at the same time guarantee concentration? Could e.g., p be as small as 1/ ??? Motivation:

  24. Outline: Motivation, Related Work, Proposed Method, Results, Conclusion, Extra.

  25. Approximate Triangle Counting. arXiv preprint: http://arxiv.org/PS_cache/arxiv/pdf/0904/0904.3761v1.pdf C.E.T., M.N. Kolountzakis, G.L. Miller.

  26. Theorem (C.E.T., Kolountzakis, Miller 2009). Mildness, pick p=1. How to choose p? Concentration.

  27. Practitioner’s Guide (Wikipedia 2005: 1.6M nodes, 18.5M edges). Pick p=1/ Keep doubling until concentration. Concentration appears. Concentration becomes stronger.
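
The "keep doubling until concentration" recipe on slide 27 might look like the sketch below. The slide's starting value of p is lost and its concentration test is not spelled out, so everything specific here is an assumption: the caller supplies `p_start`, and "concentration" is taken to mean that a few repeated estimates at the same p agree within a relative tolerance `tol`. The helper names are hypothetical.

```python
import random


def count_triangles(edges):
    """Exact triangle count for a list of unique undirected edges (no self-loops)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def doulion_estimate(edges, p, rng):
    """Keep each edge with probability p and rescale the count by 1 / p**3."""
    kept = [e for e in edges if rng.random() < p]
    return count_triangles(kept) / p ** 3


def doubling_search(edges, p_start, rng, runs=5, tol=0.05):
    """Double p until repeated estimates agree (assumed stopping rule), or p hits 1."""
    p = p_start
    while True:
        estimates = [doulion_estimate(edges, p, rng) for _ in range(runs)]
        mean = sum(estimates) / runs
        spread = max(estimates) - min(estimates)
        if p >= 1.0 or (mean > 0 and spread <= tol * mean):
            return p, mean
        p = min(1.0, 2 * p)


# Usage on the complete graph K8 with a hypothetical starting value.
k8 = [(a, b) for a in range(8) for b in range(a + 1, 8)]
print(doubling_search(k8, 0.125, random.Random(5)))
```

Repeating the estimate several times at each p costs extra work; a single run per p compared against the previous p is another possible reading of the slide.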

  28. “Bad” Instances: Remove edge (1,2). Remove any weighted edge, w sufficiently large.

  29. Thanks! http://www.cs.cmu.edu/~ctsourak/projects.html Code and datasets available: graphminingtoolbox@gmail.com (HADOOP, MATLAB, JAVA implementations along with small real-world graphs; all datasets used are on the web). “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software environment and the complete set of instructions which generated the figures.” Buckheit and Donoho [BD95]

  30. References • Efficient semi-streaming algorithms for local triangle counting in massive graphs. Becchetti, Boldi, Castillo, Gionis [BBCG08] • Commensurate distances and similar motifs in genetic congruence and protein interaction networks in yeast. Ye, Peyser, Spencer, Bader [YPSB05]

  31. References • Curvature of co-links uncovers hidden thematic layers in the World Wide Web. Eckmann, Moses [EM02]

  32. References • Fast Counting of Triangles in Large Real-World Networks: Algorithms and Laws. C. Tsourakakis [CET08] • Wavelab and reproducible research. Buckheit, Donoho [BD95]

  33. References • Social Network Analysis: Methods and Applications. Wasserman, Faust [WF94] • Counting triangles in data streams. Buriol, Frahling, Leonardi, Marchetti-Spaccamela, Sohler [BFLSS06]

  34. Doulion
