
Colibri: Fast Mining of Large Static and Dynamic Graphs

Colibri: Fast Mining of Large Static and Dynamic Graphs. Joint Work by Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip S. Yu, Christos Faloutsos Speaker: Hanghang Tong. Aug. 24-27, 2008, Las Vegas KDD 2008.


Presentation Transcript


  1. Colibri: Fast Mining of Large Static and Dynamic Graphs Joint Work by Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip S. Yu, Christos Faloutsos Speaker: Hanghang Tong Aug. 24-27, 2008, Las Vegas KDD 2008

  2. Graphs are everywhere! • Q: How to find patterns? • e.g., community, anomaly, etc.

  3. Motivation • Q: How to find patterns? • e.g., community, anomaly, etc. • A: Low-Rank Approximation (LRA) for the adjacency matrix of the graph: $A \approx L \times M \times R$

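  4. LRA for Graph Mining: Example • Adjacency matrix A (author × conference): $A \approx L \times M \times R$ • Authors: John, Tom, Bob, Carl, Van, Roy • Conferences: ICDM, KDD, ISMB, RECOMB • L: author clusters; M: cluster interaction; R: conference clusters • Reconstruction error is high → ‘Carl’ is abnormal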

  5. Challenges • How to get (L, M, R) + Efficiently (both time and space); + Intuitively (easy for interpretation); + Dynamically (track patterns over time)?

  6. Roadmap • Motivation • Existing Methods (SVD, CUR/CX) • Proposed Methods: Colibri • Experimental Results • Conclusion

  7. Matrix & Column Space • Matrix: B = [b1, b2] (entries as listed on the slide: 3 1 1 1 0 0) • Column space of a matrix: b1, b2 are vectors in 3-d space! • [Figure: b1 and b2 drawn in 3-d space]

  8. Projection, Projection Matrix & Core Matrix • For an arbitrary vector v, its projection is $\tilde{v} = B (B^T B)^{+} B^T v$ • Projection matrix of B: $B (B^T B)^{+} B^T$ • Core matrix: $(B^T B)^{+}$
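
The projection on this slide can be tried on a tiny example. The sketch below is a minimal pure-Python illustration, not code from the talk: it assumes B has full column rank, so the pseudo-inverse of B^T B is an ordinary inverse, and the helper names (`transpose`, `matmul`, `inverse`, `project`) and the toy matrix are hypothetical.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Gauss-Jordan inverse of a small square matrix.

    Assumption: m is invertible (true for B^T B when B has full column rank).
    """
    n = len(m)
    # Augment m with the identity matrix.
    aug = [[float(x) for x in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def project(B, v):
    """Project v onto the column space of B: B (B^T B)^{-1} B^T v."""
    Bt = transpose(B)
    core = inverse(matmul(Bt, B))                 # core matrix (B^T B)^{-1}
    coeffs = matmul(core, matmul(Bt, [[x] for x in v]))
    return [row[0] for row in matmul(B, coeffs)]


# Hypothetical toy matrix: two columns that span the x-y plane of 3-d space.
B = [[1, 1],
     [1, 0],
     [0, 0]]
v = [2, 3, 5]
print(project(B, v))  # the third coordinate of v is dropped by the projection
```

Because the two columns of this toy B span the x-y plane, the projection keeps the first two coordinates of v and zeroes the third.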

  9. Singular-Value-Decomposition (SVD) • $A \approx U \times \Sigma \times V^T$, where A = [a1, a2, a3, …, am] is n × m • U = [u1, …, uk]: left singular vectors • V = [v1, …, vk]: right singular vectors

  10. SVD: How to • #1: Find the left matrix U, where [equation not captured in transcript] • #2: Project A into the column space of U, using the projection matrix of the column space of U
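
Step #2 can be written out explicitly. The following is a sketch under the standard SVD convention that the k columns of U are orthonormal (the slide's own formula did not survive in the transcript, so this convention is an assumption). Under it, the projection matrix of the column space of U simplifies:

```latex
U^{T} U = I_k
\quad\Longrightarrow\quad
U \,(U^{T} U)^{+}\, U^{T} = U U^{T},
\qquad
\tilde{A} = U U^{T} A .
```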

  11. SVD: drawbacks • Efficiency • Time • Space: (U, V) are dense • Interpretation • [Figure: A = U × V, showing the 1st and 2nd singular vectors] • Dynamic: not easy

  12. CUR (CX) decomposition • $A \approx C \times U \times R$, where A is n × m • Sample columns from A to form C • Project A onto the column space of C
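
Read together with the projection formula from slide 8, "project A onto the column space of C" would amount to the expression below. This is a sketch of the CX-style projection only. Identifying the middle matrix with $(C^{T} C)^{+}$ and the right matrix with $C^{T} A$ is an assumption made here for illustration; CUR variants that also sample rows define these factors differently. The U below is the CUR middle matrix, not the SVD's left singular vectors.

```latex
\tilde{A} \;=\; C \,\underbrace{(C^{T} C)^{+}}_{U}\, \underbrace{C^{T} A}_{R}
```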

  13. CUR (CX): advantages • Efficiency (better than SVD) • Time (c is the number of sampled columns) • Space: (C, R) are sparse • Interpretation

  14. CUR (CX): drawbacks • Redundancy in C, wasting both time and space • e.g., 3 copies of green, 2 copies of red, 2 copies of purple, and purple = 0.5*green + red… • Dynamic: not easy

  15. Roadmap • Motivation • Existing Methods • Colibri (Colibri-S for static graphs, Colibri-D for dynamic graphs) • Experimental Results • Conclusion

  16. Colibri-S: Basic Idea • CUR (CX): the sampled columns are redundant (3 copies of green, 2 copies of red, 2 copies of purple; purple = 0.5*green + red…) • Colibri-S: original matrix $\approx L \times M \times R$ • We want the columns in L to be linearly independent of each other!

  17. Input / Output • Input: initially sampled matrix C • Output: L = linearly independent columns; $M = (L^T L)^{-1}$ = core matrix; $R = L^T \times A$ • Q: How to find L & M from C efficiently?

  18. A: Find L & M iteratively! • Start from the initially sampled matrix C and the current L & M • For each column v in C: project it onto L • Redundant? If yes, discard v; otherwise, expand L & M
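
The loop on this slide can be sketched in a few lines. The code below is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: L is kept as a list of columns, M is maintained as the inverse of L^T L through the standard block-inverse update, and the relative tolerance `eps` used to decide redundancy is a hypothetical parameter. The names `colibri_s` and `reconstruct` and the toy vectors are also illustrative.

```python
def dot(u, v):
    """Inner product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))


def colibri_s(columns, eps=1e-9):
    """Pick linearly independent columns from the sampled matrix C.

    columns: C given as a list of column vectors.
    Returns (L, M): the kept columns and the core matrix M = (L^T L)^{-1}.
    eps is an assumed relative tolerance for calling a column redundant.
    """
    L, M = [], []
    for v in columns:
        vv = dot(v, v)
        if vv == 0:
            continue                         # an all-zero column adds nothing
        g = [dot(col, v) for col in L]       # L^T v
        y = [dot(row, g) for row in M]       # M L^T v (coefficients of the projection)
        delta = vv - dot(g, y)               # squared norm of the residual of v
        if delta <= eps * vv:
            continue                         # redundant: v already lies in span(L)
        # Expand M with the block-inverse formula for one extra column.
        k = len(L)
        new_M = [[M[i][j] + y[i] * y[j] / delta for j in range(k)] + [-y[i] / delta]
                 for i in range(k)]
        new_M.append([-yj / delta for yj in y] + [1.0 / delta])
        L.append(list(v))
        M = new_M
    return L, M


def reconstruct(L, M, a):
    """Approximate one column a of A as L M L^T a."""
    g = [dot(col, a) for col in L]
    y = [dot(row, g) for row in M]
    return [sum(y[k] * L[k][i] for k in range(len(L))) for i in range(len(a))]


# Hypothetical toy data mirroring the coloured-column example:
# purple = 0.5 * green + red.
green, red, purple = [2, 0, 0], [0, 1, 0], [1, 1, 0]
C = [green, green, green, red, red, purple, purple]
L, M = colibri_s(C)
print(len(L))                      # number of columns kept out of the 7 sampled
print(reconstruct(L, M, purple))   # purple rebuilt from the kept columns
```

On this toy input only the first green and the first red column are kept, and purple is still reproduced from them. The design point is that each new column costs one projection against the current L and M, and M is never re-inverted from scratch.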

  19. Colibri-S vs. CUR(CX) (here ">=" reads as "at least as good as") • Quality: Colibri-S = CUR(CX) • Time: Colibri-S >= CUR(CX) • Space: Colibri-S >= CUR(CX) • [Figure: illustrations of CUR (CX) and Colibri-S]

  20. Colibri-D for dynamic graphs • At time t: $L_t$, $M_t$, $R_t$ from the initially sampled matrix • At time t+1: $L_{t+1}$, $M_{t+1}$, $R_{t+1}$ = ? • Q: How to update L and M efficiently?

  21. Colibri-D: How-To • At time t: the initially sampled matrix splits into selected and redundant columns, giving $L_t$, $M_t$, $R_t$ • At time t+1: some columns have changed from t; $L_{t+1}$, $M_{t+1}$, $R_{t+1}$ = ?

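  22. Colibri-D: How-To • From $L_t$, $M_t$: keep the selected columns that are unchanged at t+1 (blue in the figure); the subspace they span at t+1 is described by $\tilde{L}$, $\tilde{M}$ • From $\tilde{L}$, $\tilde{M}$ and the initially sampled matrix at t+1: obtain $L_{t+1}$, $M_{t+1}$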
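
One way to obtain the core matrix of the unchanged columns without re-inverting is the standard block-inverse (Schur complement) identity. Whether the talk uses exactly this step is an assumption; the transcript does not say. Here $L_c$ is a hypothetical name for the selected columns that changed, and the columns of $L_t$ are reordered so that the unchanged ones come first.

```latex
L_t = [\, \tilde{L} \;\; L_c \,], \qquad
M_t = (L_t^{T} L_t)^{-1} =
\begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix}
\quad\Longrightarrow\quad
\tilde{M} = (\tilde{L}^{T} \tilde{L})^{-1} = M_{11} - M_{12}\, M_{22}^{-1}\, M_{21} .
```

Only $M_{22}$ needs inverting, and its size equals the number of selected columns that changed, so the update is cheap when few columns change between t and t+1.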

  23. Roadmap • Motivation • Existing Methods • Colibri • Experimental Results • Conclusion

  24. Experimental Setup • Datasets • Network traffic • 21,837 sources/destinations • 1,222 consecutive hours • 22,800 edges per hour • Accuracy: • Accu = • Space Cost:

  25. Performance of Colibri-S • Accuracy: same, 91%+ • Time: 12x faster than CMD, 28x faster than CUR • Space: ~1/3 of CMD, ~10% of CUR • [Figure: time and space bar charts for CUR, CMD, and ours]

  26. More Evaluation on Colibri-S • [Figure: CUR, CMD, and Colibri-S compared; axes: log time (sec) and approximation accuracy]

  27. Performance of Colibri-D • [Figure: time vs. # of changed cols for CMD, Colibri-S, and Colibri-D] • Colibri-D achieves up to 112x speedups

  28. A Family of Low-Rank Approximation for Fast Graph Mining • Colibri-S • For static graphs • Removes redundancy • Significant savings in time & space for “free” • Colibri-D • For dynamic graphs • Explores “smoothness” • Up to 112x faster than the best known methods

  29. Poster tonight! Thank you! www.cs.cmu.edu/~htong
