Learn about efficient graph mining techniques using low-rank approximation for adjacency matrices, featuring the Colibri method for both static and dynamic graphs. Discover how to find patterns like communities and anomalies effectively with advanced algorithms. Speaker presentation by Hanghang Tong at KDD 2008.
Colibri: Fast Mining of Large Static and Dynamic Graphs Joint Work by Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip S. Yu, Christos Faloutsos Speaker: Hanghang Tong Aug. 24-27, 2008, Las Vegas KDD 2008
Graphs are everywhere! • Q: How to find patterns? • e.g., community, anomaly, etc.
Motivation • Q: How to find patterns? • e.g., community, anomaly, etc. • A: Low-Rank Approximation (LRA) for Adjacency Matrix of the Graph: $A \approx L \times M \times R$
LRA for Graph Mining: Example • Adj. matrix: $A \approx L \times M \times R$, with authors (John, Tom, Bob, Carl, Van, Roy) as rows and conferences (ICDM, KDD, ISMB, RECOMB) as columns • L: author clusters • M: interaction • R: conference clusters • Recon. error is high → ‘Carl’ is abnormal
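The anomaly test on this slide (a row whose reconstruction error is high) can be sketched as follows. This is a minimal illustration, not the slide's actual data: the 6 x 4 matrix `A` is a hypothetical toy example, the helper `row_errors` is an invented name, and the basis is simply two hand-picked columns of `A`, so that L is those columns, M = (L^T L)^{-1} and R = L^T A.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def row_errors(A, basis_idx):
    """Per-row reconstruction error after projecting every column of A
    onto the span of two chosen, linearly independent columns of A."""
    n_rows, n_cols = len(A), len(A[0])
    cols = [[A[i][j] for i in range(n_rows)] for j in range(n_cols)]
    b1, b2 = cols[basis_idx[0]], cols[basis_idx[1]]
    # Gram matrix L^T L of the two basis columns (2 x 2, inverted by hand)
    g11, g12, g22 = dot(b1, b1), dot(b1, b2), dot(b2, b2)
    det = g11 * g22 - g12 * g12
    sq_err = [0.0] * n_rows
    for c in cols:
        t1, t2 = dot(b1, c), dot(b2, c)        # L^T c
        w1 = (g22 * t1 - g12 * t2) / det       # (L^T L)^{-1} L^T c
        w2 = (g11 * t2 - g12 * t1) / det
        for i in range(n_rows):
            r = c[i] - (w1 * b1[i] + w2 * b2[i])
            sq_err[i] += r * r
    return [e ** 0.5 for e in sq_err]

# Hypothetical toy data: rows are authors, columns are ICDM, KDD, ISMB, RECOMB.
authors = ["John", "Tom", "Bob", "Carl", "Van", "Roy"]
A = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],   # publishes in both communities
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
errors = row_errors(A, basis_idx=(0, 3))
most_abnormal = authors[max(range(len(authors)), key=lambda i: errors[i])]
print(most_abnormal)
```

In this toy setting the row that straddles both clusters gets the largest error, which is the sense in which a high reconstruction error marks an author as abnormal.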
Challenges • How to get (L, M, R) + Efficiently (both time and space); + Intuitively (easy for interpretation); + Dynamically (track patterns over time)?
Roadmap • Motivation • Existing Methods (SVD, CUR/CX) • Proposed Methods: Colibri • Experimental Results • Conclusion
Matrix & Column Space • Matrix: B = [b1, b2], with entries 3, 1, 1, 1, 0, 0 • Column Space of a Matrix: the space spanned by b1 and b2, which are vectors in 3-d space!
Projection, Projection Matrix & Core Matrix • An arbitrary vector: $v$ • Core matrix: $(B^T B)^{+}$ • Projection matrix of B: $B (B^T B)^{+} B^T$ • Projection of v: $\tilde{v} = B (B^T B)^{+} B^T v$
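As a small worked sketch of the projection above: the code below projects a vector onto the column space of a two-column matrix. It assumes the two columns are linearly independent, so the pseudo-inverse reduces to an ordinary 2 x 2 inverse; the function name `project` and the example vectors are illustrative assumptions, not values confirmed by the slide.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def project(b1, b2, v):
    """Project v onto span{b1, b2}, assuming b1 and b2 are linearly independent."""
    # Gram matrix B^T B
    g11, g12, g22 = dot(b1, b1), dot(b1, b2), dot(b2, b2)
    det = g11 * g22 - g12 * g12
    # Core matrix (B^T B)^{-1}, written out for the 2 x 2 case
    core = [[g22 / det, -g12 / det],
            [-g12 / det, g11 / det]]
    # B^T v
    t = [dot(b1, v), dot(b2, v)]
    # Coefficients: core matrix times B^T v
    c1 = core[0][0] * t[0] + core[0][1] * t[1]
    c2 = core[1][0] * t[0] + core[1][1] * t[1]
    # B times the coefficients
    return [c1 * x1 + c2 * x2 for x1, x2 in zip(b1, b2)]

# Illustrative vectors (assumed): both columns lie in the x-y plane.
b1 = [3.0, 1.0, 0.0]
b2 = [1.0, 1.0, 0.0]
v = [2.0, 5.0, 7.0]
v_proj = project(b1, b2, v)
print(v_proj)
```

Because both example columns lie in the x-y plane, the projection keeps the first two coordinates of `v` and zeroes the third.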
Singular-Value-Decomposition (SVD) • A = [a1, a2, a3, …, am]: n x m • U = [u1, …, uk]: left singular vectors • V = [v1, …, vk]: right singular vectors • $A \approx U \Sigma V^T$
SVD: How to • #1: Find the left matrix U of left singular vectors • #2: Project A into the column space of U: $\tilde{A} = U U^T A$, where $U U^T$ is the projection matrix of the column space of U
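A short derivation may help connect step #2 to the projection slide. It assumes the standard SVD convention that the k columns of U are orthonormal, and it writes $\Sigma_k$ and $V_k$ for the top-k singular values and right singular vectors (notation introduced here for illustration):

```latex
% Projection onto the column space of U, using U^T U = I_k
\tilde{A} = U\,(U^T U)^{+}\,U^T A = U\,U^T A
% With the full SVD A = \sum_i \sigma_i u_i v_i^T, the projection keeps only the top-k terms
U\,U^T A = \sum_{i=1}^{k} \sigma_i\, u_i v_i^T = U\,\Sigma_k\,V_k^T
```

So for SVD the core matrix is the identity, and the projection reproduces the rank-k truncated decomposition.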
SVD: drawbacks • Efficiency • Time • Space: (U, V) are dense • Interpretation • Dynamic: not easy [Figure: A with its 1st and 2nd singular vectors]
CUR (CX) decomposition • A: n x m • Sample columns from A to form C • Project A onto the column space of C: $A \approx C \times U \times R$
CUR (CX): advantages • Efficiency (better than SVD) • Time (c is the # of sampled columns) • Space: (C, R) are sparse • Interpretation
CUR (CX): drawbacks • Redundancy in C, wasting both time and space • e.g., C holds 3 copies of green, 2 copies of red, 2 copies of purple, and purple = 0.5*green + red… • Dynamic: not easy
Roadmap • Motivation • Existing Methods • Colibri (Colibri-S for static graphs, Colibri-D for dynamic graphs) • Experimental Results • Conclusion
Colibri-S: Basic Idea • CUR (CX): Original Matrix ≈ C x U x R, where C holds 3 copies of green, 2 copies of red, 2 copies of purple, and purple = 0.5*green + red… • Colibri-S: Original Matrix ≈ L x M x R • We want the columns in L to be linearly independent of each other!
Colibri-S: Input & Output • Input: initially sampled matrix C • Output: L = the linearly independent columns of C; $M = (L^T L)^{-1}$ = core matrix; $R = L^T \times A$ • Q: How to find L & M from C efficiently?
A: Find L & M iteratively! • Start from the initially sampled matrix C and the current L & M • For each column v in C: project it onto L • Redundant? Discard v • Otherwise: expand L & M
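The loop on this slide can be sketched in plain Python as below. This is a minimal sketch under stated assumptions rather than the authors' implementation: the function name `colibri_s`, the tolerance `eps`, and the relative redundancy test are choices made here for illustration. The core matrix M = (L^T L)^{-1} is expanded with the standard block-inverse identity, so no matrix is inverted from scratch.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def colibri_s(columns, eps=1e-9):
    """Return (L, M): L is the list of kept columns, M = (L^T L)^{-1} as rows.
    `eps` is an assumed tolerance for deciding that a column is redundant."""
    L, M = [], []
    for v in columns:
        vv = dot(v, v)
        if vv <= eps:
            continue                          # all-zero column: nothing to add
        g = [dot(c, v) for c in L]            # L^T v
        y = [dot(row, g) for row in M]        # M L^T v
        delta = vv - dot(g, y)                # squared norm of the residual of v
        if delta <= eps * vv:
            continue                          # v lies in span(L): redundant, discard
        k = len(L)
        # Expand M via the block-inverse identity for the enlarged Gram matrix
        M_new = [[M[i][j] + y[i] * y[j] / delta for j in range(k)] + [-y[i] / delta]
                 for i in range(k)]
        M_new.append([-yi / delta for yi in y] + [1.0 / delta])
        L.append(list(v))
        M = M_new
    return L, M

# The redundancy example from the slides: 3 greens, 2 reds, 2 purples,
# with purple = 0.5*green + red (the concrete vectors are assumed).
green = [1.0, 0.0, 0.0]
red = [0.0, 1.0, 0.0]
purple = [0.5, 1.0, 0.0]
C = [green, green, red, purple, green, red, purple]
L, M = colibri_s(C)
print(len(L), "of", len(C), "sampled columns kept")
```

Each sampled column costs only a few inner products against the current L, and redundant columns never enter L or M, which is where the time and space savings over keeping all of C would come from.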
Colibri-S vs. CUR(CX) • Quality: Colibri-S = CUR(CX) • Time: Colibri-S >= CUR(CX) (i.e., at least as good) • Space: Colibri-S >= CUR(CX) (i.e., at least as good) [Illustrations: CUR (CX) vs. Colibri-S]
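The equal-quality claim can be motivated with one line of linear algebra. The sketch below assumes, as in the preceding slides, that L consists of linearly independent columns of C spanning the same column space as C, and that CUR (CX) approximates A by projecting it onto that column space:

```latex
% The orthogonal projector onto a subspace is unique, so if span(L) = span(C):
C\,(C^T C)^{+}\,C^T \;=\; L\,(L^T L)^{-1}\,L^T
% Hence both methods produce the same approximation of A:
\tilde{A}_{\mathrm{CX}} = C\,(C^T C)^{+}\,C^T A \;=\; L\,(L^T L)^{-1}\,L^T A = L\,M\,R
```

The difference is therefore only in cost: L has no more columns than C, and M needs an ordinary inverse rather than a pseudo-inverse.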
Colibri-D for dynamic graphs • At time t: initially sampled matrix, with $L_t$, $M_t$, $R_t$ • At time t+1: $L_{t+1}$, $M_{t+1}$, $R_{t+1}$ = ? • Q: How to update L and M efficiently?
Colibri-D: How-To • At time t: the initially sampled matrix splits into selected and redundant columns, giving $L_t$, $M_t$, $R_t$ • At time t+1: some sampled columns have changed from t • $L_{t+1}$, $M_{t+1}$, $R_{t+1}$ = ?
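Colibri-D: How-To • At time t: $L_t$, $M_t$ from the selected columns • Unchanged cols! The subspace spanned by the blue (unchanged) columns at t+1 gives $\tilde{L}$, $\tilde{M}$ • Starting from $\tilde{L}$, $\tilde{M}$ and the initially sampled matrix at t+1, obtain $L_{t+1}$, $M_{t+1}$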
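One possible way to realize this update is sketched below. It is an assumption-laden illustration, not the method as published: the names `drop_column`, `try_expand` and `colibri_d_step` are invented here, the caller is assumed to supply which selected columns changed and which columns still need testing at t+1, and the downdate of M uses the standard Schur-complement identity for removing one row and column from an inverse.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def drop_column(L, M, k):
    """Remove column k from L and downdate M = (L^T L)^{-1} without re-inverting."""
    keep = [i for i in range(len(L)) if i != k]
    M_new = [[M[i][j] - M[i][k] * M[k][j] / M[k][k] for j in keep] for i in keep]
    return [L[i] for i in keep], M_new

def try_expand(L, M, v, eps=1e-9):
    """Add v to (L, M) unless it is (numerically) in the span of L."""
    vv = dot(v, v)
    if vv <= eps:
        return L, M
    g = [dot(c, v) for c in L]                # L^T v
    y = [dot(row, g) for row in M]            # M L^T v
    delta = vv - dot(g, y)                    # squared residual norm
    if delta <= eps * vv:
        return L, M                           # redundant
    k = len(L)
    M_new = [[M[i][j] + y[i] * y[j] / delta for j in range(k)] + [-y[i] / delta]
             for i in range(k)]
    M_new.append([-yi / delta for yi in y] + [1.0 / delta])
    return L + [list(v)], M_new

def colibri_d_step(L_t, M_t, changed_idx, candidates, eps=1e-9):
    """changed_idx: positions in L_t whose columns changed at t+1 (assumed input).
    candidates: columns at t+1 that still need testing (assumed input)."""
    L, M = L_t, M_t
    for k in sorted(changed_idx, reverse=True):   # keep only the unchanged columns
        L, M = drop_column(L, M, k)
    for v in candidates:                          # then expand as in the static case
        L, M = try_expand(L, M, v, eps)
    return L, M

# Toy usage (hypothetical data): the second selected column changes at t+1.
L_t = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
M_t = [[1.0, 0.0], [0.0, 1.0]]
L_next, M_next = colibri_d_step(L_t, M_t, [1], [[0.0, 0.0, 2.0], [0.5, 1.0, 0.0]])
print(len(L_next))
```

When only a few columns change, most of L and M is reused as-is, so the work is roughly proportional to the number of changed or re-tested columns rather than to the full sampled set.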
Roadmap • Motivation • Existing Methods • Colibri • Experimental Results • Conclusion
Experimental Setup • Datasets • Network traffic • 21,837 sources/destinations • 1,222 consecutive hours • 22,800 edges per hour • Accuracy: • Accu = • Space Cost:
Performance of Colibri-S • Accuracy • Same 91%+ • Time • 12x faster than CMD • 28x faster than CUR • Space • ~1/3 of CMD • ~10% of CUR [Bar charts: Time and Space for CUR, CMD, and Ours]
More Evaluation on Colibri-S [Plot: Log Time (Sec) vs. Approximation Accuracy for CUR, CMD, and Colibri-S]
Performance of Colibri-D [Plot: Time vs. # of changed cols for CMD, Colibri-S, and Colibri-D] • Colibri-D achieves up to 112x speedups
A Family of Low-Rank Approximations for Fast Graph Mining • Colibri-S • For static graphs • Remove redundancy • Significant saving in time & space for “free” • Colibri-D • For dynamic graphs • Explores “smoothness” • Up to 112x faster than the best known methods
Poster tonight! Thank you! www.cs.cmu.edu/~htong