
Non-Negative Residual Matrix Factorization w/ Application to Graph Anomaly Detection

Hanghang Tong and Ching-Yung Lin. April 28-30, 2011.




Presentation Transcript


  1. Non-Negative Residual Matrix Factorization w/ Application to Graph Anomaly Detection Hanghang Tong and Ching-Yung Lin SIAM-DM 2011, Mesa AZ, USA, April 28-30, 2011

  2. Large Graphs are Everywhere!
     • Examples: Terrorist Network [Krebs 2002], Food Web [2007], Internet Map [Koren 2009], Social Network [Newman 2005], Protein Network [Salthe 2004], Web Graph
     • Q: How to find patterns?
       • e.g., community, anomaly, etc.

  3. Matrix Tool for Finding Graph Patterns
     • A Typical Procedure: Graph → Adj. Matrix A → A = F x G + R
       • F, G: Low-rank matrices
       • R: Residual matrix

  4. Matrix Tool for Finding Graph Patterns
     • A Typical Procedure: Graph → Adj. Matrix A → A = F x G + R
       • F, G: Low-rank matrices → community
       • R: Residual matrix → anomalies
     • An Illustrative Example
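The decomposition A = F x G + R above can be sketched in a few lines. This is a minimal illustration, assuming a truncated SVD as a stand-in for the low-rank step (the slides do not name a specific method here); the toy graph, the rank `r = 2`, and all variable names are hypothetical.

```python
import numpy as np

# Hypothetical 6-node undirected graph: two triangles joined by one bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Low-rank factors from a truncated SVD (assumed stand-in for any low-rank method).
U, s, Vt = np.linalg.svd(A)
r = 2
F = U[:, :r] * s[:r]   # n x r factor
G = Vt[:r, :]          # r x n factor
R = A - F @ G          # residual matrix, so that A = F @ G + R

# Score each edge by the magnitude of its residual; large scores are candidates
# for closer inspection as anomalies.
scores = {(i, j): abs(R[i, j]) for i, j in edges}
top_edge = max(scores, key=scores.get)
```

With an unconstrained factorization such as this one, entries of R can take either sign, which is the interpretability issue the non-negative residual constraint is meant to address.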

  5. Improve Interpretation by Non-negativity
     • A Typical Procedure: Graph → Adjacency Matrix A → A = F x G + R
     • An Example: Interpretation by Non-negativity
       • community: Non-negative Matrix Factorization, F >= 0; G >= 0 (for community detection)
       • anomalies: Non-negative Residual Matrix Factorization, R(i,j) >= 0 for A(i,j) > 0 (for anomaly detection) [This Paper]

  6. Anomaly Detection on Graphs
     • Social Networks
       • 'Popularity contest'
     • Computer Networks
       • Spammer, Port Scanner, Vulnerable Machines, etc.
     • Financial Transaction Networks
       • Fraudulent transactions (e.g., money-laundering ring), scammer
     • Criminal Networks
       • New criminal trend
     • Telecommunication Networks
       • Telemarketer
     • Key Observation: Abnormal Behavior → Actual Activities

  7. Optimization Formulation
     • General Case
       • Weighted Frobenius Form, with Weight (Common in Any Matrix Factorization)

  8. Optimization Formulation
     • General Case
       • Weighted Frobenius Form, with Weight (Common in Any Matrix Factorization)
       • Non-negative residual (Unique in This Paper)

  9. Optimization Formulation
     • 0/1 Weight Matrix (Major Focus of the Paper)
       • 0/1 weight (Common in Any Matrix Factorization)
       • Non-negative residual (Unique in This Paper)
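One way to write out the formulation these slides describe, consistent with A = F x G + R and with the constraint R(i,j) >= 0 for A(i,j) > 0 from slide 5, is sketched below. The weight symbol W and the specific 0/1 rule (W(i,j) = 1 on observed edges, 0 elsewhere) are assumptions made for illustration, not taken from the slides.

```latex
% General case: weighted Frobenius objective plus a non-negative residual constraint.
\begin{aligned}
\min_{F,\,G}\quad & \sum_{i,j} W(i,j)\,\bigl(A(i,j) - F(i,:)\,G(:,j)\bigr)^{2} \\
\text{s.t.}\quad  & R(i,j) = A(i,j) - F(i,:)\,G(:,j) \;\ge\; 0
                    \quad \text{for all } (i,j) \text{ with } A(i,j) > 0
\end{aligned}

% 0/1 weight case, ASSUMING W(i,j) = 1 if A(i,j) > 0 and W(i,j) = 0 otherwise.
\begin{aligned}
\min_{F,\,G}\quad & \sum_{(i,j)\,:\,A(i,j) > 0} \bigl(A(i,j) - F(i,:)\,G(:,j)\bigr)^{2} \\
\text{s.t.}\quad  & F(i,:)\,G(:,j) \;\le\; A(i,j)
                    \quad \text{for all } (i,j) \text{ with } A(i,j) > 0
\end{aligned}
```

The constraint in the second block is the same non-negative residual condition, rewritten in terms of F and G.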

  10. Optimization Formulation with 0/1 Weight Matrix
     • NrMF with 0/1 Weight Matrix
     • Q: How to find 'optimal' F and G?
       • D1: Quality → C1: non-convexity of opt. objective
       • D2: Scalability → C2: large size of the graph

  11. Optimization Method: Batch Mode
     • Basic Idea 1: Alternating
       • Not convex wrt F and G jointly, but convex if fixing either F or G
     • Basic Idea 2: Separation
       • The argmin over G, subject to its constraints, is solved separately for each j
       • Each is a Standard Quadratic Programming Problem
     • Overall Complexity: Polynomial → Can we do better?
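The separation idea can be made concrete as follows. This is a sketch under the assumption that the objective is a sum of squared residuals over observed edges (0/1 weights with W(i,j) = 1 where A(i,j) > 0); the slide itself only states that the problem separates into standard quadratic programs.

```latex
% With F held fixed, the objective and the constraints both decouple across the
% columns of G, so each column G(:,j) can be solved for independently.
\begin{aligned}
\text{for each } j:\quad
G(:,j) \;=\; \arg\min_{g}\quad & \sum_{i\,:\,A(i,j) > 0} \bigl(A(i,j) - F(i,:)\,g\bigr)^{2} \\
\text{s.t.}\quad & F(i,:)\,g \;\le\; A(i,j) \quad \text{for all } i \text{ with } A(i,j) > 0
\end{aligned}
```

The objective is quadratic in g and the constraints are linear in g, which is what makes each subproblem a quadratic program. The update for the rows of F with G held fixed is symmetric.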

  12. Optimization Method: Incremental Mode
     • Basic Idea 1: Recursive
     • Basic Idea 2: Alternating
     • Basic Idea 3: Separation
     • Procedure: Adjacency Matrix A → Initialize: R = A → Do r times: Rank-1 Approximation, then Update Residual Matrix R → Output Final Residual Matrix
     • Rank-1 Approximation: QP for a single variable w/ boundary constraints; can be solved in constant time
     • Overall Complexity: Linear wrt # of edges
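The incremental procedure can be sketched as below. This is a minimal dense NumPy illustration, not the authors' implementation: it assumes the objective sums squared residuals over observed edges only, and the function names, the number of sweeps, and the random initialization are all hypothetical. Because it loops over dense rows and columns, it does not achieve the linear-in-edges complexity stated on the slide; a sparse representation would be needed for that.

```python
import numpy as np

def clipped_scalar(r, g):
    """Minimise sum_j (r[j] - q * g[j])**2 over a scalar q, s.t. q * g[j] <= r[j].

    The constraints bound q to an interval [low, up]; the answer is the
    unconstrained least-squares optimum clipped to that interval.
    """
    denom = float(g @ g)
    if denom == 0.0:
        return 0.0                       # no informative entries: q = 0 is feasible
    q = float(r @ g) / denom             # unconstrained optimum
    pos, neg = g > 0, g < 0
    up = np.min(r[pos] / g[pos]) if pos.any() else np.inf
    low = np.max(r[neg] / g[neg]) if neg.any() else -np.inf
    return float(np.clip(q, low, up))

def nrmf_incremental(A, rank=2, sweeps=20, seed=0):
    """Peel off `rank` rank-1 terms while keeping R(i,j) >= 0 on observed edges."""
    A = np.asarray(A, dtype=float)
    mask = A > 0                         # constraints apply on observed edges only
    n, m = A.shape
    rng = np.random.default_rng(seed)
    R = A.copy()                         # Initialize: R = A
    F, G = np.zeros((n, rank)), np.zeros((rank, m))
    for k in range(rank):                # Do r times
        f, g = np.zeros(n), rng.random(m)
        for _ in range(sweeps):          # alternate between f and g
            for i in range(n):           # separation: one scalar problem per row
                cols = mask[i]
                f[i] = clipped_scalar(np.maximum(R[i, cols], 0.0), g[cols])
            for j in range(m):           # one scalar problem per column
                rows = mask[:, j]
                g[j] = clipped_scalar(np.maximum(R[rows, j], 0.0), f[rows])
        F[:, k], G[k] = f, g
        R -= np.outer(f, g)              # Update Residual Matrix R
    return F, G, R

# Usage on a hypothetical toy graph: two triangles joined by one bridge edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
F, G, R = nrmf_incremental(A, rank=2)
```

Each scalar update has a closed form because a one-variable quadratic with interval bounds is minimized by clipping its unconstrained optimum, which is one way to read the slide's remark that the single-variable QP takes constant time.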

  13. Experimental Evaluation
     • Effectiveness: [Figure: Accuracy vs. Anomaly Type]
     • Efficiency: [Figure: Wall-clock Time vs. # of edges]

  14. Batch Method vs. Incremental Method
     • [Figure: Log Wall-clock time (sec.) per Data Set, Batch Method vs. Incremental Method]

  15. Conclusion
     • Problem Formulation: Non-negative Residual Matrix Factorization
       • a new matrix factorization for interpretable graph anomaly detection
     • Optimization Methods
       • Batch: straightforward, polynomial time complexity
       • Incremental: linear time complexity
     • Future Work
       • Other interpretable properties (sparseness) for anomaly detection
       • Matrix Factorization w/ Total Non-negativity

  16. Thank you! htong@us.ibm.com (We are hiring at IBM Research!)

  17. Visual Comparison

  18. [Figure: the single variable q shown relative to its lower and upper bounds (low, up)]
