Anomaly Detection and Virus Propagation in Large Graphs

Presentation Transcript


  1. Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU

  2. Thank you! • Dr. Ching-Hao (Eric) Mao • Prof. Kenneth Pao Faloutsos, Prakash, Chau, Koutra, Akoglu

  3. Outline • Part 1: anomaly detection • OddBall (anomaly detection) • Belief Propagation • Conclusions • Part 2: influence propagation

  4. OddBall: Spotting Anomalies in Weighted Graphs. Leman Akoglu, Mary McGlohon, Christos Faloutsos, Carnegie Mellon University School of Computer Science. PAKDD 2010, Hyderabad, India

  5. Main idea: for each node, • extract its ‘egonet’ (the node plus its 1-step-away neighbors, and the edges among them) • extract features (#edges, total weight, etc.) • compare with the rest of the population

  6. What is an egonet? [Figure: a node (the ‘ego’) and its egonet]

  7. Selected Features • Ni: number of neighbors (degree) of ego i • Ei: number of edges in egonet i • Wi: total weight of egonet i • λw,i: principal eigenvalue of the weighted adjacency matrix of egonet i
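The four features above can be computed directly from an edge list. The sketch below is a minimal illustration, not the OddBall authors' code. It assumes an undirected weighted graph given as hypothetical `(u, v, weight)` tuples with no self-loops or duplicate edges, and it estimates λw,i by power iteration.

```python
import math
from collections import defaultdict


def build_adjacency(edge_list):
    """Undirected weighted adjacency: adj[u][v] = weight (assumes no self-loops)."""
    adj = defaultdict(dict)
    for u, v, w in edge_list:
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    return adj


def egonet_features(adj, ego, iters=100):
    """Return (N, E, W, lam) for the egonet of `ego`.

    N: degree of the ego; E: number of edges in the egonet;
    W: total edge weight in the egonet;
    lam: principal eigenvalue of the egonet's weighted adjacency matrix.
    """
    nodes = {ego} | set(adj[ego])
    # Collect every undirected edge among egonet nodes exactly once (u < v).
    sub_edges = {}
    for u in nodes:
        for v, w in adj[u].items():
            if v in nodes and u < v:
                sub_edges[(u, v)] = w
    n_feat = len(adj[ego])
    e_feat = len(sub_edges)
    w_feat = sum(sub_edges.values())
    # Power iteration: for a non-negative symmetric matrix, ||Ax|| / ||x||
    # converges to the largest eigenvalue.
    x = {u: 1.0 for u in nodes}
    lam = 0.0
    for _ in range(iters):
        y = {u: sum(w * x[v] for v, w in adj[u].items() if v in nodes) for u in nodes}
        norm_y = math.sqrt(sum(val * val for val in y.values()))
        if norm_y == 0.0:  # isolated ego: no edges, eigenvalue 0
            return n_feat, e_feat, w_feat, 0.0
        norm_x = math.sqrt(sum(val * val for val in x.values()))
        lam = norm_y / norm_x
        x = {u: val / norm_y for u, val in y.items()}
    return n_feat, e_feat, w_feat, lam
```

For example, any node of a unit-weight triangle gives (N, E, W, λ) = (2, 3, 3.0, ≈2.0). Power iteration is enough here because only the largest eigenvalue is needed, and each pass only touches the egonet's edges.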

  8. Near-Clique/Star

  9. Near-Clique/Star

  10. Near-Clique/Star

  11. Near-Clique/Star Andrew Lewis (director)
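The near-clique/star slides contrast egonets that are unusually dense or unusually sparse for their size. A star egonet has E = N edges, while a clique egonet has E = N(N+1)/2. One way to score this, following the OddBall paper's egonet density power law (E ∝ N^θ) and its distance-to-fit score as recalled here (neither is spelled out on these slides, so treat the exact formula as an assumption), is to fit log E against log N by least squares and then measure how far each node lies from the fitted line.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = log(C) + theta * log(x); returns (C, theta).

    Needs at least two distinct x values and all x, y > 0.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    theta = sxy / sxx
    return math.exp(my - theta * mx), theta


def outline_score(x, y, c, theta):
    """Distance of the point (x, y) from the fitted curve y = c * x**theta.

    The ratio term penalizes being far off in relative terms; the log term
    penalizes being far off in absolute terms. On-trend points score ~0.
    """
    pred = c * x ** theta
    return (max(y, pred) / min(y, pred)) * math.log(abs(y - pred) + 1.0)
```

For points lying exactly on E = N^1.5, both a 16-neighbor star (E = 16) and a 16-neighbor clique (E = 136) score well above zero, while an on-trend node (E = 64) scores about 0.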

  12. Outline • Part 1: anomaly detection • OddBall (anomaly detection) • Belief Propagation • Conclusions • Part 2: influence propagation

  13. eBay Fraud Detection, w/ Polo Chau & Shashank Pandit, CMU [WWW’07]

  14. eBay Fraud Detection

  15. eBay Fraud Detection

  16. eBay Fraud Detection: NetProbe

  17. Popular press, and less desirable attention: • E-mail from ‘Belgium police’ (‘copy of your code?’)

  18. Outline • OddBall (anomaly detection) • Belief Propagation • eBay fraud • Symantec malware detection • Unification results • Conclusions

  19. Polonium: Tera-Scale Graph Mining and Inference for Malware Detection. Polo Chau (Machine Learning Dept), Carey Nachenberg (Vice President & Fellow), Jeffrey Wilhelm (Principal Software Engineer), Adam Wright (Software Engineer), Prof. Christos Faloutsos (Computer Science Dept). SDM 2011, Mesa, Arizona. PATENT PENDING

  20. Polonium: The Data • 60+ terabytes of data anonymously contributed by participants of the worldwide Norton Community Watch program • 50+ million machines • 900+ million executable files • Constructed a machine-file bipartite graph (0.2 TB+): 1 billion nodes (machines and files), 37 billion edges

  21. Polonium: Key Ideas • Use Belief Propagation to propagate domain knowledge in the machine-file graph to detect malware • Use “guilt-by-association” (i.e., homophily) • E.g., files that appear on machines with many bad files are more likely to be bad • Scalability: handles a 37-billion-edge graph
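The guilt-by-association idea can be illustrated with a small loopy belief propagation routine on a machine-file graph. This is a generic two-state sum-product sketch, not Polonium's implementation: the node names, the uniform 0.5 prior for unlabeled nodes, and the symmetric edge potential controlled by a hypothetical `eps` parameter are all assumptions.

```python
from collections import defaultdict


def belief_propagation(edges, priors, eps=0.1, iters=10):
    """Loopy sum-product BP with two states: 0 = good, 1 = bad.

    edges: undirected (u, v) pairs, e.g. machine-file links (no duplicates).
    priors: node -> prior P(bad); nodes missing from priors get 0.5 (unknown).
    eps: homophily strength (0 < eps < 0.5); neighbors agree w.p. 0.5 + eps.
    Returns node -> final belief P(bad).
    """
    nbrs = defaultdict(list)
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    phi = {n: (1.0 - priors.get(n, 0.5), priors.get(n, 0.5)) for n in nbrs}
    psi = [[0.5 + eps, 0.5 - eps], [0.5 - eps, 0.5 + eps]]  # edge potential
    # msgs[(u, v)]: message from u to v, a distribution over v's state.
    msgs = {(u, v): (0.5, 0.5) for u in nbrs for v in nbrs[u]}
    for _ in range(iters):
        new = {}
        for u, v in msgs:
            out = []
            for sv in (0, 1):
                total = 0.0
                for su in (0, 1):
                    prod = phi[u][su] * psi[su][sv]
                    for k in nbrs[u]:  # all incoming messages except v's
                        if k != v:
                            prod *= msgs[(k, u)][su]
                    total += prod
                out.append(total)
            z = out[0] + out[1]
            new[(u, v)] = (out[0] / z, out[1] / z)
        msgs = new
    beliefs = {}
    for n in nbrs:
        b = [phi[n][0], phi[n][1]]
        for k in nbrs[n]:
            b = [b[s] * msgs[(k, n)][s] for s in (0, 1)]
        beliefs[n] = b[1] / (b[0] + b[1])
    return beliefs
```

With edges `("m1", "f_bad"), ("m1", "u1"), ("m2", "f_good"), ("m2", "u2")` and priors 0.99 for `f_bad` and 0.01 for `f_good`, the unlabeled file `u1` ends up above 0.5 and `u2` below 0.5. This naive version recomputes each message from scratch, costing O(degree²) per node per iteration; a scalable implementation would avoid that, e.g. by dividing a cached per-node product by the incoming message.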

  22. Polonium: One-Interaction Results [Figure: True Positive Rate vs. False Positive Rate, with the ideal point marked] • 84.9% True Positive Rate at 1% False Positive Rate • True Positive Rate: % of malware correctly identified • False Positive Rate: % of non-malware wrongly labeled as malware
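Both rates on the slide follow from counts at a chosen score threshold. A minimal sketch, assuming each file has a hypothetical numeric score (higher means more likely malware) and a ground-truth label, with at least one malware and one non-malware file; sweeping the threshold traces a curve like the one on the slide.

```python
def rates_at_threshold(scores, is_malware, threshold):
    """Flag a file as malware when score >= threshold; return (TPR, FPR)."""
    tp = fp = positives = negatives = 0
    for score, bad in zip(scores, is_malware):
        flagged = score >= threshold
        if bad:
            positives += 1
            tp += flagged  # malware correctly identified
        else:
            negatives += 1
            fp += flagged  # non-malware wrongly labeled as malware
    return tp / positives, fp / negatives


scores = [0.9, 0.8, 0.3, 0.7, 0.1]
labels = [True, True, True, False, False]
print(rates_at_threshold(scores, labels, 0.75))  # (0.6666666666666666, 0.0)
```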

  23. Outline • Part 1: anomaly detection • OddBall (anomaly detection) • Belief Propagation • eBay fraud • Symantec malware detection • Unification results • Conclusions • Part 2: influence propagation

  24. Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms. Danai Koutra, U Kang, Hsing-Kuo Kenneth Pao, Tai-You Ke, Duen Horng (Polo) Chau, Christos Faloutsos. ECML PKDD, 5-9 September 2011, Athens, Greece

  25. Problem Definition: GBA (guilt-by-association) techniques • Given: a graph and a few labeled nodes • Find: the labels of the rest (assuming network effects) [Figure: graph with unlabeled nodes marked ‘?’]

  26. Homophily and Heterophily [Figure: homophily vs. heterophily] • Step 1: all methods handle homophily • Step 2: NOT all methods handle heterophily, BUT the proposed method does!

  27. Are they related? • RWR (Random Walk with Restarts) • Google’s PageRank (‘if my friends are important, I’m important, too’) • SSL (Semi-supervised learning) • minimize the differences among neighbors • BP (Belief propagation) • send messages to neighbors, on what you believe about them

  28. Are they related? YES! • RWR (Random Walk with Restarts) • Google’s PageRank (‘if my friends are important, I’m important, too’) • SSL (Semi-supervised learning) • minimize the differences among neighbors • BP (Belief propagation) • send messages to neighbors, on what you believe about them
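To make the comparison concrete, here are minimal sketches of the first two methods on an undirected edge list. They are textbook formulations chosen for illustration, not code from the paper. RWR is computed by power iteration with an assumed continue probability `c` (restart probability 1 - c). SSL minimizes Σ_i (x_i - y_i)² + α Σ_(i,j)∈E (x_i - x_j)², whose optimum solves (I + α(D - A)) x = y; it is solved here by Jacobi iteration, which converges because that matrix is strictly diagonally dominant.

```python
from collections import defaultdict


def neighbors(edges):
    """Adjacency lists for an undirected edge list."""
    nbrs = defaultdict(list)
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    return nbrs


def rwr(edges, seed, c=0.85, iters=100):
    """Random walk with restarts: steady-state visit probabilities from `seed`.

    Each step the walker follows a random incident edge with probability c,
    or jumps back to `seed` with probability 1 - c.
    """
    nbrs = neighbors(edges)
    r = {n: (1.0 if n == seed else 0.0) for n in nbrs}
    for _ in range(iters):
        nxt = {n: (1.0 - c if n == seed else 0.0) for n in nbrs}
        for u in nbrs:
            share = c * r[u] / len(nbrs[u])
            for v in nbrs[u]:
                nxt[v] += share
        r = nxt
    return r


def ssl(edges, labels, alpha=1.0, iters=100):
    """Semi-supervised learning: solve (I + alpha*(D - A)) x = y by Jacobi.

    labels: node -> y_i (e.g. +1 / -1); unlabeled nodes get y_i = 0.
    """
    nbrs = neighbors(edges)
    y = {n: labels.get(n, 0.0) for n in nbrs}
    x = dict(y)
    for _ in range(iters):
        x = {n: (y[n] + alpha * sum(x[m] for m in nbrs[n])) / (1.0 + alpha * len(nbrs[n]))
             for n in nbrs}
    return x
```

On the path a-b-c, RWR from `a` ranks the far end `c` lowest, and SSL with labels +1 on `a` and -1 on `c` puts `b` at exactly 0 by symmetry. Both updates touch each edge a constant number of times per sweep.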

  29. Correspondence of Methods [Figure: a linear system linking the prior labels/beliefs to the unknown final labels/beliefs (‘?’) through the identity matrix, the diagonal degree matrix (d1, d2, d3), and the adjacency matrix of a 3-node example graph]
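The slide's point is that these guilt-by-association methods reduce to a linear system: (a matrix built from I, D and A) × final beliefs = prior beliefs. The sketch below uses the FaBP system as recalled from the ECML PKDD 2011 paper, (I + aD - c′A) b_h = φ_h with a = 4h_h²/(1 - 4h_h²) and c′ = 2h_h/(1 - 4h_h²), where b_h and φ_h are beliefs minus 0.5 and h_h is an "about-half" homophily factor. Treat these constants as an assumption to verify against the paper. A negative h_h models heterophily. The system is solved by Jacobi iteration, so each sweep costs time linear in the number of edges.

```python
from collections import defaultdict


def fabp(edges, prior_half, h=0.1, iters=50):
    """Sketch of linearized BP in the assumed FaBP form (see lead-in).

    prior_half: node -> phi_h, prior belief minus 0.5 (e.g. +0.001 = leans 'bad');
                unlabeled nodes default to 0.
    h: about-half homophily factor, -0.5 < h < 0.5; h < 0 models heterophily.
    Jacobi converges when |c| * deg < 1 + a * deg for every node
    (strict diagonal dominance).
    Returns node -> b_h (final belief minus 0.5).
    """
    if not -0.5 < h < 0.5:
        raise ValueError("h must lie in (-0.5, 0.5)")
    nbrs = defaultdict(list)
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    a = 4 * h * h / (1 - 4 * h * h)
    c = 2 * h / (1 - 4 * h * h)
    phi = {n: prior_half.get(n, 0.0) for n in nbrs}
    b = dict(phi)
    for _ in range(iters):
        # One Jacobi sweep for (I + aD - cA) b = phi: every edge is read twice.
        b = {n: (phi[n] + c * sum(b[m] for m in nbrs[n])) / (1 + a * len(nbrs[n]))
             for n in nbrs}
    return b
```

With a single ‘bad’ prior at one end of the path x-y-z, homophily (h = 0.1) makes y and z lean ‘bad’ as well, while heterophily (h = -0.1) flips the sign at y and flips it back at z.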

  30. Results: Scalability [Plot: runtime (min) vs. # of edges (Kronecker graphs)] FaBP is linear in the number of edges.

  31. Results (5): Parallelism [Plot: % accuracy vs. runtime (min)] FaBP is ~2x faster & wins/ties on accuracy.

  32. Conclusions • Anomaly detection: hand-in-hand with pattern discovery (‘anomalies’ == ‘rare patterns’) • ‘OddBall’ for large graphs • ‘NetProbe’ and belief propagation: exploit network effects • FaBP: fast & accurate

  33. Outline • Part 1: anomaly detection • OddBall (anomaly detection) • Belief Propagation • Conclusions • Part 2: influence propagation

  34. Influence propagation in large graphs: theorems and algorithms. B. Aditya Prakash (http://www.cs.cmu.edu/~badityap), Christos Faloutsos (http://www.cs.cmu.edu/~christos), Carnegie Mellon University

  35. Networks are everywhere! • Facebook Network [2010] • Gene Regulatory Network [Decourty 2008] • Human Disease Network [Barabasi 2007] • The Internet [2005]

  36. Dynamical Processes over networks are also everywhere!

  37. Why do we care? • Information Diffusion • Viral Marketing • Epidemiology and Public Health • Cyber Security • Human mobility • Games and Virtual Worlds • Ecology • Social Collaboration • ...

  38. Why do we care? (1: Epidemiology) • Dynamical Processes over networks: diseases over contact networks • CDC data: Visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts [AJPH 2007]

  39. Why do we care? (1: Epidemiology) • Dynamical Processes over networks • Each circle is a hospital • ~3000 hospitals • More than 30,000 patients transferred [US-MEDICARE NETWORK 2005] • Problem: Given k units of disinfectant, whom to immunize?

  40. Why do we care? (1: Epidemiology) [Figure: CURRENT PRACTICE vs. OUR METHOD, US-MEDICARE NETWORK 2005] ~6x fewer! • Hospital-acquired infections took 99K+ lives and cost $5B+ (all per year)

  41. Why do we care? (2: Online Diffusion) • > 800m users, ~$1B revenue [WSJ 2010] • ~100m active users • > 50m users

  42. Why do we care? (2: Online Diffusion) • Dynamical Processes over networks [Figure: Celebrity → Followers: ‘Buy Versace™!’] • Social Media Marketing

  43. High Impact: Multiple Settings • epidemic out-breaks • products/viruses • transmitting s/w patches • Q. How to squash rumors faster? • Q. How do opinions spread? • Q. How to market better?

  44. Research Theme • DATA: large real-world networks & processes • ANALYSIS: understanding • POLICY/ACTION: managing

  45. In this talk: given propagation models, • Q1: Will an epidemic happen? (ANALYSIS: understanding)

  46. In this talk: • Q2: How to immunize and control out-breaks better? (POLICY/ACTION: managing)

  47. Outline • Part 1: anomaly detection • Part 2: influence propagation • Motivation • Epidemics: what happens? (Theory) • Action: Whom to immunize? (Algorithms)

  48. A fundamental question: Strong virus. Epidemic?

  49. Example (static graph): Weak virus. Epidemic?

  50. Problem Statement [Plot: # infected vs. time; above (epidemic) vs. below (extinction)] Separate the regimes? Find a condition under which • the virus will die out exponentially quickly • regardless of the initial infection condition
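The two regimes can be explored empirically with a simulation. The sketch below assumes a discrete-time SIS (susceptible-infected-susceptible) model, which these slides have not yet introduced: `beta` is a hypothetical per-edge, per-step infection probability and `delta` a per-step cure probability. It only illustrates the question; it does not compute the separating condition the slide asks for.

```python
import random
from collections import defaultdict


def simulate_sis(edges, beta, delta, initially_infected, steps=50, seed=0):
    """Discrete-time SIS simulation; returns the number infected at each step.

    Each step, every infected node (i) stays infected with probability
    1 - delta, and (ii) infects each currently susceptible neighbor with
    probability beta. Nodes infected at the start of a step are not
    re-infected within that step.
    """
    rng = random.Random(seed)
    nbrs = defaultdict(list)
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    infected = set(initially_infected)
    history = [len(infected)]
    for _ in range(steps):
        nxt = set()
        for u in sorted(infected):  # sorted: reproducible order of rng calls
            if rng.random() >= delta:  # not cured this step
                nxt.add(u)
            for v in nbrs[u]:
                if v not in infected and rng.random() < beta:
                    nxt.add(v)
        infected = nxt
        history.append(len(infected))
    return history
```

For example, on a 10-node complete graph with beta = 0.5 and delta = 0.1 the infection typically persists, whereas with beta = 0 and delta = 1 it dies out after one step.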
