large graph mining patterns tools and cascade analysis n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Large Graph Mining – Patterns, Tools and Cascade analysis PowerPoint Presentation
Download Presentation
Large Graph Mining – Patterns, Tools and Cascade analysis

Loading in 2 Seconds...

play fullscreen
1 / 118

Large Graph Mining – Patterns, Tools and Cascade analysis - PowerPoint PPT Presentation


  • 112 Views
  • Uploaded on

Large Graph Mining – Patterns, Tools and Cascade analysis. Christos Faloutsos CMU. Thank you!. Michael Katehakis Hui Xiong Spiros Papadimitriou Luz Kosar. Roadmap. Introduction – Motivation Why ‘big data’ Why (big) graphs? Problem#1: Patterns in graphs Problem#2: Tools

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

Large Graph Mining – Patterns, Tools and Cascade analysis


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
    Presentation Transcript
    1. Large Graph Mining –Patterns, Tools and Cascade analysis Christos Faloutsos CMU

    2. Thank you! • Michael Katehakis • HuiXiong • Spiros Papadimitriou • Luz Kosar C. Faloutsos (CMU)

    3. Roadmap • Introduction – Motivation • Why ‘big data’ • Why (big) graphs? • Problem#1: Patterns in graphs • Problem#2: Tools • Problem#3: Scalability • Conclusions C. Faloutsos (CMU)

    4. Why ‘big data’ • Why? • What is the problem definition? C. Faloutsos (CMU)

    5. Main message:Big data: often > experts • ‘Super Crunchers’ Why Thinking-By-Numbers is the New Way To Be Smart by Ian Ayres, 2008 • Google won the machine translation competition 2005 • http://www.itl.nist.gov/iad/mig//tests/mt/2005/doc/mt05eval_official_results_release_20050801_v3.html C. Faloutsos (CMU)

    6. Problem definition – big picture Tera/Peta-byte data Analytics Insights, outliers C. Faloutsos (CMU)

    7. Problem definition – big picture Tera/Peta-byte data Analytics Insights, outliers Main emphasis in this talk C. Faloutsos (CMU)

    8. Roadmap • Introduction – Motivation • Why ‘big data’ • Why (big) graphs? • Problem#1: Patterns in graphs • Problem#2: Tools • Problem#3: Scalability • Conclusions C. Faloutsos (CMU)

    9. Graphs - why should we care? Food Web [Martinez ’91] >$10B revenue >0.5B users Internet Map [lumeta.com] C. Faloutsos (CMU)

    10. T1 D1 ... ... DN TM Graphs - why should we care? • IR: bi-partite graphs (doc-terms) • web: hyper-text graph • ... and more: C. Faloutsos (CMU)

    11. Graphs - why should we care? • web-log (‘blog’) news propagation • computer network security: email/IP traffic and anomaly detection • ‘viral’ marketing • Supplier-supply business chains (-> instabilities) • .... • Subject-verb-object -> graph • Many-to-many db relationship -> graph C. Faloutsos (CMU)

    12. Outline • Introduction – Motivation • Problem#1: Patterns in graphs • Static graphs • Weighted graphs • Time evolving graphs • Problem#2: Tools • Problem#3: Scalability • Conclusions C. Faloutsos (CMU)

    13. Problem #1 - network and graph mining • What does the Internet look like? • What does FaceBook look like? • What is ‘normal’/‘abnormal’? • which patterns/laws hold? C. Faloutsos (CMU)

    14. Problem #1 - network and graph mining • What does the Internet look like? • What does FaceBook look like? • What is ‘normal’/‘abnormal’? • which patterns/laws hold? • To spot anomalies (rarities), we have to discover patterns C. Faloutsos (CMU)

    15. Problem #1 - network and graph mining • What does the Internet look like? • What does FaceBook look like? • What is ‘normal’/‘abnormal’? • which patterns/laws hold? • To spot anomalies (rarities), we have to discover patterns • Large datasets reveal patterns/anomalies that may be invisible otherwise… C. Faloutsos (CMU)

    16. Graph mining • Are real graphs random? C. Faloutsos (CMU)

    17. Laws and patterns • Are real graphs random? • A: NO!! • Diameter • in- and out- degree distributions • other (surprising) patterns • So, let’s look at the data C. Faloutsos (CMU)

    18. Solution# S.1 • Power law in the degree distribution [SIGCOMM99] internet domains att.com log(degree) ibm.com log(rank) C. Faloutsos (CMU)

    19. -0.82 Solution# S.1 • Power law in the degree distribution [SIGCOMM99] internet domains att.com log(degree) ibm.com log(rank) C. Faloutsos (CMU)

    20. Solution# S.2: Eigen Exponent E • A2: power law in the eigenvalues of the adjacency matrix Eigenvalue Exponent = slope E = -0.48 May 2001 Rank of decreasing eigenvalue C. Faloutsos (CMU)

    21. Many more power laws • # of sexual contacts • Income [Pareto] –’80-20 distribution’ • Duration of downloads [Bestavros+] • Duration of UNIX jobs (‘mice and elephants’) • Size of files of a user • … • ‘Black swans’ C. Faloutsos (CMU)

    22. Roadmap • Introduction – Motivation • Problem#1: Patterns in graphs • Static graphs • degree, diameter, eigen, • triangles • cliques • Weighted graphs • Time evolving graphs • Problem#2: Tools C. Faloutsos (CMU)

    23. Solution# S.3: Triangle ‘Laws’ Real social networks have a lot of triangles C. Faloutsos (CMU)

    24. Solution# S.3: Triangle ‘Laws’ Real social networks have a lot of triangles Friends of friends are friends Any patterns? C. Faloutsos (CMU)

    25. Triangle Law: #S.3 [Tsourakakis ICDM 2008] Reuters SN X-axis: degree Y-axis: mean # triangles n friends -> ~n1.6 triangles Epinions C. Faloutsos (CMU)

    26. Triangle Law: Computations [Tsourakakis ICDM 2008] details But: triangles are expensive to compute (3-way join; several approx. algos) – O(dmax2) Q: Can we do that quickly? A: C. Faloutsos (CMU)

    27. Triangle Law: Computations [Tsourakakis ICDM 2008] details But: triangles are expensive to compute (3-way join; several approx. algos) – O(dmax2) Q: Can we do that quickly? A: Yes! #triangles = 1/6 Sum ( li3 ) (and, because of skewness (S2) , we only need the top few eigenvalues! - O(E) C. Faloutsos (CMU)

    28. Triangle Law: Computations [Tsourakakis ICDM 2008] details 1000x+ speed-up, >90% accuracy C. Faloutsos (CMU)

    29. Triangle counting for large graphs? ? ? ? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] C. Faloutsos (CMU) 29

    30. Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] C. Faloutsos (CMU) 30

    31. Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] C. Faloutsos (CMU) 31

    32. Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] C. Faloutsos (CMU) 32

    33. Roadmap • Introduction – Motivation • Problem#1: Patterns in graphs • Static graphs • Weighted graphs • Time evolving graphs • Problem#2: Tools • … C. Faloutsos (CMU)

    34. Problem: Time evolution • with Jure Leskovec (CMU -> Stanford) • and Jon Kleinberg (Cornell – sabb. @ CMU) C. Faloutsos (CMU)

    35. T.1 Evolution of the Diameter • Prior work on Power Law graphs hints atslowly growing diameter: • diameter ~ O(log N) • diameter ~ O(log log N) • What is happening in real data? C. Faloutsos (CMU)

    36. T.1 Evolution of the Diameter • Prior work on Power Law graphs hints atslowly growing diameter: • diameter ~ O(log N) • diameter ~ O(log log N) • What is happening in real data? • Diameter shrinks over time C. Faloutsos (CMU)

    37. T.1 Diameter – “Patents” • Patent citation network • 25 years of data • @1999 • 2.9 M nodes • 16.5 M edges diameter time [years] C. Faloutsos (CMU)

    38. T.2 Temporal Evolution of the Graphs • N(t) … nodes at time t • E(t) … edges at time t • Suppose that N(t+1) = 2 * N(t) • Q: what is your guess for E(t+1) =? 2 * E(t) C. Faloutsos (CMU)

    39. T.2 Temporal Evolution of the Graphs • N(t) … nodes at time t • E(t) … edges at time t • Suppose that N(t+1) = 2 * N(t) • Q: what is your guess for E(t+1) =? 2 * E(t) • A: over-doubled! • But obeying the ``Densification Power Law’’ C. Faloutsos (CMU)

    40. T.2 Densification – Patent Citations • Citations among patents granted • @1999 • 2.9 M nodes • 16.5 M edges • Each year is a datapoint E(t) 1.66 N(t) C. Faloutsos (CMU)

    41. Roadmap • Introduction – Motivation • Problem#1: Patterns in graphs • Static graphs • Weighted graphs • Time evolving graphs • Problem#2: Tools • … C. Faloutsos (CMU)

    42. T.3 : popularity over time # in links lag: days after post 3 2 1 Post popularity drops-off – exponentially? @t @t + lag C. Faloutsos (CMU)

    43. T.3 : popularity over time # in links (log) days after post (log) Post popularity drops-off – exponentially? POWER LAW! Exponent? C. Faloutsos (CMU)

    44. T.3 : popularity over time # in links (log) -1.6 days after post (log) • Post popularity drops-off – exponentially? • POWER LAW! • Exponent? -1.6 • close to -1.5: Barabasi’s stack model • and like the zero-crossings of a random walk C. Faloutsos (CMU)

    45. -1.5 slope J. G. Oliveira & A.-L. Barabási Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature437, 1251 (2005) . [PDF] Prob(RT > x) (log) Response time (log) C. Faloutsos (CMU)

    46. -1.5 slope J. G. Oliveira & A.-L. Barabási Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature437, 1251 (2005) . [PDF] C. Faloutsos (CMU)

    47. Roadmap • Introduction – Motivation • Problem#1: Patterns in graphs • Problem#2: Tools • Belief Propagation • Tensors • Spike analysis • Problem#3: Scalability • Conclusions C. Faloutsos (CMU)

    48. E-bay Fraud detection w/ Polo Chau & Shashank Pandit, CMU [www’07] C. Faloutsos (CMU)

    49. E-bay Fraud detection C. Faloutsos (CMU)

    50. E-bay Fraud detection C. Faloutsos (CMU)