1 / 41

VoG : Summarizing and Understanding Large Graphs

VoG : Summarizing and Understanding Large Graphs. Danai Koutra Jilles Vreeken U Kang Christos Faloutsos. SDM, 24-26 April 2014, Philadelphia, USA. Problem Definition: Graph Summarization. Given : a graph. a succinct summary with possibly overlapping subgraphs. Find : . ≈

bedros
Download Presentation

VoG : Summarizing and Understanding Large Graphs

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. VoG: Summarizing and Understanding Large Graphs Danai Koutra JillesVreeken U Kang Christos Faloutsos SDM, 24-26 April 2014, Philadelphia, USA

  2. Problem Definition:Graph Summarization Given: a graph a succinct summary with possibly overlappingsubgraphs Find: ≈ important graph structures. Danai Koutra – SDM’14

  3. Why graph summarization? Visualization Guiding attention Danai Koutra – SDM’14

  4. Why graph summarization? Graph Understanding Danai Koutra – SDM’14

  5. Application: Wikipediacontroversy I don’t see anything!  Nodes: wiki editors Edges: co-edited Danai Koutra – SDM’14

  6. Application: Wikipediacontroversy Stars: admins, bots, heavy users Bipartite cores: edit wars vandals Kiev vs. Kyiv Nodes: wiki editors Edges: co-edited Danai Koutra – SDM’14

  7. Roadmap • Main Idea • Proposed Algorithm: VoG • VoG: Step-by-Step • Experiments • Conclusions Danai Koutra (CMU)

  8. Main Idea • Use a graph vocabulary: • Best graph summary •  optimal compression (MDL) Shortest lossless description Danai Koutra – SDM’14

  9. Background Minimum Description Length Principle M • Given a set of models M, • the best model MεM is argminL(M) + L(D|M) M M # bits for the data using M # bits for M Danai Koutra – SDM’14

  10. Example Minimum Description Length Principle L(M) + L(D|M) errors a1 x + a0 VS. { } a10x10+ a9x9+ … + a0 [Example adapted from Akoglu’12] Danai Koutra – SDM’14

  11. Formally:Minimum Graph Description Given: - a graph G with adjacency matrix A - vocabulary Ω Find: model M s.t. min L(G,M) = min L(M) + L(E) Error E Adjacency A Model M Danai Koutra – SDM’14

  12. Roadmap • Main Idea • Proposed Algorithm: VoG • VoG: Step-by-Step • Experiments • Conclusions Danai Koutra (CMU)

  13. VoG: Overview ≈? MDL cost argmin ≈ Danai Koutra – SDM’14

  14. VoG: Overview Summary some criterion Danai Koutra – SDM’14

  15. Roadmap • Main Idea • Proposed Algorithm: VoG • VoG: Step-by-Step • Experiments • Conclusions Danai Koutra (CMU)

  16. We need candidatestructures… … How can we get them? Danai Koutra – SDM’14

  17. Step 1: Graph Decomposition Coulduse: ANY graph decomposition method Weadapted a node reordering method: SlashBurn Danai Koutra – SDM’14

  18. SlashBurn-based Graph Decomposition • Slash top-k hubs, burn edges • Repeat on the remaining GCC [SlashBurn: U Kang and Christos Faloutsos. ICDM’11] candidate structures Notice that the structures can overlap! GCC Before After Danai Koutra – SDM’14

  19. We got candidatestructures. Now, how can we ‘label’ them? Danai Koutra – SDM’14

  20. Step 2: Graph Labeling ≈? 1 MDL cost 2 argmin ≈ Danai Koutra – SDM’14

  21. Graph Representation DETAILS 1 . . . hub? 1 n “best” node split? 45 n missing edges? “best” node ordering? 80 Danai Koutra – SDM’14

  22. Graph Representation DETAILS Hub: top-deg node Spokes: the rest n=7 hub LN(|st|−1) +logn+ log( ) +L(E+ ) + L(E− ) Star structure Errors # of spokes n−1 spokes IDs missing hub ID extra |st|−1 6 Danai Koutra – SDM’14

  23. DETAILS Graph Representation Max bipartite graph: NP-hard Heuristic: Belief Propagation with heterophily for node classification (blue/red) Bipartite graph structure Errors + logn+ log( ) +L(E+ ) + L(E− ) # of rednodes # of blue nodes n−1 their IDs missing extra |st|−1 Danai Koutra – SDM’14

  24. Graph Representation DETAILS 1 . . . 1 Longest path: NP-hard Heuristic: BFS + local search n 45 n 80 + Chain structure Errors missing extra Danai Koutra – SDM’14

  25. Step 2: Graph Labeling ≈? MDL Cost L(m) + L(e) argmin ≈ Danai Koutra – SDM’14

  26. Step 3: Summary Assembly P L A I N Summary Danai Koutra – SDM’14

  27. Concepts DETAILS = # bits as structure - # bits as noise compression gain Savings Danai Koutra – SDM’14

  28. Step 3: Summary Assembly TOP-k Summary Danai Koutra – SDM’14

  29. Concepts DETAILS Summary Encoding cost L(M)=LN(|M|+1)+log()+Σ(-logP(x(s)|M)+L(s)) # of structures per type # of structures per type |M|+1 # of structures # of structures for each structure its encoding length for each structure its encoding length |Ω|+1 s its type : 1 : 1 : 1 its connectivity 3 Danai Koutra – SDM’14

  30. Step 3: Summary Assembly Greedy&Forget DETAILS L(M) … structures Danai Koutra – SDM’14

  31. Roadmap • Main Idea • Encoding Schema • Proposed Algorithm: VoG • Experiments • Conclusions Danai Koutra (CMU)

  32. Application: Enron Top-1 NBC Top-3 Stars klay kenneth.lay @enron.com Ski excursion CEO jeff.skilling@enron.com Danai Koutra – SDM’14

  33. Runtime VOG is near-linear on the number of edges of the input graph. Danai Koutra – SDM’14

  34. Roadmap • Main Idea • Encoding Schema • Proposed Algorithm: VoG • Experiments • Conclusions Danai Koutra (CMU)

  35. Conclusions • Formulation: info-theoretic graph summarization approach • Algorithm: VoG is near-linear on the edges • Experiments on real graphs Danai Koutra – SDM’14

  36. Code www.cs.cmu.edu/~dkoutra/SRC/vog.tar Danai Koutra – SDM’14

  37. Thank you! Questions? www.cs.cmu.edu/~dkoutra/pub.htm danai@cs.cmu.edu Danai Koutra

  38. Backup Slides: Future Work • Our vocabulary was: • Extend to more complicated structures, e.g.: “jellyfish” [Tauro’01] Danai Koutra

  39. Backup Slides: Future Work • Combine graph decomposition methods • Community detection • Spectral method • METIS • SUBDUE • Cross-Associations • Eigenspokes • … Danai Koutra – SDM’14

  40. Backup slides:Quantitative Analysis 4292729 bits as noise Real graphs have structure (we save bits by encoding structures). Danai Koutra – SDM’14

  41. Backup slides:Quantitative Analysis Main structures: Danai Koutra – SDM’14

More Related