
Discovery-Driven Graph Summarization

Discovery-Driven Graph Summarization. ICDE 2010. Ning Zhang, Yuanyuan Tian, Jignesh M. Patel. University of Wisconsin-Madison, IBM Almaden Research Center, USA. Presented by Sung Eun, Park, 9/26/2010. Intelligent Database Systems Lab. School of Computer Science & Engineering


Presentation Transcript


  1. Discovery-Driven Graph Summarization • ICDE 2010 • Ning Zhang, Yuanyuan Tian, Jignesh M. Patel • University of Wisconsin-Madison, IBM Almaden Research Center, USA • Presented by Sung Eun, Park, 9/26/2010 • Intelligent Database Systems Lab., School of Computer Science & Engineering, Seoul National University • Center for E-Business Technology, Seoul National University, Seoul, Korea

  2. Contents • Introduction & Preliminaries • k-SNAP Summarization • Efficient Aggregation for Graph Summarization (SIGMOD'08) • Categorization of Numerical Attributes • CANAL algorithm • Try to Merge Groups and Select Cutoffs • Automatic Discovery of Interesting Summaries • Measuring Interestingness of Summaries • Automatic Discovery of Interesting Summaries • Experimental Results • Conclusion

  3. Introduction • Large Graph datasets are ubiquitous! • Graph summarization can assist in uncovering useful insights about the patterns hidden in the underlying data • Proposed Graph Summarization Approach (previous work)

  4. Preliminaries • Proposed Graph Summarization Approach (previous work) • A summary graph is built by grouping nodes based on user-selected node attributes and relationships • [Figure: example graph with author nodes a1 to a6 labeled by research area (DB, AI, OS); publication counts are mapped to categories: 5→LP, 8→LP, 10→MP, 18→MP, 23→HP, 28→HP, 30→HP]
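The grouping idea on this slide can be illustrated with a small sketch. It only groups nodes by a categorical attribute and counts edges between groups; it does not enforce the relationship homogeneity that SNAP also requires. The cutoffs `[10, 20]` are hypothetical values chosen so that the mapping matches the counts shown in the figure, and the function names are illustrative.

```python
from bisect import bisect_right
from collections import defaultdict

CUTOFFS = [10, 20]           # hypothetical cutoffs, not given on the slides
LABELS = ["LP", "MP", "HP"]  # low / moderately / highly prolific


def categorize(count, cutoffs=CUTOFFS, labels=LABELS):
    """Map a numerical attribute value to a category label."""
    return labels[bisect_right(cutoffs, count)]


def summarize(attrs, edges):
    """Group nodes by attribute value and count edges between groups.

    attrs: {node: category}; edges: iterable of undirected (u, v) pairs.
    """
    groups = defaultdict(set)
    for node, value in attrs.items():
        groups[value].add(node)
    links = defaultdict(int)
    for u, v in edges:
        key = tuple(sorted((attrs[u], attrs[v])))
        links[key] += 1
    return dict(groups), dict(links)


# Hypothetical toy input: three authors with publication counts.
counts = {"x": 5, "y": 30, "z": 23}
attrs = {node: categorize(c) for node, c in counts.items()}
groups, links = summarize(attrs, [("x", "y"), ("y", "z")])
```

Here `groups` holds one node set per category and `links` holds the number of edges between each pair of categories, which is the raw material for a summary graph.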

  5. Preliminaries • Proposed Graph Summarization Approach (previous work) • SNAP: nodes of each group are homogeneous with respect to user-selected attributes and relationships. • May result in a large number of small groups; in the worst case, each node may end up in its own group. • k-SNAP: users can control the number of groups in the summary graph through k. • [Figures: k-SNAP top-down approach; k-SNAP summaries at different resolutions]

  6. Preliminaries • k-SNAP • Low/Moderately/Highly Cited groups • A link represents the participation rate (figure legend: strong relationship, weak relationship) • participation rate =
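A sketch of the participation rate, assuming the definition used in the earlier SNAP work (SIGMOD'08) for a single relationship type. The notation is an assumption: P_{G_j}(G_i) denotes the set of nodes in G_i that have at least one edge to a node in G_j.

```latex
p_{i,j} = \frac{\left|P_{G_j}(G_i)\right| + \left|P_{G_i}(G_j)\right|}{\left|G_i\right| + \left|G_j\right|},
\qquad
P_{G_j}(G_i) = \{\, u \in G_i : \exists\, v \in G_j,\ (u, v) \in E \,\}
```

Under this reading, a group relationship would be drawn as strong when p_{i,j} > 0.5 and weak otherwise; the 0.5 threshold is likewise taken from the SNAP paper as an assumption, not from this transcript.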

  7. Preliminaries • Δ-measure • Measures how different a summary is from a hypothetical "ideal summary" • Given a graph G, the Δ-measure of a grouping of nodes Φ = {G1, G2, ..., Gk} is defined as follows: • A small Δ value indicates a good summary • (Slide annotations: for every pair of groups, sum the differences from the ideal summary)
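A minimal sketch of how a Δ-style measure could be computed, assuming an undirected graph with a single edge type and the definition from the earlier SNAP paper: for every ordered pair of groups, a weak relationship (participation ratio at most 0.5) is charged the number of participating nodes, and a strong one is charged the number of non-participating nodes. The 0.5 threshold and all function names are assumptions, not taken from the slides.

```python
def build_adj(edges):
    """Adjacency sets for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def participants(src, dst, adj):
    """Nodes of group `src` with at least one neighbor in group `dst`."""
    return {u for u in src if adj.get(u, set()) & dst}


def delta(groups, adj):
    """Sum, over ordered group pairs, of the deviation from an ideal summary.

    In an ideal summary every group relationship is all-or-nothing:
    either every node of a group participates or none does.
    """
    total = 0
    for gi in groups:
        for gj in groups:
            p_ij = len(participants(gi, gj, adj))
            p_ji = len(participants(gj, gi, adj))
            ratio = (p_ij + p_ji) / (len(gi) + len(gj))
            # Weak relationship: participants are the exceptions.
            # Strong relationship: non-participants are the exceptions.
            total += p_ij if ratio <= 0.5 else len(gi) - p_ij
    return total


# Hypothetical toy graph: node 4 is the only one breaking the A-B pattern.
A, B = {1, 2}, {3, 4}
print(delta([A, B], build_adj([(1, 3), (2, 3)])))  # 1
print(delta([A, B], build_adj([(1, 3), (2, 4)])))  # 0
```

In the first call, node 4 is the single node that does not follow the strong A to B relationship, so Δ is 1; in the second call every node participates, so the grouping matches the ideal summary and Δ is 0.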

  8. Introduction • k-SNAP • Produces summaries that are themselves graphs • Two limitations • k-SNAP uses categorical node attributes, but even for domain experts, providing clear cutoffs for numerical values is tricky and not always possible. → This work introduces the CANAL algorithm, which automatically categorizes numerical attribute values based on both the attribute values and the link structure of nodes in the graph • k-SNAP allows summaries with different resolutions, but users may have to go through a large number of summaries before finding interesting ones. → This work proposes a measure to assess the interestingness of summaries

  9. Automatic Discovery of Interesting Summaries • CANAL algorithm • Input: graph G, numerical node attribute a, desired number of categories C • Intuition: find the cutoffs whose removal increases the Δ value the most. • Start with one group per distinct attribute value, ordered by value (each group holds the nodes that share the same attribute value) • In every iteration, until there is only one group left: • Pick the adjacent pair of groups that has the most similar relationship pattern to the other groups • Merge the pair and calculate the increase in Δ after the merge • Pick the C-1 cutoffs that have the biggest Δ increases • [Figure: groups G1, G2, G3, G4, ..., Gk laid out along the numerical node attribute value axis, with the Δ increase calculated for each merge]
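The merge-and-select loop above can be sketched as follows. This is a simplified illustration rather than the paper's implementation: it assumes an undirected graph with one edge type, uses a Δ computation with an assumed 0.5 strong/weak threshold, and compares relationship patterns with a hypothetical distance (sum of absolute differences in participation fractions). All function names are illustrative.

```python
def participants(src, dst, adj):
    """Nodes of group `src` with at least one neighbor in group `dst`."""
    return {u for u in src if adj.get(u, set()) & dst}


def delta(groups, adj):
    """Delta-style measure (assumed definition, 0.5 strong/weak threshold)."""
    total = 0
    for gi in groups:
        for gj in groups:
            p_ij = len(participants(gi, gj, adj))
            p_ji = len(participants(gj, gi, adj))
            ratio = (p_ij + p_ji) / (len(gi) + len(gj))
            total += p_ij if ratio <= 0.5 else len(gi) - p_ij
    return total


def pattern_distance(groups, a, b, adj):
    """Hypothetical dissimilarity of how groups a and b link to the others."""
    dist = 0.0
    for k, gk in enumerate(groups):
        if k in (a, b):
            continue
        ra = len(participants(groups[a], gk, adj)) / len(groups[a])
        rb = len(participants(groups[b], gk, adj)) / len(groups[b])
        dist += abs(ra - rb)
    return dist


def canal(values, edges, num_categories):
    """Return num_categories - 1 cutoffs; values <= cutoff fall below it.

    values: {node: number}; edges: iterable of undirected (u, v) pairs.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # One initial group per distinct attribute value, in ascending order.
    distinct = sorted(set(values.values()))
    groups = [{n for n, x in values.items() if x == val} for val in distinct]
    upper = list(distinct)  # largest attribute value inside each group
    merges = []             # (delta increase, cutoff removed by the merge)
    while len(groups) > 1:
        # Adjacent pair with the most similar relationship pattern.
        i = min(range(len(groups) - 1),
                key=lambda j: pattern_distance(groups, j, j + 1, adj))
        before = delta(groups, adj)
        cutoff = upper[i]
        groups[i:i + 2] = [groups[i] | groups[i + 1]]
        upper[i:i + 2] = [upper[i + 1]]
        merges.append((delta(groups, adj) - before, cutoff))
    # Keep the cutoffs whose removal hurt the summary quality the most.
    best = sorted(merges, key=lambda m: m[0], reverse=True)[:num_categories - 1]
    return sorted(c for _, c in best)


# Hypothetical toy input: publication counts and coauthorship edges.
vals = {"a": 1, "b": 1, "c": 2, "d": 10, "e": 10, "f": 11}
coauthor = [("a", "d"), ("b", "e"), ("c", "d"), ("f", "e"), ("d", "e")]
cuts = canal(vals, coauthor, 3)
```

Recomputing Δ from scratch at every merge keeps the sketch short but is quadratic in the number of groups per step; an efficient implementation would presumably update Δ incrementally.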

  10. Automatic Discovery of Interesting Summaries

  11. Automatic Discovery of Interesting Summaries

  12. Experimental Setting • Datasets • DBLP DB Dataset: bibliography data • Coauthorship graph (undirected) • NODE: authors, 7,445 nodes • EDGE: coauthorship, 19,971 edges • Attribute: publication number • CiteSeer Dataset • Citation graph (directed) • NODE: an article • EDGE: a directed citation • Attribute: number of citations

  13. Experimental Result • Effectiveness of the CANAL algorithm • Cutoffs generated by CANAL vs. cutoffs manually selected in the previous work • The cutoffs produced by CANAL result in Δ/k values that are very close to those of the manually selected cutoffs. • [Figure: Δ measure; a good summary has a small value]

  14. Experimental Result • Efficiency of the CANAL algorithm • The execution time of the CANAL algorithm scales well with increasing data size • C = 3 in this experiment; different C values do not significantly affect the run time of the CANAL algorithm, because all the cutoff candidates are considered regardless of C.

  15. Experimental Result • Effectiveness of the Interestingness Measure • Interestingness of summaries

  16. Experimental Result • Two types of interesting summaries: Overall Summary • Conciseness: relatively small, k=4 • Coverage: the nodes participating in the strong relationships ↑ & - • Diversity: ↑ as much as LP of size 2680 & -

  17. Experimental Result • Two types of interesting summaries: Informative Summary • Conciseness: a little bigger than the "overall summary" • Coverage: ↑ as much as the new LP group of size 1261, - • Diversity: ↑ as much as the new LP group of size 1261, - • (Figure annotations: "only strongly collaborates with HP"; "only shows weak relationships")

  18. Conclusion • Overcomes two key limitations of the previous work • Introduces the CANAL algorithm, which automatically categorizes numerical attribute values based on both the attribute values and the link structure of nodes in the graph • Proposes a measure to assess the interestingness of summaries

  19. Q&A Thank you
