
Graph OLAP: Towards Online Analytical Processing on Graphs

Chen Chen, Xifeng Yan, Feida Zhu, Jiawei Han, Philip S. Yu. University of Illinois at Urbana-Champaign; IBM T. J. Watson Research Center; University of Illinois at Chicago.


Presentation Transcript


  1. Graph OLAP: Towards Online Analytical Processing on Graphs Chen Chen, Xifeng Yan, Feida Zhu, Jiawei Han, Philip S. Yu University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center University of Illinois at Chicago

  2. Outline • Motivation • Framework • Efficient Computation • Experiments • Conclusion

  3. Online Analytical Processing • Jim Gray, 1997 • OLAP as a powerful analytical tool

  4. The Usefulness of OLAP • Multi-dimensional • Different perspectives • Multi-level • Different granularities • Can we offer roll-up/drill-down and slice/dice on graph data? • Traditional OLAP cannot handle this, because it ignores the links among data objects

  5. The Prevalence of Graphs • Chemical compounds, computer vision objects, circuits, XML • Especially various information networks • Biological networks • Bibliographic networks • Social networks • World Wide Web (WWW)

  6. Applications • WWW • >= 3 billion nodes, >= 50 billion arcs • Facebook • >= 100 million active users • Combining topological structures and node/edge attributes • Great challenge to view and analyze them • We propose Graph OLAP to tackle this issue

  7. Scenario #1 • A bibliographic network • The collaboration patterns among researchers for SIGMOD 2004

  8. Scenario #2

  9. Outline • Motivation • Framework • Data Model • Two types of Graph OLAP • Dimension, Measure and OLAP operations • Efficient Computation • Experiments • Conclusion

  10. Data Model • We have a collection of network snapshots $\mathcal{G} = \{\mathcal{G}_1, \mathcal{G}_2, \ldots, \mathcal{G}_N\}$ • Each snapshot $\mathcal{G}_i = (I_{1,i}, I_{2,i}, \ldots, I_{k,i}; G_i)$ • $I_{1,i}, I_{2,i}, \ldots, I_{k,i}$ are $k$ informational attributes describing the snapshot as a whole • $G_i = (V_i, E_i)$ is an attributed graph, with attributes attached to its nodes $V_i$ and edges $E_i$ • Since $G_1, G_2, \ldots, G_N$ only represent different observations of a network, $V_1, V_2, \ldots, V_N$ actually correspond to the same set of objects
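
One way to picture this data model in code is sketched below. The class name `Snapshot`, its fields, the helper `same_node_set`, and the sample values are illustrative assumptions, not names or data from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """One observation of the network (hypothetical representation)."""
    info: dict                                  # informational attributes of the whole snapshot
    nodes: dict = field(default_factory=dict)   # node id -> node attributes
    edges: dict = field(default_factory=dict)   # (u, v) -> edge attributes


def same_node_set(snapshots):
    """Return True if every snapshot describes the same set of objects."""
    node_sets = [set(s.nodes) for s in snapshots]
    return all(ns == node_sets[0] for ns in node_sets)


# Two made-up snapshots over the same three researchers.
g1 = Snapshot(info={"venue": "SIGMOD", "year": 2004},
              nodes={"a": {}, "b": {}, "c": {}},
              edges={("a", "b"): {"papers": 2}})
g2 = Snapshot(info={"venue": "VLDB", "year": 2004},
              nodes={"a": {}, "b": {}, "c": {}},
              edges={("b", "c"): {"papers": 1}})
collection = [g1, g2]
```

Keeping the informational attributes outside the graph mirrors the split between the snapshot-level attributes and the attributed graph itself.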

  11. Two Types of OLAP • Informational OLAP (abbr. I-OLAP) • Topological OLAP (abbr. T-OLAP)

  12. Informational OLAP • Dimensions come from informational attributes attached at the whole snapshot level, so-called Info-Dims • e.g., scenario #1

  13. I-OLAP Characteristics • Overlay multiple pieces of information • Do not change the objects whose interactions are being looked at • In the underlying snapshots, each node is a researcher • In the summarized view, each node is still a researcher
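
A minimal sketch of an informational roll-up, assuming each snapshot is keyed by its Info-Dims (venue, year) and stores collaboration counts as edge weights. The function name `roll_up` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical base snapshots: (venue, year) -> {(author, author): papers co-written}
snapshots = {
    ("SIGMOD", 2003): {("agrawal", "srikant"): 1},
    ("SIGMOD", 2004): {("agrawal", "srikant"): 2, ("han", "yan"): 1},
    ("VLDB", 2004): {("han", "yan"): 3},
}


def roll_up(cells, coarsen):
    """Overlay all snapshots that map to the same coarser Info-Dim key."""
    out = defaultdict(Counter)
    for key, graph in cells.items():
        out[coarsen(key)].update(graph)   # Counter.update adds edge weights
    return {k: dict(g) for k, g in out.items()}


# Roll the time dimension up to "all-years"; nodes remain researchers.
by_venue = roll_up(snapshots, lambda key: (key[0], "all-years"))
```

The nodes of the result are the same researchers as in the inputs; only the edge weights are summed, which is the overlay behavior described on this slide.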

  14. Topological OLAP • Dimensions come from the node/edge attributes inside individual networks, so-called Topo-Dims • e.g., scenario #2

  15. T-OLAP Characteristics • Zoom in/Zoom out • Network topology changed: “generalized” nodes and “generalized” edges • In the underlying network, each node is a researcher • In the summarized view, each node becomes an institute that comprises multiple researchers
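
A topological roll-up can be sketched as node merging. The researcher ids, the `institute` mapping, and the choice to drop edges inside a group are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical network: researchers as nodes, co-authored paper counts as edge weights.
edges = {("r1", "r2"): 2, ("r1", "r3"): 1, ("r2", "r3"): 2, ("r3", "r4"): 3}
# Hypothetical Topo-Dim hierarchy: researcher -> institute.
institute = {"r1": "A", "r2": "A", "r3": "B", "r4": "C"}


def t_roll_up(edges, group_of):
    """Merge nodes into generalized nodes and sum weights into generalized edges."""
    merged = Counter()
    for (u, v), w in edges.items():
        gu, gv = group_of[u], group_of[v]
        if gu != gv:                      # this sketch drops intra-group edges
            merged[tuple(sorted((gu, gv)))] += w
    return dict(merged)


institute_graph = t_roll_up(edges, institute)
```

Unlike the informational case, the node set itself changes here: four researchers collapse into three institutes.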

  16. Measures in Graph OLAP • Measure is an aggregated graph • I-aggregated graph • T-aggregated graph • Other measures like node count, average degree, etc. can be treated as derived • Graph plays a dual role • Data source • Aggregate measure
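
To illustrate how scalar measures can be derived from an aggregated graph, here is a small sketch. The function names are hypothetical, and the graph is assumed to be an undirected edge-weight dictionary, so isolated nodes are not counted.

```python
def node_count(edges):
    """Number of distinct endpoints appearing in the aggregated graph."""
    return len({n for edge in edges for n in edge})


def average_degree(edges):
    """Average unweighted degree: each undirected edge contributes to two nodes."""
    n = node_count(edges)
    return 2 * len(edges) / n if n else 0.0


# A made-up aggregated graph with three nodes and two edges.
aggregated = {("a", "b"): 3, ("b", "c"): 1}
print(node_count(aggregated), average_degree(aggregated))
```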

  17. Generality of the Framework • Measures could be complex • e.g., maximum flow, shortest path, centrality • Combine I-OLAP and T-OLAP into a hybrid case

  18. Graph OLAP Operations

  19. Outline • Motivation • Framework • Efficient Computation • Measure classification • Optimizations • Constraint pushing • Experiments • Conclusion

  20. Two Categories of Strategies • Top-down • Generalized cells later • How to combine and leverage intermediate results? • Bottom-up • Generalized cells first • How to early-stop?

  21. Measure Classification • How to combine and leverage intermediate results? • Distributive • The computation of high-level cells can be directly built on low-level cells • Algebraic • Not distributive, but can be easily derived from several distributive measures • Holistic • Neither distributive nor algebraic

  22. Examples • Distributive: collaboration frequency • Use distributiveness to drive computation up the cuboid lattice • Algebraic: maximum flow • Will prove later • Semi-distributive • Holistic: centrality • Need to go down to the raw data and start from scratch
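
The distributive case can be checked with a toy example: building a high-level cell from intermediate cells gives the same collaboration frequencies as building it from the raw snapshots. The helper `overlay` and the data are illustrative assumptions.

```python
from collections import Counter


def overlay(graphs):
    """Sum edge weights (collaboration frequencies) over a group of graphs."""
    total = Counter()
    for g in graphs:
        total.update(g)
    return dict(total)


# Hypothetical base cells keyed by (venue, year).
base = {
    ("SIGMOD", 2003): {("a", "b"): 1},
    ("SIGMOD", 2004): {("a", "b"): 2, ("b", "c"): 1},
    ("VLDB", 2004): {("b", "c"): 3},
}

# Intermediate [venue, all-years] cells.
sigmod = overlay(g for (venue, _), g in base.items() if venue == "SIGMOD")
vldb = overlay(g for (venue, _), g in base.items() if venue == "VLDB")

# The [all-venues, all-years] cell, computed two ways.
from_intermediate = overlay([sigmod, vldb])
from_raw = overlay(base.values())
```

A holistic measure would not pass this kind of check in general, which is why it has to go back to the raw data.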

  23. Optimizations • Special measures may have special properties that can help optimize the calculations • We discuss two of them here, with regard to I-OLAP • Localization • Attenuation

  24. Localization • During computation, only a neighborhood of the networks needs to be consulted • e.g., the collaboration frequency of “R. Agrawal” and “R. Srikant” for [sigmod, all-years] only depends on their collaboration frequencies in each SIGMOD conference • Perfect (i.e., 0-neighborhood) localization • k-neighborhood is less ideal, but still useful • e.g., # of common friends shared by “R. Agrawal” and “R. Srikant”
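
A sketch of 1-neighborhood localization for the common-friends example, assuming each snapshot is a set of undirected edges. The function names and data are hypothetical. Only the neighbors of the two queried nodes contribute to the answer.

```python
def neighbors(edges, node):
    """Nodes adjacent to `node` in one snapshot (edges are undirected pairs)."""
    return {v if u == node else u for u, v in edges if node in (u, v)}


def common_friends(snapshots, a, b):
    """Common neighbors of a and b in the overlay of all snapshots."""
    na = set().union(*(neighbors(s, a) for s in snapshots))
    nb = set().union(*(neighbors(s, b) for s in snapshots))
    return (na & nb) - {a, b}


# Made-up snapshots: the edge ("p", "q") lies outside both 1-neighborhoods.
s1 = {("agrawal", "x"), ("srikant", "y"), ("p", "q")}
s2 = {("srikant", "x"), ("agrawal", "srikant")}
shared = common_friends([s1, s2], "agrawal", "srikant")
```

In this data the two researchers share no friend within any single snapshot, yet the overlay reveals one. The per-snapshot counts therefore cannot simply be added, but the per-snapshot neighbor sets are enough. This sketch scans every edge for simplicity; an adjacency index would avoid touching edges outside the neighborhood.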

  25. Attenuation • Consider the transporting capability (i.e., maximum flow) from source S to destination T • Multiple transportation networks, each operated by a separate company • With regard to I-OLAP, each network is a “snapshot”, and overlaying more than one snapshot means sharing link capacities among companies

  26. Attenuation • Data graph C • Node: cities • Edge: capacity of a link • Measure graph F • Node: cities • Edge: when maximum flow is transmitted, the quantity that passes through a link

  27. Attenuation • Maximum flow is algebraic • F can be derived from C • Just run the maximum flow algorithm • The capacity graph C is obviously distributive • Lemma • Let F be a flow in C and let $C_F$ be its residual graph, where residual means that $C_F = C - F$; then F′ is a maximum flow in $C_F$ if and only if F + F′ is a maximum flow in C

  28. Attenuation • Consider two snapshots that are overlaid • Maximum flows $F_1$, $F_2$ already calculated from $C_1$, $C_2$ • Without attenuation • Compute the overall maximum flow F from $C_1 + C_2$ • With attenuation • Take $F_1 + F_2$ as the basis • Compute the residual maximum flow F′ from $(C_1 - F_1) + (C_2 - F_2)$, and augment it onto $F_1 + F_2$ • Thus, our input attenuates from $C_1 + C_2$ to $(C_1 + C_2) - (F_1 + F_2)$, which substantially reduces the effort
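
The attenuation idea can be tried on a toy instance. The sketch below uses a plain Edmonds-Karp routine as a stand-in for whichever maximum flow algorithm the authors used. The helper names, the dictionary representation, and the capacities are assumptions. Flows are stored as net flows with flow[(u, v)] == -flow[(v, u)], so that C - F yields the usual residual graph including reverse edges.

```python
from collections import defaultdict, deque


def combine(a, b, sign=1):
    """Edge-wise a + sign * b for graphs stored as {(u, v): number}."""
    out = defaultdict(int)
    for e, w in a.items():
        out[e] += w
    for e, w in b.items():
        out[e] += sign * w
    return {e: w for e, w in out.items() if w != 0}


def max_flow(cap, s, t):
    """Edmonds-Karp; returns (value, net flow dict)."""
    res = defaultdict(int, cap)               # residual capacities
    adj = defaultdict(set)
    for u, v in cap:
        adj[u].add(v)
        adj[v].add(u)
    flow, value = defaultdict(int), 0
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:      # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return value, dict(flow)
        path, v = [], t
        while parent[v] is not None:          # walk back from t to s
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[e] for e in path)
        for u, v in path:
            res[(u, v)] -= push
            res[(v, u)] += push
            flow[(u, v)] += push
            flow[(v, u)] -= push
        value += push


# Two made-up snapshots (companies) over the same cities.
C1 = {("s", "a"): 4, ("a", "t"): 1}
C2 = {("s", "a"): 1, ("a", "t"): 4}

# Without attenuation: run on the overlaid capacities.
direct_value, _ = max_flow(combine(C1, C2), "s", "t")

# With attenuation: reuse F1 and F2, then only solve the residual problem.
v1, F1 = max_flow(C1, "s", "t")
v2, F2 = max_flow(C2, "s", "t")
residual = combine(combine(C1, F1, -1), combine(C2, F2, -1))
v_res, _ = max_flow(residual, "s", "t")
attenuated_value = v1 + v2 + v_res
```

On this instance both routes reach the same maximum flow value, with the second run starting from a smaller residual input.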

  29. Constraint Pushing • Iceberg graph cube • Partial materialization • Satisfying some interestingness requirement • Push the constraints • Anti-monotone • e.g., maximum flow $|f| \geq \delta_{|f|}$ • Monotone • e.g., diameter $d \geq \delta_d$

  30. Outline • Motivation • Framework • Efficient Computation • Experiments • Conclusion

  31. OLAP a Bibliographic Network • We get the coauthorship data from DBLP • Measure • Information Centrality • Two Info-Dims • Area • Database (DB): PODS/SIGMOD/VLDB/ICDE/EDBT • Data Mining (DM): ICDM/SDM/KDD/PKDD • Information Retrieval (IR): SIGIR/WWW/CIKM • Time

  32. OLAP a Bibliographic Network

  33. Efficiency • A test that computes maximum flow as the measure • Synthetically generate flow networks • Details in the paper, with each “snapshot” representing an individual player in the transportation industry • Like the Multi-Way method, calculate low-level cells before merging them into high-level ones • One takes advantage of the attenuation heuristic • The other does not

  34. Efficiency

  35. Outline • Motivation • Framework • Efficient Computation • Experiments • Conclusion

  36. Conclusion • We propose a Graph OLAP framework to perform multi-dimensional, multi-level analysis on network data • Measure is an aggregated graph • Informational/Topological dimensions lead to I-OLAP, T-OLAP

  37. Conclusion • Mainly focusing on I-OLAP, we discuss how a graph cube can be efficiently computed and materialized • distributive, algebraic, holistic • Optimizations: localization, attenuation • Constraint pushing

  38. Future Work • Technical issues for T-OLAP • Selective drilling and discovery-driven InfoNet-OLAP

  39. Thank You!
