1 / 21

The Block Two-Level Erdös-Rényi (BTER) Graph Model

The Block Two-Level Erdös-Rényi (BTER) Graph Model. Ali Pinar, C. Seshadri , and Tamara G. Kolda Sandia National Laboratories. U.S. Department of Energy Office of Advanced Scientific Computing Research. U.S. Department of Defense Defense Advanced Research Projects Agency.

preston
Download Presentation

The Block Two-Level Erdös-Rényi (BTER) Graph Model

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. The Block Two-Level Erdös-Rényi (BTER) Graph Model Ali Pinar, C. Seshadri, and Tamara G. Kolda Sandia National Laboratories U.S. Department of EnergyOffice of Advanced Scientific Computing Research U.S. Department of Defense DefenseAdvanced Research Projects Agency Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000. Pinar - MMDS12

  2. Why model massive graphs? • Enable sharing of surrogate data • Computer network traffic • Social networks • Financial transactions • Testing graph algorithms • Scalability • Versatility (e.g., vary degree distributions) • Characterizing algorithm performance • Verification & validation • Insight into… • Generative process • Community structure • Comparison • Evolution • Uncertainty Block Two-Level Erdös-Rényi(BTER) graph; image courtesy of NurcanDurak. Pinar - MMDS12

  3. Model Desiderata Actor Collaboration • Captures heavy-tailed degree distribution • Not necessarily power law • Captures community structure • Measured indirectly through clustering coefficient, and other measures • Able to “fit” real-world data • Reproduce degree distribution • Reproduce community structure • Scales up to ~240 nodes • Motivated by GRAPH500 benchmark WWW Power Grid A.-L. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286(5349):509-512, 1999. M.E.J. Newman and M. Girvan, Finding and evaluating community structure in networks, Phys. Rev. E 69, 026113, 2004. http://www.graph500.org/ Pinar - MMDS12

  4. Preserving Degree Distribution • Majority of models are amazingly bad at matching degree distributions • Tend to focus on matching “spirit” of the distribution • May not even have the capability of matching some distributions • Exception is the Chung-Lu (CL) model [Chung & Lu, 2002] • Also known as configuration model [Newman, 2003] or weighted Erdős-Rényi (ER) model • Premise: Probability of edge is proportional to degrees of endpoints • Pr(eij) = didj/(2m) • Can be implemented in a scalable way by repeated edge insertions. The CL model is nearly perfect at matching desired degree distributions. Pinar - MMDS12

  5. Community Structure in Graphs • Numerous community finding algorithms exist • Difficult to validate • Trouble in finding full range of sizes • Instead, use related measures like clustering coefficient • Triangles arise because of community structure • What “community” structure must be presentto ensure a high clustering coefficient, especially for low-degree nodes? • How many communities? • What do they look like? M.E.J. Newman and M. Girvan, Finding and evaluating community structure in networks, Phys. Rev. E 69, 026113, 2004. Pinar - MMDS12

  6. Clustering Coefficients Clustering Coefficient ti = # triangles at vertex i di = degree of vertex i Global Clustering Coeff. Pinar - MMDS12

  7. Clustering coefficients are higher for lower degree vertices Pinar - MMDS12

  8. Model building approach • Find features that restrict the space and identifythe structure imposed by these features • Given • askewed degree distribution • high clustering coefficients • What structure should we expect? • Let’s take it to an extreme, i.e., clustering coefficients ~1. • Implication 1: The graph must be a union of cliques. • Implication 2: The sizes of the cliques (communities) will have the same distribution as the degree distribution. • The challenge is in characterizing the gray area: clustering coefficients are high but not 1 Pinar - MMDS12

  9. Building the basis for a model Seshadhri, Kolda, & Pinar, Phys. Rev. Let. E 2012 Empirical Observations • Degree distributions are heavy tailed. • Clustering coefficients are highest for small degree vertices. Theoretical Analysis Theorem: If a community has s edges then there must be Ω(√s) vertices with degree Ω(√s). We are not only trying to build a formal model, we are trying to formalize the model building process itself. Hypothesis: Real-world interaction networks consist of a scale –free collection of relatively dense Erdős-Rényi graphs. Pinar - MMDS12

  10. Verifying the model Hypothesis: Real-world interaction networks consist of a scale –free collection of dense Erdős-Rényi graphs. Prediction: the number of communities grows with the number of nodes. Prediction: Within a community, the degree variance should be small. Verification: Vertex degree is correlated to the average degree of neighbors. And degrees of triangle vertices are close. Verification: The sizes of the communities are small, and do not grow (or grow very slowly) for larger graphs. (e.g., Dunbar number). Pinar - MMDS12

  11. Matching the Clustering Coefficients • Random pairings cannot close wedges! • Models that achieve high clustering coefficients require history to guide future edge insertions • To get high clustering coefficient without history, we have to pre-group the nodes into affinity blocks • Our theory: CL graph with high clustering coefficient must have dense ER block at its core • Each affinity block is a near-clique • Simplifying assumption: Non-overlapping blocks • Implies blocks must comprise nodes of similar degree • Can tune clustering coefficient to be appropriate for the degree by changing connectivity of the block 1 2 4 3 If nodes of different degrees are in the same block, then the high-degree nodes will have low clustering coefficients. Pinar - MMDS12

  12. Preprocessing: Determining Blocks heterogeneous • Input: Desired degree-distribution • Assume nodes sorted by degree, least to greatest • Ignore degree-1 vertices in this phase • Algorithm: • Group nodes in order • Bulk assign, if possible • Output: Compressed information about blocks • Size of each block • Starting index of each block • “Weight” of each block (depends on connectivity) • O(duniq) information, if compressed • Observe: If degree distribution is power law, then so is the distribution of the block sizes • Evocative of Dunbar’s number: Max block size = 100 for γ=2 2 2 3 2 2 3 3 2 homogeneous 2 3 3 2 2 3 3 2 2 3 3 2 2 3 3 2 Pinar - MMDS12

  13. BTER: Block Two-Level Erdős-Rényi Phase 1: Erdős-Rényi graphs in each block. User-specified connectivity determines number of links within each block. A formula is given that depends on lowest-degree node in block. Preprocessing: Create explicit affinity blocks of nodes with same (or nearly same) degree. The blocks are determined by the degree distribution. Phase 2:CL (aka weighted ER) model on “excess” degree, creates connections across blocks. Can run in tandem to Phase 1 by worked with expected excess degree. Pinar - MMDS12

  14. Scalable BTER: Independent Edges Choose phase 1 or 2? Choose block proportional to number of “samples” per block Create Phase 2 edgeusing CL model on “excess degree” Create Block 1 edge per ER model with connectivity ρ1 Create Block K edge per ER model with connectivity ρK Choose 1st endpoint Choose 2nd endpoint Requires O(n) data to determine the various probabilities, and can be compressed to O(duniq). Choose 1st endpoint Choose 2nd endpoint Choose 1st endpoint Choose 2nd endpoint Pinar - MMDS12

  15. Visualization of BTER Adjacency Matrix Red = Phase 1 Blue = Phase 2 Pinar - MMDS12

  16. Visualization of BTER Graph Nodes colored by degree (darker=higher) Phase 1 = Blue Phase 2 = Green Image courtesy of Nurcan Durak. Pinar - MMDS12

  17. Co-authorship (ca-AstroPh) • 18,771 nodes 396,100 edges; based on arxiv • BTER Phase 1 edges: 290,268 Phase 2 edges: 112,808 • Global clustering coefficient Original: 0.32, BTER: 0.31, CL: 0.01 • Normalized size of the largest connected component: Original: 0.95, BTER: 0.86, CL: 1.0 Eigenvalues are not determined by the degree distribution. Pinar - MMDS12

  18. Trust Network • 75,879vertices, 811,480edges; based on Epinion web site; edges represent trust between two users • BTER Phase 1 Edges: 300,162 Phase 2 Edges: 515,192 • Global clustering coefficientOriginal: 0.07, BTER: 0.07, CL: 0.03 • Normalized size of the largest connected componentOriginal: 1.0, BTER: 0.96, CL: 0.98 BTER provides good matches to real data, even when the clustering coefficients are not too high Pinar - MMDS12

  19. Concluding Remarks • Models are a critical challenge in network analysis • The challenge is not in building a formal model, but formalizing the modeling process itself • We propose the Block Two-Level Erdös-Rényi (BTER) • New theory says there must be many dense subgraphs for high clustering coefficient • New BTER model explicitly creates dense communities using ER • Exceptional similarities to real data in terms of clustering coefficients and eigenvalues • The code is available at http://www.sandia.gov/~tgkolda/bter_supplement/. • Scalable versions (Hadoopand MPI) will be available soon • For more information, • Ali Pinar apinar@sandia.gov Pinar - MMDS12

  20. A new workshop • SIAM Workshop on Network Science • Proposed; under review • Dates: July 7-8, 2013 • Place: San Diego, CA • Co-located with SIAM Annual Meeting • Contact: • Ali Pinar (apinar@sandia.gov), Sandia National Labs • MadhavMarathe(mmarathe@vbi.vt.edu), Virginia Tech Pinar - MMDS12

  21. Relevant Publications • Modeling • C. Seshadhri, T. Kolda, and A. Pinar, “The Blocked Two-Level ErdosRenyi Graph Model,” Phys. Rev. E 2012. • C. Seshadhri, A. Pinar, and T. Kolda, “An In Depth analysis of Stochastic Kronecker Graphs," submitted. • A. Pinar, C. Seshadhri, and T. Kolda, “The Similarity of Stochastic Kronecker Graphs to Edge-Configuration Models,” SDM’12 • C. Seshadhri, A. Pinar, and T. Kolda, “An In Depth study of Stochastic Kronecker Graphs,” ICDM’12 • Generating a random graph • J. Ray, A. Pinar, and C. Sehadhri, Are we there yet? When to stop a Markov chain while generating random graphs," submitted. • I. Stanton and A. Pinar, “Constructing and uniform sampling graphs with prescribed joint degree distribution using Markov Chains,” to appear in ACM JEA. • I. Stanton and A. Pinar, “Sampling graphs with prescribed joint degree distribution using Markov Chains,” ALENEX’11. • Community structure • C. Seshadhri, A. Pinar, and T. Kolda, “Fast Triangle Counting through Wedge Sampling," submitted. • M. Rocklin and A. Pinar, “On Clustering on Graphs with Multiple Edge Types,” Internet Math. 2012. • M. Rocklin and A. Pinar, “Latent Clustering on Graphs with Multiple Edge Types,” Proc. 8th Workshop on Algorithms and Models for the Web Graph WAW’ 11. • M. Rocklin and A. Pinar, “Computing an Aggregate Edge-weight function for Clustering Graphs with Multiple Edge Types,” WAW’10. Pinar - MMDS12

More Related