CS728 Lecture 5 Generative Graph Models and the Web

Presentation Transcript


  1. CS728 Lecture 5: Generative Graph Models and the Web

  2. Importance of Generative Models
     Gives insight into the graph formation process:
     • Anomaly detection – abnormal behavior, evolution
     • Predictions – predicting future from the past
     • Simulations and evaluation of new algorithms
     • Graph sampling – many real-world graphs like the web are too large and complex to deal with
     • Goal: generating graphs with small world property, clustering, power-laws, other naturally occurring structures

  3. Graph Models: Waxman Models
     • Used for models of clustering in Internet-like topologies and networks with long and short edges
     • The vertices are distributed at random in a plane.
     • An edge is added between each pair of vertices (u, v) with probability p(u,v) = α · exp(-d / (β·L)), where 0 < α, β ≤ 1.
     • L is the maximum distance between any two nodes.
     • Increasing α increases the number of edges in the graph.
     • Increasing β increases the number of long edges relative to short edges.
     • In Waxman-1, d is the Euclidean distance from u to v.
     • In Waxman-2, d is a random number in [0, L].
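A minimal sketch of a Waxman-1 generator, assuming the vertices are placed uniformly in the unit square and the edge probability has the form α · exp(-d / (β·L)). The function name `waxman_graph` and the `seed` parameter are illustrative choices, not part of the lecture.

```python
import math
import random

def waxman_graph(n, alpha, beta, seed=None):
    """Waxman-1 sketch: random points in the unit square (an assumption),
    edge (u, v) added with probability alpha * exp(-d / (beta * L))."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]

    # Euclidean distance for every unordered pair of vertices.
    dist = {}
    for i in range(n):
        for j in range(i + 1, n):
            dist[(i, j)] = math.dist(points[i], points[j])

    # L is the maximum distance between any two nodes.
    L = max(dist.values(), default=0.0)
    edges = []
    if L == 0.0:
        return points, edges  # fewer than two distinct points

    for (i, j), d in dist.items():
        p = alpha * math.exp(-d / (beta * L))
        if rng.random() < p:
            edges.append((i, j))
    return points, edges
```

For Waxman-2, the same sketch would apply with `d` replaced by `rng.uniform(0, L)` for each pair.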

  4. Graph Models: Configuration Model
     • Random graph from a given degree sequence
     • Problem: given a degree sequence d1, d2, d3, …, dn, generate a random graph with that degree sequence
     • Solution: place di stubs on vertex i, then choose pairs of stubs at random and join each pair with an edge
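The stub-matching idea can be sketched as follows. This is an illustrative version, assuming the degree sum is even; the name `configuration_model` is chosen here for convenience. Shuffling the stub list and pairing consecutive entries is one way to choose pairs of stubs uniformly at random, and it may produce self-loops and multiedges.

```python
import random

def configuration_model(degrees, seed=None):
    """Stub matching: vertex i contributes degrees[i] stubs, which are
    paired uniformly at random. Self-loops and multiedges are possible."""
    if sum(degrees) % 2 != 0:
        raise ValueError("degree sum must be even")
    rng = random.Random(seed)
    # One list entry per stub, labelled by the vertex it belongs to.
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    # Consecutive entries of the shuffled list form the edges.
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]
```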

  5. Problem: we may construct graphs with loops and multiedges
     • To prevent this there must be enough “absorbing” residual degree capacity.
     • Algorithm:
       • Maintain a list of nodes sorted by residual degree d(v)
       • Repeat until all nodes have been chosen:
         • pick an arbitrary vertex v
         • add edges from v to the d(v) vertices of highest residual degree
         • update the residual degrees
     • To randomize further, we can start with a realization and repeatedly 2-swap pairs of edges (u,v), (s,t) to (u,t), (s,v).
     • Works OK, but is there a more ‘natural’ generative model?
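The deterministic construction and the 2-swap randomization might look like the sketch below. The names `simple_graph_from_degrees` and `two_swap` are hypothetical, and the code re-sorts the candidates at every step for clarity rather than maintaining a sorted list, so it is not tuned for speed. Swaps that would create a loop or a multiedge are skipped.

```python
import random

def simple_graph_from_degrees(degrees):
    """Connect an arbitrary vertex v to the d(v) vertices of highest
    residual degree, then update residual degrees; repeat."""
    residual = list(degrees)
    remaining = set(range(len(degrees)))
    edges = set()
    while remaining:
        v = remaining.pop()  # arbitrary vertex
        k = residual[v]
        best = sorted(remaining, key=lambda u: residual[u], reverse=True)[:k]
        if len(best) < k or any(residual[u] == 0 for u in best):
            raise ValueError("degree sequence is not graphical")
        for u in best:
            edges.add((min(u, v), max(u, v)))
            residual[u] -= 1
        residual[v] = 0
    return edges

def two_swap(edges, num_swaps, seed=None):
    """Repeatedly replace (u,v), (s,t) by (u,t), (s,v), keeping the graph simple."""
    rng = random.Random(seed)
    edge_list = sorted(edges)
    edge_set = set(edge_list)
    for _ in range(num_swaps):
        if len(edge_list) < 2:
            break
        i, j = rng.sample(range(len(edge_list)), 2)
        (u, v), (s, t) = edge_list[i], edge_list[j]
        new1 = (min(u, t), max(u, t))
        new2 = (min(s, v), max(s, v))
        # Skip swaps that would create a loop or a multiedge.
        if u == t or s == v or new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
    return edge_set
```

Each accepted swap leaves every vertex degree unchanged, which is why the swaps can be applied freely to a valid realization.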

  6. Generative Graph Models: Preferential Attachment
     • Price’s Model [65]: Physics citations – “cumulative advantage”
     • Herb Simon [50’s]: Nobel and Turing Awards, political scientist – “rich get richer” (Pareto)
     • Matthew effect / Matilda effect: sociology
     • Barabasi and Albert 99: Preferential attachment:
       • Add a new node, create d out-links
       • Probability of linking to a node is proportional to its current degree
     • Simple explanation of power-law degree distributions

  7. Issues with Preferential Attachment and Power-laws
     • Barabasi model: fixed constant m for out-degree
     • Price’s model: directed, with mean out-degree m
     • Probability of adding a new edge to a node is proportional to its (in-)degree k
       • Problem at the start: degree 0
       • Price’s model: proportional to degree + 1
     • Analysis: probability p_k that a node has degree k
       • p_k ~ k^(-3) (Barabasi model)
       • p_k ~ k^(-(2+1/m)), a power law with exponent between 2 and 3 (Price)
     • Exercise: give pseudocode that generates such a graph in linear time
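One possible approach to the exercise is sketched below, under stated assumptions: every new node has a fixed out-degree m (rather than mean m), the graph starts from m seed nodes with no edges, and targets are required to be distinct. The trick is to keep an "urn" list in which each node appears once (the "+1") plus once per in-link received, so that a uniform draw from the urn is a draw proportional to in-degree + 1. The name `price_model` is illustrative.

```python
import random

def price_model(n, m, seed=None):
    """Directed preferential attachment: each new node links to m earlier
    nodes, chosen with probability proportional to in-degree + 1."""
    rng = random.Random(seed)
    edges = []
    # Each node appears once (the "+1") plus once per in-link it has received.
    urn = list(range(m))  # m seed nodes, no edges yet (an assumption)
    for new in range(m, n):
        chosen = set()
        while len(chosen) < m:  # redraw on duplicates to keep targets distinct
            chosen.add(rng.choice(urn))
        for old in chosen:
            edges.append((new, old))
            urn.append(old)
        urn.append(new)
    return edges
```

Each draw is constant time, so when duplicate draws are rare the total work is roughly proportional to the number of edges generated.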

  8. Variations on the PA Theme
     • Clustering, Small-World and Ageing
     • Copying Model
     • Alpha and Beta Models
     • Temporal Evolution
     • Densification

  9. Graph Models: Copying Model
     • Copying model [Kleinberg, Kumar, Raghavan, Rajagopalan and Tomkins, 99]:
       • Add a node and choose the number of edges to add
       • Choose a random vertex and “copy” its links (neighbors)
     • Also generates power-law degree distributions
     • Generates communities - clustering
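A rough sketch of one copying-style process follows. Several details are assumptions made here for illustration and are not stated on the slide: every new node adds a fixed number d of out-links, each link is copied from a random prototype with probability `copy_prob` and otherwise points to a uniformly random earlier node, and the process starts from d + 1 fully connected seed nodes.

```python
import random

def copying_model(n, d, copy_prob, seed=None):
    """Each new node picks a random prototype and, link by link, either
    copies the prototype's i-th out-link or links to a random earlier node."""
    rng = random.Random(seed)
    # Seed graph (an assumption): d + 1 nodes, each linking to the other d.
    out = {v: [u for u in range(d + 1) if u != v] for v in range(d + 1)}
    for new in range(d + 1, n):
        prototype = rng.randrange(new)
        links = []
        for i in range(d):
            if rng.random() < copy_prob:
                links.append(out[prototype][i])    # copy the prototype's link
            else:
                links.append(rng.randrange(new))   # uniform random earlier node
        out[new] = links
    return out
```

Copying tends to send new links to nodes that are already pointed to by many prototypes, which is the intuition behind the heavy-tailed in-degree distribution.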

  10. Graph Models: The Alpha Model
      Watts (1999) α model: add edges to nodes, as in random graphs, but make links more likely when two nodes have a common friend.
      For a range of α values:
      • The world is small (average path length is short), and
      • Groups tend to form (high clustering coefficient).
      [Figure: probability of linkage as a function of the number of mutual friends; α is 0 in the upper-left curve, 1 on the diagonal, and ∞ in the bottom-right curve.]

  11. Graph Models: The Beta Model
      Watts and Strogatz (1998), “Link Rewiring”
      • β = 0: People know their neighbors. Clustered, but not a “small world”
      • β = 0.125: People know their neighbors, and a few distant people. Clustered and “small world”
      • β = 1: People know others at random. Not clustered, but “small world”

  12. Graph Models: The Beta Model
      Watts and Strogatz (1998)
      First five random links reduce the average path length of the network by half, regardless of N!
      Both the α and β models reproduce the short-path results of random graphs, but also allow for clustering.
      Small-world phenomena occur at the threshold between order and chaos.
      [Figure: clustering coefficient (C) and normalized average path length (L) plotted against β]
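The link-rewiring procedure can be sketched as below, assuming the usual starting point of a ring lattice in which each node is joined to its k nearest neighbors (k even, k < n). The function name `watts_strogatz` and the adjacency-set representation are illustrative choices. With beta = 0 the lattice is returned unchanged; with beta = 1 every lattice edge is rewired to a random endpoint.

```python
import random

def watts_strogatz(n, k, beta, seed=None):
    """Ring lattice with k neighbors per node; each lattice edge is rewired
    with probability beta to a uniformly random new endpoint."""
    if k % 2 != 0 or k >= n:
        raise ValueError("k must be even and smaller than n")
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            adj[v].add(u)
            adj[u].add(v)
    for j in range(1, k // 2 + 1):
        for v in range(n):
            u = (v + j) % n
            # Rewire (v, u) only if v still has a non-neighbor to connect to.
            if rng.random() < beta and u in adj[v] and len(adj[v]) < n - 1:
                w = rng.randrange(n)
                while w == v or w in adj[v]:  # avoid loops and multiedges
                    w = rng.randrange(n)
                adj[v].discard(u)
                adj[u].discard(v)
                adj[v].add(w)
                adj[w].add(v)
    return adj
```

Rewiring never changes the number of edges, so differences in clustering and path length between beta values come from where the edges go, not from how many there are.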

  13. Other Related Work
      • Hybrid models: Beta + Waxman on grid
      • Huberman and Adamic, 1999: Growth dynamics of the world wide web
        • Argue against Barabasi model for its age dependence
      • Kumar, Raghavan, Rajagopalan, Sivakumar and Tomkins, 1999: Stochastic models for the web graph
      • Watts, Dodds, Newman, 2002: Identity and search in social networks
      • Medina, Lakhina, Matta, and Byers, 2001: BRITE: An Approach to Universal Topology Generation
      • …

  14. Statistics • Statistics of common networks: Large k = large c? Small c = large d?

  15. Modeling Ageing and Temporal Evolution
      • N(t) … nodes at time t
      • E(t) … edges at time t
      • Suppose that N(t+1) = 2 * N(t)
      • Q: what is your guess for E(t+1)? Is it 2 * E(t)?
      • A: over-doubled?

  16. Temporal Evolution of Graphs
      • Densification Power Law: E(t) ∝ N(t)^a
        • networks appear denser over time
        • the number of edges grows faster than the number of nodes – average degree is increasing
        • a … densification exponent
        • or equivalently, log E(t) grows linearly in log N(t) with slope a

  17. Graph Densification
      • Densification Power Law
      • Densification exponent: 1 ≤ a ≤ 2:
        • a=1: linear growth – constant out-degree (assumed in the literature so far)
        • a=2: quadratic growth – clique
      • Let’s see the real graphs!
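Assuming the Densification Power Law has the form E(t) ∝ N(t)^a, the exponent a can be estimated as the least-squares slope of log E(t) against log N(t) over a series of snapshots. The sketch below shows that fit; the function name `densification_exponent` and the synthetic snapshot values in the usage lines are made up for illustration and are not data from the lecture.

```python
import math

def densification_exponent(nodes, edges):
    """Least-squares slope of log E(t) against log N(t)."""
    if len(nodes) != len(edges) or len(nodes) < 2:
        raise ValueError("need at least two (N, E) snapshots")
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic snapshots that follow E = N^1.5 exactly (illustrative only).
snapshot_nodes = [100, 200, 400, 800]
snapshot_edges = [n ** 1.5 for n in snapshot_nodes]
estimated_a = densification_exponent(snapshot_nodes, snapshot_edges)
```

A fitted value near 1 would indicate constant average degree, while a value near 2 would indicate growth toward a clique.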

  18. Densification – ArXiv citation graph in Physics
      • Citations among physics papers
      • 1992: 1,293 papers, 2,717 citations
      • 2003: 29,555 papers, 352,807 citations
      • For each month M, create a graph of all citations up to month M
      [Figure: E(t) versus N(t); fitted exponent 1.69]

  19. Densification – Patent Citations
      • Citations among patents granted
      • 1975: 334,000 nodes, 676,000 edges
      • 1999: 2.9 million nodes, 16.5 million edges
      • Each year is a datapoint
      [Figure: E(t) versus N(t); fitted exponent 1.66]

  20. Densification – Internet Autonomous Systems
      • Graph of Internet
      • 1997: 3,000 nodes, 10,000 edges
      • 2000: 6,000 nodes, 26,000 edges
      • One graph per day
      [Figure: E(t) versus N(t); fitted exponent 1.18]

  21. Evolution of the Diameter
      • Prior work on power-law graphs hints at a slowly growing diameter:
        • diameter ~ O(log N)
        • diameter ~ O(log log N)
      • What is happening in real data?
        • Diameter shrinks over time
        • As the network grows, the distances between nodes slowly decrease

  22. Diameter – ArXiv citation graph
      • Citations among physics papers
      • 1992 – 2003
      • One graph per year
      [Figure: diameter plotted against time in years]

  23. Diameter – “Patents”
      • Patent citation network
      • 25 years of data
      [Figure: diameter plotted against time in years]

  24. Diameter – Autonomous Systems
      • Graph of Internet
      • One graph per day
      • 1997 – 2000
      [Figure: diameter plotted against number of nodes]

  25. Next Time: Densification – Possible Explanations
      • Generative models to capture the Densification Power Law and shrinking diameters
      • 2 proposed models:
        • Community Guided Attachment – obeys Densification
        • Forest Fire model – obeys Densification, Shrinking diameter (and Power Law degree distribution)