CS728 Lecture 5 Generative Graph Models and the Web

CS728Lecture 5Generative Graph Models and the Web

Importance of Generative Models Gives insight into the graph formation process: • Anomaly detection – abnormal behavior, evolution • Predictions – predicting future from the past • Simulations and evaluation of new algorithms • Graph sampling – many real world graphs like the web are too large and complex to deal with • Goal: generating graphs with small world property, clustering, power-laws, other naturally occurring structures

Graph Models: Waxman Models • Used for models of clustering in Internet-like topologies and networks with long and short edges • The vertices are distributed at random in a plane. • An edge is added between each pair of vertices with probability p. p(u,v) =  * exp( -d / (*L) ), 0 ,  1. • L is the maximum distance between any two nodes. • Increase in alpha increases the number of edges in the graph. • Increase in beta increases the number of long edges relative to short edges. • d is the Euclidean distance from u to v in Waxman-1. • d is a random number between [0, L] in Waxman-2.

Graph Models: Configuration Model • Random Graph from given degree sequence • Problem: Given a degree sequence, d1,d2, d3, …., dn generate a random graph with that degree sequence • Solution: Place di stubs onto vertex I Choose pairs of stubs at random

Problem: we may construct graphs with loops and multiedges • To prevent this there must be enough “absorbing” residual degree capacity. • Algorithm: • Maintain list of nodes sorted by residual degrees d(v) • Repeat until all nodes have been chosen: • pick arbitrary vertex v • add edges from v to d(v) vertices of highest residual degree • update residual degrees To randomize further, we can start with a realization and repeatedly 2-swap pairs of edges (u,v), (s,t) to (u,t), (s,v) Works OK, But is there a more ‘natural’ generative model?

Generative Graph models: Preferential attachment • Price’s Model [65] : Physics citations – “cummulative advantage” • Herb Simon [50’s]: Nobel and Turing Awards, political scientist “rich get richer” (Pareto) • Matthew effect / Matilda effect: sociology • Barabasi and Albert 99: Preferential attachment: • Add a new node, create d out-links • Probability of linking a node is proportional to its current degree • Simple explanation of power-law degree distributions

Issues with preferential attachment and Power-laws • Barabasi model fixed constant m for out-degree • Price’s model directed with m mean out-degree • Probability of adding a new edge is proportional to its (in) degree k • problem at the start degree 0 • Price’s model: prop to deg + 1 • Analysis: prob a node has degree k • pk ~ k-3 (Barabasi model) • pk ~ k-(2+1/m) power-law with exponent 2-3 (Price) • Exercise: give pseudocode that generates such a graph in linear time

Variations on the PA Theme • Clustering, Small-World and Ageing • Copying Model • Alpha and beta Models • Temporal Evolution • Densification

Graph models: Copying model • Copying model • [Kleinberg, Kumar, Raghavan, Rajagopalan and Tomkins, 99]: • Add a node and choose the number of edges to add • Choose a random vertex and “copy” its links (neighbors) • Also generates power-law degree distributions • Generates communities - clustering

Graph Models: The Alpha Model Watts (1999) a model: Add edges to nodes, as in random graphs, but makes links more likely when two nodes have a common friend. For a range of a values: • The world is small (average path length is short), and • Groups tend to form (high clustering coefficient). Probability of linkage as a function of number of mutual friends (a is 0 in upper left, 1 in diagonal, and ∞ in bottom right curves.)

Graph Models: The Beta Model Watts and Strogatz (1998) “Link Rewiring” b = 0 b = 1 b = 0.125 People know their neighbors, and a few distant people. Clustered and “small world” People know others at random. Not clustered, but “small world” People know their neighbors. Clustered, but not a “small world”

Graph Models: The Beta Model Watts and Strogatz (1998) First five random links reduce the average path length of the network by half, regardless of N! Both a and b models reproduce short-path results of random graphs, but also allow for clustering. Small-world phenomena occur at threshold between order and chaos. Clustering coefficient / Normalized path length Clustering coefficient (C) and average path length (L) plotted against b

Other Related Work • Hybrid models: Beta + Waxman on grid • Huberman and Adamic, 1999: Growth dynamics of the world wide web • Argue against Barabasi model for its age dependence • Kumar, Raghavan, Rajagopalan, Sivakumar and Tomkins, 1999: Stochastic models for the web graph • Watts, Dodds, Newman, 2002: Identity and search in social networks • Medina, Lakhina, Matta, and Byers, 2001: BRITE: An Approach to Universal Topology Generation • …

Statistics • Statistics of common networks: Large k = large c? Small c = large d?

Modeling Ageing and Temporal Evolution • N(t) … nodes at time t • E(t) … edges at time t • Suppose that N(t+1) = 2 * N(t) • Q: what is guess for E(t+1) =? 2 * E(t) • A: over-doubled?

Temporal Evolution of Graphs • Densification Power Law • networks appear denser over time • the number of edges grows faster than the number of nodes – average degree is increasing a … densification exponent or equivalently

Graph Densification • Densification Power Law • Densification exponent: 1 ≤ a ≤ 2: • a=1: linear growth – constant out-degree (assumed in the literature so far) • a=2: quadratic growth – clique • Let’s see the real graphs!

Densification – ArXiv citation graph in Physics • Citations among physics papers • 1992: • 1,293 papers, 2,717 citations • 2003: • 29,555 papers, 352,807 citations • For each month M, create a graph of all citations up to month M E(t) 1.69 N(t)

Densification – Patent Citations • Citations among patents granted • 1975 • 334,000 nodes • 676,000 edges • 1999 • 2.9 million nodes • 16.5 million edges • Each year is a datapoint E(t) 1.66 N(t)

Densification – Internet Autonomous Systems • Graph of Internet • 1997 • 3,000 nodes • 10,000 edges • 2000 • 6,000 nodes • 26,000 edges • One graph per day E(t) 1.18 N(t)

Evolution of the Diameter • Prior work on Power Law graphs hints at Slowlygrowing diameter: • diameter ~ O(log N) • diameter ~ O(log log N) • What is happening in real data? • Diameter shrinks over time • As the network grows the distances between nodes slowly decrease

Diameter – ArXiv citation graph diameter • Citations among physics papers • 1992 –2003 • One graph per year time [years]

Diameter – “Patents” diameter • Patent citation network • 25 years of data time [years]

Diameter – Autonomous Systems diameter • Graph of Internet • One graph per day • 1997 – 2000 number of nodes

Next Time: Densification – Possible Explanations • Generative models to capture the Densification Power Law and Shrinking diameters • 2 proposed models: • Community Guided Attachment – obeys Densification • Forest Fire model – obeys Densification, Shrinking diameter (and Power Law degree distribution)

CS728 Lecture 5 Generative Graph Models and the Web

CS728 Lecture 5 Generative Graph Models and the Web

Presentation Transcript

Models of Generative Grammar

Universal Network Structure and Generative Models

Generative Models For Text

Power Law and Its Generative Models

Generative and Discriminative Models in Text Classification

Generative Models vs. Discriminative models

Linear Classification Models: Generative

CS728: Internet Studies and Web Algorithms

Generative Models

Graph Models

Generative Models for Image Understanding

Generative Models for Crowdsourced Data

The web graph

Generative Models

Models of Generative Grammar

Generative Models for the Web Graph

Generative Models for Image Analysis

CS728 Clustering the Web Lecture 13

CS728 Web Clustering II Lecture 14

Lecture 5 Graph Theory

CS728 Web Indexes

CS728: Internet Studies and Web Algorithms