CS728 Lecture 5 Generative Graph Models and the Web


### CS728 Lecture 5 Generative Graph Models and the Web

Importance of Generative Models

Gives insight into the graph formation process:

• Anomaly detection – abnormal behavior, evolution
• Predictions – predicting future from the past
• Simulations and evaluation of new algorithms
• Graph sampling – many real world graphs like the web are too large and complex to deal with
• Goal: generating graphs with small world property, clustering, power-laws, other naturally occurring structures
Graph Models: Waxman Models
• Used for models of clustering in Internet-like topologies and networks with long and short edges
• The vertices are distributed at random in a plane.
• An edge is added between each pair of vertices with probability p.

$p(u,v) = \alpha \exp\left(-d / (\beta L)\right)$, with $0 < \alpha, \beta \le 1$.

• L is the maximum distance between any two nodes.
• Increase in alpha increases the number of edges in the graph.
• Increase in beta increases the number of long edges relative to short edges.
• d is the Euclidean distance from u to v in Waxman-1.
• d is a random number drawn from [0, L] in Waxman-2.
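
As a rough illustration of the two Waxman variants, here is a minimal sketch. It reads the slide's formula as p(u,v) = alpha * exp(-d / (beta * L)). The function name `waxman_graph`, the unit-square placement, and the `variant` and `seed` parameters are assumptions made for this example, not part of the lecture.

```python
import math
import random

def waxman_graph(n, alpha, beta, variant=1, seed=None):
    """Sketch of a Waxman-style generator on the unit square (requires n >= 2).

    variant=1: d is the Euclidean distance between u and v (Waxman-1).
    variant=2: d is drawn uniformly from [0, L] (Waxman-2).
    """
    rng = random.Random(seed)
    # Vertices are distributed at random in the plane (here: the unit square).
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    pairs = [(u, v) for u in range(n) for v in range(u + 1, n)]
    dist = {(u, v): math.dist(pos[u], pos[v]) for u, v in pairs}
    L = max(dist.values())  # maximum distance between any two nodes
    edges = []
    for u, v in pairs:
        d = dist[(u, v)] if variant == 1 else rng.uniform(0, L)
        p = alpha * math.exp(-d / (beta * L))
        if rng.random() < p:
            edges.append((u, v))
    return pos, edges
```

Because p grows with both alpha and beta, rerunning Waxman-1 with the same seed and a larger alpha (or beta) can only add edges, which matches the two bullets above.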
Graph Models: Configuration Model
• Random Graph from given degree sequence
• Problem: given a degree sequence d1, d2, d3, …, dn, generate a random graph with that degree sequence
• Solution:

Place di stubs onto vertex i

Choose pairs of stubs at random and join each pair with an edge

Problem: we may construct graphs with loops and multiedges
• To prevent this there must be enough “absorbing” residual degree capacity.
• Algorithm:
• Maintain list of nodes sorted by residual degrees d(v)
• Repeat until all nodes have been chosen:
• pick arbitrary vertex v
• add edges from v to d(v) vertices of highest residual degree
• update residual degrees
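
The loop above can be sketched as follows. This is a hedged illustration in the Havel-Hakimi style, not the lecture's own code: the name `realize_degree_sequence` is hypothetical, and the candidates are re-sorted on every round instead of being kept in a maintained sorted list, which is simpler but slower.

```python
def realize_degree_sequence(degrees):
    """Return the edges of a simple graph with the given degrees, or None.

    Vertices are numbered 0..n-1 and degrees[v] is the target degree of v.
    """
    residual = dict(enumerate(degrees))
    remaining = set(residual)
    edges = []
    while remaining:
        v = remaining.pop()  # pick an arbitrary vertex not chosen so far
        need = residual[v]
        # Other unchosen vertices, highest residual degree first.
        candidates = sorted(remaining, key=lambda u: residual[u], reverse=True)[:need]
        if len(candidates) < need or any(residual[u] == 0 for u in candidates):
            return None  # not enough "absorbing" residual capacity
        for u in candidates:
            edges.append((v, u))
            residual[u] -= 1  # update residual degrees
        residual[v] = 0
    return edges
```

Each chosen vertex is completed in one step and never touched again, so the result has no loops or multiedges. A return value of `None` signals that the sequence could not be realized.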

To randomize further, we can start with a realization and repeatedly 2-swap pairs of edges (u,v), (s,t) to (u,t), (s,v)
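
A minimal sketch of the 2-swap step, assuming the input is a simple undirected graph given as a list of at least two edges. The function name `randomize_by_swaps` and its parameters are illustrative. Swaps that would create a loop or a multiedge are simply skipped, which is one common convention rather than something the slide prescribes.

```python
import random

def randomize_by_swaps(edges, num_swaps, seed=None):
    """Repeatedly try to replace (u,v),(s,t) by (u,t),(s,v), keeping the graph simple."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(num_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (u, v), (s, t) = edges[i], edges[j]
        if u == t or s == v:
            continue  # the swap would create a self-loop
        if frozenset((u, t)) in present or frozenset((s, v)) in present:
            continue  # the swap would create a multiedge
        present -= {frozenset((u, v)), frozenset((s, t))}
        present |= {frozenset((u, t)), frozenset((s, v))}
        edges[i], edges[j] = (u, t), (s, v)
    return edges
```

Every accepted swap removes one edge at each of u, v, s, t and adds one back, so the degree sequence is unchanged.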

Works OK, but is there a more ‘natural’ generative model?

Generative Graph models: Preferential attachment
• Price’s model [65]: physics citations – “cumulative advantage”
• Herb Simon [50’s]: Nobel and Turing Awards, political scientist “rich get richer” (Pareto)
• Matthew effect / Matilda effect: sociology
• Barabasi and Albert 99: Preferential attachment:
• Probability of linking a node is proportional to its current degree
• Simple explanation of power-law degree distributions
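
To make the attachment rule concrete, here is a deliberately naive sketch that draws each target with weight equal to its current degree. The name `preferential_attachment`, the starting clique on m + 1 nodes, and the rejection of repeated targets are assumptions of this example. It is not meant to be efficient.

```python
import random

def preferential_attachment(n, m, seed=None):
    """Grow an undirected graph; each new node links to m distinct existing
    nodes chosen with probability proportional to their current degree.
    Requires m >= 1 and n >= m + 1."""
    rng = random.Random(seed)
    # Assumption: start from a clique on m + 1 nodes so every degree is positive.
    edges = [(u, v) for u in range(m + 1) for v in range(u)]
    degree = [m] * (m + 1)
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            # Degree-weighted draw over the existing nodes 0..new-1.
            (t,) = rng.choices(range(new), weights=degree)
            targets.add(t)
        degree.append(0)
        for t in targets:
            edges.append((new, t))
            degree[t] += 1
            degree[new] += 1
    return edges, degree
```

Sorting the returned `degree` list for a large n gives a quick, informal look at the heavy tail.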
Issues with preferential attachment and Power-laws
• Barabasi model: fixed constant m for the out-degree
• Price’s model: directed, with mean out-degree m
• Probability that a new edge attaches to a node is proportional to that node’s (in-)degree k
• Problem: at the start, nodes have degree 0
• Price’s model: proportional to degree + 1
• Analysis: probability that a node has degree k
• $p_k \sim k^{-3}$ (Barabasi model)
• $p_k \sim k^{-(2+1/m)}$: power law with exponent between 2 and 3 (Price)
• Exercise: give pseudocode that generates such a graph in linear time
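
One possible approach to the exercise, offered as a sketch and not as the intended solution: keep a list in which every node appears once for the "+1" and once more for each citation it has received. A uniform draw from that list is then a draw proportional to in-degree + 1, and each draw takes constant time. The name `price_model` is hypothetical. The sketch also simplifies by giving every new node exactly m out-edges and by allowing repeated citations of the same target.

```python
import random

def price_model(n, m, seed=None):
    """Directed growth: each new node cites m earlier nodes, each chosen with
    probability proportional to (in-degree + 1). Total work is O(n * m)."""
    rng = random.Random(seed)
    pool = [0]   # node 0 appears once: its "+1"
    edges = []
    for new in range(1, n):
        # Uniform draws from the pool are (in-degree + 1)-weighted draws.
        targets = [rng.choice(pool) for _ in range(m)]
        for t in targets:
            edges.append((new, t))  # directed edge: new cites t
        pool.extend(targets)        # one more copy per citation received
        pool.append(new)            # the "+1" entry for the new node
    return edges
```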
Variations on the PA Theme
• Clustering, Small-World and Ageing
• Copying Model
• Alpha and beta Models
• Temporal Evolution
• Densification
Graph models: Copying model
• Copying model
• [Kleinberg, Kumar, Raghavan, Rajagopalan and Tomkins, 99]:
• Add a node and choose the number of edges to add
• Choose a random vertex and “copy” its links (neighbors)
• Also generates power-law degree distributions
• Generates communities - clustering
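
A small sketch of the copying step. The slide only says to pick a random vertex and copy its links. The parameter `copy_prob` (copy each link with this probability, otherwise link to a uniformly random node), the seed graph, and the name `copying_model` are assumptions added here so that the example is runnable.

```python
import random

def copying_model(n, m, copy_prob=0.5, seed=None):
    """Return out-link lists; node v links to the nodes in result[v].
    Requires m >= 1 and n >= m + 1. Repeated links are allowed."""
    rng = random.Random(seed)
    # Assumed seed graph: m + 1 nodes, each linking to the m others.
    out = [[u for u in range(m + 1) if u != v] for v in range(m + 1)]
    for new in range(m + 1, n):
        proto = rng.randrange(new)  # random existing vertex to copy from
        links = []
        for i in range(m):
            if rng.random() < copy_prob:
                links.append(out[proto][i])       # copy the prototype's i-th link
            else:
                links.append(rng.randrange(new))  # otherwise pick uniformly
        out.append(links)
    return out
```

Nodes that are already linked often are copied often, which is the intuition behind the power-law degrees. Nodes that copy the same prototype share neighbors, which is the intuition behind the communities.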
Graph Models: The Alpha Model

Watts (1999)

α model: add edges to nodes, as in random graphs, but make links more likely when two nodes have a common friend.

For a range of α values:

• The world is small (average path length is short), and
• Groups tend to form (high clustering coefficient).

[Figure: probability of linkage as a function of the number of mutual friends (α is 0 in the upper-left curve, 1 on the diagonal, and ∞ in the bottom-right curve).]

Graph Models: The Beta Model

Watts and Strogatz (1998)

• β = 0: People know their neighbors. Clustered, but not a “small world”
• β = 0.125: People know their neighbors, and a few distant people. Clustered and “small world”
• β = 1: People know others at random. Not clustered, but “small world”
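
The slides do not spell out the construction. The usual description of the Watts and Strogatz model, assumed here, starts from a ring in which each node is joined to its k nearest neighbors and then rewires each edge with probability β. The names `beta_model` and `average_clustering` are illustrative.

```python
import random

def beta_model(n, k, beta, seed=None):
    """Ring lattice on n nodes (k even, n > k), each edge rewired with probability beta.
    Returns an adjacency dict mapping each node to a set of neighbors."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            u = (v + step) % n
            adj[v].add(u)
            adj[u].add(v)
    for step in range(1, k // 2 + 1):
        for v in range(n):
            if rng.random() >= beta:
                continue  # keep this lattice edge
            u = (v + step) % n
            # New endpoint: any node that is not v and not already a neighbor of v.
            choices = [w for w in range(n) if w != v and w not in adj[v]]
            if choices:
                w = rng.choice(choices)
                adj[v].discard(u)
                adj[u].discard(v)
                adj[v].add(w)
                adj[w].add(v)
    return adj

def average_clustering(adj):
    """Mean over all nodes of the fraction of neighbor pairs that are linked."""
    total = 0.0
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            continue  # such nodes contribute 0
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2 * links / (d * (d - 1))
    return total / len(adj)
```

Comparing `average_clustering(beta_model(n, k, beta))` for β = 0, a small β, and β = 1 gives a rough feel for the clustering side of the three regimes.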

Graph Models: The Beta Model

Watts and Strogatz (1998)

First five random links reduce the average path length of the network by half, regardless of N!

Both the α and β models reproduce the short-path results of random graphs, but also allow for clustering.

Small-world phenomena occur at threshold between order and chaos.

[Figure: clustering coefficient (C) and normalized average path length (L) plotted against β.]

Other Related Work
• Hybrid models: Beta + Waxman on grid
• Huberman and Adamic, 1999: Growth dynamics of the world wide web
• Argue against Barabasi model for its age dependence
• Kumar, Raghavan, Rajagopalan, Sivakumar and Tomkins, 1999: Stochastic models for the web graph
• Watts, Dodds, Newman, 2002: Identity and search in social networks
• Medina, Lakhina, Matta, and Byers, 2001: BRITE: An Approach to Universal Topology Generation
Statistics
• Statistics of common networks:

Large k = large c?

Small c = large d?

Modeling Ageing and Temporal Evolution
• N(t): number of nodes at time t
• E(t): number of edges at time t
• Suppose that N(t+1) = 2 * N(t)
• Q: what is your guess for E(t+1)? Is it 2 * E(t)?
• A: more than doubled (“over-doubled”)
Temporal Evolution of Graphs
• Densification Power Law
• networks appear denser over time
• the number of edges grows faster than the number of nodes – average degree is increasing

$E(t) \propto N(t)^{a}$, where $a$ is the densification exponent,

or equivalently

$\log E(t) = a \log N(t) + c$ for some constant $c$.
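
As a short worked step, assume the power law E(t) ∝ N(t)^a holds exactly. Doubling the number of nodes then multiplies the number of edges by 2^a, which ties the exponent back to the earlier doubling question. The value a = 1.69 is the exponent reported later in the lecture for the ArXiv citation graph.

```latex
\frac{E(t+1)}{E(t)} = \left(\frac{N(t+1)}{N(t)}\right)^{a} = 2^{a},
\qquad\text{so}\qquad
a = 1 \Rightarrow 2,\quad
a = 2 \Rightarrow 4,\quad
a = 1.69 \Rightarrow 2^{1.69} \approx 3.23 .
```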

Graph Densification
• Densification Power Law
• Densification exponent: 1 ≤ a ≤ 2:
• a=1: linear growth – constant out-degree (assumed in the literature so far)
• a=2: quadratic growth – clique
• Let’s see the real graphs!
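
One simple way to estimate the exponent from a series of snapshots is a least-squares line through the points (log N, log E). The sketch below assumes that method; the lecture does not say how its exponents were fitted, and the name `densification_exponent` is hypothetical.

```python
import math

def densification_exponent(nodes, edges):
    """Least-squares slope of log E against log N over a series of snapshots.
    Needs at least two snapshots with different node counts."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx
```

With only a first and a last snapshot the function reduces to a two-point slope, which need not match an exponent fitted over many intermediate snapshots.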
Densification – ArXiv citation graph in Physics
• Citations among physics papers
• 1992: 1,293 papers, 2,717 citations
• 2003: 29,555 papers, 352,807 citations
• For each month M, create a graph of all citations up to month M

[Figure: E(t) plotted against N(t); fitted densification exponent 1.69.]

Densification – Patent Citations
• Citations among patents granted
• 1975
• 334,000 nodes
• 676,000 edges
• 1999
• 2.9 million nodes
• 16.5 million edges
• Each year is a datapoint

[Figure: E(t) plotted against N(t); fitted densification exponent 1.66.]

Densification – Internet Autonomous Systems
• Graph of Internet
• 1997
• 3,000 nodes
• 10,000 edges
• 2000
• 6,000 nodes
• 26,000 edges
• One graph per day

[Figure: E(t) plotted against N(t); fitted densification exponent 1.18.]

Evolution of the Diameter
• Prior work on power-law graphs hints at a slowly growing diameter:
• diameter ~ O(log N)
• diameter ~ O(log log N)
• What is happening in real data?
• Diameter shrinks over time
• As the network grows the distances between nodes slowly decrease
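
The slides do not say how the diameter was measured. As a baseline, the exact diameter of a small connected graph can be computed with one breadth-first search per node, as in the sketch below. The name `diameter` and the adjacency-dict input are assumptions, and the quadratic cost would call for sampling or approximation on graphs of the sizes discussed here.

```python
from collections import deque

def diameter(adj):
    """Largest shortest-path distance found by running BFS from every node.
    adj maps each node to an iterable of neighbors. Pairs with no path between
    them are ignored."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        best = max(best, max(dist.values()))
    return best
```

Recomputing this on successive snapshots of a growing graph is the kind of measurement the next slides plot.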
Diameter – ArXiv citation graph

• Citations among physics papers
• 1992–2003
• One graph per year

[Figure: diameter plotted against time in years.]

Diameter – “Patents”

• Patent citation network
• 25 years of data

[Figure: diameter plotted against time in years.]

Diameter – Autonomous Systems

• Graph of Internet
• One graph per day
• 1997–2000

[Figure: diameter plotted against number of nodes.]

Next Time: Densification – Possible Explanations
• Generative models to capture the Densification Power Law and Shrinking diameters
• 2 proposed models:
• Community Guided Attachment – obeys Densification
• Forest Fire model – obeys Densification, Shrinking diameter (and Power Law degree distribution)