CS728 Lecture 5 Generative Graph Models and the Web

Importance of Generative Models

Gives insight into the graph formation process:

  • Anomaly detection – abnormal behavior, evolution
  • Predictions – predicting future from the past
  • Simulations and evaluation of new algorithms
  • Graph sampling – many real-world graphs, like the web, are too large and complex to handle directly
  • Goal: generate graphs with the small-world property, clustering, power laws, and other naturally occurring structures
Graph Models: Waxman Models
  • Used to model clustering in Internet-like topologies, i.e. networks with both long and short edges
  • The vertices are distributed at random in a plane.
  • An edge is added between each pair of vertices u, v with probability

$$p(u,v) = \alpha \, e^{-d/(\beta L)}, \qquad 0 < \alpha, \beta \le 1$$

  • L is the maximum distance between any two nodes.
  • Increasing α increases the number of edges in the graph.
  • Increasing β increases the number of long edges relative to short edges.
  • d is the Euclidean distance from u to v in Waxman-1.
  • d is a random number in [0, L] in Waxman-2.
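Below is a minimal sketch of the Waxman construction. It assumes nodes are placed uniformly in the unit square. The function name `waxman_graph` and the `random_d` switch between Waxman-1 and Waxman-2 are illustrative, not from the slides.

```python
import math
import random

def waxman_graph(n, alpha, beta, random_d=False, seed=None):
    """Waxman sketch: nodes uniform in the unit square (assumption).
    Each pair (u, v) gets an edge with probability alpha * exp(-d / (beta * L))."""
    if not (0 < alpha <= 1 and 0 < beta <= 1):
        raise ValueError("need 0 < alpha, beta <= 1")
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    pairs = [(u, v) for u in range(n) for v in range(u + 1, n)]
    # L: the maximum distance between any two nodes
    L = max((math.dist(pos[u], pos[v]) for u, v in pairs), default=1.0)
    edges = set()
    for u, v in pairs:
        if random_d:
            d = rng.uniform(0, L)          # Waxman-2: random "distance" in [0, L]
        else:
            d = math.dist(pos[u], pos[v])  # Waxman-1: Euclidean distance
        if rng.random() < alpha * math.exp(-d / (beta * L)):
            edges.add((u, v))
    return pos, edges
```

With the same seed, raising `alpha` only adds edges. Raising `beta` flattens the decay in d, so long edges become relatively more common.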
Graph Models: Configuration Model
  • Random graph from a given degree sequence
  • Problem: Given a degree sequence d1, d2, …, dn, generate a random graph with that degree sequence
  • Solution:
    • Place d_i stubs (half-edges) on vertex i
    • Choose pairs of stubs at random and join each pair into an edge
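The stub-matching step can be sketched as follows. The sketch shuffles all stubs and pairs them off consecutively, which yields a uniformly random perfect matching of the stubs. The function name is hypothetical. The degree sum must be even for every stub to find a partner.

```python
import random

def configuration_model(degrees, seed=None):
    """Stub matching sketch: may produce self-loops and multi-edges."""
    if sum(degrees) % 2:
        raise ValueError("sum of degrees must be even")
    rng = random.Random(seed)
    # Place d_i stubs on vertex i
    stubs = [i for i, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    # Consecutive stubs of a random permutation form a random pairing
    return [(stubs[k], stubs[k + 1]) for k in range(0, len(stubs), 2)]
```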

Problem: we may construct graphs with self-loops and multi-edges
  • To prevent this, there must be enough “absorbing” residual degree capacity.
  • Algorithm:
  • Maintain a list of nodes sorted by residual degree d(v)
  • Repeat until all nodes have been chosen:
    • pick an arbitrary vertex v
    • add edges from v to the d(v) vertices of highest residual degree
    • update residual degrees
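A sketch of the loop above follows; it is essentially a Havel–Hakimi-style construction. For readability it re-sorts the unprocessed vertices at every step instead of maintaining a sorted list, so it is not the fastest possible version. The function name is hypothetical. It returns `None` when some vertex cannot find enough partners with residual capacity.

```python
def havel_hakimi(degrees):
    """Realize a degree sequence as a simple graph, or return None."""
    residual = list(degrees)
    remaining = set(range(len(degrees)))
    edges = []
    while remaining:
        v = min(remaining)  # arbitrary choice; lowest index keeps it deterministic
        remaining.remove(v)
        d = residual[v]
        # the d(v) unprocessed vertices of highest residual degree
        targets = sorted(remaining, key=lambda u: -residual[u])[:d]
        if len(targets) < d or any(residual[u] == 0 for u in targets):
            return None  # not enough residual capacity: not graphical
        for u in targets:
            edges.append((v, u))
            residual[u] -= 1
        residual[v] = 0
    return edges
```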

To randomize further, we can start with one realization and repeatedly 2-swap pairs of edges (u,v), (s,t) to (u,t), (s,v); each swap preserves every vertex degree.
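Here is a sketch of the 2-swap randomization on a simple undirected graph. It assumes swaps that would create a self-loop or a duplicate edge are rejected, so the graph stays simple. The function name and the `max_tries` cap are illustrative.

```python
import random

def double_edge_swaps(edges, n_swaps, seed=None, max_tries=None):
    """Degree-preserving 2-swaps (u, v), (s, t) -> (u, t), (s, v)."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edge_list}
    max_tries = 100 * n_swaps if max_tries is None else max_tries
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (u, v), (s, t) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:  # undirected edges: try both orientations
            s, t = t, s
        if len({u, v, s, t}) < 4:
            continue  # shared endpoint: swap could create a self-loop
        if frozenset((u, t)) in present or frozenset((s, v)) in present:
            continue  # swap would create a multi-edge
        present -= {frozenset((u, v)), frozenset((s, t))}
        present |= {frozenset((u, t)), frozenset((s, v))}
        edge_list[i], edge_list[j] = (u, t), (s, v)
        done += 1
    return edge_list
```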

Works OK, but is there a more ‘natural’ generative model?

Generative Graph models: Preferential attachment
  • Price’s Model [65]: physics citations – “cumulative advantage”
  • Herb Simon [50’s]: Nobel and Turing Awards, political scientist – “rich get richer” (Pareto)
  • Matthew effect / Matilda effect: sociology
  • Barabási and Albert [99]: preferential attachment:
    • Add a new node and create m out-links
    • The probability of linking to an existing node is proportional to its current degree
  • Simple explanation of power-law degree distributions
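A minimal preferential-attachment sketch follows. It keeps a list holding one entry per edge endpoint, so a uniform pick from that list selects a node with probability proportional to its degree. The seed graph (a star on nodes 0..m) and the function name are assumptions made for illustration. Each new node rejects repeated picks until it has m distinct targets.

```python
import random

def barabasi_albert(n, m, seed=None):
    """Each new node links to m distinct existing nodes, chosen with
    probability proportional to current degree."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges, ends = [], []  # ends: every edge endpoint, once per edge
    for v in range(1, m + 1):  # seed star (assumption), all degrees >= 1
        edges.append((0, v))
        ends += [0, v]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))  # degree-proportional pick
        for t in targets:
            edges.append((new, t))
            ends += [new, t]
    return edges
```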
Issues with preferential attachment and Power-laws
  • Barabási model: fixed constant out-degree m
  • Price’s model: directed, with mean out-degree m
  • The probability that a new edge attaches to a node is proportional to its (in-)degree k
    • Problem at the start: a new node has degree 0, so it could never be chosen
    • Price’s model: probability proportional to k + 1
    • Analysis: probability p_k that a node has degree k
      • $p_k \sim k^{-3}$ (Barabási model)
      • $p_k \sim k^{-(2+1/m)}$: a power law with exponent between 2 and 3 (Price)
  • Exercise: give pseudocode that generates such a graph in linear time
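One possible answer to the exercise is sketched below for Price's rule, probability ∝ k_in + 1. The total weight Σ(k_in + 1) equals (number of citations so far) + (number of earlier nodes). A target can therefore be drawn in O(1) in two ways. With probability proportional to the first part, it is drawn from a list holding one entry per received citation. Otherwise it is drawn uniformly, which covers the "+1" term.

This sketch makes several assumptions. Each node cites a fixed m distinct earlier nodes instead of a mean of m. The first nodes cite fewer. Rejecting duplicate picks slightly perturbs the exact probabilities. For reference, the slide's exponent 2 + 1/m gives 3, 2.5 and about 2.33 for m = 1, 2, 3.

```python
import random

def price_model(n, m, seed=None):
    """Directed sketch: node j is cited with probability ~ k_in(j) + 1."""
    rng = random.Random(seed)
    edges = []
    cited = []  # node j appears once per citation it has received
    for new in range(n):
        targets = set()
        want = min(m, new)  # early nodes have fewer predecessors to cite
        while len(targets) < want:
            # total weight = len(cited) (in-degrees) + new (the "+1" terms)
            if rng.random() < len(cited) / (len(cited) + new):
                targets.add(rng.choice(cited))   # proportional to k_in
            else:
                targets.add(rng.randrange(new))  # uniform over earlier nodes
        for t in targets:
            edges.append((new, t))
        cited.extend(targets)  # update in-degrees after this node is done
    return edges
```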
Variations on the PA Theme
  • Clustering, Small-World and Ageing
  • Copying Model
  • Alpha and beta Models
  • Temporal Evolution
  • Densification
Graph models: Copying model
  • Copying model [Kleinberg, Kumar, Raghavan, Rajagopalan and Tomkins, 99]:
    • Add a node and choose the number of edges to add
    • Choose a random vertex and “copy” its links (neighbors)
  • Also generates power-law degree distributions
  • Generates communities - clustering
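The copying step can be sketched as follows for a directed graph in which every node has d out-links. This sketch assumes a common variant of the model: each of the prototype's links is copied with probability `copy_prob`, and otherwise replaced by a link to a uniformly random existing node. Both `copy_prob` and the seed graph of d + 1 mutually linked nodes are illustrative assumptions, not details from the slide.

```python
import random

def copying_model(n, d, copy_prob=0.8, seed=None):
    """Copying-model sketch; returns out-link lists (duplicates allowed)."""
    rng = random.Random(seed)
    # Seed (assumption): d + 1 nodes, each linking to the other d
    out = [[u for u in range(d + 1) if u != v] for v in range(d + 1)]
    for new in range(d + 1, n):
        proto = rng.randrange(new)  # random existing "prototype" vertex
        links = []
        for i in range(d):
            if rng.random() < copy_prob:
                links.append(out[proto][i])       # copy the prototype's link
            else:
                links.append(rng.randrange(new))  # random existing node
        out.append(links)
    return out
```

Copying links from a shared prototype is what tends to produce the dense, community-like neighbourhoods mentioned above.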
Graph Models: The Alpha Model

Watts (1999)

α model: add edges between nodes, as in random graphs, but make links more likely when two nodes have a common friend.

For a range of α values:

  • The world is small (average path length is short), and
  • Groups tend to form (high clustering coefficient).

[Figure: probability of linkage as a function of the number of mutual friends; α = 0 is the upper-left curve, α = 1 the diagonal, and α = ∞ the bottom-right curve.]
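The curves in the figure can be reproduced with a linkage-propensity function of the following form. This form is an assumption, based on the usual statement of Watts's α model. Here m is the number of mutual friends and k is a saturation point: at k or more mutual friends, linking is certain. p is a small baseline probability of linking two strangers. α = 0 jumps to 1 as soon as there is one mutual friend, α = 1 is linear, and α → ∞ stays at p until saturation.

```python
def alpha_propensity(m, k, alpha, p):
    """Hypothetical alpha-model linkage propensity (form assumed, see text)."""
    if m >= k:
        return 1.0  # saturated: enough mutual friends to link for sure
    if m == 0:
        return p    # strangers: baseline random linking
    # (m/k)**alpha: alpha = 0 gives 1, alpha = inf gives 0.0 in Python floats
    return (m / k) ** alpha * (1 - p) + p
```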

Graph Models: The Beta Model

Watts and Strogatz (1998)

“Link Rewiring”

  • β = 0: people know their neighbors. Clustered, but not a “small world”.
  • β = 0.125: people know their neighbors and a few distant people. Clustered and a “small world”.
  • β = 1: people know others at random. Not clustered, but a “small world”.
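A sketch of link rewiring follows. The construction is assumed to match the usual Watts–Strogatz description. Start from a ring of n nodes, each joined to its k nearest neighbours (k even). Then visit each lattice edge (u, u+j) and, with probability β, move its far endpoint to a uniformly random node, skipping choices that would create a self-loop or duplicate edge. The function name is illustrative. The linear scan for candidates is chosen for clarity, not speed.

```python
import random

def watts_strogatz(n, k, beta, seed=None):
    """Ring lattice plus random rewiring; returns an adjacency dict of sets."""
    if k % 2 or not 0 < k < n:
        raise ValueError("need even k with 0 < k < n")
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    for u in range(n):  # beta = 0: each node knows its k nearest neighbours
        for j in range(1, k // 2 + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    for j in range(1, k // 2 + 1):  # rewire lap by lap around the ring
        for u in range(n):
            v = (u + j) % n
            if rng.random() < beta and v in adj[u]:
                candidates = [w for w in range(n) if w != u and w not in adj[u]]
                if candidates:
                    w = rng.choice(candidates)
                    adj[u].discard(v)
                    adj[v].discard(u)
                    adj[u].add(w)
                    adj[w].add(u)
    return adj
```

Rewiring moves edges without creating or deleting any, so the graph keeps n·k/2 edges for every β.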

Graph Models: The Beta Model

Watts and Strogatz (1998)

First five random links reduce the average path length of the network by half, regardless of N!

Both α and β models reproduce the short-path results of random graphs, but also allow for clustering.

Small-world phenomena occur at the threshold between order and chaos.

[Figure: clustering coefficient (C) and average path length (L), both normalized, plotted against β.]
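The two quantities in the plot can be computed as sketched below. The clustering coefficient is taken as the average, over nodes, of the fraction of neighbour pairs that are themselves linked. Nodes of degree below 2 count as 0, which is an assumed convention. The average path length comes from a BFS from every node. The sketch is checked on the β = 0 ring lattice, where C = 3(k − 2)/(4(k − 1)), i.e. 0.5 for k = 4. Helper names are hypothetical.

```python
from collections import deque

def ring_lattice(n, k):
    """beta = 0 case: each node joined to its k nearest neighbours (k even)."""
    return {u: {(u + j) % n for j in range(-k // 2, k // 2 + 1) if j != 0}
            for u in range(n)}

def clustering_coefficient(adj):
    """Average local clustering: linked neighbour pairs / all neighbour pairs."""
    total = 0.0
    for u, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for v in nbrs for w in nbrs if v < w and w in adj[v])
        total += links / (d * (d - 1) / 2)
    return total / len(adj)

def average_path_length(adj):
    """Mean shortest-path length over reachable ordered pairs (BFS per node)."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs
```

Dividing C and L for a rewired graph by their β = 0 values gives the normalized quantities plotted against β.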

Other Related Work
  • Hybrid models: Beta + Waxman on grid
  • Huberman and Adamic, 1999: Growth dynamics of the world wide web
    • Argues against the Barabási model because of its age dependence
  • Kumar, Raghavan, Rajagopalan, Sivakumar and Tomkins, 1999: Stochastic models for the web graph
  • Watts, Dodds, Newman, 2002: Identity and search in social networks
  • Medina, Lakhina, Matta, and Byers, 2001: BRITE: An Approach to Universal Topology Generation
Statistics
  • Statistics of common networks:

Large k = large c?

Small c = large d?

Modeling Ageing and Temporal Evolution
  • N(t) … nodes at time t
  • E(t) … edges at time t
  • Suppose that

N(t+1) = 2 * N(t)

  • Q: what is your guess for E(t+1)? Is it 2 * E(t)?
  • A: more than doubled!
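Suppose edges grow as a power of nodes, $E(t) \propto N(t)^{a}$ with an exponent $a \ge 1$. Then the answer is a one-line calculation. Doubling the nodes multiplies the edges by $2^a$. The value a = 1.69 is used only as an illustrative exponent.

$$\frac{E(t+1)}{E(t)} = \left(\frac{N(t+1)}{N(t)}\right)^{a} = 2^{a}, \qquad a = 1 \;\Rightarrow\; 2, \qquad a = 1.69 \;\Rightarrow\; 2^{1.69} \approx 3.23$$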
Temporal Evolution of Graphs
  • Densification Power Law
    • networks become denser over time
    • the number of edges grows faster than the number of nodes – average degree is increasing

$$E(t) \propto N(t)^{a}$$

a … densification exponent

or equivalently

$$\log E(t) = a \log N(t) + \text{const}$$
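The log form suggests how a could be estimated from snapshots. Fit a straight line to (log N(t), log E(t)) by least squares and read off its slope. The sketch below assumes one (N, E) pair per snapshot. The function name is hypothetical.

```python
import math

def densification_exponent(nodes, edges):
    """Least-squares slope of log E(t) against log N(t)."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx
```

As a rough check, feeding in only the two arXiv endpoints (1,293 papers and 2,717 citations; 29,555 papers and 352,807 citations) gives about 1.56. A fit over all monthly snapshots can differ from this two-point estimate.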

Graph Densification
  • Densification Power Law
  • Densification exponent: 1 ≤ a ≤ 2:
    • a=1: linear growth – constant out-degree (assumed in the literature so far)
    • a=2: quadratic growth – clique
  • Let’s see the real graphs!
Densification – ArXiv citation graph in Physics
  • Citations among physics papers
  • 1992:
    • 1,293 papers, 2,717 citations
  • 2003:
    • 29,555 papers, 352,807 citations
  • For each month M, create a graph of all citations up to month M

[Figure: E(t) plotted against N(t); densification exponent a = 1.69]

Densification – Patent Citations
  • Citations among patents granted
  • 1975
    • 334,000 nodes
    • 676,000 edges
  • 1999
    • 2.9 million nodes
    • 16.5 million edges
  • Each year is a data point

[Figure: E(t) plotted against N(t); densification exponent a = 1.66]

Densification – Internet Autonomous Systems
  • Graph of the Internet
  • 1997
    • 3,000 nodes
    • 10,000 edges
  • 2000
    • 6,000 nodes
    • 26,000 edges
  • One graph per day

[Figure: E(t) plotted against N(t); densification exponent a = 1.18]

Evolution of the Diameter
  • Prior work on power-law graphs hints at a slowly growing diameter:
    • diameter ~ O(log N)
    • diameter ~ O(log log N)
  • What is happening in real data?
  • Diameter shrinks over time
    • As the network grows the distances between nodes slowly decrease
Diameter – ArXiv citation graph

  • Citations among physics papers
  • 1992–2003
  • One graph per year

[Figure: diameter vs. time (years)]

Diameter – “Patents”

  • Patent citation network
  • 25 years of data

[Figure: diameter vs. time (years)]

Diameter – Autonomous Systems

  • Graph of the Internet
  • One graph per day
  • 1997–2000

[Figure: diameter vs. number of nodes]

Next Time: Densification – Possible Explanations
  • Generative models to capture the Densification Power Law and Shrinking diameters
  • Two proposed models:
    • Community Guided Attachment – obeys Densification
    • Forest Fire model – obeys Densification, Shrinking diameter (and Power Law degree distribution)