José Rolim Generative Models for the Web Graph
Aim • Reproduce emergent properties: • Distribution of site sizes • Connectivity of the Web • Power law distributions • Small World Properties
Classical Model: Random Graphs • Erdős–Rényi graph G(n,p) • n: number of nodes • p: probability of connection • pc: threshold probability • p < pc – many disconnected components • p = pc – a large connected component emerges • p = 1 – a complete graph
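The three regimes above can be explored with a minimal sketch of G(n,p) sampling. This uses only the Python standard library; the function names `gnp` and `largest_component` are illustrative and not taken from the slides.

```python
import random
from itertools import combinations

def gnp(n, p, seed=None):
    """Sample an Erdős–Rényi graph G(n, p) as an adjacency dict."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:  # each pair is linked independently with prob. p
            adj[u].add(v)
            adj[v].add(u)
    return adj

def largest_component(adj):
    """Size of the largest connected component (iterative depth-first search)."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best
```

Calling `largest_component(gnp(n, p))` for increasing p should show the largest component growing from a few nodes to nearly all n nodes.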
Limitations • To model the web graph: • Constant number of nodes • Same connection probability for all vertices • etc.
Web Page Growth Model • Sites have short-term (daily) size fluctuations proportional to their size • Assume an overall growth rate a such that: • S(t+1) = a(1+vb)S(t) • S(t) = # pages of site s at time t • v = ±1 – Bernoulli variable, each value with prob. 0.5 • b = absolute rate of daily fluctuations
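Web Page Growth • Hence: $S(T) = a^T S(0) \prod_{i=0}^{T-1} (1+v_i b)$, or: • $\log S(T) = T\log a + \log S(0) + \sum_{i=0}^{T-1} \log(1+v_i b)$, where $\sum_{i=0}^{T-1} \log(1+v_i b) = l\log(1+b) + (T-l)\log(1-b)$ • l = # positive fluctuations • Therefore: S(T) has a lognormal distribution (log S(T) is a sum of T independent terms) or, mixing over sites with different rates b_i, follows a power law: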
Web page growth • Probability P(s) that a site has s pages: • $P(s) = \sum_i P(s \mid b_i)\,P(b_i) = \sum_i c_i / s^{g_i} \approx c / s^{g}$ • Power law • g has been experimentally evaluated for the web as between 1.6 and 2.0
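The growth rule S(t+1) = a(1+vb)S(t) can be checked numerically with a short sketch. It simulates one site and compares log S(T) with the closed form built from the number l of positive fluctuations. The function names and parameter values are illustrative assumptions, and the code requires 0 < b < 1.

```python
import math
import random

def simulate_site(s0, a, b, T, seed=None):
    """Iterate S(t+1) = a(1 + v b) S(t) for T steps; return (S(T), l)."""
    rng = random.Random(seed)
    s, l = s0, 0
    for _ in range(T):
        v = rng.choice((1, -1))  # v = +1 or -1, each with prob. 0.5
        if v == 1:
            l += 1               # count the positive fluctuations
        s *= a * (1 + v * b)
    return s, l

def log_size_closed_form(s0, a, b, T, l):
    """log S(T) = T log a + log S(0) + l log(1+b) + (T-l) log(1-b)."""
    return (T * math.log(a) + math.log(s0)
            + l * math.log(1 + b) + (T - l) * math.log(1 - b))
```

Running `simulate_site` for many seeds and plotting a histogram of log S(T) would be one way to see the approximately normal shape that makes S(T) lognormal.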
Small world models • Properties: • Sparse • Cliquishness • Small Diameter • Two models • Edge-reassigning small world network • Edge addition small world network
Edge reassigning model • Evolution starts with a ring of n nodes, each node connected to its d nearest neighbors • Then each edge is randomly reassigned to a distant node with probability p, in round-robin fashion • See example page 10 with n=10 and d=4
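A minimal sketch of the edge-reassigning procedure, assuming d is even and d < n. The round-robin order used here (one pass around the ring per neighbor distance) and the rule that a reassigned edge keeps one endpoint are assumptions; the slides do not spell out these details. Function names are illustrative.

```python
import random

def ring_lattice(n, d):
    """Ring of n nodes, each linked to its d nearest neighbors (d even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for k in range(1, d // 2 + 1):
            w = (v + k) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj

def reassign_edges(n, d, p, seed=None):
    """Visit ring edges in round-robin order; reassign each with prob. p."""
    rng = random.Random(seed)
    adj = ring_lattice(n, d)
    for k in range(1, d // 2 + 1):   # nearest neighbors first, then the next ring
        for v in range(n):           # one pass around the ring per distance k
            w = (v + k) % n
            if w not in adj[v] or rng.random() >= p:
                continue
            # candidate endpoints: not v itself and not already linked to v
            choices = [u for u in range(n) if u != v and u not in adj[v]]
            if not choices:
                continue
            new = rng.choice(choices)
            adj[v].discard(w)
            adj[w].discard(v)
            adj[v].add(new)
            adj[new].add(v)
    return adj
```

With n=10 and d=4 the ring has 20 edges, and reassignment keeps that count fixed because every removed edge is replaced by exactly one new edge.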
Edge addition model • Additional edges are added at random to the original ring, giving an expected number of p·d·n/2 new edges • p: probability of addition of an edge • See example page 13 • Criticism of small-world models: • No addition or deletion of pages • No deletion of links
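The expected count p·d·n/2 follows if one independent trial with success probability p is run per ring edge, since the ring has d·n/2 edges. The sketch below takes that reading, which is an assumption, and it allows a shortcut to duplicate an existing edge for simplicity. The function name `edge_addition` is illustrative.

```python
import random

def edge_addition(n, d, p, seed=None):
    """Ring lattice plus random shortcuts: one trial with prob. p per ring edge."""
    rng = random.Random(seed)
    ring = set()
    for v in range(n):
        for k in range(1, d // 2 + 1):
            ring.add(frozenset((v, (v + k) % n)))
    shortcuts = []
    for _ in range(len(ring)):              # d*n/2 trials when d is even and d < n
        if rng.random() < p:
            u, w = rng.sample(range(n), 2)  # two distinct endpoints, chosen uniformly
            shortcuts.append((u, w))
    return ring, shortcuts
```

For n=10, d=4 and p=0.1 the ring has 20 edges, so about 2 shortcuts are expected on average.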
Rich get richer • Preferential attachment model • Start with a null graph with no nodes • At each time step add a new node and connect it to m nodes selected randomly with probability proportional to their degree • See ex. page 16
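A minimal sketch of preferential attachment. Starting from an empty graph leaves the first steps undefined, because no node has a degree yet; the bootstrap rule used here (link to every existing node until more than m nodes exist) is an assumption, as is the function name.

```python
import random

def preferential_attachment(steps, m, seed=None):
    """Grow a graph node by node; targets are picked proportionally to degree."""
    rng = random.Random(seed)
    adj = {}
    endpoints = []  # each node appears once per incident edge
    for new in range(steps):
        existing = list(adj)
        if len(existing) <= m:
            targets = set(existing)  # assumption: bootstrap by linking to all
        else:
            targets = set()
            while len(targets) < m:
                # a uniform draw from `endpoints` is a degree-proportional draw
                targets.add(rng.choice(endpoints))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints.extend((new, t))
    return adj
```

Keeping a list with one entry per edge endpoint avoids recomputing degrees at every step, at the cost of extra memory.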
Important measures • Average diameter • Cliquishness (measures the average density of local connections): • Take a node v with degree d • Its d neighbors can have at most max = d(d-1)/2 links among them • Let $c_v$ = actual number of links / max • $C = \frac{1}{|V|} \sum_v c_v$
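The cliquishness measure can be computed directly from an adjacency dict, as in the sketch below. Setting c_v = 0 for nodes with fewer than two neighbors is an assumption, since the ratio is undefined there; node labels must also be comparable so that each link is counted once.

```python
def cliquishness(adj):
    """Average over nodes of (links among neighbors) / (d(d-1)/2)."""
    total = 0.0
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            continue  # assumption: c_v = 0 when v has fewer than two neighbors
        # count each link between two neighbors of v once (u < w)
        links = sum(1 for u in nbrs for w in adj[u] if w in nbrs and u < w)
        total += links / (d * (d - 1) / 2)
    return total / len(adj)
```

A triangle gives C = 1, since every pair of neighbors is linked, while a star gives C = 0.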
Remarks on rich get richer • Reproduces the power law of the number of links • E.g.: the probability that a page i has degree $d_i$ is $A / d_i^{c}$ • A is proportional to the square of the average degree of the network • c is a constant • c was found empirically to be 2.9 and theoretically 3
Criticism of Rich Get Richer • Does not allow reconnection of existing edges • Addition of new edges takes place only when new nodes are added
Copy models • At each time step a node is added • With prob. p a new edge is created between this node and a randomly chosen node • With prob. 1-p: we choose a node at random, then one of its out-edges uniformly, and link the new node to the node that this edge points to
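The copy step can be sketched as follows for a directed graph. Two details are assumptions, not stated in the slides: the first node gets no edge, and when the chosen node has no out-edges the new node links to that node directly. The function name is illustrative.

```python
import random

def copy_model(steps, p, seed=None):
    """Grow a directed graph; each new node adds one out-edge."""
    rng = random.Random(seed)
    out = {}  # out[v] = list of nodes that v points to
    for new in range(steps):
        existing = list(out)
        out[new] = []
        if not existing:
            continue  # assumption: the very first node has nothing to link to
        u = rng.choice(existing)
        if rng.random() < p or not out[u]:
            target = u  # uniform choice (also the fallback, an assumption)
        else:
            target = rng.choice(out[u])  # copy the destination of one out-edge of u
        out[new].append(target)
    return out
```

Nodes that already receive many links are more likely to be the destination of a copied edge, which is how the model favors well-linked pages.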
Remarks • Why is it called "copy"? • There are more elaborate models that allow more than one edge to be added at each step • It is also a sort of "rich get richer"
Applications • Distributed search algorithms • Subgraph patterns and communities • Robustness and vulnerability • PageRank algorithms