Algorithms for Large Data Sets

Algorithms for Large Data Sets. Ziv Bar-Yossef. Lecture 7 May 14, 2006. http://www.ee.technion.ac.il/courses/049011. Web Structure I : Power Laws and Small World Phenomenon. Outline. Power laws The preferential attachment model Small-world networks The Watts-Strogatz model.

Presentation Transcript


  1. Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 7 May 14, 2006 http://www.ee.technion.ac.il/courses/049011

  2. Web Structure I: Power Laws and Small World Phenomenon

  3. Outline • Power laws • The preferential attachment model • Small-world networks • The Watts-Strogatz model

  4. Observed Phenomena • Few multi-billionaires, but many with modest income [Pareto, 1896] • Few frequent words, but many infrequent words [Zipf, 1932] • Few “mega-cities” but many small towns [Zipf, 1949] • Few web pages with high degree, but many with low degree [Kumar et al, 99] [Barabási & Albert, 99] All the above obey power laws.

  5. Power Law (Pareto) Distribution • \(\Pr[X \ge x] = (k/x)^{\alpha}\) for \(x \ge k\) • α > 0: shape parameter (“slope”) • k > 0: location parameter • Ex: (k = $1000, α = 2) • 1/100 earn ≥ $10,000 • 1/10,000 earn ≥ $100,000 • 1/1,000,000 earn ≥ $1,000,000
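
The income example can be checked numerically. The sketch below assumes the standard Pareto tail Pr[X ≥ x] = (k/x)^α for x ≥ k, which is the form consistent with the three figures on the slide; the function name `pareto_tail` is illustrative, not from the lecture.

```python
def pareto_tail(x: float, k: float, alpha: float) -> float:
    """Pr[X >= x] for a Pareto distribution with location k and shape alpha.

    Assumes the standard tail (k / x) ** alpha for x >= k.
    """
    if k <= 0 or alpha <= 0:
        raise ValueError("k and alpha must be positive")
    if x <= k:
        return 1.0  # all of the probability mass lies at or above k
    return (k / x) ** alpha


# Reproduce the slide's example: k = $1000, alpha = 2.
for income in (10_000, 100_000, 1_000_000):
    print(income, pareto_tail(income, k=1_000, alpha=2))
```

Each tenfold increase in income divides the tail probability by 10^α = 100, which is where the 1/100, 1/10,000, 1/1,000,000 sequence comes from.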

  6. Power Law Properties • PDF: \(f(x) = \alpha k^{\alpha} / x^{\alpha+1}\) for \(x \ge k\) • Infinite mean for α ≤ 1 • Infinite variance for α ≤ 2 • When X is discrete, \(\Pr[X = x] \propto x^{-(\alpha+1)}\)
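
The two divergence claims follow from direct integration. The derivation below is a sketch that assumes the standard Pareto tail Pr[X ≥ x] = (k/x)^α for x ≥ k; both integrals diverge at the upper limit exactly when the stated condition on α fails.

```latex
f(x) = -\frac{d}{dx}\left(\frac{k}{x}\right)^{\alpha}
     = \frac{\alpha k^{\alpha}}{x^{\alpha+1}}, \qquad x \ge k

E[X] = \int_k^{\infty} x f(x)\,dx
     = \alpha k^{\alpha} \int_k^{\infty} x^{-\alpha}\,dx
     = \frac{\alpha k}{\alpha - 1} \quad (\alpha > 1), \qquad
     \text{divergent for } \alpha \le 1

E[X^2] = \alpha k^{\alpha} \int_k^{\infty} x^{1-\alpha}\,dx
       = \frac{\alpha k^{2}}{\alpha - 2} \quad (\alpha > 2), \qquad
       \text{divergent for } \alpha \le 2
```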

  7. Power Law Graphs (figures: the PDF on a linear scale plot and on a log-log plot, where it is a straight line with slope = -α - 1)
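
The straight line on the log-log plot can be verified directly: taking logarithms of a density proportional to x^(-(α+1)) gives a line of slope -(α+1). The sketch below assumes the standard Pareto density α·k^α / x^(α+1); `pareto_pdf` and `loglog_slope` are illustrative names.

```python
import math


def pareto_pdf(x: float, k: float, alpha: float) -> float:
    """Density alpha * k^alpha / x^(alpha + 1) for x >= k, else 0."""
    if x < k:
        return 0.0
    return alpha * k**alpha / x ** (alpha + 1)


def loglog_slope(x1: float, x2: float, k: float, alpha: float) -> float:
    """Slope of log f(x) against log x between two points x1, x2 >= k."""
    rise = math.log(pareto_pdf(x2, k, alpha)) - math.log(pareto_pdf(x1, k, alpha))
    run = math.log(x2) - math.log(x1)
    return rise / run


# For alpha = 2 the slope should be close to -(alpha + 1) = -3,
# regardless of which two points are chosen.
print(loglog_slope(10.0, 1000.0, k=1.0, alpha=2.0))
```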

  8. Scale-Free Distributions • Power laws are invariant to scale • Ex: (k = arbitrary, α = 2) • 1/100 earn ≥ 10k • 1/10,000 earn ≥ 100k • 1/1,000,000 earn ≥ 1000k
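
Scale invariance can be stated in one line. Assuming the standard Pareto tail Pr[X ≥ x] = (k/x)^α, multiplying the threshold by a factor c changes the tail probability by a factor that depends only on c and α, not on x or k:

```latex
\frac{\Pr[X \ge c\,x]}{\Pr[X \ge x]}
  = \frac{\bigl(k/(c\,x)\bigr)^{\alpha}}{(k/x)^{\alpha}}
  = c^{-\alpha}, \qquad x \ge k,\ c \ge 1
```

With α = 2 and c = 10 the ratio is 1/100, which is the step between consecutive lines of the example for any choice of k.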

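  9. Heavy Tailed Distributions • In many “classical” distributions (ex: normal, exponential), \(\Pr[X \ge x]\) decays at least exponentially fast in x: “light tail” • In power law distributions, \(\Pr[X \ge x]\) decays only polynomially in x: “heavy tail”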
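
A small numeric comparison makes the difference between the two kinds of tail concrete. This is a sketch under assumed parameters (an exponential distribution with rate 1 and a Pareto distribution with k = 1, α = 2); none of these values come from the lecture.

```python
import math


def exponential_tail(x: float, rate: float = 1.0) -> float:
    """Pr[X >= x] for an exponential distribution: exp(-rate * x)."""
    return math.exp(-rate * x)


def pareto_tail(x: float, k: float = 1.0, alpha: float = 2.0) -> float:
    """Pr[X >= x] for a Pareto distribution: (k / x) ** alpha for x >= k."""
    return 1.0 if x <= k else (k / x) ** alpha


# The exponential tail collapses far faster than the power-law tail.
for x in (1, 10, 100):
    print(x, exponential_tail(x), pareto_tail(x))
```

At x = 100 the power-law tail is still 10^-4, while the exponential tail is smaller than 10^-40.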

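  10. Zipf’s Law • Size of r-th largest city is \(\propto r^{-\beta}\) for some constant β > 0 • Equivalent to a power law: • X = size of a city • Change variables: a city of size x has rank \(r \propto x^{-1/\beta}\), and the rank is the number of cities of size ≥ x, so \(\Pr[X \ge x] \propto x^{-1/\beta}\)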

  11. Power Laws and the Internet • Web Graph • In- and out-degrees (in slope: ~2.1, out slope: ~2.7) [Kumar et al. 99, Barabási & Albert 99, Broder et al 00] • Sizes of connected components [Broder et al 00] • Website sizes [Huberman & Adamic 99] • Internet graph • Degrees [Faloutsos, Faloutsos & Faloutsos 99] • Eigenvalues [Mihail & Papadimitriou 02] • Traffic • Number of visits to websites

  12. Power Laws and Graphs • If X is a random web page, then \(\Pr[\mathrm{indeg}(X) = k] \propto k^{-2.1}\) • What random graph model explains this phenomenon?

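  13. Erdős-Rényi Random Graphs • \(G_{n,p}\) • n: size of the graph (fixed) • p: edge existence probability (fixed) • Every pair u,v is connected by an edge with probability p, independently of all other pairs. • Theorem [Erdős & Rényi, 60] For any node x in \(G_{n,p}\), \(\Pr[\deg(x) = k] = \binom{n-1}{k} p^{k} (1-p)^{n-1-k}\), a binomial distribution and not a power law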
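
Since each of the other n - 1 nodes is joined to x independently with probability p, the degree of x in G(n, p) is Binomial(n - 1, p). The sketch below evaluates that probability mass function; `degree_pmf` is an illustrative name and the parameter values are arbitrary.

```python
from math import comb


def degree_pmf(n: int, p: float, k: int) -> float:
    """Pr[deg(x) = k] for a fixed node x in G(n, p): Binomial(n - 1, p)."""
    if not 0 <= k <= n - 1:
        return 0.0
    return comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


# Degrees concentrate around the mean (n - 1) * p; probabilities far
# above the mean fall off much faster than any power of k.
n, p = 1000, 0.01
for k in (10, 20, 40, 80):
    print(k, degree_pmf(n, p, k))
```

The rapid decay above the mean is why this model cannot account for the high-degree pages observed on the Web.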

  14. Preferential Attachment [Barabási & Albert 99] • A novel random graph model • Initialization: graph starts with a single node with two self loops. • Growth: At every step a new node v is added to the graph. v has a self loop and connects to one neighbor. • Preferential attachment: v connects to u with probability \(\mathrm{indeg}(u) / \sum_{w} \mathrm{indeg}(w)\) • The rich get richer / The winner takes it all
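
The growth process can be simulated in a few lines. This is a minimal sketch, not the lecture's code, and it makes two assumptions that the slide does not spell out: the neighbor u is chosen with probability proportional to its indegree, and it is chosen among the nodes that existed before v arrived (v's own self loop is not a candidate). The function name and the `targets` list are illustrative.

```python
import random
from collections import Counter


def preferential_attachment(steps: int, seed: int = 0) -> list[int]:
    """Simulate the model for `steps` nodes and return each node's indegree.

    Assumption: a new node picks its neighbor among the existing nodes
    with probability indeg(u) / (total indegree).
    """
    rng = random.Random(seed)
    indeg = [2]       # node 0 starts with two self loops
    targets = [0, 0]  # node u appears once per unit of indegree
    for v in range(1, steps):
        u = rng.choice(targets)  # uniform over targets = proportional to indegree
        indeg.append(1)          # v's self loop
        indeg[u] += 1            # the edge v -> u
        targets.append(u)
        targets.append(v)
    return indeg


indeg = preferential_attachment(20_000)
counts = Counter(indeg)
for k in (1, 2, 4, 8, 16):
    print(k, counts[k] / len(indeg))  # fraction of nodes with indegree k
```

Keeping one list entry per unit of indegree turns the weighted choice into a uniform one, so each step takes constant time.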

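  15. Why Does it Work? • \(n_{k,t}\): # of nodes whose indegree = k after t steps (the total indegree after t steps is 2t) • k > 1: a new node raises \(n_{k,t}\) by 1 with probability \((k-1)\,n_{k-1,t}/(2t)\) and lowers it by 1 with probability \(k\,n_{k,t}/(2t)\) • Expected growth: \(E[n_{k,t+1} - n_{k,t}] = \frac{(k-1)\,n_{k-1,t}}{2t} - \frac{k\,n_{k,t}}{2t}\) • k = 1: \(E[n_{1,t+1} - n_{1,t}] = 1 - \frac{n_{1,t}}{2t}\)

  16. Why Does it Work? (2) • Fact: After sufficiently many steps, \(n_{k,t}/t\) reaches a “steady state”. • \(c_k\) = value of \(n_{k,t}/t\) at the steady state. • Since at steady state \(n_{k,t} = c_k t\), \(E[n_{k,t+1} - n_{k,t}] = c_k\) • Hence, \(c_k = \frac{(k-1)\,c_{k-1}}{2} - \frac{k\,c_k}{2}\) • Therefore: \(c_k = \frac{k-1}{k+2}\,c_{k-1}\)

  17. Why Does it Work? (3) • Then: \(c_1 = 1 - c_1/2\), so \(c_1 = 2/3\) • And: \(c_k = c_1 \prod_{j=2}^{k} \frac{j-1}{j+2} = \frac{4}{k(k+1)(k+2)}\) • Therefore: \(c_k \approx 4k^{-3}\) for large k, a power law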
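
The steady-state argument can be checked with exact arithmetic. The sketch below assumes the standard recurrence for this model, c_1 = 2/3 and c_k = c_{k-1}·(k-1)/(k+2), and compares it with the closed form 4 / (k(k+1)(k+2)); the function names are illustrative.

```python
from fractions import Fraction


def steady_state(kmax: int) -> dict[int, Fraction]:
    """Iterate c_k = c_{k-1} * (k - 1) / (k + 2) starting from c_1 = 2/3."""
    c = {1: Fraction(2, 3)}
    for k in range(2, kmax + 1):
        c[k] = c[k - 1] * Fraction(k - 1, k + 2)
    return c


def closed_form(k: int) -> Fraction:
    """Closed form of the recurrence: 4 / (k (k + 1) (k + 2))."""
    return Fraction(4, k * (k + 1) * (k + 2))


c = steady_state(50)
print(all(c[k] == closed_form(k) for k in c))  # recurrence matches closed form
print(float(sum(c.values())))                  # the fractions sum towards 1
for k in (10, 20, 40):
    print(k, float(c[k] * k**3))               # approaches 4 as k grows
```

Because c_k·k^3 tends to a constant, the fraction of nodes with indegree k decays like k^-3.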

  18. Six Degrees of Separation [Stanley Milgram, 67] • “Random starters” at Nebraska, Kansas, etc. • Destinations: in Boston • Intermediaries send postcards to Milgram • Findings: average of 6 postcards • “Conclusion”: every two people in the US are connected by a path of length ~ 6

  19. Small-World Networks • Average diameter: length of the shortest path from u to v, averaged over all pairs u,v • Clustering coefficient: fraction of pairs of neighbors of v that are themselves connected by an edge, averaged over all v • Small-world network: a sparse graph with average diameter O(log n) and a constant clustering coefficient
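
Both quantities are straightforward to compute on a small undirected graph. The sketch below is illustrative only: it represents the graph as a dictionary mapping each vertex to the set of its neighbors, averages distances over reachable ordered pairs (so it expects a graph with at least one edge), and assigns clustering 0 to vertices with fewer than two neighbors, which is one common convention rather than something the slide specifies.

```python
from collections import deque
from itertools import combinations


def average_distance(adj: dict[int, set[int]]) -> float:
    """Mean shortest-path length over reachable ordered pairs (BFS from each vertex)."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs


def clustering_coefficient(adj: dict[int, set[int]]) -> float:
    """Average over v of the fraction of neighbor pairs of v that are adjacent."""
    values = []
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            values.append(0.0)  # convention: no neighbor pairs, clustering 0
            continue
        links = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
        values.append(links / (d * (d - 1) / 2))
    return sum(values) / len(values)


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(average_distance(triangle), clustering_coefficient(triangle))
print(average_distance(path), clustering_coefficient(path))
```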

  20. The Web as a Small World Network • Low diameter • Study of a synthetic web graph model [Albert, Jeong, Barabási 99] • Average diameter of the Web is ~19 • Grows logarithmically with size of the Web. • Study of a large crawl [Broder et al 00] • Average diameter of the SCC is ~ 16 • Maximum diameter of the SCC is ≥ 28 • Diameter of host graph [Adamic 99] • Average diameter of SCC: ~4 • High clustering coefficient • Clustering coefficient of host graph [Adamic 99] • Clustering coefficient: ~0.08 (compared to 0.001 in a comparable random graph)

  21. Model for Small-World Networks [Watts & Strogatz 98] • One extreme: random networks • Low diameter • Low clustering coefficient • Other extreme: “regular” networks (e.g., a lattice) • High clustering coefficient • High diameter • Small-world: interpolation between the two • Low diameter • High clustering coefficient • Regularity: social networking • Randomness: individual interests

  22. Random Network The model: • n vertices • Every pair u,v is connected by an edge with probability p = d/n Properties: • Expected number of edges: ~dn/2 • Graph is connected w.h.p. (when d ≥ c ln n for a constant c > 1) • Diameter: O(log n) w.h.p. • Clustering coefficient: ~ p = d/n = o(1)

  23. Ring Lattice The model: • n vertices on a circle • Every vertex has d neighbors: the d/2 vertices to its right and the d/2 vertices to its left Properties: • Number of edges: dn/2 • Graph is connected • Diameter: O(n/d) • Clustering coefficient: \(\frac{3(d-2)}{4(d-1)} \approx 3/4\) for large d
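
The ring lattice is easy to build and inspect. The sketch below constructs it as a dictionary of neighbor sets and measures the clustering around one vertex (by symmetry every vertex gives the same value), so the result can be compared with the standard closed form 3(d-2) / (4(d-1)) for this lattice. It assumes d is even and n is comfortably larger than d; the function names are illustrative.

```python
from itertools import combinations


def ring_lattice(n: int, d: int) -> dict[int, set[int]]:
    """n vertices on a circle, each joined to its d/2 nearest neighbors per side."""
    if d % 2 != 0 or not 0 < d < n:
        raise ValueError("d must be even and satisfy 0 < d < n")
    adj: dict[int, set[int]] = {v: set() for v in range(n)}
    for v in range(n):
        for i in range(1, d // 2 + 1):
            w = (v + i) % n  # i-th clockwise neighbor of v
            adj[v].add(w)
            adj[w].add(v)
    return adj


def local_clustering(adj: dict[int, set[int]], v: int) -> float:
    """Fraction of pairs of neighbors of v that are themselves adjacent."""
    nbrs = sorted(adj[v])
    d = len(nbrs)
    if d < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (d * (d - 1) / 2)


for d in (4, 6, 10):
    lattice = ring_lattice(100, d)
    edges = sum(len(nbrs) for nbrs in lattice.values()) // 2
    print(d, edges, local_clustering(lattice, 0), 3 * (d - 2) / (4 * (d - 1)))
```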

  24. Random Rewiring • Start from a ring lattice • for i = 1 to d/2 do • for v = 1 to n do • Pick i-th clockwise nearest neighbor of v • With probability p, replace this neighbor by a random vertex
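
The rewiring loop above can be sketched as runnable code. This version makes choices the slide leaves open, so treat them as assumptions: vertices are numbered from 0, the replacement vertex is drawn uniformly from the vertices that are neither v itself nor already adjacent to v (so no self loops or duplicate edges arise), and a seeded generator is used for reproducibility. The function name `watts_strogatz` is illustrative.

```python
import random


def watts_strogatz(n: int, d: int, p: float, seed: int = 0) -> dict[int, set[int]]:
    """Ring lattice on n vertices with degree d, then random rewiring with probability p."""
    if d % 2 != 0 or not 0 < d < n:
        raise ValueError("d must be even and satisfy 0 < d < n")
    rng = random.Random(seed)

    # Start from a ring lattice.
    adj: dict[int, set[int]] = {v: set() for v in range(n)}
    for v in range(n):
        for i in range(1, d // 2 + 1):
            w = (v + i) % n
            adj[v].add(w)
            adj[w].add(v)

    # Visit the i-th clockwise neighbor of every vertex, nearest ring first.
    for i in range(1, d // 2 + 1):
        for v in range(n):
            old = (v + i) % n
            if old not in adj[v] or rng.random() >= p:
                continue  # keep this edge
            # Assumption: avoid self loops and duplicate edges.
            candidates = [w for w in range(n) if w != v and w not in adj[v]]
            if not candidates:
                continue
            new = rng.choice(candidates)
            adj[v].discard(old)
            adj[old].discard(v)
            adj[v].add(new)
            adj[new].add(v)
    return adj


graph = watts_strogatz(n=200, d=6, p=0.05)
print(sum(len(nbrs) for nbrs in graph.values()) // 2)  # edge count stays n * d / 2
```

Each rewiring removes one edge and adds one new edge, so the graph stays as sparse as the original lattice for every value of p.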

  25. Analysis • If p = 0, ring lattice • High clustering coefficient • High diameter • If p = 1, random network • Logarithmic diameter • Low clustering coefficient • However, • Diameter goes down rapidly as p grows • Clustering coefficient goes down slowly as p grows • Therefore, for small p, we get a small-world network. • Logarithmic diameter • High clustering coefficient

  26. End of Lecture 7
