This presentation delves into the concept of small world networks, focusing on the structures and behaviors of decentralized search mechanisms within peer-to-peer (P2P) systems. It explores the intricacies of friendship links among LiveJournal bloggers and conducts simulations akin to Milgram's experiment on social networks. Key models, such as Kleinberg's, are evaluated for predicting navigation efficiency. Additionally, we discuss critical techniques like consistent hashing for file location in P2P networks and the significance of power laws in network structures, revealing their counterintuitive nature compared to random graphs.
Small World: decentralized search (slide credits: Leskovec, Adamic, Metaxas, and authors of corresponding papers)
Small world in LJ • LiveJournal site, c. 2004: • 1.3M bloggers, who can list • Friends (other LJ bloggers) • Location • Interests, … • 500k LJ bloggers list home town and state that can be geomapped (to lat & long) • Only approximate (to within the city) • About 4M “friendship” links between these bloggers • mostly reciprocal links • 385k bloggers are in one connected component • In-degree/out-degree plots are heavy tailed
In silico Small World Expt • Simulate Milgram’s experiment • Pick a random start node u and target t • Repeat until the message reaches t’s hometown: • If the current holder u is closer to t than any of u’s friends: • Give up (fail), or forward to a random person in u’s hometown • Else: • Pass the message to the friend of u who is geographically closest to t
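The greedy forwarding rule above can be sketched as follows. This is a minimal illustration of the "give up" variant only; the names `greedy_route`, `friends`, `pos`, and `max_hops` are assumptions made for the sketch, and plain (x, y) coordinates stand in for the geomapped latitude and longitude.

```python
import math

def greedy_route(start, target, friends, pos, max_hops=1000):
    """Forward a message greedily toward `target`.

    friends: dict node -> list of friend nodes (hypothetical adjacency list)
    pos:     dict node -> (x, y) coordinates (stand-in for lat/long)
    Returns the path on success, or None if the chain gives up.
    """
    def dist(a, b):
        return math.dist(pos[a], pos[b])

    path = [start]
    u = start
    # "Same position" plays the role of "same hometown" in this sketch.
    while pos[u] != pos[target] and len(path) <= max_hops:
        # Friend of u that is geographically closest to the target.
        best = min(friends[u], key=lambda v: dist(v, target), default=None)
        if best is None or dist(best, target) >= dist(u, target):
            return None  # u is closer than all of its friends: give up
        u = best
        path.append(u)
    return path if pos[u] == pos[target] else None


pos = {0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (3, 0)}
friends = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(greedy_route(0, 3, friends, pos))  # [0, 1, 2, 3]
```

Because every hop strictly reduces the distance to the target, the loop always terminates; the other variant on the slide would replace the `return None` branch with a hop to a random node in the same hometown.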
Result • Similar to Milgram • 18% (blue) or 80% (red) finish rate (vs. Milgram: 30%) • Mean length 4 (blue) or 16 (red); here the message only has to reach the target’s hometown (vs. Milgram: 6) • Can we explain this using Kleinberg’s model?
Looking at the geographical distribution of friendship links • The average user has ~2.5 uniformly random friends and ~5.5 geographically distributed friends • Mixture of a power law (local connections) and uniformly distributed long-range links • East and West coast link probabilities differ • Problem: Kleinberg’s paper predicts that short paths are not locally findable with $\Pr(u \to v) = \frac{1}{Z}\, d(u,v)^{-1.2}$
Improved model • Basic intuition: • need to account for population density • $\mathrm{rank}_u(v) = |\{w : d(u,w) < d(u,v)\}|$ • New probability • $\Pr(\text{random edge } (u,v)) \propto \mathrm{rank}_u(v)^{-r}$ • For a 2-d grid, using r = 1 is the same as using exponent 2 in Kleinberg’s model
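A small sketch of the rank-based link probability, under stated assumptions: the function name `rank_link_probs` and the coordinate dictionary `pos` are hypothetical, all nodes are assumed to sit at distinct positions, and u itself is counted among the nodes closer to u than v, so that the nearest neighbour gets rank 1.

```python
import math

def rank_link_probs(u, pos, r=1.0):
    """Return {v: Pr(u -> v)} with Pr proportional to rank_u(v) ** -r."""
    others = [v for v in pos if v != u]
    d = {v: math.dist(pos[u], pos[v]) for v in others}
    # rank_u(v) = number of nodes strictly closer to u than v is.
    # The leading 1 counts u itself (distance 0), an assumption of this sketch.
    rank = {v: 1 + sum(1 for w in others if d[w] < d[v]) for v in others}
    weights = {v: rank[v] ** -r for v in others}
    z = sum(weights.values())  # normalising constant
    return {v: w / z for v, w in weights.items()}


pos = {"u": (0, 0), "a": (1, 0), "b": (2, 0), "c": (3, 0)}
probs = rank_link_probs("u", pos)
# Ranks are 1, 2, 3, so the weights 1, 1/2, 1/3 normalise to 6/11, 3/11, 2/11.
```

Because rank counts people rather than distance, a user in a dense city and a user in a sparse region get the same probability of linking to their k-th nearest neighbour, which is the population-density correction the slide refers to.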
Fitting data • Under rank model, optimum exponent = 1 • Observed exponent ~1.2
Group structure based models • Group model: nodes belong to multiple foci; the probability of an edge depends on the size q of the smallest common focus • $\Pr(\text{edge}) \propto q^{-r}$ • Theorem: if r = 1 and out-degree is polylogarithmic, can search in O(log n) • Hierarchy model [Kleinberg, 2001]: individuals are classified into a hierarchy (b = 3 in the figure) • $h_{ij}$ = height of the lowest common ancestor of i and j; $\Pr(\text{edge } (i,j)) \propto b^{-\alpha h_{ij}}$ • Theorem: if $\alpha = 1$ and out-degree is polylogarithmic, can search in O(log n)
Why do they look this way? • Why would networks have navigability as a property? • Caveat: it is not clear how universal it is • Is there an evolutionary model? • Sandberg & Clarke [2007]: • Start with a grid + uniform random edges • Choose an (s,t) pair uniformly, start routing locally from s • With some probability, each node on the path rewires its long-range link to point directly to t • Simulation shows that the link distribution reaches r = 1 for a 1-d grid
Client-server vs. P2P systems • In P2P: • Files can be located anywhere on the network • Nodes join and leave • There is no central repository • Examples: BitTorrent, Kazaa, Gnutella • How can we quickly locate where a file is?
Chord • K files are assigned to N peers • Need to hash for uniform load • How can we design a hash table when nodes (i.e. hash buckets) can join and leave • Do not want to rehash all keys: O(K) work • Great idea: consistent hashing • Karger, Lehman, Leighton, Panigrahy, Levine, Lewin ‘97.
Consistent Hashing • Hash both nodes and files into m-bit IDs and arrange them in a circle • Each file is stored at the first node following it on the circle • If a node joins or leaves, the expected number of files that change location is O(K/N)
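A minimal sketch of the hash circle described above. The class name `Ring`, the helper `ring_id`, the choice of SHA-1, and m = 32 are assumptions made for illustration, not details taken from the slides or from any particular system.

```python
import bisect
import hashlib

M = 32  # number of ID bits (assumed for this sketch)

def ring_id(name: str) -> int:
    """Hash a node or file name to an m-bit ID on the circle."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

class Ring:
    def __init__(self, nodes=()):
        self._ids = []      # sorted node IDs
        self._names = {}    # node ID -> node name
        for n in nodes:
            self.join(n)

    def join(self, node):
        i = ring_id(node)
        if i not in self._names:
            bisect.insort(self._ids, i)
            self._names[i] = node

    def leave(self, node):
        i = ring_id(node)
        self._ids.remove(i)
        del self._names[i]

    def owner(self, key):
        """Node whose ID follows the key's ID, wrapping around the circle."""
        idx = bisect.bisect_left(self._ids, ring_id(key))
        return self._names[self._ids[idx % len(self._ids)]]


ring = Ring(["node-a", "node-b", "node-c"])
files = [f"file-{i}" for i in range(200)]
before = {f: ring.owner(f) for f in files}
ring.join("node-d")
moved = [f for f in files if ring.owner(f) != before[f]]
# Every file that moved now lives on the new node; all others stay put.
```

When a node joins, only the files between its predecessor and itself change owner, which is why the expected amount of movement is on the order of K/N rather than K.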
Finding a file • Each node keeps a list of s neighbors following it • If neighbors are all active, just forward the lookup until we reach potential location • Expected steps O(N)
Faster Search • Each node n keeps a list of fingers • For each i, the corresponding finger is the minimum node in $[n + 2^{i-1}, n + 2^{i}]$ (IDs taken modulo $2^m$) • Similar to “long range links” in small world models
Faster Search • Use the fingers to make as much progress as possible without overshooting • In each step, we halve the distance • Expected steps = O(log N) • Technical issues related to how to update fingers, successors when nodes leave, join etc….
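The finger-based lookup can be sketched as below. This is a simplified, static simulation: finger tables are built from a global sorted list of node IDs, and joins, leaves, and failures are ignored. The function names (`successor`, `build_fingers`, `lookup`) are assumptions for this sketch, not the interface of any real implementation.

```python
import bisect

def successor(ids, x, m):
    """First node ID at or after x on the 2**m circle (ids is sorted)."""
    i = bisect.bisect_left(ids, x % (2 ** m))
    return ids[i % len(ids)]

def build_fingers(ids, m):
    """fingers[n][i-1] = first node at or after n + 2**(i-1), for i = 1..m."""
    return {n: [successor(ids, n + 2 ** (i - 1), m) for i in range(1, m + 1)]
            for n in ids}

def lookup(start, key, fingers, m):
    """Return (node responsible for key, number of hops taken)."""
    size = 2 ** m
    n, hops = start, 0
    while True:
        to_key = (key - n) % size          # clockwise distance from n to key
        succ = fingers[n][0]               # the first finger is the successor
        if to_key == 0 or succ == n:       # n holds the key, or n is alone
            return n, hops
        if to_key <= (succ - n) % size:    # key lies in (n, succ]
            return succ, hops + 1
        # Jump to the farthest finger that does not overshoot the key.
        n = max((f for f in fingers[n] if 0 < (f - n) % size < to_key),
                key=lambda f: (f - n) % size)
        hops += 1


m = 8
ids = sorted([5, 20, 77, 130, 200, 251])
fingers = build_fingers(ids, m)
node, hops = lookup(5, 190, fingers, m)   # node == 200
```

In this static setting each finger hop at least halves the clockwise distance to the node just before the key, so a lookup takes at most m + 1 hops regardless of where it starts.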
Power laws • x-axis: degree • y-axis: what fraction of nodes have this degree • Degree distribution is very different from what is expected for random graphs • Quite a few nodes with very high degree, lot of nodes with small degree
Log-log plot • Log-log axes: both the x and y axes are logarithmic • x-axis: degree • y-axis: fraction of nodes with this degree • Shows the fitted best line $\log y = A - c \log x$, i.e. $y = B x^{-c}$, so $y \propto x^{-c}$
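The line fit on log-log axes can be sketched with an ordinary least-squares fit in pure Python. The function name `fit_power_law` is an assumption for this sketch, and a plain least-squares fit is used here only because it mirrors the "fitted best line" on the slide; it can be sensitive to noise in the sparse high-degree tail.

```python
import math
from collections import Counter

def fit_power_law(degrees):
    """Least-squares line through (log k, log fraction of nodes with degree k).

    Returns (B, c) for the fitted curve y = B * k ** -c.
    Needs at least two distinct positive degrees.
    """
    counts = Counter(d for d in degrees if d > 0)  # log(0) is undefined
    total = sum(counts.values())
    xs = [math.log(k) for k in counts]
    ys = [math.log(counts[k] / total) for k in counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx          # A in log(y) = A - c log(x)
    return math.exp(intercept), -slope   # B = e^A, c = -slope


# Synthetic degrees whose counts fall off exactly as k ** -2.
degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8]
B, c = fit_power_law(degrees)   # c is 2 up to floating-point error
```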
Structure of the web: power laws (Broder et al.)
Exponential vs. power law [slide courtesy Leskovec]
Other power laws • Pareto 1897 – wealth distribution • Lotka 1926 – scientific output • Yule 1920s – species per genus • Zipf 1940s – word frequency • Simon 1950s – city population
Why are they surprising? • Because they do not follow the intuition of the Central Limit Theorem • For instance, one possible way to model degree is that each node decides independently, with a fixed probability, whether to link to each other node • $X = X_1 + X_2 + \dots + X_n$, where the $X_i$ are i.i.d. • This would give a normal distribution in the limit • $\Pr(X = k) \approx$ normal density at k
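A worked version of the argument above, assuming each $X_i$ is a Bernoulli($p$) indicator for one potential link. The slide only says the $X_i$ are i.i.d.; the Bernoulli choice and the symbols $p$, $\mu$, and $\sigma^2$ are assumptions made for illustration.

```latex
X = \sum_{i=1}^{n} X_i, \quad X_i \sim \mathrm{Bernoulli}(p)
\;\Longrightarrow\; \mu = np, \quad \sigma^2 = np(1-p)

\Pr(X = k) \approx \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left(-\frac{(k-\mu)^2}{2\sigma^2}\right)
\qquad \text{versus} \qquad
\Pr(X = k) \propto k^{-c} \quad \text{(power law)}
```

The normal approximation decays exponentially in $(k-\mu)^2$, so degrees far above the mean are vanishingly rare, whereas a power-law tail decays only polynomially in $k$ and leaves room for a noticeable number of very high-degree nodes.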