Small World: decentralized search

Small World: decentralized search (slide credits: Leskovec, Adamic, Metaxas, and authors of corresponding papers)

Small world in LJ • LiveJournal site, c. 2004: • 1.3M bloggers, who can list • Friends (other LJ bloggers) • Location • Interests, … • 500k LJ bloggers list home town and state that can be geomapped (to lat & long) • Only approximate (to within the city) • About 4M “friendship” links between these bloggers • mostly reciprocal links • 385k bloggers are in one connected component • In-degree/out-degree plots are heavy tailed

In silico Small World Expt • Simulate Milgram’s experiment • Pick a random start node u and target t; u denotes the current message holder • Repeat until the message reaches t’s hometown: • If u is closer to t than any of u’s friends: • Give up (failing), or forward to a random person in u’s hometown • Else: • Pass the message to the friend of u who is geographically closest to t
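
The greedy forwarding rule can be sketched in a few lines. This is a minimal illustration of the "give up" variant only, not the code used in the LiveJournal study; the names `greedy_route`, `friends`, and `location` are hypothetical, and locations are treated as plain (x, y) points rather than latitude and longitude.

```python
import math

def greedy_route(friends, location, start, target):
    """Forward a message greedily toward the target's hometown.

    friends:  dict mapping each person to a list of friends (assumed format)
    location: dict mapping each person to an (x, y) hometown (assumed format)
    Returns the number of hops on success, or None if the chain gives up.
    """
    goal = location[target]
    u, hops = start, 0
    while location[u] != goal:
        if not friends[u]:
            return None  # nobody to forward to
        # Friend of the current holder who is geographically closest to the target.
        best = min(friends[u], key=lambda v: math.dist(location[v], goal))
        if math.dist(location[best], goal) >= math.dist(location[u], goal):
            return None  # the holder is closer than all friends: give up
        u, hops = best, hops + 1
    return hops

# Toy example: four people on a line.
location = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (5, 0)}
friends = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(greedy_route(friends, location, "a", "d"))  # 3
```

Because every hop strictly reduces the distance to the target, the loop always terminates. The random-forwarding variant would replace the second `return None` with a hop to a random person in the holder's hometown.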

Result • Similar to Milgram • 18% (blue, giving up) or 80% (red, forwarding at random within the hometown) finish rate (vs. Milgram: 30%) • Mean length 4 (blue) or 16 (red); here messages only have to reach the target’s hometown (vs. Milgram: 6) • Can we explain using Kleinberg’s model?

Looking at the geographical distribution of friendship links • Avg. user has ~2.5 uniformly random friends and ~5.5 geographically distributed friends • Mixture of power law (local connections) and uniformly distributed long-range links • Difference in East and West coast link probabilities • Problem: Kleinberg’s paper predicts that short paths are not locally findable with $\Pr(uv) = \frac{1}{Z}\, d(u,v)^{-1.2}$, since a 2d-grid needs exponent 2

Improved model • Basic intuition: • need to account for population density • $\mathrm{rank}_u(v) = |\{w : d(u,w) < d(u,v)\}|$ • New probability • $\Pr(\text{random } (u,v) \text{ edge}) \propto \mathrm{rank}_u(v)^{-r}$ • For 2d-grid, using r=1 is same as using exponent 2 in Kleinberg’s model
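
As a rough illustration of rank-based linking, the sketch below computes the rank of v with respect to u as the number of people strictly closer to u than v is, and turns the weights rank^(-r) into link probabilities. The function names and the toy `location` dictionary are assumptions made for this example. It also assumes all locations are distinct, so that every rank is at least 1 (u itself is counted, as in the set definition).

```python
import math

def rank(u, v, location):
    """Number of people w (including u itself) strictly closer to u than v is."""
    d_uv = math.dist(location[u], location[v])
    return sum(1 for w in location if math.dist(location[u], location[w]) < d_uv)

def link_probabilities(u, location, r=1.0):
    """Pr(u links to v) proportional to rank_u(v) ** (-r), normalized over v != u."""
    weights = {v: rank(u, v, location) ** (-r) for v in location if v != u}
    z = sum(weights.values())
    return {v: w / z for v, w in weights.items()}

location = {"u": (0, 0), "a": (1, 0), "b": (3, 0), "c": (10, 0)}
probs = link_probabilities("u", location)
print(probs)
```

Moving "c" from (10, 0) to (1000, 0) leaves the probabilities unchanged, because only the ordering of distances matters. That is the sense in which the rank model adapts to population density.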

Fitting data • Under rank model, optimum exponent = 1 • Observed exponent ~1.2

Group structure based models • Nodes belong to multiple foci; the probability of an edge depends on the size q of the smallest common focus • $\Pr(\text{edge}) \propto q^{-r}$ • Theorem: If r = 1 and outdegree is polylogarithmic, can search in O(log n) • Individuals classified into hierarchies (figure: b = 3) • $h_{ij}$ = height of the lowest common ancestor of i and j • Theorem: If a = 1 and outdegree is polylogarithmic, can search in O(log n) • [Kleinberg, 2001]

Why do they look this way? • Why would networks have navigability as a property? • Caveats: not clear how universal it is • Is there an evolutionary model? • Sandberg & Clarke [2007]: • Start with a grid + uniform random edges • Choose an (s,t) pair uniformly at random, start routing locally from s • With some probability, each node on the path rewires its long-range link to point directly to t • Simulation shows that it reaches r=1 for 1d-grid

Decentralized search in p2p systems

Client-server vs. p2p Systems • In p2p: • Files can be located anywhere on the network • Nodes join and leave • Central repository is not present • Examples: BitTorrent, Kazaa, Gnutella • How can we quickly locate where a file is?

Chord • K files are assigned to N peers • Need to hash for uniform load • How can we design a hash table when nodes (i.e. hash buckets) can join and leave? • Do not want to rehash all keys: O(K) work • Great idea: consistent hashing • Karger, Lehman, Leighton, Panigrahy, Levine, Lewin ’97

Consistent Hashing • Hash both nodes and files into m-bit IDs and arrange them in a circle • Each file is stored at the first node following it on the circle • If any node joins or leaves, the expected number of files that change location is O(K/N)
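
A minimal sketch of consistent hashing, under some assumptions made for illustration: IDs are 32 bits, SHA-1 is the hash function, and the `Ring` class and `ident` helper are hypothetical names rather than part of any library.

```python
import bisect
import hashlib

M = 32  # number of ID bits; an arbitrary choice for this sketch

def ident(name):
    """Hash a string to an M-bit ID on the circle."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

class Ring:
    def __init__(self, nodes):
        self.names = {ident(n): n for n in nodes}
        self.ids = sorted(self.names)

    def owner(self, key):
        """Return the first node at or after the key's ID, wrapping around the circle."""
        i = bisect.bisect_left(self.ids, ident(key))
        return self.names[self.ids[i % len(self.ids)]]

nodes = [f"node{i}" for i in range(10)]
keys = [f"file{i}" for i in range(1000)]
before = Ring(nodes)
after = Ring(nodes + ["node10"])
moved = [k for k in keys if before.owner(k) != after.owner(k)]
print(len(moved), "of", len(keys), "files moved")
```

When a node joins, the only files that change location are the ones the new node takes over from its successor, which is why the expected number of moves is on the order of K/N rather than K.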

Finding a file • Each node keeps a list of s neighbors following it • If neighbors are all active, just forward the lookup from neighbor to neighbor until we reach the file’s potential location • Expected steps O(N)

Faster Search • Each node n keeps a list of fingers • For each i, the corresponding finger is the minimum node in $[n + 2^{i-1}, n + 2^{i}]$ • Similar to “long range links” in small world models

Faster Search • Use the fingers to make as much progress as possible without overshooting • In each step, we halve the distance • Expected steps = O(log N) • Technical issues related to how to update fingers and successors when nodes leave, join, etc.
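
The finger-based lookup can be simulated on a static ring. The sketch below is a simplified model, not a Chord implementation: it assumes an 8-bit ID space, recomputes fingers from a global sorted list instead of storing them per node, and ignores joins and failures. Following the Chord paper, finger i of node n is taken to be the first node at or after n + 2^(i-1). All function names are hypothetical.

```python
import bisect

M = 8  # ID bits, so the circle has 2**M positions (an assumption for this sketch)

def in_open(x, a, b):
    """True if x lies in the circular open interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b

def successor(ids, x):
    """First node ID at or after x on the circle; ids must be sorted."""
    i = bisect.bisect_left(ids, x % (2 ** M))
    return ids[i % len(ids)]

def fingers(ids, n):
    """Finger i is the first node at or after n + 2**(i-1), for i = 1..M."""
    return [successor(ids, n + 2 ** (i - 1)) for i in range(1, M + 1)]

def lookup(ids, start, key):
    """Route from node `start` toward `key`; return (responsible node, forwards used)."""
    n, hops = start, 0
    while True:
        succ = successor(ids, n + 1)
        if key == succ or in_open(key, n, succ):
            return succ, hops  # key lies in (n, succ], so succ is responsible
        nxt = succ  # finger 1; always a safe fallback
        for f in reversed(fingers(ids, n)):
            if in_open(f, n, key):
                nxt = f  # farthest finger that does not overshoot the key
                break
        n, hops = nxt, hops + 1

ids = sorted({(i * 37 + 11) % 256 for i in range(20)})
print(lookup(ids, ids[0], 200))
```

Each forward at least halves the remaining distance to the key's predecessor, so the number of forwards in this model never exceeds M.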

Power Laws

Power laws • x-axis: degree • y-axis: what fraction of nodes have this degree • Degree distribution is very different from what is expected for random graphs • Quite a few nodes with very high degree, a lot of nodes with small degree

Log log plot • Log-log axes: both the x and y axes are on a log scale • x-axis: degree • y-axis: what fraction of nodes have this degree • Shows fitted best line $\log y = A - c \log x$, i.e. $y = B x^{-c}$, so $y \propto x^{-c}$
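
The line fit on log-log axes can be reproduced with ordinary least squares on the logged values. This is a sketch under the assumption that all x and y values are positive; `fit_power_law` is a hypothetical helper name. A straight-line fit of this kind is a quick visual check and not necessarily the most reliable way to estimate the exponent.

```python
import math

def fit_power_law(xs, ys):
    """Fit log(y) = A - c*log(x) by least squares; return (c, B) with y ~ B * x**(-c)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    intercept = my - slope * mx  # this is A
    return -slope, math.exp(intercept)

# Synthetic data that follows y = 3 * x**(-2.1) exactly.
xs = list(range(1, 101))
ys = [3 * x ** -2.1 for x in xs]
c, B = fit_power_law(xs, ys)
print(round(c, 3), round(B, 3))
```

On the synthetic data the fit recovers the exponent and prefactor up to floating-point error, since the points lie exactly on a line in log-log space.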

Structure of the web: power laws (Broder et al.)

Exponential vs. power law [slide courtesy Leskovec]

Other power laws • Pareto 1897 – wealth distribution • Lotka 1926 – scientific output • Yule 1920s – species per genus • Zipf 1940s – word frequency • Simon 1950s – city population

Why are they surprising? • Because they do not follow the intuition of the Central Limit Theorem • For instance, one possible way to model degree is that each node decides to link to each other node independently with fixed probability • X = X1 + X2 + … + Xn, where the Xi are i.i.d. • This would give a normal distribution in the limit • Pr(X = k) is approximately normal
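
A small simulation makes the contrast concrete. In this sketch each degree is a sum of n independent link decisions with probability p; the parameter values and the name `sample_degrees` are arbitrary choices for illustration.

```python
import random

def sample_degrees(num_nodes, n, p, seed=0):
    """Draw num_nodes degrees, each the sum of n independent Bernoulli(p) link decisions."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) for _ in range(num_nodes)]

degrees = sample_degrees(num_nodes=500, n=1000, p=0.01)
mean = sum(degrees) / len(degrees)
print("mean degree:", mean, "max degree:", max(degrees))
```

The degrees cluster tightly around the mean n·p = 10, and a node with degree several times the mean is practically never seen. A heavy-tailed distribution behaves very differently: a few nodes with very high degree are expected.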
