An Overview of Gnutella. History. The Gnutella network is a fully distributed alternative to the centralized Napster. Initial popularity of the network was spurred on by Napster's threatened legal demise in early 2001. . A generic view. object1. object2. peer. peer.
The Gnutella network is a fully distributed
alternative to the centralized Napster.
Initial popularity of the network was spurred
on by Napster's threatened legal demise in
No central authority.
Gnutella is a protocol for distributed search
Servent: A Gnutella node. Each servent is both a server and a client.
Hops: a hop is a pass through an intermediate node
TTL: how many hops a packet can go before it dies
(default setting is 7 in Gnutella)
Simple idea , but lacks scalability, since query flooding wastes bandwidth.
Sometimes, existing objects may not be located due to limited TTL.
Subsequently, various improved search strategies have been proposed.
The topology is dynamic, I.e. constantly changing. How do
we model a constantly changing topology? Usually, we begin
with a static topology, and later account for the effect of churn.
Modeling topology(measurements provide useful inputs)
Power law graph
A random graph G(n, p) is constructed by starting with
a set of n vertices, and adding edges between pairs of
nodes at random. Every possible edge occurs independently
with probability p.
Q. Is Gnutella topology a random graph?
Gnutella topology is actually a power-law graph. (Also called
What is a power-law graph? The number of nodes with degree
k = c.k - r
(Contrast this with Gaussian distribution where the number of nodes
with degree k = c. 2 - k. )
Many graphs in the nature exhibit power-law characteristics. Examples, world-wide web
(the number of pages that have k in-links is proportional to k - 2), Thefraction of scientific
papers that receive k citations is k -3 etc.
How many telephone
numbers receive calls from k
different telephone numbers?
# of telephone numbers
from which calls were made
# of telephone numbers called
proportion of nodes
number of neighbors
power-law link distribution
data provided by Clip2
Nodes join at different times.
The more connections a node has, the more likely
it is to acquire new connections (“Rich gets richer”).
Popular webpages attract new pointers.
It has been mathematically shown that such a growth
process produces power-law network
Let p(d) be the probability that a random walk on a d-D lattice returns to the origin. In 1921, Pólya proved that,
(1) p(1)=p(2)=1, but
(2) p(d)<1 for d>2
There are similar results
on two walkers meeting
each other via random walk
Existence of a path does
not necessarily mean that
such a path can be discovered
Delay = discovery time in hops
Overhead = total distance covered by the walker
Both should be as small as possible.
For a single random walker, these are equal.
K random walkers is a compromise.
For search by flooding, if delay = hthen
overhead = d + d2 + … + dhwhere
d = degree of a node.
Letp = Population of the object. i.e. the fraction of nodes hosting the object
T = TTL (time to live)
Expected hop count E(h) =
1.p + 2.(1-p).p + 3(1-p)2.p + …+ T.(1-p)T-1.p
= 1/p. (1-(1-p)T) - T(1-p)T
With a large TTL, E(h) = 1/p, which is intuitive.
With a small TTL, there is a risk that search will time out before an existing object is located.
Assume they all k walkers start in unison. Probability that none could
find the object after one hop = (1-p)k. The probability. that none succeeded after T hops = (1-p)kT. So the probability that at least one walker succeeded is 1-(1-p)kT.A typical assumption is that the search is abandoned as soon as at least one walker succeeds
As k increases, the overhead increases, but the delay decreases. There is a tradeoff.
Biased walk utilizing node degree heterogeneity.
Utilizing structural properties like random graph, power-law graphs, or small-world properties
Topology adaptation for faster search
Introducing two layers in the graph structure using supernodes
Each node keeps track of the indices of the files belonging to its
immediate neighbors. As a result, high capacity / high degree nodes
can provide useful clues to a large number of search queries.
Each node records the degree of the neighboring nodes. Search
easily gravitates towards high degree nodes that hold more clues.
Deterministic biased walk
This growing surge in popularity revealed the limits of
the initial protocol's scalability. In early 2001, variations
on the protocol improved the scalability. Instead of
treating every user as client and server, some users
were treated as "ultrapeers” or “supernodes,” routing
search requests and responses for users connected
Powerful nodes (supernodes) act as local index servers, and
client queries are propagated to other supernodes. Two-layered