“Hybrid Search Schemes for Unstructured Peer-to-Peer Networks” and “Random Walks in Peer-to-Peer Networks”

Christos Gkantsidis, Milena Mihail, Amin Saberi

Presented by Paul Bogdan

February 28th, 2007

“Hybrid Search Schemes for Unstructured Peer-to-Peer Networks”

Christos Gkantsidis, Milena Mihail, Amin Saberi

Outline
  • Random Graph Models
  • Flooding and Normalization
  • Random Walks and Replication
  • Generalized Search Schemes
  • Experimental evaluation
Motivation
  • Flooding + small time-to-live (TTL) performs well in regular graphs
      • Performance metric: number of messages exchanged per distinct response
      • Performance degrades as the TTL increases and on irregular networks
  • Random Walk performs better than flooding
      • scalability, granularity
  • Hybrid + Generalized search schemes:
      • Random Walks with lookahead, Random Walks with 1-step replication
Contribution
  • Random walks (RW) with shallow flooding offer good performance (analytic justification)

R1: In a random graph model with $O(n)$ nodes of constant degree and $O(n^{1/2})$ nodes of degree $O(n^{1/2})$, the expected time to discover $\Omega(n)$ nodes is $O(n^{1/2})$.

R2: Random walks with look-ahead 1 or 1-step replication perform better when the degrees of the underlying topology are uneven.

  • Normalized Flooding (NF) solution

R3: NF achieves performance comparable to flooding in regular graphs.

R4: NF with 1-step replication achieves performance comparable to RW with 1-step replication.

R5: Local information about the network (node degrees) offers a global benefit.

  • Generalized Search Schemes
Random Graph Models
  • Random Regular Graphs – $G_{n,d}$

$G_{n,d}$ denotes a graph with $n$ nodes, each of degree $d$.

$G_{n,d}$ has total degree $D = nd$.

  • Random Graphs with super-nodes – $G_{n,d,\alpha,\beta}$

Given constants $\alpha$ and $\beta$, $G_{n,d,\alpha,\beta}$ denotes a graph with $\alpha n^{1/2}$ nodes of degree $\beta n^{1/2}$ (large vertices) and the remaining nodes of degree $d$ (small vertices).

$G_{n,d,\alpha,\beta}$ has total degree $D = (\alpha\beta + d)n$.
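
As a quick check of the degree sum, the following sketch builds the degree sequence of the super-node model and compares its exact sum with $(\alpha\beta + d)n$. The function name and the parameter values are illustrative assumptions, not taken from the slides; it also assumes the $n$ vertices include the large ones, in which case the exact sum falls short of $(\alpha\beta + d)n$ by a lower-order term of about $\alpha d\, n^{1/2}$.

```python
import math

def supernode_degree_sequence(n, d, alpha, beta):
    """Degree sequence of the super-node model (hypothetical helper).

    About alpha*sqrt(n) large vertices of degree about beta*sqrt(n);
    every remaining vertex has degree d.
    """
    root = math.sqrt(n)
    num_large = round(alpha * root)
    large_degree = round(beta * root)
    return [large_degree] * num_large + [d] * (n - num_large)

# Illustrative parameters (assumed, not from the slides).
n, d, alpha, beta = 10_000, 3, 1, 2
degrees = supernode_degree_sequence(n, d, alpha, beta)
total = sum(degrees)                 # exact sum of degrees
leading = (alpha * beta + d) * n     # leading-order value (alpha*beta + d) * n
relative_gap = (leading - total) / leading
```

With these values the gap is well under one percent and shrinks as $n$ grows, which is why the slide can treat $D$ as $(\alpha\beta + d)n$.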

Flooding and Normalization
  • Theorem 3.1: Consider a random regular graph $G_{n,d}$ and flooding from a node $v$ with time-to-live $\tau$. Let $S$ be the set of distinct nodes queried by the flood, with $|S| \le |V|/2$.

Claims (1), (2), and (3) and their proofs were given as images on the slides and are not preserved in this transcript.

Theorem 3.2: Let $G_{n,d,\alpha,\beta}$ be a random graph with supernodes, and consider flooding from a node $v$ of degree $d$ with time-to-live $\tau$.

Claim: For some $\tau = O(\log\log n)$, the number of distinct responses is $\Omega(n)$.

Proof:

Consider flooding with $\tau = c\log_{d-1}(\log n) + 1$ and the vertices visited with TTL $\tau - 1$.

Assumption: this set (of visited nodes) does not contain a large-degree vertex.

From d-regular graphs we know that this set contains at least $(d-1)^{\tau-1}$ edges.

The probability that no vertex in $\Gamma(S_{\tau-1}(v))$ is a large vertex is bounded by $(d/(d+\alpha\beta))^{(d-1)^{\tau-1}} \le (d/(d+\alpha\beta))^{c\log n}$, since $(d-1)^{\tau-1} = (\log n)^c \ge c\log n$ for $c \ge 1$ and $n$ large enough. So within the first $O(\log\log n)$ steps we see a large vertex with high probability.
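
To see why this bound is small, it helps to rewrite it as a power of $n$. The step below is a worked identity added for illustration (natural logarithms assumed); it is not taken from the slides.

```latex
\left(\frac{d}{d+\alpha\beta}\right)^{c\log n}
  = \exp\!\left(-c\,\log n\,\log\frac{d+\alpha\beta}{d}\right)
  = n^{-c\,\log\frac{d+\alpha\beta}{d}}
```

For example, if $\alpha\beta = d$ the base is $1/2$ and the bound becomes $n^{-c\ln 2} \approx n^{-0.69c}$, which is polynomially small in $n$.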

Flooding and Normalization
  • Theorem 3.3: Let $G_{n,d,\alpha,\beta}$ be a random graph with supernodes, and consider normalized flooding from a node $v$ with TTL $\tau$. Then the number of distinct responses is $\Omega((d-1)^{\tau-1})$ and the number of messages per response is $O(1)$.

Proof:

From Theorem 3.1, the number of minigroups seen is $(d-1)^{\tau-1}$.

The expected number of small vertices is $Q = d(d-1)^{\tau-1}/(d+\alpha\beta)$.

Let $X_i$, $i = 1,\dots,N$, be random variables with $P[X_i = 1] = p_i$ and $P[X_i = 0] = 1 - p_i$.

By a Chernoff bound, the probability that fewer than $Q/2$ are seen is vanishingly small.
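
The slides do not define normalized flooding, so the sketch below rests on an assumption: a node forwards a query to at most `fanout` randomly chosen neighbors (for instance, the minimum degree of the network) instead of to all of them. The function name `flood`, its parameters, and the star-shaped test graph are hypothetical and serve only to show how capping the fan-out limits the message count at a high-degree node.

```python
import random

def flood(adj, start, ttl, fanout=None, seed=0):
    """Flood from `start` for `ttl` hops over adjacency lists `adj`.

    fanout=None: standard flooding (forward to every neighbor but the sender).
    fanout=m:    assumed normalized flooding (forward to at most m random
                 neighbors other than the sender).
    Returns (set of nodes reached, number of messages sent).
    """
    rng = random.Random(seed)
    seen = {start}
    frontier = [(start, None)]          # (node, neighbor it heard from)
    messages = 0
    for _ in range(ttl):
        next_frontier = []
        for node, sender in frontier:
            targets = [v for v in adj[node] if v != sender]
            if fanout is not None and len(targets) > fanout:
                targets = rng.sample(targets, fanout)
            for v in targets:
                messages += 1
                if v not in seen:       # duplicates are counted but not re-forwarded
                    seen.add(v)
                    next_frontier.append((v, node))
        frontier = next_frontier
    return seen, messages

# Star graph: hub 0 with ten leaves; the query starts at leaf 1.
star = {0: list(range(1, 11)), **{i: [0] for i in range(1, 11)}}
reached_std, msgs_std = flood(star, start=1, ttl=2)
reached_nf, msgs_nf = flood(star, start=1, ttl=2, fanout=1)
```

On this toy graph standard flooding spends most of its messages at the hub, while the capped variant sends one message per hop; the theorem above concerns the regime where that cap keeps the cost per response constant.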

Random Walks and Replication
  • Random Walk with Look-Ahead:
      • a random walk with shallow flooding on each step of the walk
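      • RW with look-ahead 1 visits $\Omega(n)$ nodes with response time $O(n^{1/2})$
  • Theorem 4.2: Let $G_{n,d,\alpha,\beta}$ be a random graph with supernodes and consider a random walk from a node $v$. Then, in the 1-step replication scenario, the expected number of messages and the response time needed to obtain a given number of distinct responses are bounded (the number of responses and the bound were slide images and are not preserved in this transcript).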
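
A random walk with look-ahead 1 can be sketched as follows: at every step the walker also learns about all neighbors of the node it is standing on. Under 1-step replication the effect is similar, because each node is assumed to hold an index of its neighbors' content. The function name and the star-shaped example are illustrative assumptions.

```python
import random

def random_walk_lookahead(adj, start, steps, seed=0):
    """Random walk that 'sees' the current node and all of its neighbors.

    Returns the set of nodes discovered after `steps` moves.
    """
    rng = random.Random(seed)
    current = start
    discovered = {start, *adj[start]}
    for _ in range(steps):
        current = rng.choice(adj[current])   # one walk step = one message
        discovered.add(current)
        discovered.update(adj[current])      # look-ahead of depth 1
    return discovered

# Star graph: hub 0 with ten leaves; the walk starts at leaf 1.
star = {0: list(range(1, 11)), **{i: [0] for i in range(1, 11)}}
after_zero_steps = random_walk_lookahead(star, start=1, steps=0)
after_one_step = random_walk_lookahead(star, start=1, steps=1)
```

A single step onto the hub reveals every leaf, which illustrates why look-ahead pays off most when a few vertices have very high degree.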

Theorem 4.3: Let $G_{n,d,\alpha,\beta}$ be a random graph with supernodes and consider normalized flooding from $v$ with TTL $\tau \approx (\log n)/(2\log(d-1))$. Then, in the 1-step replication scenario, the number of distinct responses is bounded from below and the number of messages is bounded from above (both bounds were slide images and are not preserved in this transcript).

Proof:

The number of minigroups seen is $(d-1)^{\tau-1}$, and by the Chernoff bounds there will be minigroups corresponding to large vertices.

Generalized Search Schemes
  • Searching procedure:
      • A node of degree $d$ initiates a search with a budget $k$

budget = number of messages that are propagated in the network

      • Among its $d$ neighbors, the node picks quantities $k_1, k_2, \dots, k_d$ such that $k_1 + k_2 + \dots + k_d = k$
      • For every neighbor $i$, the initiating node forwards the message with budget $k_i$ (for $k_i = 0$ the message is not transmitted)
      • Each neighbor $i$ reduces its budget by 1 unit and repeats the process while the budget is greater than 0
      • Every node that receives the message a second time, from another neighbor, forwards the message with the corresponding budget
      • Random Walks + Flooding
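
The procedure above can be sketched as a budget-splitting search in which the splitting rule is a parameter. The two rules shown are assumptions chosen for illustration: giving the whole budget to one random neighbor behaves like a random walk, and spreading it evenly behaves like a bounded flood. All function names are hypothetical.

```python
import random
from collections import deque

def split_random_walk(budget, neighbors, rng):
    """Give the whole budget to one uniformly chosen neighbor."""
    shares = {v: 0 for v in neighbors}
    shares[rng.choice(neighbors)] = budget
    return shares

def split_even(budget, neighbors, rng):
    """Spread the budget as evenly as possible over all neighbors."""
    base, extra = divmod(budget, len(neighbors))
    lucky = set(rng.sample(neighbors, extra))
    return {v: base + (1 if v in lucky else 0) for v in neighbors}

def budgeted_search(adj, start, budget, split, seed=0):
    """Propagate a query with a total message budget.

    Returns (set of nodes reached, number of messages sent).
    """
    rng = random.Random(seed)
    visited = {start}
    messages = 0
    queue = deque([(start, budget)])
    while queue:
        node, k = queue.popleft()
        if k <= 0 or not adj[node]:
            continue
        for nbr, share in split(k, adj[node], rng).items():
            if share == 0:
                continue                     # k_i = 0: nothing is transmitted
            messages += 1
            visited.add(nbr)
            queue.append((nbr, share - 1))   # the receiver spends one unit
    return visited, messages

# Ring of ten nodes, budget of six messages.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
walk_nodes, walk_msgs = budgeted_search(ring, 0, 6, split_random_walk)
flood_nodes, flood_msgs = budgeted_search(ring, 0, 6, split_even)
```

As in the slide, a node that receives the query again simply forwards it with whatever budget arrived, so both rules spend exactly the budget when every node has at least one neighbor.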
Experimental Evaluation
  • Methodology
    • Performance Metrics
      • Median and Mean number of distinct peers discovered (hits)
      • Minimum, Maximum, Standard Deviation of the number of hits
      • Number of messages
      • Granularity of number of messages
      • Response time
    • Topologies
      • Random d-Regular Graphs
      • Power Law Graphs
      • Bimodal topologies
      • Clustered topologies
Normalized Flooding (NF)
  • Mean number of unique peers discovered as a function of the initial TTL
  • NF and Standard Flooding behave similarly in Regular Graphs
  • NF controls the number of messages and provides higher efficiency
Normalized Flooding (NF)
  • The number of unique peers increases exponentially with TTL in NF case
  • The number of peers increases faster than exponentially with TTL in topologies with high degrees
Random Walk with LookAhead (RWLA)
  • RWLA performance is similar to long RW without lookahead (in terms of unique peers discovered)
  • RWLA response time is much smaller compared to standard RW
Edge Criticality & Searching with weights
  • Generalized Searching performs similarly to Standard Flooding in regular graphs
  • Generalized Searching behaves similarly to Standard Flooding in other topologies if normalized edge criticality is used.
Conclusions
  • Normalized Flooding (NF) could replace standard flooding in irregular graphs
  • RW with 1-step replication performs better than RW and NF in irregular graphs
  • Open for improvements:
      • Generalized schemes (analytic investigation)
      • Quantifying Directional flooding
“Random Walks in Peer-to-Peer (P2P) Networks”

Christos Gkantsidis, Milena Mihail, Amin Saberi

Outline
  • Motivation
  • Statistical Estimation and Random Walks (RW)
  • Searching
      • Methodology and Topologies importance
  • Construction and Summary
Motivation
  • Random Walks (RW) were proposed for constructing searching and topology maintenance protocols in P2P networks
      • RW improve searching performance as compared to flooding (Cao et al., 2002)
      • A RW approach to constructing and maintaining unstructured topologies provides good connectivity properties (i.e. constant degree, constant expansion)
  • Claim: RW approach is a good candidate
      • to simulate uniform sampling
      • the number of simulation steps required can be as low as the number of samples in independent uniform sampling
  • Searching and Overlay Topology Construction
      • RW searching performs better than flooding for the same number of messages in clustered and slowly changing topologies
      • Construction of P2P networks by random walks
Statistical Estimation & Random Walks
  • Coupon collection and Chernoff bounds
      • $n$ types of coupons; each draw yields one type, uniformly at random
      • $T_n$: time by which coupons of all $n$ types have been drawn
      • $T_{\alpha n}$: time by which $\alpha n$ distinct types have been encountered, $0 < \alpha < 1$
      • $X_1,\dots,X_k$: independent Bernoulli trials with $P[X_i = 1] = p_i$ and $P[X_i = 0] = 1 - p_i$
      • $p$: probability that a randomly drawn object has a particular property
      • the probability that the property is found in substantially fewer draws than its frequency in the search space suggests, and the quality of the estimator $X/k$, are bounded (the bounds were slide images and are not preserved in this transcript)
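
For reference, the standard textbook forms of these quantities are given below. They are stated as an aid to the reader and may differ in constants or presentation from what the slide showed; here $X = \sum_{i=1}^{k} X_i$, $\mu = \mathbb{E}[X]$, and $H_n$ is the $n$-th harmonic number.

```latex
% Coupon collection
\mathbb{E}[T_n] = n H_n = n \sum_{i=1}^{n} \frac{1}{i} \approx n \ln n,
\qquad
\mathbb{E}[T_{\alpha n}] = \sum_{i=0}^{\alpha n - 1} \frac{n}{n-i}
  \approx n \ln\frac{1}{1-\alpha}

% Chernoff bounds for independent Bernoulli trials, 0 < \delta < 1
\Pr[X \le (1-\delta)\mu] \le e^{-\mu\delta^{2}/2},
\qquad
\Pr[X \ge (1+\delta)\mu] \le e^{-\mu\delta^{2}/3},
\qquad
\mu = \sum_{i=1}^{k} p_i
```

Note that collecting a constant fraction $\alpha n$ of the types takes time linear in $n$, whereas collecting all of them costs an extra logarithmic factor.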
Statistical Estimation & Random Walks
  • Random Walks (RW), Convergence and Cover Time
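      • $G = (V,E)$: undirected graph, $|V| = n$, and $d_i$ the degree of vertex $i$
      • $A_{ij}$: adjacency matrix; $P$: transition matrix (its defining formula was a slide image and is not preserved in this transcript)
      • $f: V \to \{0,1\}$ (the condition it satisfies was a slide image and is not preserved in this transcript)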
      • Convergence rate metric - the rate at which the RW approaches the stationary distribution
      • Cover time metric - the time by which all nodes were visited
      • Trajectory sample average - the rate at which the value of f averaged over successive vertices of the RW trajectory approaches p
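
The sketch below illustrates convergence to the stationary distribution on a tiny graph. It assumes the usual simple random walk, $P_{ij} = A_{ij}/d_i$, whose stationary distribution is $\pi_i = d_i/(2|E|)$; the slide's own formula for $P$ is not preserved, so this choice is an assumption. The example graph (a triangle with one pendant vertex) is arbitrary.

```python
def transition_matrix(adj):
    """Row-stochastic matrix of the simple random walk: P[i][j] = A[i][j] / d_i."""
    n = len(adj)
    return [[adj[i].count(j) / len(adj[i]) for j in range(n)] for i in range(n)]

def step(dist, P):
    """One step of the walk applied to a probability vector: dist * P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
P = transition_matrix(adj)

dist = [1.0, 0.0, 0.0, 0.0]          # walk starts at vertex 0
for _ in range(200):
    dist = step(dist, P)

two_m = sum(len(nbrs) for nbrs in adj.values())          # 2|E|
stationary = [len(adj[i]) / two_m for i in range(len(adj))]
```

After enough steps `dist` agrees with `stationary` to numerical precision; how many steps are needed is governed by the second eigenvalue of `P`, which is the convergence-rate metric listed above.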
Statistical Estimation & Random Walks
  • Convergence rate is related to the second eigenvalue of P

(1)

      • $y_t$ – the vertex that the RW visited at time $t$
  • Cover time

(2)

  • Trajectory sample average

(3)

(1): [11], (2): [12, 13], (3): [3, 4, 5, 6]

Statistical Estimation & Random Walks
  • Second Eigenvalue, Expansion and Conductance
      • $S$: a subset of $V$; $C(S)$: the cutset of $S$ (edges with one endpoint in $S$ and the other in $V \setminus S$); $vol(S)$: the sum of the degrees of the vertices in $S$
      • Expansion
      • Conductance
      • Known bound

[11, 14, 15, 16, 17, 18, 19]
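
Because the slide's formulas are not preserved, the sketch below uses common textbook definitions as an assumption: the expansion of a set is $|C(S)|/|S|$ and its conductance is $|C(S)|/\min(vol(S), vol(V \setminus S))$; the graph-level quantities are minima over sets $S$ with at most half the vertices (or half the volume). Function names and the example graph are illustrative.

```python
def cut_size(adj, S):
    """Number of edges with exactly one endpoint in S."""
    S = set(S)
    return sum(1 for u in S for v in adj[u] if v not in S)

def volume(adj, S):
    """Sum of the degrees of the vertices in S."""
    return sum(len(adj[u]) for u in S)

def expansion(adj, S):
    """Cut edges per vertex of S (assumed definition)."""
    return cut_size(adj, S) / len(S)

def conductance(adj, S):
    """Cut edges relative to the smaller side's volume (assumed definition)."""
    S = set(S)
    rest = [u for u in adj if u not in S]
    return cut_size(adj, S) / min(volume(adj, S), volume(adj, rest))

# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
exp_01 = expansion(adj, {0, 1})       # 2 cut edges / 2 vertices
cond_01 = conductance(adj, {0, 1})    # 2 cut edges / volume 4
cond_3 = conductance(adj, {3})        # 1 cut edge / volume 1
```

Sets with low conductance are bottlenecks for a random walk, which is the intuition behind bounds that relate conductance to the second eigenvalue.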

Searching
  • Performance metrics for Flooding and RW
      • average number of distinct copies of an item located in the search
      • number of messages used by the searching algorithm
  • RW performs better than flooding if
      • multiple search requests for the same item with slow-changing topology
      • peer clustering ( see [20, 21, 22, 23, 24, 25] for details)
  • Searching analysis
      • Methodology
      • Flat topologies with Uniformly Distributed Content
      • Topologies with Peer Clustering
      • Re-issuing the Same Query
      • Real topologies
Searching - Methodology
  • Performance Metrics
      • mean of the number of distinct copies (i.e. Mean)
      • discrepancy around the mean (i.e. Std) and the failure probability
  • Cost
      • number of messages or queries performed during search
  • Peer-to-peer topologies ( ≈ 1 million nodes)
      • Flat regular expanders, Two tier topologies with clustering, Power law graphs, Samples from real topologies
  • Dynamic topologies
      • rewiring
  • Content placement
      • Content clustering affects the performance of searching
Searching – Flat Topologies
  • Experiment:
      • one request in a network of 500K peers
      • Mean hits, minimum number of hits, and Std are similar for Flooding and RW
      • the entire distribution of hits is similar for Flooding and RW
Searching - Topologies with Peer Clustering
  • Cluster topology consists of
      • 5 flat regular graphs of size 40K; from each one pick randomly 1000 nodes to construct another flat regular graph
  • Number of hits for RW is more concentrated around the mean compared to Flooding
Searching - Reissuing the Same Query
  • Experiment setup: repeat the following procedure 4 times
      • each peer sends a request and waits for the response
      • between requests, 2% of the links are rewired
      • each peer initiates a new search
  • RW have better performance than Flooding
      • Mean Hits and Failure Probability
Searching - Reissuing the Same Query
  • Performance of successive searches depends
      • on the number of topology changes considered between consecutive searches
  • Performance of Flooding increases as the rate of topological changes increases
  • RW Performance remains the same for small variations
Searching – Real Topologies
  • The number of hits for RW is more concentrated around the mean than in Flooding
  • P2P networks have good expansion properties
Construction
  • P2P network construction is concerned with:
      • peers arrive and leave the network dynamically
      • strong and weak decentralization
      • low network overhead per addition or deletion
Baseline Construction of Expander Graphs
  • $A_{BASE}$ (undirected graph) is constructed as follows:
      • each of the $n$ vertices chooses $d$ vertices at random
      • total number of edges = $nd$ and expected vertex degree = $2d$
  • Theorem 4.1: Let $G(V,E)$ be a graph constructed by $A_{BASE}$. Then $G$ is an expander with high probability, with a bound stated for a positive constant $\alpha < 1$ (the bound was a slide image and is not preserved in this transcript).
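
The baseline construction can be sketched in a few lines. The details are assumptions: each vertex draws its $d$ partners uniformly with replacement from the other vertices, and parallel edges are kept, so the edge count is exactly $nd$ and the mean degree exactly $2d$. The function name `a_base` is hypothetical.

```python
import random

def a_base(n, d, seed=0):
    """Each vertex picks d random other vertices; edges are undirected.

    Returns adjacency lists of the resulting multigraph.
    """
    rng = random.Random(seed)
    adj = {v: [] for v in range(n)}
    for u in range(n):
        for _ in range(d):
            v = rng.randrange(n - 1)
            if v >= u:
                v += 1              # skip u itself, so there are no self-loops
            adj[u].append(v)
            adj[v].append(u)
    return adj

adj = a_base(n=1000, d=3)
num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
mean_degree = sum(len(nbrs) for nbrs in adj.values()) / len(adj)
```

Each vertex contributes $d$ edges of its own and receives about $d$ more from others' choices, which gives the expected degree of $2d$ stated above.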

Baseline Construction of Expander Graphs with Constant Overhead in Random Bits
  • $A'_{BASE}$ construction algorithm:
      • start a RW at a random vertex of $H$ (a constant-degree expander graph)
      • whenever $A_{BASE}$ needs a random number, it is taken from the RW on $H$
  • Theorem 4.2: Let $G(V,E)$ be a graph constructed by $A'_{BASE}$. There are positive constants $\alpha$ and $0 < \beta < 0.5$ such that any subset $S$ of at least $\beta|V|$ and at most $0.5|V|$ vertices has cutset expansion $\alpha$ almost surely.

Distributed Construction of Expanders with Constant Overhead on Network Resources
  • $A'_H$ construction
      • $d$ daemons, one for each Hamilton cycle
      • a newly arriving node contacts the daemon associated with the $i$-th Hamilton cycle
      • after $c$ steps, it attaches between the peer that currently hosts daemon $i$ and one of that peer's neighbors in cycle $i$
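
The following is a loose sketch of the idea, not the authors' protocol. It assumes the overlay is a union of $d$ Hamilton cycles, that daemon $i$ performs $c$ random-walk steps before each insertion, and that the new node is spliced in between the daemon's host and that host's successor on cycle $i$. The walk here follows only forward cycle edges for brevity; all names are hypothetical.

```python
import random

def insert_into_cycle(succ, host, new):
    """Splice `new` between `host` and its successor on one cycle."""
    succ[new] = succ[host]
    succ[host] = new

def cycle_length(succ, start=0):
    """Number of nodes visited before the cycle returns to `start`."""
    count, v = 1, succ[start]
    while v != start:
        count, v = count + 1, succ[v]
    return count

def grow(num_nodes, d, c, seed=0):
    """Grow d Hamilton cycles over nodes 0..num_nodes-1, one arrival at a time."""
    rng = random.Random(seed)
    cycles = [{0: 1, 1: 2, 2: 0} for _ in range(d)]   # three seed nodes
    daemons = [0] * d                                  # host of each daemon
    for new in range(3, num_nodes):
        hosts = []
        for i in range(d):
            host = daemons[i]
            for _ in range(c):                         # daemon i walks c steps
                host = rng.choice(cycles)[host]
            hosts.append(host)
        for i, host in enumerate(hosts):               # attach on every cycle
            insert_into_cycle(cycles[i], host, new)
        daemons = hosts
    return cycles

cycles = grow(num_nodes=50, d=3, c=5)
lengths = [cycle_length(succ) for succ in cycles]
```

Each arrival costs $d$ walks of $c$ steps, independent of the network size, which is the sense in which the overhead per addition is constant.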
Distributed Construction of Expanders with Constant Overhead on Network Resources
  • $A'_M$ construction
      • $d$ daemons, one for each Hamilton cycle
      • an arrival consists of two nodes, X and Y; X and Y contact the central server to discover the location of the $d$ daemons
      • X becomes the neighbor of daemon $i$ and Y the neighbor of the initial daemon’s neighbor
Summary
  • For Searching
      • Random Walks (RW) are superior to Flooding
  • For Construction
      • RW add new peers with constant overhead
  • Open Problems
      • Strong Decentralized Construction algorithm
      • Can we better handle deletions and the expansion of small sets?
      • How do the P2P network parameters (e.g. capacities) affect the performance of RW?