Week 2: P2P Overlay Networks
Minseok Kwon
Department of Computer Science
Rochester Institute of Technology
jmk@cs.rit.edu
http://www.cs.rit.edu/~jmk
Peer-to-Peer (P2P) Networks
• Lookup problem: Given a network of peers and a keyword, how can we find the data?
• Structured P2P: Can we find a target efficiently, specifically in O(log n) steps?
[Figure: nodes N1 to N8 connected over the Internet. A publisher holds Key = “Davinci Code”, File = mpg data; a client issues Lookup(“Davinci Code”) to obtain File = mpg data.]
Distributed Hash Table (DHT)
• Question: Can we do the lookup in logarithmic time? If yes, how?
• From CS 3, we know how to do this when numbers are sorted in an array: run binary search!
• In P2P networks, we first compute node identifiers and key identifiers using a hash function.
  • Key-identifier = SHA-1(keyword)
  • Node-identifier = SHA-1(IP address)
• We then arrange nodes and data items in a certain order in the same ID space. This is called a DHT!
• Now, how can we map key IDs to node IDs to support the lookup?
• At the same time, the system must be scalable, robust, self-organizing, and completely decentralized.
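The two identifier formulas on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming identifiers are read as 160-bit big-endian integers; the helper names `sha1_id`, `key_id`, and `node_id` and the sample IP address are hypothetical, not part of the slides.

```python
import hashlib

ID_BITS = 160  # SHA-1 digests are 160 bits long


def sha1_id(text: str) -> int:
    """Map a string to an integer identifier in [0, 2**160)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def key_id(keyword: str) -> int:
    """Key-identifier = SHA-1(keyword)."""
    return sha1_id(keyword)


def node_id(ip_address: str) -> int:
    """Node-identifier = SHA-1(IP address)."""
    return sha1_id(ip_address)


if __name__ == "__main__":
    # Keys and nodes land in the same ID space, so they can be compared.
    print(hex(key_id("Davinci Code")))
    print(hex(node_id("192.0.2.10")))  # example address, an assumption
```

Because both kinds of identifier live in the same space, a DHT only has to define which node is responsible for which range of IDs.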
CAN: Insert/Retrieve Files
• Node I wants to add file F with keyword K.
  • First, get the target coordinates x = hx(K) and y = hy(K).
  • Second, route to the target (x, y).
  • Third, store (K, F) at the target.
• How can we retrieve F later?
  • Answer: get the target coordinates using the same hash functions.
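The insert and retrieve steps can be sketched as follows. The slides do not say how hx and hy are built, so this sketch assumes they are SHA-1 with two different salts, scaled to the unit square; the dictionary `store` is a hypothetical stand-in for "the node that owns the target point".

```python
import hashlib


def _unit_hash(salt: str, keyword: str) -> float:
    """Hash to a float in [0, 1) using the first 32 bits of a salted SHA-1."""
    digest = hashlib.sha1((salt + keyword).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") / 2 ** 32


def hx(keyword: str) -> float:
    return _unit_hash("x:", keyword)  # salt is an assumption


def hy(keyword: str) -> float:
    return _unit_hash("y:", keyword)  # salt is an assumption


def target_point(keyword: str) -> tuple:
    return (hx(keyword), hy(keyword))


store = {}  # point -> (K, F); stands in for the owner of each point


def insert(keyword: str, file_data: bytes) -> None:
    store[target_point(keyword)] = (keyword, file_data)


def retrieve(keyword: str) -> bytes:
    # The same hash functions give the same point, so the file is found again.
    return store[target_point(keyword)][1]
```

Retrieval needs no directory: any node that knows K can recompute (x, y) and route there.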
CAN: Routing
• Greedy approach
  • Forward to a neighbor that is closer to the target.
[Figure: a route from the point (a,b) to the target (x,y).]
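A minimal sketch of the greedy rule, assuming a simplified CAN in which the 2-D space is cut into a `side` x `side` grid of equal zones and each zone is identified by its integer cell coordinates. Real CAN zones have unequal sizes and the space wraps around as a torus; both are left out here, and the function names are hypothetical.

```python
import math


def neighbors(cell, side):
    """Cells that share an edge with `cell` (no wrap-around in this sketch)."""
    x, y = cell
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in candidates if 0 <= a < side and 0 <= b < side]


def greedy_route(start, target, side):
    """Repeatedly hand the message to the neighbor closest to the target."""
    path = [start]
    current = start
    while current != target:
        current = min(neighbors(current, side),
                      key=lambda c: math.dist(c, target))
        path.append(current)
    return path


if __name__ == "__main__":
    print(greedy_route((0, 0), (3, 2), side=4))
```

Each node needs to know only its immediate neighbors, which is what keeps per-node state small.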
CAN: Node Join
• A new node initially contacts a random node I.
• I routes the join request to a randomly picked target point (p, q), which lies in the zone of node X.
• Node X's zone is split in half, and the new node takes one half.
[Figure: the new node, node I, and node X, whose zone contains (p,q).]
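The zone split in the last step can be sketched like this. The slide only says the zone is split in half; splitting along the longer side (x on ties) is an assumption of this sketch, and the `Zone` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, point) -> bool:
        px, py = point
        return self.x0 <= px < self.x1 and self.y0 <= py < self.y1

    def split(self):
        """Return (half kept by node X, half handed to the new node)."""
        if (self.x1 - self.x0) >= (self.y1 - self.y0):
            mid = (self.x0 + self.x1) / 2
            return (Zone(self.x0, self.y0, mid, self.y1),
                    Zone(mid, self.y0, self.x1, self.y1))
        mid = (self.y0 + self.y1) / 2
        return (Zone(self.x0, self.y0, self.x1, mid),
                Zone(self.x0, mid, self.x1, self.y1))


if __name__ == "__main__":
    kept, given = Zone(0.0, 0.0, 1.0, 1.0).split()
    print(kept, given)
```

Only node X and its immediate neighbors have to update their state after the split.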
How Good is CAN?
• Suppose that there are n nodes and d dimensions.
• Scalable?
  • Inserting a new node affects only a single other node and its immediate neighbors.
  • A node only maintains state for its immediate neighboring nodes. The number of neighbors is 2d.
• Efficient?
  • Average routing path: (d/4) n^(1/d) hops.
• Robust?
  • Resilient to node and link failures
  • No single point of failure since the system is completely distributed.
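To get a feel for these two quantities, the formulas can be evaluated directly. The function names are hypothetical and the sample values of n and d are arbitrary.

```python
def can_avg_hops(n: int, d: int) -> float:
    """Average routing path length from the slide: (d/4) * n^(1/d)."""
    return (d / 4) * n ** (1 / d)


def can_neighbors(d: int) -> int:
    """Per-node state from the slide: 2d neighbors."""
    return 2 * d


if __name__ == "__main__":
    for d in (2, 3, 4):
        print(d, can_neighbors(d), round(can_avg_hops(4096, d), 2))
```

Raising d shortens paths at the cost of more neighbors per node; the per-node state does not depend on n at all.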
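Chord: Basic Lookup
• Circular 7-bit ID space
• Notation: K5 is key 5; N105 is node 105.
• A key is stored at its successor, the first node whose ID is equal to or follows the key's ID on the circle.
• Which node is K80 stored at?
[Figure: ring with nodes N32, N90, N105 and keys K5, K20, K80.]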
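The successor rule can be checked with a short sketch using the node and key IDs from the figure. The function name and the use of a sorted list with binary search are illustrative choices, not something the slides prescribe.

```python
import bisect

M = 7  # 7-bit circular ID space, as on the slide


def successor(node_ids, key):
    """First node whose ID is >= key, wrapping around past 2**M - 1."""
    ring = sorted(node_ids)
    i = bisect.bisect_left(ring, key % 2 ** M)
    return ring[i % len(ring)]


if __name__ == "__main__":
    nodes = [105, 32, 90]
    for key in (5, 20, 80):
        print(f"K{key} is stored at N{successor(nodes, key)}")
```

With the nodes from the figure, K80 lands on N90, and a key above 105 wraps around to N32.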
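Chord: Advanced Lookup
• Finger i points to the successor of n + 2^i.
• Lookup takes O(log(N)) steps.
[Figures: (1) finger diagram with N80 and N120, the fingers spanning 1/2, 1/4, 1/8, 1/16, 1/32 of the ring; (2) Lookup(K19) on a ring with nodes N5, N10, N20, N32, N60, N80, N99, N110.]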
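A sketch of finger tables and finger-based lookup, using the node IDs from the Lookup(K19) figure. It assumes a static ring with global knowledge used only to build the tables, and that the lookup starts at N80; the helper names are hypothetical, and a real Chord node would learn its fingers through messages.

```python
import bisect

M = 7                # bits in the identifier space
RING_SIZE = 2 ** M


def successor(ring, ident):
    """First node at or after ident on the sorted ring, wrapping around."""
    i = bisect.bisect_left(ring, ident % RING_SIZE)
    return ring[i % len(ring)]


def fingers(ring, n):
    """Finger i of node n is successor(n + 2^i), for i = 0 .. M-1."""
    return [successor(ring, n + 2 ** i) for i in range(M)]


def between(x, a, b, inclusive_end=False):
    """True if x is in the ring interval (a, b), or (a, b] if inclusive_end."""
    if inclusive_end and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b  # the interval wraps past zero


def lookup(ring, start, key):
    """Return (node storing key, list of nodes visited)."""
    n, path = start, [start]
    while True:
        succ = successor(ring, n + 1)
        if between(key, n, succ, inclusive_end=True):
            return succ, path + [succ]
        # Closest preceding finger: the farthest finger that does not pass key.
        n = next(f for f in reversed(fingers(ring, n)) if between(f, n, key))
        path.append(n)


if __name__ == "__main__":
    ring = [5, 10, 20, 32, 60, 80, 99, 110]
    print(lookup(ring, start=80, key=19))
```

Each hop roughly halves the remaining distance to the key, which is where the O(log(N)) bound comes from.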
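Chord: Join
• N36 wants to join the ring between N25 and N40.
• Before the join, N40 stores K30 and K38.
• After the join, N36 is the successor of K30, so K30 moves to N36; K38 stays at N40.
[Figure: ring segment with N25, N36, N40 and keys K30, K38.]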
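The key hand-over at join time can be sketched as follows, assuming the joining node is N36 and that each node's keys are kept in a plain dictionary. The `join` function is hypothetical and ignores finger-table updates and concurrent joins.

```python
import bisect


def successor(ring, ident):
    """First node at or after ident on the sorted ring, wrapping around."""
    i = bisect.bisect_left(ring, ident)
    return ring[i % len(ring)]


def join(store, new_id):
    """Add node new_id; move to it the keys it is now the successor of.

    store maps node ID -> set of key IDs held by that node.
    Returns the set of keys that moved.
    """
    old_ring = sorted(store)
    succ = successor(old_ring, new_id)      # node that currently holds the keys
    new_ring = sorted(old_ring + [new_id])
    moved = {k for k in store[succ] if successor(new_ring, k) == new_id}
    store[succ] -= moved
    store[new_id] = moved
    return moved


if __name__ == "__main__":
    store = {25: set(), 40: {30, 38}}
    print(join(store, 36), store)
```

Only the new node's successor gives up keys, so a join touches a small part of the ring.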
How Good is Chord?
• Suppose that there are n nodes.
• Scalable?
  • O(log(n)) routing state per node
• Efficient?
  • O(log(n)) messages (or steps) per lookup
• Robust?
  • Survives massive failures
Is DHT a Silver Bullet?
• What are the dominant P2P applications?
  • Mass-market file sharing, mostly for music, video, and software
• Potential problems with these applications?
  • Wide range of heterogeneity
  • Large transient user population
• Flooding does not scale; how about DHTs?
  • DHTs scale, especially for finding the location of a given filename.
  • One problem with DHTs: they require a lot of extra work, including caching and keyword searching.
• Do we really need DHTs for mass-market file sharing?
  • Not necessarily!
  • DHTs are great at finding rare files, but most queries are about popular files.
Suggested Solution: GIA
• Can we make Gnutella-like P2P systems scalable?
  • Our answer is yes, and we designed GIA!
• Idea
  • Unstructured (based on flooding), but takes node capacity into account.
  • High-capacity nodes have room for more queries; why not send more queries to them?
• This will work only if:
  • High-capacity nodes have correspondingly more answers.
  • High-capacity nodes are easily reachable from other nodes.
GIA (Gianduia) Design
• Make high-capacity nodes easily reachable
  • Dynamic topology adaptation
• Make high-capacity nodes have more answers
  • One-hop replication
• Search efficiently
  • Biased random walks
• Prevent overloaded nodes
  • Active flow control
Dynamic Topology Adaptation
• Goals
  • High-capacity nodes are high-degree nodes.
  • Low-capacity nodes are close to higher-capacity ones.
• How does a new node join?
  • A new node selects the node with the maximum capacity (greater than its own) among a small number of randomly selected nodes.
  • Each node uses a level of satisfaction, computed as a function of capacity, degree, and age.
  • Neighbors must have outgoing capacity to handle forwarded queries.
[Figure: a new node attaching to a high-capacity node.]
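The neighbor-selection rule for a joining node can be sketched as below. The sample size of 3, the dictionary of capacities, and the name `pick_neighbor` are assumptions of this sketch; the satisfaction-level computation is not shown because the slide does not give its formula.

```python
import random


def pick_neighbor(capacities, my_capacity, sample_size=3, rng=random):
    """Sample a few known nodes; return the highest-capacity one that beats us.

    capacities maps node name -> capacity. Returns None if no sampled node
    has a capacity greater than my_capacity.
    """
    names = sorted(capacities)
    candidates = rng.sample(names, min(sample_size, len(names)))
    better = [n for n in candidates if capacities[n] > my_capacity]
    if not better:
        return None
    return max(better, key=capacities.get)


if __name__ == "__main__":
    known = {"a": 1, "b": 10, "c": 100, "d": 1000}
    print(pick_neighbor(known, my_capacity=5))
```

Because every joiner prefers the strongest node it sees, high-capacity nodes tend to accumulate neighbors, which matches the first goal above.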
One-hop Replication
• Content information is exchanged during connection, and updated incrementally.
• High-capacity peers can act as a proxy for low-capacity peers.
• Nodes keep an index of their neighbors’ shared files.
[Figure: a high-capacity node maintains indices for both itself and its neighbors.]
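A minimal sketch of the neighbor index, assuming file names are exchanged in full when two nodes connect. Incremental updates are omitted, and the `Node` class and its methods are hypothetical.

```python
class Node:
    def __init__(self, name, files):
        self.name = name
        self.files = set(files)
        self.index = {}  # neighbor name -> set of that neighbor's file names

    def connect(self, other):
        """Exchange content information in both directions on connection."""
        self.index[other.name] = set(other.files)
        other.index[self.name] = set(self.files)

    def answer(self, filename):
        """Names of nodes known to hold filename: this node or a neighbor."""
        holders = [self.name] if filename in self.files else []
        holders += [n for n, files in self.index.items() if filename in files]
        return holders


if __name__ == "__main__":
    hub = Node("hub", ["a.mp3"])
    leaf = Node("leaf", ["b.mp3"])
    hub.connect(leaf)
    print(hub.answer("b.mp3"))
```

A query that reaches a well-connected node can therefore be answered on behalf of all of its neighbors without another hop.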
Active Flow Control
• A sender sends queries to a neighbor only if that neighbor accepts queries.
• How does the sender know whether a neighbor accepts queries?
  • If that neighbor has sent a token, it accepts queries.
• Active flow control periodically assigns tokens to neighbors.
• The token allocation rate depends on query-processing capability and the buffer queue.
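The sender side of this scheme can be sketched as a per-neighbor token counter. It assumes one token permits one query; how many tokens a node grants per period is left out, since the slide does not give the allocation formula. The `FlowControl` class is hypothetical.

```python
from collections import defaultdict


class FlowControl:
    """Tracks the tokens this node has received from each neighbor."""

    def __init__(self):
        self.tokens = defaultdict(int)
        self.sent = []  # (neighbor, query) pairs, standing in for the network

    def receive_tokens(self, neighbor, count):
        """Called when a neighbor periodically assigns us tokens."""
        self.tokens[neighbor] += count

    def try_send(self, neighbor, query):
        """Send only if we hold a token from that neighbor; spend it."""
        if self.tokens[neighbor] <= 0:
            return False
        self.tokens[neighbor] -= 1
        self.sent.append((neighbor, query))
        return True


if __name__ == "__main__":
    fc = FlowControl()
    fc.receive_tokens("n1", 1)
    print(fc.try_send("n1", "Davinci Code"), fc.try_send("n1", "Davinci Code"))
```

A neighbor that is overloaded simply stops handing out tokens, so senders hold queries back instead of having them dropped.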
Search Protocol
• Biased random walk
  • Rather than forwarding incoming queries to randomly chosen neighbors, a node selects the highest-capacity neighbor for which it has flow-control tokens and sends the query to that neighbor.
  • Use GUIDs to send queries along different paths.
  • TTL and max_responses bound propagation.
• Advantage: reduces flooding
• Disadvantage: sensitive to peer failures
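A sketch of the forwarding rule combined with a TTL bound. It assumes a single walker, one token per forwarded query, and a known set of nodes that hold the file; GUID-based path diversity and max_responses are not modeled. The function and parameter names are hypothetical.

```python
def biased_walk(graph, capacity, tokens, start, holders, ttl):
    """Walk toward high-capacity neighbors until a holder is hit or TTL ends.

    graph:    node -> list of neighbor nodes
    capacity: node -> capacity
    tokens:   (sender, receiver) -> tokens the sender holds for the receiver
    Returns (path, found).
    """
    path = [start]
    node = start
    while ttl > 0 and node not in holders:
        eligible = [n for n in graph[node] if tokens.get((node, n), 0) > 0]
        if not eligible:
            break  # no neighbor currently accepts queries
        nxt = max(eligible, key=lambda n: capacity[n])
        tokens[(node, nxt)] -= 1  # forwarding spends one token
        node = nxt
        path.append(node)
        ttl -= 1
    return path, node in holders


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
    capacity = {"a": 1, "b": 10, "c": 5, "d": 100}
    tokens = {("a", "b"): 1, ("a", "c"): 1, ("b", "a"): 1, ("b", "d"): 1}
    print(biased_walk(graph, capacity, tokens, "a", {"d"}, ttl=4))
```

Compared with flooding, one query costs at most `ttl` messages, but a failed peer on the path ends the walk, which is the sensitivity noted above.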
Transient Behavior • GIA outperforms under heavy churn.
Summary • GIA: scalable Gnutella • 3-5 orders of magnitude improvement in system capacity. • Unstructured approach is good enough • DHTs may be overkill!