
Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit


Presentation Transcript


  1. Week 2: P2P Overlay Networks Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit.edu http://www.cs.rit.edu/~jmk

  2. Peer-to-Peer (P2P) Networks • Lookup problem: Given a network of peers and a keyword, how can we find the data? • Structured P2P: Can we find a target efficiently, specifically in O(log n) steps? [Figure: a publisher stores Key = “Davinci Code”, File = mpg data somewhere in an overlay of nodes N1 through N8 on the Internet; a client issues Lookup(“Davinci Code”) and receives the mpg data.]

  3. Distributed Hash Table (DHT) • Question: Can we do the lookup in logarithmic time? If yes, how? • In CS 3, we learned how to do this when numbers are sorted in an array: run binary search! • In P2P networks, we first compute node identifiers and key identifiers using a hash function. • Key-identifier = SHA-1(keyword) • Node-identifier = SHA-1(IP address) • We then arrange nodes and data items in a well-defined order in the same ID space. This is called a Distributed Hash Table (DHT)! • Now, how can we map key IDs to node IDs to support the lookup? • At the same time, the system must be scalable, robust, self-organizing, and completely decentralized.
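
Below is a minimal Python sketch of the two identifier formulas, using the standard hashlib module. The function name sha1_id, the optional reduction to an m-bit space (useful for toy rings such as a 7-bit one), and the example IP address are illustrative assumptions. They are not part of any particular DHT.

```python
import hashlib

def sha1_id(text, m=160):
    """SHA-1 of text as an integer, reduced to an m-bit identifier space."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

# Keys and nodes are hashed into the same identifier space.
key_id = sha1_id("Davinci Code")              # Key-identifier = SHA-1(keyword)
node_id = sha1_id("192.0.2.10")               # Node-identifier = SHA-1(IP address); example IP
small_key_id = sha1_id("Davinci Code", m=7)   # same key in a toy 7-bit space
```

Because both kinds of identifier live in one space, "which node owns this key" becomes a question about ordering numbers, much like the binary-search analogy on the slide.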

  4. CAN: Insert/Retrieve Files • Node I wants to add file F with keyword K. • First, compute the target coordinates x = hx(K) and y = hy(K). • Second, route to the target point (x, y). • Third, store (K, F) at the node whose zone contains (x, y). • How can we retrieve F later? • Answer: recompute the target coordinates using the same hash functions and route to (x, y) again. [Figure: node I hashes K to the point (x, y) and stores (K, F) there.]
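
The insert/retrieve steps can be sketched in Python as a toy model. It assumes two hypothetical hash functions hx and hy, derived here from SHA-1 with different prefixes, and a fixed set of rectangular zones on the unit square. Real CAN routes to (x, y) hop by hop, whereas owner() below simply scans every zone.

```python
import hashlib

def unit_hash(label, key):
    """Hypothetical hx / hy: SHA-1 of "label:key" scaled into [0, 1)."""
    digest = hashlib.sha1(f"{label}:{key}".encode("utf-8")).digest()
    top53 = int.from_bytes(digest[:8], "big") >> 11   # 53 bits, so the float stays below 1
    return top53 / 2 ** 53

def target_point(key):
    """Target coordinates (x, y) = (hx(K), hy(K))."""
    return unit_hash("x", key), unit_hash("y", key)

def owner(zones, point):
    """Node whose half-open zone ((x0, x1), (y0, y1)) contains point."""
    x, y = point
    for node, ((x0, x1), (y0, y1)) in zones.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return node
    raise LookupError("no zone covers this point")

class ToyCAN:
    """Insert/retrieve over fixed zones; routing is replaced by a global scan."""
    def __init__(self, zones):
        self.zones = zones
        self.store = {node: {} for node in zones}

    def insert(self, key, data):
        node = owner(self.zones, target_point(key))   # steps 1 and 2
        self.store[node][key] = data                   # step 3
        return node

    def retrieve(self, key):
        node = owner(self.zones, target_point(key))   # same hash functions, same point
        return self.store[node].get(key)

zones = {
    "A": ((0.0, 0.5), (0.0, 0.5)), "B": ((0.5, 1.0), (0.0, 0.5)),
    "C": ((0.0, 0.5), (0.5, 1.0)), "D": ((0.5, 1.0), (0.5, 1.0)),
}
can = ToyCAN(zones)
stored_at = can.insert("Davinci Code", "mpg data")
print(stored_at, can.retrieve("Davinci Code"))
```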

  5. CAN: Routing • Greedy approach • Forward the message to the neighbor that is closest to the target. [Figure: a message routed greedily from (a, b) to (x, y).]
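
Greedy routing can be sketched under a simplifying assumption. The coordinate space is taken to be a d-dimensional torus cut into k cells per dimension, with one node per equal-size cell, so each node has 2d neighbors. The names greedy_route, neighbors, and torus_dist are hypothetical. Real CAN zones have unequal sizes, and distance is measured to zone coordinates rather than to grid cells.

```python
def torus_dist(a, b, k):
    """Manhattan distance between grid cells a and b on a k-per-side torus."""
    return sum(min(abs(p - q), k - abs(p - q)) for p, q in zip(a, b))

def neighbors(cell, k):
    """The 2d cells adjacent to cell (all distinct when k >= 3)."""
    out = []
    for dim in range(len(cell)):
        for step in (1, -1):
            nxt = list(cell)
            nxt[dim] = (nxt[dim] + step) % k
            out.append(tuple(nxt))
    return out

def greedy_route(src, dst, k):
    """Repeatedly hop to the neighbor closest to dst until dst is reached."""
    path = [src]
    while path[-1] != dst:
        here = path[-1]
        path.append(min(neighbors(here, k), key=lambda c: torus_dist(c, dst, k)))
    return path

route = greedy_route((0, 0), (2, 3), k=4)   # 2-D space split into 4 x 4 zones
print(route)
```

On this 4 x 4 grid, the route from (0, 0) to (2, 3) takes three hops, and the last hop wraps around the y edge.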

  6. CAN: Node Join • A new node initially contacts a random existing node I. • I routes the join request to a randomly picked target point (p, q), which lies in node X's zone. • X splits its zone in half and hands one half to the new node. [Figure: the new node joins via I; the zone of X, which contains (p, q), is split.]
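
Below is a sketch of the zone split on join. It assumes 2-D zones stored as half-open rectangles and a split along the zone's longer side, with ties splitting x first. Which half the new node receives is an arbitrary choice here. Routing the join request to (p, q) is replaced by a direct containment search, and neighbor-set updates are omitted. split_zone and join are hypothetical names.

```python
def split_zone(zone):
    """Split ((x0, x1), (y0, y1)) in half along its longer side."""
    (x0, x1), (y0, y1) = zone
    if (x1 - x0) >= (y1 - y0):
        mid = (x0 + x1) / 2
        return ((x0, mid), (y0, y1)), ((mid, x1), (y0, y1))
    mid = (y0 + y1) / 2
    return ((x0, x1), (y0, mid)), ((x0, x1), (mid, y1))

def join(zones, new_node, p, q):
    """Find X, the owner of (p, q), and give the new node half of X's zone."""
    x_node = next(
        n for n, ((x0, x1), (y0, y1)) in zones.items()
        if x0 <= p < x1 and y0 <= q < y1
    )
    keep, give = split_zone(zones[x_node])
    zones[x_node] = keep
    zones[new_node] = give
    return x_node

zones = {"A": ((0.0, 1.0), (0.0, 1.0))}   # first node owns the whole space
join(zones, "B", 0.7, 0.2)                 # A is split along x
join(zones, "C", 0.8, 0.9)                 # B is split along y
print(zones)
```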

  7. How Good is CAN? • Suppose that there are n nodes and d dimensions. • Scalable? • Inserting a new node affects only a single other node and its immediate neighbors. • A node only maintains state for its immediate neighboring nodes; the number of neighbors is 2d. • Efficient? • Average routing path: (d/4)·n^(1/d) hops. • Robust? • Resilient to node and link failures • No single point of failure, since the system is completely distributed.
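
The two cost figures can be tabulated with a few lines of Python. can_costs is a hypothetical helper that evaluates 2d and (d/4)·n^(1/d), assuming the idealized case of equal-size zones.

```python
def can_costs(n, d):
    """Per-node neighbor count and average path length for n nodes in d dimensions."""
    neighbor_count = 2 * d                 # two neighbors per dimension
    avg_hops = (d / 4) * n ** (1 / d)      # (d/4) * n^(1/d)
    return neighbor_count, avg_hops

for d in (2, 3, 4):
    neighbor_count, hops = can_costs(10_000, d)
    print(f"d={d}: {neighbor_count} neighbors, about {hops:.1f} hops")
```

Raising d shortens paths but increases per-node state. This is the trade-off behind the slide's "Scalable?" and "Efficient?" bullets.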

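  8. Chord: Basic Lookup • A key is stored at its successor: the first node whose ID is equal to or follows the key ID on the circular ID space. • Notation: Node 105 = N105, Key 5 = K5. [Figure: circular 7-bit ID space with nodes N32, N90, and N105 and keys K5, K20, and K80.] • Which node is K80 stored at? N90.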
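
The successor rule can be sketched with Python's bisect module on a sorted list of node IDs. The sketch assumes a global view of all node IDs, which a real Chord node does not have. It only illustrates where a key lives.

```python
from bisect import bisect_left

def successor(node_ids, key_id, m=7):
    """First node whose ID is >= key_id, wrapping around the 2^m ring."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id % (2 ** m))
    return ring[i % len(ring)]   # past the largest ID, wrap to the smallest

nodes = [32, 90, 105]
print(successor(nodes, 80))   # K80 -> 90
```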

  9. Chord: Advanced Lookup • Finger i points to the successor of n + 2^i. • Lookup takes O(log N) steps. [Figure: N80's fingers cover 1/2, 1/4, 1/8, 1/16, and 1/32 of the ring; Lookup(K19) on a ring with nodes N5, N10, N20, N32, N60, N80, N99, N110, and N120.]
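
The finger-table lookup can be simulated on the node IDs from the figure. The simulation assumes a 7-bit ID space and a global, up-to-date view of the ring, with no joins, failures, or stabilization. At each step, the current node either finds the key between itself and its successor, or jumps to its closest preceding finger, following the usual Chord find-successor pattern. The helper names are hypothetical.

```python
from bisect import bisect_left

M = 7                                                   # 7-bit identifier space
RING = 2 ** M
NODES = sorted([5, 10, 20, 32, 60, 80, 99, 110, 120])   # node IDs from the figure

def successor(ident):
    """First node ID >= ident on the ring (global view, simulation only)."""
    i = bisect_left(NODES, ident % RING)
    return NODES[i % len(NODES)]

def fingers(n):
    """Finger i of node n points to successor(n + 2^i), for i = 0 .. M-1."""
    return [successor(n + 2 ** i) for i in range(M)]

def in_open(x, a, b):
    """x lies in the ring interval (a, b)."""
    return a < x < b if a < b else (x > a or x < b)

def in_half_open(x, a, b):
    """x lies in the ring interval (a, b]."""
    return a < x <= b if a < b else (x > a or x <= b)

def lookup(start, key):
    """Return (node storing key, list of nodes visited)."""
    n, path = start, [start]
    while True:
        succ = successor(n + 1)
        if in_half_open(key, n, succ):      # key lies between n and its successor
            if succ != n:
                path.append(succ)
            return succ, path
        # Jump to the farthest finger that still precedes the key.
        n = next((f for f in reversed(fingers(n)) if in_open(f, n, key)), succ)
        path.append(n)

node, path = lookup(80, 19)   # Lookup(K19) issued at N80
print(node, path)
```

For Lookup(K19) started at N80, this sketch visits N80, N120, N10, and finally N20, the successor of K19. Each finger jump at least halves the remaining distance, which is where the O(log N) bound comes from.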

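  10. Chord: Join • N36 wants to join a ring where N25 is followed by N40, which stores K30 and K38. • After joining between N25 and N40, N36 becomes the successor of K30 and takes it over from N40; K38 stays at N40. [Figure: ring segment N25, N36, N40 before and after the join.]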
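
The key handoff on join can be sketched as follows. keys_moving_to is a hypothetical helper. The sketch assumes the new node already knows its predecessor and its successor's keys, and it ignores finger-table updates and stabilization.

```python
def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b      # interval wraps past zero (a == b: whole ring)

def keys_moving_to(new_id, pred_id, succ_keys):
    """Split the successor's keys into (moving to the new node, staying put)."""
    moving = sorted(k for k in succ_keys if in_half_open(k, pred_id, new_id))
    staying = sorted(k for k in succ_keys if k not in moving)
    return moving, staying

print(keys_moving_to(36, 25, [30, 38]))   # N36 joins between N25 and N40 -> ([30], [38])
```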

  11. How Good is Chord? • Suppose that there are n nodes. • Scalable? • O(log n) state per node • Efficient? • O(log n) messages (or steps) per lookup • Robust? • Survives massive failures

  12. Is DHT a Silver Bullet? • What are the dominant P2P applications? • Mass-market file sharing, mostly for music, video, and software • Potential problems with these applications? • Wide range of heterogeneity • Large transient user population • Flooding does not scale; how about DHTs? • DHTs scale, especially for finding the location of a given filename. • One problem with DHTs: they require a lot of extra work, including caching and keyword searching. • Do we really need DHTs for mass-market file sharing? • NOT necessarily! • DHTs are great at finding rare files, but most queries are for popular files.

  13. Suggested Solution: GIA • Can we make Gnutella-like P2P systems scalable? • Our answer is yes, and we designed GIA! • Idea • Unstructured (based on flooding), but takes node capacity into account. • High-capacity nodes have room for more queries; why not send more queries to them? • This will work only if: • High-capacity nodes have correspondingly more answers. • High-capacity nodes are easily reachable from other nodes.

  14. GIA (Gianduia) Design • Make high-capacity nodes easily reachable: dynamic topology adaptation • Make high-capacity nodes have more answers: one-hop replication • Search efficiently: biased random walks • Prevent overloaded nodes: active flow control

  15. Dynamic Topology Adaptation • Goals • High-capacity nodes are high-degree nodes. • Low-capacity nodes are close to higher-capacity ones. • How does a new node join? • A new node selects, among a small number of randomly selected nodes, the one with the highest capacity greater than its own. • Each node uses a level of satisfaction, computed as a function of capacity, degree, and age, to decide whether to keep adding neighbors. • Neighbors must have outgoing capacity to handle forwarded queries. [Figure: a new node attaching to a high-capacity node.]
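
The join rule can be sketched in a few lines of Python. pick_neighbor, the sample size, and returning None when no sampled node beats the newcomer's own capacity are assumptions made for illustration. The satisfaction-level computation is not modeled.

```python
import random

def pick_neighbor(own_capacity, candidates, sample_size, rng=random):
    """candidates maps node -> advertised capacity; returns a node name or None."""
    pool = list(candidates.items())
    sample = rng.sample(pool, min(sample_size, len(pool)))
    # Only nodes with strictly higher capacity than the newcomer qualify.
    better = [(cap, node) for node, cap in sample if cap > own_capacity]
    if not better:
        return None            # assumption: the caller retries with a fresh sample
    return max(better)[1]      # highest capacity wins

peers = {"N1": 10, "N2": 1000, "N3": 100, "N4": 1}
choice = pick_neighbor(own_capacity=50, candidates=peers, sample_size=3, rng=random.Random(7))
print(choice)
```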

  16. One-hop Replication • Nodes keep an index of their neighbors' shared files. • Content information is exchanged when a connection is set up, and is then updated incrementally. • High-capacity peers can act as proxies for low-capacity peers. [Figure: a high-capacity node maintains indices for both itself and its neighbors.]
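
Below is a minimal Python sketch of the one-hop index. OneHopIndex and its method names are hypothetical. The sketch models only the index itself: a full copy of each neighbor's file list at connect time, followed by incremental add/remove updates. It does not model the wire protocol.

```python
class OneHopIndex:
    """A node's index of its own files plus each neighbor's shared files."""

    def __init__(self, own_files):
        self.own = set(own_files)
        self.neighbor_files = {}            # neighbor -> set of file names

    def on_connect(self, neighbor, files):
        self.neighbor_files[neighbor] = set(files)      # full exchange at connect

    def on_update(self, neighbor, added=(), removed=()):
        files = self.neighbor_files.setdefault(neighbor, set())
        files.update(added)                             # incremental changes only
        files.difference_update(removed)

    def on_disconnect(self, neighbor):
        self.neighbor_files.pop(neighbor, None)

    def answer(self, filename):
        """Answer on behalf of itself and its neighbors (one-hop replication)."""
        hits = ["self"] if filename in self.own else []
        hits += sorted(n for n, fs in self.neighbor_files.items() if filename in fs)
        return hits

idx = OneHopIndex(["a.mp3"])
idx.on_connect("N1", ["b.mp3"])
idx.on_connect("N2", ["b.mp3", "c.mp3"])
print(idx.answer("b.mp3"))   # -> ['N1', 'N2']
```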

  17. Active Flow Control • A sender sends queries to a neighbor only if that neighbor accepts queries. • How does the sender know whether the neighbor accepts queries? • The neighbor has sent it a token. • Active flow control periodically assigns tokens to neighbors. • The token allocation rate depends on the node's query-processing capability and its query buffer queue.
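
Below is a toy model of token-based flow control. The allocation policy is an assumption chosen for simplicity: a per-period budget limited by processing capacity and free queue space, split evenly among neighbors. GIA's actual allocation policy may differ. allocate_tokens and TokenSender are hypothetical names.

```python
from collections import defaultdict

def allocate_tokens(neighbors, capacity_per_period, queued, queue_limit):
    """Receiver side: tokens to hand out this period, split evenly among neighbors."""
    if not neighbors:
        return {}
    # Never promise more queries than we can process or buffer.
    budget = max(0, min(capacity_per_period, queue_limit - queued))
    share, extra = divmod(budget, len(neighbors))
    return {n: share + (1 if i < extra else 0) for i, n in enumerate(neighbors)}

class TokenSender:
    """Sender side: forward a query to a neighbor only while holding its tokens."""

    def __init__(self):
        self.tokens = defaultdict(int)

    def grant(self, neighbor, count):
        self.tokens[neighbor] += count

    def try_send(self, neighbor):
        if self.tokens[neighbor] <= 0:
            return False            # neighbor has not signaled that it accepts queries
        self.tokens[neighbor] -= 1
        return True

sender = TokenSender()
sender.grant("A", 1)
print(sender.try_send("A"), sender.try_send("A"))   # -> True False
```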

  18. Search Protocol • Biased random walk • Rather than forwarding an incoming query to a randomly chosen neighbor, a node selects the highest-capacity neighbor for which it has flow-control tokens and sends the query to that neighbor. • A query's GUID lets a node recognize a query it has already forwarded and send it along a different path. • TTL and max_responses bound propagation. • Advantage: avoids flooding • Disadvantage: sensitive to peer failures
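
Below is a minimal sketch of a single biased random walk. It assumes a static topology, per-node token counts, and a set of nodes that can answer the query. It reads the GUID bullet as follows: the walk remembers which neighbors each node has already used for this query and does not reuse them. The function and parameter names are hypothetical.

```python
from collections import defaultdict

def biased_walk(start, neighbors, capacity, tokens, holders, ttl, max_responses):
    """Forward one query (one GUID) until TTL expires or enough responses arrive."""
    used = defaultdict(set)      # per node: neighbors this GUID was already sent to
    responses, path, node = [], [start], start
    while True:
        if node in holders and node not in responses:
            responses.append(node)
        if ttl == 0 or len(responses) >= max_responses:
            break
        options = [n for n in neighbors[node]
                   if tokens[node].get(n, 0) > 0 and n not in used[node]]
        if not options:
            break                # no token or unused neighbor left: walk stops
        nxt = max(options, key=lambda n: capacity[n])   # bias toward capacity
        used[node].add(nxt)
        tokens[node][nxt] -= 1   # spend one flow-control token
        node, ttl = nxt, ttl - 1
        path.append(node)
    return responses, path

neighbors = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
capacity = {"A": 1, "B": 10, "C": 100, "D": 5}
tokens = {n: {m: 1 for m in nbrs} for n, nbrs in neighbors.items()}
responses, path = biased_walk("A", neighbors, capacity, tokens,
                              holders={"D"}, ttl=5, max_responses=1)
print(responses, path)   # path: A -> C (highest capacity) -> D
```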

  19. Performance

  20. Transient Behavior • GIA outperforms under heavy churn.

  21. Summary • GIA: a scalable Gnutella • 3-5 orders of magnitude improvement in system capacity • An unstructured approach is good enough • DHTs may be overkill!
