
Peer-to-Peer Networks


Presentation Transcript


  1. Peer-to-Peer Networks Outline Overview Pastry OpenDHT Contributions from Peter Druschel & Sean Rhea CS 561

  2. What is Peer-to-Peer About? • Distribution • Decentralized control • Self-organization • Symmetric communication • P2P networks do two things: • Map objects onto nodes • Route requests to the node responsible for a given object CS 561

  3. Object Distribution • Consistent hashing [Karger et al. ‘97] • 128-bit circular id space, from 0 to 2^128 - 1 • nodeIds (uniform random) • objIds (uniform random) • Invariant: the node with the numerically closest nodeId maintains the object CS 561
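
A minimal sketch of this placement rule, assuming random 128-bit ids; the helper names and the MD5-based objId are illustrative choices for the example, not Pastry's actual hashing:

```python
# Pastry-style object placement on a 128-bit circular id space: each
# object lives on the node whose nodeId is numerically closest to its
# objId. Ids and helper names here are illustrative, not Pastry code.
import hashlib
import random

RING = 2 ** 128

def ring_distance(a: int, b: int) -> int:
    """Shorter of the two distances around the circular id space."""
    d = abs(a - b) % RING
    return min(d, RING - d)

def obj_id(name: str) -> int:
    """Hash an object name into the 128-bit id space (MD5 is 128 bits)."""
    return int.from_bytes(hashlib.md5(name.encode()).digest(), "big")

def responsible_node(node_ids: list[int], oid: int) -> int:
    """Return the nodeId numerically closest to the objId."""
    return min(node_ids, key=lambda n: ring_distance(n, oid))

nodes = [random.getrandbits(128) for _ in range(50)]
print(hex(responsible_node(nodes, obj_id("some-object"))))
```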

  4. Object Insertion/Lookup • A message with key X is routed to the live node with the nodeId numerically closest to X: Route(X) • Problem: a complete routing table is not feasible CS 561

  5. Routing: Distributed Hash Table • [figure: prefix routing of Route(d46a1c) starting at node 65a1fc, across nodes d13da3, d4213f, d462ba, d467c4, d471f1] • Properties: log16 N steps, O(log N) state CS 561

  6. Leaf Sets • Each node maintains IP addresses of the nodes with the L numerically closest larger and smaller nodeIds, respectively. • routing efficiency/robustness • fault detection (keep-alive) • application-specific local coordination CS 561

  7. Routing Procedure
  if (D is within range of our leaf set)
      forward to numerically closest member
  else
      let l = length of shared prefix
      let d = value of l-th digit in D’s address
      if (RouteTab[l,d] exists)
          forward to RouteTab[l,d]
      else
          forward to a known node that
          (a) shares at least as long a prefix
          (b) is numerically closer than this node
  CS 561
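
A short Python sketch of this next-hop decision at a single node follows; the id representation (fixed-length hex strings), the leaf-set and routing-table structures, and the function names are assumptions made for illustration, and plain numeric distance stands in for the ring's wrap-around distance:

```python
# Next-hop choice at one node, following the routing procedure above.
# Data structures and names are illustrative, not Pastry's code.

def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n

def next_hop(self_id: str, dest: str, leaf_set: list[str],
             route_tab: dict, known_nodes: list[str]) -> str:
    dist = lambda n: abs(int(n, 16) - int(dest, 16))
    # Case 1: dest lies within the leaf set range -> deliver to the
    # numerically closest member (possibly ourselves).
    leaves = leaf_set + [self_id]
    ids = [int(n, 16) for n in leaves]
    if min(ids) <= int(dest, 16) <= max(ids):
        return min(leaves, key=dist)
    # Case 2: forward via the routing table entry that shares one more
    # digit of prefix with dest than this node does.
    l = shared_prefix_len(self_id, dest)
    entry = route_tab.get((l, dest[l]))
    if entry is not None:
        return entry
    # Case 3 (rare): forward to any known node that shares at least as
    # long a prefix and is numerically closer to dest than we are.
    closer = [n for n in known_nodes
              if shared_prefix_len(n, dest) >= l and dist(n) < dist(self_id)]
    return min(closer, key=dist) if closer else self_id
```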

  8. Routing • Integrity of overlay: guaranteed unless there are L/2 simultaneous failures of nodes with adjacent nodeIds • Number of routing hops: • No failures: < log16 N expected, 128/b + 1 max • During failure recovery: O(N) worst case, average case much better CS 561
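
To make these bounds concrete, a quick worked example, assuming b = 4 bits per digit (hexadecimal digits, matching the log16 N figure) and an illustrative network size:

```python
# Worked example of the hop bounds above for b = 4 (hex digits).
import math

b = 4                            # bits per digit
N = 10_000                       # number of live nodes (illustrative)
expected = math.log(N, 2 ** b)   # log16 N expected hops
worst = 128 // b + 1             # 128/b + 1 maximum hops
print(f"expected ~{expected:.1f} hops, worst case {worst} hops")
# -> expected ~3.3 hops, worst case 33 hops
```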

  9. Node Addition • New node: d46a1c • [figure: the joining node d46a1c routes a join message, Route(d46a1c), starting from nearby node 65a1fc, across nodes d13da3, d4213f, d462ba, d467c4, d471f1] CS 561

  10. Node Departure (Failure) Leaf set members exchange keep-alive messages • Leaf set repair (eager): request set from farthest live node in set • Routing table repair (lazy): get table from peers in the same row, then higher rows CS 561
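
A sketch of the eager leaf-set repair step, under the assumption that a node can ask a peer for its leaf set through a get_leaf_set() call and check liveness with is_alive(); both are hypothetical helpers, not Pastry's API:

```python
# Eager leaf-set repair, sketched: when a leaf on one side of our id is
# detected as failed, ask the farthest live leaf on that side for its
# leaf set and merge in replacements. get_leaf_set() and is_alive() are
# hypothetical stand-ins for an RPC and for keep-alive state.
def repair_leaf_side(self_id: str, side: list[str], half_size: int,
                     is_alive, get_leaf_set) -> list[str]:
    """side: our leaves on one side of self_id, ordered by distance."""
    live = [n for n in side if is_alive(n)]
    if len(live) == len(side):
        return side                    # nothing failed on this side
    if not live:
        return side                    # assumes < L/2 simultaneous failures
    donor = live[-1]                   # farthest live node on this side
    candidates = [n for n in get_leaf_set(donor)
                  if n != self_id and n not in live and is_alive(n)]
    merged = sorted(set(live + candidates),
                    key=lambda n: abs(int(n, 16) - int(self_id, 16)))
    # (a fuller version would also restrict candidates to this side of the ring)
    return merged[:half_size]          # keep the L/2 closest
```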

  11. PAST: Cooperative, archival file storage and distribution • Layered on top of Pastry • Strong persistence • High availability • Scalability • Reduced cost (no backup) • Efficient use of pooled resources CS 561

  12. PAST API • Insert - store replica of a file at k diverse storage nodes • Lookup - retrieve file from a nearby live storage node that holds a copy • Reclaim - free storage associated with a file Files are immutable CS 561

  13. PAST: File Storage • [figure: Insert(fileId) routes the file toward the node with the nodeId numerically closest to fileId] CS 561

  14. PAST: File Storage • [figure: Insert(fileId) with k = 4 replicas] • Storage Invariant: file “replicas” are stored on the k nodes with nodeIds closest to fileId (k is bounded by the leaf set size) CS 561
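
To make the PAST API and this storage invariant concrete, a toy in-memory illustration follows; the class, its helpers, and the SHA-1-derived fileId are illustrative choices, not PAST's actual implementation:

```python
# Toy illustration of insert/lookup/reclaim with the storage invariant:
# replicas live on the k nodes whose nodeIds are numerically closest to
# the fileId. Purely in-memory; not how PAST itself is implemented.
import hashlib

RING = 2 ** 128

def ring_distance(a: int, b: int) -> int:
    d = abs(a - b) % RING
    return min(d, RING - d)

class ToyPast:
    def __init__(self, node_ids):
        self.nodes = {n: {} for n in node_ids}     # nodeId -> {fileId: bytes}

    def insert(self, data: bytes, k: int) -> int:
        file_id = int.from_bytes(hashlib.sha1(data).digest()[:16], "big")
        for n in sorted(self.nodes, key=lambda n: ring_distance(n, file_id))[:k]:
            self.nodes[n][file_id] = data          # replica on each of the k closest nodes
        return file_id

    def lookup(self, file_id: int):
        for store in self.nodes.values():          # a real client would pick a nearby replica
            if file_id in store:
                return store[file_id]
        return None

    def reclaim(self, file_id: int) -> None:
        for store in self.nodes.values():
            store.pop(file_id, None)               # files are immutable, so no update path
```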

  15. PAST: File Retrieval • [figure: client C retrieves a file stored as k replicas] • Lookup: the file is located in log16 N steps (expected) • Usually locates the replica nearest the client C CS 561

  16. DHT Deployment Today • [figure: applications such as PAST (MSR/Rice), Overnet (open), i3 (UCB), CFS (MIT), OStore (UCB), pSearch (HP), PIER (UCB), and Coral (NYU), each running its own DHT (Kademlia, Chord, Pastry, Tapestry, CAN, or Bamboo DHT) over IP connectivity] • Every application deploys its own DHT (DHT as a library) CS 561

  17. DHT Deployment Tomorrow? • [figure: the same applications, now sharing a single DHT indirection layer over IP connectivity] • OpenDHT: one DHT, shared across applications (DHT as a service) CS 561

  18. Two Ways To Use a DHT • The Library Model • DHT code is linked into application binary • Pros: flexibility, high performance • The Service Model • DHT accessed as a service over RPC • Pros: easier deployment, less maintenance CS 561

  19. The OpenDHT Service • 200-300 Bamboo [USENIX’04] nodes on PlanetLab • All in one slice, all managed by us • Clients can be arbitrary Internet hosts • Access DHT using RPC over TCP • Interface is simple put/get: • put(key, value) — stores value under key • get(key) — returns all the values stored under key • Running on PlanetLab since April 2004 • Building a community of users CS 561
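
A hedged sketch of what such a put/get client might look like in the service model; the gateway URL and the exact RPC method names and signatures are placeholders for illustration, not OpenDHT's documented interface:

```python
# DHT-as-a-service client sketch: the DHT is reached over RPC rather
# than linked in as a library. Gateway URL and method signatures are
# assumptions for illustration, not OpenDHT's actual interface.
import xmlrpc.client

class DhtClient:
    def __init__(self, gateway_url: str = "http://dht-gateway.example.org:5851/"):
        self.proxy = xmlrpc.client.ServerProxy(gateway_url)

    def put(self, key: bytes, value: bytes) -> None:
        # store value under key; multiple values may accumulate per key
        self.proxy.put(xmlrpc.client.Binary(key), xmlrpc.client.Binary(value))

    def get(self, key: bytes) -> list:
        # return all values currently stored under key
        return [v.data for v in self.proxy.get(xmlrpc.client.Binary(key))]

# Usage (against a hypothetical gateway):
#   dht = DhtClient()
#   dht.put(b"my-key", b"my-value")
#   print(dht.get(b"my-key"))
```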

  20. OpenDHT Applications CS 561

  21. An Example Application: The CD Database • [figure: the client computes a disc fingerprint, asks the service “Recognize fingerprint?”, and receives the album & track titles] CS 561

  22. An Example Application: The CD Database • [figure: when the service replies “No such fingerprint”, the user types in the album and track titles and sends them to the service] CS 561

  23. A DHT-Based FreeDB Cache • FreeDB is a volunteer-run service • Has suffered outages as long as 48 hours • Service costs are borne largely by volunteer mirrors • Idea: build a cache of FreeDB with a DHT • Adds to the availability of the main service • Goal: explore how easy this is to do CS 561

  24. Cache Illustration • [figure: new albums flow into the DHT keyed by disc fingerprint; a client sends a disc fingerprint to the DHT and receives the disc info] CS 561
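
A rough sketch of the cache lookup path, reusing the DhtClient sketch from the OpenDHT slide; fetch_from_freedb() is a hypothetical helper standing in for a query against the real FreeDB service:

```python
# DHT-as-cache for FreeDB, sketched: check the DHT first, fall back to
# FreeDB on a miss, and warm the cache for the next client.
def lookup_disc(dht, fingerprint: bytes):
    cached = dht.get(fingerprint)
    if cached:
        return cached[0]                      # cache hit: skip FreeDB entirely
    info = fetch_from_freedb(fingerprint)     # hypothetical FreeDB query helper
    if info is not None:
        dht.put(fingerprint, info)            # populate the cache for later clients
    return info
```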

  25. Is Providing DHT Service Hard? • Is it any different than just running Bamboo? • Yes, sharing makes the problem harder • OpenDHT is shared in two senses • Across applications → need a flexible interface • Across clients → need resource allocation CS 561

  26. Sharing Between Applications • Must balance generality and ease-of-use • Many apps want only simple put/get • Others want lookup, anycast, multicast, etc. • OpenDHT allows only put/get • But use client-side library, ReDiR, to build others • Supports lookup, anycast, multicast, range search • Only constant latency increase on average • (Different approach used by DimChord [KR04]) CS 561

  27. Sharing Between Clients • Must authenticate puts/gets/removes • If two clients put with same key, who wins? • Who can remove an existing put? • Must protect system’s resources • Or malicious clients can deny service to others • The remainder of this talk CS 561

  28. Fair Storage Allocation • Our solution: give each client a fair share • Will define “fairness” in a few slides • Limits strength of malicious clients • Only as powerful as they are numerous • Protect storage on each DHT node separately • Global fairness is hard • Key choice imbalance is a burden on DHT • Reward clients that balance their key choices CS 561

  29. Two Main Challenges • Making sure disk is available for new puts • As load changes over time, need to adapt • Without some free disk, our hands are tied • Allocating free disk fairly across clients • Adapt techniques from fair queuing CS 561

  30. Making Sure Disk is Available • Can’t store values indefinitely • Otherwise all storage will eventually fill • Add a time-to-live (TTL) to puts • put(key, value) → put(key, value, ttl) • (Different approach used by Palimpsest [RH03]) CS 561

  31. Making Sure Disk is Available • TTLs prevent long-term starvation • Eventually all puts will expire • Can still get short-term starvation: [figure timeline: client A arrives and fills the entire disk; client B arrives and asks for space; B starves until A’s values start expiring] CS 561

  32. Making Sure Disk is Available • Stronger condition: be able to accept rmin bytes/sec of new data at all times • [figure: disk space vs. time; each accepted put occupies its size until its TTL expires; a region with slope rmin is reserved for future puts; the sum of the candidate put, the already-stored data, and the reserved region must stay below the max capacity] CS 561

  33. Making Sure Disk is Available • Stronger condition: be able to accept rmin bytes/sec of new data at all times • [figure: the same space vs. time diagram, shown in two panels] CS 561
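
A sketch of the admission test implied by this figure; the function, its parameters, and the exact form of the check are a paraphrase of the stated condition, not OpenDHT's code:

```python
# Accept a new put only if, at every instant over the next max_ttl
# seconds, the bytes still stored plus the space reserved for future
# puts arriving at rate r_min stays under the disk capacity.
def can_accept(pending, new_size, new_ttl, now, capacity, r_min, max_ttl):
    """pending: list of (size_bytes, expiry_time) for already-accepted puts."""
    puts = pending + [(new_size, now + new_ttl)]
    horizon = now + max_ttl
    # Checking at expiry times (and the horizon) suffices, since the
    # stored total only decreases between them.
    checkpoints = sorted({now, horizon} | {exp for _, exp in puts if exp <= horizon})
    for t in checkpoints:
        stored = sum(size for size, exp in puts if exp >= t)   # conservative: count puts expiring at t
        reserved = r_min * (t - now)                           # room promised to future puts
        if stored + reserved > capacity:
            return False
    return True
```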

  34. Fair Storage Allocation • [figure: incoming puts go into per-client put queues; if a client’s queue is full, the put is rejected, otherwise it is enqueued; the node waits until it can accept a put without violating rmin, selects the most under-represented client, stores the put, and sends an accept message to the client] • The Big Decision: the definition of “most under-represented” CS 561

  35. Defining “Most Under-Represented” • Not just sharing disk, but disk over time • A 1-byte put for 100 s is the same as a 100-byte put for 1 s • So the units are bytes × seconds; call them commitments • Equalize total commitments granted? • No: leads to starvation • A fills the disk, B starts putting, A starves for up to the max TTL • [figure timeline: client A arrives and fills the entire disk; client B arrives and asks for space; B catches up with A; now A starves!] CS 561

  36. Defining “Most Under-Represented” • Instead, equalize the rate of commitments granted • Service granted to one client depends only on other clients putting “at the same time” • [figure timeline: client A arrives and fills the entire disk; client B arrives and asks for space; B catches up with A; A and B then share the available rate] CS 561

  37. Defining “Most Under-Represented” • Instead, equalize the rate of commitments granted • Service granted to one client depends only on other clients putting “at the same time” • Mechanism inspired by Start-time Fair Queuing • Have a virtual time, v(t) • Each put p_i^c (client c’s i-th put) gets a start time S(p_i^c) and a finish time F(p_i^c): F(p_i^c) = S(p_i^c) + size(p_i^c) × ttl(p_i^c); S(p_i^c) = max(v(A(p_i^c)) - δ, F(p_{i-1}^c)), where A(p_i^c) is the put’s arrival time and δ is a constant • v(t) = maximum start time of all accepted puts CS 561
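
A compact sketch of this scheduler; the class and its bookkeeping paraphrase the mechanism on the slide (commitments as bytes × seconds, per-client finish tags, virtual time as the maximum accepted start tag) and are not OpenDHT's implementation:

```python
# Start-time-Fair-Queuing-style selection of the most under-represented
# client: each queued put carries start/finish tags in commitment units
# (bytes * seconds); the put with the smallest start tag is admitted next.
class FairPutScheduler:
    def __init__(self, delta: float = 0.0):
        self.v = 0.0                  # virtual time: max start tag of accepted puts
        self.last_finish = {}         # client -> finish tag of its previous put
        self.delta = delta            # the constant delta from S(p_i^c) above

    def stamp(self, client, size, ttl):
        """Tag a put on arrival with its (start, finish) virtual times."""
        start = max(self.v - self.delta, self.last_finish.get(client, 0.0))
        finish = start + size * ttl                  # commitment = bytes * seconds
        self.last_finish[client] = finish
        return start, finish

    def admit(self, queues):
        """queues: client -> list of (start, finish, put), in arrival order.
        Admit the queued put with the smallest start tag."""
        heads = {c: q[0] for c, q in queues.items() if q}
        if not heads:
            return None
        client = min(heads, key=lambda c: heads[c][0])
        start, finish, put = queues[client].pop(0)
        self.v = max(self.v, start)                  # advance virtual time
        return client, put
```

Per the earlier queuing slide, a put is only admitted once accepting it would not violate the rmin condition sketched above.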

  38. Fairness with Different Arrival Times CS 561

  39. Fairness With Different Sizes and TTLs CS 561

  40. Performance • Only 28 of 7 million values lost in 3 months • Where “lost” means unavailable for a full hour • On Feb. 7, 2005, lost 60 of 190 nodes in 15 minutes to a PlanetLab kernel bug, yet lost only one value CS 561

  41. Performance • Median get latency ~250 ms • Median RTT between hosts ~140 ms • But the 95th percentile get latency is atrocious • And even the median spikes up from time to time CS 561

  42. The Problem: Slow Nodes • Some PlanetLab nodes are just really slow • But the set of slow nodes changes over time • Can’t “cherry-pick” a set of fast nodes • Seems to be the case on RON as well • May even be true for managed clusters (MapReduce) • Modified OpenDHT to be robust to such slowness • Combination of delay-aware routing and redundancy • Median get latency is now 66 ms; the 99th percentile is 320 ms • using 2x redundancy CS 561
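
A small sketch of the redundancy half of that fix: issue the same get to a couple of replicas in parallel and keep whichever answers first, so one slow node cannot inflate the latency. The per-replica dht_get() coroutine below just simulates variable node speed and is not OpenDHT code:

```python
# Redundant gets: race two replicas and take the first response.
import asyncio
import random

async def dht_get(replica: str, key: bytes) -> bytes:
    # Stand-in for a real per-replica RPC; simulates a possibly slow node.
    await asyncio.sleep(random.uniform(0.05, 2.0))
    return b"value-from-" + replica.encode()

async def redundant_get(replicas, key, fanout=2):
    tasks = [asyncio.create_task(dht_get(r, key)) for r in replicas[:fanout]]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for t in pending:
        t.cancel()                    # abandon the slower requests
    return done.pop().result()

# print(asyncio.run(redundant_get(["nodeA", "nodeB", "nodeC"], b"key")))
```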
