
CS 268: Lecture 22 DHT Applications


Presentation Transcript


  1. CS 268: Lecture 22 DHT Applications Ion Stoica Computer Science Division Department of Electrical Engineering and Computer Sciences University of California, Berkeley Berkeley, CA 94720-1776 (Presentation based on slides from Robert Morris and Sean Rhea)

  2. Outline • Cooperative File System (CFS) • Open DHT

  3. Target CFS Uses • Serving data with inexpensive hosts: • open-source distributions • off-site backups • tech report archive • efficient sharing of music (Figure: nodes connected over the Internet)

  4. How to mirror open-source distributions? • Multiple independent distributions • Each has high peak load, low average • Individual servers are wasteful • Solution: aggregate • Option 1: single powerful server • Option 2: distributed service • But how do you find the data?

  5. Design Challenges • Avoid hot spots • Spread storage burden evenly • Tolerate unreliable participants • Fetch speed comparable to whole-file TCP • Avoid O(#participants) algorithms • Centralized mechanisms [Napster], broadcasts [Gnutella] • CFS solves these challenges

  6. CFS Architecture • Each node is a client and a server • Clients can support different interfaces • File system interface • Music key-word search (Figure: client and server components running on each node, connected over the Internet)

  7. Client-server interface • Files have unique names • Files are read-only (single writer, many readers) • Publishers split files into blocks • Clients check files for authenticity (Figure: an FS client sends "insert file f" / "lookup file f" to its local server, which issues block-level inserts and lookups to other nodes)
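
A minimal sketch of the publisher and reader sides of this interface, assuming blocks are named by their SHA-1 content hash; the block size and the dht_put/dht_get callables are illustrative stand-ins, not CFS's actual code:

import hashlib

BLOCK_SIZE = 8192  # illustrative block size

def publish(data, dht_put):
    """Split a file into blocks named by their SHA-1 hash and insert them.
    A real publisher would also insert a signed root block listing these
    IDs so readers know which blocks make up the file."""
    block_ids = []
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        block_id = hashlib.sha1(block).hexdigest()
        dht_put(block_id, block)                 # "insert block"
        block_ids.append(block_id)
    return block_ids

def fetch(block_ids, dht_get):
    """Fetch blocks and check authenticity by re-hashing each one."""
    data = b""
    for block_id in block_ids:
        block = dht_get(block_id)                # "lookup block"
        assert hashlib.sha1(block).hexdigest() == block_id, "corrupt block"
        data += block
    return data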

  8. Server Structure • DHash stores, balances, replicates, caches blocks • DHash uses Chord [SIGCOMM 2001] to locate blocks (Figure: Node 1 and Node 2 each run a DHash layer on top of a Chord layer)

  9. Chord Hashes a Block ID to its Successor • Nodes and blocks have randomly distributed IDs • Successor: node with next highest ID (Figure: circular ID space with nodes N10, N32, N60, N80, N100; e.g. B11 and B30 map to N32, B33/B40/B52 to N60, B65/B70 to N80, B100 to N100, and B112, B120, …, B10 wrap around to N10)
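
A rough illustration of the successor rule; this is a sketch only, since real Chord finds the successor with distributed routing rather than a sorted list of every node ID, and the helper names here are made up:

import hashlib
from bisect import bisect_left

def node_id(ip):
    # Node IDs are SHA-1 hashes of the node's IP address (see slide 13)
    return int(hashlib.sha1(ip.encode()).hexdigest(), 16)

def successor(block_id, node_ids):
    """Return the node responsible for a block: the first node whose ID
    is greater than or equal to the block ID, wrapping around the circle."""
    ring = sorted(node_ids)
    i = bisect_left(ring, block_id)
    return ring[i % len(ring)]

This wrap-around is why, in the slide's figure, B112, B120, and B10 all land on N10: past N100 the ID space wraps back to the smallest node ID.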

  10. DHash/Chord Interface • lookup(blockID) returns a list of <node-ID, IP address> pairs for nodes closer in ID space to the block ID • Sorted, closest first • Chord maintains a finger table of <node ID, IP address> entries to answer lookups
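
A small sketch of what "closer in ID space, sorted closest first" could mean, using clockwise distance on the identifier circle; the distance metric and function names are illustrative, not DHash's actual code:

ID_SPACE = 2 ** 160                   # SHA-1 identifier space

def clockwise_distance(from_id, to_id):
    # Distance travelled going clockwise from one ID to another on the circle
    return (to_id - from_id) % ID_SPACE

def order_lookup_result(block_id, candidates):
    """Order <node-ID, IP address> pairs the way lookup() returns them:
    the block's successor (smallest clockwise distance from the block ID)
    comes first."""
    return sorted(candidates, key=lambda pair: clockwise_distance(block_id, pair[0]))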

  11. DHash Uses Other Nodes to Locate Blocks (Figure: Lookup(BlockID=45) is forwarded hop by hop through intermediate nodes, each closer in ID space, until it reaches the block's successor)

  12. Storing Blocks • Long-term blocks are stored for a fixed time • Publishers need to refresh periodically • Cache uses LRU (Figure: each node's disk holds long-term block storage plus an LRU cache)
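
A minimal sketch of the cache half of this slide, using an ordered map to get LRU eviction; the class and method names are illustrative:

from collections import OrderedDict

class LRUBlockCache:
    """LRU cache for blocks; long-term blocks would live in a separate
    store and be refreshed by their publishers before the fixed storage
    period runs out."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def get(self, block_id):
        if block_id not in self.blocks:
            return None
        self.blocks.move_to_end(block_id)       # mark as recently used
        return self.blocks[block_id]

    def put(self, block_id, block):
        self.blocks[block_id] = block
        self.blocks.move_to_end(block_id)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict the least recently used block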

  13. Replicate blocks at r successors • Node IDs are SHA-1 of IP address • Ensures independent replica failure (Figure: Block 17 is replicated on the r nodes that succeed it on the ring)
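
A sketch of the placement rule, again treating the ring as a sorted list of node IDs; this is illustrative, since real DHash gets these nodes from Chord's successor list:

from bisect import bisect_left

def replica_nodes(block_id, node_ids, r):
    """Return the r successors of a block ID: the nodes that hold its
    replicas. Because node IDs are SHA-1 hashes of IP addresses,
    consecutive successors are effectively unrelated machines, so
    replica failures are roughly independent."""
    ring = sorted(node_ids)
    i = bisect_left(ring, block_id)
    return [ring[(i + k) % len(ring)] for k in range(r)]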

  14. Lookups find replicas • RPCs: 1. Lookup step 2. Get successor list 3. Failed block fetch 4. Block fetch (Figure: Lookup(BlockID=17) reaches the region of Block 17's successors and falls back to another replica after a failed fetch)

  15. First Live Successor Manages Replicas • Node can locally determine that it is the first live successor (Figure: Block 17 and a copy of 17 stored on consecutive nodes around the ring)

  16. DHash Copies to Caches Along Lookup Path • RPCs: 1. Chord lookup 2. Chord lookup 3. Block fetch 4. Send to cache (Figure: after fetching Block 45, a copy is sent to a node on the lookup path for caching)

  17. Caching at Fingers Limits Load • Only O(log N) nodes have fingers pointing to N32 • This limits the single-block load on N32

  18. Virtual Nodes Allow Heterogeneity • Hosts may differ in disk/net capacity • Hosts may advertise multiple IDs • Chosen as SHA-1(IP Address, index) • Each ID represents a “virtual node” • Host load proportional to # v.n.’s • Manually controlled (Figure: Node A and Node B each host several virtual node IDs on the ring)
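
A small sketch of deriving virtual-node IDs; the slide only says SHA-1(IP Address, index), so the exact way the address and index are combined before hashing is an assumption here:

import hashlib

def virtual_node_ids(ip, num_vnodes):
    """One ID per virtual node, derived from (IP address, index).
    A host with twice the capacity can advertise twice as many virtual
    nodes and so receive roughly twice the load."""
    return [int(hashlib.sha1(f"{ip},{i}".encode()).hexdigest(), 16)
            for i in range(num_vnodes)]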

  19. Why Blocks Instead of Files? • Cost: one lookup per block • Can tailor cost by choosing good block size • Benefit: load balance is simple • For large files • Storage cost of large files is spread out • Popular files are served in parallel
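
As a rough worked example (the block size is purely illustrative): with 8 KB blocks, fetching a 1 MB file costs 128 lookups, but those 128 blocks hash to essentially random IDs, so the file's storage is spread over many servers and a popular file can be fetched from many servers in parallel.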

  20. Outline • Cooperative File System (CFS) • Open DHT

  21. Questions: • How many DHTs will there be? • Can all applications share one DHT?

  22. Benefits of Sharing a DHT • Amortizes costs across applications • Maintenance bandwidth, connection state, etc. • Facilitates “bootstrapping” of new applications • Working infrastructure already in place • Allows for statistical multiplexing of resources • Takes advantage of spare storage and bandwidth • Facilitates upgrading existing applications • “Share” DHT between application versions

  23. The DHT as a Service (Figure: many nodes, each storing key-value (K V) pairs)

  24. The DHT as a Service (Figure: the same key-value-storing nodes, now labeled collectively as OpenDHT)

  25. The DHT as a Service (Figure: OpenDHT with clients around it)

  26. The DHT as a Service (Figure: the OpenDHT service)

  27. The DHT as a Service • What is this interface? (Figure: clients and the OpenDHT service)

  28. It’s not lookup() • lookup(k): what does the node responsible for key k do with the request? • Challenges: distribution, security

  29. How are DHTs Used? • Storage • CFS, UsenetDHT, PKI, etc. • Rendezvous • Simple: Chat, Instant Messenger • Load balanced: i3 • Multicast: RSS Aggregation, White Board • Anycast: Tapestry, Coral

  30. What about put/get? • Works easily for storage applications • Easy to share • No upcalls, so no code distribution or security complications • But does it work for rendezvous? • Chat? Sure: put(my-name, my-IP) • What about the others?
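
A toy sketch of the chat rendezvous above; dht_put and dht_get are stand-ins for a generic put/get client library, not OpenDHT's actual API:

def register(dht_put, my_name, my_ip):
    # Chat-style rendezvous from the slide: put(my-name, my-IP)
    dht_put(my_name, my_ip)

def locate(dht_get, buddy_name):
    # The buddy looks up the name and then connects to the returned IP directly
    return dht_get(buddy_name)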

  31. Protecting Against Overuse • Must protect system resources against overuse • Resources include network, CPU, and disk • Network and CPU straightforward • Disk harder: usage persists long after requests • Hard to distinguish malice from eager usage • Don’t want to hurt eager users if utilization low • Number of active users changes over time • Quotas are inappropriate

  32. Fair Storage Allocation • Our solution: give each client a fair share • Will define “fairness” in a few slides • Limits strength of malicious clients • Only as powerful as they are numerous • Protect storage on each DHT node separately • Must protect each subrange of the key space • Rewards clients that balance their key choices

  33. The Problem of Starvation • Fair shares change over time • Decrease as system load increases (Figure timeline: Client 1 arrives and fills 50% of the disk, Client 2 arrives and fills 40%, Client 3 arrives and its maximum share is only 10%: starvation!)

  34. Preventing Starvation • Simple fix: add time-to-live (TTL) to puts • put(key, value) → put(key, value, ttl) • Prevents long-term starvation • Eventually all puts will expire

  35. Preventing Starvation • Simple fix: add time-to-live (TTL) to puts • put(key, value) → put(key, value, ttl) • Prevents long-term starvation • Eventually all puts will expire • Can still get short-term starvation (Figure timeline: Client A arrives and fills the entire disk; Client B arrives and asks for space; B starves until A's values start expiring)

  36. Preventing Starvation • Stronger condition: be able to accept r_min bytes/sec of new data at all times • This is non-trivial to arrange! (Figure: space vs. time plot from now out to the maximum TTL; the sum of stored puts plus a candidate put must stay below the maximum capacity, with room reserved for future puts along a line of slope r_min)

  37. Preventing Starvation • Stronger condition: be able to accept r_min bytes/sec of new data at all times • This is non-trivial to arrange! (Figure: two space vs. time plots; in the second, accepting the candidate put would push stored data into the region reserved for future puts: a violation)

  38. Preventing Starvation • Formalize the graphical intuition: f(τ) = B(t_now) − D(t_now, t_now + τ) + r_min × τ • D(t_now, t_now + τ): aggregate size of puts expiring in the interval (t_now, t_now + τ) • To accept a put of size x and TTL l: f(τ) + x < C for all 0 ≤ τ < l • Can track the value of f efficiently with a tree • Leaves represent inflection points of f • Add put, shift time are O(log n), n = # of puts
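
A naive sketch of that acceptance test, with a linear scan over expiry times instead of the slide's O(log n) tree; the function and parameter names are illustrative:

from collections import Counter

def can_accept(stored_puts, now, size, ttl, r_min, capacity):
    """Accept a put of `size` bytes with TTL `ttl` only if
    f(tau) + size < capacity for all 0 <= tau < ttl, where
    f(tau) = B(now) - D(now, now + tau) + r_min * tau and
    stored_puts is a list of (size, expiry_time) pairs already accepted."""
    b_now = sum(s for s, _ in stored_puts)            # B(t_now)
    expiring = Counter()                              # bytes expiring at each offset from now
    for s, expiry in stored_puts:
        if now < expiry <= now + ttl:
            expiring[expiry - now] += s
    # f rises with slope r_min between expirations and drops only at them,
    # so its supremum on [0, ttl) is approached just before an expiration
    # or just before ttl itself; those are the only offsets checked here.
    expired_before = 0
    for tau in sorted(expiring) + [ttl]:
        f_tau = b_now - expired_before + r_min * tau  # value of f just below tau
        if f_tau + size >= capacity:
            return False                              # would violate f + x < C
        expired_before += expiring.get(tau, 0)
    return True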

  39. Fair Storage Allocation • Incoming puts go into per-client put queues • Queue full: reject the put; not full: enqueue it • Select the put from the most under-represented client • Wait until it can be accepted without violating r_min • Store the value and send an accept message to the client • The big decision: the definition of “most under-represented”

  40. Defining “Most Under-Represented” • Not just sharing disk, but disk over time • A 1-byte put for 100 s is the same as a 100-byte put for 1 s • So the units are bytes × seconds; call them commitments • Equalize total commitments granted? • No: leads to starvation • A fills the disk, B starts putting, A starves for up to the max TTL (Figure timeline: Client A arrives and fills the entire disk; Client B arrives and asks for space; once B catches up with A, A starves!)

  41. Defining “Most Under-Represented” • Instead, equalize the rate of commitments granted • Service granted to one client depends only on others putting “at the same time” (Figure timeline: Client A arrives and fills the entire disk; Client B arrives and asks for space; after B catches up with A, A and B share the available rate)

  42. Defining “Most Under-Represented” • Instead, equalize the rate of commitments granted • Service granted to one client depends only on others putting “at the same time” • Mechanism inspired by Start-time Fair Queuing • Maintain a virtual time v(t) • Each put p_c^i (client c's i-th put) gets a start time S(p_c^i) and a finish time F(p_c^i): F(p_c^i) = S(p_c^i) + size(p_c^i) × ttl(p_c^i), S(p_c^i) = max(v(A(p_c^i)) − δ, F(p_c^{i−1})), v(t) = maximum start time of all accepted puts
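
A compact sketch of that bookkeeping; the delta term and the idea of serving the queued put with the smallest start time follow Start-time Fair Queuing and should be read as illustrative rather than as OpenDHT's exact code:

class FSTScheduler:
    def __init__(self, delta=0.0):
        self.v = 0.0                 # virtual time: max start time of accepted puts
        self.last_finish = {}        # F(p_c^{i-1}) for each client c
        self.delta = delta

    def admit(self, client, size, ttl):
        """Assign start and finish times to an accepted put from `client`."""
        # S(p_c^i) = max(v(A(p_c^i)) - delta, F(p_c^{i-1}))
        start = max(self.v - self.delta, self.last_finish.get(client, 0.0))
        # F(p_c^i) = S(p_c^i) + size(p_c^i) * ttl(p_c^i): the commitment in byte-seconds
        finish = start + size * ttl
        self.last_finish[client] = finish
        self.v = max(self.v, start)  # v(t) = maximum start time of all accepted puts
        return start, finish

Under this accounting, a client that has been idle re-enters near the current virtual time, so the service it receives depends only on the clients putting at the same time.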

  43. FST Performance
