1 / 45

Overlays and DHTs

Overlays and DHTs. Presented by Dong Wang and Farhana Ashraf. Schedule. Review of Overlay and DHT RON Pastry Kelips. Review on Overlay and DHT. Overlay Network build on top of another network Nodes connected to each other by logical/virtual links

heath
Download Presentation

Overlays and DHTs

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Overlays and DHTs Presented by Dong Wang and Farhana Ashraf

  2. Schedule • Review of Overlay and DHT • RON • Pastry • Kelips

  3. Review on Overlay and DHT • Overlay • Network build on top of another network • Nodes connected to each other by logical/virtual links • Improve Internet Routing and Easy to deploy • P2P(Gnutella, Chord…), RON • DHT • Allows you to do lookup, insert, delete objects with keys in a distributed settings • Performance Concerns • Load Balancing • Fault Tolerance • Efficiency of lookups and inserts • Locality • CAN, Chord, Pasrty, Tapestry are all DHTs

  4. Resilient Overlay Network David G. Andersen, etc. MIT SOSP 2001 Acknowledged: http://nms.csail.mit.edu/ron/ Previous CS525 Courses

  5. RON-Resilient Overlay Network • Motivation • Goal • Design • Evaluation • Discussion

  6. RON-Motivation • Current Internet Backbone NOT be able to • Detect failed path and recover quickly • BGP takes several mins to recover from faults • Detect flooding and congestion effectively • Leverage redundant path efficiently • Express fine-grained policy/metrics

  7. RON-Basic Idea B D A C • Inequality of Triangles does not usually hold for Internet! • -Eg. Latency-It is possible that: AB+BC<AC • RON makes use of underlying path redundancy of Internet to provide better path and route around failure • RON isend to endsolution, packets are simply wrapped around and sent normally

  8. RON-Main Goals • Fast Failure Detection and Recovery • Average detect and recover delay<20s • Tighter integration of routing/path selection with the application • Application can specify metrics to affect routing • Expressive Policy Routing • Fine-grained and aim at users and hosts

  9. RON-Design • Overlay • Old idea in networks • Easily deployed and let Internet focus on scalability • Only keep functionality between active peers • Approach • Aggressively probe all inter-RON node paths • Exchange routing information • Route along best path (from end to end view) consistent with routing policy

  10. RON-Design: Architecture • Probe between nodes, detect path quality • Store path qualities at Performance Database • Link-state routing protocol among nodes • Data are handled by application-specific conduit and forwarded in UDP

  11. RON-Design: Routing and Lookup • Policy routing • Classify by policy • Generate table per policy • Metric optimization • Application tags the packet with its specific metric • Generate table per metric • Multi-level routing table and 3 stage lookup • Policy->Metric->Next hop

  12. RON Design-Probing and Outage Detect • Probe every PROBE_INTERVAL (12s) • With 3 packets, both participants get an RTT and reachability without syn. Clocks • If probe is lost, send next immediately, up to 3 more probes (PROBE_TIMEOUT 3s) • Notify outage after 4 consecutive probe loses • Outage detection time on average=19 s

  13. RON Design-Policy Routing • Allow user to define types of traffic allowed on particular network links • Place policy tags on packets and build up policy based routing table • Two policy mechanisms • exclusive cliques • general policies

  14. RON-Evaluation • Two main dataset from Internet deployment • RON1-N=12 nodes, 132 distinct paths, traverse 36 AS and 74 Inter-AS paths • RON2-N=16 nodes, 240 distinct paths, traverse 50 AS and 118 Inter-AS paths • Policy-prohibit sending traffic to or from commercial sites over the Internet2

  15. RON Evaluation-Major Results • Increase the resilience of the overlay network • RON takes ~10s to route around failure • Compared to BGP’s several minutes • Many Internet outage are avoidable • Improve performance –Loss rate, Latency, TCP Throughput • Single-hop indirect routing works well • Overhead is reasonable and acceptable

  16. RON vs Internet 30 minute loss rates • For RON1-able to route around all outages; • For RON2-about 60% outages are overcome

  17. Performance-Loss rate

  18. Performance-Latency

  19. Performance-TCP Throughput Performance Improvement: RON employs App-specific metric optimization to select path

  20. Scalability

  21. Conclusions • RON improves network reliability and performance • Overlay approach is attractive for resiliency: development, fewer nodes, simple substrate • Single-hop indirection in RON works well • RON also introduces more probing and updating traffic into network

  22. RON-Discussion • Aggressiveness • RON never back-off as TCP does • Can it coexist with current traffic on Internet? • What happens if everyone starts to use RON? • Is it possible to modify RON to achieve good behavior in a global sense • Scalability • Trade scalability for improved reliability • Many RONS coexisting in the Internet • Hierarchical structure of RON network

  23. Distributed Hash Table

  24. Problem • Route a msg with key, K to the node, Z which has a ID closest to key K • Not scalable if routing table contains all the nodes • Tradeoffs • Memory per node • Lookup latency • Number of messages d471f1 d467c4 d462ba X=d46a1c d4213f d13da3 Route(d46a1c) A = 65a1fc

  25. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems Antony Rowstron Peter Druschel Middleware 2001 Acknowledged: Previous CS525 Courses

  26. Motivation • Node IDs are assigned randomly • With high probability nodes with adjacent IDs are diverse • Considers network locality • Seeks to minimize distance messages travel • Scalar proximity metric • #IP routing hops • RTT

  27. Pastry: Node Soft State Storage requirement in each node = O(log N) Immediate Neighbors in ID space (Used for routing) Similar to successor and predecessor Used for routing Similar to finger table entries Nodes closest according to locality (Used to update routing table)

  28. Pastry: Node Soft State • Leaf Set • Contains L nodes, closest in the ID space • Neighborhood Set • Contains M nodes, closest according to proximity metric • Routing Table • Entries of row n, shares exactly the first n digits with the local node • Nodes are chosen according to proximity metric

  29. Pastry: Routing • Case I • Key within leaf set • Route to the node in the leaf set with ID closest to key • Case II (Prefix Routing) • Key not within leaf set • Route a node in the routing table, such that the new node shares one more digit with the key than the local node • Case III • Key not within leaf set • Case II not possible • Route to a node which shares at least same number of digits with the key, but is closer to the key than the local node

  30. Routing Example • Cuts the ID space into 1/(2^b) • Number of hops needed is log2^bN d471f1 d467c4 d462ba d46a1c d4213f d13da3 lookup(d46a1c) 65a1fc

  31. Self Organization: Node Join d471f1 Z=d467c4 d462ba X=d46a1c d4213f New node: X=d46a1c A is X’s neighbor Route(d46a1c) d13da3 A = 65a1fc

  32. Leaf set (X) = leaf set (Z) Neighborhood set (A) = neighborhood set (X) Routing Table Row zero of X = row zero of A Row one of X = row one of B Pastry: Node State Initialization New node: X=d46a1c d471f1 Z=d467c4 d462ba X=d46a1c d4213f Route(d46a1c) B= d13da3 A = 65a1fc

  33. Pastry: Node State Update • X informs any nodes that need to be aware of its arrival • X also improves its table locality by requesting neighborhood sets from all nodes X knows • In practice: optimistic approach

  34. Pastry: Node departure (failure) • Leaf set repair (eager – all the time): • Leaf set members exchange keep-alive messages • Request set from furthest live node in set • Routing table repair (lazy – upon failure): • Get table from peers in the same row, if not found – from higher rows • Neighborhood set repair (eager)

  35. Routing Performance Average no. of hops = log(N) Pastry uses locality information

  36. Kelips: Building an Efficient and Stable P2P DHT through Increased Memory and Background Overhead Indranil Gupta, Ken Birman, Prakash Linga, Al Demers, and Robert van Renesse IPTPS 2003 Acknowledged: Previous CS525 Courses

  37. Motivation • For n=1000000 and 20 byte per entry • Storage requirement at a Pastry node = 120 byte • Not using memory efficiently • How can we achieve O(1) lookup latency? • Increase memory usage per node

  38. Design • Consists of k virtual affinity groups • Each node member of an affinity group • Soft State • Affinity Group View • (Partial) set of other nodes lying in the same affinity group • Contacts • (constant sized) set of nodes lying in each of the foreign affinity groups • Filetuples • (partial) set of filename and host IP address (homenode), where homenode lies in the same affinity group • Contains RTT, heartbeat count for each of the entries

  39. affinity group view 15 id hbeat rttime … 76 18 23ms 1890 167 2067 67ms fname homenode 18 [18, 167, … ] p2p.txt contacts 160 group contactnodes … [129, 30, … ] 129 0 167 [15, 160, …] 1 filetuple 30 … # 1 Affinity Group # 0 # k-1 Kelips: Node Soft State soft state

  40. Storage Requirement @ Kelips Node • S(k,n) = n/k + c * (k-1) + F/k entries • Minimized at k = √( (n+F) / c ) • Assume F is proportional to n, and c fixed • Optimal k = O(sqrt(n)) • For n=1000000, and F = 10 million • Total memory requirement < 20 MB Affinity groups Contacts Filetuples

  41. Algorithm: Lookup • Lookup (key D at node A) • Affinity group G of homenode of D = hash(D) • A sends message to closest node X in the contact set for affinity group G • X finds homenode of D from filetuple set • O(1) lookup

  42. Maintaining Soft State • Heartbeat mechanism • Soft state entries refreshed periodically within and across groups • Each node periodically selects a few nodes as gossip targets and sends them partial soft state information • Uses constant gossip message size • O(log n) complexity for gossiping

  43. Load Balance N = 1500 with 38 affinity groups 1000 nodes with 30 affinity groups

  44. Discussion Points • What happens when triangular inequality does not hold for a proximity metric? • What happens for high churn rate in Pastry and Kelips? • What was the intuition behind affinity groups? • Can we use Kelips in a Internet scale network?

  45. Conclusion • A DHT tradeoff: • Storage requirement • Lookup Latency • Going one step further • One hop lookups for p2p overlays • http://www.usenix.org/events/hotos03/tech/talks/gupta_talk.pdf

More Related