A Scalable Content-Addressable Network

Presentation Transcript

  1. A Scalable Content-Addressable Network
     Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker
     Presented by Greg Nims

  2. Introduction
     • The objective is to create a scalable indexing mechanism for large-scale peer-to-peer systems
     • Content-Addressable Networks (CANs) are presented as a scalable, fault-tolerant and completely self-organizing peer-to-peer overlay network
     • Indexing is accomplished with a distributed hash table that maps keys to values

  3. Design
     • Multi-dimensional coordinate space with d dimensions (d-torus)
     • Each node owns a zone in the space
     • A zone is a section of the hash table
     • So each node stores a section of the table

  4. Distributed Hash Table
     • A uniform hash function maps a key K to a point P
     • This creates a table of key-value pairs (K, V)
     • For any point P, the corresponding (K, V) is stored at the node N that owns the zone containing P
     • Entries are retrieved by applying the same hash function to map K to P, then fetching the entry from the node that owns the zone containing P
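The mapping on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes SHA-256 as the uniform hash and a unit square as the coordinate space, and the names `key_to_point`, `Zone`, `owner`, `put` and `get` are hypothetical.

```python
import hashlib


def key_to_point(key: str, d: int = 2) -> tuple:
    """Map a key to a point in [0, 1)^d (assumed coordinate space)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Four digest bytes per coordinate; SHA-256 yields 32 bytes, so d <= 8.
    return tuple(
        int.from_bytes(digest[4 * i: 4 * i + 4], "big") / 2**32
        for i in range(d)
    )


class Zone:
    """Axis-aligned box [lo, hi) owned by one node, plus its slice of the table."""

    def __init__(self, lo, hi):
        self.lo, self.hi = tuple(lo), tuple(hi)
        self.store = {}

    def contains(self, p):
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))


def owner(zones, p):
    """Return the zone that contains point p."""
    return next(z for z in zones if z.contains(p))


def put(zones, key, value):
    owner(zones, key_to_point(key)).store[key] = value


def get(zones, key):
    # The same hash function locates the zone on insert and on lookup.
    return owner(zones, key_to_point(key)).store.get(key)
```

Here the list of zones stands in for the whole network; in a real CAN no node sees every zone, and the lookup is routed hop by hop.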

  5. Routing
     • Each node stores the IP address and coordinate zone of its adjoining, or neighboring, nodes
     • This data makes up the node’s routing table
     • Greedy algorithm:
         if P is within the zone of the current node, return (K, V)
         else forward the query to the neighbor with coordinates closest to P
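A small sketch of the greedy rule, under simplifying assumptions that are not in the slides: the space is a g x g torus cut into equal unit zones, cell (i, j) owns [i, i+1) x [j, j+1), and "closest" is measured from each neighbor's zone center. The function names are hypothetical.

```python
import math


def torus_dist(a, b, size):
    """Euclidean distance on a torus whose sides all have length `size`."""
    total = 0.0
    for x, y in zip(a, b):
        delta = abs(x - y) % size
        delta = min(delta, size - delta)  # going around the wrap may be shorter
        total += delta * delta
    return math.sqrt(total)


def greedy_route(start, p, g):
    """Return the list of cells visited while routing from `start` to point p."""

    def contains(cell):
        return all(c <= x < c + 1 for c, x in zip(cell, p))

    def center(cell):
        return tuple(c + 0.5 for c in cell)

    path = [start]
    cur = start
    while not contains(cur):
        i, j = cur
        # In 2 dimensions each node has 2d = 4 neighbors (with wraparound).
        neighbors = [((i + 1) % g, j), ((i - 1) % g, j),
                     (i, (j + 1) % g), (i, (j - 1) % g)]
        cur = min(neighbors, key=lambda n: torus_dist(center(n), p, g))
        path.append(cur)
        if len(path) > g * g:  # safety guard for degenerate tie cases
            raise RuntimeError("routing did not converge")
    return path
```

For example, `greedy_route((0, 0), (6.3, 1.2), 8)` goes "backwards" across the wraparound edge because that is the shorter way to reach the target on a torus.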

  6. More Routing
     • Draw a straight line from a point in the local zone to P
     • Follow the straight line via neighbors
     • For a d-dimensional space, each node maintains 2d neighbors
     • Nodes are self-organizing, making decisions dynamically

  7. Node Joining the CAN
     • New node N1 locates a node N2 already in the CAN, typically using the IP address of a bootstrap node
     • N1 generates a random point P in the space
     • A JOIN message is routed through the CAN to the node N3 that owns the zone containing P
     • N3 splits its zone in half and assigns one half to N1, sending N1 the (K, V) pairs for that half along with neighbor information
     • N3 informs its neighbors of the space reallocation
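The split step can be illustrated with a short sketch. It assumes zones are axis-aligned boxes given as `(lo, hi)` tuples and that the cut is made across the longest side; that choice of axis is an assumption for illustration, and `split_zone` is a hypothetical name.

```python
def split_zone(lo, hi, store):
    """Halve the box [lo, hi) and divide its entries between the two halves.

    `store` maps key -> (point, value), where `point` is the hashed location.
    Returns (kept, handed_over); each is a (lo, hi, store) triple.
    """
    sides = [h - l for l, h in zip(lo, hi)]
    axis = sides.index(max(sides))  # assumption: cut across the longest side
    mid = (lo[axis] + hi[axis]) / 2

    kept_hi = hi[:axis] + (mid,) + hi[axis + 1:]
    new_lo = lo[:axis] + (mid,) + lo[axis + 1:]

    kept_store, new_store = {}, {}
    for key, (point, value) in store.items():
        # Entries whose point lies in the upper half move to the new node.
        target = new_store if point[axis] >= mid else kept_store
        target[key] = (point, value)

    return (lo, kept_hi, kept_store), (new_lo, hi, new_store)
```

The existing node keeps the first triple; the joining node receives the second, together with the neighbor information mentioned on the slide.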

  8. Node Departure
     • Explicit departure: the departing node hands its zone and (K, V) pairs to a neighbor
     • If a neighbor's zone can be combined with the departing zone to form a single valid zone, the two are merged
     • Otherwise, the neighbor with the smallest zone temporarily handles both zones
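A sketch of the departure decision, assuming zones are axis-aligned boxes written as `(lo, hi)` tuples and treating a merge as valid only when the union is again a box. Wraparound at the torus edge is ignored for brevity, and `try_merge`, `volume` and `depart` are hypothetical names.

```python
def try_merge(a, b):
    """Return the merged (lo, hi) box if a and b form one box, else None."""
    (alo, ahi), (blo, bhi) = a, b
    differing = [i for i in range(len(alo))
                 if (alo[i], ahi[i]) != (blo[i], bhi[i])]
    if len(differing) != 1:
        return None  # extents must match in every dimension but one
    i = differing[0]
    if ahi[i] == blo[i]:
        return (alo, bhi)  # b sits directly above a along axis i
    if bhi[i] == alo[i]:
        return (blo, ahi)  # a sits directly above b along axis i
    return None


def volume(zone):
    lo, hi = zone
    v = 1.0
    for l, h in zip(lo, hi):
        v *= h - l
    return v


def depart(leaving, neighbors):
    """Decide who takes the leaving zone: merge if possible, else smallest holds both."""
    for n in neighbors:
        merged = try_merge(leaving, n)
        if merged is not None:
            return ("merge", n, merged)
    return ("hold_both", min(neighbors, key=volume), None)
```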

  9. Failures
     • Each node sends periodic update messages to each of its neighbors
     • Neighbors detect a crashed node by the absence of its periodic update messages
     • Each neighbor that detects the failure starts a takeover timer
     • When its timer expires, a neighbor sends a takeover message to all of the failed node’s neighbors
     • The neighbors agree on the one with the smallest zone volume
     • That node takes over the crashed node’s zone
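One way the neighbors can settle on the smallest node without a separate election is sketched below. It assumes each takeover timer is set in proportion to that neighbor's own zone volume, which is how the CAN paper arranges for the smallest neighbor to announce first; the slide itself does not state this. The function names, the `scale` parameter and the tie-break by node id are illustrative assumptions.

```python
def is_failed(last_update, now, timeout):
    """A neighbor is presumed crashed once its periodic update is overdue."""
    return now - last_update > timeout


def takeover_winner(volumes, scale=1.0):
    """Pick the neighbor whose takeover timer would expire first.

    `volumes` maps node id -> zone volume. Each timer is scale * volume
    (assumption), so the smallest zone fires first and claims the dead zone.
    """
    timers = {node: scale * vol for node, vol in volumes.items()}
    # Ties are broken by node id here; that rule is an assumption.
    winner = min(timers, key=lambda node: (timers[node], node))
    return winner, timers
```

A neighbor that receives a takeover message from a node with a smaller zone would simply cancel its own timer.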

  10. Design Improvements
     • Multiple dimensions
     • Multiple realities
     • Multiple hash functions
     • Overloaded coordinate zones
     • Round-trip time (RTT) ratio
     • Topologically sensitive construction (landmarking)
     • Uniform partitioning

  11. Multiple Dimensions
     • Increasing the number of dimensions:
     • Reduces average path length
     • Reduces path latency
     • Increases routing table size, because each node has more neighbors
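The trade-off can be made concrete with the scaling result from the CAN paper: for a d-dimensional space divided into n equal zones, the average routing path is (d/4) * n^(1/d) hops and each node keeps 2d neighbors. The sketch below only evaluates that formula; the function names are hypothetical, and real networks with unequal zones will deviate from it.

```python
def avg_path_length(n, d):
    """Average hops for n equal zones in d dimensions: (d / 4) * n ** (1 / d)."""
    return (d / 4) * n ** (1 / d)


def routing_table_size(d):
    """Each node keeps two neighbors per dimension."""
    return 2 * d


if __name__ == "__main__":
    n = 2 ** 18
    for d in (2, 4, 6, 8, 10):
        print(d, routing_table_size(d), round(avg_path_length(n, d), 1))
```

Path length falls steeply as d grows, while the routing table grows only linearly in d.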

  12. Multiple Realities
     • Increase the number of realities
     • Multiple coordinate spaces exist at the same time; each space is called a reality
     • Each node is assigned a different zone in each reality
     • Shorter paths, higher fault tolerance
     • With three realities, a (K, V) pair mapping to P at (x, y, z) may be stored at three different nodes

  13. Dimensions v. Realities
     • The two improvements with the greatest impact
     • Dimensions have a larger effect on reducing path length
     • Realities provide stronger fault tolerance and data availability

  14. Multiple Hash Functions
     • Multiple hash functions increase data availability and reduce query latency
     • Availability improves by mapping a single key to k points in the coordinate space using k hash functions
     • (K, V) is unavailable only when all k nodes holding it are down at once
     • Querying the k nodes in parallel can reduce lookup latency
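A minimal sketch of mapping one key to k points. It assumes the k hash functions are obtained by salting a single SHA-256 hash with the function index, which is one common construction and not necessarily the one the authors used; `key_to_points` is a hypothetical name.

```python
import hashlib


def key_to_points(key, k=3, d=2):
    """Return k points in [0, 1)^d for one key, one per (salted) hash function."""
    points = []
    for i in range(k):
        # Assumption: hash function i is SHA-256 over "<i>:<key>".
        digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
        points.append(tuple(
            int.from_bytes(digest[4 * j: 4 * j + 4], "big") / 2**32
            for j in range(d)
        ))
    return points
```

A writer stores (K, V) at the owner of each of the k points; a reader can query all k owners in parallel and take the first answer.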

  15. Overload Coordinate Zones
     • Overload the coordinate zones by assigning more than one node to share the same zone
     • Reduces the average path length and improves fault tolerance
     • No additional neighbors

  16. RTT Ratio
     • Take round-trip time (RTT) into account when routing
     • Each node measures the RTT to each of its neighbors
     • Favor the lower-latency paths
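A sketch of an RTT-aware forwarding choice, assuming the metric is the ratio of progress toward the target to the measured RTT of the link, as the slide title suggests. The function name and the tuple layout are hypothetical.

```python
def pick_next_hop(current_dist, neighbors):
    """Choose the neighbor with the best progress-to-RTT ratio.

    `neighbors` is a list of (name, distance_to_target, rtt_ms) tuples.
    Neighbors that do not get closer to the target are skipped.
    Returns the chosen name, or None if no neighbor makes progress.
    """
    best, best_ratio = None, 0.0
    for name, dist, rtt_ms in neighbors:
        progress = current_dist - dist
        if progress <= 0:
            continue
        ratio = progress / rtt_ms
        if ratio > best_ratio:
            best, best_ratio = name, ratio
    return best
```

Note that the neighbor that makes the most progress is not always chosen: a slightly less direct hop over a much faster link can win.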

  17. Topologically Sensitive Construction
     • Use a set of landmark hosts to guide construction
     • Each node measures its RTT to each landmark
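The slide does not say what a node does with these measurements. In the CAN paper, a node sorts the m landmarks by increasing RTT, and the resulting ordering (one of m! possibilities) selects the portion of the coordinate space in which it joins, so nodes that are close in the network tend to land close together. The sketch below computes only that ordering and its index; `landmark_bin` and the landmark names are hypothetical.

```python
from itertools import permutations


def landmark_bin(rtts):
    """Map measured RTTs to a landmark ordering and its bin index.

    `rtts` maps landmark name -> measured RTT. The ordering lists landmarks
    from nearest to farthest; the index is its position among all m!
    orderings taken in lexicographic order.
    """
    ordering = tuple(sorted(rtts, key=lambda name: (rtts[name], name)))
    all_orderings = sorted(permutations(sorted(rtts)))
    return ordering, all_orderings.index(ordering)
```

Enumerating all m! orderings is only practical for a handful of landmarks, which is adequate for an illustration.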

  18. Uniform Partitioning
     • A form of volume balancing
     • When a node receives a JOIN, it compares its own zone volume with those of its neighbors before accepting
     • The zone with the largest volume accepts the JOIN and splits
     • Achieves a load balance among the nodes
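The selection rule is short enough to show directly. This sketch assumes each node knows its neighbors' zone volumes; `choose_zone_to_split` is a hypothetical name, and keeping the JOIN locally on a tie is an assumption.

```python
def choose_zone_to_split(own, neighbors):
    """Return the id of the node whose zone should be split for a JOIN.

    `own` and each entry of `neighbors` are (node_id, zone_volume) pairs.
    The largest zone wins; on a tie the receiving node keeps the JOIN
    (assumption), because max() returns the first maximum and `own` is first.
    """
    candidates = [own] + list(neighbors)
    return max(candidates, key=lambda pair: pair[1])[0]
```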

  19. Design Review
     • Ran two simulations using 2^18 nodes
     • “Bare bones” CAN without improvements
     • “Knobs-on-full” CAN using all features except landmarks and multiple hashes
     • Biggest gain from the number of dimensions (path length drops from 198 to 5)

  20. Questions?