##### A Scalable Content-Addressable Network


**A Scalable Content-Addressable Network**
Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker
Presented by Greg Nims

**Introduction**
• The objective is to create a scalable indexing mechanism for large-scale peer-to-peer systems
• Content-Addressable Networks (CAN) are presented as a scalable, fault-tolerant, and completely self-organizing peer-to-peer overlay network
• Indexing is accomplished with a distributed hash table that maps keys to values

**Design**
• Multi-dimensional coordinate space with d dimensions (a d-torus)
• Each node owns a zone in the space
• A zone is a section of the hash table
• So each node stores a section of the table

**Distributed Hash Table**
• A uniform hash function maps key K to point P
• This creates a table of key-value pairs (K, V)
• For any point P, the corresponding (K, V) is stored at the node N that owns the zone containing P
• Entries are retrieved by using the same hash function to map K to P and fetching the entry from the node that owns the zone containing P

**Routing**
• Each node stores the IP address and coordinate zone of its adjoining (neighboring) nodes
• This data makes up the node's routing table
• Greedy algorithm: if P is within the zone of the current node, return (K, V); otherwise forward the query to the neighbor whose coordinates are closest to P

**More Routing**
• Draw a straight line from a point in the local zone to P
• Follow the straight line via neighbors
• For a d-dimensional space, each node maintains 2d neighbors
• Nodes are self-organizing and make decisions dynamically

**Node Joining the CAN**
• New node N1 attempts to locate a node N2 already in the CAN, typically using the IP address of a bootstrap node
• N1 generates a random point P in the space
• CAN routing, starting from N2, locates the zone that contains P
• The JOIN message is delivered to node N3, which owns the zone that contains P
• N3 splits its zone in half and assigns one half to N1 by sending N1 the (K, V) pairs in that half, along with neighbor information
• N3 informs its neighbors of the space reallocation

**Node departure**
• Explicit departure: the node hands its zone and (K, V) pairs to a neighbor
• If merging with a neighbor produces a single valid zone, the two zones are combined; otherwise the neighbor with the smallest zone temporarily handles both zones

**Failures**
• Each node sends periodic update messages to each of its neighbors
• Crashed nodes are detected by their neighbors through the absence of periodic update messages
• Each neighbor that detects the failure starts a takeover timer
• When its timer expires, a neighbor sends a takeover message to all of the failed node's neighbors
• The neighboring nodes agree on the node with the smallest zone volume
• That node takes over the crashed node's zone

**Design Improvements**
• Multiple dimensions
• Multiple realities
• Multiple hash functions
• Overloading the coordinate zones
• Round-trip time (RTT) ratio
• Topologically sensitive construction (landmarking)
• Uniform partitioning

**Multiple Dimensions**
• Increase the number of dimensions
• Reduces average path length
• Reduces path latency
• Increases routing table size because each node has more neighbors

**Multiple Realities**
• Increase the number of realities
• Multiple coordinate spaces exist at the same time; each space is called a reality
• Each node is assigned a different zone in each reality
• Shorter paths, higher fault tolerance
• With three realities, a (K, V) pair mapping to P at (x, y, z) may be stored at three different nodes

**Dimensions vs. Realities**
• The two improvements with the greatest impact
• Dimensions have a larger effect on reducing path length
• Realities provide stronger fault tolerance and data availability

**Multiple Hash Functions**
• Multiple hash functions increase data availability and reduce query latency
• Data availability improves by mapping a single key to k points in the coordinate space using k hash functions
• (K, V) is unavailable only when all k nodes holding it have crashed
• Querying the k nodes in parallel can reduce lookup latency

**Overload Coordinate Zones**
• Overload the coordinate zones by assigning more than one node to share the same zone
• Reduces the average path length and improves fault tolerance
• No additional neighbors

**RTT Ratio**
• Limits the round-trip time (RTT)
• Each node measures the RTT to its neighbors
• Routing favors the lower-latency paths

**Topologically Sensitive Construction**
• Use physical landmarks for construction
• Each node measures its RTT to each landmark

**Uniform Partitioning**
• A form of volume balancing
• When a node receives a JOIN, it compares its zone volume with those of its neighbors before accepting
• The node with the largest zone accepts the JOIN and splits
• Achieves load balance among the nodes

**Design Review**
• Ran two simulations using 2^18 nodes
• "Bare bones" CAN without improvements
• "Knobs-on-full" CAN using all features except landmarks and multiple hashes
• Biggest gain comes from the number of dimensions (path length drops from 198 to 5)
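
**Illustrative sketch: hashing and greedy routing**
The Distributed Hash Table and Routing slides can be made concrete with a small Python sketch. It is not the authors' implementation: the names `key_to_point`, `torus_distance`, `Node`, `route`, `build_grid`, `put`, and `get` are hypothetical, SHA-256 stands in for the unspecified uniform hash function, the space is fixed at d = 2 with equal-sized zones, and a neighbor's "coordinates" are assumed to be the center of its zone.

```python
import hashlib
from dataclasses import dataclass, field


def key_to_point(key: str, d: int = 2) -> tuple:
    """Hash key K to a point P in the unit d-torus [0, 1)^d (d <= 8)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Four digest bytes per coordinate, scaled into [0, 1).
    return tuple(
        int.from_bytes(digest[4 * i:4 * i + 4], "big") / 2**32 for i in range(d)
    )


def torus_distance(a, b) -> float:
    """Euclidean distance where every axis wraps around at 1.0."""
    total = 0.0
    for x, y in zip(a, b):
        delta = abs(x - y)
        total += min(delta, 1.0 - delta) ** 2
    return total ** 0.5


@dataclass(eq=False)  # compare nodes by identity; neighbor lists are cyclic
class Node:
    lo: tuple                                      # inclusive lower corner
    hi: tuple                                      # exclusive upper corner
    neighbors: list = field(default_factory=list)  # the routing table
    store: dict = field(default_factory=dict)      # this zone's (K, V) pairs

    def owns(self, p) -> bool:
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))

    def center(self) -> tuple:
        # Assumption: a zone is represented by its center when routing.
        return tuple((l + h) / 2 for l, h in zip(self.lo, self.hi))


def route(start: Node, p, max_hops: int = 64):
    """Greedy forwarding: return (owner of p, number of hops taken)."""
    node, hops = start, 0
    while not node.owns(p):
        if hops >= max_hops:
            raise RuntimeError("routing did not converge")
        # Forward to the neighbor whose coordinates are closest to P.
        node = min(node.neighbors, key=lambda n: torus_distance(n.center(), p))
        hops += 1
    return node, hops


def build_grid(n: int) -> list:
    """Hypothetical helper: an n x n grid of equal zones on the 2-torus."""
    w = 1.0 / n
    grid = [[Node((i * w, j * w), ((i + 1) * w, (j + 1) * w)) for j in range(n)]
            for i in range(n)]
    for i in range(n):
        for j in range(n):
            # 2d = 4 neighbors per node, with wrap-around at the edges.
            grid[i][j].neighbors = [
                grid[(i - 1) % n][j], grid[(i + 1) % n][j],
                grid[i][(j - 1) % n], grid[i][(j + 1) % n],
            ]
    return [node for row in grid for node in row]


def put(entry: Node, key: str, value) -> Node:
    """Store (K, V) at the node owning hash(K); return that node."""
    owner, _ = route(entry, key_to_point(key))
    owner.store[key] = value
    return owner


def get(entry: Node, key: str):
    """Look up K from any entry node using the same hash function."""
    owner, _ = route(entry, key_to_point(key))
    return owner.store.get(key)
```

For example, after `nodes = build_grid(4)` and `put(nodes[0], "song.mp3", "10.0.0.7")`, calling `get(nodes[5], "song.mp3")` routes from a different entry node to the same owner and returns the stored value. A real CAN has unequal zones produced by joins and departures, so the neighbor sets and the distance metric would need more care than this uniform grid requires.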