A Scalable Content-Addressable Network

A Scalable Content-Addressable Network • Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker • Presented by Greg Nims

Introduction • Objective is to create a scalable indexing mechanism for large-scale peer-to-peer systems • Content-Addressable Networks (CAN) are presented as a scalable, fault-tolerant and completely self-organizing peer-to-peer overlay network • Indexing is accomplished with a distributed hash table that maps keys to values

Design • Multi-dimensional coordinate space with d dimensions (d-torus) • Each node owns a zone in the space • A zone is a section of the hash table • So each node stores a section of the table

Distributed Hash Table • A uniform hash function maps key K to point P in the coordinate space • The table holds key-value pairs (K, V) • For any point P, the corresponding (K, V) is stored at the node N that owns the zone containing P • Entries are retrieved by applying the same hash function to map K to P and fetching the entry from the node that owns the zone containing P
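The key-to-point mapping on this slide can be sketched as follows. This is a minimal illustration, not the paper's implementation: the use of SHA-256, the unit-cube coordinates, and the names `key_to_point` and `owner` are all assumptions made for the example.

```python
import hashlib

def key_to_point(key, d=2):
    """Map a key to a point in [0, 1)^d (assumed layout: 4 digest bytes per axis)."""
    if not 1 <= d <= 8:
        raise ValueError("this sketch supports 1 <= d <= 8")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return tuple(
        int.from_bytes(digest[4 * i: 4 * i + 4], "big") / 2**32
        for i in range(d)
    )

def owner(point, zones):
    """Return the id of the node whose zone contains the point.

    zones maps a hypothetical node id to (lo, hi) corner tuples;
    a zone covers lo[i] <= x[i] < hi[i] on every axis.
    """
    for node_id, (lo, hi) in zones.items():
        if all(l <= x < h for l, x, h in zip(lo, point, hi)):
            return node_id
    raise LookupError("no zone contains the point")

# Two nodes splitting the space in half along the first axis.
zones = {"A": ((0.0, 0.0), (0.5, 1.0)), "B": ((0.5, 0.0), (1.0, 1.0))}
point = key_to_point("song.mp3")
node = owner(point, zones)  # both insert and lookup of "song.mp3" land here
```

Because insert and lookup apply the same hash, they always resolve to the same zone, which is what lets any node find an entry without a central index.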

Routing • Each node stores the IP address and coordinate zone of adjoining, or neighboring, nodes • This data makes up the node’s routing table • Greedy algorithm: if P is within the zone of the current node, return (K, V); else forward the query to the neighbor with coordinates closest to P
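The greedy rule above can be sketched on a toy CAN. Everything concrete here is an assumption for illustration: a 2-dimensional unit torus cut into an n-by-n grid of equal zones, nodes identified by their grid index, and the helper names `make_grid`, `owns`, and `greedy_route`.

```python
def torus_dist(a, b):
    """Euclidean distance on the unit torus (each axis wraps around at 1.0)."""
    total = 0.0
    for x, y in zip(a, b):
        delta = abs(x - y)
        total += min(delta, 1.0 - delta) ** 2
    return total ** 0.5

def make_grid(n):
    """Build zone centers and routing tables for an n x n grid of equal zones."""
    centers, neighbors = {}, {}
    for i in range(n):
        for j in range(n):
            centers[(i, j)] = ((i + 0.5) / n, (j + 0.5) / n)
            # 2d = 4 neighbors in two dimensions, with wraparound.
            neighbors[(i, j)] = [((i + 1) % n, j), ((i - 1) % n, j),
                                 (i, (j + 1) % n), (i, (j - 1) % n)]
    return centers, neighbors

def owns(node, point, n):
    """True if the point falls inside the grid zone identified by node."""
    return all(int(x * n) == idx for x, idx in zip(point, node))

def greedy_route(start, target, centers, neighbors, n):
    """Forward hop by hop to the neighbor whose zone center is closest to target."""
    path = [start]
    current = start
    for _ in range(len(centers)):  # hop limit guards against a malformed table
        if owns(current, target, n):
            return path
        current = min(neighbors[current],
                      key=lambda nb: torus_dist(centers[nb], target))
        path.append(current)
    raise RuntimeError("routing did not converge")

centers, neighbors = make_grid(4)
path = greedy_route((0, 0), (0.6, 0.6), centers, neighbors, 4)
```

Each node consults only its own neighbor list, so no node needs a global view of the zone layout.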

More Routing • Draw a straight line from point in local zone to P • Follow straight line via neighbors • For d-dimensional space, each node maintains 2d neighbors • Nodes are self-organizing, making decisions dynamically

Node Joining the CAN • New node N1 locates a node N2 already in the CAN, typically using the IP address of a bootstrap node • N1 generates a random point P in the space • N1 sends a JOIN message destined for P into the CAN via N2 • CAN routing forwards the JOIN to node N3, which owns the zone that contains P • N3 splits its zone in half and assigns one half to N1, sending it the (K, V) pairs for that half along with neighbor information • N3 informs its neighbors of the space reallocation
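The zone split performed by the node that receives the JOIN can be sketched as below. The zone representation as (lo, hi) corner tuples, the choice of split axis as a parameter, and the names `split_zone` and `handle_join` are assumptions for this example; neighbor-table updates are left out.

```python
def in_zone(point, zone):
    """True if lo[i] <= point[i] < hi[i] on every axis."""
    lo, hi = zone
    return all(l <= x < h for l, x, h in zip(lo, point, hi))

def split_zone(zone, split_dim):
    """Halve a zone along one axis; return (kept_half, given_half)."""
    lo, hi = zone
    mid = (lo[split_dim] + hi[split_dim]) / 2
    kept = (lo, hi[:split_dim] + (mid,) + hi[split_dim + 1:])
    given = (lo[:split_dim] + (mid,) + lo[split_dim + 1:], hi)
    return kept, given

def handle_join(zone, entries, split_dim=0):
    """Split the zone and partition stored entries between old and new owner.

    entries maps a point to its (K, V) pair.
    """
    kept, given = split_zone(zone, split_dim)
    stay = {p: kv for p, kv in entries.items() if in_zone(p, kept)}
    move = {p: kv for p, kv in entries.items() if in_zone(p, given)}
    return kept, stay, given, move

zone = ((0.0, 0.0), (1.0, 1.0))
entries = {(0.2, 0.7): ("a", 1), (0.9, 0.1): ("b", 2)}
kept, stay, given, move = handle_join(zone, entries)
```

Only the entries whose points fall in the handed-over half travel to the new node, so a join touches one existing node's data and nothing else.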

Node departure • Explicit departure: the departing node hands its zone and (K, V) pairs to a neighbor • If the zone can be combined with a neighbor’s zone to form a single valid zone, the two are merged • Otherwise the neighbor with the smallest zone temporarily handles both zones
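Whether two zones combine into a single valid zone can be checked with a simple box test, sketched below. This is a simplification under stated assumptions: zones are axis-aligned boxes given as (lo, hi) corner tuples, a merge is considered valid when the two boxes match on every axis but one and abut on that axis, and torus wraparound is ignored. The name `try_merge` is hypothetical.

```python
def try_merge(a, b):
    """Return the merged zone if a and b form one box, else None."""
    (alo, ahi), (blo, bhi) = a, b
    # Axes on which the two zones have different extents.
    differing = [i for i in range(len(alo))
                 if (alo[i], ahi[i]) != (blo[i], bhi[i])]
    if len(differing) != 1:
        return None
    i = differing[0]
    if ahi[i] == blo[i]:   # a sits directly below b on axis i
        return (alo, bhi)
    if bhi[i] == alo[i]:   # b sits directly below a on axis i
        return (blo, ahi)
    return None

left = ((0.0, 0.0), (0.5, 1.0))
right = ((0.5, 0.0), (1.0, 1.0))
corner = ((0.0, 0.0), (0.5, 0.5))
merged = try_merge(left, right)
unmergeable = try_merge(corner, right)
```

When no neighbor passes this test, the fallback on the slide applies: the smallest neighbor holds both zones separately for the time being.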

Failures • Each node sends periodic update messages to each of its neighbors • Neighbors detect a crashed node by the prolonged absence of its update messages • Each neighbor then starts a takeover timer proportional to its own zone volume • When its timer expires, a neighbor sends a takeover message to all of the failed node’s neighbors • Neighboring nodes agree on the node with the smallest zone volume • That node takes over the crashed node’s zone
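One way the neighbors can agree on the smallest node without extra coordination is to make each takeover timer proportional to the node's own zone volume, so the smallest zone's timer fires first. The sketch below shows only that selection; the message exchange is omitted, and `timer_scale`, `takeover_timers`, and `takeover_winner` are hypothetical names.

```python
import math

def volume(zone):
    """Volume of an axis-aligned zone given as (lo, hi) corner tuples."""
    lo, hi = zone
    return math.prod(h - l for l, h in zip(lo, hi))

def takeover_timers(neighbor_zones, timer_scale=1.0):
    """Timer per surviving neighbor, proportional to its zone volume."""
    return {n: timer_scale * volume(z) for n, z in neighbor_zones.items()}

def takeover_winner(neighbor_zones, timer_scale=1.0):
    """The neighbor whose timer expires first, i.e. the smallest zone."""
    timers = takeover_timers(neighbor_zones, timer_scale)
    return min(timers, key=timers.get)

neighbor_zones = {
    "A": ((0.0, 0.0), (0.5, 0.5)),
    "B": ((0.5, 0.0), (1.0, 0.5)),
    "C": ((0.0, 0.5), (0.25, 0.75)),
}
winner = takeover_winner(neighbor_zones)
```

Handing the zone to the smallest neighbor keeps zone sizes from drifting further apart after a failure.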

Design Improvements • Multiple dimensions • Multiple realities • Multiple hash functions • Overloaded coordinate zones • Round-trip time (RTT) ratio • Topologically sensitive construction (landmarking) • Uniform partitioning

Multiple Dimensions • Increase the number of dimensions • Reduces average path length • Reduces path latency • Increases routing table size due to the greater number of neighbors
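The trade-off on this slide can be made concrete with the idealized scaling result from the CAN paper, which assumes the space is partitioned into n equal zones. The worked numbers below follow from that formula alone and are not simulation results.

```latex
\[
  \text{neighbors per node} = 2d,
  \qquad
  \text{average path length} \approx \frac{d}{4}\, n^{1/d} \ \text{hops}
\]
\[
  n = 2^{18},\ d = 2:\quad \frac{2}{4}\cdot 2^{9} = 256
  \qquad\qquad
  n = 2^{18},\ d = 10:\quad \frac{10}{4}\cdot 2^{1.8} \approx 8.7
\]
```

Going from 2 to 10 dimensions raises the routing table from 4 to 20 neighbors while cutting the idealized path length by more than an order of magnitude.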

Multiple Realities • Increase the number of realities • Multiple coordinate spaces exist at the same time; each space is called a reality • Each node is assigned a different zone in each reality • Shorter paths, higher fault tolerance • With three realities, a (K, V) pair mapping to P at (x, y, z) may be stored at three different nodes

Dimensions v. Realities • Two improvements with greatest impact • Dimensions have a larger effect on reducing path length • Realities provide stronger fault-tolerance and data availability

Multiple Hash Functions • Multiple hash functions increase data availability and reduce query latency • Improve data availability by mapping a single key to k points in the coordinate space using k hash functions • (K, V) is unavailable only when all k nodes holding a copy are unavailable at once • Querying the k nodes in parallel can reduce lookup latency
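A family of k hash functions can be approximated by salting one hash with the function index, as in the sketch below. The salting scheme, SHA-256, and the name `replica_points` are assumptions for illustration; the slide does not say how the k functions are chosen.

```python
import hashlib

def replica_points(key, k=3, d=2):
    """Map one key to k points in [0, 1)^d using k salted hashes."""
    if not 1 <= d <= 8:
        raise ValueError("this sketch supports 1 <= d <= 8")
    points = []
    for i in range(k):
        # The index prefix makes each of the k hash functions distinct.
        digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
        points.append(tuple(
            int.from_bytes(digest[4 * j: 4 * j + 4], "big") / 2**32
            for j in range(d)
        ))
    return points

points = replica_points("song.mp3", k=3)
```

A writer stores (K, V) at the owner of each point; a reader can query all k owners in parallel and take the first reply, or route only to the point closest to itself.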

Overload Coordinate Zones • Assign more than one node to share the same zone • Reduces the average path length and improves fault tolerance • No additional neighbors

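RTT Ratio • Takes round-trip time (RTT) into account when routing • Each node measures the RTT to each of its neighbors • Forward to the neighbor with the best ratio of progress toward P to RTT, favoring lower-latency paths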
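RTT-weighted forwarding can be sketched as picking the neighbor with the largest ratio of progress toward the target to measured RTT. The data layout (neighbor id mapped to zone center and RTT in milliseconds) and the name `pick_next_hop` are assumptions for this example.

```python
def torus_dist(a, b):
    """Euclidean distance on the unit torus (each axis wraps around at 1.0)."""
    total = 0.0
    for x, y in zip(a, b):
        delta = abs(x - y)
        total += min(delta, 1.0 - delta) ** 2
    return total ** 0.5

def pick_next_hop(here, target, neighbors):
    """Choose the neighbor maximizing (distance gained) / RTT.

    neighbors maps an id to (zone_center, rtt_ms). Neighbors that make
    no progress toward the target are never chosen.
    """
    remaining = torus_dist(here, target)
    best, best_ratio = None, 0.0
    for node_id, (center, rtt_ms) in neighbors.items():
        progress = remaining - torus_dist(center, target)
        if progress > 0 and progress / rtt_ms > best_ratio:
            best, best_ratio = node_id, progress / rtt_ms
    return best

here, target = (0.125, 0.125), (0.45, 0.4)
neighbors = {
    "east": ((0.375, 0.125), 100.0),   # more progress, slow link
    "north": ((0.125, 0.375), 10.0),   # less progress, fast link
    "west": ((0.875, 0.125), 5.0),     # moves away from the target
}
choice = pick_next_hop(here, target, neighbors)
```

In this example plain greedy routing would go east because that zone is closer to the target, while the ratio rule prefers north because its link is ten times faster.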

Topologically Sensitive Construction • Use a well-known set of landmark machines to guide construction • Each node measures its RTT to each landmark
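In the CAN paper, the measured RTTs are turned into an ordering of the landmarks, and nodes with the same ordering join the same portion of the coordinate space, so nearby nodes end up as overlay neighbors. The sketch below shows only the ordering-to-portion step; the landmark names and the function `landmark_bin` are hypothetical.

```python
from itertools import permutations

def landmark_bin(rtts):
    """Return (ordering, index) for a node's landmark RTT measurements.

    rtts maps a landmark name to the measured RTT in milliseconds.
    With m landmarks there are m! possible orderings, and the index
    identifies which of the m! portions of the space the node joins.
    """
    ordering = tuple(sorted(rtts, key=rtts.get))      # nearest landmark first
    all_orderings = sorted(permutations(sorted(rtts)))
    return ordering, all_orderings.index(ordering)

ordering, portion = landmark_bin({"L1": 30.0, "L2": 10.0, "L3": 20.0})
```

Two nodes that are close in the underlying network tend to see the landmarks in the same order and therefore pick their join point from the same portion.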

Uniform Partitioning • A form of volume balancing • When a node receives a JOIN, it compares its own zone volume with those of its neighbors before accepting • The zone with the largest volume among them is the one that is split • Achieves load balance among the nodes
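The volume comparison on this slide can be sketched as a single selection step. The zone representation and the name `choose_zone_to_split` are assumptions for the example; ties are resolved in favor of the node that received the JOIN.

```python
import math

def volume(zone):
    """Volume of an axis-aligned zone given as (lo, hi) corner tuples."""
    lo, hi = zone
    return math.prod(h - l for l, h in zip(lo, hi))

def choose_zone_to_split(own_id, zones, neighbor_ids):
    """Pick the largest zone among the receiving node and its neighbors."""
    candidates = [own_id] + list(neighbor_ids)
    # max() returns the first maximum, so the receiving node wins ties.
    return max(candidates, key=lambda n: volume(zones[n]))

zones = {
    "N3": ((0.0, 0.0), (0.25, 0.5)),   # received the JOIN, volume 0.125
    "N4": ((0.25, 0.0), (0.5, 0.5)),   # volume 0.125
    "N5": ((0.5, 0.0), (1.0, 0.5)),    # volume 0.25
}
splitter = choose_zone_to_split("N3", zones, ["N4", "N5"])
```

Redirecting the split to the largest nearby zone evens out zone volumes over time, and with a uniform hash that also evens out the number of (K, V) pairs per node.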

Design Review • Ran two simulations using 2^18 nodes • “bare bones” CAN without improvements • “knobs-on-full” CAN using all features except landmarks and multiple hashes • Biggest gain from number of dimensions (path length drops from 198 to 5)

Questions?
