Chord: A scalable peer-to-peer lookup service for Internet applications Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan Kathleen Ting 23 November 2004
The Chord project aims to build scalable, robust, distributed systems using peer-to-peer ideas. • The basis is the Chord distributed hash lookup primitive. • Chord is completely decentralized and symmetric, and can find data using only O(log N) messages, where N is the number of nodes in the system. • Chord's lookup mechanism is provably robust in the face of frequent node failures and re-joins.
Goal: Better Peer-to-Peer Storage • Lookup is the key problem • Lookup is not easy: • Gnutella scales badly • Freenet is imprecise • Chord lookup provides: • Good naming semantics and efficiency • Elegant base for layered features
Chord Architecture • Interface: • lookup(DocumentID) → NodeID, IP-Address • Chord consists of • Consistent Hashing • Small routing tables: log(n) • Fast join/leave protocol
Chord Uses log(N) “Fingers” • N80 knows of only seven other nodes. • Circular 7-bit ID space • [Figure: N80's fingers reach ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring]
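The finger layout for N80 can be reproduced with a few lines of Python. This is a minimal sketch, assuming the usual Chord rule that finger i of node n targets identifier n + 2^i modulo 2^m; the function name `finger_starts` is made up for illustration.

```python
M = 7            # bits in the identifier space, as on the slide
RING = 2 ** M    # 128 identifiers, numbered 0..127

def finger_starts(n, m=M):
    """Identifiers that node n's fingers target: n + 2^i (mod 2^m), i = 0..m-1."""
    return [(n + 2 ** i) % (2 ** m) for i in range(m)]

starts = finger_starts(80)
print(starts)       # [81, 82, 84, 88, 96, 112, 16]
print(len(starts))  # 7
```

Each finger then points at the first live node at or after its target identifier, which is why N80 needs to know at most seven other nodes in a 7-bit space.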
Contributions from the Chord paper • Protocol that solves the lookup problem • Addition and deletion of Chord server nodes • Insert, update, and lookup of unstructured key/value pairs • Simple system that uses it for storing information • Evaluation of Chord protocol and system • Theoretical proofs • Simulation results based on 10,000 nodes • Measurement of actual implementation of Chord system
Chord protocol supports just one operation • Determine the node in a distributed system that stores the value for a given key • Chord protocol uses a variant of consistent hashing to assign keys to Chord server nodes • Load tends to be balanced • When the Nth node joins (or leaves) the network, only an O(1/N) fraction of key/value pairs are moved to a different location • Previous research on consistent hashing: impractical to scale because nodes know about every other node in the network • Chord: each node maintains information about only O(log N) nodes, resolves lookups using only O(log N) messages, and updates require only O(log² N) messages when a node joins or leaves
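The key-to-node assignment and the effect of a join can be sketched as follows. This is an illustration under stated assumptions, not the Chord implementation: the 16-bit identifier space, the use of SHA-1 truncated to that space, and the names `chord_id`, `successor`, and `assign` are all choices made here.

```python
import hashlib
from bisect import bisect_left

M = 16  # identifier bits; deliberately small for illustration (assumption)

def chord_id(name, m=M):
    """Hash a string onto the identifier circle [0, 2^m)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

def successor(node_ids, key_id):
    """First node ID at or clockwise after key_id; node_ids must be sorted."""
    i = bisect_left(node_ids, key_id)
    return node_ids[i % len(node_ids)]  # wrap past the largest ID back to the smallest

def assign(node_ids, keys):
    """Map every key name to the node ID responsible for it."""
    return {k: successor(node_ids, chord_id(k)) for k in keys}

nodes = sorted({chord_id(f"node-{i}") for i in range(20)})
keys = [f"doc-{i}" for i in range(1000)]
before = assign(nodes, keys)

# A new node joins: recompute the assignment and see which keys changed owner.
new_node = chord_id("node-new")
after = assign(sorted(set(nodes) | {new_node}), keys)
moved = [k for k in keys if before[k] != after[k]]

print(f"{len(moved)} of {len(keys)} keys moved")
print(all(after[k] == new_node for k in moved))  # every moved key went to the new node
```

The point of the sketch is the last line: a join only pulls keys from the new node's successor, so the rest of the assignment is untouched.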
Benefits of Chord • Decentralized • Automatically adapts when hosts leave and join • Scalable • Guarantees that queries make a logarithmic number of hops and that keys are well balanced • Uses heuristic to achieve network proximity • Doesn’t require the availability of geographic-location information • Prevents single points of failure
Chord vs. DNS • Similarities • Map names to values • Differences • No special root servers • No restrictions on the format and meaning of names, as Chord names are just the keys of key/value pairs • No attempt to solve administrative problems
Chord vs. Freenet • Similarities • Decentralized, symmetric, and automatically adapts when hosts leave and join • Differences • Queries always result in success or definitive failure • Scalable • Cost of inserting and retrieving values, cost of adding and removing hosts grows slowly with the total number of hosts and key/value pairs
Chord system • Implemented as an application-layer overlay network of Chord server nodes • Each node maintains a subset of the key/value pairs as well as routing table entries that point to a subset of carefully chosen Chord servers • Chord clients may, but don’t have to, run on the same hosts as Chord server nodes
Chord system design goals • Scalability • Availability • Load-balanced operation • Dynamism • Updatability • Locating according to proximity
Scalable key location • Each node stores information about only a small number of other nodes • Amount of information maintained about other nodes falls off exponentially with the distance in key-space between the two nodes • Finger table of a node may not contain enough information to determine the successor of an arbitrary key k
What happens when a node n doesn’t know the successor of a key k? Theorem: With high probability, the number of nodes that must be contacted to resolve a successor query in an N-node network is O(log N). Each recursive call to find the successor halves the distance to the target identifier.
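The forwarding step can be sketched as a small simulation. It is a simplification under stated assumptions: the node IDs are arbitrary, finger tables are built from global knowledge rather than by the join protocol, and the lookup is written iteratively with a hop counter instead of as remote calls. The class and method names are illustrative.

```python
from bisect import bisect_left

M = 7
RING = 2 ** M

def in_open(x, a, b):
    """True if x lies in the circular open interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b  # interval wraps past zero

def in_half_open(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    return x == b or in_open(x, a, b)

class Ring:
    def __init__(self, node_ids):
        self.nodes = sorted(node_ids)
        # fingers[n][i] is the first node at or after n + 2^i (mod 2^M)
        self.fingers = {
            n: [self.succ_of((n + 2 ** i) % RING) for i in range(M)]
            for n in self.nodes
        }

    def succ_of(self, ident):
        """Ground truth: first node at or clockwise after ident."""
        i = bisect_left(self.nodes, ident)
        return self.nodes[i % len(self.nodes)]

    def closest_preceding(self, n, key):
        """Farthest finger of n that still lies strictly between n and key."""
        for f in reversed(self.fingers[n]):
            if in_open(f, n, key):
                return f
        return n

    def find_successor(self, start, key):
        """Return (node responsible for key, number of forwarding hops)."""
        n, hops = start, 0
        while not in_half_open(key, n, self.fingers[n][0]):
            n = self.closest_preceding(n, key)
            hops += 1
        return self.fingers[n][0], hops

ring = Ring([5, 16, 32, 45, 80, 96, 112])
print(ring.find_successor(80, 19))  # (32, 1)
```

In this run N80 does not know the successor of key 19, so it forwards to N16, its farthest finger that precedes the key, and N16 answers with its own successor, N32.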
Node joins and departures • In a dynamic network, nodes can join and leave at any time. • Preserve ability to locate every key in the network • Each node’s finger table is correctly filled • Each key k is stored at node successor(k) • If a node is not the immediate predecessor of the key, then its finger table will hold a node closer to the key to which the query will be forwarded, until the key’s successor node is reached. • Theorem: With high probability, any node joining or leaving an N-node Chord network will use O(log² N) messages to re-establish the Chord routing invariants. • Predecessor pointer
Chord lookup cost is O(log N) • Constant is ½ • [Figure: average messages per lookup vs. number of nodes]
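A rough way to read the constant ½, following the heuristic argument in the Chord paper: write the identifier distance from the querying node to the key in binary. Each 1 bit is corrected by following one finger, each 0 bit costs nothing, and for a random query about half of the log₂ N significant bits are 1. The value of N in the worked line is chosen here for illustration and is not read off the slide's plot.

```latex
\mathbb{E}[\text{hops}]
  \approx \sum_{i=1}^{\log_2 N} \Pr[\text{bit } i \text{ of the distance is } 1]
  = \sum_{i=1}^{\log_2 N} \tfrac{1}{2}
  = \tfrac{1}{2}\log_2 N
% Illustrative value: N = 4096 = 2^{12} \;\Rightarrow\; \tfrac{1}{2}\log_2 N = 6
```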
Chord properties • As long as every node knows its immediate predecessor and successor, no lookup will stall anywhere except at the node responsible for a key. • Any other node will know of at least one node (its successor) that is closer to the key than itself, and will forward the query to that closer node.
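The fallback described above, where a query advances using successor pointers alone, can be sketched as follows. The node IDs and the function name `lookup_via_successors` are illustrative assumptions; the walk is correct but takes a number of hops linear in the number of nodes, which is why fingers are kept as an accelerator.

```python
def in_half_open(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # interval wraps past zero

def lookup_via_successors(successor, start, key):
    """Walk successor pointers until key falls in (n, successor[n]]."""
    n, hops = start, 0
    while not in_half_open(key, n, successor[n]):
        n = successor[n]
        hops += 1
    return successor[n], hops

nodes = [5, 16, 32, 45, 80, 96, 112]
successor = {n: nodes[(i + 1) % len(nodes)] for i, n in enumerate(nodes)}

print(lookup_via_successors(successor, 80, 19))  # (32, 4)
```

Here the query from N80 for key 19 passes through N96, N112, N5, and N16 before N16 reports N32 as the responsible node.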
Chord properties • Log(n) lookup messages and table space. • Well-defined location for each ID. • No search required. • Natural load balance. • No name structure imposed. • Minimal join/leave disruption. • Does not store documents…
Conclusion • Intended to be used by decentralized, large-scale distributed applications • Because many distributed applications need to determine the node that stores a data item • Given a key, Chord will determine the node responsible for storing the key’s value • Maintains routing information for O(log N) nodes • Resolves lookups with O(log N) messages • Updates with O(log² N) messages
Conclusion • Chord provides distributed lookup • Efficient, low-impact join and leave • Flat key space allows flexible extensions • Good foundation for peer-to-peer systems http://www.pdos.lcs.mit.edu/chord