Distributed Hash Tables

Presentation Transcript


  1. CPE 401 / 601 Computer Network Systems Distributed Hash Tables Modified from Ashwin Bharambe and Robert Morris

  2. P2P Search • How do we find an item in a P2P network? • Unstructured: • Query-flooding • Not guaranteed • Designed for common case: popular items • Structured • DHTs – a kind of indexing • Guarantees to find an item • Designed for both popular and rare items

  3. The lookup problem • [Diagram: a publisher at one node does Put(key=“title”, value=file data…) and a client elsewhere on the Internet does Get(key=“title”); nodes N1–N6 must locate the item] • At the heart of all DHTs

  4. Centralized lookup (Napster) • [Diagram: the publisher at N4 registers SetLoc(“title”, N4) with a central DB for key=“title”, value=file data…; the client asks the DB Lookup(“title”) and is pointed to N4] • Simple, but O(N) state and a single point of failure

  5. Flooded queries (Gnutella) • [Diagram: the client floods Lookup(“title”) to its neighbors, which forward it until it reaches the publisher holding key=“title”, value=file data…] • Robust, but worst case O(N) messages per lookup

  6. Routed queries (Freenet, Chord, etc.) • [Diagram: the client’s Lookup(“title”) is forwarded hop by hop along overlay routes to the publisher holding key=“title”, value=file data…]

  7. Hash Table • Name-value pairs (or key-value pairs) • E.g., “Mehmet Hadi Gunes” and mgunes@unr.edu • E.g., “http://cse.unr.edu/” and the Web page • E.g., “HitSong.mp3” and “12.78.183.2” • Hash table • Data structure that associates keys with values: value = lookup(key)
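
The ordinary, single-machine version of this interface is just a dictionary. A minimal Python sketch using the slide's example pairs (the `directory` name and the page-contents placeholder are illustrative):

```python
# Plain, single-machine hash table: Python's dict already provides insert and
# lookup by key. Entries are the slide's examples; the page contents are a placeholder.
directory = {
    "Mehmet Hadi Gunes": "mgunes@unr.edu",
    "http://cse.unr.edu/": "<contents of the Web page>",
    "HitSong.mp3": "12.78.183.2",
}

def lookup(key):
    # value = lookup(key); returns None if the key is absent
    return directory.get(key)

print(lookup("HitSong.mp3"))  # -> 12.78.183.2
```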

  8. Distributed Hash Table • Hash table spread over many nodes • Distributed over a wide area • Main design goals • Decentralization • no central coordinator • Scalability • efficient even with large # of nodes • Fault tolerance • tolerate nodes joining/leaving

  9. Distributed hash table (DHT) • [Diagram: a distributed application calls put(key, data) and get(key) on the DHT layer, which in turn uses a lookup service, lookup(key) → node IP address, spread over many nodes] • Application may be distributed over many nodes • DHT distributes data storage over many nodes

  10. Distributed Hash Table • Two key design decisions • How do we map names onto nodes? • How do we route a request to that node?

  11. Hash Functions • Hashing • Transform the key into a number • And use the number to index an array • Example hash function • Hash(x) = x mod 101, mapping to 0, 1, …, 100 • Challenges • What if there are more than 101 nodes? Fewer? • Which nodes correspond to each hash value? • What if nodes come and go over time?
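
A small Python sketch of the slide's example function, Hash(x) = x mod 101; turning the key into a number with SHA-1 first is an assumption here, not part of the slide:

```python
import hashlib

def hash_key(key: str, num_slots: int = 101) -> int:
    # Turn the key into a number (here via SHA-1), then apply Hash(x) = x mod 101,
    # giving a slot in 0..100.
    x = int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")
    return x % num_slots

print(hash_key("HitSong.mp3"))      # some slot in 0..100
print(hash_key("HitSong.mp3", 50))  # same key, different slot count, different slot
```

The second call hints at the slide's challenge: change the number of slots (nodes) and the same key lands somewhere else, which is exactly what consistent hashing on the later slides is designed to avoid.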

  12. Definition of a DHT • Hash table supports two operations • insert(key, value) • value = lookup(key) • Distributed • Map hash-buckets to nodes • Requirements • Uniform distribution of buckets • Cost of insert and lookup should scale well • Amount of local state (routing table size) should scale well
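
A toy, single-process stand-in for this interface, assuming a fixed number of buckets; in a real DHT each bucket would live on a different node, and the class name and bucket count here are illustrative:

```python
import hashlib

class ToyDHT:
    """Local stand-in for the DHT interface: insert(key, value) and value = lookup(key)."""

    def __init__(self, num_buckets: int = 16):
        # One dict per hash bucket; in a real DHT each bucket is mapped to some node.
        self.buckets = [dict() for _ in range(num_buckets)]

    def _bucket(self, key: str) -> dict:
        # Stable hash so the same key always maps to the same bucket.
        h = int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")
        return self.buckets[h % len(self.buckets)]

    def insert(self, key: str, value) -> None:
        self._bucket(key)[key] = value

    def lookup(self, key: str):
        return self._bucket(key).get(key)

dht = ToyDHT()
dht.insert("HitSong.mp3", "12.78.183.2")
print(dht.lookup("HitSong.mp3"))  # -> 12.78.183.2
```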

  13. Plaxton Trees • Motivation • Access nearby copies of replicated objects • Time-space trade-off • Space = Routing table size • Time = Access hops

  14. Plaxton Trees: Algorithm • 1. Assign labels to objects and nodes using randomizing hash functions • [Diagram: an object labeled 9AE4 and a node labeled 247B] • Each label is log_{2^b} n digits long, with digits drawn from a base-2^b alphabet

  15. Plaxton Trees: Algorithm • 2. Each node knows about other nodes with varying prefix matches • [Diagram: routing table of node 247B with neighbors grouped by prefix-match length: length 0 (labels starting 1…, 3…), length 1 (e.g., 25…), length 2 (e.g., 246…, 248…), and length 3 (247A, 247C)]
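
A sketch of this routing-table idea in Python: the neighbor labels are hypothetical (chosen to mirror the slide's 247B example), and the helper names are illustrative rather than Plaxton's:

```python
def prefix_match_len(a: str, b: str) -> int:
    # Number of leading digits the two labels share.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

me = "247B"
# Hypothetical neighbors, grouped the way the slide's table groups them.
known = ["1C07", "3B12", "2511", "2465", "2489", "247A", "247C"]

table = {}
for other in known:
    table.setdefault(prefix_match_len(me, other), []).append(other)

for length in sorted(table):
    print(f"prefix match of length {length}: {table[length]}")
# length 0: 1C07, 3B12 | length 1: 2511 | length 2: 2465, 2489 | length 3: 247A, 247C
```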

  16. Plaxton Trees: Object Insertion and Lookup • Given an object, route successively towards nodes with greater prefix matches • Store the object at each of these locations • [Diagram: object 9AE4 routed through a sequence of nodes whose labels share successively longer prefixes with it]

  17. Plaxton Trees: Object Insertion and Lookup • Given an object, route successively towards nodes with greater prefix matches • Store the object at each of these locations • log(n) steps to insert or locate an object • [Same diagram as the previous slide]

  18. Plaxton Trees: Why is it a tree? • [Diagram: insertion and lookup paths for several copies of the same object converge as the matched prefixes grow, forming a tree of nodes rooted at the object’s root node]

  19. Plaxton Trees: Network Proximity • Overlay tree hops could be totally unrelated to the underlying network hops • [Diagram: overlay links spanning distant regions (Europe, USA, East Asia)] • Plaxton trees guarantee a constant-factor approximation! • Only when the topology is uniform in some sense

  20. Chord • Map nodes and keys to identifiers • Using randomizing hash functions • Arrange them on a circle • [Diagram: identifier circle with a point x = 010110110, its predecessor pred(x) = 010110000, and its successor succ(x) = 010111110]

  21. Consistent Hashing • Construction • Assign each of C hash buckets to random points on a mod 2^n circle; hash key size = n bits • Map each object to a random position on the circle • Hash of object = closest clockwise bucket • [Diagram: a mod-16 circle with marked positions 0, 4, 8, 12, 14 and one bucket highlighted] • Desired features • Balanced: no bucket is responsible for a large number of objects • Smoothness: adding a bucket does not cause major movement among existing buckets • Spread and load: only a small set of buckets lie near an object • Similar to the scheme later used in P2P Distributed Hash Tables (DHTs) • In DHTs, each node has only a partial view of its neighbors
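
A minimal consistent-hashing sketch in Python, assuming SHA-1 as the randomizing hash and a small 2^16 circle; the bucket names are illustrative:

```python
import hashlib
from bisect import bisect_right

N_BITS = 16              # small mod-2^n circle so the example stays readable
RING = 2 ** N_BITS

def point(s: str) -> int:
    # Position of a name on the circle.
    return int.from_bytes(hashlib.sha1(s.encode()).digest(), "big") % RING

buckets = sorted(point(f"bucket-{i}") for i in range(4))   # C = 4 buckets

def closest_clockwise_bucket(obj: str) -> int:
    p = point(obj)
    i = bisect_right(buckets, p) % len(buckets)   # next bucket clockwise, wrapping around
    return buckets[i]

print(closest_clockwise_bucket("HitSong.mp3"))
```

Adding a fifth bucket only moves the objects that fall between the new bucket and its clockwise neighbor, which is the smoothness property the slide lists.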

  22. Consistent Hashing • Large, sparse identifier space (e.g., 128 bits) • Hash a set of keys x uniformly to the large id space • Hash nodes to the id space as well • Id space represented as a ring, 0 … 2^128 − 1 • Hash(name) → object_id • Hash(IP_address) → node_id

  23. Where to Store (Key, Value) Pair? • Mapping keys in a load-balanced way • Store the key at one or more nodes • Nodes with identifiers “close” to the key • where distance is measured in the id space • Advantages • Even distribution • Few changes as nodes come and go… • Hash(name) → object_id • Hash(IP_address) → node_id

  24. Hash a Key to Successor • [Diagram: circular ID space where keys K5, K10 live at node N10; K11, K30 at N32; K33, K40, K52 at N60; K65, K70 at N80; K100 at N100] • Successor: node with the next-highest ID
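
The successor rule applied to the slide's own ids, as a short Python check (the helper name is illustrative):

```python
# Each key is stored at the node with the next-highest id on the circular
# id space, wrapping past the top back to the smallest node.
node_ids = [10, 32, 60, 80, 100]                      # N10, N32, N60, N80, N100
key_ids = [5, 10, 11, 30, 33, 40, 52, 65, 70, 100]    # K5 .. K100

def successor(key_id, nodes):
    for n in sorted(nodes):
        if n >= key_id:
            return n
    return min(nodes)   # wrap around: keys beyond the largest node go to the smallest

for k in key_ids:
    print(f"K{k} -> N{successor(k, node_ids)}")
# K5, K10 -> N10; K11, K30 -> N32; K33, K40, K52 -> N60; K65, K70 -> N80; K100 -> N100
```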

  25. Joins and Leaves of Nodes • Maintain a circularly linked list around the ring • Every node has a predecessor and a successor • [Diagram: pred ← node → succ]

  26. Successor Lists Ensure Robust Lookup • [Diagram: each node on the ring keeps its next three successors: N5 (10, 20, 32); N10 (20, 32, 40); N20 (32, 40, 60); N32 (40, 60, 80); N40 (60, 80, 99); N60 (80, 99, 110); N80 (99, 110, 5); N99 (110, 5, 10); N110 (5, 10, 20)] • Each node remembers r successors • Lookup can skip over dead nodes to find blocks

  27. Joins and Leaves of Nodes • When an existing node leaves • Node copies its <key, value> pairs to its successor • Predecessor points to node’s successor in the ring • When a node joins • Node does a lookup on its own id • And learns the node responsible for that id • This node becomes the new node’s successor • And the node can learn that node’s predecessor • which will become the new node’s predecessor
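
A sketch of these join/leave steps on a circularly linked ring, with illustrative class and function names; real Chord also repairs finger tables and handles concurrent joins and failures, which this omits. The join here is handed its successor directly, standing in for the lookup described above:

```python
class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.data = {}                 # (key, value) pairs this node is responsible for
        self.pred = self.succ = self   # a lone node points at itself

def join(new, succ):
    # 'succ' is the node found by looking up new.id; splice 'new' in just before it.
    new.succ, new.pred = succ, succ.pred
    succ.pred.succ = new
    succ.pred = new
    # keys in (new.pred.id, new.id] would then migrate from succ to new (omitted)

def leave(node):
    # Hand this node's pairs to its successor, then unlink from the ring.
    node.succ.data.update(node.data)
    node.pred.succ = node.succ
    node.succ.pred = node.pred

a, b = Node(32), Node(80)
join(b, a)                      # two-node ring: 32 <-> 80
c = Node(60)
join(c, b)                      # a lookup of id 60 would return 80 as its successor
print(c.pred.id, c.succ.id)     # -> 32 80
leave(c)
print(a.succ.id)                # -> 80 again
```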

  28. Nodes Coming and Going • Small changes when nodes come and go • Only the keys mapped to the node that comes or goes are affected • Hash(name) → object_id • Hash(IP_address) → node_id

  29. How to Find the Nearest Node? • Need to find the closest node • To determine who should store (key, value) pair • To direct a future lookup(key) query to the node • Strawman solution: walk through linked list • Circular linked list of nodes in the ring • O(n) lookup time when n nodes in the ring • Alternative solution: • Jump further around ring • “Finger” table of additional overlay links
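
The strawman walk as a short Python sketch, with illustrative ids (the same ones used on the later Chord slides):

```python
# Walk the circular list of node ids clockwise until we hit the first node
# with id >= key. O(n) hops; the finger table on the next slides cuts this
# to O(log n).
node_ids = sorted([5, 10, 20, 32, 60, 80, 99, 110])

def responsible_node(key_id):
    hops = 0
    i = 0
    while node_ids[i] < key_id and i < len(node_ids) - 1:
        i += 1
        hops += 1
    if node_ids[i] < key_id:          # walked past the largest id: wrap to the smallest
        return node_ids[0], hops + 1
    return node_ids[i], hops

print(responsible_node(19))    # -> (20, 2): walked N5 -> N10 -> N20
print(responsible_node(115))   # -> (5, 8): wrapped all the way around
```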

  30. Links in the Overlay Topology • Trade-off between # of hops vs. # of neighbors • E.g., log(n) for both, where n is the number of nodes • E.g., overlay links 1/2, 1/4, 1/8, … of the way around the ring • Each hop traverses at least half of the remaining distance • [Diagram: shortcut links spanning 1/2, 1/4, and 1/8 of the ring]

  31. Chord “Finger Table” Accelerates Lookups • [Diagram: node N80’s fingers point 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the ring ahead]
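
A quick computation of where N80's fingers point, assuming a 2^7 = 128 identifier space so the node numbers match the slide's ring (that modulus is an assumption for illustration):

```python
# Finger i of node n points at successor(n + 2^i); larger i means a larger
# fraction of the ring is skipped in one hop.
M = 7
RING = 2 ** M
n = 80

for i in range(M - 1, -1, -1):
    target = (n + 2 ** i) % RING
    fraction = 2 ** (M - i)
    print(f"finger {i}: successor({target})   # about 1/{fraction} of the ring ahead")
```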

  32. Chord lookups take O(log N) hops • [Diagram: a Lookup(K19) is forwarded via finger pointers across the ring of nodes N5, N10, N20, N32, N60, N80, N99, N110 until it reaches N20, the successor of K19]
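
A compact sketch of the finger-based lookup on the same ring of node ids; the 2^7 id space, the starting node N80, and the helper names are illustrative assumptions rather than details from the slide:

```python
M = 7
RING = 2 ** M
NODES = sorted([5, 10, 20, 32, 60, 80, 99, 110])

def in_half_open(x, a, b):          # x in (a, b] going clockwise
    return a < x <= b if a < b else x > a or x <= b

def in_open(x, a, b):               # x in (a, b) going clockwise
    return a < x < b if a < b else x > a or x < b

def successor(i):
    for n in NODES:
        if n >= i % RING:
            return n
    return NODES[0]

def fingers(n):
    # Finger i points at successor(n + 2^i).
    return [successor((n + 2 ** i) % RING) for i in range(M)]

def find_successor(n, key_id, hops=0):
    succ = successor((n + 1) % RING)            # n's immediate successor on the ring
    if in_half_open(key_id, n, succ):
        return succ, hops + 1
    nxt = succ                                  # fall back to the successor if no finger helps
    for f in reversed(fingers(n)):              # closest finger preceding key_id
        if in_open(f, n, key_id):
            nxt = f
            break
    return find_successor(nxt, key_id, hops + 1)

print(find_successor(80, 19))   # -> (20, 3): a few finger hops instead of a full walk
```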

  33. Chord Self-organization • Node join • Set up finger i: route to succ(n + 2^i) • log(n) fingers → O(log^2 n) cost • Node leave • Maintain successor list for ring connectivity • Update successor list and finger pointers

  34. Chord lookup algorithm properties • Interface: lookup(key) → IP address • Efficient: O(log N) messages per lookup • N is the total number of servers • Scalable: O(log N) state per node • Robust: survives massive failures • Simple to analyze

  35. Comparison of Different DHTs

  36. What can DHTs do for us? • Distributed object lookup • Based on object ID • De-centralized file systems • CFS, PAST, Ivy • Application Layer Multicast • Scribe, Bayeux, Splitstream • Databases • PIER

  37. Where are we now? • Many DHTs offering efficient and relatively robust routing • Unanswered questions • Node heterogeneity • Network-efficient overlays vs. Structured overlays • Conflict of interest! • What happens with high user churn rate? • Security

  38. Are DHTs a panacea? • Useful primitive • Tension between network efficient construction and uniform key-value distribution • Does every non-distributed application use only hash tables? • Many rich data structures which cannot be built on top of hash tables alone • Exact match lookups are not enough • Does any P2P file-sharing system use a DHT?
