CSE6809-Distributed Search Techniques

CSE6809-Distributed Search Techniques, Lecture 2: DHT (Nov 24, 2007). What is a DHT? A hash table is a data structure that maps “keys” to “values” through the interface put(key, value) and get(key). A Distributed Hash Table (DHT) is similar, but spread across the Internet; the challenge is to locate content.

galeno

Presentation Transcript


  1. CSE6809-Distributed Search Techniques Lecture-2 DHT Nov 24, 2007

  2. What is a DHT?
     • Hash Table
       • data structure that maps “keys” to “values”
     • Interface
       • put(key, value)
       • get(key)
     • Distributed Hash Table (DHT)
       • similar, but spread across the Internet
       • challenge: locate content

  3. What is a DHT? (cont.)
     • Single-node hash table:
       • key = hash(data)
       • put(key, value)
       • get(key) -> value
     • Distributed Hash Table (DHT):
       • key = hash(data)
       • lookup(key) -> node-IP@
       • put(node-IP@, PUT, key, value)
       • get(node-IP@, GET, key) -> value
     • Idea:
       • Assign particular nodes to hold particular content (or a reference to content)
       • Every node supports a routing function (given a key, route messages to the node holding that key)
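The two-step DHT interface on this slide (first lookup(key) to find a node, then put/get against that node) can be sketched in a single process as follows. This is only an illustrative toy: the class name `ToyDHT`, the 16-bit ID space, and the rule "a key goes to the first node ID at or after it, wrapping around" are assumptions made for the example, not something the slides specify.

```python
import hashlib
from bisect import bisect_left

ID_BITS = 16  # assumption: a small ID space keeps the numbers readable


def hash_id(data: str) -> int:
    """key = hash(data), reduced to the toy ID space."""
    digest = hashlib.sha1(data.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)


class ToyDHT:
    """In-process stand-in for a set of DHT nodes (hypothetical helper)."""

    def __init__(self, addresses):
        # Each node gets an ID by hashing its address (assumed scheme).
        self.ring = sorted((hash_id(a), a) for a in addresses)
        # Each node holds its own single-node hash table.
        self.tables = {a: {} for a in addresses}

    def lookup(self, key: int) -> str:
        """lookup(key) -> node address: first node ID >= key, wrapping."""
        ids = [node_id for node_id, _ in self.ring]
        index = bisect_left(ids, key) % len(self.ring)
        return self.ring[index][1]

    def put(self, address: str, key: int, value) -> None:
        """put(node, PUT, key, value): store on the chosen node only."""
        self.tables[address][key] = value

    def get(self, address: str, key: int):
        """get(node, GET, key) -> value, or None if the node lacks it."""
        return self.tables[address].get(key)


dht = ToyDHT(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
key = hash_id("lecture-2.ppt")
node = dht.lookup(key)
dht.put(node, key, "slides")
print(node, dht.get(node, key))
```

In a real deployment `lookup` would itself be a routed, multi-hop operation rather than a search over a global node list; the global list here only stands in for that routing function.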

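  4. What is a DHT? (cont.) [Figure: layered architecture. The distributed application calls put(key, value) and get(key) -> value on the distributed hash table; the distributed hash table calls lookup(key) -> node IP address on the lookup service, which spans the nodes.]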

  5. DHT in action [Figure: a network of nodes, each with a (K, V) table; the operations put(K1, V1) and get(K1) are shown, with (K1, V1) held at one node.]

  6. Iterative vs. Recursive Routing [Figure: a network of nodes, each with a (K, V) table.]
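Since only the slide title survives here, the following is a hedged sketch of the usual distinction: in iterative routing the querying client contacts every hop itself and each hop answers with a referral, while in recursive routing each hop forwards the query to the next hop on the client's behalf. The ring layout, the successor-only routing, and all names (`Node`, `build_ring`, `span`) are assumptions made for illustration.

```python
class Node:
    """Toy node owning the ID range [node_id, node_id + span)."""

    def __init__(self, node_id, span):
        self.node_id = node_id
        self.span = span
        self.successor = None  # assumption: each node knows only its successor

    def owns(self, key_id):
        return self.node_id <= key_id < self.node_id + self.span

    def next_hop(self, key_id):
        """Local routing decision: myself if I own the key, else my successor."""
        return self if self.owns(key_id) else self.successor


def build_ring(ids, span):
    nodes = [Node(i, span) for i in ids]
    for node, nxt in zip(nodes, nodes[1:] + nodes[:1]):
        node.successor = nxt
    return nodes


def iterative_lookup(start, key_id):
    """The client sends every message itself and follows referrals."""
    messages = []
    current = start
    while True:
        messages.append(("client", current.node_id))
        nxt = current.next_hop(key_id)
        if nxt is current:
            return current.node_id, messages
        current = nxt


def recursive_lookup(node, key_id, sender="client", messages=None):
    """Each node forwards the query onward; the client sends one message."""
    messages = [] if messages is None else messages
    messages.append((sender, node.node_id))
    nxt = node.next_hop(key_id)
    if nxt is node:
        return node.node_id, messages
    return recursive_lookup(nxt, key_id, node.node_id, messages)


nodes = build_ring([0, 16, 32, 48], span=16)
print(iterative_lookup(nodes[0], 40))
print(recursive_lookup(nodes[0], 40))
```

Both variants visit the same nodes and find the same owner; what differs is the (sender, receiver) pairs, that is, who bears the cost of each hop.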

  7. Peers vs. Infrastructure
     • Peer:
       • Application users provide nodes for the DHT
       • Examples: file sharing, etc.
     • Infrastructure:
       • A set of managed nodes provides the DHT service
       • Perhaps serves many applications

  8. DHT Design Goals
     • An “overlay” network with:
       • Decentralization and self-organization, i.e. no central authority, local routing decisions
       • Flexibility in mapping keys to physical nodes and in routing
       • Robustness to nodes joining/leaving
       • Scalability, i.e. low communication overhead
       • Efficiency, i.e. low latency
     • A consistent “storage” mechanism with:
       • No guarantees on persistence
       • Maintenance via soft state
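The "maintenance via soft state" goal can be illustrated with a small sketch: stored entries carry an expiry time and silently disappear unless the publisher refreshes them, which is also why persistence is not guaranteed. The class name `SoftStateStore`, the `ttl` parameter, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are assumptions for this illustration.

```python
class SoftStateStore:
    """Toy per-node store whose entries expire unless they are refreshed."""

    def __init__(self, ttl):
        self.ttl = ttl          # lifetime of an entry, in abstract time units
        self.entries = {}       # key -> (value, expires_at)

    def put(self, key, value, now):
        """Store or refresh an entry; refreshing pushes the expiry forward."""
        self.entries[key] = (value, now + self.ttl)

    def get(self, key, now):
        """Return the value if it is still alive, otherwise drop it."""
        item = self.entries.get(key)
        if item is None:
            return None
        value, expires_at = item
        if now >= expires_at:
            del self.entries[key]   # expired: no persistence guarantee
            return None
        return value


store = SoftStateStore(ttl=30)
store.put("K1", "V1", now=0)
print(store.get("K1", now=10))   # still alive
store.put("K1", "V1", now=20)    # publisher refreshes the entry
print(store.get("K1", now=45))   # alive only because of the refresh
print(store.get("K1", now=50))   # not refreshed again, so it has expired
```

A design consequence is that a node that crashes or leaves needs no explicit cleanup: whatever it published simply ages out elsewhere.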

  9. The Partitioning Problem [Figure: the ID space from 0 to 1, divided among nodes on the Internet with binary IDs .000, .0010, .0011, .010, .011, .100, .1010, .1011, .1100, .1101, .111.] SOLUTION: ID Selection Scheme
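One way to read the binary IDs in this figure (an interpretation, not something the slide states) is as prefixes: a node with ID .0010 is responsible for every point of the interval [0, 1) whose binary expansion begins with 0010, so its share of the space is 2^-4. Under that assumption the eleven IDs tile the interval exactly, which the sketch below checks. The function names and the choice of SHA-1 for key bits are hypothetical.

```python
import hashlib
from fractions import Fraction

# Node IDs from the slide, written without the leading dot.
SLIDE_IDS = ["000", "0010", "0011", "010", "011", "100",
             "1010", "1011", "1100", "1101", "111"]


def partition_size(prefix):
    """Fraction of [0, 1) covered by keys that start with this prefix."""
    return Fraction(1, 2 ** len(prefix))


def key_bits(data):
    """First 32 bits of a SHA-1 hash, as a bit string (assumed key format)."""
    digest = hashlib.sha1(data.encode("utf-8")).digest()
    return format(int.from_bytes(digest[:4], "big"), "032b")


def owner(bits, ids):
    """The node whose ID is a prefix of the key's bit string."""
    matches = [p for p in ids if bits.startswith(p)]
    if len(matches) != 1:
        raise ValueError("IDs must be prefix-free and cover the whole space")
    return matches[0]


def split(ids, prefix):
    """A joining node halves one existing partition (assumed join rule)."""
    return sorted([p for p in ids if p != prefix] + [prefix + "0", prefix + "1"])


print(sum(partition_size(p) for p in SLIDE_IDS))   # total share of the ID space
print(owner(key_bits("lecture-2.ppt"), SLIDE_IDS))
print(split(SLIDE_IDS, "000"))
```

With this reading, partition sizes differ by at most a factor of two, and a join touches only the one partition being split.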

  10. Lookup Problem [Figure: the Internet.] SOLUTION: Overlay Routing Network

  11. The Big Picture
     • P2P-related challenges: Dynamism, Scale
     • Partition Problem, addressed by an ID Selection Scheme. Design goals:
       • Equi-sized partitions
       • Replicas for fault tolerance
       • Low add/delete cost
     • Lookup Problem, addressed by an Overlay Routing Network. Design goals:
       • Small number of connections
       • Low lookup latency
       • IP-layer network proximity
       • Routing load balance
       • Resilience to network partitions
       • Low add/delete cost

  12. DHT Applications
     • Global file systems
       • OceanStore, CFS, PAST, Pastiche, UsenetDHT
     • Naming services
       • Chord-DNS, Twine, SFR
     • DB query processing
       • PIER, Wisc
     • Internet-scale data structures
       • PHT, Cone, SkipGraphs
     • Communication services
       • i3, MCAN, Bayeux
     • Event notification
       • Scribe, Herald
     • File sharing
       • OverNet

  13. Systems We Will Study
     • Basic DHT techniques
       • Chord
       • CAN
       • Pastry/Tapestry
       • Kademlia
       • SkipGraph/SkipNet
     • DHT extensions
       • Squid
       • pSearch
       • Twine
       • i3

  14. Distributed Search Requirements
     • Decentralization
     • Efficiency
     • Scalability
     • Flexibility
     • Completeness
     • Fault-resilience
     • Load balancing
     • Others
       • Autonomy
       • Anonymity
       • Ranking of results