Principles of Reliable Distributed Systems
Lecture 2: Distributed Hash Tables (DHT), Chord

Spring 2008

Idit Keidar


Today’s Material

  • Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications

    • Stoica et al.


Reminder: Peer-to-Peer Lookup

  • Insert (key, file)

  • Lookup (key)

    • Should find keys inserted in any node


Reminder: Overlay Networks

  • A virtual structure imposed over the physical network (e.g., the Internet)

    • over the Internet, there is a (IP level) link between every pair of nodes

    • an overlay uses a fixed subset of these

  • Why restrict to a subset?


Routing/Lookup in Overlays

  • How does one route a packet to its destination in an overlay?

  • How about lookup (key)?

  • Unstructured overlay: (last week)

    • Flooding or random walks

  • Structured overlay: (today)

    • The links are chosen according to some rule

    • Tables define next-hop for routing and lookup


Structured Lookup Overlays

  • Many academic systems –

    • CAN, Chord, D2B, Kademlia, Koorde, Pastry, Tapestry, Viceroy, …

  • OverNet based on the Kademlia algorithm

  • Symmetric, no hierarchy

  • Decentralized self management

  • Structured overlay – data stored in a defined place, search goes on a defined path

  • Implement Distributed Hash Table (DHT) abstraction


Reminder: Hashing

  • Data structure supporting the operations:

    • void insert( key, item )

    • item search( key )

  • Implementation uses hash function for mapping keys to array cells

  • Expected search time O(1)

    • provided that there are few collisions


Distributed Hash Tables (DHTs)

  • Nodes store table entries

    • The role of array cells

  • Good abstraction for lookup?

    • Why?


The DHT Service Interface

  • lookup(key)

    • returns the location of the node currently responsible for this key

    • key is usually numeric (in some range)


Using the DHT Interface

  • How do you publish a file?

  • How do you find a file?

  • Requirements for an application being able to use DHTs?

    • Data identified with unique keys

    • Nodes can (agree to) store keys for each other

      • location of object (pointer) or actual object (data)
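To make the publish and find questions concrete, here is a minimal single-process sketch built only on lookup(key). The class ToyDHT, the helpers publish and find, and the rule used to pick the responsible node (first node ID at or after the key, wrapping around) are assumptions made for illustration; the slides leave the mapping unspecified at this point.

```python
class ToyDHT:
    """Single-process stand-in for a DHT (hypothetical, for illustration only)."""

    def __init__(self, node_ids):
        self.node_ids = sorted(node_ids)
        # Each node keeps its own table of key -> value entries.
        self.tables = {n: {} for n in self.node_ids}

    def lookup(self, key):
        """Return the id of the node currently responsible for `key`.

        Assumed rule: the first node id >= key, wrapping to the smallest id.
        """
        for n in self.node_ids:
            if n >= key:
                return n
        return self.node_ids[0]


def publish(dht, key, location):
    """Publish a file: store a pointer to it at the responsible node."""
    node = dht.lookup(key)
    dht.tables[node][key] = location
    return node


def find(dht, key):
    """Find a file: ask the responsible node for the stored pointer."""
    node = dht.lookup(key)
    return dht.tables[node].get(key)


dht = ToyDHT([8, 21, 42, 56])
publish(dht, 54, "host-a:/files/report.pdf")
```

Here the stored value is a pointer to the object's location; storing the object itself would use the same two calls.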


What Does a DHT Implementation Need to Do?

  • Map keys to nodes

    • Needs to be dynamic as nodes join and leave

    • How does this affect the service interface?

  • Route a request to the appropriate node

    • Routing on the overlay


Lookup Example

[Figure: a set of nodes, each holding a (K, V) table; insert(K1,V1) places the pair (K1,V1) at one node, and lookup(K1) is issued for the same key.]


Mapping Keys to Nodes

  • Goal: load balancing

    • Why?

  • Typical approach:

    • Give an m-bit id to each node and each key (e.g., using SHA-1 on the key, IP address)

    • Map key to node whose id is “close” to the key (need distance function)

    • How is load balancing achieved?
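The typical approach above can be sketched as follows. The function names m_bit_id and clockwise_distance are hypothetical, and clockwise distance is only one possible choice of distance function (it is the one that fits a ring-shaped ID space).

```python
import hashlib


def m_bit_id(data, m=160):
    """Derive an m-bit identifier (m <= 160) from a string using SHA-1."""
    digest = hashlib.sha1(data.encode("utf-8")).digest()  # 160-bit hash
    return int.from_bytes(digest, "big") >> (160 - m)     # keep the top m bits


def clockwise_distance(a, b, m=160):
    """Distance from id a to id b when moving clockwise around a 2**m ring."""
    return (b - a) % (2 ** m)


node_id = m_bit_id("192.0.2.17")   # a node hashes its IP address
key_id = m_bit_id("report.pdf")    # a key hashes its name
gap = clockwise_distance(key_id, node_id)
```

Because SHA-1 output looks uniformly distributed, node and key IDs spread evenly over the ID space, which is the basis for load balancing.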


Routing Issues

  • Each node must be able to forward each lookup query to a node closer to the destination

  • Maintain routing tables adaptively

    • Each node knows some other nodes

    • Must adapt to changes (joins, leaves, failures)

    • Goals?


Handling Join/Leave

  • When a node joins it needs to assume responsibility for some keys

    • Ask the application to move these keys to it

    • How many keys will need to be moved?

  • When a node fails or leaves, its keys have to be moved to others

    • What else is needed in order to implement this?


P2P System Interface

  • Lookup

  • Join

  • Move keys


Chord

Stoica, Morris, Karger, Kaashoek, and Balakrishnan


Chord Logical Structure

  • m-bit ID space ($2^m$ IDs), usually m = 160.

  • Think of nodes as organized in a logical ring according to their IDs.

[Figure: logical ring of nodes N1, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56]


Consistent Hashing: Assigning Keys to Nodes

  • Key k is assigned to first node whose ID equals or follows k – successor(k)

[Figure: key K54 placed on the ring of nodes N1, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56; its successor is N56]
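The assignment rule can be checked with a few lines of code. This is a sketch assuming a 6-bit ID space (the slide's node IDs all fit below 64); the function name successor mirrors the slide's notation, and the use of a sorted list with binary search is an implementation choice, not something the slides prescribe.

```python
from bisect import bisect_left


def successor(node_ids, k, m=6):
    """Return the first node whose ID equals or follows k on the ring."""
    ring = sorted(node_ids)
    i = bisect_left(ring, k % (2 ** m))  # first index with ring[i] >= k
    return ring[i % len(ring)]           # wrap around past the largest ID


nodes = [1, 8, 10, 14, 21, 30, 38, 42, 48, 51, 56]
owner_of_k54 = successor(nodes, 54)
```

Keys past the largest node ID (for example 57) wrap around to the smallest node, N1.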


Moving Keys upon Join/Leave

  • When a node joins, it becomes responsible for some keys previously assigned to its successor

    • Local change

    • Assuming load is balanced, how many keys should move?

  • And what happens when a node leaves?
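A small sketch of why a join is a local change: only keys between the joiner's predecessor and the joiner itself change owner. The helper keys_moved_by_join and the sample key set are hypothetical; the node IDs are those of the ring used in these slides.

```python
from bisect import bisect_left


def successor(ring, k):
    """First node ID >= k in a sorted ring, wrapping around."""
    i = bisect_left(ring, k)
    return ring[i % len(ring)]


def keys_moved_by_join(node_ids, new_node, keys):
    """Map each key whose owner changes to (old_owner, new_owner)."""
    before = sorted(node_ids)
    after = sorted(list(node_ids) + [new_node])
    moved = {}
    for k in keys:
        old_owner, new_owner = successor(before, k), successor(after, k)
        if old_owner != new_owner:
            moved[k] = (old_owner, new_owner)
    return moved


nodes = [1, 8, 10, 14, 21, 30, 38, 42, 48, 51, 56]
moved = keys_moved_by_join(nodes, 45, keys=[5, 43, 44, 46, 54])
```

When N45 joins, only keys 43 and 44 move, and both come from its successor N48. A leave is the mirror image: the departing node's keys all go to its successor.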


Consistent Hashing Guarantees

  • For any set of N nodes and K keys, w.h.p.:

    • Each node is responsible for at most $(1 + \epsilon)K/N$ keys

    • When an (N + 1)st node joins or leaves, responsibility for O(K/N) keys changes hands (only to or from the joining or leaving node)

  • For the scheme described above, $\epsilon = O(\log N)$

  • $\epsilon$ can be reduced to an arbitrarily small constant by having each node run $\Omega(\log N)$ virtual nodes, each with its own identifier
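The effect of virtual nodes can be explored with a small experiment. This is a sketch under assumptions: a 32-bit ID space, SHA-1 of made-up strings such as "node3#7" as identifiers, and synthetic keys "key0", "key1", and so on. None of these names come from the slides.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

M = 32  # bits in the toy ID space (assumption)


def h(s):
    """Hash a string to an M-bit ID."""
    return int.from_bytes(hashlib.sha1(s.encode()).digest(), "big") % (2 ** M)


def loads(num_nodes, num_keys, vnodes):
    """Count keys per physical node when each node runs `vnodes` virtual nodes."""
    # Each virtual node gets its own point on the ring, tagged with its owner.
    points = sorted((h(f"node{n}#{v}"), n)
                    for n in range(num_nodes) for v in range(vnodes))
    ids = [p for p, _ in points]
    count = Counter()
    for k in range(num_keys):
        i = bisect_left(ids, h(f"key{k}")) % len(ids)  # successor of the key
        count[points[i][1]] += 1
    return count


plain = loads(num_nodes=8, num_keys=1000, vnodes=1)
virtual = loads(num_nodes=8, num_keys=1000, vnodes=16)
```

Comparing max(plain.values()) and max(virtual.values()) against K/N = 125 gives a feel for how the imbalance changes as the number of virtual nodes grows.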


Simple Routing Solutions

  • Each node knows only its successor

    • Routing around the circle

    • Good idea?

  • Each node knows all other nodes

    • O(1) routing

    • Cost?
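For the first option, the cost is easy to see by counting hops. The sketch below walks the ring using successor pointers only; the node IDs are taken from the ring figures in these slides, and the helper names are hypothetical.

```python
def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # interval wraps around zero


def route_successor_only(ring, start, key):
    """Forward the query one successor at a time; return (owner, hops)."""
    ring = sorted(ring)
    idx = ring.index(start)
    hops = 0
    # Stop at the node whose successor is responsible for the key.
    while not in_half_open(key, ring[idx], ring[(idx + 1) % len(ring)]):
        idx = (idx + 1) % len(ring)
        hops += 1
    return ring[(idx + 1) % len(ring)], hops


ring = [0, 8, 10, 14, 21, 30, 38, 42, 48, 51, 56]
owner, hops = route_successor_only(ring, start=8, key=54)
```

Starting at N8, the query for K54 is forwarded 8 times before N51 identifies N56 as the owner, so lookups cost O(N) hops. The second option removes the hops but requires every node to store and maintain O(N) routing entries.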


Chord Skiplist Routing

  • Each node has “fingers” to nodes ½ way around the ID space from it, ¼ the way…

  • finger[i] at n contains successor($n + 2^{i-1}$)

  • successor is finger[1]

[Figure: ring of nodes N0, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56 with finger pointers]

How many entries in the finger table?
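The finger definition can be computed directly. The sketch assumes m = 6 for the ring in the figure (all node IDs are below 64); finger_table is a hypothetical helper, and its list is 0-indexed, so finger[1] in the slide's notation is element 0.

```python
from bisect import bisect_left


def successor(ring, k):
    """First node ID >= k in a sorted ring, wrapping around."""
    i = bisect_left(ring, k)
    return ring[i % len(ring)]


def finger_table(node_ids, n, m):
    """finger[i] = successor((n + 2**(i-1)) mod 2**m) for i = 1..m."""
    ring = sorted(node_ids)
    return [successor(ring, (n + 2 ** (i - 1)) % (2 ** m))
            for i in range(1, m + 1)]


ring = [0, 8, 10, 14, 21, 30, 38, 42, 48, 51, 56]
fingers_of_n8 = finger_table(ring, n=8, m=6)
```

For N8 the table is N10, N10, N14, N21, N30, N42: one entry per bit of the ID, so m entries in total, with the first entry equal to the successor.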


Example: Chord Fingers

[Figure: ring of nodes N0, N10, N21, N30, N47, N72, N82, N90, N114, with pointers labeled finger[1..4], finger[5], finger[6], finger[7]]

  • m entries

  • log N distinct fingers with high probability
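The gap between m entries and the number of distinct fingers shows up when the example is recomputed. The sketch assumes m = 7 and that the fingers drawn are those of N0; the slide states neither, so both are inferred from the node IDs (up to 114) and from finger[7] being the last finger.

```python
from bisect import bisect_left

M = 7
RING = [0, 10, 21, 30, 47, 72, 82, 90, 114]


def successor(k):
    """First node ID >= k on RING, wrapping around."""
    i = bisect_left(RING, k % (2 ** M))
    return RING[i % len(RING)]


# Fingers of N0: successor(0 + 2**(i-1)) for i = 1..M.
table = [successor(2 ** (i - 1)) for i in range(1, M + 1)]
distinct = sorted(set(table))
```

Under these assumptions finger[1..4] all point to N10 and finger[5] points to N21, matching the labels in the figure, so the 7 entries contain only 4 distinct nodes.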


Chord Data Structures (At Each Node)

  • Finger table

  • First finger is successor

  • Predecessor


Forwarding Queries

  • Query for key k is forwarded to finger with highest ID not exceeding k

[Figure: Lookup(K54) forwarded along fingers on the ring of nodes N0, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56]
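The forwarding rule can be traced in a single process. This is a sketch, assuming m = 6 and a global view of the ring so that each node's fingers can be recomputed on demand; in a real deployment each step would be a remote call to the next node. The helper names follow the Chord paper's terminology but the code itself is illustrative.

```python
from bisect import bisect_left

M = 6
RING = [0, 8, 10, 14, 21, 30, 38, 42, 48, 51, 56]


def successor(k):
    i = bisect_left(RING, k % (2 ** M))
    return RING[i % len(RING)]


def fingers(n):
    """finger[i] = successor(n + 2**(i-1)), returned as a 0-indexed list."""
    return [successor(n + 2 ** (i - 1)) for i in range(1, M + 1)]


def in_open(x, a, b):
    """x in the ring interval (a, b)."""
    return a < x < b if a < b else (x > a or x < b)


def in_half_open(x, a, b):
    """x in the ring interval (a, b]."""
    return a < x <= b if a < b else (x > a or x <= b)


def closest_preceding_finger(n, key):
    """Finger of n with the highest ID that still precedes key."""
    for f in reversed(fingers(n)):
        if in_open(f, n, key):
            return f
    return n


def find_successor(n, key):
    """Return (owner of key, list of nodes the query visited)."""
    path = [n]
    while not in_half_open(key, n, fingers(n)[0]):
        nxt = closest_preceding_finger(n, key)
        if nxt == n:  # no closer finger known; fall back to the successor
            break
        n = nxt
        path.append(n)
    return fingers(n)[0], path


owner, path = find_successor(8, 54)
```

Starting at N8, the query visits N8, N42, and N51, and N51 reports N56 as the owner of K54: two forwards instead of the eight needed with successor pointers alone.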


Remote Procedure Call (RPC)

How long does it take?


Routing Time

  • Node n looks up a key stored at node p

  • p is in n’s i-th interval:

    $p \in ((n + 2^{i-1}) \bmod 2^m,\ (n + 2^i) \bmod 2^m]$

  • n contacts f = finger[i]

    • The interval is not empty (because p is in it), so: $f \in ((n + 2^{i-1}) \bmod 2^m,\ (n + 2^i) \bmod 2^m]$

    • RPC f

  • f is at least $2^{i-1}$ away from n

  • p is at most $2^{i-1}$ away from f

  • The distance is halved: maximum m steps
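The halving step can be written out explicitly. The notation $d(a,b) = (b - a) \bmod 2^m$ for clockwise distance is introduced here for convenience and is not part of the slides.

```latex
% d(a, b) = (b - a) mod 2^m is the clockwise distance on the ring (assumed notation).
\begin{align*}
2^{i-1} < d(n,p) &\le 2^{i}
    && \text{($p$ is in $n$'s $i$-th interval)} \\
d(n,f) &\ge 2^{i-1}
    && \text{($f$ lies in the same interval, at or before $p$)} \\
d(f,p) &= d(n,p) - d(n,f) \le 2^{i} - 2^{i-1} = 2^{i-1} \\
d(f,p) &\le d(n,p) - 2^{i-1} \le \tfrac{1}{2}\, d(n,p)
    && \text{(because $2^{i-1} \ge d(n,p)/2$)}
\end{align*}
```

Since the initial distance is below $2^m$, it reaches zero after at most m halvings.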


Routing Time Refined

  • Assuming uniform node distribution around the circle, the number of nodes in the search space is halved at each step:

    • Expected number of steps: log N

  • Note that:

    • m = 160

    • For 1,000,000 nodes, log N = 20
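The arithmetic behind the note, as a quick check (the function name is hypothetical and the logarithm is taken base 2):

```python
import math


def expected_lookup_steps(num_nodes):
    """log2(N): the expected number of routing steps stated above."""
    return math.log2(num_nodes)


M = 160  # worst-case bound on steps from the ID length
steps = expected_lookup_steps(1_000_000)
print(f"worst-case bound m = {M}, expected about {steps:.1f} steps")
```

The refined bound depends on the number of nodes actually present, not on the size of the ID space, which is why it is so much smaller than m.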


What About Network Distance?

[Figure: Lookup(K54) on the ring of nodes N0, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56, with ring nodes labeled by location: Haifa, China, Texas]


Joining Chord

  • Goals?

  • Required steps:

    • Find your successor

    • Initialize finger table and predecessor

    • Notify other nodes that need to change their finger table and predecessor pointer

      • $O(\log^2 N)$

    • Learn the keys that you are responsible for; notify others that you assume control over them


Join Algorithm: Take II

  • Observation: for correctness, successors suffice

    • Fingers only needed for performance

  • Upon join, update successor only

  • Periodically,

    • Check that successors and predecessors are consistent

    • Fix fingers
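A minimal in-memory sketch of this lazy join, following the structure of the stabilization protocol in the Chord paper (join, stabilize, notify). Fingers are omitted because, as noted above, successors suffice for correctness; the Node class, the between helper, and running stabilization in fixed rounds are simplifications made for illustration.

```python
def between(x, a, b, inclusive_right=False):
    """True if x is in the ring interval (a, b), or (a, b] if inclusive_right."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b  # wraps around zero; (a, a) is everything but a


class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self      # a new node starts as a one-node ring
        self.predecessor = None

    def find_successor(self, key):
        """Successor-only lookup: correct, but linear in the ring size."""
        n = self
        while not between(key, n.id, n.successor.id, inclusive_right=True):
            n = n.successor
        return n.successor

    def join(self, known):
        """Upon join, update the successor only."""
        self.predecessor = None
        self.successor = known.find_successor(self.id)

    def stabilize(self):
        """Periodic: adopt a closer successor if one appeared, then notify it."""
        x = self.successor.predecessor
        if x is not None and between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate):
        """candidate believes it may be our predecessor."""
        if self.predecessor is None or between(
                candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate


a, b, c = Node(8), Node(42), Node(21)
b.join(a)
c.join(a)
for _ in range(5):                 # a few stabilization rounds
    for node in (a, b, c):
        node.stabilize()
```

After the rounds, the successor pointers form the cycle N8, N21, N42 and the predecessor pointers agree with it, even though both joiners initially pointed at N8.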


Creation and Join


Join Example

  • joiner finds successor

  • stabilize fixes successor

  • gets keys

  • stabilize fixes predecessor


Join Stabilization Guarantee

  • If any sequence of join operations is executed interleaved with stabilizations,

    • Then at some time after the last join

    • The successor pointers form a cycle on all the nodes in the network

  • Model assumptions?


Performance with Concurrent Joins

  • Assume a stable network with N nodes with correct finger pointers

  • Now, another set of up to N nodes joins the network,

    • And all successor pointers (but perhaps not all finger pointers) are correct,

  • Then lookups still take O(logN) time w.h.p.

  • Model assumptions?


Failure Handling

  • Periodically fixing fingers

  • List of r successors instead of one successor

  • Periodically probing predecessors:


Failure Detection

  • Each node has a local failure detector module

  • Uses periodic probes and timeouts to check liveness of successors and fingers

    • If the probed node does not respond by a designated timeout, it is suspected to be faulty

  • A node that suspects its successor (finger) finds a new successor (finger)

  • False suspicion: the suspected node is not faulty

    • Suspected due to communication problems
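A timeout-based failure detector can be sketched in a few lines. The class and method names are hypothetical, and time is passed in explicitly rather than read from a clock so the behavior is easy to follow.

```python
class FailureDetector:
    """Suspects any probed node that has not replied within `timeout`."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_reply = {}  # node name -> time of the most recent reply

    def record_reply(self, node, now):
        """Called whenever a probe to `node` is answered."""
        self.last_reply[node] = now

    def suspects(self, now):
        """Nodes whose last reply is older than the timeout."""
        return {n for n, t in self.last_reply.items()
                if now - t > self.timeout}


fd = FailureDetector(timeout=3.0)
fd.record_reply("N14", now=0.0)
fd.record_reply("N21", now=0.0)
fd.record_reply("N14", now=4.0)
suspected_at_5 = fd.suspects(now=5.0)   # N21 has been silent for 5 time units
fd.record_reply("N21", now=5.5)         # a late reply arrives
suspected_at_6 = fd.suspects(now=6.0)
```

N21 is suspected at time 5 and cleared at time 6 once its delayed reply arrives, which is exactly a false suspicion: the node was slow or unreachable, not faulty.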


The Model?

  • Reliable messages among correct nodes

    • No network partitions

  • Node failures can be accurately detected!

    • No false suspicions

  • Properties hold as long as failure is bounded:

    • Assume a list of $r = \Omega(\log N)$ successors

    • Start from stable state and then each node fails with prob. 1/2

    • Then w.h.p. find successor returns the closest living successor to the query key

    • And the expected time to execute find successor is O(logN)
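A short calculation suggests why a successor list of logarithmic length is enough under this failure model. The constant c = 2 and the helper names are assumptions chosen for illustration; the slides do not fix a constant.

```python
import math


def successor_list_length(num_nodes, c=2):
    """r = ceil(c * log2(N)); the constant c is an illustrative assumption."""
    return math.ceil(c * math.log2(num_nodes))


def prob_all_successors_fail(r, p_fail=0.5):
    """Probability that r independently failing successors are all down."""
    return p_fail ** r


N = 1000
r = successor_list_length(N)
p_one_node = prob_all_successors_fail(r)   # a given node loses its whole list
p_any_node = N * p_one_node                # union bound over all N nodes
```

With N = 1000 this gives r = 20, and the chance that any node at all loses every entry in its successor list is below 1/N, so the ring stays connected with high probability.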


What Can Partitions Do?

[Figure: ring of nodes N0, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56, with three nodes marked "Suspect successor"]


What About Moving Keys?

  • Left up to the application

  • Solution: keep soft state, refreshed periodically

    • Every refresh operation performs lookup(key) before storing the key in the right place

  • How can we increase reliability for the time between failure and refresh?
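The refresh idea can be sketched as follows: the publisher keeps its own copy of each item and periodically re-inserts it wherever lookup(key) currently points. The Ring class and refresh function are hypothetical, and expiry of stale copies (the other half of soft state) is left out to keep the example short.

```python
from bisect import bisect_left


class Ring:
    """Toy ring: per-node tables plus a successor-based lookup."""

    def __init__(self, node_ids):
        self.store = {n: {} for n in node_ids}

    def lookup(self, key):
        ring = sorted(self.store)
        return ring[bisect_left(ring, key) % len(ring)]

    def join(self, n):
        self.store[n] = {}

    def fail(self, n):
        del self.store[n]  # everything stored at n is lost


def refresh(ring, owned_items):
    """Re-publish every item at whichever node is responsible right now."""
    for key, value in owned_items.items():
        ring.store[ring.lookup(key)][key] = value


ring = Ring([8, 42, 56])
my_items = {54: "report.pdf"}
refresh(ring, my_items)        # stored at N56
ring.fail(56)                  # N56 crashes; K54 now maps to N8, which has no copy
missing_before_refresh = 54 not in ring.store[ring.lookup(54)]
refresh(ring, my_items)        # the next refresh repairs the gap
```

Between the failure and the next refresh the key is unavailable, which is the window the last bullet asks about; one common remedy is to also store copies at several successors of the responsible node.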


Summary: DHT Advantages

  • Peer-to-peer: no centralized control or infrastructure

  • Scalability: O(log N) routing, routing tables, join time

  • Load-balancing

  • Overlay robustness


DHT Disadvantages

  • No control where data is stored

  • In practice, organizations want:

    • Content Locality – explicitly place data where we want (inside the organization)

    • Path Locality – guarantee that local traffic (a user in the organization looks for a file of the organization) remains local

  • No prefix search

