
Viceroy: Scalable Emulation of Butterfly Networks For Distributed Hash Tables

By: Dahlia Malkhi, Moni Naor & David Ratajzcak

Nov. 11, 2003

Presented by Zhenlei Jia

Nov. 11, 2004


Acknowledgments

Some of the following slides are adapted from slides created by the authors of the paper.


Outline


  • DHT Properties

  • Viceroy

    • Structure

    • Routing Algorithm

    • Join/Leave

    • Bounding In-degree: Bucket Solution

    • Fault Tolerance

  • Summary


DHT

  • What's a DHT?

    • Store (key, value) pairs

    • Lookup

    • Join/Leave

  • Examples

    • CAN, Pastry, Tapestry, Chord, etc. (a toy sketch of the interface follows below)
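To make the interface concrete, here is a toy, single-process sketch of store/lookup via consistent hashing; the code and names are illustrative, not the API of CAN, Pastry, Tapestry, or Chord.

```python
# A toy DHT illustration (hypothetical code, not any of the systems above):
# keys and node names hash onto a ring, and each key is stored on the
# first node clockwise from its hash.
import hashlib
from bisect import bisect_left

def h(s: str) -> int:
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

class ToyDHT:
    def __init__(self, node_names):
        self.ring = sorted(h(n) for n in node_names)  # node positions on the ring
        self.data = {}                                # node position -> {key: value}

    def _owner(self, key: str) -> int:
        i = bisect_left(self.ring, h(key)) % len(self.ring)
        return self.ring[i]

    def store(self, key, value):
        self.data.setdefault(self._owner(key), {})[key] = value

    def lookup(self, key):
        return self.data.get(self._owner(key), {}).get(key)

dht = ToyDHT(["a", "b", "c"])
dht.store("hello", "world")
assert dht.lookup("hello") == "world"
```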


DHT Properties

  • Dilation

    • Efficient lookup, usually O(log(n))

  • Maintenance cost

    • Support dynamic environment

    • Control messages, affected servers

  • Degree

    • Number of open connections

    • Servers impacted by node join/leave

    • Heartbeat, graceful leave


DHT Properties (cont.)

  • Congestion:

    • Peers should share the routing load evenly

    • Load (of a node): the probability that it is on a route with random source and destination.

    • If path length is O(log(n)), then across all n² source–destination routes there are n² × O(log(n)) node visits in total; spread over n nodes, each node is on n² × O(log(n))/n = O(n·log(n)) routes on average. Average load = O(n·log(n))/n² = O(log(n))/n
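A quick numeric check of this arithmetic (a sketch; the network size n = 2^10 is an arbitrary example):

```python
# Checking the congestion arithmetic for a sample network size, assuming
# an idealized DHT where every route has exactly log2(n) hops.
import math

n = 2 ** 10                          # example network size (assumption)
path_len = math.log2(n)              # hops per route, ~O(log n)
total_visits = n * n * path_len      # node visits summed over all n^2 routes
routes_per_node = total_visits / n   # ~ n log n routes through an average node
avg_load = routes_per_node / (n * n) # fraction of routes hitting a given node
assert abs(avg_load - path_len / n) < 1e-12   # equals log(n)/n
```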


Previous Works


Intuition

  • A route is a combination of links of appropriate lengths

  • Chord: Each node has ALL log(n) links

  • Viceroy

    • Each node has ONE of the long-range links

    • A link of length 1/2^k points to a node that has a link of length 1/2^(k+1)

[Figure: Chord's long-range (finger) links, all log(n) of them at a single node.]


A Butterfly Network

[Figure: an 8-node butterfly network (nodes 000–111) arranged in levels 1–4.]

  • Each node has ONE of the long-range links

  • A link of length 1/2^k points to a node that has a link of length 1/2^(k+1)

  • Nodes “share” each other’s long link

  • Routing:

    • Route to the root (level 1)

    • Route to the right group

    • Route to the right level

  • Path: O(log(n))

  • Degree: O(1)


    A Viceroy network

    [Figure: a 12-node Viceroy network on the ring [0,1), with ids 0001…1111 assigned to levels 1–3.]

    • Ideally, there should be log(n) levels

    • There is no global counter, so nodes cannot know n exactly

    • Later, we will see how a node can estimate log(n) locally


    Structure: Nodes

    • Node

      • Id: a 128-bit binary string, u

      • Level: positive integer, u.level

    • Order of ids

      • An id b1b2…bk is interpreted as the real number ∑_{i=1…k} b_i/2^i in [0,1)

      • Each node has a SUCCESSOR and a PREDECESSOR

        SUCC(u), PRED(u)

      • Node u stores the keys k such that u≤k<SUCC(u)
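A minimal sketch of this ordering and ownership rule, with my own helper names (not from the paper):

```python
# Viceroy's id ordering: a bit string b1 b2 ... bk is read as the real
# number sum(b_i / 2^i), and node u owns keys k with u <= k < SUCC(u)
# on the ring [0, 1).
def id_to_real(bits: str) -> float:
    """Map a binary id string to a point in [0, 1)."""
    return sum(int(b) / 2 ** (i + 1) for i, b in enumerate(bits))

def owns(u: float, succ_u: float, key: float) -> bool:
    """True if the node at u stores `key`, handling wrap-around on the ring."""
    if u <= succ_u:
        return u <= key < succ_u
    return key >= u or key < succ_u  # interval wraps past 1.0 back to 0.0

assert id_to_real("101") == 0.625    # 1/2 + 0/4 + 1/8
assert owns(0.625, 0.75, 0.7)
```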


    Structure: Nodes (cont.)

    [Figure: the ring [0,1) with PRED(x), x, and SUCC(x) marked; the keys stored on x are those in [x, SUCC(x)).]

    • Lemma 2.1

      Let n0 = 1/d(x, SUCC(x)). Then w.h.p. (i.e., with probability > 1 − 1/n^(1+ε)),

      log(n) − log(log(n)) − O(1) < log(n0) ≤ 3·log(n)

    • Node x selects its level uniformly at random from 1…log(n0) (see the sketch below)
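A minimal sketch of this local level selection based on Lemma 2.1; the function names are mine, and the estimate n0 comes from the successor gap as above:

```python
# Local level selection (illustrative, not the paper's code): a node
# estimates log(n) from the gap to its successor and picks a level
# uniformly from 1..log(n0).
import math, random

def ring_distance(u: float, succ_u: float) -> float:
    """Clockwise distance from u to its successor on the ring [0, 1)."""
    return (succ_u - u) % 1.0

def choose_level(u: float, succ_u: float) -> int:
    n0 = 1.0 / ring_distance(u, succ_u)   # w.h.p. log(n0) = Theta(log n)
    return random.randint(1, max(1, int(math.log2(n0))))
```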


    Structure: Links

    • A node u at level k has six out-links:

      • 2 × Short: SUCCESSOR and PREDECESSOR

      • 2 × Medium: the closest level-(k+1) node on either side whose id matches u's first k bits (Medium-left has an id smaller than u.id; Medium-right, larger)

      • 1 × Long: the closest level-(k+1) node whose id has the prefix u1…u(k−1)(1−uk)u(k+1)…uw*, i.e., u's first w bits with the k-th bit flipped,

        where w = log(n0) − log(log(n0))

      • 1 × Parent: the closest level-(k−1) node

    • Each node also keeps track of its inbound links (a data-structure sketch follows below)
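A possible in-memory representation of this link table (field names are my own invention, not the paper's):

```python
# A data-structure sketch of a Viceroy node's constant out-degree link table.
from dataclasses import dataclass, field
from typing import Optional, List

@dataclass
class ViceroyNode:
    id: float                                    # position on the ring [0, 1)
    level: int                                   # level in the emulated butterfly
    successor: Optional["ViceroyNode"] = None    # short link (clockwise)
    predecessor: Optional["ViceroyNode"] = None  # short link (counter-clockwise)
    medium_left: Optional["ViceroyNode"] = None  # to level+1, id below ours
    medium_right: Optional["ViceroyNode"] = None # to level+1, id above ours
    long: Optional["ViceroyNode"] = None         # to level+1, ~1/2^level away
    parent: Optional["ViceroyNode"] = None       # to level-1
    inbound: List["ViceroyNode"] = field(default_factory=list)  # back-pointers
```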


    Structure: Links (cont.)

    [Figure: the example Viceroy network annotated with one node's links: short links to SUCC and PRED; a long link crossing roughly 1/2^k of the ring to a node matching u's first w bits with the k-th bit flipped (e.g., 11*); medium links to level-(k+1) nodes matching the first k bits (e.g., x[k]0*); and a parent link up to level k−1.]


    Routing: Algorithm

    LOOKUP(x, y):

    1. Initialization: set cur to x.

    2. Proceed to root: while cur.level > 1, set cur = cur.parent.

    3. Greedy search: if cur.id ≤ y < SUCC(cur).id, return cur. Otherwise, choose the link m of cur that minimizes d(m, y), move to m, and repeat.

    Demo: http://www.cs.huji.ac.il/labs/danss/anatt/viceroy.html
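A sketch of LOOKUP over the hypothetical ViceroyNode structure from the earlier sketch; it assumes all links are populated and that greedy steps make progress, as the analysis below argues they do w.h.p.:

```python
# Greedy lookup sketch (illustrative, not the paper's pseudocode).
def ring_dist(a: float, b: float) -> float:
    """Clockwise distance from a to b on the ring [0, 1)."""
    return (b - a) % 1.0

def lookup(x: "ViceroyNode", y: float) -> "ViceroyNode":
    cur = x
    while cur.level > 1 and cur.parent is not None:  # 1) climb to a level-1 node
        cur = cur.parent
    while True:                                      # 2) greedy descent toward y
        if ring_dist(cur.id, y) < ring_dist(cur.id, cur.successor.id):
            return cur                               # y in [cur, SUCC(cur)): done
        links = [cur.successor, cur.predecessor, cur.medium_left,
                 cur.medium_right, cur.long, cur.parent]
        cur = min((m for m in links if m is not None),
                  key=lambda m: ring_dist(m.id, y))  # step minimizing d(m, y)
```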


    Routing: Example

    [Figure: the example Viceroy network with a lookup route traced from source x to target y.]


    One Observation

    [Figure: the example Viceroy network.]


    Routing: Analysis (1)

    [Figure: the example Viceroy network with a route from x to y, used to illustrate the phases of the path-length analysis.]


    Routing: Analysis (2)

    • Expected path length = O(log(n)):

      • O(log(n)) steps to reach a level-1 node

      • O(log(n)) steps traveling among clusters

      • O(log(n)) steps for the final local search


    Routing: Theorems

    • Theorem 4.4

      The path length from x to y is O(log(n)) w.h.p.

    • Proof is based on several lemmas

    • Lemma 4.1

      For every node u with u.level < log(n) − log(log(n)), the number of nodes between u and its Medium-left (resp. Medium-right) link, if it exists, is at most 6·log²(n) w.h.p.


    Routing: Theorems (2)

    • Lemma 4.2

      In the greedy search phase of a lookup of value y from node x, let the j-th greedy step v_j, for 1 ≤ j ≤ m, be such that v_j is more than O(log²(n)) nodes away from y. Then w.h.p. node v_j is reached over a Medium or Long link, and hence satisfies v_j.level = j and v_j[j] = y[j].

    • m = log(n) − 2·log(log(n)) − log(3+ε)

    • W.h.p., within m steps we are n/2^m = (3+ε)·log²(n) ≤ 6·log²(n) nodes away from the destination


    Routing: Theorems (3)

    • Lemma 4.3

      Let v be a node that is O(log²(n)) nodes away from the target y. Then w.h.p., the target y is reached from v within O(log(n)) greedy steps.

    • Theorem 4.4

      The total length of a route from x to y is O(log(n)) w.h.p.

    • Theorem 4.6

      The expected load on every node is O(log(n)/n).

      The load on every node is O(log²(n)/n) w.h.p.

    • Theorem 4.7

      Every node u has in-degree O(log(n)) w.h.p.


    Join: Algorithm

    1. Choose identifier: select a random 128-bit string x1x2…x128.

    2. Set up short links: invoke LOOKUP(x) and let x′ be the resulting node. Insert x between x′ and x′.SUCCESSOR.

    3. Choose level: let k be the maximal number of prefix bits x shares with either SUCC(x) or PRED(x); choose a level uniformly from 1…k.

    4. Set parent link: if SUCC(x) has level x.level − 1, set x.parent to it. Otherwise, move to SUCC(x) and repeat.

    5. Set long link: let p = x1…x(k−1)(1−xk)x(k+1)…xw. Invoke LOOKUP(p), stopping once a node at level x.level + 1 whose id matches p is reached.


    Join: Algorithm (cont.)

    6. Set medium links: denote p = x1x2…x_{x.level}. If SUCC(x) has prefix p and level x.level + 1, set x.Medium-right to it. Otherwise, move to SUCC(x) and repeat.

    7. Set inbound links: denote p = x1x2…x_{x.level}.

    Inbound Medium links: following SUCC links, as long as the successor y has prefix p and a level different from x.level: if y.level = x.level − 1, set y.Medium-left to x.

    Inbound Long links: following SUCC links, find a node y whose id matches p and whose level equals x.level; take over any of y's inbound long links that are closer to x than to y.

    Inbound Parent links: following PRED links, find each y with y.level = x.level + 1 and set y.parent to x; repeat until a node at the same level as x is met. (Two helper routines from these steps are sketched below.)
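Two of these steps as a minimal sketch (helper names are mine): estimating the level from shared prefix bits (step 3) and walking SUCC links to the parent (step 4):

```python
# Join-step sketches (illustrative, not the paper's pseudocode).
import random

def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading bits the two id strings have in common."""
    n = 0
    for x_bit, y_bit in zip(a, b):
        if x_bit != y_bit:
            break
        n += 1
    return n

def choose_level_on_join(x_id: str, succ_id: str, pred_id: str) -> int:
    k = max(shared_prefix_len(x_id, succ_id), shared_prefix_len(x_id, pred_id))
    return random.randint(1, max(1, k))   # k approximates log(n) w.h.p.

def find_parent(x: "ViceroyNode") -> "ViceroyNode":
    # Assumes x.level > 1; level-1 nodes have no parent.
    cur = x.successor
    while cur.level != x.level - 1:       # walk the ring until one level up
        cur = cur.successor
    return cur
```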


    Join: Example

    [Figure: node x = 0111 joining the example network, with the following annotations:]

    • Set Medium link (O(log²(n)) hops w.h.p.): with p = x1x2…xk (here 01), follow SUCC links; if y does not have prefix p, stop; if y has prefix p and y.level = k + 1, set the Medium link; otherwise move to SUCC(y).

    • Set Long link: with p = x1…x(k−1)(1−xk)…xw (here 00*), invoke LOOKUP(p) and stop at a matching node of level k + 1.

    • Set Parent link: following SUCC links, find a node with level k − 1.

    • Set inbound Long links: following short links, find y such that y[k] = x[k] and y.level = x.level, and check y's inbound links.


    Join: Analysis

    • LOOKUP takes O(log(n)) messages w.h.p.

    • Travel over short links during the link-setting phase is O(log²(n)) hops w.h.p.

      • A Medium link is within 6·log²(n) nodes of x w.h.p.

      • Similar bounds hold for the other links

    • Theorem 5.1:

      A JOIN operation by a new node x incurs O(log(n)) messages in expectation, and O(log²(n)) messages w.h.p.

      The expected number of nodes that change their state as a result of x's join is constant, and is O(log(n)) w.h.p.

      This is because node x has in-degree O(log(n)) w.h.p.

      Similar results hold for LEAVE.


    Bounding In-degrees

    • Theorem 4.7

      Every node has expected constant in-degree, and has O(log(n)) in-degree w.h.p.

    • In-degree = the number of servers affected by a join/leave

    • How to guarantee constant in-degree?

    • Bucket solution

      • A background process to balance the assignment of levels


    Bucket Solution: Intuition

    [Figure: roughly log(n) consecutive level-(k−1) nodes all pointing their Medium links at the same level-k node x.]

    • Node x can acquire log(n) in-degree (via Medium-right links)

    • Cause: too many nodes at level k−1, too few nodes at level k

    • Fix: improve the level-selection procedure

    Bucket Solution

    [Figure: the example network with the ring partitioned into buckets of consecutive nodes.]

    • The name space is divided into non-overlapping buckets.

    • A bucket contains m nodes, where log(n) ≤ m ≤ c·log(n), for c > 2.

    • Within a bucket, levels are NOT assigned randomly.

    • For each 1 ≤ j ≤ log(n), each bucket has between 1 and c nodes at level j.

    • Hence In(x) < 7c (the slide questions whether 2c is the right constant). A sketch of the in-bucket level assignment follows.
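A sketch of what in-bucket level assignment could look like under this invariant (my own formulation, not the paper's pseudocode):

```python
# In-bucket level assignment: instead of picking a level at random, a
# joining node takes any level that currently has fewer than c members,
# preserving 1..c nodes per level in the bucket.
from collections import Counter
import math

def assign_level(bucket_levels: list, n_estimate: int, c: int) -> int:
    """Pick a level in 1..log(n) that has fewer than c nodes in this bucket."""
    max_level = max(1, int(math.log2(n_estimate)))
    counts = Counter(bucket_levels)
    for level in range(1, max_level + 1):
        if counts[level] < c:
            return level
    raise RuntimeError("bucket over capacity; it should have been split")
```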


    Maintaining Bucket Size

    • n can be accurately estimated

    • When a bucket's size exceeds c·log(n), the bucket is split into two equal-size buckets.

    • When a bucket's size drops below log(n), it is merged with a neighboring bucket.

      Furthermore, if the merged bucket has more than (2c+2)/3 · log(n) nodes, it is split into two buckets; each half then has more than (c+1)/3 · log(n) > log(n) nodes, since c > 2.

    • Buckets are organized into a ring, which can be merged or split with O(1) messages. (A sketch of these rules follows.)
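A sketch of the split/merge decision rules above (assumed helper signature, not the paper's code):

```python
# Bucket maintenance rules: split above c*log(n), merge below log(n),
# and re-split an oversized merge.
import math

def maintain(bucket_size: int, neighbor_size: int, n: int, c: float) -> str:
    """Return which maintenance action the rules above would trigger."""
    log_n = math.log2(n)
    if bucket_size > c * log_n:
        return "split into two equal halves"
    if bucket_size < log_n:
        merged = bucket_size + neighbor_size
        if merged > (2 * c + 2) / 3 * log_n:
            # each half keeps > (c+1)/3 * log(n) > log(n) nodes, since c > 2
            return "merge with neighbor, then split the result"
        return "merge with neighbor"
    return "no action"
```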


    Maintain Level Property

    • A node join/leave without merging or splitting costs O(1):

      • Join: since the bucket size is < c·log(n), there is a level with fewer than c nodes; choose one such level.

      • Leave: if the departing node is the only one at its level j, find a level that has two nodes and reassign level j to one of them.

    • A bucket merge or split may reassign the levels of all nodes in the bucket(s), costing O(log(n)).

    • Merging/splitting is expensive, but it does not happen very often:

      After a bucket is merged or split, at least (c−2)/3 · log(n) JOIN/LEAVE operations must occur in that bucket before it is merged or split again.

      Amortized overhead = c·log(n) / ((c−2)/3 · log(n)) = 3c/(c−2) = O(1), for c > 2.


    Amortized Analysis

    [Figure: bucket size on a scale from log(n) to c·log(n). After a split or merge, the new bucket size lies between min(c/2, (c+1)/3)·log(n) and max(c/2, (2c+2)/3)·log(n), so its distances d1, d2 from the two thresholds satisfy d1, d2 > (c−2)/3 · log(n).]


    Fault Tolerance

    • Viceroy has no built-in support for fault tolerance

    • Viceroy requires graceful leaves

      • Leaves are NOT the same as failures

    • Performance is sensitive to failures

    • External techniques:

      • Thickening Edges

      • State Machine Replication


    State Machine Replication

    [Figure: each Viceroy "super node" is realized by a group of physical nodes kept consistent through state machine replication (SMR); old members can be replaced by new ones without the super node leaving the overlay.]


    Related Works

    • De Bruijn graph based networks

      • Distance halving

      • D2B

      • Koorde

    • Others

      • Symphony (Small world model)

      • Ulysses (butterfly-based; degree O(log(n)), path length O(log(n)/log(log(n))))


    Summary

    • Constant out-degree

    • Expected constant in-degree

      • O(log(n)) w.h.p.

      • O(1) with the bucket solution

    • O(log(n)) path length w.h.p.

    • Expected O(log(n)/n) load:

      • O(log²(n)/n) w.h.p.

    • Weaknesses / possible improvements:

      • Not locality-aware

      • No fault-tolerance support

      • Both due to the lack of flexibility of the butterfly network


    Question

    Photo by Peter J. Bryant