
Principles of Reliable Distributed Systems Tutorial 4: SkipNet

Principles of Reliable Distributed Systems Tutorial 4: SkipNet. Spring 2009 Alex Shraer. Reading Material. SkipNet: A Scalable Overlay Network with Practical Locality Properties Harvey, Jones, Saroiu, Theimer, Wolman Microsoft Research. Reminder: DHT Advantages.





Presentation Transcript


  1. Principles of Reliable Distributed Systems Tutorial 4: SkipNet Spring 2009 Alex Shraer

  2. Reading Material • SkipNet: A Scalable Overlay Network with Practical Locality Properties. Harvey, Jones, Saroiu, Theimer, Wolman. Microsoft Research

  3. Reminder: DHT Advantages • Peer-to-peer: no centralized control or infrastructure • Scalability: O(log N) routing, routing tables, join time • Load-balancing

  4. DHT Disadvantages: SkipNet Motivation • No control over where data is stored • Data may be stored far from its users • Data may be stored outside its domain • Local accesses may leave the local organization • In practice, organizations want: • Content Locality – explicitly place data where we want (inside the organization) • Path Locality – guarantee that local traffic (a user in the organization looking for a file of the organization) remains local • No prefix search • Search(key) returns the file whose name is the closest prefix to key.

  5. Practical Requirements • Data Controllability: • Organizations want control over their own data • Even if local data is globally available • Manageability: • Data control allows for data administration, provisioning and manageability

  6. Practical Requirements (cont’d) • Security: • Content and path locality are key building blocks for dealing with certain external attacks (DoS, Traffic analysis) • Data availability • Local data survives network partitions. • Performance • Data can be stored near clients that use it

  7. SkipNet Content Locality • Place files at nodes according to names • Name ID space (DNS-like) • for files and nodes • node name = reverse DNS name of the host (com.microsoft.host1) • file names have same prefix • Problem?

  8. Constrained Load-Balancing • Data uniformly distributed in designated subset of nodes • e.g., inside organization

  9. SkipNet’s Two Name Spaces • Name ID space: com.microsoft.host1 (non-uniform) • Numeric ID space: h(com.microsoft.host1) (uniform)
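
The two name spaces can be illustrated with a minimal sketch. The helper names `name_id` and `numeric_id`, the use of SHA-1, and the 8-bit ID length are assumptions made for illustration; the slides only say that the numeric ID is a hash h of the name ID.

```python
import hashlib

def name_id(hostname: str) -> str:
    """Reverse the DNS labels: host1.microsoft.com -> com.microsoft.host1."""
    return ".".join(reversed(hostname.split(".")))

def numeric_id(name: str, bits: int = 8) -> str:
    """Hash a name ID to a bit string (SHA-1 and the length are assumptions)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    # Keep only the top `bits` bits of the 160-bit digest.
    return format(int.from_bytes(digest, "big") >> (160 - bits), f"0{bits}b")

hosts = ["www.sun.com", "host2.microsoft.com", "host1.microsoft.com"]
# Sorting by name ID keeps hosts of one organization contiguous.
names = sorted(name_id(h) for h in hosts)
# Numeric IDs are hashes, so they are spread out regardless of the name order.
ids = {n: numeric_id(n) for n in names}
```

Sorting by name ID clusters an organization's hosts, while the hashed numeric IDs carry no locality at all, which is the contrast the slide draws.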

  10. Skip List data structure (Pugh 90) • In-memory dictionary data structure. • Elements are stored in a sorted linked list. • Problem: Search, Insert, Delete take O(N) • N – number of nodes in the list. • Solution: a subset of nodes will have additional links to skip over many list elements • Perfect (deterministic) skip list: • Pointer at level h skips over 2^h elements • Search: O(log N) • Insertion/deletion: expensive/awkward [Figure: a sorted linked list and a perfect skip list over the same keys, with head and tail nodes]

  11. Probabilistic Skip List • A node is at level h with probability 1/2^h • Search, Insert, Delete: O(log N) w.h.p. [Figure: a probabilistic skip list with head and tail nodes]
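
A minimal sketch of a probabilistic skip list follows. The class layout, the `MAX_LEVEL` cap of 16 and the dictionary-based nodes are illustrative choices, not taken from the slides; the coin flip in `_random_level` is what gives a node a pointer at level h with probability 1/2^h.

```python
import random

class SkipList:
    MAX_LEVEL = 16  # illustrative cap on the tower height

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        # Head sentinel with one forward pointer per level.
        self.head = {"key": None, "next": [None] * self.MAX_LEVEL}
        self.level = 1  # number of levels currently in use

    def _random_level(self):
        # Each extra level is kept with probability 1/2.
        lvl = 1
        while lvl < self.MAX_LEVEL and self.rng.random() < 0.5:
            lvl += 1
        return lvl

    def _predecessors(self, key):
        """Top-down walk: last node before `key` on every level."""
        preds = [self.head] * self.MAX_LEVEL
        node = self.head
        for h in reversed(range(self.level)):
            while node["next"][h] is not None and node["next"][h]["key"] < key:
                node = node["next"][h]
            preds[h] = node
        return preds

    def contains(self, key):
        nxt = self._predecessors(key)[0]["next"][0]
        return nxt is not None and nxt["key"] == key

    def insert(self, key):
        preds = self._predecessors(key)
        nxt = preds[0]["next"][0]
        if nxt is not None and nxt["key"] == key:
            return  # already present
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        node = {"key": key, "next": [None] * lvl}
        for h in range(lvl):  # splice the new node into each of its levels
            node["next"][h] = preds[h]["next"][h]
            preds[h]["next"][h] = node

sl = SkipList(seed=1)
for k in [21, 9, 26, 17, 6, 3, 25, 12, 19, 7]:
    sl.insert(k)
```

Note that every search starts at `head`, which is exactly the "lookup starts from root only" limitation that a distributed variant has to remove.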

  12. Skip List: Good for Us? • The Good: • Sorted list: path locality for name-based search • O(log N) (w.h.p.) operations • The Bad: • Lookup starts from root only • Unequal load • nodes on the top levels have high chance to be in routing path

  13. SkipNet Global View [Figure: the full SkipNet routing infrastructure for an 8 node system (nodes A, D, M, O, T, V, X, Z), including the ring labels: the root ring at level L = 0, rings 0 and 1 at L = 1, rings 00 to 11 at L = 2, and rings 000 to 111 at L = 3.]

  14. SkipNet Structure • Skip Graph = Distributed Skip List • Every node belongs to rings at all levels • Search can start at any node • Use doubly linked lists at each level • Perfect vs. Probabilistic • Perfect: Pointers at level h point to nodes that are exactly 2^h nodes to the left and right. • Probabilistic: A node in level h probabilistically determines which ring it belongs to.
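
The ring structure can be sketched as follows, assuming the probabilistic variant in which a node's level-h ring is selected by the first h digits of its numeric ID. The numeric IDs assigned to the eight nodes here are hypothetical and are not read off the slides' figure; `build_rings` and `neighbors` are illustrative helper names.

```python
from collections import defaultdict

# Hypothetical 3-bit numeric IDs for the eight nodes of the running example.
NUMERIC_IDS = {"A": "000", "D": "010", "M": "011", "O": "100",
               "T": "001", "V": "101", "X": "110", "Z": "111"}

def build_rings(numeric_ids, levels):
    """Map each ring label (a numeric-ID prefix) to its members, sorted by name ID."""
    rings = defaultdict(list)
    for name in sorted(numeric_ids):          # name-ID order within every ring
        for h in range(levels + 1):
            rings[numeric_ids[name][:h]].append(name)
    return dict(rings)

def neighbors(rings, numeric_ids, name, h):
    """Left and right neighbours of `name` in its level-h ring (rings wrap around)."""
    ring = rings[numeric_ids[name][:h]]
    i = ring.index(name)
    return ring[i - 1], ring[(i + 1) % len(ring)]

rings = build_rings(NUMERIC_IDS, levels=3)
# rings[""] is the root ring; rings["0"] and rings["1"] are the level-1 rings.
```

Every node appears in exactly one ring per level, so a node's routing table is just its left and right neighbour at each level.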

  15. SkipNet Routing Tables [Figure: the rings at levels L = 0 (root ring) through L = 3, with node A’s routing table highlighted.]

  16. An Alternative View [Figure: SkipNet nodes ordered by name ID and labeled with their numeric IDs. Routing tables of nodes A and V shown.]

  17. Routing By Name ID • Like search in a Skip List • Simple Rule: • Forward the message to the node that is closest to the destination, without going too far. • Route either clockwise or counterclockwise • Terminates when the message arrives at a node whose name ID is closest to the destination. • Number of hops is O(log N) w.h.p.

  18. Example: Routing from A to V [Figure: the rings at levels L = 0 (root ring) through L = 3.]

  19. Example: Routing from A to V [Figure: the rings at levels L = 0 (root ring) through L = 3, with node T’s routing table highlighted.]

  20. Example: Routing from A to V [Figure: the rings at levels L = 0 (root ring) through L = 3.]

  21. Example: Routing to Object • Route from A to F -> Terminates at E [Figure: the rings at levels L = 0 (root ring) through L = 3, with node E in place of node M.]

  22. Name ID Routing Algorithm

      // Invoked at all nodes (including the source and
      // destination nodes) along the routing path.
      RouteByNameID(msg) {
        // Forward along the longest pointer
        // that is between us and msg.nameID.
        h = localNode.maxHeight;
        while (h >= 0) {
          nbr = localNode.RouteTable[msg.dir][h];
          if (LiesBetween(localNode.nameID, nbr.nameID, msg.nameID, msg.dir)) {
            SendToNode(msg, nbr);
            return;
          }
          h = h - 1;
        }
        // h<0 implies we are the closest node.
        DeliverMessage(msg.msg);
      }

      SendMsg(nameID, msg) {
        if (LongestPrefix(nameID, localNode.nameID) == 0)
          msg.dir = RandomDirection();     // Load Balancing
        else if (nameID < localNode.nameID)
          msg.dir = counterClockwise;      // Path Locality
        else
          msg.dir = clockwise;             // Path Locality
        msg.nameID = nameID;
        RouteByNameID(msg);
      }
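
The pseudocode can be exercised with a small single-process simulation. This is a sketch under stated assumptions: the numeric IDs of the eight nodes are hypothetical, the "network" is a loop rather than real message passing, and `build_tables`, `lies_between`, `choose_direction` and `route_by_name_id` are illustrative names.

```python
import random

# Hypothetical 3-bit numeric IDs; they only decide ring membership.
NUMERIC_IDS = {"A": "000", "D": "010", "M": "011", "O": "100",
               "T": "001", "V": "101", "X": "110", "Z": "111"}

def build_tables(numeric_ids, levels):
    """tables[node][direction][h] is node's neighbour in its level-h ring."""
    names = sorted(numeric_ids)
    tables = {n: {"cw": [], "ccw": []} for n in names}
    for h in range(levels + 1):
        rings = {}
        for n in names:  # names are sorted, so each ring is sorted by name ID
            rings.setdefault(numeric_ids[n][:h], []).append(n)
        for ring in rings.values():
            for i, n in enumerate(ring):
                tables[n]["cw"].append(ring[(i + 1) % len(ring)])
                tables[n]["ccw"].append(ring[i - 1])
    return tables

def lies_between(local, nbr, dest, direction):
    """True if nbr is reached, leaving local in `direction`, without passing dest."""
    if local == dest:
        return False
    if direction == "cw":            # nbr in the circular interval (local, dest]
        if local < dest:
            return local < nbr <= dest
        return nbr > local or nbr <= dest
    if dest < local:                 # ccw: nbr in the circular interval [dest, local)
        return dest <= nbr < local
    return nbr < local or nbr >= dest

def choose_direction(local, dest, rng=random):
    """Mirror of SendMsg: random direction when the names share no prefix."""
    if local[:1] != dest[:1]:
        return rng.choice(["cw", "ccw"])
    return "ccw" if dest < local else "cw"

def route_by_name_id(tables, src, dest, direction):
    path, node = [src], src
    while True:
        for nbr in reversed(tables[node][direction]):  # longest pointer first
            if lies_between(node, nbr, dest, direction):
                node = nbr
                path.append(node)
                break
        else:
            return path  # no pointer qualifies: this node is the closest

tables = build_tables(NUMERIC_IDS, levels=3)
clockwise_path = route_by_name_id(tables, "A", "V", "cw")
```

With these assumed IDs the clockwise route from A to V takes the level-2 pointer to T and then the root-ring pointer to V. Routing to a name that no node holds stops at the last node before it in the chosen direction.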

  23. Routing By Numeric ID • Numeric IDs are random, so no ring is sorted by them • We can’t route top-down! • Bottom-up Routing • Routing begins in the level 0 ring and continues until a node is found whose numeric ID matches the destination numeric ID in the first digit. • Messages are forwarded from a ring at level h, R_h, to a ring at level h+1, R_{h+1}, such that nodes in R_{h+1} share h+1 digits with the destination numeric ID. • Terminates when the message is delivered, or when none of the nodes in R_h share h+1 digits with the destination numeric ID

  24. Example: Routing by Numeric ID • Hash(“Foo.c”) = 101 [Figure: the rings at levels L = 0 (root ring) through L = 3, with the object Foo.c placed in Ring 101.]

  25. Routing by Numeric ID • The same routing tables are used for routing by name ID and by numeric ID • The number of message hops is O(log N) w.h.p. • What sequential data structure does this search resemble?

  26. Routing Algorithm

      // Invoked at all nodes (including the source and destination nodes)
      // along the routing path.
      // Initially: msg.ringLvl = -1, msg.startNode = msg.bestNode = null
      //            and msg.finalDestination = false
      RouteByNumericID(msg) {
        if (msg.numID == localNode.numID || msg.finalDestination) {
          DeliverMessage(msg.msg);
          return;
        }
        if (localNode == msg.startNode) {
          // Done traversing current ring.
          msg.finalDestination = true;
          SendToNode(msg, msg.bestNode);
          return;
        }
        h = CommonPrefixLen(msg.numID, localNode.numID);
        if (h > msg.ringLvl) {
          // Found a higher ring.
          msg.ringLvl = h;
          msg.startNode = msg.bestNode = localNode;
        } else if (abs(localNode.numID - msg.numID) <
                   abs(msg.bestNode.numID - msg.numID)) {
          // Found a better candidate for current ring.
          msg.bestNode = localNode;
        }
        // Forward along current ring.
        nbr = localNode.RouteTable[clockWise][msg.ringLvl];
        SendToNode(msg, nbr);
      }
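
The same algorithm can be traced with a small single-process simulation. As before this is only a sketch: the node numeric IDs are hypothetical, message passing is replaced by a loop, and the helper names are illustrative. The destination 101 is the Hash(“Foo.c”) value used in the earlier example slide.

```python
# Hypothetical 3-bit numeric IDs for the eight example nodes.
NUMERIC_IDS = {"A": "000", "D": "010", "M": "011", "O": "100",
               "T": "001", "V": "101", "X": "110", "Z": "111"}

def clockwise_tables(numeric_ids, levels):
    """tables[node][h] is node's clockwise neighbour in its level-h ring."""
    names = sorted(numeric_ids)
    tables = {n: [] for n in names}
    for h in range(levels + 1):
        rings = {}
        for n in names:
            rings.setdefault(numeric_ids[n][:h], []).append(n)
        for ring in rings.values():
            for i, n in enumerate(ring):
                tables[n].append(ring[(i + 1) % len(ring)])
    return tables

def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route_by_numeric_id(numeric_ids, tables, src, dest_id):
    """Return (delivering node, list of nodes visited)."""
    def dist(n):
        return abs(int(numeric_ids[n], 2) - int(dest_id, 2))

    ring_lvl, start, best, final = -1, None, None, False
    node, path = src, [src]
    while True:
        if numeric_ids[node] == dest_id or final:
            return node, path               # DeliverMessage
        if node == start:
            final = True                    # done traversing the current ring
            nxt = best
        else:
            h = common_prefix_len(dest_id, numeric_ids[node])
            if h > ring_lvl:                # found a higher ring
                ring_lvl, start, best = h, node, node
            elif dist(node) < dist(best):   # better candidate in this ring
                best = node
            nxt = tables[node][ring_lvl]    # forward along the current ring
        if nxt != node:
            path.append(nxt)
        node = nxt

tables = clockwise_tables(NUMERIC_IDS, levels=3)
owner, path = route_by_numeric_id(NUMERIC_IDS, tables, "A", "101")
```

With these assumed IDs the message walks the root ring from A until it reaches O, whose ID shares the prefix 10 with the destination, then climbs to ring 10 and is delivered at V. If no node holds the exact ID, the walk ends at the numerically closest node of the highest ring reached.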

  27. Routing Summary • It all depends on how we look at the routing tables … • What is the data structure consisting of all the pointers in the rings that a specific node’s name ID belongs to? • A Skip List! Search is top-down. • What is the data structure consisting of all the rings, with respect to searching by numeric ID? • A Trie! Search is bottom-up. • The search in both directions takes O(log N) messages w.h.p. • Ready for join/departure procedures?

  28. Node Join • Two-stage process: (1) bottom-up + (2) top-down • Bottom-up: find the top level ring that matches the node’s numeric ID. • Top-down: build the new node’s routing table • Find a neighbor in the top ring using name ID search. • Starting from this neighbor, search for the name ID at the next lower level and thus find neighbors at lower level. • Repeated until the search reaches the root. • Update of the existing nodes’ routing tables: • after the new node has joined the root ring.

  29. Node join illustrated [Figure: the joining node and rings P, P0 and P1; annotation: only a few nodes in expectation.]

  30. Node Join - Analysis • Key ideas: • Climb to a weakly populated ring. • Search for the node’s neighbors at the lower levels only after finding the neighbors at the higher levels. • The range of traversed nodes at a level = the range of neighbors at the next higher level. • Insertion traverses O(log N) hops w.h.p. • Expected O(log N) levels, constant number of neighbors at each level.
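
The O(log N) level count can be motivated by a short calculation. It assumes that numeric IDs are independent, uniformly random bit strings (an assumption of this sketch) and writes R_h(u) for the level-h ring that contains node u:

```latex
\Pr\bigl[v \in R_h(u)\bigr] = 2^{-h} \quad (v \neq u),
\qquad
\mathbb{E}\bigl[\,|R_h(u)|\,\bigr] = 1 + \frac{N-1}{2^{h}} .
```

At h = \lceil \log_2 N \rceil the expected number of other nodes in u's ring drops below 1, so a joining node is expected to reach a weakly populated ring after about log N levels.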

  31. Node Departure/Failure • Graceful (notified) vs. crash departure • Key issue – updating the routing tables • Key idea – separate vital info from optimizations • Routing is correct as long as the root level ring is maintained. • Other levels are regarded as optimization hints • Does this remind you of something? • Upper-ring membership is maintained through a background repair process.

  32. Leaf Sets • Idea = use redundant pointers at level 0: • Protect from independent failures • Improve the search performance • Store L/2 pointers in every direction • SkipNet uses L=16 • Not an original SkipNet idea – used in Pastry.

  33. Constrained Load Balancing (CLB) • Multiple DHTs with differing scopes using a single SkipNet structure • A result of the ability to route in both address spaces • Divide data object names into two parts with “!”: a CLB domain (name routing) and a CLB suffix (numeric routing), e.g. microsoft.com!skipnet.html • microsoft.com/skipnet.html! – controlled placement • !microsoft.com/skipnet.html – Global DHT

  34. CLB Example • File ID = “com.microsoft!skipnet.html” • Route by name ID to com.microsoft • Inside com.microsoft, route by numeric ID to hash(“skipnet.html”) [Figure: name ID ring with segments com.microsoft, com.sun, gov.irs and edu.ucb; skipnet.html is placed inside com.microsoft.]
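
The two-step CLB lookup can be sketched as a small planner that splits an object name at “!”. The function names, the SHA-1 hash and the 8-bit numeric ID are assumptions for illustration; only the split into a name-routed domain and a numerically routed suffix comes from the slides.

```python
import hashlib

def numeric_id(text, bits=8):
    """Hash text to a bit string (hash function and length are assumptions)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return format(int.from_bytes(digest, "big") >> (160 - bits), f"0{bits}b")

def plan_route(object_name):
    """Split 'domain!suffix' into the routing steps a CLB lookup would take."""
    domain, sep, suffix = object_name.partition("!")
    if not sep:
        raise ValueError("expected a '!' separating CLB domain and CLB suffix")
    steps = []
    if domain:   # name-ID routing confines the lookup to the domain's segment
        steps.append(("name", domain))
    if suffix:   # numeric-ID routing balances load inside that scope
        steps.append(("numeric", numeric_id(suffix)))
    return steps

clb_steps = plan_route("com.microsoft!skipnet.html")
```

An empty domain (`!name`) yields a single numeric step, which corresponds to the global DHT case, and an empty suffix (`name!`) yields a single name step, which corresponds to controlled placement.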

  35. SkipNet Path Locality • Organizations correspond to contiguous SkipNet segments • Internal routing by name ID remains internal • Nodes have left / right pointers [Figure: name ID ring with segments com.microsoft (containing com.microsoft.research), com.sun, gov.irs and edu.ucb.]
