Membership (2)

Presentation Transcript


  1. Membership (2) Brian Cho Hieu Khac Le

  2. Agenda • Full vs. Partial Membership • SCAMP • CYCLON • T-MAN

  3. I. Full vs. Partial Membership • Full Membership (last week) • Group membership information held at all nodes • Scalable failure detection • Scalable membership? • Large groups • Nodes join/leave frequently • Partial Membership (this week) • Previously: Gnutella, Chord, etc. • Applications: multicast, data aggregation, resource discovery, etc. • Partial group membership information held at each node • Scalable membership protocols • Simulations with 100,000 ~ 1,000,000 nodes • Self-configuring • Implicit failure detection • Potential for isolation/partitioning

  4. Peer-to-peer membership management for gossip-based protocols A.J. Ganesh - Anne-Marie Kermarrec – Laurent Massoulie Presented by Brian Cho - Hieu Khac Le

  5. Motivation: How can we (reliably) multicast a message? • Centralized approach • R-multicast • Tree-based approach • Maintaining the tree • Probabilistic gossip-based multicast • Do we need global knowledge of system?

  6. Gossip-based multicast with partial membership • [Eugster et al, 2003] • Full membership, gossip to logN nodes • Probability that node A sends gossip message to node B is logN/N • Uniformly random partial membership, gossip to logN nodes (size of partial view l > logN) • Probability that node A sends gossip message to node B is also: (l/N) * (logN/l) = logN/N
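
The equality on this slide can be checked numerically. The sketch below is only an illustration: the function names `p_full` and `p_partial` are made up here, the logarithm is assumed to be natural, and the view is assumed to be a uniformly random sample of the group, as the slide states.

```python
import math


def p_full(n: int) -> float:
    """P(A gossips to B) with full membership and a fanout of log(n)."""
    return math.log(n) / n


def p_partial(n: int, view_size: int) -> float:
    """P(A gossips to B) when A only knows a uniformly random partial view."""
    fanout = math.log(n)
    if view_size < fanout:
        raise ValueError("the partial view must hold at least log(n) entries")
    p_in_view = view_size / n       # B happens to be in A's view: l/N
    p_chosen = fanout / view_size   # A picks B among its l entries: logN/l
    return p_in_view * p_chosen


if __name__ == "__main__":
    n = 100_000
    for view_size in (12, 24, 48):
        print(view_size, p_partial(n, view_size), p_full(n))
```

The view size cancels out, which is why a partial view larger than log N is enough to match full-membership gossip in this model.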

  7. SCAMP • Membership management for gossip-based protocols • Random directed graph • Partial view size scales to (c+1)log(n) • If each node gossips to log(n)+k other nodes on average, then the probability that everyone gets the message converges to e^(-e^(-k)) • Provides resilience for gossip-based protocols (multicast): gossip should work as well as in full membership • Membership maintenance • Rebalancing mechanisms

  8. Partial view and InView x

  11. Membership Graph x

  12. Subscription (Join) • New node sends message to contact node. The new node starts with a partial view of only the contact node. • Contact node forwards subscription requests to all nodes in its partial view and then sends c additional requests to randomly chosen nodes in its partial view. • A node keeps a subscription with probability p = 1/(1 + size of PartialView). If the node does not keep the subscription, it is forwarded to a random node in the PartialView. • When a subscription is kept by a node x, x sends a message to the new node telling it to put x in its InView. No explicit max
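
The join procedure on this slide could be simulated roughly as follows. This is a sketch under stated assumptions, not the authors' implementation: views are plain Python sets, the names `subscribe` and `handle_subscription` are hypothetical, a lone contact with an empty view keeps the subscription itself, and a `max_hops` cap (not on the slide) drops a subscription that nobody can keep.

```python
import random


def handle_subscription(partial, inview, node, new_node, rng=random, max_hops=1000):
    """Forward one subscription copy until some node keeps it (or the cap is hit)."""
    for _ in range(max_hops):
        view = partial[node]
        can_keep = node != new_node and new_node not in view
        # Keep with probability 1 / (1 + size of PartialView).
        if can_keep and rng.random() < 1.0 / (1 + len(view)):
            view.add(new_node)
            inview[new_node].add(node)  # the keeper tells new_node to record it in its InView
            return node
        node = rng.choice(sorted(view))  # otherwise forward to a random view member
    return None  # assumption: give up after max_hops


def subscribe(partial, inview, new_node, contact, c=0, rng=random):
    """Join new_node through contact, creating len(view) + c subscription copies."""
    partial[new_node] = {contact}
    inview[new_node] = set()
    inview[contact].add(new_node)
    neighbours = sorted(partial[contact])
    if not neighbours:
        # Bootstrap assumption: a contact with an empty view keeps the subscription.
        partial[contact].add(new_node)
        inview[new_node].add(contact)
        return
    copies = neighbours + [rng.choice(neighbours) for _ in range(c)]
    for target in copies:
        handle_subscription(partial, inview, target, new_node, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    partial, inview = {0: set()}, {0: set()}
    for new in range(1, 1000):
        subscribe(partial, inview, new, rng.randrange(new), c=0, rng=rng)
    print(sum(len(v) for v in partial.values()) / len(partial))  # average view size
```

Picking the contact uniformly at random, as in the usage loop, corresponds to the idealized setting the slides assume before indirection is introduced.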

  13. Subscription analysis • Model system as random directed graph. • Suppose there are n nodes already in the graph. If the new node subscribes to a node with out-degree d, then d+c+1 edges are added. • Let E[Mn] be average number of edges in graph (E[Mn]/n is the average out-degree). Then: E[M(n+1)] = E[Mn] + (c+1) + E[Mn]/n. And we get: E[Mn] ≈ (c+1) n log(n), i.e., an average out-degree of about (c+1) log(n).
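
One way to sanity-check the analysis: if a join adds d+c+1 edges and d is replaced by the average out-degree E[Mn]/n, the edge count follows E[M(n+1)] = E[Mn] + (c+1) + E[Mn]/n. This recurrence is a reconstruction from the slide's text, and `expected_edges` is a hypothetical helper that simply iterates it from a single node with no edges.

```python
import math


def expected_edges(n: int, c: int = 0) -> float:
    """Iterate E[M(s+1)] = E[M(s)] + (c + 1) + E[M(s)] / s up to n nodes."""
    edges = 0.0  # a single node has no edges
    for size in range(1, n):
        edges = edges + (c + 1) + edges / size
    return edges


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        avg_out_degree = expected_edges(n, c=0) / n
        print(n, avg_out_degree, math.log(n))
```

Dividing by n shows the average out-degree grows by (c+1)/(s+1) at each join, so it tracks (c+1) times a harmonic sum, which is about (c+1) log(n).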

  14. Unsubscription (Leave) • Node i wishes to leave; let l be the size of its InView. • Inform l - c - 1 nodes in the InView to replace node i in their Partial Views with a node from node i’s Partial View. • Inform the remaining c+1 nodes to remove i from their Partial Views.
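
A leave could be sketched like this, assuming views are stored as sets in two dictionaries (`partial` and `inview`, both hypothetical names). How replacements are matched to InView members is not stated on the slide; the round-robin assignment below is an assumption.

```python
import random


def unsubscribe(partial, inview, leaver, c=0, rng=random):
    """Remove leaver; l - c - 1 InView members get a replacement, c + 1 just drop it."""
    in_nodes = sorted(inview[leaver])
    out_nodes = sorted(partial[leaver])
    rng.shuffle(in_nodes)  # which InView members get a replacement is random here
    n_replace = max(len(in_nodes) - c - 1, 0)
    for idx, node in enumerate(in_nodes):
        partial[node].discard(leaver)
        if idx < n_replace and out_nodes:
            # Assumption: hand out the leaver's view entries round-robin.
            replacement = out_nodes[idx % len(out_nodes)]
            if replacement != node and replacement not in partial[node]:
                partial[node].add(replacement)
                inview[replacement].add(node)
    for node in out_nodes:
        inview[node].discard(leaver)
    del partial[leaver]
    del inview[leaver]


if __name__ == "__main__":
    partial = {1: {3}, 2: {3}, 3: {4}, 4: {1}, 5: {3}}
    inview = {n: {m for m in partial if n in partial[m]} for n in partial}
    unsubscribe(partial, inview, leaver=3, c=0, rng=random.Random(0))
    print(partial)
```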

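  15. Unsubscription analysis • Model system as random directed graph. • Let E[Mn] be average number of edges in graph (E[Mn]/n is the average out-degree). Then: a leaving node with out-degree d removes its own d edges plus the c+1 InView edges that are not replaced, i.e., d+c+1 edges, mirroring the subscription case.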

  16. Recovery from Isolation • Primary mechanism by which a network becomes disconnected is the isolation of individual nodes. • Each node sends periodic heartbeat messages to nodes in its partial view. • If no heartbeat messages received for some timeout then node resubscribes through an arbitrary node in its partial view. • Note: not failure detection, does not prevent partitioning of multiple nodes.

  17. Partial view size • Discrete event simulator (how are nodes injected?). • Subscription request to random contact node. • Target average view size of log(n) (i.e., c = 0). • Distribution matches target average view size.

  18. Multicast reliability • Multicast using first node that joined the system. • Comparable to full membership gossip.

  19. Multicast reliability (2) • Multicast using first node that joined the system. • Comparable to full membership gossip.

  20. Multicast reliability (3) • Multicast using random node in the system. • Exhibits bimodal behavior.

  21. Lease Mechanism • Nodes that subscribe first will have larger Partial Views. Also, must rebalance the graph in case of failed nodes that do not unsubscribe. • Each node leases its subscription for a certain time, after which it must resubscribe. • When a node’s lease expires, all nodes that have the expiring node in their Partial View will remove the expiring node.

  22. Impact of Lease Mechanism • Improves distribution of partial view sizes.

  23. Impact of Lease Mechanism (2) • Lease mechanism increases probability of delivery. • Random node compares favorably with first node in system.

  24. Indirection • Problem: Subscription contact nodes must be uniformly random among nodes in the system. However, in reality only a few well-known nodes are available as contacts. • Solution: Use indirection to find random contact nodes from well-known nodes. • Periodically update weights for links. • Probabilistically forward according to weights until counter (TTL) = 0.

  25. Impact of Indirection Mechanism • Request to random contact node (ideal) vs. Indirection Mechanism (realistic) • A (slight) decrease in reliability.

  26. Discussion • Is a random graph created? • How much bandwidth is used for the lease mechanism? • What are the effects of churn on graph structure?

  27. CYCLON: Inexpensive Membership Management for Unstructured P2P Overlays Spyros Voulgaris – Daniela Gavidia – Maarten van Steen Presented by Brian Cho - Hieu Khac Le

  28. III. CYCLON • Random Graph • Low clustering coefficient • Low diameter • Low average shortest path length • CYCLON • Basic shuffling creates random graph • Parameters: cache size (c), shuffle length (l) • Enhanced shuffling used in CYCLON because it maintains resilient overlay • [Figure: two example neighborhoods of node x, with clustering coefficients CC(x) = 1 and CC(x) = 1/3]
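
For reference, the clustering coefficient of a node can be computed as below, assuming the usual definition: the fraction of pairs of x's neighbors that are themselves linked. The two small graphs are made-up examples chosen to give the values 1 and 1/3 quoted on the slide; they are not taken from the original figure.

```python
from itertools import combinations


def clustering_coefficient(adj, x):
    """Fraction of pairs of x's neighbours that are directly connected."""
    neighbours = sorted(adj[x])
    if len(neighbours) < 2:
        return 0.0
    pairs = list(combinations(neighbours, 2))
    linked = sum(1 for a, b in pairs if b in adj[a])
    return linked / len(pairs)


# Hypothetical undirected graphs, stored as adjacency sets.
clique = {"x": {"a", "b", "c"}, "a": {"x", "b", "c"},
          "b": {"x", "a", "c"}, "c": {"x", "a", "b"}}
sparse = {"x": {"a", "b", "c"}, "a": {"x", "b"},
          "b": {"x", "a"}, "c": {"x"}}

if __name__ == "__main__":
    print(clustering_coefficient(clique, "x"))
    print(clustering_coefficient(sparse, "x"))
```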

  29. Basic shuffling • Example: cache size = 5, l = 3; initiator P = 2, Q = 9 • Caches before the shuffle: 2: {0, 1, 3, 6, 9}; 9: {0, 4, 5, 6, 7} • 1. Select a random subset of l neighbors from P’s own cache, and a random peer, Q, within this subset.

  30. Basic shuffling • 2. Replace Q’s address with P’s address in the subset.

  31. Basic shuffling • 3. Send the updated subset to Q. (On reception of the request, Q randomly selects a subset of its own neighbors.) • 4. Receive from Q a subset of no more than l of Q’s neighbors.

  32. Basic shuffling • Caches before: 2: {0, 1, 3, 6, 9}; 9: {0, 4, 5, 6, 7} • Caches after: 2: {0, 1, 3, 5, 7}; 9: {0, 4, 2, 6, 7} • 5. Discard entries pointing to P, and entries that are already in P’s cache. • 6. Update P’s cache to include all remaining entries, first using empty cache slots (if any), and then replacing entries among the ones originally sent to Q. (Q also executes steps 5 and 6.)
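
Steps 1 to 6 can be sketched in a few lines. Everything here is illustrative: caches are sets of node ids, the function names are hypothetical, and which of the sent entries gets replaced first is an arbitrary choice (last sent, first replaced). The subsets used in the usage example, [0, 6, 9] from P and [0, 7, 5] from Q, are assumptions picked to be consistent with the before and after caches on the slide.

```python
import random


def merge(cache, owner, received, sent, cache_size):
    """Steps 5 and 6: drop useless entries, then fill empty slots or replace sent ones."""
    fresh = [n for n in received if n != owner and n not in cache]
    replaceable = [n for n in sent if n in cache]
    for node in fresh:
        if len(cache) >= cache_size:
            if not replaceable:
                continue
            cache.discard(replaceable.pop())
        cache.add(node)


def shuffle_exchange(caches, p, q, subset_p, subset_q, cache_size):
    """Steps 2 to 6 once both subsets have been chosen."""
    to_q = [p if n == q else n for n in subset_p]  # step 2: Q's address becomes P's
    to_p = list(subset_q)                          # steps 3 and 4: Q's reply
    merge(caches[p], p, to_p, subset_p, cache_size)
    merge(caches[q], q, to_q, subset_q, cache_size)


def basic_shuffle(caches, p, l, cache_size, rng=random):
    """Step 1 with random choices, followed by the exchange."""
    subset_p = rng.sample(sorted(caches[p]), min(l, len(caches[p])))
    q = rng.choice(subset_p)
    subset_q = rng.sample(sorted(caches[q]), min(l, len(caches[q])))
    shuffle_exchange(caches, p, q, subset_p, subset_q, cache_size)
    return q


if __name__ == "__main__":
    caches = {2: {0, 1, 3, 6, 9}, 9: {0, 4, 5, 6, 7}}
    shuffle_exchange(caches, p=2, q=9, subset_p=[0, 6, 9], subset_q=[0, 7, 5], cache_size=5)
    print(caches)
```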

  33. CYCLON and random graphs • 100,000 nodes, starting from a chain (worst case). • Directed graph converted to undirected graph. • Average path length and clustering coefficient converge to values of a random graph almost exponentially.

  34. CYCLON and random graphs (2) • Starting from a chain; directed graph converted to undirected graph. • Average path length increases logarithmically with number of nodes. • Cluster coefficient drops exponentially with number of nodes. • Convergence value of average path length and clustering coefficient same as random graph values.

  35. Enhanced Shuffling - Motivation • Basic shuffling creates random membership graphs • However, pointers to dead nodes can potentially be passed around for a long time • To maintain up-to-date overlay: • Pointers to dead nodes should be removed quickly, ideally within a given time bound

  36. Enhanced shuffling • Increase by one the age of all neighbors. • Select neighbor Q with the highest age among all neighbors, and l-1 other random neighbors. • Replace Q’s entry with a new entry of age 0 and with P’s address. …

  37. Enhanced shuffling • Entries are <node, age> pairs. • Caches before: 2: {<0, 0>, <1, 1>, <3, 2>, <6, 3>, <9, 4>}; 9: {<0, 0>, <4, 1>, <5, 2>, <6, 3>, <7, 4>} • Caches after: 2: {<0, 1>, <1, 2>, <3, 3>, <5, 2>, <7, 4>}; 9: {<0, 0>, <4, 1>, <2, 0>, <6, 3>, <7, 4>} • Notice the new pointer to 2 has age 0. • The only pointer that is removed is the pointer to 9. • Connectivity is guaranteed (in a fail-free environment).
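
Enhanced shuffling differs from the basic variant mainly in how Q is chosen and in the ages carried with each entry. A minimal sketch, assuming caches are dictionaries mapping node id to age and that Q's own entry is the first one to be replaced at P; the function names are hypothetical.

```python
import random


def merge_aged(cache, owner, received, sent_ids, cache_size):
    """Add received (node, age) entries, replacing entries that were sent if full."""
    fresh = [(n, age) for n, age in received if n != owner and n not in cache]
    replaceable = [n for n in sent_ids if n in cache]
    for node, age in fresh:
        if len(cache) >= cache_size:
            if not replaceable:
                continue
            del cache[replaceable.pop()]
        cache[node] = age


def enhanced_shuffle(caches, p, l, cache_size, rng=random):
    """One shuffle initiated by p; returns the chosen peer q."""
    cache_p = caches[p]
    for node in cache_p:
        cache_p[node] += 1                      # age every neighbour by one
    q = max(cache_p, key=cache_p.get)           # the oldest neighbour becomes Q
    candidates = sorted(n for n in cache_p if n != q)
    others = rng.sample(candidates, min(l - 1, len(candidates)))
    sent_ids = others + [q]                     # q last, so it is replaced first
    to_q = [(n, cache_p[n]) for n in others] + [(p, 0)]  # fresh entry for P, age 0
    cache_q = caches[q]
    reply_ids = rng.sample(sorted(cache_q), min(l, len(cache_q)))
    to_p = [(n, cache_q[n]) for n in reply_ids]
    merge_aged(cache_p, p, to_p, sent_ids, cache_size)
    merge_aged(cache_q, q, to_q, reply_ids, cache_size)
    return q


if __name__ == "__main__":
    caches = {2: {0: 0, 1: 1, 3: 2, 6: 3, 9: 4},
              9: {0: 0, 4: 1, 5: 2, 6: 3, 7: 4}}
    print(enhanced_shuffle(caches, p=2, l=3, cache_size=5, rng=random.Random(1)))
    print(caches)
```

With the slide's starting caches, node 9 is always selected because it holds the highest age, and it always ends up with a fresh age-0 pointer to node 2; which other entries move depends on the random subsets.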

  38. Adding Nodes • P’s introducer initiates c random walks with TTL close to expected average path length. • At node Q, where a random walk ends: • Replaces one of its cache’s entries with an entry for P of age 0. • Forwards replaced cache entry to P.

  39. Removing Nodes • No distinction between nodes disconnecting gracefully or abruptly • Timely implicit failure detection through enhanced shuffling • Shuffle initiation acts as a ping; pointer removed if no response. • When node P fails, a shuffle is initiated at each node holding pointer to P after at most (cache size) periods w.h.p.
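
The detection bound can be illustrated on a single cache in isolation. This is a deliberately simplified sketch: `shuffle_tick` is a hypothetical name, the set `alive` stands in for whether a shuffle request gets a response, and a live partner's entry is modelled by resetting its age to 0 rather than by a real exchange of entries.

```python
def shuffle_tick(cache, alive):
    """One cycle at one node: age entries, contact the oldest, drop it if it is dead."""
    for node in cache:
        cache[node] += 1
    oldest = max(cache, key=cache.get)
    if oldest not in alive:
        del cache[oldest]   # no response to the shuffle request: remove the pointer
        return None
    cache[oldest] = 0       # assumption: a successful shuffle yields a fresh entry
    return oldest


def cycles_until_removed(cache, alive, dead):
    """Count cycles until the pointer to a dead node disappears from the cache."""
    cycles = 0
    while dead in cache:
        shuffle_tick(cache, alive)
        cycles += 1
    return cycles


if __name__ == "__main__":
    cache = {1: 0, 2: 1, 3: 2, 4: 3, 5: 4}  # node 1 is dead and holds the youngest entry
    print(cycles_until_removed(cache, alive={2, 3, 4, 5}, dead=1))
```

Even in this worst case for the example, where the dead node starts as the youngest entry, every live entry is contacted at most once before the dead one becomes the oldest, so removal takes no more cycles than the cache has slots.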

  40. Implicit failure detection • 50,000 nodes removed simultaneously from 100,000 node system. • Enhanced shuffling bounds the time to detect dead nodes by a number of cycles equal to the cache size.

  41. Robustness • Nodes removed simultaneously in 100,000 node system. • Robustness + implicit failure detection = self-healing.

  42. Shuffle length • 100,000 nodes, starting from a chain (worst case). • Shuffling too few or too many neighbors results in caches not being mixed well.

  43. Discussion • Set cache size = (c+1)log(n) • Despite churn, system size n stays within a constant factor in P2P systems [Bhagwan et al, 2003] • Proactive vs. Reactive • Simplicity • Is SCAMP really reactive? (Lease mechanism) • Network aware overlay • Simplicity • Link failure model

  44. T-MAN: Fast Gossip-based Construction of Large-Scale Overlay Topologies Márk Jelasity - Ozalp Babaoglu Presented by Hieu Khac Le – Brian Cho

  45. Motivation • Topology: • “Who knows whom” • “Who connects to whom” • Topology is mainly for: • Communication: most popular • Defining total order relationship (sorting): rarely • Others: search, clustering

  46. Motivation • T-MAN: • A general framework to: • Define topologies • Construct topologies • In very large scale distributed system with logarithmic cost w.r.t number of nodes. • Using: • A ranking function to define topology • Gossip to construct topology

  47. Outline • Basic Ideas • System model • The problem • The proposed solution • Analysis and Refinements • Other Considerations: • Simulation Experiments • Conclusions

  48. Basic Ideas – System Model • Model: • A set of nodes connected through a routed network. • Each node maintains a set of c node descriptors • Assumptions: • Synchronous system • Communication channels are reliable

  49. Basic Ideas – The Problem • Input: • N: number of nodes • c: view size of a node • R: a ranking function • R(x, {y1, …, ym}): given a base node x, returns a set of orderings of the m nodes y1, …, ym • Output: • For every node x, no node outside viewx is consistently ranked higher than any node inside viewx

  50. Basic Ideas – The problem – Example Topologies • Line: d(a, b) = |a – b| • Ring: d(a, b) = min(N – |a – b|, |a – b|) • Mesh: d(a, b) = |a.x – b.x| + |a.y – b.y| • Torus: d(a, b) = min(N - |a.x – b.x|, |a.x – b.x|) + min(N - |a.y – b.y|, |a.y – b.y|) • Binary tree: • d(S1, S2) • Sorting problems: • R can be defined from a total ordering of the nodes
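
The distance functions above translate directly into a ranking step. The sketch below assumes nodes are integers (line, ring) or (x, y) tuples (mesh, torus), that N is the ring size or the torus side length, and that a view is formed by keeping the c closest candidates; `select_view` and the other names are hypothetical and only illustrate distance-based ranking.

```python
def line_distance(a, b):
    return abs(a - b)


def ring_distance(a, b, n):
    d = abs(a - b)
    return min(n - d, d)


def mesh_distance(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def torus_distance(a, b, n):
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return min(n - dx, dx) + min(n - dy, dy)


def select_view(x, candidates, c, distance):
    """Rank candidates by distance from x and keep the c best as x's view."""
    ranked = sorted((y for y in candidates if y != x), key=lambda y: distance(x, y))
    return ranked[:c]


if __name__ == "__main__":
    n = 10
    ring = lambda a, b: ring_distance(a, b, n)
    print(select_view(0, range(n), 2, ring))           # the two ring neighbours of node 0
    print(select_view(0, range(n), 2, line_distance))  # on a line, node 0 sees 1 and 2
```

The same gossip loop can therefore target different topologies just by swapping the distance function passed to the ranking step.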
