An overview of SHELL, a distributed heap overlay architecture for robust systems and P2P networks, explained by Christian Scheideler and Stefan Schmid in a 2008 lecture. Topics include dynamics in peer-to-peer computing, heterogeneity challenges, distributed heap properties, and the concept of an oblivious distributed heap. The presentation covers the motivation, objectives, and construction of the SHELL overlay graph designed for dynamic and fault-tolerant networks.
SHELL: A Distributed and Oblivious Heap with Applications for Robust Information Systems and Heterogeneous Peer-to-Peer Networks • Christian Scheideler, Stefan Schmid • Network Algorithms, Summer 2008
Before we look at SHELL... • Prof. Scheideler is at a conference • Therefore: special program • Shell builds on what we have learned! • Ongoing work... No lecture notes. Still has gaps, possibly also errors / Slides are in English so that they can be reused elsewhere! Open to input / ideas! DISTRIBUTED COMPUTING Stefan Schmid @ TU München, 2008
Motivation • Today, there are still many challenges in distributed systems (e.g., the Internet) • E.g., viruses, spam, DoS attacks, selfish users, etc. • Very active research • For example, peer-to-peer computing • Dynamics / churn: peers join and leave frequently • In a network of 1,000,000 peers where peer sessions last around 60 minutes, there are hundreds of membership changes every second! • Peer-to-peer is based on the contributions of participants: problematic if users are selfish! • E.g., BitThief free-rides in BitTorrent • Heterogeneity: peers have different Internet connections, different CPUs, run different operating systems, etc.
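The churn figure on this slide can be checked with a back-of-the-envelope calculation. The sketch below assumes a steady state in which every peer leaves once per session on average and every leave is balanced by one join; the function name is made up for illustration.

```python
def membership_changes_per_second(num_peers: int, mean_session_minutes: float) -> float:
    """Estimate joins plus leaves per second in a steady-state network.

    Assumption (not from the slides): the network size is stable, so each
    departure is matched by one arrival.
    """
    leaves_per_second = num_peers / (mean_session_minutes * 60)
    return 2 * leaves_per_second


rate = membership_changes_per_second(1_000_000, 60)
print(f"about {rate:.0f} membership changes per second")
```

Under these assumptions the estimate lands in the hundreds per second, in line with the claim above.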
SHELL Overview • SHELL = our overlay architecture • Basically, a distributed heap • Refresher: min heap - children have a larger key than their parent - e.g., useful for priority queues (fast removeMin()) • (slide from GAD lecture 2008...)
Heap Refresher • Heap in GAD...
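As a reminder of the classical (non-distributed) data structure, here is a minimal sketch using Python's standard `heapq` module. The keys are arbitrary example values, and `has_min_heap_property` is a helper written only for this illustration.

```python
import heapq


def has_min_heap_property(array):
    """True if no child has a smaller key than its parent (array layout)."""
    return all(array[(i - 1) // 2] <= array[i] for i in range(1, len(array)))


keys = [21, 3, 17, 9, 28, 10]
heap = list(keys)
heapq.heapify(heap)            # rearrange in place into a min heap
assert has_min_heap_property(heap)

smallest = heapq.heappop(heap)  # removeMin(): logarithmic time
print(smallest, has_min_heap_property(heap))
```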
A Distributed Heap? • What is a distributed heap? • We assume that peers have a key / order / rank / ID - for example: the time when the peer joined • (Min-)heap property: peers only connect to peers of lower order - for example: peers only connect to older peers - Shell constructs a directed overlay (backward edges are stored as well, see later) • [Figure: example overlay with peers of order 28, 26, 23, 21, 20, 19, 18, 17, 16, 9, 10, 3]
An Oblivious Distributed Heap? (1) • What is an oblivious distributed heap? • Oblivious = the overlay topology depends only on the set of currently active peers (and their IDs / orders) in the network - but not on history, e.g., on the time when these peers joined! - example: if at join time a new peer is inserted at the end of a list of peers, the resulting topology is not oblivious - example: if a new peer is inserted into a list of peers according to the peer's order, the topology is oblivious
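The two list examples on this slide can be made concrete. The sketch below is only an illustration with made-up orders: it builds a list overlay for every possible join sequence of the same four peers and counts how many distinct topologies result.

```python
import bisect
from itertools import permutations


def edges(peer_list):
    """Undirected edges between neighbouring entries of a list overlay."""
    return {frozenset(pair) for pair in zip(peer_list, peer_list[1:])}


def append_overlay(join_sequence):
    """Not oblivious: every joining peer is appended at the end of the list."""
    overlay = []
    for order in join_sequence:
        overlay.append(order)
    return edges(overlay)


def sorted_overlay(join_sequence):
    """Oblivious: every joining peer is inserted according to its order."""
    overlay = []
    for order in join_sequence:
        bisect.insort(overlay, order)
    return edges(overlay)


active_peers = [3, 9, 10, 16]
append_topologies = {frozenset(append_overlay(s)) for s in permutations(active_peers)}
sorted_topologies = {frozenset(sorted_overlay(s)) for s in permutations(active_peers)}
print(len(append_topologies), len(sorted_topologies))
```

With order-based insertion every join history yields the same edge set, whereas appending produces a different list for most histories.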
An Oblivious Distributed Heap? (2) • Why is oblivious good? - the oblivious property is useful when it comes to fault tolerance - e.g., desktops may crash temporarily and will then rejoin - if the topology is oblivious, peers can "remember" their old contacts, and when an old contact reappears, it can be integrated immediately (instantaneous rejoin) • Many systems today are oblivious - e.g., Pastry, Chord, etc. - but not all: e.g., Pagoda is not - many systems used in practice are not either: Gnutella, BitTorrent, etc.
Objectives of Shell • Primary goal: dynamic and robust overlay • In particular: - maintaining the heap property - low peer degree, low network diameter, low congestion - fast join / rejoin / leave - peers can simply crash • Applications - i-SHELL: a distributed information system robust to Sybil attacks - h-SHELL: a peer-to-peer system for heterogeneous environments
Overlay Graph (1) • How to achieve these goals? • Overlay based on the continuous-discrete approach - basically a de Bruijn graph • Refresher: continuous-discrete approach - peers are placed in the cyclic [0,1)-interval - a peer at position x is connected to the peers responsible for the continuous positions x/2 and (x+1)/2
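A small sketch of the refresher: for a peer at position x, compute the two de Bruijn points and look up which peer is responsible for each. The slides do not say how responsibility is assigned; the sketch assumes that a point belongs to the closest peer at or before it on the ring, and all positions are made-up example values.

```python
import bisect


def de_bruijn_points(x):
    """The two continuous positions a peer at x in [0,1) links to."""
    return x / 2, (x + 1) / 2


def responsible_peer(sorted_positions, point):
    """Assumed rule: the closest peer at or before `point` on the ring.

    If no peer lies at or before the point, index -1 wraps around to the
    last peer on the cyclic interval.
    """
    index = bisect.bisect_right(sorted_positions, point) - 1
    return sorted_positions[index]


positions = [0.05, 0.2, 0.45, 0.5, 0.7, 0.9]
left, right = de_bruijn_points(0.5)
print(responsible_peer(positions, left), responsible_peer(positions, right))
```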
Overlay Graph (2) • Our distributed heap has a larger peer degree • The space is divided into different partitions - partition i = 2^i intervals of size 1/2^i - a global partition renders the analysis simpler ("same views")
Overlay Graph (3) • A peer at position x connects to all peers of lower order in - its level-i home interval (the interval which includes position x) - the level-i intervals adjacent to the home interval - the de Bruijn intervals: the intervals which include positions x/2 and (x+1)/2 • What is level i? - Level i is chosen such that there are c log n_p peers in the interval - n_p = total number of peers in the system with lower order - n_p can be estimated; in the following we assume it is given
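The connection rule can be sketched as follows. The level is passed in as a parameter rather than derived from c log n_p, peers are represented as hypothetical (order, position) pairs, and the function names are made up for this illustration.

```python
def interval_index(x, level):
    """Index of the level-`level` interval (of size 1/2**level) containing x."""
    return int(x * 2**level)


def relevant_intervals(x, level):
    """Home interval, its two cyclic neighbours, and the two de Bruijn intervals."""
    count = 2**level
    home = interval_index(x, level)
    return {
        home,
        (home - 1) % count,
        (home + 1) % count,
        interval_index(x / 2, level),
        interval_index((x + 1) / 2, level),
    }


def forward_neighbors(peer, peers, level):
    """All peers of lower order located in one of the relevant intervals."""
    order, position = peer
    wanted = relevant_intervals(position, level)
    return sorted(
        (o, x) for (o, x) in peers
        if o < order and interval_index(x, level) in wanted
    )


peers = [(3, 0.1), (9, 0.3), (10, 0.65), (16, 0.9), (20, 0.40625), (23, 0.45)]
me = (20, 0.40625)
print(relevant_intervals(me[1], 3))
print(forward_neighbors(me, peers, 3))
```

In this example the peer of order 23 lies in the home interval but is not a forward neighbor, because forward edges only point to peers of lower order.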
Overlay Graph (4) • In order to ensure connectivity when many peers leave, the interval size must be increased over time (the peer upgrades to a larger partition) • Similarly, if many peers of lower order join in an interval, a peer needs to downgrade • In addition to these forward edges, peers store incoming edges - called backward edges
Overlay Graph (5) • These edges are already sufficient for Shell • However, in order to speed up changes between levels, a peer additionally stores pointers to the peers it would connect to if it upgraded - these form the "funnel" of peers to which the peer would connect - of course, the peer only connects to these lower order peers once they are on the corresponding level - requires a notification mechanism • In the following, we will not consider funnel edges in further detail! • [Figure: funnel across Level 1, ..., Level i-2, Level i-1, Level i]
Implication: Monotonicity • From this construction, we can already derive some properties • For instance, Shell features a monotonicity property: if two peers p and p' are connected to the same interval I and p is of larger order than p', then p knows strictly more peers in I - because peers connect only to lower order peers in an interval
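A tiny illustration of the monotonicity property, using made-up orders for the peers located in an interval I. In this example p' itself lies in I, which is what makes the inclusion strict.

```python
def known_peers(own_order, interval_members):
    """Peers in the interval that a peer of order `own_order` connects to."""
    return {order for order in interval_members if order < own_order}


interval_members = {3, 9, 10, 17, 21}
known_by_p_prime = known_peers(17, interval_members)  # p' has order 17
known_by_p = known_peers(21, interval_members)        # p has order 21
print(sorted(known_by_p_prime), sorted(known_by_p))
```

Everything p' knows in I is also known to p, which is the fact the binary-search descent in the second routing phase relies on.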
Distributed Order...: A Simplification • In the following, we will assume that peers have distinct IDs • E.g., assigned at join time by the network entry point • Otherwise: in case of multiple joins close in time, peers may not be able to decide which is older => need to introduce blackout zones, etc. • In the following, we will not consider this issue in more detail
Analysis of Degree (1) • The topological description allows us to analyze the peer degree • Peers employ the following strategy: if the number of neighbors falls below c log n_p in at least one interval, all intervals are doubled • By Chernoff bounds, if one interval contains c log n peers, then with high probability no interval contains more than (1+d) c log n peers, for any constant d > 0 • Therefore, the degree is in O(log n) w.h.p. - with funnel edges, the degree is in O(log² n)
Analysis of Degree (2) • What about incoming / backward edges?
Routing (1) • The Shell overlay allows peers to route messages • Similar to continuous-discrete routing (adjusting one bit after another) • The routing operation route(x) consists of two phases • Phase 1: route along forward edges to the peer of lower order which is closest to x (or: to a lower order peer whose home region contains position x) • Phase 2: descend along backward edges to the peer which is closest to x • Implication: if a peer wants to send a message to a peer of lower order, only Phase 1 is necessary, and the message will not traverse any higher order peers!
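The "one bit after another" idea can be sketched on the continuous [0,1) interval alone. The code below ignores peers, orders, and the two phases entirely; it only shows that following the de Bruijn maps x -> x/2 and x -> (x+1)/2 brings a message within 2^-k of any target after k hops. The function name is an assumption made for this illustration.

```python
def de_bruijn_path(start, target, hops):
    """Continuous positions visited when routing from start towards target.

    The first `hops` binary digits of the target are shifted in, least
    significant first, so that they end up as the leading digits of the
    final position.
    """
    bits = []
    remainder = target
    for _ in range(hops):
        remainder *= 2
        bit = int(remainder)
        bits.append(bit)
        remainder -= bit

    path = [start]
    position = start
    for bit in reversed(bits):
        position = (position + bit) / 2  # hop to x/2 or (x+1)/2
        path.append(position)
    return path


path = de_bruijn_path(start=0.8125, target=0.3, hops=10)
print(path[-1], abs(path[-1] - 0.3) < 2**-10)
```

In the actual overlay each continuous position would be replaced by a peer in the corresponding interval, which is why about log n hops suffice.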
Routing (2) • Observe that in our overlay, peers have multiple neighbors which could be used for the next de Bruijn routing hop (log n neighbors per interval) • This can be exploited in order to minimize congestion • Routing policy: peer p always forwards packets to the neighbor which is of largest order among the eligible peers (lower order than p) • This alleviates the load on very low order peers
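The forwarding policy fits in a few lines. In this sketch the candidates are the orders of the neighbors located in the interval for the next de Bruijn hop; the function name and the example values are made up.

```python
def next_hop(current_order, candidate_orders):
    """Pick the highest-order neighbor that still has lower order than us.

    Returns None if no candidate is eligible.
    """
    eligible = [order for order in candidate_orders if order < current_order]
    return max(eligible) if eligible else None


print(next_hop(20, [3, 9, 17, 23]))
```

Choosing the maximum among the eligible peers keeps a message as far away from the lowest order peers as the heap property allows.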
Routing (3) • Visualization of routing towards higher order peers • Messages travel towards lower order peers • But on each hop, the highest order eligible peer is chosen
Routing (4) towards higher order peers • Analysis of Phase 1 - according to continuous-discrete routing, at most log n hops are needed to reach the destination - we make the following observation [equation not preserved in this transcript], whose two factors are annotated as: the probability that all peers of order lower than p but higher than n_p - l_1 are in another interval; and the probability that this peer is located in the corresponding interval
Routing (5) towards higher order peers • Generally, for the i-th hop: [equation not preserved in this transcript] • Summing up, after some lines of calculation, the probability that the final peer reached is of order n_p/2 or smaller is at most O(n_p^{-c}) for some constant c • With high probability, in the first phase of routing, the request travels to a peer of order at least n_p/2.
Routing (6) towards higher order peers • Definition of congestion: • So what is the congestion in the first routing phase?
Routing (7) towards higher order peers • So what is the congestion in the first routing phase? See our argument before... At most k peers can send via p, the routing path is of length log 2k, and the probability that it enters the interval on one of these hops is c log k / k
Routing (8) • Theorem: The first phase of routing terminates in logarithmic time and yields a congestion of asymptotically log² n_p.
Routing (9) • Routing phase 2: descent along backward edges to higher order peers - idea: binary search which exploits the monotonicity property - higher order peers know more about the interval - on each level i, go to the highest order peer located in the interval which includes the final position x - terminates in logarithmic time - logarithmic congestion: in each hop, a peer forwards at most one request
Join and Leave • Join: similar to lookup; find the highest order peer in the final interval and get integrated • Leave: peers can even crash, no particular operation is needed • A change of level takes time O(1); the update cost induced at other peers is in O(log² n)
Application 1: i-Shell • i-Shell is a distributed information system • Idea: data management through a consistent hashing approach • Generalized to multiple levels: on each level, data is stored on the peer closest to x - on each hop during insertion, a replica is placed • Order of peers: time-stamps (assigned by the network entry point) • Thus: peers only connect to older peers
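The consistent hashing idea on a single level can be sketched as follows. The choice of SHA-256, the peer names, and the positions are assumptions made for this illustration; the multi-level placement and the replicas mentioned above are not modelled.

```python
import hashlib


def key_position(key: str) -> float:
    """Hash a data key to a position x in [0,1) (hash function is an assumption)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:6], "big") / 2**48


def ring_distance(a: float, b: float) -> float:
    """Distance between two positions on the cyclic [0,1) interval."""
    direct = abs(a - b)
    return min(direct, 1 - direct)


def closest_peer(peer_positions: dict, x: float) -> str:
    """Name of the peer whose position is closest to x on the ring."""
    return min(peer_positions, key=lambda name: ring_distance(peer_positions[name], x))


peer_positions = {"a": 0.1, "b": 0.5, "c": 0.95}
x = key_position("some-file.txt")
print(x, closest_peer(peer_positions, x))
```

Because the storing peer depends only on the key and on the current peer positions, this placement is oblivious in the sense defined earlier.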
i-Shell • Therefore: - we immediately get that two peers p and p' can communicate on paths which include only peers that are at least their age - this renders the communication independent of younger peers • Side benefit: measurement studies have shown that older peers typically have a longer remaining session time - renders the topology more stable • Shell implies robustness to various attacks • E.g., Sybil attack
Sybil Attack (1) • Sybil attack - big problem in the Internet - e.g., spam - Sybil: book by Flora Rheta Schreiber about a person with 16 identities • The attacker seeks to acquire many identities - e.g., to control a large fraction of the network • Countermeasures - virtual identities: captchas etc. - real identities? botnet? - Douceur has shown that the issue is difficult to deal with in distributed environments...
Sybil Attack (2) • Shell is resilient to Sybil attacks of any scale! • Model: Sybil attack starts at some time t0 • Theorem: traffic of old peers is independent of the Sybil attack • Techniques - Admission control - Rate control • [Figure: overlay with peers of order 3, 5, 4, 7, 9, 12, 10, 8, 21, 14, 15, 11; annotations: "traffic between older peers unaffected", "higher peers can perform a rate control algorithm", "attack originates from lower peers"]
Application 2: h-Shell • Alternatively, IDs could represent the inverse of the peers' capabilities • Therefore: peers only connect to peers with stronger capabilities • Interesting architecture for heterogeneous systems • Corollary: paths between strong peers only include strong peers • Interesting, e.g., for multi-quality live-streaming
Conclusion • Distributed heap based on the continuous-discrete approach • Oblivious for highly transient environments • Robustness to Sybil attacks of arbitrary scale • Alternatively, useful for heterogeneous environments • Work in progress...