CS 525 Advanced Topics in Distributed Systems, Spring 07

Indranil Gupta

Handling Stress

March 27, 2007

Traditional Fault-tolerance in Distributed Systems

Node failures, massive failures

Intermittent message losses, network outages, network partitions

Stress in Distributed Systems

Node failures, massive failures

Intermittent message losses, network outages, network partitions

Perturbation, churn

Static objects, dynamic objects

Stress in Distributed Systems

Node failures, massive failures

Intermittent message losses, network outages, network partitions

Perturbation, churn (today's focus)

Static objects, dynamic objects

Papers

We'll concentrate on one class of distributed systems: peer-to-peer (p2p) systems

We'll study:

  • Characteristics of Churn: How does node availability vary in peer-to-peer systems?
  • Effect of Churn: How do p2p DHTs behave under churn?
  • Fighting Churn: How do we use churn to our benefit?

Understanding Availability

R. Bhagwan, S. Savage, G. Voelker

University of California, San Diego

Goals
  • Measurement study of peer-to-peer (P2P) file sharing application
    • Overnet (January 2003)
  • Analyze the collected data to characterize availability
    • Host IP address changes
    • Diurnal patterns
    • Interdependence among nodes
Overnet
  • Based on Kademlia, a DHT
  • Each node uses a random self-generated ID
    • The ID remains constant (unlike IP address)
    • Used to collect availability traces
  • Routing works in a similar manner to Gnutella
  • Widely deployed (eDonkey)
  • Overnet protocol and application are closed-source, but have already been reverse engineered.
Experiment Methodology
  • Crawler:
    • Takes a snapshot of all the active hosts by repeatedly requesting 50 randomly generated IDs.
    • The requests lead to discovery of some hosts (through routing requests), which are sent the same 50 IDs, and the process is repeated.
    • Run once every 4 hours to minimize impact
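
A minimal sketch of this crawl loop (not the authors' code): `send_lookup(host, node_id)` is a hypothetical stub standing in for an Overnet lookup request, returning the hosts named in the routing response.

```python
# Hedged sketch of the crawler described above; send_lookup is a hypothetical stub.
import random

def crawl(seed_hosts, send_lookup, id_bits=128, num_ids=50):
    # The same 50 randomly generated IDs are re-requested at every host found.
    lookup_ids = [random.getrandbits(id_bits) for _ in range(num_ids)]
    known, frontier = set(seed_hosts), list(seed_hosts)
    while frontier:                                  # stop when a pass finds no new hosts
        discovered = set()
        for host in frontier:
            for nid in lookup_ids:
                discovered.update(send_lookup(host, nid))
        frontier = list(discovered - known)          # only freshly discovered hosts
        known |= discovered
    return known                                     # one snapshot of active hosts
```

Each such snapshot corresponds to one crawl pass; the study repeated this every 4 hours.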
Experiment Methodology
  • Prober:
    • Probes the list of available IDs to check for availability
      • By sending a request to ID I; the request succeeds only if the host with ID I replies
      • Does not use TCP, which avoids problems with NAT and DHCP
    • Run on only 2,400 randomly selected hosts from the initial list
    • Run every 20 minutes
  • All Crawler and Prober trace data from this study is available for your project (ask Indy if you want access)
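
A similar sketch of the prober (again with a hypothetical `probe(node_id)` stub that returns True only if the host owning that ID answers an application-level Overnet request):

```python
# Hedged sketch of the prober described above; probe is a hypothetical stub.
import time

def run_prober(sampled_ids, probe, rounds, period_s=20 * 60):
    trace = []                                   # (timestamp, node_id, up?) samples
    for _ in range(rounds):
        now = time.time()
        for nid in sampled_ids:                  # the 2400 IDs sampled from the crawl
            trace.append((now, nid, probe(nid)))
        time.sleep(period_s)                     # one probe pass every 20 minutes
    return trace
```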
Experiment Summary
  • Ran for 15 days, from January 14 to January 28, 2003 (with problems on January 21)
  • Each pass of the crawler yielded about 40,000 hosts
  • A single day (6 crawls) yielded between 70,000 and 90,000 unique hosts
  • 1,468 of the 2,400 randomly selected hosts responded to probes at least once
Host Availability

As the time interval over which availability is measured increases, measured availability decreases.

Diurnal Patterns
  • Normalized to "local time" at each peer, not EST
  • N changes by only ~100/day
  • 6.4 joins/host/day
  • 32 hosts/day lost
Are Node Failures Interdependent?

For each pair of hosts X and Y, compare the measured probability that both are up with the product of their individual up probabilities; the two should be equal if X and Y fail independently. About 30% of pairs show 0 difference, and 80% are within ±0.2.
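
A rough reconstruction of this check (my illustration, not the authors' analysis code), assuming `trace[h]` holds aligned 0/1 up/down samples per probe round for host h:

```python
# Hedged sketch: gap between joint up-probability and product of marginals.
from itertools import combinations

def independence_gaps(trace):
    gaps = []
    for x, y in combinations(trace, 2):
        n = min(len(trace[x]), len(trace[y]))
        p_x = sum(trace[x][:n]) / n                        # P(X up)
        p_y = sum(trace[y][:n]) / n                        # P(Y up)
        p_xy = sum(a and b for a, b in zip(trace[x], trace[y])) / n
        gaps.append(p_xy - p_x * p_y)                      # ~0 if X and Y independent
    return gaps
```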

Arrival and Departure
  • About 20% of the nodes each day are new
  • The total number of nodes stays at about 85,000
Conclusions and Discussion
  • Each host uses an average of 4 different IP addresses within just 15 days
    • Keeping track of assumptions is important for trace collection studies
  • Availability data looks more optimistic once we keep track of host IP aliasing
    • But still high churn
    • How does one design churn-resistant systems?
  • Strong diurnal patterns
    • Design DHTs that are adaptive to time-of-day?
  • No strong correlation among failure probabilities – use of redundancy may be ok in p2p systems
  • High churn rates
    • How does it affect internals of structured DHTs?

Comparing the Performance of DHTs under Churn

J. Li, J. Stribling, T.M. Gil, R. Morris, M.F. Kaashoek

MIT

Comparing different DHTs
  • Metrics to measure
    • Cost = number of bytes of messages sent
    • Performance = latency for a query
  • p2psim
    • 1024 nodes (inter-node latencies obtained from DNS servers, avg. 152 ms)
    • lookups issued for random keys at exponentially distributed intervals (avg. 10 min)
    • nodes crash and rejoin at exponentially distributed intervals (avg. 1 hour)
    • experiments run for 6 hours.
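
For intuition, here is a small illustration (not p2psim itself) of how such exponentially distributed lookup and churn event times can be generated:

```python
# Hedged sketch of the workload model above (illustration only).
import random

def exp_event_times(mean_s, duration_s):
    """Event times whose gaps are exponentially distributed with mean mean_s seconds."""
    t, times = 0.0, []
    while True:
        t += random.expovariate(1.0 / mean_s)
        if t > duration_s:
            return times
        times.append(t)

SIX_HOURS = 6 * 3600
lookups = exp_event_times(10 * 60, SIX_HOURS)   # lookups for random keys, avg. 10 min apart
churn   = exp_event_times(60 * 60, SIX_HOURS)   # crash/rejoin events, avg. 1 hour apart
```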
“Convex Hull”

  • Upper bound on performance
  • Does this hide the “real” performance?
DHTs considered
  • Tapestry, Pastry, Chord, Kademlia: normal implementations
  • Kelips: slightly different
[Figure: cost vs. lookup-latency convex hulls for Kademlia, Tapestry, Kelips, and Chord]

Kelips, the strawman
  • Tapestry, Pastry, Chord, Kademlia: normal implementations
  • Kelips: slightly different
    • Node IDs are treated as file-tuples: routing within each affinity group is through a random walk (thus …)
    • Not what Kelips was originally intended for. Adds extra layer of files being inserted and deleted all the time!
      • Original Kelips (if studied) would use bandwidth that was higher by a constant but give much shorter lookup latencies (due to the replication)!
[Figure: cost vs. lookup-latency convex hulls for Kademlia, Tapestry, Kelips, and Chord, with an annotation marking where the expected behavior of the original Kelips would fall]

Tapestry

Parameters varied: base, stabilization interval (stab)

  • Higher base ⇒ shorter paths, but the same lookup latency; more routing entries ⇒ more bandwidth
  • A reasonable stabilization interval is 72 s
  • Best points: base low, stab low

Chord

  • Fixed: 72 s stabilization interval for successors/predecessors
  • Varied: stabilization interval for routing entries, base
  • The base value makes no difference; bases 2 and 8 are enough

Conclusions and Discussion
  • Upper bounds of performance for all DHTs considered are similar
    • Is this enough?
    • Why not average performance curves?
  • Parameter tuning is essential to performance
    • Design DHTs that tune parameters adaptively?
  • Comparing different systems is a tricky task!

AVCast: Availability-Dependent Reliability for Multicast

Thadpong Pongthawornkamol

Indranil Gupta

tpongth2@uiuc.edu

IEEE Symposium on Reliable Distributed Systems (SRDS), 2006

Motivation
  • Multicast Applications
    • PlanetLab multicast (e.g., CDNs)
    • Pub/Sub (e.g., p2p RSS dissemination [Corona, FeedTree])
    • Enterprise Cluster multicast
  • Differentiated multicast reliability
    • No need for a 100% message delivery guarantee
    • Different multicast receivers get different delivery probability
    • The differentiation is controllable
  • Goals
    • simplicity, scalability, fault-tolerance, churn-resistance

Gossip-based approach on top of unstructured overlay

Motivation (Cont.)

What should be used as the priority in differentiation?

  • Monetary cost
  • Link bandwidth
  • Contribution to the system
  • …?

Host availability as the priority (% time online)

  • Fairness
  • Incentivizing availability

(Availability trace from the Overnet file-sharing system [Bhagwan et al. '03])

Challenges
  • How to obtain each node’s availability?
    • Self-monitoring may cause lying
    • Using DHTs does not give consistent monitors for a given node
    • Peer monitoring must withstand system dynamism
      • The relation between a node and a monitor must be
        • verifiable and
        • consistent over time
  • Two parts of the problem:
    • Global: Specifying relation between availability and reliability (a reliability predicate)
    • Global to Local: Given a reliability predicate, design a local algorithm to implement the predicate as global emergent behavior
Availability-Dependent Reliability Predicates
  • Specify a node x's multicast reliability r_x as a function of its availability a_x
    • r_x = f(a_x)
      • f is an arbitrary function
    • Some examples:
      • Uniform: r_x = constant
      • Proportional: r_x = a_x
      • Threshold-Linear: r_x = c1 if a_x < t, otherwise r_x = a_x
      • Bimodal: r_x = c1 if a_x < t, otherwise r_x = c2 (where c2 > c1)
  • Basic Idea: Allow application developer/system administrator to decide which predicate is best
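
A minimal sketch of these example predicates; the constants c1, c2 and the threshold t are illustrative choices, not values fixed by AVCast:

```python
# Hedged sketch of the example reliability predicates r_x = f(a_x) above.
def uniform(a, r_const=0.9):
    return r_const                       # same target reliability for every node

def proportional(a):
    return a                             # target reliability equals availability

def threshold_linear(a, c1=0.3, t=0.5):
    return c1 if a < t else a            # floor c1 below threshold t, then a_x

def bimodal(a, c1=0.3, c2=0.9, t=0.5):
    return c1 if a < t else c2           # two reliability classes, with c2 > c1
```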
AVCast: Implementing these Reliability Predicates
  • An availability-aware, gossip-based multicast system
    • consistent, verifiable peer-monitoring
    • Gossip-based multicast forwarding using the availability information
  • AVCast consists of two main components

I. Monitoring component

      • Monitor availability of other nodes

II. Multicast component

      • Based on the availability information of other nodes, forward messages in gossip style
Model & Assumptions
  • Multiple-source gossip-based multicast
  • An approximation of long-term stable system size (N)
  • Crash-recovery nodes
  • Each node has persistent storage (available only when node online)
Definition

[Figure: timeline of a node x alternating between online and offline periods, with multicast messages injected over time]

Availability of a node x, a_x = fraction of time x is online

Live messages of a node x = multicast messages initiated during x's online periods

Reliability at a node x, r_x = fraction of live messages that x receives during its online periods

Example from the figure: a_x = 5/8 = 0.625; #live messages of x = 5; r_x = 3/5 = 0.6 (and not 5/7)
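
A minimal sketch of these definitions (my illustration, not AVCast code), with hypothetical session and message inputs:

```python
# Hedged sketch: a_x and r_x computed from a session trace of one node x.
# sessions: list of (start, end) online intervals over [0, horizon];
# messages: list of (inject_time, received_by_x) pairs for the multicast group.
def availability(sessions, horizon):
    return sum(end - start for start, end in sessions) / horizon

def reliability(sessions, messages):
    def online(t):
        return any(start <= t < end for start, end in sessions)
    live = [got for t, got in messages if online(t)]   # messages injected while x online
    return sum(live) / len(live) if live else 1.0

# Hypothetical numbers: availability([(0, 3), (5, 7.5)], 8.0) == 0.6875
```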

I. Monitoring Component: TS & PS

Each node x periodically monitors a set of other nodes called the target set of x, or TS(x)

At the same time, each node x is monitored by a set of other nodes called the pinging set of x, or PS(x)

y ∈ TS(x) iff x ∈ PS(y)

[Figure: x monitors its target set TS(x) = {u, v, w, z}, with monitored availabilities a_u = 0.37, a_v = 0.14, a_w = 0.60, a_z = 0.91; x is in turn monitored by its pinging set PS(x) = {p, q, r}]
Remembering Availability

Monitored availability information is aged:

a_new = (1 − α)·a_old + α·a_now

where a_now = 1 if the monitored node is currently online, 0 if offline, and α is the decay rate (here α = 0.1).

On failure, a_old can be retrieved from persistent storage.

Worked update with α = 0.1 (figure: x re-pings its target set TS(x) = {u, v, w, z}; u, w, z respond, v does not):

a_u = (1 − α)·0.37 + α·1 = 0.43

a_v = (1 − α)·0.14 + α·0 = 0.13

a_w = (1 − α)·0.60 + α·1 = 0.64

a_z = (1 − α)·0.91 + α·1 = 0.92

Selecting PS & TS
  • Pure randomization won't work (e.g., Eclipse attacks [Singh et al. '06])
  • The relation needs to be randomized yet consistent over time
  • Our approach [Consistency Condition]: use a globally known, consistent hash function H(x,y) with range [0,1]

y ∈ TS(x) iff H(x,y) < K/N

Here K is a fixed parameter (the expected target set size, O(log N)) and N is the approximate total number of (online) nodes.

  • Assuming uniformity of the ID space, each node's TS and PS contains an expected K nodes
  • A joining node x broadcasts a JOIN message to all other nodes
  • Upon receiving the JOIN message, any node y with H(y,x) < K/N adds x to TS(y), and any node y with H(x,y) < K/N adds x to PS(y) (consistent with the condition above); y then replies to x
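
A minimal sketch of this consistency condition (illustration, not the AVMON/AVCast implementation), assuming H hashes the ordered node-ID pair into [0, 1):

```python
# Hedged sketch: a globally computable, deterministic hash H(x, y) in [0, 1).
import hashlib

def H(x, y):
    digest = hashlib.sha1(f"{x}|{y}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64     # deterministic, in [0, 1)

def in_target_set(x, y, K, N):
    """y is in TS(x) -- equivalently, x is in PS(y) -- iff H(x, y) < K/N."""
    return H(x, y) < K / N

# Any third node can recompute H(x, y), so the monitoring relation is
# verifiable, and because H is deterministic it stays consistent over time.
```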

II. Multicast Component

Multicasts are forwarded along the TS-PS graph

When a node x receives a multicast message, it picks up to C online nodes from its TS(x) to forward the message to at once

With probability p(a_y), x forwards the message to online node y ∈ TS(x)

Problem: choose the function p(·) and the number C to implement the desired reliability predicate

[Figure: with C = 2, x forwards message M to each online member of TS(x) = {u, w, z} with probability p(0.43), p(0.64), p(0.92) respectively; v (a_v = 0.13) is offline]
AVCast-Supported Basic Predicates

Uniform reliability (r_x = constant R):
  • p(a) = 1/|TS_on|, where TS_on = online nodes in the target set TS
  • C is fixed to a globally defined constant
  • R = 1 − e^(−RC)

Availability-proportional reliability (r_x = a_x):
  • a = availability, C = number of copies forwarded, K = target set size (|TS|), E[a²] = average of the squared availability values in the target set (a sample of the global distribution)
  • Each node x adjusts C so that Σ_{y ∈ TS(x)} p(a_y) ≤ 1
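
As a worked example (my illustration), the uniform-reliability relation R = 1 − e^(−RC) can be solved numerically by fixed-point iteration:

```python
# Hedged sketch: solve R = 1 - exp(-R*C) by simple fixed-point iteration.
import math

def uniform_reliability(C, iters=100):
    R = 1.0                                   # start from full reliability
    for _ in range(iters):
        R = 1.0 - math.exp(-R * C)            # iterate the fixed point
    return R

# e.g. uniform_reliability(2) is about 0.80, uniform_reliability(3) about 0.94
```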
AVCast can Support Any Generic Predicate!
  • Generic reliability predicate (r_x = f(a_x))
    • a = availability
    • C = number of copies forwarded
    • K = target set size (|TS|)
    • E[r·a] = the average of the reliability-availability product in the target set (a sample of the global distribution)
  • Each node x adjusts C so that Σ_{y ∈ TS(x)} p(a_y) ≤ 1
Evaluation
  • AVCast prototype implemented in about 3,000 lines of C++
  • Simulated a system of 1,442 nodes
    • Availability values from the Overnet file-sharing trace [Bhagwan et al. '03]
    • Average availability in the trace: 0.3
  • Simulate for 6000 protocol rounds
    • One message injected per round during last 3000 rounds
    • K=31
Result : Membership & Monitoring
  • Simulated with target set size K = 3 log N ≈ 31
  • α is a parameter used to adjust K dynamically at each node (to maintain the target set size)
System-wide availability measured at each node
  • Estimated based on TS’s availabilities
  • The variance decreases when K increases

Global average availability = 0.315

Result: Uniform & Proportional Multicast Predicates

  • Uniform reliability (p = 1/|TS_on|) for K = 2 log N
  • Availability-proportional reliability (r_x = a_x) for K = 3 log N
  • C = fanout of gossip

Result: Bimodal and Threshold-Linear Predicates

  • Bimodal: r_x = 0.3 if a_x < 0.5, 0.9 otherwise
  • Threshold-Linear: r_x = 0.3 if a_x < 0.3, a_x otherwise

AVCast: Conclusions
  • Availability-aware multicast reliability differentiation
  • Consistent, verifiable peer-monitoring
  • Gossip-based multicast with adjustable forwarding probability function and fanout
  • AVCast currently supports any reliability predicate. Examples are:
    • Uniform, Proportional, Bimodal, Threshold-Linear
Lecture Summary and Discussion

For peer-to-peer systems:

  • Characteristics of Churn: Node availability varies across nodes, and time. Failures are independent.
  • Effect of Churn: Upper bound reasonable, but real performance?
  • Churn-Resistance: Can incentivize nodes to have higher availability by tying end-host reliability to host’s availability. AVCast for multicast.

Research Directions

  • Adaptive DHTs : adaptivity to changing network conditions, changing churn rates, changing time-of-day…
  • Stress-Resistant Protocols: more challenging than mere fault-tolerance
  • AV* protocols: AVMON (ICDCS 07), AVMEM (ongoing), …
Bigger Direction: Stress-Resistance in Distributed Systems

Node failures, massive failures

Intermittent message losses, network outages, network partitions

Perturbation, churn

Varying request rate, flash crowds

Static objects, dynamic objects