Using failure injection mechanisms to experiment and evaluate a grid failure detector

Sébastien Monnet and Marin Bertier

IRISA / INRIA, PARIS project-team

WCGC 2006 - Rio de Janeiro

Systems evaluation
  • Simulations
    • Fast/easy
    • System model
  • Formal proofs
    • Reliable
    • System model
  • Experiments on real testbeds
    • Real system code / real environment
    • Hard!

Running experiments
  • Find resources
  • Deploy the system
  • Launch the test
  • Control the test
  • Get and analyze results

Experimenting with fault tolerance
  • Evaluate fault tolerance mechanisms
    • Fault-free runs
    • With failures
  • Fault prevention cost
  • Resilience to failures
  • Overhead due to failures (recovery, adaptation, etc.)

Volatility control - needs
  • Assumption: a stable testbed
  • Injecting failures
    • At large scale
    • Accurately
    • Reproducibly
      • Using failure scenarios

JXTA Distributed Framework (JDF)
  • A tool to automate the tests of JXTA-based systems (Sun Microsystems, Paris research team)
  • Test description
    • Nodes file
    • File listing the files to deploy
    • XML file describing node profiles
  • Set of scripts to deploy, launch and fetch results

Description language extension

Adding a specific XML tag for failure injection
  • Single failure: <failure grp="groupName">
  • Correlated failures: <failure dep="profileName">

<network analyze-class="test.Analyze">
  <profile name="manager" replicas="1">
    <!-- peer information -->
    <peer base-name="peerA"/>
    ...
    <bootstrap class="test.MyClass1"/>
    <!-- argument -->
    <arg value="x"/>
  </profile>
  <profile name="non-manager" replicas="20">
    <peer base-name="peerB"/>
    ...
    <bootstrap class="test.MyClass2"/>
  </profile>
</network>
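
As a rough illustration of how a tool could read such a description, the sketch below extracts the failure tags per profile with Python's standard XML parser. The placement of the failure tag as a direct child of a profile, the self-closing form, and the values "group1" and "manager" are assumptions made for the example; the slides do not show where the tag sits in the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical description file: tag placement and attribute values are assumed.
DESCRIPTION = """
<network analyze-class="test.Analyze">
  <profile name="manager" replicas="1">
    <peer base-name="peerA"/>
    <failure grp="group1"/>
    <bootstrap class="test.MyClass1"/>
  </profile>
  <profile name="non-manager" replicas="20">
    <peer base-name="peerB"/>
    <failure dep="manager"/>
    <bootstrap class="test.MyClass2"/>
  </profile>
</network>
"""

def failure_tags(xml_text):
    """Map each profile name to the attributes of its failure tag, if any."""
    root = ET.fromstring(xml_text)
    found = {}
    for profile in root.iter("profile"):
        tag = profile.find("failure")  # direct child only (assumed layout)
        if tag is not None:
            found[profile.get("name")] = dict(tag.attrib)
    return found

print(failure_tags(DESCRIPTION))
# {'manager': {'grp': 'group1'}, 'non-manager': {'dep': 'manager'}}
```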

Injecting failures - when?
  • Active research field
  • A failure schedule generator
    • Input
      • The failure tags in the XML description file
      • Probabilistic parameters (MTBF)
    • Output
      • A new configuration file for JDF

Format: peerID=uptime
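
A minimal sketch of such a generator, under stated assumptions: uptimes are drawn from an exponential distribution whose mean is the MTBF, they are expressed in seconds, and the peer names are hypothetical. The slides give only the peerID=uptime output format, not the distribution or the unit.

```python
import random

def generate_schedule(peer_ids, mtbf_seconds, seed=0):
    """Return {peer_id: uptime_seconds}, with uptimes drawn from an
    exponential distribution of mean mtbf_seconds (assumed law)."""
    rng = random.Random(seed)  # fixed seed: the same schedule on every run
    return {peer: rng.expovariate(1.0 / mtbf_seconds) for peer in peer_ids}

def format_schedule(schedule):
    """Render one 'peerID=uptime' line per peer."""
    return "\n".join(f"{peer}={uptime:.0f}" for peer, uptime in schedule.items())

peers = [f"peerB{i}" for i in range(1, 4)]  # hypothetical peer names
print(format_schedule(generate_schedule(peers, mtbf_seconds=60)))
```

Seeding the random generator is one simple way to obtain the reproducibility the slides ask for: rerunning with the same seed replays the same failure scenario.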

Injecting failures - how?

Using failure injectors (1)
  • Launching a simple test
  • Correlated failures

Using failure injectors (2)
  • Refining the failure schedule

Failure detectors
  • Basic building block for fault-tolerance mechanisms
  • Basic principle
    • Periodic heartbeat exchanges (all-to-all)
    • On each node, a suspect list is updated according to heartbeat arrivals
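
The basic principle can be sketched as follows. This is a simplified illustration with a fixed timeout and explicit timestamps, both assumptions made for the example; it is not the detector evaluated in the talk.

```python
class HeartbeatDetector:
    """Suspect any peer whose last heartbeat is older than a fixed timeout."""

    def __init__(self, peers, timeout):
        self.timeout = timeout
        self.last_seen = {peer: 0.0 for peer in peers}

    def on_heartbeat(self, peer, now):
        # Record the arrival time of the latest heartbeat from this peer.
        self.last_seen[peer] = now

    def suspects(self, now):
        # A peer is suspected when no heartbeat arrived within the timeout.
        return sorted(p for p, t in self.last_seen.items()
                      if now - t > self.timeout)

detector = HeartbeatDetector(["n1", "n2", "n3"], timeout=3.0)
for peer in ("n1", "n2", "n3"):
    detector.on_heartbeat(peer, now=1.0)
for peer in ("n1", "n2"):  # n3 stops sending heartbeats
    detector.on_heartbeat(peer, now=4.0)
print(detector.suspects(now=5.0))  # ['n3']
```

A peer leaves the suspect list as soon as a fresh heartbeat from it is recorded, which is how a false suspicion would be corrected in this sketch.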

Grid failure detectors (GFD)
  • Adaptability
    • Network load
    • Quality of service
  • Scalability
    • Hierarchical failure detectors
      • All-to-all within clusters
      • Leader-to-leader among clusters
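
To see why the hierarchy helps scalability, the sketch below counts directed monitoring links for a flat all-to-all detector and for a two-level one. The sizes used (64 nodes, four clusters of 16, one leader per cluster) are illustrative assumptions.

```python
def all_to_all_links(n):
    """Directed monitoring links when each of n nodes monitors every other."""
    return n * (n - 1)

def hierarchical_links(cluster_sizes):
    """All-to-all inside each cluster, plus all-to-all among the leaders
    (one leader per cluster, assumed)."""
    intra = sum(all_to_all_links(size) for size in cluster_sizes)
    inter = all_to_all_links(len(cluster_sizes))
    return intra + inter

print(all_to_all_links(64))                  # 4032
print(hierarchical_links([16, 16, 16, 16]))  # 972
```

With these sizes, the hierarchical organization needs roughly a quarter of the links of the flat one, and only 12 of them cross cluster boundaries.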

Experimental testbed
  • Grid5000 grid platform
    • 9 cities interconnected by Renater
      • Bandwidth: 1 Gb/s (10 Gb/s soon)
      • Latency: from 4 to ~30 ms
    • In each city, clusters provide high-performance networks
      • Bandwidth: 1 Gb/s
      • Latency: a few microseconds

http://www.grid5000.fr/

Experimental setup
  • 64 nodes spread across 4 different cities

[Figure: nodes grouped into Cluster 1, Cluster 2, Cluster 3 and Cluster 4]

Failure injector - alone
  • MTBF = 1 minute
  • No failure dependencies

Correlated failures
  • Adding a failure dependency in cluster 1:

<failure dep="cluster1-leader">

[Figure annotation: Cluster 1 leader crashes]
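
One possible reading of such a dependency is sketched below: a dependent peer stays up no longer than the peer it depends on. This semantics, the single level of dependency, and the peer names and uptimes are all assumptions for illustration; the slides do not define the exact behavior.

```python
def apply_dependencies(schedule, depends_on):
    """Cap each dependent peer's uptime at the uptime of the peer it
    depends on (assumed semantics, single level of dependency only)."""
    result = dict(schedule)
    for peer, target in depends_on.items():
        result[peer] = min(schedule[peer], schedule[target])
    return result

# Hypothetical uptimes in seconds for one cluster.
schedule = {"cluster1-leader": 40, "n1": 90, "n2": 25}
deps = {"n1": "cluster1-leader", "n2": "cluster1-leader"}
print(apply_dependencies(schedule, deps))
# {'cluster1-leader': 40, 'n1': 40, 'n2': 25}
```

Here n1 is brought down together with the leader, while n2 was already scheduled to fail earlier and keeps its own uptime.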

Failure detection in subgroups
  • No leader failures
  • No failure dependencies

Between groups
  • Failure dependency in each cluster to avoid new leader selection

Conclusion
  • Evaluating a distributed system is complex
  • Running experiments provides the ability to
    • Evaluate a new concept or software
    • Debug during the implementation phase
  • Failure-injection mechanisms provide the ability to experiment with fault-tolerance mechanisms
  • We have designed a failure injection tool that allows the tester to run large-scale experiments
    • with various volatility conditions
    • in a reproducible manner
