Fault Tolerance: Basic Mechanisms

Fault Tolerance: Basic Mechanisms. mMIC-SFT, September 2003. Anders P. Ravn, Aalborg University. Fault tolerance is a means to isolate component faults and mask them. It prevents system failures and may increase system dependability. Means of dependability include fault prevention and fault tolerance.

Presentation Transcript


  1. Fault Tolerance: Basic Mechanisms. mMIC-SFT, September 2003. Anders P. Ravn, Aalborg University

  2. Fault Tolerance • Means to isolate component faults ... and mask them • Prevents system failures • May increase system dependability

  3. Dependability - means • Fault prevention • Fault tolerance • Error Removal • Failure Forecasting BW p. 106, ...

  4. Fault Tolerance

  5. FT levels • Full tolerance • Graceful degradation • Fail safe BW p. 107

  6. FT basis: Redundancy • Time • Space [Diagram labels: Try, Retry, ...; Try, Try, Try] BW p. 109
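
Redundancy in time (slide 6) can be sketched as a retry loop. This is a minimal illustration, not code from the lecture: the names `retry` and `make_flaky` are hypothetical, and the sketch assumes the fault is transient, so repeating the same operation can succeed.

```python
def retry(operation, attempts=3):
    """Time redundancy: run the same operation again after a failure."""
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except Exception as exc:  # assumption: the fault is transient
            last_error = exc
    raise RuntimeError(f"all {attempts} attempts failed") from last_error


def make_flaky(failures):
    """Hypothetical operation that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def operation():
        if state["left"] > 0:
            state["left"] -= 1
            raise IOError("transient fault")
        return "ok"

    return operation


result = retry(make_flaky(2))  # fails twice, succeeds on the third try
```

Retrying does not help against a permanent fault; redundancy in space (several independent tries) is the alternative named on the slide.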

  7. N-version programming [Diagram labels: V1, V2, V3; Driver (comparator); Comparison vectors (votes); Comparison status indicators; Comparison points] BW p. 109
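
The driver and its vote from slide 7 can be sketched as follows. This is an assumption-laden illustration: the function `n_version` and the three toy versions are hypothetical, the vote uses exact equality on hashable results, and a version that raises an exception simply casts no vote.

```python
from collections import Counter


def n_version(versions, x):
    """Driver (comparator): run every version on x and take a majority vote."""
    votes = []
    for version in versions:
        try:
            votes.append(version(x))
        except Exception:
            pass  # a crashed version casts no vote
    if not votes:
        raise RuntimeError("no version produced a vote")
    value, count = Counter(votes).most_common(1)[0]
    if count * 2 <= len(versions):
        raise RuntimeError("no majority among versions")
    return value


def v1(x):
    return x * x


def v2(x):
    return x ** 2


def v3(x):
    return x + x  # deliberately faulty version, for illustration only


result = n_version([v1, v2, v3], 3)  # v1 and v2 outvote v3
```

Exact-match voting is the simplest choice; results such as floating-point values would need an inexact comparison instead.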

  8. Fault classification (scope of N-VP)
     • Origin
         • physical (internal/external)
         • logical (design/interaction)
     • Kind
         • omission
         • value
         • timing
         • byzantine
     • Property
         • duration (permanent, transient)
         • consistency (determinate, nondeterminate)
         • autonomy (spontaneous, event-dependent)
     [Scope marks on the slide, positions not recoverable: + + (+) ++ (+) + / (+) + / + + / +]

  9. Dynamic Redundancy • Error detection • Damage confinement and assessment • Error recovery • Fault treatment and continued service BW p. 114

  10. Error Detection f: State × Input → State × Output
     • Environment (exception)
     • Application
         • Assertion: precondition(input), postcondition(input, output), invariant(state, state’)
         • Timing: WCET(f, input), Deadline(f, input)
     BW p. 115
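
The application-level checks on slide 10 can be sketched as a wrapper around a function f: State × Input → State × Output. Everything here is hypothetical (`checked`, `DetectedError`, the toy function `add`); the deadline is measured with a wall clock after f returns, so the sketch detects a missed deadline but does not preempt f.

```python
import time


class DetectedError(Exception):
    """Raised when one of the checks on f fails."""


def checked(f, pre, post, inv, deadline_s):
    """Wrap f with precondition, postcondition, invariant and deadline checks."""
    def wrapper(state, value):
        if not pre(value):
            raise DetectedError("precondition")
        start = time.monotonic()
        new_state, output = f(state, value)
        elapsed = time.monotonic() - start
        if not post(value, output):
            raise DetectedError("postcondition")
        if not inv(state, new_state):
            raise DetectedError("invariant")
        if elapsed > deadline_s:
            raise DetectedError("deadline")
        return new_state, output
    return wrapper


def add(total, amount):
    """Hypothetical f: keep a running total and output it."""
    return total + amount, total + amount


safe_add = checked(
    add,
    pre=lambda amount: amount >= 0,            # precondition(input)
    post=lambda amount, out: out >= amount,    # postcondition(input, output)
    inv=lambda old, new: new >= old,           # invariant(state, state')
    deadline_s=1.0,
)

new_state, output = safe_add(10, 5)
```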

  11. Damage Confinement • Static structure • Dynamic structure [Diagram labels: object I, object I] BW p. 117

  12. Error Recovery
     • Forward: repair the state, if you can!
     • Backward
         • define recovery points
         • checkpoint state at each recovery point
         • roll back
         • retry
       Hazard: domino effect
     BW p. 118
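
Backward recovery (slide 12) can be sketched for a single process as checkpoint, roll back, retry. The class `Checkpointed`, the function `run_with_rollback` and the toy `transfer` action are hypothetical names; the fault is injected on the first attempt only. With a single process there are no interacting recovery points, so the domino effect does not arise in this sketch.

```python
import copy


class Checkpointed:
    """Holds a state plus a stack of saved recovery points."""

    def __init__(self, state):
        self.state = state
        self._saved = []

    def checkpoint(self):
        self._saved.append(copy.deepcopy(self.state))

    def roll_back(self):
        self.state = self._saved.pop()

    def discard(self):
        self._saved.pop()  # the recovery point is no longer needed


def run_with_rollback(store, action, attempts=3):
    """Checkpoint, try the action, and roll back and retry on error."""
    for attempt in range(attempts):
        store.checkpoint()
        try:
            action(store.state, attempt)
            store.discard()
            return
        except Exception:
            store.roll_back()
    raise RuntimeError("recovery failed")


def transfer(state, attempt):
    """Hypothetical action that fails halfway through on its first attempt."""
    state["a"] -= 10
    if attempt == 0:
        raise IOError("transient fault")
    state["b"] += 10


store = Checkpointed({"a": 100, "b": 0})
run_with_rollback(store, transfer)
```

The roll back undoes the half-finished first attempt, so the retry starts from the checkpointed state rather than from a corrupted one.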

  13. Recovery blocks
       ENSURE acceptance_test
       BY { module_1 }
       ELSE BY { module_2 }
       ...
       ELSE BY { module_m }
       ELSE ERROR
     BW p. 120
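
The ENSURE ... BY ... ELSE BY ... ELSE ERROR construct of slide 13 can be sketched as a function. This is an illustration under assumptions: `recovery_block` and the sorting modules are hypothetical, each alternative works on a fresh copy of the entry state (the recovery point), and an exception inside a module is treated like a failed acceptance test.

```python
import copy


def recovery_block(state, acceptance_test, modules):
    """Try each module in turn until one passes the acceptance test."""
    for module in modules:
        candidate = copy.deepcopy(state)  # restore the recovery point
        try:
            module(candidate)
            if acceptance_test(candidate):
                return candidate
        except Exception:
            pass  # fall through to the next alternative
    raise RuntimeError("all alternatives failed the acceptance test")


def is_sorted(xs):
    """Acceptance test: checks ordering only, not that items are preserved."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def fast_but_wrong(xs):
    xs.reverse()  # deliberately faulty primary module


def simple_sort(xs):
    xs.sort()  # simpler alternative module


result = recovery_block([3, 1, 2], is_sorted, [fast_but_wrong, simple_sort])
```

The result is only as trustworthy as the acceptance test: a module that returned an empty list would also pass `is_sorted`.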

  14. The ideal FT-component [Diagram labels: Normal mode; Exception Handler; on one interface: Request/response, Interface exception, Failure exception; on the other interface: Request/response, Interface exception, Failure exception] BW p. 126
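
The two exception kinds on slide 14 can be sketched with a pair of components. All names (`InterfaceException`, `FailureException`, `Doubler`, `Service`) and the recovery policy are assumptions made for illustration: an interface exception reports an invalid request, a failure exception reports that the component could not deliver its service, and the handler either recovers locally or signals failure to its own caller.

```python
class InterfaceException(Exception):
    """The request violated the component's interface."""


class FailureException(Exception):
    """The component could not deliver the requested service."""


class Doubler:
    """Hypothetical lower-level component."""

    def __init__(self, broken=False):
        self.broken = broken

    def request(self, value):
        if value < 0:
            raise InterfaceException("negative input")
        if self.broken:
            raise FailureException("doubler unavailable")
        return 2 * value


class Service:
    """Hypothetical component with a normal mode and an exception handler."""

    def __init__(self, lower):
        self.lower = lower

    def request(self, value):
        if not isinstance(value, int):
            raise InterfaceException("integer expected")
        try:
            return self.lower.request(abs(value)) + 1  # normal mode
        except FailureException:
            return self._handler(value)  # exception handler

    def _handler(self, value):
        # Recover locally when possible, otherwise signal failure upward.
        if abs(value) < 1000:
            return abs(value) + abs(value) + 1
        raise FailureException("service cannot recover")


normal = Service(Doubler()).request(3)
recovered = Service(Doubler(broken=True)).request(3)
```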