Recovery Oriented Software

Joao Magalhães

Advisors: Arndt von Staa, Carlos J. P. Lucena

Motivation
  • Writing software that can coexist with bugs is as important as trying to avoid them.
  • It is not a matter of “if your software will fail”, but “when it will fail”.
  • Effort should be spent on minimizing the consequences when that moment comes.

© LES/PUC-Rio

Motivation
  • We need mechanisms that:
    • prevent bugs through:
      • specification
      • architecture
      • design
      • coding standards
      • quality control before and after deployment
    • allow software to coexist with bugs:
      • trap bugs and
      • examine them when they first occur.

What is Recovery Oriented Software?
  • ROS takes the perspective that hardware faults, software bugs, and operator errors are facts to be coped with, not problems to be solved
    • This perspective is supported both by historical evidence and by recent studies on the main sources of outages in production systems

What is Recovery Oriented Software?
  • The key ideas behind ROS are:
    • Concentrate on minimizing the number of failures in your software
    • Concentrate on reducing Mean Time to Repair (MTTR) (thus offering higher availability)
    • Concentrate on minimizing the consequences of failures
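
The link between MTTR and availability can be made concrete with the usual steady-state availability formula. The slides do not state this formula; it is the standard definition, and the numbers below are illustrative assumptions only (MTTF is the mean time to failure):

```latex
A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}

% Illustrative values, assuming MTTF = 1000 h:
\mathrm{MTTR} = 1\,\mathrm{h}:   \quad A = \frac{1000}{1001}   \approx 0.9990
\mathrm{MTTR} = 0.1\,\mathrm{h}: \quad A = \frac{1000}{1000.1} \approx 0.9999
```

With the failure rate unchanged, cutting the repair time by a factor of ten removes roughly nine tenths of the downtime.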

What is Recovery Oriented Software?
  • The main axioms:
    • “It is impossible to build perfect software”
      • “We are only human, born to make mistakes”. And, yes, this also applies to software
    • “Software failures can be tolerated (to some extent) if their consequences are minimized”
      • When MS Word fails, do you get mad because of the software failure, or because of the possible loss of work?
      • Some consequences to be analyzed:
        • Loss of life
        • Damage to equipment, the environment, or the enterprise
        • Loss of money
        • Loss of work
        • Time to restart
        • Time to restore the previous “workbench”

Building Recovery Oriented Software
  • So, how to build recovery oriented software?
  • There are four important points:
    • Fault Prevention Effort
    • Fault Detection Effort
    • Fault Handling Effort
    • Fault Removal Effort

Building Recovery Oriented Software
  • Fault Prevention Effort
    • Effort spent during development to avoid anomalies (bugs) in software
      • Effort in good design
        • Allow for the use of stubs and mocks
      • Effort in tests (automated or not)
        • Generation of test cases
        • Validation of test cases
      • Use of Design by Contract

Building Recovery Oriented Software
  • Fault Detection Effort
    • Effort spent during development to detect faults at runtime.
      • Hardware dedicated to fault detection
      • Data-structure validators
      • Self-test algorithms
      • Use of Design by Contract with executable assertions turned on
      • Software redundancy and/or hardware redundancy
        • Comparison of the results obtained from different sources can indicate problems
        • Use of oracles to predict expected measurements (in control systems)
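
As an illustration of detecting faults by comparing results from different sources, here is a minimal Python sketch. The function name `cross_check`, the sensor names, and the tolerance are hypothetical and not from the presentation; the median is just one simple way to pick an agreed value.

```python
from statistics import median


def cross_check(readings, tolerance):
    """Compare readings of the same quantity taken from redundant sources.

    readings  -- dict mapping a source name to its measured value
    tolerance -- largest deviation from the median still considered healthy
    Returns (agreed_value, suspects): the median reading and the sorted names
    of the sources that deviate from it by more than the tolerance.
    """
    if not readings:
        raise ValueError("at least one reading is required")
    agreed = median(readings.values())
    suspects = sorted(
        name for name, value in readings.items()
        if abs(value - agreed) > tolerance
    )
    return agreed, suspects


# Three redundant (hypothetical) sensors; one of them disagrees.
value, suspects = cross_check(
    {"sensor_a": 20.1, "sensor_b": 19.9, "sensor_c": 35.0}, tolerance=0.5
)
print(value, suspects)  # prints: 20.1 ['sensor_c']
```

A non-empty suspect list only signals a possible fault; deciding what to do about it belongs to fault handling.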

Building Recovery Oriented Software
  • Fault Handling Effort
    • Effort spent during development to handle detected faults at runtime.
    • Handling means recovering as gracefully as possible
      • Sometimes it is impossible to fully recover from an error
        • Degraded operation
        • Minimizing loss of data
    • Some examples:
      • Software redundancy and/or hardware redundancy
        • May avoid service interruption
      • Code that restores the system to a valid state
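
One possible shape for code that restores a valid state is a checkpoint taken before a risky operation, combined with a validator run afterwards. The sketch below is an assumption-laden illustration: the `Account` class, its invariant, and `run_with_rollback` are hypothetical names, not part of the presentation.

```python
import copy


class Account:
    """Hypothetical component whose state must satisfy an invariant."""

    def __init__(self, balance):
        self.balance = balance
        self.history = []

    def withdraw(self, amount):
        # Deliberately unchecked, so that the validator has something to catch.
        self.balance -= amount
        self.history.append(-amount)

    def is_valid(self):
        # Data-structure validator: the balance must never be negative.
        return self.balance >= 0


def run_with_rollback(component, operation):
    """Apply operation(component); restore the checkpoint if it fails.

    Returns True when the operation succeeded and left a valid state,
    False when the previous state was restored.
    """
    checkpoint = copy.deepcopy(component.__dict__)
    try:
        operation(component)
        if not component.is_valid():
            raise RuntimeError("invariant violated")
        return True
    except Exception:
        # Fault handling: discard the damaged state, go back to the checkpoint.
        component.__dict__.clear()
        component.__dict__.update(checkpoint)
        return False


account = Account(100)
print(run_with_rollback(account, lambda a: a.withdraw(30)))   # prints: True
print(run_with_rollback(account, lambda a: a.withdraw(500)))  # prints: False
print(account.balance, account.history)                       # prints: 70 [-30]
```

The failed withdrawal is lost, but the work done before it survives, which is the "minimizing loss of data" case listed above.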

Building Recovery Oriented Software
  • Fault Removal Effort
    • Once a fault has been detected, the first idea would be to fix it.
    • However, sometimes the fault removal can be too expensive if compared to its impact on the system
    • An alternative solution could be trying to co-exist with the fault
      • Of course, this definitely cannot be applied to every fault!

Building Recovery Oriented Software
  • Fault detection and fault handling require runtime effort
    • It is a continuous process
  • Fault detection is possible without human intervention
    • Or, at least, the detection of a possible fault can be automated; the user may still analyze the available data and decide that no fault is really present.
  • Fault handling may require human intervention
    • Sometimes, the user may want to keep a corrupted state in order to try to manually recover (and reduce the loss of data).

Building Recovery Oriented Software
  • From our previous experience developing high-availability software, we believe that balancing the resources spent on each effort is what makes the difference in achieving the desired level of quality
  • But, once software is built, how do we determine whether the desired level of quality has been achieved? Moreover, would it be possible to define a software development process that, by construction, guarantees that level of quality?
    • This is yet to be defined...

What extra effort does building recovery oriented software require?
  • This is difficult to quantify without a large number of experiments, and we have run only a few of our own.
  • However, our impression is that the “extra” effort spent on some activities ends up reducing the effort spent on others.

Are there technologies available for building recovery oriented software?
  • We had some good experiences with some existing technologies/practices
    • Software components
    • Design by contract
    • Mock elements
    • Extreme Programming (especially pair programming)
    • Strict coding discipline
  • And also with some tools
    • Subversion/CVS
    • Eclipse
    • Valgrind

Software components
  • Structuring software into small components:
    • Provides a better level of control over development complexity
    • Provides a better level of control over fault detection
    • Enhances the chances of isolating (existing) anomalies
    • Enhances the chances of recovering gracefully from faults
    • Enhances the chances of reuse (thus allowing components to mature naturally over time)

Design by Contract
  • Using contracts and executable assertions:
    • Significantly lengthens the design and coding phases
    • Dramatically shortens the testing and acceptance (homologation) phases
    • Enhances fault detection capacity
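
To show what contracts with executable assertions can look like, here is a minimal Python sketch. The `contract` decorator and the `largest` function are hypothetical illustrations, not the tooling used by the authors.

```python
import functools


def contract(pre=None, post=None):
    """Attach an executable precondition and postcondition to a function.

    pre  -- called with the function's arguments, must return True
    post -- called with the result followed by the arguments, must return True
    """
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if pre is not None:
                assert pre(*args, **kwargs), f"precondition of {func.__name__} violated"
            result = func(*args, **kwargs)
            if post is not None:
                assert post(result, *args, **kwargs), f"postcondition of {func.__name__} violated"
            return result
        return wrapper
    return decorate


@contract(
    pre=lambda values: len(values) > 0,
    post=lambda result, values: result in values and all(v <= result for v in values),
)
def largest(values):
    """Return the largest element of a non-empty list."""
    best = values[0]
    for v in values[1:]:
        if v > best:
            best = v
    return best


print(largest([3, 9, 4]))  # prints: 9
# largest([]) raises AssertionError: precondition of largest violated
```

Python strips `assert` statements when run with the `-O` option, so the checks above are active only when assertions are left turned on, which is the mode the fault detection slide recommends.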

Mock Elements
  • Using mock elements:
    • Allows components and groups of components to be tested independently
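
A short sketch of testing a component in isolation with a mock, using Python's standard `unittest.mock`. The `OrderService` class and its payment gateway are hypothetical; the mock stands in for the real gateway and can also simulate its failure.

```python
from unittest.mock import Mock


class OrderService:
    """Hypothetical component that depends on an external payment gateway."""

    def __init__(self, gateway):
        self.gateway = gateway

    def place_order(self, amount):
        try:
            self.gateway.charge(amount)
        except ConnectionError:
            # Degraded operation: keep the order and charge it later.
            return "queued"
        return "paid"


# Normal path: the mock records the call instead of contacting a real gateway.
gateway = Mock()
service = OrderService(gateway)
assert service.place_order(10) == "paid"
gateway.charge.assert_called_once_with(10)

# Fault path: make the mock fail the way the real gateway might.
gateway.charge.side_effect = ConnectionError("gateway down")
assert service.place_order(10) == "queued"
```

Because the mock can be told to raise, the fault-handling path is exercised without having to break a real dependency.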

Extreme Programming
  • Using pair programming:
    • Enhances the quality and reduces the number of bugs in complex code

Strict code discipline
  • Using a strict code discipline:
    • Creates a uniform code appearance, thus reducing the effort spent in understanding it.
    • Improves coding productivity by reducing the number of coding problems

Subversion/CVS
  • Using CVS/Subversion:
    • Provides the basis for team development
    • Keeps track of code changes and enhancements
    • Provides tools for release control

Eclipse
  • Using Eclipse:
    • Provides a uniform interface to Subversion/CVS, even across different operating systems
    • Works as a workbench where plugins (such as CDT and Maven) can be installed to enhance productivity and automate tasks

Valgrind
  • Using Valgrind:
    • A tool for UNIX systems that checks for memory leaks and invalid memory accesses

Conclusions
  • Since no existing tools are effective enough to build bug-free software, building recovery oriented systems is the best that can be done.
  • Balancing the effort spent on fault prevention, fault detection, fault handling, and fault removal seems to be key to developing ROS
  • Some tools, practices, and technologies contribute to building ROS

References
  • Design for Testability for Object Oriented Software. Jeffery E. Payne et al. Object Magazine, 2001
  • Recovery Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies. David Patterson et al, http://www.stanford.edu/~candea/papers/roc_vision/roc_vision.html
  • Towards a Fault Tolerant Multi-Agent System Architecture. Sanjeev Kumar, Phillip Cohen. Agents 2000.
  • Generation of Self-Testing Components. Leonardo Mariani et al, 2003.

References
  • Toward Systematic Design of Fault-Tolerant Systems. Algirdas Avizienis, IEEE, 1997
  • Merging components and testing tools: The Self-Testing COTS Components (STECC) Strategy. Sami Beydeda et al, 2004
  • Endo-Testing: Unit Testing with Mock Objects. Tim Mackinnon et al. XP2000.
  • Mocks Aren't Stubs. Martin Fowler, 2004. Martin Fowler's blog.
