Software Safety: An Oxymoron?

Software Safety: An Oxymoron? March 29, 2007. Ken Wong, Ph.D., Senior Systems Analyst, McKesson Medical Imaging Group. Points to Ponder: a system can be correct and reliable and yet unsafe; software safety is not about bugs.

Presentation Transcript


  1. Software Safety: An Oxymoron? March 29, 2007 Ken Wong, Ph.D., Senior Systems Analyst, McKesson Medical Imaging Group

  2. Points to Ponder* • A system can be correct and reliable and yet unsafe • Software safety is not about bugs • Program testing can be used to show the presence of bugs, but never to show their absence * We will return to these statements in the discussion

  3. Outline • Introduction to Software Safety • Software: Meet System Safety • System Safety: Meet Software • Verifying Software Safety

  4. Introduction to Software Safety

  5. Software in the Real World • Therac-25 accidents • Ariane 5 Flight 501 explosion • Titan 4 Centaur/Milstar failure • TCAS-related mid-air collision near Überlingen, Germany

  6. Ariane 501

  7. Ariane 501 Events • Destruction of Ariane 501 on 4 June 1996 (from final report): • nominal behaviour of the launcher up to H0 + 36 seconds; • failure of the back-up Inertial Reference System (SRI) followed immediately by failure of the active SRI;

  8. Building Dependable Software … • Quality • Safety • Correctness • Reliability • Security

  9. Safety is a Distinct Property • Safety is a distinct part of the interlocking puzzle of how to build dependable software • A system can be “correct” and “reliable” and yet unsafe! • Improved software process alone does not mean a safer system • Note: These can be contentious claims, even among safety engineers.

  10. Safety is … avoiding mishaps!

  11. Software: Meet System Safety

  12. “Is it Safe”? Christian Szell: Is it safe? Babe: Yes, it's safe, it's very safe, it's so safe you wouldn't believe it. - Marathon Man 1976

  13. System Safety • “System Safety” is a systematic approach to safety primarily developed in the US for the aerospace and defense industries • Spreading to other industries, e.g., health care • Focus on managing system hazards • E.g., FDA Quality System Regulation recommends “risk analysis” (A.K.A. hazard analysis)

  14. System Safety • Hazard ID • Hazard Analysis • Hazard Mitigation • Risk Assessment • Safety Verification
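The five activities on slide 14 are commonly tracked together in a hazard log. The sketch below is a minimal, hypothetical Python version: the record fields, the four-level severity and likelihood scales, and the product-based risk index are all assumptions for illustration (real risk-assessment schemes typically map severity and likelihood through a lookup matrix rather than multiplying them).

```python
from dataclasses import dataclass, field

# Hypothetical ordinal scales (assumption): higher numbers mean worse.
SEVERITY = {"negligible": 1, "marginal": 2, "critical": 3, "catastrophic": 4}
LIKELIHOOD = {"improbable": 1, "remote": 2, "occasional": 3, "frequent": 4}


@dataclass
class HazardRecord:
    hazard_id: str                                        # Hazard ID
    description: str
    causes: list[str]                                     # Hazard Analysis: identified causes
    severity: str                                         # key into SEVERITY
    likelihood: str                                       # key into LIKELIHOOD
    mitigations: list[str] = field(default_factory=list)  # Hazard Mitigation
    verified: bool = False                                # Safety Verification completed?

    def risk_index(self) -> int:
        # Risk Assessment: a simple product of the two scales (an assumption).
        return SEVERITY[self.severity] * LIKELIHOOD[self.likelihood]

    def is_open(self) -> bool:
        # A hazard stays open until it has a mitigation and that mitigation is verified.
        return not (self.mitigations and self.verified)


def open_hazards_by_risk(log: list[HazardRecord]) -> list[HazardRecord]:
    """Return the unresolved hazards, highest risk index first."""
    return sorted((h for h in log if h.is_open()), key=lambda h: h.risk_index(), reverse=True)
```

For example, given a mitigated and verified brake-failure hazard plus two hazards that are still open, only the open ones are returned, ordered by risk index.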

  15. Hazard • A hazard is the system’s potential contribution to a mishap • E.g., brake failure, engine overheating • Key is understanding the system environment

  16. Hazards and Mishaps • [Diagram: hazard causes within the System lead to a hazard; the hazard, in the system's Environment, can lead to a mishap]

  17. Ariane 501: SRI Bug? • Uncaught exception from converting a 64-bit floating-point value to a 16-bit signed integer • Caused by a high value of BH (Horizontal Bias) • Programming 101! • Conversion check deliberately removed for performance reasons • SRI reused from Ariane 4 • Check not required for Ariane 4 trajectory
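The "Programming 101" point can be sketched in Python. This is not the SRI code (which was written in Ada); the function names and the modelling of the operand error as a Python exception are assumptions for illustration. The unprotected variant raises on any value outside the 16-bit signed range, mirroring the uncaught exception; the protected variant saturates and reports the overflow so the caller can decide how to degrade.

```python
INT16_MIN, INT16_MAX = -32768, 32767


class OperandError(Exception):
    """Hypothetical stand-in for the error raised on an out-of-range conversion."""


def to_int16_unprotected(value: float) -> int:
    # No handler here: an out-of-range value escapes as an uncaught exception.
    result = round(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OperandError(f"{value} does not fit in a 16-bit signed integer")
    return result


def to_int16_protected(value: float) -> tuple[int, bool]:
    # Protected variant: clamp to the representable range and flag the overflow.
    result = round(value)
    clamped = max(INT16_MIN, min(INT16_MAX, result))
    return clamped, clamped != result
```

Whether a given input overflows depends entirely on the trajectory that produces it: the same conversion is harmless inside one flight envelope and fatal outside it, which is the point of slide 18.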

  18. Safety is a System Property • SRI worked exactly as specified – for Ariane 4! • Ariane 5 trajectory different from Ariane 4 • SRI spec did NOT include Ariane 5 trajectory data • SRI NOT tested with Ariane 5 trajectory data • “Safety” cannot be understood without knowing the operational environment • FDA “use-related” vs “device failure” hazards • E.g., TCAS collision in Germany

  19. When Software Met Safety • … there was a definite risk in assuming that critical equipment such as the SRI had been validated by qualification on its own, or by previous use on Ariane 4. • ARIANE 5 Flight 501 Failure Report

  20. System Safety: Meet Software

  21. In the beginning (or Europe) …* • Mechanical systems with well understood designs • Hazards caused by component failure from random hardware faults • Mitigation through integrity and redundancy * Myth, but there is underlying truth in all good myths

  22. Fault Tree Analysis • Steering Fails = Steering Assembly Fails OR Driver Error • Steering Assembly Fails (intermediate event) = Steering Wheel Fails OR Drive Shaft Fails OR Steering Control Software Fails

  23. Is Software Another Component? • What is the probability that the steering control software fails? • If software is just another component: • Software cannot wear out or break down like a mechanical component • Only “fault” is a programming bug • Assuming programmers do their job, failure rate should be zero* *Paraphrased from talk by a system safety engineer
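The argument on slide 23 can be made concrete with the usual OR-gate formula for independent input events, P = 1 - prod(1 - p_i). In this sketch the function names are hypothetical and independence is an assumption of the formula, not a property of real systems. Setting the software probability to zero makes that node vanish from the top event, which is exactly the move the slide questions.

```python
from math import prod


def or_gate(*probabilities: float) -> float:
    """Probability that at least one input event occurs, assuming independent inputs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def p_steering_fails(p_wheel: float, p_shaft: float,
                     p_software: float, p_driver: float) -> float:
    # Intermediate event: the steering assembly fails if any of its parts fails.
    p_assembly = or_gate(p_wheel, p_shaft, p_software)
    # Top event: steering fails if the assembly fails OR the driver errs.
    return or_gate(p_assembly, p_driver)
```

For instance, `p_steering_fails(1e-4, 1e-4, 0.0, 1e-3)` gives the same result as a tree with no software node at all, so the software's possible contribution to the hazard is invisible to this calculation.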

  24. Software Revealed • Steering Fails = Steering Assembly Fails OR Driver Error • Steering Assembly Fails (intermediate event) = Steering Wheel Fails OR Drive Shaft Fails OR Steering Control Software Fails

  25. The Software Werewolf Of all the monsters that fill the nightmares of our folklore, none terrify more than werewolves, because they transform unexpectedly from the familiar into horrors … The familiar software project, at least as seen by the nontechnical manager, has something of this character … • Frederick P. Brooks, Jr. from No Silver Bullet: Essence and Accidents of Software Engineering

  26. Ariane 501: Safety in Numbers? • In response to the “fault”, the primary SRI was deliberately shut down • Attempt made to switch to the backup SRI • Typical strategy in the face of random failures • However, BOTH SRIs shut down! • “Fault” due to the same design in both SRIs • Exception in a non-essential component
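The redundancy point can be shown with two identical channels. The sketch is hypothetical (the channel function, the range check and the input values are illustrative assumptions): a backup helps against a random failure in one channel, but when both channels run the same design, an input that trips the design fault trips both.

```python
from typing import Callable, Optional

INT16_MIN, INT16_MAX = -32768, 32767


def sri_channel(horizontal_bias: float) -> int:
    """Hypothetical channel logic; primary and backup run this same code."""
    value = round(horizontal_bias)
    if not INT16_MIN <= value <= INT16_MAX:
        raise OverflowError("horizontal bias out of 16-bit range")
    return value


def redundant_read(primary: Callable[[float], int],
                   backup: Callable[[float], int],
                   horizontal_bias: float) -> Optional[int]:
    """Try the primary; on failure switch to the backup. None means both failed."""
    for channel in (primary, backup):
        try:
            return channel(horizontal_bias)
        except OverflowError:
            continue  # this channel failed; try the next one
    return None


def p_pair_fails(p_channel: float, common_mode: bool) -> float:
    """Probability that both channels fail: p**2 if failures are independent,
    but simply p if both share the same design fault (common mode)."""
    return p_channel if common_mode else p_channel ** 2
```

With an in-range input, `redundant_read(sri_channel, sri_channel, 100.0)` returns 100; with `1e5` it returns None. Switching channels cannot help, because the "fault" lives in the design both channels share rather than in a random component failure.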

  27. Safety is an Emergent Property • Software safety is not about “faults” • Many potential “faults” but not all created equal – most have no impact on safety • “Correct” behaviour can contribute to the hazard! • Hazards can emerge from complex interactions between “correct” components

  28. When Safety Met Software • An underlying theme in the development of Ariane 5 is the bias towards the mitigation of random failure. • The Board wishes to point out that software is an expression of a highly detailed design and does not fail in the same sense as a mechanical system. • ARIANE 5 Flight 501 Failure Report

  29. Verifying Software Safety

  30. Software and Safety Process • Safety activities: Hazards; Hazard ID, Analysis and Mitigation; Safety Verification • Software development: Requirements; Design; Source Code; Verification

  31. Limits of Testing Program testing can be used to show the presence of bugs, but never to show their absence • E. Dijkstra in Structured Programming

  32. Hazard-Driven Testing • Focus on hazard – force it to occur • Consider: • Hazard risk (“risk-based testing”) • Mishap scenarios • Hazard causes identified during hazard analysis • Problem reports/issues with safety implications • See Jeffrey J. Joyce and Ken Wong, Hazard-driven Testing of Safety-Related Software
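A minimal sketch of hazard-driven testing, using the engine-overheating hazard from slide 15. Everything here is hypothetical: the fan controller, the threshold and the test cases are illustrative assumptions, and the hazard is modelled as "engine hot while the fan is off". Nominal-profile cases pass; cases derived from a hazard cause (a failed sensor reporting NaN) force the hazard and expose it, even though the controller follows its simple "fan on above threshold" rule.

```python
import math
from typing import Callable

FAN_ON_THRESHOLD_C = 95.0  # hypothetical threshold in degrees Celsius


def fan_command(sensor_temp_c: float) -> bool:
    """Hypothetical controller rule: turn the cooling fan on above the threshold."""
    return sensor_temp_c > FAN_ON_THRESHOLD_C


def fan_command_fail_safe(sensor_temp_c: float) -> bool:
    """Mitigated version: an invalid (NaN) reading is treated as hot, so the fan runs."""
    if math.isnan(sensor_temp_c):
        return True
    return sensor_temp_c > FAN_ON_THRESHOLD_C


def hazard_present(true_temp_c: float, fan_on: bool) -> bool:
    """The hazard: the engine is actually hot but the fan is off."""
    return true_temp_c > FAN_ON_THRESHOLD_C and not fan_on


def run_cases(controller: Callable[[float], bool],
              cases: list[tuple[float, float]]) -> list[float]:
    """Each case is (true engine temperature, sensor reading).
    Returns the true temperatures of the cases that produced the hazard."""
    return [true_t for true_t, reading in cases
            if hazard_present(true_t, controller(reading))]


# Nominal-profile cases: the sensor reports the true temperature.
NOMINAL_CASES = [(60.0, 60.0), (100.0, 100.0), (120.0, 120.0)]
# Hazard-driven cases, derived from the hazard cause "sensor fails while the engine is hot".
HAZARD_CASES = [(110.0, math.nan), (130.0, math.nan)]
```

Here `run_cases(fan_command, NOMINAL_CASES)` finds nothing, while `run_cases(fan_command, HAZARD_CASES)` reports both hot-engine cases, because `nan > 95.0` is False and the fan stays off. The mitigated controller treats an invalid reading as hot, so the same hazard-driven cases no longer produce the hazard.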

  33. Summary and Conclusions • Safety is a distinct property • Safety is a system property • Operational and development environment factors • Safety is an emergent property • Hazards can emerge from complex interactions between “correct” components

  34. Safety and Software: Happy Together?

  35. References* • ARIANE 5 Flight 501 Failure Report by the Inquiry Board, Paris, July 1996 • Frederick P. Brooks, Jr., No Silver Bullet: Essence and Accidents of Software Engineering, Computer Magazine, April 1987 • Jeffrey J. Joyce and Ken Wong, Hazard-driven Testing of Safety-Related Software, 21st International System Safety Conference, Ottawa, Ontario, August 4-8, 2003 *All available on-line
