Engineering a Safer World

caelan

Presentation Transcript


  1. Engineering a Safer World

  2. Traditional Approach to Safety
     • Traditionally view safety as a failure problem
        • Chain of random, directly related failure events leads to loss
        • Establish barriers between events or try to prevent individual component failures, e.g., redundancy, overdesign, safety margins, punishment and training for operators, defense in depth
     • Analysis techniques
        • Focus on probability of component failures and combinations of component failures
        • Where do they get the probabilities?
           • Historical failure rates
           • Make up numbers for human error and software error, or ignore these in the analysis
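The probability arithmetic behind these traditional techniques can be sketched as follows. This is a minimal illustration, assuming independent, random component failures (exactly the assumption the later slides question); the function names and the pump/valve numbers are hypothetical.

```python
from math import prod

def series_failure(probs):
    """System fails if ANY component fails (no redundancy).
    Assumes independent, random component failures."""
    return 1 - prod(1 - p for p in probs)

def parallel_failure(probs):
    """Redundant components: system fails only if ALL of them fail.
    Assumes independent, random component failures."""
    return prod(probs)

# Hypothetical per-demand failure probabilities, for illustration only.
pump_p, valve_p = 1e-3, 1e-3
redundant_pumps_p = parallel_failure([pump_p, pump_p])  # 1e-6 only if failures are independent
system_p = series_failure([redundant_pumps_p, valve_p])
print(f"P(system failure) = {system_p:.3e}")
```

Note what this arithmetic cannot express: a common-cause event that disables both pumps at once, or an accident in which no component fails at all.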

  3. Chain-of-events example

  4. Confusing Safety and Reliability

  5. It’s only a random failure, sir! It will never happen again.

  6. Limitations of Traditional Approach
     • Systems are becoming more complex
        • Accidents often result from interactions among components, not just component failures
        • Too complex to anticipate all potential interactions
           • By designers
           • By operators
        • Indirect and non-linear interactions
        • Can no longer exhaustively test to remove design errors
     • Omits or oversimplifies important factors
        • Human error
        • New technology, particularly software
        • Culture and management
        • Evolution and adaptation

  7. Accident with No Component Failures

  8. Types of Accidents
     • Component Failure Accidents
        • Single or multiple component failures
        • Usually assume random failure
     • Component Interaction Accidents
        • Arise in interactions among components
        • Complexity is reaching the point where we cannot anticipate or guard against all potential interactions
        • Exacerbated by the introduction of computers and software

  9. Interactive Complexity
     • Arises in interactions among system components
     • Software allows us to build highly coupled and interactively complex systems
        • Coupling causes interdependence
        • Increases number of interfaces and potential interactions
     • Too complex to anticipate all potential interactions
        • By designers
        • By operators
     • May lead to accidents even when there are no individual component failures
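The claim that coupling increases the number of interfaces and potential interactions can be made concrete with simple counting. This sketch assumes, as an upper bound, that any component could in principle interact with any other; real architectures restrict this, but software removes many of the physical limits that used to. The function names are illustrative.

```python
from math import comb

def pairwise_interfaces(n: int) -> int:
    """Possible pairwise interfaces among n fully coupled components."""
    return comb(n, 2)

def possible_interaction_sets(n: int) -> int:
    """Subsets of two or more components that could jointly interact."""
    return 2 ** n - n - 1

for n in (5, 10, 20):
    print(n, pairwise_interfaces(n), possible_interaction_sets(n))
```

Pairwise interfaces grow quadratically, and multi-component interactions grow exponentially, which is why neither designers nor operators can anticipate them all.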

  10. Non-Linear Complexity
     • Cause and effect not related in an obvious way
     • Systemic factors in accidents, e.g., safety culture, work environment, production pressures, etc.
     • Our safety engineering techniques assume linearity
        • Systemic factors affect events in non-linear and indirect ways

  11. Dynamic Complexity
     • Related to changes over time
     • Systems are not static, but we often assume they are
     • Systems migrate toward states of high risk under competitive and financial pressures [Rasmussen]
     • Need to control and identify unsafe changes

  12. Software-Related Accidents
     • Are usually caused by flawed requirements
        • Incomplete or wrong assumptions about operation of controlled system or required operation of computer
        • Unhandled controlled-system states and environmental conditions
     • Merely trying to get the software “correct” or to make it reliable will not make it safer under these conditions.

  13. Software-Related Accidents (2)
     • Software may be highly reliable and “correct” and still be unsafe:
        • Correctly implements requirements, but specified behavior is unsafe from a system perspective
        • Requirements do not specify some particular behavior required for system safety (incomplete)
        • Software has unintended (and unsafe) behavior beyond what is specified in requirements
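A small example of the first case (software that correctly implements its requirement yet is unsafe at the system level) follows. It is loosely modeled on the Mars Polar Lander loss mentioned later in this deck; the requirement text, the altitude guard, and all numbers are hypothetical simplifications, not the actual flight software.

```python
def cutoff_as_specified(touchdown_signal: bool) -> bool:
    """Hypothetical requirement, implemented exactly:
    cut the descent engine whenever the touchdown sensor reports contact."""
    return touchdown_signal

def cutoff_with_constraint(touchdown_signal: bool, radar_altitude_m: float,
                           max_plausible_altitude_m: float = 5.0) -> bool:
    """Same logic plus a hypothetical system safety constraint: do not cut the
    engine while the vehicle is demonstrably still well above the surface."""
    return touchdown_signal and radar_altitude_m <= max_plausible_altitude_m

# A spurious touchdown signal while still high above the surface (illustrative numbers).
signal, altitude_m = True, 40.0
print(cutoff_as_specified(signal))                  # True: unsafe cutoff, yet "correct"
print(cutoff_with_constraint(signal, altitude_m))   # False: the constraint blocks it
```

Testing the first function against its requirement would pass every time; the flaw lies in the requirement's missing assumption about the controlled system.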

  14. Event-based Thinking vs. Systems Thinking

  15. STAMP: System-Theoretic Accident Model and Processes
     • Based on Systems Theory (vs. Reliability Theory)

  16. Applying Systems Thinking to Safety
     • Accidents involve a complex, dynamic “process”
        • Not simply chains of failure events
        • Arise in interactions among humans, machines and the environment
     • Treat safety as a dynamic control problem
        • Safety requires enforcing a set of constraints on system behavior
        • Accidents occur when interactions among system components violate those constraints
        • Safety becomes a control problem rather than just a reliability problem
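One way to picture “safety as enforcing constraints on system behavior” is to express each constraint as a predicate over the whole system state, not over any single component. The tank process, the two constraints, and the thresholds below are hypothetical, chosen only to show a violation arising from interaction with no component failure.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class TankState:
    level: float        # fraction of capacity (hypothetical process)
    inflow_open: bool
    outflow_open: bool

# Hypothetical system-level safety constraints, each a predicate over the whole state.
CONSTRAINTS: dict[str, Callable[[TankState], bool]] = {
    "level below 90% of capacity": lambda s: s.level < 0.9,
    "no inflow above 50% level unless outflow is open":
        lambda s: not (s.inflow_open and s.level > 0.5) or s.outflow_open,
}

def violated(state: TankState) -> list[str]:
    """Return the names of all constraints the state violates."""
    return [name for name, holds in CONSTRAINTS.items() if not holds(state)]

# Each controller follows its own specification: a fill controller keeps the
# inflow open, while a maintenance procedure closes the outflow. Neither fails,
# but together they produce a state that violates a system-level constraint.
state = TankState(level=0.6, inflow_open=True, outflow_open=False)
print(violated(state))
```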

  17. Safety as a Dynamic Control Problem
     • Examples
        • O-ring did not control propellant gas release by sealing the gap in the field joint of the Challenger Space Shuttle
        • Software did not adequately control descent speed of Mars Polar Lander
        • At Texas City, did not control the level of liquids in the ISOM tower
        • In Deepwater Horizon (DWH), did not control the pressure in the well
        • Financial system did not adequately control the use of financial instruments

  18. Safety as a Dynamic Control Problem (2)
     • Most major accidents arise from a slow migration of the entire system toward a state of high risk
     • Need to control and detect this migration
     • A change in emphasis:
          “prevent failures”
                  ↓
          “enforce safety constraints on system behavior”
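Detecting slow migration means watching trends rather than waiting for a failure. The sketch below flags a sustained decline in a hypothetical safety-margin indicator using a least-squares slope; the indicator, the sampling, and the `max_decline` threshold are all assumptions for illustration, not a method prescribed by STAMP.

```python
def trend(values):
    """Least-squares slope of a series sampled at equal intervals (needs 2+ points)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

def migrating_toward_risk(margins, max_decline=-0.005):
    """Flag a sustained decline in a safety margin even though every
    sample is still above zero (no 'failure' has occurred yet)."""
    return min(margins) > 0 and trend(margins) < max_decline

# Hypothetical periodic safety margins (e.g., spare capacity); no single step looks alarming.
margins = [0.30, 0.29, 0.27, 0.26, 0.24, 0.22]
print(round(trend(margins), 4), migrating_toward_risk(margins))
```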

  19. Example Safety Control Structure

  20. Qi Hommes, 2012

  21. Safety as a Control Problem (3)
     • Goal: Design an effective control structure that eliminates or reduces adverse events.
        • Need clear definition of expectations, responsibilities, authority, and accountability at all levels of safety control structure
        • Entire control structure must together enforce the system safety property (constraints)
           • Physical design (inherent safety)
           • Operations
           • Management
           • Social interactions and culture

  22. Role of Process Models in Control
     • Controllers use a process model to determine control actions
     • Accidents often occur when the process model is incorrect
     • Four types of hazardous control actions:
        • Control commands required for safety are not given
        • Unsafe ones are given
        • Potentially safe commands given too early or too late
        • Control stops too soon or is applied too long
     [Figure: control loop in which a Controller (Control Algorithm, Process Model) issues Control Actions to the Controlled Process and receives Feedback from it] (Leveson, 2003); (Leveson, 2011)
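The control loop in this figure can be sketched in code: the control algorithm decides from the process model, and only feedback keeps that model in step with the real process. The tank process, the threshold, and the class and method names are hypothetical; the four enum values simply restate the slide's hazardous control action types.

```python
from dataclasses import dataclass, field
from enum import Enum

class UnsafeControlAction(Enum):
    """The four types of hazardous control actions from the slide."""
    NOT_GIVEN = "control command required for safety not given"
    UNSAFE_GIVEN = "unsafe control command given"
    WRONG_TIMING = "potentially safe command given too early or too late"
    WRONG_DURATION = "control stopped too soon or applied too long"

@dataclass
class Controller:
    # Process model: the controller's beliefs about the controlled process.
    process_model: dict = field(default_factory=lambda: {"level": 0.0})
    high_level: float = 0.8   # hypothetical threshold

    def receive_feedback(self, feedback: dict) -> None:
        # Feedback is the only way the model tracks the real process.
        self.process_model.update(feedback)

    def control_action(self) -> str:
        # Control algorithm: decides from the MODEL, never from the process itself.
        return "open_inflow" if self.process_model["level"] < self.high_level else "close_inflow"

actual_level = 0.95                      # the real process is nearly full
ctrl = Controller()
ctrl.receive_feedback({"level": 0.4})    # last feedback received; later updates were lost
action = ctrl.control_action()
if action == "open_inflow" and actual_level >= ctrl.high_level:
    print(UnsafeControlAction.UNSAFE_GIVEN.value)
```

Nothing in the controller “fails” here: it executes its algorithm correctly on a process model that no longer matches reality.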

  23. [Figure: the STAMP framework. Foundation: STAMP: Theoretical Causality Model. Tools: Accident/Event Analysis (CAST), Hazard Analysis (STPA), Specification Tools (SpecTRM), Organizational/Cultural Risk Analysis, Identifying Leading Indicators. Processes: System Engineering (e.g., Specification, Safety-Guided Design, Design Principles), Risk Management, Management Principles/Organizational Design, Operations, Regulation]

  24. Learning from Events
     • CAST: Causal Analysis based on System Theory
     • Goal: more complete causal analysis of accidents, incidents, and adverse events

  25. Facts about Accidents
     • Almost never have single causes
        • “Root cause seduction”
     • Accidents are complex processes
     • Usually involve flaws in
        • Engineered equipment
        • Operator behavior
        • Management decision making
        • Safety culture
        • Regulatory oversight

  26. Root Cause Seduction
     • Assuming there is a root cause gives us an illusion of control.
        • Usually focus on operator error or technical failures
        • Ignore systemic and management factors
        • Leads to a sophisticated “whack a mole” game
           • Fix symptoms but not process that led to those symptoms
           • In continual fire-fighting mode
           • Having the same accident over and over

  27. Three Levels of Analysis
     • What (events)
        • e.g., explosion
     • Who and how (conditions)
        • e.g., bad valve design, operator did not notice something
     • Why (systemic factors)
        • e.g., production pressures, cost concerns, flaws in design process, flaws in reporting process, etc.
        • Why was safety control structure ineffective in preventing the loss?

  28. Goals for an Accident Analysis Technique
     • Minimize hindsight bias
     • Provide a framework or process to assist in understanding entire accident process and identifying systemic factors
     • Get away from blame (“who”) and shift focus to “why” and how to prevent in the future
     • Goal is to determine
        • Why people behaved the way they did
        • Weaknesses in the safety control structure that allowed the loss to occur

  29. Hazard Analysis
     • “Investigating an accident before it occurs”
     • Identify potential causal scenarios and try to eliminate them
     • Must be based on some model of how and why accidents occur
     • STPA (System-Theoretic Process Analysis)
        • Based on STAMP
        • Assumes accidents are more complex processes than just chains of component failure events

  30. STPA (System-Theoretic Process Analysis)
     • A top-down, system engineering technique
     • Identifies safety constraints (system and component safety requirements)
     • Identifies scenarios leading to violation of safety constraints; use results to design or redesign system to be safer
     • Can be used on technical design and organizational design
     • Supports a safety-driven design process where
        • Hazard analysis influences and shapes early design decisions
        • Hazard analysis iterated and refined as design evolves

  31. Steps in STPA
     • Establish fundamentals
        • Define “accident” for your system
        • Define hazards
        • Rewrite hazards as constraints on system design
        • Draw preliminary (high-level) safety control structure
     • Identify potentially unsafe control actions (high-level safety requirements and constraints)
     • Determine how each potentially hazardous control action could occur

  32. Steps in STPA
     • Establish foundation for analysis
        • Define “accident” for your system
        • Define hazards
        • Rewrite hazards as constraints on system design
        • Draw preliminary (high-level) safety control structure
     • Step 1: Identify potentially unsafe control actions (high-level safety requirements and constraints)
     • Step 2: Determine how each potentially hazardous control action could occur

  33. Identifying Accidents and Hazards
     • Accident (Loss): An undesired or unplanned event that results in a loss, including a loss of human life or human injury, property damage, environmental pollution, mission loss, financial loss, etc.
     • Hazard: A system state or set of conditions that, together with a worst-case set of environmental conditions, will lead to an accident (loss)

  35. Identify High-Level Safety Constraints (Requirements)

  36. Safety Constraints vs. Safety Requirements
     • Design constraints:
        • ACC (adaptive cruise control) must not violate separation requirements with object ahead
        • ACC must not brake too abruptly
     • Design requirements:
        • ACC shall maintain a TBD amount of distance between the vehicle and the object in front when engaged
        • ACC shall limit vehicle deceleration to no more than TBC m/s²
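Turning constraints into quantitative requirements exposes how they interact. The sketch below fills the slide's TBD distance and TBC deceleration with hypothetical values (not real ACC parameters) and uses simple kinematics, assuming the lead vehicle holds a constant speed, to find when keeping the distance would require braking harder than allowed.

```python
# Hypothetical stand-ins for the slide's TBD/TBC values; not real ACC parameters.
MIN_GAP_M = 30.0
MAX_DECEL_MPS2 = 3.0

def required_decel(closing_speed_mps: float, gap_m: float) -> float:
    """Constant deceleration needed to cancel the closing speed before the gap
    shrinks below MIN_GAP_M (simple kinematics; lead vehicle at constant speed)."""
    if closing_speed_mps <= 0:
        return 0.0                  # not closing: no braking needed
    usable_m = gap_m - MIN_GAP_M
    if usable_m <= 0:
        return float("inf")         # distance requirement already violated
    return closing_speed_mps ** 2 / (2 * usable_m)

def requirements_conflict(closing_speed_mps: float, gap_m: float) -> bool:
    """True when keeping the distance requirement would break the deceleration limit."""
    return required_decel(closing_speed_mps, gap_m) > MAX_DECEL_MPS2

print(required_decel(10.0, 50.0), requirements_conflict(10.0, 50.0))  # 2.5 False
print(required_decel(15.0, 50.0), requirements_conflict(15.0, 50.0))  # 5.625 True
```

When both requirements cannot hold at once, the analysis has found a case the design must handle explicitly rather than leave to chance.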

  37. In-Class Example: In-Trail Procedure (ITP)
     • A new passing procedure for oceanic flights

  38. In-Trail Procedure (ITP)
     • Enables aircraft to achieve flight level (FL) changes on a more frequent basis
     • Designed for oceanic and remote airspaces not covered by radar
     • Permits climb and descent using new reduced longitudinal separation standards
     • Potential benefits
        • Reduced fuel burn and CO₂ emissions via more opportunities to reach the optimum FL or an FL with more favorable winds
        • Increased safety via more opportunities to leave a turbulent FL
     • But standard separation requirements are not met during the maneuver

  39. ITP Procedure – Step by Step
     Flight Crew
        • Check that ITP criteria are met.
        • If ITP is possible, request ATC clearance.
     Air Traffic Controller
        • Check that there are no blocking aircraft other than Reference Aircraft in the ITP request.
        • Check that ITP request is applicable (i.e., standard request not sufficient) and compliant with ITP phraseology.
        • Check that ITP criteria are met.
        • If all checks are positive, issue ITP clearance.
     Flight Crew
        • When ITP clearance is received, check that ITP criteria are still met.
        • If ITP criteria are still met, accept ITP clearance via CPDLC.
        • Execute ITP clearance without delay.
        • Report when established at the cleared FL.
     • Involves multiple aircraft, crew, communications (ADS-B, GPS), ATC
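The division of checks between ATC and the flight crew can be sketched as two functions. The `ITPRequest` fields are hypothetical stand-ins for the slide's checks (the ITP criteria themselves are not detailed in this section), and the crew's handling when criteria are no longer met is an assumption, since the slide does not say.

```python
from dataclasses import dataclass

@dataclass
class ITPRequest:
    # Hypothetical fields standing in for the checks listed on the slide.
    criteria_met: bool                 # ITP criteria (details not given in this section)
    aircraft_in_between: list[str]     # aircraft ATC sees between current and requested FL
    reference_aircraft: list[str]      # Reference Aircraft named in the request
    standard_request_sufficient: bool
    phraseology_compliant: bool

def atc_checks(req: ITPRequest) -> bool:
    """ATC side: all checks must be positive to issue an ITP clearance."""
    no_blocking = all(a in req.reference_aircraft for a in req.aircraft_in_between)
    applicable = not req.standard_request_sufficient and req.phraseology_compliant
    return no_blocking and applicable and req.criteria_met

def crew_on_clearance(criteria_still_met: bool) -> str:
    """Flight crew side: re-check the criteria, since conditions may have
    changed between the request and the clearance."""
    if criteria_still_met:
        return "accept via CPDLC, execute without delay, report at cleared FL"
    return "do not accept"   # assumption: this branch is not detailed on the slide

req = ITPRequest(True, ["REF1"], ["REF1"], False, True)
if atc_checks(req):
    print(crew_on_clearance(criteria_still_met=True))
```

The crew's second check exists because the clearance arrives after a communication delay, so the crew's and ATC's process models may both be out of date.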

  40. Accident and Hazard Definition for ITP
     • Accident: Two aircraft collide
     • Hazard: ?

  41. Accident and Hazard Definition for ITP
     • Accident: Two aircraft collide
     • Hazard: Two aircraft violate minimum separation requirement
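The hazard is a system state that can be checked directly, before any collision occurs. In this sketch the track geometry and both sets of minima are hypothetical illustration values, not actual ITP or ICAO separation standards; it shows that whether a maneuver is hazardous depends on which minimum the hazard definition refers to.

```python
from dataclasses import dataclass

@dataclass
class Aircraft:
    along_track_nm: float   # position along a shared oceanic track
    altitude_ft: float

def separation_lost(a: Aircraft, b: Aircraft, min_long_nm: float, min_vert_ft: float) -> bool:
    """Hazard check: the pair is inside BOTH the longitudinal and vertical minima at once."""
    return (abs(a.along_track_nm - b.along_track_nm) < min_long_nm
            and abs(a.altitude_ft - b.altitude_ft) < min_vert_ft)

# Hypothetical geometry and minima, for illustration only (not actual ITP/ICAO values).
reference = Aircraft(along_track_nm=8.0, altitude_ft=35000)
climb = [Aircraft(0.0, 34000 + 500 * t) for t in range(5)]  # ITP aircraft climbing through the reference's level
print([separation_lost(p, reference, 10.0, 1000) for p in climb])  # [False, True, True, True, False]
print([separation_lost(p, reference, 5.0, 1000) for p in climb])   # all False
```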

  42. Accident with No Component Failures

  43. Batch Reactor In-Class Exercise
     • What is the accident?
     • What is the system-level hazard (associated with that accident)?
     • What is the high-level system safety requirement (safety constraint)?

  44. Steps in STPA
     • Establish foundation for analysis
        • Define “accident” for your system
        • Define hazards
        • Rewrite hazards as constraints on system design
        • Draw preliminary (high-level) functional control structure
     • Step 1: Identify potentially unsafe control actions (high-level safety requirements and constraints)
     • Step 2: Determine how each potentially hazardous control action could occur
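Step 1 can be organized as a worksheet: every control action is paired with each of the four hazardous control action types from slide 22, and the analyst decides for each row whether, and in what context, it could lead to a hazard. The ITP control actions below come from slide 50; the table layout and function name are a hypothetical sketch, not a prescribed STPA format.

```python
from itertools import product

# The four hazardous control action types from slide 22.
UCA_TYPES = (
    "not given",
    "unsafe one given",
    "given too early or too late",
    "stopped too soon or applied too long",
)

# Control actions from the ITP control structure (slide 50).
CONTROL_ACTIONS = {
    "ATC": ["clearance to pass (execute ITP)"],
    "Pilot": ["execute ITP maneuver"],
}

def uca_worksheet(actions=None):
    """One row per (controller, control action, UCA type). Each row is a question
    for the analyst: could this lead to the hazard, and in what context?"""
    actions = actions or CONTROL_ACTIONS
    return [(ctrl, act, kind)
            for ctrl, acts in actions.items()
            for act, kind in product(acts, UCA_TYPES)]

for row in uca_worksheet():
    print(" | ".join(row))
```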

  45. Draw the Functional Control Structure
     • Identify major components and controllers (HINT: Start at very high level)
     • Label control and feedback arrows
     • Create the preliminary process models

  46. ITP High-Level Control Structure
     • What are the major components and controllers of the system?

  47. ITP High-Level Control Structure
     • What are the major components and controllers of the system? ATC, pilot, aircraft
     • Who controls who or what?

  48. ITP High-Level Control Structure
     • What are the major components and controllers of the system? ATC, pilot, aircraft
     • Who controls who or what?
     [Figure: ATC → Pilot → Aircraft]

  49. ITP High-Level Control Structure (2)
     • What commands are sent and feedback provided?
     [Figure: ATC → Pilot → Aircraft]

  50. ITP High-Level Control Structure (2)
     • What commands are sent and feedback provided?
     [Figure: ATC → Pilot: clearance to pass (to execute ITP); Pilot → ATC: requests, acknowledgements; Pilot → Aircraft: execute ITP maneuver; Aircraft → Pilot: A/C status, position, etc.]
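A functional control structure is essentially a labeled directed graph, which makes simple consistency checks easy to automate. The edges below are transcribed from the ITP figure; treating “requests, acknowledgements” as feedback and the two helper functions are assumptions made for this sketch.

```python
# Edges of the ITP control structure: (source, target, kind, label).
EDGES = [
    ("ATC", "Pilot", "control", "clearance to pass (execute ITP)"),
    ("Pilot", "ATC", "feedback", "requests, acknowledgements"),
    ("Pilot", "Aircraft", "control", "execute ITP maneuver"),
    ("Aircraft", "Pilot", "feedback", "A/C status, position, etc."),
]

def controllers_of(node, edges=EDGES):
    """Components that send control actions to `node`."""
    return sorted({src for src, dst, kind, _ in edges if dst == node and kind == "control"})

def missing_feedback(edges=EDGES):
    """Control links with no feedback path in the opposite direction; a controller
    without feedback cannot keep its process model up to date."""
    control = {(s, d) for s, d, k, _ in edges if k == "control"}
    feedback = {(s, d) for s, d, k, _ in edges if k == "feedback"}
    return sorted((s, d) for s, d in control if (d, s) not in feedback)

print(controllers_of("Aircraft"), missing_feedback())
```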
