Human reliability analysis – challenges in modelling operational risk

Tim Bedford, Strathclyde Business School, University of Strathclyde

Objectives • Discuss modelling issues surrounding human reliability in operational risk • Consider how time dynamics can be incorporated, and the potential benefits and difficulties • Show how work done on safety is relevant to other operational risks

Example – Lambrigg Derailment • February 2007, Virgin train derails between Preston and Carlisle • 1 fatality, 22 hospitalised • Primary cause identified as a faulty set of points

Inquiry Findings • Deficiencies in the inspection and maintenance regime resulted in the points falling into disrepair. These deficiencies included: • A breakdown in the local management structure responsible for inspection and maintenance • The track patrolling regime’s systematic failure to inspect the area adequately • Quality standards not being communicated or executed in the proper manner • A lack of sample checking of the track to test inspection quality and arrangements

Inquiry Findings • The patrol scheduled for 18 February 2007 was not done • The QA regime did not identify failures in the reliability of inspection regimes, nor failures in application of best practice • Emergence of “Them & Us” culture • Management structure based on activity, not location

Inquiry Findings • High proportion of staff on temporary promotion • Culture of “Learned Helplessness” • Insufficient records on staff training and competencies • Staff unsure of their contracted responsibilities • Lapsed engineering qualifications

Is risk static? • Clearly not • Physical systems change through time, either through degradation or upgrading • Human systems change through time, as a result of operating procedures, staff ability, organisational changes etc

Should we be concerned about dynamically changing risks? • Maybe yes, maybe no…! • No – over time it averages out to the same as the “static” risk, so that the cumulative risk is the same. • Yes – If different risks change dynamically in a coupled way, then this can magnify the overall effect • Yes – If there is no intervention then the risk at the end of the period may be higher than acceptable (eg regulators often limit annual risk) • Yes – If understanding the dynamics helps you create new strategies to reduce risks
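The coupling point can be illustrated with a small numerical sketch. It assumes, purely for illustration, that an accident needs two barriers to fail at the same time and that each barrier's failure probability oscillates around the same long-run average; the function name, the base probability and the swing are hypothetical and do not come from the slides.

```python
import math

def mean_joint_risk(phase_shift, base=0.01, swing=0.8, steps=1000):
    """Time-averaged probability that two barriers fail together.

    Each barrier's failure probability oscillates around `base` over one
    cycle; `phase_shift` controls whether the two move together or apart.
    All parameter values are illustrative assumptions.
    """
    total = 0.0
    for i in range(steps):
        angle = 2 * math.pi * i / steps
        p1 = base * (1 + swing * math.sin(angle))
        p2 = base * (1 + swing * math.sin(angle + phase_shift))
        total += p1 * p2  # both barriers must fail at the same moment
    return total / steps

static_risk = 0.01 * 0.01               # product of the long-run averages
coupled_risk = mean_joint_risk(0.0)      # risks rise and fall together
opposed_risk = mean_joint_risk(math.pi)  # one rises while the other falls

print(f"static  : {static_risk:.3e}")
print(f"coupled : {coupled_risk:.3e}")
print(f"opposed : {opposed_risk:.3e}")
```

Each barrier has the same average failure probability in all three cases, yet the time-averaged joint risk is higher than the static product when the two move together and lower when they move in opposition.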

Dynamic versus static statistical • PRA models usually assume rates/probabilities are not time dependent [Figure: rate r against time t, showing the worst case, the statistical estimate with confidence bounds, and the achievable level]

Interacting dynamics of productivity and safety pressures D. L. Cooke and T. R. Rohleder, Learning from incidents: from normal accidents to high reliability, Sys Dyn Review

Feedback from incidents Examples: Accident Precursors; CIRAS D. L. Cooke and T. R. Rohleder, Learning from incidents: from normal accidents to high reliability, Sys Dyn Review

Human reliability models • In widespread use as part of Probabilistic Risk Analysis • Aim to “give a number” as well as an understanding of the source of risk. • Largely based on task analysis, breaking down human behaviour into steps (cognitive, decision, action etc). • Performance shaping factors influence probability of success, and may be common to more than one step • First generation methods • Eg THERP, HCR, HEART, JHEDI • Second generation methods • Eg ATHEANA, CREAM • Third generation • Monte Carlo based – linking cognition based models to technical system dynamics

THERP HRA Tree
• Task 1. Start confinement spray pumps: branches 1-3 (Start SP pump 1, 2, 3)
• Task 2. Start low pressure pumps: branches 4-6 (Start LH pump 1, 2, 3)
• Task 3. Open pressurizer safety valves (depressurization): branches 7-9 (FT open PS valve 1, 2, 3)
• Task 4. Monitor primary system temperature & pressure: branch 10, Table (20-10)1, HEP = 3E-3
• Basic HEP: selecting wrong control from functional group, Table (20-12)3 = 1E-3
• Stress: mod high, skilled, dynamic (heavy task load), THERP Table (20-16)5a = 5
• Dependency: action could start as early as 6 minutes, so dependency based on 10 minutes; Operator 2 = complete = 1; Shift Super. = high = 0.5
• Assumed all pumps are required
• HEP for each of tasks 1-3: [1E-3 * 5 (stress) * 0.5 (dependency)] * 3 branches = 7.5E-3
• Total HEP: [(7.5E-3) * 3] + 3E-3 = 2.55E-2
• EF from Table (20-20)7 = 5
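The arithmetic in the tree can be reproduced in a few lines. This is a sketch that mirrors the numbers on the slide only: the function and variable names are hypothetical, branch probabilities are simply added (a rare-event approximation), and reading the error factor as bounds of HEP/EF and HEP×EF is an assumption about how the EF is meant to be used.

```python
def task_hep(basic_hep, stress_factor, dependency_factor, n_branches):
    """HEP for one task: the modified branch HEP summed over its branches."""
    branch_hep = basic_hep * stress_factor * dependency_factor
    return branch_hep * n_branches  # rare-event approximation: add branches

# Values as read from the slide.
pump_or_valve_task = task_hep(
    basic_hep=1e-3,         # selecting wrong control, Table (20-12)3
    stress_factor=5,        # moderately high stress, Table (20-16)5a
    dependency_factor=0.5,  # high dependency on shift supervisor
    n_branches=3,           # three pumps or valves per task
)
monitoring_task = 3e-3      # Table (20-10)1

# Three tasks with identical structure plus the monitoring task.
total_hep = 3 * pump_or_valve_task + monitoring_task

# Assumption: the error factor gives multiplicative uncertainty bounds.
error_factor = 5
lower_bound = total_hep / error_factor
upper_bound = total_hep * error_factor

print(f"task HEP  = {pump_or_valve_task:.2e}")
print(f"total HEP = {total_hep:.2e}")
print(f"bounds    = {lower_bound:.2e} to {upper_bound:.2e}")
```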

THERP Data Summary Table

What drives the main risks? • The standard HRA models, while useful, do not appear to capture the main sources of risk • Accidents continue, and many (most?) are not due to random human failures • Models do give insight and guidance about risk reduction, including prioritization • Qualitative approaches such as normal accident theory and HRO do not give guidance about prioritization, but may give insights about strategies for risk reduction

Organisational failure: Reason’s Swiss Cheese model

Modelling for understanding, or for optimization? • Models are typically one of • Formative: inform system, organisation and process design, guiding management practice • Summative: used to support decisions on, e.g., adoption, licensing or maintenance, by modelling cost/benefit trade-offs • Qualitative HR modelling tends to be formative. • Quantitative HR modelling should be summative, but if it does not model the most significant system behaviour then most of its value may actually be formative (risk analysis rather than risk management)

Summative Modelling • Model building philosophy • Models appropriate to purpose • Cost-effective • Taking account of uncertainties • Models for DM should be able to include effects of intervention. • Hard and soft interventions possible • Hard example – employ extra staff member to increase capacity • Soft example – give employees performance feedback

Some dynamic approaches to HR • Holmberg et al (2000) • Suggested use of marked point process • David L. Cooke and Thomas R. Rohleder (2008) • Used system dynamics • Zahra Mohaghegh, Reza Kazemi and Ali Mosleh (2009) • Used hybrid approaches combining SD, PRA and BBNs • Lots of other dynamic risk modelling approaches, eg Petri nets, living PSA

Mohaghegh, Kazemi, Mosleh

Common framework • A marked point process requires specification of • Possible marks (event types) • Relevant history for each mark • The likelihood of a mark occurring, given the history • Broadly, all three approaches fit into this framework, with either SD or BBNs driving the likelihood.
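The three ingredients of a marked point process can be sketched as a small simulation. Everything here is a hypothetical illustration rather than any of the cited models: the two marks, the rate formulas and the numbers are assumptions, and time is discretised into short steps so that a mark occurs in a step with probability of roughly rate × dt.

```python
import random

MARKS = ("near_miss", "failure")  # possible marks (hypothetical event types)

def intensities(t, history):
    """Rate per year of each mark at time t, given the event history."""
    # Relevant history: time of the last failure, near misses in the last year.
    last_failure = max((s for s, m in history if m == "failure"), default=0.0)
    recent_reports = sum(
        1 for s, m in history if m == "near_miss" and t - s < 1.0
    )
    # Illustrative assumption: standards drift upwards after each reset,
    # and recent near-miss reports damp the failure rate.
    drift = 0.001 + 0.0017 * (t - last_failure)
    return {
        "near_miss": 2.0,
        "failure": drift / (1 + recent_reports),
    }

def simulate(horizon, dt, seed):
    """Discrete-time approximation: P(mark in a step) is about rate * dt."""
    rng = random.Random(seed)
    history = []
    for i in range(round(horizon / dt)):
        t = i * dt
        for mark, rate in intensities(t, history).items():
            if rng.random() < rate * dt:
                history.append((t, mark))
    return history

events = simulate(horizon=50.0, dt=0.01, seed=1)
print(len(events), "events; first few:", events[:3])
```

In the approaches listed above, a system dynamics model or a Bayesian belief network would take the place of the hand-written `intensities` function.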

Main difficulties • Complexity – existing models seem very complex… is this necessary for summative purposes? • Measurement scales – for soft interventions these are often vaguely defined and not sufficient to build a robust model • Elicitation – require ways of robustly assessing rates etc for these models • Dependencies – interventions may impact on many different aspects of the system • Model uncertainties – folding these into analysis of options

Possible approaches • Complexity – restrict attention to cost/benefit of “discrete” feedback (major accidents) and “continuous” feedback (eg CIRAS). However, a summative approach also needs to account for model uncertainties, which makes the model more complex again! • Measurement scales – use locally valid subjectively defined scales • Elicitation – assess possible changes in system outcomes and derive parameters implicitly (inversion) • Dependencies – model through impact of intervention on common PSFs (eg workload) • Model uncertainties – simulation

Broad brush effects on HR • Positive (+): safety first culture, clear quality standards • Negative (-): quality drift, productivity focus, cost cutting

Example discrete feedback • System is designed to have exponential time to failure with MTTF 1000 years • However, due to lack of failures the system management becomes lax, and the failure rate increases. When a failure happens, the system is reset to design standard. Suppose 1 failure per 30 years. [Diagram: feedback loop linking hazard rate (+) and failure event (-)]

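Model for failure rate is $\lambda(t) = \lambda_0 + \beta t$, where $\lambda_0 = 1/1000$ per year is the design rate and $t$ is the time since the last reset (the symbols are reconstructed; the originals were lost in extraction) • MTTF is $30 = \int_0^\infty \exp\left(-\lambda_0 t - \tfrac{1}{2}\beta t^2\right)\,dt$ • Solving gives $\beta \approx 0.0017$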
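The drift parameter can be recovered numerically. The sketch below assumes the failure rate grows linearly from the design rate, lam0 + beta*t with lam0 = 1/1000 per year, so that the survival function is exp(-lam0*t - beta*t^2/2); the names `lam0`, `beta`, `mttf` and `solve_beta` are hypothetical. The MTTF integral then has a closed form in terms of the complementary error function, and the drift giving an MTTF of 30 years is found by bisection.

```python
import math

def mttf(lam0, beta):
    """Mean time to failure for hazard rate lam0 + beta * t.

    Closed form of the integral of exp(-lam0*t - beta*t**2/2) over t >= 0,
    obtained by completing the square in the exponent.
    """
    return (
        math.sqrt(math.pi / (2 * beta))
        * math.exp(lam0 ** 2 / (2 * beta))
        * math.erfc(lam0 / math.sqrt(2 * beta))
    )

def solve_beta(lam0, target_mttf, lo=1e-6, hi=1.0, iterations=200):
    """Bisection on beta; the MTTF decreases as the drift beta increases."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mttf(lam0, mid) > target_mttf:
            lo = mid  # MTTF still too long, so more drift is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam0 = 1 / 1000  # design failure rate per year (MTTF 1000 years)
beta = solve_beta(lam0, target_mttf=30.0)
print(f"beta = {beta:.4f} per year squared")
```

Under these assumptions the solution agrees with the slide's value of about 0.0017 to two significant figures.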

Local measurement scales example: SLIM – based on MCDA • Success Likelihood Index Methodology is an early HRA method • Combines Performance Shaping Factor scores using “multiattribute utility” method to quantify Human Error Probability • Key ideas • Ideal points on PSF scale • Expert defined scores • Pairwise comparison for attribute weights • Two point calibration to identify scale length • Common PSFs provide dependency across HR elements
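The SLIM calculation can be sketched as follows. The sketch assumes the usual log-linear form, log10(HEP) = a × SLI + b, with a and b fixed by two calibration tasks of known HEP. The PSF names, weights, ratings and anchor values are all hypothetical, and ratings are assumed to be already rescaled so that 1 means the PSF sits at its ideal point.

```python
import math

def success_likelihood_index(weights, ratings):
    """Weighted sum of PSF ratings, with weights normalised to sum to 1."""
    total_weight = sum(weights.values())
    return sum(weights[psf] / total_weight * ratings[psf] for psf in weights)

def calibrate(sli_1, hep_1, sli_2, hep_2):
    """Two-point calibration of log10(HEP) = a * SLI + b."""
    a = (math.log10(hep_1) - math.log10(hep_2)) / (sli_1 - sli_2)
    b = math.log10(hep_1) - a * sli_1
    return a, b

def hep_from_sli(sli, a, b):
    """Convert a success likelihood index into a human error probability."""
    return 10 ** (a * sli + b)

# Hypothetical weights (eg from pairwise comparison) and expert ratings.
weights = {"workload": 0.5, "training": 0.3, "procedures": 0.2}
ratings = {"workload": 0.4, "training": 0.8, "procedures": 0.6}

# Hypothetical anchor tasks: best case (SLI 1) and worst case (SLI 0).
a, b = calibrate(sli_1=1.0, hep_1=1e-4, sli_2=0.0, hep_2=1e-1)

task_sli = success_likelihood_index(weights, ratings)
task_hep = hep_from_sli(task_sli, a, b)
print(f"SLI = {task_sli:.2f}, HEP = {task_hep:.2e}")
```

Because several tasks can share a PSF rating such as workload, changing that rating moves all of their HEPs together, which is how common PSFs carry dependency across HR elements.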

Conclusions • New growth in dynamic human reliability modelling • Approaches more applicable to service operations • Hybrid HR models with feedback loops give the possibility of modelling “soft” interventions • BUT many open problems in implementing robustly

Acknowledgements • Work in EPSRC funded project with Simon French, Jerry Busby, Emma Soane, David Tracy and others
