1. Safety Glen Dobson
g.dobson@lancs.ac.uk
http://www.comp.lancs.ac.uk/~dobsong/teaching/dependability
2. Recapping Dependability
“the property of a system such that we can justifiably place our reliance on the service it delivers”
Attributes, relationships, conflicts
Availability
Reliability
“Continuity of Correct Service”
Quantitative Measures – POFOD, ROCOF, MTTF
Fault-Error-Failure
Fault removal/avoidance/tolerance
Representative operational profiles hard to achieve
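The three quantitative measures recapped above can be illustrated with a minimal sketch. The function names and sample figures are assumptions made for illustration; POFOD is taken as failures per demand, ROCOF as failures per unit of operating time, and MTTF as the mean of observed times to failure.

```python
def pofod(failed_demands: int, total_demands: int) -> float:
    """Probability of failure on demand: fraction of service requests that fail."""
    return failed_demands / total_demands


def rocof(failures: int, operating_time: float) -> float:
    """Rate of occurrence of failures: failures per unit of operating time."""
    return failures / operating_time


def mttf(times_to_failure: list[float]) -> float:
    """Mean time to failure: average of the observed times to failure."""
    return sum(times_to_failure) / len(times_to_failure)


# Illustrative (made-up) observations
print(pofod(2, 1000))                  # 2 failed demands out of 1000
print(rocof(5, 1000.0))                # 5 failures in 1000 operating hours
print(mttf([100.0, 200.0, 300.0]))     # three observed times to failure
```

All three depend on the operational profile used to collect the observations, which is why a representative profile matters.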
3. Overview Safety definition
Hazards
Compromise
Safety measures
Hazard avoidance
Hazard tolerance
Risk
4. Safety Is about… (during normal & abnormal operation)
Controlling potentially dangerous systems
Preventing injury or death of people
Preventing damage to environment
Viewed as a specialisation of reliability
Minimise occurrence of failures – specifically those with catastrophic consequences
5. Safety systems Direct safety (primary):
Safety critical system
System itself can cause damage/injury
Power station control, flight controller, etc.
Indirect safety (secondary):
Support system with safety implications
System can lead to damage/injury
Treatment db, maintenance manager, etc.
6. Hazard chain
7. Similarities ?
8. Examples Hazard
Live electrical cable on the lawn
Narrow coolant pipes
Incident
Lawn mower cuts through cable
Coolant pipes become blocked
Accident
Gardener gets electrocuted
Core meltdown
9. Value for human life
10. Compromise We must put a price on life and suffering
Perfect safety is impossible
Very high reliability is very expensive
Reach an “acceptable” compromise between:
Safety, Practicality, Cost
Otherwise we would never do anything!
Many social, technical and political issues
11. Example “I was a recall coordinator, my job was to apply the formula: Take the number of vehicles in the field (A), multiply it by the probable rate of failure (B), then multiply the result by the average out of court settlement (C). [If the result] (A x B x C) is less than the cost of a recall, we don’t do one.”
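The formula in the quotation can be written out as a short sketch. The function name and every figure below are hypothetical; only the A x B x C comparison against the recall cost comes from the quotation.

```python
def recall_is_issued(vehicles_in_field: int,
                     failure_rate: float,
                     avg_settlement: float,
                     recall_cost: float) -> bool:
    """Apply the quoted rule: no recall if A x B x C is less than the recall cost."""
    expected_settlements = vehicles_in_field * failure_rate * avg_settlement
    return not (expected_settlements < recall_cost)


# Hypothetical figures: 100,000 vehicles, 1 in 1,000 fails, 50,000 per settlement
print(recall_is_issued(100_000, 0.001, 50_000, recall_cost=10_000_000))
print(recall_is_issued(100_000, 0.001, 50_000, recall_cost=1_000_000))
```

With these made-up numbers the expected settlements come to about 5,000,000, so the first call decides against a recall and the second decides for one.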
12. Manufacturer responsibility Because death/injury must be tolerated
Manufacturer is open to litigation
Fines by government agencies (e.g. environment agency)
Civil proceedings
Even imprisonment of employees
13. Manufacturer defence Demonstrate system “fitness for purpose”
“As Safe as Could Reasonably be Expected”
Demonstrate lack of negligence
Provide warnings (signs, labels, disclaimers)
Take out insurance !!!
14. Evaluating safety Safety is hard to measure
Often rely on “Judged” safety level
Estimates our “confidence level”
From “Very safe” to “Very unsafe”
Matter for professional judgement
Evidence supported argument
Should address product AND process
15. Factors influencing judgement Reputation of developers
Maturity of development process
Adherence to standards
Well documented V&V:
Reviews/inspections
Static checking
Comprehensive testing
Formal proofs
Safety case or safety argument
16. System safety case Justify and defend system
Does not prove safety of system
Reasoned argument indicating safety
Demonstrates design and assessment
Presents “evidence” based on:
Expert engineering judgement
Probabilistic risk analysis
Demonstrating risks have been addressed
17. Proof by contradiction Systematic & mathematical approach
Show unsafe state can’t be reached
Conditions for hazard can’t exist
Focused on single aspect of system
Shorter than full formal method
“Semi-formal” (?) thus easy to understand
18. Safety Integrity levels Safety specified using integrity level
Exact quantitative value not always possible
Probability of accident occurrence:
Integrity level 4: 10^-2 to 10^-1
Integrity level 3: 10^-3 to 10^-2
Integrity level 2: 10^-4 to 10^-3
Integrity level 1: 10^-5 to 10^-4
(From IEC 1508 standard)
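A lookup from an estimated accident probability to an integrity level might be sketched as follows. The bands are copied from the slide as listed; the function name, the dictionary, and the choice to treat each band as closed at the lower bound and open at the upper bound are assumptions.

```python
# Bands as listed on the slide: level -> (lower bound, upper bound)
INTEGRITY_BANDS = {
    4: (1e-2, 1e-1),
    3: (1e-3, 1e-2),
    2: (1e-4, 1e-3),
    1: (1e-5, 1e-4),
}


def integrity_level(probability: float):
    """Return the level whose band contains the probability, or None if no band does."""
    for level, (low, high) in INTEGRITY_BANDS.items():
        # Assumption: lower bound inclusive, upper bound exclusive
        if low <= probability < high:
            return level
    return None


print(integrity_level(5e-3))   # falls in the 10^-3 to 10^-2 band
print(integrity_level(0.5))    # outside every listed band
```

Returning `None` for values outside the table keeps the exact-value caveat visible: not every probability maps to a level.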
19. Give example systems For integrity level 4 (10^-2 to 10^-1)?
For integrity level 3 (10^-3 to 10^-2)?
For integrity level 2 (10^-4 to 10^-3)?
For integrity level 1 (10^-5 to 10^-4)?
20. Very high safety measures Use of very high measures is problematic
Often impossible to verify achievement
We can’t test to such extremes
So can we build to these extremes?
Maybe such systems are too risky?
If we can’t check it - don’t build it!
21. Severity counts Not all failures have the same severity
We can put up with some minor ones… aim for the following integrity levels:
Negligible: 10^-2 to 10^-1
Minor effect: 10^-4 to 10^-3
Major effect: 10^-6 to 10^-5
Hazardous: 10^-8 to 10^-7
Catastrophic: 10^-9 and lower
(From civil aircraft manufacturing)
22. Give example systems Of negligible (10^-2 to 10^-1)?
Of minor effect (10^-4 to 10^-3)?
Of major effect (10^-6 to 10^-5)?
Of hazardous (10^-8 to 10^-7)?
Of catastrophic (10^-9 and lower)?
23. Hazards and failures Hazard viewed as specialised “fault”
Safety related failure
Wider socio-technical perspective
Hazards thus managed in similar way:
Hazard avoidance (cf. fault avoidance)
Damage limitation (cf. fault tolerance)
24. Accident prevention
25. Hazard avoidance & removal Formal proofs
Informal arguments
Managed development lifecycle
Hazard analysis:
Thought support tools
Checklists
Brainstorming
26. “Safe” development process Hazard analysis
Hazard management (logging, tracing)
Engineers with responsibility for safety
Extensive use of safety reviews
Safety certification
Detailed configuration management
27. Safety development lifecycle
28. Hazard analysis process
29. Hazard analysis collaboration Developers
Domain experts
Safety advisers
Managers
End user
Regulatory bodies
Certification organisation
30. Hazard analysis Long and time consuming
Difficult and complex
Expensive
Boring and tedious
Omission and error prone
Estimating probabilities and severities is hard
31. Hazard analysis process
32. Hazard identification Identify all possible hazards
Often many possible hazards
Hard to identify all possible hazards
Potential for hazard interaction
Most accidents are due to multiple hazards/incidents (Perrow 1984)
33. Identification mechanisms Introspection
Group brainstorming
Precedent and case studies
Thought support tools
Checklists
34. HazOp analysis Supports cooperation between experts
Aims to bridge the “culture gap”
Systematic “thought support”
Prompt human operators
Entities and phenomena
Domain specific “bad things”
Consider all combinations
Some make sense, others do not
35. HazOp concepts Intention - how system should operate
Guide word - abstract “bad things”
Parameter - changeable entity or phenomenon
Deviation - unintended operation (guide word x parameter)
Cause - cause of deviation
Consequence - results of deviation
Suggested action - prevent deviation
36. Example HazOp analysis Making “a nice cup of tea”
37. Possible Deviations More tea leaves - too strong
Less heat - poor brewing, cold tea
Milk late - (tea in first) proteins damaged
More sugar - too sweet
Other than comfy chair - ruin experience
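The deviations listed above each pair a guide word with a parameter. A minimal sketch of generating every combination for review follows; the lists are taken from the tea example, while the function name is an assumption. A HazOp team would then discard the combinations that make no sense.

```python
from itertools import product


def candidate_deviations(guide_words, parameters):
    """Pair every guide word with every parameter to prompt discussion."""
    return [f"{word} {param}" for word, param in product(guide_words, parameters)]


guide_words = ["more", "less", "late", "other than"]
parameters = ["tea leaves", "heat", "milk", "sugar", "comfy chair"]

deviations = candidate_deviations(guide_words, parameters)
print(len(deviations))
for deviation in deviations[:5]:
    print(deviation)
```

The list grows as the product of the two inputs, which is one reason the analysis is long and tedious in practice.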
38. Hazard analysis process
39. Hazard classification Nature of damage caused (e.g. toxic)
Example being road haulage labels
Probability of damage
Severity of damage
40. Hazard analysis process
41. Risk assessment Produce calculated risk values
Consider acceptability of risk:
Intolerable
As Low As Reasonably Practicable (ALARP)
Acceptable
Consider social and political factors
Take into account costs of prevention
Help decide if action needs taking
42. Risk phenomenon Risk is a very strange thing
Subject to illogical thinking
Subject to political and social pressure
Perceived risk differs from actual risk
43. Perception of risk Big accident, many fatalities = high impact
Small accident, few fatalities = low impact
Even though there are many small accidents
“Total deaths” is not important !!!
What kills more people: Planes or Donkeys?
2004… 9000 trouser related accidents resulted in injury
44. Strange risk Train crash - many killed
Public outcry
Government forced into action
Introduce train protection system
Slower trains, increased fares
More passengers choose to drive
Cars are less safe than trains
More people die than if the government did nothing!!
45. Risk calculations Hazard probability (occurrence)
Incident probability (conversion)
Accident probability (completion)
Hazard severity (worst case damage)
Hazard risk =
haz_prob x incident_prob x accident_prob x haz_sev
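The hazard risk product above can be sketched directly. The parameter names follow the slide; the range check and the sample values for the lawn cable example are assumptions for illustration only.

```python
def hazard_risk(haz_prob: float, incident_prob: float,
                accident_prob: float, haz_sev: float) -> float:
    """Hazard risk = haz_prob x incident_prob x accident_prob x haz_sev."""
    for p in (haz_prob, incident_prob, accident_prob):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    return haz_prob * incident_prob * accident_prob * haz_sev


# Made-up values: cable left on the lawn, mower cuts it, gardener is electrocuted
risk = hazard_risk(haz_prob=0.1, incident_prob=0.5, accident_prob=0.2, haz_sev=100)
print(risk)
```

Because the terms multiply, reducing any one probability (removing the hazard, blocking the incident, or limiting the accident) lowers the overall risk in proportion.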
46. Dimensions of risk Probability - numerical value or scale:
Frequent, Probable, Occasional, Remote, Improbable, Incredible (N.B. nothing is impossible!)
Severity - numerical value or scale:
Catastrophic, Hazardous, Major, Minor, Negligible, No effect
Risk - numerical (death/year) or scale:
Intolerable, Undesirable, Tolerable, Negligible
47. Risk estimation question? Identify potential accidents resulting from each of the following and give estimations of perceived and actual risks:
Driving on M6 in the snow
Flying on Concorde
Riding a rollercoaster
Being an MSc student
48. Event tree analysis How hazards contribute to accidents
Interaction of hazards and events
Effect of combined hazards
Help reason about what could happen
Uses probability of hazards and events
Calculate probability of accident
Used in assessment of risk
49. Example system Boat with leaky hull
Sea water detection system
Automatic pump
Pump failure alarm
Level alarm
Manual pump available
50. Event tree analysis
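One possible event tree for the leaky boat can be sketched as a probability calculation. The branch structure, the independence of the branches, and all numbers are assumptions, not the tree from the original figure: the automatic pump branch is taken to include the sea water detection system that triggers it, the crew is alerted if either alarm sounds, and an alerted crew then tries the manual pump.

```python
def sinking_probability(p_leak: float,
                        p_auto_pump_fails: float,
                        p_pump_alarm_fails: float,
                        p_level_alarm_fails: float,
                        p_manual_pump_fails: float) -> float:
    """Probability of the accident path through an assumed event tree."""
    # Crew stays unaware only if both alarms fail (assumed independent)
    p_crew_not_alerted = p_pump_alarm_fails * p_level_alarm_fails
    # Manual recovery fails if the crew is unaware, or is aware but the pump fails
    p_recovery_fails = (p_crew_not_alerted
                        + (1 - p_crew_not_alerted) * p_manual_pump_fails)
    # Accident path: leak occurs, automatic pumping fails, manual recovery fails
    return p_leak * p_auto_pump_fails * p_recovery_fails


# Made-up branch probabilities
print(sinking_probability(0.1, 0.1, 0.1, 0.1, 0.5))
```

Each factor corresponds to one branch point of the tree, so the effect of adding or improving a single guard can be read off by changing one argument.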
51. Hazard analysis process
52. Hazard filtration Minimise set of hazards for analysis
Remove impossible hazards
Remove very improbable hazards
Remove very low risk hazards
Keep record of removed hazards!
Retain rationale for removal
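Filtration with a retained record might look like the following sketch. The `Hazard` fields, the thresholds, and the sample hazards are hypothetical; the point illustrated is that every removed hazard is logged together with its rationale.

```python
from dataclasses import dataclass


@dataclass
class Hazard:
    name: str
    probability: float  # estimated probability of the hazard arising
    risk: float         # calculated risk value


def filter_hazards(hazards, min_probability, min_risk):
    """Split hazards into those kept for analysis and a log of removals."""
    kept, removed_log = [], []
    for hazard in hazards:
        if hazard.probability == 0:
            removed_log.append((hazard.name, "impossible"))
        elif hazard.probability < min_probability:
            removed_log.append((hazard.name, "very improbable"))
        elif hazard.risk < min_risk:
            removed_log.append((hazard.name, "very low risk"))
        else:
            kept.append(hazard)
    return kept, removed_log


hazards = [
    Hazard("meteor strike", 1e-12, 1e-6),
    Hazard("blocked coolant pipe", 1e-3, 5.0),
    Hazard("hypothetical impossible hazard", 0.0, 0.0),
    Hazard("paper cut", 0.1, 1e-4),
]
kept, removed_log = filter_hazards(hazards, min_probability=1e-9, min_risk=1e-2)
print([h.name for h in kept])
print(removed_log)
```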
53. Hazard analysis process
54. Hazard Decomposition Identify causes of each hazard
Often a combination of factors leads to a hazard
A single hazard may have different causes
Essential to understanding of each hazard
55. Fault tree analysis Systematic documentation of hazards
Can utilise probability of events
Tables of failure probability available for common components
Calculate probability of hazard
Tend to produce very large trees
Evolve greatly during analysis process
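Calculating the probability of a hazard from a fault tree can be sketched with two gate functions. This assumes the input events are independent; the tree and the component probabilities below are hypothetical and are not taken from the figures in this deck.

```python
from math import prod


def and_gate(probabilities):
    """Output event occurs only if all input events occur (independence assumed)."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output event occurs if at least one input event occurs (independence assumed)."""
    return 1 - prod(1 - p for p in probabilities)


# Hypothetical tree: hazard = primary fault OR (guard A fails AND guard B fails)
p_primary_fault = 0.01
p_guard_a_fails = 0.1
p_guard_b_fails = 0.1

p_hazard = or_gate([p_primary_fault, and_gate([p_guard_a_fails, p_guard_b_fails])])
print(p_hazard)
```

Nested calls mirror the shape of the tree, so a large tree becomes a large expression; in practice the component values would come from published failure-probability tables.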
56. Fault tree analysis
57. Fault tree circuit example
58. Example fault tree
59. Hazard analysis process
60. Guard proposition Prevent causes of hazards:
Interlocks
Physical guards
Control software
Work practices and procedures
Block consequences of incidents
This overlaps with damage limitation…
61. Damage limitation
62. Limitation approaches Assertions and state checks
Exception handling
Safety states (Fail-safe systems)
Human flexibility
Incident reporting
Emergency procedure (e.g. fire drill)
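Several of these approaches (state checks, exception handling, and dropping into a safe state) can be combined in a small sketch. The `Heater` class, its temperature limit, and the sensor callback are all hypothetical.

```python
class Heater:
    """Hypothetical controller that falls back to a safe state on any anomaly."""

    MAX_SAFE_TEMP = 90.0  # assumed limit, in degrees Celsius

    def __init__(self):
        self.power_on = False
        self.state = "idle"

    def enter_safe_state(self):
        # Safe state: heating element off, controller flagged for attention
        self.power_on = False
        self.state = "safe"

    def step(self, read_temperature):
        try:
            temperature = read_temperature()
            # State check: reject readings outside the plausible, safe range
            if not 0.0 <= temperature <= self.MAX_SAFE_TEMP:
                raise ValueError(f"temperature out of range: {temperature}")
            self.power_on = True
            self.state = "heating"
        except Exception:
            # Exception handling: any sensor or check failure limits the damage
            self.enter_safe_state()


heater = Heater()
heater.step(lambda: 50.0)
print(heater.state)
heater.step(lambda: 120.0)
print(heater.state)
```

Catching every exception is deliberate here: for this sketch, an unexplained failure is treated the same as a detected hazard.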
63. Safety states Fail-controlled: graceful failure
Fail-uncontrolled: disgraceful failure
Fail-stop: halts with no output
Fail-silent: continue, but with no output
Fail-safe: halts and drops into safe state
Fail-operational: still some functionality
64. Human “components” What effect do humans have on a system
Inject unreliability and unpredictability?
Inject flexibility and resilience?
…Probably a bit of both
Make use of their advantages
Take account of their shortfalls
Modern planes still carry a pilot!
(opens up whole issue of trust)
65. Blame All failures are caused by humans:
Developers
Administrators
Operators
Operators are good scapegoats if things go bad
Especially if they are dead!
Often “operator” errors trace back to UI
66. FMECA (or just FMEA) Failure Mode - a way something can fail
Cause - what leads to failures
Effect - the consequences of failure
Severity - seriousness of effects
Occurrence - probability of cause occurring
Criticality - severity x occurrence
Current control - existing guard on cause
Detection - probability of success for control
Risk priority - criticality x detection
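The two products above can be sketched for ranking failure modes. The 1 to 10 rating scales, the sample failure modes, and the convention that a higher detection score means the control is less likely to catch the cause (as in conventional FMEA rating scales) are assumptions, not taken from the slide.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # assumed scale: 1 (negligible) to 10 (catastrophic)
    occurrence: int  # assumed scale: 1 (rare) to 10 (frequent)
    detection: int   # assumed scale: 1 (almost always caught) to 10 (almost never caught)

    @property
    def criticality(self) -> int:
        """Criticality = severity x occurrence."""
        return self.severity * self.occurrence

    @property
    def risk_priority(self) -> int:
        """Risk priority = criticality x detection."""
        return self.criticality * self.detection


def rank(modes):
    """Order failure modes from highest to lowest risk priority."""
    return sorted(modes, key=lambda mode: mode.risk_priority, reverse=True)


modes = [
    FailureMode("sensor drift", severity=4, occurrence=6, detection=3),
    FailureMode("valve stuck", severity=9, occurrence=2, detection=8),
]
for mode in rank(modes):
    print(mode.name, mode.criticality, mode.risk_priority)
```

In this made-up data the valve fault has the lower criticality but the higher risk priority, because its existing control is rated as unlikely to catch it.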
67. A320 Group exercise In your groups, identify as many potential hazards as you can for flight in the Airbus A320. Consider all socio-technical issues. Consider using the following identification mechanisms:
Freeform brainstorming
Precedent and case based comparison
HazOp analysis
68. Extended exercise For each hazard identified by the previous exercise, assign the following values:
Accident probability
Accident severity
Hazard risk
Possible guards
69. System evolution How would the hazards that you have identified be affected by the following:
Change in flight duration
Total removal of pilots
Near miss protection system
Increased security (e.g. air marshals)