The Therac-25, a medical particle accelerator from the 1980s, exemplifies catastrophic software failures resulting in serious injuries and fatalities. This case study examines the combination of software bugs, such as the Increment Overflow bug, and poor engineering practices that led to the deaths of three patients and injuries to three others. We explore the implications of design decisions, lack of mechanical interlocks, and the reliance on software systems without adequate auditing. This analysis serves as a vital lesson in software engineering ethics and the necessity for rigorous testing protocols.
Death by Software: The Therac-25 Radio-Therapy Device
Brian MacKay
ESE6361 - Requirements Engineering, Fall 2013
Final Presentation
Recap: Software that Kills
• Early to mid-1980s
• Revolutionary double-pass medical particle accelerator
• Moved to complete software control
• Injured 6 people, killing 3 of them
• Two different underlying bugs
• But it was more than just bugs
• Poor software engineering practices
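The "increment overflow" bug mentioned in this deck can be illustrated with a small sketch. In Leveson and Turner's published account, a one-byte flag was incremented on every pass of a setup loop rather than set to a fixed nonzero value, so on every 256th pass it wrapped to zero and a safety check was skipped. The function names below are hypothetical, and Python stands in for the original assembly code; this is an illustration of the failure mode, not the Therac-25 source.

```python
from typing import List


def increment_byte(value: int) -> int:
    """Increment a counter the way a one-byte register does (wraps at 256)."""
    return (value + 1) & 0xFF


def passes_that_skip_check(total_passes: int) -> List[int]:
    """Return the pass numbers on which the flag wraps to zero.

    Zero is read as "nothing to verify", so the safety check is
    skipped on exactly those passes.
    """
    flag = 0
    skipped = []
    for pass_number in range(1, total_passes + 1):
        flag = increment_byte(flag)   # buggy: increment instead of "set to 1"
        if flag == 0:                 # wrapped around, check is bypassed
            skipped.append(pass_number)
    return skipped


def set_flag_safely() -> int:
    """The safer pattern: assign a fixed nonzero value, so it can never wrap."""
    return 1


if __name__ == "__main__":
    print(passes_that_skip_check(600))
```

The point of the sketch is that the bug is invisible in almost every run, which is part of why testing alone did not catch it.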
Let’s Look at the PIG in Detail
[Diagram. Goal: Don’t Kill or Injure People. Nodes: Injures & Kills People; Increment Overflow Bug; Malfunction 54 Bug; Indecipherable Error Messages; Operator “Malfunction Fatigue”; 40 Malfunctions/Day. Links between nodes are marked + or ++.]
Assembly Language Programming
[Diagram. Nodes: Assembly Language Programming; Programming Shortcuts; Bad Testability; Increment Overflow Bug; Malfunction 54 Bug; Injures & Kills People. Links between nodes are marked + or ++.]
Code Reuse
[Diagram. Nodes: Code Reuse; Expensive Hardware, etc.; RT Synchronization Issues; Homemade RT-OS; “Working Code”; Increment Overflow Bug; Malfunction 54 Bug; Injures & Kills People. Links between nodes are marked + or ++.]
Moving to Complete Computer Control
[Diagram. Nodes: Move to Computer Control; Mechanical Controls “Less Cool”; Mechanical Controls Fail; No Mechanical Interlocks; Code Reuse; Toxic Situation; Increment Overflow Bug; Malfunction 54 Bug; Injures & Kills People. Links between nodes are marked + or ++.]
Cross-Cutting Issues
[Diagram. Nodes: Faith in Software; No Auditing; Hardware Focused Organization; Increment Overflow Bug; Malfunction 54 Bug; Injures & Kills People. Links between nodes are marked ++.]
The Real Issue
• A combination of:
  • Code reuse
  • The removal of the mechanical interlocks
  • An unreasonable faith in software
  • General bad software engineering practice
The Solution Domain
• Based on early-1980s technology
  • Hindsight is one thing
  • But 30 years of technological innovation is cheating
• Based on my experiences
  • I was a junior engineer starting my career in process & manufacturing systems
Control System Design
• In the 1980s and now
• Uses a “Distributed Control System”
• Provides for strong segregation between the layers
• Early user of networking technology
• Typically combined
• Done with a “PLC”
PLC: Programmable Logic Controller
• In the 1980s, used the “Ladder Logic” graphical programming language
• Program specified by an engineer, programmed by an electrician
• Consider…
PLC: Ladder Logic
• Programmable by an electrician
[Ladder diagram: one rung for a pump, with elements labeled Switch, Valve Position Open, and Pump On.]
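A ladder rung like the pump example reads as a simple Boolean expression. The sketch below assumes the two contacts (Switch and Valve Position Open) are wired in series and energize the Pump On coil; the slide text does not state the wiring explicitly, so that is an assumption, and the function name is hypothetical.

```python
def pump_on(switch_closed: bool, valve_position_open: bool) -> bool:
    """Evaluate one ladder rung: two contacts in series driving one coil.

    Series contacts behave like a logical AND: current reaches the
    coil only if every contact on the rung is closed.
    """
    return switch_closed and valve_position_open


if __name__ == "__main__":
    # Print the full truth table for the rung.
    for switch in (False, True):
        for valve in (False, True):
            print(f"switch={switch!s:5} valve_open={valve!s:5} "
                  f"-> pump_on={pump_on(switch, valve)}")
```

Because the whole rung is visible at a glance and has only four input combinations, it can be reviewed and tested exhaustively, which is the contrast with the assembly-language control code discussed earlier.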
The Rest of the System
• Multi-bus system and enclosure
• Intel 8086 with 8087 coprocessor
• 512 kilobytes of memory
• 20 megabyte disk drive: program, logs and audits
• Mark Williams “C” compiler
• Intel iRMX-86 real-time operating system
• RS-232 and RS-485 serial connections
• Commercial terminal management software
• ANSI compatible terminal (e.g. VT-100)
All this is off the shelf
Error Messages
• Even with something like a VT-100 green screen, a “windowed” interface is possible
• Lots of terminal management software was available commercially to handle this

Treatment screen:

PATIENT NAME : JOHN DOE
TREATMENT MODE : FIX     BEAM TYPE: X     ENERGY (MeV): 25

                            ACTUAL      PRESCRIBED
UNIT RATE/MINUTE            0           200
MONITOR UNITS               50 50       200
TIME (MIN)                  0.27        1.00

GANTRY ROTATION (DEG)       0.0         0             VERIFIED
COLLIMATOR ROTATION (DEG)   359.2       359           VERIFIED
COLLIMATOR X (CM)           14.2        14.3          VERIFIED
COLLIMATOR Y (CM)           27.2        27.3          VERIFIED
WEDGE NUMBER                1           1             VERIFIED
ACCESSORY NUMBER            0           0             VERIFIED

DATE : 84-OCT-26      SYSTEM : BEAM READY     OP.MODE: TREAT AUTO
TIME : 12:55. 8       TREAT : TREAT PAUSE              X-RAY 173777
OPR ID : T25VO2-RO3   REASON : OPERATOR       COMMAND:

Same screen with a pop-up error window:

PATIENT NAME : JOHN DOE
TREATMENT MODE : FIX     BEAM TYPE: X     ENERGY (MeV): 25

                            ACTUAL      PRESCRIBED
UNIT RATE/MINUTE            0           200
MONITOR UN┌──────────────────────────────────────┐
TIME (MIN)│ Error 54:                            │
          │ This is a serious error and could    │
GANTRY ROT│ compromise patient safety            │    VERIFIED
COLLIMATOR│ The system must be reset             │    VERIFIED
COLLIMATOR│ [Enter]                              │    VERIFIED
COLLIMATOR└──────────────────────────────────────┘    VERIFIED
WEDGE NUMBER                1           1             VERIFIED
ACCESSORY NUMBER            0           0             VERIFIED

DATE : 84-OCT-26      SYSTEM : BEAM READY     OP.MODE: TREAT AUTO
TIME : 12:55. 8       TREAT : TREAT PAUSE              X-RAY 173777
OPR ID : T25VO2-RO3   REASON : OPERATOR       COMMAND:
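The pop-up window shown on this slide needs nothing more than character-cell drawing. As a rough sketch of the idea, the hypothetical `overlay_box` function below pastes a bordered message over a list of screen lines using plain string operations. It is not the API of any commercial terminal manager, and a real VT-100 program would position the cursor with ANSI escape sequences rather than rebuild whole lines.

```python
from typing import List


def overlay_box(screen: List[str], message: List[str],
                top: int, left: int) -> List[str]:
    """Return a copy of `screen` with a bordered message box drawn over it.

    `top` and `left` give the row and column of the box's upper-left
    corner. Rows that would fall below the screen are dropped.
    """
    inner = max(len(line) for line in message) + 2   # one space of padding each side
    box = ["┌" + "─" * inner + "┐"]
    box += ["│ " + line.ljust(inner - 2) + " │" for line in message]
    box.append("└" + "─" * inner + "┘")

    out = list(screen)
    for offset, row in enumerate(box):
        r = top + offset
        if r >= len(out):
            break
        # Pad short lines so the box always has something to overwrite.
        base = out[r].ljust(left + len(row))
        out[r] = base[:left] + row + base[left + len(row):]
    return out


if __name__ == "__main__":
    blank_screen = ["." * 60 for _ in range(9)]
    text = [
        "Error 54:",
        "This is a serious error and could",
        "compromise patient safety",
        "The system must be reset",
        "[Enter]",
    ]
    print("\n".join(overlay_box(blank_screen, text, top=1, left=10)))
```

The message text is taken from the slide; the design point is that a plain-language warning requiring an explicit acknowledgement was feasible on 1980s terminals.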
Final System Design
• Intel 8086/8087 running iRMX-86, programmed in “C”
• UI implemented using commercial terminal manager software
• PLC programmed in ladder logic
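One way to read this design is that the PLC acts as an independent interlock layer that the supervisory computer cannot override. The sketch below illustrates that layering only. The input names (turntable position, collimator verification, door switch) and all function names are hypothetical, chosen for illustration; the deck does not list which signals the PLC would monitor.

```python
from dataclasses import dataclass


@dataclass
class InterlockInputs:
    """Hypothetical hardware signals wired directly to the PLC."""
    turntable_in_position: bool
    collimator_verified: bool
    door_closed: bool


def plc_permits_beam(inputs: InterlockInputs) -> bool:
    """Interlock layer: every condition must hold, like contacts in series."""
    return (inputs.turntable_in_position
            and inputs.collimator_verified
            and inputs.door_closed)


def beam_enabled(software_requests_beam: bool, inputs: InterlockInputs) -> bool:
    """The beam fires only if the software asks AND the PLC agrees.

    A bug in the supervisory software can wrongly request the beam,
    but it cannot force the interlock layer to say yes.
    """
    return software_requests_beam and plc_permits_beam(inputs)


if __name__ == "__main__":
    faulty = InterlockInputs(turntable_in_position=False,
                             collimator_verified=True,
                             door_closed=True)
    print(beam_enabled(True, faulty))   # software asks, interlock refuses
```

Keeping the interlock logic in a separate, simpler device is what gives the strong segregation between layers described earlier in the deck.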
References
• “Medical Devices – The Therac-25”, Leveson, Nancy. http://sunnyday.mit.edu/papers/therac.pdf
• “An Investigation of the Therac-25 Accidents”, Leveson, Nancy and Turner, Clark S., IEEE Computer, Vol. 26, No. 7, July 1993, pp. 18-41. http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Therac_1.html
• “Fatal Dose - Radiation Deaths linked to AECL Computer Errors”, Rose, Barbara Wade, Saturday Night (magazine), June 1994. http://www.ccnr.org/fatal_dose.html
• “Safety-Critical Computing: Hazards, Practices, Standards, and Regulation”, Jacky, Jonathan. http://staff.washington.edu/jon/pubs/safety-critical.html
• “Therac-25”, Wikipedia. http://en.wikipedia.org/wiki/Therac-25