Can We Trust the Computer? Case Study: The Therac-25. Based on an article in IEEE Computer, July 1993.
Introduction • More computers are being introduced into safety-critical systems • which results in more accidents • Some of the most widely reported accidents involved the Therac-25 • a radiation therapy machine • between June 1985 and January 1987 • Six known accidents involving massive overdoses • causing deaths and serious injuries • Worst accidents in the 35-year history of medical accelerators
Introduction(2) • The mistakes made are not unique to this manufacturer • they are fairly common in other safety-critical systems • “A significant amount of software for life-critical systems comes from small firms, especially in the medical industry; firms that fit the profile of those resistant to or uninformed of the principles of either system safety or software engineering.”
Introduction(3) • These problems are not limited to the medical industry • Common belief that a good engineer can build SW, regardless of whether they are trained in state-of-the-art SW-Engineering procedures • Many companies build safety-critical SW w/o using proper procedures from a SW-Eng and safety-engineering perspective
Genesis of the Therac-25 • Medical linear accelerators accelerate electrons to create high-energy beams that can destroy tumors w/ minimal impact on surrounding healthy tissue • shallow tissue is treated w/ accelerated electrons; deeper tissue requires converting the electron beam into X-ray photons
The Builders • In the early 1970s, Atomic Energy of Canada Limited (AECL) and a French company (CGR) collaborated to build linear accelerators • They developed 1) Therac-6, a 6-MeV accelerator producing only X rays, and • 2) Therac-20, a 20-MeV dual-mode (X rays or electrons) accelerator • SW functionality was limited in both machines; it only added convenience to the existing hardware • Industry-standard hardware safety features and interlocks were retained
Developing Therac-25(1) • In the mid 1970s, AECL developed a new double-pass concept for electron acceleration • needs less space to develop similar energy levels • AECL developed the Therac-25, a dual-mode linear accelerator • more compact and versatile than Therac-20 • Therac-6, Therac-20, and Therac-25 are all controlled by a PDP-11 • Therac-25 takes advantage of computer control from the outset, while Therac-6 and Therac-20 were designed around machines that already had histories of clinical use w/o computer control • Therac-25 SW has more responsibility for maintaining safety than the SW in the previous machines
Safety Issues: New and Old Theracs • Therac-20 had independent protective circuits to monitor electron-beam scanning • Therac-20 also had mechanical interlocks for policing the machine and ensuring safe operation • Therac-25 relies more on SW for these functions • AECL took advantage of the computer’s ability to control and monitor HW • decided not to duplicate all existing HW safety mechanisms and interlocks • This approach is becoming more common • companies choose to cut cost by avoiding extra HW interlocks and backups • perhaps because they are placing more faith in SW
Therac-25 Development • 1st hardwired Therac-25 developed in 1976 • Completely computerized commercial version available in late 1982 • In March 1983, AECL performed a safety analysis in the form of a fault tree and EXCLUDED SOFTWARE!
The Safety Analysis Report (before release of product) • Programming errors have been reduced by extensive testing on a HW simulator and under field conditions on teletherapy units. Any residual SW errors are not included in the analysis • Program SW does not degrade due to wear, fatigue, or reproduction process • Computer execution errors are caused by faulty HW components and by “soft” (random) errors induced by alpha particles and electromagnetic noise. • The fault tree does include computer failure, but only hardware failures • e.g., one OR gate leading to the event of getting the wrong energy is labeled with a probability of 1E-11 • e.g., the gate leading to “Computer selects wrong mode” is labeled with a probability of 4E-9 • The report provides NO justification of either number!
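To make the effect of leaving software out of the fault tree concrete, here is a minimal sketch of how an OR gate combines the probabilities of independent input events. It is not AECL's analysis: feeding the report's two quoted figures into a single OR gate is an assumption made only for illustration, and the software-fault probability `P_SOFTWARE_FAULT` is an invented placeholder, not a measured or published value.

```python
from math import prod


def or_gate(probabilities):
    """Probability that at least one of several independent events occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# The two hardware figures quoted from the report, used here as sample inputs.
P_WRONG_ENERGY = 1e-11
P_WRONG_MODE = 4e-9

# ASSUMPTION: invented software-fault probability, purely for illustration.
P_SOFTWARE_FAULT = 1e-4

# Fault tree that counts hardware failures only (software excluded).
hardware_only = or_gate([P_WRONG_ENERGY, P_WRONG_MODE])

# Same gate with one hypothetical software branch added.
with_software = or_gate([P_WRONG_ENERGY, P_WRONG_MODE, P_SOFTWARE_FAULT])

print(f"hardware only : {hardware_only:.3e}")
print(f"with software : {with_software:.3e}")
```

Under these assumed inputs the software branch dominates the result, which is why a tree that omits software can look far safer than the system it describes.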
Therac-25 Software Development and Design • SW for Therac-25 developed by a single person using PDP-11 assembly language • Developed over several years • SW “evolved” from Therac-6 (which was started in 1972) • Very little SW documentation produced during development • AECL also had an apparent lack of documentation on SW specifications and a SW test plan
Therac-25 SW Testing • Manufacturer said the HW and SW were “tested and exercised separately or together over many years” • In a deposition, the QA manager explained that testing was done in two parts • a “small amount” of SW testing was done on a simulator • most was done on the system • Reports indicate that unit and SW testing was minimal • Most testing effort was directed at the integrated system test • The same QA manager stated at a Therac-25 users meeting that the SW was tested for 2,700 hours • Under questioning by users, he clarified this as “2700 hours of use” • The programmer left AECL in 1986, and little is known about the programmer • AECL employees could not provide any information about the programmer’s educational background or experience
How it Operates • SW is responsible for monitoring machine status • accepts input about the treatment desired and sets the machine up for treatment • turns the beam on in response to an operator command • turns the beam off when treatment is completed, when the operator commands it, OR when a malfunction is detected • Unit has an interlock system designed to remove power to the unit when there is a HW malfunction • Computer monitors the interlock system and provides diagnostic messages • depending on the fault, the computer either prevents a treatment from starting OR, if treatment is in progress, creates a pause or suspension of treatment
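The control flow described on this slide can be sketched as a small state machine. This is a hedged illustration, not AECL's code: the class `TreatmentController`, its method names, the `State` values, and the rule that a `severe` fault suspends while any other fault pauses are all hypothetical choices made for the example.

```python
from enum import Enum, auto


class State(Enum):
    SETUP = auto()      # accepting treatment input, positioning the machine
    READY = auto()      # set up, waiting for the operator's beam-on command
    BEAM_ON = auto()    # treatment in progress
    PAUSED = auto()     # treatment interrupted by a fault, may be resumed
    SUSPENDED = auto()  # treatment interrupted by a fault, needs a full reset
    DONE = auto()       # beam turned off normally


class TreatmentController:
    """Hypothetical controller that mirrors the behaviour listed on the slide."""

    def __init__(self):
        self.state = State.SETUP
        self.messages = []  # diagnostic messages shown to the operator

    def setup_complete(self):
        if self.state is State.SETUP:
            self.state = State.READY

    def operator_beam_on(self, interlocks_ok):
        """Start treatment only from READY and only if no interlock is tripped."""
        if self.state is not State.READY:
            return False
        if not interlocks_ok:
            self.messages.append("interlock fault: treatment not started")
            return False
        self.state = State.BEAM_ON
        return True

    def report_fault(self, severe):
        """A fault during treatment turns the beam off via pause or suspension."""
        if self.state is State.BEAM_ON:
            # ASSUMPTION: severity decides between suspension and pause.
            self.state = State.SUSPENDED if severe else State.PAUSED
            self.messages.append(f"fault during treatment: {self.state.name}")

    def beam_off(self):
        """Treatment completed or operator commanded beam off."""
        if self.state is State.BEAM_ON:
            self.state = State.DONE


controller = TreatmentController()
controller.setup_complete()
started = controller.operator_beam_on(interlocks_ok=True)
controller.report_fault(severe=False)
print(started, controller.state.name, controller.messages)
```

Note that in this sketch every safety decision lives in software, which is the design property the Therac-25 shared; an independent hardware interlock would sit outside this class and cut power regardless of what the controller believes.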
Accident History • Eleven Therac-25s were installed • 5 in US; 6 in Canada • 6 accidents involving massive overdoses to patients occurred between 1985 and 1987 • Machine recalled in 1987 for extensive design changes, including HW safeguards against SW errors • Related problems were found in the Therac-20 SW, but not recognized until after the Therac-25 accidents • They went undetected because the Therac-20 HW safety interlocks prevented injuries
Kennestone Regional Oncology Center, 1985 • Marietta, Georgia • Accident never carefully investigated; no admission that the Therac-25 caused the injury until much later • This despite the patient’s claims that she had been injured during treatment, • the obvious and severe radiation burns she suffered, and the suspicions of the radiation physicist involved
Kennestone(2) • After undergoing a lumpectomy to remove a malignant breast tumor, a 61-year-old woman was receiving follow-up radiation treatment to nearby lymph nodes • The Therac-25 had been operating at Kennestone for about 6 months • other Therac-25s had been operating w/o incident since 1983 • On June 3, 1985, the patient was set up for a 10-MeV electron treatment to the clavicle area • When the machine turned on, she felt a “tremendous force of heat… this red-hot sensation.” • When the technician came in, she said, “you burned me.” • The technician replied that it was not possible • No red marks on the patient at the time, but the area was “warm to the touch.”
Kennestone(3) • Patient went home and shortly afterward developed reddening and swelling in the center of the treatment area • her pain increased to the point that her shoulder “froze” and she experienced spasms • She was admitted to West Paces Ferry Hospital in Atlanta, but her oncologists continued to send her to Kennestone for Therac-25 treatments • 2 weeks later, the physicist at Kennestone noticed a matching reddening on her back, as though the burn had gone through her body • her shoulder was immobile, she experienced great pain, and her breast had to be removed because of the radiation burn • It was obvious that she had a radiation burn, but the hospital and doctors could not provide a satisfactory explanation • Kennestone physicist estimated she received one or two doses of radiation in the 15,000 to 20,000 rad range (typical doses are in the 200 rad range, so this was roughly 75 to 100 times higher)