1. 01/07/2012 Alan Clements 1 Getting it Right The Cost of Computer Failure
13 February 2008
2. Getting it Right
Alan Clements
School of Computing
University of Teesside
Middlesbrough
England
3. Overview of the Lecture
Computer systems control all aspects of life.
The failure of computer systems can have catastrophic effects.
Major aviation disasters have already taken place because of computer errors.
We must ensure that designers know how failures occur and how they can be prevented.
We have to balance new disasters due to computers against old disasters due to humans.
4. The Computer Aided Crash
My academic interest is computer architecture.
I propose to talk about the need for computer scientists to understand the consequences of the systems they design, by reference to computer-based accidents, particularly in the aviation industry.
5. Ethics and Professionalism
All professional organizations stress that students be taught ethics.
The emphasis on ethics is not for idealistic reasons; it is because of the high cost of ‘lapses in ethics’.
History is full of examples of the tremendous cost of neglecting ethics.
Corporate manslaughter is a new offence in the UK.
Corporate manslaughter is a crime that can be committed by a company in relation to a work-related death.
The offence is intrinsically linked to whether a senior manager - a "controlling mind and will" of the company - is guilty of manslaughter.
If the director or manager is found guilty, the company is guilty.
King Hammurabi decreed that if a building collapses and kills people, the builder shall be put to death.
6. Non-aviation Examples of Computer Error
Let’s look at two classic computer errors.
One involves a therapeutic X-ray machine and the other a surface-to-air missile.
Both errors were due to poor design rather than component failure.
7. The Therac-25 Incidents
The Therac-25 was a therapeutic X-ray machine designed to treat cancer sufferers.
It operated in two modes: X-ray and electron beam.
In the X-ray mode a powerful electron beam was aimed at a target to generate X rays.
8. The Therac-25 Incidents
If the machine was set to X-ray mode and the target was not engaged, the patient would receive a fatal dose of high-intensity radiation.
Early Therac models had electro-mechanical interlocks that made it impossible to energize the electron beam if the target were not in place.
The Therac-25 used a PDP-11 computer to perform all operations including moving the target into place when in the X-ray mode.
9. The Therac-25 Incidents
On several occasions the target was not rotated into position.
Patients received massive doses from the high-intensity electron beam, leading to both thermal and radiation damage.
Six accidents involving massive overdoses of radiation occurred between 1985 and 1987 before the machines were recalled.
This was a failure of design and of imagination. It was also a failure by the regulatory bodies to anticipate the problem and then to respond to it.
10. Reasons for the Therac-25 Failures
Software was reused from older models that had hardware interlocks.
The hardware provided no way for the software to verify that sensors were working correctly.
The operator interface was not correctly synchronized with the system operation. If the operator corrected an error too quickly, a race condition occurred. This was missed during testing, because operators weren’t fast enough for the problem to occur.
The software set a flag by incrementing it. If it was incremented too often, arithmetic overflow occurred and the software bypassed safety checks.
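The overflow described in the last point can be illustrated with a minimal sketch. It is not the Therac-25 code: the one-byte flag width, the function names and the rule "non-zero flag means the check must run" are all assumptions made for illustration.

```python
FLAG_BITS = 8  # assumption: the flag lives in a single unsigned byte


def increment_flag(flag: int) -> int:
    """Increment as an unsigned 8-bit register would: 255 + 1 wraps to 0."""
    return (flag + 1) % (1 << FLAG_BITS)


def check_required(flag: int) -> bool:
    """Hypothetical rule: a non-zero flag means the safety check must run."""
    return flag != 0


flag = 0
skipped_passes = []
for pass_number in range(1, 513):
    # Each pass "sets" the flag by incrementing it instead of assigning 1.
    flag = increment_flag(flag)
    if not check_required(flag):
        # The flag has wrapped to 0, so the check is silently bypassed.
        skipped_passes.append(pass_number)

print(skipped_passes)  # [256, 512]


def set_flag() -> int:
    """Safer alternative: assign a fixed non-zero value, which cannot wrap."""
    return 1
```

In this sketch the check is skipped on every 256th pass. Assigning a constant, as in `set_flag`, removes the dependence on how many times the flag has been set.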
11. A Comment by the FDA on the Therac Manual
“The operator's manual does not explain nor even address the malfunction codes...
The materials provided give no indication that these malfunctions could place a patient at risk.
The program does not advise the operator if a situation exists wherein the ion chambers used to monitor the patient are saturated, thus are beyond the measurement limits of the instrument.
This software package does not appear to contain a safety system to prevent parameters being entered and intermixed that would result in excessive radiation being delivered to the patient under treatment.”
12. The Patriot Missile Failure
The Patriot missile was used in the first Iraq war to destroy incoming Scud missiles.
The position of a Scud missile was calculated using a formula that involved time.
Patriot software measured time in increments of 0.1 second.
The decimal value 0.1 cannot be exactly represented in binary as it is a recurring fraction.
The Patriot used 24-bit arithmetic to represent time.
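The representation problem can be made concrete with a short sketch. It assumes that the 24-bit register kept 23 bits after the binary point and truncated rather than rounded; the slides do not state this, but it is the assumption commonly used in analyses of the incident. The names `FRACTION_BITS` and `chop` are illustrative.

```python
from fractions import Fraction

FRACTION_BITS = 23  # assumption: fractional bits kept by the 24-bit register


def chop(value: Fraction, bits: int = FRACTION_BITS) -> Fraction:
    """Truncate (not round) value to the given number of binary fraction bits."""
    scale = 2 ** bits
    return Fraction(value.numerator * scale // value.denominator, scale)


tenth = Fraction(1, 10)
stored_tenth = chop(tenth)              # what the register actually holds
error_per_tick = tenth - stored_tenth   # time lost on every 0.1 s tick

print(f"stored value  : {float(stored_tenth):.10f}")
print(f"error per tick: {float(error_per_tick):.3e} s")
```

Under these assumptions each tick is short by roughly 9.5e-8 s, which is negligible on its own but is added again on every tick.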
13. The Arithmetic Failure
The longer a Patriot system has been in operation since it was booted up, the greater the accumulated time error becomes.
On February 25, 1991, a Patriot system had been operating for over 100 consecutive hours.
The period of operation gave rise to an accumulated error of 0.34s.
A Scud flies at over 1,600 m/s and covers over 500 m in this time.
The Patriot missed the Scud.
The Scud struck a US Army barracks, killing 28 soldiers.
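The 0.34 s and 500 m figures on this slide can be checked with a few lines of arithmetic. The sketch assumes 0.1 was stored with 23 truncated binary fraction bits (not stated in the slides), exactly 100 hours of uptime and a Scud speed of exactly 1,600 m/s; the variable names are illustrative.

```python
from fractions import Fraction

FRACTION_BITS = 23        # assumption: fractional bits kept for the stored 0.1
TICKS_PER_SECOND = 10     # the clock counts in tenths of a second
HOURS_RUNNING = 100       # uptime quoted on the slide ("over 100" hours)
SCUD_SPEED_M_PER_S = 1600 # lower bound quoted on the slide

scale = 2 ** FRACTION_BITS
stored_tenth = Fraction(scale // 10, scale)     # 0.1 truncated to 23 bits
error_per_tick = Fraction(1, 10) - stored_tenth # seconds lost per tick

ticks = HOURS_RUNNING * 3600 * TICKS_PER_SECOND
clock_drift_s = float(error_per_tick * ticks)   # accumulated time error
tracking_error_m = clock_drift_s * SCUD_SPEED_M_PER_S

print(f"clock drift after {HOURS_RUNNING} h: {clock_drift_s:.4f} s")
print(f"distance covered by Scud: {tracking_error_m:.0f} m")
```

With these inputs the drift comes out at about 0.34 s and the Scud travels roughly 550 m in that time, consistent with the figures quoted above.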
14. Arithmetic Error
In this example, the failure occurred as a direct consequence of the imprecision of the arithmetic used by the tracking algorithm.
However, the designers should have anticipated that the Patriot would be left running for long periods without a restart.
15. Accident Rates by Aircraft Generation