Safety Critical Computer Systems - Open Questions and Approaches. Andreas Gerstinger Institute for Computer Technology February 16, 2007. Agenda. Safety-Critical Systems Project Partners Three research topics Safety Engineering Diversity Software Metrics Conclusion and Outlook.


Safety Critical Computer Systems - Open Questions and Approaches

Andreas Gerstinger

Institute for Computer Technology

February 16, 2007



Agenda

  • Safety-Critical Systems

  • Project Partners

  • Three research topics

    • Safety Engineering

    • Diversity

    • Software Metrics

  • Conclusion and Outlook



  • Safety-Critical Systems



Safety Critical Systems

  • A safety-critical computer system is a computer system whose failure may cause injury or death to human beings, or damage to the environment

  • Examples:

    • Aircraft control system (fly-by-wire,...)

    • Nuclear power station control system

    • Control systems in cars (anti-lock brakes,...)

    • Health systems (heart pacemakers,...)

    • Railway control systems

    • Communication systems

    • Wireless Sensor Networks Applications?



SYSARI Project

  • SYSARI = SYstem SAfety Research in Industry

  • Goal of the project

    • to conduct and promote research in system safety engineering and in the design and development of safety-critical systems

  • Close cooperation between ICT and Industry

    • One "shared" Employee (me)

    • Students conducting practical Diploma Theses

    • PhD Theses



What is Safety?

“The avoidance of death, injury or poor health to customers, employees, contractors and the general public; also avoidance of damage to property and the environment”

Safety is also defined as "freedom from unacceptable risk of harm"

A basic concept in System Safety Engineering is the avoidance of "hazards"

Safety is NOT an absolute quantity!



Safety vs. Security

  • These two concepts are often mixed up

  • In German, there is just one term for both!



SILs and Dangerous Failure Probability
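
The table on this slide was an image and did not survive in the transcript. As a rough illustration only, the sketch below assumes the slide showed the IEC 61508 bands for high-demand / continuous mode, expressed as the probability of a dangerous failure per hour (PFH). The name `sil_for_pfh` is hypothetical.

```python
# Assumption: IEC 61508 bands for high-demand / continuous mode,
# given as probability of a dangerous failure per hour (PFH).
# Each entry is (SIL, inclusive lower bound, exclusive upper bound).
SIL_BANDS = [
    (4, 1e-9, 1e-8),
    (3, 1e-8, 1e-7),
    (2, 1e-7, 1e-6),
    (1, 1e-6, 1e-5),
]


def sil_for_pfh(pfh):
    """Return the SIL whose band contains pfh, or None if no band matches."""
    for sil, low, high in SIL_BANDS:
        if low <= pfh < high:
            return sil
    return None
```

A lower tolerated failure rate maps to a higher SIL, which is why the railway signalling system described later (SIL 4) faces a stricter target than the voice communication system (SIL 2).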



  • Project Partners


Project Partner: Frequentis

  • Austrian High Tech company

  • World leader in air traffic control communication systems

  • 700 employees, company based in Vienna, customers all over the world

  • http://www.frequentis.com


Frequentis Voice Communication System

  • Enables communication between aircraft and controller

  • Communication link must never fail!

  • Requirements:

    • Safety

    • High Availability and Reliability

    • Fault Tolerance

  • Other domains:

    • railway

    • ambulance, police, fire brigade,...

    • maritime

  • Safety Integrity Level 2


Project Partner: Thales

  • French company

  • 68000 employees worldwide

  • Mission critical information systems

  • 25000 researchers

  • Nobel Prize in Physics 2007 awarded to Albert Fert, scientific director of Thales research lab

  • http://www.thalesgroup.com


Railway Signalling Systems

  • Signalling and Switching

  • Axle Counters

  • Applications for ETCS

  • An incorrect output may lead to an incorrect signal causing a major accident!

  • Safety Integrity Level 4 (highest)


(Old) Interlocking Systems

  • Mechanical / Electromechanical Systems



Signal Box / Interlocking Tower

  • Electric system with some electronics



Modern Signal Box / Interlocking Tower

  • Lots of electronics and computer systems



  • Safety Engineering



What is a Hazard?

  • Hazard

    • a physical condition of the platform that threatens the safety of personnel or of the platform itself, i.e. can lead to an accident

    • a condition of the platform that, unless mitigated, can develop into an accident through a sequence of normal events and actions

    • "an accident waiting to happen"

  • Examples

    • oil spilled on staircase

    • failed train detection system at an automatic railway level crossing

    • loss of thrust control on a jet engine

    • loss of communication

    • distorted communication

    • undetectably incorrect output



Hazard Severity Level (Example)



Hazard Probability Level (Example)



Risk Classification Scheme (Example)



Risk Class Definition (Example)



Risk Acceptability

  • Having identified the level of risk for the product we must determine how acceptable & tolerable that risk is

    • Regulator / Customer

    • Society

    • Operators

  • Decision criteria for risk acceptance / rejection

    • Absolute vs. relative risk (compare with previous, background)

    • Risk-cost trade-offs

    • Risk-benefit of technological options


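Risk Tolerability

[Flowchart: a Hazard is assessed for Severity and Probability, which together give the Risk. The Risk is checked against the Risk Criteria ("Tolerable?"). No: apply Risk Reduction Measures and reassess. Yes: no further reduction is required.]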
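
The severity, probability and risk-class tables on the preceding slides were images and are missing from the transcript. The sketch below shows how such a scheme and the tolerability loop could fit together. Everything in it is an assumption made for illustration: the category names, the scoring rule, the risk classes, and the idea that each risk reduction measure lowers the probability by one level.

```python
# Hypothetical category names, ordered from least to most severe / likely.
SEVERITIES = ["negligible", "marginal", "critical", "catastrophic"]
PROBABILITIES = ["improbable", "remote", "occasional", "frequent"]


def risk_class(severity, probability):
    """Hypothetical scheme: classify by the sum of the two category indices."""
    score = SEVERITIES.index(severity) + PROBABILITIES.index(probability)
    if score >= 5:
        return "intolerable"
    if score >= 3:
        return "undesirable"
    if score >= 2:
        return "tolerable"
    return "negligible"


def reduce_until_tolerable(severity, probability,
                           acceptable=("tolerable", "negligible")):
    """Apply risk reduction measures until the risk class is acceptable.

    Each measure is modelled (as an assumption) as lowering the probability
    by one level. Returns the final probability and the number of measures.
    """
    measures = 0
    while risk_class(severity, probability) not in acceptable:
        level = PROBABILITIES.index(probability)
        if level == 0:
            # Probability cannot be lowered further; severity must change.
            raise ValueError("risk cannot be made tolerable by these measures")
        probability = PROBABILITIES[level - 1]
        measures += 1
    return probability, measures
```

In this toy scheme a "catastrophic" hazard never becomes tolerable by lowering probability alone, so the function raises an error. That mirrors the point that some hazards need a design change rather than further mitigation.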



  • Diversity



Diversity

  • Goal: Fault Tolerance/Detection

  • Diversity is "a means of achieving all or part of the specified requirements in more than one independent and dissimilar manner."

  • Can tolerate/detect a wide range of faults

"The most certain and effectual check upon errors which arise in the process of computation, is to cause the same computations to be made by separate and independent computers; and this check is rendered still more decisive if they make their computations by different methods."

Dionysius Lardner, 1834



Layers of Diversity


Examples of Diversity

  • Specification Diversity

  • Design Diversity

  • Data Diversity

  • Time Diversity

  • Hardware Diversity

  • Compiler Diversity

  • Automated Systematic Diversity

  • Testing Diversity

  • Diverse Safety Arguments

Some faults to be targeted:

programming bugs, specification faults, compiler faults, CPU faults, random hardware faults (e.g. bit flips), security attacks,...


Compiler Diversity

  • Use of two diverse compilers to compile one common source code



Compiler Diversity: Issues

  • Targeted Faults:

    • Systematic compiler faults

    • Some Heisenbugs

    • Some systematic and permanent hardware faults (if executed on one board)

  • Issues:

    • To some degree possible with one compiler and different compile options (optimization on/off,…)

    • If compilers from different manufacturers are used, their independence must be ensured



Systematic Automatic Diversity

  • Artificial introduction of diversity to tolerate HW Faults

  • (Automatic) Transformation of program P to a semantically equivalent program P' which uses the HW differently

    • e.g. different memory areas, different registers, different comparisons,...

      if A=B then  →  if A-B = 0 then

      A or B  →  not (not A and not B)



Systematic Automatic Diversity

  • What can be "diversified":

    • memory usage

    • execution sequence

    • statement structures

    • array references

    • data coding

    • register usage

    • addressing modes

    • pointers

    • mathematical and logic rules



Systematic Automatic Diversity: Issues

  • Targeted Faults:

    • Systematic hardware faults

    • Permanent random hardware faults

  • Issues:

    • Can be performed at source code level or at assembly level

    • If performed at source code level, it must be ensured that the compiler does not "cancel out" the diversity

    • (Software) fault injection experiments showed an improvement by a factor of ~100 with respect to HW faults


Example: Diverse Calculation of Position

  • Position P can be calculated based on speedometer and accelerometer readings

  • Voter can also be implemented diversely

  • PositionA and PositionB could be transmitted in different formats
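
The diagram for this example is not part of the transcript. A minimal sketch of the idea, under stated assumptions: channel A integrates speedometer readings once, channel B integrates accelerometer readings twice, and a two-channel comparator stands in for the voter. The function names, the fixed sampling interval `dt`, the simple rectangle-rule integration and the tolerance are all illustrative choices.

```python
def position_from_speed(speeds, dt, p0=0.0):
    """Channel A: integrate speedometer readings once (rectangle rule)."""
    position = p0
    for v in speeds:
        position += v * dt
    return position


def position_from_acceleration(accels, dt, p0=0.0, v0=0.0):
    """Channel B: integrate accelerometer readings twice."""
    position, velocity = p0, v0
    for a in accels:
        velocity += a * dt
        position += velocity * dt
    return position


def vote(position_a, position_b, tolerance):
    """Accept the position only if both diverse channels agree."""
    if abs(position_a - position_b) > tolerance:
        raise RuntimeError("channels disagree; a safe state is required")
    return (position_a + position_b) / 2.0
```

With only two channels the comparator can detect a disagreement but cannot tell which channel is wrong; masking a faulty channel would need at least a third one.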



Open Issues

  • How can diversity be used most efficiently?

  • Can diversity be introduced automatically?

  • Which faults are detected/tolerated to which extent?

  • How can the quality of the diversity be measured?

  • Can diversity also be used to detect security intrusions?



  • Software Metrics


Software Metrics for Safety-Critical Systems

  • Problems

    • Which metrics should safety-critical software fulfill?

    • Which coding rules are good and useful?

    • What are the desired ranges for metrics?

    • Which metrics influence maintainability?



Some RAW Metrics...
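
The metrics table on this slide was an image and is not in the transcript. As a hedged illustration of what raw metrics can look like, the sketch below measures two metrics that the deck names elsewhere: return points and comment density. It analyses Python source purely for convenience. The function names and the definition of density (comment tokens per non-blank line) are assumptions, not the definitions used in the study.

```python
import ast
import io
import tokenize


def return_points(source):
    """Count the return statements in a piece of Python source code."""
    tree = ast.parse(source)
    return sum(isinstance(node, ast.Return) for node in ast.walk(tree))


def comment_density(source):
    """Assumed definition: comment tokens divided by non-blank lines."""
    non_blank = sum(1 for line in source.splitlines() if line.strip())
    if non_blank == 0:
        return 0.0
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    comments = sum(1 for tok in tokens if tok.type == tokenize.COMMENT)
    return comments / non_blank


# Small hypothetical input used to try the two functions.
SAMPLE = (
    "def f(x):\n"
    "    # guard against negative input\n"
    "    if x < 0:\n"
    "        return 0\n"
    "    return x  # identity otherwise\n"
)
```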



Outline of Method

  • Create a questionnaire with relevant questions regarding software quality and get answers from expert developers for various software packages they work with

  • Automatically measure potentially interesting metrics of the software packages

  • Correlate questionnaire responses with the measured metrics to find out which metric correlates with which property
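
The correlation step could look roughly like the sketch below. The transcript does not say which correlation measure was used, so the Pearson coefficient is an assumption, as are the function names and the data layout (one questionnaire rating and one value per metric for each software package).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


def correlate_metrics(ratings, metrics):
    """Correlate one questionnaire rating per package with each metric.

    ratings: list of expert ratings, one per software package.
    metrics: dict mapping a metric name to its measured values,
             in the same package order as ratings.
    """
    return {name: pearson(values, ratings) for name, values in metrics.items()}
```

A coefficient near +1 or -1 suggests that a metric tracks the experts' perception, while a value near 0 suggests it does not. With a small number of packages, any such result should be read as an indication rather than as proof.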



Graph 3: Code Clarity vs. Return Points



Graph 4: Internal Quality vs. CC



Summary of Results

  • Strongest correlation with perceived internal quality:

    • Comment density

    • Control Flow Anomalies

  • No correlation with perceived internal quality:

    • Cyclomatic Complexity

    • Average Method Size

    • Average File Size

    • ...



  • Conclusion and Outlook



Further Related Topics

  • Agile Methods in Safety Critical Development

  • Hazard Analysis Methods

  • Safety Standards

  • Safety of Operating Systems

  • COTS Components for Safety-Critical Systems

  • Safety Aspects of Modern Programming Languages (Java, C#.NET)

  • Fault Detection, Correction and Tolerance

  • Safety and Security Harmonisation

  • Linux in Safety-Critical Environments

  • Online Tests to detect hardware faults



Conclusion

  • Many open issues in this field...

  • All research activities in the SYSARI project are practically motivated

  • The number of safety-critical systems is increasing

  • International Standards play a vital role (e.g. IEC 61508)

    Contact:

    Andreas Gerstinger: [email protected]

