
Critical Systems Validation


Presentation Transcript


  1. Critical Systems Validation CIS 376 Bruce R. Maxim UM-Dearborn

  2. Validation Perspectives • Reliability validation • Does measured system reliability meet its specification? • Is system reliability good enough to satisfy users? • Safety validation • Does system operate so that accidents do not occur? • Are accident consequences minimized? • Security validation • Is system secure against external attack?

  3. Validation Techniques • Static techniques • design reviews and program inspections • mathematical arguments and proof • Dynamic techniques • statistical testing • scenario-based testing • run-time checking • Process validation • SE processes should minimize the chances of introducing system defects

  4. Static Validation Techniques • Concerned with analysis of documentation • Focus is on finding system errors and identifying potential problems that may arise during system operation • Documents may be prepared to support static validation • structured arguments • mathematical proofs

  5. Static Safety Validation Techniques • Demonstrating safety by testing is difficult • Testing all possible operational situations is impossible • Normal reviews for correctness may be supplemented by specific techniques intended to make sure unsafe situations never arise

  6. Safety Reviews • Are the intended system functions correct? • Is the structure maintainable and understandable? • Verify algorithm and data structure design against specification • Check code consistency with algorithm and data structure design • Review adequacy of system testing

  7. Review Tips • Keep software as simple as possible • Avoid error prone software constructs during implementation • Use information hiding to localize effects of data corruption • Make appropriate use of fault tolerant techniques

  8. Hazard-Driven Analysis • Effective safety assurance relies on hazard identification • Safety can be assured by • hazard avoidance • accident avoidance • protection systems • Safety reviews should demonstrate that one or more of these techniques have been applied to all identified hazards

  9. System Safety Case • It is normal practice for a formal safety case to be required for all safety-critical computer-based systems • A safety case presents a list of arguments, based on identified hazards, explaining why there is an acceptably low probability that these hazards will lead to an accident • Arguments can be based on formal proof, design rationale, safety proofs, and process factors

  10. Formal Methods and Validation • Specification validation • developing a formal model of a system specification often reveals errors and omissions • mathematical analysis of a formal specification is another way to discover specification problems • Formal verification • mathematical arguments are used to demonstrate that a program or design is consistent with its formal specification

  11. Formal Validation Problems • The formal model of the specification is not likely to be understood by the domain expert • this makes it hard to check that the formal model is an accurate representation of the system specification • a specification that is internally consistent but wrong is useless • Verification does not scale up • verification is a complex, error-prone process • the cost of verification increases exponentially with system size

  12. Formal Methods in Practice • Use of formal methods in specification writing and verification does not guarantee correctness • Use of formal methods helps to increase confidence in a system by demonstrating that some classes of errors are not present • Formal verification is only likely to be used in small critical system components • 5 or 6 KLOC seems to be the component size limit for current formal verification techniques

  13. Safety Proofs • Safety proofs are used to show that a system cannot reach an unsafe state • Correctness proofs are used to show that system code conforms to its specification • Safety proofs are based on proof by contradiction • Assume that an unsafe state can be reached • Show this assumption is contradicted by program code • Safety proofs may be presented graphically

  14. Safety Proof Construction • Establish safe exit conditions for a component • Starting with the end of the code, work backwards until all paths leading to the exit are identified • Assume the exit condition is false • Show that for each path leading to the exit, the assignments made during path execution contradict the assumption of a false exit condition

  15. Safety Validation • Design validation • the design is checked to ensure that no hazards arise that cannot be handled without causing an accident • Code validation • code is checked for conformance to specification and to ensure that the code is a true implementation of the design • Run-time validation • run-time checks monitor the system to make sure it does not enter an unsafe state during operation

  16. Example from Sommerville: Gas Warning System • System to warn of poisonous gas. Consists of a sensor, a controller, and an alarm • Two levels of gas are hazardous • Warning level - no immediate danger, but take action to reduce the level • Evacuate level - immediate danger. Evacuate the area • The controller takes air samples, computes the gas level, and then decides whether or not the alarm should be activated

  17. Is the gas sensor control code safe?

   Gas_level: GL_TYPE ;
   loop
      -- Take 100 samples of air
      Gas_level := 0.000 ;
      for i in 1..100 loop
         Gas_level := Gas_level + Gas_sensor.Read ;
      end loop ;
      Gas_level := Gas_level / 100 ;
      if Gas_level > Warning and Gas_level < Danger then
         Alarm := Warning ;
         Wait_for_reset ;
      elsif Gas_level > Danger then
         Alarm := Evacuate ;
         Wait_for_reset ;
      else
         Alarm := off ;
      end if ;
   end loop ;

  18. Graphical Safety Argument [Figure: a proof-by-contradiction tree for the gas controller. Assumed unsafe state: Gas_level > Warning and Alarm = off. Three paths reach the end of the loop body: Path 1 (Gas_level > Warning and Gas_level < Danger) sets Alarm = Warning, a contradiction; Path 2 (Gas_level > Danger) sets Alarm = Evacuate, a contradiction; Path 3 (the remaining else branch) sets Alarm = off, which does not contradict the assumed unsafe state.]

  19. Condition Checking • When Gas_level = Danger, neither the warning branch (which requires Gas_level < Danger) nor the evacuate branch (which requires Gas_level > Danger) is taken, so the else branch sets Alarm = off • This indicates a code problem, since danger exists and no alarm sounds
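A minimal correction (a sketch of mine, not from the slides) replaces the if-statement from slide 17 so that the boundary value sounds the evacuate alarm:

   -- Sketch of a fix: ">= Danger" closes the gap at Gas_level = Danger
   if Gas_level > Warning and Gas_level < Danger then
      Alarm := Warning ;
      Wait_for_reset ;
   elsif Gas_level >= Danger then   -- was "> Danger"
      Alarm := Evacuate ;
      Wait_for_reset ;
   else
      Alarm := off ;
   end if ;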

  20. Safety Assertions • Assertions should be included in the program indicating conditions that should hold at crucial lines of code • Assertions may be based on pre-computed limits for critical variables • Assertions may be used during formal program inspections or may be converted to run-time safety checks
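For the gas warning example, such an assertion might look like the following sketch; the use of Ada's pragma Assert, its placement at the end of the loop body, and the reuse of slide 17's names are my assumptions:

   -- Sketch: whenever the gas level is hazardous, some alarm must sound.
   -- Can be read during inspections or checked at run time.
   pragma Assert (Gas_level <= Warning or else Alarm /= off);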

  21. Dynamic Validation • Concerned with validating the system during its execution • Testing techniques • analyzing the system outside of its operational environment • Run-time checking • checking during normal execution that the system is operating within its dependability envelope

  22. Reliability Validation • Involves exercising the program to assess whether it has reached the required level of reliability or not • Can’t be done during normal defect testing process, because defect test data is not always typical of normal usage data • Statistical testing must be used where a statistically significant data sample based on simulated usage is used to assess reliability

  23. Statistical Testing • Used to test for reliability, not fault detection • Measuring the number of errors allows the reliability of the software to be predicted • Error seeding is one approach to measuring reliability • An acceptable level of reliability should be specified before testing begins and the software should be modified until that level is attained

  24. Estimating Number of Program Errors by Error Seeding • One member of the test team places a known number of errors in the program while other members try to find them. • Assumption: (s/S) = (n/N) • s = # seeded errors found during testing • S = # of seeded errors placed in program • n = # non-seeded (actual) errors found during testing • N = total # of non-seeded (actual) errors in program • This can be written as N = (S*n)/s

  25. Error Seeding Example • Using the error seeding assumptions • if 75 of 100 seeded errors are found • we believe that we have found 75% of the actual errors • If we also found 25 non-seeded errors, the estimated total # of actual errors in the program is N = (100 * 25)/75 = 33 1/3
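A quick sketch of this computation in Ada (the program and its names are mine; the numbers are slide 25's):

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Seed_Estimate is
      -- Hypothetical names: Ada is case-insensitive, so the slide's
      -- S and s cannot both be used as identifiers
      Seeded_Placed  : constant Float := 100.0;  -- S
      Seeded_Found   : constant Float := 75.0;   -- s
      Real_Found     : constant Float := 25.0;   -- n
      Estimated_Real : constant Float :=
         (Seeded_Placed * Real_Found) / Seeded_Found;  -- N = (S*n)/s
   begin
      -- prints roughly 33.3, matching the slide's 33 1/3
      Put_Line ("Estimated actual errors:" & Float'Image (Estimated_Real));
   end Seed_Estimate;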

  26. Confidence in Software • C = 1 • if n > N • meaning more actual errors were found during testing than the number of actual errors claimed for the program • C = S / (S + N + 1) • if n <= N • meaning the # of actual errors found during testing is at most the claimed # of actual errors • (both cases assume all S seeded errors have been found; slide 28 relaxes this)

  27. Confidence Example • To achieve a 98% confidence level that our program is bug free (N = 0), how many seeded errors would need to be introduced and found by our test team?
   S/(S + N + 1) = 98/100
   S/(S + 0 + 1) = 98/100
   S/(S + 1) = 98/100
   100S = 98S + 98
   2S = 98
   S = 49
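The same arithmetic as a runnable sketch (the program and its names are mine):

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Confidence_Check is
      Seeded  : constant Float := 49.0;  -- S, all assumed found
      Claimed : constant Float := 0.0;   -- N, claimed remaining actual errors
      C       : constant Float := Seeded / (Seeded + Claimed + 1.0);
   begin
      Put_Line ("Confidence:" & Float'Image (C));  -- 0.98 for S = 49, N = 0
   end Confidence_Check;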

  28. Better Approach • A more realistic estimate of confidence is based on the number of seeded errors actually found, whether or not all of them have been found • If n <= N then C = C(S, s-1) / C(S+N+1, N+s) • where C(a, b) is the binomial coefficient: C(S, s-1) = S! / ((s-1)! * (S-s+1)!) and C(S+N+1, N+s) = (S+N+1)! / ((N+s)! * (S+1-s)!)
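A sketch of this computation; the floating-point binomial function avoids integer overflow for coefficients this large, and the reuse of slide 25's numbers is my assumption:

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Binomial_Confidence is
      -- C(N, K) computed by the multiplicative formula in Long_Float
      function Choose (N, K : Natural) return Long_Float is
         Result : Long_Float := 1.0;
      begin
         for I in 1 .. K loop
            Result := Result * Long_Float (N - K + I) / Long_Float (I);
         end loop;
         return Result;
      end Choose;

      Seeded       : constant Natural := 100;  -- S: seeded errors placed
      Seeded_Found : constant Natural := 75;   -- s: seeded errors found
      Claimed      : constant Natural := 25;   -- N: claimed actual errors
      C : constant Long_Float :=
         Choose (Seeded, Seeded_Found - 1) /
         Choose (Seeded + Claimed + 1, Claimed + Seeded_Found);
   begin
      Put_Line ("Confidence:" & Long_Float'Image (C));
   end Binomial_Confidence;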

  29. Reliability Validation Process • Establish an operational profile for the system • Construct test data reflecting this operational profile • Test the system and observe both the number of failures and the times at which they occur • Compute the reliability after a statistically significant number of failures have been observed

  30. Operational Profiles • Set of test data whose frequency distribution matches the frequency distribution of these inputs in normal system usage • Can be generated from real data collected from an existing system or from assumptions made about system usage patterns • Should be generated automatically if possible (this can be difficult for interactive systems) • Hard to predict the pattern of unlikely inputs without some type of probability model
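As an illustration of automatic generation, a sketch that draws test-input classes with frequencies matching a profile; the classes and their weights are invented:

   with Ada.Text_IO; use Ada.Text_IO;
   with Ada.Numerics.Float_Random; use Ada.Numerics.Float_Random;

   procedure Profile_Sampler is
      -- Hypothetical input classes and their usage frequencies
      type Input_Class is (Query, Update, Report, Admin);
      Weights : constant array (Input_Class) of Float :=
         (0.60, 0.25, 0.10, 0.05);
      Gen : Generator;

      -- Pick a class with probability proportional to its weight
      function Draw return Input_Class is
         U : Float := Random (Gen);
      begin
         for C in Input_Class loop
            if U < Weights (C) then
               return C;
            end if;
            U := U - Weights (C);
         end loop;
         return Input_Class'Last;  -- guard against rounding error
      end Draw;
   begin
      Reset (Gen);
      for I in 1 .. 10 loop
         Put_Line (Input_Class'Image (Draw));  -- one simulated test input
      end loop;
   end Profile_Sampler;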

  31. Reliability Growth Models • Mathematical models of system reliability change over time as system is tested and defects are removed • Can be used to predict system reliability by using current process data and extrapolating the reliability using the model equations • Depends on the use of statistical testing to measure reliability for each system version

  32. Reliability Model Selection • There is no universally applicable growth model • Many reliability growth models have been proposed • Reliability does not always increase over time; some system changes may introduce new errors • Models predicting equal steps between releases are not likely to be correct • Reliability growth rates tend to slow down over time as frequently occurring faults are removed from the software (90/10 rule)

  33. Exponential Growth Models • Software reliability growth models fall into two major categories • time between failure models (MTBF) • fault count models (faults or time normalized rates) • Reliability growth models are usually based on formal testing data and several curves may need to be checked against the actual test results. • The exponential distribution is the simplest and most important distribution in reliability and survival studies.

  34. Model Evaluation Criteria • Predictive validity • external means of verifying model correctness • Capability • model does what it needs to do • Quality of assumptions • Applicability • appropriateness to software design • Simplicity • easy to collect data and easy to use

  35. Modeling Process • Examine data • Select a model to fit the data • Estimate model parameters • Perform goodness of fit test • Make reliability predictions based on fitted model

  36. Reliability Model Fitting

  37. Exponential Model • CDF = cumulative distribution function F(t) = 1 - exp(-t/c) = 1 - exp(-lambda*t) • PDF = probability density function f(t) = (1/c) * exp(-t/c) = lambda * exp(-lambda*t) • lambda = 1/c = error detection (hazard) rate, t = time • In real applications we also need K, the total number of defects, in addition to lambda
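A sketch evaluating these formulas; the lambda and t values are invented:

   with Ada.Text_IO; use Ada.Text_IO;
   with Ada.Numerics.Elementary_Functions; use Ada.Numerics.Elementary_Functions;

   procedure Exponential_Model is
      Lambda : constant Float := 0.05;  -- hazard rate, lambda = 1/c
      T      : constant Float := 10.0;  -- time
      F : constant Float := 1.0 - Exp (-Lambda * T);    -- F(t): P(failure by t)
      D : constant Float := Lambda * Exp (-Lambda * T); -- f(t) = dF/dt
      R : constant Float := Exp (-Lambda * T);          -- reliability R(t) = 1 - F(t)
   begin
      Put_Line ("F(t) =" & Float'Image (F));
      Put_Line ("f(t) =" & Float'Image (D));
      Put_Line ("R(t) =" & Float'Image (R));
   end Exponential_Model;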

  38. Exponential Distribution Reliability Model Considerations • The more precise the input data, the better the outcome • The more data points available, the better the model will perform • When using calendar time for large projects, you need to verify homogeneity of testing effort • person hours per time unit • test cases run • variations executed

  39. To Normalize Testing Data • Calculate the average # of person-hours of testing per week • Compute defect rates for each n person-hours of testing • Use the allocated defect data as weekly input data for the model
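A sketch of the rate computation; the weekly figures are invented:

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Normalize_Defects is
      -- Hypothetical weekly testing effort and raw defect counts
      Weeks   : constant := 4;
      Hours   : constant array (1 .. Weeks) of Float := (120.0, 80.0, 150.0, 100.0);
      Defects : constant array (1 .. Weeks) of Float := (9.0, 4.0, 6.0, 3.0);
   begin
      for W in 1 .. Weeks loop
         -- defects per 100 person-hours, comparable across weeks
         Put_Line ("Week" & Integer'Image (W) & ":" &
                   Float'Image (100.0 * Defects (W) / Hours (W)) &
                   " defects per 100 person-hours");
      end loop;
   end Normalize_Defects;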

  40. Reliability Validation Problems • Operational profile uncertainty • does the operational profile reflect actual usage? • High cost of test data generation • building statistical usage models and generating profile-based test data is labor intensive work • Statistical uncertainty for high-reliability systems • it may be impossible to generate enough failures to draw statistically valid conclusions (e.g. missile defense or nuclear control systems)

  41. Security Validation • Similar to safety validation in that the goal is to demonstrate that system cannot enter an insecure (or unsafe) state • The key differences between security and safety are • safety problems are accidental • security problems are deliberate • security problems tend to be generic • safety problems tend to be application domain specific

  42. Security Validation Techniques • Experience-based validation • system is reviewed and analyzed in terms of the types of attack known to the validation team • Tool-based validation • security tools (e.g. password checkers) are used to analyze system in operation • Tiger teams • teams try to breach security by simulating attacks on the system
