
Software Testing, Fault Injection, and Black Balls and Urns



Presentation Transcript


  1. Software Testing, Fault Injection, and Black Balls and Urns Jeffrey Voas, PhD, FIEEE, FAAAS Computer Scientist Jeff.Voas@nist.gov J.voas@ieee.org

  2. Part 1: Why Software Testing According to Operational Profiles Is Not Sufficient

  3. Terminology Error Fault, Defect, Bug, Flaw Failure Execution Infection Propagation

  4. Dispel Myth #1: Models. The myth: that software reliability models are capable of guaranteeing that software will always behave with some level of pre-specified, high integrity (e.g., safety-critical, mission-critical, business-critical, etc.).

  5. Dispel Myth #2: Testing. The myth: that traditional operational reliability testing (prior to software release) is sufficient to determine that the software will always behave with high integrity.

  6. Myth #1: Software Reliability Models

  7. [Chart: failure intensity versus time, the curve used to project reliability]

  8. Problem 1: Time • Software does not wear out over time. If it is logically incorrect today, it will be logically incorrect tomorrow. • Testing some systems for 10,000 hours means a lot; for other systems, it means little. • Models need to consider the quality of the test cases and complexity of the software (e.g., 1 LOC versus 1M LOC, etc.).

  9. Problem 2: Mass-Marketed Software Operational Profile
• Established definition. Operational profile is:
1. The set of input events that the software will receive during execution, along with the probability that the events will occur.
(The probability density function is simply the derivative of the cumulative distribution function.)
• Modified definition. Operational profile (usage) is:
1. The set of input events that the software will receive during execution, along with the probability that the events will occur.
2. The set of context-sensitive input events generated by the external hardware and software systems that the software can interact with during execution. This is the configuration (C) and machine (M).
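
Sampling test inputs according to the established definition can be sketched in a few lines of Python; the event names, probabilities, and function name below are illustrative assumptions, not values from the slides:

```python
import random

# Hypothetical operational profile: input event -> occurrence probability
# (probabilities sum to 1). All names and numbers here are assumptions.
profile = {"login": 0.70, "query": 0.25, "admin_reset": 0.05}

def sample_inputs(profile, n, seed=0):
    """Draw n test inputs weighted by the operational profile."""
    rng = random.Random(seed)
    events = list(profile)
    weights = [profile[e] for e in events]
    return rng.choices(events, weights=weights, k=n)

tests = sample_inputs(profile, 1000)
# Rare events ("admin_reset", 5%) seldom appear in the suite, so faults
# triggered only by them are easy for profile-driven testing to miss.
```

This is exactly why the slides argue profile-driven testing under-exercises rare behavior: the test distribution mirrors the usage distribution.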

  10. Problem 3: Conflicting Results R1(r, y) = 0.99, R2(r, y) = 0.90, R3(r, y) = 0.94

  11. Three United States Regulatory Positions
• U.S. Nuclear Regulatory Commission: not for, not against; it is up to the reviewer
• U.S. Food and Drug Administration: 510(k) never mentions it
• U.S. Federal Aviation Administration, Standard DO-178B: “… currently available methods do not provide results in which confidence can be placed …” [Section 12.3.4]

  12. The overall problem with software reliability models is that, while they are quantitative, their results should probably only be trusted in a qualitative manner, such as for deciding when to halt testing (MTTF). They are great for trend analysis over time.

  13. Myth #2: Operational Reliability Testing – Legal Inputs

  14. Operational Reliability Testing: Repeated Trials

  15. Operational testing would be sufficient if exhaustive testing under every possible operational input scenario (legal or anomalous) could be performed.

  16. Legal vs. Anomalous Scenarios
Input requirements: x > 4 and y < 300 and z = true
Legal: x = 5 and y = 200 and z = true
Anomalous: x = 0 and y = 299 and z = true
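
The slide’s input requirements can be written as an executable predicate; this minimal Python sketch (the function name is an assumption) checks both of the slide’s examples:

```python
def is_legal(x, y, z):
    """Input-requirements predicate from the slide:
    x > 4 and y < 300 and z = true."""
    return x > 4 and y < 300 and z is True

print(is_legal(5, 200, True))   # legal example -> True
print(is_legal(0, 299, True))   # anomalous (x violates x > 4) -> False
```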

  17. Difficulty Exhaustively testing even the legal inputs is generally infeasible (2^32 × 2^32 = 2^64 input vectors for just two 32-bit inputs), and thus we are forced to rely on predictors such as software reliability models to quantify “reliability,” MTBF, when to halt testing, etc. And this infeasibility problem is the same for anomalous inputs, and possibly worse!
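
A back-of-the-envelope calculation shows why exhausting even two 32-bit inputs is hopeless (the test rate of one billion tests per second is an assumption):

```python
# Size of the legal input space for two independent 32-bit inputs:
space = 2**32 * 2**32
assert space == 2**64

tests_per_second = 1e9                       # assumed: a billion tests/second
years = space / tests_per_second / (3600 * 24 * 365)
print(round(years))                          # roughly 585 years to exhaust
```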

  18. Large vs. Larger [Diagram: the legal input space nested inside the far larger anomalous input space] The larger space is almost always ignored during specification and V&V.

  19. Question: Why Then Have Software Reliability Models and Perform Reliability Testing?

  20. The “Culprit” Phenomenon: Software Behavior’s Unpredictability
It is more difficult to directly measure software quality than to achieve it.

  21. Reasons for Unpredictability
Software systems are inherently unpredictable due to:
• Rare inputs
• Unanticipated events (faults and bad input data) that corrupt internal program states during execution
• Design oversights
• Non-exhaustive testing
These four issues collapse down to the problem of COMPLEXITY, both of the software and of the input domain.

  22. And in Software …
• The ability to pinpoint precisely how software will behave in the future requires, at a minimum, knowledge of the consequences of each fault in the software.
• This requires knowing about each fault, and knowing about each fault requires selecting the appropriate inputs that detect those faults during test. Not practical.
• Further, reliability testing can encourage faults of severe consequence to hide. Why? Because the most likely input events are the ones employed during testing.

  23. Black vs. White (balls are input vectors; the urn is the software)
• Black ball = input vector on which the software fails.
• White ball = input vector on which the software succeeds.
• For each input vector, the associated ball(s) representing it must be either white or black.
• If testing were exhaustive, 100% predictability would be achieved.

  24. Scenario 1: Software That Always Fails • This urn represents a software system that fails on every possible input.

  25. Scenario 2: Correct Code • This urn represents a software system that succeeds on every possible input.

  26. Scenario 3: Typical Code • This urn represents virtually all software in use today. • From a testing perspective, how can you tell this urn from the previous one?

  27. And thus the real problem from a testing perspective is …

  28. Fault Density (illustrated: one fault, density 3)
• A fault’s density is the number of inputs that cause failure for that specific fault.
• For illustration purposes, a fault’s density is the number of black balls that are hooked together by one chain.
• Low fault densities are the reason that faults hide from tests.

  29. Fault Density (illustrated: three faults, each with density 1) • This urn also represents a program that has three inputs that cause failure.

  30. ANDs vs. ORs Question In which urn is testing more likely to find all black balls (assuming the same ratio of black vs. white balls per urn)?
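
One way to answer the question is a Monte Carlo sketch comparing the two urns under a uniform test profile. The input-space size, test-suite size, and disjoint-block fault model below are all assumptions for illustration:

```python
import random

def p_find_all_faults(fault_sizes, n_inputs, n_tests, trials=2000, seed=1):
    """Estimate the probability that n_tests random test inputs expose
    every fault, where fault_sizes[i] is fault i's density (how many
    failure-causing inputs are chained to it). A fault is 'found' if
    any one of its chained inputs is tested."""
    rng = random.Random(seed)
    # Assign disjoint blocks of the input space to the faults.
    blocks, start = [], 0
    for size in fault_sizes:
        blocks.append(range(start, start + size))
        start += size
    hits = 0
    for _ in range(trials):
        tests = {rng.randrange(n_inputs) for _ in range(n_tests)}
        if all(any(b in tests for b in block) for block in blocks):
            hits += 1
    return hits / trials

# Same black/white ratio in both urns: 3 failing inputs out of 1000.
one_chained  = p_find_all_faults([3], 1000, 200)        # one fault, density 3
three_single = p_find_all_faults([1, 1, 1], 1000, 200)  # three faults, density 1
# one_chained comes out far higher: chained (high-density) faults are
# much easier for random testing to flush out than scattered ones.
```

Under these assumptions, the single high-density fault is found with high probability while finding all three density-1 faults is rare, which is the slide’s point about ANDs vs. ORs.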

  31. Two Key Points. Key Point 1: Different Operational Profiles Reshape Urns [Two charts: probability of selection (.25 to .5) over input vectors 0–3, one per profile] Different test profiles, different urns.

  32. Key Point 2: The Number of Faults May or May Not Reshape the Urn • That is, the number of faults does not necessarily impact the reliability of the software.

  33. [Chart: probability of selection over input vectors 0–3] One fault = 50% Pr[failure]

  34. [Chart: probability of selection over input vectors 0–3] Two faults = 50% Pr[failure]
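
The two charts reduce to a one-line calculation; the profile numbers below are assumptions chosen to reproduce the 50% figures, not values read off the slides:

```python
def pr_failure(profile, failing_inputs):
    """Pr[failure] = total profile probability mass on failure-causing inputs."""
    return sum(p for x, p in profile.items() if x in failing_inputs)

# Hypothetical selection probabilities over input vectors 0..3:
profile = {0: 0.50, 1: 0.25, 2: 0.25, 3: 0.00}

one_fault  = pr_failure(profile, {0})     # one fault, covering vector 0
two_faults = pr_failure(profile, {1, 2})  # two faults, covering vectors 1 and 2
print(one_fault, two_faults)              # both 0.5: fault count != reliability
```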

  35. Voas Position Statement Unfortunately, software engineering’s “current wisdom” is geared towards lowering the number of faults instead of increasing the size of the faults to magnify their detectability. This is the goal of software design-for-testability (DFT).

  36. And So the Question Becomes What can be done to more accurately predict how a software component/application will behave in the future?

  37. Answer Attempt to flush out rarely occurring behaviors that operational reliability testing is likely to overlook

  38. Recommendations
• Test with respect to the complete environment, using the broader definition of operational profile
• Perform off-nominal (rare input) testing – mangle and/or invert the “assumed” operational profile
• Perform software fault injection
• Combine ALL THREE ABOVE
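
Inverting the “assumed” operational profile can be sketched as reciprocal reweighting; the profile below is hypothetical, and other inversion schemes are possible:

```python
def invert_profile(profile):
    """Off-nominal testing sketch: weight each event by the reciprocal of
    its operational probability, so rare inputs get tested the most."""
    inv = {e: 1.0 / p for e, p in profile.items() if p > 0}
    total = sum(inv.values())
    return {e: w / total for e, w in inv.items()}

# Hypothetical assumed operational profile:
profile = {"login": 0.70, "query": 0.25, "admin_reset": 0.05}
inverted = invert_profile(profile)
# The rarest event ("admin_reset") now dominates the test distribution,
# flushing out behaviors that profile-driven testing would overlook.
```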

  39. Summary
• Reliability testing is necessary but not sufficient. Do not read more into the results of this form of testing than you should.
• Every software testing profile, including operational profiles, can allow faults of severe (but infrequent) consequence to hide by decreasing their densities. Therefore, not only is reliability testing insufficient, but its results can be misleading.
• Reliability testing is only one technique in the “bag of tricks” for assessing software integrity. Other techniques should be applied if the costs are tolerable.

  40. Part 2: Fault Injection
Introduce:
• An alternative form of dynamic software testing, contrasted with traditional reliability testing (along with several key applications),
• An interesting problem that occurs when test cases fail to reveal faults: why this occurs, and what, if anything, fault injection can do toward addressing it,
• How this technique applies to the interoperability (non-interference) problem and to the testing stoppage criteria problem.

  41. Software Fault Injection • Is a form of dynamic software testing • Is like “crash-testing” software • Code-based or interface based • Demonstrates the consequences of incorrect code or data • Is a “what if” game • The more you play “what if,” the more confident you become that your software can deal with anomalous and unanticipated events

  42. Reliability Testing vs. Fault Injection • Reliability testing executes the software under expected scenarios, and it tests for correct behavior • Fault injection executes the software under anomalous scenarios, and tests to see “if this bad thing happens, what are the consequences”? • Fault injection fills a gap in predicting software behaviors.

  43. Basic Fault Injection Process [Diagram: inputs from hardware or humans → software (within the environment/target system) → outputs to humans or hardware, with an “Acceptable?” check on both internal behavior and outputs]

  44. Acceptable “Acceptable” refers to output or internal behaviors that do not violate pre-defined definitions or predicates.

  45. Environment/Target System This is our modified definition. Operational profile should be:
1. The above definition, plus
2. The set of context-sensitive input events generated by the external hardware and software systems that the software can interact with during execution. This includes the configuration (C), machine (M), and human (H).

  46. Usage profile: Features used? Keystrokes? Files loaded?

  47. The “real” profile also includes the input signals from the “invisible users.” So the “real” profile is a mixture of these two spaces.

  48. Two Basic Ways to Implement Software FI
• Code/syntax mutation, e.g., replace x+1 with x+2
• Data mutation, e.g., replace x with perturb(x)
• State anomaly, e.g., corrupting a programmer variable, memory, time, etc.
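
A minimal sketch of data mutation in Python, assuming a hypothetical perturb noise model, target function, and acceptance predicate (none of these come from the slides):

```python
import random

def perturb(x, rng, scale=0.1):
    """Data mutation: corrupt a numeric value with relative noise.
    The noise model and scale are assumptions for illustration."""
    return x * (1 + rng.uniform(-scale, scale))

def target(a, b):
    """Stand-in for the code under test (hypothetical)."""
    return a / b

def inject_and_observe(a, b, trials=100, seed=0):
    """The 'what if' game: run the target on perturbed data and count
    outcomes that violate an acceptance predicate (assumed: 0 < out < 10)."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        out = target(perturb(a, rng), perturb(b, rng))
        if not (0 < out < 10):
            violations += 1
    return violations
```

Under these assumptions, `inject_and_observe(10, 2)` reports no violations, while a near-zero divisor drives every perturbed outcome outside the predicate, which is precisely the kind of consequence fault injection is meant to surface.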

  49. Two Key Decisions
• What anomalies should be injected? Depends on the application.
• What should be observed for? Acceptable behavior (per the earlier definition).

  50. Anomaly Definition: Corrupted internal state information that exists during software execution for only a snapshot in time. Anomalies are the final ingredients that precipitate software failure.
