ECE 453 – CS 447 – SE 465 Software Testing & Quality Assurance Instructor Kostas Kontogiannis

ECE 453 – CS 447 – SE 465 Software Testing & Quality AssuranceInstructorKostas Kontogiannis

Overview • Theory of Program Testing • Goodenough and Gerhart’s Theory • Weyuker and Ostrand’s Theory • Gourlay’s Theory

Goodenough and Gerhart • The theory provides: • A fundamental testing concept • Identification of few types of program errors • A framework for test data selection • Proposed in 1975 and published in an IEEE TSE paper

Initial Comments • In Goodenough’s theory, to find test data that reliably reveal errors, one must do the following: • Identify all conditions relevant to the correct operation of a program • Select test data to exercise all possible combinations of these conditions • The above idea of selecting test data leads to the following terms: • Test data: Test data are actual values from a program’s input domain that collectively satisfy some selection criterion • Test predicate: A test predicate is a description of conditions and combinations of conditions relevant to the program’s correct operation • Test predicates describe whataspects of a program are to be tested. Test data cause these aspects to be tested. • Test predicates are the motivating force for test data selection • Components of test predicates arise first and primarily from the specification of a program • As implementations are considered, further conditions and predicates may be added

Fundamental Concepts Input Domain D d P(d) Program P T The result of executing P with input d is denoted as P(d) T is a subset of the domain D

Terminology - 1 • OK(d): A predicate that is true if and only if P(d) is an acceptable outcome for the given input d. • SUCCESSFUL(T): For a given T that is a subset of D, T is a successful test set if and only if for every t in T, OK(t). • A test set T constitutes an ideal test if OK(t) for every t in T implies that OK(d) for every d in D. • An ideal test is interpreted as a test set that is a sample of the input domain and can be used to conclude on the outcome of the whole domain.

Terminology - 2 • Selection criterion C: A collection of predicates and properties that allow to select a particular test set data T form the domain D. • Reliable criterion C: Every test selected using a predicate in C is either successful, or no test selected by a predicate in C is successful. Reliability refers to consistency. • Valid criterion C: A selection criterion C is valid if and only if whenever program P is incorrect, then we can select through C a test set T that is not successful for P. • Theorem: If C is a reliable and valid criterion then any test case selected using C is an ideal test. • In practice it is difficult to find, if it exists, a reliable and valid criterion

Conditions for Reliable Criteria Selection • The minimal conditions that need be satisfied for a test criterion to be considered reliable are: • Every individual branching condition in the program graph must be represented by an equivalent condition in C • Every potential termination condition in the program (i.e. an overflow) must be represented by a condition in C • Every condition relevant to the correct operation of the program that is implied by the specification and knowledge of the program’s structure, must be represented as a condition in C • Test predicates must be independentt

Comments • The reasons that it is difficult (if not impossible) to find always a valid and reliable test include: • Faults are unknown apriori, so it is impossible to prove the reliability ans validity of a criterion. A criterion is guaranteed to be both reliable and valid is the one that selects the entire input domain. • Neither reliability nor validity is preserved during the debugging process, where faults keep disappearing (or new ones appear). • If the program P is correct (fault free), then any test will be successful and every selection criterion is reliable and valid • If the program P is not correct, there is in general n way of knowing whether a criterion is ideal without knowing the errors in P.

Terminology – 3 • Let D be the input domain for program P. Let C denote s set of selection predicates such that if d D and d satisfies the predicate c C then we say c(d) is true. • We can then define COMPLETE(C, T) as being a predicate such that: COMPLETE(C,T) = • However, the definitions of an ideal test and thoroughness of a test do not reveal a relationship between them. The theory attempts to establish this relationship.

Program Error Classification in Goodenough’s Theory • Logic Fault: Problems with program not the resources • Requirements fault • Design faults • Construction fault • Missing control-flow paths • Inappropriate path selection • Inappropriate or missing action • Performance fault

A Process towards Testing in Goodenough – Gerhart Theory • Let B a set of faults in a program P, which are revealed by an ideal test TI. • Also let the software tester to define a set of test predicates C1, and design a set of test cases T1, such that COMPLETE(C1, T1), and that these test cases T1, reveal a set of faults B1 that may be a subset of B. • Incrementally, the tester may identify a larger set of predicates C2 such that C1 is a subset of C2, and design a new set of test cases T2, such that T1 is a subset of T2, and COMPLETE(C2, T2), revealing faults B2 such that B1 is a subset of B2. • Continuing incrementally the process the objective is to identify a set of predicates Ck, a set of test cases Tk, that satisfy COMPLETE(Ck, Tk) and reveal Bk in a way that Tk approximates the ideal set TI

Failure Intensity Reduction Concept Initial failure intensity, l0 λ: Failure intensity λ0: Initial failure intensity at start of execution μ: Average total number of failures at a given point in time v0: Total number of failures over infinite time l: failure intensity Basic Log. v0 m: Mean failures exp.

Gourlay’s Theory • The theory assumes that the program specifications are correct, and the specification is the sole arbiter of the correctness of a program. The program is said to be correct if it satisfies its specification. • Gourlay’s theory aims to establish a relationship between three sets of entities namely specifications S, programs P, and Tests T. • We can then extend the OK predicate as follows: • OK(p, t, s) : The predicate is true if the result of testing p with t is judged to be successful with respect to specification s. We aim to make the predicate OK(p, t, s) true for every t in T, where T is a subset of T • In this context a program is correct with respect to its specifications denoted by CORR(p,s), iff OK(p, s, t) for every t in T.

Testing Systems - 1 • A testing system is defined as a collection of <P, S, T, CORR, OK> for which for every p, s, t in P, S, T (where P, S, T are subsets of P, S, T ) CORR(p, s) implies OK(p, s, t). • Given a testing system <P, S, T, CORR, OK> , we can define a new system <P, S, T’`, CORR, OK`>, where T’ ` is the set of all subsets of T’, and where OK’(p, s, t) if and only if • We call this system set construction system. The set construction corresponds to a test that consists of a set of trials, and success of the test as a whole depends on the success of all trials. Failure of any one run is enough to invalidate the test.

Testing Systems - 2 • An another type of testing system is the choice construction. • In particular, given a testing system <P, S, T, CORR, OK> , we can define a new system <P, S, T’`, CORR, OK`>, where T’ ` is the set of all subsets of T’, and where OK’(p, s, t) if and only if • The choice construction models the situation in which a test engineer is given a number of alternative ways of testing the program, all of which is assumed to be equivalent.

Test Methods • A test method can be considered as a function M: PXS T • That is, in the general case, a test method takes a program P, and a specification S, and produces test cases T (where P, S, T are subsets of P,S, T respectively). • Test methods can be: • Program Dependent T = M(P)  (white box testing) • Specification Dependent T = M(S)  (black box testing) • Expectation Dependent T = M(S’), where S’ are the expectations of the customers, or the view the customers have on the specifications  (Acceptance testing)

Power of Test Methods - 1 • A fundamental problem in testing is to assess whether one test method is better than another (in recovering faults). • Let M. N be two testing methods, and let FM, FN be the faults that can be discovered by M, N respectively. • For M to be at least as good as N, we must have the situation that whenever N finds a fault, so does M. In other words FN is a subset of FM

Power of Test Methods - 2 • Let TN, TM, be the test cases produced by methods N, M respectively. The “investigative” power of methods N, M can be classified in two cases: • Case 1: TN TM.. In this case, method M is at least as good as method N • Case 2: TN, TM overlap, but TN TM. This case suggests that TM does not totally contain TN and in order to compare their fault detection ability we execute program P under both test sets TN, TM. Let FN, FM be the sets of faults discovered by TN, TM. If FN FM then we say that method M is at least as good as N. • These two cases are depicted in the following figures.

Power of Test Methods - 3 S N TM FM P TN FN M P Case 1 TN S N FM P FN TM M P Case 2

ECE 453 – CS 447 – SE 465 Software Testing & Quality Assurance Instructor Kostas Kontogiannis