
Software Quality Assurance



  1. Software Quality Assurance William W. McMillan 16 March 2013

  2. Quality Assurance = Testing?

  3. Meaning of Quality • Error-free • How do we define an error? • Client is happy (we get paid!). • User is happy (we are loved!). • Stable (we won’t be bothered). • Doesn’t fail when needed (we aren’t sued). • Long-lasting (we can modify & add stuff later). • … ??

  4. How would you define software quality?

  5. Common Measures • Meets specifications • Safe • Testable • Maintainable • User-friendly • Users can be efficient • Meets standards • Portable • Learnable • Secure • Meets real requirements • Modular, reusable • Well-designed • Powerful (throughput) • Reliable

  6. Categories of SQA • Verification • Meets specifications? • “Did we build the product correctly?” • Validation • Meets real needs of client and users? • “Did we build the correct product?” • Livability Assessment • Can we stand to fix, read, update, reuse, port… this thing?

  7. Verification • We’ve developed user and client requirements (even if just in our heads). • We’ve developed a design and technical specifications (even if just in our heads). • We build a partial or whole system. • Verification determines if what we’ve built matches requirements and technical specs.

  8. Give an example of how you’ve verified some of your own code.

  9. Validation • We might verify a system to our satisfaction and deliver it. • But then the client and users find it to be: • Hard to use • Incomplete • Incorrect • What went wrong?

  10. Validation • The objective is to determine whether we’ve met the real needs of the client and users. • Requirements can be incorrect. • Why? • Some reasons: • Ambiguity of natural language • Changes in needs • We asked the wrong questions • Something lost in translation to design and specs

  11. Give an example of when your code was verified, but not valid.

  12. Livability • What would you like in a piece of software that you were going to be “married” to? • Broad category (previously discussed in course) • Modular • Reusable • Well documented • Portable (e.g., hardware specific stuff separated) • Modifiable (e.g., UI separate from “business” code) • Meets code conventions • …?

  13. Software Testing • For verification, validation, or livability? • At least for V & V. • “Testing” implies executing code. • I.e., it’s dynamic • Most agree that testing is necessary to assure software quality. • But the reliance on testing has been challenged (to be discussed later).

  14. Software Testing • Is it feasible to exhaustively test a program? • Say we have only 6 inputs and each one can take on one of 20 values. • That’s 20⁶ = 64 million possible input vectors. • How about if we have dozens or hundreds of inputs and the ranges of values are much wider? • Testing almost always has to sample the input space.
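
A minimal sketch of the arithmetic above (class and variable names are ours, not the slides'):

    public class InputSpace {
        public static void main(String[] args) {
            int inputs = 6;          // number of input parameters
            int values = 20;         // distinct values each can take
            long vectors = (long) Math.pow(values, inputs);
            System.out.println(vectors + " possible input vectors"); // 64000000
            // With, say, 30 inputs of 100 values each, the count is 100^30,
            // i.e., 10^60 vectors: exhaustive testing is hopeless.
        }
    }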

  15. Test-Driven Development • Used in agile methods. • Testing is part of development. • Define a new function or operation. • Define test cases for that component. • Develop the code until the tests are passed. • Future changes to system involve re-running previous tests to see what’s been broken.
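
A minimal sketch of one TDD cycle in JUnit 4 style; the Tuition class and its billing rule are hypothetical, invented only to illustrate writing the test first:

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    // Written BEFORE Tuition exists: these tests fail first, then we
    // implement just enough of Tuition.bill to make them pass, and we
    // keep them in the suite for later regression runs.
    public class TuitionTest {

        @Test
        public void chargesPerCreditHourBelowTwelveHours() {
            assertEquals(300 * 9, Tuition.bill(9));
        }

        @Test
        public void capsTheBillAtTwelveCreditHours() {
            assertEquals(300 * 12, Tuition.bill(15));
        }
    }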

  16. Test-Driven Development • Advantages of this approach? • Is this a kind of exhaustive testing? • In what sense is the testing process part of specification? • What if the test cases defined don’t cover the input domain? • What if the functionality defined is not really what the client and users want?

  17. Partition Testing • Divide up input space into equivalence classes • I.e., classes in which behavior of program is essentially the same. • Depends on domain knowledge and system requirements. • Sample each partition. • Special attention to boundaries between partitions.
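
A sketch of partition-based test selection for a hypothetical fare rule (minors ride free, adults pay 250): one representative per equivalence class, plus cases on either side of the boundary at age 18:

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class FarePartitionTest {
        // Hypothetical rule under test: age < 18 rides free, otherwise 250.
        static int fare(int age) { return age < 18 ? 0 : 250; }

        @Test public void minorPartition()       { assertEquals(0,   fare(7));  }
        @Test public void adultPartition()       { assertEquals(250, fare(40)); }
        // Boundary cases on either side of the partition edge at age 18:
        @Test public void justBelowTheBoundary() { assertEquals(0,   fare(17)); }
        @Test public void exactlyOnTheBoundary() { assertEquals(250, fare(18)); }
    }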

  18. Partition Testing [Diagram: the input space drawn as a plane partitioned along two axes: Age, with a boundary at 18, and Family Income, divided into Low / Medium / High.]

  19. Exercise Suppose a system is being developed to produce tuition bills from students’ class registrations. Each input object is one student’s class schedule for a single term. Each output is a bill sent to the student. What useful partitions of the input domain would you define? Within individual partitions, in what ways is the program’s functioning uniform? Define some data values that would put cases on “boundaries” between partitions.

  20. Random Testing • Sample randomly from input domain. • Uniform probability distribution is implied. • Many tests can be run automatically. • Can be combined with partition testing. • What are the advantages and disadvantages?
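
A minimal random-testing sketch, reusing the hypothetical fare rule from the partition example: draw inputs uniformly from the domain and check an invariant on each run.

    import java.util.Random;

    public class RandomFareTest {
        static int fare(int age) { return age < 18 ? 0 : 250; } // rule under test

        public static void main(String[] args) {
            Random rng = new Random(42);           // fixed seed: repeatable runs
            for (int i = 0; i < 100_000; i++) {
                int age = rng.nextInt(120);        // uniform over 0..119
                int price = fare(age);
                if (price != 0 && price != 250)    // invariant from the spec
                    throw new AssertionError("bad fare " + price + " at age " + age);
            }
            System.out.println("100,000 random cases passed");
        }
    }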

  21. Statistical Testing • Sample probabilistically, but not uniformly. • Have probability distribution(s) from: • Theoretical model • Past data • Sample test cases according to expected input distributions. • “Operational Profile”

  22. Statistical Testing • Say we’ve developed a new web site that delivers instruction to automobile technicians. • From past interactions with such services, we expect: • 65% of the user actions to be straight progression through the lessons • 20% to be answering self-test questions • 8% to be questions asked of the help system • 7% to be unexplained or confused

  23. Statistical Testing • We generate test cases for the system in proportion to these expectations. • What measure sometimes defined under non-functional requirements might this kind of testing yield? • Think of a low-level network-traffic function that might be addressed through statistical testing.
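
A sketch of sampling test actions in proportion to the profile on slide 22; the action names and the cumulative thresholds are ours:

    import java.util.Random;

    public class OperationalProfile {
        enum Action { LESSON_PROGRESS, SELF_TEST, HELP_QUERY, CONFUSED }

        // Draw one user action according to the 65 / 20 / 8 / 7 profile.
        static Action next(Random rng) {
            int p = rng.nextInt(100);                   // uniform 0..99
            if (p < 65) return Action.LESSON_PROGRESS;  // 65%
            if (p < 85) return Action.SELF_TEST;        // 20%
            if (p < 93) return Action.HELP_QUERY;       //  8%
            return Action.CONFUSED;                     //  7%
        }

        public static void main(String[] args) {
            Random rng = new Random();
            for (int i = 0; i < 10; i++) System.out.println(next(rng));
        }
    }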

  24. Stress Testing • Used for systems that require • Heavy data transmission • Many transactions (DB access, user events, etc.) • Heavy-duty computation • Usually is statistical testing (automatically generated data). • Try to break the system via heavy loads. • Monitor performance and bottlenecks. • Improve where necessary.
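
A bare-bones load-generation sketch; callSystemUnderTest is a placeholder for whatever transaction the real system performs:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class StressDriver {
        public static void main(String[] args) throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(200); // 200 concurrent "users"
            for (int i = 0; i < 100_000; i++) {
                pool.submit(() -> {
                    long t0 = System.nanoTime();
                    callSystemUnderTest();
                    long ms = (System.nanoTime() - t0) / 1_000_000;
                    if (ms > 500)                      // flag slow responses
                        System.out.println("slow response: " + ms + " ms");
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);  // wait for the load to finish
        }

        static void callSystemUnderTest() { /* placeholder transaction */ }
    }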

  25. What real system would benefit from stress testing? What non-functional requirements measures might be addressed through stress testing?

  26. Regression Testing • After code is added to a system under development… • … or a change is made to a deployed system… • We re-run previous test suites to see if an unintended side effect has broken something. • Some firms do a daily system build and regression testing to see what came off the rails. • Used constantly in agile methods.

  27. If regression testing frequently uncovers faults, what advice would you give the developers?

  28. Unit vs. Integration Testing • Some testing is aimed at single methods or one class. • Other testing targets whole systems or large increments. • We have to do both. • Top-down (use stubs in place of lower-level parts) • Bottom-up (use drivers in lieu of higher-level parts) • Integration issues: • Regression testing • Interfaces between parts • Coupling
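
A sketch of top-down integration with a stub; TaxService and Billing are illustrative names, not from the slides:

    // The real TaxService isn't built yet, so a canned stand-in lets us
    // test Billing (the higher-level unit) now.
    interface TaxService {
        double taxOn(double amount);
    }

    class TaxServiceStub implements TaxService {
        public double taxOn(double amount) { return amount * 0.06; } // canned answer
    }

    class Billing {
        private final TaxService tax;
        Billing(TaxService tax) { this.tax = tax; }
        double total(double amount) { return amount + tax.taxOn(amount); }
    }

A bottom-up driver is the mirror image: a throwaway main or test class written only to exercise the real TaxService before any Billing code exists.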

  29. What code integration problems have you encountered in the past?

  30. Mutation Testing • Aimed at getting an adequate test data set. • If the program works with these data, then you have confidence that the program is correct. • To see if the data set is adequate, try it with intentional mutations of the program. • The test should fail. If it doesn’t, you don’t have an adequate data set.

  31. Mutation Testing Definitions of propositions: D: The data set adequately tests the program. R: The program runs correctly. M: The program used in testing is a mutation of the target program. The line of reasoning: (M and D) → R' (the program should run incorrectly). By contraposition, R → (M and D)', i.e., R → (M' or D'): if the program does run correctly, we must not be using a mutation or the test data are inadequate.
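
One hand-made mutant of the hypothetical fare rule sketched under partition testing, to make the reasoning concrete:

    static int fare(int age)       { return age <  18 ? 0 : 250; } // original rule
    static int fareMutant(int age) { return age <= 18 ? 0 : 250; } // mutant: < flipped to <=
    // A data set that never tests age == 18 passes on both versions, so
    // the mutant survives: R holds while M holds, so D must be false.
    // The data set misses the boundary and is inadequate.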

  32. Coverage Testing • Test cases developed to maximally cover the code in some sense. • (Partition testing “covers” input partitions, but here it’s code coverage that is the goal.) • Systems have failed because some instructions were never executed in tests. • Might want to try to execute as many statements as possible in testing.

  33. Coverage Testing • Might want to ensure that every decision statement (if, switch, while, etc.) is executed. • Or that every pair consisting of a variable definition and its use in computation is covered. • Or that every pair consisting of a variable definition and its use in a decision is covered. • Use software tools for this kind of testing.
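
A small made-up method showing why "as many statements as possible" can still be a weak criterion:

    static String classify(int x) {
        String s = "small";
        if (x > 10)     s = "big";   // decision 1
        if (x % 2 == 0) s = s + "!"; // decision 2
        return s;
    }
    // classify(12) takes (true, true); classify(3) takes (false, false).
    // Those two tests execute every statement and both outcomes of each
    // decision, yet the (true, false) and (false, true) paths, and any
    // definition-use pairs along them, are never exercised.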

  34. What other kind of code coverage could be defined?

  35. McCabe Metric • Graph theoretic measure of code complexity. • Has implications for code coverage. • Turn all decisions into binary decisions. • if (x > 0) is a binary decision. • if (x > 0 && y < 10) needs to be broken into two decision “nodes” • All straight-line blocks of statements are made into single nodes.

  36. McCabe Metric • “Cyclomatic complexity” = # edges – # nodes + 2 • [Diagram: a control-flow graph with 9 edges and 8 nodes.] • Here, 9 – 8 + 2 = 3, so the McCabe metric is 3. • This equals the number of enclosed regions in the graph plus the background region.
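
Counting by hand on a small invented method: each binary decision adds one node to the graph, so for structured code V(G) = decisions + 1:

    static char grade(int score) {
        if (score >= 90) return 'A';  // decision 1
        if (score >= 80) return 'B';  // decision 2
        if (score >= 70) return 'C';  // decision 3
        return 'F';
    }
    // Three binary decisions give V(G) = 3 + 1 = 4, so at most four test
    // cases (e.g., 95, 85, 75, 50) are needed to execute every statement.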

  37. McCabe Metric • Gives maximum number of test cases needed to execute all statements. • What else is this measure good for?

  38. User Testing • Goals include determining: • Usability • Learnability • Correctness (verification) • Match with needs (validation) • Requirements (from prototypes) • Participants: • Real potential users • Handy stand-ins

  39. User Testing • Can be done • Throughout development • At delivery (part of acceptance testing) • After deployment • Users use an executable prototype, a functioning increment, or a complete system. • Higher investment in time and money allows more formal tests. • Even very casual user testing can be very beneficial.

  40. User Testing • Exploratory (“playing” with the system) • Requirements discovery • Style, likability • Semi-formal • Ask user to accomplish some task • Note questions, confusions, etc. • Formal • Controlled setting and method • Times and actions recorded

  41. User Testing • Formal testing should have clear research questions. • What are some examples of formal research questions you might ask about using a web content management system? • What measurements would you want to have? • How would you run a study in a controlled setting (say a usability lab)?

  42. Beta Testing • When have you had contact with this kind of testing? • Why do you think firms employ it? • What difficulties might be associated with this kind of testing? • If you were putting a product into beta testing, what would you do to make the effort pay off?

  43. Static Evaluation • Testing is dynamic, i.e., code is executed. • Static techniques do not involve running code. • The main approaches we’ll look at are: • Formal verification of code correctness • Code inspections and walkthroughs • Static code analysis • Automatic model checking

  44. What do we mean when we say that a technique, tool, or language is “formal”?

  45. Formal Verification • Code is a mathematical or logical entity. • It has well-defined syntax and semantics. • If specifications are formally stated, why can’t we use a formal (proof-based) method to determine whether the code will work to specification? • In this view, reliance on testing is seen as an embarrassment or as a sign of an insufficiently educated computer scientist.

  46. Formal Verification • State formal preconditions and post-conditions. • Basic strategy: preconditions → code → post-conditions • Need formal definitions of code semantics. • Simple example: // pre: x > 10 x = x + 5; // post: x > 15 (by addition, assignment, substitution)
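
A slightly longer sketch in the same style (our example, not the slide's), carrying an assertion through two statements:

    static int shifted(int x) {
        // pre: x > 10
        int y = 2 * x;   // by multiplication over the precondition: y > 20
        y = y - 5;       // by subtraction and assignment: y > 15
        // post: y > 15, established for EVERY x > 10 without running a test
        return y;
    }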

  47. Formal Verification • Requires sophisticated person to do. • Can be time consuming. • Proofs can contain human errors. • Doesn’t take into account things like data transmission times, sensor glitches, disk faults, etc. • Where would this be most valuable?

  48. Formal Verification vs. Unit Testing • Unit testing (say with JUnit) employs pre- and post-conditions (or assertions): • With these inputs, I should get so-and-so results. • But these are assertions about specific test cases, not general assertions. • A test case can succeed, but the code can be wrong. • In theory, formal verification proves correctness for all cases.
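
The unit-testing counterpart of the slide-46 example, sketched in JUnit 4; addFive and the chosen input are our illustration:

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class AddFiveTest {
        static int addFive(int x) { return x + 5; }  // code under test

        @Test
        public void elevenYieldsSixteen() {
            assertEquals(16, addFive(11));  // one sample with x > 10;
            // a green bar here does not prove that x + 5 > 15 holds for
            // every x > 10: that general claim needs a proof, not a test.
        }
    }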

  49. Code Inspections & Walkthroughs • Not exactly the same things, but we’ll combine them. • Procedure: • Code is written and distributed. • At the inspection meeting, a moderator leads, a presenter (not the author) presents the code, a scribe records, and inspectors comment on correctness and other features of the code. • Defects and places that need improvement are noted, and the author reworks the code.

  50. Code Inspections & Walkthroughs • Similarity to formal verification. • Specifications and the code semantics are central. • Presenter and/or author are trying to “prove” that the code is correct. • Inspectors are evaluating whether the details of the code will lead to desired outcomes. • Can be very effective in finding errors, but is not so formal that one needs special skills.
