CS115 Class 16: Testing (continued)


Presentation Transcript


  1. CS115 Class 16: Testing (continued) • Due today • Review: No Silver Bullet • For next time • Deliverable: Unit Tests – one per group member • Review: The New Methodology • start thinking about software process • how much “process” do you need? • is waterfall good? is agile better? • for what kind of project? • 10-week, UI-intensive, web services, critical infrastructure, evolving requirements, ...

  2. Standard Testing Questions • How shall we generate/select test cases? • Did this test execution succeed or fail? • How do we know when to stop testing? • budget • bug trends • statement coverage, branch coverage, etc.

  3. 2. Was this test execution correct?

  4. What is an Oracle? • Oracle = alternative implementation of the spec • Examples of oracles • The “eyeball oracle” • Expensive, not dependable, no automation • A prototype, or sub-optimal implementation • E.g., bubble sort as oracle for quicksort • [Diagram: the same input feeds both the Program and the Oracle; the Program’s output is compared against the Oracle’s correct output]
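
A minimal sketch of the oracle pattern in Java, assuming a hypothetical fastSort() under test and bubble sort as the slow-but-trusted oracle (run with java -ea to enable asserts):

     import java.util.Arrays;
     import java.util.Random;

     public class SortOracleTest {
         // Slow but simple oracle: bubble sort.
         static int[] bubbleSort(int[] a) {
             int[] b = a.clone();
             for (int i = 0; i < b.length; i++)
                 for (int j = 0; j + 1 < b.length - i; j++)
                     if (b[j] > b[j + 1]) { int t = b[j]; b[j] = b[j + 1]; b[j + 1] = t; }
             return b;
         }

         // Stand-in for the optimized implementation under test.
         static int[] fastSort(int[] a) {
             int[] b = a.clone();
             Arrays.sort(b);
             return b;
         }

         public static void main(String[] args) {
             Random rng = new Random(42);
             for (int trial = 0; trial < 1000; trial++) {
                 int[] input = rng.ints(10, -100, 100).toArray();
                 // Same input to both; the oracle defines the correct output.
                 assert Arrays.equals(fastSort(input), bubbleSort(input))
                     : "Mismatch on " + Arrays.toString(input);
             }
             System.out.println("All trials matched the oracle");
         }
     }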

  5. Record-and-Replay Oracles • Record prior runs • Test recording is usually very fragile • Breaks if anything in the environment changes • E.g., location, background color of a textbox • More generally, automation tools cannot generalize • They literally record exactly what happened • If anything changes, the test breaks • A hidden strength of manual testing • Because people run the tests, the ability to adapt them to slightly modified situations is built in

  6. Result Checking • Easy to check the result of some algorithms • E.g., computing roots of polynomials vs. checking that the result is correct • E.g., executing a query vs. checking that the results meet the conditions • Not easy to check that you got all the results, though! • [Diagram: the input and the Program’s output are fed together into a check step]
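
A sketch of result checking for the polynomial-roots example: finding a root is the hard direction, while verifying one is a single substitution. findRoot() is a hypothetical stand-in for the solver under test:

     public class RootCheck {
         // Hypothetical solver under test: one real root of ax^2 + bx + c = 0.
         static double findRoot(double a, double b, double c) {
             return (-b + Math.sqrt(b * b - 4 * a * c)) / (2 * a);
         }

         public static void main(String[] args) {
             double a = 1, b = -3, c = 2;   // roots are 1 and 2
             double r = findRoot(a, b, c);
             // Checking is easy: substitute the result back into the polynomial.
             double residual = a * r * r + b * r + c;
             assert Math.abs(residual) < 1e-9 : "Not a root: " + r;
             System.out.println("Root " + r + " verified, residual " + residual);
         }
     }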

  7. Assertions • Use assert(...) liberally • Documents important invariants • Makes your code self-checking • And does it on every execution! • You still have to worry about coverage • May need to write functions that check invariants • zcheck() – compiler directive that checks for a zero value of a loop count or dimension in the following statement (FORTRAN) • Opinion: most programmers don’t use assert enough
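
One common way to package such invariant checks is a private checkRep() method called after every mutation; the sorted-list class below is a hypothetical illustration, not from the slides:

     import java.util.ArrayList;
     import java.util.List;

     class SortedIntList {
         private final List<Integer> items = new ArrayList<>();

         void add(int x) {
             int i = 0;
             while (i < items.size() && items.get(i) < x) i++;
             items.add(i, x);
             checkRep();   // self-checking on every execution
         }

         // Representation invariant: items is sorted in ascending order.
         private void checkRep() {
             for (int i = 0; i + 1 < items.size(); i++)
                 assert items.get(i) <= items.get(i + 1) : "unsorted at index " + i;
         }
     }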

  8. Assert example

     switch (suit) {
       case Suit.CLUBS:    ... break;
       case Suit.DIAMONDS: ... break;
       case Suit.HEARTS:   ... break;
       case Suit.SPADES:   ... break;
       default: assert false : suit;  // programmer believes only four values possible for suit
     }

  9. Another assert example, for a control-flow invariant

     void foo() {
       for (...) {
         if (...) return;
       }
       assert false;  // Execution should never reach this point!
     }

  10. 3. How shall we generate/select tests? Could focus on code coverage. What else?

  11. Black and White Box Testing • Black-Box • Don’t “see inside” the code • Specification/requirements driven • Equivalence class, boundary, & output-driven tests • White-Box (sometimes “glass box”) • See “inside the box” • Source code driven • Statement, branch, path coverage

  12. Scope • Unit • focus on correctness of a module, component, or class • use both white- and black-box • Integration • focus on correctness of composition of units • mostly black-box, some white-box • System (entire application) • overall correctness, performance, robustness • black-box • Acceptance (purchase/deployment trigger) • fitness for use by a single customer • black-box and negotiation • Alpha/Beta (market-driven quality feedback)

  13. Random Testing • About ¼ of Unix utilities crash when fed random input strings • Up to 100,000 characters • See Fuzz Testing paper on web page • http://www.cs.wisc.edu/~bart/fuzz/fuzz.html
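
A minimal fuzzing sketch in the spirit of the Fuzz study: feed random strings (up to 100,000 characters) to the program under test and treat any uncaught exception as a crash. parse() is a hypothetical stand-in:

     import java.util.Random;

     public class MiniFuzz {
         // Stand-in for the utility under test.
         static void parse(String input) {
             // ... real parsing logic would go here ...
         }

         public static void main(String[] args) {
             Random rng = new Random();
             for (int trial = 0; trial < 10000; trial++) {
                 int len = rng.nextInt(100000);
                 StringBuilder sb = new StringBuilder(len);
                 for (int i = 0; i < len; i++)
                     sb.append((char) rng.nextInt(256));
                 try {
                     parse(sb.toString());
                 } catch (Throwable t) {
                     // Report the crashing input so the failure can be reproduced.
                     System.err.println("Crash on trial " + trial + ": " + t);
                 }
             }
         }
     }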

  14. A Favorite Bug • csh input: !0%8f • ! is the history-lookup operator • There is no command beginning with 0%8f • csh passes the error “0%8f: Not found” to an error-printing routine • Which prints it with printf() • Lesson: user input should never be used as a printf() format string

  15. What Fuzz Says About Unix • What sort of bugs does random testing find? • Buffer overruns • Format string errors • Signed/unsigned characters • Failure to handle return codes • Race conditions • Nearly all of these are problems with C! • Mostly fixed in Java • Except for races

  16. (Semi-)Exhaustive Testing • Folk wisdom • If there is a bug, there is a small test case that exhibits the bug • Example • You have a list processing function with a bug • What is the size of the smallest list that exhibits the bug? • Answer: probably much closer to 3 than to 1,000,000

  17. (Semi-)Exhaustive Testing • Idea • Run code systematically on small examples • Key is systematic • Try all examples up to a certain size • If the data space is too large for some values, randomize • E.g., all lists of length 5 or less • With randomly chosen integer element values
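
A sketch of this recipe: systematic over list length (0 through 5), randomized over element values, with a simple insertion sort as oracle; the sort under test is a stand-in:

     import java.util.Arrays;
     import java.util.Random;

     public class SmallScopeTest {
         // Trusted reference: simple insertion sort.
         static int[] oracleSort(int[] a) {
             int[] b = a.clone();
             for (int i = 1; i < b.length; i++)
                 for (int j = i; j > 0 && b[j - 1] > b[j]; j--) {
                     int t = b[j]; b[j] = b[j - 1]; b[j - 1] = t;
                 }
             return b;
         }

         public static void main(String[] args) {
             Random rng = new Random(7);
             // Systematic over the small structure (length)...
             for (int len = 0; len <= 5; len++) {
                 // ...randomized over the large data space (element values).
                 for (int trial = 0; trial < 100; trial++) {
                     int[] input = rng.ints(len, -10, 10).toArray();
                     int[] actual = input.clone();
                     Arrays.sort(actual);   // implementation under test
                     assert Arrays.equals(actual, oracleSort(input))
                         : "Failed on " + Arrays.toString(input);
                 }
             }
             System.out.println("All lists of length <= 5 passed");
         }
     }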

  18. Regression Testing • Idea • When you find a bug, • Write a test that exhibits the bug, • And always run that test when the code changes, • So that the bug doesn’t reappear • Without regression testing, it is surprising how often old bugs reoccur
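
A sketch of a regression test in JUnit; the midpoint() method and its overflow bug are hypothetical, but the pattern is exactly the slide's recipe: a test pinned to the input that once exhibited the bug:

     import static org.junit.Assert.assertEquals;
     import org.junit.Test;

     public class MidpointRegressionTest {
         // Code under test. The original (a + b) / 2 overflowed
         // when a + b exceeded Integer.MAX_VALUE; this is the fix.
         static int midpoint(int a, int b) {
             return a + (b - a) / 2;
         }

         // Regression test: run on every change so the bug cannot reappear.
         @Test
         public void midpointDoesNotOverflowOnLargeInputs() {
             // This exact input triggered the overflow before the fix.
             assertEquals(1_500_000_000, midpoint(1_000_000_000, 2_000_000_000));
         }
     }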

  19. Nightly Build • Build and regression test the system regularly • Every night • From the SVN repository • Why? Because it is easier to fix problems earlier • Easier to find the cause after one change than after 1,000 • Prevents new code from being built on top of buggy code • Helps build discipline: “test, then commit” • but testing may be slow

  20. Fault Models • “Intuition” • founded on a strong understanding/experience of software, this application, and its environment • common faults in • a particular language, e.g., buffer overruns in C • an implementation, e.g., race conditions in concurrent programs • a particular environment: OS, UI, FS, ... • a particular design style • sociological (management, process, context) • Develop through repeated • fault analysis & categorization • trend analysis (bug-tracking systems) • 80/20 rule: 20% of modules have 80% of errors

  21. Testing Exercise • [Diagram: a rectangle with corner points (x1,y1) and (x2,y2), and a point (x,y)] • A program is required to identify whether a rectangle contains a point • The point is identified by its x-y coordinates • The rectangle is identified by two corner points • Draw 3 to 5 interesting cases for this problem • When are two test cases “significantly different”?

  22. Equivalence Class Testing • Method • Identify the entire input data space • Partition it into different equivalence classes • Select a data element from each class • Run a test case using this data element • The goal of equivalence class testing is to select classes so that the behavior of the SUT (system under test) can be assumed correct for all data in a class if it is known to be correct for one element of that class • a “fault model”

  23. Equivalence Class Test Examples • Input space: integers from 1 to 100 • Partition: • normal input: [1, 100] • error input: [-infinity, 0], [101, infinity] • Test cases: one from each class: -1, 42, 200 • Input space: calendar dates • Partition: several dimensions needed here • valid and invalid dates, leap years, 30/31-day months, ... • Test cases: one from each class
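
A sketch of the [1, 100] example as executable checks, assuming a hypothetical inRange() validator under test:

     public class EquivalenceClassTest {
         // Hypothetical validator under test: accepts integers in [1, 100].
         static boolean inRange(int x) {
             return 1 <= x && x <= 100;
         }

         public static void main(String[] args) {
             // One representative per equivalence class.
             assert !inRange(-1);    // error class: [-infinity, 0]
             assert  inRange(42);    // normal class: [1, 100]
             assert !inRange(200);   // error class: [101, infinity]
             System.out.println("Equivalence class tests passed");
         }
     }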

  24. Boundary Value Analysis • Special case of equivalence class testing • Choose equivalence classes • Test data is also taken from the class boundaries • normally values on either side of each boundary • Previous example • partitions: [-infinity, 0] [1, 100] [101, infinity] • boundary values: 0, 1, 100, 101
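
Extending the inRange() sketch above with the boundary values; an off-by-one in the comparison (say, < instead of <=) slips past the mid-class value 42 but is caught here:

     // Values on either side of each partition boundary.
     assert !inRange(0);     // just below the lower boundary
     assert  inRange(1);     // lower boundary
     assert  inRange(100);   // upper boundary
     assert !inRange(101);   // just above the upper boundary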

  25. Test Planning Exercise Revisited • [Diagram: the same rectangle with corners (x1,y1), (x2,y2) and point (x,y)] • Reconsider the rectangle problem • Create test cases using equivalence class analysis • Invent a faulty implementation • Are your test cases adequate?

  26. Common faults/defects and strategies to use • Logical conditions incorrect – equivalence class/boundary conditions • Calculation in wrong part of loop – branch testing • Non-terminating loop – branch testing • Preconditions not set up – outside-of-bounds tests • Null conditions not handled – equivalence class/boundary conditions • Off-by-one – boundary conditions • Operator precedence – inspections • Inappropriate algorithm – inspections • Numerical algorithm incorrect – boundary conditions • Deadlocks/livelocks/race conditions – inspections

  27. Summary • Testing is hard • If done manually, also very expensive and boring • Use inspections! • They will save you time in testing and debugging • A number of techniques can make testing effective • Randomized testing • Exhaustive testing on small examples • Regression testing • Nightly build

  28. Design for Testability • Avoid unpredictable results • No unnecessary non-deterministic behavior • Design in self-checking • At appropriate places, have the system check its own work • Asserts • May require adding some redundancy to the code • Have a test interface • Minimize interactions between features • The number of interactions can easily grow huge • A rich breeding ground for bugs

  29. “Best Practices” Session • By now, you have developed some expertise in your particular specialization • (tester, coder, documenter, facilitator) • Group by specialization to discuss • what knowledge you’ve gained • what works, what doesn’t • tips to help other teams • Short (5 min) presentation at the end
