
Systematic Software Testing Using Test Abstractions


Presentation Transcript


  1. Darko Marinov. Systematic Software Testing Using Test Abstractions. RIO Summer School, Rio Cuarto, Argentina, February 2011

  2. Examples of Structurally Complex Data
  • Red-black tree
  • Web sites (a graph of pages s1–s5)
  • XML document and XPath query:
      <library>
        <book year="2009"> <title>T1</title> <author>A1</author> </book>
        <book year="2010"> <title>T2</title> <author>A2</author> </book>
      </library>
      /library/book[@year<2010]/title
  • Java program:
      class A { int f; }
      class B extends A { void m() { super.f = 0; } }

  3. Running Example: Inheritance Graphs
  • Java program:
      interface I { void m(); }
      interface J { void m(); }
      class A implements I { void m() {} }
      class B extends A implements I, J { void m() {} }
  • Inheritance graph: nodes I, J, A, B, with edges from each type to its supertypes
  • Properties: 1. DAG  2. Valid Java

  4. Testing Setup
  • Test generation produces inputs; the code under test maps inputs to outputs; a test oracle marks each output pass or fail
  • Examples of code under test:
  • Compiler or refactoring engine, input: programs
  • Abstract data type, input/output: data structures
  • Web traversal code, input: web topology
  • XML processing code, input/output: XML documents

  5. Test Generation
  • Manual: tester writes each test input by hand
  • Semi-automated: tester writes an input-set description, a tool generates the inputs
  • Fully automatic: a tool generates inputs directly from the code

  6. Test Abstractions • Each test abstraction describes a set of (structurally complex) test inputs • Example: inheritance graphs with up to N nodes • Workflow • Tester manually writes test abstractions • Tool automatically generates test inputs • Benefits • No need to manually write large number of test inputs • Useful for test generation, maintenance, and reuse

  7. Key Questions for Test Abstractions
  1. Which test inputs to generate?
  2. What language to use for test abstractions?
  3. How to generate test inputs from abstractions?
  4. How to check test outputs?
  5. How to map failures to faults?
  [Diagram: test abstraction + tool → inputs]

  8. Which Inputs to Generate? • Bounded-exhaustive testing • Generate all inputs within given (small) bounds • Rationale: small-scope hypothesis [Jackson & Damon ISSTA'96] • Found bugs in both academia and industry • Java compilers, model checkers… [Gligoric et al. ICSE'10] • Refactoring engines in Eclipse/NetBeans [Daniel et al. FSE'07] • Web traversal code from Google [Misailovic et al. FSE'07] • XPath compiler at Microsoft [Stobie ENTCS'05] • Fault-tree analyzer for NASA [Sullivan et al. ISSTA'04] • Constraint solver, network protocol [Khurshid & Marinov J-ASE'04]

  9. How to Describe/Generate Inputs?
  • Filtering test abstractions [SOFTMC'01, ASE'01, FME'02, ISSTA'02, OOPSLA'02, SAT'03, MIT-TR'03, J-ASE'04, SAT'05, LDTA'06, ALLOY'06, ICSE-DEMO'07, STEP'07, FSE'07, ISSTA'08, ICST'09, CSTVA'10]
  • Testers describe what properties test inputs satisfy
  • Tool: searches the bounded space through the filters to produce inputs
  • Generating test abstractions [FSE'07, FASE'09]
  • Testers describe how to generate test inputs
  • Tool: executes the generators within bounds to produce inputs

  10. Previously Explored Separately • Some properties easier/harder as filters/generators • Key challenge for combining: search vs. execution

  11. UDITA: Combined Both [ICSE’10] • Extends Java with non-deterministic choices • Found bugs in Eclipse/NetBeans/javac/JPF/UDITA

  12. Outline • Introduction • Example • Filtering Test Abstractions • Generating Test Abstractions • UDITA: Combined Filtering and Generating • Evaluation • Conclusions

  13. Example: Inheritance Graphs Properties
  • DAG
  • The supertype graph must have no (directed) cycle
  • ValidJava
  • Each class has at most one superclass
  • All supertypes of interfaces are interfaces

      class IG {
        Node[] nodes;
        int size;
        static class Node {
          Node[] supertypes;
          boolean isClass;
        }
      }
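  • Note: for concreteness, a minimal sketch (our assumption, not from the slides) of building the valid graph for "interface I {} class A implements I {}" with the IG model above:

      // Sketch: constructing one valid inheritance graph by hand.
      static IG exampleIG() {
        IG ig = new IG();
        IG.Node i = new IG.Node();
        i.supertypes = new IG.Node[0];        // I has no supertypes
        i.isClass = false;                    // I is an interface
        IG.Node a = new IG.Node();
        a.supertypes = new IG.Node[] { i };   // A implements I
        a.isClass = true;                     // A is a class
        ig.nodes = new IG.Node[] { i, a };
        ig.size = 2;
        return ig;
      }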

  14. Example Valid Inheritance Graphs
  [Figure: five valid graphs G1–G5 built from interfaces I, J and classes A, B, C, D]

  15. Example Invalid Inheritance Graphs
  [Figure: five invalid graphs G1–G5, one per violation]
  • cycle
  • interface extends class
  • class extends two classes
  • class extends interface
  • class implements class

  16. (In)valid Inputs • Valid inputs = desired inputs • Inputs on which tester wants to test code • Code may produce a regular output or an exception • E.g. for Java compiler • Interesting (il)legal Java programs • No precondition, input can be any text • E.g. for abstract data types • Encapsulated data structure representations • Inputs need to satisfy invariants • Invalid inputs = undesired inputs

  17. Outline • Introduction • Example • Filtering Test Abstractions • Generating Test Abstractions • UDITA: Combined Filtering and Generating • Evaluation • Conclusions

  18. Filtering Test Abstractions
  • Tool requires:
  • Filters: encode what properties inputs satisfy
  • Bounds
  • Tool provides:
  • All inputs within bounds that satisfy the properties
  [Diagram: filters + bounds → search → inputs]

  19. Language for Filters • Each filter takes an input that can be valid or invalid • Returns a boolean indicating validity • Experience from academia and industry showed that using a new language makes adoption hard • Write filters in standard implementation language (Java, C#, C++…) • Advantages • Familiar language • Existing development tools • Filters may be already present in code • Challenge: generate inputs from imperative filters

  20. Example Filter: DAG Property

      boolean isDAG(IG ig) {
        Set<Node> visited = new HashSet<Node>();
        Set<Node> path = new HashSet<Node>();
        if (ig.nodes == null || ig.size != ig.nodes.length) return false;
        for (Node n : ig.nodes)
          if (!visited.contains(n))
            if (!isAcyclic(n, path, visited)) return false;
        return true;
      }

      boolean isAcyclic(Node node, Set<Node> path, Set<Node> visited) {
        if (path.contains(node)) return false;       // node already on the DFS path: cycle
        path.add(node);
        visited.add(node);
        for (int i = 0; i < node.supertypes.length; i++) {
          Node s = node.supertypes[i];
          for (int j = 0; j < i; j++)                // reject duplicate supertypes
            if (s == node.supertypes[j]) return false;
          if (!isAcyclic(s, path, visited)) return false;
        }
        path.remove(node);
        return true;
      }
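  • Note: isValidJava, used by mainFilt on slide 24, is not shown in the transcript; a minimal sketch of how such a filter could encode the two ValidJava properties from slide 13 (the body is our assumption, not the original code):

      // Hypothetical sketch of the isValidJava filter (not from the slides).
      boolean isValidJava(IG ig) {
        for (Node n : ig.nodes) {
          int classSupertypes = 0;
          for (Node s : n.supertypes) {
            if (s.isClass) classSupertypes++;
            if (!n.isClass && s.isClass) return false;       // interface with a class supertype
          }
          if (n.isClass && classSupertypes > 1) return false; // class extends two classes
        }
        return true;
      }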

  21. Input Space
  • All possible object graphs with an IG root (obeying type declarations)
  • Natural input spaces relatively easy to enumerate
  • Sparse: # valid test inputs << # all object graphs

  22. Bounded-Exhaustive Generation • Finite (small) bounds for input space • Number of objects • Values of fields • Generate all valid inputs within given bounds • Eliminates systematic bias • Finds all bugs detectable within bounds • Avoid equivalent (isomorphic) inputs • Reduces the number of inputs • Preserves capability to find all bugs
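  • Note: one standard way to avoid isomorphic object graphs (a sketch of the general Korat-style technique, not necessarily UDITA's exact rule) is to let a reference field point only to null, an already-used object, or a single canonical fresh object:

      // Sketch: candidate values for one reference field, given the objects
      // already used and a fixed pool; offering two distinct "fresh" objects
      // would only produce isomorphic graphs, so exactly one is offered.
      List<Node> choicesFor(List<Node> used, List<Node> pool) {
        List<Node> choices = new ArrayList<Node>();
        choices.add(null);
        choices.addAll(used);
        if (used.size() < pool.size())
          choices.add(pool.get(used.size()));   // the one canonical fresh object
        return choices;
      }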

  23. Example Bounds • Specify number of objects for each class • 1 object for IG: { IG0 } • 3 objects for Node: { N0, N1, N2 } • Specify set of values for each field • For size: { 0, 1, 2, 3 } • For supertypes[i]: { null, N0, N1, N2 } • For isClass: { false, true } • Previously written with special library • In UDITA: non-deterministic calls for initialization

  24. Bounds Encoded in UDITA
  • Java with a few extensions for non-determinism

      IG initialize(int N) {
        IG ig = new IG();
        ig.size = N;
        ObjectPool<Node> pool = new ObjectPool<Node>(N);
        ig.nodes = new Node[N];
        for (int i = 0; i < N; i++)
          ig.nodes[i] = pool.getNew();
        for (Node n : ig.nodes) {
          int num = getInt(0, N - 1);
          n.supertypes = new Node[num];
          for (int j = 0; j < num; j++)
            n.supertypes[j] = pool.getAny();
          n.isClass = getBoolean();
        }
        return ig;
      }

      static void mainFilt(int N) {
        IG ig = initialize(N);
        assume(isDAG(ig));
        assume(isValidJava(ig));
        println(ig);
      }

  25. Example Input Space
  • 1 IG object, 3 Node objects: total 14 fields ("nodes" and array length not shown)
  • size ranges over {0, 1, 2, 3}; each node's s[0] and s[1] over {null, N0, N1, N2}; each isC over {false, true}
  • 4 × (4 × 4 × 2 × 3)^3 > 2^19 inputs, but < 25 valid

  26. Input Generation: Search • Naïve “solution” • Enumerate entire (finite) input space, run filter on each input, and generate input if filter returns true • Infeasible for sparse input spaces (#valid << #total) • Must reason about behavior of filters • Our previous work proposed a solution • Dynamically monitors execution of filters • Prunes large parts of input space for each execution • Avoids isomorphic inputs
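  • Note: for reference, the naive generate-and-filter baseline that this search improves upon might look like the sketch below (enumerateAll, emit, and bounds are hypothetical helpers):

      // Naive bounded-exhaustive generation: enumerate every candidate in the
      // bounded space and keep those the filters accept -- infeasible when
      // the space is sparse (> 2^19 candidates but < 25 valid for N = 3).
      for (IG candidate : enumerateAll(bounds)) {
        if (isDAG(candidate) && isValidJava(candidate))
          emit(candidate);
      }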

  27. Outline • Introduction • Example • Filtering Test Abstractions • Generating Test Abstractions • UDITA: Combined Filtering and Generating • Evaluation • Conclusions

  28. Generating Test Abstractions
  • Tool requires:
  • Generators: encode how to generate inputs
  • Bounds
  • Tool provides:
  • All inputs within bounds (that satisfy the properties)
  [Diagram: generators + bounds → execute → inputs]

  29. Example Generator: DAG Property

      void generateDAG(IG ig) {
        for (int i = 0; i < ig.nodes.length; i++) {
          int num = getInt(0, i);                  // node i gets at most i supertypes
          ig.nodes[i].supertypes = new Node[num];
          for (int j = 0, k = -1; j < num; j++) {
            k = getInt(k + 1, i - (num - j));      // strictly increasing indices < i
            ig.nodes[i].supertypes[j] = ig.nodes[k];
          }
        }
      }

      void mainGen(int N) {
        IG ig = initialize(N);
        generateDAG(ig);
        generateValidJava(ig);
        println(ig);
      }

  • Edges only point to lower-numbered nodes, so every generated graph is acyclic by construction
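  • Note: generateValidJava is called above but not shown in the transcript; a minimal sketch of one possible encoding (our assumption, not the original code) uses getBoolean plus a local assume, mixing generation with filtering, consistent with slide 31's remark that ValidJava is easier to filter than to generate:

      // Hypothetical sketch: assign isClass flags after generateDAG; supertypes
      // point to lower-numbered nodes, so their flags are set before node i's.
      void generateValidJava(IG ig) {
        for (int i = 0; i < ig.nodes.length; i++) {
          Node n = ig.nodes[i];
          n.isClass = getBoolean();
          int classSupertypes = 0;
          for (Node s : n.supertypes) {
            if (s.isClass) classSupertypes++;
            if (!n.isClass) assume(!s.isClass);         // interfaces extend only interfaces
          }
          if (n.isClass) assume(classSupertypes <= 1);  // at most one superclass
        }
      }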

  30. Outline • Introduction • Example • Filtering Test Abstractions • Generating Test Abstractions • UDITA: Combined Filtering and Generating • Evaluation • Conclusions

  31. Filtering vs. Generating: DAG Property
  • Compare the isDAG filter (slide 20) with the generateDAG generator (slide 29)
  • Filtering arguably harder than generating for DAG
  • 20+ vs. 10 LOC
  • However, the opposite holds for the ValidJava property

  32. UDITA Requirement: Allow Mixing
  [Diagram: filters + generators + bounds → UDITA → inputs]

  33. Other Requirements for UDITA • Language for test abstractions • Ease of use: naturally encode properties • Expressiveness: encode a wide range of properties • Compositionality: from small to large test abstractions • Familiarity: build on a popular language • Algorithms and tools for efficient generation of concrete tests from test abstractions • Key challenge: search (filtering) and execution (generating) are different paradigms

  34. UDITA Solution
  • Based on a popular language (Java) extended with non-deterministic choices
  • Base language allows writing filters for search/filtering and generators for execution/generating
  • Non-deterministic choices for bounds and enumeration
  • Non-determinism for primitive values: getInt/getBoolean
  • New language abstraction for objects: ObjectPool

      class ObjectPool<T> {
        public ObjectPool(int size, boolean includeNull) { ... }
        public T getAny() { ... }
        public T getNew() { ... }
      }
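  • Note: in initialize on slide 24 the two pool operations play distinct roles; a brief illustration (the comments are our paraphrase of the intended semantics):

      ObjectPool<Node> pool = new ObjectPool<Node>(3);
      Node a = pool.getNew();  // a fresh object, distinct from all returned so far
      Node b = pool.getAny();  // non-deterministically any pool object (may alias a);
                               // each choice is explored in a separate execution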

  35. UDITA Implementation • Implemented in Java PathFinder (JPF) • Explicit state model checker from NASA • JVM with backtracking capabilities • Has non-deterministic call: Verify.getInt(int lo, int hi) • Default/eager generation too slow • Efficient generation by delayed choice execution • Publicly available JPF extensionhttp://babelfish.arc.nasa.gov/trac/jpf/wiki/projects/jpf-delayed • Documentation on UDITA web page http://mir.cs.illinois.edu/udita
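  • Note: for readers unfamiliar with JPF, Verify lets ordinary Java code branch non-deterministically; a tiny standalone example (run under JPF, not a plain JVM; the Verify package name varies across JPF versions):

      import gov.nasa.jpf.vm.Verify;   // older JPF versions: gov.nasa.jpf.jvm.Verify

      public class Demo {
        public static void main(String[] args) {
          int x = Verify.getInt(0, 2);       // JPF backtracks to explore x = 0, 1, 2
          boolean b = Verify.getBoolean();   // and both values of b
          System.out.println(x + " " + b);   // six executions in total
        }
      }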

  36. Delayed Choice Execution
  • Postpone choice of values until needed
  • Lightweight symbolic execution
  • Avoids problems with full-blown symbolic execution
  • Even combined symbolic and concrete execution has problems, especially for complex object graphs

      Eager:                        Delayed:
      int x = getInt(0, N);         int x = Susp(0, N);   // suspended non-determinism
      ...                           ...
      y = x;                        y = x;
      ...                           force(y);             // the choice is made only here
      ... = y; // non-copy use      ... = y; // non-copy use
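  • Note: a minimal sketch of the idea behind a suspended choice (a hypothetical class of ours, not UDITA's implementation; assumes execution under JPF for the backtrackable Verify.getInt):

      // A suspended integer: copies freely, commits to a concrete value
      // only when first forced.
      class SuspendedInt {
        private final int lo, hi;
        private Integer value;                  // null until forced
        SuspendedInt(int lo, int hi) { this.lo = lo; this.hi = hi; }
        int force() {
          if (value == null)
            value = Verify.getInt(lo, hi);      // lazy, backtrackable choice
          return value;
        }
      }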

  37. Object Pools • Eager implementation of getNew/getAny by reduction to (eager) getInt • Simply making getInt delayed does not work because getNew is stateful • We developed a novel algorithm for delayed execution of object pools • Gives equivalent results as eager implementation • Polynomial-time algorithm to check satisfiability of getNew/getAny constraints (disequality from a set) • Previous work on symbolic execution typically encodes into constraints whose satisfiability is NP-hard (disjunctions)
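  • Note: to see why the check stays polynomial, consider the shape of the constraints a delayed pool accumulates (our illustration of the constraint shape described above, not the paper's algorithm):

      Node a = pool.getNew();   // constraint: a differs from all earlier returns (none yet)
      Node b = pool.getAny();   // no disequality constraint of its own
      Node c = pool.getNew();   // constraints: c != a, c != b
      // Each constraint is a conjunction of disequalities against a known set
      // of variables -- no disjunctions -- unlike the NP-hard encodings that
      // general symbolic execution tends to produce.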

  38. Outline • Introduction • Example • Filtering Test Abstractions • Generating Test Abstractions • UDITA: Combined Filtering and Generating • Evaluation • Conclusions

  39. Evaluation • UDITA • Compared Delayed and Eager execution • Compared with a previous Generating approach • Compared with Pex (white-box) • Case studies on testing with test abstractions • Some results with filtering test abstractions • Some results with generating test abstractions • Tested refactoring engines in Eclipse and NetBeans, Java compilers, JPF, and UDITA

  40. Eager vs. Delayed

  41. Generating vs. UDITA: LOC

  42. Generating vs. UDITA: Time

  43. UDITA vs. Pex • Compared UDITA with Pex • State-of-the-art symbolic execution engine from MSR • Uses a state-of-the-art constraint solver • Comparison on a set of data structures • White-box testing: tool monitors code under test • Finding seeded bugs • Result: object pools help Pex to find bugs • Summer internship to include object size in Pex

  44. Some Case Studies with Filtering • Filtering test abstractions used at Microsoft • Enabled finding numerous bugs in tested apps • XML tools • XPath compiler (10 code bugs, test-suite augmentation) • Serialization (3 code bugs, changing spec) • Web-service protocols • WS-Policy (13 code bugs, 6 problems in informal spec) • WS-Routing (1 code bug, 20 problems in informal spec) • Others • SSLStream • MSN Authentication

  45. Some Case Studies with Generating • Generating test abstractions used to test Eclipse and NetBeans refactoring engines [FSE'07] • Eight refactorings: target field, method, or class • Wrote about 50 generators • Reported 47 bugs • 21 in Eclipse: 20 confirmed by developers • 26 in NetBeans: 17 confirmed, 3 fixed, 5 duplicates, 1 won't fix • Found more but did not report duplicate or fixed ones • Parts of that work included in NetBeans

  46. Typical Testing Scenario
  • Create a model of test inputs (e.g., Java ASTs)
  • Write filters/generators for valid inputs
  • UDITA generates valid inputs
  • Translate from model to actual inputs (pretty-print)
  • Run on two code bases, compare outputs: pass if they agree, fail otherwise
  [Diagram: filters/generators + bounds → UDITA → pretty printer → one code vs. another code; e.g., both implementations evaluate /library/book[@year<2003]/title on a three-book <library> document and should print <title>T1</title><title>T2</title>]
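  • Note: a minimal sketch of the comparison step (hypothetical harness names such as runOn and reportFailure; not from the slides):

      // Differential testing: run two implementations on each generated input
      // and flag any disagreement as a candidate bug in one of them.
      for (String input : generatedInputs) {      // pretty-printed UDITA outputs
        String out1 = runOn(implementationA, input);
        String out2 = runOn(implementationB, input);
        if (!out1.equals(out2))
          reportFailure(input, out1, out2);
      }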

  47. Some More Bugs Found • Eclipse vs. NetBeans refactoring engines • 2 bugs in Eclipse and 2 bugs in NetBeans • Sun javac compiler vs. Eclipse Java compiler • 2 reports (still unconfirmed) for Sun javac • Java PathFinder vs. JVM • 6 bugs in JPF (plus 5 more in an older version) • UDITA Delayed vs. UDITA Eager • Applying tool on (parts of) itself • 1 bug in UDITA (patched since then)

  48. Example Bug
  • Generated program:

      import java.util.List;
      class A implements B, D {
        public List m() {
          List l = null; A a = null;
          l.add(a.m()); return l;
        }
      }
      interface D { public List m(); }
      interface B extends C { public List m(); }
      interface C { public List m(); }

  • Incorrect refactoring (note the inconsistent parameterization: l has type List<List<List>> while m returns List<List>):

      import java.util.List;
      class A implements B, D {
        public List<List> m() {
          List<List<List>> l = null; A a = null;
          l.add(a.m()); return l;
        }
      }
      interface D { public List<List> m(); }
      interface B extends C { public List<List> m(); }
      interface C { public List<List> m(); }

  49. Outline • Introduction • Example • Filtering Test Abstractions • Generating Test Abstractions • UDITA: Combined Filtering and Generating • Evaluation • Conclusions

  50. Summary • Testing is important for software reliability • Necessary step before (even after) deployment • Structurally complex data • Increasingly used in modern systems • Important challenge for software testing • Test abstractions • Proposed model for complex test data • Adopted in industry • Used to find bugs in several real-world applications
