Statically Validating Must Summaries for Incremental Compositional Dynamic Test Generation

Patrice Godefroid, Shuvendu K. Lahiri (Microsoft Research) and Cindy Rubio-González (University of Wisconsin–Madison)
International Static Analysis Symposium, September 2011

Background
• Systematic Dynamic Test Generation (= DART)
  • Run the program on a valid input and record the trace
  • Symbolically execute the program along the recorded trace to collect constraints
  • Negate and solve constraints to obtain new inputs
  • And the process repeats (possibly forever!)
• Used in many tools
  • EXE, CUTE, SAGE, PEX, KLEE, BitScope, Apollo, etc.

SAGE @ Microsoft
• #1 application for SMT solvers today (CPU usage)
• 1st whitebox fuzzer for security testing
• 200+ machines (since 2008)
• 1 billion+ constraints
• 100s of apps, 100s of security bugs
• Example: Win7 file fuzzing
  • Found ~1/3 of all fuzzing bugs
• Millions of dollars saved for Microsoft, plus time/energy for the world

Compositional Test Generation
Systematically executing all feasible paths does not scale.
Compositional Dynamic Test Generation
• Compute summaries that can be reused later
• Avoid retesting
• Can provide the same path coverage exponentially faster!

Example of Function Summary
    int is_positive(int x) {
      if (x > 0) return 1;
      return 0;
    }
Summary: (x > 0 ∧ ret = 1) ∨ (x ≤ 0 ∧ ret = 0), where ret denotes the value returned by the function is_positive.
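A minimal sketch of what such a summary expresses, written as a C predicate over the input x and the return value ret. The helper names summary_is_positive and summary_holds_on_range are illustrative and do not come from the slides; the bounded loop is only a sanity check, not how a test generator would derive or use the summary.

```c
#include <assert.h>
#include <stdbool.h>

/* The function from the slide. */
int is_positive(int x) {
    if (x > 0) return 1;
    return 0;
}

/* Summary as a predicate over input x and return value ret:
   (x > 0 && ret == 1) || (x <= 0 && ret == 0). */
bool summary_is_positive(int x, int ret) {
    return (x > 0 && ret == 1) || (x <= 0 && ret == 0);
}

/* Bounded sanity check: every (input, output) pair with x in [lo, hi]
   satisfies the summary. Callers should keep hi below INT_MAX. */
bool summary_holds_on_range(int lo, int hi) {
    assert(lo <= hi);
    for (int x = lo; x <= hi; x++) {
        if (!summary_is_positive(x, is_positive(x))) return false;
    }
    return true;
}
```

Calling summary_holds_on_range(-1000, 1000) exercises both disjuncts of the summary, one per feasible path of is_positive.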

Function Summaries
• Function summary for a function f
  • Logic formula over constraints
  • Derived by successive iterations and defined as a disjunction of formulas, each of the form pre ∧ post
    • pre: conjunction of constraints on the inputs of f
    • post: conjunction of constraints on the outputs of f
  • Can be computed automatically from the path constraint for the intraprocedural path

Must Summaries
• Symbolic execution of large programs is imprecise
  • Complex program statements
  • Calls to operating-system and library functions
• Concrete values are used to simplify constraints
• Under-approximate path constraints
• Summaries become must summaries
Example: assume hash is a complex or unknown function.
    int g(int x, int y) {
      if ((x > 0) && (hash(y) > 10))
        return 1;
      return 0;
    }
Assume that if g is invoked with y = 45, then hash(45) = 987. The summary is under-approximated with a smaller precondition.
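The example can be sketched in C as follows. This assumes the under-approximate summary pins y to the concrete value seen at run time, giving the precondition x > 0 ∧ y = 45 and the postcondition ret = 1; that reading is an assumption, as are the helper names must_pre, must_post and must_summary_holds. The body of hash is a hypothetical stand-in: only hash(45) = 987 comes from the slide.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the unknown function.
   Only hash(45) == 987 is taken from the slide. */
int hash(int y) {
    if (y == 45) return 987;
    return (int)(((unsigned)y * 31u + 7u) % 1000u);
}

int g(int x, int y) {
    if ((x > 0) && (hash(y) > 10)) return 1;
    return 0;
}

/* Assumed must summary recorded from the run with y = 45:
   P: x > 0 && y == 45, Q: ret == 1. */
bool must_pre(int x, int y) { return x > 0 && y == 45; }
bool must_post(int ret) { return ret == 1; }

/* Every input satisfying P must lead to Q. Inputs outside P are not
   constrained, which is what makes the summary an under-approximation. */
bool must_summary_holds(int bound) {
    assert(bound >= 0);
    for (int x = -bound; x <= bound; x++)
        for (int y = -bound; y <= bound; y++)
            if (must_pre(x, y) && !must_post(g(x, y))) return false;
    return true;
}
```

The summary says nothing about calls with y ≠ 45, even though many of them also return 1; that lost coverage is the price of avoiding reasoning about hash symbolically.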

Must Summaries
• Defined as a quadruple ⟨lp, P, lq, Q⟩ where:
  • lp and lq are locations in the program Prog
  • P is the summary precondition holding at lp
  • Q is the summary postcondition holding at lq

Some Facts About Summaries
• Time to be produced: weeks/months
• Number of summaries: millions
• Number of instructions executed between lp and lq: can be hundreds of thousands

Incremental Compositional Test Generation
• Without it, test generation has to start from scratch after even a small code change
• Incremental compositional test generation
  • As in smart/selective regression testing
  • Reuse summaries still valid in the new program
  • Recompute invalid summaries

Must Summary Checking
• Given a valid must summary for a program and a new version of the program, is the summary still valid for the new version?
• Intraprocedural summaries
  • Locations lp and lq are in the same function f
  • Function f does not return between lp and lq when the summary is generated

Some Proposals
• Naïve
  • For each summary, record the executed instructions
  • Too expensive: ~100K instructions executed
  • Runtime overhead
• Our proposal
  • Verify statically which summaries are valid in order to reuse them
  • Less precise than recomputing summaries from scratch, but cheaper

Algorithms
1. Static Change Impact Analysis
2. Predicate-Sensitive Change Impact Analysis
3. Must Summary Validity Checking Analysis

Phase 1: Static Change Impact Analysis
• Impact analysis of code changes in the control-flow and call graphs of the program
[Figure: locations lp and lq in the old program and in the new program]

Modified Instructions and Functions
• Instruction i of a program Prog is modified if:
  • i is changed or deleted in Prog’, or
  • its ordered set of immediate successors has changed
• Function f in a program Prog is modified if f:
  • contains a modified instruction
  • calls a modified function
  • calls an unknown function
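The function-level definition above is recursive (a function is modified if it calls a modified function), so it can be computed as a fixed point over the call graph. The sketch below is one straightforward way to do that on a small adjacency matrix; the CallGraph type, mark_modified and example_graph are illustrative names, not the tool's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FUNCS 16

/* calls[f][g] is true when function f calls function g. */
typedef struct {
    int n;
    bool calls[MAX_FUNCS][MAX_FUNCS];
    bool has_modified_instr[MAX_FUNCS];
    bool calls_unknown[MAX_FUNCS];
} CallGraph;

/* Fixed point: out[f] becomes true if f contains a modified instruction,
   calls an unknown function, or calls a function already marked. */
void mark_modified(const CallGraph *cg, bool out[MAX_FUNCS]) {
    assert(cg->n >= 0 && cg->n <= MAX_FUNCS);
    for (int f = 0; f < cg->n; f++)
        out[f] = cg->has_modified_instr[f] || cg->calls_unknown[f];
    bool changed = true;
    while (changed) {
        changed = false;
        for (int f = 0; f < cg->n; f++)
            for (int g = 0; g < cg->n; g++)
                if (!out[f] && cg->calls[f][g] && out[g]) {
                    out[f] = true;
                    changed = true;
                }
    }
}

/* Example: 0 calls 1, 1 calls 2, 3 is isolated.
   Only function 2 contains a modified instruction. */
CallGraph example_graph(void) {
    CallGraph cg = {0};
    cg.n = 4;
    cg.calls[0][1] = true;
    cg.calls[1][2] = true;
    cg.has_modified_instr[2] = true;
    return cg;
}
```

In the example, the change in function 2 propagates up to functions 1 and 0, while function 3 stays unmarked. This transitive spread is consistent with the large share of indirectly modified functions reported later in the talk.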

Phase 1: Static Change Impact Analysis
Step 1: Construct the call graph for the program

Phase 1: Static Change Impact Analysis
Steps 2 to 4:
• Find modified and unknown functions
• Map summaries, construct control-flow graphs
• Find indirectly modified and unknown functions
[Figure: call graph with nodes labeled M (modified), IM (indirectly modified), U (unknown), IU (indirectly unknown) and S]

Phase 1: Static Change Impact Analysis
Step 5: Classify summaries as valid or invalid
[Figure: call graph with nodes labeled M, IM, U, IU and S]
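One plausible reading of this classification step, sketched in C: a summary survives Phase 1 only if no control-flow node lying on some path from lp to lq is marked modified (a modified instruction, or a call to a modified or unknown function). That criterion is an assumption made for illustration; the Cfg type and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 16

typedef struct {
    int n;
    bool edge[MAX_NODES][MAX_NODES];  /* edge[u][v]: u may be followed by v */
    bool modified[MAX_NODES];         /* node is impacted by the change */
} Cfg;

/* Mark nodes reachable from start, following edges forward or backward. */
static void reach(const Cfg *g, int start, bool backward, bool seen[MAX_NODES]) {
    int stack[MAX_NODES];
    int top = 0;
    for (int i = 0; i < g->n; i++) seen[i] = false;
    seen[start] = true;
    stack[top++] = start;
    while (top > 0) {
        int u = stack[--top];
        for (int v = 0; v < g->n; v++) {
            bool e = backward ? g->edge[v][u] : g->edge[u][v];
            if (e && !seen[v]) { seen[v] = true; stack[top++] = v; }
        }
    }
}

/* A node is on some lp-to-lq path iff it is reachable from lp
   and lq is reachable from it. */
bool summary_survives_phase1(const Cfg *g, int lp, int lq) {
    assert(lp >= 0 && lp < g->n && lq >= 0 && lq < g->n);
    bool from_lp[MAX_NODES], to_lq[MAX_NODES];
    reach(g, lp, false, from_lp);
    reach(g, lq, true, to_lq);
    for (int v = 0; v < g->n; v++)
        if (from_lp[v] && to_lq[v] && g->modified[v]) return false;
    return true;
}

/* Diamond CFG: 0 -> {1, 2} -> 3, followed by node 4. */
Cfg example_cfg(void) {
    Cfg g = {0};
    g.n = 5;
    g.edge[0][1] = g.edge[0][2] = true;
    g.edge[1][3] = g.edge[2][3] = true;
    g.edge[3][4] = true;
    return g;
}
```

With lp = 0 and lq = 3, a change at node 4 leaves the summary valid because node 4 lies after lq, whereas a change at node 2 invalidates it. The check ignores P and Q entirely, which is the imprecision Phase 2 addresses.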

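Phase 2: Predicate-Sensitive Change Impact Analysis
• Exploit the predicates P and Q in a summary
[Figure: control-flow graph of the old program. At lp, P: x > 0 ∧ y < 10. Branch if (x > 0): then if (y == 10): w = w + 1, else w = 0; otherwise w = 1. At lq, Q: w = 0. The summary is invalidated by Phase 1.]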

Phase 2: Predicate-Sensitive Change Impact Analysis
Old program:
    void foo() {
      ...
    lp:                        // P: x > 0 ∧ y < 10
      if (x > 0) {
        if (y == 10) w++;      // MODIFIED
        else w = 0;
      } else {
        w = 1;                 // MODIFIED
      }
    lq:                        // Q: w = 0
      ...
      return;
    }

Phase 2: Predicate-Sensitive Change Impact Analysis
Instrumented old program:
    void foo() {
      goto lp;
      ...
    lp:                        // P: x > 0 ∧ y < 10
      assume P;
      modified = false;
      if (x > 0) {
        if (y == 10) { modified = true; w++; }
        else w = 0;
      } else {
        modified = true; w = 1;
      }
    lq:                        // Q: w = 0
      assert(Q ⇒ ¬modified);
      ...
      return;
    }
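The instrumentation can be mimicked in plain C to see why the predicate matters. The sketch assumes the assertion at lq reads Q ⇒ ¬modified, and it replaces the theorem prover with a brute-force loop over a small input range, so it illustrates the check rather than proving anything. The function names and the use_precondition switch are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Summary predicates from the slide: P: x > 0 && y < 10, Q: w == 0. */
static bool P(int x, int y) { return x > 0 && y < 10; }
static bool Q(int w) { return w == 0; }

/* Code between lp and lq of the old program, with the modified flag.
   Returns true when the assumed assertion Q => !modified holds at lq. */
bool phase2_assertion_holds(int x, int y, int w) {
    bool modified = false;
    if (x > 0) {
        if (y == 10) { modified = true; w++; }
        else w = 0;
    } else {
        modified = true;
        w = 1;
    }
    return !Q(w) || !modified;
}

/* Bounded stand-in for the prover: try all x, y, w in [-bound, bound].
   When use_precondition is set, inputs violating P are skipped (assume P). */
bool phase2_validates(int bound, bool use_precondition) {
    assert(bound >= 0);
    for (int x = -bound; x <= bound; x++)
        for (int y = -bound; y <= bound; y++)
            for (int w = -bound; w <= bound; w++) {
                if (use_precondition && !P(x, y)) continue;
                if (!phase2_assertion_holds(x, y, w)) return false;
            }
    return true;
}
```

With the precondition, y < 10 rules out the y == 10 branch and x > 0 rules out the else branch, so no modified statement is ever reached. Without it, the input x = 1, y = 10, w = -1 executes the modified w++ and still ends with w = 0, so the check fails.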

Phase 2: Predicate-Sensitive Change Impact Analysis
• Check that the assertion in the instrumented code cannot fail for any possible input
• Verification-condition based program verifier
  • Create a logic formula from the program with assertions
  • Check formula validity using a theorem prover
  • If valid, the assertion does not fail in any execution

Phase 3: Must Summary Validity Checking
• Check must summary validity against some code, independently of code changes
[Figure: control-flow graphs of the old and new programs. At lp, P: x < 0. Branch if (x < 0), then if (y < 0): r = 1, else r = 0 in the old program and r = 4 in the new program. At lq, Q: r ≥ 0. The summary is invalidated by Phase 1 and Phase 2.]

Phase 3: Must Summary Validity Checking
New program:
    void bar() {
      ...
    lp:                        // P: x < 0
      if (x < 0) {
        if (y < 0) r = 1;
        else {
          r = 4;               // r = 0 in old code
        }
      }
    lq:                        // Q: r ≥ 0
      ...
      return;
    }

Phase 3: Must Summary Validity Checking
Instrumented new program:
    void bar() {
      reach_lq = false;
      goto lp;
      ...
    lp:                        // P: x < 0
      assume P;
      if (x < 0) {
        if (y < 0) r = 1;
        else {
          r = 4;               // r = 0 in old code
        }
      }
    lq:                        // Q: r ≥ 0
      assert(Q);
      reach_lq = true;
      ...
      assert(reach_lq);
      return;
    }
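A hedged C rendering of this check, with a bounded loop standing in for the verifier. Two things are assumptions here: the postcondition is taken to be Q: r ≥ 0, and the else_value parameter is a hypothetical knob added so the same sketch can model the new program (4), the old program (0), or a breaking change (for example -1).

```c
#include <assert.h>
#include <stdbool.h>

/* Code between lp and lq of bar, instrumented as on the slide.
   Returns true when both assert(Q) and assert(reach_lq) would pass.
   Q is assumed to be r >= 0. */
bool phase3_assertions_hold(int x, int y, int r, int else_value) {
    bool reach_lq = false;
    if (x < 0) {
        if (y < 0) r = 1;
        else r = else_value;   /* 4 in the new program, 0 in the old one */
    }
    /* lq: in this loop-free fragment every execution gets here */
    bool q_holds = (r >= 0);
    reach_lq = true;
    return q_holds && reach_lq;
}

/* Bounded stand-in for the verifier: try all x, y, r in [-bound, bound],
   skipping inputs that violate P: x < 0 (assume P). */
bool phase3_validates(int bound, int else_value) {
    assert(bound >= 0);
    for (int x = -bound; x <= bound; x++)
        for (int y = -bound; y <= bound; y++)
            for (int r = -bound; r <= bound; r++) {
                if (!(x < 0)) continue;
                if (!phase3_assertions_hold(x, y, r, else_value)) return false;
            }
    return true;
}
```

Because the check looks only at the new code and the summary, the change from r = 0 to r = 4 does not matter: both values satisfy the assumed Q. A change to r = -1 would make the check fail.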

Phase 3: Must Summary Validity Checking
• Check that the assertions hold in the instrumented program for all possible inputs

Result
• Validated summaries can be reused
  • Because of soundness
• Invalidated summaries are discarded and need to be recomputed
  • New tests are generated to cover their preconditions
• The algorithms can be used in isolation or in a pipeline

Experimental Results

Implementation Details
• Inputs: old DLL, new DLL, and summaries produced by SAGE
• Map summaries, find modified instructions and functions (C++)
  • Vulcan library to statically analyze Windows binaries
• Phase 1 (Change Impact), Phase 2 (Predicate Sensitive), Phase 3 (Validity Checking)
  • Used in pipeline or in isolation
• Output: valid/invalid summaries

Implementation Details
• Inputs: procedure (x86) and summary ⟨lp, P, lq, Q⟩
• Vulcan-based translator from x86 to BoogiePL
  • Sound translation
• Instrumented BPL file (Phase 2 or Phase 3)
• Checked with Boogie/Z3

Benchmarks
• Image parsers embedded in Windows
  • ANI, GIF and JPEG
• Ran SAGE to generate summaries (small sample)
  • 286 for ANI, 288 for GIF and 517 for JPEG
• Identified the DLLs involved
  • 3 for ANI, 4 for GIF and 8 for JPEG
• Compared the old version against a randomly picked newer version
  • Delta: ~1 to 3 years

Difference Between Program Versions
• Modified functions: 3% - 10%
• Indirectly modified functions: 30% - 45%
• Unknown functions: 27% - 37%
• Indirectly unknown functions: 60% - 74%

Applying Phases in Isolation
Validated summaries per phase (Phase 1: Change Impact, Phase 2: Predicate Sensitive, Phase 3: Validity Checking):
• ANI: Phase 1 58%, Phase 2 85%, Phase 3 31%. Total validated: 256/286 (90%)
• GIF: Phase 1 69%, Phase 2 92%, Phase 3 30%. Total validated: 274/288 (95%)
• JPEG: Phase 1 61%, Phase 2 94%, Phase 3 33%. Total validated: 501/517 (97%)

Applying Phases in Pipeline Fashion: Phase 1 → Phase 2 → Phase 3
Summaries validated at each stage (Phase 1: Change Impact, Phase 2: Predicate Sensitive, Phase 3: Validity Checking):
• ANI: Phase 1 58%, Phase 2 27%, Phase 3 4%. Total validated: 256/286 (90%)
• GIF: Phase 1 69%, Phase 2 25%, Phase 3 1%. Total validated: 274/288 (95%)
• JPEG: Phase 1 61%, Phase 2 35%, Phase 3 1%. Total validated: 501/517 (97%)

Running Time (Isolation)
[Bar charts: running time in minutes of Phase 1 (Change Impact), Phase 2 (Predicate Sensitive) and Phase 3 (Validity Checking)]

Running Time (Pipeline): Phase 1 → Phase 2 → Phase 3
[Bar chart: running time in minutes, split by phase; values shown: 43 min, 28 min, 41 min]
Preliminary results show that statically validating must summaries is up to 20 times faster than recomputing them!

Summary
• Formulated the problem of statically validating must summaries
• Described three approaches for validating must summaries
• Presented a preliminary evaluation on three large Windows image parsers
• Demonstrated the effectiveness of static must summary checking
  • Validated hundreds of must summaries in minutes

Questions?
