
Statically Validating Must Summaries for Incremental Compositional Dynamic Test Generation

Patrice Godefroid, Shuvendu K. Lahiri, Cindy Rubio-González. Microsoft Research and University of Wisconsin–Madison.


Presentation Transcript


  1. Statically Validating Must Summaries for Incremental Compositional Dynamic Test Generation. Patrice Godefroid, Shuvendu K. Lahiri, Cindy Rubio-González. Microsoft Research and University of Wisconsin–Madison. International Static Analysis Symposium, September 2011.

  2. Background
     • Systematic dynamic test generation (= DART): run the program on a valid input, symbolically execute the program along the recorded trace to collect constraints, then negate and solve constraints to obtain new inputs. The process repeats (possibly forever!).
     • Used in many tools: EXE, CUTE, SAGE, PEX, KLEE, BitScope, Apollo, etc.

  3. SAGE @ Microsoft
     • #1 application for SMT solvers today (CPU usage)
     • 1st whitebox fuzzer for security testing
     • 200+ machines (since 2008)
     • 1 billion+ constraints
     • 100s of apps, 100s of security bugs
     • Example: Win7 file fuzzing
       • Found ~1/3 of all fuzzing bugs
     • Millions of dollars saved for Microsoft, plus time/energy for the world

  4. Compositional Test Generation
     • Systematically executing all feasible paths does not scale
     • Compositional dynamic test generation
       • Compute summaries that can be reused later
       • Avoid retesting
       • Can provide the same path coverage exponentially faster!

  5. Example of Function Summary
       int is_positive(int x) {
         if (x > 0) return 1;
         return 0;
       }
     where ret denotes the value returned by the function is_positive
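The summary formula itself did not survive in the transcript. As a minimal sketch, assuming the summary is the usual disjunction of input/output cases, (x > 0 ∧ ret = 1) ∨ (x ≤ 0 ∧ ret = 0), it can be written as a predicate and compared against the function on a bounded range. The names `summary_is_positive` and `summary_holds_on_range` are hypothetical helpers, not part of the talk.

```c
#include <assert.h>
#include <stdbool.h>

/* Function from the slide. */
int is_positive(int x) {
    if (x > 0) return 1;
    return 0;
}

/* Assumed summary, as a predicate over the input x and the output ret:
   (x > 0 && ret == 1) || (x <= 0 && ret == 0). */
bool summary_is_positive(int x, int ret) {
    return (x > 0 && ret == 1) || (x <= 0 && ret == 0);
}

/* Hypothetical helper: checks that the summary describes the function
   for every x in [lo, hi]. A bounded check, not a proof. */
bool summary_holds_on_range(int lo, int hi) {
    assert(lo <= hi);
    for (long long x = lo; x <= hi; x++) {
        int ret = is_positive((int)x);
        if (!summary_is_positive((int)x, ret)) return false;
    }
    return true;
}
```

A caller that already knows the summary can reason about `is_positive` without re-executing its body, which is the reuse that compositional test generation relies on.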

  6. Function Summaries
     • Function summary for a function f
       • Logic formula over constraints
       • Derived by successive iterations and defined as a disjunction of formulas, each combining a conjunction of constraints on the inputs of f with a conjunction of constraints on the outputs of f
     • Can be computed automatically from the path constraint for the intraprocedural path

  7. Must Summaries
     • Symbolic execution of large programs is imprecise
       • Complex program statements
       • Calls to operating-system and library functions
     • Concrete values → simplified constraints
       • Under-approximate path constraints
       • Summaries become must summaries
     • Example: assume hash is a complex or unknown function
       int g(int x, int y) {
         if ((x > 0) && (hash(y) > 10))
           return 1;
         return 0;
       }
       Assume that if g is invoked with y = 45, then hash(45) = 987. The summary under-approximates with a smaller precondition.
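To make the under-approximation concrete, here is a small sketch. It assumes the must summary obtained from the run with y = 45 has precondition x > 0 ∧ y = 45 and postcondition ret = 1; the transcript does not show the formula, so this is an illustration only. The stub `hash` returns 987 for 45 as on the slide, and its value for every other input is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the complex or unknown function. Only hash(45) == 987
   comes from the slide; the result for other inputs is an assumption. */
int hash(int y) {
    return (y == 45) ? 987 : 0;
}

/* Function from the slide. */
int g(int x, int y) {
    if ((x > 0) && (hash(y) > 10)) return 1;
    return 0;
}

/* Assumed must-summary precondition: y is pinned to the observed
   concrete value, which makes the precondition smaller. */
bool must_pre(int x, int y) { return x > 0 && y == 45; }

/* Assumed must-summary postcondition. */
bool must_post(int ret) { return ret == 1; }

/* Hypothetical helper: every input in [lo, hi]^2 that satisfies the
   precondition must produce an output satisfying the postcondition. */
bool must_summary_holds(int lo, int hi) {
    assert(lo <= hi);
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            if (must_pre(x, y) && !must_post(g(x, y))) return false;
    return true;
}
```

The summary says nothing about inputs with y ≠ 45, even though some of them may also return 1. That loss of coverage is the price of replacing the symbolic value of hash(y) with a concrete one.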

  8. Must Summaries
     • Defined as a quadruple ⟨lp, P, lq, Q⟩ over a program Prog, where:
       • lp and lq are program locations
       • P is the summary precondition holding at lp
       • Q is the summary postcondition holding at lq

  9. Some Facts About Summaries
     • Time to be produced: weeks/months
     • Number of summaries: millions
     • Number of instructions executed between lp and lq: can be hundreds of thousands

  10. Incremental Compositional Test Generation
      • Have to start from scratch if there is a small code change
      • Incremental compositional test generation
        • As in smart/selective regression testing
        • Reuse summaries still valid in new program
        • Recompute invalid summaries

  11. Must Summary Checking
      • Given a valid must summary for a program and a new version of the program, is the summary still valid for the new version?
      • Intraprocedural summaries
        • Locations lp and lq are in the same function f
        • Function f does not return between lp and lq when the summary is generated

  12. Some proposals
      • Naïve
        • For each summary, record executed instructions
        • Too expensive, ~100K instructions executed
        • Runtime overhead
      • Our proposal
        • Verify statically which summaries are valid in order to reuse them
        • Less precise than recomputing summaries from scratch, but cheaper

  13. Algorithms
      1. Static Change Impact Analysis
      2. Predicate-Sensitive Change Impact Analysis
      3. Must Summary Validity Checking Analysis

  14. Phase 1: Static Change Impact Analysis
      • Impact analysis of code changes in the control-flow and call graphs of the program
      [Figure: control-flow graphs of the old program and the new program, each marking lp and lq]

  15. Modified Instructions and Functions
      • Instruction i of a program Prog is modified if:
        • i is changed or deleted in Prog’, or
        • its ordered set of immediate successors has changed
      • Function f in a program Prog is modified if f:
        • contains a modified instruction
        • calls a modified function
        • calls an unknown function
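The transitive part of this definition can be sketched as a fixed-point propagation over the call graph. The adjacency-matrix representation, the flag names, and `propagate_impact` are all assumptions made for illustration; the separate flags for indirectly modified and indirectly unknown functions follow the labels used in the Phase 1 figures. How the talk's implementation stores the graph is not stated.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FUNCS 8

/* Hypothetical flags: the first two are seeded by a diff of the two
   program versions, the last two are computed by propagation. */
enum {
    F_MODIFIED     = 1,  /* contains a modified instruction */
    F_UNKNOWN      = 2,  /* body not available for analysis  */
    F_IND_MODIFIED = 4,  /* transitively calls a modified function */
    F_IND_UNKNOWN  = 8   /* transitively calls an unknown function */
};

typedef struct {
    int n;                             /* number of functions    */
    bool calls[MAX_FUNCS][MAX_FUNCS];  /* calls[f][g]: f calls g */
    unsigned flags[MAX_FUNCS];
} CallGraph;

/* Propagates impact from callees to callers until nothing changes. */
void propagate_impact(CallGraph *cg) {
    assert(cg->n >= 0 && cg->n <= MAX_FUNCS);
    bool changed = true;
    while (changed) {
        changed = false;
        for (int f = 0; f < cg->n; f++) {
            for (int g = 0; g < cg->n; g++) {
                if (!cg->calls[f][g]) continue;
                unsigned add = 0;
                if (cg->flags[g] & (F_MODIFIED | F_IND_MODIFIED)) add |= F_IND_MODIFIED;
                if (cg->flags[g] & (F_UNKNOWN | F_IND_UNKNOWN))   add |= F_IND_UNKNOWN;
                if ((cg->flags[f] | add) != cg->flags[f]) {
                    cg->flags[f] |= add;
                    changed = true;
                }
            }
        }
    }
}

/* A function with any flag set counts as impacted in this sketch. */
bool is_impacted(const CallGraph *cg, int f) {
    return cg->flags[f] != 0;
}
```

The loop terminates because flags are only ever added and each function has a fixed number of them. The sketch stops at functions and does not show how individual summaries are then classified.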

  16. Phase 1: Static Change Impact Analysis
      • Step 1: Construct call graph for the program
      [Figure: call graph of the program]

  17. Phase 1: Static Change Impact Analysis
      • Step 2: Map summaries, construct control-flow graphs
      • Step 3: Find modified and unknown functions
      • Step 4: Find indirectly modified and unknown functions
      [Figure: call graph with nodes labeled M, U, IM, IU, and S]

  18. Phase 1: Static Change Impact Analysis
      • Step 5: Classify summaries as valid or invalid
      [Figure: call graph with nodes labeled M, U, IM, IU, and S]

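  19. Phase 2: Predicate-Sensitive Change Impact Analysis
      • Exploit the predicates P and Q in a summary
      [Figure: control-flow graph of the old program for a summary invalidated by Phase 1, with P: x > 0 ∧ y < 10 at lp, branches on if (x > 0) and if (y == 0), assignments w = w + 1, w = 0, and w = 1, and Q: w = 0 at lq]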

  20. Phase 2: Predicate-Sensitive Change Impact Analysis
      Old program, with P: x > 0 ∧ y < 10 at lp and Q: w = 0 at lq:
        void foo() {
          ...
        lp:
          if (x > 0) {
            if (y == 10)
              w++;            // MODIFIED
            else
              w = 0;
          } else {
            w = 1;            // MODIFIED
          }
        lq:
          ...
          return;
        }

  21. Phase 2: Predicate-Sensitive Change Impact Analysis
      Instrumented old program, with P: x > 0 ∧ y < 10 at lp and Q: w = 0 at lq:
        void foo() {
          goto lp;                    // (1)
          ...
        lp:
          assume P;                   // (2)
          modified = false;
          if (x > 0) {
            if (y == 10) {
              modified = true;        // (3)
              w++;
            } else
              w = 0;
          } else {
            modified = true;          // (3)
            w = 1;
          }
        lq:
          assert(Q ∧ ¬modified);      // (4)
          ...
          return;
        }
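The instrumentation can be mimicked in plain C by turning `assume P` into an early exit and the assertion into a return value. This is only a bounded, executable sketch: it enumerates a small range of initial states, whereas the talk's approach hands the instrumented program to a program verifier that covers all inputs. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Executable model of the instrumented old program. Returns true when
   assert(Q && !modified) holds for the given initial state; states
   that violate assume(P) are ignored and count as passing. */
bool phase2_assertion_holds(int x, int y, int w) {
    if (!(x > 0 && y < 10)) return true;   /* assume P */
    bool modified = false;
    if (x > 0) {
        if (y == 10) {
            modified = true;               /* modified instruction reached */
            w++;
        } else {
            w = 0;
        }
    } else {
        modified = true;                   /* modified instruction reached */
        w = 1;
    }
    return (w == 0) && !modified;          /* assert(Q && !modified) */
}

/* Hypothetical bounded check over all initial states in [lo, hi]^3. */
bool phase2_validates(int lo, int hi) {
    assert(lo <= hi);
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            for (int w = lo; w <= hi; w++)
                if (!phase2_assertion_holds(x, y, w)) return false;
    return true;
}
```

Under P, y < 10 rules out y == 10 and x > 0 rules out the outer else branch, so neither modified instruction is reachable and the summary survives even though Phase 1 alone would have discarded it.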

  22. Phase 2: Predicate-Sensitive Change Impact Analysis
      • Check that the assertion in the instrumented code does not fail for any possible input
      • Verification-condition-based program verifier
        • Create a logic formula from the program with assertions
        • Check formula validity using a theorem prover
        • If valid, the assertion does not fail in any execution

  23. Phase 3: Must Summary Validity Checking
      • Check must summary validity against some code, independently of code changes
      [Figure: control-flow graphs of the old and new programs for a summary invalidated by Phase 1 and Phase 2, with P: x < 0 at lp and Q: r ≥ 0 at lq; under if (x < 0) and if (y < 0), the assignment r = 0 in the old program becomes r = 4 in the new program]

  24. Phase 3: Must Summary Validity Checking
      New program, with P: x < 0 at lp and Q: r ≥ 0 at lq:
        void bar() {
          ...
        lp:
          if (x < 0) {
            if (y < 0)
              r = 1;
            else {
              r = 4;          // r = 0 in old code
            }
          }
        lq:
          ...
          return;
        }

  25. Phase 3: Must Summary Validity Checking
      Instrumented new program, with P: x < 0 at lp and Q: r ≥ 0 at lq:
        void bar() {
          reach_lq = false;           // (1)
          goto lp;
          ...
        lp:
          assume P;                   // (2)
          if (x < 0) {
            if (y < 0)
              r = 1;
            else {
              r = 4;                  // r = 0 in old code
            }
          }
        lq:
          assert(Q);                  // (3)
          reach_lq = true;
          ...
          assert(reach_lq);           // (4)
          return;
        }
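The same trick gives an executable model of the Phase 3 instrumentation. The sketch assumes the postcondition reads Q: r ≥ 0, since the relational operator is missing in the transcript and r ≥ 0 is the reading consistent with both r = 1 and r = 0 in the old code. As before, the bounded enumeration only stands in for a verifier, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Executable model of the instrumented new program. Returns true when
   both assert(Q) at lq and assert(reach_lq) at exit hold for the given
   initial state; states violating assume(P) count as passing. */
bool phase3_assertions_hold(int x, int y, int r) {
    bool reach_lq = false;
    if (!(x < 0)) return true;       /* assume P */
    if (x < 0) {
        if (y < 0) {
            r = 1;
        } else {
            r = 4;                   /* r = 0 in old code */
        }
    }
    if (!(r >= 0)) return false;     /* assert(Q), with Q assumed to be r >= 0 */
    reach_lq = true;
    return reach_lq;                 /* assert(reach_lq) */
}

/* Hypothetical bounded check over all initial states in [lo, hi]^3. */
bool phase3_validates(int lo, int hi) {
    assert(lo <= hi);
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            for (int r = lo; r <= hi; r++)
                if (!phase3_assertions_hold(x, y, r)) return false;
    return true;
}
```

Phase 3 never consults the old code. The changed assignment r = 4 lies on a path allowed by P, so Phases 1 and 2 reject the summary, yet r = 4 still satisfies the assumed Q and the summary can be kept.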

  26. Phase 3: Must Summary Validity Checking
      • Check that assertions hold in the instrumented program for all possible inputs

  27. Result
      • Validated summaries can be reused
        • Because of soundness
      • Invalidated summaries are discarded and need to be recomputed
        • New tests are generated to cover their preconditions
      • Algorithms can be used in isolation or in a pipeline

  28. Experimental Results

  29. Implementation Details
      • Inputs: old DLL, new DLL, and summaries produced by SAGE
      • Map summaries, find modified instructions and functions (C++)
      • Vulcan: library to statically analyze Windows binaries
      • Phase 1 (Change Impact), Phase 2 (Predicate Sensitive), Phase 3 (Validity Checking), used in pipeline or isolation
      • Output: valid/invalid summaries

  30. Implementation Details
      • Inputs: procedure (x86) and summary ⟨lp, P, lq, Q⟩
      • Vulcan and a translator from x86 to BoogiePL (sound translation)
      • Output: instrumented BPL file (Phase 2 or Phase 3), checked with Boogie/Z3

  31. Benchmarks
      • Image parsers embedded in Windows
        • ANI, GIF and JPEG
      • Ran SAGE to generate summaries (small sample)
        • 286 for ANI, 288 for GIF and 517 for JPEG
      • Identified the DLLs involved
        • 3 for ANI, 4 for GIF and 8 for JPEG
      • Compared old version against a randomly picked newer version
        • Delta ~1 to 3 years

  32. Difference Between Program Versions
      • Modified functions: 3% - 10%
      • Indirectly modified functions: 30% - 45%
      • Indirectly unknown functions: 60% - 74%
      • Unknown functions: 27% - 37%

  33. Applying Phases in Isolation
      [Figure: three bar charts of validated summaries for Phase 1 (Change Impact), Phase 2 (Predicate Sensitive), and Phase 3 (Validity Checking); bar order not preserved in the transcript]
      • ANI: 31%, 58%, 85%; total validated 256/286 (90%)
      • GIF: 30%, 69%, 92%; total validated 274/288 (95%)
      • JPEG: 61%, 94%, 33%; total validated 501/517 (97%)

  34. Applying Phases in Pipeline Fashion: Phase 1 → Phase 2 → Phase 3
      Validated summaries per phase (Phase 1: Change Impact, Phase 2: Predicate Sensitive, Phase 3: Validity Checking):
      • ANI: Phase 1 58%, Phase 2 27%, Phase 3 4%; total validated 256/286 (90%)
      • GIF: Phase 1 69%, Phase 2 25%, Phase 3 1%; total validated 274/288 (95%)
      • JPEG: Phase 1 61%, Phase 2 35%, Phase 3 1%; total validated 501/517 (97%)

  35. Running Time (Isolation)
      [Figure: three charts of running time in minutes for Phase 1 (Change Impact), Phase 2 (Predicate Sensitive), and Phase 3 (Validity Checking)]

  36. Running Time: Phase 1 → Phase 2 → Phase 3
      [Figure: chart of running time in minutes for Phase 1 (Change Impact), Phase 2 (Predicate Sensitive), and Phase 3 (Validity Checking); values shown: 43 min, 28 min, 41 min]
      • Preliminary results show that statically validating must summaries is up to 20 times faster than recomputing them!

  37. Summary
      • Formulated the problem of statically validating must summaries
      • Described three approaches for validating must summaries
      • Presented a preliminary evaluation on three large Windows image parsers
      • Demonstrated the effectiveness of static must summary checking
        • Validated hundreds of must summaries in minutes

  38. Questions?
