
Dynamically Discovering Likely Program Invariants to Support Program Evolution

Presented by: Wes Toland, Geoff Gerfin. Paper by: Michael D. Ernst, Jake Cockrell, William G. Griswold, David Notkin.


Presentation Transcript


  1. Dynamically Discovering Likely Program Invariants to Support Program Evolution • Presented by: Wes Toland, Geoff Gerfin • Paper by: Michael D. Ernst, Jake Cockrell, William G. Griswold, David Notkin

  2. Outline • Introduction • Overview • Code Instrumentation • Data Trace Generation • Inferring Invariants • Uses of Invariants • Evaluation • Related Work • Limitations & Future Work • Discussion

  3. Outline • Introduction • Overview • Code Instrumentation • Data Trace Generation • Inferring Invariants • Uses of Invariants • Evaluation • Related Work • Limitations & Future Work • Discussion

  4. Invariants • What are invariants? • A constraint over a variable’s values • A relationship between multiple variable values. • Defined as mathematical predicates (Example: n >= 0)

  5. Importance of Invariants • In program development: • Refining a specification • Aiding runtime checking • In software evolution: • Helping the programmer understand an undocumented program, so that incorrect assumptions are not made. • A change that violates an invariant can introduce a bug.

  6. Daikon • Programmers do not usually explicitly annotate or document code with invariants. • Daikon proposes to automatically determine program invariants and report them in a meaningful manner.

  7. Outline • Introduction • Overview • Code Instrumentation • Data Trace Generation • Inferring Invariants • Uses of Invariants • Evaluation • Related Work • Limitations & Future Work • Discussion

  8. Daikon’s Infrastructure

  9. Daikon’s Infrastructure: Original Program i,s := 0,0; do i != n -> i,s := i + 1, s + b[i] od

  10. Daikon’s Infrastructure: Instrumented Program print b, n; i,s := 0,0; do i != n -> print i, s, n, b[i]; i,s := i + 1, s + b[i] od
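The instrumented loop on this slide can be sketched in Python, the language of the original Daikon prototype. The function name `instrumented_sum`, the in-memory `trace` list, and the extra record at the exit point are illustrative assumptions; the real front end writes to a trace file in Daikon's own format.

```python
def instrumented_sum(b):
    """Sum the array b, recording variable values at each program point."""
    trace = []  # stand-in for the trace file (assumption)
    n = len(b)
    # Function entry: corresponds to "print b, n" on the slide.
    trace.append(("entry", {"b": list(b), "n": n}))
    i, s = 0, 0
    while i != n:
        # Loop head: corresponds to "print i, s, n, b[i]" on the slide.
        trace.append(("loop", {"i": i, "s": s, "n": n, "b[i]": b[i]}))
        # Simultaneous assignment: the right-hand side uses the old i.
        i, s = i + 1, s + b[i]
    # Function exit: added here for illustration, not shown on the slide.
    trace.append(("exit", {"b": list(b), "n": n, "i": i, "s": s}))
    return s, trace
```

Calling `instrumented_sum([3, 1, 4])` yields one entry record, three loop-head records, and one exit record, which is the raw material the invariant detector works from.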

  11. Daikon’s Infrastructure: Trace File print b, n; i,s := 0,0; do i != n -> print i, s, n, b[i]; i,s := i + 1, s + b[i] od

  12. Daikon’s Infrastructure: Invariants Determined 1.) n >= 0 2.) s = SUM(b) 3.) i >= 0
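One way to picture how such invariants are reported: start from a set of candidate predicates and keep only those that hold on every recorded sample. The sketch below is a minimal illustration, not Daikon's implementation; the sample data, the candidate list, and the name `surviving_invariants` are all made up for the example.

```python
def surviving_invariants(samples, candidates):
    """Return the names of candidate predicates that hold on every sample."""
    return [name for name, pred in candidates.items()
            if all(pred(v) for v in samples)]

# Hypothetical exit-point samples from four runs of the array-sum program.
samples = [
    {"b": [3, 1, 4], "n": 3, "i": 3, "s": 8},
    {"b": [], "n": 0, "i": 0, "s": 0},
    {"b": [-2, 7], "n": 2, "i": 2, "s": 5},
    {"b": [-5], "n": 1, "i": 1, "s": -5},
]

# Candidate invariants; "s >= 0" is included to show a falsified candidate.
candidates = {
    "n >= 0": lambda v: v["n"] >= 0,
    "i >= 0": lambda v: v["i"] >= 0,
    "s = sum(b)": lambda v: v["s"] == sum(v["b"]),
    "s >= 0": lambda v: v["s"] >= 0,
}

reported = surviving_invariants(samples, candidates)
```

Here `s >= 0` is discarded by the last sample, while the three invariants from the slide survive. With a weaker test suite (only non-negative arrays) it would have been reported too, which is why the results are "likely" invariants.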

  13. Outline • Introduction • Overview • Code Instrumentation • Data Trace Generation • Inferring Invariants • Uses of Invariants • Evaluation • Related Work • Limitations & Future Work • Discussion

  14. Code Instrumentation (1/6)

  15. Code Instrumentation (2/6) • Daikon’s front-end modifies source code to trace specific variables at points of interest: • Function entry points (pre-conditions) • Function exit points (post-conditions) • Loop heads (loop invariants) • The trace data is used as input to Daikon’s back-end, which is used to infer invariants

  16. Code Instrumentation (3/6) • Daikon uses an abstract syntax tree for code instrumentation. • What is an AST?

  17. Code Instrumentation (4/6) How could this be useful for code instrumentation?

  18. Code Instrumentation (5/6) • Daikon uses the AST to determine which variables are in scope at each point of interest. • Code is inserted at each such program point to write the values of all variables in scope to a file in a specific format.
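As a rough illustration of using an AST to find the variables in scope, the sketch below uses Python's standard `ast` module on a Python version of the array-sum routine. It is an analogy only: Daikon's front ends work on other languages, and `variables_in_scope` and `SOURCE` are hypothetical names. The sketch collects parameters and assigned locals and ignores globals and nested scopes.

```python
import ast

def variables_in_scope(source, func_name):
    """Return the parameters and assigned local names of function func_name."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            # Parameters are in scope at function entry.
            names = {a.arg for a in node.args.args}
            # Any name that is a store target is a local of the function.
            for sub in ast.walk(node):
                if isinstance(sub, ast.Name) and isinstance(sub.ctx, ast.Store):
                    names.add(sub.id)
            return sorted(names)
    raise ValueError(f"no function named {func_name}")

SOURCE = """
def array_sum(b, n):
    i, s = 0, 0
    while i != n:
        i, s = i + 1, s + b[i]
    return s
"""
```

For `array_sum` this yields `b`, `i`, `n`, and `s`, the same variables the instrumented program prints at the loop head.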

  19. Code Instrumentation (6/6) • A status variable is created for each original program variable and is passed along through function calls. • Each status variable records: • Modification timestamp (used to prevent garbage output) • Smallest and largest indices (for arrays and pointers) • Linked list flag • A status variable is updated whenever the program manipulates its associated variable.
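The bookkeeping described here might look roughly like the following. This is only a sketch under assumptions: the field names, `record_write`, and `should_trace` are hypothetical and are not taken from Daikon's instrumenter.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Status:
    """Bookkeeping kept alongside one program variable (names are hypothetical)."""
    modified_at: Optional[int] = None  # None: never assigned, value would be garbage
    min_index: Optional[int] = None    # smallest array index written so far
    max_index: Optional[int] = None    # largest array index written so far
    is_linked_list: bool = False

def record_write(status, clock, index=None):
    """Update the status when the program assigns to the associated variable."""
    status.modified_at = clock
    if index is not None:
        status.min_index = index if status.min_index is None else min(status.min_index, index)
        status.max_index = index if status.max_index is None else max(status.max_index, index)

def should_trace(status):
    """Emit a variable's value only once it has been assigned."""
    return status.modified_at is not None
```

The timestamp lets the instrumentation skip uninitialized variables, and the index range tells it which array elements are meaningful to write out.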

  20. Outline • Introduction • Overview • Code Instrumentation • Data Trace Generation • Inferring Invariants • Uses of Invariants • Evaluation • Related Work • Limitations & Future Work • Discussion

  21. Data Trace Generation (1/2)

  22. Data Trace Generation (2/2) Instrumented Code Data Trace DB print b, n; i,s := 0,0; do i != n -> print i, s, n, b[i]; i,s := i + 1, s + b[i] od

  23. Outline • Introduction • Overview • Code Instrumentation • Data Trace Generation • Inferring Invariants • Uses of Invariants • Evaluation • Related Work • Limitations & Future Work • Discussion

  24. Inferring Invariants

  25. Types of Invariants (1/3)

  26. Types of Invariants (2/3)

  27. Types of Invariants (3/3) • Single-sequence variables: • Range (minimum and maximum values) • Ordering (elements increasing or decreasing) • Invariants over all elements (for example, every element of the array >= c) • Two-sequence variables: • Linear relationship (y = a*x + b, elementwise) • Comparison (x < y, for example when x[i] = y[i] - 1 for every i) • Reversal (x is the reverse of y: for (i = 0; i < length(y); i++) x[i] = y[length(y) - 1 - i]) • Sequence and numeric variables: • Membership (i ∈ s)
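The sequence invariants listed on this slide can each be phrased as a small check over recorded values. The functions below are a minimal sketch with hypothetical names, using zero-based indexing and exact rational arithmetic for the linear fit; they are not Daikon's checkers.

```python
from fractions import Fraction

def is_nondecreasing(xs):
    """Ordering: every element is <= its successor."""
    return all(a <= b for a, b in zip(xs, xs[1:]))

def all_at_least(xs, c):
    """Invariant over all elements: every element is >= c."""
    return all(x >= c for x in xs)

def is_reversal(x, y):
    """Reversal: x[i] == y[len(y) - 1 - i] for every index i."""
    return len(x) == len(y) and all(
        x[i] == y[len(y) - 1 - i] for i in range(len(y)))

def fit_linear(x, y):
    """Linear relationship: return (a, b) with y[i] == a*x[i] + b, else None."""
    if len(x) != len(y) or len(x) < 2 or x[0] == x[1]:
        return None
    # Derive a and b from the first two points, then verify all the others.
    a = Fraction(y[1] - y[0], x[1] - x[0])
    b = y[0] - a * x[0]
    return (a, b) if all(yi == a * xi + b for xi, yi in zip(x, y)) else None

def is_member(i, s):
    """Membership: the scalar i occurs in the sequence s."""
    return i in s
```

For instance, `fit_linear([0, 1, 2], [1, 3, 5])` recovers a = 2 and b = 1, while a single deviating element makes it return `None`.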

  28. Inferring Invariants (1/5) What invariants should be inferred from this method, regardless of the test suite input?

  29. Inferring Invariants (2/5)

  30. Inferring Invariants (3/5) • Daikon can identify from this trace that for all samples, x = orig(x)

  31. Inferring Invariants (4/5) • Daikon can identify from this trace that for all samples, y = orig(y) = 1.

  32. Inferring Invariants (5/5) • Daikon can identify from this trace that for all samples, *x = orig(*x) + 1. • Is this invariant too limited?
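The method and trace shown on these slides are not reproduced in the transcript, so the sketch below uses a hypothetical stand-in: an `increment` routine that adds `y` to the cell `x` refers to, with a one-element list playing the role of the pointer. It pairs entry values (`orig`) with exit values and checks the three reported invariants, plus the more general one the last bullet hints at.

```python
def increment(x, y):
    """Hypothetical method under test: add y to the cell that x refers to."""
    x[0] += y

def collect_samples(calls):
    """Run increment on each (x, y) pair, pairing entry values with exit values."""
    samples = []
    for x, y in calls:
        entry = {"x": id(x), "y": y, "*x": x[0]}
        increment(x, y)
        exit_ = {"x": id(x), "y": y, "*x": x[0]}
        samples.append((entry, exit_))
    return samples

def check(samples):
    """Evaluate candidate invariants relating exit values to orig() values."""
    return {
        "x = orig(x)": all(e["x"] == f["x"] for e, f in samples),
        "y = orig(y) = 1": all(e["y"] == f["y"] == 1 for e, f in samples),
        "*x = orig(*x) + 1": all(f["*x"] == e["*x"] + 1 for e, f in samples),
        "*x = orig(*x) + y": all(f["*x"] == e["*x"] + e["y"] for e, f in samples),
    }

# A test suite that only ever passes y = 1 (assumption for the example).
narrow = check(collect_samples([([5], 1), ([0], 1), ([-3], 1)]))
# Adding one call with y = 2 falsifies the invariants tied to y = 1.
wider = check(collect_samples([([5], 1), ([7], 2)]))
```

With the narrow suite all four candidates hold, so `*x = orig(*x) + 1` is reported even though it describes the test suite rather than the method. The wider suite leaves only `x = orig(x)` and `*x = orig(*x) + y`.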

  33. Outline • Introduction • Overview • Code Instrumentation • Data Trace Generation • Inferring Invariants • Uses of Invariants • Evaluation • Related Work • Limitations & Future Work • Discussion

  34. Uses of Invariants (1/2) • Explicated Data Structures • Clearly define undocumented data structures without looking through code. • Confirmed and contradicted expectations • Check an understanding of code functionality. • Example: It may appear that x is always less than y; Daikon can confirm or contradict this for the programmer (assuming an adequate test suite). • Bug Discovery

  35. Uses of Invariants (2/2) • Identify limited use of procedures • Identify procedures that have unnecessary functionality given the inputs they actually receive. • Demonstrate test suite inadequacy • Analyzing Daikon’s output can reveal where the test suite falls short, for example branches it does not fully exercise. • Validate program changes • After a piece of code has been heavily modified but should still abide by the original specification, it is a good idea to compare the invariants before and after. • If they match, the programmer can be more confident that the modifications did not have adverse effects.
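Comparing invariants before and after a change amounts to a set difference over the two reports. A minimal sketch, assuming each report is available as a list of invariant strings (the function name and the sample reports are hypothetical):

```python
def compare_invariants(before, after):
    """Classify invariants as kept, lost, or gained across a code change."""
    before, after = set(before), set(after)
    return {
        "kept": sorted(before & after),
        "lost": sorted(before - after),
        "gained": sorted(after - before),
    }

# Hypothetical reports from the same test suite on two program versions.
old_report = ["n >= 0", "i >= 0", "s = sum(b)"]
new_report = ["n >= 0", "i >= 0", "s >= 0"]
diff = compare_invariants(old_report, new_report)
```

An empty `lost` list supports the claim that the modification preserved the old behavior on this test suite; a non-empty one, as here with `s = sum(b)`, points at the property to inspect.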

  36. Outline • Introduction • Overview • Code Instrumentation • Data Trace Generation • Inferring Invariants • Uses of Invariants • Evaluation • Related Work • Limitations & Future Work • Discussion

  37. Evaluation Overview • Asserting Daikon’s Invariant Detection • Performance Evaluation • Stability Evaluation

  38. Asserting Daikon’s Invariant Detection • Simple accuracy evaluation of Daikon • A sample program was taken from The Science of Programming • The “gold standard” of invariant identification • The program had documented precondition, postcondition, and loop invariant specifications • Daikon reproduced all documented specifications plus some additional invariants: • Erroneously omitted (true invariants missing from the documentation) • Information about the test suite • Extraneous (redundant invariants)

  39. Performance Evaluation • The Siemens replace program is run with varying numbers of test cases and variables. • Most important factor: the number of variables over which invariants are checked • This is not the total number of program variables; rather, it is the number of variables in scope at a program point. • Invariant detection time grows quadratically with this factor. • Additionally, invariant detection time grows linearly with test suite size.
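A back-of-the-envelope model makes the two growth rates plausible: binary invariants are checked over pairs of variables, and every candidate is checked against every sample. The cost model below is an assumption for illustration only, not the measurement reported in the evaluation; the function names and the `templates_per_pair` parameter are hypothetical.

```python
from math import comb

def candidate_pairs(num_vars):
    """Number of unordered variable pairs at one program point."""
    return comb(num_vars, 2)

def checking_work(num_vars, num_samples, templates_per_pair=1):
    """Rough count of predicate evaluations: pairs x templates x samples."""
    return candidate_pairs(num_vars) * templates_per_pair * num_samples
```

Under this model, going from 10 to 20 variables in scope raises the pair count from 45 to 190 (roughly fourfold), while doubling the number of samples only doubles the work.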

  40. Performance Evaluation

  41. Stability Evaluation • The number of test cases affects different types of invariants in different ways: • Note that the number of identical unary invariants does not vary much as the number of test cases is increased. • However, the number of differing unary invariants varies considerably.

  42. Outline • Introduction • Overview • Code Instrumentation • Data Trace Generation • Inferring Invariants • Uses of Invariants • Evaluation • Related Work • Limitations & Future Work • Discussion

  43. Related Work (1/2) • Static Approaches to Inferring Invariants • Operate on program text, not test runs (symbolic execution) [Hoare69]. • Advantages • Reported invariants are true for any program run (but the set is not necessarily exhaustive). • Theoretically, static approaches can detect all sound invariants if the analysis is run to convergence. • Limitations • Omit properties that are true but uncomputable. • Pointer manipulation is difficult to approximate precisely.

  44. Related Work (2/2) • Dynamic Approaches to Inferring Invariants • Event traces [Blum93]. • Uses a state machine instead of AST. • Advantage: Lower data storage requirements. • Runtime switches based on user-inserted assert statements (Value Profiling) [Hanson93].

  45. Outline • Introduction • Overview • Code Instrumentation • Data Trace Generation • Inferring Invariants • Uses of Invariants • Evaluation • Related Work • Limitations & Future Work • Discussion

  46. Limitations (1/2) • Accuracy of inferred invariants depends on the quality and completeness of the test cases • Additional test cases could provide data that leads to additional invariants being inferred. • Additionally, invariants may hold only for the cases in the test suite • Daikon produces gigabytes of trace data, even when analyzing trivial programs. • The initial prototype implementation ran out of memory when run on 5,542 test cases

  47. Limitations (2/2) • The instrumenter, and therefore Daikon, is currently limited to C, Java, and Lisp. • Daikon does not yet follow arbitrary-length paths through recursive structures. • Daikon cannot compute invariants such as linear relationships over numerous variables (more than 3). • Instrumenting the program by modifying object code (as opposed to source code) would allow for improved precision and portability. • Exact memory locations could be traced. • This approach has many more obstacles.

  48. Future Work (1/2) • Ernst et al. planned to increase relevance and performance after this work by: • Reducing redundant invariants. • Removing relations between variables that can be statically proven to be unrelated. • Ignoring variables that have not been assigned since their last instrumentation. • Converting the implementation of Daikon from Python to C. • Checking fewer invariants (useful when the programmer wants to focus on a specific part of the code)

  49. Future Work (2/2) • Since paper publication: • Additional front-end support: • 2002: Perl (dfepl front-end implementation) • 2005: C++ (Kvasir front-end implementation) • 2003: Various performance improvements: • Handle data trace files incrementally • Original implementation stored entire trace file in memory • 2005: IDE Plug-in support for Visual Studio
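The difference between holding the whole trace in memory and handling it incrementally can be sketched as follows. The line format (`name=value` fields separated by spaces) and the function name are assumptions made for the example and are not Daikon's trace format; the point is only that each sample is discarded once the surviving candidates have been updated.

```python
import io

def stream_invariants(lines, candidates):
    """Check candidates one trace line at a time, keeping only survivors."""
    alive = dict(candidates)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Parse "x=1 y=2" into {"x": 1, "y": 2}; the format is hypothetical.
        sample = {k: int(v) for k, v in
                  (field.split("=") for field in line.split())}
        # Drop every candidate that this sample falsifies.
        alive = {name: p for name, p in alive.items() if p(sample)}
    return sorted(alive)

candidates = {
    "x <= y": lambda v: v["x"] <= v["y"],
    "x < y": lambda v: v["x"] < v["y"],
    "x >= 0": lambda v: v["x"] >= 0,
}

# A file object works the same way; StringIO stands in for the trace file.
trace_file = io.StringIO("x=1 y=2\nx=3 y=3\n\nx=0 y=9\n")
result = stream_invariants(trace_file, candidates)
```

Memory use is then bounded by the number of live candidates rather than by the size of the trace.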

  50. Outline • Introduction • Overview • Code Instrumentation • Data Trace Generation • Inferring Invariants • Uses of Invariants • Evaluation • Related Work • Limitations & Future Work • Discussion
