Dynamically Discovering Likely Program Invariants to Support Program Evolution

Presented by: Wes Toland, Geoff Gerfin • Paper authors: Michael D. Ernst, Jake Cockrell, William G. Griswold, David Notkin

Outline • Introduction • Overview • Code Instrumentation • Data Trace Generation • Inferring Invariants • Uses of Invariants • Evaluation • Related Work • Limitations & Future Work • Discussion

Invariants • What are invariants? • A constraint over a variable’s values • A relationship between multiple variable values. • Defined as mathematical predicates (Example: n >= 0)

Importance of Invariants • In program development: • Refining a specification • Aiding runtime checking • In software evolution: • Helping programmers understand the functionality of an undocumented program, so that they do not act on incorrect assumptions. • A change that violates an invariant introduces a bug.

Daikon • Programmers do not usually explicitly annotate or document code with invariants. • Daikon proposes to automatically determine program invariants and report them in a meaningful manner.

Daikon’s Infrastructure

Daikon’s Infrastructure: Original Program i,s := 0,0; do i != n -> i,s := i + 1, s + b[i] od

Daikon’s Infrastructure: Instrumented Program print b, n; i,s := 0,0; do i != n -> print i, s, n, b[i]; i,s := i + 1, s + b[i] od
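
The instrumented loop above can be sketched in Python as follows. This is only an illustration: it assumes the print statements append samples to an in-memory list instead of a trace file, and the names run_instrumented, ENTER, and LOOP are made up for this sketch rather than taken from Daikon.

```python
def run_instrumented(b, n):
    """Sum the first n elements of b, recording a sample at each program point."""
    trace = []
    # Program entry: corresponds to `print b, n` in the slide.
    trace.append(("ENTER", {"b": list(b), "n": n}))
    i, s = 0, 0
    while i != n:
        # Loop head: corresponds to `print i, s, n, b[i]` in the slide.
        trace.append(("LOOP", {"i": i, "s": s, "n": n, "b[i]": b[i]}))
        # Simultaneous assignment: the right-hand side uses the old value of i.
        i, s = i + 1, s + b[i]
    return s, trace


total, trace = run_instrumented([3, 1, 4], 3)
for point, values in trace:
    print(point, values)
```

Each tuple in trace plays the role of one record in the trace file: a program point plus the values of the variables in scope there.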

Daikon’s Infrastructure: Trace File print b, n; i,s := 0,0; do i != n -> print i, s, n, b[i]; i,s := i + 1, s + b[i] od

Daikon’s Infrastructure: Invariants Determined • 1.) n >= 0 • 2.) s = SUM(b) • 3.) i >= 0
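
How a candidate invariant survives or is discarded can be sketched roughly as below. The samples are hypothetical values for the array-sum program at its exit point, and the candidate list and the helper surviving are assumptions made for illustration, not Daikon's actual invariant catalogue.

```python
# Hypothetical samples captured at program exit (values are made up).
samples = [
    {"b": [3, 1, 4], "n": 3, "i": 3, "s": 8},
    {"b": [], "n": 0, "i": 0, "s": 0},
    {"b": [7, -2], "n": 2, "i": 2, "s": 5},
]

# Candidate invariants: a readable name plus a predicate over one sample.
candidates = {
    "n >= 0": lambda v: v["n"] >= 0,
    "i >= 0": lambda v: v["i"] >= 0,
    "s = SUM(b)": lambda v: v["s"] == sum(v["b"]),
    "s > 0": lambda v: v["s"] > 0,  # falsified by the empty-array sample
}


def surviving(candidates, samples):
    """Return the names of candidates that hold on every sample."""
    return [name for name, holds in candidates.items()
            if all(holds(v) for v in samples)]


print(surviving(candidates, samples))
```

A candidate is reported only if no sample contradicts it, which is why the result is a "likely" invariant rather than a proven one.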

Code Instrumentation (1/6)

Code Instrumentation (2/6) • Daikon’s front-end modifies source code to trace specific variables at points of interest: • Function entry points (pre-conditions) • Function exit points (post-conditions) • Loop heads (loop invariants) • The trace data is used as input to Daikon’s back-end, which is used to infer invariants

Code Instrumentation (3/6) • Daikon uses an abstract syntax tree for code instrumentation. • What is an AST?

Code Instrumentation (4/6) How could this be useful for code instrumentation?

Code Instrumentation (5/6) • Daikon uses the AST to determine which variables are in scope at each point of interest. • Code is inserted at each such program point to write the values of all in-scope variables to a file in a specific format.
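
The idea of AST-driven instrumentation can be illustrated with Python's standard ast module. This is a sketch under simplifying assumptions, not Daikon's front end: it handles only function entry points, treats the parameters as the variables in scope, and prints to standard output instead of a trace file. The function names variables_at_entry and instrument_entries are hypothetical.

```python
import ast

SOURCE = """
def total(b, n):
    i, s = 0, 0
    while i != n:
        i, s = i + 1, s + b[i]
    return s
"""


def variables_at_entry(source):
    """Map each function name to the parameter names in scope at its entry."""
    tree = ast.parse(source)
    return {
        node.name: [arg.arg for arg in node.args.args]
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    }


def instrument_entries(source):
    """Insert a print of all parameters as the first statement of each function."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            names = ", ".join(arg.arg for arg in node.args.args)
            probe = ast.parse(f"print('ENTER {node.name}', {names})").body[0]
            node.body.insert(0, probe)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # ast.unparse requires Python 3.9 or later


print(variables_at_entry(SOURCE))
print(instrument_entries(SOURCE))
```

Working on the tree rather than on raw text is what lets the instrumenter know which names are visible at each insertion point.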

Code Instrumentation (6/6) • A status variable is created for each original program variable and is passed along through function calls. • Each status variable records: • Modification timestamp (used to prevent garbage output) • Smallest and largest indices (for arrays and pointers) • Linked list flag • A status variable is updated whenever the program manipulates its associated variable.
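
A status variable might look something like the record below. This is a loose sketch: the class Status, its field names, the logical clock, and the choice to track the indices that have been written are all assumptions for illustration, and the linked list flag is omitted.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

_clock = count(1)  # logical clock; each write receives the next tick


@dataclass
class Status:
    """Status record kept alongside one program variable (illustrative fields)."""
    modified_at: int = 0              # 0 means "never assigned"
    min_index: Optional[int] = None   # smallest index written so far
    max_index: Optional[int] = None   # largest index written so far

    def record_write(self, index=None):
        """Update the timestamp and, for indexed writes, the index range."""
        self.modified_at = next(_clock)
        if index is not None:
            self.min_index = index if self.min_index is None else min(self.min_index, index)
            self.max_index = index if self.max_index is None else max(self.max_index, index)


def printable(status):
    """Only variables that have been assigned are worth writing to the trace."""
    return status.modified_at > 0


buf = [0] * 8
buf_status = Status()
print(printable(buf_status))  # nothing written yet, so the variable is skipped
for k in (2, 3, 5):
    buf[k] = k * k
    buf_status.record_write(index=k)
print(printable(buf_status), buf_status.min_index, buf_status.max_index)
```

The instrumented program would consult such a record before printing, so that unassigned variables or out-of-range array elements never reach the trace.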

Data Trace Generation (1/2)

Data Trace Generation (2/2) • Instrumented Code → Data Trace DB • Instrumented code: print b, n; i,s := 0,0; do i != n -> print i, s, n, b[i]; i,s := i + 1, s + b[i] od

Inferring Invariants

Types of Invariants (1/3)

Types of Invariants (2/3)

Types of Invariants (3/3) • Single-sequence variables: • Range (min and max values) • Ordering (increasing or decreasing) • Invariants over all elements (e.g., every element of the array >= c) • Two-sequence variables: • Linear relationship (y[i] = a*x[i] + b for every index i) • Comparison (x < y, e.g., when x[i] = y[i] - 1 for every i) • Reversal (x[i] = y[length(y) - 1 - i] for 0 <= i < length(y)) • Sequence and numeric variables: • Membership (i ∈ s)
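
A few of the sequence invariants listed above can be checked with short predicates. The sketch below is illustrative only: the helper names are hypothetical, integer-valued sequences are assumed, and the linear fit simply solves for a and b from two elements and verifies the rest.

```python
def is_reversal(x, y):
    """True when x is y reversed: x[i] == y[len(y) - 1 - i] for every i."""
    return len(x) == len(y) and all(
        x[i] == y[len(y) - 1 - i] for i in range(len(y))
    )


def fit_linear(x, y):
    """Return (a, b) such that y[i] == a * x[i] + b for every i, else None."""
    if len(x) != len(y) or not x:
        return None
    pairs = list(zip(x, y))
    x0, y0 = pairs[0]
    # Find a second element with a different x value to determine the slope.
    other = next((p for p in pairs if p[0] != x0), None)
    if other is None:
        return None
    x1, y1 = other
    if (y1 - y0) % (x1 - x0) != 0:
        return None  # slope would not be an integer
    a = (y1 - y0) // (x1 - x0)
    b = y0 - a * x0
    return (a, b) if all(yi == a * xi + b for xi, yi in pairs) else None


def is_nondecreasing(seq):
    """Ordering invariant: each element is <= its successor."""
    return all(p <= q for p, q in zip(seq, seq[1:]))


x = [1, 2, 3, 4]
y = [5, 7, 9, 11]
print(fit_linear(x, y))              # (2, 3)
print(is_reversal([4, 3, 2, 1], x))  # True
print(is_nondecreasing(y), 7 in y)   # True True
print((min(y), max(y)))              # (5, 11)
```

Range and membership need no helper at all: min, max, and the in operator already express them.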

Inferring Invariants (1/5) What invariants should be inferred from this method, regardless of the test suite input?

Inferring Invariants (2/5)

Inferring Invariants (3/5) • Daikon can identify from this trace that for all samples, x = orig(x)

Inferring Invariants (4/5) • Daikon can identify from this trace that for all samples, y = orig(y) = 1.

Inferring Invariants (5/5) • Daikon can identify from this trace that for all samples, *x = orig(*x) + 1. • Is this invariant too limited?
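
The question on this slide can be explored with a small experiment. The method shown on the original slides is not reproduced in this transcript, so the sketch below assumes a hypothetical method increment that adds y to the value x points to; a one-element list stands in for the pointer, and id(cell) stands in for the pointer value.

```python
def increment(cell, y):
    """Hypothetical method under test: adds y to the value stored in cell[0]."""
    cell[0] += y


def observe(cell, y):
    """Run increment once and return its entry (orig) and exit values."""
    entry = {"x": id(cell), "*x": cell[0], "y": y}
    increment(cell, y)
    exit_ = {"x": id(cell), "*x": cell[0], "y": y}
    return entry, exit_


# A weak test suite: every call passes y = 1.
samples = [observe([start], 1) for start in (0, 5, -3, 41)]

# Each predicate takes the entry values (o, for orig) and the exit values (v).
candidates = {
    "x = orig(x)": lambda o, v: v["x"] == o["x"],
    "y = orig(y) = 1": lambda o, v: v["y"] == o["y"] == 1,
    "*x = orig(*x) + 1": lambda o, v: v["*x"] == o["*x"] + 1,
    "*x = orig(*x) + y": lambda o, v: v["*x"] == o["*x"] + v["y"],
}

for name, holds in candidates.items():
    print(name, all(holds(o, v) for o, v in samples))

# One extra call with y = 2 falsifies the invariants that were test-suite artifacts.
samples.append(observe([10], 2))
still_true = [name for name, holds in candidates.items()
              if all(holds(o, v) for o, v in samples)]
print(still_true)
```

With only y = 1 in the suite, all four candidates hold; the more general relationship is the only data-dependent one that survives a richer suite.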

Uses of Invariants (1/2) • Explicated Data Structures • Clearly define undocumented data structures without looking through code. • Confirmed and contradicted expectations • Assert an understanding of code functionality. • Example: It may appear that x is always less than y, which Daikon can verify for the programmer (assuming a valid test suite). • Bug Discovery

Uses of Invariants (2/2) • Identify limited use of procedures • Identify procedures that have unnecessary functionality given the inputs they actually receive. • Demonstrate test suite inadequacy • Daikon’s output can reveal where the test suite fails to exercise all branches of a program. • Validate program changes • After a piece of code has been heavily modified but should still abide by the original specification, it is a good idea to compare the invariants before and after the change. • If they match, the programmer can be more confident that the modifications did not have adverse effects.
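
Comparing invariants before and after a change amounts to a set difference. In this sketch the invariants are plain strings, the two lists are made-up examples, and compare_invariants is a hypothetical helper rather than a Daikon feature.

```python
def compare_invariants(before, after):
    """Split invariants into those kept, lost, and gained across a change."""
    before, after = set(before), set(after)
    return {
        "kept": sorted(before & after),
        "lost": sorted(before - after),
        "gained": sorted(after - before),
    }


# Made-up invariant reports for the original and the modified program.
before = ["n >= 0", "s = SUM(b)", "i >= 0"]
after = ["n >= 0", "i >= 0", "s >= 0"]

report = compare_invariants(before, after)
for kind, names in report.items():
    print(kind, names)
```

An empty "lost" list is the encouraging case; anything in it is a property the modified program no longer exhibits on the test suite and deserves a closer look.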

Evaluation Overview • Asserting Daikon’s Invariant Detection • Performance Evaluation • Stability Evaluation

Asserting Daikon’s Invariant Detection • Simple accuracy evaluation of Daikon • A sample program was taken from The Science of Programming, the “gold standard” of invariant identification • The program had documented precondition, postcondition, and loop invariant specifications • Daikon reproduced all documented specifications plus some additional invariants: • Invariants erroneously omitted from the documentation • Information about the test suite • Extraneous (redundant) invariants

Performance Evaluation • The Siemens replace program is run with varying numbers of test cases and variables. • Most important factor: the number of variables over which invariants are checked • This is not the total number of program variables; it is the number of variables in scope at a program point. • Invariant detection time grows quadratically with this factor. • Additionally, invariant detection time grows linearly with test suite size.
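
One way to see where quadratic growth can come from is to count the work for pairwise invariants. The cost model below is a deliberate simplification (one predicate evaluation per variable pair per sample) and is an assumption of this sketch, not a measurement of Daikon.

```python
from itertools import combinations


def binary_checks(variables, num_samples):
    """Count evaluations if every variable pair is checked on every sample."""
    pairs = list(combinations(variables, 2))
    return len(pairs) * num_samples


for k in (5, 10, 20):
    names = [f"v{j}" for j in range(k)]
    print(k, binary_checks(names, 100), binary_checks(names, 200))
```

Doubling the number of variables roughly quadruples the count, while doubling the number of samples only doubles it, which matches the shape of the growth described above.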

Performance Evaluation

Stability Evaluation • The number of test cases affects different types of invariants in different ways: • Note that the number of identical unary invariants does not vary much as the number of test cases is increased. • However, the number of differing unary invariants varies considerably.

Related Work (1/2) • Static Approaches to Inferring Invariants • Operate on program text, not test runs (symbolic execution) [Hoare69]. • Advantages • Reported invariants are true for any program run (but not necessarily exhaustive). • Theoretically, static approaches can detect all sound invariants if the analysis is run to convergence. • Limitations • Omit properties that are true but uncomputable. • Pointer manipulation is difficult to analyze precisely and forces approximation.

Related Work (2/2) • Dynamic Approaches to Inferring Invariants • Event traces [Blum93]. • Uses a state machine instead of AST. • Advantage: Lower data storage requirements. • Runtime switches based on user-inserted assert statements (Value Profiling) [Hanson93].

Limitations (1/2) • Accuracy of inferred invariants depends on the quality and completeness of the test cases. • Additional test cases could provide data that leads to additional invariants being inferred. • Additionally, invariants may hold only for the cases in the test suite. • Daikon produces gigabytes of trace data, even when analyzing trivial programs. • The initial prototype implementation ran out of memory when processing 5,542 test cases.

Limitations (2/2) • The instrumenter, and therefore Daikon, is currently limited to C, Java, and Lisp. • Daikon does not yet follow arbitrary-length paths through recursive structures. • Daikon cannot compute invariants such as linear relationships over numerous variables (more than 3). • Instrumenting the program by modifying object code (as opposed to source code) would allow for improved precision and portability. • Exact memory locations could be traced. • This approach has many more obstacles.

Future Work (1/2) • Ernst et al. planned to increase relevance and performance after this work by: • Reducing redundant invariants. • Removing relations between variables that can be statically proven to be unrelated. • Ignoring variables that have not been assigned since their last instrumentation. • Converting the implementation of Daikon from Python to C. • Checking fewer invariants (useful when the programmer wants to focus on a specific part of the code)

Future Work (2/2) • Since paper publication: • Additional front-end support: • 2002: Perl (dfepl front-end implementation) • 2005: C++ (Kvasir front-end implementation) • 2003: Various performance improvements: • Handle data trace files incrementally • Original implementation stored entire trace file in memory • 2005: IDE Plug-in support for Visual Studio
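
Incremental trace handling can be sketched with a generator that yields one sample at a time, so the whole trace never has to sit in memory. The line format ("name=value" pairs separated by spaces) and the function names are assumptions made for this sketch and do not reflect Daikon's actual trace format.

```python
import io


def read_samples(stream):
    """Yield one sample (dict of ints) per non-empty line, lazily."""
    for line in stream:
        line = line.strip()
        if line:
            yield {k: int(v) for k, v in (field.split("=") for field in line.split())}


def detect_incrementally(stream, candidates):
    """Drop each candidate the first time a sample falsifies it."""
    live = dict(candidates)
    for sample in read_samples(stream):
        live = {name: holds for name, holds in live.items() if holds(sample)}
    return sorted(live)


# A tiny in-memory stand-in for a trace file.
trace = io.StringIO("i=0 s=0 n=3\ni=1 s=3 n=3\ni=2 s=4 n=3\n")
candidates = {
    "i >= 0": lambda v: v["i"] >= 0,
    "i < n": lambda v: v["i"] < v["n"],
    "s = 0": lambda v: v["s"] == 0,
}
print(detect_incrementally(trace, candidates))
```

Memory use depends on the number of live candidates rather than on the length of the trace, which is the point of processing samples as they arrive.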
