Improving the Invariant Model of DIDUCE

### Improving the Invariant Model of DIDUCE

CS 343 -- Research Proposal

12 June 2002

Katy Innes and Andy Westbrook

Overview

- Review of DIDUCE
- What’s wrong with DIDUCE’s current model?
- How do we propose to fix it?
- Related work
- Other presentations
- Summer break!

Review of DIDUCE

- A dynamic invariant checker
- Instruments user-specified portions of a program's Java bytecode
- Maintains a hypothetical invariant for the value of many variables at selected program points
- Does so using a bitmask which is the “meet” of all values seen so far
- The meet operator used in this case is the bitwise-or operator
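
The bitmask model described above can be sketched in a few lines. This is only a simplified illustration of the model as these slides describe it (a running bitwise-or), not DIDUCE's actual implementation; the class and method names are hypothetical, and values are assumed to be non-negative integers.

```python
class BitmaskInvariant:
    """Simplified model from the slides: the invariant is the OR of all values seen."""

    def __init__(self):
        self.mask = None  # None until the first value is observed

    def allows(self, value: int) -> bool:
        # A value is allowed if it sets no bit outside the accumulated mask.
        return self.mask is not None and (value & ~self.mask) == 0

    def observe(self, value: int) -> bool:
        """Record a value; return True if it violated the invariant held so far."""
        if self.mask is None:
            self.mask = value  # the first value initializes the invariant
            return False
        violated = not self.allows(value)
        self.mask |= value  # relax the invariant so the new value is admitted
        return violated
```

Each observation costs one AND and one OR, which is what makes this style of invariant cheap to maintain at many program points.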

What’s Wrong with DIDUCE?

- The invariant model
- It is heavily associated with the binary representation of integers
- If a variable is allowed to take on values 1 and 4, it must also be allowed to take on value 5

- This model is of little use for floating point numbers
- Empirically, this model has been shown to be meaningful with reference types only for distinguishing between null and non-null

For Example

- The paper mentions a bug found in MAJC where a state variable takes on a new state
- This variable is 0 for empty, 1 for occupied, or 2 for pending
- The error occurs when it takes on 2 for the first time

- But if the variable took on 1 for empty, 2 for occupied, and 3 for pending, DIDUCE would not find this bug

- Would DIDUCE be better if it could handle either case?
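
To make the encoding dependence concrete, the sketch below learns a mask from the two "normal" states and then checks the pending state under both encodings. It assumes the simplified bitwise-or model from the review slide; `learn` and `flags` are hypothetical helper names.

```python
def learn(values):
    """Accumulate the bitwise-or of the values seen during normal operation."""
    mask = 0
    for v in values:
        mask |= v
    return mask

def flags(mask, value):
    """True if the value sets a bit outside the learned mask (a violation)."""
    return (value & ~mask) != 0

# Encoding from the paper's example: 0 = empty, 1 = occupied, 2 = pending.
print(flags(learn([0, 1]), 2))  # True: bit 1 was never set before

# Shifted encoding: 1 = empty, 2 = occupied, 3 = pending.
print(flags(learn([1, 2]), 3))  # False: 3 is covered by 1 | 2
```

The same program behavior is flagged or missed depending only on which integers happen to encode the states.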

Our Improvement (Perhaps)

- Rather than use a bit vector for each invariant, we will use a set of ranges
- For example, we might associate the range 1-2 with the previous example
- We might have multiple ranges, or ranges of width one

- To handle reference types, we would assign each class a number and treat each reference as an integer whose value is the number of the class of the object it points to
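
A minimal sketch of the proposed representation follows. All names are hypothetical, and two details are assumptions not stated in the proposal: class numbers are handed out in order of first appearance, and 0 is reserved for null.

```python
from dataclasses import dataclass, field

@dataclass
class Range:
    lo: int
    hi: int  # inclusive upper bound; lo == hi gives a range of width one

    def contains(self, value: int) -> bool:
        return self.lo <= value <= self.hi

@dataclass
class RangeInvariant:
    ranges: list = field(default_factory=list)

    def allows(self, value: int) -> bool:
        # The invariant holds if any one of its ranges covers the value.
        return any(r.contains(value) for r in self.ranges)

class TypeNumbering:
    """Maps each class to an integer so references can be tracked like ints."""

    def __init__(self):
        self._ids = {}

    def id_of(self, obj) -> int:
        if obj is None:
            return 0  # assumption: 0 is reserved for null
        # Assign the next unused number the first time a class is seen.
        return self._ids.setdefault(type(obj), len(self._ids) + 1)

# The shifted state encoding from the earlier example: 1 and 2 are normal.
state = RangeInvariant([Range(1, 2)])
print(state.allows(2), state.allows(3))
```

Unlike the bitmask, the range 1-2 excludes 3, so the pending state would be caught under either encoding.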

Confidence

- We developed a measurement of confidence for each range in an invariant
- It is
- This rewards small ranges that contain a large number of observed values
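
The formula itself does not appear in the text above, so the function below is only a stand-in with the stated property, chosen for illustration: observations per unit of range width. It should not be read as the authors' measure.

```python
def confidence(lo: int, hi: int, count: int) -> float:
    """Hypothetical confidence: observed values per unit of range width.

    Assumption for illustration only; the proposal's actual formula
    is not reproduced here.
    """
    width = hi - lo + 1  # inclusive range, so a single value has width 1
    return count / width

# A narrow, heavily used range scores higher than a wide one
# with the same number of observations.
print(confidence(1, 2, 100) > confidence(1, 200, 100))
```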

Reporting Violations

- When we observe a value that does not fall into a range, we report a violation
- These violations are sorted by the confidence of the invariant that was violated
- This confidence is the mean of the confidences of the ranges defining the invariant

- We also create a new range for that invariant, containing just the observed value
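
The reporting steps above might look roughly like the following. The class names are hypothetical, the per-range confidence is a stand-in (count divided by width) rather than the proposal's formula, and reporting the very first observation with confidence 0 is an assumption made here so that it sorts to the bottom.

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class Range:
    lo: int
    hi: int
    count: int = 0

    def confidence(self) -> float:
        # Stand-in measure (assumption): observations per unit of width.
        return self.count / (self.hi - self.lo + 1)

@dataclass
class Invariant:
    name: str
    ranges: list = field(default_factory=list)

    def confidence(self) -> float:
        # Mean of the confidences of the ranges defining the invariant.
        return mean(r.confidence() for r in self.ranges) if self.ranges else 0.0

    def observe(self, value):
        """Return a (confidence, name, value) violation record, or None."""
        for r in self.ranges:
            if r.lo <= value <= r.hi:
                r.count += 1
                return None
        # Confidence is taken before the invariant is relaxed.
        violation = (self.confidence(), self.name, value)
        self.ranges.append(Range(value, value, 1))  # new range of width one
        return violation

state = Invariant("state")
violations = []
for v in [1] * 50 + [2] * 50 + [3]:
    record = state.observe(v)
    if record is not None:
        violations.append(record)

# Highest-confidence violations are presented first.
violations.sort(key=lambda record: record[0], reverse=True)
```

With this stand-in measure, the late appearance of 3 ranks above the start-up violation because the invariant had accumulated many observations by then.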

Efficiency Improvements

- To improve efficiency, ranges are merged
- Two ranges are merged only if the confidence of the higher-confidence initial range differs from that of the newer range by less than some empirically determined constant
- This will result in merging ranges that are close together and have similar confidence

- We will also limit the number of ranges per program point and will drop ranges with low confidence
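
One possible reading of the merge rule is sketched below. It interprets "the newer range" as the range that would result from the merge, which is an interpretation rather than something the slides state outright. The confidence measure (count divided by width), the names, and the parameters `max_drop` and `max_ranges` are all illustrative assumptions, and the input ranges are assumed to be disjoint.

```python
from dataclasses import dataclass

@dataclass
class Range:
    lo: int
    hi: int
    count: int

    def confidence(self) -> float:
        # Stand-in measure (assumption): observations per unit of width.
        return self.count / (self.hi - self.lo + 1)

def try_merge(a, b, max_drop):
    """Return the merged range if confidence drops by less than max_drop, else None."""
    merged = Range(min(a.lo, b.lo), max(a.hi, b.hi), a.count + b.count)
    best = max(a.confidence(), b.confidence())
    return merged if best - merged.confidence() < max_drop else None

def compact(ranges, max_drop, max_ranges):
    """Merge neighboring ranges where allowed, then keep the most confident ones."""
    out = []
    for r in sorted(ranges, key=lambda r: r.lo):
        merged = try_merge(out[-1], r, max_drop) if out else None
        if merged is not None:
            out[-1] = merged
        else:
            out.append(r)
    # Enforce the per-program-point limit by dropping low-confidence ranges.
    out.sort(key=lambda r: r.confidence(), reverse=True)
    return sorted(out[:max_ranges], key=lambda r: r.lo)

ranges = [Range(1, 1, 50), Range(2, 2, 50), Range(100, 100, 50), Range(500, 500, 1)]
print(compact(ranges, max_drop=10, max_ranges=2))
```

Under this reading, adjacent ranges with equal confidence merge at no cost, while merging distant ranges would dilute confidence across the gap and is rejected.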

More Efficiency Improvements

- Deinstrumentation
- When the program has been running for a suitably long period of time and has no high-confidence ranges for a particular invariant, we stop checking that invariant
- We hypothesize that this will eliminate checking of variables that hold random or arbitrary values or can take on most of the values allowed by their type
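
The deinstrumentation test could be as simple as the following sketch. It measures "suitably long" by the number of observations at the program point rather than by wall-clock time, which is an assumption; the function name and both thresholds are hypothetical.

```python
def should_deinstrument(range_confidences, observations,
                        min_observations, high_confidence):
    """Decide whether to stop checking an invariant.

    range_confidences: confidence of each range in the invariant
    observations:      values seen so far at this program point
    min_observations:  hypothetical threshold for "suitably long"
    high_confidence:   hypothetical threshold for a high-confidence range
    """
    if observations < min_observations:
        return False  # too early to judge this invariant
    # Stop checking only if no range has earned high confidence.
    return not any(c >= high_confidence for c in range_confidences)

# A variable holding near-random values yields many low-confidence ranges.
print(should_deinstrument([0.1, 0.3, 0.2], 10_000, 1_000, 5.0))
```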

Related Work

- Daikon: tracks all observed values and then, after the run completes, determines invariants
- Requires extensive training data
- This provides better invariants than our proposal but at a much, much higher cost

- A number of languages (e.g., Ada) support range-based subtyping
- This supports our hypothesis that ranges are meaningful invariants

