
# STP: A Decision Procedure for Bit-vectors and Arrays




David L. Dill

Stanford University

### Software analysis tools present unique challenges for decision procedures

• Theories must match programming language semantics

• Operations are on bit-vectors, not integers

• Arrays (for modelling memories)

• Must handle very large inputs with

• Deeply nested array writes

• Many linear equations

• Many variables

• Decision procedure is called many times.

### What went before

• Series of decision procedures: SVC, CVC, CVCL

• All of these had combinations of first-order theories

• Equality

• Uninterpreted functions and predicates

• Boolean connectives

• Linear arithmetic over real numbers (and integers, in the case of CVCL)

• But not quantifiers.

• CVCL was in use in EXE (or its predecessor, from Dawson Engler’s research group)

### Combination of theories

• The core strategy of SVC, CVC, CVCL was based on dynamically breaking down formulas into conjunctions of “atomic formulas”

• Atomic formulas have no Boolean connectives (correspond to propositional variables).

• Recursively assert/deny an atomic formula alpha (deny = assert the negation of alpha)

• Simplify after assertion

• When simplified formula is conjunction of literals, use special decision procedures.

### SAT vs. CVCL

• CVC/CVCL used CHAFF-like SAT solver to choose splitting variables

• … but puts lots of slow stuff in the inner loop!

### STP

• CVCL was already used by Engler’s group, but was unfixably slow.

• There existed many examples generated by Engler

• Vijay Ganesh and I decided to try a different approach, inspired by UCLID (Seshia & Bryant).

• Put SAT at the bottom, with unmodified inner loop.

• Preprocess formula for higher-level reasoning (bit-vectors and arrays).

### STP

• Decides satisfiability of formulas over

• Bit-vector Terms

• Constants

• +, -, *, (signed) div, (signed) mod

• Concatenation, Extraction

• Left/Right Shift, Sign-extend, bitwise-Booleans

• Array Terms

• Write(Array, index, val)

• But no array equality

• Predicates: =, signed & unsigned comparisons

• If satisfiable, produces a model.

### Comparison with Saturn

• STP is a separate “component” (can be stand-alone, or used through an API).

• Programming language or other tool is separate

• General input language (can define 23-bit bit-vector types if you want).

• Signed/unsigned encoded in operators, not in data types.

• No “points to”, heap, etc.

• Implements signed/unsigned multiply, divide, remainder (but no floating point).

### Some projects using STP

• STP has evolved (maintained by someone I don’t know in Australia)

• Several projects have used it.

• EXE : Bug Finder by Dawson Engler, Cristian Cadar and others at Stanford

• Klee : Cadar, Dunbar, Engler

• MINESWEEPER: Bug Finder by Dawn Song and her group at CMU

### Main Ideas of STP

• Eager translation to CNF with word-level pre-processing

• Theories not in “inner loop” of SAT solver, unlike Nelson-Oppen approaches (e.g. CVCL).

• Abstraction-Refinement for arrays.

• Laziness to counterbalance eagerness

• Solve linear formulas mod 2^n in polynomial time

### Bitvector theory

• Data type: BV(n), where n is constant.

• Almost all machine bitvector operations

• Change length (sign extended and not).

• Concatenate bitvectors, extract bits from BVs.

• Signed and unsigned arithmetic: +, -, *, /, %, <,>, etc.

• AND, OR, NOT, XOR, etc.

### Array theory

• Array type: BV(n) -> BV(m)

• read(A, i) – value of A[i]

• write(A, i, v) – copy of A updated at index i with value v.

• No destructive modification – write returns a new array, which is updated old array.

• Sometimes used to represent heap storage.
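A minimal Python sketch of the non-destructive read/write behavior described above. The `FArray` class and its default value of 0 for unwritten cells are illustrative assumptions, not STP's representation.

```python
class FArray:
    """Immutable array: maps indices to values, with a default for unwritten cells."""

    def __init__(self, cells=None, default=0):
        self._cells = dict(cells or {})
        self._default = default


def read(a, i):
    """read(A, i): the value of A[i]."""
    return a._cells.get(i, a._default)


def write(a, i, v):
    """write(A, i, v): a new array equal to A except at index i."""
    cells = dict(a._cells)  # copy, so the old array is never modified
    cells[i] = v
    return FArray(cells, a._default)


A = FArray()
B = write(A, 3, 7)  # A itself is unchanged
```

Reading `B` at index 3 gives the written value, while reading `B` at any other index gives the same value as `A`, which is the usual read-over-write behavior.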

### Array theory

Identities:

STP has a limited theory

No comparison of whole arrays, e.g. write(A, i, v) = write(A, j, w)

This makes things easier (see http://sprout.stanford.edu/PAPERS/LICS-SBDL-2001.pdf if you don’t like “easier”).

### Implementation

• DAG representation of expressions

• Same subexpression structure = same pointer.

• Maintained by hashing.

• No destructive operations on DAGs (modification requires new nodes).

• Makes substitution, equality check very efficient.

• Often log size of tree expression representation.
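The "same structure = same pointer" idea can be sketched with a hash table of nodes (hash-consing). This is a simplified illustration; `Node`, `mk`, and `_table` are hypothetical names, not STP's data structures.

```python
class Node:
    """An expression node; never mutated after creation."""
    __slots__ = ("op", "children")

    def __init__(self, op, children):
        self.op = op
        self.children = children


_table = {}  # (op, child identities) -> the unique node with that structure


def mk(op, *children):
    """Return the unique node for this operator and these children."""
    # Children are already unique, so their identities describe the structure.
    key = (op, tuple(id(c) for c in children))
    node = _table.get(key)
    if node is None:
        node = Node(op, children)
        _table[key] = node  # the table keeps nodes alive, so ids stay valid
    return node


a = mk("+", mk("x"), mk("y"))
b = mk("+", mk("x"), mk("y"))  # built separately, but the same object as a
```

Because structurally equal expressions are the same object, an equality check is a pointer comparison (`a is b`).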

### DAGs

• All recursive traversals must be “memoized”

• Want traversal to be linear in size of DAG, not tree.

• First thing to think about when functions don’t finish: “Maybe I messed up memoization.”

• Updating nodes near the root is less expensive than updating nodes near the leaves.
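A small sketch of why memoization matters, under the assumption of a simple `Node` class (hypothetical, for illustration). The memoized traversal visits each DAG node once, even when the equivalent tree is exponentially larger.

```python
class Node:
    def __init__(self, op, *children):
        self.op = op
        self.children = children


def dag_size(root):
    """Number of distinct nodes reachable from root."""
    seen = set()
    stack = [root]
    while stack:
        n = stack.pop()
        if id(n) in seen:
            continue
        seen.add(id(n))
        stack.extend(n.children)
    return len(seen)


def tree_size(root, memo=None):
    """Size of the unshared tree, computed in time linear in the DAG size."""
    if memo is None:
        memo = {}
    key = id(root)
    if key not in memo:  # without this check the recursion is exponential
        memo[key] = 1 + sum(tree_size(c, memo) for c in root.children)
    return memo[key]


def doubling_chain(depth):
    """x, x+x, (x+x)+(x+x), ... with both children shared at every level."""
    node = Node("x")
    for _ in range(depth):
        node = Node("+", node, node)
    return node
```

For `doubling_chain(40)` the DAG has 41 nodes while the tree has 2^41 - 1, so dropping the memo table would make the traversal impractical.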

### STP Architecture

Input Formula → Substitutions → Simplifications → Linear Solving → Array Abstraction → BitBlast → CNF Conversion → SAT Solver → Result

(The diagram also labels a Refinement Loop.)

### Substitution is Important

• Inputs often have many simple equations (in EXE, this is how constant arrays are defined):

• x = 4

• A[4] = e

• Early pass to substitute these

• Allows constant evaluation

• Enables other optimizations

• Reduces non-constant indices in array reads.

### Word-level simplifications

• Many simple local rewrites:

• Bitwise Boolean identities (e.g. a AND !a = 00000, a XOR !a = 11111), (a + b)[0:3] = a[0:3] + b[0:3], etc.

• Generally, avoid distributive laws because they cause blow-up.

• Be careful about “destroying sharing” in DAG.

• Flatten trees of associative operators

• Sort operands of commutative operation

• Tweak ordering for easy simplification

• Expressions numbered in order of creation (children < parents). Use this order to sort, but:

• Put constants first: (F AND a AND …), (1 + 3 + x + …)

• Arrange for x, !x to be adjacent (also x, -x)
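The flatten-and-sort rewrites can be sketched as follows. This is an illustration under assumptions: expressions are nested tuples, integers are constants, strings are variables, and sorting by `repr` stands in for the creation-order numbering described above.

```python
def normalize(expr, width=8):
    """Flatten nested + or & terms, fold constants, and sort the operands."""
    if not isinstance(expr, tuple):
        return expr  # a variable name or an integer constant
    op, *args = expr
    args = [normalize(a, width) for a in args]
    if op not in ("+", "&"):
        return (op, *args)
    flat = []
    for a in args:
        if isinstance(a, tuple) and a[0] == op:
            flat.extend(a[1:])  # (a + (b + c)) becomes (a + b + c)
        else:
            flat.append(a)
    mask = (1 << width) - 1
    consts = [a for a in flat if isinstance(a, int)]
    rest = sorted((a for a in flat if not isinstance(a, int)), key=repr)
    if consts:
        folded = consts[0] & mask
        for c in consts[1:]:
            folded = (folded + c) & mask if op == "+" else folded & c
        rest = [folded] + rest  # constants go first
    return rest[0] if len(rest) == 1 else (op, *rest)
```

For example, `normalize(("+", "x", ("+", 1, ("+", "y", 3))))` flattens the sum, folds `1 + 3`, and places the constant first, so differently ordered inputs end up with the same normal form.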

Array reads are replaced with fresh variables

v0 = t0

v1 = t1

...

vn = tn

and axioms

(i1=i0) => v1=v0

(i2=i0) => v2=v0

(i2=i1) => v2=v1

...

• Problem: O(n^2) axioms added, where n is the number of read indices

• Lethal, if n is large: n = 10000, # of axioms: ~ 100 million

• Blowup seems hard to avoid (e.g. UCLID).

• This is “aliasing” from another perspective.
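To make the quadratic growth concrete, here is a sketch that enumerates one axiom per unordered pair of read indices. The helper names are hypothetical. The exact pair count is n(n-1)/2, which grows as n^2.

```python
from itertools import combinations


def read_axioms(n):
    """One axiom per pair j < k of read indices: (ik=ij) => vk=vj."""
    return [f"(i{k}=i{j}) => v{k}=v{j}" for j, k in combinations(range(n), 2)]


def axiom_count(n):
    """Number of axioms for n read indices, without building them."""
    return n * (n - 1) // 2
```

`read_axioms(3)` yields the three axioms listed above, while `axiom_count(10000)` is already in the tens of millions.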

### Abstraction/Refinement in STP

• Pose problem as a conjunction of formulas

• E.g., instantiated array read axioms

• Abstraction: solve for a proper subset of the formulas

• E.g., omit array read axioms

• Early exit if:

• Unsatisfiable

• Satisfiable and model actually satisfies unabstracted formula.

• Otherwise, add some omitted formulas to the abstracted formula and solve again.

• If at least one of these formulas is false in the model, that model will not be regenerated.
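The loop described above can be sketched in a few lines. This is only an illustration: a brute-force enumerator stands in for the SAT solver, constraints are Python predicates over a model dictionary, and all names are hypothetical.

```python
from itertools import product


def solve(constraints, variables, domain):
    """Stand-in for a SAT solver: return the first model found, or None."""
    for values in product(domain, repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(c(model) for c in constraints):
            return model
    return None


def abstraction_refinement(core, axioms, variables, domain):
    """Solve without the axioms, adding only those a candidate model violates."""
    active = list(core)
    pending = list(axioms)
    rounds = 0
    while True:
        model = solve(active, variables, domain)
        if model is None:
            return None, rounds  # abstraction is UNSAT, so the input is too
        violated = [a for a in pending if not a(model)]
        if not violated:
            return model, rounds  # model satisfies the unabstracted formula
        active.extend(violated)  # rules out this model in later rounds
        pending = [a for a in pending if a not in violated]
        rounds += 1


# Core constraints v0 = 0 and vi = 1, with one omitted axiom (i=0) => vi = v0.
core = [lambda m: m["v0"] == 0, lambda m: m["vi"] == 1]
axioms = [lambda m: m["i"] != 0 or m["vi"] == m["v0"]]
model, rounds = abstraction_refinement(core, axioms, ["i", "v0", "vi"], range(2))
```

In this small instance the first candidate model has i = 0 and violates the omitted axiom. After one refinement round the solver returns a model with i = 1.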

### Abstraction-Refinement Algorithm for Array Reads

• Input → After Abstraction: v0 = 0, vi = 1

• SATSolver → Counterexample: i = 0, v0 = 0, vi = 1

• Check Input on Assignment: False

• Refinement Step: add (i=0) => vi = 0

• Rerun SATSolver: i = 1, vi = 1

Works well, even in satisfiable cases

• Satisfier often finds a model that minimizes aliasing

• Few axioms need to be added during refinement

• Typical number of refinement loops : < 3

### The Problem with Array Writes

• Standard transformation

read(write(A, i, v), j) → ite(i=j, v, read(A, j)) <- “if-then-else”

causes term blow-up

• Many different read expressions share write sub-terms.

• O(n*m) blow-up in expr DAG

• n is write term nesting levels

• m is number of read indices

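(Figure: expression DAG in which two reads, at indices j and k, share the same nested write sub-term W(W(A, i0, v0), i1, v1).)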

### The Problem with Array Writes

R(W(W(A,i0,v0),i1,v1),j) = R(W(W(A,i0,v0),i1,v1),k)

becomes

If (i1=j) v1

elsif (i0=j) v0

else R(A,j)

=

If (i1=k) v1

elsif (i0=k) v0

else R(A,k)

(Figure: the resulting expression DAG, with a separate ite chain for each read index.)
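A sketch of the standard read-over-write expansion, assuming a tuple-based expression format (hypothetical). Each read index gets its own ite chain over the shared writes, which is where the O(n*m) growth comes from.

```python
def read_over_writes(writes, j, base="A"):
    """Expand a read at index j over nested writes into an ite chain.

    writes is a list of (index, value) pairs, oldest write first.
    """
    result = ("read", base, j)
    for i, v in writes:
        # A later write wraps the earlier result, so it is tested first.
        result = ("ite", ("=", i, j), v, result)
    return result


def count_ites(expr):
    """Number of ite nodes in an expanded expression."""
    if not isinstance(expr, tuple):
        return 0
    return (expr[0] == "ite") + sum(count_ites(e) for e in expr[1:])


writes = [("i0", "v0"), ("i1", "v1")]
lhs = read_over_writes(writes, "j")
rhs = read_over_writes(writes, "k")
```

Here `lhs` and `rhs` each contain one ite per write, so m read indices over n nested writes produce n*m ite nodes in total.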

### Write transformation

Replace read(write(A, i, v), j) with a fresh variable (e.g., v0) and the “axiom”

v0 = ite(i=j, v, read(A, j))

Abstraction omits the axiom.

### Abstraction-Refinement Algorithm for Array Writes

• Input: R(W(A,i,v),j) = 0, R(W(A,i,v),k) = 1, i = j /= k, v /= 0

• After Abstraction: v1 = 0, v2 = 1, i = j /= k, v /= 0

• SATSolver: v1 = 0, v2 = 1, i = j = 0, k = 1, v = 1

• Check model on original formula: False (R(W(A,0,1),0) = 0, R(W(A,0,1),1) = 1, 0 = 0 /= 1, 1 /= 0)

• Refinement Step: add v1 = ite(i=j, v, R(A,j))

• Rerun SATSolver: UNSAT

### Experimental Results: Array Writes

Examples courtesy Dawn Song (CMU) and David Molnar (Berkeley)

### Algorithm for Solving Linear Bit-vector Equations

• Inspired by Barrett et al., DAC 1998

• Basic Idea in STP

• Solve for a variable and substitute it away

• If cannot eliminate a whole variable, eliminate as many bits as possible.

• Previous Work

• Mostly variants of Gaussian Elimination

• Solve-and-substitute is more convenient in a general decision procedure.

(3 bits)

3x + 4y + 2z = 0

2x + 2y + 2 = 0

4y + 2x + 2z = 0

Solve for x in first eqn (3^-1 mod 8 = 3):

x = 4y + 2z

Substitute x:

(3 bits)

2y + 4z + 2 = 0

4y + 6z = 0

### Algorithm for Solving Linear Bit-vector Equations

(3 bits)

2y + 4z + 2 = 0

4y + 6z = 0

All coeffs even, so no inverse: divide by 2 and ignore high-order bits.

(2 bits)

y[1:0] + 2z[1:0] + 1 = 0

2y[1:0] + 3z[1:0] = 0

### Algorithm for Solving Linear Bit-vector Equations

(2 bits)

y[1:0] + 2z[1:0] + 1 = 0

2y[1:0] + 3z[1:0] = 0

Solve for y[1:0]:

(2 bits)

y[1:0] = 2z[1:0] + 3

Substitute y[1:0]:

(2 bits)

3z[1:0] + 2 = 0

### Algorithm for Solving Linear Bit-vector Equations

(2 bits)

3z[1:0] + 2 = 0

Solve for z[1:0]:

(2 bits)

z[1:0] = 2

Solution (3 bits):

z[1:0] = 2

y[1:0] = 2z[1:0] + 3 = 3

y = y’ @ 3

z = z’ @ 2

x = 4y + 2z

(y’ and z’ are fresh 1-bit variables for the undetermined high-order bits.)
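The worked example can be sanity-checked by brute force. The sketch below is a check of the arithmetic, not STP's solver. It enumerates all 3-bit assignments and compares them with the solved form x = 4y + 2z, y[1:0] = 3, z[1:0] = 2. It uses Python's three-argument `pow` for modular inverses (Python 3.8+).

```python
from itertools import product

MOD = 8  # 3-bit arithmetic


def satisfies(x, y, z):
    """The original system of three equations, mod 8."""
    return ((3 * x + 4 * y + 2 * z) % MOD == 0
            and (2 * x + 2 * y + 2) % MOD == 0
            and (4 * y + 2 * x + 2 * z) % MOD == 0)


# All solutions of the original system, found by enumeration.
brute = {(x, y, z) for x, y, z in product(range(MOD), repeat=3)
         if satisfies(x, y, z)}

# Solutions described by the solved form: low two bits of y and z are fixed,
# the high bit of each is free, and x is determined by y and z.
solved = {((4 * y + 2 * z) % MOD, y, z)
          for y in range(MOD) for z in range(MOD)
          if y % 4 == 3 and z % 4 == 2}

inv3_mod8 = pow(3, -1, 8)  # odd coefficients are invertible mod 2^n
inv3_mod4 = pow(3, -1, 4)
```

The two sets coincide, so solve-and-substitute preserved exactly the solutions of the original system.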

### Equivalence checking of block cipher implementations

• Problem: Prove correctness of block ciphers (e.g., AES).

• Constant number of loop iterations

• No interesting heap usage

• Approach:

• Given two implementations, AES1 and AES2

• Turn them into big expressions by unrolling loops, etc.

• Prove that AES1(x) ≠ AES2(x) is unsatisfiable.

### How can this possibly work?

Many block ciphers consist of a fixed sequence of “rounds”.

Implementations of rounds in two algorithms may vary, but bits “between” rounds are equal.

So, we only have to prove individual rounds are equivalent.

(Figure: the two implementations side by side, each as a sequence of Round 1 through Round 4.)

### Equivalence checking

Find nodes that are equal for many test inputs. Only try to prove equivalence when nodes pass this test.

### Equivalence checking

Prove equal using STP.

### Equivalence checking

Replace b by a everywhere in the DAG. This makes higher-level expressions more similar.
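The test-input filter can be sketched as grouping nodes by their outputs on sample inputs. This is an illustration under assumptions: nodes are modeled as Python functions on 8-bit values, and the names and sample counts are hypothetical. Nodes that land in the same group are only candidates; each pair would still need to be proved equal by the solver.

```python
import random

MASK = 0xFF  # 8-bit values


def signature(fn, inputs):
    """Outputs of one node on the shared test inputs."""
    return tuple(fn(x) & MASK for x in inputs)


def candidate_classes(nodes, samples=32, seed=0):
    """Group node names whose outputs agree on every test input."""
    rng = random.Random(seed)
    inputs = [0] + [rng.randrange(256) for _ in range(samples)]
    classes = {}
    for name, fn in nodes.items():
        classes.setdefault(signature(fn, inputs), []).append(name)
    # Only groups with at least two members are worth sending to the prover.
    return [names for names in classes.values() if len(names) > 1]


nodes = {
    "a": lambda x: (x * 2) & MASK,
    "b": lambda x: (x << 1) & MASK,
    "c": lambda x: (x + 1) & MASK,
}
candidates = candidate_classes(nodes)
```

Here `a` and `b` agree on every sample and become a candidate pair, while `c` already differs on input 0 and is never sent to the prover.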