
David L. Dill

Stanford University

STP: A Decision Procedure for Bit-vectors and Arrays


Software analysis tools present unique challenges for decision procedures

  • Theories must match programming language semantics

    • Operations are on bit-vectors, not integers

    • Arrays (for modelling memories)

  • Must handle very large inputs with

    • Many array reads

    • Deeply nested array writes

    • Many linear equations

    • Many variables

  • Decision procedure is called many times.


What went before

  • Series of decision procedures: SVC, CVC, CVCL

  • All of these had combinations of first-order theories

    • Equality

    • Uninterpreted functions and predicates

    • Boolean connectives

    • Linear arithmetic over real numbers (and integers, in the case of CVCL)

    • But not quantifiers.

  • CVCL was in use in EXE (or its predecessor) in Dawson Engler's research group


Combination of theories

  • The core strategy of SVC, CVC, CVCL was based on dynamically breaking down formulas into conjunctions of “atomic formulas”

    • Atomic formulas have no Boolean connectives (correspond to propositional variables).

    • Recursively assert/deny an atomic formula α (deny = assert the negation of α)

    • Simplify after assertion

    • When simplified formula is conjunction of literals, use special decision procedures.


SAT vs. CVCL

  • CVC/CVCL used CHAFF-like SAT solver to choose splitting variables

  • … but puts lots of slow stuff in the inner loop!


STP

  • CVCL was already used by Engler’s group, but was unfixably slow.

    • There existed many examples generated by Engler

    • Made correctness/performance testing easy.

  • Vijay Ganesh and I decided to try a different approach, inspired by UCLID (Seshia & Bryant).

    • Put SAT at the bottom, with unmodified inner loop.

    • Preprocess formula for higher-level reasoning (bit-vectors and arrays).


STP

  • Decides satisfiability of formulas over

    • Bit-vector Terms

      • Constants

      • +, -, *, (signed) div, (signed) mod

      • Concatenation, Extraction

      • Left/Right Shift, Sign-extend, bitwise-Booleans

    • Array Terms

      • Read(Array, index)

      • Write(Array, index, val)

      • But no array equality

    • Predicates: =, signed & unsigned comparisons

  • If satisfiable, produces a model.


Comparison with Saturn

  • STP is a separate “component” (can be stand-alone, or used through an API).

    • Programming language or other tool is separate

  • General input language (can define 23-bit bit-vector types if you want).

  • Signed/unsigned encoded in operators, not in data types.

  • No “points to”, heap, etc.

  • Implements signed/unsigned multiply, divide, remainder (but no floating point).


Some projects using STP

  • STP has evolved (maintained by someone I don’t know in Australia)

  • Several projects have used it.

    • EXE : Bug Finder by Dawson Engler, Cristian Cadar and others at Stanford

    • KLEE : Cadar, Dunbar, Engler

    • MINESWEEPER: Bug Finder by Dawn Song and her group at CMU


Main Ideas of STP

  • Eager translation to CNF with word-level pre-processing

    • Theories not in “inner loop” of SAT solver, unlike Nelson-Oppen approaches (e.g. CVCL).

  • Abstraction-Refinement for arrays.

    • Laziness to counterbalance eagerness

  • Solve linear formulas mod 2^n in polynomial time


Bitvector theory

  • Data type: BV(n), where n is constant.

  • Almost all machine bitvector operations

    • Change length (sign extended and not).

    • Concatenate bitvectors, extract bits from BVs.

    • Signed and unsigned arithmetic: +, -, *, /, %, <,>, etc.

    • AND, OR, NOT, XOR, etc.
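
To make the difference from integer arithmetic concrete, here is a small Python sketch of fixed-width operations. The helper names (`bv_add`, `to_signed`, `ult`, `slt`, `sign_extend`) are illustrative only and are not STP's input language or API.

```python
def mask(n):
    """All-ones value of an n-bit vector."""
    return (1 << n) - 1

def bv_add(a, b, n):
    """n-bit addition: the result wraps around modulo 2**n."""
    return (a + b) & mask(n)

def to_signed(a, n):
    """Interpret an n-bit pattern as a two's-complement integer."""
    return a - (1 << n) if a & (1 << (n - 1)) else a

def ult(a, b, n):
    """Unsigned less-than on n-bit patterns."""
    return (a & mask(n)) < (b & mask(n))

def slt(a, b, n):
    """Signed less-than: same bits, different operator."""
    return to_signed(a & mask(n), n) < to_signed(b & mask(n), n)

def sign_extend(a, n, m):
    """Extend an n-bit pattern to m >= n bits, copying the sign bit."""
    return to_signed(a & mask(n), n) & mask(m)

# "x < x + 1" holds for integers but fails for 8-bit vectors at x = 255.
x = 0xFF
print(bv_add(x, 1, 8))                  # 0
print(ult(x, bv_add(x, 1, 8), 8))       # False
# The pattern 0xFF is 255 unsigned but -1 signed.
print(ult(0xFF, 0x01, 8), slt(0xFF, 0x01, 8))  # False True
print(hex(sign_extend(0xFF, 8, 16)))    # 0xffff
```

The signedness lives in the operator (`ult` vs. `slt`), not in the data, which mirrors the "signed/unsigned encoded in operators" point made earlier.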


Array theory

  • Array type: BV(n) -> BV(m)

  • read(A, i) – value of A[i]

  • write(A, i, v) – copy of A updated at index i with value v.

  • No destructive modification – write returns a new array, which is updated old array.

  • Sometimes used to represent heap storage.


Array theory

Identities:

read(write(A, i, v), j) = ite(i = j, v, read(A, j))

STP has a limited theory

No comparison of whole arrays, e.g. write(A, i, v) = write(A, j, w)

This makes things easier (see http://sprout.stanford.edu/PAPERS/LICS-SBDL-2001.pdf if you don’t like “easier”).
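
The read-over-write identity can be checked on a toy model. The sketch below represents an array as a Python dict and treats unwritten cells as reading 0; both choices are assumptions made for illustration, not how STP stores arrays.

```python
def write(array, index, value):
    """Return a new array equal to `array` except at `index`."""
    updated = dict(array)      # copy: the old array is left untouched
    updated[index] = value
    return updated

def read(array, index, default=0):
    """Value stored at `index` (unwritten cells read as `default`)."""
    return array.get(index, default)

def ite(cond, then_value, else_value):
    """If-then-else on values."""
    return then_value if cond else else_value

# Check read(write(A, i, v), j) == ite(i == j, v, read(A, j)) exhaustively
# for a tiny array type BV(2) -> BV(2) and one fixed initial array.
A = {0: 3, 2: 1}
for i in range(4):
    for j in range(4):
        for v in range(4):
            assert read(write(A, i, v), j) == ite(i == j, v, read(A, j))

assert A == {0: 3, 2: 1}       # write did not modify A
print("identity holds on all 64 cases")
```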


Implementation

  • DAG representation of expressions

    • Same subexpression structure = same pointer.

    • Maintained by hashing.

    • No destructive operations on DAGs (modification requires new nodes).

  • Makes substitution, equality check very efficient.

  • The DAG is often logarithmic in the size of the tree representation of the expression.
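
A minimal sketch of such a hash-consed DAG follows. The class names `Node` and `ExprTable` are hypothetical; this illustrates the technique and is not STP's data structure.

```python
class Node:
    """Immutable expression node. Create nodes only through ExprTable.mk."""
    __slots__ = ("op", "children", "id")

    def __init__(self, op, children, node_id):
        self.op = op
        self.children = children
        self.id = node_id


class ExprTable:
    """Hash-consing table: structurally equal expressions share one Node."""

    def __init__(self):
        self._nodes = {}

    def mk(self, op, *children):
        # Children are already unique, so their ids identify the structure.
        key = (op, tuple(child.id for child in children))
        node = self._nodes.get(key)
        if node is None:
            node = Node(op, children, len(self._nodes))
            self._nodes[key] = node
        return node

    def __len__(self):
        return len(self._nodes)


table = ExprTable()
x = table.mk("x")
expr = x
for _ in range(20):
    expr = table.mk("+", expr, expr)   # doubles the tree, adds one DAG node

print(len(table))                                   # 21
print(table.mk("+", x, x) is table.mk("+", x, x))   # True
```

Written out as a tree, the final expression would have 2^21 - 1 nodes; the DAG has 21, and equality of two expressions is a pointer comparison.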


DAGs

  • All recursive traversals must be “memoized”

    • Want traversal to be linear in size of DAG, not tree.

    • First thing to think about when functions don’t finish: “Maybe I messed up memoization.”

  • Updating nodes near the root is less expensive than updating nodes near leaves.
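
The memoization point can be illustrated with a hedged sketch. Expressions here are nested Python tuples with shared sub-objects, and the memo table is keyed by object identity; both are illustrative assumptions rather than STP's representation.

```python
def build_chain(depth):
    """Shared DAG: each level is ('+', prev, prev), both children the same object."""
    node = ("x",)
    for _ in range(depth):
        node = ("+", node, node)
    return node

def tree_size(node, memo=None):
    """Node count of the expression viewed as a tree, in time linear in the DAG."""
    if memo is None:
        memo = {}
    key = id(node)                      # same pointer = same subexpression
    if key not in memo:
        memo[key] = 1 + sum(tree_size(child, memo) for child in node[1:])
    return memo[key]

def dag_size(node, seen=None):
    """Number of distinct nodes reachable from `node`."""
    if seen is None:
        seen = set()
    if id(node) in seen:
        return 0
    seen.add(id(node))
    return 1 + sum(dag_size(child, seen) for child in node[1:])

chain = build_chain(60)
print(dag_size(chain))    # 61
print(tree_size(chain))   # 2305843009213693951, i.e. 2**61 - 1
```

Without the `memo` table, `tree_size` would visit every tree node and would not finish in any reasonable time on this input.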


STP Architecture

Input Formula

Substitutions

Simplifications

Linear Solving

Array Abstraction

BitBlast

CNF Conversion

Refinement Loop

SAT Solver

Result


Substitution is Important

  • Inputs often have many simple equations (in EXE, this is how constant arrays are defined):

    • x = 4

    • A[4] = e

  • Early pass to substitute these

    • Allows constant evaluation

    • Enables other optimizations

    • Reduces non-constant indices in array reads.
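
A rough sketch of such a substitution pass, under simplifying assumptions: expressions are ints, variable names, or tuples `(op, left, right)`; only 8-bit `+` is constant-folded; and `known_reads` is a hypothetical table holding equations of the form A[4] = e. A real pass would also memoize over the DAG.

```python
WIDTH = 8  # assumed bit-width for this example

def substitute(expr, values, known_reads):
    """Replace variables that have known constant values, fold constants,
    and resolve array reads whose index has become constant."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return values.get(expr, expr)
    op, a, b = expr
    if op == "read":                      # ("read", array_name, index_expr)
        index = substitute(b, values, known_reads)
        return known_reads.get((a, index), ("read", a, index))
    a = substitute(a, values, known_reads)
    b = substitute(b, values, known_reads)
    if op == "+" and isinstance(a, int) and isinstance(b, int):
        return (a + b) % (1 << WIDTH)     # constant evaluation, with wrap-around
    return (op, a, b)

values = {"x": 4}                 # from the equation x = 4
known_reads = {("A", 4): "e"}     # from the equation A[4] = e

expr = ("+", ("read", "A", "x"), ("+", "x", 252))
print(substitute(expr, values, known_reads))   # ('+', 'e', 0)
```

The read with a non-constant index `x` becomes a read at the constant 4, which then resolves to `e`; the arithmetic on `x` folds to a constant.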


Word-level simplifications

  • Many simple local rewrites:

    • Bitwise Boolean and arithmetic identities, e.g. a AND !a = 00000, a XOR !a = 11111, (a + b)[0:3] = a[0:3]+b[0:3], etc.

    • Generally, avoid distributive laws because they cause blow-up.

    • Be careful about “destroying sharing” in DAG.

  • Flatten trees of associative operators

  • Sort operands of commutative operation

    • Tweak ordering for easy simplification

    • Expressions numbered in order of creation (children < parents). Use this order to sort, but:

    • Put constants first (F AND a AND …), (1 + 3 + x + …)

    • Arrange for x, !x to be adjacent (also x, -x)
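
A sketch of flatten-and-sort for AND, under stated assumptions: terms are `True`/`False`, variable names, `("not", name)` with negation applied to variables only, or `("and", t1, t2, ...)`. It sorts by variable name rather than creation order, which is a simplification of the ordering described above.

```python
def flatten(op, operands):
    """Flatten nested applications of an associative operator."""
    out = []
    for term in operands:
        if isinstance(term, tuple) and term[0] == op:
            out.extend(flatten(op, term[1:]))
        else:
            out.append(term)
    return out

def sort_key(term):
    """Constants first; then by variable, so x and ("not", x) end up adjacent."""
    if isinstance(term, bool):
        return (0, "", 0)
    if isinstance(term, tuple) and term[0] == "not":
        return (1, term[1], 1)
    return (1, term, 0)

def simplify_and(*operands):
    terms = sorted(set(flatten("and", operands)), key=sort_key)
    if False in terms:                      # F AND ... = F
        return False
    terms = [t for t in terms if t is not True]
    for a, b in zip(terms, terms[1:]):      # one linear scan over neighbours
        if b == ("not", a):                 # x AND !x = F
            return False
    if not terms:
        return True
    return terms[0] if len(terms) == 1 else ("and", *terms)

print(simplify_and("b", ("and", "a", ("not", "b"))))   # False
print(simplify_and("b", True, ("and", "a", "b")))      # ('and', 'a', 'b')
```

Because complementary literals are neighbours after sorting, the contradiction check needs only a single pass instead of comparing all pairs.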


Array Reads

Replace array reads with fresh variables and axioms.

Before:

Read(A,i0) = t0
Read(A,i1) = t1
...
Read(A,in) = tn

After:

v0 = t0
v1 = t1
...
vn = tn

(i1=i0) => v1=v0
(i2=i0) => v2=v0
(i2=i1) => v2=v1
...

  • Problem : O(n^2) axioms added, n is number of read indices

  • Lethal, if n is large: n = 10000, # of axioms: ~ 100 million

  • Blowup seems hard to avoid (e.g. UCLID).

  • This is “aliasing” from another perspective.
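
The read elimination and its quadratic cost can be sketched as follows. The function names and the string form of the axioms are illustrative assumptions. One axiom per unordered pair of reads gives n(n-1)/2 axioms, roughly 5 × 10^7 for n = 10000, which is the same quadratic order as the n^2 estimate above.

```python
from itertools import combinations

def abstract_reads(indices):
    """Give each Read(A, index) a fresh variable and emit one axiom
    (i_b = i_a) => (v_b = v_a) for every pair of reads."""
    fresh = {idx: f"v{k}" for k, idx in enumerate(indices)}
    axioms = [f"({b}={a}) => {fresh[b]}={fresh[a]}"
              for a, b in combinations(indices, 2)]
    return fresh, axioms

def axiom_count(n):
    """Number of pairwise axioms for n distinct read indices."""
    return n * (n - 1) // 2

fresh, axioms = abstract_reads(["i0", "i1", "i2"])
for axiom in axioms:
    print(axiom)
# (i1=i0) => v1=v0
# (i2=i0) => v2=v0
# (i2=i1) => v2=v1

print(axiom_count(10000))   # 49995000
```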


Abstraction/Refinement in STP

  • Pose problem as a conjunction of formulas

    • E.g., instantiated array read axioms

  • Abstraction: solve for a proper subset of the formulas

    • E.g., omit array read axioms

  • Early exit if:

    • Unsatisfiable

    • Satisfiable and model actually satisfies unabstracted formula.

  • Otherwise, add some omitted formulas to the abstracted formula and solve again.

    • If at least one of these formulas is false in the model, that model will not be regenerated.


Abstraction-Refinement Algorithm for Array Reads

Input:
  Read(A,0)=0
  Read(A,i)=1

After Abstraction:
  v0 = 0
  vi = 1

SATSolver returns a counterexample:
  i = 0, v0 = 0, vi = 1

Check Input on Assignment: False
  Read(A,0)=0
  Read(A,0)=1

Refinement Step: Add Axiom
  (i=0) => vi = 0

Rerun SATSolver:
  i=1, vi=1
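
The loop can be sketched in a few lines. A brute-force search over small domains stands in for the SAT solver, and checking the model against the original formula is approximated by checking it against every omitted axiom. Both are assumptions made so the example is self-contained; neither reflects STP's actual implementation.

```python
from itertools import product

def solve(constraints, domains):
    """Brute-force stand-in for a SAT solver: return the first assignment
    that satisfies every constraint, or None if there is none."""
    names = list(domains)
    for values in product(*(domains[name] for name in names)):
        model = dict(zip(names, values))
        if all(constraint(model) for constraint in constraints):
            return model
    return None

def abstraction_refinement(base, axioms, domains):
    """Solve `base` first; add only those omitted axioms that a model violates."""
    active, pending, rounds = list(base), list(axioms), 0
    while True:
        model = solve(active, domains)
        if model is None:
            return None, rounds                  # early exit: unsatisfiable
        violated = [ax for ax in pending if not ax(model)]
        if not violated:
            return model, rounds                 # early exit: model is genuine
        active.extend(violated)                  # refinement step
        pending = [ax for ax in pending if ax not in violated]
        rounds += 1

# Read(A,0) = 0 and Read(A,i) = 1, with reads replaced by v0 and vi.
domains = {"i": range(4), "v0": range(2), "vi": range(2)}
base = [lambda m: m["v0"] == 0, lambda m: m["vi"] == 1]
axioms = [lambda m: m["i"] != 0 or m["vi"] == m["v0"]]   # (i=0) => vi = v0

model, rounds = abstraction_refinement(base, axioms, domains)
print(model, rounds)   # {'i': 1, 'v0': 0, 'vi': 1} 1
```

The first call returns the spurious model i = 0, v0 = 0, vi = 1; adding the violated axiom rules it out, and the second call finds i = 1. Each round adds at least one axiom from a finite set, so the loop terminates.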


Experience with Read Abstraction-Refinement

Works well, even in satisfiable cases

  • Satisfier often finds a model that minimizes aliasing

  • Few axioms need to be added during refinement

  • Typical number of refinement loops : < 3


The Problem with Array Writes

  • Standard transformation

    read(write(A, i, v), j) = ite(i=j, v, read(A, j))   (ite = “if-then-else”)

    causes term blow-up

  • Many different read expressions share write sub-terms.

  • O(n*m) blow-up in expr DAG

    • n is write term nesting levels

    • m is number of read indices


The Problem with Array Writes

R(W(W(A,i0,v0),i1,v1),j) = R(W(W(A,i0,v0),i1,v1),k)

Each read expands into its own if-then-else chain:

If (i1=j) v1
elsif (i0=j) v0
else R(A,j)

=

If (i1=k) v1
elsif (i0=k) v0
else R(A,k)

[Figure: expression DAG before the transformation (two reads, at j and k, sharing the nested write term) and after (a separate chain of ite nodes for each read).]


Write transformation

Replace

read(write(A, i, v), j)

with a fresh variable (e.g., v0)

and “axiom”

v0 = ite(i=j, v, read(A, j))

Abstraction omits axiom.


Abstraction-Refinement Algorithm for Array Writes

Input:
  R(W(A,i,v),j) = 0
  R(W(A,i,v),k) = 1
  i = j /= k
  v /= 0

After Abstraction:
  v1 = 0
  v2 = 1
  i = j /= k
  v /= 0

SATSolver:
  v1 = 0, v2 = 1
  i = j = 0, k = 1
  v = 1

Check model on original formula: False
  R(W(A,0,1),0) = 0
  R(W(A,0,1),1) = 1
  0 = 0 /= 1
  1 /= 0

Refinement Step: Add Axiom to SAT
  v1 = ite(i=j, v, R(A,j))

SATSolver: UNSAT


Experimental Results: Array Writes

Examples courtesy Dawn Song (CMU) and David Molnar (Berkeley)


Algorithm for Solving Linear Bit-vector Equations

  • Inspired by Barrett et al., DAC 1998

  • Basic Idea in STP

    • Solve for a variable and substitute it away

    • If cannot eliminate a whole variable, eliminate as many bits as possible.

  • Previous Work

    • Mostly variants of Gaussian Elimination

    • Solve-and-substitute is more convenient in a general decision procedure.


Algorithm for Solving Linear Bit-vector Equations

(3 bits)
3x + 4y + 2z = 0
2x + 2y + 2 = 0
4y + 2x + 2z = 0

Solve for x in first eqn (3^-1 mod 8 = 3):
x = 4y + 2z

Substitute x:

(3 bits)
2y + 4z + 2 = 0
4y + 6z = 0


Algorithm for Solving Linear Bit-vector Equations

(3 bits)
2y + 4z + 2 = 0
4y + 6z = 0

All Coeffs Even, No Inverse: Divide by 2, Ignore high-order bits

(2 bits)
y[1:0] + 2z[1:0] + 1 = 0
2y[1:0] + 3z[1:0] = 0


Algorithm for Solving Linear Bit-vector Equations

(2 bits)
y[1:0] + 2z[1:0] + 1 = 0
2y[1:0] + 3z[1:0] = 0

Solve for y[1:0]:
(2 bits) y[1:0] = 2z[1:0] + 3

Substitute y[1:0]:
(2 bits) 3z[1:0] + 2 = 0


Algorithm for Solving Linear Bit-vector Equations

(2 bits)
3z[1:0] + 2 = 0

Solve for z[1:0]:
(2 bits) z[1:0] = 2

Solution (3 bits):
z[1:0] = 2
y[1:0] = 2z[1:0] + 3 = 3
y = y’ @ 3
z = z’ @ 2
x = 4y + 2z

(y’ and z’ are the unconstrained high-order bits; @ is concatenation.)
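
The worked example can be checked mechanically. The sketch below is not STP's solver: it computes the inverse used in the first step, then compares the solution set predicted by solve-and-substitute (low bits of y equal to 3, low bits of z equal to 2, free high bits, x = 4y + 2z) against exhaustive search over all 3-bit values. The function names are illustrative.

```python
from itertools import product

def inverse_mod_pow2(a, bits):
    """Multiplicative inverse of an odd coefficient modulo 2**bits."""
    if a % 2 == 0:
        raise ValueError("even coefficients have no inverse modulo a power of two")
    return pow(a, -1, 1 << bits)        # three-argument pow, Python 3.8+

def satisfies(x, y, z, mod=8):
    """The three original 3-bit equations."""
    return ((3 * x + 4 * y + 2 * z) % mod == 0
            and (2 * x + 2 * y + 2) % mod == 0
            and (4 * y + 2 * x + 2 * z) % mod == 0)

print(inverse_mod_pow2(3, 3))           # 3

# Ground truth by exhaustive search.
brute = {(x, y, z) for x, y, z in product(range(8), repeat=3)
         if satisfies(x, y, z)}

# Solutions predicted by the derivation.
solved = set()
for y_hi, z_hi in product(range(2), repeat=2):
    y = (y_hi << 2) | 3                 # y = y' @ 3
    z = (z_hi << 2) | 2                 # z = z' @ 2
    x = (4 * y + 2 * z) % 8
    solved.add((x, y, z))

print(sorted(solved))   # [(0, 3, 2), (0, 3, 6), (0, 7, 2), (0, 7, 6)]
print(brute == solved)  # True
```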


Experimental Results: Solver for Linear Equations


Equivalence checking of block cipher implementations

  • Problem: Prove correctness of block ciphers (e.g., AES).

    • Constant number of loop iterations

    • No interesting heap usage

  • Approach:

    • Given two implementations, AES1 and AES2

    • Turn them into big expressions by unrolling loops, etc.

    • Prove that AES1(x) ≠ AES2(x) is unsatisfiable.


How can this possibly work?

Many block ciphers consist of a fixed sequence of “rounds”.

Implementations of rounds in two algorithms may vary, but bits “between” rounds are equal.

So, we only have to prove individual rounds are equivalent.

[Figure: the two implementations side by side, each a sequence Round 1, Round 2, Round 3, Round 4.]


Equivalence checking

Equal for many test inputs.

Only try to prove equivalence when nodes pass this test.


Equivalence checking

Prove equal using STP


Equivalence checking

Replace b by a everywhere in DAG.

This makes higher-level expressions more similar.
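
The test-then-prove flow can be sketched as follows. Nodes are modelled as Python functions of the inputs, random simulation groups nodes with identical outputs, and an exhaustive check over 8-bit inputs stands in for the call to STP. All names here are hypothetical.

```python
import random
from itertools import product

def candidate_classes(nodes, num_inputs, trials=64, width=8, seed=0):
    """Group nodes whose outputs agree on `trials` random test inputs."""
    rng = random.Random(seed)
    tests = [tuple(rng.randrange(1 << width) for _ in range(num_inputs))
             for _ in range(trials)]
    classes = {}
    for name, fn in nodes.items():
        signature = tuple(fn(*test) for test in tests)
        classes.setdefault(signature, []).append(name)
    return [names for names in classes.values() if len(names) > 1]

def prove_equal(f, g, num_inputs, width=8):
    """Exhaustive check, standing in for a decision-procedure query."""
    return all(f(*args) == g(*args)
               for args in product(range(1 << width), repeat=num_inputs))

nodes = {
    "a": lambda x, y: x ^ y,
    "b": lambda x, y: (x | y) - (x & y),   # same function as a, written differently
    "c": lambda x, y: (x + y) & 0xFF,      # differs from a on most inputs
}

for names in candidate_classes(nodes, num_inputs=2):
    representative = names[0]
    for other in names[1:]:
        if prove_equal(nodes[representative], nodes[other], num_inputs=2):
            print(f"replace {other} by {representative}")
```

Random testing is cheap and filters out almost all non-equivalent pairs, so the expensive proof step runs only on pairs that are likely to be equal.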

