
Notes on Cyclone Extended Static Checking

Greg Morrisett

Harvard University

Static Extended Checking: SEX-C
  • Similar approach to ESC-M3/Java:
    • Calculate a 1st-order predicate describing the machine state at each program point.
    • Generate verification conditions (VCs) corresponding to run-time checks.
    • Feed VCs to a theorem prover.
    • Only insert check (and issue warning) if prover can't show VC is true.
    • Key goal: needs to scale well (like type-checking) so it can be used on every edit-compile-debug cycle.
Example: strcpy
    strcpy(char ?d, char ?s)
    {
      while (*s != 0) {
        *d = *s;
        s++;
        d++;
      }
      *d = 0;
    }

Run-time checks are inserted to ensure that s and d are not NULL and in bounds.

Six words are passed in instead of two, since each fat (?) pointer carries base, bounds, and the current pointer.

Better
    strcpy(char ?d, char ?s)
    {
      unsigned i, n = numelts(s);
      assert(n < numelts(d));
      for (i=0; i < n && s[i] != 0; i++)
        d[i] = s[i];
      d[i] = 0;
    }

This ought to have no run-time checks beyond the assert.

Even Better:
    strncpy(char *d, char *s, uint n)
    @assert(n < numelts(d) && n <= numelts(s))
    {
      unsigned i;
      for (i=0; i < n && s[i] != 0; i++)
        d[i] = s[i];
      d[i] = 0;
    }

No fat pointers or dynamic checks.

But caller must statically satisfy the pre-condition.

In Practice:
    strncpy(char *d, char *s, uint n)
    @checks(n < numelts(d) && n <= numelts(s))
    {
      unsigned i;
      for (i=0; i < n && s[i] != 0; i++)
        d[i] = s[i];
      d[i] = 0;
    }

If caller can establish pre-condition, no check.

Otherwise, an implicit check is inserted.

Clearly, checks are a limited class of assertions.

Results so far…
  • For the 165 files (~78 Kloc) that make up the standard libraries and compiler:
    • CLibs: stdio, string, …
    • CycLib: list, array, splay, dict, set, bignum, …
    • Compiler: lex, parse, typing, analyze, xlate to C,…
  • Eliminated 96% of the (static) checks
    • null : 33,121 out of 34,437 (96%)
    • bounds: 13,402 out of 14,022 (95%)
    • 225s for bootstrap compared to 221s with all checks turned off (2% slower) on this laptop.
  • Optimization standpoint: seems pretty good.
Not all Rosy:
  • Don't do as well at array-intensive code.
  • For instance, on the AES reference implementation:
    • eliminated 75% of the checks (377 out of 504)
    • 2% slower than with all checks turned off
    • 24% slower than the original C code (most of the overhead is fat pointers)
  • The primary culprit:
    • we are very conservative about arithmetic
    • e.g., x[2*i+1] will throw us off every time.
Challenges
  • Assumed I could use off-the-shelf technology.
  • But ran into a few problems:
  • scalable VC generation
    • previously solved problem (see ESC guys.)
    • but entertaining to rediscover the solutions.
  • usable theorem provers
    • for now, rolled our own
    • (not the real focus.)
Verification-Condition Generation
  • We started with textbook strongest post-conditions:

    SP[x:=e]A = A[a/x] ∧ x = e[a/x]   (a fresh)
    SP[S1;S2]A = SP[S2](SP[S1]A)
    SP[if (e) S1 else S2]A = SP[S1](A ∧ e≠0) ∨ SP[S2](A ∧ e=0)
Why SP instead of WP?
  • SP[if (c) skip else fail]A = A ∧ c
  • When A ⇒ c, we can eliminate the check.
  • Either way, the post-condition is still A ∧ c.
  • WP[if (c) skip else fail]A = (c ⇒ A) ∧ c
  • For WP, this will be propagated backwards making it difficult to determine which part of the pre-condition corresponds to a particular check.
1st Problem with Textbook SP
  • SP[x:=e]A = A[a/x] ∧ x = e[a/x]
  • What if e has effects?
  • In particular, what if e is itself an assignment?
  • Solution: use a monadic interpretation:
  • SP : Exp → Assn → Term × Assn
For Example:
    SP[x] A = (x, A)
    SP[e1+e2] A = let (t1,A1) = SP[e1] A
                      (t2,A2) = SP[e2] A1
                  in (t1 + t2, A2)
    SP[x := e] A = let (t,A1) = SP[e] A
                   in (t[a/x], A1[a/x] ∧ x == t[a/x])
Or as in Haskell
    SP[x] = return x
    SP[e1+e2] = do { t1 <- SP[e1];
                     t2 <- SP[e2];
                     return t1 + t2 }
    SP[x := e] = do { t <- SP[e];
                      replace [a/x];
                      and x == t[a/x];
                      return t[a/x] }
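The monadic SP above can be sketched concretely. This is a minimal reconstruction, not the author's Cyclone code: terms and assertions are nested Python tuples, and the "monad" is explicitly threaded state (current post-condition plus a fresh-name counter).

```python
# Strongest postconditions in the sequentialized style of the slides.
# Terms/assertions are tagged tuples, e.g. ('add', t1, t2), ('eq', t1, t2).

def subst(x, t, e):
    """e[t/x]: replace the program variable x by term t inside e."""
    if e == ('var', x):
        return t
    if isinstance(e, tuple):
        return tuple(subst(x, t, c) if isinstance(c, tuple) else c for c in e)
    return e

def sp(exp, A, n=0):
    """Return (term, post-assertion, fresh-name counter)."""
    tag = exp[0]
    if tag in ('var', 'int'):
        return exp, A, n
    if tag == 'add':                          # left-to-right sequencing
        t1, A1, n1 = sp(exp[1], A, n)
        t2, A2, n2 = sp(exp[2], A1, n1)
        return ('add', t1, t2), A2, n2
    if tag == 'assign':                       # x := e
        x = exp[1]
        t, A1, n1 = sp(exp[2], A, n)
        a = ('var', f'a{n1}')                 # fresh name for the old x
        t_new = subst(x, a, t)
        post = ('and', subst(x, a, A1), ('eq', ('var', x), t_new))
        return t_new, post, n1 + 1
    raise ValueError(tag)
```

For example, SP of x := x+1 from precondition x = 0 yields a0 = 0 ∧ x = a0 + 1, with a0 naming the old value of x.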
One Issue
  • Of course, this oversequentializes the code.
  • C has very liberal order of evaluation rules which are hopelessly unusable for any sound analysis.
  • So we force the evaluation to be left-to-right and match our sequentialization.
Next Problem: Diamonds
    SP[if (e1) S11 else S12;
       if (e2) S21 else S22;
       ...
       if (en) Sn1 else Sn2]A

  • Textbook approach explodes paths into a tree.

    SP[if (e) S1 else S2]A = SP[S1](A ∧ e≠0) ∨ SP[S2](A ∧ e=0)

  • This simply doesn't scale.
    • e.g., one procedure had an assertion with ~1.5B nodes.
    • WP has the same problem (see Flanagan & Leino).
Hmmm…a lot like naïve CPS

The result of the first conditional is duplicated, which in turn duplicates the original assertion:

    SP[if (e1) S11 else S12;
       if (e2) S21 else S22]A =
      SP[S21]((SP[S11](A ∧ e1≠0) ∨ SP[S12](A ∧ e1=0)) ∧ e2≠0) ∨
      SP[S22]((SP[S11](A ∧ e1≠0) ∨ SP[S12](A ∧ e1=0)) ∧ e2=0)
Aha! We need a "let":
    SP[if (e) S1 else S2]A =
      let X = A in (e≠0 ∧ SP[S1]X) ∨ (e=0 ∧ SP[S2]X)

  • Alternatively, make sure we physically share A.
  • Oops:

    SP[x:=e]X = X[a/x] ∧ x = e[a/x]

  • This would require adding explicit substitutions to the assertion language to avoid breaking the sharing.
Handling Updates (Necula)
  • Factor out a local environment: A = {x=e1 ∧ y=e2 ∧ …} ∧ B, where neither B nor the ei contain program variables (i.e., x, y, …)
  • Only the environment needs to change on update: SP[x:=3]({x=e1 ∧ y=e2 ∧ …} ∧ B) = {x=3 ∧ y=e2 ∧ …} ∧ B
  • So most of the assertion (B) remains unchanged and can be shared.
So Now:
  • SP : Exp → (Env × Assn) → (Term × Env × Assn)

    SP[x] (E,A) = (E(x), (E,A))
    SP[e1+e2] (E,A) =
      let (t1,E1,A1) = SP[e1] (E,A)
          (t2,E2,A2) = SP[e2] (E1,A1)
      in (t1 + t2, E2, A2)
    SP[x := e] (E,A) =
      let (t,E1,A1) = SP[e] (E,A)
      in (t, E1[x:=t], A1)
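A sketch of this environment-factored SP, with the environment as a Python dict and terms as tuples (both my own stand-ins). The point of the factoring shows up in the assignment case: only the environment is rebuilt, so the assertion part B stays physically shared.

```python
# SP over (env, assn): env maps program variables to terms; assignments
# never touch the assertion, so it can be shared across branches.

def sp(exp, env, A):
    tag = exp[0]
    if tag == 'var':
        return env[exp[1]], env, A
    if tag == 'int':
        return exp, env, A
    if tag == 'add':
        t1, env1, A1 = sp(exp[1], env, A)
        t2, env2, A2 = sp(exp[2], env1, A1)
        return ('add', t1, t2), env2, A2
    if tag == 'assign':                  # x := e updates only the env
        t, env1, A1 = sp(exp[2], env, A)
        env2 = dict(env1)
        env2[exp[1]] = t
        return t, env2, A1
    raise ValueError(tag)
```

After x := 3 from {x=e1 ∧ y=e2} ∧ B, the environment binds x to 3 while the assertion object is untouched (the slide's point about sharing).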
Or as in Haskell:
    SP[x] = lookup x
    SP[e1+e2] = do { t1 <- SP[e1]; t2 <- SP[e2];
                     return t1 + t2 }
    SP[x := e] = do { t <- SP[e]; set x t;
                      return t }
Note:
  • Monadic encapsulation crucial from a software engineering point of view:
  • actually have multiple out-going flow edges due to exceptions, return, etc.
    • (see Tan & Appel, VMCAI'06)
  • so the monad actually accumulates (Term × Env × Assn) values for each edge.
  • but it still looks as pretty as the previous slide.
  • (modulo the fact that it's written in Cyclone.)
Diamond Problem Revisited:
    SP[if (e) S1 else S2]({x=e1 ∧ y=e2 ∧ …} ∧ B) =
      (SP[S1]({x=e1 ∧ y=e2 ∧ …} ∧ B) ∧ e≠0) ∨
      (SP[S2]({x=e1 ∧ y=e2 ∧ …} ∧ B) ∧ e=0) =
      ({x=t1 ∧ y=t2 ∧ …} ∧ B1) ∨
      ({x=u1 ∧ y=u2 ∧ …} ∧ B2) =
      {x=ax ∧ y=ay ∧ …} ∧
        ((ax=t1 ∧ ay=t2 ∧ … ∧ B1) ∨
         (ax=u1 ∧ ay=u2 ∧ … ∧ B2))
How does the environment help?

    SP[if (a) x:=3 else x:=y;
       if (b) x:=5 else skip]({x=e1 ∧ y=e2} ∧ B) =

      {x=v ∧ y=e2} ∧
        ((b≠0 ∧ v=5) ∨ (b=0 ∧ v=t)) ∧
        ((a≠0 ∧ t=3) ∨ (a=0 ∧ t=e2)) ∧
        B

Tah-Dah!
  • I've rediscovered SSA.
    • monadic translation sequentializes and names intermediate results.
    • only need to add fresh variables when two paths compute different values for a variable.
    • so the added equations for conditionals correspond to φ-nodes.
  • Like SSA, worst-case O(n²) but in practice O(n).
  • Best part: all of the VCs for a given procedure share the same assertion DAG.
So far so good:
  • Of course, I've glossed over the hard bits:
    • loops
    • memory
    • procedures
  • Let's talk about loops first…
Widening:
  • Given AB, calculate some C such that A  C and B  C and |C| < |A|, |B|.
  • Then we can compute a fixed-point for loop invariants iteratively:
    • start with pre-condition P
    • process loop-test & body to get P'
    • see if P' ⇒ P. If so, we're done.
    • if not, widen P ∨ P' and iterate.
    • (glossing over variable scope issues.)
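The iteration above can be sketched with assertions as sets of primitive facts (a representation I'm assuming for illustration), widening a disjunction by intersection, i.e. keeping only the facts true on both sides:

```python
def invariant(pre, step):
    """Iterate to a loop invariant.

    pre  : set of primitive facts holding before the loop
    step : function taking the facts at the loop head to the facts
           after one pass through the test and body

    Since widening only shrinks the fact set, this terminates.
    """
    p = set(pre)
    while True:
        p2 = step(p)
        if p <= p2:        # every fact of P still holds: P' => P, done
            return p
        p &= p2            # widen P \/ P' by intersection and iterate
```

For a loop that preserves "x>=0" but invalidates "x<10", starting from both facts the iteration settles on {"x>=0"}.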
Our Widening:
  • Conceptually, to widen AB
  • Calculate the DNF
  • Factor out syntactically common primitive relations:
    • In practice, we do a bit of closure first.
    • e.g., normalize terms & relations.
    • e.g., x==e expands to x ≤ e ∧ x ≥ e.
  • Captures any primitive relation that was found on every path.
Widening Algorithm (Take 1):
    assn = Prim of reln*term*term
         | True | False | And of assn*assn
         | Or of assn*assn

    widen (Prim(…)) = expand(Prim(…))
    widen (True) = {}
    widen (And(a1,a2)) = widen(a1) ∪ widen(a2)
    widen (Or(a1,a2)) = widen(a1) ∩ widen(a2)
    ...
Widening for DAG:
  • Can't afford to traverse the tree, so memoize:

    widen A = case lookup A of
                SOME s => s
              | NONE => let s = widen' A in
                          insert(A,s); s end

    widen' (x as Prim(…)) = {x}
    widen' (True) = {}
    widen' (And(a1,a2)) = widen(a1) ∪ widen(a2)
    widen' (Or(a1,a2)) = widen(a1) ∩ widen(a2)
Hash Consing (ala Shao's Flint)
  • To make lookup's fast, we hash-cons all terms and assertions.
    • i.e., value numbering
    • constant time syntactic [in]equality test.
  • Other information cached in hash-table:
    • widened version of assertion
    • negation of assertion
    • free variables
Note on Explicit Substitution
  • Originally, we used explicit substitution.
    widen S (Subst(S',a)) = widen (S ∘ S') a
    widen S (x as Prim(…)) = {S(x)}
    widen S (And(a1,a2)) = widen S a1 ∪ widen S a2
    ...
  • Had to memoize w.r.t. both S and A.
    • rarely encountered same S and A.
    • result was that memoizing didn't help.
    • ergo, back to tree traversal.
  • Of course, you get more precision if you do the substitution (but it costs too much.)
Back to Loops:
  • The invariants we generate aren't great.
    • worst case is that we get "true"
    • we do catch loop-invariant variables.
    • if x starts off at i, is incremented and is guarded by x < e < MAXINT then we can get x >= i.
  • But:
    • covers simple for-loops well
    • it's fast: only a couple of iterations
    • user can override with an explicit invariant (note: only 2 loops in the string library are annotated this way, but we plan to do more.)
Memory
  • As in ESC, use a functional array:
  • terms: t ::= … | upd(tm,ta,tv) | sel(tm,ta)
  • with the environment tracking mem:
  • SP[*e] = do { a <- SP[e]; m <- lookup mem; return sel(m,a) }
  • McCarthy axioms:
    • sel(upd(m,a,v),a) == v
    • sel(upd(m,a,v),b) == sel(m,b) when a ≠ b
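The functional-array reading can be sketched so that both McCarthy axioms hold by computation; the tuple encoding and the uninterpreted-`sel` fallback for an unknown initial memory are my assumptions for the sketch.

```python
def upd(m, a, v):
    """Functional update: a new memory term, old memory unchanged."""
    return ('upd', m, a, v)

def sel(m, a):
    """Select, applying the McCarthy axioms:
       sel(upd(m,a,v), a) = v
       sel(upd(m,a,v), b) = sel(m,b)  when a != b
    """
    while m[0] == 'upd':
        _, m_inner, a2, v = m
        if a == a2:
            return v          # first axiom
        m = m_inner           # second axiom: skip the unrelated update
    return ('sel', m, a)      # unknown base memory: leave uninterpreted
```

Reads of written addresses reduce to values; reads of unwritten ones reduce to a symbolic `sel` on the base memory.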
The realities of C bite again…
  • Consider:
    pt x = new Point{1,2};
    int *p = &x->y;
    *p = 42;
    *x;

    sel(upd(upd(m,x,{1,2}), x+offsetof(pt,y), 42), x) = {1,2} ??
Explode Aggregates?
    update(m,x,{1,2}) =
      upd(upd(m, x+offsetof(pt,x), 1), x+offsetof(pt,y), 2)
  • This turns out to be too expensive in practice because you must model memory down to the byte level.
Refined Treatment of Memory
  • Memory maps roots to aggregate values:
  • Aggregates: {t1,…,tn} | set(a,t,v) | get(a,t)
  • Roots: malloc(n,t)
  • where n is a program point and t is a term used to distinguish different dynamic values allocated at the same point.
  • Pointer expressions are mapped to paths:
  • Paths: path ::= root | path·t
Selects and Updates:
  • Sel and upd operate on roots only:

    sel(upd(m,r,v),r) = v
    sel(upd(m,r,v),r') = sel(m,r') when r ≠ r'

  • Compound select and update for paths:

    select(m,r) = sel(m,r)
    select(m, a·t) = get(select(m,a), t)
    update(m,r,v) = upd(m,r,v)
    update(m, a·t, v) = update(m, a, set(select(m,a), t, v))
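A sketch of these path equations, with memory as a root→aggregate dict and aggregates as Python tuples (both my own stand-ins for the slides' terms):

```python
def select(m, path):
    """select(m, root) = m[root]; select(m, p·i) = get(select(m,p), i)."""
    if path[0] == 'root':
        return m[path[1]]
    _, parent, i = path                    # ('field', parent, offset)
    return select(m, parent)[i]

def update(m, path, v):
    """Functional update along a path: updates bubble up to the root."""
    if path[0] == 'root':
        m2 = dict(m)
        m2[path[1]] = v
        return m2
    _, parent, i = path
    agg = list(select(m, parent))          # set(select(m,parent), i, v)
    agg[i] = v
    return update(m, parent, tuple(agg))
```

Replaying the slide's example: writing 42 through the path x·off(pt,y) into the aggregate {1,2} yields a memory where x maps to {1,42}.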
For Example:
    *x = {1,2};
    int *p = &x->y;
    *p = 42;

    update(upd(m,x,{1,2}), x·off(pt,y), 42) =
    upd(upd(m, x, {1,2}), x, set({1,2}, off(pt,y), 42)) =
    upd(upd(m, x, {1,2}), x, {1,42}) =
    upd(m, x, {1,42})
Reasoning about memory:
  • To reduce select(update(m,p1,v), p2) to select(m,p2), we need to know p1 and p2 are disjoint paths.
  • In particular, if one is a prefix of the other, we cannot reduce (without simplifying paths).
  • Often, we can show their roots are distinct.
  • Many times, we can show they are updates to distinct offsets of the same path prefix.
  • Otherwise, we give up.
Procedures:
  • Originally, intra-procedural only:
    • Programmers could specify pre/post-conditions.
  • Recently, extended to inter-procedural:
    • Calculate SP's and propagate to callers.
      • If too large, we widen it.
    • Go back and strengthen the pre-conditions of (non-escaping) callees by taking the "disjunction" of all call sites' assertions.
Summary of VC-Generation
  • Started with textbook strongest post-conditions.
  • Effects: Rewrote as monadic translation.
  • Diamond: Factored variables into an environment to preserve sharing (SSA).
  • Loops: Simple but effective widening for calculating invariants.
  • Memory: array-based approach, but care to avoid blowing up aggregates.
  • Extended to inter-procedural summaries.
Proving:
  • Original plan was to use off-the-shelf technology.
    • e.g., Simplify, SAT solvers, etc.
  • But found:
    • either didn't have decision procedures that I needed.
    • or were way too slow to use on every compile.
    • so like an idiot, decided to roll my own…
2 Prover(s):
  • Simple Prover:
  • Given a VC: A ⇒ C
  • Widen A to a set of primitive relns.
  • Calculate DNF for C and check that each disjunct is a subset of A.
  • (C is quite small so no blowup here.)
  • This catches a lot:
    • all but about 2% of the checks we eliminate!
    • void f(int @x) { …*x… }
    • if (x != NULL) …*x…
    • for (i=0; i < numelts(A); i++)…A[i]…
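The simple prover can be sketched directly from that description; the tuple representation is assumed, and note that requiring *each* DNF disjunct to follow from the widened facts is the (conservative) rule stated on the slide.

```python
def widen(a):
    """Widen a hypothesis to a set of primitive facts (And=union, Or=inter)."""
    tag = a[0]
    if tag == 'prim':
        return {a[1]}
    if tag == 'and':
        return widen(a[1]) | widen(a[2])
    return widen(a[1]) & widen(a[2])            # 'or'

def dnf(c):
    """DNF of the goal as a list of prim-sets (goals are small, no blowup)."""
    tag = c[0]
    if tag == 'prim':
        return [{c[1]}]
    if tag == 'or':
        return dnf(c[1]) + dnf(c[2])
    return [d1 | d2 for d1 in dnf(c[1]) for d2 in dnf(c[2])]   # 'and'

def proves(a, c):
    """A => C if every disjunct of C is a subset of A's widened facts."""
    facts = widen(a)
    return all(d <= facts for d in dnf(c))
```

This is enough to discharge the null-check patterns shown in the bullets: the dereference's guard appears verbatim among the facts.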
2nd Prover:
  • Given A ⇒ C, try to show A ∧ ¬C inconsistent.
  • Conceptually:
  • explore DNF tree (i.e., program paths)
    • the real exponential blow up is here.
    • so we have a programmer-controlled throttle on the number of paths we'll explore (default 33).
  • accumulate a set of primitive facts.
  • at leaves, run simple decision procedures to look for inconsistencies and prune path.
Problem: Arithmetic
  • To eliminate an array bounds check on an expression x[i], we can try to prove a predicate similar to this:

A ⇒ 0 ≤ i < numelts(x)

where A describes the state of the machine at that program point.

Do we need checks here?
    char *malloc(unsigned n)
      @ensures(n == numelts(result));

    void foo(unsigned x) {
      char *p = malloc(x+1);
      for (int i = 0; i <= x; i++)
        p[i] = 'a';
    }

0 ≤ i < numelts(p)?

You bet!
  • foo(-1)

    void foo(unsigned x) {
      char *p = malloc(x+1);
      for (int i = 0; i <= x; i++)
        p[i] = 'a';
    }

i ≤ x comes from the loop guard, but this is an unsigned comparison. That is, we are comparing i against 0xffffffff, which always succeeds.

Integer Overflow
  • This example is based on a vulnerability in the GNU mail utilities (i.e., IMAP servers)
http://archives.neohapsis.com/archives/fulldisclosure/2005-05/0580.html
  • There are other situations where wrap-around gets you into trouble.
  • So we wanted to take machine arithmetic seriously.
  • Unfortunately, I haven't yet found a prover that I can effectively use. (If you know of any, please tell me!)
Our (Dumb) Arithmetic Solver
  • Determines [un]satisfiability of a conjunction of difference constraints (similar to the approach used by Touchstone & ABCD):
  • Constraints: x – y ≤s c and x – y ≤u c (signed and unsigned)
    • care needed when generating constraints
    • e.g., x + c ≤ y + k cannot (in general) be simplified to x – y ≤ (k – c).
  • The algorithm tries to find cycles in the graphs:

    x – x1 ≤u c1, x1 – x2 ≤u c2, …, xn – x ≤u cn

    where c1 + c2 + … + cn < 0. That is, x – x < 0.
    • again, care needed to avoid internal overflow.
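The cycle search can be sketched as Bellman-Ford-style relaxation over the constraint graph (each x − y ≤ c becomes an edge y → x of weight c); variable names are integer indices here, and this sketch ignores the internal-overflow care the slide mentions.

```python
def unsat(num_vars, constraints):
    """Unsatisfiability of a conjunction of difference constraints.

    constraints: list of (x, y, c) meaning x - y <= c.
    The conjunction is unsatisfiable exactly when the constraint graph
    has a negative-weight cycle (c1 + c2 + ... + cn < 0).
    """
    dist = [0] * num_vars          # implicit 0-weight source to every var
    for _ in range(num_vars):
        changed = False
        for x, y, c in constraints:
            if dist[y] + c < dist[x]:
                dist[x] = dist[y] + c
                changed = True
        if not changed:
            return False           # stabilized: dist is a model
    return changed                 # still relaxing: negative cycle
```

For x − y ≤ 1 and y − x ≤ −2 the cycle sums to −1, i.e. x − x < 0, so the conjunction is rejected; relaxing the second constant to −1 makes it satisfiable.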
Future?
  • We need provers as libraries/services:
    • can we agree upon a logic?
      • typed, untyped?
      • theories must include useful domains (e.g., Z mod).
    • can we agree upon an API?
      • sharing must be preserved
      • need incremental support, control over search
      • need counter-example support
      • need witnesses?
    • we can now generate some useful benchmarks.
      • multiple metrics: precision vs. time*space
Currently:
  • Memory?
    • The functional array encoding of memory doesn't work well.
    • Can we adapt separation logic? Will it actually help?
    • Can we integrate refinements into the types?
    • Work with A. Nanevski & L. Birkedal is a start.
  • Loops?
    • Can we divorce VC-generation from theorem proving all together? (e.g., by compiling to a language with inductive predicates?)
False Positives:
  • We still have 2,000 checks left.
  • I suspect that most are not needed.
  • How to draw the eye to the ones that are?
    • strengthen pre-conditions artificially (e.g., assume no aliasing, overflow, etc.)
    • if we still can't prove the check, then it should be moved up to a "higher-rank" warning.
Lots of Borrowed Ideas
  • ESC M3 & Java
  • Touchstone, Special-J, CCured
  • SPLint (LCLint)
  • FLINT
  • ABCD
More info...

http://cyclone.thelanguage.org