Constraint-Based Analysis and SAT Approaches in Static Bug Detection

Constraint-Based Analysis CS 8803 FPL Oct 24, 2012 (Slides courtesy of Alex Aiken)

Code Example • void f(state *x, state *y) { result = spin_trylock(&x->lock); spin_lock(&y->lock); … if (!result) spin_unlock(&x->lock); spin_unlock(&y->lock); } • Challenges highlighted on the code: Flow Sensitivity • Path Sensitivity • Pointers & Heap • Inter-procedural • [Figure: lock FSM with states Unlocked, Locked, Error and transitions lock, unlock]

Saturn • What? • SAT-based approach to static bug detection • How? • SAT-based approach • Program constructs → Boolean constraints • Inference → SAT solving • Why SAT? • Lots of reasons, but for now: • Program states naturally expressed as bits • The theory for bits is SAT • Efficient solvers widely available

Intuition • Analyzing in one direction is problematic • Forwards or backwards • Consider null dereference analysis • No null ptr assignments: forwards is best • No dereferences: backwards is best • Constraints • Give a global picture of the program • Allow more efficient order of solution

Straight-line Code • void f(int x, int y) { int z = x & y; assert(z == x); } • Inputs as bit vectors: x = [x31 … x0], y = [y31 … y0] • Bitwise-AND: z = [x31∧y31 … x0∧y0] • ==: R is the single bit that is true iff zi = xi for every i

Straight-line Code • void f(int x, int y) { int z = x & y; assert(z == x); } • Query: Is-Satisfiable(¬R) • Answer: Yes, with x = [00…1], y = [00…0] • The negated assertion is satisfiable; therefore, the assertion may fail.
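
The query on this slide can be reproduced in miniature. The sketch below is not Saturn's encoding: it assumes a 4-bit word instead of 32 bits, uses brute-force enumeration in place of a SAT solver, and the helper names (`bitwise_and`, `equal`, `find_counterexample`) are made up for illustration.

```python
from itertools import product

WIDTH = 4  # assumption: 4-bit words keep the enumeration tiny; the slides use 32


def bitwise_and(xs, ys):
    """z_i = x_i AND y_i for each bit position."""
    return [x and y for x, y in zip(xs, ys)]


def equal(zs, xs):
    """R = conjunction over i of (z_i <-> x_i)."""
    return all(z == x for z, x in zip(zs, xs))


def find_counterexample():
    """Search for bits of x and y that satisfy NOT R, i.e. the assert fails."""
    for bits in product([False, True], repeat=2 * WIDTH):
        xs, ys = list(bits[:WIDTH]), list(bits[WIDTH:])
        if not equal(bitwise_and(xs, ys), xs):
            return xs, ys
    return None  # NOT R is unsatisfiable: the assertion always holds


witness = find_counterexample()
```

With bit 0 written last, the first witness found is the 4-bit analogue of the slide's answer: x has only its lowest bit set and y is all zeros.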

Control Flow – Preparation • Approach • Assumes loop free program • Unroll loops, drop backedges • May miss errors that are deeply buried • Bug finding, not verification • Many errors surface in a few iterations • Advantages • Simplicity, reduces false positives
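
To see why unrolling "may miss errors that are deeply buried", consider a hypothetical loop `for (i = 0; i < n; i++) if (i == 5) error();`, which is not from the slides. The sketch below explores it with the loop body copied `bound` times and the back edge dropped; enumeration over a small range of `n` stands in for the SAT query.

```python
def error_within(bound, n_values=range(0, 16)):
    """Return an input n that reaches error() within `bound` unrollings."""
    for n in n_values:
        i = 0
        for _ in range(bound):  # each pass is one unrolled copy of the body
            if not i < n:
                break           # loop condition failed: normal exit
            if i == 5:
                return n        # error() is reached for this input
            i += 1
        # Paths that need more than `bound` iterations are never explored.
    return None


shallow = error_within(3)  # bug sits in the 6th iteration: missed
deep = error_within(6)     # enough unrolling: found
```

The trade-off is the one the slide states: a small bound keeps the formula small and is adequate for bug finding, but it gives no verification guarantee.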

Control Flow – Example • if (c) x = a; else x = b; res = x; • Merges • preserve path sensitivity • select bits based on the values of incoming guards • Then branch: G = c, x: [a31…a0] • Else branch: G = ¬c, x: [b31…b0] • After the merge: G = c ∨ ¬c = true, x: [v31…v0] where vi = (c ∧ ai) ∨ (¬c ∧ bi)
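
The merge rule vi = (c ∧ ai) ∨ (¬c ∧ bi) is a per-bit multiplexer. As a small sanity check, the sketch below compares that formula with directly executing the if/else on every input of an assumed 3-bit width; the function names are illustrative only.

```python
from itertools import product


def merge(c, a_bits, b_bits):
    """Bits of x after the join: v_i = (c AND a_i) OR (NOT c AND b_i)."""
    return [(c and a) or ((not c) and b) for a, b in zip(a_bits, b_bits)]


def run_branch(c, a_bits, b_bits):
    """Reference semantics: execute `if (c) x = a; else x = b;` directly."""
    if c:
        x = a_bits
    else:
        x = b_bits
    return list(x)


def merge_matches_program(width=3):
    """Check the merge formula against the program on every input."""
    for c in (False, True):
        for bits in product([False, True], repeat=2 * width):
            a, b = list(bits[:width]), list(bits[width:])
            if merge(c, a, b) != run_branch(c, a, b):
                return False
    return True
```

Because the guard c stays inside the formula, the value of x after the merge still records which branch produced it, which is what preserves path sensitivity.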

Pointers – Overview • May point to different locations… • Thus, use points-to sets p: { l1,…,ln } • … but path sensitive • Use guards on points-to relationships p: { (g1, l1), …, (gn, ln) }

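Pointers – Example • p = &x; if (c) p = &y; res = *p; • After p = &x: G = true, p: { (true, x) } • Inside the branch, after p = &y: G = c, p: { (true, y) } • After the merge: G = true, p: { (c, y); (¬c, x) } • The dereference res = *p becomes: if (c) res = y; else if (¬c) res = x;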

Pointers – Recap • Guarded Location Sets { (g1, l1), …, (gn, ln) } • Guards • Condition under which points-to relationship holds • Collected from statement guards • Pointer Dereference • Conditional Assignments
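
A guarded location set can be sketched as a list of (guard, location) pairs. The code below is a simplification, not Saturn's data structure: guards are Python predicates over an assignment `env`, the guarded assignment folds the branch and the merge into one step, and all names (`always`, `conj`, `neg`, `assign`, `deref`) are hypothetical.

```python
def always(env):
    """The guard `true`."""
    return True


def conj(g, h):
    return lambda env: g(env) and h(env)


def neg(g):
    return lambda env: not g(env)


def assign(stmt_guard, pts, target):
    """Model `p = &target` executed under stmt_guard.

    Old entries survive only where the statement did not run; the new
    entry holds exactly where it did.
    """
    kept = [(conj(g, neg(stmt_guard)), loc) for g, loc in pts]
    return kept + [(stmt_guard, target)]


def deref(pts, env, memory):
    """Model `*p` as a chain of conditional reads, one per guarded location."""
    for g, loc in pts:
        if g(env):
            return memory[loc]
    raise ValueError("p points nowhere under this assignment")


def c(env):
    return env["c"]


# p = &x; if (c) p = &y;
pts = assign(always, [], "x")  # { (true, x) }
pts = assign(c, pts, "y")      # { (not c, x); (c, y) }
```

Reading `*p` with `deref` then selects y when c holds and x otherwise, matching the conditional assignments on the example slide.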

Not Covered • Other Constructs • Structs, … • Modeling of the environment • Optimizations • several to reduce size of formulas • some form of program slicing important

What can we do with Saturn? • int f(lock_t *l) { lock(l); … unlock(l); } • lock: if (l->state == Unlocked) l->state = Locked; else l->state = Error; • unlock: if (l->state == Locked) l->state = Unlocked; else l->state = Error; • [Figure: lock FSM with states Unlocked, Locked, Error and transitions lock, unlock]

General FSM Checking • Encode FSM in the program • State → Integer • Transition → Conditional Assignments • Check code behavior • SAT queries
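
The encoding can be sketched directly: states become integers and each transition becomes a conditional assignment. In the sketch below, the integer values and the helper `error_witnesses` are assumptions, the elided body of `f` is treated as empty, and enumeration over initial states stands in for the SAT query.

```python
UNLOCKED, LOCKED, ERROR = 0, 1, 2  # State -> Integer (values are arbitrary)


def lock(state):
    """Transition as a conditional assignment, as on the lock slide."""
    return LOCKED if state == UNLOCKED else ERROR


def unlock(state):
    return UNLOCKED if state == LOCKED else ERROR


def f(state):
    """Mirrors `int f(lock_t *l) { lock(l); ... unlock(l); }`, '...' omitted."""
    state = lock(state)
    state = unlock(state)
    return state


def error_witnesses(func, initial_states):
    """Stand-in for the SAT query: which start states lead to Error?"""
    return [s for s in initial_states if func(s) == ERROR]
```

Calling `error_witnesses(f, [UNLOCKED, LOCKED])` reports only the Locked start state, which corresponds to a double lock.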

How are we doing so far? • Precision: good • Scalability: poor • SAT limit is 1M clauses • About 10 functions • Solution: • Divide and conquer • Function summaries

Function Summaries (1st try) • Function behavior can be summarized with a set of state transitions • int f(lock_t *l) { lock(l); … … unlock(l); return 0; } • Summary: *l: Unlocked → Unlocked, Locked → Error

A Difficulty • int f(lock_t *l) { lock(l); … if (err) return -1; … unlock(l); return 0; } • Problem: two possible output states, distinguished by return value (retval == 0) • Summary: 1. (retval == 0) *l: Unlocked → Unlocked, Locked → Error 2. ¬(retval == 0) *l: Unlocked → Locked, Locked → Error

FSM Function Summaries • Summary representation (simplified): { Pin, Pout, R } • User gives: • Pin: predicates on initial state • Pout: predicates on final state • Express interprocedural path sensitivity • Saturn computes: • R: guarded state transitions • Used to simulate function behavior at call site

Lock Summary (2nd try) • int f(lock_t *l) { lock(l); … if (err) return -1; … unlock(l); return 0; } • Output predicate: Pout = { (retval == 0) } • Summary (R): 1. (retval == 0) *l: Unlocked → Unlocked, Locked → Error 2. ¬(retval == 0) *l: Unlocked → Locked, Locked → Error
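
One way to picture how R is obtained is to run the function on every initial state and every value of `err`, then group the observed transitions by the output predicate. This is only a sketch under that assumption: Saturn derives R with SAT queries, and the model of `f` and the name `summarize` are illustrative.

```python
UNLOCKED, LOCKED, ERROR = "Unlocked", "Locked", "Error"


def lock(state):
    return LOCKED if state == UNLOCKED else ERROR


def unlock(state):
    return UNLOCKED if state == LOCKED else ERROR


def f(state, err):
    """Model of the slide's f: returns (retval, final lock state)."""
    state = lock(state)
    if err:
        return -1, state
    state = unlock(state)
    return 0, state


def summarize(func, in_states=(UNLOCKED, LOCKED)):
    """Group transitions by the output predicate (retval == 0)."""
    summary = {}
    for start in in_states:
        for err in (False, True):
            retval, end = func(start, err)
            summary.setdefault(retval == 0, set()).add((start, end))
    return summary


R = summarize(f)
```

`R[True]` holds the transitions guarded by (retval == 0) and `R[False]` those guarded by its negation, so a caller that tests the return value can pick the matching lock state.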

Lock checker for Linux • Parameters: • States: { Locked, Unlocked, Error } • Pin = {} • Pout = { (retval == 0) } • Experiment: • Linux Kernel 2.6.5: 4.8MLOC • ~40 lock/unlock/trylock primitives • 20 hours to analyze • 3.0GHz Pentium IV, 1GB memory

Double Locking/Unlocking static void sscape_coproc_close(…) { spin_lock_irqsave(&devc->lock, flags); if (…) sscape_write(devc, DMAA_REG, 0x20); … } static void sscape_write(struct … *devc, …) { spin_lock_irqsave(&devc->lock, flags); … }

Ambiguous Return State int i2o_claim_device(…) { down(&i2o_configuration_lock); if (d->owner) { up(&i2o_configuration_lock); return -EBUSY; } if (…) { return -EBUSY; } … }

Bugs Previous Work: MC (31), CQual (18), <20% Bugs

Function Summary Database • 63,000 functions in Linux • More than 23,000 are lock related • 17,000 with locking constraints on entry • Around 9,000 affect more than one lock • 193 lock wrappers • 375 unlock wrappers • 36 with return value/lock state correlation • Available on the web . . .

Another Checker • Memory leaks • Common, esp. in error handling code • Hard to find • Problematic in long running applications • Current techniques • Escape analysis • Ownership types • Region based analysis…

Simple Leak char *f() { char *p; p = (char*)malloc(…); … if (err) return NULL; … return p; }

Scenario 1 – Malloc Wrappers char *f() { char *p; p = (char*)strdup(…); … if (err) return NULL; … return p; }

Scenario 2 – External References char *f(struct *s) { char *p; p = (char*)malloc(…); s->name = p; if (err) return NULL; … return p; }

Scenario 3 – Function Calls char *f(struct state *s) { char *p; p = (char*)malloc(…); g(s, p); if (err) return NULL; … return p; } void g(s, p) { s->name = p;}

Scenario 4 – Data dependency void f(int len) { char fastbuf[10], *p; if (len < 10) p = fastbuf; else p = (char *)malloc(len); … if (p != fastbuf) free(p); }

Requirements • Track points-to relationships precisely • Infer escaping functions • ones that create external references to objects passed in via parameters • Infer allocation functions

Analysis Part I – Points-to Rule • PointsTo(p, l) • condition under which p points to l • Points-to set of p: { (g0, l0), …, (gn-1, ln-1) } • PointsTo(p, l) = gi if li = l for some i; false otherwise

Analysis Part II – EscapeVia • EscapeVia(l, p, X) • the condition under which location l escapes via pointer p, excluding references in set X • Access Roots • Every object in the function body is accessed through one of the following “roots” • Parameters (p1…pn) • The Return Value (ret_val) • Global Variables • Local Variables • Heap Allocated Objects

Analysis Part II – EscapeVia • Never escape through local variables: if RootOf(p) ∈ Locals ∪ X, then EscapeVia(l, p, X) = false • Always escape through global variables: if RootOf(p) ∈ Globals, then EscapeVia(l, p, X) = PointsTo(p, l)

Analysis Part II – EscapeVia • Escaping through parameters/return: if RootOf(p) ∈ (Params ∪ { ret_val }) − X, then EscapeVia(l, p, X) = PointsTo(p, l) • Escaping via another allocated location: if RootOf(p) ∈ NewLocs − X, then EscapeVia(l, p, X) = PointsTo(p, l) ∧ Escaped(p, X ∪ {RootOf(l)})

Analysis Part III – Escape/Leak • Escape Condition: Escaped(l, X) = ⋁p EscapeVia(l, p, X) (disjunction over all pointers p) • Leak Condition: Leaked(l, X) = ¬Escaped(l, X) • Leak Checker: for all new locations l, there is a leak if Satisfiable(Leaked(l, {}))
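
The leak check can be sketched on the "Simple Leak" and "External References" examples. The code below is a deliberately reduced model, not Saturn's implementation: it covers only the three non-recursive EscapeVia rules (locals, globals, parameters/return) and omits escape via another allocated location; pointers are hypothetical (root, kind, guarded set) triples; and brute-force enumeration replaces the SAT solver.

```python
from itertools import product

LOCALS, GLOBALS, PARAMS, RET = "local", "global", "param", "ret_val"


def points_to(pts, loc, env):
    """PointsTo(p, l) under env: the guard g_i with l_i == l, else false."""
    return any(g(env) for g, l in pts if l == loc)


def escape_via(loc, ptr, excluded, env):
    """EscapeVia(l, p, X), non-recursive rules only (an assumption)."""
    root, kind, pts = ptr
    if kind == LOCALS or root in excluded:
        return False
    if kind in (GLOBALS, PARAMS, RET):
        return points_to(pts, loc, env)
    raise NotImplementedError("escape via allocated locations is omitted")


def leaked(loc, pointers, env, excluded=frozenset()):
    """Leaked(l, X) = NOT Escaped(l, X); Escaped is the OR over pointers."""
    return not any(escape_via(loc, p, excluded, env) for p in pointers)


def find_leak(loc, pointers, variables):
    """Brute-force Satisfiable(Leaked(l, {})): a witness env, or None."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if leaked(loc, pointers, env):
            return env
    return None


# Simple Leak: p = malloc(...); if (err) return NULL; return p;
simple_leak = [
    ("p", LOCALS, [(lambda env: True, "new1")]),
    ("ret_val", RET, [(lambda env: not env["err"], "new1")]),
]
# External References: additionally s->name = p, with s a parameter.
external_ref = simple_leak + [("s", PARAMS, [(lambda env: True, "new1")])]
```

For the first function the leak condition simplifies to `err`, so the checker reports the error path; in the second, the reference stored through the parameter makes the condition unsatisfiable.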

Results

Why SAT? (Revisited …) • Moore’s Law • Uniform modeling of constructs as bits • Constraints • Local specification • Global solution • Incremental SAT solving • makes multiple queries efficient

Why SAT? (Cont.) • Path sensitivity is important • To find bugs • To reduce false positives • Much easier to model precisely with SAT • Compositionality is important • Function summaries critical for scalability • Easy to construct with SAT queries
