Constraint-Based Analysis and SAT Approaches in Static Bug Detection
This presentation explores the use of constraint-based analysis, particularly using SAT-based approaches, in static bug detection. Through examples and code snippets, the methodology of translating program constructs into Boolean constraints and leveraging efficient SAT solvers for inference is discussed. The session also emphasizes the significance of path sensitivity, guard conditions, and function summaries in capturing program behavior. Additionally, practical applications involving locking mechanisms in programming are illustrated, showcasing how such techniques can enhance bug detection efficiency.
Constraint-Based Analysis and SAT Approaches in Static Bug Detection
E N D
Presentation Transcript
Constraint-Based Analysis CS 8803 FPL Oct 24, 2012 (Slides courtesy of Alex Aiken)
unlock lock unlock Error Unlocked Locked lock Code Example Flow Sensitivity void f(state *x, state *y) { result = spin_trylock(&x->lock); spin_lock(&y->lock); … if (!result) spin_unlock(&x->lock); spin_unlock(&y->lock); } result (&x->lock); spin_trylock (&y->lock); spin_lock Path Sensitivity (!result) Pointers & Heap (&x->lock); (&y->lock); spin_unlock Inter-procedural
Saturn • What? • SAT-based approach to static bug detection • How? • SAT-based approach • Program constructs Boolean constraints • Inference SAT solving • Why SAT? • Lots of reasons, but for now: • Program states naturally expressed as bits • The theory for bits is SAT • Efficient solvers widely available
Intuition • Analyzing in one direction is problematic • Forwards or backwards • Consider null dereference analysis • No null ptr assignments: forwards is best • No dereferences: backwards is best • Constraints • Give a global picture of the program • Allow more efficient order of solution
x31 … x0 y31 … y0 Bitwise-AND x31y31 … x0y0 == Straight-line Code void f(int x, int y) { int z = x & y ; assert(z == x); } ; z y x & == R
Straight-line Code void f(int x, int y) { int z = x & y; assert(z == x); } Query: Is-Satisfiable( ) Answer: Yes x = [00…1] y = [00…0] Negated assertion is satisfiable. Therefore, the assertion may fail. R
Control Flow – Preparation • Approach • Assumes loop free program • Unroll loops, drop backedges • May miss errors that are deeply buried • Bug finding, not verification • Many errors surface in a few iterations • Advantages • Simplicity, reduces false positives
Control Flow – Example • if (c) • x = a; • else • x = b; • res = x; • Merges • preserve path sensitivity • select bits based on the values of incoming guards G = c, x: [a31…a0] G = c, x: [b31…b0] G = cc, x: [v31…v0] where vi = (cai)(cbi) if (c) c c x = a; x = b; true res = x;
Pointers – Overview • May point to different locations… • Thus, use points-to sets p: { l1,…,ln } • … but path sensitive • Use guards on points-to relationships p: { (g1, l1), …, (gn, ln) }
Pointers – Example G = true, p: { (true, x) } • p = &x; • if (c) • p = &y; • res = *p; if (c) res = y; else if (c) res = x; G = c, p: { (true, y) } G = true, p: { (c, y); (c, x)}
Pointers – Recap • Guarded Location Sets { (g1, l1), …, (gn, ln) } • Guards • Condition under which points-to relationship holds • Collected from statement guards • Pointer Dereference • Conditional Assignments
Not Covered • Other Constructs • Structs, … • Modeling of the environment • Optimizations • several to reduce size of formulas • some form of program slicing important
if (l->state == Unlocked) l->state = Locked; else l->state = Error; unlock if (l->state == Locked) l->state = Unlocked; else l->state = Error; lock unlock Error Locked Unlocked lock What can we do with Saturn? int f(lock_t *l) { lock(l); … unlock(l); }
General FSM Checking • Encode FSM in the program • State Integer • Transition Conditional Assignments • Check code behavior • SAT queries
How are we doing so far? • Precision: • Scalability: • SAT limit is 1M clauses • About 10 functions • Solution: • Divide and conquer • Function summaries
Function behavior can be summarized with a set of state transitions Summary: *l: Unlocked Unlocked Locked Error int f(lock_t *l) { lock(l); … … unlock(l); return 0; } Function Summaries (1st try)
int f(lock_t *l) { lock(l); … if (err) return -1; … unlock(l); return 0; } Problem two possible output states distinguished by return value (retval == 0)… Summary 1. (retval == 0) *l: Unlocked Unlocked Locked Error 2. (retval == 0) *l: Unlocked Locked Locked Error A Difficulty
FSM Function Summaries • Summary representation (simplified): { Pin, Pout, R } • User gives: • Pin: predicates on initial state • Pout: predicates on final state • Express interprocedural path sensitivity • Saturn computes: • R: guarded state transitions • Used to simulate function behavior at call site
int f(lock_t *l) { lock(l); … if (err) return -1; … unlock(l); return 0; } Output predicate: Pout = { (retval == 0) } Summary (R): 1. (retval == 0) *l: Unlocked Unlocked Locked Error 2. (retval == 0) *l: Unlocked Locked Locked Error Lock Summary (2nd try)
Lock checker for Linux • Parameters: • States: { Locked, Unlocked, Error } • Pin = {} • Pout = { (retval == 0) } • Experiment: • Linux Kernel 2.6.5: 4.8MLOC • ~40 lock/unlock/trylock primitives • 20 hours to analyze • 3.0GHz Pentium IV, 1GB memory
Double Locking/Unlocking static void sscape_coproc_close(…) { spin_lock_irqsave(&devc->lock, flags); if (…) sscape_write(devc, DMAA_REG, 0x20); … } static void sscape_write(struct … *devc, …) { spin_lock_irqsave(&devc->lock, flags); … }
Ambiguous Return State int i2o_claim_device(…) { down(&i2o_configuration_lock); if (d->owner) { up(&i2o_configuration_lock); return –EBUSY; } if (…) { return –EBUSY; } … }
Bugs Previous Work: MC (31), CQual (18), <20% Bugs
Function Summary Database • 63,000 functions in Linux • More than 23,000 are lock related • 17,000 with locking constraints on entry • Around 9,000 affects more than one lock • 193 lock wrappers • 375 unlock wrappers • 36 with return value/lock state correlation • Available on the web . . .
Another Checker • Memory leaks • Common, esp. in error handling code • Hard to find • Problematic in long running applications • Current techniques • Escape analysis • Ownership types • Region based analysis…
Simple Leak char *f() { char *p; p = (char*)malloc(…); … if (err) return NULL; … return p; }
Scenario 1 – Malloc Wrappers char *f() { char *p; p = (char*)strdup(…); … if (err) return NULL; … return p; }
Scenario 2 – External References char *f(struct *s) { char *p; p = (char*)malloc(…); s->name = p; if (err) return NULL; … return p; }
Scenario 3 – Function Calls char *f(struct state *s) { char *p; p = (char*)malloc(…); g(s, p); if (err) return NULL; … return p; } void g(s, p) { s->name = p;}
Scenario 4 – Data dependency void f(int len) { char fastbuf[10], *p; if (len < 10) p = fastbuf; else p = (char *)malloc(len); … if (p != fastbuf) free(p); }
Requirements • Track points-to relationships precisely • Infer escaping functions • ones that create external references to objects passed in via parameters • Infer allocation functions
Analysis Part I – Points-to Rule • PointsTo(p, l) • condition under which p points to l (p) = { (g0, l0), …, (gn-1, ln-1) } PointsTo(p, l) = gi (if li = l) false (otherwise)
Analysis PartII – EscapeVia • EscapeVia(l, p, X) • the condition under which location l escapes via pointer p, excluding references in set X • Access Roots • Every object in the function body is accessed through one of the following “roots” • Parameters (p1…pn) • The Return Value (ret_val) • Global Variables • Local Variables • Heap Allocated Objects
Analysis Part II – EscapeVia • Never escape through local variables Root(p) Locals X EscapeVia(l, p, X) = false • Always escape through global variables RootOf(p) Globals EscapeVia(l, p, X) = PointsTo(p, l)
Analysis Part II – EscapeVia • Escaping through parameters/return RootOf(p) (Params { ret_val }) – X EscapeVia(l, p, X) = PointsTo(p, l) • Escaping via another allocated location RootOf(p) NewLocs – X EscapeVia(l, p, X) = PointsTo(p, l) Escaped(p,X {RootOf(l)})
Analysis Part III – Escape/Leak • Escape ConditionEscaped(l, X) = p EscapedVia(l, p, X) • Leak ConditionLeaked(l, X) = Escaped(l, X) • Leak CheckerFor all new locations l, there is a leak ifSatisfiable(Leaked(l, {}))
Why SAT? (Revisited …) • Moore’s Law • Uniform modeling of constructs as bits • Constraints • Local specification • Global solution • Incremental SAT solving • makes multiple queries efficient
Why SAT? (Cont.) • Path sensitivity is important • To find bugs • To reduce false positives • Much easier to model precisely with SAT • Compositionality is important • Function summaries critical for scalability • Easy to construct with SAT queries