Using System-Specific Compiler Extensions to Find Errors in Systems Code

Talk about how to find hundreds of errors by just putting a little bit of system specific knowledge in a compiler Using System-Specific Compiler Extensions to Find Errors in Systems Code Dawson Engler Ben Chelf, Andy Chou, Seth Hallem Stanford University

Never do X (do not use floating point, allocate large vars on 6K kernel stack) Always do X before/after Y (acquire lock before use, release after use) Checking systems software • Systems software has many ad-hoc restrictions: • “acquire lock L before accessing shared variable X” • “do not allocate large variables on 6K kernel stack” • Error = crashed system. How to find errors? • Formal verification • + rigorous • - costly + expensive. *Very* rare to do for software • Testing: • + simple, few false positives • - requires running code: doesn’t scale & can be impractical • Manual inspection • + flexible • - erratic & doesn’t scale well. • What to do?? Practically intractable: far too strenous, and even if you do, spec isn’t code. Method of choice if you build systems for money: test. Problem: O(paths) = exponential in length of code. So if you build systems, what you wind up with is a system that only crashes after a week. Further, mapping crash back to cause can be *really* hard. Inspection is dead: modify code, have to do it again.

Never do X (do not use floating point, allocate large vars on 6K kernel stack) Always do X before/after Y (acquire lock before use, release after use) Another approach • Observation: rules can be checked with a compiler • scan source for “relevant” acts check if they make sense E.g., to check “disabled interrupts must be re-enabled:” scan for calls to disable()/enable(), check that they match, not done twice • Main problem: • compiler has machinery to automatically check, but not knowledge • implementor has knowledge but not machinery • Meta-level compilation (MC): • give implementors a framework to add easily-written, system-specific compiler extensions Want to make compilers aggressively system specific: if you design a system or interface and see a use of it, you invariably see ways of doing it better: give you way to articulate this knowledge and have compiler do it for you automatically

“did not re-enable interrupts!” interrupt chk Actual error in Linux: raid5 driver disables interrupts, and then if it fails to allocate buffer, returns with them disabled. This kernel deadlock is actually hidden by an immediate segmentation fault since the callers dereference the pointer without checking for NULL Meta-level compilation (MC) • Implementation: • Extensions dynamically linked into GNU g++ compiler • Applied down all paths in input program source • E.g. 64-line extension to check disable/disable (82 bugs) • Static detection of real errors in real systems: • 600+ bugs in Linux, OpenBSD, FLASH, Xok exokernel • most extensions < 100 lines, written by system outsiders save(flags); cli(); if(!(buf = alloc())) return NULL; restore(flags); return buf; GNU C++ compiler Linux: raid5.c The main results: it works really well, and its easy

initial Is enabled enable disable enable Is disabled disable End-of- path error A bit more detail { #include ”linux-includes.h” } sm chk_interrupts { decl { unsigned } flags; // named patterns pat enable = { sti(); } | { restore_flags(flags); }; pat disable = { cli(); }; // states is_enabled: disable ==> is_disabled | enable ==> { err("double enable"); } ; is_disabled: enable ==> is_enabled | disable ==> { err("double disable"); } | $end_of_path$ ==> { err("exiting w/intr disabled!"); } ; }

P.clean “X before Y” rule: system call pointers • Applications are evil • OS much check all input pointers before use • one missing check = security hole • MC checker: • Bind syscall ptr’s to “tainted” state • tainted vars only touched w/ “safe” routines • or: explicit check to make “clean” Each input ptr P copyin(p), copyout(p) P.tainted use(p) check(p) error /* from sys/kern/disk.c */ int sys_disk_request(… struct buf *reqbp, u_int k) { ... /* bypass for direct scsi commands */ if (reqbp->b_flags & B_SCSICMD) return sys_disk_scsicmd (sn, k, reqbp); use(p)

Drivers pull them from all sorts of random locations Deriving specification from common usage • Problem: difficult to specify all user pointers • so:see what code usually does, deviations probably errors • if ever pass ptr to paranoid routine, make sure always do • Found 5 security errors in Linux. • Canonical example: hole in an “ioctl” routine for some obscure device driver. /* drivers/usb/evdev.c */ static int evdev_ioctl(..., unsigned long arg) { ... switch (cmd) { case EVIOCGVERSION: return put_user(EV_VERSION, (__u32 *) arg); case EVIOCGID: /* copy_to_user(to, from)! */ return copy_to_user(&dev->id, (void *) arg, sizeof(struct input_id));

Kernel alloc/dealloc rules • Must check that alloc succeeded • Must allocate enough space • Must not use after free() • Must free alloc’d object on error: /* from drivers/char/tea6300.c */ client = kmalloc(sizeof *client,GFP_KERNEL); if (!client) return -ENOMEM; ... tea = kmalloc (sizeof *tea, GFP_KERNEL); if (!tea) return -ENOMEM; ... MOD_INC_USE_COUNT; /* bonus bug: kmalloc could sleep */ 47 error leaks, 48 false positives. 72 mod_inc/mod_dec errors

Interesting since it’s so small: a few lines of code and we have a path sensitive high-level analysis pass Stripped-down kernel malloc/free checker decl { scalar } sz; // match any scalar decl { const int } retv; // match const ints state decl { any_ptr } v; // match any pointer, can bind to a state // Bind malloc results to “unknown” until observed start: { v = (any)malloc(sz) } ==> v.unknown | { free(v) } ==> v.freed; // can compare in states unknown, null, not_null v.unknown, v.null, v.not_null: { (v == 0) } ==> true = v.null, false = v.not_null | { (v != 0) } ==> true = v.not_null, false = v.null; // Cannot reach error path with unknown or not-null v.unknown, v.not_null: { return retv; } ==> { if(mgk_int_cst(retv) < 0) err("Error path leak!"); }; // No dereferences of null, freed, or unknown ptrs. v.null, v.freed v.unknown: { *(any *)v } ==> { err("Using ptr illegally!"); };

Some amusing bugs • No check (130 errors, 11 false pos). Worse case (many uses): • use after free (14 errors, 3 false pos): 5 cut&paste of • wrong size (2 errors) /* include/linux/coda_linux.h:CODA_ALLOC */ ptr = (cast)vmalloc((unsigned long) size); ... if (ptr == 0) printk("kernel malloc returns 0\n”); memset( ptr, 0, size ); /* drivers/isdn/pcbit:pcbit_init_dev */ kfree(dev); iounmap((unsigned char*)dev->sh_mem); release_mem_region(dev->ph_mem, 4096); /* drivers/parport/daisy.c:add_dev:50 */ newdev = kmalloc (GFP_KERNEL,sizeof(struct daisydev));

NoBlock 400 lines later have this violation. This is a common pattern: implementor just doesn’t know about the rule, so keeps violating it. Happens: since rules manually enforced and poorly documented. “In context Y, don’t do X”: blocking • Linux: if interrupts are disabled, or spin lock held, do not call an operation that could block. • MC checker: • Compute transitive closure of all potentially blocking fn’s • Hit disable/lock: warn of any calls • 123 errors, 8 false pos clean lock(l) unlock(l) enable() disable() Block call • /* drivers/net/pcmcia/wavelan_cs.c */ • spin_lock_irqsave (&lp->lock, flags); /* 1889 */ • switch(cmd) • ... • case SIOCGIWPRIV: • ... • if(copy_to_user(wrq->u.data.pointer, …)) /* 2305 */ • ret = -EFAULT; error

Another way to use MC is to push dynamic checks to static. Usually have some amount of dynamic type checking going on where you have a series of if statements at the beginning of your routine to check for error conditions. So just pull into the compiler and check. Example: statically checking assert • Assert(x) used to check “x” at runtime. Abort if false • compiler oblivious, so cannot analyze statically • Use MC to build an assert-aware extension • Result: found 5 errors in FLASH. • Common: code cut&paste from other context • Manual detection questionable: 300-line path explosion between violation and check General method to push dynamic checks to static msg.len = 0; ... assert(msg.len !=0); line 211:assert failure! assert checker

High bits: small number of LOC = big bug count, 2:1 ratio of false positives Result overview (+others)

Given a set of uses of some interface you’ve built, you invariably see better ways of doing things. This gives you a way to articulate this knowlege and have the compiler do it for you automatically. Let one person do it. Conclusions • MC goal: make programming much more powerful • How: Raise compilation from level of programming language to the “meta level” of the systems implemented in that language • MC works well in real, heavily tested systems • We found bugs in every system we’ve looked at. • Over 600 bugs in total, many capable of crashing system • Easily written by people unfamiliar w/ checked system Currently: • making correctors, using domain-knowledge to extract verifiable specs, deriving errors by usage deviations, performing meta-level optimization…

Given a set of uses of some interface you’ve built, you invariably see better ways of doing things. This gives you a way to articulate this knowlege and have the compiler do it for you automatically. Let one person do it. Conclusions • Meta-level compilation: • Make compilers aggressively system-specific • Easy: digest sentence fragment, write checker/optimizer. • Result: Static, precise, immediate error diagnosis As outsiders found errors in every system looked at Over 600 bugs, many capable of crashing system Currently: • making correctors, using domain-knowledge to extract verifiable specs, deriving errors by usage deviations, performing meta-level optimization…

See system call polarity: return < 0 or > 0 on error: found 8 places in linux. If ever check a routine failure: make sure they always check for failure. See what they do after calling a security check (suser): if they do it a lot and deviate, whine. Weakness: always do wrong, never do right. Bugs as deviant behavior • One way to find bugs: • have a deep understanding of code semantics, detect when code makes no sense. Hard. • Easier: • see what code usually does: deviations probably bugs • x protected by lock(a) 1000 times, by lock(b) once, probably an error • Find inverses by looking for common pairings • More general: derive temporal orderings. Use machine learning to derive more sophisticated patterns? lock(a); x++; unlock(a); lock(b); x++; unlock(b); lock(a); x++; unlock(a); lock(a); x++; unlock(a); lock(a); x++; unlock(a); lock(a); x++; unlock(a);

What to do when static analysis too weak? • Static analysis works in some cases, not well in others • hit undecidable problems with loop termination conditions, data values, pointers,… • Alternative: • use domain-specific slicing to extract spec from code • run through verifier • Main lever: a little domain knowledge goes a long way • e.g., strip out Linux TCP finite-state-machine by keying off of variable “sk->state” • Real example: checking FLASH code

Deeply nested control structures (29 conditional compilation directives, 21 if statements) Extracting specs from FLASH code • Embedded sw for cache coherence in FLASH machine • errors crash or deadlock machine: can take week to track • typical protocol: 18K lines of hairy C code • Extract specifications from source by simple slicing • found 9 errors in code • despite 5+ years of heavy testing and formal verification! • How? • Given list of data structure fields and message operations, slice out all relevant operations • Compose with specification (manual) boilerplate • run through Murphi model checker • Levers: aliasing and globals, but in a stylized way that we can mostly ignore. 4 loops in code.

FLASH vs Murphi HANDLER_GLOBALS(header.nh.len) = LEN_CACHELINE; if (! HANDLER_GLOBALS(h.hl.Pending)) { if (! HANDLER_GLOBALS(h.hl.Dirty)) { ASSERT(!HANDLER_GLOBALS(h.hl.IO)); PI_SEND(F_DATA, F_FREE, F_SWAP,…,…); HANDLER_GLOBALS(h.hl.Local) = 1; /* ... deleted 14 lines */ } else { ASSERT(!HANDLER_GLOBALS(h.hl.List)); ASSERT(!HANDLER_GLOBALS(h.hl.RealPtrs)); FLASH nh.len := len_cacheline; if ((DH.Pending = 0)) then if ((DH.Dirty = 0)) then assert(nh.len != len_nodata); mbResult := pi_send_func(src, PI_Putt); DH.Local := 1; else assert((DH.List = 0)); assert((DH.RealPtrs = 0)); … Murphi

tmp = malloc(sizeof *p); cli(); p = tmp; sti(); cli(); p = malloc(sizeof *p); sti(); Malloc: can insert null pointer check (if you can figure out right value to return) or can preallocate. Hoist mod_inc/dec above sleeping operations. Checkers into Correctors Empirical observation: we have sent out hundreds of bug reports. 20-30 have gotten fixed. • Problem: big system, lots of bugs • may not be your system or take too long to fix manually • Can turn some classes of checkers into correctors: • “Do not allocate large variables on kernel stack”: if you hit a violation, rewrite code to dynamically allocate var • “Do not call blocking memory allocator with interrupts disabled”: hoist allocation out • “On error paths, rollback side-effects”: dynamically track what these are, and reverse. • Interesting: trade dynamic checks for simplicity

lock(q->lock); head = q->head; n = q->nelem; unlock(q->lock); read_lock(q->lock); head = q->head; n = q->nelem; read_unlock(q->lock); read_lock(q->lock); q->head = h; “modifying q with read lock!” MC optimization • Optimization rules similar to checking: • “if data is not shared with interrupt handlers, protect using spin locks rather than interrupt disable/enable • “to save an instruction when setting a message opcode, xor it with the new and old (msg.opcode ^= (new ^ old)); • “replace quicksort with radix sort when sorting integers” • Common rule: “In situation X, do Y rather than Z”: • “if a variable is not modified, protect using read locks” • and with a few lines: change opt into checker:

MC analysis vs. traditional compiler analysis • Meaning more apparent + domain-specific knowlege • Easier to bound side-effects: use knowledge of abstract state to ignore many concrete actions • Aliasing less of a problem • typical: opaque handles vs normal mess of pointers • Operations more coarse grain • read()/write() vs load/store; matrix ops vs +/- Bigint a, b, c; set(a, 3); mul(b, a, a); mul(c, b, b); printf(“%s”, bigint_to_str(c)); Interfaces let compilers treat implementation as a black box just like programmers! printf(“81”); Could imagine following bunches of pointers and memory allocation: instead, we know that it does a +, - whatever and we can ignore it. Similarly: ignoring ptrs. Many compiler problems hard because static. Not a problem with runtime checking. Less control needed: pushing around function calls, rather than doing register allocation

Using System-Specific Compiler Extensions to Find Errors in Systems Code

Using System-Specific Compiler Extensions to Find Errors in Systems Code

Presentation Transcript

joeq compiler system

Using Specific Details in Writing

How to find lots of bugs in real code with system-specific static analysis

Extensions to Structure Layout Optimizations in the Open64 Compiler

Using Model Checking to Find Serious File System Errors

Compiler Optimization and Code Generation

Compiler Optimization to Reduce Soft Errors in Register Files

A Compiler-Based Approach to Schema-Specific Parsing

Compiler Construction Code Generation I

Checking System Rules Using System-Specific, Programmer-Written Compiler Extensions

Bugs as deviant behavior – Inferring errors in systems code

Compiler II: Code Generation

System J – Compiler –

Compiler Errors

Demo 9: Phx.Morph Byte code weaving using Microsoft’s Phoenix compiler

Checking System Rules Using System-Specific, Programmer-Written Compiler Extensions

Checking System Rules Using System-Specific, Programmer-Written Compiler Extensions

Compiler II: Code Generation

Compiler Optimization and Code Generation

Using compiler-directed approach to create MPI code automatically Paraguin Compiler Continued

Portable Code Compiler