
CIL: Infrastructure for C Program Analysis and Transformation

George C. Necula, Scott McPeak, S. P. Rahul, Westley Weimer • http://www.cs.berkeley.edu/~necula/cil • ETAPS – CC ’02, Friday, April 12


Presentation Transcript


  1. CIL: Infrastructure for C Program Analysis and Transformation George C. Necula, Scott McPeak, S. P. Rahul, Westley Weimer http://www.cs.berkeley.edu/~necula/cil ETAPS – CC ’02 Friday, April 12

  2. What is CIL? • Distills C language • into a few key forms • with precise semantics • Parser + IR + Program Merger for C • Maintains types, close ties to source • Highly structured, clean subset of C • Handles ANSI/GCC/MSVC

  3. Why CIL? • Analyses and Transformations • Easy to use • impersonates compiler & linker • $ make project CC=cil • Easy to work with • converts away tricky syntax • leaves just the heart of the language • separates concepts

  4. C Feature Separation • CIL separates language components • pure expressions • statements with side-effects • control-flow • embedded CFG • Keeps all programmer names • temps serialize side-effects • simplified scoping

  5. Example: C Lvalues • An exp referring to a region of storage • Example: rec[1].fld[2] • May involve 1, 2, 3 memory accesses • 1 if rec and fld are both arrays • 2 if either one is a pointer • 3 if rec and fld are both pointers • Syntax (AST) is insufficient
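
The access counts on this slide can be made concrete with a small sketch. The type and variable names below are hypothetical (they are not from the slides); the point is only that the expression text `rec[1].fld[2]` stays the same while the declarations decide how many loads are needed.

```c
#include <assert.h>

/* Hypothetical declarations for illustration only. */
struct arr_fld { int fld[3]; };   /* fld is an array   */
struct ptr_fld { int *fld; };     /* fld is a pointer  */

static int backing[3] = { 10, 20, 30 };

/* rec array, fld array: the address is a constant offset from rec,
   so only the final read touches memory (1 access). */
static struct arr_fld rec_aa[2] = { { {1, 2, 3} }, { {4, 5, 6} } };
int read_aa(void) { return rec_aa[1].fld[2]; }

/* rec pointer, fld array: load rec, then read the element (2 accesses). */
static struct arr_fld *rec_pa = rec_aa;
int read_pa(void) { return rec_pa[1].fld[2]; }

/* rec array, fld pointer: load the fld pointer, then read (2 accesses). */
static struct ptr_fld rec_ap[2] = { { backing }, { backing } };
int read_ap(void) { return rec_ap[1].fld[2]; }

/* rec pointer, fld pointer: load rec, load fld, then read (3 accesses). */
static struct ptr_fld *rec_pp = rec_ap;
int read_pp(void) { return rec_pp[1].fld[2]; }
```

All four functions contain the identical expression, which is why the slide concludes that syntax alone is insufficient: an analysis needs the types to know what the expression does.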

  6. CIL Lvalues • An exp referring to a region of storage
       lval   ::= <base × offset>
       base   ::= Var(varinfo) | Mem(exp)
       offset ::= None | Field(f × offset) | Index(exp × offset)

  7. CIL Lvalues • Example: rec[1].fld[2] becomes either:
       <Var(rec), Index(1, Field(fld, Index(2, None)))>
     or:
       <Mem(2 + Lvalue(<Mem(1 + Lvalue(<Var(rec), None>)), Field(fld, None)>)), None>
     • Full static and operational semantics
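
As a rough illustration of why this representation helps, the sketch below renders the lvalue grammar as C structs and counts memory accesses by walking the tree. CIL itself is written in OCaml; every type and function name here is hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C rendering of the lvalue grammar (not CIL's actual API). */
struct lval;

enum exp_kind { EXP_CONST, EXP_LVAL, EXP_ADD };
struct exp {
    enum exp_kind kind;
    int value;                    /* EXP_CONST */
    const struct lval *lv;        /* EXP_LVAL: read the contents of an lvalue */
    const struct exp *lhs, *rhs;  /* EXP_ADD */
};

enum offset_kind { OFF_NONE, OFF_FIELD, OFF_INDEX };
struct offset {
    enum offset_kind kind;
    const char *field;            /* OFF_FIELD */
    const struct exp *index;      /* OFF_INDEX */
    const struct offset *rest;    /* remaining offset; unused for OFF_NONE */
};

enum base_kind { BASE_VAR, BASE_MEM };
struct lval {
    enum base_kind kind;
    const char *var;              /* BASE_VAR */
    const struct exp *addr;       /* BASE_MEM */
    const struct offset *off;
};

static int reads_in_lval(const struct lval *lv);

/* Number of memory reads needed to evaluate an expression. */
static int reads_in_exp(const struct exp *e)
{
    switch (e->kind) {
    case EXP_CONST: return 0;
    case EXP_LVAL:  return 1 + reads_in_lval(e->lv);
    case EXP_ADD:   return reads_in_exp(e->lhs) + reads_in_exp(e->rhs);
    }
    return 0;
}

/* Number of memory reads needed to compute the address of an lvalue. */
static int reads_in_lval(const struct lval *lv)
{
    int n = 0;
    const struct offset *o;
    for (o = lv->off; o != NULL && o->kind != OFF_NONE; o = o->rest)
        if (o->kind == OFF_INDEX)
            n += reads_in_exp(o->index);
    if (lv->kind == BASE_MEM)
        n += reads_in_exp(lv->addr);
    return n;
}

/* Total accesses: the address computation plus the final read or write. */
int accesses(const struct lval *lv) { return 1 + reads_in_lval(lv); }

static const struct exp one = { .kind = EXP_CONST, .value = 1 };
static const struct exp two = { .kind = EXP_CONST, .value = 2 };
static const struct offset none = { .kind = OFF_NONE };

/* <Var(rec), Index(1, Field(fld, Index(2, None)))> */
static const struct offset idx2 = { .kind = OFF_INDEX, .index = &two, .rest = &none };
static const struct offset fld_idx2 = { .kind = OFF_FIELD, .field = "fld", .rest = &idx2 };
static const struct offset idx1 = { .kind = OFF_INDEX, .index = &one, .rest = &fld_idx2 };
const struct lval array_form = { .kind = BASE_VAR, .var = "rec", .off = &idx1 };

/* <Mem(2 + Lvalue(<Mem(1 + Lvalue(<Var(rec), None>)), Field(fld, None)>)), None> */
static const struct lval rec_var = { .kind = BASE_VAR, .var = "rec", .off = &none };
static const struct exp read_rec = { .kind = EXP_LVAL, .lv = &rec_var };
static const struct exp rec_plus_1 = { .kind = EXP_ADD, .lhs = &one, .rhs = &read_rec };
static const struct offset fld_only = { .kind = OFF_FIELD, .field = "fld", .rest = &none };
static const struct lval inner = { .kind = BASE_MEM, .addr = &rec_plus_1, .off = &fld_only };
static const struct exp read_fld = { .kind = EXP_LVAL, .lv = &inner };
static const struct exp fld_plus_2 = { .kind = EXP_ADD, .lhs = &two, .rhs = &read_fld };
const struct lval pointer_form = { .kind = BASE_MEM, .addr = &fld_plus_2, .off = &none };
```

With this encoding, `accesses(&array_form)` is 1 and `accesses(&pointer_form)` is 3, matching the array-only and pointer-only cases of `rec[1].fld[2]`. Each memory read is an explicit `Lvalue` node, so the count falls out of a plain tree walk.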

  8. Semantics • CIL gives syntax-directed semantics • Example judgment (formula not preserved in this transcript; its annotations label the environment, the lvalue form, and the meaning)

  9. CIL Lvalue Semantics

  10. CIL Source Fidelity
      Original source:
        typedef struct { int fld[3]; } * Myptr;
        Myptr rec;
        rec[2].fld[1] = 'h';
      CIL output:
        struct __anonstruct1 { int fld[3] ; };
        typedef struct __anonstruct1 * Myptr;
        Myptr rec;
        (rec + 2)->fld[1] = (int)'h';
      SUIF 2.2.0-4 output:
        typedef int __ar_1[3];
        struct type_1 { __ar_1 fld; };
        struct type_1 * rec;
        (((((int *)(((char *)&((((struct type_1 *) (rec))))[2])+0U))))[1]) =(104);
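
A small check that the three assignment forms on this slide store the same value to the same place. This is a sketch under two assumptions: the struct is given a name (`struct S`, hypothetical) so that test storage can be allocated, and the character set is ASCII, where `'h'` is 104. The SUIF-style statement is a hand-simplified version with redundant parentheses removed.

```c
#include <assert.h>

/* Named here only so the example can allocate storage; the slide's
   original struct is anonymous. */
struct S { int fld[3]; };
typedef struct S *Myptr;

/* Form written by the programmer. */
void store_source(Myptr rec) { rec[2].fld[1] = 'h'; }

/* Form shown as CIL output. A character constant already has type int
   in C, so the cast changes nothing. */
void store_cil(Myptr rec) { (rec + 2)->fld[1] = (int)'h'; }

/* Simplified SUIF-style form: byte arithmetic on the address of rec[2],
   with fld at byte offset 0. */
void store_suif(Myptr rec)
{
    ((int *)((char *)&((struct S *)rec)[2] + 0U))[1] = 104;
}

/* Returns 1 when all three forms wrote the same value to element [2].fld[1]. */
int stores_agree(void)
{
    static struct S a[3], b[3], c[3];
    store_source(a);
    store_cil(b);
    store_suif(c);
    return a[2].fld[1] == 'h'
        && b[2].fld[1] == a[2].fld[1]
        && c[2].fld[1] == a[2].fld[1];
}
```

The behavior is identical in all three; the difference is how much of the programmer's structure (field names, indexing, the character literal) survives in the output.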

  11. Corner Cases • Your analysis will not have to handle: • return ({goto L; p;}) && ({L: 5;}); • return &(--x ? : z) - & (x++, x); • Full handling of • GNU-isms, MSVC-isms • attributes • initializers

  12. Corner Cases • Your analysis will not have to handle:
        return ({goto L; p;}) && ({L: 5;});
      becomes:
        int tmp;
        goto L;
        if (p) { L: tmp = 1; } else { tmp = 0; }
        return tmp;
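
The simplified form is ordinary C and can be compiled on its own. The sketch below wraps it in a function (the name `corner_case` is hypothetical) to show what the rewritten control flow computes.

```c
#include <assert.h>

/* Simplified form of: return ({goto L; p;}) && ({L: 5;}); */
int corner_case(int p)
{
    int tmp;
    goto L;              /* the jump bypasses the test of p */
    if (p) {
    L:
        tmp = 1;         /* right operand 5 is nonzero, so && yields 1 */
    } else {
        tmp = 0;
    }
    return tmp;
}
```

Because the `goto` lands directly on the label inside the true branch, the function returns 1 regardless of `p`. An analysis that consumes the simplified form sees only a temporary, a `goto`, and an `if`, with no statement expressions or short-circuit operators.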

  13. StackGuard Transform • Cowan et al., USENIX ’98 • Buffer overrun defense • push return address on private stack • pop before returning • only change functions with local arrays • 40 lines of commented code with CIL • Quite easy: uses visitors for tree replacement, explicit returns, etc.
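
To give a feel for the result of such a transform, here is a hand-written sketch of what an instrumented function might look like. It is not CIL output. The helper names and the stack size are hypothetical, the comparison on pop is this sketch's assumption about how a mismatch would be detected, and `__builtin_return_address` is a GCC/Clang builtin rather than standard C.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical private stack of return addresses. */
#define SHADOW_MAX 256
static void *shadow_stack[SHADOW_MAX];
static int shadow_top = 0;

static void shadow_push(void *ret)
{
    if (shadow_top >= SHADOW_MAX) abort();
    shadow_stack[shadow_top++] = ret;
}

/* Pop the saved address and abort if it no longer matches. */
static void shadow_check_pop(void *ret)
{
    if (shadow_top <= 0 || shadow_stack[--shadow_top] != ret) abort();
}

int shadow_depth(void) { return shadow_top; }

/* A function with a local array, which is the only kind the transform changes. */
int copy_len(const char *src)
{
    char buf[16];
    int result;

    shadow_push(__builtin_return_address(0));       /* inserted at entry */

    strncpy(buf, src, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    result = (int)strlen(buf);

    shadow_check_pop(__builtin_return_address(0));  /* inserted before the return */
    return result;
}
```

Having every return explicit and in one normalized form is what makes the "insert before each return" step simple to express as a tree rewrite.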

  14. Other Transforms • Instrument and log all calls: 150 lines • Eliminate break, continue, switch: 110 lines • 1 memory access per assignment: 100 lines • Make each function have a single return statement: 90 lines • Make all stack arrays heap-allocated: 75 lines • Log all value/addr memory writes: 45 lines
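
As one example from this list, a single-return transform could turn the first function below into the second. Both are hand-written for illustration and are not actual CIL output; the function names and the `done` label are hypothetical.

```c
#include <assert.h>

/* Before: three return statements. */
int sign_before(int x)
{
    if (x < 0) return -1;
    if (x == 0) return 0;
    return 1;
}

/* After: each return becomes an assignment to a result temporary
   followed by a jump to one shared exit point. */
int sign_after(int x)
{
    int result;
    if (x < 0) { result = -1; goto done; }
    if (x == 0) { result = 0; goto done; }
    result = 1;
done:
    return result;
}
```

A later pass that needs to run code at function exit then has exactly one place to insert it.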

  15. Whole-Program Merger • C has incremental linking, compilation • coupled with a weak module system! • Example (vortex / gcc / c++2c):
        /* foo.c */
        struct list { int head; struct list * tail; };
        struct list * mylist;
        /* bar.c */
        struct chain { int head; struct chain * tail; };
        extern struct chain * mylist;

  16. Merging a Project • Determine what files to merge • Merge the files • handle file-scoped identifiers • C uses name equivalence for types • but modules need structural equivalence • Key: Each global identifier has 1 type!
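
For the `struct list` / `struct chain` example, the sketch below shows one plausible shape of a merged file. It assumes the merger picks `struct list` as the representative and rewrites uses of `struct chain` to it; the function `chain_sum` is a hypothetical stand-in for code that originally lived in bar.c.

```c
#include <assert.h>
#include <stddef.h>

/* As declared in foo.c. */
struct list  { int head; struct list  *tail; };
/* As declared in bar.c; kept here only to check that the shapes agree. */
struct chain { int head; struct chain *tail; };

/* The two types have different names but the same structure. */
_Static_assert(sizeof(struct list) == sizeof(struct chain), "same size");
_Static_assert(offsetof(struct list, tail) == offsetof(struct chain, tail),
               "same field layout");

/* After merging: one global identifier with one type. */
struct list *mylist;

/* Code from bar.c, with struct chain rewritten to the representative. */
int chain_sum(void)
{
    int s = 0;
    const struct list *c;
    for (c = mylist; c != NULL; c = c->tail)
        s += c->head;
    return s;
}
```

Separate compilation lets the linker accept the mismatch silently, because it sees only the symbol `mylist`. Once both files share one translation unit, the compiler's name-equivalence rule would reject it unless the merger first unifies the structurally equal types.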

  17. Other Merger Details • Remove duplicate declarations • every file includes <stdio.h> • Match struct pointer with no defined body in file A to defined body in file B • Be careful when picking representatives

  18. How Does it Work? • Make project, pass all files through CIL • Run your transform and analysis • Emit simplified C • Compile simplified C with GCC/MSVC • … and it works!

  19. Large Programs • Used in the CCured and BLAST projects

  20. Merged Kernel Stats • Stock monolithic Linux 2.4.5 kernel • http://manju.cs.berkeley.edu/cil/vmlinux.c • Statistics:
        Before                    | After
        324 files                 | One 12.5MB file
        11.3 M-words              | 1.5 M-words
        7.3 M-LOC (post-process)  | 470 K-LOC
      • $ make CC="cil --merge" HOSTCC="cil --merge" LD="cil --merge" AR="cil --mode=AR --merge"

  21. Conclusion • CIL distills C to a precise, simple subset • easy to analyze • well-defined semantics • close to the original source • Well-suited to complex analyses and source-to-source transforms • Parses ANSI/GCC/MSVC C • Rapidly merges large programs

  22. Questions? • Try CIL out: • http://www.cs.berkeley.edu/~necula/cil • Complete source, documentation and test cases freely available
