

CCured: Type-Safe Retrofitting of Legacy Code

George Necula, Scott McPeak, Wes Weimer

Presented by Anastasia Braginsky

Some slides were taken from George Necula's presentation:

http://www.slidefinder.net/c/ccured_taming_pointers_george_necula/6827275

Problem
  • C is popular; it is part of the infrastructure
  • C is also unsafe and has a weak type system that can cause subtle bugs
Solution
  • Add type safety to C – make C “feel” as safe as Java
  • Catch memory safety errors by static analysis as much as possible
  • Add as few run-time checks to C programs as possible (for performance)
  • Require minimal user effort
  • Add type inference to C
The CCured System

[Figure: C Program → CCured Translator → Instrumented C Program → Compile & Execute → either Success or Halt: Memory Safety Violation]

Two Main Premises
  • In C, a large part of a program can usually be verified statically to be type safe
    • The remaining part can be instrumented with run-time checks to ensure that the execution is memory safe
  • In many applications, some loss of performance due to run-time checks is an acceptable price for type safety
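Example C Program
  • Boxed integer: a word made of a 31-bit payload (an integer or a pointer) followed by a 1-bit tag
  • Un-boxing: following pointers until a word tagged as an integer is reached, then stripping the tag
  • The C type int* is used to represent a boxed integer

[Figure: three example words, each a 31-bit payload followed by its tag bit: 0011…11101001 0, 0001…11000101 1, 0101…10101110 0]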

Example C Program

    int **a;   // array
    int i;     // index
    int acc;   // accumulator
    int **p;   // element ptr
    int *e;    // unboxer
    acc = 0;
    for (i = 0; i < 100; i++) {
        p = a + i;                   // ptr arithmetic
        e = *p;                      // read element
        while ((int)e % 2 == 0) {    // check tag
            e = *(int **)e;          // unbox
        }
        acc += ((int)e >> 1);        // strip tag
    }

[Figure: pointer kinds in the example. a is a SEQuence pointer into the array of boxed words, p is a SAFE pointer to one element, and e is a DYNamic pointer.]
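
The boxed-integer encoding used by the example can be tried out with a small, self-contained C sketch. It is only an illustration under stated assumptions: uintptr_t stands in for the slide's int* so that the casts are well defined on 64-bit machines, only non-negative integers are boxed, and the helper names (boxed, box_int, box_ptr, unbox, sum_boxed) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A boxed word: the lowest bit is the tag.
   Tag 1: the upper bits hold an integer.
   Tag 0: the word is a pointer to another boxed word.
   (Assumption: uintptr_t replaces the slide's int*.) */
typedef uintptr_t boxed;

/* Box a non-negative integer: shift it left and set the tag bit. */
boxed box_int(uintptr_t n) {
    return (n << 1) | 1u;
}

/* Box a pointer: a suitably aligned address already has tag bit 0. */
boxed box_ptr(const boxed *cell) {
    assert(((uintptr_t)cell & 1u) == 0);
    return (uintptr_t)cell;
}

/* Follow pointers until an integer is reached, then strip the tag.
   Nothing here checks that a pointer word is valid; this is the kind
   of access that CCured guards with run-time checks. */
uintptr_t unbox(boxed b) {
    while ((b & 1u) == 0) {
        b = *(const boxed *)b;
    }
    return b >> 1;
}

/* Sum n boxed integers, mirroring the loop of the example program. */
uintptr_t sum_boxed(const boxed *a, size_t n) {
    uintptr_t acc = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        acc += unbox(a[i]);
    }
    return acc;
}
```

An array element may be an integer, a pointer to an integer, or a pointer to a pointer; unbox handles all three because it keeps dereferencing while the tag bit is 0.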

Example C Program

(Same program as on the previous slide.)

But due to aliasing, a, p and e are all considered to point to DYNAMIC data!

SAFE Pointers
  • SAFE pointer to type t
    • Representation: ptr, pointing to a single value of type t
  • On use:
    • null check
  • Can do:
    • dereference

SEQuence Pointers
  • SEQ pointer to type t
    • Representation: base, ptr and end, where ptr points into a sequence of t values delimited by base and end
  • On use:
    • null check
    • bounds check
  • Can do:
    • dereference
    • pointer arithmetic

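DYNamic Pointers
  • DYN pointer
    • Representation: home and ptr; the home area records its length (len) and a tag for each word
  • On use:
    • null check
    • bounds check
    • tag check/update
  • Can do:
    • dereference
    • pointer arithmetic
    • arbitrary typecasts

[Figure: a home area holding the words DYN, DYN, int with tags 1, 1, 0]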
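
The checks listed for SAFE and SEQ pointers can be sketched in C as follows. This is a simplified model, not CCured's actual run-time library: all type and function names are hypothetical, and the SEQ position is stored as an element offset from base (equivalent to the base/ptr/end triple) so that the arithmetic never forms an out-of-range C pointer. DYN pointers, which additionally need tags, are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Result of a checked access. */
typedef enum { CHECK_OK, FAIL_NULL, FAIL_BOUNDS } check;

/* SAFE pointer to int: one word, exactly like a C pointer. */
typedef struct { int *ptr; } safe_ptr;

/* SEQ pointer to int: the array it belongs to plus a position in it.
   (Assumption: off/len replace the slide's ptr/end fields.) */
typedef struct { int *base; ptrdiff_t off; ptrdiff_t len; } seq_ptr;

/* SAFE dereference: only a null check is needed. */
check safe_read(safe_ptr p, int *out) {
    if (p.ptr == NULL) return FAIL_NULL;
    *out = *p.ptr;
    return CHECK_OK;
}

/* SEQ pointer arithmetic: unchecked, it only moves the position. */
seq_ptr seq_add(seq_ptr p, ptrdiff_t n) {
    p.off += n;
    return p;
}

/* SEQ dereference: null check, then bounds check. */
check seq_read(seq_ptr p, int *out) {
    if (p.base == NULL) return FAIL_NULL;
    if (p.off < 0 || p.off >= p.len) return FAIL_BOUNDS;
    *out = p.base[p.off];
    return CHECK_OK;
}
```

In this sketch the check is deferred to the dereference: a SEQ pointer may be moved outside its bounds freely, and only reading through it fails.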

A Formal Language
  • To simplify the presentation, the approach is described formally for a small language: CCured
  • How to extend the approach to handle the remaining C constructs is then described informally
The Syntax
  • Types: only integers or pointers
    • τ ::= int | τ ref SAFE | τ ref SEQ | DYNAMIC
    • τ ref: ML syntax of references
    • DYNAMIC: doesn’t carry the type of the pointed-to value
  • Expressions:
    • e ::= x | n | e1 op e2 | (τ)e | e1 ⊕ e2 | !e
    • n: integer literals
    • e1 op e2: an assortment of binary integer operations
    • (τ)e: casting
    • e1 ⊕ e2: pointer arithmetic
    • !e: like *e in C
  • Commands:
    • c ::= skip | c1; c2 | e1 := e2
    • e1 := e2: memory update through a pointer, like *e1 = e2 in C

Example C Program, translated to CCured

C program (pointer type constructors are numbered):

    int *1 *2 a;   // array
    int i;         // index
    int acc;       // accumulator
    int *3 *4 p;   // element ptr
    int *5 e;      // unboxer
    acc = 0;
    for (i = 0; i < 100; i++) {
        p = a + i;                    // ptr arithmetic
        e = *p;                       // read element
        while ((int)e % 2 == 0) {     // check tag
            e = *(int *6 *7)e;        // unbox
        }
        acc += ((int)e >> 1);         // strip tag
    }

CCured translation:

    DYNAMIC ref SEQ a;                // array
    int ref SAFE p_i;                 // index
    int ref SAFE p_acc;               // accumulator
    DYNAMIC ref SAFE ref SAFE p_p;    // element ptr
    DYNAMIC ref SAFE p_e;             // unboxer
    p_acc := 0;
    for (p_i := 0; !p_i < 100; p_i := !p_i + 1) {
        p_p := (DYNAMIC ref SAFE)(a ⊕ !p_i);    // ptr arith
        p_e := !!p_p;                           // read element
        while ((int)!p_e % 2 == 0) {            // check tag
            p_e := !!p_e;                       // unbox
        }
        p_acc := !p_acc + ((int)!p_e >> 1);     // strip tag
    }

Inferred kinds: a is a sequence pointer to DYN, p is a safe pointer to DYN, and e is dynamic.

The CCured Type System
  • The purpose is to maintain the separation between the statically typed and the untyped worlds
  • For the type system presented here, assume that the program contains complete pointer kind information
  • A type environment provides the type of every variable name
  • The type system assigns types to expressions and commands using derivation rules
The derivation rules: convertibility
  • “a ≤ b” – it is possible to convert type a to type b
  • τ ≤ τ (reflexivity)
  • τ ≤ int (reading addresses)
  • int ≤ τ ref SEQ (pointer arithmetic)
  • int ≤ DYN
    • dereferences are prevented by run-time checks; the pointer has lost its capability to perform memory operations
  • τ ref SEQ ≤ τ ref SAFE
    • the referenced type can’t change; bounds are verified by run-time checks

The derivation rules: expressions
  • “x : τ” – expression x has type τ
  • (τ ref SAFE) 0 : τ ref SAFE (creating a safe null pointer)
  • IF e : τ ref SAFE THEN !e : τ
  • IF e : DYN THEN !e : DYN
    • memory operations only for safe and dynamic pointers
  • IF (e : τ’ AND τ’ ≤ τ) THEN (τ)e : τ (casting rule)
  • IF (e1 : int AND e2 : int) THEN e1 op e2 : int (binary integer operations)
  • IF (e1 : τ ref SEQ AND e2 : int) THEN e1 ⊕ e2 : τ ref SEQ
  • IF (e1 : DYN AND e2 : int) THEN e1 ⊕ e2 : DYN
    • pointer arithmetic only for sequence and dynamic pointers

The derivation rules: commands
  • IF (e1 : τ ref SAFE AND e2 : τ) THEN e1 := e2 is well typed
  • IF (e1 : DYN AND e2 : DYN) THEN e1 := e2 is well typed
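
The convertibility relation “a ≤ b” that the casting rule relies on can be written down directly as a C predicate. The sketch below is an illustration only: the type representation and the names kind, type, same_type and convertible are hypothetical, and it applies each listed rule as a single step without closing the relation under transitivity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { T_INT, T_REF_SAFE, T_REF_SEQ, T_DYNAMIC } kind;

/* A type: int, DYNAMIC, or a SAFE/SEQ reference to an element type. */
typedef struct type {
    kind k;
    const struct type *elem;   /* used only by T_REF_SAFE and T_REF_SEQ */
} type;

/* Structural equality of two types. */
bool same_type(const type *a, const type *b) {
    if (a->k != b->k) return false;
    if (a->k == T_INT || a->k == T_DYNAMIC) return true;
    return same_type(a->elem, b->elem);
}

/* a <= b: can a value of type a be cast to type b? */
bool convertible(const type *a, const type *b) {
    if (same_type(a, b)) return true;                      /* reflexivity */
    if (b->k == T_INT) return true;                        /* reading addresses */
    if (a->k == T_INT && b->k == T_REF_SEQ) return true;   /* int <= t ref SEQ */
    if (a->k == T_INT && b->k == T_DYNAMIC) return true;   /* int <= DYN */
    if (a->k == T_REF_SEQ && b->k == T_REF_SAFE) {
        return same_type(a->elem, b->elem);                /* same referenced type */
    }
    return false;
}
```

Note the asymmetry: a sequence pointer can become a safe pointer to the same referenced type, but not the other way round, and an integer cannot be cast directly to a safe pointer.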
Homes
  • H is the set of allocated memory areas (which are called homes)
  • A home is represented by its starting address and its size
  • All homes are disjoint
  • A special null home: 0 ∈ H, with size(0) = 1
  • Safe pointers and integers have no representation overhead over C
  • Sequence and dynamic pointers carry their home with them

[Figure: memory containing two disjoint homes, one starting at h1 and the other at h2]

Casts
  • Any integer with value n can be cast to a sequence or dynamic pointer with value n and the null home
    • No further memory operations are possible through it
  • Any sequence or dynamic pointer with value n and a home with starting address h can be cast to an integer with value n+h
  • Any dynamic pointer can be cast to a different dynamic pointer with the same value and home
    • No dynamic ↔ sequence casts, since the type system does not allow them
  • Any sequence pointer with value n and a home with starting address h can be cast to a safe pointer with value n+h
    • Only if 0 ≤ n < size(home) (run-time check)
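
The cast rules can be traced with a small C model. This is only a sketch under stated assumptions: addresses are modelled as plain integers rather than real pointers, a single fat_ptr type stands for both sequence and dynamic pointers, and the names home, fat_ptr, int_to_fat, fat_to_int and seq_to_safe are hypothetical rather than CCured's run-time API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A home is an allocated area: starting address and size. */
typedef struct { uintptr_t start; uintptr_t size; } home;

/* The null home starts at address 0 and has size 1. */
static const home NULL_HOME = { 0, 1 };

/* Sequence or dynamic pointer: a home plus a value n relative to it. */
typedef struct { home h; intptr_t n; } fat_ptr;

/* int -> SEQ/DYN: keep the value, attach the null home. */
fat_ptr int_to_fat(intptr_t n) {
    fat_ptr p;
    p.h = NULL_HOME;
    p.n = n;
    return p;
}

/* SEQ/DYN -> int: the integer is n + h. */
intptr_t fat_to_int(fat_ptr p) {
    return (intptr_t)p.h.start + p.n;
}

/* SEQ -> SAFE: allowed only if 0 <= n < size(home).
   Returns false when the run-time check fails. */
bool seq_to_safe(fat_ptr p, uintptr_t *safe_out) {
    if (p.n < 0 || (uintptr_t)p.n >= p.h.size) return false;
    *safe_out = p.h.start + (uintptr_t)p.n;
    return true;
}
```

Under this model an integer survives a round trip through a pointer type unchanged, but the resulting pointer carries the null home. Casting it on to a safe pointer passes the bounds check only for n = 0, because size(0) = 1, and the result is then the null pointer.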
Run-time checks
  • A null-pointer check for every memory operation that uses a safe pointer
  • Bounds checks on memory accesses
  • A non-pointer check (null home) for sequence and dynamic pointers
    • Programs that cast pointers to integers and then back to pointers will not be able to use the resulting pointers as memory addresses
Well-typed CCured programs
  • Can fail
    • Due to a failed run-time check
  • Cannot fail
    • Due to unexpected types
    • Due to an attempt to access an invalid memory location
Theorem I (Progress and type preservation)
  • IF
    • e : τ (for a valid type τ)
    • AND the contents of each memory address correspond to the typing constraints of the home to which it belongs
  • THEN
    • EITHER one of the run-time checks fails during the evaluation of the expression e
    • OR ELSE e evaluates to a value v AND v is a valid value of type τ
Theorem II (Progress for commands)
  • For any command c which is built from valid types
  • IF
    • The contents of each memory address correspond to the typing constraints of the home to which it belongs
  • THEN
    • EITHER the execution of the command fails due to a run-time check
    • OR ELSE the command succeeds, and the contents of each memory address still correspond to the typing constraints of the home to which it belongs
Type inference algorithm
  • Given a C program, translate the pointer types to make the program well-typed in the CCured type system
  • The C program already uses types of the form “τ ref”. What remains is to discover whether each one should be safe, sequence or dynamic.
  • Write each such type as τ ref q
    • where q is a qualifier ranging over the set {SAFE, SEQ, DYN}
  • The overall strategy is to find as many SAFE and SEQ pointers as possible
Algorithm overview
  • Introduce a qualifier variable for each syntactic occurrence of the pointer type constructor in the C program
  • Scan the program and collect a set of constraints C on these qualifier variables
  • Solve the system of constraints to produce a substitution S of qualifier variables with qualifier values
    • S(int) = int
    • S(τ ref q) = DYNAMIC if S(q) = DYN
    • S(τ ref q) = S(τ) ref S(q) otherwise
  • Apply the substitution to the types of the C program to produce a CCured program
Constraint Generation Rules
  • Convertibility
    • int ≤ τ ref q adds {q ≠ SAFE} to C
    • τ1 ref q1 ≤ τ2 ref q2 adds {q1 ← q2} and {q1 = q2 = DYN OR τ1 = τ2 = int} to C
      • q1 ← q2 means: either q1 is SEQ and q2 is SAFE (SEQ can be cast to SAFE), or the qualifiers are equal
  • Expressions and commands
    • If e1 : τ ref q and e2 : int then e1 ⊕ e2 : τ ref q, which adds {q ≠ SAFE} to C (pointer arithmetic)

Constraint Collection
  • Additional rules to bridge the gap between C and CCured
    • Allow memory access through SEQ (not just SAFE) pointers
    • Allow ints to be read or written through DYNAMIC pointers
      • In both cases this is an implicit cast, with no run-time checks
  • In a memory write, allow the value being written to be converted to the referenced type
  • For each type of the form τ ref q’ ref q, collect a constraint q = DYN => q’ = DYN
Final Set of Constraints
  • ARITH: q ≠ SAFE
  • CONV: q ← q’
  • POINTSTO: q = DYN => q’ = DYN
  • ISDYN: q = DYN
  • EQ: q = q’

Constraint Solving
  • Propagate the ISDYN constraints using the constraints EQ, CONV, and POINTSTO
  • Set all qualifier variables involved in ARITH constraints to SEQ, and propagate this information using the constraints EQ and CONV
  • Make all the other variables SAFE

The whole type inference process is linear in the size of the program!
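
The three solving steps can be sketched in C as a naive fixpoint iteration. This is an illustration under stated assumptions, not the linear-time algorithm the slide refers to: all names are hypothetical, SEQ is propagated across CONV only from the target of a conversion back to its source (one reading of “SEQ can be cast to SAFE”), and the constraints in solve_example were written by hand for the boxed-integer example, with variables 1 to 7 matching the numbered pointer constructors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SAFE, SEQ, DYN } qual;   /* ordered: SAFE < SEQ < DYN */
typedef enum { ARITH, CONV, POINTSTO, ISDYN, EQ } ckind;

/* A constraint over qualifier variables, named by index.
   ARITH and ISDYN use only q. CONV is q <- q2, POINTSTO is
   q = DYN => q2 = DYN, and EQ is q = q2. */
typedef struct { ckind kind; int q; int q2; } constraint;

/* Raise *slot to v if v is stronger; report whether it changed. */
static bool bump(qual *slot, qual v) {
    if (*slot < v) { *slot = v; return true; }
    return false;
}

void solve(qual *s, size_t nvars, const constraint *c, size_t ncons) {
    bool changed;
    size_t i;
    for (i = 0; i < nvars; i++) s[i] = SAFE;

    do {   /* step 1: propagate DYN through EQ, CONV and POINTSTO */
        changed = false;
        for (i = 0; i < ncons; i++) {
            qual *a = &s[c[i].q], *b = &s[c[i].q2];
            switch (c[i].kind) {
            case ISDYN:
                if (bump(a, DYN)) changed = true;
                break;
            case EQ:
            case CONV:
                if (*a == DYN && bump(b, DYN)) changed = true;
                if (*b == DYN && bump(a, DYN)) changed = true;
                break;
            case POINTSTO:
                if (*a == DYN && bump(b, DYN)) changed = true;
                break;
            default:
                break;
            }
        }
    } while (changed);

    do {   /* step 2: ARITH variables become SEQ; propagate SEQ */
        changed = false;
        for (i = 0; i < ncons; i++) {
            qual *a = &s[c[i].q], *b = &s[c[i].q2];
            switch (c[i].kind) {
            case ARITH:
                if (bump(a, SEQ)) changed = true;
                break;
            case EQ:
                if (*a == SEQ && bump(b, SEQ)) changed = true;
                if (*b == SEQ && bump(a, SEQ)) changed = true;
                break;
            case CONV:   /* a SEQ target forces a SEQ source */
                if (*b == SEQ && bump(a, SEQ)) changed = true;
                break;
            default:
                break;
            }
        }
    } while (changed);
    /* step 3: every variable not raised above stays SAFE */
}

/* Hand-written constraints for the example (index 0 is unused). */
void solve_example(qual s[8]) {
    static const constraint c[] = {
        { ARITH, 2, 0 },     /* a + i */
        { CONV, 2, 4 },      /* p = a + i */
        { EQ, 1, 3 },        /* a and p share their element type */
        { EQ, 3, 5 },        /* e = *p */
        { ISDYN, 5, 0 },     /* cast of e between different base types */
        { ISDYN, 7, 0 },
        { CONV, 5, 7 },
        { POINTSTO, 7, 6 },
        { EQ, 6, 5 },        /* e = *(cast)e */
        { POINTSTO, 2, 1 },
        { POINTSTO, 4, 3 }
    };
    solve(s, 8, c, sizeof c / sizeof c[0]);
}
```

For these constraints the sketch yields SEQ for the outer qualifier of a, SAFE for the outer qualifier of p, and DYN for everything else, which agrees with the translated declarations DYNAMIC ref SEQ for a and DYNAMIC ref SAFE for p.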

Handling the rest of C
  • In the DYNAMIC world, structures and arrays are simply alternative notations for saying how many bytes of storage to allocate
  • Explicit de-allocation is ignored (a garbage collector is used)
  • The address-of operator in C can yield a pointer to a stack-allocated variable, so an additional run-time check ensures that a stack pointer is not copied to the heap or to globals
  • DYNAMIC function pointers and variable-argument functions are handled by passing a hidden argument which specifies the types of all arguments passed (checked by the callee)…
Source Changes
  • There are still a few cases in which a legal program will stop with a failed run-time check, so some manual intervention is still necessary
    • Pointer cast to integer and then back to pointer → make it all void*
    • Some programs attempt to store pointers to stack variables in memory → allocate those variables on the heap
    • Calling functions in libraries that were not compiled with CCured → write a wrapper function
Bugs Found
  • ks passes FILE* to printf, not char*
  • compress, ijpeg: array bound violations
  • go: 8 array bound violations
  • go: 1 uninitialized variable used as an array index
  • Many involve multi-dimensional arrays
  • Purify found only the go uninitialized-variable bug
  • ftpd buffer overrun bug
Conclusions
  • C is a popular and useful programming language, but it needs type safety
  • Even in C programs, most pointers can be verified to be type safe; the rest can be checked at run time
  • This work makes it possible to infer simply and accurately which pointers need to be checked at run time
  • Since the majority of pointers are safe, the overheads are smaller than those of comparable tools
  • The presented type system is formally defined and proved

Questions?

Thank you!