Principles of Program Analysis for Effective Software Development

Program Analysis. Mooly Sagiv, http://www.math.tau.ac.il/~sagiv/courses/pa.html, Tel Aviv University, 640-6706. Sunday 18-21, Schreiber 8; Monday 10-12, Schreiber 317. Textbook: Principles of Program Analysis, Chapter 1.5-8 (modified)

Outline • Analyzing Incomplete Programs • Abstract Interpretation • Type (and Effect) Systems • Transformations • Conclusions

The Abstract Interpretation Technique • The foundation of program analysis • Goals • Establish soundness of (find faults in) a given program analysis algorithm • Design new program analysis algorithms • The main ideas: • Relate each step in the algorithm to a step in a structural semantics • Establish global correctness using a general theorem • Not limited to a particular form of analysis

Soundness in Reaching Definitions • Every reachable definition is detected • May include more definitions • Fewer constants may be identified • Not all loop-invariant code will be identified • May warn about uninitialized variables that are in fact initialized • At every elementary block $l$, $RD_{entry}(l)$ includes all the definitions that may reach $l$ • At every elementary block $l$, $RD_{entry}(l)$ “represents” all the possible concrete states arising when the structural operational semantics reaches $l$

Proof of Soundness • Define an “appropriate” structural operational semantics • Define a “collecting” structural operational semantics • Establish a Galois connection between collecting states and reaching definitions • (Local correctness) Show that the abstract interpretation of every atomic statement is sound w.r.t. the collecting semantics • (Global correctness) Conclude that the analysis is sound

Structural Operational Semantics to justify Reaching Definitions
• Normal states $[\mathrm{Var}_* \to \mathbb{Z}]$ are not enough
• Instrumented states: $[\mathrm{Var}_* \to \mathbb{Z}] \times [\mathrm{Var}_* \to \mathrm{Lab}_*]$
• For an instrumented state $(s, \mathit{def})$ and a variable $x$, $\mathit{def}(x)$ holds the last definition of $x$

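Instrumented Structural Semantics for While
Axioms:
[ass_sos] $\langle [x := a]^{l}, (s, d) \rangle \to (s[x \mapsto \mathcal{A}[\![a]\!]s],\ d[x \mapsto l])$
[skip_sos] $\langle [\mathtt{skip}]^{l}, (s, d) \rangle \to (s, d)$
Rules:
[comp1_sos] $\dfrac{\langle S_1, (s, d) \rangle \to \langle S_1', (s', d') \rangle}{\langle S_1; S_2, (s, d) \rangle \to \langle S_1'; S_2, (s', d') \rangle}$
[comp2_sos] $\dfrac{\langle S_1, (s, d) \rangle \to (s', d')}{\langle S_1; S_2, (s, d) \rangle \to \langle S_2, (s', d') \rangle}$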

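Instrumented Structural Semantics: if construct
[if^tt_sos] $\langle \mathtt{if}\ [b]^{l}\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2, (s, d) \rangle \to \langle S_1, (s, d) \rangle$ if $\mathcal{B}[\![b]\!]s = \mathit{tt}$
[if^ff_sos] $\langle \mathtt{if}\ [b]^{l}\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2, (s, d) \rangle \to \langle S_2, (s, d) \rangle$ if $\mathcal{B}[\![b]\!]s = \mathit{ff}$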

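Instrumented Structural Semantics: while construct
[while_sos] $\langle \mathtt{while}\ [b]^{l}\ \mathtt{do}\ S, (s, d) \rangle \to \langle \mathtt{if}\ [b]^{l}\ \mathtt{then}\ (S;\ \mathtt{while}\ [b]^{l}\ \mathtt{do}\ S)\ \mathtt{else}\ \mathtt{skip}, (s, d) \rangle$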

The Factorial Program
$[y := x]^{1};\ [z := 1]^{2};\ \mathtt{while}\ [y > 1]^{3}\ \mathtt{do}\ ([z := z * y]^{4};\ [y := y - 1]^{5});\ [y := 0]^{6}$
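The instrumented transition rules can be sketched as a small-step interpreter and run on the factorial program. This is only an illustration under assumptions that are not in the slides: statements are encoded as Python tuples, expressions are Python functions over the store, and the names `step`, `run`, and `FACTORIAL` are hypothetical.

```python
# Hypothetical tuple encoding of While statements (not from the slides):
#   ("ass", x, a, l)      [x := a]^l, with a a function from store to int
#   ("skip", l)           [skip]^l
#   ("seq", S1, S2)       S1; S2
#   ("if", b, l, S1, S2)  if [b]^l then S1 else S2
#   ("while", b, l, S)    while [b]^l do S

def step(stmt, s, d):
    """One transition on an instrumented state (s, d).

    Returns (rest, s', d'); rest is None when the configuration is terminal.
    """
    kind = stmt[0]
    if kind == "ass":                      # [ass_sos]: update store and last definition
        _, x, a, l = stmt
        return None, {**s, x: a(s)}, {**d, x: l}
    if kind == "skip":                     # [skip_sos]
        return None, s, d
    if kind == "seq":                      # [comp1_sos] and [comp2_sos]
        _, first, second = stmt
        rest, s2, d2 = step(first, s, d)
        if rest is None:
            return second, s2, d2
        return ("seq", rest, second), s2, d2
    if kind == "if":                       # [if^tt_sos] and [if^ff_sos]
        _, b, _l, then_branch, else_branch = stmt
        return (then_branch if b(s) else else_branch), s, d
    if kind == "while":                    # [while_sos]: unfold into an if
        _, b, l, body = stmt
        return ("if", b, l, ("seq", body, stmt), ("skip", None)), s, d
    raise ValueError(f"unknown statement kind: {kind}")

def run(stmt, s, d):
    """Iterate step until a terminal configuration is reached."""
    while stmt is not None:
        stmt, s, d = step(stmt, s, d)
    return s, d

FACTORIAL = (
    "seq", ("ass", "y", lambda s: s["x"], 1),
    ("seq", ("ass", "z", lambda s: 1, 2),
     ("seq", ("while", lambda s: s["y"] > 1, 3,
              ("seq", ("ass", "z", lambda s: s["z"] * s["y"], 4),
                      ("ass", "y", lambda s: s["y"] - 1, 5))),
             ("ass", "y", lambda s: 0, 6))))

if __name__ == "__main__":
    unknown = {"x": "?", "y": "?", "z": "?"}
    print(run(FACTORIAL, {"x": 4, "y": 0, "z": 0}, unknown))
```

The second component of the result records, for each variable, the label of the assignment that last defined it, with "?" for variables never assigned.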

Code Instrumentation • Alternative instrumentation • Generate an equivalent program which maintains more information • Use standard structural operational semantics
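As a rough illustration of code instrumentation, the factorial program from these slides can be rewritten by hand into an equivalent program that also maintains the last-definition information. The function name `factorial_instrumented` and the map `d` are hypothetical; a real tool would generate the extra updates automatically.

```python
def factorial_instrumented(x):
    """Factorial program with hand-inserted bookkeeping of last definitions.

    d maps each variable to the label of the assignment that last defined it;
    "?" means the variable has not been assigned by the program.
    """
    d = {"x": "?", "y": "?", "z": "?"}
    y = x
    d["y"] = 1          # label 1: y := x
    z = 1
    d["z"] = 2          # label 2: z := 1
    while y > 1:        # label 3: the test defines nothing
        z = z * y
        d["z"] = 4      # label 4: z := z * y
        y = y - 1
        d["y"] = 5      # label 5: y := y - 1
    y = 0
    d["y"] = 6          # label 6: y := 0
    return z, d

if __name__ == "__main__":
    print(factorial_instrumented(4))
```

The instrumented program runs under the standard semantics; the extra information lives in ordinary program variables rather than in a modified interpreter.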

Other Consumers of Instrumentation • Specialized interpreters • Code instrumentation • Performance analysis: qpt counts the number of executions of basic blocks or the number of calls to a function • Profiling tools: used to find “hot” paths (paths that are executed often) by remembering which edges in the control flow graph were executed • Cleanness tools: Purify (identifies uninitialized objects at run time) and SafeC

Collecting (Instrumented) Semantics • The input state is not known at compile-time • “Collect” all the (instrumented) states for all possible inputs to the program • No loss of precision

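Flow Information for While
• Associate labels with program statements describing when statements begin and end
• $\mathit{init} : \mathrm{Stm} \to \mathrm{Lab}_*$
• $\mathit{init}([x := a]^{l}) = l$
• $\mathit{init}([\mathtt{skip}]^{l}) = l$
• $\mathit{init}(S_1; S_2) = \mathit{init}(S_1)$
• $\mathit{init}(\mathtt{if}\ [b]^{l}\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2) = l$
• $\mathit{init}(\mathtt{while}\ [b]^{l}\ \mathtt{do}\ S) = l$
• $\mathit{final} : \mathrm{Stm} \to \mathcal{P}(\mathrm{Lab}_*)$
• $\mathit{final}([x := a]^{l}) = \{l\}$
• $\mathit{final}([\mathtt{skip}]^{l}) = \{l\}$
• $\mathit{final}(S_1; S_2) = \mathit{final}(S_2)$
• $\mathit{final}(\mathtt{if}\ [b]^{l}\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2) = \mathit{final}(S_1) \cup \mathit{final}(S_2)$
• $\mathit{final}(\mathtt{while}\ [b]^{l}\ \mathtt{do}\ S) = \{l\}$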
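The two label functions translate almost line by line into recursive functions. The sketch below assumes a hypothetical tuple encoding of statements, in which expressions are kept as plain strings because they play no role here.

```python
# Hypothetical encoding (not from the slides):
#   ("ass", x, a, l), ("skip", l), ("seq", S1, S2),
#   ("if", b, l, S1, S2), ("while", b, l, S)

def init(stmt):
    """Label at which execution of stmt begins."""
    kind = stmt[0]
    if kind in ("ass", "skip"):
        return stmt[-1]
    if kind == "seq":
        return init(stmt[1])
    if kind in ("if", "while"):
        return stmt[2]
    raise ValueError(f"unknown statement kind: {kind}")

def final(stmt):
    """Set of labels at which execution of stmt may end."""
    kind = stmt[0]
    if kind in ("ass", "skip"):
        return {stmt[-1]}
    if kind == "seq":
        return final(stmt[2])
    if kind == "if":
        return final(stmt[3]) | final(stmt[4])
    if kind == "while":
        return {stmt[2]}       # a loop ends when its test fails
    raise ValueError(f"unknown statement kind: {kind}")

LOOP = ("while", "y > 1", 3,
        ("seq", ("ass", "z", "z * y", 4), ("ass", "y", "y - 1", 5)))
FACTORIAL = ("seq", ("ass", "y", "x", 1),
             ("seq", ("ass", "z", "1", 2),
              ("seq", LOOP, ("ass", "y", "0", 6))))

if __name__ == "__main__":
    print(init(FACTORIAL), final(FACTORIAL), final(LOOP))
```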

Collecting (Instrumented) Semantics (Cont)
• The input state is not known at compile-time
• “Collect” all the (instrumented) states for all possible inputs to the program
• Define $d_? : \mathrm{Var}_* \to \mathrm{Lab}_*$ by $d_?(x) = {?}$
• $CS_{entry}(l) = \{(s', d') \mid \exists s_0 : \langle P, (s_0, d_?) \rangle \to^* \langle S', (s', d') \rangle,\ \mathit{init}(S') = l\}$
• Soundness w.r.t. operational semantics: for all $(s', d') \in CS_{entry}(l)$ and for all variables $x$, $(x, d'(x)) \in RD_{entry}(l)$
• Optimality w.r.t. operational semantics

The Factorial Program
$[y := x]^{1};\ [z := 1]^{2};\ \mathtt{while}\ [y > 1]^{3}\ \mathtt{do}\ ([z := z * y]^{4};\ [y := y - 1]^{5});\ [y := 0]^{6}$

An “Iterative” Definition • Generate a system of monotonic equations • The least solution is well-defined • The least solution is the collecting interpretation

Equations Generated for Collecting Interpretation
• Equations for elementary statements
• $[\mathtt{skip}]^{l}$: $CS_{exit}(l) = CS_{entry}(l)$
• $[b]^{l}$: $CS_{exit}(l) = CS_{entry}(l)$
• $[x := a]^{l}$: $CS_{exit}(l) = \{(s[x \mapsto \mathcal{A}[\![a]\!]s],\ d[x \mapsto l]) \mid (s, d) \in CS_{entry}(l)\}$
• Equations for control flow constructs: $CS_{entry}(l) = \bigcup \{CS_{exit}(l') \mid l' \text{ immediately precedes } l \text{ in the control flow graph}\}$
• An equation for the entry: $CS_{entry}(1) = \{(s_0, d_?) \mid s_0 \in \mathrm{Var}_* \to \mathbb{Z}\}$

The Least Solution
• 12 sets of equations: $CS_{entry}(1), \ldots, CS_{exit}(6)$
• Can be written in vectorial form
• The least solution $\mathrm{lfp}(F_{CS})$ is well-defined (Tarski 1955)
• Every component is minimal
• Since $F_{CS}$ is monotonic, such a solution always exists
• $CS_{entry}(l) = \{(s', d') \mid \exists s_0 : \langle P, (s_0, d_?) \rangle \to^* \langle S', (s', d') \rangle,\ \mathit{init}(S') = l\}$
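The least solution can be approximated from below by iterating the equations, starting from empty sets, until nothing changes. The sketch below does this for the factorial program under two assumptions that are not in the slides: the inputs are restricted to a small finite set (the full collecting semantics ranges over infinitely many states), and edges leaving the test are guarded by the truth value of the condition so that the iteration stops. All names (`BLOCKS`, `FLOW`, `least_solution`) are hypothetical.

```python
def freeze(mapping):
    """Turn a dict into a hashable tuple of (key, value) pairs."""
    return tuple(sorted(mapping.items()))

# Elementary blocks of the factorial program, keyed by label.
BLOCKS = {
    1: ("ass", "y", lambda s: s["x"]),
    2: ("ass", "z", lambda s: 1),
    3: ("test", lambda s: s["y"] > 1),
    4: ("ass", "z", lambda s: s["z"] * s["y"]),
    5: ("ass", "y", lambda s: s["y"] - 1),
    6: ("ass", "y", lambda s: 0),
}
# Control flow edges (source, target, guard). Guards are an assumption of this sketch.
FLOW = [(1, 2, None), (2, 3, None), (3, 4, True),
        (4, 5, None), (5, 3, None), (3, 6, False)]

def transfer(l, states):
    """Exit states of block l computed from its entry states."""
    block = BLOCKS[l]
    if block[0] == "test":
        return set(states)
    _, x, a = block
    out = set()
    for s, d in states:
        sm, dm = dict(s), dict(d)
        out.add((freeze({**sm, x: a(sm)}), freeze({**dm, x: l})))
    return out

def apply_equations(cs, inputs):
    """One application of the equation system to the vector cs."""
    new = {}
    for l in BLOCKS:
        entry = set(inputs) if l == 1 else set()
        for src, dst, guard in FLOW:
            if dst == l:
                for s, d in cs[("exit", src)]:
                    if guard is None or BLOCKS[src][1](dict(s)) == guard:
                        entry.add((s, d))
        new[("entry", l)] = frozenset(entry)
        new[("exit", l)] = frozenset(transfer(l, cs[("entry", l)]))
    return new

def least_solution(inputs):
    """Iterate from the all-empty vector until a fixed point is reached."""
    cs = {(kind, l): frozenset() for l in BLOCKS for kind in ("entry", "exit")}
    while True:
        new = apply_equations(cs, inputs)
        if new == cs:
            return cs
        cs = new

D_UNKNOWN = freeze({"x": "?", "y": "?", "z": "?"})
INPUTS = [(freeze({"x": x, "y": 0, "z": 0}), D_UNKNOWN) for x in range(4)]

if __name__ == "__main__":
    solution = least_solution(INPUTS)
    defs = {pair for _, d in solution[("entry", 6)] for pair in d}
    print(sorted(defs, key=str))
```

For these inputs, the definitions observed at the entry of label 6 are (x, ?), (y, 1), (y, 5), (z, 2) and (z, 4). The iteration only terminates because the input set is finite and every run of the program terminates.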

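The Abstraction Function
• Map collecting states into reaching definitions
• The abstraction of an individual state: $\beta : [\mathrm{Var}_* \to \mathbb{Z}] \times [\mathrm{Var}_* \to \mathrm{Lab}_*] \to \mathcal{P}(\mathrm{Var}_* \times \mathrm{Lab}_*)$, $\beta(s, d) = \{(x, d(x)) \mid x \in \mathrm{Var}_*\}$
• The abstraction of a set of states: $\alpha : \mathcal{P}([\mathrm{Var}_* \to \mathbb{Z}] \times [\mathrm{Var}_* \to \mathrm{Lab}_*]) \to \mathcal{P}(\mathrm{Var}_* \times \mathrm{Lab}_*)$, $\alpha(CS) = \bigcup_{(s, d) \in CS} \beta(s, d) = \{(x, d(x)) \mid (s, d) \in CS,\ x \in \mathrm{Var}_*\}$
• Soundness: $\alpha(CS_{entry}(l)) \subseteq RD_{entry}(l)$
• Optimality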
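A minimal sketch of the abstraction, assuming instrumented states are represented as pairs of Python dicts. The function names `beta` (single state) and `alpha` (set of states) are chosen here for illustration.

```python
def beta(state):
    """Abstraction of one instrumented state: its (variable, label) pairs."""
    _s, d = state
    return {(x, label) for x, label in d.items()}

def alpha(states):
    """Abstraction of a collection of states: the union of their abstractions."""
    rd = set()
    for state in states:
        rd |= beta(state)
    return rd

# Two instrumented states that can arise at the entry of label 6
# of the factorial program (inputs x = 1 and x = 2).
CS_SAMPLE = [
    ({"x": 1, "y": 1, "z": 1}, {"x": "?", "y": 1, "z": 2}),
    ({"x": 2, "y": 1, "z": 2}, {"x": "?", "y": 5, "z": 4}),
]

if __name__ == "__main__":
    print(sorted(alpha(CS_SAMPLE), key=str))
```

The values in the store are discarded entirely; only the last-definition component survives abstraction.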

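The Concretization Function
• Map reaching definitions into collecting states
• The formal meaning of reaching definitions
• The concretization: $\gamma : \mathcal{P}(\mathrm{Var}_* \times \mathrm{Lab}_*) \to \mathcal{P}([\mathrm{Var}_* \to \mathbb{Z}] \times [\mathrm{Var}_* \to \mathrm{Lab}_*])$, $\gamma(RD) = \{(s, d) \mid \forall x \in \mathrm{Var}_* : (x, d(x)) \in RD\} = \{(s, d) \mid \beta(s, d) \subseteq RD\}$
• Soundness: $CS_{entry}(l) \subseteq \gamma(RD_{entry}(l))$
• Optimality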
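The concretization of a set of reaching definitions is usually an infinite set of states, so a sketch can only offer a membership test. The function name `in_gamma` and the dict representation of states are assumptions made for this example.

```python
def in_gamma(state, rd):
    """True if the instrumented state belongs to the concretization of rd,
    i.e. every variable's last definition is listed in rd."""
    _s, d = state
    return all((x, d[x]) in rd for x in d)

# Reaching definitions at the entry of label 6 of the factorial program.
RD_6 = {("x", "?"), ("y", 1), ("y", 5), ("z", 2), ("z", 4)}

REACHABLE = ({"x": 2, "y": 1, "z": 2}, {"x": "?", "y": 5, "z": 4})
SPURIOUS = ({"x": 2, "y": 1, "z": 2}, {"x": "?", "y": 5, "z": 2})
EXCLUDED = ({"x": 2, "y": 0, "z": 2}, {"x": "?", "y": 6, "z": 4})

if __name__ == "__main__":
    print(in_gamma(REACHABLE, RD_6), in_gamma(SPURIOUS, RD_6), in_gamma(EXCLUDED, RD_6))
```

The state `SPURIOUS` passes the test although no execution produces it at label 6: if y was last defined at label 5, the loop body ran, so z was last defined at label 4. This is the over-approximation that soundness permits.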

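Galois Connections
• The pair of functions $(\alpha, \gamma)$ forms a Galois connection if: $\forall CS \in \mathcal{P}([\mathrm{Var}_* \to \mathbb{Z}] \times [\mathrm{Var}_* \to \mathrm{Lab}_*])\ \forall RD \in \mathcal{P}(\mathrm{Var}_* \times \mathrm{Lab}_*)$: $\alpha(CS) \subseteq RD$ iff $CS \subseteq \gamma(RD)$
• Alternatively (for monotone $\alpha$ and $\gamma$): $\forall CS \in \mathcal{P}([\mathrm{Var}_* \to \mathbb{Z}] \times [\mathrm{Var}_* \to \mathrm{Lab}_*])\ \forall RD \in \mathcal{P}(\mathrm{Var}_* \times \mathrm{Lab}_*)$: $\alpha(\gamma(RD)) \subseteq RD$ and $CS \subseteq \gamma(\alpha(CS))$
• $\alpha$ and $\gamma$ uniquely determine each other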
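On a small finite universe the Galois connection condition can be checked exhaustively. The sketch below assumes a single variable x, stores restricted to the values 0 and 1, and labels "?", 1 and 2; these restrictions and all names are illustrative only.

```python
from itertools import combinations

VALUES = (0, 1)
LABELS = ("?", 1, 2)
# A state is (s, d); each component is a tuple of (variable, value) pairs.
STATES = [((("x", v),), (("x", l),)) for v in VALUES for l in LABELS]
PAIRS = [("x", l) for l in LABELS]

def powerset(items):
    """All subsets of items, as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def beta(state):
    _s, d = state
    return frozenset(d)

def alpha(cs):
    return frozenset(pair for state in cs for pair in beta(state))

def gamma(rd):
    return frozenset(state for state in STATES if beta(state) <= rd)

def is_galois_connection():
    """alpha(CS) is a subset of RD exactly when CS is a subset of gamma(RD)."""
    return all((alpha(cs) <= rd) == (cs <= gamma(rd))
               for cs in powerset(STATES) for rd in powerset(PAIRS))

def satisfies_alternative_form():
    """alpha(gamma(RD)) is below RD, and CS is below gamma(alpha(CS))."""
    return (all(alpha(gamma(rd)) <= rd for rd in powerset(PAIRS))
            and all(cs <= gamma(alpha(cs)) for cs in powerset(STATES)))

if __name__ == "__main__":
    print(is_galois_connection(), satisfies_alternative_form())
```

A brute-force check like this is no proof for the infinite case, but it is a cheap way to catch a wrongly defined abstraction or concretization.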

Local Soundness
• For every atomic statement $S$ show one of the following
• $\alpha(\{[\![S]\!](s, d) \mid (s, d) \in CS\}) \subseteq S^{\#}(\alpha(CS))$
• $\{[\![S]\!](s, d) \mid (s, d) \in \gamma(RD)\} \subseteq \gamma(S^{\#}(RD))$
• $\alpha(\{[\![S]\!](s, d) \mid (s, d) \in \gamma(RD)\}) \subseteq S^{\#}(RD)$
• In our case, $S$ is assignment and skip
• The above condition implies global soundness [Cousot & Cousot 1976]: $\alpha(CS_{entry}(l)) \subseteq RD_{entry}(l)$ and $CS_{entry}(l) \subseteq \gamma(RD_{entry}(l))$
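The first form of the local soundness condition can be tested on sample sets of states. The sketch assumes one particular assignment, [x := y + 1] at a hypothetical label 3, two variables, and the usual kill/gen transfer function for reaching definitions as the abstract operation; none of these specifics come from the slides.

```python
from itertools import combinations, product

VARS = ("x", "y")
LABELS = ("?", 1, 2)
NEW_LABEL = 3   # hypothetical label of the assignment under test
# A state is (s, d): two tuples aligned with VARS.
STATES = [(s, d) for s in product((0, 1), repeat=len(VARS))
          for d in product(LABELS, repeat=len(VARS))]

def alpha(cs):
    """Abstraction: all (variable, last definition) pairs occurring in cs."""
    return {(x, d[i]) for (_s, d) in cs for i, x in enumerate(VARS)}

def concrete_assign(state):
    """Concrete effect of [x := y + 1] at NEW_LABEL on an instrumented state."""
    (_sx, sy), (_dx, dy) = state
    return ((sy + 1, sy), (NEW_LABEL, dy))

def abstract_assign(rd):
    """Abstract effect: kill all definitions of x, then add (x, NEW_LABEL)."""
    return {(v, l) for (v, l) in rd if v != "x"} | {("x", NEW_LABEL)}

def locally_sound(cs):
    """alpha of the concrete post-states is below the abstract post-state."""
    post = {concrete_assign(state) for state in cs}
    return alpha(post) <= abstract_assign(alpha(cs))

# All sets of at most two states from the small universe.
SAMPLES = [set(c) for r in (0, 1, 2) for c in combinations(STATES, r)]

if __name__ == "__main__":
    print(all(locally_sound(cs) for cs in SAMPLES))
```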

Proof of Soundness (Summary) • Define an “appropriate” structural operational semantics • Define a “collecting” structural operational semantics • Establish a Galois connection between collecting states and reaching definitions • (Local correctness) Show that the abstract interpretation of every atomic statement is sound w.r.t. the collecting semantics • (Global correctness) Conclude that the analysis is sound

[Figure: Abstract (Conservative) Interpretation. Operational semantics: statement s maps a set of states to a set of states. Abstract semantics: statement s maps an abstract representation to an abstract representation. The two levels are related by abstraction and concretization.]

Induced Analysis (Relatively Optimal)
• It is sometimes possible to show that a given analysis is not only sound but also optimal w.r.t. the chosen abstraction (though not necessarily optimal in an absolute sense)
• Define $S^{\#}(RD) = \alpha(\{[\![S]\!](s, d) \mid (s, d) \in \gamma(RD)\})$
• But this $S^{\#}$ may not be computable
• Derive (at compiler-generation time) an alternative form for $S^{\#}$
• A useful measure for deciding whether the abstraction must lead to overly imprecise results
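On a finite universe the induced operation can be computed literally: concretize, run the statement, abstract again. The sketch compares it with the usual kill/gen transfer function for a hypothetical assignment [x := y + 1] at label 3. Restricting stores to the values 0 and 1 is an assumption that makes the concretization enumerable; it does not affect the result here because the abstraction ignores the store.

```python
from itertools import combinations, product

VARS = ("x", "y")
LABELS = ("?", 1, 2)
NEW_LABEL = 3   # hypothetical label of the assignment
STATES = [(s, d) for s in product((0, 1), repeat=len(VARS))
          for d in product(LABELS, repeat=len(VARS))]
PAIRS = [(x, l) for x in VARS for l in LABELS]

def alpha(cs):
    return {(x, d[i]) for (_s, d) in cs for i, x in enumerate(VARS)}

def gamma(rd):
    """States (within the finite universe) whose last definitions all lie in rd."""
    return [(s, d) for (s, d) in STATES
            if all((x, d[i]) in rd for i, x in enumerate(VARS))]

def concrete_assign(state):
    """Concrete effect of [x := y + 1] at NEW_LABEL."""
    (_sx, sy), (_dx, dy) = state
    return ((sy + 1, sy), (NEW_LABEL, dy))

def induced_assign(rd):
    """Induced abstract operation: alpha after the statement after gamma."""
    return alpha(concrete_assign(state) for state in gamma(rd))

def kill_gen_assign(rd):
    """Syntactic alternative: kill definitions of x, add (x, NEW_LABEL)."""
    return {(v, l) for (v, l) in rd if v != "x"} | {("x", NEW_LABEL)}

def covers_all_vars(rd):
    return all(any(v == x for (v, _l) in rd) for x in VARS)

ALL_RD = [set(c) for r in range(len(PAIRS) + 1) for c in combinations(PAIRS, r)]

if __name__ == "__main__":
    print(induced_assign({("y", 1)}), kill_gen_assign({("y", 1)}))
```

In this small setting the two operations agree whenever the input lists at least one definition for every variable; otherwise the concretization is empty and the induced operation returns the empty set, which is strictly more precise than the kill/gen form.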

Type and Effect Systems • The type of a program expression at a given program point provides a conservative estimate of its value on all execution paths • A type system provides syntax-directed rules for annotating expressions with types • The simplest type inference algorithms are linear • But in ML, ABC • But types can also include implementation information such as reaching definitions

Annotated Type Base for Reaching Definitions • $S : RD_1 \to RD_2$: if $S$ is executed when the reaching definitions are $RD_1$, it produces reaching definitions $RD_2$ • Similar to the constraint-based approach

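Annotated Type Base for Reaching Definitions
Axioms:
[ass] $[x := a]^{l'} : RD \to (RD \setminus \{(x, l) \mid l \in \mathrm{Lab}\}) \cup \{(x, l')\}$
[skip] $[\mathtt{skip}]^{l} : RD \to RD$
Rules:
[seq] $\dfrac{S_1 : RD_1 \to RD_2 \quad S_2 : RD_2 \to RD_3}{S_1; S_2 : RD_1 \to RD_3}$
[if] $\dfrac{S_1 : RD_1 \to RD_2 \quad S_2 : RD_1 \to RD_2}{\mathtt{if}\ [b]^{l}\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2 : RD_1 \to RD_2}$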

Annotated Type Base for While: while construct
[wh] $\dfrac{S : RD \to RD}{\mathtt{while}\ [b]^{l}\ \mathtt{do}\ S : RD \to RD}$

Annotated Type Base for While: subsumption rule
[sub] $\dfrac{S : RD_2 \to RD_3}{S : RD_1 \to RD_4}$ if $RD_1 \subseteq RD_2$ and $RD_3 \subseteq RD_4$
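One way to read the annotated typing rules algorithmically is as a function that, given a statement and the reaching definitions before it, computes reaching definitions after it. The sketch below is such a reading, not an algorithm given in the slides: the subsumption rule is used implicitly to join the two branches of an if and to grow a loop invariant until the body maps it into itself. The tuple encoding and the name `infer` are hypothetical.

```python
# Hypothetical encoding; expressions are omitted because the rules ignore them:
#   ("ass", x, l), ("skip", l), ("seq", S1, S2), ("if", l, S1, S2), ("while", l, S)

def infer(stmt, rd):
    """Return RD2 such that stmt : rd -> RD2 is derivable from the rules."""
    kind = stmt[0]
    if kind == "ass":                                 # [ass]
        _, x, l = stmt
        return frozenset(p for p in rd if p[0] != x) | {(x, l)}
    if kind == "skip":                                # [skip]
        return rd
    if kind == "seq":                                 # [seq]
        return infer(stmt[2], infer(stmt[1], rd))
    if kind == "if":                                  # [if], after [sub] on each branch
        return infer(stmt[2], rd) | infer(stmt[3], rd)
    if kind == "while":                               # [wh], after [sub]
        invariant = rd
        while True:
            out = infer(stmt[2], invariant)
            if out <= invariant:                      # body : invariant -> invariant
                return invariant
            invariant = invariant | out
    raise ValueError(f"unknown statement kind: {kind}")

FACTORIAL = ("seq", ("ass", "y", 1),
             ("seq", ("ass", "z", 2),
              ("seq", ("while", 3, ("seq", ("ass", "z", 4), ("ass", "y", 5))),
                      ("ass", "y", 6))))
RD_INITIAL = frozenset({("x", "?"), ("y", "?"), ("z", "?")})

if __name__ == "__main__":
    print(sorted(infer(FACTORIAL, RD_INITIAL), key=str))
```

The loop in the while case terminates because the invariant only grows and is bounded by the finite set of (variable, label) pairs of the program.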

Not Covered • Effect Systems • Transformations

Conclusions • Three similar techniques • Dataflow analysis • Constraint based approach (a generalization) • Type and effect system (directly deals with the syntax) • Abstract interpretation can be used to show soundness of these methods • But more convenient in the dataflow setting
