Static Program Analysis using Abstract Interpretation

1 / 65

# Static Program Analysis using Abstract Interpretation - PowerPoint PPT Presentation

## Static Program Analysis using Abstract Interpretation

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
##### Presentation Transcript

1. Static Program Analysis using Abstract Interpretation Guillaume Brat brat@email.arc.nasa.gov Arnaud Venet venet@email.arc.nasa.gov Kestrel Technology NASA Ames Research Center Moffett Field, CA 94035

2. Introduction

3. Static Program Analysis Static program analysis consists of automaticallydiscovering properties of a program that hold for all possible execution paths of the program. Static program analysis is not • Testing:manuallychecking a property for some execution paths • Model checking:automaticallychecking a property for all execution paths

4. Program Analysis for what? • Optimizing compilers • Program understanding • Semantic preprocessing: • Model checking • Automated test generation • Program verification

5. int a[1000]; for (i = 0; i < 1000; i++) { a[i] = … ; // 0 <= i <= 999 } a[i] = … ; // i = 1000; safe operation buffer overrun Program Verification • Check that every operation of a program will never cause an error (division by zero, buffer overrun, deadlock, etc.) • Example:

6. Incompleteness of Program Analysis • Discovering a sufficient set of properties for checking every operation of a program is an undecidable problem! • False positives: operations that are safe in reality but which cannot be decided safe or unsafe from the properties inferred by static analysis.

7. Precision versus Efficiency • Precision and computational complexity are strongly related • Tradeoff precision/efficiency: limit in the average precision and scalability of a given analyzer • Greater precision and scalability is achieved through specialization Precision: number of program operations that can be decided safe or unsafe by an analyzer.

8. Specialization • Tailoring the program analyzer algorithms for a specific class of programs (flight control commands, digital signal processing, etc.) • Precision and scalability is guaranteed for this class of programs only • Requires a lot of try-and-test to fine-tune the algorithms • Need for an open architecture

9. Soundness • What guarantees the soundness of the analyzer results? • In dataflow analysis and type inference the soundness proof of the resolution algorithm is independent from the analysis specification • An independent soundness proof precludes the use of test-and-try techniques • Need for analyzers correct by construction

10. Abstract Interpretation • A general methodology for designing static program analyzers that are: • Correct by construction • Generic • Easy to fine-tune • Scalability is difficult to achieve but the payoff is worth the effort!

11. Approximation The core idea of Abstract Interpretation is the formalization of the notion of approximation • An approximation of memory configurations is first defined • Then the approximation of all atomic operations • The approximation is automatically lifted to the whole program structure • The approximation is generally a scheme that depends on some other parameter approximations

12. Overview of Abstract Interpretation • Start with a formal specification of the program semantics (the concrete semantics) • Construct abstract semantic equations w.r.t. a parametric approximation scheme • Use general algorithms to solve the abstract semantic equations • Try-and-test various instantiations of the approximation scheme in order to find the best fit

13. The Methodology of Abstract Interpretation

14. Methodology Concrete Semantics Collecting Semantics Tuners Partitioning Abstract Domain Iterative Resolution Algorithms Abstract Semantics Abstract Domain

15. Lattices and Fixpoints • A lattice (L, ⊑, ⊥, ⊔, ⊤, ⊓) is a partially ordered set (L, ⊑) with: • Least upper bounds (⊔) and greatest lower bounds (⊔) operators • A least element “bottom”: ⊥ • A greatest element “top”: ⊤ • L is complete if all least upper bounds exist • A fixpoint X of F: L → L satisfies F(X) = X • We denote by lfp F the least fixpoint if it exists

16. Fixpoint Theorems • Knaster-Tarski theorem: If F: L → L is monotone and L is a complete lattice, the set of fixpoints of F is also a complete lattice. • Kleene theorem: If F: L → L is monotone, L is a complete lattice and F preserves all least upper bounds then lfp F is the limit of the sequence: F0 = ⊥ Fn+1 = F (Fn)

17. Methodology Concrete Semantics Collecting Semantics Tuners Partitioning Abstract Domain Iterative Resolution Algorithms Abstract Semantics Abstract Domain

18. Concrete Semantics Small-step operational semantics: (S, ) s =  programpoint, env s  s' Example: 1: n = 0; 2: while n < 1000 do 3: n = n + 1; 4: end 5: exit 1, n   2, n  0  3, n  0  4, n  1  2, n  1  ...  5, n  1000 Undefined value

19. n = 0 1 2 n ≥ 1000 n < 1000 3 n = n + 1 5 4 Control Flow Graph 1: n = 0; 2: while n < 1000 do 3: n = n + 1; 4: end 5: exit

20. Transition Relation op Control flow graph: ⓘⓙ Operational semantics: 〈ⓘ, ε〉 → 〈ⓙ,〚op〛ε 〉 Semantics of op

21. Methodology Concrete Semantics Collecting Semantics Tuners Partitioning Abstract Domain Iterative Resolution Algorithms Abstract Semantics Abstract Domain

22. Collecting Semantics The collecting semantics is the setof observable behaviours in the operational semantics. It is the starting point of any analysis design. • The set of all descendants of the initial state • The set of all descendants of the initial state that can reach a final state • The set of all finite traces from the initial state • The set of all finite and infinite traces from the initial state • etc.

23. Which Collecting Semantics? • Buffer overrun, division by zero, arithmetic overflows: state properties • Deadlocks, un-initialized variables: finite trace properties • Loop termination: finite and infinite trace properties

24. State properties The set of descendants of the initial state s0: S = {s | s0  ...  s} Theorem:F: ((S), )  ((S), ) F (S) = {s0} {s' | s  S: s  s'} S = lfp F

25. Example 1: n = 0; 2: while n < 1000 do 3: n = n + 1; 4: end 5: exit S = {1, n , 2, n  0, 3, n  0, 4, n  1, 2, n  1, ..., 5, n  1000}

26. Computation • F0 = ∅ • F1 = {〈1,n⇒Ω〉} • F2 = {〈1,n⇒Ω〉, 〈2,n⇒0〉} • F3 = {〈1,n⇒Ω〉, 〈2,n⇒0〉, 〈3,n⇒0〉} • F4 = {〈1,n⇒Ω〉, 〈2,n⇒0〉, 〈3,n⇒0〉, 〈4,n⇒1〉} • ...

27. Methodology Concrete Semantics Collecting Semantics Tuners Partitioning Abstract Domain Iterative Resolution Algorithms Abstract Semantics Abstract Domain

28. Partitioning • Σ = Σ1⊕Σ2⊕ ... ⊕Σn • Σi = {〈k, ε〉 ∈ Σ | k = i } • F(S1, ..., Sn)i = {s' ∈ Si | ∃j∃s ∈ Sj: s  s'} • F(S1, ..., Sn)i = {〈i,〚op〛ε〉 | ⓙⓘ ∈ CFG (P)} • F(S1, ..., Sn)0 = { s0 } We partition the set S of states w.r.t. program points: op

29. 〈1, e1〉 〈1, e2〉 〈i,〚op〛ε1〉 〈j, ε1〉 〈i,〚op〛ε2〉 〈j, ε2〉 Illustration Σ Σ S1 Σ1 Si op op Σi Sj F Σj

30. Semantic Equations • Notation:Ei = set of environments at program point i • System of semantic equations: • Solution of the system = S = lfpF op Ei = U {〚op〛Ej | ⓙ ⓘ ∈ CFG (P)}

31. Example 1: n = 0; 2: while n < 1000 do 3: n = n + 1; 4: end 5: exit E1 = {n } E2 =〚n = 0〛E1E4 E3 = E2 ]-, 999] E4 =〚n = n + 1〛E3 E5 = E2 [1000, +[

32. n = 0 1 2 n < 1000 n ≥ 1000 3 n = n + 1 5 4 Example E1 = {n } 1: n = 0; 2: while n < 1000 do 3: n = n + 1; 4: end 5: exit E2 =〚n = 0〛E1E4 E3 = E2 ]-, 999] E4 =〚n = n + 1〛E3 E5 = E2 [1000, +[

33. Other Kinds of Partitioning • In the case of collecting semantics of traces: • Partitioning w.r.t. procedure calls: context sensitivity • Partitioning w.r.t. executions paths in a procedure: path sensitivity • Dynamic partitioning (Bourdoncle)

34. Methodology Concrete Semantics Collecting Semantics Tuners Partitioning Abstract Domain Iterative Resolution Algorithms Abstract Semantics Abstract Domain

35. Approximation Problem: Compute a sound approximation S#ofS S S# Solution: Galois connections

36. Galois Connection L1, L2 two lattices Abstract domain g (L1, ) (L2, ) a • xy :a (x)  y  x  g (y) • xy : x  g oa (x) & a og (y)  y

37. aoFog L2 L2 g a L1 L1 F Fixpoint Approximation Theorem: lfp Fg (lfp aoFog)

38. Abstracting the Collecting Semantics • Find a Galois connection: • Find a function: aoFog F# g ((S), ) (S#, ) a Partitioning ➱ Abstract sets of environments

39. Abstract Algebra • Notation:Ethe set of all environments • Galois connection: • ,  approximated by #, # • Semantics〚op〛approximated by〚op〛# g ((E), ) (E#,) a ao〚op〛og  〚op〛#

40. Abstract Semantic Equations 1: n = 0; 2: while n < 1000 do 3: n = n + 1; 4: end 5: exit E1# = a ({n }) E2# =〚n = 0〛# E1##E4# E3# = E2##a (]-, 999]) E4# =〚n = n + 1〛# E3# E5# = E2##a ([1000, +[)

41. Methodology Concrete Semantics Collecting Semantics Tuners Partitioning Abstract Domain Iterative Resolution Algorithms Abstract Semantics Abstract Domain

42. Abstract Domains Environment: x  v, y  w, ... Various kinds of approximations: • Intervals (nonrelational): x  [a, b], y  [a', b'], ... • Polyhedra (relational): x + y - 2z  10, ... • Difference-bound matrices (weakly relational): y - x  5, z - y  10, ...

43. Example: intervals 1: n = 0; 2: while n < 1000 do 3: n = n + 1; 4: end 5: exit • Iteration 1: E2# = [0, 0] • Iteration 2: E2# = [0, 1] • Iteration 3: E2# = [0, 2] • Iteration 4: E2# = [0, 3] • ...

44. Problem How to cope with lattices of infinite height? Solution: automatic extrapolation operators

45. Methodology Concrete Semantics Collecting Semantics Tuners Partitioning Abstract Domain Iterative Resolution Algorithms Abstract Semantics Abstract Domain

46. Widening operator Lattice (L, ):  : L  L  L • Abstract union operator: xy : x  x  y & y  x  y • Enforces convergence: (xn)n  0 y0 = x0 yn + 1 = yn xn+1 (yn)n≥0 is ultimately stationary

47. Widening of intervals [a, b]  [a', b'] • If a  a' then a else - • If b'  b then b else + ➥ Open unstable bounds (jump over the fixpoint)

48. fixpoint widening Widening and Fixpoint

49. Iteration with widening 1: n = 0; 2: while n < 1000 do 3: n = n + 1; 4: end 5: exit (E2#)n+1 = (E2#)n(〚n = 0〛# (E1#)n# (E4#)n) Iteration 1 (union): E2# = [0, 0] Iteration 2 (union): E2# = [0, 1] Iteration 3 (widening): E2# = [0, +] stable

50. Imprecision at loop exit 1: n = 0; 2: while n < 1000 do 3: n = n + 1; 4: end 5: exit; t[n] = 0; // thas 1500 elements False positive!!! • E5#= [1000, +[ • The information is present in the equations