Introduction to Abstract Interpretation

  1. Introduction to Abstract Interpretation • Neil Kettle, Andy King and Axel Simon • a.m.king@kent.ac.uk • http://www.cs.kent.ac.uk/~amk • Acknowledgments: much of this material has been adapted from surveys by Patrick and Radia Cousot

  2. Applications of abstract interpretation • Verification: can a concurrent program deadlock? Is termination assured? • Parallelisation: are two or more tasks independent? What is the worst/best-case running time of a function? • Transformation: can a definition be unfolded? Will unfolding terminate? • Implementation: can an operation be specialised with knowledge of its (global) calling context? • Applications and “players” are incredibly diverse

  3. House-keeping

  4. Computing Lab Xmas Party • Located in Origins, the “restaurant” in Darwin • A buffet lunch will be served, courtesy of the department • The department will supply some wine (which last year lasted 10 minutes) • The bar will be open afterwards if some wine is not enough wine • Send an e-mail to Deborah Sowrey [D.J.Sowery@kent.ac.uk] if you want to attend • Come along and meet other post-grads

  5. Casting out nines algorithm • Which of the following multiplications is correct: • 2173 × 38 = 81574 or • 2173 × 38 = 82574 • Casting out nines is a checking technique that is really a form of abstract interpretation: • Sum the digits in the multiplicand n1, multiplier n2 and the product n to obtain s1, s2 and s. • Divide s1, s2 and s by 9 to compute the remainder, that is, r1 = s1 mod 9, r2 = s2 mod 9 and r = s mod 9. • If (r1 × r2) mod 9 ≠ r then the multiplication is incorrect • The algorithm returns “incorrect” or “don’t know”
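The checking algorithm above can be sketched in a few lines of Python. This is a minimal illustration rather than code from the original slides, and the names `digit_sum` and `cast_out_nines` are hypothetical. Note that the check can refute a claimed product but never confirm one, which is why a matching remainder only yields "don't know".

```python
def digit_sum(n: int) -> int:
    """Sum of the decimal digits of a non-negative integer."""
    return sum(int(d) for d in str(n))


def cast_out_nines(n1: int, n2: int, n: int) -> str:
    """Check the claim n1 * n2 == n using remainders modulo 9."""
    r1 = digit_sum(n1) % 9
    r2 = digit_sum(n2) % 9
    r = digit_sum(n) % 9
    if (r1 * r2) % 9 != r:
        return "incorrect"   # the abstract product disagrees: the claim is refuted
    return "don't know"      # agreement proves nothing about the concrete product


print(cast_out_nines(2173, 38, 81574))  # incorrect
print(cast_out_nines(2173, 38, 82574))  # don't know
```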

  6. Running the numbers for 2173 × 38 = 81574 • Compute r1 = (2+1+7+3) mod 9 = … • Compute r2 = (3+8) mod 9 = … • Calculate (r1 × r2) mod 9 = … • Calculate r = (8+1+5+7+4) mod 9 = … • Check ((r1 × r2) mod 9 = r) = … • Deduce that 2173 × 38 = 81574 is …

  7. Abstract interpretation is a theory of relationships • The computational domain for multiplication (concrete domain): • N: the set of non-negative integers • The computational domain of remainders used in the checking algorithm (abstract domain): • R = {0, 1, …, 8} • The key question is: what is the relationship between an element n ∈ N used in the real algorithm and its analogue r ∈ R in the check?

  8. What is the relationship? • When the multiplicand is n1 = 456, say, the check uses r1 = (4+5+6) mod 9 = 6 • Observe that • 456 mod 9 = • (4*100 + 56) mod 9 = • (4*90 + 4*10 + 56) mod 9 = • (4*10 + 56) mod 9 = • ((4 + 5)*10 + 6) mod 9 = • ((4 + 5)*9 + (4 + 5) + 6) mod 9 = • (4 + 5 + 6) mod 9 • More generally, induction on the number of digits shows r1 = n1 mod 9 and r2 = n2 mod 9

  9. Correctness is the preservation of relationships • The check simulates the concrete multiplication and, in effect, is an abstract multiplication • Concrete multiplication is n = n1 × n2 • Abstract multiplication is r = (r1 × r2) mod 9 • Where r1 describes n1 and r2 describes n2 • For brevity, write r ∝ n iff r = n mod 9 • Then abstract multiplication preserves ∝ iff whenever r1 ∝ n1 and r2 ∝ n2 it follows that r ∝ n

  10. Correctness argument • Suppose r1 ∝ n1 and r2 ∝ n2 • If • n = n1 × n2 then • n mod 9 = (n1 × n2) mod 9 hence • n mod 9 = ((n1 mod 9) × (n2 mod 9)) mod 9 whence • n mod 9 = (r1 × r2) mod 9 = r therefore • r ∝ n • Consequently if ¬(r ∝ n) then n ≠ n1 × n2
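The correctness argument can also be exercised mechanically. The hypothetical helpers below write the relation as `describes(r, n)`, meaning r = n mod 9 (the relation written r ∝ n), enumerate all pairs n1, n2 below a bound, check the digit-sum lemma of slide 8, and confirm that abstract multiplication preserves the relation. A finite test like this is only a sanity check under an arbitrary bound; the proof is the argument above.

```python
def digit_sum(n: int) -> int:
    """Sum of the decimal digits of a non-negative integer."""
    return sum(int(d) for d in str(n))


def describes(r: int, n: int) -> bool:
    """The relation r ∝ n, i.e. r = n mod 9."""
    return r == n % 9


def abstract_mul(r1: int, r2: int) -> int:
    """The abstract analogue of multiplication on remainders."""
    return (r1 * r2) % 9


def check_preservation(limit: int) -> bool:
    """Exhaustively test preservation of ∝ for 0 <= n1, n2 < limit."""
    for n1 in range(limit):
        for n2 in range(limit):
            r1, r2 = digit_sum(n1) % 9, digit_sum(n2) % 9
            # Digit-sum lemma: the remainder used by the check equals n mod 9
            if not (describes(r1, n1) and describes(r2, n2)):
                return False
            # Preservation: r1 ∝ n1 and r2 ∝ n2 imply (r1 * r2) mod 9 ∝ n1 * n2
            if not describes(abstract_mul(r1, r2), n1 * n2):
                return False
    return True


print(check_preservation(300))  # True
```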

  11. Summary • Formalise the relationship between the data • Check that the relationship is preserved by the abstract analogues of the concrete operations • The relational framework [Acta Informatica, 30(2):103-129,1993] not only emphasises the theory of relations but is very general

  12. Numeric approximation and widening • Abstract interpretation does not require a domain to be finite

  13. Interval approximation • Consider the following Pascal-like program • SYNTOX [PLDI’90] inferred the invariants scoped within {…} • Invariants occur between consecutive lines in the program • i ∈ [0,15] asserts 0 ≤ i ≤ 15 whereas i ∈ [0,0] means i = 0 • begin i := 0; {1: i ∈ [0,0]} while (i < 16) do {2: i ∈ [0,15]} i := i + 1 {3: i ∈ [1,16]} end {4: i ∈ [16,16]}

  14. Compilation versus (classic) interpretation • Abstract compilation – compile the concrete program into an abstract program (equation system) and execute the abstract program: • good separation of concerns that aids debugging • the particulars of the domain can be exploited to reorder operations, specialise operations, etc • Abstract interpretation – run the concrete program but on-the-fly interpret its concrete operations as abstract operations: • ideal for a generic framework (toolkit) which is parameterised by abstract domain plugins

  15. Abstract domain that is used in interval analysis • The domain of intervals includes: • [l,u] where l ≤ u and l,u ∈ Z, for bounded sets, i.e. [0,5] ∝ {0,1,4} since {0,1,4} ⊆ [0,5] • ⊥ to represent the empty set of numbers, that is, ⊥ ∝ ∅ • [l,∞] for sets which are bounded below such as {l,l+2,l+4,…} • [−∞,u] to represent sets which are bounded above such as {…,u−5,u−3,u}

  16. Weakening intervals • if … then … {1: i ∈ [0,2]} else … {2: i ∈ [3,5]} endif {3: i ∈ [0,5]} • Join (path merge) is defined: • Put d1 ⊔ d2 = d1 if d2 = ⊥ • d2 else if d1 = ⊥ • [min(l1,l2), max(u1,u2)] otherwise • whenever d1 = [l1,u1] and d2 = [l2,u2]

  17. Strengthening intervals • Meet is defined: • Put d1 ⊓ d2 = ⊥ if (d1 = ⊥) ∨ (d2 = ⊥) ∨ (max(l1,l2) > min(u1,u2)) • [max(l1,l2), min(u1,u2)] otherwise • whenever d1 = [l1,u1] and d2 = [l2,u2] • {3: i ∈ [0,5]} if (2 < i) then {4: i ∈ [3,5]} … else {5: i ∈ [0,2]} …
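The join and meet definitions translate directly into code. This Python sketch assumes that ⊥ (the empty interval) is represented by `None` and an interval by a pair of bounds, with `math.inf` standing for ∞. It also assumes guards on the integer variable i are first turned into closed intervals (2 < i becomes [3, ∞] and its negation [−∞, 2]). Here `meet` returns ⊥ whenever the bounds cross, so an empty intersection is never stored as a malformed interval.

```python
import math
from typing import Optional, Tuple

Interval = Optional[Tuple[float, float]]  # None stands for ⊥
INF = math.inf


def join(d1: Interval, d2: Interval) -> Interval:
    """Path merge: the smallest interval containing both arguments."""
    if d2 is None:
        return d1
    if d1 is None:
        return d2
    return (min(d1[0], d2[0]), max(d1[1], d2[1]))


def meet(d1: Interval, d2: Interval) -> Interval:
    """Intersection, used to strengthen an interval with a guard."""
    if d1 is None or d2 is None:
        return None
    lo, hi = max(d1[0], d2[0]), min(d1[1], d2[1])
    return (lo, hi) if lo <= hi else None  # crossed bounds mean ⊥


print(join((0, 2), (3, 5)))     # (0, 5)  point {3} after the if
print(meet((0, 5), (3, INF)))   # (3, 5)  then-branch, 2 < i
print(meet((0, 5), (-INF, 2)))  # (0, 2)  else-branch, not (2 < i)
```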

  18. Meet and join are the basic primitives for compilation • I1 = [0,0] since program point (1) immediately follows i := 0 • I2 = (I1 ⊔ I3) ⊓ [−∞, 15] since: • control from program points (1) and (3) flows into (2) • point (2) is reached only if i < 16 holds • I3 = {n+1 | n ∈ I2} since (3) is only reachable from (2) via the increment • I4 = (I1 ⊔ I3) ⊓ [16, ∞] since: • control from (1) and (3) flows into (4) • point (4) is reached only if ¬(i < 16) holds

  19. Interval iteration

  20. Jacobi versus Gauss-Seidel iteration • With Jacobi, the new vector I1’,I2’,I3’,I4’ of intervals is calculated from the old I1,I2,I3,I4 • With Gauss-Seidel iteration: • I1’ is calculated from I1,I2,I3,I4 • I2’ is calculated from I1’,I2,I3,I4 • I3’ is calculated from I1’,I2’,I3,I4 • I4’ is calculated from I1’,I2’,I3’,I4
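To see the two iteration strategies side by side, here is a small Python solver for the equations of slide 18 (the loop `while (i < 16) do i := i + 1`). It is a sketch under stated assumptions: ⊥ is `None`, ∞ is `math.inf`, and `solve` and `shift` are hypothetical names. Both strategies start from ⊥ everywhere and reach the invariants of slide 13; Gauss-Seidel needs fewer rounds because each equation already sees the values updated earlier in the same round.

```python
import math
from typing import List, Optional, Tuple

Interval = Optional[Tuple[float, float]]  # None stands for ⊥
INF = math.inf


def join(d1: Interval, d2: Interval) -> Interval:
    if d2 is None:
        return d1
    if d1 is None:
        return d2
    return (min(d1[0], d2[0]), max(d1[1], d2[1]))


def meet(d1: Interval, d2: Interval) -> Interval:
    if d1 is None or d2 is None:
        return None
    lo, hi = max(d1[0], d2[0]), min(d1[1], d2[1])
    return (lo, hi) if lo <= hi else None


def shift(d: Interval, k: int) -> Interval:
    """{n + k | n in d}, the abstract effect of i := i + k."""
    return None if d is None else (d[0] + k, d[1] + k)


def solve(jacobi: bool) -> Tuple[List[Interval], int]:
    """Iterate I1..I4 from ⊥ until a round changes nothing."""
    iv: List[Interval] = [None, None, None, None]  # I1, I2, I3, I4
    rounds = 0
    while True:
        rounds += 1
        old = list(iv)
        # Jacobi reads only the previous vector; Gauss-Seidel reads the
        # vector being updated, so later equations see this round's values.
        src = old if jacobi else iv
        iv[0] = (0, 0)
        iv[1] = meet(join(src[0], src[2]), (-INF, 15))
        iv[2] = shift(src[1], 1)
        iv[3] = meet(join(src[0], src[2]), (16, INF))
        if iv == old:
            return iv, rounds


gs, gs_rounds = solve(jacobi=False)
jac, jac_rounds = solve(jacobi=True)
print(gs)          # [(0, 0), (0, 15), (1, 16), (16, 16)]
print(gs == jac)   # True
print(gs_rounds, jac_rounds)
```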

  21. Gauss-Seidel versus chaotic iteration • Observe that I4 might change if either I1 or I3 changes, hence evaluate I4 after I1 and I3 stabilise • This suggests waiting until stability is achieved at one level before starting on the next • [Figure: dependency graph over I1, I2, I3 and I4, grouped into the levels {I1}, {I2, I3} and {I4}]

  22. Gauss-Seidel versus chaotic iteration • Chaotic iteration can postpone evaluating Ii for a bounded number of iterations: • I1’ is calculated from I1,-,-,- • I2’ and I3’ are calculated Gauss-Seidel style from I1,I2,I3,- • I4’ is calculated from I1’,I2’,I3’,I4 • Fast and (incremental) fixpoint solvers [TOPLAS 22(2):187-223,2000] apply chaotic iteration

  23. Research challenge • Compiling to equations and iteration is well-understood (albeit not well-known) • The implicit assumption is that source is available • With the advent of component and multi-linguistic programming, the problem is how to generate the equations from: • A specification of the algorithm or the API; • The types of the algorithm or component • In the interim, environments with support for modularity either: • Equip the programmer with an equation language • Or make worst-case assumptions about behaviour

  24. Suppose i was decremented rather than incremented • begin i := 0; {1: i ∈ [0,0]} while (i < 16) do {2: i ∈ [−∞,0]} i := i − 1 {3: i ∈ [−∞,−1]} end {4: i ∈ ⊥} • I1 = [0,0] • I2 = (I1 ⊔ I3) ⊓ [−∞, 15] • I3 = {n−1 | n ∈ I2} • I4 = (I1 ⊔ I3) ⊓ [16, ∞]

  25. Ascending chain condition • A domain D is ACC iff it does not contain an infinite strictly increasing chain d1 < d2 < d3 < … where d < d’ iff d ⊑ d’ and d ≠ d’ (see below) • The interval domain D is ordered by: • ⊥ ⊑ d for all d ∈ D and • [l1,u1] ⊑ [l2,u2] iff l2 ≤ l1 ≤ u1 ≤ u2 • and is not ACC since [0,0] < [−1,0] < [−2,0] < … • [Figure: a lattice with ⊤ at the top, the integers …, −4, −3, −2, −1, 0, 1, 2, 3, 4, … in the middle and ⊥ at the bottom]

  26. Some very expressive relational domains are ACC • Common sub-expression elimination relies on detecting duplicated expression evaluation • Karr [Acta Informatica, 6, 133-151] noticed that detecting an invariant such as y = x/2 − 7 was key to this optimisation • begin x := sin(a) * 2; y := sin(a) − 7; end

  27. The affine domain • The domain of affine equations over n variables is: • D = {⟨A,B⟩ | A is an m × n dimensional matrix and B is an m dimensional column vector} • D is ordered by: • ⟨A1,B1⟩ ⊑ ⟨A2,B2⟩ iff (if A1x = B1 then A2x = B2)

  28. Pre-orders versus posets • A pre-order ⟨D, ⊑⟩ is a set D ordered by a binary relation ⊑ such that: • d ⊑ d for all d ∈ D • If d1 ⊑ d2 and d2 ⊑ d3 then d1 ⊑ d3 • A poset is a pre-order ⟨D, ⊑⟩ such that: • If d1 ⊑ d2 and d2 ⊑ d1 then d1 = d2

  29. The affine domain is a pre-order (so it is not ACC) • Observe ⟨A1,B1⟩ ⊑ ⟨A2,B2⟩ but also ⟨A2,B2⟩ ⊑ ⟨A1,B1⟩, even though the two pairs differ • To build a poset from a pre-order: • define d ≡ d’ iff d ⊑ d’ and d’ ⊑ d • define [d] = {d’ ∈ D | d ≡ d’} and D/≡ = {[d] | d ∈ D} • define [d] ⊑ [d’] iff d ⊑ d’ • The poset ⟨D/≡, ⊑⟩ is ACC since chain length is bounded by the number of variables n

  30. Inducing termination for non-ACC (and huge ACC) domains • Enforce convergence for intervals with a widening operator ∇: D × D → D • ⊥ ∇ d = d • d ∇ ⊥ = d • [l1,u1] ∇ [l2,u2] = [if l2 < l1 then −∞ else l1, if u1 < u2 then ∞ else u1] • Examples • [1,2] ∇ [1,2] = [1,2] • [1,2] ∇ [1,3] = [1,∞] but [1,3] ∇ [1,2] = [1,3] • Safe since [li,ui] ⊑ ([l1,u1] ∇ [l2,u2]) for i ∈ {1,2}
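The widening operator is equally short in code. The Python sketch below uses the hypothetical names `widen` and `leq`, represents ⊥ as `None` and ∞ as `math.inf`, reproduces the three examples, and includes the interval order needed to state the safety condition.

```python
import math
from typing import Optional, Tuple

Interval = Optional[Tuple[float, float]]  # None stands for ⊥
INF = math.inf


def widen(d1: Interval, d2: Interval) -> Interval:
    """d1 ∇ d2: keep bounds that are stable, push unstable ones to infinity."""
    if d1 is None:
        return d2
    if d2 is None:
        return d1
    (l1, u1), (l2, u2) = d1, d2
    return (-INF if l2 < l1 else l1, INF if u1 < u2 else u1)


def leq(d1: Interval, d2: Interval) -> bool:
    """⊥ ⊑ d, and [l1,u1] ⊑ [l2,u2] iff l2 <= l1 and u1 <= u2."""
    if d1 is None:
        return True
    if d2 is None:
        return False
    return d2[0] <= d1[0] and d1[1] <= d2[1]


print(widen((1, 2), (1, 2)))  # (1, 2)
print(widen((1, 2), (1, 3)))  # (1, inf)
print(widen((1, 3), (1, 2)))  # (1, 3)
```

Because an unstable bound jumps straight to infinity, each bound can change at most once more, which is what forces iteration to terminate.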

  31. Chaotic iteration with widening • To terminate, it is necessary to traverse each loop only a finite number of times • It is sufficient to pass through I2 or I3 a finite number of times [Bourdoncle, 1990] • Thus widen at I3 since it is simpler • [Figure: dependency graph over I1, I2, I3 and I4]

  32. Termination for the decrement • I1 = [0,0] • I2 = (I1 ⊔ I3) ⊓ [−∞, 15] • I3 = I3 ∇ {n−1 | n ∈ I2} (note the fix) • I4 = (I1 ⊔ I3) ⊓ [16, ∞] • When I2 = [−1,0] and I3 = [−1,0], then I3 ∇ {n−1 | n ∈ I2} = [−1,0] ∇ [−2,−1] = [−∞,0]
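Combining the equations of slide 24 with widening at I3 gives a terminating analysis of the decrementing loop. The Python sketch below is illustrative only: it assumes Gauss-Seidel evaluation order, ⊥ as `None` and ∞ as `math.inf`, and all helper names are hypothetical. Without the `widen` call the lower bound of I3 would decrease forever; with it, iteration stabilises after a few rounds at the invariants shown on slide 24.

```python
import math
from typing import Optional, Tuple

Interval = Optional[Tuple[float, float]]  # None stands for ⊥
INF = math.inf


def join(d1: Interval, d2: Interval) -> Interval:
    if d2 is None:
        return d1
    if d1 is None:
        return d2
    return (min(d1[0], d2[0]), max(d1[1], d2[1]))


def meet(d1: Interval, d2: Interval) -> Interval:
    if d1 is None or d2 is None:
        return None
    lo, hi = max(d1[0], d2[0]), min(d1[1], d2[1])
    return (lo, hi) if lo <= hi else None


def widen(d1: Interval, d2: Interval) -> Interval:
    if d1 is None:
        return d2
    if d2 is None:
        return d1
    (l1, u1), (l2, u2) = d1, d2
    return (-INF if l2 < l1 else l1, INF if u1 < u2 else u1)


def shift(d: Interval, k: int) -> Interval:
    """{n + k | n in d}; note -inf + k stays -inf."""
    return None if d is None else (d[0] + k, d[1] + k)


def solve_decrement():
    i1 = i2 = i3 = i4 = None  # every program point starts at ⊥
    while True:
        old = (i1, i2, i3, i4)
        i1 = (0, 0)
        i2 = meet(join(i1, i3), (-INF, 15))
        i3 = widen(i3, shift(i2, -1))  # widening only at I3
        i4 = meet(join(i1, i3), (16, INF))
        if (i1, i2, i3, i4) == old:
            return i1, i2, i3, i4


print(solve_decrement())  # ((0, 0), (-inf, 0), (-inf, -1), None)
```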

  33. Widening dynamic data-structures • begin i := 0; p := nil; while (i < 16) do i := i + 1; p := new cons(i, p); {1: p ∈ cons(i, …cons(0,nil))} end • [Figure: successive abstractions of p built from cons, or, 0, 1, 2 and nil nodes]

  34. Depth-2 versus type-graph widening • [Figure: a depth-2 abstraction of the list beside a type graph, built from cons, or, 0, 1, 2, nil and any nodes] • Type-graph widening is more compact • Type-graph widening becomes difficult when a list contains lists as its elements • In constraint-based analysis, widening is dispensed with altogether

  35. (Malicious) research challenge • Read a survey paper to find an abstract domain that is ACC but has a maximal chain length of O(2ⁿ) • Construct a program with O(n) symbols that iterates through all O(2ⁿ) abstractions • Publish the program in IPL

  36. Not all numeric domains are convex • A set S ⊆ Rⁿ is convex iff for all x,y ∈ S it follows that {λx + (1−λ)y | 0 ≤ λ ≤ 1} ⊆ S • The 2 leftmost sets in R² are convex but the 2 rightmost sets are not.

  37. Are intervals or affine equations convex? • Suppose the values of n variables are represented by n intervals [l1,u1],…,[ln,un] • Suppose x = ⟨x1,…,xn⟩, y = ⟨y1,…,yn⟩ ∈ Rⁿ are described by the intervals • Then each li ≤ xi ≤ ui and each li ≤ yi ≤ ui • Let 0 ≤ λ ≤ 1 and observe z = λx + (1−λ)y = ⟨λx1 + (1−λ)y1, …, λxn + (1−λ)yn⟩ • Therefore li ≤ min(xi, yi) ≤ λxi + (1−λ)yi ≤ max(xi, yi) ≤ ui and convexity follows

  38. Arithmetic congruences are not convex • Elements of the arithmetic congruence (AC) domain take the form x − 2y ≡ 1 (mod 3), which describes integral values of x and y • More exactly, the AC domain consists of conjunctions of equations of the form c1x1 + … + cmxm ≡ c (mod n) where ci, c ∈ Z and n ∈ N • Incredibly, AC is ACC [IJCM, 30, 165--190, 1989]

  39. Research challenge • Søndergaard [FSTTCS,95] introduced the concept of an immediate fixpoint • Consider the following (groundness) dependency equations over the domain of Boolean functions ⟨Bool, ⊨, ∨⟩ • f1 = x ↔ (y ∧ z) • f2 = ∃t(∃x(∃z((u ↔ (t ∧ x)) ∧ (v ↔ (t ∧ z)) ∧ f4))) • f3 = ∃u(∃v((x ↔ u) ∧ (z ↔ v) ∧ f2)) • f4 = f1 ∨ f3 • Where ∃x(f) = f[x ↦ true] ∨ f[x ↦ false] thus ∃x(x ∨ y) = true and ∃x(x ∧ y) = y
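The definition of existential quantification used in these equations, ∃x(f) = f[x ↦ true] ∨ f[x ↦ false], can be checked by brute force over truth tables. In this Python sketch a Boolean function is assumed to be a callable over an assignment dictionary, and `exists` and `equivalent` are hypothetical helpers. Truth-table enumeration is only practical for a handful of variables; real groundness analysers often use more compact representations such as BDDs.

```python
from itertools import product
from typing import Callable, Dict, Sequence

Env = Dict[str, bool]
BoolFn = Callable[[Env], bool]


def exists(x: str, f: BoolFn) -> BoolFn:
    """∃x(f) = f[x ↦ true] ∨ f[x ↦ false]."""
    return lambda env: f({**env, x: True}) or f({**env, x: False})


def equivalent(f: BoolFn, g: BoolFn, variables: Sequence[str]) -> bool:
    """True iff f and g agree on every truth assignment to `variables`."""
    for bits in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, bits))
        if f(env) != g(env):
            return False
    return True


def x_or_y(env: Env) -> bool:
    return env["x"] or env["y"]


def x_and_y(env: Env) -> bool:
    return env["x"] and env["y"]


print(equivalent(exists("x", x_or_y), lambda env: True, ["x", "y"]))      # True
print(equivalent(exists("x", x_and_y), lambda env: env["y"], ["x", "y"]))  # True
```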

  40. The alternative tactic • The standard tactic is to apply iteration: • Søndergaard found that the system can be solved symbolically (like a quadratic) • This would be very useful for infinite domains for improved precision and predictability

  41. Combining analyses • Verifiers and optimisers are often multi-pass, built from several separate analyses • Should the analyses be performed in parallel or in sequence? • Analyses can interact to improve one another (the problem lies in the complexity of the interaction [Pratt])

  42. Pruning combined domains • Suppose that ∝1 ⊆ D1 × C and ∝2 ⊆ D2 × C, then how is D = D1 × D2 interpreted? • Then ⟨d1,d2⟩ ∝ c iff d1 ∝1 c ∧ d2 ∝2 c • Ideally, many ⟨d1,d2⟩ ∈ D will be redundant, that is, ∀c ∈ C . ¬(d1 ∝1 c ∧ d2 ∝2 c)

  43. Time versus precision from TOPLAS 17(1):28--44,1993

  44. The Galois framework • Abstract interpretation is often presented in terms of Galois connections

  45. Lattices – a prelude to Galois connections • Suppose ⟨S, ⊑⟩ is a poset • A mapping ⊔: S × S → S is a join (least upper bound) iff • a ⊔ b is an upper bound of a and b, that is, a ⊑ a ⊔ b and b ⊑ a ⊔ b for all a,b ∈ S • a ⊔ b is the least upper bound, that is, if c ∈ S is an upper bound of a and b, then a ⊔ b ⊑ c • The definition of the meet ⊓: S × S → S (the greatest lower bound) is analogous
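One way to convince oneself that the interval join from slide 16 satisfies this definition is to test it exhaustively on a finite fragment of the domain. The Python sketch below assumes bounds drawn from the hypothetical range −3 to 3, represents ⊥ as `None`, and uses the interval order of slide 25; it checks both the upper-bound and the least-upper-bound conditions for every pair of elements.

```python
from itertools import product
from typing import Optional, Tuple

Interval = Optional[Tuple[int, int]]  # None stands for ⊥


def leq(d1: Interval, d2: Interval) -> bool:
    """⊥ ⊑ d, and [l1,u1] ⊑ [l2,u2] iff l2 <= l1 and u1 <= u2."""
    if d1 is None:
        return True
    if d2 is None:
        return False
    return d2[0] <= d1[0] and d1[1] <= d2[1]


def join(d1: Interval, d2: Interval) -> Interval:
    if d2 is None:
        return d1
    if d1 is None:
        return d2
    return (min(d1[0], d2[0]), max(d1[1], d2[1]))


# A finite fragment of the domain: ⊥ plus every [lo,hi] with -3 <= lo <= hi <= 3
BOUNDS = range(-3, 4)
ELEMENTS = [None] + [(lo, hi) for lo in BOUNDS for hi in BOUNDS if lo <= hi]


def is_least_upper_bound(a: Interval, b: Interval, j: Interval) -> bool:
    is_upper = leq(a, j) and leq(b, j)
    is_least = all(leq(j, c) for c in ELEMENTS if leq(a, c) and leq(b, c))
    return is_upper and is_least


print(all(is_least_upper_bound(a, b, join(a, b))
          for a, b in product(ELEMENTS, repeat=2)))  # True
```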

  46. Complete lattices • A lattice ⟨S, ⊑, ⊔, ⊓⟩ is a poset ⟨S, ⊑⟩ equipped with a join ⊔ and a meet ⊓ • The join concept can often be lifted to sets by defining ⊔: ℘(S) → S iff • t ⊑ ⊔(T) for all T ⊆ S and for all t ∈ T • if t ⊑ s for all t ∈ T then ⊔(T) ⊑ s • If meet can also be lifted analogously, then the lattice is complete • A lattice that contains a finite number of elements is always complete

  47. A lattice that is not complete • A hyperplane in 2-d space is a line and in 3-d space is a plane • A hyperplane in Rⁿ is any space that can be defined by {x ∈ Rⁿ | c1x1 + … + cnxn = c} where c1,…,cn,c ∈ R • A halfspace in Rⁿ is any space that can be defined by {x ∈ Rⁿ | c1x1 + … + cnxn ≤ c} • A polyhedron is the intersection of a finite number of half-spaces

  48. Examples and non-examples in planar space

  49. Join for polyhedra • The join of polyhedra P1 and P2 in Rⁿ coincides with (the topological closure of) the convex hull of P1 ∪ P2

  50. The “join” of an infinite set of polyhedra • Consider the following infinite chain of regular polyhedra: • The smallest convex space that contains all these polyhedra is a circle (strictly, a disc), yet this is not polyhedral