1 / 53

A Tutorial on Abstract Interpretation as the Theoretical Foundation of CodeHawk 

A Tutorial on Abstract Interpretation as the Theoretical Foundation of CodeHawk . Arnaud Venet Kestrel Technology arnaud@kestreltechnology.com. Please see Notes View for annotations. Static Analysis. Code. List of defects. Static Analysis Tool. Confirmed property violations

kort
Download Presentation

A Tutorial on Abstract Interpretation as the Theoretical Foundation of CodeHawk 

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A Tutorial on Abstract Interpretation as the Theoretical Foundation of CodeHawk Arnaud Venet Kestrel Technology arnaud@kestreltechnology.com Please see Notes View for annotations. Kestrel Technology LLC

  2. Static Analysis Code List of defects Static Analysis Tool • Confirmed property violations • Indeterminates (possible violations) Parameters Kestrel Technology LLC

  3. Can we find all defects? • Classic undecidability results in Computer Science apply (halting problem) • We require soundness (no defects are missed) • A conservative approach is acceptable • Abstract Interpretation is the enabling theory in CodeHawk • Sound • Tunably precise • Scalable • “Generatively general” Kestrel Technology LLC

  4. Intuition All lines in the code Abstract Interpretation SW Defects Kestrel Technology LLC

  5. Abstract Interpretation • Computes an envelope of all data in the program • Mathematical assurance • Static analyzers based on Abstract interpretation are difficult to engineer • KT’s expertise: building scalable and effective abstract interpreters Kestrel Technology LLC

  6. Properties vs. defects • An application might be free of programming language defects but not carry the desired property • resource issues (memory, execution time) • separation • range of output data • Abstract Interpretation can address application-specific of properties as well as language defects Kestrel Technology LLC

  7. Objectives • Go over a detailed example • Understand how the technology works • Achievements and challenges in the engineering of abstract interpreters • What it means to build an analyzer based on Abstract Interpretation Kestrel Technology LLC

  8. Detailed example Kestrel Technology LLC

  9. Buffer overflow for(i = 0; i < 10; i++) { if(message[i].kind == SHORT_CMD) allocate_space (channel, 1000); else allocate_space (channel, 2000); } Can we exceed the channel’s buffer capacity? Kestrel Technology LLC

  10. i = 0 i < 10 ? stop message[i].kind == SHORT_CMD ? allocate_space (channel, 2000) allocate_space (channel, 1000) i++ Control Flow Graph Kestrel Technology LLC

  11. i = 0 i = 0 i < 10 ? i < 10 ? stop stop message[i].kind == SHORT_CMD ? message[i].kind == SHORT_CMD ? allocate_space (channel, 2000) allocate_space (channel, 2000) allocate_space (channel, 1000) allocate_space (channel, 1000) i++ i++ Analytic model of the code M = 0 i = 0 i < 10 ? stop M = M + 1000 M = M + 2000 i++ Kestrel Technology LLC

  12. Analysis process intuition • We mimic the execution of the program • We collect all possible data values for all variables for all possible traces Kestrel Technology LLC

  13. M M = 0 i = 0 i < 10 ? i stop M = M + 1000 M = M + 2000 i++ Analyzing the model Kestrel Technology LLC

  14. M M = 0 i = 0 i < 10 ? i stop M = M + 1000 M = M + 2000 i++ Initially Kestrel Technology LLC

  15. M M = 0 i = 0 i < 10 ? i stop M = M + 1000 M = M + 2000 i++ Loop initialization Kestrel Technology LLC

  16. M M = 0 i = 0 i < 10 ? i stop M = M + 1000 M = M + 2000 i++ Loop entry Kestrel Technology LLC

  17. M M = 0 i = 0 i < 10 ? i stop M = M + 1000 M = M + 2000 i++ Analyzing a branching (1) Kestrel Technology LLC

  18. M M M = 0 i = 0 i < 10 ? i i stop M = M + 1000 M = M + 2000 i++ Analyzing a branching (2) ? Kestrel Technology LLC

  19. M M = 0 i = 0 i < 10 ? i stop M = M + 1000 M = M + 2000 i++ Accumulating all possible values Kestrel Technology LLC

  20. Abstraction of point clouds • We want the analysis to terminate in reasonable time • We need a tractable representation of point clouds in arbitrary dimensions • Abstract Interpretation offers a broad choice of such representations • Example: convex polyhedra • Compute the convex hull of a point cloud Kestrel Technology LLC

  21. M M M = 0 i = 0 i < 10 ? i i stop M = M + 1000 M = M + 2000 i++ Analyzing a branching Kestrel Technology LLC

  22. M M = 0 i = 0 i < 10 ? i stop M = M + 1000 M = M + 2000 i++ Convex hull Kestrel Technology LLC

  23. M M M = 0 i = 0 i < 10 ? i i stop M = M + 1000 M = M + 2000 i++ Iterating the loop analysis Kestrel Technology LLC

  24. M M = 0 i = 0 i < 10 ? i stop M = M + 1000 M = M + 2000 i++ Building the loop invariant Kestrel Technology LLC

  25. M M = 0 i = 0 i < 10 ? i stop M = M + 1000 M = M + 2000 i++ Analyzing a branching Kestrel Technology LLC

  26. M M M = 0 i = 0 i < 10 ? i i stop M = M + 1000 M = M + 2000 i++ Analyzing a branching Kestrel Technology LLC

  27. M M = 0 i = 0 i < 10 ? i stop M = M + 1000 M = M + 2000 i++ Convex hull Kestrel Technology LLC

  28. M M M = 0 i = 0 i < 10 ? i i stop M = M + 1000 M = M + 2000 i++ Building the loop invariant Kestrel Technology LLC

  29. M M = 0 i = 0 i < 10 ? i stop M = M + 1000 M = M + 2000 i++ Keeping iterating… Kestrel Technology LLC

  30. Passing to the limit • We want this iterative process to end at some point • We need to converge when analyzing loops • After some iteration steps, we use a widening operation at loop entry to enforce convergence Kestrel Technology LLC

  31. Widening  • Let a1, a2, …an, … be a sequence of polyhedra, then the sequence • w1 = a1 • wn+1 = wn an+1 is ultimately stationary • The widening is a join operation i.e., a  a  b & b  a  b Kestrel Technology LLC

  32. Widening for intervals • [a, b]  [c, d] = [if c < a then - else a, if b < d then + else b] • Example: [10, 20]  [11, 30] = [10, +] Kestrel Technology LLC

  33. Widening for polyhedra • We eliminate the faces of the computed convex envelope that are not stable • Convergence is reached in at most N steps where N is the number of faces of the polyhedron at loop entry Kestrel Technology LLC

  34. M M M = 0 i = 0 i < 10 ? i i stop M = M + 1000 M = M + 2000 i++ Widening Kestrel Technology LLC

  35. M M = 0 i = 0 i < 10 ? i stop M = M + 1000 M = M + 2000 i++ After widening Kestrel Technology LLC

  36. Detecting convergence • Abstract iteration sequence • F1 = P (initial polyhedron) • Fn+1 = Fn if S(Fn) Fn Fn S(Fn) otherwise where S is the semantic transformer associated to the loop body • Theorem: if there exists N such that FN+1  FN, then Fn = FN for n > N. Kestrel Technology LLC

  37. M M = 0 i = 0 i < 10 ? i stop M = M + 1000 M = M + 2000 i++ Convergence The computation has converged Kestrel Technology LLC

  38. We are not done yet… • The analyzer has just proven that 1000 * i≤ M ≤ 2000 * i • But we have lost all information about the termination condition 0 ≤i ≤ 10 • Since we have obtained an envelope of all possible values of the variables, if we run the computation again we still get such an envelope • The point is that this new envelope can be smaller • This refinement step is called narrowing Kestrel Technology LLC

  39. M = 0 i = 0 i < 10 ? i M stop M = M + 1000 M = M + 2000 i 9 i++ Refinement M Kestrel Technology LLC

  40. M = 0 i = 0 i < 10 ? stop M = M + 1000 M = M + 2000 i++ Analyzing a branching i M i i 9 M i 9 Kestrel Technology LLC

  41. M = 0 i = 0 i < 10 ? stop M = M + 1000 M = M + 2000 i++ Convex hull M i 9 Kestrel Technology LLC

  42. M = 0 i = 0 i < 10 ? stop M = M + 1000 M = M + 2000 i++ Back to loop entry M i 10 1 M i Kestrel Technology LLC

  43. M = 0 i = 0 i < 10 ? stop M = M + 1000 M = M + 2000 i++ Narrowing M i M i 10 Kestrel Technology LLC

  44. M = 0 i = 0 i < 10 ? stop M = M + 1000 M = M + 2000 i++ Refined loop invariant M 10 i Kestrel Technology LLC

  45. M = 0 i = 0 i < 10 ? stop M = M + 1000 M = M + 2000 i++ Invariant at loop exit M 20,000 10,000 i  10 10 i Kestrel Technology LLC

  46. Interpretation of the results Space allocated in the buffer [ ] 25,000 15,000 Buffer size: 20,000 10,000 5,000 Certain buffer overflow No buffer overflow Possible buffer overflow Kestrel Technology LLC

  47. Achievements and challenges Kestrel Technology LLC

  48. Assured static analyzers for NASA • C Global Surveyor: verified array bound compliance for NASA mission-critical software • Mars Exploration Rovers: 550K LOC • Deep Space 1: 280K LOC • Mars Path Finder: 140K LOC • Pointer analysis: • International Space Station payload software (major bug found) Kestrel Technology LLC

  49. What we observe • No scalable, precise, and general-purpose abstract interpreter • PolySpace: • Handles all kinds of runtime errors • Decent precision (<20% indeterminates) • Doesn’t scale (topped out at  40K LOC) • Customization is the key • Specialized for a property or a class of applications • Manually crafted by experts Kestrel Technology LLC

  50. CodeHawk™ • Abstract Interpretation development platform/ static analyzer generator • Automated generation of customized static analyzers • Leverage from pre-built analyzers • Directly tunable by the end-user Kestrel Technology LLC

More Related