1 / 37

Using First-Order Theorem Provers in Data Structure Verification

Using First-Order Theorem Provers in Data Structure Verification. Charles Bouillaguet Ecole Normale Sup érieure, Cachan, France. Viktor Kuncak Martin Rinard MIT CSAIL. Implementing Data Structures is Hard. Often small, but complex code Lots of pointers Unbounded, dynamic allocation

sawyer
Download Presentation

Using First-Order Theorem Provers in Data Structure Verification

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Using First-Order Theorem Provers in Data Structure Verification Charles Bouillaguet Ecole Normale Supérieure, Cachan, France Viktor Kuncak Martin Rinard MIT CSAIL

  2. Implementing Data Structures is Hard • Often small, but complex code • Lots of pointers • Unbounded, dynamic allocation • Complex shape invariants • Dag • Properties involving arithmetic (ordering…) • Need strong invariants to guarantee correctness • e.g. lookup in ordered tree needs sortedness

  3. How to obtain reliable data structure implementations? • Approach • Prove that the program is correct • For all program executions (sound) • Verified properties: • Data structure operations do not crash • Data structure invariants are preserved • Data structure content is correctly updated

  4. Infrastructure • Jahob system for verifying data structure implementation • Kuncak, Wies, Zee, Rinard, Nguyen, Bouillaguet, Schmitt, Marnette, Bugrara • Analyzed programs: subset of Java • Specification : subset of Isabelle’s language

  5. Implementations of relations Add a binding Remove all bindings for a given key Test key membership Retrieve data bound to a key Test emptiness Verified implementations Linked list Ordered tree Hash table Summary of Verified Data Structures

  6. An Example : Ordered Trees • Implementation of a finite map • Operations • insert • lookup • remove • Representation invariants: • tree shaped (acyclicity, unique parent) • ordering constraints keyvalue right left

  7. Sample code publicstatic FuncTree update(int k, Object v, FuncTree t) { FuncTree new_left, new_right; Object new_data; int new_key; if (t==null) { new_data = v; new_key = k; new_left = null; new_right = null; } else { if (k < t.key) { new_left = update(k, v, t.left); new_right = t.right; new_key = t.key; new_data = t.data; } elseif (t.key < k) { … } else { new_data = v; new_key = k; new_left = t.left; new_right = t.right; } } FuncTree r = new FuncTree(); r.left = new_left; r.right = new_right; r.data = new_data; r.key = new_key; return r; }

  8. Sample code publicstatic FuncTree update(int k, Object v, FuncTree t) /*: requires "v ~= null” ensures "result..content = t..content - {(x,y). x=k} + {(k,v)} */ { FuncTree new_left, new_right; Object new_data; int new_key; if (t==null) { new_data = v; new_key = k; new_left = null; new_right = null; } else { if (k < t.key) { new_left = update(k, v, t.left); new_right = t.right; new_key = t.key; new_data = t.data; } elseif (t.key < k) { … } else { new_data = v; new_key = k; new_left = t.left; new_right = t.right; } } FuncTree r = new FuncTree(); r.left = new_left; r.right = new_right; r.data = new_data; r.key = new_key; //: "r..content" := "t..content - {(x,y). x=k} + {(k,v)}"; return r; } no null dereferences 3 lines spec 30 lines code postcondition holds and invariants preserved

  9. Ordered tree: interface public ghost specvar content :: "(int * obj) set" = "{}"; publicstatic FuncTree empty_set()ensures "result..content = {}" publicstatic FuncTree add(int k, Object v, FuncTree t)requires "v ~= null & (ALL y. (k,y) ~: t..content)”ensures "result..content = t..content Un {(k,v)}” publicstatic FuncTree update(int k, Object v, FuncTree t)requires "v ~= null”ensures "result..content = t..content - {(x,y). x=k} + {(k,v)}” publicstatic Object lookup(int k, FuncTree t) ensures "((k, result) : t..content) | (result = null & (ALL v. (k,v) ~: t..content))” publicstatic FuncTree remove(int k, FuncTree t)ensures "result..content = t..content - {(x,y). x=k}”

  10. Representation Invariants publicfinalclass FuncTree{ privateint key;private Object data;private FuncTree left, right; /*: public ghost specvar content :: "(int * obj) set"; invariant ("content definition") "this ~= null --> content = {(key, data)} Un left..content Un right..content" invariant ("null implies empty") "this = null --> content = {}" invariant ("left children are smaller") "ALL k v. (k,v) : left..content --> k < key” invariant ("right children are bigger") "ALL k v. (k,v) : right..content --> k > key" */ abstract set-valued field tuples implicit universal quantification over this equality between sets arithmetic explicit quantification

  11. How could these properties be verified?

  12. Standard Approach eauto ; intros . intuition ; subst . apply Extensionality_Ensembles. unfold Same_set. unfold Included. unfold In. unfold In in H1. intuition. destruct H0. destruct (eq_nat_dec x1 ArraySet_size).subst. rewrite arraywrite_match in H0 ; auto. intuition. subst. apply Union_intror. auto with sets. assert (x1 < ArraySet_size). omega. clear n. apply Union_introl. rewrite arraywrite_not_same_i in H0.unfold In. exists x1. intuition.omega. inversion H0 ; subst ; clear H0. unfold In in H3. destruct H3. exists x1. intuition. rewrite arraywrite_not_same_i. intuition ; omega. omega. exists ArraySet_size. intuition. inversion H3. subst. rewrite arraywrite_match ; trivial. • Transform program into a logic formula • Using weakest precondition • The program is correct iff the formula is valid • Prove the formula • Very difficult formulas: interactively (Coq, Isabelle) • Decidable classes: automated (MONA, CVCL, Omega) • This talk: difficult formulas in automated way :) • low efficiency • 1 line per grad student-minute • parallelization looks non-trivial

  13. Formulas in Jahob • Very expressive specification language • Higher-Order features • How to prove formulas automatically? • Convert them to something simpler • Decidable classes • First-Order Logic

  14. Automated reasoning in Jahob

  15. Why FOL? • Existing theorem provers: • SPASS, E, Vampire, Theo, Prover9, … • continuously improving (yearly competition) • Effective on formulas with short proofs • Handle nicely formulas with quantifiers

  16. HOL  FOL • Ideas : • avoid axiomatizing rich theories • Translate what can naturally be expressed in FOL • soundly approximate the rest • Sound, incomplete approach • Full details in long version of the paper • (x,y) ∈ z.content ⇢ Content(x,y,z) • w.f := y ⇢(x=y ⋀ w=v) ⋁ (x ≠ y ⋀ w=f(y) ) • λx.E = λx.F⇢∀x. E=F • …

  17. Arithmetic • Numbers are uninterpreted constants in FOL • Provers do not know that 1+1=2 ! • Still need to reason about arithmetic… • Our Solution • Provide partial, incomplete axiomatization • Still cannot deduce 1+1=2 ! • comparison between constants in formula • Satisfactory results in practice • ordering of elements in tree • array bound checks

  18. Observation • Most formulas are easy to prove • ie in no measurable time • have very short proofs (in # of resolution step) • Problem often concentrated in a small number that take very long to prove • We applied two existing techniques to make them easier • Eliminating type/sort information • Filtering unnecessary assumptions

  19. Sort Information • Specification language has sorts • Integers • Objects • Boolean • Translate to unsorted FOL ∀(x : Obj). P(x) ⇣ ∀x. Obj(x) ⇒P(x)

  20. Sort Information • Encoding sort information • bigger formulas • longer proofs • Formulas become harder to prove • Temptation to omit sort information…

  21. Effect on hard formulas • Formulas that take more than 1s to prove, from the Tree implementation (SPASS)

  22. Omitting Sorts (cont’d) • Great speed-up (more than x10 sometimes) ! • However: ∀ (x y:S). x = y ∃ (x y:T). x ≠ y • Satisfiable with sorts (S={a}, T={b,c})… • Unsatisfiable without! • Omitting sort guards breaks soundness!!! • Possible workaround: type-check generated proof • When it is possible to skip type-checking ?

  23. Omitting Sorts Result We proved the following Theorem. Suppose that • Sorts are pair-wise disjoint (no sub-sorting) • Sorts have the same cardinality Then omitting sort guards is sound and complete This justify this useful optimization

  24. Assumption Filtering • Provers get confused by too many assumptions • Lots of useless assumptions • Hardest shown benchmark needs 12 out of 56 • Big benchmark: on average 33% necessary • Assumption filtering • Try to eliminate irrelevant assumptions automatically • Give a score to assumption based on relevance

  25. Experimental results

  26. Verification effort • Decreased as we improved the system • functional list was easy • a few days for trees • two hours for simple hash table • FOL : Currently most usable method for these kind of data structures

  27. Related work • Interactive Provers – Isabelle, Coq, HOL, PVS, ACL2 • First-Order ATP • Vampire – Voronkov [04] • SPASS – Weidenbach [01] • E – Shultz [IJCAR04] • Program Checking • ESC/Java2 – Kiniry, Chalin, Hurlin • Krakatoa – Marche, Paulin-Mohring, Urbain [03] • Spec# – Barnett, DeLine, Jacobs, Fähndrich, Leino, Schulte, Venter [05] • Hob system: verify set implementations (we verify relations) • Shape analysis • PALE - Møller and Schwartzbach [PLDI01] • TVLA - Sagiv, Reps, and Wilheim [TOPLAS02] • Roles - Kuncak, Lam, and Rinard [POPL02]

  28. Multiple Provers - Screenshot

  29. Conclusion • Jahob verification system • Automation by translation HOLFOL • omitting sorts theorem gives speedup • filtering automates selection of assumptions • Promising experimental results • strong properties: correct implementation • Do not crash • operations correctly update the content, clarifies behavior in case of duplicate keys, … • representation invariants preserved (ordering, treeness, each element is in appropriate bucket) • relatively fast • verification effort much smaller than using interactive provers

  30. Thank you Formal Methods are the Future of computer Science. Always have been… Always will be. Questions ?

  31. Converting to GCL • Conditionnal statement: easy • [| ifcondthentbranchelsefbranch |] = (Assume cond; [| tbranch|] )  (Assume !cond; [| fbranch|] ) • Procedure calls: • Could inline (potentially exponential blowup) • Desugaring (modularity) : • [| r = CALL m(x, y, z) |] = Assert (m’s precondition); Havoc r; Havoc {vars modified by m} ; Assume (m’s postcondition)

  32. Converting to GCL (cont’d) • Loops: invariant required • [| while/*: invariant */ (condition) {lbody} |] = assert invariant; havoc vars(lbody); assume invariant; ((assume condition; [| lbody |]; assert invariant; assume false)  (assume !condition)) invariant hold initially no assumptions on variables except that invariant hold condition hold invariant is preserved no need to verify anything more or condition do not hold and execution continues

  33. Verification condition for remove ((((fieldRead Pair_data null) = null) & ((fieldRead FuncTree_data null) = null) & ((fieldRead FuncTree_left null) = null) & ((fieldRead FuncTree_right null) = null) & (ALL (xObj::obj). (xObj : Object)) & ((Pair Int FuncTree) = {null}) & ((Array Int FuncTree) = {null}) & ((Array Int Pair) = {null}) & (null : Object_alloc) & (pointsto Pair Pair_data Object) & (pointsto FuncTree FuncTree_data Object) & (pointsto FuncTree FuncTree_left FuncTree) & (pointsto FuncTree FuncTree_right FuncTree) & comment ''unalloc_lonely'' (ALL (x::obj). ((x ~: Object_alloc) --> ((ALL (y::obj). ((fieldRead Pair_data y) ~= x)) & (ALL (y::obj). ((fieldRead FuncTree_data y) ~= x)) & (ALL (y::obj). ((fieldRead FuncTree_left y) ~= x)) & (ALL (y::obj). ((fieldRead FuncTree_right y) ~= x)) & ((fieldRead Pair_data x) = null) & ((fieldRead FuncTree_data x) = null) & ((fieldRead FuncTree_left x) = null) & ((fieldRead FuncTree_right x) = null)))) & comment ''ProcedurePrecondition'' (True & comment ''FuncTree_PrivateInv content definition'' (ALL (this::obj). (((this : Object_alloc) & (this : FuncTree) & ((this :: obj) ~= null)) --> ((fieldRead (FuncTree_content :: (obj => ((int * obj)) set)) (this :: obj)) = (({((fieldRead (FuncTree_key :: (obj => int)) (this :: obj)), (fieldRead (FuncTree_data :: (obj => obj)) (this :: obj)))} Un (fieldRead (FuncTree_content :: (obj => ((int * obj)) set)) (fieldRead (FuncTree_left :: (obj => obj)) (this :: obj)))) Un (fieldRead (FuncTree_content :: (obj => ((int * obj)) set)) (fieldRead (FuncTree_right :: (obj => obj)) (this :: obj))))))) & comment ''FuncTree_PrivateInv null implies empty'' (ALL (this::obj). (((this : Object_alloc) & (this : FuncTree) & ((this :: obj) = null)) --> ((fieldRead (FuncTree_content :: (obj => ((int * obj)) set)) (this :: obj)) = {}))) & comment ''FuncTree_PrivateInv no null data'' (ALL (this::obj). (((this : Object_alloc) & (this : FuncTree) & ((this :: obj) ~= null)) --> ((fieldRead (FuncTree_data :: (obj => obj)) (this :: obj)) ~= null))) & comment ''FuncTree_PrivateInv left children are smaller'' (ALL (this::obj). (((this : Object_alloc) & (this : FuncTree)) --> (ALL k. (ALL v. (((k, v) : (fieldRead (FuncTree_content :: (obj => ((int * obj)) set)) (fieldRead (FuncTree_left :: (obj => obj)) (this :: obj)))) --> (intless k (fieldRead (FuncTree_key :: (obj => int)) (this :: obj)))))))) & comment ''FuncTree_PrivateInv right children are bigger'' (ALL (this::obj). (((this : Object_alloc) & (this : FuncTree)) --> (ALL k. (ALL v. (((k, v) : (fieldRead (FuncTree_content :: (obj => ((int * obj)) set)) (fieldRead (FuncTree_right :: (obj => obj)) (this :: obj)))) --> ((fieldRead (FuncTree_key :: (obj => int)) (this :: obj)) < k))))))) & comment ''t_type'' (((t :: obj) : (FuncTree :: obj set)) & ((t :: obj) : (Object_alloc :: obj set)))) --> ((comment ''TrueBranch'' (((t :: obj) = null) :: bool) --> (comment ''ProcedureEndPostcondition'' ((((fieldRead (FuncTree_content :: (obj => ((int * obj)) set)) (null :: obj)) = ((fieldRead (FuncTree_content :: (obj => ((int * obj)) set)) (t :: obj)) - {p. (EX x y. ((p = (x, y)) & (x = (k :: int))))})) & (ALL (framedObj::obj). (((framedObj : Object_alloc) & (framedObj : FuncTree)) --> ((fieldRead FuncTree_content framedObj) = (fieldRead FuncTree_content framedObj))))) & comment ''FuncTree_PrivateInv content definition'' (ALL (this::obj). (((this : Object_alloc) & (this : FuncTree) & ((this :: obj) ~= null)) --> ((fieldRead (FuncTree_content :: (obj => ((int * obj)) set)) (this :: obj)) = (({((fieldRead (FuncTree_key :: (obj => int)) (this :: obj)), (fieldRead (FuncTree_data :: (obj => obj)) (this :: obj)))} Un (fieldRead (FuncTree_content :: (obj => … And 200 more kilobytes… Infeasible to prove directly

  34. Splitting heuristic • Verification condition is big conjunction • conjunctions in postcondition • proving each invariant • proving each branch in program • Solution: split VC into individual conjuncts • Prove each conjunct separately • Each conjunct has form H1 /\ … /\ Hn Gi Tree.Remove has 230 such conjuncts • How do we prove them?

  35. Detupling (cont’d) • Complete rules:

  36. Handling of Fields (cont’d) • We dealt with field updates • New function expressed in terms of old one • Base case: field variables • Natural encoding in FOL using functions: x = y.f ! x = f(y)

  37. Future work • Verify more examples • balanced trees • fancy priority queues (binomial, Fibonacci, …) • hash table with dynamic resizing • hash function • verify clients of data structures • Improve assumption filtering • take rarity of symbols into account • check for occurring polarity • …

More Related