
Ditto: Speeding Up Runtime Data Structure Invariant Checks

Ditto is a tool that speeds up runtime data structure invariant checks in Java programs by incrementalizing them automatically. After the first full run of a check, it tracks changes to the data the check reads and reruns only the parts of the check whose inputs have changed.


Presentation Transcript


  1. Ditto: Speeding Up Runtime Data Structure Invariant Checks • AJ Shankar and Ras Bodik, UC Berkeley

  2. Motivation: A Debugging Scenario • Buggy program: a large-scale web application in Java • Primary data structure: hashMap of shopping carts • Carts are modified throughout code • Bug: hashMap acting weird: carts disappearing, etc. • Hypothesis: cart modification violates hashCode() invariance

  3. How to Check the Hypothesis? • Debugger facilities inadequate • Idea: write a runtime check • Iterates over buckets, checks hashCode() of each cart in bucket • Run check frequently to pinpoint error
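
A check of this kind might look like the following minimal sketch. The `Cart` and `CartTable` classes are hypothetical stand-ins for the application's types (the slides do not show them), and the table is a simple chained hash table rather than `java.util.HashMap`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cart type. Its hash code depends on a mutable field,
// which is what makes the bug in the scenario possible.
class Cart {
    int id;
    Cart(int id) { this.id = id; }
    @Override public int hashCode() { return Integer.hashCode(id); }
    @Override public boolean equals(Object o) {
        return o instanceof Cart && ((Cart) o).id == id;
    }
}

// Hypothetical chained hash table standing in for the application's map.
class CartTable {
    final List<List<Cart>> buckets = new ArrayList<>();

    CartTable(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) buckets.add(new ArrayList<>());
    }

    int indexFor(Cart c) { return Math.floorMod(c.hashCode(), buckets.size()); }

    void add(Cart c) { buckets.get(indexFor(c)).add(c); }

    // The invariant: every cart sits in the bucket its current hash code selects.
    // This walks the whole table, which is why running it often is slow.
    boolean checkHashInvariant() {
        for (int i = 0; i < buckets.size(); i++) {
            for (Cart c : buckets.get(i)) {
                if (indexFor(c) != i) return false;
            }
        }
        return true;
    }
}
```

Mutating `id` on a cart that is already in the table leaves it in its old bucket, so the next call to `checkHashInvariant()` reports the violation.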

  4. Problem • The check is slow! (100x slowdown) • Rerunning the program is now a problem • Furthermore, what if bug isn’t reproducible? • Run the program with the check on entire test suite? • Infeasible.

  5. Our Tool: Ditto • Ditto speeds up data structure invariant checks • Speedup is usually asymptotic in the size of the data structure • Hash table: 10x speedup at 1600 elements • What invariant checks can Ditto handle? • Side-effect-free: cannot return fresh mutable objects • Recursive: not an inherent limitation of the algorithm

  6. Basic Observation: Incrementalize • Invariant checks the entire data structure … • … but once checked, a local change can be (re)checked locally! • So, first establish invariant, then incrementally check changes • Example invariant: “Hash code of each cart in table corresponds to containing bucket.”

  7. A New Domain • Existing incrementalizers: general purpose but not automatic [Acar PLDI 2006] • User must annotate the program • For functional programs • Other caveats (conversion to CPS, etc.) • Ditto is automatic in this domain • Functional invariant checks in an imperative Java setting • No user annotations • Allows arbitrary heap updates outside the invariant • A simple bytecode-to-bytecode implementation

  8. Ditto Algorithm Overview • First run of check: construct graph of the computation • Stores function calls, concrete inputs • Track changes to computation inputs • Subsequent runs of check: rerun only subcomputations with changed inputs • Incrementally update computation graph = incrementally compute invariant check

  9. Example Invariant Check • Ensures a tree is locally ordered

      boolean isOrdered(Tree t) {
          if (t == null) return true;
          if (t.left != null && t.left.value >= t.value) return false;
          if (t.right != null && t.right.value <= t.value) return false;
          return isOrdered(t.left) && isOrdered(t.right);
      }

  10. 1. Constructing a Computation Graph • Purpose of computation graph: • For unchanged parts of data structure, reuse existing results • For changed parts, identify parts of check that need to be rerun • Graph stores the initial check run: • Node = function invocation, along with its • Inputs: concrete formal arguments and concrete heap accesses • Return value • Same inputs = can reuse return value • Changed inputs = must rerun

  11. 1. Constructing a Computation Graph • During first check run, by instrumentation • Each node is created with its concrete formal argument, calls its children, and records its return value • Heap reads from a.value, a.left, a.right, a.left.value, a.right.value are remembered • [Diagram: the heap (objects P, A, B, C) beside the computation graph (isOrdered(P), isOrdered(A), isOrdered(B), isOrdered(C))]

  12. 2. Detecting Changed Inputs • Inputs to check that could change between runs: • Arguments – easy to detect (passed to the check) • Heap values – harder (could be modified anywhere in code) • Selective write barriers • Statically determine which fields are read in the check • Barriers collect changed heap inputs used by check • In example: add write barriers for all writes into fields: • Tree.left • Tree.right • Tree.value • [Code shown alongside: the body of isOrdered(), which reads t.left, t.right, t.value, t.left.value and t.right.value]
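
The effect of a selective write barrier can be sketched by hand. Ditto inserts its barriers by bytecode instrumentation; the setters and the `WriteLog` class below are hypothetical stand-ins used only to show what a barrier records, not Ditto's actual mechanism.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical log of objects whose checked fields were written since the last check.
// Identity-based, so two distinct objects never collapse into one entry.
final class WriteLog {
    static final Set<Object> changed =
            Collections.newSetFromMap(new IdentityHashMap<>());

    static void record(Object target) { changed.add(target); }
}

class Tree {
    private int value;
    private Tree left, right;

    // Constructor writes are not logged: no check has read a brand-new object yet.
    Tree(int value) { this.value = value; }

    int getValue() { return value; }
    Tree getLeft() { return left; }
    Tree getRight() { return right; }

    // Each setter plays the role of an instrumented write to a field
    // that isOrdered() reads (Tree.value, Tree.left, Tree.right).
    void setValue(int v) { WriteLog.record(this); value = v; }
    void setLeft(Tree t) { WriteLog.record(this); left = t; }
    void setRight(Tree t) { WriteLog.record(this); right = t; }
}
```

Writes to fields the check never reads would need no barrier at all, which is the point of choosing the barriers statically from the check's code.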

  13. 3. Rerunning the Invariant isOrdered() • Data structure modification: Add node N, remove node F • [Diagram: the tree before and after the modification]

  14. 3. Rerunning the Invariant • Goal: Incrementally update computation graph • Graph must look as if check was run afresh • [Diagram: tree with new modifications beside the computation graph from the last run, whose root isOrdered(A) returned true; callout “Write barriers say…”]

  15. 3. Rerunning the Invariant • isOrdered(A) is first node that needs to be rerun • Parent inputs haven’t changed (functions are side-effect-free) • Rerunning exposes new node N • What happens at isOrdered(B)? • [Diagram: modified tree beside the partially updated computation graph]

  16. 3. Rerunning the Invariant • isOrdered(B) has same formal args, heap inputs • We’d like to reuse its previous result • And end this subcomputation • Problem: isOrdered(B) also depends on return values of its callees • Which might change, since isOrdered(D) will be rerun • So we can’t be sure isOrdered(B)’s result will be the same! • [Diagram: modified tree beside the partially updated computation graph]

  17. Optimistic Memoization • Don’t want to rerun all nodes between B and D • Solution: we optimistically assume that isOrdered(B) will return the same result • Invariant checks generally do! (e.g. “success”) • Check assumption when we rerun isOrdered(D) • For now, reuse previous result, finish up A • A returns previous result (true), so finished here • [Diagram: computation graph rooted at A]

  18. 3. Rerunning the Invariant • Now we rerun isOrdered(D) • Reuse previous result of isOrdered(E), (G) • No further changes so no need for optimism • isOrdered(F) pruned from graph • isOrdered(D) returns previous result (true) • So optimistic assumption was correct • Computations around isOrdered(A) all correct • [Diagram: computation graph rooted at A]

  19. What If isOrdered(D) Returned false? • Result propagated up graph • Continues as long as return val differs • In this case, root node of graph is reached • Result for entire computation is changed • Automatically corrects optimistic assumptions • [Diagram: computation graph with false propagating from D up to the root A]

  20. Result of Algorithm • We’ve incrementally updated computation graph to reflect updated data structure • Even with circular dependencies throughout graph, only reran 3 nodes • Result of computation is result of root node (true) • Graph is ready for next incremental update • [Diagram: modified tree beside the fully updated computation graph, root result true]
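
The walkthrough above can be approximated in a short, heavily simplified sketch. It is not Ditto's implementation, and it makes several assumptions: `noteWrite` is a hand-called stand-in for the instrumented write barriers, the structure is a tree in which each object appears once, both callees are always invoked (no `&&` short-circuit), and pruning removes only the directly dropped callee. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class Tree {
    int value;
    Tree left, right;
    Tree(int value) { this.value = value; }
}

class IncrementalOrderCheck {
    // One graph node per isOrdered(arg) invocation, with its cached results.
    static final class Node {
        final Tree arg;
        Node parent, left, right;
        boolean local;   // outcome of the two comparisons in the function body
        boolean result;  // cached return value
        Node(Tree arg) { this.arg = arg; }
    }

    private final Map<Tree, Node> nodes = new IdentityHashMap<>();
    private final Map<Tree, Set<Node>> readers = new IdentityHashMap<>();
    private final Set<Node> dirty = new LinkedHashSet<>();
    private Node root;
    int reruns; // number of function bodies executed, for illustration only

    // Stand-in for a write barrier: call after writing any field of 'written'.
    void noteWrite(Tree written) {
        dirty.addAll(readers.getOrDefault(written, Collections.emptySet()));
    }

    boolean check(Tree t) {
        if (root == null || root.arg != t) { // first run: build the whole graph
            nodes.clear(); readers.clear(); dirty.clear();
            root = call(t, null);
            return value(root);
        }
        for (Node n : new ArrayList<>(dirty)) {
            if (!dirty.contains(n) || nodes.get(n.arg) != n) continue; // already rerun, or pruned
            boolean before = n.result;
            rerun(n);
            // Verify earlier optimism: push a changed result up while it keeps changing.
            for (Node cur = n; cur.parent != null && cur.result != before; cur = cur.parent) {
                Node p = cur.parent;
                before = p.result;
                p.result = p.local && value(p.left) && value(p.right);
            }
        }
        dirty.clear();
        return value(root);
    }

    private Node call(Tree t, Node parent) {
        if (t == null) return null; // isOrdered(null) is trivially true
        Node n = nodes.get(t);
        boolean fresh = (n == null);
        if (fresh) { n = new Node(t); nodes.put(t, n); }
        n.parent = parent;
        if (fresh || dirty.contains(n)) rerun(n); // otherwise: optimistic reuse of n.result
        return n;
    }

    private void rerun(Node n) {
        dirty.remove(n);
        reruns++;
        Tree t = n.arg;
        Node oldLeft = n.left, oldRight = n.right;
        for (Tree read : new Tree[] { t, t.left, t.right }) { // objects whose fields the body reads
            if (read != null) readers.computeIfAbsent(read, k -> new LinkedHashSet<>()).add(n);
        }
        n.local = (t.left == null || t.left.value < t.value)
               && (t.right == null || t.right.value > t.value);
        n.left = n.local ? call(t.left, n) : null;
        n.right = n.local ? call(t.right, n) : null;
        prune(oldLeft, n);
        prune(oldRight, n);
        n.result = n.local && value(n.left) && value(n.right);
    }

    // Drop a callee that the rerun no longer invokes.
    private void prune(Node old, Node n) {
        if (old == null || old == n.left || old == n.right || old.parent != n) return;
        old.parent = null;
        if (nodes.get(old.arg) == old) nodes.remove(old.arg);
    }

    private static boolean value(Node n) { return n == null || n.result; }
}
```

A caller would run `check(root)` once to build the graph, then call `noteWrite(obj)` after each field write and `check(root)` again; only nodes that read a written object are rerun, and a changed return value is pushed toward the root only as far as it keeps changing.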

  21. Evaluation • Ran on a number of common data structure invariants, two real-world examples • Most complex invariant: red-black trees • Tree is globally ordered • Same # of black nodes to leaf • Other RB properties (Black follows Red, etc.) • We were unable to incrementalize this check by hand!

  22. Kernel Results

  23. Real-world Examples • Tetris-like game Netcols • Invariant: no “floating” jewels in grid • With check, main event loop took 80 ms, noticeably laggy • Result: event loop down to 15 ms with Ditto • JavaScript obfuscator • Invariant: no excluded keywords (based on a set of criteria) in renaming map

  24. Summary • Results: • Automatic incrementalization made practical • For checks in Java programs • Data structure checks viable for development environment • Made possible by • Selection of an interesting domain • Optimistic memoization • Web: http://www.cs.berkeley.edu/~aj/cs/ditto/
