1 / 33

Modeling The Heap Optimization and Software Engineering

Modeling The Heap Optimization and Software Engineering. Mark Marron IMDEA-Software (Madrid, Spain) mark.marron@imdea.org. Motivation. Many optimization and software engineering applications utilize heap information Optimization Parallelization Memory management

sissy
Download Presentation

Modeling The Heap Optimization and Software Engineering

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Modeling The HeapOptimization and Software Engineering Mark Marron IMDEA-Software (Madrid, Spain) mark.marron@imdea.org

  2. Motivation • Many optimization and software engineering applications utilize heap information • Optimization • Parallelization • Memory management • Software Engineering and Debugging • Interactive debugging of heap data structures • Tainted or Information Flow analysis through heap objects • Existing heap analysis techniques often inapplicable due to imprecision (points-to style) or computational cost (shape analysis)

  3. Objective • General purpose model capable of supporting these applications: • Must model the range of fundamental properties needed by our target application domains • Cannot place significant restrictions on the program being analyzed • Must be computationally efficient and compact • Willing to sacrifice some precision • Our focus is on providing general classes of information for others to build on

  4. Desired Information (Structural) • Connectivity • Reachability • Interference • Paths • Logical data structures (Regions) • Group related sections of the heap • Keep unrelated sections of the heap separate • Shape of a region • Cycle, Dag, Tree, List, Singleton

  5. Desired Information (Intrinsic) • Identity • Given an object at point p, track the flow of this object at all later program points q • Heap Based Use-Mod • Find all program points a given memory location may be read/written at • Escape • Objects that are freshly allocated • Objects that escape the local call context

  6. Model Extraction (Static) • The theory of Abstract Interpretation provides framework for static program analysis • Takes a lattice (set) of abstract models, each of which represents a set of concrete program states • Computes, for each program point, an abstract model that represents all possible heap states that may occur at the program point

  7. Model Extraction (Dynamic) • A surprising benefit of building a model suitable for abstract interpretation that is the model also works for dynamic analysis: • Debugging • Specification mining/checking • Given a snapshot of the single current program heap compute the corresponding abstract model

  8. Current Status • Handle a large fragment of Java 1.5 and commonly used libraries (lang, util, io) • Precisely model (in static and dynamic analyses) the properties of interest • Can efficiently (on the order of seconds) statically analyze moderate sized programs (~15KLOC to date) • Have simple implementation of debugger and specification miner (a few seconds to compute models of Multi-MB heaps)

  9. Model Overview • Base on storage shape graph • Nodes represent sets of objects (or recursive data structures), edges represent sets of pointers • Has natural representation for many of the properties we are interested in • Easy to visualize • Efficient to compute with • Annotate nodes and edges with additional instrumentation properties

  10. Logical Structure Identification • Key issue in shape graph is how to pick nodes that abstract concrete objects • Too many nodes is confusing and computationally expensive • Too few nodes leads to imprecision (as a single node must represent multiple logical structures) • Often done via allocation site or types • Solution: nodes are related sets of objects • Recursive type information (recursive vs. non-recursive types) • Objects stored in the same collection, array or structure

  11. Concrete Expression Heap

  12. Abstract Expression Heap

  13. Layout • Most general way objects in a region are connected • (S)ingleton: no pointers between any objects • (L)ist: may contain a linear List or simpler structures • (T)ree: may contain a Tree or simpler structures • (D)ag: may contain a Dag or simpler structures • (C)ycle: may a cyclic or simpler structures • E.g. A region with a (T)ree layout may contain tree, list or singleton structures, but no dag or cyclic structures.

  14. Layout Example

  15. Sharing • Edges abstract sets of references (variable references or pointers) • Heap Graph has ability to track some sharing properties but insufficiently precise to model many important properties • E.g. given an array of objects does any object appear multiple times? • May occur between references abstracted by same edge or two different edges • Interference: abstracted by same edge • Connectivity: abstracted by different edges

  16. Interference • Does a singleedge abstract only references with disjoint targets or may some of these references alias/related? • Edge e is: • non-interfering: all pairs of references r1, r2 in γ(e) must be unrelated (refer to disjoint data structures). • interfering: may be a pair ofreferences r1, r2 in γ(e) that are related (refer to the same data structure).

  17. Interference Example

  18. Connectivity • Connectivity: Do twoedges abstract sets of references with disjoint targets or do some of these references alias/related? • Edges e1, e2 are: • disjoint: all pairs of references r1 in γ(e1), r2 in γ(e2) are unrelated (refer to disjoint data structures). • connected: may be pair of references r1 in γ(e1), r2 in γ(e2) that are related (refer to the same data structure).

  19. Sharing Example

  20. Intrinsic Properties • Object Identity • Across each method call track how data structures are split, merged, reconnected • Field Sensitive Use/Mod • For each method track the fields for the objects in each region (node) and if the field is used/modified in the method • At each line track which regions (nodes) and fields may be used modified • Object Allocation • Track which objects are allocated in this scope and which may escape

  21. Identity and Read/Write 1 void swap(Pair p) { 2 Data temp = p.first; 3 p.first = p.second; 4 p.second = temp; 5 }

  22. Case Study and Evaluation

  23. Case Study: BH (Barnes-Hut) • N-Body simulation in 3-dimensions • Uses Fast Multi-Pole method with space decomposition tree • For nearby bodies use naive n2 algorithm • For distant bodies compute center of mass of many bodies and treat as single point mass • Updates space decomposition tree to account for body motion • Has not been analyzed with other existing (precise) heap analysis methods

  24. Regions and Basic Structure

  25. BH Optimizations Memory • Inline Double[] into MathVector objects, 23% serial speedup 37% memory use reduction

  26. Body Force Calculation Loop Iterator b = this.bodyTabRev.iterator(); while(b.hasNext()) ((Body) b.next()).hackGravity(rsize, root);

  27. BH Optimizations TLP • TLP update loop over bodyTabRev, factor 3.09 speedup on quad-core machine

  28. Static Analysis Statistics

  29. Dynamic Analysis Statistics

  30. Summary • Have the core of a practical analysis system • Performance: • Analyze moderate size non-trivial Java programs • 15KLoc programs in a 114 seconds using ~120MB of memory (average 2 contexts per method) • Debugging abstraction efficiently compresses large heaps to compact abstract representation • Accuracy: • Precisely represent connectivity, sharing, shape properties + region, frame, and dependence information • Qualitatively Useful • Used results in multiple optimization domains and in debugging applications

  31. Current and Future Work • Currently working on transforming core concepts from prototype to robust tools • Implementing static analysis for MSIL bytecode + core libraries • Implementing full featured debugger support and specification mining (for both MSIL and Java) • Enrich the model • Wider range of properties (what is useful in general) • Allow user to easily extend with new properties • Apply information in more client applications • Additional optimization domains • Support for programmer assisted refactorings

  32. Interpreter Benchmark • Simple interpreter and debug environment for large subset of Java language • 14,000+ Loc (in normalized form), 90 Classes • Additional 1500 Loc for specialized standard library handling stubs • Large recursive call structures, large inheritance trees with numerous virtual method implementations • Wide range of data structure types, extensive use of java.util collections, uses both shared and unshared structures

More Related