
An Efficient Inclusion-Based Points-To Analysis for Strictly-Typed Languages

John Whaley, Monica S. Lam. Computer Systems Laboratory, Stanford University. September 18, 2002.


Presentation Transcript


  1. An Efficient Inclusion-Based Points-To Analysis for Strictly-Typed Languages • John Whaley, Monica S. Lam • Computer Systems Laboratory, Stanford University • September 18, 2002

  2. Background • Andersen’s points-to analysis for C (1994) • Flow-insensitive, context-insensitive • Inclusion-based, more accurate than unification-based Steensgaard • O(n^3), considered too slow to be practical • CLA optimization to Andersen’s analysis (Heintze & Tardieu, PLDI’01) • Online caching/cycle elimination • Field-independent: 1.3M lines of code in 137s SAS 2002

  3. Doing it for Java • We want Andersen-level pointers for Java • Naïve port of CLA algorithm: • Spec “compress” benchmark: 2+ hours! • Call graph accuracy: same as RTA (terrible) • Our paper: how to do CLA for Java • Spec “compress” benchmark: 5 seconds! • JEdit (1371 classes): ~10 minutes! • Call graph accuracy: very good

  4. Java vs. C: Virtual calls • Java has many virtual calls • Accuracy of analysis strongly affects number of call targets • More call targets leads to more code being analyzed and longer analysis times

  5. Java vs. C: Treatment of Fields • Field-independent: in o.f, use only o • Most C pointer analyses • Sound even for non-type-safe languages • Field-based: in o.f, use only f • Very inaccurate, requires type safety • Field-sensitive: in o.f, use both o, f • Strictly more accurate than field-independent or field-based • Essential for Java
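The three field treatments on this slide differ only in how an access o.f is named as an abstract location. The following is a minimal sketch of that idea; the class `FieldTreatment`, its `Mode` enum, and the string encoding of locations are hypothetical names chosen for illustration and are not taken from the talk.

```java
// Hypothetical illustration: how each field treatment names the abstract
// location touched by an access o.f.
class FieldTreatment {
    enum Mode { FIELD_INDEPENDENT, FIELD_BASED, FIELD_SENSITIVE }

    // Returns the abstract location that the access "obj.field" reads or writes.
    static String location(Mode mode, String obj, String field) {
        switch (mode) {
            case FIELD_INDEPENDENT: return obj;               // in o.f, use only o
            case FIELD_BASED:       return "*." + field;      // in o.f, use only f
            default:                return obj + "." + field; // in o.f, use both o and f
        }
    }

    // Two accesses are merged by the analysis if they map to the same location.
    static boolean mayAlias(Mode mode, String o1, String f1, String o2, String f2) {
        return location(mode, o1, f1).equals(location(mode, o2, f2));
    }
}
```

Under this encoding, a.f and a.g are merged in field-independent mode, a.f and b.f are merged in field-based mode, and field-sensitive mode keeps both pairs apart.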

  6. Java vs. C: Local variables • Local variables/stack locations are reused • Flow insensitivity causes many false aliases • Local flow sensitivity is necessary

  7. Our Contribution • Andersen-style inclusion-based points-to analysis for Java, based on ideas from CLA • Field sensitivity • Tracks separate fields of separate objects • Uses “method summary graphs” • Sparse representation, uses local flow sensitivity • Optimizations • Caching across iterations, reducing redundant ops • Supports all features of Java

  8. Algorithm Overview • Intraprocedural: Generate a sparse, flow-insensitive summary graph for each method • Based on access paths, uses local flow sensitivity • Interprocedural: Using summary graphs, build inclusion graph to obtain whole-program result

  9. Method Summaries • Sparse, flow-insensitive summary of the semantics of each method • Stores (writes) in method • Calls made by method and their parameters • Return values, thrown and caught exceptions • Use a flow-sensitive technique to generate method summaries • Precisely model updates to stack and locals

  10. Method Summary: Example • Code for method foo: static void foo(C x, C y) { C t = x.f; t.g = y; x.g = x; t.bar(y); } • Summary for method foo: [Figure: graph over nodes x, x.f, y with edges labeled f and g and a call node bar(t, y); legend: read edge, write edge, parameter map edge]
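One way to picture the summary for foo is as plain data: a list of read edges, a list of write edges, and a parameter map per call site. The sketch below is an assumption-laden reading of the slide, not the authors' data structure: the class `SummaryGraph`, the `Edge` type, and the exact edge set are inferred from the code of foo and from the read/write edge definitions given later in the talk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical encoding of a method summary graph.
class SummaryGraph {
    // A labeled edge "src -field-> dst" between abstract nodes.
    static final class Edge {
        final String src, field, dst;
        Edge(String src, String field, String dst) {
            this.src = src; this.field = field; this.dst = dst;
        }
        @Override public String toString() { return src + " -" + field + "-> " + dst; }
    }

    final List<Edge> reads = new ArrayList<>();   // created by load statements
    final List<Edge> writes = new ArrayList<>();  // created by store statements
    // Call target name -> argument nodes in parameter order (index 0 is the receiver).
    final Map<String, List<String>> paramMap = new LinkedHashMap<>();

    // Summary for: static void foo(C x, C y) { C t = x.f; t.g = y; x.g = x; t.bar(y); }
    static SummaryGraph forFoo() {
        SummaryGraph g = new SummaryGraph();
        g.reads.add(new Edge("x", "f", "x.f"));           // C t = x.f;  t names node x.f
        g.writes.add(new Edge("x.f", "g", "y"));          // t.g = y;
        g.writes.add(new Edge("x", "g", "x"));            // x.g = x;
        g.paramMap.put("bar", Arrays.asList("x.f", "y")); // t.bar(y);
        return g;
    }
}
```

The local t never appears as a node: because the summary is built from access paths, t is replaced by the node x.f that it holds at each use.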

  11. Node types • A node represents an object at run time. • Concrete type nodes • Objects that have a known concrete type • new statements and constant objects • Abstract nodes • Parameters, return values, dereferences • Interprocedural phase maps an abstract node to set of concrete nodes it can represent

  12. Edge types • Read edge: • Created by load statements • Represent dereferences (access paths) of known locations • Write edge: • Created by store statements • Represent references created by the method

  13. Outgoing parameter map • Records which nodes are passed as which parameters • This is used in the interprocedural phase to match call sites to call targets • [Figure: summary graph for foo with nodes x, x.f, y, edges labeled f and g, and the call t.bar(y)]

  14. Generating method summary • Worklist data flow solver (flow-sensitive) • Strong updates on locals, weak on others • Detect and close cycles in access paths • More detail in the paper
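The difference between strong and weak updates can be shown on plain maps from locations to target sets. This is a minimal sketch under assumed names (`Updates`, `strongUpdate`, `weakUpdate`); it illustrates the two update kinds only and is not the worklist solver described in the paper.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helpers contrasting the two update kinds.
class Updates {
    // Strong update: an assignment to a local overwrites it, so the old
    // targets are discarded. This is what removes false aliases when a
    // local variable or stack slot is reused.
    static void strongUpdate(Map<String, Set<String>> locals, String var, Set<String> targets) {
        locals.put(var, new HashSet<>(targets));
    }

    // Weak update: an abstract heap location may stand for many run-time
    // locations, so new targets are added to the existing ones.
    static void weakUpdate(Map<String, Set<String>> heap, String loc, Set<String> targets) {
        heap.computeIfAbsent(loc, k -> new HashSet<>()).addAll(targets);
    }
}
```

After two assignments t = A and then t = B, a strong update leaves t pointing only to B, whereas two weak updates to o.f leave it pointing to both A and B.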

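  15. Review: Andersen’s Points-to • Points-to is encoded as inclusion relations • x = y implies $x \supseteq y$ • $x \supseteq y$ is also written as an edge between x and y in the inclusion graph

  16. Review: Andersen’s Points-to • Store: if code contains x.f = e; apply rule: $x \supseteq \mathit{new}_y \;\Rightarrow\; \mathit{new}_y.f \supseteq e$ • Load: if code contains e = x.f; apply rule: $x \supseteq \mathit{new}_y \;\Rightarrow\; e \supseteq \mathit{new}_y.f$ • Copy: if code contains e1 = e2; apply rule: $e_1 \supseteq e_2$ • Transitive closure: $e_1 \supseteq e_2,\ e_2 \supseteq e_3 \;\Rightarrow\; e_1 \supseteq e_3$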
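The Store, Load, and Copy rules can be exercised with a naive fixed-point loop. The sketch below is a textbook-style illustration, not the CLA-based algorithm of the talk: the class `AndersenSketch`, its method names, and the string encoding of heap locations as "object.field" are all assumptions. Transitive closure is not a separate step here; it falls out of re-applying the rules until nothing changes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical naive solver for the inclusion rules.
class AndersenSketch {
    // Points-to sets for variables and for heap locations named "object.field".
    final Map<String, Set<String>> pts = new HashMap<>();
    final List<String[]> stmts = new ArrayList<>();

    Set<String> get(String loc) { return pts.computeIfAbsent(loc, k -> new TreeSet<>()); }

    void alloc(String x, String site)        { get(x).add(site); }                           // x = new site
    void copy(String x, String y)            { stmts.add(new String[] {"copy", x, y}); }     // x = y
    void load(String e, String x, String f)  { stmts.add(new String[] {"load", e, x, f}); }  // e = x.f
    void store(String x, String f, String e) { stmts.add(new String[] {"store", x, f, e}); } // x.f = e

    // Re-apply every rule until no points-to set grows.
    void solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] s : stmts) {
                switch (s[0]) {
                    case "copy":  // x includes y
                        changed |= get(s[1]).addAll(get(s[2]));
                        break;
                    case "load":  // for each object o in x: e includes o.f
                        for (String o : new ArrayList<>(get(s[2])))
                            changed |= get(s[1]).addAll(get(o + "." + s[3]));
                        break;
                    default:      // store: for each object o in x: o.f includes e
                        for (String o : new ArrayList<>(get(s[1])))
                            changed |= get(o + "." + s[2]).addAll(get(s[3]));
                }
            }
        }
    }
}
```

As a usage example under the assumption that x points to C, y points to E, and C.f points to D, the statements t = x.f; t.g = y; x.g = x; yield t pointing to D, D.g pointing to E, and C.g pointing to C.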

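  17. Andersen example • Code: t = x.f; t.g = y; x.g = x; • [Figure: summary graph over abstract nodes x, x.f, y with edges labeled f and g]

  18. Andersen example • Code: t = x.f; t.g = y; x.g = x; • [Figure: the same summary graph, plus concrete nodes C, D, E and an f edge]

  19. Andersen example • Code: t = x.f; t.g = y; x.g = x; • Rule applied: Load, for e = x.f; : $x \supseteq \mathit{new}_y \;\Rightarrow\; e \supseteq \mathit{new}_y.f$ • [Figure: abstract nodes x, x.f, y; concrete nodes C, D, E; edges labeled f and g]

  20. Andersen example • Code: t = x.f; t.g = y; x.g = x; • Rule applied: Load, for e = x.f; : $x \supseteq \mathit{new}_y \;\Rightarrow\; e \supseteq \mathit{new}_y.f$ • [Figure: abstract nodes x, x.f, y; concrete nodes C, D, E; edges labeled f and g]

  21. Andersen example • Code: t = x.f; t.g = y; x.g = x; • Rule applied: Store, for x.f = e; : $x \supseteq \mathit{new}_y \;\Rightarrow\; \mathit{new}_y.f \supseteq e$ • [Figure: abstract nodes x, x.f, y; concrete nodes C, D, E; edges labeled f and g]

  22. Andersen example • Code: t = x.f; t.g = y; x.g = x; • Rule applied: Store, for x.f = e; : $x \supseteq \mathit{new}_y \;\Rightarrow\; \mathit{new}_y.f \supseteq e$ • [Figure: the same graph with one additional g edge]

  23. Andersen example • Code: t = x.f; t.g = y; x.g = x; • Rule applied: Store, for x.f = e; : $x \supseteq \mathit{new}_y \;\Rightarrow\; \mathit{new}_y.f \supseteq e$ • [Figure: the same graph with one additional g edge]

  24. Andersen example • Code: t = x.f; t.g = y; x.g = x; • Rule applied: Store, for x.f = e; : $x \supseteq \mathit{new}_y \;\Rightarrow\; \mathit{new}_y.f \supseteq e$ • [Figure: the same graph with two additional g edges]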

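  25. Mapping method calls • Call: t.bar(y); • Code: t = x.f; t.g = y; x.g = x; t.bar(y); • [Figure: abstract nodes x, x.f, y; concrete nodes C, D, E; edges labeled f and g]

  26. Mapping method calls • Call: t.bar(y); • Code: t = x.f; t.g = y; x.g = x; t.bar(y); • [Figure: abstract nodes x, x.f, y; concrete nodes C, D, E; edges labeled f and g]

  27. Mapping method calls • Call: t.bar(y); • Code: t = x.f; t.g = y; x.g = x; t.bar(y); • [Figure: the same graph, with the call arguments mapped to Bar:this and Bar:p1]

  28. Overall Picture • [Figure: an “Abstract” world containing nodes F and E, and a “Concrete” world containing nodes C and D]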

  29. Graph-based Andersen • Computing full transitive closure is prohibitively expensive • Store the graph in pre-transitive form, and calculate reachable nodes on demand
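Keeping the graph in pre-transitive form means storing only the direct inclusion edges and answering each points-to question with a graph search. The following sketch assumes a string-keyed adjacency map; the class `PreTransitiveGraph` and its fields are hypothetical, and only the query name `getConcreteNodes` is borrowed from the talk.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical pre-transitive inclusion graph with on-demand reachability.
class PreTransitiveGraph {
    // Direct inclusion edges only: a -> { b : a includes b }.
    final Map<String, Set<String>> includes = new HashMap<>();
    final Set<String> concrete = new HashSet<>();

    void addInclusion(String from, String to) {
        includes.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
    }

    // Depth-first search from start; collects every concrete node reached.
    // The visited set makes the search terminate on cyclic graphs.
    Set<String> getConcreteNodes(String start) {
        Set<String> result = new TreeSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String n = stack.pop();
            if (!visited.add(n)) continue;            // already explored
            if (concrete.contains(n)) result.add(n);
            for (String m : includes.getOrDefault(n, Collections.emptySet())) stack.push(m);
        }
        return result;
    }
}
```

The trade-off is that adding an edge is cheap, while each query pays for a traversal, which is what motivates caching query results.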

  30. Algorithm
      foreach write edge e1 → e2 do
        foreach n in getConcreteNodes(e1)
          add write edge n.f → e2
      foreach read edge e1 → e2 do
        foreach n in getConcreteNodes(e1)
          add inclusion edge e2 $\supseteq$ n.f
      foreach method call e1.f()
        foreach n in getConcreteNodes(e1)
          add parameter mappings for target method

  31. Caching reachability queries • getConcreteNodes(e): transitive closure query on the inclusion graph • The same queries are repeated many times • Store the result in a hash table • Cached result may be stale due to edges added since the last query • Iterate until convergence
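The read and write edge steps of the algorithm, combined with a query cache and an outer convergence loop, can be sketched as follows. This is a simplified illustration under stated assumptions: the class `CachedSolver` and its fields are hypothetical, method calls are omitted, write edges are encoded as inclusion edges on "node.field" locations, and the cache is simply cleared at the start of each iteration rather than reused.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical solver: cached reachability queries, iterated to convergence.
class CachedSolver {
    final Map<String, Set<String>> includes = new HashMap<>(); // a -> { b : a includes b }
    final Set<String> concrete = new HashSet<>();
    final Map<String, Set<String>> cache = new HashMap<>();    // memoized query results
    final List<String[]> reads = new ArrayList<>();            // {e1, f, e2} for e2 = e1.f
    final List<String[]> writes = new ArrayList<>();           // {e1, f, e2} for e1.f = e2

    boolean addInclusion(String a, String b) {
        return includes.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
    }

    // Transitive-closure query, answered from the hash table when possible.
    Set<String> getConcreteNodes(String e) {
        Set<String> hit = cache.get(e);
        if (hit != null) return hit;  // may be stale if edges were added since
        Set<String> result = new TreeSet<>();
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(e);
        while (!work.isEmpty()) {
            String n = work.pop();
            if (!seen.add(n)) continue;
            if (concrete.contains(n)) result.add(n);
            for (String m : includes.getOrDefault(n, Collections.emptySet())) work.push(m);
        }
        cache.put(e, result);
        return result;
    }

    // Repeat whole passes until a pass adds no edge; the cache filled
    // during that final pass is then exact for the final graph.
    void solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            cache.clear();
            for (String[] w : writes)
                for (String n : getConcreteNodes(w[0]))
                    changed |= addInclusion(n + "." + w[1], w[2]);
            for (String[] r : reads)
                for (String n : getConcreteNodes(r[0]))
                    changed |= addInclusion(r[2], n + "." + r[1]);
        }
    }
}
```

Staleness shows up directly in this sketch: if a write edge on t is processed before the read edge that gives t its contents, the first query on t returns an empty set, and only the next iteration sees the correct answer.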

  32. Online cycle detection • Inclusion graph includes cycles • The algorithm collapses cycles as they are traversed • During traversal, keeps track of current path • If a node on current path is revisited, collapse all nodes in cycle • Each node has a “skip” pointer, which is set when collapsed and followed on all accesses
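The skip-pointer mechanism can be sketched with a depth-first traversal that tracks the current path and merges a cycle into one representative when it walks back onto that path. Everything here is an assumed design for illustration (the class `CycleCollapse`, the `Node` type, and the choice of the revisited node as representative); the paper's actual traversal may differ.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical cycle collapsing with skip pointers.
class CycleCollapse {
    static final class Node {
        final String name;
        Node skip;                                   // set when this node is collapsed
        final Set<Node> succ = new LinkedHashSet<>(); // outgoing inclusion edges
        Node(String name) { this.name = name; }
    }

    // Follow skip pointers to the node that currently represents n.
    static Node rep(Node n) {
        while (n.skip != null) n = n.skip;
        return n;
    }

    // Traverse from start, collapsing every cycle that the walk closes.
    static void traverse(Node start) {
        dfs(start, new ArrayList<>(), new HashSet<>());
    }

    static void dfs(Node node, List<Node> path, Set<Node> done) {
        Node n = rep(node);
        if (done.contains(n)) return;                // fully explored earlier
        int idx = path.indexOf(n);
        if (idx >= 0) {
            // n is on the current path: everything after it lies on a cycle.
            for (int i = idx + 1; i < path.size(); i++) {
                Node m = rep(path.get(i));
                if (m != n) {
                    m.skip = n;                      // collapse m into n
                    n.succ.addAll(m.succ);           // n inherits m's edges
                }
            }
            return;
        }
        path.add(n);
        for (Node s : new ArrayList<>(n.succ)) dfs(s, path, done);
        path.remove(path.size() - 1);
        done.add(n);
    }
}
```

For a graph a → b → c → a with an extra edge c → d, traversing from a leaves b and c with skip pointers to a, while d stays its own representative.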

  33. Reusing caches • Concrete node cache values don’t change much between algorithm iterations • Reallocating and rebuilding them is expensive • Reuse caches from old iterations • Keep track of an iteration ‘version’ number for each cache entry

  34. Minimizing set union operations • Many caches don’t change across iterations • Avoid set union operations for caches that haven’t changed since the last iteration • Keep a ‘changed’ flag for each cache entry that records whether the last computation changed the entry • If input set hasn’t changed, set union operation is redundant
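The effect of the ‘changed’ flag can be illustrated with a cache that persists across iterations and skips a union whenever the input entry did not change in the previous iteration. This sketch makes simplifying assumptions that are not from the talk: the set of inputs per node is fixed up front, the per-entry flag is represented as a set of changed keys, and the version number from the previous slide is not modeled. The names `ChangedFlagCache`, `inputs`, and `unions` are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical cache that is reused across iterations and skips redundant unions.
class ChangedFlagCache {
    final Map<String, List<String>> inputs = new LinkedHashMap<>(); // node -> nodes it includes
    final Map<String, Set<String>> cache = new HashMap<>();         // kept between iterations
    Set<String> changedLast = new HashSet<>();  // entries changed in the previous iteration
    int unions = 0;                             // set unions actually performed

    // One pass over all entries; returns true if any entry grew.
    boolean iterate(boolean first) {
        Set<String> changedNow = new HashSet<>();
        for (Map.Entry<String, List<String>> e : inputs.entrySet()) {
            Set<String> target = cache.computeIfAbsent(e.getKey(), k -> new TreeSet<>());
            for (String in : e.getValue()) {
                // Input unchanged since the last iteration: the union is redundant.
                if (!first && !changedLast.contains(in)) continue;
                unions++;
                if (target.addAll(cache.getOrDefault(in, Collections.emptySet())))
                    changedNow.add(e.getKey());
            }
        }
        changedLast = changedNow;
        return !changedNow.isEmpty();
    }

    // Iterate until no entry changes.
    void solve() {
        boolean first = true;
        while (iterate(first)) first = false;
    }
}
```

With a including b, b including C, and the entry for C seeded as {C}, the loop above performs three unions in total, compared with six if every union were repeated in each of the three iterations.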

  35. Experimental Results • Concrete type inference • Static call graph • Implemented in ~800 lines of Java • Freely available at: http://joeq.sourceforge.net

  36. Programs • SpecJVM • Standard benchmark suite • J2EE – Java 2 Enterprise Edition v1.3 • Massive (1+ million lines) business framework • joeq • Compiler infrastructure, 75K lines • Cloudscape • Database shipped with J2EE, no source code • JEdit • Full-featured editor, 100K lines

  37. Experimental Results • We analyzed the reachable code for each application • Results include code in class library • Analysis was very effective in reducing total program size • Pentium 4, 2 GHz, 2 GB RAM, Red Hat 7.2 • Sun JDK 1.3.1_01 with 512 MB heap

  38. Analysis Precision vs. RTA

  39. Analysis time: Small benchmarks

  40. Analysis time: Large benchmarks

  41. Analysis time (speedup)

  42. Analysis time (bytecodes/second)

  43. Related Work • Original CLA paper • Heintze and Tardieu (PLDI 2001) • Andersen’s analysis for Java • Rountev, Milanova, Ryder (OOPSLA 2001) • Liang, Pennings, Harrold (PASTE 2001) • Many others… • Concrete type inference • CHA, RTA • Flow and context sensitivity, 0-CFA

  44. Conclusion • Improved precision • Field sensitivity • Local flow sensitivity • Improved efficiency • Reuse reachability cache across iterations • Minimize set-union operations • Scales to the largest Java programs • A new baseline for Java pointers • No reason to use a less precise analysis
