
IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries

Atanas (Nasko) Rountev, Mariana Sharp, Guoqing (Harry) Xu, Ohio State University. Supported by NSF Career grant CCF-0546040 and an IBM Eclipse Innovation grant.


Presentation Transcript


  1. IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries Atanas (Nasko) Rountev Mariana Sharp Guoqing (Harry) Xu Ohio State University Supported by NSF Career grant CCF-0546040 and IBM Eclipse Innovation grant

  2. Interprocedural Analysis with Large Libraries • All programs are built with reusable components • Standard libraries in C++, Java, C# • Domain-specific libraries • Whole-program analysis: complete client program C, together with all libraries it uses • Solutions for all program points in C and in the libraries • Summary-based analysis: pre-analyze the library and record reusable library summary information • Solutions for all program points in C • Goal: reduce the cost without losing any precision • e.g., the solutions inside C should be the same • This may be low-hanging fruit

  3. Talk Outline • Interprocedural distributive environment (IDE) dataflow analysis problems • Definition; precise whole-program analysis • Examples: dependence analysis and type analysis • Generation of library summaries for IDE problems • Intra/interprocedural analysis in the library • Handling the possible effects of unknown clients • Filtering away details that are irrelevant for clients • Experimental evaluation • Entire Java 1.4.2 libraries; 20 client programs

  4. Interproc. Distributive Environment Problems • Defined by Sagiv, Reps, and Horwitz [TheorCompSci96] • Subsumes the interprocedural finite distributive subset (IFDS) problems from their [POPL95] work • Versions of constant propagation, slicing, alias analysis, side-effect analysis, reaching definitions, liveness, etc. • An environment is a map e : D → L; e ∈ Env(D,L) • D is a set of symbols, L is a meet semi-lattice • Environment meet: (e1 ⊓ e2)(d) = e1(d) ⊓ e2(d) • Environment transformer t : Env(D,L) → Env(D,L) • Distributive: e.g. t(e1 ⊓ e2) = t(e1) ⊓ t(e2)
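The definitions on this slide can be made concrete with a small sketch. It assumes, purely for illustration, that L is a powerset lattice whose meet is set union; the names `env_meet` and `copy_transformer` are hypothetical and do not come from the talk.

```python
from typing import Callable, Dict, FrozenSet

# Assumption for illustration: L is a powerset lattice and its meet is set union.
Value = FrozenSet[str]
Env = Dict[str, Value]          # an environment e : D -> L


def lattice_meet(a: Value, b: Value) -> Value:
    return a | b


def env_meet(e1: Env, e2: Env) -> Env:
    """Pointwise meet: (e1 meet e2)(d) = e1(d) meet e2(d). Both share the same D."""
    assert e1.keys() == e2.keys()
    return {d: lattice_meet(e1[d], e2[d]) for d in e1}


def copy_transformer(dst: str, src: str) -> Callable[[Env], Env]:
    """Hypothetical transformer for `dst := src`: t(e) = e[dst -> e(src)]."""
    def t(e: Env) -> Env:
        out = dict(e)
        out[dst] = e[src]
        return out
    return t


e1: Env = {"x": frozenset({"a"}), "y": frozenset()}
e2: Env = {"x": frozenset({"b"}), "y": frozenset({"c"})}
t = copy_transformer("y", "x")

# Distributivity on this pair of environments: t(e1 meet e2) == t(e1) meet t(e2)
lhs = t(env_meet(e1, e2))
rhs = env_meet(t(e1), t(e2))
is_distributive_here = lhs == rhs
```

The check covers only one pair of environments; it illustrates the property and does not prove it.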

  5. Dependence Analysis and Type Analysis for Java • Dependencies: for a local variable v at CFG node n, which formal parameters of n’s method influence v? • Restricted form of dep. analysis; useful for SDG building • D = { v1, …, vk }: locals vi • L = powerset of { f1, …, fm }: formals fj; meet is ∪ • Transformer for v1:=f2: t(e) = e[v1 ↦ {f2}] • Transformer for v1:=v2+v3: t(e) = e[v1 ↦ e(v2) ∪ e(v3)] • Call v1:=meth(v2): composition of v2-to-formal, valid same-level paths in meth, return-to-v1 • 0-CFA type analysis: D = { v1, …, vk, fld1, …, fldm }: locals and fields; L = powerset of set of types
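The two assignment transformers above can be sketched directly as functions on environments. The three-statement method body and the choice of the empty set as the initial value of every local are assumptions made for this example only.

```python
from typing import Dict, FrozenSet

Env = Dict[str, FrozenSet[str]]   # local variable -> set of formals it depends on


def assign_formal(e: Env, v: str, f: str) -> Env:
    """v := f   gives   t(e) = e[v -> {f}]."""
    out = dict(e)
    out[v] = frozenset({f})
    return out


def assign_binop(e: Env, v: str, a: str, b: str) -> Env:
    """v := a + b   gives   t(e) = e[v -> e(a) U e(b)]."""
    out = dict(e)
    out[v] = e[a] | e[b]
    return out


# Hypothetical method body with formals f1, f2 and locals v1, v2, v3:
#   v2 := f1;  v3 := f2;  v1 := v2 + v3
env: Env = {v: frozenset() for v in ("v1", "v2", "v3")}  # assumed initial values
env = assign_formal(env, "v2", "f1")
env = assign_formal(env, "v3", "f2")
env = assign_binop(env, "v1", "v2", "v3")
```

After the last statement, v1 depends on both formals, while v2 and v3 each depend on one.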

  6. Representation of Environment Transformers • Key issue for any summary-based analysis: how do we represent and manipulate dataflow functions? • For IDE: composition/meet of environment transformers • Sagiv et al.: a transformer can be represented by a bipartite directed graph with 2(|D|+1) nodes • Edges labeled with functions L → L

  7. Composition of Transformers • Graph reachability + composition of edge labels
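A minimal sketch of the graph representation and of composition by reachability follows. It is a simplification and not the general scheme of Sagiv et al.: every edge label is assumed to have the form l ↦ l ∪ c for a constant set c, which is enough for the dependence transformers shown earlier. The node name `LAMBDA` and all function names are choices made for this example.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

LAMBDA = "Λ"                           # extra node: the "+1" in 2(|D|+1)
Label = FrozenSet[str]                 # label c stands for the function  l -> l U c
Graph = Dict[Tuple[str, str], Label]   # (source, target) -> edge label
Env = Dict[str, FrozenSet[str]]
EMPTY: Label = frozenset()


def identity(D: Iterable[str]) -> Graph:
    g: Graph = {(d, d): EMPTY for d in D}
    g[(LAMBDA, LAMBDA)] = EMPTY
    return g


def assign_const(D: Iterable[str], v: str, formals: Iterable[str]) -> Graph:
    """v := f: remove the edge v -> v and add LAMBDA -> v carrying the constant."""
    g = identity(D)
    del g[(v, v)]
    g[(LAMBDA, v)] = frozenset(formals)
    return g


def assign_from(D: Iterable[str], v: str, sources: Iterable[str]) -> Graph:
    """v := op(sources): remove v -> v and add s -> v for every source s."""
    g = identity(D)
    del g[(v, v)]
    for s in sources:
        g[(s, v)] = EMPTY
    return g


def compose(first: Graph, second: Graph) -> Graph:
    """Two-step reachability; labels compose, parallel paths are met (union)."""
    out: Graph = {}
    for (s, m), c1 in first.items():
        for (m2, d), c2 in second.items():
            if m == m2:
                out[(s, d)] = out.get((s, d), EMPTY) | c1 | c2
    return out


def apply(g: Graph, e: Env) -> Env:
    out: Env = {d: EMPTY for d in e}
    for (s, d), c in g.items():
        if d == LAMBDA:
            continue
        incoming = EMPTY if s == LAMBDA else e[s]
        out[d] = out[d] | incoming | c
    return out


D = ["v1", "v2", "v3"]
t_a = assign_const(D, "v2", {"f1"})        # v2 := f1
t_b = assign_const(D, "v3", {"f2"})        # v3 := f2
t_c = assign_from(D, "v1", ["v2", "v3"])   # v1 := v2 + v3
summary = compose(compose(t_a, t_b), t_c)

e0: Env = {"v1": frozenset({"f3"}), "v2": EMPTY, "v3": EMPTY}
via_summary = apply(summary, e0)
step_by_step = apply(t_c, apply(t_b, apply(t_a, e0)))
```

Here `via_summary` and `step_by_step` agree, which is the point of composing graphs once and reusing the result.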

  8. Precise Whole-Program Analysis • Graph reachability along valid interprocedural paths • Phase 1: summary function fn for each CFG node n • Represents the solution at n as a function of the solution at the entry of the procedure containing n • Computed through composition and meet of transformers • Summary function at proc exit used at call sites to proc • Partial functions fn: only for the subset of the domain that is relevant to callers of n’s procedure • Phase 2: Top-down propagation of actual environments (e.g., dependence sets, type sets) • Adapt to library summary generation?
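The two phases can be illustrated on a single procedure with no calls. This is a heavily simplified sketch, not the interprocedural algorithm from the talk. A summary is stored compactly as, for each symbol, a constant part plus the set of entry symbols it copies from. The CFG, the symbol `p`, and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# summary(e)(d) = const U union of e(s) for s in sources, with e the entry environment
Summary = Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]]
Env = Dict[str, FrozenSet[str]]


def identity(D: Iterable[str]) -> Summary:
    return {d: (frozenset(), frozenset({d})) for d in D}


def assign(D: Iterable[str], v: str, const=(), sources=()) -> Summary:
    t = identity(D)
    t[v] = (frozenset(const), frozenset(sources))
    return t


def then(first: Summary, step: Summary) -> Summary:
    """Composition: run `first`, then `step`."""
    out: Summary = {}
    for d, (const, srcs) in step.items():
        sources: FrozenSet[str] = frozenset()
        for s in srcs:
            const = const | first[s][0]
            sources = sources | first[s][1]
        out[d] = (const, sources)
    return out


def meet(a: Summary, b: Summary) -> Summary:
    return {d: (a[d][0] | b[d][0], a[d][1] | b[d][1]) for d in a}


def apply(summary: Summary, e: Env) -> Env:
    return {d: const.union(*(e[s] for s in srcs))
            for d, (const, srcs) in summary.items()}


def phase1(D: List[str], entry: str, succs: Dict[str, List[str]],
           transfer: Dict[str, Summary]) -> Dict[str, Summary]:
    """summaries[n]: environment just before n, as a function of the entry environment."""
    summaries = {entry: identity(D)}
    worklist = [entry]
    while worklist:
        n = worklist.pop()
        after_n = then(summaries[n], transfer[n])
        for m in succs.get(n, []):
            new = after_n if m not in summaries else meet(summaries[m], after_n)
            if summaries.get(m) != new:
                summaries[m] = new
                worklist.append(m)
    return summaries


def phase2(summaries: Dict[str, Summary], entry_env: Env) -> Dict[str, Env]:
    """Plug an actual entry environment into every summary function."""
    return {n: apply(s, entry_env) for n, s in summaries.items()}


D = ["p", "v1", "v2"]
succs = {"entry": ["a"], "a": ["b", "c"], "b": ["exit"], "c": ["exit"]}
transfer = {
    "entry": identity(D),
    "a": assign(D, "v1", sources=["p"]),    # v1 := p
    "b": assign(D, "v2", sources=["v1"]),   # v2 := v1
    "c": assign(D, "v2", const=["f2"]),     # v2 := f2
    "exit": identity(D),
}
summaries = phase1(D, "entry", succs, transfer)
solution = phase2(summaries, {"p": frozenset({"f1"}),
                              "v1": frozenset(), "v2": frozenset()})
```

The summary at `exit` records that v2 receives the constant {f2} on one branch and whatever `p` holds at entry on the other. Phase 2 resolves this once the entry environment is known.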

  9. Talk Outline • Interprocedural distributive environment (IDE) dataflow analysis problems • Definition; precise whole-program analysis • Examples: dependence analysis and type analysis • Generation of library summaries for IDE problems • Intra/interprocedural analysis in the library • Handling the possible effects of unknown clients • Filtering away details that are irrelevant for clients • Experimental evaluation • Entire Java 1.4.2 libraries; 20 client programs

  10. Phase 1: Intraprocedural Summary Generation • Produce a set of summary functions yn,m • n is the entry or a call site • m is the exit or a call site • there exists a call-free path from n to m • Similar to the summary functions fn from the whole-program analysis, but • complete functions instead of partial functions • all possible compositions and meets of transformers (as graph operations), until a fixed point is reached • After this, some elements of D are filtered away • e.g., for dependence analysis: locals that are neither actuals of calls nor assigned the return values of calls

  11. Example (figure not reproduced; labels: entry → cs1, rs2 → exit)

  12. Phase 2: Interprocedural Summary Generation (figure not reproduced; labels: summary for toString, at cs2; rs1 → exit)

  13. Phase 2: Interprocedural Summary Generation • Fixed call site: has exactly one possible target • Cannot be a site that calls back client methods • Check type hierarchy for possible overriding in clients • Cannot have multiple target methods • Static calls; constructor calls; final classes/methods • Intraprocedural 0-CFA type analysis: in the summary function, the only edge reaching x should be Λ → x • Fixed method: has only fixed calls (or no calls), and this also holds for all methods reachable from it • Bottom-up traversal of the SCC-DAG of fixed methods; composition and filtering • In non-fixed methods: instantiate fixed calls to fixed methods; composition and filtering
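The definition of a fixed method and the bottom-up traversal can be sketched as follows. The call-graph encoding, the method names, and the `<client>` placeholder are hypothetical. Whether a call site is fixed is taken as given here, whereas the talk derives it from the type hierarchy and 0-CFA. Tarjan's algorithm is used because it emits strongly connected components callees-first.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical call-graph encoding: method -> list of (callee, call_site_is_fixed).
CallGraph = Dict[str, List[Tuple[str, bool]]]


def fixed_methods(cg: CallGraph) -> Set[str]:
    # Seed: methods that contain at least one non-fixed call site.
    non_fixed = {m for m, calls in cg.items() if any(not ok for _, ok in calls)}
    # Propagate to callers: a method that can reach a non-fixed method is non-fixed.
    changed = True
    while changed:
        changed = False
        for m, calls in cg.items():
            if m not in non_fixed and any(c in non_fixed for c, _ in calls):
                non_fixed.add(m)
                changed = True
    return set(cg) - non_fixed


def bottom_up_sccs(cg: CallGraph, methods: Set[str]) -> List[List[str]]:
    """Tarjan's algorithm restricted to `methods`; SCCs are emitted callees-first."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    order: List[List[str]] = []

    def visit(m: str) -> None:
        index[m] = low[m] = len(index)
        stack.append(m)
        on_stack.add(m)
        for callee, _ in cg[m]:
            if callee not in methods:
                continue
            if callee not in index:
                visit(callee)
                low[m] = min(low[m], low[callee])
            elif callee in on_stack:
                low[m] = min(low[m], index[callee])
        if low[m] == index[m]:          # m is the root of an SCC
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == m:
                    break
            order.append(sorted(scc))

    for m in sorted(methods):
        if m not in index:
            visit(m)
    return order


cg: CallGraph = {
    "leaf": [],
    "helperA": [("helperB", True), ("leaf", True)],
    "helperB": [("helperA", True)],          # helperA and helperB are mutually recursive
    "wrapper": [("helperA", True)],
    "callbackUser": [("<client>", False), ("leaf", True)],
    "outer": [("callbackUser", True)],       # fixed call, but reaches a non-fixed method
}
fixed = fixed_methods(cg)
order = bottom_up_sccs(cg, fixed)
```

In this example `outer` is not fixed even though its only call site is fixed, because it reaches `callbackUser`. Summaries would be composed in the order given by `order`, so a callee's summary is available before any caller needs it.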

  14. Example: Final Summary for format (figure not reproduced; labels: entry → cs1, rs1 → exit)

  15. Talk Outline • Interprocedural distributive environment (IDE) dataflow analysis problems • Definition; precise whole-program analysis • Examples: dependence analysis and type analysis • Generation of library summaries for IDE problems • Intra/interprocedural analysis in the library • Handling the possible effects of unknown clients • Filtering away details that are irrelevant for clients • Experimental evaluation • Entire Java 1.4.2 libraries; 20 client programs

  16. Summary Generation • Libraries: 10238 classes, 77190 methods • 0-CFA type analysis + dependence analysis [w/ Soot] • Both data and control dependencies • Simple optimizations: def-use chains, sparse graphs • Cost: 90 minutes time, 1.2GB memory • Includes all Soot-related costs and all I/O • Final summary on disk: 18MB • Measurements: number of edges in the graph representation of transformers • [1]: before any composition or meet • [2]: after intraprocedural composition and meet • [3]: after [2] and intraprocedural filtering: remove elements that are irrelevant for callers and callees

  17. Intraprocedural Propagation • Dependence analysis: reduction in # edges from [2] to [3]: 53% • Type analysis: reduction in # edges from [2] to [3]: 55%

  18. Interprocedural Propagation for Dep. Analysis • Fixed methods: 25490 (33%); eliminate 7195 (9%) of them because their only callers are in the library • Summary functions for fixed methods • Instantiate at fixed calls within non-fixed methods: eliminates 21% of all library call sites • Additional intraprocedural propagation and filtering • Reduction in # edges from [3] to [4]: 32%

  19. Summary-Based Analysis of Clients • Reduction in start-to-end time: IR building, type analysis + call graph, dependence analysis

  20. Only Dependence Analysis • Reduction in analysis time: actual analysis and a hypothetical best case with no library dependencies

  21. Overview of Results • Start-to-end cost: IR, type analysis, dep. analysis • Average time reduction 51% • Average memory reduction 33% • Only dependence analysis • Average time reduction 69% • Average memory reduction 90% • Very close to a conservative upper bound • Conclusions • Summary generation has reasonable cost • Summary size is small (# edges and total disk size) • Significant savings for analysis running time and memory usage, compared to whole-program analysis

  22. Future Work • This is a very preliminary study • Promising initial results, but just the tip of the iceberg • More IDE analyses, with different characteristics • e.g. points-to analysis, side-effect analysis, constant propagation, typestate properties, etc. • Beyond IDE analyses • e.g. recent [POPL08] paper by Yorsh et al. • Better handling of callbacks and polymorphic calls • e.g. take advantage of behavioral subtyping • Reusable API for storing and retrieving summary information – generality for many different analyses • Open-source API implementation based on Soot

  23. Questions?
