To IR or not to IR?

To IR or not to IR? A discussion about storing program and analysis data in the IR vs. a Relational DB Fluid Meeting 11/10/03 Presenter: Elissa

Outline • The Issue at Hand • Pros and Cons of Relational DB • DB Schema so far and Example • Issues with DB Schema • What Shouldn’t be in the DB • Issues with Querying • Querying Chains of Evidence

Issue • The Fluid system needs flexibility in order to ensure scaling, to facilitate persistence, and to provide stand-alone tools. Also, the query engine needs to return results as fast as possible. • This discussion is about: • Should we store program data for querying in a relational DB? • If yes: • Which data should be stored? • How do we maintain consistency with the program in Eclipse? • Do we perform all possible analyses up front and store the results, or perform analysis as required? • Note: it may be useful to extract certain common information (structure, uses and defs, call chain, etc.) up front and then, for more in-depth information, do analysis as needed • Will all analysis results be stored in both the IR and the DB?

Pros of Relational DB • Structural data about code is easily stored and retrieved from a relational DB • Retrieval of query results is probably faster from constructed DB vs. from IR • Note: must test this on real system • DB can store “metrics” about program, for use by program team or for heuristics (see Querying CoE)

Cons of Relational DB • Requires MySQL running on computer or server somewhere • Must maintain consistency of program in Eclipse and DB • It should be possible to make incremental changes to DB, if plugged in to DoubleChecker • Overhead of populating DB in the first place • Information may be stored twice (in DB and IR)

First, Let’s Review… • Partial Java Query Model (diagram) • Key: each node shows its name and attributes; each edge shows its name; * marks an edge with multiplicity greater than 1 • Package (Name): has-classes • Class (Name, Visibility): part-of-package; extends (*), has-parent, has-child (*); has-field • Field (Name, Visibility, Static, Final): part-of-class

Now, add regions… • Partial Base Java Model with Simplified Regions (diagram) • Package (Name) • Class (Name, Visibility): part-of-package; extends (*), has-parent, has-child (*); has-field • Region (Name, Visibility): contains fields from Class; has-field • Field (Name, Visibility, Static, Final): part-of-class; part-of-region

Some Queries of “Model” Example • Public fields in fluid package • Fields where visibility == public and field part-of $class and $class part-of package “fluid” • Region that spans classes: • $region contains $field1 from $class1 and $field2 from $class2 ($field1 != $field2, $class1 != $class2) • Region spans subclasses of MyClass • union(last query, $class1 has-parent “MyClass” and $class2 has-parent “MyClass”)
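The first two queries above can be sketched against a relational backend. This is only an illustration under stated assumptions: the table and column names (java_package, java_class, java_field, region, region_id) are hypothetical and not taken from the handout, each field is assumed to belong to exactly one region, and SQLite stands in for MySQL so the sketch is self-contained.

```python
import sqlite3

# Hypothetical tables mirroring the Package / Class / Region / Field nodes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE java_package (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE java_class (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    visibility TEXT NOT NULL,
    package_id INTEGER NOT NULL REFERENCES java_package(id)  -- part-of-package
);
CREATE TABLE region (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE java_field (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    visibility TEXT NOT NULL,
    is_static INTEGER NOT NULL,
    is_final INTEGER NOT NULL,
    class_id INTEGER NOT NULL REFERENCES java_class(id),     -- part-of-class
    region_id INTEGER NOT NULL REFERENCES region(id)         -- part-of-region
);
""")

# Made-up sample data, for illustration only.
conn.executemany("INSERT INTO java_package VALUES (?, ?)",
                 [(1, "fluid"), (2, "other")])
conn.executemany("INSERT INTO java_class VALUES (?, ?, ?, ?)",
                 [(1, "Node", "public", 1), (2, "Helper", "public", 2)])
conn.executemany("INSERT INTO region VALUES (?, ?)",
                 [(1, "State"), (2, "Local")])
conn.executemany("INSERT INTO java_field VALUES (?, ?, ?, ?, ?, ?, ?)",
                 [(1, "count", "public", 0, 0, 1, 1),
                  (2, "cache", "private", 1, 0, 1, 2),
                  (3, "flag", "public", 0, 1, 2, 1)])


def public_fields_in_package(conn, package_name):
    """Fields where visibility == public, field part-of $class,
    and $class part-of the named package."""
    rows = conn.execute("""
        SELECT f.name
        FROM java_field f
        JOIN java_class c ON f.class_id = c.id
        JOIN java_package p ON c.package_id = p.id
        WHERE f.visibility = 'public' AND p.name = ?
        ORDER BY f.name
    """, (package_name,)).fetchall()
    return [name for (name,) in rows]


def regions_spanning_classes(conn):
    """Regions that contain fields from at least two different classes."""
    rows = conn.execute("""
        SELECT r.name
        FROM region r
        JOIN java_field f ON f.region_id = r.id
        GROUP BY r.id
        HAVING COUNT(DISTINCT f.class_id) > 1
        ORDER BY r.name
    """).fetchall()
    return [name for (name,) in rows]


print(public_fields_in_package(conn, "fluid"))
print(regions_spanning_classes(conn))
```

The "spans subclasses of MyClass" query would add a has-parent condition on both classes; it is left out here because it depends on how the class hierarchy is stored.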

DB schema so far • See handout • Notation: • * means primary key • @ means foreign key • Constraints will be enforced outside the DB (like business logic) and are in an illustrative syntax (or English if I was lazy)
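To illustrate what "constraints enforced outside the DB" could look like, here is a minimal sketch. The table, its columns, and the constraint itself (no class extends itself) are hypothetical examples, not entries from the handout, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE java_class (
    id INTEGER PRIMARY KEY,                       -- "*" primary key
    name TEXT NOT NULL,
    parent_id INTEGER REFERENCES java_class(id)   -- "@" foreign key (extends)
);
""")

# Made-up rows; "Loop" deliberately violates the example constraint.
conn.executemany("INSERT INTO java_class VALUES (?, ?, ?)",
                 [(1, "Object", None), (2, "MyClass", 1), (3, "Loop", 3)])


def classes_extending_themselves(conn):
    """Business-logic check run by the application, not by the DB:
    report every class whose parent is the class itself."""
    rows = conn.execute(
        "SELECT name FROM java_class WHERE parent_id = id ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


print(classes_extending_themselves(conn))
```

Keeping such checks in application code means the DB accepts the bad row; the check has to be run (for example after each population or update step) to catch it.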

Operationalization of Schema • TODO

Issues with DB schema • Location (in code) and identity • Going to have to re-implement and/or re-run the Binder? • Examples: • Location in method a() of def of field foo • Distinguishing between more than one call to method b() from method a() and locating the call • Actual values • Actual parameters, variable values, assignment values • How to represent a call graph?
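One possible answer to the call graph question, sketched under assumptions: store one row per call site, so that two calls from a() to b() stay distinct and each carries its location. The call_site table and its columns are hypothetical, method names are plain strings for brevity, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE call_site (
    id INTEGER PRIMARY KEY,   -- identity of this particular call
    caller TEXT NOT NULL,
    callee TEXT NOT NULL,
    line INTEGER NOT NULL     -- location of the call in the caller
);
""")

# Made-up data: a() calls b() twice, b() calls c() once.
conn.executemany(
    "INSERT INTO call_site (caller, callee, line) VALUES (?, ?, ?)",
    [("a", "b", 10), ("a", "b", 14), ("b", "c", 30)])


def call_sites(conn, caller, callee):
    """Locations of every call from caller to callee."""
    rows = conn.execute(
        "SELECT line FROM call_site WHERE caller = ? AND callee = ? "
        "ORDER BY line", (caller, callee)).fetchall()
    return [line for (line,) in rows]


def reachable_from(conn, method):
    """All methods transitively called from the given method.
    UNION (not UNION ALL) removes duplicates, so cycles terminate."""
    rows = conn.execute("""
        WITH RECURSIVE reach(name) AS (
            SELECT callee FROM call_site WHERE caller = ?
            UNION
            SELECT cs.callee
            FROM call_site cs JOIN reach r ON cs.caller = r.name
        )
        SELECT name FROM reach ORDER BY name
    """, (method,)).fetchall()
    return [name for (name,) in rows]


print(call_sites(conn, "a", "b"))
print(reachable_from(conn, "a"))
```

This does not address identity (which b() is meant once overloading and overriding are involved); that is the part that would likely require the Binder.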

Other TODOs on DB Schema • Array creation • Actual method effects • Region parameterization • Throws/catches exceptions (params?) • Package • Concerns • Thread Coloring stuff • Imports? • Project/Workspace? • Local variables and values? • Drop-sea??

More Review… Query History Hierarchy • Output strategy partially determined by query input (query type) and query history (“context” of query) • Type of view to show query results. Includes extra visibility, ordering, styling information (overrides code viewer defaults)

More Review… Query History Hierarchy • If we use a relational DB backend, the query tree will get translated into SQL queries
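A rough sketch of what translating a query tree into SQL might involve. The node types (Eq, And, Or) and the column names are hypothetical and are not the Fluid query model; the point is only that a tree of conditions can be walked recursively into a parameterized WHERE clause.

```python
from dataclasses import dataclass


@dataclass
class Eq:
    """Leaf node: column == value."""
    column: str
    value: object


@dataclass
class And:
    left: object
    right: object


@dataclass
class Or:
    left: object
    right: object


def to_sql(node):
    """Return (where_clause, params) for a query tree.
    Values become '?' placeholders; column names are assumed to come
    from the query model itself, never from raw user input."""
    if isinstance(node, Eq):
        return f"{node.column} = ?", [node.value]
    if isinstance(node, (And, Or)):
        op = "AND" if isinstance(node, And) else "OR"
        left_sql, left_params = to_sql(node.left)
        right_sql, right_params = to_sql(node.right)
        return f"({left_sql} {op} {right_sql})", left_params + right_params
    raise TypeError(f"unsupported query node: {node!r}")


# "Public fields in fluid package" as a two-leaf tree.
tree = And(Eq("f.visibility", "public"), Eq("p.name", "fluid"))
where, params = to_sql(tree)
print(where, params)
```

Because the history is a tree, a refinement of an earlier query could reuse the parent's clause and wrap it in a further And node.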

What Data Can’t/Shouldn’t Be in DB • Queries themselves • Could only store as text, so it doesn’t make sense • You don’t query queries, only “query” the history • Query history • IR version structure is perfect for recording query history (which is a tree that can be traversed forward and backward) • Query results • These are the output of the DB. We need them in the Fluid system so we can report them back to the GUI. • Query-based constraints • Need to participate in chains of evidence

Issues with Querying • Granularity in querying • Currently, the schema operates at method-level granularity • It is not yet clear what sub-method-level granularity queries would be useful • I doubt real users will want to find different types of expressions • Note: may want a more powerful query facility for internal use • Noted Exceptions: • Synchronized blocks • Find where decisions are made based on the value of variable foo

Issues with Querying (2) • Transformations • Would storing some analysis results in the DB affect the ability to perform transformations?

Querying Chains of Evidence • Next step: Where would the user receive the most benefit from annotating next? • Per assurance • Heuristics • Which classes have the most i’s • “Importance” of those i’s • Classes that are used the most often • “Logical” next step in an annotation task • How much of the system has been assured?
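The "classes that are used the most often" heuristic is the kind of metric a relational DB makes cheap to compute. A minimal sketch, assuming a hypothetical class_use table with one row per use of one class by another (SQLite stands in for MySQL, and the data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE class_use (
    user_class TEXT NOT NULL,   -- class containing the use
    used_class TEXT NOT NULL    -- class being used
);
""")

conn.executemany("INSERT INTO class_use VALUES (?, ?)",
                 [("A", "Util"), ("B", "Util"), ("C", "Util"),
                  ("A", "Model"), ("B", "Model"), ("C", "A")])


def most_used_classes(conn, limit):
    """Rank classes by how many uses point at them, most used first.
    Ties are broken by class name so the ordering is stable."""
    return conn.execute("""
        SELECT used_class, COUNT(*) AS uses
        FROM class_use
        GROUP BY used_class
        ORDER BY uses DESC, used_class
        LIMIT ?
    """, (limit,)).fetchall()


print(most_used_classes(conn, 2))
```

A ranking like this could feed the "where to annotate next" suggestion, possibly combined with the other heuristics on this slide.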

Querying Chains of Evidence (2) • Visualization/views will cover the following queries: • Where are annotations? Where are assumptions? • What vouched for this link? • Unsure if this is covered/useful: • What is related to the model? • What assurances are invalidated if this assumption is false?
