
Efficient Checking of Component Specifications in Java Systems

This presentation describes CHET, a tool that automatically checks component specifications in Java systems, with the goal of making programs more reliable, secure, and robust. CHET uses a component specification language and flow analysis to find each instance of component usage. It then builds a model program per instance and applies model checking techniques to check that instance against its specification. The approach stays practical for everyday programming because most component usage can be tracked through control and data flow, which is simpler than the general model checking problem.


Presentation Transcript


  1. Efficient Checking of Component Specifications in Java Systems Steven P. Reiss Brown University CHET

  2. Our Goal • To Improve Programming • More reliable • More secure • More robust • More understandable • Easier • To Deal With Real Systems • Not yesterday’s • Some today’s • Worrying about tomorrow’s CHET

  3. Model Checking • Is the next great thing for programmers • Will find all our bugs automatically • Will fix all our problems • But with minor exceptions it is not used • Not on an everyday basis • Not for everyday programs • Not by most programmers • What is needed here • Must be “automatic” -- no effort required • Must be fast -- “compilation speed” • Must be helpful -- accurate, precise CHET

  4. The Problem of Components • Java programs are built on class libraries • Standard java libraries • Open source libraries • Libraries created for an application • Creators know how they should be used • Each has its own pattern of usage • Typically fail if not used correctly • Make sure they are used correctly • Throughout the program • Each instance • Statically • With “real” Java programs CHET

  5. The Solution • Create a Component Specification Language • Find Instances of Component Usage • Check Each Instance for Validity CHET

  6. Specification Language • Define how components should be used • In a way that matches their use • Once for all potential instances • So that it can be done by programmers • And the specification can be understood • Solution • Use finite automata • Over parameterized program events • Matches call sequences, variable usage, etc. CHET
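As an illustration of a specification written as a finite automaton over program events, here is a minimal sketch for iterator usage. The class, state, and event names are hypothetical and are not CHET's actual specification syntax; the two hasNext events stand for parameterized RETURN events that carry the call's return value.

```java
import java.util.List;

/** Hypothetical iterator-usage specification written as a finite automaton. */
final class IteratorSpec {
    /** Events seen for one iterator instance (assumed names, not CHET syntax). */
    enum Event { HAS_NEXT_TRUE, HAS_NEXT_FALSE, NEXT }

    enum State { UNCHECKED, READY, DONE, ERROR }

    /** Transition function: next() is only legal right after hasNext() returned true. */
    static State step(State state, Event event) {
        if (state == State.ERROR) {
            return State.ERROR; // once violated, always violated
        }
        switch (event) {
            case HAS_NEXT_TRUE:
                return State.READY;
            case HAS_NEXT_FALSE:
                return State.DONE;
            case NEXT:
                return state == State.READY ? State.UNCHECKED : State.ERROR;
            default:
                throw new IllegalArgumentException("unknown event " + event);
        }
    }

    /** Runs the automaton over an event sequence and reports whether it stays valid. */
    static boolean accepts(List<Event> trace) {
        State state = State.UNCHECKED;
        for (Event event : trace) {
            state = step(state, event);
        }
        return state != State.ERROR;
    }
}
```

The same automaton applies to every iterator in a program, which matches the slide's goal of writing the specification once for all potential instances.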

  7. Specification Instances • Components are used multiple times • List, Iterator, XmlWriter, … • Need to handle each use separately • Uses must be found automatically • As specific as possible (statically) • Solution • Using flow analysis over the class files • Trigger events define instances • Other events used in particular instances CHET

  8. Checking Specifications • Each instance must be checked • Independently • To ensure the specification is met • Solution • Create a simple model program per instance • Check if model program meets specification • Using model checking techniques • Do all this efficiently CHET

  9. Keeping this Practical • Most components are used through calls • Control flow determines call sequences • Data flow determines which calls • Most component usage is single threaded • Can often ignore thread interactions • This is simpler than the general problem • Need to track fewer variables • Need to worry less about variable values • Need to worry less about interweaving CHET

  10. CHET Overview • Inputs: Specifications, Application • Flow Analysis • Instances • Abstract Program Builder • Program Checker • Report CHET

  11. Iterator Usage CHET

  12. Comodification Checking CHET

  13. Xml Writer Usage CHET

  14. File Open-Close CHET

  15. Catching Errors CHET

  16. Web Crawler Library CHET

  17. Nested Locks CHET

  18. Events & Parameters • CALL (caller this, argi, calling this) • RETURN (this, return value) • ENTRY (this, argi) • FIELD (this) [set to int, null, nonnull] • ALLOC (new object) • CATCH (catch object) • THROW (throw object) • LOCK (lock object) • UNLOCK (lock object) CHET

  19. Why Event-Based Specification • ESP and others use code patterns • These are closer to programs • And hence easier to understand • However they are hard to generalize • Iterator can use nextElement or next • Nested opens and alternatives • Xml writer alternatives • Events and automata generalize • Easy to define abstract patterns • Still understandable by programmers CHET

  20. Finding All Instances • Done using flow analysis • Of the program and its libraries • Handling all specifications at once • Each trigger event yields a source • We determine where this source can flow • This determines which events are relevant • To this particular instance • But it's not that easy • Multiple-parameter events • Accurate flow and type analysis required CHET

  21. Example
      Vector<A> v = …
      Iterator it = v.iterator();    // Trigger, Source
      while (it.hasNext()) { A x = it.next(); … }
      …
      for (it = v.iterator(); it.hasNext(); ) { A y = it.next(); … }    // Trigger, Source
      CHET

  22. Flow Analysis • Identify sources • From trigger events • Tracking sources and where they flow • Through symbolic execution • Result: Determine at each location • What sources are used • This lets us check event parameters • Trigger source used on call => event CHET

  23. Flow Analysis Goals • Complete analysis • Ensure we track all possible uses of a source • Must include libraries as well as user code • Accurate analysis • Must know types for virtual calls • Must understand full Java semantics • Must handle all methods (including native, etc.) CHET

  24. Flow Analysis Techniques • Done at the byte code level • Tracking types and values • Through symbolic execution • Full interprocedural flow analysis • Using a work queue approach • Of user code and libraries • Handling all the complexities of Java • Selectively context sensitive • Flow sensitive, not path sensitive • Trade off accuracy and speed • Accuracy where important, speed otherwise CHET
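The work queue approach can be sketched generically as follows. This is a simplified illustration, not CHET's implementation: program locations and sources are plain strings, the graph and method names are hypothetical, and the real analysis works on byte code with types, values, and selective context sensitivity.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical worklist propagation of sources over a flow graph. */
final class WorklistFlow {
    /**
     * successors: flow edges between locations.
     * generated: sources created at a location (for example by a trigger event).
     * Returns, for each reached location, the set of sources that may flow there.
     */
    static Map<String, Set<String>> propagate(Map<String, List<String>> successors,
                                              Map<String, Set<String>> generated) {
        Map<String, Set<String>> reaching = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        for (Map.Entry<String, Set<String>> entry : generated.entrySet()) {
            reaching.put(entry.getKey(), new TreeSet<>(entry.getValue()));
            queue.add(entry.getKey());
        }
        while (!queue.isEmpty()) {
            String location = queue.remove();
            Set<String> out = reaching.get(location);
            List<String> next = successors.get(location);
            if (next == null) {
                next = Collections.emptyList();
            }
            for (String successor : next) {
                Set<String> in = reaching.get(successor);
                if (in == null) {
                    in = new TreeSet<>();
                    reaching.put(successor, in);
                }
                // Re-queue a location only when its source set actually grew.
                if (in.addAll(out)) {
                    queue.add(successor);
                }
            }
        }
        return reaching;
    }
}
```

Because a location is re-queued only when its set grows, and the number of sources is finite, the loop reaches a fixed point and terminates.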

  25. Flow Analysis Issues • Speed versus accuracy • Start with the minimum possible • Add more information to get needed accuracy • What to track • Trigger sources; all other sources • Java Issues that arose • Static initializers • Constructors • Native methods • Reflection • Callbacks • Data structures • Exceptions CHET

  26. What to Track: Sources • Local Sources • Anything generated via a new operator • Track values stored in fields of the source • Array Sources • Created by new array operators • Track values stored in the array • Fixed Sources • Results from native methods, built-in values • Can be mutable (changed on a cast) CHET

  27. Sources • Model Sources • Generated by trigger events • One-to-one association with instances • Field Sources • Track the values of fields • Only for fields used in specifications • Determine where the fields are used • Others • Privacy, … CHET

  28. Values • Flow analysis deals with values • These are sets of sources • Associated with each field, local, stack, … • Value contains additional information • Data type (for type analysis) • CanBe or MustBe NULL flags • Integer value range (or indefinite) • Operations applied symbolically CHET
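A value as described here could be modeled roughly as below. This is a sketch under stated assumptions: the class and field names are hypothetical, sources are plain strings, and the data type and integer range from the slide are left out to keep the merge rule visible.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical flow-analysis value: a set of sources plus null flags. */
final class FlowValue {
    final Set<String> sources;
    final boolean canBeNull;
    final boolean mustBeNull;

    FlowValue(Set<String> sources, boolean canBeNull, boolean mustBeNull) {
        this.sources = Collections.unmodifiableSet(new TreeSet<>(sources));
        this.canBeNull = canBeNull;
        this.mustBeNull = mustBeNull;
    }

    /** Combines the values arriving at a join point in the control flow. */
    FlowValue merge(FlowValue other) {
        Set<String> union = new TreeSet<>(sources);
        union.addAll(other.sources);
        // "Can be null" holds if either side can be null;
        // "must be null" holds only if both sides must be null.
        return new FlowValue(union,
                             canBeNull || other.canBeNull,
                             mustBeNull && other.mustBeNull);
    }
}
```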

  29. Static Initializers • Problem • Called implicitly at first use • Must return before class can be used • Accurate field analysis requires this • But it can call methods of the class • Some classes initialized by JVM • Solution • Track whether initializer has been started • Add some system classes by default • Don’t process methods before started CHET

  30. Constructors • Problem • Most methods assume constructor done • Accurate field analysis requires this • But constructors can be quite complex • Solution • Track current set of constructors we are in • Only process method if • We have constructed an object of this class OR • We are called from within the constructor CHET

  31. Native & Reflexive Methods • Problem • These are hidden from static analysis • Solution 1: Default handling • Use a fixed source of return type • Use mutable sources where appropriate • Solution 2: Internal Special handling • arraycopy : copy array values CHET

  32. Native & Reflexive Methods • Solution 3: Resource-based return • User specifies return type in resource file • Can be specified as mutable • On a function basis • On a call-site basis • Solution 4: Method substitution • Resource file can specify alternative method • Thread.start => Thread.run • AccessController.doPrivileged => run CHET

  33. Native and Reflexive Methods • Solution 5: Ignore • Resource file can specify calls to ignore • Most calls to swing, awt, … are black boxes • Can be done by method, class or package • With exceptions • Solution 6: User Substitution • User can provide alternative dummy method • Use it as the replacement method • Complex uses of reflection CHET

  34. Callbacks • Problem • Some callbacks are hidden in native code • Callbacks need to have proper arguments • For accurate analysis • Lots of user code is through callbacks • Solution • Note callbacks in resource file • Associate callback method with registration • Provide calling sequence as well • Simulate callbacks with proper arguments • Automatically during analysis CHET

  35. Data Structures • Problem • Maps, collections are hard to analyze • Expensive and inaccurate to look at code • Solution • Introduce prototype sources • With procedural models of methods • Simulate what the methods do in the source • Don’t use the method code per se • Extend to iterators, etc. based on prototypes CHET

  36. Prototype Map • Tracks the contents of the map • Can track selective key-value pairs • Tracks empty, non-empty, either • Handles all the map operations • Updating internal contents • Returning appropriate values • Returns prototype iterators • That are aware of prototype contents CHET
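To make the idea of a prototype concrete, the following sketch models a map abstractly instead of analyzing the library code. Everything here is an assumption for illustration: the class name, the method names, and the choice to merge all keys and all values into two sets rather than tracking selected key-value pairs.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical prototype source that simulates a map during flow analysis. */
final class PrototypeMap {
    enum Emptiness { EMPTY, NON_EMPTY, EITHER }

    private final Set<String> keySources = new TreeSet<>();
    private final Set<String> valueSources = new TreeSet<>();
    private Emptiness emptiness = Emptiness.EMPTY;

    /** Models Map.put: remember what may be stored; the map is now known non-empty. */
    void put(Set<String> key, Set<String> value) {
        keySources.addAll(key);
        valueSources.addAll(value);
        emptiness = Emptiness.NON_EMPTY;
    }

    /** Models Map.get: any stored value source may be returned. */
    Set<String> get() {
        return new TreeSet<>(valueSources);
    }

    /** Models Map.remove: contents may remain, but emptiness is no longer certain. */
    void remove() {
        if (emptiness == Emptiness.NON_EMPTY) {
            emptiness = Emptiness.EITHER;
        }
    }

    /** Models Map.clear: the map is known to be empty again. */
    void clear() {
        keySources.clear();
        valueSources.clear();
        emptiness = Emptiness.EMPTY;
    }

    Emptiness emptiness() {
        return emptiness;
    }
}
```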

  37. Prototypes • Provide more accurate analysis • Know the type of items stored in table • Avoid merging of multiple tables • Know when tables are null and not • Provide more efficient analysis • Speed up of 30% • Are relatively easy to implement • Collections: < 900 lines of source • Maps: < 500 lines of source CHET

  38. Exceptions • Problem • Normal exceptions are easy to handle • What to do with hidden exceptions • catch (Throwable …) • Synchronized regions • Solution • Restrict analysis to explicit exceptions • Unless explicitly told not to CHET

  39. Finding One Instance • Trigger event => Model Source • This determines the basic instance • Where model source flows • Determines event locations • Based on event type • Based on event parameters CHET

  40. Example • E1 is the trigger • Provides a model source M • E2 occurs whenever M flows to a call to Iterator.hasNext • E3, E4, E5 similarly CHET
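Once flow analysis has computed which sources reach each call, locating the events of one instance amounts to a filter. The sketch below assumes a hypothetical CallSite record holding the sources that may reach the receiver; none of these names come from CHET.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical lookup of event locations for a single model source. */
final class InstanceEvents {
    /** A call in the program together with the sources that may reach its receiver. */
    static final class CallSite {
        final String location;
        final String method;
        final Set<String> receiverSources;

        CallSite(String location, String method, Set<String> receiverSources) {
            this.location = location;
            this.method = method;
            this.receiverSources = receiverSources;
        }
    }

    /** Returns the locations where the model source flows to a call named in the specification. */
    static List<String> eventLocations(String modelSource,
                                       Set<String> specMethods,
                                       List<CallSite> sites) {
        List<String> result = new ArrayList<>();
        for (CallSite site : sites) {
            if (specMethods.contains(site.method)
                    && site.receiverSources.contains(modelSource)) {
                result.add(site.location);
            }
        }
        return result;
    }
}
```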

  41. Multi-Parameter Specifications • Find all possible instances (statically) • Start with model source for trigger • Find all locations for next NEW event • Based on flow of the model source • Build a new instance for the source pair • Continue to handle additional NEW sources • Note that we have to consider all sets • And not just complete sets CHET

  42. Example • E1 is the trigger • Model source M • Writer constructor call • With M as arg1 • Yields new source M1 • Instance <M,M1> • If additional call • With M1 as arg1 • Build new instance CHET

  43. Where Are We • We have • Specified how components should be used • Using parameterized automata • Found all instances of each specification • Using detailed flow analysis • Next we need to • Check each instance • By creating a model program • And looking at all its possible executions CHET

  44. Checking an Instance • Build an abstract program for each instance • Using flow-sensitive analysis • Abstract program organized into routines • Abstract program generates event sequences • Some nodes output events • Determine all event sequences that can be generated • Ensure that they are all valid wrt specification CHET

  45. Abstract Programs • Methods represented by automata • Each defined as a directed graph • Nodes of the graph represent actions • Arcs represent nondeterministic traversal • Control flow embedded in nodes • Calls, asynchronous calls • Actions can do tests (on variables, returns) • Actions can dead-end • If is represented as two test nodes CHET
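Checking that every event sequence of an abstract program is valid can be sketched as a search over pairs of (program node, automaton state). The code below is a simplified illustration with a single routine, no variables, and no calls; the Node class, the string-based specification table, and the sample builders are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical checker: explores all paths of an abstract program against a specification. */
final class InstanceChecker {
    /** A node emits an optional event; its arcs are taken nondeterministically. */
    static final class Node {
        final String event;      // null when the node generates no event
        final int[] successors;
        Node(String event, int... successors) {
            this.event = event;
            this.successors = successors;
        }
    }

    /** Returns true if no path from node 0 produces an event the automaton cannot accept. */
    static boolean check(Node[] program, Map<String, Map<String, String>> spec, String start) {
        Set<String> visited = new HashSet<>();
        Deque<Integer> nodes = new ArrayDeque<>();
        Deque<String> states = new ArrayDeque<>();
        nodes.push(0);
        states.push(start);
        while (!nodes.isEmpty()) {
            int index = nodes.pop();
            String state = states.pop();
            if (!visited.add(index + "@" + state)) {
                continue; // this (node, state) pair was already explored
            }
            Node node = program[index];
            if (node.event != null) {
                Map<String, String> row = spec.get(state);
                state = (row == null) ? null : row.get(node.event);
                if (state == null) {
                    return false; // missing transition: specification violated
                }
            }
            for (int successor : node.successors) {
                nodes.push(successor);
                states.push(state);
            }
        }
        return true;
    }

    /** Sample specification: next is only allowed after hasNext returned true. */
    static Map<String, Map<String, String>> iteratorSpec() {
        Map<String, Map<String, String>> spec = new HashMap<>();
        for (String s : new String[] {"unchecked", "ready", "done"}) {
            Map<String, String> row = new HashMap<>();
            row.put("hasNextTrue", "ready");
            row.put("hasNextFalse", "done");
            spec.put(s, row);
        }
        spec.get("ready").put("next", "unchecked");
        return spec;
    }

    /** Abstract program for: while (it.hasNext()) it.next(); */
    static Node[] loopProgram() {
        return new Node[] {
            new Node(null, 1),            // 0: enter
            new Node(null, 2, 4),         // 1: loop test, both outcomes possible
            new Node("hasNextTrue", 3),   // 2
            new Node("next", 1),          // 3
            new Node("hasNextFalse")      // 4: exit
        };
    }
}
```

The visited set keeps the search finite: there are only as many pairs as nodes times automaton states, so loops in the abstract program do not cause repeated work.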

  46. Sample Program CHET

  47. Sample Conditional CHET

  48. Abstract Program Actions • Enter a routine • Exit a routine • Call a routine • Generate a particular event • Set a variable to a given value • Correspond to program variables • Set the return value of the routine • Test a variable or return value for a value • Exit (call to System.exit) • Asynchronous call of a routine • Begin synchronized region • End synchronized region • (Wait, Notify) events CHET

  49. Abstract Program Variables • Which variables are used in the program • Can be given as part of the specification • Otherwise determined automatically • Using a separate cursory flow analysis • Determine which fields directly affect event generation in the abstract program • Conditional using field branches around event • This is done before building the program CHET

  50. Simplifying Abstract Programs • Simplification essential for fast checking • Eliminate routines obviously not used • Through a quick transitive closure check • Then apply FSA minimization techniques • Throw away nodes with no effects • Combine nodes where possible • No effects • If no thread starts, then all thread operations • If no conditionals for a variable (return), no sets • Conditional without internal nodes • Enter-exit only for a routine • Call of empty routine CHET
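One of the listed simplifications, throwing away nodes with no effects, can be sketched as splicing such nodes out of the graph. This is an assumed illustration rather than CHET's algorithm: nodes are integers, every node has an entry in the graph (possibly with no successors), and the caller is trusted to list only nodes that emit no event, set no variable, and are not the entry node.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical simplification pass that removes nodes with no effects. */
final class GraphSimplifier {
    /** Returns a copy of the graph in which each no-effect node is bypassed. */
    static Map<Integer, Set<Integer>> simplify(Map<Integer, Set<Integer>> graph,
                                               Set<Integer> noEffect) {
        Map<Integer, Set<Integer>> result = new TreeMap<>();
        for (Map.Entry<Integer, Set<Integer>> entry : graph.entrySet()) {
            result.put(entry.getKey(), new TreeSet<>(entry.getValue()));
        }
        for (Integer node : noEffect) {
            Set<Integer> targets = result.remove(node);
            if (targets == null) {
                continue; // unknown node, nothing to splice
            }
            targets.remove(node); // drop any self loop
            // Redirect every arc into the removed node to that node's successors.
            for (Set<Integer> out : result.values()) {
                if (out.remove(node)) {
                    out.addAll(targets);
                }
            }
        }
        return result;
    }
}
```

Splicing preserves the events along every path that continues past the removed node, so the smaller program generates the same event sequences for the checker to examine.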
