Four-A — Component Adaptation and Assurance

PI Meeting, Honolulu, July 2000
William L. Scherlis
Institute for Software Research, CMU School of Computer Science
scherlis@cs.cmu.edu, 412-268-8741
With: John Tang Boyland (U Wisc), Aaron Greenhouse, Edwin Chan

Technical Objectives
1. Improve source-level software assurance
  • Systematically improve code safety, tolerance, etc., using source-level analysis, annotation, and transformation.
  • Improve the extent of formal assurance using analyses, annotation, and transformation.
  • Provide composable approaches for a variety of code-safety properties, based on annotations.
2. Provide ongoing assurance through evolution
  • Avoid re-verification of code safety, tolerance, and other properties as software components and systems evolve.
  • Support the programmer through adaptation by formally analyzing and carrying out changes, preserving and enhancing assurance where possible.

Four-A and Information Assurance
[Diagram labels: Code-Safety, Four-A, Robustness, Security, Encapsulation; Prevention, Detection, Tolerance; Java]
• Code Safety
  • Arrays: avoid bounds and type exceptions
  • Types: avoid cast exceptions
  • References: avoid null-pointer exceptions
  • Concurrency: avoid races, deadlocks, stolen locks, etc.
  • Exceptions: avoid non-declared RuntimeException instances (excluding VM)
  • Data representations: avoid violations of data integrity
• Robustness
  • Binary compatibility: Java's own promises: compiler vs. load
  • Redundancy, integrity: insert redundant integrity checks across components
• Encapsulation
  • References and access: protect referenced objects
  • Unique: protect referenced objects

Threats, Assumptions, Policy
Threats/attacks addressed
• Exploitations of code safety errors via client interfaces
  • E.g., APIs, subclassing
  • Historically a major threat domain (CERT data)
• "Misuse" of system calls
  • E.g., C buffer overflow
• Examples:
  • Induced exceptions
  • Data integrity violations
  • Concurrency failures
    • Liberal policy vs. race condition
    • Deadlock inducement, lock stealing, etc.
    • Testing difficulties
Assumptions
• Code-level focus
• Type safety
• Limited/absent functional specs
• Component-orientation with diverse access requirements
Policies enforced
• Data integrity and confidentiality policies
• Concurrency policies
• Consistency with mechanical specifications
Enforcement approach
• Code manipulation, analysis, annotation/specification
• Management of assurance through code change

Footnote: CERT vulnerability taxonomy
• Assumptions wrong or changed
• Design errors
• Implementation errors
• Basic programming practices
• Improper use of a well understood algorithm
• Timing windows
• Privileged programs
• Trusts something not designed to support trust
• Trusts untrustworthy information
• Errors in requirements specifications
• Other problems
• User interface

Outline
Technical objectives
• Code-level assurance
  • In development: improve
  • In evolution: sustain
• Link with design
Threats, assumptions, policy
Research hypotheses
Project approach
• Scope
• Engineering practice
Baseline
• Adaptation: JDK evolution
• Security: CERT vulnerability data
• Four-A case studies & scenarios
Technical approach
• Semantics-based manipulation
• Program analysis
• Annotation and scalability
• Tool-based studies
Accomplishments & plans
• Schedule
• Expected accomplishments
Transition & management
• Techniques, technologies, tools
• Approach

Complexity and Encapsulation in Practice

Combining helpers into a host class makes the host class more complex but also potentially more efficient, due to short-circuited method calls and the like. Performing such simplifications along the way, we can define a more concise, slightly more efficient, and surely more frightening version of BoundedBuffer.
– Lea, Concurrent Programming in Java, 2nd ed.

A recurring textbook remark.

When you are handed an interface, the first thing you’ll see is a set of operations that specify a service of a class or a component. Look a little deeper and you’ll see the full significance of these operations, along with any of their special properties, such as visibility, scope, and concurrency semantics. These properties are important, but for complex interfaces they aren’t enough to help you understand the semantics of the service they represent, much less know how to use these operations properly. In the absence of any other information, you’d have to dive into some abstraction that realizes the interface to figure out what each operation does and how these operations are meant to work together. However, that defeats the purpose of an interface, which is to provide a clear separation of concerns in a system.
– Booch, Rumbaugh, Jacobson, The Unified Modeling Language User Guide.

See example in Sun’s production library code.

Four-A Hypotheses
• The four A’s. Manipulations, analyses, annotations, and detailed design-record management can be used to improve safety, tolerance, and robustness.
  • Add: safety checks, redundancies, audit, graceful degradation
• Scalability. Many code safety properties can be made composable using annotations.
  • Examples: exceptions, arrays, concurrency management, types
• Evolution. Re-verification of many critical safety and dependability properties can be avoided.
• Systematization. Safety risks of restructuring can be reduced using systematic techniques.
  • Administrative structural changes
  • Performance improvements
  • Robustness improvements

Four-A Project Approach
• Assurance at code-level
  • Code safety
  • Design consistency
  • Robustness improvement
• Techniques
  • Semantics-based program manipulation
    • E.g., performance optimization, relocate abstraction boundaries, refactor, stage, increase concurrency, conform with changing APIs
  • Advanced program analysis
  • Program annotations and specifications
    • Effects, unique refs, uses, mutability, etc.
  • Fine-grained audit-trail, design-record, and linking
    • Ultra-fine versioning, design/code linking, code diff, etc.
• Tools
  • Language-independent core
    • IR, CFG and analysis framework, usability
  • Java-specific capability (“99% pure”)
• Data
  • Understand change logs and code-level vulnerabilities from practice

Four-A Project Approach, continued
• Work from code level through design toward spec
  • Why: Code as ground truth. Snapshot problem.
  • Why: Legacy code. Exploit and improve partial specs.
  • Why: Manage detail design.
• Use partial information about components in a system
  • Why: Trade secret (COTS). Security. Distributed development.
  • Cf. whole-program analysis
• Rely on encapsulation, type safety, composable props
  • Java, (modified) beans, etc.
  • Encapsulation benefits both programmers and intruders.
  • Why: Scalability. Partial information. Manipulation soundness.
• Focus on administrative change in routine SWE
  • Why: Appropriate roles for programmers and tools. Adoptability.
  • Why: Tune for performance, security, robustness

Scenarios and Baseline
• Commercial baseline
  • CERT vulnerability database
    • Many vulnerabilities/exploitations enabled by failures in code-level implementation: code safety, concurrency, etc.
  • Four-A’s JDK census
    • Preliminary results:
      • Most changes explicitly require client-side semantic analysis
      • Estimate 34% to be potentially tool-feasible
      • Only about 20% are potentially transparent to client
      • Many changes affect code safety
    • Very limited client-side API evolution support
      • E.g., Sun’s sed script
• Four-A scenarios
  • “Busy-guy / Bad-guy”
  • Based in production code
  • Examples
    • Denial-of-service: lock stealing
    • Information leakage: unique
    • Unlocked door: Method extract

A Simple Concurrency Example

This version of the class is safe for single threading, but does not support concurrent access.

    public class Point {
      private int x;
      private int y;

      public Point( final int x, final int y ) {
        this.x = x;
        this.y = y;
      }

      public int getX() { return x; }
      public int getY() { return y; }

      public void set( int x, int y ) {
        this.x = x;
        this.y = y;
      }

      public String toString() {
        return "(" + x + ", " + y + ")";
      }
    }

This version of the class is safe for concurrency, but (in this rendering) has lost extensibility. The private mutex protects against lock stealing; the synchronized blocks avoid races.

    public class Point {
      private final Object mutex = new Object();
      private int x;
      private int y;

      public Point( final int x, final int y ) {
        this.x = x;
        this.y = y;
      }

      public int getX() { synchronized( mutex ) { return x; } }
      public int getY() { synchronized( mutex ) { return y; } }

      public void set( int x, int y ) {
        synchronized( mutex ) {
          this.x = x;
          this.y = y;
        }
      }

      public String toString() {
        synchronized( mutex ) {
          return "(" + x + ", " + y + ")";
        }
      }
    }
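The lock-stealing concern can be made concrete with a small experiment. The sketch below is illustrative only: `ThisLockedPoint`, `MutexPoint`, and `completesWhileLocked` are hypothetical names, not part of the Four-A tooling. It assumes a client can reach the object itself, so it can hold the object's monitor for as long as it likes. A class that synchronizes on `this` is then blocked, while a class that synchronizes on a private mutex is not.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

class LockStealingDemo {
    /** Hypothetical variant that locks on the publicly reachable receiver. */
    static class ThisLockedPoint {
        private int x;
        ThisLockedPoint(int x) { this.x = x; }
        public synchronized int getX() { return x; }
    }

    /** Mirrors the slide's second Point: the lock object never escapes. */
    static class MutexPoint {
        private final Object mutex = new Object();
        private int x;
        MutexPoint(int x) { this.x = x; }
        public int getX() { synchronized (mutex) { return x; } }
    }

    /**
     * Returns true if call finishes within timeoutMs while another thread
     * (the "thief") holds the monitor of target.
     */
    static boolean completesWhileLocked(Object target, IntSupplier call, long timeoutMs) {
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread thief = new Thread(() -> {
            synchronized (target) {          // client code grabs the object's monitor
                held.countDown();
                try { release.await(); } catch (InterruptedException ignored) { }
            }
        });
        thief.setDaemon(true);
        thief.start();

        CountDownLatch done = new CountDownLatch(1);
        Thread caller = new Thread(() -> { call.getAsInt(); done.countDown(); });
        caller.setDaemon(true);
        try {
            held.await();                    // wait until the monitor is really held
            caller.start();
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown();             // let the thief go either way
        }
    }

    public static void main(String[] args) {
        ThisLockedPoint p = new ThisLockedPoint(1);
        MutexPoint q = new MutexPoint(1);
        System.out.println("this-locked getX completes:  " + completesWhileLocked(p, p::getX, 500));
        System.out.println("mutex-locked getX completes: " + completesWhileLocked(q, q::getX, 500));
    }
}
```

The first call should report that `getX` did not complete, and the second that it did. The 500 ms timeout is an arbitrary choice for the demonstration.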

Four-A Technologies (Adaptation, Analysis, Annotation, Accounting)
• Semantics-based program manipulation
  • Source-code and design level
  • Structural manipulations
  • Run-time manipulations
  • Meta-manipulations
• Analysis and models
  • OO effects, mutability, uniqueness, aliasing, uses, . . .
• Annotation and specification
  • Mechanical properties
• Tools for assured adaptation of Java components
• Information loss and chain of evidence
  • Use of audit data

1. Systematic Software Adaptation
Routine software structural evolution
• Examples:
  • API change
  • Data representation change
  • Class hierarchy restructuring
  • Signature change
  • Introduce self-adaptation
  • Mobility
  • Encapsulation
  • Split into phases / stages
  • Cloning to produce specialized variants
  • Merging of related functions
  • Replication for robustness
  • Threading changes
Provide tool support for these operations
• With predictable impact on functional and mechanical program properties

Assured Software Change
Structural change in practice
• Costly
  • Changes can be distributed throughout a system.
  • Complex analysis (program understanding) is required.
• Risky
  • Invariants and specifications are not present.
• Avoided
  • Why do we tolerate brittleness?
  • Code rot = persistence of abstractions beyond their time.
  • Why do commercial APIs accrete?
• Necessary
  • Structural change enables functional change
    • Localize/encapsulate related software elements
  • Structural change enables code management
    • Cf. Aspect-Oriented Prog. Subject-Oriented Prog. N-Dim.
  • Navigate structural trade-offs during design/evolution
  • Support iterative software processes

“Simple” Example: Move Field
Programmers can do this using drag-and-drop.
[Class diagram labels: A, f, B, foo, bar, f, C]
Move field f from class C to class A.
Checks
• C is a descendant of A.
• If A is an interface, f must be public static final.
• Shadowing: A and B have no use of ancestral f.
• Unshadowing: No f field in B (it would capture C’s f uses).

“Simple” Example: Move Field
Programmers can do this using drag-and-drop.
[Class diagram labels: A, f, D, B, foo, bar, f, C]
Move field f from class C to class A.
Checks
• C is a descendant of A.
• If A is an interface, f must be public static final.
• Shadowing: A and B have no use of ancestral f.
• Unshadowing: No f field in B (it would capture C’s f uses).
• D (and other sibs) have no uses of f.
• Initializer code can be reordered, by field type.
• Reordering is acceptable for interleaved constructor and field code.
Actions
• Adjust access tags
• Handle special cases
Caveats
• Visibility in D and other sibs
• Visibility in C’s subs
• Promises introduced
• Changes in binary compatibility
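The "unshadowing" check can be illustrated with a small before/after pair. This is a sketch under stated assumptions: the classes `Before` and `After` and the method `read` are hypothetical, and the "move" is done by hand here, not by a tool. It shows what goes wrong when B declares its own f and the check is skipped: after f moves from C up to A, the unqualified use of f in C silently binds to B's field instead.

```java
class MoveFieldDemo {
    /** Original hierarchy: C declares f, and B happens to declare an f too. */
    static class Before {
        static class A { }
        static class B extends A { String f = "B.f"; }
        static class C extends B {
            String f = "C.f";
            String read() { return f; }       // binds to C's own field
        }
    }

    /** After naively moving C's f up to A without the unshadowing check. */
    static class After {
        static class A { String f = "C.f"; }  // the moved field
        static class B extends A { String f = "B.f"; }  // hides A.f
        static class C extends B {
            String read() { return f; }       // now binds to B.f: meaning changed
        }
    }

    public static void main(String[] args) {
        System.out.println(new Before.C().read());  // C.f
        System.out.println(new After.C().read());   // B.f
    }
}
```

Because Java resolves field names statically against the nearest declaring class, no compiler error flags the change; this is why the check has to be part of the manipulation itself.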

Structural Manipulation Techniques
• Boundary movement (ISAW’98)
• Frequency change
• Data representation change (ESOP’98)
• Hierarchy restructuring
• Staging, specialization, splitting
• Thread management
• Self-adaptation
• Integrity

Evolving MultiThreaded Code
Why
• Improve code safety and robustness
• Improve performance and flexibility
How
• Annotations
  • Locks associated with regions (generalization of fields)
  • Assignment of locks to (final) fields or instance variables
  • Lock ordering
• Manipulations
  • Shrink lock
  • Split lock, merge locks
  • Move boundaries
  • Reorder locks
  • Etc.
• Analyses
• Tool support
• The Generative Approach (new)
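The "split lock" manipulation can be sketched as follows. This is an illustration only, assuming two independent state regions (`numHigh` and `numNormal`) and a fixed lock order; the class `Counters`, its methods, and the `split` flag are hypothetical names chosen for the example. With one merged lock, an update to one region blocks updates to the other. After the split, each region has its own lock and the two updates can overlap.

```java
class SplitLockDemo {
    static class Counters {
        private final Object highLock = new Object();
        private final Object normalLock;   // same object as highLock when merged
        private int numHigh;               // region guarded by highLock
        private int numNormal;             // region guarded by normalLock

        Counters(boolean split) {
            normalLock = split ? new Object() : highLock;
        }

        void addNormal() {
            synchronized (normalLock) { numNormal += 1; }
        }

        /** Updates the high region, then runs body while still holding highLock. */
        void addHighThen(Runnable body) {
            synchronized (highLock) { numHigh += 1; body.run(); }
        }

        /** Reads both regions; locks are always taken in the order high, normal. */
        int size() {
            synchronized (highLock) {
                synchronized (normalLock) { return numHigh + numNormal; }
            }
        }
    }

    /** True if addNormal() can finish while another thread is inside addHighThen(). */
    static boolean normalProceedsDuringHighUpdate(Counters c) {
        boolean[] finished = { false };
        c.addHighThen(() -> {
            Thread t = new Thread(c::addNormal);
            t.setDaemon(true);
            t.start();
            try {
                t.join(500);               // arbitrary demo timeout
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            finished[0] = !t.isAlive();
        });
        return finished[0];
    }

    public static void main(String[] args) {
        System.out.println("merged lock: " + normalProceedsDuringHighUpdate(new Counters(false)));
        System.out.println("split lock:  " + normalProceedsDuringHighUpdate(new Counters(true)));
    }
}
```

The split is only sound here because no invariant relates the two counters except in `size()`, which takes both locks in a fixed order. That is the kind of condition a lock-to-region annotation would need to record.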

Four Concurrency Policies for EventQueue
[Slide labels: A, B, C, D]

    // maximizing getSize()
    public class EventQueue {
      // . . .
      public int getSize() {
        synchronized( gsLock ) {
          synchronized( dQLock ) {
            int s1, s2;
            synchronized( high ) { s1 = numHigh; }
            synchronized( normal ) { s2 = numNormal; }
            return s1 + s2;
          }
        }
      }
      public void enqueuePriority( Object e ) {
        synchronized( gsLock ) {
          synchronized( dQLock ) {
            synchronized( high ) {
              high.add( e );
              numHigh += 1;
            }
          }
        }
      }
      private Object dequeuePriority(){
        Object e = null;
        synchronized( high ) {
          if( numHigh != 0 ) {
            e = high.remove( 0 );
            numHigh -= 1;
          }
          return e;
        }
      }
      public void dispatchEvent() {
        Object e = null;
        synchronized( fifoLock ) {
          synchronized( dQLock ) { e = dequeue(); }
          if( e != null ) fireEQEvent( e );
        }
      }
    }

    // minimizing getSize()
    public class EventQueue {
      // . . .
      public int getSize() {
        synchronized( gsLock ) {
          synchronized( dQLock ) {
            int s1, s2;
            synchronized( high ) { s1 = numHigh; }
            synchronized( normal ) { s2 = numNormal; }
            return s1 + s2;
          }
        }
      }
      public void enqueuePriority( Object e ) {
        synchronized( dQLock ) {
          synchronized( high ) {
            high.add( e );
            numHigh += 1;
          }
        }
      }
      private EQEvent dequeuePriority(){
        Object e = null;
        synchronized( gsLock ) {
          synchronized( high ) {
            if( numHigh != 0 ) {
              e = high.remove( 0 );
              numHigh -= 1;
            }
            return e;
          }
        }
      }
      public void dispatchEvent() {
        Object e = null;
        synchronized( fifoLock ) {
          synchronized( dQLock ) { e = dequeue(); }
          if( e != null ) fireEQEvent( e );
        }
      }
    }

    // exact getSize()
    public class EventQueue {
      // . . .
      public int getSize() {
        synchronized( gsLock ) {
          synchronized( dQLock ) {
            int s1, s2;
            synchronized( high ) { s1 = numHigh; }
            synchronized( normal ) { s2 = numNormal; }
            return s1 + s2;
          }
        }
      }
      public void enqueuePriority( Object e ) {
        synchronized( gsLock ) {
          synchronized( dQLock ) {
            synchronized( high ) {
              high.add( e );
              numHigh += 1;
            }
          }
        }
      }
      private EQEvent dequeuePriority(){
        Object e = null;
        synchronized( gsLock ) {
          synchronized( high ) {
            if( numHigh != 0 ) {
              e = high.remove( 0 );
              numHigh -= 1;
            }
            return e;
          }
        }
      }
      public void dispatchEvent() {
        Object e = null;
        synchronized( fifoLock ) {
          synchronized( dQLock ) { e = dequeue(); }
          if( e != null ) fireEQEvent( e );
        }
      }
    }

    // getSize() w/no guarantees
    public class EventQueue {
      // . . .
      public int getSize() {
        synchronized( dQLock ) {
          int s1, s2;
          synchronized( high ) { s1 = numHigh; }
          synchronized( normal ) { s2 = numNormal; }
          return s1 + s2;
        }
      }
      public void enqueuePriority( Object e ) {
        synchronized( dQLock ) {
          synchronized( high ) {
            high.add( e );
            numHigh += 1;
          }
        }
      }
      private EQEvent dequeuePriority(){
        Object e = null;
        synchronized( high ) {
          if( numHigh != 0 ) {
            e = high.remove( 0 );
            numHigh -= 1;
          }
          return e;
        }
      }
      public void dispatchEvent() {
        Object e = null;
        synchronized( fifoLock ) {
          synchronized( dQLock ) { e = dequeue(); }
          if( e != null ) fireEQEvent( e );
        }
      }
    }

Four Concurrency Policies for EventQueue

    // maximizing getSize()
    public class EventQueue {
      // . . .
      public int getSize() {
        int s1, s2;
        s1 = numHigh;
        s2 = numNormal;
        return s1 + s2;
      }
      public void enqueuePriority( Object e ) {
        high.add( e );
        numHigh += 1;
      }
      private Object dequeuePriority(){
        Object e = null;
        if( numHigh != 0 ) {
          e = high.remove( 0 );
          numHigh -= 1;
        }
        return e;
      }
      public void dispatchEvent() {
        Object e = null;
        e = dequeue();
        if( e != null ) fireEQEvent( e );
      }
    }

[Diagram: arrows linking each version to a policy]
Version: A, B, C, D
Policy:
• getSize is exact
• getSize gives upper bound
• getSize gives lower bound
• No guarantees about getSize
Exercise for the reader: Which arrows above are correct?

2. Analyses and 3. Annotations
Analyses
• Typing
• Binding
• Effects
• Unique
• Reaching defs
• Predicate
• Capability
• Concurrency
Annotations
• Effects (read/write, regions)
• Exceptions
• Conditions (pre/post, exception)
• Locks (ordering, regions, used, context)
• Uses
• Method attributes (idem, pure)
• Pointer capability (anonymous, clonable, castable (down), mutable, borrowed, unique/excluded)

Specifications for mechanical properties
• Manipulations require analyses
  • Example
    • Manipulation: Reorder code
    • Analyses: Effects, aliasing (may-equal and uniqueness), uses.
  • Cf. compiler analyses, software engineering analyses.
• Analyses are goal-directed
  • Avoid whole-program analysis
    • Limited access. Scale.
  • At scale:
    • Development:
      • Distributed/collaborative.
      • Functional specifications lacking.
    • Programs:
      • OO, dynamically linked.
      • Potentially: adaptive, mobile
• Analyses require mechanical specifications
  • Promises about components and their elements (ICSE’98)
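The "reorder code" example can be made concrete. In the sketch below (hypothetical names `Cell`, `original`, `reordered`), a write through one reference is swapped with a read through another. The swap preserves behavior only if the two references cannot be equal, which is exactly what effects plus may-equal or uniqueness information would have to establish.

```java
class ReorderDemo {
    static class Cell { int value; }

    /** Original order: write through a, then read through b. */
    static int original(Cell a, Cell b) {
        a.value = 1;          // effect: writes a.value
        int r = b.value;      // effect: reads b.value
        return r;
    }

    /** Candidate manipulation: the two statements swapped. */
    static int reordered(Cell a, Cell b) {
        int r = b.value;      // the read now happens before the write
        a.value = 1;
        return r;
    }

    public static void main(String[] args) {
        // Distinct objects: both orders agree.
        System.out.println(original(new Cell(), new Cell()) + " " + reordered(new Cell(), new Cell()));
        // Aliased arguments: the orders disagree, so the reordering is unsound.
        Cell c1 = new Cell();
        Cell c2 = new Cell();
        System.out.println(original(c1, c1) + " " + reordered(c2, c2));
    }
}
```

A tool that sees only the method body cannot tell these cases apart, which is one motivation for promises on parameters such as uniqueness.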

Key Ideas: OO Effects (ECOOP’99)
• Source-level analysis of partial programs
  • Do not want, and may not have, the whole program
  • Use annotations on methods as surrogates for components
• Use of regions and aliases to analyze OO programs
  • Encapsulate state of objects in regions to protect programmer abstractions
  • Use aliasing information to improve results:
    • Identify potential aliases – may-alias information
    • Control creation of aliases – unique information.
• Programmer-guided source-level manipulation
  • Goal-directed analysis (vs. compile-time opportunistic analysis)
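One way to picture method-level effect annotations over regions is as sets of read/write effects that a tool compares instead of inspecting method bodies. The sketch below is a simplification and not the annotation language of the cited work: the `Effect` class, the dotted region names such as `Instance.position.x`, and the `interfere` check are all assumptions made for illustration. Two methods interfere when one writes a region that overlaps a region the other reads or writes, and a region is taken to include its sub-regions.

```java
import java.util.Arrays;
import java.util.List;

class EffectsDemo {
    /** A single declared effect: a read or a write of a named region. */
    static final class Effect {
        final boolean write;
        final String region;
        Effect(boolean write, String region) { this.write = write; this.region = region; }
    }

    static Effect reads(String region)  { return new Effect(false, region); }
    static Effect writes(String region) { return new Effect(true, region); }

    /** Regions overlap if they are equal or one is an ancestor of the other. */
    static boolean overlaps(String r1, String r2) {
        return r1.equals(r2) || r1.startsWith(r2 + ".") || r2.startsWith(r1 + ".");
    }

    /** Two effect sets interfere if some overlapping pair includes a write. */
    static boolean interfere(List<Effect> a, List<Effect> b) {
        for (Effect e1 : a) {
            for (Effect e2 : b) {
                if ((e1.write || e2.write) && overlaps(e1.region, e2.region)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Hypothetical declared effects for the methods of a Point-like class.
    static final List<Effect> GET_X     = Arrays.asList(reads("Instance.position.x"));
    static final List<Effect> GET_Y     = Arrays.asList(reads("Instance.position.y"));
    static final List<Effect> SET_X     = Arrays.asList(writes("Instance.position.x"));
    static final List<Effect> SET       = Arrays.asList(writes("Instance.position"));
    static final List<Effect> TO_STRING = Arrays.asList(reads("Instance.position"));

    public static void main(String[] args) {
        System.out.println("getX vs toString: " + interfere(GET_X, TO_STRING));
        System.out.println("getX vs set:      " + interfere(GET_X, SET));
        System.out.println("setX vs getY:     " + interfere(SET_X, GET_Y));
    }
}
```

Declaring the effect of `set` on the enclosing region `Instance.position` rather than on individual fields is the point of regions: the annotation stays valid if the representation behind it changes.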

Code Safety and Unique Variables
The value of uniqueness
• Sole access to an object entails certain privileges:
  • Mutations can be performed without regard to rest of program (no other read access)
  • Invariants can be maintained without regard to rest of program (no other write access)
• Program invariants are ideally
  • Explicit (code readability)
  • Checked (code maintainability)
Uniqueness examples
• Input stream of a lexer
  • If unique: Lexer can buffer input without interference.
• String buffer character array
  • If unique: Can be coerced to immutable when final string is desired.
• Vector’s array
  • If unique: Mutations of separate vectors can be reordered.
• Hashtable’s array
  • If unique: One can enforce hashing invariants, and can rehash without interference.
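The string-buffer case can be sketched in a few lines. The classes `Text` and `Builder` and the method `leak` are hypothetical and exist only for this illustration. The builder hands its character array to an immutable wrapper without copying, which is safe only if the array is unique at that moment. A leaked alias breaks the wrapper's apparent immutability.

```java
import java.util.Arrays;

class UniqueDemo {
    /** Immutable by convention: assumes it holds the only reference to chars. */
    static final class Text {
        private final char[] chars;
        Text(char[] uniqueChars) { this.chars = uniqueChars; }  // no defensive copy
        @Override public String toString() { return new String(chars); }
    }

    static final class Builder {
        private char[] buf = new char[0];

        Builder append(char c) {
            buf = Arrays.copyOf(buf, buf.length + 1);
            buf[buf.length - 1] = c;
            return this;
        }

        /** Transfers the buffer and drops the builder's own reference to it. */
        Text finish() {
            char[] out = buf;
            buf = new char[0];
            return new Text(out);
        }

        /** Deliberately unsafe: exposes an alias to the internal array. */
        char[] leak() { return buf; }
    }

    public static void main(String[] args) {
        // Unique hand-off: later appends cannot disturb the finished text.
        Builder ok = new Builder().append('h').append('i');
        Text safe = ok.finish();
        ok.append('!');
        System.out.println(safe);

        // Leaked alias: the "immutable" text changes after the fact.
        Builder bad = new Builder().append('h').append('i');
        char[] alias = bad.leak();
        Text broken = bad.finish();
        alias[0] = 'X';
        System.out.println(broken);
    }
}
```

A uniqueness annotation on `buf`, checked by analysis, would reject `leak()` and so justify the copy-free hand-off in `finish()`.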

4. Four-A Tool Infrastructure
The Fluid IR
• Ternary structure
• Versioning
• Simultaneity
• Immutability
• Typing
• Sequences
• Encapsulated composites
• Persistence
• Sharing windows
Coordination
• Team and concurrency policy
• Notification
• Dependency management
• Truth maintenance
Language support
• Parse
• Unparse and format
• Compositional CFG
  • 50 lines to define def/use
• Pattern matching
• Manipulation support
UI framework
• Stateful views
• Templates
• Indexed attributes
Java support
• Analyses
• Annotations
• Manipulations

Four-A Schedule
• Year 1
  • Tool infrastructure (99% Java, A, A, A, A)
  • Analysis algorithms (uniqueness, effects, mayEqual, etc.)
  • Demonstrate preservation of assurance properties through change
  • Manipulations for threading
  • Case studies for thread safety and pattern
• Year 2
  • Class-level structural manipulations
  • Management of uses information
  • Exploitation of aliasing annotations to assure code safety props
  • Threading annotations and analyses
  • Design record to support assurance information
• Year 3
  • Manipulation library for improvement of code safety
    • Prevent, Detect, Tolerate
  • Manipulation through analytic views
  • Tool-based case study based on intrusion scenarios

Accomplishments (recent)
• Four-A tool prototype
  • Supports non-local manipulations
  • Model-view infrastructure
• Annotations and analyses
  • Unique. MayEqual.
• Threading
  • Manipulations (preliminary form)
  • Annotations
  • Policy model and generative approach
• Software engineering baseline
  • Evolution census: JDK changes: source code, logs

Transition
• Build on mainstream commercial technologies
  • Java, beans, etc.
• Build on existing infrastructure
  • Tool (developed by our team) for Java analysis, manipulation, engineering process, design information management.
  • Platform (UI, IM, VM, syntax) is also usable for other languages.
• Usability/adoptability a priority from the outset
  • Enable experimentation/studies without high adoption cost
  • E.g., gesture-based interface where possible
• Conduct engineering baseline analyses
  • What are the code-level vulnerabilities being exploited?
  • What kinds of changes are routinely made in commercial APIs?
  • What is the impact of those changes on code safety?
