Four-A — Component Adaptation and Assurance

DARPA ITSPI Meeting
22 Feb 00, Aspen, CO
Bill Scherlis
Institute for Software Research, School of Computer Science, CMU
scherlis@cs.cmu.edu, 412-268-8741
With: John Tang Boyland (UWM), Aaron Greenhouse, Edwin Chan

This Presentation
• Technical objectives
  • Code-level assurance in development and adaptation
  • Application to specific assurance properties: code safety and threading; frameworks
• Existing practice
  • Adaptation: JDK evolution
  • Security: CERT data
• Premises and scope
• Technical approach
  • Semantics-based manipulation: structural, threads, etc.
  • Annotation and analysis: uniqueness, effects, etc.
  • Tool-based studies: Java source-level manipulation
• Accomplishments & plans
  • Schedule
  • Expected accomplishments
  • Transition
  • Tool infrastructure

Four-A Technical Objectives
1. Improve source-level software assurance
• Systematically improve code safety, tolerance, etc., using source-level analysis, annotation, and transformation.
• Increase the extent of formal assurance using analyses, annotation, and transformation.
• Provide scalable, composable, annotation-based approaches for a variety of code-safety properties.
2. Provide ongoing assurance through evolution
• Avoid re-verifying code safety, tolerance, and other properties as software components and systems evolve.
• Support the programmer during adaptation by formally analyzing and carrying out changes, preserving and enhancing assurance where possible.

A Simple Motivating Example: Thread Safety
Annotation. Manipulation. Analysis.
• Thread safety and security
• CERT vulnerability data
• Exploitation scenario: incremental thread capture
• Locks and code evolution
• Technical elements

Two Documented Vulnerabilities (CERT)
Name: ibm/mknod
• Keywords: IBM, AIX, setuid, root access, race condition
• Description: Some (if not all) versions of AIX have a setuid /usr/sbin/mknod so that ordinary users may create named pipes. This is done with a mknod(2) system call followed by a chown(2) system call, which opens a race condition: the user can rename the named pipe and link it to another file before the chown(2) call. Ordinary users may thus “steal” other users’ files and thereby gain unauthorized root access.
• Impact: local user gains root access
Name: noclobber timing window
• Keywords: noclobber; timing window; race condition
• Description: There is a race condition with respect to the shell variable noclobber in some implementations of csh/tcsh. Noclobber is supposed to prevent files from being overwritten if they exist. If the file doesn’t exist, some implementations of csh determine that fact with a stat() call. If stat returns ENOENT, the shell proceeds to write to the file. However, the file could be created between the stat and write calls, defeating the purpose of the noclobber variable.
• Impact: files are overwritten.
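Both entries are check-then-act races: a condition is checked (stat() finds no file; mknod creates the pipe) and then acted on (write; chown) in a separate step, leaving a window in which an attacker can change the file system. The minimal sketch below, assuming Java 7+ `java.nio.file`, contrasts the noclobber pattern done in two steps with a version in which `StandardOpenOption.CREATE_NEW` makes the existence check and the file creation one atomic operation. The names `NoClobber`, `writeIfAbsentRacy`, and `writeIfAbsent` are illustrative and are not taken from any shell implementation.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustrative contrast between a racy and an atomic "noclobber" write. */
final class NoClobber {
    private NoClobber() {}

    /** Racy: the file can appear between the exists() check and the write. */
    static boolean writeIfAbsentRacy(Path target, byte[] data) throws IOException {
        if (Files.exists(target)) {
            return false;
        }
        // Timing window: another process may create or link `target` here.
        Files.write(target, data);
        return true;
    }

    /** Atomic: CREATE_NEW checks for existence and creates the file in one step. */
    static boolean writeIfAbsent(Path target, byte[] data) throws IOException {
        try {
            Files.write(target, data, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false; // refuse to overwrite an existing file
        }
    }
}
```

Both methods behave identically when run alone, which is why this class of bug tends to survive ordinary testing; only the second closes the window.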

An Aside: The CERT Vulnerability Taxonomy (~1200 vulnerabilities)
• Assumptions wrong or changed
• Design errors
• Errors in requirements specifications
• Implementation errors
• Basic programming practices
• Improper use of a well-understood algorithm
• Privileged programs
• Timing windows
• Trusts something not designed to support trust
• Trusts untrustworthy information
• Other problems
• User interface

Evolving Multithreaded Code
Work in progress (Aaron Greenhouse)
Why
• Improve code safety and robustness
• Improve performance and flexibility
How
• Annotations
  • Locks associated with regions (encapsulated sets of fields)
  • Assignment of locks to (final) fields or instance variables
  • Lock ordering
• Manipulations
  • Shrink lock
  • Split/merge locks
• Analyses
  • (multiple)
• Tool support

EventQueue
• EventQueue
  • Sends an event to listeners on dequeue.
  • Priority levels.
• Initial code state
  • Free of race conditions
  • All methods are declared to be synchronized.
  • [NB. Deadlocks are still possible.]
• Evolution goal
  • Performance
  • Synchronization is too coarse
    • Remove unneeded synchronization.
    • Introduce multiple locks.
    • Allow appropriate simultaneous access.
Code fragments below illustrate the systematic refinement process.
[ Work in progress by Aaron Greenhouse ]

    class EventQueue {
      public region Listeners;
      public region Normal;
      public region Priority;

      private final unshared List listeners in Listeners { Instance in Instance };
      private final unshared List normal in Normal { Instance in Instance };
      private final unshared List high in Priority { Instance in Instance };
      private int numNormal in Normal;
      private int numHigh in Priority;

      lock this protects Instance;

      public EventQueue() reads nothing writes nothing { /* ... */ }
    // Continued

      public synchronized void addEQListener( final EQListener l ) reads nothing writes Listeners {
        listeners.add( l );
      }

      private synchronized void fireEQEvent( final Object o ) reads nothing writes All {
        final EQEvent evt = new EQEvent( this, o );
        final List copy = (List)((ArrayList)listeners).clone();
        for( int i = 0; i < copy.size(); i++ ) {
          final EQListener l = (EQListener)copy.get( i );
          l.dequeued( evt );
        }
      }

      public synchronized int getSize() reads Normal, Priority writes nothing {
        return numNormal + numHigh;
      }

      private synchronized void dispatchEvent() reads nothing writes All {
        final Object o = dequeue();
        fireEQEvent( o );
      }
      . . .
    } // End of class

Shrink synchronized Blocks
Step 1: Shrink synchronized blocks.
• Convert synchronized methods to methods with synchronized bodies (trivial).
• Use effects analysis to exclude statements that do not affect the region associated with the lock.
• Method signatures are not changed.
  • Call sites are not affected.
  • Other implementations of the method are not affected.

    class EventQueue {
      //...
      private void fireEQEvent( final Object o ) reads nothing writes All {
        //...
        List copy;
        synchronized( this ) {
          copy = (List)((ArrayList)listeners).clone();
        }
        //...
      }
      //...
      private Object dequeue() reads nothing writes Normal, Priority {
        Object o = null;
        while( o == null ) {
          if( (o = tryGetPriority()) == null ) {
            o = tryGetNormal();
          }
        }
        return o;
      }
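For comparison, here is a plain-Java sketch of the same shrink step. Standard Java cannot express the region and effect annotations, so they are dropped. Only the copy of the listener list is made under the lock; the callbacks run after the lock is released. `QListener` and `ShrunkQueue` are hypothetical names standing in for `EQListener` and `EventQueue`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for EQListener. */
interface QListener {
    void dequeued(Object event);
}

/** EventQueue fragment after "shrink": the lock covers only the shared-state access. */
final class ShrunkQueue {
    private final List<QListener> listeners = new ArrayList<>(); // guarded by this

    void addListener(QListener l) {
        synchronized (this) {
            listeners.add(l);
        }
    }

    void fire(Object event) {
        final List<QListener> copy;
        synchronized (this) {
            // The only statement that touches state protected by the lock.
            copy = new ArrayList<>(listeners);
        }
        // Listener callbacks now run without the queue's lock held.
        for (QListener l : copy) {
            l.dequeued(event);
        }
    }
}
```

Because callbacks iterate over a private copy, a listener may safely register another listener while an event is being delivered; the new listener sees only later events.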

Split the Lock
Step 2: Split the lock used by EventQueue.
• In general, replace a lock L on a region R with locks L_i on subregions R_i.
  • Replace uses of L with uses of the appropriate L_i.
  • Use effects analysis to determine the affected R_i.
  • May need to use multiple locks.
  • Avoid deadlock by enforcing lock ordering.
• Changes how fields must be accessed
  • Affects ancestor and descendant classes.
• Why do this:
  • Improve concurrency
  • E.g., agenda queue: potential simultaneous actions
    • “Edit” separate queue elements (tasks)
    • Reorder spine
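Lock ordering can also be checked at run time. The sketch below is an illustration, not part of the Four-A tooling: each lock gets a rank, and any acquisition whose rank is not higher than every lock the current thread already holds is rejected, so a declaration such as `sync high before normal` becomes rank(high) < rank(normal). `RankedLock` is a hypothetical class built on `java.util.concurrent.locks.ReentrantLock`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical lock wrapper that enforces a global acquisition order by rank. */
final class RankedLock {
    // Ranks of the RankedLocks held by the current thread; the most recent is on top.
    private static final ThreadLocal<Deque<Integer>> HELD =
            ThreadLocal.withInitial(ArrayDeque::new);

    private final int rank;
    private final ReentrantLock lock = new ReentrantLock();

    RankedLock(int rank) {
        this.rank = rank;
    }

    void lock() {
        Deque<Integer> held = HELD.get();
        // Held ranks are strictly increasing, so the top is the highest rank held.
        if (!held.isEmpty() && held.peek() >= rank) {
            throw new IllegalStateException(
                    "lock order violation: rank " + rank + " requested while holding rank " + held.peek());
        }
        lock.lock();
        held.push(rank);
    }

    void unlock() {
        Deque<Integer> held = HELD.get();
        if (held.isEmpty() || held.peek() != rank) {
            throw new IllegalStateException("ranked locks must be released in reverse order");
        }
        held.pop();
        lock.unlock();
    }
}
```

Since every thread acquires ranked locks in strictly increasing order, no cycle of threads waiting on one another can form. Re-acquiring a lock the thread already holds is also rejected here, a deliberate simplification.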

    class EventQueue {
      public region Listeners;
      public region Normal;
      public region Priority;

      private final unshared List listeners in Listeners { Instance in Instance };
      private final unshared List normal in Normal { Instance in Instance };
      private final unshared List high in Priority { Instance in Instance };
      private int numNormal in Normal;
      private int numHigh in Priority;

      lock listeners protects Listeners;
      lock normal protects Normal;
      lock high protects Priority;
      sync high before normal;

      public EventQueue() reads nothing writes nothing { /* ... */ }

      public void addEQListener( final EQListener l ) reads nothing writes Listeners {
        synchronized( listeners ) {
          listeners.add( l );
        }
      }
    // Continued

      private void fireEQEvent( final Object o ) reads nothing writes All {
        final EQEvent evt = new EQEvent( this, o );
        List copy;
        synchronized( listeners ) {
          copy = (List)((ArrayList)listeners).clone();
        }
        for( int i = 0; i < copy.size(); i++ ) {
          final EQListener l = (EQListener)copy.get( i );
          l.dequeued( evt );
        }
      }

      public int getSize() reads Normal, Priority writes nothing {
        synchronized( high ) {
          synchronized( normal ) {
            return numNormal + numHigh;
          }
        }
      }

      private Object tryGetPriority() reads nothing writes Priority {
        Object o = null;
        synchronized( high ) {
          if( numHigh > 0 ) {
            o = high.remove( 0 );
            numHigh -= 1;
          }
        }
        return o;
      }
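A runnable plain-Java approximation of the split-lock queue follows. The `region`, `lock ... protects`, and effects clauses become comments, the listener part is omitted, and `enqueueNormal`/`enqueueHigh` are hypothetical helpers standing in for code the slides elide. As in `getSize()`, any method that needs both locks takes `high` before `normal`.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of the split-lock EventQueue; annotations appear as comments. */
final class SplitLockQueue {
    // region Normal, protected by lock `normal`
    private final List<Object> normal = new ArrayList<>();
    private int numNormal;
    // region Priority, protected by lock `high`
    private final List<Object> high = new ArrayList<>();
    private int numHigh;
    // Lock order: high before normal.

    /** Hypothetical helper; writes Normal. */
    void enqueueNormal(Object o) {
        synchronized (normal) {
            normal.add(o);
            numNormal += 1;
        }
    }

    /** Hypothetical helper; writes Priority. */
    void enqueueHigh(Object o) {
        synchronized (high) {
            high.add(o);
            numHigh += 1;
        }
    }

    /** reads Normal, Priority: takes both locks in the declared order. */
    int getSize() {
        synchronized (high) {
            synchronized (normal) {
                return numNormal + numHigh;
            }
        }
    }

    /** writes Priority; returns null if there is no high-priority element. */
    Object tryGetPriority() {
        synchronized (high) {
            if (numHigh > 0) {
                numHigh -= 1;
                return high.remove(0);
            }
        }
        return null;
    }

    /** writes Normal; returns null if there is no normal element. */
    Object tryGetNormal() {
        synchronized (normal) {
            if (numNormal > 0) {
                numNormal -= 1;
                return normal.remove(0);
            }
        }
        return null;
    }
}
```

Producers of normal and high-priority events now contend only on their own lock, which is the concurrency gain the split is meant to buy.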

Case Study Summary
• The code improvements are routine, but risky
  • Motivated for good reasons …
  • Each entails many small changes …
  • Any change, improperly executed, can create new vulnerabilities
• Much can be done with annotation and manipulation
  • Enabling ongoing assurance with tool support
• For threading
  • Manipulations: Shrink lock, Split/Merge locks, etc.
  • Annotations: Locks and regions, Lock order, Lock variables, Effects, etc.
  • Analyses: Effects, etc.
• Issue: What portion of this activity is “tool feasible”?
  • Interactive tool (manipulation, analysis, annotation)
  • Programmer guidance

Four-A Hypotheses
• In evolving Java systems, semantics-based annotation and analysis techniques can provide a component-based approach to assuring a useful range of safety and tolerance properties.
  • Many code-safety properties can be made composable on the basis of added specifications for “mechanical” properties
    • Thread safety and race conditions
    • Array bounds, exceptions, extended type safety, null references, etc.
  • Annotations and analysis provide a mechanism
    • Effects. Unique references. Uses limitations.
    • Regions for effects, locks.
    • Cf. Extended Static Checking (ESC)
• The safety risks of complex restructuring tasks can be reduced through the use of systematic manipulations
  • Administrative structural changes
    • Boundary movement. Hierarchy restructuring.
    • Representation change.
  • Performance improvements
    • Lock shrink/split. Inlining.
  • Robustness improvements
    • Method harmonization

Four-A Hypotheses
• Manipulations can improve software with respect to safety, tolerance, and robustness properties
  • Examples
    • Introduce redundancies
    • Insert/remove audits, checks, logging
    • Insert techniques for graceful degradation
• The annotation, manipulation, and analysis techniques can be supported in Java-based tools
  • 99% Java
  • Basis for experimentation and evaluation
  • Usable and adoptable
• These techniques can be combined to better support the iterative development of intrusion-tolerant systems

[ Preliminary JDK Census results ]


Four-A Premises
• Work from the code level through design toward specification
  • Why: Code as ground truth. Snapshot problem.
  • Why: Legacy code. Exploit and improve partial specs.
  • Why: Manage detail design.
• Use partial information about components in a system
  • Why: Trade secrets (COTS). Security. Distributed development.
  • Cf. whole-program analysis
• Rely on encapsulation, type safety, composable properties
  • Java, (modified) beans, etc.
  • Why: Scalability. Partial information. Manipulation soundness.
• Focus on administrative change in routine SWE
  • Why: Appropriate roles for programmers and tools. Adoptability.
  • Why: Tune for performance, security, robustness

Four-A Technologies (Adaptation, Analysis, Annotation, Accounting)
• Semantics-based program manipulation
  • Source-code and design level
  • Structural manipulations
  • Run-time manipulations
  • Meta-manipulations
• Analysis and models
  • OO effects, mutability, uniqueness, aliasing, uses, . . .
• Annotation and specification
  • Mechanical properties
• Tools for assured adaptation of Java components
• Information loss and chain of evidence
  • Use of audit data

Systematic Software Adaptation
Routine software structural evolution
• Examples:
  • API change
  • Data representation change
  • Class hierarchy restructuring
  • Signature change
  • Introduce self-adaptation
  • Mobility
  • Encapsulation
  • Split into phases / stages
  • Cloning to produce specialized variants
  • Merging of related functions
  • Replication for robustness
  • Threading changes
Provide tool support for these operations
• With predictable impact on functional and mechanical program properties

Assured Software Change
Structural change in practice
• Costly
  • Changes can be distributed throughout a system.
  • Complex analysis (program understanding) is required.
• Risky
  • Invariants and specifications are not present.
  • Many code elements may need to be changed.
  • Code elements may be inaccessible for analysis or change.
• Avoided
  • Why are we stuck with bad structural design decisions?
    • Decisions are made early
    • Consequences are understood late
    • They often start wrong and stay wrong
  • Why do we tolerate brittleness?
    • Code rot = persistence of abstractions beyond their time.
  • Why do commercial APIs accrete?
  • Why does ad hoc code persist?
  • Why is it so costly to navigate structural trade-offs?
    • Revise interface and component structure
    • Trade off generality and performance

Assured Software Change
Structural change in practice
• Costly
• Risky
• Avoided
• Necessary
  • Structural change enables functional change
    • Localize/encapsulate related software elements
    • Sustain compatibility with evolving APIs
    • Address performance issues
  • Structural change enables code management
    • Code rot = persistence of abstractions beyond their time.
    • Create views to support programming aspects
      • Cf. AOP. SOP. N-Dim.
    • Navigate structural trade-offs during design/evolution
    • Support iterative software processes

Example: Move Field
Move field f from class C to class A.
Checks
• C is a descendant of A.
• If A is an interface, f must be public static final.
• Shadowing: A and B have no uses of an ancestral f.
• Unshadowing: no f field in B (it would capture C’s uses of f).
• D (and other siblings) have no uses of f.
• Initializer code can be reordered, by field type.
  • Reordering is acceptable for interleaved constructor and field code.
Actions
• Adjust access tags
• Handle special cases
Caveats
• Visibility in D and other siblings
• Visibility in C’s subclasses
• Promises introduced
• Changes in binary compatibility
[Figure: class hierarchy with classes A, B, C, D; field f; methods foo and bar]
Programmers can do this using drag-and-drop.
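Some of these checks can be approximated with Java reflection, assuming compiled classes are available. The hypothetical `MoveFieldCheck` below tests that C is a proper descendant of A, that a field moved into an interface is `public static final`, and that neither A nor any class between C and A on the superclass chain already declares a field named f (the unshadowing check). The use-based checks (shadowing, uses in siblings such as D) need source-level analysis and are not covered.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

/** Hypothetical precondition check for "move field f from class C up to ancestor A". */
final class MoveFieldCheck {
    private MoveFieldCheck() {}

    static boolean canMove(Field f, Class<?> target) {
        Class<?> source = f.getDeclaringClass();
        // C must be a proper descendant of A.
        if (source == target || !target.isAssignableFrom(source)) {
            return false;
        }
        // An interface can only hold public static final fields.
        if (target.isInterface()) {
            int mods = f.getModifiers();
            if (!(Modifier.isPublic(mods) && Modifier.isStatic(mods) && Modifier.isFinal(mods))) {
                return false;
            }
        }
        // A must not already declare f.
        if (declaresField(target, f.getName())) {
            return false;
        }
        // Unshadowing: no class strictly between C and A may declare f, or it would
        // capture C's uses of f after the move. Only the superclass chain is walked.
        for (Class<?> k = source.getSuperclass(); k != null && k != target; k = k.getSuperclass()) {
            if (declaresField(k, f.getName())) {
                return false;
            }
        }
        return true;
    }

    private static boolean declaresField(Class<?> k, String name) {
        for (Field existing : k.getDeclaredFields()) {
            if (existing.getName().equals(name)) {
                return true;
            }
        }
        return false;
    }
}
```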

Example: Rename Method
Rename methods m from oldName to newName.
Checks
• Methods called at a call site for oldName() or newName() are unchanged
  • Bindings: call sites used to dispatch to unchanged methods in the override group
  • Name conflict: call sites now dispatch to methods in a previously existing override group
• Uses checks and annotations to assure binary compatibility
Actions
• Rename methods
• Rename proven call sites
• Name checks/maps for dynamic sites/classes
Caveats
• Deletion from override and overload groups for A.oldName()
• Addition to override and overload groups for newName()
• Promises introduced
• Changes in binary compatibility (modulo uses annotations)
[Figure: classes A, B, C, D with methods m named oldName(), renamed to newName()]
Programmers could do this with a simple gesture.
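The name-conflict check can be approximated with reflection for compiled classes. The hypothetical `RenameMethodCheck` below lists the classes on the superclass chain of a given class that already declare a method `newName` with the same parameter types; a non-empty result means renamed call sites could bind into a previously existing override group. It does not see subclasses, interfaces, or call sites, which require whole-hierarchy and source-level information.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical name-conflict check for "rename method oldName to newName". */
final class RenameMethodCheck {
    private RenameMethodCheck() {}

    /**
     * Returns the names of classes on the superclass chain of {@code start} (inclusive)
     * that already declare {@code newName} with exactly the given parameter types.
     */
    static List<String> conflicts(Class<?> start, String newName, Class<?>... params) {
        List<String> hits = new ArrayList<>();
        for (Class<?> k = start; k != null; k = k.getSuperclass()) {
            for (Method m : k.getDeclaredMethods()) {
                if (m.getName().equals(newName) && Arrays.equals(m.getParameterTypes(), params)) {
                    hits.add(k.getName());
                    break; // one hit per class is enough
                }
            }
        }
        return hits;
    }
}
```

Matching on exact parameter types mirrors Java overriding: a same-named method with a different signature joins the overload group, not the override group.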

Manipulations
• Manipulations enable systematic structural change
  • Trade off generality and performance
    • Sacrifice (or introduce) abstractions
    • Reorganize component boundaries
  • Introduce or adjust run-time (later-stage) manipulations
    • Managed self-adaptivity
• Manipulations are idiomatic program evolution steps
  • Precise expression of “patterns of evolution” or “refactorings”
  • Enable rapid/dynamic structural change (fluid programming)
  • Enable model-based programming (analytic views)
• Tool role
  • Programmer: design intent, exploration of structural options
  • Tool: mechanical details, soundness, design record

Manipulation Techniques (Examples, 1)
• Boundary movement (ISAW’98)
  • Code relocation (expression, statement, method, class)
  • Abstract/unfold (method, variable, class)
  • Clone (class, method, etc.)
• Frequency change
  • Pass separation
  • Tabulation/closure
• Data representation change (ESOP’98)
  • Shift
  • Idempotency, Projection
  • Destructive operations
• Hierarchy restructuring
  • Hoist
  • Insert
  • Split/clone

Manipulation Techniques (Examples, 2)
• Staging, specialization, splitting
  • (Partial evaluation)
• Merging and generalization
  • Pass separation
• Thread management
  • Shrink, Split, Merge
  • Insert, Remove
• Self-adaptation
  • Meta-manipulation
  • Polyvariance and domain-tolerance
• Integrity
  • Replication
  • Redundant checks

Specifications for Mechanical Properties
• Manipulations require analyses
  • Example
    • Manipulation: Reorder code
    • Analyses: Effects, aliasing (may-equal and uniqueness), uses.
• At scale:
  • Development is distributed/collaborative.
  • Functional specifications (and source code) may be lacking.
  • Programs are dynamically linked, mobile, etc.
• Analyses for manipulation
  • Composable: whole-program analyses are infeasible
  • Goal-directed: compiler analyses are “opportunistic”
• Analyses require mechanical assertions
  • Annotations (promises) about components and their elements

Properties Specified by Assertions
• Mechanical properties specified (examples)
  • Read/write effects in OO systems
    • Enable reordering
    • Use aliasing and uniqueness information (ECOOP’99)
    • Region designation
  • Unique references
    • Tolerate temporary loss of uniqueness (borrowed)
  • Structure declarations
    • Precise control over uses
  • Mutability
• Promises as a currency of flexibility (ICSE’98)
  • Promises change less frequently than code
  • Tools identify potential promises
  • Programmer chooses which to offer clients
  • Programmer can request specific promises
  • Tool manages dependency and validation information

Effects Analysis for Manipulation
Manipulation example
Goal: move statement C; ahead of A; B; (A; B; C; becomes C; A; B;)
1. Compute sets of effects for each of A;, B;, and C;
2. Test for interference among computations: for the pairs (A;, C;) and (B;, C;)
Analyses
1. What are the effects for a given computation?
2. Do two (or more) given targets overlap?
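The two analyses can be sketched in Java under simplifying assumptions: regions form a tree (as with `Normal` and `Priority` inside `Instance` in the EventQueue example), two regions overlap when one contains the other, and a computation's effects are just the sets of regions it reads and writes. Aliasing, which the real analysis uses to sharpen targets, is ignored. `EffectsCheck` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: region tree, read/write effects, and an interference test. */
final class EffectsCheck {
    /** The regions a computation reads and writes. */
    static final class Effects {
        final Set<String> reads;
        final Set<String> writes;

        Effects(Set<String> reads, Set<String> writes) {
            this.reads = reads;
            this.writes = writes;
        }
    }

    // child region -> enclosing region; a region with no entry is a root
    private final Map<String, String> parent = new HashMap<>();

    void region(String name, String enclosing) {
        parent.put(name, enclosing);
    }

    /** True if {@code outer} is {@code inner} or encloses it. */
    private boolean contains(String outer, String inner) {
        for (String r = inner; r != null; r = parent.get(r)) {
            if (r.equals(outer)) {
                return true;
            }
        }
        return false;
    }

    /** Analysis 2: do two targets overlap? */
    boolean overlap(String a, String b) {
        return contains(a, b) || contains(b, a);
    }

    private boolean anyOverlap(Set<String> as, Set<String> bs) {
        for (String a : as) {
            for (String b : bs) {
                if (overlap(a, b)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Two computations interfere if either writes a region the other reads or writes. */
    boolean interfere(Effects x, Effects y) {
        return anyOverlap(x.writes, y.writes)
                || anyOverlap(x.writes, y.reads)
                || anyOverlap(x.reads, y.writes);
    }

    /** A statement may move ahead of the ones it follows if it interferes with none of them. */
    boolean canMoveBefore(Effects moved, Effects... skipped) {
        for (Effects s : skipped) {
            if (interfere(moved, s)) {
                return false;
            }
        }
        return true;
    }
}
```

Two reads of the same region never interfere, which is why only write/write and read/write pairs are tested.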

Key Ideas: OO Effects (ECOOP’99)
• Source-level analysis of partial programs
  • Do not want, and may not have, the whole program
  • Use annotations on methods as surrogates for components
• Use of regions and aliases to analyze OO programs
  • Encapsulate state of objects in regions to protect programmer abstractions
  • Use aliasing information (may-equal and unique) to improve results
• Programmer-guided source-level manipulation
  • Goal-directed analysis (vs. compile-time opportunistic analysis)

Code Safety: Why Unique Variables?
• Sole access to an object entails certain privileges:
  • Mutations can be performed without regard to the rest of the program (no other read access)
  • Invariants can be maintained without regard to the rest of the program (no other write access)
• Program invariants are ideally
  • Explicit (code readability)
  • Checked (code maintainability)

Uniqueness Examples
• String buffer character array
  • If unique: can be coerced to immutable when the final string is desired.
• Vector internal array
  • If unique: mutations of separate vectors can be reordered.
• Hashtable internal array
  • If unique: one can enforce hashing invariants and rehash without interference.
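The string-buffer case can be illustrated in plain Java, with one caveat: Java has no `unique` modifier, so here uniqueness is maintained by convention (the array never escapes) rather than checked by an analysis. The hypothetical `UniqueCharBuffer.freeze()` hands its array to an immutable `Frozen` view without copying and drops its own reference, which is safe only because no other reference to the array exists.

```java
import java.util.Arrays;

/**
 * Illustrative only: plain Java cannot check uniqueness, so the array below is kept
 * unique by convention (it is never returned or shared while the buffer owns it).
 */
final class UniqueCharBuffer {
    private char[] chars = new char[16]; // assumed unique: the buffer holds the only reference
    private int length;

    void append(char c) {
        if (chars == null) {
            throw new IllegalStateException("buffer already frozen");
        }
        if (length == chars.length) {
            chars = Arrays.copyOf(chars, length * 2);
        }
        chars[length++] = c;
    }

    /** Transfers the unique array to an immutable view without copying it. */
    Frozen freeze() {
        if (chars == null) {
            throw new IllegalStateException("buffer already frozen");
        }
        Frozen result = new Frozen(chars, length);
        chars = null; // give up our reference: the Frozen view now holds the only one
        return result;
    }

    /** Read-only view; safe because no other code can reach the array. */
    static final class Frozen {
        private final char[] chars;
        private final int length;

        private Frozen(char[] chars, int length) {
            this.chars = chars;
            this.length = length;
        }

        int length() {
            return length;
        }

        char charAt(int i) {
            if (i < 0 || i >= length) {
                throw new IndexOutOfBoundsException("index " + i);
            }
            return chars[i];
        }

        @Override
        public String toString() {
            return new String(chars, 0, length);
        }
    }
}
```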

Information Management: The Internal Representation (IR)
Features
• Global name spaces
  • Entities (fluid.ir.IRNode)
  • Attributes (fluid.ir.SlotInfo)
  • Types
• Versioning
  • Several policies
  • Possible at cell level
  • Configurations
• Dependencies
  • Notification
  • Tracking
• Conventional wrappers
  • Attribute patterns: navigable ordered trees, etc.
• Collaboration support
  • Persistence
  • Fine-grained concurrency policy
  • Surrogacy
[Figure: attribute namespace, entity namespace, cell]

The Version Forest
• E.g., 400,000 nodes, 10,000 versions
• Each transition represents a manipulation
[Figure: version tree with labels: initial version; latest release; a growing tip in the tree; (abandoned); latest snapshot; shared manipulations; experimental configuration]
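The lookup implied by a version tree can be sketched as follows, assuming each version records only the attribute values its manipulation changed and inherits everything else from its parent version. `VersionedAttribute` and `Version` are hypothetical names for illustration and do not reproduce the `fluid.ir` API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical versioned attribute: each version stores only the slots it changed,
 * and a lookup walks from a version toward the root of its tree.
 */
final class VersionedAttribute<K, V> {
    /** A node in the version tree, created by one manipulation. */
    static final class Version {
        final Version parent;

        Version(Version parent) {
            this.parent = parent;
        }

        /** Starts a new version (branch) from this one. */
        Version derive() {
            return new Version(this);
        }
    }

    // For each version, the slot values that version changed.
    private final Map<Version, Map<K, V>> changes = new HashMap<>();

    void set(Version v, K node, V value) {
        changes.computeIfAbsent(v, x -> new HashMap<>()).put(node, value);
    }

    /** Value of {@code node} as seen in version v, or null if never set on v's path. */
    V get(Version v, K node) {
        for (Version cur = v; cur != null; cur = cur.parent) {
            Map<K, V> delta = changes.get(cur);
            if (delta != null && delta.containsKey(node)) {
                return delta.get(node);
            }
        }
        return null;
    }
}
```

Sibling branches share their common history without copying it, so an abandoned experiment costs only its own changes. Lookup time grows with version depth, which a store at the scale quoted above would need to address.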

[ demo ]


Four-A Schedule
• Year 1
  • Tool infrastructure
    • 99% Java, analysis, annotation, adaptation, accounting
  • Analysis algorithms (uniqueness, effects, mayEqual, etc.)
  • Demonstrate preservation of assurance properties through change
    • Manipulations for threading
    • Case studies for thread safety and patterns
• Year 2
  • Class-level structural manipulations
  • Management of uses information
  • Exploitation of aliasing annotations to assure code-safety properties
  • Threading annotations and analyses
  • Design record to support assurance information
• Year 3
  • Manipulation library for improvement of code safety
    • Prevent → Detect → Tolerate
  • Large-scale manipulation through analytic views
  • Tool-based case study based on intrusion scenarios

Recent Accomplishments
• Four-A tool prototype
  • Supports non-local manipulations
• Annotations and analyses
  • Unique. MayEqual.
• Support for evolving multithreaded code safely
  • Manipulations (preliminary form)
  • Annotations
• Software engineering baseline
  • Evolution census: JDK changes: source code, logs

Transition
• Build on mainstream commercial technologies
  • Java, beans, etc.
• Build on existing infrastructure
  • Tool (developed by our team) for Java analysis, manipulation, engineering process, design information management.
  • Platform (UI, IM, VM, syntax) is also usable for other languages.
• Usability/adoptability a priority from the outset
  • Enable experimentation/studies without high adoption cost
  • E.g., gesture-based interface where possible
• Conduct engineering baseline analyses
  • What are the code-level vulnerabilities being exploited?
  • What kinds of changes are routinely made in commercial APIs?
  • What is the impact of those changes on code safety?
