272: Software Engineering Fall 2008

Instructor: Tevfik Bultan

Lecture 15: Interface Extraction

Software Interfaces
  • Here are some basic questions about software interfaces
    • How to specify software interfaces?
    • How to check conformance to software interfaces?
    • How to extract software interfaces from existing software?
    • How to compose software interfaces?
  • Today we will talk about some research that addresses these questions
Software Interfaces
  • In this lecture we will talk about interface extraction for software components
  • The interface of a software component should answer the following question:
    • What is the correct way to interact with this component?
    • Equivalently, what are the constraints imposed on other components that wish to interact with this component?
  • Interface descriptions in common programming languages are not very informative
    • Typically, an interface of a component would be a set of procedures with their names and with the argument and return types
Software Interfaces
  • Let’s think about an object oriented programming language
    • You interact with an object by sending it a message (which means calling a method of that object)
    • What do you need to know to call a method?
      • The name of the method and the types of its arguments
  • What are the constraints on interacting with an object?
    • You need a reference to the object
    • You have to have access (public, protected, private) to the method that you are calling
  • One may want to express other kinds of constraints on software interfaces
    • It is common to have constraints on the order in which a component’s methods can be called
      • For example: a call to the consume method is allowed only after a call to the produce method
    • How can we specify software interfaces that can express such constraints?
Software Interfaces
  • Note that object oriented programming languages enforce one simple constraint about the order of method executions:
    • The constructor of the object must be executed before any other method can be executed
    • This rule is very static: it is true for every object of every class in every execution
  • We want to express restrictions on the order of the method executions
    • We want a flexible and general way of specifying such constraints
Software Interfaces
  • First, I will talk about the following paper
    • “Automatic Extraction of Object-Oriented Component Interfaces,” J. Whaley, M. C. Martin, and M. S. Lam, Proceedings of the International Symposium on Software Testing and Analysis (ISSTA), July 2002.
    • The following slides are based on the above paper and the slides from Whaley’s webpage
Automatic Interface Extraction
  • The basic idea is to extract the interface from the software automatically
    • Interface is not written as a separate specification
  • There is no possibility of inconsistency between the interface specification and the code since the interface specification is extracted from the code
  • The extracted interface can be used for dynamic or static analysis of the software
  • It can be helpful as a reverse engineering tool
What are Software Interfaces
  • In the scope of the work by Whaley et al., interfaces are constraints on the orderings of method calls
  • For example:
    • method m1 can be called only after a call to method m2
    • both methods m1 and m2 have to be called before method m3 is called
How to Specify the Orderings
  • Use a Finite State Machine (FSM) to express ordering constraints
  • States correspond to methods
  • Transitions imply the ordering constraints

[Figure: FSM fragment with a transition from state M1 to state M2. Method M2 can be called after method M1 is called.]

Example: File
  • There are two special states Start and End indicating the start and end of execution

[Figure: FSM for a file with states START, open, read, write, close, and END.]

A Simple OO Component Model
  • Each object follows an FSM model.
  • One state per method, plus START & END states.
  • Method call causes a transition to a new state.

[Figure: the file FSM (START, open, read, write, close, END) next to a fragment with states m1 and m2. The call sequence m1(); m2() is legal, and the new state is m2.]
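
The last-call model above can be sketched in a few lines of Java. This is a minimal illustration, not code from the paper: the class name `LastCallFsm` and the edge set in `fileExample()` are assumptions, since the exact edges of the file figure are not recoverable from the transcript.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of the last-call model: the FSM state is the name of the last method called. */
final class LastCallFsm {
    static final String START = "START";
    static final String END = "END";

    // Each allowed transition is stored as the string "from->to".
    private final Set<String> transitions = new HashSet<>();

    void allow(String from, String to) {
        transitions.add(from + "->" + to);
    }

    /** Returns true if the call sequence can be followed from START to END. */
    boolean accepts(List<String> calls) {
        String state = START;
        for (String call : calls) {
            if (!transitions.contains(state + "->" + call)) {
                return false;
            }
            state = call; // a method call moves the FSM to the state named after it
        }
        return transitions.contains(state + "->" + END);
    }

    /** Hypothetical file interface; this edge set is an assumption. */
    static LastCallFsm fileExample() {
        LastCallFsm fsm = new LastCallFsm();
        fsm.allow(START, "open");
        for (String from : Arrays.asList("open", "read", "write")) {
            for (String to : Arrays.asList("read", "write", "close")) {
                fsm.allow(from, to);
            }
        }
        fsm.allow("close", END);
        return fsm;
    }

    public static void main(String[] args) {
        LastCallFsm fsm = fileExample();
        System.out.println(fsm.accepts(Arrays.asList("open", "read", "write", "close")));
        System.out.println(fsm.accepts(Arrays.asList("read", "close")));
    }
}
```

With the assumed edges, the first sequence is accepted and the second is rejected because read is not reachable directly from START.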

The Interface Model
  • Note that this is a very simple model
  • It only remembers what the last called method is
  • There is no differentiation between different invocations of the same method
  • This simple model reduces the number of possible states
  • Obviously, not all orderings can be expressed this way
Adding more precision
  • With the above model we cannot express constraints such as
    • Method m1 has to be called twice before method m2 can be called
  • We can add more precision by remembering the last k method calls
    • If we have n methods this will create n^k states in the FSM
  • Whaley et al. suggest other ways of improving the precision without getting this exponential blow up
The Interface Model
  • If the only state information is the name of the last method called, in which situations is this information not precise enough?
  • Problem 1: Assume that there are two independent sequences of methods that can be interleaved arbitrarily
    • once we call a method from one of the sequences we will lose the information about the other sequence
  • Problem 2: Assume that there is a method which can be followed by all the other methods
    • then once we get to that method any following behavior is possible independent of the previous calls
Problem 1
  • Consider the following scenario
    • An object has two fields, a and b
    • There are four methods set_a(), get_a(), set_b(), get_b()
    • Each field must be set before being read
  • We would like to have an interface specification that specifies the above constraints
    • Can we build an FSM that corresponds to these constraints?
Problem 1
  • This kind of constraint creates a problem because once we call the set_a method it is possible to go to any other method
    • FSM does not remember the history of the method calls
    • FSM only keeps track of the last method call
  • Solution: Use one FSM for each field and take their product

[Figure: single FSM with states START, set_a, get_a, set_b, get_b, and END. This FSM allows the sequence start; set_a(); get_b(); even though field b has not been set.]

Splitting by fields
  • Separate the constraints about different fields into different, independent constraints:
    • Use multiple FSMs executing concurrently (or use a product FSM)

[Figure: the single FSM over set_a, get_a, set_b, and get_b (labeled "Imprecise") next to two separate FSMs, one with states START, set_a, get_a, END and one with states START, set_b, get_b, END (labeled "Adds more precision").]

Product FSM
  • The product FSM does not allow the following sequence: start; set_a(); get_b();
  • There is a transition from each state to the END state

[Figure: product FSM with states START, (set_a,start), (start,set_b), (set_a,set_b), (get_a,start), (start,get_b), (get_a,set_b), (set_a,get_b), (get_a,get_b), and END.]
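
Running one FSM per field side by side gives the same effect as the product FSM without building the product states explicitly. The sketch below assumes that each field has one setter and one getter and that "set before get" is the only constraint; the class names `PerFieldFsms` and `Submodel` are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch: one last-call FSM per field, all advanced concurrently. */
final class PerFieldFsms {
    /** One submodel: the methods that touch one field and the allowed transitions. */
    static final class Submodel {
        final Set<String> methods = new HashSet<>();
        final Set<String> transitions = new HashSet<>();
        String state = "START";

        Submodel(String setter, String getter) {
            methods.addAll(Arrays.asList(setter, getter));
            // The field must be set before it is read.
            transitions.add("START->" + setter);
            transitions.add(setter + "->" + setter);
            transitions.add(setter + "->" + getter);
            transitions.add(getter + "->" + getter);
            transitions.add(getter + "->" + setter);
        }
    }

    /** Returns true if every submodel accepts its own projection of the call sequence. */
    static boolean accepts(List<String> calls) {
        List<Submodel> submodels = Arrays.asList(
                new Submodel("set_a", "get_a"), new Submodel("set_b", "get_b"));
        for (String call : calls) {
            for (Submodel m : submodels) {
                if (!m.methods.contains(call)) {
                    continue; // a call on another field leaves this submodel's state alone
                }
                if (!m.transitions.contains(m.state + "->" + call)) {
                    return false;
                }
                m.state = call;
            }
        }
        return true; // every state has a transition to END
    }

    public static void main(String[] args) {
        System.out.println(accepts(Arrays.asList("set_a", "get_b")));
        System.out.println(accepts(Arrays.asList("set_b", "set_a", "get_b", "get_a")));
    }
}
```

The sequence set_a(); get_b() is rejected because the submodel for field b is still in START when get_b arrives, while arbitrary interleavings of the two legal per-field sequences are accepted.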

Product FSM
  • The product FSM has more states than the FSM which just remembers the last call
  • Assume that there are n1 methods for field 1 and n2 methods for field 2
    • simple FSM: n1 + n2 states
    • product FSM: n1 × n2 states
  • Note that the number of states in the product FSM will be exponential in the number of fields
Problem 2
  • It is common to have methods which are used to query the state of an object
  • These methods do not change the state of the object
  • After such state-preserving methods all other methods can be called
    • Calling a state preserving method does not change the state of the object
    • If a method can be called before a call to a state preserving method, then it can be called after the call to the state preserving method
    • Since the only information we keep in the FSM is the last method call, if there exists an object state where a method can be called, then that method can also be called after a call to a state-preserving method
Problem 2
  • getFileDescriptor is state-preserving
  • Once getFileDescriptor is called then any behavior becomes possible
  • The FSM for Socket allows the sequence:

start; getFileDescriptor(); connect();

  • Solution
    • distinguish between state-modifying and state-preserving methods
    • Calls to state-preserving methods do not change the state of the FSM

[Figure: FSM for Socket with states START, create, connect, getFileDescriptor, close, and END.]

State-preserving methods
  • Calls to state-preserving methods do not change the state of the FSM
  • m1 is state-modifying, m2 is state-preserving: m1(); m2() is legal, new state is m1

[Figure: FSM for Socket with states START, create, connect, getFileDescriptor, close, and END.]
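
A small Java sketch of this refinement follows. The edges between state-modifying methods are assumptions made for illustration, and, as a further simplification, the state-preserving method is allowed in every state; `SocketModelSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch: state-preserving calls are skipped when updating the FSM state. */
final class SocketModelSketch {
    // Assumed classification: getFileDescriptor only queries the object.
    private static final Set<String> STATE_PRESERVING =
            new HashSet<>(Arrays.asList("getFileDescriptor"));

    // Assumed edges between state-modifying methods.
    private static final Set<String> EDGES = new HashSet<>(
            Arrays.asList("START->create", "create->connect", "connect->close"));

    /** Returns the FSM state after the calls, or null if some call is not allowed. */
    static String stateAfter(List<String> calls) {
        String state = "START";
        for (String call : calls) {
            if (STATE_PRESERVING.contains(call)) {
                continue; // state-preserving call: the FSM stays where it is
            }
            if (!EDGES.contains(state + "->" + call)) {
                return null;
            }
            state = call;
        }
        return state;
    }

    public static void main(String[] args) {
        System.out.println(stateAfter(Arrays.asList("create", "getFileDescriptor")));
        System.out.println(stateAfter(Arrays.asList("getFileDescriptor", "connect")));
    }
}
```

Under these assumptions the state after create(); getFileDescriptor() is still create, and the sequence getFileDescriptor(); connect() from START is rejected because the query no longer hides the fact that the FSM is still in START.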

Summary of Model
  • Product of FSMs
    • Per-thread, per-instance
  • One submodel per field
    • Use static analysis to find the methods that either read the value of the field or modify the value of the field.
      • Identifies the methods that belong to a submodel
        • The methods that read and write to a field will be in the FSM for that field
      • Separates state-modifying and state-preserving methods.
  • One submodel per Java interface
    • Implementation not required
Static Model Extraction
  • Static model extraction relies on defensive programming style
  • Programmers generally put checks in the code that will throw exceptions in case the methods are not used in the correct order
  • Such checks implicitly encode the software interface
  • The static extraction algorithm infers the method orderings from these checks that come from defensive programming
Static Model Extractor
  • Defensive programming
    • Implementation throws exceptions (user or system defined) on illegal input.

    public void connect() {
        connection = new Socket();
    }

    public void read() throws IOException {
        // defensive check: read() is illegal before connect()
        if (connection == null)
            throw new IOException();
    }

[Figure: FSM with states START, connect, and read, annotated with the field connection.]

Extracting Interface Statically
  • The static algorithm has two main steps
    • For each method m identify those fields and predicates that guard whether exceptions can be thrown
    • Find the methods m’ that set those fields to values that can cause the exception
      • This means that immediate transitions from m’ to m are illegal
      • Complement of the illegal transitions forms the model of transitions accepted by the static analysis
Detecting Illegal Transitions
  • Only support simple predicates
    • Comparisons with constants, null pointer checks
  • The goal is to find method pairs <source, target> such that:
    • Source method executes:
      • field = const ;
    • Target method executes:
      • if (field == const)throw exception;
Algorithm
  • How to find the target method: Control dependence
    • Find the predicates on which the throwing of an exception is control dependent
      • This can be done by computing the control dependence information for each method
    • For each exception check if the predicate guarding its execution (i.e., the predicate that it is control dependent on) is
      • a single comparison between a field of the current object and a constant value
      • the field is not written in the current method before it is tested
    • Such fields are marked as state variables
Algorithm
  • The second step looks for methods which assign constant values to state variables
  • How to find the source method: Constant propagation
    • Does a method always set a field to a constant value at its exit?
  • If we find such a method and see that
    • that constant value satisfies the predicate that guards an exception in another method
      • then this means that we found an illegal transition
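
The pairing of source and target methods can be sketched as follows. This is not the bytecode analysis itself: the per-method summaries are written by hand here and stand in for the results of the control dependence and constant propagation steps. The summaries in `listItrSummaries()` are assumptions modelled on a lastRet-style state variable; all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: derive illegal <source, target> pairs from hand-written method summaries. */
final class IllegalTransitionSketch {
    static final class Summary {
        final String name;
        // field -> constant the method always assigns by its exit (assumed analysis result)
        final Map<String, String> exitConstants = new LinkedHashMap<>();
        // field -> constant for which the method throws (assumed analysis result)
        final Map<String, String> throwsWhen = new LinkedHashMap<>();

        Summary(String name) { this.name = name; }

        Summary setsAtExit(String field, String constant) {
            exitConstants.put(field, constant);
            return this;
        }

        Summary throwsIf(String field, String constant) {
            throwsWhen.put(field, constant);
            return this;
        }
    }

    /** Returns pairs "source->target" where target would throw right after source. */
    static List<String> illegalTransitions(List<Summary> methods) {
        List<String> illegal = new ArrayList<>();
        for (Summary source : methods) {
            for (Summary target : methods) {
                for (Map.Entry<String, String> guard : target.throwsWhen.entrySet()) {
                    String atExit = source.exitConstants.get(guard.getKey());
                    if (guard.getValue().equals(atExit)) {
                        illegal.add(source.name + "->" + target.name);
                        break;
                    }
                }
            }
        }
        return illegal;
    }

    /** Assumed summaries for an iterator whose state variable is lastRet. */
    static List<Summary> listItrSummaries() {
        return Arrays.asList(
                new Summary("START").setsAtExit("lastRet", "-1"), // constructor
                new Summary("next"),                              // assigns a non-constant index
                new Summary("set").throwsIf("lastRet", "-1"),
                new Summary("add").setsAtExit("lastRet", "-1"),
                new Summary("remove").throwsIf("lastRet", "-1").setsAtExit("lastRet", "-1"));
    }

    public static void main(String[] args) {
        System.out.println(illegalTransitions(listItrSummaries()));
    }
}
```

The interface model is then the complement: every transition between the listed methods that is not reported as illegal.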
Sidenote: Control Dependence
  • A statement S in the program is control dependent on a predicate P (an expression that evaluates to true or false) if the evaluation of that predicate at runtime may decide if S will be executed or not
  • For example, in the following program segment

if (x > y) max:=x; else max:=y;

the statements max:=x; and max:=y; are control dependent on the predicate (x > y)

  • A common compiler analysis technique is to construct a control dependence graph
    • In a control dependence graph there is an edge from a node n1 to another node n2 if n2 is control dependent on n1
Sidenote: Constant Propagation
  • Constant propagation is a well-known static analysis technique
  • Constant propagation statically determines the expressions in the program which always evaluate to a constant value
  • Example

y:=0; if (x > y) then x:=5; else x:=5+y; z := x*x;

The assigned value to z is the constant 25 and we can determine this statically (at compile time)

  • Constant propagation is used in compilers to optimize the generated code.
    • Constant folding: If an expression is known to have a constant value, it can be replaced with the constant value at compile time preventing the computation of the expression at runtime.
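
The example above can be traced with a very small abstract evaluation in Java. This is a sketch of the idea only, specialized to that one program fragment rather than a general dataflow analysis; `null` is used as the "not a known constant" value, and the parameter `elseConstant` is added here to show a variant where the result is no longer constant.

```java
/** Sketch: constant propagation on  y:=0; if (x > y) then x:=5; else x:=c+y; z := x*x; */
final class ConstantPropagationSketch {
    /** Merge of two abstract values: a constant survives only if both branches agree. */
    static Integer join(Integer a, Integer b) {
        return (a != null && a.equals(b)) ? a : null;
    }

    static Integer add(Integer a, Integer b) {
        return (a == null || b == null) ? null : Integer.valueOf(a + b);
    }

    static Integer mul(Integer a, Integer b) {
        return (a == null || b == null) ? null : Integer.valueOf(a * b);
    }

    /** Returns the constant assigned to z, or null if z is not a known constant. */
    static Integer analyzeZ(int elseConstant) {
        Integer y = 0;                        // y := 0
        Integer xThen = 5;                    // then-branch: x := 5
        Integer xElse = add(elseConstant, y); // else-branch: x := elseConstant + y
        Integer x = join(xThen, xElse);       // merge point after the if
        return mul(x, x);                     // z := x * x
    }

    public static void main(String[] args) {
        System.out.println(analyzeZ(5)); // the program from the slide
        System.out.println(analyzeZ(6)); // hypothetical variant: branches disagree
    }
}
```

For the program on the slide both branches leave x equal to 5, so z is the constant 25; in the variant the branches disagree and nothing can be concluded about z.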
Static Extraction
  • Static analysis of the java.util.AbstractList.ListItr with lastRet field as the state variable
  • The analysis identifies the following transitions as illegal:
    • start → set
    • start → remove
    • remove → set, add → set
    • remove → remove
    • add → remove
  • The interface FSM contains all the remaining transitions
Automatic documentation
  • Interface generated for java.util.AbstractList.ListItr

[Figure: FSM with states START, next,previous, set, add, and remove.]

Dynamic Interface Extractor
  • Goal: find the legal transitions that occur during an execution of the program
  • Java bytecode instrumentation
    • insert code at method entries and exits to track the last-call information
  • For each thread, each instance of a class:
    • Track last state-modifying method for each submodel.
Dynamic Interface Checker
  • Dynamic Interface Checker uses the same mechanism as the dynamic interface extractor
    • When there is a transition which is not in the model
      • instead of adding it to the model
      • it throws an exception
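
The extractor and the checker share one mechanism, which the following sketch makes explicit. It replaces bytecode instrumentation with an explicit `onCall` hook that the caller invokes by hand, and it ignores threads, submodels, and state-preserving methods; `LastCallMonitor` and its methods are hypothetical names.

```java
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

/** Sketch: records last-call transitions per instance, or checks them against a model. */
final class LastCallMonitor {
    private final Set<String> model = new HashSet<>();
    private final Map<Object, String> lastCall = new IdentityHashMap<>();
    private final boolean checking;

    LastCallMonitor(boolean checking) {
        this.checking = checking;
    }

    /** Builds a checker that enforces the transitions learned so far. */
    LastCallMonitor toChecker() {
        LastCallMonitor checker = new LastCallMonitor(true);
        checker.model.addAll(model);
        return checker;
    }

    /** Stand-in for the code that instrumentation would insert at method entry. */
    void onCall(Object instance, String method) {
        String from = lastCall.getOrDefault(instance, "START");
        String edge = from + "->" + method;
        if (checking) {
            if (!model.contains(edge)) {
                throw new IllegalStateException("transition not in model: " + edge);
            }
        } else {
            model.add(edge); // extraction mode: learn the observed transition
        }
        lastCall.put(instance, method);
    }

    Set<String> transitions() {
        return new HashSet<>(model);
    }

    public static void main(String[] args) {
        LastCallMonitor extractor = new LastCallMonitor(false);
        Object file = new Object();
        extractor.onCall(file, "open");
        extractor.onCall(file, "read");
        extractor.onCall(file, "close");
        System.out.println(extractor.transitions().size());
    }
}
```

A model learned from one run can be handed to `toChecker()`; a later run that performs a transition never seen during extraction then fails with an exception instead of extending the model.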
Experiences
  • Whaley et al. applied these techniques to several applications
Automatic documentation
  • An example FSM model that is dynamically generated and provides a specification of the interface

[Figure: J2EE TransactionManager (dynamic). FSM with states START, begin, commit, suspend, rollback, resume, and END.]

Test coverage
  • Dynamically extracted interfaces can be used as a test coverage criterion
  • The transitions that are not present in the interface imply that those method call sequences were not generated by the test cases
  • For example, the fact that there are no self-edges in the FSM in the figure implies that only a max recursion depth of 1 was tested

[Figure: J2EE IIOPOutputStream (dynamic). FSM with states START, increaseRecursionDepth, simpleWriteObject, decreaseRecursionDepth, and END.]

Upper/lower bound of model
  • Statically generated transitions provide an upper approximation of the possible method call sequences
  • Dynamically generated transitions provide a lower approximation of the possible method call sequences

[Figure: SocketImpl model, with transitions marked (dynamic) and additional transitions marked (+static). States: START, create, getFileDescriptor, available/getInputStream/getOutputStream, connect, close, and END.]

Finding API bugs
  • Automated interface extraction can be used to detect bugs
  • The interface extracted from the joeq virtual machine showed unexpected transitions

[Figure: the expected API for jq_Method next to the actual API for jq_Method. States shown include START, load, prepare, setOffset, and compile.]

Summary: Automatic Interface Extraction
  • Product of FSM
    • Model is simple, but useful
  • Static and dynamic analysis techniques
    • Generate upper and lower bounds for the interfaces
  • Useful for:
    • Documentation generation
    • Test coverage
    • Finding API bugs
Automated Interface Extraction, Continued
  • There is more recent work on interface extraction for Java
    • “Synthesis of Interface Specifications for Java Classes,” R. Alur, P. Cerny, P. Madhusudan, W. Nam, in Proceedings of the Symposium on Principles of Programming Languages (POPL 2005).
    • They built a tool called JIST (Java Interface Synthesis Tool).

I will discuss this work in the rest of the lecture.

Java Interface Synthesis Tool (JIST)
  • Here is the problem that JIST is trying to solve:
    • Given a class and a property such as “the exception E should not be raised”
      • generate a behavioral interface specification for the class that corresponds to the most general way of invoking the methods in the class without violating the safety property.
Safe Interface
  • Let E denote the unsafe states of the program (for example an exception is raised)
    • E specifies the safety requirement, i.e., a state satisfying E should not be reached
  • An interface specification for a class is a safe interface with respect to a requirement E
    • if it is guaranteed that the program never reaches the unsafe state E as long as the class is used according to the interface specification
Most Permissive Safe Interface
  • The most permissive safe interface is a safe interface that puts the least amount of restrictions on the users of the class
    • Interface I is more permissive than interface I’, if any call sequence allowed by I’ is also allowed by I
    • If I is the most permissive safe interface, then for any safe interface I’, I is more permissive than I’
  • JIST is guaranteed to find a safe interface but it is not guaranteed to find the most permissive safe interface
Interface Synthesis Steps
  • STEP 1: Abstract the class to a Boolean program using predicate abstraction
    • The predicates are provided by the user
  • STEP 2: Find a winning strategy in a two-player partial information game
    • Player-0 is the user of the class. Player-0 chooses to invoke one of the methods of the class.
    • Player-1, the abstract class, chooses a corresponding possible execution through the abstract state-transition graph which results in an abstract return value.
    • A strategy for Player-0 is winning if the game always stays away from the abstract states satisfying the requirement E (E is provided by the user)
  • The most permissive winning strategy can be represented as a DFA
    • They use the L* algorithm to compute this DFA
    • L* is an algorithm for learning a regular language using membership and equivalence queries
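JIST Architecture

[Figure: JIST tool pipeline. The Java source is compiled by the Java compiler to Java byte code, which Soot converts to Jimple. STEP 1 (abstraction): the Predicate Abstractor takes the Jimple code and the predicates and produces Boolean Jimple, which the Game Language Converter turns into a symbolic class in the NuSMV language. STEP 2 (partial information game solving): the Interface Synthesizer takes the Boolean symbolic class and produces the interface automaton.]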

STEP 1: Predicate Abstraction
  • JIST uses a predicate abstraction technique similar to the one used in the SLAM model checker
  • Predicate abstraction is an automated abstraction technique which can be used to reduce the state space of a program
  • The basic idea in predicate abstraction is to remove some variables from the program and keep only information about a set of predicates over them
  • For example, a predicate such as x = y may be the only information necessary about variables x and y to determine the behavior of the program
    • In that case we can just store a boolean variable which corresponds to the predicate x = y and remove variables x and y from the program
    • Predicate abstraction is a technique for doing such abstractions automatically
Predicate Abstraction
  • Given a program and a set of predicates, predicate abstraction abstracts the program so that only the information about the given predicates is preserved
  • The abstracted program adds nondeterminism since in some cases it may not be possible to figure out what the next value of a predicate will be based on the predicates in the given set
  • One needs an automated theorem prover to compute the abstraction
Predicate Abstraction, Simple Example
  • Assume that we have two integer variables x,y
  • Abstract the program “y := y+1” using a single predicate “x=y”
  • We will represent the predicate “x=y” as the boolean variable B in the abstract program
    • “B=true” will mean “x=y” and “B=false” will mean “x≠y”

Concrete Statement

    y := y + 1

Step 1: Calculate the preconditions

    {x = y + 1}  y := y + 1  {x = y}
    {x ≠ y + 1}  y := y + 1  {x ≠ y}    (precondition for B being false after executing the statement y:=y+1)

Step 2: Use Decision Procedures to determine if the predicates used for abstraction imply any of the preconditions

    x = y → x = y + 1 ? No
    x ≠ y → x = y + 1 ? No
    x = y → x ≠ y + 1 ? Yes
    x ≠ y → x ≠ y + 1 ? No

Step 3: Generate Abstract Code

    IF B THEN B := false
    ELSE B := true | false

(Example taken from Matt Dwyer’s slides)
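
The abstract code from Step 3 can be sanity-checked by brute force over a small range of integers. This is only a bounded check written for illustration, not the decision-procedure step itself, and the range −3..3 is an arbitrary assumption; `PredicateAbstractionCheck` is a hypothetical name.

```java
import java.util.Set;
import java.util.TreeSet;

/** Sketch: enumerate small x, y to see which values B (x == y) can take after y := y + 1. */
final class PredicateAbstractionCheck {
    static Set<Boolean> possibleAfter(boolean bBefore) {
        Set<Boolean> after = new TreeSet<>();
        for (int x = -3; x <= 3; x++) {
            for (int y = -3; y <= 3; y++) {
                if ((x == y) != bBefore) {
                    continue; // keep only concrete states that match the abstract state
                }
                int yNew = y + 1;      // concrete statement: y := y + 1
                after.add(x == yNew);  // value of the predicate afterwards
            }
        }
        return after;
    }

    public static void main(String[] args) {
        System.out.println(possibleAfter(true));
        System.out.println(possibleAfter(false));
    }
}
```

Starting from B = true only false is observed, which matches `IF B THEN B := false`; starting from B = false both values are observed, which is why the abstract program needs the nondeterministic assignment `B := true | false`.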

STEP 1: Predicate Abstraction
  • JIST’s predicate abstraction implementation does not handle the following:
    • Floating point types, arrays, recursive method calls (they handle method calls by inlining), exceptions (other than the one used for the requirement E)
  • They do not use an automated theorem prover since they only handle simple expressions
  • The result of the abstraction step is an Abstract class which only contains boolean variables and is nondeterministic
    • It provides an over-approximation of the behaviors that can be generated by the concrete class
    • I.e., if a call sequence does not reach E in the abstract class then it is guaranteed that it will not reach E in the concrete class
STEP 2: Game Solving
  • Player-0: the user of the abstract class
  • Player-1: the abstract class
  • Game
    • Player-0 chooses a method and calls it
    • Player-1 picks a possible execution for the method that is called (remember that there is non-determinism)
  • Player-0 wins if E is not reached
  • Question: Find the most permissive winning strategy for Player-0
    • The most permissive winning strategy corresponds to the interface for the class
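
One way to picture the game is to ask whether a fixed call sequence is safe no matter how Player-1 resolves the nondeterminism. The sketch below answers that by tracking the set of abstract states that are possible after each call. It is a simplification: it ignores return values, treats a call with no listed successor as unsafe, and uses a made-up abstract class with states "closed" and "opened"; none of this comes from JIST itself.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: is a call sequence safe against every choice of the abstract class? */
final class SafeSequenceSketch {
    static final String ERROR = "ERROR"; // plays the role of the unsafe states E

    // "state.method" -> abstract states Player-1 may choose as the result
    private final Map<String, Set<String>> successors = new HashMap<>();

    void transition(String state, String method, String... targets) {
        successors.put(state + "." + method, new HashSet<>(Arrays.asList(targets)));
    }

    boolean isSafe(String initial, List<String> calls) {
        Set<String> possible = new HashSet<>();
        possible.add(initial);
        for (String call : calls) {
            Set<String> next = new HashSet<>();
            for (String state : possible) {
                Set<String> targets = successors.get(state + "." + call);
                if (targets == null) {
                    return false; // unknown behavior is treated as unsafe
                }
                next.addAll(targets);
            }
            if (next.contains(ERROR)) {
                return false; // Player-1 has an execution that reaches E
            }
            possible = next;
        }
        return true;
    }

    /** Hypothetical abstract class; tryOpen is nondeterministic. */
    static SafeSequenceSketch example() {
        SafeSequenceSketch a = new SafeSequenceSketch();
        a.transition("closed", "open", "opened");
        a.transition("opened", "open", "opened");
        a.transition("closed", "tryOpen", "opened", "closed");
        a.transition("opened", "tryOpen", "opened");
        a.transition("closed", "read", ERROR);
        a.transition("opened", "read", "opened");
        a.transition("closed", "close", "closed");
        a.transition("opened", "close", "closed");
        return a;
    }

    public static void main(String[] args) {
        SafeSequenceSketch a = example();
        System.out.println(a.isSafe("closed", Arrays.asList("open", "read", "close")));
        System.out.println(a.isSafe("closed", Arrays.asList("tryOpen", "read")));
    }
}
```

In this toy class open(); read() is safe, while tryOpen(); read() is not, because Player-1 may pick the execution in which tryOpen leaves the object closed. A safe interface is a set of call sequences that all pass this test.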
Game Solving via Learning
  • Results from game theory show that the winning strategy can be characterized as a DFA
  • JIST uses a learning algorithm called L* to find a winning strategy
  • L* is an algorithm that can compute a DFA by repeatedly asking membership and equivalence queries
    • Membership query: Is this string accepted by the target DFA?
    • Equivalence query: Given a DFA (a guess) is it equal to the target DFA?
      • If the equivalence query returns false, it should also give a counter-example string that is accepted by one of the DFAs but not the other
  • If these two types of queries can be answered, then L* algorithm can compute the target DFA
Implementing Equivalence Queries
  • Let G be the DFA guessed by the learning algorithm and let T be the target DFA
    • Equivalence query: Are the language accepted by G and the language accepted by T equal?
  • The equivalence query can be divided into two separate queries

L(G) = L(T) if and only if L(G) ⊆ L(T) and L(T) ⊆ L(G)

  • They can handle subset queries precisely
    • Membership queries can also be translated to subset queries (generate a DFA that accepts only the input string)
  • They cannot handle superset queries precisely, and because of that they are not guaranteed to compute the most permissive interface
    • However, they always compute a safe interface
Implementing Subset Queries
  • Checking a Subset query means the following
    • The learning algorithm suggests an interface I
    • They compute the composition of this interface I with the abstract class A (A || I)
    • Then they check if A || I satisfies the property AG(¬E) using the model checker NuSMV
      • E is the requirement (and the interface is a safe interface if E never becomes true)
  • If A || I satisfies AG(¬E), then I is a safe interface and hence it accepts a subset of the language accepted by the most permissive interface
    • The answer to the subset query is TRUE
  • If A || I violates AG(¬E), then they generate a counter-example execution which shows that I can lead to violation of property E, i.e., it is not a safe interface
    • The answer to the subset query is FALSE and the counter-example is returned to the learning algorithm
Implementing Superset Queries
  • Checking a Superset query means the following
    • The learning algorithm suggests an interface I
    • I is the superset of the most permissive safe interface
      • if all the call sequences that are not allowed by I lead to some execution of class A which reaches E
    • There is no efficient way of checking this
  • They check the following:
    • If in any call sequence, the first method call that is not allowed by I always reaches E, then the answer to the superset query is TRUE
    • Otherwise, we look at the counter-example call sequence generated by the model checker and check if that call sequence is safe
      • If it is, then the answer to the superset query is FALSE and that call sequence is a counter-example to the superset query
      • If it is not safe, then we do not know the answer to the superset query, but we can still report the interface as a safe interface since it passed the subset query
Experiments
  • The automatically synthesized interfaces for some Java classes:
    • Signature, ServerTableEntry, ListItr, PipedOutputStream
  • The computation time is 5 to 100 seconds
  • In 4 out of 6 cases they found the most permissive interface

[Figure: Signature class interface. Automaton with states s0, s1, and s2 and transitions labeled initSign, initVerify, update, sign, and verify.]