Inferring Specifications


Inferring Specifications

A kind of review



The Problem

  • Most programs do not have specifications

  • Those that do often fail to preserve the consistency between specification and implementation

  • Specifications are needed for verification, testing, and maintenance



Suggested Solution

  • Automatic discovery of specifications



Our Playground

  • Purpose

    • Verification, testing, promoting understanding

  • Specification representation

    • Contracts, properties, automata, …

  • Inference technique

    • Static, dynamic, combination

  • Human intervention



Restrictions and Assumptions

  • Learning automata from positive traces alone is impossible [Gold67]

  • An executing program is usually “almost” correct

  • If a miner can identify the common behavior, it can produce a correct specification, even from programs that contain errors



Perracotta: Mining Temporal API Rules from Imperfect Traces

Jinlin Yang and David Evans, Department of Computer Science, University of Virginia

Deepali Bhardwaj, Thirumalesh Bhat, and Manuvir Das, Center for Software Excellence, Microsoft Corp.

ICSE ‘06



Key Contribution

  • Addressing the problem of imperfect traces

  • Techniques for incorporating contextual information into the inference algorithm

  • Heuristics for automatically identifying interesting properties



Perracotta

  • A dynamic analysis tool for automatically inferring temporal properties

  • Takes the program's execution traces as input and outputs a set of temporal properties the program likely satisfies

[Architecture: the Program and a set of Property Templates go through Instrumentation to produce an Instrumented Program; Testing it against a Test Suite yields Execution Traces; Inference over those traces produces the Inferred Properties]



Property Templates



Initial Approach

  • The algorithm only infers two-event properties, for scalability

  • Complexity: O(nL) time, O(n²) space

    • n – number of distinct events

    • L – length of trace

  • Each cell in the matrix holds the current state of a state machine that tracks the alternating pattern between the pair of events

  • Requires perfect traces
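
A minimal Python sketch of the pair-matrix idea (illustrative, not Perracotta's actual implementation): one pass over the trace updates, for each event, only the state machines in which that event participates, which gives the O(nL) time and O(n²) space above.

    def mine_alternating(trace):
        """Infer event pairs (a, b) whose occurrences follow (ab)*."""
        events = sorted(set(trace))
        # One tiny state machine per ordered pair:
        # 0 = expecting a, 1 = expecting b, -1 = pattern violated.
        state = {(a, b): 0 for a in events for b in events if a != b}
        for e in trace:
            for x in events:
                if x == e:
                    continue
                s = state[(e, x)]          # e plays the 'a' role in (e, x)
                if s != -1:
                    state[(e, x)] = 1 if s == 0 else -1
                s = state[(x, e)]          # e plays the 'b' role in (x, e)
                if s != -1:
                    state[(x, e)] = 0 if s == 1 else -1
        # Satisfied: never violated and not left waiting for a closing b.
        return [pair for pair, s in state.items() if s == 0]

    print(mine_alternating(list("PSPSPS")))   # [('P', 'S')]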



Approximate Inference

  • Partition trace into sub-traces

    • For example: PSPSPSPSPSPPPP → PS|PS|PS|PS|PS|PPPP

  • Compute satisfaction rate of each template

    • The ratio of partitions satisfying the alternating property to the total number of partitions

  • Set a satisfaction threshold
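
A sketch of the approximation for a single pair of events; the threshold value at the end is an illustrative choice, not the paper's.

    def satisfaction_rate(trace, a, b):
        """Partition the trace and measure how many partitions satisfy a->b."""
        proj = [e for e in trace if e in (a, b)]
        partitions, current = [], []
        for e in proj:
            # A fresh 'a' right after a 'b' opens a new partition.
            if e == a and current and current[-1] == b:
                partitions.append(current)
                current = []
            current.append(e)
        if current:
            partitions.append(current)
        ok = sum(1 for p in partitions if p == [a, b])
        return ok / len(partitions) if partitions else 0.0

    rate = satisfaction_rate(list("PSPSPSPSPSPPPP"), "P", "S")
    print(rate)                      # 5 of 6 partitions satisfy: ~0.83
    keep = rate >= 0.8               # illustrative satisfaction threshold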


Contextual Properties

  • Example trace: lock1.acq lock2.acq lock2.rel lock1.rel

  • Context-neutral: lock identity is ignored, the trace reduces to acq acq rel rel, and no property is inferred

  • Context-sensitive: each lock/event pair is a distinct event, and the trace yields lock1.acq→lock2.acq, lock1.acq→lock2.rel, lock1.acq→lock1.rel, lock2.acq→lock2.rel, lock2.acq→lock1.rel, lock2.rel→lock1.rel

  • Slicing: split the trace per lock object (lock1: acq rel; lock2: acq rel), keeping the per-object property acq→rel and discarding incidental cross-lock properties such as lock1.acq→lock2.acq



Selecting Interesting Properties

  • Reachability

    • Mark a property P→S as probably uninteresting if S is reachable from P in the call graph

    • For example, given the code below, A→B is unsurprising because A calls B, whereas the relationship between C and D is not obvious from inspecting either C or D

    A() { … B(); … }

    X() { … C(); … D(); … }
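
A sketch of the heuristic over a toy call-graph encoding (the dict representation is an assumption for illustration):

    def reachable(callgraph, src, dst):
        """Depth-first search: is dst reachable from src?"""
        seen, stack = set(), [src]
        while stack:
            f = stack.pop()
            if f == dst:
                return True
            if f not in seen:
                seen.add(f)
                stack.extend(callgraph.get(f, []))
        return False

    def drop_uninteresting(properties, callgraph):
        # Keep P -> S only when S is NOT reachable from P: if P already
        # calls into S, the relationship is obvious from reading P.
        return [(p, s) for p, s in properties if not reachable(callgraph, p, s)]

    cg = {"A": ["B"], "X": ["C", "D"]}
    print(drop_uninteresting([("A", "B"), ("C", "D")], cg))   # [('C', 'D')]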



Selecting Interesting Properties

  • Name Similarity

    • A property is more interesting if it involves similarly named events

    • For example: ExAcquireFastMutexUnsafe / ExReleaseFastMutexUnsafe

    • Compute a word-similarity score between the two event names



Chaining

  • Connect related alternating properties into chains

    • A→B, B→C, and A→C imply the chain A→B→C

  • Provide a way to compose complex state machines out of many small state machines

  • Identifies complex multi-event properties without incurring a high computational cost
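
A sketch of the chaining step (illustrative, not Perracotta's exact algorithm): a chain is extended only while every earlier event alternates with every later one.

    def build_chains(pairs):
        """Compose alternating pairs like A->B, B->C, A->C into A->B->C."""
        props = set(pairs)
        chains = []
        for a, b in sorted(props):
            chain = [a, b]
            grown = True
            while grown:
                grown = False
                for x, y in sorted(props):
                    # Extend with y only if every chain member alternates with y.
                    if x == chain[-1] and all((e, y) in props for e in chain):
                        chain.append(y)
                        grown = True
                        break
            if len(chain) > 2:
                chains.append(chain)
        return chains

    print(build_chains({("A", "B"), ("B", "C"), ("A", "C")}))   # [['A', 'B', 'C']]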



SMArTIC: Towards Building an Accurate, Robust and Scalable Specification Miner

David Lo and Siau-Cheng Khoo

Department of Computer Science, National University of Singapore

FSE ‘06



Hypotheses

  • Mined specifications will be more accurate when:

    • erroneous behavior is removed before learning

    • they are obtained by merging specifications learned from clusters of related traces, rather than from learning over the entire set of traces at once


Structure

[Pipeline: Traces → Filtering → Filtered Traces → Clustering → Clusters of Filtered Traces → Learning → Automatons (one per cluster) → Merging → Merged Automaton]


Filtering

  • How can you tell what’s wrong if you don’t know what’s right?

  • Filter out erroneous traces based on common behavior

  • Common behavior is represented by “statistically significant” temporal rules



Pre → Post Rules

  • Look for rules of the form a→bc: when a occurs, b must eventually occur after a, and c must eventually occur after b

  • Rules exhibiting high confidence and reasonable support can be considered as “statistical” invariants

    • Support – Number of traces exhibiting the property pre→post

    • Confidence – the ratio of traces exhibiting the property pre→post to those exhibiting the property pre
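
A small sketch of computing support and confidence for one rule a→bc over a set of traces (the helper names are illustrative):

    def holds(trace, pre, post):
        """True if every `pre` is eventually followed by `post`, in order."""
        for i, e in enumerate(trace):
            if e == pre:
                j = 0
                for x in trace[i + 1:]:
                    if j < len(post) and x == post[j]:
                        j += 1
                if j < len(post):
                    return False
        return True

    def support_confidence(traces, pre, post):
        with_pre = [t for t in traces if pre in t]
        satisfied = sum(1 for t in with_pre if holds(t, pre, post))
        confidence = satisfied / len(with_pre) if with_pre else 0.0
        return satisfied, confidence

    traces = [list("abc"), list("abcabc"), list("ab")]
    print(support_confidence(traces, "a", ["b", "c"]))   # (2, 0.666...)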



Clustering

  • Convert a set of traces into groups of related traces

    • Localize inaccuracies

    • Scalability



Clustering Algorithm

  • Variant of the k-medoid algorithm

    • Compute the distance between a pair of data items (traces) based on a similarity metric

    • k is the number of clusters to create

  • Algorithm:

    Repeatedly increase k until clustering quality reaches a local maximum
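
A sketch of plain k-medoid refinement for one fixed k (the paper's variant then re-runs this while growing k; dist can be any trace-distance function):

    import random

    def k_medoids(traces, k, dist, rounds=10):
        """Group traces around k medoids; `rounds` is an illustrative cutoff."""
        random.seed(0)
        medoids = random.sample(traces, k)
        clusters = [[] for _ in range(k)]
        for _ in range(rounds):
            clusters = [[] for _ in range(k)]
            for t in traces:                        # assign to nearest medoid
                nearest = min(range(k), key=lambda i: dist(t, medoids[i]))
                clusters[nearest].append(t)
            for i, members in enumerate(clusters):  # re-center each cluster
                if members:
                    medoids[i] = min(members,
                                     key=lambda m: sum(dist(m, t) for t in members))
        return clusters

    # Toy distance: difference in trace length (a real miner would use the
    # alignment-based similarity from the next slide).
    groups = k_medoids([list("ab"), list("abab"), list("x")], 2,
                       lambda s, t: abs(len(s) - len(t)))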



Similarity Metric

  • Use a global sequence alignment algorithm, for example:

    FTFTALILLAVAV
    F--TAL-LLA-AV

  • Problem: alignment doesn't work well in the presence of loops

  • Solution: compare the regular expression representations of the traces instead

    • For example, the traces ABCBCDABCBCBCD and ABCD align poorly symbol by symbol, but their regular expression forms (A(BC)+D)+ and ABCD expose the shared structure
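
A standard global-alignment scorer in the Needleman-Wunsch family (the paper's exact scoring scheme may differ):

    def alignment_score(s, t, match=1, gap_or_mismatch=-1):
        """Score the best global alignment of sequences s and t."""
        m, n = len(s), len(t)
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            dp[i][0] = i * gap_or_mismatch              # s aligned against gaps
        for j in range(1, n + 1):
            dp[0][j] = j * gap_or_mismatch              # t aligned against gaps
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                diag = match if s[i - 1] == t[j - 1] else gap_or_mismatch
                dp[i][j] = max(dp[i - 1][j - 1] + diag,         # match/mismatch
                               dp[i - 1][j] + gap_or_mismatch,  # gap in t
                               dp[i][j - 1] + gap_or_mismatch)  # gap in s
        return dp[m][n]

    # The slide's example pair, with gaps removed from the aligned form.
    print(alignment_score("FTFTALILLAVAV", "FTALLLAAV"))

For looped traces, the trick above is applied to the regular expression forms rather than the raw traces.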



Learning

  • Learn probabilistic finite-state automata (PFSAs) from the clusters of filtered traces

    • One PFSA per cluster

  • The learner is a pluggable component (a “place holder”)

    • The current experiments use the sk-strings learner



Merging

  • Merge multiple PFSAs into one

  • The merged PFSA accepts exactly the union of sentences accepted by the multiple PFSAs

  • Ensures probability integrity

    • The probability of each transition in the output PFSA is recomputed from the transition weights of the input PFSAs
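
A minimal sketch of the union-and-renormalize idea (illustrative, not SMArTIC's exact construction); each PFSA maps a state to (symbol, next-state, count) edges and has a 'start' state:

    def merge_pfsas(pfsas):
        """Union the PFSAs by renaming their states apart and sharing 'start'."""
        merged = {"start": []}
        for i, pfsa in enumerate(pfsas):
            for state, edges in pfsa.items():
                name = "start" if state == "start" else f"{i}:{state}"
                merged.setdefault(name, []).extend(
                    (sym, "start" if nxt == "start" else f"{i}:{nxt}", count)
                    for sym, nxt, count in edges)
        return merged

    def transition_probabilities(pfsa):
        """Probability integrity: P(edge) = its count / total count at its state."""
        probs = {}
        for state, edges in pfsa.items():
            total = sum(count for _, _, count in edges)
            for sym, nxt, count in edges:
                probs[(state, sym, nxt)] = count / total if total else 0.0
        return probs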



From Uncertainty to Belief: Inferring the Specifications Within

Ted Kremenek, Paul Twohey, Andrew Y. Ng, and Dawson Engler, Computer Science Dept., Stanford University

Godmar Back, Computer Science Dept., Virginia Tech

OSDI '06



Motivating Example

  • Problem: Inferring ownership roles

    • Ownership idiom: a resource has at any time exactly one owning pointer

  • Infer annotations

    • ro – returns ownership

    • co – claims ownership

  • Is fopen ro? Are fread and fclose co?

FILE* fp = fopen("myfile.txt", "r");
fread(buffer, n, 1000, fp);
fclose(fp);


Basic Ownership Rules

[Figure: the checker DFA over states Uninit, Owned, Claimed, ¬Owned, OK, and Bug, with transitions labeled ro, ¬ro, co, ¬co, any use, and end-of-path]

  • Valid: fp = ro(); ¬co(fp); co(fp);

  • Valid: fp = ¬ro(); ¬co(fp); ¬co(fp);
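
A small simulation of the checker; the transition table below is reconstructed from the figure, so treat its exact shape as an assumption.

    # Checker DFA (reconstruction of the figure; missing entries mean Bug).
    DFA = {
        ("Uninit", "ro"): "Owned",        ("Uninit", "not_ro"): "NotOwned",
        ("Owned", "not_co"): "Owned",     ("Owned", "co"): "Claimed",
        ("Claimed", "end"): "OK",         # ownership claimed exactly once
        ("NotOwned", "not_co"): "NotOwned",
        ("NotOwned", "end"): "OK",        # never owned, never claimed
    }

    def check(events):
        """Run one path's events (plus end-of-path) through the DFA."""
        state = "Uninit"
        for e in events + ["end"]:
            state = DFA.get((state, e), "Bug")
        return state

    print(check(["ro", "not_co", "co"]))            # OK
    print(check(["not_ro", "not_co", "not_co"]))    # OK
    print(check(["ro", "not_co"]))                  # Bug: ownership never claimed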



Goal

Provide a framework that:

  • Lets users easily express every intuition and domain-specific observation useful for inferring annotations

  • Reduces such knowledge in a sound way to meaningful probabilities (a “common currency”)



Annotation Inference

  • Define the set of possible annotations to infer

  • Model domain-specific knowledge and intuitions in the probabilistic model

  • Compute annotation probabilities


Factors – Modeling Beliefs

  • Relations mapping the possible values of one or more annotation variables to non-negative real numbers

  • For example, the check factor:

    f<check> = q<ok> if DFA = OK, q<bug> if DFA = Bug

    • Belief: any random place might have a bug 10% of the time, so set q<bug> = 0.1 and q<ok> = 0.9
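
A toy sketch of factors over a single annotation variable, with illustrative numbers (not the paper's machinery, just the idea that factor values multiply into an unnormalized belief):

    def check_factor(dfa_outcome, q_bug=0.1, q_ok=0.9):
        """f<check>: weight a DFA outcome by the prior belief in bugs."""
        return q_bug if dfa_outcome == "Bug" else q_ok

    def score(assignment, factors):
        """Unnormalized belief: the product of all factor values."""
        p = 1.0
        for f in factors:
            p *= f(assignment)
        return p

    # Hypothetical factors for one variable, fopen:ret.
    prior = lambda a: 0.8 if a["fopen:ret"] == "ro" else 0.2   # bias toward ro
    behave = lambda a: check_factor("OK" if a["fopen:ret"] == "ro" else "Bug")

    s_ro = score({"fopen:ret": "ro"}, [prior, behave])         # 0.8 * 0.9 = 0.72
    s_not = score({"fopen:ret": "not_ro"}, [prior, behave])    # 0.2 * 0.1 = 0.02
    print(s_ro / (s_ro + s_not))   # ~0.97: fopen very likely returns ownership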



Factors

  • Other factors

    • a bias toward specifications with ro

    • a bias toward specifications without co

    • a bias based on naming conventions


Annotation factor graph

f<ro>

f<co>

f<co>

f<ro>

f<co>

fopen:ret

fread:4

fclose:1

fdopen:ret

fwrite:4

f<check>

f<check>

Annotation Factor Graph

prior beliefs

annotationvariables

behavioral tests



Results



QUARK: Empirical Assessment of Automaton-based Specification Miners

David Lo and Siau-Cheng Khoo

Department of Computer Science, National University of Singapore

WCRE ‘06



QUARK Framework

  • Assessing the quality of specification miners

  • Measure performance along multiple dimensions

    • Accuracy – how faithfully the inferred specification represents the actual specification

    • Scalability – ability to infer large specifications

    • Robustness – sensitivity to errors


QUARK Framework

[Pipeline: a Simulator Model (PFSA) drives a Trace Generator; the generated traces feed a User-Defined Miner; Quality Assessment compares the mined specification against the model and reports Measurements]



Accuracy (Trace Similarity)

  • Measured on error-free traces

  • Metrics:

    • Recall – the fraction of the original model's behavior that the mined model reproduces

    • Precision – the fraction of the mined model's behavior that is correct with respect to the original model

    • Co-emission – similarity of the probabilities the two models assign to traces
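
A minimal sketch of the two trace-based metrics, given acceptor predicates for each model (the function names are illustrative):

    def recall_precision(original_traces, mined_traces,
                         original_accepts, mined_accepts):
        """Trace-similarity accuracy metrics for a mined model."""
        # Recall: how much of the original model's behavior is reproduced.
        recall = sum(mined_accepts(t) for t in original_traces) / len(original_traces)
        # Precision: how much of the mined model's behavior is correct.
        precision = sum(original_accepts(t) for t in mined_traces) / len(mined_traces)
        return recall, precision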



Robustness

  • Sensitivity to errors

  • “Inject” error nodes and error transitions into the PFSA model, as in the figure and sketch below

[Figure: an example PFSA with states A through H on paths from start to end, with injected error states and error transitions labeled Z]
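
A sketch of the injection step over the same PFSA encoding used earlier (state → (symbol, next-state, count) edges); the injection rate is an illustrative parameter:

    import random

    def inject_errors(pfsa, rate=0.1):
        """Copy the model, adding an error node and error ('Z') transitions."""
        random.seed(0)
        noisy = {state: list(edges) for state, edges in pfsa.items()}
        noisy["ERR"] = []
        for state in pfsa:
            if random.random() < rate:
                noisy[state].append(("Z", "ERR", 1))   # 'Z' marks an error event
        return noisy

    model = {"start": [("A", "s1", 1)], "s1": [("B", "end", 1)], "end": []}
    noisy = inject_errors(model, rate=0.5)
    # A robust miner fed traces from `noisy` should still recover `model`.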



Scalability

  • Use synthetic models

    • Build a tree from a pre-determined number of nodes

    • Add loops based on ‘locality of reference’

    • Assign equal probabilities to transitions out of the same node

  • Vary the size of the model (nodes, transitions)
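
A sketch of such a generator (illustrative parameters and encoding): it builds a random tree, adds back-edges to nearby earlier nodes for locality of reference, and splits probability equally among each node's outgoing transitions.

    import random

    def synthetic_model(n_nodes, loop_prob=0.2):
        """Generate a synthetic PFSA-like model for scalability tests."""
        random.seed(0)
        edges = {node: [] for node in range(n_nodes)}
        for node in range(1, n_nodes):       # random tree over n_nodes
            parent = random.randrange(node)
            edges[parent].append(node)
        for node in range(1, n_nodes):       # loops with locality of reference
            if random.random() < loop_prob:
                edges[node].append(max(0, node - random.randint(1, 3)))
        # Equal probability for all transitions out of the same node.
        return {node: [(target, 1 / len(targets)) for target in targets]
                for node, targets in edges.items() if targets}

    model = synthetic_model(100)   # vary n_nodes to vary model size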

