
Mining Specifications: (lots of) code → specifications

Glenn Ammons (Univ. of Wisconsin), Ras Bodík (Univ. of Wisconsin), Jim Larus (Microsoft Research)


Drivers wanted.

Verification: beyond engine-less cars

Recent successes.

  • specification languages
  • checkers
  • abstractors

What’s still missing?

  • specifications
So who formulates specifications?

Programmers? Probably not.

Why they won’t:

  • too busy; yet another language to learn?
  • specifications aren’t cool.

Why they shouldn’t:

  • may misunderstand usage rules.
  • may not know all usage rules.

Mining Specifications:

  • Convenience.
  • As in data mining, discover surprising rules.
Advantages of mining

Exploits the massive programmer effort reflected in the code.

  • Programmers resolved many problems:
    • incomplete system requirements.
    • incomplete API documentation.
    • implementation-dependent rules.
  • Want redundancy? (without redundant programming)
    • ask multiple programmers (and vote).
Our output: a specification

  x = socket()
  bind(x)
  listen(x)
  y = accept(x)
  read(y)
  write(y)
  close(y)
  close(x)
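To make the output concrete, here is a minimal Python sketch. It encodes the specification above as a deterministic FSM and checks an abstracted event sequence against it, which is the job of the dynamic checker in the pipeline. The state numbering, `SPEC` and `accepts` are hypothetical. The loops are our assumption, suggested by the example trace rather than stated by the listing: repeated read/write on y, and a fresh accept after close(y).

```python
from typing import Dict, Iterable, Tuple

# ASSUMPTION: states and loops are our reading of the socket example,
# not the authors' mined automaton.
SPEC: Dict[Tuple[int, str], int] = {
    (0, "x = socket()"): 1,
    (1, "bind(x)"): 2,
    (2, "listen(x)"): 3,
    (3, "y = accept(x)"): 4,
    (4, "read(y)"): 4,
    (4, "write(y)"): 4,
    (4, "close(y)"): 3,   # back to listening: another accept may follow
    (3, "close(x)"): 5,
}
START, ACCEPTING = 0, {5}

def accepts(events: Iterable[str]) -> bool:
    """Run the FSM over an event sequence; reject on any missing transition."""
    state = START
    for ev in events:
        nxt = SPEC.get((state, ev))
        if nxt is None:
            return False  # call not allowed in this state: report a bug
        state = nxt
    return state in ACCEPTING  # e.g. y left open means not accepted
```

A sequence that listens before binding, or that ends with y still open, is rejected. That is the "bug" outcome of the checker.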
How do we mine?
  • Underlying premise:
    • Even bad software is debugged enough to show hints of correct behavior.
    • Maxim: Common usage is the correct usage.
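The maxim "common usage is the correct usage" can be illustrated with a hedged sketch: count how often each usage scenario occurs, then keep only the scenarios above a support threshold. `common_scenarios`, the 0.2 threshold and the sample runs are hypothetical illustrations. This is not the authors' procedure, which learns a probabilistic automaton rather than filtering whole scenarios.

```python
from collections import Counter
from typing import List, Set, Tuple

def common_scenarios(scenarios: List[List[str]], min_support: float) -> Set[Tuple[str, ...]]:
    """Keep scenarios whose share of all observed scenarios is >= min_support."""
    counts = Counter(tuple(s) for s in scenarios)
    total = sum(counts.values())
    return {s for s, c in counts.items() if c / total >= min_support}

# Hypothetical runs: nine use the socket API correctly, one forgets bind().
runs = 9 * [["socket", "bind", "listen", "accept", "close"]] + [["socket", "listen", "accept", "close"]]
kept = common_scenarios(runs, min_support=0.2)
```

The rare, buggy scenario is outvoted, in the spirit of "ask multiple programmers (and vote)".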
Mining = machine learning

Reduce the problem to the well-known problem of learning regular languages.

Obstacles:

  • bugs in the source code may be learned into the specification
  • what is “common” behavior?

Solutions:

  • learn from dynamic behavior
  • learn probabilistically

So: learn probabilistic FSMs from traces.
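The slides hand scenarios to an off-the-shelf learner. The sketch below covers only the "learn probabilistically" part. It builds a prefix-tree acceptor whose edges carry counts, then turns those counts into transition probabilities. A real probabilistic FSM learner would go on to merge similar states, and that step is not shown. All names and the sample scenarios are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[int, str]  # (state, call name)

def prefix_tree(scenarios: List[List[str]]):
    """Prefix-tree acceptor: state 0 is the root, each new prefix gets a
    fresh state, and every edge counts how many scenarios used it."""
    children: Dict[Key, int] = {}
    counts: Dict[Key, int] = defaultdict(int)
    ends: Dict[int, int] = defaultdict(int)  # scenarios that stop at a state
    next_state = 1
    for scenario in scenarios:
        state = 0
        for call in scenario:
            key = (state, call)
            if key not in children:
                children[key] = next_state
                next_state += 1
            counts[key] += 1
            state = children[key]
        ends[state] += 1
    return children, counts, ends

def transition_probs(children, counts, ends) -> Dict[Key, float]:
    """P(call | state) = edge count / (edges leaving state + scenarios ending there)."""
    total: Dict[int, int] = defaultdict(int)
    for (state, _), c in counts.items():
        total[state] += c
    for state, c in ends.items():
        total[state] += c
    return {key: counts[key] / total[key[0]] for key in children}

# Hypothetical scenarios: three runs read before closing, one closes at once.
scenarios = 3 * [["open", "read", "close"]] + [["open", "close"]]
probs = transition_probs(*prefix_tree(scenarios))
```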

Input: trace(s)

  7 = socket(2, 1, 0);
  bind(7, 0x400120, 16);
  listen(7, 5);
  8 = accept(7, 0x400200, 0x400240);
  read(8, 0x400320, 255);
  write(8, 0x400320, 12);
  read(8, 0x400320, 255);
  write(8, 0x400320, 7);
  close(8);
  10 = accept(7, 0x400200, 0x400240);
  read(10, 0x400320, 255);
  write(10, 0x400320, 13);
  close(10);
  close(7);
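A first step in processing such a trace is to parse each line into a call record, then group the calls by the descriptor they touch. This Python sketch assumes the line format shown above. It treats the return value and the first argument as the descriptor, which holds for these socket calls but is an assumption rather than a general rule. `parse_trace` and `calls_on` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional

class Call(NamedTuple):
    ret: Optional[int]   # recorded return value, if any
    name: str
    args: List[int]

# Matches lines such as "8 = accept(7, 0x400200, 0x400240);" or "close(8);"
CALL_RE = re.compile(r"^\s*(?:(\d+)\s*=\s*)?(\w+)\((.*)\);\s*$")

def parse_trace(text: str) -> List[Call]:
    calls = []
    for line in text.splitlines():
        m = CALL_RE.match(line)
        if m is None:
            continue  # blank or unrecognized line
        ret, name, argstr = m.groups()
        # int(s, 0) accepts both decimal and 0x-prefixed hexadecimal literals
        args = [int(a, 0) for a in argstr.split(",")] if argstr.strip() else []
        calls.append(Call(int(ret) if ret is not None else None, name, args))
    return calls

def calls_on(calls: List[Call], fd: int) -> List[str]:
    """ASSUMPTION: a call touches fd if it returns fd or takes it as argument 0."""
    return [c.name for c in calls if c.ret == fd or (c.args and c.args[0] == fd)]

TRACE_TEXT = """
7 = socket(2, 1, 0);
bind(7, 0x400120, 16);
listen(7, 5);
8 = accept(7, 0x400200, 0x400240);
read(8, 0x400320, 255);
write(8, 0x400320, 12);
read(8, 0x400320, 255);
write(8, 0x400320, 7);
close(8);
10 = accept(7, 0x400200, 0x400240);
read(10, 0x400320, 255);
write(10, 0x400320, 13);
close(10);
close(7);
"""
calls = parse_trace(TRACE_TEXT)
```

Grouping by descriptor separates the listening socket (7) from each accepted connection (8, 10), which mirrors the x and y roles in the specification.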

The mining algorithm

  dynamic execution (traces)
    → trace abstraction
  usage scenarios (strings)
    → (off-the-shelf) RegExp learner
  generalized scenarios (probabilistic NFA)
    → extract heavy core (and approve)
  specification (NFA)

  dynamic execution to be checked (trace) + specification (NFA)
    → dynamic checker
  OK / bug
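One simple reading of "extract heavy core" is sketched below. First, drop transitions that carry only a small fraction of the traffic leaving their source state. Then drop whatever is no longer reachable from the start state. The relative-frequency threshold, the toy edge counts and `heavy_core` are assumptions for illustration, not necessarily the authors' exact procedure. The "(and approve)" step remains a human review of the result.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Edge = Tuple[int, str, int]  # (source state, call name, target state)

def heavy_core(edges: Dict[Edge, int], start: int, threshold: float) -> Set[Edge]:
    """Keep edges carrying at least `threshold` of the traffic leaving their
    source state, restricted to states still reachable from `start`."""
    out_total: Dict[int, int] = defaultdict(int)
    for (src, _, _), count in edges.items():
        out_total[src] += count
    heavy = {e for e, c in edges.items() if c / out_total[e[0]] >= threshold}

    # Reachability over the heavy edges only.
    reachable, frontier = {start}, [start]
    while frontier:
        state = frontier.pop()
        for src, _, dst in heavy:
            if src == state and dst not in reachable:
                reachable.add(dst)
                frontier.append(dst)
    return {e for e in heavy if e[0] in reachable}

# Hypothetical counts: 97 runs bind then listen; 3 rare runs take another path.
edges = {
    (0, "socket", 1): 100,
    (1, "bind", 2): 97,
    (1, "read", 5): 3,
    (5, "close", 6): 3,
    (2, "listen", 3): 97,
}
core = heavy_core(edges, start=0, threshold=0.1)
```

Note that `(5, "close", 6)` is locally frequent (all traffic out of state 5). It is still pruned, because its source state is only reachable through the rare `read` edge.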
Trace abstraction: 4 challenges
  • Traces interleave useful and useless events.
    • RegExp learner cannot separate them.
  • Specifications must include both temporal and value-flow constraints.
    • RegExp learners handle only temporal constraints.
  • Only some of the API calls’ arguments impose “true” dependences.
    • Learning value-flow constraints on all arguments is infeasible.
  • Specifications may impose only a partial order.
    • Encoding every legal ordering would produce a huge FSM.
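The last challenge can be quantified. If n calls are mutually unordered, an FSM that lists every legal sequence must cover n! orderings. The sketch below counts, by brute force, the orderings consistent with a set of precedence constraints. The event names are hypothetical.

```python
from itertools import permutations
from typing import Iterable, Sequence, Tuple

def count_orderings(events: Sequence[str], before: Iterable[Tuple[str, str]]) -> int:
    """Count permutations of `events` in which x precedes y for every (x, y) in `before`."""
    constraints = list(before)
    count = 0
    for order in permutations(events):
        pos = {e: i for i, e in enumerate(order)}
        if all(pos[x] < pos[y] for x, y in constraints):
            count += 1
    return count

# Four unordered calls: 4! = 24 legal sequences.
print(count_orderings(["a", "b", "c", "d"], []))
# The same calls totally ordered: a single sequence.
print(count_orderings(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")]))
```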
Trace abstraction

  Raw trace:
    h(3, 5)  c(10)  a(4, 5)  d(4, 7)  b(0, 5)  f(10)  h(8, 11)  e(7)
    f(50)  d(15, 1)  c(7)  a(9, 11)  b(6, 7)  d(9, 14)  f(20)  e(7)
  Abstracted trace (arguments that carry no dependence shown as _):
    h(_, 5)  c(10)  a(4, 5)  d(4, 7)  b(_, 5)  f(10)  h(_, 11)  e(7)
    f(_)  d(_, _)  c(7)  a(9, 11)  b(_, 11)  d(9, _)  e(_)  f(_)
  Scenario linked to h(_, 5) through shared values:
    h(_, 5)  a(4, 5)  d(4, 7)  b(_, 5)  e(7)
  Values replaced by variables:
    h(_, X)  a(Y, X)  d(Y, Z)  b(_, X)  e(Z)
  Scenario reordered into a standard form:
    h(_, X)  a(Y, X)  b(_, X)  d(Y, Z)  e(Z)
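The abstraction stages can be reproduced with a short Python sketch, under three assumptions:

1. The abstracted trace is given, with `None` for each `_` argument.
2. A scenario is grown forward from a seed call by following shared values, capped at `max_calls` calls (5 here, chosen so the result matches the slide).
3. The standard order is a breadth-first walk over shared variables from the seed, with ties broken by trace order.

All three rules are our assumptions that happen to reproduce the slide. They are not necessarily the authors' algorithm.

```python
from collections import deque
from typing import List, Optional, Tuple

Event = Tuple[str, Tuple[Optional[int], ...]]  # None marks an "_" argument

# The slide's abstracted trace, in the order the slide lists it.
TRACE: List[Event] = [
    ("h", (None, 5)), ("c", (10,)), ("a", (4, 5)), ("d", (4, 7)),
    ("b", (None, 5)), ("f", (10,)), ("h", (None, 11)), ("e", (7,)),
    ("f", (None,)), ("d", (None, None)), ("c", (7,)), ("a", (9, 11)),
    ("b", (None, 11)), ("d", (9, None)), ("e", (None,)), ("f", (None,)),
]

def extract_scenario(trace: List[Event], seed: int, max_calls: int) -> List[Event]:
    """ASSUMPTION: scan forward from trace[seed], keeping each call that shares a
    value with the calls kept so far, until max_calls calls are kept."""
    scenario = [trace[seed]]
    live = {v for v in trace[seed][1] if v is not None}
    for name, args in trace[seed + 1:]:
        if len(scenario) == max_calls:
            break
        vals = {v for v in args if v is not None}
        if vals & live:
            scenario.append((name, args))
            live |= vals
    return scenario

def name_variables(scenario: List[Event]) -> List[Tuple[str, Tuple[str, ...]]]:
    """Replace concrete values by X, Y, Z (then V3, V4, ...) in order of first appearance."""
    names = {}
    out = []
    for fn, args in scenario:
        rendered = []
        for v in args:
            if v is None:
                rendered.append("_")
            else:
                if v not in names:
                    names[v] = "XYZ"[len(names)] if len(names) < 3 else f"V{len(names)}"
                rendered.append(names[v])
        out.append((fn, tuple(rendered)))
    return out

def canonical_order(scenario):
    """ASSUMPTION: breadth-first from the seed call over shared variables; ties keep trace order."""
    vars_of = [{v for v in args if v != "_"} for _, args in scenario]
    order, queue = [0], deque([0])
    remaining = list(range(1, len(scenario)))
    while queue:
        i = queue.popleft()
        for j in list(remaining):
            if vars_of[i] & vars_of[j]:
                remaining.remove(j)
                order.append(j)
                queue.append(j)
    order.extend(remaining)  # calls never linked to the seed keep trace order
    return [scenario[k] for k in order]

def render(scenario) -> str:
    return " ".join(f"{fn}({', '.join(args)})" for fn, args in scenario)

scenario = extract_scenario(TRACE, seed=0, max_calls=5)
named = name_variables(scenario)
standard = canonical_order(named)
```

Renaming values to variables is what lets scenarios from different runs, or from different parts of one run, compare equal as strings before they reach the RegExp learner.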

Preliminary experiments

Attempted to learn and verify two published X Windows rules

As of Friday:

  • A timestamp-passing rule
    • learned the rule! (compact: 6 states)
    • found bugs in 2 of 17 programs (ups, e93)
  • SetOwner(x) must be followed by GetSelection(x)
    • failed to learn the rule (learning set too small), but
    • found bugs in 2 of 5 programs (xemacs, ups)
Related work

Arithmetic pre/post conditions

  • Daikon, Houdini
    • properties orthogonal to ours
    • eventually, we may need to include and learn some arithmetic relationships

Temporal relationships over calls

  • intrusion detection: [Ghosh et al.], [Wagner and Dean]
  • software processes: [Cook and Wolf]
  • error checking: [Engler et al., SOSP 2001]
    • lexical and syntactic pattern matching
    • user must write templates (e.g., <a> always follows <b>)
Ongoing work

Mechanize the tool. Find more gold.

Future work

[Diagram: code → mining → specifications, which feed the checkers ESP, Vault, SPIN, Verisoft and SLAM to find bugs; a further box reads "inputs ?"]

Give gold to jewelers.

Summary
  • Semi-automatically creating well-formed, non-trivial specifications is an important part of the verification tool chain.
  • Contributions:
    • introduced specifications mining
    • phrased it as probabilistic learning from dynamic traces
    • decomposed it into a sequence of subproblems (using an off-the-shelf learner)
    • developed a dynamic checker
    • found bugs
Discussion

Expressibility

  • what classes of properties can/should we learn?
  • can we learn more than we can check?
  • can a single-threaded specification avoid race conditions?