Mining Specifications: (lots of) code → specifications

Glenn Ammons (Univ. of Wisconsin), Ras Bodík (Univ. of Wisconsin), Jim Larus (Microsoft Research)



Verification: beyond engine-less cars

Recent successes:

  • specification languages

  • checkers

  • abstractors

What’s still missing?

  • specifications


So who formulates specifications?

Programmers? Probably not.

Why they won’t:

  • too busy; yet another language to learn?

  • specifications aren’t cool.

Why they shouldn’t:

  • may misunderstand usage rules.

  • may not know all usage rules.

Mining specifications:

  • Convenience.

  • As in data mining, discover surprising rules.


Advantages of mining

Exploits the massive programmer effort reflected in the code.

  • Programmers have already resolved many problems:

    • incomplete system requirements.

    • incomplete API documentation.

    • implementation-dependent rules.

  • Want redundancy? (without redundant programming)

    • ask multiple programmers (and vote).


Our output: a specification

x = socket()
bind(x)
listen(x)
y = accept(x)
read(y)
write(y)
close(y)
close(x)
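The call sequence on this slide can be read as a small finite-state machine over API calls. The following is a minimal sketch of that reading, not the authors' representation: the state names, the table layout, and the loops that allow repeated read/write and repeated accept are assumptions (the loops are suggested by the sample trace later in the talk).

```python
# Hypothetical encoding of the socket usage rule as a finite-state machine.
# State names and loop transitions are illustrative assumptions.
SPEC = {
    ("start", "socket"): "created",
    ("created", "bind"): "bound",
    ("bound", "listen"): "listening",
    ("listening", "accept"): "connected",
    ("connected", "read"): "connected",     # assumed loop
    ("connected", "write"): "connected",    # assumed loop
    ("connected", "close(y)"): "listening", # connection closed, may accept again
    ("listening", "close(x)"): "done",
}
ACCEPTING = {"done"}


def accepts(calls):
    """Return True if the sequence of calls follows the specification."""
    state = "start"
    for call in calls:
        key = (state, call)
        if key not in SPEC:
            return False  # no transition: the usage violates the rule
        state = SPEC[key]
    return state in ACCEPTING
```

A sequence that skips bind, such as socket followed directly by listen, has no matching transition and is rejected.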


How do we mine?

Underlying premise:

  • Even bad software is debugged enough to show hints of correct behavior.

  • Maxim: common usage is the correct usage.


Mining = machine learning

Reduce the problem to the well-known problem of learning regular languages.

Obstacles:

  • bugs in the source code may be learned into the specification

  • what is “common” behavior?

Solutions:

  • learn from dynamic behavior

  • learn probabilistically

Result: learn probabilistic FSMs from traces.
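To make "learn probabilistically" concrete, here is a deliberately simple sketch that estimates transition probabilities from traces by counting which event follows which. It is not the off-the-shelf learner the talk refers to; the function name, the `<start>`/`<end>` markers, and the sample traces are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def learn_transition_probs(traces):
    """Estimate P(next event | previous event) by counting over all traces."""
    counts = defaultdict(Counter)
    for trace in traces:
        prev = "<start>"
        for event in trace:
            counts[prev][event] += 1
            prev = event
        counts[prev]["<end>"] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


# Hypothetical training set: three correct runs and one run that forgets bind.
traces = [
    ["socket", "bind", "listen", "accept", "close"],
    ["socket", "bind", "listen", "accept", "close"],
    ["socket", "bind", "listen", "accept", "close"],
    ["socket", "listen", "accept", "close"],
]
probs = learn_transition_probs(traces)
print(probs["socket"])  # {'bind': 0.75, 'listen': 0.25}
```

The buggy run still shows up in the model, but with low probability, which is what lets a later step separate common behavior from rare behavior.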


Input: trace(s)

x = socket()
bind(x)
listen(x)
y = accept(x)
read(y)
write(y)
close(y)
close(x)

7 = socket(2, 1, 0);
bind(7, 0x400120, 16);
listen(7, 5);
8 = accept(7, 0x400200, 0x400240);
read(8, 0x400320, 255);
write(8, 0x400320, 12);
read(8, 0x400320, 255);
write(8, 0x400320, 7);
close(8);
10 = accept(7, 0x400200, 0x400240);
read(10, 0x400320, 255);
write(10, 0x400320, 13);
close(10);
close(7);
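The raw trace lines have a regular shape: an optional result value, a call name, and an argument list. A minimal sketch of turning such lines into structured events follows; the regular expression, the dictionary layout, and the helper name `parse_event` are assumptions for illustration, not part of the authors' tooling.

```python
import re

# Matches lines such as "8 = accept(7, 0x400200, 0x400240);" or "close(7);".
CALL = re.compile(r"^(?:(\w+)\s*=\s*)?(\w+)\((.*)\);?$")


def parse_event(line):
    """Split one trace line into call name, argument list, and result."""
    m = CALL.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized trace line: {line!r}")
    result, name, args = m.groups()
    arg_list = [a.strip() for a in args.split(",")] if args.strip() else []
    return {"name": name, "args": arg_list, "result": result}


trace = """7 = socket(2, 1, 0);
bind(7, 0x400120, 16);
8 = accept(7, 0x400200, 0x400240);
read(8, 0x400320, 255);
close(8);
close(7);"""

events = [parse_event(line) for line in trace.splitlines()]

# Events that produce or consume descriptor 8 form one connection's usage.
uses_8 = [e["name"] for e in events if e["result"] == "8" or "8" in e["args"]]
print(uses_8)  # ['accept', 'read', 'close']
```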


The mining algorithm

Mining: dynamic execution (traces) → trace abstraction → usage scenarios (strings) → (off-the-shelf) RegExp learner → generalized scenarios (probabilistic NFA) → extract heavy core (and approve) → specification (NFA)

Checking: specification (NFA) + dynamic execution to be checked (trace) → dynamic checker → OK/bug
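The "extract heavy core" step can be pictured as pruning rare transitions from the probabilistic automaton and keeping what is still reachable. The talk does not say how the core is computed, so the sketch below uses a plain probability threshold as a stand-in; the automaton layout, state names, and threshold value are all assumptions.

```python
def heavy_core(pfsm, start, threshold):
    """Keep transitions with probability >= threshold, reachable from start.

    pfsm maps each state to {(event, next_state): probability}.
    Returns a map from each surviving state to its set of (event, next_state).
    """
    core = {}
    stack = [start]
    while stack:
        state = stack.pop()
        if state in core:
            continue
        core[state] = {
            edge for edge, p in pfsm.get(state, {}).items() if p >= threshold
        }
        stack.extend(target for _event, target in core[state])
    return core


# Hypothetical learned automaton: 5% of runs skipped bind (state s9).
pfsm = {
    "s0": {("socket", "s1"): 1.0},
    "s1": {("bind", "s2"): 0.95, ("listen", "s9"): 0.05},
    "s2": {("listen", "s3"): 1.0},
    "s9": {("accept", "s3"): 1.0},
    "s3": {},
}
core = heavy_core(pfsm, "s0", threshold=0.1)
print(sorted(core))  # ['s0', 's1', 's2', 's3']
```

The rare path through s9 disappears, which matches the "(and approve)" note: a person still reviews the result, since a threshold alone cannot tell a rare bug from rare but legal usage.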


Trace abstraction: 4 challenges

  • Traces interleave useful and useless events.

    • RegExp learner cannot separate them.

  • Specifications must include both temporal and value-flow constraints.

    • RegExp learner only good with temporal constraints.

  • Only some arguments of API calls impose “true” dependences.

    • Infeasible to learn value-flow constraints on all arguments.

  • Specifications may impose only partial order.

    • Encoding all legal partial orders would produce a huge FSM.


Trace abstraction

Recorded trace:

h(3, 5)  c(10)  a(4, 5)  d(4, 7)  b(0, 5)  f(10)  h(8, 11)  e(7)  f(50)  d(15, 1)  c(7)  a(9, 11)  b(6, 7)  d(9, 14)  f(20)  e(7)

Trace with irrelevant arguments masked:

h(_, 5)  c(10)  a(4, 5)  d(4, 7)  b(_, 5)  f(10)  h(_, 11)  e(7)  f(_)  d(_, _)  c(7)  a(9, 11)  b(_, 11)  d(9, _)  e(_)  f(_)

One extracted scenario, with values replaced by names:

h(_, X)  a(Y, X)  d(Y, Z)  b(_, X)  e(Z)

The scenario after reordering:

h(_, X)  a(Y, X)  b(_, X)  d(Y, Z)  e(Z)
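The step from concrete values to names such as X, Y, and Z can be sketched as a renaming by first occurrence, so that equal values get equal names and the value-flow constraint survives as a repeated symbol. This is an illustrative reading of the slide rather than the authors' algorithm; the function name and the event representation are assumptions.

```python
def name_values(scenario):
    """Replace each distinct concrete value with a fresh name (X, Y, Z, ...).

    scenario is a list of (call, args); "_" marks an argument to ignore.
    """
    names = {}
    fresh = iter("XYZABCDEFGHIJKLMNOPQRSTUVW")
    abstracted = []
    for call, args in scenario:
        new_args = []
        for arg in args:
            if arg == "_":
                new_args.append("_")
            else:
                if arg not in names:
                    names[arg] = next(fresh)  # first occurrence gets a new name
                new_args.append(names[arg])
        abstracted.append(f"{call}({', '.join(new_args)})")
    return abstracted


# The calls from the slide that share the values 5, 4, and 7.
scenario = [
    ("h", ["_", 5]),
    ("a", [4, 5]),
    ("d", [4, 7]),
    ("b", ["_", 5]),
    ("e", [7]),
]
print(name_values(scenario))
# ['h(_, X)', 'a(Y, X)', 'd(Y, Z)', 'b(_, X)', 'e(Z)']
```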


Preliminary experiments

Attempted to learn and verify two published X Windows rules

As of Friday:

  • A timestamp-passing rule

    • learned the rule! (compact: 6 states)

    • bugs in 2 out of 17 programs (ups, e93)

  • SetOwner(x) must be followed by GetSelection(x)

    • failed to learn the rule (small learning set), but found bugs in 2 out of 5 programs (xemacs, ups)
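A rule of the form "SetOwner(x) must be followed by GetSelection(x)" can be checked on a single trace with a small amount of bookkeeping. The sketch below assumes a trace is a list of (call, argument) pairs and reads "followed by" as "eventually followed by"; both are assumptions, as is the function name.

```python
def unanswered_owners(trace):
    """Return the arguments x for which SetOwner(x) has no later GetSelection(x)."""
    pending = set()
    for call, arg in trace:
        if call == "SetOwner":
            pending.add(arg)
        elif call == "GetSelection":
            pending.discard(arg)
    return pending


# Hypothetical trace: selection "a" is checked, selection "b" is not.
trace = [
    ("SetOwner", "a"),
    ("GetSelection", "a"),
    ("SetOwner", "b"),
]
print(unanswered_owners(trace))  # {'b'}
```

An empty result means the trace satisfies the rule; any remaining argument points at a call site worth inspecting.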


Related work

Arithmetic pre/post conditions

  • Daikon, Houdini

    • properties orthogonal to ours

    • eventually, we may need to include and learn some arithmetic relationships

Temporal relationships over calls

  • intrusion detection: [Ghosh et al], [Wagner and Dean]

  • software processes: [Cook and Wolf]

  • error checking: [Engler et al SOSP 2001]

    • lexical and syntactic pattern matching

    • user must write templates (e.g., <a> always follows <b>)


Ongoing work

Mechanize tool. Find more gold.


Future work

Give gold to jewelers.

[Diagram labels: code, inputs, Mining, specifications, bugs; tools shown: ESP, Vault, SPIN, Verisoft, SLAM, ?]


Summary

  • Semi-automatically creating well-formed, non-trivial specifications is an important part of the verification tool chain.

  • Contributions:

    • introduced specification mining

    • phrased it as probabilistic learning from dynamic traces

    • decomposed it into a sequence of subproblems (using an off-the-shelf learner)

    • developed a dynamic checker

    • found bugs


Discussion

Expressibility

  • what classes of properties can/should we learn?

  • can we learn more than we can check?

  • can a single-threaded specification avoid race conditions?


