
Mining Specifications

Glenn Ammons, Dept. of Computer Science, University of Wisconsin; Rastislav Bodik, Computer Science Division, University of California, Berkeley; James R. Larus, Microsoft Research. POPL 2002.


Presentation Transcript


  1. Mining Specifications
     Glenn Ammons, Dept. of Computer Science, University of Wisconsin
     Rastislav Bodik, Computer Science Division, University of California, Berkeley
     James R. Larus, Microsoft Research
     POPL 2002

  2. Motivation
     • Formal verification is a promising alternative to software testing.
     • But verifiers will be of little use without enough correctness specifications to verify.

  3. The Assumption
     • Common behavior is (often) correct behavior.
     • If we can identify common behavior, we can produce correct specifications, even from programs that contain errors.

  4. A Program Using the Socket API

         int s = socket(AF_INET, SOCK_STREAM, 0);   /* create the listening socket */
         …
         bind(s, &serv_addr, sizeof(serv_addr));    /* attach it to a local address */
         …
         listen(s, 5);                              /* backlog of 5 pending connections */
         …
         while (1) {
             int ns = accept(s, &addr, &len);       /* one new socket per connection */
             if (ns < 0) break;
             do {
                 read(ns, buffer, 255);
                 …
                 write(ns, buffer, size);
                 if (cond1) return;                 /* leaves without closing ns or s */
             } while (cond2);
             close(ns);
         }
         close(s);

  5. An Example Trace

          1 socket(domain = 2, type = 1, proto = 0, return = 7)
          2 bind(so = 7, addr = 0x400120, addr_len = 6, return = 0)
          3 listen(so = 7, backlog = 5, return = 0)
          4 accept(so = 7, addr = 0x400200, addr_len = 0x400240, return = 8)
          5 read(fd = 8, buf = 0x400320, len = 255, return = 12)
          6 write(fd = 8, buf = 0x400320, len = 12, return = 12)
          7 read(fd = 8, buf = 0x400320, len = 255, return = 7)
          8 write(fd = 8, buf = 0x400320, len = 7, return = 7)
          9 close(fd = 8, return = 0)
         10 accept(so = 7, addr = 0x400200, addr_len = 0x400240, return = 10)
         11 read(fd = 10, buf = 0x400320, len = 255, return = 13)
         12 write(fd = 10, buf = 0x400320, len = 13, return = 13)
         13 close(fd = 10, return = 0)
         14 close(fd = 7, return = 0)

  6. Design Decisions
     • Learn from traces, not from source code.
       • Traces contain fewer bugs.
     • Take a "vote" on what the common program behavior is.
       • The high-probability core encodes the frequently followed protocol.

  7. Mining System
     [Figure: pipeline of the mining system. Program → Tracer → Instrumented program → Run (on test inputs) → Traces → Flow dependence annotator → Annotated traces → Scenario extractor (given a scenario seed) → Abstract scenario strings → Automaton learner → Specifications.]

  8. The (Unsolvable) Problem
     • $I$: the set of all traces of interaction with an API or ADT.
     • $C \subseteq I$: the set of all correct traces of interaction.
     • $T$: an unlabelled training set of interaction traces.
     Find an automaton $A$ that generates exactly the traces in $C$.

  9. Restriction 1
     • $C$ must be a regular language.
     • Model checkers require finite-state specifications.
     • Algorithms for learning finite-state automata are relatively well developed.

  10. Interaction Scenarios
      [Figure: a trace of LinkedList(n) in which malloc returns the objects O1, O2, ..., On and each object is later passed to free. The trace is split into one scenario per object, for example malloc(return = O1) free(p = O1), and every scenario is standardized to malloc(return = Ostd) free(p = Ostd).]

  11. The Problem, Take 2
      • $I_S$: the set of all interaction scenarios with an API or ADT that manipulate no more than $k$ data objects.
      • $C_S \subseteq I_S$: the regular set of all correct scenarios.
      • $T_S$: an unlabelled training set of interaction scenarios from $I_S$.
      Find a finite-state automaton $A_S$ that generates exactly the scenarios in $C_S$.

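  12. Restriction 2: Linking $T_S$ and $C_S$
      • Let $T_S = c_0, c_1, \ldots$ be an infinite sequence of elements from $C_S$ in which each element of $C_S$ occurs at least once.
      • For each $n \ge 0$, the learner maps the prefix $c_0, c_1, \ldots, c_n$ to an automaton $A_S^n$.
      • If, for some $N \ge 0$, $A_S^N$ generates exactly the scenarios in $C_S$ and $A_S^n = A_S^N$ for all $n \ge N$, then the sequence $A_S^0, A_S^1, \ldots$ identifies $C_S$ in the limit.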

  13. The Probabilistic Approach
      • $I_S$: as before.
      • $M$: a target PFSA (probabilistic finite-state automaton), and $P_M$: the distribution over $I_S$ that $M$ generates.
      "Efficiently" find a PFSA $M'$ such that its distribution $P_{M'}$ is an $\epsilon$-good approximation of $P_M$.

  14. Mining System
      [Figure: pipeline of the mining system. Program → Tracer → Instrumented program → Run (on test inputs) → Traces → Flow dependence annotator → Annotated traces → Scenario extractor (given a scenario seed) → Abstract scenario strings → Automaton learner → Specifications.]

  15. Tracer
      • C stdio replacement (requires recompilation)
      • Executable editing
      Skeleton: interaction(attribute0, …, attributen)

          1 socket(domain = 2, type = 1, proto = 0, return = 7)
          2 bind(so = 7, addr = 0x400120, addr_len = 6, return = 0)
          3 listen(so = 7, backlog = 5, return = 0)
          4 accept(so = 7, addr = 0x400200, addr_len = 0x400240, return = 8)
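As a rough illustration of the record skeleton above, the following Python sketch parses one trace record into a name and an attribute map. The function name `parse_event` and the choice to store every value as an integer are assumptions made for this example; the slides do not describe the tracer's actual output format beyond the skeleton, and the leading event number shown on the slide is assumed to have been stripped already.

```python
import re

def parse_event(line):
    """Parse 'name(attr = value, ...)' into (name, {attr: int_value})."""
    match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", line)
    if match is None:
        raise ValueError(f"not a trace record: {line!r}")
    name, body = match.groups()
    attrs = {}
    for part in body.split(","):
        if part.strip():
            key, value = part.split("=", 1)
            # Base 0 accepts both decimal (7) and hexadecimal (0x400120) literals.
            attrs[key.strip()] = int(value.strip(), 0)
    return name, attrs

event = parse_event("bind(so = 7, addr = 0x400120, addr_len = 6, return = 0)")
print(event)
```

Keeping attributes in a dictionary keyed by attribute name makes the later steps (dependence analysis, simplification) simple lookups.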

  16. Flow Dependence
      [Figure: Traces → Dependence analysis → Untyped trace with dependencies → Type inference → Annotated traces.]

  17. Dependence Analysis
      • Takes a manually created list of the attributes that define or use objects.
      • Creates a flow dependence between users and definers.
      Definers: socket.return, bind.so, listen.so, accept.return, close.fd
      Users: bind.so, listen.so, accept.so, read.fd, write.fd, close.fd
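One plausible reading of the definer and user lists is sketched below in Python: each use of a value depends on the most recent event that defined that value. The "most recent definer" rule, the function name `flow_dependences`, and the 0-based event indices are assumptions of this sketch, not details confirmed by the slides.

```python
# Attribute lists copied from the slide; the (interaction, attribute) encoding is illustrative.
DEFINERS = {("socket", "return"), ("bind", "so"), ("listen", "so"),
            ("accept", "return"), ("close", "fd")}
USERS = {("bind", "so"), ("listen", "so"), ("accept", "so"),
         ("read", "fd"), ("write", "fd"), ("close", "fd")}

def flow_dependences(trace):
    """Return (definer_index, user_index) pairs for a list of (name, attrs) events."""
    last_def = {}  # object value -> index of the event that last defined it
    edges = []
    for i, (name, attrs) in enumerate(trace):
        # Resolve uses first, so an event never depends on its own definition.
        for attr, value in attrs.items():
            if (name, attr) in USERS and value in last_def:
                edges.append((last_def[value], i))
        for attr, value in attrs.items():
            if (name, attr) in DEFINERS:
                last_def[value] = i
    return edges

trace = [
    ("socket", {"return": 7}),
    ("bind", {"so": 7}),
    ("listen", {"so": 7}),
    ("accept", {"so": 7, "return": 8}),
    ("read", {"fd": 8}),
    ("close", {"fd": 8}),
]
edges = flow_dependences(trace)
print(edges)
```

Because bind.so and listen.so appear in both lists, each call on the socket depends on the previous one, which yields the chain socket, bind, listen, accept.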

  18. Type Inference
      If there exists a flow dependence between two attributes, then typing gives these attributes the same type.
      • Type(socket.return) = T0
      • Type(bind.so) = T0
      • Type(listen.so) = T0
      • Type(accept.so) = T0
      • Type(accept.return) = T0
      • Type(read.fd) = T0
      • Type(write.fd) = T0
      • Type(close.fd) = T0
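The typing rule amounts to grouping attributes into equivalence classes. A minimal Python sketch using union-find follows; the function name `infer_types` and the list of dependent attribute pairs are illustrative assumptions (the pairs are what one would expect from the example trace, where close(fd = 7) links the fd attributes to the socket attributes).

```python
def infer_types(dependent_pairs):
    """Give one type name to each group of attributes linked by flow dependences."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps lookups short
            a = parent[a]
        return a

    for a, b in dependent_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    names, types = {}, {}
    for attr in list(parent):
        root = find(attr)
        names.setdefault(root, f"T{len(names)}")  # name groups in order of appearance
        types[attr] = names[root]
    return types

# Hypothetical attribute pairs connected by a flow dependence in the example trace.
pairs = [
    ("socket.return", "bind.so"), ("bind.so", "listen.so"),
    ("listen.so", "accept.so"), ("accept.return", "read.fd"),
    ("accept.return", "write.fd"), ("accept.return", "close.fd"),
    ("listen.so", "close.fd"),
]
types = infer_types(pairs)
print(types)
```

Without the last pair, the sketch would produce two types, one for the listening socket's attributes and one for the accepted descriptor's; the shared close.fd attribute is what merges them into the single type T0.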

  19. Scenario Extraction
      [Figure: Annotated traces and scenario seeds → Extraction → Scenarios → Simplification → Simplified scenarios → Standardization → Abstract scenario strings.]

  20. Extraction
      • A scenario is a set of interactions related by flow dependences.

          1 socket(domain = 2, type = 1, proto = 0, return = 7)
          2 bind(so = 7, addr = 0x400120, addr_len = 6, return = 0)
          3 listen(so = 7, backlog = 5, return = 0)
          4 accept(so = 7, addr = 0x400200, addr_len = 0x400240, return = 8)
          5 read(fd = 8, buf = 0x400320, len = 255, return = 12)
          6 write(fd = 8, buf = 0x400320, len = 12, return = 12)
          7 read(fd = 8, buf = 0x400320, len = 255, return = 7)
          8 write(fd = 8, buf = 0x400320, len = 7, return = 7)
          9 close(fd = 8, return = 0)
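A hedged Python sketch of extraction: starting from a seed event, collect every event that the seed transitively depends on and every event that transitively depends on the seed. Treating "related by flow dependences" as "ancestors and descendants of the seed", the function name `extract_scenario`, and the edge list (derived by hand from the 14-event example trace under a most-recent-definer rule) are all assumptions of this sketch. Any bound on scenario size is left out.

```python
def extract_scenario(edges, seed):
    """Indices of events that reach, or are reachable from, `seed` via dependence edges."""
    succ, pred = {}, {}
    for src, dst in edges:
        succ.setdefault(src, []).append(dst)
        pred.setdefault(dst, []).append(src)

    def closure(start, step):
        # Depth-first walk along `step` edges, excluding the start node itself.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in step.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return sorted({seed} | closure(seed, pred) | closure(seed, succ))

# Hypothetical (definer, user) edges, using the slide's 1-based event numbers.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (4, 7), (4, 8), (4, 9),
         (3, 10), (10, 11), (10, 12), (10, 13), (3, 14)]
scenario = extract_scenario(edges, seed=4)
print(scenario)
```

With the first accept (event 4) as the seed, this selects events 1 through 9, matching the scenario shown on the slide; the second accept and the final close(fd = 7) depend on listen but not on the seed, so they fall outside it.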

  21. Simplification
      Eliminate all interaction attributes that do not carry a flow dependence.

          1 socket(return = 7)
          2 bind(so = 7)
          3 listen(so = 7)
          4 accept(so = 7, return = 8) [seed]
          5 read(fd = 8)
          6 write(fd = 8)
          7 read(fd = 8)
          8 write(fd = 8)
          9 close(fd = 8)
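Simplification can be sketched as a filter over each event's attributes. In the Python below, the table `DEPENDENT` (which attributes carry a flow dependence) and the function name `simplify` are assumptions for illustration; the table is read off the simplified scenario on the slide.

```python
# Hypothetical table of attributes that carry a flow dependence, per interaction.
DEPENDENT = {
    "socket": {"return"}, "bind": {"so"}, "listen": {"so"},
    "accept": {"so", "return"}, "read": {"fd"}, "write": {"fd"}, "close": {"fd"},
}

def simplify(scenario):
    """Drop every attribute that does not carry a flow dependence."""
    return [
        (name, {a: v for a, v in attrs.items() if a in DEPENDENT.get(name, ())})
        for name, attrs in scenario
    ]

scenario = [
    ("socket", {"domain": 2, "type": 1, "proto": 0, "return": 7}),
    ("bind", {"so": 7, "addr": 0x400120, "addr_len": 6, "return": 0}),
    ("read", {"fd": 8, "buf": 0x400320, "len": 255, "return": 12}),
]
simplified = simplify(scenario)
print(simplified)
```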

  22. Standardization
      • Naming: replaces attribute values with symbolic variables.
      • Reordering

          1 socket(return = x0:T0)                      (A)
          2 bind(so = x0:T0)                            (B)
          3 listen(so = x0:T0)                          (C)
          4 accept(so = x0:T0, return = x1:T0) [seed]   (D)
          5 read(fd = x1:T0)                            (E)
          7 read(fd = x1:T0)                            (E)
          6 write(fd = x1:T0)                           (F)
          8 write(fd = x1:T0)                           (F)
          9 close(fd = x1:T0)                           (G)
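The naming step can be sketched in Python as follows: concrete values are replaced by variables x0, x1, ... in order of first appearance, so scenarios that differ only in descriptor numbers become identical. The function name `standardize_names` is an assumption, and the reordering step and the type annotations (":T0") are deliberately left out of this sketch.

```python
def standardize_names(scenario):
    """Replace each distinct attribute value with a symbolic variable x0, x1, ..."""
    names = {}  # concrete value -> symbolic variable, in order of first appearance
    result = []
    for name, attrs in scenario:
        renamed = {}
        for attr, value in attrs.items():
            names.setdefault(value, f"x{len(names)}")
            renamed[attr] = names[value]
        result.append((name, renamed))
    return result

first = [("listen", {"so": 7}), ("accept", {"so": 7, "return": 8}), ("close", {"fd": 8})]
second = [("listen", {"so": 7}), ("accept", {"so": 7, "return": 10}), ("close", {"fd": 10})]
print(standardize_names(first))
```

Both example scenarios map to the same abstract form, which is what lets the learner treat them as two occurrences of one string.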

  23. Automaton Learning
      • An off-the-shelf (OTS) learner learns a PFSA.
      • A corer removes infrequently traversed edges and converts the PFSA into an NFA.
      [Figure: a PFSA running from a start state to a final state, with edge weights of either 10000 or 5.]
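To make the coring idea concrete, here is a deliberately simple Python stand-in: it keeps the edges whose traversal count meets a threshold and discards the counts, leaving a plain set of NFA transitions. The fixed threshold, the function name `core`, and the example states and labels are assumptions; the slides do not say how the actual corer decides which edges are infrequent.

```python
def core(pfsa_edges, min_count):
    """Keep edges traversed at least `min_count` times and drop the counts."""
    return {edge for edge, count in pfsa_edges.items() if count >= min_count}

# Hypothetical (source, label, target) edges with traversal counts,
# in the spirit of the slide: hot edges at 10000, cold edges at 5.
pfsa_edges = {
    ("start", "A", "q1"): 10000,
    ("q1", "B", "final"): 10000,
    ("start", "C", "q2"): 5,
    ("q2", "B", "final"): 5,
}
nfa_edges = core(pfsa_edges, min_count=100)
print(sorted(nfa_edges))
```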

  24. Specification Automaton for the Socket Protocol
      [Figure: an automaton whose edges are labelled socket(return = x), bind(so = x), listen(so = x), accept(so = x, return = y), read(fd = y), write(fd = y), close(fd = y), and close(fd = x).]
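A mined automaton of this kind can be used to check a standardized scenario. The Python sketch below simulates an NFA over the edge labels listed on the slide. The state names and the transition structure in `SPEC` are guesses made for illustration, since the transcript preserves only the labels and not the shape of the automaton.

```python
def accepts(transitions, start, finals, word):
    """Simulate an NFA given as a set of (source, label, target) triples."""
    states = {start}
    for symbol in word:
        states = {dst for (src, label, dst) in transitions
                  if src in states and label == symbol}
        if not states:
            return False  # no transition matches: the scenario violates the spec
    return bool(states & finals)

# Hypothetical structure; only the labels come from the slide.
SPEC = {
    ("s0", "socket(return = x)", "s1"),
    ("s1", "bind(so = x)", "s2"),
    ("s2", "listen(so = x)", "s3"),
    ("s3", "accept(so = x, return = y)", "s4"),
    ("s4", "read(fd = y)", "s4"),
    ("s4", "write(fd = y)", "s4"),
    ("s4", "close(fd = y)", "s3"),
    ("s3", "close(fd = x)", "s5"),
}

good = ["socket(return = x)", "bind(so = x)", "listen(so = x)",
        "accept(so = x, return = y)", "read(fd = y)", "write(fd = y)",
        "close(fd = y)", "close(fd = x)"]
leaky = [s for s in good if s != "close(fd = y)"]  # never closes the accepted socket
print(accepts(SPEC, "s0", {"s5"}, good), accepts(SPEC, "s0", {"s5"}, leaky))
```

Under this assumed structure, a scenario that closes the listening socket without first closing the accepted one is rejected, which is the kind of violation a verifier could report.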

  25. Experimental Results
      • Analyzed traces from programs that use the Xlib and X Toolkit Intrinsics libraries for the X11 windowing system.
      • Traces were generated manually.
      • Compared the mined specifications to the rules in the Interclient Communication Conventions Manual (ICCCM).

  26. Experimental Results
      • A small and buggy training set prevented the miner from discovering the rule.
      • Solution: an expert chooses correct traces as the training set.

  27. Benefits
      • Exploits the massive programmer effort that is reflected in the code (and nowhere else).
      • Offers convenience and insights. It is easier to approve a mined formal specification than to write one.

  28. Conclusion
      • Introduced a (semi-)automatic machine-learning approach for discovering formal specifications.
      • Reduced the problem to learning regular languages.
      • Initial experience is promising.
