Applying Model Checking To Large Programs

Madan Musuvathi

Microsoft Research

The Model Checking Problem

A system model S

A property P

Check if S satisfies P

The Model Checking Problem

A system model S

An environment E

A property P

Check if S in E satisfies P

In Previous Lectures

A system model S

An environment E

A property P

Check if S in E satisfies P

Assume S, E, and P are given

Checking is the mighty hard stuff

When Applied to Large Systems

A system model S

An environment E

A property P

Check if S in E satisfies P

Even specifying S, E, and P is challenging!

Try the simplest thing that works!

Model Checking : An Engineer's View

Given a system and its environment

Expose nondeterminism

Environment nondeterminism : inputs, timers, events

Internal nondeterminism: arising from abstractions

Systematically explore all states of the system

Do this exploration intelligently

If lucky, you find a bug

If luckier, you verify the system

Explicit State Model Checking

Explicitly generate the individual states

Systematically explore the state space

State space: Graph that captures all behaviors

Model checking == Graph search

Generate the state space graph "on-the-fly"

The full state space is typically much larger than the set of reachable states

Guarded Transition System

System = State + Transitions

Readily models event-driven systems

    State { int x; }

    Init: { x = 0; }

    // Transitions
    Trans1() { if (x < 3) x' = x + 1; }
    Trans2() { if (x == 3) x' = 0; }
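The guarded transition system on this slide can be sketched in Python as a list of (guard, action) pairs. This is only an illustration: the names `TRANSITIONS`, `successors`, and `run` are assumptions made for this sketch and do not come from the talk.

```python
from typing import Callable, List, Tuple

State = int  # the state consists of the single variable x

# Each transition is a (guard, action) pair; the action computes x' from x.
Transition = Tuple[Callable[[State], bool], Callable[[State], State]]

TRANSITIONS: List[Transition] = [
    (lambda x: x < 3, lambda x: x + 1),  # Trans1
    (lambda x: x == 3, lambda x: 0),     # Trans2
]

INIT: State = 0


def successors(x: State) -> List[State]:
    """Return the next state for every transition whose guard holds in x."""
    return [action(x) for guard, action in TRANSITIONS if guard(x)]


def run(steps: int) -> List[State]:
    """Follow the first enabled transition for the given number of steps."""
    trace = [INIT]
    for _ in range(steps):
        trace.append(successors(trace[-1])[0])
    return trace
```

In this tiny system exactly one guard holds in every state, so `run` simply cycles through 0, 1, 2, 3 and back to 0.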

The Algorithm

    Hashtable states_seen;
    Queue pending;

    insert init_state into states_seen;
    insert init_state into pending;

    while (pending is not empty) {
        current = pending.remove();
        for each transition T enabled in current {
            restore_state(current);
            execute transition T;
            successor = save_state();
            if (successor in states_seen)
                continue;
            insert successor into states_seen;
            check successor for correctness;
            insert successor into pending;
        }
    }
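As a rough illustration of the search loop, here is a minimal Python sketch of breadth-first explicit-state exploration. The function names and the `(bad_state, states_seen_count)` return value are assumptions of this sketch; states are assumed to be hashable values, so no explicit save and restore step is needed.

```python
from collections import deque


def model_check(init_state, successors, is_correct):
    """Breadth-first search over the reachable states.

    Returns (bad_state, number_of_states_seen); bad_state is None when
    every reachable state satisfies is_correct.
    """
    states_seen = {init_state}
    pending = deque([init_state])
    if not is_correct(init_state):
        return init_state, len(states_seen)
    while pending:
        current = pending.popleft()
        for successor in successors(current):
            if successor in states_seen:
                continue  # already explored or already queued
            states_seen.add(successor)
            if not is_correct(successor):
                return successor, len(states_seen)
            pending.append(successor)
    return None, len(states_seen)


def counter_successors(x):
    """Successor function for the small counter system (x in 0..3)."""
    result = []
    if x < 3:
        result.append(x + 1)
    if x == 3:
        result.append(0)
    return result
```

For example, `model_check(0, counter_successors, lambda x: x <= 3)` explores four states and reports no violation.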

How to write a model checker in an hour

Specify the system and the environment as a class

State = member fields

Transitions = member functions

Each member function has a Boolean guard function

Capturing state : provide serialization functions

GetState() returns the state in a buffer

SetState() copies the state from a buffer

Implement the search algorithm
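The recipe above might look like the following in Python. This is a hedged sketch, not the code from the talk: the `Counter` class, its method names, and the use of `struct` for serialization are all assumptions chosen for illustration.

```python
import struct


class Counter:
    """System and environment as one class: fields are state, methods are transitions."""

    def __init__(self):
        self.x = 0

    # Each transition has a Boolean guard function.
    def trans1_guard(self):
        return self.x < 3

    def trans1(self):
        self.x += 1

    def trans2_guard(self):
        return self.x == 3

    def trans2(self):
        self.x = 0

    def transitions(self):
        return [(self.trans1_guard, self.trans1),
                (self.trans2_guard, self.trans2)]

    def get_state(self) -> bytes:
        """Serialize the state into a buffer."""
        return struct.pack("<i", self.x)

    def set_state(self, buf: bytes) -> None:
        """Copy the state back from a buffer."""
        (self.x,) = struct.unpack("<i", buf)


def explore(system):
    """Search all reachable states using only get_state and set_state."""
    init = system.get_state()
    states_seen = {init}
    pending = [init]
    while pending:
        current = pending.pop()
        for guard, action in system.transitions():
            system.set_state(current)      # restore before trying each transition
            if not guard():
                continue
            action()
            successor = system.get_state()
            if successor not in states_seen:
                states_seen.add(successor)
                pending.append(successor)
    return states_seen
```

The search never inspects the fields directly, so the same `explore` function would work for any class that offers `transitions`, `get_state`, and `set_state`.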

State Explosion Problem

Simple descriptions result in (very) large state spaces

State space reduction techniques

Identify behaviorally equivalent states

Process symmetry reduction

Heap symmetry reduction

Identify behaviorally equivalent transition orderings

Partial-order reduction

How to write a model checker in a week

Specify the system and the environment as a class

State = member fields

Transitions = member functions

Each member function has a Boolean guard function

Capturing state : provide serialization functions

GetState() returns the state in a buffer

SetState() copies the state from a buffer

Implement the search algorithm

Implement some state space reduction techniques

Practical Challenges

Reduce manual intervention

How to specify the system?

What is the environment?

Guarantees

Soundness

If the tool terminates without finding a bug (of a certain type), then the program has no bugs of that type

Preciseness

If the tool reports an error, then it is indeed a real error

Orthogonal to the difficulty of model checking algorithms

Specifying the Model

Conventional model checkers require an intermediate description (or "model")

Describes the system at a high level

Throws away implementation details

Good for checking designs, rather than implementations

Success stories: hardware, cache-coherence protocols

Problems

Specifying a model is HARD for large systems

As the system evolves, the model has to be updated

What you check is not what you run!

Mistakes in a hand-written model can hide real errors or introduce false ones

Automatically Extract the Model

Statically analyze the code to generate a model

Models usually mimic the implementation

Murphi model

    Rule "PI Local Get (Put)"
    1:  Cache.State = Invalid
        & ! Cache.Wait
    2:  & ! DH.Pending
    3:  & ! DH.Dirty ==>
    Begin
    4:  Assert !DH.Local;
    5:  DH.Local := true;
    6:  CC_Put(Home, Memory);
    EndRule;

FLASH

    void PILocalGet(void) {
        // ... Boilerplate setup
    2   if (!hl.Pending) {
    3     if (!hl.Dirty) {
    4!      // ASSERT(hl.Local);
            ...
    6       PI_SEND(F_DATA, F_FREE, F_SWAP,
                    F_NOWAIT, F_DEC, 1);
    5       hl.Local = 1;

Automatic Extraction

FeaVer : C program -> Promela (SPIN) model

User provided patterns to extract features

Bandera: Java -> Bandera model

Sophisticated property-driven slicing techniques

Can throw away unrelated parts, if applicable

Problems

Not all primitives are available in the modeling language

Pointers, dynamic object creation, dynamic threads, exceptions

A precise-enough slice could be as large as the program itself

Code as the model

Directly execute the code

Pioneered by Verisoft

State-less model checking

Explicit model checkers

Java Path Finder (Java)

CMC (C/C++)

State space can be infinite (or very large)

Try exploring as many behaviors as possible

Focus on precision

Model Checking == Testing ?

Almost!

Systematic exploration of nondeterminism

Testing = random walks in the state space

Model checking = systematic graph search

Forces the user to expose more nondeterminism

A call to malloc() can fail, a packet can get lost

State space reduction techniques identify redundant tests
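To make the idea of exposed nondeterminism concrete, the sketch below enumerates every success and failure pattern of an allocator instead of sampling a few at random. Everything here is hypothetical: `make_malloc` is a stand-in for a real `malloc()`, and `copy_two_buffers` is an invented piece of code under test with a deliberately planted bug.

```python
import itertools


def make_malloc(outcomes):
    """Return a malloc stand-in whose i-th call succeeds iff outcomes[i] is True."""
    remaining = iter(outcomes)

    def malloc(size):
        if not next(remaining):
            return None  # simulated allocation failure
        return bytearray(size)

    return malloc


def copy_two_buffers(malloc):
    """Hypothetical code under test: it forgets to check the second allocation."""
    first = malloc(4)
    if first is None:
        return False
    second = malloc(4)
    second[0] = 1  # bug: second may be None
    return True


def explore_all(num_calls=2):
    """Systematically try every combination of allocation outcomes."""
    failing = []
    for outcomes in itertools.product([True, False], repeat=num_calls):
        try:
            copy_two_buffers(make_malloc(outcomes))
        except TypeError:
            failing.append(outcomes)
    return failing
```

A test that only ever sees successful allocations would miss the failing combination, in which the first allocation succeeds and the second fails.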

Specifying the System

Similar to building a unit-test framework

Extract the code to be checked

Provide an environment model

Includes entities that the implementation interacts with

Calls to libraries, network, timers, manual input

Code + environment is a closed system

An executable that you can run

Provide correctness properties

Identify the Transitions

A transition is the code executed between two nondeterministic choices

Atomic execution of a thread between two schedule points

Execution of an event handler

Model checker should get control at these choice points

Capturing the State

State of the program is captured by global variables, stack, heap, and registers

Need a way to capture the state of the environment model

Backtracking

Physically reset the state to an older version

Java Pathfinder, CMC

Go to the initial state and reexecute

Fork a separate process at initial state (Verisoft)

Some systems have a natural 'reset'

Unload and reload a driver

Reformat the disk
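The second backtracking strategy, going back to the initial state and re-executing, can be sketched as follows. This is a simplified illustration in the spirit of stateless search, not Verisoft's actual mechanism; the `choose` callback and the `two_packets` program are assumptions of this sketch, and the program is assumed to make finitely many choices.

```python
def explore_by_reexecution(program):
    """Enumerate all choice sequences by re-running program from its start.

    program(choose) must call choose(n) at each nondeterministic point and
    receive an option in range(n). No program state is ever saved.
    """
    results = []
    pending = [[]]  # choice prefixes that still have to be replayed
    while pending:
        prefix = pending.pop()
        taken = []  # choices made during this run

        def choose(n):
            if len(taken) < len(prefix):
                choice = prefix[len(taken)]  # replay a recorded choice
            else:
                choice = 0  # new choice point: take option 0 in this run
                for alternative in range(1, n):
                    pending.append(taken + [alternative])  # revisit later
            taken.append(choice)
            return choice

        outcome = program(choose)
        results.append((tuple(taken), outcome))
    return results


def two_packets(choose):
    """Hypothetical environment: each of two packets is delivered (0) or lost (1)."""
    delivered = 0
    for _ in range(2):
        if choose(2) == 0:
            delivered += 1
    return delivered
```

Each run starts from scratch, so the only thing the search needs to remember between runs is the list of choice prefixes.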

Experience with CMC

Three AODV implementations

35 implementation bugs, 1 specification bug

Linux TCP

4 bugs, 90% protocol coverage

Three Linux filesystems

32 bugs in total

10 serious ones (such as deleting "/")

Environment Problem

Where to separate the system and the environment

Need a faithful abstraction of the environment

Enough nondeterminism to trigger interesting behaviors in the system

But not so much nondeterminism that it triggers false behaviors

An Example

System: Linux TCP implementation

Environment: Kernel, network (driver + hardware), …

Extracting Linux TCP from the Kernel

Conventional wisdom:

Extract TCP along a minimal, narrow interface

Minimizes the model state

Provide a ‘kernel library’

Implements stubs for all kernel functions TCP requires

Never worked!

The narrowest interfaces still had ~150 interface functions

These interfaces are not documented

Errors in stubs can cause subtle but false errors

Model checkers are good at finding subtle errors!

Errors in stubs can miss errors

Extracting Linux TCP from the Kernel

Solution (learned the hard way):

Extract along well-defined interfaces

Minimize errors in stub implementations

These interfaces change infrequently

Do so even if it stresses model checking

Well-defined interfaces around TCP

The system call interface (kernel & user processes)

The hardware abstraction layer (kernel & hardware)

Extracting at these two interfaces

Forces CMC to run the entire Linux kernel

Running the Entire Kernel in CMC

Linux kernel has to run in user space

Has been done before (UML : User Mode Linux)

CMC needs to handle much larger states

Approximately 300 kilobytes

Incremental state capture in effect extracts the TCP-relevant state

A larger state space

Restrict the environment to trigger TCP events only

Compensated by the ease of environment model generation

Approach not possible when model checking with an intermediate description

Specifying Properties

Assertion in the code

Trigger automatically as we are running the code

Heap related errors

Build your own memory allocator

Check for leaks, double-free

Purify-style dynamic techniques

Reading uninitialized variables, access after free

Checking for resource leaks

Check that the system is back in the initial state whenever it should be

Identify idempotent sequences

CreateFile(A) followed by DeleteFile(A)
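Two of the properties above, heap checks through a custom allocator and the "back to the initial state" check for idempotent sequences, can be sketched together. The classes below are hypothetical toys written for this illustration; they are not CMC's allocator or a real file system.

```python
class CheckingAllocator:
    """A stand-in allocator that tracks live and freed handles."""

    def __init__(self):
        self.live = set()
        self.freed = set()
        self.next_handle = 0

    def alloc(self):
        handle = self.next_handle
        self.next_handle += 1
        self.live.add(handle)
        return handle

    def free(self, handle):
        if handle in self.freed:
            raise RuntimeError("double free")
        if handle not in self.live:
            raise RuntimeError("free of unknown handle")
        self.live.remove(handle)
        self.freed.add(handle)

    def live_handles(self):
        return set(self.live)


class ToyFileSystem:
    """Hypothetical system under test; leaky=True plants a resource leak."""

    def __init__(self, allocator, leaky=False):
        self.allocator = allocator
        self.files = {}
        self.leaky = leaky

    def create_file(self, name):
        self.files[name] = self.allocator.alloc()

    def delete_file(self, name):
        handle = self.files.pop(name)
        if not self.leaky:
            self.allocator.free(handle)


def returns_to_initial_state(leaky):
    """CreateFile(A) followed by DeleteFile(A) should restore the initial state."""
    allocator = CheckingAllocator()
    fs = ToyFileSystem(allocator, leaky)
    before = (dict(fs.files), allocator.live_handles())
    fs.create_file("A")
    fs.delete_file("A")
    after = (dict(fs.files), allocator.live_handles())
    return before == after
```

The comparison catches the leak without any property being written specifically for the file system: the state simply fails to return to where it started.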

Some properties are hard to specify

Real systems have ambiguous / incomplete specifications

TCP congestion control should not use up "too much" network bandwidth

A file system should not lose files

Difficult to check in the presence of crashes

Identify properties that are easy to check

A file system is in a bad state if its own fsck() cannot recover from it

State Space Reduction Techniques

Downscaling

Hash Compaction

Identifying State Symmetries

Downscaling

Check smaller versions of the model

Example

Run with only 3-4 nodes in the network

Send just 3 data packets

Find bugs involving complex interactions in smaller instances

Potentially miss bugs present only in larger instances

Hash Compaction

Compact states in the hash table [Stern, 1995]

Compute a signature for each state

Only store the signature in the hashtable

Signature is computed incrementally

Partial signature cached at each page

Might miss errors due to collisions

Orders of magnitude memory savings

Compact 100 kilobyte state to 4-8 bytes

Possible to search ~10 million states
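A minimal sketch of hash compaction, assuming states are available as byte strings: only a short signature of each state is kept in the visited set. The choice of BLAKE2b from Python's `hashlib` and the class name `CompactedStateSet` are assumptions of this sketch, and the signature is computed over the whole state rather than incrementally per page as the slide describes.

```python
import hashlib


def signature(state: bytes, size: int = 8) -> bytes:
    """Compress a serialized state into a size-byte signature."""
    return hashlib.blake2b(state, digest_size=size).digest()


class CompactedStateSet:
    """Visited-state set that stores signatures instead of full states."""

    def __init__(self):
        self._signatures = set()

    def add_if_new(self, state: bytes) -> bool:
        """Record the state; return False if its signature was already present.

        Two different states with the same signature are treated as one,
        which is how collisions can cause errors to be missed.
        """
        sig = signature(state)
        if sig in self._signatures:
            return False
        self._signatures.add(sig)
        return True

    def __len__(self):
        return len(self._signatures)
```

With 8-byte signatures, a 100 kilobyte state costs 8 bytes of payload in the table, at the price of a small collision probability.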

State Symmetries

Explore one out of a (large) set of equivalent states

Canonicalize states before hashing

[Figure: Current State, Successor States, Canonical State, Hash Signature, Hash table]

  • State transformations can be approximate
    • But, use the original state for further state exploration
    • Thus, approximations do not generate false errors!
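The following sketch shows one way to canonicalize before hashing while still exploring from the original state. It assumes a system of identical, interchangeable processes, so that sorting the tuple of local states is a valid canonical form; that assumption, and all names in the code, are made for this illustration only.

```python
from collections import deque


def canonical(state):
    """Canonical form under process symmetry: sort the local states."""
    return tuple(sorted(state))


def explore(init, successors, canonicalize=canonical):
    """Count explored states; only one state per canonical class is expanded."""
    seen = {canonicalize(init)}
    pending = deque([init])
    explored = 0
    while pending:
        current = pending.popleft()
        explored += 1
        for successor in successors(current):
            key = canonicalize(successor)
            if key in seen:
                continue
            seen.add(key)
            pending.append(successor)  # keep exploring from the original state
    return explored


def step_any_process(state, limit=2):
    """Hypothetical system: any process may increment its counter up to limit."""
    result = []
    for index, local in enumerate(state):
        if local < limit:
            result.append(state[:index] + (local + 1,) + state[index + 1:])
    return result
```

With three processes, the search expands 10 states with canonicalization and 27 without it; the canonical form is used only as the hash key, never as the state that is executed.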
Heap Canonicalization

Heap objects can be allocated in different order

Depends on the order events happen

Relocate heap objects to a unique representation

[Figure: state1, state2, Canonical Representation]

  • Essentially:

Find a canonical representation for each heap graph

By abstracting the concrete values of pointers

Heap Canonicalization Algorithm

Basic algorithm [Iosif 01]

Do a deterministic graph traversal of the heap (bfs / dfs)

Relocate objects in the order visited

CMC extensions:

How to do it incrementally?

Should not traverse the entire heap in every transition

How to do it for C objects?

Type information is not available at run time

Iosif’s Canonicalization Algorithm

Do a deterministic graph traversal of the heap (bfs / dfs)

Relocate objects to a canonical location

Determined by the dfs (or bfs) number of the object

Hash the resulting heap

[Figure: a Heap and its Canonical Heap, with roots r and s, objects a, c, x, y, and addresses 0, 2, 4, 6]
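The traversal-based canonicalization can be sketched as below. This is an illustration of the idea rather than Iosif's or CMC's implementation: the heap is modelled as a dictionary from addresses to pointer fields, objects carry no data besides pointers, and the function and variable names are assumptions.

```python
def canonicalize(roots, heap):
    """Renumber heap objects in deterministic depth-first visiting order.

    roots: list of (root_name, address) in a fixed order.
    heap:  dict mapping address -> {field_name: address or None}.
    Returns a hashable value that does not depend on concrete addresses.
    """
    new_address = {}
    order = []
    # Push roots in reverse so that the first root is visited first.
    stack = [address for _, address in reversed(roots)]
    while stack:
        address = stack.pop()
        if address is None or address in new_address:
            continue
        new_address[address] = len(order)  # canonical location = visit number
        order.append(address)
        for field in sorted(heap[address], reverse=True):
            stack.append(heap[address][field])
    canonical_roots = tuple(
        (name, new_address.get(address)) for name, address in roots)
    canonical_objects = tuple(
        tuple((field, new_address.get(target))
              for field, target in sorted(heap[address].items()))
        for address in order)
    return canonical_roots, canonical_objects


# Two heaps holding the same two lists (r -> a -> c and s -> x -> y),
# with the objects allocated at different addresses.
heap1 = {0: {"n": 2}, 2: {"n": None}, 4: {"n": 6}, 6: {"n": None}}
roots1 = [("r", 0), ("s", 4)]
heap2 = {4: {"n": 0}, 0: {"n": None}, 6: {"n": 2}, 2: {"n": None}}
roots2 = [("r", 4), ("s", 6)]
```

Both heaps canonicalize to the same value, so `hash(canonicalize(roots1, heap1))` and `hash(canonicalize(roots2, heap2))` agree and the two states are recognized as equivalent.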

Two Linked List Example

[Figure: Heap and Canonical Heap for two linked lists with roots r and s and objects a, b, c, x, y, annotated with partial hash values. Canonical addresses are 0, 2, 4, 6 before the transition "Insert b" and 0, 2, 4, 6, 8 after it]

A Much Larger Example : Linux Kernel

[Figure: Heap and Canonical Heap of the Linux kernel, with regions Core OS, Network, and Filesystem, and an object p]

An object insertion in one part of the heap affects the canonical location of objects in another part

Incremental Heap Canonicalization

Access Chain :

A path from the root to an object in the heap

Bfs Access Chain:

Shortest of all access paths

Break ties lexicographically

Note: Bfs access chain is a shortest path from a global variable

Canonical location of an object is a function of its bfs access chain

[Figure: heap graph with root r, objects a, b, c, and pointer fields f, g, h]

  • Access chains of c
    • <r,f,g>
    • <r,g,h>
    • <r,f,f,h>
  • Bfs access chain of c
    • <r,f,g>
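The bfs access chain can be computed with an ordinary breadth-first search that visits fields in sorted order. The sketch below is an illustration only: the edge set is reconstructed from the access chains listed on the slide (r.f = a, r.g = b, a.f = b, a.g = c, b.h = c), which is an assumption about the lost figure, and the function name is hypothetical.

```python
from collections import deque


def bfs_access_chains(root, fields):
    """Map every reachable object to its bfs access chain.

    fields: dict mapping object -> {field_name: target object or None}.
    Visiting fields in sorted order makes the first chain found for an
    object the shortest one, with ties broken lexicographically.
    """
    chains = {root: (root,)}
    queue = deque([root])
    while queue:
        obj = queue.popleft()
        for field in sorted(fields.get(obj, {})):
            target = fields[obj][field]
            if target is not None and target not in chains:
                chains[target] = chains[obj] + (field,)
                queue.append(target)
    return chains


# Heap graph assumed from the slide's access chains of c.
example_heap = {
    "r": {"f": "a", "g": "b"},
    "a": {"f": "b", "g": "c"},
    "b": {"h": "c"},
    "c": {},
}
```

On this graph the chain found for c is `("r", "f", "g")`, matching the slide, and the chain for b is `("r", "g")` because it is shorter than the path through a.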
Revisiting Two Linked Lists Example

[Figure: Heap and Canonical Heap for the two linked lists (objects a, b, c, x, y; canonical addresses 0 to 8), with a Relocation Function Table]

r, s are root vars

n is the next field

And on the much larger example

[Figure: Heap and Canonical Heap of the Linux kernel, with regions Core OS, Network, and Filesystem, modified regions Core OS’ and Filesystem’, and an object p]

Changes to Core OS and Filesystem do not affect the canonical location of p

  • Canonical location of p does not change
    • Unless its Bfs Access Chain changes
  • For small changes to the graph
    • Shortest path of most objects remains the same