FATAL ERROR: Unpredictable Event … or Not. Enrico Tronci, Dipartimento di Informatica, Università di Roma “La Sapienza”, Via Salaria 113, 00198 Roma, Italy, tronci@dsi.uniroma1.it, http://www.dsi.uniroma1.it/~tronci.
2003-2004 Meetings with the Faculty of Mathematical, Physical and Natural Sciences
3 December 2003
This program has executed an illegal
operation and will be terminated
Error code: 56
Press any key to restart your system
Given:
a system Sys (e.g. hardware, software, hybrid, etc.) and a specification Spec for Sys (i.e. what Sys should or should not do).
We want to know:
whether system Sys satisfies the given specification Spec.
This program has executed an illegal
operation and will be terminated
Error code: 56
Press any key to restart your system
… is an undesired state for Windows XX
… is an undesired state for ANY Microprocessor FPU (… Pentium 2 included … )
1.1
1.1
= 3.7
A CRASH is an undesired state for rockets (e.g. Ariane 5), airplanes, trains, cars, …
[Figure: testing by simulation. The input sequence (stimulus) … u(3) u(2) u(1) u(0) is fed to the System (Model), which produces the output sequence y(0) y(1) y(2) y(3) …]
Define the initial state and the parameters.
Compute the output by simulation, or by running the actual system when possible.
An observer checks that the output sequence is OK.
x(t + 1) = if x(t) <= 3 then x(t) + u(t) else x(t) – u(t), u(t) = 1, 2. x(0) = 0
[Figure: transition graph of the system; nodes are values of x, edges are labelled with the input u.]
Simulation length: 10
Input sequence: 1, 2, 1, 2, 1, 1, 2, 2, 2, 1
Spec: x(t) < 5.
I.e. no state with x(t) >= 5 is reachable.
Spec does not fail on this run
x(t + 1) = if x(t) <= 3 then x(t) + u(t) else x(t) – u(t), u(t) = 1, 2. x(0) = 0
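[Figure: transition graph of the system, including the state x = 5; nodes are values of x, edges are labelled with the input u.]
Simulation length: 6
Input sequence: 1, 2, 1, 2, 1, 2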
Spec: x(t) < 5.
I.e. no state with x(t) >= 5 is reachable.
Spec FAIL
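The two runs above can be replayed with a few lines of Python. This is only a minimal sketch of the testing loop (system, stimulus, observer); the function names `step`, `simulate` and `observer` are illustrative and do not come from the slides.

```python
def step(x, u):
    """One step of the example system: x + u if x <= 3, else x - u."""
    return x + u if x <= 3 else x - u


def simulate(inputs, x0=0):
    """Return the trajectory x(0), x(1), ... produced by an input sequence."""
    xs = [x0]
    for u in inputs:
        xs.append(step(xs[-1], u))
    return xs


def observer(xs, bound=5):
    """Observer for Spec: every visited state must satisfy x(t) < bound."""
    return all(x < bound for x in xs)


run_ok = simulate([1, 2, 1, 2, 1, 1, 2, 2, 2, 1])   # simulation length 10
run_bad = simulate([1, 2, 1, 2, 1, 2])              # simulation length 6
print(run_ok, observer(run_ok))     # Spec does not fail on this run
print(run_bad, observer(run_bad))   # Spec FAIL: x reaches 5
```

The first run never leaves the states 0..4, so testing alone would not reveal the error that the second, shorter run exposes.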
Generation of targeted testing sequences can be costly (human resources + time). Methods improving ATPG (Automatic Test Pattern Generation) are needed.
[Figure: Requirements → Hand Made + ATPG + Random Walk + … → Testing Sequences]
Testing is SLOW because, to get reasonable coverage, we need to run many testing sequences. This is a problem when time-to-market is an issue (and for most products this IS an issue). Methods to speed up testing are needed.
[Figure: n input sequences … ui(3) ui(2) ui(1) ui(0), for i = 1, …, n, are fed to the System (Model), which produces the n output sequences yi(0) yi(1) yi(2) yi(3) … The output is computed by simulation, or by running the actual system when possible.]
The value of n can easily be in the order of 10^6. Note that for each input sequence an output sequence has to be generated and checked for conformity.
Testing without automation tends to discover errors towards the end of the design flow. Error fixing is very expensive at that point and may delay product release. Methods to discover errors as soon as possible are needed.
[Figure: errors caught (percent) and number of times more expensive to fix, from early development to implementation. Source: Mercury Interactive, Siebel Siemens.]
Presently more than 50% of the cost of the final product is testing, and this cost is growing. Thus keeping the cost of testing low (not much more than 50%) is a key issue in a competitive market.
Testing can only cover a SMALL part of the set of reachable system states. This may lead to false negatives (… unforeseeable circumstances).
Typically corner cases (i.e. states that have a low probability of being reached) are not visited during testing.
Thus errors that have a low probability of showing up are hard (impossible) to detect using testing.
Unfortunately such low probability errors may be costly to fix at later design stages and/or their consequences can be very costly (Pentium 2 bug: billions of dollars …).
Moreover, today's complex designs are full of such cases …
Speed up bug hunting (to decrease costs and time-to-market)
Improve coverage
(to increase quality … and to decrease the probability of losing markets)
HOW ???
Given:
a system Sys (e.g. hardware, software, hybrid, etc.) and a specification Spec for Sys (i.e. what Sys should or should not do).
Check automatically:
whether system Sys satisfies the given specification Spec.
This is equivalent to running ALL possible testing sequences!
Sys: definition of the system under consideration using your favourite language, e.g. VHDL, Verilog, SDL, StateCharts, C, C++, Java, MATLAB, Simulink, …
Spec: definition of what Sys should and should not do, using your favourite language (again) and/or your favourite logic, e.g.
Temporal Logic (CTL, CTL*, …), First Order Logic, etc.
The main goal of Formal Verification is to verify that a given system (hardware and/or software) meets its specifications. Thus formal verification is conceptually equivalent to testing with 100% coverage.
Exhaustive testing is not feasible even for small systems. Thus verification methods rely on a suitable analysis of the system definition and of the system specifications to produce their answers.
As a result formal verification, unlike testing, applies only to the system description. Testing also applies to the physical system (when it exists). Formal verification can be interactive or automatic.
Model Checking is an automatic method for formal verification of Finite State Systems. Note that many hardware and/or software systems can be modeled as finite state systems.
[Figure: the Model Checker (equivalent to exhaustive testing) takes as input Sys (VHDL, Verilog, C, C++, Java, MATLAB, Simulink, …) and BAD (CTL, CTL*, LTL, …), and answers PASS or FAIL.]
PASS: no sequence of events (states) can possibly lead to an undesired state.
FAIL: what went wrong, i.e. a counterexample: a sequence of events (states) leading to an undesired state.
x(t + 1) = if x(t) <= 3 then x(t) + u(t) else x(t) – u(t), u(t) = 1, 2. x(0) = 0
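[Figure: transition graph of the system, including the state x = 5; nodes are values of x, edges are labelled with the input u.]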
Spec: x(t) < 5.
I.e. no state with x(t) >= 5 is reachable.
Spec FAIL
[Figure: transition graph of the system on the states x = 0, …, 5; edges are labelled with the input u.]
x(t + 1) = if x(t) <= 3 then x(t) + u(t) else x(t) – u(t), u(t) = 1, 2. x(0) = 0
Spec: x(t) < 5.
I.e. no state with x(t) >= 5 is reachable.
Spec FAIL
Spec ok if u(t) = 0, 1.
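What a model checker does on this example can be sketched as an exhaustive breadth-first visit that returns a counterexample when a bad state is reachable. The code below is an illustration only, not the algorithm of any specific tool; `find_counterexample` and its parameters are hypothetical names.

```python
from collections import deque


def step(x, u):
    """Example system: x + u if x <= 3, else x - u."""
    return x + u if x <= 3 else x - u


def find_counterexample(inputs, x0=0, bound=5):
    """Visit all reachable states breadth first.

    Returns a shortest input sequence that drives the system to a state
    with x >= bound, or None if no such state is reachable (Spec holds).
    """
    parent = {x0: None}          # state -> (previous state, input used)
    queue = deque([x0])
    while queue:
        x = queue.popleft()
        if x >= bound:           # bad state: rebuild the input sequence
            trace = []
            while parent[x] is not None:
                x, u = parent[x]
                trace.append(u)
            return trace[::-1]
        for u in inputs:
            nxt = step(x, u)
            if nxt not in parent:
                parent[nxt] = (x, u)
                queue.append(nxt)
    return None


print(find_counterexample((1, 2)))   # a counterexample exists
print(find_counterexample((0, 1)))   # None: Spec holds for u(t) in {0, 1}
```

Unlike the testing runs, this visit does not depend on which input sequences happen to be tried: every reachable state is examined once.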
x(t + 1) = case
  x(t) – 2 + u(t) when x(t) + y(t) > 4
  x(t) – 1 + u(t) when x(t) + y(t) = 4
  x(t) + u(t) when x(t) + y(t) = 3
  x(t) + 1 + u(t) when x(t) + y(t) = 2
  x(t) + 2 + u(t) when x(t) + y(t) < 2
esac
y(t + 1) = u(t)
u(t) = –1, 0, 1
[Figure: fragment of the transition graph on states (x, y), e.g. (0,0), (1,1), (2,0), (2,1), (3,0), (3,1), (4,0), (4,1), (5,1); edges are labelled with the input u.]
Verification, from the system model Sys AND the specification Spec, produces a sequence of stimuli (events), if any, leading Sys to violate Spec.
+ faster than testing (good to improve time-to-market)
+ gives full coverage (good to improve quality)
+ early error detection (decreases costs)
– is computationally VERY expensive (because of state explosion)
x(t + 1) = f(x(t), u(t))
x’ = f(x, u)
x’ = if (u = 0) then (x + 1)mod3 else (x – 1)mod3; x(0) = 0; u = 0, 1
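Transition Graph = Transition Relation
[Figure: transition graph on the states x = 0, 1, 2; edges labelled u = 0 go from x to (x + 1) mod 3, edges labelled u = 1 go from x to (x – 1) mod 3.]
u, x -> x’
0, 0 -> 1
0, 1 -> 2
0, 2 -> 0
1, 0 -> 2
1, 1 -> 0
1, 2 -> 1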
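The identification of the transition graph with a transition relation can be made concrete for the mod-3 example. The following is a small sketch in which the relation is stored as a set of triples (u, x, x'); the names `f`, `N` and `n` are illustrative.

```python
def f(x, u):
    """Next-state function: (x + 1) mod 3 if u = 0, else (x - 1) mod 3."""
    return (x + 1) % 3 if u == 0 else (x - 1) % 3


# Transition relation as an explicit set of triples (u, x, x').
N = {(u, x, f(x, u)) for u in (0, 1) for x in (0, 1, 2)}


def n(u, x, x_next):
    """Characteristic (Boolean) function of the transition relation."""
    return (u, x, x_next) in N


for u, x, x_next in sorted(N):
    print(u, x, "->", x_next)
```

Each edge of the graph corresponds to exactly one triple for which `n` returns True; a symbolic model checker stores this Boolean function rather than the edge list.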
We show with a small running example (mutex) a typical “verification via model checking” layman session.
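[Figure: mutex model. Process S1 has states n1, t1, c1; process S2 has states n2, t2, c2; T takes values 1, 2. Transition guards shown in the figure: S1=n1 & S2=t2; S2 = n2; S1 = n1; S1=t1 & T=2; S2=t2 & T=1; S2=n2 & S1=t1.]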
n1, n2, 1
t1, n2, 1
c1, n2, 1
n1, t2, 1
t1, t2, 1
c1, t2, 1
n1, c2, 1
t1, c2, 1
c1, c2, 1
n1, n2, 2
t1, n2, 2
c1, n2, 2
n1, t2, 2
t1, t2, 2
c1, t2, 2
n1, c2, 2
t1, c2, 2
c1, c2, 2
Mutual exclusion: AG (S1 != c1 | S2 != c2) … true
Negation of mutual exclusion: EF (S1 = c1 & S2 = c2) … false
No starvation S1: AG (S1 = t1 -> AF (S1 = c1)) … true
No starvation S2: AG (S2 = t2 -> AF (S2 = c2)) … true
State (t1, n2, *) reachable: AG (S1 != t1 | S2 != n2) … false
[Figure: mutex model checked in this run.]
Mutual exclusion: AG (S1 != c1 | S2 != c2) …
Negation of mutual exclusion: EF (S1 = c1 & S2 = c2) …
No starvation S1: AG (S1 = t1 -> AF (S1 = c1)) …
No starvation S2: AG (S2 = t2 -> AF (S2 = c2)) …
-- AG (S1 != c1 | S2 != c2) is false
-- as demonstrated by the following execution sequence
state 1.1: S1 = c1, S2 = c2, turn = 2
-- EF (S1 = c1 & S2 = c2) is false
-- AG (S1 = t1 -> AF S1 = c1) is true
-- AG (S2 = t2 -> AF S2 = c2) is true
resources used:
user time: 0.03 s,
system time: 0.04 s
BDD nodes allocated: 730
Bytes allocated: 1245184
BDD nodes representing transition relation: 31 + 6
[Figure: mutex model checked in this run.]
Mutual exclusion: AG (S1 != c1 | S2 != c2) …
Negation of mutual exclusion: EF (S1 = c1 & S2 = c2) …
No starvation S1: AG (S1 = t1 -> AF (S1 = c1)) …
No starvation S2: AG (S2 = t2 -> AF (S2 = c2)) …
-- specification AG (S1 != c1 | S2 != c2) is true
-- specification EF (S1 = c1 & S2 = c2) is false
-- specification AG (S1 = t1 -> AF S1 = c1) is true
-- specification AG (S2 = t2 -> AF S2 = c2) is true
resources used:
user time: 0.02 s,
system time: 0.04 s
BDD nodes allocated: 635
Bytes allocated: 1245184
BDD nodes representing transition relation: 31 + 6
[Figure: mutex model checked in this run.]
Mutual exclusion: AG (S1 != c1 | S2 != c2) …
Negation of mutual exclusion: EF (S1 = c1 & S2 = c2) …
No starvation S1: AG (S1 = t1 -> AF (S1 = c1)) …
No starvation S2: AG (S2 = t2 -> AF (S2 = c2)) …
[Figure: mutex model checked in this run.]
-- AG (S1 != c1 | S2 != c2) is true
-- AG (S1 = t1 -> AF S1 = c1) is false
-- as demonstrated by the following execution sequence
state 2.1: S1 = c1, S2 = n2, turn = 2
state 2.2: S1 = n1, S2 = t2
-- loop starts here --
state 2.3: S1 = t1, S2 = c2
state 2.4:
-- AG (S2 = t2 -> AF S2 = c2) is false
-- as demonstrated by the following execution sequence
state 3.1: S1 = c1, S2 = n2, turn = 2
-- loop starts here --
state 3.2: S2 = t2
state 3.3:
resources used:
user time: 0.03 s,
system time: 0.04 s
BDD nodes allocated: 799
Bytes allocated: 1245184
BDD nodes representing transition relation: 34 + 6
[Figure: mutex model checked in this run, with fairness constraints.]
FAIRNESS !(S1 = n1)
FAIRNESS !(S1 = t1)
FAIRNESS !(S1 = c1)
FAIRNESS !(S2 = n2)
FAIRNESS !(S2 = t2)
FAIRNESS !(S2 = c2)
SPEC AG((S1 != c1) | (S2 != c2))
SPEC EF((S1 = c1) & (S2 = c2))
SPEC AG((S1 = t1) -> AF (S1 = c1))
SPEC AG((S2 = t2) -> AF (S2 = c2))
[Figure: mutex model checked in this run, with fairness constraints.]
-- AG (state1 != c1 | state2 != c2) is true
-- EF (state1 = c1 & state2 = c2) is false
-- AG (state1 = t1 -> AF state1 = c1) is true
-- AG (state2 = t2 -> AF state2 = c2) is true
resources used:
user time: 0.03 s,
system time: 0.04 s
BDD nodes allocated: 615
Bytes allocated: 1245184
BDD nodes representing transition relation: 34 + 6
Spec: e.g. CTL, CTL*, LTL, …
Sys: VHDL, Verilog, C, C++, Java, MATLAB, Simulink, …
Sys can be described by boolean functions:
initial states: I(x) = 1 iff x is an initial state of Sys;
transition relation: N(x, x’) = 1 iff there exists a transition from x to x’.
From Spec we can define a function F from (I, N) to boolean values {0, 1} s.t. F(I, N) is identically 1 iff Spec is satisfied.
Check if it holds that F(I, N) is identically 1.
Representation of Sys (i.e. (I, N)) may be too big (easily gigabytes … of RAM).
Note: the transition graph of Sys can easily have more than 10^20 nodes (state explosion).
Even if (I, N) is not too big we may run out of memory when checking if F(I, N) is identically 1.
The above obstructions cannot be eliminated. However there are algorithms that can actually mitigate them. Such algorithms are effective in many practical cases (… although they are exponential with probability 1).
For safety properties (i.e. no bad state is reachable) the model checking problem becomes the reachability problem on the transition graph of the system to be analyzed.
Given a Finite State System S = (S, I, Next), where:
S : Finite set of states;
I : set of initial states;
Next : function mapping a state to the set of its successors;
Visit all states that S can reach from I … in order to check if there is a bad reachable state (i.e. a state that violates our specs).
Explicit
Set Reach of visited states stored in a Hash Table.
The explicit approach typically works well for protocols, hybrid systems and software-like systems (i.e. asynchronous systems).
E.g.: SPIN (Bell Labs), Murphi (Stanford), COSPAN (Bell Labs)
Symbolic
Set Reach of visited states represented with its characteristic function f. That is f(s) = if (s is in Reach) then 1 else 0.
States are bit vectors, thus f is a Boolean function. Ordered Binary Decision Diagrams (OBDDs) are used to efficiently represent and manipulate f.
The symbolic approach typically works well for hardware-like systems (i.e. synchronous systems).
E.g.: SMV (CMU), VIS (CU + Berkeley), CUDD (CU), FORTE (INTEL), SLAM (Microsoft), RuleBase (IBM).
With such functions we can define a State Space Exploration function.
E.g. we can use a BFS (Breadth-First Search) or a DFS (Depth-First Search).
[Figure: explicit BFS on the System Transition Graph, showing the visited states, the visited states still to be expanded (Queue Q), the Hash Table T, and a state s with successors s1, s2, s3.]
1. Get a new state s to expand from the queue Q.
2. Check the invariants for s.
3. If s1 (s2, s3) is not already in T, insert s1 (s2, s3) in T and Q.
Hash_Table T;   /* visited states */
Queue Q;        /* visited states still to be expanded */

bfs()
{
  for each start state s
    { insert(T, s); enqueue(Q, s); }
  while (Q is not empty)
    { s = dequeue(Q);
      check invariants for s;
      for all s' in Next(s)      /* successors of state s */
        if (s' is not in T)      /* fresh state */
          { insert(T, s'); enqueue(Q, s'); }
    }
}
x(t + 1) = if x(t) <= 3 then x(t) + u(t) else x(t) – u(t), u(t) = 1, 2. x(0) = 0
[Figure: BFS visit of the transition graph of this system on the states x = 0, …, 5; edges are labelled with the input u.]
States may take hundreds of bytes. To save on RAM we can store in T just state signatures h(s). Usually a state signature takes 5 bytes or so.
It can be proved that the omission probability is very low.
[Figure: Hash Compaction maps a long state bit vector (e.g. 0110001110010101…) to a much shorter signature (e.g. 001010101001000010101000).]
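The idea of storing signatures instead of full states can be sketched as follows. This is an illustration under stated assumptions: the hash function (BLAKE2b from Python's standard library) and the class name `CompactedTable` are choices made here, not those of Murphi or of any other tool mentioned in the slides.

```python
import hashlib

SIGNATURE_BYTES = 5   # the text above mentions "5 bytes or so"


def signature(state: bytes) -> bytes:
    """Compress a (possibly large) state into a short signature h(s)."""
    return hashlib.blake2b(state, digest_size=SIGNATURE_BYTES).digest()


class CompactedTable:
    """Visited-state table that stores only signatures, not full states."""

    def __init__(self):
        self._seen = set()

    def insert(self, state: bytes) -> bool:
        """Record the state; return True if its signature was not seen before."""
        sig = signature(state)
        fresh = sig not in self._seen
        self._seen.add(sig)
        return fresh


table = CompactedTable()
big_state = b"\x01" * 300            # a state that takes hundreds of bytes
print(table.insert(big_state))       # first visit
print(table.insert(big_state))       # already visited
```

Two different states with the same signature would be treated as one state, so a reachable state may be omitted from the visit; this is the omission probability mentioned above.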
To save even more RAM we can forget some of the state signatures in hash table T.
Experimental results show that we can forget about 50% of the states in T and still get termination.
This works because protocol transitions are local.
[Figure: on a collision in Hash Table T, the previously stored state is overwritten, i.e. forgotten.]
Danger: we may revisit the same state forever and ever: no termination!!
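A transition from s to s' is a k-transition iff level(s') – level(s) = k.
A transition from s to s' is k-local iff |level(s') – level(s)| <= k.
[Figure: a transition graph with its states arranged by level (0 1 2 3 4); edges are annotated with their level difference.]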
Our experimental results show that:
For all protocol-like systems, for most states, most transitions (typically more than 75%) are 1-local.
Let d(s, k) be the fraction of transitions from state s that are k-transitions.
Thus d(s, k) is the probability of getting a k-transition when picking at random a transition from state s.
Consider the experiment of selecting at random a state s and then returning d(s, k). In this way we get a random variable that we denote with d(k).
The expected value of d(k) is the average value of d(s, k) on all reachable states.
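The quantities d(s, k) and the expected value of d(k) can be computed on a toy example. The sketch below assumes that level(s) is the BFS distance of s from the initial state and reuses the small system x(t + 1) = if x(t) <= 3 then x(t) + u(t) else x(t) – u(t); it says nothing about the protocols measured in the experiments. The helper names are illustrative.

```python
from collections import deque

INPUTS = (1, 2)


def step(x, u):
    """Toy system reused from the earlier slides."""
    return x + u if x <= 3 else x - u


def bfs_levels(x0=0):
    """Map each reachable state to its BFS level (assumed meaning of level)."""
    level = {x0: 0}
    queue = deque([x0])
    while queue:
        x = queue.popleft()
        for u in INPUTS:
            nxt = step(x, u)
            if nxt not in level:
                level[nxt] = level[x] + 1
                queue.append(nxt)
    return level


def d(level, s, k):
    """Fraction of the transitions from s that are k-transitions."""
    diffs = [level[step(s, u)] - level[s] for u in INPUTS]
    return sum(1 for diff in diffs if diff == k) / len(diffs)


def expected_d(level, k):
    """Average of d(s, k) over all reachable states."""
    return sum(d(level, s, k) for s in level) / len(level)


levels = bfs_levels()
for k in (-1, 0, 1):
    print(k, expected_d(levels, k))
```

On this toy graph every transition changes the level by at most one, so all transitions are 1-local.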
Set of initial states represented with a boolean function I s.t.:
I(x) = 1 iff x is an initial state.
Transition graph represented with its transition relation, i.e. a boolean function N s.t.:
N(x, x’) = 1 iff there is a transition from x to x’.
Reachable states: least solution to the following (functional) fixpoint equation (unknown: R):
R(x) = I(x) ∨ ∃y [R(y) ∧ N(y, x)]
Problem: how do we solve the equation
R(x) = I(x) ∨ ∃y [R(y) ∧ N(y, x)] ?
Answer (classical):
R^(0)(x) = 0
R^(k+1)(x) = I(x) ∨ ∃y [R^(k)(y) ∧ N(y, x)], k = 0, 1, 2, …
Stop when R^(k+1) = R^(k). This eventually happens …
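The classical iteration can be sketched with plain Python sets standing in for the Boolean functions: a set plays the role of a characteristic function, union plays the role of "or", and the image computation plays the role of the existential quantification over y. A real symbolic model checker would use OBDDs instead; the function name `reachable` and the example graph are illustrative.

```python
def reachable(initial, transitions):
    """Least fixpoint of R = I union Image(R), computed by iteration.

    initial:     set of initial states (the function I)
    transitions: set of pairs (y, x), one per transition from y to x (N)
    """
    r = set()                                    # R^(0) is the empty set
    while True:
        image = {x for (y, x) in transitions if y in r}
        r_next = set(initial) | image            # R^(k+1)
        if r_next == r:                          # fixpoint reached
            return r
        r = r_next


# Small hypothetical graph: state 3 and state 4 are not reachable from 0.
N = {(0, 1), (1, 2), (2, 0), (3, 4)}
print(reachable({0}, N))
```

The loop stops because each iterate contains the previous one and the state space is finite.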
Obstructions:
Efficient manipulation of boolean functions
Efficient check of functional equality
[Figure: OBDD examples over the variables x1, x2.]
OBDDs represent f(x1, … xn) in a canonical way once an ordering on the variables x1, … xn is given.
Equality test O(1).
If_then_else(F, G, H) computable in O(max(F, G, H)).
OBDDs often compact on boolean functions occurring in practice (isotropic boolean functions).
The OBDD representation of a randomly selected n-ary boolean function has size exponential in n with probability 1.
To reduce complexity let us just check if a state reachable in k steps is an error state.
[Figure: a path x0, x1, x2, …, xk with I(x0), R(x0, x1), R(x1, x2), …, R(xk–1, xk) and ERROR(xk).]
FAILk = ∃ x0 x1 … xk [I(x0) ∧ R(x0, x1) ∧ R(x1, x2) ∧ … ∧ R(xk–1, xk) ∧ ERROR(xk)]
This is SATisfiability, i.e. given a boolean expression F(z) find an assignment z* to z s.t. F(z*) = 1.
In our case F(x0, x1, …, xk) = [I(x0) ∧ R(x0, x1) ∧ R(x1, x2) ∧ … ∧ R(xk–1, xk) ∧ ERROR(xk)].
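The bounded check can be illustrated by brute force: enumerate every candidate path x0, …, xk and test the conjunction of the initial condition, the k transition constraints and the error condition. This enumeration is only a stand-in for a SAT solver, and the finite range `STATES` is an assumption made for the sketch; the system is the small example x(t + 1) = if x(t) <= 3 then x(t) + u(t) else x(t) – u(t) with u(t) = 1, 2.

```python
from itertools import product

STATES = range(8)   # assumed finite encoding of the state variable


def initial(x):
    """I(x): x is the initial state."""
    return x == 0


def trans(x, y):
    """R(x, y): some input u in {1, 2} takes the system from x to y."""
    return any(y == (x + u if x <= 3 else x - u) for u in (1, 2))


def error(x):
    """ERROR(x): the state violates Spec x < 5."""
    return x >= 5


def fail_k(k):
    """Return a witness (x0, ..., xk) of the FAILk formula, or None."""
    for xs in product(STATES, repeat=k + 1):
        if (initial(xs[0])
                and all(trans(xs[i], xs[i + 1]) for i in range(k))
                and error(xs[k])):
            return xs
    return None


for k in range(4):
    print(k, fail_k(k))
```

No error state is reachable in fewer than three steps, and the first witness found for k = 3 is a path ending in x = 5.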
There are many cases in which digital devices (hardware and/or software) interact with the physical environment. In such cases the environment continuous dynamics must be taken into account.
Hybrid Systems are systems with discrete as well as continuous state variables. Typically requirements analysis for embedded software/hardware leads to study verification of hybrid systems.
We sketch how model checking can be used for hybrid systems analysis by showing its usage for automatic verification of the Turbogas Control System of a 2MW Cogenerative Power Plant (ICARO).
[Figure: control loop. Blocks: Controller (input: Settings) and Gas Turbine (Turbogas); signals: Fuel Valve Opening (FG102), Vrot, Texh, Pel, Pmc. Disturbances: electric users, parameter variations, etc.]
Vrot: Turbine rotation speed
Texh: Exhaust smoke temperature
Pel: Generated electric power
Pmc: Compressor pressure
[Figure: controller structure. Labels: Vrot, N1Gov, MIN, Offset, Pel, PowLim, 12MW, ADJ, Limiter, ExTLim, Winner, Texh, Pmc; output: Valve FG102 Opening Command.]
[Figure: Power Limiter. Cell with i = “Power Limiter”, A = 3000 kW, B = 10 MW; labels: Pel, Pel Setpoint (+2MW), Winner; output: PowLim.]
[Figure: N1 Governor. Cell with i = “N1 Governor”, A = 0, B = 10MW; labels: Acceleration, Deceleration, 1/s, 105%, 6%, network, isle, Kdr, Pel, Vrot, Winner; output: N1 Governor.]
[Figure: Exhaust Temperature Limiter. Cell with i = “Exhaust Temperature Limiter”, A = 0, B = 10MW; labels: Texh, Pmc, Offset, Winner; output: Exhaust Temperature Limiter.]
[Figure: Cell. Labels: A, B, Kp, Ki, 1/s, SAT, 10MW, >0?, AND, Winner name, Winner != i?, Reset at u + 4kW; output: Output.]
u = min(output N1Gov, output PowLim, output ExTLim)
[Figure: Gas Turbine. Labels: FG102, Vrot, Texh, Pel. Disturbances: electric users, parameter variations, etc.]
This is very similar to simulation code, only more abstract because of model checking limitations (state explosion).
Results on an Intel Pentium 4, 2 GHz Linux PC with 512 MB RAM.
Murphi options: b, c, cache, m350
[Plots: electric user demand (kW) and rotation speed (percentage of max = 22500 rpm) over time; 10 ms time step (100 Hz sampling frequency). Allowed range for rotation speed: 40-120.]
Sometimes we can associate a probability with each transition. In such cases reachability analysis becomes the task of computing the stationary distribution of a Markov Chain. This can be done using a Probabilistic Model Checker (state space too big for matrices).
[Figure: a Markov chain on the states 0, 1, 2; its edges carry the transition probabilities 0.4, 0.3, 0.7, 0.2, 0.8, 0.6.]
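For a chain this small the stationary distribution can be approximated by repeatedly multiplying a distribution by the transition matrix. The matrix below is hypothetical: it uses the six probabilities that appear on the slide, but which edge carries which probability is an assumption made here. A probabilistic model checker is needed precisely when the state space is too large for such explicit matrices.

```python
# Hypothetical edge layout (assumption): row i holds the probabilities of
# moving from state i to states 0, 1, 2.
P = [
    [0.4, 0.6, 0.0],
    [0.3, 0.0, 0.7],
    [0.0, 0.2, 0.8],
]


def stationary(P, tol=1e-12, max_iter=10_000):
    """Approximate pi with pi = pi * P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n                       # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


pi = stationary(P)
print([round(p, 4) for p in pi])
```

With this particular layout the chain spends most of its time in state 2; a different assignment of the probabilities to the edges would give a different distribution.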
Let u(t) be the user demand at time t. We can define the (stochastic) dynamics of the user demand as follows:
u(t + 1) = min(u(t) + a, M) with probability p(u(t), 1)
u(t + 1) = u(t) with probability p(u(t), 0)
u(t + 1) = max(u(t) – a, 0) with probability p(u(t), –1)
Where:
M = max user demand (MAX_U),
a = speed of variation of user demand (MAX_D_U)
p(v, 1) = 0.4 + b*(v – M)*|v – M| / M^2
p(v, 0) = 0.2
p(v, –1) = 0.4 + b*(M – v)*|M – v| / M^2
–0.4 <= b <= 0.4
The further u(t) from u0 (nominal user demand) the higher u(t) probability to return towards u0. That is to decrease when u(t) > u0, to increase when u(t) < u0.
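A random walk following this kind of dynamics can be sketched as below. The code rests on an assumed reading of the formulas above: p(v, 1) = 0.4 + b*(v – M)*|v – M|/M^2 is the probability of increasing, p(v, 0) = 0.2 of staying, and p(v, –1) = 0.4 + b*(M – v)*|M – v|/M^2 of decreasing, with –0.4 <= b <= 0.4. The numeric values used for M, a and b are purely illustrative.

```python
import random


def p(v, i, M, b):
    """Assumed transition probabilities p(v, i) for i in {1, 0, -1}."""
    if i == 0:
        return 0.2
    delta = (v - M) if i == 1 else (M - v)
    return 0.4 + b * delta * abs(delta) / M**2


def next_demand(u, a, M, b, rng):
    """Draw u(t + 1) from u(t): increase, stay, or decrease by a."""
    r = rng.random()
    p_up = p(u, 1, M, b)
    if r < p_up:
        return min(u + a, M)
    if r < p_up + p(u, 0, M, b):
        return u
    return max(u - a, 0)


def simulate_demand(u0, steps, a, M, b, seed=0):
    """Return a sampled trajectory u(0), ..., u(steps)."""
    rng = random.Random(seed)
    us = [u0]
    for _ in range(steps):
        us.append(next_demand(us[-1], a, M, b, rng))
    return us


# Illustrative values only: M = 2000 (kW), a = 50, b = 0.4.
trajectory = simulate_demand(u0=1000, steps=100, a=50, M=2000, b=0.4)
print(trajectory[:10])
```

Under this reading the three probabilities always sum to 1, because the two b-dependent terms cancel each other.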
Notwithstanding state explosion, Automatic Verification (reachability analysis) is a very useful tool for the design and analysis of complex systems such as digital hardware, software and hybrid systems.
Automatic Verification allows us to:
Decrease the probability of leaving undetected bugs in our design, thus increasing design quality.
Speed up the testing/simulation process, thus decreasing costs and time-to-market.
Detect errors early, thus decreasing design costs.
Support exploration of more complex, hopefully more efficient, solutions by supporting their debugging.
Future work: fight state explosion to make possible verification of larger systems.
Directions: automatic model abstraction, better algorithms and data structures for SAT, statistical properties of transition graphs, …
Extend usage: … to anyone needing reachability analysis of their own system.