The Benefit of Concurrent Model Checking
BVSRC: Berkeley Verification and Synthesis Research Center
Baruch Sterin, A. Mishchenko, N. Een, Robert Brayton
BVSRC, UC Berkeley
Thanks to: NSF, SRC, NSA, and Industrial Sponsors: IBM, Intel, Synopsys, Mentor, Magma, Altera, Atrenta, Microsemi, Jasper, Oasys, Real Intent, Tabula, Verific
Overview
• Overview
• Model checking engines
• Example
• Non-concurrent
• Hybrid approach
• Concurrent verify and refine
• Flow
• Example
• Why more powerful
• Questions and objections addressed
• Future work
Concurrent Model Checking
Overview:
• Employ multiple MC engines using hybrid concurrency on a multi-core server
• Benefits
  • Faster
    • almost linear speedup
    • plus does not waste time making a wrong decision
  • More powerful
    • can solve harder problems
• Makes sequential approach obsolete
  • No reason not to use concurrency
    • even for 1 core
    • simpler
• Concurrency controlled by Python front end
Model Checking Engines
1. Random simulation
2. Semi-formal simulation
3. Bounded model checking (BMC) [15]
4. BDD-based reachability [7][25]
5. Property directed reachability (PDR) [4]
6. Interpolation [14]
7. Synthesis:
  • rewriting [10]
  • retiming [13]
  • sequential signal correspondence [26]
    • with constraint extraction
  • phase abstraction [27]
  • temporal decomposition [23]
8. Abstraction: [8]
  • counterexample-based (CB) [19]
  • proof-based (PB) [20][21]
9. Speculation [2][3]
• Verification engines
  • 1-3 incomplete
  • 4-6 complete
• Transformation engines
  • 7 equivalence preserving
  • 8-9 abstracting
Example of non-concurrent MC
Read_file test_lru_consist_miss_slbc.sixth_sense_style_1sif_prop2_fixed2
PIs = 532, POs = 1, FF = 2389, ANDs = 12049
prove
quick_verify (try many engines to see if one can prove)
Simplifying
Number of constraints = 3
Forward retiming, quick_simp, scorr_constr, trm: PIs = 532, POs = 1, FF = 2342, ANDs = 11054
Simplify: PIs = 532, POs = 1, FF = 2335, ANDs = 10607
Phase abstraction: PIs = 283, POs = 2, FF = 1460, ANDs = 8911
quick_verify (try many engines to see if one can prove)
Abstracting
Initial abstraction: PIs = 1624, POs = 2, FF = 119, ANDs = 1716, max depth = 39
Testing with BMC
bmc3 -C 100000 -T 50 -F 78: No CEX found in 51 frames
Latches reduced from 1460 to 119
Simplify: PIs = 1624, POs = 2, FF = 119, ANDs = 1687, max depth = 51
Trimming: PIs = 158, POs = 2, FF = 119, ANDs = 734, max depth = 51
Simplify: PIs = 158, POs = 2, FF = 119, ANDs = 731, max depth = 51
quick_verify (try many engines to see if one can prove)
Speculating
Initial speculation: PIs = 158, POs = 26, FF = 119, ANDs = 578, max depth = 51
Fast interpolation: reduced POs to 24
Testing with BMC
bmc3 -C 150000 -T 75: No CEX found in 1999 frames
PIs = 158, POs = 24, FF = 119, ANDs = 578, max depth = 1999
Simplify: PIs = 158, POs = 24, FF = 119, ANDs = 535, max depth = 1999
Trimming: PIs = 86, POs = 24, FF = 119, ANDs = 513, max depth = 1999
Verifying (try many engines to see if one can prove)
Running reach -v -B 1000000 -F 10000 -T 75: BDD reachability aborted
RUNNING interpolation with 20000 conflicts, 50 sec, max 100 frames: 'UNSAT'
Elapsed time: 457.87 seconds, total: 458.52 seconds
Same example with concurrent MC, without PDR
test_lru_consist_miss_slbc.sixth_sense_style_1sif_prop2_fixed2.aig
PIs=532,POs=1,FF=2389,ANDs=12049
***Executing super_prove
['INTRP', 'BMC', 'pre_simp']
For_Retime: PIs=532,POs=1,FF=2365,ANDs=11064
Number of constraints = 2, frames = 1
PIs=529,POs=1,FF=2342,ANDs=10611
Simplify: PIs=529,POs=1,FF=2265,ANDs=10068
***Trying temporal decomposition - for max 15.0 sec.
No reduction
***Trying phase abstraction - Max phase = 2
[1, 2]
Reparam: PIs 1056 => 264
Simplify with 2 phases: PIs=264,POs=2,FF=1462,ANDs=8319
Method pre_simp ended first in 89 sec.
PIs=264,POs=2,FF=1462,ANDs=8319
***Running abstract
['INTRP', 'BMC3', 'initial_abstract']
Method initial_abstract ended first in 106 sec.
Initial abstraction: PIs=1621,POs=2,FF=105,ANDs=1427,max depth=42
***Iterating abstraction refinement
PIs=1621,POs=2,FF=105,ANDs=1427,max depth=42
Latches reduced from 1462 to 105
***Running pre_simp
Reparam: PIs 330 => 328
PIs=328,POs=2,FF=105,ANDs=1184,max depth=42
Min_Retime: PIs=328,POs=2,FF=98,ANDs=1164,max depth=42
Reparam: PIs 328 => 299
Simplify: PIs=299,POs=2,FF=98,ANDs=1064,max depth=42
Reparam: PIs 299 => 266
Trying temporal decomposition - for max 15.0 sec.
No reduction
Reparam: PIs 266 => 261
***Running speculate
['INTRP', 'BMC3', 'initial_speculate']
Method initial_speculate ended first in 38 sec.
Initial speculation: PIs=261,POs=38,FF=96,ANDs=833,max depth=42
***Iterating speculation refinement
BMC3: -- cex in 0.17 sec. at depth 22 => PIs=261,POs=37,FF=96,ANDs=830,max depth=42
INTRP: UNSAT in 1.4 sec.
Total clock time taken by super_prove = 366.549089 sec.
Same example with concurrent MC, but with PDR
• test_lru_consist_miss_slbc.sixth_sense_style_1sif_prop2_fixed2
• PIs=532,POs=1,FF=2389,ANDs=12049
• ***Executing super_prove
• ['PDR', 'INTRP', 'BMC', 'PDRm', 'pre_simp']
• PIs=532,POs=1,FF=2389,ANDs=12049
• For_Retime: PIs=532,POs=1,FF=2365,ANDs=11064
• Number of constraints = 2, frames = 1
• Reparam: PIs 532 => 529
• PIs=529,POs=1,FF=2342,ANDs=10611
• Simplify: PIs=529,POs=1,FF=2265,ANDs=10068
• PDRm proved UNSAT in 42 sec.
• Total clock time taken by super_prove = 42.384159 sec.
Hybrid Approach
c_verify: SIM || PDR || PDRm || BMC || BMCm || INTRP || REACHx || REACHm
  • outcomes: SAT, UNSAT, TIMEOUT
c_refine: SIM || PDR || PDRm || BMC || BMCm || INTRP || REACHx || REACHm
  • outcomes: CEX (then refine), SAT, UNSAT, TIMEOUT
REACH and REACHm optional depending on size (#PIs, #FFs)
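The c_verify race can be sketched in Python, the language of the front end. This is a minimal sketch under stated assumptions, not the actual ABC code: `make_engine` and the stub engines it builds are hypothetical stand-ins that sleep and then report a fixed verdict, and threads are used only to keep the example self-contained. A real front end would presumably launch the engines as separate processes and kill the losers.

```python
import concurrent.futures as cf
import time


def make_engine(name, delay, verdict):
    """Hypothetical stand-in for an MC engine: sleeps, then reports a fixed verdict."""
    def run():
        time.sleep(delay)
        return name, verdict
    return run


def c_verify(engines, timeout):
    """Race all engines; return (engine, verdict) for the first SAT/UNSAT answer.

    Returns (None, "TIMEOUT") if no engine gives a definitive answer in time.
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(engines))
    futures = [pool.submit(engine) for engine in engines]
    try:
        for future in cf.as_completed(futures, timeout=timeout):
            name, verdict = future.result()
            if verdict in ("SAT", "UNSAT"):
                return name, verdict
    except cf.TimeoutError:
        pass  # nobody finished in time
    finally:
        # Threads cannot be killed; a process-based version would terminate losers here.
        pool.shutdown(wait=False)
    return None, "TIMEOUT"


if __name__ == "__main__":
    portfolio = [
        make_engine("BMC", 0.30, "TIMEOUT"),
        make_engine("PDRm", 0.05, "UNSAT"),
        make_engine("INTRP", 0.20, "UNSAT"),
    ]
    print(c_verify(portfolio, timeout=2.0))
```

The design point is that no engine has to be chosen up front: whichever one answers first decides the result, and the time spent by the others is simply discarded.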
c_prove
• c_verify1 || simplify
• c_verify2 || c_abstract
• c_verify3 || c_speculate
• ||k (c_prove output_k)
Concurrent Prover Flow - hybrid c_prove
[Flowchart: Start, then c_verify || pre_simp, then c_verify || initial_abstract followed by c_refine, then c_verify || initial_speculate followed by c_refine, then || (c_prove output_k), then End with a definitive answer. Edge labels: SAT, UNSAT, undecided, CEX, pause, kill, backup.]
|| means runs concurrently
Multiple output variation on c_refine
If there are more than X outputs
• group outputs and use poor man's concurrency (PMC)
  • repeatedly take a group of X outputs at a time
  • start with time-out of 2 sec.
  • after all output groups done, double time-out and repeat
• if cex found
  • refine and start at last time-out value and
  • last group of X where cex was found.
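The PMC schedule described above can be sketched as follows. This is an illustrative sketch, not the ABC implementation: `pmc_sweep` and the `check(output, timeout)` callback are hypothetical names, the callback is assumed to return 'UNSAT', 'CEX', or 'UNDECIDED', and the checks inside a group run one after another here, whereas the real flow would run them concurrently.

```python
def pmc_sweep(outputs, group_size, check, t_start=2.0, t_max=64.0):
    """Sweep groups of outputs with doubling time-outs (poor man's concurrency).

    Returns (status, payload, timeout):
      ("CEX", failing_outputs, timeout)  as soon as some group yields a cex,
      ("UNSAT", [], timeout)             when every output has been proved,
      ("UNDECIDED", pending, timeout)    when t_max is reached with outputs left.
    The returned timeout is the last one used, so a caller can refine and
    restart from that value instead of from t_start.
    """
    pending = list(outputs)
    timeout = t_start
    while pending:
        undecided = []
        for i in range(0, len(pending), group_size):
            group = pending[i:i + group_size]
            # Assumption: a real flow runs these checks concurrently.
            verdicts = {out: check(out, timeout) for out in group}
            failing = [out for out in group if verdicts[out] == "CEX"]
            if failing:
                return "CEX", failing, timeout
            undecided.extend(out for out in group if verdicts[out] == "UNDECIDED")
        pending = undecided
        if not pending or timeout * 2 > t_max:
            break
        timeout *= 2  # all groups done: double the time-out and repeat
    return ("UNSAT" if not pending else "UNDECIDED"), pending, timeout


if __name__ == "__main__":
    # Toy model: each output needs a fixed number of seconds to be proved.
    seconds_needed = {"po0": 1, "po1": 3, "po2": 5}

    def toy_check(out, timeout):
        return "UNSAT" if seconds_needed[out] <= timeout else "UNDECIDED"

    print(pmc_sweep(["po0", "po1", "po2"], group_size=2, check=toy_check))
```

On a 'CEX' result the caller would refine, then call the sweep again with `t_start` set to the returned time-out and the failing group placed first in `outputs`, which matches the restart rule in the bullets.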
Example of Concurrent Flow
l2snfsm_prop11_fixed2
PIs=38,POs=1,FF=372,ANDs=2150
Executing super_prove
Initial: PIs=38,POs=1,FF=372,ANDs=2150
Running Simplification
['PDR', 'INTRP', 'BMC', 'PDRm', 'pre_simp'] (these run in parallel)
PIs=38,POs=1,FF=371,ANDs=2150
Fwd_Retime: PIs=38,POs=1,FF=349,ANDs=2056
No constraints found
Simplify: PIs=38,POs=1,FF=336,ANDs=1951
Trying temporal decomposition - for max 15.0 sec. No reduction
Method pre_simp ended first in 9 sec.
PIs=38,POs=1,FF=336,ANDs=1951
***Running abstract
• Start: PIs=38,POs=1,FF=336,ANDs=1951
• ['PDR', 'INTRP', 'BMC3', 'PDRm', 'initial_abstract']
• Running initial_abstract with bob=10,stable=6,time=100,depth=20
• Method initial_abstract ended first in 103 sec.
• PIs=38,POs=1,FF=336,ANDs=1951,max depth=11
• Initial abstraction: PIs=116,POs=1,FF=258,ANDs=1576,max depth=11
• Iterating abstraction refinement
• Verify time set to 125
• PIs=116,POs=1,FF=258,ANDs=1576,max depth=11
• Reparam: PIs 116 => 59 (reparameterization reduces the number of inputs)
• … many iterations here
• SIM: -- cex in 41.48 sec. at depth 104 => cex_po = 0
• PIs=45,POs=1,FF=329,ANDs=1925,max depth=11
• Reparam: PIs 45 => 39
• Latches reduced from 336 to 329
• simplify
• PIs=39,POs=1,FF=329,ANDs=1924,max depth=11
• Min_Retime: PIs=39,POs=1,FF=329,ANDs=1914,max depth=11
• No constraints found
• Simplify: PIs=39,POs=1,FF=328,ANDs=1900,max depth=11
• Trying temporal decomposition - for max 15.0 sec. No reduction
***Running speculate
['PDR', 'INTRP', 'BMC3', 'PDRm', 'initial_speculate']
Method initial_speculate ended first in 39 sec.
Initial speculation: PIs=39,POs=241,FF=178,ANDs=1335,max depth=11
Iterating speculation refinement
PDRM: -- cex in 5.64 sec. at depth 40 => PIs=39,POs=239,FF=178,ANDs=1332,max depth=11
BMC3: -- cex in 1.84 sec. at depth 22 => PIs=39,POs=235,FF=178,ANDs=1326,max depth=22
… many iterations here
BMC3: -- cex in 11.91 sec. at depth 25 => PIs=39,POs=204,FF=191,ANDs=1350,max depth=25
BMC3: -- cex in 17.77 sec. at depth 25 => PIs=39,POs=203,FF=195,ANDs=1381,max depth=25
BMC: -- cex in 29.44 sec. at depth 25 => PIs=39,POs=204,FF=195,ANDs=1390,max depth=25
BMC: -- cex in 37.03 sec. at depth 26 => PIs=39,POs=203,FF=195,ANDs=1389,max depth=25
Find_cex_par turned on (poor man's concurrency turned on here)
Verify time set to 148
Number of POs: 203 => 69
t_poor = 2 ***
PDRM: UNSAT in 0.08 sec.
PDRM: UNSAT in 0.07 sec.
… many iterations here
PDR: UNSAT in 0.25 sec.
PDRM: UNSAT in 0.02 sec.
all outputs processed => 69 outputs proved
Number of POs reduced to 0
Total clock time taken by super_prove = 483.238051 sec.
Out[7]: 'UNSAT'
Why is concurrent more powerful?
Example of iterating speculation refinement
verify time set to 50
Initial size: PIs=171, POs=41, FF=255, ANDs=2275
SIMULATION: cex 4.268283 sec, frame 911
SIMULATION: cex 0.096659 sec, frame 17
BMC: cex 6.534474 sec, frame 17
SIMULATION: cex 0.726484 sec, frame 1363
SIMULATION: cex 5.740357 sec, frame 391
BMC: cex 9.506526 sec, frame 17
SIMULATION: cex 6.436064 sec, frame 984
SIMULATION: cex 1.212145 sec, frame 444
PDRM: cex 4.335237 sec, frame 18
BMC: cex 9.853237 sec, frame 17
SIMULATION: cex 6.335866 sec, frame 81
SIMULATION: cex 4.595637 sec, frame 22
SIMULATION: cex 4.594522 sec, frame 40
SIMULATION: cex 9.182059 sec, frame 58
PDRM: cex 5.637425 sec, frame 20
BMC: cex 9.861210 sec, frame 17
.... 33 interleavings of PDR, PDRM and BMC ....
PDR: cex 47.217215 sec, frame 29
PDR: cex 31.134045 sec, frame 76
BMC: cex 55.010524 sec, frame 23
PDRM: UNSAT in 66 sec.
Final size: PIs=171, POs=17, FF=260, ANDs=2346
Why is concurrent more powerful?
[Figure: a chain of steps leading from the initial abstraction/speculation to the final abstraction/speculation; each step is a cex followed by a refine.]
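The cex/refine chain can be sketched as a loop in which, at every step, the engine that finds a counterexample soonest drives the next refinement. This is a toy model under loud assumptions: `refine_until_stuck`, the engine callables, and the integer "candidates" are all hypothetical, each engine reports a made-up time in seconds, and the absence of a cex here only means that no engine found one, whereas the real flow still needs an UNSAT proof at the end.

```python
def refine_until_stuck(model, engines, refine, max_steps=1000):
    """Repeatedly refine `model` with the quickest counterexample on offer.

    engines: dict mapping a name to a callable(model) -> (seconds, cex),
             where cex is None when that engine finds nothing.
    refine:  callable(model, cex) -> refined model.
    Returns (final_model, trace), trace being the engine that won each step.
    """
    trace = []
    for _ in range(max_steps):
        hits = []
        for name, engine in engines.items():
            seconds, cex = engine(model)
            if cex is not None:
                hits.append((seconds, name, cex))
        if not hits:
            return model, trace  # no engine has a cex left
        _, winner, cex = min(hits, key=lambda hit: hit[0])
        trace.append(winner)
        model = refine(model, cex)
    raise RuntimeError("refinement did not converge")


# Toy engines: the model is a set of speculated candidates (integers).
def sim(model):
    """Pretend simulation quickly refutes even candidates."""
    bad = sorted(c for c in model if c % 2 == 0)
    return 1.0, (bad[0] if bad else None)


def bmc(model):
    """Pretend BMC refutes multiples of three, but more slowly."""
    bad = sorted(c for c in model if c % 3 == 0)
    return 2.0, (bad[0] if bad else None)


def drop_candidate(model, cex):
    return model - {cex}


if __name__ == "__main__":
    start = frozenset({1, 2, 3, 4, 6, 9})
    print(refine_until_stuck(start, {"SIM": sim, "BMC": bmc}, drop_candidate))
    print(refine_until_stuck(start, {"SIM": sim}, drop_candidate))
```

In this toy run the single-engine loop stalls with refutable candidates still in the model, while the portfolio removes them all, which is one way to read the interleaving of SIMULATION, BMC, PDR, and PDRM counterexamples in the log.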
Hard examples - Industrial
A subset of the IBM benchmarks, not solved by SixthSense using its default Expert System flow in two hours.**
** At the time, the IBM SixthSense program did not have a PDR engine, so we eliminated those problems that were made easier because of PDR in our code.
Multiple output variation on c_refine
How long does it take?
• Let O = # POs, E = # MC engines used concurrently, C = # cores, T = final time-out, X = # outputs grouped together
• Final sweep (with no cex's and assuming no memory conflicts)
  • with full concurrency: time = T*(O*E)/C
  • with grouping and full concurrency: time = T*(O/X)*(X*E)/C = T*(O*E)/C
  • with grouping and PMC: time = T*2*(O/X)*(X*E)/C = 2*T*(O*E)/C
• Why not do full concurrency and no grouping?
  • Grouping is done to lessen memory conflicts.
    • at most X*E processes are concurrent on the server
    • choose X so that there is little memory conflict (why not choose X = C/E?)
  • PMC is done to find cex's early when doing grouping.
    • easy cex's across all outputs are found early
• When cex's are found (some heuristics)
  • refine and start PMC at the last time-out value (instead of 2 sec.)
    • heuristic that expects the next cex will take at least that time to find
  • first try the last set of X where a cex was found.
    • heuristic that expects that the last group where a cex was found is most likely to yield the next cex.
[Figure axis label: number of concurrent engines running per core]
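A small worked example makes the three estimates concrete. The function names and the sample numbers below are illustrative assumptions. The PMC estimate sums one sweep per time-out 2, 4, ..., T; since that geometric series adds up to just under 2*T, it accounts for the factor of 2 in the slide's formula.

```python
def full_concurrency_time(T, O, E, C):
    """Final sweep with every (output, engine) pair run at time-out T on C cores."""
    return T * (O * E) / C


def grouped_time(T, O, E, C, X):
    """Same sweep, but O/X groups of X outputs each; X cancels out."""
    return T * (O / X) * (X * E) / C


def pmc_time(T, O, E, C, X, t_start=2):
    """Grouping plus PMC: one sweep per time-out t_start, 2*t_start, ..., T."""
    total = 0.0
    t = t_start
    while t <= T:
        total += grouped_time(t, O, E, C, X)
        t *= 2
    return total


if __name__ == "__main__":
    # Assumed sample values: 64 outputs, 4 engines, 8 cores, groups of 2, T = 64 sec.
    T, O, E, C, X = 64, 64, 4, 8, 2
    print(full_concurrency_time(T, O, E, C))  # 2048.0
    print(grouped_time(T, O, E, C, X))        # 2048.0
    print(pmc_time(T, O, E, C, X))            # 4032.0
```

With these numbers X = C/E, so each group runs X*E = 8 processes, one per core, which is the choice the slide's parenthetical question points at.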
Questions addressed
• Memory use and conflicts?
  • experiments run on a server with 2 processors, 4 cores each, 24 Gb, 64K L1, 256K L2, 4 Mb
  • grouping designed to alleviate severe memory conflicts.
  • did not observe slowdown due to memory conflicts, but more experiments need to be done
• Run-time speedup?
  • linear up to # cores
  • concurrency avoids wasting time on wrong decisions
  • solves problems not solved by the sequential flow
• Wasting processor power: trying many things but throwing away all but one?
  • it is also wastage if some cores sit idle
  • the alternative is to run the wrong engine for a longer time
• Use SOTA algorithm?
  • too many MC algorithms
  • expert system proposed which learns which algorithms are best for a given design project (Z. Nevo - IBM)
Future Work
• More and better engines
  • Improved BDD reachability engine (we hope)
    • We have 4
    • We had a quite weak one (HWMCC'08) in '08
    • Now have two reasonably good ones.
    • May have a much better one in a few months.
  • Improved circuit-based SAT solver
    • Currently used in signal correspondence to simplify larger circuits
    • Faster but sometimes limited quality
    • Will be improved to see if it can compete with MiniSat 1.14c
  • New specialized techniques for SEC
• More use of concurrency
  • e.g. exchange information between engines.
  • will not work on parallelizing individual engines
To Learn More
Recent papers: http://www.eecs.berkeley.edu/~alanmi/publications
• IWLS
  • N. Een, A. Mishchenko, and R. Brayton, "Efficient implementation of property directed reachability", IWLS'11.
  • B. Sterin, N. Een, A. Mishchenko, and R. Brayton, "The Benefit of Concurrency in Model Checking", IWLS'11.
  • S. Ray and R. Brayton, "Proving Stabilization Using Liveness-to-Safety Conversion", IWLS'11.
• Other
  • R. Brayton and A. Mishchenko, "ABC: An academic industrial-strength verification tool", Proc. CAV'10, LNCS 6174, pp. 24-40.
  • N. Een, A. Mishchenko, and N. Amla, "A single-instance incremental SAT formulation of proof- and counterexample-based abstraction", Proc. FMCAD'10.
  • H. Savoj, D. Berthelot, A. Mishchenko, and R. Brayton, "Combinational techniques for sequential equivalence checking", Proc. FMCAD'10, pp. 158-162.
• Send email
  • alanmi@eecs.berkeley.edu
  • brayton@eecs.berkeley.edu
  • een@eecs.berkeley.edu
• Visit BVSRC webpage: www.bvsrc.org