
The Benefit of Concurrent Model Checking

Baruch Sterin, A. Mishchenko, N. Een, Robert Brayton. Berkeley Verification and Synthesis Research Center (BVSRC), UC Berkeley. Thanks to: NSF, SRC, NSA, and industrial sponsors.


Presentation Transcript


  1. The Benefit of Concurrent Model Checking
     Baruch Sterin, A. Mishchenko, N. Een, Robert Brayton
     Berkeley Verification and Synthesis Research Center (BVSRC), UC Berkeley
     Thanks to: NSF, SRC, NSA, and industrial sponsors: IBM, Intel, Synopsys, Mentor, Magma, Altera, Atrenta, Microsemi, Jasper, Oasys, Real Intent, Tabula, Verific

  2. Overview
     • Overview
     • Model checking engines
     • Example
       • Non-concurrent
     • Hybrid approach
       • Concurrent verify and refine
       • Flow
       • Example
     • Why more powerful
     • Questions and objections addressed
     • Future work

  3. Concurrent Model Checking
     Overview:
     • Employ multiple MC engines using hybrid concurrency on a multi-core server
     • Benefits
       • Faster
         • almost linear speedup
         • plus does not waste time making a wrong decision
       • More powerful
         • can solve harder problems
     • Makes sequential approach obsolete
       • No reason not to use concurrency
         • even for 1 core
         • simpler
     • Concurrency controlled by Python front end

  4. Model Checking Engines
     1. Random simulation
     2. Semi-formal simulation
     3. Bounded model checking (BMC) [15]
     4. BDD-based reachability [7][25]
     5. Property directed reachability (PDR) [4]
     6. Interpolation [14]
     7. Synthesis:
        • rewriting [10]
        • retiming [13]
        • sequential signal correspondence [26]
          • with constraint extraction
        • phase abstraction [27]
        • temporal decomposition [23]
     8. Abstraction: [8]
        • counterexample-based (CB) [19]
        • proof-based (PB) [20][21]
     9. Speculation [2][3]
     Verification engines
     • 1-3 incomplete
     • 4-6 complete
     Transformation engines
     • 7 equivalence preserving
     • 8-9 abstracting

  5. Example of non-concurrent MC
     Read_file test_lru_consist_miss_slbc.sixth_sense_style_1sif_prop2_fixed2
     PIs = 532, POs = 1, FF = 2389, ANDs = 12049
     prove
     quick_verify (try many engines to see if one can prove)
     Simplifying
     Number of constraints = 3
     Forward retiming, quick_simp, scorr_constr, trm: PIs = 532, POs = 1, FF = 2342, ANDs = 11054
     Simplify: PIs = 532, POs = 1, FF = 2335, ANDs = 10607
     Phase abstraction: PIs = 283, POs = 2, FF = 1460, ANDs = 8911
     quick_verify (try many engines to see if one can prove)
     Abstracting
     Initial abstraction: PIs = 1624, POs = 2, FF = 119, ANDs = 1716, max depth = 39
     Testing with BMC
     bmc3 -C 100000 -T 50 -F 78: No CEX found in 51 frames
     Latches reduced from 1460 to 119
     Simplify: PIs = 1624, POs = 2, FF = 119, ANDs = 1687, max depth = 51
     Trimming: PIs = 158, POs = 2, FF = 119, ANDs = 734, max depth = 51
     Simplify: PIs = 158, POs = 2, FF = 119, ANDs = 731, max depth = 51
     quick_verify (try many engines to see if one can prove)
     Speculating
     Initial speculation: PIs = 158, POs = 26, FF = 119, ANDs = 578, max depth = 51
     Fast interpolation: reduced POs to 24
     Testing with BMC
     bmc3 -C 150000 -T 75: No CEX found in 1999 frames
     PIs = 158, POs = 24, FF = 119, ANDs = 578, max depth = 1999
     Simplify: PIs = 158, POs = 24, FF = 119, ANDs = 535, max depth = 1999
     Trimming: PIs = 86, POs = 24, FF = 119, ANDs = 513, max depth = 1999
     Verifying (try many engines to see if one can prove)
     Running reach -v -B 1000000 -F 10000 -T 75: BDD reachability aborted
     RUNNING interpolation with 20000 conflicts, 50 sec, max 100 frames: 'UNSAT'
     Elapsed time: 457.87 seconds, total: 458.52 seconds

  6. Same example with concurrent MC, without PDR
     test_lru_consist_miss_slbc.sixth_sense_style_1sif_prop2_fixed2.aig
     PIs=532,POs=1,FF=2389,ANDs=12049
     ***Executing super_prove
     ['INTRP', 'BMC', 'pre_simp']
     For_Retime: PIs=532,POs=1,FF=2365,ANDs=11064
     Number of constraints = 2, frames = 1
     PIs=529,POs=1,FF=2342,ANDs=10611
     Simplify: PIs=529,POs=1,FF=2265,ANDs=10068
     ***Trying temporal decomposition - for max 15.0 sec. No reduction
     ***Trying phase abstraction - Max phase = 2 [1, 2]
     Reparam: PIs 1056 => 264
     Simplify with 2 phases: PIs=264,POs=2,FF=1462,ANDs=8319
     Method pre_simp ended first in 89 sec.
     PIs=264,POs=2,FF=1462,ANDs=8319
     ***Running abstract
     ['INTRP', 'BMC3', 'initial_abstract']
     Method initial_abstract ended first in 106 sec.
     Initial abstraction: PIs=1621,POs=2,FF=105,ANDs=1427,max depth=42
     ***Iterating abstraction refinement
     PIs=1621,POs=2,FF=105,ANDs=1427,max depth=42
     Latches reduced from 1462 to 105
     ***Running pre_simp
     Reparam: PIs 330 => 328
     PIs=328,POs=2,FF=105,ANDs=1184,max depth=42
     Min_Retime: PIs=328,POs=2,FF=98,ANDs=1164,max depth=42
     Reparam: PIs 328 => 299
     Simplify: PIs=299,POs=2,FF=98,ANDs=1064,max depth=42
     Reparam: PIs 299 => 266
     Trying temporal decomposition - for max 15.0 sec. No reduction
     Reparam: PIs 266 => 261
     ***Running speculate
     ['INTRP', 'BMC3', 'initial_speculate']
     Method initial_speculate ended first in 38 sec.
     Initial speculation: PIs=261,POs=38,FF=96,ANDs=833,max depth=42
     ***Iterating speculation refinement
     BMC3: -- cex in 0.17 sec. at depth 22 => PIs=261,POs=37,FF=96,ANDs=830,max depth=42
     INTRP: UNSAT in 1.4 sec.
     Total clock time taken by super_prove = 366.549089 sec.

  7. Same example with concurrent MC, but with PDR
     • test_lru_consist_miss_slbc.sixth_sense_style_1sif_prop2_fixed2
     • PIs=532,POs=1,FF=2389,ANDs=12049
     • ***Executing super_prove
     • ['PDR', 'INTRP', 'BMC', 'PDRm', 'pre_simp']
     • PIs=532,POs=1,FF=2389,ANDs=12049
     • For_Retime: PIs=532,POs=1,FF=2365,ANDs=11064
     • Number of constraints = 2, frames = 1
     • Reparam: PIs 532 => 529
     • PIs=529,POs=1,FF=2342,ANDs=10611
     • Simplify: PIs=529,POs=1,FF=2265,ANDs=10068
     • PDRm proved UNSAT in 42 sec.
     • Total clock time taken by super_prove = 42.384159 sec.

  8. Hybrid Approach
     [Diagram: two blocks, c_verify and c_refine, each running SIM || PDR || PDRm || BMC || BMCm || INTRP || REACHx || REACHm. Outcome labels on the blocks: SAT, UNSAT, TIMEOUT; c_refine additionally shows CEX and a "refine" step.]
     REACH and REACHm optional depending on size (#PIs, #FFs)
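
The c_verify block on the Hybrid Approach slide races several engines and takes the first definitive answer. A minimal Python sketch of that idea follows. It is not the authors' front end: the engines are toy stand-ins that sleep and then report a fixed verdict, `make_toy_engine` and this `c_verify` signature are hypothetical, and threads with a cooperative stop flag replace the separate processes a multi-core run would use.

```python
import concurrent.futures as cf
import threading

DEFINITIVE = {"SAT", "UNSAT"}

def make_toy_engine(name, delay, verdict):
    """Hypothetical stand-in for an engine: waits `delay` sec, then reports `verdict`."""
    def engine(stop):
        # Event.wait returns True if `stop` was set before the delay elapsed.
        if stop.wait(delay):
            return name, "KILLED"
        return name, verdict
    return engine

def c_verify(engines, timeout):
    """Race all engines; return (engine name, verdict) for the first definitive answer."""
    stop = threading.Event()
    result = (None, "TIMEOUT")
    with cf.ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = [pool.submit(engine, stop) for engine in engines]
        try:
            for fut in cf.as_completed(futures, timeout=timeout):
                name, verdict = fut.result()
                if verdict in DEFINITIVE:
                    result = (name, verdict)
                    break
        except cf.TimeoutError:
            pass  # nobody finished in time: keep the TIMEOUT result
        finally:
            stop.set()  # ask every engine still running to quit
    return result

engines = [
    make_toy_engine("BMC", 5.0, "TIMEOUT"),
    make_toy_engine("PDRm", 0.05, "UNSAT"),
    make_toy_engine("INTRP", 5.0, "UNSAT"),
]
print(c_verify(engines, timeout=2.0))
```

In this toy run the fast "PDRm" stand-in wins and the slower engines are told to stop, which mirrors the deck's point that no time is wasted on a wrong engine choice.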

  9. c_prove
     • c_verify1 || simplify
     • c_verify2 || c_abstract
     • c_verify3 || c_speculate
     • ||k (c_prove outputk)

  10. Concurrent Prover Flow - hybrid c_prove
      [Flow diagram from Start to "End with a definitive answer". Stages shown: c_verify || pre_simp, c_verify || initial_abstract, c_verify || initial_speculate, two c_refine blocks, and || (c_prove outputk). Edge labels: SAT, UNSAT, undecided, CEX, backup, kill, pause.]
      || means runs concurrently

  11. Multiple output variation on c_refine
      If there are more than X outputs
      • group outputs and use poor man’s concurrency (PMC)
        • repeatedly take a group of X outputs at a time
        • start with time-out of 2 sec.
        • after all output groups done, double time-out and repeat
      • if cex found
        • refine and start at last time-out value and
        • last group of X where cex was found.
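
The PMC schedule on this slide can be sketched as a small loop. This is an illustration under assumptions, not the deck's code: `pmc_sweep`, `try_group`, `max_timeout`, and `start_group` are hypothetical names, and `try_group(group, timeout)` stands in for running the engines on one group of outputs and returning an output with a cex, or `None`.

```python
def pmc_sweep(outputs, group_size, try_group,
              start_timeout=2.0, max_timeout=64.0, start_group=0):
    """Sweep groups of outputs with a doubling time-out until a cex is found.

    Returns (output_with_cex, timeout, group_index), or None if no cex
    is found up to max_timeout (an assumed stopping point).
    """
    groups = [outputs[i:i + group_size]
              for i in range(0, len(outputs), group_size)]
    if not groups:
        return None
    # Visit the group that produced the last cex first, then the rest in order.
    order = [start_group] + [i for i in range(len(groups)) if i != start_group]
    timeout = start_timeout
    while timeout <= max_timeout:
        for idx in order:
            hit = try_group(groups[idx], timeout)
            if hit is not None:
                return hit, timeout, idx
        timeout *= 2  # all groups done without a cex: double and repeat
    return None

def toy_try_group(group, timeout):
    # Toy assumption: output 7 has a cex that needs a time-out of at least 8 sec.
    return 7 if 7 in group and timeout >= 8 else None

print(pmc_sweep(list(range(10)), 3, toy_try_group))
```

After a refinement, the returned time-out and group index can be passed back as `start_timeout` and `start_group`, which corresponds to restarting at the last time-out value and the last group where a cex was found.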

  12. Example of Concurrent Flow
      l2snfsm_prop11_fixed2
      PIs=38,POs=1,FF=372,ANDs=2150
      Executing super_prove
      Initial: PIs=38,POs=1,FF=372,ANDs=2150
      Running Simplification
      ['PDR', 'INTRP', 'BMC', 'PDRm', 'pre_simp']   (these run in parallel)
      PIs=38,POs=1,FF=371,ANDs=2150
      Fwd_Retime: PIs=38,POs=1,FF=349,ANDs=2056
      No constraints found
      Simplify: PIs=38,POs=1,FF=336,ANDs=1951
      Trying temporal decomposition - for max 15.0 sec. No reduction
      Method pre_simp ended first in 9 sec.
      PIs=38,POs=1,FF=336,ANDs=1951

  13. ***Running abstract
      • Start: PIs=38,POs=1,FF=336,ANDs=1951
      • ['PDR', 'INTRP', 'BMC3', 'PDRm', 'initial_abstract']
      • Running initial_abstract with bob=10,stable=6,time=100,depth=20
      • Method initial_abstract ended first in 103 sec.
      • PIs=38,POs=1,FF=336,ANDs=1951,max depth=11
      • Initial abstraction: PIs=116,POs=1,FF=258,ANDs=1576,max depth=11
      • Iterating abstraction refinement
      • Verify time set to 125
      • PIs=116,POs=1,FF=258,ANDs=1576,max depth=11
      • Reparam: PIs 116 => 59   (reduces the number of inputs)
      • …. many iterations here
      • SIM: -- cex in 41.48 sec. at depth 104 => cex_po = 0
      • PIs=45,POs=1,FF=329,ANDs=1925,max depth=11
      • Reparam: PIs 45 => 39
      • Latches reduced from 336 to 329
      • simplify
      • PIs=39,POs=1,FF=329,ANDs=1924,max depth=11
      • Min_Retime: PIs=39,POs=1,FF=329,ANDs=1914,max depth=11
      • No constraints found
      • Simplify: PIs=39,POs=1,FF=328,ANDs=1900,max depth=11
      • Trying temporal decomposition - for max 15.0 sec. No reduction

  14. ***Running speculate
      ['PDR', 'INTRP', 'BMC3', 'PDRm', 'initial_speculate']
      Method initial_speculate ended first in 39 sec.
      Initial speculation: PIs=39,POs=241,FF=178,ANDs=1335,max depth=11
      Iterating speculation refinement
      PDRM: -- cex in 5.64 sec. at depth 40 => PIs=39,POs=239,FF=178,ANDs=1332,max depth=11
      BMC3: -- cex in 1.84 sec. at depth 22 => PIs=39,POs=235,FF=178,ANDs=1326,max depth=22
      … many iterations here
      BMC3: -- cex in 11.91 sec. at depth 25 => PIs=39,POs=204,FF=191,ANDs=1350,max depth=25
      BMC3: -- cex in 17.77 sec. at depth 25 => PIs=39,POs=203,FF=195,ANDs=1381,max depth=25
      BMC: -- cex in 29.44 sec. at depth 25 => PIs=39,POs=204,FF=195,ANDs=1390,max depth=25
      BMC: -- cex in 37.03 sec. at depth 26 => PIs=39,POs=203,FF=195,ANDs=1389,max depth=25
      Find_cex_par turned on   (poor man’s concurrency turned on here)
      Verify time set to 148
      Number of POs: 203 => 69
      t_poor = 2
      ***
      PDRM: UNSAT in 0.08 sec.
      PDRM: UNSAT in 0.07 sec.
      … many iterations here
      PDR: UNSAT in 0.25 sec.
      PDRM: UNSAT in 0.02 sec.
      all outputs processed => 69 outputs proved
      Number of POs reduced to 0
      Total clock time taken by super_prove = 483.238051 sec.
      Out[7]: 'UNSAT'

  15. Why is concurrent more powerful?
      Example of iterating speculation refinement (verify time set to 50)
      Initial size: PIs=171,POs=41,FF=255, ANDs=2275
      SIMULATION: cex 4.268283 sec, frame 911
      SIMULATION: cex 0.096659 sec, frame 17
      BMC: cex 6.534474 sec, frame 17
      SIMULATION: cex 0.726484 sec, frame 1363
      SIMULATION: cex 5.740357 sec, frame 391
      BMC: cex 9.506526 sec, frame 17
      SIMULATION: cex 6.436064 sec, frame 984
      SIMULATION: cex 1.212145 sec, frame 444
      PDRM: cex 4.335237 sec, frame 18
      BMC: cex 9.853237 sec, frame 17
      SIMULATION: cex 6.335866 sec, frame 81
      SIMULATION: cex 4.595637 sec, frame 22
      SIMULATION: cex 4.594522 sec, frame 40
      SIMULATION: cex 9.182059 sec, frame 58
      PDRM: cex 5.637425 sec, frame 20
      BMC: cex 9.861210 sec, frame 17
      .... 33 interleavings of PDR PDRM and BMC ....
      PDR: cex 47.217215 sec, frame 29
      PDR: cex 31.134045 sec, frame 76
      BMC: cex 55.010524 sec, frame 23
      PDRM: UNSAT in 66 sec.
      Final size: PIs=171, POs=17, FF=260, ANDs=2346

  16. Why is concurrent more powerful?
      [Diagram: a chain of repeated "cex" and "refine" steps leading from the initial abstraction/speculation to the final abstraction/speculation.]
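
The cex/refine chain on this slide can be read as a loop that keeps refining until no engine finds another counterexample. The sketch below is a deliberately simplified model, not the tool's algorithm: speculation is reduced to a set of candidate items, and `find_cex` is a hypothetical callback that returns the candidates refuted by the next counterexample (an empty set means the remaining ones are proved).

```python
def refine_loop(candidates, find_cex):
    """Drop refuted candidates until no cex is found.

    Returns (surviving candidates, number of refinement steps).
    """
    remaining = set(candidates)
    steps = 0
    while True:
        refuted = find_cex(remaining)
        if not refuted:
            return remaining, steps  # no engine found a cex: done
        remaining -= refuted         # refine using the cex
        steps += 1

def toy_find_cex(remaining):
    # Toy assumption: odd candidates are false, and each cex refutes
    # only the smallest odd candidate still present.
    odd = sorted(c for c in remaining if c % 2 == 1)
    return {odd[0]} if odd else set()

survivors, steps = refine_loop(range(10), toy_find_cex)
print(sorted(survivors), steps)
```

Each step depends on which engine produces the next cex first, so running engines concurrently shortens every link of the chain rather than only the final proof.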

  17. Hard examples - Industrial
      A subset of the IBM benchmarks not solved by SixthSense using its default Expert System flow in two hours.**
      ** At the time, the IBM SixthSense program did not have a PDR engine, so we eliminated those problems that were made easier because of PDR in our code.

  18. Multiple output variation on c_refine
      How long does it take?
      • Let O = # POs, E = # MC engines used concurrently, C = # cores, T = final time-out, X = # outputs grouped together
      • Final sweep (with no cex’s and assuming no memory conflicts)
        • with full concurrency: time = T*(O*E)/C
        • with grouping and full concurrency: time = T*(O/X)*(X*E)/C = T*(O*E)/C
        • with grouping and PMC: time = T*2*(O/X)*(X*E)/C = 2*T*(O*E)/C
      • Why not do full concurrency and no grouping?
        • Grouping done to lessen memory conflicts.
          • at most X*E processes are concurrent on server
          • choose X so that little memory conflict (why not choose X = C/E?)
        • PMC done to find cex early when doing grouping.
          • easy cex’s across all outputs are found early
      • When cex’s found (some heuristics)
        • refine and start PMC at last time-out value (instead of 2 sec.)
          • heuristic that expects next cex will take at least that time to find
        • first try the last set of X where cex was found.
          • heuristic that expects that last group where cex was found is most likely to yield the next cex.
      [Annotation: number of concurrent engines running per core]
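
The sweep-time formulas on this slide can be checked with a little arithmetic. The sketch below assumes the factor of 2 for PMC comes from summing the doubling time-outs (2, 4, ..., T), whose total is less than 2*T; the slide does not state this derivation. The function names and the values T = 64, E = 4, C = 8 are illustrative assumptions; O = 69 is borrowed from the earlier log.

```python
def final_sweep_time(T, O, E, C):
    """Time for one sweep over all outputs at time-out T: T*(O*E)/C."""
    return T * (O * E) / C

def pmc_total_time(T, O, E, C, first_timeout=2.0):
    """Total time over sweeps with time-outs first_timeout, 2*first_timeout, ..., T.

    Assumes T is first_timeout times a power of two and no cex is found.
    """
    total = 0.0
    timeout = first_timeout
    while timeout <= T:
        total += final_sweep_time(timeout, O, E, C)
        timeout *= 2
    return total

T, O, E, C = 64.0, 69, 4, 8
one_sweep = final_sweep_time(T, O, E, C)
with_pmc = pmc_total_time(T, O, E, C)
print(one_sweep, with_pmc, 2 * one_sweep)
```

With these numbers the PMC total stays just under the slide's bound of 2*T*(O*E)/C, since the time-outs 2 + 4 + ... + 64 sum to 126 rather than 128.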

  19. Questions addressed
      • Memory use and conflicts?
        • experiments run on a server with 2 processors (4 cores each), 24 GB, 64K L1, 256K L2, 4 MB
        • grouping designed to alleviate severe memory conflicts
        • did not observe slowdown due to memory conflicts, but more experiments need to be done
      • Run-time speedup?
        • linear up to # cores
        • concurrency alleviates wasting time due to wrong decisions
        • solving problems not solved by sequential flow
      • Wasting processor power: trying many things but throwing away all but one?
        • wastage if some cores sitting idle
        • alternative is to run wrong engine for a longer time
      • Use SOTA algorithm?
        • too many MC algorithms
        • expert system proposed which learns which algorithms are best for a given design project (Z. Nevo - IBM)

  20. Future Work
      • More and better engines
        • Improved BDD reachability engine (we hope)
          • We have 4
          • We had a quite weak one (HWMCC’08) in ’08
          • Now have two reasonably good ones.
          • May have a much better one in a few months.
        • Improved circuit-based SAT solver
          • Currently used in signal correspondence to simplify larger circuits
          • Faster but sometimes limited quality
          • Will be improved to see if it can compete with MiniSat 1.14c
        • New specialized techniques for SEC
      • More use of concurrency
        • e.g. exchange information between engines.
        • will not work on parallelizing individual engines

  21. To Learn More
      • Recent papers: http://www.eecs.berkeley.edu/~alanmi/publications
        • IWLS
          • N. Een, A. Mishchenko, and R. Brayton, "Efficient implementation of property directed reachability", IWLS'11.
          • B. Sterin, N. Een, A. Mishchenko, and R. Brayton, "The Benefit of Concurrency in Model Checking", IWLS'11.
          • S. Ray and R. Brayton, "Proving Stabilization Using Liveness-to-Safety Conversion", IWLS'11.
        • Other
          • R. Brayton and A. Mishchenko, "ABC: An academic industrial-strength verification tool", Proc. CAV'10, LNCS 6174, pp. 24-40.
          • N. Een, A. Mishchenko, and N. Amla, "A single-instance incremental SAT formulation of proof- and counterexample-based abstraction", Proc. FMCAD'10.
          • H. Savoj, D. Berthelot, A. Mishchenko, and R. Brayton, "Combinational techniques for sequential equivalence checking", Proc. FMCAD'10, pp. 158-162.
      • Send email
        • alanmi@eecs.berkeley.edu
        • brayton@eecs.berkeley.edu
        • een@eecs.berkeley.edu
      • Visit BVSRC webpage: www.bvsrc.org

  22. end
