Elasticity and Petri nets

Jordi Cortadella, Universitat Politecnica de Catalunya, Barcelona; Mike Kishinevsky, Intel Corp., Strategic CAD Labs, Hillsboro.


  1. Elasticity and Petri nets • Jordi Cortadella, Universitat Politecnica de Catalunya, Barcelona • Mike Kishinevsky, Intel Corp., Strategic CAD Labs, Hillsboro

  2. Moore’s law Source: Intel Corp.

  3. Is the GHz race over ?

  4. Many-Core is here Source: Intel Corp.

  5. Why this tutorial? • Digital circuits are complex concurrent systems • Variability and power consumption are critical aspects in deep submicron technologies • Multi (many)-core systems will become a novel paradigm: • System design • Applications • Concurrent programming • Theory of concurrency may play a relevant role in this new scenario

  6. Elasticity • Tolerance to delay variability • Different forms of elasticity • Asynchronous: no clock • Synchronous: variability synchronized with a clock • In all forms of elasticity, token-based computations are performed (req/ack or valid/stop signals are used)

  7. Outline • Asynchronous elastic systems • The basics: circuits and elasticity • Synthesis of asynchronous circuits from Petri nets • Modern methods for the synthesis of large controllers • De-synchronization: from synchronous to asynchronous • Synchronous elastic systems • Basics of synchronous elastic systems • Early evaluation and performance analysis • Optimization of elastic systems and their correctness

  8. The basics: circuits and elasticity

  9. Outline • Gates, latches and flip-flops • Combinational and sequential circuits • Basic concepts on asynchronous circuit design • Petri net models for asynchronous controllers • Signal Transition Graphs

  10. Boolean functions • Composed from logic gates (figure: gate networks with inputs a, b, c, d and outputs x, y, z)

  11. Memory elements: latches • Active low (L): En = 1 (opaque): Q = prev(Q); En = 0 (transparent): Q = D • Active high (H): En = 0 (opaque): Q = prev(Q); En = 1 (transparent): Q = D (figure: latch symbols L and H with inputs D, En and output Q)
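
The latch behaviour listed on this slide can be sketched in Python as below. The class name `Latch` and the `active_high` parameter are illustrative assumptions, not names from the slides.

```python
class Latch:
    """Level-sensitive latch: transparent on one level of En, opaque on the other."""

    def __init__(self, active_high=True, q=0):
        self.active_high = active_high  # True: transparent when En = 1 (H latch)
        self.q = q                      # stored output value

    def update(self, d, en):
        transparent = (en == 1) if self.active_high else (en == 0)
        if transparent:
            self.q = d   # transparent: Q = D
        return self.q    # opaque: Q = prev(Q)
```

For example, an active-high latch follows `d` while `en` is 1 and holds the last value once `en` drops to 0.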

  12. Memory elements: flip-flop (figure: flip-flop FF built from latches L and H sharing CLK, with input D and output Q; waveforms of CLK, D and Q)
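
A minimal sketch of a flip-flop composed of two latches, assuming the active-low latch (L) comes first and feeds the active-high latch (H), which gives a rising-edge-triggered element. The ordering and the class name `FlipFlop` are assumptions read from the slide labels, not confirmed by the transcript.

```python
class FlipFlop:
    """Master-slave flip-flop: an active-low latch followed by an active-high latch."""

    def __init__(self):
        self.master = 0  # output of the active-low latch (L)
        self.q = 0       # output of the active-high latch (H)

    def update(self, d, clk):
        if clk == 0:
            self.master = d        # L transparent, H opaque: Q holds
        else:
            self.q = self.master   # L opaque, H transparent: Q takes the captured value
        return self.q
```

Changes on `d` while `clk` is 1 are ignored, so `q` only reflects the value of `d` seen just before the rising edge.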

  13. Finite-state automata Inputs Outputs CL STATE • Output function • Next-state function CLK

  14. Network of Computing Units Out In B3 B1 B2 No combinational cycles

  15. Marked Graph Model Circuit Register Combinational logic Marked graph
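
The token game of a marked graph can be sketched as follows. Here each arc between two transitions plays the role of a place, and the three-node ring `B1 → B2 → B3 → B1` with a single token is a hypothetical example, not the circuit drawn on the slide.

```python
def enabled(marking, arcs, t):
    """A transition is enabled when every incoming arc holds at least one token."""
    return all(marking[a] > 0 for a in arcs if a[1] == t)


def fire(marking, arcs, t):
    """Fire t: remove one token from each input arc, add one to each output arc."""
    assert enabled(marking, arcs, t)
    new = dict(marking)
    for a in arcs:
        if a[1] == t:
            new[a] -= 1
        if a[0] == t:
            new[a] += 1
    return new


# Hypothetical ring of three computing units with one token.
arcs = [("B1", "B2"), ("B2", "B3"), ("B3", "B1")]
marking = {("B1", "B2"): 0, ("B2", "B3"): 0, ("B3", "B1"): 1}
marking = fire(marking, arcs, "B1")  # the token moves from (B3, B1) to (B1, B2)
```

Firing preserves the number of tokens on every cycle, which is the property that makes marked graphs a convenient model for register-to-register data flow.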

  16. Basic concepts on asynchronous circuit design

  17. Outline • What is an asynchronous circuit ? • Asynchronous communication • Asynchronous design styles (Micropipelines) • Asynchronous logic building blocks • Control specification and implementation • Delay models and classes of async circuits • Channel-based design • Why asynchronous circuits ?

  18. Synchronous circuit • Implicit (global) synchronization between blocks • Clock period > Max Delay (CL + R) (figure: pipeline of registers R and combinational logic CL driven by a common CLK)
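
The clock-period constraint can be illustrated with a short calculation. The stage delays below are made-up numbers in nanoseconds and `min_clock_period` is a hypothetical helper name.

```python
def min_clock_period(stages):
    """stages: list of (cl_delay, register_delay) pairs, one per pipeline stage.

    Returns the largest CL + R delay; the clock period must exceed this bound.
    """
    return max(cl + r for cl, r in stages)


stages = [(4.0, 1.0), (6.5, 1.0), (3.0, 1.0)]  # hypothetical delays in ns
bound = min_clock_period(stages)               # slowest stage sets the bound
```

The slowest stage dictates the period for every stage, which is the rigidity that elastic designs try to relax.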

  19. Asynchronous circuit • Explicit (local) synchronization: Req / Ack handshakes (figure: pipeline of registers R and combinational logic CL connected by Req and Ack signals)

  20. Motivation for asynchronous • Asynchronous design is often unavoidable: • Asynchronous interfaces, arbiters etc. • Modern clocking is multi-phase and distributed, and virtually ‘asynchronous’ (cf. GALS, next slide): • Mesochronous (clock travels together with data) • Local (possibly stretchable) clock generation • Robust asynchronous design flow is coming (e.g. VLSI programming from Philips, Balsa from Univ. of Manchester, NCL from Theseus Logic …)

  21. Globally Async Locally Sync (GALS) Asynchronous World Clocked Domain Req3 Req1 R R CL Ack3 Ack1 Local CLK Req4 Req2 Ack4 Ack2 Async-to-sync Wrapper

  22. Key Design Differences • Synchronous logic design: • proceeds without taking timing correctness (hazards, signal ack-ing etc.) into account • Combinational logic and memory latches (registers) are built separately • Static timing analysis of CL is sufficient to determine the Max Delay (clock period) • Fixed set-up and hold conditions for latches

  23. Key Design Differences • Asynchronous logic design: • Must ensure hazard-freedom, signal ack-ing, local timing constraints • Combinational logic and memory latches (registers) are often mixed in “complex gates” • Dynamic timing analysis of logic is needed to determine relative delays between paths • To avoid complex issues, circuits may be built as Delay-insensitive and/or Speed-independent (as discussed later)

  24. Synchronous communication • Clock edges determine the time instants where data must be sampled • Data wires may glitch between clock edges (set-up/hold times must be satisfied) • Data are transmitted at a fixed rate (clock frequency) (figure: waveform carrying 1 1 0 0 1 0)

  25. Dual rail • Two wires with L (low) and H (high) per bit • “LL” = “spacer”, “LH” = “0”, “HL” = “1” • n-bit data communication requires 2n wires • Each bit is self-timed • Other delay-insensitive codes exist (e.g. k-of-n) and event-based signalling (choice criteria: pin and power efficiency) (figure: dual-rail waveform)
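
The dual-rail code on this slide can be sketched as a pair of lookup tables. The function names are illustrative; the treatment of "HH" as an error reflects the fact that the slide assigns no meaning to it.

```python
SPACER = "LL"
ENCODE = {0: "LH", 1: "HL"}
DECODE = {"LH": 0, "HL": 1}


def encode_word(bits):
    """Encode n bits onto n wire pairs (2n wires in total)."""
    return [ENCODE[b] for b in bits]


def decode_word(pairs):
    """Return the bits once every pair is valid, or None while any pair is a spacer."""
    if any(p == SPACER for p in pairs):
        return None
    return [DECODE[p] for p in pairs]  # raises KeyError on "HH", which the code never uses
```

Because validity is carried by the data wires themselves, the receiver needs no separate timing signal: a word is complete when no pair is still at the spacer.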

  26. Bundled data • Validity signal • Similar to an aperiodic local clock • n-bit data communication requires n+1 wires • Data wires may glitch when not valid • Signaling protocols • level sensitive (latch) • transition sensitive (register): 2-phase / 4-phase (figure: waveform carrying 1 1 0 0 1 0)

  27. Example: memory read cycle Valid address • Transition signaling, 4-phase Address A A Valid data Data D D

  28. Example: memory read cycle Valid address • Transition signaling, 2-phase A A Address Valid data Data D D
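
The difference between the two transition-signaling protocols in the memory read example can be sketched by generating event traces. This assumes A is the "valid address" signal and D the "valid data" signal, with `+` and `-` denoting rising and falling transitions; the function names are hypothetical.

```python
def four_phase(n_reads):
    """Return-to-zero: each read costs four events, and both wires end low."""
    return ["A+", "D+", "A-", "D-"] * n_reads


def two_phase(n_reads):
    """Non-return-to-zero: each read costs two events, one transition per wire."""
    trace, level = [], 0
    for _ in range(n_reads):
        level ^= 1                       # every transition, up or down, is an event
        sign = "+" if level else "-"
        trace += ["A" + sign, "D" + sign]
    return trace
```

For two reads, the 4-phase trace has eight events while the 2-phase trace has four, at the price of logic that must react to both edge directions.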

  29. Asynchronous modules • Signaling protocol: reqin+ start+ [computation] done+ reqout+ ackout+ ackin+ reqin- start- [reset] done- reqout- ackout- ackin- (more concurrency is also possible) (figure: DATA PATH with Data IN, Data OUT, start and done; CONTROL with req in, ack in, req out, ack out)

  30. Asynchronous latches: C element • Truth table (A B → Z+): 0 0 → 0, 0 1 → Z, 1 0 → Z, 1 1 → 1 • Static logic implementation [van Berkel 91] (figure: C-element symbol with inputs A, B and output Z; transistor schematic between Vdd and Gnd)
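
The C-element truth table corresponds to the next-state function Z+ = A·B + Z·(A + B), which can be sketched as follows (the function name is illustrative).

```python
def c_element(a, b, z):
    """Next output of a C element: set when both inputs are 1,
    reset when both are 0, otherwise hold the previous output z."""
    return (a & b) | (z & (a | b))
```

With inputs 0 and 1 the function returns the previous `z`, which is the "hold" row of the table.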

  31. Vdd A B Z B A Gnd C-element: Other implementations Vdd A Weak inverter B Z B A Dynamic Quasi-Static Gnd

  32. Dual-rail logic • Dual-rail AND gate (inputs A.t, A.f, B.t, B.f; outputs C.t, C.f) • Valid behavior for monotonic environment
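
One common way to build a dual-rail AND gate is C.t = A.t AND B.t and C.f = A.f OR B.f. The sketch below assumes that form, which may differ from the gate drawn on the slide; each signal is a `(t, f)` pair and `(0, 0)` is the spacer.

```python
def dual_rail_and(a, b):
    """a, b: (t, f) rail pairs. Returns the (t, f) pair of the output C."""
    (at, af), (bt, bf) = a, b
    return (at & bt, af | bf)


ONE, ZERO, SPACER = (1, 0), (0, 1), (0, 0)
```

In this form the output can become valid as soon as one input is false, and it returns to the spacer only when both inputs do, which is why it relies on the environment changing its inputs monotonically.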

  33. Completion detection • Dual-rail logic • Completion detection tree (C element) producing done
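
Completion detection for dual-rail data can be sketched as follows: each bit is valid when one of its rails is high, and `done` is the output of a C element over all per-bit validity signals. A single n-input C element stands in here for the tree on the slide, and the function names are assumptions.

```python
def c_element_n(inputs, prev):
    """n-input C element: 1 when all inputs are 1, 0 when all are 0, else hold prev."""
    if all(inputs):
        return 1
    if not any(inputs):
        return 0
    return prev


def completion(pairs, prev_done):
    """pairs: list of (t, f) rails, one per bit. Returns the new value of done."""
    valid = [t | f for t, f in pairs]  # a bit is valid when either rail is high
    return c_element_n(valid, prev_done)
```

`done` rises only after every bit has become valid and falls only after every bit has returned to the spacer.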

  34. Differential cascode voltage switch logic • 3-input AND/NAND gate (figure: N-type transistor network with inputs A.t, A.f, B.t, B.f, C.t, C.f, control input start, and outputs Z.t, Z.f and done)

  35. Example of dual-rail design • Asynchronous dual-rail ripple-carry adder (A. Martin, 1991) • Critical delay is proportional to log N (N = number of bits) • 32-bit adder delay (1.6 µm MOSIS CMOS): 11 ns versus 40 ns for synchronous • Async cell transistor count = 34 versus synchronous = 28

  36. Bundled-data logic blocks • Single-rail logic • Conventional logic + matched delay (figure: logic block with a delay element from start to done)

  37. Micropipelines (Sutherland 89) • Micropipeline (2-phase) control blocks: Join, Merge, Toggle, Select, Call, Request-Grant-Done (RGD) Arbiter (figure: block symbols with ports r1, g1, d1, r2, g2, d2, a1, a2, r, a, sel, in, out0, out1, outt, outf)

  38. C C C delay delay delay Micropipelines (Sutherland 89) Aout Ain C L logic L logic L logic L Rin Rout

  39. Data-path / Control L logic L logic L logic L Rin Rout CONTROL Ain Aout

  40. Control specification A+ A B+ B A– A input B output B–

  41. Control specification A+ B– B A A– B+

  42. C Control specification A+ B+ A C+ C B A– B– C–

  43. C Control specification A+ B+ A C+ C A– B B– C–

  44. Ro+ Ri+ Ri Ro FIFO cntrl Ao+ Ai+ Ao Ai Ro- Ri- C C Ai- Ao- Ri Ro Ao Ai Control specification

  45. A simple filter: specification • y := 0; loop x := READ (IN); WRITE (OUT, (x+y)/2); y := x; end loop (figure: filter block with IN, Rin, Ain on the input side and OUT, Rout, Aout on the output side)
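
The filter specification translates directly into a Python generator. Here `READ(IN)` is modelled by iterating over an input sequence and `WRITE(OUT, ...)` by yielding a value; the name `filter_stream` is an assumption.

```python
def filter_stream(samples):
    """Two-point averaging filter: each output is the mean of the current
    and the previous input, with the previous input initially 0."""
    y = 0
    for x in samples:       # x := READ(IN)
        yield (x + y) / 2   # WRITE(OUT, (x + y) / 2)
        y = x               # y := x


outputs = list(filter_stream([2, 4, 6]))
```

This models only the data behaviour; the handshake signals Rin/Ain and Rout/Aout of the circuit are abstracted into the generator's pull-based iteration.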

  46. A simple filter: block diagram • x and y are level-sensitive latches (transparent when R=1) • + is a bundled-data adder (matched delay between Ra and Aa) • Rin indicates the validity of IN • After Ain+ the environment is allowed to change IN • (Rout,Aout) control a level-sensitive latch at the output (figure: latches x and y, adder +, and a control block with signals Rin, Ain, Rx, Ax, Ry, Ay, Ra, Aa, Rout, Aout)

  47. + OUT x y IN Ry Ay Rx Ax Ra Aa Rin Rout control Ain Aout Rout+ Ra+ Ry+ Rx+ Rin+ Aout+ Aa+ Ay+ Ax+ Ain+ Rout– Ra– Ry– Rx– Rin– Aout– Aa– Ay– Ax– Ain– A simple filter: control spec.

  48. Rx Ax Aa Ry Ra Ay Aout C Ain Rout Rin Rout+ Ra+ Ry+ Rx+ Rin+ Aout+ Aa+ Ay+ Ax+ Ain+ Rout– Ra– Ry– Rx– Rin– Aout– Aa– Ay– Ax– Ain– A simple filter: control impl.

  49. Taking delays into account • Delay assumptions: • Environment: 3 time units • Gates: 1 time unit • Events (time): x+ (3), x’– (4), y+ (5), z+ (6), z’– (7), x– (9), x’+ (10), z– (12), z’+ (13), y– (14) (figure: circuit with signals x, x’, y, z, z’ and its specification over x+, y+, z+, x–, y–, z–)