Jordi Cortadella, University Politècnica de Catalunya; Mike Kishinevsky, Intel Corporation

Synthesis of Asynchronous Control Circuits with Automatically Generated Relative Timing Assumptions. Jordi Cortadella, University Politècnica de Catalunya; Mike Kishinevsky, Intel Corporation; Steven M. Burns, Intel Corporation; Ken Stevens, Intel Corporation

Presentation Transcript


  1. Synthesis of Asynchronous Control Circuits with Automatically Generated Relative Timing Assumptions. Jordi Cortadella, University Politècnica de Catalunya; Mike Kishinevsky, Intel Corporation; Steven M. Burns, Intel Corporation; Ken Stevens, Intel Corporation. Earlier contributions: Luciano Lavagno, Alex Kondratyev, Alex Yakovlev, Alexander Taubin

  2. Outline • Why asynchronous • Relative timing • Reminder: design flow for asynchronous circuits • Lazy transition systems • Timing assumptions and constraints • Automatic generation of timing assumptions • Results

  3. Why asynchronous? • All high-performance “synchronous” design styles are “asynchronous in the small” (within one or a few clock cycles). Example: [ISSCC 2001 Intel paper on a 4 GHz IEU in 0.18 µm CMOS for the Pentium 4(TM)]. This requires asynchronous-style timing analysis. • The relative sequential distance of global wires within a die is growing. • Can we deliver a global clock N years from now?

  4. Timing assumptions in design flow • Synchronous circuits (e.g., static CMOS): • max delay: logic must stabilize within a clock period (- setup - clock2q - clock_skew) • min delay: logic must not change before the hold time has elapsed (+ clock_skew - clock2q) • Speed-independent = quasi-delay-insensitive: wire delays after a fork are smaller than the fan-out gate delays [Muller59, Varshavsky et al. 80, Martin89, …]. Problem: fat circuits • Burst-mode FSM: the circuit stabilizes between two changes at the inputs [Nowick91, Yun94]. Problem: fundamental mode is similar to synchronous operation (external alignment by the worst case) • Timed circuits: absolute bounds on gate/environment delays are known a priori (before physical design) [Myers95]. Problem: how do you know absolute delays before sizing and physical design?
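Written as inequalities on the combinational path delay d between two flip-flops clocked with period T, the two synchronous conditions take the conventional form below. This is a textbook formulation using the slide's own terms; the sign of the skew term depends on the direction of the skew, which the slide does not specify.

```latex
\begin{aligned}
d_{\max} &\le T - t_{\mathrm{setup}} - t_{\mathrm{clock2q}} - t_{\mathrm{clock\_skew}} \\
d_{\min} &\ge t_{\mathrm{hold}} + t_{\mathrm{clock\_skew}} - t_{\mathrm{clock2q}}
\end{aligned}
```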

  5. Relative Timing Asynchronous Circuits [Figure: a speed-independent C-element with inputs a, b and output c, next to an RT C-element.] Timing assumption (on the environment): a- before b-. RT C-element: faster and smaller; correct only under the timing constraint a- before b-.
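The RT C-element circuit itself is not recoverable from the transcript. As a hedged illustration of why a- before b- can permit a smaller gate, the sketch below models a C-element inside a 4-phase environment, enumerates the reachable states (a, b, c) with and without the assumption, and compares the speed-independent function c' = ab + c(a + b) against a hypothetical simplified function c' = b(a + c). The names `c_si`, `c_rt` and `reachable` and the simplified function are assumptions for illustration, not necessarily the gate on the slide.

```python
from collections import deque

def c_si(a, b, c):
    """Speed-independent C-element next state: c' = ab + c(a + b)."""
    return (a & b) | (c & (a | b))

def c_rt(a, b, c):
    """Hypothetical simplified next state c' = b(a + c); relies on a- before b-."""
    return b & (a | c)

def reachable(assume_a_before_b):
    """States (a, b, c) reachable when the environment runs a 4-phase handshake."""
    start = (0, 0, 0)
    seen, queue = {start}, deque([start])
    while queue:
        a, b, c = queue.popleft()
        moves = []
        if a == c:                        # input a may toggle once c has followed it
            moves.append((1 - a, b, c))
        b_falls_early = assume_a_before_b and b == 1 and a == 1
        if b == c and not b_falls_early:  # under the assumption, b- waits for a-
            moves.append((a, 1 - b, c))
        if a == b != c:                   # output is excited: c follows the inputs
            moves.append((a, b, a))
        for s in moves:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return seen

timed, untimed = reachable(True), reachable(False)
print(len(timed), len(untimed))                         # 7 8
print(all(c_si(*s) == c_rt(*s) for s in timed))         # True
print({s for s in untimed if c_si(*s) != c_rt(*s)})     # {(1, 0, 1)}
```

The two functions differ only in state (1, 0, 1), which is reachable only if b- fires before a-. The assumption makes that state unreachable, so it becomes a don't-care, and this is what allows the cheaper function.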

  6. Relative Timing Circuits • Assumptions: “a before b” • for concurrent events: reduces the reachable state space • for ordered events: permits early enabling • both increase the don’t-care space for logic synthesis => simpler logic (better area and timing) • “Assume - if useful - guarantee” approach: the tool uses the assumptions to derive a circuit and the required timing constraints that must be met in the physical design flow • Applied to the design of the Rotating Asynchronous Pentium Processor(TM) Instruction Decoder (K. Stevens, S. Rotem et al., Intel Corporation)

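  7. STG for the READ cycle [Figure: VME Bus Controller with signals DSr, DTACK, LDS, LDTACK, D, and the signal transition graph (STG) of its READ cycle over the events DSr+, DTACK-, LDS+, LDTACK+, D+, DTACK+, DSr-, D-, LDTACK-, LDS-.]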

  8. State Graph (Read cycle) [Figure: state graph of the READ cycle; its arcs are labelled with the events DSr+, DSr-, DTACK+, DTACK-, LDS+, LDS-, LDTACK+, LDTACK-, D+, D-.]

  9. Binary encoding of signals [Figure: the same state graph with each state labelled by its code (DSr, DTACK, LDTACK, LDS, D), e.g. 10000, 10010, 10110, 01110, 01100, 00110; the code 10110 labels two different states.]

  10. Karnaugh map for LDS [Figure: Karnaugh maps of the next-state function of LDS, one for LDS = 1 and one for LDS = 0, indexed by DSr, DTACK and D, LDTACK in Gray-code order (00, 01, 11, 10); cells hold 0, 1 or don't-care (-), and one cell is marked 0/1?, a state-encoding conflict.]

  11. Speed-independent netlist

  12. Transition systems [Figure: state graph fragment with the excitation regions ER(LDS+), where LDS changes 0 → 1, and ER(LDS-), where LDS changes 1 → 0.] Excitation region: enabling = firing, since the delay can be zero.

  13. Lazy Transition Systems [Figure: the same fragment with ER(LDS+), ER(LDS-), the firing region FR(LDS-) and a DTACK- transition.] Event LDS- is lazy: its firing region is a subset of its enabling region.
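One way to represent a lazy transition system is to store the excitation (enabling) region of each event separately from the transitions that actually fire. The minimal sketch below uses hypothetical state and event names (`s0`, `x`, `y`) and a hypothetical `LazyTS` class. It checks FR(e) ⊆ ER(e) for every event and flags an event as lazy when its firing region is a strict subset of its enabling region. In a speed-independent (non-lazy) system the two regions coincide.

```python
from dataclasses import dataclass

@dataclass
class LazyTS:
    """Transition system in which an event's enabling region may exceed its firing region."""
    moves: dict    # moves[state] = {event: next_state}; only actual firings are stored
    enabled: dict  # enabled[event] = excitation region ER(event)

    def fr(self, event):
        """Firing region FR(event): states with an outgoing transition labelled event."""
        return {s for s, out in self.moves.items() if event in out}

    def well_formed(self):
        """Events may fire only where they are enabled: FR(e) is a subset of ER(e)."""
        return all(self.fr(e) <= er for e, er in self.enabled.items())

    def is_lazy(self, event):
        """Lazy event: firing region strictly smaller than enabling region."""
        return self.fr(event) < self.enabled[event]

# Hypothetical fragment: y is already enabled in s0 but fires only in s1.
ts = LazyTS(
    moves={"s0": {"x": "s1"}, "s1": {"y": "s2"}, "s2": {}},
    enabled={"x": {"s0"}, "y": {"s0", "s1"}},
)
print(ts.well_formed(), ts.is_lazy("x"), ts.is_lazy("y"))  # True False True
```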

  14. Timing assumptions • (a before b) for concurrent events: concurrency reduction for firing and enabling • (a before b) for ordered events: early enabling • (a simultaneous to b wrt c) for triples of events: combination of the above

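  15. Speed-independent Netlist [Figure: the READ-cycle STG (DSr+, DTACK-, LDS+, LDTACK+, D+, DTACK+, DSr-, D-, LDTACK-, LDS-) next to the gate netlist over DSr, LDTACK, D, DTACK, LDS and the internal signals map and csc.]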

  16. Adding timing assumptions (I): LDTACK- before DSr+ [Figure: the READ-cycle STG and speed-independent netlist, with the LDTACK- path marked FAST and the DSr+ path marked SLOW.]

  17. Adding timing assumptions (I): LDTACK- before DSr+ [Figure: the READ-cycle STG and speed-independent netlist over DSr, LDTACK, D, DTACK, LDS, map and csc.]

  18. State space domain: LDTACK- before DSr+ [Figure: state graph fragment in which DSr+ and LDTACK- are concurrent.]

  19. State space domain: LDTACK- before DSr+ [Figure: state graph fragment in which DSr+ and LDTACK- are concurrent.]

  20. State space domain: LDTACK- before DSr+ [Figure: the same fragment with DSr+ ordered after LDTACK-.] Two more unreachable states.
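The STG arcs are not recoverable from the transcript, so the sketch below assumes the classic VME READ-cycle STG: a chain DSr+ → LDS+ → LDTACK+ → D+ → DTACK+ → DSr- → D-, where D- enables both DTACK- and LDS-, LDS- enables LDTACK-, DTACK- enables DSr+ and LDTACK- enables LDS+. It plays the token game to build the state graph, encodes states as (DSr, DTACK, LDTACK, LDS, D), and reports codes shared by states that enable different output transitions, treating DSr and LDTACK as inputs. As a simplification, the assumption LDTACK- before DSr+ is modelled as an extra, initially marked arc LDTACK- → DSr+. This is sufficient for computing reachable states, although the RT flow only restricts firing. The names `reachable`, `conflicts` and `ARCS` are hypothetical.

```python
from collections import deque

SIGNALS = ("DSr", "DTACK", "LDTACK", "LDS", "D")   # bit order of the binary code
OUTPUTS = {"DTACK", "LDS", "D"}                    # DSr and LDTACK assumed to be inputs
# Assumed reconstruction of the READ-cycle STG; each arc (u, v) is a place.
ARCS = [
    ("DSr+", "LDS+"), ("LDS+", "LDTACK+"), ("LDTACK+", "D+"), ("D+", "DTACK+"),
    ("DTACK+", "DSr-"), ("DSr-", "D-"), ("D-", "DTACK-"), ("D-", "LDS-"),
    ("LDS-", "LDTACK-"), ("DTACK-", "DSr+"), ("LDTACK-", "LDS+"),
]
TOKENS = {("DTACK-", "DSr+"), ("LDTACK-", "LDS+")}  # idle state, all signals at 0

def reachable(arcs, tokens):
    """Token game on a safe marked graph; maps (marking, code) to enabled events."""
    events = sorted({e for arc in arcs for e in arc})
    start = (frozenset(tokens), (0,) * len(SIGNALS))
    graph, queue = {}, deque([start])
    while queue:
        state = queue.popleft()
        if state in graph:
            continue
        marking, code = state
        graph[state] = []
        for e in events:
            pre = {a for a in arcs if a[1] == e}
            if pre <= marking:                       # every input place is marked
                bits = list(code)
                bits[SIGNALS.index(e[:-1])] = int(e[-1] == "+")
                post = {a for a in arcs if a[0] == e}
                graph[state].append(e)
                queue.append(((marking - pre) | post, tuple(bits)))
    return graph

def conflicts(graph):
    """Binary codes shared by states that enable different output transitions."""
    outs_by_code = {}
    for (_, code), enabled in graph.items():
        outs = frozenset(e for e in enabled if e[:-1] in OUTPUTS)
        outs_by_code.setdefault(code, set()).add(outs)
    return {c for c, outs in outs_by_code.items() if len(outs) > 1}

# "LDTACK- before DSr+" as an extra ordering arc with a token in the idle state.
ASSUMED = ARCS + [("LDTACK-", "DSr+")]
untimed = reachable(ARCS, TOKENS)
timed = reachable(ASSUMED, TOKENS | {("LDTACK-", "DSr+")})
print(len(untimed), conflicts(untimed))   # 14 {(1, 0, 1, 1, 0)}
print(len(timed), conflicts(timed))       # 12 set()
```

Under these assumptions, code 10110 is reached twice: once with D+ enabled and once with LDS- enabled. The ordering removes the two states that are reachable only when DSr+ fires before LDTACK-, and that removal eliminates the conflict.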

  21. Boolean domain [Figure: Karnaugh maps of the next-state function of LDS for LDS = 1 and LDS = 0, indexed by DSr, DTACK and D, LDTACK, with one cell marked 0/1?.]

  22. Boolean domain [Figure: the same Karnaugh maps, with the 0/1? cell resolved and one more don't-care cell.] One more DC vector for all signals. One state conflict is removed.

  23. Netlist with one constraint [Figure: the READ-cycle STG and the netlist over DSr, LDTACK, D, DTACK, LDS, map and csc.]

  24. Netlist with one constraint. TIMING CONSTRAINT: LDTACK- before DSr+ [Figure: the READ-cycle STG and the resulting netlist over DSr, LDTACK, D, DTACK, LDS.]

  25. Timing assumptions (repeat of slide 14)

  26. Ordered events: early enabling [Figure: events a, b, c with logic functions F and G, before and after early enabling.] Logic for gate c may change.

  27. Adding timing assumptions (II): D- before LDS- [Figure: the READ-cycle STG and the netlist over DSr, LDTACK, D, DTACK, LDS.]

  28. State space domain: D- before LDS- [Figure: state graph fragment with events DSr-, D-, LDS-, showing the potential enabling of LDS-.] The reachable space is unchanged. For LDS-, enabling can be changed in one state.

  29. Boolean domain [Figure: the Karnaugh maps of LDS for LDS = 1 and LDS = 0, as on slide 22.]

  30. Boolean domain [Figure: the same Karnaugh maps with one more don't-care cell.] One more DC vector for one signal: LDS. If used: LDS = DSr; otherwise: LDS = DSr + D.
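The two equations above can be checked by tabulating the next-state value of LDS in each of the 12 states that remain reachable under LDTACK- before DSr+. The table is derived from an assumed reconstruction of the READ-cycle STG, not taken from the slides. The names `NEXT_LDS`, `EARLY_ENABLING_DC` and `mismatches` are hypothetical. In this model, LDS = DSr + D matches every reachable state. LDS = DSr fails only in state 01111, between DSr- and D-. That is the one state where D- before LDS- allows LDS- to be enabled early, which makes it a don't-care for LDS.

```python
# Next-state value of LDS per reachable code (DSr, DTACK, LDTACK, LDS, D),
# derived from an assumed READ-cycle STG under "LDTACK- before DSr+".
NEXT_LDS = {
    (0, 0, 0, 0, 0): 0, (1, 0, 0, 0, 0): 1, (1, 0, 0, 1, 0): 1,
    (1, 0, 1, 1, 0): 1, (1, 0, 1, 1, 1): 1, (1, 1, 1, 1, 1): 1,
    (0, 1, 1, 1, 1): 1, (0, 1, 1, 1, 0): 0, (0, 0, 1, 1, 0): 0,
    (0, 1, 1, 0, 0): 0, (0, 0, 1, 0, 0): 0, (0, 1, 0, 0, 0): 0,
}
# With "D- before LDS-", LDS- may be enabled early in the state after DSr-.
EARLY_ENABLING_DC = {(0, 1, 1, 1, 1)}

def lds_sum(dsr, dtack, ldtack, lds, d):
    return dsr | d   # LDS = DSr + D

def lds_dsr(dsr, dtack, ldtack, lds, d):
    return dsr       # LDS = DSr

def mismatches(f, dont_care=frozenset()):
    """Reachable codes outside dont_care where f differs from the required next state."""
    return {code for code, v in NEXT_LDS.items()
            if code not in dont_care and f(*code) != v}

print(mismatches(lds_sum))                     # set()
print(mismatches(lds_dsr))                     # {(0, 1, 1, 1, 1)}
print(mismatches(lds_dsr, EARLY_ENABLING_DC))  # set()
```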

  31. Before early enabling [Figure: the READ-cycle STG and the netlist over DSr, LDTACK, D, DTACK, LDS.]

  32. Netlist with two constraints. TIMING CONSTRAINTS: LDTACK- before DSr+ and D- before LDS- [Figure: the READ-cycle STG and the resulting netlist over DSr, LDTACK, D, DTACK, LDS.] Both timing assumptions are used for optimization and become constraints.

  33. Deriving automatic timing assumptions. Rule I (out of 6): a, b are non-input events. Untimed ordering: a || b, and a is enabled before b, but not vice versa. Derived assumption: a fires before b. Justification: the delay of one gate can be made shorter than the delay of two (or more) gates: del(a) < del(c) + del(b). [Figure: state graph fragment with events a, b, c.]

  34. Deriving automatic timing assumptions. Rule I (out of 6): a, b are non-input events. Untimed ordering: a || b, and a is enabled before b, but not vice versa. Derived assumption: a fires before b. Justification: the delay of one gate can be made shorter than the delay of two (or more) gates. [Figure: state graph fragment with events a, b, c.] • Effect I: a state becomes DC for all signals

  35. Deriving automatic timing assumptions. Rule I (out of 6): a, b are non-input events. Untimed ordering: a || b, and a is enabled before b, but not vice versa. Derived assumption: a fires before b. Justification: the delay of one gate can be made shorter than the delay of two (or more) gates. [Figure: state graph fragment with events a, b, c.] • Effect II: another state becomes a local DC for the signal of event b
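The transcript gives neither the state graph from these figures nor the other five rules. The sketch below therefore uses a hypothetical six-state graph in which a is enabled in s0 while b becomes enabled only after c fires, and it applies a simplified form of Rule I. In that simplification, concurrency is approximated by "both events are enabled in some state", and all events are treated as non-input events. The derived assumption is then applied by forbidding b to fire while a is still enabled. The functions `rule_one`, `enabled_before` and `apply_assumption` are hypothetical names, not Petrify's implementation.

```python
from collections import deque

# Hypothetical state graph: SG[state] = {event: next_state}.
SG = {
    "s0": {"a": "s2", "c": "s1"},
    "s1": {"a": "s3", "b": "s4"},
    "s2": {"c": "s3"},
    "s3": {"b": "s5"},
    "s4": {"a": "s5"},
    "s5": {},
}

def enabled_before(sg, x, y):
    """x is enabled where y is not, and another event then leads to both being enabled."""
    return any(x in out and y not in out and z != x and {x, y} <= set(sg[t])
               for out in sg.values() for z, t in out.items())

def rule_one(sg):
    """Simplified Rule I: concurrent x, y with x enabled before y but not vice versa."""
    events = sorted({e for out in sg.values() for e in out})
    return [(x, y) for x in events for y in events
            if x != y
            and any({x, y} <= set(out) for out in sg.values())
            and enabled_before(sg, x, y)
            and not enabled_before(sg, y, x)]

def apply_assumption(sg, x, y, start="s0"):
    """Forbid y while x is enabled; return (unreachable states, local DC states for y)."""
    pruned = {s: {e: t for e, t in out.items() if not (e == y and x in out)}
              for s, out in sg.items()}
    seen, queue = {start}, deque([start])
    while queue:
        for t in pruned[queue.popleft()].values():
            if t not in seen:
                seen.add(t)
                queue.append(t)
    local_dc = {s for s in seen if y in sg[s] and y not in pruned[s]}
    return set(sg) - seen, local_dc

print(rule_one(SG))                    # [('a', 'b')]
print(apply_assumption(SG, "a", "b"))  # ({'s4'}, {'s1'})
```

State s4 becomes unreachable, which illustrates Effect I: it is a don't-care for all signals. In state s1, b was enabled but can no longer fire, which illustrates Effect II: s1 is a local don't-care for b.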

  36. Back-annotation of Timing Constraints • Timed circuits require post-verification • Can synthesis tools help? • Report the least stringent set of timing constraints required for the correctness of the circuit • Not all of the initial timing assumptions may be required • Petrify reports a set of constraints on the order of firing that guarantees the correctness of the circuit

  37. Timing constraints generation [Figure: state graph over events a, b, c, d, e.] Assumptions: d before b, c before e, and a before d

  38. Timing constraints generation [Figure: state graph over events a, b, c, d, e.] Assumptions: d before b, c before e, and a before d

  39. Timing constraints generation: correct behavior [Figure: state graph over events a, b, c, d, e.] Assumptions: d before b, c before e, and a before d

  40. Timing constraints generation: incorrect behavior [Figure: state graph over events a, b, c, d, e, with states 1 and 2 marked.] Assumptions: d before b, c before e, and a before d

  41. Covering incorrect behavior [Figure: state graph over events a, b, c, d, e, with states numbered 1 to 5.] Assumptions: d before b, c before e, and a before d. Candidate constraints and the states they cover: d before c {1}; d before b {1, 3}; c before e {2, 4}. Other possible constraints remove states from the assumption domain => invalid

  42. Covering incorrect behavior [Figure: state graph over events a, b, c, d, e, with states numbered 1 to 5.] Assumptions: d before b, c before e, and a before d. Candidates: d before c {1}; c before e {2, 4}. Constraints for the minimal-cost solution: d before c and c before e
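The covering step can be read as a minimum-cost set cover: choose the cheapest set of candidate constraints whose covered states include every incorrect state. The sketch below is an exhaustive solver for small instances. It uses the candidate sets listed on slide 41, while the target states {1, 2, 4} and the unit costs are hypothetical inputs chosen for illustration. With unit costs this instance has two optimal covers of size 2. The slide's choice of d before c and c before e therefore reflects a cost function that the transcript does not spell out. `CANDIDATES` and `min_cost_cover` are hypothetical names.

```python
from itertools import combinations

# Candidate constraints and the incorrect states each one covers (slide 41).
CANDIDATES = {"d before c": {1}, "d before b": {1, 3}, "c before e": {2, 4}}

def min_cost_cover(target, candidates, cost):
    """Exhaustive minimum-cost set cover; adequate for a handful of candidates."""
    best, best_cost = None, float("inf")
    names = sorted(candidates)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            covered = set().union(*(candidates[n] for n in combo))
            total = sum(cost[n] for n in combo)
            if target <= covered and total < best_cost:
                best, best_cost = combo, total
    return best, best_cost

# Hypothetical inputs: states that must be eliminated, and a unit cost per constraint.
target = {1, 2, 4}
chosen, total = min_cost_cover(target, CANDIDATES, {n: 1 for n in CANDIDATES})
print(chosen, total)
```

Replacing the unit costs with a measure of how stringent each constraint is for physical design would let the solver break such ties.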

  43. Timing aware state encoding • Solve only state conflicts reachable in the RT assumptions domain • Generate automatic timing assumptions for inserted state signals => state signals can be implemented as RT logic • State variables inserted concurrently with I/O events => latency and cycle time reduction

  44. Value of Relative Timing • RT circuits provide up to 2-3x (1.3-2x) reduction in delay and area with respect to SI circuits synthesized without (with) concurrency reduction • Automatic generation of timing assumptions => foundation for automatic synthesis of RT circuits with area/performance comparable to or better than manual design • Back-annotation of timing constraints => minimal required timing information for the back-end tools • Timing-aware state encoding allows significant area/performance optimization

  45. Design flow without timing: Specification (STG) → Reachability analysis → State Graph → State encoding → SG with CSC → Boolean minimization → Next-state functions → Logic decomposition → Decomposed functions → Technology mapping → Gate netlist

  46. Design Flow with Timing: Specification (STG + user assumptions) → Reachability analysis → Lazy State Graph → Timing-aware state encoding → Lazy SG with CSC → Boolean minimization → Next-state functions → Logic decomposition → Decomposed functions → Technology mapping → Gate netlist. Additional blocks in the flow: Automatic Timing Assumptions; Required Timing Constraints.

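  47. FIFO example [Figure: FIFO controller with handshake signals li, lo (left) and ri, ro (right), and its STG over the events li+, lo+, li-, lo-, ro+, ri+, ro-, ri-.]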

  48. Speed-Independent Implementation without concurrency reduction: 3 state signals are required

  49. SI implementation with concurrency reduction [Figure: FIFO STG with an inserted state signal x (x+, x-), and a netlist of generalized C-elements (gC) over li, lo, ri, ro, x.]

  50. RT implementation [Figure: two alternative FIFO STGs with state signal x (x+, x-), separated by OR, and the RT netlist over li, lo, ri, ro, x.]
