
A PLA based Asynchronous Micropipelining Approach for Sub-threshold Circuit Design

Authors: Nikhil Jayakumar*, Rajesh Garg*, Bruce Gamache$, Sunil P. Khatri*. *Department of Electrical Engineering, Texas A&M University. $Conexant Systems, Inc.


Presentation Transcript


  1. A PLA based Asynchronous Micropipelining Approach for Sub-threshold Circuit Design Authors: Nikhil Jayakumar*, Rajesh Garg*, Bruce Gamache$, Sunil P. Khatri* *Department of Electrical Engineering, Texas A&M University. $Conexant Systems, Inc.

  2. Outline • Motivation • Introduction • Approach • Results • Conclusions

  3. Sub-threshold Leakage • As supply voltage scales down, the VT of the devices is scaled down as well. • Leakage increases exponentially with decreasing VT. • Leakage power is becoming comparable with dynamic power. • A larger VT would reduce leakage but increase delay. • We can turn this dilemma into an opportunity! • Use sub-threshold leakage current to implement circuits. • Set VDD less than VT.

  4. Advantages of Sub-threshold Circuit Design • We performed simulations on a 21-stage ring oscillator (BPTM 65nm). • Power is significantly lower (100-500X). • PDP (power-delay product) improves by 10-20X. • Transconductance is an exponential function of Vgs. • Circuit noise margins are high. • Ion/Ioff = 100-200. • Circuits get faster at higher temperature.

  5. Disadvantages of Sub-threshold Circuit Design • Ids is highly dependent on PVT variations • Need dynamic compensating circuitry such as the one mentioned in: • “A Variation-tolerant Sub-threshold Design Approach”, N. Jayakumar, S. Khatri [DAC’05] • Used Adaptive Body Biasing. • Ids is small which results in large delay. • Delay gets worse by 10-25X. • Therefore, application space is in very low power applications such as sensor networks. • Design methodologies for sub-threshold digital circuit design are ad-hoc.

  6. Contribution of this paper • Provide a systematic EDA framework for the design of complex digital systems using sub-threshold Network of PLA (NPLA) based circuits. • Use asynchronous micropipelining to provide a greater throughput. • Ideally suited for Data-flow type circuits.

  7. Why NPLAs? • NPLAs are fast and area-efficient when compared to standard-cell based designs - “Cross-talk immune VLSI design using a Network of PLAs Embedded in a Regular Layout Fabric”, S. Khatri, R. Brayton, A. Sangiovanni-Vincentelli [ICCAD’00] • Predictable delay of dynamic PLAs • Good circuit implementation choice for sub-threshold/near-threshold logic. • Regular Layout Structure • Compatible with Restrictive Design Rules (RDRs) required to handle current and future lithographic issues. • Technology-independent optimizations (literal reduction) are better utilized • No intervening technology mapping step. • Implementing Structured ASICs • An array of fixed-size PLAs is ideally suited for implementing Structured ASIC type designs. - “A METAL and VIA Mask Customizable VLSI Design Scheme using an Array of Dynamic PLAs”, N. Jayakumar, S. Khatri [ICCAD’04]

  8. PLA structure – Precharged NOR-NOR [Figure: PLA schematic showing the AND plane and the OR plane]

  9. PLA structure – Precharged NOR-NOR • Inputs run vertically • Wordlines run horizontally • Outputs run vertically • A dummy wordline and a dummy output line are provided for self-timing.

  10. PLA structure – Precharged NOR-NOR • completion is the last signal to switch. • Input latches latch data from the previous level.

  11. Asynchronous Micropipeline Structure • Each PLA has • Data Inputs – D (input) • Data Outputs – O (output) • Hand-shaking control signals – P1, P2 (input) • Control the asynchronous handshake • PLA evaluation/precharge done signal – completion (output) • Switches high when evaluation completes, switches low when precharge completes. • Internal clock signal – INTCLK (output) • Generated from completion, P1 and P2 to control operation of the PLA. • INTCLK = low → PLA precharges • INTCLK = high → PLA evaluates [Figure: PLAs arranged in levels 1, 2, …, n]

  12. Handshaking Logic • PLA p (at level k) precharges (INTCLK goes low) if its P1 rises • This means PLA q at the next higher level has latched the output data of p. • PLA p evaluates (INTCLK goes high) if its P2 rises and its completion signal is low • PLA p is currently in the precharged state (its completion signal is low). • PLA r at the next lower level has completed evaluation and has new data ready (P2 for PLA p has risen). • Handshaking logic is therefore as shown below: [Figure: handshaking logic circuit]
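The two rules on this slide can be sketched as a small behavioural model. This is only an illustration, not the gate-level circuit from the slide's figure: the class name `HandshakeLatch`, the level-sensitive treatment of P1 and P2, and the choice to give P1 priority when both are asserted are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class HandshakeLatch:
    """Behavioural sketch of one PLA's INTCLK control (hypothetical model)."""

    intclk: bool = False  # False = PLA precharges, True = PLA evaluates

    def step(self, p1: bool, p2: bool, completion: bool) -> bool:
        # Rule 1: P1 asserted means the next level has latched our outputs,
        # so the PLA may precharge (INTCLK goes low).
        # Assumption: P1 wins if P1 and P2 are asserted together.
        if p1:
            self.intclk = False
        # Rule 2: P2 asserted (previous level has data ready) while our own
        # completion signal is low (we are precharged) starts evaluation.
        elif p2 and not completion:
            self.intclk = True
        # Otherwise INTCLK holds its previous value.
        return self.intclk


latch = HandshakeLatch()
latch.step(p1=False, p2=True, completion=False)   # starts evaluating
latch.step(p1=True, p2=False, completion=True)    # consumer latched: precharge
```

Note that a PLA whose completion signal is still high ignores a new P2, which is what keeps a stage from re-evaluating before its precharge has finished.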

  13. Micro-Pipeline Operation • Initially all PLAs are precharged. • Drive primary inputs (D of level 1 PLAs). • P2 signals of level 1 PLAs are asserted. • After evaluation is done, completion signals of level 1 PLAs go high. • Therefore level 2 PLAs start evaluating. • Data gets latched at the inputs of level 2 PLAs, and INTCLK of level 2 PLAs goes high. • This causes level 1 PLAs to start precharging. • When evaluation of level 2 PLAs is done, their completion signals go high. • This causes level 3 PLAs to start evaluating. [Figure: PLAs arranged in levels 1, 2, …, n]

  14. Micro-Pipeline Operation • This goes on till the PLAs at level n finish evaluation (indicated by their completion signal going high). • Consumer circuit latches the output and asserts P1 of level n PLAs. • This causes level n PLAs to precharge. • When completion of level n-1 PLAs goes high and level n PLAs have precharged, then level n PLAs can evaluate again. [Figure: PLAs arranged in levels 1, 2, …, n]

  15. Non-micropipelined vs Micropipelined • Delay for non-micropipelined NPLA = Tpchg + n × Teval • Delay of micropipelined PLA = Teval + Tpchg + handshaking time [Figure: PLAs arranged in levels 1, 2, …, n]
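A quick way to see how the two delay expressions compare as the number of levels n grows is the sketch below. The function names are hypothetical. The timing values are the approximate ones quoted on the Experiments slide (Teval ≈ 210 ns, Tpchg ≈ 155 ns, handshaking 2·60 + 25 = 145 ns); the depth n = 10 is purely illustrative and is not taken from any benchmark in the talk.

```python
def non_micropipelined_delay(n_levels: int, t_eval: float, t_pchg: float) -> float:
    """Tpchg + n * Teval: all levels precharge together, then evaluate in sequence."""
    return t_pchg + n_levels * t_eval


def micropipelined_delay(t_eval: float, t_pchg: float, t_handshake: float) -> float:
    """Teval + Tpchg + handshaking time: per-result delay, independent of n."""
    return t_eval + t_pchg + t_handshake


def speedup(n_levels: int, t_eval: float, t_pchg: float, t_handshake: float) -> float:
    """Ratio of the two delays; grows roughly linearly with the number of levels."""
    return non_micropipelined_delay(n_levels, t_eval, t_pchg) / micropipelined_delay(
        t_eval, t_pchg, t_handshake
    )


# Illustrative depth only (assumption): n = 10 levels, times in ns.
ratio = speedup(10, t_eval=210, t_pchg=155, t_handshake=145)
print(f"speedup at n=10: {ratio:.2f}x")
```

Because the micropipelined delay does not depend on n, deeper networks benefit more from micropipelining.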

  16. Verilog Simulation of Micropipelining • We simulated the handshaking protocol in Verilog. • Verified correct operation. • If the consumer circuit holds off asserting P1 for level n PLAs, the entire pipeline stalls. • Note that when level i is in precharge, level i+1 is in evaluation and vice-versa.

  17. Synthesis-Algorithm • First levelize the given multi-level network N • Generate a DFS ordering of network nodes and sort in increasing order of levels • Greedily include new nodes from the multi-level network into the current PLA. • Assume current PLA p has nodes {n} in it. • Candidate nodes {m} for inclusion in PLA p are: • Nodes in the fanout of nodes in {n}. • Nodes at the same level as nodes in {n}. • We evaluate the favorability of each node in {m} as: favorability(m) = 2 × #common_fanins(m, {n}) + #common_fanouts(m, {n}). • The first term favors sharing of inputs with existing nodes {n}, while the second term favors sharing of outputs. • Sharing of inputs was empirically determined to be more useful in yielding smaller PLA counts. • We include the node with the highest favorability value. [Figure: example network with nodes labelled by level]
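The favorability score can be sketched in a few lines. This is a minimal illustration under stated assumptions: "common fanins" is read as the number of fanins of m that are also fanins of some node already in the PLA (and likewise for fanouts), and the network, node names, and helper names are all hypothetical.

```python
def favorability(m, pla_nodes, fanins, fanouts):
    """2 * (#fanins shared with the PLA) + (#fanouts shared with the PLA)."""
    pla_fanins = set().union(*(fanins[n] for n in pla_nodes))
    pla_fanouts = set().union(*(fanouts[n] for n in pla_nodes))
    common_in = len(fanins[m] & pla_fanins)
    common_out = len(fanouts[m] & pla_fanouts)
    return 2 * common_in + common_out


def pick_best(candidates, pla_nodes, fanins, fanouts):
    """Return the candidate with the highest favorability (first one wins ties)."""
    return max(candidates, key=lambda m: favorability(m, pla_nodes, fanins, fanouts))


# Hypothetical toy network: n1 is already in the PLA, m1 and m2 are candidates.
fanins = {"n1": {"a", "b"}, "m1": {"a", "b"}, "m2": {"c"}}
fanouts = {"n1": {"x"}, "m1": {"y"}, "m2": {"x"}}

best = pick_best(["m2", "m1"], {"n1"}, fanins, fanouts)
```

Here m1 shares two inputs with the PLA (score 4) while m2 shares only one output (score 1), so m1 is chosen, reflecting the heavier weight on input sharing.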

  18. Synthesis-Algorithm • Current PLA p is grown until it violates size constraints • Nodes {n} in the current PLA are converted into a two-level network N. • We run espresso on N. • If the number of inputs, the number of outputs and the height of this two-level network are within bounds, then PLA p is grown further. • If not, then we start growing a new PLA. • Build a PLA dependency graph • Each vertex corresponds to a unique PLA • Each edge connects the output of a PLA to the input of another PLA • Nodes being included in the current PLA p are constrained by the following: • The node being included should not violate the size constraints of a PLA. • The inclusion of this node should not result in a cyclic PLA dependency graph. • If such a node is not available, pick the next most favorable node. [Figure: example network with nodes labelled by level]
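The acyclicity constraint on the PLA dependency graph can be checked as sketched below. This is an assumption-laden illustration, not the authors' implementation: the helper names are hypothetical, nodes not yet assigned to a PLA are ignored, the cycle test uses Kahn's topological sort, and the size check (which needs espresso) is not modelled.

```python
from collections import defaultdict, deque


def pla_graph(node_edges, node_to_pla):
    """Collapse node-level edges into edges between distinct PLAs."""
    graph = defaultdict(set)
    for src, dst in node_edges:
        a, b = node_to_pla.get(src), node_to_pla.get(dst)
        if a is not None and b is not None and a != b:
            graph[a].add(b)
    return graph


def is_acyclic(graph):
    """Kahn's algorithm: the graph is acyclic iff every vertex can be removed."""
    verts = set(graph) | {v for succs in graph.values() for v in succs}
    indeg = {v: 0 for v in verts}
    for succs in graph.values():
        for v in succs:
            indeg[v] += 1
    queue = deque(v for v in verts if indeg[v] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in graph.get(u, ()):
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return removed == len(verts)


def can_include(node, pla, node_edges, node_to_pla):
    """True if tentatively placing `node` in `pla` keeps the PLA graph acyclic."""
    trial = dict(node_to_pla)
    trial[node] = pla
    return is_acyclic(pla_graph(node_edges, trial))


# Hypothetical chain a -> b -> c, with a in PLA "P1" and b in PLA "P2".
edges = [("a", "b"), ("b", "c")]
assignment = {"a": "P1", "b": "P2"}
```

In this toy case, placing c in P1 would create the cycle P1 → P2 → P1 and is rejected, while placing it in P2 or in a new PLA is allowed.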

  19. Synthesis-Algorithm • After synthesis, the output of a PLA at level i may drive PLAs at level > i+1 • Such a case will cause micro-pipelining to fail. • Insert stutter blocks for signals which skip one or more levels of PLAs. • Stutter blocks are banks of latches that delay signals which traverse more than one level of PLAs. • Multiple stutter blocks are inserted for signals traversing multiple levels. [Figure: example network of PLA1 to PLA5 with an inserted stutter block]
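Counting the stutter blocks needed on each PLA-to-PLA connection can be sketched as follows. The assumption here, which the slide implies but does not state as a formula, is one stutter block per skipped level, i.e. level(dst) − level(src) − 1 blocks per connection. The function name and the three-PLA example are hypothetical and do not reproduce the slide's figure.

```python
def stutter_blocks_needed(pla_level, edges):
    """Map each level-skipping edge to the number of stutter blocks it needs.

    Assumption: one stutter block per skipped level, so an edge from level i
    to level j > i + 1 needs j - i - 1 blocks. Edges between adjacent levels
    need none and are omitted from the result.
    """
    needed = {}
    for src, dst in edges:
        skipped = pla_level[dst] - pla_level[src] - 1
        if skipped > 0:
            needed[(src, dst)] = skipped
    return needed


# Hypothetical network: PLA1 feeds both PLA2 (adjacent level) and PLA3 (skips a level).
levels = {"PLA1": 1, "PLA2": 2, "PLA3": 3}
connections = [("PLA1", "PLA2"), ("PLA2", "PLA3"), ("PLA1", "PLA3")]

blocks = stutter_blocks_needed(levels, connections)
```

Only the PLA1 → PLA3 connection skips a level, so it is the only one that receives a stutter block.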

  20. Experiments • 65nm technology. • VDD = 0.2V • PLA size: 16 inputs, 14 outputs, 24 rows • Delay, Energy results from SPICE using 65nm BPTM model cards. • Comparison made with non-micropipelined PLA. • Throughput of PLA = 1/(Teval + Tpchg + 2·Heval + Hpchg) • Teval = Evaluation time for a PLA (~210ns) • Tpchg = Precharge time for a PLA (~155ns) • Heval = Handshake time before start of evaluation (~60ns) • Hpchg = Handshake time before start of precharge (~25ns)
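Plugging the approximate times from this slide into the throughput formula gives a feel for the numbers. The sketch below only restates the slide's formula; the function name is hypothetical and the inputs are the rounded values quoted above, so the result is approximate.

```python
def cycle_time_ns(t_eval: float, t_pchg: float, h_eval: float, h_pchg: float) -> float:
    """Denominator of the throughput formula: Teval + Tpchg + 2*Heval + Hpchg."""
    return t_eval + t_pchg + 2 * h_eval + h_pchg


# Approximate values from the slide, in nanoseconds.
cycle = cycle_time_ns(t_eval=210, t_pchg=155, h_eval=60, h_pchg=25)

# 1 / (cycle in ns) is in results per ns; multiply by 1e3 for millions per second.
throughput_mhz = 1e3 / cycle

print(f"cycle time ~ {cycle} ns, throughput ~ {throughput_mhz:.2f} million results/s")
```

With these inputs the cycle time is about 510 ns, i.e. roughly two million results per second per PLA pipeline.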

  21. Results - Delay • Delay = 1/throughput for micropipelined. • Delay is constant since PLA size is fixed.

  22. Results – Area • Area estimates based on layout of PLAs along with stutter blocks.

  23. What about Energy consumption? • Non-micropipelined NPLAs precharge together and then evaluate in a domino fashion. • Energy wasted due to leakage in the “Precharged” and the “Evaluated” states. • Micropipelined PLAs spend little time in the “Precharged” or “Evaluated” states. [Figure: Timing diagram for a non-micropipelined NPLA]

  24. Results – Energy • Results show energy consumption for one computation through the NPLA circuit. • Significant reduction in energy consumption is observed.

  25. Conclusions • We have proposed an asynchronous micropipelined design approach that reclaims some of the speed penalty associated with sub-threshold circuit design. • Ideally suited for data-flow type applications. • We implemented: • Handshaking protocol for micropipelining. • Circuit Design aspects of the approach. • Logic synthesis for micropipelined NPLAs. • We validated the approach with Verilog and SPICE simulations. • Results show that: • Design can be sped up by ~7X. • Area Overhead is ~47%. • Energy consumption is lower by ~4X. • Techniques described can be used for regular operating conditions (VDD > VT) as well.

  26. Thank you. Questions?
