
A Deep Sub-Micron VLSI Design Flow using Layout Fabrics

Sunil P. Khatri (University of Colorado, Boulder), Amit Mehrotra (University of Illinois, Urbana-Champaign), Robert K. Brayton and Alberto L. Sangiovanni-Vincentelli (University of California, Berkeley).


Presentation Transcript


  1. A Deep Sub-Micron VLSI Design Flow using Layout Fabrics • Sunil P. Khatri, University of Colorado, Boulder • Amit Mehrotra, University of Illinois, Urbana-Champaign • Robert K. Brayton and Alberto L. Sangiovanni-Vincentelli, University of California, Berkeley

  2. Our VLSI Design Flow • Logic netlist → Logic Optimization → Optimized logic netlist → Technology Mapping → Placement → Routing → Layout

  3. Motivation • Modern IC processes • Feature size well below 1 micron • Certain electrical effects increasingly important • Cross-talk • Electromigration • Self Heat • Statistical variations • Logic abstraction eroded • Existing design paradigms need to be rethought

  4. Research Focus • The cross-talk issue • Tackled in an ad-hoc manner • Increases turn-around time • Verified cross-talk trends • Accurate 3-D capacitance extraction • Delay variation 2.47:1 (200 μm wires, 10X drivers, 0.1 μm technology) (Figure: wires labeled a and v with capacitances C1 and C2)

  5. Outline • Previous Approaches • New idea: The Fabric Approach • Fabric1 (in DAC-1999) • Standard-cell based design • Fabric3 (in ICCAD-2000) • Network of PLA based design • Further Tasks • Summary

  6. Previous Approaches • [ALPHA 97] : • Metal layers 3 and 6 dedicated to power • Not viable in future processes • [Rubio 94]: • Functional analysis based on layout • Post-layout methods don’t scale • [Kirkpatrick 94, 96] : • Concept of digital sensitivity • Requires don’t-care and image computations

  7. Solution: Layout Fabrics • We handle cross-talk by design • A new layout and design paradigm • Repeating dense wiring fabric (DWF) pattern at minimum pitch (Figure: wire tracks labeled V, S and G)
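The shielding idea behind the DWF can be sketched in a few lines. This is only an illustration: it assumes the repeating track order is V, S, G, S (V = VDD, G = GND, S = signal), which the slide's figure labels suggest but the transcript does not state outright, and the function names `dwf_tracks` and `signals_shielded` are hypothetical.

```python
def dwf_tracks(n_tracks, pattern="VSGS"):
    """Assign a label to each routing track by repeating the pattern.

    The "VSGS" order is an assumption for illustration.
    """
    return [pattern[i % len(pattern)] for i in range(n_tracks)]


def signals_shielded(tracks):
    """Return True if no signal track has another signal track beside it."""
    for i, label in enumerate(tracks):
        if label != "S":
            continue
        neighbors = tracks[max(i - 1, 0):i] + tracks[i + 1:i + 2]
        if "S" in neighbors:
            return False
    return True


tracks = dwf_tracks(8)
print("".join(tracks), signals_shielded(tracks))
```

Because every signal track sits between a power and a ground track under this assumption, the same pattern also delivers VDD and GND everywhere in the routing area.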

  8. Research Contribution • Verify cross-talk trends • Fabric1 [KMBSO99] (in DAC) • Incorporated into traditional design flow • Fabric3 [KBS00] (in ICCAD-00) • Network of PLAs • Detailed electrical characterization • Synthesis, wire removal algorithms • Both utilize DWF pattern • 1.02:1 cross-talk delay variation

  9. Layout Fabrics • Advantages • Pre-characterized parasitics • Uniform, low cross-coupling capacitance • 40X lower, 2% delay variation • Uniform, low signal inductance • Automatic power and ground routing • Uniform, low power and ground resistance • Can effectively implement regular structures • Disadvantages • 5% increase in total capacitance • Area penalty • Power increase

  10. Capacitance in DWF • Experimental setup • “Strawman” process model, copper wires, low-K dielectric • Capacitances from 3-D field solver (space3d) • Simulated three wires in spice • 0.1 micron process, Metal2 wires • Length 200 microns, 10x minimum drivers • Non-DWF • Delay variation 2.47:1 • Signal integrity problems for fast slew rates • With DWF • 40X reduction in cross-coupling capacitance • Delay variation 1.02:1, no signal integrity problem
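A rough way to see why lower cross-coupling capacitance shrinks the delay spread is the usual first-order switching-capacitance model. The sketch below is not the SPICE setup from the slide: it assumes a wire with two signal neighbors, delay proportional to effective capacitance, and made-up capacitance values in arbitrary units. The function name `delay_variation` is hypothetical.

```python
def delay_variation(c_ground, c_couple):
    """Worst-case to best-case delay ratio for a wire with two neighbors.

    First-order model (an assumption, not the slide's simulation):
      best case:  both neighbors switch with the wire, coupling caps carry
                  no charge, so C_eff = c_ground
      worst case: both neighbors switch against the wire, each coupling cap
                  counts twice, so C_eff = c_ground + 4 * c_couple
    Delay is taken as proportional to C_eff.
    """
    best = c_ground
    worst = c_ground + 4.0 * c_couple
    return worst / best


# Illustrative values only (arbitrary units), not extracted data.
print(delay_variation(c_ground=1.0, c_couple=0.5))     # large signal-to-signal coupling
print(delay_variation(c_ground=2.0, c_couple=0.0125))  # coupling cut 40X, more cap to shields
```

In this model the second case moves most of the capacitance onto the quiet V and G shields, so the ratio approaches 1 even though total capacitance does not drop.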

  11. Inductance in the DWF • Low and uniform in DWF • Current return path is at minimum spacing • In regular layout style, varies greatly • Problems reported for clock signals • Compared inductance of Metal8 trace • Verified using ASITIC (Chart: inductance in nH/micron)

  12. VDD/GND Resistance in DWF • Check resistance at various points in DWF • Compare with standard cell case • Varies greatly • Measured at end of row • L/W = 1000/8 (Chart: VDD/GND resistance in ohms)

  13. Buffer Insertion in DWF • Easily performed • VDD and GND available all over routing area

  14. Fabric1 - Introduction • DWF pattern utilized chip-wide • Library cells implemented in this pattern • Synthesis, placement and routing use standard cell methodology (Figure: standard cell versus fabric cell)

  15. Fabric1 - Results

  16. Fabric1 - Results

  17. Fabric3 • Network of Programmable Logic Arrays • Combine many logic nodes into a PLA • Routing area utilizes DWF pattern • PLA implements a multi-output function • example : f = a b + c ; g = a b + c (Figure: AND plane and OR plane with inputs a, b, c and outputs f, g)
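The two-plane structure of a PLA can be mimicked with a small evaluator. This is a minimal sketch: the functions used here (f = a·b + c and g = a·b' + c) are hypothetical stand-ins chosen to show a shared product term, not a reconstruction of the slide's example, and `eval_pla` is an illustrative name.

```python
def eval_pla(and_plane, or_plane, inputs):
    """Evaluate a multi-output PLA.

    and_plane: list of product terms, each a dict {variable: required value}
    or_plane:  dict {output name: list of product-term indices it ORs together}
    inputs:    dict {variable: bool}
    """
    # AND plane: a product term is true when every literal in it matches.
    terms = [all(inputs[var] == want for var, want in cube.items())
             for cube in and_plane]
    # OR plane: an output is true when any of its product terms is true.
    return {out: any(terms[i] for i in rows) for out, rows in or_plane.items()}


# Hypothetical example: f = a*b + c, g = a*(not b) + c. The term "c" is shared.
AND_PLANE = [{"a": True, "b": True}, {"c": True}, {"a": True, "b": False}]
OR_PLANE = {"f": [0, 1], "g": [2, 1]}

print(eval_pla(AND_PLANE, OR_PLANE, {"a": True, "b": True, "c": False}))
```

Sharing the row for `c` between both outputs is what makes a multi-output PLA cheaper than two separate single-output arrays.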

  18. Fabric3 PLA Core Layout (Figure: PLA core layout with signals a, b, f, g and clk)

  19. PLAs v/s Standard Cells • PLAs are dense and fast (Figure: PLA versus standard cell)

  20. PLA Characteristics • Why is the PLA area and delay so low? • Wiring localized within PLA • PLA core transistor sizes are minimum • No p-transistor to n-transistor diffusion spacing • “Gigahertz” chip utilized pre-charged PLAs • High performance • Quick implementation • Didn’t use a network of PLAs

  21. Network of PLAs • PLAs are pre-charged • Inputs to all PLAs must settle before evaluation begins (Figure: network of PLAs with signals a through g)

  22. Network of PLAs • For correct operation: • PLA dependency graph must be acyclic • Evaluation of PLA_i after completion of slowest PLA_j in its “fanin” • Self-timed design style • Each PLA generates a completion signal • Overhead of one wordline, one output • Delay formula to find slowest PLA_j
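The two correctness conditions above (an acyclic dependency graph, and evaluation only after the slowest fanin PLA completes) can be checked with a topological pass. This is a sketch under stated assumptions: the per-PLA delays are given numbers rather than the result of the slide's delay formula, and the names `schedule`, `P1` through `P4` are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError


def schedule(fanins, delay):
    """Compute when each PLA may start evaluating and when it completes.

    fanins: dict {pla: list of PLAs whose outputs it reads}
    delay:  dict {pla: evaluation delay} (assumed given here)
    Raises graphlib.CycleError if the dependency graph is cyclic.
    """
    start, done = {}, {}
    for pla in TopologicalSorter(fanins).static_order():
        # A PLA starts once the slowest PLA in its fanin has completed.
        start[pla] = max((done[f] for f in fanins.get(pla, ())), default=0.0)
        done[pla] = start[pla] + delay[pla]
    return start, done


FANINS = {"P1": [], "P2": [], "P3": ["P1", "P2"], "P4": ["P3", "P1"]}
DELAY = {"P1": 1.0, "P2": 2.5, "P3": 1.0, "P4": 0.5}
start, done = schedule(FANINS, DELAY)
print(start["P3"], done["P4"])
```

In hardware the `max` is realized by the completion signals themselves: a PLA's evaluate input fires only when every fanin PLA has raised its completion output.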

  23. Decomposition • Algorithm collapses wiring into PLAs • Input: multi-level combinational network, W bound, H bound • Output: Correct network of PLAs • Our algorithm greedily grows a PLA until either bound is violated • Attempt to reduce wires by selecting fanouts for inclusion in the PLA being grown
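The greedy growth step can be sketched as follows. This is a simplified illustration, not the authors' algorithm: it assumes width is estimated as 2 × (external inputs) + (outputs) and height as the sum of per-node product-term counts, in place of the espresso minimization and folding the real flow runs, and it keeps the PLA network acyclic by only absorbing nodes whose fanins are already placed. All names are hypothetical.

```python
from collections import defaultdict


def decompose(nodes, fanins, cubes, w_bound, h_bound):
    """Greedily pack logic nodes into PLAs under width/height bounds.

    nodes:  node names in topological order
    fanins: dict {node: list of signals it reads (nodes or primary inputs)}
    cubes:  dict {node: number of product terms}
    """
    node_set = set(nodes)
    fanouts = defaultdict(list)
    for n in nodes:
        for s in fanins[n]:
            if s in node_set:
                fanouts[s].append(n)

    def size(members):
        # Crude size estimate (an assumption, standing in for espresso).
        ins = {s for n in members for s in fanins[n] if s not in members}
        outs = {n for n in members
                if not fanouts[n] or any(f not in members for f in fanouts[n])}
        return 2 * len(ins) + len(outs), sum(cubes[n] for n in members)

    assigned, plas = set(), []
    while len(assigned) < len(nodes):
        current = set()
        while True:
            # Candidates read only primary inputs, earlier PLAs, or this PLA,
            # which keeps the PLA dependency graph acyclic.
            cands = [n for n in nodes
                     if n not in assigned and n not in current
                     and all(s not in node_set or s in assigned or s in current
                             for s in fanins[n])]
            # Prefer fanouts of nodes already inside: absorbing them hides wires.
            cands.sort(key=lambda n: -sum(s in current for s in fanins[n]))
            pick = next((n for n in cands
                         if size(current | {n})[0] <= w_bound
                         and size(current | {n})[1] <= h_bound), None)
            if pick is None:
                break
            current.add(pick)
        if not current:
            raise ValueError("a single node exceeds the W or H bound")
        assigned |= current
        plas.append(sorted(current))
    return plas


NODES = ["n1", "n2", "n3"]
FANINS = {"n1": ["x", "y"], "n2": ["n1", "z"], "n3": ["n2", "w"]}
CUBES = {"n1": 2, "n2": 2, "n3": 2}
print(decompose(NODES, FANINS, CUBES, w_bound=100, h_bound=4))
```

In this toy chain the height bound stops the first PLA after two nodes; the wire `n1` disappears inside it, which is the wire-reduction effect the slide describes.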

  24. Choice of W, H • Choice of W • Driven by synthesis constraints • Large W means larger runtimes • espresso and folding done in inner loop • Used W between 25 and 50 • Choice of H • Driven by power considerations • Large H also affects synthesis runtimes • Used H between 15 and 40

  25. Fabric3 - Decomposition (Figure: example network with nodes a through g grouped step by step into PLAs)

  26. Place/Route Flow • PLA generation using perl script • Layout generated on the fly • 2 Layer experiments: • Placement using vpr • FPGA placement tool • All PLAs have approximately same size • Routing using wolfe • interface to TimberWolfSC and yacr • 3-6 Layer experiments: • Placement using CADENCE qplace • Routing using CADENCE router

  27. Fabric3 - Area Results

  28. Fabric3 - Timing Results

  29. Fabric3 - Results • Timing results essentially unchanged • For C3540, delay variation due to cross-talk is 3.45:1 (Stdcell) versus 1.07:1 (Fabric3)

  30. Fabric3 layout (2 Layer)

  31. Future Tasks • Better algorithms: • Better ways of decomposing original netlist • Refining the fabric: • Alternative denser fabrics • Encoding PLA inputs [Schmookler80] • Connecting gates to PLA outputs • Alternative implementation of logic blocks: • Different PLA styles • Alternative circuits

  32. Summary • Layout fabrics to eliminate cross-talk in DSM VLSI design • New layout and design paradigm • Fix cross-talk by design • Highly regular and predictable • Network of PLA based design flow • PLA decomposition algorithms • Minimal area penalty • 15% timing improvement

  33. Thank you!!
