Digital Integrated Circuits: A Design Perspective
Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolic
Modified and integrated by Davide Bertozzi
Design Methodologies
Array-based design
Late-Binding Implementation
Array-based: Pre-diffused (Gate Arrays), Pre-wired (FPGAs)
• The methodologies covered so far all require a complete run through the fabrication process
- very high NRE (non-recurring expense)
• Array-based implementations have lower manufacturing costs
- attractive for small series
- lower performance/density, higher power
Gate Array: Sea-of-gates
• Wafers of pre-diffused transistors are pre-manufactured
• The desired interconnections are added to determine the overall function of the chip
- only a few more metallization steps, applied onto pre-diffused wafers in a week or less
- manufacturing is independent of the final application (standard masks)
[Figure: uncommitted cell (PMOS row over NMOS row) and committed cell (4-input NOR)]
The channelless layout is called "sea-of-gates" (it also has no predefined contacts)
Gate Array: Primitive cells
[Figure: uncommitted cell (PMOS row over NMOS row)]
• How to determine
- the composition of primitive cells? Need to ensure maximum transistor utilization
- the size of primitive transistors? Need flexibility to drive arbitrary loads
Static design decisions affect a wide range of designs!
Sea-of-gate Primitive Cells
Alternative cell structures
• Using oxide-isolation: isolated cells consist of N transistors
• Using gate-isolation: long rows of transistors share the same diffusion area; some transistors must be tied to Vdd or GND to isolate neighboring gates
In principle, gate-isolation leads to higher transistor density
Sea-of-gate Primitive Cells
Transistor sizing challenge
Interconnect-oriented nature of GAs (propagation delay dominated by interconnect capacitance)
• Favors larger device sizes
- large area overhead when unused
• Connect smaller devices in parallel (e.g., 2 rows of small NMOS transistors, connected in parallel when needed)
• Small devices for pass-transistor logic or memory cells
[Figure: oxide-isolated cell with additional smaller transistors]
Utilization factors largely depend on the application: from 100% (regular structures) to lower than 75%.
Mapping a design onto a gate array is largely automated
Example: Base Cell of Gate-Isolated GA Base cell: 1 pMOS 1 nMOS Cell height: 21 tracks From Smith97
Example: Flip-Flop in Gate-Isolated GA From Smith97
Sea-of-gates
• Memories can be implemented on top of gate arrays
- inefficient (similar to standard cells)
• GAs integrated with memory macros (embedded gate array)
[Figure: random logic and memory subsystem, LSI Logic LEA300K (0.6 µm CMOS)]
Courtesy LSI Logic
Loss of interest
Comparison
• Gate Array:
- lower manufacturing cost
- larger area
- interconnect-centric programming (!)
- regular and fixed layout: load factors, wiring parasitics, ... can be accurately estimated
• Standard cell:
- higher manufacturing cost
- smaller area
- less emphasis on routing
- load factors and parasitics are only known after placement, routing and extraction
Some Gate Array design approaches also leverage regularity and predictability of interconnects
The return of gate arrays?
Via-programmable gate array (VPGA)
• Array of prediffused cells with a superimposed wiring grid
• Exploits regularity of interconnects
[Figure: via-programmable cross-point between metal-5 and metal-6, with programmable via]
[Pileggi02]
Prewired Arrays
Solution: programming in the field, outside the silicon foundry!
Classification of prewired arrays (or field-programmable gate arrays, FPGAs):
• Based on Programming Technique
- Fuse-based (program-once)
- Non-volatile
- RAM-based
• Programmable Logic Style
- Array-based
- Look-up table based
• Programmable Interconnect Style
- Channel-routing
- Mesh networks
Prewired Arrays Starting from a regular array of cells….. • How do we implement programmable logic? How can we commit logic to perform any possible boolean function? • How do we store the program/configuration that commits the programmable array to a certain logic function?
Configuration storage
• Fuse-based FPGA
- use of fuses (to be blown) or antifuses (to be short-circuited)
- small area overhead, but one-time programmable
• Nonvolatile FPGA
- program stored in EEPROM/Flash
- functionality retained until the next programming round
- additional process steps (e.g., ultrathin oxides), high programming voltages
• Volatile FPGA
- program stored in RAM cells
- at power-up, the configuration is reloaded from external non-volatile memory
- RAM cells programmed as a giant shift register
- linear programming vs multi-cell programming
- regular CMOS process is OK
- logic function can be modified on the fly during execution (partial reconfiguration capability)
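The "giant shift register" view of volatile configuration can be sketched in a few lines of Python. This is only an illustrative model: the class name `ConfigChain`, the chain length and the bitstream are hypothetical, and a real device adds framing, addressing and error checking. It shows why linear programming needs one clock cycle per configuration cell.

```python
from typing import List


class ConfigChain:
    """Configuration RAM cells modeled as one long serial shift register (hypothetical model)."""

    def __init__(self, n_cells: int) -> None:
        self.cells: List[int] = [0] * n_cells

    def shift_in(self, bit: int) -> None:
        # On every clock, each cell takes the value of its predecessor
        # and the new bit enters at position 0.
        self.cells = [bit & 1] + self.cells[:-1]

    def load(self, bitstream: List[int]) -> int:
        """Shift a full bitstream in and return the number of clock cycles used."""
        if len(bitstream) != len(self.cells):
            raise ValueError("bitstream length must match the chain length")
        for bit in bitstream:
            self.shift_in(bit)
        return len(bitstream)


chain = ConfigChain(8)
bitstream = [1, 0, 1, 1, 0, 0, 1, 0]
cycles = chain.load(bitstream)
# The first bit shifted in ends up in the last cell of the chain.
print(cycles, chain.cells)
```

In this model the programming time grows linearly with the number of cells, which is the cost that multi-cell (parallel) programming is meant to reduce.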
Antifuse-Based FPGA
[Figure: antifuse cross-section: antifuse polysilicon, 10 nm ONO dielectric, n+ antifuse diffusion]
Open by default, closed by applying a current pulse (melting of the dielectric)
The opposite holds for fuses
From Smith97
Prewired Arrays ….starting from a regular array of cells….. • How do we implement programmable logic? How can we commit logic to perform any possible boolean function? • Array-based approach • Cell-based approach • How do we store the program/configuration that dedicates the programmable array to a certain logic function?
Array-Based Programmable Logic (programmable logic devices, PLD)
• AND array: include input in the minterm
• OR array: include minterm in the output
• PLA: programmable AND array, programmable OR array
• PROM: fixed AND array, programmable OR array
• PAL: programmable AND array, fixed OR array
Fixed arrays trade off flexibility for density and power
[Figure: PLA, PROM and PAL array schematics with inputs I and outputs O; markers distinguish programmable connections from fixed connections]
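The AND-plane/OR-plane structure shared by PLA, PROM and PAL can be sketched as follows. The encoding (1 = true input, 0 = complemented input, `None` = not connected) and the function name `eval_pla` are assumptions made for this illustration, not a vendor format.

```python
from typing import List, Optional, Sequence

# AND plane: one row per product term, one entry per input.
# 1 = use the input, 0 = use its complement, None = input not connected.
AndPlane = List[List[Optional[int]]]
# OR plane: one row per output, one 0/1 entry per product term.
OrPlane = List[List[int]]


def eval_pla(and_plane: AndPlane, or_plane: OrPlane, inputs: Sequence[int]) -> List[int]:
    """Evaluate a two-level AND-OR array for one input vector."""
    products = []
    for row in and_plane:
        # A product term is 1 when every connected literal matches the input.
        term = all(lit is None or inputs[i] == lit for i, lit in enumerate(row))
        products.append(int(term))
    # Each output ORs together the product terms selected in its row.
    return [int(any(p and sel for p, sel in zip(products, out_row)))
            for out_row in or_plane]


# Two inputs (a, b); product terms a.b', a'.b and a.b.
and_plane = [[1, 0], [0, 1], [1, 1]]
# Output 0 = XOR (first two terms), output 1 = AND (third term).
or_plane = [[1, 1, 0], [0, 0, 1]]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, eval_pla(and_plane, or_plane, (a, b)))
```

In this picture a PROM fixes `and_plane` to all 2^n minterms and leaves only `or_plane` programmable, while a PAL leaves `and_plane` programmable and fixes which product terms feed each output.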
Programming a PROM
[Figure: PROM with inputs X2, X1, X0 and outputs f1, f0 (two outputs marked NA); dots mark programmed nodes]
A large fraction of the PROM is unused!
Complex logic functions result in:
• low performance
• low programming density
And in general, no registers or flip-flops!
PLDs are less and less attractive
More Complex PAL
How can I implement sequential logic with PLDs?
• Outputs can be fed back as a subset of the inputs
• Programmable D, T, J-K or clocked S-R flip-flop
• i inputs, j minterms/macrocell, k macrocells
From Smith97
Multi-level logic advantages
Reduced sum-of-products form: x = ADF + AEF + BDF + BEF + CDF + CEF + G
• 6 x 3-input AND gates + 1 x 7-input OR gate (may not exist!)
• 25 wires (19 literals plus 6 internal wires)
Factored form: x = (A + B + C)(D + E)F + G
• 1 x 3-input OR gate, 2 x 2-input OR gates, 1 x 3-input AND gate
• 10 wires (7 literals plus 3 internal wires)
[Figure: gate-level schematics of the two forms]
Such optimizations are unsupported by PLAs
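The equivalence of the two forms and the literal counts quoted on this slide can be checked exhaustively. The short Python sketch below uses illustrative helper names (`sum_of_products`, `factored`, `count_literals`) and simply counts letters in the expression text to obtain the literal counts.

```python
from itertools import product


def sum_of_products(a, b, c, d, e, f, g):
    """x = ADF + AEF + BDF + BEF + CDF + CEF + G"""
    return ((a and d and f) or (a and e and f) or (b and d and f) or
            (b and e and f) or (c and d and f) or (c and e and f) or g)


def factored(a, b, c, d, e, f, g):
    """x = (A + B + C)(D + E)F + G"""
    return ((a or b or c) and (d or e) and f) or g


# Compare both forms on all 2^7 input combinations.
equivalent = all(
    bool(sum_of_products(*v)) == bool(factored(*v))
    for v in product((False, True), repeat=7)
)


def count_literals(expr: str) -> int:
    """Count variable occurrences (letters) in an expression string."""
    return sum(ch.isalpha() for ch in expr)


sop_literals = count_literals("ADF + AEF + BDF + BEF + CDF + CEF + G")
factored_literals = count_literals("(A + B + C)(D + E)F + G")
print(equivalent, sop_literals, factored_literals)
```

Adding the internal wires (6 AND outputs in the first form, 3 gate outputs in the second) gives the 25 and 10 wires listed above.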
Array-based Programmable Logic
• (+) REGULAR STRUCTURE
- accurate parasitic, area, power, speed estimates
• (+) SUITABLE FOR 2-LEVEL LOGIC
- e.g., functions with a large fan-in
- or functions that map well into 2-level logic (e.g., FSMs)
• (-) HIGHER OVERHEAD
- capacitance of intermediate nodes negatively affects performance and power
- risk of underutilization, especially PLAs (and waste of power)
The alternative is CELL-BASED PROGRAMMABLE LOGIC...
2-input mux as programmable logic block
A mux used as logic function generator: F = A when S = 0, F = B when S = 1

Configuration
A  B  S  | F
0  0  0  | 0
0  X  1  | X
0  Y  1  | Y
0  Y  X  | XY
X  0  Y  | XY'
Y  0  X  | X'Y
Y  1  X  | X + Y
1  0  X  | X'
1  0  Y  | Y'
1  1  1  | 1
(' denotes complement)

By properly connecting inputs A, B and S to variables X and Y, 10 different logic functions can be obtained
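The configuration table of the mux can be verified by enumeration. In the Python sketch below, the connection encoding (strings "0", "1", "X", "Y") and the helper names are assumptions made for illustration. Each (A, B, S) connection is evaluated for all values of X and Y and compared with the function it is expected to generate.

```python
from itertools import product


def mux2(a: int, b: int, s: int) -> int:
    """2-input multiplexer: F = A when S = 0, F = B when S = 1."""
    return b if s else a


# (A, B, S) connections and the function each one is expected to produce.
CONFIGS = [
    (("0", "0", "0"), lambda x, y: 0),
    (("0", "X", "1"), lambda x, y: x),
    (("0", "Y", "1"), lambda x, y: y),
    (("0", "Y", "X"), lambda x, y: x & y),
    (("X", "0", "Y"), lambda x, y: x & (1 - y)),
    (("Y", "0", "X"), lambda x, y: (1 - x) & y),
    (("Y", "1", "X"), lambda x, y: x | y),
    (("1", "0", "X"), lambda x, y: 1 - x),
    (("1", "0", "Y"), lambda x, y: 1 - y),
    (("1", "1", "1"), lambda x, y: 1),
]


def resolve(name: str, x: int, y: int) -> int:
    """Map a connection name to a constant or to the value of X or Y."""
    return {"0": 0, "1": 1, "X": x, "Y": y}[name]


def truth_table(conn):
    """Mux output for (X, Y) = 00, 01, 10, 11 under the given connection."""
    a, b, s = conn
    return tuple(
        mux2(resolve(a, x, y), resolve(b, x, y), resolve(s, x, y))
        for x, y in product((0, 1), repeat=2)
    )


tables = [truth_table(conn) for conn, _ in CONFIGS]
expected = [tuple(fn(x, y) for x, y in product((0, 1), repeat=2)) for _, fn in CONFIGS]
print(tables == expected, len(set(tables)))
```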
Logic Cell of Actel Fuse-Based FPGA
• More complex logic gates with multiple muxes
• Used in Actel fuse-based FPGA
• Any 2- or 3-input logic function; some 4-input logic functions; a latch
Look-up Table Based Logic Cell EXOR inference The Look-up table stores the truth table of a logic function (with n inputs, any logic function of n inputs can be implemented)
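A look-up table is simply a small memory addressed by the inputs. The sketch below is a minimal Python model; the class name `LUT` and the convention that the first input is the most significant address bit are assumptions made for this example. It fills a 2-input LUT with the EXOR truth table.

```python
from itertools import product
from typing import Callable, List, Sequence


class LUT:
    """n-input look-up table: the stored bits are the truth table of the function."""

    def __init__(self, n_inputs: int, table: Sequence[int]) -> None:
        if len(table) != 2 ** n_inputs:
            raise ValueError("an n-input LUT needs exactly 2**n configuration bits")
        self.n_inputs = n_inputs
        self.table: List[int] = list(table)

    @classmethod
    def from_function(cls, n_inputs: int, fn: Callable[..., int]) -> "LUT":
        # Enumerate inputs with the first input as the most significant bit.
        table = [int(bool(fn(*bits))) for bits in product((0, 1), repeat=n_inputs)]
        return cls(n_inputs, table)

    def __call__(self, *inputs: int) -> int:
        # The input vector is the address into the stored truth table.
        index = 0
        for bit in inputs:
            index = 2 * index + (bit & 1)
        return self.table[index]


exor = LUT.from_function(2, lambda a, b: a ^ b)
print(exor.table, [exor(a, b) for a, b in product((0, 1), repeat=2)])
```

Reprogramming the same cell to another function only changes the stored bits, for example `LUT.from_function(2, lambda a, b: a & b)` for an AND gate.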
LUT-Based Logic Cell
Extensions for sequential cells
[Figure: LUT feeding a D flip-flop (D, Q, CLK) and a multiplexer controlled by Sel]
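The sequential extension can be modeled as a LUT followed by a D flip-flop, with a configuration bit choosing which of the two drives the cell output. The Python sketch below assumes that the Sel multiplexer selects between the combinational LUT output and the registered output Q; the class `LogicCell` and the feedback wiring in the example are hypothetical.

```python
from typing import List, Sequence


class LogicCell:
    """LUT followed by a D flip-flop; `use_register` plays the role of the Sel bit (assumed)."""

    def __init__(self, lut_bits: Sequence[int], use_register: bool) -> None:
        self.lut_bits: List[int] = list(lut_bits)
        self.use_register = use_register
        self.q = 0  # flip-flop state

    def lut(self, inputs: Sequence[int]) -> int:
        # First input is the most significant address bit.
        index = 0
        for bit in inputs:
            index = 2 * index + (bit & 1)
        return self.lut_bits[index]

    def clock(self, inputs: Sequence[int]) -> None:
        # Rising clock edge: the flip-flop samples the LUT output.
        self.q = self.lut(inputs)

    def output(self, inputs: Sequence[int]) -> int:
        # Sel multiplexer: registered or purely combinational output.
        return self.q if self.use_register else self.lut(inputs)


# Toggle flip-flop: the LUT holds XOR(enable, q), with q fed back through routing.
cell = LogicCell([0, 1, 1, 0], use_register=True)
trace = []
for _ in range(4):
    cell.clock((1, cell.q))
    trace.append(cell.output((1, cell.q)))
print(trace)
```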
Sizing LUTs
• A small LUT size increases the number of logic levels needed to implement a function and hence increases circuit delay.
• A large LUT size increases silicon area and cost, since some LUT inputs are left unused by the implemented logic.
Source: Altera white paper: FPGA Architecture
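The trade-off can be made concrete with a rough calculation. The Python sketch below assumes a function that decomposes into a simple tree of k-input LUTs (for example a wide AND) and groups signals level by level; it is not an optimal technology mapping, and the function names are illustrative.

```python
def lut_config_bits(k: int) -> int:
    """A k-input LUT stores one bit per input combination."""
    return 2 ** k


def tree_levels(n_inputs: int, k: int) -> int:
    """Logic levels needed to combine n_inputs signals with a tree of k-input LUTs."""
    if k < 2:
        raise ValueError("LUT size must be at least 2")
    levels = 0
    signals = n_inputs
    while signals > 1:
        # Each level groups up to k signals into one LUT (ceiling division).
        signals = -(-signals // k)
        levels += 1
    return levels


for k in (2, 4, 6):
    print(f"k={k}: {lut_config_bits(k)} bits per LUT, "
          f"{tree_levels(16, k)} levels for a 16-input function")
```

Under these assumptions, going from 2-input to 4-input LUTs halves the depth of a 16-input function while quadrupling the bits per LUT; going from 4 to 6 inputs quadruples the bits again without reducing the depth in this particular case.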
LUT-Based Logic Cell
CLB for Xilinx 4000 Series
[Figure: CLB with control inputs C1...C4, two 4-input logic-function blocks (inputs D1-D4 and F1-F4), bit-control blocks, and multiplexers controlled by the configuration program]
Complex cells are built by adding more LUTs, increasing LUT size, and inserting flip-flops and muxes
Courtesy Xilinx
Array-Based Programmable Wiring
[Figure: cells with input/output pins on a grid of horizontal and vertical tracks; each interconnect point is a pass transistor controlled by a memory cell M (Flash or SRAM); programmed interconnections are highlighted]
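The wiring array can be modeled as a matrix of memory cells, one per crossing of a horizontal and a vertical track. The Python sketch below is a hypothetical model (the class `CrossbarWiring` and the track naming are assumptions); it determines which tracks end up electrically connected after programming and how many switches the array needs.

```python
from typing import Iterable, List, Tuple

Track = Tuple[str, int]  # ("H", i) for a horizontal track, ("V", j) for a vertical one


class CrossbarWiring:
    """Grid of tracks with one programmable pass-transistor switch per crossing."""

    def __init__(self, n_horizontal: int, n_vertical: int) -> None:
        self.n_h = n_horizontal
        self.n_v = n_vertical
        # One memory cell M per crossing; 1 closes the pass transistor.
        self.m: List[List[int]] = [[0] * n_vertical for _ in range(n_horizontal)]

    def program(self, points: Iterable[Tuple[int, int]]) -> None:
        for h, v in points:
            self.m[h][v] = 1

    def switch_count(self) -> int:
        # Every crossing costs one pass transistor and one memory cell.
        return self.n_h * self.n_v

    def connected(self, a: Track, b: Track) -> bool:
        """True if tracks a and b are shorted together through closed switches."""
        seen = {a}
        frontier = [a]
        while frontier:
            kind, idx = frontier.pop()
            if kind == "H":
                neighbours = [("V", v) for v in range(self.n_v) if self.m[idx][v]]
            else:
                neighbours = [("H", h) for h in range(self.n_h) if self.m[h][idx]]
            for n in neighbours:
                if n not in seen:
                    seen.add(n)
                    frontier.append(n)
        return b in seen


grid = CrossbarWiring(3, 3)
grid.program([(0, 1), (2, 1)])  # H0 and H2 both tied to V1
print(grid.connected(("H", 0), ("H", 2)), grid.connected(("H", 0), ("H", 1)))
print(grid.switch_count())
```

The switch count grows with the product of the track counts, which illustrates the large number of transistors and control signals that a fully populated pass-transistor array requires.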
Crossing points
• Pass transistor
- large number of transistors and control signals
- high fan-out wires: delay and power
• Fuse/antifuse
- fuses: long programming times (few connections usually needed)
- antifuses require less programming
- one-time programmable
Array-based wire programming has been successful only in the write-once class of FPGAs
Mesh-based Interconnect Network
• Each logic cell output is routed north, west, south or east
• Connectivity through RAM-programmable switching or connect matrices
[Figure: switch box, connect box, interconnect point]
Courtesy DeHon and Wawrzynek
Transistor Implementation of Mesh
• The pass transistor induces a threshold-voltage drop, which limits performance
- remedies: level restorers, zero-Vth transistors, boosted control signals, ...
• Inefficient for global interconnects
Courtesy DeHon and Wawrzynek
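The threshold-voltage drop can be illustrated with a first-order model of an NMOS pass transistor: it stops conducting once its output rises to within one threshold of the gate voltage. The Python sketch below ignores body effect and leakage, and the supply and threshold values are hypothetical numbers chosen only for the example.

```python
def nmos_pass_high(v_in: float, v_gate: float, v_tn: float) -> float:
    """Highest voltage an NMOS pass transistor can deliver (first-order model).

    The transistor turns off once its source reaches v_gate - v_tn,
    so the output is limited to that value.
    """
    return min(v_in, v_gate - v_tn)


VDD = 2.5   # hypothetical supply voltage, in volts
V_TN = 0.5  # hypothetical NMOS threshold voltage, in volts

# Gate driven at VDD: the passed "1" is degraded by one threshold.
degraded = nmos_pass_high(VDD, VDD, V_TN)

# Boosted control signal (gate at VDD + V_TN): the full swing is restored.
boosted = nmos_pass_high(VDD, VDD + V_TN, V_TN)

# Zero-Vth transistor: no drop even with the gate at VDD.
zero_vth = nmos_pass_high(VDD, VDD, 0.0)

print(degraded, boosted, zero_vth)
```

The reduced high level weakens the drive of the following gate, which is why the remedies listed on the slide aim to restore the full swing.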
Hierarchical Mesh Network
• Most mesh-based FPGA architectures offer alternative wiring resources that allow effective global wiring
• Reduced fanout and reduced resistance
Courtesy DeHon and Wawrzynek
ALTERA EPLD Block Diagram
• Nonvolatile FPGA
• Logic cells are PLA elements (called Logic Array Blocks, LABs)
• 16 macrocells per LAB
[Figure: block diagram showing primary inputs and a macrocell]
Courtesy Altera
Altera MAX From Smith97
Altera MAX Interconnect Architecture
• Array-based (MAX 3000-7000)
- simple, predictable
- does not scale well
• Mesh-based (MAX 9000)
- wide channels (48 to 96 wires)
- beyond 560 macrocells
[Figure: array-based scheme with LAB1, LAB2, ..., LAB6 connected through the PIA (delay t_PIA); mesh-based scheme with LABs between row channels and column channels]
Courtesy Altera
Xilinx 4000 Interconnect Architecture
• Combines look-up table based approach with mesh-based interconnect
• Low-delay inter-CLB connections
• Distributed over long distances
• Can also be configured as array of memory cells
[Figure: CLB surrounded by its routing resources: quad, single, double and long lines (12 quad, 8 single, 4 double, 3 long), direct connect, global clock and carry chain]
Courtesy Xilinx
RAM-based FPGA
Xilinx XC4025
• Horizontal and vertical routing channels easily recognizable
• 1000 CLBs: 32x32 array
• 25000 equivalent gates
• 422 kbits programming RAM
• CLB at 250 MHz
• Multi-CLB adder: 20-50 MHz
• One 32-bit adder: 62 CLBs
Courtesy Xilinx
Heterogeneous Programmable Platforms
Centered around an FPGA
Xilinx Virtex-II Pro
• FPGA fabric
• Embedded memories
• Embedded PowerPC
• Hardwired multipliers
• High-speed I/O (3.125 Gbps transceivers)
Courtesy Xilinx
Berkeley Pleiades Processor
Centered around an ARM8 core
- ARM8: system manager
- intensive computations offloaded to a reconfigurable datapath (adders, multipliers, ASIP, ...)
- FPGA for bit manipulation
[Figure: chip photo with FPGA, reconfigurable data-path, interface and ARM8 core]
• 0.25 µm 6-level metal CMOS
• 5.2 mm x 6.7 mm
• 1.2 million transistors
• 40 MHz at 1 V
• 2 extra supplies: 0.4 V, 1.5 V
• 1.5~2 mW power dissipation