Xilinx FPGA Architecture Overview

coppola
Presentation Transcript


  1. Xilinx FPGA Architecture Overview

  2. Virtex/Spartan-II Top-level Architecture • Gate-array like architecture • Configurable logic blocks • Implement logic here! • I/O blocks • 16 signal standards • Block RAM • On-chip memory for higher performance • Clocks & Delay-Locked Loop • Interconnect resources • Three-state internal buses

  3. Logic Cell Capacity • A better first-order alternative to gate counting • Better comparisons among different FPGAs • Logic cell definition: • 4-input look-up table + dedicated flip-flop • Logic cells per CLB: • XC4000/Spartan: 2.375 (2 4-LUTs, 1 3-LUT, 2 FFs) • Virtex/Spartan-II: 4.5 (4 4-LUTs, 1 F5MUX, 4 FFs)
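As a rough illustration of how the per-CLB figures turn into a device-level capacity number, here is a minimal sketch. The CLB counts used below are hypothetical and do not refer to any particular part; only the 2.375 and 4.5 ratios come from the slide.

```python
# Logic cells per CLB, as quoted on the slide.
LOGIC_CELLS_PER_CLB = {
    "XC4000/Spartan": 2.375,
    "Virtex/Spartan-II": 4.5,
}


def logic_cells(family: str, clb_count: int) -> float:
    """Estimate logic-cell capacity from a CLB count."""
    return LOGIC_CELLS_PER_CLB[family] * clb_count


# Hypothetical CLB counts, chosen only for illustration.
print(logic_cells("Virtex/Spartan-II", 600))
print(logic_cells("XC4000/Spartan", 576))
```

The point of the metric is that two devices from different families can be compared on the same scale even though their CLBs differ in size.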

  4. Configurable Logic Block (CLB) • Combinational logic generated in a lookup table (LUT) • Any function of available inputs • LUT output feeds CLB output or D input of flip-flop • [Diagram: inputs feed a combinational logic function (LUT), which drives a flip-flop and the CLB outputs]

  5. Virtex/Spartan-II Function Generators • Four 4-input function generators • Independent inputs (4 functions of 4 inputs) • MUXF5 combines 2 LUTs to form • 4x1 multiplexer • Or any 5-input function • MUXF6 combines 2 slices to form • 8x1 multiplexer • Or any 6-input function • [Diagram: a CLB with two slices, each containing two LUTs and a MUXF5, joined by a MUXF6]
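The claim that MUXF5 plus two 4-input LUTs can form any 5-input function follows from Shannon expansion: one LUT holds the function with the fifth input fixed at 0, the other holds it with the fifth input fixed at 1, and the multiplexer selects between them. The sketch below is a software model of that idea, not a description of the actual silicon; all function names are made up for this example.

```python
from itertools import product


def address_bits(address):
    """Unpack a 4-bit LUT address into (i3, i2, i1, i0)."""
    return ((address >> 3) & 1, (address >> 2) & 1, (address >> 1) & 1, address & 1)


def split_for_muxf5(f5):
    """Split a 5-input function into two 16-entry LUT tables."""
    lut_when_0 = [f5(0, *address_bits(a)) for a in range(16)]
    lut_when_1 = [f5(1, *address_bits(a)) for a in range(16)]
    return lut_when_0, lut_when_1


def eval_with_muxf5(lut_when_0, lut_when_1, select, i3, i2, i1, i0):
    """Evaluate both LUTs at the same address; the mux picks one by 'select'."""
    address = (i3 << 3) | (i2 << 2) | (i1 << 1) | i0
    return lut_when_1[address] if select else lut_when_0[address]


def majority5(a, b, c, d, e):
    """Example 5-input function: 1 when at least three inputs are 1."""
    return int(a + b + c + d + e >= 3)


lut0, lut1 = split_for_muxf5(majority5)
matches = all(
    eval_with_muxf5(lut0, lut1, *bits) == majority5(*bits)
    for bits in product((0, 1), repeat=5)
)
print(matches)
```

Applying the same split once more, with MUXF6 choosing between two such slice outputs, gives any 6-input function.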

  6. Lookup Table • Generates any function of its inputs • Typically 4 inputs • Logically equivalent to a 16 x 1 ROM • Example LUT contents (inputs → output): 0000 → 0, 0001 → 1, 0010 → 1, 0011 → 0
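The 16 x 1 ROM view of a LUT can be modeled in a few lines: the four inputs form an address, and the stored bit at that address is the output. In this sketch the table is filled with 4-input odd parity, which agrees with the four rows shown on the slide; the other twelve entries are an assumption made for the example.

```python
def make_lut(truth_table):
    """Build a 4-input LUT from a 16-entry table of 0/1 values."""
    if len(truth_table) != 16:
        raise ValueError("a 4-input LUT needs exactly 16 entries")

    def lut(i3, i2, i1, i0):
        # The inputs are simply the ROM address.
        address = (i3 << 3) | (i2 << 2) | (i1 << 1) | i0
        return truth_table[address]

    return lut


# Odd parity of the 4 address bits: 0000->0, 0001->1, 0010->1, 0011->0, ...
parity_table = [bin(address).count("1") % 2 for address in range(16)]
parity_lut = make_lut(parity_table)

print(parity_lut(0, 0, 0, 1))
print(parity_lut(0, 0, 1, 1))
```

Changing the function means changing only the 16 stored bits, which is why the limit is on the number of inputs rather than on how complex the function is.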

  7. Targeting LUT-based Logic • LUT limit is on inputs, not complexity • Reducing inputs/function (fan-in) to fit CLBs improves density and speed • Automatically done by Xilinx synthesis and implementation tools • Inverters are free • [Diagram: lookup table inside a CLB]

  8. Duplicating Logic Can Improve Results • Collapsing of logic into CLBs affects number of levels required and therefore speed • The gates you use will determine mapping • Nets with a fanout >1 may be outside a CLB • [Diagram, before: N1 must go to two places, so O1 may require a second level of logic] • [Diagram, after: duplicating the first gate (N1A, N1B) allows N1A to always be collapsed inside a single lookup table]

  9. Defining Lookup Tables With Gate Primitives • Example of gate primitive (AND2 shown) • Up to five inputs with all combinations of inversion • AND2B1 indicates 1 “bubbled” or inverted input • Up to nine inputs non-inverted • Add external INV primitives if desired

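  10. Flip-Flops • Stores data (D) on rising edge of clock (K) • Clock enable (CE) • Asynchronous clear (C) • Truth table (K, CE, C, D → Q): X, x, 1, x → 0; rising edge, 1, 0, d → d; 0, x, 0, x → q • [Diagram: flip-flop symbol with D, CE, K and C inputs and Q output]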
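The flip-flop behavior on this slide (capture D on a rising edge of K when CE is high, clear Q asynchronously when C is high) can be sketched as a small behavioral model. The class and method names are invented for this example, and the model ignores setup and hold timing.

```python
class DFlipFlop:
    """Behavioral model: rising-edge D flip-flop with clock enable and async clear."""

    def __init__(self):
        self.q = 0
        self._previous_k = 0

    def step(self, k, ce, c, d):
        """Apply one set of input values and return the resulting Q."""
        rising_edge = (k == 1 and self._previous_k == 0)
        if c:
            # Asynchronous clear wins regardless of clock or enable.
            self.q = 0
        elif rising_edge and ce:
            self.q = d
        # Otherwise Q holds its previous value.
        self._previous_k = k
        return self.q


ff = DFlipFlop()
trace = [
    ff.step(k=0, ce=1, c=0, d=1),  # no edge yet: Q holds 0
    ff.step(k=1, ce=1, c=0, d=1),  # rising edge with CE: Q takes D
    ff.step(k=0, ce=1, c=0, d=0),  # falling edge: Q holds
    ff.step(k=1, ce=0, c=0, d=0),  # rising edge but CE low: Q holds
    ff.step(k=1, ce=1, c=1, d=1),  # clear asserted: Q forced to 0
]
print(trace)
```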

  11. Additional Flip-Flop Controls • Reset (Clear) and/or Set • Global initialization (GSR) • Use to initialize all flip-flops • Programmable clock polarity • Clock enable can be left unconnected

  12. Virtex/Spartan-II CLB Slice • 1 CLB holds 2 slices • Each slice has two sets of • Four-input LUT • Any 4-input logic function • Or 16-bit x 1 RAM • Or 16-bit shift register • Carry & Control • Fast arithmetic logic • Multiplier logic • Multiplexer logic • Storage element • Latch or flip-flop • Set and reset • True or inverted inputs • Sync. or Async. Control
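The 16-bit shift register mode reuses the LUT's 16 storage bits: each clock shifts a new bit in, and the four LUT inputs select which of the 16 stored bits appears at the output. The following is a minimal software model under that assumption; the class name and the convention that address 0 returns the most recently shifted bit are choices made for this sketch.

```python
class ShiftRegisterLUT:
    """Model of a LUT used as a 16-bit shift register with an addressable tap."""

    DEPTH = 16

    def __init__(self):
        self.bits = [0] * self.DEPTH

    def clock(self, data_in):
        """Shift one bit in; the oldest bit falls off the end."""
        self.bits = [data_in] + self.bits[:-1]

    def read(self, address):
        """Return the bit that was shifted in (address + 1) clocks ago."""
        return self.bits[address]


srl = ShiftRegisterLUT()
for bit in (1, 0, 1, 1):
    srl.clock(bit)

print([srl.read(a) for a in range(4)])
```

Reading different addresses gives different delays from the same storage, which is why one LUT can stand in for a chain of up to 16 flip-flops.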

  13. Dedicated Multiplier Logic • Highly efficient ‘Shift & Add’ implementation • For a 16x16 multiplier • 30% reduction in area • 1 fewer logic level
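For readers unfamiliar with the term, shift-and-add multiplication forms one partial product per multiplier bit (the multiplicand shifted left by the bit position) and sums those whose bit is 1. The sketch below shows the arithmetic only; it says nothing about how the dedicated logic maps it onto slices, and the function name is made up for this example.

```python
def shift_add_multiply(multiplicand, multiplier, width=16):
    """Multiply two unsigned 'width'-bit integers by summing shifted partial products."""
    limit = 1 << width
    if not (0 <= multiplicand < limit and 0 <= multiplier < limit):
        raise ValueError("operands must fit in the given width")

    product = 0
    for bit_position in range(width):
        if (multiplier >> bit_position) & 1:
            # Partial product: multiplicand shifted to this bit's weight.
            product += multiplicand << bit_position
    return product


print(shift_add_multiply(1234, 5678) == 1234 * 5678)
```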

  14. On-chip RAM • All Xilinx FPGAs use RAM-based programming • Adding Write Enable to LUT creates on-chip SelectRAM memory

  15. SelectRAM Benefits • Single-Port • Synchronous • Simple timing • Dual-Port • [Diagram, single-port: Data, Write Enable, Write Clock and Address inputs; Output] • [Diagram, dual-port: Data, Write Enable, Write Clock, Write Address/Single-Port Read Address and Dual-Port Read Address inputs; Single-Port Output and Dual-Port Output]

  16. Memory Bandwidth and Flexibility • Virtex/Spartan-II On-Chip SelectRAM+ Memory • 200 MHz Memory Continuum • Distributed RAM (bytes): DSP coefficients, small FIFOs, shallow/wide, 16x1 • Block RAM (kilobytes): large FIFOs, packet buffers, video line buffers, cache tag memory, deep/wide, 4Kx1, 2Kx2, 1Kx4, 512x8, 256x16 • External RAM (megabytes): SDRAM, ZBTRAM, SSRAM, SGRAM

  17. Spartan-II Memory • CLB LUTs provide small distributed RAM (16 bits/LUT) • Block RAM provides 4K bits each • Dual read/write port. Each port has… • Independent Clock, R/W, and Enable • Independently configurable data width from 4K x 1 to 256 x 16 • [Diagram: Spartan-II dual-R/W-port block RAM with read (R) and write (W) access on Port A and Port B]
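The configurable data widths all describe the same 4K bits viewed at different aspect ratios. A short sketch that derives the depth and address width for each data width, assuming the power-of-two widths from 1 to 16 implied by the "4K x 1 to 256 x 16" range:

```python
BLOCK_RAM_BITS = 4096  # 4K bits per block, as stated on the slide


def aspect_ratios(total_bits=BLOCK_RAM_BITS, widths=(1, 2, 4, 8, 16)):
    """Return (depth, width, address_bits) for each supported data width."""
    ratios = []
    for width in widths:
        depth = total_bits // width
        address_bits = depth.bit_length() - 1  # depth is a power of two
        ratios.append((depth, width, address_bits))
    return ratios


for depth, width, address_bits in aspect_ratios():
    print(f"{depth} x {width}: {address_bits} address bits")
```

Because each port's width is set independently, one port could in principle write 16-bit words while the other reads the same storage a bit at a time.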

  18. I/O Block (IOB) • Periphery of identical I/O blocks • Input, output, or bi-directional • Direct or registered (or latched input) • Pullup/Pulldown • Programmable slew rate • Three-state output • Programmable thresholds • [Diagram: IOB with I, O, TS and clock signals, connected to a pad bonded to a package pin]

  19. Use Special IOB Primitives • User explicitly defines what resources in the IOB are to be used • I/Os are defined with • 1 pad primitive • At least 1 function primitive • 1 input element, 1 output element or both • Inverters may also be pulled into IOBs • [Diagram: IPAD and IBUF primitives]

  20. Locking Down I/O Locations • LOC=Pxx attribute defines I/O pad location(s) • Avoid locking IOBs early • Makes routing more difficult • Use IOB LOC= to lock pins late in design cycle once PCB is built • Can lock IOBs if floorplanning the connected CLBs

  21. Use Pullups/Pulldowns • Pullup automatically connected on unused IOBs • User can specify PULLUP or PULLDOWN primitive on used IOBs • Inputs should not be left floating • Add Pullup to design inputs that may be left floating to reduce power and noise • [Diagram: IPAD and IBUF primitives]

  22. Faster Setup With NODELAY • Delay included by default • Compensates for clock routing delay so that the input needs no hold time • NODELAY attribute removes delay element • Creates a hold-time requirement • [Diagram: example IOB in which external data passes from the pad through the input buffer and delay element to the flip-flop D input, while the external clock reaches the flip-flop as a routed clock after a routing delay]

  23. Slew Rate Control • Slew rate controls output speed • Default slow slew rate reduces noise & ground bounce • Use fast slew rate wherever speed is important • FAST parameter on output logic primitive • [Diagram: OBUF marked FAST driving an OPAD]

  24. Output Three-State Control • Free inverter on output buffer control • Use OBUFE macro for active-high enable • Use OBUFT primitive for active-low enable • [Diagram: OBUFE with OE control; OBUFT with T control]

  25. Global Three-State • 3-state control is local and/or via a dedicated global net • Global three-state controlled by STARTUP... primitive • [Diagram: STARTUP symbol with GTS and GSR inputs]

  26. Virtex/Spartan-II I/O Block (Simplified)

  27. Multiple I/O Interface Standards • 16 to 20 I/O interface standards supported • CMOS, HSTL, SSTL, GTL, CTT, PCI • As many as eight banks on a device • Package dependent • Different banks can support different standards at the same time • Logic level translation • Boards with mixed standards

  28. High Performance Routing • Hierarchical Routing • Singles, Hexes, Longs • Sparse connections on longer interconnects for high speed • Routing delay depends primarily on distance • Direction independent • Device-size independent • Predictable for early design analysis • [Diagram: vector-based interconnect across the CLB array, with routes labeled 2 ns]

  29. Flexible General-Purpose Interconnect • Flexible, but slow if a route crosses many channels • Programmable switch matrix at each channel crossing • Connects across, changes direction or fans out

  30. Switch Matrix • Bidirectional pass transistors • High routing flexibility

  31. Reduce Fanout • Higher fanout nets (>16 loads) are harder to route & slower • Consider duplicating source in schematic to improve routing or speed • [Diagram: function fn1 driving D/Q flip-flops, shown before and after the source is duplicated]

  32. Long Lines for High Fanout Nets • Metal lines that traverse length & width of chip • Lowest skew • Ideal for high fan-out signals • Ideal for clocking • Requires vertical or horizontal alignment of loads • [Diagram: a long line running past a row of CLBs]

  33. Internal Three-State Buses • Two 3-state drivers per CLB • OR-AND logic implementation in place of 3-state drivers • With no drivers enabled, bus is a logic 1 • Low power • No danger of contention when multiple BUFTs enabled • No physical pullups or large capacitance to drive
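One way to read "OR-AND logic in place of 3-state drivers" is that each driver contributes its data ORed with "not enabled", and the bus is the AND of all contributions. That reading reproduces the two properties listed: an idle bus sits at logic 1, and enabling several drivers at once simply ANDs their data instead of causing electrical contention. The model below is a sketch under that assumption, with invented names, not a statement of the exact circuit.

```python
def bus_value(drivers):
    """Resolve a bus built from OR-AND logic.

    'drivers' is a list of (enable, data) pairs with 0/1 values.
    """
    value = 1
    for enable, data in drivers:
        # OR stage: a disabled driver contributes a constant 1.
        contribution = data if enable else 1
        # AND stage: combine every driver's contribution.
        value &= contribution
    return value


print(bus_value([(0, 0), (0, 1)]))  # no driver enabled
print(bus_value([(1, 0), (0, 1)]))  # one driver enabled, driving 0
print(bus_value([(1, 1), (1, 0)]))  # two drivers enabled at once
```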

  34. General Clock Support • Use clock buffers for highest fanout clocks • Drive high-speed long line resources • Lowest skew across a device • No internal hold times • Use generic BUFG primitive • Allows software to choose best type of buffer • Allows easy migration across families • Four dedicated global low skew buffers • Dedicated input pin (clock distribution only) • Additional shared resources (i.e., long lines) • Distribute low-skew/high-fanout signals (10ns max.) • Four delay-locked loops on each device • All-digital implementation • Two global buffers associated with each DLL pair

  35. Configuration • Schematic or HDL description is converted to a configuration file by the Xilinx development system • Configuration file is loaded into FPGA on power-up • Stored in configuration latches • Controls CLBs, IOBs, interconnect, etc.

  36. Configuration Bitstream • Binary programming file • Length depends only on device, not utilization • Typically 1 µs per bit (total from a few ms to <1 s) • FPGA can load its configuration automatically on power-up, or under microprocessor control • Can be loaded directly into device/configuration PROM
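Because bitstream length is fixed per device, serial configuration time is simply the bit count times the per-bit time. The sketch below assumes about 1 µs per bit, the rate at which the stated totals of a few ms to under 1 s make sense for bitstreams of thousands to hundreds of thousands of bits. The bitstream sizes are hypothetical and not taken from any data sheet.

```python
def configuration_time_ms(bitstream_bits, microseconds_per_bit=1):
    """Serial configuration time in milliseconds."""
    return bitstream_bits * microseconds_per_bit / 1000


# Hypothetical bitstream lengths, for illustration only.
for bits in (5_000, 100_000, 500_000):
    print(f"{bits} bits -> {configuration_time_ms(bits)} ms")
```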

  37. Configuration Modes • Bit-serial configuration • Simple, uses few device pins • Controlled by FPGA (Master) or externally (Slave) • Xilinx serial PROMs available • Byte-parallel configuration • Can drive PROM addresses (Master) • Can be microprocessor-controlled

  38. Configuration Pins • Configuration starts on power-up • Mode pin(s) checked to determine method • Usable as extra I/O after configuration • All I/O not used for configuration are disabled • Reconfiguration possible by pulling PROGRAM pin low

  39. Readback • Configuration data can be read back serially • Allows verification of programming • Readback data can include user-register values • Allows in-circuit functional verification • Requires READBACK... symbol • [Diagram: READBACK symbol with CLK, DATA, TRIG and RIP pins]

  40. Boundary Scan • IEEE 1149.1-compatible boundary scan (JTAG) • Available before configuration • Configuration & readback possible via boundary scan logic

  41. Power Consumption • CMOS SRAM technology provides low standby power • Operating power is mostly dynamic • Proportional to transition frequency of internal nodes • Xilinx segmented interconnect minimizes amount of metal capacitance to switch, minimizing power
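The proportionality to transition frequency can be made concrete with the standard first-order CMOS relation, dynamic power ≈ switched capacitance × supply voltage² × toggle rate. This relation is general CMOS background rather than something stated on the slides, and the numbers below are hypothetical.

```python
import math


def dynamic_power_watts(switched_capacitance_farads, supply_volts, toggle_rate_hz):
    """First-order CMOS dynamic power: C * V^2 * f."""
    return switched_capacitance_farads * supply_volts ** 2 * toggle_rate_hz


# Hypothetical values: 1 nF of total switched capacitance at 2.5 V.
power_at_50mhz = dynamic_power_watts(1e-9, 2.5, 50e6)
power_at_100mhz = dynamic_power_watts(1e-9, 2.5, 100e6)

# Doubling the toggle rate doubles the dynamic power.
print(math.isclose(power_at_100mhz, 2 * power_at_50mhz))
```

The same formula shows why less metal capacitance per net, as with segmented interconnect, lowers power at a given clock rate.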