Introduction To VIRTEX II Architecture

Presented By: Ankur Agarwal. Covers the Xilinx design flow (plan & budget, create code/schematic, HDL RTL simulation, synthesis to a netlist, functional simulation, implementation through translate, map, and place & route, and timing closure) and the Virtex-II architecture.

Presentation Transcript


  1. Introduction To VIRTEX II Architecture Presented By: Ankur Agarwal

  2. Xilinx Design Flow: Plan & Budget → Create Code/Schematic → HDL RTL Simulation → Synthesize to create netlist → Functional Simulation → Implement (Translate → Map → Place & Route) → Attain Timing Closure → Timing Simulation → Create Bit File

  3. Xilinx Architecture Features • High performance at 2.5 V, 3.3 V and 5 V • Technology independence • EDIF, VHDL, Verilog and SDF interfaces • Footprint compatibility: devices within each family are compatible with each other • Pin locking

  4. VIRTEX • Up to 2 million system gates at 100+ MHz • Features: • Distributed and block RAM • Low power • Delay-locked loops (DLLs) • 2.5 V internal operation with support of common power

  5. Naming Conventions • Example: XC4028XL-3-BG256 • Family: XC4000 (families include 4000 and 9500; Spartan parts start with XCS) • No. of gates: 28 (XC4028) • Sub-family: XL (3 V = XL, 5 V = no XL) • Speed grade: -3 • Package: BG256
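
The naming convention above can be exercised with a small parser. This is a sketch only: the regular expression assumes exactly the field layout suggested by the XC4028XL-3-BG256 example (two family digits, a gate-count code, an optional sub-family suffix, a speed grade and a package), plus the XCS prefix for Spartan. Other families such as Virtex-II (XC2V...) use different layouts and are rejected. The names `parse_part` and `PartNumber` are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical field layout inferred from the XC4028XL-3-BG256 example:
# XC + (S for Spartan | two family digits) + gate digits + sub-family
#    + "-" speed grade + "-" package
_PART_RE = re.compile(
    r"^XC(?:(?P<spartan>S)|(?P<family>\d{2}))(?P<gates>\d{2,3})"
    r"(?P<subfamily>[A-Z]*)-(?P<speed>\d+)-(?P<package>[A-Z]+\d+)$"
)


@dataclass
class PartNumber:
    family: str       # e.g. "XC4000", or "Spartan" for XCS parts
    gates: str        # gate-count code, e.g. "28" in XC4028
    subfamily: str    # e.g. "XL", or "" when absent
    speed_grade: int  # e.g. 3 for "-3"
    package: str      # e.g. "BG256"

    @property
    def supply(self) -> str:
        # Rule from the slide: XL sub-family = 3 V, no XL = 5 V
        return "3V" if self.subfamily == "XL" else "5V"


def parse_part(part: str) -> PartNumber:
    """Split a part number into the fields named on the slide."""
    m = _PART_RE.match(part.strip().upper())
    if m is None:
        raise ValueError(f"unrecognized part number: {part!r}")
    family = "Spartan" if m["spartan"] else f"XC{m['family']}00"
    return PartNumber(family, m["gates"], m["subfamily"], int(m["speed"]), m["package"])
```

For example, `parse_part("XC4028XL-3-BG256")` yields family XC4000, gate code 28, sub-family XL (3 V), speed grade 3 and package BG256.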

  6. CPLD and FPGA • Architecture: a Complex Programmable Logic Device (CPLD) is PAL/22V10-like and more combinational; a Field-Programmable Gate Array (FPGA) is gate-array-like, with more registers + RAM • Density: CPLD low-to-medium, 0.5K-10K logic gates; FPGA medium-to-high, 1K to 3.2M system gates • Performance: CPLD has predictable timing, up to 250 MHz today; FPGA is application dependent, up to 200 MHz • Interconnect: CPLD uses a “crossbar switch”; FPGA interconnect is incremental

  7. Overview of Xilinx FPGA Architecture • I/O Blocks (IOBs) • Configurable Logic Blocks (CLBs) • Programmable interconnect • Tristate buffers • Global resources

  8. Block Diagram of VIRTEX-II Architecture [Figure: interfaces and on-chip resources, including SONET/SDH, LVDS, BLVDS backplane, PCI/PCI-X, DDR SDRAM, QDR SRAM, DCM, 18Kb block RAM, distributed RAM, CAM, FIFO, shift registers, multipliers and DDR I/O]

  9. CLB Resources • The basic resource unit is the logic cell • One CLB contains 2 to 4 logic cells, depending on device family • Logic cell = 4-input look-up table (LUT) + D flip-flop • LUT capacity is limited by the number of inputs, not by the complexity of the function • LUTs can be used as ROM or synchronous RAM • The flip-flop can be configured as a transparent latch in Virtex and Spartan-II

  10. Closer Look at a CLB Structure [Figure: two slices, each with two look-up tables (inputs G1-G4 and F1-F4), carry & control logic and D flip-flops (D, Q, CK, EC, R); slice signals include CIN, COUT, CLK, CE, SR, BY and F5IN, with outputs X, XB, Y and YB] • Each slice has two LUT-FF pairs with associated carry logic • Two 3-state buffers (BUFTs) are associated with each CLB, accessible by all CLB outputs

  11. Interconnect Technology Offered by VIRTEX-II [Figure: CLB, 18Kb BRAM, 18x18 multiplier, IOB and DCM, each attached to its own switch matrix] • Interconnect is an array of switch matrices • All Virtex-II features can access routing resources through the switch matrix • Simplifies design and place & route

  12. Simplified SLICE Structure • Each slice has four outputs: two registered and two non-registered • Two BUFTs are associated with each CLB, accessible by all 16 CLB outputs • Carry logic for fast addition • Two independent carry chains per CLB

  13. Fast Carry Logic [Figure: dedicated carry logic and routing running from LSB to MSB] • Each CLB contains separate logic and routing for the fast generation of carry signals • Increases the efficiency and performance of adders, subtractors, accumulators, comparators and counters • Carry logic is independent of the normal logic and routing resources
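
To see why a dedicated carry path helps, the ripple below follows the structure Xilinx slices are commonly described as having: the LUT produces the half-sum p = a XOR b, a carry mux (MUXCY) either propagates the incoming carry (p = 1) or passes a_i through as generate/kill (p = 0), and an XOR (XORCY) forms the sum bit. The MUXCY/XORCY naming is an assumption not stated on the slide, and `carry_chain_add` is a hypothetical behavioral model, not a netlist.

```python
from typing import Tuple


def carry_chain_add(a: int, b: int, width: int, cin: int = 0) -> Tuple[int, int]:
    """Add two width-bit numbers by rippling a carry from LSB to MSB.

    Per-bit model (assumed MUXCY/XORCY-style arrangement):
      p      = a_i XOR b_i        half-sum, computed in the LUT
      s_i    = p XOR c_i          sum bit (XORCY)
      c_i+1  = c_i if p else a_i  carry mux (MUXCY): propagate when p = 1,
                                  otherwise a_i (equal to b_i) generates or kills
    Returns (sum, carry_out).
    """
    carry = cin & 1
    total = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        p = ai ^ bi
        total |= (p ^ carry) << i
        carry = carry if p else ai
    return total, carry
```

Only the carry mux sits on the carry path, so each extra bit adds one mux delay rather than a LUT plus general routing, which is the point of keeping carry logic separate from normal logic and routing.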

  14. CLB (Configurable Logic Block) [Figure: four slices, S0 (X0Y0), S1 (X0Y1), S2 (X1Y0) and S3 (X1Y1), with fast connects, a SHIFT chain, CIN/COUT, two TBUFs and a switch matrix] • Each CLB is connected to one switch matrix, providing access to general routing resources • High level of logic integration • Wide-input functions (1 level of LUT): 16:1 multiplexer, or any function, in 1 CLB; 32:1 multiplexer in 2 CLBs • Fast arithmetic functions: 2 look-ahead carry chains per CLB column • Addressable shift registers in LUT: 16-bit shift register in 1 LUT; 128-bit shift register in 1 CLB (dedicated shift chain)

  15. Four-Input LUT • Implements combinatorial logic: any 4-input logic function • Cascaded for wide-input functions [Figure: inputs A, B, C and D drive a LUT that stores the truth table of the 4-input function and outputs Z]
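
A LUT is a 16-entry truth table indexed by its four inputs, which is why its capacity depends on the number of inputs and not on the function's complexity. The sketch below builds that table for an arbitrary 4-input function and evaluates it by indexing; the bit ordering (A as the least significant address bit) and the helper names are assumptions for illustration.

```python
from typing import Callable


def lut4_init(func: Callable[[int, int, int, int], int]) -> int:
    """Build the 16-bit truth table for any 4-input function.

    Bit k holds func(A, B, C, D) where k = D*8 + C*4 + B*2 + A (assumed order).
    """
    init = 0
    for k in range(16):
        a, b, c, d = k & 1, (k >> 1) & 1, (k >> 2) & 1, (k >> 3) & 1
        if func(a, b, c, d) & 1:
            init |= 1 << k
    return init


def lut4_eval(init: int, a: int, b: int, c: int, d: int) -> int:
    """Evaluating the LUT is just reading one bit of its truth table."""
    return (init >> ((d << 3) | (c << 2) | (b << 1) | a)) & 1
```

For example, a 4-input AND gives the table 0x8000 and a 4-input XOR gives 0x6996; both cost exactly one LUT.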

  16. Multiplexers • MUXF5 combines 2 LUTs to create a 4:1 multiplexer, or any 5-input function (LUT5), or selected functions of up to 9 inputs • MUXF6 combines 2 slices to form an 8:1 multiplexer, or any 6-input function (LUT6), or selected functions of up to 19 inputs • Dedicated muxes are faster and more space-efficient [Figure: in a CLB, each slice's MUXF5 combines its two LUTs, and a MUXF6 combines two slices]
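
The "any 5-input function" claim for MUXF5 follows from Shannon expansion: f(a,b,c,d,e) = e ? f(a,b,c,d,1) : f(a,b,c,d,0), so one LUT holds each cofactor and the MUXF5 selects between them on e. This behavioral sketch shows the decomposition; the function names and truth-table bit ordering are hypothetical.

```python
from typing import Callable, Tuple


def lut4_init(func: Callable[..., int]) -> int:
    # 16-bit truth table; bit index = d*8 + c*4 + b*2 + a (assumed ordering)
    return sum(
        (func(k & 1, (k >> 1) & 1, (k >> 2) & 1, (k >> 3) & 1) & 1) << k
        for k in range(16)
    )


def lut4(init: int, a: int, b: int, c: int, d: int) -> int:
    return (init >> ((d << 3) | (c << 2) | (b << 1) | a)) & 1


def muxf5(sel: int, i0: int, i1: int) -> int:
    # Dedicated 2:1 mux combining the two LUT outputs of a slice
    return i1 if sel else i0


def map_to_lut5(func5: Callable[..., int]) -> Tuple[int, int]:
    """Shannon expansion on e: one LUT per cofactor (e = 0, e = 1)."""
    lut_e0 = lut4_init(lambda a, b, c, d: func5(a, b, c, d, 0))
    lut_e1 = lut4_init(lambda a, b, c, d: func5(a, b, c, d, 1))
    return lut_e0, lut_e1


def eval_lut5(inits: Tuple[int, int], a: int, b: int, c: int, d: int, e: int) -> int:
    return muxf5(e, lut4(inits[0], a, b, c, d), lut4(inits[1], a, b, c, d))
```

The same step applied again, with a MUXF6 choosing between two slices on a sixth input, gives the "any 6-input function" case.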

  17. CLB Multiplexer Location • Every slice has an F5 mux • MUXF6 combines slices X0Y0 & X0Y1 (S0 and S1), and another MUXF6 combines slices X1Y0 & X1Y1 (S2 and S3) • MUXF7 combines the 2 MUXF6 outputs • MUXF8 combines the 2 MUXF7 outputs (two CLBs) [Figure: F5, F6, F7 and F8 mux positions across two CLBs]
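
The 16:1 multiplexer in one CLB (slide 14) maps onto the mux hierarchy above: eight LUTs each act as a 2:1 mux on the lowest select bit, then four MUXF5s, two MUXF6s and one MUXF7 resolve the remaining select bits. The model below is a behavioral sketch of that tree, assuming select bit 0 drives the LUT level; it is not Xilinx's netlist, and the names are hypothetical.

```python
from typing import List, Sequence


def mux2(sel: int, i0: int, i1: int) -> int:
    """2:1 mux: models both a LUT used as a mux and a dedicated MUXFx."""
    return i1 if sel else i0


def mux16(data: Sequence[int], sel: int) -> int:
    """16:1 mux in one CLB: 8 LUTs -> 4 MUXF5 -> 2 MUXF6 -> 1 MUXF7."""
    if len(data) != 16:
        raise ValueError("expected 16 data inputs")
    s0, s1, s2, s3 = ((sel >> i) & 1 for i in range(4))
    luts: List[int] = [mux2(s0, data[2 * i], data[2 * i + 1]) for i in range(8)]
    f5 = [mux2(s1, luts[2 * i], luts[2 * i + 1]) for i in range(4)]  # one per slice
    f6 = [mux2(s2, f5[2 * i], f5[2 * i + 1]) for i in range(2)]      # S0+S1, S2+S3
    return mux2(s3, f6[0], f6[1])                                      # MUXF7
```

Adding one more level (a MUXF8 choosing between two such CLBs on a fifth select bit) gives the 32:1 multiplexer in 2 CLBs.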

  18. Horizontal Cascade Chain • Wide AND-OR functions (sum of products, SOP) [Figure: three CLBs side by side, each with slices S0-S3 and carry (CY) logic; ORCY elements cascade the SOP horizontally from CLB to CLB]

  19. Shift Register • Each LUT can be configured as a shift register: serial in, serial out • Dynamically addressable delay of up to 16 cycles, for programmable pipelines • Cascade for greater cycle delays • Use CLB flip-flops to add depth [Figure: the LUT drawn as a chain of D flip-flops sharing CE and CLK; DEPTH[3:0] selects the tap that drives OUT]
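
A behavioral sketch of the addressable shift register: each clock edge with CE high shifts a new bit in, and DEPTH[3:0] selects which of the 16 stages drives OUT, for a delay of DEPTH + 1 cycles (1 to 16). The class name follows the SRL16 primitive named on slide 21, but the Python interface (`clock`, `q`) is hypothetical.

```python
from typing import List


class SRL16:
    """Behavioral sketch of one LUT configured as a 16-bit shift register."""

    def __init__(self) -> None:
        self.stages: List[int] = [0] * 16  # stages[0] holds the newest bit

    def clock(self, d: int, ce: int = 1) -> None:
        """Rising clock edge: shift D in only when CE is high."""
        if ce:
            self.stages = [d & 1] + self.stages[:-1]

    def q(self, depth: int) -> int:
        """OUT selected by DEPTH[3:0]: D delayed by DEPTH + 1 clock cycles."""
        return self.stages[depth & 0xF]


# Usage: a programmable 4-cycle delay (DEPTH = 3)
srl = SRL16()
for bit in [1, 0, 1, 1, 0, 0, 1]:
    srl.clock(bit)
    print(srl.q(3), end=" ")  # prints: 0 0 0 1 0 1 1
```

Changing the argument to `q` at run time changes the delay without touching the data path, which is what makes the pipeline length programmable.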

  20. Shift Register [Figure: two data paths (labeled 64) through Operations A, B and C, with latencies of 12, 4, 8, 3 and 3 cycles marked; the paths end up with a 9-cycle imbalance] • Registers in the FPGA allow pipeline stages to be added to increase throughput • Data paths must be balanced to keep the desired functionality
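
The cost of balancing with SRL16s instead of flip-flops is easy to estimate. This sketch assumes each bit of the shorter path gets its own chain of SRL16 LUTs (one LUT per 16 cycles of delay, cascaded beyond that) and that a Virtex-II CLB has 8 LUTs, consistent with the "128-bit shift register in 1 CLB" figure from slide 14. The function names are hypothetical.

```python
import math

SRL16_DEPTH = 16   # one LUT in SRL16 mode gives up to 16 cycles of delay
LUTS_PER_CLB = 8   # assumed Virtex-II CLB: 4 slices x 2 LUTs (128 bits / 16)


def srl16_luts(width: int, delay: int) -> int:
    """LUTs needed to delay a width-bit bus by `delay` cycles with SRL16s."""
    # Each bit gets ceil(delay / 16) SRL16s cascaded end to end
    return width * math.ceil(delay / SRL16_DEPTH)


def srl16_clbs(width: int, delay: int) -> int:
    return math.ceil(srl16_luts(width, delay) / LUTS_PER_CLB)


def flip_flops(width: int, delay: int) -> int:
    """Flip-flops needed to build the same delay as a plain register pipeline."""
    return width * delay


print(srl16_luts(64, 9), srl16_clbs(64, 9), flip_flops(64, 9))  # prints: 64 8 576
```

Assuming the "64" labels in the figure are bus widths, the 9-cycle imbalance costs 64 LUTs (8 CLBs) as SRL16s, against 576 flip-flops as registers.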

  21. Shift Register Look-Up Table (SRL16) • High-density integration of shift registers • DSP applications use SRL16 for delay matching • CDMA wireless and video applications require shift registers • Multiple SRLC16s can be cascaded to any length

  22. Digital Clock Manager • High-speed 420 MHz clock generation • Clock de-skew on-chip and off-chip

  23. Digital Clock Manager: DCM Delay-Locked Loop • Clock phase de-skew • Duty-cycle correction • Temperature compensation • RST input • LOCKED output • Attributes: DUTY_CYCLE_CORRECTION, DLL_FREQUENCY_MODE, CLKDV_DIVIDE = 1.5 to 16.0, STARTUP_WAIT, CLK_FEEDBACK = CLK0 or CLK2X • Up to 4 clock outputs per DCM [Figure: DCM symbol. Clock and control inputs: CLKIN, CLKFB, RST, DSSEN, PSINCDEC, PSEN, PSCLK. Outputs: CLK0, CLK90, CLK180, CLK270, CLK2X, CLK2X180, CLKDV, CLKFX, CLKFX180, LOCKED, STATUS[7:0], PSDONE]

  24. Advanced Frequency Synthesis • DCM frequency synthesis: CLKFX is any M/D multiple of the CLKIN frequency • M = 2 to 32, D = 1 to 32 • Default: M = 4, D = 1 (4× CLKIN) • Always a nominal 50/50 duty cycle • Attributes: CLKFX_MULTIPLY (integer), CLKFX_DIVIDE (integer), DFS_FREQUENCY_MODE • After LOCKED: f_CLKFX = (M/D) × f_CLKIN [Figure: DCM symbol with the same clock and control pins as slide 23]
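
Because M and D are integers in fixed ranges, choosing them for a target frequency is a small search. The sketch below brute-forces all (M, D) pairs using f_CLKFX = (M/D) × f_CLKIN; it deliberately ignores the input and output frequency limits set by DFS_FREQUENCY_MODE, which come from the device data sheet and are not given here. The function names are hypothetical.

```python
from typing import Tuple

CLKFX_MULTIPLY_RANGE = range(2, 33)  # M = 2 to 32 (from the slide)
CLKFX_DIVIDE_RANGE = range(1, 33)    # D = 1 to 32 (from the slide)


def clkfx_mhz(clkin_mhz: float, m: int = 4, d: int = 1) -> float:
    """CLKFX frequency after LOCKED; defaults are the slide's M = 4, D = 1."""
    return clkin_mhz * m / d


def best_m_d(clkin_mhz: float, target_mhz: float) -> Tuple[int, int]:
    """Brute-force the (M, D) pair whose CLKFX is closest to target_mhz.

    Ties keep the first pair found (smallest D, then smallest M).
    DFS_FREQUENCY_MODE range limits are NOT checked.
    """
    best = (CLKFX_MULTIPLY_RANGE[0], CLKFX_DIVIDE_RANGE[0])
    best_err = float("inf")
    for d in CLKFX_DIVIDE_RANGE:
        for m in CLKFX_MULTIPLY_RANGE:
            err = abs(clkfx_mhz(clkin_mhz, m, d) - target_mhz)
            if err < best_err:
                best, best_err = (m, d), err
    return best
```

For example, a 50 MHz CLKIN and a 75 MHz target give M = 3, D = 2, while the default M = 4, D = 1 turns 50 MHz into 200 MHz.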

  25. High-Resolution Phase Shifting • DCM fine phase shifting applies to all CLK outputs • Phase shift = a fraction of the CLKIN period • Fixed or variable modes • Inputs in variable mode: PSINCDEC (increase/decrease), PSEN (enable phase shift), PSCLK (synchronizes the phase shift) • PSDONE output • Attributes: CLKOUT_PHASE_SHIFT = NONE, FIXED or VARIABLE; PHASE_SHIFT (signed integer) = -255 to +255 [Figure: DCM symbol with the same clock and control pins as slide 23]
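
Assuming fixed mode follows the relation given in Xilinx's Virtex-II documentation, where the shift equals PHASE_SHIFT/256 of the CLKIN period, the delay in nanoseconds and in degrees can be computed as below. The 1/256 scaling is not stated on the slide and should be checked against the user guide for the exact device; the function names are hypothetical.

```python
def _check(phase_shift: int) -> None:
    # Range from the slide: signed integer, -255 to +255
    if not -255 <= phase_shift <= 255:
        raise ValueError("PHASE_SHIFT must be between -255 and +255")


def phase_shift_ns(phase_shift: int, clkin_mhz: float) -> float:
    """Fixed-mode shift in ns, assuming shift = (PHASE_SHIFT / 256) x T_CLKIN."""
    _check(phase_shift)
    period_ns = 1000.0 / clkin_mhz
    return phase_shift / 256 * period_ns


def phase_shift_degrees(phase_shift: int) -> float:
    """The same shift expressed as a fraction of 360 degrees."""
    _check(phase_shift)
    return phase_shift / 256 * 360.0
```

At 100 MHz (10 ns period), PHASE_SHIFT = 64 would give a 2.5 ns, 90° shift; negative values shift the other way.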

  26. Global Clocks • Up to 16 dedicated low-skew clocks

  27. Clock Distribution • 16 global clock multiplexers (BUFGMUX): eight on the top, eight on the bottom • Switch “glitch-free” from one clock to another • 8 clocks selectable per quadrant (NW, NE, SW, SE) • Unused branches are disabled (power saving) [Figure: 8 BUFGMUXes at the top and 8 at the bottom distribute 16 clocks, with at most 8 used in each quadrant]

  28. Use Global Buffers to Reduce Clock Skew • Global buffers are connected to dedicated routing • This routing network is balanced to minimize skew • All Xilinx FPGAs have global buffers • Example: a design contains 2 clock signals, CLK1 and CLK2, which can introduce clock skew between CLK1 and CLK2 • Using an extra BUFG reduces skew on CLK2 [Figure: CLK1 and CLK2, each driven through a BUFG, clocking D flip-flops]

  29. Global Clocks: BUFGMUX • Three modes: • Clock buffer: low-skew clock distribution (BUFG primitive; pins I and O) • Clock enable: stop the clock High or Low (BUFGCE stops it Low; pins I, CE and O) • “Glitch-free” clock multiplexer: switch from one clock to another, including unrelated clocks (BUFGMUX; pins I0, I1, S and O) • No pulse width shorter than 1/2 of the period

  30. Memory • On-chip SelectRAM™ memory [Figure: the Terabit memory continuum, from bytes to megabytes. Distributed RAM (bytes, e.g. 128x1): DSP coefficients, small FIFOs, shallow/wide CAM. Block RAM (kilobytes, 18 kb blocks): large FIFOs, packet buffers, video line buffers, cache tag memory. External RAM/CAM (megabytes, up to 400 Mbps/pin with DDR & QDR): deep/wide CAM]

  31. Embedded 18 kb Block RAM • Up to 3 Mb of on-chip block RAM • High internal buffering bandwidth • Reduced I/O count and more embedded memory
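
Block RAM is usually consumed by shaping each 18 kb block into a particular depth × width. The sketch below assumes the Virtex-II single-port aspect ratios 16K×1, 8K×2, 4K×4, 2K×9, 1K×18 and 512×36 (the ×9, ×18 and ×36 widths include parity bits) and picks the one needing the fewest blocks for a requested memory. It ignores the extra logic needed to cascade blocks in depth, and `brams_needed` is a hypothetical helper.

```python
import math
from typing import List, Tuple

# Assumed Virtex-II 18 kb block RAM port configurations as (depth, width)
BRAM_CONFIGS: List[Tuple[int, int]] = [
    (16384, 1), (8192, 2), (4096, 4), (2048, 9), (1024, 18), (512, 36),
]


def brams_needed(depth: int, width: int) -> Tuple[int, Tuple[int, int]]:
    """Return (block count, chosen configuration) for a depth x width memory."""
    best_count = math.inf
    best_cfg = BRAM_CONFIGS[0]
    for cfg_depth, cfg_width in BRAM_CONFIGS:
        # Tile the memory: blocks side by side for width, stacked for depth
        count = math.ceil(depth / cfg_depth) * math.ceil(width / cfg_width)
        if count < best_count:
            best_count, best_cfg = count, (cfg_depth, cfg_width)
    return int(best_count), best_cfg


print(brams_needed(1024, 32))  # prints: (2, (1024, 18))
```

A 1024 × 32 buffer, for instance, fits in two blocks, which is the kind of on-chip buffering the slide contrasts with external memory and extra I/O.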

  32. Distributed RAM • A CLB LUT is configurable as distributed RAM: one LUT equals a 16x1 RAM • Implements single- and dual-port RAM • Cascade LUTs to increase RAM size • Synchronous write • Synchronous or asynchronous read (the accompanying flip-flops are used for synchronous read) [Figure: primitives RAM16X1S (D, WE, WCLK, A0-A3, O) = 1 LUT; RAM32X1S (A0-A4) or RAM16X2S (D0, D1, O0, O1) = 2 LUTs; RAM16X1D (single-port output SPO addressed by A0-A3, dual-port output DPO addressed by DPRA0-DPRA3) = 2 LUTs]
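
The distributed RAM behavior listed above (synchronous write, asynchronous read, LUTs cascaded for more depth) can be modeled in a few lines. This sketch mirrors the RAM16X1S and RAM32X1S primitive names from the slide, but the Python methods are hypothetical, and the optional output flip-flop used for synchronous read is not modeled.

```python
from typing import List


class RAM16X1S:
    """Behavioral sketch of one LUT used as a 16x1 single-port RAM."""

    def __init__(self, init: int = 0) -> None:
        self.mem: List[int] = [(init >> i) & 1 for i in range(16)]

    def read(self, addr: int) -> int:
        # Asynchronous read: output follows A0-A3 with no clock involved
        return self.mem[addr & 0xF]

    def clock(self, addr: int, d: int, we: int) -> None:
        # Synchronous write: memory changes only on a WCLK edge with WE high
        if we:
            self.mem[addr & 0xF] = d & 1


class RAM32X1S:
    """Two 16x1 LUT RAMs, with address bit A4 choosing between them."""

    def __init__(self) -> None:
        self.lo = RAM16X1S()
        self.hi = RAM16X1S()

    def _bank(self, addr: int) -> RAM16X1S:
        return self.hi if (addr >> 4) & 1 else self.lo

    def read(self, addr: int) -> int:
        return self._bank(addr).read(addr)

    def clock(self, addr: int, d: int, we: int) -> None:
        self._bank(addr).clock(addr, d, we)
```

Cascading further works the same way: more address bits select among more 16x1 LUTs.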

  33. 18 x 18 Embedded Multiplier • Fast arithmetic functions • Optimized to implement multiply/accumulate modules

  34. 18 x 18 Multiplier • Embedded 18-bit x 18-bit multiplier • Two's-complement signed operation • Multipliers are organized in columns [Figure: Data_A (18 bits) and Data_B (18 bits) enter the multiplier, which produces a 36-bit output]
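
The 36-bit output width follows from the operand widths: the most extreme signed 18-bit product is (-2^17) × (-2^17) = 2^34, which fits in a 36-bit two's-complement result but not in 35 bits. This sketch models the multiplier on raw 18-bit input words; the helper names are hypothetical.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mult18x18(a: int, b: int) -> int:
    """Signed 18x18 multiply on raw 18-bit words, returning a raw 36-bit word."""
    product = to_signed(a, 18) * to_signed(b, 18)
    # Every signed 18x18 product fits in 36 bits; the mask models the output bus
    return product & ((1 << 36) - 1)
```

For example, 0x3FFFF is -1 as an 18-bit word, so `to_signed(mult18x18(0x3FFFF, 3), 36)` is -3. Unsigned operands must stay within 17 bits to remain positive.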

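  35. Basic I/O Block Structure [Figure: three register paths, each a D flip-flop with clock enable (EC) and set/reset (SR): the three-state path (FF enable, three-state control), the output path (FF enable, output) and the input path (direct input, or registered input with FF enable); the clock and set/reset are shared]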

  36. I/O Signal Types • Single-ended: LVTTL, LVCMOS, HSTL, SSTL • Differential: LVDS, Bus LVDS, LVPECL • Note: only the popular I/O types are shown here

  37. IOB: Double Data Rate Registers • DDR registers can be clocked by: CLK and NOT(CLK), if the duty cycle is 50/50; or the CLK0 and CLK180 DLL outputs [Figure: DATA_1 (D1A, D1B, D1C) and DATA_2 (D2A, D2B, D2C) merged into the dual-data-rate stream D1A, D2A, D1B, D2B, D1C]
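
The output ordering in the figure is an interleave: DATA_1 is launched on one clock edge (e.g. CLK0) and DATA_2 on the opposite edge (e.g. CLK180), so two words leave the pin per clock period. The sketch below shows only that ordering, with hypothetical names; it does not model timing.

```python
from typing import List, Sequence


def ddr_output(data_1: Sequence[str], data_2: Sequence[str]) -> List[str]:
    """Order of words on the pin: DATA_1 on one edge, DATA_2 on the other."""
    if len(data_1) != len(data_2):
        raise ValueError("both registers need one word per clock cycle")
    stream: List[str] = []
    for rise, fall in zip(data_1, data_2):
        stream.append(rise)  # launched on the rising edge (CLK0)
        stream.append(fall)  # launched half a period later (CLK180)
    return stream


print(ddr_output(["D1A", "D1B", "D1C"], ["D2A", "D2B", "D2C"]))
```

The printed order, D1A, D2A, D1B, D2B, D1C, D2C, matches the stream drawn on the slide.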

  38. Built-In HSTL II Support • What is the advantage of using HSTL Class II? • High-speed I/O interface • Bi-directional • Double parallel termination [Figure: a Zo = 50 Ω line with a 50 Ω resistor to Vtt = 0.75 V at each end; receiver reference Vref = 0.75 V]

  39. Digitally Controlled Impedance (DCI) • Dynamically adjusted termination resistors • Provides drivers matched to the impedance of the traces • Provides on-chip termination for a transmitter or receiver • On-chip termination advantages: no termination resistors on the board; improves signal integrity by eliminating stub reflections; eliminates the need for source termination (single-ended I/O); reduces board routing headaches and component count

  40. Virtex-II Family: Four and Six Columns of Block RAM & Multipliers [Figure: device XC2V250]

  41. Virtex-II Family Members [Figure: family members grouped by 2, 4 and 6 columns of BRAM & multipliers]

  42. VIRTEX-II Packaging • FF and BF are flip-chip ball grid array packages • Pinout compatibility within the same colored rectangle
