Introduction To VIRTEX II Architecture

Presented By: Ankur Agarwal

Xilinx Design Flow
• Plan & Budget
• Create Code/Schematic
• HDL RTL Simulation
• Synthesize to create netlist
• Functional Simulation
• Implement: Translate, Map, Place & Route
• Attain Timing Closure
• Timing Simulation
• Create Bit File

Xilinx Architecture Features
• High performance at 2.5 V, 3.3 V, and 5 V
• Technology Independence
• EDIF, VHDL, Verilog, SDF Interface
• Footprint compatibility
• Devices within each family are compatible with each other
• Pin locking

VIRTEX
• Up to 2 Million System Gates at 100+ MHz
• Features:
  • Distributed and Block RAM available
  • Low Power
  • Delay-Locked Loops
  • 2.5 V Internal Operation with support of common power

Naming Conventions
• Example: XC4028XL-3-BG256
  • XC4: Family (4000, 9500); Spartan starts with XCS
  • 028: No. of Gates
  • XL: Sub-Family (3V = XL, 5V = no XL)
  • -3: Speed Grade
  • BG256: Package
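The naming convention can be sketched as a small parser. This is only an illustration for names shaped like the slide's example; the regular expression, the `parse_part` helper, and the field names are assumptions made here, not a Xilinx tool or API, and other families (for example 9500 or Spartan) use patterns this sketch does not cover.

```python
import re

# Hypothetical pattern for XC4000-style names such as XC4028XL-3-BG256.
PART_RE = re.compile(
    r"^XC"
    r"(?P<family>\d)"            # leading family digit (4 -> 4000 family)
    r"(?P<gates>\d{3})"          # gate-count code
    r"(?P<subfamily>XL)?"        # XL = 3 V part, absent = 5 V part
    r"-(?P<speed>\d+)"           # speed grade
    r"-(?P<package>[A-Z]+\d+)$"  # package type and pin count
)


def parse_part(name):
    """Split a part name into its fields, or raise ValueError."""
    match = PART_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized part name: {name!r}")
    fields = match.groupdict()
    # Sub-family rule from the slide: XL means 3 V, no XL means 5 V.
    fields["supply"] = "3V" if fields["subfamily"] else "5V"
    return fields


print(parse_part("XC4028XL-3-BG256"))
```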

CPLD and FPGA
                Complex Programmable Logic Device (CPLD)    Field-Programmable Gate Array (FPGA)
Architecture    PAL/22V10-like; more combinational          Gate array-like; more registers + RAM
Density         Low-to-medium; 0.5-10K logic gates          Medium-to-high; 1K to 3.2M system gates
Performance     Predictable timing; up to 250 MHz today     Application dependent; up to 200 MHz
Interconnect    “Crossbar Switch”                           Incremental

Overview of Xilinx FPGA Architecture
• I/O Blocks (IOBs)
• Programmable Interconnect
• Configurable Logic Blocks (CLBs)
• Tristate Buffers
• Global Resources

Block Diagram of VIRTEX-II Architecture
[Figure: block diagram. Labels: SONET/SDH, LVDS, BLVDS Backplane, PCI, PCI-X, DCM, DDR SDRAM, QDR SRAM, 18Kb BRAM, Distributed RAM, CAM, FIFO, Shift Registers, Multiplier, DDR]

CLB Resources
• Basic resource unit is the Logic Cell
• 1 CLB contains 2 - 4 Logic Cells, depending on device family
• Logic Cell = 4-input Look-Up Table (LUT) + D Flip-flop
• LUT capacity limited by number of inputs, not complexity of function
• LUTs can be used as ROM or synchronous RAM
• Flip-flop can be configured as a transparent latch in Virtex and Spartan-II
[Figure: logic cell, a LUT followed by a FF]

Closer Look at a CLB Structure
• Each slice has 2 LUT-FF pairs with associated carry logic
• Two 3-state buffers (BUFT) associated with each CLB, accessible by all CLB outputs
[Figure: two slices, each with two Look-Up Tables (inputs G1-G4 and F1-F4), Carry & Control Logic, and two flip-flops (D, Q, CK, EC, S, R); slice signals include CIN, COUT, CLK, CE, SR, BY, F5IN, and outputs X, XB, Y, YB]

Interconnect Technology Offered by VIRTEX-II
• Interconnect is an array of switch matrices
• All Virtex II features can access routing resources through the switch matrix
• Simplifies design and place & route
[Figure: CLB, IOB, DCM, 18Kb BRAM, and MULT 18x18 blocks, each attached to a Switch Matrix]

Simplified SLICE Structure
• Each Slice has four outputs:
  • Two registered outputs
  • Two non-registered outputs
• Two BUFTs associated with each CLB, accessible by all 16 CLB outputs
• Carry Logic for fast addition
  • Two independent carry chains per CLB

Fast Carry Logic
• Each CLB contains separate logic and routing for the fast generation of carry signals
• Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
• Carry logic is independent of normal logic and routing resources
[Figure: Carry Logic Routing running from LSB to MSB]
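To illustrate why a dedicated carry path helps adders, here is a minimal behavioral sketch. It assumes the common scheme in which each LUT produces a propagate bit (A xor B), a carry multiplexer either passes the incoming carry or regenerates it from an operand bit, and an XOR forms the sum bit. The function name `carry_chain_add` is hypothetical, and the slides do not spell out this structure, so treat it as an assumption rather than the exact silicon.

```python
def carry_chain_add(a, b, width, cin=0):
    """Add two unsigned integers bit by bit, from LSB to MSB."""
    total = 0
    carry = cin
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        propagate = ai ^ bi        # computed in the LUT
        sum_bit = propagate ^ carry  # dedicated XOR with the incoming carry
        # Carry mux: pass the incoming carry when propagating,
        # otherwise both operand bits are equal and ai is the carry out.
        carry = carry if propagate else ai
        total |= sum_bit << i
    return total, carry


print(carry_chain_add(5, 3, width=4))
```

Only the `carry` variable moves from one bit position to the next, which is the part the dedicated routing accelerates; everything else is local to one LUT.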

CLB (Configurable Logic Blocks)
• Each CLB is connected to one switch matrix
  • Providing access to general routing resources
• High level of logic integration
• Wide-input functions:
  • 16:1 multiplexer in 1 CLB or any function
  • 32:1 multiplexer in 2 CLBs (1 level of LUT)
• Fast arithmetic functions
  • 2 look-ahead carry chains per CLB column
• Addressable shift registers in LUT
  • 16-b shift register in 1 LUT
  • 128-b shift register in 1 CLB (dedicated shift chain)
[Figure: one CLB with four slices (S0 at X0Y0, S1 at X0Y1, S2 at X1Y0, S3 at X1Y1), two TBUFs, a Switch Matrix, Fast Connects, a SHIFT chain, and CIN/COUT carry chains]

Four-Input LUT
• Implements combinatorial logic
• Any 4-input logic function
• Cascaded for wide-input functions
[Figure: a LUT with inputs A, B, C, D and output Z, shown as equivalent to the truth table of a 4-input logic function]
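The truth-table view of a LUT can be sketched in a few lines. The helper names `make_lut4` and `lut4_read`, and the choice of A as the most significant address bit, are assumptions made for this illustration.

```python
def make_lut4(func):
    """Build the 16-entry truth table for a 4-input Boolean function."""
    table = []
    for addr in range(16):
        a = (addr >> 3) & 1
        b = (addr >> 2) & 1
        c = (addr >> 1) & 1
        d = addr & 1
        table.append(int(bool(func(a, b, c, d))))
    return table


def lut4_read(table, a, b, c, d):
    """Look up Z: the four inputs simply form an address into the table."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


# Example function: Z = (A and B) or (C xor D)
table = make_lut4(lambda a, b, c, d: (a & b) | (c ^ d))
print(table)
print(lut4_read(table, 1, 1, 0, 0))
```

Whatever the function, the table always has 16 entries, which is why LUT capacity is limited by the number of inputs rather than by the complexity of the function.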

Multiplexers
• MUXF5 combines 2 LUTs to create 4x1 multiplexer
  • Or any 5-input function (LUT5)
  • Or selected functions up to 9 inputs
• MUXF6 combines 2 slices to form 8x1 multiplexer
  • Or any 6-input function (LUT6)
  • Or selected functions up to 19 inputs
• Dedicated muxes are faster and more space efficient
[Figure: CLB with two slices, each holding two LUTs and a MUXF5; MUXF6 combines the two MUXF5 outputs]
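A behavioral sketch of the MUXF5 idea follows. It shows how two LUT outputs plus one dedicated 2:1 mux give a 4x1 multiplexer, and how the same mux gives any 5-input function by splitting the function on its fifth input. The function names are hypothetical, and the input-to-address ordering is an assumption made for the example.

```python
def muxf5(f_out, g_out, sel):
    """Dedicated 2:1 mux choosing between the two LUT outputs of a slice."""
    return g_out if sel else f_out


def mux4to1(d0, d1, d2, d3, s0, s1):
    """4x1 multiplexer: each LUT is a 2:1 mux, MUXF5 picks between them."""
    lut_f = d1 if s0 else d0
    lut_g = d3 if s0 else d2
    return muxf5(lut_f, lut_g, s1)


def lut5(func, a, b, c, d, e):
    """Any 5-input function: one LUT holds func with e=0, the other e=1."""
    def bits(i):
        return (i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1

    table_f = [int(bool(func(*bits(i), 0))) for i in range(16)]
    table_g = [int(bool(func(*bits(i), 1))) for i in range(16)]
    addr = (a << 3) | (b << 2) | (c << 1) | d
    return muxf5(table_f[addr], table_g[addr], e)


def majority(a, b, c, d, e):
    return a + b + c + d + e >= 3


print(mux4to1(0, 1, 0, 0, s0=1, s1=0))
print(lut5(majority, 1, 1, 0, 0, 1))
```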

CLB Multiplexer Location
• Each slice contains a MUXF5 (F5)
• Slice S0: MUXF6 combines Slices X0Y0 & X0Y1
• Slice S1: MUXF7 combines the 2 MUXF6 outputs
• Slice S2: MUXF6 combines Slices X1Y0 & X1Y1
• Slice S3: MUXF8 combines the 2 MUXF7 outputs (Two CLB)

Horizontal Cascade Chain
• Wide AND-OR functions (Sum Of Products)
[Figure: three adjacent CLBs, each with Slices S0-S3; carry (CY) chains feed ORCY gates that are cascaded horizontally to form the SOP outputs]

Shift Register
• Each LUT can be configured as shift register
  • Serial in, serial out
• Dynamically addressable delay up to 16 cycles
• For programmable pipeline
• Cascade for greater cycle delays
• Use CLB flip-flops to add depth
[Figure: a LUT drawn as a chain of D flip-flops with IN, CE, and CLK inputs; DEPTH[3:0] selects which stage drives OUT]
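A minimal behavioral model of the LUT-based shift register can make the addressable delay concrete. The class name `SRL16Model` is hypothetical, and the convention that depth 0 selects the most recently shifted bit (a one-cycle delay) is an assumption of this sketch.

```python
class SRL16Model:
    """16-stage serial-in shift register with an addressable output tap."""

    def __init__(self):
        self.stages = [0] * 16

    def clock(self, din, ce=1):
        """One rising clock edge: shift in din when clock enable is high."""
        if ce:
            self.stages = [din & 1] + self.stages[:15]

    def out(self, depth):
        """Read the tap selected by DEPTH[3:0]; 0 is the newest bit."""
        return self.stages[depth & 0xF]


srl = SRL16Model()
for bit in [1, 0, 1, 1, 0, 0, 1, 0]:
    srl.clock(bit)
print([srl.out(depth) for depth in range(8)])
```

Changing `depth` at run time changes the delay without reloading the register, which is what makes the structure useful for a programmable pipeline.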

Shift Register
• Register-rich FPGA
  • Allows for addition of pipeline stages to increase throughput
• Data paths must be balanced to keep desired functionality
[Figure: two parallel data paths, buses labeled 64. Operation A (4 cycles) followed by Operation B (8 cycles) takes 12 cycles; Operation C takes 3 cycles, leaving a 9-cycle imbalance]
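The balancing arithmetic from the figure can be written out as a short calculation. The helper name `balancing_delays` and the path labels are made up for this sketch; the cycle counts are the ones on the slide.

```python
def balancing_delays(path_latencies):
    """Return the extra delay each path needs to match the slowest path."""
    longest = max(path_latencies.values())
    return {name: longest - cycles for name, cycles in path_latencies.items()}


paths = {
    "A then B": 4 + 8,  # Operation A (4 cycles) followed by Operation B (8 cycles)
    "C": 3,             # Operation C (3 cycles)
}
extra = balancing_delays(paths)
print(extra)

# A single LUT shift register covers up to 16 cycles of delay per bit,
# so the 9-cycle gap fits in one LUT per data bit.
luts_per_bit = -(-extra["C"] // 16)  # ceiling division
print(luts_per_bit)
```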

Shift Register Look-Up Table
• High density integration of shift registers
• DSP applications use SRL16 for delay matching
• CDMA wireless and video applications require shift registers
• Multiple SRLC16 cascadable to any length

Digital Clock Manager
• High-Speed 420 MHz clock generation
• Clock de-skew on-chip and off-chip

Digital Clock Manager: DCM
• DCM Delay-Locked Loop
  • Clock phase de-skew
  • Duty cycle correction
  • Temperature compensation
• RST input
• LOCKED output
• Attributes:
  • DUTY_CYCLE_CORRECTION
  • DLL_FREQUENCY_MODE
  • CLKDV_DIVIDE = 1.5 to 16.0
  • STARTUP_WAIT
  • CLK_FEEDBACK = CLK0 or CLK2X
• Up to 4 clock outputs per DCM
[Figure: DCM symbol, with a legend for clock and control signals. Inputs: CLKIN, CLKFB, RST, DSSEN, PSINCDEC, PSEN, PSCLK. Outputs: CLK0, CLK90, CLK180, CLK270, CLK2X, CLK2X180, CLKDV, CLKFX, CLKFX180, LOCKED, STATUS[7:0], PSDONE]

Advanced Frequency Synthesis
• DCM Frequency Synthesis
  • CLKFX is any M / D product of CLKIN frequency
  • M = 2 to 32, D = 1 to 32
  • Default: M=4, D=1 (4X CLKIN)
  • Always nominal 50/50 duty-cycle
• Attributes:
  • CLKFX_MULTIPLY (integer)
  • CLKFX_DIVIDE (integer)
  • DFS_FREQUENCY_MODE
• After LOCKED: Freq(CLKFX) = (M/D) x Freq(CLKIN)
[Figure: DCM symbol with clock and control signals]
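The CLKFX relationship can be checked with a short calculation. The function names below are hypothetical, and the sketch only enforces the M and D ranges given on the slide; it does not model the input or output frequency limits of a real device, which would need to come from the data sheet.

```python
def clkfx_mhz(clkin_mhz, m=4, d=1):
    """Freq(CLKFX) = (M / D) x Freq(CLKIN), with M in 2..32 and D in 1..32."""
    if not 2 <= m <= 32:
        raise ValueError("M must be between 2 and 32")
    if not 1 <= d <= 32:
        raise ValueError("D must be between 1 and 32")
    return clkin_mhz * m / d


def closest_m_d(clkin_mhz, target_mhz):
    """Search all legal (M, D) pairs for the one nearest the target.

    Ties are broken in favor of the smallest M.
    """
    candidates = ((m, d) for m in range(2, 33) for d in range(1, 33))
    return min(
        candidates,
        key=lambda md: (abs(clkin_mhz * md[0] / md[1] - target_mhz), md[0]),
    )


print(clkfx_mhz(50))            # default M=4, D=1
print(closest_m_d(100, 175))    # M and D for 175 MHz from a 100 MHz input
```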

High Resolution Phase Shifting
• DCM Fine Phase Shifting
  • Applies to all CLK outputs
  • Phase shift = fraction of CLKIN period
  • Fixed or variable modes
• Inputs in variable mode:
  • PSINCDEC input = Increase/Decrease
  • PSEN = Enable Phase Shift
  • PSCLK synchronizes Phase Shift
• PSDONE output
• Attributes:
  • CLKOUT_PHASE_SHIFT = NONE, FIXED, VARIABLE
  • PHASE_SHIFT (signed integer) -255 to +255
[Figure: DCM symbol with clock and control signals]
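As a worked example of "phase shift = fraction of CLKIN period", the sketch below assumes the fraction is PHASE_SHIFT / 256, which matches the -255 to +255 attribute range on the slide but is not stated there. The function name is hypothetical.

```python
def phase_shift_ns(phase_shift, clkin_mhz):
    """Convert a PHASE_SHIFT attribute value into a time offset in ns.

    Assumes shift = (PHASE_SHIFT / 256) x CLKIN period.
    """
    if not -255 <= phase_shift <= 255:
        raise ValueError("PHASE_SHIFT must be between -255 and +255")
    period_ns = 1000.0 / clkin_mhz
    return phase_shift / 256 * period_ns


# With a 100 MHz CLKIN (10 ns period), 64/256 is a quarter period.
print(phase_shift_ns(64, 100))
print(phase_shift_ns(-128, 100))
```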

Global Clocks
• Up to 16 Dedicated Low Skew Clocks

Clock Distribution
• 16 Global Clock Multiplexers
  • Eight on the top
  • Eight on the bottom
• Switch “glitch free” from 1 clock to the other
• 8 Clocks selectable per quadrant
• Unused Branches are Disabled (Power Saving)
[Figure: 8 BUFGMUX at the top and 8 BUFGMUX at the bottom drive 16 clocks; each quadrant (NW, NE, SW, SE) receives 8 max]

Use Global Buffers to Reduce Clock Skew
• Global buffers are connected to dedicated routing.
  • This routing network is balanced to minimize skew
• All Xilinx FPGAs have global buffers
• Design contains 2 clock signals
  • Introduces clock skew between CLK1 and CLK2
  • Uses an extra BUFG to reduce skew on CLK2
[Figure: flip-flops clocked by CLK1 and CLK2, with BUFGs on the clock nets]

Global Clocks: BUFGMUX
• Three modes:
  • Clock buffer
    • Low skew clock distribution
    • BUFG primitive
  • Clock enable
    • Stop the clock High or Low
    • BUFGCE (stop Low)
  • Clock multiplexer “glitch-free”
    • Switch from one clock to another
    • BUFGMUX
    • Unrelated clocks
• No pulse width shorter than 1/2 of the period
[Figure: BUFG (I, O), BUFGCE (I, CE, O), and BUFGMUX (I0, I1, S, O) symbols]

Memory
• On-Chip SelectRAM™ Memory
  • Distributed RAM (bytes): 128x1; DSP Coefficients, Small FIFOs, CAM Shallow/Wide
  • Block RAM (kilobytes): 18 kb Blocks; Large FIFOs, Packet Buffers, Video Line Buffers, Cache Tag Memory, CAM Deep/Wide
• External RAM/CAM (megabytes): DDR & QDR, Up to 400 Mbps/pin
[Figure: Terabit Memory Continuum]

Embedded 18 kb Block RAM
• Up to 3 Mb on-chip block RAM
• High internal buffering bandwidth
• Reduced I/O count and more embedded memory

Distributed RAM
• CLB LUT configurable as Distributed RAM
  • A LUT equals 16x1 RAM
  • Implements Single and Dual-Ports
  • Cascade LUTs to increase RAM size
• Synchronous write
• Synchronous/Asynchronous read
  • Accompanying flip-flops used for synchronous read
[Figure: one LUT = RAM16X1S (D, WE, WCLK, A0-A3, O); two LUTs = RAM32X1S (D, WE, WCLK, A0-A4, O), or RAM16X2S (D0, D1, WE, WCLK, A0-A3, O0, O1), or RAM16X1D (D, WE, WCLK, A0-A3, DPRA0-DPRA3, SPO, DPO)]
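The dual-port variant can be illustrated with a small behavioral model using the port names from the RAM16X1D symbol. The class name and method layout are assumptions for this sketch; it models a synchronous write and asynchronous reads, and leaves out the optional output flip-flops.

```python
class Ram16x1dModel:
    """Behavioral sketch of a 16x1 dual-port distributed RAM."""

    def __init__(self):
        self.mem = [0] * 16

    def wclk_edge(self, d, we, a):
        """Rising WCLK edge: write bit d at address A[3:0] when WE is high."""
        if we:
            self.mem[a & 0xF] = d & 1

    def spo(self, a):
        """Single-port output: asynchronous read at the write address A."""
        return self.mem[a & 0xF]

    def dpo(self, dpra):
        """Dual-port output: asynchronous read at the independent address DPRA."""
        return self.mem[dpra & 0xF]


ram = Ram16x1dModel()
ram.wclk_edge(d=1, we=1, a=5)
ram.wclk_edge(d=1, we=0, a=9)   # WE low, so nothing is written
print(ram.spo(5), ram.dpo(5), ram.dpo(9))
```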

18 x 18 Embedded Multiplier
• Fast arithmetic functions
  • Optimized to implement multiply / accumulate modules

18 x 18 Multiplier
• Embedded 18-bit x 18-bit multiplier
  • 2’s complement signed operation
• Multipliers are organized in columns
[Figure: 18 x 18 Multiplier with inputs Data_A (18 bits) and Data_B (18 bits) and Output (36 bits)]
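The 2's complement behavior of an 18 x 18 multiply with a 36-bit result can be sketched as follows. The function names are hypothetical; the model works on raw bit patterns so the wrap-around of negative values is visible.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def mult18x18(data_a, data_b):
    """Multiply two 18-bit 2's complement patterns, return a 36-bit pattern."""
    product = to_signed(data_a, 18) * to_signed(data_b, 18)
    return product & ((1 << 36) - 1)


# 0x3FFFF is -1 in 18 bits, so (-1) x 2 = -2, shown as a 36-bit pattern.
print(hex(mult18x18(0x3FFFF, 2)))
print(to_signed(mult18x18(0x3FFFF, 2), 36))
```

The largest magnitude product, (-2^17) x (-2^17) = 2^34, still fits in 36 signed bits, which is why the output never overflows.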

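Basic I/O Block Structure
• Three-State Path: Three-State Control, Three-State FF Enable
• Output Path: Output, Output FF Enable
• Input Path: Direct Input, Registered Input, Input FF Enable
• Control signals: Clock, Set/Reset
[Figure: IOB schematic; each path has a D flip-flop with D, Q, EC, and SR pins]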

I/O Signal Types
• Single-Ended: LVTTL, LVCMOS, HSTL, SSTL
• Differential: LVDS, Bus LVDS, LVPECL
NOTE: Only the popular IO types shown here

IOB: Double Data Rate Registers
• DDR registers can be clocked by
  • Clock and not (clock) if the duty cycle is 50/50
  • CLK0 and CLK180 DLL outputs
[Figure: timing diagram. DATA_1 carries D1A, D1B, D1C and DATA_2 carries D2A, D2B, D2C at the CLK rate; the Dual Data Rate output carries D1A, D2A, D1B, D2B, D1C]
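The output ordering in the timing diagram can be reproduced with a short sketch. It assumes DATA_1 is sent on one clock edge and DATA_2 on the opposite edge of the same cycle; the function name is hypothetical.

```python
def ddr_output(data_1, data_2):
    """Interleave two single-rate streams into one double-rate stream."""
    out = []
    for first_edge, second_edge in zip(data_1, data_2):
        out.append(first_edge)   # sent on one clock edge
        out.append(second_edge)  # sent on the opposite edge
    return out


print(ddr_output(["D1A", "D1B", "D1C"], ["D2A", "D2B", "D2C"]))
```

Two values leave the pin per clock period, so the pin carries twice the data rate of either input stream.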

Built-In HSTL II Support
• What is the advantage of using HSTL Class II?
  • High-speed IO interface
  • Bi-directional
  • Double parallel termination
[Figure: Zo = 50 Ω line terminated at both ends with R = 50 Ω to Vtt = 0.75 V; Vref = 0.75 V]

Digitally Controlled Impedance
• Dynamically adjusted termination resistors
• Provides drivers that are matched to the impedance of the traces
• Provides on-chip termination
  • Transmitter or receiver
• On-Chip termination advantages:
  • No termination resistors on board
  • Improves signal integrity by eliminating stub reflection
  • Eliminates the need for source termination (single-ended I/O)
  • Reduces board routing headaches and component count

Virtex-II Family: Four and Six Columns Block RAM & Multiplier
[Figure: Device XC2V250]

Virtex-II Family Members
• 2 Columns BRAM & Multipliers
• 4 Columns BRAM & Multipliers
• 6 Columns BRAM & Multipliers

VIRTEX-II Packaging
• FF and BF are flip-chip ball grid array packages
• Pinout compatibility inside same color rectangle
