Introduction To VIRTEX II Architecture

Presented By: Ankur Agarwal


Xilinx Design Flow

  • Plan & Budget

  • Create Code/Schematic

  • HDL RTL Simulation

  • Synthesize to create netlist

  • Functional Simulation

  • Implement

    • Translate

    • Map

    • Place & Route

  • Attain Timing Closure

  • Timing Simulation

  • Create Bit File


Xilinx Architecture Features

  • High performance at 2.5 V, 3.3 V, and 5 V

  • Technology Independence

    • EDIF, VHDL, Verilog, SDF Interface

  • Footprint compatibility

    • Devices within each family are compatible with each other

    • Pin locking


VIRTEX

  • Up to 2 Million System Gates at 100+ MHz

  • Features:

    • Distributed and Block RAM available

    • Low Power

    • Delay-Locked Loops

    • 2.5V Internal Operation with support of common power


Naming Conventions

  • Example part number: XC4028XL-3-BG256

    • Family (4000, 9500): XC40

    • No. of Gates: 28

    • Sub-Family (3V = XL, 5V = no XL): XL

    • Speed Grade: -3

    • Package: BG256

  • Spartan part numbers start with XCS
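The fields of the example part number XC4028XL-3-BG256 can be separated mechanically. The Python sketch below assumes the split XC40 | 28 | XL | 3 | BG256; the regular expression and the name `parse_part` are illustrative only (not a Xilinx tool), and only XC4000-style names are handled.

```python
import re

# Illustrative pattern for XC4000-style names only (an assumption, not an
# official grammar): family digits, gate-count code, optional XL sub-family,
# speed grade, package.
PART_RE = re.compile(
    r"^XC(?P<family>40)(?P<gates>\d+)(?P<sub>XL)?"
    r"-(?P<speed>\d+)-(?P<package>[A-Z]+\d+)$"
)


def parse_part(name):
    """Split a part number such as XC4028XL-3-BG256 into its fields."""
    match = PART_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized part number: {name}")
    return {
        "family": "XC" + match["family"] + "00",
        "gates_code": match["gates"],
        "sub_family": match["sub"] or "",  # "XL" = 3V, empty = 5V per the slide
        "speed_grade": match["speed"],
        "package": match["package"],
    }


print(parse_part("XC4028XL-3-BG256"))
```

Names outside this pattern, such as Spartan parts starting with XCS, raise `ValueError` in this sketch rather than being guessed at.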


CPLD and FPGA

  • Complex Programmable Logic Device (CPLD)

    • Architecture: PAL/22V10-like, more combinational

    • Density: low-to-medium, 0.5-10K logic gates

    • Performance: predictable timing, up to 250 MHz today

    • Interconnect: "crossbar switch"

  • Field-Programmable Gate Array (FPGA)

    • Architecture: gate array-like, more registers + RAM

    • Density: medium-to-high, 1K to 3.2M system gates

    • Performance: application dependent, up to 200 MHz

    • Interconnect: incremental


Overview of Xilinx FPGA Architecture

  • I/O Blocks (IOBs)

  • Programmable Interconnect

  • Configurable Logic Blocks (CLBs)

  • Tristate Buffers

  • Global Resources


Block Diagram of VIRTEX-II Architecture

[Figure: block diagram. Recoverable labels: DCM, 18Kb BRAM, multiplier, distributed RAM, shift registers, CAM, FIFO, DDR, DDR SDRAM, QDR SRAM, PCI, PCI-X, LVDS, BLVDS backplane, SONET/SDH]


CLB Resources

  • Basic resource unit is the Logic Cell

    • 1 CLB contains 2 to 4 Logic Cells, depending on device family

  • Logic Cell = 4-input Look-Up Table (LUT) + D Flip-flop

    • LUT capacity limited by number of inputs, not complexity of function

    • LUTs can be used as ROM or synchronous RAM

    • Flip-flop can be configured as a transparent latch in Virtex and Spartan-II


Closer Look at a CLB Structure

[Figure: two slices. Each slice holds two look-up tables (inputs G4-G1 and F4-F1), carry and control logic, and two flip-flops (D, Q, CK, EC, S, R). Slice signals: CIN, COUT, CLK, CE, SR, BY, F5IN, X, XB, Y, YB]

  • Each slice has 2 LUT-FF pairs with associated carry logic

  • Two 3-state buffers (BUFT) are associated with each CLB, accessible by all CLB outputs


Interconnect Technology Offered by VIRTEX-II

[Figure: array of switch matrices, one attached to each CLB, IOB, DCM, 18Kb BRAM, and 18x18 MULT block]

  • The interconnect is an array of switch matrices

  • All Virtex-II features can access routing resources through the switch matrix

    • Simplifies design and place & route


Simplified SLICE Structure

  • Each Slice has four outputs:

    • Two registered outputs

    • Two non-registered outputs

    • Two BUFTs associated, accessible by all 16 CLB outputs

  • Carry Logic for fast addition

    • Two independent carry chains per CLB


Fast Carry Logic

[Figure: dedicated carry logic routing running from LSB to MSB]

  • Each CLB contains separate logic and routing for the fast generation of carry signals

    • Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters

  • Carry logic is independent of normal logic and routing resources
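
A carry chain of this kind can be modeled bit by bit. The sketch below is a behavioral Python model, not the silicon itself: it assumes each bit computes a propagate signal in its LUT and that a dedicated multiplexer and XOR then produce the carry and the sum. The function name `carry_chain_add` is illustrative.

```python
def carry_chain_add(a, b, width=8, cin=0):
    """Add two unsigned integers by rippling a carry from LSB to MSB."""
    carry = cin
    total = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        propagate = ai ^ bi          # computed in the LUT (assumed mapping)
        sum_bit = propagate ^ carry  # dedicated XOR with the incoming carry
        # Dedicated carry mux: pass the carry along when the bits differ,
        # otherwise generate (1,1) or kill (0,0) it using one of the operands.
        carry = carry if propagate else ai
        total |= sum_bit << i
    return total, carry


print(carry_chain_add(200, 100, width=8))
```

Because only the carry signal travels between bit positions, the model shows why a dedicated path for that one signal speeds up adders, counters, and comparators.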


CLB (Configurable Logic Blocks)

[Figure: CLB with four slices (S0 at X0Y0, S1 at X0Y1, S2 at X1Y0, S3 at X1Y1) attached to a switch matrix, with fast connects, two TBUFs, a SHIFT chain, and CIN/COUT carry chains]

  • Each CLB is connected to one switch matrix

    • Providing access to general routing resources

  • High level of logic integration

    • Wide-input functions:

      • 16:1 multiplexer in 1 CLB or any function

      • 32:1 multiplexer in 2 CLBs

      • (1 level of LUT)

    • Fast arithmetic functions

      • 2 look-ahead carry chains per CLB column

    • Addressable shift registers in LUT

      • 16-bit shift register in 1 LUT

      • 128-bit shift register in 1 CLB (dedicated shift chain)


Four-Input LUT

  • Implements combinatorial logic

    • Any 4-input logic function

    • Cascaded for wide-input functions

[Figure: truth table of a 4-input logic function with inputs A, B, C, D and output Z, implemented by one LUT]
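
Since a LUT is just a stored truth table, any 4-input function costs the same 16 bits regardless of how complex its logic expression is. A minimal Python sketch of that idea follows; the helper names `make_lut4` and `lut4_read` are illustrative and do not correspond to any vendor API.

```python
def make_lut4(func):
    """Precompute the 16-entry truth table of a 4-input Boolean function."""
    return [
        int(bool(func((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1)))
        for i in range(16)
    ]


def lut4_read(table, a, b, c, d):
    """Evaluate the function by indexing the table with inputs A, B, C, D."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


# Two functions of very different logic complexity occupy identical storage.
xor4 = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
and4 = make_lut4(lambda a, b, c, d: a & b & c & d)
print(len(xor4), len(and4), lut4_read(xor4, 1, 0, 1, 1))
```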


Multiplexers

  • MUXF5 combines 2 LUTs to create

    • 4x1 multiplexer

    • Or any 5-input function (LUT5)

    • Or selected functions up to 9 inputs

  • MUXF6 combines 2 slices to form

    • 8x1 multiplexer

    • Or any 6-input function (LUT6)

    • Or selected functions up to 19 inputs

  • Dedicated muxes are faster and more space efficient

[Figure: CLB with two slices; in each slice two LUTs feed a MUXF5, and a MUXF6 combines the two MUXF5 outputs]
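
The "any 5-input function" claim follows from splitting the function on its fifth input: one LUT stores the case where that input is 0, the other the case where it is 1, and MUXF5 selects between them. The Python sketch below illustrates this decomposition; the function names are illustrative, and the model is behavioral only.

```python
def split_5_input(func):
    """Return two 16-entry tables: func with e=0 and func with e=1."""
    def table(e):
        return [
            int(bool(func((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1, e)))
            for i in range(16)
        ]
    return table(0), table(1)


def muxf5(lut0_out, lut1_out, sel):
    """Dedicated 2:1 mux that picks one of the two LUT outputs."""
    return lut1_out if sel else lut0_out


def eval_5_input(tables, a, b, c, d, e):
    index = (a << 3) | (b << 2) | (c << 1) | d
    return muxf5(tables[0][index], tables[1][index], e)


# Example: 5-input majority (at least three inputs high).
def majority5(a, b, c, d, e):
    return a + b + c + d + e >= 3


tables = split_5_input(majority5)
print(eval_5_input(tables, 1, 1, 0, 0, 1))
```

The same split applied once more, with MUXF6 selecting between two slices, gives the 6-input case listed on the slide.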


CLB Multiplexers: CLB Multiplexer Location

  • Each slice (S0 to S3) contains one MUXF5

  • MUXF6 combines Slices X0Y0 & X0Y1

  • MUXF6 combines Slices X1Y0 & X1Y1

  • MUXF7 combines the 2 MUXF6 outputs

  • MUXF8 combines the 2 MUXF7 outputs (Two CLB)


Horizontal Cascade Chain

  • Wide AND-OR functions (Sum Of Products)

[Figure: three adjacent CLBs, each with slices S0 to S3; carry chains (CY) feed ORCY gates that are chained horizontally across the CLBs to form the SOP output]


Shift Register

  • Each LUT can be configured as shift register

    • Serial in, serial out

  • Dynamically addressable delay up to 16 cycles

    • For programmable pipeline

  • Cascade for greater cycle delays

  • Use CLB flip-flops to add depth

[Figure: LUT drawn as a chain of D flip-flops with IN, CE, and CLK inputs; DEPTH[3:0] selects which stage drives OUT]
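
The addressable delay can be pictured as a 16-stage shift register with a 4-bit tap select. The Python class below is a behavioral sketch under that assumption; the class name `ShiftRegisterLut` and its methods are illustrative and are not a model of any specific primitive's timing.

```python
class ShiftRegisterLut:
    """Behavioral sketch of a LUT used as a 16-stage serial shift register."""

    def __init__(self):
        self.stages = [0] * 16

    def clock(self, data_in, ce=1):
        """On a clock edge with CE high, shift one bit in at stage 0."""
        if ce:
            self.stages = [data_in & 1] + self.stages[:15]

    def out(self, depth):
        """DEPTH[3:0] = 0..15 selects a delay of depth + 1 clock cycles."""
        return self.stages[depth & 0xF]


srl = ShiftRegisterLut()
for bit in [1, 0, 0, 0, 0]:
    srl.clock(bit)
print(srl.out(4), srl.out(3))
```

Changing the `depth` argument between clocks models the dynamically addressable delay without reloading the register contents.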


Shift Register

[Figure: two 64-bit data paths. Operation A (4 cycles) followed by Operation B (8 cycles) takes 12 cycles; Operation C takes 3 cycles, leaving a 9-cycle imbalance]

  • Register-rich FPGA

    • Allows for addition of pipeline stages to increase throughput

  • Data paths must be balanced to keep desired functionality
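
Using the numbers on this slide (a 12-cycle path, a 3-cycle path, 64-bit buses), the cost of balancing can be estimated. The Python sketch below compares plain flip-flops against LUT-based shift registers, assuming one LUT delays one bit by up to 16 cycles; the function name `balancing_cost` is illustrative.

```python
import math


def balancing_cost(long_cycles, short_cycles, bus_width, lut_depth=16):
    """Return (imbalance, flip-flops needed, shift-register LUTs needed)."""
    imbalance = long_cycles - short_cycles
    flip_flops = imbalance * bus_width
    # Each LUT covers up to lut_depth cycles of delay for a single bit.
    shift_luts = math.ceil(imbalance / lut_depth) * bus_width
    return imbalance, flip_flops, shift_luts


print(balancing_cost(12, 3, 64))
```

For this example the shorter path needs 9 extra cycles on each of 64 bits, which is 576 flip-flops or 64 LUTs under the stated assumption.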


Shift Register Look-Up Table

  • High density integration of shift registers

    • DSP applications use SRL16 for delay matching

    • CDMA wireless and video applications require shift registers

  • Multiple SRLC16s are cascadable to any length


Digital Clock Manager

  • High-Speed 420 MHz clock generation:

    • Clock de-skew on-chip and off-chip


Digital Clock Manager: DCM

Delay-Locked Loop

  • Clock phase de-skew

  • Duty cycle correction

  • Temperature compensation

  • RST input

  • LOCKED output

  • Attributes:

    • DUTY_CYCLE_CORRECTION

    • DLL_FREQUENCY_MODE

    • CLKDV_DIVIDE = 1.5 to 16.0

    • STARTUP_WAIT

    • CLK_FEEDBACK = CLK0 or CLK2X

  • Up to 4 clock outputs per DCM

[Figure: DCM symbol with clock and control signals. Inputs: CLKIN, CLKFB, RST, DSSEN, PSINCDEC, PSEN, PSCLK. Outputs: CLK0, CLK90, CLK180, CLK270, CLK2X, CLK2X180, CLKDV, CLKFX, CLKFX180, LOCKED, STATUS[7:0], PSDONE]


Advanced Frequency Synthesis

  • Frequency Synthesis

    • CLKFX is any M / D product of CLKIN frequency

    • M = 2 to 32, D = 1 to 32

    • Default: M=4, D=1 (4X CLKIN)

    • Always nominal 50/50 duty-cycle

    • Attributes:

      • CLKFX_MULTIPLY (integer)

      • CLKFX_DIVIDE (integer)

      • DFS_FREQUENCY_MODE

  • After LOCKED: Freq_CLKFX = (M/D) x Freq_CLKIN
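
The CLKFX relation can be checked with a few lines of arithmetic. The Python sketch below applies the M and D ranges quoted on this slide; the function name `clkfx_mhz` is illustrative, and device-specific limits on the input and output frequencies are deliberately not modeled.

```python
from fractions import Fraction


def clkfx_mhz(clkin_mhz, multiply=4, divide=1):
    """Return the CLKFX frequency in MHz as CLKIN x (M / D)."""
    if not 2 <= multiply <= 32:
        raise ValueError("M must be in the range 2 to 32")
    if not 1 <= divide <= 32:
        raise ValueError("D must be in the range 1 to 32")
    # Fraction keeps ratios such as 7/2 exact.
    return Fraction(clkin_mhz) * Fraction(multiply, divide)


print(clkfx_mhz(100))        # default M=4, D=1
print(clkfx_mhz(50, 7, 2))   # 50 MHz x 7/2
```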


High Resolution Phase Shifting

  • Fine Phase Shifting

    • Applies to all CLK outputs

    • Phase shift = fraction of CLKIN period

    • Fixed or variable modes

    • Inputs in variable mode:

      • PSINCDEC = Increase / Decrease

      • PSEN = Enable Phase Shift

      • PSCLK synchronizes Phase Shift

    • PSDONE output

    • Attributes:

      • CLKOUT_PHASE_SHIFT = NONE, FIXED, VARIABLE

      • PHASE_SHIFT (signed integer) = -255 to +255
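
To make "fraction of CLKIN period" concrete, the Python sketch below converts a PHASE_SHIFT value to a time offset. It assumes the fraction is PHASE_SHIFT / 256, which the slide does not state explicitly; the function name `phase_shift_ns` is illustrative.

```python
def phase_shift_ns(phase_shift, clkin_mhz):
    """Convert a PHASE_SHIFT setting to nanoseconds of clock offset."""
    if not -255 <= phase_shift <= 255:
        raise ValueError("PHASE_SHIFT must be in the range -255 to +255")
    period_ns = 1000.0 / clkin_mhz
    # Assumption: one step is 1/256 of the CLKIN period.
    return phase_shift / 256 * period_ns


print(phase_shift_ns(64, 100))    # quarter period at 100 MHz
print(phase_shift_ns(-128, 100))  # half period, negative direction
```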



Clock Distribution

  • 16 Global Clock Multiplexers

    • Eight on the top

    • Eight on the bottom

    • Switch "glitch free" from 1 clock to the other

  • 8 Clocks selectable per quadrant

  • Unused branches are disabled (power saving)

[Figure: 8 BUFGMUX at the top and 8 at the bottom drive 16 clocks; at most 8 clocks reach each of the NW, NE, SW, and SE quadrants]


Use Global Buffers to Reduce Clock Skew

  • Global buffers are connected to dedicated routing

    • This routing network is balanced to minimize skew

  • All Xilinx FPGAs have global buffers

  • Example: design contains 2 clock signals, CLK1 and CLK2

    • Without a global buffer on CLK2, clock skew is introduced between CLK1 and CLK2

    • Using an extra BUFG reduces skew on CLK2

[Figure: two flip-flops clocked by CLK1 and CLK2, each clock driven through a BUFG]


Global Clocks: BUFGMUX

  • Three modes:

    • Clock buffer

      • Low skew clock distribution

      • BUFG primitive (ports I, O)

    • Clock enable

      • Stop the clock High or Low

      • BUFGCE (stop Low) (ports I, CE, O)

    • Clock multiplexer "glitch-free"

      • Switch from one clock to another

      • BUFGMUX (ports I0, I1, S, O)

      • Unrelated clocks

      • No pulse width shorter than 1/2 of the period


Memory

  • On-Chip SelectRAM™ Memory: Terabit Memory Continuum

    • Distributed RAM (bytes): 128x1, shallow/wide; DSP coefficients, small FIFOs, CAM

    • Block RAM (kilobytes): 18 kb blocks, deep/wide; large FIFOs, packet buffers, video line buffers, cache tag memory, CAM

    • External RAM/CAM (megabytes): DDR & QDR, up to 400 Mbps/pin


Embedded 18 kb Block RAM

  • Up to 3 Mb on-chip block RAM

  • High internal buffering bandwidth

  • Reduced I/O count and more embedded memory


Distributed RAM

  • CLB LUT configurable as Distributed RAM

    • A LUT equals 16x1 RAM

    • Implements Single and Dual-Ports

    • Cascade LUTs to increase RAM size

  • Synchronous write

  • Synchronous/Asynchronous read

    • Accompanying flip-flops used for synchronous read

[Figure: a LUT configured as RAM16X1S (D, WE, WCLK, A0-A3, O); two LUTs configured as RAM32X1S (D, WE, WCLK, A0-A4, O), RAM16X2S (D0, D1, WE, WCLK, A0-A3, O0, O1), or RAM16X1D (D, WE, WCLK, A0-A3, SPO, DPRA0-DPRA3, DPO)]
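
The write and read behavior of a single LUT used as a 16x1 RAM can be sketched in Python as follows. This is a behavioral illustration only: the class name `Ram16x1` and its methods are hypothetical, with writes taking effect on a modeled clock edge and reads returning immediately (the asynchronous case).

```python
class Ram16x1:
    """Behavioral sketch of one LUT used as a 16 x 1 single-port RAM."""

    def __init__(self):
        self.cells = [0] * 16

    def read(self, addr):
        """Asynchronous read: output follows the address with no clock."""
        return self.cells[addr & 0xF]

    def clock(self, addr, data, write_enable):
        """Synchronous write: store data on the clock edge when WE is high."""
        if write_enable:
            self.cells[addr & 0xF] = data & 1


ram = Ram16x1()
ram.clock(addr=5, data=1, write_enable=1)
ram.clock(addr=5, data=0, write_enable=0)  # WE low, contents unchanged
print(ram.read(5), ram.read(6))
```

A synchronous read would add a register on the value returned by `read`, which corresponds to the accompanying flip-flop mentioned on the slide.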


18 x 18 Embedded Multiplier

[Figure: 18 x 18 multiplier with inputs Data_A (18 bits) and Data_B (18 bits) and a 36-bit Output]
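
The 36-bit output width follows from multiplying two 18-bit operands. The Python sketch below models the Data_A, Data_B, and Output widths, assuming two's complement signed operands (the slide does not state signedness); the helper names are illustrative.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def multiply_18x18(data_a, data_b):
    """Return the 36-bit product of two 18-bit signed operands (assumed)."""
    product = to_signed(data_a, 18) * to_signed(data_b, 18)
    return product & ((1 << 36) - 1)


print(multiply_18x18(3, 5))
print(to_signed(multiply_18x18(0x3FFFF, 2), 36))  # 0x3FFFF is -1 in 18 bits
```

The largest magnitude, (-2^17) x (-2^17) = 2^34, still fits in 36 signed bits, so the product never overflows the output.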


Basic I/O Block Structure

[Figure: IOB with three register paths sharing Clock and Set/Reset, each a flip-flop with D, Q, EC (FF enable), and SR: the three-state control path, the output path, and the input path, which provides both a direct input and a registered input]


I/O Signal Types

  • Single-Ended: LVTTL, LVCMOS, HSTL, SSTL

  • Differential: LVDS, Bus LVDS, LVPECL

  • NOTE: Only the popular I/O types are shown here


IOB: Double Data Rate Registers

[Figure: timing diagram. DATA_1 carries D1A, D1B, D1C and DATA_2 carries D2A, D2B, D2C at the CLK rate; the dual data rate output interleaves them as D1A, D2A, D1B, D2B, D1C]
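
The interleaving shown in the timing diagram can be expressed in a few lines. The Python sketch below assumes DATA_1 is sent on one clock edge and DATA_2 on the opposite edge of each cycle; the function name `ddr_interleave` is illustrative.

```python
def ddr_interleave(data_1, data_2):
    """Merge two single-rate streams into one double-rate output stream."""
    output = []
    for first, second in zip(data_1, data_2):
        output.append(first)   # driven on one clock edge
        output.append(second)  # driven on the opposite edge
    return output


print(ddr_interleave(["D1A", "D1B", "D1C"], ["D2A", "D2B", "D2C"]))
```

Two values leave the pin per clock cycle, which is how the output reaches twice the data rate of either input stream.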


Built-In HSTL II Support

  • What is the advantage of using HSTL Class II?

    • High-speed I/O interface

    • Bi-directional

  • Double parallel termination

[Figure: transmission line with Zo = 50 Ω, terminated at both ends with R = 50 Ω to Vtt = 0.75 V; Vref = 0.75 V]


Digitally Controlled Impedance

  • Dynamically adjusted termination resistors

    • Provides drivers that are matched to the impedance of the traces

    • Provides on-chip termination

    • Transmitter or receiver

  • On-chip termination advantages:

    • No termination resistors on board

    • Improves signal integrity by eliminating stub reflection

    • Eliminates the need for source termination (single-ended I/O)

    • Reduces board routing headaches and component count



Virtex-II Family Members

[Table not recovered. Surviving labels: devices with 2, 4, or 6 columns of BRAM & multipliers]


VIRTEX-II Packaging

  • FF and BF are flip-chip ball grid array packages

  • Pinout compatibility inside same color rectangle

