
Annapolis Wildstar FPGA Board

Charles Ross

Monica Chawathe




WildStar Board (Simplified)

[Diagram: the Host connects over the LAD Bus to three Virtex 2000E FPGAs (“0”, “1”, “2”), surrounded by eight 2M and four 1M memories]

3 Virtex 2000E FPGAs, 12 Memories (20 MB)



StarFire Board (Simplified)

[Diagram: the Host connects over the LAD Bus to one Virtex 1000 FPGA (“1”) with six 1M memories]

1 Virtex 1000 FPGA, 6 Memories (6 MB)


Memory Layout

  • Local

    • Always 32-bit words

    • Two on PE 1

    • Two on PE 2

  • Mezzanine

    • 32 or 64, depending on source (PEx / PE0)

      • Both address and word size

    • 4 between PE 1 & 0

    • 4 between PE 2 & 0

  • Latency: 4 cycles


Mezzanine Memory

  • 32 vs 64 (Same memory)

  • Switch Modes

    • 00 Straight

    • 01 Crossed

    • 10 Lo Thru

    • 11 Hi Thru

[Diagram: two memories (Mem, Mem) behind a switch, with ports labeled 64 and 32 and the two sides labeled PEx and PE0]


PEx (1 and 2)

[Diagram: the application logic (“STUFF”) in the middle, connected to the Left and Right Local memories, the Left and Right Mezz memories, and the LAD]


PE0

[Diagram: the application logic (“STUFF”) in the middle, connected to the PE1 and PE2 Right Mezz memories, the PE1 and PE2 Left Mezz memories, and the LAD]


Clocks – 4 of them!?

  • K, M, P, U

    • KClock: LAD Transactions (K?)

    • MClock: Memory Transactions

    • PClock: Processing Clock

    • UClock: User Clock

  • Okay, but why? What are they?


KClock – LAD

  • PE ↔ Host

  • 33MHz or 66MHz

    • 33MHz – Easy to Place and Route

    • 66MHz – 2X Host Bandwidth

    • Host and Chip must agree!!

      • Set in VHDL and Host Code

    • Clock is actually based on PCI Clock

      • Varies per host

      • Ours is approx. 33.23MHz / 66.46MHz

  • Asynchronous to all other clocks


MClock – Memory

  • Speed of Memory IO

    • Both Local & Mezzanine

  • User Selectable

    • 25MHz – 133MHz Wildstar

    • 25MHz – 100MHz Starfire


PClock – Processing

  • Based on MClock

    • Divisor between 1-16

    • Slower than MClock (Or Equal)

  • Can “Speed up” Memory I/O

    • Decoupling may allow different Speeds

    • Increase M, increase Divisor

    • Ex: Slow Component in Application (30MHz)

      • M = 30 MHz & Divisor = 1 → P = 30 MHz

      • M = 60 MHz & Divisor = 2 → P = 30 MHz

        • 2 Memory Accesses per Clock
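
The divisor arithmetic above can be sketched in a few lines. This is only an illustration, not Annapolis code: the function names are made up, and it assumes the memory can accept one access per MClock cycle, which is what the "2 Memory Accesses per Clock" bullet implies.

```python
def pclock_mhz(mclock_mhz: float, divisor: int) -> float:
    """Processing clock derived from the memory clock (P = M / divisor)."""
    if not 1 <= divisor <= 16:
        raise ValueError("PClock divisor must be between 1 and 16")
    return mclock_mhz / divisor


def memory_accesses_per_pclock(divisor: int) -> int:
    """Assumes one memory access per MClock cycle (an assumption of this sketch)."""
    return divisor


# The two configurations from the slide: same 30 MHz processing clock,
# but the second one moves twice as much memory traffic per PClock cycle.
for m, d in [(30, 1), (60, 2)]:
    print(f"M={m} MHz, divisor={d} -> P={pclock_mhz(m, d):g} MHz, "
          f"{memory_accesses_per_pclock(d)} memory access(es) per PClock cycle")
```

Raising M and the divisor together keeps a slow component at its 30 MHz limit while the memory side runs faster.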


PClock – Processing (More)

  • Optional

    • We normally don’t use it for ease

      • MClock is used Directly

        • Less Logic than “P=M/1”

      • No need to jump Clock Boundaries

    • Chip must either

      • Not care what the ratio is

      • Know at compile what ratio will be


UClock – User Clock

  • User Selectable

    • 0.32MHz – 133MHz Wildstar

    • 0.32MHz – 100MHz Starfire

  • We have never used it

    • 3 is plenty, isn't it?

  • Asynchronous to all other clocks


Hardware Components

  • Roll your own

    • Manual LAD addressing (33/66 Differ)

    • Manual Memory Use and Contention

    • Manual EVERYTHING!

    • CAN be very fast ~140 MHz

  • Annapolis Supplied Components

    • MUCH Easier

    • Slower (Approx. 40-60 MHz)


LAD Bus

  • 33MHz / 66MHz Selectable

    • Changes the communication protocol

      • Amount of latency, etc.

  • Component Addressing scheme

    • 0x0000-0x7FFF – Component Within PE

    • Higher Bits Address Board and PE

      • Ignore them

        • unless you “roll your own” LAD code


LAD Bus (More)

  • The Addressing of the LAD bus

    • A lot like subnet masks in IP Networking

    • MASK

      • Which bits address the component

      • Which bits are intra-component

    • BASE

      • Where does this component begin

    • ADDR&MASK==BASE “Are you talkin’ to ME?”

      • ADDR&(~MASK) = “What address in me?”

    • Examples:

      • B: 0x4800 M: 0x7F00 → 0x4800 ~ 0x48FF

      • B: 0x3200 M: 0x7C00 → 0x3200 ~ 0x35FF
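
The BASE/MASK test can be tried out in software. The sketch below is a plain Python model of the two formulas on this slide; the function names and the 15-bit `COMPONENT_SPACE` constant (taken from the 0x0000-0x7FFF range mentioned earlier) are illustrative assumptions, not part of any Annapolis library. One consequence of `ADDR & MASK == BASE` is that a base must not have bits set outside its mask. The 0x3200 / 0x7C00 pair above does not satisfy that as written (0x3200 & 0x7C00 is 0x3000), so the sketch uses 0x3000 for the 1K-address window.

```python
COMPONENT_SPACE = 0x7FFF  # bits that address a component within a PE


def selects(addr: int, base: int, mask: int) -> bool:
    """'Are you talkin' to ME?': does this address hit the component?"""
    return (addr & mask) == base


def local_offset(addr: int, mask: int) -> int:
    """'What address in me?': the intra-component part of the address."""
    return addr & ~mask & COMPONENT_SPACE


def is_aligned(base: int, mask: int) -> bool:
    """A base can only ever match if it has no bits outside the mask."""
    return (base & ~mask & COMPONENT_SPACE) == 0


def window(base: int, mask: int) -> tuple:
    """First and last address decoded by an aligned (base, mask) pair."""
    return base, base | (~mask & COMPONENT_SPACE)


lo, hi = window(0x4800, 0x7F00)
print(f"0x{lo:04X} ~ 0x{hi:04X}")
print(selects(0x4812, 0x4800, 0x7F00), hex(local_offset(0x4812, 0x7F00)))
print(is_aligned(0x3200, 0x7C00), is_aligned(0x3000, 0x7C00))
```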


Inside the Chips

[Diagram: the user-provided block (Your Application) sits among the Annapolis-provided components: a Mem Mux and a LAD-Mem Bridge in front of each memory, plus RegFile, Reset, and Clocks. All of them attach to a single LAD Mux, which connects to the LAD]


LAD-MUX

  • Gives LAD access to components

    • Bridges gap between IO Pins and “Logical” LAD

  • Handles Protocols for you

    • 66 and 33

  • ONE per chip


Reset

  • Allows Host to RESET the Chip

    • Causes clocks to destabilize momentarily

    • Causes chip to return to known init state

      • (If you write your VHDL right)

      • All Annapolis components are written right


Clocks

  • Provides user access to

    • All 4 Clocks (or Clock x2)

    • When clocks are stable

      • “DLL locked” Signals

  • Clocks on a Virtex use DLLs

    • Delay-Locked Loop

    • not Dynamic Link Library

      • Shame on you windows users!


Register File

  • Provides host access to 1-D array of 32-bit registers

    • Size must be a power of 2

  • Can be used for:

    • Ready – “The host says I can go now”

    • Done – “Hey Host, I am done!”

    • Small 32-bit IO – “The answer is 42!”

    • Run time parameters – “Threshold is 63”
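
A Ready/Done handshake over a register file might look like the toy model below. Everything here is hypothetical: the register indices, the `RegFile` class, and `fake_chip_step` (a stand-in for the VHDL side) exist only for this sketch. Real host code would use the Annapolis register read/write calls instead.

```python
class RegFile:
    """Toy model of a 1-D array of 32-bit registers shared by host and chip."""

    def __init__(self, size: int):
        if size <= 0 or size & (size - 1):
            raise ValueError("register file size must be a power of 2")
        self.regs = [0] * size

    def write(self, index: int, value: int) -> None:
        self.regs[index] = value & 0xFFFFFFFF  # registers are 32 bits wide

    def read(self, index: int) -> int:
        return self.regs[index]


# Hypothetical register assignments for this sketch.
READY, DONE, THRESHOLD, RESULT = range(4)


def fake_chip_step(rf: RegFile) -> None:
    """Stand-in for the FPGA logic: waits for Ready, computes, raises Done."""
    if rf.read(READY) and not rf.read(DONE):
        rf.write(RESULT, 42 if rf.read(THRESHOLD) == 63 else 0)
        rf.write(DONE, 1)


rf = RegFile(4)
rf.write(THRESHOLD, 63)      # run-time parameter
rf.write(READY, 1)           # "the host says I can go now"
while not rf.read(DONE):     # host polls for "Hey Host, I am done!"
    fake_chip_step(rf)
print(rf.read(RESULT))
```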


LAD to Mem Bridge

  • Provides host with access to the memories

    • Mezzanine or Local Memories

    • 2 Kinds: 32-bit and 64-bit

  • Transfers happen in bursts

    • 256 DWORDS for 32 bit memories

    • 512 DWORDS for 64 bit memories

    • (it's all transparent to the user though)
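
To get a feel for what the bridge hides, here is a rough sketch of splitting one transfer into bursts. The burst sizes come from the slide. The function name, and the assumption that a transfer may end with a short final burst, are guesses made for illustration only.

```python
# Burst length in DWORDs for each kind of bridge (from the slide).
BURST_DWORDS = {32: 256, 64: 512}


def bursts(start_dword: int, n_dwords: int, mem_width: int):
    """Yield (offset, length) pairs covering the transfer, one per burst.

    Assumption of this sketch: the last burst may be shorter than a full one.
    """
    size = BURST_DWORDS[mem_width]
    offset = start_dword
    end = start_dword + n_dwords
    while offset < end:
        length = min(size, end - offset)
        yield offset, length
        offset += length


print(list(bursts(0, 600, 32)))
print(list(bursts(0, 600, 64)))
```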


Memory-Mux

  • Provide multiple clients with access to the memories

    • Arbitrates between clients

      • Priority

        • Number of the client decides priority

        • Maximum utilization

        • Might starve some clients

      • Fair

        • Round Robin

        • Wastes some cycles

        • Each Client gets 1/n
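
The trade-off between the two arbitration modes shows up in a tiny simulation. This is a behavioral sketch, not the Annapolis component. It assumes that a lower client number means higher priority, and that "Fair" is a fixed-slot round robin in which an unused slot is simply lost, which is one reading of "Wastes some cycles".

```python
def priority_grant(requests):
    """Grant the lowest-numbered requesting client (assumed priority order)."""
    for client, wants in enumerate(requests):
        if wants:
            return client
    return None


def fair_grant(requests, cycle):
    """Fixed-slot round robin: each cycle belongs to exactly one client."""
    owner = cycle % len(requests)
    return owner if requests[owner] else None


# Clients 0 and 2 always want memory; client 1 never does.
requests = [True, False, True]
priority_log = [priority_grant(requests) for _ in range(6)]
fair_log = [fair_grant(requests, cycle) for cycle in range(6)]
print("priority:", priority_log)  # client 2 never gets a turn
print("fair:    ", fair_log)      # client 1's slots go unused
```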


Memory Access

  • Address: of DWORD or QWORD

  • Data_Out: To Memory

  • Data_In: From Memory

  • Write: Direction of Request

  • Request: “I want memory”

  • Acknowledge: “Okay!”

  • Data_Valid: Ack delayed by 4/5 cycles (See Bugs Later)

    • 32 bit Memories Only

  • Low/High Enable: “This half is useful”

    • 64 bit Memories Only

  • High/Low_Data_Valid: (Ack & Low/High Enable) delayed by 4/5 cycles

    • 64 bit Memories Only






Others - Useful

  • RAM Blocks

    • Host and Client Access to on-chip memories

    • 256 32-bit words

  • Interrupts to host

  • Systolic Buses

    • 2 36-bit buses between PE1 and PE2

      • top and bottom

    • Bi-directional

      • Tri-state

  • PE0 Standard Buses

    • 2 2-bit buses between PE0 and PEx

    • Bi-directional

      • Tri-state


Others – Useless

  • LED (there are 2 LEDs per Chip)

    • Red and Green

    • Can't see them…

  • IO Card

    • 114 bit IO

    • We don’t have one

  • Test Pins

    • 18 bits

    • No testing our board, please! =)


Software API

  • Annapolis Supplied

  • Driver Functions

    • Open, Close, Set Clocks, DMA, Read, Write, Download Configurations, Interrupt, Readback, etc..

  • Convenience Functions

    • Interface code to the “LAD to Memory Bridges”


Open/Close

  • Grabs the board exclusively

    • Uses kernel mutex

    • CAN do it in shared mode, but DON'T

  • Can set LAD Speed as well

    • See “Bugs” Later


Chip Configuration

  • Programs a PE from a memory array containing the bitstream

    • .x86 files

  • Can de-program as well

    • Why bother?

      • As long as everyone “Plays nice”

  • BE CAREFUL WHAT YOU PROGRAM!

    • If you program a PE with a bitstream that is corrupted, built for the wrong chip, or otherwise mangled, you can release the magic smoke from the chips!

    • $40,000 board!


Set Clock Speeds

  • UClock speed

  • MClock speed

    • and PClock divisor


Register IO

  • Reads/Writes to the LAD Address space

    • to communicate with anything plugged into a LAD MUX

      • Reset

      • Register Files

      • Etc.


Memory IO

  • for LAD to MEM Bridges

  • Abstracts the IO Bursts, addressing, etc.

  • Create Memory Objects

  • Read/Write/Copy/Set

  • Release


Others You Won't Need

  • Display (4 Char LCD on the board)

  • Interrupts

  • Temperature / Power

  • Readback / Singleshot

  • DMA

  • Versions / Hardware Config

  • Etc..


Tools

  • You write Host code (in C)

    • compile with gcc, etc.

    • Link in the libraries and such

  • You write Chip code (in VHDL)

    • Simulate and Verify with ModelSim

    • Synthesize with Synplify

      • Linux / Solaris / WinNT

    • Place and Route with Xilinx foundation tools

      • WinNT / Linux (with wine)


ModelSim

  • VHDL Simulation tool

  • Annapolis provides

    • Host simulation components

    • VHDL Description of the WHOLE board

      • LAD

      • Memories (Local & Mezzanine)

      • Busses

      • Etc

  • You provide

    • VHDL to run inside the chip (May contain Annapolis components as well)

  • Talk to me if you want to use ModelSim to debug!


Synplify

  • Synplicity Inc.

  • Converts VHDL (or Verilog) into an EDIF

    • EDIF = description of your program in terms of Virtex parts (4-input LUTs, flip-flops, RAM blocks, etc.)

  • Fast

    • 1-30 minutes


Place and Route

  • Maps to lower level components

  • Lays them out

  • Routes between them

  • Slow

    • 10 minutes – 2 days

  • Provides a bitstream (.bit file)

    • directly converted to .x86 for config


Paths & Environment

  • Need environment variables and path additions

    • Add this to the end of your .cshrc: source ~cs670/WildExamples/cshrc_additions

    • If you use bash, sh, zsh, etc..

      • You’re on your own!

      • Look at the file, figure it out!

        OR

      • Use csh or tcsh!


Examples

  • ~cs670/WildExamples/csu_example

    • Basic CSU made example using only PE1

    • Copies 1Mb from Left → Right Local Mem

  • ~cs670/WildExamples/annap_example

    • All the Annapolis supplied examples

    • May need path adjusting, etc..

    • Not meant to work as is

    • Useful to get a feel for other stuff


Hints

  • Timing

    • Count MClock, and put it in a RegFile

      • Cycles / Freq = Time

    • Host timing is too coarse

  • “Start / Stop” and “Working / Done”

    • Use a RegFile – Easier than Interrupts

      • (Haven’t gotten them to work with LAD Mux)
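
On the host side, the timing hint boils down to one division. The helper below is a hypothetical sketch (the names are not from any Annapolis API). It also shows how long a single 32-bit register can count before wrapping, assuming the counter increments once per MClock cycle.

```python
def elapsed_seconds(cycles: int, mclock_mhz: float) -> float:
    """Cycles / Freq = Time, with the frequency given in MHz."""
    return cycles / (mclock_mhz * 1e6)


def wrap_seconds(mclock_mhz: float, counter_bits: int = 32) -> float:
    """Time until a counter of the given width overflows at this clock."""
    return (2 ** counter_bits) / (mclock_mhz * 1e6)


# Example: a cycle count read back from a RegFile after a run at 50 MHz.
cycles_read_back = 1_000_000
print(f"{elapsed_seconds(cycles_read_back, 50.0) * 1e3:.3f} ms")
print(f"a 32-bit counter wraps after about {wrap_seconds(133.0):.1f} s at 133 MHz")
```

For runs longer than the wrap time, two registers (high and low word) would be needed.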


Manuals

  • Ask Sanjay! =)

    • 1 copy of our HUGE Starfire / Wildstar manuals

  • I have the original…

    • You may use it near my desk…

    • If it wanders from my cube

      • Broken Legs


HELP!

  • Bugs? - “99% correct is 100% Wrong”

    • 1 – Reread your VHDL and host code

      • Silly bugs are easy to make, and spot

    • 2 – Simulate it

      • You can see the signals. It almost always agrees with the actual hardware

    • 3 – Simulate again

      • No Really… Simulate it!

    • 4 – Look in the manuals

      • Helpful sometimes…

    • 5 – rossc@cs.colostate.edu


BUGS!!!!

  • Querying the LAD bus speed in host code will return 66MHz if the LAD Bus was *EVER* at 66MHz since last reboot… even if it is *CURRENTLY* at 33MHz!

    • DON’T USE IT, EVER!

  • The Data_Valid Signals are WRONG! They appear to be delayed 5 cycles instead of 4 in the real code. They are correct in simulation.

    • Use a 4 cycle delay on (Req and Ack) Instead!

    • Use the simulation to ensure your delayed signal matches
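
The workaround is a 4-stage delay line on (Req and Ack). On the chip this would be a small shift register in VHDL. The Python below is only a cycle-by-cycle behavioral model with made-up names, useful for checking what the delayed signal should look like before comparing it against ModelSim.

```python
from collections import deque


class DelayedValid:
    """Behavioral model of (Req and Ack) delayed by a fixed number of cycles."""

    def __init__(self, latency: int = 4):
        # One slot per cycle of delay, initially "not valid".
        self.pipe = deque([False] * latency, maxlen=latency)

    def clock(self, req: bool, ack: bool) -> bool:
        """Advance one clock cycle and return the delayed valid signal."""
        valid = self.pipe[0]            # entered the pipe `latency` cycles ago
        self.pipe.append(req and ack)   # oldest entry is dropped automatically
        return valid


dv = DelayedValid(latency=4)
# A single acknowledged read request in cycle 0, then idle.
stimulus = [(True, True)] + [(False, False)] * 5
trace = [dv.clock(req, ack) for req, ack in stimulus]
print(trace)
```

The valid pulse appears in cycle 4, four cycles after the acknowledged request, matching the stated memory latency.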


Let's Look at It!

  • Lemme open emacs…

  • VHDL

  • Host Code

  • Execution

  • Simulation

    • Little Wiggly Green Wires!