Design Framework for Partial Run-Time FPGA Reconfiguration

ERSA 2008, Las Vegas, NV, July 14–17, 2008. Design Framework for Partial Run-Time FPGA Reconfiguration. Chris Conger, Ann Gordon-Ross, and Alan D. George. Presented by: Abelardo Jara-Berrocal, HCS Research Laboratory, College of Engineering, University of Florida.

ERSA 2008

Las Vegas, NV

July 14–17, 2008

Design Framework for Partial Run-Time FPGA Reconfiguration

Chris Conger, Ann Gordon-Ross,

and Alan D. George

Presented by: Abelardo Jara-Berrocal

HCS Research Laboratory

College of Engineering

University of Florida


Outline

  • Introduction

  • Partial Reconfiguration (PR) Overview

  • Proposed Design Methodologies

  • Framework analysis

  • Conclusions


Introduction – Fully reconfigurable systems

[Figure: fully reconfigurable system. Labels: design station, bitstreams storage (Config 1, Config 2, Config 3), system controller, configuration lines, FPGA, battery, general purpose I/O, shared memory, external I/O. The required design (Modules A, B, and C) doesn't fit in the device; Config 1 and Config 2 requests enable one configuration at a time and disable the others.]

1. Device too small for complex designs

2. Big full bitstreams (long reconfiguration time)

3. Complete system operation is halted prior to reconfiguration


Introduction – The Virtex 4 PR architecture

  • Newer Xilinx FPGA families offer a partial reconfiguration feature

  • A rectangular region of the FPGA can be reconfigured without affecting the remaining FPGA area

    • System can continue operating without interruption

[Figure: FPGA containing Reconfigurable region 1 and Reconfigurable region 2]


Introduction – A sample PR architecture

[Figure: sample PR architecture. The FPGA is split into a static area (Controller (Microblaze), ICAP, Flash controller) and a reconfigurable area into which Modules A, B, and C are loaded on request (for example, a Module A request). Other labels: battery, JTAG, base system configuration, bitstreams storage, external I/O.]

1. System controller does not need to be placed in an external device

2. Access to fast Internal Configuration Access Port (ICAP – 32 bits, 100 MHz)

3. Smaller partial bitstreams

4. No need to halt complete system when reconfiguring a module

5. Time multiplexing of FPGA resources, load and unload HW modules on demand
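
As a rough illustration of points 2 and 3, the peak ICAP bandwidth follows from the figures on the slide (32 bits per cycle at 100 MHz), and it gives a lower bound on reconfiguration time for a given partial bitstream. This is only a back-of-the-envelope sketch: the function names and the 200 KiB example size are assumptions, and real transfers add controller and storage overhead that is not modeled here.

```python
def icap_peak_bytes_per_second(width_bits: int = 32, clock_hz: float = 100e6) -> float:
    """Peak ICAP throughput, assuming one word of `width_bits` is written per clock cycle."""
    return (width_bits / 8) * clock_hz


def min_reconfig_time_ms(bitstream_bytes: int,
                         width_bits: int = 32,
                         clock_hz: float = 100e6) -> float:
    """Lower bound on reconfiguration time in milliseconds (ignores all overhead)."""
    return bitstream_bytes / icap_peak_bytes_per_second(width_bits, clock_hz) * 1e3


if __name__ == "__main__":
    example_bytes = 200 * 1024  # hypothetical partial bitstream size, not from the slides
    print(f"peak ICAP bandwidth: {icap_peak_bytes_per_second() / 1e6:.0f} MB/s")
    print(f"lower-bound reconfiguration time: {min_reconfig_time_ms(example_bytes):.3f} ms")
```

Because the bound scales linearly with bitstream size, smaller partial bitstreams translate directly into shorter reconfiguration times.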


Introduction – Current PR Design Flow

[Figure: example floorplan. Static modules: ICAP, Controller (Microblaze), Flash controller. PRR 1 holds Modules A and B; PRR 2 holds Module C.]

  • Steps

    • Partition the system into modules

    • Define static modules and reconfigurable modules

    • Decide the number of PR regions (PRRs)

    • Decide PRR sizes, shapes and locations

    • Map modules to PRRs

    • Define PRR interfaces, instantiate slice macros for PRR interfaces

  • Optimization problems

    • Design partitioning

    • Number of PRRs

    • PRR sizes, shapes and locations

    • Mapping PRMs to PRRs

    • Type and placement of PRR interfaces

[Figure: design partitioning splits the system into static modules and reconfigurable modules (PRMs); design floorplanning and budgeting places a static region and a still-open number of PRRs (1? 2?) on the FPGA.]


Introduction – Early Access PR Design Flow

  • Introduced by Xilinx at FPL’06

  • Major improvements:

    • Automatic implementation scripts

    • Rectangular regions (not full column reconfiguration)

    • Static nets can cross reconfigurable regions

    • Slice macros replace bus macros

  • Partitioning and floorplanning steps are manually executed

    • Design guidelines for these steps are not provided

[Figure: reconfigurable design specifications pass through design partitioning and design floorplanning and budgeting (manual), which produce the placement and PRR constraints for the Xilinx PR Implementation Flow (automatic). Outputs: full initial bitstream and PRM bitstreams. The manual steps are marked as having potential for development of automatic CAD tools.]


    Introduction – Current PR design tools limitations

    • PR design is a very specialized task

    • Only a physical level of support is provided

      • Architectural knowledge of the target device is a must

      • Not very flexible, many design constraints

    • Partitioning and floorplanning steps are manually executed

      • No performance-sensitive design guidelines are provided

      • No automatic, heuristics-based design flow is available either

    • Lack of abstraction from low level details discourages designers from using PR

      • Difficult for many end users

    In this work, we propose a taxonomy of PR system design flows and an efficient methodology for each type.


    PR Overview – Taxonomy of PR systems design flows

    PR System Design Flow: Special purpose or Multipurpose

    Special purpose

    • Highly specialized systems design

    • All PRMs that will exist on the system are known at design time

    • Each PRR is independently optimized (size, shape, location, interface) based on the PRMs that will be mapped to it

    • Output is:

      • Floorplan defining a static region and a set of optimized PRRs

      • The set of PRMs that can be placed in each PRR (PRMs to PRRs mapping)

    Multipurpose

    • Not optimized for a specific application

    • PRMs required by the application are not known when designing the base system

    • Goal is to design a flexible and reusable base design that can be used for several different PR systems

    • Base system designer defines a set of PRRs with fixed shapes, sizes, locations and interfaces

    • Generated floorplan is used as input template for the PRMs implementation


    Proposed Design Methodology: Special-Purpose

    • Partition the system into several hardware modules

    • Synthesize the hardware modules

    • Use a control flow graph (CFG) and a states table to represent:

      • Application states and the transitions between them (execution path coverage)

      • Set of modules required in each application state

    Let’s see an example


    Proposed Design Methodology: Special-Purpose

    • Define region partitioning constraints

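    [Figure: control flow graph with application states S1 to S5, annotated with modules C, F, G, D, and E. Establishing constraints: modules are classified as static or reconfigurable.]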

    1. A, B are present in all states (static modules)

    2. C, D, E, F and G are reconfigurable modules (PRMs)

    3. F and G are mutually exclusive with respect to C (they cannot be placed in the same PRR as C)

    4. F, G, D and E can be placed in the same PRR

    5. C, D and E can be placed in the same PRR
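
The constraints above can be derived mechanically from a states table. The sketch below is only an illustration: the slides do not give the actual table, so `STATES` is a hypothetical one chosen to reproduce the listed constraints, and the function names are assumptions. A module required in every state is static; two PRMs required in the same state cannot share a PRR.

```python
from itertools import combinations

# Hypothetical states table (state -> modules required in that state).
# The real table is not given in the slides; this one is chosen so that the
# derived constraints match the five constraints listed above.
STATES = {
    "S1": {"A", "B"},
    "S2": {"A", "B", "C", "F"},
    "S3": {"A", "B", "C", "G"},
    "S4": {"A", "B", "D"},
    "S5": {"A", "B", "E"},
}


def static_modules(states):
    """Modules present in every state."""
    return set.intersection(*states.values())


def reconfigurable_modules(states):
    """Modules that are needed in some states but not in all of them (PRMs)."""
    return set.union(*states.values()) - static_modules(states)


def conflicts(states):
    """Pairs of PRMs required in the same state; such a pair cannot share a PRR."""
    prms = reconfigurable_modules(states)
    pairs = set()
    for modules in states.values():
        for a, b in combinations(sorted(modules & prms), 2):
            pairs.add((a, b))
    return pairs


if __name__ == "__main__":
    print("static:", sorted(static_modules(STATES)))
    print("PRMs:", sorted(reconfigurable_modules(STATES)))
    print("cannot share a PRR:", sorted(conflicts(STATES)))
```

Any set of PRMs with no conflicting pair, such as F, G, D and E, may be time-multiplexed in one PRR.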


    Proposed Design Methodology: Special-Purpose

    • Define the number of PRRs to be used

      • Optimization variable

      • Number is computed based on CFG and states table

    # PRRs = 1? 4?

    • Define a PRMs to PRRs mapping

      • Optimization problem

      • Combinatorial design space

      • Design space is reduced using design constraints

    Static Region: A, B

    PRR 1: C, D, E

    PRR 2: F, G

    Possible solution (not necessarily the optimal one)
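
One simple way to obtain such a mapping is a greedy first-fit pass over the PRMs, opening a new PRR only when a PRM conflicts with a module in every existing PRR. This is a swapped-in heuristic for illustration, not the method from the presentation; the function name is an assumption, and the conflict pairs (C with F, C with G) are taken from the constraints on the earlier slide. Like the slide's solution, the result is feasible but not guaranteed optimal.

```python
def map_prms_to_prrs(prms, conflicts):
    """Greedy first-fit mapping of PRMs to PRRs.

    prms      -- iterable of PRM names
    conflicts -- iterable of (prm_a, prm_b) pairs that must not share a PRR
    Returns a list of PRRs, each a list of the PRMs mapped to it.
    """
    conflict_set = {frozenset(pair) for pair in conflicts}
    prrs = []
    for prm in sorted(prms):
        for region in prrs:
            # Reuse this PRR only if the PRM conflicts with nothing already in it.
            if all(frozenset((prm, other)) not in conflict_set for other in region):
                region.append(prm)
                break
        else:
            prrs.append([prm])  # no compatible PRR found: open a new one
    return prrs


if __name__ == "__main__":
    mapping = map_prms_to_prrs("CDEFG", [("C", "F"), ("C", "G")])
    for index, region in enumerate(mapping, start=1):
        print(f"PRR {index}: {', '.join(region)}")
```

For these inputs the heuristic uses two PRRs and groups C, D, E in one and F, G in the other, which agrees with the possible solution shown on the slide.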


    Proposed Design Methodology: Special-Purpose

    • And when do we size our PRRs?

      • Don’t worry, it is our next step 

    Modules profile (Slices, BRAMs, DSP48s per module):

    • Modules A, B: required static region resources (resources are added)

    • Modules C, D, E: required PRR 1 resources (maximum of each resource type)

    • Modules F, G: required PRR 2 resources (maximum of each resource type)
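
The sizing rule on this slide can be sketched in a few lines: static modules coexist, so their resources are summed, while the PRMs of one PRR occupy it one at a time, so the PRR needs only the per-type maximum. The resource numbers in `PROFILE` are invented for illustration (the slides give no per-module profile), and the function names are assumptions.

```python
RESOURCE_TYPES = ("slices", "brams", "dsp48s")

# Hypothetical post-synthesis module profiles; values are made up for illustration.
PROFILE = {
    "A": {"slices": 1200, "brams": 4, "dsp48s": 0},
    "B": {"slices": 800, "brams": 2, "dsp48s": 2},
    "C": {"slices": 3000, "brams": 10, "dsp48s": 8},
    "D": {"slices": 1500, "brams": 20, "dsp48s": 0},
    "E": {"slices": 2000, "brams": 0, "dsp48s": 16},
    "F": {"slices": 900, "brams": 6, "dsp48s": 4},
    "G": {"slices": 1100, "brams": 3, "dsp48s": 12},
}


def static_region_resources(modules, profile):
    """Static modules are all present at once, so resources are added."""
    return {r: sum(profile[m][r] for m in modules) for r in RESOURCE_TYPES}


def prr_resources(modules, profile):
    """PRMs in one PRR are time-multiplexed, so take the maximum of each resource type."""
    return {r: max(profile[m][r] for m in modules) for r in RESOURCE_TYPES}


if __name__ == "__main__":
    print("static region:", static_region_resources(["A", "B"], PROFILE))
    print("PRR 1:", prr_resources(["C", "D", "E"], PROFILE))
    print("PRR 2:", prr_resources(["F", "G"], PROFILE))
```

Note that the per-type maxima of a PRR can come from different PRMs, so a PRR may be larger than any single module mapped to it.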


    Proposed Design Methodology: Special-Purpose

    • Define the PRR sizes, shapes, locations inside the FPGA fabric

      • Floorplanning optimization problem

      • Proper metrics for PRR performance analysis are required

      • Design guidelines for efficient PRR floorplanning are also a necessity

    • Define PRR interfaces

      • Place slice macros

    [Figure: final optimized custom base system floorplan. Starting from the PRR 1 resources, a reconfigurable region with enough resources for PRR 1 is placed on the FPGA next to the static region; the same is done for PRR 2.]


    Proposed Design Methodology: Special-Purpose

    • Methodology outputs

      • Custom base system

      • PRMs to PRRs mapping

    • They are used as input files for the automatic Xilinx PR Design Flow


    Proposed Design Methodology: Special-Purpose

    • Opportunity to automate this flow through design tools

    • Optimization variables

      • Number of PRRs

      • PRR sizes, shapes, and locations

      • PRMs to PRRs mapping

      • Additional optimization variables can be defined

    • Several possible cost functions:

      • Area wastage

      • Power usage

      • Application latency

      • Throughput


    Framework analysis – PRR Geometries

    • PR system design flows require:

      • Proper metrics for PRR performance analysis

      • Design guidelines for efficient PRR floorplanning

    • Study of the effects of varying PRR shape on:

      • Maximum Clock Frequency

      • Partial Bitstream Size

    • Five separate test cores:

      • Beamforming (DSP/slice)

      • CFAR (slice/memory)

      • AES (register)

      • ARM7 softcore (hybrid)

      • Sine/Cosine LUT (memory)

    • Performed on V4SX55 thus far

    Aspect ratio = PRR Height / PRR Width
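
To make the aspect-ratio sweep concrete, the sketch below enumerates rectangular PRR shapes that provide a required number of slices and reports the aspect ratio of each. It is a simplified model under stated assumptions: shapes are measured in CLBs, `slices_per_clb=4` is an assumed parameter, and the columnar placement of RAMB16 and DSP48 resources is ignored. The function names and the example numbers are hypothetical.

```python
import math


def aspect_ratio(height: int, width: int) -> float:
    """Aspect ratio as defined on the slide: PRR height divided by PRR width."""
    return height / width


def candidate_shapes(required_slices: int, max_height: int, max_width: int,
                     slices_per_clb: int = 4):
    """For each width, return the shortest rectangle (in CLBs) that holds the slices.

    Returns a list of (height, width, aspect_ratio) tuples; widths whose
    required height exceeds max_height are skipped.
    """
    required_clbs = math.ceil(required_slices / slices_per_clb)
    shapes = []
    for width in range(1, max_width + 1):
        height = math.ceil(required_clbs / width)
        if height <= max_height:
            shapes.append((height, width, aspect_ratio(height, width)))
    return shapes


if __name__ == "__main__":
    # Hypothetical example: 100 slices in a region of at most 10 x 5 CLBs.
    for height, width, ratio in candidate_shapes(100, max_height=10, max_width=5):
        print(f"{height} x {width} CLBs, aspect ratio {ratio:.2f}")
```

Each candidate shape would then be implemented and measured for maximum clock frequency and partial bitstream size, as in the per-core results that follow.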


    Framework analysis – Beamforming (~125 MHz, 40%)

    • 5022 slices

    • 16 DSP48s

    • 17 RAMB16s

    • Baseline, non-PR performance = 1614 kB, 127.845 MHz

    [Figures: clock frequency (MHz) vs. aspect ratio; bitstream size (kB) vs. aspect ratio]


    Framework analysis – CFAR (~100 MHz, 16%)

    • 2610 slices

    • 2 DSP48s

    • 34 RAMB16s

    • Baseline, non-PR performance = 1001 kB, 103.616 MHz

    [Figures: clock frequency (MHz) vs. aspect ratio; bitstream size (kB) vs. aspect ratio]


    Framework analysis – AES (~80 MHz, 13.75%)

    • 3634 slices

    • 3943 registers

    • 4 RAMB16s

    • Baseline, non-PR performance = 1393 kB, 80.483 MHz

    [Figures: bitstream size (kB) vs. aspect ratio; clock frequency (MHz) vs. aspect ratio]


    Framework analysis – ARM7 (~40 MHz, 6.8%)

    • 1826 slices

    • 16 DSP48s

    • 10 RAMB16s

    • Baseline, non-PR performance = 872 kB, 40.985 MHz

    [Figures: bitstream size (kB) vs. aspect ratio; clock frequency (MHz) vs. aspect ratio]


    Framework analysis – Sine/Cosine LUT

    • 107 slices

    • 27 RAMB16s

    • Baseline, non-PR performance = 571 kB, 204.918 MHz

    [Figures: bitstream size (kB) vs. aspect ratio; clock frequency (MHz) vs. aspect ratio]


    Framework analysis – PRR Geometries

    • Slice-intensive designs show best bitstream size/clock frequency performance with aspect ratio around 2-4

      • Roughly equivalent to aspect ratio of the FPGA as a whole

    • Non-slice-intensive designs show best bitstream performance with aspect ratio >> 4

      • Due to columnar distribution of RAMB16/DSP48 resources on chip

      • Clock frequency relatively insensitive to aspect ratio

      • Not shown in graph: resource wastage is also improved

    • Results are more pronounced for high-frequency designs

    • However, aspect ratio is not the only design consideration

      • Placement on a chip relative to other regions, pins, or resources may affect (restrict) choice of PRR shape


    Conclusions – Contributions of this work

    • Taxonomy for PR systems design flows and a design methodology for efficient development of each type

    • Identification of relevant optimization variables and constraints

      • Number of PRRs, optimal mapping of PRMs to PRRs, system floorplanning

      • Propose their incorporation in a future automatic design tool

    • Study of the effects of varying PRR shape

      • Maximum Clock Frequency

      • Partial Bitstream Size

      • Multiple classes of cores/designs

        • Memory-intensive

        • DSP-intensive

        • Combinational Logic-intensive

        • Register-intensive

        • Etc.

    • Definition and delivery of PRR floorplanning guidelines


    Questions

