A physical resource management approach to minimizing fpga partial reconfiguration overhead
This presentation is the property of its rightful owner.
Sponsored Links
1 / 18

A Physical Resource Management Approach to Minimizing FPGA Partial Reconfiguration Overhead PowerPoint PPT Presentation

  • Uploaded on
  • Presentation posted in: General

A Physical Resource Management Approach to Minimizing FPGA Partial Reconfiguration Overhead. Heng Tan and Ronald F. DeMara University of Central Florida. Agenda. Introduction Previous work on reducing Partial Reconfiguration overhead Develop a multi-step physical area management strategy

Download Presentation

A Physical Resource Management Approach to Minimizing FPGA Partial Reconfiguration Overhead

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

A physical resource management approach to minimizing fpga partial reconfiguration overhead

A Physical Resource Management Approach toMinimizing FPGA Partial Reconfiguration Overhead

Heng Tan and Ronald F. DeMaraUniversity of Central Florida



  • Introduction

  • Previous work on reducing Partial Reconfiguration overhead

  • Develop a multi-step physical area management strategy

  • Experimental case studies:

    • 4 simple circuits: serial, parallel, block multiplier resource,high fanout carry chain LUT multiplier

    • 2 large circuits: SECDED, MD5 step function (new)

  • Conclusion and future work



For partial reconfiguration operations, especially those running on a SOC configuration,

  • storage space and

  • reconfiguration speed

    are crucial.

     Find way to optimize them . . .

Previous work

Previous Work

  • Architecture-level [Ganesan2000]

    • Pipelining

    • overlap execution of one temporal partition with reconfiguration of another

  • Logical level [Raghuraman2005]

    • Minimization of frames

    • relating the number of frames that need to be downloaded into FPGAs to the number of minterms of a specially constructed logic function

  • Hardware level [Compton2002][Hauck1999]

    • Requires dedicated HW component

    • Compression or defragmentation performed externally at runtime

  • Physical-level resource management?

    • Not specifically addressed for partial reconfiguration modules in literature

    • But can be done … and … also cascaded with above techniques for multiplicative benefit

Module based flow

Module Based Flow

Basic concept and capability for PR proposed by Xilinx

  • Including Fixed modules and PR modules

  • Using Bus Macro

  • Suitable for potential full automation

  • Primary PR technique to study

Column level configuration format

Column-Level Configuration Format

  • For Virtex II/Pro series

  • Including IOB, IOI, CLB, GCLK, BlockRAM, and BlockRAM Interconnect

  • Labeling with 32-bit addresses composed of a Block Address (BA), a Major Address (MJA), a Minor Address (MNA), and a byte number

  • Only contents of CLB column are studied in this research

Bitstream compression

Bitstream Compression

  • CLB columns program the configurable logic blocks, routing, and most interconnect resources.

  • Each CLB column contains 2 columns of slices.

  • 22 frames are utilized within the bitstream for each column of slices, describing the logic and routing information respectively.

  • Each frame occupies 424 bytes.

  • In bitstream of PR modules, a compression technique is already used by Xilinx to represent the unused CLB frame.

  • For unused CLB frames, only 10 bytes are used instead of 424 to describe empty contents.

Optimization strategy

Optimization Strategy

  • Primary goal: minimize the number of columns of slices utilized, including routing resources

  • Secondary goal: do not incur increase in propagation delay after area optimization

  • Physical area management procedure:

    • Region Allocation: define size and boundary

    • Pin Assignment: top/bottom edge preferred

    • Column Alignment: fill column of slices at a time

    • Choke-Point Elimination: resolve high fanout cases

    • Repeat

Case study

Case Study

  • The hardware platform is Xilinx Virtex II Pro VP7 device.

  • Module-based partial reconfiguration flow is adopted to generate the partial reconfiguration bitstream.

  • The Xilinx ISE 6.3 is used to support the module based flow.

  • The area constraints are entered directly into User Constrain File (.ucf)before map and routing.

  • Four representative small cases and two larger size cases studies are tested for the strategy.

  • Similar or identical external pin arrangement.

  • Hypothesis: larger circuits can achieve more savings

4 lut design optimization parallel

4-LUT Design Optimization (parallel)

  • Simple 4-LUT elements

  • Parallel logic path with direct input from external signals

  • LUTs feed outputs straight though to flip flops

  • Best strategy is by locating them in a single column close to the external pins

Shifter optimization serial

Shifter Optimization (serial)

  • All logic elements are cascaded in contiguous string of CLBs

  • Attributes of this serial circuit functionality will have best arrangement from input to output in single column serially

Block multiplier optimization

Block Multiplier Optimization

  • Block multiplier resource involved

  • Requires balancing the routing between two paths.

  • One path is between the block multiplier and the LUTs. The other path is from the LUTs to the external pins.

  • Decreased savings compared to use of LUT-only circuits

Lut multiplier optimization

LUT Multiplier Optimization

  • High fan-out occurs because of the carry chains

  • Single column style is not optimal

  • Arrange related LUTs around each other using adjacent columns

Larger case studies

Larger Case Studies

  • SECDED as full PR module and MD5 step functions were developed as PR module vs. SHA family

  • During the optimization process, not every slice has been specifically placed because of the large number of resources involved

  • Only the slices on the critical path are constrained.

  • These are comparatively larger modules, increased bitstream savings of 33% and 30% are achieved

Benchmark results

Benchmark Results

Area optimization results

Area Optimization Results

  • A physical resource area management strategy is proposed to minimize the reconfiguration overhead.

  • Experiments show that up to one-third of size reduction can be achieved for partial reconfiguration modules.

  • The maximum propagation delay has also been decreased slightly in most cases (not increased).

  • On the other hand, the larger the module is, the more complicated and time consuming the process becomes.

  • Providing autonomous area optimization capabilities is future work for integrating into our Multi-Layer Runtime Reconfiguration Architecture FGPA framework

Multi layer runtime reconfiguration arch mrra

Multi-Layer Runtime Reconfiguration Arch. (MRRA)

Current work direct bitstream management

Current Work:Direct Bitstream Management

  • Change one-bit full adder to a one-bit full subtracter

  • Both have three one-bit inputs and two one-bit outputs.

  • Both used 2 LUTs with identical logic interconnections between LUTs and I/O signals.

  • Only difference between them is actually only one truth table stored inside one LUT, changing from 0xE8 to 0x8E

Combined MD5 / SHA-1

Step Function – Area Utilization

Original AlgorithmModule Based Frame Based

SHA-1 192(slices)65(slices)32(slices)

MD5 881(slices)168(slices) 32(slices)

Combined 1068(slices)324(slices) 32(slices)

  • Login