a physical resource management approach to minimizing fpga partial reconfiguration overhead n.
Skip this Video
Download Presentation
A Physical Resource Management Approach to Minimizing FPGA Partial Reconfiguration Overhead

Loading in 2 Seconds...

play fullscreen
1 / 18

A Physical Resource Management Approach to Minimizing FPGA Partial Reconfiguration Overhead - PowerPoint PPT Presentation

  • Uploaded on

A Physical Resource Management Approach to Minimizing FPGA Partial Reconfiguration Overhead. Heng Tan and Ronald F. DeMara University of Central Florida. Agenda. Introduction Previous work on reducing Partial Reconfiguration overhead Develop a multi-step physical area management strategy

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'A Physical Resource Management Approach to Minimizing FPGA Partial Reconfiguration Overhead' - eliana-pickett

Download Now An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
a physical resource management approach to minimizing fpga partial reconfiguration overhead

A Physical Resource Management Approach toMinimizing FPGA Partial Reconfiguration Overhead

Heng Tan and Ronald F. DeMaraUniversity of Central Florida

  • Introduction
  • Previous work on reducing Partial Reconfiguration overhead
  • Develop a multi-step physical area management strategy
  • Experimental case studies:
    • 4 simple circuits: serial, parallel, block multiplier resource,high fanout carry chain LUT multiplier
    • 2 large circuits: SECDED, MD5 step function (new)
  • Conclusion and future work

For partial reconfiguration operations, especially those running on a SOC configuration,

  • storage space and
  • reconfiguration speed

are crucial.

 Find way to optimize them . . .

previous work
Previous Work
  • Architecture-level [Ganesan2000]
    • Pipelining
    • overlap execution of one temporal partition with reconfiguration of another
  • Logical level [Raghuraman2005]
    • Minimization of frames
    • relating the number of frames that need to be downloaded into FPGAs to the number of minterms of a specially constructed logic function
  • Hardware level [Compton2002][Hauck1999]
    • Requires dedicated HW component
    • Compression or defragmentation performed externally at runtime
  • Physical-level resource management?
    • Not specifically addressed for partial reconfiguration modules in literature
    • But can be done … and … also cascaded with above techniques for multiplicative benefit
module based flow
Module Based Flow

Basic concept and capability for PR proposed by Xilinx

  • Including Fixed modules and PR modules
  • Using Bus Macro
  • Suitable for potential full automation
  • Primary PR technique to study
column level configuration format
Column-Level Configuration Format
  • For Virtex II/Pro series
  • Including IOB, IOI, CLB, GCLK, BlockRAM, and BlockRAM Interconnect
  • Labeling with 32-bit addresses composed of a Block Address (BA), a Major Address (MJA), a Minor Address (MNA), and a byte number
  • Only contents of CLB column are studied in this research
bitstream compression
Bitstream Compression
  • CLB columns program the configurable logic blocks, routing, and most interconnect resources.
  • Each CLB column contains 2 columns of slices.
  • 22 frames are utilized within the bitstream for each column of slices, describing the logic and routing information respectively.
  • Each frame occupies 424 bytes.
  • In bitstream of PR modules, a compression technique is already used by Xilinx to represent the unused CLB frame.
  • For unused CLB frames, only 10 bytes are used instead of 424 to describe empty contents.
optimization strategy
Optimization Strategy
  • Primary goal: minimize the number of columns of slices utilized, including routing resources
  • Secondary goal: do not incur increase in propagation delay after area optimization
  • Physical area management procedure:
      • Region Allocation: define size and boundary
      • Pin Assignment: top/bottom edge preferred
      • Column Alignment: fill column of slices at a time
      • Choke-Point Elimination: resolve high fanout cases
      • Repeat
case study
Case Study
  • The hardware platform is Xilinx Virtex II Pro VP7 device.
  • Module-based partial reconfiguration flow is adopted to generate the partial reconfiguration bitstream.
  • The Xilinx ISE 6.3 is used to support the module based flow.
  • The area constraints are entered directly into User Constrain File (.ucf)before map and routing.
  • Four representative small cases and two larger size cases studies are tested for the strategy.
  • Similar or identical external pin arrangement.
  • Hypothesis: larger circuits can achieve more savings
4 lut design optimization parallel
4-LUT Design Optimization (parallel)
  • Simple 4-LUT elements
  • Parallel logic path with direct input from external signals
  • LUTs feed outputs straight though to flip flops
  • Best strategy is by locating them in a single column close to the external pins
shifter optimization serial
Shifter Optimization (serial)
  • All logic elements are cascaded in contiguous string of CLBs
  • Attributes of this serial circuit functionality will have best arrangement from input to output in single column serially
block multiplier optimization
Block Multiplier Optimization
  • Block multiplier resource involved
  • Requires balancing the routing between two paths.
  • One path is between the block multiplier and the LUTs. The other path is from the LUTs to the external pins.
  • Decreased savings compared to use of LUT-only circuits
lut multiplier optimization
LUT Multiplier Optimization
  • High fan-out occurs because of the carry chains
  • Single column style is not optimal
  • Arrange related LUTs around each other using adjacent columns
larger case studies
Larger Case Studies
  • SECDED as full PR module and MD5 step functions were developed as PR module vs. SHA family
  • During the optimization process, not every slice has been specifically placed because of the large number of resources involved
  • Only the slices on the critical path are constrained.
  • These are comparatively larger modules, increased bitstream savings of 33% and 30% are achieved
area optimization results
Area Optimization Results
  • A physical resource area management strategy is proposed to minimize the reconfiguration overhead.
  • Experiments show that up to one-third of size reduction can be achieved for partial reconfiguration modules.
  • The maximum propagation delay has also been decreased slightly in most cases (not increased).
  • On the other hand, the larger the module is, the more complicated and time consuming the process becomes.
  • Providing autonomous area optimization capabilities is future work for integrating into our Multi-Layer Runtime Reconfiguration Architecture FGPA framework
current work direct bitstream management
Current Work:Direct Bitstream Management
  • Change one-bit full adder to a one-bit full subtracter
  • Both have three one-bit inputs and two one-bit outputs.
  • Both used 2 LUTs with identical logic interconnections between LUTs and I/O signals.
  • Only difference between them is actually only one truth table stored inside one LUT, changing from 0xE8 to 0x8E

Combined MD5 / SHA-1

Step Function – Area Utilization

Original AlgorithmModule Based Frame Based

SHA-1 192(slices) 65(slices)32(slices)

MD5 881(slices) 168(slices) 32(slices)

Combined 1068(slices) 324(slices) 32(slices)