
Runtime Adaptation on Dataflow HPC Platforms

NASA/ESA Conference on Adaptive Hardware and Systems (AHS-2013). Runtime Adaptation on Dataflow HPC Platforms. R. Cattaneo, C. Pilato, M. Mastinu, M.D. Santambrogio, Politecnico di Milano – Dip. di Elettronica, Informazione e Bioingegneria; O. Kadlcek, O. Pell


Presentation Transcript


  1. NASA/ESA Conference on Adaptive Hardware and Systems (AHS-2013) • Runtime Adaptation on Dataflow HPC Platforms • R. Cattaneo, C. Pilato, M. Mastinu, M.D. Santambrogio, Politecnico di Milano – Dip. di Elettronica, Informazione e Bioingegneria • O. Kadlcek, O. Pell, Maxeler Technologies Ltd., London, UK

  2. Context Definition • The portion of the application that needs to be accelerated is usually implemented in hardware • Resource limitations can become a bottleneck • In some contexts, the HPC application should be able to adapt to its environment • Partial dynamic reconfiguration is a well-known technique for changing behavior at run time while reusing the same logic across different tasks

  3. Reconfigurable Computing “Reconfigurable computing is intended to fill the gap between hardware and software, achieving potentially much higher performance than software, while maintaining a higher level of flexibility than hardware” (K. Compton and S. Hauck, Reconfigurable Computing: A Survey of Systems and Software, 2002)

  4. Reasons Behind • Some applications require performance that cannot be achieved in software • Some applications need to be flexible, modifiable, and adaptable; traditional hardware cannot provide this • Reconfigurable computing platforms can be altered after deployment, yielding high-performance devices able to meet resource, adaptability, and reliability constraints

  5. Maxeler Architecture • Maxeler systems are based on the interaction between a CPU and an FPGA • Maxeler uses FPGAs only as devices devoted to hardware acceleration • Why not try to enhance the flexibility and performance of Maxeler platforms by exploiting some intrinsic characteristics of FPGAs?

  6. Objectives Rationale • Dynamic Partial Reconfiguration is a technique that can help with problems such as a lack of available resources and the need for system adaptability and reliability • Maxeler architectures are very efficient for computation, but they do not support Dynamic Partial Reconfiguration Goals • Design a new tool flow that supports Dynamic Partial Reconfiguration on Maxeler architectures, to offer adaptivity in the HPC domain

  7. Canny edge detector

  8. Reconfiguration in FPGAs • Useful Definitions • Full Bitstream • Reconfigurable Partitions • Reconfigurable Modules • Partial Bitstream • Configurations [Diagram labels: FPGA, Full bitstream]

  9. Maxeler Architecture

  10. Example application SLiC SLiC Manager

  11. MaxCompiler flow [Diagram labels: MaxIDE, Java runtime, Java compilation, VHDL, BIT file]

  12. Preliminary Considerations • Hierarchical design vs. flat design • NGDBuild, MAP, PAR, and BitGen are run once per configuration • A PXML file is needed to drive the process

  13. Proposed Approach • Focusing on Kernels instead of the Manager • Kernels in the same Reconfigurable Block must have the same characteristics; • In every Configuration, exactly one Kernel must be assigned to each Reconfigurable Block; • The same Kernel cannot be placed in two different Reconfigurable Blocks. • Preserving the MaxCompiler/Xilinx tool flow structure as much as possible • Hiding the details from the designer
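The two assignment rules on this slide can be checked mechanically. The following is a minimal sketch, not part of MaxCompiler: the class `ConfigurationRules`, the method `isValid`, and the block and kernel names (`RB0`, `K0`, ...) are hypothetical, and a Configuration is modelled simply as a map from Reconfigurable Block name to Kernel name. The first rule (Kernels in the same block sharing the same characteristics) is not modelled, because the slides do not say which characteristics are meant.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical checker for the assignment rules; not a MaxCompiler API. */
final class ConfigurationRules {

    /**
     * @param blocks         names of all Reconfigurable Blocks in the design
     * @param configurations each map assigns block name -> kernel name
     * @return true if every configuration fills every block exactly once and
     *         no kernel is ever placed in two different blocks
     */
    static boolean isValid(Set<String> blocks, List<Map<String, String>> configurations) {
        // Remembers the first block each kernel was seen in.
        Map<String, String> blockOfKernel = new HashMap<>();
        for (Map<String, String> configuration : configurations) {
            // A map holds at most one kernel per block, so equal key sets
            // mean "exactly one kernel per Reconfigurable Block".
            if (!configuration.keySet().equals(blocks)) {
                return false;
            }
            for (Map.Entry<String, String> entry : configuration.entrySet()) {
                String previousBlock =
                        blockOfKernel.putIfAbsent(entry.getValue(), entry.getKey());
                if (previousBlock != null && !previousBlock.equals(entry.getKey())) {
                    return false; // same kernel in two different blocks
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> blocks = Set.of("RB0", "RB1");
        Map<String, String> a = Map.of("RB0", "K0", "RB1", "K1");
        Map<String, String> b = Map.of("RB0", "K2", "RB1", "K3");
        Map<String, String> swapped = Map.of("RB0", "K1", "RB1", "K0");
        System.out.println(isValid(blocks, List.of(a, b)));       // true
        System.out.println(isValid(blocks, List.of(a, swapped))); // false
    }
}
```

Reusing a Kernel in the same block across several Configurations is accepted; only moving it to a different block is rejected.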

  14. Reconfiguration on Kernels

  15. User interface: DFE code PRManager Main ... Configuration A = ... Configuration B = ... build(A,B) • Reconfigurable Block = Reconfigurable Partition • Kernel = Reconfigurable Module

  16. Considerations

  17. User interface: Host code DFE max_reconfig_partial_bitstream

  18. Case Study: Edge Detection • Canny edge detection is applied to a video • There are two Reconfigurable Blocks and a total of four filters • Each filter is a Reconfigurable Module • Initially, the first two filters are applied • Then the device is partially reconfigured and the other two filters are applied
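The switch between the two filter pairs can be pictured as a difference between two Configurations: only the blocks whose module changes need a new partial bitstream. The sketch below is illustrative only; `ReconfigurationDiff`, `modulesToLoad`, and the names `RB0`, `RB1`, `F1` to `F4` are assumptions, since the slides do not name the blocks or the filters, and the actual host-side call is not modelled.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: which blocks change when moving to the next configuration. */
final class ReconfigurationDiff {

    /**
     * @param current block name -> module currently loaded
     * @param next    block name -> module wanted next
     * @return sorted map of the blocks whose module differs, with the module to load
     */
    static Map<String, String> modulesToLoad(Map<String, String> current,
                                             Map<String, String> next) {
        Map<String, String> toLoad = new TreeMap<>();
        for (Map.Entry<String, String> entry : next.entrySet()) {
            // A block is reconfigured only if its module actually changes.
            if (!entry.getValue().equals(current.get(entry.getKey()))) {
                toLoad.put(entry.getKey(), entry.getValue());
            }
        }
        return toLoad;
    }

    public static void main(String[] args) {
        // Two blocks, four filters: F1 and F2 first, then F3 and F4.
        Map<String, String> first = Map.of("RB0", "F1", "RB1", "F2");
        Map<String, String> second = Map.of("RB0", "F3", "RB1", "F4");
        System.out.println(modulesToLoad(first, second)); // {RB0=F3, RB1=F4}
    }
}
```

In the case study both blocks change, so both would be reconfigured; a transition that keeps one filter in place would need only one partial bitstream.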

  19. MaxWorkstation • The targeted platform is MaxWorkstation • It contains an Intel i7 870 quad-core CPU with 16 GB RAM • The Intel CPU is connected to the DFE via PCI Express • The DFE has 24 GB RAM and is a MAX3 board with a Xilinx Virtex-6 FPGA

  20. Experimental Results • The methodology was applied to a video taken from “Mission Impossible” • It is combined with a set of compiler extensions for automatic code generation of the Kernels • The details are completely hidden from the designer [VIDEO]

  21. Conclusions and Future Work • The proposed approach integrates Partial Dynamic Reconfiguration into a dataflow architecture • The process is completely transparent to the designer • Future work will focus on the current limitations: • Reconfigurable Area constraints can be specified only as multiples of clock regions • During the partial reconfiguration of some Reconfigurable Blocks, all the Kernels are held in reset

  22. Implementation: design flow The build process is divided into four main stages

  23. First build stage • When the build process starts, MaxDC, XST and NGCBuild are run for each Reconfigurable Block and for the static part independently; • The result of this first stage is a large number of netlist files.

  24. Second build stage • The second stage consists of running NGDBuild, MAP, PAR, pr_verify and BitGen for each Configuration • The PXML file is generated automatically • The static part is implemented only in the first Configuration • Each Reconfigurable Module is implemented only the first time it appears in a Configuration
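The reuse policy of this stage (static part implemented once, each module implemented on first appearance) can be illustrated with a small planner. This is a sketch under assumptions, not the authors' tool: `ImplementationPlan`, `plan`, the `<static>` label, and the kernel names are hypothetical, and each Configuration is reduced to the list of modules it contains.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical planner for what must be implemented from scratch per configuration. */
final class ImplementationPlan {

    /**
     * @param configurations module names per configuration, in build order
     * @return for each configuration, the items implemented during its run;
     *         everything else in that configuration is reused from earlier runs
     */
    static List<List<String>> plan(List<List<String>> configurations) {
        Set<String> alreadyImplemented = new HashSet<>();
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < configurations.size(); i++) {
            List<String> fresh = new ArrayList<>();
            if (i == 0) {
                fresh.add("<static>"); // static part: first configuration only
            }
            for (String module : configurations.get(i)) {
                // Set.add returns true only the first time a module is seen.
                if (alreadyImplemented.add(module)) {
                    fresh.add(module);
                }
            }
            result.add(fresh);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> configurations =
                List.of(List.of("K0", "K1"), List.of("K2", "K1"));
        System.out.println(plan(configurations)); // [[<static>, K0, K1], [K2]]
    }
}
```

Here the second Configuration implements only `K2`, because `K1` and the static part were already implemented during the first run.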

  25. Final stage • Once the full bitstream and all the partial ones have been generated, they are encapsulated in the .Max file • The first Configuration passed to the build method is chosen as the “default” Configuration • This means that its full bitstream will be loaded into the CFPGA when the program starts
