1 / 1

Iakovos Mavroidis 1 , Luciano Lavagno 2 , Ioannis Mavroidis 1 , Ioannis Papaefstathiou 1 ,

global shared memory . external DDR . CUDA program. FPGA . Design Space Exploration. dcache. CUDA host program. SW kernels. HW kernels. multi-core CPU. Accelerator . DMA. SW Kernels. High-level synthesis . Porting to multi-core . DDR controller . HW Kernels. AXI4 . Compilation.

landen
Download Presentation

Iakovos Mavroidis 1 , Luciano Lavagno 2 , Ioannis Mavroidis 1 , Ioannis Papaefstathiou 1 ,

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. global shared memory external DDR CUDA program FPGA Design Space Exploration dcache CUDA host program SW kernels HW kernels multi-core CPU Accelerator DMA SW Kernels High-level synthesis Porting to multi-core DDR controller HW Kernels AXI4 Compilation local mems CUDA Host Program local mems dcache cntrl HW accelerator executable RTL AXI4 CUDA code transform SystemC AST CUDA AST icache icache front end icache multi-core System Integration physical design synthesis Microblaze0 Microblaze1 ... MicroblazeN SystemC code RTL back end lmb lmb lmb scratch pad scratch pad scratch pad AXI4_Lite uart timer mutex HW kernel invocation CUDA kernel logical threads ... thread blocks MCUDA schedule thread pool schedule Xilkernel core core core core Iakovos Mavroidis1, Luciano Lavagno2, Ioannis Mavroidis1, Ioannis Papaefstathiou1, Mihai Lazarescu2, Andrés Otero3, Eduardo de la Torre3, Florian Schäfer4 1Microprocessor and Hardware Lab, Technical University of Crete, Chania, Greece 2Department of Electronics, Politecnico di Torino, Torino, Italy 3Centre of Industrial Electronics, Universidad Politécnica de Madrid, Madrid, Spain 4FSResult GmbH, Germany http://www.fastcuda.eu/ • FASTCUDA Design Flow • FASTCUDA Architecture FASTCUDA aims to bridge the gap between GPUs and FPGAs by taking advantage of the software parallelization effort that has gone into porting scientific applications to GPUs, also on FPGA-based platforms. Shared memory, multi-core CPU communicating with an FPGA-based accelerator. • Block Diagram • Toolset (GUI) CUDA to Verilog synthesis • Execution Time vs. Area Matrix multiplication of two 256x256 element matrices on multi-core processor using 32KB scratchpad per core, 64KB shared data cache on HW accelerator • Software Stack • High Level Synthesis Flow The research leading to these results has received funding from the European Union 7th FrameworkProgramme [FP7/2007-2013], under the FASTCUDA project (grant agreement no 286770).

More Related