
Compiled code acceleration on FPGAs


Presentation Transcript


  1. Compiled code acceleration on FPGAs
  W. Najjar, B. Buyukkurt, Z. Guo, J. Villareal, J. Cortes, A. Mitra
  Computer Science & Engineering, University of California, Riverside

  2. Why? Are FPGAs a new HPC platform?
  Comparison of a dual-core Opteron (2.5 GHz) to Virtex-4 & Virtex-5 FPGAs on double-precision floating point:
  • Balanced allocation of adders, multipliers and registers
  • Use both DSP blocks and logic for multipliers, run at lower speed
  • Logic & wires reserved for I/O interfaces
  Source: David Strensky, "FPGAs Floating-Point Performance -- a pencil and paper evaluation," in HPCwire.com
  Future of Computing - W. Najjar

  3. ROCCC: Riverside Optimizing Compiler for Configurable Computing
  • Code acceleration
    • By mapping circuits to the FPGA
    • Achieves the same speed as hand-written VHDL code
  • Improved productivity
    • Allows design and algorithm space exploration
    • Keeps the user fully in control
    • We automate only what is very well understood

  4. Challenges
  • An FPGA is an amorphous mass of logic
    • Structure is provided by the code being accelerated
    • Repeatedly applied to a large data set: streams
  • Languages reflect the von Neumann execution model:
    • Highly structured and sequential (control-driven)
    • Vast, randomly accessible, uniform memory

  5. ROCCC Overview
  [Figure: compilation flow -- C/C++, Java, SystemC and binary front ends; high-level transformations (procedure, loop and array optimizations) produce Hi-CIRRF; low-level transformations (instruction scheduling, pipelining and storage optimizations) produce Lo-CIRRF; code generation emits VHDL for FPGAs and custom units, alongside CPU, DSP and GPU targets. CIRRF: Compiler Intermediate Representation for Reconfigurable Fabrics.]
  • Limitations on the code:
    • No recursion
    • No pointers

  6. A Decoupled Execution Model
  [Figure: input memory (on- or off-chip) → memory fetch unit → input buffer → multiple loop bodies, unrolled and pipelined → output buffer → memory store unit → output memory (on- or off-chip).]
  • Memory access decoupled from the datapath
  • Parallel loop iterations
  • Pipelined datapath
  • Smart buffer (input) performs data reuse
  • Memory fetch and store units and the datapath are configured by the compiler
  • Off-chip accesses are platform specific

  7. So far, a working compiler with …
  • Extensive optimizations and transformations
    • Traditional and FPGA-specific
    • Systolic arrays, pipelined unrolling, look-up tables
  • Compiler + hardware support for data reuse
    • > 98% reduction in memory fetches on image codes
  • Efficient code generation and pipelining
    • Within 10% of hand-optimized HDL code
  • Import of existing IP cores
    • Leverages a huge wealth of cores, integrated with C source code
  • Support for dynamic partial reconfiguration

  8. Example: 3-tap FIR

  #define N 516
  void begin_hw();
  void end_hw();

  int main() {
    int i;
    const int T[3] = {3, 5, 7};  /* coefficients applied to indices of A[] */
    int A[N], B[N];
    begin_hw();
    L1: for (i = 0; i <= (N-3); i = i + 1) {
      B[i] = T[0]*A[i] + T[1]*A[i+1] + T[2]*A[i+2];
    }
    end_hw();
  }

  Future of Computing - W. Najjar

  9. RC Platform Models
  [Figure: three reconfigurable-computing platform models, numbered 1-3, differing in how CPUs and FPGAs are coupled -- via the CPU's memory interface, via shared SRAM alongside the CPUs, or over a fast network with per-node memory and SRAM.]
  Future of Computing - W. Najjar

  10. What we have learned so far
  • Big speedups are possible
    • 10x to 1,000x on application codes over Xeon and Itanium: molecular dynamics, bioinformatics, etc.
    • Works best with streaming data
  • New paradigms and tools are needed
    • For spatio-temporal concurrency
    • Algorithms, languages, compilers, run-time systems, etc.

  11. Future? Very wide use of FPGAs
  • Why?
    • High throughput (> 10x) AND low power (< 25%)
  • How?
    • Mostly in Models 2 and 3, initially
    • Model 2: see Intel QuickAssist, XtremeData & DRC
    • Model 3: SGI, SRC & Cray
  • Contingency
    • Market brings the price of FPGAs down
    • Availability of some software stack
      • For savvy programmers, initially
  • Potential
    • Multiple “killer apps” (to be discovered)

  12. Conclusion
  We as a research community should be ready. Stamatis was.
  Thank you.
