Evaluating and Improving an OpenMP-based Circuit Design Tool

Tim Beatty, Dr. Ken Kent, Dr. Eric Aubanel
Faculty of Computer Science, University of New Brunswick


Presentation Transcript


  1. FPGAs
     • A field-programmable gate array (FPGA) is a programmable logic device that can be configured to implement any logical function
     • FPGAs are made up of configurable logic blocks (CLBs) and programmable interconnects
     • FPGAs are programmed with a schematic or a hardware description language (HDL) design
     [Figures: CLB architecture; CLB pin layout. Images: http://en.wikipedia.org/wiki/fpga]

     Design Flow
     • FPGAs and application-specific integrated circuits (ASICs) are designed according to the HDL hardware design flow
     • Traditional HDLs include VHDL and Verilog

     The Handel-C Language
     • Handel-C is a behavioral HDL by Celoxica
     • It is made up of:
         • A subset of ANSI-C language elements
         • Extensions for concurrency
         • A set of variable-width primitive types
         • A set of architectural types such as interfaces and RAMs
     • Each assignment statement takes 1 clock cycle
     • Example 8-bit multiplier in Handel-C:

         set clock = external;

         void main (void)
         {
             int 8 result;
             interface bus_in (int 8 a, int 8 b) input ();
             interface bus_out () output (int 8 data_out = result);

             result = input.a * input.b;
         }

     Shared Memory and OpenMP
     • A shared memory system has multiple processing cores with access to a common, shared memory
     • Shared memory can be accessed by each processor simultaneously
     • Communication and synchronization are achieved through shared variables
     • OpenMP is an API for shared memory parallel programming in C/C++
     • Parallelism is specified explicitly through a set of pragma directives
     • Run-time library functions control environment settings such as the number of threads
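The shared-memory model described above can be illustrated with a small OpenMP loop in plain C. This is only a minimal sketch and is not taken from the poster: the function name `sum_of_squares` is hypothetical, and the example assumes a compiler with OpenMP enabled (without it, the pragma is ignored and the loop simply runs sequentially with the same result).

```c
/* Hypothetical helper (not from the poster): computes 1^2 + 2^2 + ... + n^2.
   The pragma asks OpenMP to divide the loop iterations among threads.
   Each thread accumulates a private partial sum, and the reduction
   clause combines the partial sums into the shared variable `total`
   when the loop ends. */
long sum_of_squares(int n)
{
    long total = 0;
    int i;

    #pragma omp parallel for reduction(+:total)
    for (i = 1; i <= n; i++) {
        total += (long)i * i;
    }
    return total;
}
```

For example, `sum_of_squares(10)` returns 385 regardless of how many threads run the loop, because the reduction clause removes the race on `total` that a plain shared update would have.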

  2. Representing a C program
     • Source code is parsed and represented as an abstract syntax tree (AST)
     • Example source program and AST representation [AST figure not reproduced]:

         int is_even (int x)
         {
             if (x % 2 == 0)
                 return 1;
             else
                 return 0;
         }

     OpenMP-Handel-C Translator
     • Leow, Ng, and Wong created the OpenMP-Handel-C translator [1]
     • It is based on C-Breeze, a C compiler infrastructure
     • Their modifications include:
         • Addition of new abstract syntax tree nodes for OpenMP pragmas
         • Addition of the OpenMP grammar to the GNU Flex/Bison-based parser
         • Modifications to C-Breeze’s built-in C-to-C translator enabling C-to-Handel-C translation based on a set of porting rules
     • The OpenMP abstract syntax tree nodes generate Handel-C code that implements the supported OpenMP directives
     • Data types supported for translation are: int, char, and long

     Translator Limitations
     • No OpenMP run-time library functions
     • Number of threads is fixed at compile time
     • Nested parallelism is not supported
     • Parallel reduction variables must be 32-bit integers
     • All variables of type int map to 32-bit registers, which may use more resources than necessary

     [1] Leow, Y.Y.; Ng, C.Y.; Wong, W.F. Generating Hardware from OpenMP Programs. IEEE International Conference on Field-Programmable Technology 2006 (FPT 2006), 73-80.
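The abstract syntax tree idea from "Representing a C program" can be sketched in a few lines of C. The node kinds and names below (`NodeKind`, `Node`, `eval`, `is_even_via_ast`) are hypothetical and chosen for illustration only; they are not C-Breeze's actual node classes. The sketch builds the tree for the condition `x % 2 == 0` from `is_even` and walks it to compute the same answer.

```c
#include <stddef.h>

/* Hypothetical node kinds for a tiny expression AST (illustrative only,
   not the node types used by C-Breeze or the translator). */
typedef enum { NODE_CONST, NODE_VAR, NODE_MOD, NODE_EQ } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;                 /* used by NODE_CONST only */
    const struct Node *left;   /* operands of NODE_MOD and NODE_EQ */
    const struct Node *right;
} Node;

/* Evaluate the tree for a given value of the single variable x. */
int eval(const Node *n, int x)
{
    switch (n->kind) {
    case NODE_CONST: return n->value;
    case NODE_VAR:   return x;
    case NODE_MOD:   return eval(n->left, x) % eval(n->right, x);
    case NODE_EQ:    return eval(n->left, x) == eval(n->right, x);
    }
    return 0;
}

/* Tree for the condition (x % 2 == 0), written in prefix form:
   EQ( MOD( VAR x, CONST 2 ), CONST 0 ) */
static const Node var_x   = { NODE_VAR,   0, NULL, NULL };
static const Node const_2 = { NODE_CONST, 2, NULL, NULL };
static const Node const_0 = { NODE_CONST, 0, NULL, NULL };
static const Node mod_x_2 = { NODE_MOD,   0, &var_x, &const_2 };
static const Node cond    = { NODE_EQ,    0, &mod_x_2, &const_0 };

/* Same result as is_even, computed by walking the tree. */
int is_even_via_ast(int x)
{
    return eval(&cond, x) ? 1 : 0;
}
```

A translator works on a tree of this kind rather than on source text: adding support for a new construct, such as an OpenMP pragma, amounts to adding a node kind and a rule for what code that node emits.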
     Benchmark Methodology
     • An initial set of tests has been developed:
         • A Mandelbrot set generator
         • Miller-Rabin primality test
         • Systolic sequence alignment
     • The translated OpenMP programs are compiled to VHDL in Celoxica’s DK 5.0, and the VHDL is then synthesized into hardware using Xilinx’s ISE 9.1
     • Resource usage and performance data are recorded

     Variable Bit Width
     • Better control over resource usage should lead to better performance
     • A new compiler directive was implemented to allow variable bit width
     • Register widths are automatically adjusted when translating expressions whose widths don’t match
     • Example C program fragment with bit width annotations:

         #pragma handelc width 8
         int x;
         #pragma handelc function return 8 params (8, 16)
         int my_function (int param1, int param2);

     • Translated Handel-C program fragment:

         int 8 x;
         inline int 8 my_function (int 8 param1, int 16 param2);

     Preliminary Results
     • The Mandelbrot set was generated with a resolution of 640x480 pixels
     • Varying bit width settings were used for program variables
     • The resulting resource usage and performance data were collected
     • The 48-bit version ran out of hardware resources beyond 6 threads
     • Resource usage and execution time decreased with narrower bit widths

     Future Work
     • Complete the remaining benchmark tests
     • Implement OpenMP library functions such as omp_get_thread_num()
     • Study the feasibility of a tool that determines the optimal number of threads
     • Integrate the improved translator with other tools being developed by the Reconfigurable Computing Research Group
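To see why the choice of bit width matters for correctness as well as for resource usage, the effect of a narrow register can be emulated in ordinary C. This is a rough software sketch, not the translator's implementation: the helpers `wrap_to_width` and `mul_at_width` are hypothetical, and the sketch assumes that an n-bit signed register stores values in two's complement and silently discards the upper bits.

```c
#include <stdint.h>

/* Hypothetical helper: keep only the low `width` bits of `value` and
   interpret them as a signed two's-complement number, which is what an
   n-bit signed register is assumed to hold. Requires 1 <= width <= 32. */
int32_t wrap_to_width(int32_t value, unsigned width)
{
    uint32_t mask = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    uint32_t bits = (uint32_t)value & mask;
    uint32_t sign = 1u << (width - 1);

    /* Sign-extend: (bits XOR sign) - sign turns a set top bit into a
       negative value and leaves other values unchanged. */
    return (int32_t)((bits ^ sign) - sign);
}

/* Hypothetical helper: multiply two values and store the product in a
   register that is `width` bits wide. */
int32_t mul_at_width(int32_t a, int32_t b, unsigned width)
{
    int64_t product = (int64_t)a * (int64_t)b;
    return wrap_to_width((int32_t)(uint32_t)(uint64_t)product, width);
}
```

Under these assumptions, `mul_at_width(12, 12, 16)` gives 144, while `mul_at_width(12, 12, 8)` gives -112 because 144 does not fit in a signed 8-bit register. A width annotation therefore trades hardware resources against the range of values a variable can hold, so widths have to be chosen to cover the values the program actually produces.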
