
JIT FPGA Ideas

JIT FPGA Ideas. Frank Vahid Dept. of CS&E University of California, Riverside Associate Director, Center for Embedded Computer Systems, UC Irvine. Contributing Ph.D. Students Roman Lysecky (Ph.D. 2005, now Asst. Prof. at Univ. of Arizona


Presentation Transcript


  1. JIT FPGA Ideas
     Frank Vahid, Dept. of CS&E, University of California, Riverside; Associate Director, Center for Embedded Computer Systems, UC Irvine
     Contributing Ph.D. students:
     • Roman Lysecky (Ph.D. 2005, now Asst. Prof. at Univ. of Arizona)
     • Greg Stitt (Ph.D. 2007, now Asst. Prof. at Univ. of Florida, Gainesville)
     • Scotty Sirowy (current)
     • David Sheldon (current)
     • Chen Huang (current)
     This research was supported in part by the National Science Foundation, the Semiconductor Research Corporation, Intel, Freescale, IBM, and Xilinx.

  2. SystemC Bytecode for FPGAs • Demo

  3. FPGA Common Presence
     • Caches, FPUs, GPUs, FPGAs
     • App developers may expect FPGA presence
     • How to create/distribute apps that make good use of an FPGA if present?
     [Figure labels: Binary, µP, Cache, FPU, GPU, FPGA]

  4. “Spatial” Algorithms for FPGAs
     • Example: count patterns
     • Sequential algorithm
       • Hash table
       • 10s of cycles per pattern
     • Spatial algorithm
       • Pipelined stages
     • Essence is the connectivity of components, not the sequencing of instructions
     Sequential version:
         int patterns[1000];
         int counts[1000];
         while (1) {
             WaitForPattern();
             CurrPattern = X;
             hash = HashFct(CurrPattern);
             item = Find(patterns, CurrPattern, hash);
             if (item) {
                 counts[item]++;
             }
         }
     [Figure: CurrPattern arrives over the bus and passes through "count pattern logic" Level 1, Level 2, ..., Level m]
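The spatial version on this slide can be imitated in software to see why its throughput does not depend on the number of patterns. The following is a minimal C++ sketch, not the authors' design: `CountStage`, `PatternPipeline`, and the one-register-per-level layout are assumptions made for illustration. Each level compares the pattern passing through it with its own stored pattern, and every `clock()` call shifts all in-flight patterns down by one level.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One pipeline level (hypothetical layout): a stored pattern, a private
// counter, and a register holding the pattern currently passing through.
struct CountStage {
    int pattern;       // pattern this level recognizes
    int count;         // matches seen so far
    bool valid;        // true when the register holds a real pattern
    int in_flight;     // pattern currently in this level's register
};

class PatternPipeline {
public:
    explicit PatternPipeline(const std::vector<int>& patterns) {
        for (std::size_t i = 0; i < patterns.size(); ++i) {
            CountStage s = {patterns[i], 0, false, 0};
            stages_.push_back(s);
        }
    }

    // One clock cycle. Levels are updated last-to-first so that each
    // in-flight pattern advances exactly one level, as registers would.
    void clock(bool has_input, int new_pattern) {
        for (std::size_t i = stages_.size(); i-- > 0;) {
            if (i == 0) {
                stages_[i].valid = has_input;
                stages_[i].in_flight = new_pattern;
            } else {
                stages_[i].valid = stages_[i - 1].valid;
                stages_[i].in_flight = stages_[i - 1].in_flight;
            }
            if (stages_[i].valid && stages_[i].in_flight == stages_[i].pattern) {
                ++stages_[i].count;
            }
        }
    }

    int count(std::size_t level) const {
        assert(level < stages_.size());
        return stages_[level].count;
    }

    std::size_t depth() const { return stages_.size(); }

private:
    std::vector<CountStage> stages_;
};
```

A caller feeds one pattern per `clock(true, x)` call and then issues `depth()` calls of `clock(false, 0)` to drain the pipeline. Latency grows with the number of levels, but one new pattern is accepted every cycle, in contrast to the tens of cycles per pattern of the hash-table loop.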

  5. Bytecode
     • Modern portability approach
     • Java, C#
     • Virtual Machine (VM): program that executes bytecode; may JIT compile to the native architecture
     [Figure: Compiler → bytecode → VMs on Pentium, Opteron, Atom]

  6. SystemC Bytecode?
     [Figure: SystemC → Compiler → SystemC bytecode → VMs on Opteron + FPGA, Pentium, FPGA]

  7. UCR SystemC Bytecode and Compiler
     SystemC source, input to UCR’s SystemC-to-bytecode compiler:
         class EDGE_DETECTOR : public sc_module {
             // signal declarations
             …
             EDGE_DETECTOR() {
                 SC_METHOD(mainComp);
                 sensitive << dataReady;
                 SC_METHOD(getPixel);
                 sensitive << clock.pos();
             }
             void getPixel() {
                 …
                 dataReady.write(1);
             }
             void mainComp() {
                 int i, j;
                 for (i = 0; i < 3; i++) {
                     for (j = 0; j < 3; j++) {
                         sumX = sumX + mem.read() * GX[i][j];
                     }
                 }
                 …
                 edge.write(sumX + sumY);
             }
         };
     UCR’s SystemC bytecode, output of the compiler:
         --header
         signal clock : 1
         signal reset : 1
         signal memory_in : 32
         signal fb_data : 32
         signal leds : 4
         process(clock)
             READ $1 memory_in
             ADD $2 $0 3
             ADD $3 $2 $1
             WRITE $3 s1
             ADDI $1 $0 1
             WRITE $1 dataReady
         END
         process(dataReady)
             READ $5 val6
             SW $5 24($0)
             READ $5 val7
             …
             ADDI $10 $0 0
             ADDI $7 $0 0
             ADDI $13 $0 8
             …
         END
     Annotations on the bytecode: spatial constructs; MIPS-like sequential instructions.
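To make the bytecode listing more concrete, here is a small C++ sketch of how a virtual machine might execute one process body. The slide does not define the instruction semantics, so everything here is an assumption: `READ` copies a signal into a register, `WRITE` copies a register to a signal, `ADD` adds two registers, `ADDI` adds a register and a constant, and `$0` always reads as zero (as in MIPS). The names `Instr`, `Signals`, and `runProcess` are hypothetical, and `SW` and the process sensitivity lists are left out.

```cpp
#include <array>
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum class Op { READ, ADD, ADDI, WRITE };

// One decoded instruction (hypothetical encoding, not UCR's format).
struct Instr {
    Op op;
    int r1;              // destination register, or source register for WRITE
    int r2;              // first source register (ADD, ADDI)
    int r3;              // second source register (ADD)
    int imm;             // immediate operand (ADDI)
    std::string signal;  // signal name (READ, WRITE)
};

using Signals = std::map<std::string, int>;

// Execute a process body once, top to bottom, against the signal store.
void runProcess(const std::vector<Instr>& body, Signals& signals) {
    std::array<int, 32> reg{};  // $0..$31, all zero at process start
    for (const Instr& in : body) {
        assert(in.r1 >= 0 && in.r1 < 32);
        assert(in.r2 >= 0 && in.r2 < 32);
        assert(in.r3 >= 0 && in.r3 < 32);
        switch (in.op) {
        case Op::READ:  reg[in.r1] = signals[in.signal];       break;
        case Op::ADD:   reg[in.r1] = reg[in.r2] + reg[in.r3];  break;
        case Op::ADDI:  reg[in.r1] = reg[in.r2] + in.imm;      break;
        case Op::WRITE: signals[in.signal] = reg[in.r1];       break;
        }
        reg[0] = 0;  // assumed: $0 is hard-wired to zero
    }
}

// A program modeled on the slide's process(clock) body. The slide writes
// "ADD $2 $0 3"; this sketch uses ADDI for the constant, which is a guess.
std::vector<Instr> clockProcess() {
    return {
        {Op::READ,  1, 0, 0, 0, "memory_in"},
        {Op::ADDI,  2, 0, 0, 3, ""},
        {Op::ADD,   3, 2, 1, 0, ""},
        {Op::WRITE, 3, 0, 0, 0, "s1"},
        {Op::ADDI,  1, 0, 0, 1, ""},
        {Op::WRITE, 1, 0, 0, 0, "dataReady"},
    };
}
```

Under these assumed semantics, running `clockProcess()` with `memory_in` set to 10 leaves 13 in `s1` and 1 in `dataReady`. A full emulator would also need a scheduler that reruns each process when a signal in its sensitivity list changes.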

  8. SystemC Bytecode Emulator
     • Bytecode uploadable via USB drive
     • FPGA accelerators speed up emulation
     [Figure labels: Emulator (Main Processor, Instruction Memory, Input Memory, Output Memory, Read Signal Memory, Write Signal Memory, UART, USB Interface, Buttons, LEDs); SystemC bytecode]

  9. SystemC Bytecode Accelerators
     • Implementation
       • MIPS-like multicycle RISC datapath
       • 100 MHz clock
       • ~33 million instr/sec
       • Communicates with the core emulator through memory-mapped registers
       • Area: ~5000 slices
       • # of accelerators limited to # of masters allowed on bus
       • ~1200 lines of VHDL
     [Figure labels: Emulator (Main Processor, Instruction Memory, Input Memory, Output Memory, Read Signal Memory, Write Signal Memory, UART, USB Interface, Buttons, LEDs); Accelerator (Register File; Bus, start, load logic; RISC Datapath; Local Mem); Accelerator 1, Accelerator 2, Accelerator 3 on the FPGA; SystemC bytecode]
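The clock rate and instruction rate quoted on this slide imply the average number of cycles per instruction, a figure the slide does not state. As a quick check, assuming both numbers describe a single accelerator:

```latex
\[
\text{cycles per instruction} \approx
\frac{100 \times 10^{6}\ \text{cycles/s}}{33 \times 10^{6}\ \text{instr/s}}
\approx 3
\]
```

About three cycles per instruction is consistent with the multicycle datapath named in the first bullet.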

  10. Dynamic SystemC Accelerator Management
     • Only a limited number of SystemC accelerators can fit on an FPGA fabric
     • Dynamically map processes to accelerators based on process usage
     • Involves online algorithms
     [Figure: image filter example; emulator (Main Processor, Instruction Memory, Input Memory, Output Memory, Read Signal Memory, Write Signal Memory, UART, USB Interface, Buttons, LEDs) with SystemC bytecode processes mapped onto Accelerator 1, Accelerator 2, Accelerator 3 on the FPGA]
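The slide names the problem (more processes than accelerators, decided online) but not a specific policy. One simple possibility, sketched below in C++, is to count how often each process runs and evict the least-used resident process once a non-resident process has been used more. This is an illustrative policy only: `AcceleratorManager`, `execute`, and the integer process ids are hypothetical, and the cost of reloading an accelerator is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>

class AcceleratorManager {
public:
    explicit AcceleratorManager(std::size_t slots) : slots_(slots) {
        assert(slots > 0);
    }

    // Record one execution of `process`. Returns true if the process was
    // already mapped to an accelerator when it ran, false if it ran in the
    // emulator. The mapping may change as a side effect.
    bool execute(int process) {
        int uses = ++uses_[process];
        if (resident_.count(process) != 0) {
            return true;
        }
        if (resident_.size() < slots_) {
            resident_.insert(process);  // free accelerator: map for next time
            return false;
        }
        // Find the least-used process currently holding an accelerator.
        int victim = *resident_.begin();
        for (int r : resident_) {
            if (uses_[r] < uses_[victim]) {
                victim = r;
            }
        }
        // Swap only when the newcomer has strictly more recorded uses.
        if (uses > uses_[victim]) {
            resident_.erase(victim);
            resident_.insert(process);
        }
        return false;
    }

    bool isResident(int process) const {
        return resident_.count(process) != 0;
    }

private:
    std::size_t slots_;          // number of accelerators on the FPGA
    std::map<int, int> uses_;    // execution count per process id
    std::set<int> resident_;     // process ids currently on accelerators
};
```

With three accelerators, as in the figure, the manager would be built as `AcceleratorManager(3)` and `execute` called from the emulator's scheduler each time a process fires. A practical policy would probably also age the counters so that the mapping follows changes in behavior.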

  11. Just-in-Time Synthesis
     • Send SystemC bytecode to a synthesis server
     • Server returns an FPGA-specific bitstream
     • Dynamically reconfigure some or all of the FPGA
     • Possible to even perform synthesis on-chip: “warp processing” (previous UCR work)
     [Figure labels: Emulator (Main Processor, Instruction Memory, Input Memory, Output Memory, Read Signal Memory, Write Signal Memory, UART, Buttons, LEDs); Accelerator 1, Accelerator 2, Accelerator 3 on the FPGA; SystemC bytecode]

  12. Transmuting Coprocessors • Demo

  13. FPGA is a Size-Limited Coprocessing Resource
     • App executions change
     • Must decide which coprocessors should be FPGA-resident at a given time: transmuting coprocessors
     [Figure labels: Speedup with previous apps; Upload app profile info; Select coproc. set, generate new FPGA bitstream; FPGA implements coprocessors; Send back new bitstream, re-program FPGA]

  14. Transmuting Coprocessor Demo
     • Three image filters, each in a small (S) and a large (L) version:
       • Blur filter (S/L): blur the image
       • Sobel filter (S/L): find the edges in the image
       • Emboss filter (S/L): emboss the image
     • Platform:
       • Virtex 2P (XC2VP30): PPC + coprocessors
       • PPC frequency: 100 MHz
       • Coproc. frequency: 50 MHz
     [Figure labels: 30x, 120x]

  15. Demo architecture
     • Image (128*128 pixels, 24-bit color): 24 BRAMs
     • Soft version: Read (Image BRAM) → Execution (PPC) → Write (Display BRAM)
     • Coprocessor version: Read (Image BRAM) → Execution (Coproc) → Write (Display BRAM)
     • Dock: send the profile information through UART
     [Figure labels: UART, Push button, Image BRAM, PLB, PPC, Peripherals, Coproc, Interface to external, Instruction BRAM, Display BRAM, VGA control, VGA display; EDK and ISE regions]
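The figure of 24 BRAMs follows from the image size, under the assumption that each Virtex-II Pro block RAM contributes 16 Kbit of data storage (its 18 Kbit capacity minus parity bits). The slide does not spell this out, so the following is a plausibility check rather than the authors' stated reasoning:

```latex
\[
128 \times 128 \times 24\ \text{bits} = 393{,}216\ \text{bits},
\qquad
\frac{393{,}216\ \text{bits}}{16{,}384\ \text{bits per BRAM}} = 24\ \text{BRAMs}
\]
```

One layout that would give this count is a 16K x 1 block RAM per color bit, so that each BRAM holds one bit plane of the 16,384 pixels. The slide does not say which layout was used.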

  16. Coprocessor configurations
     • Microprocessor only
     • Small blur + small sobel
     • Small blur + small emboss
     • Small sobel + small emboss
     • Large blur
     • Large sobel
     • Large emboss
     • Choose the configuration according to app profile info.
     [Figure: Virtex2P with PPC, Peripherals, Memory, and a coprocessor region holding one of: Blur (S) + Sobel (S), Blur (S) + Emboss (S), Sobel (S) + Emboss (S), Blur (L), Sobel (L), Emboss (L)]
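Choosing a configuration from profile information can be sketched as a small search over the seven options listed above. The C++ below is an illustration under stated assumptions, not the demo's actual selection code. The profile is taken to be the software-only time spent in each filter, each configuration is described by a speedup per filter, and the configuration with the lowest estimated total time wins. The demo slide shows the labels 30x and 120x without saying what they measure, so the sketch uses them as placeholder speedups for the small and large coprocessors. `Config`, `estimatedTime`, `chooseConfig`, and `demoConfigs` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum Filter { BLUR = 0, SOBEL = 1, EMBOSS = 2, NUM_FILTERS = 3 };

using Profile = std::array<double, NUM_FILTERS>;  // software-only time per filter

struct Config {
    std::string name;
    std::array<double, NUM_FILTERS> speedup;  // 1.0 means the filter runs on the PPC
};

Config makeConfig(const std::string& name, double blur, double sobel, double emboss) {
    Config c;
    c.name = name;
    c.speedup[BLUR] = blur;
    c.speedup[SOBEL] = sobel;
    c.speedup[EMBOSS] = emboss;
    return c;
}

// The seven configurations from the slide, with placeholder speedups.
std::vector<Config> demoConfigs() {
    const double S = 30.0;   // assumed speedup of a small coprocessor
    const double L = 120.0;  // assumed speedup of a large coprocessor
    std::vector<Config> v;
    v.push_back(makeConfig("Microprocessor only", 1, 1, 1));
    v.push_back(makeConfig("Small blur + small sobel", S, S, 1));
    v.push_back(makeConfig("Small blur + small emboss", S, 1, S));
    v.push_back(makeConfig("Small sobel + small emboss", 1, S, S));
    v.push_back(makeConfig("Large blur", L, 1, 1));
    v.push_back(makeConfig("Large sobel", 1, L, 1));
    v.push_back(makeConfig("Large emboss", 1, 1, L));
    return v;
}

// Estimated run time of the profiled workload under one configuration.
double estimatedTime(const Config& c, const Profile& profile) {
    double total = 0.0;
    for (std::size_t f = 0; f < NUM_FILTERS; ++f) {
        assert(c.speedup[f] >= 1.0);
        total += profile[f] / c.speedup[f];
    }
    return total;
}

// Index of the configuration with the lowest estimate (first one on ties).
std::size_t chooseConfig(const std::vector<Config>& configs, const Profile& profile) {
    assert(!configs.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < configs.size(); ++i) {
        if (estimatedTime(configs[i], profile) < estimatedTime(configs[best], profile)) {
            best = i;
        }
    }
    return best;
}
```

With these placeholder numbers, a workload dominated by one filter favors that filter's large coprocessor, while a workload split between two filters favors the matching pair of small coprocessors. The estimate ignores the time needed to generate and load a new bitstream, which a real selector would have to weigh against the expected gain.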
