Portable SystemC-on-a-Chip

Scott Sirowy, Bailey Miller, and Frank Vahid†. Department of Computer Science and Engineering, University of California, Riverside. {ssirowy, bmiller, vahid}@cs.ucr.edu. † Also with the Center for Embedded Computer Systems at UC Irvine.


Presentation Transcript


  1. Portable SystemC-on-a-Chip
     Scott Sirowy, Bailey Miller, and Frank Vahid†
     Department of Computer Science and Engineering, University of California, Riverside
     {ssirowy, bmiller, vahid}@cs.ucr.edu
     † Also with the Center for Embedded Computer Systems at UC Irvine
     This work was supported in part by the National Science Foundation and the Office of Naval Research.

  2. Introduction: Prototyping Circuits and Systems
     Task: Create a custom ASIC/FPGA circuit to detect edges in an image.
     [Diagram: Edge Detector and Memory Controller blocks linked by go, address, and data signals; a datapath with inputs s1 to s4 and s6 to s9, adders, subtractors, the constant 255, and a MIN unit producing Pixel Value.]

  3. Introduction: Prototyping Circuits and Systems
     Capture in HDL:
         -- VHDL/Verilog File
         entity Edge_Detector is
           port (
             clk  : in std_logic;
             rst  : in std_logic;
             data : in std_logic_vec …
           );
         …
     [Diagram: edge detector datapath and memory controller, as on slide 2.]

  4. Introduction: Prototyping Circuits and Systems
     • SystemC
       • C++ based
       • Creation, instantiation, and connection of components
       • Precisely timed communication and execution among concurrently executing components
       • Supports both “software” and “hardware” constructs and semantics
     Capture in HDL:
         class EDGE_DETECTOR : public sc_module {
           // signal declarations
           …
           EDGE_DETECTOR() {
             SC_METHOD(mainComp);
             sensitive << dataReady;
             SC_METHOD(getPixel);
             sensitive << clock.pos();
           …
     [Diagram: edge detector datapath and memory controller, as on slide 2.]

  5. Introduction: Prototyping Circuits and Systems
     • Simulation
       • Requires environment modeling
       • Sometimes hard!
       • Does not interact with real I/O
     [Diagram: edge detector as on slide 2, captured in HDL with the EDGE_DETECTOR SystemC class from slide 4, then "Simulation on Desktop PC".]

  6. Introduction: Prototyping Circuits and Systems
     • Implementation
       • Mapping to microprocessor / coprocessor system
       • Interfacing issues
       • Synthesis issues
       • Size constraints
     [Diagram: edge detector as on slide 2, captured in HDL with the EDGE_DETECTOR SystemC class from slide 4, then "Mapping & Synthesis".]

  7. Introduction: Prototyping Circuits and Systems
     • In-System Emulation
       • Quickly obtained simulation interaction with real I/O
       • Prior to time-consuming mapping and synthesis
       • But slower
     [Diagram: edge detector as on slide 2, captured in HDL with the EDGE_DETECTOR SystemC class from slide 4, then "Emulation".]

  8. In-System Emulation of SystemC
     • How?
       • Port publicly available SystemC libraries to target platforms
       • SystemC executable has built-in event kernel
       • Libraries are large and require OS support
     [Diagram: a SystemC Description targeting two Processor platforms and an FPGA.]

  9. Modern portability approach
     • Java, C#
     • Virtual Machine (VM): program that executes bytecode
       • May JIT compile to native architecture
     [Diagram: Java, C# → Compiler → Bytecode → three VMs running on Opteron, Pentium, and Atom.]

  10. SystemC Bytecode?
      [Diagram: SystemC → Compiler → SystemC Bytecode → three VMs running on Opteron + FPGA, Pentium, and FPGA.]

  11. Portable SystemC-on-a-Chip
      Task: Create a custom circuit to detect edges in an image.
      SystemC bytecode can run on any platform that supports the SystemC emulation engine, without the need for recompilation or synthesis.
      [Diagram: SystemC Description → SystemC Bytecode Compiler → SystemC Bytecode → Emulation Engine on several targets (Processor; Processor + FPGA), with Emulation Accelerators in the FPGA.]

  12. SystemC Bytecode Compiler
      • Pinapa Front End (Moy, EMSOFT’05)
        • Extracts architectural features and behavior of each process
        • Uses modified versions of GCC and the SystemC kernel
      • Bytecode Back End
        • Flattens original SystemC circuit
        • Generates SystemC bytecode that preserves architecture and behavioral information
        • Output is a human-readable text file
      Input (SystemC Description):
          class EDGE_DETECTOR : public sc_module {
            // signal declarations
            …
            EDGE_DETECTOR() {
              SC_METHOD(mainComp);
              sensitive << dataReady;
              SC_METHOD(getPixel);
              sensitive << clock.pos();
            }
      [Diagram: SystemC Description → Pinapa Front End (AST, Link, ELAB) → Bytecode Back End (Register Allocation, Code Generation) → SystemC Bytecode.]

  13. SystemC Bytecode
      • Sequential Instructions
        • Based on the RISC MIPS instruction set
        • Efficient emulation (Davis 2003)
      • Spatial Instructions
        • Includes meta instructions for defining architectural features, bit-width-specific computations, and reading and writing signals
      SystemC Bytecode:
          --header
          signal clock : 1
          signal reset : 1
          signal memory_in : 32
          signal fb_data : 32
          signal leds : 4
          process(clock)
          READ $1 memory_in
          ADD $2 $0 3
          ADD $3 $2 $1
          WRITE $3 s1
          ADDI $1 $0 1
          WRITE $1 dataReady
          END
          process(dataReady)
          READ $5 val6
          SW $5 24($0)
          READ $5 val7
          …
          ADDI $10 $0 0
          ADDI $7 $0 0
          ADDI $13 $0 8
          …
          END
      [Listing annotations: "Spatial Constructs" and "MIPS-like sequential instructions".]
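
To make the split between spatial and sequential instructions concrete, here is a minimal Python sketch of an interpreter for a tiny subset of the bytecode (signal declarations, process blocks, READ, WRITE, ADD, ADDI). The instruction semantics are assumptions inferred from the listing, not taken from the authors' emulation engine: READ copies a signal into a register, WRITE copies a register into a signal masked to the declared bit width, and $0 always reads as zero. The sample program is illustrative and is not the original edge detector.

```python
# Illustrative program in the style of the slide's listing (hypothetical content).
PROGRAM = """
--header
signal clock : 1
signal memory_in : 32
signal s1 : 32
signal dataReady : 1
process(clock)
READ $1 memory_in
ADDI $2 $0 3
ADD $3 $2 $1
WRITE $3 s1
ADDI $1 $0 1
WRITE $1 dataReady
END
"""


def parse(text):
    """Split bytecode text into signal widths and per-process instruction lists."""
    signals, processes, current = {}, [], None
    for line in text.splitlines():
        tok = line.split()
        if not tok or tok[0] == "--header":
            continue
        if tok[0] == "signal":                 # signal <name> : <width>
            signals[tok[1]] = int(tok[3])
        elif tok[0].startswith("process("):    # process(<sensitivity signal>)
            current = {"sensitive": tok[0][len("process("):-1], "code": []}
            processes.append(current)
        elif tok[0] == "END":
            current = None
        else:
            current["code"].append(tok)
    return signals, processes


def reg(name):
    """Convert a register token such as '$3' to its index."""
    return int(name.lstrip("$"))


def run_process(proc, signals, values, regs):
    """Execute one process body once (assumed semantics, see lead-in)."""
    for op, *args in proc["code"]:
        if op == "READ":       # spatial: signal -> register
            regs[reg(args[0])] = values[args[1]]
        elif op == "WRITE":    # spatial: register -> signal, masked to its width
            mask = (1 << signals[args[1]]) - 1
            values[args[1]] = regs[reg(args[0])] & mask
        elif op == "ADDI":     # sequential, MIPS-like
            regs[reg(args[0])] = regs[reg(args[1])] + int(args[2])
        elif op == "ADD":      # sequential, MIPS-like
            regs[reg(args[0])] = regs[reg(args[1])] + regs[reg(args[2])]
        else:
            raise ValueError(f"unsupported instruction: {op}")
        regs[0] = 0            # $0 is hardwired to zero


signals, procs = parse(PROGRAM)
values = {name: 0 for name in signals}
values["memory_in"] = 39
regs = [0] * 32
run_process(procs[0], signals, values, regs)
print(values["s1"], values["dataReady"])
```

The sketch runs a single process once; deciding when a process runs is the job of the event kernel, which this fragment deliberately leaves out.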

  14. SystemC Emulation Engine
      • Must support a basic SystemC interface
        • Clock
        • Reset
        • 16 I/O pins
        • 8KB Input Memory
        • 8KB Output Memory
        • UART
      • Platforms with more advanced I/O might support more features
        • Increased Memory
        • Extended General Purpose I/O
      [Diagram: SystemC Circuit block with ports Clock, Reset, Input I/O, Output I/O, UART Rx, UART Tx, Input Mem Addr, Input Mem Data, Input Mem Stream, Output Mem Addr, Output Mem Data.]
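
One way to read the basic interface is as a minimum capability set that a platform must meet before it can host the emulation engine. The following Python sketch encodes the listed minimums and checks a platform description against them. The `Interface` type, the `supports` function, and the example platforms are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    """Capabilities exposed to an emulated SystemC circuit (hypothetical model)."""
    io_pins: int
    input_mem_bytes: int
    output_mem_bytes: int
    has_uart: bool
    has_clock: bool = True
    has_reset: bool = True


# Minimums taken from the slide: 16 I/O pins, 8KB input and output memory, UART.
BASIC = Interface(io_pins=16, input_mem_bytes=8 * 1024,
                  output_mem_bytes=8 * 1024, has_uart=True)


def supports(platform, required=BASIC):
    """Return True if the platform offers at least the required interface."""
    return (platform.has_clock >= required.has_clock
            and platform.has_reset >= required.has_reset
            and platform.has_uart >= required.has_uart
            and platform.io_pins >= required.io_pins
            and platform.input_mem_bytes >= required.input_mem_bytes
            and platform.output_mem_bytes >= required.output_mem_bytes)


# Hypothetical platforms, not figures from the presentation.
big_board = Interface(io_pins=32, input_mem_bytes=64 * 1024,
                      output_mem_bytes=64 * 1024, has_uart=True)
tiny_board = Interface(io_pins=8, input_mem_bytes=4 * 1024,
                       output_mem_bytes=4 * 1024, has_uart=False)

print(supports(big_board), supports(tiny_board))
```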

  15. SystemC Emulation Engine
      • Real I/O Peripherals
        • Representative of many systems
      • Emulation Engine Kernel
        • Virtual Machine
        • Discrete Event Kernel
        • Peripheral Access and Hooks
      • Optional USB Download Interface
      [Diagram: Emulation Engine with Main Processor, Input Memory, Output Memory, Instruction Memory, UART, Read Signal Memory, Write Signal Memory, Buttons, LEDs, and USB Interface, grouped into "Emulation Engine Kernel and Support Peripherals" and "I/O Peripherals".]
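
The discrete event kernel is the part that decides which process runs when a signal changes. The Python sketch below shows the general evaluate/update ("delta cycle") scheme used by SystemC-style simulators; it is a generic illustration under stated assumptions, not the authors' kernel. The `Kernel` class and the two example processes are hypothetical, and for brevity a process fires on any value change rather than on a specific clock edge.

```python
class Kernel:
    """Minimal evaluate/update event kernel (illustrative, not the authors' design)."""

    def __init__(self):
        self.values = {}     # current signal values, visible to read()
        self.pending = {}    # writes that take effect in the next update phase
        self.sensitive = {}  # signal name -> processes triggered by a change

    def add_signal(self, name, value=0):
        self.values[name] = value
        self.sensitive.setdefault(name, [])

    def add_process(self, func, signal):
        self.sensitive[signal].append(func)

    def read(self, name):
        return self.values[name]

    def write(self, name, value):
        self.pending[name] = value

    def run(self, max_deltas=100):
        """Run delta cycles until no writes remain; return the number of cycles."""
        deltas = 0
        while self.pending:
            if deltas == max_deltas:
                raise RuntimeError("signals did not settle")
            # Update phase: commit pending writes and note which signals changed.
            changed = [n for n, v in self.pending.items() if self.values[n] != v]
            self.values.update(self.pending)
            self.pending = {}
            # Evaluate phase: run each process sensitive to a changed signal once.
            runnable = []
            for name in changed:
                for proc in self.sensitive[name]:
                    if proc not in runnable:
                        runnable.append(proc)
            for proc in runnable:
                proc(self)
            deltas += 1
        return deltas


def get_pixel(k):   # hypothetical process, sensitive to clock
    k.write("s1", k.read("memory_in") + 3)
    k.write("dataReady", 1)


def main_comp(k):   # hypothetical process, sensitive to dataReady
    k.write("leds", k.read("s1") & 0xF)


k = Kernel()
for name in ("clock", "s1", "dataReady", "leds"):
    k.add_signal(name)
k.add_signal("memory_in", 39)
k.add_process(get_pixel, "clock")
k.add_process(main_comp, "dataReady")

k.write("clock", 1)   # external stimulus: the clock changes value
deltas = k.run()
print(k.values["leds"], deltas)
```

Because writes only become visible in the update phase, `main_comp` sees the value of `s1` that `get_pixel` wrote in the previous delta cycle, which is what keeps concurrently executing processes deterministic.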

  16. Emulation Engine Acceleration
      • For some SystemC applications, emulation can be slow
        • An Edge Detection circuit required ~10 minutes to process a 320x240 image *
      * On a 100 MHz, SRAM-based MicroBlaze SystemC Emulation Engine implementation
      [Diagram: Emulation Engine (Main Processor, Input Memory, Output Memory, Instruction Memory, UART, Read Signal Memory, Write Signal Memory, Buttons, LEDs, USB Interface) executing SystemC bytecode.]
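
A quick back-of-the-envelope calculation shows what the quoted figure implies per pixel. The Python snippet below uses only the numbers on the slide (about 10 minutes, 320x240 pixels, 100 MHz); since the run time is approximate, the results are order-of-magnitude estimates, and the per-pixel cycle count includes all emulation overhead, not just arithmetic.

```python
CLOCK_HZ = 100_000_000       # 100 MHz MicroBlaze, from the slide
RUN_SECONDS = 10 * 60        # "~10 minutes", treated as exactly 600 s
WIDTH, HEIGHT = 320, 240     # image size, from the slide

pixels = WIDTH * HEIGHT                      # pixels per image
total_cycles = CLOCK_HZ * RUN_SECONDS        # processor cycles for the whole run
cycles_per_pixel = total_cycles / pixels     # average emulation cost per pixel
pixels_per_second = pixels / RUN_SECONDS     # emulated throughput

print(f"{pixels} pixels, about {cycles_per_pixel:,.0f} cycles per pixel, "
      f"{pixels_per_second:.0f} pixels per second")
```

Hundreds of thousands of processor cycles per pixel is the gap that the bytecode accelerators on the following slides aim to close.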

  17. Emulation Engine Acceleration
      • For some SystemC applications, emulation can be slow
        • An Edge Detection circuit required ~10 minutes to process a 320x240 image *
      • If available, use platform FPGA to create bytecode accelerators
        • Execute SystemC bytecode natively
      FPGA accelerators speed up emulation.
      * On a 100 MHz MicroBlaze SystemC Emulation Engine implementation
      [Diagram: Emulation Engine as on slide 16, with Accelerator 1, Accelerator 2, and Accelerator 3 added.]

  18. SystemC Bytecode Accelerators
      • MIPS-like multicycle RISC datapath
      • Communicates with the core emulator via memory-mapped registers
      • Number of accelerators limited to the number of masters allowed on the bus
      [Diagram: Accelerator containing Register File, RISC Datapath, Local Mem, and bus, start, and load logic; the Emulation Engine as on slide 16 with Accelerator 1, Accelerator 2, and Accelerator 3 in the FPGA.]
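
The memory-mapped handshake between the core emulator and an accelerator can be sketched as a software model. In the Python sketch below, the register layout (register file at word offsets from 0x00, START at 0x80, DONE at 0x84), the `Accelerator` class, and the `offload` helper are all assumptions made for illustration; the presentation does not give the actual address map or protocol.

```python
class Accelerator:
    """Software model of a bytecode accelerator behind memory-mapped registers.

    The address map is hypothetical: 32 registers at word offsets 0x00 to 0x7C,
    a START control register at 0x80, and a DONE status register at 0x84.
    """
    START, DONE = 0x80, 0x84
    MASK = 0xFFFFFFFF

    def __init__(self):
        self.regs = [0] * 32
        self.local_mem = []   # bytecode loaded by the core emulator
        self.done = 0

    def load(self, program):
        self.local_mem = list(program)

    def write(self, offset, value):
        if offset == self.START:
            if value & 1:
                self._run()
        elif 0 <= offset < self.START and offset % 4 == 0:
            if offset != 0:   # $0 stays hardwired to zero
                self.regs[offset // 4] = value & self.MASK
        else:
            raise ValueError(f"bad write offset {offset:#x}")

    def read(self, offset):
        if offset == self.DONE:
            return self.done
        if 0 <= offset < self.START and offset % 4 == 0:
            return self.regs[offset // 4]
        raise ValueError(f"bad read offset {offset:#x}")

    def _run(self):
        """Execute the loaded program to completion (two opcodes only)."""
        self.done = 0
        for op, rd, rs, x in self.local_mem:
            if op == "ADD":
                result = self.regs[rs] + self.regs[x]
            elif op == "ADDI":
                result = self.regs[rs] + x
            else:
                raise ValueError(f"unsupported instruction: {op}")
            if rd != 0:
                self.regs[rd] = result & self.MASK
        self.done = 1


def offload(acc, inputs, result_reg):
    """Core-emulator side: write operands, start, poll DONE, read the result."""
    for index, value in inputs.items():
        acc.write(4 * index, value)
    acc.write(Accelerator.START, 1)
    while acc.read(Accelerator.DONE) != 1:
        pass   # a real engine would poll over the bus here
    return acc.read(4 * result_reg)


acc = Accelerator()
acc.load([("ADDI", 2, 0, 3), ("ADD", 3, 2, 1)])   # $2 = 3; $3 = $2 + $1
result = offload(acc, {1: 39}, result_reg=3)
print(result)
```

In this model the program runs instantly inside the START write, so the polling loop exits at once; on hardware the multicycle datapath would take many bus clock cycles and the poll would matter.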

  19. SystemC-on-a-Chip Implementation
      [Table, flattened in extraction. Values are listed per row in the order they appear in the transcript; the pairing of values with platforms cannot be recovered from the text.]
      • Platform: Virtex5 VLX110T *, Virtex4 ML403, Xilinx Spartan 3E
      • Main Processor: PowerPC (50 MHz), MicroBlaze (100 MHz), MicroBlaze (50 MHz)
      • Bus Platform: PLB, PLB, OPB
      • Main Memory: SRAM+BRAM, SRAM, BRAM
      • # Emulation Accelerators: >3, 1-2, 0-1
      * Currently building
      * Demo

  20. SystemC-on-a-Chip Implementation
      • SystemC Bytecode Compiler
        • 3,500 lines of code + Pinapa (20,000 lines)
      • SystemC Emulation Engine
        • 3,000 lines of C + 8,000 lines of VHDL
      [Diagram: SystemC Bytecode Compiler (Pinapa, AST, Link, ELAB, Back End) and the Emulation Engine as on slide 16 with Accelerator 1, Accelerator 2, and Accelerator 3 in the FPGA.]

  21. SystemC-on-a-Chip Implementation
      • SystemC Bytecode Accelerator
        • 2,000 lines of VHDL
        • Area: ~3000 slices
        • Clock frequency: 50-100 MHz
      [Diagram: Accelerator (Register File, RISC Datapath, Local Mem, bus/start/load logic) and the Emulation Engine as on slide 16 with Accelerator 1, Accelerator 2, and Accelerator 3 in the FPGA.]

  22. SystemC-on-a-Chip Experiments
      Competitive with SystemC PC simulation, but with the benefits of real I/O.
      [Chart: Execution Time, normalized to SystemC running on a 2.8 GHz Intel Xeon. Series: Base Emulation on Virtex 4; Base Emulation on Virtex 5; Emulation + Accelerators (Virtex 4); Emulation + Accelerators (Virtex 5).]

  23. Conclusions
      • Introduced SystemC Bytecode as a means to emulate SystemC for prototyping
      • For platforms with FPGA resources, introduced bytecode accelerators to speed up SystemC performance
        • Outperforms emulation by over 100X
      • As proof of concept, built 3 test platforms and tested multiple SystemC circuits without having to recompile or synthesize
      • Future Directions
        • Emulation architecture improvements
        • Synthesizing SystemC just-in-time?
