Just-in-Time Compilation for FPGA Processor Cores




Andrew Becker¹, Scott Sirowy², Frank Vahid

  • Department of Computer Science and Engineering

  • University of California, Riverside

  • {abecker | ssirowy | [email protected]

  • ¹ Now at EPFL; ² Now at ESRI

This work was supported in part by the National Science Foundation (CNS1016792) and by the Semiconductor Research Corporation (GRC 2143.001).



Motivation

  • SystemC useful capture language

    • Concurrency, structure, timing

  • Simulation typical, but in-system I/O often useful

    • Design/synthesis to FPGA may take hours/days and require advanced tools

[Figure: simulation contrasted with in-system I/O (switches/LEDs, cameras/displays)]



Background

  • Want rapid design iteration with in-system I/O

    • Compile design description; avoid design/synthesis

    • Previously: hybrid approach, compiling SystemC to bytecode

SystemC Code (excerpt):

    class CLK_GEN : public sc_module {
        sc_in<bool> clock;
        CLK_GEN(){

Bytecode (excerpt):

    process(clock)
        READ $1 dataRdy
        BGT $1 $0 Start
        J Done
    Start: ADDI $2 $2 1
        ADDI $3 $0 7

[Figure: a compiler translates the SystemC code to bytecode. The existing alternatives are a simulator (no in-system I/O) and design/synthesis (time-consuming).]

Portable SystemC-on-a-chip – Sirowy [CODES+ISSS ’09]



Background

  • Emulate bytecode in engine on FPGA

    • Fast compilation

    • Bytecode also portable (FPGA-device independent)

[Figure: the compiler turns the SystemC code into bytecode, which an emulation engine on the FPGA executes with in-system I/O. The code excerpts are the same as on the previous slide.]

Portable SystemC-on-a-chip – Sirowy [CODES+ISSS ’09]
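
To make "emulating bytecode" concrete, the following is a minimal Python sketch of an interpreter for the four instructions visible in the excerpt (READ, BGT, J, ADDI). The slides do not define the instruction set, so the semantics used here, the treatment of `$0` as a constant zero, the trailing `Done:` label, and the names `parse`, `run`, and `PROGRAM` are all assumptions for illustration. The actual engine is C code running on the soft-core, not Python.

```python
# Sketch only: instruction semantics are assumed, not taken from the slides.
def parse(text):
    """Split bytecode text into (instruction list, label -> index map)."""
    instrs, labels = [], {}
    for line in text.strip().splitlines():
        line = line.strip()
        if ":" in line:                      # "Start: ADDI ..." or a bare "Done:"
            label, line = line.split(":", 1)
            labels[label.strip()] = len(instrs)
            line = line.strip()
        if line:
            instrs.append(line.split())
    return instrs, labels


def run(text, signals, max_steps=1000):
    """Interpret one process activation and return the register contents."""
    instrs, labels = parse(text)
    regs = {}

    def get(r):
        # Assumed: $0 always reads as zero; unset registers read as zero.
        return 0 if r == "$0" else regs.get(r, 0)

    pc = 0
    for _ in range(max_steps):
        if pc >= len(instrs):
            break
        op, *args = instrs[pc]
        pc += 1
        if op == "READ":                     # READ $r sig: load a signal value
            regs[args[0]] = signals[args[1]]
        elif op == "BGT":                    # BGT $a $b L: branch if $a > $b
            if get(args[0]) > get(args[1]):
                pc = labels[args[2]]
        elif op == "J":                      # J L: unconditional jump
            pc = labels[args[0]]
        elif op == "ADDI":                   # ADDI $d $s imm: $d = $s + imm
            regs[args[0]] = get(args[1]) + int(args[2])
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs


PROGRAM = """
READ $1 dataRdy
BGT $1 $0 Start
J Done
Start: ADDI $2 $2 1
ADDI $3 $0 7
Done:
"""
```

Every emulated instruction costs a fetch, a decode, and a dispatch in the interpreter loop, which is the per-instruction overhead the rest of the talk sets out to reduce.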



Emulation Engine

  • Discrete event simulator

    • C code on a processor

      • (Currently Microblaze soft-core; could be hard-core)

  • Support-circuits for architectural features, peripheral I/O

[Figure: emulation engine block diagram. Labels: Processor Core, Event Kernel, Instruction Mem., Read Signal Memory, Write Signal Memory, Peripheral Bus, UART, LEDs, Buttons, Frame Buffer]
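
As a rough illustration of what a discrete event kernel does, here is a Python sketch of an evaluate/update loop in the style SystemC uses for signals: processes run, their signal writes are queued, the queued writes are then applied, and any process sensitive to a changed signal is scheduled again. The class `Kernel`, its methods, and the `doubler` process are hypothetical names. The structure is an assumption about how such a kernel could be organized, not a description of the authors' C implementation.

```python
from collections import deque


class Kernel:
    """Toy evaluate/update loop; names and structure are illustrative assumptions."""

    def __init__(self):
        self.signals = {}          # current signal values
        self.pending = deque()     # signal queue: writes awaiting the update phase
        self.sensitive = {}        # signal name -> processes to wake on change
        self.runnable = deque()    # processes scheduled for the next evaluate phase

    def add_process(self, proc, sensitivity):
        for sig in sensitivity:
            self.sensitive.setdefault(sig, []).append(proc)

    def write(self, sig, value):
        self.pending.append((sig, value))    # deferred until the update phase

    def read(self, sig):
        return self.signals.get(sig, 0)

    def run(self, max_deltas=100):
        """Alternate evaluate and update phases until nothing is left to do."""
        deltas = 0
        while (self.runnable or self.pending) and deltas < max_deltas:
            while self.runnable:             # evaluate phase: run scheduled processes
                self.runnable.popleft()(self)
            changed = []
            while self.pending:              # update phase: apply queued writes
                sig, value = self.pending.popleft()
                if self.signals.get(sig, 0) != value:
                    self.signals[sig] = value
                    changed.append(sig)
            for sig in changed:              # wake processes sensitive to changes
                for proc in self.sensitive.get(sig, []):
                    if proc not in self.runnable:
                        self.runnable.append(proc)
            deltas += 1
        return deltas


def doubler(kernel):
    kernel.write("out", 2 * kernel.read("in"))


k = Kernel()
k.add_process(doubler, ["in"])
k.write("in", 3)
deltas = k.run()
```

In this example the write to `in` wakes `doubler` in the first pass, and its write to `out` is applied in the second, so `out` ends at 6 after two passes of the loop.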



Caveat Emptor

  • Emulation is slow

    • On a soft-core, it is even slower than PC simulation

  • Won't meet many real-time constraints



This work – Speed up emulator

  • First analyzed emulator performance



Low-Hanging Fruit

  • 69% of time spent emulating bytecode

  • Two strategies to reduce

    • Reduce each instruction’s emulation time

    • Reduce instruction memory latency
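
The 69% figure also bounds what faster bytecode execution alone can deliver. The sketch below applies Amdahl's law, which the slides do not name, and assumes that 69% is a fraction of total emulator runtime. The function name `overall_speedup` is hypothetical.

```python
def overall_speedup(fraction, factor):
    """Amdahl's law: overall speedup when `fraction` of runtime runs `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Upper bound if bytecode emulation (69% of runtime) cost nothing at all.
bound = 1.0 / (1.0 - 0.69)
```

Under that assumption the bound is roughly 3.2x, so larger gains need improvements to the remaining 31% of runtime as well.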



First Step

  • Reduce instruction emulation time

    • Optimize event kernel?

    • Just-in-time (JIT) compile bytecode to native processor code, done transparently by event kernel

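[Figure: emulation engine block diagram, repeated from the Emulation Engine slide. Labels: Processor Core, Event Kernel, Instruction Mem., Read Signal Memory, Write Signal Memory, Peripheral Bus, UART, LEDs, Buttons, Frame Buffer]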



Just-in-Time Compilation of Bytecode

  • Implemented SystemC-bytecode to Microblaze JIT compiler

    • 3x speedup; still portable

    • Tunable delay/jitter

    • Still want more speed

[Figure: bytecode enters the emulation engine, where the JIT in the event kernel produces machine code.]

Bytecode:

    process(clock)
        READ $1 dataRdy
        BGT $1 $0 Start
        J Done
    Start: ADDI $2 $2 1
        ADDI $3 $0 7

Machine Code (excerpt):

    IMM 0xDEAD
    LWI $11 $0 0xBEEF
    BGTI $11 Start
    BRAI Done
    Start:
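
The idea on this slide is to translate a process's bytecode once and then run the translated form on every later activation. It can be sketched in Python with closures standing in for native machine code. This is not the authors' compiler and it emits no Microblaze instructions. The instruction semantics, `$0` as a constant zero, the trailing `Done:` label, and the names `jit_compile`, `CACHE`, and `activate` are assumptions for illustration.

```python
# Sketch only: Python closures stand in for emitted native code.
def jit_compile(text):
    """Translate bytecode text into a list of closures (done once per process)."""
    lines, labels = [], {}
    for raw in text.strip().splitlines():
        raw = raw.strip()
        if ":" in raw:                        # record label positions first
            label, raw = raw.split(":", 1)
            labels[label.strip()] = len(lines)
            raw = raw.strip()
        if raw:
            lines.append(raw.split())

    def reg(r):
        # Assumed: $0 is a constant zero; other registers default to 0.
        return (lambda regs: 0) if r == "$0" else (lambda regs: regs.get(r, 0))

    code = []
    for i, (op, *a) in enumerate(lines):
        nxt = i + 1
        # Operands and branch targets are resolved here, at translation time.
        if op == "READ":
            def step(regs, sigs, d=a[0], s=a[1], nxt=nxt):
                regs[d] = sigs[s]
                return nxt
        elif op == "BGT":
            def step(regs, sigs, x=reg(a[0]), y=reg(a[1]), t=labels[a[2]], nxt=nxt):
                return t if x(regs) > y(regs) else nxt
        elif op == "J":
            def step(regs, sigs, t=labels[a[0]]):
                return t
        elif op == "ADDI":
            def step(regs, sigs, d=a[0], x=reg(a[1]), k=int(a[2]), nxt=nxt):
                regs[d] = x(regs) + k
                return nxt
        else:
            raise ValueError(f"unknown opcode {op}")
        code.append(step)
    return code


CACHE = {}   # process name -> translated code


def activate(name, text, sigs):
    """Run one process activation, translating only on first use."""
    if name not in CACHE:
        CACHE[name] = jit_compile(text)
    code, regs, pc = CACHE[name], {}, 0
    while pc < len(code):
        pc = code[pc](regs, sigs)
    return regs


PROGRAM = """
READ $1 dataRdy
BGT $1 $0 Start
J Done
Start: ADDI $2 $2 1
ADDI $3 $0 7
Done:
"""
```

Parsing, opcode dispatch, and label lookup happen once in `jit_compile`, so later activations skip that work. Because `activate` performs the translation itself, the caller sees no difference, which mirrors the "done transparently by event kernel" point.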



Further Improvement

  • Reduce instruction memory latency

    • Add dedicated small, fast memory for JIT code on a fast, local bus

      • Unique JIT possibility due to FPGA configurability



Architecture Changes

[Figure: emulation engine block diagram with the new Local Memory Bus and JIT Mem. Labels: Processor Core, Local Memory Bus, JIT Mem., Instr. Mem., Read Signal Memory, Write Signal Memory, Peripheral Bus, UART, LEDs, Buttons, Frame Buffer]



Even Further Improvement

  • 23% of time spent maintaining signal queue

  • What can be done?

    • Optimize signal queue maintenance code?


Common Denominator

  • FPGA offers configurability

    • Engine designer can make tradeoffs

    • Trade hardware resources for speed

      • Add another soft-core?

[Figure: two FPGA diagrams. Labels: Emulation Engine, Extra Resources]



Even Further Improvement

  • 23% of time spent maintaining signal queue

  • What can be done?

    • Optimize signal queue maintenance code?

    • Offload job to coprocessor

      • Again, unique JIT option due to FPGA configurability



Architecture Changes

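[Figure: emulation engine block diagram with the added Signal Queue and Emulation Memory Controller. Labels: Processor Core, Local Memory Bus, JIT Mem., Instr. Mem., Signal Queue, Emulation Memory Controller, Read Signal Memory, Write Signal Memory, Peripheral Bus, UART, LEDs, Buttons, Frame Buffer]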



Experimental Results



Conclusions

  • Approach: rapid design iteration with in-system I/O

    • Uses

      • Education (typically loose timing constraints)

      • System prototypes that can tolerate real-time slowdown (e.g., slow frame rate)

    • Portable and flexible

      • Engine design sets speed, not compiler or CAD flow

  • This work: 15x speedup via normal JIT (3x) + FPGA-specific JIT (5x)

    • But, still orders of magnitude slower than design/synthesis

    • Future work: Bytecode accelerators, JIT synthesis
