1 / 18

Reconfigurable Computing (EN2911X, Fall07) Lecture 08: RC Principles: Software (1/4)

Reconfigurable Computing (EN2911X, Fall07) Lecture 08: RC Principles: Software (1/4). Prof. Sherief Reda Division of Engineering, Brown University http://ic.engin.brown.edu. Past lectures

toviel
Download Presentation

Reconfigurable Computing (EN2911X, Fall07) Lecture 08: RC Principles: Software (1/4)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Reconfigurable Computing (EN2911X, Fall07) Lecture 08: RC Principles: Software (1/4) Prof. Sherief Reda Division of Engineering, Brown University http://ic.engin.brown.edu

  2. Past lectures Understood the principles of the hardware part of reconfigurable computing: programmable logic technology. Learned how to program reconfigurable fabrics using hardware definition languages (Verilog). Next lectures Understand the principles of the software part (which we have partly used) of reconfigurable computing. Learn how to program reconfigurable fabrics using system software languages (SystemC). Summary of current status

  3. Reconfigurable computing design flow System Specification partitioning SW HW compiling compile Verilog synthesis link so far we only experienced this portion mapping executable image place & route configuration file download to board

  4. Use High-Level Languages (HLLs) (C, C++, Java, MATLAB). Advantages: Since systems consist of both SW and HW, then we can describe the entire system with the same specification Fast to code, debug and verify the system is working Disadvantages: No concurrent support No notion of time (clock or delay) Different communication model than HW (uses signals) Missing data types (e.g., bit vectors, logic values) How can we overcome these disadvantages? System specification

  5. Augment the HLL (e.g. C++) with a new library that support additional hardware-like functionality (e.g. SystemC) Unified language across all stages of platform design Fast simulation There are already lots of tools for C++ → we will come to this part later in details Enable compilers to optimize code and extract concurrency from sequential code to map into FPGAs [from G. De Micheli] Using HLL for hardware/software specification

  6. Given a system specification, decompose or partition the specification into tasks (functional objects) and label each task as HW or SW such that the system cost / performance is optimized and all the constraints on resources / cost are satisfied. The exact performance depends on the computational model in hand Given the same application, a system with an FPGA on a slow bus results in a model with different performance parameter than a system with a FPGA as a coprocessor. Hardware-Software partitioning

  7. HW/SW partitioning SW int main() { …. .. } SW task model task task SW task task HW HW • Good partitioning criteria: • Minimize communication (traffic) between HW and SW and on the bus • Maximize concurrency (reduce stalling) where both the HW and SW run in parallel • Maximizes the utilization of the HW resources • → Minimize total execution runtime

  8. Profiling is a key step in HW/SW partitioning • Determining the candidate HW partitions by first profiling the specification tasks taking into account typical data sets • Given a candidate SW/HW partition • Estimate HW implementation • Determine the system performance and speedup over software • How can we generate candidate SW/HW partitions?

  9. task task task task task HW tasks SW tasks Execution time local optimal global optimal moves HW/SW partitioning algorithms Total size is constrained by number and size of available FPGA(s) • Kernighan/Lin – Fidducia/Mattheyses algorithm • Start with all task vertices free to swap/move (unlocked) • Label each possible swap/move with immediate change in execution time that it causes (gain) • Iteratively select and execute a swap/move with highest gain (whether positive or negative); lock the moving vertex (i.e., cannot move again during the pass), • Best solution seen during the pass is adopted as starting solution for next pass

  10. Rather than partition from the high-level description, it is possible to compile the program as SW and then partition the resultant executable binary into SW and HW parts. Advantages: No need to worry about which language is being used Can be used to develop dynamic runtime partitioners and synthesizers Main steps: Decompilation of binary to recover high-level information Partitioning and synthesis Binary updating to account for the SW parts that migrated to HW Low-level partitioning from software binaries

  11. Reconfigurable configurable has the ability to execute multiple operations in parallel through spatial distribution of the computing resources When compiling a SW-based sequential language like (C) into a concurrent language like Verilog, it is necessary to either Manually instruct the compiler to incorporate parallelism either through special instructions or compiler directives Automatically through the compiler How can the compiler automatically extract parallelism? Compilation

  12. Data-flow graphs (DFG) • A data-flow graph (DFG) is a graph which represents a data dependencies between a number of operations. • Dependencies arise from a various reasons • An input to an operation can be the output of another operation • Serialization constraints, e.g., loading data on a bus and then raising a flag • Sharing of resources • A dataflow graph represents operations and data dependencies • Vertex set is one-to-one mapping with tasks • A directed edge is in correspondence with the transfer of data from an operation to another one a b + c

  13. Consider the following example [Giovanni’94] Design a circuit to numerically solve the following differential equation in the interval [0, a] with step-size dx read (x, y, u, dx, a); do { xl = x + dx; ul = u – (3*x*u*dx) – (3*y*dx); yl = y + u*dx; c = xl < a; x = x1; u = u; y = yl; } while (c); write(y);

  14. Data-flow graph example xl = x + dx; ul = u – (3*x*u*dx) – (3*y*dx); yl = y + u*dx; c = xl < a; 3 x u dx 3 y u dx x dx + * * * * a y dx xl + < * * c u yl - - u1

  15. NOP Detecting concurrency from DFGs Extended DFG where vertices can represent links to link graph DFGs in a hierarchy of graphs + * * * * + < * * * - - NOP Paths in the graph represent concurrent streams of operations

  16. Control-flow information (branching and iteration) can be also represented graphically Data-flow graphs can be extended by introducing branching vertices that represent operations that evaluate conditional clauses Iteration can be modeled as a branch based on the iteration exit condition Vertices can also represent model calls Control / data-flow graphs (CDFG)

  17. NOP NOP NOP NOP NOP NOP + + BR CDFG example x = a * b; y = x * c; z = a + b; if (z ≥ 0) { p = m + n; q = m * n; } * * *

  18. Parallelism extraction and optimization from DFG Next lecture

More Related