
Energy-Efficient Design of Kernel Applications for FPGAs Through Domain-Specific Modeling

Seonil Choi, Ronald Scrofano, and Viktor K. Prasanna, University of Southern California. MAPLD 2002, September 2002. Funded by the DARPA Power-aware Computing and Communications program.

Presentation Transcript


  1. Energy-Efficient Design of Kernel Applications for FPGAs Through Domain-Specific Modeling • Seonil Choi, Ronald Scrofano, and Viktor K. Prasanna • University of Southern California • MAPLD 2002, September 2002 • Funded by the DARPA Power-aware Computing and Communications program

  2. Outline • Motivation • Design Methodology • Example Matrix Multiplication Designs • Results • MILAN

  3. FPGAs: Current Trends • Large FPGAs (40M+ gates) • Embedded multipliers, processors • Military and commercial systems using FPGAs • Digital Signal Processing: matrix operations, FFT, window operations, filtering • Image processing • Internet • Performance metrics • Energy, Latency, and Area

  4. Mapping Kernel Applications onto FPGAs • FPGAs lack a fixed structure comparable to that of general-purpose processors • Too fine-grained to model at a high level • Very large design space • Many degrees of freedom • Cannot simulate all designs at a low level • Energy-efficient designs • Analyze efficiency early in design cycle • Analyze effect of algorithm changes • Consider energy efficiency vs. area and latency • Energy consumed by configurable logic blocks and routing • Look-up tables, flip-flops, registers, RAM • Various length interconnects

  5. Design Methodology: Overview • Use domain-specific modeling • Explore the design space at a high level • Verify chosen designs at a low level • Select a set of designs • Flow (diagram): Kernel Application → 1. Domain Selection → 2. Domain-Specific Modeling → 3. Tradeoff Analysis and Manual Design Space Exploration → 4. Low-Level Simulation of Candidate Designs → Energy-Efficient Design

  6. Domain • A family of architectures and algorithms for a given kernel application • E.g., matrix multiplication on a linear array • Fixes architecture of FPGA • FPGA too fine-grained to model at a high level • No fixed structure comparable to that of a general-purpose processor • Difficult to model at a high level • Domain imposes high-level structure • Facilitates high-level modeling and high-level performance analysis • Diagram labels: Architecture, FPGA

  7. 1. Domain Selection • Choose domains by analyzing algorithms and architectures for a given kernel • Tradeoffs in Energy, Area, Latency • Diagram: Kernel → Various Architecture Families → Domain 1, Domain 2, ..., Domain n; for each domain: Domain-Specific Modeling → System-wide Energy Function → Design Space Exploration, Optimizations

  8. 2. Domain-Specific Modeling (1) • High-level model • Model parameters are specific to the domain • Identify only those parameters that make a significant impact on energy consumption • Others need not be studied • Design is abstracted to allow easier (but coarse) tradeoff analysis and design space exploration • Benefit: Rapid evaluation of architectures and algorithms without low-level simulation • Identify candidate designs that meet requirements • Diagram layers: Domain-Specific Model (parameterized), Domain (fixed architecture), FPGA (flexible architecture)

  9. Domain-Specific Modeling (2) • Diagram: Domain → Components (RModules, Interconnects) → Component-specific parameters (n, pe, f, sa) → Function Estimation → Component-specific power function • Component-specific power functions and component power state matrices → System-wide energy function • System-wide energy function applied to a specific design in the domain → System-wide energy
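The composition named on slide 9 (per-component power functions combined with power states into a system-wide energy function) can be sketched as below. This is only an illustration under stated assumptions: the `Component` class, the `system_energy_nj` function, the fraction-of-time schedule standing in for the power state matrices, and all numeric values are hypothetical and do not come from the presentation or from MILAN.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One domain component (an RModule or an interconnect)."""
    name: str
    power_mw: dict  # power state -> power in mW (hypothetical values)


def system_energy_nj(components, schedule, cycles, freq_mhz):
    """Sum, over all components and power states, power * time in that state.

    schedule[name][state] is the fraction of the run that the component
    spends in that state (a simplified stand-in for a power state matrix).
    """
    cycle_ns = 1e3 / freq_mhz  # one clock period in nanoseconds
    total_nj = 0.0
    for comp in components:
        for state, fraction in schedule[comp.name].items():
            time_ns = fraction * cycles * cycle_ns
            # mW * ns = pJ; multiply by 1e-3 to convert pJ to nJ
            total_nj += comp.power_mw[state] * time_ns * 1e-3
    return total_nj


# Hypothetical design: one multiply-accumulate unit that is active half the time.
mac = Component("mac", {"on": 10.0, "off": 1.0})
energy = system_energy_nj([mac], {"mac": {"on": 0.5, "off": 0.5}},
                          cycles=100, freq_mhz=100.0)
print(f"{energy:.2f} nJ")
```

Keeping the per-state power values separate from the schedule means the same component data can be reused across designs in the domain, with only the schedule and cycle count changing.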

  10. Estimation of Power Functions (3) • Using sample implementations • VHDL coding, simulation, measuring power using XPower • Estimation method • Generate random input vectors for estimation • Repeat experiments for statistical significance • Diagram labels: architecture and parameters (with ranges) of a component; VHDL code for sample designs; MILAN model interpreters; low-level simulators (XPower, ModelSim, …); power estimates; power function builder (curve fitting, …); component-specific power function
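The "power function builder (curve fitting …)" step can be illustrated with an ordinary least-squares line fit. This is a sketch under assumptions: the slide does not say which functional form or fitting method was used, so the linear model, the function names `fit_line` and `predict`, and the sample measurements are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Evaluate a fitted (slope, intercept) power function at x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical measurements: component precision (bits) vs. measured power (mW).
precision_bits = [4, 8, 16, 32]
power_mw = [9.0, 17.0, 33.0, 65.0]

model = fit_line(precision_bits, power_mw)
print(predict(model, 24))  # estimated power for a precision that was not simulated
```

Once fitted, such a function replaces a low-level simulation run for each new parameter value, which is the reuse the slide alludes to.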

  11. Low-Level Simulation of Components (4) • Accurate power estimates for RModules and Interconnects • Randomly generated test input waveforms • Switching activity is a consideration • Results can be reused • Tool flow (diagram): Component VHDL file → Xilinx XST synthesis → netlist → Xilinx Place & Route → .ncd file; .ncd VHDL + waveforms → ModelSim → .vcd file; .ncd file + .vcd file → XPower → power

  12. 3. Tradeoff Analysis and Manual Design Space Exploration • Vary model parameters to see the effect on performance. • Analyze tradeoffs • Weed out designs that are not promising
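One simple way to "weed out designs that are not promising" is to discard any design that another design beats or ties on every metric. The slide does not name a criterion, so the Pareto-dominance filter below is a substituted technique offered as a sketch; the function names and the (energy, area, latency) numbers are hypothetical.

```python
def dominates(a, b):
    """True if metrics a are no worse than b everywhere and better somewhere.

    Metrics are tuples where lower is better, e.g. (energy, area, latency).
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(designs):
    """Keep only designs that no other design dominates."""
    return {
        name: metrics
        for name, metrics in designs.items()
        if not any(dominates(other, metrics)
                   for other_name, other in designs.items()
                   if other_name != name)
    }


# Hypothetical high-level estimates: (energy in nJ, area in slices, latency in cycles).
candidates = {
    "uniprocessor": (120.0, 200, 90),
    "linear_array_3pe": (80.0, 450, 30),
    "linear_array_3pe_wide": (95.0, 500, 30),  # worse than linear_array_3pe on energy and area
}

print(sorted(pareto_front(candidates)))
```

The surviving designs still represent real tradeoffs (for example, low area against low latency), so a designer would choose among them manually, as the slide title suggests.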

  13. 4. Low-Level Simulation of Candidate Designs • Verify high-level estimation of energy and area for a design • Select the best design within the range of the estimation error among candidate designs • Similar to low-level simulation of components • Tool flow (diagram): Candidate designs (VHDL file) → Xilinx XST synthesis → netlist → Xilinx Place & Route → .ncd file; .ncd VHDL + waveforms → ModelSim → .vcd file; .ncd file + .vcd file → XPower → energy

  14. Example Problem: Matrix Multiplication • Multiply two n × n matrices as energy-efficiently as possible • No hard area or latency constraints • Area and latency considered, but no specific constraints • Why matrix multiplication? • Fundamental to many applications in DSP • LU Decomposition • CFAR detection requires matrix-vector multiplication

  15. Optimized Design from Xilinx [xilinx.com] • Provides baseline for comparison • 3 × 3 block matrix multiplication • Low area

  16. Design 1: Uniprocessor Architecture • Same area as Xilinx design • Block matrix multiplication • Single processing element (PE) • Model parameters: cache size, precision, power states • Diagram labels: FPGA, PE, matrix entries, cache, MAC
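Block matrix multiplication on a single multiply-accumulate (MAC) unit can be sketched in software as below. This is a generic textbook formulation, not the authors' hardware design: the function names, the choice of block size, and the test matrices are assumptions made for illustration.

```python
def block_matmul(a, b, block):
    """Multiply square matrices a and b one block at a time.

    Each (i0, j0, k0) iteration touches only block-sized tiles, which is
    what lets a small cache hold the working set in a hardware design.
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for i0 in range(0, n, block):
        for j0 in range(0, n, block):
            for k0 in range(0, n, block):
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, n)):
                        acc = c[i][j]
                        for k in range(k0, min(k0 + block, n)):
                            acc += a[i][k] * b[k][j]  # one MAC operation
                        c[i][j] = acc
    return c


def naive_matmul(a, b):
    """Reference triple-loop product used to check the blocked version."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [[1, 0, 0, 1], [0, 1, 1, 0], [2, 0, 1, 0], [0, 3, 0, 1]]
print(block_matmul(a, b, block=3) == naive_matmul(a, b))
```

The `min(..., n)` bounds let the block size be any positive integer, including one that does not divide the matrix dimension evenly.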

  17. Design 2: Linear Array Architecture • Low latency • Array of processing elements (PEs) • Model parameters: number of PEs, precision, power states

  18. High-Level Comparisons (1)

  19. High-Level Comparisons (2)

  20. Experimental Procedure for Low-Level Simulations • Code designs in VHDL • Synthesize • Place and route • Simulate with test input waveforms • Measure power dissipation with XPower

  21. Low-Level Simulation for Statistical Analysis of Energy Dissipation • Dependency of energy on input data switching activity • Simulation for statistical significance • 50 randomly generated sets of input matrices • Comparison with Xilinx design for 3 × 3 matrix multiplication • Confidence intervals give range around experimental value in which, with given confidence level, true value lies • With 95% confidence, our design consumes 32% less energy compared to the Xilinx design • Chart labels: Linear Array, Xilinx, Difference
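A confidence interval of the kind described on slide 21 can be computed from the per-run energy differences between two designs. The sketch below uses a normal approximation to the sampling distribution of the mean; the presentation does not state which interval method was used, and the function name and sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(samples, confidence=0.95):
    """Return (low, high) bounds for the true mean of the samples.

    Uses the normal approximation, which is reasonable for sample counts
    around 50; a Student's t quantile would be slightly wider.
    """
    n = len(samples)
    center = mean(samples)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * stdev(samples) / sqrt(n)
    return center - half_width, center + half_width


# Hypothetical per-run energy differences (baseline minus new design), in nJ.
differences = [1.0, 2.0, 3.0, 4.0, 5.0]
low, high = mean_confidence_interval(differences)
print(f"95% CI for mean difference: [{low:.3f}, {high:.3f}]")
```

If the whole interval lies above zero, the data support the claim that the new design uses less energy at the chosen confidence level.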

  22. Accuracy of the High-Level Model • Xilinx ISE 4.1i and XPower are used to measure the system-wide energy • Xilinx Virtex-II XC2V1500 device is used

  23. MILAN Objectives • MILAN is a model-based, extensible simulation framework • It provides a unified environment capable of: • modeling a large class of embedded systems and applications • driving design space exploration tools for rapid evaluation of a large design space • seamlessly integrating different widely-used simulators into a single framework for hierarchical simulation • enabling rapid evaluation of different performance metrics such as energy, latency, and throughput

  24. The MILAN Architecture

  25. Design Flow Using MILAN • Diagram labels: Application (Task Graph); Hardware Resources; Application Model; Resource Model; Constraints; Generic Modeling Environment (GME 2000); Design Space; Design Space Exploration (analytical technique); Offline Estimates; High-level Perf. Estimator; Identify a set of designs; Hierarchical Simulation (Instruction Level Simulator, Cycle Accurate Simulator, RT-level Simulator); Final Design • Axes: Accuracy, Level of abstraction

  26. Concluding Remarks • Design methodology based on domain-specific modeling • High-level energy estimation • High-level tradeoff analysis • Matrix multiplication example • Improved energy-efficiency compared to Xilinx design (baseline) • Further References • “Energy-Efficient Matrix Multiplication on FPGAs” (FPL 2002) • “Energy Efficiency of FPGAs and Programmable Processors for Matrix Multiplication” (Manuscript)
