Application-Specific Customization of Parameterized FPGA Soft-Core Processors

Authors: David Sheldon, Rakesh Kumar, Roman Lysecky, Frank Vahid, and Dean Tullsen.

Presentation Transcript


  1. Application-Specific Customization of Parameterized FPGA Soft-Core Processors
     David Sheldon (a), Rakesh Kumar (b), Roman Lysecky (c), Frank Vahid (a, *), Dean Tullsen (b)
     (a) Department of Computer Science and Engineering, University of California, Riverside
     (*) Also with the Center for Embedded Computer Systems at UC Irvine
     (b) Department of Computer Science and Engineering, University of California, San Diego
     (c) Department of Electrical and Computer Engineering, University of Arizona
     This work was supported in part by the National Science Foundation, the Semiconductor Research Corporation, and by hardware and software donations from Xilinx.

  2. FPGA Soft Core Processors
     • Soft-core processor
     • HDL description
     • Flexible implementation
     • FPGA or ASIC
     • Technology independent
     [Figure labels: HDL Description; FPGA (Spartan 3, Virtex 2, Virtex 4); ASIC]
     David Sheldon, UC Riverside

  3. FPGA Soft Core Processors
     • Soft-core processors can have configurable options
       • Datapath units
       • Cache
       • Bus architecture
     • Current commercial FPGA soft-core processors
       • Xilinx MicroBlaze
       • Altera Nios
     [Figure labels: FPGA containing μP, FPU, MAC, Cache]

  4. Goal
     • Goal: Tune an FPGA soft-core microprocessor for a given application
     [Figure labels: App; μP; Parameter Values; Synthesis; Configured μP on FPGA; time; size]

  5. MicroBlaze: Xilinx FPGA Soft-Core
     • Instantiatable units around the base MicroBlaze: Barrel Shifter, Divider, Multiplier, FPU, Cache
     • Instantiating all units does not necessarily give the fastest processor, because added units can lengthen the critical path
     • Significant tradeoffs

  6. Problem
     • Need fast exploration
     • Synthesis runs can take an hour
     • This talk
       • Two approaches
         • Approach 1: Using Traditional CAD Techniques
         • Approach 2: Synthesis-in-the-Loop
       • Results
     [Figure labels: Parameter Values; μP; Exploration; Synthesis (~20-60 min); Configured μP]

  7. Constraints on Configurations
     • Size constraints may prevent use of all possible units
     [Figure labels: MicroBlaze with Multiplier, Barrel Shifter, FPU, Divider, Cache; Max Area]

  8. Approach 1: Traditional CAD Techniques
     • Create a model of the problem
     • Solve model with extensive search heuristics
     • We will model this problem as a 0-1 knapsack problem
     [Figure labels: Create model (slow, includes synthesis); Model; Exploration (fast, considers 1000s of configurations); MicroBlaze with FPU, Multiplier, Cache; Max Area]

  9. Approach 1: Traditional CAD Techniques
     Creating the model
     [Figure labels: App; Base MicroBlaze; Synthesis of the base plus each unit (Barrel Shifter, FPU, Multiplier, Divider, Cache), each yielding perf and size]

                      BS    FPU   MUL   DIV   CACHE
     Perf increment   1.1   0.9   1.2   1.0   1.3
     Size increment   1.4   2.7   1.8   1.1   1.6
     Perf/Size        0.96  0.34  0.63  0.93  0.80

  10. Approach 1: Traditional CAD Techniques
      • 0-1 knapsack model
        • Object's benefit = unit's performance increment / size increment
        • Object's weight = unit's size
        • Knapsack's size constraint = FPGA size constraint

                       BS    FPU   MUL   DIV   CACHE
      Perf increment   1.1   0.9   1.2   1.0   1.3
      Size increment   1.4   2.7   1.8   1.1   1.6
      Perf/Size        0.96  0.34  0.63  0.93  0.80
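
The mapping from measurements to knapsack objects can be sketched as follows. This is a minimal illustration, not the authors' tool: `UnitModel` and `build_model` are hypothetical names, the definitions of the two increments (speedup over the base core, size relative to the base core) are assumptions, and the measurement numbers are made up for the example rather than taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class UnitModel:
    name: str
    perf_increment: float  # assumed: base run time / run time with the unit
    size_increment: float  # assumed: size with the unit / base size

    @property
    def benefit(self) -> float:
        # Slide 10: benefit = performance increment / size increment
        return self.perf_increment / self.size_increment


def build_model(base, with_unit):
    """Turn (time, size) measurements into one UnitModel per unit.

    base      -- (time, size) of the base core
    with_unit -- dict: unit name -> (time, size) of base core plus that unit
    """
    base_time, base_size = base
    return [
        UnitModel(name, base_time / time, size / base_size)
        for name, (time, size) in with_unit.items()
    ]


# Hypothetical measurements (cycles, slices); not data from the talk.
model = build_model((1000, 2000), {"MUL": (800, 3000), "DIV": (1000, 2200)})
for unit in model:
    print(unit.name, round(unit.perf_increment, 2),
          round(unit.size_increment, 2), round(unit.benefit, 2))
```

Each entry in `with_unit`, plus the base core, corresponds to one synthesis run, which is why building the model for five units costs six runs.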

  11. Approach 1: Traditional CAD Techniques
      • Solved the 0-1 knapsack problem using established methods
        • Toth, P., Dynamic Programming Algorithms for the Zero-One Knapsack Problem. Computing, 1980
      • Running time
        • 6 MicroBlaze configuration synthesis runs to create model
        • O(n*p) to solve model
          • n is the number of factors
          • p is the available area
        • Negligible (seconds) compared to synthesis runtimes (~hour)
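
For reference, a textbook dynamic-programming solver with the O(n*p) table shape looks like the sketch below. It is not necessarily the variant from the cited Toth paper. The benefits are the Perf/Size values from the slides; the integer weights (size increments multiplied by 10) and the capacity of 45 are assumptions chosen only so the example has integer areas to index.

```python
def solve_knapsack(items, capacity):
    """0-1 knapsack by dynamic programming.

    items    -- list of (name, benefit, weight) with integer weights
    capacity -- integer area budget (p in the O(n*p) bound)
    Returns (best total benefit, list of chosen names).
    """
    n = len(items)
    # best[i][cap] = best benefit using the first i items within area cap
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i, (_, benefit, weight) in enumerate(items, start=1):
        for cap in range(capacity + 1):
            best[i][cap] = best[i - 1][cap]  # skip item i
            if weight <= cap:
                candidate = best[i - 1][cap - weight] + benefit  # take item i
                if candidate > best[i][cap]:
                    best[i][cap] = candidate

    # Walk the table backwards to recover which items were taken.
    chosen = []
    cap = capacity
    for i in range(n, 0, -1):
        if best[i][cap] != best[i - 1][cap]:
            name, _, weight = items[i - 1]
            chosen.append(name)
            cap -= weight
    return best[n][capacity], chosen[::-1]


# Benefits from the slides; weights and capacity are illustrative assumptions.
UNITS = [
    ("BS", 0.96, 14),
    ("FPU", 0.34, 27),
    ("MUL", 0.63, 18),
    ("DIV", 0.93, 11),
    ("CACHE", 0.80, 16),
]
total, selected = solve_knapsack(UNITS, capacity=45)
print(selected, round(total, 2))
```

The table has (n + 1) × (p + 1) cells and each is filled in constant time, which is where the O(n*p) bound comes from.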

  12. Approach 1: Traditional CAD Techniques
      • Problems
        • 100's of target FPGAs
          • Different hard core resources (multiplier, block RAM)
        • Model approach estimates size and performance for two or more units
          • MUL speedup 1.3, DIV speedup 1.6 → estimate MUL+DIV speedup 1.9
          • May really be 1.7
        • Model inaccuracies may be large
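
The slide's numbers are consistent with an additive estimate, where each unit's gain over the base core is summed. The sketch below assumes that reading; the slide does not state the combining rule, and `additive_speedup_estimate` is a hypothetical helper.

```python
def additive_speedup_estimate(speedups):
    """Estimate a combined speedup by summing each unit's gain over 1.0.

    This is an assumed combining rule that reproduces the slide's example.
    """
    return 1.0 + sum(s - 1.0 for s in speedups)


estimate = additive_speedup_estimate([1.3, 1.6])  # MUL and DIV from the slide
measured = 1.7                                    # "may really be" value from the slide
print(round(estimate, 2), round(estimate - measured, 2))
```

The gap between the estimate and the measured value is the multi-unit estimation error that the model cannot see without synthesizing the combined configuration.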

  13. Approach 2: Synthesis-in-the-Loop
      • Problem with traditional CAD approach
        • 100's of target FPGAs
        • Model approach estimates size and performance for two or more units
        • Model inaccuracies may be large
      • Solution: synthesis in the loop
        • No abstract model
        • Guided by actual size and performance data
        • But slow: can only explore a few configurations
      [Figure labels: traditional flow (Create model; Model; Exploration) versus Synthesis-in-the-Loop (Exploration; Synthesis; Execute; size; perf; 10's of minutes)]

  14. Approach 2: Synthesis-in-the-Loop
      • First pre-analyze units to guide heuristic
      • Same calculations as when creating model for knapsack
      [Figure labels: Barrel Shifter, Floating Point, Multiplier, Cache, Divider, each yielding perf and size]

                       BS    FPU   MUL   DIV   CACHE
      Perf increment   1.1   0.9   1.2   1.0   1.3
      Size increment   1.4   2.7   1.8   1.1   1.6
      Perf/Size        0.96  0.34  0.63  0.93  0.80

  15. Approach 2: Synthesis-in-the-Loop
      • Build "impact-ordered tree" structure
      • Tree is specific to given application

      Before sorting:
                  BS    FPU   MUL   DIV   CACHE
      Perf/Size   0.96  0.34  0.63  0.93  0.80

      After sorting (application-specific impact ordering):
                  BS    DIV   CACHE MUL   FPU
      Perf/Size   0.96  0.93  0.80  0.63  0.34
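
The impact ordering is a descending sort on the Perf/Size ratio. A minimal sketch using the slide's values (`perf_per_size` and `impact_order` are illustrative names):

```python
# Perf/Size values from the slide, keyed by unit.
perf_per_size = {"BS": 0.96, "FPU": 0.34, "MUL": 0.63, "DIV": 0.93, "CACHE": 0.80}

# Highest-impact unit first; this becomes the level order of the tree.
impact_order = sorted(perf_per_size, key=perf_per_size.get, reverse=True)
print(impact_order)
```

Because the ratios come from running the target application, a different application would generally produce a different ordering and therefore a different tree.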

  16. Approach 2: Synthesis-in-the-Loop
      • Run tree-based search heuristic

      Unit    Perf/Size  Useful
      BS      0.96       Yes
      DIV     0.93       No
      CACHE   0.80       No
      MUL     0.63       Yes
      FPU     0.34       No

      [Figure labels: tree with Include / Not Include branches; Synthesis-in-the-Loop Exploration; Synthesis; Execute; size; perf]
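
One plausible reading of the tree walk is sketched below: visit units in impact order, synthesize and run the configuration with the next unit added, and keep the unit only if the result fits the area budget and runs faster. The acceptance rule is an assumption, since the slide does not spell it out. `impact_ordered_search` and `toy_evaluate` are hypothetical, and the toy time factors and sizes are invented stand-ins for real synthesis and execution.

```python
def impact_ordered_search(order, evaluate, base_time, max_size):
    """Walk units in impact order, keeping each one only if it helps.

    evaluate(config) stands in for one synthesis run plus one execution and
    returns (run time, size). base_time comes from the pre-analysis runs.
    """
    chosen = []
    best_time = base_time
    for unit in order:
        candidate = chosen + [unit]
        time, size = evaluate(candidate)
        if size <= max_size and time < best_time:
            chosen, best_time = candidate, time  # take the Include branch
        # otherwise take the Not Include branch and move on
    return chosen, best_time


# Toy stand-in for synthesis + execution; all numbers are invented.
TOY_TIME_FACTOR = {"BS": 0.8, "DIV": 1.0, "CACHE": 1.05, "MUL": 0.9, "FPU": 1.1}
TOY_SIZE = {"BS": 400, "DIV": 100, "CACHE": 600, "MUL": 600, "FPU": 1700}
evaluations = []


def toy_evaluate(config):
    evaluations.append(tuple(config))
    time, size = 100.0, 1000
    for unit in config:
        time *= TOY_TIME_FACTOR[unit]
        size += TOY_SIZE[unit]
    return time, size


config, run_time = impact_ordered_search(
    ["BS", "DIV", "CACHE", "MUL", "FPU"], toy_evaluate, base_time=100.0, max_size=3000
)
print(config, len(evaluations))
```

Each unit is evaluated exactly once, so five units cost five synthesis runs during exploration, on top of the six pre-analysis runs.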

  17. Comparison of Approaches
      • Approach 1: Traditional CAD
        • 6 synthesis runs to build model
        • O(np) knapsack solution
        • Examines thousands of configurations during exploration
      • Approach 2: Synthesis-in-the-Loop
        • 11 synthesis runs (6 pre-analysis, 5 exploration)
        • Examines (at most) 5 configurations during exploration

  18. Results
      • 10 EEMBC and Powerstone benchmarks
        • aifir, BaseFP01, bitmnp, brev, canrdr, g3fax, g721_ps, idct, matmul, tblook, ttsprk
      • Average results shown, on Virtex 2 Pro, for particular size constraint
      • Application-specific impact-ordered tree approach yields near-optimal results in acceptable tool runtime
      • Knapsack sub-optimality due to multi-unit estimation inaccuracy
      [Plot: Tool Run Time (min) and Speedup for Exhaustive, App-Spec, and Knapsack]

  19. Results
      • Obtained results for six different size constraints
      • Results shown for a second size constraint
      • Similar findings for all six constraints
      [Plot: Tool Run Time (min) and Speedup for Exhaustive, App-Spec, and Knapsack]

  20. Results
      • Also ran for different FPGA
        • Xilinx Spartan2
      • Similar findings
      [Plot: Tool Run Time (min) and Speedup for Exhaustive, App-Spec, and Knapsack]

  21. Conclusions
      • Synthesis-in-the-loop approach outperformed traditional CAD approach
        • Better results
        • Slightly longer runtime
      • Application-specific impact-ordered tree heuristic served well for synthesis-in-the-loop approach
      • Future
        • Extend for highly-configurable soft-core processors, and for multiple processors competing for and/or sharing resources

  22. Questions?
