RISPP: Rotating Instruction Set Processing Platform

Presentation Transcript


  1. RISPP: Rotating Instruction Set Processing Platform • Lars Bauer, Muhammad Shafique, Simon Kramer and Jörg Henkel • Chair for Embedded Systems (CES), University of Karlsruhe

  2. Outline • Motivation • Related Work • Our RISPP Approach: • Special Instructions (SIs) composition • Forecasting SI usages • Run-time architecture • Results & Evaluation

  3. Development of Embedded Systems • Typical: • Static analysis of hot spots • Building a tightly optimized system • Nowadays: • Increasing complexity • More functionality • Problem: • A statically chosen design point has to match all requirements • Typically inefficient for individual components (e.g. tasks or hot spots)

  4. Possible Solution: Extensible Processors

  5. Related Work: Extensible Processors • S Kobayashi, K Mita, Y Takeuchi, M Imai: “Design space exploration for DSP applications using the ASIP development system PEAS-III”, ICASSP 2002 • A Hoffmann, T Kogel, A Nohl, G Braun, O Schliesbusch, O Wahlen, A Wieferink, H Meyr: “A novel methodology for the design of application-specific instruction-set processors (ASIPs) using a machine description language”, IEEE Trans. on CAD of Int. Circ. and Syst. 01 • K Atasu, L Pozzi, P Ienne: “Automatic application-specific instruction-set extensions under microarchitectural constraints”, DAC 2003 • F Sun, S Ravi, A Raghunathan, NK Jha: “A scalable application-specific processor synthesis methodology”, ICCAD 2003 • N Cheung, S Parameswaran, J Henkel: “A quantitative study and estimation models for extensible instructions in embedded processors”, ICCAD 2004 • …

  6. Problem: Various Hot-Spots

  7. Related Work: Reconfigurable Computing • K Compton, S Hauck: “Reconfigurable computing: a survey of systems and software”, ACM Computing Surveys 2002 • F Barat, R Lauwereins: “Reconfigurable instruction set processors: a survey”, RSP 2000 • RD Wittig, P Chow: “OneChip: an FPGA processor with reconfigurable logic”, IEEE Symp. FCCM 1996 • S Vassiliadis, S Wong, G Gaydadjiev, K Bertels, G Kuzmanov, EM Panainte: “The MOLEN polymorphic processor”, IEEE Transactions on Computers 2004 • …

  8. Dynamic System Behavior • Extensible Processor: choosing points in the design space at design time • Reconfigurable Computing: typically fixes at compile time when and how to deploy the reconfigurable hardware • How to handle situations that are unknown at design time and compile time? • (while still supporting various extensible instructions) • Depending on input data (e.g. different computational paths in a video encoder) • Which tasks/applications will be executed together?

  9. Our New Concept: Basic Idea and Overview • At design time: fix the amount of reconfigurable hardware • At compile time: compose Special Instructions (SIs) out of highly reusable data paths • At run time: dynamically determine the implementation of an SI • Altogether: Rotate the Instruction Set

  10. Fundamental Idea: Atom / Molecule Model • Key: • Multiple implementations per SI (Molecules) • Each Molecule is composed of Atoms • Implementation hierarchy • Atoms are more reusable • Molecules are more specific • Advantage: Enables dynamic trade-off • Drawback: Higher design effort • Atom: elementary data path (smaller granularity) • Molecule: combination of Atoms (bigger granularity) • Special Instr.: application-specific assembly instruction [Figure: one example Atom and two example Molecules]

  11. Formal Atom / Molecule Model: Example • Molecule relations are needed, e.g., when Molecules comprise each other • In such cases we can first configure the smallest possible Molecule with the required functionality and then upgrade to faster implementations [Figure: Molecules plotted by # Atoms A1 and # Atoms A2 (in general: n-dimensional), with the points (1,4) and (3,5) marked; legend: Molecule relation “is bigger or equal than”, infimum of the Molecules, supremum of the Molecules]

  12. Formal Atom / Molecule Model: Details • Main data structure: set of all Molecules • Meta-Molecule to implement two Molecules, such that they can be executed consecutively, i.e. in the temporal domain (Abelian Group) • Meta-Molecule for the common Atoms (indicator for compatibility) • Relation (Complete Lattice), with • Supremum: Meta-Molecule that is needed to implement all Molecules • Infimum: Meta-Molecule that is collectively needed for all Molecules

  13. Formal Atom / Molecule Model: Details • Determinant: number of Atoms needed to implement a Molecule • Upgrading: Atoms that are additionally needed to implement o, assuming m is already available
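The formulas of slides 12 and 13 did not survive in the transcript, so the following is only a minimal sketch of how the described operators could look. It assumes that a Molecule is a vector with one entry per Atom type, and that the combining Meta-Molecule is the element-wise maximum, the common-Atoms Meta-Molecule the element-wise minimum, the determinant the sum of the entries, and upgrading a clamped difference. All function names and the two sample Molecules are hypothetical.

```python
from typing import Tuple

# Assumption: a Molecule is a vector holding the number of instances
# required of each Atom type (one entry per Atom type).
Molecule = Tuple[int, ...]


def combine(m: Molecule, o: Molecule) -> Molecule:
    """Meta-Molecule able to implement both m and o (assumed: element-wise max)."""
    return tuple(max(a, b) for a, b in zip(m, o))


def common(m: Molecule, o: Molecule) -> Molecule:
    """Meta-Molecule of the Atoms shared by m and o (assumed: element-wise min)."""
    return tuple(min(a, b) for a, b in zip(m, o))


def is_smaller_or_equal(m: Molecule, o: Molecule) -> bool:
    """Molecule relation: m needs no more Atoms of any type than o."""
    return all(a <= b for a, b in zip(m, o))


def determinant(m: Molecule) -> int:
    """Total number of Atoms needed to implement m."""
    return sum(m)


def upgrade(m: Molecule, o: Molecule) -> Molecule:
    """Atoms additionally needed to implement o if m is already available."""
    return tuple(max(b - a, 0) for a, b in zip(m, o))


# Two hypothetical Molecules over the Atom types (A1, A2).
m, o = (1, 4), (3, 2)
print(combine(m, o))              # (3, 4)
print(common(m, o))               # (1, 2)
print(is_smaller_or_equal(m, o))  # False
print(determinant(m))             # 5
print(upgrade(m, o))              # (2, 0)
```

With these definitions, the supremum and infimum of a set of Molecules follow by folding `combine` and `common` over the set.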

  14. Instruction Set Rotation Time • Loading time depends on: • Atom size • Reconfiguration bandwidth • For our examples: 0.84 – 0.95 ms • Altogether: Hardware has to be available when needed, so start loading early [Figure: execution and reconfiguration times for SATD_4x4 for 1 frame]
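The dependence of the loading time on Atom size and reconfiguration bandwidth can be illustrated with a small calculation. This is a sketch under simple assumptions: the loading time is taken as bitstream size divided by bandwidth, Atoms are loaded one after another through a single reconfiguration port, and the sizes and bandwidth below are made-up illustration values, not figures from the slides.

```python
from typing import Sequence


def loading_time_ms(bitstream_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Time to load one Atom, assuming time = size / bandwidth."""
    return bitstream_bytes / bandwidth_bytes_per_s * 1000.0


def latest_start_ms(needed_at_ms: float,
                    atom_sizes_bytes: Sequence[int],
                    bandwidth_bytes_per_s: float) -> float:
    """Latest point in time at which loading must start so that all Atoms
    are available at needed_at_ms (Atoms assumed to be loaded sequentially)."""
    total = sum(loading_time_ms(size, bandwidth_bytes_per_s)
                for size in atom_sizes_bytes)
    return needed_at_ms - total


# Hypothetical values: 50 kB per Atom, 50 MB/s reconfiguration bandwidth.
print(loading_time_ms(50_000, 50_000_000))
print(latest_start_ms(10.0, [50_000, 50_000, 50_000], 50_000_000))
```

The second call shows why forecasts have to be issued early: with three Atoms to load, the rotation has to begin several loading times before the Special Instruction is first executed.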

  15. SI Forecasting: Example • Control-flow graph • Each node is a Base Block (BB) • At compile time: • Determine points to forecast an SI • Add Forecast Instructions with forecast values (about the SI importance) to these points • At run time: • Use the Forecasts to determine the Instruction Set rotation • Dynamically update the importance of the forecasted SIs [Figure: control-flow graph with the Forecast Instruction “forecast SATD_4x4, 42” placed ahead of the executions of SATD_4x4; annotations: time for Instruction Set rotation, return from subroutine]

  16. Inserting Forecast Points (FCs): General Idea of Algorithm • I. Pre-computations from profiling data for each Special Instruction (SI) • II. For every SI determine Forecast Candidates • III. Optimize list of FC Candidates and select final forecasts

  17. I. Pre-Computations • Pre-computations are done on control-flow graph using profiling-information • Temporal Distance from Base Block to SI execution • Probability that the SI executions are reached • Number of executions of this SI (if it is executed)
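Two of the pre-computed quantities, the probability of reaching an SI and the temporal distance to it, can be sketched on a small profiled control-flow graph. The sketch assumes an acyclic graph (real control-flow graphs contain loops, which would need additional handling), and the graph, branch probabilities, and cycle counts are hypothetical.

```python
from functools import lru_cache
from typing import Optional

# Hypothetical profiling data: Base Block -> [(successor, branch probability)].
CFG = {
    "BB0": [("BB1", 0.8), ("BB2", 0.2)],
    "BB1": [("BB3", 1.0)],
    "BB2": [],
    "BB3": [],
}
CYCLES = {"BB0": 10, "BB1": 50, "BB2": 5, "BB3": 20}  # cycles spent per block
SI_BLOCKS = {"BB3"}                                   # blocks executing the SI


@lru_cache(maxsize=None)
def reach_probability(block: str) -> float:
    """Probability that an SI execution is reached when starting at block."""
    if block in SI_BLOCKS:
        return 1.0
    return sum(p * reach_probability(succ) for succ, p in CFG[block])


@lru_cache(maxsize=None)
def temporal_distance(block: str) -> Optional[float]:
    """Expected cycles from entering block until the SI block is entered,
    given that the SI is reached; None if the SI cannot be reached."""
    if block in SI_BLOCKS:
        return 0.0
    reach = reach_probability(block)
    if reach == 0:
        return None
    # Weight each successor by the probability of taking it AND reaching the SI.
    weighted = sum(p * reach_probability(succ) * temporal_distance(succ)
                   for succ, p in CFG[block]
                   if reach_probability(succ) > 0)
    return CYCLES[block] + weighted / reach


print(reach_probability("BB0"))
print(temporal_distance("BB0"))
print(temporal_distance("BB2"))
```

The third quantity, the number of SI executions, would come directly from the profiling counts of the SI block.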

  18. II. Forecast Decision Function (FDF)

  19. III. Optimize list of FC Candidates • General Idea: While the forecasted SIs in a Base Block consume too much area: remove the forecast with the worst ratio Achieved Speedup / Exclusively used Atoms
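The general idea of this optimization step can be sketched as a greedy loop. Several points are assumptions: the slide's metric is read as the ratio of achieved speedup to exclusively used Atoms, area is counted as the number of distinct Atom types needed by all forecasts of the Base Block, and the speedups and Atom sets are invented for illustration. Only the SI names come from the slides.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

# Forecast = (achieved speedup, set of Atom types the SI needs).
Forecast = Tuple[float, FrozenSet[str]]


def prune_forecasts(forecasts: Dict[str, Forecast], area_budget: int) -> List[str]:
    """Drop forecasts until the remaining SIs fit into area_budget Atoms."""
    kept = dict(forecasts)
    while kept:
        # usage[a] = number of kept SIs that need Atom a.
        usage = Counter(a for _, atoms in kept.values() for a in atoms)
        if len(usage) <= area_budget:
            break

        def score(name: str) -> float:
            speedup, atoms = kept[name]
            exclusive = sum(1 for a in atoms if usage[a] == 1)
            # Removing an SI without exclusive Atoms frees no area,
            # so it gets the best possible score.
            return speedup / exclusive if exclusive else math.inf

        worst = min(kept, key=score)
        del kept[worst]
    return sorted(kept)


# Hypothetical speedups and Atom sets for the three SIs of the test application.
candidates = {
    "SATD_4x4": (20.0, frozenset({"A", "B", "C", "D"})),
    "DCT_4x4": (6.0, frozenset({"A", "B"})),
    "HT_4x4": (2.0, frozenset({"A", "E"})),
}
print(prune_forecasts(candidates, area_budget=4))  # ['DCT_4x4', 'SATD_4x4']
```

In the example, HT_4x4 is removed first because it offers the least speedup per exclusively used Atom, after which the remaining forecasts fit into the budget.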

  20. Main Tasks of the Run-Time Architecture • a) Monitoring Forecasts and Special Instructions: • Fine-tune the forecasted importance to reflect the varying run-time situation • b) Selecting Molecules to implement SIs: • Dynamically choose an SI implementation that matches the current needs of the application • c) Realize the taken decisions: • Determine a loading sequence for the Atoms & control the SI execution
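Task b), selecting a Molecule, can be illustrated for a single SI. The sketch below assumes that each Molecule is described by its Atom vector and a latency, that a pure software implementation needing no Atoms always exists, and that the selection simply picks the fastest Molecule fitting into the free Atom Containers. The slides do not specify the actual selection policy, and all numbers are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Molecule(NamedTuple):
    name: str
    atoms: Tuple[int, ...]  # instances needed per Atom type
    latency: int            # cycles per SI execution


def select_molecule(molecules: List[Molecule], free_containers: int) -> Molecule:
    """Pick the fastest Molecule whose Atoms fit into the free Atom Containers."""
    fitting = [m for m in molecules if sum(m.atoms) <= free_containers]
    if not fitting:
        raise ValueError("no Molecule fits; a software Molecule should always exist")
    return min(fitting, key=lambda m: m.latency)


# Hypothetical implementations of one Special Instruction.
SATD_MOLECULES = [
    Molecule("software", (0, 0), 300),
    Molecule("small", (1, 1), 40),
    Molecule("large", (4, 4), 12),
]

print(select_molecule(SATD_MOLECULES, free_containers=2).name)  # small
print(select_molecule(SATD_MOLECULES, free_containers=8).name)  # large
```

A selection across several competing SIs would additionally have to weigh the forecasted importance of each SI, which this single-SI sketch leaves out.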

  21. Run-Time Architecture Example • Two tasks run alternately, sharing the available Atom Containers • Only one task may determine the content of an Atom Container, but both can use it • [SASO’07]: “A Self-Adaptive Extensible Embedded Processor” (IEEE International Conference on Self-Adaptive and Self-Organizing Systems, Boston, July 9-11)

  22. Results & Evaluation: Flow of Test Application • Core part of the Encoding Engine of ITU-T H.264 • Special Instructions (# executions per MacroBlock): • SATD_4x4 (256) • DCT_4x4 (16) • HT_4x4 (1) • Focus: Proof of concept, not automatic SI detection

  23. Designing an Atom for the three transform operations • Consider constraints • Max size of data path • Number of I/O signals • Number of control signals • Increase re-usability • Combine similar data paths (MUX)

  24. Composing Molecules for SATD_4x4 • Increased re-usability

  25. Performance vs. Area Trade-off [Figure: performance plotted against area requirements (# loaded Atoms)]

  26. Hardware Feasibility Study • Xilinx Virtex II 3000 xc2v3000-6ff1152 • Board: Xilinx HW-AFX-FF1152-200 • Floor-planning with PlanAhead

  27. Special Instruction Execution Time for Different Resources

  28. Application Execution Time

  29. Time matters! • Design Time: • Fix the available reconfigurable hardware resources • Compile Time: • Determine Special Instructions • Determine composition out of Atoms / Molecules • Profile the application • Add Forecast Points to the application • Run Time: • Dynamically update the forecasted importance of the SIs • Choose Molecule implementation for SIs • The art is to find the right trade-off between design-/compile-time and run-time

  30. Summary & Conclusion • Hierarchical Special Instruction (SI) composition • Atom / Molecule model • Use resources more efficiently • Offer multiple SI implementations • Forecasting SI usages at compile time • Pre-computations from profiling and graph analysis • Forecast Decision Function • Push more decisions to run time • Which SI implementation (dynamic trade-off) • Adapting to run-time situation • There is a large potential for improving the way current Extensible Processors work

  31. Thank you for your attention! • RISPP: Rotating Instruction Set Processing Platform • Lars Bauer, Muhammad Shafique, Simon Kramer and Jörg Henkel • Chair for Embedded Systems (CES), University of Karlsruhe • http://ces.univ-karlsruhe.de • Lars Bauer, CES, University of Karlsruhe, DAC 2007

  32. Atom-Container Interconnections

  33. III. Select final Forecasts • Optimization goals • As few FCs as possible (smaller code size, fewer executed cycles), as many as needed (provide all necessary information to the run-time system) • Choose FCs with a good trade-off between ‘sufficiently early’ and a ‘high execution probability’ • For each SI start Depth-First-Searches on the FC Candidates on the transposed Base Block graph (i.e. all edges reversed) [Figure: Base Block graph; green BBs are FC Candidates, the final forecast is marked]
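The graph operations named on this slide, transposing the Base Block graph and running a depth-first search on it, can be sketched as follows. The graph and the candidate set are hypothetical, and the last step (listing the FC Candidates that lie upstream of the SI block) only illustrates what the search makes available. The actual selection criteria are not given in the transcript.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]


def transpose(cfg: Graph) -> Graph:
    """Return the graph with all edges reversed."""
    reversed_graph: Graph = {block: [] for block in cfg}
    for block, successors in cfg.items():
        for succ in successors:
            reversed_graph.setdefault(succ, []).append(block)
    return reversed_graph


def depth_first_search(graph: Graph, start: str) -> List[str]:
    """Iterative DFS; returns the blocks in the order they are first visited."""
    visited: List[str] = []
    seen = set()
    stack = [start]
    while stack:
        block = stack.pop()
        if block in seen:
            continue
        seen.add(block)
        visited.append(block)
        # Push in reverse so neighbours are visited in their listed order.
        stack.extend(reversed(graph.get(block, [])))
    return visited


# Hypothetical Base Block graph; BB3 executes the SI.
cfg = {"BB0": ["BB1", "BB2"], "BB1": ["BB3"], "BB2": ["BB3"], "BB3": []}
fc_candidates = {"BB0", "BB1"}

upstream = depth_first_search(transpose(cfg), "BB3")
print(upstream)                                       # ['BB3', 'BB1', 'BB0', 'BB2']
print([b for b in upstream if b in fc_candidates])    # ['BB1', 'BB0']
```

Searching the transposed graph visits blocks in order of increasing distance along a path leading to the SI, which matches the trade-off between forecasting early enough and forecasting with a high execution probability.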

  34. RISPP Area Savings

  35. II. FDF Details • Explanation and Parameter Description: • T: Time (Rot: for Rotation; SW: for SW Execution) • p: Probability • E: Energy • α: Parameter for Energy vs. Speedup fine-tuning