
RISPP: Rotating Instruction Set Processing Platform

Lars Bauer, Muhammad Shafique, Simon Kramer and Jörg Henkel

Chair for Embedded Systems (CES)

University of Karlsruhe

Outline

  • Motivation

  • Related Work

  • Our RISPP Approach:

    • Special Instructions (SIs) composition

    • Forecasting SI usages

    • Run-time architecture

  • Results & Evaluation

Development of Embedded Systems

  • Typical:

    • Static analysis of hot spots

    • Building tightly optimized system

  • Nowadays:

    • Increasing complexity

    • More functionality

  • Problem:

    • Statically chosen design point has to match all requirements

    • Typically inefficient for individual components (e.g. tasks or hot spots)


Possible Solution: Extensible Processors

Related Work: Extensible Processors

  • S. Kobayashi, K. Mita, Y. Takeuchi, M. Imai: “Design space exploration for DSP applications using the ASIP development system PEAS-III”, ICASSP, 2002

  • A. Hoffmann, T. Kogel, A. Nohl, G. Braun, O. Schliebusch, O. Wahlen, A. Wieferink, H. Meyr: “A novel methodology for the design of application-specific instruction-set processors (ASIPs) using a machine description language”, IEEE Trans. on CAD of Int. Circ. and Syst., 2001

  • K. Atasu, L. Pozzi, P. Ienne: “Automatic application-specific instruction-set extensions under microarchitectural constraints”, DAC, 2003

  • F. Sun, S. Ravi, A. Raghunathan, N.K. Jha: “A scalable application-specific processor synthesis methodology”, ICCAD, 2003

  • N. Cheung, S. Parameswaran, J. Henkel: “A quantitative study and estimation models for extensible instructions in embedded processors”, ICCAD, 2004

Related Work: Reconfigurable Computing

  • K. Compton, S. Hauck: “Reconfigurable computing: a survey of systems and software”, ACM Computing Surveys, 2002

  • F. Barat, R. Lauwereins: “Reconfigurable instruction set processors: a survey”, RSP, 2000

  • R.D. Wittig, P. Chow: “OneChip: an FPGA processor with reconfigurable logic”, IEEE Symp. FCCM, 1996

  • S. Vassiliadis, S. Wong, G. Gaydadjiev, K. Bertels, G. Kuzmanov, E.M. Panainte: “The MOLEN polymorphic processor”, IEEE Transactions on Computers, 2004

Dynamic System Behavior

  • Extensible Processor: choosing points in the design space at design time

  • Reconfigurable Computing: typically fixes at compile time when and how to deploy reconfigurable hardware

  • How to handle situations that are unknown at design and compile time?

    • (while still supporting various extensible instructions)

  • Depending on input data (e.g. different computational paths in a video encoder)

  • Which tasks/applications will be executed together?

Our New Concept: Basic Idea and Overview

  • At design time: fix the amount of reconfigurable hardware

  • At compile time: compose Special Instructions (SIs) out of highly reusable data paths

  • At run time: dynamically determine the implementation of an SI

  • Altogether: Rotate the Instruction Set

Fundamental Idea: Atom / Molecule Model

[Figure: one example Atom and two example Molecules]

  • Multiple implementations per SI (Molecules)

  • Each Molecule is composed of Atoms

    • Implementation hierarchy

    • Atoms are more reusable

    • Molecules are more specific

  • Advantage: Enables dynamic trade-off

  • Drawback: Higher design effort

  • Key:

    • Atom: elementary data path (smaller granularity)

    • Molecule: combination of Atoms (bigger granularity)

    • Special Instruction (SI): application-specific assembly instruction
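The hierarchy can be sketched in a few lines of Python. This is only an illustration, assuming a Molecule is described by how many instances of each Atom type it uses. The Atom names (`Transform`, `SAV`), the latencies, and the class and method names are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Molecule:
    """One implementation of an SI."""
    atoms: dict      # Atom type -> number of instances (hypothetical Atom names)
    latency: int     # cycles per SI execution (made-up numbers)

    def atom_count(self) -> int:
        return sum(self.atoms.values())


@dataclass
class SpecialInstruction:
    """An SI offers several Molecules with different area/speed trade-offs."""
    name: str
    molecules: list = field(default_factory=list)

    def fastest_within(self, max_atoms: int):
        """Fastest Molecule that needs at most max_atoms Atoms, or None."""
        fitting = [m for m in self.molecules if m.atom_count() <= max_atoms]
        return min(fitting, key=lambda m: m.latency, default=None)


satd = SpecialInstruction("SATD_4x4", [
    Molecule({"Transform": 1, "SAV": 1}, latency=24),
    Molecule({"Transform": 2, "SAV": 1}, latency=18),
    Molecule({"Transform": 4, "SAV": 2}, latency=12),
])
```

With three Atoms available, `satd.fastest_within(3)` picks the middle Molecule; with more area, a faster Molecule becomes eligible. This is the dynamic trade-off the slide refers to.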

Formal Atom / Molecule Model: Example

  • Molecule relations are needed, e.g., when Molecules comprise each other

    • In such cases we can first configure the smallest possible Molecule with the required functionality and then upgrade to faster implementations

[Figure: Molecules plotted over the axes "# Atoms A1" and "# Atoms A2" (in general: n-dimensional). Marked: the relation “is bigger than or equal to”, the infimum of the Molecules, and the supremum of the Molecules]


Formal Atom / Molecule Model: Details

  • Main data structure: Set of all Molecules

  • Meta-Molecule to implement two Molecules, such that they can be executed consecutively, i.e. temporal domain (Abelian Group)

  • Meta-Molecule for the common Atoms (indicator for compatibility)

  • Relation (Complete Lattice), with

    • Supremum: Meta-Molecule that is needed to implement all Molecules

    • Infimum: Meta-Molecule that is collectively needed for all Molecules

Formal Atom / Molecule Model: Details

  • Determinant: number of Atoms needed to implement a Molecule

  • Upgrading: Atoms that are additionally needed to implement o, assuming m is already available
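The operations described on the two Details slides can be sketched in Python. The sketch assumes that a Molecule is a vector of Atom counts and that the two Meta-Molecule operations are the element-wise maximum and minimum. These definitions are an interpretation of the bullet text, not formulas quoted from the presentation, and all function names are hypothetical.

```python
from functools import reduce

# A Molecule is modelled as a tuple: entry i = number of instances of Atom type i.


def union(m, o):
    """Meta-Molecule able to implement both m and o (assumed: element-wise max)."""
    return tuple(max(a, b) for a, b in zip(m, o))


def intersection(m, o):
    """Meta-Molecule of the Atoms common to m and o (assumed: element-wise min)."""
    return tuple(min(a, b) for a, b in zip(m, o))


def leq(m, o):
    """True if o is bigger than or equal to m for every Atom type."""
    return all(a <= b for a, b in zip(m, o))


def supremum(molecules):
    """Meta-Molecule needed to implement all given Molecules."""
    return reduce(union, molecules)


def infimum(molecules):
    """Meta-Molecule collectively needed by all given Molecules."""
    return reduce(intersection, molecules)


def determinant(m):
    """Number of Atoms needed to implement Molecule m."""
    return sum(m)


def upgrade(m, o):
    """Atoms additionally needed to implement o when m is already available."""
    return tuple(max(b - a, 0) for a, b in zip(m, o))


molecules = [(1, 1), (2, 1), (1, 3)]   # two Atom types A1 and A2, made-up values
```

For these three Molecules the supremum is `(2, 3)` and the infimum is `(1, 1)`. Note that `(2, 1)` and `(1, 3)` are not comparable under `leq`, so the relation is only a partial order.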

Instruction Set Rotation Time

  • Loading time depends on:

    • Atom size

    • Reconfiguration bandwidth

  • For our examples: 0.84 – 0.95 ms

[Figure: execution and reconfiguration times for SATD_4x4 for 1 frame]

  • Altogether: Hardware has to be available when needed, so start loading early
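The dependency of loading time on Atom size and reconfiguration bandwidth can be written down as a small calculation. This is a sketch under stated assumptions: the bitstream sizes and the bandwidth below are made-up values, the function names are hypothetical, and Atoms are assumed to load one after another over a single reconfiguration port.

```python
def atom_load_time_ms(bitstream_bytes: int, bandwidth_mb_per_s: float) -> float:
    """Loading time of one Atom = partial bitstream size / reconfiguration bandwidth."""
    bytes_per_ms = bandwidth_mb_per_s * 1_000_000 / 1000
    return bitstream_bytes / bytes_per_ms


def latest_start_ms(needed_at_ms: float, atom_sizes, bandwidth_mb_per_s: float) -> float:
    """Latest point in time to start loading so that all Atoms are ready in time.

    Assumes the Atoms are loaded sequentially (one reconfiguration port).
    """
    total = sum(atom_load_time_ms(size, bandwidth_mb_per_s) for size in atom_sizes)
    return needed_at_ms - total


# Hypothetical example: two Atoms of 60 kB and 30 kB at 60 MB/s,
# needed 10 ms from now.
start = latest_start_ms(10.0, [60_000, 30_000], 60.0)
```

With these illustrative numbers a single Atom takes about 1 ms to load, so the rotation has to begin well before the first SI execution. That is the reason for forecasting.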

SI Forecasting: Example

  • Control-flow graph

    • Each node is a Base Block (BB)

  • At compile time:

    • Determine points to forecast an SI

    • Add Forecast Instructions with forecast values (about the SI importance) to these points

  • At run time:

    • Use the Forecasts to determine the Instruction Set rotation

    • Dynamically update the importance of the forecasted SIs

[Figure: control-flow graph example. Labels: Forecast Instruction “forecast SATD_4x4, 42”; executions of SATD_4x4; time for Instruction Set rotation; return from subroutine]

Inserting Forecast Points (FCs): General Idea of Algorithm

  • I. Pre-computations from profiling data for each Special Instruction (SI)

  • II. For every SI determine Forecast Candidates

  • III. Optimize list of FC Candidates and select final forecasts


I. Pre-Computations

  • Pre-computations are done on the control-flow graph using profiling information

  • Temporal Distance from Base Block to SI execution

  • Probability that the SI executions are reached

  • Number of executions of this SI (if it is executed)
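Two of these pre-computed values, the probability of reaching the SI and the temporal distance to it, can be illustrated on a tiny profiled graph. This is a minimal sketch, assuming an acyclic control-flow graph with branch probabilities and per-block cycle counts from profiling. The graph, the numbers, and the function names are hypothetical, and loops are ignored for simplicity.

```python
from functools import lru_cache

# Hypothetical profiled graph: Base Block -> [(successor, branch probability)]
CFG = {
    "A": [("B", 0.8), ("C", 0.2)],
    "B": [("D", 1.0)],
    "C": [],
    "D": [],                      # D is the Base Block that executes the SI
}
CYCLES = {"A": 100, "B": 300, "C": 50, "D": 40}   # profiled cycles per block
SI_BLOCK = "D"


@lru_cache(maxsize=None)
def reach_probability(block: str) -> float:
    """Probability that the SI execution is reached when starting in `block`."""
    if block == SI_BLOCK:
        return 1.0
    return sum(p * reach_probability(succ) for succ, p in CFG[block])


@lru_cache(maxsize=None)
def expected_distance(block: str) -> float:
    """Expected cycles from entering `block` until the SI block is entered,
    averaged over the paths that actually reach the SI."""
    if block == SI_BLOCK:
        return 0.0
    total = reach_probability(block)
    if total == 0:
        return float("inf")       # the SI is never reached from here
    weighted = sum(p * reach_probability(succ) * expected_distance(succ)
                   for succ, p in CFG[block] if reach_probability(succ) > 0)
    return CYCLES[block] + weighted / total
```

In this example block `A` reaches the SI with probability 0.8 and an expected distance of 400 cycles. A forecast placed in `A` would therefore be early but less certain than one placed in `B`.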

III. Optimize list of FC Candidates

  • General Idea: While the forecasted SIs in a Base Block consume too much area: remove the forecast with the worst ratio of achieved speedup to exclusively used Atoms
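The pruning loop can be sketched as follows. The sketch assumes that area is counted as the number of distinct Atoms needed by all forecasts in one Base Block, and that the "worst" forecast is the one with the lowest speedup per exclusively used Atom. The Atom names, speedup values, and function name are hypothetical.

```python
def prune_forecasts(forecasts, area_limit):
    """Drop forecasts from one Base Block until their Atoms fit into area_limit.

    forecasts: list of dicts with keys 'si', 'atoms' (set of Atom names)
    and 'speedup' (achieved speedup of the forecasted SI).
    """
    kept = list(forecasts)

    def area(fcs):
        # Distinct Atoms needed by all remaining forecasts.
        return len(set().union(*(f["atoms"] for f in fcs)))

    def ratio(f):
        # Atoms that only this forecast needs; removing it frees exactly those.
        others = set().union(*(g["atoms"] for g in kept if g is not f))
        exclusive = len(f["atoms"] - others)
        return f["speedup"] / exclusive if exclusive else float("inf")

    while kept and area(kept) > area_limit:
        kept.remove(min(kept, key=ratio))
    return kept


forecasts = [
    {"si": "SATD_4x4", "atoms": {"Transform", "SAV"}, "speedup": 8.0},
    {"si": "DCT_4x4", "atoms": {"Transform", "Quant"}, "speedup": 3.0},
    {"si": "HT_4x4", "atoms": {"Transform", "Pack", "Clip"}, "speedup": 1.5},
]
kept = prune_forecasts(forecasts, area_limit=3)
```

Here the `HT_4x4` forecast is removed first: it occupies two Atoms that no other SI uses and contributes the least speedup per Atom.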

Main Tasks of the Run-Time Architecture

  • Monitoring Forecasts and Special Instructions:

    • Fine-tune the forecasted importance to reflect the varying run-time situation


  • Selecting Molecules to implement SIs:

    • Dynamically choose an SI implementation that matches the current needs of the application


  • Realize the decisions taken:

    • Determine a loading sequence for the Atoms & control the SI execution
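The third task, determining a loading sequence, can be illustrated with a simple heuristic. This is a sketch and not the scheduling used in RISPP: it assumes Molecules are vectors of Atom counts, that the target Molecule has already been selected, and that slower Molecules contained in the target should become usable as early as possible. All names and numbers are hypothetical.

```python
def missing(loaded, molecule):
    """Atoms per type that still have to be loaded to implement `molecule`."""
    return tuple(max(m - l, 0) for l, m in zip(loaded, molecule))


def loading_sequence(loaded, molecules, target):
    """Order in which Atom types are loaded to reach `target`.

    molecules: list of (atom_vector, latency) for one SI.
    Intermediate Molecules that fit inside the target are completed first,
    slowest first, so the SI gets hardware support before the target is ready.
    """
    loaded = list(loaded)
    sequence = []
    by_latency = sorted(molecules, key=lambda entry: entry[1], reverse=True)
    steps = [vec for vec, _ in by_latency
             if all(a <= t for a, t in zip(vec, target))]
    for step in steps + [target]:
        for atom_type, count in enumerate(missing(loaded, step)):
            sequence += [atom_type] * count
            loaded[atom_type] += count
    return sequence


# Hypothetical SI with two Atom types (index 0 and 1).
molecules = [((1, 1), 24), ((2, 1), 18), ((4, 2), 12)]
sequence = loading_sequence((0, 0), molecules, target=(4, 2))
```

Starting from empty Atom Containers, the resulting sequence loads one Atom of each type first. The smallest Molecule is then usable after two loads instead of waiting for all six.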


Run-time Architecture Example

  • Two tasks run alternately, sharing the available Atom Containers

  • Only one task may determine the content of an Atom Container, but both can use it

  • [SASO’07]: “A Self-Adaptive Extensible Embedded Processor” (IEEE International Conference on Self-Adaptive and Self-Organizing Systems, Boston, July 9-11)

Results & Evaluation: Flow of Test Application

  • Core part of the Encoding Engine of ITU-T H.264

  • Special Instructions (# executions per MacroBlock):

    • SATD_4x4 (256)

    • DCT_4x4 (16)

    • HT_4x4 (1)

  • Focus: Proof of concept, not automatic SI detection
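To get a feeling for the SI execution counts listed above, they can be scaled from one macroblock to one frame. The per-macroblock counts are taken from the slide. The CIF resolution (352x288) is an assumption chosen only for illustration, since the transcript does not state the frame size used in the evaluation.

```python
# Executions per macroblock, as listed on the slide.
SI_EXECUTIONS_PER_MACROBLOCK = {"SATD_4x4": 256, "DCT_4x4": 16, "HT_4x4": 1}

MACROBLOCK_SIZE = 16   # an H.264 macroblock covers 16x16 luma samples


def si_executions_per_frame(width: int, height: int) -> dict:
    """Scale the per-macroblock SI counts to a whole frame."""
    macroblocks = (width // MACROBLOCK_SIZE) * (height // MACROBLOCK_SIZE)
    return {si: count * macroblocks
            for si, count in SI_EXECUTIONS_PER_MACROBLOCK.items()}


# Assumed frame size: CIF (352x288), i.e. 22 x 18 = 396 macroblocks.
cif = si_executions_per_frame(352, 288)
```

Under this assumption SATD_4x4 runs roughly a hundred thousand times per frame, which indicates why it is the SI whose implementation matters most.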

Designing an Atom for the Three Transform Operations

  • Consider constraints

    • Max size of data path

    • Number of I/O signals

    • Number of control signals

  • Increase re-usability

    • Combine similar data paths (MUX)

Composing Molecules for SATD_4x4


Performance vs. Area Trade-off

[Figure: performance vs. area trade-off; area requirements given in number of loaded Atoms]

Hardware Feasibility Study

  • Xilinx Virtex-II 3000 (xc2v3000-6ff1152)

  • Board: Xilinx HW-AFX-FF1152-200

  • Floor-planning with PlanAhead

Special Instruction Execution Time for Different Resources

Time matters!

  • Design Time

    • Fix the available reconfigurable hardware resources

  • Compile Time

    • Determine Special Instructions

    • Determine composition out of Atoms / Molecules

    • Profile the application

    • Add Forecast Points to the application

  • Run Time

    • Dynamically update the forecasted importance of the SIs

    • Choose Molecule implementation for SIs

  • The art is to find the right trade-off between design-/compile-time and run-time

Summary & Conclusion

  • Hierarchical Special Instruction (SI) composition

    • Atom / Molecule model

    • Use resources more efficiently

    • Offer multiple SI implementations

  • Forecasting SI usages at compile time

    • Pre-computations from profiling and graph analysis

    • Forecast Decision Function

  • Push more decisions to run time

    • Which SI implementation (dynamic trade-off)

    • Adapting to run-time situation

  • There is a large potential for improving the way current Extensible Processors work

Thank you for your attention!


Lars Bauer, CES, University of Karlsruhe, DAC 2007

III. Select final Forecasts

  • Optimization goals

    • As few FCs as possible (smaller code size, fewer executed cycles), as many as needed (provide all necessary information to the run-time system)

    • Choose FCs with a good trade-off between ‘sufficiently early’ and a ‘high execution probability’

  • For each SI start depth-first searches on the FC Candidates on the transposed Base Block graph (i.e. all edges reversed)

[Figure: Base Block graph; green BBs are FC Candidates, the final forecast is marked]
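The graph traversal itself can be sketched in Python. The slide does not spell out the selection rule, so this sketch uses a simple stand-in: walk backwards from the Base Block that executes the SI and keep the first FC Candidate met on each backward path. The graph, the candidate set, and the function names are hypothetical.

```python
def transpose(graph):
    """Reverse every edge of a Base Block graph given as node -> successors."""
    reversed_graph = {node: [] for node in graph}
    for node, successors in graph.items():
        for succ in successors:
            reversed_graph.setdefault(succ, []).append(node)
    return reversed_graph


def select_forecasts(graph, si_block, candidates):
    """Depth-first search on the transposed graph, starting at the SI block.

    Stand-in rule (an assumption, not the authors' method): a backward path
    stops at the first FC Candidate it meets, and that candidate is selected.
    """
    reversed_graph = transpose(graph)
    selected, visited, stack = [], set(), [si_block]
    while stack:
        block = stack.pop()
        if block in visited:
            continue
        visited.add(block)
        if block in candidates and block != si_block:
            selected.append(block)
            continue                  # do not search beyond a selected forecast
        stack.extend(reversed_graph[block])
    return sorted(selected)


# Hypothetical Base Block graph; the SI is executed in block "E".
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
final = select_forecasts(graph, "E", candidates={"A", "B", "C"})
```

With all three blocks as candidates the search selects `B` and `C`. If only `A` is a candidate, both backward paths end in `A` and a single forecast suffices.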

II. FDF Details (Forecast Decision Function)

  • Explanation and Parameter Description:

    • T: Time (Rot: for Rotation; SW: for SW Execution)

    • p: Probability

    • E: Energy

    • α: Parameter for Energy vs. Speedup fine-tuning