RISPP: Rotating Instruction Set Processing Platform

Lars Bauer, Muhammad Shafique, Simon Kramer and Jörg Henkel

Chair for Embedded Systems (CES)

University of Karlsruhe


  • Motivation

  • Related Work

  • Our RISPP Approach:

    • Special Instructions (SIs) composition

    • Forecasting SI usages

    • Run-time architecture

  • Results & Evaluation

Development of Embedded Systems

  • Typical:

    • Static analysis of hot spots

    • Building a tightly optimized system

  • Nowadays:

    • Increasing complexity

    • More functionality

  • Problem:

    • Statically chosen design point has to match all requirements

    • Typically inefficient for individual components (e.g. tasks or hot spots)


Possible Solution: Extensible Processors

Related Work: Extensible Processors

  • S Kobayashi, K Mita, Y Takeuchi, M Imai: “Design space exploration for DSP applications using the ASIP development system PEAS-III”, ICASSP, 2002

  • A Hoffmann, T Kogel, A Nohl, G Braun, O Schliesbusch, O Wahlen, A Wieferink, H Meyr: “A novel methodology for the design of application-specific instruction-set processors (ASIPs) using a machine description language”, IEEE Trans. on CAD of Int. Circ. and Syst., 2001

  • K Atasu, L Pozzi, P Ienne: “Automatic application-specific instruction-set extensions under microarchitectural constraints”, DAC, 2003

  • F Sun, S Ravi, A Raghunathan, NK Jha: “A scalable application-specific processor synthesis methodology”, ICCAD, 2003

  • N Cheung, S Parameswaran, J Henkel: “A quantitative study and estimation models for extensible instructions in embedded processors”, ICCAD, 2004

Problem: Various Hot-Spots

Related Work: Reconfigurable Computing

  • K Compton, S Hauck: “Reconfigurable computing: a survey of systems and software”, ACM Computing Surveys, 2002

  • F Barat, R Lauwereins: “Reconfigurable instruction set processors: a survey”, RSP, 2000

  • RD Wittig, P Chow: “OneChip: an FPGA processor with reconfigurable logic”, IEEE Symp. FCCM, 1996

  • S Vassiliadis, S Wong, G Gaydadjiev, K Bertels, G Kuzmanov, EM Panainte: “The MOLEN polymorphic processor”, IEEE Transactions on Computers, 2004

Dynamic System Behavior

  • Extensible Processor: choosing points in design space at design time

  • Reconfigurable Computing: typically fix at compile time when and how to deploy reconfigurable hardware

  • How to handle situations that are unknown at design and compile time?

    • (while still supporting various extensible instructions)

  • Depending on input data (e.g. different computational paths in video encoder)

  • Which tasks/applications will be executed together?

Our New Concept: Basic Idea and Overview

  • At design time: fix the amount of reconfigurable hardware

  • At compile time: compose Special Instructions (SIs) out of highly reusable data paths

  • At run time: dynamically determine the implementation of an SI

  • Altogether: Rotate the Instruction Set

Fundamental Idea: Atom / Molecule Model

[Figures: one example Atom and two example Molecules]

  • Key:

  • Multiple implementations per SI (Molecules)

  • Each Molecule is composed out of Atoms

    • Implementation hierarchy

    • Atoms are more reusable

    • Molecules are more specific

  • Advantage: Enables dynamic trade-off

  • Drawback: Higher design effort

  • Atom: elementary data path (smaller granularity)

  • Molecule: combination of Atoms (bigger granularity)

  • Special Instruction (SI): application-specific assembly instruction



Formal Atom / Molecule Model: Example

  • Molecule relations are e.g. needed when Molecules comprise each other

    • In such cases we can first configure the smallest possible Molecule with required functionality and then upgrade to faster implementations

[Figure: Molecules plotted by their Atom counts, with axes “# Atoms A1” and “# Atoms A2” (in general: n-dimensional); annotations: relation “is bigger or equal than”, infimum of the Molecules, supremum of the Molecules]
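The "configure the smallest Molecule first, then upgrade" idea can be illustrated with a small sketch. It assumes that a Molecule is represented as a tuple of Atom counts per Atom type (A1, A2), which matches the axes of the example figure; the function names and the concrete Molecules are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: a Molecule is a tuple of Atom counts (A1, A2).
# The Molecules below are made-up examples, not taken from the slides.

def leq(m, o):
    """True if Molecule m needs no more Atoms of any type than Molecule o."""
    return all(a <= b for a, b in zip(m, o))

def upgrade_chain(molecules):
    """Greedily build a chain of Molecules in which each one comprises the previous.

    Starts at the Molecule with the fewest Atoms and appends every larger
    Molecule that can be reached by only adding Atoms.
    """
    ordered = sorted(molecules, key=sum)
    chain = [ordered[0]]
    for m in ordered[1:]:
        if leq(chain[-1], m):
            chain.append(m)
    return chain

molecules = [(2, 2), (1, 1), (4, 4), (1, 2)]
chain = upgrade_chain(molecules)
print(chain)
```

Each step in the resulting chain only adds Atoms, so an already configured Molecule stays usable while the next, faster one is being loaded.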


Formal Atom / Molecule Model: Details

  • Main data structure: Set of all Molecules

  • Meta-Molecule to implement two Molecules, such that they can be executed consecutively, i.e. temporal domain (Abelian Group)

  • Meta-Molecule for the common Atoms (indicator for compatibility)

  • Relation (Complete Lattice), with

    • Supremum: Meta-Molecule that is needed to implement all Molecules

    • Infimum: Meta-Molecule that is collectively needed for all Molecules

Formal Atom / Molecule Model: Details

  • Determinant: number of Atoms needed to implement a Molecule

  • Upgrading: Atoms that are additionally needed to implement o, assuming m is already available
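The formulas for these operators did not survive in the transcript, so the following is only a sketch of one consistent reading of the textual descriptions. It assumes a Molecule is a vector of Atom counts, that the two Meta-Molecule operators are the component-wise maximum and minimum, and that the determinant is the sum of the counts. All function names are hypothetical.

```python
from functools import reduce

# Assumption: a Molecule is a tuple with one Atom count per Atom type.

def union(o, p):
    """Meta-Molecule with the Atoms needed to implement both o and p."""
    return tuple(max(a, b) for a, b in zip(o, p))

def intersection(o, p):
    """Meta-Molecule with the Atoms that o and p have in common."""
    return tuple(min(a, b) for a, b in zip(o, p))

def supremum(molecules):
    """Meta-Molecule needed to implement all given Molecules."""
    return reduce(union, molecules)

def infimum(molecules):
    """Meta-Molecule that is collectively needed by all given Molecules."""
    return reduce(intersection, molecules)

def determinant(m):
    """Number of Atoms needed to implement Molecule m."""
    return sum(m)

def upgrade(m, o):
    """Atoms additionally needed to implement o, assuming m is available."""
    return tuple(max(b - a, 0) for a, b in zip(m, o))

molecules = [(1, 2), (2, 1), (3, 3)]
print(supremum(molecules), infimum(molecules))
print(determinant((3, 3)), upgrade((1, 2), (3, 3)))
```

Under these assumptions, `upgrade((2, 1), (1, 2))` yields `(0, 1)`: only one further Atom of the second type has to be loaded, because surplus Atoms of the first type are simply left unused.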

Instruction Set Rotation Time

  • Loading time depends on:

    • Atom size

    • Reconfiguration bandwidth

  • For our examples: 0.84 – 0.95 ms

  • Altogether: hardware has to be available when needed, so start loading early

[Figure: Execution and reconfiguration times for SATD_4x4 for 1 frame; labels: “forecast SATD_4x4, 42”, “Executions of SATD_4x4”]
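The dependence of the loading time on Atom size and reconfiguration bandwidth can be sketched as below. The bitstream size and bandwidth are placeholder values picked for easy arithmetic and are not the figures behind the 0.84 – 0.95 ms quoted in the slides; loading the Atoms one after another through a single reconfiguration port is also an assumption.

```python
def atom_load_time_ms(bitstream_bytes, bandwidth_bytes_per_s):
    """Time to load one Atom, in milliseconds."""
    return 1000.0 * bitstream_bytes / bandwidth_bytes_per_s

def required_lead_time_ms(atoms_to_load, bitstream_bytes, bandwidth_bytes_per_s):
    """How long before the first SI execution the loading has to start.

    Assumes Atoms are loaded sequentially and all have the same size.
    """
    return atoms_to_load * atom_load_time_ms(bitstream_bytes, bandwidth_bytes_per_s)

# Placeholder values (assumptions, not measurements from the presentation).
one_atom = atom_load_time_ms(50_000, 50_000_000)
four_atoms = required_lead_time_ms(4, 50_000, 50_000_000)
print(one_atom, four_atoms)
```

A forecast such as “forecast SATD_4x4, 42” is useful only if it is issued at least this lead time before the first execution of the SI.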

SI Forecasting: Example

  • Control-flow graph

    • Each node is a Base Block (BB)

  • At compile time:

    • Determine points to forecast an SI

    • Add Forecast Instructions with forecast values (about the SI importance) to these points

  • At run time:

    • Use the Forecasts to determine the Instruction Set rotation

    • Dynamically update the importance of the forecasted SIs

[Figure: control-flow graph example; labels: “Time for Instruction Set rotation”, “Return from subroutine”]

Inserting Forecast Points (FCs): General Idea of Algorithm

  • Pre-computations from profiling data for each Special Instruction (SI)

  • For every SI determine Forecast Candidates

  • Optimize list of FC Candidates and select final forecasts


I. Pre-Computations

  • Pre-computations are done on the control-flow graph using profiling information

  • Temporal Distance from Base Block to SI execution

  • Probability that the SI executions are reached

  • Number of executions of this SI (if it is executed)
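A minimal sketch of the first two pre-computations, assuming an acyclic control-flow graph whose edge probabilities and per-block cycle counts come from profiling. The graph, the numbers and all names are hypothetical; a real control-flow graph with loops would need a more careful treatment than this recursion. The third quantity, the number of SI executions, would simply be read from the profile.

```python
from functools import lru_cache

# Hypothetical profiled CFG: Base Block -> list of (successor, edge probability).
CFG = {
    "BB0": [("BB1", 0.8), ("BB2", 0.2)],
    "BB1": [("BB3", 1.0)],
    "BB2": [],  # leaves without reaching the SI
    "BB3": [],  # contains the SI
}
CYCLES = {"BB0": 10, "BB1": 50, "BB2": 5, "BB3": 0}  # assumed cycles per block
SI_BLOCK = "BB3"

@lru_cache(maxsize=None)
def reach_probability(bb):
    """Probability that execution starting in bb reaches the SI block."""
    if bb == SI_BLOCK:
        return 1.0
    return sum(p * reach_probability(succ) for succ, p in CFG[bb])

@lru_cache(maxsize=None)
def expected_distance(bb):
    """Expected cycles from the start of bb to the SI, given the SI is reached."""
    if bb == SI_BLOCK:
        return 0.0
    p_reach = reach_probability(bb)
    if p_reach == 0:
        return float("inf")
    weighted = sum(
        p * reach_probability(succ) * expected_distance(succ)
        for succ, p in CFG[bb]
        if reach_probability(succ) > 0
    )
    return CYCLES[bb] + weighted / p_reach

print(reach_probability("BB0"), expected_distance("BB0"))
```

A block with a large temporal distance but a low reach probability is a weak forecast point, which is the tension that the later selection step has to resolve.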

II. Forecast Decision Function (FDF)

General Idea:

While the forecasted SIs in a Base Block consume too much area: remove the forecast with the worst ratio of achieved speedup to exclusively used Atoms

III. Optimize list of FC Candidates
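The pruning idea stated above (while the forecasts of a Base Block need too much area, drop the one with the worst achieved speedup per exclusively used Atom) can be sketched as a greedy loop. This is an illustration under assumptions: the area model (shared Atoms plus each forecast's exclusive Atoms), the speedup values and all names are hypothetical.

```python
def prune_forecasts(forecasts, shared_atoms, capacity):
    """Greedily remove forecasts until the Base Block fits into the area.

    forecasts: dict mapping SI name -> (achieved_speedup, exclusive_atoms)
    shared_atoms: Atoms used by several SIs, counted once (assumed model)
    capacity: number of Atoms that can be loaded at the same time
    """
    kept = dict(forecasts)

    def area():
        return shared_atoms + sum(excl for _, excl in kept.values())

    def ratio(name):
        speedup, excl = kept[name]
        # Removing a forecast without exclusive Atoms frees no area.
        return speedup / excl if excl > 0 else float("inf")

    while kept and area() > capacity:
        worst = min(kept, key=ratio)
        del kept[worst]
    return kept

# Made-up numbers for illustration only.
forecasts = {"SATD_4x4": (8.0, 4), "DCT_4x4": (3.0, 3), "HT_4x4": (1.5, 1)}
kept = prune_forecasts(forecasts, shared_atoms=2, capacity=7)
print(sorted(kept))
```

Here the initial demand is 10 Atoms; DCT_4x4 has the lowest ratio (1.0) and is removed, after which the remaining forecasts fit into the 7 available Atoms.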

Main Tasks of the Run-Time Architecture

  • Monitoring Forecasts and Special Instructions:

    • Fine-tune the forecasted importance to reflect varying run-time situation


  • Selecting Molecules to implement SIs:

    • Dynamically choose an SI implementation that matches the current needs of the application


  • Realize the decisions taken:

    • Determine a loading sequence for the Atoms & control the SI execution
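The last two tasks can be illustrated with a small sketch. It assumes Molecules are vectors of Atom counts with a known latency per SI execution, that the selection simply picks the fastest Molecule that fits into the available Atom Containers, and that missing Atoms are loaded type by type. The selection policy, the latencies and the names are assumptions for illustration, not the policy of the actual run-time system.

```python
def select_molecule(candidates, atom_containers):
    """Pick the fastest Molecule that fits into the available Atom Containers.

    candidates: list of (atom_vector, latency_cycles) for one SI
    Returns None if no candidate fits.
    """
    fitting = [c for c in candidates if sum(c[0]) <= atom_containers]
    return min(fitting, key=lambda c: c[1]) if fitting else None

def loading_sequence(loaded, target):
    """Atom types (by index) that still have to be loaded to reach target."""
    sequence = []
    for atom_type, (have, need) in enumerate(zip(loaded, target)):
        sequence.extend([atom_type] * max(need - have, 0))
    return sequence

# Made-up Molecules for one SI: (Atom counts per type, latency in cycles).
candidates = [((1, 1), 24), ((2, 2), 14), ((4, 4), 8)]
chosen = select_molecule(candidates, atom_containers=5)
sequence = loading_sequence(loaded=(1, 0), target=chosen[0])
print(chosen, sequence)
```

With five Atom Containers the largest Molecule does not fit, so the middle one is chosen, and three Atoms remain to be loaded.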


Run-time Architecture example

  • Two tasks run alternately, sharing the available Atom Containers

  • Only one task may determine the content of an Atom Container, but both can use them

  • [SASO’07]: “A Self-Adaptive Extensible Embedded Processor” (IEEE International Conference on Self-Adaptive and Self-Organizing Systems, Boston, July 9-11)

Results & Evaluation: Flow of Test Application

  • Core part of Encoding Engine of ITU-T H.264

  • Special Instructions (# executions per Macroblock):

    • SATD_4x4 (256)

    • DCT_4x4 (16)

    • HT_4x4 (1)

  • Focus: Proof of concept, not automatic SI detection
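To get a feeling for the per-frame load that follows from the per-Macroblock counts above, they can be scaled by the number of Macroblocks in a frame. The frame size used here (CIF, 352x288 pixels, with 16x16 Macroblocks) is an assumption for illustration; the slides do not state the resolution of the test sequence.

```python
# Executions per Macroblock, as listed in the slides.
SI_EXECUTIONS_PER_MACROBLOCK = {"SATD_4x4": 256, "DCT_4x4": 16, "HT_4x4": 1}

def macroblocks_per_frame(width, height, mb_size=16):
    """Number of whole Macroblocks in a frame of the given size."""
    return (width // mb_size) * (height // mb_size)

def si_executions_per_frame(width, height):
    """Scale the per-Macroblock SI counts to one frame."""
    mbs = macroblocks_per_frame(width, height)
    return {si: count * mbs for si, count in SI_EXECUTIONS_PER_MACROBLOCK.items()}

# Assumed CIF resolution (352x288); not stated in the presentation.
per_frame = si_executions_per_frame(352, 288)
print(macroblocks_per_frame(352, 288), per_frame)
```

The large gap between the counts for SATD_4x4 and HT_4x4 shows why different SIs can warrant very different amounts of hardware.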

Designing an Atom for the three transform operations

  • Consider constraints

    • Max size of data path

    • Number of I/O signals

    • Number of control signals

  • Increase re-usability

    • Combine similar data paths (MUX)

Composing Molecules for SATD_4x4







Performance vs. Area Trade-off

[Figure: performance vs. area; axis label: “Area requirements [# loaded Atoms]”]

Hardware Feasibility Study

  • Xilinx Virtex II 3000 xc2v3000-6ff1152

  • Board: Xilinx HW-AFX-FF1152-200

  • Floor planning with PlanAhead

Special Instruction Execution Time for Different Resources

Application Execution Time

Time matters!

  • Design Time:

    • Fix the available reconfigurable hardware resources

  • Compile Time:

    • Determine Special Instructions

    • Determine composition out of Atoms / Molecules

    • Profile the application

    • Add Forecast Points to the application

  • Run Time:

    • Dynamically update the forecasted importance of the SIs

    • Choose Molecule implementation for SIs

  • The art is to find the right trade-off between design-/compile-time and run-time

Summary & Conclusion

  • Hierarchical Special Instruction (SI) composition

    • Atom / Molecule model

    • Use resources more efficiently

    • Offer multiple SI implementations

  • Forecasting SI usages at compile time

    • Pre-computations from profiling and graph analysis

    • Forecast Decision Function

  • Push more decisions to run time

    • Which SI implementation (dynamic trade-off)

    • Adapting to run-time situation

  • There is a large potential for improving the way current Extensible Processors work

Thank you for your attention!

Lars Bauer, CES, University of Karlsruhe, DAC 2007

Atom-Container Interconnections

Final Forecast

III. Select final Forecasts

  • Optimization goals

    • As few FCs as possible (smaller code size, fewer executed cycles), as many as needed (provide all necessary information to the run-time system)

    • Choose FCs with a good trade-off between ‘sufficiently early’ and a ‘high execution probability’

  • For each SI start Depth-First-Searches on the FC Candidates on the transposed Base Block graph (i.e. all edges reversed)
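The traversal described here, a depth-first search on the control-flow graph with all edges reversed, can be sketched as follows. The graph and the function names are hypothetical, and the sketch only shows the transposition and the visiting order; the criterion that decides which visited Base Block finally receives a Forecast is not part of it.

```python
def transpose(graph):
    """Return the graph with all edges reversed."""
    reversed_graph = {node: [] for node in graph}
    for node, successors in graph.items():
        for succ in successors:
            reversed_graph.setdefault(succ, []).append(node)
    return reversed_graph

def dfs_order(graph, start):
    """Iterative depth-first search; returns nodes in visiting order."""
    visited, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Push in reverse so that the first neighbour is visited first.
        stack.extend(reversed(graph.get(node, [])))
    return order

# Hypothetical Base Block graph; BB3 is assumed to be an FC Candidate.
cfg = {"BB0": ["BB1", "BB2"], "BB1": ["BB3"], "BB2": ["BB3"], "BB3": []}
backward = transpose(cfg)
order = dfs_order(backward, "BB3")
print(order)
```

Walking the transposed graph from a candidate visits exactly the Base Blocks that can lead to it, so earlier placement options are found without scanning the whole program.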

Green BB: FC Candidate

RISPP Area Savings

II. FDF-Details

  • Explanation and Parameter Description:

    • T: Time (Rot: for Rotation; SW: for SW Execution)

    • p: Probability

    • E: Energy

    • α: Parameter for Energy vs. Speedup fine-tuning