
RISPP: Rotating Instruction Set Processing Platform

Lars Bauer, Muhammad Shafique, Simon Kramer and Jörg Henkel

Chair for Embedded Systems (CES)

University of Karlsruhe


Outline

  • Motivation

  • Related Work

  • Our RISPP Approach:

    • Special Instructions (SIs) composition

    • Forecasting SI usages

    • Run-time architecture

  • Results & Evaluation


Development of Embedded Systems

  • Typical:

    • Static analysis of hot spots

    • Building a tightly optimized system

  • Nowadays:

    • Increasing complexity

    • More functionality

  • Problem:

    • Statically chosen design point has to match all requirements

    • Typically inefficient for individual components (e.g. tasks or hot spots)



Possible Solution: Extensible Processors


Related Work: Extensible Processors

  • S. Kobayashi, K. Mita, Y. Takeuchi, M. Imai: “Design space exploration for DSP applications using the ASIP development system PEAS-III”, ICASSP, 2002

  • A. Hoffmann, T. Kogel, A. Nohl, G. Braun, O. Schliesbusch, O. Wahlen, A. Wieferink, H. Meyr: “A novel methodology for the design of application-specific instruction-set processors (ASIPs) using a machine description language”, IEEE Trans. on CAD of Integrated Circuits and Systems, 2001

  • K. Atasu, L. Pozzi, P. Ienne: “Automatic application-specific instruction-set extensions under microarchitectural constraints”, DAC, 2003

  • F. Sun, S. Ravi, A. Raghunathan, N.K. Jha: “A scalable application-specific processor synthesis methodology”, ICCAD, 2003

  • N. Cheung, S. Parameswaran, J. Henkel: “A quantitative study and estimation models for extensible instructions in embedded processors”, ICCAD, 2004


Problem: Various Hot-Spots


Related Work: Reconfigurable Computing

  • K. Compton, S. Hauck: “Reconfigurable computing: a survey of systems and software”, ACM Computing Surveys, 2002

  • F. Barat, R. Lauwereins: “Reconfigurable instruction set processors: a survey”, RSP, 2000

  • R.D. Wittig, P. Chow: “OneChip: an FPGA processor with reconfigurable logic”, IEEE Symp. FCCM, 1996

  • S. Vassiliadis, S. Wong, G. Gaydadjiev, K. Bertels, G. Kuzmanov, E.M. Panainte: “The MOLEN polymorphic processor”, IEEE Transactions on Computers, 2004


Dynamic System Behavior

  • Extensible Processor: choosing points in the design space at design time

  • Reconfigurable Computing: typically fixes at compile time when and how to deploy reconfigurable hardware

  • How to handle situations that are unknown at design and compile time?

    • (while still supporting various extensible instructions)

  • Depending on input data (e.g. different computational paths in a video encoder)

  • Which tasks/applications will be executed together?


Our New Concept: Basic Idea and Overview

  • At design time: fix the amount of reconfigurable hardware

  • At compile time: compose Special Instructions (SIs) out of highly re-usable data paths

  • At run time: dynamically determine the implementation of an SI

  • Altogether: Rotate the Instruction Set


Fundamental Idea: Atom / Molecule Model

[Figure: one example Atom and two example Molecules]

  • Multiple implementations per SI (Molecules)

  • Each Molecule is composed out of Atoms

    • Implementation hierarchy

    • Atoms are more reusable

    • Molecules are more specific

  • Advantage: Enables dynamic trade-off

  • Drawback: Higher design effort

  • Key:

    • Atom: elementary data path (smaller granularity)

    • Molecule: combination of Atoms (bigger granularity)

    • Special Instruction (SI): application-specific assembly instruction


Formal Atom / Molecule Model: Example

  • Molecule relations are needed, e.g., when Molecules comprise each other

    • In such cases we can first configure the smallest possible Molecule with the required functionality and then upgrade to faster implementations

[Figure: Molecules plotted in a grid with axes “# Atoms A1” and “# Atoms A2” (in general: n-dimensional); labeled points (1,4) and (3,5). Legend: Molecule; relation “is greater than or equal to”; Infimum of the Molecules; Supremum of the Molecules]


Formal Atom / Molecule Model: Details

  • Main data structure: Set of all Molecules

  • Meta-Molecule to implement two Molecules, such that they can be executed consecutively, i.e. temporal domain (Abelian Group)

  • Meta-Molecule for the common Atoms (indicator for compatibility)

  • Relation (Complete Lattice), with

    • Supremum: Meta-Molecule that is needed to implement all Molecules

    • Infimum: Meta-Molecule that is collectively needed for all Molecules
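
The formulas for these operators did not survive in the transcript. The following is a minimal Python sketch under stated assumptions: a Molecule is modeled as a vector of Atom counts (as in the (1,4) and (3,5) example), the Meta-Molecule that implements two Molecules is assumed to be the component-wise maximum, and the Meta-Molecule of common Atoms is assumed to be the component-wise minimum. All function names and the third Molecule (2,2) are hypothetical.

```python
from functools import reduce
from typing import Iterable, Tuple

# Assumption: entry i of a Molecule = number of instances of Atom type i.
Molecule = Tuple[int, ...]

def union(m: Molecule, o: Molecule) -> Molecule:
    """Meta-Molecule able to run m and o one after the other (assumed: component-wise max)."""
    return tuple(max(a, b) for a, b in zip(m, o))

def intersection(m: Molecule, o: Molecule) -> Molecule:
    """Meta-Molecule of the Atoms common to m and o (assumed: component-wise min)."""
    return tuple(min(a, b) for a, b in zip(m, o))

def leq(m: Molecule, o: Molecule) -> bool:
    """Relation 'o is greater than or equal to m' in every Atom type."""
    return all(a <= b for a, b in zip(m, o))

def supremum(molecules: Iterable[Molecule]) -> Molecule:
    """Meta-Molecule needed to implement all given Molecules."""
    return reduce(union, molecules)

def infimum(molecules: Iterable[Molecule]) -> Molecule:
    """Meta-Molecule collectively needed by all given Molecules."""
    return reduce(intersection, molecules)

molecules = [(1, 4), (3, 5), (2, 2)]
print(supremum(molecules))   # (3, 5)
print(infimum(molecules))    # (1, 2)
print(leq((1, 4), (3, 5)))   # True
print(leq((2, 2), (1, 4)))   # False: the relation is only a partial order
```

Under these assumptions, two Molecules such as (2,2) and (1,4) can be incomparable, which is why supremum and infimum are defined on the whole set instead of picking a largest or smallest member.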


Formal Atom / Molecule Model: Details

  • Determinant: number of Atoms needed to implement a Molecule

  • Upgrading: Atoms that are additionally needed to implement a Molecule o, assuming a Molecule m is already available
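
As a worked illustration of these two definitions, here is a small Python sketch. It assumes a Molecule is a vector of Atom counts, that the determinant is the sum of its entries, and that upgrading subtracts the available counts and clamps at zero; the function names are hypothetical and the slide's own formulas are not available in the transcript.

```python
from typing import Tuple

Molecule = Tuple[int, ...]  # assumption: entry i = instances of Atom type i

def determinant(m: Molecule) -> int:
    """Total number of Atoms needed to implement m (assumed: sum of entries)."""
    return sum(m)

def upgrade(m: Molecule, o: Molecule) -> Molecule:
    """Atoms still missing to implement o when m is already available.

    Assumed definition: per Atom type, o_i - m_i, but never negative.
    """
    return tuple(max(b - a, 0) for a, b in zip(m, o))

available = (1, 4)   # Molecule already configured
target = (3, 5)      # faster Molecule we want to upgrade to
missing = upgrade(available, target)
print(missing)                # (2, 1)
print(determinant(missing))   # 3 Atoms still have to be loaded
```

The determinant of the upgrade gives the number of Atom loads that separate the currently available Molecule from the faster one.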


Instruction Set Rotation Time

For our examples:

0.84 – 0.95 ms

  • Loading time depends on:

    • Atom size

    • Reconfiguration bandwidth

Execution and Reconfiguration times for SATD_4x4 for 1 frame:

  • Altogether: Hardware has to be available when needed, so start loading early
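
The dependency of loading time on Atom size and reconfiguration bandwidth can be sketched as simple arithmetic. The bitstream size, bandwidth, and deadline below are made-up placeholder values, not measurements from the presentation, and the sketch assumes that Atoms are loaded one after the other.

```python
def atom_load_time_ms(bitstream_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Time to load one Atom: configuration data size divided by bandwidth."""
    return bitstream_bytes * 1000.0 / bandwidth_bytes_per_s

def latest_start_ms(needed_at_ms: float, atoms_to_load: int, load_time_ms: float) -> float:
    """Latest point in time at which loading must begin.

    Assumption: Atoms are loaded sequentially, so load times add up.
    """
    return needed_at_ms - atoms_to_load * load_time_ms

# Hypothetical values: 50,000-byte Atom bitstream, 50 MB/s reconfiguration bandwidth.
per_atom = atom_load_time_ms(50_000, 50_000_000)
print(per_atom)                            # 1.0 (ms per Atom)
print(latest_start_ms(10.0, 3, per_atom))  # 7.0 (ms)
```

With these placeholder numbers, a Molecule that needs three more Atoms at t = 10 ms has to start loading no later than t = 7 ms, which is the reason for forecasting SI usage ahead of time.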


SI Forecasting: Example

  • Control-flow graph

    • Each node is a Base Block (BB)

  • At compile time:

    • Determine points to forecast an SI

    • Add Forecast Instructions with forecast values (about the SI importance) to these points

  • At run time:

    • Use the Forecasts to determine the Instruction Set rotation

    • Dynamically update the importance of the forecasted SIs

[Figure: control-flow graph annotated with the Forecast Instruction “forecast SATD_4x4, 42”, the executions of SATD_4x4, the time for Instruction Set rotation, and the return from subroutine]


Inserting Forecast Points (FCs): General Idea of Algorithm

  • I. Pre-computations from profiling data for each Special Instruction (SI)

  • II. For every SI determine Forecast Candidates

  • III. Optimize list of FC-Candidates and select final forecasts


I. Pre-Computations

  • Pre-computations are done on the control-flow graph using profiling information

  • Temporal Distance from Base Block to SI execution

  • Probability that the SI executions are reached

  • Number of executions of this SI (if it is executed)
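
A minimal sketch of how two of these values could be derived from a profiled control-flow graph. It is an illustration only and not the authors' algorithm: the graph, edge probabilities, and cycle counts are hypothetical, the graph is assumed to be acyclic (loops would need extra handling), and the temporal distance is computed as an expected value over the paths that actually reach the SI. The third value, the number of SI executions, would be read directly from the profile.

```python
from functools import lru_cache

# Hypothetical profiled control-flow graph: block -> [(successor, edge probability)]
CFG = {
    "A": [("B", 0.8), ("C", 0.2)],
    "B": [("D", 1.0)],
    "C": [],   # leaves the region without executing the SI
    "D": [],   # Base Block that contains the SI
}
CYCLES = {"A": 100, "B": 400, "C": 50, "D": 10}  # hypothetical cycles per block
SI_BLOCKS = {"D"}

@lru_cache(maxsize=None)
def reach_probability(block: str) -> float:
    """Probability that an SI execution is reached when starting in `block`."""
    if block in SI_BLOCKS:
        return 1.0
    return sum(p * reach_probability(succ) for succ, p in CFG[block])

@lru_cache(maxsize=None)
def temporal_distance(block: str) -> float:
    """Expected cycles from entering `block` until the SI block, given it is reached."""
    if block in SI_BLOCKS:
        return 0.0
    reach = reach_probability(block)
    if reach == 0.0:
        return float("inf")  # the SI can never be reached from here
    weighted = sum(
        p * reach_probability(succ) * temporal_distance(succ)
        for succ, p in CFG[block]
        if reach_probability(succ) > 0.0
    )
    return CYCLES[block] + weighted / reach

for bb in CFG:
    print(bb, round(reach_probability(bb), 3), round(temporal_distance(bb), 1))
```

In this toy graph, block A reaches the SI with probability 0.8 at an expected distance of about 500 cycles, while block C never reaches it.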


II. Forecast Decision Function (FDF)


III. Optimize list of FC Candidates

General Idea:

While the forecasted SIs in a Base Block consume too much area: remove the forecast with the worst ratio of Achieved Speedup / Exclusively used Atoms
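
The general idea above can be sketched as a greedy loop. This is a simplified illustration under assumptions: each SI is represented by the set of Atom names it needs (instead of Atom counts), "area" is the number of distinct Atoms, and the removal criterion is the speedup per exclusively used Atom. The Atom names, speedup values, and the function name are hypothetical.

```python
from typing import Dict, Set

def prune_forecasts(
    atoms_of: Dict[str, Set[str]],
    speedup: Dict[str, float],
    capacity: int,
) -> Dict[str, Set[str]]:
    """Drop forecasts until the remaining SIs fit into `capacity` Atoms."""
    kept = dict(atoms_of)
    while kept and len(set().union(*kept.values())) > capacity:
        def score(si: str) -> float:
            # Atoms that only this SI needs; removing the SI frees exactly these.
            others = set().union(*(a for s, a in kept.items() if s != si))
            exclusive = kept[si] - others
            # An SI without exclusive Atoms frees no area, so rank it as best.
            return speedup[si] / len(exclusive) if exclusive else float("inf")

        worst = min(kept, key=score)
        del kept[worst]
    return kept

# Hypothetical forecasts for one Base Block.
atoms_of = {
    "SATD_4x4": {"A1", "A2", "A3"},
    "DCT_4x4": {"A1", "A4"},
    "HT_4x4": {"A1", "A5", "A6"},
}
speedup = {"SATD_4x4": 12.0, "DCT_4x4": 4.0, "HT_4x4": 2.0}

print(sorted(prune_forecasts(atoms_of, speedup, capacity=4)))
# ['DCT_4x4', 'SATD_4x4']
```

Here HT_4x4 has the lowest speedup per exclusive Atom (2.0 / 2) and is removed first, after which the remaining four Atoms fit.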


Main Tasks of the Run-Time Architecture

  • a) Monitoring Forecasts and Special Instructions:

    • Fine-tune the forecasted importance to reflect the varying run-time situation

  • b) Selecting Molecules to implement SIs:

    • Dynamically choose an SI implementation that matches the current needs of the application

  • c) Realize the taken decisions:

    • Determine a loading sequence for the Atoms & control the SI execution
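
Task b) can be illustrated with a small selection routine. This is a sketch under assumptions, not the presented run-time system: Molecules are vectors of Atom counts, each Molecule has a known latency, and the selection simply takes the fastest Molecule that fits. The latency table and function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Molecule = Tuple[int, ...]  # assumption: entry i = instances of Atom type i

def select_molecule(latency: Dict[Molecule, int], containers: int) -> Optional[Molecule]:
    """Fastest Molecule whose total Atom count fits into the free Atom Containers."""
    candidates = [m for m in latency if sum(m) <= containers]
    return min(candidates, key=latency.get) if candidates else None

def fastest_available(latency: Dict[Molecule, int], loaded: Molecule) -> Optional[Molecule]:
    """Fastest Molecule that can already execute with the Atoms loaded so far.

    Returns None when no Molecule is supported yet (fall back to software).
    """
    usable = [m for m in latency if all(a <= b for a, b in zip(m, loaded))]
    return min(usable, key=latency.get) if usable else None

# Hypothetical Molecules of one SI with their latencies in cycles.
latency = {(1, 1): 24, (2, 1): 18, (2, 2): 12, (4, 4): 8}

print(select_molecule(latency, containers=5))    # (2, 2)
print(fastest_available(latency, loaded=(2, 1)))  # (2, 1)
print(fastest_available(latency, loaded=(0, 1)))  # None
```

While the Atoms of the selected Molecule are still being loaded, `fastest_available` models the step-wise upgrade: the SI can already run on a smaller Molecule and switches to faster ones as more Atoms arrive.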


Run-Time Architecture Example

  • Two tasks run alternately, sharing the available Atom Containers

  • Only one task may determine the content of an Atom Container, but both can use them

  • [SASO’07]: “A Self-Adaptive Extensible Embedded Processor” (IEEE International Conference on Self-Adaptive and Self-Organizing Systems, Boston, July 9-11)


Results & Evaluation: Flow of Test Application

  • Core part of the Encoding Engine of ITU-T H.264

  • Special Instructions (# executions per MacroBlock):

    • SATD_4x4 (256)

    • DCT_4x4 (16)

    • HT_4x4 (1)

  • Focus: Proof of concept, not automatic SI detection


Designing an Atom for the Three Transform Operations

  • Consider constraints

    • Max size of data path

    • Number of I/O signals

    • Number of control signals

  • Increase re-usability

    • Combine similar data paths (MUX)


Composing Molecules for SATD_4x4

Increased re-usability


Performance vs. Area Trade-off

[Figure: performance plotted against area requirements (# loaded Atoms)]


Hardware Feasibility Study

  • Xilinx Virtex-II 3000 (xc2v3000-6ff1152)

  • Board: Xilinx HW-AFX-FF1152-200

  • Floor-planning with PlanAhead


Special Instruction Execution Time for Different Resources


Application Execution Time


Time matters!

  • Design Time:

    • Fix the available reconfigurable hardware resources

  • Compile Time:

    • Determine Special Instructions

    • Determine composition out of Atoms / Molecules

    • Profile the application

    • Add Forecast Points to the application

  • Run Time:

    • Dynamically update the forecasted importance of the SIs

    • Choose Molecule implementation for SIs

  • The art is to find the right trade-off between design-/compile-time and run-time


Summary & Conclusion

  • Hierarchical Special Instruction (SI) composition

    • Atom / Molecule model

    • Use resources more efficiently

    • Offer multiple SI implementations

  • Forecasting SI usages at compile time

    • Pre-computations from profiling and graph analysis

    • Forecast Decision Function

  • Push more decisions to run time

    • Which SI implementation (dynamic trade-off)

    • Adapting to run-time situation

  • There is a large potential for improving the way current Extensible Processors work


Thank you for your attention!

RISPP: Rotating Instruction Set Processing Platform

Lars Bauer, Muhammad Shafique, Simon Kramer and Jörg Henkel

Chair for Embedded Systems (CES)

University of Karlsruhe

http://ces.univ-karlsruhe.de

Lars Bauer, CES, University of Karlsruhe, DAC 2007


Atom-Container Interconnections


III. Select final Forecasts

  • Optimization goals

    • As few FCs as possible (smaller code size, fewer executed cycles), as many as needed (provide all necessary information to the run-time system)

    • Choose FCs with a good trade-off between ‘sufficiently early’ and a ‘high execution probability’

  • For each SI, start Depth-First Searches on the FC Candidates on the transposed Base Block graph (i.e. all edges reversed)

[Figure: Base Block graph; green BBs are FC-Candidates, with the Final Forecast marked]
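
The graph operation named above, a depth-first search on the transposed Base Block graph, can be sketched as follows. The slide does not state what the search does with the visited blocks; purely as an illustration, this sketch collects the other FC Candidates that lie earlier in the control flow than a given candidate. The graph, the candidate set, and the function names are hypothetical.

```python
from typing import Dict, List, Set

def transpose(graph: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return the graph with all edges reversed."""
    reversed_graph: Dict[str, List[str]] = {node: [] for node in graph}
    for node, successors in graph.items():
        for succ in successors:
            reversed_graph.setdefault(succ, []).append(node)
    return reversed_graph

def earlier_candidates(graph: Dict[str, List[str]], candidates: Set[str], start: str) -> Set[str]:
    """Iterative DFS from `start` along reversed edges, collecting FC Candidates."""
    reversed_graph = transpose(graph)
    seen = {start}
    stack = [start]
    found: Set[str] = set()
    while stack:
        node = stack.pop()
        for pred in reversed_graph.get(node, []):
            if pred not in seen:
                seen.add(pred)
                if pred in candidates:
                    found.add(pred)
                stack.append(pred)
    return found

# Hypothetical Base Block graph: A branches to B and C, both continue to D.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
candidates = {"A", "B", "D"}

print(sorted(earlier_candidates(graph, candidates, "D")))  # ['A', 'B']
print(sorted(earlier_candidates(graph, candidates, "B")))  # ['A']
```

Searching backwards from a candidate makes it possible to compare it with candidates that would issue the forecast earlier, which is one way to weigh 'sufficiently early' against 'high execution probability'.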


RISPP Area Savings


II. FDF Details

  • Explanation and Parameter Description:

    • T: Time (Rot: for Rotation; SW: for SW Execution)

    • p: Probability

    • E: Energy

    • α: Parameter for Energy vs. Speedup fine-tuning

