Exploring Custom Instruction Synthesis for


Exploring Custom Instruction Synthesis for Application-Specific Instruction Set Processors with Multiple Design Objectives. Lin, Hai Fei, Yunsi. ACM/IEEE International Symposium on Low-Power Electronics and Design (ISLPED), 2010. Date: 2010/05/20, 吳俊雄.

Presentation Transcript

Exploring Custom Instruction Synthesis for Application-Specific Instruction Set Processors with Multiple Design Objectives

Lin, Hai Fei, Yunsi

ACM/IEEE International Symposium on Low-Power Electronics and Design (ISLPED), 2010






  • Two Algorithms for Custom Instruction Synthesis

    • Mixed Integer Linear Programming

    • Simulated Annealing Method



  • Traditional custom instruction synthesis flows for ASIPs mainly target performance improvement.

  • We show that the existing custom instruction exploration algorithms

    • Mixed Integer Linear Programming (MILP)

    • Simulated Annealing Method

  • And cost estimation methods

    • Performance improvement

    • Energy efficiency

    • Area overhead


  • The work presented in this paper makes three major contributions:

    • We address the importance of energy and resource efficiency in ASIP design.

    • We discuss a set of key factors in custom instruction selection.

    • We show that traditional design space exploration algorithms are either infeasible or inefficient for estimating all the necessary factors.

  • Since the theoretical complexity of exploring the design space thoroughly is O(2^n), most practical techniques adopt heuristics to prune the design space during the search.
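The exponential growth can be illustrated with a minimal sketch, assuming every subset of the n operations in a DFG counts as a candidate custom instruction. The function name `count_candidate_subsets` is hypothetical and only serves this illustration.

```python
from itertools import combinations


def count_candidate_subsets(n_ops: int) -> int:
    """Count every subset of n_ops operations by explicit enumeration."""
    ops = range(n_ops)
    return sum(1 for size in range(n_ops + 1) for _ in combinations(ops, size))


if __name__ == "__main__":
    # The count doubles with each added operation, which is why
    # practical flows prune the search instead of enumerating.
    for n in (4, 8, 16):
        print(n, count_candidate_subsets(n))
```

Real flows discard many of these subsets early (for example, those that violate the I/O constraint), but the worst-case count is what motivates heuristic pruning.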

  • We present a holistic ASIP synthesis and simulation flow that allows the optimization goal to be adjusted between energy efficiency, area overhead and performance.

Multi-Objective ASIP Design

  • There are two major energy factors:

    • Instruction fetch consumes a considerable portion of the total energy within a processor.

    • The data communication between operations is originally implemented through register file accesses within the base processor.

  • The dynamic energy consumption is affected by the reduction in the number of instructions and data register file accesses.


Custom processor 1 with CFU1 achieves better performance improvement, because it utilizes operation parallelism in the DFG to reduce the total execution cycles.

Custom processor 2 with CFU2 achieves larger energy saving, because it realizes a sub-graph covering more operations and data transfer edges.


We show that generating custom instructions from a DFG can be viewed as solving an operation scheduling problem.

The scheduling scheme should ensure data dependency and that the input/output edges of each software stage satisfy the I/O constraint set by the register file ports.

For a scheduling scheme, the number of software stages with operations in them represents the number of instructions for the customized processor. The edges across different software stages represent register file accesses.
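The scheduling view above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the DFG is a list of (producer, consumer) edges, dependent operations may share a stage (they would be chained inside one custom instruction), external inputs and outputs of the DFG are not modelled, and the 4-input/2-output limit is one reading of a register file I/O constraint. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (producer operation, consumer operation)


def respects_dependencies(edges: List[Edge], stage: Dict[str, int]) -> bool:
    """A consumer may not sit in an earlier software stage than its producer."""
    return all(stage[src] <= stage[dst] for src, dst in edges)


def count_instructions(stage: Dict[str, int]) -> int:
    """Number of software stages that contain at least one operation."""
    return len(set(stage.values()))


def count_register_accesses(edges: List[Edge], stage: Dict[str, int]) -> int:
    """Number of DFG edges whose endpoints lie in different stages."""
    return sum(1 for src, dst in edges if stage[src] != stage[dst])


def satisfies_io(edges: List[Edge], stage: Dict[str, int],
                 max_in: int = 4, max_out: int = 2) -> bool:
    """Check the assumed per-stage input/output port limits."""
    for s in set(stage.values()):
        # Distinct values produced elsewhere and consumed in stage s.
        inputs = {src for src, dst in edges if stage[dst] == s and stage[src] != s}
        # Distinct values produced in stage s and consumed elsewhere.
        outputs = {src for src, dst in edges if stage[src] == s and stage[dst] != s}
        if len(inputs) > max_in or len(outputs) > max_out:
            return False
    return True


# Toy DFG: a and b feed c, and c feeds d.
edges = [("a", "c"), ("b", "c"), ("c", "d")]
stage = {"a": 0, "b": 0, "c": 0, "d": 1}

if __name__ == "__main__":
    print(respects_dependencies(edges, stage))    # True
    print(count_instructions(stage))              # 2
    print(count_register_accesses(edges, stage))  # 1
    print(satisfies_io(edges, stage))             # True
```

Placing a, b and c in one stage turns two of the three edges into internal wires, so only the c-to-d edge still goes through the register file.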


Two Algorithms for Custom Instruction Synthesis


  • Mixed Integer Linear Programming (MILP)

  • Primary Variable definition:

    i: index of the operations, l: index of software stages.

  • Parameter definition: hardware execution delay

    k is the index of operation types.


  • Assistant Variable definition: execution cycle delay

  • Constraints:

  • data dependency constraint

  • I/O
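The slide equations for these definitions did not survive in the transcript. As a rough sketch only, one common way to write such a scheduling MILP is shown below. Every symbol here is an assumption introduced for illustration and is not taken from the slides: x_{i,l} is a binary placement variable, L is the number of available software stages, E is the set of DFG edges, s_i is the stage index of operation i, and c_{ij} is a binary indicator that edge (i, j) crosses a stage boundary.

```latex
% Assumed notation; not the authors' formulation.
% x_{i,l} = 1 if operation i is placed in software stage l, 0 otherwise.
\begin{align}
  & x_{i,l} \in \{0,1\}, \qquad \sum_{l=1}^{L} x_{i,l} = 1 && \forall i \\
  & s_i = \sum_{l=1}^{L} l \, x_{i,l} && \forall i \\
  & s_i \le s_j && \forall (i,j) \in E \\
  & c_{ij} \ge x_{i,l} - x_{j,l} && \forall (i,j) \in E,\ \forall l
\end{align}
```

The first line places each operation in exactly one stage, the third line enforces data dependency, and the last line forces c_{ij} to 1 whenever the two endpoints of an edge are in different stages, so that the sum of c_{ij} counts cross-stage edges. An I/O constraint would additionally bound, for each stage, the number of values entering and leaving it.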





SN: the number of instructions.

SE: the total number of data accesses.

For multi-issue, out-of-order processors, the execution cycle count equals the longest execution path delay of the DFG.

The largest number of operations of type k among different software stages.

The number of functional modules (operators) of type k needed in the final custom hardware extension.


The unit hardware area of functional module type k.

Cost terms: energy consumption, area overhead, execution cycles.

The advantage of applying MILP to solve the scheduling problem is that, theoretically, it can find the optimum solution to the problem with sufficient searching time.
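How the area term and a combined objective might be evaluated for a given schedule can be sketched as follows. This is an assumption-laden illustration: it takes the number of type-k modules to be the largest number of type-k operations in any single stage, and it combines the three cost terms as a plain weighted sum. The slides list the three terms and their weights but the exact cost equation is not preserved, so the function names, the weight names `w_energy`, `w_area`, `w_cycles`, and the unit areas in the example are all hypothetical.

```python
from collections import Counter
from typing import Dict


def module_counts(stage: Dict[str, int], op_type: Dict[str, str]) -> Dict[str, int]:
    """For each operation type, the largest count found in any single stage."""
    per_stage = Counter((stage[op], op_type[op]) for op in stage)
    need: Dict[str, int] = {}
    for (_, kind), count in per_stage.items():
        need[kind] = max(need.get(kind, 0), count)
    return need


def area_overhead(need: Dict[str, int], unit_area: Dict[str, float]) -> float:
    """Sum of unit area times module count over all operation types."""
    return sum(unit_area[kind] * count for kind, count in need.items())


def weighted_cost(energy: float, area: float, cycles: float,
                  w_energy: float, w_area: float, w_cycles: float) -> float:
    """Assumed weighted-sum objective over the three cost terms."""
    return w_energy * energy + w_area * area + w_cycles * cycles


# Toy schedule: two adders are needed in stage 0, one multiplier in stage 1.
stage = {"a": 0, "b": 0, "c": 1, "d": 1}
op_type = {"a": "add", "b": "add", "c": "mul", "d": "add"}
unit_area = {"add": 1.0, "mul": 4.0}  # hypothetical unit areas

if __name__ == "__main__":
    need = module_counts(stage, op_type)
    print(need)
    print(area_overhead(need, unit_area))
```

Setting two of the three weights to zero reduces the objective to a single term, which is how a pure energy or pure performance run could be expressed in this sketch. In practice the terms would need normalizing before being mixed.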


Simulated Annealing Method

Solution Vector definition: OPv = {op1, op2, op3, ..., opn}

Solution variation mechanism:

In each iteration, we randomly select n operations and move them to a different software stage to generate a new solution.

n represents the maximum distance between the current solution and the one it evolves to. t is the current temperature, T is the starting temperature and N is the total number of operations.



The allowable range within which an operation can move is determined by the location of its parent and child nodes.

In our algorithm, the actual moving range for an operation is further tightened by the current temperature: range = R * sqr(t/T). We randomly move the operation to a software stage within this range.
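A minimal sketch of this move step follows, under several assumptions that are not confirmed by the slides: `sqr` is read as a square root, R is taken to be the width of the allowable range, the tightened range is centred on the operation's current stage, and the helper names are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple


def allowable_range(op: str, stage: Dict[str, int],
                    parents: Dict[str, List[str]],
                    children: Dict[str, List[str]],
                    num_stages: int) -> Tuple[int, int]:
    """Stages between the latest parent and the earliest child (inclusive)."""
    lo = max((stage[p] for p in parents.get(op, [])), default=0)
    hi = min((stage[c] for c in children.get(op, [])), default=num_stages - 1)
    return lo, hi


def propose_stage(op: str, stage: Dict[str, int],
                  parents: Dict[str, List[str]],
                  children: Dict[str, List[str]],
                  num_stages: int, t: float, T: float,
                  rng=random) -> int:
    """Pick a stage for op within the temperature-tightened range."""
    lo, hi = allowable_range(op, stage, parents, children, num_stages)
    width = hi - lo                               # assumed meaning of R
    reach = math.floor(width * math.sqrt(t / T))  # assumes sqr = square root
    cur = stage[op]
    # Clamp the tightened window to the allowable range.
    # The draw may return the current stage, i.e. no move.
    return rng.randint(max(lo, cur - reach), min(hi, cur + reach))


# Toy chain p -> x -> c with six stages available.
stage = {"p": 0, "x": 2, "c": 5}
parents = {"x": ["p"], "c": ["x"]}
children = {"p": ["x"], "x": ["c"]}

if __name__ == "__main__":
    rng = random.Random(0)
    print(propose_stage("x", stage, parents, children, 6, t=1.0, T=1.0, rng=rng))
    print(propose_stage("x", stage, parents, children, 6, t=0.01, T=1.0, rng=rng))
```

At the starting temperature the operation can land anywhere between its parent and child, while at low temperature the window collapses toward the current stage, so late iterations make only small changes.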


Solution acceptance mechanism: A new solution is accepted when its cost is smaller than that of the current solution. When its cost is larger, it can still be accepted with a probability p.

The Simulated Annealing algorithm balances the trade-off between solution quality and search time.
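The expression for p is not preserved in the transcript. The sketch below uses the standard Metropolis criterion, p = exp(-(new cost - current cost) / t), purely as an assumption. It is the usual choice in simulated annealing, but it is not confirmed to be the authors' formula. The function name `accept` is hypothetical.

```python
import math
import random


def accept(new_cost: float, cur_cost: float, t: float, rng=random) -> bool:
    """Always accept improvements; accept worse solutions with probability p."""
    if new_cost < cur_cost:
        return True
    # Assumed Metropolis probability; requires t > 0.
    p = math.exp(-(new_cost - cur_cost) / t)
    return rng.random() < p


if __name__ == "__main__":
    rng = random.Random(1)
    # Fraction of accepted uphill moves at a high and a low temperature.
    for t in (10.0, 0.1):
        hits = sum(accept(1.5, 1.0, t, rng) for _ in range(10_000))
        print(t, hits / 10_000)
```

With this rule, worse solutions are accepted often at high temperature, which helps the search escape local minima, and rarely at low temperature, which lets it settle.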

Experimental Results

CPLEX is used to solve the MILP problem for design space exploration.

The baseline processor is an out-of-order MIPS-style processor.

The ratio between the weight variables g1 and g2 is set to 12.2 : 1.

The register file I/O constraints are set to 4/2.

We perform experiments for energy reduction and for performance improvement by setting the variables ε2 and ε3 to zero, and ε1 and ε2 to zero, respectively.


The average speedups are:

  • 1.42 for Binary Tree

  • 1.64 for MILP (p.)

  • 1.56 for MILP (e.)

The average energy consumption reductions are 18.1%, 22.7% and 29.8%.


The custom instruction templates presented in (b) and (c) target performance and energy efficiency, respectively. There are more operations in the templates identified for energy efficiency, shown in (c), and they include longer critical paths than the sub-graphs shown in (b).


ε3 = 0; ε1 = 1, ε2 = 0; ε1 = ε2 = 0.5

For different designs, the ratio between ε1 and ε2 can be varied to find the best trade-off between them.


The SA algorithm achieves an average performance speedup of 1.46, which is a little lower than that achieved by the MILP algorithm (1.64).