Platform design
Platform design

Platform Design

ASIP

Application Specific Instruction-set Processor

TU/e 5kk70

Henk Corporaal Bart Mesman

Platform Design H.Corporaal and B. Mesman


Application domain specific processors (ADSP or ASIP)

[Figure: processor types placed along a flexibility versus efficiency trade-off. Labels: DSP, programmable CPU, programmable DSP, application domain specific, application specific processor.]


Application domain specific processors (ADSP or ASIP)

[Figure: application domain versus implementation, for a GP processor and for an ADSP.]

  • takes a well-defined application domain as a starting point

    • exploits characteristics of the domain (computation kernels)

    • still programmable within the domain

  • e.g. MPEG2 coding (uses an 8*8 DCT transform), DECT, GSM, etc.

             GP (general purpose)        ADSP
performance  clock speed + ILP           ILP + tuning to domain
             flexible dev. (new apps.)   cost effective (high volume)
problems     manual design,              specification,
             large effort                design time and effort
                                         => synthesized cores


www.adelantetech.com


Outline

  • design process

  • retargetable code generation (problem statement)

  • ADSP/VLIW architectures (Mistral 2 /A|RT designer)

  • low power aspects (Mistral 2 /A|RT designer)

  • discussion

  • conclusion


Design process

3 phases

1. exploration

2. hw design (layout) + processing

3. design appl. sw

[Figure: exploration loop. A processor model (e.g. VLIW with shared RFs) and the application(s) determine the instance parameters. SW (code generation) yields estimations of cycles/alg and occupation; HW design yields estimations of nsec/cycle, area and power/instr. The decisions "OK?" and "more appl.?" each have a yes and a no branch; the exit is "go to phase 2". Annotation: fast, accurate and early feedback.]
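As a minimal sketch of how the two kinds of estimation can be combined during exploration: run time per algorithm is cycles/alg (from code generation) times nsec/cycle (from HW design). The candidate names, the numbers and the selection rule (smallest area that meets a deadline) are illustrative assumptions, not part of the original flow.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """One processor instance with its estimates (all values hypothetical)."""
    name: str
    cycles_per_alg: int   # SW-side estimate, from code generation
    ns_per_cycle: float   # HW-side estimate
    area: float           # HW-side estimate, arbitrary units

    def exec_time_ns(self) -> float:
        # Run time of one algorithm execution.
        return self.cycles_per_alg * self.ns_per_cycle


def explore(candidates: Iterable[Candidate], deadline_ns: float) -> Optional[Candidate]:
    """Return the smallest-area candidate that meets the deadline, if any."""
    feasible = [c for c in candidates if c.exec_time_ns() <= deadline_ns]
    return min(feasible, key=lambda c: c.area) if feasible else None


candidates = [
    Candidate("1 ALU", 14000, 10.0, 10000.0),
    Candidate("2 ALUs", 10000, 10.0, 15000.0),
    Candidate("ALU + ASU", 2000, 12.0, 6000.0),
]
best = explore(candidates, deadline_ns=110_000)
print(best.name if best else "no feasible instance")
```

If no candidate meets the deadline, `explore` returns `None`, which corresponds to taking the "no" branch and changing the instance parameters.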


Problem statement

A compiler is retargetable if it can generate code for a ‘new’ processor architecture specified in a machine description file.

A guarded register transfer pattern (GRTP) is a register transfer pattern (RTP) together with the control bits of the instruction word that control the RTP.

a := b + c | instr = xxxx0101

GRTPs contain all inter-RT-conflict information.

Instruction set extraction (ISE) is the process of generating all possible GRTPs for a specific processor.
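A small sketch of why the control bits carry the conflict information: two register transfers can share one instruction word only if their control-bit patterns agree wherever both are fixed. The sketch assumes that `x` marks a don't-care bit; the function name and the second and third patterns are hypothetical.

```python
from typing import Optional


def merge_control(a: str, b: str) -> Optional[str]:
    """Merge two control-bit patterns ('x' = don't care, an assumption).

    Returns the combined pattern, or None if the patterns conflict.
    """
    if len(a) != len(b):
        raise ValueError("patterns must have equal width")
    merged = []
    for bit_a, bit_b in zip(a, b):
        if bit_a == "x":
            merged.append(bit_b)
        elif bit_b == "x" or bit_a == bit_b:
            merged.append(bit_a)
        else:
            return None  # both fixed, different values: conflict
    return "".join(merged)


add_rt = ("a := b + c", "xxxx0101")    # pattern from the text
load_rt = ("r := mem[p]", "10xxxxxx")  # hypothetical
sub_rt = ("a := b - c", "xxxx0110")    # hypothetical

print(merge_control(add_rt[1], load_rt[1]))  # compatible
print(merge_control(add_rt[1], sub_rt[1]))   # conflicting
```

Here the add and the load use disjoint control bits and can be issued together, while the add and the subtract need different values in the same bits and cannot.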


Problem statement

[Figure: code generation flow. The algorithm spec passes through FE to a CDFG; the processor spec (instance) passes through ISE to GRTPs; code generation takes both and produces machine code. Annotation: in ch. 4 this is part of the code generator.]


Example: Simple processor [Leupers]

[Figure: datapath with Inp, RAM, REG, outp, PC (with +1) and instruction memory IM. The instruction word I.(20:0) is split into the fields I.(20:13), I.(12:5), I.(4), I.(3:2) and I.(1:0).]



ASIP/VLIW architectures

A|RT designer template as an example (= set of rules, a model)

  • Differences with the VLIW processors of ch. 4

  • 1. Parallel FUs

    • ASUs = complex application-specific FUs (beyond subword parallelism)

      • e.g. biquad, median, DCT, etc.

    • larger grain size, more heterogeneous, more pipelines

  • 2. Register files

    • many register files (>5 vs. 1 or 2)

    • limited number of ports (3 vs. 15)

    • limited size (<16 vs. 128)

  • 3. Issue slots

    • all in parallel vs. 5


[Figure: architecture template with register files RF1-RF8, functional units FU1-FU4, flags, IR1-IR4, instruction memory and control.]


ASIP/VLIW architectures

  • Additional characteristics of the A|RT designer template

  • interconnect network: busses + input multiplexers

    • mux control is part of the instruction

    • control can change every clock cycle

    • network can be incomplete

    • busses can be merged

  • memories are modeled as FUs

    • separate data in and data out

    • 2 inputs (data in and address) and 1 output

  • Each FU can generate one or more flags

  • instruction format (per issue slot)

[Figure: instruction fields: read address RF 1, write address RF 1, read address RF 2, write address RF 2, control FU, mux 1, mux 2, output drivers.]


ASIP/VLIW architectures: example

[Figure: datapath with register files RF1-RF4, an ALU and a MAC, connected through bus1 and bus2. The instruction word has bit positions 19, 10, 9 and 0 marked and consists of an ALU instr. part and a MAC instr. part; the labelled fields are mux 2, mux 3, read/write RF1, read/write RF2, read/write RF3 and read/write RF4.]



ASIP/VLIW architectures: design flow

[Figure: design flow. Algorithm spec, datapath synthesis, RTs, controller synthesis, then estimations of area, power and timing. The decision "OK?" has a yes branch and a no branch; the no branch leads to "change pragmas".]

Pragma examples shown:

assign ( a+b, ALU, fu_alu1)

assign ( a+_, ALU, fu_alu2)

assign ( _+_, ALU, fu_alu3)

RT example shown:

RF1 : x = RF2 : y, RF3 : z | ALU = ADD, Inmux = bus2

VLIW makes relatively simple code selection possible.


ASIP/VLIW architectures: list scheduling

[Figure: list scheduling example. Data-flow graphs of multiply (*) and add (+) operations with numeric annotations 0-10 illustrate successive scheduling steps. A flow diagram shows the candidate list, the conflict and priority computation, and the scheduled operation. Resources shown: IPB, OPB, MULT, ALU.]
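The list scheduling loop can be sketched as follows. This is a generic resource-constrained list scheduler, not the Mistral 2 / A|RT designer implementation: it assumes single-cycle operations, an acyclic dependence graph, and longest-path-to-sink as the priority. The function name, the operation names and the unit counts are hypothetical.

```python
from functools import lru_cache
from typing import Dict, List


def list_schedule(ops: Dict[str, str],
                  preds: Dict[str, List[str]],
                  units: Dict[str, int]) -> Dict[str, int]:
    """Map each operation to a start cycle.

    ops:   operation name -> unit type it needs (e.g. "MULT", "ALU")
    preds: operation name -> operations it depends on
    units: unit type -> number of available instances (must be >= 1)
    """
    succs: Dict[str, List[str]] = {op: [] for op in ops}
    for op, ps in preds.items():
        for p in ps:
            succs[p].append(op)

    @lru_cache(maxsize=None)
    def priority(op: str) -> int:
        # Length of the longest path from op to a sink of the graph.
        return 1 + max((priority(s) for s in succs[op]), default=0)

    start: Dict[str, int] = {}
    cycle = 0
    while len(start) < len(ops):
        free = dict(units)
        # Candidate list: unscheduled ops whose predecessors finished earlier.
        candidates = [op for op in ops if op not in start and
                      all(p in start and start[p] < cycle
                          for p in preds.get(op, ()))]
        # Highest priority first; the name breaks ties deterministically.
        for op in sorted(candidates, key=lambda o: (-priority(o), o)):
            if free[ops[op]] > 0:   # conflict check on the unit type
                free[ops[op]] -= 1
                start[op] = cycle
        cycle += 1
    return start


# y = a*b + c*d + e*f on one multiplier and one ALU
ops = {"m1": "MULT", "m2": "MULT", "m3": "MULT", "a1": "ALU", "a2": "ALU"}
preds = {"a1": ["m1", "m2"], "a2": ["a1", "m3"]}
schedule = list_schedule(ops, preds, {"MULT": 1, "ALU": 1})
print(schedule)
```

With a single multiplier the three multiplications are serialized, and the first addition overlaps with the last multiplication, so the whole expression takes four cycles.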


ASIP/VLIW architectures: feedback



Low power aspects

  • Estimation

[Figure: blocks labelled Implementation Independent Design Database, Mistral2, Estimation Database and Architecture; estimated quantities: area, speed, power.]


GSM viterbi decoder : default solution

EXU         ACTIV   AREA    POWER
alu_1       96%     3469    46196
romctrl_1   48%     392     59
acu_1       26%     327     1209
ipb_1       5%      131     105
opb_1       23%     1804    5801
ctrl                9821    135035
total               15591   188605

cycle count: 13750

  • controller responsible for 70% of power consumption

    • maximum resource-sharing

    • heavy decision-making: “main” loop with 16 metrics-computations per iteration

  • EXU-numbers include registers for local storage


GSM viterbi decoder : no loop-folding

EXU         ACTIV   AREA    POWER
alu_1       92%     3411    45073
romctrl_1   45%     392     55
acu_1       25%     294     1087
ipb_1       5%      107     86
opb_1       22%     1661    5340
ctrl                4919    70087
total               10431   121928

cycle count: 14247

  • area down by 33%

  • power down by 35%

  • next step: reduce # of program-steps with second ALU


GSM viterbi decoder : 2 ALUs

EXU         ACTIV   AREA    POWER
alu_1       69%     1797    12248
alu_2       65%     1393    8916
romctrl_1   67%     392     55
acu_1       37%     294     1087
ipb_1       8%      149     119
opb_1       33%     2136    6871
ctrl                8957    87235
total               14766   116731

cycle count: 9739

  • cycle count down 30%

  • area up 42%

  • power down by 5%

  • next step: introduce ASU to reduce ALU-load


GSM viterbi decoder : 1 x ACS-ASU

func ACS ( M1, M2, d ) MS, MS8 =
begin
  MS = if ( M1+d > M2-d ) -> ( M1+d) || ( M2-d) fi;
  MS8 = if ( M1- d > M2+d) -> ( M1- d) || ( M2+d) fi;
end;
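For readers unfamiliar with the notation, the ACS (add-compare-select) function can be mirrored in Python. This is only a behavioural sketch of the function as written on the slide; the Python name `acs` and the sample values are illustrative.

```python
from typing import Tuple


def acs(m1: int, m2: int, d: int) -> Tuple[int, int]:
    """Add-compare-select: return (MS, MS8) as defined by func ACS."""
    # MS: add the branch metric to M1, subtract it from M2, keep the larger.
    ms = m1 + d if m1 + d > m2 - d else m2 - d
    # MS8: the same comparison with the sign of d swapped.
    ms8 = m1 - d if m1 - d > m2 + d else m2 + d
    return ms, ms8


print(acs(10, 12, 3))
```

Each output is simply the maximum of two candidate path metrics, which is why a dedicated ASU can replace several ALU operations and the decision-making around them.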

EXU         ACTIV   AREA    POWER
alu_1       20%     261     105
acs_asu_1   83%     2382    3816
or_asu_1    10%     611     122
romctrl_1   16%     65      21
acu_1       36%     294     205
ipb_1       20%     107     43
opb_1       11%     163     35
ctrl                1864    3597
total               5747    7944

cycle count: 1930

  • cycle count down 5X

  • power down 20X!


GSM viterbi decoder : 4 x ACS-ASU

EXU          ACTIV   AREA    POWER
alu_1        94%     243     97
acs_asu_1    95%     1041    420
acs_asu_2    95%     1041    420
acs_asu_3    95%     1041    420
acs_asu_4    95%     1041    420
split_asu_1  47%     90      18
or_asu_1     47%     592     118
romctrl_1    28%     48      6
acu_1        98%     212     85
ipb_1        23%     60      6
opb_1        50%     369     80
ctrl                 1306    555
total                7084    2645

cycle count: 425

  • cycle count down another 5X

  • area up 23%

  • power down another 3X!


GSM viterbi example : summary

[Figure: summary of the GSM viterbi example, with blocks labelled Implementation Independent Design Database and Mistral2. Annotation: 72x!]
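The step-by-step percentages can be rechecked from the table totals. The sketch below copies the totals from the five tables and reads the bare number on each slide (13750, 14247, 9739, 1930, 425) as a cycle count, which is an interpretation; the helper name is hypothetical.

```python
# (name, total area, total power, cycle count) per design step
steps = [
    ("default", 15591, 188605, 13750),
    ("no loop-folding", 10431, 121928, 14247),
    ("2 ALUs", 14766, 116731, 9739),
    ("1 x ACS-ASU", 5747, 7944, 1930),
    ("4 x ACS-ASU", 7084, 2645, 425),
]


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# Change of each step relative to the previous one.
for (_, a0, p0, c0), (name, a1, p1, c1) in zip(steps, steps[1:]):
    print(f"{name:16s} area {percent_change(a0, a1):+7.1f}%  "
          f"power {percent_change(p0, p1):+7.1f}%  "
          f"cycles {percent_change(c0, c1):+7.1f}%")

# Overall gain from the default solution to the 4 x ACS-ASU solution.
overall_power_gain = steps[0][2] / steps[-1][2]
overall_cycle_gain = steps[0][3] / steps[-1][3]
print(f"power: {overall_power_gain:.1f}x, cycles: {overall_cycle_gain:.1f}x")
```

On these totals the overall power ratio comes out at roughly 71x, close to the 72x quoted on the summary slide, assuming that figure refers to power.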


Discussion: phase 3

[Figure: two-stage flow. Exploration phase: processor model and application(s) feed SW (code generation) and HW design, with the decisions "OK?" and "more appl.?". The processor model is then frozen. Application software development: the application(s) go through SW (code generation) with its own "OK?" decision, i.e. constraint driven compilation.]


Discussion: problems with VLIWs

code size and instruction bandwidth

  • code compaction = reduce code size after scheduling

    • possible compaction ratio?

    • e.g. p0 = 0.9 and p1 = 0.1

    • information content (entropy) = - Σ pi log2 pi = 0.47 bits

      • maximum compression factor ≈ 2

  • control parallelism during scheduling = switch between different processor models (10% of code = 90% runtime)

  • architecture

    • reduce number of control bits for operand addresses

    • e.g. 128 reg (TM) -> 28 bits/issue slot for addresses only

    • => use stacks and fifos
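The two numbers in this list can be reproduced with a few lines of Python. The entropy is computed for a binary source with p0 = 0.9 and p1 = 0.1; the address-bit count assumes four register address fields per issue slot, which is an assumption chosen because it matches the 28 bits quoted. The function name is hypothetical.

```python
import math
from typing import Iterable


def entropy_bits(probs: Iterable[float]) -> float:
    """Shannon entropy in bits: -sum(p_i * log2(p_i))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


h = entropy_bits([0.9, 0.1])
max_compression = 1.0 / h  # each stored bit carries only h bits of information
print(f"entropy = {h:.2f} bits, maximum compression factor = {max_compression:.1f}")

# Operand address bits per issue slot for a 128-entry register file.
registers = 128
address_fields = 4  # assumption: four register addresses per issue slot
bits_per_address = math.ceil(math.log2(registers))
address_bits = address_fields * bits_per_address
print(f"{bits_per_address} bits per address, {address_bits} bits per issue slot")
```

Smaller register files, stacks or FIFOs shorten or remove the address fields, which is the point of the last bullet.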

[Figure: architecture with register files RF1-RF4, functional units FU1-FU4, flags, IR1-IR4, instruction memory and control.]


Conclusions

  • ASIPs provide efficient solutions for well-defined application domains (2 orders of magnitude higher efficiency).

  • The methodology is interesting for IP creation.

  • The key problem is retargetable compilation.

  • A (distributed) VLIW model is a good compromise between HW and SW.

  • Although an automatic process can generate a default solution, the process usually is interactive and iterative for efficiency reasons. The key is fast and accurate feedback.