Platform Design

ASIP

Application Specific Instruction-set Processor

TU/e 5kk70

Henk Corporaal Bart Mesman

Platform Design H.Corporaal and B. Mesman

slide2
Application domain specific processors (ADSP or ASIP)

[Figure: spectrum of processor types (programmable CPU, programmable DSP, application domain specific processor, application specific processor), annotated with flexibility and efficiency.]

slide3
Application domain specific processors (ADSP or ASIP)

  • takes a well defined application domain as a starting point
    • exploits characteristics of the domain (computation kernels)
    • still programmable within the domain
  • e.g. MPEG2 coding uses 8*8 DCT transform, DECT, GSM etc ...

[Figure: mapping from application domain to implementation, for GP and for ADSP.]

GP vs. ADSP:
  • performance: GP: clock speed + ILP; ADSP: ILP + tuning to domain
  • GP: flexible dev. (new apps.); ADSP: cost effective (high volume)
  • problems: GP: manual design, large effort => synthesized cores; ADSP: specification, design time and effort

slide4
www.adelantetech.com


slide5
Outline
  • design process
  • retargetable code generation (problem statement)
  • ADSP/VLIW architectures (Mistral 2 /A|RT designer)
  • low power aspects (Mistral 2 /A|RT designer)
  • discussion
  • conclusion


slide6
Design process

3 phases
  • 1. exploration
  • 2. hw design (layout) + processing
  • 3. design appl. sw

[Figure: exploration loop. Inputs: processor model (e.g. VLIW with shared RFs), application(s), instance parameters. SW (code generation) gives estimations of cycles/alg and occupation; HW design gives estimations of nsec/cycle, area and power/instr. Decision points "OK?" and "more appl.?" (yes/no); the exit is "go to phase 2".]

Fast, accurate and early feedback
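The exploration phase can be pictured as a loop that combines the two kinds of estimates and tests them against constraints. The following is only a minimal sketch: the `Estimate` class, the `explore` function, the constraint names and all numbers are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """Hypothetical bundle of the estimates named on the slide."""
    cycles_per_alg: int      # from SW (code generation)
    ns_per_cycle: float      # from HW design
    area: float              # from HW design, arbitrary units
    power_per_instr: float   # from HW design, arbitrary units

    def exec_time_ns(self) -> float:
        # Execution time follows from cycles/alg times nsec/cycle.
        return self.cycles_per_alg * self.ns_per_cycle


def explore(candidates, max_time_ns, max_area):
    """Return the first candidate instance that meets both constraints, else None."""
    for name, est in candidates:
        if est.exec_time_ns() <= max_time_ns and est.area <= max_area:
            return name   # "OK?" answered yes: go to phase 2
    return None           # "OK?" answered no: change the instance parameters


# Invented example values, for illustration only.
candidates = [
    ("1 ALU", Estimate(10_000, 10.0, 100.0, 1.0)),
    ("2 ALUs", Estimate(6_000, 10.0, 140.0, 1.0)),
]
chosen = explore(candidates, max_time_ns=80_000, max_area=150.0)
print(chosen)
```

The point of the sketch is that each iteration needs both estimates before a decision can be made, which is why fast and early feedback matters.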

slide7
Problem statement

A compiler is retargetable if it can generate code for a ‘new’ processor architecture specified in a machine description file.

A guarded register transfer pattern (GRTP) is a register transfer pattern (RTP) together with the control bits of the instruction word that control the RTP.

a := b + c | instr = xxxx0101

GRTPs contain all inter-RT-conflict information.

Instruction set extraction (ISE) is the process of generating all possible GRTPs for a specific processor.
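One way to see why a GRTP carries conflict information is to model the guard as a bit pattern with don't-care positions. The sketch below assumes that two register transfers conflict when some instruction bit must be 0 for one and 1 for the other. The `GRTP` class, the `conflicts` function and the second and third patterns are hypothetical; only `a := b + c | instr = xxxx0101` comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GRTP:
    """A register transfer pattern plus the instruction bits that select it."""
    transfer: str   # e.g. "a := b + c"
    guard: str      # instruction-word pattern, 'x' = don't care


def conflicts(p: GRTP, q: GRTP) -> bool:
    """True if some instruction bit must be 0 for one pattern and 1 for the other."""
    if len(p.guard) != len(q.guard):
        raise ValueError("guards must cover the same instruction word")
    return any(a != b and "x" not in (a, b) for a, b in zip(p.guard, q.guard))


add = GRTP("a := b + c", "xxxx0101")   # pattern from the slide
sub = GRTP("a := b - c", "xxxx0110")   # hypothetical second pattern
load = GRTP("d := mem", "01xxxxxx")    # hypothetical third pattern

print(conflicts(add, sub))    # the two guards disagree on fixed bits
print(conflicts(add, load))   # the guards use disjoint bits
```

Under this assumption, a code generator can decide whether two transfers fit in the same instruction by comparing guards alone.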

slide8
Problem statement

[Figure: code generation flow. Algorithm spec -> FE -> CDFG; processor spec (instance) -> ISE -> GRTP; CDFG and GRTP feed code generation, which produces machine code. Annotation on the figure: "in ch 4 this is part of the code generator".]

slide9
Example: Simple processor [Leupers]

[Figure: datapath of a simple processor with Inp, RAM, REG, outp, PC with a +1 incrementer, and IM. The instruction word I.(20:0) is labelled with the fields I.(20:13), I.(12:5), I.(4), I.(3:2) and I.(1:0).]
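The field labels on the figure partition a 21-bit instruction word, which is the raw material instruction set extraction works from. Here is a small sketch of slicing such a word. The field boundaries are taken from the labels; the field names and the `extract` and `decode` helpers are hypothetical, since the transcript does not say what each field controls.

```python
# Field boundaries from the labels I.(20:13), I.(12:5), I.(4), I.(3:2), I.(1:0).
# The names are placeholders; the slide text does not name the fields.
FIELDS = {
    "f20_13": (20, 13),
    "f12_5": (12, 5),
    "f4": (4, 4),
    "f3_2": (3, 2),
    "f1_0": (1, 0),
}


def extract(word: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of a 21-bit instruction word."""
    if not 0 <= word < (1 << 21):
        raise ValueError("instruction word must fit in 21 bits")
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode(word: int) -> dict:
    """Split an instruction word into its labelled fields."""
    return {name: extract(word, hi, lo) for name, (hi, lo) in FIELDS.items()}


# Build a word from known field values, then decode it again.
word = (0xAB << 13) | (0x3C << 5) | (1 << 4) | (2 << 2) | 1
print(decode(word))
```

The five fields are 8, 8, 1, 2 and 2 bits wide, so together they cover all 21 bits of I.(20:0).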

slide10
Example: Simple processor [Leupers]


slide11
ASIP/VLIW architectures

A|RT designer template as an example (= set of rules, a model)

  • Differences with VLIW processors of ch. 4
  • 1. Parallel FUs
    • ASUs = complex application-specific FUs (beyond subword parallelism)
      • e.g. biquad, median, DCT etc. ...
    • larger grain size, more heterogeneous, more pipelines
  • 2. Register files
    • many register files (>5 vs. 1 or 2)
    • limited # ports (3 vs. 15)
    • limited size (<16 vs. 128)
  • 3. Issue slots
    • all in parallel vs. 5

slide12
[Figure: architecture with register files RF1 to RF8, functional units FU1 to FU4, flags, instruction registers IR1 to IR4, instruction memory and control.]

slide13
ASIP/VLIW architectures

  • Additional characteristics of the A|RT designer template
  • interconnect network: busses + input multiplexers
    • mux control is part of the instruction
    • control can change every clock cycle
    • network can be incomplete
    • busses can be merged
  • memories are modeled as FUs
    • separate data in and data out
    • 2 inputs (data in and address) and 1 output
  • Each FU can generate one or more flags
  • instruction format (per issue slot)

[Figure: instruction format with the fields read address RF 1, control FU, mux 1, write address RF 1, read address RF 2, mux 2, write address RF 2, output drivers.]

slide14
ASIP/VLIW architectures: example

[Figure: example datapath with RF1 to RF4, an ALU and a MAC, connected through bus1 and bus2. The instruction word (bit positions 19, 10, 9 and 0 marked) contains mux fields, read and write fields for RF1 and RF2 with the ALU instr., and read and write fields for RF3 and RF4 with the MAC instr.]

slide15
ASIP/VLIW architectures: example

slide16
ASIP/VLIW architectures: design flow

[Figure: design flow with the elements algorithm spec, RTs, datapath synthesis, controller synthesis, estimations (area, power, timing), decision "OK?" (yes/no) and "change pragmas".]

assign ( a+b, ALU, fu_alu1)
assign ( a+_, ALU, fu_alu2)
assign ( _+_, ALU, fu_alu3)

RF1 : x = RF2 : y, RF3 : z | ALU = ADD, Inmux = bus2

VLIW makes relatively simple code selection possible

slide17
ASIP/VLIW architectures: list scheduling

[Figure: list scheduling example. A graph of multiply (*) and add (+) operations with numbered schedule steps is mapped onto the resources IPB, OPB, MULT and ALU. Blocks shown: candidate list, conflict & priority computation, scheduled operation.]
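The three blocks in the figure (candidate list, conflict and priority computation, scheduled operation) correspond to the steps of a textbook list scheduler. The sketch below is a generic version, not the scheduler of the tool discussed in the slides. It assumes single-cycle operations, an acyclic dependence graph, one MULT and one ALU, and longest-path-to-sink as the priority; all names and the example graph are hypothetical.

```python
def list_schedule(ops, deps, units):
    """ops: {name: unit type}; deps: {name: set of predecessors};
    units: {unit type: count}. Returns {name: cycle}.
    Assumes one-cycle operations and an acyclic dependence graph."""
    for name, unit in ops.items():
        if units.get(unit, 0) < 1:
            raise ValueError(f"no unit of type {unit!r} for {name!r}")

    succs = {o: [] for o in ops}
    for o, preds in deps.items():
        for p in preds:
            succs[p].append(o)

    # Priority: length of the longest path from the operation to a sink.
    prio = {}

    def height(o):
        if o not in prio:
            prio[o] = 1 + max((height(s) for s in succs[o]), default=0)
        return prio[o]

    for o in ops:
        height(o)

    schedule, cycle = {}, 0
    while len(schedule) < len(ops):
        # Candidate list: unscheduled ops whose predecessors ran in an earlier cycle.
        ready = [o for o in ops if o not in schedule
                 and all(p in schedule and schedule[p] < cycle
                         for p in deps.get(o, ()))]
        free = dict(units)
        for o in sorted(ready, key=lambda o: (-prio[o], o)):
            if free[ops[o]] > 0:          # conflict check on the unit type
                free[ops[o]] -= 1
                schedule[o] = cycle       # scheduled operation
        cycle += 1
    return schedule


ops = {"m1": "MULT", "m2": "MULT", "m3": "MULT", "a1": "ALU", "a2": "ALU"}
deps = {"a1": {"m1", "m2"}, "a2": {"a1", "m3"}}
sched = list_schedule(ops, deps, {"MULT": 1, "ALU": 1})
print(sched)
```

With a single multiplier the three multiplications are serialized, and the priority function schedules the two that feed the longer chain first.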

slide18
ASIP/VLIW architectures: feedback


slide19
Outline
  • design process
  • retargetable code generation (problem statement)
  • ADSP/VLIW architectures (Mistral 2 /A|RT designer)
  • low power aspects (Mistral 2 /A|RT designer)
  • discussion
  • conclusion


slide20
Low power aspects

  • Estimation

[Figure: estimation set-up with the elements Mistral2, architecture, estimation database and implementation independent design database; the estimated quantities are area, speed and power.]

GSM viterbi decoder: default solution

EXU        ACTIV  AREA   POWER
alu_1      96%    3469   46196
romctrl_1  48%    39     259
acu_1      26%    327    1209
ipb_1      5%     131    105
opb_1      23%    1804   5801
ctrl       -      9821   135035
total      -      15591  188605

cycle count: 13750

  • controller responsible for 70% of power consumption
    • maximum resource-sharing
    • heavy decision-making: “main” loop with 16 metrics-computations per iteration
  • EXU-numbers include registers for local storage

GSM viterbi decoder: no loop-folding

EXU        ACTIV  AREA   POWER
alu_1      92%    3411   45073
romctrl_1  45%    39     255
acu_1      25%    294    1087
ipb_1      5%     107    86
opb_1      22%    1661   5340
ctrl       -      4919   70087
total      -      10431  121928

cycle count: 14247

  • area down by 33%
  • power down by 35%
  • next step: reduce # of program-steps with second ALU

GSM viterbi decoder: 2 ALUs

EXU        ACTIV  AREA   POWER
alu_1      69%    1797   12248
alu_2      65%    1393   8916
romctrl_1  67%    39     255
acu_1      37%    294    1087
ipb_1      8%     149    119
opb_1      33%    2136   6871
ctrl       -      8957   87235
total      -      14766  116731

cycle count: 9739

  • cycle count down 30%
  • area up 42%
  • power down by 5%
  • next step: introduce ASU to reduce ALU-load

GSM viterbi decoder: 1 x ACS-ASU

func ACS (M1, M2, d) MS, MS8 =
begin
  MS  = if (M1+d > M2-d) -> (M1+d) || (M2-d) fi;
  MS8 = if (M1-d > M2+d) -> (M1-d) || (M2+d) fi;
end;
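For readers unfamiliar with the guarded `if ... -> ... || ... fi` notation, the ACS function can be read as two compare-and-select steps. The following Python sketch assumes that reading (take the first alternative when the guard holds, otherwise the second); the function name `acs` and the example values are illustrative only.

```python
def acs(m1: int, m2: int, d: int) -> tuple[int, int]:
    """Add-compare-select: return (MS, MS8) following the ACS definition above."""
    # MS: the larger of M1+d and M2-d.
    ms = m1 + d if m1 + d > m2 - d else m2 - d
    # MS8: the larger of M1-d and M2+d.
    ms8 = m1 - d if m1 - d > m2 + d else m2 + d
    return ms, ms8


print(acs(10, 8, 3))
```

Each call performs four additions or subtractions, two comparisons and two selections, which is the work the ASU folds into one operation.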

EXU        ACTIV  AREA   POWER
alu_1      20%    261    105
acs_asu_1  83%    2382   3816
or_asu_1   10%    611    122
romctrl_1  16%    65     21
acu_1      36%    294    205
ipb_1      20%    107    43
opb_1      11%    163    35
ctrl       -      1864   3597
total      -      5747   7944

cycle count: 1930

  • cycle count down 5X
  • power down 20X!

GSM viterbi decoder: 4 x ACS-ASU

EXU          ACTIV  AREA   POWER
alu_1        94%    243    97
acs_asu_1    95%    1041   420
acs_asu_2    95%    1041   420
acs_asu_3    95%    1041   420
acs_asu_4    95%    1041   420
split_asu_1  47%    90     18
or_asu_1     47%    592    118
romctrl_1    28%    48     6
acu_1        98%    212    85
ipb_1        23%    60     6
opb_1        50%    369    80
ctrl         -      1306   555
total        -      7084   2645

cycle count: 425

  • cycle count down another 5X
  • area up 23%
  • power down another 3X!

GSM viterbi example: summary

[Figure: summary chart (Mistral2, implementation independent design database), annotated "72x !".]
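The improvement factors quoted on the preceding slides can be recomputed from the table totals. The sketch below copies the AREA and POWER totals from the tables and reads the standalone number on each slide as its cycle count, which is an interpretation rather than something the transcript states. The `points` dictionary and `ratio` helper are hypothetical names.

```python
# (cycle count, total AREA, total POWER) per design point, copied from the tables.
# Treating the standalone number on each slide as the cycle count is an assumption.
points = {
    "default": (13750, 15591, 188605),
    "no loop-folding": (14247, 10431, 121928),
    "2 ALUs": (9739, 14766, 116731),
    "1 x ACS-ASU": (1930, 5747, 7944),
    "4 x ACS-ASU": (425, 7084, 2645),
}

CYCLES, AREA, POWER = 0, 1, 2


def ratio(a: str, b: str, column: int) -> float:
    """How many times smaller the value is at design point b than at a."""
    return points[a][column] / points[b][column]


for name in points:
    print(f"{name:16s} cycles {ratio('default', name, CYCLES):6.1f}x  "
          f"power {ratio('default', name, POWER):6.1f}x")
```

On these numbers the POWER total drops by a factor of about 71 between the default solution and the 4 x ACS-ASU solution, which is presumably what the "72x" annotation refers to.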

slide27
Discussion: phase 3

[Figure: design process extended with phase 3. Exploration phase: processor model, application(s), SW (code generation) and HW design, with decision points "OK?" and "more appl.?" (yes/no). Then: freeze processor model. Application software development: application(s), SW (code generation) and decision "OK?" (yes/no), i.e. constraint driven compilation.]

slide28
Discussion: problems with VLIWs

code size and instruction bandwidth

  • code compaction = reduce code size after scheduling
      • possible compaction ratio?
      • e.g. p0 = 0.9 and p1 = 0.1
      • information content (entropy) $H = -\sum_i p_i \log_2 p_i \approx 0.47$
        • maximum compression factor $1/H \approx 2$
  • control parallelism during scheduling = switch between different processor models (10% of code = 90% runtime)
  • architecture
      • reduce number of control bits for operand addresses
      • e.g. 128 reg (TM) -> 28 bits/issue slot for addresses only
      • => use stacks and fifos
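The two numbers on this slide can be checked with a few lines of arithmetic. The sketch below computes the entropy for p0 = 0.9 and p1 = 0.1 and the resulting bound on the compression factor, then the operand address bits for 128 registers. The choice of four address fields per issue slot is an assumption made to reproduce the 28 bits; the slide does not say how the 28 bits are divided.

```python
import math


def entropy(probs):
    """Shannon entropy in bits: -sum(p * log2(p)) over the non-zero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


h = entropy([0.9, 0.1])    # information content per instruction bit
max_factor = 1 / h         # upper bound on the compression factor

# 128 registers need log2(128) = 7 bits per operand address.
addr_bits = math.ceil(math.log2(128))
fields_per_slot = 4        # assumption: four address fields per issue slot
bits_per_slot = addr_bits * fields_per_slot

print(f"H = {h:.2f}, max compression factor = {max_factor:.1f}")
print(f"{addr_bits} bits/address, {bits_per_slot} bits/issue slot")
```

The entropy bound shows that even an ideal coder gains only about a factor of two here, which is why the slide turns to architectural measures such as stacks and fifos.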

slide29
[Figure: architecture with register files RF1 to RF4, functional units FU1 to FU4, flags, instruction registers IR1 to IR4, instruction memory and control.]

Conclusions
  • ASIPs provide efficient solutions for well-defined application domains (2 orders of magnitude higher efficiency).
  • The methodology is interesting for IP creation.
  • The key problem is retargetable compilation.
  • A (distributed) VLIW model is a good compromise between HW and SW.
  • Although an automatic process can generate a default solution, the process usually is interactive and iterative for efficiency reasons. The key is fast and accurate feedback.
