Design automation methodologies for extensible processors platform
Download
1 / 54

Design Automation Methodologies for Extensible Processors Platform - PowerPoint PPT Presentation


  • 125 Views
  • Uploaded on

Design Automation Methodologies for Extensible Processors Platform. Newton Cheung School of Computer Science & Engineering The University of New South Wales Sydney, Australia. Outline. System-On-Chips Design Challenges Extensible Processors Platform Background – Xtensa

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Design Automation Methodologies for Extensible Processors Platform' - noura


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Design automation methodologies for extensible processors platform

Design Automation Methodologies for Extensible Processors Platform

Newton Cheung

School of Computer Science & Engineering

The University of New South Wales

Sydney, Australia


Outline
Outline Platform

  • System-On-Chips Design Challenges

  • Extensible Processors Platform

  • Background – Xtensa

  • Problems in Extensible Processors Platform

  • The Goal of the research

  • Proposed Solution

  • INSIDE

  • MINCE

  • Conclusions

ncheung@cse.unsw.edu.au


Soc design challenges
SoC Design Challenges Platform

  • Application functionality

  • Ever changing nature of embedded product

Increasing software content

Flexibility

  • Power consumption

  • Chip area

  • Cost

Customizing the architecture to the specific application (application domain)

Efficiency

ncheung@cse.unsw.edu.au


Flexibility vs efficiency energy
Flexibility vs. Efficiency (Energy) Platform

Application Specific Integrated Circuit (ASIC)

500-1000 MIPS/mw

Application Specific Instruction-set Processor (ASIP)

50-100 MIPS/mw

Energy Efficiency

Flexibility

Flexibility

Domain Specific

Processor (DSP)

1-10 MIPS/mw

General Purpose Processor

0.1-1 MIPS/mw

Source: Fei Sun @ ICCAD 2002

ncheung@cse.unsw.edu.au


Flexibility vs efficiency performance
Flexibility vs. Efficiency (Performance) Platform

Performance

ASIC

ASIP

FPGA

GPP

Flexibility

ncheung@cse.unsw.edu.au


Application specific processor

General Purpose Processor Platform

Application Specific Processor

ncheung@cse.unsw.edu.au


Optimized processor design asip
Optimized Processor Design (ASIP) Platform

  • Cost

    • Small size

  • Performance

    • Application extensions

  • Productivity

    • Rapid hardware

      and software

ncheung@cse.unsw.edu.au


Extensible processors platform
Extensible Processors Platform Platform

  • Represents the state-of-the-art in application specific instruction-set processor (ASIP)

  • Consists of a base processor containing a base instruction set, plus the capability to customize their architecture to replace computationally intensive code segments

  • The goal of designing extensible processors is typically to maximize the performance of an application, while meeting design constraints

ncheung@cse.unsw.edu.au


Extensible processors platform1

I:Instruction Fetch Platform

R:Instruction Decode / Register Fetch

E: Execute / Effective Address

M: Memory Access / Branch Complete

W: Write Back

Instruction ROM / RAM / Cache

Decode / General Registers

ALU / Address Generation

Data ROM / RAM / Cache

Write Back / Exception Resolution

Custom / Specific Instruction

Predefined Block Register (i.e. 64-bit)

Predefined Blocks (e.g. DSP – Digital Signal Processing, Floating-Point Unit)

Extensible Processors Platform

  • Enables to address three architectural levels on the base processor:

    • Inclusion/Exclusion of predefined blocks

    • Instructions extension

    • Parameterizations

ncheung@cse.unsw.edu.au


Outline1
Outline Platform

  • System-On-Chips Design Challenges

  • Extensible Processors Platform

  • Background – Xtensa

  • Problems in Extensible Processors Platform

  • The Goal of the research

  • Proposed Solution

  • INSIDE

  • MINCE

  • Conclusions

ncheung@cse.unsw.edu.au


Background xtensa
Background (Xtensa) Platform

Source: www.tensilica.com

ncheung@cse.unsw.edu.au


Xtensa architecture
Xtensa Architecture Platform

Source: www.tensilica.com

ncheung@cse.unsw.edu.au


Xtensa custom instructions tie
Xtensa Custom Instructions (TIE) Platform

Source: www.tensilica.com

ncheung@cse.unsw.edu.au


Xtensa s design flow
Xtensa’s Design Flow Platform

Source: www.tensilica.com

ncheung@cse.unsw.edu.au


Extensible processor why
Extensible Processor (Why?) Platform

ncheung@cse.unsw.edu.au


Example digital video
Example: Digital Video Platform

Source: www.tensilica.com

ncheung@cse.unsw.edu.au


Example multiple processors for toe
Example: Multiple Processors for TOE Platform

Source: www.tensilica.com

ncheung@cse.unsw.edu.au


Example voice gateway
Example: Voice Gateway Platform

Source: www.tensilica.com

ncheung@cse.unsw.edu.au


Outline2
Outline Platform

  • System-On-Chips Design Challenges

  • Extensible Processors Platform

  • Background – Xtensa

  • Problems in Extensible Processors Platform

  • The Goal of the research

  • Proposed Solution

  • INSIDE

  • MINCE

  • Conclusions

ncheung@cse.unsw.edu.au


Problems in automation

Design Expertise Required!! Platform

Problems in Automation??

Source: www.tensilica.com

ncheung@cse.unsw.edu.au


Generic design flow of extensible processor
Generic Design Flow of Extensible Processor Platform

Application in C/C++

Compilation

Analysis and Profiling

Synthesizable RTL of:

Base processor

Predefined blocks

Extensible instructions

Instruction Library

Identify:

1) Predefined blocks

2) Extensible instructions

3) Parameter settings

Generate extensible processor

Explore extensible processor

design space

Define/Select:

1) Predefined blocks

2) Extensible instructions

3) Parameter settings

Proto-typing

Synthesis and tape out

Evaluate:

Compiler, Linker,

Assembler, Debugger, Instruction Set Simulator

No

Yes

OK?

ncheung@cse.unsw.edu.au


Previous work s
Previous Platform Works

  • Profiling/Identification

    • [Binh 1998], [Yang 2002], ARC, Xtensa

  • Design methodology for different aspects

    • [Gupta 2000], [Jain 2001]

  • Instruction generation/selection

    • [Brisk 1998], [Kastner 2001], [Sun 2003], [Zhao 2002]

  • Overall design flow for extensible processors

    • [Kathail 2002], [Lee 2002], [Sun 2002]

  • Vendor and academic

    • ARC, Lisatek, Xtensa, ASIP-Meister

ncheung@cse.unsw.edu.au


Outline3
Outline Platform

  • System-On-Chips Design Challenges

  • Extensible Processors Platform

  • Background – Xtensa

  • Problems in Extensible Processors Platform

  • The Goal of the research

  • Proposed Solution

  • INSIDE

  • MINCE

  • Conclusions

ncheung@cse.unsw.edu.au


Goal of the research
Goal of the Research Platform

  • To automate the design flow of extensible processors platform.

  • Given an application and design constraints, the system configures an extensible processor that maximizes the performance of an application while satisfying the design constraints.

ncheung@cse.unsw.edu.au


Proposed solution

Application in C/C++ Platform

Application in C/C++

Compilation

Compilation

Analysis and Profiling

Analysis and Profiling

Synthesizable RTL of:

Base processor

Predefined blocks

Extensible instructions

Synthesizable RTL of:

Base processor

Predefined blocks

Extensible instructions

Instruction Library

Instruction Library

Identify:

1) Predefined blocks

2) Extensible instructions

3) Parameter settings

Identify:

1) Predefined blocks

2) Extensible instructions

3) Parameter settings

Fitting Function:

- Finds suitable code segment

to implement as extensible

instruction

INSIDE system

Use libraries of pre-configured processor and instructions to limit the design space

Generate extensible processor

Generate extensible processor

Define:

1) P.B.

2) E.I.

3) P.S.

Select:

1) P.B.

2) E.I.

3) P.S.

Explore extensible processor

design space

Explore extensible processor

design space

  • MINCE tool

  • Match code

  • segment to

  • instruction

Define/Select:

1) Predefined blocks

2) Extensible instructions

3) Parameter settings

Proto-typing

Proto-typing

Synthesis and tape out

Synthesis and tape out

Evaluate:

Compiler, Linker,

Assembler, Debugger, Instruction Set Simulator

Evaluate:

Compiler, Linker,

Assembler, Debugger, Instruction Set Simulator

Performance Estimation Model

1) Information from profiling

2) Synthesized information of

instruction

No

No

Yes

Yes

OK?

OK?

Proposed Solution

ncheung@cse.unsw.edu.au


Inside mince
INSIDE / MINCE Platform

  • INSIDE system

    • Identifies sections of code segments which are suitable for translation to instructions.

    • A two-level hierarchy approach.

    • A performance estimator.

      Reduces the design turnaround time significantly.

  • MINCE tool

    • Match code segment with pre-synthesized extensible instruction using combinational equivalence.

      Enhances reusability of the extensible instructions.

ncheung@cse.unsw.edu.au


Outline4
Outline Platform

  • System-On-Chips Design Challenges

  • Extensible Processors Platform

  • Background – Xtensa

  • Problems in Extensible Processors Platform

  • The Goal of the research

  • Proposed Solution

  • INSIDE

  • MINCE

  • Conclusions

ncheung@cse.unsw.edu.au


Inside system overview

Processor 1 Platform

Processor 2

Processor 3

Base

Processor

Base

Processor

FPU

Base

Processor

DSP

INSIDE system (Overview)

ncheung@cse.unsw.edu.au


Phase i heuristic algorithm part i
Phase I: Heuristic Algorithm (Part I) Platform

  • For selecting pre-configured processor

  • The area delay product, EPiof processor i for a certain application

ncheung@cse.unsw.edu.au


Phase ii identify code segments
Phase II: Identify Code Segments Platform

  • Problem:

    • Application has millions of lines of C code, how do we know which code segment is good for converting to extensible instruction

  • Fitting function

    • Identifies code segments which are suitable for translation to extensible instructions

    • Extracts characteristics of the code segment

ncheung@cse.unsw.edu.au


Fitting function
Fitting Function Platform

  • The four characteristics:

    • The frequency of use of a code segment

    • The number of operands in a code segment

    • The percentage of integer (short) type operands in all the operands

    • The percentage of bit operations in all the operands

  • The fitting function:

    α – the ideal number of operands in the code segment.

ncheung@cse.unsw.edu.au


Relationship
Relationship Platform

  • Relationship between the fitting function and the speedup/area ratio of the instruction

ncheung@cse.unsw.edu.au


Phase iii heuristic algorithm part ii
Phase III: Heuristic Algorithm (Part II) Platform

  • For selecting extensible instructions

  • The potential speedup/area ratio, PSAR, of extensible instruction in processor:

ncheung@cse.unsw.edu.au


Phase iv performance estimation
Phase IV: Performance Estimation Platform

  • For rapidly estimating the execution time

  • The execution time estimation, ETE, for an extensible processor with a set of selected extensible instruction:

ncheung@cse.unsw.edu.au


Experimental methodology
Experimental Methodology Platform

ncheung@cse.unsw.edu.au


Results
Results Platform

ncheung@cse.unsw.edu.au


Results design turnaround time
Results (Design Turnaround Time) Platform

ncheung@cse.unsw.edu.au


Results pareto points
Results (Pareto Points) Platform

ncheung@cse.unsw.edu.au


Results of inside system
Results of PlatformINSIDE system

  • Mediabench applications

  • Speedup of the applications:

    • On average 2.03x (up to 7x)

  • Hardware overheads:

    • On average 25% (up to 72%)

  • Pareto points:

    • Obtained on average 83% (up to 100%)

    • On average within 5% of the execution time

ncheung@cse.unsw.edu.au


Outline5
Outline Platform

  • System-On-Chips Design Challenges

  • Extensible Processors Platform

  • Background – Xtensa

  • Problems in Extensible Processors Platform

  • The Goal of the research

  • Proposed Solution

  • INSIDE

  • MINCE

  • Conclusions

ncheung@cse.unsw.edu.au


The aim of the mince

Functional Platform

Equivalent

The Aim of the MINCE

  • Matching INstructions using Combinational Equivalence

  • Automatically matching code segments to pre-synthesized specific instructions.

// Application in C

int main() {

… …

// Code segment

for (x = 0; x < 100000; x++) {

tmp = a[x] * b[x] << 4;

total += tmp;

}

… …

}

// Pre-synthesized specific instructions

state total 32

iclass ei EI {out arr, in art, in ars} {in state}

reference EI {

wire [31;0] tmp

assign tmp = TIEmul(art, ars, 1’b0) >> 4;

assign arr = tmp + state;

}

ncheung@cse.unsw.edu.au


Motivation

  • MINCE Platform tool

  • Matching designed extensible instruction to the code segment

Motivation

Application in C/C++

Design Constraints (Performance, Area, Power Consumption)

Compilation

Analysis and Profiling

Identifying the “hotspots” of the application through simulation, profiling, and trace

Designed Extensible Instruction Library

Synthesizable RTL of base processor and extensible instructions

Designing extensible instructions for the “hotspots”

Explore extensible processor

design space

Testing/Verifying the functionality, speedup, area, power consumption of extensible instruction

Generate extensible processor

Selecting the extensible instructions based on the design constraints

Proto-typing

Synthesis and tape out

No

Yes

OK?

ncheung@cse.unsw.edu.au


Mince tool
MINCE Tool Platform

  • An automated tool for matching pre-synthesized extensible instruction to the functional equivalence of code segments using combinational equivalence checking in the extensible processors platform.

ncheung@cse.unsw.edu.au


Mince tool1

II Platform

MINCE Tool

  • MINCE consists:

    • A translator

    • A filtering algorithm

    • A combinational equivalence checking tool

ncheung@cse.unsw.edu.au


Related works
Related Works Platform

  • Simulation techniques:

    [Stadler 1999]

  • Graph matching techniques:

    [Corazao 1993] [Kang 1995] [Liem 1994] [Shu 1996]

  • Equivalence verifications:

    [Clarke 2003], [Pnueli 1998], [Semeria 2002]

ncheung@cse.unsw.edu.au


Our contributions
Our Contributions Platform

  • Enhances reusability of the extensible instructions.

  • MINCE tool is superior to computation-intensive and error-prone simulation approaches.

  • The usage of functional equivalence checking ensures that the results are largely independentof the programming style of the application.

ncheung@cse.unsw.edu.au


Phase i translator
Phase I: Translator Platform

  • The goal of the translator is to convert the application written in C/C++ to a set of code segment in Verilog HDL using a systematic approach.

  • To reduce the granularity of the application written in C/C++.

  • The translator consists of four steps:

    • Separate the application into code segments;

    • Compile code segments;

    • Convert to register transfer list;

    • Map to Verilog HDL file.

ncheung@cse.unsw.edu.au


Translator
Translator Platform

ncheung@cse.unsw.edu.au


Phase ii filtering algorithm
Phase II: Filtering Algorithm Platform

  • The goal of the filtering algorithm is to eliminate the unnecessary and complex Verilog HDL file into the combinational equivalence checking model.

  • Verilog HDL files can be pruned as non-match:

    • Differing number of ports;

    • Differing port sizes;

    • Insufficient number of base hardware modules to present complex module.

ncheung@cse.unsw.edu.au


Combinational equivalence checking
Combinational Equivalence Checking Platform

  • Binary Decision Diagram (BDD)

  • For example,

  • Using Verification Interfacing with Synthesis (VIS) which was jointly developed by the University of California, in Berkeley and the University of Colorado, Boulder.

ncheung@cse.unsw.edu.au


Results1
Results Platform

EM: Exact match

FE: Functional equivalence

DM: I/O match only

TW: Do not match

3

1

3

3

1

1

3

1

3

1

1

3

1

3

1

3

1

1

ncheung@cse.unsw.edu.au


Results2
Results Platform

ncheung@cse.unsw.edu.au


Future plan

INSIDE Platformsystem

Define:

1) P.B.

2) E.I.

3) P.S.

Select:

1) P.B.

2) E.I.

3) P.S.

Future Plan

Application in C/C++

Compilation

Analysis and Profiling

Synthesizable RTL of:

Base processor

Predefined blocks

Extensible instructions

Instruction Library

Identify:

1) Predefined blocks

2) Extensible instructions

3) Parameter settings

Fitting Function:

- Finds suitable code segment

to implement as extensible

instruction

Use libraries of pre-configured processor and instructions to limit the design space

Generate extensible processor

Generate Extensible Instruction Automatically

  • MINCEtool

  • Match code

  • segment to

  • instruction

Proto-typing

Synthesis and tape out

Performance Estimation Model

1) Information from profiling

2) Synthesized information of

instruction

Area, Latency, Power Model

No

Yes

OK?

ncheung@cse.unsw.edu.au


Conclusion
Conclusion Platform

  • Extensible processors platform enables to address three architectural levels in order to tune for application specific:

    • Inclusion/Exclusion of predefined blocks

    • Instructions extension

    • Parameterizations

  • The goal of the research is to automate the design flow of extensible processor platform.

  • INSIDE system / MINCE tool

ncheung@cse.unsw.edu.au