Design automation methodologies for extensible processors platform
This presentation is the property of its rightful owner.
Sponsored Links
1 / 54

Design Automation Methodologies for Extensible Processors Platform PowerPoint PPT Presentation


  • 97 Views
  • Uploaded on
  • Presentation posted in: General

Design Automation Methodologies for Extensible Processors Platform. Newton Cheung School of Computer Science & Engineering The University of New South Wales Sydney, Australia. Outline. System-On-Chips Design Challenges Extensible Processors Platform Background – Xtensa

Download Presentation

Design Automation Methodologies for Extensible Processors Platform

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Design automation methodologies for extensible processors platform

Design Automation Methodologies for Extensible Processors Platform

Newton Cheung

School of Computer Science & Engineering

The University of New South Wales

Sydney, Australia


Outline

Outline

  • System-On-Chips Design Challenges

  • Extensible Processors Platform

  • Background – Xtensa

  • Problems in Extensible Processors Platform

  • The Goal of the research

  • Proposed Solution

  • INSIDE

  • MINCE

  • Conclusions

[email protected]


Soc design challenges

SoC Design Challenges

  • Application functionality

  • Ever changing nature of embedded product

Increasing software content

Flexibility

  • Power consumption

  • Chip area

  • Cost

Customizing the architecture to the specific application (application domain)

Efficiency

[email protected]


Flexibility vs efficiency energy

Flexibility vs. Efficiency (Energy)

Application Specific Integrated Circuit (ASIC)

500-1000 MIPS/mw

Application Specific Instruction-set Processor (ASIP)

50-100 MIPS/mw

Energy Efficiency

Flexibility

Flexibility

Domain Specific

Processor (DSP)

1-10 MIPS/mw

General Purpose Processor

0.1-1 MIPS/mw

Source: Fei Sun @ ICCAD 2002

[email protected]


Flexibility vs efficiency performance

Flexibility vs. Efficiency (Performance)

Performance

ASIC

ASIP

FPGA

GPP

Flexibility

[email protected]


Application specific processor

General Purpose Processor

Application Specific Processor

[email protected]


Optimized processor design asip

Optimized Processor Design (ASIP)

  • Cost

    • Small size

  • Performance

    • Application extensions

  • Productivity

    • Rapid hardware

      and software

[email protected]


Extensible processors platform

Extensible Processors Platform

  • Represents the state-of-the-art in application specific instruction-set processor (ASIP)

  • Consists of a base processor containing a base instruction set, plus the capability to customize their architecture to replace computationally intensive code segments

  • The goal of designing extensible processors is typically to maximize the performance of an application, while meeting design constraints

[email protected]


Extensible processors platform1

I:Instruction Fetch

R:Instruction Decode / Register Fetch

E: Execute / Effective Address

M: Memory Access / Branch Complete

W: Write Back

Instruction ROM / RAM / Cache

Decode / General Registers

ALU / Address Generation

Data ROM / RAM / Cache

Write Back / Exception Resolution

Custom / Specific Instruction

Predefined Block Register (i.e. 64-bit)

Predefined Blocks (e.g. DSP – Digital Signal Processing, Floating-Point Unit)

Extensible Processors Platform

  • Enables to address three architectural levels on the base processor:

    • Inclusion/Exclusion of predefined blocks

    • Instructions extension

    • Parameterizations

[email protected]


Outline1

Outline

  • System-On-Chips Design Challenges

  • Extensible Processors Platform

  • Background – Xtensa

  • Problems in Extensible Processors Platform

  • The Goal of the research

  • Proposed Solution

  • INSIDE

  • MINCE

  • Conclusions

[email protected]


Background xtensa

Background (Xtensa)

Source: www.tensilica.com

[email protected]


Xtensa architecture

Xtensa Architecture

Source: www.tensilica.com

[email protected]


Xtensa custom instructions tie

Xtensa Custom Instructions (TIE)

Source: www.tensilica.com

[email protected]


Xtensa s design flow

Xtensa’s Design Flow

Source: www.tensilica.com

[email protected]


Extensible processor why

Extensible Processor (Why?)

[email protected]


Example digital video

Example: Digital Video

Source: www.tensilica.com

[email protected]


Example multiple processors for toe

Example: Multiple Processors for TOE

Source: www.tensilica.com

[email protected]


Example voice gateway

Example: Voice Gateway

Source: www.tensilica.com

[email protected]


Outline2

Outline

  • System-On-Chips Design Challenges

  • Extensible Processors Platform

  • Background – Xtensa

  • Problems in Extensible Processors Platform

  • The Goal of the research

  • Proposed Solution

  • INSIDE

  • MINCE

  • Conclusions

[email protected]


Problems in automation

Design Expertise Required!!

Problems in Automation??

Source: www.tensilica.com

[email protected]


Generic design flow of extensible processor

Generic Design Flow of Extensible Processor

Application in C/C++

Compilation

Analysis and Profiling

Synthesizable RTL of:

Base processor

Predefined blocks

Extensible instructions

Instruction Library

Identify:

1) Predefined blocks

2) Extensible instructions

3) Parameter settings

Generate extensible processor

Explore extensible processor

design space

Define/Select:

1) Predefined blocks

2) Extensible instructions

3) Parameter settings

Proto-typing

Synthesis and tape out

Evaluate:

Compiler, Linker,

Assembler, Debugger, Instruction Set Simulator

No

Yes

OK?

[email protected]


Previous work s

Previous Works

  • Profiling/Identification

    • [Binh 1998], [Yang 2002], ARC, Xtensa

  • Design methodology for different aspects

    • [Gupta 2000], [Jain 2001]

  • Instruction generation/selection

    • [Brisk 1998], [Kastner 2001], [Sun 2003], [Zhao 2002]

  • Overall design flow for extensible processors

    • [Kathail 2002], [Lee 2002], [Sun 2002]

  • Vendor and academic

    • ARC, Lisatek, Xtensa, ASIP-Meister

[email protected]


Outline3

Outline

  • System-On-Chips Design Challenges

  • Extensible Processors Platform

  • Background – Xtensa

  • Problems in Extensible Processors Platform

  • The Goal of the research

  • Proposed Solution

  • INSIDE

  • MINCE

  • Conclusions

[email protected]


Goal of the research

Goal of the Research

  • To automate the design flow of extensible processors platform.

  • Given an application and design constraints, the system configures an extensible processor that maximizes the performance of an application while satisfying the design constraints.

[email protected]


Proposed solution

Application in C/C++

Application in C/C++

Compilation

Compilation

Analysis and Profiling

Analysis and Profiling

Synthesizable RTL of:

Base processor

Predefined blocks

Extensible instructions

Synthesizable RTL of:

Base processor

Predefined blocks

Extensible instructions

Instruction Library

Instruction Library

Identify:

1) Predefined blocks

2) Extensible instructions

3) Parameter settings

Identify:

1) Predefined blocks

2) Extensible instructions

3) Parameter settings

Fitting Function:

- Finds suitable code segment

to implement as extensible

instruction

INSIDE system

Use libraries of pre-configured processor and instructions to limit the design space

Generate extensible processor

Generate extensible processor

Define:

1) P.B.

2) E.I.

3) P.S.

Select:

1) P.B.

2) E.I.

3) P.S.

Explore extensible processor

design space

Explore extensible processor

design space

  • MINCE tool

  • Match code

  • segment to

  • instruction

Define/Select:

1) Predefined blocks

2) Extensible instructions

3) Parameter settings

Proto-typing

Proto-typing

Synthesis and tape out

Synthesis and tape out

Evaluate:

Compiler, Linker,

Assembler, Debugger, Instruction Set Simulator

Evaluate:

Compiler, Linker,

Assembler, Debugger, Instruction Set Simulator

Performance Estimation Model

1) Information from profiling

2) Synthesized information of

instruction

No

No

Yes

Yes

OK?

OK?

Proposed Solution

[email protected]


Inside mince

INSIDE / MINCE

  • INSIDE system

    • Identifies sections of code segments which are suitable for translation to instructions.

    • A two-level hierarchy approach.

    • A performance estimator.

      Reduces the design turnaround time significantly.

  • MINCE tool

    • Match code segment with pre-synthesized extensible instruction using combinational equivalence.

      Enhances reusability of the extensible instructions.

[email protected]


Outline4

Outline

  • System-On-Chips Design Challenges

  • Extensible Processors Platform

  • Background – Xtensa

  • Problems in Extensible Processors Platform

  • The Goal of the research

  • Proposed Solution

  • INSIDE

  • MINCE

  • Conclusions

[email protected]


Inside system overview

Processor 1

Processor 2

Processor 3

Base

Processor

Base

Processor

FPU

Base

Processor

DSP

INSIDE system (Overview)

[email protected]


Phase i heuristic algorithm part i

Phase I: Heuristic Algorithm (Part I)

  • For selecting pre-configured processor

  • The area delay product, EPiof processor i for a certain application

[email protected]


Phase ii identify code segments

Phase II: Identify Code Segments

  • Problem:

    • Application has millions of lines of C code, how do we know which code segment is good for converting to extensible instruction

  • Fitting function

    • Identifies code segments which are suitable for translation to extensible instructions

    • Extracts characteristics of the code segment

[email protected]


Fitting function

Fitting Function

  • The four characteristics:

    • The frequency of use of a code segment

    • The number of operands in a code segment

    • The percentage of integer (short) type operands in all the operands

    • The percentage of bit operations in all the operands

  • The fitting function:

    α – the ideal number of operands in the code segment.

[email protected]


Relationship

Relationship

  • Relationship between the fitting function and the speedup/area ratio of the instruction

[email protected]


Phase iii heuristic algorithm part ii

Phase III: Heuristic Algorithm (Part II)

  • For selecting extensible instructions

  • The potential speedup/area ratio, PSAR, of extensible instruction in processor:

[email protected]


Phase iv performance estimation

Phase IV: Performance Estimation

  • For rapidly estimating the execution time

  • The execution time estimation, ETE, for an extensible processor with a set of selected extensible instruction:

[email protected]


Experimental methodology

Experimental Methodology

[email protected]


Results

Results

[email protected]


Results design turnaround time

Results (Design Turnaround Time)

[email protected]


Results pareto points

Results (Pareto Points)

[email protected]


Results of inside system

Results of INSIDE system

  • Mediabench applications

  • Speedup of the applications:

    • On average 2.03x (up to 7x)

  • Hardware overheads:

    • On average 25% (up to 72%)

  • Pareto points:

    • Obtained on average 83% (up to 100%)

    • On average within 5% of the execution time

[email protected]


Outline5

Outline

  • System-On-Chips Design Challenges

  • Extensible Processors Platform

  • Background – Xtensa

  • Problems in Extensible Processors Platform

  • The Goal of the research

  • Proposed Solution

  • INSIDE

  • MINCE

  • Conclusions

[email protected]


The aim of the mince

Functional

Equivalent

The Aim of the MINCE

  • Matching INstructions using Combinational Equivalence

  • Automatically matching code segments to pre-synthesized specific instructions.

// Application in C

int main() {

… …

// Code segment

for (x = 0; x < 100000; x++) {

tmp = a[x] * b[x] << 4;

total += tmp;

}

… …

}

// Pre-synthesized specific instructions

state total 32

iclass ei EI {out arr, in art, in ars} {in state}

reference EI {

wire [31;0] tmp

assign tmp = TIEmul(art, ars, 1’b0) >> 4;

assign arr = tmp + state;

}

[email protected]


Motivation

  • MINCE tool

  • Matching designed extensible instruction to the code segment

Motivation

Application in C/C++

Design Constraints (Performance, Area, Power Consumption)

Compilation

Analysis and Profiling

Identifying the “hotspots” of the application through simulation, profiling, and trace

Designed Extensible Instruction Library

Synthesizable RTL of base processor and extensible instructions

Designing extensible instructions for the “hotspots”

Explore extensible processor

design space

Testing/Verifying the functionality, speedup, area, power consumption of extensible instruction

Generate extensible processor

Selecting the extensible instructions based on the design constraints

Proto-typing

Synthesis and tape out

No

Yes

OK?

[email protected]


Mince tool

MINCE Tool

  • An automated tool for matching pre-synthesized extensible instruction to the functional equivalence of code segments using combinational equivalence checking in the extensible processors platform.

[email protected]


Mince tool1

II

MINCE Tool

  • MINCE consists:

    • A translator

    • A filtering algorithm

    • A combinational equivalence checking tool

[email protected]


Related works

Related Works

  • Simulation techniques:

    [Stadler 1999]

  • Graph matching techniques:

    [Corazao 1993] [Kang 1995] [Liem 1994] [Shu 1996]

  • Equivalence verifications:

    [Clarke 2003], [Pnueli 1998], [Semeria 2002]

[email protected]


Our contributions

Our Contributions

  • Enhances reusability of the extensible instructions.

  • MINCE tool is superior to computation-intensive and error-prone simulation approaches.

  • The usage of functional equivalence checking ensures that the results are largely independentof the programming style of the application.

[email protected]


Phase i translator

Phase I: Translator

  • The goal of the translator is to convert the application written in C/C++ to a set of code segment in Verilog HDL using a systematic approach.

  • To reduce the granularity of the application written in C/C++.

  • The translator consists of four steps:

    • Separate the application into code segments;

    • Compile code segments;

    • Convert to register transfer list;

    • Map to Verilog HDL file.

[email protected]


Translator

Translator

[email protected]


Phase ii filtering algorithm

Phase II: Filtering Algorithm

  • The goal of the filtering algorithm is to eliminate the unnecessary and complex Verilog HDL file into the combinational equivalence checking model.

  • Verilog HDL files can be pruned as non-match:

    • Differing number of ports;

    • Differing port sizes;

    • Insufficient number of base hardware modules to present complex module.

[email protected]


Combinational equivalence checking

Combinational Equivalence Checking

  • Binary Decision Diagram (BDD)

  • For example,

  • Using Verification Interfacing with Synthesis (VIS) which was jointly developed by the University of California, in Berkeley and the University of Colorado, Boulder.

[email protected]


Results1

Results

EM: Exact match

FE: Functional equivalence

DM: I/O match only

TW: Do not match

3

1

3

3

1

1

3

1

3

1

1

3

1

3

1

3

1

1

[email protected]


Results2

Results

[email protected]


Future plan

INSIDEsystem

Define:

1) P.B.

2) E.I.

3) P.S.

Select:

1) P.B.

2) E.I.

3) P.S.

Future Plan

Application in C/C++

Compilation

Analysis and Profiling

Synthesizable RTL of:

Base processor

Predefined blocks

Extensible instructions

Instruction Library

Identify:

1) Predefined blocks

2) Extensible instructions

3) Parameter settings

Fitting Function:

- Finds suitable code segment

to implement as extensible

instruction

Use libraries of pre-configured processor and instructions to limit the design space

Generate extensible processor

Generate Extensible Instruction Automatically

  • MINCEtool

  • Match code

  • segment to

  • instruction

Proto-typing

Synthesis and tape out

Performance Estimation Model

1) Information from profiling

2) Synthesized information of

instruction

Area, Latency, Power Model

No

Yes

OK?

[email protected]


Conclusion

Conclusion

  • Extensible processors platform enables to address three architectural levels in order to tune for application specific:

    • Inclusion/Exclusion of predefined blocks

    • Instructions extension

    • Parameterizations

  • The goal of the research is to automate the design flow of extensible processor platform.

  • INSIDE system / MINCE tool

[email protected]


  • Login