
Design Automation Methodologies for Extensible Processors Platform

Newton Cheung

School of Computer Science & Engineering

The University of New South Wales

Sydney, Australia


Outline

  • System-On-Chips Design Challenges

  • Extensible Processors Platform

  • Background – Xtensa

  • Problems in Extensible Processors Platform

  • The Goal of the research

  • Proposed Solution

  • INSIDE

  • MINCE

  • Conclusions

ncheung@cse.unsw.edu.au


SoC Design Challenges

  • Flexibility

    • Application functionality

    • Ever-changing nature of embedded products

    • Increasing software content

  • Efficiency

    • Power consumption

    • Chip area

    • Cost

    • Customizing the architecture to the specific application (application domain)


Flexibility vs. Efficiency (Energy)

Energy efficiency decreases as flexibility increases:

  • Application Specific Integrated Circuit (ASIC): 500-1000 MIPS/mW

  • Application Specific Instruction-set Processor (ASIP): 50-100 MIPS/mW

  • Domain Specific Processor (DSP): 1-10 MIPS/mW

  • General Purpose Processor: 0.1-1 MIPS/mW

Source: Fei Sun @ ICCAD 2002


Flexibility vs. Efficiency (Performance)

Figure: performance versus flexibility for ASIC, ASIP, FPGA, and GPP.


General Purpose Processor

Application Specific Processor


Optimized Processor Design (ASIP)

  • Cost

    • Small size

  • Performance

    • Application extensions

  • Productivity

    • Rapid hardware and software


Extensible Processors Platform

  • Represents the state of the art in application-specific instruction-set processors (ASIPs)

  • Consists of a base processor with a base instruction set, plus the capability to customize its architecture with instructions that replace computationally intensive code segments

  • The goal of designing an extensible processor is typically to maximize the performance of an application while meeting design constraints


Extensible Processors Platform

  • Base processor pipeline stages and their blocks:

    • I: Instruction Fetch (Instruction ROM / RAM / Cache)

    • R: Instruction Decode / Register Fetch (Decode / General Registers)

    • E: Execute / Effective Address (ALU / Address Generation)

    • M: Memory Access / Branch Complete (Data ROM / RAM / Cache)

    • W: Write Back (Write Back / Exception Resolution)

  • Extensions to the base processor:

    • Custom / Specific Instruction

    • Predefined Block Register (i.e. 64-bit)

    • Predefined Blocks (e.g. DSP (Digital Signal Processing), Floating-Point Unit)

  • The platform can be customized at three architectural levels of the base processor:

    • Inclusion/exclusion of predefined blocks

    • Instruction extension

    • Parameterization


Background (Xtensa)

Source: www.tensilica.com


Xtensa Architecture

Source: www.tensilica.com


Xtensa Custom Instructions (TIE)

Source: www.tensilica.com


Xtensa’s Design Flow

Source: www.tensilica.com


Extensible Processor (Why?)


Example: Digital Video

Source: www.tensilica.com


Example: Multiple Processors for TOE

Source: www.tensilica.com


Example: Voice Gateway

Source: www.tensilica.com


Problems in Automation?

Design expertise required!

Source: www.tensilica.com


Generic Design Flow of Extensible Processor

  • Application in C/C++

  • Compilation

  • Analysis and Profiling

  • Synthesizable RTL of: base processor, predefined blocks, extensible instructions

  • Instruction Library

  • Identify: 1) predefined blocks, 2) extensible instructions, 3) parameter settings

  • Generate extensible processor

  • Explore extensible processor design space

  • Define/Select: 1) predefined blocks, 2) extensible instructions, 3) parameter settings

  • Prototyping

  • Synthesis and tape out

  • Evaluate: Compiler, Linker, Assembler, Debugger, Instruction Set Simulator

  • OK? (Yes / No)


Previous Work

  • Profiling/Identification

    • [Binh 1998], [Yang 2002], ARC, Xtensa

  • Design methodology for different aspects

    • [Gupta 2000], [Jain 2001]

  • Instruction generation/selection

    • [Brisk 1998], [Kastner 2001], [Sun 2003], [Zhao 2002]

  • Overall design flow for extensible processors

    • [Kathail 2002], [Lee 2002], [Sun 2002]

  • Vendor and academic tools

    • ARC, LISATek, Xtensa, ASIP-Meister


Goal of the Research

  • To automate the design flow of the extensible processor platform.

  • Given an application and design constraints, the system configures an extensible processor that maximizes the performance of the application while satisfying the design constraints.


Proposed Solution

  • Application in C/C++

  • Compilation

  • Analysis and Profiling

  • Synthesizable RTL of: base processor, predefined blocks, extensible instructions

  • Instruction Library

  • Identify: 1) predefined blocks (P.B.), 2) extensible instructions (E.I.), 3) parameter settings (P.S.)

  • Fitting function: finds suitable code segments to implement as extensible instructions

  • INSIDE system

  • Use libraries of pre-configured processors and instructions to limit the design space

  • Generate extensible processor

  • Define: 1) P.B., 2) E.I., 3) P.S.

  • Select: 1) P.B., 2) E.I., 3) P.S.

  • Explore extensible processor design space

  • MINCE tool: match code segment to instruction

  • Prototyping

  • Synthesis and tape out

  • Evaluate: Compiler, Linker, Assembler, Debugger, Instruction Set Simulator

  • Performance Estimation Model: 1) information from profiling, 2) synthesized information of the instruction

  • OK? (Yes / No)


INSIDE / MINCE

  • INSIDE system

    • Identifies code segments that are suitable for translation to instructions.

    • A two-level hierarchy approach.

    • A performance estimator.

    • Reduces the design turnaround time significantly.

  • MINCE tool

    • Matches code segments with pre-synthesized extensible instructions using combinational equivalence checking.

    • Enhances reusability of the extensible instructions.


INSIDE system (Overview)

  • Processor 1: Base Processor

  • Processor 2: Base Processor + FPU

  • Processor 3: Base Processor + DSP


Phase I: Heuristic Algorithm (Part I)

  • For selecting a pre-configured processor

  • Uses the area-delay product, EP_i, of processor i for a given application
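
The selection step can be sketched as follows. This is a minimal illustration, not the INSIDE implementation: the exact definition of EP_i is not reproduced in this transcript, so the sketch assumes the plain product of area and application run time. The class name, field names, units, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreconfiguredProcessor:
    name: str
    area_gates: float    # hypothetical unit: gate count
    exec_time_ms: float  # hypothetical: profiled run time of the application


def area_delay_product(p):
    # Assumption: EP_i is taken as area multiplied by delay.
    return p.area_gates * p.exec_time_ms


def select_processor(candidates):
    # Pick the pre-configured processor with the smallest area-delay product.
    return min(candidates, key=area_delay_product)


# Illustrative numbers only, not measurements from the presentation.
candidates = [
    PreconfiguredProcessor("base", 30_000, 100.0),
    PreconfiguredProcessor("base+FPU", 60_000, 40.0),
    PreconfiguredProcessor("base+DSP", 45_000, 80.0),
]
best = select_processor(candidates)
print(best.name)
```

With these made-up figures the larger FPU configuration wins because its run time drops faster than its area grows.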


Phase II: Identify Code Segments

  • Problem:

    • An application can have millions of lines of C code; how do we know which code segments are good candidates for conversion to extensible instructions?

  • Fitting function

    • Identifies code segments which are suitable for translation to extensible instructions

    • Extracts characteristics of the code segment


Fitting Function

  • The four characteristics:

    • The frequency of use of a code segment

    • The number of operands in a code segment

    • The percentage of integer (short) type operands in all the operands

    • The percentage of bit operations among all the operations

  • The fitting function:

    α – the ideal number of operands in the code segment.
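
The four characteristics can be extracted from a code segment roughly as shown below. The sketch only gathers the inputs; it does not attempt the fitting function itself, whose formula is not reproduced in this transcript. The `Operation` type, the opcode names, the reading of "integer (short) type" as C `short` or `int`, and the choice to measure bit operations against the operation count are all assumptions.

```python
from dataclasses import dataclass

# Assumption: which opcodes count as bit operations is not stated on the slide.
BIT_OPCODES = {"and", "or", "xor", "not", "shl", "shr"}
# Assumption: "integer (short) type" is read here as C short or int.
INTEGER_TYPES = {"short", "int"}


@dataclass(frozen=True)
class Operation:
    opcode: str
    operands: tuple  # ((name, c_type), ...)


def characteristics(ops, exec_count):
    """Collect the four characteristics of one code segment."""
    operand_types = {}
    for op in ops:
        for name, c_type in op.operands:
            operand_types[name] = c_type  # count each distinct operand once
    n_operands = len(operand_types)
    n_int = sum(1 for t in operand_types.values() if t in INTEGER_TYPES)
    n_bit = sum(1 for op in ops if op.opcode in BIT_OPCODES)
    return {
        "frequency": exec_count,  # from profiling
        "num_operands": n_operands,
        "pct_integer_operands": 100.0 * n_int / n_operands if n_operands else 0.0,
        "pct_bit_operations": 100.0 * n_bit / len(ops) if ops else 0.0,
    }


# Hypothetical multiply-shift-accumulate loop body executed 100000 times.
segment = [
    Operation("mul", (("a", "int"), ("b", "int"))),
    Operation("shl", (("tmp", "int"),)),
    Operation("add", (("total", "int"), ("tmp", "int"))),
]
stats = characteristics(segment, exec_count=100_000)
print(stats)
```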


Relationship

  • Relationship between the fitting function and the speedup/area ratio of the instruction


Phase III: Heuristic Algorithm (Part II)

  • For selecting extensible instructions

  • Uses the potential speedup/area ratio, PSAR, of an extensible instruction in a processor
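
One way to picture a ratio-driven selection is the greedy sketch below. The PSAR formula is not reproduced in this transcript, so the ratio here is assumed to be cycles saved divided by added area. The greedy loop under an area budget is a swapped-in illustration, not the heuristic from the presentation. Instruction names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtensibleInstruction:
    name: str
    cycles_saved: int  # hypothetical: potential speedup, in cycles
    area_gates: int    # hypothetical: added area, in gates


def psar(instr):
    # Assumption: potential speedup divided by area.
    return instr.cycles_saved / instr.area_gates


def select_instructions(library, area_budget):
    """Greedily take the highest-ratio instructions that still fit the budget."""
    chosen, used = [], 0
    for instr in sorted(library, key=psar, reverse=True):
        if used + instr.area_gates <= area_budget:
            chosen.append(instr)
            used += instr.area_gates
    return chosen


# Illustrative library; none of these instructions come from the slides.
library = [
    ExtensibleInstruction("mac16", 400_000, 5_000),
    ExtensibleInstruction("bitrev", 90_000, 1_000),
    ExtensibleInstruction("sad8", 300_000, 6_000),
]
selected = select_instructions(library, area_budget=7_000)
print([i.name for i in selected])
```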


Phase IV: Performance Estimation

  • For rapidly estimating the execution time

  • Computes the execution time estimation, ETE, for an extensible processor with a set of selected extensible instructions
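
A back-of-the-envelope version of such an estimate is sketched below. The ETE formula is not reproduced in this transcript; the sketch assumes the estimate is the profiled base cycle count minus the cycles each selected instruction saves, divided by the clock frequency. It combines profiling information with synthesized instruction latencies, but every name and number in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectedInstruction:
    name: str
    exec_count: int       # from profiling: times the replaced segment runs
    software_cycles: int  # from profiling: cycles per run on the base processor
    hardware_cycles: int  # from synthesis: latency of the extensible instruction


def estimate_execution_time(base_cycles, instructions, clock_hz):
    """Estimated run time in seconds under the assumptions stated above."""
    saved = sum(
        i.exec_count * (i.software_cycles - i.hardware_cycles)
        for i in instructions
    )
    return (base_cycles - saved) / clock_hz


# Illustrative figures only.
mac = SelectedInstruction("mac16", exec_count=100_000,
                          software_cycles=12, hardware_cycles=2)
ete = estimate_execution_time(base_cycles=10_000_000,
                              instructions=[mac], clock_hz=100_000_000)
print(ete)
```

An estimate like this avoids re-simulating the whole application for every candidate configuration, which is where the reduction in design turnaround time would come from.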


Experimental Methodology


Results


Results (Design Turnaround Time)


Results (Pareto Points)


Results of INSIDE system

  • Mediabench applications

  • Speedup of the applications:

    • On average 2.03x (up to 7x)

  • Hardware overheads:

    • On average 25% (up to 72%)

  • Pareto points:

    • Obtained on average 83% (up to 100%)

    • On average within 5% of the execution time


The Aim of MINCE

  • Matching INstructions using Combinational Equivalence

  • Automatically matches code segments to pre-synthesized specific instructions.

  • The slide labels the code segment and the pre-synthesized instruction below as functionally equivalent.

// Application in C
int main() {
    … …
    // Code segment
    for (x = 0; x < 100000; x++) {
        tmp = a[x] * b[x] << 4;
        total += tmp;
    }
    … …
}

// Pre-synthesized specific instruction
state total 32
iclass ei EI {out arr, in art, in ars} {in state}
reference EI {
    wire [31:0] tmp;
    assign tmp = TIEmul(art, ars, 1'b0) >> 4;
    assign arr = tmp + state;
}


Motivation

  • MINCE tool: matching designed extensible instructions to the code segment

  • Application in C/C++

  • Design Constraints (Performance, Area, Power Consumption)

  • Compilation

  • Analysis and Profiling

  • Identifying the “hotspots” of the application through simulation, profiling, and trace

  • Designed Extensible Instruction Library

  • Synthesizable RTL of base processor and extensible instructions

  • Designing extensible instructions for the “hotspots”

  • Explore extensible processor design space

  • Testing/verifying the functionality, speedup, area, and power consumption of extensible instructions

  • Generate extensible processor

  • Selecting the extensible instructions based on the design constraints

  • Prototyping

  • Synthesis and tape out

  • OK? (Yes / No)


MINCE Tool

  • An automated tool that matches pre-synthesized extensible instructions to functionally equivalent code segments, using combinational equivalence checking, in the extensible processor platform.

MINCE Tool

  • MINCE consists of:

    • A translator

    • A filtering algorithm

    • A combinational equivalence checking tool


Related Work

  • Simulation techniques:

    [Stadler 1999]

  • Graph matching techniques:

    [Corazao 1993], [Kang 1995], [Liem 1994], [Shu 1996]

  • Equivalence verification:

    [Clarke 2003], [Pnueli 1998], [Semeria 2002]


Our Contributions

  • Enhances reusability of the extensible instructions.

  • The MINCE tool is superior to computation-intensive and error-prone simulation approaches.

  • The use of functional equivalence checking ensures that the results are largely independent of the programming style of the application.


Phase I: Translator

  • The goal of the translator is to convert the application written in C/C++ into a set of code segments in Verilog HDL using a systematic approach.

  • To reduce the granularity of the application written in C/C++.

  • The translator consists of four steps:

    • Separate the application into code segments;

    • Compile code segments;

    • Convert to register transfer list;

    • Map to Verilog HDL file.


Translator


Phase II: Filtering Algorithm

  • The goal of the filtering algorithm is to eliminate unnecessary and complex Verilog HDL files before they reach the combinational equivalence checking model.

  • A Verilog HDL file can be pruned as a non-match when it has:

    • A differing number of ports;

    • Differing port sizes;

    • An insufficient number of base hardware modules to represent a complex module.
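
The three pruning rules can be sketched as a cheap pre-check that runs before any equivalence checking. This is an illustration under assumptions: the `ModuleSignature` summary, its fields, and the module names are hypothetical. The third rule is read here as "the code segment must contain at least as many base modules of each kind as the instruction needs", which is one possible interpretation of the slide.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ModuleSignature:
    """Hypothetical summary of one Verilog module (code segment or instruction)."""
    name: str
    input_widths: tuple
    output_widths: tuple
    base_modules: Counter = field(default_factory=Counter)


def may_match(segment, instruction):
    # Rule 1: differing number of ports.
    if (len(segment.input_widths) != len(instruction.input_widths)
            or len(segment.output_widths) != len(instruction.output_widths)):
        return False
    # Rule 2: differing port sizes (compared independently of port order).
    if (sorted(segment.input_widths) != sorted(instruction.input_widths)
            or sorted(segment.output_widths) != sorted(instruction.output_widths)):
        return False
    # Rule 3 (assumed reading): not enough base hardware modules.
    return all(segment.base_modules[m] >= n
               for m, n in instruction.base_modules.items())


def filter_candidates(segments, instruction):
    """Keep only the segments worth passing to equivalence checking."""
    return [s for s in segments if may_match(s, instruction)]


instruction = ModuleSignature("EI", (32, 32, 32), (32,),
                              Counter(mul=1, shift=1, add=1))
segments = [
    ModuleSignature("seg_a", (32, 32, 32), (32,), Counter(mul=1, shift=1, add=1)),
    ModuleSignature("seg_b", (32, 32), (32,), Counter(mul=1, add=1)),
    ModuleSignature("seg_c", (32, 32, 16), (32,), Counter(mul=1, shift=1, add=1)),
    ModuleSignature("seg_d", (32, 32, 32), (32,), Counter(add=2)),
]
survivors = filter_candidates(segments, instruction)
print([s.name for s in survivors])
```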


Combinational Equivalence Checking

  • Based on Binary Decision Diagrams (BDDs)

  • Uses Verification Interacting with Synthesis (VIS), which was jointly developed by the University of California, Berkeley, and the University of Colorado, Boulder.
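
The idea behind BDD-based equivalence checking can be shown with a toy reduced ordered BDD. This sketch does not use VIS and is not how a production checker builds its diagrams: it enumerates every input assignment, so it only suits a handful of variables. It relies on the fact that, for a fixed variable order, two Boolean functions are equivalent exactly when they reduce to the same BDD node. All function names are hypothetical.

```python
def build_robdd(func, num_vars, unique, assignment=()):
    """Return the node id of func restricted by assignment (Shannon expansion)."""
    if len(assignment) == num_vars:
        return 1 if func(*assignment) else 0  # terminal nodes 0 and 1
    low = build_robdd(func, num_vars, unique, assignment + (False,))
    high = build_robdd(func, num_vars, unique, assignment + (True,))
    if low == high:
        return low  # reduction: the test on this variable is redundant
    key = (len(assignment), low, high)  # (variable index, low child, high child)
    if key not in unique:
        unique[key] = len(unique) + 2  # reduction: share identical subgraphs
    return unique[key]


def equivalent(f, g, num_vars):
    # A shared unique table makes node ids canonical across both functions.
    unique = {}
    return build_robdd(f, num_vars, unique) == build_robdd(g, num_vars, unique)


def f(a, b, c):
    return (a and b) or (a and c)


def g(a, b, c):
    return a and (b or c)  # same function, written differently


def h(a, b, c):
    return a or (b and c)  # a different function


print(equivalent(f, g, 3), equivalent(f, h, 3))
```

Because `f` and `g` collapse to the same node despite their different structure, the check does not depend on how the logic was written.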


Results

Legend:

  • EM: Exact match

  • FE: Functional equivalence

  • DM: I/O match only

  • TW: Do not match


Results


Future Plan

  • INSIDE system

  • Define: 1) P.B., 2) E.I., 3) P.S.

  • Select: 1) P.B., 2) E.I., 3) P.S.

  • Application in C/C++

  • Compilation

  • Analysis and Profiling

  • Synthesizable RTL of: base processor, predefined blocks, extensible instructions

  • Instruction Library

  • Identify: 1) predefined blocks (P.B.), 2) extensible instructions (E.I.), 3) parameter settings (P.S.)

  • Fitting function: finds suitable code segments to implement as extensible instructions

  • Use libraries of pre-configured processors and instructions to limit the design space

  • Generate extensible processor

  • Generate Extensible Instruction Automatically

  • MINCE tool: match code segment to instruction

  • Prototyping

  • Synthesis and tape out

  • Performance Estimation Model: 1) information from profiling, 2) synthesized information of the instruction

  • Area, Latency, Power Model

  • OK? (Yes / No)


Conclusion

  • The extensible processor platform allows three architectural levels to be tuned for a specific application:

    • Inclusion/exclusion of predefined blocks

    • Instruction extension

    • Parameterization

  • The goal of the research is to automate the design flow of the extensible processor platform.

  • INSIDE system / MINCE tool
