FlexCC2 : An Optimizing Retargetable C Compiler for DSP Applications

FlexCC2 : An Optimizing Retargetable C Compiler for DSP Applications . V. Bertin, J-M.Daveau, P. Guillaume, D. Pilat, C.Robine, M. Santana, T. Théry FlexWare Embedded System Technology. Plan. Context Goals FlexCC2 architecture optimizations Results Conclusion.

Plan
  • Context
  • Goals
  • FlexCC2
    • architecture
    • optimizations
  • Results
  • Conclusion
Context: Industrial Compiler
  • Enabling technology for embedded processors (ASIP / AS-DSP).
  • Instructions and features specific to certain classes of applications.
  • Loop-intensive code.
  • Performance concentrated in small portions of critical code.
  • Productivity
  • Time-to-market
  • Retargetability
[Diagram labels: Digital imaging, MP3, Hard-disk, Mobile, Embedded System, Embedded software]

Goals
  • High-quality generated code
    • best in class for DSP compilers
    • remove any incentive to hand-code in assembly.
  • Irregular target architectures
    • encoding constraints
    • irregular instruction-level parallelism
    • register-set constraints
  • Specific instructions and features
    • hardware loops, multiply-accumulate, addressing modes, post-operations, …
  • Short retargeting time
    • Shorten time-to-market for new processors
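
To make the "specific instructions and features" goal concrete, here is a minimal C sketch of the kind of kernel such features target. The function name `dot_q15` and the 16-bit fixed-point data are assumptions for illustration and do not come from the slides. The loop body is a single multiply-accumulate and the trip count is known on entry, which is the shape a DSP compiler could map to a MAC instruction inside a hardware loop.

```c
#include <stdint.h>

/* Hypothetical kernel: the name and data format are assumptions, not from the slides.
 * Each iteration performs one multiply-accumulate, and the trip count n is
 * known before the loop starts, so a hardware loop needs no per-iteration
 * compare-and-branch. */
int32_t dot_q15(const int16_t *x, const int16_t *y, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++) {
        /* Candidate for a single MAC instruction on a DSP target. */
        acc += (int32_t)x[i] * (int32_t)y[i];
    }
    return acc;
}
```

For example, `dot_q15` applied to {1, 2, 3} and {4, 5, 6} accumulates 4 + 10 + 18.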
FlexCC2 Overall Design
  • Flexible compilation framework:
    • Easily add/remove generic/custom optimizations.
    • Re-order optimizations.
    • Retargetable compilation system.
  • Multi-level framework
    • Machine level optimizations.
    • Multi-level optimizations.
  • DSP oriented
    • Support for DSP datatypes and operations
    • High added-value DSP optimizations
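
One way to picture a framework in which optimizations can be added, removed, and re-ordered is a table of pass functions run in sequence. The sketch below is only an illustration under that assumption: the `Unit` and `Engine` types and the `engine_*` functions are hypothetical and are not FlexCC2's actual interfaces, which the slides do not show.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the IR of one compilation unit; it only records
 * which engines ran, in which order. */
typedef struct { char trace[64]; } Unit;

/* Hypothetical engine signature: every optimization has the same shape. */
typedef void (*Engine)(Unit *u);

/* Append a tag to the trace without overflowing the buffer. */
void note(Unit *u, const char *tag)
{
    strncat(u->trace, tag, sizeof u->trace - strlen(u->trace) - 1);
}

void engine_cse(Unit *u)      { note(u, "cse;"); }
void engine_hwloop(Unit *u)   { note(u, "hwloop;"); }
void engine_schedule(Unit *u) { note(u, "sched;"); }

/* Run the engines in the order given by the flow table. Adding, removing,
 * or re-ordering an optimization is an edit to the table, not to this code. */
void run_flow(Unit *u, const Engine *flow, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        flow[i](u);
    }
}
```

With this design the pass order is data rather than code, which is roughly the role an engine flow description would play.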
FlexCC2 Architecture

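[Diagram: compilation flow from .c to .asm through a High Level IR and a Low Level IR.]
  • Passes shown: anc0, cse, arT, HW Loops, lower, code generation, HW Loops, software pipeliner, local scheduler, global scheduler, register allocator, post operation
  • Description files shown: EDL, TDF, CGD, EDF, SDF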

High-level framework: CoSy®

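[Diagram: CoSy flow from .c to .asm, with a Front End, a Back End, BEG, and the CCMIR representation.]
  • Engines shown: anc0, cse, strength, chainflow, lowering engine, gra, sched, match, emit
  • EDL: Engine Description List
  • TDF: Target Description File
  • CGD: Code Generator Description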

Specific DSP optimizations

[Diagram: loops + arrays are transformed into loops + pointers. Stages shown: loop analysis, Induction Expressions (IEs), partitioning into connivance sets (parallel evolution of references), sets manipulation, pointers generation (1 set = 1 pointer).]

Description of addressing resources & operations:

ADDRESS {
Ax[1..6];
…}

OPERATIONS {
Ax:++;
Ax[1]+=2;
…}

Array to pointer transformation

Support for hardware-do loops

Intrinsic functions recognition and replacement

  • Group access operations into families
  • Optimize address modes
  • Use index registers
  • Handle loop nesting
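
The array to pointer transformation listed on this slide can be pictured with a before/after pair in C. This is a hand-written sketch, not compiler output: the function names and the correlation-style kernel are assumptions chosen so that one access family walks forward and another walks backward. Each family becomes one pointer with a constant step, which maps naturally onto post-increment and post-decrement addressing modes.

```c
#include <stdint.h>

/* Array form: every access is written as base + index. */
int32_t corr_array(const int16_t *x, const int16_t *h, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc += (int32_t)x[i] * h[n - 1 - i];
    }
    return acc;
}

/* Pointer form: one pointer per access family, each updated by a constant
 * step after use (px by +1, ph by -1). */
int32_t corr_pointer(const int16_t *x, const int16_t *h, int n)
{
    if (n <= 0) {
        return 0;   /* avoid forming h + n - 1 for an empty range */
    }
    int32_t acc = 0;
    const int16_t *px = x;           /* walks forward  */
    const int16_t *ph = h + n - 1;   /* walks backward */
    for (int i = 0; i < n; i++) {
        acc += (int32_t)*px++ * *ph--;
    }
    return acc;
}
```

Both functions compute the same sum; only the addressing differs.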
EliXir Back-end Infrastructure

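[Diagram: EliXir back end built around the Low Level IR and the EliXir API.]
  • Engines shown: software pipeliner, local scheduler, global scheduler, register allocator, post operation, HW Loops
  • APIs shown: dataflow API, scheduling API, reg. alloc. API, soft. pipe. API
  • Other labels: microengines, µ-engine, Chaining, dwarf, liveness
  • Description files shown: SDF, EDF, Machine Description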

Engines Flow

[Diagram: engine flow from the Code Generator to the output assembly file, with pre-allocation optimizations, Register Allocation, and post-allocation optimizations.]
  • Engines shown: Dataflow, Peephole, Hwloop, Dominator, Paths, Super Blocker, Liveness, Coalesce, Scheduler (twice), Software Pipelining (twice), Post-Op (twice), ASMdump

Register Allocation Framework

[Diagram: layered framework of microengines and C++ classes.]
  • Allocators and coalescer: Briggs Allocator, Callahan Allocator, Conservative Coalescer
  • APIs: allocation API, Briggs API, Targeting API, low level API
  • Managers: Spill Manager / Optimizer, Shuffle code Manager
  • Other components: Interference Graph, RegsetGroup, RegId, SSA, StackInfo, Dependencies, LoopTree
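
Since the framework names a Briggs allocator and an interference graph, a small sketch of Briggs-style optimistic graph coloring may help. It is a simplified illustration only: the function `color_graph`, the adjacency-matrix representation, and the highest-degree spill heuristic are assumptions, and it omits coalescing, spill code, and the register-set constraints that the slides emphasize.

```c
#include <stdbool.h>

#define MAX_NODES 8

/* Colors an interference graph with k registers (hypothetical helper).
 * color[v] receives 0..k-1, or -1 if v must be spilled.
 * Returns the number of spilled nodes. Assumes n <= MAX_NODES. */
int color_graph(int n, bool adj[MAX_NODES][MAX_NODES], int k, int color[MAX_NODES])
{
    int stack[MAX_NODES];
    int top = 0;
    bool removed[MAX_NODES] = { false };

    /* Simplify: remove one node per step, preferring a node with fewer
     * than k remaining neighbours. If none exists, push the highest-degree
     * node anyway (optimistic: it may still get a color later). */
    for (int step = 0; step < n; step++) {
        int pick = -1, pick_deg = -1;
        for (int v = 0; v < n; v++) {
            if (removed[v]) continue;
            int deg = 0;
            for (int w = 0; w < n; w++)
                if (w != v && !removed[w] && adj[v][w]) deg++;
            if (deg < k) { pick = v; break; }
            if (deg > pick_deg) { pick = v; pick_deg = deg; }
        }
        removed[pick] = true;
        stack[top++] = pick;
    }

    for (int v = 0; v < n; v++) color[v] = -2;   /* -2: not yet reinserted */

    /* Select: pop nodes and give each the lowest color unused by its
     * already-colored neighbours; spill only if no color is left. */
    int spilled = 0;
    while (top > 0) {
        int v = stack[--top];
        bool used[MAX_NODES] = { false };
        for (int w = 0; w < n; w++)
            if (w != v && adj[v][w] && color[w] >= 0) used[color[w]] = true;
        color[v] = -1;
        for (int c = 0; c < k && c < MAX_NODES; c++)
            if (!used[c]) { color[v] = c; break; }
        if (color[v] < 0) spilled++;
    }
    return spilled;
}
```

On a 4-node cycle with two registers, no node has degree below 2, yet the optimistic push still lets every node be colored; a triangle with two registers needs one spill.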

Processor Specific Instr./Features
  • Managed as target specific or generic
    • Intrinsics recognition and replacement.
    • Post operation, post increment.
  • Mainly handled by a specific engine or µ-engine.
  • Some optimizations require retargeting.
  • Make use of various EliXir APIs (dataflow graph, scheduling, …).
Intrinsics Recognition & Replacement

  • Complex expressions
  • Multi-statements

[Diagram: C instruction patterns are turned into a graph and pattern-matched against the Control Flow Graph (if / then / else) and the Expression Trees.]

C source:

if (a > b)
max = a;
else
max = b;

Recognized as:

max = L_max(a, b);

Unoptimized ASM:

cmp r1,r2
move r2,r3
move if(ge),r1,r3

Optimized ASM:

max r1,r2,r3
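
The equivalence behind this replacement can be checked in plain C. In the sketch below, `L_max` is modeled as an ordinary inline function; that definition is an assumption for illustration, since on the real target the intrinsic would correspond to the single `max` instruction shown on the slide. The functions `max_branchy` and `max_intrinsic` are hypothetical names for the before and after forms.

```c
#include <stdint.h>

/* Hypothetical C model of the L_max intrinsic named on the slide. */
static inline int32_t L_max(int32_t a, int32_t b)
{
    return a > b ? a : b;
}

/* Source form: the if/else pattern that the compiler recognizes. */
int32_t max_branchy(int32_t a, int32_t b)
{
    int32_t max;
    if (a > b)
        max = a;
    else
        max = b;
    return max;
}

/* Replaced form: a single intrinsic call, with no control flow left. */
int32_t max_intrinsic(int32_t a, int32_t b)
{
    return L_max(a, b);
}
```

Because the two forms agree for all inputs, the replacement removes a branch without changing the result.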

Dataflow Peephole

rep L14, r5h
L12:
ldx_f ax1,r4h
ldx_f ax2,r1h
ldx_f axx1,r0h
L_fmul r0h,r1h,r3
dmv r4h,r1h
fmul r0h,r1h,r0h
X_deplsp r0h,r0
L_addsat r3,r0,r3
mea axx1,++#1
mea ax2,++#-1
mea ax1,++#-1
L14:

[Diagram: the Dataflow Graph of the loop body (Def-Use, Liveness) is pattern-matched against a graph of instruction patterns.]

Instructions highlighted in the dataflow graph:

ldx_f ax1,r4h
dmv r4h,r1h
mea ax1,++#-1

Resulting instruction:

ldx_f ax1--,r4h
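
The rewrite on this slide folds a load and a later update of the same address register into one load with a post-operation. The sketch below shows that idea on a toy instruction list. It is a deliberately simplified linear scan, not the graph-based pattern matcher the slide describes, and the `Insn` structure, the opcode names, and `fold_post_ops` are all hypothetical. It also assumes address and data registers are separate classes, so only `addr_reg` is checked for conflicts.

```c
#include <string.h>

typedef enum { OP_LOAD, OP_ADDR_ADD, OP_LOAD_POST, OP_OTHER } Opcode;

/* Toy instruction (hypothetical model, not the EliXir IR). */
typedef struct {
    Opcode op;
    int addr_reg;   /* address register read or updated; -1 if none */
    int data_reg;   /* data register written; -1 if none */
    int step;       /* constant added to addr_reg, if any */
} Insn;

/* Folds "load via aN ... aN += step" into "load via aN, post-update by step"
 * when no instruction in between touches aN. Returns the new length. */
int fold_post_ops(Insn *code, int n)
{
    for (int i = 0; i < n; i++) {
        if (code[i].op != OP_LOAD) continue;
        for (int j = i + 1; j < n; j++) {
            if (code[j].addr_reg != code[i].addr_reg) continue;  /* unrelated */
            if (code[j].op == OP_ADDR_ADD) {
                code[i].op = OP_LOAD_POST;
                code[i].step = code[j].step;
                /* Delete the now-redundant address update. */
                memmove(&code[j], &code[j + 1],
                        (size_t)(n - j - 1) * sizeof *code);
                n--;
            }
            break;  /* the first instruction touching the register decides */
        }
    }
    return n;
}
```

Modeling the slide's example as a load through `ax1`, an unrelated `dmv`, and an update of `ax1` by -1, the pass leaves two instructions, the first being a load with a post-update of -1.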

Retargeting FlexCC2

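[Diagram: inputs written when retargeting, and the compiler stages they drive.]
  • Description files: Machine description (SDF), Code generation rules (CGD), Engine flow (EDL), µ-engine flow (EDF)
  • Also shown: C++ API, BEG, Intr. patterns
  • Stages shown: High level IR, Lower, Lowered IR, Code generation engines, Low level IR, (µ-) engines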

Results
  • MMDSP+ single MAC DSP core.
  • Retargeting time  4 months.
  • ETSI Enhanced Full Rate benchmark (EFR).
Results

[Chart: series labels only; values are not recoverable from the transcript.]
  • CoSy
  • EliXir + HWLoops + arT
  • Software pipelining + post-op
  • Register Allocation

Original research work
  • FlexCC2 includes advanced in-house research work
    • arT / GarT.
    • flexible back-end infrastructure
    • retargetable register allocation for irregular architecture.
    • retargetable dataflow peephole optimizer.
    • automatic intrinsic functions recognition.
    • MMX optimization using pattern matching
Future work
  • Interprocedural optimizations.
  • Aliasing.
  • Memory placement.
  • MMX optimization using pattern matching.
  • Interaction between scheduling and register allocation.
  • Improved retargetability.
Conclusion
  • Keystone for embedded software development
    • Synthesizing application code into the processor instruction set (I/S)
      • Exploiting processor features
      • Optimizing code and resource usage
    • Driving processor architecture evolution
  • Modular and extensible compiler framework
    • At high and low level.
    • State of the art optimizations.
    • Advanced DSP optimizations.
    • Target specific optimizations.
  • Short retargeting time.
  • Perspectives: compiler as a CAD tool for SoCs