A parameterized dataflow language extension for embedded streaming systems

A Parameterized Dataflow Language Extension for Embedded Streaming Systems. Yuan Lin (1), Yoonseo Choi (1), Scott Mahlke (1), Trevor Mudge (1), Chaitali Chakrabarti (2). (1) Advanced Computer Architecture Lab, University of Michigan at Ann Arbor


A Parameterized Dataflow Language Extension for Embedded Streaming Systems

Yuan Lin (1), Yoonseo Choi (1), Scott Mahlke (1), Trevor Mudge (1), Chaitali Chakrabarti (2)

(1) Advanced Computer Architecture Lab, University of Michigan at Ann Arbor

(2) Department of Electrical Engineering, Arizona State University


Embedded Streaming Systems

  • Mobile computing: multimedia anywhere at anytime

  • Many of its key workloads are embedded streaming systems

    • Video/audio coding (e.g. H.264)

    • Wireless communications (e.g. W-CDMA)

    • 3D graphics

Cell phones are getting more complex

PCs are getting more mobile


Characteristics of Streaming Systems

  • Data are processed in a pipeline of DSP algorithm kernels

  • Mostly vector/matrix-based data computation

  • Periodic system reconfigurations

    • e.g. changing from voice communication to data communication

[Figure: W-CDMA physical layer processing. Transmitter: channel encoder, interleaver, spreader, scrambler, LPF-Tx. Receiver: LPF-Rx, searcher, descrambler/despreader pairs, combiner, interleaver, channel decoder (Viterbi/Turbo). The diagram also shows the analog frontend and the upper layer.]


Embedded DSP Processors

  • Current trend: multi-core DSPs for streaming applications

    • IBM Cell processor

    • TI OMAP

    • Many other SoCs

  • Common hardware characteristics

    • Multiple (potentially heterogeneous) data engines (DEs)

    • Software-managed scratchpad memories

    • Explicit DMA transfer operations

[Figure: Our DSP case study: SODA, a multi-core DSP processor. An ARM and global memory are connected to four DEs, each with its own local memory and SIMD unit.]


Programming Challenge

  • How to automatically compile streaming systems onto multi-core DSP hardware?

    • How to divide the system into multiple threads?

    • VLIW execution scheduling?

    • When and where to issue DMA transfers?

    • How to manage the local and global memory?

    • How to SIMDize DSP kernels?

    • Who does the execution scheduling?

    • And many other problems.

[Figure: application source code to be mapped onto a multi-core DSP with an ARM, global memory, and four DEs, each with local memory and a SIMD unit.]


Compiling for Multi-core DSPs

  • Two-tier compilation approach

  • This presentation is focused on system-level language & compilation

  • Compiling functions, not instructions

[Figure: two-tier compilation. The W-CDMA transmitter and receiver pipeline, with kernels such as void Turbo() { ... }, is mapped onto the SODA system architecture: an ARM, global memory, and four PEs, each with local memory and an execution unit. The PE detail shows a 32-lane SIMD ALU with SIMD register file and SIMD data memory, a 32-lane SSN, a 16-bit scalar ALU with scalar register file and scalar data memory, and SIMD-to-scalar paths.]


System Compilation Overview

  • Coarse-grained compilation

    • Function-level, not instruction-level

    • C/C++-to-C compiler

  • SPEX: Signal Processing EXtension

    • Our high-level language extension

  • Frontend compilation

    • Translate from SPEX into SPIR

  • SPIR: Signal Processing IR

    • System compiler’s IR

    • Models function-level interactions

  • Backend compilation

    • Function-level compilation

    • Generate multi-threaded C code

[Figure: compilation flow. SPEX source code passes through the frontend to SPIR, then through the backend to code for the DEs (DE0, ...) and the ARM.]



SPIR: Function-level IR

  • Must capture streaming applications’ system-level behavior

  • Based on the dataflow computation model

    • Good for modeling streaming computations

    • Easy to generate parallel code

  • But which dataflow model?

[Figure: SPIR shown as a dataflow graph of nodes connected by FIFO buffers, placed between the frontend and the backend of the compilation flow.]


Synchronous Dataflow

  • Synchronous dataflow (SDF)

    • Simplest dataflow model

    • Static dataflow

    • No conditional dataflow allowed

  • Pros

    • Efficiency: the execution schedule can be generated at compile time

    • Optimality: We know how to compile SDFs for multi-processor DSPs

      • Berkeley Ptolemy project, MIT StreamIt compiler

  • Cons

    • Lack of flexibility: Cannot describe run-time reconfigurations in stream computations

[Figure: an SDF node with input_rate = 2 and output_rate = 3.]
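The compile-time scheduling point can be made concrete with the SDF balance equation. The following is a minimal sketch, assuming a single edge whose producer emits `produce` tokens per firing and whose consumer takes `consume` tokens per firing. The names `Repetitions`, `greatest_common_divisor`, and `balance` are illustrative and are not part of SPEX or SPIR.

```cpp
#include <cassert>

// Firings per schedule period for the two nodes on one SDF edge.
struct Repetitions {
    int producer;
    int consumer;
};

// Euclid's algorithm for positive integers.
int greatest_common_divisor(int a, int b) {
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Solve producer * produce == consumer * consume for the smallest
// positive integers, so the FIFO returns to its initial fill level
// after one period of the schedule.
Repetitions balance(int produce, int consume) {
    assert(produce > 0 && consume > 0);
    int g = greatest_common_divisor(produce, consume);
    return Repetitions{consume / g, produce / g};
}
```

For an edge that connects a node producing 3 tokens per firing to a node consuming 2 tokens per firing, `balance(3, 2)` gives 2 producer firings and 3 consumer firings per period. Because the rates are constants, this calculation needs no run-time information.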


Parameterized Dataflow

  • Parameterized dataflow (PDF)

    • Use parameters to model run-time system reconfiguration

    • Each parameter is a variable with a finite set of discrete values

  • Parameterized attributes in SPIR

    • Dataflow rates

First proposed by: B. Bhattacharya and S. S. Bhattacharyya, “Parameterized Dataflow Modeling for DSP Systems,” IEEE Transactions on Signal Processing, Oct. 2001

[Figure: a PDF node with input_rate = {1, 4, 8} and output_rate = {2, 8}.]
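Because each parameter ranges over a finite set, every configuration of a node can be listed ahead of time. The sketch below is one way to represent such a parameter and enumerate the rate pairs from the example above. The `Parameter` type and the `configurations` function are hypothetical names for illustration, not SPIR data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// A PDF parameter: a variable restricted to a finite set of values.
struct Parameter {
    std::vector<int> allowed;
    std::size_t current;   // index into `allowed`

    int value() const { return allowed[current]; }

    // Select a value; returns false and keeps the old value if the
    // requested value is not in the allowed set.
    bool set(int v) {
        for (std::size_t i = 0; i < allowed.size(); ++i) {
            if (allowed[i] == v) {
                current = i;
                return true;
            }
        }
        return false;
    }
};

// Every (input_rate, output_rate) pair a node can be configured with.
std::vector<std::pair<int, int> > configurations(const Parameter& in,
                                                 const Parameter& out) {
    assert(!in.allowed.empty() && !out.allowed.empty());
    std::vector<std::pair<int, int> > result;
    for (std::size_t i = 0; i < in.allowed.size(); ++i)
        for (std::size_t o = 0; o < out.allowed.size(); ++o)
            result.push_back(std::make_pair(in.allowed[i], out.allowed[o]));
    return result;
}
```

With input rates {1, 4, 8} and output rates {2, 8}, the enumeration yields six pairs. Once a pair is selected, the node behaves like an ordinary SDF node.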


    • Conditional dataflow

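[Figure: conditional dataflow with if_cond = {true, false}. A pair of IF nodes encloses an "if" branch node and an "else" branch node. Edge rate labels in the figure: {1,4,8}, {2,8}, {2,4}, {6,8}.]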


    • Number of dataflow actors

[Figure: a split node fans out to A[0], A[1], ..., A[n], whose outputs feed a merge node. Number of A nodes = {1, 4, 12}.]
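The split/merge pattern with a parameterized number of actors can be sketched as follows. This is an illustration only: it assumes round-robin distribution, and `actor_a` is a placeholder computation, since the slides do not say how tokens are distributed or what A computes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The work done by one copy of actor A (a placeholder computation).
int actor_a(int x) { return x + 1; }

// Split the input round-robin across n copies of A, then merge the
// per-copy outputs back into the original token order.
std::vector<int> split_run_merge(const std::vector<int>& input, std::size_t n) {
    assert(n > 0);
    // split: token i goes to lane i % n, where copy A[i % n] processes it
    std::vector<std::vector<int> > lanes(n);
    for (std::size_t i = 0; i < input.size(); ++i)
        lanes[i % n].push_back(actor_a(input[i]));
    // merge: token i is the (i / n)-th result in lane i % n
    std::vector<int> output;
    for (std::size_t i = 0; i < input.size(); ++i)
        output.push_back(lanes[i % n][i / n]);
    return output;
}
```

The result is the same for n = 1, 4, or 12. Only the amount of available parallelism changes, which is what makes the actor count usable as a reconfiguration parameter.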


    • Streaming size between reconfigurations

  • There are also other modifications to the dataflow model

    • Please refer to the paper for further details

stream_size = {10k, 20k}


PDF Run-time Execution Model

  • Three-stage run-time execution model

  • Goal: give parameterized dataflow the execution efficiency of synchronous dataflow


  • Stage 1: dataflow initialization

  • Convert the PDF graph into an SDF graph

    • Set parameter variables to constant values

  • Perform other initialization computation


  • Stage 2: dataflow computation

  • Dataflow computation follows static SDF execution schedules

[Figure: stream input enters the dataflow graph and stream output leaves it.]


  • Stage 3: dataflow finalization

  • Update the dataflow states with calculated results
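The three stages can be sketched in code as follows. This is a toy illustration under stated assumptions: the graph has a single parameter (`rate`), the "schedule" is a single node that sums `rate` samples per firing, and the names `GraphState`, `initialize`, `compute`, and `finalize` are hypothetical rather than taken from the SPEX run-time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// State that survives across reconfigurations (illustrative).
struct GraphState {
    std::size_t rate = 1;               // parameter value for the current run
    std::size_t samples_processed = 0;  // running total across runs
};

// Stage 1: bind the parameter to a constant, which turns the PDF
// graph into an SDF graph for the duration of this run.
std::size_t initialize(GraphState& s, std::size_t requested_rate) {
    assert(requested_rate > 0);
    s.rate = requested_rate;
    return s.rate;
}

// Stage 2: run with a fixed rate. Each firing consumes `rate` input
// samples and produces their sum; a trailing partial group is left over.
std::vector<int> compute(std::size_t rate, const std::vector<int>& input) {
    assert(rate > 0);
    std::vector<int> output;
    for (std::size_t i = 0; i + rate <= input.size(); i += rate) {
        int sum = 0;
        for (std::size_t k = 0; k < rate; ++k) sum += input[i + k];
        output.push_back(sum);
    }
    return output;
}

// Stage 3: fold the results of the run back into the graph state.
void finalize(GraphState& s, std::size_t firings) {
    s.samples_processed += firings * s.rate;
}
```

The parameter is read only in stage 1, so stage 2 never has to check for reconfiguration and can follow a schedule fixed in advance.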


System Compilation Frontend

  • Start from a stream system described in C or C++ with SPEX

  • Translate the description into dataflow representation


SPEX

  • Q: Why can’t we compile pure C/C++?

  • A: Some of C/C++’s language features cannot be translated into dataflow

  • e.g. passing pointers as function arguments

    • C/C++: memory reached through a pointer can be both read and written

    • Dataflow: edges are read-only or write-only
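The contrast between pointer arguments and directed edges can be illustrated with two small wrapper types. This is a sketch, not the SPEX API: `Fifo`, `InEdge`, `OutEdge`, and `scale_by_two` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <deque>

// Shared FIFO storage behind one dataflow edge.
template <typename T>
struct Fifo {
    std::deque<T> items;
};

// Consumer's view of an edge: it can only remove tokens.
template <typename T>
class InEdge {
public:
    explicit InEdge(Fifo<T>& f) : fifo_(f) {}
    bool empty() const { return fifo_.items.empty(); }
    T pop() {
        assert(!fifo_.items.empty());
        T v = fifo_.items.front();
        fifo_.items.pop_front();
        return v;
    }
private:
    Fifo<T>& fifo_;
};

// Producer's view of an edge: it can only append tokens.
template <typename T>
class OutEdge {
public:
    explicit OutEdge(Fifo<T>& f) : fifo_(f) {}
    void push(const T& v) { fifo_.items.push_back(v); }
private:
    Fifo<T>& fifo_;
};

// The direction of each argument is visible in its type, which a
// signature such as `void scale(int* in, int* out)` does not guarantee.
void scale_by_two(InEdge<int> in, OutEdge<int> out) {
    while (!in.empty()) out.push(2 * in.pop());
}
```

A translator that sees only `InEdge` and `OutEdge` arguments can tell which way data flows by looking at the signature. With raw pointers it would have to analyze the function body.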


  • SPEX is a set of keywords and language restrictions

  • A guideline for programmers to write stylized C/C++ code that can be translated into dataflow

    • Dataflow-safe C/C++ programming

  • SPEX code can be compiled directly with g++

    #include <spex_stream.h>    // SPEX definition headers

    class WCDMA: spex_kernel {
        // Functions for declaring dataflow nodes
        pdf_node(interleaver)(...) { ... }
        pdf_node(turbo_dec)(...) { ... }

        // Function for declaring a dataflow graph
        pdf_graph(wcdma_rec)() {
            ...
            interleaver(intlv_to_turbo, intlv_in);
            turbo_dec(turbo_out, intlv_to_turbo);
            ...
        }
    };


SPEX pdf_node Code Snippets

    // in:  read-only input dataflow edge
    // out: write-only output dataflow edge
    pdf_node(fir)(channel<int> in, channel<int> & out) {
        ...
        z[0] = in.pop();            // FIR’s dataflow input
        for (i = 0; i < TAPS; i++) {
            sum += z[i] * coeff[i];
        }
        out.push(sum);              // FIR’s dataflow output
        ...
    }
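To show how such a node could compile as ordinary C++, here is a self-contained sketch with stand-ins for what `spex_stream.h` would provide. The `channel` class, the `pdf_node` macro expansion, the delay-line shift, and the values of `TAPS` and `coeff` are all assumptions made for this example; the actual SPEX definitions are not shown in the slides. The stand-in takes both channels by reference so that `pop` affects the caller's queue.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for the channel type declared by spex_stream.h.
template <typename T>
class channel {
public:
    void push(const T& v) { q_.push_back(v); }
    T pop() {
        assert(!q_.empty());
        T v = q_.front();
        q_.pop_front();
        return v;
    }
    std::size_t size() const { return q_.size(); }
private:
    std::deque<T> q_;
};

// Hypothetical expansion: a node is an ordinary function.
#define pdf_node(name) void name

const int TAPS = 3;                    // example filter length
const int coeff[TAPS] = {1, 2, 3};     // example coefficients

// One firing: shift the delay line, read one sample, write one output.
pdf_node(fir)(channel<int>& in, channel<int>& out) {
    static int z[TAPS] = {0, 0, 0};    // delay line kept across firings
    for (int i = TAPS - 1; i > 0; i--) z[i] = z[i - 1];
    z[0] = in.pop();
    int sum = 0;
    for (int i = 0; i < TAPS; i++) sum += z[i] * coeff[i];
    out.push(sum);
}
```

Feeding the impulse 1, 0, 0 through three firings reproduces the coefficients 1, 2, 3 on the output channel, which is a quick sanity check for an FIR kernel.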


SPEX Code Snippets

    pdf_graph(WCDMA_rec)() {
        // Static PDF node and edge declarations
        FIR fir;
        ...
        channel<int> fir_to_rake;
        ...
        // PDF scope: a PDF graph description
        pdf {
            for (i = 0; i < slot_size; i++) {
                fir.run(fir_to_rake, AtoD);
                rake.run(rake_out, fir_to_rake);
                if (mode == voice)
                    viterbi.run(mac_in, rake_out);
                else
                    turbo.run(mac_in, rake_out);
                mac(mac_in);
            }
        }
    }

    // Descriptions for the dataflow initialization and finalization stages
    pdf_graph_init(WCDMA_rec)() { ... }
    pdf_graph_final(WCDMA_rec)() { ... }

Language restrictions within the PDF scope, e.g.:

  • Must only use for-loop constructions with constant loop bounds

  • Must only include function calls to pdf_node functions

A guideline for writing dataflow-safe C++ code

[Figure: the resulting dataflow graph: fir, rake, an if node that selects between vit and tur, a closing if node, then mac.]


System Compilation Frontend

  • Translate SPEX into parameterized dataflow representation

    • Use traditional control-flow and dataflow analysis

  • Semantic error-checking to ensure dataflow-safe C/C++ code

  • Possible to support other high-level languages


System Compilation Backend

  • Function-level compilation

    • Node-to-DE assignments

    • Memory buffer allocations

    • DMA assignments

  • Function-level optimizations

    • Software pipelining

  • Code generation

    • Parallel thread generation

    • Physical buffer allocation

    • If-conversion and predicate propagation


Conclusion

  • System-level compilation framework

  • We have a working compiler for SPEX

    • Target: SODA-like multi-core DSPs

  • Parameterized dataflow is used as compiler IR

  • SPEX is a set of language extensions for efficient translation from C/C++ into dataflow


Questions

  • www.eecs.umich.edu/~sdrg


Shared Variables in Dataflow

  • Shared variables are not allowed in traditional dataflow models

  • SPIR allows shared variables between dataflow nodes

    • Multi-dimensional streaming patterns

    • Non-sequential streaming patterns

    • Decoupled streaming

    • Shared memory buffers


Backend Compilation

  • Problem with function-level compilation

    • Requires function-level parallelism

    • Wireless protocols do not have many concurrent functions

[Figure: the FIR, Rake, and Turbo kernels processing in[0..N], mapped onto PE0, PE1, and PE2.]


  • Reuse an existing compiler optimization

  • Function-level software pipelining

    • Processing each stream data item is analogous to one loop iteration

    • Modulo scheduling applied to function-level compilation
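The loop-iteration analogy can be sketched as a steady-state pipeline schedule. The example below assumes one stage per processing element and that every stage takes one time step, which is a simplification; `Slot` and `pipeline` are hypothetical names, and a real modulo scheduler would also account for stage latencies and resource constraints.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One scheduled firing: which stage runs on which stream item.
struct Slot {
    std::string stage;
    int item;
};

// Pipeline schedule with one stage per processing element.
// At time step t, stage s works on item t - s, so once the pipeline
// has filled, all stages run concurrently on different items.
std::vector<std::vector<Slot> > pipeline(const std::vector<std::string>& stages,
                                         int items) {
    assert(!stages.empty() && items >= 0);
    const int depth = static_cast<int>(stages.size());
    const int steps = (items == 0) ? 0 : items + depth - 1;
    std::vector<std::vector<Slot> > schedule(steps);
    for (int t = 0; t < steps; ++t) {
        for (int s = 0; s < depth; ++s) {
            int item = t - s;
            if (item >= 0 && item < items)
                schedule[t].push_back(Slot{stages[s], item});
        }
    }
    return schedule;
}
```

With stages FIR, Rake, and Turbo and four input items, step 2 is the first step in which all three stages are busy: FIR on item 2, Rake on item 1, and Turbo on item 0. The whole run takes 4 + 3 - 1 = 6 steps instead of the 12 a fully sequential execution would need under the same unit-time assumption.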

[Figure: function-level software pipelining. FIR, Rake, and Turbo run on PE0, PE1, and PE2 while consecutive inputs in[i], in[i+1], and in[i+2] move through the pipeline.]

