
A Parameterized Dataflow Language Extension for Embedded Streaming Systems

Yuan Lin¹, Yoonseo Choi¹, Scott Mahlke¹, Trevor Mudge¹, Chaitali Chakrabarti². ¹Advanced Computer Architecture Lab, University of Michigan at Ann Arbor


Presentation Transcript


  1. A Parameterized Dataflow Language Extension for Embedded Streaming Systems • Yuan Lin¹, Yoonseo Choi¹, Scott Mahlke¹, Trevor Mudge¹, Chaitali Chakrabarti² • ¹Advanced Computer Architecture Lab, University of Michigan at Ann Arbor • ²Department of Electrical Engineering, Arizona State University

  2. Embedded Streaming Systems • Mobile computing: multimedia anywhere, at any time • Many of its key workloads are embedded streaming systems • Video/audio coding (e.g. H.264) • Wireless communications (e.g. W-CDMA) • 3D graphics • and others… • Cell phones are getting more complex • PCs are getting more mobile

  3. Characteristics of Streaming Systems • Data are processed in a pipeline of DSP algorithm kernels • Mostly vector/matrix-based data computation • Periodic system reconfigurations • e.g. changing from voice communication to data communication [Figure: W-CDMA physical layer processing. Transmitter blocks: channel encoder, interleaver, spreader, scrambler, LPF-Tx. Receiver blocks: LPF-Rx, searcher, descrambler/despreader units, combiner, interleaver, channel decoder (Viterbi/Turbo). The chain sits between the analog frontend and the upper layer.]

  4. Embedded DSP Processors • Current trend: multi-core DSPs for streaming applications • IBM Cell processor • TI OMAP • Many other SoCs • Common hardware characteristics • Multiple (potentially heterogeneous) data engines (DEs) • Software-managed scratchpad memories • Explicit DMA transfer operations • Our DSP case study: SODA, a multi-core DSP processor [Figure: SODA block diagram with an ARM core, global memory, and four DEs, each with a local memory and a SIMD unit]

  5. Programming Challenge • How to automatically compile streaming systems onto multi-core DSP hardware? • How to divide the system into multiple threads? • VLIW execution scheduling? • When and where to issue DMA transfers? • How to manage the local and global memory? • How to SIMDize DSP kernels? • Who does the execution scheduling? • and many other problems… [Figure: program source on one side and the multi-core DSP block diagram (ARM, global memory, four DEs with local memories and SIMD units) on the other]

  6. Compile for Multi-core DSPs • Two-tier compilation approach • This presentation is focused on system-level language & compilation • Compiling functions, not instructions [Figure: the W-CDMA transmitter/receiver pipeline, with kernels such as void Turbo() { ... }, mapped onto the SODA system architecture (ARM, global memory, four PEs with local memories and execution units). Each PE pairs a 32-lane SIMD pipeline (SIMD data memory, SIMD register file, 32-lane SIMD ALU, 32-lane SSN) with a 16-bit scalar pipeline (scalar data memory, scalar register file, 16-bit ALU).]

  7. System Compilation Overview • Coarse-grained compilation • Function-level, not instruction-level • C/C++-to-C compiler • SPEX: Signal Processing EXtension • Our high-level language extension • Frontend compilation • Translate from SPEX into SPIR • SPIR: Signal Processing IR • System compiler’s IR • Models function-level interactions • Backend compilation • Function-level compilation • Generate multi-threaded C code [Figure: compilation flow: SPEX source, frontend, SPIR, backend, output for DE0 and the ARM]


  9. SPIR: Function-level IR • Must capture stream applications’ system-level behaviors • Based on the dataflow computation model • Good for modeling streaming computations • Easy to generate parallel code • But which dataflow model? [Figure: dataflow graph of nodes connected by FIFO buffers]

  10. Synchronous Dataflow • Synchronous dataflow (SDF) • Simplest dataflow model • Static dataflow • No conditional dataflow allowed • Pros • Efficiency: the execution schedule can be generated at compile time • Optimality: we know how to compile SDFs for multi-processor DSPs • Berkeley Ptolemy project, MIT StreamIt compiler • Cons • Lack of flexibility: cannot describe run-time reconfigurations in stream computations [Figure: an SDF node with input_rate = 2 and output_rate = 3]
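The compile-time scheduling claim rests on the SDF balance condition: on every edge, the tokens produced per schedule period must equal the tokens consumed. The sketch below works this out for a single edge only. The helper name `balance_edge` is hypothetical and is not part of SPEX, SPIR, Ptolemy, or StreamIt.

```cpp
#include <cassert>
#include <numeric>   // std::gcd (C++17)
#include <utility>

// Hypothetical helper, not part of SPEX/SPIR: smallest firing counts
// (producer, consumer) that keep one FIFO edge balanced, i.e.
// producer_firings * produce_rate == consumer_firings * consume_rate.
std::pair<int, int> balance_edge(int produce_rate, int consume_rate) {
    assert(produce_rate > 0 && consume_rate > 0);
    const int g = std::gcd(produce_rate, consume_rate);
    return {consume_rate / g, produce_rate / g};
}
```

For a producer pushing 3 tokens per firing into a consumer popping 2, `balance_edge(3, 2)` gives two producer firings and three consumer firings per period, so 6 tokens move in each direction and the buffer size is known statically.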

  11. Parameterized Dataflow • Parameterized dataflow (PDF) • Use parameters to model run-time system reconfiguration • Each parameter is a variable with a finite set of discrete values • Parameterized attributes in SPIR • Dataflow rates • First proposed by: B. Bhattacharya and S. S. Bhattacharyya, “Parameterized Dataflow Modeling for DSP Systems,” IEEE Transactions on Signal Processing, Oct. 2001 [Figure: a node with input_rate = {1, 4, 8} and output_rate = {2, 8}]

  12. Parameterized Dataflow • Parameterized dataflow (PDF) • Use parameters to model run-time system reconfiguration • Each parameter is a variable with a finite set of discrete values • Parameterized attributes in SPIR • Dataflow rates • Conditional dataflow [Figure: conditional dataflow with if_cond = {true, false}: a pair of IF nodes encloses an if node, labeled with rates {1, 4, 8} and {2, 8}, and an else node, labeled with rates {2, 4} and {6, 8}]

  13. Parameterized Dataflow • Parameterized dataflow (PDF) • Use parameters to model run-time system reconfiguration • Each parameter is a variable with a finite set of discrete values • Parameterized attributes in SPIR • Dataflow rates • Conditional dataflow • Number of dataflow actors [Figure: a split node feeding actors A[0], A[1], …, A[n], followed by a merge node; number of A nodes = {1, 4, 12}]
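A parameterized actor count can be pictured as a split/merge pair around n copies of the same actor. The slides do not say how the split distributes data, so the round-robin policy below is an assumption, and `split_apply_merge` is a hypothetical name used only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: distribute a stream round-robin over n copies of
// actor A, apply A to each item, and merge the results back in stream
// order. Round-robin distribution is an assumption, not taken from SPIR.
std::vector<int> split_apply_merge(const std::vector<int>& in, std::size_t n,
                                   int (*actor)(int)) {
    assert(n > 0);
    std::vector<std::vector<int>> lanes(n);
    for (std::size_t i = 0; i < in.size(); ++i)
        lanes[i % n].push_back(actor(in[i]));   // split, then A[i % n] fires
    std::vector<int> out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out.push_back(lanes[i % n][i / n]);     // merge in original order
    return out;
}
```

Under this policy the output is the same for n = 1, 4, or 12; only the amount of available parallelism changes, which is what makes the actor count a natural run-time parameter.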

  14. Parameterized Dataflow • Parameterized dataflow (PDF) • Use parameters to model run-time system reconfiguration • Each parameter is a variable with a finite set of discrete values • Parameterized attributes in SPIR • Dataflow rates • Conditional dataflow • Number of dataflow actors • Streaming size between reconfigurations, e.g. stream_size = {10k, 20k} • There are also other modifications to the dataflow model • Please refer to the paper for further details

  15. PDF Run-time Execution Model • Three-stage run-time execution model • Goal: provide the efficiency of synchronous dataflow execution for parameterized dataflow

  16. PDF Run-time Execution Model • Stage 1: dataflow initialization • Convert the PDF graph into an SDF graph by setting parameter variables to constant values • Perform other initialization computation
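Stage 1 can be read as parameter binding: each parameter is a finite set of allowed values, and fixing one value per parameter leaves an ordinary static node. The types and the function `bind_params` below are hypothetical illustrations, since the slides do not show SPIR's actual data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical types for illustration only.
struct Param {
    std::vector<int> allowed;  // finite set of discrete values, e.g. {1, 4, 8}
};

struct PdfNode {               // node with parameterized rates
    Param input_rate;
    Param output_rate;
};

struct SdfNode {               // node with constant rates
    int input_rate;
    int output_rate;
};

// Returns true if v is one of the parameter's allowed values.
bool is_allowed(const Param& p, int v) {
    return std::find(p.allowed.begin(), p.allowed.end(), v) != p.allowed.end();
}

// Stage 1 sketch: fix each parameter to a constant, yielding a static node.
SdfNode bind_params(const PdfNode& n, int in_rate, int out_rate) {
    assert(is_allowed(n.input_rate, in_rate));
    assert(is_allowed(n.output_rate, out_rate));
    return SdfNode{in_rate, out_rate};
}
```

Because every parameter ranges over a finite set, a compiler could in principle prepare a static schedule for each combination ahead of time; whether the SPEX compiler does so is not stated in the slides.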

  17. PDF Run-time Execution Model • Stage 2: dataflow computation • Dataflow computation follows static SDF execution schedules [Figure: stream input flowing through the dataflow graph to stream output]

  18. PDF Run-time Execution Model • Stage 3: dataflow finalization • Update the dataflow states with calculated results
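Taken together, the three stages form a loop that repeats once per reconfiguration. The skeleton below is a minimal sketch of that control structure; `Stages` and `run` are hypothetical names, and having the initialization stage return the stream size is an assumption made to keep the example small.

```cpp
#include <cassert>
#include <functional>

// Hypothetical runtime skeleton; names are illustrative only.
struct Stages {
    std::function<int()> init;         // stage 1: bind parameters, return stream size
    std::function<void(int)> compute;  // stage 2: one iteration of the static schedule
    std::function<void()> finalize;    // stage 3: update dataflow state
};

// Runs `reconfigurations` rounds of init, repeated compute, then finalize.
void run(const Stages& s, int reconfigurations) {
    assert(reconfigurations >= 0);
    for (int r = 0; r < reconfigurations; ++r) {
        const int stream_size = s.init();
        for (int i = 0; i < stream_size; ++i) s.compute(i);
        s.finalize();
    }
}
```

Parameters change only in `init`, so everything inside the inner loop can follow a fixed SDF schedule.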

  19. System Compilation Frontend • Start from a stream system described in C or C++ with SPEX • Translate the description into dataflow representation

  20. SPEX • Q: Why can’t we compile pure C/C++? • A: Some of C/C++’s language features cannot be translated into dataflow • e.g. passing pointers as function arguments • C/C++: the memory a pointer refers to can be both read and written • Dataflow: each edge is either read-only or write-only

  21. SPEX • SPEX is a set of keywords and language restrictions • A guideline for programmers to write stylized C/C++ code that can be translated into dataflow • Dataflow-safe C/C++ programming • SPEX code can be compiled directly with g++
     #include <spex_stream.h>              // SPEX definition headers
     class WCDMA: spex_kernel {
       pdf_node(interleaver)(...) { ... }  // functions for declaring dataflow nodes
       pdf_node(turbo_dec)(...) { ... }
       pdf_graph(wcdma_rec)()              // function for declaring a dataflow graph
       {
         ...
         interleaver(intlv_to_turbo, intlv_in);
         turbo_dec(turbo_out, intlv_to_turbo);
         ...
       }
     };

  22. SPEX pdf_node Code Snippets
     pdf_node(fir)(channel<int> in,       // read-only input dataflow edge (FIR's dataflow input)
                   channel<int> & out) {  // write-only output dataflow edge (FIR's dataflow output)
       ...
       z[0] = in.pop();
       for (i = 0; i < TAPS; i++) {
         sum += z[i] * coeff[i];
       }
       out.push(sum);
       ...
     }
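To make the pop/push discipline of the FIR node concrete, the sketch below supplies a stand-in FIFO `channel<T>` and writes one firing as a plain function. None of this is the real SPEX header: the slides do not show `spex_stream.h`, so the class body, the tap count, the delay-line shift, and the name `fir_fire` are all assumptions. The stand-in also takes both channels by reference, unlike the slide's by-value input parameter, because a by-value copy of this simple class would not consume from the caller's queue.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for a FIFO dataflow edge; NOT the definition
// from spex_stream.h, which the slides do not show.
template <typename T>
class channel {
public:
    void push(const T& v) { q_.push_back(v); }  // producer side
    T pop() {                                   // consumer side
        assert(!q_.empty());
        T v = q_.front();
        q_.pop_front();
        return v;
    }
    std::size_t size() const { return q_.size(); }
private:
    std::deque<T> q_;
};

constexpr int TAPS = 3;  // assumed tap count for the example

// One firing of an FIR node: pops one sample, pushes one filtered sample.
// z is the delay line and must persist across firings.
void fir_fire(channel<int>& in, channel<int>& out,
              int (&z)[TAPS], const int (&coeff)[TAPS]) {
    for (int i = TAPS - 1; i > 0; --i) z[i] = z[i - 1];  // shift delay line
    z[0] = in.pop();
    int sum = 0;
    for (int i = 0; i < TAPS; ++i) sum += z[i] * coeff[i];
    out.push(sum);
}
```

Each firing consumes exactly one token and produces exactly one, so in dataflow terms this node has input and output rates of 1.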

  23. SPEX Code Snippets
     pdf_graph(WCDMA_rec)() {
       FIR fir;                            // static PDF node and edge declarations
       ...
       channel<int> fir_to_rake;
       ...
       pdf {                               // PDF scope: a PDF graph description
         for (i = 0; i < slot_size; i++) {
           fir.run(fir_to_rake, AtoD);
           rake.run(rake_out, fir_to_rake);
           if (mode == voice)
             viterbi.run(mac_in, rake_out);
           else
             turbo.run(mac_in, rake_out);
           mac(mac_in);
         }
       }
     }
     pdf_graph_init(WCDMA_rec)() { ... }   // dataflow initialization stage
     pdf_graph_final(WCDMA_rec)() { ... }  // dataflow finalization stage
     • Language restrictions within PDF scope (a guideline for writing dataflow-safe C++ code), e.g. • Must only use for-loop constructions with constant loop bounds • Must only include function calls to pdf_node functions [Figure: resulting dataflow graph: fir, rake, a pair of if nodes selecting between vit and tur, then mac]

  24. System Compilation Frontend • Translate SPEX into parameterized dataflow representation • Use traditional control-flow and dataflow analysis • Semantic error-checking to ensure dataflow-safe C/C++ code • Possible to support other high-level languages

  25. System Compilation Backend • Function-level compilation • Node-to-DE assignments • Memory buffer allocations • DMA assignments • Function-level optimizations • Software pipelining • Code generation • Parallel thread generation • Physical buffer allocation • If-conversion and predicate propagation

  26. Conclusion • System-level compilation framework • We have a working compiler for SPEX • Target: SODA-like multi-core DSPs • Parameterized dataflow is used as compiler IR • SPEX is a set of language extensions for efficient translation from C/C++ into dataflow

  27. Questions • www.eecs.umich.edu/~sdrg

  28. Shared Variables In Dataflow • Shared variables are not allowed in traditional dataflow models • SPIR allows shared variables between dataflow nodes • Multi-dimensional streaming patterns • Non-sequential streaming patterns • Decoupled streaming • Shared memory buffers

  29. Backend Compilation • Problem with function-level compilation • Requires function-level parallelism • Wireless protocols do not have many concurrent functions [Figure: FIR, Rake, and Turbo scheduled for inputs in[0..N] across PE0, PE1, and PE2]

  30. Backend Compilation • Utilize existing compiler optimization • Function-level software pipelining • Processing each stream data item is like one loop iteration • Modulo scheduling applied to function-level compilation [Figure: software-pipelined schedule in which FIR, Rake, and Turbo overlap across inputs in[i], in[i+1], and in[i+2] on PE0, PE1, and PE2]
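The loop analogy can be made concrete with a small schedule calculation. The sketch assumes one pipeline stage per PE and equal stage latencies (an initiation interval of one time step), which the slides do not state; `item_at` and `total_steps` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical illustration: with one pipeline stage per PE, the stage at
// position `stage` (0 = FIR, 1 = Rake, 2 = Turbo) works on input item
// `step - stage` during time step `step`. Returns -1 while the pipeline
// is still filling or has already drained for that stage.
int item_at(int step, int stage, int num_items) {
    assert(step >= 0 && stage >= 0 && num_items >= 0);
    const int item = step - stage;
    return (item >= 0 && item < num_items) ? item : -1;
}

// Total time steps needed to push num_items through num_stages stages.
int total_steps(int num_items, int num_stages) {
    assert(num_items >= 0 && num_stages > 0);
    return num_items == 0 ? 0 : num_items + num_stages - 1;
}
```

At step 2 with three stages, FIR handles item 2 while Rake handles item 1 and Turbo handles item 0, so all three PEs are busy even though each individual item still passes through the stages in order.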
