1 / 27

Static Translation of Stream Programs

Static Translation of Stream Programs. S. M. Farhad School of Information Technology The University of Sydney. Parallel Programming Gap. Uniprocessor hits the physical limit Multicore, many core systems Programming challenges now Multiple flows of control and memories

Download Presentation

Static Translation of Stream Programs

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney

  2. Parallel Programming Gap • Uniprocessor hits the physical limit • Multicore, many core systems • Programming challenges now • Multiple flows of control and memories • Finding algorithm that can run in parallel • Correctness of the program • Synchronization Issues • Debugging the program

  3. Stream Programming Paradigm General purpose programming paradigm Suitable for high performance applications Inherent parallelism Two types of elements in a program: Actors: computation elements Streams: channels to ship data Stream graph: composition of actors and streams Input and output: stream Input channels Actor Output channels 3

  4. StreamIt Language filter pipeline • StreamIt: stream programming language • Architecture independent • Modular may be any StreamIt language construct splitjoin parallel computation splitter joiner feedback loop splitter joiner

  5. A Simple Filter in StreamIt int->int filter A { init { // empty } work pop 1 peek 2 push 1 { push(peek(0) + peek(1)); pop(); } } A

  6. StreamIt Example z A 1 float->float pipeline Example() { add A(); add B(); add splitjoin { split duplicate; add D(); add E(); join roundrobin; } add G(); add H(); } B 1 split 1 1 C D 1 1 join 1 E 1 F

  7. Proc-1 Proc-2 Proc-N Research Question? Parallel System z A 1 B 1 split ? 1 1 D E 1 1 join 1 G 1 H

  8. Static Translation of Stream Programs • We propose • Novel steady state analysis describing the output bandwidth of an actor depending on the input of the stream program • Quantitative analysis to resolve bottlenecks in stream programs • Algorithm for mapping actors to parallel system • Scheduling mapped actors to processor • Our goal • To statically optimize the throughput of a stream program on a parallel system 8

  9. Overview of the Proposed System Finding a Closed Form for Steady State Find Mapping by ILP or Approximation Resolve Bottleneck Schedule Actors for Processors 9 9

  10. Bandwidth Transfer Model k i • Synchronous model • Fundamental assumption: • Actors deliver constant output bandwidth given a constant input bandwidth • Output bandwidth of actor is normalized (scaled btw. 0..1) • Bandwidth of channel (i,j): output bandwidth of actor i multiplied with weight wij wijxi j Finding a Closed Form for Steady State

  11. Simultaneous Linear Functional Equations System • Bandwidth transfer functions: • Resulting in a simultaneous functional equation system: Finding a Closed Form for Steady State

  12. Solving Equation System • Solve by Gaussian-Style equation solver • Bandwidth variables are successively eliminated • Substitution • Loop-breaking • Solution of the system is a linear function for each actor , which • does not depend on the predecessors • depends only on the input bandwidth “z” Finding a Closed Form for Steady State

  13. Mapping Actors to Processors z A • An NP-hard problem • Takes too long time • Approximation algorithm • Much faster O(n log n) • Non-optimal solution • We formulate Integer Linear Programming problem considering • Input, output and processing bandwidths of each actor • Processing capacity of each processor • Inter processor communication bandwidths B 1.0 P1 split 1.0 P2 0.5 0.5 D E 1.0 1.0 join 1.0 G P3 1.0 H Find Mapping by ILP or Approximation

  14. mid zsys left 1 0 right Solution space Find Optimal Solution by Binary Search • Developing a test for a given input bandwidth “z” of stream program • Binary Search • If solution is not feasible at the mid point, new upper bound for z • If solution is feasible at the mid point, new lower bound for z Find Mapping by ILP or Approximation

  15. Test for a given z: Simple Model is a 0-1 binary variable 1 ≤ i ≤ n, 1 ≤ j ≤ p for all i, 1 ≤ i ≤ n …… (1) for all j, 1 ≤ j ≤ p …… (2) where Find Mapping by ILP or Approximation

  16. Bottleneck Analysis SystemBW • Constraints limiting input bandwidth of the whole program • Input, output and processing bandwidth constraints • Bottleneck-free if SystemBW < ActorBW • If SystemBW > ActorBW then there is a bottleneck in the system 1 0 < ActorBW Resolve Bottleneck

  17. Region Duplication Resolve Bottleneck

  18. Overview of the Proposed System Finding a Closed Form for Steady State Find Mapping by ILP or Approximation Resolve Bottleneck Schedule Actors for Processors 18 18

  19. Example Program

  20. Mapping to Processors

  21. Execution Time of CPLEX

  22. Execution Time of an Approx. Algorithm

  23. Optimal vs. Approximation 23

  24. Summary • Synchronous model for stream programs • Novel steady state model • Statically optimize the throughput of stream programs • Resolving bottleneck by simple quantitative analysis • Finding an approximation for the mapping problem

  25. Related Works [1] Static Scheduling of SDF Programs for DSP [Lee ‘87] [2] StreamIt: A language for streaming applications [Thies ‘02] [3] Phased Scheduling of Stream Programs [Thies ’03] [4] Exploiting Coarse Grained Task, Data, and Pipeline Parallelism in Stream Programs [Thies ‘06] [5] Orchestrating the Execution of Stream Programs on Cell [Scott ’08] [6] Software Pipelined Execution of Stream Programs on GPUs [Udupa‘09] [7] Synergistic Execution of Stream Programs on Multicores with Accelerators [Udupa ‘09]

  26. Future Plan to Complete

  27. Questions?

More Related