
Scheduling of Parallelized Synchronous Dataflow Actors



Presentation Transcript


  1. Scheduling of Parallelized Synchronous Dataflow Actors Zheng Zhou*, Karol Desnos**, Maxime Pelcat**, Jean-François Nezan**, William Plishker*, and Shuvra S. Bhattacharyya* **Institut d'Electronique et de Telecommunications de Rennes, INSA Rennes, CNRS UMR 6164, UEB, Rennes, France *Maryland DSPCAD Research Group, http://www.ece.umd.edu/DSPCAD/home/dspcad.htm, Department of ECE and Institute for Advanced Computer Studies, University of Maryland, College Park, 20742, USA Presentation version: 10/24/2013

  2. Outline • Motivation • Background • Related Work • Problem Statement • Solution Approach • Experimental Setup • Experimental Results

  3. Introductions • William Plishker, research associate at the University of Maryland. Expertise in dataflow representation and analysis, software-defined radio, medical imaging, and high-energy physics. • Zheng Zhou, software engineer at Texas Instruments and alumnus of the University of Maryland. Expertise in dataflow models, multiprocessor programming, and task scheduling for embedded systems. • Shuvra S. Bhattacharyya, professor at the University of Maryland. Expertise in real-time signal processing systems, model-based hardware and software design tools, and dataflow methodologies. • Karol Desnos, PhD student at IETR. Research interests in dataflow models, wireless communication, and memory management for embedded systems. • Maxime Pelcat, associate professor at IETR. Expertise in dataflow models, multimedia, telecommunications, and programming of distributed embedded systems. • Jean-François Nezan, professor at IETR. Expertise in dataflow programming, embedded systems, multicore, and video compression.

  4. Outline • Motivation • Background • Related Work • Problem Statement • Solution Approach • Experimental Setup • Experimental Results

  5. Motivation • FFT implementations [Zhou 2012]: an actor in an application may have multiple implementations, including sequential and parallel implementations. • Choosing appropriate actor implementations together with the actor execution order has a major impact on system implementation performance. (Figure: example dataflow graph with actors A, B, C, and D.)

  6. Background: Dataflow Interchange Format (DIF) [Hsu 2005] • A standard language for specifying mixed-grain dataflow models for digital signal processing (DSP) systems. • Currently supports: • Synchronous Dataflow (SDF) • Homogeneous Synchronous Dataflow (HSDF) • Cyclo-static Dataflow (CSDF) • Parameterized Synchronous Dataflow (PSDF) • Multidimensional Synchronous Dataflow (MDSDF) • Boolean Dataflow (BDF) • Enable-Invoke Dataflow (EIDF) • Core Functional Dataflow (CFDF)
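An SDF graph's fixed per-firing production and consumption rates are what make compile-time scheduling possible: they determine how many times each actor must fire per graph iteration. As a reminder of this standard background (this is an illustrative sketch, not code from the DIF package), the repetitions vector can be computed by solving the SDF balance equations:

```python
from fractions import Fraction
from math import lcm

def repetitions_vector(actors, edges):
    """Solve the SDF balance equations q[src]*prod == q[dst]*cons for
    the smallest positive integer repetitions vector q.
    edges: list of (src, dst, prod, cons). Assumes a connected,
    consistent SDF graph (no consistency check is performed here)."""
    q = {a: None for a in actors}
    q[actors[0]] = Fraction(1)          # seed one actor, propagate rates
    changed = True
    while changed:
        changed = False
        for src, dst, prod, cons in edges:
            if q[src] is not None and q[dst] is None:
                q[dst] = q[src] * prod / cons
                changed = True
            elif q[dst] is not None and q[src] is None:
                q[src] = q[dst] * cons / prod
                changed = True
    # Scale the rational solution to the smallest integer one.
    mult = lcm(*(f.denominator for f in q.values()))
    return {a: int(f * mult) for a, f in q.items()}
```

For example, an edge on which A produces 2 tokens and B consumes 3 yields q(A) = 3, q(B) = 2 per iteration.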

  7. Background: TDIF-PPG [Zhou 2012] TDIF-PPG (“Targeted Dataflow Interchange Format / Parallel Processing Group plug-in”) is a dataflow-based software design package organized in four layers: • In Layer 1, the given DSP application is modeled using the DIF language. • In Layer 2, the actor interfaces are defined. • In Layer 3, generic parallel implementations of the actors are developed. • In Layer 4, a platform-specific system implementation is constructed.

  8. Outline • Motivation • Background • Related Work • Problem Statement • Solution Approach • Experimental Setup • Experimental Results

  9. Related Work on Static Task Scheduling • Independent Sequential Task Scheduling • Dynamic programming [Dogramaci 1979] • Dependent Sequential Task Scheduling • Modified critical path algorithm (heuristic) [Wu 1990] • Genetic algorithm [Omara 2010] • Independent Parallel Task Scheduling • Approximation algorithm [Nahapetian 2009] • Dependent Parallel Task Scheduling • Network flow [Giaro 2009][Manaa 2010] • These approaches consider neither interprocessor communication costs nor multiple implementations per actor.

  10. Outline • Motivation • Background • Related Work • Problem Statement • Solution Approach • Experimental Setup • Experimental Results

  11. Problem Statement: Parallel Actor Scheduling (PAS) • Given: • A dataflow graph G. • A Symmetric Multi-Processing (SMP) platform P. • An actor acceleration function C, which provides the time (actual or estimated) that a given actor takes to execute on a given number of processors. • Determine: • How many processors will be used for each actor (processor count assignment). • The processor assignment for each actor. • The starting time of each actor. • We focus here on a special case of the PAS problem called the Fully Parallelized PAS (FPPAS) problem, in which only parallel implementations of an actor are considered (we assume every actor has at least one parallel implementation).
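The given/determine structure above maps naturally onto a small data model. The following sketch is illustrative only; the type and field names are assumptions, not identifiers from the TDIF-PPG code base:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class FPPASInstance:
    """One instance of the Fully Parallelized PAS problem (sketch)."""
    actors: List[str]
    edges: List[Tuple[str, str]]        # precedence edges (src, dst)
    num_processors: int                 # size of the SMP platform P
    accel: Callable[[str, int], float]  # C(actor, p) -> execution time on p processors

@dataclass
class ScheduleEntry:
    """The decision variables the PAS problem fixes for one actor."""
    processor_count: int    # how many processors the actor uses
    processors: List[int]   # which processors it is assigned to
    start_time: float       # when the actor begins executing

# Toy acceleration function with ideal linear speedup (illustrative only;
# real C(a, p) curves flatten as p grows).
seq_times: Dict[str, float] = {"A": 8.0, "B": 12.0}
inst = FPPASInstance(["A", "B"], [("A", "B")], 4,
                     lambda a, p: seq_times[a] / p)
```

A solver then produces one `ScheduleEntry` per actor such that precedence and processor-capacity constraints hold.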

  12. FPPAS example

  13. Optimal Schedule

  14. Outline • Motivation • Background • Related Work • Problem Statement • Solution Approach • Experimental Setup • Experimental Results

  15. Overall Approach: a two-phase framework. Given an FPPAS instance, Phase 1 uses a particle swarm to search over processor count assignments; Phase 2 evaluates each candidate assignment by computing a schedule length with either an MIP solver or a heuristic solver, and feeds that length back to Phase 1.
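One way to realize the Phase-1 search is standard particle swarm optimization with real-valued positions rounded to integer processor counts; the slide does not specify the encoding, so this is a sketch under that assumption. The `schedule_length` callback stands in for the Phase-2 MIP or heuristic solver:

```python
import random

def pso_processor_counts(actors, P, schedule_length,
                         swarm=12, iters=50, w=0.7, c1=1.5, c2=1.5):
    """Phase-1 global search (sketch): each particle is a real-valued
    vector decoded to per-actor processor counts in [1, P]; fitness is
    the schedule length returned by the Phase-2 solver."""
    n = len(actors)

    def decode(x):
        return [min(P, max(1, round(v))) for v in x]

    pos = [[random.uniform(1, P) for _ in range(n)] for _ in range(swarm)]
    vel = [[0.0] * n for _ in range(swarm)]
    pbest = [p[:] for p in pos]
    pcost = [schedule_length(decode(p)) for p in pos]
    g = min(range(swarm), key=lambda i: pcost[i])
    gbest, gcost = pbest[g][:], pcost[g]
    for _ in range(iters):
        for i in range(swarm):
            for d in range(n):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            cost = schedule_length(decode(pos[i]))
            if cost < pcost[i]:
                pbest[i], pcost[i] = pos[i][:], cost
                if cost < gcost:
                    gbest, gcost = pos[i][:], cost
    return decode(gbest), gcost
```

Because Phase 2 is invoked once per particle per iteration, the (cheaper) heuristic solver is the natural fitness evaluator inside this loop, with the MIP solver reserved for smaller instances.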

  16. The dataflow graph G(V, E), together with a processor count assignment, is converted into a Computation Usage Graph (CUG) for the MIP solver. (Figure: actor A expanded into vertices A1–A2 and actor B into vertices B1–B3, one vertex per assigned processor.)

  17. MIP Solver: Variables

  18. MIP Solver: Constraints and Objective
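The variables, constraints, and objective on these two slides are not recoverable from this transcript. For orientation only, a typical makespan-minimization MIP over a fixed processor count assignment uses start-time variables $t_a$, a makespan variable $T$, and big-M disjunctive ordering for actors that share a processor; the formulation actually used in the paper may differ:

```latex
\begin{aligned}
\min\; & T \\
\text{s.t.}\; & t_a + C(a, p_a) \le T && \forall a \in V \\
& t_b \ge t_a + C(a, p_a) && \forall (a, b) \in E \\
& t_b \ge t_a + C(a, p_a) - M(1 - x_{ab}) && \forall a, b \text{ sharing a processor} \\
& t_a \ge t_b + C(b, p_b) - M x_{ab} && \forall a, b \text{ sharing a processor},\; x_{ab} \in \{0, 1\}
\end{aligned}
```

Here $x_{ab} = 1$ orders $a$ before $b$ on the shared processor, and $M$ is a constant larger than any feasible makespan.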

  19. Heuristic Solver: “Story Scheduling” • Ranking: 3, 2, 1, 6, 4, 5 (ties are broken by selecting actors with higher processor counts) • Free vertex list: 1, 2, 3 • Scheduled vertex list: empty

  20. Story Scheduling Example • Free vertex list: 1, 2, 3 • Selected actor: 3 (placed on the 1st floor) • Scheduled vertex list: 3 • Updated free vertex list: 1, 2 (Figure: Gantt chart, time in seconds vs. processor number, on an 8-processor platform.)

  21. Story Scheduling Example • Free vertex list: 2, 1 • Selected actor: 1 (placed on the 1st floor) • Scheduled vertex list: 3, 1 • Updated free vertex list: 2

  22. Story Scheduling Example • Free vertex list: 2 • Scheduled vertex list: 3, 1 • Updated free vertex list: 2, 6, 4

  23. Story Scheduling Example • Free vertex list: 2, 6, 4 • Selected actor: 2, C(2, 4) = 5s (starts the 2nd floor) • Scheduled vertex list: 3, 1, 2 • Updated free vertex list: 6, 4

  24. Story Scheduling Example • Free vertex list: 6, 4 • Selected actor: 4, C(4, 4) = 5s (placed on the 2nd floor) • Scheduled vertex list: 3, 1, 2, 4 • Updated free vertex list: 6

  25. Story Scheduling Example • Free vertex list: 6 • Scheduled vertex list: 3, 1, 2, 4 • Updated free vertex list: 6, 5

  26. Story Scheduling Example • Free vertex list: 5 • Selected actor: 6, C(6, 5) = 5s (starts the 3rd floor) • Scheduled vertex list: 3, 1, 2, 4, 6

  27. Story Scheduling Example • Free vertex list: empty • Selected actor: 5, C(5, 3) = 5s • Scheduled vertex list: 3, 1, 2, 4, 6, 5
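The walkthrough above can be condensed into a short list-scheduling sketch: ready actors are packed side by side onto the current "floor" (a horizontal band of the time axis) in ranking order, and a new floor opens when the next actor does not fit. This is a reconstruction from the example, not the authors' published algorithm, and it omits details such as enforcing precedence between actors placed on the same floor:

```python
def story_schedule(actors, edges, P, ranking, count, exec_time):
    """Sketch of "story scheduling". count[a] is the processor count
    assignment; exec_time[a] is C(a, count[a]); ranking is a total
    order over actors (highest priority first)."""
    preds = {a: set() for a in actors}
    succs = {a: set() for a in actors}
    for s, d in edges:
        preds[d].add(s)
        succs[s].add(d)
    done = set()
    free = [a for a in actors if not preds[a]]
    floor_start, floor_height, procs_left = 0.0, 0.0, P
    schedule = {}   # actor -> (start_time, first processor index)
    while free:
        free.sort(key=ranking.index)    # pick the highest-ranked free actor
        a = free.pop(0)
        if count[a] > procs_left:       # current floor is full:
            floor_start += floor_height # open the next floor above it
            floor_height, procs_left = 0.0, P
        schedule[a] = (floor_start, P - procs_left)
        procs_left -= count[a]
        floor_height = max(floor_height, exec_time[a])
        done.add(a)
        for s in succs[a]:              # successors become free once all
            if preds[s] <= done and s not in free and s not in done:
                free.append(s)          # their predecessors are scheduled
    return schedule, floor_start + floor_height  # schedule and makespan
```

On a toy graph where A and B fill the first floor and their common successor C starts the second, the makespan is the taller of A/B plus C's execution time.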

  28. Simulation Results • Benchmark set: randomly generated SDF graphs produced using PREESM [Pelcat 2009].

  29. Outline • Motivation • Background • Related Work • Problem Statement • Solution Approach • Experimental Setup • Experimental Results

  30. Experimental Setup TI TMS320C6678 Platform • Software Packages: • Code Composer Studio V5.2 • IPC 1.24.2.27 • PDK 1.1.0.2 • SYS/BIOS 6.33.4.39 • BIOS-MCSDK 2.1.0 Beta.

  31. Image Registration (IRSUB) application. (Figure: dataflow graph with, for each of the reference and target images, an Image Reader, Cascade Gaussian Filter, Difference of Gaussian, Local Extrema Detection, Post-Processing, and Descriptor Assignment chain, feeding Key Points Matching, Matching Refinement, Target Image Transformation, and an Image Writer.)

  32. Outline • Motivation • Background • Related Work • Problem Statement • Solution Approach • Experimental Setup • Experimental Results

  33. Experimental Results • Implementation 1 explores only graph-level parallelism: • all actors are sequential actors; • the application is assigned to 2 DSPs (optimal schedule). • Implementation 2 explores both graph-level and actor-level parallelism: • the actors in blue are parallel actors; • the application is assigned to 4 DSPs using the scheduling solution obtained from our two-phase scheduling framework. • Implementation 2 achieves a 1.97X speedup over Implementation 1.

  34. Summary • New design methods and algorithms for scheduling parallelized synchronous dataflow actors, and graphs that contain such actors • We have focused on a special case of the PAS problem called the Fully Parallelized PAS (FPPAS) problem • Only parallel implementations of an actor are considered (we assume every actor has at least one parallel implementation). • Solution approach • Decomposition into global and local search phases • Local search is in terms of fixed processor count assignments • Novel MIP and heuristic (“story scheduling”) techniques for local search • Particle swarm optimization for global search process • Experimental results using simulation on random graphs, as well as an off-the-shelf digital signal processor

  35. References 1 • [Zhou 2012] Z. Zhou, C. Shen, W. Plishker, H. Wu, and S. S. Bhattacharyya. Systematic integration of flowgraph- and module-level parallelism in implementation of DSP applications on multiprocessor systems-on-chip. In Proceedings of the International Conference on Signal Processing, pages 402-408, Beijing, China, October 2012. • [Hsu 2005] C. Hsu, M. Ko, and S. S. Bhattacharyya. Software synthesis from the dataflow interchange format. In Proceedings of the International Workshop on Software and Compilers for Embedded Systems, pages 37-49, Dallas, Texas, September 2005. • [Dogramaci 1979] A. Dogramaci and J. Surkis. Evaluation of a heuristic for scheduling independent jobs on parallel identical processors. Management Science, 25(12):1208-1216, 1979. • [Wu 1990] M.-Y. Wu and D. D. Gajski. Hypertool: a programming aid for message-passing systems. IEEE Transactions on Parallel and Distributed Systems, 1(3):330-343, 1990.

  36. References 2 • [Pelcat 2009] M. Pelcat, P. Menuet, S. Aridhi, and J.-F. Nezan. Scalable compile-time scheduler for multi-core architectures. In Proceedings of the Design, Automation and Test in Europe Conference and Exhibition, pages 1552-1555, 2009. • [Omara 2010] F. A. Omara and M. M. Arafa. Genetic algorithms for task scheduling problem. Journal of Parallel and Distributed Computing, 70(1):13-22, 2010. • [Nahapetian 2009] A. Nahapetian, P. Brisk, S. Ghiasi, and M. Sarrafzadeh. An approximation algorithm for scheduling on heterogeneous reconfigurable resources. ACM Transactions on Embedded Computing Systems, 9(1), Article 5, October 2009. • [Manaa 2010] A. Manaa and C. Chu. Scheduling multiprocessor tasks to minimise the makespan on two dedicated processors. European Journal of Industrial Engineering, 4(3):265-279, 2010. • [Giaro 2009] K. Giaro, M. Kubale, and P. Obszarski. A graph coloring approach to scheduling of multiprocessor tasks on dedicated machines with availability constraints. Discrete Applied Mathematics, 157(17):3625-3630, 2009.
