
StreamX10: A Stream Programming Framework on X10

StreamX10: A Stream Programming Framework on X10. Haitao Wei, 2012-06-14. School of Computer Science at Huazhong University of Sci&Tech. Outline. 1. Introduction and Background. 2. COStream Programming Language. 3. Stream Compilation on X10. 4. Experiments. 5.


Presentation Transcript


  1. StreamX10: A Stream Programming Framework on X10 Haitao Wei 2012-06-14 School of Computer Science at Huazhong University of Sci&Tech

  2. Outline • 1. Introduction and Background • 2. COStream Programming Language • 3. Stream Compilation on X10 • 4. Experiments • 5. Conclusion and Future Work

  3. Background and Motivation • Stream programming • A high-level programming model that has been applied productively • Implementations usually depend on a specific architecture, which makes programs difficult to port between platforms • X10 • A productive parallel programming environment • Isolates the details of different architectures • Provides a flexible parallel programming abstraction layer for stream programming • StreamX10: an attempt to make stream programs portable by building on X10

  4. Outline • 1. Introduction and Background • 2. COStream Programming Language • 3. Stream Compilation on X10 • 4. Experiments • 5. Conclusion and Future Work

  5. COStream Language • stream • FIFO queue connecting operators • operator • Basic functional unit: an actor node in the stream graph • Multiple inputs and multiple outputs • Window • Like pop, peek, and push operations • Init and work functions • composite • Connected operators: a subgraph of actors • A stream program is composed of composites
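
The stream and operator concepts on this slide can be sketched in a few lines of Python. This is not COStream syntax, which the transcript does not show; the class names `Stream` and `Averager`, the method `can_fire`, and the window size of 10 are assumptions made for illustration only.

```python
from collections import deque


class Stream:
    """FIFO queue connecting two operators (hypothetical helper class)."""

    def __init__(self):
        self._items = deque()

    def push(self, value):
        # Append one item at the tail of the queue.
        self._items.append(value)

    def peek(self, index):
        # Read the item `index` positions from the head without removing it.
        return self._items[index]

    def pop(self):
        # Remove and return the item at the head of the queue.
        return self._items.popleft()

    def __len__(self):
        return len(self._items)


class Averager:
    """Operator with one input and one output stream.

    Each firing of work() peeks at `window` items, pops 1, and pushes 1.
    """

    def __init__(self, src, dst, window=10):
        self.src, self.dst, self.window = src, dst, window

    def can_fire(self):
        # The operator may fire only when its whole peek window is available.
        return len(self.src) >= self.window

    def work(self):
        total = sum(self.src.peek(i) for i in range(self.window))
        self.dst.push(total / self.window)
        self.src.pop()


s, p = Stream(), Stream()
avg = Averager(s, p, window=10)
for x in range(12):          # stand-in for a source operator pushing 0.0 .. 11.0
    s.push(float(x))
while avg.can_fire():
    avg.work()
results = [p.pop() for _ in range(len(p))]
```

Because the operator pops fewer items than it peeks, nine items remain buffered on the input stream after the last firing; that leftover is what a compiler has to account for when it builds a schedule.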

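  6. COStream and Stream Graph • [Figure: a stream graph with three operators (Source, Averager, Sink) connected by streams S and P, annotated with the terms stream, operator, and composite and with the rate labels peek=10, pop=1, pop=1, push=1, push=1]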

  7. Outline • 1. Introduction and Background • 2. COStream Programming Language • 3. Stream Compilation on X10 • 4. Experiments • 5. Conclusion and Future Work

  8. Compilation flow of StreamX10

  9. The Execution Framework • The nodes are partitioned among the places • Each node is mapped to an activity • The nodes run in a pipeline fashion to exploit parallelism • Local and global FIFO buffers are used
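
As a rough analogy for this execution model, the sketch below runs each node of a three-node pipeline in its own Python thread, with bounded queues standing in for the FIFO buffers. Threads are only a stand-in for X10 activities, places are not modelled at all, and the names `source`, `doubler`, `sink`, `run_pipeline`, and the `DONE` sentinel are hypothetical.

```python
import queue
import threading

DONE = object()  # end-of-stream sentinel (an assumption of this sketch)


def source(out_q, n):
    # Produce n items, then signal the end of the stream.
    for i in range(n):
        out_q.put(i)
    out_q.put(DONE)


def doubler(in_q, out_q):
    # Middle pipeline stage: consume one item, produce one item.
    while True:
        item = in_q.get()
        if item is DONE:
            out_q.put(DONE)
            return
        out_q.put(item * 2)


def sink(in_q, results):
    # Final stage: collect everything that arrives.
    while True:
        item = in_q.get()
        if item is DONE:
            return
        results.append(item)


def run_pipeline(n):
    # Bounded queues play the role of FIFO buffers between nodes.
    q1, q2 = queue.Queue(maxsize=4), queue.Queue(maxsize=4)
    results = []
    workers = [
        threading.Thread(target=source, args=(q1, n)),
        threading.Thread(target=doubler, args=(q1, q2)),
        threading.Thread(target=sink, args=(q2, results)),
    ]
    for w in workers:   # one thread per node, all running concurrently
        w.start()
    for w in workers:
        w.join()
    return results
```

Calling `run_pipeline(5)` pushes five items through all three stages; the stages overlap in time because each blocks only on its own input and output queue.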

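  10. Work Partition • [Figure: a stream graph partitioned across places; each of the three partitions has computation work = 10; the individual node and edge weights are not recoverable from the transcript] • Speedup: 30/10 = 3 • Inter-place communication: 2 • Objective: minimize communication and balance load (using Metis)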
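
The two quantities on this slide, speedup and inter-place communication, can be computed for any candidate partition. The sketch below does only that evaluation; it does not call Metis or search for a partition. The six-node graph, its weights, and the function name `evaluate_partition` are made up for illustration and are not the graph from the slide, although the weights were chosen so that the good partition gives the same 30/10 = 3 and communication of 2.

```python
def evaluate_partition(work, edges, part):
    """Score a partition of a stream graph.

    work:  {node: computation cost per steady-state iteration}
    edges: {(producer, consumer): items communicated per iteration}
    part:  {node: place id}
    Returns (ideal speedup, inter-place communication).
    """
    load = {}
    for node, cost in work.items():
        load[part[node]] = load.get(part[node], 0) + cost
    # Ideal speedup ignores communication cost: total work / heaviest place.
    speedup = sum(work.values()) / max(load.values())
    # Only edges whose endpoints sit on different places cost communication.
    cut = sum(w for (u, v), w in edges.items() if part[u] != part[v])
    return speedup, cut


# Hypothetical linear graph a -> b -> c -> d -> e -> f, total work 30.
work = {n: 5 for n in "abcdef"}
edges = {("a", "b"): 5, ("b", "c"): 1, ("c", "d"): 5,
         ("d", "e"): 1, ("e", "f"): 5}

good = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2, "f": 2}
bad = {"a": 0, "f": 0, "b": 1, "c": 1, "d": 2, "e": 2}

good_score = evaluate_partition(work, edges, good)
bad_score = evaluate_partition(work, edges, bad)
```

Both partitions are perfectly balanced, so they have the same ideal speedup, but the second one cuts the three heavy edges instead of the two light ones. That difference is why the objective combines load balance with minimized communication.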

  11. Global FIFO Implementation • Each producer and consumer has its own local buffer • The producer uses the push operation to store data in its local buffer • The consumer uses peek/pop operations to fetch data from its local buffer • When the producer's local buffer is full or the consumer's local buffer is empty, the data is copied automatically
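
A minimal single-process sketch of this buffering scheme follows. The class `GlobalFifo`, its `in_flight` queue (which stands in for the copy between places), and the `copies` counter are hypothetical; the actual X10 implementation is not shown in the transcript.

```python
from collections import deque


class GlobalFifo:
    """Producer-side and consumer-side local buffers with block copies."""

    def __init__(self, local_capacity):
        self.capacity = local_capacity
        self.producer_buf = []        # local to the producer
        self.in_flight = deque()      # blocks already copied toward the consumer
        self.consumer_buf = deque()   # local to the consumer
        self.copies = 0               # number of block copies performed

    def push(self, value):
        # The producer touches only its local buffer ...
        self.producer_buf.append(value)
        # ... and a copy is triggered automatically when that buffer is full.
        if len(self.producer_buf) == self.capacity:
            self.flush()

    def flush(self):
        # Copy the whole local block in one transfer.
        if self.producer_buf:
            self.in_flight.append(list(self.producer_buf))
            self.producer_buf.clear()
            self.copies += 1

    def _refill(self):
        # The consumer fetches a new block only when its local buffer is empty.
        if not self.consumer_buf and self.in_flight:
            self.consumer_buf.extend(self.in_flight.popleft())

    def peek(self):
        self._refill()
        return self.consumer_buf[0]

    def pop(self):
        self._refill()
        return self.consumer_buf.popleft()


fifo = GlobalFifo(local_capacity=4)
for i in range(10):
    fifo.push(i)
copies_after_pushes = fifo.copies          # two full blocks were copied
first_eight = [fifo.pop() for _ in range(8)]
fifo.flush()                               # drain the partial block at shutdown
tail = [fifo.pop(), fifo.pop()]
```

Copying whole blocks rather than single items amortizes the cost of each transfer, at the price of some latency before the consumer sees newly pushed data.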

  12. X10 Code in the Back-end • Define the work function • Call the work function in the initial and steady schedules • Spawn activities for each node at its place according to the partition
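
The split between an initial and a steady schedule can be illustrated in Python rather than X10, since the generated code itself is not in the transcript. The sketch assumes a Source, Averager, Sink pipeline in which Source pushes 1 item, Averager peeks 10 and pops and pushes 1, and Sink pops 1; these rates and the function name `run` are assumptions, and everything runs sequentially instead of in spawned activities.

```python
from collections import deque
from itertools import count


def run(n_steady, peek=10):
    s, p = deque(), deque()    # streams Source -> Averager and Averager -> Sink
    out = []
    numbers = count()          # data produced by the source: 0, 1, 2, ...

    def source_work():
        s.append(float(next(numbers)))

    def averager_work():
        # Peek at `peek` items, push their mean, pop one item.
        p.append(sum(s[i] for i in range(peek)) / peek)
        s.popleft()

    def sink_work():
        out.append(p.popleft())

    # Initial schedule: fire the source peek - 1 times so that the
    # averager's window is one item short of full.
    for _ in range(peek - 1):
        source_work()

    # Steady schedule: every node fires once per iteration, and the
    # buffer occupancy is the same at the end of each iteration.
    for _ in range(n_steady):
        source_work()
        averager_work()
        sink_work()

    return out, len(s)
```

After the initial schedule has primed the stream, the steady schedule can repeat indefinitely without the input buffer growing or running dry.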

  13. Outline • 1. Introduction and Background • 2. COStream Programming Language • 3. Stream Compilation on X10 • 4. Experiments • 5. Conclusion and Future Work

  14. Experimental Platform and Benchmarks • Platform • Intel Xeon processor (8 cores), 2.4 GHz, with 4 GB memory • Red Hat EL5 with Linux 2.6.18 • X10 compiler and runtime version 2.2.0 • Benchmarks • 11 benchmarks rewritten from StreamIt

  15. Throughput Comparison • Throughput of 4 different configurations (NPLACE × NTHREAD = 8) • Normalized to 1 place with 8 threads • For most benchmarks, CPU utilization increases from 24% to 89% when the number of places varies from 1 to 4, except for benchmarks with a low computation/communication ratio • The benefit is small, or performance gets worse, when the number of places increases from 4 to 8
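
The configuration space and the normalization can be written down directly. The sketch assumes the four configurations are the factor pairs of 8, which matches the constraint NPLACE × NTHREAD = 8 but is not spelled out on the slide; the function names are hypothetical and no measured throughput values are included.

```python
def configurations(total_threads=8):
    """All (places, threads per place) pairs whose product is total_threads."""
    return [(places, total_threads // places)
            for places in range(1, total_threads + 1)
            if total_threads % places == 0]


def normalize(throughputs, baseline=(1, 8)):
    """Divide every throughput by that of the baseline configuration.

    throughputs: {(places, threads): measured throughput}
    """
    base = throughputs[baseline]
    return {cfg: value / base for cfg, value in throughputs.items()}


configs = configurations()
```

After normalization the baseline maps to 1.0, so any value above 1.0 means that configuration outperforms 1 place with 8 threads.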

  16. Observation and Analysis • Throughput goes up as the number of places increases, because multiple places increase CPU utilization • Multiple places expose parallelism but also bring more communication overhead • Benchmarks with a heavier computation workload, such as DES and Serpent_full, can still benefit from an increasing number of places

  17. Outline • 1. Introduction and Background • 2. COStream Programming Language • 3. Stream Compilation on X10 • 4. Experiments • 5. Conclusion and Future Work

  18. Conclusion • We proposed and implemented StreamX10, a stream programming language and compilation system on X10 • A raw partitioning optimization is proposed to exploit parallelism based on the X10 execution model • A preliminary experiment was conducted to study the performance

  19. Future Work • How to choose the best configuration (number of places and number of threads) automatically for each benchmark • How to decrease the thread-switching overhead by mapping multiple nodes to a single activity

  20. Acknowledgment • X10 Innovation Award funding support • QiMing Teng, Haibo Lin, and David P. Grove at IBM for their help on this research

  21. Thank you!
