
Designing Parallel Operating Systems via Parallel Programming

Eitan Frachtenberg (1), Kei Davis (1), Fabrizio Petrini (1), Juan Fernández (1, 2) and José Carlos Sancho (1). (1) Performance and Architecture Lab (PAL); (2) Grupo de Arquitectura y Computación Paralelas (GACOP)


Presentation Transcript


  1. Designing Parallel Operating Systems via Parallel Programming • Eitan Frachtenberg (1), Kei Davis (1), Fabrizio Petrini (1), Juan Fernández (1, 2) and José Carlos Sancho (1) • (1) Performance and Architecture Lab (PAL), CCS-3 Modeling, Algorithms and Informatics, Los Alamos National Laboratory, NM 87545, USA, URL: http://www.c3.lanl.gov • (2) Grupo de Arquitectura y Computación Paralelas (GACOP), Dpto. Ingeniería y Tecnología de Computadores, Universidad de Murcia, 30071 Murcia, SPAIN, URL: http://www.ditec.um.es, email: juanf@um.es

  2. Motivation • Clusters have been the most successful player in high-performance computing in the last decade • [Figure: row of cluster nodes, each labeled OS] • HARDWARE = Independent Nodes + High-speed Network • SOFTWARE = Commodity OS + Parallel Apps + System Software

  3. Motivation • Ever-increasing demand for computing capability is driving the construction of ever-larger clusters • [Figure: three large systems, numbered 1 to 3: Earth Simulator (5120 processors), Thunder (LLNL, 4096 processors), ASCI Q (LANL, 8192 processors)] • Systems are becoming more complex, less efficient and less reliable

  4. Motivation • Clusters are loosely coupled systems used for solving inherently tightly coupled problems • Parallel software keeps all the pieces together • Developing parallel software is time- and resource-consuming because of its complexity • PROBLEM: parallel software has neither evolved nor scaled in step with cluster sizes • SOLUTION: a new approach to the design of parallel software for large-scale clusters

  5. Goals • Target • New methodology for the design of parallel software • Simplicity, performance, scalability, reliability • Backbone to integrate all nodes into a parallel OS • Vision • BSP-like system running MIMD applications (variable granularity on the order of hundreds of μs) • Approach • BSP-like global control and coordination of all system activities • Small set of collective communication primitives for global coordination

  6. Outline • Motivation and Goals • Toward a Parallel Operating System • Core Primitives • Parallel Software Design • Case Studies • Concluding remarks

  7. Toward a Parallel OS • Designing a Parallel OS: • Lack of global coordination (loose coupling) • Redundant/missing functionality (complexity) • [Figure: software stack. Top layer: Resource Management, Parallel Application, ..., Parallel File System. Middle layer: Comm Protocol 1, Comm Protocol 2, ..., Comm Protocol N. Bottom layer: Hardware]

  8. Toward a Parallel OS • Scientific applications are tightly coupled … • Data dependencies between nodes • They exchange messages very often • … but the processing nodes are “bolted together” in a loosely coupled fashion • Need for global control and coordination of all the system activities, enforced by global collective communication primitives

  9. Toward a Parallel OS • Designing a Parallel OS: • System-level, global control and coordination of all application and system software activities • [Figure: software stack. Top layer: Resource Management, Parallel Application, ..., Parallel File System. New layer: Global control and coordination. Below it: Comm Protocol 1, Comm Protocol 2, ..., Comm Protocol N. Bottom layer: Hardware]

  10. Toward a Parallel OS • Parallel applications use point-to-point and collective communication • System software tasks are either collective operations or can be cast in terms of them • Parallel applications and system software can be built atop the same communication primitives

  11. Toward a Parallel OS • Designing a Parallel OS: • Least common denominator of system and application software → Core Primitives • [Figure: software stack. Top layer: Resource Management, Parallel Application, ..., Parallel File System. Then: Global control and coordination. Then: Comm Protocol 1, Comm Protocol 2, ..., Comm Protocol N. Then: Core Primitives. Bottom layer: Hardware]

  12. Outline • Motivation and Goals • Toward a Parallel Operating System • Core Primitives • Parallel Software Design • Case Studies • Concluding remarks

  13. Core Primitives • Parallel software built atop three primitives • Xfer-And-Signal • Transfer block of data to a set of nodes • Optionally signal local/remote event upon completion • Test-Event • Poll local event • Compare-And-Write • Compare global variable on a set of nodes • Optionally write global variable on the same set of nodes

  14. Core Primitives • Parallel software built atop three primitives • Xfer-And-Signal (QsNet): • Node S transfers block of data to nodes D1, D2, D3 and D4 • [Figure: source node S and destination nodes D1 to D4]

  15. Core Primitives • Parallel software built atop three primitives • Xfer-And-Signal (QsNet): • Node S transfers block of data to nodes D1, D2, D3 and D4 • Events triggered at source and destinations • [Figure: source node S with its Source Event and destination nodes D1 to D4 with their Destination Events]

  16. Core Primitives • Parallel software built atop three primitives • Compare-And-Write (QsNet): • Node S compares variable V on nodes D1, D2, D3 and D4 • Is V related to Value by the chosen comparison operator (for example, >)? • [Figure: source node S and destination nodes D1 to D4]

  17. Core Primitives • Parallel software built atop three primitives • Compare-And-Write (QsNet): • Node S compares variable V on nodes D1, D2, D3 and D4 • Partial results are combined in the switches • [Figure: source node S and destination nodes D1 to D4]
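One way to picture the in-switch combining on slide 17 is as a tree reduction: each node evaluates the comparison locally, and each switch level merges the answers from the level below. The sketch below is only a software analogy. The switch arity of 4, the function names, and the use of logical AND as the combining operation are assumptions for illustration; the slides do not specify them.

```python
import operator


def combine_in_switches(partial_results, arity=4):
    """AND-reduce per-node results level by level, one level per switch stage.

    Returns the combined result and the number of switch levels traversed.
    """
    level = list(partial_results)
    if not level:
        raise ValueError("need at least one node")
    levels = 0
    while len(level) > 1:
        # Each hypothetical switch merges up to `arity` incoming answers.
        level = [all(level[i:i + arity]) for i in range(0, len(level), arity)]
        levels += 1
    return level[0], levels


def global_compare(values, op, reference, arity=4):
    """Each node compares its copy of V locally; switches combine the answers."""
    partial = [op(v, reference) for v in values]
    return combine_in_switches(partial, arity)


# 16 nodes, all with V = 7: is V >= 5 everywhere?
result_16, levels_16 = global_compare([7] * 16, operator.ge, 5)
# Same question, but one node holds V = 3.
result_bad, _ = global_compare([7] * 15 + [3], operator.ge, 5)
```

Because the answers are merged on the way up, the number of combining steps in this model grows with the depth of the tree rather than with the number of nodes.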

  18. Outline • Motivation and Goals • Toward a Parallel Operating System • Core Primitives • Parallel Software Design • Case Studies • Concluding remarks

  19. Toward a Parallel OS • Global control/coordination of all system activities • [Figure: timeline of one time slice (hundreds of μs): Global Strobe (time slice starts), Task 1, Global Synchronization, Task 2, Global Synchronization, Task 3, Global Strobe (time slice ends)]

  20. Parallel Software Design • Using the core primitives… • Global control and coordination • Strobe sent at regular intervals (time slices) • Compare-And-Write + Xfer-And-Signal (Master) • Test-Event (Slaves) • All system activities are tightly coupled • Global information is required to schedule resources; global synchronization facilitates the task but is not enough on its own • Global resource scheduling • Exchange of requirements/restrictions • Xfer-And-Signal + Test-Event • Resource scheduling
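The strobe mechanism on slide 20 can be sketched as a small sequential simulation. This is a toy model under stated assumptions, not the authors' code: the dictionary fields are hypothetical, and having the master use Compare-And-Write to confirm that every slave finished the previous slice before strobing is one plausible reading of the slide.

```python
def run_time_slices(num_slaves, num_slices):
    """Simulate `num_slices` strobes; return the slices each slave observed."""
    slaves = [{"strobe": False, "slice": 0, "done": 0, "seen": []}
              for _ in range(num_slaves)]
    for t in range(1, num_slices + 1):
        # Master, Compare-And-Write: has every slave finished slice t - 1?
        if not all(s["done"] == t - 1 for s in slaves):
            raise RuntimeError(f"some slave has not finished slice {t - 1}")
        # Master, Xfer-And-Signal: multicast the slice number, signal the strobe.
        for s in slaves:
            s["slice"] = t
            s["strobe"] = True
        # Slaves, Test-Event: poll for the strobe, then do this slice's work.
        for s in slaves:
            if s["strobe"]:
                s["strobe"] = False
                s["seen"].append(s["slice"])
                s["done"] = s["slice"]
    return [s["seen"] for s in slaves]


history = run_time_slices(num_slaves=3, num_slices=4)
```

Every slave ends up with the same slice history, which is the property the global strobe is meant to provide: all nodes agree on where each time slice begins.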

  21. Parallel Software Design SYSTEM SOFTWARE

  22. Parallel Software Design • Using the core primitives…

  23. Parallel Software Design Can we really build system software using this new approach?

  24. Outline • Motivation and Goals • Introduction • Core Primitives • Parallel Software Design • Case Studies • Concluding remarks

  25. Case Studies • Experimental Setup

  26. Case Studies • STORM (Scalable TOol for Resource Management) • Architecture: • Set of dæmons running on the management/compute nodes • Built atop the three core primitives • BSP-like behavior: management activities are synchronized and scheduled every few hundred microseconds • Functionality: • Job Launching • Job Scheduling (FCFS, gang scheduling and others) • New scheduling algorithms can be “plugged in” • Resource Accounting
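To make the idea of "plugged in" scheduling algorithms concrete, here is a minimal sketch of what a pluggable policy interface could look like. It is not STORM's API: the `POLICIES` registry, the policy signature, and the job tuples are hypothetical, jobs never finish in this toy model, and gang scheduling is simplified to giving the whole machine to one job per time slice.

```python
def fcfs(jobs, slice_index, total_nodes):
    """Run jobs in arrival order; stop at the first one that does not fit."""
    chosen, free = [], total_nodes
    for name, nodes in jobs:
        if nodes > free:
            break
        chosen.append(name)
        free -= nodes
    return chosen


def gang(jobs, slice_index, total_nodes):
    """Give the machine to one job per time slice, rotating round-robin."""
    if not jobs:
        return []
    return [jobs[slice_index % len(jobs)][0]]


# Hypothetical registry: adding an entry "plugs in" a new policy.
POLICIES = {"fcfs": fcfs, "gang": gang}


def timeline(policy_name, jobs, total_nodes, num_slices):
    """List which jobs run in each time slice under the chosen policy."""
    policy = POLICIES[policy_name]
    return [policy(jobs, t, total_nodes) for t in range(num_slices)]


jobs = [("A", 4), ("B", 4), ("C", 2)]  # (job name, nodes requested)
fcfs_plan = timeline("fcfs", jobs, total_nodes=4, num_slices=3)
gang_plan = timeline("gang", jobs, total_nodes=4, num_slices=4)
```

Under FCFS, job A holds the machine in every slice of this example, while the gang policy lets B and C run within the first three slices. With time slices of a few hundred microseconds, that rotation is what makes short jobs feel responsive.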

  27. Case Studies • Job Launching: send/execute/check for completion • 40 times faster than the best reported result!
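The three job-launching steps named on slide 27 map naturally onto the core primitives. The following sketch shows one possible mapping as a sequential simulation; the node dictionaries, `make_node`, `launch_job`, and the `run` callback are hypothetical names introduced for this example, and the mapping itself is an assumption rather than a description of STORM's internals.

```python
def make_node(run):
    """Hypothetical compute node: event set, completion flag and job runner."""
    return {"events": set(), "image": None, "done": 0,
            "exit_code": None, "run": run}


def launch_job(image, nodes):
    """Send the image, execute it on every node, then check for completion."""
    # Send (Xfer-And-Signal): multicast the image and raise an event per node.
    for n in nodes:
        n["image"] = image
        n["events"].add("image_arrived")
    # Execute (Test-Event): each node polls for the event and starts the job.
    for n in nodes:
        if "image_arrived" in n["events"]:
            n["events"].discard("image_arrived")
            n["exit_code"] = n["run"](n["image"])
            n["done"] = 1
    # Check for completion (Compare-And-Write): is done == 1 on every node?
    return all(n["done"] == 1 for n in nodes)


nodes = [make_node(run=lambda image: 0) for _ in range(8)]
launched = launch_job(b"\x7fELF", nodes)
```

Each step is a single collective operation over the node set, so in this model the launcher issues the same three operations whether there are 8 nodes or 8000.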

  28. Case Studies • BCS-MPI (Buffered CoScheduled MPI) • Architecture • Set of cooperative threads running in the NIC • Built atop the three core primitives • BSP-like behavior: communications are synchronized and scheduled every few hundred microseconds • Functionality: • Subset of the MPI standard • Paves the way to provide: • Traffic segregation • Deterministic replay of user applications • System-level fault tolerance

  29. Case Studies • SWEEP3D and SAGE Performance (IA32) • Production-level MPI versus BCS-MPI • [Figure: two performance plots, annotated 2% SPEEDUP and 0.5% SPEEDUP]

  30. Outline • Motivation and Goals • Introduction • Core Primitives • Parallel Software Design • Case Studies • Concluding remarks

  31. Concluding Remarks • Methodology for designing parallel software • Coordination of all system and application software activities in a BSP-like fashion • Parallel applications and system software built atop a basic set of collective primitives for global coordination • Backbone to integrate all nodes into a parallel OS • Promising preliminary results demonstrate that this approach is indeed feasible

  32. Future Work • Kernel-level implementation • User-level solution is already working • Deterministic replay of MPI programs • Ordered resource scheduling may enforce reproducibility • Transparent fault tolerance • Global coordination simplifies the state of the machine


  34. Parallel Software Design • Using the core primitives…

  35. Case Studies • Job Scheduling: gang scheduling • Very small time slices: RESPONSIVENESS!

  36. Toward a Parallel OS • BCS-MPI: real-time communication scheduling • [Figure: timeline of one time slice (hundreds of μs): Global Strobe (time slice starts), Exchange of comm requirements, Global Synchronization, Communication scheduling, Global Synchronization, Real transmission, Global Strobe (time slice ends)]
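The three phases of a BCS-MPI time slice on slide 36 can be sketched as a single function. This is a simplified model, not the NIC-resident implementation: matching messages by a `(src, dst, tag)` triple and deferring unmatched sends to a later slice are assumptions made for illustration, and `bcs_time_slice` is a hypothetical name.

```python
def bcs_time_slice(pending_sends, posted_recvs):
    """One time slice: exchange requirements, schedule, then transmit.

    `pending_sends` holds (src, dst, tag, payload) tuples and `posted_recvs`
    holds (src, dst, tag) tuples. Returns the delivered messages and the
    sends that must wait for a later slice.
    """
    # Phase 1, exchange of comm requirements: after this step every node
    # knows which sends and receives were posted before the strobe.
    recvs = set(posted_recvs)
    # Phase 2, communication scheduling: a send is scheduled only if its
    # matching receive has already been posted.
    scheduled, remaining = [], []
    for send in pending_sends:
        src, dst, tag, _payload = send
        if (src, dst, tag) in recvs:
            scheduled.append(send)
        else:
            remaining.append(send)
    # Phase 3, real transmission: move the payloads of the scheduled sends.
    delivered = {(src, dst, tag): payload
                 for src, dst, tag, payload in scheduled}
    return delivered, remaining


sends = [(0, 1, "a", b"hi"), (2, 3, "b", b"yo")]
delivered_1, left_1 = bcs_time_slice(sends, [(0, 1, "a")])
delivered_2, left_2 = bcs_time_slice(left_1, [(2, 3, "b")])
```

Because all transfers in a slice are decided in the scheduling phase before any data moves, the order of communication in this model depends only on what was posted, which is the property that makes deterministic replay plausible.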

  37. Toward a Parallel OS • BCS-MPI: real-time communication scheduling
