
Parallel Objects: Virtualization & In-Process Components


Presentation Transcript


  1. Parallel Objects: Virtualization & In-Process Components
     Orion Sky Lawlor, Univ. of Illinois at Urbana-Champaign
     POHLL-2002

  2. Introduction
     Parallel programming is hard:
     • Communication takes time
       • Message startup cost
       • Bandwidth & contention
       • Synchronization, race conditions
     • Parallelism breaks abstractions
       • Flatten data structures
       • Hand off control between modules
     • Harder than serial programming

  3. Motivation
     Parallel applications are either:
     • Embarrassingly parallel
       • Trivial: ~1 RA-week of effort
       • E.g. Monte Carlo, parameter sweeps, SETI@home
       • Communication totally irrelevant to performance

  4. Motivation
     Parallel applications are either:
     • Embarrassingly parallel
     • Excruciatingly parallel
       • Massive: 1+ RA-year of effort
       • E.g. “pure” MPI codes of ≥10k lines
       • Communication and synchronization totally determine performance

  5. Motivation
     Parallel applications are either:
     • Embarrassingly parallel
     • Excruciatingly parallel
     • “We’ll be done in 6 months…”
       • Several parallel libraries, codes, and groups; dynamic and adaptive
       • E.g. multiphysics simulation

  6. Serial Solution: Abstract!
     Build layers of software:
     • High-level: libc, C++ STL, …
     • Mid-level: OS kernel
       • Silently schedules processes
       • Keeps the CPU busy even when some processes block
       • Allows a process to ignore other processes
     • Low-level: assembler

  7. Parallel Solution: Abstract!
     The middle layer is missing:
     • High-level: ScaLAPACK, POOMA, …
     • Mid-level: a parallel “kernel” is missing, one that would:
       • Silently schedule components
       • Keep the CPU busy even when some components block
       • Allow a component to ignore other components
     • Low-level: MPI

  8. The missing middle layer:
     • Provides dynamic computation and communication overlap, even across separate modules
     • Handles inter-module handoff
     • Pipelines communication
     • Improves cache utilization: smaller components
     • Provides a nice layer for advanced features, like process migration

  9. Examples: Multiprogramming

  10. Examples: Pipelining

  11. Middle Layer: Implementation
     • Real OS processes/threads
       • Robust, reliable, implemented
       • High performance penalty
       • No parallel features (migration!)
     • Converse/Charm++
       • In-process components: efficient
       • Piles of advanced features
       • AMPI: an MPI interface to Charm++
       • Application frameworks

  12. Charm++
     • Parallel library for object-oriented C++ applications
     • Messaging via method calls
       • Communication through “proxy” objects
     • Methods called by a scheduler
       • The system determines who runs next
     • Multiple objects per processor
     • Object migration fully supported
       • Even with broadcasts and reductions
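
To make the model concrete, here is a minimal sketch in the style of a Charm++ “hello” program. The module and class names (hello, Main, Hello) are illustrative, and the syntax follows later public releases of Charm++, so details may differ from the 2002-era system. The interface (.ci) file declares the entry methods the scheduler may invoke:

    // hello.ci -- interface file (hypothetical example)
    mainmodule hello {
      readonly CProxy_Main mainProxy;
      mainchare Main {
        entry Main(CkArgMsg *m);
        entry [reductiontarget] void done();
      };
      array [1D] Hello {
        entry Hello();
        entry void greet(int data);
      };
    };

    // hello.C -- a "message send" is an asynchronous method call on a proxy
    #include "hello.decl.h"

    CProxy_Main mainProxy;   // readonly: set once at startup

    class Main : public CBase_Main {
    public:
      Main(CkArgMsg *m) {
        delete m;
        mainProxy = thisProxy;
        // Create 8 parallel objects; the runtime maps them to processors.
        CProxy_Hello arr = CProxy_Hello::ckNew(8);
        arr.greet(42);   // broadcast to every element, via the proxy
      }
      void done() { CkExit(); }   // reduction target: runs once at the end
    };

    class Hello : public CBase_Hello {
    public:
      Hello() {}
      Hello(CkMigrateMessage *m) {}   // needed so elements can migrate
      void greet(int data) {
        CkPrintf("element %d on PE %d got %d\n", thisIndex, CkMyPe(), data);
        // Reduction: notify Main once every element has run.
        contribute(CkCallback(CkReductionTarget(Main, done), mainProxy));
      }
    };

    #include "hello.def.h"

Note that the scheduler, not the programmer, decides when greet runs; with more objects than processors, whichever object has a pending message gets the CPU.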

  13. Mapping Work to Processors
     (Figure: the user’s view of many objects vs. the system’s mapping of those objects onto processors)

  14. AMPI
     • An MPI interface, implemented on Charm++
     • Multiple “virtual processors” per physical processor
       • Implemented as user-level threads
       • Very fast context switching
     • MPI_Recv blocks only the virtual processor, not the physical one
     • All the benefits of Charm++
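
As a sketch of what virtual processors buy you, the ordinary MPI ring program below runs unchanged under AMPI with many more ranks than physical processors. The program is standard MPI; the build and launch lines follow AMPI's documented usage in later releases (the ampicxx wrapper and the +vp virtual-processor count), and the file names are illustrative.

    // ring.C -- plain MPI code; under AMPI each rank is a user-level thread
    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char **argv) {
      int rank, size;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      // Pass a token around a ring. MPI_Sendrecv blocks only this
      // virtual processor; the scheduler runs other ranks on the same
      // physical CPU meanwhile, overlapping computation and communication.
      int next = (rank + 1) % size, prev = (rank + size - 1) % size;
      int sendTok = rank, recvTok = -1;
      MPI_Sendrecv(&sendTok, 1, MPI_INT, next, 0,
                   &recvTok, 1, MPI_INT, prev, 0,
                   MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      printf("rank %d of %d received %d\n", rank, size, recvTok);
      MPI_Finalize();
      return 0;
    }

    # Build and run: 64 virtual processors on 4 physical ones
    ampicxx -o ring ring.C
    ./charmrun ./ring +p4 +vp64

Because size reports the number of virtual processors (64 here), the decomposition is decoupled from the machine, which is also what makes migration and load balancing possible.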

  15. Application Frameworks
     • Domain-specific interfaces: unstructured grids, structured grids, particle-in-cell
     • Provide a natural interface for application scientists (Fortran!)
     • “Encapsulate” communication
     • Built on Charm++
     • The most popular interfaces to Charm++

  16. Charm++ Features: Migration
     • Automatic load balancing
       • Balances load by migrating objects
       • Application-independent
       • Built-in data collection (CPU, network)
       • Pluggable “strategy” modules
     • Adaptive job scheduler
       • Shrinks/expands a parallel job by migrating objects
       • Dramatic utilization improvement
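
Migration hinges on each object knowing how to serialize itself, which Charm++ expresses with a pup (pack/unpack) routine. The PUP::er interface, the usesAtSync flag, and AtSync/ResumeFromSync are real Charm++ mechanisms, but the Chunk class, its fields, and its .ci file are hypothetical:

    // chunk.C -- a migratable array element (hypothetical example)
    #include "chunk.decl.h"   // generated from a hypothetical chunk.ci
    #include "pup_stl.h"      // PUP support for STL containers
    #include <vector>

    class Chunk : public CBase_Chunk {
      std::vector<double> field;   // per-object state to be migrated
      int step;
    public:
      Chunk() : step(0) { usesAtSync = true; }  // opt in to load balancing
      Chunk(CkMigrateMessage *m) {}             // migration constructor

      // The runtime calls pup to pack this object on the old processor
      // and unpack it on the new one when the balancer moves it.
      void pup(PUP::er &p) {
        CBase_Chunk::pup(p);   // chain to the framework's own state
        p | step;
        p | field;
      }

      void doStep() {                   // one unit of work (an entry method)
        /* ... compute on this chunk ... */
        step++;
        if (step % 100 == 0) AtSync();  // pause so the balancer may migrate us
        else thisProxy[thisIndex].doStep();
      }
      void ResumeFromSync() { thisProxy[thisIndex].doStep(); }
    };

    #include "chunk.def.h"

The same pup routine also supports the shrink/expand scheduler: since any object can be packed and shipped, the runtime is free to rebalance the same set of objects over a different number of processors.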

  17. Examples: Load Balancing
     (Figure: 1. Adaptive refinement, 2. Load balancer invoked, 3. Chunks migrated)

  18. Examples: Expanding Job

  19. Examples: Virtualization

  20. Conclusions
     • Parallel applications need something like a “kernel”
       • A neutral party to mediate CPU use
       • Significant utilization gains
     • It is easy to put good tools in the kernel
       • Work migration support
       • Load balancing
     • Consider using Charm++: http://charm.cs.uiuc.edu/
