
Optimization of Coupled Systems on Emerging Architectures

Gerhard Theurich

NRL/SAIC

ESMF Executive Board / Interagency Working Group Meeting

June 12, 2014


Coupling challenges

  • Applications are trending toward larger numbers of components:

    • Coupling with multiple time-scales

    • Explicit, semi-implicit, and fully implicit schemes

    • High resolution, adaptive, unstructured grids

    • Hierarchical versus flat component architectures

    • Ensembles: multi-instance, multi-model, concurrent versus sequential.

  • To make matters worse: the growing complexity of coupled systems is compounded by the increasing levels of explicit parallelism on emerging computing architectures.


HPC hardware is a changing element

  • 1980s – Vector machines

    • SIMD type parallelism of serial code.

  • 1990s – Parallel machines

    • Coarse grain parallelism.

  • Early 2000s – Massively parallel machines (some parallel vector)

    • Distributed memory.

    • Distributed shared memory.

    • MPI as the standard to implement coarse grain parallelism.

  • Early 2010s – Massively parallel machines with multi-core CPUs

    • Hybrid coarse- and fine-grain parallelism: MPI + threads (e.g. OpenMP); see the sketch after this list.

  • Today – Heterogeneous systems

    • Heterogeneous HW: multi-core CPUs + GPUs + MICs. Not every node is the same: different numbers of CPU sockets, memory sizes, and devices.

    • Heterogeneous SW: MPI+OpenMP + CUDA/PGI-ACC/Cray-ACC/OpenACC/Intel-MIC-Directives/Intel-MIC-Native/OpenCL
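
The slides stop at the bullet level; as a concrete illustration of the MPI + threads model, here is a minimal hybrid C sketch (an addition for illustration, not part of the original presentation). MPI supplies the coarse-grain decomposition across ranks, while OpenMP threads provide the fine-grain parallelism within each rank.

```c
/* Minimal MPI + OpenMP hybrid sketch (illustrative only, not from the slides).
 * Build e.g. with: mpicc -fopenmp hybrid.c -o hybrid */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    /* MPI_THREAD_FUNNELED: only the master thread makes MPI calls. */
    int provided, rank, nranks;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double local_sum = 0.0;

    /* Fine-grain parallelism: OpenMP threads share the rank-local work. */
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < 1000000; i++)
        local_sum += 1.0 / (1.0 + i + rank);

    /* Coarse-grain parallelism: combine partial results across ranks. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d threads/rank=%d sum=%f\n",
               nranks, omp_get_max_threads(), global_sum);

    MPI_Finalize();
    return 0;
}
```

On today's heterogeneous systems the same pattern is extended by adding a device programming model (CUDA, OpenACC, OpenCL, MIC directives) inside each rank.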


Modern supercomputer architecture

[Figure: schematic of a modern supercomputer with O(10k) nodes, each node a host with a multi-core CPU, its own cores, and RAM. Two trends: increasing fine-grain parallelism (threading and SIMD on CPU cores and device cores) and increasing coarse-grain parallelism (decomposition and distribution across nodes).]


Earth system models on accelerators

  • Focus has been on specific models.

  • Optimization of the computationally intensive kernels to take advantage of the extra level of parallelism offered by accelerators.

  • Examples:

    • WRF Single Moment 5-tracer (WSM5) on Nvidia GPU (Michalakes et al.)

    • NEMS/NMMB on Intel MIC (Michalakes)


Coupling challenges on modern architectures

  • Components are distributed across O(10k) nodes.

  • Field and grid data are stored and processed on different hardware throughout a run, both within a component and between components.

  • Efficient coupling requires data locality between the components to reduce the cost of data movements.

  • A limited number of each type of processing hardware (and memory) is available and must be shared between the components.

  • Efficient use of the available hardware requires some over-subscription, but suffers from too much.


ESMF and the NUOPC Layer offer key elements to address the coupling challenges

  • Support for a wide range of multi-model application architectures.

  • Data types that can represent a large range of structured and unstructured grids.

  • Data types that can represent data decompositions and their distribution across the underlying hardware.

  • Methods to move data efficiently between decompositions/distributions.

  • A well-defined set of initialization sequences with multi-way negotiation between the components.

  • Grids and decomposition information can be transferred between components during initialization.


Example of interleaved components

[Figure: three components, Comp-A, Comp-B, and Comp-C, interleaved across the same set of nodes; each node is a multi-core CPU host with cores and a GPU that the components share.]
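
The presentation does not show how such a placement is realized; below is a hedged sketch in plain MPI (not the ESMF/NUOPC API) of one way a driver could interleave three components so that every node contributes ranks, and therefore cores and a GPU, to each of Comp-A, Comp-B, and Comp-C.

```c
/* Hypothetical sketch of interleaved component placement (not ESMF/NUOPC API).
 * Every node contributes ranks to each of Comp-A, Comp-B, and Comp-C, so all
 * three components see the node's CPU cores and GPU without crossing the network. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Group ranks that share a node (MPI-3 shared-memory split). */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    /* Interleave: assign each node-local rank to one of the three components. */
    const char *names[] = { "Comp-A", "Comp-B", "Comp-C" };
    int component = node_rank % 3;

    /* One communicator per component, each spanning all nodes. */
    MPI_Comm comp_comm;
    MPI_Comm_split(MPI_COMM_WORLD, component, world_rank, &comp_comm);

    int comp_rank, comp_size;
    MPI_Comm_rank(comp_comm, &comp_rank);
    MPI_Comm_size(comp_comm, &comp_size);
    printf("world rank %d -> %s (rank %d of %d)\n",
           world_rank, names[component], comp_rank, comp_size);

    MPI_Comm_free(&comp_comm);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```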


ESMF/NUOPC accelerator projects

  • ESMF team efforts are funded through ONR/Earth System Prediction Capability

  • Target is a suite of coupled models for next generation Naval prediction

  • 1-year ONR seed project: Optimized Infrastructure for the Earth System Prediction Capability includes prototype accelerator support (began May 2013)

  • 3-year ONR project: An Integration and Evaluation Framework for ESPC Coupled Models includes delivery of capability for coupled systems

  • Specific projects we interact with under ONR include:

    • Accelerated Prediction of the Polar Ice and Global Ocean (APPIGO)

    • NPS-NRL-Rice-UIUC Collaboration on Navy Atmosphere-Ocean Coupled Models on Many-Core Computer Architectures

  • Our part is to look into accelerators with ESMF/NUOPC specifically for coupled systems.


Initial questions and considerations

  • Can components that use different programming models (OpenCL, OpenACC, Intel-MIC-Directives, …) run within the same ESMF executable?

  • Do the different programming models provide enough control for a component to decide at run-time whether to use a specific accelerator device or not?

  • Is it possible to uniquely identify the available devices? Across programming models? Across the distributed parts of a component? Across components?
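
To make the last question concrete, here is an illustrative C sketch using the CUDA runtime (one of several possible programming models; the choice is an assumption for illustration, not part of the original slides). It labels every device by hostname and PCI location, an identifier that remains unique across ranks, across components, and across programming models addressing the same card.

```c
/* Illustrative sketch (not from the slides): enumerate accelerator devices on
 * each MPI rank and label them by hostname + PCI location, which yields an
 * identifier that is unique across ranks, components, and programming models. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char host[MPI_MAX_PROCESSOR_NAME];
    int len;
    MPI_Get_processor_name(host, &len);

    int ndev = 0;
    if (cudaGetDeviceCount(&ndev) != cudaSuccess)
        ndev = 0;   /* e.g. node without a GPU or driver: report zero devices */

    for (int d = 0; d < ndev; d++) {
        struct cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        /* domain:bus:device identifies the physical card regardless of which
         * programming model (CUDA, OpenCL, OpenACC, ...) later addresses it. */
        printf("rank %d on %s: device %d = %s (PCI %04x:%02x:%02x)\n",
               rank, host, d, prop.name,
               prop.pciDomainID, prop.pciBusID, prop.pciDeviceID);
    }

    MPI_Finalize();
    return 0;
}
```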


Early prototyping

Jayesh Krishna (ANL) has prototyped ESMF components with OpenCL, OpenACC, and Intel-MIC-Directives.
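
The prototype code itself is not part of this presentation; the fragment below is only a generic, hedged example of the directive-based offload style (OpenACC in C) that such a component kernel might use.

```c
/* Generic OpenACC offload sketch (illustrative; not the ANL prototype code).
 * A component's compute-intensive kernel can be annotated so the same source
 * runs on the host or, when an accelerator is present, on the device. */
#include <stdio.h>

#define N 100000

int main(void)
{
    static float field[N], tendency[N];

    for (int i = 0; i < N; i++) {
        field[i]    = (float)i;
        tendency[i] = 0.001f * i;
    }

    /* Copy data to the device (if one is available), run the loop there,
     * and copy the updated field back to the host. */
    #pragma acc parallel loop copy(field[0:N]) copyin(tendency[0:N])
    for (int i = 0; i < N; i++)
        field[i] += tendency[i];

    printf("field[N-1] = %f\n", field[N - 1]);
    return 0;
}
```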


Next steps

  • Offer access to device information through ESMF: enough to guide a driver component to do component placement (interleaved components).

  • Support data references for the most efficient exchange between sequential components that are placed on the same compute resources.

  • Prototype the inter-component negotiation of distributions by looking at the optimization problem of model grid distribution within the mediator component.

  • Explore the possibility of automated construction of interleaved components based on the discovered resources and hints provided by the components during the initialization negotiation.


Thank you!

Project page on Earth System CoG:

https://earthsystemcog.org/projects/couplingtestbed/acceleratorplans

