
Accelerated Prediction of Polar Ice and Global Ocean (APPIGO): Overview

This project aims to enhance the performance of Arctic forecast models on advanced architectures. It focuses on the sea ice model (CICE), the global ocean model (HYCOM), and the wave model (WaveWatch III), addressing challenges in computational intensity, parallelism, and data transfer to enable better predictions of polar ice and global ocean conditions.

Presentation Transcript


  1. Accelerated Prediction of Polar Ice and Global Ocean (APPIGO): Overview
  Phil Jones (LANL), Eric Chassignet (FSU), Elizabeth Hunke and Rob Aulwes (LANL), Alan Wallcraft and Tim Campbell (NRL-SSC), Mohamed Iskandarani and Ben Kirtman (Univ. Miami)

  2. Arctic Prediction
  • Polar amplification
    • Rapid ice loss, feedbacks
    • Impacts on global weather
  • Human activities
    • Infrastructure, coastal erosion, permafrost melt
    • Resource extraction
    • Shipping
    • Security/safety, staging
  • Regime change
    • Thin ice leads to more variability
  [Photos: Shell's Kulluk Arctic oil rig runs aground in the Gulf of Alaska (USCG photo); the LNG carrier Ob River in a winter crossing (with icebreakers)]

  3. Trump: ISIS route into N. America

  4. Interagency Arctic efforts
  • Earth System Prediction Capability (ESPC) Focus Area
    • Sea ice prediction: up to seasonal
  • Sea Ice Prediction Network (SIPN)
    • Sea Ice Outlook
  • This project – enabling better prediction through model performance

  5. Interagency Arctic efforts
  • Earth System Prediction Capability (ESPC) Focus Area
    • Sea ice prediction: up to seasonal
    • Seasonal prediction: Broncos vs. Carolina in the Super Bowl
  • Sea Ice Prediction Network (SIPN)
    • Sea Ice Outlook
  • This project

  6. APPIGO
  • Enhance performance of Arctic forecast models on advanced architectures, with a focus on:
    • Los Alamos CICE – sea ice model
    • HYCOM – global ocean model
    • WaveWatch III – wave model
  • Components of the Arctic Cap Nowcast/Forecast System (ACNFS) and the Global Ocean Forecast System (GOFS)

  7. Proposed Approach
  • Refactoring: incremental
    • Profile
    • Accelerate a section (slower at first)
    • Expand accelerated sections
    • Can test along the way
    • Try directive-based and other approaches
  • Optimized
    • Best possible performance for specific kernels
    • Abstractions, larger-scale changes (data structures)
  • In parallel: optimized operator library
  • Stennis (HYCOM, Phi/many-core), LANL (GPU, CICE, HYCOM), Miami (operators), FSU (validation, science)

  8. APPIGO proposed timeline
  • Year 1
    • Initial profiling
    • Initial acceleration (deceleration!)
      • CICE: GPU
      • HYCOM: GPU, Phi (MIC)
      • WW3: hybrid scalability
    • Begin operator libraries
  • Year 2
    • Continued optimization
    • Expand accelerated regions (change the sign!)
    • Abstractions, operator library
  • Year 3
    • Deploy in models and validate with science

  9. Progress to Date

  10. Focus on CICE: Challenges
  • CICE
    • Dynamics (EVP rheology)
    • Transport
    • Column physics (thermodynamics, ridging, etc.)
  • Quasi-2D
    • Number of vertical levels and thickness classes is small
  • Parallelism
    • Not enough parallelism in the horizontal domain decomposition alone
  • Computational intensity
    • Possibly not enough work for efficient kernels
    • BGC and other new improvements help
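To make the parallelism point concrete: with only a handful of thickness categories and vertical levels, looping over categories by itself exposes far too few independent iterations to fill a GPU, so the category loop and the large horizontal loop have to be treated as one iteration space. A minimal sketch in C with OpenACC follows (CICE itself is Fortran; the routine, array names, and sizes here are invented placeholders, not project code):

/* sketch_collapse.c -- illustrative only, not CICE source.
   Build, e.g., with: nvc -acc -O2 sketch_collapse.c */
#include <stdlib.h>

enum { NCELLS = 100000,  /* horizontal cells owned by this rank (made up) */
       NCAT   = 5 };     /* ice thickness categories: small in practice   */

/* Hypothetical per-category volume update. collapse(2) merges the tiny
   category loop with the large horizontal loop, giving NCAT*NCELLS
   independent iterations instead of NCAT small kernel launches. */
static void update_volumes(double *restrict vice, const double *restrict dvol)
{
    #pragma acc parallel loop collapse(2) \
            copy(vice[0:NCAT*NCELLS]) copyin(dvol[0:NCAT*NCELLS])
    for (int n = 0; n < NCAT; ++n)
        for (int i = 0; i < NCELLS; ++i)
            vice[n*NCELLS + i] += dvol[n*NCELLS + i];
}

int main(void)
{
    double *vice = calloc((size_t)NCAT * NCELLS, sizeof *vice);
    double *dvol = calloc((size_t)NCAT * NCELLS, sizeof *dvol);
    if (!vice || !dvol) return 1;
    update_volumes(vice, dvol);
    free(vice);
    free(dvol);
    return 0;
}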

  11. Accelerating CICE with OpenACC
  • Focused on dynamics
  • Halo updates presented a significant challenge
    • Attempted to use GPUDirect to avoid extra GPU-CPU data transfers
  • What we tried
    • Refactored loops to move more computation onto the GPU
    • Fused separate kernels
    • Used OpenACC streams to get concurrent execution and hide data-transfer latencies
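A minimal sketch of the last two ideas, kernel fusion and asynchronous streams, in C with OpenACC (CICE is Fortran, and the 2-D tile, field names, and the pair of fused element-wise updates below are invented for illustration, not the actual dynamics code). Boundary rows are computed and copied back to the host on one async queue, ready for the halo exchange, while the fused interior kernel runs on another queue, so the transfer overlaps with compute:

/* sketch_fuse_async.c -- illustrative only, not CICE source.
   Build, e.g., with: nvc -acc -O2 sketch_fuse_async.c */
#include <stdlib.h>

/* One time step over an nx*ny tile that is already resident on the GPU. */
static void step(double *restrict a, double *restrict b, int nx, int ny)
{
    /* Boundary rows on queue 2 so their copy back to the host (needed
       for the MPI halo exchange) overlaps the interior compute below. */
    #pragma acc parallel loop present(a, b) async(2)
    for (int i = 0; i < nx; ++i) {
        b[i]             = 0.5 * a[i];               /* southern row */
        b[(ny-1)*nx + i] = 0.5 * a[(ny-1)*nx + i];   /* northern row */
    }
    #pragma acc update self(b[0:nx], b[(ny-1)*nx:nx]) async(2)

    /* Fused interior kernel: two formerly separate element-wise loops
       become a single launch on queue 1. */
    #pragma acc parallel loop collapse(2) present(a, b) async(1)
    for (int j = 1; j < ny - 1; ++j)
        for (int i = 0; i < nx; ++i)
            b[j*nx + i] = 0.5 * a[j*nx + i];

    #pragma acc wait   /* join both queues before messaging */
}

int main(void)
{
    enum { NX = 1024, NY = 1024 };
    double *a = malloc(sizeof(double) * NX * NY);
    double *b = malloc(sizeof(double) * NX * NY);
    if (!a || !b) return 1;
    for (int k = 0; k < NX * NY; ++k) a[k] = 1.0;

    /* Keep both fields on the device across the whole run; only the
       boundary rows move between host and device each step. */
    #pragma acc data copyin(a[0:NX*NY]) copyout(b[0:NX*NY])
    for (int n = 0; n < 10; ++n)
        step(a, b, NX, NY);

    free(a);
    free(b);
    return 0;
}

Keeping the fields inside one long-lived acc data region, as in main above, is what limits the extra GPU-CPU transfers to the halo rows alone.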

  12. HYCOM Progress: Large Benchmark
  • Standard DoD HPCMP HYCOM 1/25° global benchmark
    • 9000 by 6595 grid points by 32 layers
    • Includes typical I/O and data sampling
  • Benchmark updated from HYCOM version 2.2.27 to 2.2.98
    • Land masks in place of do-loop land avoidance
    • Dynamic vs. static memory allocation

  13. HYCOM Progress: Large Benchmark
  • On the Cray XC40:
    • Using huge pages improves performance by about 3%
    • Making the first dimension of all arrays a multiple of 8 saved 3-6%
      • Requires changing only a single number in the run-time patch.input file
      • ifort -align array64byte
  • Total core hours per model day vs. number of cores
    • 3 generations of Xeon cores
    • No single-core improvement, but 8 vs. 12 vs. 16 cores per socket
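To illustrate what the padding buys: HYCOM gets its padded first dimension from that single value in the run-time patch.input file plus ifort -align array64byte, but the effect can be sketched in C (the sizes and names below are made up). Rounding the leading dimension up to a multiple of 8 doubles and 64-byte-aligning the base pointer puts every row on a 64-byte boundary, which lets the compiler use aligned vector loads and stores:

/* sketch_pad.c -- illustrative only; HYCOM itself is Fortran. */
#include <stdio.h>
#include <stdlib.h>

/* Round a row length up to the next multiple of 8 doubles (64 bytes). */
static size_t pad_leading_dim(size_t n)
{
    return (n + 7) & ~(size_t)7;
}

int main(void)
{
    size_t nx = 573, ny = 817;          /* made-up tile dimensions   */
    size_t ldx = pad_leading_dim(nx);   /* padded leading dimension  */

    /* 64-byte aligned base address (C11 aligned_alloc); the total size
       is a multiple of 64 because ldx is a multiple of 8 doubles. */
    double *field = aligned_alloc(64, sizeof(double) * ldx * ny);
    if (!field) return 1;

    for (size_t j = 0; j < ny; ++j)
        for (size_t i = 0; i < nx; ++i)  /* padding cells stay unused */
            field[j * ldx + i] = 0.0;

    printf("nx=%zu padded to ldx=%zu (row stride %zu bytes)\n",
           nx, ldx, ldx * sizeof(double));
    free(field);
    return 0;
}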

  14. HYCOM on Xeon Phi
  • Standard gx1v6 HYCOM benchmark run in native mode on 48 cores of a single 5120D Phi attached to the Navy DSRC's Cray XC30
    • No additional code optimization
    • Compared to 24 cores of a single Xeon E5-2697v2 node
    • Individual subroutines run 6 to 13 times slower; overall, 10 times slower
    • Memory capacity is too small
    • I/O is very slow
    • Native mode is not practical
  • Decided not to optimize for Knights Corner - Knights Landing is very different
    • Self-hosted Knights Landing nodes
    • Up to 72 cores per socket, lots of memory
    • Scalability of the 1/25° global HYCOM makes this a good target
    • May need additional vector (AVX-512F) optimization
    • I/O must perform well
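On the possible AVX-512F work: the likely shape of such an optimization is making sure stride-1 inner loops vectorize 8-wide in double precision on Knights Landing. A hedged sketch in C using the portable OpenMP simd directive (not HYCOM code; the routine and arrays are invented):

/* sketch_simd.c -- illustrative only, not HYCOM source.
   Build, e.g., with: icc -qopenmp-simd -xMIC-AVX512 sketch_simd.c */
#include <stdlib.h>

/* A stride-1 tracer-like update written so the compiler can emit
   512-bit (8-wide double) vector instructions. */
static void update_tracer(double *restrict t, const double *restrict flux,
                          double dt, int n)
{
    #pragma omp simd aligned(t, flux : 64) simdlen(8)
    for (int i = 0; i < n; ++i)
        t[i] += dt * flux[i];
}

int main(void)
{
    enum { N = 1 << 20 };
    double *t    = aligned_alloc(64, sizeof(double) * N);
    double *flux = aligned_alloc(64, sizeof(double) * N);
    if (!t || !flux) return 1;
    for (int i = 0; i < N; ++i) { t[i] = 1.0; flux[i] = 0.5; }

    for (int step = 0; step < 100; ++step)
        update_tracer(t, flux, 1.0e-3, N);

    free(t);
    free(flux);
    return 0;
}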

  15. Validation Case
  • CESM test case
    • HYCOM (2.2.35), CICE
    • Implementation of flux exchange
    • HYCOM, CICE in the G compset
  • Three 50-year experiments with CORE v2 forcing
    • HYCOM in CESM with CICE
    • POP in CESM with CICE
    • HYCOM standalone with CICE

  16. Lessons Learned
  • Hosted accelerators suck
    • Programming models and software stack are immature
    • Unable to even build at the Hackathon a year ago
  • Substantial improvement
    • Could build and run to break-even at the 2015 Hackathon
    • OpenACC can compete with CUDA, 2-3x speedup (based on ACME atmosphere experience)
    • GPUDirect
  • Need to expand accelerated regions beyond a single routine to gain performance
  • We have learned a great deal and obtained valuable experience

  17. APPIGO Final Year
  • CICE
    • Continue and expand OpenACC work
    • Column physics
  • HYCOM
    • Revisit OpenACC
    • Continue work toward Intel Phi
  • Continue validation/comparison
    • Coupled and uncoupled

  18. APPIGO Continuation?
  • Focus on the path to an operational ESPC model
    • Continued optimization, but focus on coverage and incorporation into production models
    • CICE, HYCOM on Phi (threading) and GPU (OpenACC)
    • WWIII?
  • Science application
    • Use coupled simulations to understand Arctic regime change
  • Throw Mo under the bus: abandon stencils
    • Too fine granularity
