
CGAM Running the Met Office Unified Model on HPCx



  1. CGAM Running the Met Office Unified Model on HPCx. Paul Burton, CGAM, University of Reading. Paul@met.rdg.ac.uk, www.cgam.nerc.ac.uk/~paul

  2. Overview • CGAM : Who, what, why and how • The Met Office Unified Model • Ensemble Climate Models • High Resolution Climate Models • Unified Model Performance • Future Challenges and Directions

  3. Who is CGAM? • N.E.R.C. • Distributed Institute for Atmospheric Composition • Atmospheric Chemistry Modelling Support Unit • NERC Centres for Atmospheric Science • Centre for Global Atmospheric Modelling • Centre for Ecology and Hydrology • Centre for Polar Observations and Modelling • National Institute for Environmental e-Science • Tyndall Centre for Climate Change Research • British Antarctic Survey • Centre for Terrestrial Carbon Dynamics • Southampton Oceanography Centre • Proudman Oceanographic Laboratory • British Geological Survey • Data Assimilation Research Centre • Environmental Systems Science Centre • British Atmospheric Data Centre • Universities’ Weather and Environment Research Network • Facility for Airborne Atmospheric Measurements • University Facilities for Atmospheric Measurement

  4. What does CGAM do? • Climate Science • UK centre of expertise for climate science • Lead UK research in climate science • Understand and simulate the highly non-linear dynamics and feedbacks of the climate system • Earth System Modelling • From seasonal timescales to hundreds of years • Close links to the Met Office • Computational Science • Support scientists using the Unified Model • Porting and optimisation • Development of new tools

  5. Why does CGAM exist? • Will there be an El Niño this year? • How severe will it be? • Are we seeing increases in extreme weather events in the UK? • The Autumn 2000 floods • Drought? • Will the milder winters of the last decade continue? • Can we reproduce and understand past abrupt changes in climate?

  6. How does CGAM answer such questions? • Models are our laboratory • Investigate predictability • Explore forcings and feedbacks • Test hypotheses

  7. Met Office Unified Model • Standardise on using a single model • The Met Office’s Hadley Centre is recognised as a world leader in climate research • Two-way collaboration with the Met Office • Very flexible model • Forecast • Climate • Global or Limited Area • Coupled ocean model • Easy configuration via a GUI • User-configurable diagnostic output

  8. Unified Model : Technical Details • Climate configuration uses the “old” vn4.5 • vn5 has an updated dynamical core • The next-generation “HadGEM” climate configuration will use this • Grid-point model • Regular latitude/longitude grid • Dynamics • Split-explicit finite-difference scheme • Diffusion and polar filtering • Physical parameterisation • Almost all constrained to a vertical column

  9. Unified Model : Parallelisation • Domain decomposition • Atmosphere : 2D regular decomposition • Ocean : 1D (latitude) decomposition • GCOM library for communications • Interface to selectable communications library: MPI, SHMEM, ??? • Basic communication primitives • Specialised communications for UM • Communication Patterns • Halo update (SWAPBOUNDS) • Gather/scatter • Global/partial summations • Designed/optimised for Cray T3E!
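The SWAPBOUNDS halo update is the communication pattern the model depends on most. Purely as an illustration (the real GCOM/UM code is Fortran and differs in detail; the halo width, local field size and use of an MPI Cartesian communicator here are assumptions), a minimal mpi4py sketch of a halo exchange on a 2D regular decomposition might look like this:

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
dims = MPI.Compute_dims(comm.Get_size(), 2)            # 2D process grid
cart = comm.Create_cart(dims, periods=[True, False])   # periodic in longitude only (assumed)

halo = 1
nx, ny = 12, 9                                         # local subdomain size (illustrative)
field = np.zeros((nx + 2*halo, ny + 2*halo))
field[halo:-halo, halo:-halo] = cart.Get_rank()        # fill interior with this rank's id

def swap_bounds(f):
    """SWAPBOUNDS-style update: exchange halos with the neighbouring subdomains."""
    for axis in (0, 1):
        lo_nbr, hi_nbr = cart.Shift(axis, 1)           # neighbours in this direction
        if axis == 0:
            send_hi, send_lo = f[-2*halo:-halo, :].copy(), f[halo:2*halo, :].copy()
        else:
            send_hi, send_lo = f[:, -2*halo:-halo].copy(), f[:, halo:2*halo].copy()
        recv_lo, recv_hi = np.empty_like(send_hi), np.empty_like(send_lo)
        cart.Sendrecv(send_hi, dest=hi_nbr, recvbuf=recv_lo, source=lo_nbr)
        cart.Sendrecv(send_lo, dest=lo_nbr, recvbuf=recv_hi, source=hi_nbr)
        if lo_nbr != MPI.PROC_NULL:
            (f[:halo, :] if axis == 0 else f[:, :halo])[...] = recv_lo
        if hi_nbr != MPI.PROC_NULL:
            (f[-halo:, :] if axis == 0 else f[:, -halo:])[...] = recv_hi

swap_bounds(field)
```

Sweeping one axis at a time, with the second sweep sending edges that already contain the freshly updated halos, also fills the corner points without needing extra diagonal messages.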

  10. Model Configurations • Currently • HadAM3 / HadCM3 • Low resolution (270km : 96 x 73 x 19L) • Running on ~10-40 CPUs • Turing (T3E1200), Green (O3800), Beowulf cluster • Over the next year • More of the same • Ensembles • Low resolution (HadAM3/HadCM3) • 10-100 members • High resolution • 90km : 288 x 217 x 30L • 60km : 432 x 325 x 40L
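As a back-of-envelope check on the quoted grid spacings (a sketch only: the Earth radius is an assumed value, the kilometre figures are taken to refer to the latitude spacing, and the grids are taken to include both poles):

```python
import math

R_EARTH_KM = 6371.0                                 # assumed mean Earth radius
km_per_degree = math.pi * R_EARTH_KM / 180.0        # ~111 km per degree of latitude

for name, nlon, nlat in [("low-res 270km", 96, 73),
                         ("hi-res 90km", 288, 217),
                         ("hi-res 60km", 432, 325)]:
    dlat = 180.0 / (nlat - 1)                       # latitude spacing, poles included
    print(f"{name:15s} dlat = {dlat:5.3f} deg  ~ {dlat * km_per_degree:5.1f} km")
```

This gives roughly 278 km, 93 km and 62 km respectively, consistent with the 270/90/60 km labels above.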

  11. Ensemble Methods in Weather Forecasting • Have been used operationally for many years (e.g. at ECMWF) • Perturbed starting conditions • Reduced resolution • Multi-model ensembles • Perturbed starting conditions • Different models • Why are they used? • Give some indication of predictability • Allow objective assessment of weather-related risks • More chance of seeing extreme events

  12. Climate Ensembles • Predictability • What confidence do we have in climate change projections? • What effect do different forcings have? • CO2 – different scenarios • Volcanic eruptions • Deforestation • How sensitive is the model? • Twiddle the knobs and see what happens • How likely are extreme events? • Allows governments to take defensive action now

  13. Ensembles Implementation • Setup • Allow users to specify and design an ensemble experiment • Runtime • Allow the ensemble to run as a single job on the machine for easy management • Analysis • How to view and process the vast amounts of data produced

  14. Setup : Normal UM workflow • The UMUI generates a UM Job: a shell script (which runs the executable under poe) plus Fortran namelists • Input data: starting data and forcing data • Output: diagnostics and restart data

  15. Setup : UM Ensemble workflow • A control UM Job (shell script and Fortran namelists, plus an ensemble configuration such as N_MEMBERS=3) launches the model under poe • Each member gets its own job directory (Job.1, Job.2, Job.3, etc.) containing its own shell script, Fortran namelists and member-specific differences • The run script sets $MEMBERid=…, does cd “Job.$MEMBERid” and runs the model there • Each member has its own data (starting data, forcing data) and its own output (Out.1, Out.2, Out.3: diagnostics and restart data)
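A minimal sketch of this setup step, with purely illustrative names (the real job directories and namelists come from the UMUI; "UM_Job", "NAMELIST" and the &ENSEMBLE namelist group below are hypothetical):

```python
import os
import shutil

N_MEMBERS = 3
TEMPLATE = "UM_Job"                      # hypothetical template job directory

for member in range(1, N_MEMBERS + 1):
    run_dir = f"Job.{member}"
    # each member gets its own copy of the job (shell script + namelists)
    shutil.copytree(TEMPLATE, run_dir, dirs_exist_ok=True)
    # append the member-specific differences to the namelists
    with open(os.path.join(run_dir, "NAMELIST"), "a") as nl:
        nl.write(f"\n&ENSEMBLE member_id={member} /\n")
```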

  16. UM Ensemble : Runtime (1) • “poe” called at top level – calls a “top_level_script” • Works out which CPU it’s on • Hence which member it is • Hence which directory/model SCRIPT to run • Model scripts run in a separate directory for each member • Each model script calls the executable
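The selection step amounts to mapping each task to a member and then to that member's directory. A sketch (the real top_level_script is a shell script; here the MPI rank stands in for "which CPU it's on", and the 8-CPUs-per-member figure and Job.N naming are assumptions):

```python
import os
from mpi4py import MPI

CPUS_PER_MEMBER = 8                           # e.g. an 8-CPU low-resolution member
rank = MPI.COMM_WORLD.Get_rank()
member = rank // CPUS_PER_MEMBER + 1          # members numbered 1..N

os.chdir(f"Job.{member}")                     # run from this member's own directory
# ...from here, launch the member's model script / executable...
```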

  17. UM Ensemble : Runtime (2) • Uses “MPH” to change the global communicator • http://www.nersc.gov/research/SCG/acpi/MPH/ • Freely available tool from NERSC • MPH designed for running coupled multi-model experiments • Each member has a unique MPI communicator replacing the global communicator
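The effect MPH has can be pictured as an MPI communicator split: ranks are coloured by the member they belong to, and the resulting sub-communicator takes the place of the global communicator inside the model. A minimal mpi4py sketch (the 8-CPUs-per-member layout is an assumption, and MPH itself does considerably more than this):

```python
from mpi4py import MPI

CPUS_PER_MEMBER = 8
world = MPI.COMM_WORLD
member = world.Get_rank() // CPUS_PER_MEMBER

# ranks with the same colour end up together in one sub-communicator
member_comm = world.Split(color=member, key=world.Get_rank())

# within a member, ranks run 0..CPUS_PER_MEMBER-1, as in a stand-alone model run
print(f"world rank {world.Get_rank()} -> member {member}, "
      f"rank {member_comm.Get_rank()} of {member_comm.Get_size()}")
```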

  18. UM Ensemble : Future Work • Run time tools • Control and monitoring of ensemble members • Real-time production of diagnostics • Currently each member writes its own diagnostics files • Lots of disk space • I/O performance? • Have a dedicated diagnostics process • Only output statistical analysis
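The dedicated diagnostics process idea could look roughly like the sketch below (field size, message tag and the choice of statistic are illustrative assumptions): compute ranks hand their diagnostics to one reserved rank, which alone does the reduction and the output.

```python
from mpi4py import MPI
import numpy as np

world = MPI.COMM_WORLD
diag_rank = world.Get_size() - 1              # reserve the last rank for diagnostics

if world.Get_rank() == diag_rank:
    for src in range(world.Get_size() - 1):
        local = world.recv(source=src, tag=7)
        # only a statistical summary is written out, not the full field
        print(f"diagnostic from rank {src}: mean = {np.mean(local):.3f}")
else:
    local_field = np.random.rand(96, 73)      # stand-in for a model diagnostic
    world.send(local_field, dest=diag_rank, tag=7)
```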

  19. UK-HIGEM • National “Grand Challenge” Programme for High Resolution Modelling of the Global Environment • Collaboration between a number of academic groups and the Met Office’s Hadley Centre • Develop a high resolution version of HadGEM (~1° atmosphere, 1/3° ocean) • Better understanding and prediction of • Extreme events • Predictability • Feedbacks and interactions • Climate “surprises” • Regional impacts of climate change

  20. UK HiGEM Status • Project only just starting • Plan to use Earth Simulator for production runs • Preliminary runs carried out • Earth Simulator • Very encouraging results • HPCx is a useful platform • For development • Possibly for some production runs

  21. UM Performance • Two configurations • Low resolution 96x73x19L • High resolution 288x217x30L • Built-in comprehensive timer diagnostics • Wallclock time • Communications • Not yet implemented • I/O, memory, hardware counters, ??? • Outputs an XML file • Analysed using a PHP web page
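The real analysis is done with a PHP web page; purely to illustrate the idea of post-processing the timer output, here is a small Python sketch over a guessed XML layout (the element names and the communication/SWAPBOUNDS numbers are invented; the 65 s QT_POS wallclock is borrowed from the optimisation slide further on):

```python
import xml.etree.ElementTree as ET

xml_text = """<timers>
  <routine name="QT_POS"><wallclock>65.0</wallclock><comms>40.0</comms></routine>
  <routine name="SWAPBOUNDS"><wallclock>12.0</wallclock><comms>12.0</comms></routine>
</timers>"""

root = ET.fromstring(xml_text)
# rank routines by wallclock time and show the communication fraction of each
for r in sorted(root, key=lambda r: -float(r.findtext("wallclock"))):
    wall = float(r.findtext("wallclock"))
    comms = float(r.findtext("comms"))
    print(f"{r.get('name'):12s} wall = {wall:6.1f}s  comms = {100 * comms / wall:4.1f}%")
```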

  22. LowRes Scalability

  23. LowRes : Communication Time

  24. LowRes : Load Imbalance

  25. LowRes : Relative Costs

  26. HiRes Scalability

  27. HiRes Communication Time

  28. HiRes Load Imbalance

  29. HiRes Relative Costs

  30. HiRes Exclusive Timer • QT_POS has large “Collective” time • Unexpected! • Call to global_MAX routine in gather/scatter • Not needed, so deleted!
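The flavour of the fix, not the actual UM code: if the gather first performs a global MAX reduction just to size its receive buffers, but the decomposition is regular enough that every rank can compute all the counts locally, the collective can simply be dropped. A sketch in mpi4py (the per-rank sizes are invented for illustration):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
local = np.full(10 + rank % 2, float(rank))   # each rank's size follows a known rule

# Before: an extra collective just to find the largest local size
# max_len = comm.allreduce(len(local), op=MPI.MAX)

# After: counts are worked out locally from the decomposition, no collective needed
counts = [10 + r % 2 for r in range(size)]
recvbuf = np.empty(sum(counts)) if rank == 0 else None
comm.Gatherv(local, (recvbuf, counts) if rank == 0 else None, root=0)
```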

  31. HiRes : After “optimisation” • QT_POS reduced from 65s to 35s • Improved scalability • And repeat…

  32. Optimisation Strategy • Low Res • Aiming for 8 CPU runs as ensemble members (typically ~50 members) • Physics optimisation a priority • Load Imbalance (SW radiation) • Single processor optimisation • Hi Res • As many CPUs as is feasible • Dynamics optimisation a priority • Remove/optimise collective operations • Increase average message length
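One common way to increase the average message length, shown only as an illustration rather than as the UM's implementation, is to pack the boundary data of several fields into a single buffer so that one longer message per neighbour replaces many short ones. A sketch on a simple ring of neighbours:

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
up, down = (rank + 1) % size, (rank - 1) % size      # ring of neighbours (illustrative)

fields = [np.random.rand(5, 100) for _ in range(8)]  # e.g. 8 prognostic fields

# Naive: one short Sendrecv per field (8 small messages per neighbour)
# for f in fields:
#     comm.Sendrecv(f[-1, :].copy(), dest=up, recvbuf=np.empty(100), source=down)

# Aggregated: pack all the boundary rows into one buffer and send it once
packed = np.concatenate([f[-1, :] for f in fields])
recv = np.empty_like(packed)
comm.Sendrecv(packed, dest=up, recvbuf=recv, source=down)
halos = np.split(recv, len(fields))                  # unpack per-field boundary data
```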

  33. Future Challenges • Diagnostics and I/O • UM does huge amounts of diagnostic I/O in a typical climate run • All I/O through a single processor • Cost of gather • Non-parallel I/O • Ocean models • Only 1D decomposition, so limited scalability • T3E optimised! • Next generation UM5.x • Much more expensive • Better parallelisation for dynamics scheme
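The single-processor I/O pattern at issue looks roughly like the sketch below (field layout, reassembly and file name are simplified and illustrative): every rank's subdomain is funnelled through rank 0, which then does one serial write, so both the gather cost and the write time grow with model size.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

local = np.full((12, 9), float(rank))        # this rank's piece of a diagnostic field

pieces = comm.gather(local, root=0)          # everything funnels through one processor
if rank == 0:
    full = np.concatenate(pieces)            # reassemble (layout simplified here)
    full.tofile("diagnostic_field.dat")      # one serial write for the whole model
```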
