1 / 28

Challenges in analyzing a High-Resolution Climate Simulation

Challenges in analyzing a High-Resolution Climate Simulation. John M. Dennis, Matthew Woitaszek dennis @ucar. edu , mattheww@ucar.edu October 26, 2010. Climate Modeling?. Coupled models Atmosphere-Ocean-Sea Ice-Land Challenges of climate: Need many years --> Limits resolution -->

nickan
Download Presentation

Challenges in analyzing a High-Resolution Climate Simulation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Challenges in analyzing a High-Resolution Climate Simulation John M. Dennis, Matthew Woitaszek dennis@ucar.edu, mattheww@ucar.edu October 26, 2010 GG in Data-Intensive Discovery

  2. Climate Modeling? • Coupled models Atmosphere-Ocean-Sea Ice-Land • Challenges of climate: • Need many years --> • Limits resolution --> • Limits parallelism • Community Earth System Model (CESM) Formally: Community Climate System Model (CCSM) GG in Data-Intensive Discovery

  3. IPCC & AR5 • Intergovernmental Panel on Climate Change (IPCC) “The IPCC assesses the scientific, technical and socio-economical information relevant for the understanding of the risk of human-induced climate change” • The Fifth Assessment Report (AR5) • CESM plans • Control: 1 run x 1300 years • Historical: 40 runs x 55 years • Future: 180 runs x 95 years 20,600 years of simulation GG in Data-Intensive Discovery

  4. Would you like to Supersize that? GG in Data-Intensive Discovery

  5. Absolutely, But how will it clog up my networks, and fill up my archive? GG in Data-Intensive Discovery

  6. PetaApps project • PetaApps project: Interactive Ensemble • Kinter, Stan (COLA) • Kirtman (U of Miami) • Collins, Yelick (Berkley) • Bryan, Dennis, Loft, Vertenstein (NCAR) • Bitz (U of Washington) • Ultra-High resolution climate • Explore impact of weather noise on Climate • Explore technical/Computer Science issues • ~99,000 core Cray XT5 system at NICS [Kraken] • Large TG allocation: 35M CPU hours GG in Data-Intensive Discovery

  7. Wendell Sea:Ice thickness (1°) GG in Data-Intensive Discovery

  8. GG in Data-Intensive Discovery

  9. Impact of Supersizing 16x 100x GG in Data-Intensive Discovery

  10. Outline • Climate Modeling • Production Data challenges: AR5 & IPCC • PetaApps simulation • Generating the Data • Analyzing the Data • Conclusions GG in Data-Intensive Discovery

  11. Large scale PetaApps run 100x current production • 155 year control run • 0.1° Ocean model [ 3600 x 2400 x 42 ] • 0.1° Sea-ice model [3600 x 2400 x 20 ] • 0.5° Atmosphere [576 x 384 x 26 ] • 0.5° Land [576 x 384 • Statistics • ~18M CPU hours • 5844 cores for 4-5 months • ~100 TB of data generated • 0.5 to 1 TB per wall clock day generated 4x current production GG in Data-Intensive Discovery

  12. Large scale PetaApps run (con’t) • Work flow • Run on Kraken (NICS) • Transfer output from NICS to NCAR (100 – 180 MB/sec sustained) • Caused noticeable spikes in TG network traffic • Archive on HPSS • Data analysis using 55 TB project space at NCAR GG in Data-Intensive Discovery

  13. Issues/challenges with runs • Very large variability with I/O performance • 2-10x slowdown common • 300x slowdown was observed • Interference with other jobs? GG in Data-Intensive Discovery

  14. Outline • Climate Modeling • Production Data challenges: AR5 & IPCC • PetaApps simulation • Generating the Data • Analyzing the Data • Conclusions GG in Data-Intensive Discovery

  15. Issues with data analysis • Miss-match between structure and form of data needed for analysis • Model generated file: Timestep = n : U,V,T,PS,CLDLOW file.n.nc Timestep= n+step=m: U,V,T,PS,CLDLOW file.m.nc • Analysis: Average T (temperature) for last 40 years • Sub-setting and average operation necessary GG in Data-Intensive Discovery

  16. Issues with data analysis (con’t) • Slightly different form of input for different analysis/tool • Diversity of analysis tools • NCL (NCAR Command Language) • NCO (NetCDF operators • Feret • IDL • Matlab • Cascade of intermediate products “Beware the data cascade, forever your disk will it fill.” -Yoda- GG in Data-Intensive Discovery

  17. Impact of self describing format • All variables [~300 varaibles]for single timestep • + Matches raw output from model • - Does not match needs of analysis tools • One variable for single timestep • + Matches analysis tool needs • + Conceptually simple • - Replication of self describing metadata • - larger number of files…[~75M files] • One variable multiple timesteps • + Matches analysis tool needs • - Lose of generality GG in Data-Intensive Discovery

  18. Parallelizing Analysis • Parallelizing analysis of atmospheric data • Goal: • Speedup analysis of high-resolution data • Analysis no slower then low resolution • AMWG diagnostic package • NCO operators [calculating statistics] • NCL ploting • Parallelization using Swift [T. Sines] GG in Data-Intensive Discovery

  19. AMWG Diagnostics Climate Model (Community Earth System Model) • Improved Computing = increasing amounts of data • Data generation multi-threaded • Data analysis single-threaded GG in Data-Intensive Discovery

  20. Swift • Workflow system from U of Chicago (http://www.ci.uchicago.edu/swift/index.php) • Allows parallel execution of scripts • Relatively easy to use • Dependency driven • Variety of jobs submission options GG in Data-Intensive Discovery

  21. Issues with Swift • Full generality generates “excessive” copying of data • Climate workflow has small number of flops relative to input/output data size • Copy-in data: 15 sec • Calculate: 60 sec • Copy-out data: 15 sec • Used “un-published” option to perform direct-IO [0-copy] • Unresolved issue with deleting out of scope ‘files’ GG in Data-Intensive Discovery

  22. Swift workflow on Dash (8 cores) GG in Data-Intensive Discovery

  23. Data analysis Platforms • Twister@NCAR: • 8 core, Intel box, GPFS filesystem • Dash@SDSC: • Compute nodes with SSD disk • vSMP node with SSD + ScaleMP • GPFS-WAN • Nautilus@NICS • 1024 core SGI Ultra Violet • GPFS filesystem GG in Data-Intensive Discovery

  24. Execution time for AMWG diagnostics Nice/Interesting result Not so interesting result GG in Data-Intensive Discovery

  25. Performance Issues with AMWG diagnostic • Twister@NCAR: • No specialized hardware • Tiny system • Dash@SDSC: • vSMP system has network driver problem • 7.7x too slower compute node vsvSMP • Nautilus@NICS: • Apparently excessive Java exec overhead GG in Data-Intensive Discovery

  26. Conclusions • PetaApps project [Interactive Ensembles] • 155 year CCSM control run • 0.5° ATM & LND + 0.1° OCN ICE • 0.5 to 1 TB/day of data to analysis • Increased ability to generated data creates analysis bottleneck core • Encouraging results for parallel workflow with Swift • No transformative benefit yet for expensive hardware ! • Overnight • After coffee break • Nearly Instant • No solution yet to data volume growth GG in Data-Intensive Discovery

  27. NCAR: D. Bailey F. Bryan T. Craig B. Eaton J. Edwards [IBM] N. Hearn K. Lindsay N. Norton M. Vertenstein COLA: J. Kinter C. Stan U. Miami B. Kirtman U.C. Berkeley W. Collins K. Yelick (NERSC) U. Washington C. Bitz Grant Support: DOE DE-FC03-97ER62402 [SciDAC] DE-PS02-07ER07-06 [SciDAC] NSF Cooperative Grant NSF01 OCI-0749206 [PetaApps] CNS-0421498 CNS-0420873 CNS-0420985 Computer Allocations: TeraGrid TRAC @ NICS DOE INCITE @ NERSC LLNL Grand Challenge Thanks for Assistance: Cray, NICS, and NERSC Acknowledgements • NICS: • M. Fahey • P. Kovatch • ANL: • R. Jacob • R. Loy • LANL: • E. Hunke • P. Jones • M. Maltrud • LLNL • D. Bader • D. Ivanova • J. McClean (Scripps) • A. Mirin • ORNL: • P. Worley and many more… GG in Data-Intensive Discovery

  28. Questions dennis@ucar.edu GG in Data-Intensive Discovery

More Related