
Petascale


Presentation Transcript


  1. Petascale • LLNL Appro AMD: 9K processors [today] • TJ Watson Blue Gene/L: 40K processors [today] • NY Blue Gene/L: 32K processors • ORNL Cray XT3/4: 44K processors [Jan 2008] • TACC Sun: 55K processors [Jan 2008] • ANL Blue Gene/P: 160K processors [Jan 2008]

  2. CCSM and Component Models • POP (Ocean) • CICE (Sea Ice) • CLM (Land Model) • CPL (Coupler) • CAM (Atmosphere) • CCSM
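The slide above lists the CCSM component models and the coupler that ties them together. As a rough illustration of how such a coupled system can be laid out on a large machine, the C/MPI sketch below splits MPI_COMM_WORLD into one sub-communicator per component; the five-way round-robin assignment and the component numbering are assumptions made here for illustration, not CCSM's actual driver logic.

```c
/* Minimal sketch (not CCSM's driver code): give each component its own
 * communicator so the components can run concurrently on disjoint ranks
 * while the coupler exchanges fields between them.
 * Component ids (illustrative): 0=atm, 1=ocn, 2=ice, 3=lnd, 4=cpl */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int world_rank, world_size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Assign each rank to a component (simple round-robin, purely illustrative). */
    int component = world_rank % 5;

    /* Each component gets its own communicator for its internal communication. */
    MPI_Comm comp_comm;
    MPI_Comm_split(MPI_COMM_WORLD, component, world_rank, &comp_comm);

    int comp_rank, comp_size;
    MPI_Comm_rank(comp_comm, &comp_rank);
    MPI_Comm_size(comp_comm, &comp_size);
    printf("world rank %d -> component %d (rank %d of %d)\n",
           world_rank, component, comp_rank, comp_size);

    MPI_Comm_free(&comp_comm);
    MPI_Finalize();
    return 0;
}
```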

  3. Status of POP (John Dennis) • 17K Cray XT4 processors [12.5 years/day] • 29K IBM Blue Gene/L processors [8.5 years/day] • BG ready in expedition mode • Parallel I/O [underway] • Land causes load imbalance at 0.1 degree resolution
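The load-imbalance bullet comes from the block decomposition of the ocean grid: at 0.1 degree, many blocks contain only land and do no useful work. The C sketch below, with a made-up land mask, grid, and block size, shows how one might count land-only blocks and estimate the wasted fraction if they are not culled; it is not POP's actual decomposition code.

```c
/* Illustrative only: count blocks that contain at least one ocean point.
 * The grid dimensions, block size, and land mask are invented. */
#include <stdio.h>

#define NX 360   /* global grid columns (hypothetical) */
#define NY 240   /* global grid rows    (hypothetical) */
#define BX  40   /* block width  in grid points        */
#define BY  40   /* block height in grid points        */

/* Hypothetical mask: 1 = ocean point, 0 = land point.
 * A rectangular "continent" sits in one corner; real masks come from bathymetry. */
static int ocean_mask(int i, int j) {
    return !(i < 120 && j < 120);
}

int main(void) {
    int total_blocks = 0, active_blocks = 0;
    for (int jb = 0; jb < NY; jb += BY) {
        for (int ib = 0; ib < NX; ib += BX) {
            int has_ocean = 0;
            for (int j = jb; j < jb + BY && !has_ocean; j++)
                for (int i = ib; i < ib + BX && !has_ocean; i++)
                    if (ocean_mask(i, j)) has_ocean = 1;
            total_blocks++;
            if (has_ocean) active_blocks++;
        }
    }
    /* If land-only blocks are not culled, the fraction 1 - active/total of the
     * assigned work is wasted; culling them recovers that fraction. */
    printf("blocks: %d total, %d with ocean (%.1f%% useful)\n",
           total_blocks, active_blocks,
           100.0 * active_blocks / total_blocks);
    return 0;
}
```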

  4. Status of CAM (John Dennis) • CAM HOMME In Expedition Mode • Standard CAM “may be” run at 1 degree resolution or slightly higher on BG

  5. 1/2 1/3 1/4 Simulation rate for HOMME:Held-Suarez

  6. CAM & CCSM BG/L Expedition not from climate scientistsParallel I/O is the biggest bottleneck

  7. Cloud Resolving Models/LES • Active Tracer High-resolution Atmospheric Model (ATHAM): • modularized • parallel-ready (MPI) • Goddard Cloud Ensemble Model (GCE): • well-established (1970s to present) • parallel-ready (MPI) • scales linearly (99% efficiency up to 256 tasks) • comprehensive
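For reference, "scales linearly (99% up to 256 tasks)" is a statement about strong-scaling parallel efficiency, E(p) = T(1) / (p * T(p)). The short C sketch below shows the arithmetic with invented timings; the numbers are not measured GCE data.

```c
/* Strong-scaling efficiency from hypothetical run times. */
#include <stdio.h>

int main(void) {
    const int    p[] = { 1, 16, 64, 256 };            /* MPI task counts      */
    const double t[] = { 1024.0, 64.5, 16.2, 4.04 };  /* hypothetical seconds */
    for (int k = 0; k < 4; k++) {
        double speedup    = t[0] / t[k];
        double efficiency = speedup / p[k];           /* 1.0 = perfect scaling */
        printf("p=%4d  speedup=%7.1f  efficiency=%5.1f%%\n",
               p[k], speedup, 100.0 * efficiency);
    }
    return 0;
}
```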

  8. Implementations • Been done (NERSC IBM SP, GSFC): • ATHAM: 2D & 3D bulk cloud physics • GCE: 3D bulk cloud physics, 2D size-bin cloud physics • Being & to be done (Blue Gene): • GCE (ATHAM): 3D size-bin cloud physics, larger domain, longer simulation period, finer resolution …

  9. From: John Michalakes, NCAR

  10. Parallelism in WRF: Multi-level Decomposition • Single version of code for efficient execution on: distributed-memory, shared-memory, clusters of SMPs, vector and microprocessors • [Figure: logical domain; one patch, divided into multiple tiles; inter-processor communication] • Model domains are decomposed for parallelism on two levels • Patch: section of model domain allocated to a distributed memory node • Tile: section of a patch allocated to a shared-memory processor within a node; this is also the scope of a model layer subroutine • Distributed memory parallelism is over patches; shared memory parallelism is over tiles within patches • Slide Courtesy: NCAR
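A minimal sketch of the two-level decomposition described on this slide, assuming a hybrid MPI + OpenMP layout: one patch per MPI rank, with OpenMP threads looping over tiles inside the patch and calling a stand-in model-layer routine. The grid size, tile count, and the routine itself are illustrative, not WRF's actual data structures.

```c
/* Two-level parallelism: MPI over patches, OpenMP over tiles within a patch. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define PATCH_NY 64   /* rows in this rank's patch (hypothetical) */
#define NTILES    8   /* tiles per patch           (hypothetical) */

/* Stand-in for a model-layer subroutine: works on one tile's row range. */
static void model_layer(int rank, int tile, int j_start, int j_end) {
    /* ... physics/dynamics for rows [j_start, j_end) would go here ... */
    printf("rank %d: tile %d handles rows %d..%d on thread %d\n",
           rank, tile, j_start, j_end - 1, omp_get_thread_num());
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* one patch per MPI rank */

    /* Shared-memory parallelism: threads loop over tiles within the patch. */
    #pragma omp parallel for
    for (int tile = 0; tile < NTILES; tile++) {
        int rows_per_tile = PATCH_NY / NTILES;
        int j_start = tile * rows_per_tile;
        int j_end   = j_start + rows_per_tile;
        model_layer(rank, tile, j_start, j_end);
    }

    /* Inter-processor (halo) communication between patches would happen here
     * with MPI, outside the threaded tile loop. */
    MPI_Finalize();
    return 0;
}
```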

  11. NCAR WRF Issues With Blue Gene/L (from John Michalakes) • Relatively slow I/O • Limited memory per node • Relatively poor processor performance • “Lots of little gotchas mostly related to immaturity, especially in the programming environment.”
