
Presentation Transcript


  1. Benchmark performance on Bassi. Jonathan Carter, User Services Group Lead, jtcarter@lbl.gov. NERSC User Group Meeting, June 12, 2006

  2. Architectural Comparison

  3. NERSC 5 Application Benchmarks • CAM3 • Climate model, NCAR • GAMESS • Computational chemistry, Iowa State, Ames Lab • GTC • Fusion, PPPL • MADbench • Astrophysics (CMB analysis), LBL • MILC • QCD, multi-site collaboration • PARATEC • Materials science, developed at LBL and UC Berkeley • PMEMD • Computational chemistry, University of North Carolina-Chapel Hill

  4. Application Summary

  5. CAM3 • Community Atmospheric Model version 3 • Developed at NCAR with substantial DOE input, both scientific and software. • The atmosphere model for CCSM, the coupled climate system model. • Also the most time-consuming part of CCSM. • Widely used by both American and international scientists for climate research. • For example, carbon and bio-geochemistry models are built upon (integrated with) CAM3. • IPCC predictions use CAM3 (in part) • About 230,000 lines of code in Fortran 90. • 1D decomposition runs on up to 128 processors at T85 resolution (150 km) • 2D decomposition runs on up to 1680 processors at 0.5 degree (60 km) resolution (see the sketch below).
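As a rough illustration of the decomposition limits mentioned above, here is a minimal C sketch. It is not CAM3's actual code; the grid dimensions, process-grid shape, and rank number are made up. It only shows why a 1D latitude decomposition caps the usable rank count at the number of latitudes, while a 2D decomposition raises that cap.

```c
/* Minimal sketch (not CAM3's actual code): why a 2D domain decomposition
 * admits more MPI ranks than a 1D one on a fixed latitude-longitude grid.
 * Grid and process-grid sizes are illustrative only. */
#include <stdio.h>

int main(void) {
    int nlat = 128, nlon = 256;   /* illustrative grid, roughly T85-sized */

    /* 1D decomposition: each rank owns a band of whole latitudes,
     * so the usable rank count cannot exceed nlat. */
    int max_ranks_1d = nlat;

    /* 2D decomposition: ranks form a Plat x Plon process grid, so the
     * limit grows toward nlat * nlon (one grid column per rank). */
    int max_ranks_2d = nlat * nlon;
    printf("1D limit: %d ranks, 2D limit: %d ranks\n", max_ranks_1d, max_ranks_2d);

    /* Block ownership for one rank on a Plat x Plon process grid. */
    int Plat = 16, Plon = 8, rank = 37;
    int blat = (nlat + Plat - 1) / Plat;   /* latitudes per block  */
    int blon = (nlon + Plon - 1) / Plon;   /* longitudes per block */
    int lat0 = (rank / Plon) * blat;
    int lon0 = (rank % Plon) * blon;
    printf("rank %d owns lats [%d,%d) x lons [%d,%d)\n",
           rank, lat0, lat0 + blat, lon0, lon0 + blon);
    return 0;
}
```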

  6. CAM3: Performance

  7. GAMESS • Computational chemistry application • Variety of electronic structure algorithms available • About 550,000 lines of Fortran 90 • Communication layer makes use of highly optimized vendor libraries • Many methods available within the code • Benchmarks are DFT energy and gradient calculations and MP2 energy and gradient calculations • Many computational chemistry studies rely on these techniques • Exactly the same as the DOD HPCMP TI-06 GAMESS benchmark • Vendors will only have to do the work once

  8. GAMESS: Performance • Small case: large, messy, low-computational-intensity kernels are problematic for compilers • Large case depends on asynchronous messaging

  9. GTC • Gyrokinetic Toroidal Code • Important code for the Fusion SciDAC project and for the international fusion collaboration ITER. • Transport of thermal energy via plasma microturbulence using the particle-in-cell (PIC) approach (see the sketch below). [Figure: 3D visualization of electrostatic potential in a magnetic fusion device]
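To make the PIC access pattern concrete, the following is a minimal 1D particle-in-cell sketch in C, not GTC's gyrokinetic implementation; the array names, grid size, particle count, and field are illustrative. It shows the scatter/gather through particle-dependent grid indices that gives this class of code the low computational intensity and irregular data access discussed on the next slide.

```c
/* Minimal 1D particle-in-cell (PIC) sketch, not GTC's gyrokinetic code:
 * charge deposition (scatter) and field interpolation (gather) go through
 * particle-dependent grid indices, so the kernel is dominated by irregular
 * memory access with few flops per load. All names and sizes are illustrative. */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define NP 100000   /* particles */
#define NG 1024     /* grid points */

int main(void) {
    static double x[NP], v[NP], efield[NG], rho[NG];
    const double dx = 1.0 / NG, dt = 1e-3, pi = 3.14159265358979323846;

    for (int i = 0; i < NP; i++) x[i] = (double)rand() / ((double)RAND_MAX + 1.0);
    for (int g = 0; g < NG; g++) efield[g] = sin(2.0 * pi * g * dx);

    /* Scatter: deposit each particle's charge on its two nearest grid points.
     * The grid index depends on particle data, so the stores are indirect. */
    for (int i = 0; i < NP; i++) {
        int ig = (int)(x[i] / dx);
        double w = x[i] / dx - ig;
        rho[ig]            += 1.0 - w;
        rho[(ig + 1) % NG] += w;
    }

    /* Gather: interpolate the field back to each particle and push it. */
    for (int i = 0; i < NP; i++) {
        int ig = (int)(x[i] / dx);
        double w = x[i] / dx - ig;
        double e = (1.0 - w) * efield[ig] + w * efield[(ig + 1) % NG];
        v[i] += e * dt;
        x[i]  = fmod(x[i] + v[i] * dt + 1.0, 1.0);   /* periodic domain */
    }

    printf("checksum: %g %g\n", rho[0], v[0]);
    return 0;
}
```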

  10. GTC: Performance • SX8 highest raw performance (ever) but lower efficiency than ES • Scalar architectures suffer from low computational intensity, irregular data access, and register spilling • Opteron/IB is 50% faster than Itanium2/Quadrics and only 1/2 the speed of the X1 • Opteron: on-chip memory controller and caching of FP data in L1 • X1 suffers from overhead of scalar code portions

  11. MADbench • Cosmic microwave background radiation analysis tool (MADCAP) • Used a large amount of time in FY04 and is one of the highest-scaling codes at NERSC • MADbench is a benchmark version of the original code • Designed to be easily run with synthetic data for portability • Used in a recent study in conjunction with the Berkeley Institute for Performance Studies (BIPS) • Written in C, making extensive use of the ScaLAPACK libraries • Has extensive I/O requirements

  12. MADbench: Performance • Dominated by • BLAS3 (see the sketch below) • I/O
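To illustrate why a BLAS3-dominated code performs well across architectures, here is a hedged serial sketch built around a CBLAS dgemm call; the real MADbench calls the distributed ScaLAPACK routines, and the matrix size, header name, and link flags here are assumptions about the local BLAS installation.

```c
/* Hedged serial sketch of a level-3 BLAS operation of the kind that dominates
 * MADbench's runtime; the real code calls the distributed ScaLAPACK routines.
 * Matrix size is illustrative; the cblas.h header and link flags depend on
 * which BLAS implementation is installed. */
#include <stdio.h>
#include <stdlib.h>
#include <cblas.h>

int main(void) {
    int n = 2048;                                  /* illustrative dimension */
    double *A = malloc((size_t)n * n * sizeof *A);
    double *B = malloc((size_t)n * n * sizeof *B);
    double *C = malloc((size_t)n * n * sizeof *C);
    for (long i = 0; i < (long)n * n; i++) { A[i] = 1e-6 * i; B[i] = 1.0; C[i] = 0.0; }

    /* C = A*B: O(n^3) flops over O(n^2) data, which is why BLAS3-heavy codes
     * run at a high fraction of peak on most architectures. */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, A, n, B, n, 0.0, C, n);

    printf("checksum: %g\n", C[0] + C[(long)n * n - 1]);
    free(A); free(B); free(C);
    return 0;
}
```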

  13. MILC • Quantum chromodynamics (QCD) application • Widespread community use, large allocation • Easy to build, no dependencies, standards conforming • Can be set up to run at a wide range of concurrencies • Conjugate gradient algorithm • Physics on a 4D lattice • Local computations are 3x3 complex matrix multiplies, with a sparse (indirect) access pattern (see the sketch below)
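A minimal C sketch of the local kernel described above: 3x3 complex matrix-vector multiplies applied across lattice sites through a neighbor index table. The struct and function names, lattice size, and neighbor table are illustrative stand-ins, not MILC's actual data layout or 4D gather pattern.

```c
/* Minimal C sketch of a MILC-style local kernel: 3x3 complex matrix-vector
 * multiplies applied across lattice sites through a neighbor index table.
 * Types, names, lattice size, and the neighbor table are illustrative,
 * not MILC's actual data layout. */
#include <complex.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { double complex e[3][3]; } su3_matrix;
typedef struct { double complex c[3]; }    su3_vector;

/* dest = m * src: nine complex multiply-adds per site (low arithmetic intensity) */
static void mult_su3_mat_vec(const su3_matrix *m, const su3_vector *src,
                             su3_vector *dest) {
    for (int i = 0; i < 3; i++) {
        double complex t = 0.0;
        for (int j = 0; j < 3; j++)
            t += m->e[i][j] * src->c[j];
        dest->c[i] = t;
    }
}

int main(void) {
    int nsites = 16 * 16 * 16 * 16;                 /* toy 16^4 lattice */
    su3_matrix *links    = calloc(nsites, sizeof *links);
    su3_vector *src      = calloc(nsites, sizeof *src);
    su3_vector *dst      = calloc(nsites, sizeof *dst);
    int        *neighbor = malloc(nsites * sizeof *neighbor);

    /* Toy neighbor table standing in for the real 4D gather pattern. */
    for (int s = 0; s < nsites; s++) neighbor[s] = (s + 17) % nsites;

    /* The sparse (indirect) access: src is read through the neighbor table. */
    for (int s = 0; s < nsites; s++)
        mult_su3_mat_vec(&links[s], &src[neighbor[s]], &dst[s]);

    printf("checksum: %g\n", creal(dst[0].c[0]));
    free(links); free(src); free(dst); free(neighbor);
    return 0;
}
```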

  14. MILC: Performance

  15. PARATEC • Parallel Total Energy Code • Plane-wave DFT using a custom 3D FFT (see the sketch below) • 70% of materials science computation at NERSC is done via plane-wave DFT codes. PARATEC captures the performance characteristics of a wide range of codes (VASP, CPMD, PETOT).
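To show the kind of kernel meant by "plane-wave DFT using a custom 3D FFT", here is a hedged serial sketch using FFTW's 3D complex transform; PARATEC's own FFT is a custom parallel implementation, and the grid size and initialization here are illustrative.

```c
/* Hedged sketch of the 3D FFT kernel at the heart of plane-wave DFT codes,
 * written here as a single serial FFTW transform; PARATEC's own 3D FFT is a
 * custom parallel implementation. Grid size and initialization are illustrative.
 * Link with -lfftw3. */
#include <stdio.h>
#include <fftw3.h>

int main(void) {
    int n = 64;                                       /* illustrative FFT grid */
    fftw_complex *psi = fftw_malloc(sizeof(fftw_complex) * n * n * n);

    /* Toy wavefunction: a single plane-wave coefficient in reciprocal space. */
    for (int i = 0; i < n * n * n; i++) { psi[i][0] = (i == 1); psi[i][1] = 0.0; }

    /* Transform to real space; plane-wave codes do this (and the inverse)
     * for every band in every SCF iteration, so FFT plus BLAS3 work
     * dominates the runtime. */
    fftw_plan plan = fftw_plan_dft_3d(n, n, n, psi, psi, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(plan);

    printf("psi[0] after transform: %g\n", psi[0][0]);
    fftw_destroy_plan(plan);
    fftw_free(psi);
    return 0;
}
```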

  16. PARATEC: Performance • All architectures generally perform well due to the computational intensity of the code (BLAS3, FFT) • SX8 achieves highest per-processor performance • X1/X1E shows lowest % of peak • Non-vectorizable code much more expensive on X1/X1E (32:1) • Lower bisection bandwidth to computation ratio (4D hypercube) • X1 performance is comparable to Itanium2 • Itanium2 outperforms Opteron because • PARATEC less sensitive to memory access issues (BLAS3) • Opteron lacks FMA unit • Quadrics shows better scaling of all-to-all at large concurrencies

  17. PMEMD • Particle Mesh Ewald Molecular Dynamics • An F90 code with advanced MPI coding; should test the compiler and stress asynchronous point-to-point messaging (see the sketch below) • PMEMD is very similar to the MD engine in AMBER 8.0, used in both chemistry and the biosciences • Test system is a 91K-atom blood coagulation protein
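A minimal MPI sketch of the asynchronous point-to-point pattern the slide refers to: post non-blocking receives and sends, overlap them with local work, then wait. The ring exchange, message size, and buffer contents are illustrative only, not PMEMD's actual communication scheme.

```c
/* Hedged sketch of asynchronous point-to-point messaging: post non-blocking
 * receives and sends, overlap them with local work, then wait. The ring
 * exchange and buffer contents are illustrative, not PMEMD's actual scheme.
 * Compile with an MPI compiler wrapper (e.g. mpicc). */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int n = 100000;                               /* illustrative message size */
    double *sendbuf = malloc(n * sizeof *sendbuf);
    double *recvbuf = malloc(n * sizeof *recvbuf);
    for (int i = 0; i < n; i++) sendbuf[i] = rank;

    int right = (rank + 1) % nprocs;
    int left  = (rank - 1 + nprocs) % nprocs;
    MPI_Request req[2];

    /* Post the receive first, then the send, so the MPI library can progress
     * both while this rank does its local (force-computation-like) work. */
    MPI_Irecv(recvbuf, n, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
    MPI_Isend(sendbuf, n, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[1]);

    double local = 0.0;                           /* stand-in for local work */
    for (int i = 0; i < n; i++) local += sendbuf[i] * 1e-6;

    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

    free(sendbuf); free(recvbuf);
    MPI_Finalize();
    return 0;
}
```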

  18. PMEMD: Performance

  19. Summary

  20. Summary

  21. Summary • Average Bassi-to-Seaborg performance ratio is 6.0 for the N5 application benchmarks
