
Terascaling Applications on HPCx: The First 12 Months

Presentation Transcript


  1. Terascaling Applications on HPCx: The First 12 Months
  Mike Ashworth, HPCx Terascaling Team
  HPCx Service, CCLRC Daresbury Laboratory, UK
  m.ashworth@dl.ac.uk
  http://www.hpcx.ac.uk/

  2. Outline
  • Terascaling Objectives
  • Case Studies: DL_POLY, CRYSTAL, CASTEP, AMBER, PFARM, PCHAN, POLCOMS
  • Efficiency of Codes
  • Summary
  Application-driven, not hardware-driven

  3. Terascaling Objectives

  4. Terascaling Objectives
  • The primary aim of the HPCx service is Capability Computing: jobs which use >= 50% of the CPUs
  • A key objective is that user codes should scale to O(1000) CPUs
  • The largest part of our science support is the Terascaling Team:
    • understanding the performance and scaling of key codes
    • enabling world-leading calculations (demonstrators)
  • Closely linked with the Software Engineering Team and the Applications Support Team

  5. Strategy for Capability Computing
  • Performance attributes of key applications: trouble-shooting with Vampir and Paraver
  • Scalability of numerical algorithms: parallel eigensolvers, FFTs, etc.
  • Optimisation of communication collectives, e.g. MPI_ALLTOALLV in CASTEP
  • New techniques: mixed-mode programming (see the sketch after this list)
  • Memory-driven approaches, e.g. “in-core” SCF and DFT, and direct minimisation in CRYSTAL
  • Migration from replicated to distributed data, e.g. DL_POLY3
  • Scientific drivers amenable to Capability Computing: enhanced sampling methods, replica methods
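On HPCx, mixed-mode is taken here to mean MPI between p690 partitions combined with OpenMP threads inside each process. The sketch below is a minimal, generic illustration of that pattern written for this transcript, not code from any of the applications above; the loop, the work division and the requested thread level are assumptions.

```c
/* Minimal mixed-mode (MPI + OpenMP) sketch: MPI processes across
 * partitions, OpenMP threads within each process.  Illustrative
 * only -- not taken from any HPCx application. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;

    /* Ask for a threading level that allows OpenMP inside MPI ranks */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const int n = 1000000;
    double local_sum = 0.0, global_sum = 0.0;

    /* Threads share this rank's portion of a global sum */
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = rank; i < n; i += nranks)
        local_sum += 1.0 / (double)(i + 1);

    /* MPI is called only from the main thread (MPI_THREAD_FUNNELED) */
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %f  (%d MPI ranks x %d OpenMP threads)\n",
               global_sum, nranks, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}
```

Running fewer, larger MPI processes with threads inside each one reduces the number of ranks the collectives and the switch have to serve, which is the usual motivation for trying mixed-mode on this class of system.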

  6. Case Studies

  7. Molecular Simulation: DL_POLY
  W. Smith and T.R. Forester, CLRC Daresbury Laboratory
  • General-purpose molecular dynamics simulation package
  http://www.cse.clrc.ac.uk/msi/software/DL_POLY/

  8. DL_POLY3 Coulomb Energy Performance
  • Distributed data
  • SPME, with revised FFT scheme
  [Chart: performance relative to the Cray T3E/1200E vs. number of CPUs; DL_POLY3, 216,000 ions, 200 time steps, cutoff = 12 Å]

  9. DL_POLY3 Macromolecular Simulations
  Gramicidin in water; rigid bonds + SHAKE: 792,960 ions, 50 time steps
  [Charts: measured time (seconds) and performance relative to the SGI Origin 3800/R14k-500 vs. number of CPUs]

  10. Materials Science: CRYSTAL
  • Calculates wave-functions and properties of crystalline systems
  • Periodic Hartree-Fock or density-functional Kohn-Sham Hamiltonian
  • Various hybrid approximations
  http://www.cse.clrc.ac.uk/cmg/CRYSTAL/

  11. CRYSTAL
  • Electronic structure and related properties of periodic systems
  • All-electron, local Gaussian basis set, DFT and Hartree-Fock
  • Under continuous development since 1974
  • Distributed to over 500 sites worldwide
  • Developed jointly by Daresbury and the University of Turin

  12. CRYSTAL Functionality
  • Basis set: LCAO (Gaussians); all-electron or pseudopotential
  • Hamiltonian: Hartree-Fock (UHF, RHF); DFT (LSDA, GGA); hybrid functionals (B3LYP)
  • Techniques: replicated-data parallel; distributed-data parallel; forces; structural optimization; direct SCF; visualisation via AVS GUI (DLV)
  • Properties: energy; structure; vibrations (phonons); elastic tensor; ferroelectric polarisation; piezoelectric constants; X-ray structure factors; density of states / bands; charge/spin densities; magnetic coupling; electrostatics (V, E, classical EFG); Fermi contact (NMR); EMD (Compton, e-2e)

  13. Benchmark Runs on Crambin
  • Very small protein from Crambe abyssinica: 1284 atoms per unit cell
  • Initial studies used STO-3G (3948 basis functions)
  • Improved to 6-31G** (12354 basis functions)
  • All calculations Hartree-Fock
  • As far as we know, the largest Hartree-Fock calculation ever converged

  14. Scalability of CRYSTAL for crystalline Crambin
  • HPCx vs. SGI Origin
  • A faster, more stable version of the parallel Jacobi diagonalizer replaces ScaLAPACK
  • Increasing the basis set size increases the scalability

  15. Crambin Results – Electrostatic Potential
  • Charge density isosurface coloured according to potential
  • Useful to determine possible chemically active groups

  16. Futures – Rusticyanin
  • Rusticyanin (from Thiobacillus ferrooxidans) has 6284 atoms (Crambin had 1284) and is involved in redox processes
  • We have just started calculations using over 33000 basis functions
  • In collaboration with S. Hasnain (DL), we want to calculate redox potentials for rusticyanin and associated mutants

  17. Materials Science: CASTEP
  CAmbridge Serial Total Energy Package
  http://www.cse.clrc.ac.uk/cmg/NETWORKS/UKCP/

  18. What is CASTEP?
  • First-principles (DFT) materials simulation code:
    • electronic energy
    • geometry optimization
    • surface interactions
    • vibrational spectra
    • materials under pressure, chemical reactions
    • molecular dynamics
  • Method (direct minimization):
    • plane-wave expansion of valence electrons
    • pseudopotentials for core electrons

  19. CASTEP 2003 HPCx performance gain
  Bottleneck:
  • Data traffic in the 3D FFT and MPI_Alltoallv (see the transpose sketch below)
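The 3D FFT bottleneck arises because a plane-wave code must transpose its distributed grid between the 1D FFT stages, which is an all-to-all exchange. The sketch below shows the bare MPI_Alltoallv pattern only; the even decomposition and the counts are simplifying assumptions and do not reproduce CASTEP's actual scheme.

```c
/* Sketch of the all-to-all transpose at the heart of a distributed
 * 3D FFT.  The even decomposition below is a simplifying assumption;
 * real plane-wave codes compute irregular counts, which is why
 * MPI_Alltoallv rather than MPI_Alltoall is needed. */
#include <mpi.h>
#include <stdlib.h>

/* Exchange a slab-decomposed array so that each rank receives the
 * pencils it needs for the next 1D FFT direction. */
void fft_transpose(double *sendbuf, double *recvbuf,
                   int local_n, MPI_Comm comm)
{
    int nproc;
    MPI_Comm_size(comm, &nproc);

    int *sendcounts = malloc(nproc * sizeof(int));
    int *recvcounts = malloc(nproc * sizeof(int));
    int *sdispls    = malloc(nproc * sizeof(int));
    int *rdispls    = malloc(nproc * sizeof(int));

    /* Assumed even split: every rank exchanges equal-sized blocks */
    for (int p = 0; p < nproc; p++) {
        sendcounts[p] = recvcounts[p] = local_n / nproc;
        sdispls[p]    = rdispls[p]    = p * (local_n / nproc);
    }

    MPI_Alltoallv(sendbuf, sendcounts, sdispls, MPI_DOUBLE,
                  recvbuf, recvcounts, rdispls, MPI_DOUBLE, comm);

    free(sendcounts); free(recvcounts); free(sdispls); free(rdispls);
}
```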

  20. CASTEP 2003 HPCx performance gain

  21. Molecular Simulation: AMBER
  AMBER (Assisted Model Building with Energy Refinement), Weiner and Kollman, University of California, 1981
  • Widely used suite of programs, particularly for biomolecules
  http://amber.scripps.edu/

  22. AMBER – Initial Scaling
  • Factor IX protein with Ca++ ions: 90906 atoms

  23. Current developments – AMBER
  • Bob Duke has developed a new version of Sander on HPCx:
    • originally called AMD (Amber Molecular Dynamics)
    • renamed PMEMD (Particle Mesh Ewald Molecular Dynamics)
  • Substantial rewrite of the code: converted to Fortran 90, removed multiple copies of routines, …
  • Likely to be incorporated into AMBER 8
  • We are looking at optimising the collective communications – the reduction/scatter (see the sketch below)
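A reduction/scatter of force contributions can be expressed as a single MPI_Reduce_scatter call rather than an all-reduce followed by discarding most of the result. The sketch below is a generic illustration under an assumed even distribution of atoms; it is not PMEMD's implementation.

```c
/* Generic reduction/scatter of force contributions: every rank sums
 * a full-length force array across all ranks but keeps only the
 * segment for its own atoms.  The even atom split is an assumption;
 * this is not PMEMD's implementation. */
#include <mpi.h>
#include <stdlib.h>

void reduce_scatter_forces(double *partial_forces, double *my_forces,
                           int natoms, MPI_Comm comm)
{
    int nproc;
    MPI_Comm_size(comm, &nproc);

    /* Each rank receives the reduced block for its own atoms
     * (3 components per atom, assumed even division) */
    int *recvcounts = malloc(nproc * sizeof(int));
    for (int p = 0; p < nproc; p++)
        recvcounts[p] = 3 * (natoms / nproc);

    /* Equivalent to an MPI_Allreduce followed by taking one's own
     * slice, but moves far less data */
    MPI_Reduce_scatter(partial_forces, my_forces, recvcounts,
                       MPI_DOUBLE, MPI_SUM, comm);

    free(recvcounts);
}
```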

  24. Optimisation – PMEMD

  25. Atomic and Molecular Physics: PFARM
  Queen’s University Belfast, CLRC Daresbury Laboratory
  • R-matrix formalism for applications such as describing the edge region of Tokamak plasmas (fusion power research) and interpreting astrophysical spectra

  26. PeIGS vs. ScaLAPACK in PFARM
  Bottleneck: matrix diagonalisation

  27. ScaLAPACK diagonalisation on HPCx
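For reference, the sketch below shows the shape of a ScaLAPACK symmetric eigensolve (pdsyev) driven from C, the kind of distributed diagonalisation benchmarked here. The process grid, block size, matrix size and workspace handling are illustrative assumptions, and the extern prototypes follow the reference implementation; PFARM's own driver and the PeIGS/Jacobi alternatives differ in detail.

```c
/* Minimal sketch of a distributed symmetric eigensolve with
 * ScaLAPACK's pdsyev.  Grid shape, block size and the (unfilled)
 * test matrix are assumptions for illustration only. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Fortran BLACS/ScaLAPACK entry points (reference-implementation names) */
extern void Cblacs_pinfo(int *mypnum, int *nprocs);
extern void Cblacs_get(int icontxt, int what, int *val);
extern void Cblacs_gridinit(int *icontxt, char *order, int nprow, int npcol);
extern void Cblacs_gridinfo(int icontxt, int *nprow, int *npcol, int *myrow, int *mycol);
extern void Cblacs_gridexit(int icontxt);
extern int  numroc_(int *n, int *nb, int *iproc, int *isrcproc, int *nprocs);
extern void descinit_(int *desc, int *m, int *n, int *mb, int *nb, int *irsrc,
                      int *icsrc, int *ictxt, int *lld, int *info);
extern void pdsyev_(char *jobz, char *uplo, int *n, double *a, int *ia, int *ja,
                    int *desca, double *w, double *z, int *iz, int *jz,
                    int *descz, double *work, int *lwork, int *info);

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int me, nprocs, ctxt, nprow, npcol, myrow, mycol;
    Cblacs_pinfo(&me, &nprocs);
    nprow = 1; npcol = nprocs;               /* simple 1 x P process grid */
    Cblacs_get(-1, 0, &ctxt);
    Cblacs_gridinit(&ctxt, "Row", nprow, npcol);
    Cblacs_gridinfo(ctxt, &nprow, &npcol, &myrow, &mycol);

    int n = 1000, nb = 64, izero = 0, ione = 1, info;
    int mloc = numroc_(&n, &nb, &myrow, &izero, &nprow);
    int nloc = numroc_(&n, &nb, &mycol, &izero, &npcol);
    int lld  = mloc > 1 ? mloc : 1;

    int desc[9];
    descinit_(desc, &n, &n, &nb, &nb, &izero, &izero, &ctxt, &lld, &info);

    double *a = calloc((size_t)mloc * nloc, sizeof(double));
    double *z = calloc((size_t)mloc * nloc, sizeof(double));
    double *w = calloc(n, sizeof(double));
    /* ... fill the local part of the distributed Hamiltonian A here ... */

    /* Workspace query (lwork = -1), then the actual eigensolve */
    double wkopt; int lwork = -1;
    pdsyev_("V", "U", &n, a, &ione, &ione, desc, w, z, &ione, &ione, desc,
            &wkopt, &lwork, &info);
    lwork = (int)wkopt;
    double *work = malloc((size_t)lwork * sizeof(double));
    pdsyev_("V", "U", &n, a, &ione, &ione, desc, w, z, &ione, &ione, desc,
            work, &lwork, &info);

    if (me == 0)
        printf("pdsyev info = %d, lowest eigenvalue = %g\n", info, w[0]);

    free(a); free(z); free(w); free(work);
    Cblacs_gridexit(ctxt);
    MPI_Finalize();
    return 0;
}
```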

  28. Stage 1 (Sector Diagonalisations) on HPCx
  • Sector Hamiltonian matrix size 10032 (× 3 sectors)

  29. Computational Engineering: UK Turbulence Consortium
  Led by Prof. Neil Sandham, University of Southampton
  • Focus on compute-intensive methods (Direct Numerical Simulation, Large Eddy Simulation, etc.) for the simulation of turbulent flows
  • Shock/boundary-layer interaction modelling: critical for accurate aerodynamic design but still poorly understood
  http://www.afm.ses.soton.ac.uk/

  30. Direct Numerical Simulation: 360³ benchmark
  [Chart: performance (million iteration points/sec) vs. number of processors (0–1024) for IBM Regatta (ORNL), Cray T3E/1200E and IBM Regatta (HPCx); scaled from 128 CPUs]

  31. Environmental Science: POLCOMS
  Proudman Oceanographic Laboratory Coastal Ocean Modelling System (POLCOMS)
  • Coupled marine ecosystem modelling
  http://www.pol.ac.uk/home/research/polcoms/

  32. Coupled Marine Ecosystem Model
  [Diagram: physical model coupled to pelagic ecosystem and benthic models; forcings include irradiation, heat flux, cloud cover, wind stress, river inputs and the open boundary; exchanges include temperature (°C) and C, N, P, Si with the sediments]

  33. POLCOMS resolution benchmark: HPCx

  34. POLCOMS 2 km benchmark: all systems

  35. Efficiency of Codes

  36. Motivation and Strategy
  • Scalability of Terascale applications is only half the story
  • Absolute performance also depends on single-CPU performance
  • Percentage of peak is seen as an important measure
  • Comparison with other systems, e.g. vector machines
  • Run representative test cases on small numbers of processors for applications and some important kernels
  • Use IBM’s hpmlib to measure Mflop/s (see the instrumentation sketch below)
  • Other hpmlib counters can help to understand performance, e.g. memory bandwidth, cache-miss rates, FMA count, computational intensity, etc.
  Scientific output is the key measure
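As an illustration of the hpmlib approach, a kernel can be bracketed with hardware-counter calls. The routine names below follow IBM's HPM Toolkit C interface (hpmInit/hpmStart/hpmStop/hpmTerminate) as commonly documented; treat the header name and linking details as assumptions to be checked against the installed toolkit.

```c
/* Sketch of hardware-counter instrumentation with IBM's HPM Toolkit
 * (libhpm).  Routine names follow the commonly documented C
 * interface; header name and link flags are assumptions to verify
 * against the installed version (e.g. -lhpm). */
#include <stdio.h>
#include "libhpm.h"   /* assumed HPM Toolkit header */

#define N (1 << 20)

int main(void)
{
    static double x[N], y[N];
    double alpha = 3.14159;

    hpmInit(0, "daxpy_test");       /* task id, program label */

    hpmStart(1, "daxpy loop");      /* section id, section label */
    for (int i = 0; i < N; i++)
        y[i] += alpha * x[i];       /* 2 flops per iteration */
    hpmStop(1);

    hpmTerminate(0);                /* writes the counter report */
    printf("y[0] = %f\n", y[0]);
    return 0;
}
```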

  37. Matrix-matrix multiply kernel
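The kernel on this slide can be reproduced in miniature with a triple-loop matrix multiply and a hand flop count of 2N³. The code below times such a kernel and reports percentage of peak against 5.2 Gflop/s, i.e. a 1.3 GHz POWER4 processor issuing two fused multiply-adds (4 flops) per cycle; both the kernel and the peak figure are assumptions for illustration, not the benchmark actually used.

```c
/* Simple matrix-matrix multiply kernel with a hand flop count and a
 * percentage-of-peak estimate.  The 5.2 Gflop/s peak assumes a
 * 1.3 GHz POWER4 core at 4 flops/cycle; adjust for other hardware. */
#include <stdio.h>
#include <time.h>

#define N 512
static double a[N][N], b[N][N], c[N][N];

int main(void)
{
    /* Initialise with something non-trivial */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = 1.0 / (i + j + 1);
            b[i][j] = (double)(i - j);
        }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                c[i][j] += a[i][k] * b[k][j];

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);

    double flops  = 2.0 * N * (double)N * N;   /* one mul + one add per inner iteration */
    double mflops = flops / secs / 1e6;
    double peak_mflops = 5200.0;               /* assumed 1.3 GHz POWER4 peak */

    printf("%.1f Mflop/s  (%.1f%% of assumed peak)\n",
           mflops, 100.0 * mflops / peak_mflops);
    return 0;
}
```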

  38. PCHAN small test case: 120³

  39. Summary of percentage of peak

  40. Acknowledgements
  • HPCx Terascaling Team: Mike Ashworth, Mark Bull, Ian Bush, Martyn Guest, Joachim Hein, David Henty, Adrian Jackson, Chris Johnson, Martin Plummer, Gavin Pringle, Lorna Smith, Kevin Stratford, Andrew Sunderland
  • IBM Technical Support: Luigi Brochard et al.
  • CSAR Computing Service: Cray T3E ‘turing’, Origin 3800 R12k-400 ‘green’
  • ORNL: IBM Regatta ‘cheetah’
  • SARA: Origin 3800 R14k-500
  • PSC: AlphaServer SC ES45-1000

  41. The Reality of Capability Computing on HPCx
  • The success of the Terascaling strategy is shown by the November 2003 HPCx usage
  • Capability jobs (512+ processors) account for 48% of usage
  • Even without TeraGyroid it is 40.7%

  42. Summary
  • The HPCx Terascaling Team is addressing scalability for a wide range of codes
  • Key strategic application areas: Atomic and Molecular Physics, Molecular Simulation, Materials Science, Computational Engineering, Environmental Science
  • Reflected by the take-up of Capability Computing on HPCx: in Nov ’03, more than 40% of time was used by jobs with 512 or more processors
  • Key challenges:
    • maintain progress with Terascaling
    • include new applications and new science areas
    • address efficiency issues, especially single-processor performance
    • fully exploit the phase 2 system: 1.7 GHz p690+, 32-processor partitions, Federation interconnect
