NERSC User Group Meeting
The DOE ACTS Collection
Osni Marques, Lawrence Berkeley National Laboratory (OAMarques@lbl.gov)
What is the ACTS Collection? http://acts.nersc.gov
Advanced CompuTational Software Collection: tools for developing parallel applications
Workshops and Training
The development of complex simulation codes on high-end computers is not a trivial task
Time to the first solution (prototype)
Time to solution (production)
Increasingly sophisticated models
Increasingly complex algorithms
Increasingly complex architectures
Increasingly demanding applications
Libraries written in different languages
Discussions about standardizing interfaces are often sidetracked into implementation issues
Difficulties managing multiple libraries developed by third-parties
Need to use more than one language in one application
The code is long-lived and different pieces evolve at different rates
Swapping competing implementations of the same idea and testing without modifying the code
Need to compose an application with others that were not originally designed to be combined
Challenges in the Development of Scientific Codes
To be installed
* Also in LibSci
Linear System Interfaces
! Set up a BLACS process grid and solve a distributed linear system
CALL BLACS_GET( -1, 0, ICTXT )                             ! get a default system context
CALL BLACS_GRIDINIT( ICTXT, 'Row-major', NPROW, NPCOL )    ! create an NPROW x NPCOL process grid
CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )   ! query my coordinates in the grid
CALL PDGESV( N, NRHS, A, IA, JA, DESCA, IPIV, B, IB, JB, DESCB, INFO )  ! solve A*X = B in parallel
3D incompressible Euler, tetrahedral grid, up to 11 million unknowns, based on a legacy NASA code, FUN3D (W. K. Anderson); fully implicit, steady-state, parallelized with PETSc (courtesy of Kaushik and Keyes).
Model of the heart mechanics (blood-muscle-valve) by an adaptive and parallel version of the immersed boundary method, using PETSc, Hypre and SAMRAI (courtesy of Boyce Griffith, New York University).
Micro-FE bone modeling using ParMetis, Prometheus and PETSc; models up to 537 million DOF (Adams, Bayraktar, Keaveny, and Papadopoulos).
Molecular dynamics and thermal flow simulations using codes based on Global Arrays (GA). GA has been employed in large simulation codes such as NWChem, GAMESS-UK, Columbus, Molpro, Molcas, MWPhys/Grid, etc.
Electronic structure optimization of (UO2)3(CO3)6 performed with TAO (courtesy of deJong).
Problems (different grid types) solved with Hypre.
Two ScaLAPACK routines, PZGETRF and PZGETRS, are used to solve the linear systems in the spectral-algorithm-based AORSA code (Batchelor et al.), which is intended for the study of electromagnetic wave-plasma interactions. The code reaches 68% of peak performance on 1936 processors of an IBM SP.
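The factor-then-solve pattern behind PZGETRF (LU factorization) and PZGETRS (triangular solves) can be illustrated sequentially. The following is a hypothetical pure-Python sketch, not the ScaLAPACK API: it omits the pivoting and the 2D block-cyclic data distribution the real parallel routines use, and `lu_factor`/`lu_solve` are illustrative names.

```python
def lu_factor(A):
    """Doolittle LU factorization without pivoting (assumes no zero pivots),
    the sequential analogue of what PZGETRF computes in parallel."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [row[:] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]      # elimination multiplier
            L[i][k] = m
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
    return L, U

def lu_solve(L, U, b):
    """Forward substitution (L*y = b), then back substitution (U*x = y),
    the sequential analogue of PZGETRS."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

# Solve [[4, 3], [6, 3]] x = [10, 12]; exact solution is x = [1, 2]
L, U = lu_factor([[4.0, 3.0], [6.0, 3.0]])
x = lu_solve(L, U, [10.0, 12.0])
```

The payoff of this split is that one expensive factorization can be reused for many right-hand sides, which is why the two steps are separate routines.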
Induced current (white arrows) and charge density (colored plane and gray surface) in crystallized glycine due to an external field (Louie, Yoon, Pfrommer and Canning), eigenvalue problems solved with ScaLAPACK.
Omega3P is a parallel distributed-memory code intended for the modeling and analysis of accelerator cavities, which requires the solution of generalized eigenvalue problems. A parallel exact shift-invert eigensolver based on PARPACK and SuperLU has allowed the solution of a problem of order 7.5 million with 304 million nonzeros. Finding 10 eigenvalues requires about 2.5 hours on 24 processors of an IBM SP.
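The shift-invert idea behind the PARPACK + SuperLU solver: eigenvalues of inv(A - sigma*I) are 1/(lambda - sigma), so an iteration that finds the dominant eigenvalue of the shifted inverse converges to the eigenvalue of A closest to the shift sigma. A hypothetical pure-Python sketch on a 2x2 example (a real code factors A - sigma*I once with a sparse direct solver like SuperLU and reuses the factors; here Cramer's rule stands in for that solve, and `shift_invert_eig` is an illustrative name):

```python
def solve2(M, b):
    """Solve a 2x2 system M*y = b by Cramer's rule (stand-in for SuperLU)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(b[0] * M[1][1] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - b[0] * M[1][0]) / det]

def shift_invert_eig(A, sigma, iters=50):
    """Power iteration on (A - sigma*I)^{-1}; returns the eigenvalue of A
    nearest to sigma via a Rayleigh quotient."""
    M = [[A[0][0] - sigma, A[0][1]], [A[1][0], A[1][1] - sigma]]
    x = [1.0, 1.0]
    for _ in range(iters):
        y = solve2(M, x)                     # y = (A - sigma*I)^{-1} x
        norm = max(abs(y[0]), abs(y[1]))
        x = [y[0] / norm, y[1] / norm]       # renormalize each step
    Ax = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    return (Ax[0] * x[0] + Ax[1] * x[1]) / (x[0] * x[0] + x[1] * x[1])

A = [[2.0, 1.0], [1.0, 3.0]]        # eigenvalues (5 ± sqrt(5))/2 ≈ 1.382, 3.618
lam = shift_invert_eig(A, sigma=3.5)  # converges to the eigenvalue near 3.5
```

PARPACK replaces the plain power iteration with an implicitly restarted Arnoldi/Lanczos process, which finds several eigenpairs at once, but the role of the shifted inverse is the same.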
OPT++ is used in protein energy minimization problems (shown here is protein T162 from CASP5; courtesy of Meza, Oliva et al.).
UTK, UCB …
Version 1.7.5 released in January 2007; NSF funding for further development.
Linear systems, least squares, singular value decomposition, eigenvalues.
Communication routines targeting linear algebra operations.
Clarity, modularity, performance, and portability. ATLAS can be used here for automatic tuning.
Communication layer (message passing).
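ScaLAPACK's dense routines assume the matrix is spread over the process grid in a 2D block-cyclic layout. A hypothetical sketch of the index mapping (the `owner` function is an illustrative name, not a ScaLAPACK routine): global entry (i, j) with block size nb on an nprow x npcol grid lands on process row (i // nb) % nprow and process column (j // nb) % npcol.

```python
def owner(i, j, nb, nprow, npcol):
    """Process-grid coordinates owning global matrix entry (i, j), 0-based,
    under a 2D block-cyclic distribution with square blocks of size nb."""
    return ((i // nb) % nprow, (j // nb) % npcol)

# 8x8 matrix, 2x2 blocks, 2x2 process grid:
assert owner(0, 0, 2, 2, 2) == (0, 0)
assert owner(2, 0, 2, 2, 2) == (1, 0)   # second block row -> process row 1
assert owner(4, 5, 2, 2, 2) == (0, 0)   # blocks wrap around cyclically
```

The cyclic wrap-around is what keeps the load balanced as a factorization sweeps through the matrix; the grid shape (nprow vs. npcol) then affects communication cost, which is what the PDGESV timings below vary.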
Execution time of PDGESV for various grid shapes
LU on 2.2 GHz AMD Opteron (4.4 GFlop/s peak performance)
60 processors, dual AMD Opteron 1.4 GHz cluster with Myrinet interconnect, 2 GB memory
PETSc PDE Application Codes
Preconditioners + Krylov Methods
Matrices, Vectors, Indices
Computation and Communication Kernels
MPI, MPI-IO, BLAS, LAPACK
Portable, Extensible Toolkit for Scientific Computation
Linear Solvers (SLES)
Ax = b
Evaluation of A and b
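The SLES layer solves Ax = b by combining a Krylov method with a preconditioner. As a hypothetical illustration of what one Krylov iteration does (this is not PETSc's API, and it omits the preconditioner), here is an unpreconditioned conjugate gradient in pure Python for a small symmetric positive definite system:

```python
def cg(A, b, tol=1e-10, max_iter=100):
    """Unpreconditioned conjugate gradient for SPD A; builds the solution
    from matrix-vector products only, which is why Krylov methods suit
    large sparse systems."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                                  # residual r = b - A*x, x = 0
    p = r[:]                                  # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol * tol:                # residual small enough: done
            break
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = cg(A, b)   # exact solution is x = [1/11, 7/11]
```

In PETSc the same structure appears as composable objects: the matrix supplies the matrix-vector product, the preconditioner transforms the residual, and the Krylov loop is swappable without touching application code.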
Tuning and Analysis Utilities
(inclusive and exclusive time, number of calls, hardware statistics, etc)
(functions, loops, basic blocks, user-defined “semantic” entities)
set the C compiler
PAPI provides access to hardware performance counters (see http://icl.cs.utk.edu/papi for details and contact email@example.com for the corresponding TAU events). In this example we are just measuring FLOPS.
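The per-routine bookkeeping that TAU automates (number of calls, inclusive time per function) can be mimicked, hypothetically, with a simple Python decorator. This is only a sketch of the concept: `profiled`, `profile_data`, and `daxpy` are illustrative names, and real TAU instruments compiled code and can read PAPI hardware counters (e.g. FLOPS) rather than wall-clock time alone.

```python
import time
from functools import wraps

profile_data = {}   # routine name -> [number of calls, total inclusive seconds]

def profiled(fn):
    """Record call count and inclusive wall-clock time for fn."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            entry = profile_data.setdefault(fn.__name__, [0, 0.0])
            entry[0] += 1
            entry[1] += time.perf_counter() - start
    return wrapper

@profiled
def daxpy(a, x, y):
    """y <- a*x + y, a small kernel standing in for an instrumented routine."""
    return [a * xi + yi for xi, yi in zip(x, y)]

for _ in range(3):
    daxpy(2.0, [1.0, 2.0], [3.0, 4.0])
# profile_data["daxpy"] now holds [3, <accumulated seconds>]
```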
PESCAN is a code that uses the folded spectrum method for non-self-consistent nanoscale calculations. It uses a plane-wave basis and conventional Kleinman-Bylander nonlocal pseudopotentials in real space. It is parallelized with MPI and can handle million-atom systems.
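The folded spectrum trick can be checked numerically: the eigenvalue of H closest to a reference energy e_ref becomes the smallest eigenvalue of (H - e_ref*I)^2, so a minimization that only reaches the lowest state can still locate interior (e.g. near-band-gap) states. A hypothetical 2x2 demonstration in pure Python (helper names are illustrative, not PESCAN's API):

```python
def shift(H, s):
    """Return H - s*I."""
    n = len(H)
    return [[H[i][j] - (s if i == j else 0.0) for j in range(n)] for i in range(n)]

def square(M):
    """Return M*M."""
    n = len(M)
    return [[sum(M[i][k] * M[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def eig2(M):
    """Eigenvalues of a symmetric 2x2 matrix, ascending, via the quadratic formula."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    d = (tr * tr / 4.0 - det) ** 0.5
    return [tr / 2.0 - d, tr / 2.0 + d]

H = [[1.0, 0.0], [0.0, 6.0]]   # eigenvalues 1 and 6
e_ref = 5.0                     # eigenvalue of H nearest e_ref is 6
lo, hi = eig2(square(shift(H, e_ref)))
# lo = (6 - 5)^2 = 1 comes from the state near e_ref;
# hi = (1 - 5)^2 = 16 comes from the state far from it.
```

Folding thus turns an interior-eigenvalue problem into a lowest-eigenvalue problem, at the cost of squaring the operator (two applications of H per step).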
machine tuned and dependent modules
may or may not need rewriting
difficult to predict
minimal to extensive rewriting
New or extended Physics:
extensive rewriting or increased overhead