### The ACTS Toolkit (Lecture Notes: What can it do for you?)

Tony Drummond and Osni Marques

Lawrence Berkeley National Laboratory (LBNL)

National Energy Research Scientific Computing Center (NERSC)

acts-support@nersc.gov

What is the ACTS Toolkit?

- Advanced Computational Testing and Simulation
- Tools for development of parallel applications
  - 21 tools
  - developed (primarily) at DOE labs
  - originally conceived as autonomous tools
- ACTS is an “umbrella” project
  - collect tools
  - leverage numerous independently funded projects

NPACI Parallel Computing Institute

Recent Successful Cases

Scattering in a quantum system of three charged particles (Rescigno, Baertschy, Isaacs and McCurdy, Dec. 24, 1999).

Cosmic Microwave Background Analysis, BOOMERanG collaboration, MADCAP code (Apr. 27, 2000).

NERSC Activities

- Make ACTS tools available on NERSC platforms
- Provide technical support (acts-support@nersc.gov)
- Perform independent evaluation of tools
- Maintain online ACTS information center
- Identify new users who can benefit from toolkit
- Work with users to integrate tools into applications

http://acts.nersc.gov

ACTS Support

- Support at different levels
  - applications
  - code optimization
  - tool selection
  - tool utilization
  - tool installation
- Leverage with developers
- Minimize risk to users

Tools Categorization

- Numerical
  - software that implements numerical algorithms
- Structural (“frameworks”)
  - software that manages data, communication
- Infra-structural
  - runtime, support tools, developer’s bag

Numerical Tools

- Aztec: iterative methods for solving sparse linear systems
- Hypre: collection of advanced preconditioners
- Opt++: solution of nonlinear optimization problems
- PETSc: methods for the solution of PDE and ODE related problems
- PVODE: solvers for large systems of ODEs
- ScaLAPACK: dense linear algebra computations
- SuperLU: direct methods for sparse linear systems

Structural (Frameworks)

- Global Arrays: portable, distributed array library, shared memory style of programming
- Overture: library of grid functions which derives from P++ arrays
- POET (Parallel Object-oriented Environment and Toolkit): allows for “mixing and matching” of components
- POOMA (Parallel Object-Oriented Methods and Applications): C++ abstraction layer between algorithm and platform (similar to HPF)

Infra-structural

- CUMULVS (Collaborative User Migration, User Library for Visualization and Steering), PAWS (Parallel Application WorkSpace): computational steering, data post-processing, interactive visualization
- Globus: infrastructure for high performance distributed computing (computational grids)
- SILOON (Scripting Interface Languages for Object-Oriented Numerics): scripting features
- TAU (Tuning and Analysis Utilities): advanced performance analysis and tuning

Infra-structural (cont.)

- Tulip: C++ applications with threads, global pointers and other parallel operations
- ATLAS (Automatically Tuned Linear Algebra Software), PHiPAC (Portable High Performance ANSI C): automatic generation of optimized numerical software (mainly BLAS)
- Nexus: multithreading, communication and resource management facilities
- PADRE (Parallel Asynchronous Data and Routing Engine): abstracts the details of representing and managing distributed data
- PETE (Portable Expression Template Engine): efficient C++ operator overloading through expression templates

Aztec

- Solves large sparse linear systems of equations of the form Ax = b, such as those that arise in applications that model complex physics problems using differential equations (e.g., finite difference or finite element methods)

Aztec

Implements Krylov iterative methods (CG, CGS, Bi-CG-Stab, GMRES, TFQMR)

Suite of preconditioners (Jacobi, Gauss-Seidel, overlapping domain decomposition with sparse LU, ILU, BILU within domains)

Highly efficient, scalable (1000 processors on the “ASCI Red” machine)
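
As a rough illustration of the kind of method listed above, here is a minimal Python sketch of conjugate gradients (CG) with a Jacobi (diagonal) preconditioner. It does not use Aztec: the helper names (`matvec`, `dot`, `jacobi_pcg`), the dense list-of-rows storage, and the small 1-D Poisson test matrix are all assumptions made for illustration, and the matrix is assumed to be symmetric positive definite.

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def jacobi_pcg(A, b, tol=1e-10, max_iter=100):
    """CG with Jacobi preconditioning; assumes A is symmetric positive definite."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                  # residual for the zero initial guess
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner M^-1
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:      # relative residual test
            return x, it
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Hypothetical test problem: tridiagonal 1-D Poisson matrix, right-hand side of ones.
n = 8
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x, iterations = jacobi_pcg(A, b)
residual = max(abs(ri - bi) for ri, bi in zip(matvec(A, x), b))
print("iterations:", iterations, "max residual:", residual)
```

A production solver would store the matrix in a sparse format and distribute it across processes; the sketch only shows the iteration itself.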

Aztec (applications)

TOUGH2 (Transport Of Unsaturated Groundwater and Heat) code, transport simulation in porous and fractured media (LBNL).

Co-flowing Annular Jet Combustor, a parallel 3D pseudo-transient simulation to steady state operation; MPSalsa code (SNL).

Aztec (basic steps)

- Prepare your linear system
- distribute the matrix
- call AZ_transform
- set up right-hand side and initial guess
- call AZ_reorder_vec on initial guess and right-hand side
- select an iterative solver and a preconditioner
- call AZ_solve
- call AZ_invorder_vec on solution

PETSc

- Portable, Extensible Toolkit for Scientific Computing
- What can it do?
  - Support the development of parallel PDE solvers
  - Implicit or semi-implicit solution methods; finite element, finite difference, or finite volume discretizations
- Specification of the mathematics of the problem
  - Vectors (field variables) and matrices (operators)
- How to solve the problem?
  - Linear, nonlinear, and timestepping (ODE) solvers

PETSc

- Parallelism
  - Uses MPI
  - Data layout: structured and unstructured meshes
  - Partitioning and coloring
- Viewers
  - Printing data object information
  - Visualization of field and matrix data
- Profiling and performance tuning
  - -log_summary
  - Profiling by stages of an application
  - User-defined events

PETSc (Simple example)

    /*
      From: http://www.mcs.anl.gov/petsc/src/sys/examples/tutorials/ex1.c
    */

    /* Program usage: mpirun ex1 [-help] [all PETSc options] */

    static char help[] = "This is an introductory PETSc example that illustrates printing.\n\n";

    /*
      Concepts: Introduction to PETSc;
      Routines: PetscInitialize(); PetscPrintf(); PetscFinalize();
      Processors: n
    */

    #include "petsc.h"

    int main(int argc,char **argv)
    {
      int ierr,rank,size;

      /*
        Every PETSc program should begin with the PetscInitialize routine.
        argc and argv are used to pass run-time commands to PETSc and MPI.
      */
      ierr = PetscInitialize(&argc,&argv,(char *)0,help);CHKERRA(ierr);

      /*
        The following MPI calls return the number of processes
        being used and the rank of this process in the group.
      */
      ierr = MPI_Comm_size(PETSC_COMM_WORLD,&size);CHKERRA(ierr);
      ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRA(ierr);

      ierr = PetscPrintf(PETSC_COMM_WORLD,"Number of processors = %d, rank = %d\n",size,rank);CHKERRA(ierr);

      /* Prints multiple messages (one per process) */
      ierr = PetscPrintf(PETSC_COMM_SELF,"[%d] Jumbled Hello World\n",rank);CHKERRA(ierr);

      /* A program must always end with PetscFinalize. */
      ierr = PetscFinalize();CHKERRA(ierr);
      return 0;
    }

PETSc (applications)

Multiphase flow, 4 million cell blocks, 32 million DOF, over 10.6 Gflops on an IBM SP (128 nodes), entire simulation runs in less than 30 minutes (Pope, Gropp, Morgan, Sepehrnoori, Smith and Wheeler).

Prometheus code (unstructured meshes in solid mechanics), 26 million DOF, 640 nodes on NERSC’s Cray T3E (Adams and Demmel).

PETSc’s SLES (Basic Steps)

- Define the linear system (Ax=b)
  - MatCreate, MatSetValue, VecCreate
- Create the solver
  - SLESCreate, SLESSetOperators
- Solve the system of equations
  - SLESSolve
- Clean up
  - SLESDestroy

PETSc’s SNES (Basic Steps)

- Nonlinear equations of the form:
  - F(x) = 0
- Unconstrained minimization problems of the form:
  - Min{f(x)}
- Create the solver
  - SNESCreate
- Create matrices and vectors (such as the Jacobian matrix)
  - MatCreate, MatSetValue, VecCreate
- Set evaluation routine and linear solver defaults
- Solve the nonlinear system
  - SNESSolve
- Clean up
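
The steps above (evaluate F, build the Jacobian, solve a linear system, update) can be sketched as a plain Newton iteration. This is a minimal Python illustration and not PETSc code: the helper names `solve_linear` and `newton`, the dense Gaussian elimination, and the two-equation test problem are assumptions made for the example.

```python
def solve_linear(J, rhs):
    """Gaussian elimination with partial pivoting on a small dense system."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(J)]   # augmented matrix
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def newton(F, jacobian, x0, tol=1e-12, max_iter=50):
    """Newton's method for F(x) = 0: solve J(x) dx = -F(x), then x <- x + dx."""
    x = list(x0)
    for it in range(max_iter):
        fx = F(x)
        if max(abs(v) for v in fx) < tol:
            return x, it
        dx = solve_linear(jacobian(x), [-v for v in fx])
        x = [xi + di for xi, di in zip(x, dx)]
    return x, max_iter


# Hypothetical test problem: x^2 + y^2 = 4 and x = y, with root (sqrt(2), sqrt(2)).
def F(v):
    return [v[0] ** 2 + v[1] ** 2 - 4.0, v[0] - v[1]]


def jacobian(v):
    return [[2.0 * v[0], 2.0 * v[1]], [1.0, -1.0]]


root, iterations = newton(F, jacobian, [1.0, 1.0])
print(root, iterations)
```

In a large-scale setting the dense solve would be replaced by a preconditioned iterative linear solver, which is the role the linear solver defaults play in the list above.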

PETSc’s TS (Basic Steps)

- Consider the ODE u_t = F(u,t), where u is a finite-dimensional vector
- Create a TS object
  - TSCreate
- Select a solution method (Euler, BEULER, PSEUDO)
- Set initial time and timestep
  - TSSetTimeStep
- Set the total number of timesteps
  - TSSetDuration
- Set the timestep context
- Clean up
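
To make the choice between explicit (Euler) and implicit (backward Euler) time stepping concrete, here is a small Python sketch on the scalar test equation u_t = -lam * u. It is not PETSc code: the function names, the value of `lam`, and the step sizes are assumptions picked so that the explicit method is unstable while the implicit one is not.

```python
import math


def forward_euler(f, u0, t0, dt, steps):
    """Explicit Euler: u_new = u + dt * f(u, t)."""
    u, t = u0, t0
    for _ in range(steps):
        u = u + dt * f(u, t)
        t += dt
    return u


def backward_euler_linear(lam, u0, dt, steps):
    """Backward Euler for u_t = -lam * u, where the implicit update
    (1 + lam * dt) * u_new = u can be solved in closed form."""
    u = u0
    for _ in range(steps):
        u = u / (1.0 + lam * dt)
    return u


lam = 50.0
dt, steps = 0.1, 10            # lam * dt = 5 exceeds the explicit stability limit of 2
exact = math.exp(-lam * dt * steps)
fe = forward_euler(lambda u, t: -lam * u, 1.0, 0.0, dt, steps)
be = backward_euler_linear(lam, 1.0, dt, steps)
print("exact:", exact, "forward Euler:", fe, "backward Euler:", be)
```

For a general nonlinear F the implicit update requires a nonlinear solve at every step, which is where a timestepping package hands off to its nonlinear and linear solvers.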

ScaLAPACK

- A collection of routines for solving:
  - Linear systems of equations
  - Least squares problems
  - Eigenproblems
  - Singular value problems
- Dense linear algebra (BLAS)
- Direct solution of linear systems
- Dense matrix eigensolvers

[Figure: ScaLAPACK software hierarchy]

ScaLAPACK

- BLAS:
  - Common linear algebra computations
  - Dot products, matrix-vector multiplication and matrix-matrix multiplication
  - Matrix-matrix operations can mask the effects of memory hierarchy (platform specific)
  - Portability
- PBLAS
  - Interface is very similar to BLAS
  - Makes ScaLAPACK codes quite similar to LAPACK ones

ScaLAPACK

- BLACS:
  - Message passing designed for linear algebra
  - Data layout: 1- or 2-dimensional grid of processes
  - Operations:
    - Synchronous sends and receives
    - Broadcast
    - Global reductions
  - Process grouping and multi-membership

ScaLAPACK

- ScaLAPACK:
  - High efficiency on MIMD machines such as the Intel Paragon, Cray T3E, IBM SP series, and clusters of workstations
  - Message passing with PVM and MPI (heterogeneous environments)
  - Efficiency depends on the block partitioning algorithm and vendor-supplied implementations of BLACS and BLAS
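
Block partitioning on a two-dimensional process grid can be illustrated with the block-cyclic index mapping. The Python sketch below is an assumption-laden illustration, not ScaLAPACK code: it uses zero-based indices, assumes the first block lives on process (0, 0), and the function names are hypothetical.

```python
def owner_and_local(g, nb, nprocs):
    """Map global index g to (owning process, local index) for block size nb
    distributed cyclically over nprocs processes (zero-based, first block on 0)."""
    block = g // nb
    return block % nprocs, (block // nprocs) * nb + g % nb


def block_cyclic_2d(i, j, mb, nb, nprow, npcol):
    """Map global entry (i, j) to its process coordinates and local indices."""
    prow, li = owner_and_local(i, mb, nprow)
    pcol, lj = owner_and_local(j, nb, npcol)
    return (prow, pcol), (li, lj)


# Hypothetical layout: 8 x 8 matrix, 2 x 2 blocks, 2 x 2 process grid.
counts = {}
for i in range(8):
    for j in range(8):
        proc, _ = block_cyclic_2d(i, j, 2, 2, 2, 2)
        counts[proc] = counts.get(proc, 0) + 1

print(block_cyclic_2d(5, 2, 2, 2, 2, 2))
print(counts)
```

Smaller blocks spread the work more evenly across processes, while larger blocks make each local BLAS call more efficient, which is one reason the block size affects performance.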

ScaLAPACK (simple example)

Example program solving Ax=b via ScaLAPACK routine PDGESV

    *     Initialize the process grid
          CALL SL_INIT( ICTXT, NPROW, NPCOL )
          CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
    *     Distribute the matrix on the process grid
          CALL DESCINIT( DESCA, M, N, MB, NB, RSRC, CSRC, ICTXT,
         $               MXLLDA, INFO )
          CALL DESCINIT( DESCB, N, NRHS, NB, NBRHS, RSRC, CSRC, ICTXT,
         $               MXLLDB, INFO )
    *     Generate matrices A and B and distribute
          CALL MATINIT( A, DESCA, B, DESCB )
    *     Make a copy of A and B for checking purposes
          CALL PDLACPY( 'All', N, N, A, 1, 1, DESCA, A0, 1, 1, DESCA )
          CALL PDLACPY( 'All', N, NRHS, B, 1, 1, DESCB, B0, 1, 1,
         $              DESCB )
    *     Solve the linear system A * X = B
          CALL PDGESV( N, NRHS, A, IA, JA, DESCA, IPIV, B, IB, JB,
         $             DESCB, INFO )
    *     Compute residual ||A * X - B|| / ( ||X|| * ||A|| * eps * N )
          EPS = PDLAMCH( ICTXT, 'Epsilon' )
          ANORM = PDLANGE( 'I', N, N, A, 1, 1, DESCA, WORK )
          BNORM = PDLANGE( 'I', N, NRHS, B, 1, 1, DESCB, WORK )
          CALL PDGEMM( 'N', 'N', N, NRHS, N, ONE, A0, 1, 1, DESCA,
         $             B, 1, 1, DESCB, -ONE, B0, 1, 1, DESCB )
          XNORM = PDLANGE( 'I', N, NRHS, B0, 1, 1, DESCB, WORK )
          RESID = XNORM / ( ANORM*BNORM*EPS*DBLE( N ) )
    *     Release the process grid, free the BLACS context and exit BLACS
          CALL BLACS_GRIDEXIT( ICTXT )
          CALL BLACS_EXIT( 0 )
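
The scaled residual computed at the end of the example can be reproduced serially. The following Python sketch is only an illustration under stated assumptions: it uses infinity norms (matching the 'I' argument above), takes `sys.float_info.epsilon` as a stand-in for the machine epsilon, and the 3 x 3 matrix and helper names are hypothetical.

```python
import sys


def inf_norm_matrix(A):
    """Infinity norm of a dense matrix: largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in A)


def inf_norm_vector(x):
    return max(abs(v) for v in x)


def scaled_residual(A, x, b):
    """||A x - b|| / ( ||A|| * ||x|| * eps * n ) in the infinity norm."""
    n = len(b)
    r = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    eps = sys.float_info.epsilon
    return inf_norm_vector(r) / (inf_norm_matrix(A) * inf_norm_vector(x) * eps * n)


# Hypothetical test data: b is built from the known solution (1, 2, 3).
A = [[4.0, 1.0, 2.0], [1.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
b = [12.0, 20.0, 26.0]
good = scaled_residual(A, [1.0, 2.0, 3.0], b)
bad = scaled_residual(A, [1.0, 2.0, 3.001], b)
print("exact solution:", good, "perturbed solution:", bad)
```

Dividing by eps and n makes the quantity roughly independent of problem size and precision, so a value of modest size suggests a solution that is accurate to working precision, while a very large value flags an error.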

ScaLAPACK (applications)

Induced current (white arrows) and charge density (colored plane and gray surface) in crystallized glycine due to an external field (Louie, Yoon, Pfrommer and Canning).

Cosmic Microwave Background Analysis, BOOMERanG collaboration, MADCAP code (Apr. 27, 2000).

SuperLU

Direct solution of large sparse linear systems

Shared and distributed memory implementations

Attained 8.3 Gflops on 512 nodes of the T3E

SuperLU (applications)

Scattering in a quantum system of three charged particles (Rescigno, Baertschy, Isaacs and McCurdy, Dec. 24, 1999).

SuperLU speedup (matrix dimensions varying from 26028 to 120750).

Global Arrays

- Programming model is based on an explicit distinction between local and global
- Communication
- Accessing GA Distributed Arrays
- Conventional Message Passing (MPI)
- Support for data transfers between local and remote
- Support for synchronization

Global Array Supported Operations

- Implementation-dependent primitive operations
- Implementation-independent constructs
- Collective primitive operations
  - create and destroy an array
  - create an array following a provided template
  - synchronize all processes

Global Array Supported Operations

- Non-collective primitive operations
  - fetch, store and accumulate into a rectangular range of an array
  - gather and scatter array elements
  - direct access to local elements of an array
- Linear algebra operations
  - vector operations (dot product, scale, etc.)
  - matrix operations (multiply, eigenvalues, etc.)

Global Array (Basic Steps)

- ga_initialize()
  - must be the first call before any other ga call
- ga_initialize_ltd(limit)
  - memory usage (0 = unlimited)
  - collective operation
- ga_terminate()
  - delete all active arrays and clean up
  - collective operation

Global Array (Example Program)

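          program example1
          include 'mpif.h'
          integer IERR, P, ME
    *     The GA query functions return integers; declare them explicitly,
    *     since implicit typing would treat names starting with G as REAL
          integer ga_nnodes, ga_nodeid
          call MPI_Init(IERR)
          call ga_initialize()
          P = ga_nnodes()
          ME = ga_nodeid()
          write(*,*) 'I am ', ME, ' number of GA procs = ', P
          call ga_terminate()
          call MPI_Finalize(IERR)
          stop
          end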

CUMULVS COMPONENTS

- Generated at compilation time
- Communicate by invoking the necessary protocols

- Application-side library
- Viewer library
- Fault recovery daemon
  - Check Point Daemon (CPD) per host
  - CPDs can manage task migration
  - CPDs monitor other CPDs for failure and recovery
  - CPDs coordinate the redundancy of checkpoint data

CUMULVS VIEWERS

- Front-end programs attached to a distributed application
- Different views of the same data, different data (sub-regions)
- Viewers can use any graphical system for rendering data field views (AVS, Tcl/Tk, virtual reality interface, or customized interface)

CUMULVS (Basic Steps)

- Set up input parameters to be steered
- Specify the nature and decomposition of the data fields to be visualized. Standard data decompositions: block, block-cyclic, particle decompositions and user-defined decompositions
- Use existing interfaces to visualization packages or define a custom viewer on top of other visualization tools
- Set up checkpoint/restart mechanism

CUMULVS Instrumentation

- Initialization: stv_init() or call stvfinit()
- Data fields (for visualization and checkpoints)
  - Data distribution: dimension, decomposition, processor layout
  - Local allocation: name, type, offsets
- Steering parameters
  - Name, type, reference

Globus

- Motivation for the GRID
- Dependable, consistent, pervasive access to (high-end) resources
- Online instrumentation
- Collaborative engineering
- Parameter studies
- Browsing of remote datasets
- Use of remote software
- Data-intensive computing
- Very large-scale simulations

Globus

- High throughput computing
  - Schedule many tasks
- Issues:
  - Resource discovery
  - Data access
  - Scheduling
  - Reservation
  - Security
  - Accounting
  - Code management

Globus - Conceptual Evolution

- Metacomputing: late 1980s
  - Focus on distributed computations
- Gigabit testbeds: early 1990s
  - Research, primarily on networking
- I-WAY: 1995
  - Demonstration of application feasibility
- NSF NPACI (National Technology Grid): 1998
- NASA Information Power Grid: 1999
- DOE ASCI DISCOM DRM: 1999
- European Grid: 2000

Globus Approach

- A toolkit and collection of services addressing key technical problems
- Modular “bag of services” model
- Not a vertically integrated solution
- General infrastructure tools (aka middleware) that can be applied to many application domains
- Inter-domain issues, rather than clustering
- Integration of intra-domain solutions
- Distinguish between local and global services

Globus Toolkit Grid services

- Security (GSI)
- Resource management (GRAM)
- Information Services (MDS)
- Remote file management (GASS)
- Communication (I/O, Nexus)
- Process Monitoring (HBM)

Globus Applications

NASA’s Information Power Grid (IPG) joins supercomputers and storage devices owned by participating organizations into a single, seamless computing environment.

With a Globus-enabled X-ray microtomography program, a full 3-D reconstruction of an ant head (2 mm in diameter) was obtained in less than 10 minutes using the acquisition hardware at a beamline at ANL and an SGI Origin 2000 at NPACI.

TAU

Profiling of Fortran 90, C, C++, HPF, and HPC++ codes

Detailed information (much more than prof/gprof)

C++: per-class and per-instance profiling

Graphical display of profiling results (built-in viewers, interface to Vampir)

TAU (Basic Steps)

- Instrument user’s code
  - TAU_PROFILE(<args>)
- Link with TAU library
  - g++ <link flags> -L $TAU_HOME/tau/lib -ltau
- Run code
  - Generates profiling or tracing data
- Use viewers: RACY or VAMPIR
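
The instrument, run, and inspect cycle above can be mimicked in a few lines of Python. This sketch is only an analogue of source-level instrumentation and does not use TAU: the `profiled` decorator, the `profile_data` table, and the `smooth` and `driver` routines are hypothetical names introduced for the example.

```python
import time
from collections import defaultdict
from functools import wraps

# Routine name -> accumulated call count and inclusive wall-clock time.
profile_data = defaultdict(lambda: {"calls": 0, "seconds": 0.0})


def profiled(func):
    """Instrument func so each call updates its entry in profile_data."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            entry = profile_data[func.__name__]
            entry["calls"] += 1
            entry["seconds"] += time.perf_counter() - start
    return wrapper


@profiled
def smooth(values):
    """One averaging sweep over the interior points; end points stay fixed."""
    inner = [(values[i - 1] + values[i + 1]) / 2.0
             for i in range(1, len(values) - 1)]
    return [values[0]] + inner + [values[-1]]


@profiled
def driver(sweeps):
    values = [0.0] * 50 + [1.0]
    for _ in range(sweeps):
        values = smooth(values)
    return values


result = driver(5)
for name, entry in sorted(profile_data.items()):
    print(f"{name:8s} calls={entry['calls']:3d} seconds={entry['seconds']:.6f}")
```

The times recorded here are inclusive, so the entry for `driver` also contains the time spent in `smooth`; a real profiler typically reports exclusive time as well.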

TAU (Main Control Window)

- COSY: COmpile manager Status displaY
- FANCY: File ANd Class displaY
- CAGEY: CAll Graph Extended displaY
- CLASSY: CLASS hierarchY browser
- RACY: Routine and data ACcess profile displaY
- SPEEDY: Speedup and Parallel Execution Extrapolation DisplaY

[Figure: TAU (SPEEDY)]

Future Directions

- CCA (Common Component Architecture)
  - Developing standardized ways of managing numerical components to allow mixing-and-matching
  - Frameworks for gluing components together
  - Similarities to CORBA, DCOM, Java Beans
  - Scientific interface description language (allowing Fortran)
- ESI (Equation Solver Interface)
  - Developing standardized interfaces for scalable linear solvers
  - Specific test case for CCA component design

acts-support@nersc.gov
