CS 591x – Cluster Computing and Programming Parallel Computers

Presentation Transcript


  1. CS 591x – Cluster Computing and Programming Parallel Computers
     Parallel Libraries

  2. Parallel Libraries • Recall that so far we have been – • Breaking up (decomposing) our “large” problems into smaller pieces… • Distributing the pieces of the problem to multiple processors • Explicitly moving data among processes through message passing
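For reference, here is a minimal sketch of that explicit message-passing style in C with MPI; the block size and the two ranks are arbitrary choices for illustration only.

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char* argv[]) {
      int rank, i;
      double block[100];               /* one "piece" of a decomposed problem */
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      if (rank == 0) {
          for (i = 0; i < 100; i++) block[i] = i;
          /* explicitly move the data to another process */
          MPI_Send(block, 100, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {
          MPI_Recv(block, 100, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          printf("rank 1 received the block\n");
      }
      MPI_Finalize();
      return 0;
  }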

  3. Parallel Libraries • Note that – • Large scientific and engineering problems often represent data in matrices and vectors • Large scientific and engineering problems make heavy use of linear algebra, linear systems, non-linear systems

  4. Parallel Libraries • MPI is designed to support the development of libraries • Consequently, there are a number of libraries, based on MPI, used to develop parallel software • Some libraries take care of much, or all of the parallelization • That means….
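One concrete reason MPI suits library development is the communicator: a library can duplicate the caller's communicator so that its internal messages can never collide with the application's own traffic. A minimal sketch (the routine name lib_solve is made up for illustration):

  #include <mpi.h>

  /* Hypothetical library entry point: work on a private copy of the
     caller's communicator so library messages stay in their own context. */
  void lib_solve(MPI_Comm user_comm) {
      MPI_Comm lib_comm;
      MPI_Comm_dup(user_comm, &lib_comm);
      /* ... library sends/receives on lib_comm only ... */
      MPI_Comm_free(&lib_comm);
  }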

  5. Parallel Libraries • … You don’t have to… • … but you still can… • … if you want • … sometimes…

  6. Parallel Libraries • ScaLAPACK • Scalable Linear Algebra PACKage • PETSc • Portable, Extensible Toolkit for Scientific Computation

  7. ScaLAPACK • Built on LAPACK – the Linear Algebra PACKage • Powerful • Widely used in scientific and engineering computing • but LAPACK itself does not scale to distributed-memory parallel computers • LAPACK is in turn built on BLAS – the Basic Linear Algebra Subprograms library
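For contrast with the parallel layers built on top of it, here is a hedged sketch of a single sequential BLAS call through the C interface (CBLAS); it assumes a CBLAS header and a BLAS library are available when linking (e.g. -lblas):

  #include <cblas.h>

  int main(void) {
      double x[4] = {1.0, 2.0, 3.0, 4.0};
      double y[4] = {0.0, 0.0, 0.0, 0.0};
      /* daxpy: y = alpha*x + y, the kind of kernel LAPACK is built from */
      cblas_daxpy(4, 2.0, x, 1, y, 1);   /* y is now {2, 4, 6, 8} */
      return 0;
  }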

  8. ScaLAPACK • uses PBLAS – Parallel BLAS • performs the distributed matrix and vector operations of a parallel application • uses BLAS for the local computations • uses BLACS – the Basic Linear Algebra Communication Subprograms library • handles interprocess communication for ScaLAPACK • typically runs over MPI (other message-passing layers are also supported)

  9. ScaLAPACK • Maps matrices and vectors onto a process grid • called a BLACS grid • similar to an MPI Cartesian topology • matrices and vectors are decomposed into rectangular blocks and block-cyclically distributed over the BLACS grid
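A hedged sketch of the block-cyclic idea in one dimension; the helper owner_of_row is made up for illustration and assumes the first block goes to process row 0 (ScaLAPACK itself provides utilities such as NUMROC for computing local sizes):

  /* Global rows are grouped into blocks of row_block_size consecutive rows,
     and the blocks are dealt out round-robin across nproc_rows process rows. */
  int owner_of_row(int i, int row_block_size, int nproc_rows) {
      return (i / row_block_size) % nproc_rows;
  }

The same mapping applied independently to the columns gives the two-dimensional block-cyclic distribution over the BLACS grid.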

  10. ScaLAPACK – sample (based on Pacheco, pp. 345-350)
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &p);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  Get_input(p, my_rank, &n, &nproc_rows, &nproc_cols,
            &row_block_size, &col_block_size);
  m = n;
  /* build the BLACS process grid */
  Cblacs_get(0, 0, &blacs_grid);      /* get the default BLACS context */
  /* "R": the process grid will use row-major order */
  Cblacs_gridinit(&blacs_grid, "R", nproc_rows, nproc_cols);
  Cblacs_pcoord(blacs_grid, my_rank, &my_proc_row, &my_proc_col);

  11. ScaLAPACK – sample cont.
  local_mat_rows = get_dim(m, row_block_size, my_proc_row, nproc_rows);
  local_mat_cols = get_dim(n, col_block_size, my_proc_col, nproc_cols);
  Allocate(my_rank, "A", &A_local, local_mat_rows * local_mat_cols, 1);
  b_local_size = get_dim(m, row_block_size, my_proc_row, nproc_rows);
  Allocate(my_rank, "b", &b_local, b_local_size, 1);
  exact_local_size = get_dim(m, col_block_size, my_proc_row, nproc_rows);
  Allocate(my_rank, "Exact", &exact_local, exact_local_size, 1);

  12. ScaLAPACK – sample cont.
  Build_descript(my_rank, "A", A_descript, m, n, row_block_size,
                 col_block_size, blacs_grid, local_mat_rows);
  Build_descript(my_rank, "B", b_descript, m, 1, row_block_size, 1,
                 blacs_grid, b_local_size);
  Build_descript(my_rank, "Exact", exact_descript, n, 1, col_block_size, 1,
                 blacs_grid, exact_local_size);

  13. ScaLAPACK – sample cont.
  Initialize(p, my_rank, A_local, local_mat_rows, local_mat_cols,
             exact_local, exact_local_size);
  Mat_vect_mult(m, n, A_local, A_descript, exact_local, exact_descript,
                b_local, b_descript);
  Allocate(my_rank, "pivot_list", &pivot_list,
           local_mat_rows + row_block_size, 0);
  MPI_Barrier(MPI_COMM_WORLD);

  14. ScaLAPACK – sample cont.
  /* psgesv solves Ax = b; the solution is returned in b */
  solve(my_rank, n, A_local, A_descript, pivot_list, b_local, b_descript);
  …
  Cblacs_exit(1);    /* nonzero argument: MPI will still be used afterwards */
  MPI_Finalize();
  …
  }

  15. ScaLAPACK – sample cont.
  void Mat_vect_mult(int m, int n, float* A_local, int* A_descript,
                     float* x_local, int* x_descript,
                     float* y_local, int* y_descript) {
    char transpose = 'N';
    …
    /* PBLAS matrix-vector multiply: y = alpha*A*x + beta*y */
    psgemv(&transpose, &m, &n, &alpha,
           A_local, &first_row_A, &first_col_A, A_descript,
           x_local, &first_row_x, &first_col_x, x_descript,
           &beta,
           y_local, &first_row_y, &first_col_y, y_descript, y_increment);
  }

  16. Crossing Languages – Some Issues • Calling routines from another language • e.g. calling a Fortran subroutine from C • Using n-dimensional arrays • remember: C is row-major, Fortran is column-major • Passing arguments in routine/function calls • Fortran passes by address, C passes by value
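A hedged sketch of those conventions: calling the Fortran BLAS routine DAXPY directly from C. The lowercase name with a trailing underscore is the common, but compiler-dependent, Fortran name-mangling convention, and the program must be linked against a Fortran BLAS (e.g. -lblas):

  /* Fortran: DAXPY(N, DA, DX, INCX, DY, INCY) computes DY = DA*DX + DY */
  extern void daxpy_(int* n, double* da, double* dx, int* incx,
                     double* dy, int* incy);

  int main(void) {
      int    n = 3, inc = 1;
      double da = 2.0;
      double dx[3] = {1.0, 2.0, 3.0};
      double dy[3] = {0.0, 0.0, 0.0};
      /* every argument, including the scalars, is passed by address */
      daxpy_(&n, &da, dx, &inc, dy, &inc);
      return 0;
  }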

  17. PETSc • Portable, Extensible Toolkit for Scientific Computation • Large, powerful • Solves • Partial differential equations • Linear systems • Non-linear systems • Works with matrices – • Dense • Sparse

  18. PETSc • PETSc routines return error codes • PETSc provides error-checking macros to help troubleshoot problems • CHKERRQ(errorcode)
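A minimal sketch of the error-checking idiom, assuming a PETSc 3.x installation where the C macro is spelled CHKERRQ:

  #include <petscsys.h>

  int main(int argc, char** argv) {
      PetscErrorCode ierr;
      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
      /* CHKERRQ checks the returned code and, on failure, prints a
         traceback and propagates the error up the call stack */
      ierr = PetscPrintf(PETSC_COMM_WORLD, "PETSc initialized\n"); CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
  }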

  19. PETSc • Built on top of MPI • Developed primarily for C/C++ • unlike ScaLAPACK • has a Fortran interface • Dense and sparse matrices • same interface

  20. PETSc • Includes many non-blocking operations • e.g. any process can update any matrix entry as a non-blocking operation • other work can go on while the update is carried out • Many options available from the command line • PETSc includes many solvers • Solvers can be selected from the command line • you can change solvers without recompiling • PETSC_DECIDE lets PETSc choose details such as local sizes for you
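For example, a program that calls KSPSetFromOptions() before KSPSolve() can be run as

  mpirun -np 4 ./app -ksp_type gmres -pc_type bjacobi -ksp_monitor

to pick the Krylov method and preconditioner and to print residual norms, all without recompiling (-ksp_type, -pc_type and -ksp_monitor are standard PETSc option names; the executable name is only a placeholder).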

  21. PETSc from -- http://www.epcc.ed.ac.uk/tracsbin/petsc-2.0.24/docs/splitmanual/node2.html#Node2

  22. PETSc from -- http://www.epcc.ed.ac.uk/tracsbin/petsc-2.0.24/docs/splitmanual/node2.html#Node2

  23. PETSc – sample routines
  PetscOptionsGetInt(PETSC_NULL, "-n", &n, &flg);
  VecSetType(Vec x, VecType vec_type);
  VecCreate(MPI_Comm comm, Vec *x);
  VecSetSizes(Vec x, int m, int M);
  VecDuplicate(Vec old, Vec *new);
  MatCreate(MPI_Comm comm, int m, int n, int M, int N, Mat *A);
  MatSetValues(Mat A, int m, int *im, int n, int *in, PetscScalar *values, INSERT_VALUES);

  24. PETSc – sample routines
  MatAssemblyBegin(Mat A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(Mat A, MAT_FINAL_ASSEMBLY);
  KSPCreate(MPI_Comm comm, KSP *ksp);
  KSPSolve(KSP ksp, Vec b, Vec x);
  PetscInitialize(&argc, &argv);
  PetscFinalize();
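To show how the routines on the last two slides fit together, here is a hedged sketch of a complete (if trivial) linear solve. It assumes a reasonably recent PETSc 3.x, whose C signatures differ in places from the 2.0-era listings above (for example, MatCreate now takes only a communicator and a Mat*, and PetscInitialize takes four arguments):

  #include <petscksp.h>

  /* Assemble a diagonal system A x = b in parallel and solve it with KSP. */
  int main(int argc, char** argv) {
      Mat A;  Vec x, b;  KSP ksp;
      PetscInt i, istart, iend, n = 100;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
      ierr = PetscOptionsGetInt(NULL, NULL, "-n", &n, NULL); CHKERRQ(ierr);

      ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
      ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n); CHKERRQ(ierr);
      ierr = MatSetFromOptions(A); CHKERRQ(ierr);
      ierr = MatSetUp(A); CHKERRQ(ierr);

      /* each process fills only the rows it owns */
      ierr = MatGetOwnershipRange(A, &istart, &iend); CHKERRQ(ierr);
      for (i = istart; i < iend; i++) {
          PetscScalar v = 2.0;
          ierr = MatSetValues(A, 1, &i, 1, &i, &v, INSERT_VALUES); CHKERRQ(ierr);
      }
      ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

      ierr = VecCreate(PETSC_COMM_WORLD, &b); CHKERRQ(ierr);
      ierr = VecSetSizes(b, PETSC_DECIDE, n); CHKERRQ(ierr);
      ierr = VecSetFromOptions(b); CHKERRQ(ierr);
      ierr = VecDuplicate(b, &x); CHKERRQ(ierr);
      ierr = VecSet(b, 1.0); CHKERRQ(ierr);

      ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr);
      ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);  /* honors -ksp_type, -pc_type */
      ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);

      ierr = KSPDestroy(&ksp); CHKERRQ(ierr);
      ierr = VecDestroy(&x);   CHKERRQ(ierr);
      ierr = VecDestroy(&b);   CHKERRQ(ierr);
      ierr = MatDestroy(&A);   CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
  }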

  25. BLAS (Basic Linear Algebra Subprograms) • http://www.netlib.org/blas/ • LAPACK (Linear Algebra PACKage) • http://www.netlib.org/lapack/ • http://www.netlib.org/lapack/lug/index.html • ScaLAPACK • http://www.netlib.org/scalapack/scalapack_home.html

  26. PETSc • http://www-unix.mcs.anl.gov/petsc/petsc-as/ • http://acts.nersc.gov/petsc/ • http://www.chuug.org/talks/petsc.pdf • http://www.epcc.ed.ac.uk/tracsbin/petsc-2.0.24/docs/splitmanual/manual.html#Node0
