CS 591x – Cluster Computing and Programming Parallel Computers

Presentation Transcript


  1. CS 591x – Cluster Computing and Programming Parallel Computers
     Parallel Libraries

  2. Parallel Libraries • Recall that so far we have been – • Breaking up (decomposing) our “large” problems into smaller pieces… • Distributing the pieces of the problem to multiple processors • Explicitly moving data among processes through message passing
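For reference, here is a minimal sketch of that explicit message-passing style in C with MPI; the block size and the two ranks are arbitrary choices for illustration only.

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char* argv[]) {
      int rank, i;
      double block[100];               /* one "piece" of a decomposed problem */
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      if (rank == 0) {
          for (i = 0; i < 100; i++) block[i] = i;
          /* explicitly move the data to another process */
          MPI_Send(block, 100, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {
          MPI_Recv(block, 100, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          printf("rank 1 received the block\n");
      }
      MPI_Finalize();
      return 0;
  }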

  3. Parallel Libraries • Note that – • Large scientific and engineering problems often represent data in matrices and vectors • Large scientific and engineering problems make heavy use of linear algebra, linear systems, non-linear systems

  4. Parallel Libraries • MPI is designed to support the development of libraries • Consequently, there are a number of libraries, based on MPI, used to develop parallel software • Some libraries take care of much, or all of the parallelization • That means….
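One concrete reason MPI suits library development is the communicator: a library can duplicate the caller's communicator so that its internal messages can never collide with the application's own traffic. A minimal sketch (the routine name lib_solve is made up for illustration):

  #include <mpi.h>

  /* Hypothetical library entry point: work on a private copy of the
     caller's communicator so library messages stay in their own context. */
  void lib_solve(MPI_Comm user_comm) {
      MPI_Comm lib_comm;
      MPI_Comm_dup(user_comm, &lib_comm);
      /* ... library sends/receives on lib_comm only ... */
      MPI_Comm_free(&lib_comm);
  }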

  5. Parallel Libraries • … You don’t have to… • … but you still can… • … if you want • … sometimes…

  6. Parallel Libraries • ScaLAPACK • Scalable Linear Algebra PACKage • PETSc • Portable, Extensible Toolkit for Scientific Computation

  7. ScaLAPACK • Built on LAPACK – the Linear Algebra PACKage • Powerful • Widely used in scientific and engineering computing • but LAPACK itself does not scale to distributed-memory parallel computers • LAPACK is in turn built on BLAS – the Basic Linear Algebra Subprograms library
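For contrast with the parallel layers built on top of it, here is a hedged sketch of a single sequential BLAS call through the C interface (CBLAS); it assumes a CBLAS header and a BLAS library are available when linking (e.g. -lblas):

  #include <cblas.h>

  int main(void) {
      double x[4] = {1.0, 2.0, 3.0, 4.0};
      double y[4] = {0.0, 0.0, 0.0, 0.0};
      /* daxpy: y = alpha*x + y, the kind of kernel LAPACK is built from */
      cblas_daxpy(4, 2.0, x, 1, y, 1);   /* y is now {2, 4, 6, 8} */
      return 0;
  }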

  8. ScaLAPACK • uses PBLAS – Parallel BLAS • performs the distributed matrix and vector operations of a parallel application • uses BLAS for the local computations • uses BLACS – the Basic Linear Algebra Communication Subprograms library • handles interprocess communication for ScaLAPACK • typically runs over MPI (other message-passing layers are also supported)

  9. ScaLAPACK • Maps matrices and vectors onto a process grid • called a BLACS grid • similar to an MPI Cartesian topology • matrices and vectors are decomposed into rectangular blocks and block-cyclically distributed over the BLACS grid
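A hedged sketch of the block-cyclic idea in one dimension; the helper owner_of_row is made up for illustration and assumes the first block goes to process row 0 (ScaLAPACK itself provides utilities such as NUMROC for computing local sizes):

  /* Global rows are grouped into blocks of row_block_size consecutive rows,
     and the blocks are dealt out round-robin across nproc_rows process rows. */
  int owner_of_row(int i, int row_block_size, int nproc_rows) {
      return (i / row_block_size) % nproc_rows;
  }

The same mapping applied independently to the columns gives the two-dimensional block-cyclic distribution over the BLACS grid.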

  10. ScaLAPACK – sample (based on Pacheco, pp. 345-350)
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &p);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  Get_input(p, my_rank, &n, &nproc_rows, &nproc_cols,
            &row_block_size, &col_block_size);
  m = n;
  /* build the BLACS process grid */
  Cblacs_get(0, 0, &blacs_grid);      /* get the default BLACS context */
  /* "R": the process grid will use row-major order */
  Cblacs_gridinit(&blacs_grid, "R", nproc_rows, nproc_cols);
  Cblacs_pcoord(blacs_grid, my_rank, &my_proc_row, &my_proc_col);

  11. ScaLAPACK – sample cont.
  local_mat_rows = get_dim(m, row_block_size, my_proc_row, nproc_rows);
  local_mat_cols = get_dim(n, col_block_size, my_proc_col, nproc_cols);
  Allocate(my_rank, "A", &A_local, local_mat_rows * local_mat_cols, 1);
  b_local_size = get_dim(m, row_block_size, my_proc_row, nproc_rows);
  Allocate(my_rank, "b", &b_local, b_local_size, 1);
  exact_local_size = get_dim(m, col_block_size, my_proc_row, nproc_rows);
  Allocate(my_rank, "Exact", &exact_local, exact_local_size, 1);

  12. ScaLAPACK – sample cont.
  Build_descript(my_rank, "A", A_descript, m, n, row_block_size,
                 col_block_size, blacs_grid, local_mat_rows);
  Build_descript(my_rank, "B", b_descript, m, 1, row_block_size, 1,
                 blacs_grid, b_local_size);
  Build_descript(my_rank, "Exact", exact_descript, n, 1, col_block_size, 1,
                 blacs_grid, exact_local_size);

  13. ScaLAPACK – sample cont.
  Initialize(p, my_rank, A_local, local_mat_rows, local_mat_cols,
             exact_local, exact_local_size);
  Mat_vect_mult(m, n, A_local, A_descript, exact_local, exact_descript,
                b_local, b_descript);
  Allocate(my_rank, "pivot_list", &pivot_list,
           local_mat_rows + row_block_size, 0);
  MPI_Barrier(MPI_COMM_WORLD);

  14. ScaLAPACK – sample cont.
  /* psgesv solves Ax = b; the solution is returned in b */
  solve(my_rank, n, A_local, A_descript, pivot_list, b_local, b_descript);
  …
  Cblacs_exit(1);    /* nonzero argument: MPI will still be used afterwards */
  MPI_Finalize();
  …
  }

  15. ScaLAPACK – sample cont.
  void Mat_vect_mult(int m, int n, float* A_local, int* A_descript,
                     float* x_local, int* x_descript,
                     float* y_local, int* y_descript) {
    char transpose = 'N';
    …
    /* PBLAS matrix-vector multiply: y = alpha*A*x + beta*y */
    psgemv(&transpose, &m, &n, &alpha,
           A_local, &first_row_A, &first_col_A, A_descript,
           x_local, &first_row_x, &first_col_x, x_descript,
           &beta,
           y_local, &first_row_y, &first_col_y, y_descript, y_increment);
  }

  16. Crossing Languages – Some Issues • Calling routines from another language • e.g. calling a Fortran subroutine from C • Using n-dimensional arrays • remember: C is row-major, Fortran is column-major • Passing arguments in routine/function calls • Fortran passes by address, C passes by value
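A hedged sketch of those conventions: calling the Fortran BLAS routine DAXPY directly from C. The lowercase name with a trailing underscore is the common, but compiler-dependent, Fortran name-mangling convention, and the program must be linked against a Fortran BLAS (e.g. -lblas):

  /* Fortran: DAXPY(N, DA, DX, INCX, DY, INCY) computes DY = DA*DX + DY */
  extern void daxpy_(int* n, double* da, double* dx, int* incx,
                     double* dy, int* incy);

  int main(void) {
      int    n = 3, inc = 1;
      double da = 2.0;
      double dx[3] = {1.0, 2.0, 3.0};
      double dy[3] = {0.0, 0.0, 0.0};
      /* every argument, including the scalars, is passed by address */
      daxpy_(&n, &da, dx, &inc, dy, &inc);
      return 0;
  }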

  17. PETSc • Portable, Extensible Toolkit for Scientific Computation • Large, powerful • Solves • Partial differential equations • Linear systems • Non-linear systems • Works with matrices – • Dense • Sparse

  18. PETSc • PETSc routines return error codes • PETSc provides error-checking macros to help troubleshoot problems • CHKERRQ(errorcode)
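A minimal sketch of the error-checking idiom, assuming a PETSc 3.x installation where the C macro is spelled CHKERRQ:

  #include <petscsys.h>

  int main(int argc, char** argv) {
      PetscErrorCode ierr;
      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
      /* CHKERRQ checks the returned code and, on failure, prints a
         traceback and propagates the error up the call stack */
      ierr = PetscPrintf(PETSC_COMM_WORLD, "PETSc initialized\n"); CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
  }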

  19. PETSc • Built on top of MPI • Developed primarily for C/C++ • unlike ScaLAPACK • has a Fortran interface • Dense and sparse matrices • same interface

  20. PETSc • Includes many non-blocking operations • e.g. any process can update any matrix entry as a non-blocking operation • other work can go on while the update is carried out • Many options available from the command line • PETSc includes many solvers • Solvers can be selected from the command line • you can change solvers without recompiling • PETSC_DECIDE lets PETSc choose details such as local sizes for you
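For example, a program that calls KSPSetFromOptions() before KSPSolve() can be run as

  mpirun -np 4 ./app -ksp_type gmres -pc_type bjacobi -ksp_monitor

to pick the Krylov method and preconditioner and to print residual norms, all without recompiling (-ksp_type, -pc_type and -ksp_monitor are standard PETSc option names; the executable name is only a placeholder).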

  21. PETSc from -- http://www.epcc.ed.ac.uk/tracsbin/petsc-2.0.24/docs/splitmanual/node2.html#Node2

  22. PETSc from -- http://www.epcc.ed.ac.uk/tracsbin/petsc-2.0.24/docs/splitmanual/node2.html#Node2

  23. PETSc – sample routines
  PetscOptionsGetInt(PETSC_NULL, "-n", &n, &flg);
  VecSetType(Vec x, VecType vec_type);
  VecCreate(MPI_Comm comm, Vec *x);
  VecSetSizes(Vec x, int m, int M);
  VecDuplicate(Vec old, Vec *new);
  MatCreate(MPI_Comm comm, int m, int n, int M, int N, Mat *A);
  MatSetValues(Mat A, int m, int *im, int n, int *in, PetscScalar *values, INSERT_VALUES);

  24. PETSc – sample routines
  MatAssemblyBegin(Mat A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(Mat A, MAT_FINAL_ASSEMBLY);
  KSPCreate(MPI_Comm comm, KSP *ksp);
  KSPSolve(KSP ksp, Vec b, Vec x);
  PetscInitialize(&argc, &argv);
  PetscFinalize();
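To show how the routines on the last two slides fit together, here is a hedged sketch of a complete (if trivial) linear solve. It assumes a reasonably recent PETSc 3.x, whose C signatures differ in places from the 2.0-era listings above (for example, MatCreate now takes only a communicator and a Mat*, and PetscInitialize takes four arguments):

  #include <petscksp.h>

  /* Assemble a diagonal system A x = b in parallel and solve it with KSP. */
  int main(int argc, char** argv) {
      Mat A;  Vec x, b;  KSP ksp;
      PetscInt i, istart, iend, n = 100;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
      ierr = PetscOptionsGetInt(NULL, NULL, "-n", &n, NULL); CHKERRQ(ierr);

      ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
      ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n); CHKERRQ(ierr);
      ierr = MatSetFromOptions(A); CHKERRQ(ierr);
      ierr = MatSetUp(A); CHKERRQ(ierr);

      /* each process fills only the rows it owns */
      ierr = MatGetOwnershipRange(A, &istart, &iend); CHKERRQ(ierr);
      for (i = istart; i < iend; i++) {
          PetscScalar v = 2.0;
          ierr = MatSetValues(A, 1, &i, 1, &i, &v, INSERT_VALUES); CHKERRQ(ierr);
      }
      ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

      ierr = VecCreate(PETSC_COMM_WORLD, &b); CHKERRQ(ierr);
      ierr = VecSetSizes(b, PETSC_DECIDE, n); CHKERRQ(ierr);
      ierr = VecSetFromOptions(b); CHKERRQ(ierr);
      ierr = VecDuplicate(b, &x); CHKERRQ(ierr);
      ierr = VecSet(b, 1.0); CHKERRQ(ierr);

      ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr);
      ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);  /* honors -ksp_type, -pc_type */
      ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);

      ierr = KSPDestroy(&ksp); CHKERRQ(ierr);
      ierr = VecDestroy(&x);   CHKERRQ(ierr);
      ierr = VecDestroy(&b);   CHKERRQ(ierr);
      ierr = MatDestroy(&A);   CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
  }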

  25. BLAS (Basic Linear Algebra Subprograms) • http://www.netlib.org/blas/ • LAPACK (Linear Algebra PACKage) • http://www.netlib.org/lapack/ • http://www.netlib.org/lapack/lug/index.html • ScaLAPACK • http://www.netlib.org/scalapack/scalapack_home.html

  26. PETSc • http://www-unix.mcs.anl.gov/petsc/petsc-as/ • http://acts.nersc.gov/petsc/ • http://www.chuug.org/talks/petsc.pdf • http://www.epcc.ed.ac.uk/tracsbin/petsc-2.0.24/docs/splitmanual/manual.html#Node0
