MATLAB*P: Architecture

Presentation Transcript


  1. MATLAB*P: Architecture • Ron Choy, Alan Edelman • Laboratory for Computer Science, MIT

  2. Outline • The “p” is for parallel • MATLAB is what people want

  3. Supercomputing in 2003 • The impact of the Earth Simulator • The impact of the internet bubble • The few big players left • The rise of clusters

  4. The Rise of Clusters • Cheap & everyone seems to have one

  5. Some MIT Clusters • Cheap & everyone seems to have one

  6. Rise of Clusters • Between one and three Beowulf clusters are in the top 5 of the TOP500 list! • Increased computing power does not necessarily translate into increased productivity • Someone has to program these parallel machines, and this can be hard

  7. Classes of interesting problems • Large collection of small problems – easy, usually involves running a large number of serial programs – embarrassingly parallel • Space of problems? • Large problems – might not even fit into the memory of a machine. Much harder since it usually involves communication between processes

  8. Parallel programming (1990 – now) • Dominant model: MPI (Message Passing Interface) • Low level control of communications between processes • Send, Recv, Scatter, Gather …

  9. MPI • It is not an easy way to do parallel programming. • Error prone – it is hard to get complex low level communications right • Hard to debug – MPI programs tend to die with obscure error messages. There is also no good debugger for MPI.

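  10. C with MPI
      #include <stdio.h>
      #include <stdlib.h>
      #include <math.h>
      #include "mpi.h"
      #define A(i,j) ( 1.0/((1.0*(i)+(j))*(1.0*(i)+(j)+1)/2 + (1.0*(i)+1)) )
      /* TOL is used in the iteration loop but is not defined anywhere in the transcript; this value is an assumption. */
      #define TOL 1e-10
      void errorExit(void);
      double normalize(double* x, int mat_size);
      int main(int argc, char **argv)
      {
          int num_procs;
          int rank;
          int mat_size = 64000;
          int num_components;
          double *x = NULL;
          double *y_local = NULL;
          double norm_old = 1;
          double norm = 0;
          int i,j;
          int count;
          if (MPI_SUCCESS != MPI_Init(&argc, &argv)) exit(1);
          if (MPI_SUCCESS != MPI_Comm_size(MPI_COMM_WORLD,&num_procs)) errorExit();
          /* rank is read below but never set in the transcript; this call is the presumed missing line. */
          if (MPI_SUCCESS != MPI_Comm_rank(MPI_COMM_WORLD,&rank)) errorExit();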

  11. C with MPI (2)
          /* Round mat_size up to a multiple of num_procs so every process owns num_components rows. */
          if (0 == mat_size % num_procs)
              num_components = mat_size/num_procs;
          else
              num_components = (mat_size/num_procs + 1);
          mat_size = num_components * num_procs;
          if (0 == rank) printf("Matrix Size = %d\n", mat_size);
          if (0 == rank) printf("Num Components = %d\n", num_components);
          if (0 == rank) printf("Num Processes = %d\n", num_procs);
          x = (double*) malloc(mat_size * sizeof(double));
          y_local = (double*) malloc(num_components * sizeof(double));
          if ( (NULL == x) || (NULL == y_local) ) {
              free(x);
              free(y_local);
              errorExit();
          }
          /* Rank 0 picks the random start vector; it is broadcast on the next slide. */
          if (0 == rank) {
              for (i=0; i<mat_size; i++) {
                  x[i] = rand();
              }
              norm = normalize(x,mat_size);
          }

  12. C with MPI (3)
          if (MPI_SUCCESS != MPI_Bcast(x, mat_size, MPI_DOUBLE, 0, MPI_COMM_WORLD)) errorExit();
          count = 0;
          while (fabs(norm-norm_old) > TOL) {
              count++;
              norm_old = norm;
              for (i=0; i<num_components; i++) {
                  y_local[i] = 0;
              }
              /* Each process multiplies its own block of rows of A by the full vector x. */
              for (i=0; i<num_components && (i+num_components*rank)<mat_size; i++) {
                  for (j=mat_size-1; j>=0; j--) {
                      y_local[i] += A(i+rank*num_components,j) * x[j];
                  }
              }
              /* Collect every process's block so that all processes hold the new x. */
              if (MPI_SUCCESS != MPI_Allgather(y_local, num_components, MPI_DOUBLE, x, num_components, MPI_DOUBLE, MPI_COMM_WORLD)) errorExit();

  13. C with MPI (4)
              norm = normalize(x, mat_size);
          }
          if (0 == rank) {
              printf("result = %16.15e\n", norm);
          }
          free(x);
          free(y_local);
          MPI_Finalize();
          exit(0);
      }
      void errorExit(void)
      {
          int rank;
          MPI_Comm_rank(MPI_COMM_WORLD,&rank);
          printf("%d died\n",rank);
          MPI_Finalize();
          exit(1);
      }

  14. C with MPI (5)
      /* Scale x to unit 2-norm in place and return the norm it had before scaling. */
      double normalize(double* x, int mat_size)
      {
          int i;
          double norm = 0;
          for (i=mat_size-1; i>=0; i--) {
              norm += x[i] * x[i];
          }
          norm = sqrt(norm);
          for (i=0; i<mat_size; i++) {
              x[i] /= norm;
          }
          return norm;
      }

  15. The result? • A lot of researchers in the sciences need the power of parallel computing, but do not have the expertise to tackle the programming • Time saved by using parallel computers is offset by the additional time needed to program them

  16. The result? (2) • A high level tool is needed to hide the complexity away from the users of parallel computers

  17. Let’s go back to the serial world for a moment

  18. Outline • The “p” is for parallel • MATLAB is what people want

  19. What do people want? • Real application programmers care less about “p-fold speedups” • Ease of programming, ease of debugging, and correct answers matter much more

  20. MATLAB • MATrix LABoratory • Started out as an interactive frontend to LINPACK and EISPACK • Has since grown into a powerful computing environment with ‘Toolboxes’ ranging from neural networks to computational finance to image processing. • Many current users know nothing about linear algebra

  21. MATLAB ‘language’ • MATLAB has a C-like scripting language, with HPF-like array indexing • Interpreted language • A wide range of linear algebra routines • Very popular in the scientific community as a prototyping tool – easier to use than C/FORTRAN

  22. MATLAB ‘language’ (2) • MATLAB makes use of ATLAS (optimized BLAS) for basic linear algebra routines • Still, performance is not as good as C/FORTRAN, especially for loops • However MATLAB is catching up with its release 13 – (just-in-time compiling?)

  23. Why do people like MATLAB? • Convenience, not performance!

  24. Wouldn’t it be nice to mix the convenience of MATLAB with parallel computing?

  25. Parallel MATLAB Survey • http://theory.lcs.mit.edu/~cly/survey.html • 27 parallel MATLAB projects found, using 4 approaches • Embarrassingly Parallel • Message Passing • Backend Support • Compiler

  26. Embarrassingly Parallel

  27. Embarrassingly Parallel (2) • Multiple MATLABs needed • Scatter/Gather at the beginning and end • No lateral communication
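The pattern on this slide can be sketched in plain C without any parallel runtime: the input is scattered into contiguous chunks, each chunk is processed without looking at the others, and the partial results are gathered at the end. The names `chunk_bounds`, `local_sum_squares`, and `scatter_compute_gather` are made up for this illustration and do not come from the talk, and the serial loop over workers only stands in for separate MATLAB processes.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [begin, end) of the items handed to worker w.
   The first (n % nworkers) workers receive one extra item. */
void chunk_bounds(size_t n, size_t nworkers, size_t w,
                  size_t *begin, size_t *end)
{
    assert(nworkers > 0 && w < nworkers);
    size_t base = n / nworkers;
    size_t extra = n % nworkers;
    *begin = w * base + (w < extra ? w : extra);
    *end = *begin + base + (w < extra ? 1 : 0);
}

/* Work done by one worker: it needs only its own chunk. */
double local_sum_squares(const double *x, size_t begin, size_t end)
{
    double s = 0.0;
    for (size_t i = begin; i < end; i++)
        s += x[i] * x[i];
    return s;
}

/* Scatter, compute independently, gather. The loop over w stands in
   for nworkers processes that would run at the same time. */
double scatter_compute_gather(const double *x, size_t n, size_t nworkers)
{
    double total = 0.0;
    for (size_t w = 0; w < nworkers; w++) {
        size_t begin, end;
        chunk_bounds(n, nworkers, w, &begin, &end);
        total += local_sum_squares(x, begin, end);  /* no lateral communication */
    }
    return total;
}
```

Calling `scatter_compute_gather(x, n, 4)` on an array `x` of length `n` gives the sum of squares computed in four independent pieces. Only the first and last steps touch more than one chunk, which is what makes this class of problem easy.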

  28. Message Passing

  29. Message Passing (2) • Multiple MATLABs required • MATLABs talk to each other in a general MPI fashion • Superset of embarrassingly parallel

  30. Backend support • ScaLAPACK • FFTW • PBLAS • PARPACK • …

  31. Backend support • Requires 1 MATLAB • MATLAB is used as front end to a parallel computation engine

  32. Compiler

  33. Compiler (2) • Requires zero to one MATLAB • Compiles MATLAB scripts into native code and links them with parallel numerical libraries

  34. MATLAB*P • Project started by P. Husbands, who is now at NERSC • Goal: To provide a user-friendly tool for parallel computing • Uses MATLAB as a front end to a parallel linear algebra server • Encompasses the backend support, message passing and embarrassingly parallel approaches

  35. MATLAB*P (2) • Server leverages popular parallel numerical libraries: • ScaLAPACK • FFTW • PARPACK • …

  36. MATLAB*P
      A = rand(4000*p, 4000*p);
      x = randn(4000*p, 1);
      y = zeros(size(x));
      while norm(x-y) / norm(x) > 1e-11
          y = x;
          x = A*x;
          x = x / norm(x);
      end;

  37. Structure of the System (diagram components: Package Manager, Client Manager, Proxy, Client, Matrix Manager, Server Manager, MATLAB)

  38. Structure of the system (2) • Client (C++): attaches to MATLAB through the MEX interface; relays commands to the server and processes the returns • Server (C++/MPI): manipulates and processes data (matrices); performs the actual parallel computation by calling the appropriate library routines

  39. Functionalities provided • PBLAS, ScaLAPACK: solve, eig, svd, chol, matmul, +, -, … • FFTW: 1D and 2D FFT • PARPACK: eigs, svds • Support provided for the s, d, c, and z data types (single, double, complex, double complex)

  40. Focus • Requires minimal learning on the user’s part • Reuses existing scripts • Mimics MATLAB behaviour • Data stays on the backend until explicitly retrieved by the user • Extendable backend

  41. Comparison

  42. What is *p? • MATLAB*P creates a new MATLAB class called layoot • By providing overloaded functions for this new class, parallelism is achieved transparently • *p is layoot(1)
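One way to picture the mechanism, assuming that multiplying a size by `*p` leaves the number unchanged and only tags it: overloaded functions then look at the tag to decide whether to take the parallel path. C has no operator overloading, so the sketch below models the tag with an explicit struct. `Dim`, `plain`, `times_p`, and `is_parallel_call` are hypothetical names and are not part of MATLAB*P.

```c
#include <assert.h>
#include <stdbool.h>

/* A matrix dimension plus a flag recording whether it was written as n*p. */
typedef struct {
    int size;
    bool tagged;
} Dim;

/* Models a plain dimension such as 1024. */
Dim plain(int n)
{
    assert(n >= 0);
    return (Dim){ n, false };
}

/* Models 1024*p: the size is unchanged, only the tag is set. */
Dim times_p(int n)
{
    assert(n >= 0);
    return (Dim){ n, true };
}

/* An overloaded function would take the parallel path as soon as
   any of its dimension arguments carries the tag. */
bool is_parallel_call(Dim rows, Dim cols)
{
    return rows.tagged || cols.tagged;
}
```

In this model `is_parallel_call(times_p(1024), times_p(1024))` is true while `is_parallel_call(plain(1024), plain(1024))` is false, so an existing script changes behaviour only where `*p` was added.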

  43. Example
      A = randn(1024*p,1024*p);
      E = eig(A);
      e = pp2matlab(E);
      plot(e,'*');

  44. Example 2
      H = hilb(n) % H = hilb(8000*p)
      J = 1:n;
      J = J(ones(n,1),:);
      I = J';
      E = ones(n,n);
      H = E./(I+J-1);
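The element-wise expression on this slide computes H(i,j) = 1/(i+j-1) for every entry at once, with no loop in the script. For comparison, here is the same matrix written with explicit loops in C. The function name `hilbert` and the row-major storage are choices made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Fill a row-major n-by-n array with the Hilbert matrix.
   MATLAB's 1-based H(i,j) = 1/(i+j-1) becomes 1/(i+j+1)
   for the 0-based indices used here. */
void hilbert(double *h, size_t n)
{
    assert(h != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            h[i * n + j] = 1.0 / (double)(i + j + 1);
}
```

Every entry depends only on its own indices, so the work can be split across processes without any communication.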

  45. Built-in MATLAB functions • hilb is a MATLAB function that gets parallelized automatically • Another example of a MATLAB function that works is cgs • We do not know how many of them work

  46. Where is the data? • One often-asked question in parallel computing is: where are the data residing? • MATLAB*P supports 3 data distributions: • A = randn(m*p,n) – row distribution • B = randn(m,n*p) – column distribution • C = randn(m*p,n*p) – 2D block cyclic distribution
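To make "where is the data" concrete, the following sketch computes which process owns a given row, column, or element. It rests on assumptions the slide does not state: row and column distributions are taken to be contiguous blocks of ceil(n / nprocs) items (the same split the MPI example uses for `num_components`), and the 2D case follows the usual ScaLAPACK-style block-cyclic rule with block sizes `mb` by `nb` on a `prow` by `pcol` process grid. All function names are illustrative.

```c
#include <assert.h>

/* Owner of index i when n items are split into contiguous blocks
   of ceil(n / nprocs) items (row or column distribution). */
int block_owner(int i, int n, int nprocs)
{
    assert(nprocs > 0 && i >= 0 && i < n);
    int per_proc = (n + nprocs - 1) / nprocs;
    return i / per_proc;
}

/* Owner coordinate of index i along one axis of a block-cyclic
   layout: blocks of nb indices are dealt round-robin to nprocs. */
int cyclic_owner(int i, int nb, int nprocs)
{
    assert(nb > 0 && nprocs > 0 && i >= 0);
    return (i / nb) % nprocs;
}

/* Rank of the process holding element (i, j) on a prow-by-pcol
   grid, numbering the grid row by row. */
int block_cyclic_rank(int i, int j, int mb, int nb, int prow, int pcol)
{
    return cyclic_owner(i, mb, prow) * pcol + cyclic_owner(j, nb, pcol);
}
```

With 2-by-2 blocks on a 2-by-2 grid, element (2, 3) lands on rank 3 and element (4, 5) wraps back around to rank 0, which is how the cyclic layout spreads work evenly during factorizations.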

  47. Flow of a MATLAB*P call • User types A = randn(256*p) in MATLAB • The overloaded randn in the layoot class is called • The overloaded method calls the MATLAB*P client with the arguments 'ppbase_addDense', 256, 3 (3 selects the cyclic distribution) • The client relays the command to the server
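The relay step can be pictured as a small request object that the client fills in and the server dispatches on by name. This is only a schematic: the talk does not show the real wire format or what `ppbase_addDense` does internally, so the `Request` struct, the handler body, and the return values below are all hypothetical.

```c
#include <assert.h>
#include <string.h>

/* A command as the client might hand it to the server:
   a name plus two integer arguments (hypothetical format). */
typedef struct {
    const char *name;
    int args[2];
} Request;

/* Stand-in handler: pretends to create an n-by-n matrix with the
   given distribution code and returns its element count. */
long add_dense(int n, int dist)
{
    (void)dist;  /* a real server would choose a data layout here */
    return (long)n * n;
}

/* Server side: pick the handler by command name.
   Returns -1 for commands this sketch does not know. */
long server_dispatch(const Request *req)
{
    assert(req != NULL && req->name != NULL);
    if (strcmp(req->name, "ppbase_addDense") == 0)
        return add_dense(req->args[0], req->args[1]);
    return -1;
}

/* Client side: package the arguments and relay them. */
long client_call(const char *name, int a0, int a1)
{
    Request req = { name, { a0, a1 } };
    return server_dispatch(&req);
}
```

Under these assumptions the call from the slide corresponds to `client_call("ppbase_addDense", 256, 3)`. Keeping the client this thin is what lets the data stay on the server until the user asks for it.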