Two Example Parallel Programs using MPI

UNC-Wilmington, C. Ferner, 2007 Mar 209, 2007


Matrix Multiplication

  • Matrices are multiplied together using the dot product of each row of the first matrix with each column of the second matrix

[Figure: matrices illustrating C = A * B]


Matrix Multiplication

  • For each value at row i and column j, the result is the dot product of the ith row from A and the jth column from B:

    $$C_{ij} = \sum_{k=0}^{N-1} A_{ik} B_{kj}$$
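The dot product for a single entry can be sketched in C as follows. This is only an illustration: the function name `product_entry` and the flat row-major storage of the matrices are assumptions, not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Entry (i, j) of C = A * B for n x n matrices stored row-major in flat arrays. */
double product_entry(const double *A, const double *B, int n, int i, int j)
{
    assert(n > 0 && i >= 0 && i < n && j >= 0 && j < n);
    double sum = 0.0;
    for (int k = 0; k < n; k++)   /* walk row i of A and column j of B together */
        sum += A[(size_t)i * n + k] * B[(size_t)k * n + j];
    return sum;
}
```

For A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], the entry at row 0, column 1 is 1*6 + 2*8 = 22.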


Matrix Multiplication

  • For each row i from [0..N-1] and each column j from [0..N-1] the value for position [i][j] of the resulting matrix is computed:

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            C[i][j] = 0;
            for (k = 0; k < N; k++)   /* the inner loop advances k, not j */
                C[i][j] += A[i][k] * B[k][j];
        }


Matrix Multiplication

  • This can be implemented on multiple processors where each processor is responsible for computing a different set of rows in the final matrix

  • As long as each processor has its own rows of A and all of B, it can compute its rows of C without communication
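A minimal sketch of this row-wise split, in plain C without any MPI calls: the hypothetical helper `multiply_rows` computes only the rows in [row_begin, row_end) of the result. The function name and the flat row-major layout are assumptions made for illustration. Calling it once per block of rows, in any order, fills in the whole product, which is why no communication is needed during the computation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Compute rows [row_begin, row_end) of C = A * B for n x n row-major matrices.
 * Only those rows of A are read, but every entry of B is needed.
 */
void multiply_rows(const double *A, const double *B, double *C,
                   int n, int row_begin, int row_end)
{
    assert(0 <= row_begin && row_begin <= row_end && row_end <= n);
    for (int i = row_begin; i < row_end; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[(size_t)i * n + k] * B[(size_t)k * n + j];
            C[(size_t)i * n + j] = sum;   /* rows outside the range are untouched */
        }
}
```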



Matrix Multiplication

  • If there are N rows and P processors, then each processor is responsible for N/P rows.

  • Each processor is responsible for the rows from my_rank * N/P up to (but excluding) (my_rank + 1) * N/P

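[Figure: rows of the matrix split into blocks; the block for my_rank = 0 starts at row 0 * N/P, my_rank = 1 at 1 * N/P, my_rank = 2 at 2 * N/P, and the last block ends at 3 * N/P]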
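The even split described above can be sketched as a small helper. The names `row_range` and `even_rows` are hypothetical, and the sketch assumes that P divides N exactly; the assertion makes that assumption explicit.

```c
#include <assert.h>

/* Half-open row range [begin, end) owned by one rank. */
struct row_range { int begin; int end; };

/* Assumes P divides N evenly, so every rank owns exactly N / P rows. */
struct row_range even_rows(int my_rank, int N, int P)
{
    assert(P > 0 && N % P == 0 && my_rank >= 0 && my_rank < P);
    int rows_per_rank = N / P;
    struct row_range r;
    r.begin = my_rank * rows_per_rank;
    r.end = (my_rank + 1) * rows_per_rank;   /* excluded upper bound */
    return r;
}
```

With N = 12 and P = 3, rank 0 owns rows [0, 4), rank 1 owns [4, 8), and rank 2 owns [8, 12).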


Matrix Multiplication

  • This is coded as:

    for (i = 0 + my_rank * N/P;
         i < 0 + (my_rank + 1) * N/P;
         i++)
        for (j = 0; j < N; j++) {
            C[i][j] = 0;
            for (k = 0; k < N; k++)   /* the inner loop advances k, not j */
                C[i][j] += A[i][k] * B[k][j];
        }


Matrix Multiplication

  • One Problem: What if N/P is not an integer?

  • With N/P rounded up to a whole number of rows per processor, the last processor has fewer rows than the others.

  • Unless the upper loop bound is clamped to N, the last processor (or the last few processors) would compute beyond the last row of the matrix


Matrix Multiplication

  • This is dealt with as follows:

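    /* min(x, y) is not part of standard C and must be defined by the program */
    blksz = (int) ceil((float) N / P);
    for (i = 0 + my_rank * blksz;
         i < min(N, 0 + (my_rank + 1) * blksz);
         i++)
        for (j = 0; j < N; j++) {
            C[i][j] = 0;
            for (k = 0; k < N; k++)   /* the inner loop advances k, not j */
                C[i][j] += A[i][k] * B[k][j];
        }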


Matrix Multiplication

  • For example suppose N=13 and P=4. Then:

    blksz = ceiling(13/4) = 4
    Processor 0 : i = [0*4..1*4) = [0..4)
    Processor 1 : i = [1*4..2*4) = [4..8)
    Processor 2 : i = [2*4..3*4) = [8..12)
    Processor 3 : i = [3*4..min(13,4*4)) = [12..13)
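The worked example above can be checked with a short sketch. The names `block_rows`, `rows_covered`, and `min_int` are hypothetical, and the block size is computed with integer arithmetic instead of `ceil`. One deliberate difference from the slides: the sketch clamps the lower bound as well as the upper bound, so a rank whose block lies entirely past row N gets an empty range.

```c
#include <assert.h>

/* Half-open row range [begin, end) owned by one rank. */
struct row_range { int begin; int end; };

static int min_int(int x, int y) { return x < y ? x : y; }

/*
 * Block partition of N rows over P ranks with block size ceil(N / P).
 * Both bounds are clamped to N, so a trailing rank may own fewer rows or none.
 */
struct row_range block_rows(int my_rank, int N, int P)
{
    assert(N >= 0 && P > 0 && my_rank >= 0 && my_rank < P);
    int blksz = (N + P - 1) / P;   /* integer form of ceil(N / P) */
    struct row_range r;
    r.begin = min_int(N, my_rank * blksz);
    r.end = min_int(N, (my_rank + 1) * blksz);
    return r;
}

/* Number of rows covered when the ranges of all P ranks are added up. */
int rows_covered(int N, int P)
{
    int total = 0;
    for (int rank = 0; rank < P; rank++) {
        struct row_range r = block_rows(rank, N, P);
        total += r.end - r.begin;
    }
    return total;
}
```

For N = 13 and P = 4 this reproduces the ranges listed above, with rank 3 owning only row 12, and the ranges together cover all 13 rows.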


Matrix Multiplication

  • The assignment deals with the parallel execution of matrix multiplication


Numerical Integration

  • Suppose we have a non-negative, continuous function f and we want to compute the integral of f from a to b:

[Figure: graph with axes x and y; a and b are marked on the x-axis]


Numerical Integration

  • We can approximate the integral by dividing the area into trapezoids and summing the area of the trapezoids

[Figure: graph with axes x and y; a and b are marked on the x-axis]


Numerical Integration

  • If we use n equal-width partitions, then each partition has width h = (b - a)/n

[Figure: graph with axes x and y; a and b are marked on the x-axis]
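As a small sketch of the equal-width partitioning, the helpers below compute the width h = (b - a)/n and the partition points a + i*h. The function names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>

/* Width of each of n equal partitions of [a, b]. */
double partition_width(double a, double b, int n)
{
    assert(n > 0);
    return (b - a) / n;
}

/* Point a + i * h for 0 <= i <= n; i == 0 gives a, i == n gives b up to rounding. */
double partition_point(double a, double b, int n, int i)
{
    assert(i >= 0 && i <= n);
    return a + i * partition_width(a, b, n);
}
```

For a = 1, b = 3, and n = 4, the width is 0.5 and the partition points are 1, 1.5, 2, 2.5, and 3.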


Numerical Integration

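  • The area of the ith trapezoid, which spans x_{i-1} to x_i where x_i = a + ih, is:

    $$\text{area}_i = \frac{h}{2}\left[f(x_{i-1}) + f(x_i)\right]$$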

[Figure: one trapezoid of width h; axes x and y, with a and b marked on the x-axis]
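The area of a single trapezoid is its width times the average of the function values at its two edges. A minimal sketch, assuming the integrand is passed as a function pointer; `trapezoid_area` and the sample integrand `identity` are hypothetical names used only here.

```c
#include <assert.h>
#include <stddef.h>

/* Area of the trapezoid under f between x_left and x_left + h. */
double trapezoid_area(double (*f)(double), double x_left, double h)
{
    assert(f != NULL && h > 0.0);
    return h * (f(x_left) + f(x_left + h)) / 2.0;
}

/* Sample integrand for illustration only: f(x) = x. */
double identity(double x) { return x; }
```

For f(x) = x, the trapezoid from 1 to 2 has area 1 * (1 + 2) / 2 = 1.5.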


Numerical Integration

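  • The area for all n trapezoids, with x_i = a + ih, is:

    $$\sum_{i=1}^{n} \frac{h}{2}\left[f(x_{i-1}) + f(x_i)\right] = h\left[\frac{f(a) + f(b)}{2} + \sum_{i=1}^{n-1} f(x_i)\right]$$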
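Summing the trapezoids gives the composite trapezoidal rule. The sketch below is one way to write it as a reusable function; the name `trapezoid_rule` and the sample integrand `line` are assumptions for illustration. It evaluates each interior point as a + i*h rather than accumulating x += h.

```c
#include <assert.h>
#include <stddef.h>

/* Composite trapezoidal rule for the integral of f over [a, b] with n trapezoids. */
double trapezoid_rule(double (*f)(double), double a, double b, int n)
{
    assert(f != NULL && n > 0);
    double h = (b - a) / n;
    double sum = (f(a) + f(b)) / 2.0;   /* each endpoint belongs to one trapezoid */
    for (int i = 1; i < n; i++)
        sum += f(a + i * h);            /* each interior point is shared by two */
    return sum * h;
}

/* Sample integrand for illustration only: f(x) = 2x + 1. */
double line(double x) { return 2.0 * x + 1.0; }
```

The rule is exact for straight lines: for f(x) = 2x + 1 on [0, 2] it returns 6, which is the true area.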


Numerical Integration Sequential program

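#include <stdio.h>
#include <stdlib.h>     /* atof, atoi */
#include <sys/time.h>   /* gettimeofday, struct timeval */

double f(double x);

int main(int argc, char *argv[])
{
    int N, i;
    double a, b, h, x, integral;
    char *usage = "Usage: %s a b N \n";
    double elapsed_time;
    struct timeval tv1, tv2;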


Numerical Integration Sequential program

    if (argc < 4) {
        fprintf (stderr, usage, argv[0]);
        return -1;
    }
    a = atof(argv[1]);
    b = atof(argv[2]);
    N = atoi(argv[3]);


Numerical Integration Sequential program

    gettimeofday(&tv1, NULL);
    h = (b - a) / N;
    integral = (f(a) + f(b))/2.0;
    x = a + h;
    for (i = 1; i < N; i++) {
        integral += f(x);
        x += h;
    }
    integral = integral*h;
    gettimeofday(&tv2, NULL);


Numerical Integration Sequential program

    elapsed_time = (tv2.tv_sec - tv1.tv_sec) +
                   ((tv2.tv_usec - tv1.tv_usec) / 1000000.0);
    printf ("elapsed_time=\t%lf seconds\n",
            elapsed_time);
    printf ("With N = %d trapezoids, \n", N);
    /* the format has three %f conversions, so pass only a, b, and integral */
    printf ("estimate of integral from %f to %f = %f\n", a, b, integral);
    return 0;
}


Numerical Integration Sequential program

double f(double x)
{
    return 6*x*x - 5*x;
}


Numerical Integration Sequential program

$ ./integ 1 3 10000
a = 1.000000, b = 3.000000, N = 10000
elapsed_time= 0.000567 seconds
With N = 10000 trapezoids,
estimate of integral from 1.000000 to 3.000000 = 32.000000
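As a check on the sample run, the exact value can be worked out by hand for the integrand f(x) = 6x^2 - 5x defined in the program. It agrees with the printed estimate to the six decimal places shown:

```latex
\int_{1}^{3} \left(6x^2 - 5x\right)\,dx
  = \left[\, 2x^3 - \tfrac{5}{2}x^2 \,\right]_{1}^{3}
  = \left(54 - 22.5\right) - \left(2 - 2.5\right)
  = 31.5 + 0.5
  = 32
```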


Numerical Integration Parallel program

  • Each processor will be responsible for computing the area of a subset of trapezoids

[Figure: graph with axes x and y; the region between a and b is split into three parts labeled P0, P1, and P2]
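The decomposition can be tried out without MPI by looping over the ranks in a single process. The sketch below is an illustration under stated assumptions, not the program from the slides: `local_integral`, `simulated_total`, `min_int`, and `slide_f` are hypothetical names, the loop over ranks stands in for the parallel processes, and the final sum stands in for a reduction with MPI_SUM. Each rank's share is expressed as an integer range of trapezoids, so a short or empty last block is handled explicitly.

```c
#include <assert.h>
#include <stddef.h>

static int min_int(int x, int y) { return x < y ? x : y; }

/* Integrand used in the slides: f(x) = 6x^2 - 5x. */
double slide_f(double x) { return 6 * x * x - 5 * x; }

/*
 * Partial trapezoid sum owned by one rank: trapezoids [first, last) out of N
 * on [a, b], using blocks of ceil(N / P) trapezoids per rank.
 */
double local_integral(double (*fn)(double), double a, double b,
                      int N, int P, int rank)
{
    assert(fn != NULL && N > 0 && P > 0 && rank >= 0 && rank < P);
    double h = (b - a) / N;
    int blksz = (N + P - 1) / P;               /* integer form of ceil(N / P) */
    int first = min_int(N, rank * blksz);
    int last = min_int(N, (rank + 1) * blksz);
    if (first >= last)
        return 0.0;                            /* this rank has no trapezoids */
    double sum = (fn(a + first * h) + fn(a + last * h)) / 2.0;
    for (int i = first + 1; i < last; i++)
        sum += fn(a + i * h);
    return sum * h;
}

/* Stand-in for a sum reduction: add the partial results of all ranks. */
double simulated_total(double (*fn)(double), double a, double b, int N, int P)
{
    double total = 0.0;
    for (int rank = 0; rank < P; rank++)
        total += local_integral(fn, a, b, N, P, rank);
    return total;
}
```

With a = 1, b = 3, N = 10000, and P = 4, the four partial sums add up to approximately 32, the same estimate as the sequential run.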


Numerical Integration Parallel program

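#include <stdio.h>
#include <stdlib.h>     /* atof, atoi */
#include <math.h>       /* ceil */
#include <sys/time.h>   /* struct timeval */
#include <mpi.h>

#define min(x, y) ((x) < (y) ? (x) : (y))   /* min is not part of standard C */

double f (double x);

int main(int argc, char *argv[])
{
    int N, P, mypid, blksz, i;
    double a, b, h, x, integral, localA, localB,
           total;
    char *usage = "Usage: %s a b N \n";
    double elapsed_time;
    struct timeval tv1, tv2;
    int abort = 0;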


Numerical Integration Parallel program

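    /* MPI setup, not shown in the transcript: start MPI, then obtain this
       process's rank (mypid) and the number of processes (P) */
    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &mypid);
    MPI_Comm_size (MPI_COMM_WORLD, &P);

    a = atof(argv[1]);
    b = atof(argv[2]);
    N = atoi(argv[3]);
    /* make sure every process uses the values held by process 0 */
    MPI_Bcast (&a, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast (&b, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast (&N, 1, MPI_INT, 0, MPI_COMM_WORLD);
    h = (b - a) / N;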


Numerical Integration Parallel program

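    blksz = (int) ceil ( ((float) N) / P);
    /* This process owns trapezoids [first, last). Counting with integers
       avoids the floating-point test x <= localB, which could add
       f(localB) a second time when the last block is short. */
    int first = min(N, mypid * blksz);
    int last  = min(N, (mypid + 1) * blksz);
    localA = a + first * h;
    localB = a + last * h;
    integral = 0.0;
    if (first < last)   /* an empty block contributes nothing */
        integral = (f(localA) + f(localB))/2.0;
    x = localA + h;
    for (i = first + 1; i < last; i++) {
        integral += f(x);
        x += h;
    }
    integral = integral*h;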


Numerical Integration Parallel program

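    /* sum the partial integrals of all processes into total on process 0 */
    MPI_Reduce (&integral, &total, 1, MPI_DOUBLE,
                MPI_SUM, 0, MPI_COMM_WORLD);
    if (mypid == 0)
        printf ("integral = %f\n", total);
    MPI_Finalize ();
    return 0;
}

/* the definition must match the prototype double f (double x) */
double f(double x)
{
    return 6*x*x - 5*x;
}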


Numerical Integration Parallel program

$ mpicc mpiInteg.c -o mpiInteg -lm
$ mpirun -nolocal -np 4 mpiInteg 1 3 10000
elapsed_time= 0.001416 seconds
integral = 32.000000