- By
**tracy** - Follow User

- 88 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about ' Two Example Parallel Programs using MPI' - tracy

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

Two Example Parallel Programs using MPI

UNC-Wilmington, C. Ferner, 2007 Mar 209, 2007

Matrix Multiplication

- Matrices are multiplied together using the dot product of each row of the first matrix with each column of the second matrix

B

A

C

=

*

Matrix Multiplication

- For each value at row i and column j, the result is the dot product of the ith row from A and the jth column from B:

Matrix Multiplication

- For each row i from [0..N-1] and each column j from [0..N-1] the value for position [i][j] of the resulting matrix is computed:

for (i = 0; i < N; i++)

for (j = 0; j < N; j++) {

C[i][j] = 0;

for (k = 0; k < N; j++)

C[i][j] += A[i][k] * B[k][j];

}

Matrix Multiplication

- This can be implemented on multiple processors where each processor is responsible for computing a different set of rows in the final matrix
- As long as each processor has the parts of the A and B matrix, they can do this without communication

C

Matrix Multiplication

- If there are N rows and P processors, then each processor is responsible for N/P rows.
- Each processor is responsible for the rows from my_rank * N/P up to (but excluding) (my_rank + 1) * N/P

0 * N/P

{

my_rank = 0

1 * N/P

{

my_rank = 1

2 * N/P

{

my_rank = 2

3 * N/P

Matrix Multiplication

- This is coded as:

for (i = 0 + my_rank * N/P;

i < 0 + (my_rank + 1) * N/P;

i++)

for (j = 0; j < N; j++) {

C[i][j] = 0;

for (k = 0; k < N; j++)

C[i][j] += A[i][k] * B[k][j];

}

Matrix Multiplication

- One Problem: What if N/P is not an integer?
- The last processor has fewer than N/P rows for which it is responsible.
- The code on the previous slide will cause the last processors (or last couple of processors) to compute beyond the last row of the matrix

Matrix Multiplication

- This is dealt with as follows:

blksz = (int) ceil((float) N / P);

for (i = 0 + my_rank * blksz;

i < min(N, 0 + (my_rank + 1) * blksz);

i++)

for (j = 0; j < N; j++) {

C[i][j] = 0;

for (k = 0; k < N; j++)

C[i][j] += A[i][k] * B[k][j];

}

Matrix Multiplication

- For example suppose N=13 and P=4. Then:

blksz = ceiling(13/4) = 4

Processor 0 : i = [0*4..1*4) = [0..4)

Processor 1 : i = [1*4..2*4) = [4..8)

Processor 2 : i = [2*4..3*4) = [8..12)

Processor 3 : i = [3*4..min(13,4*4))=[12..13)

Matrix Multiplication

- The assignment deals with the parallel execution of matrix multiplication

Numerical Integration

- Suppose we have a non-negative, continuous function f and we want to compute the integral of f from a to b:

y

x

a

b

Numerical Integration

- We can approximate the integral by dividing the area into trapezoids and summing the area of the trapezoids

y

x

a

b

Numerical Integration

- The area for all trapezoids is:

Numerical Integration Sequential program

double f(double x);

main (int argc, char *argv[])

{

int N, i;

double a, b, h, x, integral;

char *usage = "Usage: %s a b N \n";

double elapsed_time;

struct timeval tv1, tv2;

Numerical Integration Sequential program

if (argc < 4) {

fprintf (stderr, usage, argv[0]);

return -1;

}

a = atof(argv[1]);

b = atof(argv[2]);

N = atoi(argv[3]);

Numerical Integration Sequential program

gettimeofday(&tv1, NULL);

h = (b - a) / N;

integral = (f(a) + f(b))/2.0;

x = a + h;

for (i = 1; i < N; i++) {

integral += f(x);

x += h;

}

integral = integral*h;

gettimeofday(&tv2, NULL);

Numerical Integration Sequential program

elapsed_time = (tv2.tv_sec - tv1.tv_sec) +

((tv2.tv_usec - tv1.tv_usec) / 1000000.0);

printf ("elapsed_time=\t%lf seconds\n",

elapsed_time);

printf ("With N = %d trapezoids, \n", N);

printf ("estimate of integral from %f to %f = %f\n", N, a, b, integral);

}

Numerical Integration Sequential program

$ ./integ 1 3 10000

a = 1.000000, b = 3.000000, N = 10000

elapsed_time= 0.000567 seconds

With N = 10000 trapezoids,

estimate of integral from 1.000000 to 3.000000 = 32.000000

Numerical Integration Parallel program

- Each processor will be responsible for computing the area of a subset of trapezoids

y

{

{

{

x

a

b

P2

P0

P1

Numerical Integration Parallel program

double f (double x);

int main(int argc, char *argv[])

{

int N, P, mypid, blksz, i;

double a, b, h, x, integral, localA, localB,

total;

char *usage = "Usage: %s a b N \n";

double elapsed_time;

struct timeval tv1, tv2;

int abort = 0;

Numerical Integration Parallel program

a = atof(argv[1]);

b = atof(argv[2]);

N = atoi(argv[3]);

MPI_Bcast (&a, 1, MPI_DOUBLE, 0,

MPI_COMM_WORLD);

MPI_Bcast (&b, 1, MPI_DOUBLE, 0,

MPI_COMM_WORLD);

MPI_Bcast (&N, 1, MPI_INT, 0, MPI_COMM_WORLD);

h = (b - a) / N;

Numerical Integration Parallel program

blksz = (int) ceil ( ((float) N) / P);

localA = a + mypid * blksz * h;

localB = min(b, a + (mypid + 1) * blksz * h);

integral = (f(localA) + f(localB))/2.0;

x = localA + h;

for (i = 1; i < blksz && x <= localB; i++) {

integral += f(x);

x += h;

}

integral = integral*h;

Numerical Integration Parallel program

MPI_Reduce (&integral, &total, 1, MPI_DOUBLE,

MPI_SUM, 0, MPI_COMM_WORLD);

if (mypid == 0)

printf ("integral = %f\n", total);

}

float f(float x)

{

return 6*x*x - 5*x;

}

Numerical Integration Parallel program

$ mpicc mpiInteg.c -o mpiInteg -lm

$ mpirun -nolocal -np 4 mpiInteg 1 3 10000

elapsed_time= 0.001416 seconds

integral = 32.000000

Download Presentation

Connecting to Server..