
Investigating Parallel Processing Using the E1350 IBM eServer Cluster



Presentation Transcript


  1. Investigating Parallel Processing Using the E1350 IBM eServer Cluster • Ayaz ul Hassan Khan (g201002860)

  2. Objectives
    • Explore the architecture of the E1350 IBM eServer Cluster
    • Parallel programming with:
      • OpenMP
      • MPI
      • MPI+OpenMP
    • Analyze the effects of the above programming models on speedup
    • Identify overheads and optimize them as much as possible

  3. IBM E1350 Cluster

  4. Cluster System
    • The cluster is unique in its dual-boot capability, running either Microsoft Windows HPC Server 2008 or Red Hat Enterprise Linux 5.
    • The cluster has 3 master nodes: one for Red Hat Linux, one for Windows HPC Server 2008, and one for cluster management.
    • The cluster has 128 compute nodes.
    • Each compute node is an IBM x3550 with two 2.0 GHz quad-core Intel Xeon E5405 processors.
    • The total number of cores in the cluster is 1024 (128 nodes × 2 processors × 4 cores).
    • Each master node has 1 TB of hard disk space and 8 GB of RAM.
    • Each compute node has 500 GB of hard disk space and 4 GB of RAM.
    • The interconnect is 10GBASE-SR (10 Gigabit Ethernet).

  5. Experimental Environment
    • Nodes: hpc081, hpc082, hpc083, hpc084
    • Compilers:
      • icc: for sequential and OpenMP programs
      • mpiicc: for MPI and MPI+OpenMP programs
    • Profiling tools:
      • ompP: for OpenMP profiling
      • mpiP: for MPI profiling

  6. Applications Used/Implemented
    • Jacobi Iterative Method
      • Max speedup = 7.1 (OpenMP, Threads = 8)
      • Max speedup = 3.7 (MPI, Nodes = 4)
      • Max speedup = 9.3 (MPI+OpenMP, Nodes = 2, Threads = 8)
    • Alternating Direction Integration (ADI)
      • Max speedup = 5.0 (OpenMP, Threads = 8)
      • Max speedup = 0.8 (MPI, Nodes = 1)
      • Max speedup = 1.7 (MPI+OpenMP, Nodes = 1, Threads = 8)
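
    (For reference: the speedup figures above are presumably computed in the usual way, speedup = T_sequential / T_parallel on the same problem size; e.g. an OpenMP speedup of 7.1 on 8 threads corresponds to a parallel efficiency of roughly 7.1 / 8 ≈ 0.89.)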

  7. Jacobi Iterative Method • Solving systems of linear equations
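
    For context (the standard Jacobi formulation, not spelled out on the slide): starting from an initial guess, each iteration recomputes every unknown from the values of the previous iterate,

        x_i^(k+1) = ( b_i - Σ_{j ≠ i} a_ij · x_j^(k) ) / a_ii

    which is what the loops on the following slides implement, with new_x holding the (k+1)-th iterate.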

  8. Jacobi Iterative Method • Sequential Code

    for (i = 0; i < N; i++) {          /* initial guess: x = b */
        x[i] = b[i];
    }
    /* one Jacobi sweep (in the timed runs this is repeated MAX_ITER times, as in the OpenMP version) */
    for (i = 0; i < N; i++) {
        sum = 0.0;
        for (j = 0; j < N; j++) {
            if (i != j) {
                sum += a[i][j] * x[j];
                new_x[i] = (b[i] - sum) / a[i][i];
            }
        }
    }
    for (i = 0; i < N; i++)            /* adopt the new iterate */
        x[i] = new_x[i];
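
    The code above runs a fixed number of sweeps; no convergence test is shown on the slides. A minimal sketch of one, assuming a caller-chosen tolerance tol (the function and variable names here are illustrative, not taken from the slides):

    #include <math.h>

    /* Returns 1 when the largest per-element change between two sweeps
       falls below tol, 0 otherwise. */
    int has_converged(const double *x, const double *new_x, int n, double tol)
    {
        double max_diff = 0.0;
        for (int i = 0; i < n; i++) {
            double d = fabs(new_x[i] - x[i]);
            if (d > max_diff)
                max_diff = d;
        }
        return max_diff < tol;
    }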

  9. Jacobi Iterative Method • OpenMP Code

    #pragma omp parallel private(k, i, j, sum)
    {
        for (k = 0; k < MAX_ITER; k++) {
            #pragma omp for
            for (i = 0; i < N; i++) {
                sum = 0.0;
                for (j = 0; j < N; j++) {
                    if (i != j) {
                        sum += a[i][j] * x[j];
                        new_x[i] = (b[i] - sum) / a[i][i];
                    }
                }
            }
            #pragma omp for
            for (i = 0; i < N; i++)
                x[i] = new_x[i];
        }
    }

  10. Jacobi Iterative Method • OpenMP Performance

  11. Jacobi Iterative Method • ompP results (barrier)

    R00002 jacobi_openmp.c (46-55) LOOP
    TID    execT   execC   bodyT   exitBarT   taskT
      0     0.09     100    0.07       0.01    0.00
      1     0.08     100    0.07       0.00    0.00
      2     0.08     100    0.07       0.01    0.00
      3     0.08     100    0.07       0.01    0.00
      4     0.08     100    0.07       0.01    0.00
      5     0.08     100    0.07       0.01    0.00
      6     0.08     100    0.07       0.01    0.00
      7     0.08     100    0.07       0.01    0.00
    SUM     0.65     800    0.59       0.06    0.00

    R00003 jacobi_openmp.c (56-58) LOOP
    TID    execT   execC   bodyT   exitBarT   taskT
      0     0.00     100    0.00       0.00    0.00
      1     0.00     100    0.00       0.00    0.00
      2     0.00     100    0.00       0.00    0.00
      3     0.00     100    0.00       0.00    0.00
      4     0.00     100    0.00       0.00    0.00
      5     0.00     100    0.00       0.00    0.00
      6     0.00     100    0.00       0.00    0.00
      7     0.00     100    0.00       0.00    0.00
    SUM     0.01     800    0.00       0.01    0.00

  12. Jacobi Iterative Method • ompP results (nowait)

    R00002 jacobi_openmp.c (43-52) LOOP
    TID    execT   execC   bodyT   exitBarT   taskT
      0     0.08     100    0.08       0.00    0.00
      1     0.08     100    0.08       0.00    0.00
      2     0.08     100    0.08       0.00    0.00
      3     0.08     100    0.08       0.00    0.00
      4     0.08     100    0.08       0.00    0.00
      5     0.08     100    0.08       0.00    0.00
      6     0.08     100    0.08       0.00    0.00
      7     0.08     100    0.08       0.00    0.00
    SUM     0.63     800    0.63       0.00    0.00

    R00003 jacobi_openmp.c (53-55) LOOP
    TID    execT   execC   bodyT   exitBarT   taskT
      0     0.00     100    0.00       0.00    0.00
      1     0.00     100    0.00       0.00    0.00
      2     0.00     100    0.00       0.00    0.00
      3     0.00     100    0.00       0.00    0.00
      4     0.00     100    0.00       0.00    0.00
      5     0.00     100    0.00       0.00    0.00
      6     0.00     100    0.00       0.00    0.00
      7     0.00     100    0.00       0.00    0.00
    SUM     0.00     800    0.00       0.00    0.00
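
    The nowait variant itself is not shown on the slides; presumably it adds a nowait clause to the worksharing loops so that threads skip the implied end-of-loop barriers, along these lines (a sketch, not the author's exact code):

    #pragma omp for nowait
    for (i = 0; i < N; i++) {
        /* ... compute new_x[i] as in the barrier version ... */
    }
    #pragma omp for nowait
    for (i = 0; i < N; i++)
        x[i] = new_x[i];

    Note the trade-off: without the barriers, a fast thread can begin copying new_x into x while slower threads are still reading x in the first loop, so the lower exitBarT above comes at the cost of a potential data race.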

  13. Jacobi Iterative Method • MPI Code

    MPI_Scatter(a, N * N/P, MPI_DOUBLE, apart, N * N/P, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(x, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    for (i = myrank*N/P, k = 0; k < N/P; i++, k++)
        bpart[k] = x[i];    /* x is presumably initialized to b on rank 0 (as in the
                               sequential code), so this copies this rank's slice of b */
    for (k = 0; k < MAX_ITER; k++) {
        for (i = 0; i < N/P; i++) {
            sum = 0.0;
            for (j = 0; j < N; j++) {
                index = i + ((N/P) * myrank);   /* global row index of local row i */
                if (index != j) {
                    sum += apart[i][j] * x[j];
                    new_x[i] = (bpart[i] - sum) / apart[i][index];
                }
            }
        }
        MPI_Allgather(new_x, N/P, MPI_DOUBLE, x, N/P, MPI_DOUBLE, MPI_COMM_WORLD);
    }
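
    The fragment above assumes the usual MPI setup and data distribution around it; a minimal sketch of that boilerplate (variable names follow the slide, while the allocation comments are assumptions):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int myrank, P;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* rank of this process    */
        MPI_Comm_size(MPI_COMM_WORLD, &P);        /* number of MPI processes */

        /* Rank 0 would allocate and initialize a, b and x here; every rank
           allocates apart (N/P rows of a), bpart and new_x.  The slide code
           also assumes N is divisible by P. */

        /* ... Jacobi fragment from the slide goes here ... */

        MPI_Finalize();
        return 0;
    }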

  14. Jacobi Iterative Method • MPI Performance

  15. Jacobi Iterative Method • mpiP results

    @--- Aggregate Time (top twenty, descending, milliseconds) ---
    Call        Site    Time    App%    MPI%    COV
    Allgather      1    60.1    6.24   19.16   0.00
    Allgather      2    58.8    6.11   18.77   0.00
    Allgather      3    57.3    5.96   18.29   0.00
    Scatter        4    34.6    3.59   11.03   0.00
    Scatter        3    31.8    3.30   10.14   0.00
    Scatter        1    30.1    3.13    9.61   0.00
    Scatter        2    27      2.81    8.62   0.00
    Bcast          2    7.05    0.73    2.25   0.00
    Allgather      4    4.33    0.45    1.38   0.00
    Bcast          3    2.25    0.23    0.72   0.00
    Bcast          1    0.083   0.01    0.03   0.00
    Bcast          4    0.029   0.00    0.01   0.00

  16. Jacobi Iterative Method • MPI+OpenMP Code

    MPI_Scatter(a, N * N/P, MPI_DOUBLE, apart, N * N/P, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(x, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    for (i = myrank*N/P, k = 0; k < N/P; i++, k++)
        bpart[k] = x[i];
    omp_set_num_threads(T);
    #pragma omp parallel private(k, i, j, index)
    {
        for (k = 0; k < MAX_ITER; k++) {
            #pragma omp for
            for (i = 0; i < N/P; i++) {
                sum = 0.0;
                for (j = 0; j < N; j++) {
                    index = i + ((N/P) * myrank);
                    if (index != j) {
                        sum += apart[i][j] * x[j];
                        new_x[i] = (bpart[i] - sum) / apart[i][index];
                    }
                }
            }
            #pragma omp master
            {
                MPI_Allgather(new_x, N/P, MPI_DOUBLE, x, N/P, MPI_DOUBLE, MPI_COMM_WORLD);
            }
        }
    }
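
    One detail worth noting (not discussed on the slide): #pragma omp master carries no implied barrier, so nothing prevents the non-master threads from entering the next iteration's worksharing loop while x is still being gathered. The hybrid ADI code later places an explicit barrier after its master block; the corresponding sketch here would be:

    #pragma omp master
    {
        MPI_Allgather(new_x, N/P, MPI_DOUBLE, x, N/P, MPI_DOUBLE, MPI_COMM_WORLD);
    }
    #pragma omp barrier   /* make sure every thread sees the gathered x before the next sweep */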

  17. Jacobi Iterative Method • MPI+OpenMP Performance

  18. Jacobi Iterative Method • ompP results

    R00002 jacobi_mpi_openmp.c (55-65) LOOP
    TID    execT   execC   bodyT   exitBarT   taskT
      0     0.03     100    0.02       0.01    0.00
      1     0.24     100    0.02       0.23    0.00
      2     0.24     100    0.02       0.22    0.00
      3     0.24     100    0.02       0.22    0.00
      4     0.24     100    0.02       0.22    0.00
      5     0.24     100    0.02       0.22    0.00
      6     0.24     100    0.02       0.22    0.00
      7     0.24     100    0.02       0.22    0.00
    SUM     1.72     800    0.15       1.56    0.00

    R00003 jacobi_mpi_openmp.c (67-70) MASTER
    TID    execT   execC
      0     0.22     100
    SUM     0.22     100

  19. Jacobi Iterative Method • mpiP results

    @--- Aggregate Time (top twenty, descending, milliseconds) ---
    Call        Site    Time    App%    MPI%    COV
    Scatter        8    34.7    9.62   14.11   0.00
    Allgather      1    32.6    9.05   13.28   0.00
    Scatter        6    31.3    8.70   12.76   0.00
    Scatter        2    30.2    8.39   12.31   0.00
    Allgather      3    29.9    8.30   12.18   0.00
    Allgather      5    27.6    7.67   11.25   0.00
    Scatter        4    27.1    7.51   11.02   0.00
    Allgather      7    22.1    6.14    9.00   0.00
    Bcast          4    7.12    1.98    2.90   0.00
    Bcast          6    2.81    0.78    1.14   0.00
    Bcast          2    0.09    0.02    0.04   0.00
    Bcast          8   0.033    0.01    0.01   0.00

  20. ADI • Alternating Direction Integration
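
    For context (not stated on the slide): each sweep is a forward elimination followed by a back substitution applied independently along one dimension of the grid, first row by row and then column by column. The row-sweep recurrences are the ones visible in the code that follows,

        x[i][j] = x[i][j] - x[i][j-1] * a[i][j] / b[i][j-1]
        b[i][j] = b[i][j] - a[i][j] * a[i][j] / b[i][j-1]

    and the column sweep applies the same updates with the roles of i and j exchanged.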

  21. ADI • Sequential Code

    /* ADI forward & backward sweep along rows */
    for (i = 0; i < N; i++) {
        for (j = 1; j < N; j++) {
            x[i][j] = x[i][j] - x[i][j-1]*a[i][j]/b[i][j-1];
            b[i][j] = b[i][j] - a[i][j]*a[i][j]/b[i][j-1];
        }
        x[i][N-1] = x[i][N-1]/b[i][N-1];
    }
    for (i = 0; i < N; i++)
        for (j = N-2; j > 1; j--)
            x[i][j] = (x[i][j] - a[i][j+1]*x[i][j+1])/b[i][j];

    /* ADI forward & backward sweep along columns */
    for (j = 0; j < N; j++) {
        for (i = 1; i < N; i++) {
            x[i][j] = x[i][j] - x[i-1][j]*a[i][j]/b[i-1][j];
            b[i][j] = b[i][j] - a[i][j]*a[i][j]/b[i-1][j];
        }
        x[N-1][j] = x[N-1][j]/b[N-1][j];
    }
    for (j = 0; j < N; j++)
        for (i = N-2; i > 1; i--)
            x[i][j] = (x[i][j] - a[i+1][j]*x[i+1][j])/b[i][j];

  22. ADI • OpenMP Code

    #pragma omp parallel private(iter)
    {
        for (iter = 1; iter <= MAXITER; iter++) {
            /* ADI forward & backward sweep along rows */
            #pragma omp for private(i,j) nowait
            for (i = 0; i < N; i++) {
                for (j = 1; j < N; j++) {
                    x[i][j] = x[i][j] - x[i][j-1]*a[i][j]/b[i][j-1];
                    b[i][j] = b[i][j] - a[i][j]*a[i][j]/b[i][j-1];
                }
                x[i][N-1] = x[i][N-1]/b[i][N-1];
            }
            #pragma omp for private(i,j)
            for (i = 0; i < N; i++)
                for (j = N-2; j > 1; j--)
                    x[i][j] = (x[i][j] - a[i][j+1]*x[i][j+1])/b[i][j];
            /* ADI forward & backward sweep along columns */
            #pragma omp for private(i,j) nowait
            for (j = 0; j < N; j++) {
                for (i = 1; i < N; i++) {
                    x[i][j] = x[i][j] - x[i-1][j]*a[i][j]/b[i-1][j];
                    b[i][j] = b[i][j] - a[i][j]*a[i][j]/b[i-1][j];
                }
                x[N-1][j] = x[N-1][j]/b[N-1][j];
            }
            #pragma omp for private(i,j)
            for (j = 0; j < N; j++)
                for (i = N-2; i > 1; i--)
                    x[i][j] = (x[i][j] - a[i+1][j]*x[i+1][j])/b[i][j];
        }
    }

  23. ADI • OpenMP Performance

  24. ADI • ompP results

    R00002 adi_openmp.c (43-50) LOOP
    TID    execT   execC   bodyT   exitBarT   taskT
      0     0.18     100    0.18       0.00    0.00
      1     0.18     100    0.18       0.00    0.00
      2     0.18     100    0.18       0.00    0.00
      3     0.18     100    0.18       0.00    0.00
      4     0.18     100    0.18       0.00    0.00
      5     0.18     100    0.18       0.00    0.00
      6     0.18     100    0.18       0.00    0.00
      7     0.18     100    0.18       0.00    0.00
    SUM     1.47     800    1.47       0.00    0.00

    R00003 adi_openmp.c (52-57) LOOP
    TID    execT   execC   bodyT   exitBarT   taskT
      0     0.11     100    0.10       0.01    0.00
      1     0.11     100    0.10       0.01    0.00
      2     0.11     100    0.10       0.01    0.00
      3     0.10     100    0.10       0.00    0.00
      4     0.11     100    0.10       0.01    0.00
      5     0.10     100    0.10       0.01    0.00
      6     0.10     100    0.10       0.01    0.00
      7     0.10     100    0.10       0.00    0.00
    SUM     0.84     800    0.78       0.06    0.00

    R00004 adi_openmp.c (61-68) LOOP
    TID    execT   execC   bodyT   exitBarT   taskT
      0     0.38     100    0.38       0.00    0.00
      1     0.31     100    0.31       0.00    0.00
      2     0.35     100    0.35       0.00    0.00
      3     0.29     100    0.29       0.00    0.00
      4     0.35     100    0.35       0.00    0.00
      5     0.36     100    0.36       0.00    0.00
      6     0.36     100    0.36       0.00    0.00
      7     0.37     100    0.37       0.00    0.00
    SUM     2.77     800    2.77       0.00    0.00

    R00005 adi_openmp.c (70-75) LOOP
    TID    execT   execC   bodyT   exitBarT   taskT
      0     0.16     100    0.16       0.00    0.00
      1     0.23     100    0.15       0.07    0.00
      2     0.19     100    0.14       0.05    0.00
      3     0.25     100    0.16       0.09    0.00
      4     0.19     100    0.14       0.05    0.00
      5     0.18     100    0.17       0.01    0.00
      6     0.18     100    0.17       0.01    0.00
      7     0.17     100    0.17       0.01    0.00
    SUM     1.55     800    1.26       0.29    0.00

  25. ADI • MPI Code

    MPI_Bcast(a, N * N, MPI_FLOAT, 0, MPI_COMM_WORLD);
    MPI_Scatter(x, N * N/P, MPI_FLOAT, xpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
    MPI_Scatter(b, N * N/P, MPI_FLOAT, bpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
    for (i = myrank*(N/P), k = 0; k < N/P; i++, k++)
        for (j = 0; j < N; j++)
            apart[k][j] = a[i][j];
    for (iter = 1; iter <= 2*MAXITER; iter++) {
        /* ADI forward & backward sweep along rows */
        for (i = 0; i < N/P; i++) {
            for (j = 1; j < N; j++) {
                xpart[i][j] = xpart[i][j] - xpart[i][j-1]*apart[i][j]/bpart[i][j-1];
                bpart[i][j] = bpart[i][j] - apart[i][j]*apart[i][j]/bpart[i][j-1];
            }
            xpart[i][N-1] = xpart[i][N-1]/bpart[i][N-1];
        }
        for (i = 0; i < N/P; i++) {
            for (j = N-2; j > 1; j--)
                xpart[i][j] = (xpart[i][j] - apart[i][j+1]*xpart[i][j+1])/bpart[i][j];
        }

  26. ADI • MPI Code

        MPI_Gather(xpart, N*N/P, MPI_FLOAT, x, N*N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
        MPI_Gather(bpart, N*N/P, MPI_FLOAT, b, N*N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
        /* transpose matrices */
        trans(x, N, N);
        trans(b, N, N);
        trans(a, N, N);
        MPI_Scatter(x, N * N/P, MPI_FLOAT, xpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
        MPI_Scatter(b, N * N/P, MPI_FLOAT, bpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
        for (i = myrank*(N/P), k = 0; k < N/P; i++, k++)
            for (j = 0; j < N; j++)
                apart[k][j] = a[i][j];
    }   /* end of the iter loop started on the previous slide */
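
    The trans() helper called above is not shown anywhere in the deck. A minimal in-place square transpose consistent with the call trans(x, N, N) could look like the sketch below (it assumes the matrices are N x N arrays of float with N a compile-time constant; the author's routine may differ):

    /* In-place transpose of an N x N matrix; rows and cols are both N here. */
    void trans(float m[N][N], int rows, int cols)
    {
        for (int i = 0; i < rows; i++) {
            for (int j = i + 1; j < cols; j++) {
                float tmp = m[i][j];
                m[i][j] = m[j][i];
                m[j][i] = tmp;
            }
        }
    }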

  27. ADI • MPI Performance

  28. ADI • mpiP results

    @--- Aggregate Time (top twenty, descending, milliseconds) ---
    Call       Site      Time    App%    MPI%    COV
    Gather        1  8.63e+04   22.83   23.54   0.00
    Gather        3  6.29e+04   16.63   17.15   0.00
    Gather        2  6.08e+04   16.10   16.60   0.00
    Gather        4  5.83e+04   15.43   15.91   0.00
    Scatter       4  3.31e+04    8.76    9.03   0.00
    Scatter       2  3.08e+04    8.14    8.39   0.00
    Scatter       3  2.87e+04    7.58    7.81   0.00
    Scatter       1  5.53e+03    1.46    1.51   0.00
    Bcast         2      50.8    0.01    0.01   0.00
    Bcast         4      50.8    0.01    0.01   0.00
    Bcast         3      49.5    0.01    0.01   0.00
    Bcast         1      40.4    0.01    0.01   0.00
    Reduce        1      2.57    0.00    0.00   0.00
    Reduce        3     0.259    0.00    0.00   0.00
    Reduce        2     0.056    0.00    0.00   0.00
    Reduce        4     0.052    0.00    0.00   0.00

  29. ADI • MPI+OpenMP Code

    MPI_Bcast(a, N * N, MPI_FLOAT, 0, MPI_COMM_WORLD);
    MPI_Scatter(x, N * N/P, MPI_FLOAT, xpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
    MPI_Scatter(b, N * N/P, MPI_FLOAT, bpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
    omp_set_num_threads(T);
    #pragma omp parallel private(iter)
    {
        int id, sindex, eindex;
        int m, n;
        id = omp_get_thread_num();
        sindex = id * node_rows/T;
        eindex = sindex + node_rows/T;
        int l = myrank*(N/P);
        for (m = sindex; m < eindex; m++) {
            for (n = 0; n < N; n++)
                apart[m][n] = a[l+m][n];
            l++;
        }

  30. ADI • MPI+OpenMP Code

        for (iter = 1; iter <= 2*MAXITER; iter++) {
            /* ADI forward & backward sweep along rows */
            #pragma omp for private(i,j) nowait
            for (i = 0; i < N/P; i++) {
                for (j = 1; j < N; j++) {
                    xpart[i][j] = xpart[i][j] - xpart[i][j-1]*apart[i][j]/bpart[i][j-1];
                    bpart[i][j] = bpart[i][j] - apart[i][j]*apart[i][j]/bpart[i][j-1];
                }
                xpart[i][N-1] = xpart[i][N-1]/bpart[i][N-1];
            }
            #pragma omp for private(i,j)
            for (i = 0; i < N/P; i++)
                for (j = N-2; j > 1; j--)
                    xpart[i][j] = (xpart[i][j] - apart[i][j+1]*xpart[i][j+1])/bpart[i][j];
            #pragma omp master
            {
                MPI_Gather(xpart, N*N/P, MPI_FLOAT, x, N*N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
                MPI_Gather(bpart, N*N/P, MPI_FLOAT, b, N*N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
            }
            #pragma omp barrier

  31. ADI • MPI+OpenMP Code

            #pragma omp sections
            {
                #pragma omp section
                { trans(x, N, N); }
                #pragma omp section
                { trans(b, N, N); }
                #pragma omp section
                { trans(a, N, N); }
            }
            #pragma omp barrier
            #pragma omp master
            {
                MPI_Scatter(x, N * N/P, MPI_FLOAT, xpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
                MPI_Scatter(b, N * N/P, MPI_FLOAT, bpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
            }
            l = myrank*(N/P);
            for (m = sindex; m < eindex; m++) {
                for (n = 0; n < N; n++)
                    apart[m][n] = a[l+m][n];
                l++;
            }
        }
        #pragma omp barrier
    }

  32. ADI • MPI+OpenMP Performance

  33. ADI • ompP results

    R00002 adi_mpi_scatter_openmp.c (89-96) LOOP
    TID    execT   execC   bodyT   exitBarT   taskT
      0     0.05     200    0.05       0.00    0.00
      1     0.05     200    0.05       0.00    0.00
      2     0.08     200    0.08       0.00    0.00
      3     0.08     200    0.08       0.00    0.00
      4     0.08     200    0.08       0.00    0.00
      5     0.08     200    0.08       0.00    0.00
      6     0.08     200    0.08       0.00    0.00
      7     0.08     200    0.08       0.00    0.00
    SUM     0.58    1600    0.58       0.00    0.00

    R00003 adi_mpi_scatter_openmp.c (99-104) LOOP
    TID    execT   execC   bodyT   exitBarT   taskT
      0     0.06     200    0.05       0.01    0.00
      1    34.23     200    0.05      34.18    0.00
      2    34.22     200    0.05      34.17    0.00
      3    34.22     200    0.05      34.17    0.00
      4    34.21     200    0.05      34.16    0.00
      5    34.20     200    0.05      34.15    0.00
      6    34.21     200    0.05      34.16    0.00
      7    34.20     200    0.05      34.15    0.00
    SUM   239.54    1600    0.39     239.14    0.00

  34. ADI • ompP results

    R00005 adi_mpi_scatter_openmp.c (113) BARRIER
    TID    execT   execC   taskT
      0     0.00     200    0.00
      1    64.29     200    0.00
      2    64.29     200    0.00
      3    64.29     200    0.00
      4    64.29     200    0.00
      5    64.29     200    0.00
      6    64.29     200    0.00
      7    64.29     200    0.00
    SUM   450.02    1600    0.00

    R00004 adi_mpi_scatter_openmp.c (106-111) MASTER
    TID    execT   execC
      0    64.28     200
    SUM    64.28     200

    R00006 adi_mpi_scatter_openmp.c (116-130) SECTIONS
    TID    execT   execC   sectT   sectC   exitBarT   mgmtT   taskT
      0     0.85     200    0.85     200       0.00    0.00    0.00
      1     0.85     200    0.83     200       0.02    0.00    0.00
      2     0.85     200    0.44     200       0.41    0.00    0.00
      3     0.85     200    0.00       0       0.85    0.00    0.00
      4     0.85     200    0.00       0       0.85    0.00    0.00
      5     0.85     200    0.00       0       0.85    0.00    0.00
      6     0.85     200    0.00       0       0.85    0.00    0.00
      7     0.85     200    0.00       0       0.85    0.00    0.00
    SUM     6.80    1600    2.12     600       4.67    0.01    0.00

  35. ADI • ompP results

    R00007 adi_mpi_scatter_openmp.c (132) BARRIER
    TID    execT   execC   taskT
      0     0.00     200    0.00
      1     0.00     200    0.00
      2     0.00     200    0.00
      3     0.00     200    0.00
      4     0.00     200    0.00
      5     0.00     200    0.00
      6     0.00     200    0.00
      7     0.00     200    0.00
    SUM     0.01    1600    0.00

    R00008 adi_mpi_scatter_openmp.c (134-138) MASTER
    TID    execT   execC
      0    34.46     200
    SUM    34.46     200

    R00009 adi_mpi_scatter_openmp.c (149) BARRIER
    TID    execT   execC   taskT
      0     0.00       1    0.00
      1     0.28       1    0.00
      2     0.28       1    0.00
      3     0.28       1    0.00
      4     0.28       1    0.00
      5     0.28       1    0.00
      6     0.28       1    0.00
      7     0.28       1    0.00
    SUM     1.94       8    0.00

  36. ADI • mpiP results

    @--- Aggregate Time (top twenty, descending, milliseconds) ---
    Call       Site      Time    App%    MPI%    COV
    Gather        2  8.98e+04   23.32   23.52   0.00
    Gather        6  6.57e+04   17.05   17.19   0.00
    Gather        8  6.45e+04   16.74   16.89   0.00
    Gather        4  6.17e+04   16.03   16.16   0.00
    Scatter       4  3.39e+04    8.79    8.87   0.00
    Scatter       8   3.1e+04    8.06    8.13   0.00
    Scatter       6  2.96e+04    7.68    7.75   0.00
    Scatter       2   5.4e+03    1.40    1.41   0.00
    Bcast         7      49.5    0.01    0.01   0.00
    Bcast         3      49.3    0.01    0.01   0.00
    Bcast         5      47.8    0.01    0.01   0.00
    Bcast         1        40    0.01    0.01   0.00
    Scatter       1      30.5    0.01    0.01   0.00
    Scatter       5      30.3    0.01    0.01   0.00
    Scatter       7      30.3    0.01    0.01   0.00
    Scatter       3      28.8    0.01    0.01   0.00
    Reduce        1       1.8    0.00    0.00   0.00
    Reduce        5     0.062    0.00    0.00   0.00
    Reduce        3     0.049    0.00    0.00   0.00
    Reduce        7     0.049    0.00    0.00   0.00

  37. Thanks • Q & A • Any Suggestions?
