This study explores the architecture and capabilities of the IBM E1350 eServer Cluster, focusing on the OpenMP, MPI, and hybrid MPI+OpenMP parallel programming models. Using two applications, the Jacobi Iterative Method and Alternating Direction Integration (ADI), the study measures speedup and identifies overheads that can be reduced. The cluster's dual-boot capability with Windows HPC Server 2008 and Red Hat Linux, with 128 dual-processor compute nodes, is also described. The results show significant speedup gains through optimizations tailored to each programming model.
Investigating Parallel Processing using the E1350 IBM eServer Cluster. Ayaz ul Hassan Khan (g201002860)
Objectives
• Explore the architecture of the E1350 IBM eServer Cluster
• Parallel programming models:
  • OpenMP
  • MPI
  • MPI+OpenMP
• Analyze the effect of each programming model on speedup
• Identify overheads and optimize them as far as possible
Cluster System
• The cluster is unique in its dual-boot capability, with Microsoft Windows HPC Server 2008 and Red Hat Enterprise Linux 5 operating systems.
• The cluster has 3 master nodes: one for Red Hat Linux, one for Windows HPC Server 2008, and one for cluster management.
• The cluster has 128 compute nodes.
• Each compute node is a dual-processor IBM x3550 server with two 2.0 GHz quad-core Intel Xeon E5405 processors.
• The total number of cores in the cluster is 1024.
• Each master node has 1 TB of hard disk space; each compute node has 500 GB.
• Each master node has 8 GB of RAM.
• Each compute node has 4 GB of RAM.
• The interconnect is 10GBASE-SR Ethernet.
Experimental Environment
• Nodes: hpc081, hpc082, hpc083, hpc084
• Compilers:
  • icc: for sequential and OpenMP programs
  • mpiicc: for MPI and MPI+OpenMP programs
• Profiling tools:
  • ompP: for OpenMP profiling
  • mpiP: for MPI profiling
Applications Used/Implemented
• Jacobi Iterative Method
  • Max speedup = 7.1 (OpenMP, threads = 8)
  • Max speedup = 3.7 (MPI, nodes = 4)
  • Max speedup = 9.3 (MPI+OpenMP, nodes = 2, threads = 8)
• Alternating Direction Integration (ADI)
  • Max speedup = 5.0 (OpenMP, threads = 8)
  • Max speedup = 0.8 (MPI, nodes = 1)
  • Max speedup = 1.7 (MPI+OpenMP, nodes = 1, threads = 8)
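The speedup figures above are the ratio of sequential to parallel wall-clock time. A minimal sketch of how such timings can be taken with omp_get_wtime follows; the kernel is an illustrative stand-in, not the actual Jacobi or ADI code from these slides.

    /* Illustrative speedup measurement (assumed, not from the original slides).
       work() is a dummy OpenMP kernel standing in for the real solver. */
    #include <stdio.h>
    #include <omp.h>

    static double work(int nthreads)
    {
        double s = 0.0;
        long i;
        #pragma omp parallel for reduction(+:s) num_threads(nthreads)
        for (i = 0; i < 100000000L; i++)
            s += 1.0 / (double)(i + 1);
        return s;
    }

    int main(void)
    {
        double t0, t_seq, t_par;

        t0 = omp_get_wtime();
        work(1);                      /* sequential reference run */
        t_seq = omp_get_wtime() - t0;

        t0 = omp_get_wtime();
        work(8);                      /* parallel run, 8 threads */
        t_par = omp_get_wtime() - t0;

        printf("speedup = %.2f\n", t_seq / t_par);
        return 0;
    }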
Jacobi Iterative Method
• Solving systems of linear equations Ax = b
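Each Jacobi iteration recomputes every unknown from the previous iterate. In the notation of the code on the next slide (a is the coefficient matrix, b the right-hand side, x the current iterate), the update is

    x_i^{\mathrm{new}} = \frac{1}{a_{ii}} \Big( b_i - \sum_{j \neq i} a_{ij}\, x_j \Big), \qquad i = 0, \dots, N-1,

and x is overwritten with x^new after each full sweep, which is what the final copy loop in the code does.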
Jacobi Iterative Method
• Sequential Code

    for (i = 0; i < N; i++) {
        x[i] = b[i];
    }

    for (i = 0; i < N; i++) {
        sum = 0.0;
        for (j = 0; j < N; j++) {
            if (i != j) {
                sum += a[i][j] * x[j];
                new_x[i] = (b[i] - sum) / a[i][i];
            }
        }
    }

    for (i = 0; i < N; i++)
        x[i] = new_x[i];
Jacobi Iterative Method
• OpenMP Code

    #pragma omp parallel private(k, i, j, sum)
    {
        for (k = 0; k < MAX_ITER; k++) {
            #pragma omp for
            for (i = 0; i < N; i++) {
                sum = 0.0;
                for (j = 0; j < N; j++) {
                    if (i != j) {
                        sum += a[i][j] * x[j];
                        new_x[i] = (b[i] - sum) / a[i][i];
                    }
                }
            }
            #pragma omp for
            for (i = 0; i < N; i++)
                x[i] = new_x[i];
        }
    }
Jacobi Iterative Method • OpenMP Performance
Jacobi Iterative Method
• ompP results (barrier)

R00002 jacobi_openmp.c (46-55) LOOP
 TID  execT  execC  bodyT  exitBarT  taskT
   0   0.09    100   0.07      0.01   0.00
   1   0.08    100   0.07      0.00   0.00
   2   0.08    100   0.07      0.01   0.00
   3   0.08    100   0.07      0.01   0.00
   4   0.08    100   0.07      0.01   0.00
   5   0.08    100   0.07      0.01   0.00
   6   0.08    100   0.07      0.01   0.00
   7   0.08    100   0.07      0.01   0.00
 SUM   0.65    800   0.59      0.06   0.00

R00003 jacobi_openmp.c (56-58) LOOP
 TID  execT  execC  bodyT  exitBarT  taskT
   0   0.00    100   0.00      0.00   0.00
   1   0.00    100   0.00      0.00   0.00
   2   0.00    100   0.00      0.00   0.00
   3   0.00    100   0.00      0.00   0.00
   4   0.00    100   0.00      0.00   0.00
   5   0.00    100   0.00      0.00   0.00
   6   0.00    100   0.00      0.00   0.00
   7   0.00    100   0.00      0.00   0.00
 SUM   0.01    800   0.00      0.01   0.00
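The "nowait" profile on the next slide was presumably obtained by removing the implicit barrier at the end of the first worksharing loop; a minimal sketch of that change is below (assumed from the slide title, not code shown in the slides). With static scheduling each thread copies the same rows it computed, but other threads may still be reading x, so strict Jacobi sweep semantics are relaxed in exchange for lower barrier overhead.

            /* Sketch of the assumed nowait variant (replaces the two loops
               inside the parallel region of the OpenMP code above). */
            #pragma omp for schedule(static) nowait
            for (i = 0; i < N; i++) {
                sum = 0.0;
                for (j = 0; j < N; j++)
                    if (i != j)
                        sum += a[i][j] * x[j];
                new_x[i] = (b[i] - sum) / a[i][i];
            }
            #pragma omp for schedule(static)
            for (i = 0; i < N; i++)
                x[i] = new_x[i];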
Jacobi Iterative Method
• ompP results (nowait)

R00002 jacobi_openmp.c (43-52) LOOP
 TID  execT  execC  bodyT  exitBarT  taskT
   0   0.08    100   0.08      0.00   0.00
   1   0.08    100   0.08      0.00   0.00
   2   0.08    100   0.08      0.00   0.00
   3   0.08    100   0.08      0.00   0.00
   4   0.08    100   0.08      0.00   0.00
   5   0.08    100   0.08      0.00   0.00
   6   0.08    100   0.08      0.00   0.00
   7   0.08    100   0.08      0.00   0.00
 SUM   0.63    800   0.63      0.00   0.00

R00003 jacobi_openmp.c (53-55) LOOP
 TID  execT  execC  bodyT  exitBarT  taskT
   0   0.00    100   0.00      0.00   0.00
   1   0.00    100   0.00      0.00   0.00
   2   0.00    100   0.00      0.00   0.00
   3   0.00    100   0.00      0.00   0.00
   4   0.00    100   0.00      0.00   0.00
   5   0.00    100   0.00      0.00   0.00
   6   0.00    100   0.00      0.00   0.00
   7   0.00    100   0.00      0.00   0.00
 SUM   0.00    800   0.00      0.00   0.00
Jacobi Iterative Method
• MPI Code

    MPI_Scatter(a, N * N/P, MPI_DOUBLE, apart, N * N/P, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(x, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    for (i = myrank*N/P, k = 0; k < N/P; i++, k++)
        bpart[k] = x[i];

    for (k = 0; k < MAX_ITER; k++) {
        for (i = 0; i < N/P; i++) {
            sum = 0.0;
            for (j = 0; j < N; j++) {
                index = i + ((N/P) * myrank);
                if (index != j) {
                    sum += apart[i][j] * x[j];
                    new_x[i] = (bpart[i] - sum) / apart[i][index];
                }
            }
        }
        MPI_Allgather(new_x, N/P, MPI_DOUBLE, x, N/P, MPI_DOUBLE, MPI_COMM_WORLD);
    }
Jacobi Iterative Method • MPI Performance
Jacobi Iterative Method
• mpiP results

---------------------------------------------------------------------------
@--- Aggregate Time (top twenty, descending, milliseconds) ----------------
---------------------------------------------------------------------------
Call        Site    Time    App%   MPI%   COV
Allgather      1    60.1    6.24  19.16  0.00
Allgather      2    58.8    6.11  18.77  0.00
Allgather      3    57.3    5.96  18.29  0.00
Scatter        4    34.6    3.59  11.03  0.00
Scatter        3    31.8    3.30  10.14  0.00
Scatter        1    30.1    3.13   9.61  0.00
Scatter        2    27      2.81   8.62  0.00
Bcast          2    7.05    0.73   2.25  0.00
Allgather      4    4.33    0.45   1.38  0.00
Bcast          3    2.25    0.23   0.72  0.00
Bcast          1   0.083    0.01   0.03  0.00
Bcast          4   0.029    0.00   0.01  0.00
Jacobi Iterative Method
• MPI+OpenMP Code

    MPI_Scatter(a, N * N/P, MPI_DOUBLE, apart, N * N/P, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(x, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    for (i = myrank*N/P, k = 0; k < N/P; i++, k++)
        bpart[k] = x[i];

    omp_set_num_threads(T);
    #pragma omp parallel private(k, i, j, index)
    {
        for (k = 0; k < MAX_ITER; k++) {
            #pragma omp for
            for (i = 0; i < N/P; i++) {
                sum = 0.0;
                for (j = 0; j < N; j++) {
                    index = i + ((N/P) * myrank);
                    if (index != j) {
                        sum += apart[i][j] * x[j];
                        new_x[i] = (bpart[i] - sum) / apart[i][index];
                    }
                }
            }
            #pragma omp master
            {
                MPI_Allgather(new_x, N/P, MPI_DOUBLE, x, N/P, MPI_DOUBLE, MPI_COMM_WORLD);
            }
        }
    }
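In the hybrid version, MPI calls are issued only from the OpenMP master thread. The slides do not show how MPI is initialized; a minimal sketch (an assumption, not from the original code) would request MPI_THREAD_FUNNELED support so that this usage is explicitly permitted by the MPI library:

    /* Minimal initialization sketch (assumed, not from the original slides):
       MPI_THREAD_FUNNELED allows MPI calls from the master thread only,
       matching the #pragma omp master usage above. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        if (provided < MPI_THREAD_FUNNELED)
            fprintf(stderr, "warning: MPI library lacks funneled thread support\n");

        /* ... hybrid MPI+OpenMP Jacobi computation as on the slide above ... */

        MPI_Finalize();
        return 0;
    }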
Jacobi Iterative Method • MPI+OpenMP Performance
Jacobi Iterative Method
• ompP results

R00002 jacobi_mpi_openmp.c (55-65) LOOP
 TID  execT  execC  bodyT  exitBarT  taskT
   0   0.03    100   0.02      0.01   0.00
   1   0.24    100   0.02      0.23   0.00
   2   0.24    100   0.02      0.22   0.00
   3   0.24    100   0.02      0.22   0.00
   4   0.24    100   0.02      0.22   0.00
   5   0.24    100   0.02      0.22   0.00
   6   0.24    100   0.02      0.22   0.00
   7   0.24    100   0.02      0.22   0.00
 SUM   1.72    800   0.15      1.56   0.00

R00003 jacobi_mpi_openmp.c (67-70) MASTER
 TID  execT  execC
   0   0.22    100
 SUM   0.22    100
Jacobi Iterative Method
• mpiP results

---------------------------------------------------------------------------
@--- Aggregate Time (top twenty, descending, milliseconds) ----------------
---------------------------------------------------------------------------
Call        Site    Time    App%   MPI%   COV
Scatter        8    34.7    9.62  14.11  0.00
Allgather      1    32.6    9.05  13.28  0.00
Scatter        6    31.3    8.70  12.76  0.00
Scatter        2    30.2    8.39  12.31  0.00
Allgather      3    29.9    8.30  12.18  0.00
Allgather      5    27.6    7.67  11.25  0.00
Scatter        4    27.1    7.51  11.02  0.00
Allgather      7    22.1    6.14   9.00  0.00
Bcast          4    7.12    1.98   2.90  0.00
Bcast          6    2.81    0.78   1.14  0.00
Bcast          2    0.09    0.02   0.04  0.00
Bcast          8   0.033    0.01   0.01  0.00
ADI • Alternating Direction Integration
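Each ADI iteration performs a forward-elimination and back-substitution sweep along the rows, then the same pair of sweeps along the columns. Reading the sequential code on the next slide with the array names as notation, the row sweep is

    x_{i,j} \leftarrow x_{i,j} - x_{i,j-1}\,\frac{a_{i,j}}{b_{i,j-1}}, \qquad
    b_{i,j} \leftarrow b_{i,j} - \frac{a_{i,j}^{2}}{b_{i,j-1}}, \qquad j = 1, \dots, N-1,

followed by the back substitution

    x_{i,j} \leftarrow \frac{x_{i,j} - a_{i,j+1}\, x_{i,j+1}}{b_{i,j}}, \qquad j = N-2, \dots, 2,

and the column sweep applies the same recurrences with the roles of i and j exchanged.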
ADI
• Sequential Code

    // ADI forward & backward sweep along rows
    for (i = 0; i < N; i++) {
        for (j = 1; j < N; j++) {
            x[i][j] = x[i][j] - x[i][j-1] * a[i][j] / b[i][j-1];
            b[i][j] = b[i][j] - a[i][j] * a[i][j] / b[i][j-1];
        }
        x[i][N-1] = x[i][N-1] / b[i][N-1];
    }
    for (i = 0; i < N; i++)
        for (j = N-2; j > 1; j--)
            x[i][j] = (x[i][j] - a[i][j+1] * x[i][j+1]) / b[i][j];

    // ADI forward & backward sweep along columns
    for (j = 0; j < N; j++) {
        for (i = 1; i < N; i++) {
            x[i][j] = x[i][j] - x[i-1][j] * a[i][j] / b[i-1][j];
            b[i][j] = b[i][j] - a[i][j] * a[i][j] / b[i-1][j];
        }
        x[N-1][j] = x[N-1][j] / b[N-1][j];
    }
    for (j = 0; j < N; j++)
        for (i = N-2; i > 1; i--)
            x[i][j] = (x[i][j] - a[i+1][j] * x[i+1][j]) / b[i][j];
ADI
• OpenMP Code

    #pragma omp parallel private(iter)
    {
        for (iter = 1; iter <= MAXITER; iter++) {
            // ADI forward & backward sweep along rows
            #pragma omp for private(i,j) nowait
            for (i = 0; i < N; i++) {
                for (j = 1; j < N; j++) {
                    x[i][j] = x[i][j] - x[i][j-1] * a[i][j] / b[i][j-1];
                    b[i][j] = b[i][j] - a[i][j] * a[i][j] / b[i][j-1];
                }
                x[i][N-1] = x[i][N-1] / b[i][N-1];
            }
            #pragma omp for private(i,j)
            for (i = 0; i < N; i++)
                for (j = N-2; j > 1; j--)
                    x[i][j] = (x[i][j] - a[i][j+1] * x[i][j+1]) / b[i][j];

            // ADI forward & backward sweep along columns
            #pragma omp for private(i,j) nowait
            for (j = 0; j < N; j++) {
                for (i = 1; i < N; i++) {
                    x[i][j] = x[i][j] - x[i-1][j] * a[i][j] / b[i-1][j];
                    b[i][j] = b[i][j] - a[i][j] * a[i][j] / b[i-1][j];
                }
                x[N-1][j] = x[N-1][j] / b[N-1][j];
            }
            #pragma omp for private(i,j)
            for (j = 0; j < N; j++)
                for (i = N-2; i > 1; i--)
                    x[i][j] = (x[i][j] - a[i+1][j] * x[i+1][j]) / b[i][j];
        }
    }
ADI • OpenMP Performance
ADI
• ompP results

R00002 adi_openmp.c (43-50) LOOP
 TID  execT  execC  bodyT  exitBarT  taskT
   0   0.18    100   0.18      0.00   0.00
   1   0.18    100   0.18      0.00   0.00
   2   0.18    100   0.18      0.00   0.00
   3   0.18    100   0.18      0.00   0.00
   4   0.18    100   0.18      0.00   0.00
   5   0.18    100   0.18      0.00   0.00
   6   0.18    100   0.18      0.00   0.00
   7   0.18    100   0.18      0.00   0.00
 SUM   1.47    800   1.47      0.00   0.00

R00003 adi_openmp.c (52-57) LOOP
 TID  execT  execC  bodyT  exitBarT  taskT
   0   0.11    100   0.10      0.01   0.00
   1   0.11    100   0.10      0.01   0.00
   2   0.11    100   0.10      0.01   0.00
   3   0.10    100   0.10      0.00   0.00
   4   0.11    100   0.10      0.01   0.00
   5   0.10    100   0.10      0.01   0.00
   6   0.10    100   0.10      0.01   0.00
   7   0.10    100   0.10      0.00   0.00
 SUM   0.84    800   0.78      0.06   0.00

R00004 adi_openmp.c (61-68) LOOP
 TID  execT  execC  bodyT  exitBarT  taskT
   0   0.38    100   0.38      0.00   0.00
   1   0.31    100   0.31      0.00   0.00
   2   0.35    100   0.35      0.00   0.00
   3   0.29    100   0.29      0.00   0.00
   4   0.35    100   0.35      0.00   0.00
   5   0.36    100   0.36      0.00   0.00
   6   0.36    100   0.36      0.00   0.00
   7   0.37    100   0.37      0.00   0.00
 SUM   2.77    800   2.77      0.00   0.00

R00005 adi_openmp.c (70-75) LOOP
 TID  execT  execC  bodyT  exitBarT  taskT
   0   0.16    100   0.16      0.00   0.00
   1   0.23    100   0.15      0.07   0.00
   2   0.19    100   0.14      0.05   0.00
   3   0.25    100   0.16      0.09   0.00
   4   0.19    100   0.14      0.05   0.00
   5   0.18    100   0.17      0.01   0.00
   6   0.18    100   0.17      0.01   0.00
   7   0.17    100   0.17      0.01   0.00
 SUM   1.55    800   1.26      0.29   0.00
ADI
• MPI Code

    MPI_Bcast(a, N * N, MPI_FLOAT, 0, MPI_COMM_WORLD);
    MPI_Scatter(x, N * N/P, MPI_FLOAT, xpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
    MPI_Scatter(b, N * N/P, MPI_FLOAT, bpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
    for (i = myrank*(N/P), k = 0; k < N/P; i++, k++)
        for (j = 0; j < N; j++)
            apart[k][j] = a[i][j];

    for (iter = 1; iter <= 2*MAXITER; iter++) {
        // ADI forward & backward sweep along rows
        for (i = 0; i < N/P; i++) {
            for (j = 1; j < N; j++) {
                xpart[i][j] = xpart[i][j] - xpart[i][j-1] * apart[i][j] / bpart[i][j-1];
                bpart[i][j] = bpart[i][j] - apart[i][j] * apart[i][j] / bpart[i][j-1];
            }
            xpart[i][N-1] = xpart[i][N-1] / bpart[i][N-1];
        }
        for (i = 0; i < N/P; i++)
            for (j = N-2; j > 1; j--)
                xpart[i][j] = (xpart[i][j] - apart[i][j+1] * xpart[i][j+1]) / bpart[i][j];
ADI
• MPI Code

        MPI_Gather(xpart, N*N/P, MPI_FLOAT, x, N*N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
        MPI_Gather(bpart, N*N/P, MPI_FLOAT, b, N*N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);

        // transpose matrices
        trans(x, N, N);
        trans(b, N, N);
        trans(a, N, N);

        MPI_Scatter(x, N * N/P, MPI_FLOAT, xpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
        MPI_Scatter(b, N * N/P, MPI_FLOAT, bpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
        for (i = myrank*(N/P), k = 0; k < N/P; i++, k++)
            for (j = 0; j < N; j++)
                apart[k][j] = a[i][j];
    }
ADI • MPI Performance
ADI
• mpiP results

---------------------------------------------------------------------------
@--- Aggregate Time (top twenty, descending, milliseconds) ----------------
---------------------------------------------------------------------------
Call       Site      Time    App%   MPI%   COV
Gather        1  8.63e+04   22.83  23.54  0.00
Gather        3  6.29e+04   16.63  17.15  0.00
Gather        2  6.08e+04   16.10  16.60  0.00
Gather        4  5.83e+04   15.43  15.91  0.00
Scatter       4  3.31e+04    8.76   9.03  0.00
Scatter       2  3.08e+04    8.14   8.39  0.00
Scatter       3  2.87e+04    7.58   7.81  0.00
Scatter       1  5.53e+03    1.46   1.51  0.00
Bcast         2      50.8    0.01   0.01  0.00
Bcast         4      50.8    0.01   0.01  0.00
Bcast         3      49.5    0.01   0.01  0.00
Bcast         1      40.4    0.01   0.01  0.00
Reduce        1      2.57    0.00   0.00  0.00
Reduce        3     0.259    0.00   0.00  0.00
Reduce        2     0.056    0.00   0.00  0.00
Reduce        4     0.052    0.00   0.00  0.00
ADI
• MPI+OpenMP Code

    MPI_Bcast(a, N * N, MPI_FLOAT, 0, MPI_COMM_WORLD);
    MPI_Scatter(x, N * N/P, MPI_FLOAT, xpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
    MPI_Scatter(b, N * N/P, MPI_FLOAT, bpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);

    omp_set_num_threads(T);
    #pragma omp parallel private(iter)
    {
        int id, sindex, eindex;
        int m, n;

        id = omp_get_thread_num();
        sindex = id * node_rows/T;
        eindex = sindex + node_rows/T;

        int l = myrank*(N/P);
        for (m = sindex; m < eindex; m++) {
            for (n = 0; n < N; n++)
                apart[m][n] = a[l+m][n];
            l++;
        }
ADI
• MPI+OpenMP Code

        for (iter = 1; iter <= 2*MAXITER; iter++) {
            // ADI forward & backward sweep along rows
            #pragma omp for private(i,j) nowait
            for (i = 0; i < N/P; i++) {
                for (j = 1; j < N; j++) {
                    xpart[i][j] = xpart[i][j] - xpart[i][j-1] * apart[i][j] / bpart[i][j-1];
                    bpart[i][j] = bpart[i][j] - apart[i][j] * apart[i][j] / bpart[i][j-1];
                }
                xpart[i][N-1] = xpart[i][N-1] / bpart[i][N-1];
            }
            #pragma omp for private(i,j)
            for (i = 0; i < N/P; i++)
                for (j = N-2; j > 1; j--)
                    xpart[i][j] = (xpart[i][j] - apart[i][j+1] * xpart[i][j+1]) / bpart[i][j];

            #pragma omp master
            {
                MPI_Gather(xpart, N*N/P, MPI_FLOAT, x, N*N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
                MPI_Gather(bpart, N*N/P, MPI_FLOAT, b, N*N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
            }
            #pragma omp barrier
ADI
• MPI+OpenMP Code

            #pragma omp sections
            {
                #pragma omp section
                { trans(x, N, N); }
                #pragma omp section
                { trans(b, N, N); }
                #pragma omp section
                { trans(a, N, N); }
            }
            #pragma omp barrier
            #pragma omp master
            {
                MPI_Scatter(x, N * N/P, MPI_FLOAT, xpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
                MPI_Scatter(b, N * N/P, MPI_FLOAT, bpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
            }
            l = myrank*(N/P);
            for (m = sindex; m < eindex; m++) {
                for (n = 0; n < N; n++)
                    apart[m][n] = a[l+m][n];
                l++;
            }
        }
        #pragma omp barrier
    }
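The sections construct above can keep at most three threads busy during the transposes (the ompP profile later shows the remaining threads waiting at the exit barrier). An alternative, sketched below under the assumption that the matrices are square N x N arrays of float (the signature of the original trans() is not shown in the slides), is to parallelize a single in-place transpose over its rows:

    /* Hypothetical parallel in-place transpose (not from the original slides).
       Iteration i swaps the strict upper-triangle entries of row i with
       column i, so different iterations touch disjoint element pairs and the
       loop is safe to parallelize. */
    static void trans_parallel(float (*m)[N])
    {
        int i, j;
        #pragma omp parallel for private(j) schedule(static)
        for (i = 0; i < N; i++)
            for (j = i + 1; j < N; j++) {
                float t = m[i][j];
                m[i][j] = m[j][i];
                m[j][i] = t;
            }
    }

If this were called from inside the existing parallel region, the nested parallel for would either need nested parallelism enabled or should be written as an orphaned #pragma omp for instead.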
ADI • MPI+OpenMP Performance
ADI
• ompP results

R00002 adi_mpi_scatter_openmp.c (89-96) LOOP
 TID   execT  execC  bodyT  exitBarT  taskT
   0    0.05    200   0.05      0.00   0.00
   1    0.05    200   0.05      0.00   0.00
   2    0.08    200   0.08      0.00   0.00
   3    0.08    200   0.08      0.00   0.00
   4    0.08    200   0.08      0.00   0.00
   5    0.08    200   0.08      0.00   0.00
   6    0.08    200   0.08      0.00   0.00
   7    0.08    200   0.08      0.00   0.00
 SUM    0.58   1600   0.58      0.00   0.00

R00003 adi_mpi_scatter_openmp.c (99-104) LOOP
 TID   execT  execC  bodyT  exitBarT  taskT
   0    0.06    200   0.05      0.01   0.00
   1   34.23    200   0.05     34.18   0.00
   2   34.22    200   0.05     34.17   0.00
   3   34.22    200   0.05     34.17   0.00
   4   34.21    200   0.05     34.16   0.00
   5   34.20    200   0.05     34.15   0.00
   6   34.21    200   0.05     34.16   0.00
   7   34.20    200   0.05     34.15   0.00
 SUM  239.54   1600   0.39    239.14   0.00
ADI
• ompP results

R00005 adi_mpi_scatter_openmp.c (113) BARRIER
 TID   execT  execC  taskT
   0    0.00    200   0.00
   1   64.29    200   0.00
   2   64.29    200   0.00
   3   64.29    200   0.00
   4   64.29    200   0.00
   5   64.29    200   0.00
   6   64.29    200   0.00
   7   64.29    200   0.00
 SUM  450.02   1600   0.00

R00004 adi_mpi_scatter_openmp.c (106-111) MASTER
 TID   execT  execC
   0   64.28    200
 SUM   64.28    200

R00006 adi_mpi_scatter_openmp.c (116-130) SECTIONS
 TID   execT  execC  sectT  sectC  exitBarT  mgmtT  taskT
   0    0.85    200   0.85    200      0.00   0.00   0.00
   1    0.85    200   0.83    200      0.02   0.00   0.00
   2    0.85    200   0.44    200      0.41   0.00   0.00
   3    0.85    200   0.00      0      0.85   0.00   0.00
   4    0.85    200   0.00      0      0.85   0.00   0.00
   5    0.85    200   0.00      0      0.85   0.00   0.00
   6    0.85    200   0.00      0      0.85   0.00   0.00
   7    0.85    200   0.00      0      0.85   0.00   0.00
 SUM    6.80   1600   2.12    600      4.67   0.01   0.00
ADI
• ompP results

R00007 adi_mpi_scatter_openmp.c (132) BARRIER
 TID   execT  execC  taskT
   0    0.00    200   0.00
   1    0.00    200   0.00
   2    0.00    200   0.00
   3    0.00    200   0.00
   4    0.00    200   0.00
   5    0.00    200   0.00
   6    0.00    200   0.00
   7    0.00    200   0.00
 SUM    0.01   1600   0.00

R00008 adi_mpi_scatter_openmp.c (134-138) MASTER
 TID   execT  execC
   0   34.46    200
 SUM   34.46    200

R00009 adi_mpi_scatter_openmp.c (149) BARRIER
 TID   execT  execC  taskT
   0    0.00      1   0.00
   1    0.28      1   0.00
   2    0.28      1   0.00
   3    0.28      1   0.00
   4    0.28      1   0.00
   5    0.28      1   0.00
   6    0.28      1   0.00
   7    0.28      1   0.00
 SUM    1.94      8   0.00
ADI
• mpiP results

---------------------------------------------------------------------------
@--- Aggregate Time (top twenty, descending, milliseconds) ----------------
---------------------------------------------------------------------------
Call       Site      Time    App%   MPI%   COV
Gather        2  8.98e+04   23.32  23.52  0.00
Gather        6  6.57e+04   17.05  17.19  0.00
Gather        8  6.45e+04   16.74  16.89  0.00
Gather        4  6.17e+04   16.03  16.16  0.00
Scatter       4  3.39e+04    8.79   8.87  0.00
Scatter       8   3.1e+04    8.06   8.13  0.00
Scatter       6  2.96e+04    7.68   7.75  0.00
Scatter       2   5.4e+03    1.40   1.41  0.00
Bcast         7      49.5    0.01   0.01  0.00
Bcast         3      49.3    0.01   0.01  0.00
Bcast         5      47.8    0.01   0.01  0.00
Bcast         1        40    0.01   0.01  0.00
Scatter       1      30.5    0.01   0.01  0.00
Scatter       5      30.3    0.01   0.01  0.00
Scatter       7      30.3    0.01   0.01  0.00
Scatter       3      28.8    0.01   0.01  0.00
Reduce        1       1.8    0.00   0.00  0.00
Reduce        5     0.062    0.00   0.00  0.00
Reduce        3     0.049    0.00   0.00  0.00
Reduce        7     0.049    0.00   0.00  0.00
Thanks • Q & A • Any Suggestions?