Parallel Techniques 之易并行计算

Parallel Techniques之易并行计算 Lecture 4 张少强 http://bioinfo.uncc.edu/szhang sqzhang@163.com

Parallel Techniques Embarrassingly Parallel Computations

Embarrassingly Parallel Computations A computation that can obviously be divided into a number of completely independent parts, each of which can be executed by a separate process(or).计算可以很显然地被分成一组完全独立的部分，每部分可分别由一个进程执行。进程之间没有或有很少的通信每个进程可以不与其他进程有交互来运行任务

Practical embarrassingly parallel computation with static process creation and master-slave approach

Embarrassingly Parallel Computation Examples • Low level image processing • Mandelbrot set • Monte Carlo Calculations

Low level image processing Many low level image processing operations only involve local data with very limited if any communication between areas of interest.

Image coordinate system Origin (0,0) x . (x, y) y

Some geometrical operations Shifting平移 Object shifted by Dx in x-dimension and Dy in y-dimension: x¢ = x + Dx y¢ = y + Dy where x and y are original coordinates and x¢ and y¢ are the new coordinates. x y Dx Dy

Some geometrical operations Scaling缩放 Object scaled by factor Sxin x-direction and Syin y-direction: x¢ = xSx y¢ = ySy x y 3.8

Rotation翻转 Object rotated through an angle q about the origin of the coordinate system: x¢ = x cosq + y sinq y¢ = -x sinq + y cosq x y 3.8

Parallelizing Image Operations Partitioning into regions for individual processes Example Square region for each process (can also use strips) 3.9

主进程master process 480行，10行一组分割；分出48个从进程；每个进程处理640*10区域。 for(i=0,row=0;i<48;i++,row=row+10) Send(row,P(i)); /*send row to process i*/ for(i=0;i<480;i++) for(i=0;i<640;j++) temp_map[i][j]=0; for(i=0;i<(640*480);i++) Recv(oldrow,oldcol,newrow,newcol,P_any); if!((newrow<0)||(newrow>=480)||(newcol<0)||(newcol>=640)) temp_map[newrow][newcol]=map[oldrow][oldcol]; } for(i=0;i<480;i++) for(j=0;j<640;j++) map[i][j]=temp_map[i][j];

从进程slave process 平移的从进程 Recv(row, P(0)); /*revecive from master process*/ for(oldrow=row;oldrow<(row+10);oldrow++) for(oldcol=0;oldcol<640;oldcol++){ newrow=oldrow+delta_x; newcol=oldcol+delta_y; Send(oldrow,oldcol,newrow,newcol,P(0)); } 从进程之间没有通信

Mandelbrot Set曼德博集合 是一种在复平面上组成分形的点的集合 Set of points in a complex plane that are quasi-stable (will increase and decrease, but not exceed some limit) when computed by iterating the function: where zk +1 is the (k + 1)th iteration of complex number: z = a + bi and c is a complex number giving position of point in the complex plane. The initial value for z is zero.

Mandelbrot Set continued Zk 或者延伸到无限大，或者只停留在有限半径的圆盘内。曼德博集合就是使以上序列不延伸至无限大的所有c点的集合。 • Iterations continued until magnitude of z is: • Greater than 2 • or • Number of iterations reaches arbitrary limit. • Magnitude of z is the length of the vector given by:

对一点(像素)的值进行计算并返回迭代次数(色)对一点(像素)的值进行计算并返回迭代次数(色) structure complex { float real; float imag; }; int cal_pixel(complex c) { int count, max; complex z; float temp, lengthsq; max = 256; /*若令256为黑色*/ z.real = 0; z.imag = 0; count = 0; /* number of iterations */ do { temp = z.real * z.real - z.imag * z.imag + c.real; z.imag = 2 * z.real * z.imag + c.imag; z.real = temp; lengthsq = z.real * z.real + z.imag * z.imag; count++; } while ((lengthsq < 4.0) && (count < max)); return count; }

Mandelbrot set

Parallelizing Mandelbrot Set Computation Static Task Assignment Simply divide the region in to fixed number of parts, each computed by a separate processor. Not very successful because different regions require different numbers of iterations and time.

Dynamic Task Assignment Have processor request regions after computing previous regions

动态主进程 Count=0; Row=0; For(k=0;k<num_proc;k++) /*进程数<row/ send(&row,Pk,data_tag); count++; row++; } do{ recv(&slave,&r,color,P_any,result_tag); count--； if(row <disp_height){ send(&row,P_slave,data_tag); row++; count++; }else send(&row, P_slave,terminator_tag); display(); }while(count>0)

Monte Carlo Methods Another embarrassingly parallel computation. Monte Carlo methods use of random selections.

2 x 2的正方形内接一个圆. 圆与正方形的面具比值为: 在正方形内随机选择很多“点”. 记下有多少个“点”落在圆中。 Fraction of points within circle will be , given sufficient number of randomly selected samples.

Computing an Integral One quadrant can be described by integral Random pairs of numbers, (xr,yr) generated, each between 0 and 1. Counted as in circle if

Alternative (better) Method Use random values of x to compute f(x) and sum values of f(x): where xrare randomly generated values of x between x1 and x2. Monte Carlo method very useful if the function cannot be integrated numerically (maybe having a large number of variables)

Example Computing the integral Sequential Code sum = 0; for (i = 0; i < N; i++) { /* N random samples */ xr = rand_v(x1, x2); /* generate next random value */ sum = sum + xr * xr - 3 * xr; /* compute f(xr) */ } area = (sum / N) * (x2 - x1); Routine randv(x1, x2) returns a pseudorandom number between x1 and x2.

For parallelizing Monte Carlo code, must address best way to generate random numbers in parallel

/*********************************************************************************/********************************************************************************* pi_calc.cpp calculates value of pi and compares with actual value (to 25digits) of pi to give error. Integrates function f(x)=4/(1+x^2). **********************************************************************************/ #include <math.h> //include files #include <iostream.h> #include "mpi.h " void printit(); //function prototypes int main(int argc, char *argv[]) { double actual_pi = 3.141592653589793238462643; //for comparison later int n, rank, num_proc, i; double temp_pi, calc_pi, int_size, part_sum, x; char response = 'y', resp1 = 'y'; MPI::Init(argc, argv);//initiate MPI

num_proc = MPI::COMM_WORLD.Get_size(); rank = MPI::COMM_WORLD.Get_rank(); if (rank == 0) printit(); /* I am root node, print out welcome */ while (response == 'y') { if (resp1 == 'y') { if (rank == 0) { /*I am root node*/ cout <<"__________________________________" <<endl; cout <<"\nEnter the number of intervals: (0 will exit)" << endl; cin >> n;} } else n = 0; MPI::COMM_WORLD.Bcast(&n, 1, MPI::INT, 0); //broadcast n if (n==0) break; //check for quit condition

else { int_size = 1.0 / (double) n; //calcs interval size part_sum = 0.0; for (i = rank + 1; i <= n; i += num_proc) { //calcs partial sums x = int_size * ((double)i - 0.5); part_sum += (4.0 / (1.0 + x*x)); } temp_pi = int_size * part_sum; //collects all partial sums computes pi MPI::COMM_WORLD.Reduce(&temp_pi,&calc_pi, 1, MPI::DOUBLE, MPI::SUM, 0);

if (rank == 0) { /*I am server*/ cout << "pi is approximately " << calc_pi << ". Error is " << fabs(calc_pi - actual_pi) << endl <<"_______________________________________" << endl; } } //end else if (rank == 0) { /*I am root node*/ cout << "\nCompute with new intervals? (y/n)" << endl; cin >> resp1; } }//end while MPI::Finalize();//terminate MPI return 0; } //end main

//functions void printit() { cout << "\n*********************************" << endl << "Welcome to the pi calculator!" << endl << "You set the number of divisions \nfor estimating the integral: \n\tf(x)=4/(1+x^2)" << endl << "*********************************" << endl; } //end printit

Parallel Techniques 之易并行计算