
Lecture 3: Parallel Algorithm Design


chibale

Presentation Transcript


  1. Lecture 3: Parallel Algorithm Design

  2. Techniques of Parallel Algorithm Design
     • Balanced binary tree
     • Pointer jumping
     • Accelerated cascading
     • Divide and conquer
     • Pipelining
     • Multi-level divide and conquer
     • ...

  3. Balanced binary tree
     • Processing on a binary tree: let the leaves correspond to the input and the internal nodes to processors.
     • Example: find the sum of n integers (x1, x2, ..., xn).
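
The summation example can be sketched in Python as follows. This is only a sequential simulation under stated assumptions: the function name `tree_sum` is hypothetical, each pass of the `while` loop stands for one parallel round in which every internal node adds its two children, and odd-length levels are padded with 0 (the slides do not say how to handle them).

```python
def tree_sum(xs):
    """Sum xs by pairwise addition, level by level, as on a balanced binary tree.

    Returns (total, rounds), where rounds is the number of parallel rounds,
    i.e. the height of the tree.
    """
    level = list(xs)
    if not level:
        return 0, 0
    rounds = 0
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(0)  # assumption: pad so that every node has two children
        # One parallel round: node j adds its children 2j and 2j+1.
        level = [level[2 * j] + level[2 * j + 1] for j in range(len(level) // 2)]
        rounds += 1
    return level[0], rounds


if __name__ == "__main__":
    print(tree_sum([5, 8, -7, -10, 3]))
```

With n inputs the loop runs about log2 n times, which is where the O(log n) running time of tree-based algorithms comes from.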

  4. Problem of finding Prefix Sum
     • Definition of Prefix Sum
       • Input: n integers stored in array A[1..n] in the shared memory
       • Output: array B[1..n], where for each i (1 ≤ i ≤ n), B[i] = A[1] + A[2] + ... + A[i]
     • Example
       • Input: A[1..5] = (5, 8, -7, -10, 3)
       • Output: B[1..5] = (5, 13, 6, -4, -1)
     • Sequential algorithm for Prefix Sum
         main () {
           B[1] = A[1];
           for (i = 2; i <= n; i++) {
             B[i] = B[i-1] + A[i];   /* each prefix reuses the previous one */
           }
         }

  5. Solving Prefix Sum problem on balanced binary tree (1)
     • Outline of the parallel algorithm for prefix sum
     • To simplify the problem, let n = 2^k (k is an integer).
       (1) Calculate the sub-sums from the leaves to the root, bottom-up.
       (2) Using the sub-sums obtained in (1), calculate the prefix sums from the root to the leaves, top-down.

  6. Solving Prefix Sum problem on balanced binary tree (2)
     (1) First read the input at the leaves. Then calculate the sub-sums from the leaves to the root, bottom-up.
     (2) From the root to the leaves, do the following at each node: send the right son the value the node currently holds (at the root, its sub-sum obtained in (1)), and send the left son that value minus the right son's sub-sum.
     [Figure: worked example on a balanced binary tree with processors P1 to P4. The root holds the total 12; it sends 12 to its right son and 12 - 10 = 2 to its left son, and the same rule is applied at each lower level.]

  7. Solving Prefix Sum problem on balanced binary tree (3)
     • Correctness of the algorithm
     • When step (1) has finished, the sub-sum in each internal node is the sum of the leaves in its subtree.

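  8. Solving Prefix Sum problem on balanced binary tree (4)
     • Correctness of the algorithm (continued)
     • In step (2), each internal node holds the prefix sum up to the rightmost leaf of its subtree (at the root, this is the sum of the whole tree).
       (a) The value sent to the right son is this same prefix sum, because the node's subtree and the right son's subtree end at the same leaf.
       (b) The value sent to the left son is this prefix sum minus the sum of the right son's subtree, which is the prefix sum up to the rightmost leaf of the left son's subtree.
     [Figure: cases (a) and (b) on the example tree, where the root holds 12 and sends 12 - 10 = 2 to its left son.]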

  9. Solving Prefix Sum problem on balanced binary tree (5)
     • Algorithm Parallel-PrefixSum (EREW PRAM algorithm)
         main () {
           /* processor i copies its input to leaf i */
           if (processor number == i) B[0, i] = A[i];
           /* step (1): bottom-up sub-sums */
           for (h = 1; h <= log n; h++) {
             if (processor number j <= n/2^h) {
               B[h, j] = B[h-1, 2j-1] + B[h-1, 2j];
             }
           }
           /* step (2): top-down prefix sums; C[0, i] is the i-th prefix sum */
           C[log n, 1] = B[log n, 1];
           for (h = (log n) - 1; h >= 0; h--) {
             if (processor number j <= n/2^h) {
               if (j is even) C[h, j] = C[h+1, j/2];
               if (j is odd)  C[h, j] = C[h+1, (j+1)/2] - B[h, j+1];
             }
           }
         }
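
The algorithm above can be simulated sequentially in Python to check the index arithmetic. This is a sketch under assumptions, not a PRAM implementation: the name `parallel_prefix_sum` is hypothetical, the inner `for j` loops stand for the processors that would run at the same time, and the input length must be a power of two as the slides assume.

```python
def parallel_prefix_sum(a):
    """Simulate the two-pass balanced-tree prefix sum (B bottom-up, C top-down)."""
    n = len(a)
    levels = n.bit_length() - 1  # log2(n) when n is a power of two
    if n == 0 or (1 << levels) != n:
        raise ValueError("len(a) must be a power of two")

    # B[h][j] and C[h][j] use 1-based j as in the slides; index 0 is unused.
    B = [[0] * ((n >> h) + 1) for h in range(levels + 1)]
    C = [[0] * ((n >> h) + 1) for h in range(levels + 1)]

    for i in range(1, n + 1):
        B[0][i] = a[i - 1]

    # Step (1): sub-sums from the leaves to the root.
    for h in range(1, levels + 1):
        for j in range(1, (n >> h) + 1):  # one processor per j
            B[h][j] = B[h - 1][2 * j - 1] + B[h - 1][2 * j]

    # Step (2): prefix sums from the root to the leaves.
    C[levels][1] = B[levels][1]
    for h in range(levels - 1, -1, -1):
        for j in range(1, (n >> h) + 1):  # one processor per j
            if j % 2 == 0:
                C[h][j] = C[h + 1][j // 2]  # right son inherits the parent's value
            else:
                C[h][j] = C[h + 1][(j + 1) // 2] - B[h][j + 1]  # minus right sibling
    return C[0][1:]


if __name__ == "__main__":
    # The slide's example, padded with zeros to length 8.
    print(parallel_prefix_sum([5, 8, -7, -10, 3, 0, 0, 0]))
```

The first five outputs reproduce the example B[1..5] = (5, 13, 6, -4, -1); the zero padding only repeats the last prefix sum.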

  10. Solving Prefix Sum problem on balanced binary tree (6)
     • (First step)
     [Figure: the tree of sub-sums for n = 8. The leaves B[0,1] to B[0,8] are copied from A[1] to A[8]; above them are B[1,1] to B[1,4], then B[2,1] and B[2,2], with B[3,1] at the root.]

  11. Solving Prefix Sum problem on balanced binary tree (7)
     • (Second step)
     [Figure: the tree of prefix sums for n = 8. C[3,1] is at the root; below it are C[2,1] and C[2,2], then C[1,1] to C[1,4], with C[0,1] to C[0,8] at the leaves.]

  12. Solving Prefix Sum problem on balanced binary tree (8)
     • Analysis of the algorithm
       • Computing time: each for loop is repeated log n times and each iteration can be executed in O(1) time → O(log n) time
       • Number of processors: not larger than n → n processors
       • Speed-up: O(n/log n)
       • Cost: O(n log n). It is not cost optimal, since the optimal sequential algorithm runs in Θ(n) time.

  13. Accelerated cascading
     • To reduce the cost, solve the problem sequentially when the size of the problem is small.
     • Accelerated cascading is usually used together with the balanced binary tree and divide and conquer techniques.

  14. Accelerated cascading for Prefix Sum problem
     • Policy for improving the algorithm
       • To make the algorithm cost optimal, we decrease the number of processors from n to n/log n.
       • (Note: the computing time of the algorithm is O(log n).)
     • Steps:
       • Instead of processing n elements in parallel, divide the n elements into n/log n groups of log n elements each.
       • Assign one processor to each group and solve the problem for the group sequentially.

  15. Accelerated cascading for Prefix Sum problem
     • Improved algorithm Parallel-PrefixSum
       (1) Divide the n elements of A[1..n] into n/log n groups of log n elements each.
           (O(1) time, O(n/log n) processors)
       (2) Assign one processor to each group and find the prefix sums of each group.
           (O(log n) time, O(n/log n) processors)
     [Figure: array A[1..n] split into groups of log n elements.]

  16. Accelerated cascading for Prefix Sum problem (3)
     • Improved algorithm Parallel-PrefixSum (continued)
       (3) Let S be the set of the last elements of the groups' prefix sums (each is the sum of its group). Use algorithm Parallel-PrefixSum to find the prefix sums of S.
           (O(log(n/log n)) = O(log n) time, O(n/log n) processors)
     [Figure: the last element of each group is passed to algorithm Parallel-PrefixSum.]

  17. Accelerated cascading for Prefix Sum problem (4)
     • Improved algorithm Parallel-PrefixSum (continued)
       (4) Use the prefix sums of S to find the prefix sums of the input A[1..n].
           (O(log n) time, O(n/log n) processors)
     [Figure: the result of (3) applied to the groups.]
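
Steps (1) to (4) of the improved algorithm can be sketched in Python as below. The sketch rests on several assumptions: the name `blocked_prefix_sum` is hypothetical, the group size is taken as ceil(log2 n) so the last group may be shorter, each group's loop stands for one processor working sequentially, and `itertools.accumulate` is used in step (3) only as a stand-in for the tree-based Parallel-PrefixSum.

```python
import math
from itertools import accumulate


def blocked_prefix_sum(a):
    """Prefix sums via accelerated cascading: groups of about log n elements."""
    n = len(a)
    if n == 0:
        return []
    g = max(1, math.ceil(math.log2(n)))  # assumed group size, about log n

    # Step (1): divide A into groups of g elements.
    groups = [a[s:s + g] for s in range(0, n, g)]

    # Step (2): one processor per group computes the local prefix sums.
    local = [list(accumulate(group)) for group in groups]

    # Step (3): S = last element of each group; take the prefix sums of S.
    # (Stand-in: the slides use algorithm Parallel-PrefixSum here.)
    s_prefix = list(accumulate(p[-1] for p in local))

    # Step (4): add the total of all earlier groups to each local prefix sum.
    out = []
    for k, p in enumerate(local):
        offset = s_prefix[k - 1] if k > 0 else 0
        out.extend(x + offset for x in p)
    return out


if __name__ == "__main__":
    print(blocked_prefix_sum([5, 8, -7, -10, 3]))
```

Only about n/log n values enter the tree-based step, which is why n/log n processors are enough while the time stays O(log n).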

  18. Accelerated cascading for Prefix Sum problem (5)
     • Analysis of the improved algorithm
       • Computing time and number of processors:
         • Each step: O(log n) time, O(n/log n) processors
         • → In total, O(log n) time, O(n/log n) processors
       • Speed-up = O(n/log n)
       • Cost: O(log n × n/log n) = O(n)
     • It is cost optimal.
     • It is also time optimal (the proof is not shown here).
     • It is an optimal algorithm.

  19. Divide and conquer
     • (1) 2 divide and conquer
     • (2) n^ε divide and conquer (ε < 1)

  20. Divide and conquer technique
     • Well-known technique in algorithm design
       • Solves problems recursively
       • Used very often in both sequential and parallel algorithms
     • How to divide and conquer
       (1) Dividing step: divide the problem into a number of subproblems.
       (2) Conquering step: solve each subproblem recursively.
       (3) Merging step: merge the solutions of the subproblems into the solution of the original problem.

  21. Convex hull problem
     • Input: a set of n points in the plane.
     • Output: the smallest convex polygon that contains all points of the input. (The convex polygon is represented by the list of its vertices in clockwise order.)
     • A basic problem in computational geometry.
     • Has many applications.
     • Solved in O(n log n) time sequentially.
     • In the following we only consider the upper convex hull.
     [Figure: points P0 to P9 with the upper and lower convex hulls marked. Output: (P0, P3, P5, P9, P8, P1). Upper convex hull: (P9, P8, P1, P0).]
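
For reference, a sequential O(n log n) upper hull can be sketched in Python. The slides do not say which sequential algorithm they have in mind; this sketch uses Andrew's monotone chain as an assumed choice, and the names `cross` and `upper_hull` are hypothetical. It lists the hull from left to right, whereas the slide's example lists it from right to left.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b; negative means a right turn at a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull(points):
    """Upper convex hull of 2-D points, from the leftmost to the rightmost point."""
    pts = sorted(set(points))  # sort by x, then y: the O(n log n) part
    hull = []
    for p in pts:
        # Drop the last vertex while it does not make a strict right turn.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


if __name__ == "__main__":
    pts = [(0, 0), (1, 2), (2, 1), (3, 3), (4, 0), (2, -1)]
    print(upper_hull(pts))
```

After sorting, the scan itself is linear because each point is pushed and popped at most once.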

  22. Merging of two upper convex hulls
     • Finding the upper common tangent
     • It is known that common tangents can be found in O(log n) time sequentially.
     [Figure: two upper convex hulls over points p1 to p10; common upper tangent = (p3, p8).]

  23. 2 divide and conquer (1)
     • Outline of the algorithm Parallel-UpperConvexHull
       • Preprocessing: sort all the points according to their x coordinates, and let the result be the sequence (p1, p2, p3, ..., pn).
       (1) If the size of the sequence is 2, return the sequence.
       (2) Divide (p1, p2, p3, ..., pn) into the left half and the right half, and find the upper convex hull of each recursively.
       (3) Find the upper common tangent of the two upper convex hulls obtained in (2), and output the solution of the problem.

  24. 2 divide and conquer (2)
     • How 2 divide and conquer works
       • Find the upper common tangent for two upper convex hulls of two vertices each.
       • Find the upper common tangent for two upper convex hulls of four vertices each.
       • Find the upper common tangent for two upper convex hulls of eight vertices each.

  25. 2 divide and conquer (3)
     • Recursive execution
       • When the problem is divided once, the size of the subproblems becomes half.
       • Suppose the size of the subproblems becomes 2 when the problem is divided k times: n/2^k = 2 ⇒ k = log2 n - 1
     [Figure: recursion tree with subproblem sizes n, n/2, n/4, ..., 2; height = log n.]

  26. 2 divide and conquer (4)
     • Algorithm
       • Preprocessing: sort the sequence of the points according to their x coordinates.
       (1) If the size of the sequence is 2, return the sequence.
       (2) Divide the sequence into the left half and the right half, and find the upper convex hull of each recursively.
       (3) Find the upper common tangent of the two upper convex hulls obtained in (2), and output the upper convex hull of the sequence.
     • Complexity of the algorithm
       • Preprocessing: O(log n) time, n processors
       • Steps (1) to (3): each step runs in O(log n) time and uses n/2 processors.
         T(n) = T(n/2) + O(log n). Therefore, T(n) = O(log^2 n).
       • ∴ The algorithm runs in O(log^2 n) time using O(n) processors.
     • Computational model: there is no concurrent access ⇒ EREW PRAM

  27. 2 divide and conquer (5)
     • Finding the complexity of the algorithm from the recursion tree
         Subproblem size   Processors   Time
         n                 1            c log n
         n/2               2            c log(n/2)
         n/4               4            c log(n/4)
         ...               ...          ...
         2                 n/2          c
     • Computing time: the tree has height log n, so the total is O(log^2 n).
     • Number of processors: at the level of the leaves, n/2 processors are used at the same time ⇒ n/2 processors
     • T(n) × P(n) = O(n log^2 n). It is not cost optimal.
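
The total time along one root-to-leaf path of the recursion tree can be written out as a sum. This is a sketch assuming that n is a power of two, that logarithms are base 2, and that the work at a subproblem of size m costs exactly c log m for some constant c:

```latex
\begin{align*}
T(n) &= \sum_{i=0}^{\log_2 n - 1} c \,\log_2\!\frac{n}{2^{i}}
      = c \sum_{i=0}^{\log_2 n - 1} \left(\log_2 n - i\right) \\
     &= c \cdot \frac{\log_2 n \,(\log_2 n + 1)}{2}
      = \Theta(\log^2 n), \\
T(n) \times P(n) &= \Theta(\log^2 n) \cdot \frac{n}{2} = \Theta(n \log^2 n).
\end{align*}
```

Since the sequential algorithm takes O(n log n) time, the extra log n factor is what makes this version not cost optimal.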

  28. n^(1/2) divide and conquer
     • Outline of the algorithm
       • Preprocessing: sort the sequence of the input points according to their x coordinates, and let the result be the sequence (p1, p2, p3, ..., pn).
       (1) If the size of the sequence is 2, return the sequence.
       (2) Divide (p1, p2, p3, ..., pn) into n^(1/2) equally sized subsequences (n^(1/2) points each), and find the upper convex hull of each recursively.
       (3) Merge the upper convex hulls into the upper convex hull of the sequence.

  29. Merging upper convex hulls
     • Assign n^(1/2) processors to each upper convex hull to find the upper common tangents in O(log n) time, and then determine the edges that belong to the solution.
     [Figure: Case 1 and Case 2 of merging the n^(1/2) upper convex hulls.]

  30. Recursion tree of n^(1/2) divide and conquer
     • When the problem is divided once, the size of the subproblems becomes n^(1/2).
     • Suppose that the size of the subproblems becomes 2 when the problem is divided k times: n^(1/2^k) = 2 ⇒ k = log log n
     [Figure: recursion tree with subproblem sizes n, n^(1/2), n^(1/4), ..., 2; height = log log n.]

  31. Analysis of the algorithm
     • Algorithm
       • Preprocessing: sort the sequence of the points by their x coordinates.
       (1) If the size of the sequence is 2, return the sequence.
       (2) Divide the sequence into n^(1/2) equally sized subsequences, and find the upper convex hull of each recursively.
       (3) Find the upper common tangents of the upper convex hulls obtained in (2), and determine the solution.
     • Complexity
       • Preprocessing: O(log n) time, n processors.
       • Steps (1) to (3): each step takes O(log n) time with n processors.
         T(n) = T(n^(1/2)) + O(log n), therefore T(n) = O(log n).
       • ∴ In total, the algorithm runs in O(log n) time using O(n) processors.
       • T(n) × P(n) = O(n log n). Optimal!
     • Computational model
       • Concurrent reading happens in the procedure of finding the upper common tangents ⇒ CREW PRAM
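
The recurrence for the n^(1/2) version can be unrolled to see why the time stays logarithmic. This is a sketch assuming base-2 logarithms and that the merge at a subproblem of size m costs exactly c log m for some constant c:

```latex
\begin{align*}
T(n) &= T\!\left(n^{1/2}\right) + c \log_2 n \\
     &= c \log_2 n + c \log_2 n^{1/2} + c \log_2 n^{1/4} + \cdots \\
     &= c \log_2 n \left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\right)
      \le 2c \log_2 n = O(\log n), \\
T(n) \times P(n) &= O(\log n) \cdot O(n) = O(n \log n).
\end{align*}
```

The merge costs shrink geometrically from level to level, so the root dominates the sum; in the 2 divide and conquer case they shrink only by a constant per level.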

  32. Exercise
     1. Suppose an n×n matrix A and an n×n matrix B are stored in two-dimensional arrays. Design PRAM algorithms for A×B using n processors and n×n processors, respectively. Answer the following questions:
        • Which PRAM models do you use in your algorithms?
        • What are the running times?
        • Are your algorithms cost optimal?
        • Are your algorithms time optimal?
     2. Design a PRAM algorithm for A×B using k processors (k ≤ n×n). Answer the same questions.
