
Lecture 4 : Accelerated Cascading and Parallel List Ranking




  1. Lecture 4: Accelerated Cascading and Parallel List Ranking
  • We will first discuss a technique called accelerated cascading for designing very fast parallel algorithms.
  • We will then study a very important technique for ranking the elements of a list in parallel.
  Advanced Topics in Algorithms and Data Structures

  2. Fast computation of maximum
  Input: An array A holding p elements from a linearly ordered universe S. We assume that all the elements in A are distinct.
  Output: The maximum element of the array A.
  We use a boolean array M such that M(k) = 1 if and only if A(k) is the maximum element in A.
  Initialization: We allocate p processors to set each entry of M to 1.

  3. Fast computation of maximum
  Step 1: Assign p processors to each element of A, p^2 processors overall.
  • Consider the p processors allocated to A(j). We name these processors P1, P2, ..., Pi, ..., Pp.
  • Pi compares A(j) with A(i): if A(i) > A(j) then M(j) := 0, else do nothing.

  4. Fast computation of maximum
  Step 2: At the end of Step 1, M(k), 1 ≤ k ≤ p, will be 1 if and only if A(k) is the maximum element.
  • We allocate p processors, one for each entry of M.
  • If the entry is 0, the processor does nothing.
  • If the entry is 1, it outputs the index k of the maximum element.
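The two steps above can be simulated sequentially. The sketch below (function name `crcw_max` is ours, not from the lecture) uses nested loops to play the roles of the p^2 processors; the repeated writes of 0 into M(j) model the Common CRCW rule, since all concurrent writers write the same value.

```python
def crcw_max(A):
    """Sequential sketch of the O(1)-time Common CRCW PRAM maximum.

    The two nested loops stand in for the p*p processors P_{i,j};
    on a real CRCW PRAM all comparisons happen in a single step.
    """
    p = len(A)
    M = [1] * p                      # initialization: M(k) = 1 for all k
    for j in range(p):               # the p processors allocated to A(j) ...
        for i in range(p):           # ... where Pi compares A(j) with A(i)
            if A[i] > A[j]:
                M[j] = 0             # concurrent writers all write 0
    for k in range(p):               # Step 2: one processor per entry of M
        if M[k] == 1:
            return k                 # index of the maximum element
```

For example, `crcw_max([3, 9, 4])` returns 1, the index of the maximum.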

  5. Fast computation of maximum
  Complexity: The processor requirement is p^2 and the time complexity is O(1).
  • Several processors may write M(j) := 0 simultaneously, but they all write the same value. We need this concurrent write facility and hence the Common CRCW PRAM model.

  6. Optimal computation of maximum
  • This is the same balanced binary tree algorithm that we used for adding n numbers: we combine pairs of elements level by level, taking the maximum instead of the sum.

  7. Optimal computation of maximum
  • This algorithm takes O(n) processors and O(log n) time.
  • We can reduce the processor complexity to O(n / log n) by first letting each processor find the maximum of log n elements sequentially. Hence the algorithm does optimal O(n) work.
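The binary tree algorithm can be sketched as repeated pairwise "tournament" rounds; each round runs in O(1) time on the PRAM, and the number of candidates halves, giving O(log n) rounds. (The function name `tree_max` is ours.)

```python
def tree_max(A):
    """Sequential sketch of the binary-tree maximum algorithm.

    Each while-loop iteration is one parallel round: one processor
    per pair keeps the larger element, halving the candidate set.
    """
    cand = list(A)
    while len(cand) > 1:
        nxt = [max(cand[i], cand[i + 1])       # one processor per pair
               for i in range(0, len(cand) - 1, 2)]
        if len(cand) % 2 == 1:
            nxt.append(cand[-1])               # odd element passes through
        cand = nxt
    return cand[0]
```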

  8. An O(log log n) time algorithm
  • Instead of a binary tree, we use a more complex tree. Assume that n = 2^(2^k).
  • The root of the tree has 2^(2^(k-1)) = √n children.
  • Each node at the i-th level has 2^(2^(k-i-1)) children, for 1 ≤ i ≤ k-1.
  • Each node at level k has two children.

  9. An O(log log n) time algorithm
  Some Properties
  • The depth of the tree is O(log log n), since n = 2^(2^k) gives k = log log n.
  • The number of nodes at the i-th level is 2^(2^k - 2^(k-i)). Prove this by induction.
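The induction can be checked numerically. The sketch below (our own helper, assuming the construction just described) builds the level sizes from the children counts and compares them with the closed form 2^(2^k - 2^(k-i)).

```python
def level_sizes(k):
    """Node counts per level of the doubly-logarithmic tree with
    n = 2^(2^k) leaves (construction assumed from the lecture)."""
    n = 2 ** (2 ** k)
    nodes = [1]                               # the root, at level 0
    for i in range(k):                        # levels 0 .. k-1
        children = 2 ** (2 ** (k - i - 1))    # 2^(2^(k-i-1)) children each
        nodes.append(nodes[-1] * children)
    nodes.append(nodes[-1] * 2)               # level-k nodes have two children
    return n, nodes

n, nodes = level_sizes(3)                     # k = 3, so n = 256
assert nodes[-1] == n                         # the n elements sit at the leaves
for i in range(4):                            # closed form holds for levels 0..k
    assert nodes[i] == 2 ** (2 ** 3 - 2 ** (3 - i))
```

For k = 3 the level sizes come out as 1, 16, 64, 128, 256, matching the closed form at every level.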

  10. An O(log log n) time algorithm
  The Algorithm
  • The algorithm proceeds level by level, starting from the leaves.
  • At every level, we compute the maximum of all the children of each internal node by the O(1) time algorithm.
  • The time complexity is O(log log n), since the depth of the tree is O(log log n).
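An equivalent recursive view: split the input into about √n groups of about √n elements, find each group's maximum (one level deeper in the tree), and combine the group maxima with the O(1) time algorithm. The recursion depth satisfies T(n) = T(√n) + O(1) = O(log log n). A sequential sketch under these assumptions (`dlog_max` is our own name; Python's built-in `max` stands in for the O(1) CRCW combining step):

```python
import math

def dlog_max(A):
    """Sketch of the doubly-logarithmic-tree maximum.

    Groups of size ~sqrt(n) are reduced recursively; the ~sqrt(n)
    group maxima would then be combined by the O(1)-time CRCW
    algorithm, modelled here by Python's max().
    """
    n = len(A)
    if n <= 2:
        return max(A)
    g = max(2, math.isqrt(n))                 # group size ~ sqrt(n)
    maxima = [dlog_max(A[i:i + g]) for i in range(0, n, g)]
    return max(maxima)                        # stand-in for the O(1) CRCW step
```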

  11. An O(log log n) time algorithm
  Total Work:
  • Recall that the O(1) time algorithm needs O(p^2) work for p elements.
  • Each node at the i-th level has 2^(2^(k-i-1)) children.
  • So the total work for each node at the i-th level is O((2^(2^(k-i-1)))^2) = O(2^(2^(k-i))).

  12. An O(log log n) time algorithm
  Total Work:
  • There are 2^(2^k - 2^(k-i)) nodes at the i-th level. Hence the total work for the i-th level is:
  O(2^(2^k - 2^(k-i)) · 2^(2^(k-i))) = O(2^(2^k)) = O(n).
  • For O(log log n) levels, the total work is O(n log log n). This is suboptimal.

  13. Accelerated cascading
  • The first algorithm, which is based on a binary tree, is optimal but slow.
  • The second algorithm is suboptimal, but very fast.
  • We combine these two algorithms through the accelerated cascading strategy.

  14. Accelerated cascading
  • We start with the optimal algorithm until the size of the problem is reduced to a certain value.
  • Then we use the suboptimal but very fast algorithm.

  15. Accelerated cascading
  Phase 1:
  • We apply the binary tree algorithm, starting from the leaves, for the first log log log n levels.
  • The number of candidates reduces to n / 2^(log log log n) = n / log log n.
  • The total work done so far is O(n) and the total time is O(log log log n).

  16. Accelerated cascading
  Phase 2:
  • In this phase, we use the fast algorithm on the remaining n / log log n candidates.
  • The total work is O((n / log log n) · log log n) = O(n).
  • The total time is O(log log log n) + O(log log n) = O(log log n).
  • Theorem: The maximum of n elements can be computed in O(log log n) time and O(n) work on the Common CRCW PRAM.
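The two phases can be sketched together. In this sequential simulation (`cascaded_max` is our own name), Phase 1 runs about log log log n pairwise rounds of the binary-tree algorithm, and Python's `max` then stands in for the fast O(log log n)-time doubly-logarithmic algorithm of Phase 2.

```python
import math

def cascaded_max(A):
    """Sequential sketch of accelerated cascading for maximum.

    Phase 1: ~log log log n pairwise rounds shrink the candidate set
    to roughly n / log log n elements, doing O(n) work in total.
    Phase 2: the fast doubly-logarithmic algorithm (modelled here by
    Python's max) finishes on the survivors.
    """
    cand = list(A)
    x = float(len(cand))
    for _ in range(3):                    # x becomes ~log log log n
        x = math.log2(max(2.0, x))        # clamp so the log stays defined
    rounds = math.ceil(x)
    for _ in range(rounds):               # Phase 1: binary-tree rounds
        if len(cand) == 1:
            break
        nxt = [max(cand[i], cand[i + 1])
               for i in range(0, len(cand) - 1, 2)]
        if len(cand) % 2 == 1:
            nxt.append(cand[-1])          # odd element passes through
        cand = nxt
    return max(cand)                      # Phase 2: fast algorithm
```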
