Unit –VIII PRAM Algorithms. Classification of the PRAM model. In the PRAM model, processors communicate by reading from and writing to the shared memory locations. The power of a PRAM depends on the kind of access to the shared memory locations. Engineered for Tomorrow.
Classification of the PRAM model
In every clock cycle:
• In the Exclusive Read Exclusive Write (EREW) PRAM, each memory location can be accessed by only one processor.
• In the Concurrent Read Exclusive Write (CREW) PRAM, multiple processors can read from the same memory location, but only one processor can write to it.
• In the Concurrent Read Concurrent Write (CRCW) PRAM, multiple processors can read from or write to the same memory location.
The CRCW PRAM is further classified by how concurrent writes to the same memory location are resolved:
• In the Common CRCW PRAM, all the processors writing to a location must write the same value.
• In the Arbitrary CRCW PRAM, one of the processors arbitrarily succeeds in writing.
• In the Priority CRCW PRAM, processors have priorities associated with them and the highest-priority processor succeeds in writing.
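The three write rules can be made concrete with a small sequential simulation of a single memory cell in a single clock cycle. This is only an illustrative sketch: the function name `resolve_writes` and the (processor index, value) request format are assumptions, not part of the slides. The Priority rule here assumes that a smaller processor index means a higher priority, and the Arbitrary rule simply takes the first request as one legal outcome.

```python
def resolve_writes(requests, rule):
    """Resolve concurrent writes to ONE memory cell in one clock cycle.

    requests: list of (processor_index, value) pairs (hypothetical format).
    rule: "common", "arbitrary", or "priority".
    Returns the value that ends up in the cell.
    """
    if not requests:
        raise ValueError("no processor is writing")
    if rule == "common":
        values = {value for _, value in requests}
        if len(values) != 1:
            raise ValueError("Common CRCW: all writers must write the same value")
        return values.pop()
    if rule == "arbitrary":
        # Any single writer may win; taking the first request is one legal outcome.
        return requests[0][1]
    if rule == "priority":
        # Assumption: a smaller processor index means a higher priority.
        return min(requests, key=lambda request: request[0])[1]
    raise ValueError("unknown rule: " + rule)
```

For example, with requests `[(3, 'a'), (1, 'b')]` the Priority rule stores `'b'`, while the Common rule rejects the cycle because the two values differ.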
• The EREW PRAM is the weakest and the Priority CRCW PRAM is the strongest PRAM model.
• The relative powers of the different PRAM models, from weakest to strongest, are as follows: EREW ≤ CREW ≤ Common CRCW ≤ Arbitrary CRCW ≤ Priority CRCW.
• We say model A is less powerful than model B if either:
    • the time complexity for solving a problem is asymptotically less in model B than in model A, or
    • the time complexities are the same, but the processor or work complexity is asymptotically less in model B than in model A.
• An algorithm designed for a stronger PRAM model can be simulated on a weaker model either with asymptotically more processors (work) or with asymptotically more time.
Adding n numbers on a PRAM
[Figure: adding n numbers on a PRAM]
Adding n numbers on a PRAM
• This algorithm works on the EREW PRAM model as there are no read or write conflicts.
• We will use this algorithm to design a matrix multiplication algorithm on the EREW PRAM.
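The addition algorithm itself appears only as a figure in the slides. The sketch below assumes the usual balanced binary tree scheme, in which pairs of partial sums are combined in each parallel step; the name `pram_sum` and the step counter are illustrative assumptions. The inner loop stands in for one parallel step, and it is conflict free because each simulated processor touches only its own two cells.

```python
def pram_sum(values):
    """Simulate tree-based summation; returns (total, number_of_parallel_steps).

    Assumes len(values) is a power of two and at least 1.
    """
    a = list(values)
    n = len(a)
    steps = 0
    stride = 1
    while stride < n:
        # One parallel step: the processor at position k reads a[k] and
        # a[k + stride] and writes a[k]. No two processors share a cell,
        # so the step is legal on an EREW PRAM.
        for k in range(0, n, 2 * stride):
            a[k] = a[k] + a[k + stride]
        stride *= 2
        steps += 1
    return a[0], steps


total, steps = pram_sum([1, 2, 3, 4, 5, 6, 7, 8])
print(total, steps)  # 36 3
```

With n = 8 inputs the loop runs log n = 3 times, which matches the O(log n) time used later for matrix multiplication.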
Matrix multiplication
For simplicity, we assume that $n = 2^p$ for some integer p.
Matrix multiplication
• Each entry $c_{i,j} = \sum_{m=1}^{n} a_{i,m} \times b_{m,j}$ of the product can be computed in parallel.
• We allocate n processors for computing $c_{i,j}$. Suppose these processors are $P_1, P_2, \ldots, P_n$.
• In the first time step, processor $P_m$ computes the product $a_{i,m} \times b_{m,j}$.
• We now have n numbers, and we use the addition algorithm to sum these n numbers in log n time.
Matrix multiplication
• Computing each $c_{i,j}$ takes n processors and log n time.
• Since there are $n^2$ such $c_{i,j}$'s, we need overall $O(n^3)$ processors and O(log n) time.
• The processor requirement can be reduced to $O(n^3 / \log n)$.
• Hence, the work complexity is $O(n^3)$.
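The processor allocation described above can be sketched as a sequential simulation. The name `pram_matmul` and the three-dimensional `prod` array (one cell per simulated processor) are assumptions made for illustration. The sketch counts parallel steps rather than reducing the processor count to $O(n^3 / \log n)$.

```python
def pram_matmul(A, B):
    """Simulate the n^3-processor algorithm; returns (C, parallel_steps).

    Assumes A and B are n x n lists of lists with n a power of two.
    """
    n = len(A)
    # First parallel step: processor (i, j, m) computes a[i][m] * b[m][j].
    prod = [[[A[i][m] * B[m][j] for m in range(n)]
             for j in range(n)]
            for i in range(n)]
    steps = 1
    # Remaining steps: all n^2 groups run the tree summation at the same time.
    stride = 1
    while stride < n:
        for i in range(n):
            for j in range(n):
                for m in range(0, n, 2 * stride):
                    prod[i][j][m] += prod[i][j][m + stride]
        stride *= 2
        steps += 1
    C = [[prod[i][j][0] for j in range(n)] for i in range(n)]
    return C, steps


C, steps = pram_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C, steps)  # [[19, 22], [43, 50]] 2
```

The step count is 1 + log n: one step for the products and log n steps for the summation.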
• Every element $a_{i,m}$ is read at the same time by the n processors that compute $c_{i,1}, \ldots, c_{i,n}$, and every $b_{m,j}$ is likewise read by n processors. Hence our algorithm runs on the CREW PRAM, and we need to avoid the read conflicts to make it run on the EREW PRAM.
• We will create n copies of each of the elements $a_{i,j}$ (and $b_{i,j}$). Then one copy can be used for computing each $c_{i,j}$.
• We can create n copies of a number in O(log n) time using O(n) processors on the EREW PRAM.
• In the first step, one processor reads the number and creates a copy. Hence, there are two copies now.
• In the second step, two processors read these two copies and create four copies.
• Since the number of copies doubles in every step, n copies are created in O(log n) steps.
• Though we need n processors, the processor requirement can be reduced to O(n / log n).
• Since there are $n^2$ elements in the matrix A (and in B), we need $O(n^3 / \log n)$ processors and O(log n) time to create n copies of each element.
• After this, there are no read conflicts in our algorithm. The overall matrix multiplication algorithm now takes O(log n) time and $O(n^3 / \log n)$ processors on the EREW PRAM.
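The doubling scheme for creating copies can be sketched as follows. The function name `make_copies` and the array layout are assumptions for illustration; the inner loop stands in for one parallel step in which each simulated processor reads one existing copy and writes one new cell.

```python
def make_copies(x, n):
    """Create n copies of x by doubling; returns (copies, parallel_steps)."""
    copies = [None] * n
    copies[0] = x
    have = 1   # number of valid copies so far
    steps = 0
    while have < n:
        # One parallel step: processor k reads copies[k] and writes
        # copies[have + k]. Every cell is read by at most one processor
        # and written by at most one, so the step is legal on an EREW PRAM.
        new = min(have, n - have)
        for k in range(new):
            copies[have + k] = copies[k]
        have += new
        steps += 1
    return copies, steps


print(make_copies(7, 8))  # ([7, 7, 7, 7, 7, 7, 7, 7], 3)
```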
Parallel Algorithms
Parallel: perform more than one operation at a time.
PRAM model: Parallel Random Access Machine.
• Multiple processors are connected to a shared memory.
• Each processor can access any location in unit time.
• All processors can access memory in parallel.
• All processors can perform operations in parallel.
[Figure: processors $p_0, p_1, \ldots, p_{n-1}$ connected to a shared memory]
Concurrent vs. Exclusive Access
Four models:
• EREW: exclusive read and exclusive write
• CREW: concurrent read and exclusive write
• ERCW: exclusive read and concurrent write
• CRCW: concurrent read and concurrent write
Handling write conflicts:
• Common-write model: a concurrent write succeeds only if all processors write the same value.
• Arbitrary-write model: an arbitrary one succeeds.
• Priority-write model: the one with the smallest index succeeds.
EREW and CRCW are the most popular.
Synchronization and Control
Synchronization: a most important and complicated issue.
Suppose all processors are inherently tightly synchronized:
• All processors execute the same statements at the same time.
• There is no race among processors, i.e., they run at the same pace.
Termination control of a parallel loop:
• Depends on the state of all processors.
• Can be tested in O(1) time.
Pointer Jumping – list ranking
Given a singly linked list L with n objects, compute, for each object in L, its distance from the end of the list.
Formally, suppose next is the pointer field:
    d[i] = 0                  if next[i] = nil
    d[i] = d[next[i]] + 1     if next[i] ≠ nil
Serial algorithm: Θ(n).
List ranking – EREW algorithm
LIST-RANK(L) (in O(log n) time)
for each processor i, in parallel
    do if next[i] = nil
        then d[i] ← 0
        else d[i] ← 1
while there exists an object i such that next[i] ≠ nil
    do for each processor i, in parallel
        do if next[i] ≠ nil
            then d[i] ← d[i] + d[next[i]]
                 next[i] ← next[next[i]]
List ranking – EREW algorithm
[Figure: pointer jumping on a six-object list whose objects, in list order, are 3, 4, 6, 1, 0, 5. The d values after each stage are:]
(a) 1 1 1 1 1 0
(b) 2 2 2 2 1 0
(c) 4 4 3 2 1 0
(d) 5 4 3 2 1 0
List ranking – correctness of EREW algorithm
Loop invariant: for each i, the sum of the d values in the sublist headed by i is the correct distance from i to the end of the original list L.
Parallel memory must be synchronized: the reads on the right-hand side must occur before the writes on the left-hand side. Moreover, every processor reads d[i] first and then d[next[i]].
An EREW algorithm: every read and write is exclusive. For an object i, its own processor reads d[i], and then the processor of its predecessor reads d[i]. Writes all go to distinct locations.
List ranking – EREW algorithm running time
Running time O(log n):
• The initialization for loop runs in O(1).
• Each iteration of the while loop runs in O(1).
• There are exactly ⌈lg n⌉ iterations: each iteration transforms each list into two interleaved lists, one consisting of the objects in even positions and the other of the objects in odd positions. Thus, each iteration doubles the number of lists but halves their lengths.
• The termination test of the while loop (line 5) runs in O(1).
Define work = #processors × running time. Here the work is O(n log n).
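The pointer-jumping loop of LIST-RANK can be simulated sequentially as sketched below. The dictionary representation (object label mapped to successor label, with `None` for nil) and the round counter are assumptions for illustration. To respect the synchronization requirement, each round reads only the old `d` and `next` values and writes into fresh copies.

```python
def list_rank(next_):
    """next_ maps each object to its successor, or to None at the end of the list.

    Returns (d, rounds), where d[i] is the distance from i to the end.
    """
    nxt = dict(next_)
    d = {i: (0 if nxt[i] is None else 1) for i in nxt}
    rounds = 0
    while any(p is not None for p in nxt.values()):
        # Synchronous step: all reads use the old values, then all writes happen.
        new_d, new_nxt = dict(d), dict(nxt)
        for i in nxt:
            if nxt[i] is not None:
                new_d[i] = d[i] + d[nxt[i]]
                new_nxt[i] = nxt[nxt[i]]
        d, nxt = new_d, new_nxt
        rounds += 1
    return d, rounds


# The six-object list 3 -> 4 -> 6 -> 1 -> 0 -> 5 from the figure.
d, rounds = list_rank({3: 4, 4: 6, 6: 1, 1: 0, 0: 5, 5: None})
print(d, rounds)  # {3: 5, 4: 4, 6: 3, 1: 2, 0: 1, 5: 0} 3
```

The three rounds agree with ⌈lg 6⌉ = 3 iterations of the while loop.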
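Parallel prefix on a list
A prefix computation is defined as:
• Input: $\langle x_1, x_2, \ldots, x_n \rangle$
• A binary associative operation $\otimes$
• Output: $\langle y_1, y_2, \ldots, y_n \rangle$
such that:
    $y_1 = x_1$
    $y_k = y_{k-1} \otimes x_k$ for k = 2, 3, …, n,
i.e., $y_k = x_1 \otimes x_2 \otimes \cdots \otimes x_k$.
Suppose $\langle x_1, x_2, \ldots, x_n \rangle$ are stored in order in a list.
Define the notation: $[i, j] = x_i \otimes x_{i+1} \otimes \cdots \otimes x_j$.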
Prefix computation
LIST-PREFIX(L)
for each processor i, in parallel
    do y[i] ← x[i]
while there exists an object i such that next[i] ≠ nil
    do for each processor i, in parallel
        do if next[i] ≠ nil
            then y[next[i]] ← y[i] ⊗ y[next[i]]
                 next[i] ← next[next[i]]
Prefix computation – EREW algorithm
[Figure: LIST-PREFIX on the six-object list $x_1, x_2, \ldots, x_6$. The y values after each stage are:]
(a) [1,1] [2,2] [3,3] [4,4] [5,5] [6,6]
(b) [1,1] [1,2] [2,3] [3,4] [4,5] [5,6]
(c) [1,1] [1,2] [1,3] [1,4] [2,5] [3,6]
(d) [1,1] [1,2] [1,3] [1,4] [1,5] [1,6]
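LIST-PREFIX can be simulated in the same way as list ranking. In the sketch below, the dictionary representation and the `op` parameter (defaulting to addition) are assumptions for illustration; any associative binary operation can be passed in. The operands are kept in the order y[i] ⊗ y[next[i]], which matters when the operation is not commutative.

```python
import operator


def list_prefix(x, next_, op=operator.add):
    """x maps each object to its value; next_ maps it to its successor or None.

    Returns y, where y[k] combines the values of all objects up to and including k.
    """
    nxt = dict(next_)
    y = dict(x)
    while any(p is not None for p in nxt.values()):
        # Synchronous step: read the old y and nxt, write into fresh copies.
        new_y, new_nxt = dict(y), dict(nxt)
        for i in nxt:
            if nxt[i] is not None:
                new_y[nxt[i]] = op(y[i], y[nxt[i]])
                new_nxt[i] = nxt[nxt[i]]
        y, nxt = new_y, new_nxt
    return y


chain = {1: 2, 2: 3, 3: 4, 4: None}
print(list_prefix({1: 'a', 2: 'b', 3: 'c', 4: 'd'}, chain))
# {1: 'a', 2: 'ab', 3: 'abc', 4: 'abcd'}
```

String concatenation is used in the example because it is associative but not commutative, so the output also confirms that the operand order is preserved.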
Find root – CREW algorithm
Suppose we have a forest of binary trees in which each node i has a pointer parent[i].
Find the identity of the root of the tree containing each node.
Assume that each node is associated with a processor.
Assume that each node i has a field root[i].
CREW algorithm
FIND-ROOTS(F)
for each processor i, in parallel
    do if parent[i] = nil
        then root[i] ← i
while there exists a node i such that parent[i] ≠ nil
    do for each processor i, in parallel
        do if parent[i] ≠ nil
            then root[i] ← root[parent[i]]
                 parent[i] ← parent[parent[i]]
Running time: O(log d), where d is the height of the maximum-depth tree in the forest.
All the writes are exclusive.
But the read in line 7, root[i] ← root[parent[i]], is concurrent, since several nodes may have the same node as parent.
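FIND-ROOTS can be simulated sequentially as sketched below. The dictionary representation, the use of `None` for nil and for a not-yet-known root, and the example forest are assumptions for illustration. The line marked as a concurrent read is where several children consult the same parent in one step.

```python
def find_roots(parent):
    """parent maps each node to its parent, or to None for a root.

    Returns (root, rounds), where root[i] is the root of the tree containing i.
    """
    par = dict(parent)
    root = {i: (i if par[i] is None else None) for i in par}
    rounds = 0
    while any(p is not None for p in par.values()):
        # Synchronous step: read the old root and par, write into fresh copies.
        new_root, new_par = dict(root), dict(par)
        for i in par:
            if par[i] is not None:
                new_root[i] = root[par[i]]   # concurrent read: siblings share par[i]
                new_par[i] = par[par[i]]
        root, par = new_root, new_par
        rounds += 1
    return root, rounds


# Two trees: one rooted at 0 with height 3, one rooted at 6 with height 1.
forest = {0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 3, 6: None, 7: 6}
print(find_roots(forest))
# ({0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 6, 7: 6}, 2)
```

The deepest node is at depth 3, and the loop finishes after 2 rounds, in line with the O(log d) bound.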
CREW vs. EREW
Q: How fast can n nodes in a forest determine their roots using only exclusive reads?
A: Ω(log n).
Argument: with exclusive reads, a given piece of information can only be copied to one other memory location in each step, so the number of locations containing a given piece of information at most doubles at each step. Consider a forest with one tree of n nodes: the root identity is stored in one place initially. After the first step, it is stored in at most two places; after the second step, it is stored in at most four places; and so on. So lg n steps are needed for it to be stored in n places.
Find maximum – CRCW algorithm
FAST-MAX(A)
n ← length[A]
for i ← 0 to n-1, in parallel
    do m[i] ← true
for i ← 0 to n-1 and j ← 0 to n-1, in parallel
    do if A[i] < A[j]
        then m[i] ← false
for i ← 0 to n-1, in parallel
    do if m[i] = true
        then max ← A[i]
return max
Example with A = <5, 6, 9, 2, 9>. Entry (i, j) is T if A[i] < A[j] and F otherwise; m[i] stays T only if row i contains no T.
    A[i] \ A[j] |  5  6  9  2  9  |  m
         5      |  F  T  T  F  T  |  F
         6      |  F  F  T  F  T  |  F
         9      |  F  F  F  F  F  |  T
         2      |  T  T  T  F  T  |  F
         9      |  F  F  F  F  F  |  T
max = 9
The running time is O(1).
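FAST-MAX can be simulated sequentially as sketched below; the nested loops stand in for the single parallel step carried out by one processor per (i, j) pair. Returning the `m` array alongside the maximum is an assumption added for illustration, so that the result can be compared with the table. All concurrent writes in the simulated steps store the same value, which is what the Common CRCW rule requires.

```python
def fast_max(A):
    """Simulate FAST-MAX; returns (maximum, m). Assumes A is non-empty."""
    n = len(A)
    m = [True] * n
    # One parallel step: processor (i, j) compares A[i] with A[j].
    for i in range(n):
        for j in range(n):
            if A[i] < A[j]:
                m[i] = False   # all writers to m[i] store the same value
    result = None
    # One parallel step: every surviving i writes the same maximum value.
    for i in range(n):
        if m[i]:
            result = A[i]
    return result, m


print(fast_max([5, 6, 9, 2, 9]))  # (9, [False, False, True, False, True])
```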
CRCW vs. EREW
Finding the maximum on an EREW PRAM takes Ω(lg n) time.
Argument: consider how many elements "think" that they might be the maximum. Initially there are n; after the first step, n/2; after the second step, n/4; and so on. Each step can at best halve this number.
Moreover, the CREW PRAM also takes Ω(lg n) time.
CRCW vs. EREW
CRCW:
• Some say: it is easier to program and faster.
• Others say: the hardware for CRCW is slower than for EREW, and one cannot find the maximum in O(1) time.
• Still others say: the PRAM, whether EREW or CRCW, is the wrong model. Processors must be connected by a network and can only communicate with each other via that network, so the network should be part of the model.
Thank You.