Concurrent Programming

Concurrent Programming • Written by מאיר בכור (027382977) and אביתר שרעבי (32033946)

Model • The model we are talking about is: • A computer with multiple processors but only one memory unit. • All the processors are synchronized using the same clock. • The processors are all connected to each other and to the memory. • If more than one processor writes the same value to the same memory address at the same time, the value is written correctly. If the values are not the same, any one of them may be written.

Model • More than one processor can read the same memory address at the same time. • Other models: • The processors are on different computers. • There is no shared memory for all the processors. • The processors are not using the same clock.

Array Maximum Problem • On a computer with one processor: • Time: O(N). • Algorithm: go over the array and keep the maximum seen so far. • On a computer with K processors: • Time: O(N/K). • Algorithm: each processor handles N/K elements of the array and finds their maximum. The K partial maxima are then combined into the maximum of the whole array.

Array Maximum Problem • On a computer with O(N) processors: • Time: O(log(N)). • Algorithm: in the first round every processor takes the maximum of 2 items, so after the first round we have N/2 numbers. In the next round N/4 processors each take 2 of these numbers and keep the larger one, so only N/4 results remain after the second round. After log(N) rounds we have the maximum of the array.

Array Maximum Problem • Example: 8 elements, time 3 = log(8). [Figure: binary reduction tree over the leaves 1 2 3 4 5 6 7 8.]

Array Maximum Problem • The number of computations performed is 7 (4 in the first round, 2 in the second and 1 in the last). This is the same number of computations as in the serial algorithm, but they are done in less time. • This algorithm works for many other functions, not just Max: Min, Sum, Avg (as Sum divided by N). It works for every associative function.
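
The round-by-round reduction can be sketched as a sequential simulation. This is only an illustration: the name `tree_reduce` is hypothetical, and each pass of the `while` loop stands for one parallel round in which every processor combines one pair.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def tree_reduce(items: List[T], op: Callable[[T, T], T]) -> Tuple[T, int]:
    """Combine items pairwise, round by round. Returns (result, rounds)."""
    if not items:
        raise ValueError("items must not be empty")
    level = list(items)
    rounds = 0
    while len(level) > 1:
        # One simulated parallel round: each "processor" combines one pair.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # the odd element waits for the next round
        level = nxt
        rounds += 1
    return level[0], rounds


print(tree_reduce([1, 2, 3, 4, 5, 6, 7, 8], max))  # (8, 3)
```

Passing a different associative function, for example `lambda a, b: a + b`, gives the sum in the same number of rounds.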

Finding The Two Greatest Numbers • Simple solution for O(N) processors. • Algorithm: find the maximum, remove it from the array and find the maximum again. • Time: 2 log(N). • Smart algorithm for O(N) processors. • Algorithm: • First round: each processor handles 2 items, finds the max and puts the other item in a candidate group. • Rounds 2..log(N): each processor handles 2 results of the previous round. It compares the 2 max values and takes the larger one as the new max. It takes the candidate group of the new max and adds the max of the other group to it; this is the new candidate group.

Finding The Two Greatest Numbers • After the last round the max is the maximum of the array, and the second maximum is the maximum of the candidate group. • Sample: • Array: 7, 10, 1, 3, 100, 8, 55, 6.

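Finding The Two Greatest Numbers • Input: 7, 10, 1, 3, 100, 8, 55, 6. • Round 1: max 10 (candidates: 7); max 3 (candidates: 1); max 100 (candidates: 8); max 55 (candidates: 6). • Round 2: max 10 (candidates: 7, 3); max 100 (candidates: 8, 55). • Round 3: max 100 (candidates: 8, 55, 10). • Results: the maximum is the maximum of the array (100) and the second maximum is the maximum of the candidate group (55).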

Finding The Two Greatest Numbers • Time: log(N) + log(log(N)): log(N) to find the maximum and the candidate group, and log(log(N)) to find the maximum of the candidate group. The candidate group grows by 1 in each round (the max of the other group), so at the end its size is log(N).
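
A minimal sequential sketch of the candidate-group idea, under the assumption that each pass of the loop stands for one parallel round. The function name `two_greatest` is hypothetical, and the final `max` over the candidates is done serially here, whereas the slides use a second parallel reduction for it.

```python
from typing import List, Tuple


def two_greatest(values: List[int]) -> Tuple[int, int, List[int]]:
    """Return (max, second max, candidate group) via a knockout tournament."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # Each entry is (current max, values that lost directly to it).
    level = [(v, []) for v in values]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            (a, cand_a), (b, cand_b) = level[i], level[i + 1]
            # The winner keeps its own candidates and adds the loser's max.
            nxt.append((a, cand_a + [b]) if a >= b else (b, cand_b + [a]))
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # the odd entry waits for the next round
        level = nxt
    best, candidates = level[0]
    return best, max(candidates), candidates


print(two_greatest([7, 10, 1, 3, 100, 8, 55, 6]))  # (100, 55, [8, 55, 10])
```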

Merge problem • Description: we have 2 sorted arrays B, C of size N and we need to divide their items into 2 new arrays A1, A2 of size N, so that the N largest items of B and C together are in A1 and the N smallest are in A2. • Simple solution: merge B and C into one sorted array A, then copy the N largest elements to A1 and the N smallest elements to A2. But this algorithm cannot use multiple processors, so the cost is still O(N).

Merge problem • Smart algorithm for O(N) processors. • Processor i compares B[i] with C[N+1-i]; the larger of the two goes to A1 and the other to A2. • Correctness proof. • If B[i] > C[N+1-i] then B[i] > B[1..i-1] and C[N+1-i] > C[1..N-i], so B[i] is larger than N elements (i - 1 from B and N - i + 1 from C), so B[i] belongs in A1. • If C[N+1-i] > B[i] then C[N+1-i] is larger than N elements (N - i from C and i from B), so C[N+1-i] belongs in A1. • By the symmetric argument the smaller of the two is smaller than N elements, so it belongs in A2.

Merge problem • Example: B: 1, 8, 10, 17. C: 9, 12, 67, 100. Pairs: (B[1], C[N]), (B[2], C[N-1]), (B[3], C[N-2]), (B[4], C[N-3]). A1: 100, 67, 12, 17. A2: 1, 8, 10, 9. • Time: all the comparisons can be done at the same time, so the cost is O(1).
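
As a rough sketch of the pairing, assuming 0-based Python lists (so the partner of `b[i]` is `c[n - 1 - i]`) and a hypothetical function name, each loop iteration below plays the role of one processor:

```python
from typing import List, Tuple


def split_largest_smallest(b: List[int], c: List[int]) -> Tuple[List[int], List[int]]:
    """Split two sorted arrays of equal size into (N largest, N smallest)."""
    n = len(b)
    if len(c) != n:
        raise ValueError("both arrays must have the same size")
    a1, a2 = [], []
    for i in range(n):  # every iteration is independent of the others
        x, y = b[i], c[n - 1 - i]
        a1.append(max(x, y))
        a2.append(min(x, y))
    return a1, a2


print(split_largest_smallest([1, 8, 10, 17], [9, 12, 67, 100]))
# ([100, 67, 12, 17], [1, 8, 10, 9])
```

Note that A1 and A2 are not sorted; they only separate the N largest items from the N smallest.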

Prefix Problem • Description: find all the prefix sums of the elements: S11 = X1, S12 = X1 + X2, …, S1n = X1 + X2 + … + Xn-1 + Xn. • Simple solution: compute each sum separately with N processors. Time: O(N log(N)), since there are N sums and each one takes O(log(N)).

Prefix Problem • Algorithm:
    for i = 0 to n-1 doip
        S[i] = X[i]
    for j = 0 to log(n) - 1 do
        for i = 2^j to n-1 doip
            S[i] = S[i] + S[i - 2^j]
• doip means "do in parallel" on the different processors. Within one round every processor reads the old values before any processor writes. • At the end the results are in the array S.

Prefix Problem • Example: with 8 numbers X1..X8, where Sij = Xi + Xi+1 + … + Xj.
    Start:          X1   X2   X3   X4   X5   X6   X7   X8
    After round 1:  S11  S12  S23  S34  S45  S56  S67  S78
    After round 2:  S11  S12  S13  S14  S25  S36  S47  S58
    After round 3:  S11  S12  S13  S14  S15  S16  S17  S18

Prefix Problem • Time: each round doubles the number of finished results S1i, so after log(n) rounds all the results are ready. • To use this algorithm each processor needs to be connected to log(n) other processors.
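
The doubling rounds can be simulated sequentially. In this sketch (the function name is hypothetical) the snapshot `prev` models the assumption that all processors read the old values before any of them writes:

```python
from typing import List, Tuple


def parallel_prefix_sums(x: List[int]) -> Tuple[List[int], int]:
    """Simulate the doubling prefix-sum rounds. Returns (sums, rounds)."""
    s = list(x)
    n = len(s)
    step = 1  # equals 2^j in round j
    rounds = 0
    while step < n:
        prev = s[:]  # snapshot: every read sees the values of the last round
        for i in range(step, n):  # each i stands for one processor
            s[i] = prev[i] + prev[i - step]
        step *= 2
        rounds += 1
    return s, rounds


print(parallel_prefix_sums([1, 2, 3, 4, 5, 6, 7, 8]))
# ([1, 3, 6, 10, 15, 21, 28, 36], 3)
```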

Prefix Problem • Usage example. Problem: we have an arithmetic expression and we need to test whether its brackets are arranged legally. Algorithm: create an array X with 1 for each "(" and -1 for each ")", and run the prefix algorithm. The results need to satisfy: S11 = 1, S11..S1n-1 >= 0 and S1n = 0. Time with N processors: O(log(N)): O(log(N)) for the prefix algorithm and O(1) for the test.
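
A small sketch of the bracket test. It uses `itertools.accumulate` as a serial stand-in for the parallel prefix step, and the function name `brackets_balanced` is an assumption made for illustration. Characters other than brackets are ignored.

```python
from itertools import accumulate


def brackets_balanced(expr: str) -> bool:
    """Check bracket nesting using prefix sums of +1 / -1."""
    x = [1 if ch == "(" else -1 for ch in expr if ch in "()"]
    s = list(accumulate(x))  # serial stand-in for the parallel prefix sums
    if not s:
        return True  # no brackets at all
    # Every prefix must be non-negative and the total must be zero.
    return all(v >= 0 for v in s) and s[-1] == 0


print(brackets_balanced("(a+b)*(c-(d))"))  # True
print(brackets_balanced(")a+b("))          # False
```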

Partition Problem • Description: we have an array X, some of whose elements are signed (marked). We need to move all the signed elements to one array and the unsigned elements to another array. • Simple solution: take 2 stacks and push the signed elements onto one stack and the unsigned elements onto the other. This takes O(N) time. • Simple solution 2: take two indexes, one at the start of the array and one at the end. The first searches for a signed element and the second for an unsigned one. When both have found one they exchange the items they point to, and they move on until they meet. This also takes O(N) time, but the two searches can proceed in parallel.

Partition Problem • Smart algorithm for O(N) processors: • Create a new array B: if element X[i] is signed then B[i] = 1, else B[i] = 0. • Create an array C with the prefix sums of B, that is C[i] = B[1] + B[2] + … + B[i]. • If X[i] is signed then Y1[C[i]] = X[i]. • If X[i] is not signed then Y2[i-C[i]] = X[i].

Partition Problem • Example:
    X  = 2, 4, 7, 8, 1, 3, 10, 12, 15
    B  = 0, 1, 0, 0, 0, 1, 1, 0, 1
    C  = 0, 1, 1, 1, 1, 2, 3, 3, 4
    Y1 = 4, 3, 10, 15
    Y2 = 2, 7, 8, 1, 12

Partition Problem • Time with O(N) processors: • Computing B: O(1). • Computing C: O(log(n)) using the prefix algorithm. • Computing Y1 and Y2: O(1). • Total: O(log(n)).
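
The three steps can be sketched as follows. This is a sequential illustration with 0-based indices, so the target positions shift by one compared with the slides; `partition_by_flag` and its `is_signed` parameter are hypothetical names, and `itertools.accumulate` stands in for the parallel prefix algorithm.

```python
from itertools import accumulate
from typing import Callable, List, Tuple


def partition_by_flag(x: List[int], is_signed: Callable[[int], bool]) -> Tuple[List[int], List[int]]:
    """Stable partition of x into (signed, not signed) using prefix sums."""
    b = [1 if is_signed(v) else 0 for v in x]  # O(1) with N processors
    c = list(accumulate(b))                    # prefix sums of B
    total = c[-1] if c else 0
    y1 = [0] * total
    y2 = [0] * (len(x) - total)
    for i in range(len(x)):  # every i is independent: O(1) with N processors
        if b[i]:
            y1[c[i] - 1] = x[i]  # 0-based version of Y1[C[i]]
        else:
            y2[i - c[i]] = x[i]  # 0-based version of Y2[i - C[i]]
    return y1, y2


signed = {4, 3, 10, 15}
print(partition_by_flag([2, 4, 7, 8, 1, 3, 10, 12, 15], lambda v: v in signed))
# ([4, 3, 10, 15], [2, 7, 8, 1, 12])
```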

Sorting Algorithm • Description: sort array A using O(N^2) processors and put the result into array C. • Simple algorithm: a serial comparison-based sort takes at least on the order of N log(N) time. • Smart algorithm: • Create a matrix B of size N*N and initialize all its cells with zeroes. • We look at the N^2 processors as a matrix of processors. Processor Pi,j computes Ai >= Aj; if true then B[i,j] = 1.

Sorting Algorithm • For each i from 1 to N: C[Sum(i)] = A[i], where Sum(i) is the sum of B[i,1] to B[i,N]. • Example: A = 3, 5, 2, 9, 1. Matrix B:
         1  2  3  4  5
    1    1  0  1  0  1
    2    1  1  1  0  1
    3    0  0  1  0  1
    4    1  1  1  1  1
    5    0  0  0  0  1

Sorting Algorithm • Result: C = 1, 2, 3, 5, 9. • Time: using O(N^2) processors, finding the matrix B takes O(1) and finding C takes O(log(N)), so the total cost of the algorithm is O(log(N)). Using O(N) processors, finding B takes O(N) time and finding C takes O(N) time, so the total is O(N).
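
A sequential sketch of this comparison-matrix sort, under the assumption that all elements of A are distinct (with equal elements two items would get the same row sum and collide). The name `rank_sort` is hypothetical.

```python
from typing import List


def rank_sort(a: List[int]) -> List[int]:
    """Sort distinct values by counting, for each one, how many it is >= to."""
    n = len(a)
    # B[i][j] = 1 if A[i] >= A[j]; one comparison per simulated processor.
    b = [[1 if a[i] >= a[j] else 0 for j in range(n)] for i in range(n)]
    c = [0] * n
    for i in range(n):
        rank = sum(b[i])    # row sum; O(log N) with a parallel reduction
        c[rank - 1] = a[i]  # 0-based version of C[Sum(i)] = A[i]
    return c


print(rank_sort([3, 5, 2, 9, 1]))  # [1, 2, 3, 5, 9]
```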

Sorting Algorithm • Description: sort array A using O(N^2) processors and put the result into array C. • Algorithm: merge sort. The largest cost in merge sort is the cost of the merge. With a serial algorithm the cost of merging 2 sorted arrays is O(N) and the cost of merge sort is O(N log(N)). We will use the regular algorithm but with a smarter merge.

Sorting Algorithm • Smart merge algorithm • Description: we need to merge two sorted arrays A, B into a sorted array R. • Algorithm: we describe a recursive algorithm Merge. C = Merge(odd(A), even(B)). D = Merge(even(A), odd(B)). Here odd(A) is all the items of A with an odd index, and even(A) is all the items of A with an even index.

Sorting Algorithm • Let C = C0, C1, C2, …, Cn and D = D0, D1, D2, …, Dn. Build E = C0, D0, C1, D1, …, Cn, Dn. Compare each pair Ci, Di and if Ci > Di then swap Ci and Di in array E. Array E is then the merge of C and D, which is the sorted array R.

Sorting Algorithm • Example: A = 3, 5, 8, 10. B = 4, 7, 9, 12. Even(A) = 5, 10. Odd(A) = 3, 8. Even(B) = 7, 12. Odd(B) = 4, 9. C = 3, 7, 8, 12. D = 4, 5, 9, 10. E = 3, 4, 7, 5, 8, 9, 12, 10. After swapping in E: E = 3, 4, 5, 7, 8, 9, 10, 12. • Time: using O(N) processors the merge takes O(log(N)) time. Merge sort runs log(N) levels of merging, so the total cost of the merge sort is O(log^2(N)).
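
A recursive sketch of this merge, following the pairing used in the example (odd(A) with even(B), and even(A) with odd(B)). It assumes both arrays are sorted and have the same length, which is a power of two; indices are 1-based in the slides, so the odd positions are Python indices 0, 2, 4, and so on. The function name is hypothetical, and the two recursive calls and the final compare-and-swap loop would each run in parallel.

```python
from typing import List


def odd_even_merge(a: List[int], b: List[int]) -> List[int]:
    """Merge two sorted arrays of equal power-of-two length."""
    n = len(a)
    if len(b) != n or n == 0 or n & (n - 1):
        raise ValueError("arrays must have the same power-of-two length")
    if n == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]
    c = odd_even_merge(a[0::2], b[1::2])  # odd(A) with even(B)
    d = odd_even_merge(a[1::2], b[0::2])  # even(A) with odd(B)
    e: List[int] = []
    for ci, di in zip(c, d):  # interleave, swapping each pair if needed
        e.extend((ci, di) if ci <= di else (di, ci))
    return e


print(odd_even_merge([3, 5, 8, 10], [4, 7, 9, 12]))
# [3, 4, 5, 7, 8, 9, 10, 12]
```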

Find Algorithm • Description: if array X contains the value Val then Res needs to be True, else Res needs to be False. • Simple algorithm: a serial algorithm takes O(N) time. • Smart algorithm: using O(N) processors. Res = False. Each processor i tests if X[i] = Val; if true it sets Res = True. • Time: O(1).
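
A tiny sequential sketch of the idea, with a hypothetical function name. Each loop iteration stands for one processor, and all writers store the same value True, which the shared-memory model allows to happen at the same time.

```python
from typing import List


def parallel_find(x: List[int], val: int) -> bool:
    """Simulate N processors that each test one cell of x."""
    res = False
    for i in range(len(x)):  # conceptually all iterations run in one step
        if x[i] == val:
            res = True  # concurrent writes of the same value are safe
    return res


print(parallel_find([7, 10, 1, 3], 1))  # True
print(parallel_find([7, 10, 1, 3], 2))  # False
```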

Model Description • Many processors. • Processors can send messages to each other through communication channels. • We want each processor to have a unique identification (Id). • Since we have O(n) processors we need O(log n) bits to represent the Id.

Model Description • Clean Net: a processor doesn't know anything about its neighbors, not even their Ids. It only knows how many neighbors it has. • We will explicitly mention when we deal with a Clean Net; otherwise every processor has a unique Id.

Model Description • A message should include the sender Id, the receiver Id and some information: O(log n) bits in total. • If X wants to send a message to Y through Z, it costs 2 steps to send the message: X → Z → Y.

Model Description • Local computation doesn't take time. • We will analyze: • Time complexity: the number of steps the algorithm takes in the worst case. • Communication complexity: the total number of messages sent during the execution of the algorithm in the worst case.

Distributed vs. Sequential • Communication: needed in the distributed model but not in the sequential one. • Partial knowledge: together the processors know everything, but no single processor necessarily knows everything. • Processors or communication channels can be down.

Distributed vs. Sequential • Synchronization: we need to synchronize the processors.

Synchronous Model • There is a global clock. • In every clock cycle each processor: • sends messages to its neighbors. • receives messages from its neighbors. • makes local computations in 0 time. • changes state.

Asynchronous Model • There is no global clock. • If a message was sent it will eventually arrive at its destination (there are no failures), but we can't assume anything about the arrival time. • We measure the time from the beginning of the execution until the last processor has stopped.

Asynchronous Model • For time complexity calculations we assume that every message arrives within one time unit in the worst case.

Model Representation • We can represent the processors net with a graph. • Each node in the graph is a processor. • There is an edge between two nodes if there is a direct communication channel between the two processors they represent.

Complexity • C(Π, G, I) - communication complexity: the total number of messages sent during the execution in the worst case. • T(Π, G, I) - time complexity: the number of clock cycles the execution takes in the worst case. • Where Π is the protocol, G is the graph and I is the input.

Complexity - examples • The following examples are in a complete graph on the nodes 1, 2, …, n.

Complexity - example 1 • Protocol A: node 1 sends the message m to node 2. • C(A, G, I) = 1. • T(A, G, I) = 1. [Figure: node 1 → node 2, message m.]

Complexity - example 2 • Protocol B: node 1 sends the message mi to node i, for every i ∈ G. • C(B, G, I) = n. • T(B, G, I) = 1. [Figure: node 1 → node i, message mi.]

Complexity - example 3 • Protocol C: node i sends the message mi to node i+1, for every i ∈ G. • C(C, G, I) = n. • T(C, G, I) = 1. [Figure: node i → node i+1, message mi.]

Complexity - example 4 • Protocol D: node i sends the message m to node i+1 in cycle i. • C(D, G, I) = n. • T(D, G, I) = n. [Figure: node 1 → node 2 with m, then node 2 → node 3 with m, and so on.]

Transmission Problem • Input: there is a message m in the node V0. • Output: the message m is written in all the nodes of the graph. • dG(x,y) - the length of the shortest path from x to y in graph G. • D = Diameter(G) = max { dG(x,y) : x, y ∈ V }.
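
To make the definition of the diameter concrete, here is a small sketch that computes dG(x,y) by breadth-first search and takes the maximum over all pairs. It assumes a connected, undirected graph given as an adjacency dictionary; the function names and the graph representation are choices made for this illustration.

```python
from collections import deque
from typing import Dict, Hashable, List


def distances_from(graph: Dict[Hashable, List[Hashable]], source: Hashable) -> Dict[Hashable, int]:
    """Shortest path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(graph: Dict[Hashable, List[Hashable]]) -> int:
    """Maximum shortest-path distance over all pairs of nodes."""
    return max(max(distances_from(graph, v).values()) for v in graph)


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(diameter(path))  # 3
```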

Algorithms for the Transmission Problem • Direct Delivery. • Spanning Tree. • DFS. • Flooding.

Direct Delivery • Based on the assumptions: • There is a routing system such that messages are sent along the shortest path. • V0 knows the addresses of all the other nodes in the graph. • V0 sends the message m n-1 times, each time to a different node.
