
Basic procedures on processor networks


Presentation Transcript


  1. Basic procedures on processor networks Presenter: Kuan-Hsin Lin

  2. Outline • Data scattering • Matrix-vector multiplication • Parallel matrix multiplication • Sorting problems

  3. Data scattering • How data is to be mapped • Division by points or blocks • Division by rows or columns

  4. Data scattering • Divide the matrix into p square blocks [Figure: matrix A divided into p square blocks, each mapped to one node of the network]

  5. Data scattering • Division into consecutive rows

  6. Data scattering • Snake configuration division

  7. Data scattering • Recursive division [Figure: the matrix recursively bisected into blocks, each block labelled with the number of the processor that receives it]

  8. Data scattering • Recursive division by proximity [Figure: a 4 x 4 grid of blocks numbered along a proximity curve: 0 1 14 15 / 3 2 13 12 / 4 7 8 11 / 5 6 9 10]

  9. Data scattering • Division by packets of consecutive rows [Figure: the 12 rows of matrix A grouped into packets of consecutive rows, mapped to nodes 0 and 1 of the network]

  10. Data scattering • Circular division by packets of rows [Figure: the 12 rows of matrix A grouped into packets that are dealt out to nodes 0 and 1 of the network in round-robin fashion]
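
A minimal Python sketch of the last two row divisions (the function names and the values n = 12, p = 2, r = 3 are illustrative, not taken from the slides): given a row index, it returns the processor that owns the row under the consecutive-packet and the circular-packet schemes.

# Map 0-based row indices to processors for two of the divisions above.
def owner_consecutive(row, n, p):
    """Division into p packets of n/p consecutive rows."""
    return row // (n // p)

def owner_circular(row, p, r):
    """Circular division by packets of r rows, dealt out round-robin."""
    return (row // r) % p

n, p, r = 12, 2, 3
print([owner_consecutive(i, n, p) for i in range(n)])  # [0,0,0,0,0,0,1,1,1,1,1,1]
print([owner_circular(i, p, r) for i in range(n)])     # [0,0,0,1,1,1,0,0,0,1,1,1]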

  11. Matrix-vector multiplication • Definition • A: n x n matrix • x: a vector of R^n • Computing the product of a matrix by a vector, v = Ax:
For i = 1 to n
  v(i) = 0
  For j = 1 to n
    v(i) = v(i) + A(i,j) * x(j)
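
The loop on this slide, written out as a small runnable Python sketch (with 0-based indices instead of the slide's 1-based ones):

def matvec(A, x):
    # v(i) = sum over j of A(i,j) * x(j)
    n = len(A)
    v = [0.0] * n
    for i in range(n):
        for j in range(n):
            v[i] += A[i][j] * x[j]
    return v

print(matvec([[1, 2], [3, 4]], [5, 6]))  # [17.0, 39.0]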

  12. Matrix-vector multiplication • Row-oriented allocation • Column-oriented allocation • Allocation by blocks • Pipelined allocation

  13. Matrix-vector multiplication • Row-oriented allocation [Figure: component i of v is the scalar product of row i of A with the vector x]

  14. Matrix-vector multiplication • Algorithm
ATA (all-to-all exchange) on the n/p local components of vector x
For all processors q from 0 to p-1 in parallel
  For all k = 1 to n/p do
    v(q+(k-1)p+1) = scalar product of row q+(k-1)p+1 of A and vector x
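
A sequential Python sketch of this row-oriented scheme, under stated assumptions: the ATA is modelled by simply letting every simulated processor see the full vector x, processor q owns rows q, q+p, q+2p, ... (0-based), and the name row_oriented_matvec is mine, not from the slides.

def row_oriented_matvec(A, x, p):
    n = len(A)                              # assumes n is a multiple of p
    v = [0.0] * n
    for q in range(p):                      # "for all processors q in parallel"
        full_x = x                          # what q holds after the ATA on the x pieces
        for row in range(q, n, p):          # the rows allocated to processor q
            v[row] = sum(A[row][j] * full_x[j] for j in range(n))
    return v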

  15. Matrix-vector multiplication • Column-oriented allocation [Figure: v is the sum over j of column j of A multiplied by component x(j)]

  16. Matrix-vector multiplication • Algorithm
For all processors q = 0 to p-1 do in parallel
  For all k = 1 to n/p do
    v-temporary = multiplication of column q+(k-1)p+1 of A by the component q+(k-1)p+1 of vector x
Personalized ATA with accumulation of the temporary vectors v
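
A matching sketch of the column-oriented scheme (same assumptions and 0-based indices): each simulated processor multiplies its columns by the corresponding components of x into a temporary vector, and the personalized ATA with accumulation is modelled by a component-wise sum of the p temporaries.

def column_oriented_matvec(A, x, p):
    n = len(A)
    temporaries = []
    for q in range(p):                      # "for all processors q in parallel"
        v_temp = [0.0] * n
        for col in range(q, n, p):          # the columns allocated to processor q
            for i in range(n):
                v_temp[i] += A[i][col] * x[col]
        temporaries.append(v_temp)
    # personalized ATA with accumulation: sum the temporary vectors component-wise
    return [sum(t[i] for t in temporaries) for i in range(n)]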

  17. Matrix-vector multiplication • Allocation by blocks [Figure: matrix A divided into square blocks over the processor network, with vector x split to match the block columns]

  18. Matrix-vector multiplication • Algorithm (partial)
ATA of vector x
For all processors q = 0 to p-1 do in parallel
  partial vector v = matrix-vector product of the local block by the partial vector x that has just been received
Personalized ATA with (partial) accumulation of the partial vectors v
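
A sketch of the block scheme under the assumption that p = s * s processors form an s-by-s grid, each holding one (n/s)-by-(n/s) block of A and receiving the piece of x matching its block column; the (partial) accumulation is modelled by summing the contributions of a block row into v.

import math

def block_matvec(A, x, p):
    n, s = len(A), math.isqrt(p)                   # assumes p is a perfect square
    b = n // s                                     # block size (assumes s divides n)
    v = [0.0] * n
    for qi in range(s):
        for qj in range(s):                        # processor (qi, qj)
            x_part = x[qj * b:(qj + 1) * b]        # the piece of x it has received
            for i in range(b):                     # its partial matrix-vector product
                row = qi * b + i
                v[row] += sum(A[row][qj * b + j] * x_part[j] for j in range(b))
    return v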

  19. Matrix-vector multiplication • Pipeline allocation (first stage) [Figure: components of x held on the ring: Proc. 0 holds x1, x5; Proc. 1 holds x2, x6; Proc. 2 holds x3, x7; Proc. 3 holds x4, x8]

  20. Matrix-vector multiplication • Pipeline allocation (second stage) [Figure: after one shift: Proc. 0 holds x4, x8; Proc. 1 holds x1, x5; Proc. 2 holds x2, x6; Proc. 3 holds x3, x7]

  21. Matrix-vector multiplication • Pipeline allocation (third stage) [Figure: after two shifts: Proc. 0 holds x3, x7; Proc. 1 holds x4, x8; Proc. 2 holds x1, x5; Proc. 3 holds x2, x6]
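
A sequential sketch of one common way to organise this pipeline (the details beyond the three slides above are assumptions): each of the p processors owns a block of n/p rows of A and starts with the matching block of x; at every stage it multiplies its rows against the x block it currently holds, then passes that block to its ring neighbour, so after p stages every component of v is complete without any global broadcast.

def pipelined_matvec(A, x, p):
    n = len(A)                                          # assumes n is a multiple of p
    b = n // p
    v = [0.0] * n
    blocks = [x[q * b:(q + 1) * b] for q in range(p)]   # the p blocks of x
    owner = list(range(p))                              # which block each processor holds
    for _ in range(p):                                  # p pipeline stages
        for q in range(p):
            src = owner[q] * b                          # column offset of that block
            for i in range(b):
                row = q * b + i
                v[row] += sum(A[row][src + j] * blocks[owner[q]][j] for j in range(b))
        owner = [owner[(q + 1) % p] for q in range(p)]  # shift the x blocks around the ring
    return v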

  22. Parallel matrix multiplication
For i = 1 to n
  For j = 1 to n
    c(i,j) = 0
    For k = 1 to n
      c(i,j) = c(i,j) + A(i,k) * B(k,j)
[Figure: element c(i,j) is the scalar product of row i of A and column j of B]

  23. Parallel matrix multiplication • Parallelization on a toric grid
C11 = A11*B11 + A12*B21 + A13*B31
C12 = A11*B12 + A12*B22 + A13*B32
C13 = A11*B13 + A12*B23 + A13*B33
C21 = A21*B11 + A22*B21 + A23*B31
C22 = A21*B12 + A22*B22 + A23*B32
C23 = A21*B13 + A22*B23 + A23*B33

  24. Generalization – First stage [Figure: blocks resident on the 3 x 3 torus:
[1,1]: A11, B11   [1,2]: A12, B22   [1,3]: A13, B33
[2,1]: A22, B21   [2,2]: A23, B32   [2,3]: A21, B13
[3,1]: A33, B31   [3,2]: A31, B12   [3,3]: A32, B23]

  25. Generalization – Second stage [Figure: after one shift:
[1,1]: A12, B21   [1,2]: A13, B32   [1,3]: A11, B13
[2,1]: A23, B31   [2,2]: A21, B12   [2,3]: A22, B23
[3,1]: A31, B11   [3,2]: A32, B22   [3,3]: A33, B33]

  26. Generalization – Third stage [Figure: after two shifts:
[1,1]: A13, B31   [1,2]: A11, B12   [1,3]: A12, B23
[2,1]: A21, B11   [2,2]: A22, B22   [2,3]: A23, B33
[3,1]: A32, B21   [3,2]: A33, B32   [3,3]: A31, B13]
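
A sequential Python sketch of this toric-grid scheme, keeping only the data movement and the block products (the function names are mine): block (i, j) of A is first shifted i positions to the left and block (i, j) of B is shifted j positions upward, then at each of the s stages every node multiplies its two resident blocks into its C block, after which the A blocks shift left and the B blocks shift up. For s = 3 the successive arrangements reproduce the three stage slides above.

def block_multiply_add(C, Ablk, Bblk):
    # C += Ablk * Bblk for square blocks stored as lists of lists
    b = len(Ablk)
    for i in range(b):
        for j in range(b):
            C[i][j] += sum(Ablk[i][k] * Bblk[k][j] for k in range(b))

def toric_grid_multiply(Ablocks, Bblocks, s):
    # Ablocks[i][j] and Bblocks[i][j] are the s*s blocks of A and B (0-based)
    b = len(Ablocks[0][0])
    Cblocks = [[[[0] * b for _ in range(b)] for _ in range(s)] for _ in range(s)]
    # initial alignment (the "first stage" arrangement)
    A = [[Ablocks[i][(j + i) % s] for j in range(s)] for i in range(s)]
    B = [[Bblocks[(i + j) % s][j] for j in range(s)] for i in range(s)]
    for _ in range(s):
        for i in range(s):
            for j in range(s):
                block_multiply_add(Cblocks[i][j], A[i][j], B[i][j])
        A = [[A[i][(j + 1) % s] for j in range(s)] for i in range(s)]  # shift A left
        B = [[B[(i + 1) % s][j] for j in range(s)] for i in range(s)]  # shift B up
    return Cblocks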

  27. The link with systolic algorithms • The operations performed by each cell: • Read an operand on channel N:op1=N • Read an operand on channel W:op2=W • Execute the internal operation:R=R+op1*op2 • Transmit an operand on channel S:S=op1 • Transmit an operand on channel E:E=op2
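
A minimal Python sketch of one such cell, transcribing the five operations listed above (class and method names are mine):

class SystolicCell:
    def __init__(self):
        self.R = 0          # local accumulator; ends up holding one c(i,j)

    def step(self, N, W):
        op1, op2 = N, W     # read the operands arriving on channels N and W
        self.R += op1 * op2 # internal operation R = R + op1 * op2
        S, E = op1, op2     # retransmit the operands on channels S and E
        return S, E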

  28. Basic cell in the systolic network [Figure: a cell with internal register R, input channels N (north) and W (west), and output channels S (south) and E (east)]

  29. Product of square matrices [Figure: the rows of A entering the cell grid from the west and the columns of B entering from the north, each input stream delayed by one time step relative to its neighbour]

  30. Adaptation of the computation [Figure: the skewed input streams of A and B rearranged for the adapted computation]

  31. Fast parallel multiplication (Strassen's scheme)
C = [ C11 C12 ; C21 C22 ], with
C11 = M0 + M1 + M2 - M3
C12 = M3 + M5
C21 = M2 + M4
C22 = M0 - M4 + M5 + M6
where
M0 = (A11+A22)(B11+B22)
M1 = (A12-A22)(B21+B22)
M2 = A22(B21-B11)
M3 = (A11+A12)B22
M4 = (A21+A22)B11
M5 = A11(B12-B22)
M6 = (A21-A11)(B11+B12)

  32. Defined tasks
T1 = A11+A22, T2 = B11+B22, T3 = A12-A22, T4 = B21+B22, T5 = B21-B11,
T6 = A11+A12, T7 = A21+A22, T8 = B12-B22, T9 = A21-A11, T10 = B11+B12
M0 = T1*T2, M1 = T3*T4, M2 = A22*T5, M3 = T6*B22, M4 = T7*B11, M5 = A11*T8, M6 = T9*T10
T11 = M0+M1, T12 = M2-M3, T13 = M5-M4, T14 = M0+M6
T15 = T11+T12, T16 = M2+M4, T17 = M3+M5, T18 = T13+T14
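
A runnable Python sketch of this task list applied to scalar blocks, i.e. to the product of two 2 x 2 matrices, so the result can be checked against an ordinary multiplication (here T18 = T13 + T14 gives the C22 block):

def strassen_2x2(A, B):
    (A11, A12), (A21, A22) = A
    (B11, B12), (B21, B22) = B
    # additions of the first two stages
    T1, T2, T3, T4, T5 = A11 + A22, B11 + B22, A12 - A22, B21 + B22, B21 - B11
    T6, T7, T8, T9, T10 = A11 + A12, A21 + A22, B12 - B22, A21 - A11, B11 + B12
    # the seven products
    M0, M1, M2, M3 = T1 * T2, T3 * T4, A22 * T5, T6 * B22
    M4, M5, M6 = T7 * B11, A11 * T8, T9 * T10
    # recombination into the four blocks of C
    T11, T12, T13, T14 = M0 + M1, M2 - M3, M5 - M4, M0 + M6
    C11, C21, C12, C22 = T11 + T12, M2 + M4, M3 + M5, T13 + T14
    return [[C11, C12], [C21, C22]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]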

  33. The task graph [Figure: the sums T1 to T10 feed the seven products M0 to M6, whose results feed the sums T11 to T18]

  34. Initial allocation of matrix blocks [Figure: the blocks A11, A12, A21, A22, B11, B12, B21, B22 of the two operands distributed over the seven processors P0 to P6]

  35. Execution scheme • Stage 1: local computations (T1, T3, T6, T8, T9) • Stage 2: local computations (T2, T4, T5, T7, T10)

  36. Execution scheme • Stage 3: computations followed by local communications • Stage 4: computations followed by local communications [Figure: the products M0 to M6 and the sums T11 to T14 computed across the processors, with intermediate results such as M0, M3, M4, T12 and T13 exchanged between neighbours]

  37. Execution scheme • Stage 5: local computations (the final sums T15, T16, T17, T18, which are the four blocks of C)

  38. Sorting problems • Odd-even sorting algorithm on a ring
{program for processor Pi}
for stage = 1 to n
  if even stage then compare-exchange(key[i-1], key[i])
  else compare-exchange(key[i], key[i+1])
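
A sequential Python sketch of this sort: keys[i] stands for the key held by processor Pi, and each stage performs all of that stage's compare-exchanges at once (the pairing of the first stage follows the example on the next slides).

def odd_even_sort(keys):
    n = len(keys)
    for stage in range(1, n + 1):
        start = 0 if stage % 2 == 1 else 1       # which pairs exchange at this stage
        for i in range(start, n - 1, 2):
            if keys[i] > keys[i + 1]:            # compare-exchange of neighbours
                keys[i], keys[i + 1] = keys[i + 1], keys[i]
    return keys

print(odd_even_sort([1, 9, 2, 0, 8, 5, 3, 5]))   # [0, 1, 2, 3, 5, 5, 8, 9]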

  39. Example of odd-even sort [Figure: eight processors P0 to P7 connected in a ring, each holding one key of the list to be sorted]

  40. Example of odd-even sort • List to be sorted: 1 9 2 0 8 5 3 5 • Stage 1: 1 9 0 2 5 8 3 5 after the first exchange phase, then 1 0 9 2 5 3 8 5 after the second

  41. Example of odd-even sort • Stage 2: 0 1 2 9 3 5 5 8, then 0 1 2 3 9 5 5 8 • Stage 3: 0 1 2 3 5 9 5 8, then 0 1 2 3 5 5 9 8

  42. Example of odd-even sort • Stage 4: 0 1 2 3 5 5 8 9, then 0 1 2 3 5 5 8 9 (the list is sorted)

  43. Thank you
