1 / 15

CBE Lagrange Interpolation Implementation

CBE Lagrange Interpolation Implementation. By:- Anvesha Katti(2008012)(TL) ‏ A.Ramayya Ch(2008001) ‏ Arvind Kumar. R(2008016) ‏ Sarat Chandra. N(2008087) ‏. Study CBE arch. And Lagrange interpolation. PROJECT ARCHITECTURE. Algorithm for implementing Lagrange interpolation on

josef
Download Presentation

CBE Lagrange Interpolation Implementation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CBE Lagrange Interpolation Implementation By:- Anvesha Katti(2008012)(TL)‏ A.Ramayya Ch(2008001)‏ Arvind Kumar. R(2008016)‏ Sarat Chandra. N(2008087)‏

  2. Study CBE arch. And Lagrange interpolation PROJECT ARCHITECTURE Algorithm for implementing Lagrange interpolation on multiple parallel processors Algorithm for implementing Lagrange interpolation on multiple parallel processors (n point) Algorithm for implementing Lagrange interpolation on a uniprocessor Pseudo code for implementing Lagrange interpolation on multiple parallel processors(n point) Pseudo code for implementation of Lagrange interpolation on a uniprocessor Pseudo code for implementing Lagrange interpolation on multiple parallel processors C code for implementing Lagrange interpolation on a multiple parallelprocessors (CBE)‏(n point) C code for implementing Lagrange interpolation on a uniprocessor C code for implementing Lagrange interpolation on a multiple parallelprocessors (CBE)‏(8 point)

  3. Work done • Completed C source code for Lagrange Interpolation on uni processor and uploaded it to SourceForge. • Completed Algorithm and pseudo code for Parallel implementation of Lagrange Implementation for eight points. • Completed C source code for Lagrange Interpolation for eight points on PS3 and uploaded it to SourceForge. • Completed Algorithm and pseudo code for Parallel implementation of Lagrange Implementation for n points. • Completed C source code for Lagrange Interpolation for n points on PS3 and uploaded it to SourceForge.

  4. Algorithm for Lagrange Interpolation on Parallel Processors(8 point) • Step 1: By use of multinode broadcast from the main processor every processor P il 1<=l<=4 in the subcube I, 1<=i<=s receives the data points x4(i-1)+j, 1<=j<=4, y4(i-1)+l and x. Multi node broadcast will be done in parallel for each I 1<=i<=s. • Step 2: Compute in parallel for all I and l , 1<=i<=s;1<l<=4 • 4 • Ail= ∏ (x4(i-1)+l-x4(i-1)+j)‏ • j=1 • j!=l • 4 • Bil=∏(x-x4(i-1)+j)‏ • j=1 • Step 3: For each I and l, 1<=i<=s;1<=l<=4 processor Pil transmits the value of the product Bil and x4(i-l)+l to each processor Pkl 1<=k<=s, k!=i via the main processor. • Step 4: Compute in parallel for all l and I,1<=l<=4;1<=i<=s • 4 • Akl= ∏ (x4(k-1)+l-x4(i-1)+j) 1<=k<=s • j=1

  5. Contd… • Step 5: For each l and I, 1<=<=4;1<=i<=s processor Pil transmits Akl to Pkl, 1<=k<=s,k!=i. This will be done in parallel for all I and l, 1<=i<=s;1<=l<=4. • Step 6: Do in parallel for all I and l,1<=i<=s;1<=l<=4. • s • A*il= Ail ∏ Ail • k=1 • k!=i • s • B*il=Bil ∏ Bkl • k=1 • k!=i • Step 7: Compute in parallel for all I and l, 1<=i<=s; 1<=l<=4 • Cil=(B*ily4(i-1)+l)/(A*(x-x4(i-1)+l))‏ • Now each processor Pil,contains the value Cil,1<=i<=s;1<=l<=4 • Step 8: Compute F(x)= ∑i ∑l Cil.

  6. Algorithm for N point Lagrange Interpolation • For 1 ≤ i ≤ s; 1 ≤ l ≤ 4, let Ail, Aikl, A*il, Cil be one-dimensional arrays with maximum size q. Denote by Xil, the q-tuple (x[4i+l-5]q+1, x[4i+l-5]q+2, . . ., x[4i+l-5]q+q). • Step 1: By use of multimode broadcast every processor Pil, 1≤ l ≤4 in the subcube i, 1≤ i ≤ s receives the vectors Xij, 1≤ j ≤ 4. Note that multimode broadcast can be done in parallel for i, 1≤ i ≤ s. In addition to these vectors, Pil, 1≤ i ≤ s; 1≤ l ≤ 4 has data points y[4i+l-5]q+m1≤ m ≤ q and x. • Step 2: Compute in parallel for all i and l, 1≤ i ≤ s; 1≤ l ≤ 4, 4q • Ail[m] = ∏ ( xm – x4(i - 1)q + j) j=1 j ≠ m-4(i-1)q • m = (4i + l -5)q + 1, . . . , (4i + l -5)q + q, 4q • Bil = ∏ (x – x4(i - 1)q + j) j=1

  7. Contd… • Step 3: For l, 1≤ l ≤ 4 and i, 1≤ i ≤ s processor Pil transmits the value of the product Bil and the vector Xil to each processor Pkl, 1≤ k ≤ s, k ≠ I; 1≤ l ≤ 4. (Alternatively, for all l and i, 1≤ l ≤ 4; 1≤ i ≤ sPil receives product Bkl and vector Xkl from each Pkl, 1≤ k ≤ s, k ≠ i.) This will be done in parallel for all i and l, 1≤ i ≤ s; 1≤ l ≤ 4. • Step 4: Compute in parallel for all i and l, 1≤ i ≤ s; 1≤ l ≤ 4, 4q • Aiil [m] = ∏ ( xm – x4(i - 1)q + j) j = 1 • m = (4i + l -5)q + 1, . . . , (4i + l -5)q + q. • For 1≤ i ≤ s; 1≤ l ≤ 4; 1≤ k ≤ s, k ≠ i, let • A*kl = (Aikl{(4i + l -5)q + 1}, . . . , Aikl{(4i + l -5)q + q}). • Step 5: Then, for i and l, 1≤ i ≤ s; 1≤ l ≤ 4 processor Pil transmits Aikl1≤ k ≤ s, k ≠ i to Pkl. (Alternatively, for all l and i, 1≤ l ≤ 4; 1≤ i ≤ sPil receives product Akil and vector Xkl from each Pkl, 1≤ k ≤ s, k ≠ i.) Again, this will be done in parallel for all i and l, 1≤ i ≤ s; 1≤ l ≤ 4.

  8. Contd… • Step 6: Do in parallel for all i and l, 1≤ i ≤ s; 1≤ l ≤ 4, s • A*il [m] = Ail [m] ∏ Akil[m], m = (4i + l - 5)q + 1, . . . , (4i + l - 5)q + q, k = 1 k ≠ 1 s • B*il = Bil ∏ Bkl k = 1 k ≠ 1 • Step 7: Compute in parallel for all i and l, 1≤ i ≤ s; 1≤ l ≤ 4. • Cil[m] = (B*il / (x - xm))*(ym/A*il[m]),m = (4i + l - 5)q + 1, . . . , (4i + l - 5)q + q. • Step 8: On each processor Pil,1≤ i ≤ s; 1≤ l ≤ 4 compute in parallel, q • Cil = ∑ Cil[m], m = (4i + l - 5)q + 1, . . . , (4i + l - 5)q + q.

  9. Contd… • Step 9: Now, each processor Pil contains the value Cil, 1≤ i ≤ s; 1≤ l ≤ 4. Compute • F(x) = ∑ ∑ Cil • i l

  10. Explanation of the Algorithm • Each of P11 ,P12 , P13 , P14 - Each of P21,P22,P23,P24 • gets x1 , x2 , x3 , x4 , y1 , y2 , y3 , y4 & x gets x5 , x6 , x7 , x8 , y5 , y6 , y7 , y8 & x • Do in parallel for each node • A11= (x1-x2)(x1-x3)(x1-x4) denominator for S1 • A12= (x2-x1)(x2-x3)(x2-x4) denominator for S2 • A13= (x3-x1)(x3-x2)(x3-x4) denominator for S3 • A14= (x4-x1)(x4-x2)(x4-x3) denominator for S4 • B11=B12=B13=B14=(x-x1)(x-x2)(x-x3)(x-x4) numerator for S1 , S2 , S3 , S4. • Numerator and Denominator • (A11, B11) is calculated on P11 • (A12, B12) is calculated on P12 • (A13, B13) is calculated on P13 • (A14, B14) is calculated on P14 • All are done simultaneously. • Similarly, • (A21, B21) is calculated on P21 • (A22, B22) is calculated on P22 • (A23, B23) is calculated on P23 • (A24, B24) is calculated on P24 • B21=B22=B23=B24=(x-x5)(x-x6)(x-x7)(x-x8)‏

  11. Contd… • Communication Phase: • P11 ,P12 , P13 , P14 transmits • B11 ,B12 , B13 , B14 and x1 , x2 , x3 , x4 and P21,P22,P23,P24 simultaneously. • A211= (x5-x1)(x5-x2)(x5-x3)(x5-x4) – calculated on P11 • A221= (x6-x1)(x6-x2)(x6-x3)(x6-x4) – calculated on P12 • A231= (x7-x1)(x7-x2)(x7-x3)(x7-x4) – calculated on P13 • A241= (x8-x1)(x8-x2)(x8-x3)(x8-x4) – calculated on P14 • All are done simultaneously. • A111= (x1-x5)(x5-x6)(x5-x7)(x5-x8) – calculated on P21 • A121= (x2-x5)(x2-x6)(x2-x7)(x2-x8) – calculated on P22 • A131= (x3-x5)(x3-x6)(x3-x7)(x3-x8) – calculated on P23 • A141= (x4-x5)(x4-x6)(x4-x7)(x4-x8) – calculated on P24

  12. Contd… • Computation phase: • A*11 = A11A211 = (x1-x2)(x1-x3)(x1-x4)(x1-x5)(x1-x6)(x1-x7)(x1-x8) • will be calculated on 1st processor. • A*12 = A12A212 = (x2-x1)(x2-x3)(x2-x4)(x2-x5)(x2-x6)(x2-x7)(x2-x8)‏ • will be calculated on 2nd processor. • A*13 = A13A213 = (x3-x1)(x3-x2)(x3-x4)(x3-x5)(x3-x6)(x3-x7)(x3-x8)‏ • will be calculated on 3rd processor. • A*14 = A14A214 = (x4-x2)(x4-x3)(x4-x1)(x4-x5)(x4-x6)(x4-x7)(x4-x8) • will be calculated on 4th processor. • A*21 = A21A121 = (x5-x1)(x5-x2)(x5-x3)(x5-x4)(x5-x6)(x5-x7)(x5-x8) • will be calculated on 5th processor. • A*22 = A22A122 = (x6-x1)(x6-x2)(x6-x3)(x6-x4)(x6-x5)(x6-x7)(x6-x8)‏ • will be calculated on 6th processor. • A*23 = A23A123 = (x7-x1)(x7-x2)(x7-x3)(x7-x4)(x7-x5)(x7-x6)(x7-x8)‏ • will be calculated on 7th processor. • A*24 = A24A124 = (x8-x1)(x8-x2)(x8-x3)(x8-x4)(x8-x5)(x8-x6)(x8-x7)‏ • will be calculated on 8th processor.

  13. Contd… • B*11 = B11B211 = (x-x1)(x1-x3)(x-x4)(x-x5)(x-x6)(x-x7)(x-x8) • will be calculated on 1st processor. • B*12 = B12B212 = (x-x1)(x1-x3)(x-x4)(x-x5)(x-x6)(x-x7)(x-x8) • will be calculated on 2nd processor. • B*13 = B13B213 = (x-x1)(x1-x3)(x-x4)(x-x5)(x-x6)(x-x7)(x-x8) • will be calculated on 3rd processor. • B*14 = B14B214 = (x-x1)(x1-x3)(x-x4)(x-x5)(x-x6)(x-x7)(x-x8) • will be calculated on 4th processor. • B*21 = B21B121 = (x-x1)(x1-x3)(x-x4)(x-x5)(x-x6)(x-x7)(x-x8) • will be calculated on 5th processor. • B*22 = B22B122 = (x-x1)(x1-x3)(x-x4)(x-x5)(x-x6)(x-x7)(x-x8) • will be calculated on 6th processor. • B*23 = B23B123 = (x-x1)(x1-x3)(x-x4)(x-x5)(x-x6)(x-x7)(x-x8) • will be calculated on 7th processor. • B*24 = B24B124 = (x-x1)(x1-x3)(x-x4)(x-x5)(x-x6)(x-x7)(x-x8) • will be calculated on 8th processor.

  14. Contd… • C11= (B*11.y1)/(A*11.(x-x1))‏ • C12= (B*12.y2)/(A*12.(x-x2))‏ • C13= (B*13.y3)/(A*13.(x-x3))‏ • C14= (B*14.y4)/(A*14.(x-x4))‏ • C21= (B*21.y5)/(A*21.(x-x5))‏ • C22= (B*22y6)/(A*22.(x-x6))‏ • C23= (B*23.y7)/(A*23.(x-x7))‏ • C24= (B*24.y8)/(A*24.(x-x8))‏

  15. Challenges Faced • PPE to SPE transfer of data. • Making the structure size a multiple of 16. • Making the variable size a multiple of 16. • SPE to PPE transfer of data.

More Related