CBE Lagrange Interpolation Implementation

CBE Lagrange Interpolation Implementation By:- Anvesha Katti(2008012)(TL)‏ A.Ramayya Ch(2008001)‏ Arvind Kumar. R(2008016)‏ Sarat Chandra. N(2008087)‏

Study CBE arch. And Lagrange interpolation PROJECT ARCHITECTURE Algorithm for implementing Lagrange interpolation on multiple parallel processors Algorithm for implementing Lagrange interpolation on multiple parallel processors (n point) Algorithm for implementing Lagrange interpolation on a uniprocessor Pseudo code for implementing Lagrange interpolation on multiple parallel processors(n point) Pseudo code for implementation of Lagrange interpolation on a uniprocessor Pseudo code for implementing Lagrange interpolation on multiple parallel processors C code for implementing Lagrange interpolation on a multiple parallelprocessors (CBE)‏(n point) C code for implementing Lagrange interpolation on a uniprocessor C code for implementing Lagrange interpolation on a multiple parallelprocessors (CBE)‏(8 point)

Work done • Completed C source code for Lagrange Interpolation on uni processor and uploaded it to SourceForge. • Completed Algorithm and pseudo code for Parallel implementation of Lagrange Implementation for eight points. • Completed C source code for Lagrange Interpolation for eight points on PS3 and uploaded it to SourceForge. • Completed Algorithm and pseudo code for Parallel implementation of Lagrange Implementation for n points. • Completed C source code for Lagrange Interpolation for n points on PS3 and uploaded it to SourceForge.

Algorithm for Lagrange Interpolation on Parallel Processors(8 point) • Step 1: By use of multinode broadcast from the main processor every processor P il 1<=l<=4 in the subcube I, 1<=i<=s receives the data points x4(i-1)+j, 1<=j<=4, y4(i-1)+l and x. Multi node broadcast will be done in parallel for each I 1<=i<=s. • Step 2: Compute in parallel for all I and l , 1<=i<=s;1<l<=4 • 4 • Ail= ∏ (x4(i-1)+l-x4(i-1)+j)‏ • j=1 • j!=l • 4 • Bil=∏(x-x4(i-1)+j)‏ • j=1 • Step 3: For each I and l, 1<=i<=s;1<=l<=4 processor Pil transmits the value of the product Bil and x4(i-l)+l to each processor Pkl 1<=k<=s, k!=i via the main processor. • Step 4: Compute in parallel for all l and I,1<=l<=4;1<=i<=s • 4 • Akl= ∏ (x4(k-1)+l-x4(i-1)+j) 1<=k<=s • j=1

Contd… • Step 5: For each l and I, 1<=<=4;1<=i<=s processor Pil transmits Akl to Pkl, 1<=k<=s,k!=i. This will be done in parallel for all I and l, 1<=i<=s;1<=l<=4. • Step 6: Do in parallel for all I and l,1<=i<=s;1<=l<=4. • s • A*il= Ail ∏ Ail • k=1 • k!=i • s • B*il=Bil ∏ Bkl • k=1 • k!=i • Step 7: Compute in parallel for all I and l, 1<=i<=s; 1<=l<=4 • Cil=(B*ily4(i-1)+l)/(A*(x-x4(i-1)+l))‏ • Now each processor Pil,contains the value Cil,1<=i<=s;1<=l<=4 • Step 8: Compute F(x)= ∑i ∑l Cil.

Algorithm for N point Lagrange Interpolation • For 1 ≤ i ≤ s; 1 ≤ l ≤ 4, let Ail, Aikl, A*il, Cil be one-dimensional arrays with maximum size q. Denote by Xil, the q-tuple (x[4i+l-5]q+1, x[4i+l-5]q+2, . . ., x[4i+l-5]q+q). • Step 1: By use of multimode broadcast every processor Pil, 1≤ l ≤4 in the subcube i, 1≤ i ≤ s receives the vectors Xij, 1≤ j ≤ 4. Note that multimode broadcast can be done in parallel for i, 1≤ i ≤ s. In addition to these vectors, Pil, 1≤ i ≤ s; 1≤ l ≤ 4 has data points y[4i+l-5]q+m1≤ m ≤ q and x. • Step 2: Compute in parallel for all i and l, 1≤ i ≤ s; 1≤ l ≤ 4, 4q • Ail[m] = ∏ ( xm – x4(i - 1)q + j) j=1 j ≠ m-4(i-1)q • m = (4i + l -5)q + 1, . . . , (4i + l -5)q + q, 4q • Bil = ∏ (x – x4(i - 1)q + j) j=1

Contd… • Step 3: For l, 1≤ l ≤ 4 and i, 1≤ i ≤ s processor Pil transmits the value of the product Bil and the vector Xil to each processor Pkl, 1≤ k ≤ s, k ≠ I; 1≤ l ≤ 4. (Alternatively, for all l and i, 1≤ l ≤ 4; 1≤ i ≤ sPil receives product Bkl and vector Xkl from each Pkl, 1≤ k ≤ s, k ≠ i.) This will be done in parallel for all i and l, 1≤ i ≤ s; 1≤ l ≤ 4. • Step 4: Compute in parallel for all i and l, 1≤ i ≤ s; 1≤ l ≤ 4, 4q • Aiil [m] = ∏ ( xm – x4(i - 1)q + j) j = 1 • m = (4i + l -5)q + 1, . . . , (4i + l -5)q + q. • For 1≤ i ≤ s; 1≤ l ≤ 4; 1≤ k ≤ s, k ≠ i, let • A*kl = (Aikl{(4i + l -5)q + 1}, . . . , Aikl{(4i + l -5)q + q}). • Step 5: Then, for i and l, 1≤ i ≤ s; 1≤ l ≤ 4 processor Pil transmits Aikl1≤ k ≤ s, k ≠ i to Pkl. (Alternatively, for all l and i, 1≤ l ≤ 4; 1≤ i ≤ sPil receives product Akil and vector Xkl from each Pkl, 1≤ k ≤ s, k ≠ i.) Again, this will be done in parallel for all i and l, 1≤ i ≤ s; 1≤ l ≤ 4.

Contd… • Step 6: Do in parallel for all i and l, 1≤ i ≤ s; 1≤ l ≤ 4, s • A*il [m] = Ail [m] ∏ Akil[m], m = (4i + l - 5)q + 1, . . . , (4i + l - 5)q + q, k = 1 k ≠ 1 s • B*il = Bil ∏ Bkl k = 1 k ≠ 1 • Step 7: Compute in parallel for all i and l, 1≤ i ≤ s; 1≤ l ≤ 4. • Cil[m] = (B*il / (x - xm))*(ym/A*il[m]),m = (4i + l - 5)q + 1, . . . , (4i + l - 5)q + q. • Step 8: On each processor Pil,1≤ i ≤ s; 1≤ l ≤ 4 compute in parallel, q • Cil = ∑ Cil[m], m = (4i + l - 5)q + 1, . . . , (4i + l - 5)q + q.

Contd… • Step 9: Now, each processor Pil contains the value Cil, 1≤ i ≤ s; 1≤ l ≤ 4. Compute • F(x) = ∑ ∑ Cil • i l

Explanation of the Algorithm • Each of P11 ,P12 , P13 , P14 - Each of P21,P22,P23,P24 • gets x1 , x2 , x3 , x4 , y1 , y2 , y3 , y4 & x gets x5 , x6 , x7 , x8 , y5 , y6 , y7 , y8 & x • Do in parallel for each node • A11= (x1-x2)(x1-x3)(x1-x4) denominator for S1 • A12= (x2-x1)(x2-x3)(x2-x4) denominator for S2 • A13= (x3-x1)(x3-x2)(x3-x4) denominator for S3 • A14= (x4-x1)(x4-x2)(x4-x3) denominator for S4 • B11=B12=B13=B14=(x-x1)(x-x2)(x-x3)(x-x4) numerator for S1 , S2 , S3 , S4. • Numerator and Denominator • (A11, B11) is calculated on P11 • (A12, B12) is calculated on P12 • (A13, B13) is calculated on P13 • (A14, B14) is calculated on P14 • All are done simultaneously. • Similarly, • (A21, B21) is calculated on P21 • (A22, B22) is calculated on P22 • (A23, B23) is calculated on P23 • (A24, B24) is calculated on P24 • B21=B22=B23=B24=(x-x5)(x-x6)(x-x7)(x-x8)‏

Contd… • Communication Phase: • P11 ,P12 , P13 , P14 transmits • B11 ,B12 , B13 , B14 and x1 , x2 , x3 , x4 and P21,P22,P23,P24 simultaneously. • A211= (x5-x1)(x5-x2)(x5-x3)(x5-x4) – calculated on P11 • A221= (x6-x1)(x6-x2)(x6-x3)(x6-x4) – calculated on P12 • A231= (x7-x1)(x7-x2)(x7-x3)(x7-x4) – calculated on P13 • A241= (x8-x1)(x8-x2)(x8-x3)(x8-x4) – calculated on P14 • All are done simultaneously. • A111= (x1-x5)(x5-x6)(x5-x7)(x5-x8) – calculated on P21 • A121= (x2-x5)(x2-x6)(x2-x7)(x2-x8) – calculated on P22 • A131= (x3-x5)(x3-x6)(x3-x7)(x3-x8) – calculated on P23 • A141= (x4-x5)(x4-x6)(x4-x7)(x4-x8) – calculated on P24

Contd… • Computation phase: • A*11 = A11A211 = (x1-x2)(x1-x3)(x1-x4)(x1-x5)(x1-x6)(x1-x7)(x1-x8) • will be calculated on 1st processor. • A*12 = A12A212 = (x2-x1)(x2-x3)(x2-x4)(x2-x5)(x2-x6)(x2-x7)(x2-x8)‏ • will be calculated on 2nd processor. • A*13 = A13A213 = (x3-x1)(x3-x2)(x3-x4)(x3-x5)(x3-x6)(x3-x7)(x3-x8)‏ • will be calculated on 3rd processor. • A*14 = A14A214 = (x4-x2)(x4-x3)(x4-x1)(x4-x5)(x4-x6)(x4-x7)(x4-x8) • will be calculated on 4th processor. • A*21 = A21A121 = (x5-x1)(x5-x2)(x5-x3)(x5-x4)(x5-x6)(x5-x7)(x5-x8) • will be calculated on 5th processor. • A*22 = A22A122 = (x6-x1)(x6-x2)(x6-x3)(x6-x4)(x6-x5)(x6-x7)(x6-x8)‏ • will be calculated on 6th processor. • A*23 = A23A123 = (x7-x1)(x7-x2)(x7-x3)(x7-x4)(x7-x5)(x7-x6)(x7-x8)‏ • will be calculated on 7th processor. • A*24 = A24A124 = (x8-x1)(x8-x2)(x8-x3)(x8-x4)(x8-x5)(x8-x6)(x8-x7)‏ • will be calculated on 8th processor.

Contd… • B*11 = B11B211 = (x-x1)(x1-x3)(x-x4)(x-x5)(x-x6)(x-x7)(x-x8) • will be calculated on 1st processor. • B*12 = B12B212 = (x-x1)(x1-x3)(x-x4)(x-x5)(x-x6)(x-x7)(x-x8) • will be calculated on 2nd processor. • B*13 = B13B213 = (x-x1)(x1-x3)(x-x4)(x-x5)(x-x6)(x-x7)(x-x8) • will be calculated on 3rd processor. • B*14 = B14B214 = (x-x1)(x1-x3)(x-x4)(x-x5)(x-x6)(x-x7)(x-x8) • will be calculated on 4th processor. • B*21 = B21B121 = (x-x1)(x1-x3)(x-x4)(x-x5)(x-x6)(x-x7)(x-x8) • will be calculated on 5th processor. • B*22 = B22B122 = (x-x1)(x1-x3)(x-x4)(x-x5)(x-x6)(x-x7)(x-x8) • will be calculated on 6th processor. • B*23 = B23B123 = (x-x1)(x1-x3)(x-x4)(x-x5)(x-x6)(x-x7)(x-x8) • will be calculated on 7th processor. • B*24 = B24B124 = (x-x1)(x1-x3)(x-x4)(x-x5)(x-x6)(x-x7)(x-x8) • will be calculated on 8th processor.

Contd… • C11= (B*11.y1)/(A*11.(x-x1))‏ • C12= (B*12.y2)/(A*12.(x-x2))‏ • C13= (B*13.y3)/(A*13.(x-x3))‏ • C14= (B*14.y4)/(A*14.(x-x4))‏ • C21= (B*21.y5)/(A*21.(x-x5))‏ • C22= (B*22y6)/(A*22.(x-x6))‏ • C23= (B*23.y7)/(A*23.(x-x7))‏ • C24= (B*24.y8)/(A*24.(x-x8))‏

Challenges Faced • PPE to SPE transfer of data. • Making the structure size a multiple of 16. • Making the variable size a multiple of 16. • SPE to PPE transfer of data.

CBE Lagrange Interpolation Implementation

CBE Lagrange Interpolation Implementation

Presentation Transcript

Interpolation and the Lagrange Polynomial

Lagrange Multipliers

Interpolation

CBE 491 / CBE 433

LAGRANGE mULTIPLIERS

CBE 491 / CBE 433

LAGRANGE MULTIPLIER

Interpolation

Interpolation

Developing a Professional Learning Plan for CBE Implementation

3. Lagrange Interpolation:

3. LAGRANGE INTERPOLATION METHOD (LAGRANGE POLYNOMIAL):

Lagrange Multipliers

LaGrange College

Interpolation

Interpolation

CBE

INTERPOLATION

3. LAGRANGE INTERPOLATION METHOD (LAGRANGE POLYNOMIAL):