Symbolic program transformation for numerical codes
This presentation is the property of its rightful owner.
Sponsored Links
1 / 41

Symbolic Program Transformation for Numerical Codes PowerPoint PPT Presentation


  • 66 Views
  • Uploaded on
  • Presentation posted in: General

Symbolic Program Transformation for Numerical Codes. Vijay Menon Cornell University. Background. Compiler/Runtime Support for Numerical Programs (under Keshav Pingali) Sparse matrix code generation Memory hierarchy optimizations Compiler/runtime techniques for MATLAB

Download Presentation

Symbolic Program Transformation for Numerical Codes

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Symbolic program transformation for numerical codes

Symbolic Program Transformation for Numerical Codes

Vijay Menon

Cornell University


Background

Background

  • Compiler/Runtime Support for Numerical Programs (under Keshav Pingali)

    • Sparse matrix code generation

    • Memory hierarchy optimizations

    • Compiler/runtime techniques for MATLAB

      • MAJIC (with University of Illinois)

      • MultiMATLAB


Motivation

Motivation

  • Using matrix properties to generate efficient code

    • e.g., Associativity, Commutativity of matrix operations

  • Problem: loop notation

    • Matrix operations hidden within loop nests


Example lu factorization with partial pivoting

Example: LU Factorization with Partial Pivoting

  • Also called: Gaussian Elimination

  • Key algorithm for solving systems of linear equations:

    • To solve A x = b for x:

      • => Factor A into L U

      • => Solve L y = b for y

      • => Solve U x = y for x


Lu factorization

LU Factorization

do j = 1,N

p(j) = j;

do i = j+1,N

if (A(i,j)>A(p(j),j))

p(j) = i;

do k = 1,N

tmp = A(j,k);

A(j,k) = A(p(j),k);

A(p(j),k) = tmp;

do i = j+1,N

A(i,j) = A(i,j)/A(j,j);

do k = j+1,N

do i = j+1,N

A(i,k) = A(i,k) - A(i,j)*A(j,k);

Select pivot row:

Swap pivot row with current:

Scale column (to store L):

Update(to compute partial U):

  • Problem: Poor cache behavior

x x x x x x

0 x x x x x

0 0 5 x x x

0 0 3 x x x

0 0 7x x x

0 0 2 x x x


Automatic blocking

Automatic Blocking

do jB = 1,N,B

do j = jB,jB+B-1

p(j) = j;

do i = j+1,N

if (A(i,j)>A(p(j),j))

p(j) = i;

do k = 1,N

tmp = A(j,k);

A(j,k) = A(p(j),k);

A(p(j),k) = tmp;

do i = j+1,N

A(i,j) = A(i,j)/A(j,j);

do k = j+1,jB+B-1

do i = j+1,N

A(i,k) = A(i,k) - A(i,j)*A(j,k);

do j = jB,jB+B-1

do k = jB+B,N

do i = j+1,N

A(i,k) = A(i,k) - A(i,j)*A(j,k);

  • Compiler transformations:

    • Stripmine

    • Index-Set Split

    • Loop Distribute

    • Tile/Map to BLAS

  • Problems:

    • Establishing legality

    • Mapping to BLAS


Lu performance

LU Performance


Key points

Key Points

  • Conventional techniques (dependence analysis) are insufficient to establish legality

  • Detection of matrix operations buys extra performance


Overview of this talk

Overview of this Talk

  • Fractal Symbolic Analysis

    • Framework for Symbolically determining Legality of Program Transformations

  • Matrixization

    • Generalization of Vectorization to Matrix Operations


Fractal symbolic analysis

Fractal Symbolic Analysis

  • Symbolic test to establish legality of program transformations

for i = 1 : n

S1(i);

for i = 1 : n

S2(i);

for i = 1 : n

S1(i);

S2(i);


Dependence analysis

Dependence Analysis

  • Independent operations may be reordered

  • Legality: No data dependence from S2(m) to S1(l) where l > m

for i = 1 : n

S1(i);

for i = 1 : n

S2(i);

for i = 1 : n

S1(i);

S2(i);


Symbolic comparison

Symbolic Comparison

  • Dependence Analysis is conservative:

  • Symbolic execution shows equality:

    • aout = 2*(ain + bin)

    • bout = 2*bin

  • But, intractable for recurrent loops

s1: a = 2*a

s2: b = 2*b

s3: a = a+b

s3: a = a+b

s1: a = 2*a

s2: b = 2*b

?


Distillation of lu

Distillation of LU

  • Given p(j)  j, prove:

  • Dependence analysis: too conservative

  • Symbolic comparison: intractable

for j = 1:n

tmp = a(j)

a(j) = a(p(j))

a(p(j)) = tmp

for j = 1:n

for i = j+1:n

a(i) = a(i)/a(j)

for j = 1:n

tmp = a(j)

a(j) = a(p(j))

a(p(j)) = tmp

for i = j+1:n

a(i) = a(i)/a(j)

B1(j):

B1(j):

B2(j):

B2(j):


Symbolic program transformation for numerical codes

Idea

  • Simplify

    • Prove equality of complex programs via equality of simpler programs

    • Repeat if necessary

  • Sufficient, but not necessary

    • equality of simpler programs -> equality of original programs


Commutativity

Commutativity

S1, S2, S3, …,Sn = Sf(1), Sf(2), Sf(3), …,Sf(n)

i,j  1..n | i < j  f(i) > f(j).

Si; Sj = Sj; Si

  • Permutation  Sequence of Adjacent Transpositions


Commutative legality

Commutative Legality

  • Loop Distribution

  • Legality:

     1  m < l  N. S2(m); S1(l) = S1(l); S2(m)

for i = 1 : n

S1(i);

S2(i);

for i = 1 : n

S1(i);

for i = 1 : n

S2(i);


Commutative legality1

Commutative Legality

  • Similar conditions:

    • Statement reordering

    • Loop interchange

    • Loop reversal

    • Loop tiling

  • Compiler establishes legality of transformation by testing commute condition


Back to example

Back to Example

  • Given p(j)  j, prove:

  • Simplified comparison =>

for j = 1:n

tmp = a(j)

a(j) = a(p(j))

a(p(j)) = tmp

for j = 1:n

for i = j+1:n

a(i) = a(i)/a(j)

for j = 1:n

tmp = a(j)

a(j) = a(p(j))

a(p(j)) = tmp

for i = j+1:n

a(i) = a(i)/a(j)

B1(j):

B1(j):

B2(j):

B2(j):


Simplified comparison

Simplified Comparison

  • Given p(j)  j  l > m prove:

  • Further simplification?

for i = m+1:n

a(i) = a(i)/a(m)

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

for i = m+1:n

a(i) = a(i)/a(m)

B2(m):

B1(l):

B1(l):

B2(m):


Too simple

Too Simple!

  • Given p(j)  j  l > m  i > m prove:

  • Too conservative - no longer legal!

  • Hypothesis:

    • Repeated application -> dependence analysis

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

a(i) = a(i)/a(m)

a(i) = a(i)/a(m)

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp


Symbolic comparison1

Symbolic Comparison

  • Can we symbolically prove?:

  • Effect can be summarized:

    • non-recurrent loops

    • affine indices/loop bounds

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

for i = m+1:n

a(i) = a(i)/a(m)

for i = m+1:n

a(i) = a(i)/a(m)

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

B2(m):

B1(l):

B1(l):

B2(m):


Guarded expressions

Guarded Expressions

  • Compare symbolic values of live output variable in terms of input variables:

  • aout (k)=

  • Each

    • guard : affine expression defining part of aout

    • expr : symbolic expression on input data

guard1 -> expr1

guard2 -> expr2

……

guardn -> exprn


Guarded expressions1

Guarded Expressions

  • For both program blocks:

  • aout(k) =

  • Omega Library (integer programming tool) to manipulate/compare affine guards

k  m => ain(k)

k = l => ain(p(l))/ain(m)

k = p(l) => ain(l)/ain(m)

else => ain(k)/ain(m)


Summary of fractal symbolic analysis

Summary of Fractal Symbolic Analysis

  • Powerful legality test

  • Explores tradeoffs between

    • tractability (dependence analysis)

    • accuracy (symbolic comparison)

  • Similar application for LU w/ pivoting

    • 2 recursive simplification steps

    • 6 guarded regions/expressions

  • Prototype implemented in OCAML


Overview of this talk1

Overview of this Talk

  • Fractal Symbolic Analysis

    • Framework for Symbolically determining Legality of Program Transformations

  • Matrixization

    • Generalization of Vectorization to Matrix Operations


Matrixization

Matrixization

  • Detect Matrix Operations

    • Map to

      • hand-optimized libraries (BLAS)

      • special-purpose hardware

    • Eliminate loop overhead

      • MATLAB: type/bounds checks

  • Exploit Matrix Semantics

    • Ring Properties of Matrix (+,*)


Blas performance

BLAS Performance


Matlab performance

MATLAB Performance


Galerkin example

Galerkin Example

  • In Galerkin, 98% of time spent in:

  • for i = 1:N

  • for j = 1:N

  • phi(k) += A(i,j)*x(i)*y(i);

  • end

  • end

  • A - Matrix

  • x, y- Vector


Vectorized code

Vectorized Code

  • In Optimized Galerkin:

    phi(k) += x*A*y’;

  • Fragment Speedup: 260

  • Program Speedup:110


Conventional approaches

Conventional Approaches

  • Syntactic Pattern Replacement (KAPF,VAST,FALCON)

    • Can encode high-level properties

    • Limited use on loops

  • Vector Code Generation

    • Can detect array operations in loops

    • Cannot detect/exploit matrix products


Our approach

Our Approach

  • Map code to: Abstract Matrix Form (AMF)

  • Convert to Symmetric AMF formulation

  • Optimize AMF expressions

    • factorization

    • invariant removal

  • Detect Matrix Products

  • Map AMF to MATLAB/BLAS


Expansion

Expansion

  • Array expressions:

  • Forall loops:

a(1:m,1:n) -> i j ai,j

for i = 1:n

x(i) = x(i) + y(i); -> ixi = i(xi+yi)

end


Reduction

Reduction

  • Sum Reduction Loops

for i = 1:n

k = k + x(i); -> k = k + î i xi

end


Product

Product

  • Matrix Product

    C = A*B -> ijCi,j= P(ijAi,h,ijBh,j)

  • Product-Reduction Equivalence:

    • e1 and e2 are scalar

    • e1 is constant w.r.t. ik+1…in

    • e2 is constant w.r.t. i1…ik-1

îk (i1…ine1 * i1…ine2) = Pîk(i1…ike1, ik…ine2)


Other amf properties

Other AMF Properties

  • Distributive Properties

    • e1·î e2 = î((ie1) ·e2)

      • when e1 is constant w.r.t. i

  • Interchange Properties

    • if(e1, e2,…, en) = f(ie1, ie2,…, ien)

    • i  e = i e

      • where î=

    • i j e = j i e

    • î  e = î e


Back to galerkin

Back to Galerkin

  • Original:

    • k = k + îi j (ai,j * xi * yj)

  • Convert to Symmetric Form:

    • k = k + î(i j ai,j * i j xi * i j yj)

  • Optimize:

    • k = k + îi xi * (i j ai,j * i j yj)

  • Map to Matrix Operations:

    • k = k + Pî (i xi , P(i j ai,j,j yj))

  • => k = k + x’*A*y;


  • Other vectorization examples

    Other Vectorization Examples

    • Taken from USENET:

    for i = 1:n

    for j = 1:n

    C(i,j) = A(i,j)*x(i);

    C(1:n,1:n) = A(1:n,1:n) .* repmat(x(1,n),1,n)

    for i = 1:n

    x(i) = y(:,i)*A*y(:,i)’;

    x(1:n) = sum(y.*(y*A’),2);


    Summary status of matrixization

    Summary/Status of Matrixization

    • General technique to detect matrix/vector operations

    • Implemented in MAJIC

      • MATLAB interpreter/compiler

    • Future work:

      • Extend to more ops: recurrences

      • JIT Matrixization


    Related work

    Related Work

    • Commutativity Analysis

      • Rinard & Diniz

      • Parallelization of OO programs

    • High-Level Pattern Replacement

      • DeRose, Marsolf, …

      • Exploit high-level matrix properties

        • Structure

        • Symmetry

        • Orthogonality


    Conclusion

    Conclusion

    • High-level symbolic techniques:

      • Fractal Symbolic Analysis

      • Matrixization

    • New techniques to analyze & exploit underlying computation and achieve substantial performance gains


  • Login