- 76 Views
- Uploaded on
- Presentation posted in: General

Symbolic Program Transformation for Numerical Codes

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Symbolic Program Transformation for Numerical Codes

Vijay Menon

Cornell University

- Compiler/Runtime Support for Numerical Programs (under Keshav Pingali)
- Sparse matrix code generation
- Memory hierarchy optimizations
- Compiler/runtime techniques for MATLAB
- MAJIC (with University of Illinois)
- MultiMATLAB

- Using matrix properties to generate efficient code
- e.g., Associativity, Commutativity of matrix operations

- Problem: loop notation
- Matrix operations hidden within loop nests

- Also called: Gaussian Elimination
- Key algorithm for solving systems of linear equations:
- To solve A x = b for x:
- => Factor A into L U
- => Solve L y = b for y
- => Solve U x = y for x

- To solve A x = b for x:

do j = 1,N

p(j) = j;

do i = j+1,N

if (A(i,j)>A(p(j),j))

p(j) = i;

do k = 1,N

tmp = A(j,k);

A(j,k) = A(p(j),k);

A(p(j),k) = tmp;

do i = j+1,N

A(i,j) = A(i,j)/A(j,j);

do k = j+1,N

do i = j+1,N

A(i,k) = A(i,k) - A(i,j)*A(j,k);

Select pivot row:

Swap pivot row with current:

Scale column (to store L):

Update(to compute partial U):

- Problem: Poor cache behavior

x x x x x x

0 x x x x x

0 0 5 x x x

0 0 3 x x x

0 0 7x x x

0 0 2 x x x

do jB = 1,N,B

do j = jB,jB+B-1

p(j) = j;

do i = j+1,N

if (A(i,j)>A(p(j),j))

p(j) = i;

do k = 1,N

tmp = A(j,k);

A(j,k) = A(p(j),k);

A(p(j),k) = tmp;

do i = j+1,N

A(i,j) = A(i,j)/A(j,j);

do k = j+1,jB+B-1

do i = j+1,N

A(i,k) = A(i,k) - A(i,j)*A(j,k);

do j = jB,jB+B-1

do k = jB+B,N

do i = j+1,N

A(i,k) = A(i,k) - A(i,j)*A(j,k);

- Compiler transformations:
- Stripmine
- Index-Set Split
- Loop Distribute
- Tile/Map to BLAS

- Problems:
- Establishing legality
- Mapping to BLAS

- Conventional techniques (dependence analysis) are insufficient to establish legality
- Detection of matrix operations buys extra performance

- Fractal Symbolic Analysis
- Framework for Symbolically determining Legality of Program Transformations

- Matrixization
- Generalization of Vectorization to Matrix Operations

- Symbolic test to establish legality of program transformations

for i = 1 : n

S1(i);

for i = 1 : n

S2(i);

for i = 1 : n

S1(i);

S2(i);

- Independent operations may be reordered
- Legality: No data dependence from S2(m) to S1(l) where l > m

for i = 1 : n

S1(i);

for i = 1 : n

S2(i);

for i = 1 : n

S1(i);

S2(i);

- Dependence Analysis is conservative:
- Symbolic execution shows equality:
- aout = 2*(ain + bin)
- bout = 2*bin

- But, intractable for recurrent loops

s1: a = 2*a

s2: b = 2*b

s3: a = a+b

s3: a = a+b

s1: a = 2*a

s2: b = 2*b

?

- Given p(j) j, prove:
- Dependence analysis: too conservative
- Symbolic comparison: intractable

for j = 1:n

tmp = a(j)

a(j) = a(p(j))

a(p(j)) = tmp

for j = 1:n

for i = j+1:n

a(i) = a(i)/a(j)

for j = 1:n

tmp = a(j)

a(j) = a(p(j))

a(p(j)) = tmp

for i = j+1:n

a(i) = a(i)/a(j)

B1(j):

B1(j):

B2(j):

B2(j):

- Simplify
- Prove equality of complex programs via equality of simpler programs
- Repeat if necessary

- Sufficient, but not necessary
- equality of simpler programs -> equality of original programs

S1, S2, S3, …,Sn = Sf(1), Sf(2), Sf(3), …,Sf(n)

i,j 1..n | i < j f(i) > f(j).

Si; Sj = Sj; Si

- Permutation Sequence of Adjacent Transpositions

- Loop Distribution
- Legality:
1 m < l N. S2(m); S1(l) = S1(l); S2(m)

for i = 1 : n

S1(i);

S2(i);

for i = 1 : n

S1(i);

for i = 1 : n

S2(i);

- Similar conditions:
- Statement reordering
- Loop interchange
- Loop reversal
- Loop tiling

- Compiler establishes legality of transformation by testing commute condition

- Given p(j) j, prove:
- Simplified comparison =>

for j = 1:n

tmp = a(j)

a(j) = a(p(j))

a(p(j)) = tmp

for j = 1:n

for i = j+1:n

a(i) = a(i)/a(j)

for j = 1:n

tmp = a(j)

a(j) = a(p(j))

a(p(j)) = tmp

for i = j+1:n

a(i) = a(i)/a(j)

B1(j):

B1(j):

B2(j):

B2(j):

- Given p(j) j l > m prove:
- Further simplification?

for i = m+1:n

a(i) = a(i)/a(m)

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

for i = m+1:n

a(i) = a(i)/a(m)

B2(m):

B1(l):

B1(l):

B2(m):

- Given p(j) j l > m i > m prove:
- Too conservative - no longer legal!
- Hypothesis:
- Repeated application -> dependence analysis

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

a(i) = a(i)/a(m)

a(i) = a(i)/a(m)

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

- Can we symbolically prove?:
- Effect can be summarized:
- non-recurrent loops
- affine indices/loop bounds

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

for i = m+1:n

a(i) = a(i)/a(m)

for i = m+1:n

a(i) = a(i)/a(m)

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

B2(m):

B1(l):

B1(l):

B2(m):

- Compare symbolic values of live output variable in terms of input variables:
- aout (k)=
- Each
- guard : affine expression defining part of aout
- expr : symbolic expression on input data

guard1 -> expr1

guard2 -> expr2

……

guardn -> exprn

- For both program blocks:
- aout(k) =
- Omega Library (integer programming tool) to manipulate/compare affine guards

k m => ain(k)

k = l => ain(p(l))/ain(m)

k = p(l) => ain(l)/ain(m)

else => ain(k)/ain(m)

- Powerful legality test
- Explores tradeoffs between
- tractability (dependence analysis)
- accuracy (symbolic comparison)

- Similar application for LU w/ pivoting
- 2 recursive simplification steps
- 6 guarded regions/expressions

- Prototype implemented in OCAML

- Fractal Symbolic Analysis
- Framework for Symbolically determining Legality of Program Transformations

- Matrixization
- Generalization of Vectorization to Matrix Operations

- Detect Matrix Operations
- Map to
- hand-optimized libraries (BLAS)
- special-purpose hardware

- Eliminate loop overhead
- MATLAB: type/bounds checks

- Map to
- Exploit Matrix Semantics
- Ring Properties of Matrix (+,*)

- In Galerkin, 98% of time spent in:
- for i = 1:N
- for j = 1:N
- phi(k) += A(i,j)*x(i)*y(i);
- end
- end
- A - Matrix
- x, y- Vector

- In Optimized Galerkin:
phi(k) += x*A*y’;

- Fragment Speedup: 260
- Program Speedup:110

- Syntactic Pattern Replacement (KAPF,VAST,FALCON)
- Can encode high-level properties
- Limited use on loops

- Vector Code Generation
- Can detect array operations in loops
- Cannot detect/exploit matrix products

- Map code to: Abstract Matrix Form (AMF)
- Convert to Symmetric AMF formulation
- Optimize AMF expressions
- factorization
- invariant removal

- Detect Matrix Products
- Map AMF to MATLAB/BLAS

- Array expressions:
- Forall loops:

a(1:m,1:n) -> i j ai,j

for i = 1:n

x(i) = x(i) + y(i); -> ixi = i(xi+yi)

end

- Sum Reduction Loops

for i = 1:n

k = k + x(i); -> k = k + î i xi

end

- Matrix Product
C = A*B -> ijCi,j= P(ijAi,h,ijBh,j)

- Product-Reduction Equivalence:
- e1 and e2 are scalar
- e1 is constant w.r.t. ik+1…in
- e2 is constant w.r.t. i1…ik-1

îk (i1…ine1 * i1…ine2) = Pîk(i1…ike1, ik…ine2)

- Distributive Properties
- e1·î e2 = î((ie1) ·e2)
- when e1 is constant w.r.t. i

- e1·î e2 = î((ie1) ·e2)
- Interchange Properties
- if(e1, e2,…, en) = f(ie1, ie2,…, ien)
- i e = i e
- where î=

- i j e = j i e
- î e = î e

- Original:
- k = k + îi j (ai,j * xi * yj)

- k = k + î(i j ai,j * i j xi * i j yj)

- k = k + îi xi * (i j ai,j * i j yj)

- k = k + Pî (i xi , P(i j ai,j,j yj))

- Taken from USENET:

for i = 1:n

for j = 1:n

C(i,j) = A(i,j)*x(i);

C(1:n,1:n) = A(1:n,1:n) .* repmat(x(1,n),1,n)

for i = 1:n

x(i) = y(:,i)*A*y(:,i)’;

x(1:n) = sum(y.*(y*A’),2);

- General technique to detect matrix/vector operations
- Implemented in MAJIC
- MATLAB interpreter/compiler

- Future work:
- Extend to more ops: recurrences
- JIT Matrixization

- Commutativity Analysis
- Rinard & Diniz
- Parallelization of OO programs

- High-Level Pattern Replacement
- DeRose, Marsolf, …
- Exploit high-level matrix properties
- Structure
- Symmetry
- Orthogonality

- High-level symbolic techniques:
- Fractal Symbolic Analysis
- Matrixization

- New techniques to analyze & exploit underlying computation and achieve substantial performance gains