1 / 41

# Symbolic Program Transformation for Numerical Codes - PowerPoint PPT Presentation

Symbolic Program Transformation for Numerical Codes. Vijay Menon Cornell University. Background. Compiler/Runtime Support for Numerical Programs (under Keshav Pingali) Sparse matrix code generation Memory hierarchy optimizations Compiler/runtime techniques for MATLAB

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about ' Symbolic Program Transformation for Numerical Codes' - zeki

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### Symbolic Program Transformation for Numerical Codes

Vijay Menon

Cornell University

• Compiler/Runtime Support for Numerical Programs (under Keshav Pingali)

• Sparse matrix code generation

• Memory hierarchy optimizations

• Compiler/runtime techniques for MATLAB

• MAJIC (with University of Illinois)

• MultiMATLAB

• Using matrix properties to generate efficient code

• e.g., Associativity, Commutativity of matrix operations

• Problem: loop notation

• Matrix operations hidden within loop nests

• Also called: Gaussian Elimination

• Key algorithm for solving systems of linear equations:

• To solve A x = b for x:

• => Factor A into L U

• => Solve L y = b for y

• => Solve U x = y for x

do j = 1,N

p(j) = j;

do i = j+1,N

if (A(i,j)>A(p(j),j))

p(j) = i;

do k = 1,N

tmp = A(j,k);

A(j,k) = A(p(j),k);

A(p(j),k) = tmp;

do i = j+1,N

A(i,j) = A(i,j)/A(j,j);

do k = j+1,N

do i = j+1,N

A(i,k) = A(i,k) - A(i,j)*A(j,k);

Select pivot row:

Swap pivot row with current:

Scale column (to store L):

Update(to compute partial U):

• Problem: Poor cache behavior

x x x x x x

0 x x x x x

0 0 5 x x x

0 0 3 x x x

0 0 7x x x

0 0 2 x x x

do jB = 1,N,B

do j = jB,jB+B-1

p(j) = j;

do i = j+1,N

if (A(i,j)>A(p(j),j))

p(j) = i;

do k = 1,N

tmp = A(j,k);

A(j,k) = A(p(j),k);

A(p(j),k) = tmp;

do i = j+1,N

A(i,j) = A(i,j)/A(j,j);

do k = j+1,jB+B-1

do i = j+1,N

A(i,k) = A(i,k) - A(i,j)*A(j,k);

do j = jB,jB+B-1

do k = jB+B,N

do i = j+1,N

A(i,k) = A(i,k) - A(i,j)*A(j,k);

• Compiler transformations:

• Stripmine

• Index-Set Split

• Loop Distribute

• Tile/Map to BLAS

• Problems:

• Establishing legality

• Mapping to BLAS

• Conventional techniques (dependence analysis) are insufficient to establish legality

• Detection of matrix operations buys extra performance

• Fractal Symbolic Analysis

• Framework for Symbolically determining Legality of Program Transformations

• Matrixization

• Generalization of Vectorization to Matrix Operations

• Symbolic test to establish legality of program transformations

for i = 1 : n

S1(i);

for i = 1 : n

S2(i);

for i = 1 : n

S1(i);

S2(i);

• Independent operations may be reordered

• Legality: No data dependence from S2(m) to S1(l) where l > m

for i = 1 : n

S1(i);

for i = 1 : n

S2(i);

for i = 1 : n

S1(i);

S2(i);

• Dependence Analysis is conservative:

• Symbolic execution shows equality:

• aout = 2*(ain + bin)

• bout = 2*bin

• But, intractable for recurrent loops

s1: a = 2*a

s2: b = 2*b

s3: a = a+b

s3: a = a+b

s1: a = 2*a

s2: b = 2*b

?

• Given p(j)  j, prove:

• Dependence analysis: too conservative

• Symbolic comparison: intractable

for j = 1:n

tmp = a(j)

a(j) = a(p(j))

a(p(j)) = tmp

for j = 1:n

for i = j+1:n

a(i) = a(i)/a(j)

for j = 1:n

tmp = a(j)

a(j) = a(p(j))

a(p(j)) = tmp

for i = j+1:n

a(i) = a(i)/a(j)

B1(j):

B1(j):

B2(j):

B2(j):

• Simplify

• Prove equality of complex programs via equality of simpler programs

• Repeat if necessary

• Sufficient, but not necessary

• equality of simpler programs -> equality of original programs

S1, S2, S3, …,Sn = Sf(1), Sf(2), Sf(3), …,Sf(n)

i,j  1..n | i < j  f(i) > f(j).

Si; Sj = Sj; Si

• Permutation  Sequence of Adjacent Transpositions

• Loop Distribution

• Legality:

 1  m < l  N. S2(m); S1(l) = S1(l); S2(m)

for i = 1 : n

S1(i);

S2(i);

for i = 1 : n

S1(i);

for i = 1 : n

S2(i);

• Similar conditions:

• Statement reordering

• Loop interchange

• Loop reversal

• Loop tiling

• Compiler establishes legality of transformation by testing commute condition

• Given p(j)  j, prove:

• Simplified comparison =>

for j = 1:n

tmp = a(j)

a(j) = a(p(j))

a(p(j)) = tmp

for j = 1:n

for i = j+1:n

a(i) = a(i)/a(j)

for j = 1:n

tmp = a(j)

a(j) = a(p(j))

a(p(j)) = tmp

for i = j+1:n

a(i) = a(i)/a(j)

B1(j):

B1(j):

B2(j):

B2(j):

• Given p(j)  j  l > m prove:

• Further simplification?

for i = m+1:n

a(i) = a(i)/a(m)

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

for i = m+1:n

a(i) = a(i)/a(m)

B2(m):

B1(l):

B1(l):

B2(m):

• Given p(j)  j  l > m  i > m prove:

• Too conservative - no longer legal!

• Hypothesis:

• Repeated application -> dependence analysis

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

a(i) = a(i)/a(m)

a(i) = a(i)/a(m)

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

• Can we symbolically prove?:

• Effect can be summarized:

• non-recurrent loops

• affine indices/loop bounds

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

for i = m+1:n

a(i) = a(i)/a(m)

for i = m+1:n

a(i) = a(i)/a(m)

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

B2(m):

B1(l):

B1(l):

B2(m):

• Compare symbolic values of live output variable in terms of input variables:

• aout (k)=

• Each

• guard : affine expression defining part of aout

• expr : symbolic expression on input data

guard1 -> expr1

guard2 -> expr2

……

guardn -> exprn

• For both program blocks:

• aout(k) =

• Omega Library (integer programming tool) to manipulate/compare affine guards

k  m => ain(k)

k = l => ain(p(l))/ain(m)

k = p(l) => ain(l)/ain(m)

else => ain(k)/ain(m)

• Powerful legality test

• tractability (dependence analysis)

• accuracy (symbolic comparison)

• Similar application for LU w/ pivoting

• 2 recursive simplification steps

• 6 guarded regions/expressions

• Prototype implemented in OCAML

• Fractal Symbolic Analysis

• Framework for Symbolically determining Legality of Program Transformations

• Matrixization

• Generalization of Vectorization to Matrix Operations

• Detect Matrix Operations

• Map to

• hand-optimized libraries (BLAS)

• special-purpose hardware

• MATLAB: type/bounds checks

• Exploit Matrix Semantics

• Ring Properties of Matrix (+,*)

• In Galerkin, 98% of time spent in:

• for i = 1:N

• for j = 1:N

• phi(k) += A(i,j)*x(i)*y(i);

• end

• end

• A - Matrix

• x, y - Vector

• In Optimized Galerkin:

phi(k) += x*A*y’;

• Fragment Speedup: 260

• Program Speedup: 110

• Syntactic Pattern Replacement (KAPF,VAST,FALCON)

• Can encode high-level properties

• Limited use on loops

• Vector Code Generation

• Can detect array operations in loops

• Cannot detect/exploit matrix products

• Map code to: Abstract Matrix Form (AMF)

• Convert to Symmetric AMF formulation

• Optimize AMF expressions

• factorization

• invariant removal

• Detect Matrix Products

• Map AMF to MATLAB/BLAS

• Array expressions:

• Forall loops:

a(1:m,1:n) -> i j ai,j

for i = 1:n

x(i) = x(i) + y(i); -> ixi = i(xi+yi)

end

• Sum Reduction Loops

for i = 1:n

k = k + x(i); -> k = k + î i xi

end

• Matrix Product

C = A*B -> ijCi,j= P(ijAi,h,ijBh,j)

• Product-Reduction Equivalence:

• e1 and e2 are scalar

• e1 is constant w.r.t. ik+1…in

• e2 is constant w.r.t. i1…ik-1

îk (i1…ine1 * i1…ine2) = Pîk(i1…ike1, ik…ine2)

• Distributive Properties

• e1·î e2 = î((ie1) ·e2)

• when e1 is constant w.r.t. i

• Interchange Properties

• if(e1, e2,…, en) = f(ie1, ie2,…, ien)

• i  e = i e

• where î=

• i j e = j i e

• î  e = î e

• Original:

• k = k + îi j (ai,j * xi * yj)

• Convert to Symmetric Form:

• k = k + î(i j ai,j * i j xi * i j yj)

• Optimize:

• k = k + îi xi * (i j ai,j * i j yj)

• Map to Matrix Operations:

• k = k + Pî (i xi , P(i j ai,j,j yj))

• => k = k + x’*A*y;

• Taken from USENET:

for i = 1:n

for j = 1:n

C(i,j) = A(i,j)*x(i);

C(1:n,1:n) = A(1:n,1:n) .* repmat(x(1,n),1,n)

for i = 1:n

x(i) = y(:,i)*A*y(:,i)’;

x(1:n) = sum(y.*(y*A’),2);

• General technique to detect matrix/vector operations

• Implemented in MAJIC

• MATLAB interpreter/compiler

• Future work:

• Extend to more ops: recurrences

• JIT Matrixization

• Commutativity Analysis

• Rinard & Diniz

• Parallelization of OO programs

• High-Level Pattern Replacement

• DeRose, Marsolf, …

• Exploit high-level matrix properties

• Structure

• Symmetry

• Orthogonality

• High-level symbolic techniques:

• Fractal Symbolic Analysis

• Matrixization

• New techniques to analyze & exploit underlying computation and achieve substantial performance gains