- By
**zeki** - Follow User

- 95 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about 'Symbolic Program Transformation for Numerical Codes' - zeki

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

Background

- Compiler/Runtime Support for Numerical Programs (under Keshav Pingali)
- Sparse matrix code generation
- Memory hierarchy optimizations
- Compiler/runtime techniques for MATLAB
- MAJIC (with University of Illinois)
- MultiMATLAB

Motivation

- Using matrix properties to generate efficient code
- e.g., Associativity, Commutativity of matrix operations
- Problem: loop notation
- Matrix operations hidden within loop nests

Example: LU Factorization with Partial Pivoting

- Also called: Gaussian Elimination
- Key algorithm for solving systems of linear equations:
- To solve A x = b for x:
- => Factor A into L U
- => Solve L y = b for y
- => Solve U x = y for x

LU Factorization

do j = 1,N

p(j) = j;

do i = j+1,N

if (A(i,j)>A(p(j),j))

p(j) = i;

do k = 1,N

tmp = A(j,k);

A(j,k) = A(p(j),k);

A(p(j),k) = tmp;

do i = j+1,N

A(i,j) = A(i,j)/A(j,j);

do k = j+1,N

do i = j+1,N

A(i,k) = A(i,k) - A(i,j)*A(j,k);

Select pivot row:

Swap pivot row with current:

Scale column (to store L):

Update(to compute partial U):

- Problem: Poor cache behavior

x x x x x x

0 x x x x x

0 0 5 x x x

0 0 3 x x x

0 0 7x x x

0 0 2 x x x

Automatic Blocking

do jB = 1,N,B

do j = jB,jB+B-1

p(j) = j;

do i = j+1,N

if (A(i,j)>A(p(j),j))

p(j) = i;

do k = 1,N

tmp = A(j,k);

A(j,k) = A(p(j),k);

A(p(j),k) = tmp;

do i = j+1,N

A(i,j) = A(i,j)/A(j,j);

do k = j+1,jB+B-1

do i = j+1,N

A(i,k) = A(i,k) - A(i,j)*A(j,k);

do j = jB,jB+B-1

do k = jB+B,N

do i = j+1,N

A(i,k) = A(i,k) - A(i,j)*A(j,k);

- Compiler transformations:
- Stripmine
- Index-Set Split
- Loop Distribute
- Tile/Map to BLAS
- Problems:
- Establishing legality
- Mapping to BLAS

Key Points

- Conventional techniques (dependence analysis) are insufficient to establish legality
- Detection of matrix operations buys extra performance

Overview of this Talk

- Fractal Symbolic Analysis
- Framework for Symbolically determining Legality of Program Transformations
- Matrixization
- Generalization of Vectorization to Matrix Operations

Fractal Symbolic Analysis

- Symbolic test to establish legality of program transformations

for i = 1 : n

S1(i);

for i = 1 : n

S2(i);

for i = 1 : n

S1(i);

S2(i);

Dependence Analysis

- Independent operations may be reordered
- Legality: No data dependence from S2(m) to S1(l) where l > m

for i = 1 : n

S1(i);

for i = 1 : n

S2(i);

for i = 1 : n

S1(i);

S2(i);

Symbolic Comparison

- Dependence Analysis is conservative:
- Symbolic execution shows equality:
- aout = 2*(ain + bin)
- bout = 2*bin
- But, intractable for recurrent loops

s1: a = 2*a

s2: b = 2*b

s3: a = a+b

s3: a = a+b

s1: a = 2*a

s2: b = 2*b

?

Distillation of LU

- Given p(j) j, prove:
- Dependence analysis: too conservative
- Symbolic comparison: intractable

for j = 1:n

tmp = a(j)

a(j) = a(p(j))

a(p(j)) = tmp

for j = 1:n

for i = j+1:n

a(i) = a(i)/a(j)

for j = 1:n

tmp = a(j)

a(j) = a(p(j))

a(p(j)) = tmp

for i = j+1:n

a(i) = a(i)/a(j)

B1(j):

B1(j):

B2(j):

B2(j):

Idea

- Simplify
- Prove equality of complex programs via equality of simpler programs
- Repeat if necessary
- Sufficient, but not necessary
- equality of simpler programs -> equality of original programs

Commutativity

S1, S2, S3, …,Sn = Sf(1), Sf(2), Sf(3), …,Sf(n)

i,j 1..n | i < j f(i) > f(j).

Si; Sj = Sj; Si

- Permutation Sequence of Adjacent Transpositions

Commutative Legality

- Loop Distribution
- Legality:

1 m < l N. S2(m); S1(l) = S1(l); S2(m)

for i = 1 : n

S1(i);

S2(i);

for i = 1 : n

S1(i);

for i = 1 : n

S2(i);

Commutative Legality

- Similar conditions:
- Statement reordering
- Loop interchange
- Loop reversal
- Loop tiling
- Compiler establishes legality of transformation by testing commute condition

Back to Example

- Given p(j) j, prove:
- Simplified comparison =>

for j = 1:n

tmp = a(j)

a(j) = a(p(j))

a(p(j)) = tmp

for j = 1:n

for i = j+1:n

a(i) = a(i)/a(j)

for j = 1:n

tmp = a(j)

a(j) = a(p(j))

a(p(j)) = tmp

for i = j+1:n

a(i) = a(i)/a(j)

B1(j):

B1(j):

B2(j):

B2(j):

Simplified Comparison

- Given p(j) j l > m prove:
- Further simplification?

for i = m+1:n

a(i) = a(i)/a(m)

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

for i = m+1:n

a(i) = a(i)/a(m)

B2(m):

B1(l):

B1(l):

B2(m):

Too Simple!

- Given p(j) j l > m i > m prove:
- Too conservative - no longer legal!
- Hypothesis:
- Repeated application -> dependence analysis

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

a(i) = a(i)/a(m)

a(i) = a(i)/a(m)

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

Symbolic Comparison

- Can we symbolically prove?:
- Effect can be summarized:
- non-recurrent loops
- affine indices/loop bounds

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

for i = m+1:n

a(i) = a(i)/a(m)

for i = m+1:n

a(i) = a(i)/a(m)

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

B2(m):

B1(l):

B1(l):

B2(m):

Guarded Expressions

- Compare symbolic values of live output variable in terms of input variables:
- aout (k)=
- Each
- guard : affine expression defining part of aout
- expr : symbolic expression on input data

guard1 -> expr1

guard2 -> expr2

……

guardn -> exprn

Guarded Expressions

- For both program blocks:
- aout(k) =
- Omega Library (integer programming tool) to manipulate/compare affine guards

k m => ain(k)

k = l => ain(p(l))/ain(m)

k = p(l) => ain(l)/ain(m)

else => ain(k)/ain(m)

Summary of Fractal Symbolic Analysis

- Powerful legality test
- Explores tradeoffs between
- tractability (dependence analysis)
- accuracy (symbolic comparison)
- Similar application for LU w/ pivoting
- 2 recursive simplification steps
- 6 guarded regions/expressions
- Prototype implemented in OCAML

Overview of this Talk

- Fractal Symbolic Analysis
- Framework for Symbolically determining Legality of Program Transformations
- Matrixization
- Generalization of Vectorization to Matrix Operations

Matrixization

- Detect Matrix Operations
- Map to
- hand-optimized libraries (BLAS)
- special-purpose hardware
- Eliminate loop overhead
- MATLAB: type/bounds checks
- Exploit Matrix Semantics
- Ring Properties of Matrix (+,*)

Galerkin Example

- In Galerkin, 98% of time spent in:
- for i = 1:N
- for j = 1:N
- phi(k) += A(i,j)*x(i)*y(i);
- end
- end
- A - Matrix
- x, y - Vector

Conventional Approaches

- Syntactic Pattern Replacement (KAPF,VAST,FALCON)
- Can encode high-level properties
- Limited use on loops
- Vector Code Generation
- Can detect array operations in loops
- Cannot detect/exploit matrix products

Our Approach

- Map code to: Abstract Matrix Form (AMF)
- Convert to Symmetric AMF formulation
- Optimize AMF expressions
- factorization
- invariant removal
- Detect Matrix Products
- Map AMF to MATLAB/BLAS

Expansion

- Array expressions:
- Forall loops:

a(1:m,1:n) -> i j ai,j

for i = 1:n

x(i) = x(i) + y(i); -> ixi = i(xi+yi)

end

Product

- Matrix Product

C = A*B -> ijCi,j= P(ijAi,h,ijBh,j)

- Product-Reduction Equivalence:
- e1 and e2 are scalar
- e1 is constant w.r.t. ik+1…in
- e2 is constant w.r.t. i1…ik-1

îk (i1…ine1 * i1…ine2) = Pîk(i1…ike1, ik…ine2)

Other AMF Properties

- Distributive Properties
- e1·î e2 = î((ie1) ·e2)
- when e1 is constant w.r.t. i
- Interchange Properties
- if(e1, e2,…, en) = f(ie1, ie2,…, ien)
- i e = i e
- where î=
- i j e = j i e
- î e = î e

Back to Galerkin

- Original:
- k = k + îi j (ai,j * xi * yj)
- Convert to Symmetric Form:
- k = k + î(i j ai,j * i j xi * i j yj)
- Optimize:
- k = k + îi xi * (i j ai,j * i j yj)
- Map to Matrix Operations:
- k = k + Pî (i xi , P(i j ai,j,j yj))
- => k = k + x’*A*y;

Other Vectorization Examples

- Taken from USENET:

for i = 1:n

for j = 1:n

C(i,j) = A(i,j)*x(i);

C(1:n,1:n) = A(1:n,1:n) .* repmat(x(1,n),1,n)

for i = 1:n

x(i) = y(:,i)*A*y(:,i)’;

x(1:n) = sum(y.*(y*A’),2);

Summary/Status of Matrixization

- General technique to detect matrix/vector operations
- Implemented in MAJIC
- MATLAB interpreter/compiler
- Future work:
- Extend to more ops: recurrences
- JIT Matrixization

Related Work

- Commutativity Analysis
- Rinard & Diniz
- Parallelization of OO programs
- High-Level Pattern Replacement
- DeRose, Marsolf, …
- Exploit high-level matrix properties
- Structure
- Symmetry
- Orthogonality

Conclusion

- High-level symbolic techniques:
- Fractal Symbolic Analysis
- Matrixization
- New techniques to analyze & exploit underlying computation and achieve substantial performance gains

Download Presentation

Connecting to Server..