symbolic program transformation for numerical codes
Download
Skip this Video
Download Presentation
Symbolic Program Transformation for Numerical Codes

Loading in 2 Seconds...

play fullscreen
1 / 41

Symbolic Program Transformation for Numerical Codes - PowerPoint PPT Presentation


  • 95 Views
  • Uploaded on

Symbolic Program Transformation for Numerical Codes. Vijay Menon Cornell University. Background. Compiler/Runtime Support for Numerical Programs (under Keshav Pingali) Sparse matrix code generation Memory hierarchy optimizations Compiler/runtime techniques for MATLAB

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Symbolic Program Transformation for Numerical Codes' - zeki


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
background
Background
  • Compiler/Runtime Support for Numerical Programs (under Keshav Pingali)
    • Sparse matrix code generation
    • Memory hierarchy optimizations
    • Compiler/runtime techniques for MATLAB
      • MAJIC (with University of Illinois)
      • MultiMATLAB
motivation
Motivation
  • Using matrix properties to generate efficient code
    • e.g., Associativity, Commutativity of matrix operations
  • Problem: loop notation
    • Matrix operations hidden within loop nests
example lu factorization with partial pivoting
Example: LU Factorization with Partial Pivoting
  • Also called: Gaussian Elimination
  • Key algorithm for solving systems of linear equations:
    • To solve A x = b for x:
      • => Factor A into L U
      • => Solve L y = b for y
      • => Solve U x = y for x
lu factorization
LU Factorization

do j = 1,N

p(j) = j;

do i = j+1,N

if (A(i,j)>A(p(j),j))

p(j) = i;

do k = 1,N

tmp = A(j,k);

A(j,k) = A(p(j),k);

A(p(j),k) = tmp;

do i = j+1,N

A(i,j) = A(i,j)/A(j,j);

do k = j+1,N

do i = j+1,N

A(i,k) = A(i,k) - A(i,j)*A(j,k);

Select pivot row:

Swap pivot row with current:

Scale column (to store L):

Update(to compute partial U):

  • Problem: Poor cache behavior

x x x x x x

0 x x x x x

0 0 5 x x x

0 0 3 x x x

0 0 7x x x

0 0 2 x x x

automatic blocking
Automatic Blocking

do jB = 1,N,B

do j = jB,jB+B-1

p(j) = j;

do i = j+1,N

if (A(i,j)>A(p(j),j))

p(j) = i;

do k = 1,N

tmp = A(j,k);

A(j,k) = A(p(j),k);

A(p(j),k) = tmp;

do i = j+1,N

A(i,j) = A(i,j)/A(j,j);

do k = j+1,jB+B-1

do i = j+1,N

A(i,k) = A(i,k) - A(i,j)*A(j,k);

do j = jB,jB+B-1

do k = jB+B,N

do i = j+1,N

A(i,k) = A(i,k) - A(i,j)*A(j,k);

  • Compiler transformations:
    • Stripmine
    • Index-Set Split
    • Loop Distribute
    • Tile/Map to BLAS
  • Problems:
    • Establishing legality
    • Mapping to BLAS
key points
Key Points
  • Conventional techniques (dependence analysis) are insufficient to establish legality
  • Detection of matrix operations buys extra performance
overview of this talk
Overview of this Talk
  • Fractal Symbolic Analysis
    • Framework for Symbolically determining Legality of Program Transformations
  • Matrixization
    • Generalization of Vectorization to Matrix Operations
fractal symbolic analysis
Fractal Symbolic Analysis
  • Symbolic test to establish legality of program transformations

for i = 1 : n

S1(i);

for i = 1 : n

S2(i);

for i = 1 : n

S1(i);

S2(i);

dependence analysis
Dependence Analysis
  • Independent operations may be reordered
  • Legality: No data dependence from S2(m) to S1(l) where l > m

for i = 1 : n

S1(i);

for i = 1 : n

S2(i);

for i = 1 : n

S1(i);

S2(i);

symbolic comparison
Symbolic Comparison
  • Dependence Analysis is conservative:
  • Symbolic execution shows equality:
    • aout = 2*(ain + bin)
    • bout = 2*bin
  • But, intractable for recurrent loops

s1: a = 2*a

s2: b = 2*b

s3: a = a+b

s3: a = a+b

s1: a = 2*a

s2: b = 2*b

?

distillation of lu
Distillation of LU
  • Given p(j)  j, prove:
  • Dependence analysis: too conservative
  • Symbolic comparison: intractable

for j = 1:n

tmp = a(j)

a(j) = a(p(j))

a(p(j)) = tmp

for j = 1:n

for i = j+1:n

a(i) = a(i)/a(j)

for j = 1:n

tmp = a(j)

a(j) = a(p(j))

a(p(j)) = tmp

for i = j+1:n

a(i) = a(i)/a(j)

B1(j):

B1(j):

B2(j):

B2(j):

slide14
Idea
  • Simplify
    • Prove equality of complex programs via equality of simpler programs
    • Repeat if necessary
  • Sufficient, but not necessary
    • equality of simpler programs -> equality of original programs
commutativity
Commutativity

S1, S2, S3, …,Sn = Sf(1), Sf(2), Sf(3), …,Sf(n)

i,j  1..n | i < j  f(i) > f(j).

Si; Sj = Sj; Si

  • Permutation  Sequence of Adjacent Transpositions
commutative legality
Commutative Legality
  • Loop Distribution
  • Legality:

 1  m < l  N. S2(m); S1(l) = S1(l); S2(m)

for i = 1 : n

S1(i);

S2(i);

for i = 1 : n

S1(i);

for i = 1 : n

S2(i);

commutative legality1
Commutative Legality
  • Similar conditions:
    • Statement reordering
    • Loop interchange
    • Loop reversal
    • Loop tiling
  • Compiler establishes legality of transformation by testing commute condition
back to example
Back to Example
  • Given p(j)  j, prove:
  • Simplified comparison =>

for j = 1:n

tmp = a(j)

a(j) = a(p(j))

a(p(j)) = tmp

for j = 1:n

for i = j+1:n

a(i) = a(i)/a(j)

for j = 1:n

tmp = a(j)

a(j) = a(p(j))

a(p(j)) = tmp

for i = j+1:n

a(i) = a(i)/a(j)

B1(j):

B1(j):

B2(j):

B2(j):

simplified comparison
Simplified Comparison
  • Given p(j)  j  l > m prove:
  • Further simplification?

for i = m+1:n

a(i) = a(i)/a(m)

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

for i = m+1:n

a(i) = a(i)/a(m)

B2(m):

B1(l):

B1(l):

B2(m):

too simple
Too Simple!
  • Given p(j)  j  l > m  i > m prove:
  • Too conservative - no longer legal!
  • Hypothesis:
    • Repeated application -> dependence analysis

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

a(i) = a(i)/a(m)

a(i) = a(i)/a(m)

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

symbolic comparison1
Symbolic Comparison
  • Can we symbolically prove?:
  • Effect can be summarized:
    • non-recurrent loops
    • affine indices/loop bounds

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

for i = m+1:n

a(i) = a(i)/a(m)

for i = m+1:n

a(i) = a(i)/a(m)

tmp = a(l)

a(l) = a(p(l))

a(p(l)) = tmp

B2(m):

B1(l):

B1(l):

B2(m):

guarded expressions
Guarded Expressions
  • Compare symbolic values of live output variable in terms of input variables:
  • aout (k)=
  • Each
    • guard : affine expression defining part of aout
    • expr : symbolic expression on input data

guard1 -> expr1

guard2 -> expr2

……

guardn -> exprn

guarded expressions1
Guarded Expressions
  • For both program blocks:
  • aout(k) =
  • Omega Library (integer programming tool) to manipulate/compare affine guards

k  m => ain(k)

k = l => ain(p(l))/ain(m)

k = p(l) => ain(l)/ain(m)

else => ain(k)/ain(m)

summary of fractal symbolic analysis
Summary of Fractal Symbolic Analysis
  • Powerful legality test
  • Explores tradeoffs between
    • tractability (dependence analysis)
    • accuracy (symbolic comparison)
  • Similar application for LU w/ pivoting
    • 2 recursive simplification steps
    • 6 guarded regions/expressions
  • Prototype implemented in OCAML
overview of this talk1
Overview of this Talk
  • Fractal Symbolic Analysis
    • Framework for Symbolically determining Legality of Program Transformations
  • Matrixization
    • Generalization of Vectorization to Matrix Operations
matrixization
Matrixization
  • Detect Matrix Operations
    • Map to
      • hand-optimized libraries (BLAS)
      • special-purpose hardware
    • Eliminate loop overhead
      • MATLAB: type/bounds checks
  • Exploit Matrix Semantics
    • Ring Properties of Matrix (+,*)
galerkin example
Galerkin Example
  • In Galerkin, 98% of time spent in:
  • for i = 1:N
  • for j = 1:N
  • phi(k) += A(i,j)*x(i)*y(i);
  • end
  • end
  • A - Matrix
  • x, y - Vector
vectorized code
Vectorized Code
  • In Optimized Galerkin:

phi(k) += x*A*y’;

  • Fragment Speedup: 260
  • Program Speedup: 110
conventional approaches
Conventional Approaches
  • Syntactic Pattern Replacement (KAPF,VAST,FALCON)
    • Can encode high-level properties
    • Limited use on loops
  • Vector Code Generation
    • Can detect array operations in loops
    • Cannot detect/exploit matrix products
our approach
Our Approach
  • Map code to: Abstract Matrix Form (AMF)
  • Convert to Symmetric AMF formulation
  • Optimize AMF expressions
    • factorization
    • invariant removal
  • Detect Matrix Products
  • Map AMF to MATLAB/BLAS
expansion
Expansion
  • Array expressions:
  • Forall loops:

a(1:m,1:n) -> i j ai,j

for i = 1:n

x(i) = x(i) + y(i); -> ixi = i(xi+yi)

end

reduction
Reduction
  • Sum Reduction Loops

for i = 1:n

k = k + x(i); -> k = k + î i xi

end

product
Product
  • Matrix Product

C = A*B -> ijCi,j= P(ijAi,h,ijBh,j)

  • Product-Reduction Equivalence:
      • e1 and e2 are scalar
      • e1 is constant w.r.t. ik+1…in
      • e2 is constant w.r.t. i1…ik-1

îk (i1…ine1 * i1…ine2) = Pîk(i1…ike1, ik…ine2)

other amf properties
Other AMF Properties
  • Distributive Properties
    • e1·î e2 = î((ie1) ·e2)
      • when e1 is constant w.r.t. i
  • Interchange Properties
    • if(e1, e2,…, en) = f(ie1, ie2,…, ien)
    • i  e = i e
      • where î=
    • i j e = j i e
    • î  e = î e
back to galerkin
Back to Galerkin
  • Original:
      • k = k + îi j (ai,j * xi * yj)
  • Convert to Symmetric Form:
      • k = k + î(i j ai,j * i j xi * i j yj)
  • Optimize:
      • k = k + îi xi * (i j ai,j * i j yj)
  • Map to Matrix Operations:
      • k = k + Pî (i xi , P(i j ai,j,j yj))
  • => k = k + x’*A*y;
other vectorization examples
Other Vectorization Examples
  • Taken from USENET:

for i = 1:n

for j = 1:n

C(i,j) = A(i,j)*x(i);

C(1:n,1:n) = A(1:n,1:n) .* repmat(x(1,n),1,n)

for i = 1:n

x(i) = y(:,i)*A*y(:,i)’;

x(1:n) = sum(y.*(y*A’),2);

summary status of matrixization
Summary/Status of Matrixization
  • General technique to detect matrix/vector operations
  • Implemented in MAJIC
    • MATLAB interpreter/compiler
  • Future work:
    • Extend to more ops: recurrences
    • JIT Matrixization
related work
Related Work
  • Commutativity Analysis
    • Rinard & Diniz
    • Parallelization of OO programs
  • High-Level Pattern Replacement
    • DeRose, Marsolf, …
    • Exploit high-level matrix properties
      • Structure
      • Symmetry
      • Orthogonality
conclusion
Conclusion
  • High-level symbolic techniques:
    • Fractal Symbolic Analysis
    • Matrixization
  • New techniques to analyze & exploit underlying computation and achieve substantial performance gains
ad