
### A Case for Source-Level Transformations in MATLAB

Vijay Menon and Keshav Pingali

Cornell University

The MaJic Project

at Illinois/Cornell

George Almasi

Luiz De Rose

David Padua

MATLAB

- High-Level Interpreted Language for Numerical Computing
- Matrix is 1st class type
- Library of numerical functions
- Application Domains
- Image Processing
- Structural Mechanics
- Computational Finance

The Problem

- Development is fast...
- ~10X as concise as C/Fortran
- Performance is slow!
- ~10X as slow as C/Fortran
- Conventional Approach:
- Rewrite
- Compile

Our Approach: Source-Level Optimization

- Apply high-level transformations directly on MATLAB codes
- Significant performance benefit for:
- interpreted code
- compiled code

Outline

- Overheads in MATLAB
- Conventional Compilation
- Source-Level Optimization
- Comparison
- Implementation Status

Outline

- Overheads in MATLAB
- Type/Shape Checking
- Memory Management
- Array Bounds Checking
- Conventional Compilation
- Source-Level Optimization
- Comparison
- Implementation Status

Type/Shape Checking

- MATLAB has no type/shape declarations
- Consider:

A * B

- Interpreter checks to perform multiply (*)
- Shape
- Scalar*Scalar
- Scalar*Matrix
- Matrix*Matrix
- Type
- Real*Real
- Real*Complex
- Complex*Complex
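
The shape and type cases listed above can be made concrete with a small sketch. It only mimics the classification an interpreter has to perform before dispatching `*`; the helper names `shape_of`, `type_of`, and `describe` are hypothetical and not part of MATLAB or of the MaJic system.

```matlab
% Hypothetical helpers that classify operands the way the slide lists them.
shapes = {'Matrix', 'Scalar'};
types  = {'Complex', 'Real'};
shape_of = @(v) shapes{isscalar(v) + 1};   % isscalar -> index 2 ('Scalar')
type_of  = @(v) types{isreal(v) + 1};      % isreal   -> index 2 ('Real')
describe = @(A, B) sprintf('%s*%s, %s*%s', ...
    shape_of(A), shape_of(B), type_of(A), type_of(B));

c1 = describe(2, 3);                 % 'Scalar*Scalar, Real*Real'
c2 = describe(2, eye(3));            % 'Scalar*Matrix, Real*Real'
c3 = describe(eye(3), 1i * eye(3));  % 'Matrix*Matrix, Real*Complex'
```

Every evaluation of `A * B` has to answer both questions at run time, because nothing in the source declares the operands.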

Type/Shape Checking

- Consider:

for i = 1:n

y = y + a * x(i)

end

- Loops
- perform redundant checks
- magnify interpreter overhead

Memory Management: Dynamic Resizing

- Consider:

x(10) = 10;

- C/Fortran: x must have >= 10 elements
- MATLAB: x is resized if needed
- Memory reallocated
- Data copied

Memory Management: Dynamic Resizing

- MATLAB dynamically grows arrays:

for i = 1 : 1000

x(i) = i;

end

- Every iteration triggers resize!
- 1,000 memory allocations
- ~500,000 elements copied
- Execution Time:
- x is undefined: 14.2 seconds
- x is already defined: 0.37 seconds
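
The allocation and copy counts above follow from simple arithmetic. The sketch below uses a simplified cost model, which is an assumption rather than a measurement: each resize allocates a new block and copies every element already stored.

```matlab
% Simplified cost model (assumed): growing x from length i-1 to length i
% allocates a new block and copies the i-1 existing elements.
n = 1000;
allocations = n;          % one allocation per iteration
copied = sum(0:n-1);      % 0 + 1 + ... + 999 = 499500
fprintf('%d allocations, %d elements copied\n', allocations, copied);
```

The copy count grows quadratically with the loop bound, which is why the undefined-`x` case is so much slower than the already-defined case.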

Array Bounds Checking

- Consider array indexing:

x(i) = y(i);

- Failed Bounds Check on
- x(i) can trigger resize
- y(i) can trigger error
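
A minimal sketch of the two outcomes, using made-up vectors: an out-of-range write grows the array, while an out-of-range read raises an error.

```matlab
x = [1 2 3];
y = [4 5 6];

% Write past the end: the failed bounds check triggers a resize.
x(5) = 9;                 % x becomes [1 2 3 0 9]; the gap is zero-filled

% Read past the end: the failed bounds check raises an error.
read_failed = false;
try
    v = y(5);
catch
    read_failed = true;
end
```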

Array Bounds Checking

- In a loop:

for i = 3:100

x(i) = x(i-1) + x(i-2);

end

- Interpreter performs redundant checks
- Compiler work:
- Nonresizable arrays: Gupta PLDI’90
- Resizable arrays: more difficult
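
For the loop above, the per-iteration checks can in principle be replaced by one check and one resize before the loop. The sketch below shows that reasoning at source level. It is an illustration, not the algorithm from the cited work, and the initial values of `x` are assumed.

```matlab
n = 100;
x = [1 1];                % illustrative initial values (assumed)

% One up-front check covers every read: the first iteration reads x(1)
% and x(2); later iterations read elements written by earlier ones.
assert(numel(x) >= 2);

% One up-front resize covers every write x(3..n).
if numel(x) < n
    x(n) = 0;
end

for i = 3:n
    x(i) = x(i-1) + x(i-2);
end
```

The interpreter still performs its own checks on each access; the point is that none of them can fail once the two conditions before the loop hold.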

Common Theme

- Loops magnify overheads
- every iteration: redundant checks, resizes, …
- MATLAB interprets naively
- computes as is
- no reorganization to optimize

Outline

- Overheads in MATLAB
- Conventional Compilation
- Compile to C/Fortran
- Rely on C/Fortran compiler for optimization
- Source-Level Optimization
- Comparison
- Implementation Status

MATLAB Compilers

- Compile to C/C++/Fortran
- MCC -> C (The MathWorks)
- MATCOM -> C++ (Mathtools)
- FALCON -> F90 (U of Illinois)
- Native compiler generates executable code:
- Link back into MATLAB environment
- Run as stand-alone program

The MCC Compiler

- Safe Optimization:
- Type Inference - no declarations in MATLAB
- Eliminate Type Checks / Reduce Storage
- Specialize for real input variables
- Always legal!
- Unsafe Optimization:
- Assume all data is real
- Eliminate all bounds checks - disallow resizing
- User must ensure legality!

Falcon Benchmarks

- Collected by DeRose from MATLAB users at Illinois/NCSA
- Element/Loop Intensive
- CN - Crank-Nicolson PDE Solver
- Di - Dirichlet PDE Solver
- FD - Finite Difference PDE Solver
- Ga - Galerkin PDE Solver
- IC - Incomplete Cholesky Factorization
- Memory Intensive
- AQ - Adaptive Quadrature w/ Simpson’s Rule
- EC - Euler-Cromer 2-body problem
- RK - Runge-Kutta 2-body problem
- Library Intensive
- CG - Conjugate Gradients Iterative Solver
- Mei - 3D surface Generation
- QMR - Quasi-Minimal Residual
- SOR - Successive Over-Relaxation

MCC: Unsafe Optimizations

Note: User must ensure legality!

Outline

- Overheads in MATLAB
- Conventional Compilation
- Source-Level Optimization
- Vectorization
- Preallocation
- Expression Optimization
- Comparison
- Implementation Status

Vectorization

- Loops are expensive
- Overheads are magnified
- Idea: Eliminate Loops
- Map loops to higher-level matrix operations
- Interpreter uses efficient libraries
- BLAS
- LINPACK/EISPACK

Example of Vectorization

- In Galerkin, 98% of execution spent in:

for i = 1:N

for j = 1:N

phi(k) = phi(k) + a(i,j)*x(i)*y(j);

end

end

Vectorized Code

- In Optimized Galerkin:

phi(k) = phi(k) + x*a*y';

- Fragment Speedup: 260
- Program Speedup: 110
- Note: Not always possible!
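
The equivalence behind this rewrite can be checked on random data. The sketch assumes `x` and `y` are row vectors and that the inner-loop update reads `y(j)`, which is what `x*a*y'` computes; the size `N = 50` is arbitrary.

```matlab
N = 50;
a = rand(N, N);
x = rand(1, N);           % row vectors, so x*a*y' is a scalar
y = rand(1, N);

% Loop form: N^2 interpreted scalar updates.
phi_loop = 0;
for i = 1:N
    for j = 1:N
        phi_loop = phi_loop + a(i,j) * x(i) * y(j);
    end
end

% Vectorized form: a vector-matrix product followed by a dot product.
phi_vec = x * a * y';

rel_err = abs(phi_loop - phi_vec) / abs(phi_vec);
```

The two results agree up to floating-point rounding, since the library routines may sum the terms in a different order.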

Preallocation

- Eliminate Dynamic Resizing
- Try to predict eventual size of array
- Insert early allocation when possible:
- x = zeros(1000,1);
- Resizing will not be triggered

Example of Preallocation

- In Euler-Cromer, 87% of time spent in:

for i = 1:N

r(i) = …

th(i) = …

t(i) = …

k(i) = …

p(i) = …

…

end

Preallocated Code

- In Optimized Euler-Cromer:

r = zeros(1,N);

...

for i = 1:N

r(i) = …

…

end

- Fragment Speedup: 7
- Program Speedup: 4
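
A rough way to observe the effect is to time the same loop with and without preallocation. This is a sketch with an arbitrary loop body and size, not the Euler-Cromer code. Newer MATLAB releases handle array growth more efficiently, so the gap may be much smaller than the figures quoted above.

```matlab
N = 20000;

% Without preallocation: r_grow gains one element per iteration.
r_grow = [];
tic;
for i = 1:N
    r_grow(i) = i^2;
end
t_grow = toc;

% With preallocation: the final size is allocated once, before the loop.
r_pre = zeros(1, N);
tic;
for i = 1:N
    r_pre(i) = i^2;
end
t_pre = toc;

fprintf('grow: %.4f s, preallocated: %.4f s\n', t_grow, t_pre);
```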

Expression Optimization

- MATLAB interprets expressions naïvely in left-to-right order
- Simple restructuring can significantly affect execution time, e.g.:
- A*B*x : O(n^3) flops
- A*(B*x) : O(n^2) flops
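
The difference comes from which product is formed first. The sketch below uses random operands and an arbitrary size `n = 300` to compare the two orderings and confirm that they agree up to rounding.

```matlab
n = 300;
A = rand(n);
B = rand(n);
x = rand(n, 1);

% Left to right: A*B is a matrix-matrix product, about 2*n^3 flops.
tic; v1 = A * B * x;   t_left  = toc;

% Reassociated: two matrix-vector products, about 4*n^2 flops in total.
tic; v2 = A * (B * x); t_right = toc;

rel_diff = norm(v1 - v2) / norm(v2);
fprintf('A*B*x: %.4f s, A*(B*x): %.4f s\n', t_left, t_right);
```

The rewrite is legal because matrix multiplication is associative; the results differ only by floating-point rounding.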

Example of Expression Optimization

- In QMR, 70% of execution spent in:

w = A'*q;

- A : 420x420 matrix
- q, w : 420x1 vectors
- A' = transpose(A)

Expression Optimized Code

- In Optimized QMR: A'*q == (q'*A)'

w = (q'*A)';

- Transposes 2 vectors instead of 1 matrix
- Fragment Speedup: 20
- Program Speedup: 3
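
The identity can be checked numerically with random data of the size quoted for QMR. This sketch only checks equivalence. Current MATLAB versions may evaluate `A'*q` without forming the transpose explicitly, so the reported speedup may not reproduce.

```matlab
n = 420;                  % size quoted for the QMR benchmark
A = rand(n);
q = rand(n, 1);

w1 = A' * q;              % original form: transposes the n-by-n matrix
w2 = (q' * A)';           % rewritten form: transposes two length-n vectors

rel_diff = norm(w1 - w2) / norm(w1);
```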

Point #1:

- Source optimizations can outperform MCC

Point #2:

- Source optimizations complement MCC

Benefits of Source-Level Optimizations

- Vectorization
- Directly eliminates loop overhead
- Move work to hand-optimized BLAS
- Preallocation
- Eliminates resizing overhead
- Enables MCC array bounds elimination
- Expression Optimization
- Uses algebraic info unavailable in C/Fortran

Implementation Status

- Illinois/Cornell MaJic system
- Just-in-time MATLAB interpreter/compiler
- Incorporates Source-Level Transformation
- Semantic Optimization (Menon/Pingali ICS’99)
- Vectorization/BLAS call generation
- Expression Optimization
- Preallocation/Bounds Check Optimization (Work in progress)

Conclusion

- Source-level optimizations are important for enhancing MATLAB performance, whether the code is only interpreted or later compiled

Unsafe Type Check Removal

- Correct on 11/12 Codes

Unsafe Bounds Check Removal

- Correct on 7/12 Codes
