Tuning MATLAB for Better Performance

Kadin Tseng

Scientific Computing and Visualization, IS&T

Boston University

Topics Covered

1. Performance Issues

1.1 Memory Access

1.2 Vector Representations

1.3 Compiler

1.4 Other Considerations

2. Multiprocessing MATLAB

slide3

1. Performance Issues

1.1 Memory Access

1.2 Vector Representations

1.3 Compiler

1.4 Other Considerations

slide4

1.1 Memory Access

Memory access patterns often affect computational performance. Some effective ways to enhance performance in MATLAB:

  • Allocate array memory before using it.
  • Order for-loops to match the array memory layout.
  • Compute and save arrays in place where applicable.
slide5

MATLAB Memory Allocation Issues

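  • MATLAB arrays are allocated in contiguous address space (for efficiency, as dictated by LAPACK).
  • Arrays can be allocated on the fly. This is problematic in a large for-loop, because the array must be reallocated as it grows.
  • MATLAB behaves as both pass-by-value and pass-by-reference (“lazy copy”): an argument is copied only if the function modifies it.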
slide6

How Does MATLAB Allocate Arrays ?

  • MATLAB arrays are allocated in contiguous address space.

Without pre-allocation

for i=1:4

x(i) = i;

end

slide7

How … Arrays ? Examples

  • MATLAB arrays are allocated in contiguous address space.

n=5000;

tic

for i=1:n

x(i) = i^2;

end

toc

Wallclock time = 0.00046 seconds

n=5000; x = zeros(n,1);

tic

for i=1:n

x(i) = i^2;

end

toc

Wallclock time = 0.00004 seconds

The timing data are recorded on Katana. The actual times may vary depending on the processor.

slide8

Passing Arrays Into A Function

MATLAB uses pass-by-reference if the passed array is used as is; a copy is made only if the array is modified. MATLAB calls this “lazy copy.”

function y = mycopy(A, x, b, change)
if change, A(2,3) = 23; end % modifying A forces a local copy of A
y = A*x + b; % use x and b directly from the calling program
pause(2) % keep the memory allocated longer so it can be observed

On Windows, you can use Task Manager to monitor the memory allocation history.

>> n = 5000; A = rand(n); x = rand(n,1); b = rand(n,1);

>> y = mycopy(A, x, b, 0);

>> y = mycopy(A, x, b, 1);

slide9

For-loop Ordering

  • Best if the inner-most loop runs over the left-most array index, the next loop over the next index, and so on.
  • For a multi-dimensional array, x(i,j), the 1D representation of the same array, x(k), follows column-wise order and is inherently contiguous.

n=5000; x = zeros(n);

for i=1:n % rows

for j=1:n % columns

x(i,j) = i+(j-1)*n;

end

end

Wallclock time = 0.88 seconds

n=5000; x = zeros(n);

for j=1:n % columns

for i=1:n % rows

x(i,j) = i+(j-1)*n;

end

end

Wallclock time = 0.48 seconds

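for i=1:n*n % single loop over the 1D (column-wise) index
  x(i) = i;
end

x = 1:n*n; % vector form; note that this yields a 1-by-n*n row vector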
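The column-wise storage order described on this slide can be checked directly. The following is a minimal sketch, not part of the original slides; the small size n = 4 and the names k, row, col, and formulaOK are chosen only for illustration.

```matlab
n = 4;
x = reshape(1:n*n, n, n);            % fill an n-by-n array in storage (column-wise) order
k = sub2ind([n n], 3, 2);            % 1D index that corresponds to element (3,2)
[row, col] = ind2sub([n n], 7);      % (i,j) pair that corresponds to 1D index 7
formulaOK = (x(3,2) == 3 + (2-1)*n); % same i+(j-1)*n rule used in the loops above
disp(x)
```

Because consecutive 1D indices run down a column, the loop over i (rows) is the one that touches neighboring memory locations, which is why it belongs innermost.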

slide10

Compute In-place

  • Computing and saving an array in place improves performance.

x = rand(5000);

tic

y = x.^2;

toc

Wallclock time = 0.30 seconds

x = rand(5000);

tic

x = x.^2;

toc

Wallclock time = 0.11 seconds

Caveat:

May not be worthwhile if it involves data type or size change …

slide11

Other Considerations

  • Generally, it is better to use a function m-file than a script m-file.
    • A script m-file is loaded into memory and evaluated one line at a time. Subsequent uses require reloading.
    • A function m-file is compiled into pseudo-code and is loaded on first use. Subsequent uses of the function are faster because no reloading is needed.
    • A function is modular, self-cleaning, and reusable.
  • Global variables are expensive and difficult to track.
  • Physical memory is much faster than virtual memory.
  • Avoid passing large matrices to a function and modifying only a handful of elements (struct and cell arrays are exceptions).
Other Considerations (cont’d)
  • load and save handle whole data files efficiently; textscan is more memory-efficient for extracting text that meets specific criteria.
  • Don’t reassign an array in a way that changes its data type or shape.
  • Limit m-file size and complexity.
  • Computationally intensive jobs often require large memory …
  • A structure of arrays is more memory-efficient than an array of structures.
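The last point can be illustrated with whos. This is a minimal sketch under assumptions: the field names x and y, the size n = 1000, and the variable names soa and aos are hypothetical, and the exact byte counts depend on the MATLAB version, so none are quoted here.

```matlab
n = 1000;

% Structure of arrays: one struct whose fields are whole arrays
soa.x = zeros(n,1);
soa.y = zeros(n,1);

% Array of structures: n structs, each holding two scalars
aos = repmat(struct('x', 0, 'y', 0), n, 1);

infoSoA = whos('soa');
infoAoS = whos('aos');
fprintf('structure of arrays: %d bytes\n', infoSoA.bytes);
fprintf('array of structures: %d bytes\n', infoAoS.bytes);
```

Both variables hold the same 2*n numbers; any difference in the reported sizes comes from per-element storage overhead in the array of structures.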
slide13

Memory Management

  • Maximize memory availability.
    • 32-bit systems < 2 or 3 GB
    • 64-bit systems running 32-bit MATLAB < 4GB
    • 64-bit systems running 64-bit MATLAB < 8TB

(16GB on some Katana nodes)

  • Minimize memory usage.

(Details to follow …)

slide14

Minimize Memory Usage

  • Use clear, pack or other memory-saving means when possible. If double precision (the default) is not required, using the 'single' data type could save a substantial amount of memory. For example,

>> x=ones(10,'single'); y=x+1; % y inherits single from x

  • Use sparse to save memory

>> n=5000; A = zeros(n); A(3,2) = 1; B = ones(n);

>> C = A*B;

>> As = sparse(A);

>> Cs = As*B; % it can save time for low density

>> A2 = sparse(n,n); A2(3,2) = 1;
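The memory saving from sparse storage can be inspected with whos. The following is a minimal sketch, not from the original slides; it assumes a smaller n = 1000 than the slide's n = 5000 to keep the demonstration light, and the names infoDense, infoSparse, and sameProduct are illustrative.

```matlab
n = 1000;                        % assumption: smaller than the slide's 5000
A = zeros(n); A(3,2) = 1;        % dense matrix with a single nonzero
As = sparse(A);                  % same values, sparse storage
B = ones(n);

infoDense  = whos('A');
infoSparse = whos('As');
fprintf('dense : %d bytes\n', infoDense.bytes);
fprintf('sparse: %d bytes\n', infoSparse.bytes);

% The product is the same either way; only storage and work differ
sameProduct = isequal(full(As*B), A*B);
```

Building the matrix directly with sparse(n,n), as in the A2 line above, avoids ever allocating the dense array.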

slide15

Minimize Memory Usage (Cont’d)

  • Starting MATLAB with “matlab -nojvm …” saves a lot of memory.
  • Memory usage query

For Unix:

>> unix('ps aux | grep $USER | grep -m 1 MATLAB | awk ''{print $5"k"}''') % only "k" is double quoted

Katana% top

For Windows:

>> m = feature('memstats'); % largest contiguous free block

Use MS Windows Task Manager to monitor memory allocation.

  • Distribute memory among multiple processors via the MATLAB Parallel Computing Toolbox.

slide16

Special Functions for Real Numbers

MATLAB provides a few functions for processing real, noncomplex data specifically. These functions are more efficient than their generic versions:

  • realpow – power for real numbers
  • realsqrt – square root for real numbers
  • reallog – logarithm for real numbers

Related constants:

  • realmin/realmax – smallest/largest positive normalized floating-point number (these are not min/max functions)

n = 1000; x = 1:n;

x = x.^2;

tic

x = sqrt(x);

toc

Wallclock time = 0.00022 seconds

n = 1000; x = 1:n;

x = x.^2;

tic

x = realsqrt(x);

toc

Wallclock time = 0.00004 seconds

  • isreal reports whether the array is real
  • single/double converts data to single-/double-precision
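The real-only functions accept only inputs whose results are real. The following is a minimal sketch, not from the original slides, showing that realsqrt agrees with sqrt on valid input and raises an error where sqrt would return a complex value; the names y1, y2, maxDiff, and caught are illustrative.

```matlab
x = (1:1000).^2;
isRealInput = isreal(x);        % confirm the data are real before using real* functions

y1 = sqrt(x);
y2 = realsqrt(x);
maxDiff = max(abs(y1 - y2));    % the two results should agree

% realsqrt refuses inputs that would give a complex result
caught = false;
try
    realsqrt(-1);
catch err
    caught = true;
    disp(err.message)
end
```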
slide17

Vectorization

  • MATLAB is designed for vector and matrix operations. For-loops can, in general, be expensive, especially if the loop count is large or the loops are nested.
  • Without array pre-allocation, for-loops are very costly.
  • From a performance standpoint, vector representation should generally be used in place of for-loops whenever reasonable.

i = 0;

for t = 0:.01:100

i = i + 1;

y(i) = sin(t);

end

Wallclock time = 0.1069 seconds

t = 0:.01:100;

y = sin(t);

Wallclock time = 0.0007 seconds

slide18

Vector Manipulations of Arrays

>> A = magic(3) % define a 3x3 matrix A

A =

8 1 6

3 5 7

4 9 2

>> B = A^2; % B = A * A;

>> C = A + B;

>> b = 1:3 % define b as a 1x3 row vector

b =

1 2 3

>> [A, b'] % Add b transpose as a 4th column to A

ans =

8 1 6 1

3 5 7 2

4 9 2 3

slide19

Vector Manipulations of Arrays

>> [A; b] % Add b as a 4th row to A

ans =

8 1 6

3 5 7

4 9 2

1 2 3

>> A = zeros(3) % zeros generates 3*3 array of 0’s

A =

0 0 0

0 0 0

0 0 0

>> B = 2*ones(2,3) % ones generates 2 * 3 array of 1’s

B =

2 2 2

2 2 2

Alternatively,

>> B = repmat(2,2,3) % matrix replication

slide20

Vector Manipulations of Arrays

>> y = (1:5)';

>> n = 3;

>> B = y(:, ones(1,n)) % B = y(:, [1 1 1]) or B=[y y y]

B =

1 1 1

2 2 2

3 3 3

4 4 4

5 5 5

Again, B can be generated via repmat as

>>B = repmat(y, 1, 3);

slide21

Vector Manipulations of Arrays

>> A = magic(3)

A =

8 1 6

3 5 7

4 9 2

>> B = A(:, [1 3 2]) % switch 2nd and third columns of A

B =

8 6 1

3 7 5

4 2 9

>> A(:, 2) = [ ] % delete second column of A

A =

8 6

3 7

4 2

slide22

Vector Representation Example

n = 3000; x = zeros(n);

for j=1:n

for i=1:n

x(i,j) = i+(j-1)*n;

x(i,j) = x(i,j)^2;

end

end

Wallclock time = 0.19 seconds

n = 3000;

i = (1:n)';

I = repmat(i,1,n); % replicate along j

x = I + (I'-1)*n;

x = x.^2;

Wallclock time = 0.49 seconds

Notes on the vector form of the computations:

  • To eliminate the for-loops, all values of i and j must be made available at once. The ones or repmat utilities can be used to replicate rows or columns. In this special case, J = I' is used to save computations and memory.
  • Often, there are trade-offs between efficiency and memory when using the vector form. Here, the creation of I adds to the memory and compute time, which is why the vector form is slower in this example. However, as more work is leveraged against I, efficiency improves.
  • Added memory could push memory usage closer to the physical RAM limit. Once virtual memory (swap space) is required, performance degrades.
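Before timing the two forms, it is worth confirming that they produce the same array. The following is a minimal sketch, not from the original slides; it assumes a small n = 50, and the names xLoop, xVec, and same are illustrative.

```matlab
n = 50;                          % assumption: small size, for a quick check only

% Loop form (both statements of the slide folded into one)
xLoop = zeros(n);
for j = 1:n
    for i = 1:n
        xLoop(i,j) = (i + (j-1)*n)^2;
    end
end

% Vector form: I(i,j) = i, and its transpose I'(i,j) = j
i = (1:n)';
I = repmat(i, 1, n);             % replicate the column of row indices along j
xVec = (I + (I'-1)*n).^2;

same = isequal(xLoop, xVec);
```

All values here are integers that double precision represents exactly, so isequal is an appropriate comparison.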
Laplace Equation

Laplace Equation:

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 \tag{1}$$

Boundary Conditions:

(2)

Analytical solution:

(3)

Discrete Laplace Equation

Discretizing Equation (1) by centered differences yields:

$$u^{n+1}_{i,j} = \frac{u^{n}_{i+1,j} + u^{n}_{i-1,j} + u^{n}_{i,j+1} + u^{n}_{i,j-1}}{4} \tag{4}$$

where n and n+1 denote the current and the next time step, respectively, while

(5)

For simplicity, we take

slide27

Five-point Finite-Difference Stencil

[Figure: five-point finite-difference stencil. Each cell (i, j), marked o, is shown with its four nearest neighbors, marked x.]

Interior cells: where the solution of the Laplace equation is sought.

Exterior cells: green cells denote cells where homogeneous boundary conditions are imposed, while cells with non-homogeneous boundary conditions are colored blue.

slide28

SOR Update Function

How do you vectorize it?

  • Remove the for loops
  • Define i = ib:2:ie;
  • Define j = jb:2:je;
  • Use sum for del

% original code fragment
jb = 2; je = n+1; ib = 3; ie = m+1;
for i=ib:2:ie
  for j=jb:2:je
    up = ( u(i,j+1) + u(i+1,j) + ...
           u(i-1,j) + u(i,j-1) )*0.25;
    u(i,j) = (1.0 - omega)*u(i,j) + omega*up;
    del = del + abs(up-u(i,j));
  end
end

% vector code fragment
jb = 2; je = n+1; ib = 3; ie = m+1;
i = ib:2:ie; j = jb:2:je;
up = ( u(i,j+1) + u(i+1,j) + ...
       u(i-1,j) + u(i,j-1) )*0.25;
u(i,j) = (1.0 - omega)*u(i,j) + omega*up;
del = del + sum(sum(abs(up-u(i,j))));

More efficient way?
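The two fragments can be run side by side to confirm that they update the same cells to the same values. This is a minimal sketch under assumptions: the grid size m = n = 6, the relaxation factor omega = 1.5, and the random initial field u0 are hypothetical test data, not values from the slides.

```matlab
m = 6; n = 6; omega = 1.5;       % assumed test values
rng(0);                          % reproducible random test field
u0 = rand(m+2, n+2);             % interior cells plus one layer of boundary cells
jb = 2; je = n+1; ib = 3; ie = m+1;

% Loop form
u = u0; delLoop = 0;
for i = ib:2:ie
    for j = jb:2:je
        up = (u(i,j+1) + u(i+1,j) + u(i-1,j) + u(i,j-1))*0.25;
        u(i,j) = (1.0 - omega)*u(i,j) + omega*up;
        delLoop = delLoop + abs(up - u(i,j));
    end
end
uLoop = u;

% Vector form: i and j are index vectors, so u(i,j) is a sub-block
u = u0;
i = ib:2:ie; j = jb:2:je;
up = (u(i,j+1) + u(i+1,j) + u(i-1,j) + u(i,j-1))*0.25;
u(i,j) = (1.0 - omega)*u(i,j) + omega*up;
delVec = sum(sum(abs(up - u(i,j))));

maxDiff = max(abs(uLoop(:) - u(:)));
```

The stride of 2 in both directions is what makes the vector form safe: no updated cell is a neighbor of another updated cell, so the order of the updates does not matter.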

slide31

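Integration Example

An integration of the cosine function between 0 and π/2

  • Integration scheme is mid-point rule for simplicity.
  • Several parallel methods will be demonstrated.

[Figure: cos(x) plotted from x=a to x=b. The range is divided into increments of width h, the mid-point of each increment is marked, and the increments are shared among Worker 1 through Worker 4.]

a = 0; b = pi/2; % range
m = 8; % # of increments
h = (b-a)/m; % increment
p = numlabs; % # of workers
n = m/p; % inc. / worker
ai = a + (i-1)*n*h; % start of the range of worker i
aij = ai + (j-1)*h; % start of increment j of worker i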

slide32

function intOut = Integral(fct, a, b, n)
%function intOut = Integral(fct, a, b, n)
% performs mid-point rule integration of "fct"
% fct -- integrand (cos, sin, etc.)
% a -- starting point of the range of integration
% b -- end point of the range of integration
% n -- number of increments
% Usage example: >> Integral(@cos, 0, pi/2, 500) % 0 to pi/2
h = (b - a)/n; % increment length
intOut = 0.0; % initialize integral
for j=1:n % sum integrals
  aij = a + (j-1)*h; % start of increment j
  intOut = intOut + fct(aij+h/2)*h; % function is evaluated at mid-interval
end

Integration Example — the Kernel

Vector form of the function:

function intOut = Integral(fct, a, b, n)
h = (b - a)/n;
aij = a + (0:n-1)*h;
intOut = sum(fct(aij+h/2))*h;
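As a quick check, the loop and vector forms of the mid-point rule can be compared against the exact value of the integral of cos(x) from 0 to π/2, which is 1. This is a minimal sketch written as a plain script rather than as the Integral function; n = 500 follows the usage example in the function header, and the names intLoop and intVec are illustrative.

```matlab
fct = @cos; a = 0; b = pi/2; n = 500;
h = (b - a)/n;                       % increment length

% Loop form
intLoop = 0.0;
for j = 1:n
    aij = a + (j-1)*h;               % start of increment j
    intLoop = intLoop + fct(aij + h/2)*h;
end

% Vector form
aij = a + (0:n-1)*h;                 % starts of all increments at once
intVec = sum(fct(aij + h/2))*h;

fprintf('loop: %.8f  vector: %.8f  exact: 1\n', intLoop, intVec);
```

The two results can differ in the last few bits because the additions happen in a different order, so a tolerance is the right way to compare them.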

slide33

% serial integration

tic

m = 10000;

a = 0;

b = pi*0.5;

intSerial = Integral(@cos, a, b, m);

toc

Integration Example — Serial Integration

slide34
Integration Example Benchmarks
  • Timings (seconds) obtained on a quad-core Xeon X5570
  • Computation linearly proportional to # of increments.
  • FORTRAN and C timings are an order of magnitude faster.
slide35

Compiler

A MATLAB compiler, mcc, is available.

  • It compiles m-files into C code, object libraries, or stand-alone executables.
  • A stand-alone executable generated with mcc can run on compatible platforms without an installed MATLAB or a MATLAB license.
  • Many MATLAB general and toolbox licenses are available. Infrequently, MATLAB access may be denied if all licenses are checked out. Running a stand-alone executable requires no license and no waiting.
Compiler (Cont’d)
  • Some compiled codes may run more efficiently than m-files because they are not run in interpretive mode.
  • A stand-alone enables you to share it without revealing the source.

www.bu.edu/tech/research/training/tutorials/matlab/vector/miscs/compiler/

Compiler (Cont’d)

How to build a standalone executable

>> mcc -o ssor2Dijc -m ssor2Dij

How to run ssor2Dijc on Katana

Katana% run_ssor2Dijc.sh /usr/local/apps/matlab_2009b 256 256

Here /usr/local/apps/matlab_2009b is the MATLAB root, and 256 256 are the input arguments.

Details:

  • The m-file is ssor2Dij.m
  • Input arguments to the code are processed as strings by mcc. Convert with str2num if need be. ssor2Dij.m requires 2 inputs: m, n

if isdeployed, m=str2num(m); end

  • Output cannot be returned; either save to file or display on screen.
  • The executable is ssor2Dijc
  • run_ssor2Dijc.sh is the run script generated by mcc.
  • None of the SOR codes benefits, in runtime, from mcc.
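The string-to-number conversion can be tried interactively. The following is a minimal sketch, not the author's code: it tests ischar instead of isdeployed so that the same lines work in both interactive and compiled runs, and it uses str2double instead of str2num. Both substitutions are assumptions made for illustration.

```matlab
% Arguments as a compiled executable would receive them from the shell
m = '256'; n = '256';

% Convert only if the argument arrived as text
if ischar(m), m = str2double(m); end
if ischar(n), n = str2double(n); end

fprintf('grid size: %d x %d\n', m, n);
```

str2double returns NaN for text that is not a number, which gives a simple way to detect bad command-line input.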
slide38

MATLAB Programming Tools

  • profile - profiler to identify “hot spots” for performance enhancement.
  • mlint - for inconsistencies and suspicious constructs in M-files.
  • debug - MATLAB debugger.
  • guide - Graphical User Interface design tool.
slide39

MATLAB profiler

To use the profile viewer, do NOT start MATLAB with the -nojvm option

>> profile on -detail 'builtin' -timer 'real'

>> % run code to be

>> % profiled here

>> %

>> %

>> profile viewer % view profiling data

>> profile off % turn off profiler

Profiling example.

>> profile on

>> ssor2Dij % profiling the SOR Laplace solver

>> profile viewer

>> profile off

profile on turns on the profiler. -detail 'builtin' includes MATLAB built-in functions in the profile; -timer 'real' reports wallclock time.

slide40

How to Save Profiling Data

Two ways to save profiling data:

1. Save into a directory of HTML files

Viewing is static, i.e., the profiling data displayed correspond to a prescribed set of options. View with a browser.

2. Save as a MAT file

Viewing is dynamic; you can change the options. Must be viewed in the MATLAB environment.

slide41

Profiling – save as HTML files

Viewing is static, i.e., the profiling data displayed

correspond to a prescribed set of options. View with

a browser.

>> profile on

>> plot(magic(20))

>> profile viewer

>> p = profile('info');

>> profsave(p, 'my_profile') % html files in my_profile dir

slide42

Profiling – save as MAT file

Viewing is dynamic; you can change the options. Must be viewed in the MATLAB environment.

>> profile on

>> plot(magic(20))

>> profile viewer

>> p = profile('info');

>> save myprofiledata p

>> clear p

>> load myprofiledata

>> profview(0,p)
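The structure returned by profile('info') can also be inspected without the viewer, which helps when MATLAB runs without a display. The following is a minimal sketch, not from the original slides; the profiled workload and the names order and top are hypothetical, and it relies on the FunctionTable field with its FunctionName and TotalTime entries.

```matlab
profile on
A = magic(200); B = A*A; s = svd(B);   % hypothetical workload to profile
profile off

p = profile('info');

% Sort the recorded functions by total time, largest first
[~, order] = sort([p.FunctionTable.TotalTime], 'descend');
top = p.FunctionTable(order(1:min(5, numel(order))));

for k = 1:numel(top)
    fprintf('%-40s %8.4f s\n', top(k).FunctionName, top(k).TotalTime);
end
```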

slide43

MATLAB “grammar checker”

  • mlint is used to identify coding inconsistencies and make coding performance improvement recommendations.
  • mlint is a standalone utility; it is an option in profile.
  • MATLAB editor provides this feature.
  • Debug mode can also be invoked through editor.
slide44

Running MATLAB in Command Line Mode and Batch

matlab -nodisplay -nosplash -r myfile

Add -nojvm to save memory if Java is not needed

For batch jobs on Katana, use the above command in the

batch script.

Visit http://www.bu.edu/tech/research/computation/linux-cluster/katana-cluster/runningjobs/ for instructions on running batch jobs.

Comment Out Block Of Statements

Often in code debugging, one wants to comment out an

entire block of statements. Three convenient ways to do it:

1. %{

n = 3000;

x = rand(n);

%}

2. Select the statement block with the mouse, then press Ctrl+R; MATLAB prepends “%” to each line of the selected block. To uncomment, select the statement block and press Ctrl+T.

3. if 0

n = 3000;

x = rand(n);

end

Useful SCV Info

Please help us do better in the future by participating in a quick survey: http://scv.bu.edu/survey/fall10tut_survey.html

  • SCV home page (http://scv.bu.edu/)
  • Resource Applications (https://acct.bu.edu/SCF)
  • Help
    • Web-based tutorials (http://scv.bu.edu/)

(MPI, OpenMP, MATLAB, IDL, Graphics tools)

    • HPC consultations by appointment
      • Kadin Tseng (kadin@bu.edu)
      • Doug Sondak (sondak@bu.edu)
    • help@twister.bu.edu, help@cootie.bu.edu