MATLAB. C enter for I ntegrated R esearch C omputing. http:// www.circ.rochester.edu /wiki/ index.php / MatlabWorkshop. Outline. Part I – Interacting with Matlab Running Matlab interactively Accessing the GUI Using the terminal for command entry Using just the terminal
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
MATLAB
Center for Integrated Research Computing
http://www.circ.rochester.edu/wiki/index.php/MatlabWorkshop
Part I – Interacting with Matlab
Part II – Speeding up Matlab Computations
Part III – Mixing Matlab and Fortran/C code
http://www.circ.rochester.edu/wiki/index.php/NX_Cluster
http://www.circ.rochester.edu/wiki/index.php/NX_Cluster
We could also launch a terminal on the NX desktop and submit an interactive job from there.
We could also launch a terminal on the NX desktop and submit an interactive job from there.
qsub -I -X -q interactive -l walltime=1:00:00,nodes=1:ppn=1,vmem=4gb,pvmem=-1module load matlab-R2013a-local
matlab -singlecompthread
We could also launch a terminal on the NX desktop and submit an interactive job from there.
Occasionally the Matlab Desktop will respond slowly to commands which can be VERY frustrating. One work around is to use the terminal window as the "desktop" – while still retaining the ability to plot windows / access help etc...
qsub -I -X -q interactive -l walltime=1:00:00,nodes=1:ppn=1,vmem=4gb,pvmem=-1module load matlab
matlab -singlecompthread
matlab -nodesktop -nosplash
We could also launch a terminal on the NX desktop and submit an interactive job from there.
Occasionally the Matlab Desktop will respond slowly to commands which can be VERY frustrating. One work around is to use the terminal window as the Desktop – while still retaining the ability to plot windows / access help etc...
And finally you may not need to plot anything on the screen – or use any of the GUI features. In that case you can...
qsub -I -X -q interactive -l walltime=1:00:00,nodes=1:ppn=1,vmem=4gb,pvmem=-1module load matlab
matlab -singlecompthread
matlab -nodesktop -nosplash
matlab -nodisplay
If you are running Matlab without a connected display you can still make plots directly to a file in Matlab
You may also find it useful to enter many commands into a script file and then execute the script – so you can do something else while Matlab creates several figures etc... This is also a good way to develop a script for batch jobs.
H=hilb(1000);
Z=fft2(H);
f=figure('Visible','off');imagesc(log(abs(Z))); print('-dpdf','-r300','fig1.pdf')
If you are running a machine that has an X-server – you can bypass NX and just use X11 Forwarding. Though if your connection drops – your Matlab session (and your interactive job) will terminate
Also if you do use NX and you finish using Matlab – please terminate your session instead of just disconnecting. This will cleanup any jobs you have running and free up resources for other users.
ssh -X [email protected]
qsub -I -X -q interactive –l walltime=1:00:00,nodes=1:ppn=1,vmem=4gb,pvmem=-1
Part I – Interacting with Matlab
Part II – Speeding up Matlab Computations
Part III – Mixing Matlab and Fortran/C code
To submit a job in batch mode we need to create a batch script
And a Matlab script containing the commands to run
And we should place both files in a folder on /scratch where we will submit the job from.
#PBS -N Matlab
#PBS -q standard
#PBS -l walltime=1:00:00,nodes=1:ppn=1,vmem=4gb,pvmem=-1
. /usr/local/modules/init/bash
module load matlab
matlab -nodisplay -r "sample_script"
sample_script.pbs
H=hilb(1000);
Z=fft2(H);
imagesc(log(abs(Z)));
print('-dpdf','-r300','fig1-batch.pdf');
sample_script.m
qsubsample_script.pbs
Part I – Interacting with Matlab
Part II – Speeding up Matlab Computations
Part III – Mixing Matlab and Fortran/C code
To use a job array we need to use the “-t” PBS option
And turn our Matlab script into a function that takes arguments. (sample_function.m)
#PBS -t 0-3
#PBS -N Matlab
#PBS -q standard
#PBS -l walltime=1:00:00,nodes=1:ppn=1,vmem=4gb,pvmem=-1
. /usr/local/modules/init/bash
module load matlab
matlab -nodisplay -r "sample_function($PBS_ARRAYID)"
sample_script.pbs
sample_function.m
function sample_function(n)
H=hilb(n);
Z=fft2(H);
imagesc(log(abs(Z)));
print('-dpdf','-r300',sprintf('%s%03d%s','fig1-batch_',n,'.pdf'));
Part I – Interacting with Matlab
Part II – Speeding up Matlab Computations
Part III – Mixing Matlab and Fortran/C code
By default Matlab uses all cores on a given node for operations that can be threaded – regardless of the submission script.
Arrays and matrices • Basic information: ISFINITE, ISINF, ISNAN, MAX, MIN • Operators: +, -, .*, ./, .\, .^, *, ^, \ (MLDIVIDE), / (MRDIVIDE) • Array operations: PROD, SUM • Array manipulation: BSXFUN, SORT Linear algebra • Matrix Analysis: DET, RCOND • Linear Equations: CHOL, INV, LDL, LINSOLVE, LU, QR • Eigenvalues and singular values: EIG, HESS, SCHUR, SVD, QZ Elementary math • Trigonometric: ATAN2, COS, CSC, HYPOT, SEC, SIN, TAN, including variants for radians, degrees, hyperbolics, and inverses. • Exponential: EXP, POW2, SQRT • Complex: ABS • Rounding and remainder: CEIL, FIX, FLOOR, MOD, REM, ROUND • LOG, LOG2, LOG10, LOG1P, EXPM1, SIGN, BITAND, BITOR, BITXOR Special Functions • ERF, ERFC, ERFCINV, ERFCX, ERFINV, GAMMA, GAMMALN Data Analysis • CONV2, FILTER, FFT and IFFT of multiple columns or long vectors, FFTN, IFFTN
To be sure you only use the resources you request, you should either request an entire node and all of the CPU’s...
Or request a single cpu and specify that Matlab should only use a single thread
qsub -I -X -q interactive -l walltime=1:00:00,nodes=1:ppn=8,vmem=16gb,pvmem=-1
. /usr/local/modules/init/bash
module load matlab
matlab
qsub -I -X -q interactive -l walltime=1:00:00,nodes=1:ppn=1,vmem=4gb,pvmem=-1
. /usr/local/modules/init/bash
module load matlab
matlab -singleCompThread
Part I – Interacting with Matlab
Part II – Speeding up Matlab Computations
Part III – Mixing Matlab and Fortran/C code
Matlab can use GPUs to do calculations, provided a GPU is available on the node Matlab is running on.
We can query the connected GPUs from within Matlab using
qsub-I -X -q blugpu-l walltime=1:00:00,nodes=1:ppn=1:gpus=1,vmem=16gb,pvmem=-1
. /usr/local/modules/init/bash
module load matlab
module load cuda
matlab
gpuDeviceCount()
gpuDevice()
Matlab can use GPUs to do calculations, provided a GPU is available on the node Matlab is running on.
We can query the connected GPUs from within Matlab using
And obtain a list of GPU supported functions using
qsub -I -X -q blugpu-l walltime=1:00:00,nodes=1:ppn=1:gpus=1,vmem=16gb,pvmem=-1
. /usr/local/modules/init/bash
module load matlab
module load cuda
matlab
gpuDeviceCount()
gpuDevice()
methods('gpuArray')
So there is a 2D FFT – but no Hilbert function...
We could do the log and abs functions on the GPU as well.
H=hilb(1000);
H_=gpuArray(H);
Z_=fft2(H_);
Z=gather(Z_);
imagesc(log(abs(Z)));
Distribute data to GPU
FFT performed on GPU
Gather data from GPU onto CPU
H=hilb(1000);
H_=gpuArray(H);
Z_=fft2(H_);
imagesc(gather(log(abs(Z_)));
For our example, doing the FFT on the GPU is 7x faster. (4x if you include moving the data to the GPU and back)
>> H=hilb(5000);
>> tic; A=gather(gpuArray(H)); toc
Elapsed time is 0.161166 seconds.
>> tic; A=gather(fft2(gpuArray(H))); toc
Elapsed time is 0.348159 seconds.
>> tic; A=fft2(H); toc
Elapsed time is 1.210464 seconds.
Matlab has no built in hilb() function that can run on the GPU – but we can write our own function(kernel) in cuda – and save it to hilbert.cu
And compile it with nvcc to generate a Parallel Thread eXecution file
__global__ void HilbertKernel( double * const out, size_tconstnumRows, size_tconstnumCols)
{
constintrowIdx = blockIdx.x * blockDim.x + threadIdx.x;
constintcolIdx = blockIdx.y * blockDim.y + threadIdx.y;
if ( rowIdx >= numRows ) return;
if ( colIdx >= numCols ) return;
size_tlinearIdx = rowIdx + colIdx*numRows;
out[linearIdx] = 1.0 / (double)(1+rowIdx+colIdx) ;
}
nvcc -ptxhilbert.cu
We have to initialize the kernel and specify the grid size before executing the kernel.
The default for matlab kernel’s is 1 thread per block, but we could create fewer blocks that were each 10 x 10 threads.
In any event, our speedup is a factor of 50 compared to 1 CPU.
H_=gpuArray.zeros(1000);
hilbert_kernel=parallel.gpu.CUDAKernel('hilbert.ptx','hilbert.cu');
hilbert_kernel.GridSize=size(H_);
H_=feval(hilbert_kernel, H_, 1000,1000);
Z_=fft2(H_);
imagesc(gather(log(abs(Z_))));
hilbert_kernel.ThreadBlockSize=[10,10,1];
hilbert_kernel.GridSize=[100,100];
Part I – Interacting with Matlab
Part II – Speeding up Matlab Computations
Part III – Mixing Matlab and Fortran/C code
As an alternative you can also use the Parallel Computing Toolbox. This supports parallelism via MPI
You can enable a pool of matlab workers using matlabpool
You should create a pool that is the same size as the number of processors you requested in your job submission. Matlab also sells licenses for using a Distributed Computing Server which allows for matlabpools that use more than just the local node.
qsub -I -X -q interactive -l walltime=1:00:00,nodes=1:ppn=8,vmem=16gb,pvmem=-1
. /usr/local/modules/init/bash
module load matlab
matlab -singleCompThread
matlabpool(8)
matlabpool(4)
parforn=1:100
H=hilb(n);
Z=fft2(H);
f=figure('Visible','off');imagesc(log(abs(Z)));
print('-dpdf','-r300',sprintf('%s%03d%s','fig1-batch_',n,'.pdf'));
end
matlabpool close
matlabpool(4)
spmd
for n=drange(1:100)
H=hilb(n);
Z=fft2(H);
f=figure('Visible','off');imagesc(log(abs(Z)));
end
end
matlabpool close
matlabpool(4)
spmd
for n=labindex:numlabs:100
H=hilb(n);
Z=fft2(H);
f=figure('Visible','off');imagesc(log(abs(Z)));
end
end
matlabpool close
pmode start 4
pmode lab2client H 3 H3
H3
pmode close
n=labindex;
H=hilb(n);
Z=fft2(H);
f=figure('Visible','off'); imagesc(log(abs(Z)));
print('-dpdf','-r300',sprintf('%s%03d%s','fig1-batch_',n,'.pdf'));
Example using distributed arrays
matlabpool(8)
H=hilb(1000);
H_=distributed(H);
Z_=fft(fft(H_,[],1),[],2);
Z=gather(Z_);
imagesc(log(abs(Z)));
matlabpool close
Example using gpuArray
H=hilb(1000);
H_=gpuArray(H);
Z_=fft2(H_);
Z=gather(Z_);
imagesc(log(abs(Z)));
What about building hilbert matrix in parallel?
matlabpool(4)
spmd
codist=codistributor1d(1,[250,250,250,250],[1000,1000]);
[i_lo, i_hi]=codist.globalIndices(1);
H_local=zeros(250,1000);
for i=i_lo:i_hi
for j=1:1000
H_local(i-i_lo+1,j)=1/(i+j-1);
end
end
H_ = codistributed.build(H_local, codist);
end
Z_=fft(fft(H_,[],1),[],2);
Z=gather(Z_);
imagesc(log(abs(Z)));
matlabpool close
Define partition
Get local indices in x-direction
Initialize local array with Hilbert values.
Allocate space for local part
Assemble codistributed array
Now it's distributed like before!
Part I – Interacting with Matlab
Part II – Speeding up Matlab Computations
Part III – Mixing Matlab and Fortran/C code
mdce_init
qsubmdce_job.pbs
Here is the job submission script
This script loads the matlab module, starts the cluster with pbs_mdce_start, and runs the matlab script "mdce_script.m"
#!/bin/bash
#PBS -N Matlab_mdce
#PBS -j oe
#PBS -l nodes=2:ppn=8,pvmem=2000mb
#PBS -l walltime=1:00:00
#PBS -l epilogue=mdce_cleanup
#PBS -q standard
#PBS -o matlab.log
. /usr/local/modules/init/bash
module load matlab-R2013a-local
cd $PBS_O_WORKDIR
pbs_mdce_start
matlab -nodisplay -r "mdce_script"
mdce_job.pbs
Note that other versions of matlab could take hours to start the matlab cluster!!!
This epilogue script is important to ensure that the cluster is taken down when your job terminates
And here is the sample matlab script
The mdce_profile() function returns a profile that can be used to connect to the mdce cluster for your job. You can then use matlabpoolor pmode, or spmd etc... to startup parallel computations across the matlab cluster.
profile=mdce_profile()
matlabpool('open', profile)
parfor n=1:matlabpool('size')
H=hilb(n);
Z=fft2(H);
imagesc(log(abs(Z)));
print('-dpdf','-r300',sprintf('%s%03d%s','fig1-batch',n,'.pdf'));
end
matlabpool('close')
mdce_script.m
For interactive mode, you can use the qMatlab_mdce script. This script will inherit your matlab path from your environment, so be sure to load the matlab-R2013a-local module to speed up the initilization of the cluster.
This will create a matlab cluster which in this case consists of 4 nodes each with 8 workers and 16 GB of memory per. To use the matab cluster, load the profile using the mdce_profile() function and then open the pool of workers with matlabpool– or pmode etc...
mkdir/scratch/jcarrol5/matlab_mdce
cd /scratch/jcarrol5/matlab_mdce
module load matlab-R2013a-local
qMatlab_mdce 4 8 16
profile=mdce_profile()
matlabpool('open', profile)
Part I – Interacting with Matlab
Part II – Speeding up Matlab Computations
Part III – Mixing Matlab and Fortran/C code
Since pMatlab works by launching other Matlab instances – we need them to startup with pMatlab functionality. To do so we need to add a few lines to our startup.m file in our matlab path.
addpath('/usr/local/pMatlab/MatlabMPI/src');
addpath('/usr/local/pMatlab/src');
rehash;
pMatlabGlobalsInit;
To submit a job in batch mode we need to create a batch script
And a Matlab script to launch the pMatlab script
#PBS -N Matlab
#PBS -q standard
#PBS -l walltime=1:00:00,nodes=2:ppn=8,vmem=32gb,pvmem=-1
. /usr/local/modules/init/bash
module load matlab
matlab -nodisplay -r "pmatlab_launcher"
sample_script.pbs
[sreturn, nProcs]=system('cat $PBS_NODEFILE | wc -l');
nProcs=str2num(nProcs);
[sreturn, machines]=system('cat $PBS_NODEFILE | uniq');
machines=regexp(machines, '\n', 'split');
machines=machines(1:size(machines,2)-1);
eval(pRUN('pmatlab_script',nProcs,machines));
pmatlab_launcher.m
And finally we have our pmatlab script.
Do y-fftand do x-fft. Z_ has different map
Allocate and populate local portion of array with Hilbert matrix values
Copy local values into distributed array
Collect resulting matrix onto 'leader'
Xmap=map([Np 1],{},0:Np-1);
H_=zeros(1000,1000,Xmap);
[I1,I2]=global_block_range(H_);
H_local=zeros(I1(2)-I1(1)+1,I2(2)-I2(1)+1);
for i=I1(1):I1(2)
for j=I2(1):I2(2)
H_local(i-I1(1)+1,j-I2(1)+1)=1/(i+j-1);
end
end
H_=put_local(H_,H_local);
Z_=fft(fft(H_,[],2),[],1);
Z=agg(Z_);
if (pMATLAB.my_rank == pMATLAB.leader)
f=figure('Visible','off');
imagesc(log(abs(Z)));
print('-dpdf','-r300', 'fig1.pdf');
end
Indices for local portion of array
map for distributing array
Distributed matrix constructor
Plot result from 'leader' matlab process
X = put_local(X, fft(local(X),[],2));
Z = transpose_grid(X);
Z = put_local(Z, fft(local(Z),[],1));
pmatlab_script.m
PBS is unaware of matlab sessions launched from 'pRUN' and therefore cannot properly clean up if something goes wrong (job runs out of walltime etc...) To avoid leaving orphaned Matlab processes on other machines, modify your PBS script
to run this epilogue script which must have user-execute permissions
#PBS -l epilogue=epilogue_script.sh
#!/bin/bash
cd $PBS_O_WORKDIR/MatMPIa
echo "running prologue"
pwd;
for i in `lspid.*`; do
machine=`echo $i | awk -F '.' '{print $2}'`;
pid=`echo $i | awk -F '.' '{print $3}'`\;
ssh $machine "(kill -9 $pid)" && rm -rf $i;
done
epilogue_script.sh
Part I – Interacting with Matlab
Part II – Speeding up Matlab Computations
Part III – Mixing Matlab and Fortran/C code
There is a configuration file for mex that you can place in your ~/.matlab/R2012b/ folder – or whatever version of matlab you are using. The file can be downloaded from the CIRC wiki
http://www.circ.rochester.edu/wiki/index.php/Mexopts.sh
C, C++, or Fortran routines can be called from within Matlab.
#include "fintrf.h"
subroutine mexfunction(nlhs, plhs, nrhs, prhs)
mwPointer :: plhs(*), prhs(*)
integer :: nlhs, nrhs
mwPointer :: mxGetPr
mwPointer :: mxCreateDoubleMatrix
real(8) :: mxGetScalar
mwPointer :: pr_out
integer :: n
n = nint(mxGetScalar(prhs(1)))
plhs(1) = mxCreateDoubleMatrix(n,n, 0)
pr_out = mxGetPr(plhs(1))
call compute(%VAL(pr_out),n)
end subroutine mexfunction
subroutine compute(h, n)
integer :: n
real(8) :: h(n,n)
integer :: i,j
do i=1,n
do j=1,n
h(i,j)=1d0/(i+j-1d0)
end do
end do
end subroutine compute
mex hilbert.f90
>> H=hilbert(10)
Part I – Interacting with Matlab
Part II – Speeding up Matlab Computations
Part III – Mixing Matlab and Fortran/C code
First we create a log_abs_fft_hilb.m function
And then we run
This will produce a mex file that we can test.
We could have specified the type of 'n' in our matlab function
function result = log_abs_fft_hilb(n) result=log(abs(fft2(hilb(n))));
>> codegenlog_abs_fft_hilb.m –args {uint32(0)}
>> A=log_abs_fft_hilb_mex(uint32(16));>> B=log_abs_fft_hilb(16);
>> max(max(abs(A-B)))
ans = 8.8818e-16
function result = log_abs_fft_hilb(n)
assert(isa(n,'uint32')); result=log(abs(fft2(hilb(n))));
Now we can also export a static library that we can link to:
This will create a subdirectory codegen/lib/log_abs_fft_hilb that will have the source files '.c and .h' as well as a compiled object files '.o' and a library 'log_abs_fft_hilb.a'
The source files are portable to any platform with a 'C' compiler (ieBlueStreak). We can rebuild the library on BlueStreak by running
>> codegenlog_abs_fft_hilb.m -configcoder.config('lib') -args {'uint32(0)'}
mpixlc –c *.c
arrcslog_abs_fft_hilb.a *.o
To use the function, we still need to write a main subroutine that links to it. This requires working with matlab's variable types (which include dynamically resizable arrays)
Output result in column major order
Free up memory associated with return array
Argument to Matlab function
Return value of Matlab function
Initialize Matlab array to have rank 2
Call matlab function
Matlab type definitions
#include "stdio.h"
#include "rtwtypes.h"
#include "log_abs_fft_hilb_types.h"
void main() {
uint32_T n=64;
emxArray_real_T*result;
int32_T i,j;
emxInit_real_T(&result, 2);
log_abs_fft_hilb(n, result);
for(i=0;i<result->size[0];i++) {
for(j=0;j<result->size[1];j++) {
printf("%f",result->data[i+result->size[0]*j]);
}
printf("\n");
}
emxFree_real_T(&result);
}
Exported code was 2x slower.
And here is another example of calling 2D fft's on real data
void main() {
int32_T q0;
int32_T i;
int32_T n=8;
emxArray_creal_T *result;
emxArray_real_T *input;
emxInit_creal_T(&result, 2);
emxInit_real_T(&input, 2);
q0 = input->size[0] * input->size[1];
input->size[0]=n;
input->size[1]=n;
emxEnsureCapacity((emxArray__common *)input,
q0, (int32_T)sizeof(real_T));
for(j=0;j<input->size[1];j++ {
for(i=0;i<input->size[0];i++) {
input->data[i+input->size[0]*j]=1.0 / (real_T)(i+j+1);
}
}
my_fft(input, result);
for(i=0;i<result->size[0];i++) {
for(j=0;j<result->size[1];j++) {
printf("[% 10.4f,% 10.4f] ",
result->data[i+result->size[0]*j].re,
result->data[i+result->size[0]*j].im);
}
printf("\n");
}
emxFree_creal_T(&result);
emxFree_real_T(&input);
}
Exported FFT's only work on vectors of length 2N
Error checking is disabled in exported C code
Mex code will have the same functionality as exported C code, but will also have error checking. It will warn about doing FFT's on arbitrary length vectors, etc...
Always test your mex code!
#include <stdio.h>
#include <math.h>
#include <complex.h>
#include <fftw3.h>
void main() {
int n=4096;
inti,j;
double complex temp[n][n], input[n][n];
double result[n][n];
fftw_plan p;
p=fftw_plan_dft_2d(n, n, &input[0][0], &temp[0][0],
FFTW_FORWARD, FFTW_ESTIMATE);
for (i=0;i<n;i++){
for(j=0;j<n;j++) {
input[i][j]=(double complex)(1.0/(double)(i+j+1));
}
}
fftw_execute(p);
for (i=0;i<n;i++){
for(j=0;j<n;j++) {
result[i][j]=log(cabs(temp[i][j]));
}
}
for (i=0;i<n;i++){
for(j=0;j<n;j++) {
printf("%f ",result[i][j]);
}
}
fftw_destroy_plan(p);
}
Or you can write your own 'C' code that uses open source mathematical libraries (iefftw).