Workshop: Using the VIC3 Cluster for Statistical Analyses
A support perspective
G.J. Bex
Overview • Cluster VIC3: hardware & software • Statistics research scenario • Worker framework • MapReduce with Worker • Q&A
Bird's eye view of VIC3
[cluster diagram: compute nodes r1i0n0 … r1i3n15 and r2i0n0 … r2i3n15; login nodes login1 and login2; service nodes svcs1 and svcs2; netapp storage serving /bin and user home directories such as ~vsc30034]
VIC3 nodes • Compute nodes • 112 nodes with 2 quad-core 'Harpertown' CPUs, 8 GB RAM • 80 nodes with 2 quad-core 'Nehalem' CPUs, 24 GB RAM • 6 nodes with 2 quad-core 'Nehalem' CPUs, 72 GB RAM and a local hard disk • Storage • 20 TB of disk space shared between home directories and scratch space, accessed via NFS • 4 nodes with disks for a parallel file system (needed for MPI I/O jobs) • Service nodes, including 2 login nodes • In total: 1584 cores, for 16.6 TFlop (theoretical peak)
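The core count quoted above follows from a bit of arithmetic: each node has 2 quad-core CPUs, i.e. 8 cores. A minimal sketch checking the total:

```shell
# Total VIC3 core count: nodes x 2 sockets x 4 cores per socket.
harpertown=$((112 * 2 * 4))   # 8 GB 'Harpertown' nodes
nehalem_24=$((80 * 2 * 4))    # 24 GB 'Nehalem' nodes
nehalem_72=$((6 * 2 * 4))     # 72 GB 'Nehalem' nodes
total=$((harpertown + nehalem_24 + nehalem_72))
echo "$total cores"           # prints "1584 cores"
```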
What can you run? • All open-source Linux software • All Linux software the K.U.Leuven has a license for that covers the cluster, provided you are a K.U.Leuven staff member • All Linux software you have a license for that covers the cluster • No Windows software • R, SAS and MATLAB are OK for K.U.Leuven & UHasselt users
Running example: SAS code • Your SAS program, e.g., 'clmk.sas' • is usually run interactively • depends on parameters, e.g., • the type of distribution • alpha, beta • has to be run for several types and for several values of alpha and beta
Running example: batch mode • 1st step: convert it for batch mode • capture the command-line variables:
…
%LET type = "%scan(&sysparm, 1, %str(:))";
%LET alpha = %scan(&sysparm, 2, %str(:));
%LET beta = %scan(&sysparm, 3, %str(:));
…
• run it from the command line:
$ sas -batch -noterminal -sysparm discr:1.3:15.0 clmk.sas
I've got a job to do: PBS files
clmk.pbs:
#!/bin/bash -l
module load SAS/9.2
cd $PBS_O_WORKDIR
sas -batch -noterminal \
    -sysparm discr:1.3:15.0 clmk.sas
Submit it from a login node:
$ msub clmk.pbs
The queue system/scheduler (Torque/Moab) then runs the job on the compute nodes.
No more modifying!
Rather than hard-coding the parameters in the PBS file:
#!/bin/bash -l
module load SAS/9.2
cd $PBS_O_WORKDIR
sas -batch -noterminal \
    -sysparm discr:1.3:15.0 clmk.sas
$ msub clmk.pbs
use shell variables and set them at submission time with -v:
#!/bin/bash -l
module load SAS/9.2
cd $PBS_O_WORKDIR
sas -batch -noterminal \
    -sysparm $type:$alpha:$beta clmk.sas
$ msub clmk.pbs -v type=discr,alpha=1.3,beta=15.0
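With -v, one job per parameter set becomes a simple shell loop. A minimal sketch: the parameter values below are invented for illustration, and echo stands in for the actual msub call so the loop can be inspected without a scheduler.

```shell
# Sketch: one msub submission per parameter combination, passed
# to the PBS script via -v. Values are invented; echo stands in
# for msub so the output can be inspected without a scheduler.
jobs=$(
  for type in discr cont; do
    for alpha in 1.3 1.5; do
      for beta in 15.0 20.0; do
        echo "msub clmk.pbs -v type=$type,alpha=$alpha,beta=$beta"
      done
    done
  done
)
echo "$jobs"
```

Even this small 2 × 2 × 2 grid already yields 8 jobs; a realistic parameter sweep quickly grows into hundreds of submissions, which is exactly the problem the Worker framework addresses.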
Going parallel… or nuts? • Parameter sets… • are independent, so computations can be done in parallel! • but all combinations of type, alpha and beta give a large number of jobs • solution: the Worker framework
Conceptually
#!/bin/bash -l
module load SAS/9.2
cd $PBS_O_WORKDIR
sas -batch -noterminal \
    -sysparm $type:$alpha:$beta clmk.sas
Concrete
clmk.csv contains one row per parameter set; clmk.pbs is the parameterized script:
#!/bin/bash -l
module load SAS/9.2
cd $PBS_O_WORKDIR
sas -batch -noterminal \
    -sysparm $type:$alpha:$beta clmk.sas
$ module load worker/1.0
$ wsub -data clmk.csv -batch clmk.pbs -l nodes=2:ppn=8
The N rows will be computed in parallel by 2 × 8 - 1 = 15 cores.
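A sketch of what such a data file might look like, assuming Worker's CSV format with a header row naming the shell variables used in clmk.pbs; the parameter values themselves are invented:

```shell
# Generate a small clmk.csv for Worker: a header naming the shell
# variables, then one row per parameter set (values are invented).
cat > clmk.csv <<EOF
type,alpha,beta
discr,1.3,15.0
discr,1.5,15.0
cont,1.3,20.0
EOF
# Worker substitutes each row into $type, $alpha, $beta in clmk.pbs.
head -1 clmk.csv
```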
Caveat 1: time is of the essence… • How long does your job need to run? (= walltime) • estimate: time to compute N rows / number of requested cores • walltime guidelines: • more than 5 minutes • less than 2 days • hence, if the walltime would exceed 2 days, split the data and submit multiple jobs • explicitly request sufficient walltime:
$ wsub -data clmk.csv -batch clmk.pbs \
    -l nodes=2:ppn=8,walltime=36:00:00
These are no hard limits, but guidelines to reduce queue time.
Caveat 2: slave labour • Given P cores, how to choose P? • the cores function as • 1 master • P - 1 slaves • each compute node has 8 cores, so choose P such that P mod 8 = 0 • N >> P: better load balancing and efficiency • a larger P means • a shorter walltime • (potentially) a longer time in the queue • turn-around = queue time + walltime • the shortest turn-around is hard to predict
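The estimate above can be turned into quick shell arithmetic. All the numbers below (rows, seconds per row, cores) are invented for illustration:

```shell
# Rough walltime estimate for a Worker job (numbers are invented):
# N rows, t seconds per row, P cores of which 1 is the master.
N=1000          # rows in clmk.csv
t=120           # seconds to compute one row
P=16            # requested cores (2 nodes x 8 cores)
slaves=$((P - 1))
# ceiling division: rows each slave processes at most
rows_per_slave=$(( (N + slaves - 1) / slaves ))
walltime_s=$((rows_per_slave * t))
echo "$walltime_s seconds"   # about 2.2 hours, well under 2 days
```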
Caveat 3: independence • SAS locks its log and output files! • Make sure each computation writes to its own files:
#!/bin/bash -l
module load SAS/9.2
cd $PBS_O_WORKDIR
log_name="clmk-$type-$alpha-$beta.log"
print_name="clmk-$type-$alpha-$beta.lst"
sas -batch -noterminal \
    -log $log_name \
    -print $print_name \
    -sysparm $type:$alpha:$beta clmk.sas
Conceptually: MapReduce
[diagram: the map step splits data.txt into chunks data.txt.1, data.txt.2, …, data.txt.7; each chunk is computed into a partial result result.txt.1, result.txt.2, …, result.txt.7; the reduce step merges these into result.txt]
Concrete: -prolog & -epilog
prolog.sh implements the map step (splitting data.txt into data.txt.1 … data.txt.7), batch.sh computes each chunk into result.txt.1 … result.txt.7, and epilog.sh implements the reduce step (merging them into result.txt):
$ wsub -prolog prolog.sh -batch batch.sh \
    -epilog epilog.sh -l nodes=3:ppn=8
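A sketch of what prolog.sh and epilog.sh might contain. The chunk count and file names follow the diagram, but the split/merge commands are one possible implementation (assuming GNU split), not the workshop's actual scripts, and a simple cp stands in for the real batch.sh computation:

```shell
# prolog.sh (map step, illustrative): split data.txt into 7 chunks
# named data.txt.1 ... data.txt.7, one per parallel computation.
seq 1 21 > data.txt                      # dummy input for this sketch
split -l 3 -d -a 1 data.txt data.txt.    # makes data.txt.0 ... data.txt.6
for i in 6 5 4 3 2 1 0; do               # renumber to 1-based, high to low
  mv "data.txt.$i" "data.txt.$((i + 1))" # so existing files aren't clobbered
done

# (here batch.sh would compute result.txt.$i from data.txt.$i;
# a plain copy stands in for that computation in this sketch)
for i in 1 2 3 4 5 6 7; do
  cp "data.txt.$i" "result.txt.$i"
done

# epilog.sh (reduce step, illustrative): merge the partial results.
cat result.txt.1 result.txt.2 result.txt.3 result.txt.4 \
    result.txt.5 result.txt.6 result.txt.7 > result.txt
```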
Where to find help? • http://www.vscentrum.be/vsc-help-center • hpcinfo@icts.kuleuven.be • http://status.kuleuven.be/hpc • UHasselt staff: geertjan.bex@uhasselt.be