
R and HPC


Presentation Transcript


  1. R and HPC: Scaling up and out with R on shared systems. Gareth Williams, IMT Advanced Scientific Computing

  2. What is CSIRO Advanced Scientific Computing? • CSIRO IMT (Information Management and Technology) • Focus on High Performance Computing • Focus on eResearch • Strategy of Economies of Utilization – shared infrastructure • Support • Software • Hardware • Interfacing with partners CSIRO.

  3. ASC information • http://intra.hpsc.csiro.au • Externally visible copy at http://www.hpsc.csiro.au • User Guides • http://intra.hpsc.csiro.au/userguides • Software information • http://intra.hpsc.csiro.au/software/ • http://nf.nci.org.au/facilities/software/ • hpchelp@csiro.au

  4. ASC resources • Storage – tape-backed petabyte store • Capability and capacity clusters • Cherax – 128 core NUMA ia64 • Burnet – commodity cluster • New GPU cluster • Partners • NCI http://nf.nci.org.au • iVEC http://www.ivec.org • TPAC http://www.tpac.org.au • ARCS http://www.arcs.org.au • QFAB http://qfab.org (emerging) • MASSIVE (emerging)

  5. General cluster schematic • Nodes and connecting/shared infrastructure: users connect through a login node; the management node, batch server, network switch, compute nodes and shared storage sit behind it (diagram)

  6. Accessing the cluster • Register with ASC • Login with ssh/PuTTY – nexus ident (and passwd) > ssh -X <ident>@<headnode> • Or start PuTTY and connect to the headnode • Gets you into the ‘login/head node’ cherax/burnet/gpu01/xe • Simple commands > cat /etc/motd > uname -a > ls -al > ps -fHu $USER > man

  7. Making R and other software available • Support staff compile R versions • Commercial compilers • Tuned BLAS/LAPACK • Installed in a shared area • Extra packages of software are available as ‘environment modules’ • What is loaded now?: > module list • What is available?: > module avail > module avail R • The search path for commands: > echo $PATH • Load a module: > module load R • Where a command is: > which R

  8. Interactive vs non-interactive R • Choose a version, load the module and go… • Type in instructions at the R interpreter prompt • But you only get to do one thing at a time. • Scripting R tasks • Save your R instructions in a text file • Make sure you don’t need interaction • Read/write/plot to files • There are a few options for how to run R > Rscript myscript.R [ARGS] > R CMD BATCH --no-restore myscript.R [ARGS] > R --slave --no-restore < myscript.R • One file (or set of arguments) per task • Watch out for overwriting results! • This defines separate tasks but doesn’t get them distributed and managed…
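A minimal non-interactive script, written out with a here-document so the run options above have something concrete to point at. The filename and the script contents are illustrative, not from the slides: the key point is that it takes its input from command-line arguments and writes results to a file instead of expecting interaction.

```shell
# Create a small non-interactive R script (illustrative contents).
# It reads its arguments, computes, and writes results to a file.
cat > myscript.R <<'EOF'
args <- commandArgs(trailingOnly = TRUE)
n <- if (length(args) >= 1) as.integer(args[1]) else 10
write.csv(data.frame(x = 1:n, sq = (1:n)^2), "results.csv", row.names = FALSE)
EOF

# Any of the invocations from the slide would then run it, e.g.:
#   Rscript myscript.R 100
grep -c "commandArgs" myscript.R   # sanity check: the script was written
```

Because everything flows through `commandArgs` and output files, the same script can be launched many times with different arguments, which is exactly what the batch and ensemble slides below rely on.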

  9. Batch Queueing system • On shared ASC systems most work must be done as batch jobs • Queueing system • Distribute work over the system • Avoid contention – jobs need dedicated resources • Provide equitable access – scheduling policy • Torque and Moab

  10. Using the Queueing system • Submit jobs, specifying the resources required (using qsub) • See > man pbs_resources • walltime • nodes (and processes per node) • vmem • gpus • software • What happens? • The job script gets saved • The scheduler assesses priority and blocks out resources • The script is copied to the first allocated node and run for you in the batch environment • The job is terminated if the resources specified are exceeded • Screen output is copied back at the end of the job • You can query status in the meantime • More info: read the user guide

  11. Scheduling • Jobs are only started when resources can be dedicated. • Only ask for what you need! • Jobs that request too much memory will prevent other jobs from running • And take longer than necessary to start • The PBS stdout file summarises resource usage • Long-running jobs are unfriendly • Save state for restarting • Chain jobs together • The cluster may not stay up that long… • Submitting lots of jobs is OK • Just not extremely short ones, please.

  12. Example R job • First look at the man pages for qsub, qdel, qstat • Then write commands in a script (myjob.q) #!/bin/bash #PBS -l nodes=1:ppn=1,vmem=1GB,walltime=1:00:00 cd $PBS_O_WORKDIR Rscript R-benchmark-25.R • And submit the job > qsub myjob.q > qstat > module load moab > showq • When complete, view the output in myjob.o**** and myjob.e**** • But this job has no input and each run will be more-or-less identical (equivalent)
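The job script above, written out as a file so it can be submitted verbatim. The resource values are the slide's own; the benchmark script `R-benchmark-25.R` is assumed to sit in the directory you submit from.

```shell
# Write the example job script to myjob.q
# (contents as on the slide, with plain ASCII hyphens in the #PBS directive).
cat > myjob.q <<'EOF'
#!/bin/bash
#PBS -l nodes=1:ppn=1,vmem=1GB,walltime=1:00:00
cd $PBS_O_WORKDIR
Rscript R-benchmark-25.R
EOF

# On the cluster you would then submit it:
#   qsub myjob.q
head -n 2 myjob.q   # show the shebang and the resource request
```

Note that `cd $PBS_O_WORKDIR` is what puts the job back in the directory where `qsub` was run; without it the script starts in your home directory and would not find the benchmark file.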

  13. Interactive batch job • When you need to have a resource dedicated for interactive use, e.g. • Intensive development • Debugging • Run qsub with the -I option (capital i) and an appropriate resource specification, and wait for the prompt > qsub -I -l walltime=1:00:00,vmem=1GB • The scheduler still won’t start the job until resources are available – the accounting will record the resource dedicated • Log out (exit) as soon as you’re done to allow others to use the resources • Your session will be killed if you exceed the limits

  14. Optimized R • In general, for optimising you need to benchmark representative test cases – but an established benchmark is a good start • http://r.research.att.com/benchmarks R-benchmark-25.R • Other benchmarks • http://www.revolution-computing.com/products/benchmarks.php • Nathan Watson-Haigh • Perform the cross-product of the transpose of matrix m • cp1 <- crossprod(t(m)) • cp2 <- tcrossprod(m) • Run on a dedicated system • Compare systems • Compare versions • Compare build options • Parallel scaling

  15. Transpose cross product

  16. R-benchmark-25

  17. Extras

  18. Optimized R summary • Optimized BLAS/LAPACK can make a very big difference • Shared-memory parallel BLAS can also be effective (Intel MKL) • ATLAS would also be good • The compiler may not be so critical for R • The Windows binary distribution does not have a good BLAS • Performance differences are not uniform across the board • Algorithms or problem size • You should benchmark code that you actually want to run • The ASC group can help! • Pre-requisites – general R performance tips • Pre-allocate memory • Minimize I/O • Fit in memory (don’t swap) • Have dedicated resources

  19. Parallelism • Scaling up vs scaling out • Motivation: faster, or distributed memory (more memory) • Shared-memory parallel BLAS/LAPACK • Rmpi • R package to use MPI (Message Passing Interface) • Must explicitly code sends and receives of messages to transfer data • Hard work! > qsub -l nodes=5:ppn=2 Rmpijob.q • Revolution (Enterprise edition) • NetWorkSpaces/sleigh • Or break up your work into independent tasks and aggregate the results

  20. Ensemble of jobs – scaling out • Write one job script for each task • Use a scripting framework of your choice to automate creating the files • Submit jobs in a loop • Write one job script and pass it environment variables • Use the qsub ‘-v’ option • Use the qsub array job ‘-t’ option • Write job scripts on-the-fly • Use Nimrod • Write a template • Iterate or search over a parameter space
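The "write job scripts on-the-fly from a template" approach above can be sketched with sed. The template filename, the `@X@` placeholder and `myscript.R` are assumptions for illustration, not from the slides; the parameter values match the seq example on the next slide.

```shell
# A template job script with a placeholder for the parameter X.
cat > template.q <<'EOF'
#!/bin/bash
#PBS -l nodes=1:ppn=1,vmem=1GB,walltime=1:00:00
cd $PBS_O_WORKDIR
Rscript myscript.R @X@
EOF

# Generate one concrete script per parameter value by filling in
# the placeholder; on the cluster each would then be submitted.
for X in 2.1 2.4 2.7 3.0 3.3; do
  sed "s/@X@/$X/" template.q > "myjob_$X.q"
  # qsub "myjob_$X.q"   # submission commented out: needs a PBS system
done

ls myjob_*.q   # one generated script per parameter value
```

The generated files make each task's exact command visible and repeatable, at the cost of some clutter; passing the value with `qsub -v`, as in the loops on the next slide, avoids the extra files.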

  21. Examples for ensembles • N.B. bash syntax here for the loops – but use what you prefer! • Submit scripts matching *.q > for SCRIPT in *.q; do qsub $SCRIPT; done • Submit myjob.q with X set to 2.1, 2.4, 2.7 .. 3.3 > for X in $(seq 2.1 0.3 3.4); do qsub -v X=$X myjob.q; done • Submit myjob.q with IN set to files matching *.in > for FILE in *.in; do qsub -v IN=$FILE myjob.q; done • Submit myjob.q with LINE set to each line in paramset.in (a while-read loop, unlike for, keeps each whole line together) > while read L; do qsub -v LINE="$L" myjob.q; done < paramset.in • Submit myjob.q as an array job; PBS_ARRAYID will be set to 1..20 > qsub -t 1-20 myjob.q • myjob.q varying the cpus requested/used – to test scaling > for N in 1 2 4 8; do qsub -v OMP_NUM_THREADS=$N -l nodes=1:ppn=$N myjob.q; done

  22. CSIRO IM&T Gareth Williams, Outreach Manager, Advanced Scientific Computing Email: Gareth.Williams@csiro.au hpchelp@csiro.au Web: http://intranet.csiro.au/intranet/imt http://www.hpsc.csiro.au/contact Helpdesk: (03) 9669 8103 Thank you Contact Us Phone: 1300 363 400 or +61 3 9545 2176 Email: Enquiries@csiro.au Web: www.csiro.au
