High Performance Computing Workshop HPC 101


Presentation Transcript


  1. High Performance Computing Workshop HPC 101
  Dr. Charles J Antonelli, LSAIT ARS, June 2014

  2. Credits
• Contributors: Brock Palen (CAEN HPC), Jeremy Hallum (MSIS), Tony Markel (MSIS), Bennet Fauber (CAEN HPC), Mark Montague (LSAIT ARS), Nancy Herlocher (LSAIT ARS)
• LSAIT ARS
• CAEN HPC

  3. Roadmap
• High Performance Computing
• Flux Architecture
• Flux Mechanics
• Flux Batch Operations
• Introduction to Scheduling

  4. High Performance Computing

  5. Cluster HPC
• A computing cluster: a number of computing nodes connected together via special hardware and software that together can solve large problems
• A cluster is much less expensive than a single supercomputer (e.g., a mainframe)
• Using clusters effectively requires support in scientific software applications (e.g., Matlab's Parallel Toolbox or R's Snow library), or custom code

  6. Programming Models
• Two basic parallel programming models (a launch sketch for each follows this slide):
• Message passing: the application consists of several processes running on different nodes and communicating with each other over the network
  • Used when the data are too large to fit on a single node and simple synchronization is adequate
  • "Coarse parallelism"
  • Implemented using MPI (Message Passing Interface) libraries
• Multi-threaded: the application consists of a single process containing several parallel threads that communicate with each other using synchronization primitives
  • Used when the data can fit into a single process and the communication overhead of the message-passing model is intolerable
  • "Fine-grained parallelism" or "shared-memory parallelism"
  • Implemented using OpenMP (Open Multi-Processing) compilers and libraries
• Both models can be combined in the same application
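
In practice the two models differ most visibly in how a job is launched; a minimal sketch (the program names and core counts are placeholders):

  # Message passing: several independent processes, one per core, started by mpirun
  mpirun -np 12 ./prog_mpi

  # Shared memory: a single multi-threaded process; the thread count comes from the environment
  export OMP_NUM_THREADS=12
  ./prog_omp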

  7. Amdahl’s Law
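
For reference (the slide itself shows only a chart): if a fraction P of a program's work can be parallelized and it runs on N cores, Amdahl's Law gives the best achievable speedup as

  Speedup(N) = 1 / ((1 - P) + P/N)

so even as N grows without bound, the speedup never exceeds 1/(1 - P); the serial fraction limits what adding cores can buy.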

  8. Flux Architecture

  9. Flux
• Flux is a university-wide shared computational discovery / high-performance computing service
• Provided by Advanced Research Computing at U-M
• Operated by CAEN HPC
• Procurement, licensing, and billing by U-M ITS
• Interdisciplinary since 2010
• http://arc.research.umich.edu/resources-services/flux/

  10. The Flux cluster: login nodes, compute nodes, data transfer node, storage, …

  11. A Flux node: 48-64 GB RAM, 12-16 Intel cores, local disk, Ethernet, InfiniBand

  12. A Large Memory Flux node: 1 TB RAM, 32-40 Intel cores, local disk, Ethernet, InfiniBand

  13. Coming soon, a Flux GPU node: 64 GB RAM, 8 GPUs, 16 Intel cores, local disk; each GPU contains 2,688 GPU cores

  14. Flux software
• Licensed and open software: Abacus, BLAST, BWA, bowtie, ANSYS, Java, Mason, Mathematica, Matlab, R, RSEM, STATA SE, … (see http://cac.engin.umich.edu/resources)
• C, C++, and Fortran compilers: Intel (default), PGI, and GNU toolchains
• You can choose software using the module command
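
For example, a typical sequence might be (Matlab is used here only as an illustration; any listed package works the same way):

  module avail matlab
  module load matlab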

  15. Flux network
• All Flux nodes are interconnected via InfiniBand and a campus-wide private Ethernet network
• The Flux login nodes are also connected to the campus backbone network
• The Flux data transfer node is connected over a 10 Gbps connection to the campus backbone network
• This means the Flux login nodes can access the Internet; the Flux compute nodes cannot
• If InfiniBand is not available for a compute node, code on that node will fall back to Ethernet communications

  16. Flux data
• Lustre filesystem mounted on /scratch on all login, compute, and transfer nodes: 640 TB of short-term storage for batch jobs (large, fast, short-term)
• NFS filesystems mounted on /home and /home2 on all nodes: 80 GB of storage per user for development and testing (small, slow, long-term)

  17. Flux data
• Flux does not provide large, long-term storage. Alternatives:
  • Value Storage (NFS): $20.84/TB/month (replicated, no backups); $10.42/TB/month (non-replicated, no backups)
  • LSA Large Scale Research Storage: 2 TB free to researchers (replicated, no backups), for faculty members, lecturers, postdocs, and GSIs/GSRAs; additional storage $30/TB/year (replicated, no backups)
  • Departmental server: CAEN can mount your storage on the login nodes

  18. Copying data
Three ways to copy data to/from Flux:
• From Linux or Mac OS X, use scp:
  scp localfile login@flux-xfer.engin.umich.edu:remotefile
  scp login@flux-login.engin.umich.edu:remotefile localfile
  scp -r localdir login@flux-xfer.engin.umich.edu:remotedir
• From Windows, use WinSCP
  • U-M Blue Disc: http://www.itcs.umich.edu/bluedisc/
• Use Globus Connect
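
A concrete round trip might look like this (the uniqname, file, and directory names are placeholders):

  # upload a data file to your Flux home directory
  scp mydata.csv uniqname@flux-xfer.engin.umich.edu:~/
  # pull a results directory back to the current directory
  scp -r uniqname@flux-xfer.engin.umich.edu:project/results .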

  19. Globus Connect
• Features
  • High-speed data transfer, much faster than SCP or SFTP
  • Reliable and persistent
  • Minimal client software: Mac OS X, Linux, Windows
• GridFTP endpoints
  • Gateways through which data flow
  • Exist for XSEDE, OSG, …
  • UMich: umich#flux, umich#nyx
  • Add your own client endpoint!
  • Add your own server endpoint: contact flux-support@umich.edu
• More information: http://cac.engin.umich.edu/resources/login-nodes/globus-gridftp

  20. Flux Mechanics

  21. Using Flux
• Three basic requirements to use Flux: a Flux account, a Flux allocation, and an MToken (or a Software Token)

  22. Using Flux
• A Flux account
  • Allows login to the Flux login nodes to develop, compile, and test code
  • Available to members of the U-M community, free of charge
  • Get an account by visiting https://www.engin.umich.edu/form/cacaccountapplication

  23. Using Flux
• A Flux allocation
  • Allows you to run jobs on the compute nodes
  • Some units cost-share Flux rates:
    • Regular Flux: $11.72/core/month (LSA, Engineering, Medical School: $6.60/core/month)
    • Large Memory Flux: $23.82/core/month (LSA, Engineering, Medical School: $13.30/core/month)
    • GPU Flux: $107.10/month for 2 CPU cores and 1 GPU (LSA, Engineering, Medical School: $60/month)
    • Flux Operating Environment: $113.25/node/month (LSA, Engineering, Medical School: $63.50/node/month)
  • Flux pricing: http://arc.research.umich.edu/flux/hardware-services/
  • Rackham grants are available for graduate students; details at http://arc.research.umich.edu/resources-services/flux/flux-pricing/
  • To inquire about Flux allocations, please email flux-support@umich.edu

  24. Using Flux
• An MToken (or a Software Token)
  • Required for access to the login nodes
  • Improves cluster security by requiring a second means of proving your identity
  • You can use either an MToken or an application for your mobile device (called a Software Token)
  • Information on obtaining and using these tokens: http://cac.engin.umich.edu/resources/login-nodes/tfa

  25. Logging in to Flux
• ssh flux-login.engin.umich.edu
• MToken (or Software Token) required
• You will be randomly connected to a Flux login node (currently flux-login1 or flux-login2)
• Firewalls restrict access to flux-login. To connect successfully, either:
  • Physically connect your ssh client platform to the U-M campus wired or MWireless network, or
  • Use VPN software on your client platform, or
  • Use ssh to log in to an ITS login node (login.itd.umich.edu), and ssh to flux-login from there
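
The ITS two-hop route in the last bullet looks like this (replace uniqname with your own):

  ssh uniqname@login.itd.umich.edu
  ssh flux-login.engin.umich.edu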

  26. Modules
• The module command allows you to specify what versions of software you want to use:
  module list         -- Show loaded modules
  module load name    -- Load module name for use
  module avail        -- Show all available modules
  module avail name   -- Show versions of module name
  module unload name  -- Unload module name
  module              -- List all options
• Enter these commands at any time during your session
• A configuration file allows default module commands to be executed at login: put module commands in the file ~/privatemodules/default
• Don't put module commands in your .bashrc / .bash_profile

  27. Flux environment
• The Flux login nodes have the standard GNU/Linux toolkit: make, autoconf, awk, sed, perl, python, java, emacs, vi, nano, …
• Watch out for source code or data files written on non-Linux systems; use tools such as file and dos2unix to analyze and convert source files to Linux format

  28. Lab 1
Task: invoke R interactively on the login node
  module load R
  module list
  R
  q()
• Please run only very small computations on the Flux login nodes, e.g., for testing

  29. Lab 2
Task: run R in batch mode
• module load R
• Copy the sample code to your login directory:
  cd
  cp ~cja/hpc-sample-code.tar.gz .
  tar -zxvf hpc-sample-code.tar.gz
  cd ./hpc-sample-code
• Examine Rbatch.pbs and Rbatch.R
• Edit Rbatch.pbs with your favorite Linux editor; change the #PBS -M email address to your own

  30. Lab 2
Task: run R in batch mode
• Submit your job to Flux (a sketch of what such a script might contain follows this slide):
  qsub Rbatch.pbs
• Watch the progress of your job (where uniqname is your own uniqname):
  qstat -u uniqname
• When complete, look at the job's output:
  less Rbatch.out
• Copy your results to your local workstation (change uniqname to your own uniqname):
  scp uniqname@flux-xfer.engin.umich.edu:hpc-sample-code/Rbatch.out Rbatch.out
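
The transcript does not reproduce Rbatch.pbs itself; a minimal sketch of what such a script might contain, modeled on the batch templates in slides 40 and 41 and assuming R CMD BATCH is used to run the R script (the job name, allocation, resource values, and email address are placeholders):

  #PBS -N Rbatch
  #PBS -V
  #PBS -A FluxTraining_flux
  #PBS -l qos=flux
  #PBS -q flux
  #PBS -l procs=1,pmem=1gb,walltime=00:10:00
  #PBS -M uniqname@umich.edu
  #PBS -m abe
  #PBS -j oe

  # Run the R script non-interactively, writing its output to Rbatch.out
  cd $PBS_O_WORKDIR
  R CMD BATCH --vanilla Rbatch.R Rbatch.out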

  31. Lab 3
Task: use the multicore package
The multicore package allows you to use multiple cores on the same node
• module load R
  cd ~/hpc-sample-code
• Examine Rmulti.pbs and Rmulti.R
• Edit Rmulti.pbs with your favorite Linux editor; change the #PBS -M email address to your own

  32. Lab 3
Task: use the multicore package
• Submit your job to Flux:
  qsub Rmulti.pbs
• Watch the progress of your job (where uniqname is your own uniqname):
  qstat -u uniqname
• When complete, look at the job's output:
  less Rmulti.out
• Copy your results to your local workstation (change uniqname to your own uniqname):
  scp uniqname@flux-xfer.engin.umich.edu:hpc-sample-code/Rmulti.out Rmulti.out

  33. Compiling Code
• Assuming default module settings:
  • Use mpicc/mpiCC/mpif90 for MPI code
  • Use icc/icpc/ifort with -mp for OpenMP code
• Serial code, Fortran 90:
  ifort -O3 -ipo -no-prec-div -xHost -o prog prog.f90
• Serial code, C:
  icc -O3 -ipo -no-prec-div -xHost -o prog prog.c
• MPI parallel code:
  mpicc -O3 -ipo -no-prec-div -xHost -o prog prog.c
  mpirun -np 2 ./prog
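
No OpenMP compile line appears on this slide; a minimal sketch using the GNU toolchain instead (flag names differ between the Intel, PGI, and GNU compilers):

  gcc -O3 -fopenmp -o prog_omp prog_omp.c   # thread count is set at run time via OMP_NUM_THREADS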

  34. Lab 4
Task: compile and execute simple programs on the Flux login node
• Copy the sample code to your login directory:
  cd
  cp ~brockp/cac-intro-code.tar.gz .
  tar -xvzf cac-intro-code.tar.gz
  cd ./cac-intro-code
• Examine, compile, and execute helloworld.f90:
  ifort -O3 -ipo -no-prec-div -xHost -o f90hello helloworld.f90
  ./f90hello
• Examine, compile, and execute helloworld.c:
  icc -O3 -ipo -no-prec-div -xHost -o chello helloworld.c
  ./chello
• Examine, compile, and execute MPI parallel code:
  mpicc -O3 -ipo -no-prec-div -xHost -o c_ex01 c_ex01.c
  mpirun -np 2 ./c_ex01

  35. Makefiles
• The make command automates your code compilation process
• It uses a makefile to specify dependencies between source and object files; the sample directory contains a sample makefile
• To compile c_ex01: make c_ex01
• To compile all programs in the directory: make
• To remove all compiled programs: make clean
• To build all the programs using 8 parallel compiles: make -j8

  36. Flux Batch Operations

  37. Portable Batch System
• All production runs execute on the compute nodes under the Portable Batch System (PBS)
• PBS manages all aspects of cluster job execution except job scheduling
• Flux uses the Torque implementation of PBS
• Flux uses the Moab scheduler for job scheduling
• Torque and Moab work together to control access to the compute nodes
• PBS puts jobs into queues; Flux has a single queue, named flux

  38. Cluster workflow
• You create a batch script and submit it to PBS
• PBS schedules your job, and it enters the flux queue
• When its turn arrives, your job executes the batch script; the script has access to any applications or data stored on the Flux cluster
• When your job completes, anything it sent to standard output and standard error is saved and returned to you
• You can check on the status of your job at any time, or delete it if it's not doing what you want
• A short time after your job completes, it disappears from the queue

  39. Basic batch commands
• Once you have a script, submit it with qsub scriptfile:
  $ qsub singlenode.pbs
  6023521.nyx.engin.umich.edu
• You can check on the job status with qstat jobid or qstat -u user:
  $ qstat -u cja
  nyx.engin.umich.edu:
                                                             Req'd  Req'd   Elap
  Job ID           Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
  ---------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
  6023521.nyx.engi cja      flux     hpc101i        --   1   1     -- 00:05 Q    --
• To delete your job, use qdel jobid:
  $ qdel 6023521
  $

  40. Loosely-coupled batch script
  #PBS -N yourjobname
  #PBS -V
  #PBS -A youralloc_flux
  #PBS -l qos=flux
  #PBS -q flux
  #PBS -l procs=12,pmem=1gb,walltime=01:00:00
  #PBS -M youremailaddress
  #PBS -m abe
  #PBS -j oe

  # Your code goes below:
  cd $PBS_O_WORKDIR
  mpirun ./c_ex01

  41. Tightly-coupled batch script
  #PBS -N yourjobname
  #PBS -V
  #PBS -A youralloc_flux
  #PBS -l qos=flux
  #PBS -q flux
  #PBS -l nodes=1:ppn=12,mem=47gb,walltime=02:00:00
  #PBS -M youremailaddress
  #PBS -m abe
  #PBS -j oe

  # Your code goes below:
  cd $PBS_O_WORKDIR
  matlab -nodisplay -r script

  42. Lab 5
Task: run an MPI job on 8 cores
• Compile c_ex05:
  cd ~/cac-intro-code
  make c_ex05
• Edit the file run with your favorite Linux editor:
  • Change the #PBS -M address to your own (I don't want Brock to get your email!)
  • Change the #PBS -A allocation to FluxTraining_flux, or to your own allocation, if desired
  • Change the #PBS -l qos to flux
• Submit your job:
  qsub run

  43. PBS attributes
As always, man qsub is your friend:
  -N : sets the job name; can't start with a number
  -V : copy shell environment to the compute node
  -A youralloc_flux : sets the allocation you are using
  -l qos=flux : sets the quality-of-service parameter
  -q flux : sets the queue you are submitting to
  -l : requests resources, like number of cores or nodes
  -M : whom to email; can be multiple addresses
  -m : when to email: a = job abort, b = job begin, e = job end
  -j oe : join STDOUT and STDERR to a common file
  -I : allow interactive use
  -X : allow X GUI use

  44. PBS resources (1)
A resource (-l) can specify:
• Request wallclock (that is, running) time:
  -l walltime=HH:MM:SS
• Request C MB of memory per core:
  -l pmem=Cmb
• Request T MB of memory for the entire job:
  -l mem=Tmb
• Request M cores on arbitrary node(s):
  -l procs=M
• Request a token to use licensed software:
  -l gres=stata:1
  -l gres=matlab
  -l gres=matlab%Communication_toolbox
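
Several requests can be combined in a single -l option, as the batch scripts in slides 40 and 41 do; for example (the values and script name are placeholders):

  qsub -l procs=16,pmem=2gb,walltime=04:00:00 -l gres=matlab myjob.pbs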

  45. PBS resources (2)
A resource (-l) can specify, for multithreaded code:
• Request M nodes with at least N cores per node:
  -l nodes=M:ppn=N
• Request M cores with exactly N cores per node (note the difference in syntax and semantics vis-à-vis ppn!):
  -l nodes=M,tpn=N
  (you'll only use this for specific algorithms)

  46. Interactive jobs
• You can submit jobs interactively:
  qsub -I -X -V -l procs=2 -l walltime=15:00 -A youralloc_flux -l qos=flux -q flux
• This queues a job as usual; your terminal session will be blocked until the job runs
• When your job runs, you'll get an interactive shell on one of your nodes; invoked commands will have access to all of your nodes
• When you exit the shell, your job is deleted
• Interactive jobs allow you to
  • Develop and test on cluster node(s)
  • Execute GUI tools on a cluster node
  • Utilize a parallel debugger interactively
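
For example, once the interactive shell appears you could run the Lab 4 MPI example by hand (assuming you requested at least two cores and have already built c_ex01):

  cd ~/cac-intro-code
  mpirun -np 2 ./c_ex01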

  47. Lab 6
Task: run an interactive job
• Enter this command (all on one line):
  qsub -I -V -l procs=1 -l walltime=30:00 -A FluxTraining_flux -l qos=flux -q flux
• When your job starts, you'll get an interactive shell
• Copy and paste the batch commands from the "run" file, one at a time, into this shell
• Experiment with other commands
• After thirty minutes, your interactive shell will be killed

  48. Lab 7
Task: run Matlab interactively
• module load matlab
• Start an interactive PBS session:
  qsub -I -V -l procs=2 -l walltime=30:00 -A FluxTraining_flux -l qos=flux -q flux
• Run Matlab in the interactive PBS session:
  matlab -nodisplay

  49. Introduction to Scheduling

  50. The Scheduler (1/3)
• Flux scheduling policies:
  • The job's queue determines the set of nodes you run on
  • The job's account and qos determine the allocation to be charged; if you specify an inactive allocation, your job will never run
  • The job's resource requirements help determine when the job becomes eligible to run; if you ask for unavailable resources, your job will wait until they become free
  • There is no pre-emption
