
TTU High Performance Computing User Training: Part 2
Srirangam Addepalli and David Chaffin, Ph.D.


Presentation Transcript


1. TTU High Performance Computing User Training: Part 2
Srirangam Addepalli and David Chaffin, Ph.D.
Advanced Session: Outline
• Cluster Architecture
• File System and Storage
• Lectures with Labs:
• Advanced Batch Jobs
• Compilers/Libraries/Optimization
• Compiling/Running Parallel Jobs
• Grid Computing

2. HPCC Clusters
• hrothgar: 128 dual-processor 64-bit Xeons, 3.2 GHz, 4 GB memory, InfiniBand and Gigabit Ethernet, CentOS 4.3 (Red Hat)
• community cluster: 64 nodes, part of hrothgar, same configuration except no InfiniBand. Owned by faculty members, controlled by batch queues.
• minigar: 20 nodes, 3.6 GHz, InfiniBand, for development; opening soon
• Physics grid machine on order: some nodes will be available
• poseidon: Opteron, 3 nodes, PathScale compilers
• Several retired, test, and grid systems

3. Cluster Performance
• Main factors:
• 1. Individual node performance, of course. SpecFP2000Rate (www.spec.org) matches our applications well. The newest dual-core systems have 2x the cores and ~1.5x the performance per core, for about 3x the performance per node vs. hrothgar.
• 2. Fabric latency (delay time of one message, in microseconds: IB = 6, GE = 40)
• 3. Fabric bandwidth (in MB/s: IB = 600, GE = 60)
• Intel has the better CPU right now; AMD has better shared-memory performance. Overall they are about equal.

4. Cluster Architecture
• An application example where the system is limited by interconnect performance:
• gromacs, measured as simulation time completed per unit of real time:
• hrothgar, 8 nodes, Gig-E: ~1200 ns/day
• hrothgar, 8 nodes, IB: ~2800 ns/day
• Current dual-core systems have 3x the serial throughput of hrothgar, and quad-core systems are coming next year. They need more bandwidth: in the future, Gig-E will be suitable only for serial jobs.

5. Cluster Usage
• ssh to hrothgar
• scp files to hrothgar
• compile on hrothgar
• run on the compute nodes (only), using the LSF batch system (only)
• example files: /home/shared/examples/
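A minimal example session, assuming a hypothetical user elvis and hypothetical file names (use your own username and the full hostname from your HPCC account):

  scp myprog.c myjob.sh elvis@hrothgar:   # copy files to hrothgar
  ssh elvis@hrothgar                      # log in
  icc -O -o myprog myprog.c               # compile on hrothgar
  bsub < myjob.sh                         # run on the compute nodes through LSF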

6. Some Useful LSF Commands
• bjobs -w          (-w for wide shows full node names)
• bjobs -l [job#]   (-l for long shows everything)
• bqueues [-l]      shows queues [everything]
• bhist [job#]      job history
• bpeek [job#]      stdout/stderr stored by LSF
• bkill job#        kill the job
• -bash-3.00$ /home/shared/bin/check-hosts-batch.sh
• hrothgar, 2 free=0 nodes, 0 cpus
• hrothgar, 1 free=3 nodes, 3 cpus
• hrothgar, 0 free=125 nodes
• hrothgar, offline=0 nodes
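A typical monitoring sequence, assuming a job script myjob.sh and a hypothetical job number 1234:

  bsub < myjob.sh      # submit; LSF prints the job number, e.g. 1234
  bjobs -w             # PEND or RUN, and on which nodes?
  bpeek 1234           # look at the stdout/stderr LSF has stored so far
  bhist 1234           # submission/dispatch/finish history
  bkill 1234           # kill the job if something went wrong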

7. Batch Queues on hrothgar
• bqueues
  QUEUE_NAME     PRIO STATUS MAX  JL/U JL/P JL/H NJOBS PEND RUN
  short           35  Open    56   56   -    -      0    0    0
  parallel        35  Open   224   40   -    -    108    0  108
  serial          30  Open   156   60   -    -    204  140   64
  parallel_long   25  Open   256   64   -    -     16    0   16
  idle            20  Open   256  256   -    -    100    0   55
• Every 30 seconds the scheduler cycles through the queued jobs. A job starts if:
• (1) nodes are available (free, or running only idle-queue jobs)
• (2) the user's CPUs are below the per-user queue limit (bqueues JL/U column)
• (3) the queue's CPUs are below the total queue limit (bqueues MAX column)
• (4) it is in the highest-priority queue with eligible jobs (short, parallel, serial, parallel_long, idle)
• (5) fair share: the user with the smallest current usage goes first
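To target a particular queue explicitly, give its name from the table above to bsub (the script names here are hypothetical):

  bsub -q serial < serial-job.sh     # one-CPU job in the serial queue
  bsub -q parallel < mpi-basic.sh    # parallel job, subject to the 40-CPU JL/U limit shown above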

8. Unix/Linux Compiling: Common Features
• [compiler] [options] [source files] [linker options]
• (PathScale is only on poseidon)
• C compilers: gcc, icc, pathcc
• C++: g++, icpc, pathCC
• Fortran: g77, ifort, pathf90
• Options: -O [optimize], -o outputfilename
• Source files: new.f or *.f or *.c
• Linker options: to link with libx.a or libx.so in /home/elvis/lib:
• -L/home/elvis/lib -lx
• Many programs need: -lm, -pthread
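Putting the pieces together, a hypothetical compile-and-link line (myprog, new.f, myprog.c, and /home/elvis/lib are placeholders):

  # Fortran: optimize, name the output, link libx from a private directory plus the math library
  ifort -O -o myprog new.f -L/home/elvis/lib -lx -lm

  # C equivalent with the GNU compiler
  gcc -O -o myprog myprog.c -L/home/elvis/lib -lx -lm -pthread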

9. MPI Compile: Path
• . /home/shared/examples/new-bashrc        [using bash]
• source /home/shared/examples/new-cshrc    [using tcsh]
• hrothgar:dchaffin:dchaffin $ echo $PATH
  /sbin:/bin:/usr/bin:/usr/sbin:/usr/X11R6/bin:\
  /usr/share/bin:/opt/rocks/bin:/opt/rocks/sbin:\
  /opt/lsfhpc/6.2/linux2.6-glibc2.3-x86_64/bin:\
  /opt/intel/fce/9.0/bin:/opt/intel/cce/9.0/bin:\
  /share/apps/mpich/IB-icc-ifort-64/bin:\
  /opt/lsfhpc/6.2/linux2.6-glibc2.3-x86_64/bin
• mpich builds exist for each combination: IB or GE fabric; icc, gcc, or pathcc for C; ifort, g77, or pathf90 for Fortran
• The mpicc/mpif77/mpif90/mpiCC you compile with must match the mpirun you run with!
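After sourcing one of the environment files, it is worth confirming that the compiler wrappers and mpirun on your PATH come from the same mpich build:

  which mpicc mpif90 mpirun
  # all three should resolve to the same mpich directory,
  # e.g. /share/apps/mpich/IB-icc-ifort-64/bin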

10. MPI Compile/Run
• cp /home/shared/examples/mpi-basic.sh .
• cp /home/shared/examples/cpi.c .
• /opt/mpich/gnu/bin/mpicc cpi.c   [or]
• /share/apps/mpich/IB-icc-ifort-64/bin/mpicc cpi.c
• vi mpi-basic.sh
• check the ptile setting, and comment out the mpirun line that you are not using (either IB or the default)
• you could also change the executable name
• bsub < mpi-basic.sh
• produces:
• job#.out       LSF output
• job#.pgm.out   mpirun output
• job#.err       LSF stderr
• job#.pgm.err   mpirun stderr
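The contents of mpi-basic.sh are not reproduced in the slides; a rough sketch of an LSF MPI script of this shape, with assumed resource values and file names, might look like:

  #!/bin/bash
  #BSUB -n 8                        # total CPUs requested (assumed value)
  #BSUB -R "span[ptile=2]"          # 2 CPUs per node on the dual-processor hrothgar nodes
  #BSUB -o %J.out                   # LSF stdout, named by job number
  #BSUB -e %J.err                   # LSF stderr

  # build a machine file from the hosts LSF assigned to this job
  echo $LSB_HOSTS | tr ' ' '\n' > machines.$LSB_JOBID

  # keep exactly one mpirun; it must come from the same mpich build as the mpicc used to compile
  /share/apps/mpich/IB-icc-ifort-64/bin/mpirun -np 8 -machinefile machines.$LSB_JOBID ./a.out \
      > $LSB_JOBID.pgm.out 2> $LSB_JOBID.pgm.err                   # InfiniBand build
  # /opt/mpich/gnu/bin/mpirun -np 8 -machinefile machines.$LSB_JOBID ./a.out \
  #     > $LSB_JOBID.pgm.out 2> $LSB_JOBID.pgm.err                 # Gigabit Ethernet build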

11. Exercise/Homework
• Run an MPI benchmark on InfiniBand, Ethernet, and shared memory. Compare latency and bandwidth. Research and briefly discuss the reasons for the performance:
• Hardware bandwidth (look it up)
• Software layers (OS, interrupts, MPI, one-sided copy, two-sided copy)
• Hardware:
• Topspin InfiniBand SDR, PCI-X
• Xeon Nocona shared memory
• Intel Gigabit, on board
• Program: /home/shared/examples/mpilc.c or equivalent (see the setup sketch below)
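One possible way to set up the three cases, assuming copies of mpi-basic.sh edited for each run (all script and executable names are placeholders):

  # InfiniBand: compile with the IB mpich build, run 2 ranks on 2 different nodes
  /share/apps/mpich/IB-icc-ifort-64/bin/mpicc -O -o mpilc.ib /home/shared/examples/mpilc.c

  # Gigabit Ethernet: compile with the default (GE) mpich build, run 2 ranks on 2 different nodes
  /opt/mpich/gnu/bin/mpicc -O -o mpilc.ge /home/shared/examples/mpilc.c

  # Shared memory: place both ranks on one node (bsub -n 2 with span[ptile=2]),
  # so messages stay inside the box instead of crossing the fabric

  bsub < mpi-ib.sh     # each script is a copy of mpi-basic.sh edited for the
  bsub < mpi-ge.sh     # matching mpirun, executable, and node placement
  bsub < mpi-shm.sh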
