
Introduction to Using the High Performance Cluster at the Center for Computational Research



Presentation Transcript


  1. Introduction to Using the High Performance Cluster at the Center for Computational Research. Cynthia Cornelius, Center for Computational Research, NYS Center of Excellence in Bioinformatics and Life Sciences, University at Buffalo, SUNY, 701 Ellicott St, Buffalo, NY 14203. Phone: 716-881-8959. Office: B1-109, Bell 107. cdc at ccr.buffalo.edu. January 23, 2009

  2. Introduction to CCR • where to find information • cluster resources • data storage • required software • login and data transfer • UNIX environment and modules • compiling codes • running jobs • monitoring jobs

  3. Information and Getting Help • CCR website: http://www.ccr.buffalo.edu • How-to pages • Tutorials • System status • New Users Reference Card • Getting help: ccr-help at ccr.buffalo.edu • CCR uses an email problem ticket system: users send their questions and descriptions of problems, and the technical staff receives the email and responds to the user. • Usually within one business day

  4. Cluster Computing • The u2 cluster is the major computational platform of the Center for Computational Research. • Login (front-end) and cluster machines run the Linux operating system. • Requires a CCR account. • Accessible from within the UB domain. • The login machine is u2.ccr.buffalo.edu. • Compute nodes are not accessible from outside the cluster. • Traditional UNIX-style command-line interface. • A few basic commands are necessary.

  5. Cluster Computing • The u2 cluster consists of 1056 dual-processor Dell SC1425 compute nodes. • The compute nodes have Intel Xeon processors. • Most of the cluster machines are 3.2 GHz with 2 GB of memory. • There are 64 compute nodes with 4 GB of memory and 32 with 8 GB. • All nodes are connected to a gigabit Ethernet network. • 756 nodes are also connected to Myrinet, a high-speed fibre network.

  6. Cluster Computing

  7. Data Storage • Home directory: • /san/user/UBITusername/u2 • The default user quota for a home directory is 2GB. • Users requiring more space should contact the CCR staff. • Data in home directories is backed up. • CCR retains data backups for one month. • Project directories: • /san/projects[1-3]/research-group-name • UB faculty can request additional disk space for use by the members of their research group. • The default group quota for a project directory is 100GB. • Data in project directories is NOT backed up by default.

  8. Data Storage • Scratch spaces are available for TEMPORARY use by jobs running on the cluster. • /san/scratch provides 2TB of space. • Accessible from the front-end and all compute nodes. • /ibrix/scratch provides 25TB of high-performance storage. • Applications with high I/O that share data files benefit the most from using IBRIX. • Accessible from the front-end and all compute nodes. • /scratch provides a minimum of 60GB of storage. • The front-end and each compute node has local scratch space. This space is accessible from that machine only. • Applications with high I/O that do not share data files benefit the most from using local scratch. • Jobs must copy files to and from local scratch.

  9. Required Software • Secure Shell (ssh) is used to log in to the u2 cluster front-end machine. • Telnet logins are NOT allowed. • An X display is used to show any graphical interface from the cluster on the user’s computer. • Secure file transfer is used to upload and download data files to and from the u2 front-end machine. • VPN must be used to access u2 from outside the University. • http://ubit.buffalo.edu/vpn/ • ssh, sftp, and X are usually installed on Linux/UNIX workstations by default.
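
For illustration, the session below shows a typical login with X forwarding and a file transfer from a Linux/UNIX workstation. This is only a sketch: UBITusername, data.in, and results.tar.gz are placeholders, and OpenSSH is assumed to be installed on the workstation.

    ssh -X UBITusername@u2.ccr.buffalo.edu                  # log in to the front-end with X11 forwarding
    sftp UBITusername@u2.ccr.buffalo.edu                    # interactive secure file transfer session
    scp data.in UBITusername@u2.ccr.buffalo.edu:            # upload a single file to the home directory
    scp UBITusername@u2.ccr.buffalo.edu:results.tar.gz .    # download results back to the workstation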

  10. Required Software • Software for Windows machines can be downloaded from the UB CIT website. • http://ubit.buffalo.edu/software/win/index.php • PuTTY is the secure shell client for Windows machines. • This package also provides clients for secure data transfer. • X-Win32 provides an X display for Windows. • The WinSCP client provides a drag-and-drop interface for file transfer. • http://winscp.net/eng/index.php • See the CCR website for more information. • http://www.ccr.buffalo.edu/display/WEB/Getting+Started

  11. Unix Environment and Shell • The u2 front-end and compute nodes have a command line interface. • This is the user’s login shell, which on the u2 cluster will be bash. • The .bashrc file is a script that runs when a user logs in. • This script can be modified to set variables and paths. • Using the u2 cluster requires knowledge of some basic UNIX commands. • http://www.ccr.buffalo.edu/display/WEB/Unix+Commands • The UNIX Reference Card provides a short list of the basic commands. • http://wings.buffalo.edu/computing/Documentation/unix/ref/unixref.html
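
As a concrete example, a user's .bashrc on u2 might be extended as in the sketch below. This is an illustration only: the intel module is taken from the compiler examples later in this presentation, and the personal bin directory is a hypothetical addition.

    # ~/.bashrc -- read by bash shells on u2
    # load a frequently used compiler module automatically (example module name)
    module load intel
    # add a hypothetical personal bin directory to the search path
    export PATH=$HOME/bin:$PATH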

  12. Modules • Modules are available to set variables and paths for application software, communication protocols, compilers and numerical libraries. • module avail (list all available modules) • module load module-name (loads a module) • Updates PATH variable with path of application. • module unload module-name (unloads a module) • Removes path of application from the PATH variable. • module list (list loaded modules) • module show module-name • Show what the module sets. • Modules can be loaded in the user’s .bashrc file.
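
A typical interactive module session might look like the sketch below; the intel module name is borrowed from the compiler examples later in this presentation, and the modules actually available will vary.

    module avail          # list all available modules
    module load intel     # add the Intel compiler paths to PATH
    module list           # confirm which modules are currently loaded
    module show intel     # display the variables and paths the module sets
    module unload intel   # remove the Intel compiler paths again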

  13. Software • CCR provides a wide variety of scientific and visualization software. • Some examples: BLAST, MrBayes, iNquiry, WebMO, ADF, GAMESS, TurboMole, CFX, Star-CD, Espresso, IDL, TecPlot, and Totalview. • The CCR website provides a complete listing of application software, as well as compilers and numerical libraries. • http://www.ccr.buffalo.edu/display/WEB/Software • The GNU, Intel, and PGI compilers are available on the u2 cluster. • The Intel Math Kernel Library is also available. • MPI (Message Passing Interface) packages are available for each compiler and network. • MPICH, OpenMPI, MPICH2

  14. Compiling Codes • The GNU compilers are in the default path. • gcc, g77, gfortran • Modules must be loaded to set the paths for the Intel and PGI compilers. • icc, ifort • pgcc, pgf77, pgf90 • Compiling with the Intel Fortran compiler: • module load intel • ifort -o hello-intel helloworld.f • hello-intel is the executable. • ./hello-intel (runs the code)

  15. Compiling Codes • Codes with MPI calls can be compiled using wrapper scripts. • mpicc, mpif77, mpif90 • Modules must be loaded to set the paths for the MPI wrapper scripts from the various implementations of MPI. • Compiling with MPICH and the Intel C compiler: • module load mpich/intel-10.1/ch_p4/1.2.7p1 • mpicc -o cpi-intel cpi.c • cpi-intel is the executable. • An MPI launcher is required to run the code: • mpiexec ./cpi-intel • mpirun -machinefile nodefile -np 4 ./cpi-intel

  16. Running on the U2 Cluster • The front-end is used for editing and compiling code, as well as for short tests of application codes. • Test runs on the front-end are limited to 30 minutes. • Jobs are submitted to the PBS (Portable Batch System) scheduler to run on the cluster. • Compute nodes are assigned to user jobs by the PBS scheduler. • Users are NOT permitted to use compute nodes outside of a scheduled job.

  17. PBS Execution Model • Users submit jobs to the PBS scheduler on the u2 front-end machine. • PBS executes a login as the user on the master node, and then proceeds according to one of two modes, depending on how the user requested that the job be run. • Script - the user executes the command: qsub [options] job-script • This is a batch job. • Interactive - the user executes the command: qsub [options] -I • Once the job is assigned nodes, the user has a login on the master node.

  18. Execution Model Schematic • Flowchart: qsub myscript submits the job to the pbs_server scheduler; when the scheduler decides the job can run, PBS performs a $USER login on the first assigned node, runs the prologue, executes myscript on the assigned nodes (node1 ... nodeN, listed in $PBS_NODEFILE), and then runs the epilogue.

  19. PBS Queues • The PBS queues defined for the u2 cluster are CCR and debug. • The CCR queue is the default queue; users do not need to specify it. • The debug queue can be requested by the user. • It is used to test applications. • This queue has a number of compute nodes set aside for its use during peak times, usually 32. • The queue is always available; however, it has dedicated nodes only Monday through Friday, from 9:00am to 5:00pm. • Use -q debug to specify the debug queue on the u2 cluster.

  20. Batch Scripts - qsub • The qsub command is used to submit jobs to the PBS scheduler. • A full description can be obtained from: • man qsub • qsub --help. • All of the options (except -I) can be specified using directives inside the job-script file. • The “-l” options are used to request resources for a job. • Used in batch scripts and interactive jobs. • -l walltime=01:00:00 • Requests 1 hour wall-clock time limit.

  21. Batch Scripts - Resources • If the job does not complete before this time limit, then it will be terminated by the scheduler. All tasks will be removed from the nodes. • -l nodes=8:ppn=2 • Requests 8 nodes with 2 processors per node. • All the compute nodes in the u2 cluster have 2 processors per node. If you request 1 processor per node, then you may share that node with another job. • -l nodes=8:MEM4GB:ppn=2 • Requests nodes that have at least 4 GB of memory each. • Useful on u2, where there are 32 nodes with 8 GB of memory, and 64 nodes with 4 GB of memory.
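
The resource requests above can be given either on the qsub command line or as #PBS directives at the top of the job script. The sketch below shows both forms for an assumed script named myscript, with example values only.

    # on the command line:
    qsub -l walltime=01:00:00 -l nodes=8:ppn=2 myscript

    # or, equivalently, as directives inside myscript:
    #PBS -l walltime=01:00:00
    #PBS -l nodes=8:ppn=2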

  22. PBS Variables • $PBS_O_WORKDIR - directory from which the job was submitted. • By default, a PBS job starts from the user’s $HOME directory. • In practice, many users change directory to the $PBS_O_WORKDIR directory in their scripts. • $PBSTMPDIR - reserved scratch space, local to each host (this is a CCR definition, not part of the PBS package). • This scratch directory is created in /scratch and is unique to the job. • The $PBSTMPDIR is created on every compute node running a particular job. • $PBS_NODEFILE - name of the file containing a list of nodes assigned to the current batch job. • Used to allocate parallel tasks in a cluster environment.

  23. Sample Interactive Job • qsub -I -q debug -l nodes=2:ppn=2 -l walltime=01:00:00

  24. Sample Script – Cluster • Running jobs in batch on the cluster: • Care must be taken to ensure that the distributed hosts all get a copy of the input file and executable. • Files reside on an NFS-mounted directory, such as the home directory, /san/scratch, /ibrix/scratch, or a projects directory. • Files are then copied to the local scratch on each node (a rough sketch of such a script follows below). • More commonly used PBS directives: • -m e sends email when the job ends. • -M user@domain sets the address for email notification. • -j oe joins the standard output and error streams (otherwise you get them separately).
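
CCR's actual sample scripts are shown on the next slides; as a rough, hypothetical sketch only (not the contents of /util/pbs-scripts/pbsCPI-sample-u2-mpirun), a batch script of this kind might look like the following, reusing the cpi-intel executable and MPICH module from the compiling examples. The output file names are placeholders, and copying files to every assigned node is simplified.

    #!/bin/bash
    #PBS -l walltime=01:00:00
    #PBS -l nodes=8:ppn=2
    #PBS -m e
    #PBS -M user@domain
    #PBS -j oe

    # set up the same MPI environment used at compile time
    module load mpich/intel-10.1/ch_p4/1.2.7p1

    # start from the directory the job was submitted from
    cd $PBS_O_WORKDIR

    # copy the executable to the per-job local scratch
    # (shown for the master node only; reaching every node would need an
    #  extra step such as pbsdsh, which is omitted in this sketch)
    cp cpi-intel $PBSTMPDIR
    cd $PBSTMPDIR

    # count the processors assigned by PBS and launch the MPI code
    NP=$(wc -l < $PBS_NODEFILE)
    mpirun -machinefile $PBS_NODEFILE -np $NP ./cpi-intel

    # copy results (hypothetical output files) back before the
    # local scratch is cleaned up at the end of the job
    cp output* $PBS_O_WORKDIR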

  25. Sample Script – Cluster • /util/pbs-scripts/pbsCPI-sample-u2-mpirun

  26. Sample Script – Cluster • Submit pbsCPI-sample-u2-mpirun

  27. Sample Script – Cluster • Submit pbsCPI-sample-u2-mpiexec

  28. Monitoring Jobs • For text-based job inquiry, use the qstat command: • qstat -an -u userid

  29. Monitoring Jobs • jobvis - a GUI for displaying and monitoring the nodes in a PBS job. • jobvis jobid

  30. Monitoring Jobs • More views of the job with jobvis:

  31. Monitoring Jobs • More views of the job with jobvis:

  32. Manipulating PBS Jobs • qsub - job submission. • qsub myscript • qdel - job deletion. • qdel jobid • qhold - hold a job. • qhold jobid • qrls - release a hold on a job. • qrls jobid • qmove - move a job (between servers/queues). • qalter - alter a job (usually its resources). • More information: see the man pages.

  33. PBS FAQ • When will my job run? • That depends - the more resources you ask for, the longer you are likely to wait. • On platforms that run the Maui scheduler (for CCR, that is all of the current production systems), use the showbf command to see what resources are available now, and for how long. • Use showq to view a list of running and queued jobs. • This will also display the number of active processors.

  34. FAQ • Example of showbf: • List available nodes: showbf -S • List available Myrinet nodes: showbf -f GM

  35. FAQ • Example of showstart: • Shows estimated start time.

  36. FAQ • Example of showq: • Shows job queue:

  37. FAQ • Example of showq: • Also shows the percentage of active processors and nodes.
