Managing & Using PBS Professional Training Manual for Users & Administrators PBS Professional v12.0 Rev 2
Altair's Focus Simulation, predictive analytics, and optimization, for leveraging high-performance computing and cloud architectures for engineering and business decision-making
Global Presence
• Offices: Lund, Sweden; Gothenburg, Sweden; Coventry, UK; Manchester, UK; Stuttgart, Germany; Cologne, Germany; Hamburg, Germany; Hanover, Germany; Munich, Germany; Lyon, France; Paris, France; Sophia Antipolis, France; Toulouse, France; Torino, Italy; Madrid, Spain; Moscow, Russia; Delhi, India; Pune, India; Chennai, India; Hyderabad, India; Bangalore, India; KL, Malaysia; Seattle, USA; Mountain View, USA; Los Angeles, USA; Austin, USA; Denver, USA; Mexico City, Mexico; Toronto, Canada; Detroit, USA; Boston, USA; Milwaukee, USA; Charlotte, USA; Huntsville, USA; Sao Paulo, Brazil; Beijing, China; Shanghai, China; Tokyo, Japan; Osaka, Japan; Nagoya, Japan; Seoul, Korea; Melbourne, Australia
• Over 40 offices across 16 countries
• Technical Support
  • Email to: firstname.lastname@example.org
  • Telephone: 248-614-2425
  • Hours: 08:00 EST – 18:00 EST M-F
• Sales
  • Email to: email@example.com
  • Telephone: 248-614-2400
Altair's Brands and Companies Engineering Simulation Platform On-demand Cloud Computing Technology Product Innovation Consulting Industrial Design Technology Business Intelligence & Data Analytics Solutions Solid State Lighting Products
Competitive Difference – Why Altair Wins • Powerful business model with unmatched customer value • Depth and breadth of the overall solution set • Market-defining optimization technologies • Unparalleled performance and measurability • Unique ability to leverage high-end services to drive next generation solutions • Strong global organization that can meet the needs of the most demanding customers worldwide • No other vendor offers the completeness or robustness of the Altair solution set
Introduction to Altair • Founded: in 1985 as a product design consulting company • Today: a global software and technology company focused on enterprise analytics, product development and advanced computing
Chapter One: Understanding PBS Professional • What is PBS Professional? • PBS Professional features • History of PBS Professional • PBS Works online store • PBS Professional documentation • Broad hardware and operating system support • Supported MPI libraries • PBS Professional components & roles
What is PBS Professional? • Workload management solution that maximizes the efficiency and utilization of high-performance computing (HPC) resources and improves job turnaround • Robust Workload Management • Floating licenses • Socket licenses • Scalability, with flexible queues • Job arrays • User and administrator interface • Job suspend/resume • Application checkpoint/restart • Automatic file staging • Accounting logs • Access control lists • Provisioning • Hooks • Advanced Scheduling Algorithms • Resource-based scheduling • Preemptive scheduling • Optimized node sorting • Enhanced job placement • Advance & standing reservations • Cycle harvesting across workstations • Scheduling across multiple complexes • Network topology scheduling • Manages both batch and interactive work • Backfilling • Reliability, Availability and Scalability • Server failover feature • Automatic job recovery • System monitoring • Integration with MPI solutions • Tested to manage 1,000,000+ jobs per day • EAL3+ security • Checkpoint support
History of PBS Professional
• 1993-97: Developed for NASA to replace NQS
• 2000: Veridian formed commercial version of PBS; released PBS Professional 5.0
• 2003: Altair acquired PBS Professional technology and engineering; released PBS Professional 5.3
• 2004: Released PBS Professional 5.4
• 2005: Released PBS Professional 7.0 and 7.1
• 2006: Released PBS Professional 8.0
• 2007: Released PBS Professional 9.0 and 9.1
• 2008: Released PBS Professional 9.2 and 10.0
• 2009: Released PBS Professional 10.1 and 10.2
• 2010: Released PBS Professional 10.4 and 11.0
• 2011: Released PBS Professional 11.1 and 11.2
• 2012: Released PBS Professional 11.3
• 2013: Released PBS Professional 12.0
PBS Works Online Store http://www.pbsworks.com • PBS Works website provides the following: • Online Altair store • Purchase new/additional PBS Works licenses • Generate and view license history • Download suite of PBS Works software • Download suite of PBS Works books • Resource Library • Success stories • Case studies • Webinar recordings • Technical papers • Partner program • Upcoming events • Sales, support and training information • PBS Works forum
PBS Professional Documentation • The following documentation is available for download from the online store without charge: • Administrator's Guide • Guide to configuring and maintaining PBS Professional; for system administrators • User's Guide • Guide to submitting and managing jobs; for users • Reference Guide • Reference guide containing PBS attributes, parameters, formats, states, etc. • Installation and Upgrade Guide • Guide to installing, upgrading, and licensing PBS Professional; for system administrators • Release Notes • Recent information on the latest PBS version
Broad Hardware & Operating System Support
• Currently supported platforms:
  • HP-UX 11.23 and later on ia64
  • IBM AIX 5.3 with TL9 or later on POWER architectures
  • IBM AIX 6.x and 7.1 on POWER architectures
  • Red Hat Enterprise Linux 5 on x86_64 and ia64
  • Red Hat Enterprise Linux 6 on x86_64
  • CentOS 5.4, 5.5, 5.6, and 6.x on x86_64
  • SGI Altix with SGI ProPack 6 and 7 on ia64
  • SGI ICE/XE with SGI ProPack 6 on x86_64
  • SGI ICE/ICE-X/XE/UV/UV2 with SGI Performance Suite 1 on x86_64
  • Solaris 10 on SPARC and x86_64
  • SUSE SLES 10 and 11 on x86_64 and ia64
  • Windows 7 on x86_64
  • Windows XP Professional, SP1 and later, on x86_64
  • Windows Server 2003 on x86_64
  • Windows Server 2008 on x86_64
  • Windows Server 2008 R2 on x86_64
  • Windows Vista on x86_64
  • Windows 8 on x86_64
  • Windows Server 2012 on x86_64
  • Cray Linux Environment (CLE) 3.1, 4.0, 4.1
Supported MPI Libraries
• Currently supported MPI libraries integrated with PBS:
  • MPICH 1.2.5, 1.2.6, 1.2.7 on Linux
  • MPICH2 1.0.3, 1.0.5, 1.0.7 on Linux
  • MPICH-GM on Linux
  • Intel MPI 2.0.22, 3, and 4 on Linux
  • IBM POE on AIX 5.x and 6.x, including HPS support
  • HP MPI 1.08.03 on HP-UX 11 on Itanium 2
  • HP MPI 2.0.0 on Linux
  • Platform MPI 8.0 on Linux
  • LAM/MPI 6.5.9, 7.0.6, 7.1.1 on Linux
  • SGI MPI (MPT) on Linux on SGI platforms, including over InfiniBand
  • MVAPICH 1.2.7 on Linux
  • OpenMPI
PBS Professional Components & Roles
• PBS server
  • Central focus for a PBS complex
  • Routes job to compute host *
  • Processes PBS commands *
  • Provides central batch services *
  • Server maintains its own server and queue settings *
  • Daemon executes as pbs_server.bin
• PBS MoM (machine-oriented miniserver)
  • Executes jobs at request of PBS scheduler
  • Monitors resource usage of running jobs
  • Enforces resource limits on jobs
  • Reports system resource limits, configuration *
  • Daemon executes as pbs_mom
• PBS scheduler
  • Queries list of running and queued jobs from the PBS server *
  • Queries queue, server, and node properties *
  • Queries resource consumption and availability from each PBS MoM
  • Sorts available jobs according to local scheduling policies
  • Determines which job is eligible to run next
  • Daemon executes as pbs_sched
[Figure: batch management system, showing users (qsub, qstat, qdel), the server host (pbs_server, pbs_sched), and the execution hosts (pbs_mom)]
Chapter Two: Installation of PBS Professional • Pre-installation planning • Basic installation • After installation • PBS installed directory structure
Pre-Installation Planning: Prerequisites
• All PBS daemons and commands should be the same version (major and minor)
• File transfer
  • Admin configures remote file transfer mechanism
  • Admin specifies where to use local file transfer mechanism
  • File transfer must be accomplished by password-less authentication
  • PBS should be able to copy each job's .o and .e file to the directory from which the job was submitted, or to the path specified at submission
• Each user must have a valid account that is available on all execution hosts
• All system clocks should be synchronized
• A PBS data service account must be created
  • Default: pbsdata
• Note: For more complete pre-installation planning requirements, see Installation and Upgrade Guide v12.0 Chapter 2
Pre-Installation Planning: Complex Configurations • Single execution system: all 3 PBS components (server, scheduler, and MoM) on a single host
Pre-Installation Planning: Complex Configurations, cont.
• Multiple execution system
[Figure labels: Front End System, Server, Scheduler, MoM (x3)]
Note: The PBS server host can be a different architecture (UNIX/Linux) from the execution hosts. A PBS complex can be either UNIX/Linux or Windows, but not both. The server, scheduler, and all the MoMs must be the same PBS version.
Pre-Installation Planning: Altair LM-X License Manager • PBS Professional 12.0 is licensed via Altair License Management System (ALM) based on X-Formation's LM-X license management system • PBS Professional is licensed in the following ways: • Socket licenses to license hosts • Floating CPU licenses • Altair's ALM package for PBS can be downloaded from: https://secure.altair.com/UserArea/Software • If not using socket licenses for all hosts, we strongly recommend that Altair's ALM be installed and configured before installing PBS Professional v12.0 • For additional information on Altair's ALM, refer to the Altair License Manager System 11.0.2 Installation and Operations Guide
Basic Installation: Downloading PBS Pro Install Image • Obtaining the necessary PBS software package • Log into the “Client Login” area of: https://secure.altair.com/UserArea/ • User Name: PBSTrain2013 • Password: train2013 • Once logged in, click on “Download Software” on the right • Your instructor will indicate the appropriate binary to download
Basic Installation: Install Process, cont.
    PBS needs to have a private directory (referred to as "PBS_HOME" in the documentation) where it can permanently store information. Please enter the full path for the PBS_HOME location you would like or press enter to accept the default.
    Home directory? [/var/spool/PBS]
• The second sequence asks for the path and name of the directory where the PBS home directory will be installed.
• The default value is “/var/spool/PBS”.
• To use the default value, press ENTER. Otherwise, enter the path and name of the directory and then press ENTER.
Basic Installation: Install Process, cont.
    You now need to decide what kind of PBS installation you want for this machine. There are three possibilities: a server node, an execution node, or a client host.
    If you are going to run PBS on a single timesharing host, install the server package.
    If you are going to have a cluster of machines, you need to pick one to be the front end and install the server package there. Then install the execution package on all the other nodes in the cluster.
    The client package is for a host which will not be used for execution but still has access to PBS. It contains the commands, the GUI and man pages. This gives the ability to submit jobs and check status.
    PBS Installation:
    1. Server, execution and commands
    2. Execution only
    3. Commands only
    (1|2|3)?1
• Enter “1”
• Option 1: This will install all 3 components of PBS: server, scheduler, and MoM
• Option 2: This will install the MoM on an execution host
• Option 3: This will be a command-only submission host
Basic Installation: Install Process, cont.
    PBS Professional version 9.0 and later is licensed via the Altair License Manager. The Altair License Manager can be downloaded from:
    http://www.pbspro.com/UserArea/Software/
    For more information, please refer to the PBS Professional Administrator's Guide, or contact firstname.lastname@example.org.
    Continue with the installation ([y]|n)?y
    Please enter the list of Altair License file location(s) in a colon-separated list of entries in any of the following form:
    <port>@<host>
    @<host>
    <port>@<IP address>
    Examples:
    6200@fest
    6200@tokyo:6200@madrid:6200@rio
    @perikles:6200@aspasia
    email@example.com
    /usr/local/altair/security/altair_lic.dat
    Enter License File Location(s):6200@trainta01
• At the “Continue with the installation” prompt, enter “y”
• For this training class, type in: 6200@trainta01
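A colon-separated license location list can be hard to read at a glance. The following is a minimal sketch, not part of the installer, that splits such a list into one entry per line so each entry can be checked before it is typed at the prompt. The function name split_license_locations is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not a PBS command): split a colon-separated
# license location list into one entry per line.
split_license_locations() (
    set -f          # no filename expansion while splitting
    IFS=':'         # split on colons only (local to this subshell)
    for entry in $1; do
        printf '%s\n' "$entry"
    done
)

# Example list taken from the installer prompt above.
split_license_locations "6200@tokyo:6200@madrid:6200@rio"
```

Each printed line should match one of the accepted forms, such as <port>@<host>.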
Basic Installation: Install Process, cont.
    Installing PBS for a Server Host.
    Initial installation of release 126.96.36.199184 complete
    *** PBS Installation Summary
    ***
    *** Found new /etc/pbs.conf.188.8.131.52184
    *** Replacing /etc/pbs.conf with /etc/pbs.conf.184.108.40.206184
    *** Creating new symbolic link /opt/pbs/default to /opt/pbs/220.127.116.11184
    ***
    *** Copying startup script.
    ***
    *** End of /opt/pbs/18.104.22.168184/etc/pbs_postinstall
    Would you like to start PBS now (y|[n])?y
• This sequence asks whether you would like to start the PBS daemons. Type “y” and press ENTER.
Basic Installation: Install Process, cont. /etc/init.d/pbs Starting PBS PBS Home directory /var/spool/PBS needs updating. Running /opt/pbs/default/etc/pbs_habitat to update it. *** *** PBS_HOME is /var/spool/PBS *** Setting TZ from /etc/sysconfig/clock *** Creating new file /var/spool/PBS/pbs_environment *** The PBS Server has been installed in /opt/pbs/default/sbin. *** *** Setting default queue and resource limits. *** Connecting to PBS dataservice...connected to PBS firstname.lastname@example.org *** Setting license file location(s). *** *** The PBS commands have been installed in /opt/pbs/default/bin. *** *** PBS Mom has been installed in /opt/pbs/default/sbin. *** *** The PBS Scheduler has been installed in /opt/pbs/default/sbin. *** *** End of /opt/pbs/default/etc/pbs_habitat Home directory /var/spool/PBS updated. PBS mom PBS sched Connecting to PBS dataservice...connected to PBS email@example.com Using license server at 6200@trainta01 PBS server PBS started Installation of release 22.214.171.124184 complete
Post Installation: Status of PBS Complex
• Use qstat -Bf to view the status of a PBS complex
    Server: trainta01.prog.altair.com
        server_state = Active
        server_host = trainta01.prog.altair.com
        scheduling = True
        total_jobs = 0
        state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
        default_queue = workq
        log_events = 511
        mail_from = adm
        query_other_jobs = True
        resources_default.ncpus = 1
        default_chunk.ncpus = 1
        scheduler_iteration = 600
        FLicenses = 0
        resv_enable = True
        node_fail_requeue = 310
        max_array_size = 10000
        pbs_license_info = 6200@trainta01
        pbs_license_min = 1
        pbs_license_max = 2147483647
        pbs_license_linger_time = 31536000
        license_count = Avail_Global:0 Avail_Local:0 Used:0 High_Use:0 Avail_Sockets:0 Unused_Sockets:0
        pbs_version = PBSPro_126.96.36.199184
        eligible_time_enable = False
        max_concurrent_provision = 5
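When only one or two values are of interest, the status output can be filtered. The sketch below assumes the "name = value" layout shown in the sample and reads from standard input, so it runs against pasted sample lines as well as live output. The function name server_attr is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one attribute from
# "qstat -Bf"-style output ("name = value" lines) read on stdin.
server_attr() {
    awk -v name="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)      # drop leading indentation
            idx = index(line, " = ")      # position of the separator
            if (idx > 0 && substr(line, 1, idx - 1) == name) {
                print substr(line, idx + 3)
                exit
            }
        }'
}

# Sample lines copied from the status output above. On a live system
# the input would instead come from:  qstat -Bf | server_attr server_state
sample='Server: trainta01.prog.altair.com
    server_state = Active
    scheduling = True
    default_queue = workq'

printf '%s\n' "$sample" | server_attr server_state
printf '%s\n' "$sample" | server_attr default_queue
```

A quick post-installation check could compare the server_state value against Active and the scheduling value against True.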
PBS Installed Directory Structure • PBS Professional software is installed in two separate directories: • $PBS_EXEC /opt/pbs/default Contains: • PBS daemons • Libraries • Man pages • Support tools • Administrator and user PBS commands • Python distribution • $PBS_HOME /var/spool/PBS Contains: • PBS daemon configuration information • PBS daemon logs • Various directories for PBS files
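Scripts that need these two locations do not have to hard-code them. The sketch below assumes that /etc/pbs.conf (the file replaced during installation) holds KEY=value lines, including PBS_EXEC and PBS_HOME; the helper name pbs_conf_value is hypothetical. It is demonstrated against a temporary file so that it runs on a host without PBS.

```shell
#!/bin/sh
# Hypothetical helper: look up KEY in a pbs.conf-style file of
# KEY=value lines. $1 = key, $2 = file (default: /etc/pbs.conf).
pbs_conf_value() {
    sed -n "s/^$1=//p" "${2:-/etc/pbs.conf}" | head -n 1
}

# Demonstration file with the default locations named on this slide.
conf=$(mktemp)
printf 'PBS_EXEC=/opt/pbs/default\nPBS_HOME=/var/spool/PBS\n' > "$conf"

PBS_EXEC=$(pbs_conf_value PBS_EXEC "$conf")
PBS_HOME=$(pbs_conf_value PBS_HOME "$conf")
echo "software: $PBS_EXEC"
echo "configuration and logs: $PBS_HOME"

rm -f "$conf"
```

On an installed host, calling pbs_conf_value PBS_HOME with no file argument would read /etc/pbs.conf directly.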
PBS Directory Structure: PBS_HOME
• Directory structure of $PBS_HOME *
  • PBS configuration directories: server_priv, mom_priv, sched_priv
  • PBS log directories: server_logs, mom_logs, sched_logs
  • Miscellaneous directories/files: spool, undelivered, checkpoint, aux, pbs_environment, pbs_version, datastore
* This information is for debugging purposes only. It may change in future releases.
PBS Directory Structure: PBS_EXEC
• Directory structure of $PBS_EXEC *
  • bin, sbin: binaries of PBS daemons and user/administrator PBS commands
  • lib, man, include: libraries, manual pages, and header files
  • Other directories: etc, tcltk, unsupported, python, pgsql
* This information is for debugging purposes only. It may change in future releases.
Chapter Three: Job Management • Defining a job script • Types of jobs • Submitting jobs • Managing PBS jobs • Setting job attributes • Requesting job resources • Default job resources
Defining a Job Script
• What is a job script?
  • A file that contains a set of instructions to execute a series of commands. This is also known as a “batch job”.
Example of a job script:
    #!/bin/bash
    sleep 5
    /home/altair/scripts/optistruct -cpu 2 handlebar.fem
The first line names the shell interpreter; the remaining lines are the commands to run.
Types of Jobs
• There are two types of PBS jobs
  • Batch Job
    • A script that contains commands or tasks to execute site-specific applications
  • Interactive Job
    • Runs like a batch job, but when it runs, the user's terminal input and output are connected to the execution host; similar to a login session
    • Allows users to debug a job script
    • Can be used to verify that a new application runs properly
    Usage: qsub -I <job attributes/resources>
    For X forwarding support:
    Usage: qsub -I -X <job attributes/resources>
• Note: Interactive jobs are not supported on Windows, and cannot be used with PBS job arrays
Submitting Jobs: Using the qsub Command
• Submitting a batch job script to PBS
• Using the qsub command
    Usage: qsub <job attributes/resources> <job script>
    Example: qsub -l select=1:ncpus=1 test_script
• If the job is accepted by PBS, a job identifier is returned. This job identifier is composed of the job number and the PBS server's host name: 0.trainta01
• Note: If a job is rejected, no job identifier is returned, but the job ID counter is still incremented.
• The largest possible job ID has 7 digits: 9,999,999. Once it is reached, the counter resets to zero.
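Because the identifier has the form <job number>.<server host name>, a wrapper script can split it with plain shell parameter expansion. This is a minimal sketch; the function names job_number and job_server are hypothetical, and the identifier is hard-coded here rather than captured from qsub.

```shell
#!/bin/sh
# Hypothetical helpers: split a job identifier of the form
# <job number>.<server host name> into its two parts.
job_number() { printf '%s\n' "${1%%.*}"; }   # text before the first dot
job_server() { printf '%s\n' "${1#*.}"; }    # text after the first dot

# On a live system this would be:  jobid=$(qsub test_script)
jobid="0.trainta01"

echo "number=$(job_number "$jobid") server=$(job_server "$jobid")"
```

Splitting on the first dot keeps a fully qualified server name such as trainta01.prog.altair.com intact in the server part.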
Managing Jobs: Querying Jobs Using qstat
• To show the status of the current PBS jobs:
• Using the qstat command
    Usage: qstat <-a, -n, -s, -1, -w, -r, -i>
    Example: qstat
    Job id           Name             User        Time Use S Queue
    ---------------- ---------------- ----------- -------- - -----
    6.trainta01      test_script      pbsuser01   00:00:00 R workq
    7.trainta01      jobA             pbsuser02   00:00:00 R workq
    8.trainta01      test_2           pbsuser04          0 Q workq
    9.trainta01      test_script      pbsuser01   00:00:00 R workq
Note: If a job was deleted or completed, it can no longer be listed via qstat unless the PBS complex has enabled the job history functionality
For more options see User's Guide v12.0 Chapter 7 page 169
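For a quick summary of a long listing, the state column can be tallied. The sketch below assumes the default qstat layout shown above (two header lines, with the state letter in the second-to-last column) and reads from standard input; the function name count_states is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count jobs per state letter in default
# "qstat" output read on stdin (state = second-to-last column).
count_states() {
    awk 'NR > 2 && NF >= 6 { n[$(NF - 1)]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Sample listing copied from the example above. On a live system:
#   qstat | count_states
listing='Job id           Name             User        Time Use S Queue
---------------- ---------------- ----------- -------- - -----
6.trainta01      test_script      pbsuser01   00:00:00 R workq
7.trainta01      jobA             pbsuser02   00:00:00 R workq
8.trainta01      test_2           pbsuser04          0 Q workq
9.trainta01      test_script      pbsuser01   00:00:00 R workq'

printf '%s\n' "$listing" | count_states
```

For the sample listing this reports one queued job (Q) and three running jobs (R).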
Managing Jobs: Additional qstat Options
-a  Job name, session ID, # nodes req’d, # ncpus req’d, req’d mem, req’d time, elapsed time
                                                              Req'd  Req'd   Elap
    Job ID        Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
    ------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
    8.trainta01   pbsuser0 workq    test_scrip   6556   1   8    --     -- R 00:07
-s  Same as option -a, but with comments
                                                               Req'd  Req'd   Elap
    Job ID         Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
    -------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
    8.trainta01    pbsuser0 workq    test_scrip   5556   1   8    --     -- R 00:07
       Job run at Wed Jul 05 at 14:48 on (trainta01:ncpus=8)
-n  Same as option -a, but indicates which vnode(s) the job is running on
                                                               Req'd  Req'd   Elap
    Job ID         Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
    -------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
    8.trainta01    pbsuser0 workq    test_scrip   5556   1   8    --     -- R 00:07
       trainta01/0
Note: The “-1” option outputs each entry on a single line instead of wrapping. The “-w” option displays the full output of individual fields.
Managing Jobs: Finished Job History
• To view only jobs that have been deleted, moved, or finished:
  • qstat -H
    Job id           Name             User             Time Use S Queue
    ---------------- ---------------- ---------------- -------- - -----
    80.trainta01     sleep5           pbsuser01        00:00:00 F workq
    81.trainta01     sleep5           pbsuser01        00:00:00 F workq
    82.trainta01     sleep5           pbsuser01        00:00:00 F workq
    83.trainta01     sleep5           pbsuser01        00:00:00 F workq
• To view all jobs, regardless of state:
  • qstat -x
    Job id           Name             User             Time Use S Queue
    ---------------- ---------------- ---------------- -------- - -----
    80.trainta01     sleep5           pbsuser01        00:00:00 F workq
    81.trainta01     sleep5           pbsuser01        00:00:00 F workq
    82.trainta01     sleep5           pbsuser01        00:00:00 F workq
    83.trainta01     sleep5           pbsuser01        00:00:00 F workq
    84.trainta01     sleep5           pbsuser01               0 Q workq
    85.trainta01     sleep5           pbsuser01        00:00:00 R workq
Note: The PBS server attribute job_history_enable must be set in order to use these options
Submitting Jobs: Using a “Here” Document
• Users can create their own job script within the qsub command. This is known as a “here” document. The user's input is taken and submitted as a job script.
How a “here” document works:
• Type qsub and press ENTER
• Enter the job commands
• Once the input is complete, press <CTRL-D> to accept the input and submit it to PBS; to cancel, press <CTRL-C>
To request resources, include them on the qsub line or as a PBS directive
    Example: qsub -l select=xxx
    or
    qsub <return>
    #PBS -l select=xxx
Note: The job name defaults to “STDIN”
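The same idea can be used non-interactively from a script with the shell's << redirection, which feeds the lines up to the end marker to qsub on standard input. The sketch below is an illustration under assumptions: the function name submit_sleep_job is hypothetical, and on a host without qsub it falls back to cat so that it only prints what would have been submitted.

```shell
#!/bin/sh
# Use qsub when it is installed; otherwise fall back to cat so the
# example only prints the job script instead of submitting it.
submit=cat
command -v qsub >/dev/null 2>&1 && submit=qsub

# Hypothetical wrapper: the lines between <<'EOF' and EOF become the
# job script. Quoting 'EOF' stops the shell from expanding variables.
submit_sleep_job() {
    "$submit" <<'EOF'
#PBS -l select=1:ncpus=1
sleep 5
EOF
}

submit_sleep_job
```

With qsub present, the function prints the job identifier returned by PBS, and the job name defaults to STDIN as noted above.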
Submitting Jobs: Alternate Methods
• Piping a script to qsub on standard input:
    Usage: echo "script" | qsub <job submission options>
• Specifying the job executable on the qsub command line:
    Example: qsub -- /opt/altair/scripts/optistruct -input_file /homes/pbsuser01/hat.fem
Setting Job Attributes: Job Submission Options
• Users can specify various job submission options
  • Job name, output/error file/location, or queue destination
    Usage: qsub <flag> <value> <job script>
    Example: qsub -N modelA test_script
-N <job name> specifies the job name. If no job name is specified, the script name is used as the job name.
Job Attributes: Viewing Job Attributes
• To view job attributes for a particular job, use the qstat command
    Usage: qstat -f <job_id>
    Example: qstat -f 0.trainta01
    Job Id: 0.trainta01.prog.altair.com
        Job_Name = test
        Job_Owner = firstname.lastname@example.org
        resources_used.cpupercent = 0
        resources_used.cput = 00:00:00
        resources_used.mem = 2956kb
        resources_used.ncpus = 1
        resources_used.vmem = 188712kb
        resources_used.walltime = 01:04:27
        job_state = R
        queue = workq
        server = trainta01.prog.altair.com
        Checkpoint = u
        ctime = Mon Jan 14 12:34:13 2013
        Error_Path = trainta01.prog.altair.com:/home/manish/test.e0
        exec_host = trainta01/0
        exec_vnode = (trainta01:ncpus=1)
        Hold_Types = n
        Join_Path = n
        Keep_Files = n
Job Attributes: Viewing Job Attributes, cont.
        Mail_Points = a
        mtime = Mon Jan 14 12:34:13 2013
        Output_Path = trainta01.prog.altair.com:/home/manish/test.o0
        Priority = 0
        qtime = Mon Jan 14 12:34:13 2013
        Rerunable = True
        Resource_List.ncpus = 1
        Resource_List.nodect = 1
        Resource_List.place = pack
        Resource_List.select = 1:ncpus=1
        stime = Mon Jan 14 12:34:13 2013
        session_id = 32144
        jobdir = /home/manish
        substate = 42
        Variable_List = PBS_O_SYSTEM=Linux,PBS_O_SHELL=/bin/bash,
            PBS_O_HOME=/home/manish,PBS_O_LOGNAME=manish,
            PBS_O_WORKDIR=/home/manish,PBS_O_LANG=en_US.UTF-8,
            PBS_O_PATH=/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/opt/pbs/default/bin:/opt/pbs/default/sbin:/home/manish/bin,
            PBS_O_MAIL=/var/spool/mail/manish,PBS_O_QUEUE=workq,
            PBS_O_HOST=trainta01.prog.altair.com
        comment = Job run at Mon Jan 14 at 12:34 on (trainta01:ncpus=1)
        etime = Mon Jan 14 12:34:13 2013
        Submit_arguments = test
        project = _pbs_project_default
• Note: Running qstat as root or PBS Manager outputs additional information
Setting Job Attributes: Using PBS Directives
• Job attributes can be set in two different ways:
  • Method 1: on the qsub command line
      qsub -N <job_name> <job_script>
  • Method 2: in a job script as a PBS directive
      #!/bin/bash
      #PBS -N test_run_01
      #PBS -l select=4:ncpus=4:mem=16GB
      #PBS -l place=scatter
      #PBS -j oe
      #PBS -o /home/pbsuser01/OUTPUTS
      optistruct -ncpu 2 handlebar.fem
Note: PBS expects the directives to begin on the second line and to continue on consecutive lines. Directive processing stops at the first executable line; comment lines are ignored. Command line arguments override PBS directives.
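The scanning rule in the note can be illustrated with a short awk filter. This is only a sketch of the rule as stated on this slide, not PBS's own parser: it prints the #PBS lines that appear before the first executable line. The function name list_directives is hypothetical, and skipping blank lines is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: list the directives in a job script that
# would be seen under the rule "stop at the first executable line".
list_directives() {
    awk '
        NR == 1 && /^#!/ { next }          # interpreter line
        /^#PBS[ \t]/     { print; next }   # PBS directive
        /^[ \t]*#/       { next }          # ordinary comment: ignored
        /^[ \t]*$/       { next }          # blank line: ignored (assumption)
                         { exit }          # first executable line ends the scan
    ' "$1"
}

# Demonstration script: the last directive comes after an executable
# line, so it is not reported.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
#PBS -N test_run_01
# an ordinary comment
#PBS -j oe
sleep 5
#PBS -l place=scatter
EOF

directives=$(list_directives "$script")
printf '%s\n' "$directives"
rm -f "$script"
```

Running a filter like this over a job script before submission is one way to spot a directive that was placed too late to take effect.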
Requesting Job Resources: Understanding Resources
• What are job resources?
  • Applications sometimes need certain types and amounts of system resources such as:
    • memory
    • ncpus
    • scratch space
  • During job submission, required resources can be requested
• How can these resources be requested within PBS?
  • PBS defines these resources as chunks or as job-wide resources
• What are “job-wide resources”?
  • Resources that are associated with the entire job
  • For example: cput, walltime
• What are “chunks”?
  • Set of resources that are allocated as a unit to a job
  • Smallest set of resources that is allocated to a job
  • For example: ncpus, mem
  • Requested in a “select” statement
  • qsub -l select=<#>:ncpus=<#>:mem=<#>
Requesting Job Resources: Using Chunks & Select
• Requesting resources in chunks
  • Resources which are to be allocated as a unit to a job
  • Smallest set of resources to be allocated to a single job
  • Host-/vnode-level request
Syntax: qsub -l select=[N:]chunk[+[N:]chunk ...]
For example:
• Job request: 3 chunks with 2 CPUs per chunk:
    qsub -l select=3:ncpus=2
• Job request: 2 chunks with 1 CPU and 10GB of memory each, plus another set of 3 chunks with 2 CPUs and 8GB of memory each:
    qsub -l select=2:ncpus=1:mem=10gb+3:ncpus=2:mem=8gb
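To see how a select statement adds up, each chunk's multiplier can be applied to its ncpus value and the results summed. The sketch below is a hypothetical helper (select_totals), not a PBS command. It counts a chunk without an explicit ncpus as 0 CPUs, which is a simplification: on a real complex a server default such as default_chunk.ncpus may apply.

```shell
#!/bin/sh
# Hypothetical helper: total the chunks and CPUs requested by a
# select specification of the form [N:]chunk[+[N:]chunk ...].
select_totals() {
    printf '%s\n' "$1" | tr '+' '\n' | awk -F: '
        {
            count = 1; first = 1
            if ($1 ~ /^[0-9]+$/) { count = $1; first = 2 }   # optional N:
            ncpus = 0                       # simplification: no default applied
            for (i = first; i <= NF; i++)
                if ($i ~ /^ncpus=/) { split($i, kv, "="); ncpus = kv[2] }
            chunks += count
            cpus += count * ncpus
        }
        END { printf "chunks=%d ncpus=%d\n", chunks, cpus }'
}

# The two example requests from this slide.
select_totals "3:ncpus=2"
select_totals "2:ncpus=1:mem=10gb+3:ncpus=2:mem=8gb"
```

For the second request the arithmetic is 2 x 1 + 3 x 2 = 8 CPUs across 5 chunks.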
Requesting Job Resources: Job Placement
• Placing jobs on hosts/vnodes
  • Users can specify how their multi-vnode job is placed in a PBS complex based on the resources requested
  • The place statement controls how the job is placed on the hosts/vnodes from which resources may be allocated for the job
• Using the “place” statement:
    Usage: qsub -l place=[<type>][:<sharing>][:group=<res>]
    Example: qsub -l select=3:ncpus=2:mem=64GB -l place=pack myscript