Outline 1. Moab Overview 2. Deployment 3. Diagnostics and Troubleshooting 4. Integration 5. Scheduling Behaviour 6. Resource Access 7. Grid Computing 8. Accounting 9. Transitioning from LCRM 10. End Users
1. Moab Introduction • Overview of the Modern Cluster • Cluster Evolution • Cluster Productivity Losses • Moab Workload Manager Architecture • What Moab Does • What Moab Does Not Do
Figure: Cluster Stack / Framework. Components shown: Users; Grid Workload Manager (Scheduler, Policy Manager, Integration Platform); Cluster Workload Manager (Scheduler, Policy Manager, Integration Platform); Application; Resource Manager; Portal, Application, Parallel, Serial, Security, GUI, Message Passing, CLI; Operating System; Hardware (Cluster or SMP); Admin.
Resource Manager (RM) • Other systems may interpret a resource manager and its responsibilities more strictly, but Moab's multi-resource-manager support allows a much more liberal interpretation. • In essence, any object that can provide environmental information and environmental control can be used as a resource manager. • Moab can aggregate information from multiple unrelated sources into a larger, more complete world view of the cluster. That view includes all the information and control found in a standard resource manager such as TORQUE, including node, job, and queue management services.
Figure: The Evolved Cluster. Components shown: Moab (two instances), Resource Manager (two instances), License Manager, Identity Manager, Allocation Manager, Remote Site, Myrinet, Admin, Job Queue, User, Compute Nodes.
What Moab Does • Optimizes Resource Utilization with Intelligent Scheduling and Advanced Reservations • Unifies Cluster Management across Varied Resources and Services • Dynamically Adjusts Workload to Enforce Policies and Service Level Agreements • Automates Diagnosis and Failure Response
What Moab Does Not Do • Does not do resource management (usually) • Does not install the system (usually) • Is not a storage manager • Is not a license manager • Does not do message passing
2. Deployment • Installation • Configuration • Testing
Moab Workload Manager Installation
> tar -xzvf moab-4.5.0p0.linux.tar.gz
> cd moab-4.5.0p0
> ./configure
> make
• Moab Workload Manager needs to be installed only on the head node. • When you are ready to use Moab in production, run make install to install it into the install directory you configured. • Moab Workload Manager must be running before Moab Cluster Manager and Moab Access Portal will work. • You can also install the client commands on a remote system.
File Locations • $(MOABHOMEDIR) • moab.cfg (general config file containing information required by both the Moab server and user interface clients) • moab-private.cfg (config file containing private information required by the Moab server only) • .moab.ck (Moab checkpoint file) • .moab.pid (Moab 'lock' file to prevent multiple instances) • log (directory for Moab log files - REQUIRED BY DEFAULT) • moab.log (Moab log file) • moab.log.1 (previous 'rolled' Moab log file) • stats (directory for Moab statistics files - REQUIRED BY DEFAULT) • Moab stats files (in format 'stats.<YYYY>_<MM>_<DD>') • Moab fairshare data files (in format 'FS.<EPOCHTIME>') • tools (directory for local tools called by Moab - OPTIONAL BY DEFAULT) • traces (directory for Moab simulation trace files - REQUIRED FOR SIMULATIONS) • resource.trace1 (sample resource trace file) • workload.trace1 (sample workload trace file)
• spool (directory for temporary Moab files - REQUIRED FOR ADVANCED FEATURES) • contrib (directory containing contributed code in the areas of GUIs, algorithms, policies, etc.) • $(MOABINSTDIR) • bin (directory for installed Moab executables) • moab (Moab scheduler executable) • mclient (Moab user interface client executable) • /etc/moab.cfg (optional file used to override the built-in '$(MOABHOMEDIR)' setting; it should contain the string 'MOABHOMEDIR $(DIRECTORY)')
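The /etc/moab.cfg override can be pictured as a lookup that falls back to the built-in home directory. This is a minimal sketch, not Moab's own code. It assumes that '#' starts a comment, that the directive is written as "MOABHOMEDIR <dir>", and that the default path (/opt/moab) is a hypothetical built-in value.

```python
# Hypothetical built-in default, used only for illustration.
DEFAULT_MOABHOMEDIR = "/opt/moab"

def resolve_moab_home(override_cfg="/etc/moab.cfg", default=DEFAULT_MOABHOMEDIR):
    """Return MOABHOMEDIR from the optional override file, else the built-in default."""
    try:
        with open(override_cfg) as fh:
            for raw in fh:
                line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
                parts = line.split(None, 1)
                if len(parts) == 2 and parts[0] == "MOABHOMEDIR":
                    return parts[1].strip()
    except FileNotFoundError:
        pass  # the override file is optional
    return default
```

If /etc/moab.cfg is missing or contains no MOABHOMEDIR line, the built-in value is used.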
Initial Configuration – moab.cfg • moab.cfg contains the parameters and settings for Moab Workload Manager. This is where you set most policies. Example of moab.cfg immediately after installation:
##moab.cfg
SCHEDCFG[Moab] SERVER=test.icluster.org:4255
ADMINCFG[1] USERS=root
RMCFG[base] TYPE=PBS
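moab.cfg directives follow a PARAM[INDEX] ATTR=VALUE pattern, or a plain PARAM VALUE pattern as with IGNORENODES. The hypothetical helper below is a rough sketch of that structure, useful for scripted sanity checks. It is not Moab's parser: it assumes '#' begins a comment and ignores quoting, continuation lines, and any other syntax Moab may accept.

```python
import re

# PARAM or PARAM[INDEX], then the rest of the line.
_DIRECTIVE = re.compile(r"^(?P<param>[A-Z0-9_]+)(?:\[(?P<index>[^\]]*)\])?\s*(?P<rest>.*)$")

def parse_moab_cfg(text):
    """Return (PARAM, INDEX or None, {ATTR: VALUE} or plain string) per directive."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        m = _DIRECTIVE.match(line)
        if m is None:
            continue  # skip anything that does not look like a directive
        tokens = m.group("rest").split()
        if tokens and all("=" in t for t in tokens):
            value = dict(t.split("=", 1) for t in tokens)  # ATTR=VALUE pairs
        else:
            value = " ".join(tokens)  # plain value, e.g. a node list
        entries.append((m.group("param"), m.group("index"), value))
    return entries
```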
Supported Platforms/Environments • Resource Managers • TORQUE, OpenPBS, PBSPro, LSF, LoadLeveler, SLURM, BProc, clubMASK, S3, WIKI • Operating Systems • RedHat, SUSE, Fedora, Debian, FreeBSD, (+ all known variants of Linux), AIX, IRIX, HP-UX, OS/X, OSF/Tru-64, SunOS, Solaris, (+ all known variants of UNIX) • Hardware • Intel x86, Intel IA-32, Intel IA-64, AMD x86, AMD Opteron, SGI Altix, HP, IBM SP, IBM x-Series, IBM p-Series, IBM i-Series, Mac G4 and G5
Basic Parameters • SCHEDCFG • Specifies how the Moab server will execute and communicate with client requests. • Example: SCHEDCFG[orion] SERVER=cw.psu.edu • ADMINCFG • Moab provides role-based security through multiple levels of admin access. • Example: the following enables users greg and thomas as level 1 admins: • ADMINCFG[1] USERS=greg,thomas NOTE: Moab may only be launched by the primary admin user ID. • RMCFG • For Moab to interact properly with a resource manager, the interface to that resource manager must be defined. • Example: to interface with a TORQUE resource manager, use: • RMCFG[torque1] TYPE=pbs
Scheduling Modes - Configure modes in moab.cfg (MODE attribute of SCHEDCFG) • Simulation Mode • Allows a test drive of the scheduler. You can evaluate how various policies would change the performance of a stable production system. • Test Mode • Allows evaluation of new Moab releases, configurations, and policies in a risk-free manner. In test mode, Moab behaves identically to normal (live) mode except that it cannot start, cancel, or modify jobs. • Normal Mode • Live scheduling; this is the default mode after installation. • Interactive Mode • Like test mode, but instead of disabling all resource and job control functions, Moab displays each requested change on screen and asks for permission to complete it.
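The difference between the modes comes down to what happens to a job-control action once Moab has decided on it. The sketch below is a conceptual model of that gating, not Moab's implementation. The mode names mirror the slide, and `execute` and `confirm` are hypothetical callbacks.

```python
from enum import Enum

class Mode(Enum):
    NORMAL = "NORMAL"
    TEST = "TEST"
    INTERACTIVE = "INTERACTIVE"
    SIMULATION = "SIMULATION"

def apply_action(mode, action, execute, confirm=lambda action: False):
    """Model the fate of a start/cancel/modify action under each mode.

    `execute` is called only when the action is really carried out.
    """
    if mode is Mode.NORMAL:
        execute(action)
        return "executed"
    if mode is Mode.TEST:
        return "logged only"  # test mode never starts, cancels, or modifies jobs
    if mode is Mode.INTERACTIVE:
        if confirm(action):  # an admin approves the change on screen
            execute(action)
            return "executed after confirmation"
        return "declined"
    return "simulated"  # simulation acts on simulated, not live, resources
```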
Testing New Policies • Verifying Correct Specification of New Policies • If editing the moab.cfg file manually, use the mdiag -C command • Moab Cluster Manager automatically verifies proper policy specification • Verifying Correct Behavior of New Policies • Run Moab in INTERACTIVE mode to approve each change before it is made • Determining Long-Term Impact of New Policies • Run Moab in SIMULATION mode
Moab 'Side-by-Side' • Allows a production cluster or other resource to be logically partitioned along resource and workload boundaries, with a different instance of Moab scheduling each partition. • Use the parameters IGNORENODES, IGNORECLASSES, and IGNOREUSERS. Each instance needs its own scheduler name and server port.
##moab.cfg for production partition
SCHEDCFG[prod] MODE=NORMAL SERVER=orion.cxz.com:42020
RMCFG[TORQUE] TYPE=PBS
IGNORENODES node61,node62,node63,node64
IGNOREUSERS gridtest1,gridtest2
##moab.cfg for test partition
SCHEDCFG[test] MODE=NORMAL SERVER=orion.cxz.com:42030
RMCFG[TORQUE] TYPE=PBS
IGNORENODES !node61,node62,node63,node64
IGNOREUSERS !gridtest1,gridtest2
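The two configurations split the nodes by pairing a list with its negation. This is a small sketch of that partitioning. It assumes that a leading '!' negates the whole comma-separated list, meaning "ignore everything except these". The helper names are hypothetical.

```python
def parse_ignore_list(value):
    """Split an IGNORE* value into (negated, names); '!' negates the whole list (assumption)."""
    value = value.strip()
    negated = value.startswith("!")
    names = {n.strip() for n in value.lstrip("!").split(",") if n.strip()}
    return negated, names

def is_ignored(name, ignore_value):
    negated, names = parse_ignore_list(ignore_value)
    return (name not in names) if negated else (name in names)

def visible_nodes(all_nodes, ignore_value):
    """Nodes one Moab instance would schedule, given its IGNORENODES value."""
    return [n for n in all_nodes if not is_ignored(n, ignore_value)]
```

Under this reading the production and test instances see disjoint node sets, which is the point of running them side by side.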
Simulation • What is the impact of additional hardware on cluster utilization? • What delays to key projects can be expected with the addition of new users? • How will new prioritization weights alter cycle distribution among existing workload? • What total loss of compute resources will result from introducing a maintenance downtime? • Are the benefits of cycle stealing from non-dedicated desktop systems worth the effort? • How much will anticipated grid workload delay the average wait time of local jobs?
Scheduling Iterations • Update State Information • Refresh Reservations • Schedule Reserved Jobs • Schedule Priority Jobs • Backfill Jobs • Update Statistics • Handle User Requests • Perform Next Scheduling Cycle
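The "Schedule Priority Jobs" and "Backfill Jobs" steps can be illustrated with a toy cycle. This sketch swaps in a simple EASY-style backfill: start jobs in priority order, reserve for the first job that does not fit, then start lower-priority jobs only if they cannot delay that reservation. Moab's actual backfill is more elaborate, and state updates, statistics, and user requests are not modeled here.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    nodes: int
    walltime: int  # seconds

def schedule_cycle(queue, free_nodes, running, now=0):
    """One toy cycle. `queue` is priority-ordered; `running` holds (nodes, end_time).

    Returns (names of started jobs, reservation start time or None).
    """
    started, running, pending = [], list(running), list(queue)
    # Schedule priority jobs: start them in order while they fit.
    while pending and pending[0].nodes <= free_nodes:
        job = pending.pop(0)
        free_nodes -= job.nodes
        running.append((job.nodes, now + job.walltime))
        started.append(job.name)
    if not pending:
        return started, None
    # Reserve for the first blocked job: earliest time enough nodes are free.
    top, avail, shadow = pending.pop(0), free_nodes, None
    for nodes, end in sorted(running, key=lambda r: r[1]):
        avail += nodes
        if avail >= top.nodes:
            shadow = end
            break
    if shadow is None:
        return started, None  # the top job can never fit; no reservation
    extra = avail - top.nodes  # nodes still spare when the reservation starts
    # Backfill: lower-priority jobs that do not delay the reservation.
    for job in pending:
        if job.nodes > free_nodes:
            continue
        ends_in_time = now + job.walltime <= shadow
        if ends_in_time or job.nodes <= extra:
            free_nodes -= job.nodes
            if not ends_in_time:
                extra -= job.nodes  # it will still hold these nodes at shadow time
            started.append(job.name)
    return started, shadow
```

For example, with 4 of 8 nodes free and a 4-node job ending at t=100, a 6-node top job is reserved for t=100. Smaller jobs still start now if they finish before t=100 or fit in the 2 spare nodes.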
Job Flow • Determine Basic Job Feasibility • Prioritize Jobs • Enforce Configured Throttling Policies • Determine Resource Availability • Allocate Resources to Job • Launch Job
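The job flow steps can be traced with a simplified pipeline. This is a sketch under stated assumptions: the only throttling policy is a hypothetical per-user running-job cap, resources are plain processor counts, and "launch" just records the job name. A blocked job is skipped here, whereas a real scheduler might reserve resources for it.

```python
from dataclasses import dataclass

@dataclass
class JobRequest:
    name: str
    user: str
    procs: int
    priority: int

def job_flow(jobs, free_procs, total_procs, max_jobs_per_user, running_per_user=None):
    """Return (launched job names, {held job: reason})."""
    running = dict(running_per_user or {})
    launched, held = [], {}
    # 1. Basic feasibility: a job larger than the whole cluster can never run.
    feasible = [j for j in jobs if j.procs <= total_procs]
    for j in jobs:
        if j.procs > total_procs:
            held[j.name] = "infeasible"
    # 2. Prioritize: highest priority first (sorted() is stable for ties).
    for job in sorted(feasible, key=lambda j: j.priority, reverse=True):
        # 3. Throttling policy: cap concurrent jobs per user.
        if running.get(job.user, 0) >= max_jobs_per_user:
            held[job.name] = "max jobs per user"
            continue
        # 4. Resource availability right now.
        if job.procs > free_procs:
            held[job.name] = "insufficient resources"
            continue
        # 5. Allocate resources, then 6. launch (recorded only).
        free_procs -= job.procs
        running[job.user] = running.get(job.user, 0) + 1
        launched.append(job.name)
    return launched, held
```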
Scheduling Objects • Moab functions by manipulating five primary, elementary objects: • Jobs • Nodes • Reservations • QoS Structures • Policies
Jobs • Job information is provided to the Moab scheduler by a resource manager (such as LoadLeveler, PBS, Wiki, or LSF) • Job attributes include ownership of the job, job state, the amount and type of resources required by the job, and a wallclock limit • A job consists of one or more requirements, each of which requests a number of resources of a given type.
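The job description above maps naturally onto a small data model. This is an illustrative sketch only. The field names and the resource names "procs" and "mem_mb" are hypothetical, not Moab's internal structures.

```python
from dataclasses import dataclass, field

@dataclass
class Requirement:
    resource: str  # e.g. "procs" or "mem_mb" (hypothetical names)
    count: int     # units of that resource requested

@dataclass
class JobRecord:
    job_id: str
    owner: str
    state: str
    wallclock_limit: int  # seconds
    requirements: list[Requirement] = field(default_factory=list)

    def total(self, resource):
        """Sum one resource type across all of the job's requirements."""
        return sum(r.count for r in self.requirements if r.resource == resource)
```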
Nodes • Within Moab, a node is a collection of resources with a particular set of associated attributes. • A node is defined as one or more CPUs, together with associated memory and possibly other compute resources such as local disk, swap, network adapters, and software licenses.
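Treating a node as a bundle of resources makes "does this task fit?" a simple arithmetic question. The sketch below is hypothetical, not Moab's matching logic. It considers only processors, memory, and feature tags.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    procs: int
    mem_mb: int
    features: set[str] = field(default_factory=set)

def tasks_that_fit(node, procs_per_task, mem_mb_per_task, required_features=frozenset()):
    """Number of tasks of the given shape the node can hold (0 if features are missing)."""
    if not set(required_features) <= node.features:
        return 0
    return min(node.procs // procs_per_task, node.mem_mb // mem_mb_per_task)
```

Memory can be the binding constraint: a 4-CPU, 8192 MB node holds only two 3000 MB tasks.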
Advance Reservations • An object which dedicates a block of specific resources for a particular use. • Each reservation consists of a list of resources, an access control list, and a time range for which this access control list will be enforced. • The reservation prevents the listed resources from being used in a way not described by the access control list during the time range specified.
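A reservation's three parts (resources, access control list, time range) combine into a single access test. This sketch simplifies the ACL to a set of user names, and the helper names are hypothetical. Moab ACLs can also reference groups, accounts, classes, QoS, and more.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reservation:
    nodes: frozenset  # resources covered by the reservation
    acl: frozenset    # users allowed to use them (simplified ACL)
    start: int        # epoch seconds, inclusive
    end: int          # epoch seconds, exclusive

def may_use(reservations, user, node, t):
    """True if `user` may use `node` at time `t`."""
    for r in reservations:
        if node in r.nodes and r.start <= t < r.end and user not in r.acl:
            return False  # inside an active reservation that excludes this user
    return True
```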
Resource Managers • Moab can be configured to manage more than one resource manager simultaneously, even resource managers of different types. • Moab aggregates information from the RMs to fully manage workload, resources, and cluster policies
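Aggregating several resource managers amounts to merging per-node reports into one world view. This is a minimal sketch. The rule that the earlier report wins on a conflicting attribute is an assumption made for illustration; the slides do not describe how Moab resolves conflicts.

```python
def aggregate_node_views(*reports):
    """Merge {node: {attribute: value}} reports from several sources.

    New attributes are added; on a conflict the earlier report wins (assumption).
    """
    world = {}
    for report in reports:
        for node, attrs in report.items():
            merged = world.setdefault(node, {})
            for key, value in attrs.items():
                merged.setdefault(key, value)
    return world
```

For example, node state from TORQUE and license counts from a license manager end up on the same node record.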
3. Troubleshooting and Diagnostics • Object Messages • Diagnostic Commands • Admin Notification • Logging • Tracking System Failures • Checkpointing • Debuggers http://www.clusterresources.com/products/mwm/moabdocs/14.0troubleshootingandsysmaintenance.shtml
Object Messages • Messages can hold information about failures and key events • Messages record event time, owner, expiration time, and event count • Resource managers and peer services can attach messages to objects • Admins can attach messages • Multiple messages per object are supported • Messages are persistent http://www.clusterresources.com/products/mwm/moabdocs/commands/mschedctl.shtml http://www.clusterresources.com/products/mwm/moabdocs/14.3messagebuffer.shtml
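The message attributes listed above (event time, owner, expiration, count) suggest a per-object message list that folds repeats together. This sketch is hypothetical. In particular, the choices that a repeat bumps the count and extends the expiration are assumptions, not documented Moab behavior.

```python
from dataclasses import dataclass

@dataclass
class Message:
    text: str
    owner: str
    first_seen: int  # epoch seconds of the first occurrence
    expires: int     # epoch seconds after which the message is dropped
    count: int = 1   # number of times the event was reported

def attach(messages, text, owner, now, ttl):
    """Attach a message to an object's list, folding repeats into a count (assumption)."""
    for m in messages:
        if m.text == text and m.owner == owner:
            m.count += 1
            m.expires = now + ttl  # assumption: a repeat extends the lifetime
            return m
    m = Message(text, owner, now, now + ttl)
    messages.append(m)
    return m

def live(messages, now):
    """Messages that have not yet expired."""
    return [m for m in messages if m.expires > now]
```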
Diagnostics • Moab's diagnostic commands present detailed state information and can: • Identify scheduling problems • Summarize performance • Evaluate current operation, reporting any unexpected or potentially erroneous conditions • Correct detected problems where possible, if desired
mdiag • Displays object state/health • Displays object configuration • Attributes, resources, policies • Displays object history and performance • Displays object failures and messages http://www.clusterresources.com/products/mwm/moabdocs/commands/mdiag.shtml
mdiag usage • Most common diagnostics • Scheduler (mdiag -S) • Jobs (mdiag -j) • Nodes (mdiag -n) • Resource manager (mdiag -R) • Blocked jobs (mdiag -b) • Configuration (mdiag -C) • Other diagnostics • Fairshare, Priority • Users, Accounts, Classes • Reservations, QoS, etc. http://www.clusterresources.com/products/mwm/moabdocs/commands/mdiag.shtml
mdiag details • Performs numerous internal health and consistency checks • Detects race conditions, object configuration inconsistencies, and possible external failures • Not just for failures • Provides status, configuration, and current performance • Enables Moab to act as an information service (--flags=xml)
Job Troubleshooting To determine why a particular job will not start, several commands can help: • checkjob -v • checkjob evaluates the ability of a job to start immediately. Tests include resource access, node state, and job constraints (e.g., start date, tasks per node, QoS). Additional command-line flags provide further information:
-l <POLICYLEVEL>      // evaluate impact of throttling policies on job feasibility
-n <NODENAME>         // evaluate resource access on a specific node
-r <RESERVATION_LIST> // evaluate access to the specified reservations
• checknode • Display detailed status of a node • mdiag -b • Display the reasons a job is considered 'blocked' or 'non-queued'. • mdiag -j • Display a high-level summary of job attributes and perform a sanity check on job attributes/state. • showbf -v • Determine general resource availability subject to specified constraints.
Other Diagnostics • checkjob and checknode commands • Why a job cannot start • Which nodes are available • Information about recent events affecting the current job • Node state
Issues with Client Commands • Use built-in Moab logging • showq --loglevel=9 • Or check the Moab log files
Logging Facilities • Moab Log • Reports detailed scheduler actions, configuration, events, failures, etc. • Event Log • Reports scheduler, job, node, and reservation events and failures • Syslog • Enabled with the USESYSLOG parameter
# stats/events.Wed_Aug_24_2005
1124979598 rm base RMUP initialized
1124979598 sched Moab SCHEDSTART -
1124982013 node node017 GEVENT CPU2 Down
1124989457 node node135 GEVENT /var/tmp Full
1124996230 node node139 GEVENT /home Full
1125013524 node node407 GEVENT Transient Power Supply Failure
http://www.clusterresources.com/products/mwm/moabdocs/a.fparameters.shtml#eventrecordlist http://www.clusterresources.com/products/mwm/moabdocs/14.2logging.shtml http://www.clusterresources.com/products/mwm/moabdocs/a.fparameters.shtml#usesyslog
Logging Basics • LOGDIR - Indicates directory for log files • LOGFILE - Indicates path name of log file • LOGFILEMAXSIZE - Indicates maximum size of log file before rolling • LOGFILEROLLDEPTH - Indicates maximum number of log files to maintain • LOGLEVEL - Indicates verbosity of logging
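LOGFILEMAXSIZE and LOGFILEROLLDEPTH together describe a size-triggered rotation, in which moab.log.1 holds the previous rolled log. This is a minimal sketch of that scheme. The shifting order (.1 newest, .<depth> oldest) is an assumption consistent with the moab.log.1 naming, not a description of Moab's code.

```python
import os

def roll_log(path, max_size, roll_depth):
    """Roll `path` once it exceeds max_size bytes, keeping roll_depth old copies.

    path.1 is the newest rolled copy and path.<roll_depth> the oldest (assumption).
    Returns True if a roll happened.
    """
    if roll_depth < 1:
        raise ValueError("roll_depth must be at least 1")
    if not os.path.exists(path) or os.path.getsize(path) <= max_size:
        return False
    oldest = f"{path}.{roll_depth}"
    if os.path.exists(oldest):
        os.remove(oldest)  # this copy falls off the end
    for i in range(roll_depth - 1, 0, -1):  # shift .N-1 -> .N ... .1 -> .2
        src = f"{path}.{i}"
        if os.path.exists(src):
            os.replace(src, f"{path}.{i + 1}")
    os.replace(path, f"{path}.1")
    open(path, "w").close()  # start a fresh, empty log
    return True
```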
Function Level Information • In source and debug releases, each subroutine call is logged, along with all printable parameters.
##moab.log
MPolicyCheck(orion.322,2,Reason)
Status Information • Information about internal status is logged at all LOGLEVELs. Critical internal status is logged at low LOGLEVELs, while less critical, more verbose status information is logged at higher LOGLEVELs.
##moab.log
INFO: job orion.4228 rejected (max user jobs)
INFO: job fr4n01.923.0 rejected (maxjobperuser policy failure)
Scheduler Warnings • Warnings are logged when the scheduler detects an unexpected value or receives an unexpected result from a system call or subroutine.
##moab.log
WARNING: cannot open fairshare data file '/opt/moab/stats/FS.87000'
Scheduler Alerts • Alerts are logged when the scheduler detects events of an unexpected nature that may indicate problems in other systems or objects.
##moab.log
ALERT: job orion.72 cannot run. deferring job for 360 Seconds
Scheduler Errors • Errors are logged when the scheduler detects problems that impact its ability to schedule the cluster properly.
##moab.log
ERROR: cannot connect to Loadleveler API
Searching Moab Logs • Major failures are reported by the mdiag -S command, but they can also be uncovered by searching the logs with grep:
> grep -E "WARNING|ALERT|ERROR" moab.log
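When the matches need to be counted or post-processed, the same search can be scripted. This sketch mirrors the `grep -E "WARNING|ALERT|ERROR"` alternation (no word boundaries, first match per line). It additionally tallies which keyword matched. It assumes nothing about Moab's log format beyond those keywords.

```python
import re
from collections import Counter

PATTERN = re.compile(r"WARNING|ALERT|ERROR")  # same alternation as the grep -E call

def scan_log(lines):
    """Return (matching lines, Counter of the first keyword found on each line)."""
    hits, counts = [], Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m:
            hits.append(line.rstrip("\n"))
            counts[m.group(0)] += 1
    return hits, counts
```

Usage: `with open("moab.log") as fh: hits, counts = scan_log(fh)`.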
Event Logs • Major events are reported to both the Moab log file and the Moab event log. By default, the event log is kept in the statistics directory and rolls daily, using the naming convention: • events.WWW_MMM_DD_YYYY (e.g. events.Fri_Aug_19_2005)
##event log format
<EPOCHTIME> <OBJECT> <OBJECTID> <EVENT> <DETAILS>
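The five-field record format and the daily file name can both be handled mechanically. This is a minimal sketch. It assumes fields are whitespace-separated and that DETAILS is everything after the fourth field. It also assumes the file name uses English day/month abbreviations with a zero-padded day (strftime in the default C locale).

```python
from typing import NamedTuple

class Event(NamedTuple):
    time: int      # <EPOCHTIME>
    obj: str       # <OBJECT>, e.g. node, rm, sched
    obj_id: str    # <OBJECTID>
    event: str     # <EVENT>, e.g. GEVENT
    details: str   # <DETAILS>, may contain spaces

def parse_event(line):
    """Split one event-log record into its five fields."""
    parts = line.split(None, 4)
    if len(parts) < 4:
        raise ValueError(f"not an event record: {line!r}")
    details = parts[4].strip() if len(parts) == 5 else ""
    return Event(int(parts[0]), parts[1], parts[2], parts[3], details)

def event_log_name(day):
    """events.WWW_MMM_DD_YYYY for a datetime.date (C-locale names assumed)."""
    return "events." + day.strftime("%a_%b_%d_%Y")
```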
Enabling Syslog • In addition to the log file, Moab can report events it determines to be critical to the UNIX syslog facility via the daemon facility. • The verbosity of this logging is not affected by the LOGLEVEL parameter. In addition to errors and critical events, user commands that affect the state of jobs, nodes, or the scheduler may also be logged to syslog. • Moab syslog messages are reported using the INFO, NOTICE, and ERR syslog priorities.
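On the wire, "daemon facility at INFO/NOTICE/ERR" means each record starts with a PRI value computed as facility * 8 + severity. This sketch uses the standard syslog numeric codes (daemon = 3; err = 3, notice = 5, info = 6). The record layout is simplified, with no timestamp or hostname, and the "moab" tag is a hypothetical ident.

```python
# Standard syslog codes: facility daemon = 3; severities err = 3, notice = 5, info = 6.
DAEMON_FACILITY = 3
SEVERITIES = {"ERR": 3, "NOTICE": 5, "INFO": 6}

def syslog_pri(priority):
    """PRI value at the start of a syslog record: facility * 8 + severity."""
    return DAEMON_FACILITY * 8 + SEVERITIES[priority]

def format_record(priority, message, tag="moab"):
    """Simplified record (no timestamp/hostname); `tag` is a hypothetical ident."""
    return f"<{syslog_pri(priority)}>{tag}: {message}"
```

A log filter on the receiving side can therefore select daemon-facility errors by matching PRI 27.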
Tracking System Failures • The scheduler has a number of dependencies which may cause failures if not satisfied. • Disk Space • The scheduler utilizes a number of files. If the file system is full or otherwise inaccessible, the following behaviors might be noted:
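The disk-space dependency can be watched with a simple pre-check on the directories Moab writes to, such as log, stats, and spool under $(MOABHOMEDIR). This is a sketch under stated assumptions. The threshold is a hypothetical value you would choose, and the check is not part of Moab.

```python
import shutil

def check_free_space(path, min_free_bytes):
    """Return (ok, free_bytes) for the file system holding `path`."""
    free = shutil.disk_usage(path).free
    return free >= min_free_bytes, free
```

Running it periodically against each Moab directory gives early warning before the file system fills.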
Checkpointing • Moab checkpoints its internal state. The checkpoint file records statistics and attributes for jobs, nodes, reservations, users, groups, classes, and almost every other scheduling object. • CHECKPOINTEXPIRATIONTIME - Indicates how long unmodified data should be kept after the associated object has disappeared (e.g., the priority of a job that is no longer detected). • FORMAT - [[[DD:]HH:]MM:]SS • EXAMPLE - CHECKPOINTEXPIRATIONTIME 1:00:00:00 • CHECKPOINTFILE - Indicates path name of checkpoint file • FORMAT - <STRING> • EXAMPLE - CHECKPOINTFILE /var/adm/moab/moab.ck • CHECKPOINTINTERVAL - Indicates interval between subsequent checkpoints. • FORMAT - [[[DD:]HH:]MM:]SS • EXAMPLE - CHECKPOINTINTERVAL 00:15:00
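The [[[DD:]HH:]MM:]SS format reads right to left: the last field is always seconds, and each earlier field is a larger unit. This minimal parser sketch is a hypothetical helper, not part of Moab. It converts the slide's examples to seconds: 1:00:00:00 is one day (86400 s), and 00:15:00 is 900 s.

```python
def parse_moab_duration(spec):
    """Convert a [[[DD:]HH:]MM:]SS string to seconds."""
    fields = [int(f) for f in spec.strip().split(":")]
    if not 1 <= len(fields) <= 4:
        raise ValueError(f"bad duration: {spec!r}")
    seconds = 0
    # Pair fields from the right with seconds, minutes, hours, days.
    for unit, value in zip((1, 60, 3600, 86400), reversed(fields)):
        seconds += unit * value
    return seconds
```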