
TeraGrid Advanced Scheduling Tools

Presentation Transcript


  1. TeraGrid Advanced Scheduling Tools
     Warren Smith
     Texas Advanced Computing Center
     wsmith at tacc.utexas.edu

  2. Outline
  • Advance Reservation and Coscheduling
    • GUR
  • Metascheduling
    • MCP
    • Condor-G Matchmaking
  • Batch Queue Prediction
    • QBETS
    • Karnak
  • Serial Computing
    • MyCluster
    • Condor Glide Ins
  • Urgent Computing
    • SPRUCE

  3. Advance Reservation
  • Reserve resources in advance (see the command sketch after this slide)
    • For example, 128 nodes on Queen Bee at 2pm tomorrow for 3 hours
  • Reservation requests from users are handled in an automated manner
  • The user can then submit jobs to the reserved nodes
    • Typically, jobs can be submitted to the reservation as soon as it is accepted
  • Variety of uses
    • Classes or training, where quick turnaround is needed
    • More efficient debugging and tuning
  • Needs to be supported by the batch scheduler
    • The capability is available in almost every scheduler
    • Currently only enabled on Queen Bee
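
  The exact commands depend on the scheduler at each site, so the following is only a rough sketch of what creating a reservation and submitting a job against it might look like with a Moab-style scheduler (names, times, and flags are placeholders, not Queen Bee's actual configuration):

    # Sketch only, Moab-style syntax: create a 3-hour reservation starting at 2pm tomorrow
    # (flags, resource expressions, and required privileges vary by site and scheduler)
    mrsvctl -c -t 128 -s 14:00_04/16 -d 3:00:00 -n myreservation

    # Once the reservation is accepted, submit a job into it
    msub -l nodes=128,walltime=3:00:00,advres=myreservation job_script.sh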

  4. Coscheduling
  • Simultaneous access to resources on two or more systems
  • Typically implemented using multiple advance reservations
    • 128 nodes on Queen Bee and 128 nodes on Lonestar at 2pm tomorrow for 3 hours
  • Depends on cluster schedulers supporting advance reservations
  • Variety of uses
    • Visualization of a simulation in progress
    • Multi-system simulations (e.g. MPIg)
    • Teaching and training

  5. Grid Universal Remote (GUR)
  • GUR supports both advance reservation and coscheduling
    • The only TeraGrid-supported tool for this (not counting the reservation form on the web site)
  • Command-line program that accepts a description file (illustrative example below) listing:
    • Candidate systems
    • Total number of nodes needed
    • Total duration
    • Earliest start and latest end
  • GUR tries different configurations within the specified bounds
  • Client available on: Queen Bee, new SDSC system (future)
  • Can reserve nodes at: Queen Bee, Ranger (future), new SDSC system (future)
  • https://www.teragrid.org/web/user-support/gur
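
  GUR's real description-file keywords are defined in its documentation at the URL above; the file below is only a hypothetical illustration of the fields the slide lists, not GUR's actual syntax (the command name and arguments in the comment are likewise illustrative):

    # Hypothetical GUR-style description file (field names are illustrative only)
    systems        = queenbee.loni.teragrid.org, lonestar.tacc.teragrid.org
    total_nodes    = 256
    duration       = 3:00:00
    earliest_start = 2010-04-16 14:00
    latest_end     = 2010-04-17 02:00

    # The client would then be run on this file, e.g.: gur my_coschedule.desc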

  6. Metascheduling
  • Users have jobs that can run on any of several TeraGrid systems
  • Metascheduling helps users select where to submit them
    • Automatically, on a per-job basis
  • Optimize the execution of jobs
  • Manage the execution of the jobs

  7. Master Control Program (MCP)
  • Submits multiple copies of a job to different systems
    • Once one copy starts, the others are cancelled
  • Command-line programs
  • The user specifies a submit script for each system a copy will be submitted to
    • The script expected by the batch scheduler on that system
  • MCP annotations describing how to access each system (placeholder example below)
    • In each submit script
    • Stored in a configuration file
  • Client available on: Queen Bee
  • Can send jobs to: Abe, Lincoln, Queen Bee, Cobalt, Big Red, NSTG
  • https://www.teragrid.org/web/user-support/mcp
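
  MCP's annotation and configuration syntax is not reproduced in these slides, so the fragment below is purely a placeholder for the idea: an ordinary PBS submit script for one candidate system, plus some MCP metadata saying how to reach that system.

    # queenbee.pbs -- an ordinary PBS script for one of the candidate systems
    #PBS -l nodes=16:ppn=8,walltime=2:00:00
    cd $PBS_O_WORKDIR
    mpirun ./my_app

    # Hypothetical MCP annotations (illustrative only, not MCP's real keywords):
    #MCP system=queenbee.loni.teragrid.org
    #MCP contact=<how to reach the remote batch system, e.g. a GRAM endpoint>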

  8. Condor-G
  • Condor atop Globus
  • Globus provides basic mechanisms
    • Authentication & authorization
    • File transfer
    • Remote job execution & management
  • Condor provides more advanced mechanisms
    • Improved user interface (batch scheduling); the user provides a submit script (example below)
    • Typical batch scheduling commands
      • condor_status – information about systems available to Condor
      • condor_submit – submit a job
      • condor_q – observe jobs submitted to the Condor install on this system
      • condor_rm – cancel a job
    • Fault tolerance with retries
    • Improves the scalability of Globus v2 job management
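
  As a concrete illustration, a minimal Condor-G submit script looks roughly like this (the gatekeeper contact string is a placeholder; a real job would name the pre-WS GRAM endpoint of the chosen TeraGrid system):

    # my_app.sub -- Condor-G job description (grid universe, Globus v2 / gt2)
    universe      = grid
    grid_resource = gt2 gatekeeper.example.teragrid.org/jobmanager-pbs
    executable    = my_app
    output        = my_app.out
    error         = my_app.err
    log           = my_app.log
    queue

    # Typical workflow:
    #   condor_submit my_app.sub    (submit)
    #   condor_q                    (monitor)
    #   condor_rm <job id>          (cancel if needed)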

  9. Condor-G Matchmaking
  • Matchmaking is Condor's term for selecting a resource for a job
  • A job provides requirements and preferences for the resources it can execute on (example below)
  • A resource provides requirements and preferences for the jobs that can execute on it
  • Jobs are paired to resources so as to
    • Satisfy all requirements of both the job and the resource
    • Optimize the preferences of the job and the resource
  • Accessible from: Ranger, Queen Bee, Lonestar, Steele
  • Can match jobs to: Ranger, Abe, Queen Bee, Lonestar, Cobalt, Pople, Big Red, NSTG
  • https://www.teragrid.org/web/user-support/condorg_match
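
  In a submit script, a job's requirements and preferences are written as the standard Condor "requirements" and "rank" expressions; the machine attributes below are illustrative, since the attributes actually advertised for TeraGrid resources depend on how the matchmaking pool is configured:

    # All requirements must be satisfied; rank expresses a preference to maximize
    requirements = (TARGET.Arch == "X86_64") && (TARGET.Memory >= 2048)
    rank         = TARGET.Mips    # prefer the fastest matching resource
    queue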

  10. Batch Queue Prediction
  • Predict how long jobs will wait before they start
  • Useful information for resource selection
    • Manually, by users
    • Automatically, by tools

  11. QBETS
  • Provides two types of predictions
    • The probability that a hypothetical job will start by a deadline
    • The amount of time that a job is expected to wait, X% of the time
  • A job is described by its number of nodes and execution time
  • Integrated into the TeraGrid User Portal
  • Downgraded to experimental status, based on
    • The amount of funding provided to the developers
    • Experience with the service
  • Provides predictions for Ranger, Abe, Queen Bee, Lonestar, Big Red

  12. Karnak
  • Provides queue wait predictions for
    • Hypothetical jobs
    • Jobs already queued
  • Provides current and historical job statistics
  • Implemented as a REST service (example query below)
    • HTTP protocol, various data formats (HTML, XML, text; JSON in progress)
    • Command-line clients
  • Status is beta
    • TeraGrid User Portal integration is in progress
  • Provides predictions for Ranger, Abe, Lonestar, Cobalt, Pople, NSTG
    • And for any system that deploys the glue2 CTSS package and publishes job information
  • http://karnak.teragrid.org
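
  Because Karnak is an ordinary HTTP/REST service, predictions can be fetched with any HTTP client; the resource path below is illustrative only (the real paths and parameters are documented at karnak.teragrid.org), but the pattern is simply an HTTP GET with an Accept header selecting the data format:

    # Illustrative only -- consult karnak.teragrid.org for the actual resource paths
    curl -H "Accept: text/xml" \
         "http://karnak.teragrid.org/waittime/example-system/example-queue"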

  13. Serial Computing
  • Some TeraGrid users have a lot of serial computation to run
  • One place for them to do that is the Condor pool at Purdue
  • The Condor pool may not satisfy some requirements
    • Number of nodes available
    • Co-location with large data sets
  • TeraGrid cluster schedulers are optimized for parallel jobs, not serial jobs
    • Per-user limits on the number of jobs
    • One job per node, even on nodes with more than one processing core
  • There are a few ways to run many serial jobs on TeraGrid clusters
    • Different RPs have different opinions about whether their clusters should be used this way
    • I think this should generally be resolved when allocations are reviewed

  14. MyCluster
  • MyCluster lets a user create a personal cluster
  • The personal cluster is managed by a user-specified scheduler (e.g. Condor)
  • Parallel jobs are submitted to gather up nodes
    • This matches the scheduling strategies of most TeraGrid clusters
    • These jobs start up scheduler daemons
    • The scheduler daemons interact with the user's personal scheduler
  • The user can then run serial jobs on the nodes
    • Via jobs submitted to their personal scheduler
  • The developer is no longer with TeraGrid, so the future of MyCluster is uncertain
  • Installed on Lonestar and Ranger
    • Can incorporate nodes from any TeraGrid system
  • https://www.teragrid.org/web/user-support/mycluster

  15. Condor Glideins
  • Similar idea to MyCluster
  • The user runs their own Condor scheduler
  • The user submits parallel jobs to TeraGrid resources that start up Condor daemons (sketch below)
    • Those nodes are then available to the user's Condor pool
  • The user submits serial jobs to their Condor scheduler
  • Not officially documented or supported on TeraGrid
    • Is being used by a few science gateways
  • See the Condor manual for more info: http://www.cs.wisc.edu/condor/manual/v7.5/5_4Glidein.html
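
  The Condor 7.5 manual section linked above documents the condor_glidein tool, which submits a Globus job that starts Condor daemons on the remote nodes and adds them to the user's pool; the invocation below is only a sketch (the contact string is a placeholder, and the exact options should be taken from the manual):

    # Sketch only -- see section 5.4 of the Condor 7.5 manual for the real options
    condor_glidein -count 16 gatekeeper.example.teragrid.org/jobmanager-pbs

    # Once the glidein daemons report in, serial jobs are submitted to the user's
    # own Condor pool with condor_submit, just as on a local cluster.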

  16. Urgent Computing
  • High-priority job execution
    • Elevated priority
    • Next to run
    • Preemption
  • Requested and managed in an automated way
    • Historically done via a manual process

  17. Special PRiority and Urgent Computing Environment (SPRUCE)
  • Automated setup and execution of urgent jobs
  • Ahead of time
    • The resource is configured to support SPRUCE
    • The project gets all of its code working well on the resource
    • The project is provided with tokens that can be used to request urgent access
  • To run an urgent job, the user presents a token to the resource
  • Was used a bit by the LEAD gateway
  • Not in production on TeraGrid
    • SPRUCE is still installed on several TeraGrid systems, but the status of those installs is unknown
    • The SPRUCE project seems somewhat dormant
  • http://spruce.teragrid.org/index.php

  18. Discussion
  • Any questions about those capabilities and tools?
  • Have you or any of your users used these capabilities? Any comments for us?
  • Have users asked for any other scheduling capabilities?
