SWARM: Large-Scale Job Scheduling for HPC Clusters

{swarm}: Scheduling Large-scale Jobs over Loosely-Coupled HPC Clusters
Sangmi Lee Pallickara and Marlon Pierce
Indiana University

Distributed HPC clusters
• How to access the clusters?
• How to submit jobs remotely?
• How to decide where to submit?
• How to track submitted jobs?
• How to detect faults?
Clients shown in the figure: desktop users, Web portals, and scientific gateways

Motivation 1: mRNA Sequence Clustering and Assembly Pipeline
• Sequence assembly: deriving consensus sequences (contigs) from individual overlapping DNA fragments.
• Expressed Sequence Tag (EST) sequencing: assembling fragments of messenger RNAs.
• Stage 1: data preprocessing (serial job)
• Stage 2: clustering mRNA fragments (large-scale parallel job)
• Stage 4: assembling fragments within each cluster (large number of parallel or serial jobs)

Motivation 2: BioDrugScore Project
• Samy Meroueh, IU School of Medicine
• Application for computational drug design
• Docking of a large number of compounds to a binding pocket, followed by ranking of the resulting complexes and selection of the top candidates.
• Customizing scoring functions for each pocket within these large numbers of receptors, to rank molecules and to predict efficacy and potential toxicity due to off-target effects early in the discovery process.
Workflow stages shown in the figure:
• Structure Selection: ligands are chosen via search results limited by user-provided property ranges.
• Parameter Selection: parameters on which to derive the function are chosen via check boxes.
• Parameter Deviation: the function is displayed with numerical and graphical details.
• Validation Stage: function validation results are displayed.
• Calculate Terms: TeraGrid calculates energy and entropy terms via AMBER.

Existing and Ongoing Activities
• Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS) at SC08
• BoF on Megajobs: How to Run One Million Jobs, at SC08

Computational Challenges
• Dealing with the policies of different HPC clusters: limits on concurrent jobs in the batch queue, maximum WallClockTime
• Dealing with different clusters: hardware, software, and maintenance issues
• Dealing with different jobs: parallel jobs, serial jobs
• Tracking a large number of jobs
• Managing data for input, output, and temporary files, including log and error messages, while executing a large number of jobs
• Managing faults: detecting, reacting, recovering, and preventing

SWARM at a Glance
• Schedules millions of jobs over distributed clusters
• A monitoring framework for large-scale jobs
• User-based job scheduling
• Ranking resources based on predicted wait times
• Standard Web Service interface for Web applications
• Extensible design for domain-specific software logic
Figure: desktop users, Web portals, and scientific gateways reach the distributed HPC clusters through the Swarm infrastructure.

Architecture
Clients: desktop users, Web portals, and gateway-style applications
Components shown in the figure:
• Standard Web Service Interface
• Request Manager
• Resource Ranking Manager
• QBETS Web Service (hosted by UCSB)
• Data Model Manager
• Fault Manager
• RDBMS
• Per-user state: User A's Job Board, User A's Job Queue, User A's Resource Pool (tokens for resources X, Y, Z)
• Job Distributor
• MyProxy Server (hosted by the TeraGrid project)
• Job Execution Manager: Condor-G with Birdbath
• High-performance computing clusters: Grid-style clusters and Condor computing nodes
Resource pool: each user keeps a resource pool containing a certain number of tokens for each computing resource. The number of tokens represents the limit on concurrent jobs in the batch system of the targeted resource.

Queuing Groups of Jobs and Matchmaking
• Task queue for User A: first-come, first-served. Requests are retrieved from the backend database.
• Job Distributor: finds the first available resource, following the order of the list of resources.
• Resource pool: each user keeps a resource pool containing a certain number of tokens. The number of tokens represents the limit on the number of concurrent jobs in the batch system of the targeted resource.
Figure labels: Lonestar (User A with 100 tokens; 30 tokens), ORNL (User A with 10 tokens; 30 tokens), Cobalt (User B with 300 tokens), BigRed (User A with 800 tokens), Steele (User A with 500 tokens)
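The token-based matchmaking described on this slide can be illustrated with a minimal Python sketch. It is not Swarm's actual implementation: the class `ResourcePool`, the function `distribute`, and the small token counts are hypothetical names and values chosen for illustration. It assumes one token per concurrent job and a job queue served in arrival order.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """Per-user tokens; one token stands for one concurrent job slot (assumption)."""
    tokens: dict = field(default_factory=dict)

    def acquire(self, resource: str) -> bool:
        # Take a token if the user still has one for this resource.
        if self.tokens.get(resource, 0) > 0:
            self.tokens[resource] -= 1
            return True
        return False

    def release(self, resource: str) -> None:
        # Return a token when a job leaves the batch system.
        self.tokens[resource] = self.tokens.get(resource, 0) + 1


def distribute(queue: deque, pool: ResourcePool, resource_order: list) -> list:
    """Assign queued jobs, in arrival order, to the first resource with a free token."""
    assignments = []
    while queue:
        # The generator is lazy, so acquire() stops at the first success.
        target = next((r for r in resource_order if pool.acquire(r)), None)
        if target is None:
            break  # no tokens left anywhere; remaining jobs stay queued
        assignments.append((queue.popleft(), target))
    return assignments


# Hypothetical example: 2 tokens on BigRed, 1 on Steele, 4 queued jobs.
pool = ResourcePool({"BigRed": 2, "Steele": 1})
queue = deque(["job1", "job2", "job3", "job4"])
assigned = distribute(queue, pool, ["BigRed", "Steele"])
```

In this sketch a job that finds no token simply stays at the head of the queue until `release` returns one, which keeps the number of concurrent jobs per resource within the batch system's limit.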

Describing a Group of Jobs
• Standard Web Service interface
• Creating a ticket for the job group
• Submitting job(s) with the associated ticket
• Parameters encoded in the Web Service interface:
  • Location of executables
  • Location of input data
  • Location of output data
  • Arguments
  • Job type (serial, parallel)
  • WallClockTime
  • Memory requirement
  • Required number of computing nodes
  • List of resources that can execute this job (sorted or not)
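As a rough illustration of the parameters listed above, the sketch below models a job description and a ticketed job group in Python. All names (`JobDescription`, `JobGroup`, the field names) and the units (minutes, megabytes) are assumptions made for this example; the actual Web Service schema is not shown in the slides.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class JobDescription:
    """One job, mirroring the parameters listed on the slide (names are hypothetical)."""
    executable: str
    input_location: str
    output_location: str
    arguments: List[str]
    job_type: str            # "serial" or "parallel"
    wall_clock_minutes: int  # unit is an assumption
    memory_mb: int           # unit is an assumption
    node_count: int
    resources: List[str]     # resources allowed to run this job, sorted or not

    def __post_init__(self) -> None:
        if self.job_type not in ("serial", "parallel"):
            raise ValueError(f"unknown job type: {self.job_type!r}")


@dataclass
class JobGroup:
    """A group of jobs that share one ticket."""
    ticket: str = field(default_factory=lambda: uuid.uuid4().hex)
    jobs: List[JobDescription] = field(default_factory=list)

    def submit(self, job: JobDescription) -> str:
        # Attach the job to this group and hand back the ticket for later tracking.
        self.jobs.append(job)
        return self.ticket
```

A client would first create a `JobGroup` to obtain a ticket, then call `submit` once per job, and later use the same ticket to query the status of the whole group.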

Ranking the Computing Resources
• Sorting the list of resources based on the predicted wait time of each batch queue system
• Maintaining the Resource Ranking table through periodic access to QBETS
• QBETS provides queue delay predictions
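The ranking step can be sketched as a small cached table that is refreshed periodically and sorted by predicted wait time. This is only an illustration: `ResourceRankingTable` and its parameters are hypothetical, and `predict_wait_seconds` is a stand-in callable supplied by the caller, not the real QBETS interface.

```python
import time
from typing import Callable, Dict, List, Optional


class ResourceRankingTable:
    """Caches predicted queue wait times and ranks resources by them."""

    def __init__(
        self,
        predict_wait_seconds: Callable[[str], float],  # stand-in for a QBETS lookup
        refresh_seconds: float = 300.0,                # refresh period is an assumption
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._predict = predict_wait_seconds
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._wait: Dict[str, float] = {}
        self._updated: Optional[float] = None

    def _stale(self, resources: List[str]) -> bool:
        if self._updated is None:
            return True
        if self._clock() - self._updated >= self._refresh_seconds:
            return True
        return any(r not in self._wait for r in resources)

    def rank(self, resources: List[str]) -> List[str]:
        """Return resources ordered by shortest predicted wait first."""
        if self._stale(resources):
            self._wait = {r: self._predict(r) for r in resources}
            self._updated = self._clock()
        return sorted(resources, key=lambda r: self._wait[r])
```

Caching the predictions means that ranking a user's resource list does not trigger a remote call for every job, which matters when the number of jobs is large.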

Tracking the Status of Jobs
• A statistical approach provides the big picture for a large number of jobs
• Each job is also tracked individually, with these states: Requested, Queued, Submitted, Idle, Completed, Held, Running
• Information about each job submission is stored, such as the job description, timestamp, and status history, which makes job failures easy to track
• Users can design their own data model for log and error files, which is useful for Web portal or gateway-style applications
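The two views described above, per-job history and an aggregate summary, could look roughly like the following Python sketch. `JobStatus` reuses the state names from the slide; `JobBoard` and its methods are hypothetical, and an in-memory dictionary stands in for the backend database.

```python
from collections import Counter
from datetime import datetime, timezone
from enum import Enum


class JobStatus(Enum):
    REQUESTED = "Requested"
    QUEUED = "Queued"
    SUBMITTED = "Submitted"
    IDLE = "Idle"
    RUNNING = "Running"
    HELD = "Held"
    COMPLETED = "Completed"


class JobBoard:
    """Keeps a timestamped status history per job (in memory, for illustration)."""

    def __init__(self) -> None:
        self._history = {}

    def record(self, job_id: str, status: JobStatus, when: datetime = None) -> None:
        # Append rather than overwrite, so failures can be traced afterwards.
        when = when or datetime.now(timezone.utc)
        self._history.setdefault(job_id, []).append((when, status))

    def current(self, job_id: str) -> JobStatus:
        return self._history[job_id][-1][1]

    def history(self, job_id: str) -> list:
        return list(self._history[job_id])

    def summary(self) -> Counter:
        """Count jobs by their latest status: the 'big picture' view."""
        return Counter(entries[-1][1] for entries in self._history.values())
```

With many thousands of jobs, a client would typically poll `summary()` and only drill into `history()` for jobs whose latest state is `HELD` or otherwise unexpected.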

Performance Evaluation
• Host: 4.50GHz Intel Pentium 4 CPUs and 1 GB RAM
• Client: 2.33 GHz Intel Xeon CPU and 8 GB RAM
• 1 Gbps network
• Axis2 Web service container
Figure: Test scenario for the multi-user environment

Figure: Total turnaround time for job submission and status checks with various job sizes in a single-user environment

Figure: Average turnaround time for various job sizes with a varying number of concurrent users

Figure: Average turnaround time per operation with a varying number of concurrent users

Conclusions
• Swarm provides a lightweight framework for users to submit a large number of jobs.
• Swarm manages a large number of jobs based on user preferences.
• Swarm schedules more than 100,000 jobs and prioritizes multiple resources based on queue delay prediction.
• Swarm provides a job monitoring scheme for a large number of job submissions.
• Swarm is easy to customize for application-specific requirements, including the data model and the fault handling scheme.

Future Work
• Intelligent fault detection
• Extensible fault handling
• Proactive fault prevention scheme
• Integration with large-scale data management to handle input and output data
• Scalability for extremely large clusters

Thanks! {swarm}
