This work focuses on predicting batch-queue delay for individual TeraGrid jobs, a problem complicated by poorly understood workload distributions, partially hidden and dynamically changing scheduling policies, and hard-to-predict job execution times. The proposed predictive methodology (BMBP) combines a new quantile estimator, a changepoint detector, and a job-clustering technique to produce more accurate bounds. Inverted "deadline scheduling" predictions and an overview of the deployed system are also discussed.
Predicting Queue Waiting Time For Individual TeraGrid Jobs
Rich Wolski, Dan Nurmi, John Brevik, Graziano Obertelli, Ryan Garver
Computer Science Department, University of California, Santa Barbara
Problem: Predicting Delay in Batch Queues
• Time in the queue is experienced as application delay
• Sounds like an easy problem, but:
  • The distribution of load generated by users is a matter of some debate
  • Scheduling policy is partially hidden
  • Sites need to change their policies dynamically and without warning
  • Job execution times are difficult to predict
• Much research in this area over the past 20 years, but few solutions
• Current commercial systems provide high-variance estimates:
  • On-line simulation based on the maximum requested time
  • "Expected value" predictions
  • Most sites simply disable these features
For Scheduling: It's All About the Big Q
• Predictions of the form:
  • "What is the maximum time my job will wait, with X% certainty?"
  • "What is the minimum time my job will wait, with X% certainty?"
• Quantifying the certainty requires two estimates:
  • Estimate the (1-X) quantile of the distribution of availability => Qx
  • Estimate the upper or lower X% confidence bound on the statistic Qx => Q(x,b)
• If the estimates are unbiased and the distribution is stationary, future availability durations will be larger than Q(x,b) X% of the time, guaranteed
(A sketch of the underlying order-statistic bound follows below.)
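The quantile-plus-confidence-bound pair can be made concrete with the classic binomial order-statistic argument: among n independent historical waits, the number falling below the true q-quantile is Binomial(n, q), so the probability that the j-th smallest observation lies at or above that quantile is the Binomial CDF evaluated at j-1. Below is a minimal Python sketch; the name upper_quantile_bound and its defaults are illustrative rather than the authors' code, and the carefully engineered combinatorics of the real system are approximated here by math.comb's exact integers and ordinary floating point.

from math import comb

def upper_quantile_bound(samples, q=0.95, conf=0.95):
    """Upper conf-level confidence bound on the q-quantile of the
    distribution generating `samples` (hypothetical helper)."""
    xs = sorted(samples)
    n = len(xs)
    cdf = 0.0
    for j in range(1, n + 1):
        # P(exactly j-1 of the n observations fall below the true q-quantile)
        cdf += comb(n, j - 1) * q ** (j - 1) * (1 - q) ** (n - (j - 1))
        if cdf >= conf:
            return xs[j - 1]   # the j-th order statistic (1-indexed)
    return None  # history too short for this q/conf combination

The minimum-wait question is symmetric: a lower confidence bound comes from choosing the largest order statistic that still lies below the quantile with the required probability.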
BMBP: A New Predictive Methodology
• A new quantile estimator based on the Binomial distribution
  • Requires a carefully engineered numerical system to deal with large-scale combinatorics
• A new changepoint detector
  • The Binomial method is difficult to apply in a time-series context
  • Need a system for determining:
    • Stationary regions in the data
    • The minimum statistically meaningful history in each region
• A new clustering methodology
  • More accurate estimates are possible when predictions are made from jobs with similar characteristics
  • Takes dynamic policy changes into account more effectively
(A sketch of the changepoint idea follows below.)
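The slides leave the detector unspecified, so the following is only a plausible reading of a run-based test, reusing upper_quantile_bound from the previous sketch: under stationarity each new wait exceeds the true q-quantile independently with probability 1-q, so a run of k consecutive exceedances has probability (1-q)^k; when that falls below a threshold, assume the region changed and keep only the history since the run began.

def trim_history(history, q=0.95, alpha=0.01):
    """Return the suffix of `history` believed to belong to the current
    stationary region (an illustrative assumption, not the BMBP detector)."""
    run_start = None
    for i in range(1, len(history)):
        est = upper_quantile_bound(history[:i], q=q, conf=0.5)
        if est is not None and history[i] > est:
            if run_start is None:
                run_start = i
            # probability of this run of exceedances under stationarity
            if (1 - q) ** (i - run_start + 1) < alpha:
                return history[run_start:]
        else:
            run_start = None
    return history  # no changepoint detected; all history is meaningful

A symmetric check on runs below a lower-quantile estimate would catch downward shifts in the same way.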
Predicting Things Upside Down
• Deadline scheduling: "My job needs to start in the next X seconds for the results to be meaningful"
  • Amitava Mujumdar, Tharaka Devaditha, Adam Birnbaum (SDSC)
  • Example: a 4-minute image reconstruction that must complete within the next 8 minutes
• Given a:
  • Machine
  • Queue
  • Processor count
  • Run time
  • Deadline
• What is the probability that the job will meet the deadline?
• http://nws.cs.ucsb.edu/batchq/invbqueue.php
(A sketch of this inverted calculation follows below.)
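The inverted service fixes the tolerable queue delay (deadline minus run time) and returns a probability rather than a bound. A minimal empirical sketch, assuming wait-time history for the matching machine/queue/processor-count combination is at hand; the live invbqueue.php service is presumably more careful, and the sample waits below are made up:

def prob_meets_deadline(wait_history, runtime, deadline):
    slack = deadline - runtime       # queue delay the job can tolerate
    if slack < 0:
        return 0.0                   # cannot finish even if it starts now
    # empirical fraction of past jobs that waited no longer than the slack
    return sum(1 for w in wait_history if w <= slack) / len(wait_history)

# The SDSC example above: a 240 s reconstruction with a 480 s deadline
# tolerates at most 240 s in the queue (wait times are hypothetical).
p = prob_meets_deadline([30, 90, 200, 600, 45], runtime=240, deadline=480)  # 0.8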
See It in Action
• http://nws.cs.ucsb.edu/batchq
How Does It Work?
• NWS sensors at each site read the batch queue scheduler logs
• Records are sanitized down to:
  • Machine name
  • Queue name
  • Node/core count
  • Max run time
  • Submit time
  • Start time
• Sensors periodically send updated log records to UCSB
• At UCSB:
  • NWS log data is extracted
  • Forward and inverted predictions are made asynchronously for all machine/queue/cluster combinations
  • Data is served through multiple interfaces: web service, HTML, BQP
(A sketch of the sanitized record follows below.)
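The six sanitized fields map naturally onto a small record type. A sketch with assumed field names (only the field list itself comes from the slide):

from dataclasses import dataclass

@dataclass
class QueueLogRecord:
    machine: str       # e.g., "ucteragrid"
    queue: str         # e.g., "dque"
    nodes: int         # node/core count requested
    max_runtime: int   # requested max run time, in seconds
    submit_time: int   # UNIX timestamp of submission
    start_time: int    # UNIX timestamp of job start

    @property
    def wait(self) -> int:
        # the quantity the predictions are actually about: time in the queue
        return self.start_time - self.submit_time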
What Are the Problems?
• Batch queue scheduler logs are designed to support accounting
  • Each scheduler uses a different format and logs different information
  • Accuracy is not considered important
• Not all scheduler-relevant events are logged
  • e.g., node decommissioning/addition
• Static metadata is not provided
  • Queue constraints
  • Whether cores or nodes are scheduled
  • Number of processing elements (nodes/cores)
• Better information is needed going forward, to support:
  • Evaluating scheduling policy changes
  • Urgent computing
  • Co-allocation/advance reservations
Static Metadata Proposal
• Per machine:
  • A short one-word 'tag' identifying the machine (e.g., "ncsateragrid")
  • List of login hostnames that users log in to
  • Hostname of a machine with a static hostname-to-IP mapping (network-accessible services run here)
  • Machine name (e.g., "NCSA ia64 TeraGrid")
  • Number of nodes
  • Number of processing elements per node
• Per queue:
  • Unit of computational elements ("core", "processor", "node", ...)
  • Default queue? (boolean)
  • Job restrictions placed on a 'normal user' for this queue:
    • Max number of computational elements available for request (int)
    • Max walltime request (int)
ANL Example

<machine>
  <tag>ucteragrid</tag>
  <sensorhost>tg-grid.uc.teragrid.org</sensorhost>
  <sensorport>8062</sensorport>
  <totalcores>314</totalcores>
  <loginhosts>
    <host>tg-login.uc.teragrid.org</host>
    <host>tg-login1.uc.teragrid.org</host>
    <host>tg-login2.uc.teragrid.org</host>
  </loginhosts>
  <label>UofC/ANL TeraGrid Cluster</label>
  <defqueue>dque</defqueue>
  <queues>
    <queue>
      <name>dque</name>
      <procunit>cores</procunit>
      <proclimit>2048</proclimit>
      <walllimit>86400</walllimit>
    </queue>
    <queue>
      <name>high</name>
      <procunit>cores</procunit>
      <proclimit>512</proclimit>
      <walllimit>43200</walllimit>
    </queue>
    <queue>
      <name>interactive</name>
      <procunit>nodes</procunit>
      <proclimit>1</proclimit>
      <walllimit>3600</walllimit>
    </queue>
  </queues>
</machine>
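A document in the proposed format parses with nothing more than the Python standard library; this sketch assumes only the schema shown in the example above:

import xml.etree.ElementTree as ET

def parse_machine(xml_text):
    """Turn one <machine> document into a plain dict (illustrative helper)."""
    m = ET.fromstring(xml_text)
    return {
        "tag": m.findtext("tag"),
        "label": m.findtext("label"),
        "totalcores": int(m.findtext("totalcores")),
        "loginhosts": [h.text for h in m.findall("loginhosts/host")],
        "defqueue": m.findtext("defqueue"),
        "queues": {
            q.findtext("name"): {
                "procunit": q.findtext("procunit"),
                "proclimit": int(q.findtext("proclimit")),
                "walllimit": int(q.findtext("walllimit")),
            }
            for q in m.findall("queues/queue")
        },
    }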