Active Sampling for Accelerated Learning of Performance Models Piyush Shivam, Shivnath Babu, Jeff Chase Duke University
Networked Computing Utility • A network of clusters or grid sites • Each site is a pool of heterogeneous resources (e.g., CPU, memory, storage, network), managed as a shared utility • Jobs are task/data workflows • Challenge: choose the ‘best’ resource mapping/schedule for the job mix, an instance of “utility resource planning” • Solution under construction: NIMO [Figure: a task scheduler maps task workflows onto sites A, B, and C (clusters C1–C3)]
Premises (Limitations) • Important batch applications are run repeatedly. • Most resources are consumed by applications we have seen in the past. • Behavior is predictable across data sets. • …given some attributes associated with the data set. • Stable behavior per unit of data processed (D) • D is predictable from data set attributes. • Behavior depends only on resource attributes. • CPU type and clock, seek time, spindle count. • Utility controls the resources assigned to each job. • Virtualization enables precise control. • Your mileage may vary.
NIMO: NonInvasive Modeling for Optimization • NIMO learns end-to-end performance models • Models predict performance as a function of (a) the application profile, (b) the data set profile, and (c) the resource profile of a candidate resource assignment • NIMO is active • NIMO collects training data for learning models by conducting proactive experiments on a ‘workbench’ • NIMO is noninvasive [Figure: the model answers “what if…” queries, mapping app/data profiles and candidate resource profiles to (target) performance]
The Big Picture [Figure: jobs and benchmarks run under pervasive instrumentation; a resource profiler and an application profiler correlate metrics with job logs to populate a training-set database; active learning over that database informs the scheduler, which places jobs across sites A, B, and C (clusters C1–C3)]
Generic End-to-End Model • T = D × (Oa + Os), where T is total completion time and D is total data processed • Oa (compute occupancy): compute phases, when the compute resource is busy • Os (stall occupancy) = Od (storage occupancy) + On (network occupancy): stall phases, when the compute resource is stalled on I/O • Occupancy: average time consumed per unit of data; directly observable
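The occupancy model above can be sketched directly. This is a minimal illustration, not NIMO's implementation; the function name and example occupancy values are invented for the example:

```python
def predict_runtime(D, o_a, o_d, o_n):
    """Predict total completion time T = D * (Oa + Os).

    D   -- total data processed (units)
    o_a -- compute occupancy: avg compute time per unit of data
    o_d -- storage occupancy: avg storage-stall time per unit of data
    o_n -- network occupancy: avg network-stall time per unit of data
    """
    o_s = o_d + o_n          # stall occupancy = storage + network stalls
    return D * (o_a + o_s)   # T = D * (Oa + Os)

# Hypothetical job: 1000 data units, 0.2 s compute + 0.05 s storage
# + 0.01 s network stall per unit -> roughly 260 seconds total
t = predict_runtime(1000, 0.2, 0.05, 0.01)
```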
Statistical Learning • Independent variables: the data profile and the resource profile • Dependent variables: the occupancies • Complexity (e.g., latency hiding, concurrency, arm contention) is captured implicitly in the training data rather than in the structure of the model.
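To make the regression view concrete, here is a least-squares fit of one hypothetical occupancy against one resource attribute (compute occupancy vs. inverse clock speed). The slides do not specify the learning method; this one-feature linear fit is only an illustrative stand-in, and the sample data are invented:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x for a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b  # intercept a, slope b

# Hypothetical training samples: compute occupancy (s/unit) vs. 1/clock (1/GHz)
inv_clock = [1.0, 0.5, 0.33, 0.25]
occupancy = [0.41, 0.21, 0.14, 0.11]
a, b = fit_line(inv_clock, occupancy)
```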
Sampling Challenges • Full system operating range • Samples must cover the space of candidate resource assignments • Cost of sample acquisition • Acquiring a sample has a non-negligible cost, e.g., the time to acquire the sample, or the opportunity cost for the application • Curse of dimensionality • Too many parameters! • E.g., 10 dimensions × 10 values per dimension = 10^10 assignments • 5 minutes per sample => 951 years to collect even a 1% sample!
Active Learning in NIMO How to learn accurate models quickly? • Passive sampling might not expose the full system operating range • Active sampling using “design of experiments” collects the most relevant training data • Automatic and quick [Figure: accuracy of the current model vs. number of training samples; active sampling approaches 100% accuracy with far fewer samples than passive sampling]
Sample Carefully [Figure: accuracy of the current model vs. number of training samples; active sampling with acceleration reaches high accuracy fastest, followed by active sampling without acceleration, then passive sampling]
Active Sampling Challenges • How to expose the main factors and interactions in the shortest time? • Which dimensions/attributes to perturb? • What values to choose for the attributes? • Where to conduct the experiment? • On a separate system (“workbench”) or “live”?
Planning ‘active’ experiments 1. Choose a predictor function to refine • Focus on the most significant/relevant predictors… or the least accurate • Example: a CPU-intensive app needs an accurate compute-time predictor 2. Choose an attribute (if any) to add to the predictor • Example: CPU speed 3. Choose the values of the attributes 4. Conduct the experiment 5. Compute the current prediction error; go to Step 1
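The planning loop above can be sketched as a toy simulation. This is a hypothetical illustration of the control flow only: the predictor names, the starting errors, and the assumption that each workbench run shrinks error by 25% are all invented, and Steps 2–4 are collapsed into one stub:

```python
# Toy predictors: just (error, sample-count) records; a real system would
# hold learned occupancy models per resource (compute, storage, network).
predictors = {"compute": {"error": 0.40, "samples": 0},
              "storage": {"error": 0.25, "samples": 0},
              "network": {"error": 0.10, "samples": 0}}

def run_experiment(name):
    # Stand-in for one workbench run (Steps 2-4): each new sample is
    # assumed to shrink the predictor's error by 25%.
    p = predictors[name]
    p["samples"] += 1
    p["error"] *= 0.75

target = 0.10
for step in range(20):
    # Step 1 (dynamic ordering): refine the least-accurate predictor.
    name = max(predictors, key=lambda n: predictors[n]["error"])
    if predictors[name]["error"] <= target:
        break              # Step 5: stop once every predictor meets target
    run_experiment(name)   # Steps 2-4: choose attribute/values, run, refine
```

Note how the budget is spent unevenly: the worst predictor (compute) receives the most experiments, which is the point of dynamic ordering.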
Choosing the Next Predictor • Learn the most significant/relevant predictors first. • Static vs. dynamic ordering • Static: define total order, e.g., a priori or by pre-estimates of influence (Plackett-Burman). • Cycle through the order: round-robin vs. improvement threshold • Dynamic: choose the predictor with maximum current error
Choosing New Attributes • Include the most significant/relevant attributes • Choose attributes to expose main factors and interactions • Add an attribute when error reduction from further training with the current set falls below threshold. • Choose the attribute with maximum potential improvement in accuracy. • Establish total order using pre-estimate of relevance using Plackett-Burman.
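The "add an attribute when error reduction stalls" rule reduces to a threshold test on the recent error history. A minimal sketch, assuming a hypothetical helper name and threshold value:

```python
def should_add_attribute(error_history, threshold=0.01):
    """Add a new attribute once further training with the current
    attribute set stops paying off (error reduction below threshold)."""
    if len(error_history) < 2:
        return False                      # not enough history to judge
    return (error_history[-2] - error_history[-1]) < threshold

print(should_add_attribute([0.30, 0.18, 0.12]))  # still improving -> False
print(should_add_attribute([0.12, 0.115]))       # stalled -> True
```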
Choosing New Values • Select a new value sample to train the selected predictor function with the chosen set of attributes • A range of approaches balances coverage vs. interactions: binary search/bracketing of the value range; Plackett-Burman (PB) designs to identify interactions; La-Ib designs, where a = # of levels per value and b = degree of interactions
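The binary search/bracketing option can be sketched as recursive bisection of an attribute's value range: sample the endpoints first, then midpoints, doubling coverage at each depth. The function name and the CPU-clock example range are illustrative assumptions, not from the slides:

```python
def bracket_values(lo, hi, depth):
    """Pick sample values for one attribute by recursive bisection:
    range endpoints first, then midpoints, refining with each depth."""
    vals = [lo, hi]
    def bisect(a, b, d):
        if d == 0:
            return
        m = (a + b) / 2
        vals.append(m)          # sample the midpoint of this bracket
        bisect(a, m, d - 1)     # then refine the left half...
        bisect(m, b, d - 1)     # ...and the right half
    bisect(lo, hi, depth)
    return sorted(set(vals))

# e.g. a hypothetical CPU clock range of 1.0-3.0 GHz, two bisection levels
print(bracket_values(1.0, 3.0, 2))   # [1.0, 1.5, 2.0, 2.5, 3.0]
```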
Experimental Results • Biomedical applications • BLAST, fMRI, NAMD, CardioWave • Resources • 5 CPU speeds, 6 network latencies, 5 memory sizes • 5 × 6 × 5 = 150 resource assignments • Goal: learn an execution-time model with the fewest training assignments • Separate test set to evaluate the accuracy of the current model
BLAST Application • Total time for all 150 assignments: 130 hrs • Active sampling: 5 hrs (2% of the sample space) • Incorrect order of predictor refinement: 12 hrs (10% of the sample space)
BLAST Application • Total time for all 150 assignments: 130 hrs • Active sampling: 5 hrs (2% of the sample space) • Incorrect order of attribute refinement: 12 hrs (10% of the sample space)
Summary/Conclusions • Current statistical learning techniques (SLT): given the right data, learn the right model • Use active sampling to acquire the right data • Ongoing experiments demonstrate the importance/potential of guided active sampling • 2% of the sample space yields >= 90% model accuracy • Upcoming VLDB paper…