

Experiment-driven System Management

Shivnath Babu

Duke University

Joint work with Songyun Duan, Herodotos Herodotou, and Vamsidhar Thummala



Managing DBs in Small to Medium Business Enterprises (SMBs)

  • Peter is a system admin in an SMB

    • Manages the database (DB)

    • SMB cannot afford a DBA

  • Suppose Peter has to tune a poorly-performing DB

    • Design advisor may not help

    • Maybe the problem is with DB configuration parameters




Tuning DB Configuration Parameters

  • Parameters that control

    • Memory distribution

    • I/O optimization

    • Parallelism

    • Optimizer’s cost model

  • Number of parameters ~ 100

    • 15-25 critical parameters, depending on OLAP vs. OLTP

  • Few holistic parameter tuning tools available

    • Peter may have to resort to 1000+ page tuning manuals or rules of thumb from experts

    • Can be a frustrating experience
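To make this concrete, here is the kind of knob space involved, expressed as a Python dict of PostgreSQL parameter names (the names are real PostgreSQL settings; the values are purely illustrative starting points, not tuning recommendations):

```python
# Illustrative only: a handful of PostgreSQL parameters that influence memory
# distribution, I/O, parallelism, and the optimizer's cost model.
# The values are hypothetical starting points, not tuning advice.
candidate_config = {
    "shared_buffers": "512MB",             # memory: buffer pool size
    "work_mem": "16MB",                    # memory: per-sort/hash work area
    "effective_cache_size": "2GB",         # optimizer: assumed OS + DB caching
    "random_page_cost": 4.0,               # optimizer: cost of a random page read
    "effective_io_concurrency": 2,         # I/O: concurrent prefetch requests
    "max_parallel_workers_per_gather": 2,  # parallelism: workers per query
}
```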



Response Surfaces

2-dim Projection of an 11-dim Surface

TPC-H 4 GB DB size, 1 GB memory, Query 18



DBA’s Approach to Parameter Tuning

  • DBAs run experiments

    • Here, an experiment is a run of the DB workload with a specific parameter configuration

    • Common strategy: vary one DB parameter at a time
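A minimal sketch of that one-parameter-at-a-time strategy, assuming a hypothetical `run_experiment` helper that applies a configuration, replays the workload, and returns its running time:

```python
def tune_one_at_a_time(base_config, grids, run_experiment):
    """Vary one parameter at a time, keeping the best value found so far.
    grids: e.g. {"work_mem": ["4MB", "16MB", "64MB"], ...} (hypothetical)."""
    best_config = dict(base_config)
    best_time = run_experiment(best_config)        # one experiment = one workload run
    for param, values in grids.items():
        for v in values:
            trial = dict(best_config, **{param: v})
            t = run_experiment(trial)
            if t < best_time:
                best_time, best_config = t, trial  # keep the improvement
    return best_config, best_time
```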



Experiment-driven Management

[Diagram: the experiment-driven management loop for a mgmt. task: plan the next set of experiments → conduct experiments on the workbench → process output to extract information → are more experiments needed? If yes, repeat; if no, report the result.]

Goal: Automate this process



Roadmap

  • Use cases of experiment-driven mgmt.

    • Query tuning, benchmarking, Hadoop, testing, …

  • iTuned: Tool for DB conf parameter tuning

    • End-to-end application of experiment-driven mgmt.

  • .eX: Language and run-time system that brings experiment-driven mgmt. to users & tuning tools



What is an Experiment?

  • Depends on the management task

    • Pay some extra cost, get new information in return

    • Even for a specific management task, there can be a spectrum of possible experiments



Uses of Experiment-driven Mgmt.

  • DB conf parameter tuning



Uses of Experiment-driven Mgmt.

  • DB conf parameter tuning

  • MapReduce job tuning in Hadoop



Uses of Experiment-driven Mgmt.

  • DB conf parameter tuning

  • MapReduce job tuning in Hadoop

  • Server benchmarking

    • Capacity planning

    • Cost/perf modeling



Uses of Experiment-driven Mgmt.

  • Tuning “problem queries”

[Figure: a query plan annotated with <Estimated, Actual> cardinality pairs, e.g., <1, 1>, <65, 309>, <2473, 7496>, and <380459, 229739>, showing large cardinality-estimation errors.]



Uses of Experiment-driven Mgmt.

  • Tuning “problem queries”



Uses of Experiment-driven Mgmt.

  • DB conf parameter tuning

  • MapReduce job tuning in Hadoop

  • Server benchmarking

    • Capacity planning

    • Cost/perf modeling

  • Tuning “problem queries”

  • Troubleshooting

  • Testing

  • Canary in the server farm (James Hamilton, Amazon)



Roadmap

  • Use cases of experiment-driven mgmt.

    • Query tuning, benchmarking, Hadoop, testing, …

  • iTuned: Tool for DB conf parameter tuning

    • End-to-end application of experiment-driven mgmt.

  • .eX: Language and run-time system that brings experiment-driven mgmt. to users & tuning tools



Problem Abstraction

  • Unknown response surface: y = F(X)

    • X = Parameters x1, x2, …, xm

  • Each experiment gives a <Xi,yi> sample

    • Set DB to conf Xi

    • Run workload that needs tuning

    • Measure performance yi at Xi

  • Goal: Find high performance setting with low total cost of running experiments
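In code, an experiment is just an expensive black-box evaluation of F; a minimal sketch, assuming hypothetical `apply_config` and `run_workload` helpers provided by the workbench:

```python
def experiment(X):
    """One experiment: set the DB to configuration X, run the workload that
    needs tuning, and return the measured performance y = F(X)."""
    apply_config(X)        # hypothetical: write parameters, reload/restart the DB
    return run_workload()  # hypothetical: replay the workload, return its running time
```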



Example

  • Goal: Compute the potential utility of candidate experiments

[Figure: Utility(X) across candidate settings. Where to do the next experiment?]



iTuned’s Adaptive Sampling Algorithm for Experiment Planning

// Phase I: Bootstrapping

  • Conduct some initial experiments

    // Phase II: Sequential Sampling

  • Loop: Until stopping condition is reached

    • Identify candidate experiments to do next

    • Based on current samples, estimate the utility of each candidate experiment

    • Conduct the next experiment at the candidate with highest utility
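A compact sketch of the two-phase algorithm; `latin_hypercube`, `candidate_set`, and `expected_utility` are placeholders that the following slides flesh out:

```python
def adaptive_sampling(experiment, dim, n_bootstrap=10, budget=30):
    # Phase I: bootstrapping -- a small space-filling design (e.g. Latin hypercube)
    samples = [(X, experiment(X)) for X in latin_hypercube(n_bootstrap, dim)]
    # Phase II: sequential sampling
    while len(samples) < budget:                      # stopping condition
        candidates = candidate_set(samples, dim)      # possible next experiments
        X_next = max(candidates,                      # pick the candidate with the
                     key=lambda X: expected_utility(X, samples))  # highest utility
        samples.append((X_next, experiment(X_next)))
    return min(samples, key=lambda s: s[1])           # best (lowest-y) setting seen
```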



Utility of an Experiment

  • Let <X1,y1>, …, <Xn,yn> be the samples from the n experiments done so far

  • Let <X*,y*> be the best setting so far (i.e., y* = min_i y_i)

    • WLOG, assuming minimization

  • U(X), the utility of an experiment at X, is:

    // y = F(X)

    • y* − y if y* > y

    • 0 otherwise

  • However, U(X) poses a chicken-and-egg problem

    • y will be known only after experiment is run at X

  • Goal: Compute expected utility EU(X)
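The utility itself is a one-liner once y is known; the chicken-and-egg problem above is that it must be estimated before the experiment is run:

```python
def utility(y_star, y):
    """Improvement over the best performance seen so far (y* = min of all y_i),
    assuming minimization as on the slide."""
    return max(y_star - y, 0.0)
```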



Expected Utility of an Experiment

  • Suppose we have the probability density function of y (y is the perf at X)

    • Prob(y = v | <Xi,yi> for i=1,…,n)

  • Then, EU(X) = ∫_{v=−∞}^{+∞} U(X) · Prob(y = v) dv

    EU(X) = ∫_{v=−∞}^{y*} (y* − v) · Prob(y = v) dv

  • Goal: Compute Prob(y = v | <Xi,yi> for i=1,…,n)
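Given draws from that predictive density, EU(X) is just the average improvement; a minimal Monte Carlo sketch (NumPy assumed):

```python
import numpy as np

def expected_utility_mc(y_star, y_draws):
    """Monte Carlo estimate of EU(X) from draws of y ~ Prob(y | samples so far)."""
    return np.mean(np.maximum(y_star - np.asarray(y_draws), 0.0))
```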



Model: Gaussian Process Representation (GRS) of a Response Surface

  • GRS models the response surface as: y(X) = g(X) + Z(X) (+ ε(X) for measurement error)

    • E.g., g(X) = x1 − 2x2 + 0.1x1² (learned using common techniques)

    • Z: Gaussian Process to capture regression residual



Primer on Gaussian Process

  • Univariate Gaussian distribution

    • G = N(,)

    • Described by mean , variance 

  • Multivariate Gaussian distribution

    • [G1, G2, …, Gn]

    • Described by mean vector and covariance matrix

  • Gaussian Process

    • Generalizes multivariate Gaussian to arbitrary number of dimensions

    • Described by mean and covariance functions
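To make "described by mean and covariance functions" concrete, here is a small NumPy sketch that draws correlated function values over a grid of settings; the squared-exponential covariance used here is one common choice, not necessarily the one in this deck:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 50)                            # 50 settings of one parameter
cov = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.02)   # nearby settings correlate
cov += 1e-8 * np.eye(50)                                 # jitter for numerical stability
draws = np.random.multivariate_normal(np.zeros(50), cov, size=3)
# Each row of `draws` is one plausible surface over the 50 settings: the process is
# fully described by its mean vector (zeros here) and covariance matrix.
```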



Model: Gaussian Process Representation (GRS) of a Response Surface

  • GRS captures the response surface as: y(X) = g(X) + Z(X) (+ ε(X) for measurement error)

  • If Z is a Gaussian process, then:

    • [Z(X1),…,Z(Xn),Z(X)] is multivariate Gaussian

    • Z(X) | Z(X1),…,Z(Xn) is a univariate Gaussian

    • ⇒ y(X) is a univariate Gaussian



Parameters of the GRS Model

  • [Z(X1),…,Z(Xn)] is multivariate Gaussian

    • Z(Xi) has zero mean

    • Covariance(Z(Xi), Z(Xj)) ∝ exp(−Σ_k θ_k |x_ik − x_jk|^γ_k)

      • Residuals at nearby points have higher correlation

      • θ_k, γ_k learned from <X1,y1>, …, <Xn,yn>
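A sketch of how the residual process is used in practice: build the covariance matrix from a kernel of the form above and condition on the observed residuals to get the Gaussian prediction at a new setting X. This is standard GP regression algebra; the θ_k, γ_k values would be learned, but the helper below simply takes them as inputs:

```python
import numpy as np

def kernel(Xi, Xj, theta, gamma):
    """Covariance(Z(Xi), Z(Xj)) ∝ exp(-sum_k theta_k |x_ik - x_jk|^gamma_k).
    Xi, Xj, theta, gamma are NumPy arrays of length m (one entry per parameter)."""
    return np.exp(-np.sum(theta * np.abs(Xi - Xj) ** gamma))

def predict_residual(X, Xs, zs, theta, gamma, noise=1e-6):
    """Mean and variance of the univariate Gaussian Z(X) | Z(X1),...,Z(Xn) = zs."""
    K = np.array([[kernel(a, b, theta, gamma) for b in Xs] for a in Xs])
    K += noise * np.eye(len(Xs))                          # jitter / measurement noise
    k_star = np.array([kernel(X, a, theta, gamma) for a in Xs])
    mean = k_star @ np.linalg.solve(K, zs)
    var = kernel(X, X, theta, gamma) - k_star @ np.linalg.solve(K, k_star)
    return mean, max(var, 0.0)
```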



Use of the GRS Model

  • Recall our goal is to compute

    • EU(X) = ∫_{v=−∞}^{y*} (y* − v) · Prob(y = v) dv

    • Prob(y = v | <Xi,yi> for i=1,…,n)

  • Lemma: Using the GRS, we can compute the mean μ(X) and variance σ²(X) of the Gaussian y(X)

  • Theorem: EU(X) has a closed form that is a product of:

    • A term that depends on (y* − μ(X))

    • A term that depends on σ(X)

  • It follows that settings X with high EU are either:

    • Close to known good settings (for exploitation)

    • In highly uncertain regions (for exploration)
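Under a Gaussian prediction y(X) ~ N(μ(X), σ²(X)), this expected utility matches the standard Expected-Improvement closed form for minimization; the sketch below assumes SciPy and may differ cosmetically from the exact expression in the iTuned paper, but it exhibits the exploit/explore structure described above:

```python
from scipy.stats import norm

def expected_utility(y_star, mu, sigma):
    """Closed-form EU(X) when y(X) ~ N(mu, sigma^2), minimization assumed."""
    if sigma <= 0.0:
        return max(y_star - mu, 0.0)          # no uncertainty: plain improvement
    z = (y_star - mu) / sigma                 # large when X is near known good settings
    return sigma * (z * norm.cdf(z) + norm.pdf(z))  # grows with uncertainty sigma
```

In the adaptive-sampling loop sketched earlier, this is the scoring function applied to each candidate after μ(X) and σ(X) have been computed from the GRS model.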



Example

  • Settings X with high EU are either:

    • Close to known good settings (high y* − μ(X))

    • In highly uncertain regions (high σ(X))

[Figure: a 1-dim example showing the unknown actual surface, the GRS prediction μ(X) with its uncertainty band, the best value so far y*, and the resulting EU(X) curve.]


Where to Conduct Experiments?

[Diagram: candidate places to run experiments: the production platform (clients → middle tier → DBMS → data), a standby platform kept up to date via Write-Ahead Log (WAL) shipping, and a test platform with test data.]



iTuned’s Solution

  • Exploit underutilized resources with minimal impact on the production workload

  • DBA/User designates resources where experiments can be run

    • E.g., production/standby/test

  • DBA/User specifies policies that dictate when experiments can be run

    • Separate regular use (home) from experiments (garage)

    • Example: If CPU, memory, & disk utilization have been below 10% for the past 15 minutes, then the resource can be used for experiments (see the sketch below)
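A minimal sketch of such a garage-eligibility check; the per-minute utilization samples and their fields are hypothetical stand-ins for whatever monitoring the platform provides:

```python
def garage_eligible(samples, cpu_max=0.10, mem_max=0.10, disk_max=0.10):
    """True if every utilization reading from the past 15 minutes is under 10%,
    i.e. the resource may be moved from 'home' use to the experiment 'garage'.
    samples: per-minute readings with hypothetical cpu/mem/disk fields."""
    return all(s.cpu < cpu_max and s.mem < mem_max and s.disk < disk_max
               for s in samples)
```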



One Implementation of Home/Garage

[Diagram: the production platform (clients → middle tier → DBMS → data) ships WAL to a standby machine. On the standby, the "home" DBMS keeps applying the WAL, while a copy-on-write snapshot of the data feeds a "garage" DBMS that serves as the workbench for experiments, driven by iTuned's interface, engine, and experiment planner & scheduler.]



Overheads are Low



Empirical Evaluation (1)

  • Cluster of machines with 2GHz processors and 3GB memory

  • Two database systems: PostgreSQL & MySQL

  • Various workloads

    • OLAP: Mixes of heavy-weight TPC-H queries

      • Varying #queries, #query_types, and MPL

      • Scale factors 1 and 10

    • OLTP: TPC-W and RUBiS

  • Tuning of up to 30 configuration parameters



Empirical Evaluation (2)

  • Techniques compared

    • Default parameter settings shipped (D)

    • Manual rule-based tuning (M)

    • Smart Hill Climbing (S): State-of-the-art technique

    • Brute-Force search (B): Run many experiments to find approximation to optimal setting

    • iTuned (I)

  • Evaluation metrics

    • Quality: workload running time after tuning

    • Efficiency: time needed for tuning



Comparison of Tuning Quality



iTuned’s Scalability Features (1)

  • Identify important parameters quickly

  • Run experiments in parallel

  • Stop low-utility experiments early

  • Compress the workload

  • Work in progress:

    • Apply database-specific knowledge

    • Incremental tuning

    • Interactive tuning



iTuned's Scalability Features (2)

  • Identify important parameters quickly

    • Using sensitivity analysis with a few experiments

[Figure: #Parameters = 9, #Experiments = 10]
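The deck does not spell out the sensitivity-analysis details; one simple possibility is a main-effects screen over a small two-level design, along these lines (hypothetical sketch; `run_experiment` as before):

```python
import random

def screen_parameters(levels, run_experiment, n_experiments=10):
    """Rank parameters by main effect: avg perf at 'high' minus avg at 'low'.
    levels: e.g. {"work_mem": ("4MB", "64MB"), "shared_buffers": ("128MB", "1GB")}."""
    params = list(levels)
    design = [tuple(random.randint(0, 1) for _ in params)   # a few random two-level
              for _ in range(n_experiments)]                # configurations
    results = [(row, run_experiment({p: levels[p][bit] for p, bit in zip(params, row)}))
               for row in design]
    effects = {}
    for i, p in enumerate(params):
        hi = [y for row, y in results if row[i] == 1]
        lo = [y for row, y in results if row[i] == 0]
        effects[p] = abs(sum(hi) / len(hi) - sum(lo) / len(lo)) if hi and lo else 0.0
    return sorted(effects.items(), key=lambda kv: -kv[1])    # biggest effect first
```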



iTuned’s Scalability Features (3)



Roadmap

  • Use cases of experiment-driven mgmt.

    • Query tuning, benchmarking, Hadoop, testing, …

  • iTuned: Tool for DB conf parameter tuning

    • End-to-end application of experiment-driven mgmt.

  • .eX: Language and run-time system that brings experiment-driven mgmt. to users & tuning tools



Back of the Envelope Calculation

  • Cost of running these experiments for 1 day on Amazon Web Services

    • Server: $10/day

    • Storage: $0.4/day

    • I/O: $5/day

    • TOTAL: ≈ $15/day

  • DBAs cost $300/day; Consultants cost $100/hr

  • 1 Day of experiments gives a wealth of info.

    • TPC-H, TPC-W, RUBiS workloads; 10-30 conf. params



.eX: Power of Experiments to the People

  • Users & tools express needs as scripts in eXL (eXperiment Language)

  • .eX engine plans and conducts experiments on designated resources

  • Intuitive visualization of results

[Diagram: an eXL script goes to the .eX language processor, which drives a run-time engine that conducts experiments on the designated resources.]


Current Focus of .eX

  • Parts of an eXL script

    • Query: (approx.) response surface mapping, search

    • Expt. setup & monitoring

    • Constraints & optimization: resources, cost, time

[Diagram: the experiment-driven management loop from earlier (plan experiments → conduct experiments on workbench → process output to extract information → are more experiments needed?); .eX automatically generates this experiment-driven workflow. A hypothetical illustration of what an eXL script carries follows below.]
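The deck does not show eXL syntax, so the following is purely a hypothetical illustration, written as a Python dict rather than actual eXL, of the three parts listed above:

```python
# Hypothetical illustration only -- NOT actual eXL syntax.
experiment_request = {
    "query": {                      # what to learn: map or search a response surface
        "type": "search",
        "response": "workload_runtime",
        "parameters": ["shared_buffers", "work_mem", "effective_cache_size"],
    },
    "setup": {                      # how to run and monitor each experiment
        "workload": "tpch_mix.sql",
        "metrics": ["runtime", "cpu_util"],
    },
    "constraints": {                # resource / cost / time budget
        "max_experiments": 20,
        "deadline_hours": 24,
        "resources": ["standby", "test"],
    },
}
```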



Summary

  • Automated expt-driven mgmt: The time has come

    • Need, infrastructure, & promise are all there

  • We have built many tools around this paradigm

    • http://www.cs.duke.edu/~shivnath/dotex.html

  • Poses interesting questions and challenges

    • Make it easy for users/admins to do expts

    • Make experiments first-class citizens in systems

