Tuning Database Configuration Parameters with iTuned - PowerPoint PPT Presentation

Presentation Transcript

Tuning Database Configuration Parameters with iTuned

Vamsidhar Thummala

Collaborators: Songyun Duan, Shivnath Babu

Duke University

Performance Tuning of Database Systems
  • Physical design tuning
    • Indexes [SIGMOD’98, VLDB’04]
    • Materialized views [SIGMOD’00, VLDB’04]
    • Partitioning [SIGMOD’82, SIGMOD’88, SIGMOD’89]
  • Statistics tuning [ICDE’00, ICDE’07]
  • SQL query tuning [VLDB’04]
  • Configuration parameter (server parameter) tuning [This talk]

Database Configuration Parameters
  • Parameters that control
    • Memory distribution
      • shared_buffers, work_mem
    • I/O optimization
      • wal_buffers, checkpoint_segments, checkpoint_timeout, fsync
    • Parallelism
      • max_connections
    • Optimizer’s cost model
      • effective_cache_size, random_page_cost, default_statistics_target, enable_indexscan
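
For illustration only, one candidate setting of these parameters can be viewed as a single configuration vector; the values below are hypothetical placeholders, not tuning advice:

```python
# Hypothetical example: one candidate setting of the configuration-parameter
# vector for PostgreSQL. Values are illustrative placeholders, not advice.
candidate_config = {
    # Memory distribution
    "shared_buffers": "1GB",
    "work_mem": "64MB",
    # I/O optimization
    "wal_buffers": "8MB",
    "checkpoint_segments": 16,
    "checkpoint_timeout": "5min",
    "fsync": "on",
    # Parallelism
    "max_connections": 100,
    # Optimizer's cost model
    "effective_cache_size": "2GB",
    "random_page_cost": 4.0,
    "default_statistics_target": 100,
    "enable_indexscan": "on",
}
```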
Need for Automated Configuration Parameter Tuning (2/2)
  • Number of threads related to configuration parameter tuning vs. other topics on the “PostgreSQL performance” mailing list

Recently, there has been some effort from the community to summarize the important parameters [PgCon’08]

Typical Approach: Trial and Error

1. User*:

Hi, list. I've just upgraded pgsql from 8.3 to 8.4. I've used pgtune before and everything worked fine for me. And now i have ~93% cpu load. Here's changed values of config:

default_statistics_target = 50
maintenance_work_mem = 1GB
constraint_exclusion = on
checkpoint_completion_target = 0.9
effective_cache_size = 22GB
work_mem = 192MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 7680MB
max_connections = 80

My box is Nehalem 2xQuad 2.8 with 32Gb RAM, and there's only postgresql working on it.

What parameters I should give more attention on?

2. Response:

All the values seem quite reasonable to me. What about the _cost variables? I guess one or more queries are evaluated using a different execution plan, probably sequential scan instead of index scan, hash join instead of merge join, or something like that. Try to log the "slow" statements - see "log_min_statement_duration". That might give you slow queries (although not necessarily the ones causing problems), and you can analyze them. What is the general I/O activity? Is there a lot of data read/written to the disks, is there a lot of I/O wait?

PS: Was the database analyzed recently?

*http://archives.postgresql.org/pgsql-performance/2009-07/msg00323.php, 30th Jul 2009

Doing Experiments to Understand the Underlying Response Surface
  • TPC-H 4 GB database, 1 GB memory, Query 18
Challenges
  • Large number of configuration parameters
    • Total ~ 100
    • 10-15 are important depending on OLTP vs. OLAP
    • Brute-Force will not work
      • Results in exponential number of experiments
  • Parameters can have complex interactions
    • Sometimes non-monotonic and counterintuitive
    • Limits the one-parameter-at-a-time approach
  • No holistic configuration tuning tools
    • Existing techniques focus on specific memory-related parameters or recommend default settings
Our Solution: iTuned
  • Practical tool that uses planned experiments to tune configuration parameters
    • An adaptive sampling algorithm to plan the sequence of experiments (Planner)
    • A novel workbench for conducting experiments in enterprise systems (Executor)
    • Features for scalability like sensitivity analysis and use of parallel experiments to reduce
      • Total number of experiments
      • Per experiment running time/cost
Outline of the talk
  • iTuned Planner
  • iTuned Executor
  • Evaluation
  • Conclusion
Problem Abstraction
  • Given
    • A database D and workload W
    • Configuration Parameter Vector X = <x1, x2, …, xm >
    • Cost Budget R
    • Goal: Find high performance setting X* subject to the budget constraint
    • Problem: The response surface y(X) is unknown
  • Solution Approach:
    • Conduct experiments to learn about the response surface
    • Each experiment has some cost and gives sample <Xi, yi >
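
To make this abstraction concrete, here is a minimal skeleton (names and types are illustrative, not from the paper) showing where the planner and executor plug in:

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]        # one setting of <x1, x2, ..., xm>
Sample = Tuple[Config, float]    # (Xi, yi): a configuration and its measured performance

def tune(run_experiment: Callable[[Config], float], budget: int) -> Config:
    """Experiment-driven tuning skeleton (illustrative only).

    run_experiment(X) conducts one experiment at configuration X and returns the
    measured performance y (e.g., workload running time); budget plays the role
    of the cost budget R, counted here simply as a number of experiments.
    """
    samples: List[Sample] = []
    for _ in range(budget):
        x_next = choose_next_experiment(samples)  # the planner's job
        y = run_experiment(x_next)                # the executor's job
        samples.append((x_next, y))
    # Report the best (lowest running time) configuration X* seen within budget.
    return min(samples, key=lambda s: s[1])[0]

def choose_next_experiment(samples: List[Sample]) -> Config:
    """Placeholder for iTuned's adaptive sampling, sketched on the next slides."""
    raise NotImplementedError
```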
iTuned Planner
  • Uses an adaptive sampling algorithm

    • Bootstrapping: conduct an initial set of experiments
      • Latin Hypercube Sampling
      • k-Furthest-First
    • Sequential Sampling: select the NEXT experiment XNEXT based on previous samples
      • Calculate the improvement IP(X) of each candidate sample and select the sample with the highest improvement as XNEXT
    • Stopping criteria: based on the cost budget R
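
A minimal sketch of this loop, assuming workload running time is being minimized; latin_hypercube below is a textbook implementation, and expected_improvement is the EIP-based selection sketched after the Gaussian Process slides (all names are illustrative, not iTuned's actual code):

```python
import random
from typing import Dict, List, Tuple

Config = Dict[str, float]
Sample = Tuple[Config, float]

def latin_hypercube(ranges: Dict[str, Tuple[float, float]],
                    n: int, seed: int = 0) -> List[Config]:
    """Latin Hypercube Sampling: each parameter's range is cut into n equal
    strata, and every stratum is used exactly once per parameter."""
    rng = random.Random(seed)
    columns = {}
    for name, (lo, hi) in ranges.items():
        strata = list(range(n))
        rng.shuffle(strata)                  # independent permutation per parameter
        width = (hi - lo) / n
        columns[name] = [lo + (s + rng.random()) * width for s in strata]
    return [{name: columns[name][i] for name in ranges} for i in range(n)]

def adaptive_sampling(ranges: Dict[str, Tuple[float, float]],
                      run_experiment, n_init: int, budget: int) -> Config:
    """Bootstrap with LHS, then repeatedly run the candidate with the highest
    expected improvement until the experiment budget is exhausted."""
    samples: List[Sample] = [(x, run_experiment(x))
                             for x in latin_hypercube(ranges, n_init)]
    while len(samples) < budget:
        candidates = latin_hypercube(ranges, 100, seed=len(samples))
        x_next = max(candidates, key=lambda x: expected_improvement(x, samples))
        samples.append((x_next, run_experiment(x_next)))
    return min(samples, key=lambda s: s[1])[0]   # best = lowest running time
```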

Improvement of an Experiment
  • Improvement IP(X) is defined as:
    • y(X*) – y(X) if y(X) < y(X*)
    • 0 otherwise
  • Issue: IP(X) is known only after y(X) is known, i.e., an experiment has to be conducted at X to measure y(X)
  • We estimate IP(X) by calculating the Expected Improvement, EIP(X)
  • To calculate EIP(X), we need to approximate
    • The improvement at each configuration setting
    • The probability density function of y(X) (an uncertainty estimate)
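
In symbols (following the slide's definition, with lower y meaning better performance), the improvement and its expectation are:

```latex
\mathrm{IP}(X) =
\begin{cases}
  y(X^{*}) - y(X), & \text{if } y(X) < y(X^{*}) \\
  0,               & \text{otherwise}
\end{cases}
\qquad
\mathrm{EIP}(X) = \mathbb{E}\bigl[\mathrm{IP}(X)\bigr]
  = \int_{-\infty}^{\,y(X^{*})} \bigl(y(X^{*}) - y\bigr)\, p(y \mid X)\, dy
```

where p(y | X) is the probability density that the Gaussian Process (next slides) supplies for the unknown performance at X.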

Conducting Experiment at XNEXT using Expected Improvement

[Figure: EIP(X) plotted over a 1D projection of the configuration space; the NEXT experiment is conducted where EIP(X) is highest]
Generating pdf through Gaussian Process
  • We estimate the performance as y(X) = f(X) + Z(X)
    • Where f(X) is a regression model and Z(X) is the residual of the model, captured through a Gaussian Process
  • The Gaussian Process Z(X) captures the uncertainty of the surface
    • Z(X) is specified by mean and covariance functions
    • We use a zero-mean Gaussian Process
    • The covariance is a kernel function that inversely depends on the distance between two samples Xi and Xj
      • Residuals at nearby points exhibit higher correlation
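
As a hedged illustration of this model (not the paper's exact kernel, and omitting the regression term f(X)), here is a minimal zero-mean Gaussian Process posterior with a squared-exponential kernel, whose correlation decays with the distance between samples:

```python
import numpy as np

def sq_exp_kernel(A: np.ndarray, B: np.ndarray, length_scale: float = 1.0) -> np.ndarray:
    """Squared-exponential kernel: correlation decays with distance between samples."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior(X_train, y_train, X_test, noise: float = 1e-6):
    """Posterior mean and variance of a zero-mean GP at the test configurations."""
    X_train, y_train, X_test = map(np.asarray, (X_train, y_train, X_test))
    K = sq_exp_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = sq_exp_kernel(X_train, X_test)
    K_ss = sq_exp_kernel(X_test, X_test)
    K_inv = np.linalg.inv(K)
    mean = K_s.T @ K_inv @ y_train                      # predicted performance
    var = np.diag(K_ss - K_s.T @ K_inv @ K_s)           # uncertainty estimate
    return mean, np.maximum(var, 0.0)
```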
Calculating Expected Improvement using Gaussian Process
  • Lemma: The Gaussian Process models y(X) at each setting X as a univariate Gaussian with known mean and variance
  • Theorem: There exists a closed form for EIP(X)
  • [See paper for proof and details]
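
For reference, the standard closed form for expected improvement under a Gaussian posterior (writing μ(X) and σ(X) for the GP's predictive mean and standard deviation, y* = y(X*) for the best performance observed so far, and Φ, φ for the standard normal CDF and PDF; see the paper for iTuned's exact statement) is:

```latex
z(X) = \frac{y^{*} - \mu(X)}{\sigma(X)}, \qquad
\mathrm{EIP}(X) = \bigl(y^{*} - \mu(X)\bigr)\,\Phi\bigl(z(X)\bigr) + \sigma(X)\,\varphi\bigl(z(X)\bigr)
```

Tying this back to the earlier planner sketch, the selection step could then be implemented as follows (illustrative; it reuses the hypothetical gp_posterior sketch above):

```python
from math import erf, exp, pi, sqrt

def expected_improvement(x, samples):
    """EIP(x) under the GP posterior; samples is the list of (Config, y) pairs."""
    names = sorted(x)                                   # fix a parameter order
    X_train = [[cfg[n] for n in names] for cfg, _ in samples]
    y_train = [y for _, y in samples]
    mean, var = gp_posterior(X_train, y_train, [[x[n] for n in names]])
    mu, sigma = float(mean[0]), float(var[0]) ** 0.5
    y_best = min(y_train)                               # y(X*): best seen so far
    if sigma == 0.0:
        return max(y_best - mu, 0.0)
    z = (y_best - mu) / sigma
    cdf = 0.5 * (1.0 + erf(z / sqrt(2.0)))              # standard normal CDF
    pdf = exp(-0.5 * z * z) / sqrt(2.0 * pi)            # standard normal PDF
    return (y_best - mu) * cdf + sigma * pdf
```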
Tradeoff between Exploration vs. Exploitation
  • Settings X with high EIP are either
    • Close to known good settings
      • Assists in exploitation
    • In highly uncertain regions
      • Assists in exploration


Gaussian Process tries to achieve a balance between exploration and exploitation

Outline of the talk
  • iTuned Planner
  • iTuned Executor
  • Evaluation
  • Conclusion
Goal of the Executor
  • To conduct experiments
    • Without impacting production system
    • As close to real production runs as possible
  • Traditional choices
    • Production system itself
      • May impact running applications
    • Test system
      • Hard to replicate exact production settings
      • Manual set-up
iTuned Executor
  • Exploits underutilized resources to conduct experiments
    • Production systems, Stand-by systems, Test systems, On the cloud
  • Design:
    • Mechanisms: Home & garage containers, efficient snapshots of data
    • Policies: Specified by admins
      • If CPU, memory, disk utilization is below 20% for the past 10 minutes, then 70% resources can be taken for experiments
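
A minimal sketch of how such an admin-specified policy might be checked (the Policy fields and the utilization-sample format are assumptions mirroring the example above):

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Policy:
    """Admin-specified policy mirroring the example above."""
    max_utilization: float = 0.20   # CPU/memory/disk must stay below 20% ...
    window_minutes: int = 10        # ... for the past 10 minutes ...
    grant_fraction: float = 0.70    # ... then 70% of resources may be taken

def resources_for_experiments(policy: Policy,
                              utilization_history: List[Dict[str, float]]) -> float:
    """utilization_history holds one sample per minute, e.g. {"cpu": 0.1, "mem": 0.15, "disk": 0.05}.
    Returns the fraction of machine resources the executor may take right now."""
    recent = utilization_history[-policy.window_minutes:]
    if len(recent) < policy.window_minutes:
        return 0.0                  # not enough history to decide yet
    idle = all(max(sample.values()) < policy.max_utilization for sample in recent)
    return policy.grant_fraction if idle else 0.0
```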
Example Mechanism: Set-up on a Stand-by System using ZFS, Solaris, and PITR

[Architecture diagram: the production environment (clients, middle tier, DBMS, database) ships its write-ahead log to a standby machine. On the standby, the "home" container applies the WAL continuously, while "garage" containers form the workbench for conducting experiments against copy-on-write snapshots of the database. A policy manager, experiment planner & scheduler, interface, and engine coordinate the experiments.]
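
As an illustration of the copy-on-write snapshot mechanism, a garage experiment's database clone could be created with standard ZFS commands roughly as follows (the dataset name is hypothetical, and the actual home/garage container setup is not shown):

```python
import subprocess

def clone_database_for_experiment(dataset: str, experiment_id: str) -> str:
    """Create a copy-on-write clone of the standby database's ZFS dataset so a
    garage container can experiment on it without touching the home container.

    dataset is hypothetical, e.g. "tank/pgdata"; returns the clone's dataset name.
    """
    snapshot = f"{dataset}@exp-{experiment_id}"
    clone = f"{dataset}-garage-{experiment_id}"
    subprocess.run(["zfs", "snapshot", snapshot], check=True)      # near-instant snapshot
    subprocess.run(["zfs", "clone", snapshot, clone], check=True)  # copy-on-write clone
    return clone

def destroy_experiment_clone(dataset: str, experiment_id: str) -> None:
    """Tear down the clone and its snapshot once the experiment finishes."""
    subprocess.run(["zfs", "destroy", f"{dataset}-garage-{experiment_id}"], check=True)
    subprocess.run(["zfs", "destroy", f"{dataset}@exp-{experiment_id}"], check=True)
```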

Outline of the talk
  • iTuned Planner
  • iTuned Executor
  • Evaluation
  • Conclusion
Empirical Evaluation (1)
  • Two database systems, PostgreSQL v8.2 and MySQL v5.0
  • Cluster of machines with 2GHz processor and 3GB RAM
  • Mixture of workloads
    • OLAP: Mixes of TPC-H queries
      • Varying #queries, #query_types, and MPL
      • Varying scale factors (SF = 1 to SF = 10)
    • OLTP: TPC-W and RUBiS
  • Number of parameters varied: up to 30
Empirical Evaluation (2)
  • Techniques compared
    • Default parameter settings shipped (D)
    • Manual rule-based tuning (M)
    • Smart Hill Climbing (S)
      • State-of-the-art technique
    • Brute-Force search (B)
      • Run many experiments to find approximation to optimal setting
    • iTuned (I)
  • Evaluation metrics
    • Quality: workload running time after tuning
    • Efficiency: time needed for tuning
Comparison of Tuning Quality

[Charts: workload running time after tuning with D, M, S, B, and I, for a simple workload with one TPC-H query (Q1) and a complex workload with a mix of TPC-H queries (Q1 + Q18)]

iTuned’s Efficiency and Scalability
  • Run experiments in parallel
  • Abort low-utility experiments early
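
For the second point, an early-abort check might look like the sketch below (the slack-factor rule and the start_workload helper are assumptions, not the paper's exact criterion; start_workload is taken to return a subprocess.Popen-like handle):

```python
import time

def run_with_early_abort(start_workload, best_time_so_far: float,
                         slack: float = 1.2, poll_seconds: float = 1.0):
    """Run one experiment but abort once its running time exceeds the best
    configuration seen so far by a slack factor, since it can no longer win.

    start_workload is assumed to return a subprocess.Popen-like handle.
    """
    handle = start_workload()
    started = time.time()
    while handle.poll() is None:                     # workload still running
        if time.time() - started > slack * best_time_so_far:
            handle.kill()                            # low-utility experiment: abort early
            return None                              # treat as "worse than the best"
        time.sleep(poll_seconds)
    return time.time() - started                     # completed: measured running time
```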
iTuned’s Sensitivity Analysis
  • Identify important parameters quickly
    • Use Sensitivity Analysis to reduce experiments
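
One simple way to obtain such sensitivities from the fitted model (an illustrative approach, not necessarily iTuned's exact formulation) is to sweep each parameter across its range while holding the others at their midpoints, reusing the hypothetical gp_posterior sketch from earlier:

```python
def sensitivity(ranges, samples, steps: int = 20):
    """Rough per-parameter sensitivity: spread of GP-predicted performance when one
    parameter sweeps its range while the others are held at their midpoints."""
    names = sorted(ranges)
    mid = {n: (lo + hi) / 2.0 for n, (lo, hi) in ranges.items()}
    X_train = [[cfg[n] for n in names] for cfg, _ in samples]
    y_train = [y for _, y in samples]
    scores = {}
    for n, (lo, hi) in ranges.items():
        sweep = [dict(mid, **{n: lo + i * (hi - lo) / (steps - 1)}) for i in range(steps)]
        mean, _ = gp_posterior(X_train, y_train, [[c[m] for m in names] for c in sweep])
        scores[n] = float(mean.max() - mean.min())
    return scores  # parameters with larger spread matter more and are kept for tuning
```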
Related work
  • Parameter tuning
    • Focus on specific classes of parameters (mainly memory-related buffer pools) [ACM TOS’08, VLDB’06]
    • Statistical Approach for Ranking Parameters [SMDB’08]
    • Brute force approach to experiment design
  • Tools like DB2 Configuration advisor and pg_tune recommend default settings
  • Adaptive approaches to sampling [SIGMETRICS’06]
  • Work related to iTuned’s executor
    • Oracle SQL Performance Analyzer [SIGMOD’09, ICDE’09]
    • Virtualization, snapshots, suspend-resume
Conclusion
  • iTuned automates the tuning process by adaptively conducting experiments
    • Our initial results are promising
  • Future work
    • Apply database-specific knowledge to keep the optimizer in the loop for end-to-end tuning
      • Query plan information
    • Workload compression
    • Experiments in the cloud
Questions?
  • Thank You