
Automatic Index Selection for Shifting Workloads


Presentation Transcript


1. Automatic Index Selection for Shifting Workloads
Karl Schnaitter and Neoklis Polyzotis (UC Santa Cruz)
Serge Abiteboul (INRIA and University of Paris 11)
Tova Milo (University of Tel Aviv)

2. On-line Index Selection
• A heavily used database needs indexes
• But an evolving workload makes it tough
  • Popularity shifts over time
  • Applications come and go
• The problem: on-line index selection
  Maintain a dynamic set of indexes that improve performance of current queries

3. Attempted On-line Solution
• Can off-line techniques be adapted?
  • Off-line algorithms need a fixed query load
  • This is not given in the on-line problem
• We could try a loop:
  1. Gather query workload W (no obvious implementation)
  2. Optimize indexes for W (can be very slow; creation cost not considered)
  3. Loop back to step 1

4. Our Solution
• COLT: Continuous On-Line Tuning
• A system for on-line index selection
• Features:
  • Continuously tracks query load
  • Evaluates indexes with what-if optimizer
  • Adaptively controls overhead
  • Selects indexes based on recent trends
• Prototype implementation in PostgreSQL

5. Problem Assumptions
• We make the following assumptions:
  • Data is in a relational database
  • Workload of SELECT statements
  • Restricted to single-column indexes
  • Given a budget for index storage
• This problem still has major challenges
• Lifting these assumptions is future work

6. System Architecture
[Diagram: queries flow through the Parser to the Optimizer and Executor; COLT submits indexes to profile, with each query & plan, to the what-if interface and receives index benefits; between epochs, COLT issues CREATE INDEX ... / DROP INDEX ... statements against the Database; time is divided into epochs]

7. COLT Internal Organization
[Diagram: COLT maintains recent queries and candidate indexes; it sends indexes to profile, together with the query & plan, to the what-if interface, which returns index benefits]

8. Organizing the Queries
• Similar queries are grouped in clusters
• Each query is placed in a cluster on arrival
• Aggregate benefit statistics for each cluster
• Confidence intervals of past measurements (see the sketch below)
[Diagram: five queries (Query 1–Query 5) distributed across Cluster 1 and Cluster 2]
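As a concrete illustration, here is a minimal Python sketch of per-cluster benefit statistics with confidence intervals. The `ClusterStats` class, Welford's running-variance method, and the template-based clustering key are assumptions for illustration, not COLT's actual implementation.

```python
import math
from collections import defaultdict

class ClusterStats:
    """Running benefit statistics for one query cluster (illustrative)."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations (Welford's method)

    def add(self, benefit):
        self.n += 1
        delta = benefit - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (benefit - self.mean)

    def confidence_interval(self, z=1.96):
        """Approximate 95% CI for the mean benefit of this cluster."""
        if self.n < 2:
            return (float("-inf"), float("inf"))
        stderr = math.sqrt(self.m2 / (self.n - 1)) / math.sqrt(self.n)
        return (self.mean - z * stderr, self.mean + z * stderr)

# Hypothetical clustering key: queries with the same template
# (same tables and predicate columns) land in the same cluster.
clusters = defaultdict(ClusterStats)

def on_query(template, measured_benefit):
    clusters[template].add(measured_benefit)

on_query("SELECT ... WHERE orders.date = ?", 12.5)
on_query("SELECT ... WHERE orders.date = ?", 10.0)
print(clusters["SELECT ... WHERE orders.date = ?"].confidence_interval())
```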

9. Organizing the Indexes
Cold Set
• Relevant, but not promising candidates
• Benefit measured with crude metric
Hot Set
• Promising candidates for materialization
• Profiled accurately with what-if calls
Materialized Set
• Indexes materialized by COLT
• Disk storage must fit in budget
• Profiled with reverse what-if calls
[Diagram: new candidates enter the cold set; indexes move between sets and may be discarded]
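A minimal sketch of this three-tier organization, assuming single-column indexes as stated earlier. The `IndexInfo` and `IndexSets` classes and their fields are illustrative; the actual prototype keeps richer state.

```python
from dataclasses import dataclass, field

@dataclass
class IndexInfo:
    """Illustrative record for a single-column candidate index."""
    table: str
    column: str
    size_mb: float = 0.0         # estimated disk footprint
    crude_benefit: float = 0.0   # cheap metric, used for the cold set
    whatif_benefit: float = 0.0  # accurate what-if measurement (hot/materialized)

@dataclass
class IndexSets:
    """Three-tier organization sketched from the slide; not COLT's exact code."""
    cold: dict = field(default_factory=dict)          # relevant but not promising
    hot: dict = field(default_factory=dict)           # promising, profiled via what-if
    materialized: dict = field(default_factory=dict)  # built on disk, fits budget

    def add_candidate(self, idx: IndexInfo):
        key = (idx.table, idx.column)
        if key not in self.hot and key not in self.materialized:
            self.cold[key] = idx  # every new candidate starts in the cold set

    def promote(self, key):
        """Move a cold index into the hot set once its crude benefit justifies it."""
        if key in self.cold:
            self.hot[key] = self.cold.pop(key)

    def discard(self, key):
        self.cold.pop(key, None)  # candidates can also be dropped entirely
```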

10. Key Challenges
Challenge: Selective Profiling
• Which hot and materialized indexes are profiled w.r.t. the current query?
Challenge: Index Selection
• How are index candidates selected for the cold, hot, and materialized sets?

11. Selective Profiling (1/2)
[Diagram: the incoming query is mapped to a cluster (C1–C4); the relevant indexes are drawn from the Cold, Hot, and Materialized sets; a random sample, focused on indexes with uncertain benefit in the cluster, yields the indexes to profile (see the sketch below)]
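The slide indicates that profiling focuses on relevant indexes whose benefit is still uncertain within the query's cluster. A hedged sketch of one way to do that, assuming uncertainty is measured by confidence-interval width and using weighted sampling (Efraimidis-Spirakis); the weighting scheme is my assumption, not taken from the paper.

```python
import random

def pick_indexes_to_profile(ci_width, budget):
    """Sample up to `budget` indexes, favoring uncertain benefit estimates.

    `ci_width` maps an index key to the width of its benefit confidence
    interval within the current query's cluster; a wider interval means a
    more uncertain benefit and a higher chance of being profiled.
    """
    if not ci_width or budget <= 0:
        return []
    # Weighted sampling without replacement: draw a key u^(1/w) per index
    # and keep the largest ones (Efraimidis-Spirakis).
    scored = sorted(
        ci_width,
        key=lambda k: random.random() ** (1.0 / max(ci_width[k], 1e-9)),
        reverse=True,
    )
    return scored[:min(budget, len(scored))]

print(pick_indexes_to_profile({"orders.date": 8.0, "items.sku": 0.5}, budget=1))
```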

12. Selective Profiling (2/2)
• Profiling budget: max what-if calls per epoch
• Budget is adjusted after each epoch
  • Set proportional to potential of hot indexes
  • Potential based on optimistic assumptions
• Result: stable workload → suspend profiling; shifting workload → intensify profiling (see the sketch below)
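A minimal sketch of the budget adjustment, assuming a linear scaling of the hot set's total optimistic potential with a hard cap; the `scale` constant and the cap (matching the "what-if limit = 20" from the parameters slide) are illustrative choices.

```python
def next_profiling_budget(hot_potentials, max_calls=20, scale=0.1):
    """Set next epoch's what-if budget from the hot set's potential.

    `hot_potentials` holds an optimistic benefit estimate per hot index.
    Stable workload: little potential left, so the budget drops toward
    zero, effectively suspending profiling. A workload shift raises the
    estimated potential and profiling intensifies again.
    """
    potential = sum(hot_potentials.values())
    return max(0, min(max_calls, int(scale * potential)))

print(next_profiling_budget({"orders.date": 120.0, "items.sku": 40.0}))  # 16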

13. Index Selection (1/3)
[Diagram: new candidates enter the Cold set]
• How are new candidates chosen?
• Use a crude benefit metric
  • Cost of index access vs. sequential scan
  • Approximate and cheap to compute
• When each query arrives:
  • Compute crude benefit of relevant indexes
  • Indexes with benefit > 0 become candidates (see the sketch below)
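A sketch of such a crude metric using textbook cost estimates for an unclustered B-tree lookup versus a full sequential scan. The exact formulas are assumptions; the slide only states that index access cost is compared against a sequential scan.

```python
def crude_benefit(n_pages, n_rows, selectivity, index_height=3):
    """Crude benefit of a single-column index for one predicate (illustrative).

    Sequential scan reads every page; the index pays a tree descent plus,
    pessimistically, one page fetch per matching row (unclustered access).
    A positive result makes the index a cold-set candidate.
    """
    seq_scan_cost = n_pages
    index_cost = index_height + selectivity * n_rows
    return seq_scan_cost - index_cost

# A 10%-selective predicate on a 10,000-page, 500,000-row table:
b = crude_benefit(n_pages=10_000, n_rows=500_000, selectivity=0.1)
print(b)  # negative here, so this index would not become a candidate
```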

14. Index Selection (2/3)
[Diagram: indexes move from the Cold set to the Hot set]
• How are hot indexes chosen?
• At the end of each epoch:
  • Get crude benefit of hot and cold indexes
  • Find cut-off point to be in hot set
[Chart: indexes ordered by crude benefit, split into COLD and HOT groups; the cut-off point is derived from a two-cluster model (see the sketch below)]
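One plausible reading of the "two-cluster model": split the one-dimensional crude-benefit values into two groups (a 1-D two-means) and place the cut-off between them. The paper's exact derivation may differ.

```python
def two_cluster_cutoff(benefits):
    """Split 1-D benefit values into two groups and return the cut-off.

    Tries every split point over the sorted values and keeps the one
    minimizing within-group squared error; indexes above the cut-off
    would form the hot set. Illustrative only.
    """
    xs = sorted(benefits)
    if len(xs) < 2:
        return xs[0] if xs else 0.0

    def sse(group):
        if not group:
            return 0.0
        m = sum(group) / len(group)
        return sum((x - m) ** 2 for x in group)

    best_i = min(range(1, len(xs)), key=lambda i: sse(xs[:i]) + sse(xs[i:]))
    return (xs[best_i - 1] + xs[best_i]) / 2.0  # midpoint between the clusters

print(two_cluster_cutoff([0.5, 0.8, 1.1, 9.0, 10.5, 11.2]))  # ~5.05
```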

15. Index Selection (3/3)
[Diagram: indexes move from the Hot set to the Materialized set; chart of observed vs. predicted benefit over time]
• How to choose the materialized set?
• At the end of each epoch:
  • Predict benefit of hot and materialized indexes
  • Cost of materialization is discounted
  • Select indexes to maximize total benefit (see the sketch below)
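Choosing the materialized set under a storage budget is a knapsack-style problem. A sketch using a greedy benefit-density heuristic, with creation cost discounted for indexes not yet built; the greedy heuristic and the data layout are assumptions, not the paper's exact selection method.

```python
def choose_materialized(candidates, budget_mb):
    """Pick indexes to materialize under a storage budget (illustrative).

    `candidates`: list of (name, size_mb, predicted_benefit, creation_cost,
    already_built). Creation cost is subtracted from the benefit of indexes
    not yet on disk, per the slide.
    """
    def net_benefit(c):
        name, size, benefit, creation, built = c
        return benefit if built else benefit - creation

    chosen, used = [], 0.0
    # Greedy by benefit per megabyte, a standard knapsack heuristic.
    for c in sorted(candidates, key=lambda c: net_benefit(c) / c[1], reverse=True):
        if net_benefit(c) > 0 and used + c[1] <= budget_mb:
            chosen.append(c[0])
            used += c[1]
    return chosen

cands = [
    ("orders.date", 50.0, 200.0, 30.0, False),
    ("items.sku",   80.0, 150.0, 25.0, True),
    ("users.email", 40.0,  60.0, 70.0, False),  # creation cost outweighs benefit
]
print(choose_materialized(cands, budget_mb=100.0))  # ['orders.date']
```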

16. Performance of COLT
[Chart: execution time in Phases 1–4, broken into Minimum Time, COLT Extra Time, and Off-line Extra Time]
• Experimented with a 4-phase workload
• COLT adapts to each phase
• Off-line chooses best static index set

17. Overhead of Profiling
[Chart: profiling overhead over time across Phases 1–4]
• Overhead peaks at start of each phase
• Decreases when system is well tuned
• Average < 1 what-if call per query

18. Closing Remarks
• On-line index selection harder than off-line
  • Efficiency is a higher priority
  • Essentially need to guess the future
  • Index creation is an issue
• COLT is our solution
  • Solves a constrained problem
  • Potentially extensible to other domains
  • Only some components would change

  19. Thank You

20. Setting System Parameters
• Epoch length = 10
• What-if limit = 20
  • Or less if very worried about overhead
• Averaging window
  • Used when predicting future benefit
  • Indicates the number of epochs that give a good picture of the query distribution
  • Not easy to set
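For reference, the slide's knobs gathered into a small configuration sketch. The epoch-length unit (queries) and the averaging-window default are assumptions; the slide itself notes the averaging window is not easy to set.

```python
from dataclasses import dataclass

@dataclass
class ColtConfig:
    """COLT tuning knobs from the parameters slide (illustrative packaging)."""
    epoch_length: int = 10     # per the slide; unit assumed to be queries
    whatif_limit: int = 20     # max what-if calls per epoch; lower it if
                               # very worried about profiling overhead
    averaging_window: int = 5  # epochs used when predicting future benefit;
                               # placeholder value, as the slide calls this
                               # parameter not easy to set

print(ColtConfig())
```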

21. Other On-line Systems
• Bruno and Chaudhuri, ICDE 2007
  • Avoids extra optimizer calls completely
  • Heuristics to capture index interaction
  • Very different mechanisms for index selection
• Sattler, Schallehn, and Geist, IDEAS 2004
  • More similarities to COLT
  • Different prediction of future benefit
  • No control on profiling overhead

22. Performance with Noise
[Chart: relative performance, y-axis 20%–100%]
• The worst-case scenario for COLT
• Concentrated bursts of noise queries
• Performance loss in some cases
