1 / 16

An Efficient, Cost-Driven Index Selection Tool for MS-SQL Server

An Efficient, Cost-Driven Index Selection Tool for MS-SQL Server. Surajit Chaudhuri, Vivek Narasayya Presented by Robert Chen. Motivation. Automate the choice of indexes in the physical design of SQL database input: a workload of SQL queries suggest a set of suitable indexes

dixon
Download Presentation

An Efficient, Cost-Driven Index Selection Tool for MS-SQL Server

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. An Efficient, Cost-Driven Index Selection Tool for MS-SQL Server Surajit Chaudhuri, Vivek Narasayya Presented by Robert Chen

  2. Motivation • Automate the choice of indexes in the physical design of SQL database • input: a workload of SQL queries • suggest a set of suitable indexes • achieve performance competitive with that of systems cares by administrators.

  3. Basic approaches • Textbook solutions • take semantic information, produce a design • ignore workload info. • Expert system approach • knowledge are encoded as rules • disconnect from the query optimizer • Cost-driven approach • use optimizer’s cost estimates to compare goodness of hypothetical designs

  4. Want to reduce The number of indexes considered The number of configurations enumerated Index Selection Tool iteratively picks more complex index structures Reduce total number of optimizer invocation Cost Evaluation module Efficiency Queries Cost Config

  5. Architecture of Index Selection Tool Workload What-if Index Creation Candidate index selection SQL Server Configuration Enumeration Cost Evaluation Multi-column Index Generation Final indexes

  6. Starting Point • admissible index • indexable columns: columns that appear in the where clause of a query. • Index can be multi-column. • Starting points for the index selection tool • batch invocation on the optimizer • invocation is expensive since it requires communication across process boundaries.

  7. Cost Evaluation • efficiency goal: reduce the number of optimizer calls, in addition to batching. • Atomic Configurations: • a configuration C is automatic if a query uses all indexes in C. • Derive cost of a Configuration from Atomic configuration: • Cost(Q, C) = Mini{ Cost(Q, Ci)} (for select) • Cost(Q, C) = T + (for insert/delete, it’s an overestimation)

  8. Identifying Atomic Config • Reduce the number of atomic configurations • limit the number of indexes per table(j) • limit the number of tables per configuration(t) • (2, 2), called single-join atomic configurations • two tier • Reduce the cost of evaluation • Relevant index set optimization • Cost(Q, C) = Cost(Q, C’), C’ is a subset of C that only consists of indexable columns.

  9. Relevant Atomic Configurations • Decide the atomic configs to evaluate (adaptive Detection) • 1. N = 2, A = {atomic configuration of size <=2} • 2. Evaluate everything in A. A’ = {} • 3. For each C in A, tests if the evaluated cost of C differs significantly from the derived cost. If so add all atomic configurations of size N+1 that are supersets of C to A’ • 4. If A’ = {}, exit • else A = A’, N = N+1, goto 2.

  10. Candidate Index Selection • Goal: Pick the set of indexes from admissible indexes • Approach:Determine the best configuration for each query, the candidate index set is just the union of all such best configurations. • Determine the best configuration for each query? Enumerate(Ii, Wi)

  11. Configuration Enumeration • The Problem: Given an index set (I) and a Workload(W), determine the best subset of I with K or less indexes. • Enumerate(K, I, W) • Greedy(M, K) • In our system, the value K is constrained by the single-join atomic configuration pruning.

  12. Greedy(M, k) • let S be the best M index configuration using naïve enumeration. If M= k exit • while(|S| < k){ • Pick a new index I s.t. Cost(S {I}) < = Cost(S {I’}) for all I’ <> I; • if(Cost(S {I}) >= Cost(S) ) • exit • else • S = S {I}; • }

  13. Multi-Column Index Generation • Goal: Choose a set of admissible multi(two) column indexes M(a, b) • MC-LEAD: leading column (a) should be from the output of the configurations enumeration step. • MC-ALL: Both columns (a, b) are important

  14. Putting it all together • Candidate index set = admissible index set; • Repeat{ • index-selection; • Greedy(2, k), subject to (2, 2) constraints; • select multi-column indexes using MC-LEAD; • Candidate index set = Union of the above 2 steps; • }until( no considerable improvements are made)

  15. Performance • It works • Overall running time of algorithms improves by a factor of 4 to 10 over baseline algorithms • Drop in quality is small, below 10%.

  16. Summary • 3 “novel” techniques: • remove spurious indexes from consideration by taking into account syntax and cost info • an iterative approach to handle multi-column indexes • reduce the number of atomic configurations (thus the number of optimizer calls) that must be evaluated for a workload.

More Related