360 likes | 460 Views
Index Interactions in Physical Design Tuning Modeling, Analysis, and Applications. Karl Schnaitter, UC Santa Cruz Neoklis Polyzotis, UC Santa Cruz Lise Getoor, Univ. of Maryland. VLDB 2009, Lyon, France. Index Selection. Index selection problem: Given a query workload
E N D
Index Interactions in Physical Design TuningModeling, Analysis, and Applications Karl Schnaitter, UC Santa Cruz Neoklis Polyzotis, UC Santa Cruz Lise Getoor, Univ. of Maryland VLDB 2009, Lyon, France
Index Selection • Index selection problem: • Given a query workload • Choose indices that improve workload performance • Index “benefit” is a key concept • Informally, for an index i, • [benefit of i] = [exec cost without i] – [exec cost with i] • Does index benefit depend on other indices? • If so, this is called index interaction
Related Work • Interactions are a key concern in physical tuning • [Whang et al. 1981] make assumptions implying that indices on different tables do not interact • [Finklestein et al. 1988] assume that indices do not interact if they are relevant to separate queries • [Bruno and Chaudhuri 2007] explicitly account for some interactions in on-line index selection • Many more… • These studies treat interactions as a secondary issue, and often rely on ad hoc assumptions
Index Interactions • Let S be a set of indices relevant to a query Q cost(X) cost(X {a}) benefit({a}, X) cost(X {b}) Indices a,b are independent with respect to X cost(X {a,b}) benefit({a}, X {b})
Index Interactions • Let S be a set of indices relevant to a query Q cost(X) cost(X {a}) benefit({a}, X) cost(X {b}) Indices a,bpositively interact with respect to X cost(X {a,b}) benefit({a}, X {b})
Index Interactions • Let S be a set of indices relevant to a query Q cost(X) cost(X {a}) benefit({a}, X) cost(X {b}) Indices a,bnegatively interact with respect to X cost(X {a,b}) benefit({a}, X {b})
Degree of Interaction • = degree of interaction between a,b with respect to X • = =
Problem Statement • Which indices in S interact? • How strong are the interactions? • The Degree of Interaction Problem:
Outline • Properties of Query Optimization • Degree of Interaction Algorithm • Applying Interaction Information
Outline • Properties of Query Optimization • Degree of Interaction Algorithm • Applying Interaction Information
Query Optimization • Computing doi(a,b) is not practical if the optimizer is totally arbitrary • Need to compute • In practice, query optimization is not arbitrary • E.g., we expect • We put mild assumptions on query optimization: • Plans are selected from some fixed space P • Optimizer chooses the cheapest feasible plan from P • Ties are broken consistently
used in opt plan = 20 cost of plan a b c b c d = 50 = 45 a c b c c d = 65 = 50 c = 80 d = 80 Index Benefit Graph • An Index Benefit Graph (IBG) encodes the selection of optimal plans for a query • Introduced by [Frank, Omiecinski, and Navathe 1992] • Example IBG when S = {a,b,c,d} • There are 16 subsets of S • IBG has 8 nodes • But IBG can compute a b c d = 80
Outline • Properties of Query Optimization • Degree of Interaction Algorithm • Applying Interaction Information
Naive Algorithm • Recall that we want the degree of interaction between all pairs of indices in S • Each doi(a,b) may be computed directly Upon termination, T[a,b] = doi(a,b) for all a,b Can save time using an IBG as a cache of cost function Downside: iteration over all subsets of S
The QINTERACT Algorithm Naive Algorithm (condensed) QINTERACT Algorithm QINTERACT algorithm processes two index sets per IBG node We should avoid evaluating doi(a,b,X) for all
QINTERACT Example • Let’s calculate doi(a,b) on the graph below • What happens on iteration Y = {u} ? a b u v = 20 a b u v = 20 a u v = 30 b u v = 30 a u v = 30 b u v = 30 b v = 40 a u = 40 u v = 40 b v = 40 a u = 40 u v = 40 Y Y v = 50 u = 50 v = 50 u = 50
c d = 65 Interleaved IBG Processing • In QINTERACT, the IBG is built, then analyzed • I.e., IBG construction and analysis is serial • We can discover interactions in a partial IBG a b c d = 20 a b c b c d = 50 = 45 . . . . . . a c b c = 80 = 50 c = 80 d = 80 • IBG construction and analysis may be interleaved • Improves accuracy of doi over time
Outline • Properties of Query Optimization • Degree of Interaction Algorithm • Applying Interaction Information • Visualizing Index Interactions • Scheduling Index Creation
Outline • Properties of Query Optimization • Degree of Interaction Algorithm • Applying Interaction Information • Visualizing Index Interactions • Scheduling Index Creation
Visualizing Index Interactions • We can visualize the doi function as a graph • Nodes correspond to indices • Edge between a and b has weight doi(a,b) LI(SD,D) C(CK,NK) 0.02 0.02 0.02 TPC-H Query 7 0.03 LI(SK,SD,D,EP,OK) C(NK,CK) 0.01 0.02 0.04 0.09 0.01 LI(SD,Q) O(CK,OK) S(NK,N,SK) S(NK,SK) S(SK,NK)
Interaction Graph • The connected components have special meaning
Outline • Properties of Query Optimization • Degree of Interaction Algorithm • Applying Interaction Information • Visualizing Index Interactions • Scheduling Index Creation
Scheduling Index Creation • Suppose we want to materialize new indices • In what order should they be created? Schedule = a,b,c Schedule = b,a,c Schedule = c,a,b Benefit a a,b a,b,c c a,c a,b,c b a,b a,b,c Materialized Indices Choose first schedule to maximize benefit over time (shaded area)
Scheduling Index Creation • We define an optimization problem • M = preexisting indices • {a1, …, an} = new indices to create • Permute new indices as t1, …, tn to maximize • This problem is computationally hard • There is a connection to the Set Cover problem, since each new index “covers” more benefit
Greedy Scheduling • We are tempted to use a greedy heuristic • This results in the third schedule Schedule = a,b,c Schedule = b,a,c Schedule = c,a,b Benefit a a,b a,b,c c a,c a,b,c b a,b a,b,c Materialized Indices Greedy schedule can be suboptimal by a factor of about (n – 1)
Interaction-Aware Scheduling • Scheduling can use interaction graph Idea: First find optimal sub-schedules for each Ci Then choose the best interleaving of sub-schedules This heuristic avoids the pitfalls of greedy scheduling We can also show stronger performance guarantees
Conclusions • Index interactions provide useful insights for physical design tuning • The doi metric is an effective characterization of interaction relationships • We can analyze interactions efficiently when the Index Benefit Graph has limited size • Future work?
Performance Evaluation • QINTERACT implementation in Java • Uses JDBC to connect to IBM DB2 database • Experiments use 22 TPC-H benchmark queries • We generate indices based on the DB2 advisor • SALL = all indices recommended by DB2 • S1C = indices in SALL with first column only • We monitor the progress of the “serial” and “interleaved” approaches over time
Experimental Results SALL index set 0.1 threshold S1C index set 0.1 threshold
Applications • QINTERACT returns doi(a,b) for all a,b • We propose two applications of this information • Visualizing index interactions • Illustrates the global interactions as a graph • Useful when manually tuning the index set • Scheduling index construction • Want to choose when new indices will be created • Goal is to increase performance as quickly as possible • Knowledge of index interactions can help
Problem Statement • Which indices in S interact? • How strong are the interactions? • The Degree of Interaction Problem: • It may be useful to ignore “minor” interactions • A threshold-based variant:
Index Selection • Index selection problem: • We can quantify the benefit of an index: • Does benefit(a, X) depend on X? • If so, this is called index interaction
Future Work • Expand our support for updates • Implementation of visualization tool • Experiments with materialization scheduling • Incremental updates to doi function • Exploring stronger assumptions on query optimization • Efficient upper bounds on doi function?