1 / 64

CS 440 Database Management Systems

CS 440 Database Management Systems. Query Optimization. Today ’ s lecture. Past lectures. DBMS Architecture. User/Web Forms /Applications/ DBA. query. transaction. Query Parser. Transaction Manager. Query Rewriter. Logging & Recovery. Query Optimizer. Lock Manager.

jetta
Download Presentation

CS 440 Database Management Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS 440 Database Management Systems Query Optimization

  2. Today’s lecture Past lectures DBMS Architecture User/Web Forms/Applications/DBA query transaction Query Parser Transaction Manager Query Rewriter Logging & Recovery Query Optimizer Lock Manager Query Executor Files & Access Methods Lock Tables Buffers Buffer Manager Main Memory Storage Manager Storage

  3. Query Rewriting/Optimization • Parse SQL into a logical query plan. • More than one logical plan! • Pick the fastest -> optimization • Convert the logical query plan to a physical query plan. • More than one physical plan! • Pick the fastest -> optimization

  4. From SQL Query to Logical Query Plan

  5. Logical Query Plan • A tree whose nodes are relational algebra operators. PA1,…,An(sC (R1 ∞ R2 ∞ … ∞ Rk)) SELECTA1, …, An FROMR1, …, Rk WHEREC A1,…,An sC Rk … R1

  6. Logical Query Plan • A tree whose nodes are extended relational algebra operators. A1,…,An SELECT A1, …, An, Func FROMR1, …, Rk WHEREC GROUP BY Ai, …, Aj PA1,…,An(gAi, …, Aj, Func(sC (R1 ∞ … ∞ Rk))) gAi, …, Aj, Func sC Rk … R1

  7. SQL -> Logical Query Plan • Straightforward for simple SQL queries. • Map select-from-where to projection-join-selection • Subquery in where clause • Remove it!

  8. Example: Removing Subqueries StarsIn(title, year, starName) Star(name, address, gender, birthdate) • More examples in 16.3.2 SELECT title FROMStarsIn WHEREstarNameIN ( SELECT name FROM Star WHERE birthdate LIKE ‘%1960’) SELECT title FROMStarsIn, Star WHEREstarName = name AND birthdate LIKE ‘%1960’

  9. Query Rewriting

  10. Query Rewriting • Different logical plans with the same meanings • Same outputs for all possible databases Star(name, address, gender, birthdate) sgender=‘male’ OR name=‘waltz’ U sgender=‘male’ sname = ‘waltz’ Star Generally Faster Star Star

  11. Algebraic Laws • The ways that we can rewrite a logical plan. • Associative and communicative laws R ∞ S = S ∞R, R ∞(S ∞T) = (R ∞S) ∞ T R U S = S U R, R U (S U T) = (R U S) U T R S = S R, R (S T) = (R S) T • Distributive laws R ∞ (S U T) = (R ∞ S) U (R ∞ T) U U U U U U

  12. Algebraic Laws for Selection U • C AND C’(R) = sC(sC’(R)) = sC(R) sC’(R) • C OR C’(R) = sC(R) U sC’(R) • C (R ∞ S) = sC (R) ∞ S • C includes only attributes of R • What if it includes attributes of R and S? sC (R – S) = sC (R) – S sC (R US) = sC (R) UsC (S) sC (R S) = sC (R) S U U

  13. Example: Algebraic Laws for Selection • R(A, B, C, D), S(E, F) • sF=1 (R ∞ A=E S) = ? • sB=2 AND F=3 (R ∞ A=E S) = ?

  14. Algebraic Laws for Projection U PM(PN(R)) = PM N(R) PM(R ∞ S) = PN(PP(R) ∞PQ(S)) • N, P, Q are appropriate subsets of M • Example R(A,B,C,D), S(E, F) • PB,C,F(R ∞A=E S) = P ? (P?(R) ∞P?(S))

  15. Push Selection Down Uses sC (R ∞ S) = sC (R) ∞ S If the selection is executed, fewer tuples will be processes afterwards. It may cause to lose desired ordering, if we use indexes title title sgender=‘male’ AND year=1950 starName=name starName=name gender=‘male’ year=1950 Star StarsIn Star StarsIn

  16. Push Selection Up and Down StarsIn(title, year, starName) Movie(title, year, genre, producer) starName Movie.title=StarsIn.title AND Movie.year=StarsIn.year StarsIn year=1940 Movie

  17. Push Selection Up and Down StarsIn(title, year, starName) Movie(title, year, genre, producer) starName starName syear =1940 title=title AND year=year StarsIn year=1940 Movie.title=StarsIn.title AND Movie.year = StarsIn.year Movie StarsIn Movie

  18. Push Selection Up and Down StarsIn(title, year, starName) Movie(title, year, genre, producer) starName starName starName syear =1940 title=title AND year=year Movie.title=StarsIn.title AND Movie.year=StarsIn.year StarsIn year=1940 title=title AND year=year year=1940 year=1940 Movie StarsIn Movie Movie StarsIn

  19. Push Projection Down StarsIn(title, year, starName) Movie(title, year, genre, producer) title title StarsIn.title=Movie.title StarsIn.title=Movie.title title title Movie StarsIn Movie StarsIn We keep the notation at the top. Less effective than pushing down selection.

  20. Push Projection Down It is not always possible. title sstarName=‘Waltz’ AND producer=‘Tarantino’ AND year=2012 StarsIn.title=Movie.title Movie StarsIn Be careful when there are set operations or aggregation functions in the tree!

  21. Push Duplicate Elimination Down δ syear =1940 syear =1940 StarsIn.title=Movie.title StarsIn.title=Movie.title δ δ Movie StarsIn Movie StarsIn It is not always possible, e.g. aggregation functions. Read the book for algebraic laws of duplicate elimination.

  22. Remove Unnecessary Duplicate Elimination δ syear > 1940 Uset title,year ttle,year StarsIn Movie

  23. Remove Unnecessary Duplicate Elimination syear > 1940 δ δ syear > 1940 Uset Uset title,year ttle,year title,year ttle,year StarsIn Movie StarsIn Movie

  24. Remove Unnecessary Duplicate Elimination syear > 1940 δ δ syear > 1940 syear > 1940 Uset Uset Uset title,year ttle,year title,year ttle,year title,year ttle,year StarsIn Movie StarsIn Movie StarsIn Movie

  25. Query Optimization • 16.5

  26. Optimization Strategies • Optimal approach • Enumerate each possible plan • Measure its performance by running it • Pick the fastest one • Heuristics approach • Use some fixed heuristics • e.g. always nested loop joins • e.g. order relations from smallest to largest • Selinger/ System R optimization

  27. Cost-based Optimization • Plan Space • What is the space of query plans? • Cost estimation • How to estimate cost of each plan, without executing it? • Search Algorithms • How to search the space, as guided by cost estimates

  28. Query Plans • Order of the associative and communicative operators • Join, union, …. • The algorithms for operators • Hash-join, merge-join, … • Implicit operators • Full-scan, index-scan, … • Communication between operators • Pipelining versus materializing • “Interesting order”

  29. Join Order • Many join algorithms are asymmetric • Nested loop • S: outer relation, R: inner relation • S’s tuples are examined against R’s • Index-based join • S has index • S’s tuples are examined against R’s • Left relation: build relation • Right relation: probe relation

  30. Join Order Many possible query plans (join trees) for R ∞ S ∞T ∞ U: BushyLeft deep U T R S T U R S R Right deep S T U

  31. Join Ordering Problem • Given: R1 ⋈ R2 ⋈ … ⋈ Rn • We have a cost function that computes the cost of every join tree. • Find the most efficient join tree for the query. • Dynamic programming: try to optimize partial plans instead of the complete plan.

  32. Dynamic Programming • Find the best plan for each subset of {R1, …, Rn} • Use the information to compute the best plan for larger subsets • Step 1: best plans for {R1}, {R2}, …, {Rn} • Step 2: best plans for {R1,R2}, {R1,R3}, …, {Rn-1, Rn} • … • Step n: best plan for {R1, …, Rn} • A bottom-up strategy

  33. Dynamic Programming • For each subset S ⊆ {R1, …, Rn} calculate: • The estimated size of S: Size(S) • Plan(S): A best plan for S • Cost(S): The cost of Plan(S) • The cost of a plan is the size of intermediate relations in the plan. • We can use other types of costs.

  34. Dynamic Programming • Step 1: For each {Ri}: • Size({Ri}) = B(Ri) • Plan({Ri}) = Ri • Cost({Ri}) = 0. • Step 2: For each {Ri, Rj}: • Size({Ri,Rj}) = estimate of the size of join (more later) • Plan({Ri,Rj}) = Ri∞ Rjor Rj∞Ri • Cost = 0 (no intermediate relations)

  35. Dynamic Programming • Step i: For each S ⊆ {R1, …, Rn} of cardinality i do: • Compute Size(S) • For every S1 ,S2s.t. S = S1S2C = cost(S1) + cost(S2) + size(S1) + size(S2) • If Si is a base table, ignore its size. • Cost(S) = the smallest C • Plan(S) = the corresponding plan • Return Plan({R1, …, Rn})

  36. Dynamic Programming: Example • Cost(R ⋈S) = 0 (no intermediate results) • Cost((R ⋈S) ⋈T) = Cost(R ⋈S) + Cost(T) + size(R ⋈S) [+ Size(T)] = size(R ⋈S)

  37. Dynamic Programming: Example • Relations: R, S, T, U • Number of tuples: 2000, 5000, 3000, 1000 • A toy size estimation method: • Size (A ⋈ B) = 0.01*T(A)*T(B)

  38. Dynamic Programming: Reducing Search Space • Exponential computation! • Consider only left-deep join trees • Ignore join trees with Cartesian product • The size of a Cartesian product is generally much larger than (natural) joins. • Example: R(A,B), S(B,C), U(C,D)(R ∞ U) ∞S has a Cartesian product pick (R ∞ S) ∞ U instead

  39. Cost Estimation • Estimating the cost of a physical plan • The cost of a physical operators depend mainly on the size of its inputs • Sort-merge join for R ∞ S : 3 B(R) + 3 B(S) • In many queries the inputs of operators are intermediate results • Sort-merge join for (R ∞ S) ∞ T • 3 B(T) + 3 B(R ∞ S) + 3 B(R) + 3 B(S)

  40. Size Estimation: Base Tables • Cost factors in query execution • B(R): Number of blocks in R • T(R): Number of tuples in R • V(R,A): Number of distinct values of attribute A in R

  41. Size Estimation: Base Tables • B(R) = T(R ) / block size • T(R): scan the table • Update T(R) • Whenever DBA asks • Periodically

  42. Size Estimation: Base Tables • V(R,A): too much information • Use histogram, …

  43. Size Estimation: Intermediate Results • Estimating the size of a projection • Projection does not eliminate duplicates: T(PA(R)) = T(R)

  44. Size Estimation: Selection • Equality: E = sA=a(R) • T(E) ranges from 0 to T(R) – V(R,A) + 1 • Consider its mean: T(E) = T(R)/V(R,A) • Inequality: E = sA<a(R) • T(E) ranges from 0 to T(R) • Rule of thumb: T(E) = T(R)/3 • Not equal: E = sA<>a(R) • Rule of thumb: T(E) = T(R)

  45. Size Estimation: Selection • ANDin the condition: • R(A,B), E = sA=1 AND B<10(R) • Multiply: • 1/V(R,A) for equality, • 1/3 for inequality • 1 for not equal • T(R) = 10,000, V(R,A) = 50, T(E) = ? • Similar method for OR: • Read example 16.22

  46. A A A Size Estimation: Join • Natural join, R S • Simple cases • The set of A values in R and S are disjoint: T(R S) = 0 • A is a key in S and a foreign key in R: T(R S) = T(R)

More Related