1 / 53

Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications

Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications. C. Faloutsos Query Optimization – part 2. General Overview - rel. model. Relational model - SQL Functional Dependencies & Normalization Physical Design Indexing Query optimization Transaction processing.

ratana
Download Presentation

Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Carnegie Mellon Univ.Dept. of Computer Science15-415 - Database Applications C. Faloutsos Query Optimization – part 2

  2. General Overview - rel. model • Relational model - SQL • Functional Dependencies & Normalization • Physical Design • Indexing • Query optimization • Transaction processing 15-415 - C. Faloutsos

  3. Q-opt steps bring query in internal form (eg., parse tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans selections (simple; complex predicates) sorting; projections joins estimate cost; pick best 15-415 - C. Faloutsos

  4. Sr #1 #2 #3 … #nr Reminder – statistics: • for each relation ‘r’ we keep • nr : # tuples; • Sr : size of tuple in bytes • V(A,r): number of distinct values of attr. ‘A’ 15-415 - C. Faloutsos

  5. Derivable statistics • fr: blocking factor = max# records/block (= B/Sr ; B: block size in bytes) • br: # blocks (= nr / fr ) 15-415 - C. Faloutsos

  6. Derivable statistics • SC(A,r) = selection cardinality = avg# of records with A=given (= nr / V(A,r) ) (assumes uniformity...) – eg: 10,000 students, 10 colleges – how many students in SCS? 15-415 - C. Faloutsos

  7. Selections • we saw simple predicates (A=constant; eg., ‘name=Smith’) • how about more complex predicates, like • ‘salary > 10K’ • ‘age = 30 and job-code=“analyst” ’ • what is their selectivity? 15-415 - C. Faloutsos

  8. Selections – complex predicates • selectivity sel(P) of predicate P : == fraction of tuples that qualify sel(P) = SC(P) / nr 15-415 - C. Faloutsos

  9. count A F grade Selections – complex predicates • eg., assume that V(grade, TAKES)=5 distinct values • simple predicate P: A=constant • sel(A=constant) = 1/V(A,r) • eg., sel(grade=‘B’) = 1/5 • (what if V(A,r) is unknown??) 15-415 - C. Faloutsos

  10. count A F grade Selections – complex predicates • range query: sel( grade >= ‘C’) • sel(A>a) = (Amax – a) / (Amax – Amin) 15-415 - C. Faloutsos

  11. count A F grade Selections - complex predicates • negation: sel( grade != ‘C’) • sel( not P) = 1 – sel(P) • (Observation: selectivity =~ probability) ‘P’ 15-415 - C. Faloutsos

  12. P1 P2 Selections – complex predicates conjunction: • sel( grade = ‘C’ and course = ‘415’) • sel(P1 and P2) = sel(P1) * sel(P2) • INDEPENDENCE ASSUMPTION 15-415 - C. Faloutsos

  13. P1 P2 Selections – complex predicates disjunction: • sel( grade = ‘C’ or course = ‘415’) • sel(P1 or P2) = sel(P1) + sel(P2) – sel(P1 and P2) • = sel(P1) + sel(P2) – sel(P1)*sel(P2) • INDEPENDENCE ASSUMPTION, again 15-415 - C. Faloutsos

  14. P1 P2 Selections – complex predicates disjunction: in general sel(P1 or P2 or … Pn) = 1 - (1- sel(P1) ) * (1 - sel(P2) ) * … (1 - sel(Pn)) 15-415 - C. Faloutsos

  15. Selections – summary • sel(A=constant) = 1/V(A,r) • sel( A>a) = (Amax – a) / (Amax – Amin) • sel(not P) = 1 – sel(P) • sel(P1 and P2) = sel(P1) * sel(P2) • sel(P1 or P2) = sel(P1) + sel(P2) – sel(P1)*sel(P2) • UNIFORMITY and INDEPENDENCE ASSUMPTIONS 15-415 - C. Faloutsos

  16. Q-opt steps bring query in internal form (eg., parse tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans selections (simple; complex predicates) sorting; projections joins estimate cost; pick best 15-415 - C. Faloutsos

  17. Sorting Assume br blocks of rel. ‘r’, and only M (<br) buffers in main memory Q1: how to sort (‘external sorting’)? Q2: cost? ‘r’ 1 1 2 ... ... M br 15-415 - C. Faloutsos

  18. Sorting Q1: how to sort (‘external sorting’)? A1: create sorted runs of size M merge ‘r’ 1 1 2 ... ... M br 15-415 - C. Faloutsos

  19. Sorting create sorted runs of size M (how many?) merge them (how?) M ... ... 15-415 - C. Faloutsos

  20. Sorting create sorted runs of size M merge first M-1 runs into a sorted run of (M-1) *M, ... M ….. ... ... 15-415 - C. Faloutsos

  21. Sorting How many steps we need to do? ‘i’, where M*(M-1)^i > br How many reads/writes per step? br+br M ….. ... ... 15-415 - C. Faloutsos

  22. Sorting In short, excluding the final ‘write’, we need ceil(log(br/M) / log(M-1)) * 2 * br + br M ….. ... ... 15-415 - C. Faloutsos

  23. Q-opt steps bring query in internal form (eg., parse tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans selections (simple; complex predicates) sorting; projections, aggregations joins estimate cost; pick best 15-415 - C. Faloutsos

  24. Projection - dup. elimination eg., selectdistinct c-id from TAKES How? Pros and cons? 15-415 - C. Faloutsos

  25. Set operations eg., select* from REGULAR-STUDENT union select * from SPECIAL-STUDENT How? Pros and cons? 15-415 - C. Faloutsos

  26. Aggregations eg., selectssn, avg(grade) from TAKES group by ssn How? 15-415 - C. Faloutsos

  27. Q-opt steps bring query in internal form (eg., parse tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans selections; sorting;projections, aggregations joins 2-way joins n-way joins estimate cost; pick best 15-415 - C. Faloutsos

  28. 2-way joins output size estimation: r JOIN s nr, ns tuples each case#1: cartesian product (R, S have no common attribute) #of output tuples=?? 15-415 - C. Faloutsos

  29. 2-way joins output size estimation: r JOIN s case#2: r(A,B), s(A,C,D), A is cand. key for ‘r’ #of output tuples=?? <=ns r(A, ...) s(A, ......) nr ns 15-415 - C. Faloutsos

  30. 2-way joins output size estimation: r JOIN s case#3: r(A,B), s(A,C,D), A is cand. key for neither (is it possible??) #of output tuples=?? r(A, ...) s(A, ......) nr ns 15-415 - C. Faloutsos

  31. 2-way joins #of output tuples~ nr * ns/V(A,s) or ns * nr/V(A,r) (whichever is less) r(A, ...) s(A, ......) nr ns 15-415 - C. Faloutsos

  32. Q-opt steps bring query in internal form (eg., parse tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans selections; sorting;projections, aggregations joins 2-way joins - output size estimation; algorithms n-way joins estimate cost; pick best 15-415 - C. Faloutsos

  33. 2-way joins algorithm(s) for r JOIN s? nr, ns tuples each r(A, ...) s(A, ......) nr ns 15-415 - C. Faloutsos

  34. 2-way joins Algorithm #0: (naive) nested loop (SLOW!) for each tuple tr of r for each tuple ts of s print, if they match r(A, ...) s(A, ......) nr ns 15-415 - C. Faloutsos

  35. 2-way joins Algorithm #0: why is it bad? how many disk accesses (‘br’ and ‘bs’ are the number of blocks for ‘r’ and ‘s’)? r(A, ...) s(A, ......) nr ns br + nr*bs 15-415 - C. Faloutsos

  36. 2-way joins Algorithm #1: Blocked nested-loop join read in a block of r read in a block of s print matching tuples cost: br + br * bs r(A, ...) s(A, ......) nr, br ns records, bs blocks 15-415 - C. Faloutsos

  37. 2-way joins Arithmetic example: nr = 10,000 tuples, br = 1,000 blocks ns = 1,000 tuples, bs = 200 blocks alg#0: 2,001,000 d.a. alg#1: 201,000 d.a. r(A, ...) s(A, ......) 10,000 1,000 1,000 records, 200 blocks 15-415 - C. Faloutsos

  38. 2-way joins Observation1: Algo#1: asymmetric: cost: br + br * bs- reverse roles: cost= bs + bs*br Best choice? smallest relation in outer loop r(A, ...) s(A, ......) nr, br ns records, bs blocks 15-415 - C. Faloutsos

  39. 2-way joins Observation2 [NOT IN BOOK]: what if we have ‘k’ buffers available? read in ‘k-1’ blocks of ‘r’ read in a block of ‘s’ print matching tuples r(A, ...) s(A, ......) nr, br ns records, bs blocks 15-415 - C. Faloutsos

  40. 2-way joins Cost? read in ‘k-1’ blocks of ‘r’ read in a block of ‘s’ print matching tuples br + br/(k-1) * bs • what if br=k-1? • what if we assign k-1 blocks to inner?) r(A, ...) s(A, ......) nr, br ns records, bs blocks 15-415 - C. Faloutsos

  41. 2-way joins Observation3: can we get rid of the ‘br’ term? cost: br + br * bs A: read the inner relation backwards half of the times! Q: cons? r(A, ...) s(A, ......) nr, br ns records, bs blocks 15-415 - C. Faloutsos

  42. 2-way joins Other algorithm(s) for r JOIN s? nr, ns tuples each r(A, ...) s(A, ......) nr ns 15-415 - C. Faloutsos

  43. 2-way joins - other algo’s sort-merge sort ‘r’; sort ‘s’; merge sorted versions (good, if one or both are already sorted) r(A, ...) s(A, ......) nr ns 15-415 - C. Faloutsos

  44. 2-way joins - other algo’s sort-merge - cost: ~ 2* br * log(br) + 2* bs * log(bs) + br + bs needs temporary space (for sorted versions) gives output in sorted order r(A, ...) s(A, ......) nr ns 15-415 - C. Faloutsos

  45. use an existing index, or even build one on the fly cost: br + nr * c (c: look-up cost) 2-way joins - other algo’s s(A, ......) ns r(A, ...) nr 15-415 - C. Faloutsos

  46. hash join: hash ‘r’ into (0, 1, ..., ‘max’) buckets hash ‘s’ into buckets (same hash function) join each pair of matching buckets 2-way joins - other algo’s s(A, ......) r(A, ...) 0 1 max 15-415 - C. Faloutsos

  47. how to join each pair of partitions Hr-i, Hs-i ? A: build another hash table for Hs-i, and probe it with each tuple of Hr-i 2-way joins - hash join details s(A, ......) Hs-0 Hr-0 r(A, ...) 0 1 max 15-415 - C. Faloutsos

  48. what if Hs-i is too large to fit in main-memory? A: recursive partitioning more details (overflows, hybrid hash joins): in book cost of hash join? (under certain assumptions) 3(br + bs) + 2* max 2-way joins - hash join details 15-415 - C. Faloutsos

  49. Q-opt steps bring query in internal form (eg., parse tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans selections; sorting;projections, aggregations joins 2-way joins - output size estimation;algorithms n-way joins estimate cost; pick best 15-415 - C. Faloutsos

  50. r1 JOIN r2 JOIN ... JOIN rn typically, break problem into 2-way joins n-way joins 15-415 - C. Faloutsos

More Related