1 / 63

Query optimization

Query optimization. Lecturer: Assoc. Prof. Dr. DANG Tran Khanh Report: 13070243 Trần Duy Linh 13070263 Nguyễn Minh Thành. Outline. Introduction to Query Processing Translating SQL Queries into Relational Algebra Rules for equivalent RAEs Using Heuristics in Query Optimization

Download Presentation

Query optimization

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Query optimization Lecturer: Assoc. Prof. Dr. DANG Tran Khanh Report: 13070243 Trần Duy Linh 13070263 Nguyễn Minh Thành

  2. Outline • Introduction to Query Processing • Translating SQL Queries into Relational Algebra • Rules for equivalent RAEs • Using Heuristics in Query Optimization • Cost-based query optimization • Summary

  3. Outline • Introduction to Query Processing • Translating SQL Queries into Relational Algebra • Rules for equivalent RAEs • Using Heuristics in Query Optimization • Cost-based query optimization • Summary

  4. Introduction to Query Processing • Query processing: • The process by which the query results are retrieved from a high-level query such as SQL or OQL, ODBMS • Query optimization: • The process of choosing a suitable execution strategy for processing a query. • Two internal representations of a query: • Query Tree • Query Graph

  5. Processing a high-level query Query in a high-level language Scanning, parsing, and validating Immediate form of query Query optimizer Execution plan Query code generator Code to execute the query Runtime database processor Result of query

  6. Example Example of query optimazation

  7. Outline • Introduction to Query Processing • Translating SQL Queries into Relational Algebra • Rules for equivalent RAEs • Using Heuristics in Query Optimization • Cost-based query optimization • Summary

  8. Translating SQL Queries into Relational Algebra SELECT * FROM R WHEREc σC(R) A1, A2, …(R) SELECT A1, A2, … FROM R SELECT * FROM R, S WHERE c RcS

  9. Translating SQL Queries into Relational Algebra • Query block: the basic unit that can be translated into the algebraic operators and optimized. • A query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clause if these are part of the block. • Nested querieswithin a query are identified as separate query blocks. • Aggregate operators (MAX, MIN, SUM, and COUNT) in SQL must be included in the extended algebra.

  10. Translating SQL Queries into Relational Algebra SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > ( SELECT MAX (SALARY) FROM EMPLOYEE WHERE DNO = 5); SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > C SELECT MAX (SALARY) FROM EMPLOYEE WHERE DNO = 5 πLNAME, FNAME(σSALARY>C(EMPLOYEE)) ℱMAX SALARY(σDNO=5 (EMPLOYEE))

  11. Outline • Introduction to Query Processing • Translating SQL Queries into Relational Algebra • Query Trees and Query Graphs • Rules for equivalent RAEs • Using Heuristics in Query Optimization • Cost-based query optimization • Summary

  12. Query Trees and Query Graphs • Query tree: • A tree data structure that corresponds to a relational algebra expression. • It represents the input relations of the query as leaf nodes of the tree, and represents the relational algebra operations as internal nodes. • An execution of the query tree consists of executing an internal node operation whenever its operands are available and then replacing that internal node by the relation that results from executing the operation. • Query graph: • A graph data structure that corresponds to a relational calculus expression. • It does not indicate an order on which operations to perform first. There is only a single graph corresponding to each query.

  13. Query Trees and Query Graphs • Example: • EMPLOYEE • DEPARTMENT • PROJECT

  14. Query Trees and Query Graphs • Example: For every project located in ‘Stafford’, retrieve the project number, the controlling department number and the department manager’s last name, address and birthdate. • Relation algebra: PNUMBER, DNUM, LNAME, ADDRESS, BDATE (((PLOCATION=‘STAFFORD’(PROJECT)) DNUM=DNUMBER (DEPARTMENT)) MGRSSN=SSN (EMPLOYEE)) • SQL query: Q2: SELECT P.NUMBER,P.DNUM,E.LNAME, E.ADDRESS, E.BDATE FROM PROJECT AS P,DEPARTMENT AS D, EMPLOYEE AS E WHERE P.DNUM=D.DNUMBER AND D.MGRSSN=E.SSN AND P.PLOCATION=‘STAFFORD’;

  15. Query Trees and Query Graphs

  16. Query Trees and Query Graphs

  17. Outline • Introduction to Query Processing • Translating SQL Queries into Relational Algebra • Rules for equivalent RAEs • Using Heuristics in Query Optimization • Cost-based query optimization • Summary

  18. Equivalent Relational Expressions • Two Relational Algebra Expressions are equivalent if they produce the same results (tuples) on the same input relations - Although their tuples/attributes may be ordered differently. • An equivalent rule says that expressions of two forms are equivalent Can replace expression of first form by second, or vice versa

  19. Rules for equivalent RAEs • Cascade of σ A conjunctive selection condition can be broken up into a cascade (that is, a sequence) of individual σ operations :

  20. Rules for equivalent RAEs • Commutativity of σ. The σ operation is commutative: • Cascade of π:

  21. Rules for equivalent RAEs • Commuting σ with π: • Commutativity of (and ×)

  22. Rules for equivalent RAEs • Commuting σ with (or x ) a. When involves only the attributes of R b. 1 in R , 2 in S c. 1 in R , 2 in S,  involves attributes in both R and S

  23. Rules for equivalent RAEs • Commuting π with (or x ) a. L = L1 L2, L1 in R, L2 in S  only involves attributes in L b. L = L1 L2,  not in L,  = L3  L4 L3 in R, L4 in S

  24. Example rule 7 Ex : R(ACD), S(BEF) S R R S

  25. Rules for equivalent RAEs Commutativityof set operations • Associativity of ( X , ∪, and ∩)

  26. Example rule 9 E T R S T E R S • If R S is better than ST then execute RS first ( Choose a join order ) • Pipeline ( Left Deep Tree or Right Deep Tree )

  27. Rules for equivalent RAEs The π operation commutes with ∪ • Commuting σ with set operations ∪(∩, - ) • Converting a (σ,×) sequence into

  28. Outline • Introduction to Query Processing • Translating SQL Queries into Relational Algebra • Rules for equivalent RAEs • Using Heuristics in Query Optimization • Cost-based query optimization • Summary

  29. Using Heuristics in Query Optimization • Query Optimizer : choose good execution plan. Two main approach : • Heuristic-based query optimization Reduce the number of operations … • Cost-based query optimization Estimate cost of each operation … • Other : Semantic Query Optimization

  30. Heuristics? • Each Relational Algebra Expression (E) is represented by a Query Tree (Q) • Heuristic Algorithm ? Rewrite E (Q)  E1 (Q1)  ..  En (Qn) + E  E1  …  En ( : equivalent ) + So En is better than E • Rewrite must be based on equivalent RAE rules

  31. Heuristics : example 1 Ex : R(AB),S(CD) Select A,B From R,S Where R.B = S.C And S.D = 100 (1) (2) (2) Is better than

  32. Heuristics : example 2 (2) Ex : R(AB),S(CD) Select A,B From R,S Where R.B = S.C And S.D>100 (3) (3) Is better than

  33. Outline of a Heuristic Algebraic Optimization Algorithm 1. Break up any SELECT operations with conjunctive conditions into a cascade of SELECT operations (Rule 1) 2. Move each SELECT operation as far down the query tree as is permitted by the attributes involved in the select condition ( Rules 2, 4, 6, and 10 )

  34. Outline of a Heuristic Algebraic Optimization Algorithm 3. Rearrange the leaf nodes of the tree using the following criteria ( Rules 5 , 9 concerning commutativity and associativity of binary operations ) • position the leaf node relations with the most restrictive SELECT operationsso they are executed first in the query tree representation • make sure that the ordering of leaf nodes does not cause CARTESIAN PRODUCT operations

  35. Outline of a Heuristic Algebraic Optimization Algorithm 4. Combine a CARTESIAN PRODUCT operation with a subsequent SELECT operation in the tree into a JOIN operation, if the condition represents a join condition. (Rule 12) 5. Break down and move lists of projection attributes down the tree as far as possible by creating new PROJECT operations as needed. ( Using Rules 3, 4, 7, and 11 )

  36. Outline of a Heuristic Algebraic Optimization Algorithm 6. Identify subtrees that represent groups of operations that can be executed by a single algorithm.

  37. EXAMPLE

  38. EXAMPLE SELECT Pname FROM PROJECT As P, DEPARTMENT As D, EMPLOYEE As E WHERE Dnum=Dnumber AND Mgr_ssn=Ssn AND Lname=‘Smith’ Pname((Dnum=Dnumber)  (Mgr_ssn=Ssn)  (Lname=‘Smith’)(PxDxE) Pname (Dnum=Dnumber)  (Mgr_ssn=Ssn)  (Lname=‘Smith’) x x E Initial Query Tree P D

  39. EXAMPLE Pname Pname (Dnum=Dnumber) (Dnum=Dnumber)(Mgr_ssn=Ssn)(Lname=‘Smith’) (Mgr_ssn=Ssn)  (Lname=‘Smith’) x x x E x E P D D P Break up SELECT operation

  40. EXAMPLE Pname Pname (Dnum=Dnumber) (Dnum=Dnumber) (Mgr_ssn=Ssn) x  (Lname=‘Smith’) (Lname=‘Smith’) (Mgr_ssn=Ssn) x x E x E D P D P Moving SELECT operations down …

  41. EXAMPLE Pname Pname (Dnum=Dnumber) (Dnum=Dnumber) x x (Mgr_ssn=Ssn) (Lname=‘Smith’) (Mgr_ssn=Ssn) P x x E (Lname=‘Smith’) D D P E Rearrange the leaf nodes to apply restrictive SELECT operation first

  42. EXAMPLE Pnumber Pnumber (Dnum=Dnumber) (Dnum=Dnumber) x x (Mgr_ssn=Ssn) P P (Mgr_ssn=Ssn) x (Lname=‘Smith’) D (Lname=‘Smith’) D E E Replacing CARTESIAN PRODUCT and SELECT with JOIN operations

  43. EXAMPLE Pname Pname (Dnum=Dnumber) (Dnum=Dnumber) x P (Mgr_ssn=Ssn) P (Mgr_ssn=Ssn) (Lname=‘Smith’) D (Lname=‘Smith’) D E E

  44. EXAMPLE Pname Pname (Dnum=Dnumber) (Dnum=Dnumber) Pname,Dnum Dnumber P (Mgr_ssn=Ssn) P (Mgr_ssn=Ssn) (Lname=‘Smith’) D (Lname=‘Smith’) D E E Moving PROJECT operations down the query tree

  45. EXAMPLE Pname (Dnum=Dnumber) Pname,Dnum Dnumber P (Mgr_ssn=Ssn) (Lname=‘Smith’) D E Pname(Dnumber((Lname=‘Smith’)(E) D) Pname,Dnum(P)) (Mgr_ssn=Ssn) (Dnum=Dnumber)

  46. Outline • Introduction to Query Processing • Translating SQL Queries into Relational Algebra • Rules for equivalent RAEs • Using Heuristics in Query Optimization • Cost-based query optimization • Summary

  47. Overview • The cost of executing a query includes the following components: • Access cost to secondary storage • Disk storage cost • Computation cost • Memory usage cost • Communication cost

  48. Overview • The cost of an operation depends on the size and other statistics of its inputs • List some statistics about database relations that are stored in database-system catalogs • Use the statistics to estimate statistics on the results of various relational operations

  49. Catalog Information • nr: number of tuples in a relation r. • br: number of blocks containing tuples of r. • sr: size of a tuple of r. • fr: blocking factor of r . • V(A, r): number of distinct values that appear in r for attribute A; same as the size of A(r).

  50. Selection Size Estimation • The size estimate of the result of a selection operation depends on the selection predicate. • single equality predicate • single comparison predicate • combinations of predicates

More Related