670 likes | 919 Views
Query optimization. Lecturer: Assoc. Prof. Dr. DANG Tran Khanh Report: 13070243 Trần Duy Linh 13070263 Nguyễn Minh Thành. Outline. Introduction to Query Processing Translating SQL Queries into Relational Algebra Rules for equivalent RAEs Using Heuristics in Query Optimization
E N D
Query optimization Lecturer: Assoc. Prof. Dr. DANG Tran Khanh Report: 13070243 Trần Duy Linh 13070263 Nguyễn Minh Thành
Outline • Introduction to Query Processing • Translating SQL Queries into Relational Algebra • Rules for equivalent RAEs • Using Heuristics in Query Optimization • Cost-based query optimization • Summary
Outline • Introduction to Query Processing • Translating SQL Queries into Relational Algebra • Rules for equivalent RAEs • Using Heuristics in Query Optimization • Cost-based query optimization • Summary
Introduction to Query Processing • Query processing: • The process by which the query results are retrieved from a high-level query such as SQL or OQL, ODBMS • Query optimization: • The process of choosing a suitable execution strategy for processing a query. • Two internal representations of a query: • Query Tree • Query Graph
Processing a high-level query Query in a high-level language Scanning, parsing, and validating Immediate form of query Query optimizer Execution plan Query code generator Code to execute the query Runtime database processor Result of query
Example Example of query optimazation
Outline • Introduction to Query Processing • Translating SQL Queries into Relational Algebra • Rules for equivalent RAEs • Using Heuristics in Query Optimization • Cost-based query optimization • Summary
Translating SQL Queries into Relational Algebra SELECT * FROM R WHEREc σC(R) A1, A2, …(R) SELECT A1, A2, … FROM R SELECT * FROM R, S WHERE c RcS
Translating SQL Queries into Relational Algebra • Query block: the basic unit that can be translated into the algebraic operators and optimized. • A query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clause if these are part of the block. • Nested querieswithin a query are identified as separate query blocks. • Aggregate operators (MAX, MIN, SUM, and COUNT) in SQL must be included in the extended algebra.
Translating SQL Queries into Relational Algebra SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > ( SELECT MAX (SALARY) FROM EMPLOYEE WHERE DNO = 5); SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > C SELECT MAX (SALARY) FROM EMPLOYEE WHERE DNO = 5 πLNAME, FNAME(σSALARY>C(EMPLOYEE)) ℱMAX SALARY(σDNO=5 (EMPLOYEE))
Outline • Introduction to Query Processing • Translating SQL Queries into Relational Algebra • Query Trees and Query Graphs • Rules for equivalent RAEs • Using Heuristics in Query Optimization • Cost-based query optimization • Summary
Query Trees and Query Graphs • Query tree: • A tree data structure that corresponds to a relational algebra expression. • It represents the input relations of the query as leaf nodes of the tree, and represents the relational algebra operations as internal nodes. • An execution of the query tree consists of executing an internal node operation whenever its operands are available and then replacing that internal node by the relation that results from executing the operation. • Query graph: • A graph data structure that corresponds to a relational calculus expression. • It does not indicate an order on which operations to perform first. There is only a single graph corresponding to each query.
Query Trees and Query Graphs • Example: • EMPLOYEE • DEPARTMENT • PROJECT
Query Trees and Query Graphs • Example: For every project located in ‘Stafford’, retrieve the project number, the controlling department number and the department manager’s last name, address and birthdate. • Relation algebra: PNUMBER, DNUM, LNAME, ADDRESS, BDATE (((PLOCATION=‘STAFFORD’(PROJECT)) DNUM=DNUMBER (DEPARTMENT)) MGRSSN=SSN (EMPLOYEE)) • SQL query: Q2: SELECT P.NUMBER,P.DNUM,E.LNAME, E.ADDRESS, E.BDATE FROM PROJECT AS P,DEPARTMENT AS D, EMPLOYEE AS E WHERE P.DNUM=D.DNUMBER AND D.MGRSSN=E.SSN AND P.PLOCATION=‘STAFFORD’;
Outline • Introduction to Query Processing • Translating SQL Queries into Relational Algebra • Rules for equivalent RAEs • Using Heuristics in Query Optimization • Cost-based query optimization • Summary
Equivalent Relational Expressions • Two Relational Algebra Expressions are equivalent if they produce the same results (tuples) on the same input relations - Although their tuples/attributes may be ordered differently. • An equivalent rule says that expressions of two forms are equivalent Can replace expression of first form by second, or vice versa
Rules for equivalent RAEs • Cascade of σ A conjunctive selection condition can be broken up into a cascade (that is, a sequence) of individual σ operations :
Rules for equivalent RAEs • Commutativity of σ. The σ operation is commutative: • Cascade of π:
Rules for equivalent RAEs • Commuting σ with π: • Commutativity of (and ×)
Rules for equivalent RAEs • Commuting σ with (or x ) a. When involves only the attributes of R b. 1 in R , 2 in S c. 1 in R , 2 in S, involves attributes in both R and S
Rules for equivalent RAEs • Commuting π with (or x ) a. L = L1 L2, L1 in R, L2 in S only involves attributes in L b. L = L1 L2, not in L, = L3 L4 L3 in R, L4 in S
Example rule 7 Ex : R(ACD), S(BEF) S R R S
Rules for equivalent RAEs Commutativityof set operations • Associativity of ( X , ∪, and ∩)
Example rule 9 E T R S T E R S • If R S is better than ST then execute RS first ( Choose a join order ) • Pipeline ( Left Deep Tree or Right Deep Tree )
Rules for equivalent RAEs The π operation commutes with ∪ • Commuting σ with set operations ∪(∩, - ) • Converting a (σ,×) sequence into
Outline • Introduction to Query Processing • Translating SQL Queries into Relational Algebra • Rules for equivalent RAEs • Using Heuristics in Query Optimization • Cost-based query optimization • Summary
Using Heuristics in Query Optimization • Query Optimizer : choose good execution plan. Two main approach : • Heuristic-based query optimization Reduce the number of operations … • Cost-based query optimization Estimate cost of each operation … • Other : Semantic Query Optimization
Heuristics? • Each Relational Algebra Expression (E) is represented by a Query Tree (Q) • Heuristic Algorithm ? Rewrite E (Q) E1 (Q1) .. En (Qn) + E E1 … En ( : equivalent ) + So En is better than E • Rewrite must be based on equivalent RAE rules
Heuristics : example 1 Ex : R(AB),S(CD) Select A,B From R,S Where R.B = S.C And S.D = 100 (1) (2) (2) Is better than
Heuristics : example 2 (2) Ex : R(AB),S(CD) Select A,B From R,S Where R.B = S.C And S.D>100 (3) (3) Is better than
Outline of a Heuristic Algebraic Optimization Algorithm 1. Break up any SELECT operations with conjunctive conditions into a cascade of SELECT operations (Rule 1) 2. Move each SELECT operation as far down the query tree as is permitted by the attributes involved in the select condition ( Rules 2, 4, 6, and 10 )
Outline of a Heuristic Algebraic Optimization Algorithm 3. Rearrange the leaf nodes of the tree using the following criteria ( Rules 5 , 9 concerning commutativity and associativity of binary operations ) • position the leaf node relations with the most restrictive SELECT operationsso they are executed first in the query tree representation • make sure that the ordering of leaf nodes does not cause CARTESIAN PRODUCT operations
Outline of a Heuristic Algebraic Optimization Algorithm 4. Combine a CARTESIAN PRODUCT operation with a subsequent SELECT operation in the tree into a JOIN operation, if the condition represents a join condition. (Rule 12) 5. Break down and move lists of projection attributes down the tree as far as possible by creating new PROJECT operations as needed. ( Using Rules 3, 4, 7, and 11 )
Outline of a Heuristic Algebraic Optimization Algorithm 6. Identify subtrees that represent groups of operations that can be executed by a single algorithm.
EXAMPLE SELECT Pname FROM PROJECT As P, DEPARTMENT As D, EMPLOYEE As E WHERE Dnum=Dnumber AND Mgr_ssn=Ssn AND Lname=‘Smith’ Pname((Dnum=Dnumber) (Mgr_ssn=Ssn) (Lname=‘Smith’)(PxDxE) Pname (Dnum=Dnumber) (Mgr_ssn=Ssn) (Lname=‘Smith’) x x E Initial Query Tree P D
EXAMPLE Pname Pname (Dnum=Dnumber) (Dnum=Dnumber)(Mgr_ssn=Ssn)(Lname=‘Smith’) (Mgr_ssn=Ssn) (Lname=‘Smith’) x x x E x E P D D P Break up SELECT operation
EXAMPLE Pname Pname (Dnum=Dnumber) (Dnum=Dnumber) (Mgr_ssn=Ssn) x (Lname=‘Smith’) (Lname=‘Smith’) (Mgr_ssn=Ssn) x x E x E D P D P Moving SELECT operations down …
EXAMPLE Pname Pname (Dnum=Dnumber) (Dnum=Dnumber) x x (Mgr_ssn=Ssn) (Lname=‘Smith’) (Mgr_ssn=Ssn) P x x E (Lname=‘Smith’) D D P E Rearrange the leaf nodes to apply restrictive SELECT operation first
EXAMPLE Pnumber Pnumber (Dnum=Dnumber) (Dnum=Dnumber) x x (Mgr_ssn=Ssn) P P (Mgr_ssn=Ssn) x (Lname=‘Smith’) D (Lname=‘Smith’) D E E Replacing CARTESIAN PRODUCT and SELECT with JOIN operations
EXAMPLE Pname Pname (Dnum=Dnumber) (Dnum=Dnumber) x P (Mgr_ssn=Ssn) P (Mgr_ssn=Ssn) (Lname=‘Smith’) D (Lname=‘Smith’) D E E
EXAMPLE Pname Pname (Dnum=Dnumber) (Dnum=Dnumber) Pname,Dnum Dnumber P (Mgr_ssn=Ssn) P (Mgr_ssn=Ssn) (Lname=‘Smith’) D (Lname=‘Smith’) D E E Moving PROJECT operations down the query tree
EXAMPLE Pname (Dnum=Dnumber) Pname,Dnum Dnumber P (Mgr_ssn=Ssn) (Lname=‘Smith’) D E Pname(Dnumber((Lname=‘Smith’)(E) D) Pname,Dnum(P)) (Mgr_ssn=Ssn) (Dnum=Dnumber)
Outline • Introduction to Query Processing • Translating SQL Queries into Relational Algebra • Rules for equivalent RAEs • Using Heuristics in Query Optimization • Cost-based query optimization • Summary
Overview • The cost of executing a query includes the following components: • Access cost to secondary storage • Disk storage cost • Computation cost • Memory usage cost • Communication cost
Overview • The cost of an operation depends on the size and other statistics of its inputs • List some statistics about database relations that are stored in database-system catalogs • Use the statistics to estimate statistics on the results of various relational operations
Catalog Information • nr: number of tuples in a relation r. • br: number of blocks containing tuples of r. • sr: size of a tuple of r. • fr: blocking factor of r . • V(A, r): number of distinct values that appear in r for attribute A; same as the size of A(r).
Selection Size Estimation • The size estimate of the result of a selection operation depends on the selection predicate. • single equality predicate • single comparison predicate • combinations of predicates