1 / 24

Lecture 26

Lecture 26. Query Processing and Optimization. Query Execution Plans. A query execution plan is a combination of Relational algebra query representation (i.e. tree) There is order Information about the access methods to be used for each relation How to access every relation

wright
Download Presentation

Lecture 26

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 26 Query Processing and Optimization

  2. Query Execution Plans • Aquery execution plan is a combination of • Relational algebra query representation (i.e. tree) • There is order • Information about theaccess methods to be used for each relation • How to access every relation • E.g., PRIMARY INDEX, CLUSTERING INDEX, SECONDARY INDEX • Why not index all fields? • Methods for computing the relational operators stored in the tree • Manyalgorithms to do joins, sorting, selection etc …

  3. Query Execution Plans • Searching algorithms • Linear (#recs/2) or binary (log2(#recs)) • Use index or not? • Joins • Nested loop join • R JOIN S • For each tr in R do • For each ts in S do • Test pair (tr, ts) to see if they can be joined • If so, add to result • Loop • Loop • E.g. Department Join Employee

  4. Query Execution Plans • Select salaryFrom EmployeeWhere salary < 25000 • P salary • salary < 25000; use secondary index on salary; use binary search • Employee

  5. Query Optimization • Can be achieved through two techniques • Using heuristic rules • Reorder the operations in the internal representation of a query (tree or graph) to improve performance • A heuristic rule works well in MOST cases but it is NOT GUARANTEED to work in ALL possible cases • Selections before joins  better efficiency • Using cost estimations • Find the costs of the different execution strategies and choose the one with the lowest cost • Computationally intensive • Most DBMSs combine both

  6. Issues with Heuristics • Which is a better? • (1) Join with a selection afterwards •  (R.A=?) (R R.B = S.B S) • (2) Selection with a join afterwards • ( R.A=? (R)) R.B = S.B S • What if S is too small compared to R • Index on B in R but no index on A •  (2) Apply selection a linear scan of all of R •  (1) Join and reject results that don’t satisfy selection

  7. Query Representation • Query tree • Tree data structure that corresponds to a relational algebra expression • Input relations of the query as leaf nodes of the tree • The relational algebra operations as internal nodes • An execution of the query tree consists of • executing an internal node operation whenever its operands are available • replacing that internal node by the relation that results from executing the operation • There are many trees for the same query • Trees always have a strict order among their operations • Query optimization must find “best” order

  8. Sample Query • Example: For every project located in ‘Stafford’, retrieve the project number, the controlling department number and the department manager’s last name, address and birthdate • SELECT P.NUMBER,P.DNUM,E.LNAME, E.ADDRESS, E.BDATEFROM PROJECT AS P,DEPARTMENT AS D, EMPLOYEE AS EWHERE P.DNUM=D.DNUMBER AND D.MGRSSN=E.SSN AND P.PLOCATION=‘STAFFORD’; • Relational algebra: PNUMBER, DNUM, LNAME, ADDRESS, BDATE (((PLOCATION=‘STAFFORD’(PROJECT)) DNUM=DNUMBER (DEPARTMENT)) MGRSSN=SSN (EMPLOYEE))

  9. Internal nodes Input relations • Internal nodes are • Executed when inputs are ready • - Replaced by results

  10. Different representation for the same algebra expression … assumed to be the initial form • There are many trees for the same query • - strict order among their operations • - Query optimization must find “best” order

  11. Heuristics in Query Optimization • The main heuristic is to first apply the operations that reduce the size of intermediate results • E.g., Apply SELECT and PROJECT operations before applying the JOIN or other binary operations • General heuristic optimization Algorithm • 1- Push selections down • 2- Apply more restrictive selections first • Selectivity estimated by DBMS • 3- Combine cross products and selections to become joins • 4- Push projections down

  12. Steps in converting a query tree during heuristic optimization • Select names of employees working on the ‘Aquarius’ Project and born after 1957 • Select LnameFrom Employee, Works_On, ProjectWherePname = ‘Acquarius’ and bithdate> ’12/31/1957’ and SSN=ESSN and PNumber = PNO • Usually, we start with Cross Products, followed by selections, followed by Projects

  13. Steps in converting a query tree during heuristic optimization (a) Initial query tree for the SQL query made by parser • 1-Push selections down • 2-Apply more restrictive selections first -e.g. equalities before range queries • 3-Combine cross products and selections to become joins • 4-Push projections down

  14. (b) Moving SELECT operations down the query tree

  15. (c) Applying the more restrictive SELECT operation first

  16. (d) Replacing CARTESIAN PRODUCT and SELECT with JOIN operations

  17. (e) Moving PROJECT operations down the query tree.

  18. Transformation Rules • Transformation rules transform one relational algebra expression to AN EQUIVALENT ONE • Used by the query optimizer to optimize query tree • Any rule, if applied, makes sure that the resulting tree is equivalent resulting execution plan is equivalent • General Transformation Rules: • (2)Commutativity of s: The s operation is commutative: • sc1 (sc2(R)) =sc2 (sc1(R)) • More selective selections first • (6.a)Commuting s with (or x): If all the attributes in the selection condition c involve only the attributes of one of the relations being joined—say R—the two operations can be commuted as follows: • sc ( R S ) = (sc (R)) S • Do selections first • rest in the book p. 574-576

  19. Heuristics in Query Optimization • Outline of a Heuristic Algebraic Optimization Algorithm • Break up any select operations with conjunctive conditions into a cascade of select operations • Move each select operation as far down the query tree as is permitted by the attributes involved in the selection condition • Rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive select operations are executed first in the query tree representation.

  20. Heuristics in Query Optimization • Combine a cross product operation with a subsequent select operation in the tree into a join operation • Break down and move lists of projection attributes down the tree as far as possible by creating new project operations as needed • (Important…given a query  optimize using this algorithm) • SQL • For every query block • Initial query tree • Start with Cross Products, then Selections, then Projects • Apply algorithm

  21. Selectivity and Cost Estimates in Query Optimization • A query optimizer does not depend completely on heuristics • Not always optimal • Cost-based query optimization • Estimate and compare the costs of executing a query using different execution strategies and choose the one with the lowest cost estimate • Issues • Cost function • Number of execution strategies to be considered • Limit the number • Much better for compiledqueries where optimization is done once at compile time and the query is executed many times • PreparedStatements VS Statements

  22. Selectivity and Cost Estimates in Query Optimization • Cost Components for Query Execution • Access cost to secondary storage • Searching, reading, writing, updating, etc … • Memory usage cost • Number of memory buffers needed for the query • Storage cost • Storing any intermediate files that are generated by an execution strategy for the query • Communication cost • Shipping the results from the database site to the user’s site • Computation cost • Of performing in-memory operations on the data buffers during the execution plan (searching, sorting, joining, arithmetic)

  23. Exercise – Heuristic Optimization • Query in SQL • For every query block • Initial query tree • Start with Cross Products, then Selections, then Projects • Apply algorithm • Find Lname and SSN of all employees in the ‘Design’ Department working on project 5 who earn more than the highest paid employee working on the ‘Project X’ Project

More Related