1 / 14

David Le CS257, ID: 126 Feb 28, 2013

15.9 Query Execution Summary. David Le CS257, ID: 126 Feb 28, 2013. Overview. Query Processing Outline of Query Compilation Table Scanning Cost Measures Review of Algorithms One- pass Methods Nested - Loop Join Two - pass Sort- based Hash- based Index- based Multi- pass.

hamish
Download Presentation

David Le CS257, ID: 126 Feb 28, 2013

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 15.9 QueryExecutionSummary David LeCS257, ID: 126Feb 28, 2013

  2. Overview • QueryProcessing • Outline of Query Compilation • Table Scanning • CostMeasures • Review of Algorithms • One-passMethods • Nested-LoopJoin • Two-pass • Sort-based • Hash-based • Index-based • Multi-pass

  3. QueryProcessing • Queryiscompiled. This involves extensive optimizationusingoperations of relationalalgebra. • First compiledinto a logicalquery plans, e.g. using expressions of relationalalgebra. • Thenconverted to a physicalquery plan such as selectingimplementation for eachoperator, ordering joins and etc. • Queryisthenexecuted. query Query Compilation query plan Query Execution metadata data

  4. Outline of Query Compilation • Parsing: A parse tree for the query is constructed. • Query Rewrite: The parse tree is converted to an initial query plan and transformed into logical query plan. • Physical Plan Generation: Logical plan is converted into physical plan by selecting algorithms and order of executions. SQL query Parse query expression tree Select logical plan query optimization logical query plan tree Select physical plan physical query plan tree Execute plan

  5. Table Scanning • There are two approaches for locating tuples of relation R: • Table-scan: Get the blocks one by one. • Index-scan: Use index to lead us to all blocks holding R. • Sort-scan takes a relation R and sorting specifications and produces R in a sorted order. This can be accomplished with SQL clause ‘ORDER BY’.

  6. CostMeasures • Estimates of cost are essential for query optimization. • It allows us to determine the slow and fast parts of a query plan. • Reading many consecutive blocks on a track is extremely important since disk I/O’s are expensive in term of time. EXPLAIN SELECT * FROM a JOIN b on a.id = b.id;

  7. CostMeasures • Optimizing Queries: • EXPLAIN SELECT snp.* FROM snp JOIN chr ON snp.chr_key = chr.chr_key WHERE snp_name <> ''

  8. Review of Algorithms • One-pass Methods • Tuple-at-a-time: Selection and projection that do not require an entire relation in memory at once. • Full-relation, unary operations. Must see all or most of tuples in memory at once. Uses grouping and duplicate-eliminator operators. Hash table O(n) or a balanced binary search tree O(n log n) is used for duplicate eliminations to speed up the detections. • Full-relation, binary operations. These include union, intersection, difference, product and join.

  9. Review of Algorithms • Nested-Loop Joins • In a sense, it is ‘one-and-a-half’ passes, since one argument has its tuples read only once, while the other will be read repeatedly. • Can use relation of any size and does not have to fit all in main memory. • Two variations of nested-loop joins: • Tuple-based: Simplest form, can be very slow since it takes T(R)*T(S) disk I/O’s if we are joining R(x,y) with S(y,z). • Block-based: Organizing access to both argument relations by blocks and use as much main memory as we can to store tuples.

  10. Review of Algorithms • Two-pass Algorithms • Usually enough even for large relations. • Based on Sorting: • Partition the arguments into memory-sized, sorted sublists. • Sorted sublists are then merged appropriately to produce desired results. • Based on Hashing: • Partition the arguments into buckets. • Useful if data is too big to store in memory.

  11. Review of Algorithms • Two-pass Algorithms • Sort-based vs. Hash-based: • Hash-based are often superior to sort-based since they require only one of the arguments to be small. • Sorted-based works well when there is reason to keep some of the data sorted.

  12. Review of Algorithms • Index-based Algorithms • Index-based joins are excellent when one of the relations is small, and the other has an index on join attributes. • Clustering and non-clustering indexes: • Clustering index has all tuples with fixed value packed into minimum number of blocks. • A clustered relation can have non-clustering indexes.

  13. Review of Algorithms • Multi-pass Algorithms • Two-pass algorithms based on sorting or hashing can usually take three or more passes and will work for larger data sets. • Each pass of a sorting algorithm reads all data from disk and writes it out again. • Thus, a k-pass sorting algorithm requires 2·k·B(R) disk I/O’s.

  14. Reference Thank You. • Database Systems: The Complete Book, 2nd Edition. Chapter 15, sections 1 to 9.

More Related