- 51 Views
- Uploaded on
- Presentation posted in: General

Query Optimization Advanced Databases By Dr. Akhtar Ali

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Query Optimization

Advanced Databases

By

Dr. Akhtar Ali

- Logical Optimization: Using some transformation rules and algebraic equivalences
- To choose different join orders
- R ⋈S S ⋈R (Commutative)
- (R ⋈S) ⋈T R ⋈ (S ⋈T) (Associative)

- To push selections and projections bellow of joins
- (attr1 op value (R) ) ⋈S attr1 op value(R ⋈S )
- attr1, attr2, attr3 ((attr1, attr2 (R) ) ⋈S ) attr1, attr2,attr3 (R ⋈S )

- To choose different join orders
- Physical Optimization: For a given R.A expression
- There could be several different plans possible using different implementation of the R.A operators
- Calculate the cost of these different plans and choose the best
- Ideally, we want the best plan, but
- Practically, we should avoid worst plans

- A join is quite expensive operation.
- The cost can be reduced by reducing the sizes of the input relations.

- The sizes of the Input Relations can be reduced by applying:
- Selection: restricting the input relation.
- Projection: reducing the number of columns.

- Usually Selection reduces the size of the input relation more than Projection.
- Projection before the Join should be done quite carefully as the cost of Projection could increase the overall cost if it does not reduce by a good factor the size of the input.

- For each plan, we should be able to estimate the overall cost.
- For each node in the query tree, we estimate the cost of performing the corresponding operation;
- For each node in the query tree, we estimate the size of the result, which is used by the operation in the parent node;
- In order to correctly estimate the cost of each operation and the size of its result, the optimizer uses certain statistical information maintained by the DBMS.

- Cardinality
- The number of tuples for each relation.

- Size
- The number of pages for each relation.

- Index Cardinality
- The number of distinct key values for each index.

- Index Size
- The number of pages for each index.

- Index Height
- The number of non-leaf levels for each tree index.

- Index Range
- The minimum present key value and the maximum present key value for each index.

SELECT S.sname

FROM Reserves R, Sailors S

Where R.sid = S.sid

AND R.bid = 100

AND S.rating > 5

- Cost = 500 + 500 * 1000 I/Os (Using nested loops – page-oriented)
- 500,500 I/Os
- This is not an efficient plan.

- We could have pushed selections down before join, no index is used.
- Goal of Optimization:
- To find more efficient plans that compute the same answer.

- Push Selects before join
- Assuming that there are 1000 tuples in Reserves with bid=100 and 20000 tuples in Sailors with rating > 5. So Cost of Selections:
- Scan Reserves (1000 pages) and write the selected 1000 tuples to temp relation T1 (10 pages), so in total 1010 I/Os
- Scan Sailors (500 pages) and write the selected 20000 tuples to temp relation T2 (250 pages), so in total 750 I/Os

- Total cost = 1010 + 750 = 1760 I/Os so far

- The same RA as alternative 1
- Using a clustered Hash index on Reserves:
- A hash index will take 1 plus 10 to retrieve the 1000 qualifying tuples
- Cost of Selection = 1 + 10 = 11 I/Os
- Cost of Writing to T1 = 10 I/Os
- Sub-Total = 11 +10 = 21
- The size of T1 = 10 pages

- Using a clustered B+ tree index on Sailors:
- Cost of Selection = 2 (the constant cost) + 250 = 252 I/Os, Cost of Writing to T2 = 250 I/Os, Sub-Total = 252 + 250 = 502, and the size of T2 = 250 pages

- Using SNL the cost is:
- Cost = 10 + 10 * 250 = 2510 I/Os

- Total cost = 21 + 502 + 2510 = 3033 I/Os, which is 165 times less than the cost of the initial plan i.e. 500,500 I/Os

- The same RA as alternative 1
- Using a clustered Hash index on Reserves:
- A hash index will take 1 plus 10 to retrieve the 1000 qualifying tuples
- Cost of Selection = 1 + 10 = 11 I/Os
- The size of T1 = 10 pages

- Using a clustered B+ tree index on Sailors:
- Cost of Selection = 2 (the constant cost) + 250 = 252 I/Os
- The size of T2 = 250 pages

- Using Block-Nested Loops join with 7 buffer pages:
- Cost =

- Total cost = 11 + 252 + 510 = 773 I/Os, which is 647 times faster than the initial plan and about 4 times faster than the previous one.

- Query optimization is very important component of a DBMS.
- We must understand the principals of optimization to know how it influences the performance of a database system.
- Two ways of optimization:
- Logical: Push projection/selection below the join
- Physical: Using indexes and better implementation of relational operators.