1 / 10

# Query Optimization Advanced Databases By Dr. Akhtar Ali - PowerPoint PPT Presentation

Query Optimization Advanced Databases By Dr. Akhtar Ali. Overview of Query Optimization. Logical Optimization: Using some transformation rules and algebraic equivalences To choose different join orders R ⋈ S  S ⋈ R (Commutative) (R ⋈ S) ⋈ T  R ⋈ ( S ⋈ T) (Associative)

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about 'Query Optimization Advanced Databases By Dr. Akhtar Ali' - hamish-warren

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

By

Dr. Akhtar Ali

• Logical Optimization: Using some transformation rules and algebraic equivalences

• To choose different join orders

• R ⋈S  S ⋈R (Commutative)

• (R ⋈S) ⋈T  R ⋈ (S ⋈T) (Associative)

• To push selections and projections bellow of joins

• (attr1 op value (R) ) ⋈S  attr1 op value(R ⋈S )

• attr1, attr2, attr3 ((attr1, attr2 (R) ) ⋈S )  attr1, attr2,attr3 (R ⋈S )

• Physical Optimization: For a given R.A expression

• There could be several different plans possible using different implementation of the R.A operators

• Calculate the cost of these different plans and choose the best

• Ideally, we want the best plan, but

• Practically, we should avoid worst plans

• A join is quite expensive operation.

• The cost can be reduced by reducing the sizes of the input relations.

• The sizes of the Input Relations can be reduced by applying:

• Selection: restricting the input relation.

• Projection: reducing the number of columns.

• Usually Selection reduces the size of the input relation more than Projection.

• Projection before the Join should be done quite carefully as the cost of Projection could increase the overall cost if it does not reduce by a good factor the size of the input.

• For each plan, we should be able to estimate the overall cost.

• For each node in the query tree, we estimate the cost of performing the corresponding operation;

• For each node in the query tree, we estimate the size of the result, which is used by the operation in the parent node;

• In order to correctly estimate the cost of each operation and the size of its result, the optimizer uses certain statistical information maintained by the DBMS.

• Cardinality

• The number of tuples for each relation.

• Size

• The number of pages for each relation.

• Index Cardinality

• The number of distinct key values for each index.

• Index Size

• The number of pages for each index.

• Index Height

• The number of non-leaf levels for each tree index.

• Index Range

• The minimum present key value and the maximum present key value for each index.

SELECT S.sname

FROM Reserves R, Sailors S

Where R.sid = S.sid

AND R.bid = 100

AND S.rating > 5

• Cost = 500 + 500 * 1000 I/Os (Using nested loops – page-oriented)

• 500,500 I/Os

• This is not an efficient plan.

• We could have pushed selections down before join, no index is used.

• Goal of Optimization:

• To find more efficient plans that compute the same answer.

• Push Selects before join

• Assuming that there are 1000 tuples in Reserves with bid=100 and 20000 tuples in Sailors with rating > 5. So Cost of Selections:

• Scan Reserves (1000 pages) and write the selected 1000 tuples to temp relation T1 (10 pages), so in total 1010 I/Os

• Scan Sailors (500 pages) and write the selected 20000 tuples to temp relation T2 (250 pages), so in total 750 I/Os

• Total cost = 1010 + 750 = 1760 I/Os so far

• Using SNL the cost = 10 + 10 * 250 = 2510 I/Os

• Total cost = 1760 + 2510 = 4270 I/Os about 117 times less than the cost of the initial plan i.e. 500,500 I/Os

• The same RA as alternative 1

• Using a clustered Hash index on Reserves:

• A hash index will take 1 plus 10 to retrieve the 1000 qualifying tuples

• Cost of Selection = 1 + 10 = 11 I/Os

• Cost of Writing to T1 = 10 I/Os

• Sub-Total = 11 +10 = 21

• The size of T1 = 10 pages

• Using a clustered B+ tree index on Sailors:

• Cost of Selection = 2 (the constant cost) + 250 = 252 I/Os, Cost of Writing to T2 = 250 I/Os, Sub-Total = 252 + 250 = 502, and the size of T2 = 250 pages

• Using SNL the cost is:

• Cost = 10 + 10 * 250 = 2510 I/Os

• Total cost = 21 + 502 + 2510 = 3033 I/Os, which is 165 times less than the cost of the initial plan i.e. 500,500 I/Os

Optimization: Alternative 3(using pipelining i.e. intermediate results are not written to disk)

• The same RA as alternative 1

• Using a clustered Hash index on Reserves:

• A hash index will take 1 plus 10 to retrieve the 1000 qualifying tuples

• Cost of Selection = 1 + 10 = 11 I/Os

• The size of T1 = 10 pages

• Using a clustered B+ tree index on Sailors:

• Cost of Selection = 2 (the constant cost) + 250 = 252 I/Os

• The size of T2 = 250 pages

• Using Block-Nested Loops join with 7 buffer pages:

• Cost =

• Total cost = 11 + 252 + 510 = 773 I/Os, which is 647 times faster than the initial plan and about 4 times faster than the previous one.

Summary intermediate results are not written to disk)

• Query optimization is very important component of a DBMS.

• We must understand the principals of optimization to know how it influences the performance of a database system.

• Two ways of optimization:

• Logical: Push projection/selection below the join

• Physical: Using indexes and better implementation of relational operators.