slide1
Download
Skip this Video
Download Presentation
Query Optimization Advanced Databases By Dr. Akhtar Ali

Loading in 2 Seconds...

play fullscreen
1 / 10

Query Optimization Advanced Databases By Dr. Akhtar Ali - PowerPoint PPT Presentation


  • 91 Views
  • Uploaded on

Query Optimization Advanced Databases By Dr. Akhtar Ali. Overview of Query Optimization. Logical Optimization: Using some transformation rules and algebraic equivalences To choose different join orders R ⋈ S  S ⋈ R (Commutative) (R ⋈ S) ⋈ T  R ⋈ ( S ⋈ T) (Associative)

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Query Optimization Advanced Databases By Dr. Akhtar Ali' - hamish-warren


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
slide1

Query Optimization

Advanced Databases

By

Dr. Akhtar Ali

overview of query optimization
Overview of Query Optimization
  • Logical Optimization: Using some transformation rules and algebraic equivalences
    • To choose different join orders
      • R ⋈S  S ⋈R (Commutative)
      • (R ⋈S) ⋈T  R ⋈ (S ⋈T) (Associative)
    • To push selections and projections bellow of joins
      • (attr1 op value (R) ) ⋈S  attr1 op value(R ⋈S )
      • attr1, attr2, attr3 ((attr1, attr2 (R) ) ⋈S )  attr1, attr2,attr3 (R ⋈S )
  • Physical Optimization: For a given R.A expression
    • There could be several different plans possible using different implementation of the R.A operators
    • Calculate the cost of these different plans and choose the best
    • Ideally, we want the best plan, but
    • Practically, we should avoid worst plans
pushing selection projection below join
Pushing Selection & Projection below Join
  • A join is quite expensive operation.
    • The cost can be reduced by reducing the sizes of the input relations.
  • The sizes of the Input Relations can be reduced by applying:
    • Selection: restricting the input relation.
    • Projection: reducing the number of columns.
  • Usually Selection reduces the size of the input relation more than Projection.
  • Projection before the Join should be done quite carefully as the cost of Projection could increase the overall cost if it does not reduce by a good factor the size of the input.
estimating the evaluation cost of a plan
Estimating the Evaluation Cost of a Plan
  • For each plan, we should be able to estimate the overall cost.
  • For each node in the query tree, we estimate the cost of performing the corresponding operation;
  • For each node in the query tree, we estimate the size of the result, which is used by the operation in the parent node;
  • In order to correctly estimate the cost of each operation and the size of its result, the optimizer uses certain statistical information maintained by the DBMS.
statistics maintained by a dbms
Statistics Maintained by a DBMS
  • Cardinality
    • The number of tuples for each relation.
  • Size
    • The number of pages for each relation.
  • Index Cardinality
    • The number of distinct key values for each index.
  • Index Size
    • The number of pages for each index.
  • Index Height
    • The number of non-leaf levels for each tree index.
  • Index Range
    • The minimum present key value and the maximum present key value for each index.
a motivating example
A motivating example

SELECT S.sname

FROM Reserves R, Sailors S

Where R.sid = S.sid

AND R.bid = 100

AND S.rating > 5

  • Cost = 500 + 500 * 1000 I/Os (Using nested loops – page-oriented)
    • 500,500 I/Os
    • This is not an efficient plan.
  • We could have pushed selections down before join, no index is used.
  • Goal of Optimization:
    • To find more efficient plans that compute the same answer.
optimization alternative 1 no index
Optimization: Alternative 1 (no index)
  • Push Selects before join
  • Assuming that there are 1000 tuples in Reserves with bid=100 and 20000 tuples in Sailors with rating > 5. So Cost of Selections:
    • Scan Reserves (1000 pages) and write the selected 1000 tuples to temp relation T1 (10 pages), so in total 1010 I/Os
    • Scan Sailors (500 pages) and write the selected 20000 tuples to temp relation T2 (250 pages), so in total 750 I/Os
    • Total cost = 1010 + 750 = 1760 I/Os so far
  • Using SNL the cost = 10 + 10 * 250 = 2510 I/Os
  • Total cost = 1760 + 2510 = 4270 I/Os about 117 times less than the cost of the initial plan i.e. 500,500 I/Os
optimization alternative 2 uses indexes
Optimization: Alternative 2 (uses indexes)
  • The same RA as alternative 1
  • Using a clustered Hash index on Reserves:
    • A hash index will take 1 plus 10 to retrieve the 1000 qualifying tuples
    • Cost of Selection = 1 + 10 = 11 I/Os
    • Cost of Writing to T1 = 10 I/Os
    • Sub-Total = 11 +10 = 21
    • The size of T1 = 10 pages
  • Using a clustered B+ tree index on Sailors:
    • Cost of Selection = 2 (the constant cost) + 250 = 252 I/Os, Cost of Writing to T2 = 250 I/Os, Sub-Total = 252 + 250 = 502, and the size of T2 = 250 pages
  • Using SNL the cost is:
    • Cost = 10 + 10 * 250 = 2510 I/Os
  • Total cost = 21 + 502 + 2510 = 3033 I/Os, which is 165 times less than the cost of the initial plan i.e. 500,500 I/Os
optimization alternative 3 using pipelining i e intermediate results are not written to disk
Optimization: Alternative 3(using pipelining i.e. intermediate results are not written to disk)
  • The same RA as alternative 1
  • Using a clustered Hash index on Reserves:
    • A hash index will take 1 plus 10 to retrieve the 1000 qualifying tuples
    • Cost of Selection = 1 + 10 = 11 I/Os
    • The size of T1 = 10 pages
  • Using a clustered B+ tree index on Sailors:
    • Cost of Selection = 2 (the constant cost) + 250 = 252 I/Os
    • The size of T2 = 250 pages
  • Using Block-Nested Loops join with 7 buffer pages:
    • Cost =
  • Total cost = 11 + 252 + 510 = 773 I/Os, which is 647 times faster than the initial plan and about 4 times faster than the previous one.
summary
Summary
  • Query optimization is very important component of a DBMS.
  • We must understand the principals of optimization to know how it influences the performance of a database system.
  • Two ways of optimization:
    • Logical: Push projection/selection below the join
    • Physical: Using indexes and better implementation of relational operators.
ad