Csce 608 600 database systems
Sponsored Links
This presentation is the property of its rightful owner.
1 / 41

CSCE 608 – 600 Database Systems PowerPoint PPT Presentation


  • 74 Views
  • Uploaded on
  • Presentation posted in: General

CSCE 608 – 600 Database Systems. Chapter 15: Query Execution. Index-Based Algorithms. The existence of an index is especially helpful for selection, and helps others Clustered relation : tuples are packed into the minimum number of blocks

Download Presentation

CSCE 608 – 600 Database Systems

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


CSCE 608 – 600 Database Systems

Chapter 15: Query Execution


Index-Based Algorithms

  • The existence of an index is especially helpful for selection, and helps others

  • Clustered relation: tuples are packed into the minimum number of blocks

  • Clustering index: all tuples with the same value for the index's search key are packed into the minimum number of blocks


Index-Based Selection

  • Without an index, selection takes B(R), or even T(R), disk I/O's.

  • To select all tuples with attribute a equal to value v, when there is an index on a:

    • search the index for value v and get pointers to exactly the blocks containing the desired tuples

  • If index is clustering, then number of disk I/O's is about B(R)/V(R,a)


Examples

  • Suppose B(R) = 1000, T(R) = 20,000, there is an index on a and we want to select all tuples with a = 0.

    • If R is clustered and don't use index: 1000 disk I/O's

    • If R is not clustered and don't use index: 20,000 disk I/O's

    • If V(R,a) = 100, index is clustering, and use index: 1000/100 = 10 disk I/O's (on average)

    • If V(R,a) = 10, R is not clustered, index is non-clustering, and use index: 20,000/10 = 2000 disk I/O's (on average)

    • If V(R,a) = 20,000 (a is a key) and use index: 1 disk I/O


Using Indexes in Other Operations

  • If the index is a B-tree, can efficiently select tuples with indexed attribute in a range

  • If selection is on a complex condition such as "a = v AND …", first do the index-based algorithm to get tuples satisfying "a = v".

    • Such splitting is part of the job of the query optimizer


Index-Based Join Algorithm

  • Consider natural join of R(X,Y) and S(Y,Z).

  • Suppose S has an index on Y.

    for each block of R

    for each tuple t in the current block

    use index on S to find tuples of S that match t in the attribute(s) Y

    output the join of these tuples


Analysis of Index-Based Join

  • To get all the blocks of R, either B(R) or T(R) disk I/O's are needed

  • For each tuple of R, there are on average T(S)/V(S,Y) matching tuples of S

    • T(R)*T(S)/V(S,Y) disk I/O's if index is not clustering

    • T(R)*B(S)/V(S,Y) disk I/O's if index is clustering

  • This method is efficient if R is much smaller than S and V(S,Y) is large (i.e., not many tuples of S match)


Join Using a Sorted Index

  • Suppose we want to join R(X,Y)and S(Y,Z).

  • Suppose we have a sorted index (e.g., B-tree) on Y for R and S:

    • do sort-join but

    • no need to sort the indexed relations first


Buffer Management

  • The availability of blocks (buffers) of main memory is controlled by buffer manager.

  • When a new buffer is needed, a replacement policy is used to decide which existing buffer should be returned to disk.

  • If the number of buffers available for an operation cannot be predicted in advance, then the algorithm chosen must degrade gracefully as the number of buffers shrinks.

  • If the number of buffers available is not large enough for a two-pass algorithm, then there are generalizations to algorithms that use three or more passes.


CSCE 608 - 600 Database Systems

Chapter 16: Query Compiler


Query Compiler

Parsing

Logical Query Plan


SQL query

parse

parse tree

convert

answer

logical query plan

execute

apply laws

statistics

Pi

“improved” l.q.p

pick best

estimate result sizes

{(P1,C1),(P2,C2)...}

l.q.p. +sizes

estimate costs

consider physical plans

{P1,P2,…..}


Outline

  • Convert SQL query to a parse tree

    • Semantic checking: attributes, relation names, types

  • Convert to a logical query plan (relational algebra expression)

    • deal with subqueries

  • Improve the logical query plan

    • use algebraic transformations

    • group together certain operators

    • evaluate logical plan based on estimated size of relations

  • Convert to a physical query plan

    • search the space of physical plans

    • choose order of operations

    • complete the physical query plan


Parsing

  • Goal is to convert a text string containing a query into a parse tree data structure:

    • leaves form the text string (broken into lexical elements)

    • internal nodes are syntactic categories

  • Uses standard algorithmic techniques from compilers

    • given a grammar for the language (e.g., SQL), process the string and build the tree


Example: SQL query

SELECT title

FROM StarsIn

WHERE starName IN (

SELECT name

FROM MovieStar

WHERE birthdate LIKE ‘%1960’

);

(Find the movies with stars born in 1960)

Assume we have a simplified grammar for SQL.


SELECT <SelList> FROM <FromList> WHERE <Condition>

<Attribute> <RelName> <Attribute> LIKE <Pattern>

nameMovieStar birthDate‘%1960’

Example: Parse Tree

<Query>

<SFW>

SELECT <SelList> FROM <FromList> WHERE <Condition>

<Attribute> <RelName> <Tuple> IN <Query>

titleStarsIn <Attribute> ( <Query> )

starName <SFW>


The Preprocessor

  • replaces each reference to a view with a parse (sub)-tree that describes the view (i.e., a query)

  • does semantic checking:

    • are relations and views mentioned in the schema?

    • are attributes mentioned in the current scope?

    • are attribute types correct?


Outline

  • Convert SQL query to a parse tree

    • Semantic checking: attributes, relation names, types

  • Convert to a logical query plan (relational algebra expression)

    • deal with subqueries

  • Improve the logical query plan

    • use algebraic transformations

    • group together certain operators

    • evaluate logical plan based on estimated size of relations

  • Convert to a physical query plan

    • search the space of physical plans

    • choose order of operations

    • complete the physical query plan


Convert Parse Tree to Relational Algebra

  • Complete algorithm depends on specific grammar, which determines forms of the parse trees

  • Here give a flavor of the approach


Conversion

  • Suppose there are no subqueries.

  • SELECT att-list FROM rel-list WHERE cond

    is converted into

    PROJatt-list(SELECTcond(PRODUCT(rel-list))), or

    att-list(cond( X (rel-list)))


<Query>

SELECT movieTitle

FROM StarsIn, MovieStar

WHERE starName = name AND birthdate LIKE '%1960';

<SFW>

SELECT <SelList> FROM <FromList> WHERE <Condition>

<Attribute> <RelName> , <FromList> AND <Condition>

movieTitleStarsIn <RelName> <Attribute> LIKE <Pattern>

MovieStarbirthdate'%1960'

<Condition>

<Attribute> = <Attribute>

starName name


Equivalent Algebraic Expression Tree

movieTitle

starname = name AND birthdate LIKE '%1960'

X

StarsIn MovieStar


Handling Subqueries

  • Recall the (equivalent) query:

    SELECT title

    FROM StarsIn

    WHERE starName IN (

    SELECT name

    FROM MovieStar

    WHERE birthdate LIKE ‘%1960’

    );

  • Use an intermediate format called two-argument selection


Example: Two-Argument Selection

title

StarsIn <condition>

<tuple> IN name

<attribute> birthdate LIKE ‘%1960’

starName MovieStar


Converting Two-Argument Selection

  • To continue the conversion, we need rules for replacing two-argument selection with a relational algebra expression

  • Different rules depending on the nature of the subquery

  • Here show example for IN operator and uncorrelated query (subquery computes a relation independent of the tuple being tested)


Rules for IN

C

R <Condition>

X

R 

t IN S

S

C is the condition that equates

attributes in t with corresponding

attributes in S


Example: Logical Query Plan

title

starName=name

StarsIn

name

birthdate LIKE ‘%1960’

MovieStar


What if Subquery is Correlated?

  • Example is when subquery refers to the current tuple of the outer scope that is being tested

  • More complicated to deal with, since subquery cannot be translated in isolation

  • Need to incorporate external attributes in the translation

  • Some details are in textbook


Outline

  • Convert SQL query to a parse tree

    • Semantic checking: attributes, relation names, types

  • Convert to a logical query plan (relational algebra expression)

    • deal with subqueries

  • Improve the logical query plan

    • use algebraic transformations

    • group together certain operators

    • evaluate logical plan based on estimated size of relations

  • Convert to a physical query plan

    • search the space of physical plans

    • choose order of operations

    • complete the physical query plan


Improving the Logical Query Plan

  • There are numerous algebraic laws concerning relational algebra operations

  • By applying them to a logical query plan judiciously, we can get an equivalent query plan that can be executed more efficiently

  • Next we'll survey some of these laws


Associative and Commutative Operations

  • product

  • natural join

  • set and bag union

  • set and bag intersection

  • associative: (A op B) op C = A op (B op C)

  • commutative: A op B = B op A


Laws Involving Selection

  • Selections usually reduce the size of the relation

  • Usually good to do selections early, i.e., "push them down the tree"

  • Also can be helpful to break up a complex selection into parts


Selection Splitting

  • C1 AND C2 (R) = C1 ( C2 (R))

  • C1 OR C2 (R) = (C1 (R)) Uset (C2 (R))

    if R is a set

  • C1 ( C2 (R)) = C2 ( C1 (R))


Selection and Binary Operators

  • Must push selection to both arguments:

    • C (R U S) = C (R) U C (S)

  • Must push to first arg, optional for 2nd:

    • C (R - S) = C (R) - S

    • C (R - S) = C (R) - C (S)

  • Push to at least one arg with all attributes mentioned in C:

    • product, natural join, theta join, intersection

    • e.g., C (R X S) = C (R) X S, if R has all the atts in C


Pushing Selection Up the Tree

  • Suppose we have relations

    • StarsIn(title,year,starName)

    • Movie(title,year,len,inColor,studioName)

  • and a view

    • CREATE VIEW MoviesOf1996 AS

      SELECT *

      FROM Movie

      WHERE year = 1996;

  • and the query

    • SELECT starName, studioName

      FROM MoviesOf1996 NATURAL JOIN StarsIn;


Remember the rule

C(R S) = C(R) S ?

The Straightforward Tree

starName,studioName

year=1996 StarsIn

Movie


starName,studioName

starName,studioName

starName,studioName

year=1996

year=1996 year=1996

year=1996 StarsIn

StarsIn

Movie

StarsIn

Movie

push selection

up tree

push selection

down tree

Movie

The Improved Logical Query Plan


Laws Involving Projections

  • Consider adding in additional projections

  • Adding a projection lower in the tree can improve performance, since often tuple size is reduced

    • Usually not as helpful as pushing selections down

  • If a projection is inserted in the tree, then none of the eliminated attributes can appear above this point in the tree

    • Ex: L(R X S) = L(M(R) X N(S)), where M (resp. N) is all attributes of R (resp. S) that are used in L

  • Another example:

    • L(R Ubag S) = L(R) UbagL(S)

      But watch out for set union!


Push Projection Below Selection?

  • Rule: L(C(R)) = L(C(M(R)))

    where M is all attributes used by L or C

  • But is it a good idea?

    SELECT starName FROM StarsIn WHERE movieYear = 1996;

starName

starName

movieYear=1996

movieYear=1996

starName,movieYear

StarsIn

StarsIn


Joins and Products

  • Recall from the definitions of relational algebra:

    • R C S = C(R X S) (theta join)

    • R S = L(C(R X S)) (natural join)

      where C equates same-name attributes in R and S, and L includes all attributes of R and S dropping duplicates

  • To improve a logical query plan, replace a product followed by a selection with a join

    • Join algorithms are usually faster than doing product followed by selection


Duplicate Elimination

  • Moving  down the tree is potentially beneficial as it can reduce the size of intermediate relations

  • Can be eliminated if argument has no duplicates

    • a relation with a primary key

    • a relation resulting from a grouping operator

  • Legal to push  through product, join, selection, and bag intersection

    • Ex: (R X S) = (R) X (S)

  • Cannot push  through bag union, bag difference or projection


  • Login