Csce 608 600 database systems
This presentation is the property of its rightful owner.
Sponsored Links
1 / 41

CSCE 608 – 600 Database Systems PowerPoint PPT Presentation


  • 66 Views
  • Uploaded on
  • Presentation posted in: General

CSCE 608 – 600 Database Systems. Chapter 15: Query Execution. Index-Based Algorithms. The existence of an index is especially helpful for selection, and helps others Clustered relation : tuples are packed into the minimum number of blocks

Download Presentation

CSCE 608 – 600 Database Systems

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Csce 608 600 database systems

CSCE 608 – 600 Database Systems

Chapter 15: Query Execution


Index based algorithms

Index-Based Algorithms

  • The existence of an index is especially helpful for selection, and helps others

  • Clustered relation: tuples are packed into the minimum number of blocks

  • Clustering index: all tuples with the same value for the index's search key are packed into the minimum number of blocks


Index based selection

Index-Based Selection

  • Without an index, selection takes B(R), or even T(R), disk I/O's.

  • To select all tuples with attribute a equal to value v, when there is an index on a:

    • search the index for value v and get pointers to exactly the blocks containing the desired tuples

  • If index is clustering, then number of disk I/O's is about B(R)/V(R,a)


Examples

Examples

  • Suppose B(R) = 1000, T(R) = 20,000, there is an index on a and we want to select all tuples with a = 0.

    • If R is clustered and don't use index: 1000 disk I/O's

    • If R is not clustered and don't use index: 20,000 disk I/O's

    • If V(R,a) = 100, index is clustering, and use index: 1000/100 = 10 disk I/O's (on average)

    • If V(R,a) = 10, R is not clustered, index is non-clustering, and use index: 20,000/10 = 2000 disk I/O's (on average)

    • If V(R,a) = 20,000 (a is a key) and use index: 1 disk I/O


Using indexes in other operations

Using Indexes in Other Operations

  • If the index is a B-tree, can efficiently select tuples with indexed attribute in a range

  • If selection is on a complex condition such as "a = v AND …", first do the index-based algorithm to get tuples satisfying "a = v".

    • Such splitting is part of the job of the query optimizer


Index based join algorithm

Index-Based Join Algorithm

  • Consider natural join of R(X,Y) and S(Y,Z).

  • Suppose S has an index on Y.

    for each block of R

    for each tuple t in the current block

    use index on S to find tuples of S that match t in the attribute(s) Y

    output the join of these tuples


Analysis of index based join

Analysis of Index-Based Join

  • To get all the blocks of R, either B(R) or T(R) disk I/O's are needed

  • For each tuple of R, there are on average T(S)/V(S,Y) matching tuples of S

    • T(R)*T(S)/V(S,Y) disk I/O's if index is not clustering

    • T(R)*B(S)/V(S,Y) disk I/O's if index is clustering

  • This method is efficient if R is much smaller than S and V(S,Y) is large (i.e., not many tuples of S match)


Join using a sorted index

Join Using a Sorted Index

  • Suppose we want to join R(X,Y)and S(Y,Z).

  • Suppose we have a sorted index (e.g., B-tree) on Y for R and S:

    • do sort-join but

    • no need to sort the indexed relations first


Buffer management

Buffer Management

  • The availability of blocks (buffers) of main memory is controlled by buffer manager.

  • When a new buffer is needed, a replacement policy is used to decide which existing buffer should be returned to disk.

  • If the number of buffers available for an operation cannot be predicted in advance, then the algorithm chosen must degrade gracefully as the number of buffers shrinks.

  • If the number of buffers available is not large enough for a two-pass algorithm, then there are generalizations to algorithms that use three or more passes.


Csce 608 600 database systems1

CSCE 608 - 600 Database Systems

Chapter 16: Query Compiler


Query compiler

Query Compiler

Parsing

Logical Query Plan


Csce 608 600 database systems

SQL query

parse

parse tree

convert

answer

logical query plan

execute

apply laws

statistics

Pi

“improved” l.q.p

pick best

estimate result sizes

{(P1,C1),(P2,C2)...}

l.q.p. +sizes

estimate costs

consider physical plans

{P1,P2,…..}


Outline

Outline

  • Convert SQL query to a parse tree

    • Semantic checking: attributes, relation names, types

  • Convert to a logical query plan (relational algebra expression)

    • deal with subqueries

  • Improve the logical query plan

    • use algebraic transformations

    • group together certain operators

    • evaluate logical plan based on estimated size of relations

  • Convert to a physical query plan

    • search the space of physical plans

    • choose order of operations

    • complete the physical query plan


Parsing

Parsing

  • Goal is to convert a text string containing a query into a parse tree data structure:

    • leaves form the text string (broken into lexical elements)

    • internal nodes are syntactic categories

  • Uses standard algorithmic techniques from compilers

    • given a grammar for the language (e.g., SQL), process the string and build the tree


Example sql query

Example: SQL query

SELECT title

FROM StarsIn

WHERE starName IN (

SELECT name

FROM MovieStar

WHERE birthdate LIKE ‘%1960’

);

(Find the movies with stars born in 1960)

Assume we have a simplified grammar for SQL.


Csce 608 600 database systems

SELECT <SelList> FROM <FromList> WHERE <Condition>

<Attribute> <RelName> <Attribute> LIKE <Pattern>

nameMovieStar birthDate‘%1960’

Example: Parse Tree

<Query>

<SFW>

SELECT <SelList> FROM <FromList> WHERE <Condition>

<Attribute> <RelName> <Tuple> IN <Query>

titleStarsIn <Attribute> ( <Query> )

starName <SFW>


The preprocessor

The Preprocessor

  • replaces each reference to a view with a parse (sub)-tree that describes the view (i.e., a query)

  • does semantic checking:

    • are relations and views mentioned in the schema?

    • are attributes mentioned in the current scope?

    • are attribute types correct?


Outline1

Outline

  • Convert SQL query to a parse tree

    • Semantic checking: attributes, relation names, types

  • Convert to a logical query plan (relational algebra expression)

    • deal with subqueries

  • Improve the logical query plan

    • use algebraic transformations

    • group together certain operators

    • evaluate logical plan based on estimated size of relations

  • Convert to a physical query plan

    • search the space of physical plans

    • choose order of operations

    • complete the physical query plan


Convert parse tree to relational algebra

Convert Parse Tree to Relational Algebra

  • Complete algorithm depends on specific grammar, which determines forms of the parse trees

  • Here give a flavor of the approach


Conversion

Conversion

  • Suppose there are no subqueries.

  • SELECT att-list FROM rel-list WHERE cond

    is converted into

    PROJatt-list(SELECTcond(PRODUCT(rel-list))), or

    att-list(cond( X (rel-list)))


Csce 608 600 database systems

<Query>

SELECT movieTitle

FROM StarsIn, MovieStar

WHERE starName = name AND birthdate LIKE '%1960';

<SFW>

SELECT <SelList> FROM <FromList> WHERE <Condition>

<Attribute> <RelName> , <FromList> AND <Condition>

movieTitleStarsIn <RelName> <Attribute> LIKE <Pattern>

MovieStarbirthdate'%1960'

<Condition>

<Attribute> = <Attribute>

starName name


Equivalent algebraic expression tree

Equivalent Algebraic Expression Tree

movieTitle

starname = name AND birthdate LIKE '%1960'

X

StarsIn MovieStar


Handling subqueries

Handling Subqueries

  • Recall the (equivalent) query:

    SELECT title

    FROM StarsIn

    WHERE starName IN (

    SELECT name

    FROM MovieStar

    WHERE birthdate LIKE ‘%1960’

    );

  • Use an intermediate format called two-argument selection


Csce 608 600 database systems

Example: Two-Argument Selection

title

StarsIn <condition>

<tuple> IN name

<attribute> birthdate LIKE ‘%1960’

starName MovieStar


Converting two argument selection

Converting Two-Argument Selection

  • To continue the conversion, we need rules for replacing two-argument selection with a relational algebra expression

  • Different rules depending on the nature of the subquery

  • Here show example for IN operator and uncorrelated query (subquery computes a relation independent of the tuple being tested)


Rules for in

Rules for IN

C

R <Condition>

X

R 

t IN S

S

C is the condition that equates

attributes in t with corresponding

attributes in S


Csce 608 600 database systems

Example: Logical Query Plan

title

starName=name

StarsIn

name

birthdate LIKE ‘%1960’

MovieStar


What if subquery is correlated

What if Subquery is Correlated?

  • Example is when subquery refers to the current tuple of the outer scope that is being tested

  • More complicated to deal with, since subquery cannot be translated in isolation

  • Need to incorporate external attributes in the translation

  • Some details are in textbook


Outline2

Outline

  • Convert SQL query to a parse tree

    • Semantic checking: attributes, relation names, types

  • Convert to a logical query plan (relational algebra expression)

    • deal with subqueries

  • Improve the logical query plan

    • use algebraic transformations

    • group together certain operators

    • evaluate logical plan based on estimated size of relations

  • Convert to a physical query plan

    • search the space of physical plans

    • choose order of operations

    • complete the physical query plan


Improving the logical query plan

Improving the Logical Query Plan

  • There are numerous algebraic laws concerning relational algebra operations

  • By applying them to a logical query plan judiciously, we can get an equivalent query plan that can be executed more efficiently

  • Next we'll survey some of these laws


Associative and commutative operations

Associative and Commutative Operations

  • product

  • natural join

  • set and bag union

  • set and bag intersection

  • associative: (A op B) op C = A op (B op C)

  • commutative: A op B = B op A


Laws involving selection

Laws Involving Selection

  • Selections usually reduce the size of the relation

  • Usually good to do selections early, i.e., "push them down the tree"

  • Also can be helpful to break up a complex selection into parts


Selection splitting

Selection Splitting

  • C1 AND C2 (R) = C1 ( C2 (R))

  • C1 OR C2 (R) = (C1 (R)) Uset (C2 (R))

    if R is a set

  • C1 ( C2 (R)) = C2 ( C1 (R))


Selection and binary operators

Selection and Binary Operators

  • Must push selection to both arguments:

    • C (R U S) = C (R) U C (S)

  • Must push to first arg, optional for 2nd:

    • C (R - S) = C (R) - S

    • C (R - S) = C (R) - C (S)

  • Push to at least one arg with all attributes mentioned in C:

    • product, natural join, theta join, intersection

    • e.g., C (R X S) = C (R) X S, if R has all the atts in C


Pushing selection up the tree

Pushing Selection Up the Tree

  • Suppose we have relations

    • StarsIn(title,year,starName)

    • Movie(title,year,len,inColor,studioName)

  • and a view

    • CREATE VIEW MoviesOf1996 AS

      SELECT *

      FROM Movie

      WHERE year = 1996;

  • and the query

    • SELECT starName, studioName

      FROM MoviesOf1996 NATURAL JOIN StarsIn;


The straightforward tree

Remember the rule

C(R S) = C(R) S ?

The Straightforward Tree

starName,studioName

year=1996 StarsIn

Movie


The improved logical query plan

starName,studioName

starName,studioName

starName,studioName

year=1996

year=1996 year=1996

year=1996 StarsIn

StarsIn

Movie

StarsIn

Movie

push selection

up tree

push selection

down tree

Movie

The Improved Logical Query Plan


Laws involving projections

Laws Involving Projections

  • Consider adding in additional projections

  • Adding a projection lower in the tree can improve performance, since often tuple size is reduced

    • Usually not as helpful as pushing selections down

  • If a projection is inserted in the tree, then none of the eliminated attributes can appear above this point in the tree

    • Ex: L(R X S) = L(M(R) X N(S)), where M (resp. N) is all attributes of R (resp. S) that are used in L

  • Another example:

    • L(R Ubag S) = L(R) UbagL(S)

      But watch out for set union!


Push projection below selection

Push Projection Below Selection?

  • Rule: L(C(R)) = L(C(M(R)))

    where M is all attributes used by L or C

  • But is it a good idea?

    SELECT starName FROM StarsIn WHERE movieYear = 1996;

starName

starName

movieYear=1996

movieYear=1996

starName,movieYear

StarsIn

StarsIn


Joins and products

Joins and Products

  • Recall from the definitions of relational algebra:

    • R C S = C(R X S) (theta join)

    • R S = L(C(R X S)) (natural join)

      where C equates same-name attributes in R and S, and L includes all attributes of R and S dropping duplicates

  • To improve a logical query plan, replace a product followed by a selection with a join

    • Join algorithms are usually faster than doing product followed by selection


Duplicate elimination

Duplicate Elimination

  • Moving  down the tree is potentially beneficial as it can reduce the size of intermediate relations

  • Can be eliminated if argument has no duplicates

    • a relation with a primary key

    • a relation resulting from a grouping operator

  • Legal to push  through product, join, selection, and bag intersection

    • Ex: (R X S) = (R) X (S)

  • Cannot push  through bag union, bag difference or projection


  • Login