

Query Optimization

CMPT 454



Query Optimization

  • Introduction

  • Query Execution

  • Unary Operations

  • External Sorting

  • Binary Operations



Query Compilation Steps

  • Parsing

    • Construct a parse tree for a query

  • Generate equivalent logical query plans

    • Convert the parse tree to a query plan in relational algebra

    • Transform the plan into more efficient equivalents

  • Generate a physical plan

    • Select algorithms for each of the operators in the query

    • Including details about how tables are to be accessed or sorted
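For example (a sketch; the balance attribute and the 1,000 threshold are invented for illustration), the query

    SELECT lName
    FROM Customer C, Account A
    WHERE C.sin = A.sin AND balance > 1000

might be parsed and converted to the logical plan π_lName(σ_C.sin=A.sin ∧ balance>1000(Customer × Account)), transformed into the more efficient equivalent π_lName(Customer ⋈_C.sin=A.sin σ_balance>1000(Account)), and finally mapped to a physical plan that selects, say, a file scan of Account and a hash join.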



Query Optimization

  • Generating equivalent logical plans and their physical plans is known as query optimization

  • Selecting a good query plan entails deciding

    • Which equivalent plan leads to the most efficient query

    • Which algorithm should be used to implement an operation

    • How data from one operation should be passed to the next

  • These choices depend on database metadata

    • Size of relations

    • Number and frequency of attribute values

    • Indexing and data file organization



Query Processing

[Diagram] query → Query Compilation (generating, improving and estimating query plans) → Query Execution (algorithms to implement physical plan operators) → result



Query Execution



Introduction

  • A physical query plan is made up of operators

    • Each implementing one step of the plan and

    • Corresponding to a relational algebra operation

  • Some relational algebra operations may require lower level physical operations

    • Such as scanning a table



Sorting and Scanning

  • It is sometimes necessary or useful to sort data as it is scanned

    • To satisfy a query with an ORDER BY clause

    • Or because an algorithm requires sorted input

  • There are a number of ways in which a sort scan can be performed

    • Main memory sorting

    • B+ tree index

    • Multi-way mergesort



Statistics

  • There are M main memory frames available for an operation

    • M may or may not be the entire main memory

  • Information relating to a relation (table) R

    • B(R) or B – number of blocks of R

    • T(R) or T – number of tuples (i.e. records) of R

    • V(R, a) – number of distinct values for attribute a in R

  • Assume that data is accessed one block at a time from a disk

    • This assumption will be discussed later



Computation Model

  • Cost metric

    • The number of disk I/Os required for an operation

  • Assume that

    • Input to an operation is to be read from disk

    • The result is left in main memory (does not include write cost)

  • Note that the size of a result should not influence the choice of algorithm for a particular operation

    • Although it may influence the order of operations



I/O Cost for Scans

  • The cost to scan R depends on its organization

    • If R is organized into a single file the cost of a scan is B

    • And if B < M the cost of a sort scan is B

    • If R is distributed with records from other tables the cost of a scan may be higher

  • The use of an index does not reduce the cost to scan R

    • But may reduce the cost to retrieve a part of R



Unary Operations



Introduction

  • In this section the costs of individual relational algebra operations are considered

    • There are often a number of algorithms that can be used to implement the same operation

  • Each operation is considered in isolation

    • How the individual operations interact is discussed in a later section

    • Note that the cost of a later operation may be affected by the result of an earlier operation, in terms of

      • The order of the records in the result

      • The algorithm used to achieve that result



Selection

SELECT *

FROM Customer

WHERE age > 50

  • Only records that match the selection are included in the result

    • The size of the result depends on the number of records that match the selection

  • Complex selections that contain more than a single term will be considered later

σ_age>50(Customer)



Selections with No Index

  • If the file is not sorted on the selection attribute

    • Perform a file scan, cost is B reads

    • If the selection attribute is a superkey the scan can be terminated once a match has been found (cost is B/2 on average)

  • If the file is sorted on the selection attribute

    • Perform a binary search to find the first record that matches the selection criteria

    • Then read all records that match the selection criteria

      • Which may entail reading additional pages

    • Cost is log₂(B) + the number of pages of matching records − 1 (see the sketch below)
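The cases above can be folded into a small cost estimator; a sketch in Python, where the page counts are assumptions supplied by the caller:

    from math import ceil, log2

    def selection_cost(B, sorted_file, superkey, matching_pages=1):
        """Estimated disk reads for a selection on a file of B pages."""
        if sorted_file:
            # binary search to the first match, then the remaining matching pages
            return ceil(log2(B)) + matching_pages - 1
        if superkey:
            return B / 2    # on average the scan stops halfway through the file
        return B            # full file scan

    # selection_cost(10_000, sorted_file=True, superkey=False, matching_pages=3) -> 16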



Selections with B+ Tree Index

  • Find the leaf page of the first index entry that points to a matching record

    • The number of disk reads is equal to the height of the tree

    • The cost to find a leaf page in a B+ tree index is usually in the order of 2 or 3 disk reads

  • Then retrieve the matching records

    • This cost is determined by the number of leaf pages containing matching entries, and

    • The number of records to be retrieved, and

    • Whether the index is primary or secondary



Primary B+ Tree Index

  • In a primary index the file is sorted on the search key

  • If the selection is small it may fit on one page

    • An additional disk read to access this page is required

  • If the selection is large, additional pages may have to be read

    • The selection may cover a large range of records, or

    • May be an equality selection on an attribute with few distinct values

    • The exact number of pages depends on the number of records that match the selection



Secondary B+ Tree Index

  • In a secondary index the sort order of the file (if any) is different from the search key of the index

    • The leaves of the tree form a dense index on the file

  • Adjacent matching data entries are unlikely to point to records on the same page

    • A disk read is required for each matching record

  • For large selections, a secondary index may be less efficient than scanning the entire table!

    • In practice it is important to recognize this, by estimating the number of records returned by a selection



Secondary B+ Tree Range Selection

SELECT *

FROM Patient

WHERE lName = 'Zoobat'

σ_lName="Zoobat"(Patient)

[Diagram: the index leaf entries for the 4 patients called Zoobat point to different pages of the file]



Selections with Hash Index

  • A hash index can be used to efficiently retrieve equality selections

    • The cost to retrieve the bucket containing the data entries is typically 1 or 2 disk reads

    • The cost to retrieve the corresponding records depends on whether the index is primary or secondary

      • If the index is primary, matching records are likely to reside on the same, single, file page whereas

      • If the index is secondary, matching records are likely to reside on different pages of the file

  • Hash indexes cannot be used for range selections



Complex Selections

  • Selections may be made up of a number of terms connected by and (∧) and or (∨)

    • These terms may reference different attributes

    • It is useful to know whether or not an index can be used to evaluate a selection

  • To determine this the selection is first expressed in conjunctive normal form (CNF)

    • A collection of conjuncts (terms joined by and)

    • Each conjunct consists either of a single term, or multiple terms joined by or (i.e. disjunctions)

      • e.g. (A ∧ B) ∨ C ∨ D ≡ (A ∨ C ∨ D) ∧ (B ∨ C ∨ D)



Selections with no Disjunctions

  • Hash indexes can be used if there is an equality condition for every attribute in the search key

    • e.g. a single hash index on {city, street, number}

      • σ_city="London" ∧ street="Baker" ∧ number=221(Detective) matches

      • σ_city="Los Angeles" ∧ street="Cahuenga"(Detective) does not

  • Tree indexes can be used if there is a selection on each of the first n attributes of the search key

    • e.g. B+ index on {city, street, number}

      • Bothcity="London"street="Baker"number=221(Detective) and

      • city="Los Angeles"street="Cahuenga"(Detective) match



Selections with no Disjunctions…

  • If an index matches a subset of the conjuncts

    • Use the index to return a result that contains some unwanted records

    • Scan this result for matches to the other conjuncts

      • σ_city="London" ∧ street="Baker" ∧ number=221 ∧ fName="Sherlock"(Detective)

  • If more than one index matches a conjunct

    • Either use the most selective index, then scan the result, discarding records that fail to match the other criteria

    • Or use all indexes and retrieve the rids

    • Take the intersection of the rids and retrieve those records
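A minimal sketch of the rid-intersection strategy (Python; the rid lists stand in for the data entries returned by each matching index):

    def rids_matching_all(rid_lists):
        """Intersect the rids returned by each matching index."""
        rids = set(rid_lists[0])
        for other in rid_lists[1:]:
            rids &= set(other)
        return sorted(rids)     # sorting lets the pages be read in file order

Only the records whose rids survive the intersection are then retrieved.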



Selections with no Disjunctions…

  • Consider the relation and selection shown below

    • Detective = {id, fName, lName, age, city, street, number, author}

    • σ_city="New York" ∧ author="Spillane" ∧ lName="Hammer"(Detective)

  • With indexes

    • Secondary hash index, {city, street, number}

    • Secondary B+ tree index, {lName, fName}

    • Secondary hash index, {author}

  • There are two strategies:

    • Use the most selective of the two matching indexes, and search the results for the remaining criteria

    • Use both indexes, take the intersection of the rids

[Slide annotations: the {city, street, number} hash index can't be used, while the {lName, fName} and {author} indexes can be used]

What if the B+ tree index is primary?



Selections with Disjunctions

  • If necessary, restate the selection in CNF

  • If a conjunct includes a disjunction with no index on its attribute, no index can be used for that conjunct

    • If all the conjuncts contain such a disjunction a file scan is necessary to satisfy the selection

  • Otherwise, use an index that matches one of the conjuncts and retrieve those records

    • The remaining selection criteria can then be applied to the results from using that index



Selections with Disjunctions …

  • Consider a selection of this form

    • σ_(a ∨ b ∨ c) ∧ (d ∨ e ∨ f)(R)

    • Where each of a to f is an equality selection on an attribute

  • If each of the terms in either of the conjuncts has a matching index

    • Use the indexes to find the rids

    • Take the union of the rids and retrieve those records

  • If there are indexes on the attributes of a, b, c, and e

    • Use the a, b, and c indexes and take the union of the rids

    • Retrieve the resulting records and apply the other criteria



Selections with Disjunctions …

  • Consider the selections shown below

    • σ_(author="King" ∨ age>35) ∧ (lName="Tam" ∨ id=11)(Detective)

    • σ_(author="King") ∧ (lName="Tam" ∨ id=11)(Detective)

  • Indexes on the relation

    • Secondary B+ tree index, {lName, fName}

    • Secondary hash index, {author}

  • Compare the two selections

    • In the first selection each conjunct contains a disjunction without an index (age, id) so a file scan is required

    • In the second selection the index on author can be used, and records that don't meet the other criteria removed



Projections

SELECT fName, lName

FROM Customer

  • Only columns in the column list are retained

    • Reducing the size of the result relation

  • There are two steps to implementing projections

    • Remove the unwanted columns

    • Eliminate any duplicate records, but only if DISTINCT has been specified

      • Duplicates can be removed either by sorting or hashing

π_fName,lName(Customer)



External Sorting

A Digression



External Sorting

  • Two-way external merge sort

  • External merge sort

  • Replacement sort

  • External merge sort notes



External Sorting Introduction

  • It is often necessary to sort data in a database

    • To sort the results of a query

    • To make some other operations more efficient

      • Bulk loading a tree index

      • Eliminating duplicate records

      • Joining relations

  • The focus of a DB sorting algorithm is to reduce the number of disk reads and writes



Internal vs. External Sorting

  • Sorting a collection of records that fit within main memory can be performed efficiently

    • There are a number of sorting algorithms that run in O(n log₂ n) time

      • That is, with O(n log₂ n) comparisons, e.g. Mergesort and Quicksort

  • Many DB tables are too large to fit into main memory at one time

    • So cannot be read into main memory and sorted

    • The focus in external sorting is to reduce the number of disk I/Os



Merge Sort – a Brief Reminder

  • Consider the Mergesort algorithm

    • Input sub-arrays are repeatedly halved

    • Until they contain only one element

  • Sub-arrays are then merged into sorted sub-arrays by repeated merge operations

    • merging two sorted sub-arrays can be performed in O(n)

mergesort(arr, start, end)
    if (start < end)
        mid = (start + end) / 2
        mergesort(arr, start, mid)
        mergesort(arr, mid+1, end)
        merge(arr, start, mid, mid+1, end)
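The merge operation that Mergesort (and every merge pass below) relies on can be written directly; a runnable Python sketch:

    def merge(left, right):
        """Merge two sorted lists into one sorted list with O(n) comparisons."""
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        out.extend(left[i:])    # at most one of these
        out.extend(right[j:])   # two is non-empty
        return out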



Naïve External Merge Sort

  • Read the first two pages (of B pages) of the file

    • Sort their contents (independently of each other)

  • Merge the two pages

    • Write them out to a sorted run of size 2

  • Repeat for the rest of the file

    • Producing B/2 sorted runs of size 2

  • Merge the first two sorted runs of size 2

    • Read in the first page of the first two runs into input pages

    • Merge to a single output page, and write it out when full

    • When all records in an input page have been merged read in the second page of that run



Naïve External Merge Sort …

  • After the first merge pass the file consists of B/2 sorted runs each of two pages

  • Read in the first page of each of the first two sorted runs

  • Leaving a third page free as an output buffer

[Diagram: sorted runs of size 2 on disk, each page holding three records; three main memory pages (two for input, one for output)]



Memory Usage

  • Records from the input pages are merged into the output buffer

  • Once the output buffer is full its contents are written out to disk, to form the first page of the first sorted run of length 4




Memory Usage

  • At this point all of the records from one of the input pages have been dealt with

  • The next page of that sorted run is read into the input page

  • And the process continues




Cost of Naïve Merge Sort

  • Assume that B = 2^k; after the first merge pass there are 2^(k−1) sorted runs, each two pages long

    • After the second pass there are 2^(k−2) sorted runs, of length 4

    • After the kth pass there is one sorted run of length B

  • The number of passes is therefore log₂B + 1

    • log₂B is the number of merge passes required

    • The +1 is for the initial pass to sort each page

  • Each pass requires that each page be read and written back, for a total cost of 2B(log₂B + 1)

    • Note that only 3 frames of main memory are used!
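The formula as a one-line calculator (Python):

    from math import ceil, log2

    def naive_sort_io(B):
        # initial sort pass + ceil(log2 B) merge passes,
        # each reading and writing every page
        return 2 * B * (ceil(log2(B)) + 1)

    # naive_sort_io(1_000_000) -> 42,000,000 disk I/Os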



First Stage Improvement

  • In the first stage of the process each page is read into main memory, sorted and written out

    • The pages are then ready to be merged

  • Instead of sorting each page independently, read M pages into main memory, and sort them together

    • M is the main memory frames available for the sort

    • After the first pass there will be B/M sorted runs, each of length M

    • This will reduce the number of subsequent merge passes



Merge Pass Improvement

  • In the merge passes perform an M-1 way merge rather than a 2 way merge, using

    • M-1 input pages, one for each of M-1 sorted runs and

    • 1 page for an output buffer

  • The first item in each of the M-1 input partitions is compared to each other to determine the smallest

    • Resulting in fewer merge passes, and less disk I/O

  • Each merge pass merges M-1 runs

    • After the first pass the runs are size (M-1)*M

      • Since M is the size of the runs from the first stage



Cost of External Merge Sort

  • The initial pass produces B/M sorted runs of size M

  • Each merge pass reduces the number of runs by a factor of M-1

    • The number of merge passes is ⌈log_(M−1)(B/M)⌉

  • Each pass requires that the entire file is read and written

    • Total cost is therefore 2B(⌈log_(M−1)(B/M)⌉ + 1)

  • As M is typically relatively large this reduction is considerable



Number of Passes

  • B = 1,000,000

Even a large file can usually be sorted in two passes (a cost of 4B I/Os to sort and write out)
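A small calculator reproduces the numbers the table on this slide would show (Python; B and M are page counts):

    from math import ceil

    def external_sort_passes(B, M):
        runs = ceil(B / M)    # after the initial in-memory sort pass
        passes = 1
        while runs > 1:
            runs = ceil(runs / (M - 1))   # each merge pass is (M-1)-way
            passes += 1
        return passes

    # external_sort_passes(1_000_000, 5_000) -> 2, i.e. a cost of 4B I/Os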



Generating Longer Initial Runs

  • In the first stage B/M sorted runs of size M are produced

    • The size of these preliminary runs determines how many merge passes are required

  • The size of the initial runs can be increased by using replacement sort

    • On average the size of the initial runs can be increased to 2*M

    • Using replacement sort increases complexity



Replacement Sort

  • M-2 pages are used to sort the file

    • Called the current set

  • One page is used for input, and

  • One page is used for output

  • The current set is filled

    • … and sorted

[Diagram: main memory holds the current set (M−2 pages), one input page, and one output page; pages move between main memory and disk]



Replacement Sort

  • The next page of the file is read in to the input buffer

  • The smallest record from the current set (and the input buffer) is placed in the output buffer

  • The first location of the current set is now available for the first record from the input buffer




Replacement Sort

  • The process continues until the output buffer is full and

  • The contents of the input buffer are in the current set

  • At this point another input page can be read into the input buffer

  • And the output buffer can be written to disk as the first page of the first run




Replacement Sort

  • The current set can be periodically re-sorted

  • At some point a page is read that contains records with values less than the values in records in the output buffer

  • The rest of the current set must now be written out to complete the first run

  • The process then begins again

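Replacement sort is usually implemented with a heap (replacement selection); a runnable Python sketch, where the current set holds (run number, key) pairs and a key smaller than the one just written is frozen for the next run:

    import heapq

    def replacement_selection(stream, m):
        """Produce initial sorted runs; the current set holds m keys."""
        it = iter(stream)
        heap = []
        for _ in range(m):                  # fill the current set
            try:
                heap.append((0, next(it)))
            except StopIteration:
                break
        heapq.heapify(heap)
        runs, out, current_run = [], [], 0
        while heap:
            run, key = heapq.heappop(heap)
            if run != current_run:          # current run exhausted: start the next
                runs.append(out)
                out, current_run = [], run
            out.append(key)
            try:
                nxt = next(it)
            except StopIteration:
                continue
            # a key smaller than the one just written must wait for the next run
            heapq.heappush(heap, (run if nxt >= key else run + 1, nxt))
        if out:
            runs.append(out)
        return runs

On randomly ordered input the runs average about 2m keys; on nearly sorted input they can be far longer.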



Revisiting I/O Costs

  • The cost metric used is simplistic

  • In practice it may be advantageous to make the input and output buffers larger than one page

    • This reduces the number of runs that can be merged at one time, so may increase the number of passes required

    • But, it allows a sequence of pages to be read or written to the buffers, decreasing the actual access time per page

  • We have also ignored CPU costs

    • If double buffering is used, the CPU can process one part of a run while the next is being loaded into main memory



Note: B+ Trees and Sorting

  • Primary B+ tree index

    • The index can be used to find the first page, but

    • Note that the file is already sorted!

  • Secondary B+ tree index

    • Leaves point to data records that are not in sort order

      • In the worst case, each data entry could point to a different page from its adjacent entries

    • Retrieving the records in order requires reading all of the index leaf pages, plus one disk read for each record!

    • In practice external sort is likely to be much more efficient than using a secondary index



Projections



Projections and Sorting

  • Projection with duplicate removal can be performed in a number of steps

    • Scan the table, remove unwanted attributes, and write it back

      • cost = 2B disk I/Os

    • Sort the result, using all of its attributes as a compound sort key

      • cost = 4B, possibly more if the file is very large

    • Scan the result, removing the adjacent duplicates as they are encountered; cost = 2B

      • The result of this last stage may not have to be written out; it may be the last operation or may be pipelined into another operation

  • The total cost is 8B disk I/Os, but this basic process can be improved by combining these steps



Improved Sort Projection

  • The initial scan is performed as follows

    • Read M pages and remove unwanted attributes

    • Sort the records, and remove any duplicates

    • Write the sorted run to disk

    • Repeat for the rest of the file, for a total cost of 2B

  • Perform merge passes as required on the output from the first stage

    • Remove any duplicates as they are encountered

    • If only one merge pass is required the cost is ≈ 1B

  • For a total cost of 3B



Projections and Hashing

  • Duplicates can also be identified by using hashing

  • Duplicate removal by hashing has two stages

    • Partitioning and probing

  • In the partitioning stage

    • Partition into M-1 partitions using a hash function, h

      • There is an output buffer for each partition, and

      • One input buffer

    • The file is read into main memory one page at a time, with each record being hashed to the appropriate buffer

  • Duplicates are eliminated in the next stage



Projections and Hashing …

  • The duplicate elimination stage uses a second hash function h2 (h2 ≠ h) to reduce main memory costs

    • An in-memory hash table is built using h2

    • If two records hash to the same location they are checked to see if they are duplicates

  • Assuming that each partition produced in the partitioning stage can fit in main memory the cost is

    • Partitioning stage: 2B

    • Duplicate elimination stage: B, for a total cost of 3B

  • This is the same cost as projection using sorting
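Both stages can be rendered in memory (a Python sketch; in the real algorithm each partition is a file on disk, and the set plays the role of the h2 hash table):

    def hash_projection(records, attrs, n_partitions):
        # Partitioning stage: project each record, then hash it to a partition
        partitions = [[] for _ in range(n_partitions)]
        for rec in records:
            projected = tuple(rec[a] for a in attrs)    # remove unwanted columns
            partitions[hash(projected) % n_partitions].append(projected)
        # Duplicate elimination stage: each partition is deduplicated independently
        result = []
        for part in partitions:
            result.extend(set(part))
        return result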



Sort, Hash Projection Comparison

  • Sort and hash projection have the same cost (3B)

  • If M > (B) sorting and sort projection can be performed in two passes

    • The first pass produces B/M sorted runs

    • If there are less than M-1 of them only one merge pass is required

  • With hash projection, if just one partition is greater than M-1, further partitioning is required

    • Regardless of the overall size of the file



Binary Operations



Customer ⋈ Account

Joins

SELECT *

FROM Customer NATURAL INNER JOIN Account

  • A join is defined as a Cartesian product followed by a selection

    • Joins are used more often than Cartesian products

  • Cartesian products typically result in much larger tables than joins

    • It is important to be able to efficiently implement joins

σ_C.sin=A.sin(Customer × Account)



Cost of Joins

  • The naïve method of producing a join is to perform a Cartesian product then a selection

    • This is very inefficient if the intermediate relation is large

  • For example, consider two relations R and S

    • R has 10,000 records, and S has 2,000

    • S has a foreign key in R

      • Implying that a record in S relates to at most one record in R

  • The sizes of the join and the Cartesian product are

    • Natural join – 2,000 (if each s in S relates to an r in R)

    • Cartesian product – 20,000,000 records



Join Methods

  • Nested loop joins

    • These algorithms basically compute the Cartesian product and discard records that do not meet the join condition

  • Index nested loop join

    • Uses an index on one of the tables

  • Sort-merge join

    • Sorts both tables on the join attribute and merges results

  • Hash join

    • Similar to the sort-merge join but uses hashing

    • We will also consider the hybrid hash join algorithm



Tuple Nested Loop Join

  • This simple algorithm compares each record in the outer table with each record in the inner relation

    • Start with one record, r, in relation R

    • Compare r to all the records, s, in relation S and see if the join condition matches

      • If so concatenate the records and add to the result

    • Repeat with all r in R

for each record r ∈ R
    for each record s ∈ S
        if r.i = s.j then
            add ⟨r, s⟩ to result

R ⋈_R.i=S.j S



Basic Nested Loop Join

[Diagram: R and S on disk; all of S is scanned for each record of R; main memory holds one input frame per relation and an output buffer]



Nested Loop Join Cost

  • The inner relation (S) has to be scanned once for each record in the outer relation (R)

    • For a cost of B(R) + (T(R) * B(S) )

    • The results are in the same order as R

  • A simple improvement is to process one block of R at a time, instead of one record at a time

    • This reduces the cost to B(R) + (B(R) * B(S) )

    • If B(R) and B(S) are roughly the same size this is O(n2)

      • Where n = B(R)

    • Note that the result is no longer strictly ordered on R

  • The smaller relation should be the outer relation



Improved Nested Loop Join

[Diagram: as above, but all of S is scanned once for each page of R]



Block Nested Loop Join

  • The nested loop join algorithm does not make effective use of main memory

    • Only one page of the outer relation is processed at a time

  • The algorithm can be significantly improved by reading as much of the outer relation as possible

    • Use M – 2 pages as an input buffer for the outer relation,

    • 1 page as an input buffer for the inner relation, and

    • 1 page as an output buffer

    • If the smaller relation fits in M – 2 pages the cost is B(R) + B(S)!

      • CPU costs can be reduced by building an in-memory hash table on the inner relation



Block Nested Loop Join …

  • What if the smaller relation is larger than M-2?

    • Break R, the outer relation, into blocks of M – 2 pages

      • Refer (somewhat flippantly) to these blocks as clumps

    • Scan S once for each clump of R

      • Insert concatenated records r, s that match the join condition into the output buffer

      • S is read ⌈B(R)/(M−2)⌉ times

    • The total cost is B(R) + (⌈B(R)/(M−2)⌉ * B(S))

  • Disk seek time can be reduced by increasing the size of the input buffer for S

    • Which may increase the number of times that S is scanned
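A sketch of the algorithm (Python; relations are lists of pages, each page a list of tuples, and the join attributes are given by position, all assumptions of this rendering):

    def block_nested_loop_join(R, S, m, key_r, key_s):
        out, clump = [], m - 2          # M-2 frames hold the outer relation
        for i in range(0, len(R), clump):
            # build an in-memory hash table on the clump to cut CPU cost
            table = {}
            for page in R[i:i + clump]:
                for r in page:
                    table.setdefault(r[key_r], []).append(r)
            for page in S:              # one full scan of S per clump
                for s in page:
                    for r in table.get(s[key_s], []):
                        out.append(r + s)   # concatenate matching tuples
        return out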



Improved Nested Loop Join

[Diagram: M−2 main memory pages hold a clump of R; all of S is scanned once for each clump; matches go to the output buffer]



Index Nested Loop Join

  • If there is an index on the join attribute

    • Make the indexed relation the inner relation (S)

    • Read the outer relation, retrieving matching records of S using the index

      • Does not first calculate the Cartesian product as it only retrieves records that satisfy the join condition

  • The cost is dependent on the kind of index and the number of matching records

    • An index lookup is required for each record in R

      • B+ tree index: 2 – 4, hash index: 1 – 2

    • If multiple S records match to a single R record the total retrieval cost depends on whether the index is primary
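These ingredients give a rough cost estimate (a sketch; the per-match cost of 1 for a primary index assumes the matching records share a page):

    def index_nl_join_cost(B_R, T_R, probe, k, primary_index):
        """probe: index lookup cost (2-4 for a B+ tree, 1-2 for a hash index);
        k: average number of matching S records per R record."""
        per_record = probe + (1 if primary_index else k)
        return B_R + T_R * per_record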



Index Nested Loop Join

[Diagram: for each record r of R, look up the index on S using the join attribute of r, then read the page(s) holding the matching s records]



Sort-Merge Join

  • Assume that both tables to be joined are sorted on the join attribute

    • The tables may be joined with one pass, in a similar way to merging two sorted runs

    • At a cost of B(R) + B(S)

  • If the tables are not sorted on the join attribute they may be sorted using multi-way Mergesort

    • The sort can typically be performed in two passes

      • At a cost of 4 B(R) + 4 B(S), including writing out the results

    • Joining the tables has an additional cost of B(R) + B(S)



Sort-Merge Join Process

  • Read in pages of R and S

    • Assume that the join attribute is x

  • While x_r is different from x_s

    • If x_r is less than x_s move to the next record from R, else

    • Move to the next record from S

  • If x_r equals x_s

    • Create the joined record by concatenating r and s, and

    • Add the resulting record to the output buffer

  • Repeat the process until all records have been read
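The merge step in runnable form (Python; both inputs are lists of tuples sorted on the join attribute, an in-memory stand-in for the page-at-a-time version):

    def merge_join(R, S, key_r, key_s):
        out, i, j = [], 0, 0
        while i < len(R) and j < len(S):
            if R[i][key_r] < S[j][key_s]:
                i += 1                        # advance the smaller side
            elif R[i][key_r] > S[j][key_s]:
                j += 1
            else:
                v, j0 = R[i][key_r], j        # join every r and s sharing value v
                while i < len(R) and R[i][key_r] == v:
                    j = j0
                    while j < len(S) and S[j][key_s] == v:
                        out.append(R[i] + S[j])
                        j += 1
                    i += 1
        return out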



Improved Sort-Merge Join

  • The merge phase of the external sort algorithm can be combined with the join

    • Produce sorted runs (of size M) of R and S

    • Merge the runs of R and the runs of S, comparing the results and discarding records that do not meet the join condition; this requires

      • A buffer for each sorted run of R

      • A buffer for each sorted run of S,

      • One buffer for merged runs of R,

      • One buffer for the merged runs of S, and

      • One buffer for the concatenated results

  • The improvement requires more main memory as there must be buffers for each run of both R and S



Sort-Merge Join

[Diagram: read in sorted runs of R and S; merge the runs of R and, separately, the runs of S; merge and concatenate matching records into the result buffer]



Memory Requirements

  • Sort-merge join can be performed in two passes

    • For a cost of 3(B(R) + B(S))

  • This assumes that main memory is large enough to allow an input buffer for each run of both R and S

    • The initial pass produces sorted runs of size M

    • Main memory must be greater than √(B(R) + B(S)) to perform the join in two passes

      • The first pass produces B(R)/M runs of R and B(S)/M runs of S

      • If M is not greater than √(B(R) + B(S)) then (B(R)/M + B(S)/M) must be greater than M



Sort-Merge Join and Indexes

  • The sort-merge join can take advantage of a primary index

    • The relation with the index is already sorted so no initial sort pass is required

  • If both relations have a primary index on the join attribute a zig-zag join can be performed

    • Scan the leaves of the two B+ trees in order from the left

    • When the search key value of one index is higher, scan the other index

    • Only when both indexes contain the same search key values are the records retrieved and concatenated



Hash Join

  • The hash join algorithm has two phases

    • Partitioning, and

    • Probing

  • Partitioning

    • Both relations are partitioned using the same hash function, h, on the join attribute

      • Records in one partition of R can only match to records in the same partition of S

    • One input buffer and M − 1 output buffer pages are used to make M − 1 partitions for each relation

    • If the largest partition of the smaller relation does not fit in main memory, the relations must be further partitioned



Hash Join continued

  • Probing

    • Read in one partition of R

    • To reduce CPU costs, build an in-memory hash table using hash function h2 (h2 ≠ h)

    • Read the corresponding partition of S into an input buffer and join matching records using the hash table

    • Repeat for each partition of R

  • Cost

    • If each partition of R fits in main memory the overall cost is 3(B(R) + B(S))
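Both phases, rendered in memory (a Python sketch; the partition lists stand in for partition files on disk, and a dict plays the role of the h2 hash table):

    def hash_join(R, S, key_r, key_s, n_parts):
        # Partitioning phase: the same hash function splits both relations
        parts_r = [[] for _ in range(n_parts)]
        parts_s = [[] for _ in range(n_parts)]
        for r in R:
            parts_r[hash(r[key_r]) % n_parts].append(r)
        for s in S:
            parts_s[hash(s[key_s]) % n_parts].append(s)
        # Probing phase: records can only match within the same partition
        out = []
        for pr, ps in zip(parts_r, parts_s):
            table = {}
            for r in pr:
                table.setdefault(r[key_r], []).append(r)
            for s in ps:
                for r in table.get(s[key_s], []):
                    out.append(r + s)
        return out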



Memory Requirements

  • Hash join is performed in two phases

    • At a cost of 3(B(R) + B(S))

  • Main memory must be big enough to hold the largest partition of the smaller relation, R

    • Which requires that M − 2 ≥ √B(R)

      • A buffer for S and a buffer for output are also needed

      • Partitioning produces M − 1 partitions, of average size B(R)/(M − 1)

    • If M < √B(R) then B(R)/M must be larger than M, and the partitions are larger than main memory



Hybrid Hash Join

  • If main memory is large enough, retain an entire partition of R during the partitioning phase

    • This eliminates the need to write out the partition, and read it back in during the probing phase

    • In addition, the matching S records can be joined (and written out to the result) when S is partitioned

    • Hence the records (of both R and S) belonging to that partition are only read once

  • This approach can be generalized to retain more than one partition where possible



Hybrid Hash Join and Block Nested Loop Join

  • If the smaller relation fits in main memory the costs are identical

    • The smaller relation is read once

    • The larger relation is scanned once to join the records

  • If the relations are large hash join is more efficient

    • Block nested loop join must read S once for each clump of R

    • Hash join matches partitions of R to partitions of S during the probing phase

    • The records of both R and S belonging to a particular partition are only read once, after the partitioning phase



Hash Join and Sort-Merge Join

  • A skewed distribution of data may result in the relations being partitioned more than once

    • Because the partitions may vary considerably in size

    • Sort-merge join is not sensitive to data skew

  • If M falls between √B(R) and √B(S), hash join requires fewer passes

    • Hash join only requires enough memory to hold partitions of one relation

    • The memory requirements of sort-merge join depend on the size of both relations



Join Method Ordering

  • Simple nested loop join (read S for each record)

    • Retains the original order of R

  • Index nested loop join

    • Retains the original order of R

  • Sort-Merge join

    • Ordered by the join attribute

  • All other join methods

    • No order



General Join Conditions

  • The join process is more complex if the join condition is not simple equality on one attribute

  • For equalities over several attributes

    • Sort-merge and hash join must sort (or hash) over all of the attributes in the selection

    • An index that matches one of the equalities may be used for the index nested loop join

  • For inequalities (<, ≤, etc.)

    • Hash indexes cannot be used for index nested loop joins

    • Sort-merge and hash joins are not possible

    • Other join algorithms are unaffected



Set Operations

SELECT fName, lName

FROM Patient

INTERSECT

SELECT fName, lName

FROM Doctor

  • For set operations, unlike other SQL operations, duplicates are removed by default

    • Unless ALL is specified

    • Therefore most set operations entail sorting the relations

π_fName,lName(Patient) ∩ π_fName,lName(Doctor)



Intersection and Cartesian Product

  • Intersection R S

    • A join where the condition is equality on all attributes

    • An in-memory search structure should be used

  • Cartesian product R S

    • A special case of join where there is no join condition

    • All records are joined to each other

π_fName,lName(Patient) ∩ π_fName,lName(Doctor)

is equivalent to

π_fName,lName(Patient) ⋈ π_fName,lName(Doctor)



Union

  • Union using sorting

    • Sort R and S using all fields

    • Scan and merge the results while removing duplicates

  • Union using hashing

    • Partition R and S using a hash function h

    • For each partition of S (if S is the smaller relation)

      • Build an in-memory hash table (using h2)

      • Scan the corresponding partition of R

      • For each record, r, probe the hash table, if a duplicate record already exists discard it, otherwise add it to the table

      • Write out the hash table and process the next partition



Set Difference

  • Set Difference is implemented in a similar way to union

  • In sorting, in the merge phase check that records of R are not in S before adding them to the result

  • In hashing, probe the hash table (for S), if a record of R is not in the table then add it to the result

  • If the query contains ALL a count of duplicates should be maintained

    • Instead of storing duplicate records



Aggregations

SELECT AVG(age)

FROM Patient

  • The basic method for calculating an aggregation is to scan the entire table

  • While keeping track of running information

    • SUM – total of the retrieved values

    • AVG – total, and count of the retrieved values

    • COUNT – count of the retrieved values

    • MIN – smallest of the retrieved values

    • MAX – largest of the retrieved values



Grouped Aggregations

SELECT AVG(income)

FROM Doctor

GROUP BY speciality

  • An aggregation can include a GROUP BY clause

    • In which case the aggregation has to be computed for each group

  • Aggregations with groupings can be computed in a number of ways

    • By sorting or hashing, or

    • By using an index



Sorting and Groups

  • In the sorting-based approach the table is sorted on the grouping attribute

  • The results of the sort are scanned and the result of the aggregate operation computed

    • These two processes can be combined in a similar way to the sort based projection and join algorithms

  • The cost is essentially the cost of the sorting operations

    • Probably 3B(R)

    • Although the final result will be considerably smaller than the sorted table



Hashing and Groups

  • In the hash-based approach an in-memory hash table is built on the grouping attribute

    • Hash table entries consist of

      • ⟨grouping-value, running-information⟩

  • The table is scanned, and for each record

    • Probe the hash table to find the entry for the group that the record belongs to, and

    • Update the running information for that group

  • Once the table has been scanned the grouped results are computed using the hash table entries

    • If the hash table fits in main memory the cost is B(R)
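For the AVG-by-speciality query above, the approach looks like this (a Python sketch; records as dicts):

    def avg_by_group(records, group_attr, value_attr):
        running = {}    # grouping-value -> (running total, count)
        for rec in records:
            total, count = running.get(rec[group_attr], (0, 0))
            running[rec[group_attr]] = (total + rec[value_attr], count + 1)
        return {g: total / count for g, (total, count) in running.items()}

    # avg_by_group(doctors, "speciality", "income")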



Aggregations and Indexes

  • It may be possible to satisfy an aggregate query using just the data entries of an index

    • The search key must include all of the attributes required for the query

      • The data entries may be sorted or hashed, and

      • No access to the records is required

    • If the GROUP BY attributes form a prefix of the search key of a tree index, the data entries can be retrieved in the grouping order

      • The actual records may also be retrieved in this order

  • This is an example of an index-only plan



Summary

Memory Requirements



One-Pass and Simple Algorithms

δ = duplicate removal, γ = grouping



Sort-Based Algorithms



Hash-Based Algorithms

Assume B(S) ≤ B(R), and that √B(S) ≤ M

