Query optimization
1 / 95

CMPT 454 - PowerPoint PPT Presentation

  • Uploaded on

Query Optimization. CMPT 454. Query Optimization. Introduction Query Execution Unary Operations External Sorting Binary Operations. Query Compilation Steps. Parsing Construct a parse tree for a query Generate equivalent logical query plans

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'CMPT 454' - alijah

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

Query optimization1
Query Optimization

  • Introduction

  • Query Execution

  • Unary Operations

  • External Sorting

  • Binary Operations

Query compilation steps
Query Compilation Steps

  • Parsing

    • Construct a parse tree for a query

  • Generate equivalent logical query plans

    • Convert the parse tree to a query plan in relational algebra

    • Transform the plan into more efficient equivalents

  • Generate a physical plan

    • Select algorithms for each of the operators in the query

    • Including details about how tables are to be accessed or sorted

Query optimization2
Query Optimization

  • Generating equivalent logical plans and their physical plans is known as query optimization

  • Selecting a good query plan entails deciding

    • Which equivalent plan leads to the most efficient query

    • Which algorithm should be used to implement an operation

    • How data from one operation should be passed to the next

  • These choices depend on database metadata

    • Size of relations

    • Number and frequency of attributes

    • Indexing and data file organization

Query processing
Query Processing


Query Compilation

generating, improving and estimating query plans

Query Execution

algorithms to implement physical plan operators



  • A physical query plan is made up of operators

    • Each implementing one step of the plan and

    • Corresponding to a relational algebra operation

  • Some relational algebra operations may require lower level physical operations

    • Such as scanning a table

Sorting and scanning
Sorting and Scanning

  • It is sometimes necessary or useful to sort data as it is scanned

    • To satisfy a query with an ORDER BY clause

    • Or because an algorithm requires sorted input

  • There are a number of ways in which a sort scan can be performed

    • Main memory sorting

    • B tree index

    • Multi-way mergesort


  • There are M main memory frames available for an operation

    • M may or may not be the entire main memory

  • Information relating to a relation (table) R

    • B(R) or B – number of blocks of R

    • T(R) or T – number of tuples (i.e. records) of R

    • V(R, a) – number of distinct values for attribute a in R

  • Assume that data is accessed one block at a time from a disk

    • This assumption will be discussed later

Computation model
Computation Model

  • Cost metric

    • The number of disk I/Os required for an operation

  • Assume that

    • Input to an operation is to be read from disk

    • The result is left in main memory (does not include write cost)

  • Note that the size of a result should not influence the choice of algorithm for a particular operation

    • Although it may influence the order of operations

I o cost for scans
I/O Cost for Scans

  • The cost to scan R depends on its organization

    • If R is organized into a single file the cost of a scan is B

    • And if B < M the cost of a sort scan is B

    • If R is distributed with records from other tables the cost of a scan may be higher

  • The use of an index does not reduce the cost to scan R

    • But may reduce the cost to retrieve a part of R

Unary operations
Unary Operations


  • In this section the costs of individual relational algebra operations are considered

    • There are often a number of algorithms that can be used to implement the same operation

  • Each operation is considered in isolation

    • How the individual operations interact is discussed in a later section

    • Note that the cost of a later operation may be affected by the result of an earlier operation, in terms of

      • The order of the records in the result

      • The algorithm used to achieve that result



FROM Customer

WHERE age > 50

  • Only records that match the selection are included in the result

    • The size of the result depends on the number of records that match the selection

  • Complex selections that contain more than a single term will be considered later

age > 50(Customer)

Selections with no index
Selections with No Index

  • If the file is not sorted on the selection attribute

    • Perform a file scan, cost is B reads

    • If the selection attribute is a superkey the scan can be terminated once a match as been found (cost is B/2)

  • The file is sorted on the selection attribute

    • Perform a binary search to find the first record that matches the selection criteria

    • Then read all records that match the selection criteria

      • Which may entail reading additional pages

    • log2(B) + the number of pages of matching records - 1

Selections with b tree index
Selections with B+ Tree Index

  • Find the leaf page of the first index entry that points to a matching record

    • The number of disk reads is equal to the height of the tree

    • The cost to find a leaf page in a B+ tree index is usually in the order of 2 or 3 disk reads

  • Then retrieve the matching records

    • This cost is determined by the number of leaf pages containing matching entries, and

    • The number of records to be retrieved, and

    • Whether the index is primary or secondary

Primary b tree index
Primary B+ Tree Index

  • In a primary index the file is sorted on the search key

  • If the selection is small it may fit on one page

    • An additional disk read to access this page is required

  • If the selection is large, additional pages may have to be read

    • The selection may cover a large range of records, or

    • May be an equality selection on an attribute with few distinct values

    • The exact number of pages depends on the number of records that match the selection

Secondary b tree index
Secondary B+ Tree Index

  • In a secondary index the sort order of the file (if any) is different from the search key of the index

    • The leaves of the tree form a dense index on the file

  • Adjacent matching data entries are unlikely to point to records on the same page

    • A disk read is required for each matching record

  • For large selections, a secondary index may be less efficient than scanning the entire table!

    • In practice it is important to recognize this, by estimating the number of records returned by a selection

Secondary b tree range selection
Secondary B+ Tree Range Selection


FROM Patient

WHERE lName = 'Zoobat'

lName = "Zoobat"(Patient)

4 patients called Zoobat


Selections with hash index
Selections with Hash Index

  • A hash index can be used to efficiently retrieve equality selections

    • The cost to retrieve the bucket containing the data entries is typically 1 or 2 disk reads

    • The cost to retrieve the corresponding records depends on whether the index is primary or secondary

      • If the index is primary, matching records are likely to reside on the same, single, file page whereas

      • If the index is secondary, matching records are likely to reside on different pages of the file

  • Hash indexes cannot be used for range selections

Complex selections
Complex Selections

  • Selections may be made up of a number of terms connected by and () and or ()

    • These terms may reference different attributes

    • It is useful to know whether or not an index can be used to evaluate a selection

  • To determine this the selection is first expressed in conjunctive normal form (CNF)

    • A collection of conjuncts (terms joined by and)

    • Each conjunct consists either of a single term, or multiple terms joined by or (i.e. disjunctions)

      • e.g. (A  B)  C  D  (A  C  D)  (B  C  D)

Selections with no disjunctions
Selections with no Disjunctions

  • Hash indexes can be used if there is an equality condition for every attribute in the search key

    • e.g. a single hash index on {city, street, number}

      • city="London"street="Baker"number=221(Detective)matches

      • city="Los Angeles"street="Cahuenga"(Detective)does not

  • Tree indexes can be used if there is a selection on each of the first n attributes of the search key

    • e.g. B+ index on {city, street, number}

      • Bothcity="London"street="Baker"number=221(Detective) and

      • city="Los Angeles"street="Cahuenga"(Detective) match

Selections with no disjunctions1
Selections with no Disjunctions…

  • If an index matches a subset of the conjuncts

    • Use the index to return a result that contains some unwanted records

    • Scan this result for matches to the other conjuncts

      • city="London"street="Baker"number=221fName = "Sherlock" (Detective)

  • If more than one index matches a conjunct

    • Either use the most selective index, then scan the result, discarding records that fail to match to the other criteria

    • Or use all indexes and retrieve therids

    • Take the intersection of the rids and retrieve those records

Selections with no disjunctions2
Selectionswith no Disjunctions…

  • Consider the relation and selection shown below

    • Detective = {id, fName, lName, age, city, street, number, author}

    • city="New York"author="Spillane"lName="Hammer"(Detective)

  • With indexes

    • Secondary hash index, {city, street, number}

    • Secondary B+ tree index, {lName, fName}

    • Secondary hash index, {author}

  • There are two strategies:

    • Use the most selective of the two matching indexes, and search the results for the remaining criteria

    • Use both indexes, take the intersection of the rid

can’t be used

can be used

can be used

What if the B+ tree index is primary?

Selections with disjunctions
Selections with Disjunctions

  • If necessary, restate the selection in CNF

  • If a conjunct includes a disjunction with no index on its attribute, no index can be used for that conjunct

    • If all the conjuncts contain such a disjunction a file scan is necessary to satisfy the selection

  • Otherwise, use an index that matches one of the conjuncts and retrieve those records

    • The remaining selection criteria can then be applied to the results from using that index

Selections with disjunctions1
Selections with Disjunctions …

  • Consider a selection of this form

    • (a  b  c)  (d  e  f)(R)

    • Where each of a to f is an equality selection on an attribute

  • If each of the terms in either of the conjuncts has a matching index

    • Use the indexes to find therids

    • Take the union of the rids and retrieve those records

  • If there are indexes on the attributes of a, b, c, and e

    • Use the a, b, and c indexes and take the union of the rids

    • Retrieve the resulting records and apply the other criteria

Selections with disjunctions2
Selections with Disjunctions …

  • Consider the selections shown below

    • (author="King"  age>35)(lName="Tam"  id=11)(Detective)

    • (author="King")  (lName="Tam"  id=11)(Detective)

  • Indexes on the relation

    • Secondary B+ tree index, {lName, fName}

    • Secondary hash index, {author}

  • Compare the two selections

    • In the first selection each conjunct contains a disjunction without an index (age, id) so a file scan is required

    • In the second selection the index on author can be used, and records that don't meet the other criteria removed


SELECT fName, lName

FROM Customer

  • Only columns in the column list are retained

    • Reducing the size of the result relation

  • There are two steps to implementing projections

    • Remove the unwanted columns

    • Eliminate any duplicate records, but only if DISTINCT has been specified

      • Duplicates can be removed either by sorting or hashing


External sorting
External Sorting

A Digression

External sorting1
External Sorting

  • Two-way external merge sort

  • External merge sort

  • Replacement sort

  • External merge sort notes

External sorting introduction
External Sorting Introduction

  • It is often necessary to sort data in a database

    • To sort the results of a query

    • To make some other operations more efficient

      • Bulk loading a tree index

      • Eliminating duplicate records

      • Joining relations

  • The focus on a DB sorting algorithm is to reduce the number of disk reads and writes

Internal vs external sorting
Internal vs. External Sorting

  • Sorting a collection of records that fit within main memory can be performed efficiently

    • There are a number of sorting algorithms that can be performed in n(log2n) time

      • That is, with n(log2n) comparisons, e.g., Mergesort, Quicksort,

  • Many DB tables are too large to fit into main memory at one time

    • So cannot be read into main memory and sorted

    • The focus in external sorting is to reduce the number of disk I/Os

Merge sort a brief reminder
Merge Sort – a Brief Reminder

  • Consider the Mergesort algorithm

    • Input sub-arrays are repeatedly halved

    • Until they contain only one element

  • Sub-arrays are then merged into sorted sub-arrays by repeated merge operations

    • merging two sorted sub-arrays can be performed in O(n)

mergesort(arr, start, end)

if(start < end)

mid = start + end / 2

mergesort(arr, start, mid)

mergesort(arr, mid+1, end)

merge(arr, start, mid, mid+1, end)

Na ve external merge sort
Naïve External Merge Sort

  • Read the first two pages (of B pages) of the file

    • Sort their contents (independently of each other)

  • Merge the two pages

    • Write them out to a sorted run of size 2

  • Repeat for the rest of the file

    • Producing B/2 sorted runs of size 2

  • Merge the first two sorted runs of size 2

    • Read in the first page of the first two runs into input pages

    • Merge to a single output page, and write it out when full

    • When all records in an input page have been merged read in the second page of that run

Na ve external merge sort1
Naïve External Merge Sort …

  • After the first merge pass the file consists of B/2 sorted runs each of two pages

  • Read in the first page of each of the first two sorted runs

  • Leaving a third page free as an output buffer


pages contain three records

sorted runs of size 2

main memory

Memory usage
Memory Usage

  • Records from the input pages are merged into the output buffer

  • Once the output buffer is full it's contents are written out to disk, to form the first page of the first sorted run of length 4


main memory

Memory usage1
Memory Usage

  • At this point all of the records from one of the input pages have been dealt with

  • The next page of that sorted run is read into the input page

  • And the process continues


main memory

Cost of na ve merge sort
Cost of Naïve Merge Sort

  • Assume that B = 2k, after the first merge pass there are 2k-1 sorted runs, each two pages long

    • After the second pass there are 2k-2 sorted runs, of length 4

    • After the kth pass there is one sorted run of length B

  • The number of passes is therefore log2B+1

    • log2B is the number of merge passes required

    • The +1 is for the initial pass to sort each page

  • Each pass requires that each page be read and written back for a total cost of 2B(log2 B +1)

    • Note that only 3 frames of main memory are used!

First stage improvement
First Stage Improvement

  • In the first stage of the process each page is read into main memory, sorted and written out

    • The pages are then ready to be merged

  • Instead of sorting each page independently, read M pages into main memory, and sort them together

    • M is the main memory frames available for the sort

    • After the first pass there will be B/M sorted runs, each of length M

    • This will reduce the number of subsequent merge passes

Merge pass improvement
Merge Pass Improvement

  • In the merge passes perform an M-1 way merge rather than a 2 way merge, using

    • M-1 input pages, one for each of M-1 sorted runs and

    • 1 page for an output buffer

  • The first item in each of the M-1 input partitions is compared to each other to determine the smallest

    • Resulting in less merge passes, and less disk I/O

  • Each merge pass merges M-1 runs

    • After the first pass the runs are size (M-1)*M

      • Since M is the size of the runs from the first stage

Cost of external merge sort
Cost of External Merge Sort

  • The initial pass produces B/M sorted runs of size M

  • Each merge pass reduces the number of runs by a factor of M-1

    • The number of merge passes is logM-1B/M 

  • Each pass requires that the entire file is read and written

    • Total cost is therefore 2B (logM-1B/M + 1)

  • As M is typically relatively large this reduction is considerable

Number of passes
Number of Passes

  • B = 1,000,000

Even a large file can usually be sorted in two passes (a cost of 4B I/Os to sort and write out)

Generating longer initial runs
Generating Longer Initial Runs

  • In the first stage B/M sorted runs of size M are produced

    • The size of these preliminary runs determines how many merge passes are required

  • The size of the initial runs can be increased by using replacement sort

    • On average the size of the initial runs can be increased to 2*M

    • Using replacement sort increases complexity

Replacement sort
Replacement Sort

  • M-2 pages are used to sort the file

    • Called the current set

  • One page is used for input, and

  • One page is used for output

  • The current set is filled

    • … and sorted


current set



main memory

Replacement sort1
Replacement Sort

  • The next page of the file is read in to the input buffer

  • The smallest record from the current set (and the input buffer) is placed in the output buffer

  • The first location of the current set is now available for the first record from the input buffer


current set



main memory

Replacement sort2
Replacement Sort

  • The process continues until the output buffer is full and

  • The contents of the input buffer are in the current set

  • At this point another input page can be read into the input buffer

  • And the output buffer can be written to disk as the first page of the first run


current set



main memory

Replacement sort3
Replacement Sort

  • The current set can be periodically re-sorted

  • At some point a page is read that contains records with values less than the values in records in the output buffer

  • The rest of the current set must now be written out to complete the first run

  • The process then begins again


current set



main memory

Revisiting i o costs
Revisiting I/O Costs

  • The cost metric used is simplistic

  • In practice it my be advantageous to make the input and output buffers larger than one page

    • This reduces the number of runs that can be merged at one time, so may increase the number of passes required

    • But, it allows a sequence of pages to be read or written to the buffers, decreasing the actual access time per page

  • We have also ignored CPU costs

    • If double buffering is used, the CPU can process one part of a run while the next is being loaded into main memory

Note b trees and sorting
Note: B+ Trees and Sorting

  • Primary B+ tree index

    • The index can be used to find the first page, but

    • Note that the file is already sorted!

  • Secondary B+ tree index

    • Leaves point to data records that are not in sort order

      • In the worst case, each data entry could point to a different page from its adjacent entries

    • Retrieving the records in order requires reading all of the index leaf pages, plus one disk read for each record!

    • In practice external sort is likely to be much more efficient than using a secondary index

Projections and sorting
Projections and Sorting

  • Projection with duplicate removal can be performed in a number of steps

    • Scan the table, remove unwanted attributes, and write it back

      • cost = 2B disk I/Os

    • Sort the result, using all of its attributes as a compound sort key

      • cost = 4B, possibly more if the file is very large

    • Scan the result, removing the adjacent duplicates as they are encountered; cost = 2B

      • The result of this last stage may not have to be written out; it may be the last operation or may be pipelined into another operations

  • The total cost is 8B disk I/Os, but this basic process can be improved by combining these steps

Improved sort projection
Improved Sort Projection

  • The initial scan is performed as follows

    • Read M pages and remove unwanted attributes

    • Sort the records, and remove any duplicates

    • Write the sorted run to disk

    • Repeat for the rest of the file, for a total cost of 2B

  • Perform merge passes as required on the output from the first stage

    • Remove any duplicates as they are encountered

    • If only one merge pass is required the cost is ≈ 1B

  • For a total cost of 3B

Projections and hashing
Projections and Hashing

  • Duplicates can also be identified by using hashing

  • Duplicate removal by hashing has two stages

    • Partitioning and probing

  • In the partitioning stage

    • Partition into M-1 partitions using a hash function, h

      • There is an output buffer for each partition, and

      • One input buffer

    • The file is read into main memory one page at a time, with each record being hashed to the appropriate buffer

  • Duplicates are eliminated in the next stage

Projections and hashing1
Projections and Hashing …

  • The duplicate elimination stage uses a second hash function h2 (h2  h) to reduce main memory costs

    • An in-memory hash table is built using h2

    • If two records hash to the same location they are checked to see if they are duplicates

  • Assuming that each partition produced in the partitioning stage can fit in main memory the cost is

    • Partitioning stage: 2B

    • Duplicate elimination stage: B, for a total cost of 3B

  • This is the same cost as projection using sorting

Sort hash projection comparison
Sort, Hash Projection Comparison

  • Sort and hash projection have the same cost (3B)

  • If M > (B) sorting and sort projection can be performed in two passes

    • The first pass produces B/M sorted runs

    • If there are less than M-1 of them only one merge pass is required

  • With hash projection, if just one partition is greater than M-1, further partitioning is required

    • Regardless of the overall size of the file


Customer Account




  • A join is defined as a Cartesian product followed by a selection

    • Joins are used more often than Cartesian products

  • Cartesian products typically result in much larger tables than joins

    • It is important to be able to efficiently implement joins

C.sin = A.sin(Customer  Account)

Cost of joins
Cost of Joins

  • The naïve method of producing a join is to perform a Cartesian product then a selection

    • This is very inefficient if the intermediate relation is large

  • For example, consider two relations R and S

    • R has 10,000 records, and S has 2,000

    • S has a foreign key in R

      • Implying that a record in S relates to at most one record in R

  • The sizes of the join and the Cartesian product are

    • Natural join – 2,000 (if each s in S relates to an r in R)

    • Cartesian product – 20,000,000 records

Join methods
Join Methods

  • Nested loop joins

    • These algorithms basically compute the Cartesian product and discard records that do not meet the join condition

  • Index nested loop join

    • Uses an index on one of the tables

  • Sort-merge join

    • Sorts both tables on the join attribute and merges results

  • Hash join

    • Similar to the sort-merge join but uses hashing

    • We will also consider the hybrid hash join algorithm

Tuple nested loop join
Tuple Nested Loop Join

  • This simple algorithm compares each record in the outer table with each record in the inner relation

    • Start with one record, r, in relation R

    • Compare r to all the records, s, in relation S and see if the join condition matches

      • If so concatenate the records and add to the result

    • Repeat with all r in R

  • for each record r  R

  • for each record s  S

    • if ri = sj then

    • add r,s to result

R R.i=S.jS

Basic nested loop join
Basic Nested Loop Join

output buffer


main memory frames

scan all of S for each record in R

S (disk)

R (disk)

Nested loop join cost
Nested Loop Join Cost

  • The inner relation (S) has to be scanned once for each record in the outer relation (R)

    • For a cost of B(R) + (T(R) * B(S) )

    • The results are in the same order as R

  • A simple improvement is to process one blockof R at a time, instead of one record at a time

    • This reduces the cost to B(R) + (B(R) * B(S) )

    • If B(R) and B(S) are roughly the same size this is O(n2)

      • Where n = B(R)

    • Note that the result is no longer strictly ordered on R

  • The smaller relation should be the outer relation

Improved nested loop join
Improved Nested Loop Join

output buffer


main memory frames

scan all of S for each page in R

S (disk)

R (disk)

Block nested loop join
Block Nested Loop Join

  • The nested loop join algorithm does not make effective use of main memory

    • Only one page of the outer relation is processed at a time

  • The algorithm can be significantly improved by reading as much of the outer relation as possible

    • Use M – 2 pages as an input buffer for the outer relation,

    • 1 page as an input buffer for the inner relation, and

    • 1 page as an output buffer

    • If the smaller relation fits in M – 2 pages the cost is B(R) + B(S)!

      • CPU costs can be reduced by building an in-memory hash table on the inner relation

Block nested loop join1
Block Nested Loop Join …

  • What if the smaller relation is larger than M-2?

    • Break R, the outer relation, into blocks of M – 2 pages

      • Refer (somewhat flippantly) to these blocks as clumps

    • Scan S once for each clump of R

      • Insert concatenated records r, s that match the join condition into the output buffer

      • S is read B(R)/(M-2) times

    • The total cost is B(R) + (B(R)/(M-2) * B(S))

  • Disk seek time can be reduced by increasing the size of the input buffer for S

    • Which may increase the number of times that S is scanned

Improved nested loop join1
Improved Nested Loop Join

output buffer


main memory frames

scan all of S for each M-2 pages in R

M-2 pages

S (disk)

R (disk)

Index nested loop join
Index Nested Loop Join

  • If there is an index on the join attribute

    • Make the indexed relation the inner relation (S)

    • Read the outer relation, retrieving matching records of S using the index

      • Does not first calculate the Cartesian product as it only retrieves records that satisfy the join condition

  • The cost is dependent on the kind of index and the number of matching records

    • An index lookup is required for each record in R

      • B+ tree index: 2 – 4, hash index: 1 – 2

    • If multiple S records match to a single R record the total retrieval cost depends on whether the index is primary

Index nested loop join1
Index Nested Loop Join

output buffer


main memory frames

read page with matching s

look up index using join attribute of r

S (disk)

R (disk)

Sort merge join
Sort-Merge Join

  • Assume that both tables to be joined are sorted on the join attribute

    • The tables may be joined with one pass, in a similar way to merging two sorted runs

    • At a cost of B(R) + B(S)

  • If the tables are not sorted on the join attribute they may be sorted using multi-way Mergesort

    • The sort can typically be performed in two passes

      • At a cost of 4 B(R) + 4 B(S), including writing out the results

    • Joining the tables has an additional cost ofB(R) + B(S)

Sort merge join process
Sort-Merge Join Process

  • Read in pages of R and S

    • Assume that the join attribute is x

  • While xris different from xs

    • If xr is less than xs move to the next record from R, else

    • Move to the next record from S

  • If xrequals xs

    • Create the joined record by concatenating r and s, and

    • Add the resulting record to the output buffer

  • Repeat the process until all records have been read

Improved sort merge join
Improved Sort-Merge Join

  • The merge phase of the external sort algorithm can be combined with the join

    • Produce sorted runs (of size M) of R and S

    • Merge R, and S, then compare the results, discarding records that do not meet the join condition requires

      • A buffer for each sorted run of R

      • A buffer for each sorted run of S,

      • One buffer for merged runs of R,

      • One buffer for the merged runs of S, and

      • One buffer for the concatenated results

  • The improvement requires more main memory as there must be buffers for each run of both R and S

Sort merge join1
Sort-Merge Join

Read in sorted runs of R andS

Merge R

and merge S

Merge and concatenate records

Memory requirements
Memory Requirements

  • Sort-merge join can be performed in two passes

    • For a cost of 3(B(R) + B(S))

  • This assumes that main memory is large enough to allow an input buffer for each run of bothR and S

    • The initial pass produces sorted runs of size M

    • Main memory must be greater than (B(R) + B(S)) to perform the join in two passes

      • The first pass produces B(R) /M runs of R and B(S) /M runs of S

      • If M is not greater than (B(R) + B(S) ) then (B(R) /M + B(S) /M ) must be greater than M

Sort merge join and indexes
Sort-Merge Join and Indexes

  • The sort-merge join can take advantage of a primary index

    • The relation with the index is already sorted so no initial sort pass is required

  • If both relations have a primary index on the join attribute a zig-zag join can be performed

    • Scan the leaves of the two B+ trees in order from the left

    • When the search key value of one index is higher, scan the other index

    • Only when both indexes contain the same search key values are the records retrieved and concatenated

Hash join
Hash Join

  • The hash join algorithm has two phases

    • Partitioning, and

    • Probing

  • Partitioning

    • Both relations are partitioned using the same hash function, h, on the join attribute

      • Records in one partition of R can only match to records in the same partition of S

    • One input buffer and M- 1 output buffer pages are used to make M - 1 partitions for each relation

    • If both of the largest partitions of both relations do not fit in main memory, the relations must be further partitioned

Hash join continued
Hash Join continued

  • Probing

    • Read in one partition of R

    • To reduce CPU costs, build an in memory hash table using hash function h2 (h2  h)

    • Read the corresponding partition of S into an input buffer and join matching records using the hash table

    • Repeat for each partition of R

  • Cost

    • If each partition of R fits in main memory the overall cost is 3(B(R) + B(S))

Memory requirements1
Memory Requirements

  • Hash join is performed in two phases

    • At a cost of 3(B(R) + B(S))

  • Main memory must be big enough to hold the largest partition of the smaller relation, R

    • Which requires that M - 2 is  (B(R))

      • A buffer for S and for output are needed

      • Partitioning produces B(R) - 1 partitions, of average size M /(B(R) - 1)

    • If M < (B(R)) then B(R)/ M must be larger than M, and the partitions are larger than main memory

Hybrid hash join
Hybrid Hash Join

  • If main memory is large enough, retain an entire partition of R during the partitioning phase

    • This eliminates the need to write out the partition, and read it back in during the probing phase

    • In addition, the matching S records can be joined (and written out to the result) when S is partitioned

    • Hence the records (of both R and S) belonging to that partition are only read once

  • This approach can be generalized to retain more than one partition where possible

Hybrid hash join and block nested loop join
Hybrid Hash Join and Block Nested Loop Join

  • If the smaller relation fits in main memory the costs are identical

    • The smaller relation is read once

    • The larger relation is scanned once to join the records

  • If the relations are large hash join is more efficient

    • Block nested loop join must read S once for each clump of R

    • Hash join matches partitions of R to partitions of S during the probing phase

    • The records of both R and Sbelonging to a particular partition are only read once, after the partitioning phase

Hash join and sort merge join
Hash Join and Sort-Merge Join

  • A skewed distribution of data may result in the relations being partitioned more than once

    • Because the partitions may vary considerably in size

    • Sort-merge join is not sensitive to data skew

  • If M falls between (B(R))and (B(S)) hash join requires less passes

    • Hash join only requires enough memory to hold partitions of one relation

    • The memory requirements of sort-merge join depend on the size of both relations

Join method ordering
Join Method Ordering

  • Simple nested loop join (read S for each record)

    • Retains the original order of R

  • Index nested loop join

    • Retains the original order of R

  • Sort-Merge join

    • Ordered by the join attribute

  • All other join methods

    • No order

General join conditions
General Join Conditions

  • The join process is more complex if the join condition is not simple equality on one attribute

  • For equalities over several attributes

    • Sort-merge and hash join must sort (or hash) over all of the attributes in the selection

    • An index that matches one of the equalities may be used for the index nested loop join

  • For inequalities (, , etc.)

    • Hash indexes cannot be used for index nested loop joins

    • Sort-merge and hash joins are not possible

    • Other join algorithms are unaffected

Set operations
Set Operations

SELECT fName, lName

FROM Patient


SELECT fName, lName

FROM Doctor

  • For set operations, unlike other SQL operations, duplicates are removed by default

    • Unless ALL is specified

    • Therefore most set operations entail sorting the relations

fName,lName(Patient)  fName,lName(Doctor)

Intersection and cartesian product
Intersection and Cartesian Product

  • Intersection R S

    • A join where the condition is equality on all attributes

    • An in-memory search structure should be used

  • Cartesian product R S

    • A special case of join where there is no join condition

    • All records are joined to each other

fName,lName(Patient)  fName,lName(Doctor)

is equivalent to

fName,lName(Patient) fName,lName(Doctor)


  • Union using sorting

    • Sort R and S using all fields

    • Scan and merge the results while removing duplicates

  • Union using hashing

    • Partition R and S using a hash function h

    • For each partition of S (if S is the smaller relation)

      • Build an in-memory hash table (using h2)

      • Scan the corresponding partition of R

      • For each record, r, probe the hash table, if a duplicate record already exists discard it, otherwise add it to the table

      • Write out the hash table and process the next partition

Set difference
Set Difference

  • Set Difference is implemented in a similar way to union

  • In sorting, in the merge phase check that records of R are not in S before adding them to the result

  • In hashing, probe the hash table (for S), if a record of R is not in the table then add it to the result

  • If the query contains ALL a count of duplicates should be maintained

    • Instead of storing duplicate records



FROM Patient

  • The basic method for calculating an aggregation is to scan the entire table

  • While keeping track of running information

    • SUM – totalof the retrieved values

    • AVG – total, and count of the retrieved values

    • COUNT – count of the retrieved values

    • MIN – smallest of the retrieved values

    • MAX – largest of the retrieved values

Grouped aggregations
Grouped Aggregations

SELECT AVG(income)

FROM Doctor

GROUP BY speciality

  • An aggregation can include a GROUP BY clause

    • In which case the aggregation has to be computed for each group

  • Aggregations with groupings can be computed in a number of ways

    • By sorting or hashing, or

    • By using an index

Sorting and groups
Sorting and Groups

  • In the sorting based approach to the table is sorted on the grouping attribute

  • The results of the sort are scanned and the result of the aggregate operation computed

    • These two processes can be combined in a similar way to the sort based projection and join algorithms

  • The cost is essentially the cost of the sorting operations

    • Probably 3B(R)

    • Although the final result will be considerably smaller than the sorted table

Hashing and groups
Hashing and Groups

  • In the hash based approach an in-memory hash table is build on the grouping attribute

    • Hash table entries consist of

      • grouping-value, running-information

  • The table is scanned, and for each record

    • Probe the hash table to find the entry for the group that the record belongs to, and

    • Update the running information for that group

  • Once the table has been scanned the grouped results are computed using the hash table entries

    • If the hash table fits in main memory the cost is B(R)

Aggregations and indexes
Aggregations and Indexes

  • It may be possible to satisfy an aggregate query using just the data entries of an index

    • The search key must include all of the attributes required for the query

      • The data entries may be sorted or hashed, and

      • No access to the records is required

    • If the GROUP BY clause is a prefix of a tree index, the data entries can be retrieved in the grouping order

      • The actual records may also be retrieved in this order

  • This is an example of an index-only plan


Memory Requirements

One pass and simple algorithms
One-Pass and Simple Algorithms

* = duplicate removal,  = grouping

Hash based algorithms
Hash-Based Algorithms

Assume B(S)  B(R), and that B(S)  M