Query Optimization
CMPT 454
Query Optimization
  • Introduction
  • Query Execution
  • Unary Operations
  • External Sorting
  • Binary Operations
Query Compilation Steps
  • Parsing
    • Construct a parse tree for a query
  • Generate equivalent logical query plans
    • Convert the parse tree to a query plan in relational algebra
    • Transform the plan into more efficient equivalents
  • Generate a physical plan
    • Select algorithms for each of the operators in the query
    • Including details about how tables are to be accessed or sorted
Query Optimization
  • Generating equivalent logical plans and their physical plans is known as query optimization
  • Selecting a good query plan entails deciding
    • Which equivalent plan leads to the most efficient query
    • Which algorithm should be used to implement an operation
    • How data from one operation should be passed to the next
  • These choices depend on database metadata
    • Size of relations
    • Number and frequency of attributes
    • Indexing and data file organization
Query Processing

[Diagram: query → Query Compilation (generating, improving and estimating query plans) → Query Execution (algorithms to implement physical plan operators) → result]

Introduction
  • A physical query plan is made up of operators
    • Each implementing one step of the plan and
    • Corresponding to a relational algebra operation
  • Some relational algebra operations may require lower level physical operations
    • Such as scanning a table
Sorting and Scanning
  • It is sometimes necessary or useful to sort data as it is scanned
    • To satisfy a query with an ORDER BY clause
    • Or because an algorithm requires sorted input
  • There are a number of ways in which a sort scan can be performed
    • Main memory sorting
    • B+ tree index
    • Multi-way mergesort
Statistics
  • There are M main memory frames available for an operation
    • M may or may not be the entire main memory
  • Information relating to a relation (table) R
    • B(R) or B – number of blocks of R
    • T(R) or T – number of tuples (i.e. records) of R
    • V(R, a) – number of distinct values for attribute a in R
  • Assume that data is accessed one block at a time from a disk
    • This assumption will be discussed later
Computation Model
  • Cost metric
    • The number of disk I/Os required for an operation
  • Assume that
    • Input to an operation is to be read from disk
    • The result is left in main memory (does not include write cost)
  • Note that the size of a result should not influence the choice of algorithm for a particular operation
    • Although it may influence the order of operations
I/O Cost for Scans
  • The cost to scan R depends on its organization
    • If R is organized into a single file the cost of a scan is B
    • And if B < M the cost of a sort scan is B
    • If R's records are interleaved with records from other tables in the file, the cost of a scan may be higher
  • The use of an index does not reduce the cost to scan R
    • But may reduce the cost to retrieve a part of R
Introduction
  • In this section the costs of individual relational algebra operations are considered
    • There are often a number of algorithms that can be used to implement the same operation
  • Each operation is considered in isolation
    • How the individual operations interact is discussed in a later section
    • Note that the cost of a later operation may be affected by the result of an earlier operation, in terms of
      • The order of the records in the result
      • The algorithm used to achieve that result
Selection

SELECT *

FROM Customer

WHERE age > 50

  • Only records that match the selection are included in the result
    • The size of the result depends on the number of records that match the selection
  • Complex selections that contain more than a single term will be considered later

σage > 50(Customer)

Selections with No Index
  • If the file is not sorted on the selection attribute
    • Perform a file scan, cost is B reads
    • If the selection attribute is a superkey the scan can be terminated once a match has been found (expected cost is B/2)
  • The file is sorted on the selection attribute
    • Perform a binary search to find the first record that matches the selection criteria
    • Then read all records that match the selection criteria
      • Which may entail reading additional pages
    • log₂(B) + the number of pages of matching records - 1
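
These costs are easy to evaluate directly from B and the catalog statistics. A minimal sketch in Python (the relation sizes below are made-up illustrations, not from the slides):

```python
import math

def unsorted_scan_cost(B, superkey=False):
    """Selection by file scan; on a superkey we expect to stop halfway."""
    return B / 2 if superkey else B

def sorted_file_cost(B, matching_pages):
    """Binary search to the first match, then the remaining matching pages."""
    return math.ceil(math.log2(B)) + matching_pages - 1

print(unsorted_scan_cost(1000))         # 1000
print(unsorted_scan_cost(1000, True))   # 500.0
print(sorted_file_cost(1024, 4))        # 10 + 4 - 1 = 13
```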
Selections with B+ Tree Index
  • Find the leaf page of the first index entry that points to a matching record
    • The number of disk reads is equal to the height of the tree
    • The cost to find a leaf page in a B+ tree index is usually in the order of 2 or 3 disk reads
  • Then retrieve the matching records
    • This cost is determined by the number of leaf pages containing matching entries, and
    • The number of records to be retrieved, and
    • Whether the index is primary or secondary
Primary B+ Tree Index
  • In a primary index the file is sorted on the search key
  • If the selection is small it may fit on one page
    • An additional disk read to access this page is required
  • If the selection is large, additional pages may have to be read
    • The selection may cover a large range of records, or
    • May be an equality selection on an attribute with few distinct values
    • The exact number of pages depends on the number of records that match the selection
Secondary B+ Tree Index
  • In a secondary index the sort order of the file (if any) is different from the search key of the index
    • The leaves of the tree form a dense index on the file
  • Adjacent matching data entries are unlikely to point to records on the same page
    • A disk read is required for each matching record
  • For large selections, a secondary index may be less efficient than scanning the entire table!
    • In practice it is important to recognize this, by estimating the number of records returned by a selection
Secondary B+ Tree Range Selection

SELECT *

FROM Patient

WHERE lName = 'Zoobat'

lName = "Zoobat"(Patient)

[Diagram: four index entries for patients named Zoobat point to four different pages of the file]

Selections with Hash Index
  • A hash index can be used to efficiently retrieve equality selections
    • The cost to retrieve the bucket containing the data entries is typically 1 or 2 disk reads
    • The cost to retrieve the corresponding records depends on whether the index is primary or secondary
      • If the index is primary, matching records are likely to reside on the same, single, file page whereas
      • If the index is secondary, matching records are likely to reside on different pages of the file
  • Hash indexes cannot be used for range selections
Complex Selections
  • Selections may be made up of a number of terms connected by and (∧) and or (∨)
    • These terms may reference different attributes
    • It is useful to know whether or not an index can be used to evaluate a selection
  • To determine this the selection is first expressed in conjunctive normal form (CNF)
    • A collection of conjuncts (terms joined by and)
    • Each conjunct consists either of a single term, or multiple terms joined by or (i.e. disjunctions)
      • e.g. (A  B)  C  D  (A  C  D)  (B  C  D)
Selections with no Disjunctions
  • Hash indexes can be used if there is an equality condition for every attribute in the search key
    • e.g. a single hash index on {city, street, number}
      • city="London"street="Baker"number=221(Detective)matches
      • city="Los Angeles"street="Cahuenga"(Detective)does not
  • Tree indexes can be used if there is a selection on each of the first n attributes of the search key
    • e.g. B+ index on {city, street, number}
      • Bothcity="London"street="Baker"number=221(Detective) and
      • city="Los Angeles"street="Cahuenga"(Detective) match
Selections with no Disjunctions…
  • If an index matches a subset of the conjuncts
    • Use the index to return a result that contains some unwanted records
    • Scan this result for matches to the other conjuncts
      • city="London"street="Baker"number=221fName = "Sherlock" (Detective)
  • If more than one index matches a conjunct
    • Either use the most selective index, then scan the result, discarding records that fail to match to the other criteria
    • Or use all indexes and retrieve the rids
    • Take the intersection of the rids and retrieve those records
Selections with no Disjunctions…
  • Consider the relation and selection shown below
    • Detective = {id, fName, lName, age, city, street, number, author}
    • city="New York"author="Spillane"lName="Hammer"(Detective)
  • With indexes
    • Secondary hash index, {city, street, number}
    • Secondary B+ tree index, {lName, fName}
    • Secondary hash index, {author}
  • There are two strategies:
    • Use the most selective of the two matching indexes, and search the results for the remaining criteria
    • Use both indexes, and take the intersection of the rids

[Diagram annotations: the {city, street, number} hash index can't be used; the {lName, fName} B+ tree index and the {author} hash index can be used. What if the B+ tree index is primary?]

Selections with Disjunctions
  • If necessary, restate the selection in CNF
  • If a conjunct includes a disjunction with no index on its attribute, no index can be used for that conjunct
    • If all the conjuncts contain such a disjunction a file scan is necessary to satisfy the selection
  • Otherwise, use an index that matches one of the conjuncts and retrieve those records
    • The remaining selection criteria can then be applied to the results from using that index
Selections with Disjunctions …
  • Consider a selection of this form
    • (a  b  c)  (d  e  f)(R)
    • Where each of a to f is an equality selection on an attribute
  • If each of the terms in either of the conjuncts has a matching index
    • Use the indexes to find the rids
    • Take the union of the rids and retrieve those records
  • If there are indexes on the attributes of a, b, c, and e
    • Use the a, b, and c indexes and take the union of the rids
    • Retrieve the resulting records and apply the other criteria
Selections with Disjunctions …
  • Consider the selections shown below
    • (author="King"  age>35)(lName="Tam"  id=11)(Detective)
    • (author="King")  (lName="Tam"  id=11)(Detective)
  • Indexes on the relation
    • Secondary B+ tree index, {lName, fName}
    • Secondary hash index, {author}
  • Compare the two selections
    • In the first selection each conjunct contains a disjunction without an index (age, id) so a file scan is required
    • In the second selection the index on author can be used, and records that don't meet the other criteria removed
Projections

SELECT fName, lName

FROM Customer

  • Only columns in the column list are retained
    • Reducing the size of the result relation
  • There are two steps to implementing projections
    • Remove the unwanted columns
    • Eliminate any duplicate records, but only if DISTINCT has been specified
      • Duplicates can be removed either by sorting or hashing

πfName,lName(Customer)

External Sorting

A Digression

External Sorting
  • Two-way external merge sort
  • External merge sort
  • Replacement sort
  • External merge sort notes
External Sorting Introduction
  • It is often necessary to sort data in a database
    • To sort the results of a query
    • To make some other operations more efficient
      • Bulk loading a tree index
      • Eliminating duplicate records
      • Joining relations
  • The focus of a DB sorting algorithm is on reducing the number of disk reads and writes
Internal vs. External Sorting
  • Sorting a collection of records that fit within main memory can be performed efficiently
    • There are a number of sorting algorithms that run in O(n log₂ n) time
      • That is, with O(n log₂ n) comparisons, e.g. Mergesort, Quicksort
  • Many DB tables are too large to fit into main memory at one time
    • So cannot be read into main memory and sorted
    • The focus in external sorting is to reduce the number of disk I/Os
Merge Sort – a Brief Reminder
  • Consider the Mergesort algorithm
    • Input sub-arrays are repeatedly halved
    • Until they contain only one element
  • Sub-arrays are then merged into sorted sub-arrays by repeated merge operations
    • merging two sorted sub-arrays can be performed in O(n)

mergesort(arr, start, end)

if(start < end)

mid = (start + end) / 2

mergesort(arr, start, mid)

mergesort(arr, mid+1, end)

merge(arr, start, mid, mid+1, end)
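
For reference, a runnable Python version of this pseudocode, with the merge step (left abstract on the slide) filled in as a standard two-way merge:

```python
def merge(arr, start, mid, end):
    """Merge the sorted sub-arrays arr[start..mid] and arr[mid+1..end]."""
    left, right = arr[start:mid + 1], arr[mid + 1:end + 1]
    i = j = 0
    for k in range(start, end + 1):
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            arr[k] = left[i]
            i += 1
        else:
            arr[k] = right[j]
            j += 1

def mergesort(arr, start, end):
    if start < end:
        mid = (start + end) // 2      # parenthesised, unlike the raw pseudocode
        mergesort(arr, start, mid)
        mergesort(arr, mid + 1, end)
        merge(arr, start, mid, end)

data = [5, 2, 4, 7, 1, 3]
mergesort(data, 0, len(data) - 1)
print(data)   # [1, 2, 3, 4, 5, 7]
```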

Naïve External Merge Sort
  • Read the first two pages (of B pages) of the file
    • Sort their contents (independently of each other)
  • Merge the two pages
    • Write them out to a sorted run of size 2
  • Repeat for the rest of the file
    • Producing B/2 sorted runs of size 2
  • Merge the first two sorted runs of size 2
    • Read in the first page of the first two runs into input pages
    • Merge to a single output page, and write it out when full
    • When all records in an input page have been merged read in the second page of that run
Naïve External Merge Sort …
  • After the first merge pass the file consists of B/2 sorted runs each of two pages
  • Read in the first page of each of the first two sorted runs
  • Leaving a third page free as an output buffer

[Diagram: sorted runs of size 2 on disk; main memory holds two input pages and one output page; pages contain three records]

Memory Usage
  • Records from the input pages are merged into the output buffer
  • Once the output buffer is full its contents are written out to disk, to form the first page of the first sorted run of length 4


Memory Usage
  • At this point all of the records from one of the input pages have been dealt with
  • The next page of that sorted run is read into the input page
  • And the process continues


Cost of Naïve Merge Sort
  • Assume that B = 2^k; after the first merge pass there are 2^(k-1) sorted runs, each two pages long
    • After the second pass there are 2^(k-2) sorted runs, of length 4
    • After the kth pass there is one sorted run of length B
  • The number of passes is therefore log₂B + 1
    • log₂B is the number of merge passes required
    • The +1 is for the initial pass to sort each page
  • Each pass requires that each page be read and written back, for a total cost of 2B(log₂B + 1)
    • Note that only 3 frames of main memory are used!
First Stage Improvement
  • In the first stage of the process each page is read into main memory, sorted and written out
    • The pages are then ready to be merged
  • Instead of sorting each page independently, read M pages into main memory, and sort them together
    • M is the main memory frames available for the sort
    • After the first pass there will be B/M sorted runs, each of length M
    • This will reduce the number of subsequent merge passes
Merge Pass Improvement
  • In the merge passes perform an M-1 way merge rather than a 2 way merge, using
    • M-1 input pages, one for each of M-1 sorted runs and
    • 1 page for an output buffer
  • The first item in each of the M-1 input partitions is compared to each other to determine the smallest
    • Resulting in fewer merge passes, and less disk I/O
  • Each merge pass merges M-1 runs
    • After the first pass the runs are size (M-1)*M
      • Since M is the size of the runs from the first stage
Cost of External Merge Sort
  • The initial pass produces B/M sorted runs of size M
  • Each merge pass reduces the number of runs by a factor of M-1
    • The number of merge passes is logM-1B/M 
  • Each pass requires that the entire file is read and written
    • Total cost is therefore 2B(⌈log_(M-1)(B/M)⌉ + 1)
  • As M is typically relatively large this reduction is considerable
Number of Passes
  • B = 1,000,000

Even a large file can usually be sorted in two passes (a cost of 4B I/Os to sort and write out)
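
The slide's table of pass counts did not survive the transcript, but the numbers follow directly from the formula above. A sketch that recomputes them (the M values are illustrative buffer counts, not from the slides):

```python
import math

def sort_passes(B, M):
    """Initial pass producing ceil(B/M) runs, then (M-1)-way merge passes."""
    runs = math.ceil(B / M)
    passes = 1
    while runs > 1:
        runs = math.ceil(runs / (M - 1))
        passes += 1
    return passes

B = 1_000_000
for M in (3, 5, 17, 129, 257, 1025):
    print(f"M = {M:5d}: {sort_passes(B, M)} passes")
```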

Generating Longer Initial Runs
  • In the first stage B/M sorted runs of size M are produced
    • The size of these preliminary runs determines how many merge passes are required
  • The size of the initial runs can be increased by using replacement sort
    • On average the size of the initial runs can be increased to 2*M
    • Using replacement sort increases complexity
Replacement Sort
  • M-2 pages are used to sort the file
    • Called the current set
  • One page is used for input, and
  • One page is used for output
  • The current set is filled
    • … and sorted

[Diagram: file on disk; main memory holds the current set (M-2 pages), one input page, and one output page]

Replacement Sort
  • The next page of the file is read in to the input buffer
  • The smallest record from the current set (and the input buffer) is placed in the output buffer
  • The first location of the current set is now available for the first record from the input buffer


Replacement Sort
  • The process continues until the output buffer is full and
  • The contents of the input buffer are in the current set
  • At this point another input page can be read into the input buffer
  • And the output buffer can be written to disk as the first page of the first run


Replacement Sort
  • The current set can be periodically re-sorted
  • At some point a page is read that contains records with values less than the values in records in the output buffer
  • The rest of the current set must now be written out to complete the first run
  • The process then begins again
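
A compact sketch of the idea, using a min-heap for the current set (the slides re-sort the set periodically instead; here "frozen" records are those too small to join the run being written):

```python
import heapq
from itertools import islice

def replacement_selection(records, set_size):
    """Yield sorted runs; on random input, runs average about 2 * set_size."""
    it = iter(records)
    current = list(islice(it, set_size))   # fill the current set
    heapq.heapify(current)
    run, frozen = [], []                   # frozen: records for the next run
    while current:
        smallest = heapq.heappop(current)
        run.append(smallest)               # goes to the output buffer
        nxt = next(it, None)
        if nxt is not None:
            if nxt < smallest:             # can't join this run any more
                frozen.append(nxt)
            else:
                heapq.heappush(current, nxt)
        if not current:                    # current set exhausted: close the run
            yield run
            run, current, frozen = [], frozen, []
            heapq.heapify(current)
    if run:
        yield run

runs = list(replacement_selection([5, 1, 8, 2, 9, 3, 7, 4, 6], set_size=3))
print(runs)   # [[1, 2, 5, 8, 9], [3, 4, 6, 7]]
```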


Revisiting I/O Costs
  • The cost metric used is simplistic
  • In practice it may be advantageous to make the input and output buffers larger than one page
    • This reduces the number of runs that can be merged at one time, so may increase the number of passes required
    • But, it allows a sequence of pages to be read or written to the buffers, decreasing the actual access time per page
  • We have also ignored CPU costs
    • If double buffering is used, the CPU can process one part of a run while the next is being loaded into main memory
Note: B+ Trees and Sorting
  • Primary B+ tree index
    • The index can be used to find the first page, but
    • Note that the file is already sorted!
  • Secondary B+ tree index
    • Leaves point to data records that are not in sort order
      • In the worst case, each data entry could point to a different page from its adjacent entries
    • Retrieving the records in order requires reading all of the index leaf pages, plus one disk read for each record!
    • In practice external sort is likely to be much more efficient than using a secondary index
Projections and Sorting
  • Projection with duplicate removal can be performed in a number of steps
    • Scan the table, remove unwanted attributes, and write it back
      • cost = 2B disk I/Os
    • Sort the result, using all of its attributes as a compound sort key
      • cost = 4B, possibly more if the file is very large
    • Scan the result, removing the adjacent duplicates as they are encountered; cost = 2B
      • The result of this last stage may not have to be written out; it may be the last operation or may be pipelined into another operation
  • The total cost is 8B disk I/Os, but this basic process can be improved by combining these steps
Improved Sort Projection
  • The initial scan is performed as follows
    • Read M pages and remove unwanted attributes
    • Sort the records, and remove any duplicates
    • Write the sorted run to disk
    • Repeat for the rest of the file, for a total cost of 2B
  • Perform merge passes as required on the output from the first stage
    • Remove any duplicates as they are encountered
    • If only one merge pass is required the cost is ≈ 1B
  • For a total cost of 3B
Projections and Hashing
  • Duplicates can also be identified by using hashing
  • Duplicate removal by hashing has two stages
    • Partitioning and probing
  • In the partitioning stage
    • Partition into M-1 partitions using a hash function, h
      • There is an output buffer for each partition, and
      • One input buffer
    • The file is read into main memory one page at a time, with each record being hashed to the appropriate buffer
  • Duplicates are eliminated in the next stage
Projections and Hashing …
  • The duplicate elimination stage uses a second hash function h2 (h2 ≠ h) to reduce main memory costs
    • An in-memory hash table is built using h2
    • If two records hash to the same location they are checked to see if they are duplicates
  • Assuming that each partition produced in the partitioning stage can fit in main memory the cost is
    • Partitioning stage: 2B
    • Duplicate elimination stage: B, for a total cost of 3B
  • This is the same cost as projection using sorting
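
A minimal sketch of the two stages, with a Python set standing in for the in-memory table built with h2 (the record layout is hypothetical):

```python
def hash_distinct(records, n_partitions):
    """Partition with h, then eliminate duplicates within each partition."""
    partitions = [[] for _ in range(n_partitions)]
    for rec in records:                        # partitioning stage (h)
        partitions[hash(rec) % n_partitions].append(rec)
    result = []
    for part in partitions:                    # duplicates only collide here
        seen = set()                           # stands in for the h2 table
        for rec in part:
            if rec not in seen:
                seen.add(rec)
                result.append(rec)
    return result

rows = [("ann", "lee"), ("bob", "ng"), ("ann", "lee")]
print(hash_distinct(rows, 4))   # [('ann', 'lee'), ('bob', 'ng')] in some order
```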
Sort, Hash Projection Comparison
  • Sort and hash projection have the same cost (3B)
  • If M > (B) sorting and sort projection can be performed in two passes
    • The first pass produces B/M sorted runs
    • If there are less than M-1 of them only one merge pass is required
  • With hash projection, if just one partition is greater than M-1, further partitioning is required
    • Regardless of the overall size of the file

Customer ⋈ Account

Joins

SELECT *

FROM Customer NATURAL INNER JOIN Account

  • A join is defined as a Cartesian product followed by a selection
    • Joins are used more often than Cartesian products
  • Cartesian products typically result in much larger tables than joins
    • It is important to be able to efficiently implement joins

σC.sin=A.sin(Customer × Account)

Cost of Joins
  • The naïve method of producing a join is to perform a Cartesian product then a selection
    • This is very inefficient if the intermediate relation is large
  • For example, consider two relations R and S
    • R has 10,000 records, and S has 2,000
    • S has a foreign key referencing R
      • Implying that a record in S relates to at most one record in R
  • The sizes of the join and the Cartesian product are
    • Natural join – 2,000 (if each s in S relates to an r in R)
    • Cartesian product – 20,000,000 records
Join Methods
  • Nested loop joins
    • These algorithms basically compute the Cartesian product and discard records that do not meet the join condition
  • Index nested loop join
    • Uses an index on one of the tables
  • Sort-merge join
    • Sorts both tables on the join attribute and merges results
  • Hash join
    • Similar to the sort-merge join but uses hashing
    • We will also consider the hybrid hash join algorithm
Tuple Nested Loop Join
  • This simple algorithm compares each record in the outer table with each record in the inner relation
    • Start with one record, r, in relation R
    • Compare r to all the records, s, in relation S and see if the join condition matches
      • If so concatenate the records and add to the result
    • Repeat with all r in R
  • for each record r  R
  • for each record s  S
      • if ri = sj then
      • add r,s to result

R R.i=S.jS

Basic Nested Loop Join

[Diagram: R and S on disk; main memory holds one page of R, one page of S, and an output buffer; scan all of S for each record in R]

Nested Loop Join Cost
  • The inner relation (S) has to be scanned once for each record in the outer relation (R)
    • For a cost of B(R) + (T(R) * B(S) )
    • The results are in the same order as R
  • A simple improvement is to process one block of R at a time, instead of one record at a time
    • This reduces the cost to B(R) + (B(R) * B(S) )
    • If B(R) and B(S) are roughly the same size this is O(n2)
      • Where n = B(R)
    • Note that the result is no longer strictly ordered on R
  • The smaller relation should be the outer relation
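
These formulas are easy to check numerically. A sketch with made-up relation sizes (1,000 pages and 100,000 tuples for R, 500 pages for S; not figures from the slides):

```python
def tuple_nlj_cost(B_R, T_R, B_S):
    """Scan R once, plus one full scan of S per R-tuple."""
    return B_R + T_R * B_S

def page_nlj_cost(B_R, B_S):
    """Scan R once, plus one full scan of S per R-page."""
    return B_R + B_R * B_S

print(tuple_nlj_cost(1000, 100_000, 500))   # 50,001,000
print(page_nlj_cost(1000, 500))             # 501,000
```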
Improved Nested Loop Join

[Diagram: R and S on disk; main memory holds one page of R, one page of S, and an output buffer; scan all of S for each page in R]

Block Nested Loop Join
  • The nested loop join algorithm does not make effective use of main memory
    • Only one page of the outer relation is processed at a time
  • The algorithm can be significantly improved by reading as much of the outer relation as possible
    • Use M – 2 pages as an input buffer for the outer relation,
    • 1 page as an input buffer for the inner relation, and
    • 1 page as an output buffer
    • If the smaller relation fits in M – 2 pages the cost is B(R) + B(S)!
      • CPU costs can be reduced by building an in-memory hash table on the inner relation
Block Nested Loop Join …
  • What if the smaller relation is larger than M-2?
    • Break R, the outer relation, into blocks of M – 2 pages
      • Refer (somewhat flippantly) to these blocks as clumps
    • Scan S once for each clump of R
      • Insert concatenated records r, s that match the join condition into the output buffer
      • S is read B(R)/(M-2) times
    • The total cost is B(R) + (B(R)/(M-2) * B(S))
  • Disk seek time can be reduced by increasing the size of the input buffer for S
    • Which may increase the number of times that S is scanned
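
A sketch of the clump-at-a-time loop, with in-memory lists standing in for pages: clump_size plays the role of M - 2, and the dict is the in-memory hash table mentioned on the previous slide (the record layout is hypothetical):

```python
from itertools import islice

def chunks(records, size):
    it = iter(records)
    while chunk := list(islice(it, size)):
        yield chunk

def block_nested_loop_join(R, S, clump_size, key):
    """For each clump of R, build a hash table on the join attribute,
    then scan all of S once."""
    result = []
    for clump in chunks(R, clump_size):
        table = {}
        for r in clump:
            table.setdefault(r[key], []).append(r)
        for s in S:                          # one full scan of S per clump
            for r in table.get(s[key], []):
                result.append({**r, **s})
    return result

customers = [{"sin": i, "name": f"c{i}"} for i in range(6)]
accounts = [{"sin": i % 3, "balance": 100 * i} for i in range(4)]
print(block_nested_loop_join(customers, accounts, clump_size=2, key="sin"))
```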
Improved Nested Loop Join

[Diagram: R and S on disk; main memory holds M-2 pages of R, one page of S, and an output buffer; scan all of S for each M-2 pages of R]

Index Nested Loop Join
  • If there is an index on the join attribute
    • Make the indexed relation the inner relation (S)
    • Read the outer relation, retrieving matching records of S using the index
      • Does not first calculate the Cartesian product as it only retrieves records that satisfy the join condition
  • The cost is dependent on the kind of index and the number of matching records
    • An index lookup is required for each record in R
      • B+ tree index: 2 – 4, hash index: 1 – 2
    • If multiple S records match to a single R record the total retrieval cost depends on whether the index is primary
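
A rough cost sketch under those assumptions (the probe costs and match counts below are illustrative, not from the slides):

```python
def index_nlj_cost(B_R, T_R, probe_cost, matches_per_tuple, primary_index):
    """B(R) to scan R, plus one index probe per R-tuple, plus record fetches.
    With a primary index the matches for one tuple are clustered (about one
    page); with a secondary index assume one page read per matching record."""
    fetch = 1 if primary_index else matches_per_tuple
    return B_R + T_R * (probe_cost + fetch)

# 1,000-page R with 100,000 tuples, B+ tree probe of about 3 reads
print(index_nlj_cost(1000, 100_000, 3, 1, primary_index=True))    # 401,000
print(index_nlj_cost(1000, 100_000, 3, 4, primary_index=False))   # 701,000
```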
Index Nested Loop Join
[Diagram: scan R on disk; for each record r, look up the index using the join attribute of r, then read the page of S with matching records; results go to an output buffer]

Sort-Merge Join
  • Assume that both tables to be joined are sorted on the join attribute
    • The tables may be joined with one pass, in a similar way to merging two sorted runs
    • At a cost of B(R) + B(S)
  • If the tables are not sorted on the join attribute they may be sorted using multi-way Mergesort
    • The sort can typically be performed in two passes
      • At a cost of 4 B(R) + 4 B(S), including writing out the results
    • Joining the tables has an additional cost of B(R) + B(S)
Sort-Merge Join Process
  • Read in pages of R and S
    • Assume that the join attribute is x
  • While x_r is different from x_s
    • If x_r is less than x_s move to the next record from R, else
    • Move to the next record from S
  • If x_r equals x_s
    • Create the joined record by concatenating r and s, and
    • Add the resulting record to the output buffer
  • Repeat the process until all records have been read
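
A sketch of this merge step over lists already sorted on the join attribute. Unlike the record-at-a-time description above, it advances over whole groups of equal keys so that all matching pairs are produced:

```python
def merge_join(R, S, key):
    """Merge phase of sort-merge join; R and S must be sorted on `key`."""
    result, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if R[i][key] < S[j][key]:
            i += 1
        elif R[i][key] > S[j][key]:
            j += 1
        else:                          # a group of equal keys on both sides
            v, i2, j2 = R[i][key], i, j
            while i2 < len(R) and R[i2][key] == v:
                i2 += 1
            while j2 < len(S) and S[j2][key] == v:
                j2 += 1
            for r in R[i:i2]:
                for s in S[j:j2]:
                    result.append({**r, **s})
            i, j = i2, j2
    return result

R = [{"x": 1}, {"x": 3}, {"x": 3}]
S = [{"x": 3, "y": "a"}, {"x": 4, "y": "b"}]
print(merge_join(R, S, "x"))   # two joined rows for x = 3
```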
Improved Sort-Merge Join
  • The merge phase of the external sort algorithm can be combined with the join
    • Produce sorted runs (of size M) of R and S
    • Merging R and S, then comparing the results and discarding records that do not meet the join condition, requires
      • A buffer for each sorted run of R
      • A buffer for each sorted run of S,
      • One buffer for merged runs of R,
      • One buffer for the merged runs of S, and
      • One buffer for the concatenated results
  • The improvement requires more main memory as there must be buffers for each run of both R and S
Sort-Merge Join
[Diagram: read in sorted runs of R and S; merge the runs of R and merge the runs of S; then merge and concatenate matching records]

Memory Requirements
  • Sort-merge join can be performed in two passes
    • For a cost of 3(B(R) + B(S))
  • This assumes that main memory is large enough to allow an input buffer for each run of both R and S
    • The initial pass produces sorted runs of size M
    • Main memory must be greater than √(B(R) + B(S)) to perform the join in two passes
      • The first pass produces B(R)/M runs of R and B(S)/M runs of S
      • If M is not greater than √(B(R) + B(S)) then B(R)/M + B(S)/M must be greater than M
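
A quick feasibility check built from this run-counting argument (it ignores the extra buffers the improved variant needs, so it is a sketch rather than an exact rule):

```python
import math

def two_pass_sort_merge_ok(B_R, B_S, M):
    """True if every initial run of R and S can get its own input buffer,
    keeping one frame for output, i.e. roughly M > sqrt(B(R) + B(S))."""
    runs = math.ceil(B_R / M) + math.ceil(B_S / M)
    return runs <= M - 1

print(two_pass_sort_merge_ok(1000, 500, 40))   # 25 + 13 = 38 runs -> True
print(two_pass_sort_merge_ok(1000, 500, 30))   # 34 + 17 = 51 runs -> False
```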
Sort-Merge Join and Indexes
  • The sort-merge join can take advantage of a primary index
    • The relation with the index is already sorted so no initial sort pass is required
  • If both relations have a primary index on the join attribute a zig-zag join can be performed
    • Scan the leaves of the two B+ trees in order from the left
    • When the search key value of one index is higher, scan the other index
    • Only when both indexes contain the same search key values are the records retrieved and concatenated
Hash Join
  • The hash join algorithm has two phases
    • Partitioning, and
    • Probing
  • Partitioning
    • Both relations are partitioned using the same hash function, h, on the join attribute
      • Records in one partition of R can only match to records in the same partition of S
    • One input buffer and M - 1 output buffer pages are used to make M - 1 partitions for each relation
    • If the largest partition of the smaller relation does not fit in main memory, the relations must be partitioned further
Hash Join continued
  • Probing
    • Read in one partition of R
    • To reduce CPU costs, build an in-memory hash table using hash function h2 (h2 ≠ h)
    • Read the corresponding partition of S into an input buffer and join matching records using the hash table
    • Repeat for each partition of R
  • Cost
    • If each partition of R fits in main memory the overall cost is 3(B(R) + B(S))
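
A sketch of both phases, where Python's built-in hash plays h and a dict per partition stands in for the h2 table (the record layout is hypothetical):

```python
def hash_join(R, S, key, n_partitions):
    """Partition both inputs on the join attribute, then probe each
    R-partition with the matching S-partition."""
    h = lambda v: hash(v) % n_partitions
    parts_R = [[] for _ in range(n_partitions)]
    parts_S = [[] for _ in range(n_partitions)]
    for r in R:                              # partitioning phase
        parts_R[h(r[key])].append(r)
    for s in S:
        parts_S[h(s[key])].append(s)
    result = []
    for pr, ps in zip(parts_R, parts_S):     # probing phase
        table = {}                           # stands in for the h2 table
        for r in pr:
            table.setdefault(r[key], []).append(r)
        for s in ps:
            for r in table.get(s[key], []):
                result.append({**r, **s})
    return result

R = [{"sin": i} for i in range(5)]
S = [{"sin": i % 2, "acct": i} for i in range(4)]
print(hash_join(R, S, "sin", n_partitions=3))
```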
Memory Requirements
  • Hash join is performed in two phases
    • At a cost of 3(B(R) + B(S))
  • Main memory must be big enough to hold the largest partition of the smaller relation, R
    • Which requires that M - 2 is ≥ √B(R)
      • A buffer for S and for output are needed
      • Partitioning produces M - 1 partitions, of average size B(R)/(M - 1)
    • If M < √B(R) then B(R)/M must be larger than M, and the partitions are larger than main memory
Hybrid Hash Join
  • If main memory is large enough, retain an entire partition of R during the partitioning phase
    • This eliminates the need to write out the partition, and read it back in during the probing phase
    • In addition, the matching S records can be joined (and written out to the result) when S is partitioned
    • Hence the records (of both R and S) belonging to that partition are only read once
  • This approach can be generalized to retain more than one partition where possible
Hybrid Hash Join and Block Nested Loop Join
  • If the smaller relation fits in main memory the costs are identical
    • The smaller relation is read once
    • The larger relation is scanned once to join the records
  • If the relations are large hash join is more efficient
    • Block nested loop join must read S once for each clump of R
    • Hash join matches partitions of R to partitions of S during the probing phase
    • The records of both R and S belonging to a particular partition are only read once, after the partitioning phase
Hash Join and Sort-Merge Join
  • A skewed distribution of data may result in the relations being partitioned more than once
    • Because the partitions may vary considerably in size
    • Sort-merge join is not sensitive to data skew
  • If M falls between (B(R))and (B(S)) hash join requires less passes
    • Hash join only requires enough memory to hold partitions of one relation
    • The memory requirements of sort-merge join depend on the size of both relations
Join Method Ordering
  • Simple nested loop join (read S for each record)
    • Retains the original order of R
  • Index nested loop join
    • Retains the original order of R
  • Sort-Merge join
    • Ordered by the join attribute
  • All other join methods
    • No order
General Join Conditions
  • The join process is more complex if the join condition is not simple equality on one attribute
  • For equalities over several attributes
    • Sort-merge and hash join must sort (or hash) over all of the attributes in the selection
    • An index that matches one of the equalities may be used for the index nested loop join
  • For inequalities (, , etc.)
    • Hash indexes cannot be used for index nested loop joins
    • Sort-merge and hash joins are not possible
    • Other join algorithms are unaffected
Set Operations

SELECT fName, lName

FROM Patient

INTERSECT

SELECT fName, lName

FROM Doctor

  • For set operations, unlike other SQL operations, duplicates are removed by default
    • Unless ALL is specified
    • Therefore most set operations entail sorting the relations

πfName,lName(Patient) ∩ πfName,lName(Doctor)

Intersection and Cartesian Product
  • Intersection R S
    • A join where the condition is equality on all attributes
    • An in-memory search structure should be used
  • Cartesian product R S
    • A special case of join where there is no join condition
    • All records are joined to each other

πfName,lName(Patient) ∩ πfName,lName(Doctor)

is equivalent to

πfName,lName(Patient) ⋈ πfName,lName(Doctor)

Union
  • Union using sorting
    • Sort R and S using all fields
    • Scan and merge the results while removing duplicates
  • Union using hashing
    • Partition R and S using a hash function h
    • For each partition of S (if S is the smaller relation)
      • Build an in-memory hash table (using h2)
      • Scan the corresponding partition of R
      • For each record, r, probe the hash table, if a duplicate record already exists discard it, otherwise add it to the table
      • Write out the hash table and process the next partition
Set Difference
  • Set Difference is implemented in a similar way to union
  • In sorting, in the merge phase check that records of R are not in S before adding them to the result
  • In hashing, probe the hash table (for S), if a record of R is not in the table then add it to the result
  • If the query contains ALL a count of duplicates should be maintained
    • Instead of storing duplicate records
Aggregations

SELECT AVG(age)

FROM Patient

  • The basic method for calculating an aggregation is to scan the entire table
  • While keeping track of running information
    • SUM – total of the retrieved values
    • AVG – total, and count of the retrieved values
    • COUNT – count of the retrieved values
    • MIN – smallest of the retrieved values
    • MAX – largest of the retrieved values
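
A one-scan sketch maintaining exactly this running information:

```python
def scan_aggregate(values):
    """Single pass over the retrieved values, keeping running information."""
    total = count = 0
    lo = hi = None
    for v in values:
        total += v
        count += 1
        lo = v if lo is None or v < lo else lo
        hi = v if hi is None or v > hi else hi
    return {"SUM": total, "COUNT": count, "AVG": total / count,
            "MIN": lo, "MAX": hi}

print(scan_aggregate([48, 52, 65]))   # AVG = 55.0
```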
Grouped Aggregations

SELECT AVG(income)

FROM Doctor

GROUP BY speciality

  • An aggregation can include a GROUP BY clause
    • In which case the aggregation has to be computed for each group
  • Aggregations with groupings can be computed in a number of ways
    • By sorting or hashing, or
    • By using an index
Sorting and Groups
  • In the sorting-based approach the table is sorted on the grouping attribute
  • The results of the sort are scanned and the result of the aggregate operation computed
    • These two processes can be combined in a similar way to the sort based projection and join algorithms
  • The cost is essentially the cost of the sorting operations
    • Probably 3B(R)
    • Although the final result will be considerably smaller than the sorted table
Hashing and Groups
  • In the hash-based approach an in-memory hash table is built on the grouping attribute
    • Hash table entries consist of
      • grouping-value, running-information
  • The table is scanned, and for each record
    • Probe the hash table to find the entry for the group that the record belongs to, and
    • Update the running information for that group
  • Once the table has been scanned the grouped results are computed using the hash table entries
    • If the hash table fits in main memory the cost is B(R)
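
A sketch of this hash-based grouping for the AVG(income) ... GROUP BY speciality query shown earlier (the record layout is hypothetical):

```python
def hash_group_avg(records, group_key, value_key):
    """In-memory hash aggregation: <grouping-value, running-information>."""
    running = {}                       # group -> (total, count)
    for rec in records:
        total, count = running.get(rec[group_key], (0, 0))
        running[rec[group_key]] = (total + rec[value_key], count + 1)
    return {g: total / count for g, (total, count) in running.items()}

doctors = [{"speciality": "ENT", "income": 100},
           {"speciality": "ENT", "income": 200},
           {"speciality": "GP",  "income": 150}]
print(hash_group_avg(doctors, "speciality", "income"))
# {'ENT': 150.0, 'GP': 150.0}
```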
Aggregations and Indexes
  • It may be possible to satisfy an aggregate query using just the data entries of an index
    • The search key must include all of the attributes required for the query
      • The data entries may be sorted or hashed, and
      • No access to the records is required
    • If the GROUP BY clause is a prefix of a tree index, the data entries can be retrieved in the grouping order
      • The actual records may also be retrieved in this order
  • This is an example of an index-only plan
Summary

Memory Requirements

One-Pass and Simple Algorithms

δ = duplicate removal, γ = grouping

Hash-Based Algorithms

Assume B(S)  B(R), and that B(S)  M

ad