# CMPT 454: Query Optimization
Query Optimization
• Introduction
• Query Execution
• Unary Operations
• External Sorting
• Binary Operations
Query Compilation Steps
• Parsing
• Construct a parse tree for a query
• Generate equivalent logical query plans
• Convert the parse tree to a query plan in relational algebra
• Transform the plan into more efficient equivalents
• Generate a physical plan
• Select algorithms for each of the operators in the query
• Including details about how tables are to be accessed or sorted
Query Optimization
• Generating equivalent logical plans and their physical plans is known as query optimization
• Selecting a good query plan entails deciding
• Which equivalent plan leads to the most efficient query
• Which algorithm should be used to implement an operation
• How data from one operation should be passed to the next
• These choices depend on database metadata
• Size of relations
• Number and frequency of attribute values
• Indexing and data file organization
Query Processing

query → Query Compilation (generating, improving and estimating query plans) → Query Execution (algorithms to implement physical plan operators) → result

Introduction
• A physical query plan is made up of operators
• Each implementing one step of the plan and
• Corresponding to a relational algebra operation
• Some relational algebra operations may require lower level physical operations
• Such as scanning a table
Sorting and Scanning
• It is sometimes necessary or useful to sort data as it is scanned
• To satisfy a query with an ORDER BY clause
• Or because an algorithm requires sorted input
• There are a number of ways in which a sort scan can be performed
• Main memory sorting
• B+ tree index
• Multi-way mergesort
Statistics
• There are M main memory frames available for an operation
• M may or may not be the entire main memory
• Information relating to a relation (table) R
• B(R) or B – number of blocks of R
• T(R) or T – number of tuples (i.e. records) of R
• V(R, a) – number of distinct values for attribute a in R
• Assume that data is accessed one block at a time from a disk
• This assumption will be discussed later
Computation Model
• Cost metric
• The number of disk I/Os required for an operation
• Assume that
• Input to an operation is to be read from disk
• The result is left in main memory (does not include write cost)
• Note that the size of a result should not influence the choice of algorithm for a particular operation
• Although it may influence the order of operations
I/O Cost for Scans
• The cost to scan R depends on its organization
• If R is organized into a single file the cost of a scan is B
• And if B < M the cost of a sort scan is B
• If R is distributed with records from other tables the cost of a scan may be higher
• The use of an index does not reduce the cost to scan R
• But may reduce the cost to retrieve a part of R
Introduction
• In this section the costs of individual relational algebra operations are considered
• There are often a number of algorithms that can be used to implement the same operation
• Each operation is considered in isolation
• How the individual operations interact is discussed in a later section
• Note that the cost of a later operation may be affected by the result of an earlier operation, in terms of
• The order of the records in the result
• The algorithm used to achieve that result
Selection

SELECT * FROM Customer WHERE age > 50

• Only records that match the selection are included in the result
• The size of the result depends on the number of records that match the selection
• Complex selections that contain more than a single term will be considered later

σ_age>50(Customer)

Selections with No Index
• If the file is not sorted on the selection attribute
• Perform a file scan, cost is B reads
• If the selection attribute is a superkey the scan can be terminated once a match has been found (average cost is B/2)
• The file is sorted on the selection attribute
• Perform a binary search to find the first record that matches the selection criteria
• Then read all records that match the selection criteria
• Which may entail reading additional pages
• log₂(B) + the number of pages of matching records - 1
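As a rough sketch of these cost formulas in Python (the function names and sample numbers are ours, not from the slides; B is the number of pages as defined earlier):

```python
import math

def unsorted_selection_cost(B, on_superkey=False):
    # Full file scan; on a superkey the scan stops at the first match,
    # for an average cost of B/2 page reads.
    return B / 2 if on_superkey else B

def sorted_selection_cost(B, matching_pages):
    # Binary search for the first matching page, then read the
    # remaining pages that hold matching records.
    return math.ceil(math.log2(B)) + matching_pages - 1

print(unsorted_selection_cost(1000))        # 1000 reads
print(unsorted_selection_cost(1000, True))  # 500.0 reads on average
print(sorted_selection_cost(1000, 4))       # 10 + 3 = 13 reads
```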
Selections with B+ Tree Index
• Find the leaf page of the first index entry that points to a matching record
• The number of disk reads is equal to the height of the tree
• The cost to find a leaf page in a B+ tree index is usually in the order of 2 or 3 disk reads
• Then retrieve the matching records
• This cost is determined by the number of leaf pages containing matching entries, and
• The number of records to be retrieved, and
• Whether the index is primary or secondary
Primary B+ Tree Index
• In a primary index the file is sorted on the search key
• If the selection is small it may fit on one page
• If the selection is large, additional pages may have to be read
• The selection may cover a large range of records, or
• May be an equality selection on an attribute with few distinct values
• The exact number of pages depends on the number of records that match the selection
Secondary B+ Tree Index
• In a secondary index the sort order of the file (if any) is different from the search key of the index
• The leaves of the tree form a dense index on the file
• Adjacent matching data entries are unlikely to point to records on the same page
• A disk read is required for each matching record
• For large selections, a secondary index may be less efficient than scanning the entire table!
• In practice it is important to recognize this, by estimating the number of records returned by a selection
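One way to picture that estimate: assuming an equality selection and uniformly distributed values, the expected T/V matching records can be compared against the B pages a full scan would read. A minimal sketch (the helper name and the one-read-per-record approximation are ours):

```python
def secondary_index_beats_scan(B, T, V, index_reads=3):
    # Estimated matching records for an equality selection: T / V
    # (uniform values). A secondary index costs roughly one page read
    # per matching record, plus a few index page reads.
    estimated_matches = T / V
    return index_reads + estimated_matches < B

# 100,000 records on 1,000 pages with 50 distinct values: ~2,000
# matches, so the full scan (1,000 reads) wins over the index.
print(secondary_index_beats_scan(B=1000, T=100_000, V=50))  # False
```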
Secondary B+ Tree Range Selection

SELECT * FROM Patient WHERE lName = 'Zoobat'

σ_lName='Zoobat'(Patient)

(diagram: the four data entries for patients called Zoobat point to four different pages of the file)

Selections with Hash Index
• A hash index can be used to efficiently retrieve equality selections
• The cost to retrieve the bucket containing the data entries is typically 1 or 2 disk reads
• The cost to retrieve the corresponding records depends on whether the index is primary or secondary
• If the index is primary, matching records are likely to reside on the same, single, file page whereas
• If the index is secondary, matching records are likely to reside on different pages of the file
• Hash indexes cannot be used for range selections
Complex Selections
• Selections may be made up of a number of terms connected by and (∧) and or (∨)
• These terms may reference different attributes
• It is useful to know whether or not an index can be used to evaluate a selection
• To determine this the selection is first expressed in conjunctive normal form (CNF)
• A collection of conjuncts (terms joined by and)
• Each conjunct consists either of a single term, or multiple terms joined by or (i.e. disjunctions)
• e.g. (A ∧ B) ∨ C ∨ D ≡ (A ∨ C ∨ D) ∧ (B ∨ C ∨ D)
Selections with no Disjunctions
• Hash indexes can be used if there is an equality condition for every attribute in the search key
• e.g. a single hash index on {city, street, number}
• σ_city='London' ∧ street='Baker' ∧ number=221(Detective) matches
• σ_city='Los Angeles' ∧ street='Cahuenga'(Detective) does not
• Tree indexes can be used if there is a selection on each of the first n attributes of the search key
• e.g. B+ tree index on {city, street, number}
• Both σ_city='London' ∧ street='Baker' ∧ number=221(Detective) and
• σ_city='Los Angeles' ∧ street='Cahuenga'(Detective) match
Selections with no Disjunctions…
• If an index matches a subset of the conjuncts
• Use the index to return a result that contains some unwanted records
• Scan this result for matches to the other conjuncts
• city="London"street="Baker"number=221fName = "Sherlock" (Detective)
• If more than one index matches a conjunct
• Either use the most selective index, then scan the result, discarding records that fail to match the other criteria
• Or use all indexes and retrieve the rids
• Take the intersection of the rids and retrieve those records
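The rid-intersection strategy is easy to picture with sets; a minimal sketch with hypothetical rid sets returned by two matching indexes:

```python
# Hypothetical rids returned by two matching indexes
rids_from_address_index = {7, 12, 31, 55, 90}
rids_from_name_index = {12, 31, 77}

# Only records matched by every index are retrieved; the remaining
# selection criteria are then checked against those records.
for rid in rids_from_address_index & rids_from_name_index:
    pass  # fetch the record for this rid and apply the other conjuncts
```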
Selections with no Disjunctions…
• Consider the relation and selection shown below
• Detective = {id, fName, lName, age, city, street, number, author}
• σ_city='New York' ∧ author='Spillane' ∧ lName='Hammer'(Detective)
• With indexes
• Secondary hash index, {city, street, number} – can't be used
• Secondary B+ tree index, {lName, fName} – can be used
• Secondary hash index, {author} – can be used
• There are two strategies:
• Use the most selective of the two matching indexes, and search the results for the remaining criteria
• Use both indexes, and take the intersection of the rids
• What if the B+ tree index is primary?

Selections with Disjunctions
• If necessary, restate the selection in CNF
• If a conjunct includes a disjunction with no index on its attribute, no index can be used for that conjunct
• If all the conjuncts contain such a disjunction a file scan is necessary to satisfy the selection
• Otherwise, use an index that matches one of the conjuncts and retrieve those records
• The remaining selection criteria can then be applied to the results from using that index
Selections with Disjunctions …
• Consider a selection of this form
• σ_(a ∨ b ∨ c) ∧ (d ∨ e ∨ f)(R)
• Where each of a to f is an equality selection on an attribute
• If each of the terms in either of the conjuncts has a matching index
• Use the indexes to find the rids
• Take the union of the rids and retrieve those records
• If there are indexes on the attributes of a, b, c, and e
• Use the a, b, and c indexes and take the union of the rids
• Retrieve the resulting records and apply the other criteria
Selections with Disjunctions …
• Consider the selections shown below
• σ_(author='King' ∨ age>35) ∧ (lName='Tam' ∨ id=11)(Detective)
• σ_author='King' ∧ (lName='Tam' ∨ id=11)(Detective)
• Indexes on the relation
• Secondary B+ tree index, {lName, fName}
• Secondary hash index, {author}
• Compare the two selections
• In the first selection each conjunct contains a disjunction without an index (age, id) so a file scan is required
• In the second selection the index on author can be used, and records that don't meet the other criteria removed
Projections

SELECT fName, lName FROM Customer

• Only columns in the column list are retained
• Reducing the size of the result relation
• There are two steps to implementing projections
• Remove the unwanted columns
• Eliminate any duplicate records, but only if DISTINCT has been specified
• Duplicates can be removed either by sorting or hashing

π_fName,lName(Customer)

External Sorting

A Digression

External Sorting
• Two-way external merge sort
• External merge sort
• Replacement sort
• External merge sort notes
External Sorting Introduction
• It is often necessary to sort data in a database
• To sort the results of a query
• To make some other operations more efficient
• Eliminating duplicate records
• Joining relations
• The focus of a DB sorting algorithm is to reduce the number of disk reads and writes
Internal vs. External Sorting
• Sorting a collection of records that fit within main memory can be performed efficiently
• There are a number of sorting algorithms that run in O(n log₂n) time
• That is, with n log₂n comparisons, e.g. Mergesort and Quicksort
• Many DB tables are too large to fit into main memory at one time
• So cannot be read into main memory and sorted
• The focus in external sorting is to reduce the number of disk I/Os
Merge Sort – a Brief Reminder
• Consider the Mergesort algorithm
• Input sub-arrays are repeatedly halved
• Until they contain only one element
• Sub-arrays are then merged into sorted sub-arrays by repeated merge operations
• Merging two sorted sub-arrays can be performed in O(n)

```
mergesort(arr, start, end)
    if (start < end)
        mid = (start + end) / 2       // integer division
        mergesort(arr, start, mid)    // sort the left half
        mergesort(arr, mid + 1, end)  // sort the right half
        merge(arr, start, mid, end)   // merge sorted halves arr[start..mid] and arr[mid+1..end]
```

Naïve External Merge Sort
• Read the first two pages (of B pages) of the file
• Sort their contents (independently of each other)
• Merge the two pages
• Write them out to a sorted run of size 2
• Repeat for the rest of the file
• Producing B/2 sorted runs of size 2
• Merge the first two sorted runs of size 2
• Read in the first page of the first two runs into input pages
• Merge to a single output page, and write it out when full
• When all records in an input page have been merged read in the second page of that run
Naïve External Merge Sort …
• After the first merge pass the file consists of B/2 sorted runs each of two pages
• Read in the first page of each of the first two sorted runs
• Leaving a third page free as an output buffer

(diagram: sorted runs of size 2 on disk, each page holding three records; two input pages and a free output buffer in main memory)

Memory Usage
• Records from the input pages are merged into the output buffer
• Once the output buffer is full its contents are written out to disk, to form the first page of the first sorted run of length 4


Memory Usage
• At this point all of the records from one of the input pages have been dealt with
• The next page of that sorted run is read into the input page
• And the process continues


Cost of Naïve Merge Sort
• Assume that B = 2^k; after the first merge pass there are 2^(k-1) sorted runs, each two pages long
• After the second pass there are 2^(k-2) sorted runs, of length 4
• After the kth pass there is one sorted run of length B
• The number of passes is therefore log₂(B) + 1
• log₂(B) is the number of merge passes required
• The +1 is for the initial pass to sort each page
• Each pass requires that each page be read and written back, for a total cost of 2B(log₂(B) + 1)
• Note that only 3 frames of main memory are used!
First Stage Improvement
• In the first stage of the process each page is read into main memory, sorted and written out
• The pages are then ready to be merged
• Instead of sorting each page independently, read M pages into main memory, and sort them together
• M is the main memory frames available for the sort
• After the first pass there will be B/M sorted runs, each of length M
• This will reduce the number of subsequent merge passes
Merge Pass Improvement
• In the merge passes perform an M-1 way merge rather than a 2 way merge, using
• M-1 input pages, one for each of M-1 sorted runs and
• 1 page for an output buffer
• The first items of the M-1 input buffers are compared to each other to determine the smallest
• Resulting in fewer merge passes, and less disk I/O
• Each merge pass merges M-1 runs
• After the first pass the runs are size (M-1)*M
• Since M is the size of the runs from the first stage
Cost of External Merge Sort
• The initial pass produces B/M sorted runs of size M
• Each merge pass reduces the number of runs by a factor of M-1
• The number of merge passes is ⌈log_(M-1)(B/M)⌉
• Each pass requires that the entire file is read and written
• Total cost is therefore 2B(⌈log_(M-1)(B/M)⌉ + 1)
• As M is typically relatively large this reduction is considerable
Number of Passes
• B = 1,000,000

(table: number of passes and I/O costs for increasing values of M)

Even a large file can usually be sorted in two passes (a cost of 4B I/Os to sort and write out)
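Those pass counts follow directly from the formula on the previous slide; a small sketch (the function name is ours) reproduces them for B = 1,000,000:

```python
import math

def external_sort_passes(B, M):
    # Initial pass: about B/M sorted runs of length M;
    # each merge pass combines M-1 runs into one.
    runs = math.ceil(B / M)
    merge_passes = math.ceil(math.log(runs, M - 1)) if runs > 1 else 0
    return 1 + merge_passes

B = 1_000_000
for M in (17, 129, 257, 1025):
    passes = external_sort_passes(B, M)
    print(f"M = {M:5}: {passes} passes, {2 * B * passes:,} I/Os")
# M = 17: 5 passes; M = 129 or 257: 3 passes; M = 1025: 2 passes
```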

Generating Longer Initial Runs
• In the first stage B/M sorted runs of size M are produced
• The size of these preliminary runs determines how many merge passes are required
• The size of the initial runs can be increased by using replacement sort
• On average the size of the initial runs can be increased to 2*M
• Using replacement sort increases complexity
Replacement Sort
• M-2 pages are used to sort the file
• Called the current set
• One page is used for input, and
• One page is used for output
• The current set is filled
• … and sorted

(diagram: main memory divided into the current set of M-2 pages, one input page and one output page)

Replacement Sort
• The next page of the file is read in to the input buffer
• The smallest record from the current set (and the input buffer) is placed in the output buffer
• The first location of the current set is now available for the first record from the input buffer


Replacement Sort
• The process continues until the output buffer is full and
• The contents of the input buffer are in the current set
• At this point another input page can be read into the input buffer
• And the output buffer can be written to disk as the first page of the first run


Replacement Sort
• The current set can be periodically re-sorted
• At some point a page is read that contains records with values less than the values in records in the output buffer
• The rest of the current set must now be written out to complete the first run
• The process then begins again

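A minimal sketch of replacement sort in Python, using a heap for the current set and treating one record per page to keep it small (the function name and the one-record-per-page simplification are ours):

```python
import heapq

def replacement_sort_runs(records, M):
    it = iter(records)
    current = []                      # the current set: up to M-2 records
    for r in it:
        current.append(r)
        if len(current) == M - 2:
            break
    heapq.heapify(current)
    run, frozen = [], []              # frozen: too small for the current run
    while current:
        r = heapq.heappop(current)
        run.append(r)                 # r goes to the output buffer
        nxt = next(it, None)          # refill from the input buffer
        if nxt is not None:
            if nxt >= r:
                heapq.heappush(current, nxt)   # can still join this run
            else:
                frozen.append(nxt)             # must wait for the next run
        if not current:               # the current run is complete
            yield run
            run, current, frozen = [], frozen, []
            heapq.heapify(current)

for run in replacement_sort_runs([5, 1, 9, 2, 8, 3, 7, 4, 6], M=5):
    print(run)    # [1, 2, 5, 8, 9] then [3, 4, 6, 7]
```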

Revisiting I/O Costs
• The cost metric used is simplistic
• In practice it may be advantageous to make the input and output buffers larger than one page
• This reduces the number of runs that can be merged at one time, so may increase the number of passes required
• But, it allows a sequence of pages to be read or written to the buffers, decreasing the actual access time per page
• We have also ignored CPU costs
• If double buffering is used, the CPU can process one part of a run while the next is being loaded into main memory
Note: B+ Trees and Sorting
• Primary B+ tree index
• The index can be used to find the first page, but
• Note that the file is already sorted!
• Secondary B+ tree index
• Leaves point to data records that are not in sort order
• In the worst case, each data entry could point to a different page from its adjacent entries
• Retrieving the records in order requires reading all of the index leaf pages, plus one disk read for each record!
• In practice external sort is likely to be much more efficient than using a secondary index
Projections and Sorting
• Projection with duplicate removal can be performed in a number of steps
• Scan the table, remove unwanted attributes, and write it back
• cost = 2B disk I/Os
• Sort the result, using all of its attributes as a compound sort key
• cost = 4B, possibly more if the file is very large
• Scan the result, removing the adjacent duplicates as they are encountered; cost = 2B
• The result of this last stage may not have to be written out; it may be the last operation or may be pipelined into another operation
• The total cost is 8B disk I/Os, but this basic process can be improved by combining these steps
Improved Sort Projection
• The initial scan is performed as follows
• Read M pages and remove unwanted attributes
• Sort the records, and remove any duplicates
• Write the sorted run to disk
• Repeat for the rest of the file, for a total cost of 2B
• Perform merge passes as required on the output from the first stage
• Remove any duplicates as they are encountered
• If only one merge pass is required the cost is ≈ 1B
• For a total cost of 3B
Projections and Hashing
• Duplicates can also be identified by using hashing
• Duplicate removal by hashing has two stages
• Partitioning and probing
• In the partitioning stage
• Partition into M-1 partitions using a hash function, h
• There is an output buffer for each partition, and
• One input buffer
• The file is read into main memory one page at a time, with each record being hashed to the appropriate buffer
• Duplicates are eliminated in the next stage
Projections and Hashing …
• The duplicate elimination stage uses a second hash function h2 (h2 ≠ h) to reduce main memory costs
• An in-memory hash table is built using h2
• If two records hash to the same location they are checked to see if they are duplicates
• Assuming that each partition produced in the partitioning stage can fit in main memory the cost is
• Partitioning stage: 2B
• Duplicate elimination stage: B, for a total cost of 3B
• This is the same cost as projection using sorting
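A minimal in-memory sketch of both stages (h and h2 are stand-ins built from Python's hash, purely for illustration; real partitions would be written to disk):

```python
def partition(records, n):
    # Partitioning stage: hash each record to one of n partitions with h.
    parts = [[] for _ in range(n)]
    for rec in records:
        parts[hash(rec) % n].append(rec)                  # h
    return parts

def distinct(records, n_partitions=4):
    # Duplicate elimination stage: per partition, build an in-memory
    # table with a second hash function h2 and compare colliding records.
    for part in partition(records, n_partitions):
        table = {}
        for rec in part:
            bucket = table.setdefault(hash(rec) % 1024, [])   # h2
            if rec not in bucket:
                bucket.append(rec)
                yield rec

print(sorted(distinct(["ann", "bob", "ann", "cal", "bob"])))
# ['ann', 'bob', 'cal']
```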
Sort, Hash Projection Comparison
• Sort and hash projection have the same cost (3B)
• If M > (B) sorting and sort projection can be performed in two passes
• The first pass produces B/M sorted runs
• If there are less than M-1 of them only one merge pass is required
• With hash projection, if just one partition is greater than M-1, further partitioning is required
• Regardless of the overall size of the file


Joins

SELECT * FROM Customer NATURAL INNER JOIN Account

• A join is defined as a Cartesian product followed by a selection
• Joins are used more often than Cartesian products
• Cartesian products typically result in much larger tables than joins
• It is important to be able to efficiently implement joins

σ_C.sin=A.sin(Customer × Account)

Cost of Joins
• The naïve method of producing a join is to perform a Cartesian product then a selection
• This is very inefficient if the intermediate relation is large
• For example, consider two relations R and S
• R has 10,000 records, and S has 2,000
• S has a foreign key referencing R
• Implying that a record in S relates to at most one record in R
• The sizes of the join and the Cartesian product are
• Natural join – 2,000 (if each s in S relates to an r in R)
• Cartesian product – 20,000,000 records
Join Methods
• Nested loop joins
• These algorithms basically compute the Cartesian product and discard records that do not meet the join condition
• Index nested loop join
• Uses an index on one of the tables
• Sort-merge join
• Sorts both tables on the join attribute and merges results
• Hash join
• Similar to the sort-merge join but uses hashing
• We will also consider the hybrid hash join algorithm
Tuple Nested Loop Join
• This simple algorithm compares each record in the outer table with each record in the inner relation
• Start with one record, r, in relation R
• Compare r to all the records, s, in relation S and see if the join condition matches
• If so concatenate the records and add to the result
• Repeat with all r in R
```
for each record r ∈ R
    for each record s ∈ S
        if r.i = s.j then
            add ⟨r, s⟩ to the result
```

R ⋈_R.i=S.j S

Basic Nested Loop Join

(diagram: one input page for R, one input page for S and an output buffer in main memory; all of S is scanned for each record in R)

Nested Loop Join Cost
• The inner relation (S) has to be scanned once for each record in the outer relation (R)
• For a cost of B(R) + (T(R) * B(S) )
• The results are in the same order as R
• A simple improvement is to process one block of R at a time, instead of one record at a time
• This reduces the cost to B(R) + (B(R) * B(S) )
• If B(R) and B(S) are roughly the same size this is O(n2)
• Where n = B(R)
• Note that the result is no longer strictly ordered on R
• The smaller relation should be the outer relation
Improved Nested Loop Join

(diagram: one input page holding a full page of R records, one input page for S and an output buffer; all of S is scanned for each page of R)

Block Nested Loop Join
• The nested loop join algorithm does not make effective use of main memory
• Only one page of the outer relation is processed at a time
• The algorithm can be significantly improved by reading as much of the outer relation as possible
• Use M – 2 pages as an input buffer for the outer relation,
• 1 page as an input buffer for the inner relation, and
• 1 page as an output buffer
• If the smaller relation fits in M – 2 pages the cost is B(R) + B(S)!
• CPU costs can be reduced by building an in-memory hash table on the inner relation
Block Nested Loop Join …
• What if the smaller relation is larger than M-2?
• Break R, the outer relation, into blocks of M – 2 pages
• Refer (somewhat flippantly) to these blocks as clumps
• Scan S once for each clump of R
• Insert concatenated records r, s that match the join condition into the output buffer
• S is read B(R)/(M-2) times
• The total cost is B(R) + (B(R)/(M-2) * B(S))
• Disk seek time can be reduced by increasing the size of the input buffer for S
• Which may increase the number of times that S is scanned
(diagram: M-2 input pages for R, one input page for S and an output buffer in main memory; all of S is scanned for each M-2 page clump of R)
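The clump cost formula is easy to tabulate; a short sketch (names and numbers are ours):

```python
import math

def block_nlj_cost(B_R, B_S, M):
    # R is read once; S is scanned once for each clump of M-2 pages of R.
    clumps = math.ceil(B_R / (M - 2))
    return B_R + clumps * B_S

# A 1,000-page R joined with a 500-page S:
print(block_nlj_cost(1000, 500, M=12))    # 100 clumps -> 51,000 I/Os
print(block_nlj_cost(1000, 500, M=102))   #  10 clumps ->  6,000 I/Os
print(block_nlj_cost(1000, 500, M=1002))  #   1 clump  ->  1,500 I/Os
```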

Index Nested Loop Join
• If there is an index on the join attribute
• Make the indexed relation the inner relation (S)
• Read the outer relation, retrieving matching records of S using the index
• Does not first calculate the Cartesian product as it only retrieves records that satisfy the join condition
• The cost is dependent on the kind of index and the number of matching records
• An index lookup is required for each record in R
• B+ tree index: 2 – 4, hash index: 1 – 2
• If multiple S records match to a single R record the total retrieval cost depends on whether the index is primary
(diagram: for each record r of R, look up the index using the join attribute of r and read the S pages with matching records)
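A rough cost sketch for index nested loop join, using the lookup ranges quoted above (names and numbers are ours):

```python
def index_nlj_cost(B_R, T_R, index_lookup=3, fetch_per_record=1):
    # One index lookup, plus the fetches for matching S records,
    # for every record of R.
    return B_R + T_R * (index_lookup + fetch_per_record)

# 1,000 pages / 100,000 records of R, B+ tree lookup ~3 I/Os, one
# matching S record per R record on a primary index:
print(index_nlj_cost(1000, 100_000))   # 401,000 I/Os
```

This shows why the method pays off mainly when the outer relation is small or the join is very selective.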

Sort-Merge Join
• Assume that both tables to be joined are sorted on the join attribute
• The tables may be joined with one pass, in a similar way to merging two sorted runs
• At a cost of B(R) + B(S)
• If the tables are not sorted on the join attribute they may be sorted using multi-way Mergesort
• The sort can typically be performed in two passes
• At a cost of 4 B(R) + 4 B(S), including writing out the results
• Joining the tables has an additional cost of B(R) + B(S)
Sort-Merge Join Process
• Read in pages of R and S
• Assume that the join attribute is x
• While x_r is different from x_s
• If x_r is less than x_s move to the next record from R, else
• Move to the next record from S
• If x_r equals x_s
• Create the joined record by concatenating r and s, and
• Add the resulting record to the output buffer
• Repeat the process until all records have been read
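A minimal in-memory sketch of this process, assuming both inputs are sorted on x and that x is unique in R (handling duplicates on both sides needs an extra pass over each group of equal values):

```python
def merge_join(R, S, x):
    # R and S are lists of dicts sorted on attribute x; x is unique in R.
    out, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if R[i][x] < S[j][x]:
            i += 1                        # advance R
        elif R[i][x] > S[j][x]:
            j += 1                        # advance S
        else:
            out.append({**R[i], **S[j]})  # concatenate r and s
            j += 1                        # x unique in R: advance S only
    return out

R = [{"x": 1, "a": "p"}, {"x": 3, "a": "q"}]
S = [{"x": 1, "b": "u"}, {"x": 1, "b": "v"}, {"x": 2, "b": "w"}]
print(merge_join(R, S, "x"))
# [{'x': 1, 'a': 'p', 'b': 'u'}, {'x': 1, 'a': 'p', 'b': 'v'}]
```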
Improved Sort-Merge Join
• The merge phase of the external sort algorithm can be combined with the join
• Produce sorted runs (of size M) of R and S
• Merging R and S, comparing the records and discarding those that do not meet the join condition, requires
• A buffer for each sorted run of R
• A buffer for each sorted run of S,
• One buffer for merged runs of R,
• One buffer for the merged runs of S, and
• One buffer for the concatenated results
• The improvement requires more main memory as there must be buffers for each run of both R and S
Sort-Merge Join

(diagram: read in sorted runs of R and S, merge the runs of R and the runs of S, then merge and concatenate matching records)

Memory Requirements
• Sort-merge join can be performed in two passes
• For a cost of 3(B(R) + B(S))
• This assumes that main memory is large enough to allow an input buffer for each run of both R and S
• The initial pass produces sorted runs of size M
• Main memory must be greater than √(B(R) + B(S)) to perform the join in two passes
• The first pass produces B(R)/M runs of R and B(S)/M runs of S
• If M is not greater than √(B(R) + B(S)) then (B(R)/M + B(S)/M) must be greater than M
Sort-Merge Join and Indexes
• The sort-merge join can take advantage of a primary index
• The relation with the index is already sorted so no initial sort pass is required
• If both relations have a primary index on the join attribute a zig-zag join can be performed
• Scan the leaves of the two B+ trees in order from the left
• When the search key value of one index is higher, scan the other index
• Only when both indexes contain the same search key values are the records retrieved and concatenated
Hash Join
• The hash join algorithm has two phases
• Partitioning, and
• Probing
• Partitioning
• Both relations are partitioned using the same hash function, h, on the join attribute
• Records in one partition of R can only match to records in the same partition of S
• One input buffer and M - 1 output buffer pages are used to make M - 1 partitions for each relation
• If the largest partitions of the relations do not fit in main memory, the relations must be further partitioned
Hash Join continued
• Probing
• Read in one partition of R
• To reduce CPU costs, build an in-memory hash table using hash function h2 (h2 ≠ h)
• Read the corresponding partition of S into an input buffer and join matching records using the hash table
• Repeat for each partition of R
• Cost
• If each partition of R fits in main memory the overall cost is 3(B(R) + B(S))
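A minimal in-memory sketch of both phases (h is a stand-in built from Python's hash, and a plain dict plays the role of the h2 table; real partitions would be written to disk):

```python
def hash_join(R, S, key, M=10):
    # Partitioning phase: the same hash function h for both relations.
    h = lambda v: hash(v) % (M - 1)
    parts_R = [[] for _ in range(M - 1)]
    parts_S = [[] for _ in range(M - 1)]
    for r in R:
        parts_R[h(r[key])].append(r)
    for s in S:
        parts_S[h(s[key])].append(s)
    # Probing phase: build an in-memory table on each partition of R
    # (assumed to fit in memory) and probe it with the matching
    # partition of S.
    for pR, pS in zip(parts_R, parts_S):
        table = {}
        for r in pR:
            table.setdefault(r[key], []).append(r)
        for s in pS:
            for r in table.get(s[key], []):
                yield {**r, **s}

R = [{"sin": 1, "name": "ann"}, {"sin": 2, "name": "bob"}]
S = [{"sin": 1, "acct": 100}, {"sin": 1, "acct": 200}]
print(list(hash_join(R, S, "sin")))
```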
Memory Requirements
• Hash join is performed in two phases
• At a cost of 3(B(R) + B(S))
• Main memory must be big enough to hold the largest partition of the smaller relation, R
• Which requires that M - 2 is ≥ √B(R)
• A buffer for S and for output are needed
• Partitioning produces M - 1 partitions, of average size B(R)/(M - 1)
• If M < √B(R) then B(R)/(M - 1) must be larger than M, and the partitions are larger than main memory
Hybrid Hash Join
• If main memory is large enough, retain an entire partition of R during the partitioning phase
• This eliminates the need to write out the partition, and read it back in during the probing phase
• In addition, the matching S records can be joined (and written out to the result) when S is partitioned
• Hence the records (of both R and S) belonging to that partition are only read once
• This approach can be generalized to retain more than one partition where possible
Hybrid Hash Join and Block Nested Loop Join
• If the smaller relation fits in main memory the costs are identical
• The smaller relation is read once
• The larger relation is scanned once to join the records
• If the relations are large hash join is more efficient
• Block nested loop join must read S once for each clump of R
• Hash join matches partitions of R to partitions of S during the probing phase
• The records of both R and S belonging to a particular partition are only read once, after the partitioning phase
Hash Join and Sort-Merge Join
• A skewed distribution of data may result in the relations being partitioned more than once
• Because the partitions may vary considerably in size
• Sort-merge join is not sensitive to data skew
• If M falls between √B(R) and √B(S), where R is the smaller relation, hash join requires fewer passes
• Hash join only requires enough memory to hold partitions of one relation
• The memory requirements of sort-merge join depend on the size of both relations
Join Method Ordering
• Simple nested loop join (read S for each record)
• Retains the original order of R
• Index nested loop join
• Retains the original order of R
• Sort-Merge join
• Ordered by the join attribute
• All other join methods
• No order
General Join Conditions
• The join process is more complex if the join condition is not simple equality on one attribute
• For equalities over several attributes
• Sort-merge and hash join must sort (or hash) over all of the attributes in the selection
• An index that matches one of the equalities may be used for the index nested loop join
• For inequalities (<, ≤, etc.)
• Hash indexes cannot be used for index nested loop joins
• Sort-merge and hash joins are not possible
• Other join algorithms are unaffected
Set Operations

SELECT fName, lName FROM Patient
INTERSECT
SELECT fName, lName FROM Doctor

• For set operations, unlike other SQL operations, duplicates are removed by default
• Unless ALL is specified
• Therefore most set operations entail sorting the relations

π_fName,lName(Patient) ∩ π_fName,lName(Doctor)

Intersection and Cartesian Product
• Intersection R ∩ S
• A join where the condition is equality on all attributes
• An in-memory search structure should be used
• Cartesian product R × S
• A special case of join where there is no join condition
• All records are joined to each other

π_fName,lName(Patient) ∩ π_fName,lName(Doctor)

is equivalent to

π_fName,lName(Patient) ⋈ π_fName,lName(Doctor)

Union
• Union using sorting
• Sort R and S using all fields
• Scan and merge the results while removing duplicates
• Union using hashing
• Partition R and S using a hash function h
• For each partition of S (if S is the smaller relation)
• Build an in-memory hash table (using h2)
• Scan the corresponding partition of R
• For each record, r, probe the hash table, if a duplicate record already exists discard it, otherwise add it to the table
• Write out the hash table and process the next partition
Set Difference
• Set Difference is implemented in a similar way to union
• In sorting, in the merge phase check that records of R are not in S before adding them to the result
• In hashing, probe the hash table (for S), if a record of R is not in the table then add it to the result
• If the query contains ALL a count of duplicates should be maintained
• Instead of storing duplicate records
Aggregations

SELECT AVG(age) FROM Patient

• The basic method for calculating an aggregation is to scan the entire table
• While keeping track of running information
• SUM – total of the retrieved values
• AVG – total, and count of the retrieved values
• COUNT – count of the retrieved values
• MIN – smallest of the retrieved values
• MAX – largest of the retrieved values
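A one-scan sketch of the running information (the function name is ours):

```python
def aggregate(values):
    # Single scan, keeping running information for every aggregate.
    total = count = 0
    lo = hi = None
    for v in values:
        total += v
        count += 1
        lo = v if lo is None or v < lo else lo
        hi = v if hi is None or v > hi else hi
    return {"SUM": total, "COUNT": count, "AVG": total / count,
            "MIN": lo, "MAX": hi}

print(aggregate([50, 62, 38]))
# {'SUM': 150, 'COUNT': 3, 'AVG': 50.0, 'MIN': 38, 'MAX': 62}
```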
Grouped Aggregations

SELECT AVG(income) FROM Doctor GROUP BY speciality

• An aggregation can include a GROUP BY clause
• In which case the aggregation has to be computed for each group
• Aggregations with groupings can be computed in a number of ways
• By sorting or hashing, or
• By using an index
Sorting and Groups
• In the sorting based approach the table is sorted on the grouping attribute
• The results of the sort are scanned and the result of the aggregate operation computed
• These two processes can be combined in a similar way to the sort based projection and join algorithms
• The cost is essentially the cost of the sorting operations
• Probably 3B(R)
• Although the final result will be considerably smaller than the sorted table
Hashing and Groups
• In the hash based approach an in-memory hash table is built on the grouping attribute
• Hash table entries consist of
• grouping-value, running-information
• The table is scanned, and for each record
• Probe the hash table to find the entry for the group that the record belongs to, and
• Update the running information for that group
• Once the table has been scanned the grouped results are computed using the hash table entries
• If the hash table fits in main memory the cost is B(R)
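A sketch of this approach for the AVG(income) ... GROUP BY speciality example above, assuming the hash table fits in main memory (names are ours):

```python
def grouped_avg(records, group_attr, agg_attr):
    # In-memory hash table: grouping-value -> (running total, count).
    groups = {}
    for rec in records:
        total, count = groups.get(rec[group_attr], (0, 0))
        groups[rec[group_attr]] = (total + rec[agg_attr], count + 1)
    # Compute the grouped results from the hash table entries.
    return {g: t / c for g, (t, c) in groups.items()}

doctors = [{"speciality": "gp", "income": 100},
           {"speciality": "gp", "income": 140},
           {"speciality": "surgeon", "income": 300}]
print(grouped_avg(doctors, "speciality", "income"))
# {'gp': 120.0, 'surgeon': 300.0}
```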
Aggregations and Indexes
• It may be possible to satisfy an aggregate query using just the data entries of an index
• The search key must include all of the attributes required for the query
• The data entries may be sorted or hashed, and
• No access to the records is required
• If the GROUP BY clause is a prefix of a tree index, the data entries can be retrieved in the grouping order
• The actual records may also be retrieved in this order
• This is an example of an index-only plan
Summary

(summary tables from the slides: memory requirements of one-pass and simple algorithms and of hash-based algorithms; * = duplicate removal, γ = grouping; assume B(S) ≤ B(R), and that B(S) ≤ M)