1 / 95

# CMPT 454 - PowerPoint PPT Presentation

Query Optimization. CMPT 454. Query Optimization. Introduction Query Execution Unary Operations External Sorting Binary Operations. Query Compilation Steps. Parsing Construct a parse tree for a query Generate equivalent logical query plans

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about 'CMPT 454' - alijah

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### CMPT 454

• Introduction

• Query Execution

• Unary Operations

• External Sorting

• Binary Operations

• Parsing

• Construct a parse tree for a query

• Generate equivalent logical query plans

• Convert the parse tree to a query plan in relational algebra

• Transform the plan into more efficient equivalents

• Generate a physical plan

• Select algorithms for each of the operators in the query

• Including details about how tables are to be accessed or sorted

• Generating equivalent logical plans and their physical plans is known as query optimization

• Selecting a good query plan entails deciding

• Which equivalent plan leads to the most efficient query

• Which algorithm should be used to implement an operation

• How data from one operation should be passed to the next

• These choices depend on database metadata

• Size of relations

• Number and frequency of attributes

• Indexing and data file organization

query

Query Compilation

generating, improving and estimating query plans

Query Execution

algorithms to implement physical plan operators

result

• A physical query plan is made up of operators

• Each implementing one step of the plan and

• Corresponding to a relational algebra operation

• Some relational algebra operations may require lower level physical operations

• Such as scanning a table

• It is sometimes necessary or useful to sort data as it is scanned

• To satisfy a query with an ORDER BY clause

• Or because an algorithm requires sorted input

• There are a number of ways in which a sort scan can be performed

• Main memory sorting

• B tree index

• Multi-way mergesort

• There are M main memory frames available for an operation

• M may or may not be the entire main memory

• Information relating to a relation (table) R

• B(R) or B – number of blocks of R

• T(R) or T – number of tuples (i.e. records) of R

• V(R, a) – number of distinct values for attribute a in R

• Assume that data is accessed one block at a time from a disk

• This assumption will be discussed later

• Cost metric

• The number of disk I/Os required for an operation

• Assume that

• Input to an operation is to be read from disk

• The result is left in main memory (does not include write cost)

• Note that the size of a result should not influence the choice of algorithm for a particular operation

• Although it may influence the order of operations

• The cost to scan R depends on its organization

• If R is organized into a single file the cost of a scan is B

• And if B < M the cost of a sort scan is B

• If R is distributed with records from other tables the cost of a scan may be higher

• The use of an index does not reduce the cost to scan R

• But may reduce the cost to retrieve a part of R

Unary Operations

• In this section the costs of individual relational algebra operations are considered

• There are often a number of algorithms that can be used to implement the same operation

• Each operation is considered in isolation

• How the individual operations interact is discussed in a later section

• Note that the cost of a later operation may be affected by the result of an earlier operation, in terms of

• The order of the records in the result

• The algorithm used to achieve that result

SELECT *

FROM Customer

WHERE age > 50

• Only records that match the selection are included in the result

• The size of the result depends on the number of records that match the selection

• Complex selections that contain more than a single term will be considered later

age > 50(Customer)

• If the file is not sorted on the selection attribute

• Perform a file scan, cost is B reads

• If the selection attribute is a superkey the scan can be terminated once a match as been found (cost is B/2)

• The file is sorted on the selection attribute

• Perform a binary search to find the first record that matches the selection criteria

• Then read all records that match the selection criteria

• log2(B) + the number of pages of matching records - 1

• Find the leaf page of the first index entry that points to a matching record

• The number of disk reads is equal to the height of the tree

• The cost to find a leaf page in a B+ tree index is usually in the order of 2 or 3 disk reads

• Then retrieve the matching records

• This cost is determined by the number of leaf pages containing matching entries, and

• The number of records to be retrieved, and

• Whether the index is primary or secondary

• In a primary index the file is sorted on the search key

• If the selection is small it may fit on one page

• If the selection is large, additional pages may have to be read

• The selection may cover a large range of records, or

• May be an equality selection on an attribute with few distinct values

• The exact number of pages depends on the number of records that match the selection

• In a secondary index the sort order of the file (if any) is different from the search key of the index

• The leaves of the tree form a dense index on the file

• Adjacent matching data entries are unlikely to point to records on the same page

• A disk read is required for each matching record

• For large selections, a secondary index may be less efficient than scanning the entire table!

• In practice it is important to recognize this, by estimating the number of records returned by a selection

SELECT *

FROM Patient

WHERE lName = 'Zoobat'

lName = "Zoobat"(Patient)

4 patients called Zoobat

file

• A hash index can be used to efficiently retrieve equality selections

• The cost to retrieve the bucket containing the data entries is typically 1 or 2 disk reads

• The cost to retrieve the corresponding records depends on whether the index is primary or secondary

• If the index is primary, matching records are likely to reside on the same, single, file page whereas

• If the index is secondary, matching records are likely to reside on different pages of the file

• Hash indexes cannot be used for range selections

• Selections may be made up of a number of terms connected by and () and or ()

• These terms may reference different attributes

• It is useful to know whether or not an index can be used to evaluate a selection

• To determine this the selection is first expressed in conjunctive normal form (CNF)

• A collection of conjuncts (terms joined by and)

• Each conjunct consists either of a single term, or multiple terms joined by or (i.e. disjunctions)

• e.g. (A  B)  C  D  (A  C  D)  (B  C  D)

• Hash indexes can be used if there is an equality condition for every attribute in the search key

• e.g. a single hash index on {city, street, number}

• city="London"street="Baker"number=221(Detective)matches

• city="Los Angeles"street="Cahuenga"(Detective)does not

• Tree indexes can be used if there is a selection on each of the first n attributes of the search key

• e.g. B+ index on {city, street, number}

• Bothcity="London"street="Baker"number=221(Detective) and

• city="Los Angeles"street="Cahuenga"(Detective) match

• If an index matches a subset of the conjuncts

• Use the index to return a result that contains some unwanted records

• Scan this result for matches to the other conjuncts

• city="London"street="Baker"number=221fName = "Sherlock" (Detective)

• If more than one index matches a conjunct

• Either use the most selective index, then scan the result, discarding records that fail to match to the other criteria

• Or use all indexes and retrieve therids

• Take the intersection of the rids and retrieve those records

Selectionswith no Disjunctions…

• Consider the relation and selection shown below

• Detective = {id, fName, lName, age, city, street, number, author}

• city="New York"author="Spillane"lName="Hammer"(Detective)

• With indexes

• Secondary hash index, {city, street, number}

• Secondary B+ tree index, {lName, fName}

• Secondary hash index, {author}

• There are two strategies:

• Use the most selective of the two matching indexes, and search the results for the remaining criteria

• Use both indexes, take the intersection of the rid

can’t be used

can be used

can be used

What if the B+ tree index is primary?

• If necessary, restate the selection in CNF

• If a conjunct includes a disjunction with no index on its attribute, no index can be used for that conjunct

• If all the conjuncts contain such a disjunction a file scan is necessary to satisfy the selection

• Otherwise, use an index that matches one of the conjuncts and retrieve those records

• The remaining selection criteria can then be applied to the results from using that index

• Consider a selection of this form

• (a  b  c)  (d  e  f)(R)

• Where each of a to f is an equality selection on an attribute

• If each of the terms in either of the conjuncts has a matching index

• Use the indexes to find therids

• Take the union of the rids and retrieve those records

• If there are indexes on the attributes of a, b, c, and e

• Use the a, b, and c indexes and take the union of the rids

• Retrieve the resulting records and apply the other criteria

• Consider the selections shown below

• (author="King"  age>35)(lName="Tam"  id=11)(Detective)

• (author="King")  (lName="Tam"  id=11)(Detective)

• Indexes on the relation

• Secondary B+ tree index, {lName, fName}

• Secondary hash index, {author}

• Compare the two selections

• In the first selection each conjunct contains a disjunction without an index (age, id) so a file scan is required

• In the second selection the index on author can be used, and records that don't meet the other criteria removed

SELECT fName, lName

FROM Customer

• Only columns in the column list are retained

• Reducing the size of the result relation

• There are two steps to implementing projections

• Remove the unwanted columns

• Eliminate any duplicate records, but only if DISTINCT has been specified

• Duplicates can be removed either by sorting or hashing

fName,lName(Customer)

A Digression

• Two-way external merge sort

• External merge sort

• Replacement sort

• External merge sort notes

• It is often necessary to sort data in a database

• To sort the results of a query

• To make some other operations more efficient

• Eliminating duplicate records

• Joining relations

• The focus on a DB sorting algorithm is to reduce the number of disk reads and writes

• Sorting a collection of records that fit within main memory can be performed efficiently

• There are a number of sorting algorithms that can be performed in n(log2n) time

• That is, with n(log2n) comparisons, e.g., Mergesort, Quicksort,

• Many DB tables are too large to fit into main memory at one time

• So cannot be read into main memory and sorted

• The focus in external sorting is to reduce the number of disk I/Os

• Consider the Mergesort algorithm

• Input sub-arrays are repeatedly halved

• Until they contain only one element

• Sub-arrays are then merged into sorted sub-arrays by repeated merge operations

• merging two sorted sub-arrays can be performed in O(n)

mergesort(arr, start, end)

if(start < end)

mid = start + end / 2

mergesort(arr, start, mid)

mergesort(arr, mid+1, end)

merge(arr, start, mid, mid+1, end)

• Read the first two pages (of B pages) of the file

• Sort their contents (independently of each other)

• Merge the two pages

• Write them out to a sorted run of size 2

• Repeat for the rest of the file

• Producing B/2 sorted runs of size 2

• Merge the first two sorted runs of size 2

• Read in the first page of the first two runs into input pages

• Merge to a single output page, and write it out when full

• When all records in an input page have been merged read in the second page of that run

• After the first merge pass the file consists of B/2 sorted runs each of two pages

• Read in the first page of each of the first two sorted runs

• Leaving a third page free as an output buffer

disk

pages contain three records

sorted runs of size 2

main memory

• Records from the input pages are merged into the output buffer

• Once the output buffer is full it's contents are written out to disk, to form the first page of the first sorted run of length 4

disk

main memory

• At this point all of the records from one of the input pages have been dealt with

• The next page of that sorted run is read into the input page

• And the process continues

disk

main memory

• Assume that B = 2k, after the first merge pass there are 2k-1 sorted runs, each two pages long

• After the second pass there are 2k-2 sorted runs, of length 4

• After the kth pass there is one sorted run of length B

• The number of passes is therefore log2B+1

• log2B is the number of merge passes required

• The +1 is for the initial pass to sort each page

• Each pass requires that each page be read and written back for a total cost of 2B(log2 B +1)

• Note that only 3 frames of main memory are used!

• In the first stage of the process each page is read into main memory, sorted and written out

• The pages are then ready to be merged

• Instead of sorting each page independently, read M pages into main memory, and sort them together

• M is the main memory frames available for the sort

• After the first pass there will be B/M sorted runs, each of length M

• This will reduce the number of subsequent merge passes

• In the merge passes perform an M-1 way merge rather than a 2 way merge, using

• M-1 input pages, one for each of M-1 sorted runs and

• 1 page for an output buffer

• The first item in each of the M-1 input partitions is compared to each other to determine the smallest

• Resulting in less merge passes, and less disk I/O

• Each merge pass merges M-1 runs

• After the first pass the runs are size (M-1)*M

• Since M is the size of the runs from the first stage

• The initial pass produces B/M sorted runs of size M

• Each merge pass reduces the number of runs by a factor of M-1

• The number of merge passes is logM-1B/M 

• Each pass requires that the entire file is read and written

• Total cost is therefore 2B (logM-1B/M + 1)

• As M is typically relatively large this reduction is considerable

• B = 1,000,000

Even a large file can usually be sorted in two passes (a cost of 4B I/Os to sort and write out)

• In the first stage B/M sorted runs of size M are produced

• The size of these preliminary runs determines how many merge passes are required

• The size of the initial runs can be increased by using replacement sort

• On average the size of the initial runs can be increased to 2*M

• Using replacement sort increases complexity

• M-2 pages are used to sort the file

• Called the current set

• One page is used for input, and

• One page is used for output

• The current set is filled

• … and sorted

disk

current set

input

output

main memory

• The next page of the file is read in to the input buffer

• The smallest record from the current set (and the input buffer) is placed in the output buffer

• The first location of the current set is now available for the first record from the input buffer

disk

current set

input

output

main memory

• The process continues until the output buffer is full and

• The contents of the input buffer are in the current set

• At this point another input page can be read into the input buffer

• And the output buffer can be written to disk as the first page of the first run

disk

current set

input

output

main memory

• The current set can be periodically re-sorted

• At some point a page is read that contains records with values less than the values in records in the output buffer

• The rest of the current set must now be written out to complete the first run

• The process then begins again

disk

current set

input

output

main memory

• The cost metric used is simplistic

• In practice it my be advantageous to make the input and output buffers larger than one page

• This reduces the number of runs that can be merged at one time, so may increase the number of passes required

• But, it allows a sequence of pages to be read or written to the buffers, decreasing the actual access time per page

• We have also ignored CPU costs

• If double buffering is used, the CPU can process one part of a run while the next is being loaded into main memory

• Primary B+ tree index

• The index can be used to find the first page, but

• Note that the file is already sorted!

• Secondary B+ tree index

• Leaves point to data records that are not in sort order

• In the worst case, each data entry could point to a different page from its adjacent entries

• Retrieving the records in order requires reading all of the index leaf pages, plus one disk read for each record!

• In practice external sort is likely to be much more efficient than using a secondary index

• Projection with duplicate removal can be performed in a number of steps

• Scan the table, remove unwanted attributes, and write it back

• cost = 2B disk I/Os

• Sort the result, using all of its attributes as a compound sort key

• cost = 4B, possibly more if the file is very large

• Scan the result, removing the adjacent duplicates as they are encountered; cost = 2B

• The result of this last stage may not have to be written out; it may be the last operation or may be pipelined into another operations

• The total cost is 8B disk I/Os, but this basic process can be improved by combining these steps

• The initial scan is performed as follows

• Read M pages and remove unwanted attributes

• Sort the records, and remove any duplicates

• Write the sorted run to disk

• Repeat for the rest of the file, for a total cost of 2B

• Perform merge passes as required on the output from the first stage

• Remove any duplicates as they are encountered

• If only one merge pass is required the cost is ≈ 1B

• For a total cost of 3B

• Duplicates can also be identified by using hashing

• Duplicate removal by hashing has two stages

• Partitioning and probing

• In the partitioning stage

• Partition into M-1 partitions using a hash function, h

• There is an output buffer for each partition, and

• One input buffer

• The file is read into main memory one page at a time, with each record being hashed to the appropriate buffer

• Duplicates are eliminated in the next stage

• The duplicate elimination stage uses a second hash function h2 (h2  h) to reduce main memory costs

• An in-memory hash table is built using h2

• If two records hash to the same location they are checked to see if they are duplicates

• Assuming that each partition produced in the partitioning stage can fit in main memory the cost is

• Partitioning stage: 2B

• Duplicate elimination stage: B, for a total cost of 3B

• This is the same cost as projection using sorting

• Sort and hash projection have the same cost (3B)

• If M > (B) sorting and sort projection can be performed in two passes

• The first pass produces B/M sorted runs

• If there are less than M-1 of them only one merge pass is required

• With hash projection, if just one partition is greater than M-1, further partitioning is required

• Regardless of the overall size of the file

Joins

SELECT *

FROM Customer NATURAL INNER JOIN Account

• A join is defined as a Cartesian product followed by a selection

• Joins are used more often than Cartesian products

• Cartesian products typically result in much larger tables than joins

• It is important to be able to efficiently implement joins

C.sin = A.sin(Customer  Account)

• The naïve method of producing a join is to perform a Cartesian product then a selection

• This is very inefficient if the intermediate relation is large

• For example, consider two relations R and S

• R has 10,000 records, and S has 2,000

• S has a foreign key in R

• Implying that a record in S relates to at most one record in R

• The sizes of the join and the Cartesian product are

• Natural join – 2,000 (if each s in S relates to an r in R)

• Cartesian product – 20,000,000 records

• Nested loop joins

• These algorithms basically compute the Cartesian product and discard records that do not meet the join condition

• Index nested loop join

• Uses an index on one of the tables

• Sort-merge join

• Sorts both tables on the join attribute and merges results

• Hash join

• Similar to the sort-merge join but uses hashing

• We will also consider the hybrid hash join algorithm

Tuple Nested Loop Join

• This simple algorithm compares each record in the outer table with each record in the inner relation

• Compare r to all the records, s, in relation S and see if the join condition matches

• If so concatenate the records and add to the result

• Repeat with all r in R

• for each record r  R

• for each record s  S

• if ri = sj then

R R.i=S.jS

output buffer

records

main memory frames

scan all of S for each record in R

S (disk)

R (disk)

• The inner relation (S) has to be scanned once for each record in the outer relation (R)

• For a cost of B(R) + (T(R) * B(S) )

• The results are in the same order as R

• A simple improvement is to process one blockof R at a time, instead of one record at a time

• This reduces the cost to B(R) + (B(R) * B(S) )

• If B(R) and B(S) are roughly the same size this is O(n2)

• Where n = B(R)

• Note that the result is no longer strictly ordered on R

• The smaller relation should be the outer relation

output buffer

records

main memory frames

scan all of S for each page in R

S (disk)

R (disk)

• The nested loop join algorithm does not make effective use of main memory

• Only one page of the outer relation is processed at a time

• The algorithm can be significantly improved by reading as much of the outer relation as possible

• Use M – 2 pages as an input buffer for the outer relation,

• 1 page as an input buffer for the inner relation, and

• 1 page as an output buffer

• If the smaller relation fits in M – 2 pages the cost is B(R) + B(S)!

• CPU costs can be reduced by building an in-memory hash table on the inner relation

• What if the smaller relation is larger than M-2?

• Break R, the outer relation, into blocks of M – 2 pages

• Refer (somewhat flippantly) to these blocks as clumps

• Scan S once for each clump of R

• Insert concatenated records r, s that match the join condition into the output buffer

• S is read B(R)/(M-2) times

• The total cost is B(R) + (B(R)/(M-2) * B(S))

• Disk seek time can be reduced by increasing the size of the input buffer for S

• Which may increase the number of times that S is scanned

output buffer

records

main memory frames

scan all of S for each M-2 pages in R

M-2 pages

S (disk)

R (disk)

• If there is an index on the join attribute

• Make the indexed relation the inner relation (S)

• Read the outer relation, retrieving matching records of S using the index

• Does not first calculate the Cartesian product as it only retrieves records that satisfy the join condition

• The cost is dependent on the kind of index and the number of matching records

• An index lookup is required for each record in R

• B+ tree index: 2 – 4, hash index: 1 – 2

• If multiple S records match to a single R record the total retrieval cost depends on whether the index is primary

output buffer

records

main memory frames

look up index using join attribute of r

S (disk)

R (disk)

• Assume that both tables to be joined are sorted on the join attribute

• The tables may be joined with one pass, in a similar way to merging two sorted runs

• At a cost of B(R) + B(S)

• If the tables are not sorted on the join attribute they may be sorted using multi-way Mergesort

• The sort can typically be performed in two passes

• At a cost of 4 B(R) + 4 B(S), including writing out the results

• Joining the tables has an additional cost ofB(R) + B(S)

• Read in pages of R and S

• Assume that the join attribute is x

• While xris different from xs

• If xr is less than xs move to the next record from R, else

• Move to the next record from S

• If xrequals xs

• Create the joined record by concatenating r and s, and

• Add the resulting record to the output buffer

• Repeat the process until all records have been read

• The merge phase of the external sort algorithm can be combined with the join

• Produce sorted runs (of size M) of R and S

• Merge R, and S, then compare the results, discarding records that do not meet the join condition requires

• A buffer for each sorted run of R

• A buffer for each sorted run of S,

• One buffer for merged runs of R,

• One buffer for the merged runs of S, and

• One buffer for the concatenated results

• The improvement requires more main memory as there must be buffers for each run of both R and S

Read in sorted runs of R andS

Merge R

and merge S

Merge and concatenate records

• Sort-merge join can be performed in two passes

• For a cost of 3(B(R) + B(S))

• This assumes that main memory is large enough to allow an input buffer for each run of bothR and S

• The initial pass produces sorted runs of size M

• Main memory must be greater than (B(R) + B(S)) to perform the join in two passes

• The first pass produces B(R) /M runs of R and B(S) /M runs of S

• If M is not greater than (B(R) + B(S) ) then (B(R) /M + B(S) /M ) must be greater than M

• The sort-merge join can take advantage of a primary index

• The relation with the index is already sorted so no initial sort pass is required

• If both relations have a primary index on the join attribute a zig-zag join can be performed

• Scan the leaves of the two B+ trees in order from the left

• When the search key value of one index is higher, scan the other index

• Only when both indexes contain the same search key values are the records retrieved and concatenated

• The hash join algorithm has two phases

• Partitioning, and

• Probing

• Partitioning

• Both relations are partitioned using the same hash function, h, on the join attribute

• Records in one partition of R can only match to records in the same partition of S

• One input buffer and M- 1 output buffer pages are used to make M - 1 partitions for each relation

• If both of the largest partitions of both relations do not fit in main memory, the relations must be further partitioned

• Probing

• Read in one partition of R

• To reduce CPU costs, build an in memory hash table using hash function h2 (h2  h)

• Read the corresponding partition of S into an input buffer and join matching records using the hash table

• Repeat for each partition of R

• Cost

• If each partition of R fits in main memory the overall cost is 3(B(R) + B(S))

• Hash join is performed in two phases

• At a cost of 3(B(R) + B(S))

• Main memory must be big enough to hold the largest partition of the smaller relation, R

• Which requires that M - 2 is  (B(R))

• A buffer for S and for output are needed

• Partitioning produces B(R) - 1 partitions, of average size M /(B(R) - 1)

• If M < (B(R)) then B(R)/ M must be larger than M, and the partitions are larger than main memory

• If main memory is large enough, retain an entire partition of R during the partitioning phase

• This eliminates the need to write out the partition, and read it back in during the probing phase

• In addition, the matching S records can be joined (and written out to the result) when S is partitioned

• Hence the records (of both R and S) belonging to that partition are only read once

• This approach can be generalized to retain more than one partition where possible

Hybrid Hash Join and Block Nested Loop Join

• If the smaller relation fits in main memory the costs are identical

• The smaller relation is read once

• The larger relation is scanned once to join the records

• If the relations are large hash join is more efficient

• Block nested loop join must read S once for each clump of R

• Hash join matches partitions of R to partitions of S during the probing phase

• The records of both R and Sbelonging to a particular partition are only read once, after the partitioning phase

• A skewed distribution of data may result in the relations being partitioned more than once

• Because the partitions may vary considerably in size

• Sort-merge join is not sensitive to data skew

• If M falls between (B(R))and (B(S)) hash join requires less passes

• Hash join only requires enough memory to hold partitions of one relation

• The memory requirements of sort-merge join depend on the size of both relations

• Simple nested loop join (read S for each record)

• Retains the original order of R

• Index nested loop join

• Retains the original order of R

• Sort-Merge join

• Ordered by the join attribute

• All other join methods

• No order

• The join process is more complex if the join condition is not simple equality on one attribute

• For equalities over several attributes

• Sort-merge and hash join must sort (or hash) over all of the attributes in the selection

• An index that matches one of the equalities may be used for the index nested loop join

• For inequalities (, , etc.)

• Hash indexes cannot be used for index nested loop joins

• Sort-merge and hash joins are not possible

• Other join algorithms are unaffected

SELECT fName, lName

FROM Patient

INTERSECT

SELECT fName, lName

FROM Doctor

• For set operations, unlike other SQL operations, duplicates are removed by default

• Unless ALL is specified

• Therefore most set operations entail sorting the relations

fName,lName(Patient)  fName,lName(Doctor)

• Intersection R S

• A join where the condition is equality on all attributes

• An in-memory search structure should be used

• Cartesian product R S

• A special case of join where there is no join condition

• All records are joined to each other

fName,lName(Patient)  fName,lName(Doctor)

is equivalent to

fName,lName(Patient) fName,lName(Doctor)

• Union using sorting

• Sort R and S using all fields

• Scan and merge the results while removing duplicates

• Union using hashing

• Partition R and S using a hash function h

• For each partition of S (if S is the smaller relation)

• Build an in-memory hash table (using h2)

• Scan the corresponding partition of R

• For each record, r, probe the hash table, if a duplicate record already exists discard it, otherwise add it to the table

• Write out the hash table and process the next partition

• Set Difference is implemented in a similar way to union

• In sorting, in the merge phase check that records of R are not in S before adding them to the result

• In hashing, probe the hash table (for S), if a record of R is not in the table then add it to the result

• If the query contains ALL a count of duplicates should be maintained

• Instead of storing duplicate records

SELECT AVG(age)

FROM Patient

• The basic method for calculating an aggregation is to scan the entire table

• While keeping track of running information

• SUM – totalof the retrieved values

• AVG – total, and count of the retrieved values

• COUNT – count of the retrieved values

• MIN – smallest of the retrieved values

• MAX – largest of the retrieved values

SELECT AVG(income)

FROM Doctor

GROUP BY speciality

• An aggregation can include a GROUP BY clause

• In which case the aggregation has to be computed for each group

• Aggregations with groupings can be computed in a number of ways

• By sorting or hashing, or

• By using an index

• In the sorting based approach to the table is sorted on the grouping attribute

• The results of the sort are scanned and the result of the aggregate operation computed

• These two processes can be combined in a similar way to the sort based projection and join algorithms

• The cost is essentially the cost of the sorting operations

• Probably 3B(R)

• Although the final result will be considerably smaller than the sorted table

• In the hash based approach an in-memory hash table is build on the grouping attribute

• Hash table entries consist of

• grouping-value, running-information

• The table is scanned, and for each record

• Probe the hash table to find the entry for the group that the record belongs to, and

• Update the running information for that group

• Once the table has been scanned the grouped results are computed using the hash table entries

• If the hash table fits in main memory the cost is B(R)

• It may be possible to satisfy an aggregate query using just the data entries of an index

• The search key must include all of the attributes required for the query

• The data entries may be sorted or hashed, and

• If the GROUP BY clause is a prefix of a tree index, the data entries can be retrieved in the grouping order

• The actual records may also be retrieved in this order

• This is an example of an index-only plan

Memory Requirements

* = duplicate removal,  = grouping

Assume B(S)  B(R), and that B(S)  M