Download Presentation
ECE 569 Database System Engineering Fall 2004 Topic VIII: Query Execution and optimization

Loading in 2 Seconds...

1 / 29

# ECE 569 Database System Engineering Fall 2004 Topic VIII: Query Execution and optimization - PowerPoint PPT Presentation

ECE 569 Database System Engineering Fall 2004 Topic VIII: Query Execution and optimization. Yanyong Zhang www.ece.rutgers.edu/~yyzhang Course URL www.ece.rutgers.edu/~yyzhang/fall04. Select Operation. simple condition (C=A  V)

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

## PowerPoint Slideshow about 'ECE 569 Database System Engineering Fall 2004 Topic VIII: Query Execution and optimization' - lucas

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### ECE 569 Database System EngineeringFall 2004Topic VIII: Query Execution and optimization

Yanyong Zhang www.ece.rutgers.edu/~yyzhang

Course URL www.ece.rutgers.edu/~yyzhang/fall04

Select Operation
• simple condition (C=A  V)
• Selectivity of condition C for relation R = |{t  R | C(t)}| / |R|
• The number of records satisfying a condition divided by number of tuples
• If there are i distinct values of V with uniform distribution, average selectivity is 1/i.
• Linear Search
• Retrieve every tuple in relation and test predicate
• Cost = N_Pages
• Equality predicate with index on A
• Locate all the tuples with search key V using the index
• Cost = ??
• Inequality with B+-tree on A
• Locate first tuple t with search key V using index
• Retrieve tuples in ascending order starting with t if  is . Otherwise retrieve tuples in descending order
• Cost = ?? (selectivity)
Select operation (cont’d)
• Conjunctive conditions (C = C1 C2 ...  Ck )
• Use one of the access methods above to retrieve tuples satisfying Ci. For each tuple, test remaining conditions.
• Choose Ci with lowest selectivity to reduce cost
• If secondary indices containing tuple pointers exist for all or some of the attributes, retrieve all pointers that satisfy individual conditions. Intersect pointers and retrieve tuples.
• Disjunctive Conditions (C1 C2 ...  Ck)
• If there is an access path for every condition, then select records and perform union to eliminate duplicates.
Join Operation
• T = R >< S
• Nested loop
• Algorithm

while records remain in R do

fetch next record r from R

while records remain in S do

fetch next record s from S

if (r[A] == s[B]) then

insert t into T where t[R] = r and t[S] =s

end

end

• Analysis
• rR # of records in R
• bR # of blocks in R
• Cost = rR(bs+1)

A = B

Join Operation (cont’d)
• T = R >< S
• Nested loop with multiple buffers
• Use one buffer to sequence through blocks of S
• Use nb-2 buffers for R

while records remain in R do

read in nb-2 buffers of tuples from R

while records remain in S do

fetch next record s from S

for every record r of R in a buffer do

if (r[A] == s[B]) then insert t into T

end for

end while

end while

• Every block of R is read only once
• Every block in S is read bR/(nb-2)
• Cost = bR +bRbS/(nb-2)
• Outer loop should scan smaller relation

A = B

Join operation (cont’d)
• Index method
• Requires an index (or hash key) for one of the join attributes. (Assume there is an index on B of S)
• Algorithm

while records remain in R do

fetch next tuple from R

use index to retrieve all records with key r(B) in S

for each record s retrieved do

insert t into T

end for

end while

• Analysis
• xB average # of block accesses to retrieve a record using access path for attribute B
• Cost = bR + rR xB + bT
• These disk accesses may be slower than those from nested join.
Join operation (cont’d)
• Sort-merge join
• Requires that records in R be ordered by their values in A and that S be ordered according to B.
• Algorithm below assumes A and B are candidate keys.

fetch next record r from R

fetch next record s from S

while (r  NULL and s  NULL) do

if(r(A) > s(B)) then fetch next record s from S

else if (r(A) < s(B)) then fetch next record r from R

else /* r(A) == s(B) */

insert t into T

fetch next record r from R

fetch next record s from S

end while

Analysis

• Records of each file are scanned only once
• Cost = bR + bs + bT
Projection operations
• Projection - p(R)
• P includes a candidate key for R
• No need to check for duplicates
• Otherwise, one of following can be used to eliminate duplicates
• If result is hashed, check for duplicates as tuples are inserted
• Sort resulting relation and eliminate duplicates that are now adjacent.
Set operations
• R  S
• Hash records of R and S to same file. Do not insert duplicates
• Concatenate files, sort, and remove adjacent duplicates
• R  S
• Scan smaller file, attempt to locate each record in larger file. (If found, add tuple to result)
• R – S
• Copy records from R to result
• Hash records of S to result. If tuple is found, delete it
Query Optimization rule of thumb
• R1: selections and projections are processed on the fly and almost never generate intermediate relations. Selections are processed as relations are accessed for the first time. Projections are processed as the results of other operators are generated.
• R2: Cross products are never formed, unless the query itself asks for them. Relations are always combined through joins in the query.
• R3: The inner operand of each join is a database relation, never an intermediate result.
Heuristic Optimization of Query Trees
• Consider following schema

customers (cid, cname, ccity, discnt)

products (pid, pname, pcity, pquantity, price)

agents (aid, aname, acity, percent)

orders (ordno, month, ocid, oaid, opid, quantity, oprice)

• Query

R: (select pid from products)

except

(select opid

from customers, orders, agents

where ccity = “Duluth” and

acity = “New York” and

cid = ocid and

aid = oaid)

Heuristic (cont’d)
• Translate the query into algebra

R: pid (products)-

opid (city=“Duluth”  acity=“New York”  cid = ocid  aid = oaid ((customers X orders) X agents)

• Query Tree

-

pid

opid

city=“Duluth”  acity=“New York” 

cid = ocid  aid = oaid

products

X

X

customers

orders

agents

Heuristic (cont’d)

1. Replace F  F’(R)withF(F’(R))wherever possible (allow flexibility in scheduling selects)

-

pid

opid

city=“Duluth”

products

acity=“New York”

cid = ocid

aid = oaid

X

X

customers

orders

agents

Heuristic (cont’d)

2. Move select operations as close to leaves as possible

-

pid

opid

aid = oaid

products

X

cid = ocid

acity=“New York”

X

city=“Duluth”

orders

agents

customers

Heuristic (cont’d)

3. Rearrange tree so that most restrictive select executes first. Most restrictive select produces smallest result, or is one with smallest selectivity.

• Assume most restrictive select is probably acity=“New York

-

pid

opid

cid = ocid

products

X

aid = oaid

city=“Duluth”

X

acity=“New York”

orders

customers

agents

Heuristic (cont’d)

4. Replace cartesian product and adjacent select with join

-

pid

opid

><

cid = ocid

products

><

aid = oaid

city=“Duluth”

acity=“New York”

orders

customers

agents

Heuristic (cont’d)

5. Project out unnecessary attributes as soon as possible.

-

pid

opid

><

cid = ocid

products

cid

><

aid = oaid

city=“Duluth”

aid

oaid,ocid,opid

acity=“New York”

orders

customers

agents

Heuristic (cont’d)

6. Map subtrees to execution methods such as:

• A single selection or projection
• A selection followed by a projection
• A join, union, or set difference with two operands. Each input can be preceded by selections and/or projections. The output can also be followed by a selection and/or projection.
Heuristic (cont’d)

-

pid

opid

><

cid = ocid

products

cid

><

aid = oaid

city=“Duluth”

aid

oaid,ocid,opid

acity=“New York”

orders

customers

agents

Example
• Characteristics of DBMS
• Available join methods – (1) nested loop; (2) sort-merge join
• Query

SELECT emp.Name, dept.name, acnt.type

FROM emp, dept, acnt

WHERE emp.dno = dept.dno AND

dept.ano = acnt.ano AND

emp.age  50 AND

acnt.balance  10000

Example (cont’d)
• name,dname,type (emp.dno=dept.dno  dept.ano=acnt.ano  emp.age50  acnt.balance10000((emp x dept) x acnt))
• Now decide the order of join
• acnt is the third relation
• emp >< dept or dept >< emp
• emp is the third relation
• dept >< acnt or acnt >< dept

><

emp.age >= 50

acnt.balance >= 10000

dept

emp

acnt

Relations
• emp(name, age, sal, dno)
• Pages – 20, 000
• Number of tuples – 100,000
• Indexes – dense clustered B+-tree on sal (3-level deep)
• dept(dno, dname, floor, budget, mgr, ano)
• Pages – 10
• Number of tuples – 100
• Indexes – dense clustered hash table on dno (avg bucket length = 1.2 pages)
• acnt (ano, type, balance, bno)
• Pages – 100
• Number of tuples – 1000
• Indexes – dense clustered B+-tree on balance (3-level deep)
• bank (bno, bname, address)
• Pages – 20
• Number of tuples – 200
• Indexes – none
Histograms
• emp.age (assume uniform distribution)

range frequency

0< x  10 0

10< x  20 500

20< x  30 2500

30< x  40 4000

40< x  50 2000

50< x  60 800

60< x  70 200

• Acnt.balance

range frequency

0< x  100 200 / 100 = 2

100< x  500 200 / 400 = 0.5

500< x  5000 200 / 4500 = 0.044

5000< x  10000 200 / 5000 = 0.04

10000< x  50000 200 / 40000 = 0.005

50000< x <  0

Method

Method

Cost

Cost

Order

Order

Result Size

Result Size

Comment

Comment

SCAN

SCAN

20,000

10

none

none

12,000 tuples

2,400 pages

100 tuples

10 pages

Size reduced by selectivity of age <= 50

Plans for accessing relation
• Plans for retrieving tuples from emp, dept and acnt.

EMP

DEPT

Method

Cost

Order

Result Size

Comment

SCAN

100

none

200 tuples

20 pages

Size corrected for selectivity of balance >= 10000

B+-tree on balance

3+20 =23

none

200 tuples

20 pages

Plans for accessing relation (cont’d)

ACNT

Method

Cost

Order

Result Size

Comment

Nested Loop (page oriented)

20000 + 2400*10 = 44000

none

12000 tuples

2400 pages

Assume that tuple size is same as EMP tuples.

Nested Loop using hash table on dno

20000 + 12000*(1 + 1.2 + 1) = 58400

none

12000 tuples

2400 pages

Plans for joining two relations

EMP >< DEPT

Method

Cost

Order

Result Size

Comment

Nested Loop (page oriented)

10 + 10 * 20000 = 200010

none

12000 tuples

2400 pages

Plans for joining two relations (cont’d)

DEPT >< EMP

Method

Method

Cost

Cost

Order

Order

Result Size

Result Size

Comment

Comment

Nested Loop (page oriented)

Nested Loop (page oriented)

10 + 10 * 100 = 10010

23 + 20 * 10 = 223

none

none

200 tuples

20 pages

200 tuples

20 pages

Plans for joining two relations (cont’d)

DEPT >< ACNO

ACNO >< DEPT