
16.4 Estimating the Cost of Operations - PowerPoint PPT Presentation



16.4 Estimating the Cost of Operations. Dongyi Jia CS257 ID:116 Spring 2008. Agenda. Possible Physical Plan Estimating Sizes of Intermediate Relations Estimating the Size of a Project Estimating the Size of a Selection Estimating the Size of a Join


16.4 Estimating the Cost of Operations

Dongyi Jia

CS257 ID:116

Spring 2008


Agenda

  • Possible Physical Plan

  • Estimating Sizes of Intermediate Relations

  • Estimating the Size of a Project

  • Estimating the Size of a Selection

  • Estimating the Size of a Join

  • Natural Joins With Multiple Join Attributes

  • Joins of Many Relations

  • Estimating Sizes of Other Operations


Select Physical Plan

Turning a logical query plan into a physical plan requires selecting:

  • An order and grouping for associative-and-commutative operations.

  • An algorithm for each operator in the logical plan.

  • Additional operators that are needed for the physical plan but that were not present explicitly in the logical plan.

  • The way in which arguments are passed from one operator to the next.


Estimating Sizes of Intermediate Relations

Ideally, the rules for estimating the number of tuples in an intermediate relation should:

  • Give accurate estimates

  • Are easy to compute

  • Are logically consistent


Estimating the Size of a Projection

Projection differs from the other operators in that the size of its result can be computed exactly. Since a projection produces a result tuple for every argument tuple, the number of tuples is unchanged; the only change in the output size comes from the change in the lengths of the tuples.
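
As a minimal sketch of the point above, the following computes the size of a projection result. The function name and the numbers (10,000 tuples, 12-byte output tuples) are hypothetical and chosen only for illustration.

```python
def projection_size(num_tuples, out_tuple_bytes):
    """Return (tuple count, total bytes) for a projection result.

    The tuple count is unchanged by the projection; only the
    per-tuple length (an assumed input here) differs.
    """
    return num_tuples, num_tuples * out_tuple_bytes


# Hypothetical relation: 10,000 tuples, projected down to 12 bytes each.
count, total_bytes = projection_size(10_000, 12)
print(count, total_bytes)
```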


Estimating the Size of a Selection(1)

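  • Let S = σ_{A=c}(R), where A is an attribute of R and c is a constant. Then we recommend as an estimate:

    T(S) = T(R)/V(R,A)

    Here T(R) is the number of tuples in R and V(R,A) is the number of distinct values of attribute A in R. The rule above surely holds if all values of attribute A occur equally often in the database.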


Estimating the Size of a Selection(2)

  • If S = σ_{a<c}(R), an inequality comparison, then our estimate for T(S) is:

    T(S) = T(R)/3

  • If S = σ_{a≠c}(R), a "not equals" comparison, we may use T(S) = T(R)(V(R,a) - 1)/V(R,a) as an estimate.

  • When the selection condition C is the AND of several equalities and inequalities, we can treat the selection as a cascade of simple selections, each of which checks for one of the conditions.
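
The selection rules above can be sketched as small helper functions. This is an illustrative sketch, not a definitive implementation: the function names are hypothetical, and the statistics (T(R) = 10,000, V(R,a) = 50) are assumed values. The not-equals helper assumes the formula T(R)(V(R,a) - 1)/V(R,a) applies to a "not equals" condition.

```python
def est_equality(t_r, v_ra):
    """Estimate T(S) for an equality selection: T(R)/V(R,a)."""
    return t_r / v_ra


def est_inequality(t_r):
    """Estimate T(S) for an inequality selection such as a < c: T(R)/3."""
    return t_r / 3


def est_not_equal(t_r, v_ra):
    """Estimate T(S) for a not-equals selection: T(R)(V(R,a) - 1)/V(R,a)."""
    return t_r * (v_ra - 1) / v_ra


# Assumed statistics for a hypothetical relation R(a, b).
t_r, v_a = 10_000, 50

# AND of "a = 10" and "b < 20", treated as a cascade of two selections:
after_eq = est_equality(t_r, v_a)        # tuples surviving a = 10
after_and = est_inequality(after_eq)     # of those, tuples surviving b < 20
print(after_eq, after_and)
```

The cascade simply feeds the estimate from one simple selection into the next, so the order of the two conditions does not change the final estimate.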


The Zipfian Distribution

  • Zipfian distribution: the frequency of the ith most common value is in proportion to 1/√i.

  • As long as the constant in the selection condition is chosen randomly, the average size of the matching set will still be T(R)/V(R,a).
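
A quick numerical check of the averaging claim, as a sketch only. It assumes frequencies proportional to 1/sqrt(i) and that the constant is picked uniformly from the values present; the function name and the numbers (10,000 tuples, 50 distinct values) are hypothetical.

```python
import math


def zipfian_counts(total_tuples, num_values):
    """Expected tuple count per value, with frequency of the i-th most
    common value proportional to 1/sqrt(i) (assumed distribution)."""
    weights = [1 / math.sqrt(i) for i in range(1, num_values + 1)]
    scale = total_tuples / sum(weights)
    return [w * scale for w in weights]


counts = zipfian_counts(10_000, 50)

# Picking the constant uniformly among the 50 values, the expected number
# of matching tuples is the plain average of the per-value counts.
average_match = sum(counts) / len(counts)
print(counts[0], counts[-1], average_match)
```

The individual counts are far from equal, yet their average is T(R)/V(R,a), because the counts always sum to T(R) regardless of how skewed they are.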


Estimating the Size of a Selection(3)

  • A less simple, but possibly more accurate, estimate of the size of S = σ_{C1 OR C2}(R) is to assume that C1 and C2 are independent. Then, if R has n tuples, m1 of which satisfy C1 and m2 of which satisfy C2, we would estimate the number of tuples in S as

    n(1 - (1 - m1/n)(1 - m2/n))

    In explanation, 1 - m1/n is the fraction of tuples that do not satisfy C1, and 1 - m2/n is the fraction that do not satisfy C2. The product of these numbers is the fraction of R’s tuples that are not in S, and 1 minus this product is the fraction that are in S.
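
The explanation above translates directly into code. In this sketch, n is the number of tuples in R, and m1 and m2 are the numbers of tuples satisfying C1 and C2 respectively; the two conditions are assumed to be independent. The function name and the sample numbers are hypothetical.

```python
def est_or(n, m1, m2):
    """Estimate the size of a selection on "C1 OR C2".

    n  : number of tuples in R
    m1 : tuples satisfying C1
    m2 : tuples satisfying C2
    Assumes C1 and C2 are independent.
    """
    if n == 0:
        return 0.0
    frac_not_c1 = 1 - m1 / n          # fraction failing C1
    frac_not_c2 = 1 - m2 / n          # fraction failing C2
    return n * (1 - frac_not_c1 * frac_not_c2)


# Hypothetical numbers: 10,000 tuples, 200 satisfy C1, 1,000 satisfy C2.
estimate = est_or(10_000, 200, 1_000)
print(estimate)
```

The estimate is never larger than m1 + m2, since tuples satisfying both conditions are counted only once.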


Estimating the Size of a Join

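  • Two simplifying assumptions for the natural join of R(X,Y) and S(Y,Z):

    1. Containment of Value Sets: if V(R,Y) ≤ V(S,Y), then every Y-value of R is also a Y-value of S.

    2. Preservation of Value Sets: if A is an attribute of R but not of S, then V(R ⋈ S, A) = V(R,A).

    Under these assumptions, we estimate

    T(R ⋈ S) = T(R)T(S)/max(V(R,Y), V(S,Y))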
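
A minimal sketch of the join estimate for two relations sharing a single attribute Y. The function name and the statistics below are assumptions made for illustration.

```python
def est_join(t_r, t_s, v_ry, v_sy):
    """Estimate T(R join S) = T(R)T(S) / max(V(R,Y), V(S,Y))."""
    return t_r * t_s / max(v_ry, v_sy)


# Hypothetical statistics:
#   R(a, b): T(R) = 1000, V(R,b) = 20
#   S(b, c): T(S) = 2000, V(S,b) = 50
estimate = est_join(1000, 2000, 20, 50)
print(estimate)
```

Intuitively, each tuple of the relation with fewer Y-values finds its Y-value in the other relation, where it matches on average T/V tuples.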


Natural Joins With Multiple Join Attributes

Of the T(R)T(S) pairs of tuples from R and S, the expected number of pairs that match in both y1 and y2 is:

T(R)T(S) / [max(V(R,y1), V(S,y1)) · max(V(R,y2), V(S,y2))]

In general, the following rule can be used to estimate the size of a natural join when there are any number of attributes shared between the two relations.

● The estimate of the size of R ⋈ S is computed by multiplying T(R) by T(S) and dividing by the larger of V(R,y) and V(S,y) for each attribute y that is common to R and S.
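
The general rule can be sketched as a loop over the shared attributes. The function name, the input layout (one pair of value counts per shared attribute), and the numbers are hypothetical.

```python
def est_natural_join(t_r, t_s, shared):
    """Estimate the size of a natural join of R and S.

    shared: list of (V(R,y), V(S,y)) pairs, one per attribute y
            common to R and S.
    """
    estimate = t_r * t_s
    for v_r, v_s in shared:
        estimate /= max(v_r, v_s)   # divide by the larger value count
    return estimate


# Hypothetical statistics: T(R) = 1000, T(S) = 2000,
# y1 with V(R,y1) = 20, V(S,y1) = 50; y2 with V(R,y2) = 100, V(S,y2) = 50.
estimate = est_natural_join(1000, 2000, [(20, 50), (100, 50)])
print(estimate)
```

With an empty list of shared attributes the function returns T(R)T(S), the size of the Cartesian product, which is what a natural join degenerates to in that case.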


Joins of Many Relations(1)

  • Rule for estimating the size of any join:

    Start with the product of the number of tuples in each relation. Then, for each attribute A appearing at least twice, divide by all but the least of the V(R,A)’s.

    We can estimate the number of values that will remain for attribute A after the join. By the preservation-of-value-sets assumption, it is the least of these V(R,A)’s.
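
The rule can be sketched as follows. Each relation is represented as a pair (T, {attribute: V}); this representation, the function name, and the statistics for the three relations are assumptions made for illustration.

```python
def est_multiway_join(relations):
    """Estimate the size of a natural join of several relations.

    relations: list of (T, {attr: V}) pairs, where T is the tuple count
               and V the number of distinct values of each attribute.
    """
    estimate = 1.0
    counts_by_attr = {}
    for t, value_counts in relations:
        estimate *= t                      # product of all tuple counts
        for attr, v in value_counts.items():
            counts_by_attr.setdefault(attr, []).append(v)

    for counts in counts_by_attr.values():
        if len(counts) >= 2:               # attribute appears at least twice
            for v in sorted(counts)[1:]:   # all but the least V(R,A)
                estimate /= v
    return estimate


# Hypothetical statistics for R(a,b,c), S(b,c,d), U(b,e).
R = (1000, {"a": 100, "b": 20, "c": 200})
S = (2000, {"b": 50, "c": 100, "d": 400})
U = (5000, {"b": 200, "e": 500})

estimate = est_multiway_join([R, S, U])
print(estimate)
```

Here the product of the tuple counts is divided by 50 and 200 for attribute b (keeping the least, 20) and by 200 for attribute c (keeping the least, 100).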


Joins of Many Relations(2)

Based on the two assumptions, containment and preservation of value sets:

  • No matter how we group and order the terms in a natural join of n relations, the estimation rules, applied to each join individually, yield the same estimate for the size of the result. Moreover, this estimate is the same as the one we get if we apply the rule for the join of all n relations as a whole.
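
The order-independence claim can be checked numerically with a pairwise join estimator that also propagates value counts to the intermediate result. This is a sketch under stated assumptions: relations are (T, {attribute: V}) pairs, a shared attribute keeps the smaller value count after the join, and the statistics are hypothetical.

```python
def join_pair(left, right):
    """Estimate (T, {attr: V}) for the natural join of two relations."""
    t_l, v_l = left
    t_r, v_r = right
    estimate = t_l * t_r
    v_out = {**v_l, **v_r}                 # non-shared counts are preserved
    for attr in v_l.keys() & v_r.keys():
        estimate /= max(v_l[attr], v_r[attr])
        # Assumed: the smaller value set is what survives the join.
        v_out[attr] = min(v_l[attr], v_r[attr])
    return estimate, v_out


# Hypothetical statistics for R(a,b,c), S(b,c,d), U(b,e).
R = (1000, {"a": 100, "b": 20, "c": 200})
S = (2000, {"b": 50, "c": 100, "d": 400})
U = (5000, {"b": 200, "e": 500})

size_rs_u, _ = join_pair(join_pair(R, S), U)   # (R join S) join U
size_su_r, _ = join_pair(join_pair(S, U), R)   # (S join U) join R
size_ru_s, _ = join_pair(join_pair(R, U), S)   # (R join U) join S
print(size_rs_u, size_su_r, size_ru_s)
```

All three groupings arrive at the same estimate, although the intermediate results differ widely in size, which is exactly why join order still matters for cost.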


Estimating Sizes for Other Operations(1)

  • Union: the average of the sum and the larger, which equals the larger plus half the smaller.

  • Intersection:

    Approach 1: take the average of the extremes (0 and the size of the smaller relation), which is half the smaller.

    Approach 2: intersection is an extreme case of the natural join, so use the formula

    T(R ⋈ S) = T(R)T(S)/max(V(R,Y), V(S,Y))
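
A small sketch of the union estimate and the first intersection approach. The function names and the relation sizes are hypothetical; the union estimate is meant for set union, where the result can be anywhere between the larger relation and the sum.

```python
def est_union(t_r, t_s):
    """Average of the sum T(R) + T(S) and the larger of T(R), T(S)."""
    return ((t_r + t_s) + max(t_r, t_s)) / 2


def est_intersection(t_r, t_s):
    """Approach 1: average of the extremes 0 and min(T(R), T(S))."""
    return min(t_r, t_s) / 2


# Hypothetical sizes: T(R) = 1000, T(S) = 2000.
union_estimate = est_union(1000, 2000)
intersection_estimate = est_intersection(1000, 2000)
print(union_estimate, intersection_estimate)
```

The two estimates are consistent with each other: their sum equals T(R) + T(S).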


Estimating Sizes for Other Operations(2)

  • Difference: T(R) - (1/2)*T(S) as an estimate for R - S.

  • Duplicate Elimination: take the smaller of (1/2)*T(R) and the product of all the V(R,a_i)’s, where the a_i are the attributes of R.

  • Grouping and Aggregation: upper-bound the number of groups by a product of V(R,A)’s, where attribute A ranges over only the grouping attributes of L. An estimate is the smaller of (1/2)*T(R) and this product.
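
These three rules can be sketched as follows. The function names and statistics are hypothetical, and clamping the difference at zero is an addition of this sketch, not part of the stated rule.

```python
import math


def est_difference(t_r, t_s):
    """Estimate T(R - S) as T(R) - T(S)/2.

    Clamped at 0 (an assumption of this sketch) so the estimate
    never goes negative when S is much larger than R.
    """
    return max(0.0, t_r - t_s / 2)


def est_distinct(t_r, value_counts):
    """Smaller of T(R)/2 and the product of the V(R,a_i)'s."""
    return min(t_r / 2, math.prod(value_counts))


def est_groups(t_r, grouping_value_counts):
    """Same rule, with the product taken over grouping attributes only."""
    return min(t_r / 2, math.prod(grouping_value_counts))


# Hypothetical statistics: T(R) = 10,000 with V's of 50, 100 and 1000.
print(est_difference(5000, 2000))
print(est_distinct(10_000, [50, 100, 1000]))   # product is huge, so T(R)/2 wins
print(est_groups(10_000, [50]))                # grouping on one attribute
```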


