1 / 35

# Chapter 7. Query Processing - PowerPoint PPT Presentation

Chapter 7. Query Processing. Seoul National University Department of Computer Engineering OOPSLA Lab. CONTENTS. Introduction Traversal of Nodes in a Query Graph Cost Model. Introduction. Phases of Query Processing Query Graph Similarities between OO and Relational Queries Join Methods.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about 'Chapter 7. Query Processing' - reba

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### Chapter 7. Query Processing

Seoul National University

Department of Computer Engineering

OOPSLA Lab.

• Introduction

• Traversal of Nodes in a Query Graph

• Cost Model

• Phases of Query Processing

• Query Graph

• Similarities between OO and Relational Queries

• Join Methods

Expression

Standardized

Expression

Declarative Query

Declarative

Optimizer

Translator into

Algebraic Structure

Type Check

Execution of the

Chosen Plans

Type Consistent

Expression

Optimized Expression

Algebraic Optimizer

Generator of

Execution Plans

Phases of Query Processing

• Evaluating relational query in two phases

• first phase for possible optimization

• second phase for actual execution

• Only small changes for OO query processing

• Simple predicate

• <name-attribute operator value>

• value can be a primitive or an object

• Complex predicate

• has an adjacent sequence of attributes on a branch of the aggregation hierarchy of a class

• Single operand query

• Multiple operand query

• Target class

• A subset of the schema graph(SG)

• Definition of Query Graph (QG) for a single operand query on class C

• QG is a connected subgraph of SG for C

• QG includes only nodes of SG used in the query

• an arc from attr. to a domain in SG can be changed into an arc to a subclass of the domain in QG

• the set of arcs from C to the domains of the attributes of C form a direct graph rooted in C

• leaf nodes of acyclic branch have simple predicate, whereas others can have simple or complex predicates

select :T

where :T man_years > 2

and :T leader specialization IN {“DB”,”IS”}

and :T participant group_name LIKE ‘%CAD%’

• In relational data model

• on a single relation

• In OO data model

• aggregation hierarchy rooted on the target class

• implicit join

• inheritance hierarchy rooted on the target class

• more powerful than a query on a single relation

• Multiple operand query

• Joins between arbitrary pairs of classes

• Definition of explicit join between attr. Si, Sj of class Ci, Cj

• Ci, Cj are both the root of QGs that correspond to a single operand query on Ci , Cj

• QGs for Ci, Cj partially overlap and Si, Sj are both the root of a hierarchy of classes

• Query can have more than one target class

• Structure of queries is essentially same except for class hierarchies

• Implicit and explicit joins between the classes are equivalent to joins in relations

• Relational query processing techniques can be used on OO data model

• all reasonable permutations between the relations

• generation of an execution plan for each permutation

• evaluation of the cost of each execution plan

• in distributed DB, generation of a global plan

• Forward order join where class D is the domain of

// attribute A of class C

• Reverse order join

for c in C do

begin

retrieve c.A;

retrieve d in D such that

OID(d) = c.A if A is single-valued

OID(d) c.A if A is multi-valued;

evaluate the predicate on d;

end.

for d in D do

begin

retrieve u = OID(d);

evaluate the predicate on d;

retrieve c in C such that

c.A = u if A is single-valued

uc.A if A is multi-valued ;

end.

• Visiting Classes in Query Graph

• Access Algorithms to Instances of Classes

• Query Execution Strategies

• Query Execution Graph

• Query Optimization

• Differences in Query Processing compared with the Relational Model

• Forward traversal

• C1 is the root class of QG

• any depth-first order in QG

• Reverse traversal

• C1 is a leaf class of QG

• Ci can be a leaf class of QG

• all the children classes must be visited previously than the parent class

• Mixed traversal

• C1 is any class of QG

• Ci is connected to one of C1,..., Ci-1 by a direct arc towards Ci

• Ci and one of C1 ,…, Ci-1 have a common parent in QG

• one or more children class of Ci must be in C1,…, Ci-1

• Nested-loop access method

• each instance is processed separately and completely

• attributes of an instance are evaluated with concerned predicates

• the instance is passed to the parent or child class

• Sort-domain (or sort-merge) methods

• all instances are instantiated and processed together

• they are passed all together to the next proper node

• storage pages of instances are accessed only once

• Combining strategies for traversing QG and accessing instances

• Nested-loop forward traversal(NLFT)

• Sort-domain forward traversal(SDFT)

• Nested-loop reverse traversal(NLRT)

• Sort-domain reverse traversal(SDRT)

• [P] projection

• [S] selection

• [] intersection

• [I] index scanning

• [NJ] nested-loop join

• [SJ] sort-domain join

• [O] arrangement

WHERE man_year > 2

AND Researcher.specialization IN {“DB”,”IS”}

group_name

Man_year>2

JP(1)

Group

S

NJ

S

Specialization

IN {“DB” , “IS”}

JP(2)

Researcher

Group

P

NJ

S

Researcher

P

{Group.OID}

Group

S

P

Group_name

man_year>2

NJ

P

S

P

JP(1)

Specialization

IN {“DB” , “IS”}

{Researcher.OID}

Researcher

S

P

man_year>2

NJ

P

S

P

JP(2)

• Too expensive to generate all possible execution plans

• To reduce the complexity of choosing the best strategy

• analyze the structure of DB

• collect statistics on the frequencies of access

• Use of methods in queries

• difficult to estimate the cost and selectivity

• possible to depend the evaluation order

• Indices

• on aggregation hierarchy

• on inheritance hierarchy

• Multiple-valued attributes

• existential or universal quantifiers

• Methods on queries

• Database statistics

• number of instances along class hierarchy

• number of pages containing such instances

Cost Model Model

• Mathematical functions to determine the cost of the various types of traversal

• Definition of Path

• Model Parameters

• Cost Functions

Definition of Path Model

• Given an aggregation hierarchy H,path P is defined as C1.A1.A2....An(n1) where

• C1is a class in H

• A1 is an attribute of class C1

• Ai is an attribute of class Ci in H,such that Ci is the domain of attribute Ai-1 of class Ci-1(1<in)

• length(P) : the length of the path

• classes(P) : the set of classes along the path

• dom(P) : the domain of attribute Anof class Cn

• nci :number of classes in the inheritance having class Ci-1

• Dij : average number of distinct values for attribute Aiof class Ci,j

• Di :average number of values for attribute Ai, evaluated W.R.T all the members of class Ci,1

• Ni,j :cardinality of class Ci,j

• Nhi :cardinality of the set of members of class Ci,1

• fani,j : average number of references to members of class Ci+1,1 contained in attribute Ai of an instance of class Ci,j

• fani : average number of references to members of class Ci+1,1 contained in attribute Ai of a member of class Ci,1

• di,j : average number of instances in class Ci,j

• di : average number of members of class Ci,1

• ki,j : average number of instances in class Ci,j

• khi : average number of members of class Ci,1

• Pij : number of pages for the instances of class Ci,j

• Phi :number of pages for the members of class Ci,1

• ri : binary variable which assumes the value

• 1, if the members of Ci,1 have reverse reference to the members of Ci-1,1

• 0, if not, 2  i  n

Query Parameters Model

• NIi : number of members of class Ci,1 on which the nested predicate must be evaluated

• APi : number of pages effectively accessed to find the members of class Ci,1

Derived Parameters Model

• RefBy(i, s, y, k) : average # of values contained in Ay for a set of kinstances of Ci,s

• RefByh(i, s, y, k) : ... for a set of kmembers of Ci,s

• : average number of instances of class Ci,j

• : average number of members of class Ci,1

• Ref(i, y, s, k) : average # of instances of class Ci,s with avalue in a set of k elements, 1  i  y  n, as the value of the nested Attr. Ay

• Refh(i, y, k) : average # of members ...

• Independent distributions of values of the various attributes

• Uniform distribution of the values of the each attribute

• No relation between the cardinality of the instances of a class and that of the other classes in the same inheritance hierarchy

Cost Function Model

• Nested predicate C1.A1...An op exp

• op is a relation operator

• exp is an expression

• evaluated with a forward or reverse traversal

• Types of distribution of the values of attribute Ai

• disjunctive distribution

• each value of is Ai used in a single class

• inclusive distribution

NLFT Cost Functions Model

1.

2.

3. Assume the total # of instances = # of pages

APk = NIk

4. Total Cost

SDFT Cost Functions Model

1.

2.

3.Total cost

NLRT Cost Functions Model

1.

2.

2.1.

2.2.

3.

4. Total cost

SDRT Cost Functions Model

1.

1.1

2.

2.1

3. Total cost