Sponsored Links
This presentation is the property of its rightful owner.
1 / 35

# Chapter 7. Query Processing PowerPoint PPT Presentation

Chapter 7. Query Processing. Seoul National University Department of Computer Engineering OOPSLA Lab. CONTENTS. Introduction Traversal of Nodes in a Query Graph Cost Model. Introduction. Phases of Query Processing Query Graph Similarities between OO and Relational Queries Join Methods.

### Download Presentation

Chapter 7. Query Processing

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

## Chapter 7. Query Processing

Seoul National University

Department of Computer Engineering

OOPSLA Lab.

### CONTENTS

• Introduction

• Traversal of Nodes in a Query Graph

• Cost Model

### Introduction

• Phases of Query Processing

• Query Graph

• Similarities between OO and Relational Queries

• Join Methods

Object Algebra

Expression

Standardized

Expression

Declarative Query

Declarative

Optimizer

Translator into

Algebraic Structure

Type Check

Execution of the

Chosen Plans

Type Consistent

Expression

Optimized Expression

Algebraic Optimizer

Generator of

Execution Plans

### Phases of Query Processing

• Evaluating relational query in two phases

• first phase for possible optimization

• second phase for actual execution

• Only small changes for OO query processing

### Terminology Associated with Query

• Simple predicate

• <name-attribute operator value>

• value can be a primitive or an object

• Complex predicate

• has an adjacent sequence of attributes on a branch of the aggregation hierarchy of a class

• Single operand query

• Multiple operand query

• Target class

### Query Graph in OODB

• A subset of the schema graph(SG)

• Definition of Query Graph (QG) for a single operand query on class C

• QG is a connected subgraph of SG for C

• QG includes only nodes of SG used in the query

• an arc from attr. to a domain in SG can be changed into an arc to a subclass of the domain in QG

• the set of arcs from C to the domains of the attributes of C form a direct graph rooted in C

• leaf nodes of acyclic branch have simple predicate, whereas others can have simple or complex predicates

### Example of Query Graph

select :T

fromTask :T

where:T man_years > 2

and:T leader specialization IN {“DB”,”IS”}

and:T participant group_name LIKE ‘%CAD%’

### Single Operand Query

• In relational data model

• on a single relation

• In OO data model

• aggregation hierarchy rooted on the target class

• implicit join

• inheritance hierarchy rooted on the target class

• more powerful than a query on a single relation

### Explicit Join

• Multiple operand query

• Joins between arbitrary pairs of classes

• Definition of explicit join between attr. Si, Sj of class Ci, Cj

• Ci, Cj are both the root of QGs that correspond to a single operand query on Ci , Cj

• QGs for Ci, Cj partially overlap and Si, Sj are both the root of a hierarchy of classes

• Query can have more than one target class

### Similarities between OO and Relational Queries

• Structure of queries is essentially same except for class hierarchies

• Implicit and explicit joins between the classes are equivalent to joins in relations

• Relational query processing techniques can be used on OO data model

• all reasonable permutations between the relations

• generation of an execution plan for each permutation

• evaluation of the cost of each execution plan

• in distributed DB, generation of a global plan

### Join Methods considering Class Permutation

• Forward order join where class D is the domain of

// attribute A of class C

• Reverse order join

for c in C do

begin

retrieve c.A;

retrieve d in D such that

OID(d) = c.A if A is single-valued

OID(d) c.A if A is multi-valued;

evaluate the predicate on d;

end.

for d in D do

begin

retrieve u = OID(d);

evaluate the predicate on d;

retrieve c in C such that

c.A = u if A is single-valued

uc.A if A is multi-valued ;

end.

### Traversal of Nodes in a Query Graph

• Visiting Classes in Query Graph

• Access Algorithms to Instances of Classes

• Query Execution Strategies

• Query Execution Graph

• Query Optimization

• Differences in Query Processing compared with the Relational Model

### Methods of Visiting Classes in QG(1)

• Forward traversal

• C1 is the root class of QG

• any depth-first order in QG

• ex) (Task Group Researcher) (Task Researcher Group)

• Reverse traversal

• C1 is a leaf class of QG

• Ci can be a leaf class of QG

• all the children classes must be visited previously than the parent class

• ex) (Group Researcher Task) (Researcher Group Task)

### Methods of Visiting Classes in QG(2)

• Mixed traversal

• C1 is any class of QG

• Ci is connected to one of C1,..., Ci-1 by a direct arc towards Ci

• Ci and one of C1 ,…, Ci-1 have a common parent in QG

• one or more children class of Ci must be in C1,…, Ci-1

• ex) (Group Task Researcher) (Researcher Task Group)

### Access Algorithms to Instances of Classes

• Nested-loop access method

• each instance is processed separately and completely

• attributes of an instance are evaluated with concerned predicates

• the instance is passed to the parent or child class

• Sort-domain (or sort-merge) methods

• all instances are instantiated and processed together

• they are passed all together to the next proper node

• storage pages of instances are accessed only once

### Query Execution Strategies

• Combining strategies for traversing QG and accessing instances

• Nested-loop forward traversal(NLFT)

• Sort-domain forward traversal(SDFT)

• Nested-loop reverse traversal(NLRT)

• Sort-domain reverse traversal(SDRT)

### Basic Operations Composing Query

• [P] projection

• [S] selection

• [] intersection

• [I] index scanning

• [NJ] nested-loop join

• [SJ] sort-domain join

• [O] arrangement

### An Example of Query

SELECT Task

WHERE man_year > 2

AND Group.group_name LIKE ‘%CAD%’

AND Researcher.specialization IN {“DB”,”IS”}

### Query Execution Graph by NLFT

group_name

LIKE ‘%CAD%’

Man_year>2

JP(1)

Group

Task

S

NJ

S

Specialization

IN {“DB” , “IS”}

JP(2)

Researcher

Group

P

NJ

S

{Task.OID}

{Task.OID}

Researcher

P

JP(1):Task.participant=Group.OID

JP(2):Task.leader=Researcher.OID

### Query Execution Graph by NLRT

{Group.OID}

Group

S

P

Group_name

LIKE ‘%CAD%’

{Task.OID}

man_year>2

NJ

P

S

P

{Task.OID}

Task

JP(1)

Specialization

IN {“DB” , “IS”}

{Researcher.OID}

Researcher

S

P

{Task.OID}

man_year>2

NJ

P

S

P

{Task.OID}

JP(2)

Task

JP(1):Task.participant=Group.OID

JP(2):Task.leader=Researcher.OID

### Considerations on Query Optimization

• Too expensive to generate all possible execution plans

• To reduce the complexity of choosing the best strategy

• analyze the structure of DB

• collect statistics on the frequencies of access

• Use of methods in queries

• difficult to estimate the cost and selectivity

• possible to depend the evaluation order

### Differences in Query Processing compared with the Relational Model

• Indices

• on aggregation hierarchy

• on inheritance hierarchy

• Multiple-valued attributes

• existential or universal quantifiers

• Methods on queries

• Database statistics

• number of instances along class hierarchy

• number of pages containing such instances

### Cost Model

• Mathematical functions to determine the cost of the various types of traversal

• Definition of Path

• Model Parameters

• Cost Functions

### Definition of Path

• Given an aggregation hierarchy H,path P is defined as C1.A1.A2....An(n1) where

• C1is a class in H

• A1 is an attribute of class C1

• Ai is an attribute of class Ci in H,such that Ci is the domain of attribute Ai-1 of class Ci-1(1<in)

• length(P) : the length of the path

• classes(P) : the set of classes along the path

• dom(P) : the domain of attribute Anof class Cn

### Logical Data Parameters(1)

• nci :number of classes in the inheritance having class Ci-1

• Dij : average number of distinct values for attribute Aiof class Ci,j

• Di :average number of values for attribute Ai, evaluated W.R.T all the members of class Ci,1

• Ni,j :cardinality of class Ci,j

• Nhi :cardinality of the set of members of class Ci,1

### Logical Data Parameters(2)

• fani,j : average number of references to members of class Ci+1,1 contained in attribute Ai of an instance of class Ci,j

• fani : average number of references to members of class Ci+1,1 contained in attribute Ai of a member of class Ci,1

• di,j : average number of instances in class Ci,j

• di : average number of members of class Ci,1

• ki,j : average number of instances in class Ci,j

• khi : average number of members of class Ci,1

### Physical Data Parameters

• Pij : number of pages for the instances of class Ci,j

• Phi :number of pages for the members of class Ci,1

• ri : binary variable which assumes the value

• 1, if the members of Ci,1 have reverse reference to the members of Ci-1,1

• 0, if not, 2  i  n

### Query Parameters

• NIi : number of members of class Ci,1 on which the nested predicate must be evaluated

• APi : number of pages effectively accessed to find the members of class Ci,1

### Derived Parameters

• RefBy(i, s, y, k) : average # of values contained in Ay for a set of kinstances of Ci,s

• RefByh(i, s, y, k) : ... for a set of kmembers of Ci,s

• : average number of instances of class Ci,j

• : average number of members of class Ci,1

• Ref(i, y, s, k) : average # of instances of class Ci,s with avalue in a set of k elements, 1  i  y  n, as the value of the nested Attr. Ay

• Refh(i, y, k) : average # of members ...

### Assumptions for Cost Model

• Independent distributions of values of the various attributes

• Uniform distribution of the values of the each attribute

• No relation between the cardinality of the instances of a class and that of the other classes in the same inheritance hierarchy

### Cost Function

• Nested predicate C1.A1...An op exp

• op is a relation operator

• exp is an expression

• evaluated with a forward or reverse traversal

• Types of distribution of the values of attribute Ai

• disjunctive distribution

• each value of is Ai used in a single class

• inclusive distribution

### NLFT Cost Functions

1.

2.

3. Assume the total # of instances = # of pages

APk = NIk

4. Total Cost

1.

2.

3.Total cost

1.

2.

2.1.

2.2.

3.

4. Total cost

1.

1.1

2.

2.1

3. Total cost