Query Optimization in Object Databases

Query Optimization in Object Databases Georges GARDARIN Laboratoire PRiSM/UVSQ G. Gardarin

1. Introduction • Object models provide ADTs, inheritance, complex structures, relationships and object identity • Query Optimizers transform query in query plans composed of low level operations to evaluate on the object collections • New techniques are required for supporting the object-oriented features

Outline • Object Query Languages • Complex Object Algebra • Operator Algorithms • Query Plan Transformations • Cost Models • Search Strategies • Open Problems

Overview • Presentation of the various topics of Query Processing in OODBMSs • Topics are not independent • Operators depend on data structures (index) • Search strategies depend on cost model • An optimizer has to consider all aspects • Complex piece of software • Has to be extensible for additional features • data types • access methods • operators

Vocabulary • Collection : a set, list, array or bag of objects • Query : a user query in high level language • Predicate : a term of a query criteria • Qualification : a logical expression of predicates • Operator : a low level accessor to 1 or several collections • Annotation : the selected algorithm for executing an operator • Query plan : a program of annotated operator • Cluster : a group of related objects stored together in a bucket • Index : an accelerator by value of an attribute • Path index : an accelerator by values along a path

2. Object Query Languages • Extension of SQL with : • user defined functions in predicates and results • user defined comparison predicates • path expressions to traverse relationships • flattening, grouping and degrouping operators • automatic scan of inheritance hierarchies • Two “standards” are under construction : • The object standard of ODMG (OQL) • The object-relational standard of ISO/ANSI (SQL3)

Int Database Example (1) Vehicle Maker Number Color Company String String President Name City Employee String String Ssn BirthDate Name Float String

Query Example • Object identity : • SELECT E.Name, C.Name • FROM Employee E, Company C • WHERE C.President == E • Paths and method : • SELECT Number • FROM Vehicle • WHERE Color = "Red" • AND Vehicle.Maker.City= "Paris" • AND Vehicle.Maker.President.age() < 50

String String String Int Int Int Database Example (2) Company Employs Name City Person Owns Name Age Vehicle Number Power

Qualified Path Expression • OQL form : SELECT C.Name, P.Name, V.Number FROM C IN Companies, P IN C.Employs, V IN P.Owns WHERE C.City="Paris" AND P.Age<30 AND V.Pow>10 • Direct form : SELECT C.Name, P.Name, V.Number FROM Companies C, Persons P, Vehicles V WHERE C[City="Paris"].Employs.P[Age<30].Owns.V[Pow>10]

Exercice Queries • Express in OQL, then with qualified path expressions, a set of given queries.

3. Complex Object Algebra • Generalization of relational algebra • Set-oriented processing of objects • A set of operations on collections of objects generating collections of objects • Different types of collections : • class extent, set, bag, list, array • Any query can be expressed as a complex object algebra expression • Logical algebra annotated for execution

The LORA Algebra • Finance and Gardarin 1991 LoraOp SearchOp UpdateOp TransactOp GroupOp Filter Map Join Sort SetOp RJoin VJoin - RemoveDup - Aggregate - Nest - Unnest - Union - Intersect - Difference

JOIN:Col,Exp,Col =>Col OUTER_JOIN: Col,Exp,Col => Col SORT: Col, Exp => Col AGG: Col, Exp, Exp => Col NEST: Col, Nest_Exp => Col UNNEST: Col, Nest_exp =>Col RDUPLICATE: Col, Exp => Col FILTER: Col, Qual =>Col MAP: Col, Exp,Qual => Col MINUS: Col, Col => Col DIVIDE: Col, Col =>Col UNION: Col, Col => Col OUTER_UNION: Col, Col => Col INTERSECT: Col, Col=> Col UPDATE: Col, Col, Ident, Assignement =>Col INSERT: Col, Col => Col DELETE: Col, Col, Ident => Col Main Operator Signatures

Algebraic Tree • SELECT Number • FROM Vehicle • WHERE Color = "Red" • AND Vehicle.Maker.City= "Paris" • AND Vehicle.Maker.President.age() < 50 Filter(*,Number) RJoin(Maker) Filter(City="Paris",*) RJoin(President) Filter(Color="Red") Filter(age()<50,*) Vehicle Company Employee

The ENCORE Algebra (1) • Shaw and Zdonik 1990 • Select(InputCollection, p) = {s(s in InputCollection)  p(s)} • Image(InputCollection, f : T) = {f(s)s in InputCollection} • Project(InputCollection,<(A1, f1), ...,(An, fn)>) = {<A1 : f1(s), ...,An : fn(s)>(s in InputCollection)}

The ENCORE Algebra (2) • Nest(InputCollection,Ai) = {<A1 : s.A1, ...,Ai : t, ...,An : s.An>r s (r in t  s in InputCollection  s.Ai = r)} • UnNest(InputCollection,Ai) = {<A1 : s.A1, ...,Ai : t, ...,An : s.An>s in InputCollectiont in Ai} • Flatten(InputCollection) = {rt in InputCollection  r in t} • DupEliminate (InputCollection) • Coalesce (InputCollection, Ai)

The ENCORE Algebra (3) • OJoin(InputCollection1,InputCollection2,A1,A2,p) ={<A1 : s, A2 : r>s in InputCollection1  r in InputCollection2  p(s,r)} • Set-oriented operations : • Union • Intersection • Difference with set membership based on object identity

The OFL Operators • Gardarin & Machucca 1995 • Navigational traversal often interesting : • Existential quantification • Better control of query plans, smaller granularity • Mixing navigational and set-oriented traversal • Based on Bachus’ functional approach • Processing of collections of objects • Side effect introduced through cursors

The OFL Language • Definition : Abstract Collection • A container of objects encapsulated by a finite set of behavioral and traversal functions. • Constructions : • Composition f.g (x) = f(g(x)) • Path expressions f0.f1....fn(x) • Conditional If_Then_Else (p, f1, f2) (x) • Iteration While (p,f) • Sequence Sequence(f1, f2, …, fn)

Collection Traversal in OFL • Quantified function Apply to all • A second order function of signature ForAll(C, p, f) that applies a function f to all objects of a collection C satisfying a predicate p. • Quantified function Apply to any • A second order function of signature ForAny(C, p, f) that applies a function f to any object of a collection C satisfying a predicate p. • Iterator and Annotations • Each quantified function works on an iterator • Set-oriented or navigational traversal is selected

Birthyear Price String Salary Composed Owner Person Vehicle Part Color LastName PartLabel String Database Example

Translating Query in OFL • SELECT p.lastname • FROM p in Person • WHERE exists v in p.owner : v.color = "Red" • ForAll(Person P, null, ForAny(Owner(P) V, StringEqual(Color(V),"Red"), LastName(P) ) );

A More Complex Query • OQL Query • SELECT tuple(p.lastname, v.price, c.partlabel) • FROM p in Person, v in p.owner, c in v.composed • WHERE p.age = 16 and v.price=c.price • OFL translation : • ForAll(Find(AgeIndex,16) P, null, • ForAll(Owner(P) V, null, • ForAll(Composed(V)T,IntegerEqual(Price(V),Price(T)), • Tuple (LastName(P), Price(V), PartLabel(T))))));

Further Operators • Recursive operators • FixPoint(ResultCollection, InitializationExpression,RecursivePredicate, RecursiveExpression, FinalExpression) • gives the OFL program: Sequence(OFLInitializationExpression, While(OFLRecursivePredicate, OFLRecursiveExpression), OFLFinalExpression)

Exercice Algebra • Write in OFL the definition of LORA operations • exemple : Join(InputCollection1,InputCollection2,ResultCollection, JoinPredicate, ProjectionExpression) • gives the OFL program: ForAll(InputCollection1,null, ForAll(InputCollection2, OFLJoinPredicate, InsertResultCollection(ResultCollection, OFLProjectionExpression))))

4. Algebraic Operator Algorithms • Classical relational operators still valid ... • Filtering with a predicate (Restriction) • Sequential scan • Index scan, clustered or non-clustered • Value-based join • Nested loop join : • iterate on the outer collection and compare each outer object with each object in the inner collection • Merge join : • sort on join fields the two collections and then merge • Hash join : • hash the outer collection on join fields, scan the inner table and probe the hashed collection

Path Traversals • Paths may involve multiple collections • Each collection can be qualified by predicates

Depth-First-Fetch • Depth-First-Fetch (DFF) is the natural algorithm for evaluating a path expression. • It follows the path from the root to the target collection, using a depth first graph traversal algorithm. • The corresponding operator is an n-ary operator denoted DFF. • Advantages : • no intermediate results, simple pointer chasing • result are assembled one at a time allowing pipeline • efficient when the memory size is large enough to avoid swapping of objects

Breadth-First-Fetch • Breadth-First-Fetch (BFF) traversal processes the tree of objects using a Forward Join (FJ) algorithm which is based on pointer chasing between two collections. • Successive binary joins of collections are performed from the source collection to the target, following the path in a forward order. • Advantages : • no multiple fetch of objects • requires the construction of hashed support table to memorize FJ results

Reverse-Breadth-First-Fetch • Reverse-Breadth-First-Fetch (RBFF) performs a sequence of binary joins between two neighbor collections to traverse the path, but it proceeds in the reverse order of the path. • Thus, each join is called a Reverse Join (RJ). The join criterion is the member-ship of the second collection object identifier to the first collection pointer attribute values. • Advantages : • efficient when predicate(s) in last collection(s) selective • requires supporting tables and value-based joins

Illustration of BFF & RBFF

Further Algorithms • Various algorithms are available for each operator • Combination of operators can be applied : • to traverse long paths • to derive new algorithms • hash both & sort buckets & merge buckets • limited breadth-first-fetch • Cost is dependent of many factors : • physical organization of objects • size of collections • selectivity of predicates • available memory size • possible degree of parallelism

Exercice Algorithms • List all the possible annotated query plans to process the query : • SELECT C.Name, P.Name, V.Number • FROM C IN Companies, P IN C.Employs, V IN P.Owns • WHERE C.City="Paris" AND P.Age<30 AND V.Pow>10 • over the database schema : Company Employs Name City Person Owns String String Name Age Vehicle String Int Number Power Int Int

5. Query Plan Transformations • Query rewrite : Algebraic rewrite of query tree • semantic transformations based on properties of data types and integrity constraints • syntactic transformation based on properties of operators • Query planning : Selection of best algorithms • annotation of logical operators with selected algorithms • cost of an annotated algorithm often dependent of result of previous algorithm • e.g., no sort needed if result sorted • Query rewrite and query planning are not independent

Extensible Optimizers • Closed Optimizer • set of operators and transformations fixed • heuristic-based or cost-based selection of plans • efficient but hard to modify and extend • e.g., Oracle 7.3, SQL Server 10, ... • Extensible Optimizer • extensible set of operators and transformations • rule-based generation of query plans • selection of "best" plan using a search strategy • e.g., Exodus, Starbust and DB2 CS, Esprit EDS & IDEA, Illustra, ...

Rewrite Rule Base Common Expression Detection Syntactic Optimization Semantic Optimization Predicate Simplification From Gardarin, Finance DKE 93 Query Plan Cost model Heuristics Modular Rule Base STRATEGY Optimized Query Plan

Syntactic Rewrite Rules (1) • Restrict through Union Pushing Rule : Restrict(Union(C1,C2)) <=> Union(Restrict(C1,C2)) • Restrictions through Super Class Pushing Rule : Restrict (Super(C1,C2)) <=> Super(Restrict(C1),Restrict(C2))

Syntactic Rewrite Rules (2) • Join Commutativity Join (C1,C2) <=> Join (C2,C1) • Join Associativity (C1 Join C2) Join C3) <=> C1 Join (C2 Join C3) • Restrict through Join Pushing Rule Restrict(Join(C1,C2)) <=> Join(Restrict(C1),Restrict(C2))

Planning Rules • Join method choice JoinNL (C1,C2) <=> JoinSM (C1,C2) JoinHP (C1,C2) <=> JoinSM (C1,C2) • Depth First Fetch introduction DFF(C1,C2,C3) <=> Join(C1,Join(C2,C3)) • Index Scan introduction Scan(C1,P) <=> Scan(IScan(C1,I(P)),P~I(P))

Semantic Rules • Integrity constraints • Type(x) = Square <=> • Type(x) = Polygon and large(x) = long(x) • User function properties • draw(x+y) = draw(x) + draw(y)

What Rule Language ? • Rules are often complex to express • Conditions on qualifications, operators, results, ... • Proposed rule languages : • C rewriting procedure [Exodus, Starbust] • if <C procedure> is true then <C procedure> • Practical but hard to extend optimizer • Side effective rule language [Finance91] • WHEN <Query Expression> IF <Condition> • THEN <Query Expression'> UNDER <Action> • Complex to implement for pattern matching • OQL Query equivalence [Florescu95] • Parametrized Query ~ Parametrized Query • Lack of generality (e.g., query planning not possible)

Choice of Best Query Plan Algebraic Tree Database Schema Query Plans Transformation Rule base Query Plan Generator Search Strategy Cost Model action:  { } cost:  float goak: boolean "Best" Query Plan

Exercice Rules • Given a linear path expression from collection 1 to i, determine the number of distinct query plans to process the query, assuming that 3 algorithms are available to process any path expression (DFF, BFF, RBFF) • Give the rule base to generate those plans ........ Ci C1 C2 Ci-1

6. Cost Models • Extension of relational cost model to handle : • Object identifiers • Path indexes • Object linking and embedding • Clustering • Takes into account CPU cost and I/O cost : • CPU cost =  * Number of examined objects • I/O cost =  * Number of pages read

Collection Parameters • |C| = number of pages of collection C • ||C|| = number of objects of collection C • |Ci| = number of pages of cluster i of collection C • ||Ci|| = number of objects of cluster i of collection C • SC = average object size in collection C • SProj = average size of projection result • M = available memory size • Sel(Qual) = selectivity of qualification Qual • Sel(Pred) = selectivity of indexed predicate Pred

I/O Scan Formulas • Sequential scan • I/O cost =I/OScan + I/OResult • I/OScan = |C| • I/OResult =Sel(Qual)*|C|*SProj/SC - M if > 0 else 0 • Unclustered index scan • I/O cost = I/OIndex + I/OHit+I/OResult • I/OIndex = Blevel(I) • I/OHit = Yao(|C|,||C||,Sel(Pred)*|C|) • Clustered index scan • I/O cost = I/OIndex + I/OHit+I/OResult • I/OHit = Sel(Pred)*|C|

Object Clustering • Clustering par classe • Regroupement de toutes les instances d'une même classe dans un même fichier • Clustering par composition • Regroupement d' un objet d'une classe avec un ou plusieurs de ses objets composants. • Placement adapté aux parcours de chemin • Clustering aléatoire • les objets sont placés dans l'ordre de leur création, dans un espace unique.

COMPANY PRODUCT Clustered Collection Cases • Default clustering • Simple clustering Cluster objects on disk Reduce the number of IO’s (Placement trees are represented by directed graphs) COMPANY

COMPANY PRODUCT COMMAND More Clustering Cases • Conjunctive clustering • Disjunctive clustering COMPANY PROPOSAL 5 10 COMMAND

Query Optimization in Object Databases

Query Optimization in Object Databases

Presentation Transcript

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization