Chase backchase a versatile tool for query optimization
This presentation is the property of its rightful owner.
Sponsored Links
1 / 48

Chase & Backchase: A Versatile Tool for Query Optimization PowerPoint PPT Presentation


  • 75 Views
  • Uploaded on
  • Presentation posted in: General

Chase & Backchase: A Versatile Tool for Query Optimization. Lucian Popa , Alin Deutsch, Arnaud Sahuguet, Val Tannen University of Pennsylvania. Logical vs. Physical. Unfortunately, limited cooperation !. A separation is necessary for physical data independence:. Network. Logical schema.

Download Presentation

Chase & Backchase: A Versatile Tool for Query Optimization

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Chase backchase a versatile tool for query optimization

Chase & Backchase:A Versatile Tool for Query Optimization

Lucian Popa, Alin Deutsch, Arnaud Sahuguet, Val Tannen

University of Pennsylvania


Logical vs physical

Logical vs. Physical

Unfortunately, limited cooperation !

  • A separation is necessary for physical data independence:

Network

Logical schema

Physical schema

Physical optimization

Logical optimization


Optimization techniques

Optimization Techniques

Traditional

More “exotic”

  • Some rewriting (unnesting)

  • Acces path (index) use (ad-hoc)

  • Join ordering (dynamic programming)

  • Semantic optimization (more rewriting)

  • Use of materialized views (rewriting, too)

  • Object-oriented techniques(everything above, but OO! )

Logical

Physical

Physical

Logical

Both ?!

Mixed

  • Limited cooperation, so far, because of :

    • different foundations

    • different affiliation: logical or physical


Bridges

Bridges

Use of materialized views

  • What is needed: interaction between all these techniques.

    • can produce better plans, by enabling each other:

  • This work connects together some of these techniques, by finding a common foundation

Semantic optimization

Use of indexes


In a coco nutshell

In a (Coco)Nutshell

  • The Unifier: Constraints (Dependencies):

    • logical constraints:

      • semantic relationships among elements of the logical schema

    • physical constraints:

      • semantic relationships between elements of the physicalschema and elements of the logical schema

  • Chase with constraints produces the universal plan UP:

    • UP incorporates all relevant access paths

    • subqueries of UP provide the search space (candidate plans)

  • Backchase:

    • search among candidate plans for scan-minimal queries, checking for equivalence using (reverse) chase

    • search space can be pruned using cost-based optimization


Talk outline

Talk Outline

  • Motivation and Overview

  • Logical and Physical Constraints,

    Chase and Backchase (VLDB’99)

  • Theoretical Results

    (ICDT’99, VLDB’99, recent improvements)

  • Using Cost Information

  • Experimental Results

    (SIGMOD’00, recent improvements)

  • Conclusion


Chase backchase a versatile tool for query optimization

fresh base type for oids

Dept : Dict < Doid,

Struct { string DName;

Set <string> DProjs; }>

Logical Schema (ODMG syntax)

Proj : Set <

Struct { string PName;

string PDept;

string CustName;

string Budget;} >

class Dept (extent depts){

attribute string DName;

attribute Set<string> DProjs;}

  • To formally describe classes and their operations: use dictionaries (finite functions).


Translating oo queries into dictionary form

Translating OO Queries into Dictionary Form

selectstruct(PN: s, DN: d.DName)

from depts d , d.DProjs s

OO

selectstruct(PN: s, DN: Dept [d].DName)

fromdomDept d , Dept [d].DProjs s

Dict

“domain” as extent

“lookup” for oid dereferencing

  • Dictionary operations: dom M, M[k]

  • Constraints are translated in a similar way


Logical schema constraints

Logical Schema Constraints

  • Describe semantic relationships among elements of the logical schema

  • Example: an inverse relationship between Proj and Dept:

(RIC1) (d domDept) (sDept [d].DProjs)

 (p  Proj) s = p.PName

(INV2) (p Proj) (d domDept)

p.PDept = Dept [d].DName 

 (s Dept [d].DProjs)p.Pname = s

+ two others …


Physical schema

Physical Schema

  • Two indexes for relation Proj :

Primary Index on PNameI : Dict <string, T>

Secondary Index on CustNameSI : Dict <string, Set < T >>

(where T is the type of tuples in Proj)

  • A materialized view (a la “join index”) :

  • JI : Set< Struct{DoidJDoid; string JPN} >

  • JI = select struct (JDoid: d, JPN: p.PName)

  • fromdomDept d, Dept [d].DProjs s, Proj p

  • where s = p.PName


Physical schema constraints

Physical Schema Constraints

Describe semantic relationship between elements of the logical schema and elements of the physical schema.

One of the constraints for SI :

  • (SI1) (p  Proj) (k  domSI)

  • (t  SI[k]) [ k = p.CustName andp = t ]

  • One of the constraints for JI :

  • (JI1) (d domDept) (s Dept [d].DProjs) (p Proj)

  • [ s = p. PName

  • ( j JI) j.JDoid = dand j.JPN = p.Pname

  • ]


  • An example of interaction in optimization

    An Example of Interaction in Optimization

    A user (logical) query

    selectstruct(PN: s, PB: p.Budg, DN: Dept [d].DName)

    fromdomDept d, Dept [d].DProjs s , Proj p

    where s = p.PName and p.CustName = “CitiBank”

    • The query is chased with all the applicable logical and physical constraints (eg., RIC1, SI1, JI1)

    • The result of the chase is the universal plan.


    The generalized chase

    The Generalized Chase

    distinct!

    distinct!

    selectO(r1,…,rm)

    from R1r1,…,Rmrm,

    S1s1,…,Snsn

    where B1(r1,…,rm)and

    B2(r1,…,rm,s1,…,sn)

    d

    Let d be the constraint:

    (r1R1) … (rmRm) [B1(r1,…,rm)

    (s1S1) … (snSn) B2(r1,…,rm,s1,…,sn)

    ]

    selectO(r1,…,rm)

    from R1r1,…,Rmrm

    where B1(r1,…,rm)


    A universal plan

    A Universal Plan

    Added by chase

    selectstruct(PN: s, DN: Dept [d].DName)

    fromdomDept d , Dept [d].DProjs s ,

    Proj p, JI j, dom SI k, SI [k] t, dom I i

    where s = p.PNameand p.CustName = “CitiBank”

    andDept[d].DName = p.PDeptand j.DOID = d and j.PN = p.PName

    and p.CustName = k and p = t and i = p.PNameand p = I[i]

    U:

    • U gathers those elements from both logical and physical schema, that are relevant for alternative implementations

    • U is redundant


    Top down backchase minimization

    (Top-Down) Backchase Minimization

    • Eliminates the redundancies from the universal plan

    • Enumerates subqueries of UP:

      eliminates scans top-down, as long as equivalence is preserved;

      equivalence is verified by a (reverse) chase with applicable constraints (eg., INV2)

    • Outputs several scan-minimal subqueries


    Chase top down backchase

    Chase & (Top-Down) Backchase

    Universal Plan

    U

    BackChase

    Chase

    ...

    d’m

    dn

    ...

    ...

    dn

    d1

    d’1

    d1

    Q1

    . . .

    Qn

    . . .

    Minimal subqueries of U

    More minimization possible ...

    Original Query

    Q0

    Cost-based optimization still needed


    Some of the generated candidate plans

    Some of the Generated Candidate Plans

    Plan 1: Relation Scan

    selectstruct(PN: p.PName, PB: p.Budg, DN: p.PDept)

    fromProj p

    where p.CustName = “CitiBank”

    Plan 2: Secondary Index Lookup

    Different access paths, but equivalent

    selectstruct(PN: p.PName, PB: p.Budg, DN: p.PDept)

    fromSI[“Citibank”] p

    Plan 3: Using the Join Index and the Primary Index

    selectstruct(PN: j.JPN, PB: I[j.JPN].Budg, DN: Dept[j.JDoid].DName)

    fromJI j

    whereI[j.JPN].CustName = “CitiBank”


    Talk outline1

    Talk Outline

    • Motivation and Overview

    • Constraints, Chase and Backchase

    • Theoretical Results

    • Using Cost

    • Experimental Results

    • Conclusion


    Chase backchase a versatile tool for query optimization

    No set/dictionary type here

    Path-Conjunctive Language

    Paths: P ::= x | c | R | P.A | dom P | P[x]

    Path-Conjunctions: B ::= P1 = P1’ and ... and Pk = Pk’

    Path-Conjunctive (PC) Queries:

    selectstruct (A1: P1’, ..., An: Pn’)

    from P1 x1, ..., Pm xm

    whereB

    Embedded PC Dependencies (EPCDs):

    (r1P1) … (rmPm) [ B1(r1,…,rm) 

    (s1P’1) … (snP’n) B2(r1,…,rm,s1 ,…, sn) ]

    Still very expressive!


    Overview of pc query containment results

    Overview of PC Query Containment Results

    • The chase is also complete for EPCD implication

    • Strengthen results of Chandra & Merlin, Beeri & Vardi (70’s & 80’s)


    Completeness of c b

    Completeness of C&B

    • Assume the following:

    • Logical constraints: Da set of EPCDs

    • Physical schema:

      • primary indexes

      • materialized views that are PC queries (includes join indexes and access suport relations)

      • access structures representable as dictionary expressions with PC query domain and entry (includes secondary indexes and gmaps [Tsatalos et al])

    • Physical constraints: C(eg., SI1, JI1)

    • Cis a set ofEPCDs. (In general the constraints in C are not full.)


    Chase backchase a versatile tool for query optimization

    Completeness of C&B (continued)

    Theorem (Completeness)

    Let Q be a PC query such that some chasing sequence of Q withD terminates (with chaseD(Q)). Then:

    (a) chase C(chase D (Q))terminates

    (b) any scan-minimal candidate plan equivalent to Q under D is a subquery of chase C(chase D (Q))

    Strengthens results of Levy at al, new search space


    Talk outline2

    Talk Outline

    • Motivation and Overview

    • Constraints, Chase and Backchase

    • Theoretical Results

    • Interaction with Cost-Based Optimization

    • Experimental Results

    • Conclusion


    Using cost

    Using Cost

    Want to

    • reduce the size of the backchase search space

    • output a scan-minimal subquery that is also cost-minimal

      Cost monotonicity assumption (made implicitly earlier): subqueries are cheaper

      Hence the top-down backchase cannot exploit cost!

      Instead, a bottom-up backchase enumerates queries

      built from the scans of the universal plan, checking for

      equivalence with the universal plan and

      cost minimality

      and pruning all superqueries


    Bottom up backchase with cost based pruning

    Bottom-Up Backchase with Cost-Based Pruning

    Scans of U:

    Universal Plan

    S1, S2, S3, S4

    U

    Chase

    ...

    ...

    S1, S3, S4

    S1, S2, S3

    S1, S2, S4

    dn

    subqueries of size 2

    S1, S3

    S1, S4

    ...

    S1, S2

    S3, S4

    d1

    subqueries of size 1

    S1

    S2

    S3

    S4

    Original Query

    Q0

    Equivalent to U ?

    If YES then minimal rewriting.

    Becomes best plan so far.

    Prune all superqueries

    Cost higher than the min cost so far ?

    If YES then prune it together with all superqueries


    Interaction with a cost based component

    Interaction with a Cost-Based Component

    Query

    Cost information

    Logical schema

    =

    relations + OO classes

    Candidate plan q

    Rewriting with logical and physical constraints (C&B)

    Cost-based pruning

    Cost-based

    optimization

    Physical plan and cost for q

    Physical schema

    =

    views + indexes

    Best physical plan


    Cost based optimization of a candidate plan the usual suspects

    Cost-Based Optimization of a Candidate Plan(the usual suspects)

    • Scan order (“join reordering”)

      generalized dynamic programming technique

      • beyond relational

      • global”, i.e., interacting with backchase pruning

    • Placement of selections and projections

      • because the chase works with queries in “normalized” form

    • Join methods: nested loops, index join , others


    Talk outline3

    Talk Outline

    • Motivation and Overview

    • Chase and Backchase (as in proposal): Example of candidate plans enumeration

    • Chase and Backchase (as in proposal): Completeness of path-conjunctive (PC) case

    • Cost-based C&B

    • Experimental results

    • Future Research and Conclusion


    Is the c b technique practical

    Is the C&B Technique Practical ?

    • Is the chase feasible ?

      • tries exponentially many mappings for each step

    • Is the backchase feasible ?

      • explores exponentially many subqueries of UP

      • what is the effect of cost-based pruning ?

    • Is it worthwhile ?

      • does the quality of the generated plans outweigh the cost of optimization ?


    Experiments

    Experiments

    • We have developed a prototype in Java

    • Several experimental configurations that:

      • cover important practical cases

      • are scalable

  • Not shown here:

    • Exper. Config. 1 : relational queries and indexes.

    • Exper. Config. 3 : OO techniques

      interaction between semantic optimization and path indexes


  • Experimental configuration 2 relational views

    Experimental Configuration 2 : Relational Views

    V11

    V21

    V12

    V22

    key constraints

    • Input query: chain of stars

    • Schemas: views and key constraints

      • In addition: indexes on the corner relations

    • Scale-up parameters:

      • # stars, size of stars, # views/star, #indexes/star

    S11

    S21

    R1

    R2

    S22

    S12

    S13

    S23

    • Interaction between semantic optimization and views

      • No rewriting with views in the absence of key constraints


    Time to chase

    Time to Chase

    • Chasing alone is fast !

    • For queries with more than 15 joins and more than 15 constraints it takes seconds


    Stratification

    Stratification

    V11

    V21

    S11

    S21

    R1

    R2

    S22

    S12

    V12

    V22

    S13

    S23

    backchase,

    no cost

    chase

    Plans (minimal

    subqueries)

    Query

    UP

    Full backchase(FB):

    • For UP of size 12-15 joins, FB (top-down or bottom-up) may become impractical

    • OQF (On-line Query Fragmentation):UP can be decomposed, prior to backchase, into smaller univ. plans.

    Complexity: 2k1+ … + 2km << 2k1+…+km


    Backchase strategies

    Backchase Strategies

    • We compare several C&B optimizers obtained by combining in various ways:

      • top-down/bottom-up full backchase

      • cost-based pruning

      • stratification

      • TopDownFB (top-down full, no cost pruning)

      • BottomUpFB (bottom-up full, no cost pruning)

      • BottomUpFB+Prune (bottom-up full, cost pruning)

      • OQF (OQF fragmentation, BottomUpFB within each fragment)

      • OQF+Prune (OQF, BottomUpFB+Prune within each fragment)

      • DP (dynamic programming, i.e. no C&B)

        • also used for cost-evaluation in all the other strategies


    One star query no stratification applicable

    One Star Query (No Stratification Applicable)

    • BottomUpFB+Pruneoutperforms its full counterparts that do not use cost pruning:

      • total optimization time, for query size 6, and 4 views:

        • BottomUpFB+Prune : 6.4s

        • TopDownFB : 108.1 s

        • BottomUpFB : 82.2s


    Adding indexes

    Adding Indexes

    • Total optimization time (query size=6, views=2, indexes=3):

      • BottomUpFB+Prune : 10.7s

      • TopDownFB : 68.3 s

      • BottomUpFB : 63.6s

  • BottomUpFB+Prune can go to larger configurations:

    • (query size=6, views=4, indexes=4): 53.2s


  • Chain of two stars oqf stratification is applicable

    Chain of Two Stars (OQF Stratification is Applicable)

    • BottomUpFB+Prune is similar to OQF: 72s for query size 10 and 4 views

    • OQF+Prune scales best: only 8.6s (faster than DP !)

      • with dictionaries (indexes), it may miss good plans

    • DP becomes expensive at large queries: 14.6s for query size 10

      • DP’s performance directly affects BottomUpFB+Prune


    Is c b worthwhile

    Is C&B Worthwhile ?

    • Configuration: chain of two stars

    • Execution time measured with DB2 on a medium database(15,000 tuples)

      • unoptimized query vs. C&B optimized query (produced with BottomUpFB+Prune)


    Summary

    Summary

    • C&B integrates, flexibly, many aspects of logical and physical optimization.

    • Good theoretical foundations based on constraints and chase

    • The technique is practical (feasible + worthwhile)

      • cost-based pruning is good !

      • stratification is good !

      • even better when we can combine them


    Future work

    Future Work

    • Better stratification techniques:

      • complete for arbitrary constraints

      • complete, when combined with cost-based pruning

    • Other query languages:

      • union / disjunction

      • grouping / aggregates

      • bag and list semantics


    Chase backchase a versatile tool for query optimization

    The End


    Tableaux chase

    Tableaux Chase

    (r1R)(r2R) [ r1.B = r2.B 

    (r3R) r3.A = r1.A and r3.B = r1.B ]

    d

    d

    • Used to check equivalence of tableaux (conjunctive queries) under dependencies.

    • A chase step:

    Think conjunctive query syntax ...


    Interaction of indexes with views

    Interaction of Indexes with Views

    • (Levy et al ’95) For conjunctive queries and views:

      • finitely many minimal equivalent rewritings

    • However, the optimal plan may not be among them!

      • Scenario:

        • relations R(A,B), S(B, C) and view V = A (R  S)

        • input query Q = R  S

        • P = V  R  S is equivalent, but not minimal thrown away

        • but, if V is small and R has an index on A, then P can be better than Q !

    No interaction captured: indexes not in the language


    C b captures interaction of indexes with views

    C&B Captures Interaction of Indexes with Views

    Elim k

    selectstruct(A=r.A, B=r.B, C=s.C)

    from R r, S s, V v

    where r.B = s.B and v.A = r.A

    Elim v

    selectstruct(A=v.A, B=s.B, C=s.C)

    from V v, S s

    where IR [v.A].B = s.B

    selectstruct(A=r.A, B=r.B, C=s.C)

    from R r, S s

    where r.B = s.B

    P1

    P2

    • The index IRon A is explicit in our framework.

    • Chase Q with constraints describing V and IR to obtain U:

    Interaction! More minimal rewritings, incorporating indexes

    U = selectstruct(A=r.A, B=r.B, C=s.C)

    from R r, S s, V v, dom IR k

    where r.B = s.B and v.A = r.A and k = r.A and IR [k] = r


    Chase backchase a versatile tool for query optimization

    Path-Conjunctive Language (cont’d)

    The following query plan:

    selectstruct(PN: p.PName, PB: p.Budg, DN: p.PDept)

    fromSI[“Citibank”] p

    is not PC. However, the following is PC:

    selectstruct(PN: p.PName, PB: p.Budg, DN: p.PDept)

    fromdomSI k, SI[k] p

    where p = “CitiBank”

    In general, PC cannot express navigation OO queries, index-based joins or index-based selections.

    We will rediscover them when we translate PC queries into physical plans.


    Chase backchase a versatile tool for query optimization

    PC Containment

    Theorem(Containment). The PC containment problem, Q1Q2 (under all instances), is in NP (and equivalent to the existence of a PC-containment mapping).

    • Theorem (Containment under EPCDs). Let:

    • Da set of EPCDs

    • Q1, Q2PC queries s.t. some chasing sequence ofQ1 with D

    • terminates (withchase D (Q1)).

    • The following are equivalent :

    • (a)Q1 D Q2(b)chase D (Q1)  Q2


    Oqf on line query fragmentation

    OQF (On-line Query Fragmentation)

    INPUT: Query Q, Constraints C

    OQF

    OUTPUT:

    F1

    Fm

    Q =

    F2

    C1

    C2

    Cm

    C =

    C&B

    C&B

    C&B

    ...

    partial plans P1

    partial plans Pm

    ...

    All plans


    Oqf prune

    OQF + Prune

    INPUT: Query Q, Constraints C

    OQF

    OUTPUT:

    F1

    Fm

    Q =

    F2

    C1

    C2

    Cm

    C =

    BottomUpFB+ Prune

    BottomUpFB+ Prune

    ...

    BottomUpFB+ Prune

    partial plan P1

    partial plan Pm

    ...

    Plan P


  • Login