Chase backchase a versatile tool for query optimization
1 / 48

Chase & Backchase: A Versatile Tool for Query Optimization - PowerPoint PPT Presentation

  • Uploaded on

Chase & Backchase: A Versatile Tool for Query Optimization. Lucian Popa , Alin Deutsch, Arnaud Sahuguet, Val Tannen University of Pennsylvania. Logical vs. Physical. Unfortunately, limited cooperation !. A separation is necessary for physical data independence:. Network. Logical schema.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'Chase & Backchase: A Versatile Tool for Query Optimization' - dunn

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Chase backchase a versatile tool for query optimization

Chase & Backchase:A Versatile Tool for Query Optimization

Lucian Popa, Alin Deutsch, Arnaud Sahuguet, Val Tannen

University of Pennsylvania

Logical vs physical
Logical vs. Physical

Unfortunately, limited cooperation !

  • A separation is necessary for physical data independence:


Logical schema

Physical schema

Physical optimization

Logical optimization

Optimization techniques
Optimization Techniques


More “exotic”

  • Some rewriting (unnesting)

  • Acces path (index) use (ad-hoc)

  • Join ordering (dynamic programming)

  • Semantic optimization (more rewriting)

  • Use of materialized views (rewriting, too)

  • Object-oriented techniques(everything above, but OO! )





Both ?!


  • Limited cooperation, so far, because of :

    • different foundations

    • different affiliation: logical or physical


Use of materialized views

  • What is needed: interaction between all these techniques.

    • can produce better plans, by enabling each other:

  • This work connects together some of these techniques, by finding a common foundation

Semantic optimization

Use of indexes

In a coco nutshell
In a (Coco)Nutshell

  • The Unifier: Constraints (Dependencies):

    • logical constraints:

      • semantic relationships among elements of the logical schema

    • physical constraints:

      • semantic relationships between elements of the physicalschema and elements of the logical schema

  • Chase with constraints produces the universal plan UP:

    • UP incorporates all relevant access paths

    • subqueries of UP provide the search space (candidate plans)

  • Backchase:

    • search among candidate plans for scan-minimal queries, checking for equivalence using (reverse) chase

    • search space can be pruned using cost-based optimization

Talk outline
Talk Outline

  • Motivation and Overview

  • Logical and Physical Constraints,

    Chase and Backchase (VLDB’99)

  • Theoretical Results

    (ICDT’99, VLDB’99, recent improvements)

  • Using Cost Information

  • Experimental Results

    (SIGMOD’00, recent improvements)

  • Conclusion

Chase backchase a versatile tool for query optimization

fresh base type for oids

Dept : Dict < Doid,

Struct { string DName;

Set <string> DProjs; }>

Logical Schema (ODMG syntax)

Proj : Set <

Struct { string PName;

string PDept;

string CustName;

string Budget;} >

class Dept (extent depts){

attribute string DName;

attribute Set<string> DProjs;}

  • To formally describe classes and their operations: use dictionaries (finite functions).

Translating oo queries into dictionary form
Translating OO Queries into Dictionary Form

selectstruct(PN: s, DN: d.DName)

from depts d , d.DProjs s


selectstruct(PN: s, DN: Dept [d].DName)

fromdomDept d , Dept [d].DProjs s


“domain” as extent

“lookup” for oid dereferencing

  • Dictionary operations: dom M, M[k]

  • Constraints are translated in a similar way

Logical schema constraints
Logical Schema Constraints

  • Describe semantic relationships among elements of the logical schema

  • Example: an inverse relationship between Proj and Dept:

(RIC1) (d domDept) (sDept [d].DProjs)

 (p  Proj) s = p.PName

(INV2) (p Proj) (d domDept)

p.PDept = Dept [d].DName 

 (s Dept [d].DProjs)p.Pname = s

+ two others …

Physical schema
Physical Schema

  • Two indexes for relation Proj :

Primary Index on PNameI : Dict <string, T>

Secondary Index on CustNameSI : Dict <string, Set < T >>

(where T is the type of tuples in Proj)

  • A materialized view (a la “join index”) :

  • JI : Set< Struct{DoidJDoid; string JPN} >

  • JI = select struct (JDoid: d, JPN: p.PName)

  • fromdomDept d, Dept [d].DProjs s, Proj p

  • where s = p.PName

Physical schema constraints
Physical Schema Constraints

Describe semantic relationship between elements of the logical schema and elements of the physical schema.

One of the constraints for SI :

  • (SI1) (p  Proj) (k  domSI)

  • (t  SI[k]) [ k = p.CustName andp = t ]

  • One of the constraints for JI :

  • (JI1) (d domDept) (s Dept [d].DProjs) (p Proj)

  • [ s = p. PName

  • ( j JI) j.JDoid = dand j.JPN = p.Pname

  • ]

  • An example of interaction in optimization
    An Example of Interaction in Optimization

    A user (logical) query

    selectstruct(PN: s, PB: p.Budg, DN: Dept [d].DName)

    fromdomDept d, Dept [d].DProjs s , Proj p

    where s = p.PName and p.CustName = “CitiBank”

    • The query is chased with all the applicable logical and physical constraints (eg., RIC1, SI1, JI1)

    • The result of the chase is the universal plan.

    The generalized chase
    The Generalized Chase




    from R1r1,…,Rmrm,


    where B1(r1,…,rm)and



    Let d be the constraint:

    (r1R1) … (rmRm) [B1(r1,…,rm)

    (s1S1) … (snSn) B2(r1,…,rm,s1,…,sn)



    from R1r1,…,Rmrm

    where B1(r1,…,rm)

    A universal plan
    A Universal Plan

    Added by chase

    selectstruct(PN: s, DN: Dept [d].DName)

    fromdomDept d , Dept [d].DProjs s ,

    Proj p, JI j, dom SI k, SI [k] t, dom I i

    where s = p.PNameand p.CustName = “CitiBank”

    andDept[d].DName = p.PDeptand j.DOID = d and j.PN = p.PName

    and p.CustName = k and p = t and i = p.PNameand p = I[i]


    • U gathers those elements from both logical and physical schema, that are relevant for alternative implementations

    • U is redundant

    Top down backchase minimization
    (Top-Down) Backchase Minimization

    • Eliminates the redundancies from the universal plan

    • Enumerates subqueries of UP:

      eliminates scans top-down, as long as equivalence is preserved;

      equivalence is verified by a (reverse) chase with applicable constraints (eg., INV2)

    • Outputs several scan-minimal subqueries

    Chase top down backchase
    Chase & (Top-Down) Backchase

    Universal Plan














    . . .


    . . .

    Minimal subqueries of U

    More minimization possible ...

    Original Query


    Cost-based optimization still needed

    Some of the generated candidate plans
    Some of the Generated Candidate Plans

    Plan 1: Relation Scan

    selectstruct(PN: p.PName, PB: p.Budg, DN: p.PDept)

    fromProj p

    where p.CustName = “CitiBank”

    Plan 2: Secondary Index Lookup

    Different access paths, but equivalent

    selectstruct(PN: p.PName, PB: p.Budg, DN: p.PDept)

    fromSI[“Citibank”] p

    Plan 3: Using the Join Index and the Primary Index

    selectstruct(PN: j.JPN, PB: I[j.JPN].Budg, DN: Dept[j.JDoid].DName)

    fromJI j

    whereI[j.JPN].CustName = “CitiBank”

    Talk outline1
    Talk Outline

    • Motivation and Overview

    • Constraints, Chase and Backchase

    • Theoretical Results

    • Using Cost

    • Experimental Results

    • Conclusion

    Chase backchase a versatile tool for query optimization

    No set/dictionary type here

    Path-Conjunctive Language

    Paths: P ::= x | c | R | P.A | dom P | P[x]

    Path-Conjunctions: B ::= P1 = P1’ and ... and Pk = Pk’

    Path-Conjunctive (PC) Queries:

    selectstruct (A1: P1’, ..., An: Pn’)

    from P1 x1, ..., Pm xm


    Embedded PC Dependencies (EPCDs):

    (r1P1) … (rmPm) [ B1(r1,…,rm) 

    (s1P’1) … (snP’n) B2(r1,…,rm,s1 ,…, sn) ]

    Still very expressive!

    Overview of pc query containment results
    Overview of PC Query Containment Results

    • The chase is also complete for EPCD implication

    • Strengthen results of Chandra & Merlin, Beeri & Vardi (70’s & 80’s)

    Completeness of c b
    Completeness of C&B

    • Assume the following:

    • Logical constraints: Da set of EPCDs

    • Physical schema:

      • primary indexes

      • materialized views that are PC queries (includes join indexes and access suport relations)

      • access structures representable as dictionary expressions with PC query domain and entry (includes secondary indexes and gmaps [Tsatalos et al])

    • Physical constraints: C(eg., SI1, JI1)

    • Cis a set ofEPCDs. (In general the constraints in C are not full.)

    Chase backchase a versatile tool for query optimization

    Completeness of C&B (continued)

    Theorem (Completeness)

    Let Q be a PC query such that some chasing sequence of Q withD terminates (with chaseD(Q)). Then:

    (a) chase C(chase D (Q))terminates

    (b) any scan-minimal candidate plan equivalent to Q under D is a subquery of chase C(chase D (Q))

    Strengthens results of Levy at al, new search space

    Talk outline2
    Talk Outline

    • Motivation and Overview

    • Constraints, Chase and Backchase

    • Theoretical Results

    • Interaction with Cost-Based Optimization

    • Experimental Results

    • Conclusion

    Using cost
    Using Cost

    Want to

    • reduce the size of the backchase search space

    • output a scan-minimal subquery that is also cost-minimal

      Cost monotonicity assumption (made implicitly earlier): subqueries are cheaper

      Hence the top-down backchase cannot exploit cost!

      Instead, a bottom-up backchase enumerates queries

      built from the scans of the universal plan, checking for

      equivalence with the universal plan and

      cost minimality

      and pruning all superqueries

    Bottom up backchase with cost based pruning
    Bottom-Up Backchase with Cost-Based Pruning

    Scans of U:

    Universal Plan

    S1, S2, S3, S4





    S1, S3, S4

    S1, S2, S3

    S1, S2, S4


    subqueries of size 2

    S1, S3

    S1, S4


    S1, S2

    S3, S4


    subqueries of size 1





    Original Query


    Equivalent to U ?

    If YES then minimal rewriting.

    Becomes best plan so far.

    Prune all superqueries

    Cost higher than the min cost so far ?

    If YES then prune it together with all superqueries

    Interaction with a cost based component
    Interaction with a Cost-Based Component


    Cost information

    Logical schema


    relations + OO classes

    Candidate plan q

    Rewriting with logical and physical constraints (C&B)

    Cost-based pruning



    Physical plan and cost for q

    Physical schema


    views + indexes

    Best physical plan

    Cost based optimization of a candidate plan the usual suspects
    Cost-Based Optimization of a Candidate Plan(the usual suspects)

    • Scan order (“join reordering”)

      generalized dynamic programming technique

      • beyond relational

      • global”, i.e., interacting with backchase pruning

    • Placement of selections and projections

      • because the chase works with queries in “normalized” form

    • Join methods: nested loops, index join , others

    Talk outline3
    Talk Outline

    • Motivation and Overview

    • Chase and Backchase (as in proposal): Example of candidate plans enumeration

    • Chase and Backchase (as in proposal): Completeness of path-conjunctive (PC) case

    • Cost-based C&B

    • Experimental results

    • Future Research and Conclusion

    Is the c b technique practical
    Is the C&B Technique Practical ?

    • Is the chase feasible ?

      • tries exponentially many mappings for each step

    • Is the backchase feasible ?

      • explores exponentially many subqueries of UP

      • what is the effect of cost-based pruning ?

    • Is it worthwhile ?

      • does the quality of the generated plans outweigh the cost of optimization ?


    • We have developed a prototype in Java

    • Several experimental configurations that:

      • cover important practical cases

      • are scalable

  • Not shown here:

    • Exper. Config. 1 : relational queries and indexes.

    • Exper. Config. 3 : OO techniques

      interaction between semantic optimization and path indexes

  • Experimental configuration 2 relational views
    Experimental Configuration 2 : Relational Views





    key constraints

    • Input query: chain of stars

    • Schemas: views and key constraints

      • In addition: indexes on the corner relations

    • Scale-up parameters:

      • # stars, size of stars, # views/star, #indexes/star









    • Interaction between semantic optimization and views

      • No rewriting with views in the absence of key constraints

    Time to chase
    Time to Chase

    • Chasing alone is fast !

    • For queries with more than 15 joins and more than 15 constraints it takes seconds















    no cost


    Plans (minimal




    Full backchase(FB):

    • For UP of size 12-15 joins, FB (top-down or bottom-up) may become impractical

    • OQF (On-line Query Fragmentation):UP can be decomposed, prior to backchase, into smaller univ. plans.

    Complexity: 2k1+ … + 2km << 2k1+…+km

    Backchase strategies
    Backchase Strategies

    • We compare several C&B optimizers obtained by combining in various ways:

      • top-down/bottom-up full backchase

      • cost-based pruning

      • stratification

      • TopDownFB (top-down full, no cost pruning)

      • BottomUpFB (bottom-up full, no cost pruning)

      • BottomUpFB+Prune (bottom-up full, cost pruning)

      • OQF (OQF fragmentation, BottomUpFB within each fragment)

      • OQF+Prune (OQF, BottomUpFB+Prune within each fragment)

      • DP (dynamic programming, i.e. no C&B)

        • also used for cost-evaluation in all the other strategies

    One star query no stratification applicable
    One Star Query (No Stratification Applicable)

    • BottomUpFB+Pruneoutperforms its full counterparts that do not use cost pruning:

      • total optimization time, for query size 6, and 4 views:

        • BottomUpFB+Prune : 6.4s

        • TopDownFB : 108.1 s

        • BottomUpFB : 82.2s

    Adding indexes
    Adding Indexes

    • Total optimization time (query size=6, views=2, indexes=3):

      • BottomUpFB+Prune : 10.7s

      • TopDownFB : 68.3 s

      • BottomUpFB : 63.6s

  • BottomUpFB+Prune can go to larger configurations:

    • (query size=6, views=4, indexes=4): 53.2s

  • Chain of two stars oqf stratification is applicable
    Chain of Two Stars (OQF Stratification is Applicable)

    • BottomUpFB+Prune is similar to OQF: 72s for query size 10 and 4 views

    • OQF+Prune scales best: only 8.6s (faster than DP !)

      • with dictionaries (indexes), it may miss good plans

    • DP becomes expensive at large queries: 14.6s for query size 10

      • DP’s performance directly affects BottomUpFB+Prune

    Is c b worthwhile
    Is C&B Worthwhile ?

    • Configuration: chain of two stars

    • Execution time measured with DB2 on a medium database(15,000 tuples)

      • unoptimized query vs. C&B optimized query (produced with BottomUpFB+Prune)


    • C&B integrates, flexibly, many aspects of logical and physical optimization.

    • Good theoretical foundations based on constraints and chase

    • The technique is practical (feasible + worthwhile)

      • cost-based pruning is good !

      • stratification is good !

      • even better when we can combine them

    Future work
    Future Work

    • Better stratification techniques:

      • complete for arbitrary constraints

      • complete, when combined with cost-based pruning

    • Other query languages:

      • union / disjunction

      • grouping / aggregates

      • bag and list semantics

    Tableaux chase
    Tableaux Chase

    (r1R)(r2R) [ r1.B = r2.B 

    (r3R) r3.A = r1.A and r3.B = r1.B ]



    • Used to check equivalence of tableaux (conjunctive queries) under dependencies.

    • A chase step:

    Think conjunctive query syntax ...

    Interaction of indexes with views
    Interaction of Indexes with Views

    • (Levy et al ’95) For conjunctive queries and views:

      • finitely many minimal equivalent rewritings

    • However, the optimal plan may not be among them!

      • Scenario:

        • relations R(A,B), S(B, C) and view V = A (R  S)

        • input query Q = R  S

        • P = V  R  S is equivalent, but not minimal thrown away

        • but, if V is small and R has an index on A, then P can be better than Q !

    No interaction captured: indexes not in the language

    C b captures interaction of indexes with views
    C&B Captures Interaction of Indexes with Views

    Elim k

    selectstruct(A=r.A, B=r.B, C=s.C)

    from R r, S s, V v

    where r.B = s.B and v.A = r.A

    Elim v

    selectstruct(A=v.A, B=s.B, C=s.C)

    from V v, S s

    where IR [v.A].B = s.B

    selectstruct(A=r.A, B=r.B, C=s.C)

    from R r, S s

    where r.B = s.B



    • The index IRon A is explicit in our framework.

    • Chase Q with constraints describing V and IR to obtain U:

    Interaction! More minimal rewritings, incorporating indexes

    U = selectstruct(A=r.A, B=r.B, C=s.C)

    from R r, S s, V v, dom IR k

    where r.B = s.B and v.A = r.A and k = r.A and IR [k] = r

    Chase backchase a versatile tool for query optimization

    Path-Conjunctive Language (cont’d)

    The following query plan:

    selectstruct(PN: p.PName, PB: p.Budg, DN: p.PDept)

    fromSI[“Citibank”] p

    is not PC. However, the following is PC:

    selectstruct(PN: p.PName, PB: p.Budg, DN: p.PDept)

    fromdomSI k, SI[k] p

    where p = “CitiBank”

    In general, PC cannot express navigation OO queries, index-based joins or index-based selections.

    We will rediscover them when we translate PC queries into physical plans.

    Chase backchase a versatile tool for query optimization

    PC Containment

    Theorem(Containment). The PC containment problem, Q1Q2 (under all instances), is in NP (and equivalent to the existence of a PC-containment mapping).

    • Theorem (Containment under EPCDs). Let:

    • Da set of EPCDs

    • Q1, Q2PC queries s.t. some chasing sequence ofQ1 with D

    • terminates (withchase D (Q1)).

    • The following are equivalent :

    • (a)Q1 D Q2(b)chase D (Q1)  Q2

    Oqf on line query fragmentation
    OQF (On-line Query Fragmentation)

    INPUT: Query Q, Constraints C





    Q =





    C =





    partial plans P1

    partial plans Pm


    All plans

    Oqf prune
    OQF + Prune

    INPUT: Query Q, Constraints C





    Q =





    C =

    BottomUpFB+ Prune

    BottomUpFB+ Prune


    BottomUpFB+ Prune

    partial plan P1

    partial plan Pm


    Plan P