Loading in 2 Seconds...
Loading in 2 Seconds...
Chase & Backchase: A Versatile Tool for Query Optimization. Lucian Popa , Alin Deutsch, Arnaud Sahuguet, Val Tannen University of Pennsylvania. Logical vs. Physical. Unfortunately, limited cooperation !. A separation is necessary for physical data independence:. Network. Logical schema.
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Lucian Popa, Alin Deutsch, Arnaud Sahuguet, Val Tannen
University of Pennsylvania
Unfortunately, limited cooperation !
Network
Logical schema
Physical schema
Physical optimization
Logical optimization
Traditional
More “exotic”
Logical
Physical
Physical
Logical
Both ?!
Mixed
Use of materialized views
Semantic optimization
Use of indexes
Chase and Backchase (VLDB’99)
(ICDT’99, VLDB’99, recent improvements)
(SIGMOD’00, recent improvements)
Dept : Dict < Doid,
Struct { string DName;
Set <string> DProjs; }>
Logical Schema (ODMG syntax)
Proj : Set <
Struct { string PName;
string PDept;
string CustName;
string Budget;} >
class Dept (extent depts){
attribute string DName;
attribute Set<string> DProjs;}
selectstruct(PN: s, DN: d.DName)
from depts d , d.DProjs s
OO
selectstruct(PN: s, DN: Dept [d].DName)
fromdomDept d , Dept [d].DProjs s
Dict
“domain” as extent
“lookup” for oid dereferencing
(RIC1) (d domDept) (sDept [d].DProjs)
(p Proj) s = p.PName
(INV2) (p Proj) (d domDept)
p.PDept = Dept [d].DName
(s Dept [d].DProjs)p.Pname = s
+ two others …
Primary Index on PNameI : Dict <string, T>
Secondary Index on CustNameSI : Dict <string, Set < T >>
(where T is the type of tuples in Proj)
Describe semantic relationship between elements of the logical schema and elements of the physical schema.
One of the constraints for SI :
A user (logical) query
selectstruct(PN: s, PB: p.Budg, DN: Dept [d].DName)
fromdomDept d, Dept [d].DProjs s , Proj p
where s = p.PName and p.CustName = “CitiBank”
distinct!
distinct!
selectO(r1,…,rm)
from R1r1,…,Rmrm,
S1s1,…,Snsn
where B1(r1,…,rm)and
B2(r1,…,rm,s1,…,sn)
d
Let d be the constraint:
(r1R1) … (rmRm) [B1(r1,…,rm)
(s1S1) … (snSn) B2(r1,…,rm,s1,…,sn)
]
selectO(r1,…,rm)
from R1r1,…,Rmrm
where B1(r1,…,rm)
Added by chase
selectstruct(PN: s, DN: Dept [d].DName)
fromdomDept d , Dept [d].DProjs s ,
Proj p, JI j, dom SI k, SI [k] t, dom I i
where s = p.PNameand p.CustName = “CitiBank”
andDept[d].DName = p.PDeptand j.DOID = d and j.PN = p.PName
and p.CustName = k and p = t and i = p.PNameand p = I[i]
U:
eliminates scans top-down, as long as equivalence is preserved;
equivalence is verified by a (reverse) chase with applicable constraints (eg., INV2)
Universal Plan
U
BackChase
Chase
...
d’m
dn
...
...
dn
d1
d’1
d1
Q1
. . .
Qn
. . .
Minimal subqueries of U
More minimization possible ...
Original Query
Q0
Cost-based optimization still needed
Plan 1: Relation Scan
selectstruct(PN: p.PName, PB: p.Budg, DN: p.PDept)
fromProj p
where p.CustName = “CitiBank”
Plan 2: Secondary Index Lookup
Different access paths, but equivalent
selectstruct(PN: p.PName, PB: p.Budg, DN: p.PDept)
fromSI[“Citibank”] p
Plan 3: Using the Join Index and the Primary Index
selectstruct(PN: j.JPN, PB: I[j.JPN].Budg, DN: Dept[j.JDoid].DName)
fromJI j
whereI[j.JPN].CustName = “CitiBank”
Path-Conjunctive Language
Paths: P ::= x | c | R | P.A | dom P | P[x]
Path-Conjunctions: B ::= P1 = P1’ and ... and Pk = Pk’
Path-Conjunctive (PC) Queries:
selectstruct (A1: P1’, ..., An: Pn’)
from P1 x1, ..., Pm xm
whereB
Embedded PC Dependencies (EPCDs):
(r1P1) … (rmPm) [ B1(r1,…,rm)
(s1P’1) … (snP’n) B2(r1,…,rm,s1 ,…, sn) ]
Still very expressive!
Completeness of C&B (continued)
Theorem (Completeness)
Let Q be a PC query such that some chasing sequence of Q withD terminates (with chaseD(Q)). Then:
(a) chase C(chase D (Q))terminates
(b) any scan-minimal candidate plan equivalent to Q under D is a subquery of chase C(chase D (Q))
Strengthens results of Levy at al, new search space
Want to
Cost monotonicity assumption (made implicitly earlier): subqueries are cheaper
Hence the top-down backchase cannot exploit cost!
Instead, a bottom-up backchase enumerates queries
built from the scans of the universal plan, checking for
equivalence with the universal plan and
cost minimality
and pruning all superqueries
Scans of U:
Universal Plan
S1, S2, S3, S4
U
Chase
...
...
S1, S3, S4
S1, S2, S3
S1, S2, S4
dn
subqueries of size 2
S1, S3
S1, S4
...
S1, S2
S3, S4
d1
subqueries of size 1
S1
S2
S3
S4
Original Query
Q0
Equivalent to U ?
If YES then minimal rewriting.
Becomes best plan so far.
Prune all superqueries
Cost higher than the min cost so far ?
If YES then prune it together with all superqueries
Query
Cost information
Logical schema
=
relations + OO classes
Candidate plan q
Rewriting with logical and physical constraints (C&B)
Cost-based pruning
Cost-based
optimization
Physical plan and cost for q
Physical schema
=
views + indexes
Best physical plan
generalized dynamic programming technique
interaction between semantic optimization and path indexes
V11
V21
V12
V22
key constraints
S11
S21
R1
R2
S22
S12
S13
S23
V11
V21
S11
S21
R1
R2
S22
S12
V12
V22
S13
S23
backchase,
no cost
chase
Plans (minimal
subqueries)
Query
UP
Full backchase(FB):
Complexity: 2k1+ … + 2km << 2k1+…+km
(r1R)(r2R) [ r1.B = r2.B
(r3R) r3.A = r1.A and r3.B = r1.B ]
d
d
Think conjunctive query syntax ...
No interaction captured: indexes not in the language
Elim k
selectstruct(A=r.A, B=r.B, C=s.C)
from R r, S s, V v
where r.B = s.B and v.A = r.A
Elim v
selectstruct(A=v.A, B=s.B, C=s.C)
from V v, S s
where IR [v.A].B = s.B
selectstruct(A=r.A, B=r.B, C=s.C)
from R r, S s
where r.B = s.B
P1
P2
Interaction! More minimal rewritings, incorporating indexes
U = selectstruct(A=r.A, B=r.B, C=s.C)
from R r, S s, V v, dom IR k
where r.B = s.B and v.A = r.A and k = r.A and IR [k] = r
Path-Conjunctive Language (cont’d)
The following query plan:
selectstruct(PN: p.PName, PB: p.Budg, DN: p.PDept)
fromSI[“Citibank”] p
is not PC. However, the following is PC:
selectstruct(PN: p.PName, PB: p.Budg, DN: p.PDept)
fromdomSI k, SI[k] p
where p = “CitiBank”
In general, PC cannot express navigation OO queries, index-based joins or index-based selections.
We will rediscover them when we translate PC queries into physical plans.
Theorem(Containment). The PC containment problem, Q1Q2 (under all instances), is in NP (and equivalent to the existence of a PC-containment mapping).
INPUT: Query Q, Constraints C
OQF
OUTPUT:
F1
…
Fm
Q =
F2
C1
C2
…
Cm
C =
C&B
C&B
C&B
...
partial plans P1
partial plans Pm
...
All plans
INPUT: Query Q, Constraints C
OQF
OUTPUT:
F1
…
Fm
Q =
F2
C1
C2
…
Cm
C =
BottomUpFB+ Prune
BottomUpFB+ Prune
...
BottomUpFB+ Prune
partial plan P1
partial plan Pm
...
Plan P