1 / 17

CSE 636 Data Integration

CSE 636 Data Integration. Answering Queries Using Views Bucket Algorithm. The Bucket Algorithm. Each subgoal g of Q must be “covered” by some view Make a list of candidates (buckets) per query subgoal Consider combinations of candidates from different buckets Not all combos are “compatible”

nyoko
Download Presentation

CSE 636 Data Integration

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CSE 636Data Integration Answering Queries Using Views Bucket Algorithm

  2. The Bucket Algorithm • Each subgoal g of Q must be “covered” by some view • Make a list of candidates (buckets) per query subgoal • Consider combinations of candidates from different buckets • Not all combos are “compatible” • Keep the compatible ones and minimize them • Discard the ones contained in another • Take their union

  3. The Bucket Algorithm q(X,Y,R) :- ForSale(X,Y,C,”auto”), Review(X,R,”auto”), Y > 1985 Step 1: For each subgoal, put the relevant sources into a bucket: V1(name, year) :- ForSale(name, year, “France”, “auto”), year > 1990 would be relevant V3(name, year) :- ForSale(name, year, “France”, “cheese”) would be irrelevant Step 2: Take the Cartesian product of the buckets Algorithm produces maximally contained rewriting Ignores interactions between subgoals in Step 1

  4. The Bucket Algorithm: Example V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr), course(Crs,Title), Crs ≥ 500, Qtr ≥ Aut98 V2(Std,Prof,Crs,Qtr):- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr) V3(Std,Crs):- reg(Std,Crs,Qtr), Qtr ≤ Aut94 V4(Prof,Crs,Title,Qtr):- reg(Std,Crs,Qtr), course(Crs,Title), teaches(Prof,Crs,Qtr), Qtr ≤ Aut97 q(S,C,P):- teaches(P,C,Q), reg(S,C,Q), course(C,T), C ≥ 300, Q ≥ Aut95 Step 1: For each query subgoal, put the relevant sources into a bucket

  5. The Bucket Algorithm: Example V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr), course(Crs,Title), Crs ≥ 500, Qtr ≥ Aut98 V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr) V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94 V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title), teaches(Prof,Crs,Qtr), Qtr ≤ Aut97 q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T), C ≥ 300, Q ≥ Aut95 PProf, CCrs, QQtr Note: Arithmetic predicates don’t pose a problem Buckets teaches reg course V2 V4

  6. The Bucket Algorithm: Example V1(Std,Crs,Qtr,Title):- reg(Std,Crs,Qtr), course(Crs,Title), Crs ≥ 500, Qtr ≥ Aut98 V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr) V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94 V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title), teaches(Prof,Crs,Qtr), Qtr ≤ Aut97 q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T), C ≥ 300, Q ≥ Aut95 SStd, CCrs, QQtr Note: V3 doesn’t work: arithmetic predicates not consistent V4 doesn’t work: S not in the output of V4 Buckets teaches reg course V2 V1 V4 V2

  7. The Bucket Algorithm: Example V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr), course(Crs,Title), Crs ≥ 500, Qtr ≥ Aut98 V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr) V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94 V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title), teaches(Prof,Crs,Qtr), Qtr ≤ Aut97 q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T), C ≥ 300, Q ≥ Aut95 CCrs, TTitle Buckets teaches reg course V2 V1 V1 V4 V2 V4

  8. The Bucket Algorithm: Example • Step 2: • Try all combos of views, one each from a bucket • Test satisfaction of arithmetic predicates in each case • e.g., two views may not overlap, i.e., they may be inconsistent • Desired rewriting = union of surviving ones • Query rewriting 1: • q1(S,C,P) :- V2(S’,P,C,Q), V1(S,C,Q,T’), V1(S”,C,Q’,T) • no problem from arithmetic predicates (none in V2) • May or may not be minimal (why?) teaches reg course V2 V1 V1 V4 V2 V4

  9. The Bucket Algorithm: Example • Unfolding of rewriting 1: • q1’(S,C,P) :- r(S’,C,Q), t(P,C,Q), r(S,C,Q), c(C,T’), r(S”,C,Q’), • c(C,T), C ≥ 500, Q ≥ Aut98, C ≥ 500, Q’ ≥ Aut98 • Black r’s can be mapped to green r:S’S, S”S, Q’Q • Black c can be mapped to green c: just extend above mapping to TT’ • Minimized unfolding of rewriting 1: • q1m’(S,C,P) :- t(P,C,Q), r(S,C,Q), c(C,T’), C ≥ 500, Q ≥ Aut98 • Minimized rewriting 1: • q1m(S,C,P) :- V2(S’,P,C,Q), V1(S,C,Q,T’)

  10. The Bucket Algorithm: Example teaches reg course • Query Rewriting 2: • q2(S,C,P) :- V2(S’,P,C,Q), V1(S,C,Q,T’), V4(P’,C,T,Q’) • q2’(S,C,P) :- r(S’,C,Q), t(P,C,Q), r(S,C,Q), • r(S,C,Q), c(C,T’), C ≥ 500, Q ≥ Aut98, • r(S”,C,Q’), c(C,T), t(P’,C,Q’), Q’ ≤ Aut97 • This combo is infeasible: consider the conjunction of arithmetic predicates in V1 and V4 • Query rewriting 3: • q3(S,C,P) :- V2(S’,P,C,Q), V2(S,P’,C,Q), V4(P”,C,T,Q’) V2 V1 V1 V4 V2 V4 teaches reg course V2 V1 V1 V4 V2 V4

  11. The Bucket Algorithm: Example • Unfolding of rewriting 3: • q3’(S,C,P) :- r(S’,C,Q), t(P,C,Q), r(S,C,Q), t(P’,C,Q), r(S”,C,Q’), • c(C,T), t(P”,C,Q’), Q’ ≤ Aut97 • The green subgoals can cover the black ones under the mapping: S’S, S”S, P’P, P”P, Q’Q • Minimized rewriting 3: • q3m(S,C,P) :- V2(S,P,C,Q), V4(P,C,T,Q) • Verify that there are only two rewritings that are not covered by others • Maximally Contained Rewriting: • q’ = q1m  q3m

  12. The Bucket Algorithm: Example 2 Query: q(X) :- cites(X,Y), cites(Y,X), sameTopic(X,Y) Views: V4(A) :- cites(A,B), cites(B,A) V5(C,D) :- sameTopic(C,D) V6(F,H) :- cites(F,G), cites(G,H), sameTopic(F,G) Note: Should we list V4(X) twice in the buckets? Buckets cites cites sameTopic V4 V4 V5 V6 V6 V6

  13. The Bucket Algorithm: Example 2 • Consider all combos & check for containment of the unfolded rewriting in Q • V4(X) cannot be combined with anything (why?) • Try q1(X) :- V4(X), V4(X), V5(X,Y) • Try q2(X) :- V4(X), V6(X,Y), V5(X,Y) • Does any of these work? • When can we discard a view from consideration?

  14. The Bucket Algorithm: Example 2 • Here is a successful rewriting: • q3(X) :- V6(X,Y), V6(X,Y), V6(X,Y) • By itself is not contained in Q • But, with subgoal X=Y added, it is! • By minimizing the rewriting, we get: • q3m(X,Y) :- V6(X,X)

  15. The Bucket Algorithm: Example 2 • Remarks: • V4 didn’t contribute to any rewrite, but the bucket algorithm doesn’t recognize it ahead • Consider:q2(X,Y) :- cites(X,Y), cites(Y,X) • Then both cites predicates can be folded into V4 • Not recognized by the bucket algorithm

  16. The State of Affairs • Bucket algorithm: • deals well with predicates, Cartesian product can be large (containment check required for every candidate rewriting) • Inverse rules: • modular (extensible to binding patterns, FD’s) • no treatment of predicates • resulting rewritings need significant further optimization Neither scales up • The MINICON algorithm: • change perspective: look at query variables

  17. References • Querying Heterogeneous Information Sources Using Source Descriptors • By Alon Y. Levy, Anand Rajaraman and Joann J. Ordille • VLDB, 1996 • Laks VS Lakshmanan • Lecture Slides • Alon Halevy • Answering Queries Using Views: A Survey • VLDB Journal, 2000 • http://citeseer.ist.psu.edu/halevy00answering.html

More Related