A Scalable Algorithm for Answering Queries Using Views

A Scalable Algorithm for Answering Queries Using Views. Rachel Pottinger, Qualifying Exam, October 29, 1999. Advisor: Alon Levy. Answering queries using views: access views instead of the original relations; useful in data integration and query optimization; NP-complete.


Presentation Transcript


  1. A Scalable Algorithm for Answering Queries Using Views. Rachel Pottinger, Qualifying Exam, October 29, 1999. Advisor: Alon Levy

  2. Answering Queries Using Views • Problem: access views instead of original relations • Useful in data integration and query optimization • NP-Complete • Many papers on the subject • No empirical testing of algorithms

  3. Data Integration: Query Reformulation • Data sources are pre-calculated views • Views are not complete • Get the most answers possible given the views • Many data sources • Example sources of car sale information: Ford cars (dealer prices, sticker prices, inventory); Cheap cars (prices, manufacturer); Used cars (prices, dealer, year)

  4. Data Integration Example
     • Query: find the prices of cars that we can buy at cost
       Q(cost) :- dealercost(car, cost) & stickerprice(car, cost)
     • Views over the database relations dealercost, stickerprice, maker, and cheap:
       V1(price1, price2) :- dealercost(car, price1) & stickerprice(car, price2) & maker(car, “Ford”)
       V2(cost) :- dealercost(car, cost) & stickerprice(car, cost) & cheap(car)
     • Head variables are distinguished; car, which appears only in the view bodies, is existential
     • Conjunctive rewritings:
       Q’1(cost) :- V1(cost, cost)
       Q’2(cost) :- V2(cost)
     • Their union is the maximally contained rewriting

  5. Outline • Previous algorithms • Bucket Algorithm [Levy, Rajaraman, Ordille, 1996] • Inverse rules [Duschka, Genesereth, 1997] • Minimum Necessary Connections (MiniCon) Algorithm • Experimental evaluation • Extension to arithmetic comparisons • Conclusions and future work

  6. The Bucket Algorithm • Introduced as part of the Information Manifold • Treats subgoals individually

  7. Bucket Algorithm: Populating Buckets
     • For each subgoal in the query, place the relevant views in the subgoal’s bucket
     • Inputs:
       Q(x) :- r1(x,y) & r2(y,x)
       V1(a) :- r1(a,b)
       V2(d) :- r2(c,d)
       V3(f) :- r1(f,g) & r2(g,f)
     • Buckets:
       r1(x,y): V1(x), V3(x)
       r2(y,x): V2(x), V3(x)

  8. Combining Buckets
     • For every combination in the Cartesian product of the buckets, check containment in the query; the Bucket Algorithm checks all possible combinations
     • Buckets:
       r1(x,y): V1(x), V3(x)
       r2(y,x): V2(x), V3(x)
     • Candidate rewritings:
       Q’1(x) :- V1(x) & V2(x)   not contained: V1 and V2 project out b and c, so nothing forces them to agree on y
       Q’2(x) :- V1(x) & V3(x)   contained
       Q’3(x) :- V3(x) & V2(x)   contained
       Q’4(x) :- V3(x) & V3(x)   contained
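A sketch of the combination step, under the same kind of hypothetical tuple encoding: `itertools.product` enumerates the candidate rewritings, and `expand` (an illustrative helper, not from the talk) unfolds each view atom into its definition with fresh names for existential variables. Each expansion would then be tested for containment in Q; that test is a separate step and is not shown here.

```python
from itertools import count, product

# Hypothetical encoding: view name -> (head variables, body atoms).
VIEWS = {
    "V1": (("a",), [("r1", ("a", "b"))]),
    "V2": (("d",), [("r2", ("c", "d"))]),
    "V3": (("f",), [("r1", ("f", "g")), ("r2", ("g", "f"))]),
}
# Buckets from the slide: r1(x,y) -> V1(x), V3(x); r2(y,x) -> V2(x), V3(x).
BUCKETS = [
    [("V1", ("x",)), ("V3", ("x",))],
    [("V2", ("x",)), ("V3", ("x",))],
]


def candidate_rewritings(buckets):
    """Each element of the Cartesian product of the buckets is one candidate body."""
    return [list(combo) for combo in product(*buckets)]


def expand(body, views):
    """Unfold each view atom into its definition, renaming existentials apart."""
    fresh = count()
    expansion = []
    for name, args in body:
        head, view_body = views[name]
        subst = dict(zip(head, args))  # head variables take the atom's arguments
        for pred, vars_ in view_body:
            for v in vars_:
                if v not in subst:
                    subst[v] = f"_e{next(fresh)}"  # fresh existential variable
            expansion.append((pred, tuple(subst[v] for v in vars_)))
    return expansion


for i, body in enumerate(candidate_rewritings(BUCKETS), start=1):
    print(f"Q'{i}:", body, "->", expand(body, VIEWS))
```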

  9. Inverse Rules • Part of the Infomaster system • Inverse rules show how to get database tuples from the views • Cannot be extended to interpreted predicates • Stops earlier than the Bucket Algorithm

  10. Creating Inverse Rules
     • For each view V(X) :- r1(X1) & … & rn(Xn) and each j = 1, …, n, form an inverse rule rj(Xj) :- V(X), replacing each existential variable in Xj with a Skolem function of X
     • Inputs:
       V1(a) :- r1(a,b)
       V2(d) :- r2(c,d)
       V3(f) :- r1(f,g) & r2(g,f)
     • Inverse rules (sfV1, sfV2, sfV3 are Skolem functions):
       IR1: r1(a, sfV1(a)) :- V1(a)
       IR2: r2(sfV2(d), d) :- V2(d)
       IR3: r1(f, sfV3(f)) :- V3(f)
       IR4: r2(sfV3(f), f) :- V3(f)

  11. Combining Inverse Rules
     • At query time, evaluate the query over the rules
     • Inverse rules IR1 to IR4 applied to the view tuples V1(g), V2(h), V3(j), V3(m) give the expansion:
       r1(g, sfV1(g)), r2(sfV2(h), h), r1(j, sfV3(j)), r2(sfV3(j), j), r1(m, sfV3(m)), r2(sfV3(m), m)
     • Evaluating Q(x) :- r1(x,y) & r2(y,x) over the expansion yields x = j and x = m; g is not an answer because no r2 fact starts with sfV1(g)
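A minimal sketch of evaluating the query over the inverse rules, assuming the rules IR1 to IR4 are hand-encoded as Python functions (a hypothetical encoding) and a Skolem term sfV(c) is represented by the tuple ("sfV", c). Answers that contain Skolem terms are discarded, since they do not correspond to real constants.

```python
# View extensions from the slide; g, h, j, m are constants here.
VIEW_TUPLES = {"V1": [("g",)], "V2": [("h",)], "V3": [("j",), ("m",)]}

# IR1-IR4, hand-encoded (hypothetical encoding): (predicate, view, head builder).
# A Skolem term sfV(c) is represented by the tuple ("sfV", c).
INVERSE_RULES = [
    ("r1", "V1", lambda a: (a, ("sfV1", a))),  # IR1
    ("r2", "V2", lambda d: (("sfV2", d), d)),  # IR2
    ("r1", "V3", lambda f: (f, ("sfV3", f))),  # IR3
    ("r2", "V3", lambda f: (("sfV3", f), f)),  # IR4
]


def expand_with_rules(rules, view_tuples):
    """Apply every inverse rule to every tuple of its view."""
    facts = {}
    for pred, view, build in rules:
        for t in view_tuples.get(view, []):
            facts.setdefault(pred, set()).add(build(*t))
    return facts


def answer_query(facts):
    """Evaluate Q(x) :- r1(x, y) & r2(y, x) over the expanded facts."""
    answers = {
        x
        for (x, y) in facts.get("r1", set())
        for (y2, x2) in facts.get("r2", set())
        if y == y2 and x == x2
    }
    # Answers containing Skolem terms are not real constants, so drop them.
    return {x for x in answers if not isinstance(x, tuple)}


facts = expand_with_rules(INVERSE_RULES, VIEW_TUPLES)
print(sorted(answer_query(facts)))
```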

  12. Unfolding Rules Before Tuples • For Q(x) :- r1(x,y) & r2(y,x), the rules for r1 are IR1 and IR3, and the rules for r2 are IR2 and IR4 • Use unification to see if a rewriting is contained in the query • No containment check necessary

  13. The MiniCon Algorithm • Concentrate on variables rather than subgoals to create MiniCon Descriptions (MCDs) • Combine MCDs that only overlap on distinguished view variables • No containment check!

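  14. MiniCon Description Formation
     • Form all MiniCon Descriptions (MCDs) that map all query variables that have to be mapped together
     • Inputs:
       Q(x) :- r1(x,y) & r2(y,x)
       V1(a) :- r1(a,b)
       V2(d) :- r2(c,d)
       V3(f) :- r1(f,g) & r2(g,f)
     • MCDs:
       view | mapping      | subgoals mapped
       V3   | x → f, y → g | 1, 2
     • V1 and V2 yield no MCD: each maps y to an existential view variable (b or c), so the MCD would also have to cover the other query subgoal containing y, which that view lacks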

  15. MiniCon Combination
     • Take all combinations of MCDs that map disjoint sets of subgoals and together map all subgoals of the query
     • MCDs:
       view | mapping      | subgoals mapped
       V3   | x → f, y → g | 1, 2
     • Rewriting: Q’(x) :- V3(x)
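The combination step can be sketched as follows, starting from the MCD on the slide. The names are hypothetical. The sketch checks disjointness by comparing the summed sizes of the covered subgoal sets with the size of their union, and names each view head variable after the query variable mapped to it; in line with the claim that MiniCon needs no containment check, none is performed.

```python
from itertools import combinations

# The single MCD from the slide (subgoals are 0-based here): V3 with x -> f, y -> g.
MCDS = [("V3", {"x": "f", "y": "g"}, frozenset({0, 1}))]
VIEW_HEADS = {"V3": ("f",)}
NUM_SUBGOALS = 2


def combine_mcds(mcds, num_subgoals):
    """Yield MCD sets whose subgoal sets are pairwise disjoint and cover the query."""
    goals = frozenset(range(num_subgoals))
    for k in range(1, num_subgoals + 1):
        for combo in combinations(mcds, k):
            sets = [covered for _, _, covered in combo]
            union = frozenset().union(*sets)
            if union == goals and sum(len(s) for s in sets) == len(union):
                yield combo


def to_view_atom(mcd, view_heads):
    """Name each view head variable after the query variable mapped to it."""
    name, mapping, _ = mcd
    inverse = {}
    for q, v in mapping.items():
        inverse.setdefault(v, q)
    # Unmatched head variables get fresh (hypothetical) names.
    return (name, tuple(inverse.get(v, f"_{name}_{v}") for v in view_heads[name]))


for combo in combine_mcds(MCDS, NUM_SUBGOALS):
    body = [to_view_atom(m, VIEW_HEADS) for m in combo]
    print("Q'(x) :-", " & ".join(f"{n}({', '.join(a)})" for n, a in body))
```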

  16. Experimental Evaluation • Tested the performance and scale-up of the Bucket Algorithm, Inverse Rules extended with unification, and the MiniCon Algorithm • MiniCon at least as good in all cases, much better in some • Results shown for chain queries: Q(a) :- r1(a,b), r2(b,c), r3(c,d), r4(d,e)

  17. Many Rewritings

  18. Few rewritings, very structured query and views

  19. Few rewritings, less structured views

  20. Extension: Interpreted Predicates • Problem is in general undecidable • We looked at subgoals of the form var < constant or var > constant • If a query variable in such a subgoal maps to an existential view variable, require that the view's interpreted predicates imply the query's • Ex: Q(x) :- r1(x,y), y > 17 and V1(a) :- r1(a,b), b > 18; y maps to the existential b, and b > 18 implies b > 17, so Q’(x) :- V1(x) is a valid rewriting • Guaranteed to be sound
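A sketch of the implication test for single-variable comparisons of the form var < constant and var > constant. The function names and the dictionary encoding are assumptions. The test is deliberately conservative: it only answers True when one view comparison is at least as strong as the query's, so a True answer is sound. Skipping comparisons on distinguished view variables assumes those comparisons can instead be applied to the rewriting's output.

```python
def implies(view_cmps, query_cmp):
    """Do the view's comparisons on one variable imply query_cmp?

    Comparisons are (op, constant) with op in {"<", ">"}. The check is
    conservative (e.g. it ignores unsatisfiable view constraints), so a
    True answer is always sound.
    """
    op, c = query_cmp
    if op == ">":
        return any(v_op == ">" and v_c >= c for v_op, v_c in view_cmps)
    if op == "<":
        return any(v_op == "<" and v_c <= c for v_op, v_c in view_cmps)
    raise ValueError(f"unsupported comparison: {op!r}")


def comparisons_ok(query_cmps, mapping, view_head, view_cmps):
    """query_cmps / view_cmps map a variable to its list of (op, constant)."""
    for q_var, cmps in query_cmps.items():
        v_var = mapping[q_var]
        if v_var in view_head:
            continue  # distinguished: the comparison can be applied in the rewriting
        if not all(implies(view_cmps.get(v_var, []), cmp) for cmp in cmps):
            return False
    return True


# Slide example: Q(x) :- r1(x,y), y > 17 and V1(a) :- r1(a,b), b > 18.
print(comparisons_ok({"y": [(">", 17)]}, {"x": "a", "y": "b"}, ("a",), {"b": [(">", 18)]}))
```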

  21. Interpreted Predicate Results

  22. Future Work • Query optimization: look for the fastest answer to a query; assume that all views are complete; require equivalent rewritings; need to allow overlap among the subgoals mapped • A fuller comparison of interpreted predicates

  23. Conclusions • Scalability of previous algorithms understood • MiniCon Algorithm invented • First experimental comparison of algorithms for answering queries using views • Extensions to binding patterns, interpreted predicates • New maximally contained rewriting form

  24. Maximally Contained Rewritings • Q’ is a maximally contained rewriting of a query Q using the views V = V1, …, Vn if (1) for any database D and extensions v1, …, vn of the views such that vi ⊆ Vi(D) for 1 ≤ i ≤ n, Q’(v1, …, vn) ⊆ Q(D); and (2) there is no other query Q1 such that, for every such D and v1, …, vn, Q’(v1, …, vn) ⊆ Q1(v1, …, vn) and Q1(v1, …, vn) ⊆ Q(D), with the first containment strict for at least one database

  25. Containment Checks • Q1 ⊆ Q2 if, for every database, the answer to Q1 is a subset of the answer to Q2 • m is a containment mapping from Vars(Q2) to Vars(Q1) if m maps every subgoal in the body of Q2 to a subgoal in the body of Q1 and maps the head of Q2 to the head of Q1 • For conjunctive queries, Q1 ⊆ Q2 exactly when such a containment mapping exists
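The containment-mapping test can be sketched as a backtracking search. This is an illustrative implementation that assumes every argument is a variable (a constant would have to map to itself). It is applied to the expansions of two candidate rewritings from the bucket example, with _e0 and _e1 as fresh existential variables: the expansion of Q’1 admits no mapping from Q, while the expansion of Q’3 does.

```python
def containment_mapping(q2, q1):
    """Find a containment mapping from Vars(q2) to Vars(q1), or return None.

    Queries are (head_args, body) with body a list of (predicate, args);
    all arguments are assumed to be variables. For conjunctive queries,
    q1 is contained in q2 exactly when such a mapping exists.
    """
    (head2, body2), (head1, body1) = q2, q1
    if len(head2) != len(head1):
        return None
    start = {}
    for v2, v1 in zip(head2, head1):
        if start.setdefault(v2, v1) != v1:
            return None  # the head of q2 must map onto the head of q1

    def search(i, m):
        if i == len(body2):
            return m
        pred, args = body2[i]
        for pred1, args1 in body1:  # try every target subgoal for subgoal i
            if pred1 == pred and len(args1) == len(args):
                m2 = dict(m)
                if all(m2.setdefault(a, b) == b for a, b in zip(args, args1)):
                    found = search(i + 1, m2)
                    if found is not None:
                        return found
        return None

    return search(0, start)


Q = (("x",), [("r1", ("x", "y")), ("r2", ("y", "x"))])
# Expansions of Q'1 = V1(x) & V2(x) and Q'3 = V3(x) & V2(x); _e* are fresh existentials.
EXP1 = (("x",), [("r1", ("x", "_e0")), ("r2", ("_e1", "x"))])
EXP3 = (("x",), [("r1", ("x", "_e0")), ("r2", ("_e0", "x")), ("r2", ("_e1", "x"))])
print(containment_mapping(Q, EXP1))  # None: Q'1 is not contained in Q
print(containment_mapping(Q, EXP3))  # maps y to _e0, so Q'3 is contained in Q
```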

  26. Inverse Rules With Unification • Find all inverse rules that match each query subgoal; place them in the bucket for that subgoal • For each rule in the first bucket: for each other subgoal i, attempt to unify the rules so far with all elements in the bucket for i; if nothing in that bucket unifies, break out of the loop, otherwise recurse

  27. Correctness Requirements • We need both soundness and completeness • A sound rewriting has a valid containment mapping from the variables of the query to the variables of the rewriting's expansion (the rewriting with each view replaced by its definition) • For completeness, we need only check rewritings with no more subgoals than the query

  28. Extensions to XML • Need to choose a query language • Containment checks should still hold • Need to check that restructured elements are distinguished • May scale even better relative to Inverse Rules and the Bucket Algorithm
