A Scalable Algorithm for Answering Queries Using Views

Rachel Pottinger
Qualifying Exam, October 29, 1999
Advisor: Alon Levy

Answering Queries Using Views
• Problem: access views instead of original relations
• Useful in data integration and query optimization
• NP-complete
• Many papers on the subject
• No empirical testing of algorithms

Data Integration: Query Reformulation
• Data sources are pre-calculated views
• Views are not complete
• Get the most answers possible given the views
• Many data sources
[Figure: car sale information drawn from three sources. Ford cars: dealer prices, sticker prices, inventory. Cheap cars: prices, manufacturer. Used cars: prices, dealer, year.]

Data Integration Example
Query: find the prices of cars that we can buy at cost
Database relations: dealercost, stickerprice, maker, cheap
Query:
Q(cost) :- dealercost(car, cost) & stickerprice(car, cost)
Views:
V1(price1, price2) :- dealercost(car, price1) & stickerprice(car, price2) & maker(car, “Ford”)
V2(cost) :- dealercost(car, cost) & stickerprice(car, cost) & cheap(car)
Conjunctive rewritings:
Q’1(cost) :- V1(cost, cost)
Q’2(cost) :- V2(cost)
Maximally contained rewriting: the union of Q’1 and Q’2
Variables in a view head (price1, price2, cost) are distinguished; variables that appear only in a view body (car) are existential

Outline
• Previous algorithms
  - Bucket Algorithm [Levy, Rajaraman, Ordille, 1996]
  - Inverse rules [Duschka, Genesereth, 1997]
• Minimum Necessary Connections (MiniCon) Algorithm
• Experimental evaluation
• Extension to arithmetic comparisons
• Conclusions and future work

The Bucket Algorithm
• Introduced as part of the Information Manifold
• Treats subgoals individually

Bucket Algorithm: Populating Buckets
• For each subgoal in the query, place relevant views in the subgoal’s bucket
Inputs:
Q(x) :- r1(x,y) & r2(y,x)
V1(a) :- r1(a,b)
V2(d) :- r2(c,d)
V3(f) :- r1(f,g) & r2(g,f)
Buckets:
r1(x,y): V1(x), V3(x)
r2(y,x): V2(x), V3(x)

Combining Buckets
• For every combination in the Cartesian product of the buckets, check containment in the query
• The Bucket Algorithm will check all possible combinations
Buckets:
r1(x,y): V1(x), V3(x)
r2(y,x): V2(x), V3(x)
Candidate rewritings:
Q’1(x) :- V1(x) & V2(x)
Q’2(x) :- V1(x) & V3(x)
Q’3(x) :- V3(x) & V2(x)
Q’4(x) :- V3(x) & V3(x)
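The two bucket steps can be sketched in Python on the slide's inputs. This is a minimal illustration, not the Information Manifold implementation: the tuple encoding of atoms, the function name `bucket_entry`, and the `_`-prefixed placeholder for unmapped head variables are all assumptions made here, and views that would require equating variables are simply skipped. The containment check on each candidate is not included.

```python
from itertools import product

# Atoms are (predicate, args); a rule is (head_args, body_atoms).
QUERY = (("x",), [("r1", ("x", "y")), ("r2", ("y", "x"))])
VIEWS = {
    "V1": (("a",), [("r1", ("a", "b"))]),
    "V2": (("d",), [("r2", ("c", "d"))]),
    "V3": (("f",), [("r1", ("f", "g")), ("r2", ("g", "f"))]),
}

def bucket_entry(name, view, goal, query_head):
    """Return the view head atom to put in goal's bucket, or None if not relevant."""
    view_head, view_body = view
    for pred, args in view_body:
        if pred != goal[0] or len(args) != len(goal[1]):
            continue
        to_query = {}  # view variable -> query variable
        ok = True
        for q_var, v_var in zip(goal[1], args):
            if to_query.setdefault(v_var, q_var) != q_var:
                ok = False  # simplification: skip cases that need variables equated
            if q_var in query_head and v_var not in view_head:
                ok = False  # a distinguished query variable would be hidden
        if ok:
            return (name, tuple(to_query.get(v, "_" + v) for v in view_head))
    return None

query_head, query_body = QUERY
buckets = []
for goal in query_body:
    entries = [bucket_entry(n, v, goal, query_head) for n, v in VIEWS.items()]
    buckets.append([e for e in entries if e is not None])

# One candidate rewriting per element of the Cartesian product of the buckets.
candidates = list(product(*buckets))
for cand in candidates:
    print(" & ".join(f"{n}({', '.join(a)})" for n, a in cand))
```

With two entries per bucket the product has four elements, matching the four candidate rewritings listed on the slide; each would still need a containment check.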

Inverse Rules
• Part of the Infomaster system
• Inverse rules show how to get database tuples from the views
• Cannot be extended to interpreted predicates
• Stops earlier than the Bucket Algorithm

Creating Inverse Rules
• For each view V(X) :- r1(X1) & … & rn(Xn)
• for each j = 1, …, n form an inverse rule: rj(Xj) :- V(X)
• Each existential variable of the view is replaced by a Skolem function of the view’s head variables (sfV1, sfV2, sfV3 below)
Inputs:
V1(a) :- r1(a,b)
V2(d) :- r2(c,d)
V3(f) :- r1(f,g) & r2(g,f)
Inverse Rules:
IR1 r1(a, sfV1(a)) :- V1(a)
IR2 r2(sfV2(d), d) :- V2(d)
IR3 r1(f, sfV3(f)) :- V3(f)
IR4 r2(sfV3(f), f) :- V3(f)
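The construction can be sketched as a short Python function. This is an illustrative sketch only: the tuple encoding and the names `inverse_rules` and `show` are assumptions, and the Skolem functions are named per existential variable (`sfV1_b` rather than the slide's `sfV1`) so that views with several existential variables stay unambiguous.

```python
def inverse_rules(name, head_vars, body):
    """Build one inverse rule (head_atom, body_atom) per body atom of view `name`."""
    rules = []
    for pred, args in body:
        # Existential variables become Skolem terms over the view's head variables.
        new_args = tuple(
            a if a in head_vars else f"sf{name}_{a}({', '.join(head_vars)})"
            for a in args
        )
        rules.append(((pred, new_args), (name, tuple(head_vars))))
    return rules

def show(rule):
    """Render a rule in the slide's notation."""
    (pred, args), (view, head_vars) = rule
    return f"{pred}({', '.join(args)}) :- {view}({', '.join(head_vars)})"

for rule in inverse_rules("V3", ("f",), [("r1", ("f", "g")), ("r2", ("g", "f"))]):
    print(show(rule))
```

For V3 this yields one rule for r1 and one for r2, both sharing the same Skolem term for g, which is what lets the two derived tuples join later.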

Combining Inverse Rules
• At query time, query over rules
Inverse Rules:
IR1 r1(a, sfV1(a)) :- V1(a)
IR2 r2(sfV2(d), d) :- V2(d)
IR3 r1(f, sfV3(f)) :- V3(f)
IR4 r2(sfV3(f), f) :- V3(f)
+ Tuples: V1(g), V2(h), V3(j), V3(m)
= Expansion:
r1(g, sfV1(g)), r2(sfV2(h), h), r1(j, sfV3(j)), r2(sfV3(j), j), r1(m, sfV3(m)), r2(sfV3(m), m)
+ Query: Q(x) :- r1(x,y) & r2(y,x)
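A small Python sketch of this query-time step, using the slide's four inverse rules and view tuples. The encoding is an assumption made for illustration: each Skolem term is a tuple `("sf", view, value)` so that it can never be confused with a constant, and the names `derive_facts` and `answer` are hypothetical.

```python
# Inverse rules from the slide, written as (predicate, argument builder, view).
RULES = [
    ("r1", lambda v: (v, ("sf", "V1", v)), "V1"),  # IR1
    ("r2", lambda v: (("sf", "V2", v), v), "V2"),  # IR2
    ("r1", lambda v: (v, ("sf", "V3", v)), "V3"),  # IR3
    ("r2", lambda v: (("sf", "V3", v), v), "V3"),  # IR4
]
VIEW_TUPLES = {"V1": ["g"], "V2": ["h"], "V3": ["j", "m"]}

def derive_facts(rules, view_tuples):
    """Apply every inverse rule to every tuple of its view."""
    facts = {"r1": set(), "r2": set()}
    for pred, build, view in rules:
        for value in view_tuples[view]:
            facts[pred].add(build(value))
    return facts

def answer(facts):
    """Evaluate Q(x) :- r1(x, y) & r2(y, x), keeping only Skolem-free answers."""
    return sorted(
        x for (x, y) in facts["r1"]
        if (y, x) in facts["r2"] and isinstance(x, str)
    )

facts = derive_facts(RULES, VIEW_TUPLES)
print(answer(facts))
```

Only the tuples derived from V3 join, because r1 and r2 facts from V1 and V2 carry different Skolem terms in the y position.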

Unfolding Rules Before Tuples
Q(x) :- r1(x,y) & r2(y,x)
r1(x,y): IR1, IR3
r2(y,x): IR2, IR4
• Use unification to see if rewriting is contained in the query
• No containment check necessary

The MiniCon Algorithm
• Concentrate on variables rather than subgoals to create MiniCon Descriptions (MCDs)
• Combine MCDs that only overlap on distinguished view variables
• No containment check!

MiniCon Description Formation
• Form all MiniCon Descriptions (MCDs) that map all query variables that have to be mapped together
Inputs:
Q(x) :- r1(x,y) & r2(y,x)
V1(a) :- r1(a,b)
V2(d) :- r2(c,d)
V3(f) :- r1(f,g) & r2(g,f)
MCDs:
view    mapping           subgoals mapped
V3      x → f, y → g      1, 2
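The formation step can be sketched in Python under simplifying assumptions: no constants, and view head variables are never equated with each other. The names `extend` and `form_mcds` and the tuple encoding are hypothetical, and subgoals are numbered from 0 rather than 1. The key rule is that once a query variable lands on an existential view variable, every query subgoal using that variable is forced into the same MCD.

```python
def extend(mapping, q_atom, v_atom):
    """Extend a query-variable -> view-variable mapping so q_atom maps onto v_atom."""
    if q_atom[0] != v_atom[0] or len(q_atom[1]) != len(v_atom[1]):
        return None
    new = dict(mapping)
    for q, v in zip(q_atom[1], v_atom[1]):
        if new.setdefault(q, v) != v:
            return None  # q is already mapped to a different view variable
    return new

def form_mcds(query_head, query_body, view_head, view_body):
    """Return a set of (covered subgoal indexes, sorted mapping items) for one view."""
    found = set()

    def grow(mapping, covered):
        # Distinguished query variables must land on distinguished view variables.
        if any(q in query_head and v not in view_head for q, v in mapping.items()):
            return
        for i, atom in enumerate(query_body):
            if i in covered:
                continue
            # A variable mapped to an existential view variable forces this subgoal in.
            if any(q in mapping and mapping[q] not in view_head for q in atom[1]):
                for v_atom in view_body:
                    nxt = extend(mapping, atom, v_atom)
                    if nxt is not None:
                        grow(nxt, covered | {i})
                return  # either handled by the recursive calls or impossible
        found.add((frozenset(covered), tuple(sorted(mapping.items()))))

    for i, atom in enumerate(query_body):
        for v_atom in view_body:
            start = extend({}, atom, v_atom)
            if start is not None:
                grow(start, frozenset({i}))
    return found

Q_HEAD = ("x",)
Q_BODY = [("r1", ("x", "y")), ("r2", ("y", "x"))]
print(form_mcds(Q_HEAD, Q_BODY, ("a",), [("r1", ("a", "b"))]))  # V1
print(form_mcds(Q_HEAD, Q_BODY, ("f",), [("r1", ("f", "g")), ("r2", ("g", "f"))]))  # V3
```

On the slide's inputs, V1 and V2 produce no MCD because y maps to an existential variable and the other subgoal containing y cannot be covered by the same view; V3 produces the single MCD shown in the table.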

MiniCon Combination
Take all combinations of MCDs that
• map disjoint sets of subgoals
• map all subgoals of the query
MCDs:
view    mapping           subgoals mapped
V3      x → f, y → g      1, 2
Rewriting:
Q’(x) :- V3(x)
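The combination step amounts to enumerating sets of MCDs whose subgoal sets partition the query's subgoals. A minimal Python sketch follows; the function name `combine` and the `(view, covered_subgoals)` encoding are assumptions, and building the rewriting's head and arguments from the mappings is left out. The second example uses hypothetical MCDs A, B, and C that do not come from the slides.

```python
def combine(mcds, subgoals, chosen=()):
    """Yield tuples of MCDs whose covered sets are disjoint and together equal `subgoals`."""
    if not subgoals:
        yield chosen
        return
    # Always cover the smallest uncovered subgoal next, so each partition appears once.
    first = min(subgoals)
    for mcd in mcds:
        _view, covered = mcd
        if first in covered and covered <= subgoals:
            yield from combine(mcds, subgoals - covered, chosen + (mcd,))

# The slide's example: a single MCD covering both subgoals.
slide_mcds = [("V3", frozenset({1, 2}))]
print(list(combine(slide_mcds, frozenset({1, 2}))))

# Hypothetical MCDs, for illustration only.
other_mcds = [("A", frozenset({1})), ("B", frozenset({2})), ("C", frozenset({1, 2}))]
print(list(combine(other_mcds, frozenset({1, 2}))))
```

Because only disjoint, complete covers are produced, no candidate is generated and then discarded, which is where the saving over the Cartesian product of buckets comes from.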

Experimental Evaluation
Tested performance and scale-up of:
• Bucket Algorithm
• Inverse Rules extended with unification
• MiniCon Algorithm
MiniCon was at least as good in all cases and much better in some
Results are shown for chain queries: Q(a) :- r1(a,b), r2(b,c), r3(c,d), r4(d,e)

Many Rewritings

Few rewritings, very structured query and views

Few rewritings, less structured views

Extension: Interpreted Predicates
• Problem is in general undecidable
• We looked at subgoals of the form var < constant or var > constant
• If a query variable maps to an existential view variable, require that the view’s interpreted predicates imply the query’s
• Ex: Q(x) :- r1(x,y), y > 17 and V1(a) :- r1(a,b), b > 18; here b > 18 implies y > 17
• Guaranteed to be sound
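For comparisons of the form var < constant or var > constant, the implication test reduces to comparing the two constants. A minimal Python sketch, assuming each comparison is encoded as an `(operator, constant)` pair on the variable that the query variable maps to; the function name `implies` is hypothetical.

```python
def implies(view_cmp, query_cmp):
    """True if the view's comparison guarantees the query's comparison on the same variable."""
    v_op, v_const = view_cmp
    q_op, q_const = query_cmp
    if v_op != q_op:
        return False  # a lower bound never implies an upper bound, and vice versa
    if q_op == ">":
        return v_const >= q_const  # b > 18 implies b > 17
    return v_const <= q_const      # b < 5 implies b < 7

print(implies((">", 18), (">", 17)))
print(implies((">", 16), (">", 17)))
```

The first call corresponds to the slide's example, where V1's condition b > 18 is at least as strict as the query's y > 17, so using V1 is sound.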

Interpreted Predicate Results

Future Work
• Query optimization
  - Look for the fastest answer to the query
  - Assume that all views are complete
  - Require equivalent rewritings
  - Need to allow overlap on subgoals mapped
• A fuller comparison of interpreted predicates

Conclusions
• Scalability of previous algorithms understood
• MiniCon Algorithm invented
• First experimental comparison of algorithms for answering queries using views
• Extensions to binding patterns, interpreted predicates
• New maximally contained rewriting form

Maximally Contained Rewritings
• Q’ is a maximally contained rewriting of a query Q using the views V = V1, …, Vn if
  - For any database D and extensions v1, …, vn of the views such that vi ⊆ Vi(D) for 1 ≤ i ≤ n, Q’(v1, …, vn) ⊆ Q(D)
  - There is no other query Q1 such that
    (1) Q’(v1, …, vn) ⊆ Q1(v1, …, vn)
    (2) Q1(v1, …, vn) ⊆ Q(D), and there exists at least one database for which (1) is a strict subset

Containment Checks
• Q1 ⊆ Q2 if, for every database, the answer to Q1 is a subset of the answer to Q2
• m is a containment mapping from Vars(Q2) to Vars(Q1) if
  - m maps every subgoal in the body of Q2 to a subgoal in the body of Q1
  - m maps the head of Q2 to the head of Q1
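A brute-force search for a containment mapping can be sketched in Python. This is an illustration for conjunctive queries without constants or comparisons, not the method used in the experiments; the name `contained_in` and the `(head_args, body_atoms)` encoding are assumptions. It tries every assignment of Q2's variables to Q1's variables, so its cost is exponential in the number of variables of Q2.

```python
from itertools import product

def contained_in(q1, q2):
    """True if a containment mapping from q2 to q1 exists, which shows q1 ⊆ q2."""
    head1, body1 = q1
    head2, body2 = q2
    vars2 = sorted({a for _, args in body2 for a in args} | set(head2))
    vars1 = sorted({a for _, args in body1 for a in args} | set(head1))
    body1_set = set(body1)
    for image in product(vars1, repeat=len(vars2)):
        m = dict(zip(vars2, image))
        # The head of q2 must map exactly onto the head of q1.
        if tuple(m[a] for a in head2) != tuple(head1):
            continue
        # Every body subgoal of q2 must map to some body subgoal of q1.
        if all((p, tuple(m[a] for a in args)) in body1_set for p, args in body2):
            return True
    return False

Q = (("x",), [("r1", ("x", "y")), ("r2", ("y", "x"))])
# Expansions of two candidate rewritings from the bucket example.
V3_ONLY = (("x",), [("r1", ("x", "g")), ("r2", ("g", "x"))])    # V3(x)
V1_AND_V2 = (("x",), [("r1", ("x", "b")), ("r2", ("c", "x"))])  # V1(x) & V2(x)
print(contained_in(V3_ONLY, Q))
print(contained_in(V1_AND_V2, Q))
```

The expansion of V3(x) is contained in Q through the mapping x → x, y → g, while V1(x) & V2(x) is not, because y would have to map to both b and c.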

Inverse Rules With Unification
• Find all inverse rules that match each query subgoal; place them in the bucket for that subgoal
• For each rule in the first bucket
  - For each other subgoal i, attempt to unify the rules so far with all elements in the bucket for i
  - If we cannot unify with anything in that bucket, break out of the loop; otherwise, recurse

Correctness Requirements
• We need both soundness and completeness
• A sound rewriting has a valid containment mapping from the variables of the query to the variables of the rewriting’s expansion
• For completeness we need only check rewritings of length less than or equal to that of the query

Extensions to XML
• Need to choose a query language
• Containment checks should still hold
• Need to check that restructured elements are distinguished
• May scale even better relative to Inverse Rules and the Bucket Algorithm
