This article delves into the complexities associated with incomplete databases and how views can provide only certain answers. It discusses the definitions of sound, complete, and precise views and their implications on query processing. Through examples such as the World Cup Soccer Tournament, the paper illustrates the relationships between different types of views and queries. It highlights the challenges in computing certain answers, particularly focusing on the complexities involved between varying query languages and their containment properties.
Views as Incomplete Databases – Certain & Possible Answers
• Views – an incomplete representation
• Certain and possible answers
• Complexity results for certain answers
certain

Views – an incomplete representation
Given: a view definition V and a view extension I
• Sound V: I is contained in V(D)
• Complete V: I contains V(D)
• Precise V: I = V(D)
V may also be mixed: some views are sound, others are complete
In general, more than one db D may exist s.t. I is a possible view extension of D under V

Example: teams in the World Cup Soccer Tournament
Global scheme: Team(country, group) (group = assignment for the 1st round)
• Source1: S-C(C), the countries that participate
• Source2: S-Q(C), the countries that participated in qualifying games
• Source3: S-T(C), the teams whose games will be on TV
For all three, the logical mapping is v(X) :- Team(X, Y)

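As a minimal Python sketch of the sound/complete/precise notions, the following compares a source extension I with V(D) for the mapping v(X) :- Team(X, Y). The Team rows, the source contents, and the helper name classify_view are hypothetical. Reading S-T as sound, S-Q as complete, and S-C as precise is an assumption about the example, not something the slide states.

```python
def classify_view(extension, view_of_db):
    """Compare a view extension I with V(D) and report which notions hold."""
    I, VD = set(extension), set(view_of_db)
    return {"sound": I <= VD, "complete": I >= VD, "precise": I == VD}

# Hypothetical global database: Team(country, group)
team = {("Brazil", "A"), ("Spain", "B"), ("Japan", "C")}

# V(D) for the mapping v(X) :- Team(X, Y)
v_of_d = {(country,) for country, _group in team}

# Hypothetical source contents (assumed reading of the three sources)
s_c = set(v_of_d)                  # participants: exactly V(D)
s_q = v_of_d | {("Peru",)}         # qualifying games: a superset of V(D)
s_t = {("Brazil",), ("Spain",)}    # televised teams: a subset of V(D)

for name, ext in [("S-C", s_c), ("S-Q", s_q), ("S-T", s_t)]:
    print(name, classify_view(ext, v_of_d))
```

A precise view is both sound and complete, which is why S-C reports all three flags as true.
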
Given V (including a specification of each view as sound/complete/precise) and I:
poss(V, I) = {D | D is a db for which I is a possible view}
Since we have only the views, this is the set of possible databases.
• For sound views: an infinite set
• For complete views: contains the empty db
• For precise views: may be empty (inconsistent views)
Example: v1(X, Y) :- R(X, Y, Z), v1 = {(a, b), (b, c)}
         v2(X, Z) :- R(X, Y, Z), v2 = {(a, d), (c, e)}
As precise views these are inconsistent: v1 requires the X-values of R to be {a, b}, while v2 requires them to be {a, c}
* The above changes when the global db is known to satisfy constraints (e.g. keys)

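The inconsistency in the example can be checked mechanically. This is a small sketch for this particular pair of views only: both are projections of the same relation R that keep column X, so as precise views they must list the same X-values. The function names are hypothetical.

```python
# View extensions from the example
v1 = {("a", "b"), ("b", "c")}   # v1(X, Y) :- R(X, Y, Z)
v2 = {("a", "d"), ("c", "e")}   # v2(X, Z) :- R(X, Y, Z)

def x_values(extension):
    """First column of a view extension, i.e. the X-values it reports."""
    return {row[0] for row in extension}

def precise_consistent(ext1, ext2):
    """Both views project the same R onto X, so precise extensions must agree on X."""
    return x_values(ext1) == x_values(ext2)

# X-values reported by exactly one of the two views
unmatched = x_values(v1) ^ x_values(v2)

print(precise_consistent(v1, v2), sorted(unmatched))
```

Under the sound reading the same extensions are unproblematic, since R may contain further tuples that supply the missing X-values.
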
Certain and possible answers
Now, assume also a query Q:
cert(Q, V, I) = the tuples that are in Q(D) for every D in poss(V, I)
poss(Q, V, I) = the tuples that are in Q(D) for some D in poss(V, I)
• cert(Q, V, I): seems easier to compute, always finite
• poss(Q, V, I): may be infinite; and where do we obtain values that are not in I?
A possible approach: a finite representation of a possibly infinite family of partially unknown databases

We concentrate on certain answers: an absolute notion of answering queries using views
cert(Q, V, I) depends on the soundness/completeness of the views
Example:
Global: p(x, y)
Views: v1(x) :- p(x, y), v2(y) :- p(x, y)
I = {v1(a), v2(b)}
Q: q(x, y) :- p(x, y)
• Sound views: cert(Q, V, I) is empty
• Precise views: cert(Q, V, I) is {(a, b)}

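The two results in the example can be reproduced by brute force. The sketch below enumerates every database for p over a small finite domain, keeps those consistent with I, and intersects the query answers. Restricting to the domain {a, b, c} is an assumption made so that the enumeration terminates (the true set of possible databases is infinite for sound views); for this example it is enough to show both outcomes. All names are hypothetical.

```python
from itertools import combinations, product

DOMAIN = ["a", "b", "c"]                      # assumed finite domain
ALL_TUPLES = list(product(DOMAIN, repeat=2))  # candidate tuples of p(x, y)
I1, I2 = {"a"}, {"b"}                         # I = {v1(a), v2(b)}

def databases():
    """Every subset of DOMAIN x DOMAIN as a candidate extension of p."""
    for k in range(len(ALL_TUPLES) + 1):
        for combo in combinations(ALL_TUPLES, k):
            yield frozenset(combo)

def possible(D, mode):
    """Is D consistent with I under the given reading of the views?"""
    v1 = {x for x, _ in D}   # v1(x) :- p(x, y)
    v2 = {y for _, y in D}   # v2(y) :- p(x, y)
    if mode == "sound":
        return I1 <= v1 and I2 <= v2
    if mode == "precise":
        return I1 == v1 and I2 == v2
    raise ValueError(mode)

def cert(query, mode):
    """Intersect query(D) over all possible D (None if there is no such D)."""
    answers = None
    for D in databases():
        if possible(D, mode):
            result = query(D)
            answers = result if answers is None else answers & result
    return answers

def q(D):
    return set(D)            # q(x, y) :- p(x, y)

print(cert(q, "sound"), cert(q, "precise"))
```

With sound views, {(a, b)} and {(a, c), (c, b)} are both possible databases and share no tuple, which is why the intersection is empty. With precise views the only possible database is {(a, b)}.
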
An issue in query processing:
For the same example, let Q': s(x) :- p(x, y)
With sound views, cert(Q', V, I) = {(a)} although cert(Q, V, I) is empty, so cert(Q', V, I) cannot be obtained by projecting cert(Q, V, I)
To allow relational algebra manipulation of certain answers, we need more than a simple relational representation!
We need algorithms for performing operations on representations of partially unknown db's (not in this course)

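A brute-force check makes the issue concrete for the running example (views v1(x) :- p(x, y) and v2(y) :- p(x, y), I = {v1(a), v2(b)}, sound views). The sketch compares the certain answers of Q' with the projection of the certain answers of Q. The finite domain {a, b, c} and all helper names are assumptions made for illustration.

```python
from itertools import chain, combinations, product

DOMAIN = ("a", "b", "c")                      # assumed finite domain
TUPLES = list(product(DOMAIN, repeat=2))      # candidate tuples of p(x, y)

def sound_databases():
    """All databases D over DOMAIN with v1(a) and v2(b) contained in V(D)."""
    subsets = chain.from_iterable(
        combinations(TUPLES, k) for k in range(len(TUPLES) + 1)
    )
    for D in map(set, subsets):
        if any(x == "a" for x, _ in D) and any(y == "b" for _, y in D):
            yield D

def cert(query):
    """Certain answers: the intersection of query(D) over all possible D."""
    return set.intersection(*(query(D) for D in sound_databases()))

cert_q = cert(lambda D: set(D))                # Q:  q(x, y) :- p(x, y)
cert_s = cert(lambda D: {(x,) for x, _ in D})  # Q': s(x)    :- p(x, y)
projected = {(x,) for x, _ in cert_q}          # projection of cert(Q) on x

print(cert_s, projected)
```

The certain answer (a) for Q' is lost if one first computes the certain answers of Q as a plain relation and then projects.
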
From now on: sound views, certain answers
This was investigated for views defined in L1 and queries defined in L2, where L1, L2 in {CQ, CQ!=, NR-Datalog, Datalog, FO}
Results include:
• Complexity: lower bounds
• Algorithms: upper bounds

Complexity results for certain answers
Thm: for V in L1, Q in L2, the following are equivalent:
(a) computing cert(Q, V, I)
(b) deciding containment: is Q1 (in L1) contained in Q2 (in L2)?
• (a) is decidable iff (b) is
• When decidable, combined complexity of (a) = query complexity of (b)
• Data complexity of (a) <= query complexity of (b)
[Data complexity: a function of the db size. Query complexity: a function of the query size. Combined complexity: a function of both.]

Proof (sketch): given t, how hard is it to decide whether t is in cert(Q, V, I)?
Let I = {vi(tij)}, and define Q' as follows: Q' contains the rules that define V, and one more "large" rule:
q'(t) :- v1(t11), ..., vk(tkm) (one body atom for each fact in I, so t follows from the facts in I)
Claim: t is in cert(Q, V, I) iff Q' is contained in Q
Hence deciding whether t is in cert(Q, V, I) is no harder than deciding this containment
(Note: for L1 = CQ, we need to "massage" Q' into a CQ)

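One way to read the "large" rule is as a single rule whose head carries the candidate tuple t and whose body lists every fact of I. The sketch below only builds the text of such a rule. The head name q_prime, the function name large_rule, and the input encoding are hypothetical, and the rule shape itself is an assumed reading of the proof sketch.

```python
def large_rule(t, facts, head="q_prime"):
    """Build the rule  head(t) :- v1(...), ..., vk(...)  from the facts in I.

    t     : tuple of constants (the candidate answer)
    facts : list of (view_name, tuple_of_constants) pairs, one per fact in I
    """
    body = ", ".join(f"{view}({', '.join(args)})" for view, args in facts)
    return f"{head}({', '.join(t)}) :- {body}"

# Running example: I = {v1(a), v2(b)}, candidate answer t = (a)
rule = large_rule(("a",), [("v1", ("a",)), ("v2", ("b",))])
print(rule)
```

The rule fires on a database D exactly when every fact of I is in V(D), which is the condition for D to be a possible database under sound views.
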
How hard is it to check containment of Q1 in Q2?
Let p be a new predicate
Define V by: the rules of Q1, and v(c) :- q1(X), p(X); let I = {v(c)}
Define Q by: the rules of Q2, and q(c) :- q2(X), p(X)
Then: (c) is in cert(Q, V, I) iff Q1 is contained in Q2

Consequences: computing certain answers (depends on L1, L2) is:
• undecidable for Datalog, FO
• decidable if one side <= Datalog and the other side <= NR-Datalog
For the decidable cases, the above gives the combined complexity
We are more interested in data complexity, which in some cases is co-NP
Co-NP data complexity is bad: impractical to compute, no Datalog plan!
We will not prove the co-NP complexity results

Claim: For Q in Datalog, V in CQ(!=), let V~ be the same view definition with the inequalities omitted
Then cert(Q, V, I) = cert(Q, V~, I)
(Computing the certain answers from I using V without the inequalities gives the same results)
Proof:
(a) Every D in poss(V, I) is also in poss(V~, I), so cert(Q, V~, I) is contained in cert(Q, V, I)
(b) Let t be in cert(Q, V, I) and let D be in poss(V~, I); we show that t is in Q(D)
• If D is also in poss(V, I): fine
• If D is not in poss(V, I), there exists a larger D' in poss(V, I) s.t. t in Q(D') implies t in Q(D)
Since t is in Q(D'), t is in Q(D); hence t is in cert(Q, V~, I)

Proof of the last claim:
Some s is in I but not in V(D), because of some inequality
Since s is in V(D'') for some D'' (poss(V, I) is not empty), the inequality involves a variable that appears in the view body but not in its head
• We can add some tuples to D, using new values for such variables, to obtain D1 s.t. s is in V(D1)
• Adding tuples for all such s gives D' that contains D, s.t. D' is in poss(V, I)
• If t is in Q(D'), then since Q has no inequalities, t is also in Q(D) (map the new values back to the original ones)

For CQ views, Datalog queries
Query plan: a Datalog program P on V
exp(P): replace the views by their definitions (using fresh names for existential variables)
P is maximally contained in Q if:
• exp(P)(D) is contained in Q(D)
• exp(P')(D) is contained in exp(P)(D) for every other plan P' that is contained in Q
Such a plan is best among all plans
(This is a language-dependent notion: given a more expressive language, P may no longer be best)
But if a plan delivers cert(Q, V, I), it is absolutely best

Thm: For sound CQ views and Datalog queries, the inverse rules algorithm computes cert(Q, V, I)
(Thus, for this case, a Datalog query plan can give the absolute best possible answer)
Corollary: If P is max-cont(Q), then for all view instances I, P(I) = cert(Q, V, I)
We proceed to prove the theorem

Def: A tableau is a collection of atoms with constants and variables
A tableau T represents a db D if there is a valuation h from T into D
rep(T) = {D | for some h, D contains h(T)}

Claim: For a Datalog query Q and a tableau T, cert(Q, rep(T)) = the tuples without variables in Q(T)
Proof:
(a) We can consider only D in rep(T) s.t. D = h(T): every tuple that is in Q(D') but not in Q(D), where D' is larger than D = h(T), is not in cert(Q, rep(T))
(b) For such D, h(Q(T)) is contained in Q(D), so a ground tuple in Q(T) is in cert(Q, rep(T))
(c) For a non-ground tuple t in Q(T), we can find D1, D2 in rep(T) that give different values to the variables in t, so no instance of this tuple is in cert(Q, rep(T))

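The claim suggests a simple evaluation recipe: run the query over the tableau as if variables were ordinary values, then keep only the ground tuples. The sketch below does this for two hand-written conjunctive queries over a hypothetical tableau for a binary relation p. The Var class and the helper ground are illustrative names, and recursive Datalog queries (which would need a fixpoint loop) are not covered.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    """A tableau variable, kept distinct from constants (plain strings)."""
    name: str

def ground(rows):
    """Keep only the tuples that contain no tableau variable."""
    return {row for row in rows if not any(isinstance(v, Var) for v in row)}

# Hypothetical tableau T for p(x, y): the atoms p(a, y1) and p(x1, b)
T = {("a", Var("y1")), (Var("x1"), "b")}

q_of_T = set(T)                    # q(x, y) :- p(x, y), evaluated on T
s_of_T = {(x,) for x, _ in T}      # s(x)    :- p(x, y), evaluated on T

print(ground(q_of_T), ground(s_of_T))
```

Both tuples of q(T) contain a variable, so no tuple is certain for q, while s(T) contains the ground tuple (a).
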
The inverse rules of V create from a view extension I a database whose elements include Skolem terms
Consider each Skolem term to be a distinct variable
• This is a tableau T(V, I)
Claim: T(V, I) represents poss(V, I)
Proof: easy
Corollary: the set of tuples without variables in Q(T(V, I)) is cert(Q, V, I)
This is precisely what the inverse rules algorithm produces: for each I, the inverse rules produce T(V, I), and then Q is applied
End of story
Next: one more (last) algorithm, for CQ queries and views, that is the fastest so far

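A minimal sketch of the construction, under simplifying assumptions: view bodies contain only variables (no constants), each existential variable gets one Skolem term per view tuple, and the query is a single non-recursive rule evaluated by hand. The encoding of views, the Skolem class, and the function names are hypothetical and not taken from any library.

```python
from typing import NamedTuple

class Skolem(NamedTuple):
    """A Skolem term f(args), standing for an unknown value."""
    fn: str
    args: tuple

def inverse_rules(views, extension):
    """Build the tableau T(V, I) as a set of (predicate, values) facts.

    views     : view name -> (head_vars, [(predicate, arg_vars), ...])
    extension : view name -> set of tuples (the view extension I)
    """
    facts = set()
    for name, (head, body) in views.items():
        for row in extension.get(name, ()):
            binding = dict(zip(head, row))
            for pred, args in body:
                # Head variables take the tuple's values; every other
                # variable becomes a Skolem term over that tuple.
                values = tuple(
                    binding[a] if a in binding else Skolem(f"f_{name}_{a}", row)
                    for a in args
                )
                facts.add((pred, values))
    return facts

def ground(rows):
    """Drop every tuple that still contains a Skolem term."""
    return {r for r in rows if not any(isinstance(v, Skolem) for v in r)}

# Running example: v1(x) :- p(x, y), v2(y) :- p(x, y), I = {v1(a), v2(b)}
views = {
    "v1": (("x",), [("p", ("x", "y"))]),
    "v2": (("y",), [("p", ("x", "y"))]),
}
extension = {"v1": {("a",)}, "v2": {("b",)}}

tableau = inverse_rules(views, extension)
p_rows = {values for pred, values in tableau if pred == "p"}

q_answers = ground(p_rows)                       # q(x, y) :- p(x, y)
s_answers = ground({(x,) for x, _ in p_rows})    # s(x)    :- p(x, y)
print(q_answers, s_answers)
```

For the running example this yields the facts p(a, f(a)) and p(g(b), b), so q has no certain answer and s has the certain answer (a).
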