
Decidable Containment of Recursive Queries


Diego Calvanese, Giuseppe De Giacomo, Moshe Y. Vardi

presented by Axel Polleres

http://www.dis.uniroma1.it/pub/calvanes/calv-degi-vard-ICDT-2003.pdf

- Checking whether, on every database, the result of one query is necessarily a subset of the result of another one
- Important for information integration, query rewriting, verification, cooperative answering, integrity checking, etc.

- A conjunctive query (CQ) is a query of the form:
ans(X0) :- r1(X1), r2(X2), …, rn(Xn).

where the Xi = (x1i, …, xni) range over a set of variables {u1, …, uk}, and the variables in X0 are called distinguished variables.

In SQL often called S(elect)P(roject)J(oin)-Queries

Containment of conjunctive queries is decidable!

In fact, NP-complete: [14]

Proof sketch (membership in NP): A conjunctive query Q1 is contained in Q2 iff there is a containment mapping from (the variables in) Q2 to (the variables in) Q1. Guessing and checking such a homomorphism is clearly in NP.

NP-hardness, and hence completeness, can also be shown (e.g. by reduction from “exact cover”, cf. []).
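The containment-mapping test can be sketched as a brute-force search. This is a minimal illustration, not the NP procedure from [14]: the query encoding (a head tuple plus a list of `(predicate, args)` atoms), the uppercase-variable convention, and the names `contained_in`, `q_cycle`, and `q_edge` are assumptions made for this example.

```python
from itertools import product

def is_var(term):
    # Assumed convention for this sketch: variables start with an uppercase letter.
    return term[:1].isupper()

def substitute(h, args):
    # Apply mapping h to a tuple of terms; constants are left unchanged.
    return tuple(h.get(t, t) for t in args)

def contained_in(q1, q2):
    """Return True if q1 is contained in q2, i.e. if there is a
    containment mapping from the variables of q2 to the terms of q1."""
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False
    vars2 = sorted({t for _, args in body2 for t in args if is_var(t)}
                   | {t for t in head2 if is_var(t)})
    terms1 = sorted({t for _, args in body1 for t in args} | set(head1))
    atoms1 = set(body1)
    # Try every assignment of q2's variables to terms of q1 (exponential search).
    for image in product(terms1, repeat=len(vars2)):
        h = dict(zip(vars2, image))
        if substitute(h, head2) != tuple(head1):
            continue  # the head of q2 must be mapped onto the head of q1
        if all((pred, substitute(h, args)) in atoms1 for pred, args in body2):
            return True  # every body atom of q2 lands on a body atom of q1
    return False

# ans(X) :- e(X,Y), e(Y,X).   (X lies on a 2-cycle)
q_cycle = (("X",), [("e", ("X", "Y")), ("e", ("Y", "X"))])
# ans(X) :- e(X,Y).           (X has an outgoing edge)
q_edge = (("X",), [("e", ("X", "Y"))])

print(contained_in(q_cycle, q_edge))  # True
print(contained_in(q_edge, q_cycle))  # False
```

The search enumerates all candidate mappings instead of guessing one, so it runs in exponential time. Checking a single candidate mapping is the polynomial step behind the NP membership argument.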

- Full Datalog adds union and recursion to CQs; containment is undecidable
- Undecidability can be shown by reduction from containment for context-free grammars [22]
So, CQs and full Datalog span two extremes.

But …not all is lost! There are interesting classes in between!

- Containment of Monadic Datalog programs (all rule heads use a single variable) is decidable
- Checking containment of non-recursive Datalog in full Datalog is decidable in exponential time
- Checking containment of full Datalog in non-recursive Datalog is decidable in triple exponential time, i.e. O(2^(2^(2^n)))
- When the non-recursive query is unfolded (into a union of conjunctive queries), then “only” double exponential.

- Query containment in the context of conceptual graphs (e.g. RDF graphs), namely for Regular Path Queries (RPQs), i.e.:
- Asking for all pairs of objects in a graph that are connected by a path conforming to a regular expression, i.e.:

E(x,y) … where E is a regular expression over graph edges

Refinement:

- 2RPQs: “inverse” is allowed in the traversal of edges, i.e. an edge may also be followed backwards
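To make the semantics of E(x,y) concrete, here is a small sketch that evaluates a 2RPQ by searching the product of the graph and an automaton for E. The encoding is an assumption of this example: the regular expression is supplied directly as an NFA `(start, finals, delta)`, an inverse edge label `p` is written `p-`, and the names `eval_2rpq`, `graph`, `same_parent`, and `descendant` are hypothetical.

```python
from collections import deque

def eval_2rpq(edges, nfa):
    """Return all pairs (x, y) such that some path from x to y spells a
    word accepted by the NFA; the label 'p-' traverses a p-edge backwards."""
    start, finals, delta = nfa
    # Moves available from each node: (label, target), including inverse moves.
    steps = {}
    for s, p, t in edges:
        steps.setdefault(s, []).append((p, t))
        steps.setdefault(t, []).append((p + "-", s))
    nodes = {s for s, _, _ in edges} | {t for _, _, t in edges}
    answers = set()
    for x in nodes:
        # Breadth-first search over (graph node, automaton state) pairs.
        seen = {(x, start)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in finals:
                answers.add((x, node))
            for label, nxt in steps.get(node, []):
                for state2 in delta.get((state, label), ()):
                    if (nxt, state2) not in seen:
                        seen.add((nxt, state2))
                        queue.append((nxt, state2))
    return answers

graph = {("a", "child", "b"), ("a", "child", "c"), ("b", "child", "d")}

# E = child- child   (x and y have a common parent)
same_parent = (0, {2}, {(0, "child-"): {1}, (1, "child"): {2}})
# E = child child*   (y is a descendant of x)
descendant = (0, {1}, {(0, "child"): {1}, (1, "child"): {1}})

print(sorted(eval_2rpq(graph, same_parent)))
print(sorted(eval_2rpq(graph, descendant)))
```

The first query needs the inverse step and is therefore a 2RPQ but not a plain RPQ. The second uses only forward edges.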

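A conjunctive 2-way regular path query (C2RPQ) of arity n is a query of the form:

ans(x1, …, xn) :- E1(y1, y2), …, Em(y2m-1, y2m).

where E1, …, Em are 2RPQs and x1, …, xn are among the variables y1, …, y2m.

UC2RPQs are then unions of conjunctive 2-way regular path queries (C2RPQs) with the same arity. Here, the answer set to a UC2RPQ is the union of the answer sets of its C2RPQs.

Note that CQs (with only binary body predicates) are just a special case of C2RPQs!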

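We define, for a Datalog program Π, an IDB predicate Q, and a database G (over the EDB predicates), the answer of Π for Q on G, i.e. the set of facts about Q (the fixpoint) that can be obtained by applications of rules in Π to G. Containment of Π in another query then means that this set is a subset of the other query's answer on every database G.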
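The fixpoint can be illustrated with a naive bottom-up evaluation. This is a sketch under assumptions, not an algorithm from the paper: rules are encoded as `(head, body)` pairs of `(predicate, args)` atoms, variables start with an uppercase letter, every head variable is assumed to occur in the rule body, and the names `fixpoint`, `rules`, and `edb` are hypothetical.

```python
def is_var(term):
    # Assumed convention for this sketch: variables start with an uppercase letter.
    return term[:1].isupper()

def match(atom, fact, subst):
    """Extend subst so that atom equals fact; return None if impossible."""
    pred, args = atom
    fpred, fargs = fact
    if pred != fpred or len(args) != len(fargs):
        return None
    subst = dict(subst)
    for a, c in zip(args, fargs):
        if is_var(a):
            if subst.setdefault(a, c) != c:
                return None  # variable already bound to a different constant
        elif a != c:
            return None      # constant mismatch
    return subst

def fixpoint(rules, edb):
    """Apply all rules repeatedly until no new fact can be derived."""
    facts = set(edb)
    while True:
        new = set()
        for (hpred, hargs), body in rules:
            substs = [{}]
            for atom in body:
                # Join the partial substitutions with every matching fact.
                substs = [s2 for s in substs for f in facts
                          if (s2 := match(atom, f, s)) is not None]
            for s in substs:
                new.add((hpred, tuple(s.get(a, a) for a in hargs)))
        if new <= facts:
            return facts
        facts |= new

# path(X,Y) :- edge(X,Y).
# path(X,Y) :- edge(X,Z), path(Z,Y).
rules = [
    (("path", ("X", "Y")), [("edge", ("X", "Y"))]),
    (("path", ("X", "Y")), [("edge", ("X", "Z")), ("path", ("Z", "Y"))]),
]
edb = {("edge", ("a", "b")), ("edge", ("b", "c"))}

path_facts = {args for pred, args in fixpoint(rules, edb) if pred == "path"}
print(sorted(path_facts))  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

Each round corresponds to one more level of rule application. The loop stops once a round derives nothing new, which always happens on a finite database.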

- Idea: Reuse of variables is allowed, as long as the occurrences are not “connected” in the tree. So we can build proof trees with a bounded number of variables: num_var(Π) is twice the maximum, over the rules r of Π, of the number num_var(r) of variables occurring in IDB atoms of r.
- A proof tree is then simply an expansion tree using only variables from {x1, …, xnum_var(Π)}

Approach: the notion of a containment mapping is generalized to Datalog and to UC2RPQs by means of expansions of Datalog programs:

The answer of Π for Q can be defined via an infinite sequence of conjunctive queries, its expansions.

Let trees(Q, Π) be the set of expansion trees for predicate Q: each node is labeled with a rule, such that the children of a node N are labeled with rules whose head atoms correspond to the IDB atoms of the rule of N, and the leaves are labeled with rules having only EDB predicates in their bodies. Note that trees(Q, Π) can be infinite.

Intuition: Π is contained in a union of conjunctive queries if there is a containment mapping from some conjunctive query of the union to each expansion tree in trees(Q, Π). This does not yet yield a decision procedure, since the number of variables, and hence the number of node labels, is unbounded.

- To reconstruct an expansion tree for a given proof tree, we need to distinguish among occurrences of variables:
- Let g1, g2 be nodes in a proof tree. We call occurrences x1, x2 of a variable x in the rules labeling g1 and g2, respectively, connected if every rule on the path from g1 to g2 (except possibly the one at the lowest common ancestor g0) has an occurrence of x in its head.
- We say that an occurrence of a variable x in τ is a distinguished occurrence if it is connected to an occurrence of x in the head of the root of τ.

A strong containment mapping from a conjunctive query ϕ to a proof tree τ is a containment mapping h from ϕ to τ with:

– h maps distinguished occurrences in ϕ to distinguished occurrences in τ, and

– if x1 and x2 are two occurrences of a variable x in ϕ, then the occurrences h(x1) and h(x2) in τ are connected.

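Then: Π is contained in a union of conjunctive queries iff for every proof tree τ of Π there is a strong containment mapping from some conjunctive query of the union to τ.

An expansion of a C2RPQ is a CQ obtained by replacing each 2RPQ atom E(x,y) with a chain of edge atoms from x to y that spells a word of E (inverse symbols reverse the direction of the edge), using fresh variables for the intermediate nodes.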

The authors show how to check this condition using tree-automata:

Idea: The set of proof trees for a Datalog program Π with a goal predicate Q can be described by a nondeterministic tree automaton (doubly exponential in the size of Π), accepting exactly the proof trees. …

concluding:

- Adding transitive closure to CQs does not increase the upper bound for containment of Datalog (2EXPTIME matches the upper bound for containment in unions of conjunctive queries) [25]
- However, whether this upper bound is tight is not clear; the authors conjecture that it is
- (The EXPSPACE lower bound follows from containment of UC2RPQs in UC2RPQs [34])
- Observe: Containment in the other direction is already undecidable for RPQs [22]

- How do the proof obligations we need relate to RPQs/2RPQs/UC2RPQs?
- How do RPQs/2RPQs/UC2RPQs relate to OWL DL/Light/Flight and rule extensions thereof?
- Decidable, yes, but hardly scalable? Not necessarily, if queries/programs are of moderate size.
- We need more use cases to show what kinds of containment we need!