Conjunctive Queries, Datalog, and Recursion
Zachary G. Ives
University of Pennsylvania
CIS 550 – Database & Information Systems
October 23, 2003
Some slide content courtesy of Susan Davidson, Dan Suciu, & Raghu Ramakrishnan
Administrivia/Reminders
(The Levy paper doesn’t need a review)
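Queries have the form:
$\{\langle x_1, x_2, \ldots, x_n \rangle \mid p\}$, where $x_1, x_2, \ldots, x_n$ are the domain variables and $p$ is the predicate
Predicate: boolean expression over $x_1, x_2, \ldots, x_n$, built from
$\langle x_i, x_j, \ldots \rangle \in R$, $x_i \mathbin{op} x_j$, $x_i \mathbin{op} \mathit{const}$, $\mathit{const} \mathbin{op} x_i$
$\exists x_i.\, p$, $\forall x_j.\, p$, $p \wedge q$, $p \vee q$, $\neg p$, $p \Rightarrow q$
where op is one of $=, \neq, <, >, \leq, \geq$, and $x_i, x_j, \ldots$ are domain variables; $p, q$ are predicates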
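To make the semantics concrete, here is a minimal sketch that evaluates one domain calculus query by letting each variable range over the active domain (the constants that appear in the instance). The relation R, its contents, and the query itself are hypothetical, and restricting variables to the active domain is an assumption that sidesteps infinite domains.

```python
# Hypothetical instance of a binary relation R
R = {(1, 2), (1, 5), (3, 4), (6, 7)}

# Active domain: every constant that appears in the instance
adom = {v for t in R for v in t}


def query():
    """{<x1> | ∃x2. <x1, x2> ∈ R ∧ x2 > 4}, with variables ranging over adom."""
    return {
        (x1,)
        for x1 in adom
        if any((x1, x2) in R and x2 > 4 for x2 in adom)  # ∃x2 becomes any(...)
    }


print(sorted(query()))  # [(1,), (6,)]
```

The existential quantifier turns into `any(...)`; a universal quantifier over the same domain would become `all(...)`.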
Datalog borrows the flavor of the relational calculus but is a “real” query language. Its rules have the form:
Rout(T1) :- R1(T2), R2(T3), …, c(T2 ∪ … ∪ Tn)
where Rout is the relation representing the query result, the Ri are predicates representing relations, c is an expression using arithmetic/boolean predicates over variables, and the Ti are tuples of variables
Example:
idb(x,y) :- r1(x,z), r2(z,y), z < 10
Here idb(x,y) is the head, everything to the right of :- is the body, and each comma-separated item in the body is a subgoal.
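A minimal sketch of how the example rule evaluates, assuming relations are Python sets of tuples and using made-up contents for r1 and r2. Matching z across the two subgoals is the join, z < 10 is the selection, and keeping only (x, y) is the projection.

```python
# Hypothetical EDB relations, stored as sets of tuples
r1 = {(1, 5), (2, 12), (3, 7)}
r2 = {(5, "a"), (12, "b"), (7, "c"), (7, "d")}


def idb(r1, r2):
    """idb(x,y) :- r1(x,z), r2(z,y), z < 10"""
    return {
        (x, y)                    # head: keep only x and y (projection)
        for (x, z) in r1          # subgoal r1(x, z)
        for (z2, y) in r2         # subgoal r2(z, y)
        if z == z2 and z < 10     # shared z is the join; z < 10 is the selection
    }


print(sorted(idb(r1, r2)))  # [(1, 'a'), (3, 'c'), (3, 'd')]
```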
Recall our example of a binary relation for graphs or trees (similar to an XML Edge relation):
edge(from, to)
If we want to know what nodes are reachable:
reachable(F, T) :- edge(F, T)    distance 1
reachable(F, T) :- edge(F, X), edge(X, T)    dist. 2
reachable(F, T) :- edge(F, X), dist2(X, T)    dist. 3, where dist2(X, T) :- edge(X, Y), edge(Y, T)
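The non-recursive rules above can be sketched as repeated joins of edge with itself. This assumes a small hypothetical edge set and a hypothetical helper `compose`; note that each additional distance needs one more rule (one more join), so no fixed set of these rules covers paths of arbitrary length.

```python
# Hypothetical graph: a -> b -> c -> d
edge = {("a", "b"), ("b", "c"), ("c", "d")}


def compose(p, q):
    """Join p(F, X) with q(X, T) on X, then project onto (F, T)."""
    return {(f, t) for (f, x) in p for (x2, t) in q if x == x2}


dist1 = set(edge)              # edge(F, T)
dist2 = compose(edge, edge)    # edge(F, X), edge(X, T)
dist3 = compose(edge, dist2)   # edge(F, X), dist2(X, T)
reachable_up_to_3 = dist1 | dist2 | dist3

print(sorted(reachable_up_to_3))
```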
But what about nodes reachable along paths of any length? (Note this was easy in XPath over an XML representation: //edge)
(:- is another way of writing ←)
Define a recursive query in datalog:
reachable(F, T) :- edge(F, T)    distance 1
reachable(F, T) :- edge(F, X), reachable(X, T)    distance >1
What does this mean, exactly, in terms of logic?
One of the three Datalog models is based on a notion of fixpoint: apply the rules repeatedly, starting from the base facts, until no new tuples are derived
In the RA, this requires a while loop!
Datalog:
reachable(F, T) :- edge(F, T)
reachable(F, T) :- edge(F, X), reachable(X, T)
RA procedure with while:
reachable += edge
while change {
reachable += $\pi_{F,T}(\rho_{T \to X}(\mathit{edge}) \bowtie \rho_{F \to X}(\mathit{reachable}))$
}
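The while loop above can be sketched directly as naive fixpoint evaluation, assuming relations are Python sets of pairs and using a hypothetical graph with a cycle. Each round joins edge(F, X) with reachable(X, T) on X and stops once a round adds nothing new; semi-naive evaluation, a standard refinement, would join only with the tuples added in the previous round.

```python
def reachable_fixpoint(edge):
    """Naive fixpoint for:
         reachable(F, T) :- edge(F, T)
         reachable(F, T) :- edge(F, X), reachable(X, T)
    """
    reachable = set(edge)                  # reachable += edge
    changed = True
    while changed:                         # while change { ... }
        # join edge(F, X) with reachable(X, T) on X, project onto (F, T)
        new = {(f, t) for (f, x) in edge for (x2, t) in reachable if x == x2}
        changed = not new <= reachable     # did this round find any unseen tuple?
        reachable |= new
    return reachable


# Hypothetical graph: cycle 1 -> 2 -> 3 -> 1, plus a separate edge 4 -> 5
edge = {(1, 2), (2, 3), (3, 1), (4, 5)}
print(sorted(reachable_fixpoint(edge)))
```

Termination is guaranteed because reachable only grows and is bounded by the finite set of node pairs.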
Datalog allows for negation in rules
Single(X) :- Person(X), NOT Married(X, Y)
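A minimal sketch of the negated rule over hypothetical Person and Married instances. It assumes the intended reading of NOT Married(X, Y) is "there is no Y such that Married(X, Y)", since Y appears in no positive subgoal; the negated subgoal then acts as a set difference against the people who appear as a first argument of Married.

```python
# Hypothetical EDB instances
person = {"alice", "bob", "carol"}
married = {("alice", "bob"), ("bob", "alice")}

# Single(X) :- Person(X), NOT Married(X, Y)
# Assumed reading: X is a person and no Y exists with Married(X, Y).
married_people = {x for (x, _y) in married}
single = {x for x in person if x not in married_people}

print(sorted(single))  # ['carol']
```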
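Range restriction, which requires that every variable in a rule appears in at least one positive (non-negated) relational subgoal of the body, or is bound by an equality to a constant (with ∨, this must hold within each disjunct):
Safe:
q(X) :- r(X, Y)
q(X) :- X = 5
q(X) :- ¬r(X, X), s(X)
q(X) :- r(X) ∨ (t(Y), u(X, Y))
Unsafe:
q(X) :- r(Y)
q(X) :- ¬r(X, X)
q(X) :- r(X) ∨ t(Y)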
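A simplified range-restriction checker, as a sketch. It assumes a hypothetical `Rule` encoding that records only the argument tuples of positive and negated subgoals plus the variables equated to constants, and it deliberately ignores ∨ and variable-to-variable equalities. It classifies the disjunction-free examples above.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    head_vars: list[str]
    positive: list[tuple[str, ...]] = field(default_factory=list)  # args of positive subgoals
    negated: list[tuple[str, ...]] = field(default_factory=list)   # args of negated subgoals
    eq_consts: list[str] = field(default_factory=list)             # variables with X = const


def is_var(term: str) -> bool:
    # Hypothetical convention: variables start with an uppercase letter
    return term[:1].isupper()


def is_range_restricted(rule: Rule) -> bool:
    """Every head or negated-subgoal variable must be bound positively or to a constant."""
    bound = {t for atom in rule.positive for t in atom if is_var(t)} | set(rule.eq_consts)
    used = set(rule.head_vars) | {t for atom in rule.negated for t in atom if is_var(t)}
    return used <= bound


examples = {
    "q(X) :- r(X, Y)": Rule(["X"], positive=[("X", "Y")]),
    "q(X) :- X = 5": Rule(["X"], eq_consts=["X"]),
    "q(X) :- ¬r(X, X), s(X)": Rule(["X"], positive=[("X",)], negated=[("X", "X")]),
    "q(X) :- r(Y)": Rule(["X"], positive=[("Y",)]),
    "q(X) :- ¬r(X, X)": Rule(["X"], negated=[("X", "X")]),
}
for text, rule in examples.items():
    print(text, "safe" if is_range_restricted(rule) else "unsafe")
```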
v1(x,y) :- r(x,y), ¬s(x,y)
q(x,y) :- v1(x,y), ¬s(x,y)
With r = {(1,1), (1,2)} and s = {(1,1)}: v1 = {(1,2)} and q = {(1,2)}
Predicate dependency graph: positive (+) edges r → v1 and v1 → q; negative edges s → v1 and s → q
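The example can be evaluated stratum by stratum, as a minimal sketch using the r and s instances given above. The key point is that s is fully known before either rule negates it; within the upper stratum v1 is computed before q because q uses v1 positively.

```python
# EDB instances from the example (lowest stratum)
r = {(1, 1), (1, 2)}
s = {(1, 1)}

# Next stratum: both rules negate s, which is already complete.
v1 = {(x, y) for (x, y) in r if (x, y) not in s}   # v1(x,y) :- r(x,y), ¬s(x,y)
q = {(x, y) for (x, y) in v1 if (x, y) not in s}   # q(x,y)  :- v1(x,y), ¬s(x,y)

print(v1, q)  # {(1, 2)} {(1, 2)}
```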
foreach predicate p, set stratum(p) = 1
do until no change, or some stratum > # of predicates (in which case the program is not stratifiable)
  foreach rule h :- b, where p is the predicate of the head h {
    foreach negated subgoal of b with predicate q {
      stratum(p) = max(stratum(p), 1 + stratum(q))
    }
    foreach positive subgoal of b with predicate q {
      stratum(p) = max(stratum(p), stratum(q))
    }
  }
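The stratification procedure above can be sketched in Python, assuming a hypothetical encoding where each rule is a head predicate paired with a list of (body predicate, is_negated) pairs. It returns None when some stratum exceeds the number of predicates, i.e. when negation passes through recursion.

```python
def stratify(rules):
    """rules: list of (head_predicate, [(body_predicate, is_negated), ...]).
    Returns {predicate: stratum}, or None if the program is not stratifiable."""
    preds = {h for h, _ in rules} | {q for _, body in rules for q, _ in body}
    stratum = {p: 1 for p in preds}
    changed = True
    while changed:
        changed = False
        for p, body in rules:
            for q, negated in body:
                # a negated subgoal forces p strictly above q; a positive one, at least level
                need = stratum[q] + 1 if negated else stratum[q]
                if need > stratum[p]:
                    stratum[p] = need
                    changed = True
                    if stratum[p] > len(preds):
                        return None  # negation through recursion
    return stratum


program = [("v1", [("r", False), ("s", True)]),
           ("q", [("v1", False), ("s", True)])]
print(sorted(stratify(program).items()))  # [('q', 2), ('r', 1), ('s', 1), ('v1', 2)]

# Hypothetical non-stratifiable rule: win(X) :- move(X, Y), ¬win(Y)
game = [("win", [("move", False), ("win", True)])]
print(stratify(game))  # None
```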
A single Datalog rule with no “∨,” “¬,” or “∀” can express select, project, and join: a conjunctive query
We know how to “minimize” conjunctive queries
An important simplification that can’t be done for general SQL
We can test whether one conjunctive query’s answers always contain another conjunctive query’s answers (for ANY instance)
Suppose we have two queries:
q1(S,C) :- Student(S, N), Takes(S, C), Course(C, X), inCSE(C), Course(C, “DB & Info Systems”)
q2(S,C) :- Student(S, N), Takes(S, C), Course(C, X)
Intuitively, q1 must return the same or fewer answers than q2: q1 ⊆ q2
We can say that q2 contains q1 because this holds for any instance of our DB {Student, Takes, Course, inCSE}
(Testing containment of conjunctive queries is NP-complete in the size of the queries. Testing containment for full first-order logic queries is undecidable!!!)
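To test this, build a database from q1's body, treating each variable as a distinct constant:
Student: (S, N)
Takes: (S, C)
Course: (C, X), (C, “DB & Info Systems”)
inCSE: (C)
We need to get the tuple <S, C> when executing q2 over this database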
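The test above can be sketched as follows: freeze the contained query's body into a database, run the containing query over it by brute-force backtracking, and check whether the frozen head tuple comes back. The `Var` class, the `("frozen", name)` encoding of frozen variables, and the helper names are hypothetical choices for this sketch; the exhaustive search is exponential in the worst case, in line with the NP-completeness noted above.

```python
from typing import NamedTuple


class Var(NamedTuple):
    """A query variable; anything else in an atom is a constant."""
    name: str


def freeze(term):
    # Hypothetical encoding: variable V becomes the constant ("frozen", "V")
    return ("frozen", term.name) if isinstance(term, Var) else term


def canonical_db(body):
    """Turn each subgoal pred(args) into a fact, freezing variables into constants."""
    db = {}
    for pred, args in body:
        db.setdefault(pred, set()).add(tuple(freeze(a) for a in args))
    return db


def evaluate(head, body, db):
    """All head tuples a conjunctive query produces over db (brute-force backtracking)."""
    results = set()

    def search(i, env):
        if i == len(body):
            results.add(tuple(env[t] if isinstance(t, Var) else t for t in head))
            return
        pred, args = body[i]
        for fact in db.get(pred, set()):
            if len(fact) != len(args):
                continue
            new_env, ok = dict(env), True
            for a, val in zip(args, fact):
                if isinstance(a, Var):
                    # bind the variable, or check it agrees with an earlier binding
                    ok = new_env.setdefault(a, val) == val
                else:
                    ok = a == val  # constants must match exactly
                if not ok:
                    break
            if ok:
                search(i + 1, new_env)

    search(0, {})
    return results


def contains(q_big, q_small):
    """True if q_big's answers contain q_small's answers on every database."""
    head_s, body_s = q_small
    head_b, body_b = q_big
    frozen_head = tuple(freeze(t) for t in head_s)
    return frozen_head in evaluate(head_b, body_b, canonical_db(body_s))


S, N, C, X = Var("S"), Var("N"), Var("C"), Var("X")
q1 = ((S, C), [("Student", (S, N)), ("Takes", (S, C)), ("Course", (C, X)),
               ("inCSE", (C,)), ("Course", (C, "DB & Info Systems"))])
q2 = ((S, C), [("Student", (S, N)), ("Takes", (S, C)), ("Course", (C, X))])

print(contains(q2, q1))  # True: q2 contains q1
print(contains(q1, q2))  # False: q2's frozen body has no inCSE fact
```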
We’ve seen a new language, Datalog
We’ve seen that a particular kind of query, the conjunctive query, is written naturally in Datalog