XML Views & Reasoning about Views. Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems November 3, 2005. Some slide content courtesy of Susan Davidson, Dan Suciu, & Raghu Ramakrishnan. Administrivia. Project plans due
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Zachary G. Ives
University of Pennsylvania
CIS 550 – Database & Information Systems
November 3, 2005
Some slide content courtesy of Susan Davidson, Dan Suciu, & Raghu Ramakrishnan
for $x in document(“dblp.xml”)/proceedings
order by $x/title/text()
return $x
By default:
XQuery has a way of telling the query engine to avoid preserving order:
Can get a node’s name by querying nodename():
for $x in document(“dblp.xml”)/dblp/*
return nodename($x)
Can construct elements and attributes using computed names:
for $x in document(“dblp.xml”)/dblp/*,
$year in $x/year,
$title in $x/title/text(),
element nodename($x) {
attribute {“year” + $year} { $title }
}
<xsl:stylesheet version=“1.1”>
<xsl:template match=“/dblp”>
<html><head>This is DBLP</head>
<body>
<xsl:applytemplates />
</body>
</html>
</xsl:template>
<xsl:template match=“inproceedings”>
<h2><xsl:applytemplates select=“title” /></h2>
<p><xsl:applytemplates select=“author”/></p>
</xsl:template>
…
</xsl:stylesheet>
<inproceedings>
<title>Paper1</title>
<author>Smith</author>
</inproceedings>
<inproceedings>
<author>Chakrabarti</author>
<author>Gray</author>
<title>Paper2</title>
</inproceedings>
</dblp>
<html><head>This Is DBLP</head>
<body>
<h2>Paper1</h2>
<p>Smith</p>
<h2>Paper2</h2>
<p>Chakrabarti</p>
<p>Gray</p>
</body>
</html>
Results of XSLT StylesheetXSLT is a language primarily designed from going from XML nonXML
Obviously, we can do XML XML in XQuery
… Or relations relations
… What about relations XML and XML relations?
Let’s start with XML XML, relations relations
SQL:
CREATE VIEW V(A,B,C) AS
SELECT A,B,C FROM R WHERE R.A = “123”
XQuery:declare function V() as element(content)* {
for $r in doc(“R”)/root/tree,
$a in $r/a, $b in $r/b, $c in $r/c
where $a = “123”
return <content>{$a, $b, $c}</content>
}
Using the views:
SELECT * FROM V, RWHERE V.B = 5 AND V.C = R.C
for $v in V()/content, $r in doc(“r”)/root/treewhere $v/b = $r/breturn $v
Providing security/access control
Can be used as relations in other queries
Describe transformations from one schema (the base relations) to another (the output of the view)
Allow us to define recursive queries
CREATE VIEW V(A,B,C) AS
SELECT A,B,C FROM R WHERE R.A = “123”
SELECT * FROM V, RWHERE V.B = 5 AND V.C = R.C
A claim we’ve made several times:
“XML can’t represent anything that can’t be expressed in in the relational model”
If this is true, then we must be able to represent XML in relations
Store a relational view of XML (or create an XML view of relations)
If we have no IDs, we CREATE values…
BinaryLikeEdge(key, label, type, value, parent)
The Simplest Way to Encode a TreeWhat are shortcomings here?
Colors are representative of separate SQL queries…
select E.label AS rLabel, E.target AS rid, E.ord AS rOrd, null AS cLabel, null AS cid, null AS cOrd, null AS subOrd, null AS sid, null AS str, null AS intfrom Edge Ewhere parent IS NULL
select null AS rLabel, E.target AS rid, E.ord AS rOrd, E1.label AS cLabel, E1.target AS cid, E1.ord AS cOrd, null AS …from Edge E, Edge E1where E.parent IS NULL AND E.target = E1.parent
select null as rLabel, E.target AS rid, E.ord AS rOrd, null AS cLabel, E1.target AS cid, E1.ord AS cOrd, E2.label as sLabel, E2.target as sid, E2.ord AS sOrd, null as …from Edge E, Edge E1, Edge E2where E.parent IS NULL AND E.target = E1.parent AND E1.target = E2.parent
select null as rLabel, E.target AS rid, E.ord AS rOrd, null AS cLabel, E1.target AS cid, E1.ord AS cOrd, null as sLabel, E2.target as sid, E2.ord AS sOrd, Vi.val AS str, null as intfrom Edge E, Edge E1, Edge E2, Vint Vi where E.parent IS NULL AND E.target = E1.parent AND E1.target = E2.parent AND Vi.vid = E2.target
( select E.label as rLabel, E.target AS rid, E.ord AS rOrd, … from Edge E where parent IS NULL)UNION ( select null as rLabel, E.target AS rid, E.ord AS rOrd, E1.label AS cLabel, E1.target AS cid, E1.ord AS cOrd, null as … from Edge E, Edge E1 where E.parent IS NULL AND E.target = E1.parent) UNION (
.
:) UNION ( . :
)
tree
?
@id
*
content
@id
*
*
icontent
subcontent
author
book
name
for $X in document(“mydoc”)/tree/contentwhere $X/subcontent = “XYZ”return $X
TheRoot(rootID)
Content(parentID, id, @id)
Subcontent(parentID, varchar)
Icontent(parentID, int)
Queries have form:
{<x1,x2, …, xn> p }
Predicate: boolean expression over x1,x2, …, xn
<xi,xj,…> R xiop xj xiop constconstop xi
xi. p xj. p pq, pq p, pq
where op is , , , , , and
xi,xj,… are domain variables; p,q are predicates
domain variables
predicate
Borrows the flavor of the relational calculus but is a “real” query language
Rout(T1) R1(T2), R2(T3), …, c(T2[ … Tn)
where Rout is the relation representing the query result, Ri are predicates representing relations, c is an expression using arithmetic/boolean predicates over vars, and Ti are tuples of variables
idb(x,y) r1(x,z), r2(z,y), z < 10
body
head
subgoals
Recall our example of a binary relation for graphs or trees (similar to an XML Edge relation):
edge(from, to)
If we want to know what nodes are reachable:
reachable(F, T) : edge(F, T) distance 1
reachable(F, T) : edge(F, X), edge(X, T) dist. 2
reachable(F, T) : edge(F, X), dist2(X, T) dist. 3
But how about all reachable paths? (Note this was easy in XPath over an XML representation  //edge)
(another way of writing )
Define a recursive query in datalog:
reachable(F, T) : edge(F, T) distance 1
reachable(F, T) : edge(F, X), reachable(X, T) distance >1
What does this mean, exactly, in terms of logic?
One of the three Datalog models is based on a notion of fixpoint:
In the RA, this requires a while loop!
Datalog:
reachable(F, T) : edge(F, T)
reachable(F, T) : edge(F, X), reachable(X, T)
RA procedure with while:
reachable += edge
while change {
reachable += F, T(T ! X(edge) ⋈F ! X(reachable))
}
Datalog allows for negation in rules
Single(X) Person(X), : Married(X,Y)
Range restriction, which requires that every variable:
Safe:q(X) r(X,Y)q(X) X = 5 q(X) : r(X,X), s(X)q(X) r(X) Ç (t(Y),u(X,Y))
Unsafe:q(X) r(Y)q(X) : r(X,X)q(X) r(X) Ç t(Y)
r (1,1)
r (1,2)
s (1,1)
v1 (1,2)q (1,2)
r (1,1)
r (1,2)
s (1,1)
v1 (1,2)q(x,y) : v1(x,y), : s(x,y)
r (1,1)
r (1,2)
s (1,1)
v1(x,y) : r(x,y), : s(x,y)q(x,y) : v1(x,y), : s(x,y)
+
q
v1

v1(x,y) : r(x,y), : s(x,y)q(x,y) : v1(x,y), : s(x,y)
+
s
r
foreach predicate p, set stratum(p) = 1
do until no change, or some stratum > # of predicates
foreach rule h b {
foreach negated subgoal of b with predicate q {
stratum(p) = max(stratum(p), 1+stratum(q))
}
foreach positive subgoal of b with predicate q {
stratum(p) = max(stratum(p), stratum(q)
}
}
A single Datalog rule with no “Ç,” “:,” “8” can express select, project, and join – a conjunctive query
We know how to “minimize” conjunctive queries
An important simplification that can’t be done for general SQL
We can test whether one conjunctive query’s answers always contain another conjunctive query’s answers (for ANY instance)
Suppose we have two queries:q1(S,C) : Student(S, N), Takes(S, C), Course(C, X), inCSE(C), Course(C, “DB & Info Systems”)q2(S,C) : Student(S, N), Takes(S, C), Course(C, X)
Intuitively, q1 must contain the same or fewer answers vs. q2:
We can say that q2 contains q1 because this holds for any instance of our DB {Student, Takes, Course}
(This is an NPcomplete algorithm in the size of the query. Testing for full firstorder logic queries is undecidable!!!)
Student
Takes
Course
inCSE
Need to get tuple <S,C> in executing q2 over this database
We’ve seen a new language, Datalog
We’ve seen that a particular kind of query, the conjunctive query, is written naturally in Datalog