XPath, the best known modal logic ever. And . . . made in Amsterdam!
Maarten Marx
Information and Language Processing Systems (ILPS)
Informatics Institute, University of Amsterdam, The Netherlands
XPath, what is that?
• A standard language proposed by the W3C in November 1999.
• XPath is a language for addressing parts of an XML document.
• XPath beats temporal logic as the best known modal logic:
• Google: Results 1 - 10 of about 1,870,000 for XPath
• Google: Results 1 - 10 of about 242,000 for “temporal logic”
Research aim behind this talk
• Create an expressively complete navigational query language for XML documents.
Aim of this talk
• Show that modal logic is the right paradigm for such a task.
• One can get remarkable results with (for modal logicians) simple proofs.
• The modal logic literature is full of hints and almost-results.
Known results
• For binary relations, very little is known. Immerman and Kozen:
1. strings have the 3-variable property;
2. bounded trees have the k-variable property for some finite k.
• For unary relations, more is known:
1. Kamp’s theorem: strings have H-dimension 3;
2. unbounded unordered trees have no finite H-dimension (Schlingloff)
3. unbounded ordered trees have H-dimension 3 (PODS 2004).
• NB:
1. the k-variable property is stronger than H-dimension;
2. the k-variable property is independent of the “finite complete set of operators” property (Hodkinson–Simon, JPhL).
• Thus the known results do not settle our research goal.
Semantics
• Given an ordered tree,
– each path wff denotes a set of pairs of nodes, and
– each node wff denotes a set of nodes.
• All set-theoretic operations have their standard meaning.
• ⟨⟨path wff⟩⟩ is true at a node n iff n is in the domain of the relation denoted by path wff.
Note! Every path wff (node wff) defines a first-order definable binary (unary) relation.
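To make these semantics concrete, here is a minimal Python sketch that represents a path wff's denotation as a set of node pairs and a node wff's denotation as a set of nodes, over a small hand-built tree. The helper names (compose, plus, identity_on, domain, label) are our own illustration; composition (written / in the examples), union, the test ?φ and ⟨⟨·⟩⟩ as "take the domain" follow the PDL-style reading used in this talk.

```python
# A tiny ordered tree:     0:a
#                        /  |  \
#                     1:b  2:c  4:b
#                           |
#                          3:d
NODES = range(5)
LABEL = {0: "a", 1: "b", 2: "c", 3: "d", 4: "b"}
CHILD = {(0, 1), (0, 2), (0, 4), (2, 3)}


def converse(r):
    """Swap every pair, e.g. parent = converse(child)."""
    return {(y, x) for (x, y) in r}


def compose(r, s):
    """r/s: first an r-step, then an s-step."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}


def plus(r):
    """Transitive closure r+, by iterating until nothing new appears."""
    closure = set(r)
    while True:
        new = closure | compose(closure, r)
        if new == closure:
            return closure
        closure = new


def identity_on(nodes):
    """?phi: the identity relation restricted to the nodes satisfying phi."""
    return {(n, n) for n in nodes}


def domain(r):
    """<<r>> is true at n iff n starts an r-path."""
    return {x for (x, _) in r}


def label(p):
    """The node wff p: all nodes labelled p."""
    return {n for n in NODES if LABEL[n] == p}


PARENT = converse(CHILD)
# child/?b : pairs (x, y) where y is a b-labelled child of x
print(sorted(compose(CHILD, identity_on(label("b")))))            # [(0, 1), (0, 4)]
# <<child+/?d>> : nodes with a d-labelled descendant
print(sorted(domain(compose(plus(CHILD), identity_on(label("d"))))))  # [0, 2]
# node wffs use plain set operations, e.g. not <<parent>> picks out the root
print(sorted(set(NODES) - domain(PARENT)))                        # [0]
```

Union of path wffs is simply Python's set union `|`, and conjunction of node wffs is `&`.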
Example expressions
XPath 1.0 | Conditional XPath
child::pi | child/?pi
child::pi[descendant::*] | child/?pi/?⟨⟨child+⟩⟩
/descendant::pi | ?¬⟨⟨parent⟩⟩/child+/?pi
child::* | child
self::pi[child] | ?(pi ∧ ⟨⟨child⟩⟩)
preceding::pi | parent*/left+/child*/?pi
Equivalent XPath 1.0 and Conditional XPath expressions.
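The translation of preceding::pi can be checked mechanically. The Python sketch below (our own illustration; the document and helper names are hypothetical) computes preceding::pi straight from its XPath 1.0 definition (nodes earlier in document order that are not ancestors) and compares it with parent*/left+/child*/?pi, that is ancestor-or-self, then preceding-sibling, then descendant-or-self, reading step* as self ∪ step+.

```python
import xml.etree.ElementTree as ET


def skeleton(xml_text):
    """Preorder-numbered nodes with labels, child pairs and next-sibling pairs."""
    label, child, right = {}, set(), set()

    def visit(elem):
        n = len(label)
        label[n] = elem.tag
        prev = None
        for sub in elem:                 # element children in document order
            c = visit(sub)
            child.add((n, c))
            if prev is not None:
                right.add((prev, c))     # c is the next sibling of prev
            prev = c
        return n

    visit(ET.fromstring(xml_text))
    return label, child, right


def converse(r):
    return {(y, x) for (x, y) in r}


def compose(r, s):
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}


def plus(r):
    closure = set(r)
    while True:
        new = closure | compose(closure, r)
        if new == closure:
            return closure
        closure = new


def star(r, nodes):
    """step* read as self (the identity) union step+."""
    return plus(r) | {(n, n) for n in nodes}


label, child, right = skeleton("<r><a><pi/><b/></a><pi><c/></pi><d><pi/></d></r>")
nodes = set(label)
parent, left = converse(child), converse(right)
is_pi = {(n, n) for n in nodes if label[n] == "pi"}

# preceding::pi from its definition: with preorder numbering, "earlier in
# document order" is "smaller number"; ancestors are excluded.
ancestor = plus(parent)
preceding_pi = {(n, m) for n in nodes for m in nodes
                if m < n and (n, m) not in ancestor and label[m] == "pi"}

# parent*/left+/child*/?pi
translated = compose(compose(compose(star(parent, nodes), plus(left)),
                             star(child, nodes)), is_pi)
print(preceding_pi == translated)   # True
```

Both sides agree on this document; the equivalence itself holds on every ordered tree, which this single check of course only illustrates.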
Conditional XPath fulfills our research goal
• Theorem 1 (Kamp/PODS 2004): Every FO-definable set of nodes is definable by a Conditional XPath node wff.
• Theorem 2: Every first-order definable binary relation is definable by a Conditional XPath path wff.
• Corollary: Every FO relation φ(x1, . . . , xn) is equivalent to a union of conjunctive queries whose atoms have the form xi path wff xj.
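The shape of the Corollary can be illustrated with a brute-force Python sketch (ours, for intuition only; it is not how the Corollary is proved). A conjunctive query is a list of atoms (i, R, j) meaning x_i R x_j, where each R stands for the binary relation denoted by some path wff, here simply given as a set of pairs; a union of conjunctive queries is evaluated disjunct by disjunct. Variables are indexed from 0 in the code, and the example relations are hypothetical.

```python
from itertools import product


def eval_cq(atoms, n_vars, nodes):
    """All tuples (x0, ..., x_{n-1}) satisfying every atom  x_i R x_j."""
    return {t for t in product(nodes, repeat=n_vars)
            if all((t[i], t[j]) in rel for (i, rel, j) in atoms)}


def eval_ucq(disjuncts, n_vars, nodes):
    """A union of conjunctive queries: the union of the disjuncts' answers."""
    result = set()
    for atoms in disjuncts:
        result |= eval_cq(atoms, n_vars, nodes)
    return result


# A root 0 with three children 1, 2, 3 (in that order).
NODES = [0, 1, 2, 3]
CHILD = {(0, 1), (0, 2), (0, 3)}
FOLLOWING_SIBLING = {(1, 2), (1, 3), (2, 3)}          # right+ in this tree
PRECEDING_SIBLING = {(y, x) for (x, y) in FOLLOWING_SIBLING}   # left+

# x0 has two children x1, x2 with x1 strictly before x2
ordered_pair = eval_cq([(0, CHILD, 1), (0, CHILD, 2), (1, FOLLOWING_SIBLING, 2)],
                       3, NODES)
print(sorted(ordered_pair))   # [(0, 1, 2), (0, 1, 3), (0, 2, 3)]

# x0 and x1 are distinct siblings, in either order: a union of two queries
siblings = eval_ucq([[(0, FOLLOWING_SIBLING, 1)], [(0, PRECEDING_SIBLING, 1)]],
                    2, NODES)
print(sorted(siblings))       # [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
```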
Difference between the two theorems
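• Theorem 1 is about node wffs and unary relations; Theorem 2 is about path wffs and binary relations.
• Theorem 2 implies Theorem 1, but not conversely: a unary FO relation φ(x) is the domain of the binary relation φ(x) ∧ x = y, so if a path wff α defines the latter, the node wff ⟨⟨α⟩⟩ defines φ.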
• Node wffs have much stronger operators (and, not, bounded quantification).
• Path wffs only have “until”, concatenation and union.
XML document
An XML document can be seen as a finite, node-labelled, sibling-ordered, unbounded tree.
(NB: we abstract away from the “data details” and focus only on the skeleton of an XML document.)
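As an illustration of this abstraction, here is a minimal Python sketch (our own, not part of XPath or of the talk) that reads an XML string with the standard-library xml.etree.ElementTree parser and keeps only the skeleton: element tags as node labels, the child relation, and the next-sibling relation. Nodes are numbered in document order; the function name xml_skeleton is hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_skeleton(xml_text):
    """Turn an XML string into a finite, node-labelled, sibling-ordered tree.

    Nodes are numbered 0, 1, 2, ... in document (pre)order. Only element tags
    are kept; text and attributes (the "data details") are dropped.
    """
    root = ET.fromstring(xml_text)
    label, child, right = {}, set(), set()

    def visit(elem):
        n = len(label)
        label[n] = elem.tag
        prev = None
        for sub in elem:                 # element children in document order
            c = visit(sub)
            child.add((n, c))
            if prev is not None:
                right.add((prev, c))     # c is the next sibling of prev
            prev = c
        return n

    visit(root)
    return label, child, right


doc = "<a><b>text</b><c><d/></c><b x='1'/></a>"
label, child, right = xml_skeleton(doc)
print(label)           # {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'b'}
print(sorted(child))   # [(0, 1), (0, 2), (0, 4), (2, 3)]
print(sorted(right))   # [(1, 2), (2, 4)]
```

The text "text" and the attribute x='1' leave no trace in the result, which is exactly the skeleton view taken here.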
Design Constraints
• Stay as close as possible to the existing W3C standard XPath.
• This means:
– no (first or second order) variables.
– express sets of nodes (answer sets) and relations between nodes (paths).
– relations should be “drawable” (use only the regular expression operators).
Navigational XPath
We can give W3C XPath 1.0 a PDL-like definition:
step ::= child | parent | right | left.
path wff ::= step | step+
| ?node wff
| path wff ; path wff | path wff ∪ path wff.
node wff ::= p | ⟨⟨path wff⟩⟩ | ¬node wff | node wff ∧ node wff.
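• Note the very restricted use of (·)+: the transitive closure applies only to single steps!
• We use ⟨⟨path wff⟩⟩ to mean “I start a path wff”, i.e., the current node is in the domain of the path wff.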
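The grammar translates almost line by line into a recursive evaluator. The Python sketch below encodes wffs as nested tuples (a representation chosen for illustration; the tags "step", "plus", "test", "seq", "union", "label", "diamond", "not", "and" and the Tree class are hypothetical) and evaluates path wffs to sets of node pairs and node wffs to sets of nodes. Note that "plus" only accepts a single step, mirroring the restricted use of (·)+.

```python
def converse(r):
    return {(y, x) for (x, y) in r}


def compose(r, s):
    """Relational composition, written ; in the grammar and / in the examples."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}


def plus(r):
    closure = set(r)
    while True:
        new = closure | compose(closure, r)
        if new == closure:
            return closure
        closure = new


class Tree:
    """Node labels plus the four basic steps of Navigational XPath."""

    def __init__(self, label, child, right):
        self.nodes = set(label)
        self.label = label
        self.step = {"child": set(child), "parent": converse(child),
                     "right": set(right), "left": converse(right)}


def path(t, a):
    """Set of node pairs denoted by a path wff."""
    op = a[0]
    if op == "step":
        return t.step[a[1]]
    if op == "plus":                 # step+ : closure of a single step only
        return plus(t.step[a[1]])
    if op == "test":                 # ?node wff
        return {(n, n) for n in node(t, a[1])}
    if op == "seq":
        return compose(path(t, a[1]), path(t, a[2]))
    if op == "union":
        return path(t, a[1]) | path(t, a[2])
    raise ValueError(f"not a path wff: {a!r}")


def node(t, f):
    """Set of nodes denoted by a node wff."""
    op = f[0]
    if op == "label":
        return {n for n in t.nodes if t.label[n] == f[1]}
    if op == "diamond":              # <<a>> : the node starts an a-path
        return {x for (x, _) in path(t, f[1])}
    if op == "not":
        return t.nodes - node(t, f[1])
    if op == "and":
        return node(t, f[1]) & node(t, f[2])
    raise ValueError(f"not a node wff: {f!r}")


t = Tree(label={0: "r", 1: "a", 2: "pi", 3: "b", 4: "c"},
         child={(0, 1), (0, 2), (0, 3), (2, 4)},
         right={(1, 2), (2, 3)})

# self::pi[child]  =  ?(pi and <<child>>)
has_child_pi = ("test", ("and", ("label", "pi"), ("diamond", ("step", "child"))))
print(path(t, has_child_pi))   # {(2, 2)}

# /descendant::pi  =  ?not<<parent>> ; child+ ; ?pi
from_root = ("seq", ("seq", ("test", ("not", ("diamond", ("step", "parent")))),
                     ("plus", "child")),
             ("test", ("label", "pi")))
print(path(t, from_root))      # {(0, 2)}
```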
Conclusion
• The W3C standard is a well-designed language. Its designers have reinvented a wheel that has been shown to possess very good properties.
• Still, its expressive power is not completely satisfactory. (Note that W3C XPath path wffs do not even capture all binary relations definable in two-variable FO.)
• Conditional XPath is excellent for expressing first order queries.
• Implementing the conditional axis efficiently is still open (special staircase joins?).
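For reference, the conditional axis of Conditional XPath is, as we understand it, a closure of the form (step/?φ)+: repeat a step, each time landing on a node that satisfies φ. This is where the “until” of path wffs comes from. The naive Python sketch below (plain fixpoint iteration over sets of pairs, nothing like a staircase join) only shows what has to be computed, not how to compute it efficiently; the function name conditional_plus and the example tree are hypothetical.

```python
def compose(r, s):
    """Relational composition r/s."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}


def conditional_plus(step, phi_nodes):
    """(step/?phi)+ by naive fixpoint iteration.

    (x, y) is in the result iff there is a chain x = n0, n1, ..., nk = y
    (k >= 1) of step-moves in which every node n1, ..., nk satisfies phi.
    """
    base = {(x, y) for (x, y) in step if y in phi_nodes}   # step/?phi
    closure = set(base)
    while True:
        new = closure | compose(closure, base)
        if new == closure:
            return closure
        closure = new


label = {0: "a", 1: "c", 2: "b", 3: "c", 4: "c", 5: "c"}
child = {(0, 1), (1, 2), (2, 3), (0, 4), (4, 5)}
not_b = {n for n in label if label[n] != "b"}

# (child/?not b)+ : go down as long as no b-labelled node is entered
axis = conditional_plus(child, not_b)
print(sorted(y for (x, y) in axis if x == 0))   # [1, 4, 5]
```

Plain child+ would reach all of 1 to 5 from node 0; the conditional axis stops at the b-node 2, so node 3 below it is not reached.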