
XPath and Beyond: Formal Foundations

XPath and Beyond: Formal Foundations. Jean-Yves Vion-Dury Xerox Research Centre Europe / INRIA. Pierre Genevès INRIA. Roadmap: Part 1. XPath: a cornerstone of the XML architecture Theory and Engineering Some key problems The trends around XPath theoretical studies

Presentation Transcript


  1. XPath and Beyond: Formal Foundations. Jean-Yves Vion-Dury, Xerox Research Centre Europe / INRIA. Pierre Genevès, INRIA

  2. Roadmap: Part 1 • XPath: a cornerstone of the XML architecture • Theory and Engineering • Some key problems • The trends around XPath theoretical studies • A Logic Based approach • Mathematical Characterization • Why use the Coq Proof Assistant? XPath and Beyond: Formal foundations

  3. XPath: a cornerstone of the XML architecture • Expresses node selection and/or structural properties • Currently used in XSLT, XQuery, XML Schema, XLink, XPointer,… • XPath is elegant, compact, effective and powerful • Claim: will be increasingly used and studied in the future • Indexing large document bases • Checking integrity constraints / global structural properties • Linking increasing document volumes

  4. Theory and Engineering in Computer Science • Some decades ago, theoretical studies prepared the ground for engineering • Relational algebra enabled a huge market around data storage and access • Information theory prepared digital processing (networks, image and sound processing, compression algorithms,…) • Linguistics, logic and formal mathematics prepared programming languages • A strange situation today around documents… • W3C standardization activities produce specifications, and many problems remain open • Some theoreticians try to capture the problems and to understand the underlying issues, long after the publication of the specifications! • This induces new difficulties and requires different approaches • In order to deal with low-level issues, close to implementations • In order to face the complexity of systems

  5. Some Key Problems around XPath • Formal semantics definition • Formal model of documents (trees, streams, graphs, strings,…?) • Precise, useful and simple denotational/operational semantics • Type checking • Constraints on document structure (tree grammars, graph grammars, pattern matching) • Valid/invalid path expressions with respect to a particular schema • Rewriting path expressions • In order to customize compilation/interpretation • Normalization • Optimization • Reduction of the complexity of suitable models • Simplifying expressions while preserving semantics • Equivalence p1 ≈ p2 • Gives a fundamental understanding of the language • Containment p1 ≤ p2 • Gives an even more fundamental view • Key inference: if p is a key for a schema S, then all p’ such that p’ ≤ p are keys too

  6. Linking Key Problems around XPath • Invalid expression and containment • p ≤ void • Rewriting and equivalence • (p1 | p2)/p -> p1/p | p2/p and • (p1 | p2)/p ≈ p1/p | p2/p • Optimization and containment • If p1 ≤ p2 then (p1 | p2)/p -> p2/p • Equivalence and containment • p1 ≈ p2 iff p1 ≤ p2 and p2 ≤ p1 • Containment and type checking • Structural constraints can be captured in XPath expressions • Structural constraint satisfaction can thus be checked
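
The distribution rule on this slide, (p1 | p2)/p -> p1/p | p2/p, can be sketched as a small term rewriter. This is only an illustration under stated assumptions, not the authors' implementation: the tuple encoding of paths and the function name `distribute` are invented for the example.

```python
# Assumed encoding (hypothetical): ("name", tag), ("slash", p1, p2), ("union", p1, p2).

def distribute(p):
    """Rewrite every (p1 | p2)/q inside p into p1/q | p2/q, bottom-up."""
    kind = p[0]
    if kind == "slash":
        left, right = distribute(p[1]), distribute(p[2])
        if left[0] == "union":
            # Push the right-hand step into both branches of the union.
            return ("union",
                    distribute(("slash", left[1], right)),
                    distribute(("slash", left[2], right)))
        return ("slash", left, right)
    if kind == "union":
        return ("union", distribute(p[1]), distribute(p[2]))
    return p  # atoms are left unchanged


a, b, c = ("name", "a"), ("name", "b"), ("name", "c")
rewritten = distribute(("slash", ("union", a, b), c))
# rewritten is the term for a/c | b/c
```

The rewriter only applies the rule syntactically. That the two sides are equivalent (≈) is a semantic claim that has to be justified separately, which is the point of the slide.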

  7. The problem of containment (expression)

  8. The problem of typed containment (expression)

  9. The Trends around XPath Theoretical Studies

  10. A Logic Based Approach • A set of axioms to reason about the comparison of terms (≤) • As opposed to model based approaches • A partial equivalence relation to minimize the axiom set • Fully congruent (e.g. p1 ≤ p2 and p1 == p3 implies p3 ≤ p2) • Theorems for simplifying the containment proofs • E.g. reflexivity, transitivity • Drawback: syntactic level • More combinatorial than model based approaches • Advantage: syntactic level • More extensible, provided the previous point is addressed • Gives more indication of the underlying issues due to language peculiarities

  11. XPath: abstract syntax ([Wadler99],[Olteanu01])

  12. Denotational semantics ([Wadler99][Olteanu01])

  13. Denotational semantics ([Wadler99][Olteanu01])

  14. Denotational semantics ([Wadler99][Olteanu01])

  15. Basic axioms

  16. Union & Intersection

  17. Qualifiers

  18. The equivalence relation ([Olteanu01])

  19. Using equivalence in proofs

  20. Mathematical Characterization • Soundness of the equivalence • Soundness of rules (e.g.) • Completeness of rule system (e.g.)

  21. Why Use the Coq Proof Assistant? • Coq http://coq.inria.fr is a Proof Assistant based on the Calculus of Inductive Constructions • Higher Order Logic • Constructive Logic • Typed • To address the complexity problem related to proofs • To benefit from the help of the Proof Assistant in case analysis • To maintain the whole mathematical architecture throughout exploratory work • To work in a rigorous frame • To produce rock solid and readable results • The challenge: • Requires powerful data structure modelling capabilities • Learning Coq is an additional difficulty! • Developing a proof in Coq is more demanding • But… • Coq is quite mature now (v8.0, 25 years of research!) and very expressive

  22. Roadmap: Part 2 • Modelling XPath using inductive constructions • Formal Semantics and interpretations • Interpreter based on the denotational semantics • A relational semantics for XPath • Modelling the containment relation • Using the proof system: containment checking • Current work on characterization • Methodology and expected outcomes

  23. Modelling XPath using inductive constructions • Paths are defined inductively • “void” and “top” are atoms • “|”, “/”, … are binary constructors • “[ ]” involves qualifiers • _true, _false are atoms • “and”, “or”, “not”: constructors • “leq” (≤): a cross-inductive definition • Functional notation, example: a/b[c] is written slash a (qualif b c)
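
The inductive definition of terms can be mirrored outside Coq with ordinary immutable data types. The following Python sketch is an assumption-laden illustration: only the names `slash` and `qualif` come from the slide, while `Name`, `Alt`, `QTrue`, `QFalse`, `And`, `Or`, `Not`, `show`, and the choice to let a path act directly as a qualifier are hypothetical.

```python
from dataclasses import dataclass

# Path atoms (no meaning is attached to them in this sketch).
@dataclass(frozen=True)
class Void:
    """The "void" atom."""

@dataclass(frozen=True)
class Top:
    """The "top" atom."""

@dataclass(frozen=True)
class Name:          # hypothetical: a step selecting elements by tag
    tag: str

# Binary path constructors.
@dataclass(frozen=True)
class Slash:         # p1/p2
    left: object
    right: object

@dataclass(frozen=True)
class Alt:           # p1 | p2
    left: object
    right: object

@dataclass(frozen=True)
class Qualif:        # p[q]
    path: object
    cond: object

# Qualifier atoms and constructors.
@dataclass(frozen=True)
class QTrue:
    """The _true atom."""

@dataclass(frozen=True)
class QFalse:
    """The _false atom."""

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Or:
    left: object
    right: object

@dataclass(frozen=True)
class Not:
    arg: object


def show(t):
    """Print a term back in XPath-like concrete syntax."""
    if isinstance(t, Void):
        return "void"
    if isinstance(t, Top):
        return "top"
    if isinstance(t, Name):
        return t.tag
    if isinstance(t, Slash):
        return f"{show(t.left)}/{show(t.right)}"
    if isinstance(t, Alt):
        return f"({show(t.left)} | {show(t.right)})"
    if isinstance(t, Qualif):
        return f"{show(t.path)}[{show(t.cond)}]"
    if isinstance(t, QTrue):
        return "true"
    if isinstance(t, QFalse):
        return "false"
    if isinstance(t, And):
        return f"({show(t.left)} and {show(t.right)})"
    if isinstance(t, Or):
        return f"({show(t.left)} or {show(t.right)})"
    if isinstance(t, Not):
        return f"not({show(t.arg)})"
    raise TypeError(f"not a term: {t!r}")


# The slide's example: a/b[c] written as slash a (qualif b c).
example = Slash(Name("a"), Qualif(Name("b"), Name("c")))
print(show(example))  # prints a/b[c]
```

Frozen dataclasses compare structurally, so two terms built the same way are equal, which loosely matches syntactic equality of inductive terms.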

  24. Interpreter based on the denotational semantics • Evaluates a path p from the context node x of the tree t • The evaluation of a path returns a set of nodes • The evaluation of a qualifier returns a boolean • Cross-recursive and terminating functions
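
The two cross-recursive functions described on this slide can be sketched as follows. This is a minimal Python illustration for a small fragment, not the Coq interpreter from the talk: the `Node` class, the tuple encoding of paths and qualifiers, and the names `eval_path` and `eval_qualif` are all assumptions.

```python
class Node:
    """A tree node with a tag and an ordered list of children."""
    def __init__(self, tag, children=()):
        self.tag = tag
        self.children = list(children)


def descendants(x):
    """Yield every proper descendant of x."""
    for child in x.children:
        yield child
        yield from descendants(child)


def eval_path(p, x):
    """Return the set of nodes selected by path p from context node x."""
    kind = p[0]
    if kind == "void":
        return set()
    if kind == "child":                     # child::tag, or child::* when tag is "*"
        return {c for c in x.children if p[1] in ("*", c.tag)}
    if kind == "desc":                      # descendant::tag
        return {d for d in descendants(x) if p[1] in ("*", d.tag)}
    if kind == "slash":                     # p1/p2
        return {z for y in eval_path(p[1], x) for z in eval_path(p[2], y)}
    if kind == "union":                     # p1 | p2
        return eval_path(p[1], x) | eval_path(p[2], x)
    if kind == "qualif":                    # p[q]
        return {y for y in eval_path(p[1], x) if eval_qualif(p[2], y)}
    raise ValueError(f"unknown path constructor: {kind}")


def eval_qualif(q, x):
    """Return the truth value of qualifier q at context node x."""
    kind = q[0]
    if kind == "true":
        return True
    if kind == "false":
        return False
    if kind == "and":
        return eval_qualif(q[1], x) and eval_qualif(q[2], x)
    if kind == "or":
        return eval_qualif(q[1], x) or eval_qualif(q[2], x)
    if kind == "not":
        return not eval_qualif(q[1], x)
    if kind == "path":                      # a path holds when it selects something
        return bool(eval_path(q[1], x))
    raise ValueError(f"unknown qualifier constructor: {kind}")


# Tree r(a(b(c), b)) and the query a/b[c] evaluated from the root.
root = Node("r", [Node("a", [Node("b", [Node("c")]), Node("b")])])
query = ("slash", ("child", "a"), ("qualif", ("child", "b"), ("path", ("child", "c"))))
selected = eval_path(query, root)
```

Both functions recurse only on strict sub-terms, which is why they terminate. Here `selected` contains the single `b` node that has a `c` child.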

  25. Need for a logic-based semantics • The classical semantics describes an interpreter that computes node sets • This computational view leads to needless complexity in proofs • Is there another way to capture XPath semantics?

  26. A Relational Semantics for XPath • An interpretation of paths in first-order logic • A path is translated into a dyadic formula • Rp holds for all pairs (x,y) of nodes such that y is accessed from x through the path p • Advantages: • Interpretations of paths and qualifiers are unified • Direct translation in Coq (mathematical semantics from the paper)
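
The relational reading, where a path p denotes the set of pairs (x, y) such that y is reached from x through p, can be sketched on a finite tree. This Python example is a hedged illustration only: the dictionaries `CHILDREN` and `TAGS`, the tuple encoding, and the function `rel` are assumptions, and qualifiers are restricted here to plain paths tested for existence.

```python
# A small tree given by integer node ids (hypothetical sample data):
# 0:r -> 1:a -> {2:b -> 4:c, 3:b}
CHILDREN = {0: [1], 1: [2, 3], 2: [4], 3: [], 4: []}
TAGS = {0: "r", 1: "a", 2: "b", 3: "b", 4: "c"}


def rel(p):
    """Return R_p as a set of (x, y) pairs over the sample tree."""
    kind = p[0]
    if kind == "child":
        return {(x, y) for x, ys in CHILDREN.items() for y in ys
                if p[1] in ("*", TAGS[y])}
    if kind == "slash":
        # Relational composition: x R_p1 y and y R_p2 z.
        r1, r2 = rel(p[1]), rel(p[2])
        return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}
    if kind == "union":
        return rel(p[1]) | rel(p[2])
    if kind == "qualif":
        # p[q]: keep (x, y) when some z satisfies y R_q z.
        r, rq = rel(p[1]), rel(p[2])
        sources = {y for (y, _) in rq}
        return {(x, y) for (x, y) in r if y in sources}
    raise ValueError(f"unknown constructor: {kind}")


query = ("slash", ("child", "a"), ("qualif", ("child", "b"), ("child", "c")))
pairs = rel(query)  # a/b[c] relates the root 0 to node 2 only
```

In this form a qualifier is interpreted with the same function as a path, which is one way to read the slide's remark that both interpretations are unified.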

  27. Modelling the containment relation (1) • A binary logical relation “Ple” • Gathers all containment rules in a single inductive construction • Suited for using Coq’s built-in tactics (constructor, inversion)

  28. Modelling the containment relation (2) • The containment relation ≤ for paths • Is inductive • Is defined using its dual relation for qualifiers (“Qipl”)

  29. Using the proof system: Containment Checking • We have modelled: • XPath terms • Their interpretation • The containment relation (that gathers our containment axioms) • We can now check containment facts with the proof engine • Demo of a tactical which proves the fact: ./*/b ≤ ./descendant::b • Underlying goal: extend the tactical in order to automate the checking of all containment facts
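
The fact ./*/b ≤ ./descendant::b can at least be tested on concrete data. The Python sketch below is a different and much weaker technique than the Coq tactical mentioned on the slide: it checks set inclusion at every context node of one sample tree, so it can refute a containment but never prove it for all documents. The sample tree, the tuple encoding, and the names `ev` and `contained_on_sample` are assumptions.

```python
# Sample tree (hypothetical): 0:r -> 1:a -> {2:b -> 4:b, 3:b}
CHILDREN = {0: [1], 1: [2, 3], 2: [4], 3: [], 4: []}
TAGS = {0: "r", 1: "a", 2: "b", 3: "b", 4: "b"}


def descendants(x):
    """Return all proper descendants of node x."""
    found = []
    for c in CHILDREN[x]:
        found.append(c)
        found.extend(descendants(c))
    return found


def ev(p, x):
    """Evaluate path p from context node x, returning a set of node ids."""
    kind = p[0]
    if kind == "child":
        return {c for c in CHILDREN[x] if p[1] in ("*", TAGS[c])}
    if kind == "desc":
        return {d for d in descendants(x) if p[1] in ("*", TAGS[d])}
    if kind == "slash":
        return {z for y in ev(p[1], x) for z in ev(p[2], y)}
    raise ValueError(f"unknown constructor: {kind}")


def contained_on_sample(p1, p2):
    """True when p1 selects a subset of p2 from every node of the sample tree."""
    return all(ev(p1, x) <= ev(p2, x) for x in CHILDREN)


star_b = ("slash", ("child", "*"), ("child", "b"))   # ./*/b
desc_b = ("desc", "b")                               # ./descendant::b

holds = contained_on_sample(star_b, desc_b)      # consistent with ./*/b ≤ ./descendant::b
converse = contained_on_sample(desc_b, star_b)   # refuted by node 4, a b at depth 3
```

On this tree `holds` is true and `converse` is false, so the sample is a counterexample to the reverse containment.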

  30. Proving Properties: Characterization • Proving the equivalence of semantics (done) • Current work: proving the validity of our axiomatization: • Soundness • Completeness • Finding relevant induction schemes • Mutual induction (duality between paths and qualifiers) • Induction on a measure of the term complexity • Finding generic and modular Coq tactics (to reduce combinatorial issues)

  31. Methodology and Possible outcomes [Flowchart; recoverable labels: Inductive Relation Ple; Not Sound / Sound; Fix wrong rules; Incomplete / Complete / Intrinsically Incomplete; Add missing rules; Decidable / Undecidable; Algorithm; why?; Extend the fragment]

  32. Conclusion • We proposed a Logic based framework for static analysis of XPath • Modelling with inductive constructions (XPath terms and interpretations, Containment Relation) • Preliminary result: a simpler semantics • Ongoing Work on Characterization

  33. Backup slides • Applications

  34. Some Applications (1) • Optimization of XPath queries • Detecting contradictions (p ≤ void) • Eliminating redundancies • Example: • //a[*/b/c and descendant::b] /descendant::a[*/b/c] • */b/c => descendant::b • An optimization not currently achieved at runtime by XPath engines, e.g. Xalan C++
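
The redundancy elimination in this example amounts to dropping a conjunct that is implied by another one. A minimal Python sketch follows, assuming the implication facts come from some external containment checker. The names `drop_redundant`, `KNOWN_IMPLICATIONS`, and `implies` are hypothetical, and qualifiers are kept as opaque strings.

```python
def drop_redundant(conjuncts, implies):
    """Remove every conjunct that is implied by another remaining conjunct.

    `implies(q1, q2)` is an oracle answering whether q1 => q2 is known.
    """
    kept = list(conjuncts)
    i = 0
    while i < len(kept):
        others = kept[:i] + kept[i + 1:]
        if any(implies(other, kept[i]) for other in others):
            del kept[i]        # kept[i] adds nothing to the conjunction
        else:
            i += 1
    return kept


# The single fact used on the slide: */b/c => descendant::b.
KNOWN_IMPLICATIONS = {("*/b/c", "descendant::b")}


def implies(q1, q2):
    """Oracle backed by a fixed table of known implications."""
    return (q1, q2) in KNOWN_IMPLICATIONS


simplified = drop_redundant(["*/b/c", "descendant::b"], implies)
# simplified keeps only "*/b/c", so //a[*/b/c and descendant::b]
# can be shortened to //a[*/b/c]
```

Because a conjunct is only removed while the conjunct that implies it is still present, two mutually implied conjuncts are never both dropped.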

  35. Some Applications (2) • Static Analysis of XPath host languages • Example: XSLT • Checking XSLT stylesheets • Optimization of XSLT stylesheets • Extending XPath expressive power with an inclusion constraint: p[p1 ⊆ p2] • Integrity Constraint-Checking • id(//book/@authors) ⊆ //persons/@name • Transformation languages strongly based on XPath
