
Time to Leave the Trees: From Syntactic to Conceptual Querying of XML



  1. Time to Leave the Trees: From Syntactic to Conceptual Querying of XML • Bertram Ludäscher, Ilkay Altintas, Amarnath Gupta • San Diego Supercomputer Center, U.C. San Diego • XMLDM'02, Prague

  2. Overview • Motivating Example: • querying XML w/o and w/ conceptual-level information • “syntactic” vs. “conceptual” querying of XML • Distilling conceptual-level information: • MXS (Abstract Model of XML Schema) • XPathT: • Incorporating conceptual-level information in XPath

  3. Motivating Example • Example: “Books DB” (yes, more complex examples exist... ;) • elements: <myDB> ... <book> .... <price> .... <author> ... • Sample Queries: • Q1: Which <book>s have a <price> below $80? • Q2: What’s the count and average <price> of <book>s? • (Nice) Try: • Q1: myDB//book[price<80] • Q2: N := count(myDB//book); S := sum(myDB//book/price); Avg := S/N; • But what about ... • ... <book>s with multiple <price>s? • ... <awe> (award-winning-exemplars) elements (= subtype of book having subelement <award>): we forgot those!
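The two pitfalls on this slide can be reproduced with a minimal sketch. The instance document, titles, and prices below are invented for illustration; only the element names come from the slide. Python's `xml.etree.ElementTree` supports just a subset of XPath without numeric comparisons, so the `price<80` test is done in Python, mimicking XPath's existential semantics.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: element names follow the slide, values are invented.
DOC = """
<myDB>
  <book><title>A</title><price>50</price></book>
  <book><title>B</title><price>120</price><price>70</price></book>
  <awe><title>C</title><price>30</price><award>X</award></awe>
</myDB>
"""

root = ET.fromstring(DOC)

def prices(elem):
    """Return all <price> children of elem as floats."""
    return [float(p.text) for p in elem.findall("price")]

# Q1, syntactic: only elements literally tagged <book> are considered.
books = root.findall(".//book")
q1 = [b.findtext("title") for b in books if any(p < 80 for p in prices(b))]

# Q2, syntactic: count of books, sum over all of their prices.
n = len(books)
s = sum(p for b in books for p in prices(b))
avg = s / n

print(q1, n, avg)
```

In this sketch the `<awe>` element never shows up in either answer, and the book with two prices contributes twice to the sum but once to the count, so the "average" is not a per-book average.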

  4. Schema Information to the Rescue! • XML & Semistructured Data Model: • labeled ordered trees • “instance contains its own schema information” • XML instances and DTDs have very little “schema info”: • tag names (aka element “types”) = attribute names • element nesting = object (“slot”) structure • no data types, constraints, classes, class hierarchy, ... • Schemas are Good for You! • link to conceptual models/DB design, query formulation, • validation, storage layout (optimization), • query processing (optimization), ... • XML Schema

  5. Motivating Example (Cont’d) • Q1 after studying <myDB> and/or its XML Schema: • there is a type hierarchy below type bookT • tag names are bound to those types • but XPath doesn’t know this => use Syntactic Queries: //*[book OR tbook OR cbook OR ... OR awe][price<80] • tedious and error-prone (do-it-yourself: Appendix A) • e.g. you overlooked <publication xsi:type="bookT">! (usually schema info is not contained in the XML instance) • small changes in the schema (adding a new subtype) require rewriting of your query...

  6. From Syntactic to Conceptual XML Queries • 1. Distill conceptual information from the XML Schema • Abstract Model of XML Schema (MXS) • 2. Incorporate MXS information into the query language • XPathT (“XPath with types/classes”) • turn Syntactic XML Query //*[book OR tbook OR cbook OR ... OR awe][price<80] • into a more adequate Conceptual XML Query: //*[ts(bookT)][price<80] /* works for any subtype of bookT */ • more robust w.r.t. schema changes • new opportunities for semantic query optimization
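The robustness claim can be illustrated with a small sketch that expands a conceptual query into a syntactic one. Everything here is an assumption made for illustration: the function names, the `(supertype, subtype)` and `(type, element)` pair layouts, and the type names other than `bookT`. The slide writes the tag test informally as `book OR tbook`; the sketch emits the XPath 1.0 form `self::book or self::tbook`.

```python
def expand_ts(type_name, subtype, bind):
    """Expand ts(type_name) into an XPath 1.0 disjunction over element names."""
    types = {type_name}
    changed = True
    while changed:  # naive fixpoint: collect all transitive subtypes
        changed = False
        for sup, sub in subtype:
            if sup in types and sub not in types:
                types.add(sub)
                changed = True
    tags = sorted(e for t, e in bind if t in types)
    return " or ".join(f"self::{tag}" for tag in tags)

def to_syntactic(type_name, qualifier, subtype, bind):
    """Rewrite //*[ts(T)][q] into a plain XPath string (hypothetical helper)."""
    return f"//*[{expand_ts(type_name, subtype, bind)}][{qualifier}]"

# Hypothetical schema facts: (supertype, subtype) and (type, element name).
subtype = {("bookT", "textBookT"), ("bookT", "aweT")}
bind = {("bookT", "book"), ("textBookT", "tbook"), ("aweT", "awe")}
before = to_syntactic("bookT", "price<80", subtype, bind)

# A schema change adds a new subtype; the conceptual query stays the same.
subtype2 = subtype | {("bookT", "cBookT")}
bind2 = bind | {("cBookT", "cbook")}
after = to_syntactic("bookT", "price<80", subtype2, bind2)

print(before)
print(after)
```

Only the schema facts change between the two calls; the conceptual query `ts(bookT)` with qualifier `price<80` is passed in unchanged, and the system rather than the query author regenerates the syntactic form.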

  7. Abstract Model of XML Schema (MXS) • Basic Ideas: • Formal abstract model (never mind the XML Schema syntax!), inspired by Model Schema Language (MSL) [Brown-Fuchs-Robie-Wadler-WWW10-2001] • “Types as Classes” • XML Schema Names: • T: Type Names • E: Element Names • A: Attribute Names • XML Instances... • ... usually contain only element names (tags) E and attributes A (exception: “xsi:type = ...”)

  8. Abstract Model of XML Schema (MXS) • MXS Names • T: Types, E: Elements, A: Attributes • Kinds of Types • simple vs. complex: T_s, T_c • abstract vs. concrete: T_a, T_na • Type Hierarchy • restrict ⊆ (T_s × T_s) ∪ (T_c × T_c) • restricts possible instances, keeping structure • extend ⊆ (T_s ∪ T_c) × T_c • adds “slots” (elements and attributes) • subtype = extend ∪ restrict • extend and restrict are subtyping mechanisms
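The type hierarchy can be sketched as plain sets of pairs. This is a minimal illustration, not the authors' formalization: the pair orientation `(base type, derived type)` is an assumption, and `priceT` is a hypothetical type name (the other names appear elsewhere in the slides).

```python
# Simple and complex type names; priceT is hypothetical.
T_s = {"priceT", "expPriceT"}
T_c = {"publicationT", "bookT", "expTextBookT"}

# Pairs are (base type, derived type); this orientation is an assumption.
restrict = {("priceT", "expPriceT")}
extend = {("publicationT", "bookT"), ("bookT", "expTextBookT")}

def restrict_ok(pairs):
    """restrict relates simple to simple, or complex to complex."""
    return all(
        (a in T_s and b in T_s) or (a in T_c and b in T_c) for a, b in pairs
    )

def extend_ok(pairs):
    """extend starts from any type and always yields a complex type."""
    return all(a in (T_s | T_c) and b in T_c for a, b in pairs)

# subtype is the union of the two derivation mechanisms.
subtype = extend | restrict

print(restrict_ok(restrict), extend_ok(extend), sorted(subtype))
```

The two checks encode the typing constraints stated on the slide: restriction keeps the kind of a type, whereas extension adds slots and therefore always produces a complex type.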

  9. Type (Class) Hierarchy in XML Schema • Convention: user-defined type names end with “T” • authorT, publicationT, bookT, ...

  10. Inheritance in XML Schema (I) • expTextBookT ::= SUBTYPE(bookT) that RESTRICTs <price> to expPriceT and EXTENDs with <recommended_for> • [Figure: type diagram with EXTEND, SUBTYPE, and RESTRICT edges]

  11. Inheritance in XML Schema (II) • 19thcenturyTextBookType ::= SUBTYPE{textBookT, c19bookT} • The XML Schema type system does not know the two are equivalent! • [Figure: multiple inheritance vs. single inheritance]

  12. Framework for Conceptual Queries in XML • Binding Types to Elements • bind ⊆ (E × (T_s ∪ T_c)) ∪ (A × T_s) • binds element names to simple or complex types • binds attribute names to simple types • Syntactic XML Instance: D • root(NodeId), child(NodeId,Integer,NodeId), tag(NodeId,Tagname), data(NodeId,Data) • Conceptual XML Instance: D+ • restrict(T, T), extend(T, T), subtype(T, T), • bind(E  T, T) • ...
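The syntactic instance D can be sketched by flattening a document into the four relations named on this slide. The node-id scheme (pre-order integers starting at 1), the 1-based child positions, and the function name `to_facts` are assumptions for illustration; attributes and mixed content are ignored.

```python
import xml.etree.ElementTree as ET
from itertools import count

def to_facts(xml_text):
    """Flatten an XML document into root/child/tag/data relations."""
    ids = count(1)  # assumed node-id scheme: pre-order integers
    facts = {"root": [], "child": [], "tag": [], "data": []}

    def visit(elem):
        nid = next(ids)
        facts["tag"].append((nid, elem.tag))
        text = (elem.text or "").strip()
        if text:
            facts["data"].append((nid, text))
        for pos, sub in enumerate(elem, start=1):
            cid = visit(sub)
            facts["child"].append((nid, pos, cid))  # child(parent, position, child)
        return nid

    facts["root"].append(visit(ET.fromstring(xml_text)))
    return facts

facts = to_facts("<myDB><book><price>50</price></book></myDB>")
print(facts)
```

The conceptual instance D+ would add the schema-level relations (restrict, extend, subtype, bind) next to these facts, which is what lets a query refer to types instead of tags.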

  13. XPathT: Incorporating Type (Class) Information in XPath • XPath patterns p and qualifiers q: p[q] returns matches of p which qualify according to q • New XPathT patterns: • r(t), e(t), s(t): restrict, extend, subtype type t • tr(t), te(t), ts(t): transitive versions

  14. Semantics of XPathT • Example: “transitive subtype”: SEM( ts(t) ) := { t’ | subtype*(t,t’) } • from types to element names: SEM( [T] ) := { e | bind(t,e), t ∈ T } • SEM( [ts(bookT)] ) := {book, ebook, tbook, ...}
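The two SEM definitions translate almost directly into code. In this sketch the sample facts and the type names other than `bookT` are hypothetical, `bind` pairs are laid out as `(type, element)` following the `bind(t,e)` notation on this slide, and `subtype*` is assumed to be the reflexive-transitive closure (which is what makes `book` itself part of the result).

```python
# Hypothetical schema facts: (supertype, subtype) and (type, element name).
subtype = {
    ("publicationT", "bookT"),
    ("bookT", "textBookT"),
    ("bookT", "eBookT"),
}
bind = {
    ("publicationT", "publication"),
    ("bookT", "book"),
    ("textBookT", "tbook"),
    ("eBookT", "ebook"),
}

def sem_ts(t):
    """SEM(ts(t)): all t' with subtype*(t, t'), assuming a reflexive closure."""
    seen = {t}
    frontier = [t]
    while frontier:
        cur = frontier.pop()
        for sup, sub in subtype:
            if sup == cur and sub not in seen:
                seen.add(sub)
                frontier.append(sub)
    return seen

def sem_names(types):
    """SEM([T]): element names bound to any type in T."""
    return {e for t, e in bind if t in types}

result = sem_names(sem_ts("bookT"))
print(sorted(result))
```

Note that `publication` is not in the result: `publicationT` is a supertype of `bookT`, and the closure only walks downward.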

  15. Conceptual(-level) XML Queries in XPathT • Which books have price below $80? //*[ts(bookT)][price<80] (annotations: [ts(bookT)] = conceptual information, [price<80] = tree structure information) • Semantic-aware equivalent rewriting: //*[ts(bookT)][NOT ts(expTextBookT)][price<80] • Logic XPathT Query Plan: [Figure: query plan]
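A rough sketch of why the rewriting can be equivalent: if the schema guarantees that every expensive textbook has a price of at least $80 (an assumption made here; the slides only say that `expTextBookT` restricts `<price>` to `expPriceT`), those elements can be pruned before any price is inspected. The tag sets, the tag name `exptbook`, and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance; exptbook prices are assumed to be >= 80 by schema.
DOC = """
<myDB>
  <book><title>A</title><price>50</price></book>
  <tbook><title>B</title><price>60</price></tbook>
  <exptbook><title>C</title><price>150</price></exptbook>
  <journal><title>D</title><price>20</price></journal>
</myDB>
"""

BOOK_TAGS = {"book", "tbook", "exptbook"}  # assumed SEM([ts(bookT)])
EXP_TAGS = {"exptbook"}                    # assumed SEM([ts(expTextBookT)])

def query(root, include, exclude=frozenset(), limit=80.0):
    """Titles of elements with an included tag and some price below limit."""
    hits = []
    for elem in root.iter():
        if elem.tag not in include or elem.tag in exclude:
            continue  # pruned by tag alone, without looking at prices
        if any(float(p.text) < limit for p in elem.findall("price")):
            hits.append(elem.findtext("title"))
    return hits

root = ET.fromstring(DOC)
plain = query(root, BOOK_TAGS)             # //*[ts(bookT)][price<80]
pruned = query(root, BOOK_TAGS, EXP_TAGS)  # ... with [NOT ts(expTextBookT)]
print(plain, pruned)
```

Both calls return the same titles on this instance, but the second never reads the prices of `exptbook` elements; the equivalence holds only as long as the assumed price constraint does.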

  16. Summary • Complex domains require conceptual-level modeling and querying capabilities beyond just tree structure • Status Quo: XML Schema: simple “conceptual model” with many ad-hoc “design decisions”/restrictions • Abstract Model of XML Schema (MXS) • XPathT: first step towards “conceptual” or “semantic” XML query language extensions • more concise, intuitive, flexible, and robust queries • the system maps conceptual to syntactic queries, not the programmer/query designer!

  17. Next Steps & Outlook • extend MXS to include more conceptual information • develop formal semantics • XPathT, extensions: XPathC, XQueryC • research problems: • mapping: XPathC queries => equivalent XPath queries • formalize equivalence, always possible? Then, conventional XML query processors can be used! • “proxy XML Schema doc”: instead of rewriting into XPath over the original instance, can one materialize some conceptual info as a “proxy XML doc” such that conceptual queries become conventional queries against the proxy... • semantic query optimization: equivalent rewritings given the conceptual level constraints
