Query Processing with XML

Query Processing with XML. CSE 350 – Advanced Database Topics Jeffrey R. Ellis. Query Processing Topics. Why? Java and Other Programming Languages XPath/XSLT XQuery (W3C-sponsored Query Language) Current Research Other Query Languages XISS (XML Indexing and Storage System).


Presentation Transcript


  1. Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis

  2. Query Processing Topics • Why? • Java and Other Programming Languages • XPath/XSLT • XQuery (W3C-sponsored Query Language) • Current Research • Other Query Languages • XISS (XML Indexing and Storage System)

  3. FIRST – Distinction between XML and HTML/Web Technologies • XML spotlight is analogous to Java • Immediate benefits applied to World Wide Web • Long-range, more exciting benefits in applications • XML IS NOT AN HTML REPLACEMENT • HTML marks pages up for presentation on the web • XML marks text for semantic information purposes • XML can encode HTML pages, but HTML works well on the Web

  4. XML Data Storage • XML Documents • Data is delineated semantically • Schemas/DTDs control contents of elements • Semi-structured attitude allows flexibility • Text is human-readable and machine-parsable • Open standards work with common tools • File data storage allows for easy sharing • Can queries control access to data?

  5. Traditional Database Storage • Databases • Data is delineated semantically • Schemas control contents of rows • No flexibility from semi-structured storage • Data is not human-readable, but only machine-parsable • Proprietary standards prevent interoperability • Proprietary storage prevents data sharing • Queries control access to data

  6. XML for Query Processing • If we can get efficient query processing, XML document storage provides many benefits over traditional database storage. • Sample application • Employee database document • XML Schema assumed to exist • Employee information queried as per standard HR processing

  7. <?xml version="1.0"?>
     <!DOCTYPE employees SYSTEM "employee.xsd">
     <employees>
       <emp gender='m'>
         <name>
           <last>Bissell</last>
           <first>Brian</first>
         </name>
         <position>IT Specialist</position>
         <salary>35,000</salary>
         <location>CT</location>
       </emp>
       <emp gender='m'>
         <name>
           <last>Pham</last>
           <first>Hung</first>
           <mi>Q</mi>
         </name>
         <position>Senior IT Specialist</position>
         <salary>45,000</salary>
         <location>CT</location>
       </emp>
       …
     </employees>

  8. Tree Structure of XML Document • Remember that XML documents are trees
     emp
       gender
       name
         last
         first
         mi
       position
       salary
       location

  9. Query Processing – Programming Languages • XML Documents are flat files • Any language with file I/O can read XML document • Any language with string parsing capabilities can use XML data • Query processing done through language syntax • “Obvious” result different from traditional databases

  10. Query Processing – Programming Languages • Strategy • Basic File I/O through language • Basic String matching to identify elements • Processing possible, but not necessarily efficient • Languages have gathered XML processing tools in libraries • xerces – Apache library for Java and C++ • Two methods for parsing XML data • DOM • SAX

  11. DOM • Document Object Model • Defined by W3C for XML, HTML, and stylesheets • Provides a hierarchical object view of the document • DOMParser parses through file, then provides access to nodes • Key: Every item in XML document is a node

  12. DOM Example
      Node (Element) name="emp"
        attribute1: Node (Attr) name="gender" value="m" (parent: emp)
        child1: Node (Element) name="name" (parent: emp)
          child1: Node (Element) name="last" (parent: name)
            child1: Node (Text) value="Bissell" (parent: last)
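The node structure on this slide can be walked with any DOM implementation. The sketch below uses Python's standard-library xml.dom.minidom rather than the xerces DOMParser the slides mention, and trims the sample document to a single employee; both choices are assumptions made for illustration.

```python
from xml.dom.minidom import parseString

# One <emp> record taken from the sample document on slide 7 (trimmed).
XML = """<employees>
  <emp gender="m">
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
  </emp>
</employees>"""

doc = parseString(XML)

emp = doc.getElementsByTagName("emp")[0]       # Node (Element), name "emp"
gender = emp.getAttributeNode("gender")        # Node (Attr), name "gender"
name = emp.getElementsByTagName("name")[0]     # Node (Element), name "name"
last = name.getElementsByTagName("last")[0]    # Node (Element), name "last"
text = last.firstChild                         # Node (Text) holding the value

# Every item is a node, and each child keeps a link back to its parent.
print(emp.nodeName, gender.name, gender.value)
print(last.nodeName, text.data)
print(text.parentNode is last, last.parentNode is name, name.parentNode is emp)
```

Because the whole tree is held in memory, any node can be revisited later without re-reading the file.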

  13. SAX • Simple API for XML • Defined by XML-DEV mailing list • Provides an event-driven processing of the document • XMLReader parses through file and activates different methods and functions based on the elements retrieved • Key: Methods are defined in interface, implemented in user code
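As a rough illustration of the event-driven style, the sketch below uses Python's standard-library xml.sax module, which follows the same pattern of a reader invoking handler methods written by the user. The handler class name and the choice to collect last names are assumptions made for this example.

```python
import xml.sax

# Two records from the sample document on slide 7 (trimmed).
XML = """<employees>
  <emp gender="m">
    <name><last>Bissell</last><first>Brian</first></name>
  </emp>
  <emp gender="m">
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
  </emp>
</employees>"""


class LastNameHandler(xml.sax.ContentHandler):
    """Collects the text of every <last> element as the parser streams by."""

    def __init__(self):
        super().__init__()
        self.in_last = False
        self.buffer = []
        self.last_names = []

    def startElement(self, name, attrs):
        # Called by the parser each time an opening tag is read.
        if name == "last":
            self.in_last = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_last:
            self.buffer.append(content)

    def endElement(self, name):
        # Called when the closing tag is read; the element is now complete.
        if name == "last":
            self.last_names.append("".join(self.buffer))
            self.in_last = False


handler = LastNameHandler()
xml.sax.parseString(XML.encode("utf-8"), handler)
print(handler.last_names)
```

Nothing is kept in memory beyond what the handler chooses to store, which is the main contrast with DOM.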

  14. DOM versus SAX • SAX is primarily Java-based; DOM defined for most languages • DOM requires storage of entire document in memory; SAX processes as it reads • DOM mirrors a document that can be revisited; suited for document processing • SAX mirrors object lifecycles; suited for data processing

  15. Query Processing - XPath/XSLT • Standard XML technologies XPath and XSLT provide a ready-made querying infrastructure • XPath identifies the location of various document elements • XSL Stylesheets provide methods for transforming data from one format to another • Combining XPath and XSLT provides easy generation of result sets based on queries

  16. XPath • Provides element, value, and attribute identification
      employees/emp/name/first = "Brian", "Hung", "Sara", "Brian"
      //salary = "35,000", "40,000", "35,000", "60,000"
      count(/employees/emp) = 4
      //mi = "Q"
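These path expressions can be tried with Python's standard-library xml.etree.ElementTree, which supports only a subset of XPath (it has no count() function, for instance, so the count is taken with len()). The sketch assumes just the two employees printed in full on slide 7, so it returns fewer values than the four-employee results listed on this slide.

```python
import xml.etree.ElementTree as ET

# The two records shown in full on slide 7; the rest of the file is elided there.
XML = """<employees>
  <emp gender="m">
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
    <salary>35,000</salary>
    <location>CT</location>
  </emp>
  <emp gender="m">
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
    <salary>45,000</salary>
    <location>CT</location>
  </emp>
</employees>"""

root = ET.fromstring(XML)  # root is the <employees> element

# employees/emp/name/first, written relative to the root element
first_names = [e.text for e in root.findall("emp/name/first")]

# //salary: every salary element at any depth
salaries = [e.text for e in root.findall(".//salary")]

# count(/employees/emp)
emp_count = len(root.findall("emp"))

# //mi: only employees that record a middle initial contribute a value
middle_initials = [e.text for e in root.findall(".//mi")]

print(first_names, salaries, emp_count, middle_initials)
```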

  17. XSLT • Stylesheet transforms data from one form into another
      <xsl:template match="name">
        <xsl:value-of select="first"/>
        <xsl:value-of select="last"/>
      </xsl:template>
      = Brian Bissell, Hung Pham, Sara Menillo, Brian Chicos

  18. Combine XPath and XSLT for Queries • Query: Find the last name and position of each employee named Brian
      <xsl:template match='employees'>
        <xsl:for-each select='emp'>
          <xsl:if test='name/first="Brian"'>
            <xsl:value-of select='name/last'/>
            <xsl:text>:</xsl:text>
            <xsl:value-of select='position'/>
            <xsl:text>; </xsl:text>
          </xsl:if>
        </xsl:for-each>
      </xsl:template>
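For comparison with the programming-language approach from earlier slides, the same query can be written as a plain loop. This is a sketch in Python with xml.etree.ElementTree, not a translation an XSLT processor would produce; it assumes only the two employees printed on slide 7.

```python
import xml.etree.ElementTree as ET

# The two records shown in full on slide 7.
XML = """<employees>
  <emp gender="m">
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
  </emp>
  <emp gender="m">
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
  </emp>
</employees>"""

root = ET.fromstring(XML)

pieces = []
for emp in root.findall("emp"):                      # xsl:for-each select='emp'
    if emp.findtext("name/first") == "Brian":        # xsl:if test='name/first="Brian"'
        last = emp.findtext("name/last")             # xsl:value-of select='name/last'
        position = emp.findtext("position")          # xsl:value-of select='position'
        pieces.append(f"{last}:{position}; ")        # the two xsl:text separators

result = "".join(pieces)
print(result)
```

The loop mirrors the stylesheet step by step, which shows how the WHERE part lives in the test and the SELECT part in the value-of elements.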

  19. Combine XPath and XSLT for Queries • Query: Find the average salary of all non-managers
      <xsl:template match='employees'>
        <xsl:variable name='running_sum'>
          <xsl:value-of select='sum(emp/salary[../position!="Manager"])'/>
        </xsl:variable>
        <xsl:variable name='running_count'>
          <xsl:value-of select='count(emp[position!="Manager"])'/>
        </xsl:variable>
        <xsl:value-of select='$running_sum div $running_count'/>
      </xsl:template>
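The arithmetic behind this query can be checked with a short Python sketch. One caveat: the sample salaries are written with thousands separators ("35,000"), and XPath 1.0 converts such a string to NaN, so the sketch strips the commas before summing. The third record, a manager, is hypothetical and is added only so the filter has something to exclude.

```python
import xml.etree.ElementTree as ET

# First two records are from slide 7; the third (a manager) is hypothetical.
XML = """<employees>
  <emp gender="m">
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
    <salary>35,000</salary>
  </emp>
  <emp gender="m">
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
    <salary>45,000</salary>
  </emp>
  <emp gender="f">
    <name><last>Example</last><first>Pat</first></name>
    <position>Manager</position>
    <salary>60,000</salary>
  </emp>
</employees>"""


def to_number(text):
    """Parse a salary such as '35,000' by dropping the thousands separator."""
    return float(text.replace(",", ""))


root = ET.fromstring(XML)

# emp[position != "Manager"]
non_managers = [e for e in root.findall("emp")
                if e.findtext("position") != "Manager"]

running_sum = sum(to_number(e.findtext("salary")) for e in non_managers)
running_count = len(non_managers)
average = running_sum / running_count

print(running_sum, running_count, average)
```

With these three records the sum is 80000 over two non-managers, giving an average of 40000.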

  20. Results XSLT/XPath • Many SQL queries can be accomplished • XPath provides element (data) access • XPath provides basic functions (e.g., sum() ) • XPath provides WHERE functionality • XSLT provides SELECT functionality • XSLT provides ORDER BY functionality (sort) • XSLT provides result set formatting • UNION functionality provided ..?

  21. Querying with XPath and XSLT • Important questions • Is it sufficient? • Is it efficient? • Is there a better way? • XML community has need to design a full query language • XQuery – Working draft published 7 June 2001

  22. Query Processing - XQuery • XML provides flexibility in representing many kinds of information • Good query language must be likewise flexible • Pre-XQuery languages are good for specific types of data • Goal: “[S]mall, easily implementable language in which queries are concise and easily understood.”

  23. XQuery Forms • Path expressions • Element constructors • FLWR expressions • Operator/Function expressions • Conditional expressions • Quantified expressions • Data Type expressions

  24. XQuery – Path Expressions • Contribution of XPath • XQuery 1.0 and XPath 2.0 Data Model
      document("sample1.xml")//emp/salary
      /employees/emp/name[../@gender='f']
      //emp[1 TO 3]/name/first

  25. XQuery – Element Constructors • Queries can generate new elements • Similar to XSLT abilities <worker> {$name/last} {$position} </worker>

  26. XQuery – FLWR Expressions • For clause/Let clause/Where clause/Return • Similar to SQL
      FOR $e IN document("sample1.xml")//emp
      WHERE $e/salary > 38000 AND $e/@gender = 'f'
      RETURN $e/name
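The FOR/WHERE/RETURN shape maps closely onto a list comprehension. The sketch below is a Python analogy, not XQuery: it assumes the data is already parsed with xml.etree.ElementTree, strips the commas from salaries so they compare as numbers, and adds one hypothetical female employee so the WHERE clause has a match.

```python
import xml.etree.ElementTree as ET

# First two records are from slide 7; the third is hypothetical.
XML = """<employees>
  <emp gender="m">
    <name><last>Bissell</last><first>Brian</first></name>
    <salary>35,000</salary>
  </emp>
  <emp gender="m">
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <salary>45,000</salary>
  </emp>
  <emp gender="f">
    <name><last>Doe</last><first>Jane</first></name>
    <salary>50,000</salary>
  </emp>
</employees>"""


def salary_of(emp):
    """Numeric salary of an <emp> element, ignoring the thousands separator."""
    return float(emp.findtext("salary").replace(",", ""))


root = ET.fromstring(XML)

names = [
    emp.find("name")                                       # RETURN $e/name
    for emp in root.iter("emp")                            # FOR $e IN ...//emp
    if salary_of(emp) > 38000 and emp.get("gender") == "f" # WHERE ...
]

full_names = [(n.findtext("first"), n.findtext("last")) for n in names]
print(full_names)
```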

  27. XQuery – Operator/Function Expressions • Pre-defined and user-defined operators and functions • Still under development: Union, Intersect, Except
      FOR $e IN //employees/emp
      WHERE not(empty($e//mi))
      RETURN $e/name

  28. XQuery – Conditional Expressions • If-then-else expressions are not yet limited to boolean (ongoing discussion)
      FOR $e IN /employees/emp
      RETURN
        <worker>
          {$name}
          IF ($e/position="Manager") THEN <manager />
        </worker>

  29. Quantified Expressions • Some/Every conditions • Some/Every evaluates to True or False
      FOR $e IN //employees
      WHERE SOME $p IN $e//emp/position SATISFIES $p = "Manager"
      RETURN $e
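SOME and EVERY behave like existential and universal tests over a sequence, which Python expresses with any() and all(). The sketch below is an analogy over the two employees from slide 7; the particular conditions tested are chosen for illustration.

```python
import xml.etree.ElementTree as ET

# The two records shown in full on slide 7 (trimmed to name and position).
XML = """<employees>
  <emp gender="m">
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
  </emp>
  <emp gender="m">
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
  </emp>
</employees>"""

root = ET.fromstring(XML)
positions = [p.text for p in root.findall("emp/position")]

# SOME: true if at least one position satisfies the condition.
some_manager = any(p == "Manager" for p in positions)
some_senior = any(p.startswith("Senior") for p in positions)

# EVERY: true only if all positions satisfy the condition.
every_senior = all(p.startswith("Senior") for p in positions)
every_specialist = all(p.endswith("IT Specialist") for p in positions)

print(some_manager, some_senior, every_senior, every_specialist)
```

Each test yields a single boolean, which is why a quantified expression can sit directly in a WHERE clause.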

  30. Data Types • Data Types based on those available from XML Schema • Data types can be literal (“Brian”), from constructor functions (date(“2001-10-11”) ), or from casting ( CAST AS xsd:integer(24) ) • User-defined data types are also allowable and parsable

  31. XQuery • More choices than XSLT/XPath combination • Work in progress • Current W3C efforts into query language • Influencing the future design of the core XML technologies (XPath) • Hopes to be fully flexible for all future XML applications

  32. Query Processing – Research • XQuery specification continues to undergo review and change • 6 of 7 specification documents released since June • All specifications released in 2001 • Other avenues of research • Other Query languages • Indexing strategies • Implementation

  33. Query Processing – Other Query Languages • Many query languages exist • Quilt (basis for XQuery) • W3C early languages (XML-QL, XQL) • Adopted traditional languages (OQL, XSQL) • Research papers (XML-GL, YATL, Lorel) • Other query languages often optimized for a particular subset of XML documents • Query language field *MAY* be standardizing to XQuery

  34. Query Processing – Indexing Strategy • Query language less important; better indexing techniques lead to efficiency • XISS (XML Indexing and Storage System) • Published September 19, 2001 • Builds sets of indexes on XML data elements and attributes on initial parse of XML document • Lookup becomes constant-time through the various built indexes • Demonstrated successes in test runs
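The general idea of building indexes during the initial parse can be sketched with ordinary hash maps. This is a deliberately simplified illustration and not XISS's actual index design; the function name build_indexes and the choice of keys (element name, and attribute name plus value) are assumptions.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# The two records shown in full on slide 7 (trimmed).
XML = """<employees>
  <emp gender="m">
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
  </emp>
  <emp gender="m">
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
  </emp>
</employees>"""


def build_indexes(root):
    """One pass over the tree, filling an element index and an attribute index."""
    by_element = defaultdict(list)     # element name -> elements in document order
    by_attribute = defaultdict(list)   # (attribute name, value) -> owning elements
    for elem in root.iter():
        by_element[elem.tag].append(elem)
        for attr_name, attr_value in elem.attrib.items():
            by_attribute[(attr_name, attr_value)].append(elem)
    return by_element, by_attribute


root = ET.fromstring(XML)
by_element, by_attribute = build_indexes(root)

# Later lookups are dictionary accesses instead of tree traversals.
last_names = [e.text for e in by_element["last"]]
male_employees = by_attribute[("gender", "m")]

print(last_names, len(male_employees))
```

The cost of the traversal is paid once at parse time; each later lookup by name or attribute is an average constant-time dictionary access.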

  35. Query Processing - Implementation • XML is currently in state of flux • Standards are still being revised • Industry cautious before embracing a new technology • Economic slowdown may prevent new research and development efforts • XML still waiting for its “Killer App”: an application that forces immediate acceptance

  36. XML Query Processing • XML is a functional database storage language • Efficient query language needed to turn XML into a viable database • Query language solutions are being developed • Java/C++ hooks first developed – OK • XSLT/XPath implemented – GOOD • XQuery being designed – GREAT? • Future additions – ????
