1 / 45

Gregg Lippa Themis Inc. themisinc glippa@themisinc

DB2 Version 9: Overview of pureXML. Plus an XPath Primer, XML Column Indexes, Support for Validation. Gregg Lippa Themis Inc. http://www.themisinc.com glippa@themisinc.com.

Download Presentation

Gregg Lippa Themis Inc. themisinc glippa@themisinc

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. DB2 Version 9:Overview of pureXML Plus an XPath Primer,XML Column Indexes, Support for Validation Gregg Lippa Themis Inc. http://www.themisinc.com glippa@themisinc.com

  2. Gregg Lippa is currently a Senior Technical Advisor at Themis Inc. He teaches DB2-related courses on SQL, Application Programming, and optimization, performance and tuning as well as Java, J2EE, distributed computing and Java-related topics. Gregg has worked with DB2 as a consultant and trainer for over 20 years and with Java for 10 years. This material is taken from the Themis course DB1091: DB2 9 for z/OS pureXML Features. For more information visit http://www.themisinc.com. Products and company names mentioned may be trademarks of their respective companies. Mention of third party products or software is for reference only and constitutes neither a recommendation nor an endorsement.

  3. DB2 Version 9 • pureXML technology • Seamlessly integrating XML with relational data • Including publishing and parsing functions • Storing XML data natively in a database table • Including a new XML data type

  4. An XML Document <patient id="13579"> <name>John Doe</name> <addr> <street>123 Main</street> <city>Anytown</city> <state>PA</state> <zip>19134</zip> </addr> <phone type="home">610-654-1234</phone> <phone type="work">610-987-4321</phone> <email>jdoe@mymail.com</email> <service> <sdate>2008-09-22</sdate> <reason>flu shot</reason> <descrip>gave flu shot</descrip> <cost>40.00</cost> <copay>10.00</copay> </service> <service> <sdate>2008-05-11</sdate> <reason>sore foot</reason> <descrip>referred to specialist</descrip> <cost>60.00</cost> <copay>10.00</copay> </service> </patient> Root Nodes Elements Attributes Values (text) Well-formed Valid

  5. XML – Why? • Pervasive • Versatile • Self-describing • Neutral for exchanging data among diverse devices • Universal standard for data interchange • Growth of XML data • XML-based industry and data standards • SOA and Web services; Services-based frameworks • messages are encapsulated as XML • Web 2.0 technologies • XML feeds • Syndication services - rendered as XML files • XML data becoming more critical to enterprise operations

  6. Previous Approaches to XML Document Storage • File systems • Not storing XML documents in a database • Does not scale well • Stuffing • XML data stored as large objects or VARCHAR • Inefficient for querying the XML data • Shredding • Decomposing XML data into multiple columns & tables • Often leads to complex join requirements • May be difficult to recreate original XML document • Utilizing XML-only database systems • Few options and little expertise available

  7. XML vs Relational Model • Major differences: • XML data is hierarchical; relational data is tabular • XML data is self-describing; relational data is not • XML data is ordered; relational data is not • Which approach is right for my data? • XML maximizesflexibility of the data structure • Relational data provides performance benefits for data retrieval • Relational data supports referential integrity requirements • Referential constraints cannot be based on XML columns • Data warehouses are oriented toward relational data

  8. pureXML Capabilities • XML data type • With specialized hierarchical storage structure • Indexing capabilities • Based on data within XML documents • New query languages (XPath and SQL/XML) • New query optimization techniques too • XML schema support • Including validation • Database utilities support • Integration with JDBC, ODBC, Embedded SQL • XML shredding and publishing facilities • For composing and decomposing XML documents

  9. Benefits of DB2 pureXML Technology • Faster development • Code simplification • Avoiding XML-relational transformations • Increased agility • Versatile XML schema evolution • Quickly modify applications to support new or changing requirements • Improved usability • Exploit previously unmanaged XML data • Speed up query processing through XML-optimized storage and indexing

  10. pureXML Usage Scenarios • Integration ofdiverse data sources • DB2 provides the ability to join XML documents • Formprocessing • Store whole electronic forms in DB2 rather than shredding • Document storage and querying • Store and manage less structured (document-centric) XML data in DB2 • Using XML to support transactions • Service-oriented architectures (SOA) offer message-based transactions • XML data can then be retrieved, updated, searched and analyzed • Syndication and XML feeds • Serve XML feeds through a Web services interface • Provide a repository for XML data to support these feeds

  11. DB2 Support for XML 0 – XML Storage 1 – Bind in XML 2 – Store as XML 3 – Shred into Relational 4 – Retrieve XML data 5 – Publish as XML 6 – Bind XML output 7 – XML to XML 8 – XML to Relational 9 – Relational to XML DB2 ENGINE 7 XML 0 2 1 Applicationthat makesuse of XML 4 Textual XML 3 8 9 6 5 Relational XSR (validation)

  12. pureXML Architecture • DB2 9 – a hybrid database system • One database with both relational and native XML data • Single hybrid database enginehandles all processing • Application may combine SQL and SQL/XML • Access relational and XML data in a hybrid database • XML data is stored separately from other table contents • Has its own table space • Supports XML document validation with XML schemas • XML schemas used for validation are registered with DB2 • XML Schema Repository (XSR)

  13. XML Data Type • DB2 native XML support includes a new XML data type • An XML column holds one XML document for each row • XML data is stored in a parsed tree structure • XML document can also be stuffed into LOB or VARCHAR • Provides advantages in certain scenarios • XML storage requirements • Separate XML tablespace plus space for any needed indexes

  14. The XML Data Type • Optimized storage • New XML data type • Supports insert, update, and delete • Stores parsed XML documents • Available when creating or altering tables CREATE TABLE PATIENT (PATIENTID CHAR(6), PATIENT_XML XML) CREATE TABLE P2 (PID INT, PTYP CHAR(8), PX1 XML, PX2 XML) • Supports access to nodes within XML document via XPath • Query optimization • CREATE INDEX supports specification of an xmlpattern

  15. XML Indexes • Indexes are often used to improve query performance • Indexes on XML columns are supported in DB2 9 • Uses an XML pattern (XPath) expression • Indexes paths and values in stored XML documents • XML index entries provide access to document nodes • Not limited to providing access to the beginning of a document • Index keys are created based on XML pattern expressions CREATE INDEX PATINDEX ON PATIENT(PATIENT_XML) GENERATE KEYUSING XMLPATTERN '/patient/service/sdate' as SQL VARCHAR(10)

  16. Application Development Support • Supports developing apps that include XML requirements • Language support: C/C++, Java, Assembler, Cobol, PL/I • API support: JDBC, DB2 / ODBC, Embedded SQL, SQLJ • SQL/XML query support • DB2 sample database enhancements • Universal DB2 driver for JDBC enhanced to support XML • Provided extension XML type: com.ibm.db2.DB2Xml import com.ibm.db2.jcc.DB2Xml; DB2Xml xml1 = (DB2Xml) rs.getObject ("patient_xml"); String s = xml1.getDB2String(); InputStream is = xml1.getDB2XMLBinaryStream("UTF-16");

  17. DB2 and XPath Navigating Through the XML Tree

  18. DB2 XPath • XPath is an expression language • Designed by the World Wide Web Consortium (W3C) • Used to navigate XML documents • XPath expressions are similar to file path notations • DB2 XPath can be used: • With the XMLQUERY SQL built-in function • To extract data from an XML column • With the XMLEXISTS SQL predicate • To evaluate data in an XML column • When creating an XML index • To determine the XML document nodes to be indexed '/patient/name'  All name elements within patient elements

  19. XPath Expressions • XPath expressions: the basic building block of Xpath • Types of expressions provided by DB2 XPath: • Primary expressions: basic primitives of the language • Include literals, variable references, and function calls • Path expressions: locate nodes within a document tree • Include Node Tests and Filter Expressions • Arithmetic expressions: add, subtract, multiply, divide, modulus • Comparison expressions: compare two values • Logical expressions: use boolean logic • Anywhere an expression is expected, any kind of expression can be used • Operands of an expression are typically other expressions

  20. XPath Data Model Key: Document node Comment node Element Node Attribute node Text node OneEmp.xml Sample comment Employee Name Phone Phone Address Dept type="home" type="cell" mgr="bob" 800-555-1234 212-321-4321 Sales LastName FirstName Street City State Zipcode Roger Rabbit 123 Main Smallville Wyoming 98765

  21. Types of XPath Nodes • Document node encapsulates an XML document • Parent of root element node • Element node encapsulates an XML element • Can have one parent and many children • Attribute node represents an XML attribute • Belongs to an element • Text node encapsulates XML character content • Elements may have these • Processing Instruction (PI) node • Encapsulates XML processing instruction • Comment node encapsulates an XML comment • Namespaces node is considered to be a node in XPath

  22. XPath Nodes Processing Order <?xml version="1.0"?> <th:course xmlns:th="http://themis.com/test" th:format="instructor led"> <th:title>XML for DB2</th:title> <th:descrip>Lots of information <!-- To be determined --> </th:descrip> <?ourOwnPIforCourseware?> </th:course> Document node 1 Element node 2,5,7 Comment node 9 Processing Instruction 10 Text node 6,8 Attribute node 4 Namespace node 3 1 course 2 title descrip 7 5 10 3 4 th="http://themis.com/test" name=formatvalue=“instructor led” target=ourOwnPIforCourseware 6 8 9 Lots of information XML for DB2 To be determined

  23. Path Expressions Overview • Path expressions navigate the XML tree structure to locate nodes • Navigation axes are used in XPath; always start at context node • ForwardAxis moves down through the XML tree: • attribute (@) • child (default) • descendant • self (.) • descendant-or-self(//=/descendant-or-self:node()/) • ReverseAxis moves up through the XML tree: • parent (..) • Three parts of an axis step: • Axis specifies a direction of movement • Node test specifies node selection criteria • Predicates (zero or more) filter returned sequence parent Element self attribute Element Element Text Comment children descendants Text

  24. Path Expressions Syntax • XML document to support upcoming example code: • XPath expressions support abbreviated syntax in axis steps @ is abbreviated syntax for attribute // is abbreviated syntax for /descendant-or-self::node()/ .. is abbreviated syntax for parent::node() . is abbreviated syntax for self::node() is abbreviated syntax for child::node() • <patient id="11123"> • <name>Sara Lee</name> • <addr> • <street>33 Maple</street> • <city>Nearly</city> • <state>NJ</state> • <zip>07123</zip> • </addr> • <phone type="home">908-842-7531</phone> • <email>saralee@cakemail.com</email> • <service> • <sdate>2004-02-29</sdate> • <reason>brittle nails</reason> • <descrip>prescribed hormones</descrip> • <cost>84.00</cost> • <copay>15.00</copay> • </service> • </patient>

  25. Path Expression Examples • ‘ / ’ slash indicates that path begins at root node • Whole XML document • ‘//service’  two slashes at beginning of path expression • Requested node, service, may be located anywhere in the document • Returns the entire service element, including all of its children • ‘//phone/@*’  All attributes appearing under the phone element • Only attribute under phone is: home • ‘//@*’  All attributes in the XML document • All attributes: 11123 home • ‘/patient/name’  All name elements under patient • Only name is Sara Lee • ‘/patient/addr/city/..’  All child elements of addr • Element addr, the parent of city, plus all child elements of addr

  26. Filter Expressions • A filter expression is a path expression followed by predicates in square brackets • Filter its result based on applying conditions; for example: • All service information of patients who have a $15.00 copay '/patient/*[copay="15.00"]' <service> <sdate>2004-02-29</sdate> <reason>brittle nails</reason> <descrip>prescribe hormones</descrip> <cost>84.00</cost> <copay>15.00</copay> </service> • Set context node to patient and filter based on attribute type rather than on an element /patient/phone[@type="home"] <phone type="home">908-842-7531</phone> • Any patient that has email /patient[email] Entire XML document • <patient id="11123"> • <name>Sara Lee</name> • <addr> • <street>33 Maple</street> • <city>Nearly</city> • <state>NJ</state> • <zip>07123</zip> • </addr> • <phone type="home">908-842-7531</phone> • <email>saralee@cakemail.com</email> • <service> • <sdate>2004-02-29</sdate> • <reason>brittle nails</reason> • <descrip>prescribed hormones</descrip> • <cost>84.00</cost> • <copay>15.00</copay> • </service> • </patient>

  27. Arithmetic Expressions • Arithmetic expressions: perform operations that involve addition, subtraction, multiplication, division, and modulus • The XPath arithmetic operators: *multiplication div division idiv integer division mod modulus + addition - subtraction • An arithmetic expression results in a numeric value • Or an empty sequence or an error • Place arithmetic expressions in parentheses

  28. Arithmetic Expression Example • An arithmetic expression to calculate the the remaining balance after the copay (and its return value): /patient/service/(cost – copay) 69.00 • <patient id="11123"> • <name>Sara Lee</name> • <addr> • <street>33 Maple</street> • <city>Nearly</city> • <state>NJ</state> • <zip>07123</zip> • </addr> • <phone type="home">908-842-7531</phone> • <email>saralee@cakemail.com</email> • <service> • <sdate>2004-02-29</sdate> • <reason>brittle nails</reason> • <descrip>prescribed hormones</descrip> • <cost>84.00</cost> • <copay>15.00</copay> • </service> • </patient>

  29. Comparison Expressions – General Comparisons • Comparison expressions allow comparing two values • The comparison operators are = != < <= > >= • All services with a cost greater than 60 '/patient/service/cost > 60' true • All services with a cost greater than 60, but get the patient’s service info instead of just true or false '/patient/service[cost > 60]' <service> <sdate>2004-02-29</sdate> <reason>brittle nails</reason> <descrip>prescribed hormones </descrip> <cost>84.00</cost> <copay>15.00</copay> </service> • <patient id="11123"> • <name>Sara Lee</name> • <addr> • <street>33 Maple</street> • <city>Nearly</city> • <state>NJ</state> • <zip>07123</zip> • </addr> • <phone type="home">908-842-7531 </phone> • <email>saralee@cakemail.com • </email> • <service> • <sdate>2004-02-29</sdate> • <reason>brittle nails</reason> • <descrip>prescribed hormones • </descrip> • <cost>84.00</cost> • <copay>15.00</copay> • </service> • </patient>

  30. Comparison Expressions – Logical Comparisons • Logical expressions using AND return true if both of two expressions are true • Logical expressions using OR return true if one or both expressions are true • Return phone numbers of the type work or fax (two options shown)'//phone[./@type="work" or ./@type="fax"]''//phone[@type="work" or @type="fax"]' • The result is this phone info <phone type="work">908-842-7531</phone><phone type="fax">908-751-2468</phone> • <patient id="11123"> • <name>Sara Lee</name> • <addr> • <street>33 Maple</street> • <city>Nearly</city> • <state>NJ</state> • <zip>07123</zip> • </addr> • <phone type="home">908-842-7531 </phone> • <phone type="fax">908-751-2468 </phone> • <email>saralee@cakemail.com • </email> • <service> • <sdate>2004-02-29</sdate> • <reason>brittle nails</reason> • <descrip>prescribed hormones • </descrip> • <cost>84.00</cost> • <copay>15.00</copay> • </service> • </patient>

  31. The Built-in Function Library • Built-in functions offered by the DB2 XPath library: • String functions • Numeric functions • Functions working on boolean values • Functions working on sequences • Calls to these function are allowed in an XPath expression anywhere an expression is expected • Example: SELECT XMLQUERY('fn:concat($x,$y)' PASSING 'come ' AS "x", 'together' AS "y") FROM SYSIBM.SYSDUMMY1;

  32. A Sampling of XPath Functions fn:compare compares two strings to see which one is greater fn:concat concatenates two or more strings into a single string fn:contains determines whether a string contains a given substring fn:count returns the number of values in a sequence fn:normalize-space strips leading and trailing whitespace characters fn:lower-case converts a string to lowercase fn:matches determines whether a string matches a given pattern fn:position returns the position of the context item in the sequence fn:replace replaces characters that match a pattern fn:round returns the integer that is closest to a numeric value fn:string returns the string representation of a value fn:string-length returns the length of a string fn:substring returns a substring of a string fn:upper-case converts a string to uppercase

  33. XML Namespaces • XML namespaces prevent naming collisions • An XML namespace is a set of names identified by a namespace URI • Distinguishes element types or attribute names with the same name associated with different DTDs or Schemas • Namespaces allow qualifying names of elements and attributes • Contain an optional namespace prefix, a colon, and a local name • Example: two elements with the same name bound to different URIs: <anElement xmlns:p1="someURI" xmlns:p2="otherURI"> <ElementABC> <p1:table>excel</p1:table> <p2:table>dining</p2:table> <table>DB2</table> </ElementABC> </anElement> empty prefix; bound to default element namespace

  34. Prolog – Namespace Definition • A DB2 XPath expression optionally contains a prolog • Establishes the processing environment • Prolog declaration may specify multiple namespace declarations • May also specify one default namespace declaration • Prolog declaration is always followed by a semicolon (;) • Syntax: • Examples: declare namespace prefix="namespace string literal"; declare default element namespace "namespace string literal"; declare namespace fn="http://www.w3.org/2005/xpath-functions"; declare default element namespace "http://www.xyz.com/movies";

  35. Indexes and XML Indexes Built On Values Within XML Documents

  36. XML Indexing • DB2 supports creating indexes on XML columns • Generated using XML pattern expressions • Support access to nodes in the document • Multiple parts of an XML document can satisfy an XML pattern • Multiple index keys may be generated for insert of a single document • GENERATE KEY USING XMLPATTERN clause of CREATE INDEX • Specifies what you want to index • Contains XML pattern expression Same as before XML CREATE UNIQUE INDEX PATIENT_ID_IX ON PATIENT(PATIENT_XML) GENERATE KEY USING XMLPATTERN '/patient/@id' AS SQL VARCHAR(5) XML node tobe indexed Required keywords Type of storedindex values

  37. Data Types Associated With Pattern Expressions • Keys from XML pattern expression specified in a CREATE INDEX statement must be associated with a data type • May use either DECFLOAT or VARCHAR(n), where n <= 1000 • Value being inserted/indexed must be convertible to this type INSERT INTO PATIENT VALUES('12345', '<patient id="123456"><name>Jim Beam</name></patient>') Value too long forVARCHAR(5) index DSNT408I SQLCODE = -20305, ERROR: AN XML VALUE CANNOT BE INSERTED OR UPDATED BECAUSE OF AN ERROR DETECTED WHEN INSERTING OR UPDATING THE INDEX IDENTIFIED BY 'DBID~132 OBID~23' ON TABLE *N. REASON CODE = 1.

  38. UNIQUE Keyword in XML Index Definition • The UNIQUE keyword is supported in XML index definitions • However, its meaning is different than in relational index definitions • When creating a relational index, the UNIQUE keyword enforces uniqueness across all rows in the table • When creating an index over XML data, the UNIQUE keyword enforces uniqueness across all documents in an XML column

  39. Example Queries and Supporting Indexes • Example 1 SELECT * FROM PATIENT WHERE XMLEXISTS('$Z/patient/service[copay="10.00"]' PASSING BY REF PATIENT_XML AS "Z") Supporting index CREATE INDEX copayIdx on PATIENT(PATIENT_XML) GENERATE KEY USING XMLPATTERN '/patient/service/copay' AS SQL DECFLOAT • Example 2 SELECT * FROM PATIENT WHERE XMLEXISTS('$Z/patient/phone[@type="work"]' PASSING BY REF PATIENT_XML AS "Z") Supporting index CREATE INDEX phoneTypIdx on PATIENT(PATIENT_XML) GENERATE KEY USING XMLPATTERN '/patient/phone/@type' AS SQL VARCHAR(20) • Example 3 SELECT * FROM PATIENT WHERE XMLEXISTS('$Z/patient/addr[city="Uptown"]' PASSING BY REF PATIENT_XML AS "Z") Supporting index CREATE INDEX cityIdx on PATIENT(PATIENT_XML) GENERATE KEY USING XMLPATTERN '/patient/addr/city' AS SQL VARCHAR(20)

  40. XML Schemas, Validation and XML Schema Repository (XSR)

  41. XML Schema • XML Schema Definition (XSD) defines structure of XML instance documents • Published as a recommendation by W3C • Defines elements and attributes permitted in a document • Defines parent / child relationship between elements • Defines data types, constraints and values for elements and attributes <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:element name="patient"> <xsd:complexType> <xsd:sequence> <xsd:element name="id" type="xsd:integer"/> <xsd:element name="name" type="xsd:string"/> <xsd:element name="addr" type="xsd:string"/> <xsd:element name="dob" type="xsd:date"/> <xsd:element name="amount" type="xsd:double"/> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:schema>

  42. XML Schema Management With the XSR • XML schema repository (XSR) • Set of tables that store XML schemas • Created during DB2 installation or migration • XML schemas may be added to the XSR • Then used to validate XML documents being inserted or updated • Registering XML schema documents (adding to XSR): • Call DB2-supplied stored procedures from a DB2 application • Or invoke a provided JDBC method from a Java application • Removing XML schema documents from the DB2 XSR • Call DB2-supplied stored procedure or invoke JDBC method Only schemas,and not DTDs,may be used forXML validation inDB2 Version 9

  43. XML Schema Validation • Use SQL INSERT statement to insert data into XML column • Inserted data must be a well-formed XML document • Validate the XML against a registered XML schema during insertion using the DSN_XMLVALIDATE function • User Defined Function • XML validation determines whether the structure, content, and data types of an XML document are valid according to a corresponding schema • Validation is optional INSERT into AutoDealers VALUES( '12345', CURRENT DATE, 'Sams Deals', DSN_XMLValidate(:xmlDealerInfo, SYSXSR.DealerInfoSchema));

  44. XML Decomposition • Decomposition, or shredding, is the process of storing XML document content in columns of relational tables • Decomposed data has SQL type of column where it is inserted • An XML schema consists of one or more XML schema documents • Annotated XML schema decomposition • Control of the decomposition process is provided by XML schema annotation

  45. Resources DB2 Version 9.1 for z/OS XML Guide (SC18-9858-03) DB2 9 for z/OS Technical Overview (SG24-7330-00) DB2 Version 9.1 for z/OS Application Programming and SQL Guide (SC18-9841-01) DB2 Version 9.1 for z/OS Utility Guide and Reference (SC18-9855-02) DB2 Version 9.1 for z/OS Application Programming Guide and Reference for Java(SC18-9842-01 )

More Related