1 / 53

DB2 9 for z/OS pureXML Performance and Best Practices

DB2 9 for z/OS pureXML Performance and Best Practices. Information Management June, 2010. Agenda. F. XML performance Performance monitoring and tuning Best practices. Note: the following performance numbers are not latest and have been improved since measurements.

annick
Download Presentation

DB2 9 for z/OS pureXML Performance and Best Practices

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. DB2 9 for z/OS pureXML Performance and Best Practices Information ManagementJune, 2010

  2. Agenda F • XML performance • Performance monitoring and tuning • Best practices Note: the following performance numbers are not latestand have been improved since measurements.

  3. Create tables with XML columns or alter table add XML columns Insert XML data, optionally validated against schemas Create indexes on XML data Efficiently search XML data Extract XML data Decompose XML data into relational data or create relational view Construct XML documents from relational and XML data Handle XML objects in all the utilities and tools What You Can Do with pureXML - Managing XML data the same way as relational data XML

  4. Storage for UNIFI Messages 70% 52% 58% 71% 67% 96 sample documents Strip WS: Strip Whitespaces Pres WS: Preserve Whitespaces

  5. Whitespace option and Compression 70% saving 80% saving Note : UNIFI (International Standard ISO 20022 – UNIversal Financial Industry message scheme )

  6. Table Compression Impact

  7. Insert Performance (Batch) 3.9 millions 10K docs per hour or 1100 docs/sec Measurement in March 2007, z9 DS8300, Single thread, Docs in EBCDIC

  8. Insert XML – with indexes

  9. XML Index Create or Rebuild //e /a/b/@c /a/b/f/g

  10. Insert Performance – compare w/ CLOB (average of 1K to 10M document insert performance)

  11. Fetch Performance (Batch) 9.3 millions 10K docs per hour or 2580 docs/sec Measurement in March 2007, z9 DS8300, Single thread, Docs in EBCDIC

  12. XML Index Exploitation

  13. XML Insert v.s. Validation v.s. Decomposition

  14. Large Sample Tax Document Insert Performance z9-109, one LPAR with dedicated 3 CPs. Documents were stored remotely in AIX box and inserted using Java application. Time in seconds.

  15. z/OS XML Specialty Engine Support 48% 32% 14%

  16. LOAD Testing Processor IBM System z9 Enterprise Class (z9 EC) LPAR configuration: 4 General Purpose CPs, 1 zAAP, 1 zIIP, all dedicated Memory: 24GB memory Storage IBM DS8300 Operating system z/OS Version 1.9 DB2 DB2 9 Feb 2008 PTF level

  17. TPoX Insert Test Processor IBM System z9 Enterprise Class (z9 EC) LPAR configuration: 3 dedicated General Purpose CPs (no zIIP no zAAP) Memory: 24GB memory Storage IBM DS8300 Case–A Operating system z/OS Version 1.8 DB2 DB2 9 June 2007 PTF level Case –B Operating system z/OS Version 1.9 DB2 DB2 9 Feb 2008 PTF level

  18. TPoX Mixed Transaction Test (1/2) Processor IBM System z9 Enterprise Class (z9 EC) LPAR configuration: 3 General Purpose CPs, dedicated (no zAAP or zIIP) Memory: 24GB memory Storage IBM DS8300 Operating system z/OS Version 1.9 DB2 DB2 9 Feb 2008 PTF level Threads 35

  19. TPoX Mixed Transaction Test (2/2)

  20. TPoX 2500 100.00% 90.00% 2000 80.00% 70.00% 1500 60.00% ETR (tps) 50.00% cpu busy (%) 1000 40.00% 30.00% 500 20.00% 10.00% 0 0.00% PK81260 PK80732 10 PK80732 20 PK80732 30 PK80732 40 TPoX Benchmark October, 2009 z10, 5 CPs Number of users

  21. Agenda • XML performance • Performance monitoring and tuning • Best practices F

  22. Performance Monitoring and Tuning • Since XML native storage is built on top of regular tablespace structure, there are no special changes in DB2 Performance Expert to support XML other than minor points - such as new XML locks (type x’35’). • XML performance problem can be analyzed through accounting traces and performance traces. • There is a new LOAD MODULE for XML: DSNNXML • XML indexes have the same consideration as other indexes. • The REORG utility should be used to maintain order and free space. • Run RUNSTATS for statistics to help pick XML indexes.

  23. XML Query Performance Issues • 85% of the performance issues relate to: • Query execution plans • Index usage (indexing presentation) • Proper coding of SQL/XML and XQuery expressions (Best Practices section)

  24. How to obtain and analyze XML query plans • Create Explain tables • Use member DSNTESC of the SDSNSAMP library • Option E from menu of DB2 admin tool (DSN_STATEMNT_TABLE) • Use Visual Explain • Optim Development Studio • IBM DB2 Optimization Service Center for DB2 for z/OS(OSC) • Gather explain information • Use SPUFI – prefix query with EXPLAIN PLAN SET QUERYNO • SELECT from PLAN_TABLE

  25. Use RUNSTATS • Use RUNSTATS to collect statistics for XML data and indexes so the optimizer can pick the right access methods LISTDEF DBACORDTSLIST INCLUDE TABLESPACES DATABASE DBACORD RUNSTATS TABLESPACE LIST DBACORDTSLIST TABLE(ALL) INDEX(ALL)

  26. Agenda • XML performance • Performance monitoring and tuning • Best practices F

  27. Best Practices • Tip 1: Choose the right table and storage design • Tip 2: Choose the right XML document granularity • Tip 3: Be aware of XML schema validation overhead • Tip 4: Avoid encoding conversion during XML insert and retrieval • Tip 5: In XPath expressions, use fully specified paths as much as possible • Tip 6: Define lean XML indexes • Tip 7: Put document filtering predicates in XMLEXISTS instead of XMLQUERY • Tip 8: Use square brackets [ ] to avoid Boolean predicates in XMLEXISTS • Tip 9: Use RUNSTATS to collects statistics for XML data and indexes • Tip 10: Use SQL/XML publishing views to expose relational data as XML • Tip 11: Use XMLTABLE views to expose XML as relational data • Tip 12: Use SQL/XML statements with parameter markers and host vars

  28. Tip 1: Decision making: XML input => storage Regulatory Requirements Flexible Digital Signature Intact Significant Data (strip whitespace) (Relational/XML) (preserve whitespace) No Structures RegularFixed Search in XML Return XMLalways Yes Relational Complex Flexible Never Yes Yes Light Reporting XML with XMLTABLE() XML with XML indexes LOB VARCHAR VARBIN (can be materialized) Heavy Analytics Relational with XML

  29. Some considerations • Tedious normalization and frustrated changes of schema are an indicator for using native XML. • Store hybrid or redundant data in relational/XML, when • Fully normalized storage is an overkill • Referential integrity: extract into relation columns • Store in XML, but materialize frequently used fields in relational for heavy analytic applications • Document size • Use compression for XML data always

  30. Table Design DOCTAB ID DOCTYPE XMLDOC • Mixed document types in one table • Flexibility in exchange of overhead (such as index maintenance) • Separate tables for different document types • to avoid overhead DOCTYPE1TAB ID XMLDOC DOCTYPE2TAB ID XMLDOC

  31. Tip 2: Choose the right XML document granularity • Small vs. large documents? (KBs vs. MBs)XML Indexes filter at the document level • Smaller documents tend to perform betterBut, rule of thumb: Document granularity should match the predominant granularity of access

  32. Document Granularity: Example

  33. Tip 3:Beware of Schema Validation Overhead create tabledept(deptID char(8), deptdocxml); Validation is optional, and per document (per row): insert intodeptvalues (?, ?) insert intodeptvalues (?, dsn_xmlvalidate(?, ?)) No validation with validation Validation increases CPU time for inserts, and reduces throughput. Use schema validation if needed.Avoid schema validation for highest possible insert performance.

  34. Tip 4: Avoid encoding conversion • Internally encoded XML: encoding derived from the data, e.g. Unicode Byte-Order Mark or optional XML declaration: <?xml version="1.0" encoding="UTF-8" ?> • Externally encoded XML: application encoding determines XML encoding if character type variables are used • Internally encoded XML with UTF-8 is preferred • CLI: use SQL_C_BINARY data buffers rather than SQL_C_CHAR, SQL_C_DBCHAR, SQL_C_WCHAR • Java: use binary stream (setBinaryStream) rather than string (setString). • COBOL: SQL BLOB

  35. Tip 5: Use fully specified paths if possible • As much as possible possible, use fully specified XPath expression rather than wildcards, e.g. • /customerinfo/phone instead of //phone • /customerinfo/addr/state instead of /customerinfo/*/state <customerinfo Cid="1004"> <name>Matt Foreman</name> <addr country="Canada"> <street>1596 Baseline</street> <city>Toronto</city> <state>Ontario</state> <pcode>M3Z-5H9</pcode> </addr> <phone type="work">905-555-4789</phone> <phone type="home">416-555-3376</phone> <assistant> <name>Peter Smith</name> <phone type="home">416-555-3426</phone> </assistant> </customerinfo>

  36. Tip 6: Lean XML Indexes create tablecustomer(infoXML); create indexidx4 on customer(info) generate key using xmlpattern '/customerinfo/phone' as sql varchar(40); • create unique index idx1 on customer(info) • generate key using • xmlpattern '/customerinfo/@Cid' • as sql decfloat; • create index idx2 on customer(info) • generate key using • xmlpattern '/customerinfo/name' • as sql varchar(40); <customerinfo Cid="1004"> <name>Matt Foreman</name> <addr country="Canada"> <street>1596 Baseline</street> <city>Toronto</city> <state>Ontario</state> <pcode>M3Z-5H9</pcode> </addr> <phone type="work">905-555-4789</phone> <phone type="home">416-555-3376</phone> <assistant> <name>Peter Smith</name> <phone type="home">416-555-3426</phone> </assistant> </customerinfo> create index idx3 on customer(info) generate key using xmlpattern '//name' as sql varchar(40); LUW: “as sql double” zOS: “as sql decfloat”

  37. create tablecustomer(infoXML); Tip 6: Lean XML Indexes Don’t index everything! Very expensive for insert, update, delete ! • create unique index idx1 on customer(info) • generate key using • xmlpattern '/customerinfo/@Cid' • as sql decfloat; • create index idx2 on customer(info) • generate key using • xmlpattern '/customerinfo/name' • as sql varchar(40); <customerinfo Cid="1004"> <name>Matt Foreman</name> <addr country="Canada"> <street>1596 Baseline</street> <city>Toronto</city> <state>Ontario</state> <pcode>M3Z-5H9</pcode> </addr> <phone type="work">905-555-4789</phone> <phone type="home">416-555-3376</phone> <assistant> <name>Peter Smith</name> <phone type="home">416-555-3426</phone> </assistant> </customerinfo> create index idx3 on customer(info) generate key using xmlpattern '//name' as sql varchar(40); create index idx4 on customer(info) generate key using xmlpattern '//text()' as sql varchar(40);

  38. <customerinfo Cid="1004"> <name>Matt Foreman</name> <addr country="Canada"> <street>1596 Baseline</street> <city>Toronto</city> <state>Ontario</state> <pcode>M3Z-5H9</pcode> </addr> (…) </customerinfo> Tip 6: Lean XML Indexes and Indexing non-leaf Nodes • Typically not useful ! • …xmlpattern '/customerinfo/addr' as sql varchar(128); • Single index entry. Key value = concatenation of all text nodes under “addr”: • (/customerinfo/addr, “1596 BaselineTorontoOntarioM3Z-5H9”) • Better: 4 separate indexes ! • …xmlpattern '/customerinfo/addr/street' as sql varchar(50); • …xmlpattern '/customerinfo/addr/city' as sql varchar(40); • …xmlpattern '/customerinfo/addr/state' as sql varchar(25); • …xmlpattern '/customerinfo/addr/pcode' as sql varchar(10);

  39. Tip 7 & 8 :Put document filtering predicates in XMLEXISTS instead of XMLQUERY & Use square brackets [ ] to avoid Boolean predicates in XMLEXISTS • XMLQUERY function in a SELECT clause does not filter documents or rows, does not use indexes • Document/Row-filtering predicates must be in XMLEXISTS in the WHERE clause • Predicates in XMLEXISTS must be in square brackets

  40. create tablecustomer( infoXML); SQL/XML with XMLQUERY • select xmlquery(‘$i/customerinfo[phone = “905-555-4789”]/name’ passing info as “i”) • from customer customer table: 3 record(s) selected 1 record(s)selected • select xmlquery(‘$i/customerinfo/name’ passing info as “i”) • from customer • where xmlexists(‘$i/customerinfo[phone = “905-555-4789”]’ passing info as “i”)

  41. create tablecustomer( infoXML); SQL/XML with XMLEXISTS • select xmlquery(‘$i/customerinfo/name’ passing info as “i”)from customerwhere xmlexists(‘$i/customerinfo/phone = “905-555-4789”’ passing info as “i”) True or false, not empty! customer table: 3 record(s) selected 1 record(s) selected • select xmlquery(‘$i/customerinfo/name’ passing info as “i”) • from customer • where xmlexists(‘$i/customerinfo[phone = “905-555-4789”]’ passing info as “i”)

  42. Tip 9: Use RUNSTATS on XML data! • RUNSTATS does collect statistics for XML data and XML indexes! • The optimizer does use these stats! LISTDEF DBACORDTSLIST INCLUDE TABLESPACES DATABASE DBACORD RUNSTATS TABLESPACE LIST DBACORDTSLIST TABLE(ALL) INDEX(ALL)

  43. Tip 10:Use SQL/XML publishing views to expose relational data as XML • SQL/XML publishing functions hidden in a view • create table unit( unitID char(8), name char(20), manager varchar(20)); • create view UnitView(unitID, name, unitdoc) as select unitID, name, XMLELEMENT(NAME "Unit", XMLELEMENT(NAME "ID", u.unitID), XMLELEMENT(NAME "UnitName", u.name), XMLELEMENT(NAME "Mgr", u.manager) ) • from unit u;

  44. Tip 10:Use SQL/XML publishing views to expose relational data as XML • Queries that perform sub-optimally select unitdoc from UnitViewwhere xmlexists('$i/Unit[ID = "WWPR"]' passing unitdoc as "i"); • Query that performs well: filter on relational select unitdoc from UnitViewwhere UnitID = "WWPR"; In a nutshell, include relational columns in a SQL/XML publishing view, and when querying the view express any predicates on those columns rather than on the constructed XML.

  45. Tip 11:Use XMLTABLE views to expose XML data in relational format • Values returned from XML documents in tabular format • create table customer(info XML); • create view myview(CustomerID, Name, Zip, Info) as SELECT T.*, info FROM customer, XMLTABLE ('$c/customerinfo' passing info as “c” COLUMNS “CID” INTEGER PATH './@Cid', “Name” VARCHAR(30) PATH './name', “Zip” CHAR(12) PATH './addr/pcode' ) as T;

  46. Tip 11:Use XMLTABLE views to expose XML data in relational format • Query with an XML predicate • May perform sub-optimally select CustomerID,Name from myview where Zip = “95141”; • Will perform well select CustomerID, Name from myview where xmlexists('$i/customerinfo[addr/pcode ' “95141”] passing info as “i”); In a nutshell, be careful with XMLTABLE views which expose XML data in relational form. When possible, include additional columns in the view definition so that filtering predicates can be expressed on those columns instead of the XMLTABLE columns.

  47. Extracting values from XML for Hybrid Store using Trigger • CUST(ID, NAME, CITY, ZIP, INFO): extract NAME, CITY, ZIP from INFO (XML) CREATE TRIGGER ins_cust AFTER INSERT ON cust REFERENCING NEW AS newrow FOR EACH ROW MODE DB2SQL BEGIN ATOMIC update cust set (name, city, zip) = (select X.name, X.city, X.zip from cust, XMLTABLE('customerinfo' PASSING CUST.INFO COLUMNS name varchar(30) PATH 'name', city varchar(20) PATH 'addr/city', zip varchar(12) PATH 'addr/pcode-zip') as X where cust.id = newrow.id ) where cust.id = newrow.id; END #

  48. Tip 12: Use Parameter markers and host vars for fast XML queries select info from customer where xmlexists('$i/customerinfo[phone ="905-555-4789"]' passing info as "i") select info from customer where xmlexists('$i/customerinfo[phone= $p]' passing info as "i", cast(?as varchar(12)) as"p") select info from customer where xmlexists('$i/customerinfo[phone = $p]' passing info as "i", :vchostvar as "p")

  49. XML Queries – Things to do to improve Performance LUW chart • Use XPath instead of FLWOR where possible • Reason: Simpler is better, for humans and for the DB2 optimizer. • XML may not have to be reconstructed and FLWOR can use more tempspace • Avoid parent for predicates • Eg. /a/b//d[../c=fn:string(“abc”)] • Reason: parent steps in the predicate prevent index usage • See: http://www.ibm.com/developerworks/data/library/techarticle/dm-0611nicola/#cases • Use range predicates when appropriate • Eg. [dateOfBirth [.>= xs:date(“2000-01-01”) and .<= xs:date(“2000-12-31”)]] • What is important here is the notation with the self axes (the dots). • Reason: This allows DB2 to use a single XISCAN instead of 2 XISCANs + IXAND. • See: http://www.ibm.com/developerworks/data/library/techarticle/dm-0611nicola/#rangepredicates

  50. Using SPUFI or JCL SELECT Cid, InfoFROM DSN8910.CUSTOMERWHERE XMLEXISTS ('declare default element namespace "http://posample.org";//addr[city="Toronto"]' passing INFO) • XML and XPath are case-sensitive: CAP off/case mixed • SQLCODE = -16002, ERROR: AN XQUERY EXPRESSION HAS AN UNEXPECTED TOKEN DEFAULT FOLLOWING DECLARE. • Terminal session CCSID setting has to be consistent with application encoding scheme as “[” and “]” have different code points in different code pages. • SQLCODE = -16002, ERROR: AN XQUERY EXPRESSION HAS AN UNEXPECTED TOKEN  FOLLOWING "Toronto".

More Related