xml in an rdbms world l.
Skip this Video
Loading SlideShow in 5 Seconds..
XML In An RDBMS World PowerPoint Presentation
Download Presentation

Loading in 2 Seconds...

play fullscreen
1 / 75

XML In An RDBMS World - PowerPoint PPT Presentation

  • Uploaded on

XML In An RDBMS World. Michael D. Thomas mdthomas@ibiblio.org. Data, data, data. In general, computing is never far from data XML, RDBMSes deal with data – play in the same very large playground Extensive overlap between RDBMSes and XML

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'XML In An RDBMS World' - mada

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
xml in an rdbms world


Michael D. Thomas


data data data
Data, data, data
  • In general, computing is never far from data
  • XML, RDBMSes deal with data – play in the same very large playground
  • Extensive overlap between RDBMSes and XML
  • Goal: Learn how to choose between RDBMS vs. XML, how to integrate the two technologies
first some pragmatism
First, some pragmatism…
  • “When all you have is a hammer, everything looks like a nail.”
  • What if all you have is a hammer?
  • “When all you have is a hammer, everything must look like a nail.”
  • Best to have XML and RDBMS in your toolset
  • Architectural purity is often impossible
  • But we should know when & what we are compromising
tonight s agenda
Tonight’s Agenda
  • Compare XML and SQL RDBMS Technologies
  • Understand Their Sweet Spots
  • Examine Their Use Through Studies of Anti-Patterns and Patterns Of Usage
  • Discuss XQuery and XML Storage In The RDBMS
  • Defining The Playground – XML vs. RDBMS
  • Anti-Pattern: XML As Transactional Data Store
  • Data Stores & The ACID Test
  • Anti-Pattern: Document Storage & The RDBMS
  • XQuery & RDBMS Doc Storage
  • Anti-Pattern: XML Declarative Languages & Gratuitous Encapsulation Of SQL
  • Understanding Declarative Languages, Such As SQL
  • Anti-Pattern: XML Meta-data & The Overly Abstract Database
  • Managing The Split Between XML-based & RDBMS-based Data Models In The Same Application
xml as database anti pattern
XML As Database Anti-Pattern
  • “I’ll use an XML file as a database”
  • Advantages: Simple, Cheap
  • An XML Schema defines the metadata of the database
  • Data is right on the filesystem, not “hidden” in a complicated SQL RDBMS
xml as database problems
XML As Database Problems
  • Updates are hard
  • Have to re-invent concurrency handling – multiple users changing data at the same time
  • No constraint checking
  • No optimization, such as indexes, caching, etc.
organization of data
Organization of Data
  • Been formalizing the storage information for a long time
  • Age of computing called for new approaches
  • Relational model highly successful
  • XML is relatively new
  • Good for loosely structured data – documents -- and data transmission
transmission interoperability
Transmission & Interoperability
  • Electronic transmission is newer than electronic storage
  • With the growth of the Internet, transmission of data is exploding
  • No Interoperability problems in 1950 – lots of interoperability problems now
  • XML is an important standard for transmission and interoperability
  • Datastore – Anything that stores data
  • SQL Relational Database – A database that organizes info in tables and adheres to relational theory
  • XML – 1) eXtensible Markup Language 2) The XML standard 3) All or part of an XML document
  • XML Database – A database specialized for the storage of XML documents
  • Object Database – A database that stores objects
  • Object-Relational Database – A relational database with an extensible type system
data centric vs document centric
Data Centric vs. Document Centric
  • Data Centric approach – the datastore is focused on handling highly structured, fine grained data. Favors the relational model
  • Document centric approach – the datastore is focused on handling semi-structured data, such as web pages, books, etc. Favors XML
data centric vs document centric13
Data Centric vs. Document Centric
  • Best queries are largely a result of structuring the data well (Messy desk vs. organized file cabinet)
  • Documents are semi-structured data with ad hoc structures
  • The overhead of defining rigorous structure for each type of document increases the overall cost of management
  • (Would still give you the best queries)
storage vs transmission
Storage vs. Transmission
  • Data Storage and Data Transmission are two different concepts
  • XML is very strong for interoperable transmission
  • Can store by writing XML to a file
  • You can ‘transmit’ relational data by exporting a few tables and ftping, but isn’t a strong solution
three data models
Three Data Models
  • Persistent Model – how data is stored
  • Active Model (Object Model) – how data is arranged when it is being manipulated by a program, usually written in an imperative language
  • Presentation/Transmission Model – how data is transmitted, usually as XML
datastore basics
Datastore Basics
  • Any Datastore must tackle the following issues:
    • Concurrency
    • Transactions – the ACID test
    • Locking
    • Joins
    • Normalization
    • Administration Issues
  • Datastores support concurrency if multiple users can access the same datastore at the same time
  • Datastores must not allow the same data to be modified at the same time
  • Atomicity – No matter how complex, a transaction is atomic and indivisible. Transactions are “all or nothing.”
  • Consistency – Transactions must leave the database in a consistent state, i.e., consistent with the rules
  • Isolation – Transaction is isolated from other transactions
  • Durability – The effects of a transaction persist
isolation levels
Isolation levels
  • TRANSACTION_READ_UNCOMMITTED – Dirty reads, non-repeatable reads, phantom reads

– Non-repeatable reads, phantom reads


– phantom reads

  • Data must be locked for transactions to be isolated
  • Locking is both a datastore and an application concern
  • How much extraneous data is locked? (Page level locking, document level locking)
  • Pessimistic locking: prevents reading of locked data
  • Optimistic locking: generates an error when inappropriate data updates are attempted
  • A Join joins the data between two different data entities
  • E.g., SELECT * from emp, dept WHERE emp.deptno = dept.deptno
  • Joins are the cornerstone of SQL
  • XPath doesn’t do joins between XML documents!
  • XQuery, others can
  • Normalization – organizing data to minimize redundancy
  • Normalized data is easier to maintain and easier to understand conceptually
  • In relational db design, normalized DBs need to be denormalized for performance reasons
  • XML docs can also be normalized, but not as much support for tying elements together
  • Normalization is important when designing, less crucial at implementation
administrative issues
Administrative Issues
  • Includes backup & recovery, installation, upgrades, optimization, maintenance, etc.
  • In general:
    • Bigger is better (economies of scale, 24x7 support)
    • More popular, more standard is better (law of increasing returns)
    • Whatever is already working in your organization is better (no need to hire, re-train administrators)
  • Existing DB vendors have a huge advantage
  • Different types:
    • Relational Database – highly structured data such as account balances, inventory quantities, etc.)
    • Document Database – used to store documents, probably in XML format
    • The same DB can serve as both, e.g., Oracle
overview of relational databases
Overview Of Relational Databases
  • The Relational Model
  • SQL
  • Entity Relationship Diagrams
relational model structure
Relational Model -- structure
  • Data is grouped in tables
  • Tables have columns and rows
  • Columns are fairly fixed – the set of columns shouldn’t change much (if at all) over the life of a table
  • A table can have any number of rows
  • Rows change constantly
relational model primary key
Relational model – primary key
  • In general, a table should have one or more columns defined as the primary key
  • The primary key is unique and non-null
  • Usually only one column
  • Can consist of more than one column (composite primary key)
relational model foreign key
Relational model – foreign key
  • A foreign key describes a relationship between two tables
  • The foreign key column of tableA points to a column in tableB
  • TableB is said to be the parent of tableA
  • Often, the foreign key points at a primary key

Three types:

  • Query SELECT * FROM emp WHERE deptno=10 ORDER BY ename
  • Data Modification Language (DML): Update, Insert
  • Data Definition Language (DDL): Create Table
  • A Join is a Cartesian Cross-Product:

SELECT count(*) FROM emp;

SELECT count(*) FROM emp;

SELECT count(*) FROM emp, dept;

SELECT ename, emp.deptno, dept.deptno, dname FROM emp, dept;

SELECT ename, emp.deptno, dept.deptno, dname FROM emp, dept WHERE emp.deptno = dept.deptno;


SELECT ename, dname FROM emp, dept WHERE emp.deptno = dept.deptno

entity relationship diagrams
Entity Relationship Diagrams
  • Used to create a map of your data
  • Describes tables, attributes of tables and relationships between tables
  • Come at it from two directions: lay out the entities, assign the attributes; group the attributes into entities
xml vs sql tables
XML vs. SQL Tables
  • XML – order matters! (Rows in a relational table are unordered)
  • XML is a tree structure
  • XML documents tend to be semi-structured, SQL tables are highly structured
  • SQL tables aren’t as flexible or inter-changable
  • XML joins aren’t straightforward
  • An XML document/element doesn’t serve multiple purposes as well as DB schema/table
anti pattern xml document storage as a sql blob
Anti-Pattern: XML Document Storage As A SQL BLOB
  • Need to store XML in a database
  • Better than storing in a filesystem
  • Make a BLOB (Binary Large Object) and store a document as an element in a row of a table
  • Problem: can’t query contents of document without extracting it from the db!
  • For simple queries, might have to extract all of the documents into the application – very inefficient
anti pattern one off sql schema for a particular xml document
Anti-Pattern: One-off SQL Schema For A Particular XML Document
  • BLOB storage is bad, so why not “shred” the document across multiple DB tables?
  • I.e., for XML elements named “Dept” make a “Dept” table, “Emp” elements make a “Emp” table, describe the hierarchy with foreign key constraints
  • Query performance is much, much better – probably better than XPath against XML as a file
  • A lot of work! Have to do this for every XML schema
  • Hard! XML schemas are inherently more flexible than SQL schemas. Some mappings can be difficult
  • Negates flexibility of XML
  • Not as learnable – Programmers have to learn to query your SQL schema, not just XPath and XML Schema
anti pattern developing a general shredding solution for all of xml
Anti-Pattern: Developing A General “Shredding” Solution For All Of XML
  • Your Application has several XML schemas, and it’s time-consuming to develop “shredding” for all of them
  • So, you try to do a more generalized shredding
  • Is possible, but is a very horizontal problem. You probably won’t get the time to solve it completely.
  • Still might present learnability problems
xml datastore architectures
XML Datastore Architectures
  • XML Views Of Relational Data
  • Relational Wrappers Of XML
  • Independent Storage of XML Documents (Native XML Database)
  • Text Storage Of XML In RDBMS
  • XML Shredding Across Relational Tables
  • Storing XML as Objects in Object- Relational DBs (Oracle XMLType)
  • Everything is XML (XQuery approach)
vocabulary xml collections
Vocabulary: XML Collections
  • XML Collection is a collection of XML documents
  • “A row is to a table as an XML document is to an XML collection.”
xml derived from rdbms
XML Derived From RDBMS
  • Data exists naturally as relational data
  • Needs to be represented as an XML document for some reason
  • The derived XML is usually used for transmission
independent xml store
Independent XML Store
  • A specialized database is used to store XML documents
  • Typically, an application will either have two datastores – relational and XML – or relational data will be stored as XML
  • In the dual datastore case, the application code has to join the different data sets
text storage of xml in databases
Text Storage Of XML In Databases
  • XML is stored in a column of a relational table
  • Allows you to mix structured and semi-structured approaches
  • However, hard to query against the XML directly
xml shredding across tables
XML Shredding Across Tables
  • An XML document is stored across many tables
  • Is possible to use SQL queries against the document’s parts
  • Vendors can implement an XPath-to-SQL translator
  • Can structure the tables based on a schema
  • A pain to handle yourself
  • Shred an XML schema across relational tables
  • Object-Relational databases allow you to define your own types
  • You could define an Address object, define functions for the object, and store instances of the object in a column in the database
  • Oracle defines an object-relational type, XMLType, for the storage of XML documents in the database
  • SELECT e.poDoc.getClobval() AS poXML FROM po_xml_tab e WHERE e.poDoc.existsNode('/PO[PNAME = "po_2"]') = 1;
  • Shredding is managed & encapsulated for you
everything is xml xquery
Everything Is XML (XQuery)
  • With XMLType, Oracle says that everything fits in to the Object-Relational realm
  • XQuery says that everything can be represented and queried as XML
  • Relational data is derived from RDBMS using SQL/XML
sql xml

select xmlelement("emp",

'Employee ' ,

xmlelement( "name", e.job || ' ' || e.ename),

' was hired on ',

xmlelement("hiredate", e.hiredate)) as result

from emp e;



<name>CLERK SMITH</name>

was hired on



query soup
Query Soup
  • SQL – queries relational data
  • XPath – queries a particular XML document
  • SQL/XML – a standard for deriving XML from relational data
  • XQuery – queries a collection of XML documents and uses XPath to query particular XML documents
query soup54
Query Soup
  • There are other XML query languages, such as XML-QL and Quilt
  • XSLT is a transformation, not a query, language
  • XSQL & MS XML/SQL are wrapper languages
anti pattern encapsulating sql with xml
Anti-Pattern: Encapsulating SQL With XML
  • General Rule: “Encapsulation is good, but encapsulating good things is bad.”
  • SQL is good
  • (Also, HTML is good)
  • Why hide SQL with XML?
xml encapsulating sql
XML encapsulating SQL

SELECT emp.ename, dept.dname FROM emp, dept WHERE emp.deptno = dept.deptno



<field> ename </field>

<field> dname </field>



<table> emp </table>

<table> dept </table>



<join>emp.deptno = dept.deptno </join>



same concept in xqueryx
Same concept in XQueryX

“XQueryX is an XML representation of an XQuery… The result is not particularly convenient for humans to read and write, but it is easy for programs to parse…”

xquery example
XQuery Example
  • XQuery:

{ for $b in doc("http://www.bn.com/bib.xml")/bib/book where

$b/publisher = "Addison-Wesley" and

$b/@year > 1991 return <book year =

"{ $b/@year }"> { $b/title } </book>


xqueryx version
XQueryX Version

<?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="xqueryx.xsl"?> <xqx:module xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xqx="http://www.w3.org/2003/12/XQueryX" xsi:schemaLocation="http://www.w3.org/2003/12/XQueryX xqueryx.xsd"> <xqx:mainModule> <xqx:queryBody> <xqx:expr xsi:type="xqx:elementConstructor"> <xqx:tagName>bib</xqx:tagName> <xqx:elementContent> <xqx:expr xsi:type="xqx:flwrExpr"> <xqx:forClause> <xqx:forClauseItem> <xqx:typedVariableBinding> <xqx:varName>b</xqx:varName> </xqx:typedVariableBinding> <xqx:forExpr> <xqx:expr xsi:type="xqx:pathExpr"> <xqx:expr xsi:type="xqx:functionCallExpr"> <xqx:functionName>document</xqx:functionName> <xqx:parameters> <xqx:expr xsi:type="xqx:stringConstantExpr"> <xqx:value>bib.xml</xqx:value> </xqx:expr> </xqx:parameters> </xqx:expr> <xqx:stepExpr> <xqx:xpathAxis>child</xqx:xpathAxis> <xqx:elementTest> <xqx:nodeName> <xqx:QName>bib</xqx:QName> </xqx:nodeName> </xqx:elementTest> </xqx:stepExpr> <xqx:stepExpr> <xqx:xpathAxis>child</xqx:xpathAxis> <xqx:elementTest> <xqx:nodeName> <xqx:QName>book</xqx:QName> </xqx:nodeName> </xqx:elementTest> </xqx:stepExpr> </xqx:expr> </xqx:forExpr> </xqx:forClauseItem> </xqx:forClause> <xqx:whereClause> <xqx:expr xsi:type="xqx:operatorExpr" xqx:infix="true"> <xqx:opType>AND</xqx:opType> <xqx:parameters> <xqx:expr xsi:type="xqx:operatorExpr" xqx:infix="true"> <xqx:opType>=</xqx:opType> <xqx:parameters> <xqx:expr xsi:type="xqx:pathExpr"> <xqx:expr xsi:type="xqx:variable"> <xqx:name>b</xqx:name> </xqx:expr> <xqx:stepExpr> <xqx:xpathAxis>child</xqx:xpathAxis> <xqx:elementTest> <xqx:nodeName> <xqx:QName>publisher</xqx:QName> </xqx:nodeName> </xqx:elementTest> </xqx:stepExpr> </xqx:expr> <xqx:expr xsi:type="xqx:stringConstantExpr"> <xqx:value>Addison-Wesley</xqx:value> \

xqueryx version cont
XQueryX Version (cont.)

<xqx:QName>publisher</xqx:QName> </xqx:nodeName> </xqx:elementTest> </xqx:stepExpr> </xqx:expr> <xqx:expr xsi:type="xqx:stringConstantExpr"> <xqx:value>Addison-Wesley</xqx:value> </xqx:expr> </xqx:parameters> </xqx:expr> <xqx:expr xsi:type="xqx:operatorExpr" xqx:infix="true"> <xqx:opType>&gt;</xqx:opType> <xqx:parameters> <xqx:expr xsi:type="xqx:pathExpr"> <xqx:expr xsi:type="xqx:variable"> <xqx:name>b</xqx:name> </xqx:expr> <xqx:stepExpr> <xqx:xpathAxis>child</xqx:xpathAxis> <xqx:attributeTest> <xqx:nodeName> <xqx:QName>year</xqx:QName> </xqx:nodeName> </xqx:attributeTest> </xqx:stepExpr> </xqx:expr> <xqx:expr xsi:type="xqx:integerConstantExpr"> <xqx:value>1991</xqx:value> </xqx:expr> </xqx:parameters> </xqx:expr> </xqx:parameters> </xqx:expr> </xqx:whereClause> <xqx:returnClause> <xqx:expr xsi:type="xqx:elementConstructor"> <xqx:tagName>book</xqx:tagName> <xqx:attributeList> <xqx:expr xsi:type="xqx:attributeConstructor"> <xqx:attributeName>year</xqx:attributeName> <xqx:attributeValue> <xqx:expr xsi:type="xqx:pathExpr"> <xqx:expr xsi:type="xqx:variable"> <xqx:name>b</xqx:name> </xqx:expr> <xqx:stepExpr> <xqx:xpathAxis>child</xqx:xpathAxis> <xqx:attributeTest> <xqx:nodeName> <xqx:QName>year</xqx:QName> </xqx:nodeName> </xqx:attributeTest> </xqx:stepExpr> </xqx:expr> </xqx:attributeValue> </xqx:expr> </xqx:attributeList> <xqx:elementContent> <xqx:expr xsi:type="xqx:pathExpr"> <xqx:expr xsi:type="xqx:pathExpr"> <xqx:expr xsi:type="xqx:variable"> <xqx:name>b</xqx:name> </xqx:expr> <xqx:stepExpr> <xqx:xpathAxis>child</xqx:xpathAxis> <xqx:elementTest> <xqx:nodeName> <xqx:QName>title</xqx:QName> </xqx:nodeName> </xqx:elementTest> </xqx:stepExpr> </xqx:expr> </xqx:expr> </xqx:elementContent> </xqx:expr> </xqx:returnClause> </xqx:expr> </xqx:elementContent> </xqx:expr> </xqx:queryBody> </xqx:mainModule> </xqx:module>

xml as user interface
XML As User Interface
  • Development teams often forget to write use cases for administrators, downstream developers
  • XML declarative languages must be considered as ways to provide user interfaces
  • Documentation, examples important
  • Usability, Learnability issues important
declarative xml in app lifecycle
Declarative XML In App. Lifecycle
  • Compile-Time – Best Performance, Least Flexibility
  • Start-Time – XML Parsed Once, Good Performance, Good Flexibility. App must be restarted or manually refreshed to get flexibility
  • Run-Time – Can Impact Performance, But Great Flexibility. Web Services Approach
anti pattern the overly abstract database
Anti-Pattern: The Overly Abstract Database
  • By using XML, you can achieve runtime configurability
  • Instead of “hard coding” SQL table names, do the same job in XML configuration files
  • Problem: database is so abstract that it is very hard or impossible to do SQL queries against it directly
  • Database is closed – just a persistence extension to the application
  • The application (and thus, the app dev team) assumes all responsibility for any work with the data
the split persistence data model
The Split Persistence Data Model
  • Applications Have Persistence Data Models
  • Persistence Data Model – any data stored on disk
  • Everything might be in the DB
  • Everything might be in XML docs
  • Usually, there is some split – i.e., 95% in a SQL DB, but 5% in a properties file
  • Managing the split between DB-based data and XML-based data is “accidental complexity” and can be a headache
joins between data models
Joins between data models
  • Can use XQuery to join the two data models
  • Could use XPath enhanced SQL to join the two models
  • Often, the two models are joined at the object level
databases have metadata
Databases Have Metadata
  • Column names, table names & constraints are all metadata
  • Defined at DB design time, which is usually app design time
  • App is often hard coded against the database meta data
pros vs cons
Pros vs. Cons
  • More flexible – XML can change at runtime
  • More complex at the application level
  • Database must be more abstract
  • Can make the application harder to extend
  • Don’t forget – a DB table can also serve the same purpose as an XML meta data doc