XML and Databases

Ronald Bourret
[email protected]
http://www.rpbourret.com


  • Is XML a Database?

  • Why Use XML with Databases?

  • Data vs. Documents

  • Storing and Retrieving Data

  • Storing and Retrieving Documents

Is XML a database?

  • This is really two questions

    • Is an XML document a database?

    • Are XML and its surrounding technologies a database management system (DBMS)?

Is an XML document a database?

  • Yes, it is a collection of data

  • Pros

    • Self-describing

    • Portable (Unicode)

    • Can store directed graphs

  • Cons

    • Slow access

    • Verbose

Are XML and surrounding technologies a DBMS?

  • Yes, they have:

    • Data storage (XML documents)

    • Schemas (DTDs, XML Schemas, RELAX, etc.)

    • Query languages (XPath, XQuery, XQL, etc.)

    • APIs (SAX, DOM)

Are XML and surrounding technologies a DBMS? (cont.)

  • No, they don’t have:

    • Separation of logical and physical data

    • Efficient storage

    • Indexes

    • Transactions

    • Multi-user access

    • Security

    • ...

Using XML as a database

  • Good for small, single-user databases

    • .ini files

    • Simple address book

    • List of browser bookmarks

    • Catalog of MP3s stolen with the help of Napster

  • Almost useless for large or multi-user databases
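A minimal sketch of the "simple address book" case, assuming a hypothetical `AddressBook` document (the element names are illustrative and not from the slides). It shows why this works for small files and scales poorly: with no index, every query walks the whole document.

```python
import xml.etree.ElementTree as ET

# Hypothetical address book; element names are illustrative only.
ADDRESS_BOOK = """\
<AddressBook>
  <Entry><Name>Ana</Name><City>Chicago</City></Entry>
  <Entry><Name>Raj</Name><City>Boston</City></Entry>
  <Entry><Name>Lee</Name><City>Chicago</City></Entry>
</AddressBook>"""


def names_in_city(xml_text, city):
    """Return the names of all entries whose City matches."""
    root = ET.fromstring(xml_text)
    # No index exists, so every Entry is visited on every query.
    return [entry.findtext("Name")
            for entry in root.iter("Entry")
            if entry.findtext("City") == city]


chicago = names_in_city(ADDRESS_BOOK, "Chicago")
print(chicago)
```

The whole file is parsed and scanned for each lookup, which is acceptable for a few hundred entries and a single user but not for large or shared data.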

Why use XML with databases?

  • Expose legacy data as XML

  • Transfer data between databases

  • Integrate data from a variety of sources

  • Store semi-structured data

  • Queue e-commerce messages

  • Manage and query large document collections

Data vs. documents

  • Are you storing documents or the data in them?

    <Address>
      <Street>123 Main St.</Street>
      <City>Chicago</City>
      <State>IL</State>
      <PostCode>60609</PostCode>
      <Country>USA</Country>
    </Address>

    Element content (highlighted yellow on the slide) = Data
    Markup (white) + element content (yellow) = Document

  • Helps determine the system you need

  • Look at your XML documents to decide
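The distinction can be sketched in a few lines, using the address example from this slide. The variable names are illustrative; the sketch only assumes Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

ADDRESS = ("<Address><Street>123 Main St.</Street><City>Chicago</City>"
           "<State>IL</State><PostCode>60609</PostCode>"
           "<Country>USA</Country></Address>")

root = ET.fromstring(ADDRESS)

# The data: only the values, keyed by element name.
data = {child.tag: child.text for child in root}

# The document: markup plus values, serialized back to text.
document = ET.tostring(root, encoding="unicode")

print(data)
print(document)
```

A system that only needs `data` can store it in ordinary columns; a system that must return `document` unchanged has to keep the markup as well.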

Data-centric documents

  • Use XML primarily as a data transport

  • Designed for machine consumption

  • Sales orders, scientific data, dynamic Web pages

  • Characteristics

    • Regular structure

    • Fine-grained data

    • Little or no mixed content

    • Sibling order not significant

Example: Sales order

<Order>
  <Number>1234</Number>
  <Customer>Gallagher Industries</Customer>
  <Date>29.10.00</Date>
  <Item Number="1">
    <Part>A-10</Part>
    <Quantity>12</Quantity>
    <Price>10.95</Price>
  </Item>
  <Item Number="2">
    <Part>B-43</Part>
    <Quantity>600</Quantity>
    <Price>3.99</Price>
  </Item>
</Order>

Example: Dynamic Web page

<title>Flight Schedule: SFO to FRA</title>
<p>Daily flights from SFO to FRA</p>
<tr><td>Air France</td><td>527</td><td>12:00</td><td>10:33</td></tr>

Document-centric documents

  • Designed for human consumption

  • Use XML to provide structure, metadata

  • Books, presentations, email, static Web pages

  • Characteristics

    • Irregular or semi-regular structure

    • Large-grained data

    • Lots of mixed content

    • Sibling order significant

Example: Product description

<Para><Name>XML-DBMS</Name> is <Summary>middleware for transferring data between XML documents and relational databases</Summary>. It is written by <Developer>Ronald Bourret</Developer>.</Para>

<Para>XML-DBMS uses an object-relational mapping in which complex element types are viewed as classes and simple element types, PCDATA, and attributes, as well as references to complex types, are viewed as properties.</Para>

<Para>You can:
<Item><Link URL="Readme.htm">Read more about XML-DBMS</Link></Item>
<Item><Link URL="jxmldbms.zip">Download Java version</Link></Item>
<Item><Link URL="pxmldbms.zip">Download PERL version</Link></Item>
Storing data and documents

  • Store data in traditional database

    • Use a native XML database under certain conditions

  • Store documents in native XML database

    • Use a traditional database under certain conditions

  • Boundary between data and documents not always clear in practice

Storing and Retrieving Data

Goals and non-goals

  • Goals

    • Preserve data and hierarchical order

    • Optionally preserve sibling order

    • One- or two-way data transfer

  • Non-goals

    • Preserve physical structure (entity use, encodings, ...)

    • Preserve DTD, comments, processing instructions...

    • Preserve document identity
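A small sketch of what these non-goals mean in practice, assuming Python's standard `xml.etree.ElementTree` parser as a stand-in for data transfer software (the `Note` document is hypothetical). The data survives a round trip; the comment and the CDATA section do not.

```python
import xml.etree.ElementTree as ET

# Hypothetical input containing a comment and a CDATA section.
SOURCE = "<Note><!-- internal remark --><Text><![CDATA[a < b]]></Text></Note>"

root = ET.fromstring(SOURCE)        # the default parser discards comments
text = root.findtext("Text")        # the CDATA section arrives as plain text
round_trip = ET.tostring(root, encoding="unicode")

print(text)
print(round_trip)
```

The value `a < b` is preserved exactly, but the serialized result uses an escaped `&lt;` instead of CDATA and no longer contains the comment, so the original document cannot be reproduced byte for byte.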

Data transfer software

  • May be middleware or integrated into DBMS

  • If integrated, DBMS is said to be XML-enabled

Mapping data in XML documents to databases

  • Most common mapping strategies

    • Template-driven

    • Model-driven

  • No mapping needed for native XML databases

Template-driven mappings

  • Commands embedded in template

  • Extremely flexible

    • Retrieve data with SQL or other query language

    • Place values almost anywhere in document

    • Parameterize subsequent SQL statements

    • Programming constructs such as if-then-else and for

  • Transfer from database to XML only

Example: Template

<?xml version="1.0"?>
<Intro>The following flights have available seats:</Intro>
<SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
<Conclude>We hope one of these meets your needs.</Conclude>

Example: Output

<?xml version="1.0"?>
<Intro>The following flights have available seats:</Intro>
<Depart>Dec 12, 1998 13:43</Depart>
<Arrive>Dec 13, 1998 01:21</Arrive>
<Conclude>We hope one of these meets your needs.</Conclude>
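The template mechanism above can be sketched roughly as follows. This is not the processor used in the slides: the `FlightInfo`, `Flights`, and `Row` element names, the `expand_template` function, and the sample airline are assumptions made for illustration, and SQLite stands in for the database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# FlightInfo is a hypothetical root element added so the template is well-formed.
TEMPLATE = """\
<FlightInfo>
<Intro>The following flights have available seats:</Intro>
<SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
<Conclude>We hope one of these meets your needs.</Conclude>
</FlightInfo>"""


def expand_template(template, conn):
    """Replace each top-level <SelectStmt> with the rows its query returns."""
    root = ET.fromstring(template)
    expanded = []
    for child in list(root):
        if child.tag != "SelectStmt":
            expanded.append(child)          # literal content is copied as-is
            continue
        cursor = conn.execute(child.text)   # run the embedded SQL command
        columns = [d[0] for d in cursor.description]
        results = ET.Element("Flights")     # assumed wrapper element name
        for record in cursor:
            row = ET.SubElement(results, "Row")
            for name, value in zip(columns, record):
                ET.SubElement(row, name).text = str(value)
        expanded.append(results)
    root[:] = expanded
    return root


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Flights "
             "(Airline TEXT, FltNumber INTEGER, Depart TEXT, Arrive TEXT)")
conn.execute("INSERT INTO Flights VALUES "
             "('ACME', 123, 'Dec 12, 1998 13:43', 'Dec 13, 1998 01:21')")

result = expand_template(TEMPLATE, conn)
print(ET.tostring(result, encoding="unicode"))
```

Because the query lives in the template, the transfer runs in one direction only: there is nothing here that says how to put an incoming document back into the database.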


Model-driven mappings

  • Two mappings are common

    • Table-based

    • Object-relational

  • Data transferred according to model

  • Two-way data transfer

  • Simpler than templates, but less flexible

  • Often used with XSLT

Table-based mapping

  • Map document with “table” structure to RDBMS

<column1>value 1</column1>
<column2>value 2</column2>

Pros and cons

  • Pros

    • Easy to understand

    • Code is simple and fast

    • Useful for serializing databases

  • Cons

    • Only works on a small subset of XML documents
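A rough sketch of a table-based transfer, assuming a document shaped as database / table / row / column elements. That shape, the `Flights` table, and the `load_tables` function are assumptions for illustration; SQLite stands in for the RDBMS.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed document shape: <database><TableName><row><Column>value</Column>...
DOC = """\
<database>
  <Flights>
    <row><Airline>ACME</Airline><FltNumber>123</FltNumber></row>
    <row><Airline>Zenith</Airline><FltNumber>456</FltNumber></row>
  </Flights>
</database>"""


def load_tables(xml_text, conn):
    """Insert every <row> into the table named by its parent element."""
    for table in ET.fromstring(xml_text):
        for row in table.iter("row"):
            columns = [col.tag for col in row]
            values = [col.text for col in row]
            # Table and column names come from the document, so this
            # sketch trusts its input; real code would validate them.
            sql = "INSERT INTO {} ({}) VALUES ({})".format(
                table.tag, ", ".join(columns), ", ".join("?" * len(columns)))
            conn.execute(sql, values)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Flights (Airline TEXT, FltNumber INTEGER)")
load_tables(DOC, conn)
rows = conn.execute(
    "SELECT Airline, FltNumber FROM Flights ORDER BY FltNumber").fetchall()
print(rows)
```

The code is short because the document already looks like a table. Any document with deeper nesting or mixed content falls outside what this mapping can handle.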

Object-relational mapping

  • Map XML document to objects...

[Diagram: the XML fragment below mapped to objects, including Customer and Item]

<Order SONumber="12345">
<Customer CustNumber="543">
<Item LineNumber="1">
<Part Name="Cherries">
<Qty Unit="ton">2</Qty>

Object-relational mapping (cont.)

  • ... and objects to tables

[Diagram: objects, including Customer and Item, mapped to tables]

Objects are data-specific...

  • Different for each DTD (schema)

  • Model the content (data) of the document


[Diagram: the XML fragment below mapped to objects, including Customer and Item]

<Order SONumber="12345">
<Customer CustNumber="543">
<Item LineNumber="1">
<Part Name="Cherries">
<Qty Unit="ton">2</Qty>

... not the DOM

  • Same for all XML documents

  • Model the structure of the document

Element (Order)
  Attr (SONumber)
  Element (Customer) ...
  Element (OrderDate) ...
  Element (Item) ...

<Order SONumber="12345">
<Customer CustNumber="543">
<Item LineNumber="1">
<Part Name="Cherries">
<Qty Unit="ton">2</Qty>

Pros and cons

  • Pros

    • Can handle any XML document

    • Maps well to existing data structures

  • Cons

    • Very inefficient for mixed content

Data transfer issues

  • Data types

    • All XML data is string

    • Conversion problems due to many formats

  • Null data

    • Equivalent to missing element or attribute
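A minimal sketch of both issues, using values from the sales order example. The `typed_value` and `parse_date` helpers and the `ShipDate` element are hypothetical, and the date format is assumed to be day.month.two-digit-year.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from decimal import Decimal

# Simplified order; ShipDate is deliberately absent.
ORDER = """\
<Order>
  <Number>1234</Number>
  <Date>29.10.00</Date>
  <Price>10.95</Price>
</Order>"""


def typed_value(parent, tag, convert):
    """Convert an element's text, or return None when the element is absent."""
    text = parent.findtext(tag)
    return None if text is None else convert(text)


def parse_date(text):
    # Assumes day.month.two-digit-year; other documents may use other formats.
    return datetime.strptime(text, "%d.%m.%y").date()


root = ET.fromstring(ORDER)
number = typed_value(root, "Number", int)
order_date = typed_value(root, "Date", parse_date)
price = typed_value(root, "Price", Decimal)
ship_date = typed_value(root, "ShipDate", parse_date)  # missing element -> NULL

print(number, order_date, price, ship_date)
```

Every value starts as a string, so the transfer software has to know, or guess, the intended type and format of each element before it can fill a typed column.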

Data transfer issues (cont.)

  • Binary data

    • No standard way to store in XML

    • Commonly stored as unparsed entities or Base64

  • Character sets

    • XML can use any encoding, including Unicode

    • Databases often require single encoding

    • Unicode is inefficient to store
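The Base64 option can be sketched as follows. The `Photo` element and its `encoding` attribute are hypothetical conventions, which illustrates the point that XML itself defines no standard one.

```python
import base64
import xml.etree.ElementTree as ET

payload = bytes([0, 255, 16, 60, 38])   # arbitrary bytes, including '<' and '&'

# Encode: Base64 output uses only ASCII letters, digits, '+', '/', and '='.
photo = ET.Element("Photo", {"encoding": "base64"})   # hypothetical element
photo.text = base64.b64encode(payload).decode("ascii")
xml_text = ET.tostring(photo, encoding="unicode")

# Decode on the receiving side.
restored = base64.b64decode(ET.fromstring(xml_text).text)

print(xml_text)
print(restored == payload)
```

Sender and receiver must agree out of band that the element holds Base64 text, and the encoded form is about a third larger than the raw bytes.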

Storing data in a native XML database

  • Data stored in XML (document) format

  • Pros

    • Handles semi-structured data efficiently

    • Fast retrieval of whole documents

    • Support for XML query languages, XLinks, etc.

Storing data in a native XML database (cont.)

  • Cons

    • Slow retrieval of views outside of document hierarchy

    • No referential integrity

    • Data not accessible by non-XML applications


Storing and Retrieving Documents

Goals

  • Preserve entire document

    • Data: elements, attributes, PCDATA

    • Logical structure: element hierarchy, sibling order

    • Physical structure: entities, CDATA, encoding...

    • Other: DTD, comments, processing instructions...

  • Preserve document identity

Storing documents as BLOBs

  • Pros

    • Exploits existing capabilities: transactions, security...

    • Many databases have text search tools

  • Cons

    • Text-based searches of XML unreliable
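One way to see why plain text search is unreliable, using three hypothetical brochure documents (simplified, without the Content element). A substring search and an XML-aware search for the author "Chen" disagree in both directions.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored documents.
DOCS = [
    "<Brochure><Title>Tea</Title><Author>Chen</Author></Brochure>",
    # Same author, written with a character reference for "e".
    "<Brochure><Title>Silk</Title><Author>Ch&#101;n</Author></Brochure>",
    # "Chen" appears only in the title, not as the author.
    "<Brochure><Title>Letters to Chen</Title><Author>Ortiz</Author></Brochure>",
]

# Text search: looks at raw characters, ignores markup and references.
text_hits = [i for i, doc in enumerate(DOCS) if "Chen" in doc]

# XML-aware search: parses first, then compares the Author value.
xml_hits = [i for i, doc in enumerate(DOCS)
            if ET.fromstring(doc).findtext("Author") == "Chen"]

print(text_hits, xml_hits)
```

The text search misses the second document and wrongly returns the third, because it cannot tell which element a match belongs to or resolve character references.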

Indexing XML BLOBs with “side tables”

  • Consider the following DTD

    <!ELEMENT Brochure (Title, Author, Content)>
    <!ELEMENT Title (#PCDATA)>
    <!ELEMENT Author (#PCDATA)>   <!-- To be indexed -->
    <!ELEMENT Content (%Inline;)> <!-- Inline entity from XHTML -->

  • Store complete documents in one table

    Brochures
    ---------
    BrochureID  INTEGER      <--------- Index brochure IDs
    Brochure    LONGVARCHAR  <--------- Complete XML documents

Indexing XML BLOBs with “side tables” (cont.)

  • Store elements to be indexed in separate table

    Authors
    ----------------------
    Author      VARCHAR(50)  <--------- Index authors
    BrochureID  INTEGER

  • Search index table and join to document table

    SELECT Brochure FROM Brochures WHERE BrochureID IN (SELECT BrochureID FROM Authors WHERE Author='Chen')
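A runnable sketch of the side-table approach, with SQLite standing in for the database (so `TEXT` replaces `LONGVARCHAR`). The `store_brochure` function, the index name, and the sample brochures are assumptions for illustration; the final query is the one shown above.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Brochures (BrochureID INTEGER PRIMARY KEY, Brochure TEXT)")
conn.execute("CREATE TABLE Authors (Author VARCHAR(50), BrochureID INTEGER)")
conn.execute("CREATE INDEX AuthorIndex ON Authors (Author)")


def store_brochure(conn, brochure_id, xml_text):
    """Store the whole document, plus its Author value in the side table."""
    conn.execute("INSERT INTO Brochures VALUES (?, ?)", (brochure_id, xml_text))
    author = ET.fromstring(xml_text).findtext("Author")
    conn.execute("INSERT INTO Authors VALUES (?, ?)", (author, brochure_id))


# Hypothetical sample documents following the Brochure DTD.
store_brochure(conn, 1, "<Brochure><Title>Tea</Title><Author>Chen</Author>"
                        "<Content>Green and black teas</Content></Brochure>")
store_brochure(conn, 2, "<Brochure><Title>Silk</Title><Author>Ortiz</Author>"
                        "<Content>Silk scarves</Content></Brochure>")

found = conn.execute(
    "SELECT Brochure FROM Brochures WHERE BrochureID IN "
    "(SELECT BrochureID FROM Authors WHERE Author = ?)", ("Chen",)).fetchall()
print(found)
```

The document is parsed once, at insert time; later searches use an ordinary indexed column and never touch the XML text until the matching documents are returned.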

Storing documents in native XML databases

  • Store whole XML documents in “native” form

  • Define a (logical) model for an XML document

    • Minimal model is elements, attributes, PCDATA, and document order

    • Store and retrieve documents according to that model

  • Have normal database features

    • Query language, indexes, transactions, security, etc.

Implementation strategies for native XML databases

  • Text-based

    • Store documents as text

    • Proprietary or file-system storage

  • Model-based

    • Store pre-parsed documents according to model

    • Relational, object-oriented, hierarchical, or proprietary storage

Persistent DOMs (PDOMs)

  • Implement DOM over persistent storage

  • Returned DOM tree is “live”

  • Used by DOM applications that process very large XML documents

  • Database is usually local

Content management systems

  • Manage document fragments (content)

  • Hide database from user

  • Maintain versions, document metadata

  • Include editors, publishing systems, etc.

  • Extensible through scripting or programming


  • Ronald Bourret’s Papers Page

    • http://www.rpbourret.com/xml/index.htm

  • XML:DB.org’s Resources Page

    • http://www.xmldb.org/resources.html

  • XML:DB Mailing List

    • http://www.xmldb.org/projects.html

