II. XML Data Management

A: XML refresher
   (using material from A. Silberschatz and M. Sapossnek)

B: XML data management (1)
   Query languages: XPath, XQuery, SQL/XML (SQLX)

C: Mapping XML data to databases
   Native XML data management
What is XML?
  • Acronym for eXtensible Markup Language
  • Syntax for structuring data and documents in human-readable form
  • The "syntax of the Web"
  • Metalanguage for defining languages
  • Basis of many extensions
    • Namespaces
    • Stylesheets
    • Hyperlinks
    • Schemata
  • Standardized by the W3C: http://www.w3.org/TR/REC-xml

HS / DBSII-03-XML-1

What XML is Not
  • Not a protocol
    • Language for describing data
    • Used as data format in protocols
    • Protocols may be syntactically defined by XML
  • Not a programming language, but
    • XML documents may contain code fragments
    • Some new languages allow XML as part of the program code (e.g. Xen, a Microsoft extension of C#)
    • Some XML extensions have superimposed programming-language semantics, e.g. rule semantics in XSLT
  • No magic semantics
    • Interpretation is up to humans, applications, and standards derived from XML

Why XML?
  • … no longer a question, since XML is widely adopted
  • Simple
  • Extensible
  • Easy to process
  • Easy to generate
  • Data interchange is critical for networked applications

"XML will be the ASCII of the Web: basic, essential, unexciting"

Tim Bray

... and it already is

XML example
  • Pre-XML representation of data:

"PO-1234","CUST001","X9876","2","14.53"

  • XML representation of the same data:

<?xml version="1.0"?>
<PURCHASE_ORDER>
  <PO_NUM>PO-1234</PO_NUM>
  <CUST_ID>CUST001</CUST_ID>
  <ITEM ItemNum="X9876">
    <QUNTY>2</QUNTY>
    <PRICE>14.53</PRICE>
  </ITEM>
</PURCHASE_ORDER>

  • Prologue: the first line (<?xml version="1.0"?>); elements: PURCHASE_ORDER, PO_NUM, CUST_ID, ITEM, QUNTY, PRICE; attribute: ItemNum
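To make the comparison concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree` (the helper name `po_to_record` is hypothetical) that reads the purchase order and flattens it back into the positional pre-XML record. The flattening step is exactly where the element and attribute names, i.e. the metadata, are lost.

```python
import xml.etree.ElementTree as ET

# The purchase order from the slide (values as in the tree view).
PO_XML = """<?xml version="1.0"?>
<PURCHASE_ORDER>
  <PO_NUM>PO-1234</PO_NUM>
  <CUST_ID>CUST001</CUST_ID>
  <ITEM ItemNum="X9876">
    <QUNTY>2</QUNTY>
    <PRICE>14.53</PRICE>
  </ITEM>
</PURCHASE_ORDER>"""


def po_to_record(xml_text):
    """Flatten the purchase order into the positional pre-XML field list."""
    root = ET.fromstring(xml_text)
    item = root.find("ITEM")
    return [
        root.findtext("PO_NUM").strip(),
        root.findtext("CUST_ID").strip(),
        item.get("ItemNum"),              # attribute value
        item.findtext("QUNTY").strip(),   # subelement text
        item.findtext("PRICE").strip(),
    ]


record = po_to_record(PO_XML)
print(",".join(f'"{value}"' for value in record))
```

The printed line is the comma-separated record; a receiver of that record has to know by convention that the fourth field is the quantity, whereas the XML version says so explicitly.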

XML example
  • Graphical representation (tree)
    • PURCHASE_ORDER
      • PO_NUM: PO-1234
      • CUST_ID: CUST001
      • ITEM {ItemNum=X9876}
        • QUNTY: 2
        • PRICE: 14.53

XML documents
  - are tree structured
  - contain data and metadata in the same document (as opposed to relational databases)
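The tree view can be reproduced by a pre-order traversal of the parsed document. The sketch below (Python, `xml.etree.ElementTree`; `tree_lines` is a hypothetical helper) emits each element indented by its depth, with attributes in braces and text content after a colon, mirroring the graphical representation.

```python
import xml.etree.ElementTree as ET

PO_XML = """<PURCHASE_ORDER>
  <PO_NUM>PO-1234</PO_NUM>
  <CUST_ID>CUST001</CUST_ID>
  <ITEM ItemNum="X9876"><QUNTY>2</QUNTY><PRICE>14.53</PRICE></ITEM>
</PURCHASE_ORDER>"""


def tree_lines(elem, depth=0):
    """Return an indented outline of the element tree (pre-order traversal)."""
    label = elem.tag
    if elem.attrib:
        attrs = ", ".join(f"{k}={v}" for k, v in elem.attrib.items())
        label += " {" + attrs + "}"
    text = (elem.text or "").strip()  # whitespace-only text is ignored
    if text:
        label += ": " + text
    lines = ["  " * depth + label]
    for child in elem:  # children in document order
        lines.extend(tree_lines(child, depth + 1))
    return lines


outline = tree_lines(ET.fromstring(PO_XML))
print("\n".join(outline))
```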

XML Usage
  • Two basic types of XML usage
  • Document centric (document oriented)
    • Structuring a digital document, including its logical layout
    • Primary focus of SGML, the predecessor of XML
  • Data centric
    • Description of data in a self-describing form for later processing
  • The distinction is not always clear-cut
    • See the purchase order example: if typical document characteristics were included (company address, customer address, date, …, company logo), it would be a document-oriented usage of XML

Document centric XML documents: example

<Product>
  <Name>Adjustable Wrench</Name>
  <Developer>Full Fabrication Labs, Inc.</Developer>
  <Summary>Large, adjustable wrench</Summary>
  <Description>
    <Para>The adjustable wrench is made of first-class steel and has a rubberized grip. The jaw opening ranges from 0 to 32 mm.</Para>
    <Para>You can..... </Para>
    <List>
      <Item><Link URL="Order.html">Order</Link></Item>
      <Item><Link URL="Wrenches.htm">View other tools</Link></Item>
      <Item><Link URL="catalog.zip">Download the catalog</Link></Item>
    </List>
    <Para>The wrench costs 15.33 euros including VAT. If you order now, you will also receive our worthless hobbyist's handbook.</Para>
  </Description>
</Product>

Typical: long text elements

Data centric XML documents: example

<Orders>
  <SalesOrder SONumber="12345">
    <Customer CustNumber="543">
      <CustName>ABC Industries</CustName>
      <Street>123 Main St.</Street>
      <City>Chicago</City>
      ....
    </Customer>
    <Line LineNumber="1">
      <Part PartNumber="123">
        <Description>
          <p><b>Turkey wrench:</b><br />
          Stainless steel, one-piece construction,
          lifetime guarantee.</p>
        </Description>
        <Price>9.95</Price>
      </Part>
      <Quantity>10</Quantity>
    </Line>
    .......
  </SalesOrder>
</Orders>

XML Syntax
  • One, and only one, root element
  • Subelements must be properly nested
    • An element must end within the element in which it was started
  • Attributes are optional
  • Attribute values must be enclosed in double ("…") or single ('…') quotes
    • No data type other than string
  • Processing instructions are optional
  • XML is case-sensitive
    • <tag> and <TAG> are not the same type of element
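A non-validating parser is enough to check the syntax rules above. This minimal Python sketch (hypothetical helper `is_well_formed`, built on `xml.etree.ElementTree`, which raises `ParseError` on malformed input) tests one well-formed sample and one violation of each rule: two root elements, improper nesting, a case mismatch between start and end tag, and an unquoted attribute value.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


samples = {
    "ok": "<a><b x='1'>text</b></a>",
    "two roots": "<a/><b/>",             # only one root element allowed
    "bad nesting": "<a><b></a></b>",     # b must end inside a
    "case mismatch": "<tag></TAG>",      # XML is case-sensitive
    "unquoted attribute": "<a x=1/>",    # attribute values need quotes
}

for name, text in samples.items():
    ok, msg = is_well_formed(text)
    print(f"{name:20} {'well-formed' if ok else 'NOT well-formed: ' + msg}")
```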

Why hierarchical "data model"?
  • Hierarchies (nesting) in databases? Why not?
    • REDUNDANCY!

The same items, customers, … occur multiple times in different orders

Normalization replaces redundancies by foreign keys

What about OO / OR databases?

  • Nesting is useful in data transfer
    • The external application has no access to the database, so it cannot resolve foreign keys.

XML Attributes vs Elements
  • Distinction between subelement and attribute
    • In the context of documents:
      • attributes are part of the markup
      • subelement contents are part of the basic document contents
    • In the context of data representation, the difference is unclear and can be confusing
      • The same information can be represented in two ways
        • <account account-number="A-101"> … </account>
        • <account> <account-number>A-101</account-number> … </account>
    • Suggestion: use attributes for identifiers of elements, and subelements for contents
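The practical cost of the two representations shows up on the reading side: a consumer must handle both forms unless a convention is fixed. A minimal Python sketch (hypothetical helper `account_number`, standard library only) that accepts either form:

```python
import xml.etree.ElementTree as ET

# The two representations of the same fact from the slide.
AS_ATTRIBUTE = '<account account-number="A-101"/>'
AS_SUBELEMENT = "<account><account-number>A-101</account-number></account>"


def account_number(xml_text):
    """Return the account number, whichever representation is used."""
    account = ET.fromstring(xml_text)
    value = account.get("account-number")            # attribute form
    if value is None:
        value = account.findtext("account-number")   # subelement form
    return value.strip() if value is not None else None


print(account_number(AS_ATTRIBUTE), account_number(AS_SUBELEMENT))
```

Agreeing on one form, as the suggestion on the slide does, removes the need for this kind of fallback logic.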

How to use XML data?
  • Basic idea
    • Sending side: DBMS → application with XML generator → XML document
    • Receiving side: XML parser with standard interfaces (DOM, SAX) → receiving application → DBMS
  • How does the receiving application know about
    • syntactical correctness?
    • data semantics?
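The two standard parser interfaces differ in how the application sees the document. The Python sketch below uses the standard library's `xml.dom.minidom` (DOM: whole tree in memory) and `xml.sax` (SAX: a stream of callbacks); the class name `EventRecorder` is hypothetical.

```python
import xml.dom.minidom
import xml.sax

PO_XML = (b'<PURCHASE_ORDER><PO_NUM>PO-1234</PO_NUM>'
          b'<ITEM ItemNum="X9876"><QUNTY>2</QUNTY></ITEM></PURCHASE_ORDER>')

# DOM: the parser builds the complete document tree in memory;
# the application then navigates it through the DOM API.
dom = xml.dom.minidom.parseString(PO_XML)
po_num = dom.getElementsByTagName("PO_NUM")[0].firstChild.data


# SAX: the parser pushes events to callbacks; no tree is kept.
class EventRecorder(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))


recorder = EventRecorder()
xml.sax.parseString(PO_XML, recorder)

print(po_num)
for event in recorder.events:
    print(event)
```

DOM is convenient for random access; SAX keeps memory use independent of document size, at the price of the application tracking its own state.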

Different encodings
  • Specified by the encoding attribute of the XML declaration, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
  • Correct or not correct?
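Whether a document is correct depends on whether the declared encoding matches the actual bytes. A minimal Python sketch (hypothetical helper `element_text`; the sample name is borrowed from the RDF example later in the slides) serializes the same text in ISO-8859-1 and UTF-8 with matching declarations, plus one mislabelled variant: without a declaration the parser assumes UTF-8, and the Latin-1 byte for "ü" is not valid UTF-8.

```python
import xml.etree.ElementTree as ET

TEXT = "<name>Fritz Müller</name>"

# Same characters, two encodings, each declared correctly.
latin1_doc = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + TEXT).encode("iso-8859-1")
utf8_doc = ('<?xml version="1.0" encoding="UTF-8"?>' + TEXT).encode("utf-8")

# No encoding declared (so UTF-8 is assumed), but the bytes are Latin-1.
mislabelled_doc = ('<?xml version="1.0"?>' + TEXT).encode("iso-8859-1")


def element_text(doc_bytes):
    """Parse raw bytes; return the root's text, or None if not well-formed."""
    try:
        return ET.fromstring(doc_bytes).text
    except ET.ParseError:
        return None


for doc in (latin1_doc, utf8_doc, mislabelled_doc):
    print(element_text(doc))
```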

Correctness of XML documents
  • Syntactic correctness
    • Conformance to XML syntax
    • A document structured according to the XML syntax is well-formed
    • Compare: syntax checker for a program
  • Semantic correctness
    • Given a meta-level description of XML documents: Document Type Definition (DTD) or XML Schema
    • A document is valid with respect to a DTD (schema) if it fulfills all its definitions and restrictions
    • A DTD is not required; without one, applications must know what is meant
  • What is semantics?
    • Interpretation of tags is a matter of humans and/or the application program: <xyz> could mean "book title" or "first name" or…

XML Namespaces
  • Part of XML's extensibility
  • Allow autonomous users to differentiate between tags of the same name (using a prefix)
    • Frees the author to focus on the data and decide how best to describe it
    • Allows multiple XML documents from multiple authors to be merged
  • Namespace declaration: xmlns:bk="http://www.example.com/bookinfo/"
    • bk is the prefix
    • http://www.example.com/bookinfo/ is the namespace URI (often a URL)

Namespace
  • Examples
  • With a prefix: only elements qualified with bk: belong to the namespace (BOOK itself does not)

<BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE currency="US Dollar">19.99</bk:PRICE>
</BOOK>

  • No prefix (default namespace): all elements belong to the same namespace

<BOOK xmlns="http://www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
  ...
</BOOK>
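A namespace-aware parser replaces each prefix by the namespace URI, so what counts is the expanded name, not the prefix. In the Python sketch below, `xml.etree.ElementTree` reports expanded names as `{URI}local`; the helper `expanded_names` and the query prefix `b` are hypothetical and chosen only to show that prefixes are arbitrary.

```python
import xml.etree.ElementTree as ET

NS = "http://www.bookstuff.org/bookinfo"

PREFIXED = f"""<BOOK xmlns:bk="{NS}">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
</BOOK>"""

DEFAULT_NS = f"""<BOOK xmlns="{NS}">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
</BOOK>"""


def expanded_names(xml_text):
    """Expanded names ({namespace-URI}local-name) of all elements, in document order."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]


print(expanded_names(PREFIXED))
print(expanded_names(DEFAULT_NS))

# A query binds its own prefix; only the URI has to match.
query_ns = {"b": NS}
for doc in (PREFIXED, DEFAULT_NS):
    print(ET.fromstring(doc).findtext("b:TITLE", namespaces=query_ns))
```

TITLE and AUTHOR get identical expanded names in both documents; only BOOK differs, because in the prefixed version it stays outside the namespace.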

DTD and XML schema
  • The type of an XML document can be defined by
    • a DTD (not expressed in XML syntax)
    • an XML schema
  • Document Type Definition (DTD)
    • Does not constrain data types: all values are strings
    • Syntax

<!ELEMENT elem (subelement-spec)>
<!ATTLIST elem attribute-specs>

DTD: elements and attributes
  • Example (element declarations)

<!ELEMENT depositor (customer-name, account-number)>
<!ELEMENT customer-name (#PCDATA)>
<!ELEMENT account-number (#PCDATA)>

  • Subelement specification may consist of
    • names of elements
    • #PCDATA (parsed character data), i.e., character strings
    • EMPTY (no subelements) or ANY (anything can be a subelement)
  • Subelement specification may use regular expressions

<!ELEMENT bank ( ( account | customer | depositor)+)>

      • Notation:
        • "|" : alternatives
        • "," : sequence
        • "+" : 1 or more occurrences
        • "?" : 0 or 1 occurrence
        • "*" : 0 or more occurrences
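Because content models are regular expressions over the sequence of child element names, checking them can be sketched with Python's `re` module. This is a hand-translated toy, not a DTD processor: the `CONTENT_MODELS` table and helper names are hypothetical, and #PCDATA content is not checked. Python's standard parsers do not validate against DTDs; real validation needs a validating parser (for example `lxml.etree.DTD` in the third-party lxml package).

```python
import re
import xml.etree.ElementTree as ET

# Each child name becomes a token followed by one space, so a content model
# becomes a regex over the child sequence (hand-translated, sketch only):
#   <!ELEMENT bank ((account | customer | depositor)+)>
#   <!ELEMENT depositor (customer-name, account-number)>
CONTENT_MODELS = {
    "bank": r"(account |customer |depositor )+",
    "depositor": r"customer-name account-number ",
}


def children_match(elem):
    """Check the sequence of child element names against the element's model."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        return True  # no model in this sketch: accept
    sequence = "".join(child.tag + " " for child in elem)
    return re.fullmatch(model, sequence) is not None


def conforms(xml_text):
    return all(children_match(elem) for elem in ET.fromstring(xml_text).iter())


VALID = ("<bank><account/><depositor><customer-name/>"
         "<account-number/></depositor></bank>")
WRONG_ORDER = ("<bank><depositor><account-number/>"
               "<customer-name/></depositor></bank>")
EMPTY_BANK = "<bank/>"  # violates "+": at least one child required

for doc in (VALID, WRONG_ORDER, EMPTY_BANK):
    print(conforms(doc))
```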

DTD example

<!DOCTYPE bank [
  <!ELEMENT bank ( ( account | customer | depositor)+)>
  <!ELEMENT account (account-number, branch-name, balance)>
  <!ELEMENT customer (customer-name, customer-street, customer-city)>
  <!ELEMENT depositor (customer-name, account-number)>
  <!ELEMENT account-number (#PCDATA)>
  <!ELEMENT branch-name (#PCDATA)>
  <!ELEMENT balance (#PCDATA)>
  <!ELEMENT customer-name (#PCDATA)>
  <!ELEMENT customer-street (#PCDATA)>
  <!ELEMENT customer-city (#PCDATA)>
]>

DTD attributes
  • Attribute specification: for each attribute
    • Name
    • Type of attribute
      • CDATA
      • ID (identifier), IDREF (ID reference) or IDREFS
        • more on this later
    • Whether the attribute
      • is mandatory (#REQUIRED),
      • has a default value (value),
      • or neither (#IMPLIED)
  • Examples
    • <!ATTLIST account acct-type CDATA "checking">
    • <!ATTLIST customer
          customer-id ID #REQUIRED
          accounts IDREFS #REQUIRED>

DTD attribute ID
  • At most one attribute of type ID per element
  • The ID attribute values of all elements in an XML document must be distinct
    • The ID attribute value acts as an object identifier
  • An attribute of type IDREF must contain the ID value of an element in the same document
  • An attribute of type IDREFS contains a set of ID values separated by spaces; each must be the ID value of an element in the same document
  • ID, IDREF and IDREFS do not designate a particular domain: references are untyped
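The ID/IDREF rules amount to two passes over the document: collect all IDs (rejecting duplicates), then check that every reference resolves. In the Python sketch below, which attributes are IDs or IDREFS is hard-coded in the hypothetical tables `ID_ATTR` and `IDREFS_ATTR` (a validating parser would take this from the DTD), and the sample values are hypothetical. The last sample shows the "untyped" weakness: an account that "owns" an account passes the check.

```python
import xml.etree.ElementTree as ET

# Hypothetical declarations a validating parser would read from the DTD:
#   <!ATTLIST account  account-number ID #REQUIRED  owners   IDREFS #REQUIRED>
#   <!ATTLIST customer customer-id    ID #REQUIRED  accounts IDREFS #REQUIRED>
ID_ATTR = {"account": "account-number", "customer": "customer-id"}
IDREFS_ATTR = {"account": "owners", "customer": "accounts"}


def id_errors(xml_text):
    """Return ID/IDREFS violations: duplicate IDs and dangling references."""
    root = ET.fromstring(xml_text)
    errors, ids = [], set()
    for elem in root.iter():                 # pass 1: collect IDs
        name = ID_ATTR.get(elem.tag)
        if name and name in elem.attrib:
            value = elem.attrib[name]
            if value in ids:
                errors.append(f"duplicate ID {value}")
            ids.add(value)
    for elem in root.iter():                 # pass 2: resolve references
        name = IDREFS_ATTR.get(elem.tag)
        if name and name in elem.attrib:
            for ref in elem.attrib[name].split():
                if ref not in ids:
                    errors.append(f"dangling reference {ref}")
    return errors


DOC = """<bank>
  <account account-number="A-101" owners="C-1"/>
  <customer customer-id="C-1" accounts="A-101 A-999"/>
</bank>"""
DUPLICATE = '<bank><account account-number="A-101"/><customer customer-id="A-101"/></bank>'
UNTYPED = '<bank><account account-number="A-101" owners="A-101"/></bank>'

for doc in (DOC, DUPLICATE, UNTYPED):
    print(id_errors(doc))
```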

DTD declaration
  • External DTD declaration:
    <?xml version="1.0"?>
    <!DOCTYPE bank SYSTEM "http://www.x-ag.de/banks.dtd">
    <bank> ... </bank>
  • Internal DTD declaration:
    <!DOCTYPE custDesc [ <!ELEMENT custDesc (#PCDATA)> ]>
    <custDesc> consumer rights protagonist </custDesc>
  • Mixed usage:
    <!DOCTYPE bank SYSTEM "http://www.x-ag.de/banks.dtd" [
      <!ATTLIST bank Descr CDATA #REQUIRED>
    ]>
    <bank Descr="mostly private customers and ATM"> ... </bank>

DTD limits
  • No typing of text elements and attributes
    • All values are strings, no integers, reals, etc.
  • Difficult to specify unordered sets of subelements
    • Order is usually irrelevant in databases
    • (A | B)* allows specification of an unordered set, but
      • Cannot ensure that each of A and B occurs only once
      • How to express "b, c and d in arbitrary order" as the content of a? <!ELEMENT a ((b,c,d) | (c,b,d) | (b,d,c) | ...)>
  • IDs and IDREFs are untyped
    • The owners attribute of an account may contain a reference to another account, which is meaningless
      • owners attribute should ideally be constrained to refer to customer elements
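How fast the "arbitrary order" declaration grows can be computed. The Python sketch below (hypothetical helper `any_order`) generates a content model that accepts each child exactly once in any order. Unlike the flat list of permutations above, it factors out the first child of each branch, because XML 1.0 treats content models in which two alternatives start with the same element name (non-deterministic models) as an error.

```python
def any_order(children):
    """Deterministic DTD content particle: every child exactly once, any order.

    The first child is factored out of each branch, so no two alternatives
    start with the same element name.
    """
    if len(children) == 1:
        return children[0]
    branches = []
    for i, first in enumerate(children):
        rest = children[:i] + children[i + 1:]
        branches.append(f"({first}, {any_order(rest)})")
    return "(" + " | ".join(branches) + ")"


print(f"<!ELEMENT a {any_order(['b', 'c', 'd'])}>")

# Size of the declaration: number of element-name occurrences for n children.
for n in range(2, 6):
    names = [f"e{i}" for i in range(n)]
    print(n, sum(any_order(names).count(name) for name in names))
```

The count grows faster than n! (4, 15, 64, 325 occurrences for 2 to 5 children). XML Schema sidesteps the problem with `xsd:all`, which allows its elements in any order, each at most once.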

XML Schema
  • XML Schema (XSD): a much more expressive schema language than DTDs
    • Typing of values
      • e.g. integer, string, etc.
      • Constraints on min/max values
    • User-defined types
    • Specified in XML syntax, unlike DTDs
      • More standard representation, but verbose
    • Namespace support
    • Many more features
      • List types, uniqueness and foreign key constraints, inheritance, ability to map to RDBs, …
  • Significantly more complicated than DTD syntax
  • Use of XSD recommended

XSD example (from Silberschatz)

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="bank" type="BankType"/>
  <xsd:element name="account">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="account-number" type="xsd:string"/>
        <xsd:element name="branch-name" type="xsd:string"/>
        <xsd:element name="balance" type="xsd:decimal"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  ….. definitions of customer and depositor ….
  <xsd:complexType name="BankType">
    <xsd:sequence>
      <xsd:element ref="account" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element ref="customer" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element ref="depositor" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
Using XML
  • Data exchange
  • Data management:
    • Store, retrieve, and query large document sets efficiently
      • Today's solutions:
        • Mapping to RDB / ORDB / OODB
        • "Native" XML data management (not necessarily very different from storing in a conventional DB)
  • Standardized data description: different extensions and applications
    • Bioinformatic Sequence Markup Language (BSML)
    • MathML
    • Scalable Vector Graphics (SVG)
    • Resource Description Framework (RDF) for describing resources on the web
    • … and many, many more
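For the "mapping to RDB" option, a minimal sketch: Python's `sqlite3` and `xml.etree.ElementTree` shred a small bank document (shaped like the bank DTD; the data values, the `COLUMNS` table and the `shred` helper are hypothetical) into one table per element type. After that the nesting is gone and relationships are recovered with joins. Real mappings have to deal with much more (mixed content, recursion, document order).

```python
import sqlite3
import xml.etree.ElementTree as ET

# Small bank document shaped like the bank DTD (sample values are hypothetical).
BANK_XML = """<bank>
  <account><account-number>A-101</account-number>
    <branch-name>Downtown</branch-name><balance>500</balance></account>
  <customer><customer-name>Johnson</customer-name>
    <customer-street>Alma</customer-street><customer-city>Palo Alto</customer-city></customer>
  <depositor><customer-name>Johnson</customer-name>
    <account-number>A-101</account-number></depositor>
</bank>"""

# One table per element type; columns are its subelements (hypothetical mapping).
COLUMNS = {
    "account": ["account-number", "branch-name", "balance"],
    "customer": ["customer-name", "customer-street", "customer-city"],
    "depositor": ["customer-name", "account-number"],
}


def shred(xml_text):
    """Load the document into an in-memory SQLite database, one row per element."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, "
               "branch_name TEXT, balance REAL)")
    db.execute("CREATE TABLE customer (customer_name TEXT PRIMARY KEY, "
               "customer_street TEXT, customer_city TEXT)")
    db.execute("CREATE TABLE depositor (customer_name TEXT, account_number TEXT)")
    root = ET.fromstring(xml_text)
    for table, cols in COLUMNS.items():  # table names are fixed constants
        placeholders = ", ".join("?" * len(cols))
        for elem in root.findall(table):
            db.execute(f"INSERT INTO {table} VALUES ({placeholders})",
                       [elem.findtext(col) for col in cols])
    return db


db = shred(BANK_XML)
rows = db.execute(
    "SELECT c.customer_city, a.balance FROM depositor d "
    "JOIN customer c ON c.customer_name = d.customer_name "
    "JOIN account a ON a.account_number = d.account_number"
).fetchall()
print(rows)
```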

Using XML: RDF with XML syntax
  • RDF model: statements are (subject, predicate, object) triples, e.g.
    • www.me.de/~fritz (homepage) has Creator "Fritz Müller"
    • [email protected] is emailOf "Fritz Müller"
    • Many of these triples form a graph
  • Encoded in XML:

<?xml version="1.0"?>
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:s="http://description.org/schema/">
  <Description about="http://www.me.de/~fritz">
    <s:Creator>Fritz Müller</s:Creator>
  </Description>
  <Description about="[email protected]">
    <s:emailOf>Fritz Müller</s:emailOf>
  </Description>
</RDF>
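Reading the triples back out of this XML encoding is a short tree walk. The minimal Python sketch below (hypothetical helper `triples`; standard library only) handles just the simple pattern on the slide, not full RDF/XML: every child of a `Description` element is a property, the `about` attribute is the subject, the namespace-qualified element name is the predicate, and the element text is the object. The unprefixed `about` attribute is in no namespace, because a default namespace declaration applies to element names only.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0"?>
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:s="http://description.org/schema/">
  <Description about="http://www.me.de/~fritz">
    <s:Creator>Fritz Müller</s:Creator>
  </Description>
</RDF>"""


def triples(xml_text):
    """Return (subject, predicate, object) triples for simple Description elements."""
    root = ET.fromstring(xml_text)
    result = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get("about")  # unprefixed attribute: in no namespace
        for prop in desc:            # each child element is one property
            result.append((subject, prop.tag, (prop.text or "").strip()))
    return result


for triple in triples(RDF_XML):
    print(triple)
```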

Using XML
  • Layout of documents?
    • XML documents have a logical structure
    • A layout structure is needed for output
      • Use a transformation language to describe device-specific transformations
  • Processing chain: XML document (data) and XML document (layout transformation) → standard software (XSL processor) → XML document (device-specific layout) → standard software (e.g. HTML browser)
  • Transformation into all kinds of languages (HTML, PDF, …) on all kinds of devices

XML transformation
  • XSLT: The language used for converting XML documents into other forms
  • Describes how the document is transformed
  • Expressed as an XML document (.xsl)
  • Template rules
    • Patterns match nodes in source document
    • Templates instantiated to form part of result document
  • XPath for querying, sorting, etc.
  • XSL-FO language for describing layout

XSL = XSLT + XPath + XSL-FO

XML transformation: example (1)
  • Document

<sales>
  <summary>
    <heading>Scootney Publishing</heading>
    <subhead>Regional Sales Report</subhead>
    <description>Sales Report</description>
  </summary>
  <data>
    <region>
      <name>West Coast</name>
      <quarter number="1" books_sold="24000" />
      <quarter number="2" books_sold="38600" />
      <quarter number="3" books_sold="44030" />
      <quarter number="4" books_sold="21000" />
    </region>
    ...
  </data>
</sales>

XML transformation: example (2)
  • XSL style sheet: mapping to HTML

<xsl:param name="low_sales" select="21000"/>
<BODY>
  <h1><xsl:value-of select="//summary/heading"/></h1>
  ...
  <table><tr><th>Region\Quarter</th>
    <xsl:for-each select="//data/region[1]/quarter">
      <th>Q<xsl:value-of select="@number"/></th>
    </xsl:for-each>
    ...
    <xsl:for-each select="//data/region">
      <tr><th><xsl:value-of select="name"/></th>
        <xsl:for-each select="quarter">
          <td><xsl:attribute name="style">
                <xsl:choose>
                  <xsl:when test="number(@books_sold) &lt;= $low_sales">color:red;</xsl:when>
                  <xsl:otherwise>color:green;</xsl:otherwise>
                </xsl:choose>
              </xsl:attribute>
              <xsl:value-of select="format-number(@books_sold, '###,###')"/></td>
        ...
        <td><xsl:value-of select="format-number(sum(quarter/@books_sold), '###,###')"/>

  • The select and test attributes contain XPath expressions; XPath is the query language on document trees
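To see what the style sheet computes, here is the same logic as a Python sketch over the example document (first region only). Python's standard library has no XSLT processor, and `xml.etree.ElementTree` supports only a subset of XPath (paths and predicates such as `region[1]`, but no attribute steps like `@number` and no `sum()`), so attributes are read with `.get()` and the total is computed in Python. The helper `sales_report` is hypothetical, and `f"{n:,}"` only approximates `format-number(..., '###,###')`.

```python
import xml.etree.ElementTree as ET

# Excerpt of the sales document (only the first region).
SALES_XML = """<sales>
  <summary><heading>Scootney Publishing</heading></summary>
  <data>
    <region>
      <name>West Coast</name>
      <quarter number="1" books_sold="24000" />
      <quarter number="2" books_sold="38600" />
      <quarter number="3" books_sold="44030" />
      <quarter number="4" books_sold="21000" />
    </region>
  </data>
</sales>"""

LOW_SALES = 21000  # plays the role of the $low_sales parameter


def sales_report(xml_text, low_sales=LOW_SALES):
    root = ET.fromstring(xml_text)
    heading = root.findtext(".//summary/heading")          # //summary/heading
    header = ["Q" + q.get("number")                         # //data/region[1]/quarter
              for q in root.findall(".//data/region[1]/quarter")]
    rows = []
    for region in root.findall(".//data/region"):          # //data/region
        sold = [int(q.get("books_sold")) for q in region.findall("quarter")]
        cells = [(f"{n:,}", "red" if n <= low_sales else "green") for n in sold]
        rows.append((region.findtext("name"), cells, f"{sum(sold):,}"))
    return heading, header, rows


heading, header, rows = sales_report(SALES_XML)
print(heading)
print(header)
for row in rows:
    print(row)
```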

XML transformation: example (2)
  • The result
