
II. XML Data Management

A : XML refresher

using material from A. Silberschatz and M. Sapossnek

B: - XML-Data Management (1)

Query languages: XPATH, XQuery, SQLX

C:- Mapping XML data to databases

- Native XML Data management



What is XML?

  • Acronym for eXtensible Markup Language

  • Syntax for structuring data and documents in human-readable form

  • THE "Syntax of the WEB"

  • Meta language for defining languages

  • Basis of many extensions

    • Namespaces

    • Stylesheets

    • Hyperlinks

    • Schemata

  • Standardized by W3C: http://www.w3.org/TR/REC-xml

HS / DBSII-03-XML-1



What XML is not

  • Not a protocol

    • Language for describing data

    • Used as a data format in protocols

    • Protocols may be syntactically defined in XML

  • Not a programming language, but

    • XML documents may contain code fragments

    • New languages allow XML code as part of the language (e.g. Xen, a Microsoft extension of C#)

    • Some XML extensions carry superimposed programming-language semantics, e.g. the rule semantics of XSLT

  • No magic semantics

    • Interpretation is left to humans, applications, and standards derived from XML


Why XML?

  • … not a question any more, since widely adopted

  • Simple

  • Extensible

  • Easy to process

  • Easy to generate

  • Data interchange critical for networked applications

"XML will be the ASCII of the Web: basic, essential, unexciting"

Tim Bray

... it is already


XML example

  • Pre-XML representation of data:

    "PO-1234","CUST001","X9876","5","14.98"

  • XML representation of the same data:

    <?xml version="1.0"?>
    <PURCHASE_ORDER>
      <PO_NUM> PO-1234 </PO_NUM>
      <CUST_ID> CUST001 </CUST_ID>
      <ITEM ItemNum="X9876">
        <QUNTY> 2 </QUNTY>
        <PRICE> 14.53 </PRICE>
      </ITEM>
    </PURCHASE_ORDER>

  • Labels on the slide: the first line is the prologue; PURCHASE_ORDER, PO_NUM, CUST_ID, ITEM, QUNTY and PRICE are elements; ItemNum is an attribute


XML example

  • Graphical representation

    PURCHASE_ORDER
      PO_NUM: PO-1234
      CUST_ID: CUST001
      ITEM {ItemNum=X9876}
        QUNTY: 2
        PRICE: 14.53

  • XML documents

    • are tree structured

    • hold data and metadata in the same document (as opposed to an RDBS)
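The tree view can be reproduced with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `describe` is made up for this illustration, and the `ItemNum` value follows the tree diagram.

```python
import xml.etree.ElementTree as ET

# Purchase order from the slide (ItemNum value taken from the tree diagram).
PO_XML = """<?xml version="1.0"?>
<PURCHASE_ORDER>
  <PO_NUM> PO-1234 </PO_NUM>
  <CUST_ID> CUST001 </CUST_ID>
  <ITEM ItemNum="X9876">
    <QUNTY> 2 </QUNTY>
    <PRICE> 14.53 </PRICE>
  </ITEM>
</PURCHASE_ORDER>"""


def describe(node, depth=0):
    """Return one indented line per element: tag, attributes, text."""
    text = (node.text or "").strip()
    lines = ["  " * depth + f"{node.tag} {node.attrib} {text!r}"]
    for child in node:
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(PO_XML)
tree_lines = describe(root)
print("\n".join(tree_lines))
```

Element names (the metadata) and values (the data) arrive together, so the receiver needs no separate schema lookup just to walk the tree.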


XML Usage

  • Two basic types of XML usage

    • Document centric (document oriented)

      • Structuring a digital document, including its logical layout

      • Primary focus of SGML, the predecessor of XML

    • Data centric

      • Description of data in a self-describing form for later processing

  • Distinction not always clear

    • See the purchase order example: if typical document characteristics were included (company address, customer address, date, …, company logo), it would be a document-oriented usage of XML


Document centric XML documents: example

<Product>
  <Name>Adjustable wrench</Name>
  <Developer> Full Fabrication Labs, Inc. </Developer>
  <Summary> Large, adjustable wrench</Summary>
  <Description>
    <Para>The adjustable wrench is made of first-class steel and has a rubberized handle. The jaw opening ranges from 0 to 32 mm. </Para>
    <Para>You can..... </Para>
    <List>
      <Item> <Link URL="Order.html"> Order </Link></Item>
      <Item> <Link URL="Wrenches.htm"> View other tools</Link> </Item>
      <Item> <Link URL="catalog.zip"> Download the catalog</Link> </Item>
    </List>
    <Para>The wrench costs 15.33 euros incl. VAT. If you order now, you additionally receive our worthless hobbyist's primer.</Para>
  </Description>
</Product>

Typical: long text elements


Data centric XML documents: example

<Orders>

<SalesOrder SONumber="12345">

<Customer CustNumber="543">

<CustName> ABC Industries</CustName>

<Street> 123 Main St.</Street>

<City>Chicago</City>

....

</Customer>

<Line LineNumber="1">

<Part PartNumber="123">

<Description>

<p><b> Turkey wrench:</b><br />

Stainless steel, one-piece construction,

lifetime guarantee.</p>

</Description>

<Price>9.95</Price>

</Part>

<Quantity>10</Quantity>

</Line> .......

</SalesOrder> </Orders>


XML Syntax

  • One, and only one, root element

  • Sub-elements must be properly nested

    • A tag must end within the tag in which it was started

  • Attributes are optional

  • Attribute values must be enclosed in double quotes ("") or single quotes ('')

    • No data type but 'string'

  • Processing instructions optional

  • XML is case-sensitive

    • <tag> and <TAG> are not the same type of element
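These rules can be tried out against a parser. A minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the function name `is_well_formed` and the sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "properly nested": "<a><b>x</b></a>",
    "overlapping tags": "<a><b>x</a></b>",
    "two root elements": "<a/><b/>",
    "unquoted attribute": "<a id=1/>",
    "case mismatch": "<tag>x</TAG>",
}
results = {name: is_well_formed(doc) for name, doc in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'rejected'}")
```

Only the first sample should be accepted; each of the others breaks exactly one of the rules listed above.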


Why hierarchical "data model"?

  • Hierarchies (nesting) in databases? Why not?

    • REDUNDANCY!

      • Items, customers, … occur multiple times in different orders

      • Normalization replaces redundancies by foreign keys

      • OO / OR databases??

  • Nesting is useful in data transfer

    • The external application has no access to the foreign keys or to the database
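The trade-off can be made concrete with a small sketch in Python (standard library only). The element names follow the data-centric order example; the second order, `12346`, is invented here to show the same customer nested twice.

```python
import xml.etree.ElementTree as ET

# Nested transfer format: the customer is repeated inside every order.
ORDERS = """<Orders>
  <SalesOrder SONumber="12345">
    <Customer CustNumber="543"><CustName>ABC Industries</CustName></Customer>
  </SalesOrder>
  <SalesOrder SONumber="12346">
    <Customer CustNumber="543"><CustName>ABC Industries</CustName></Customer>
  </SalesOrder>
</Orders>"""

customers = {}  # CustNumber -> CustName, stored once
orders = []     # (SONumber, CustNumber): the customer is only a foreign key

for order in ET.fromstring(ORDERS).findall("SalesOrder"):
    customer = order.find("Customer")
    customers[customer.get("CustNumber")] = customer.findtext("CustName")
    orders.append((order.get("SONumber"), customer.get("CustNumber")))

print(customers)
print(orders)
```

The nested document is self-contained for the receiver, while the two normalized collections store the customer name once and refer to it by key.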


XML Attributes vs Elements

  • Distinction between subelement and attribute

    • In the context of documents:

      • attributes are part of markup

      • subelement contents part of the basic document contents

    • In the context of data representation: the difference is unclear and can be confusing

      • The same information can be represented in two ways

        • <account account-number="A-101">

          ….

          </account>

        • <account> <account-number> A-101 </account-number>

          … </account>

    • Suggestion: use attributes for identifiers of elements, use subelements for contents
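For a consuming program the two representations differ mainly in the access call. A minimal sketch with Python's standard `xml.etree.ElementTree`; the variable names are made up for this illustration.

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring('<account account-number="A-101"/>')
as_subelement = ET.fromstring(
    "<account><account-number> A-101 </account-number></account>"
)

# Attribute: looked up by name on the element itself.
number_from_attribute = as_attribute.get("account-number")

# Subelement: located as a child; surrounding whitespace is part of the text.
number_from_subelement = as_subelement.findtext("account-number").strip()

print(number_from_attribute, number_from_subelement)
```

Note that the subelement form keeps the whitespace around the value, so the application has to decide whether it is significant.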


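How to use XML data?

  • Basic idea

    [Figure: an application with an XML generator (on top of a DBMS) produces the XML document; the receiving application (with its own DBMS) reads it through an XML parser, which exposes the standard interfaces DOM and SAX]

  • How does the application know about

    - syntactical correctness

    - data semantics?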
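The two standard interfaces differ in how the parser hands the document to the application. The following sketch uses Python's standard `xml.dom.minidom` and `xml.sax` modules; the class name `TagCollector` and the small `ITEM` document are assumptions made for this illustration.

```python
import xml.dom.minidom
import xml.sax

DOC = "<ITEM ItemNum='X9876'><QUNTY>2</QUNTY><PRICE>14.53</PRICE></ITEM>"

# DOM: the parser builds the whole tree in memory; the application navigates it.
dom = xml.dom.minidom.parseString(DOC)
dom_price = dom.getElementsByTagName("PRICE")[0].firstChild.data


# SAX: the parser streams events; the application reacts in callbacks.
class TagCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


collector = TagCollector()
xml.sax.parseString(DOC.encode("utf-8"), collector)
sax_tags = collector.tags

print(dom_price, sax_tags)
```

DOM is convenient for random access to small documents; SAX keeps memory use flat for large ones because nothing is retained unless the handler stores it.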


  • Different encodings

    • specified by the encoding attribute of the XML declaration

  • Correct or not correct?


Correctness of XML documents

  • Syntactic correctness

    • Conformance to XML syntax

    • A document structured according to XML syntax is well-formed

    • Compare: syntax checker for a program

  • Semantic correctness

    • Given a meta-level description of XML documents: Document Type Definition (DTD) or XML Schema

    • A document is valid with respect to a DTD (schema) if all its definitions and restrictions are fulfilled

    • Omitting the DTD is allowed; applications must then know what is meant

  • What is semantics??

    • Interpretation of tags is a matter of humans and/or the application program: <xyz> could mean "book title" or "first name" or…


XML Namespaces

  • Part of XML's extensibility

  • Allow autonomous users to differentiate between tags of the same name (using a prefix)

    • Frees the author to focus on the data and decide how best to describe it

    • Allows multiple XML documents from multiple authors to be merged

  • Namespace declaration: xmlns:bk="http://www.example.com/bookinfo/"

    • Prefix: bk

    • URI (URL): http://www.example.com/bookinfo/


Namespace

  • Examples

    • With prefix:

      <BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">
        <bk:TITLE>All About XML</bk:TITLE>
        <bk:AUTHOR>Joe Developer</bk:AUTHOR>
        <bk:PRICE currency='US Dollar'>19.99</bk:PRICE>
      </BOOK>

    • No prefix: all elements belong to the same namespace

      <BOOK xmlns="http://www.bookstuff.org/bookinfo">
        <TITLE>All About XML</TITLE>
        <AUTHOR>Joe Developer</AUTHOR>
      </BOOK>
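A parser resolves prefixes to the namespace URI, so both notations yield the same qualified name for `TITLE`. A minimal sketch with Python's standard `xml.etree.ElementTree`, which writes qualified names as `{uri}local`; the shortened documents and variable names are assumptions for this illustration.

```python
import xml.etree.ElementTree as ET

NS = {"bk": "http://www.bookstuff.org/bookinfo"}

prefixed = ET.fromstring(
    '<BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">'
    "<bk:TITLE>All About XML</bk:TITLE>"
    "</BOOK>"
)
defaulted = ET.fromstring(
    '<BOOK xmlns="http://www.bookstuff.org/bookinfo">'
    "<TITLE>All About XML</TITLE>"
    "</BOOK>"
)

# With a prefix only the prefixed elements are in the namespace;
# with a default namespace the unprefixed BOOK element is included too.
prefixed_tags = [prefixed.tag, prefixed[0].tag]
defaulted_tags = [defaulted.tag, defaulted[0].tag]

# The prefix used in a query is local to the query, not to the document.
title = defaulted.findtext("bk:TITLE", namespaces=NS)

print(prefixed_tags, defaulted_tags, title)
```

The prefix itself carries no meaning: the query uses `bk` even though the second document never declares that prefix.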


DTD and XML schema

  • Type of XML document defined as

    • DTD - not expressible in XML syntax

    • XML schema

  • Document Type Definition (DTD)

    • Does not constrain types: all values are strings in XML

    • Syntax

      <!ELEMENT elem (subelement-spec)>

      <!ATTLIST elem (attribute-specs) >


DTD: elements and attributes

  • Example (element decl)

    <!ELEMENT depositor (customer-name, account-number)>

    <!ELEMENT customer-name (#PCDATA) >

    <!ELEMENT account-number (#PCDATA)>

  • Subelements

    • names of elements

    • #PCDATA (parsed character data), i.e., character strings

    • EMPTY (no subelements) or ANY (anything can be a subelement)

  • Subelement specification may have regular expressions

    <!ELEMENT bank ( ( account | customer | depositor)+)>

    • Notation:

      • "|" : alternatives

      • "," : sequence

      • "+" : 1 or more occurrences

      • "?" : 0 or 1 occurrence

      • "*" : 0 or more occurrences


DTD example

<!DOCTYPE bank [

<!ELEMENT bank ( ( account | customer | depositor)+)>

<!ELEMENT account (account-number, branch-name, balance)>

<!ELEMENT customer (customer-name, customer-street, customer-city)>

<!ELEMENT depositor (customer-name, account-number)>

<!ELEMENT account-number (#PCDATA)>

<!ELEMENT branch-name (#PCDATA)>

<!ELEMENT balance (#PCDATA)>

<!ELEMENT customer-name (#PCDATA)>

<!ELEMENT customer-street (#PCDATA)>

<!ELEMENT customer-city (#PCDATA)>

]>
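Content models are regular expressions over child element names, so their effect can be imitated with an ordinary regex engine. The sketch below is not a DTD validator: the models are translated by hand from the DTD above, and the names `CONTENT_MODELS`, `children_match`, `is_valid` and the sample values are assumptions for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models (each child name is followed by a space).
CONTENT_MODELS = {
    "bank": r"(account |customer |depositor )+",
    "account": r"account-number branch-name balance ",
    "depositor": r"customer-name account-number ",
}


def children_match(element):
    """Check the element's child sequence against its content model."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return True  # no model recorded here; treated as unconstrained
    sequence = "".join(child.tag + " " for child in element)
    return re.fullmatch(model, sequence) is not None


def is_valid(root):
    """True if every element in the tree satisfies its content model."""
    return all(children_match(e) for e in root.iter())


good_doc = ET.fromstring(
    "<bank><account><account-number>A-101</account-number>"
    "<branch-name>Downtown</branch-name><balance>500</balance></account></bank>"
)
wrong_order = ET.fromstring(
    "<bank><account><balance>500</balance>"
    "<account-number>A-101</account-number>"
    "<branch-name>Downtown</branch-name></account></bank>"
)
empty_bank = ET.fromstring("<bank/>")

print(is_valid(good_doc), is_valid(wrong_order), is_valid(empty_bank))
```

The second document fails because a sequence fixes the order of subelements; the third fails because `+` requires at least one child.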


DTD attributes

  • Attribute specification : for each attribute

    • Name

    • Type of attribute

      • CDATA

      • ID (identifier) or IDREF (ID reference) or IDREFS

        • more on this later

    • Whether it is

      • mandatory (#REQUIRED),

      • has a default value (value),

      • or neither (#IMPLIED)

  • Examples

    • <!ATTLIST account acct-type CDATA "checking">

    • <!ATTLIST customer

      customer-id ID #REQUIRED

      accounts IDREFS #REQUIRED>


DTD attribute ID

  • At most one attribute of type ID per element

  • ID attribute value of each element in an XML document must be distinct

    • ID attribute value is object identifier

  • An attribute of type IDREF must contain the ID value of an element in the same document

  • An attribute of type IDREFS contains a whitespace-separated list of one or more ID values; each of them must be the ID value of an element in the same document

  • ID, IDREF, IDREFS do not designate a particular domain (no type!)
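The two constraints (IDs are unique, references resolve) can be sketched in a few lines of Python. A validating parser would read the attribute types from the DTD; here they are hard-coded, and declaring `account-number` as the ID of `account` is an assumption of this sketch, as are the sample values.

```python
import xml.etree.ElementTree as ET

# Attribute types a validating parser would take from the DTD (hard-coded here).
ID_ATTRS = {"account": "account-number", "customer": "customer-id"}
IDREFS_ATTRS = {"customer": "accounts"}


def check_references(root):
    """Return (duplicate_ids, dangling_refs) for the declarations above."""
    seen, duplicates = set(), []
    for elem in root.iter():
        attr = ID_ATTRS.get(elem.tag)
        if attr and attr in elem.attrib:
            value = elem.get(attr)
            if value in seen:
                duplicates.append(value)
            seen.add(value)
    dangling = []
    for elem in root.iter():
        attr = IDREFS_ATTRS.get(elem.tag)
        if attr and attr in elem.attrib:
            dangling += [r for r in elem.get(attr).split() if r not in seen]
    return duplicates, dangling


bank = ET.fromstring(
    "<bank>"
    '<account account-number="A-101"/>'
    '<customer customer-id="C-1" accounts="A-101 A-999"/>'
    "</bank>"
)
duplicates, dangling = check_references(bank)
print(duplicates, dangling)
```

All ID values share one pool per document, which is why the check cannot tell whether a reference points at an account or at a customer.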


DTD declaration

  • External DTD declaration

    <?xml version="1.0"?>
    <!DOCTYPE bank SYSTEM "http://www.x-ag.de/banks.dtd">
    <bank> ... </bank>

  • Internal DTD declaration

    <!DOCTYPE custDesc [ <!ELEMENT custDesc (#PCDATA)> ]>
    <custDesc> consumer rights protagonist </custDesc>

  • Mixed usage

    <!DOCTYPE bank SYSTEM "http://www.x-ag.de/banks.dtd" [
      <!ATTLIST bank Descr CDATA #REQUIRED>
    ]>
    <bank Descr=" mostly private customers and ATM"> ... </bank>


DTD limits

  • No typing of text elements and attributes

    • All values are strings, no integers, reals, etc.

  • Difficult to specify unordered sets of subelements

    • Order is usually irrelevant in databases

    • (A | B)* allows specification of an unordered set, but

      • Cannot ensure that each of A and B occurs only once

      • How to express: b, c and d in arbitrary order? <!ELEMENT a ((b,c,d) | (c,b,d) | (b,d,c) | ...)>

  • IDs and IDREFs are untyped

    • The owners attribute of an account may contain a reference to another account, which is meaningless

      • owners attribute should ideally be constrained to refer to customer elements
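To see why unordered content is awkward in a DTD, the full content model can be generated mechanically. A small Python sketch; the function name `unordered_content_model` is made up for this illustration.

```python
from itertools import permutations


def unordered_content_model(names):
    """Spell out 'each name exactly once, in any order' as a DTD content model."""
    alternatives = ["(" + ",".join(p) + ")" for p in permutations(names)]
    return "(" + " | ".join(alternatives) + ")"


model = unordered_content_model(["b", "c", "d"])
declaration = f"<!ELEMENT a {model}>"
print(declaration)

# The number of alternatives grows factorially with the number of subelements.
alternatives_for_five = len(list(permutations(range(5))))
print(alternatives_for_five)
```

Three subelements already need six alternatives and five need 120, which is why the enumeration does not scale.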


XML Schema

  • XML Schema (XSD): a much more expressive schema language than DTDs

    • Typing of values

      • E.g. integer, string, etc

      • constraints on min/max values

    • User defined types

    • specified in XML syntax, unlike DTDs

      • More standard representation, but verbose

    • namespace support

    • Many more features

      • List types, uniqueness and foreign key constraints, inheritance, ability to map to RDB, …

  • significantly more complicated than DTD syntax

  • Use of XSD recommended


XSD example (from Silberschatz)

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="bank" type="BankType"/>
  <xsd:element name="account">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="account-number" type="xsd:string"/>
        <xsd:element name="branch-name" type="xsd:string"/>
        <xsd:element name="balance" type="xsd:decimal"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  ….. definitions of customer and depositor ….
  <xsd:complexType name="BankType">
    <xsd:sequence>
      <xsd:element ref="account" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element ref="customer" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element ref="depositor" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
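Because an XSD is itself an XML document, ordinary XML tooling can inspect it. The sketch below only reads the declared types of the `account` part with Python's standard `xml.etree.ElementTree`; it does not validate anything, and the variable names are assumptions for this illustration.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
SCHEMA = f"""<xsd:schema xmlns:xsd="{XSD}">
  <xsd:element name="account">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="account-number" type="xsd:string"/>
        <xsd:element name="branch-name" type="xsd:string"/>
        <xsd:element name="balance" type="xsd:decimal"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

schema = ET.fromstring(SCHEMA)

# Collect name -> declared type for every element declaration that has a type.
declared_types = {
    e.get("name"): e.get("type")
    for e in schema.iter(f"{{{XSD}}}element")
    if e.get("type") is not None
}
print(declared_types)
```

A DTD would need its own parser for the same task, which is one practical consequence of XSD being expressed in XML syntax.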



Using XML

  • Data exchange 

  • Data management:

    • Store, retrieve, query large document sets efficiently

      • Today's solutions:

        • Mapping to RDB / ORDB / OODB

        • "Native" XML data management (not necessarily very different from storing in conventional DB)

  • Standardized data description: different extensions and applications

    • Bioinformatic Sequence Markup Language (BSML)

    • MathML

    • Scalable Vector Graphics (SVG)

    • Resource description on the Web (RDF)

    • … and many, many more


Using XML: RDF with XML syntax

  • RDF model

    www.me.de/~fritz (Homepage)  --Creator-->  Fritz Müller

    [email protected]  --emailOf-->  Fritz Müller

  • Many of these triples form a graph

  • Encoded in XML:

    <?xml version="1.0"?>
    <RDF
      xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:s="http://description.org/schema/">
      <Description about="http://www.me.de/~fritz">
        <s:Creator>Fritz Müller</s:Creator>
      </Description>
      <Description [email protected]> <s:emailOf> Fritz Müller </s:emailOf> </Description>
    </RDF>
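Reading the triples back out of the XML encoding takes only a tree walk. A minimal sketch with Python's standard `xml.etree.ElementTree`, restricted to the first `Description` of the slide; the function name `triples` is made up for this illustration.

```python
import xml.etree.ElementTree as ET

RDF_XML = """<?xml version="1.0"?>
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:s="http://description.org/schema/">
  <Description about="http://www.me.de/~fritz">
    <s:Creator>Fritz Müller</s:Creator>
  </Description>
</RDF>"""

RDF_NS = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def triples(root):
    """Yield (subject, predicate, object) for each property of each Description."""
    for description in root.iter(RDF_NS + "Description"):
        subject = description.get("about")
        for prop in description:
            yield subject, prop.tag, (prop.text or "").strip()


graph = list(triples(ET.fromstring(RDF_XML)))
print(graph)
```

The predicate comes back as a namespace-qualified name, which is what keeps properties from different vocabularies apart when graphs are merged.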


Using XML

  • Layout of documents?

    • XML documents have a logical structure

    • A layout structure is needed for output

      • Use a transformation language to describe device-specific transformations

  • Transformation into all kinds of languages (HTML, PDF, …) on all kinds of devices

    [Figure: an XML document (data) and an XML document (layout transformation) are fed to standard software (XSL processor), which produces an XML document (device-specific layout) for standard software (HTML browser)]


XML transformation

  • XSLT: The language used for converting XML documents into other forms

  • Describes how the document is transformed

  • Expressed as an XML document (.xsl)

  • Template rules

    • Patterns match nodes in source document

    • Templates instantiated to form part of result document

  • XPath for querying, sorting, etc.

  • XSL-FO language for describing layout

    XSL = XSLT + XPath + XSL-FO


XML transformation: example (1)

  • Document

<sales>

<summary>

<heading>Scootney Publishing</heading>

<subhead>Regional Sales Report</subhead>

<description>Sales Report</description>

</summary>

<data>

<region>

<name>West Coast</name>

<quarter number="1" books_sold="24000" />

<quarter number="2" books_sold="38600" />

<quarter number="3" books_sold="44030" />

<quarter number="4" books_sold="21000" />

</region>

...

</data>

</sales>


XML transformation: example (2)

  • XSL style sheet - mapping to HTML

<xsl:param name="low_sales" select="21000"/>
<BODY>
  <h1><xsl:value-of select="//summary/heading"/></h1>
  ...
  <table><tr><th>Region\Quarter</th>
    <xsl:for-each select="//data/region[1]/quarter">
      <th>Q<xsl:value-of select="@number"/></th>
    </xsl:for-each>
    ...
    <xsl:for-each select="//data/region">
      <tr><th><xsl:value-of select="name"/></th>
        <xsl:for-each select="quarter">
          <td><xsl:choose>
              <xsl:when test="number(@books_sold &lt;= $low_sales)">color:red;</xsl:when>
              <xsl:otherwise>color:green;</xsl:otherwise></xsl:choose>
            <xsl:value-of select="format-number(@books_sold,'###,###')"/></td>
          ...
          <td><xsl:value-of select="format-number(sum(quarter/@books_sold), '###,###')"/>

  • The values of the select and test attributes are XPath expressions

  • XPath: query language on document trees
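Python's standard library has no XSLT processor, so the following sketch imitates the same report logic directly in Python, using the limited XPath subset of `xml.etree.ElementTree`. It is not the stylesheet's implementation; the names `report_rows` and `LOW_SALES` are assumptions, and only the West Coast region from the source document is included.

```python
import xml.etree.ElementTree as ET

SALES = """<sales><data><region>
  <name>West Coast</name>
  <quarter number="1" books_sold="24000"/>
  <quarter number="2" books_sold="38600"/>
  <quarter number="3" books_sold="44030"/>
  <quarter number="4" books_sold="21000"/>
</region></data></sales>"""

LOW_SALES = 21000  # same threshold as the low_sales parameter


def report_rows(root):
    """One row per region: name, (color, formatted value) cells, formatted total."""
    rows = []
    for region in root.findall(".//data/region"):
        sold = [int(q.get("books_sold")) for q in region.findall("quarter")]
        cells = [("red" if n <= LOW_SALES else "green", f"{n:,}") for n in sold]
        rows.append((region.findtext("name"), cells, f"{sum(sold):,}"))
    return rows


rows = report_rows(ET.fromstring(SALES))
for name, cells, total in rows:
    print(name, cells, total)
```

The path `.//data/region` plays the role of the `//data/region` selection, the conditional mirrors the choose/when/otherwise rule, and the `,` format specifier stands in for `format-number` with the `###,###` pattern.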


XML transformation: example (2)

  • The result
