400 likes | 479 Views
Explore the evolution of data exchange through markup languages like XML and SGML, their advantages, adoption challenges, and the shift from HTML. Learn about XML rules, processing benefits, and schema design to enhance structured data management.
E N D
XML:Managing data exchange Words can have no single fixed meaning. Like wayward electrons, they can spin away from their initial orbit and enter a wider magnetic field. No one owns them or has a proprietary right to dictate how they will be used. David Lehman, End of the Word, 1991.
Central problems of data management • Capture • Storage • Retrieval • Exchange
EDI • Electronic exchange of standard documents • In use for some 20 years • Standards • ANSI X.12 (US and Canada) • EDIFACT (International)
EDI: Advantages • Paper handling is reduced, saving time and money • Data can be exchanged in real time • There are fewer errors since data are keyed only once • Enhanced data sharing enables greater coordination of activities between business partners • Money flows are accelerated and payments received sooner
EDI: Adoption • Much information flow is still on paper • Electronic exchange is the exception rather than the rule • The Internet is a lower cost solution than EDI using dedicated networks • EDI designed in the 60s • Slow EDI message standardization process • Complex and expensive standards
SGML • Document management consumes • 15% of company revenue • 25% of labor costs • 10 - 60% of an office worker’s time • Standard generalized markup language (SGML) was designed to reduce the cost of document management
Markup language • Embedded information within text about the meaning of the text <cdliner>This uniquely creative collaboration between Miles Davis and Gil Evans has already resulted in two extraordinary albums—<cdtitle>Miles Ahead</cdtitle><cdid>CL 1041></cdid> and <cdtitle>Porgy and Bess</cdtitle><cdid>CL 1274</cdid>.</cdliner>
SGML • A vendor independent standard for publication of all media • Cross system • Portable • Defines the structure of a document • The parent of HTML and XML
SGML: Advantages • Re-use • Same advantage as with word processing • Flexibility • Generate output for multiple media • Revision • Version control
SGML code <chapter> <no>16</no> <title>XML: Managing Data Exchange</title> <section> <quote><emph type = "2">Words can have no single fixed meaning. Like wayward electrons, they can spin away from their initial orbit and enter a wider magnetic field. No one owns them or has a proprietary right to dictate how they will be used.</emph></quote> … </section> … </chapter>
HTML code <html> <body> <h1><b>16</b></h1> <h1><b>XML: Managing Data Exchange</b></h1> <p> <i>Words can have no single fixed meaning. Like wayward electrons, they can spin away from their initial orbit and enter a wider magnetic field. No one owns them or has a proprietary right to dictate how they will be used.</i> </p> </body> </html>
The problem with HTML • Presentation not meaning • Reader has to infer meaning • Machines are not very good at inferring meaning
XML • Extensible markup language • is SGML for electronic commerce • A meta-language • A language to generate languages • Will steadily replace HTML
Structured text User-definable structure Context-sensitive retrieval Greater hypertext linkage Formatted text Pre-defined format Limited retrieval Limited hypertext linking XML vs. HTML
XML rules • Elements must have both an opening and closing tag • Elements must follow a strict hierarchy with only one root element • Elements may not overlap other elements • Element names must obey XML naming conventions • XML is case sensitive
Processing shift • From server to browser • Browser can ‘read’ meaning of the data • Less data transmitted
Searching • Search engines look for appropriate tags in the XML code • Faster • More precise
Expected gains • Store once and format many times • Hardware and software independence • Capture once and exchange many times • Accelerated targeted searching • Less network congestion
XML language design • Designers must define • Allowable tags • Rules for nesting tags • Which tagged elements can be processed
XML Schema • The schema defines • The names and contents of all elements that are permissible in a certain document • The structure of the document • How often an element might appear • The order in which the elements must appear • The type of data the element contains
DOM • Document object model • The data model for an XML document • A tree (1:m)
Schema (cdlib.xsd) • XML declaration and root of all schema documents <?xml version="1.0" encoding="UTF-8"?> <xsd:schemaxmlns:xsd='http://www.w3.org/2001/XMLSchema'>
Schema (cdlib.xsd) • CD library definition <xsd:elementname="cdlibrary"> <xsd:complexType> <xsd:sequence> <xsd:elementname="cd"type="cdType” minOccurs="1”maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType> </xsd:element>
Schema (cdlib.xsd) • CD definition <xsd:complexType name="cdType"> <xsd:sequence> <xsd:element name="cdid"type="xsd:string"/> <xsd:element name="cdlabel"type="xsd:string"/> <xsd:element name="cdtitle"type="xsd:string"/> <xsd:element name="cdyear"type="xsd:integer"/> <xsd:element name="track"type="trackType"minOccurs="1" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType>
Schema (cdlib.xsd) • Track definition <xsd:complexTypename="trackType"> <xsd:sequence> <xsd:elementname="trknum"type="xsd:integer"/> <xsd:elementname="trktitle"type="xsd:string"/> <xsd:elementname="trklen"type="xsd:time"/> </xsd:sequence> </xsd:complexType>
Common datatypes • string • boolean • uriReference • decimal • float • integer • time • date
XML (cd.xml) <?xml version = "1.0” encoding=“UTF-8”?> <cdlibrary xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="cdlib.xsd"> <cd> <cdid>A2 1325</cdid> <cdlabel>Atlantic</cdlabel> <cdtitle>Pyramid</cdtitle> <cdyear>1960</cdyear> <track> <trknum>1</trknum> <trktitle>Vendome</trktitle> <trklen>2:30</trklen> </track> … </cd> </cdlibrary>
XSL • Extensible stylesheet language • Defines how an XML document is rendered • An XML file
XSL • Results of applying cd.xsl Pyramid, Atlantic, 1960 [A2 1325] 1 Vendome 00:02:30 2 Pyramid 00:10:46 Ella Fitzgerald, Verve, 2000 [D136705] 1 A tisket, a tasket 00:02:37 2 Vote for Mr. Rhythm 00:02:25 3 Betcha nickel 00:02:52
cd.xsl <?xml version="1.0" encoding="UTF-8”?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output encoding="UTF-8" indent="yes" method="html" version="1.0" /> <xsl:template match="/"> <html> <head> <title>Complete List of Songs</title> </head> <body> <h2>Complete List of Songs</h2> <xsl:apply-templates select= "cdlibrary" /> <xsl:for-each select="cd"> <font color="maroon"> <xsl:value-of select="cdtitle" /> , <xsl:value-of select="cdlabel" /> , <xsl:value-of select="cdyear" /> [ <xsl:value-of select="cdid" />] </font> <br />
cd.xsl (continued) <table> <xsl:for-each select= "track"> <tr> <td align="left"> <xsl:value-of select= "trknum" /> </td> <td> <xsl:value-of select= "trktitle" /> </td> <td align= "center" > <xsl:value-of select= "trklen" /> </td> </tr> </xsl:for-each> </table> <br /> </xsl:for-each> </body> </html> </xsl:template> </xsl:stylesheet>
Converting XML • Transformation and manipulation • XSLT • One XML vocabulary to another • FPML to finML • Re-ordering, filtering, and sorting • Rendering • XSLT • e.g., XML to WAP
XML and databases • XML is a data management tool • XML documents will have to be stored for the long-term • Need a DBMS
DBMS requirements • Store a large number of documents; • Store large documents • Support access to portions of a document (e.g., the data for a single CD in a library of 20,000 CDs) • Concurrent access • Version control • Integrate data from other sources
RDBMS • Document-centric • Store as CLOB • Data-centric • Object-relational extensions to support element retrieval and update • Expect RDBMS vendors to offer extensions to support XML
Database to XML • A significant proportion of Web pages are generated from databases • Instead of converting to HTML these should be converted to XML • Render with XSL • Need tools for converting relational data to XML
OODBMS • A good fit to the DOM • Little development at this stage • Vendors have fewer resources
XML database • Special purpose XML database • Tamino • This is a new area and you will need to monitor developments • http://www.rpbourret.com/
Conclusion • XML is a significant technological development • Its main purpose is to support data exchange • It will lower the cost of business transactions • It will be a critical data management technology