
XML: The Big Picture and Some Gory Details A brief tutorial with an eye towards e-records and archival



    1. XML: The Big Picture and Some Gory Details (A brief tutorial with an eye towards e-records and archival)
       Bertram Ludaescher, ludaesch@sdsc.edu
       Data Intensive Computing Environments (DICE) Group, San Diego Supercomputer Center, UCSD

    2. XML Tutorial, Bertram Ludäscher: DICE Members
       Staff: Reagan Moore, Chaitan Baru, Amarnath Gupta, Bertram Ludäscher, Richard Marciano, Arcot Rajasekar, Wayne Schroeder, Michael Wan, Ilya Zaslavsky, Bing Zhu, + NN * 4
       Students: Pratik Mukhopadhyay, Azra Mulic, Kevin Munroe, Paul Nguyen, Michail Petropolis, Nicholas Puz, Pavel Velikhov, +/- NN

    3. Tutorial Outline
       Roadmap & Overview
       What about XML vs. E-records and Archives? (or: why it’s good to be here ;-)
       XML 101
       XML 232
       Querying & Transforming XML
       Mediation of Information using XML (MIX)
       Other Projects...

    4. Some History (or: from fat via lean…
       SGML (Standard Generalized Markup Language)
         ISO standard, 1986, for data storage & exchange
         Metalanguage for defining languages (through DTDs)
         A famous SGML language: HTML!!
         Separation of content and display
         Used by the U.S. government & contractors, large manufacturing companies, technical information publishers, ...
         SGML reference is 600 pages long
       XML (eXtensible Markup Language)
         W3C (World Wide Web Consortium, http://www.w3.org/XML/) recommendation in 1998
         Simple subset (80/20 rule) of SGML: “ASCII of the Web”, “Semantic Web”
         XML specification is 26 pages long

    5. … to skinny and back! )
       Canonical XML
         “normalization”, equivalence testing of XML documents
       SML (Simple Markup Language)
         “Reduce to the max”: No Attributes / No Processing Instructions (PI) / No DTD / No non-character entity-references / No CDATA marked sections / Support for only UTF-8 character encoding / No optional features
       XML Schema
         XML Schema definition language
         Back to complex: Part I (Structures), Part II (Data Types), Part III, ahem, 0 (Primer)
       X-Zoo (Xoo?), “Brave New X-World”
         Specifications: CSS • Digital Signatures • ebxml Project Teams • ebXML • IETF Specifications • Internationalization • IOTP (Internet Open Trading Protocol) • OASIS • Requirements Documents • SMIL • SVG (Scalable Vector Graphics) • Topic Maps • W3C Activity Pages • W3C Notes • W3C Standards • W3C Standards-in-progress • WAP • WebDAV • XHTML • XLink • XPath • XSLT
         Vocabularies: DTDs • Music • P3P • RDF • RSS • SMIL • W3C Standards • W3C Standards-in-progress • WML • XHTML • XSL FO's • XSLT • XUL
         Vertical Industries: Advertising • Commerce • Consortiums • Construction • Food • Insurance • Legal • Medical • Music • OASIS • Real Estate • Science • Space Exploration • Telecommunications • Travel • Weather

    6. … but … FEAR NOT!

    7. Back to the Future (or Archival for the Past...)
       A time traveler sends a message in a virtual bottle, containing parts of the universal library of human and post-human mankind, back into the last third of the 20th century...
         ... when the Web, XML, WAP, B2B, and Petabytes were unheard of
         ... RAM was so precious that it was ok to deal with nibbles
         ... MS-DOS was still called CP/M
         ... and in fact Bill hadn’t moved into the garage yet but worked on a homework assignment by Christos, trying to sort pancakes faster (Gates, W.H. and Papadimitriou, C. "Bounds for Sorting by Prefix Reversal." Discr. Math. 27, 47-57, 1979.)
       Task: make sense out of the futuristic message in the past!

    8. Our past futurist’s (future archeologist’s?) supercomputer looked like this …

    9. XML Tutorial, Bertram Ludäscher 9 Message in the bottle: 1 ŠĻ^Qą”±^Zį^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@>^@^C^@ž’ ^@^F^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@#^@^@^@^@^@^@^@^@^P^@^@%^@^@^@^A^@^@^@ž’’’^@^@^@^@"^@^@^@’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’ģ„Į^@q^@^D^@^@^@^Ræ^@^@^@^@^@^@^P^@^@^@^@^@^D^@^@Ē^G^@^@^N^@bjbjt+t+^@^@^@ ^@Some Quotations from the Universal Library^M1 Famous Quotes^M1.1 By William I^M[2, Sonnet XVIII]^MShall I compare thee to a summer's day?^MThou art more lovely and more temperate.^MRough winds do shake the darling buds of May,^MAnd summer's lease hath all too short a date.^MSometime too hot the eye of heaven shines,^MAnd often is his gold complexion dimmed.^MAnd every fair from fair some declines,^MBy chance or nature's changing course untrimmed.^MBut thy eternal summer shall not fade,^MNor lose possession of that fair thou owest,^MNor shall Death brag thou wander'st in his shade^MWhile in eternal lines to time thou growest.^MSo long as men can breathe, or eyes can see,^MSo long live this, and this gives life to thee.^M1.2 By William II^M[1, p.265]^M\223The obvious mathematical breakthrough would be development of^Man easy way to factor large prime numbers."^MReferences^M[1] W. H. Gates. The Road Ahead. Viking Penguin, 1995.^M[2] W. Shakespeare. The Sonnets of Shakespeare.609.^M^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^ ’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’^A^@ž’^C^@^@’’’’^F^B^@^@^@^@^@Ą^@^@^@^@^@^@F^X^@^@^@Microsoft Word Document^@^@^@^@MSWordDoc^@^P^@^@^@Word.Document.8^@ō9²q^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^

    10. XML Tutorial, Bertram Ludäscher 10 Message in the bottle: 2 {\rtf1\ansi\ansicpg1252\uc1 \deff0\deflang1033\deflangfe1033{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose02020603050405020304}Times New Roman;}\ {\f1\fswiss\fcharset0\fprq2{\*\panose 020b0604020202020204}Arial;}^M {\f17\froman\fcharset238\fprq2 Times New Roman CE;}{\f18\froman\fcharset204\fprq2 Times New Roman Cyr;}{\f20\froman\fcharset161\fprq2 Times New R\ oman Greek;}{\f21\froman\fcharset162\fprq2 Times New Roman Tur;}^M … Some Quotations from the Universal Library^M \par }\pard\plain \s2\sb240\sa60\keepn\widctlpar\outlinelevel1\adjustright \b\i\f1\cgrid {\cgrid0 1 Famous Quotes^M \par }\pard\plain \s3\sb240\sa60\keepn\widctlpar\outlinelevel2\adjustright \f1\cgrid {\cgrid0 1.1 By William I^M \par }\pard\plain \s4\sb240\sa60\keepn\widctlpar\outlinelevel3\adjustright \b\f1\cgrid {\cgrid0 [2, Sonnet XVIII]^M \par }\pard\plain \widctlpar\adjustright \fs20\cgrid {\f1\fs24\cgrid0 Shall I compare thee to a summer's day?^M \par Thou art more lovely and more temperate.^M \par Rough winds do shake the darling buds of May,^M … \par }\pard\plain \s3\sb240\sa60\keepn\widctlpar\outlinelevel2\adjustright \f1\cgrid {\cgrid0 1.2 By William II^M \par }\pard\plain \s4\sb240\sa60\keepn\widctlpar\outlinelevel3\adjustright \b\f1\cgrid {\cgrid0 [1, p.265]^M \par }\pard\plain \widctlpar\adjustright \fs20\cgrid {\f1\fs24\cgrid0 \ldblquote The obvious mathematical breakthrough would be development of^M \par an easy way to factor large prime numbers."^M \par }\pard\plain \s2\sb240\sa60\keepn\widctlpar\outlinelevel1\adjustright \b\i\f1\cgrid {\cgrid0 References^M \par }\pard\plain \widctlpar\adjustright \fs20\cgrid {\f1\fs24\cgrid0 [1] W. H. Gates. The Road Ahead. Viking Penguin, 1995.^M \par [2] W. Shakespeare. The Sonnets of Shakespeare. 1609.}{\fs28 ^M \par }}

    11. XML Tutorial, Bertram Ludäscher 11 Message in the bottle: 3 %!PS-Adobe-2.0 %%Creator: dvipsk 5.58f Copyright 1986, 1994 Radical Eye Software %%Title: msg.dvi %%Pages: 1 … /X{S N}B /TR{translate}N /isls false N /vsize 11 72 mul N /hsize 8.5 72 mul N /landplus90{false}def /@rigin{isls{[0 landplus90{1 -1}{-1 1} ifelse 0 0 0]concat}if 72 Resolution div 72 VResolution div neg scale … TeXDict begin 39158280 55380996 1000 600 600 (msg.dvi) @start /Fa 16 117 df<0000000001C0000000000003C0000000000003C00000000000 07C000000000000FC000000000000FC000000000001FC000000000001FE000000000003F E000000000003FE000000000007FE00000000000FFE00000000000EFE00000000001EFE0 0000000001CFE000000000038FE000000000038FE000000000070FE000000000070FE0 … %%EndSetup 1 0 bop 659 872 a Ff(Some)44 b(Quotations)f(from)f(the)i(Univ)l(ersal)h (Library)515 1470 y Fe(1)134 b(F)-11 b(amous)45 b(Quotes)515 1669 y Fd(1.1)112 b(By)37 b(William)d(I)515 1822 y Fc([2)o(,)d(Sonnet)h (XVI)s(I)s(I])722 2004 y Fb(Shall)c(I)g(compare)e(thee)i(to)f(a)g (summer's)g(da)n(y?)722 2104 y(Thou)h(art)f(more)f(lo)n(v)n(ely)h(and)g (more)g(temp)r(erate.)722 2204 y(Rough)g(winds)h(do)f(shak)n(e)g(the)h (darling)e(buds)i(of)g(Ma)n(y)-7 b(,)722 2303 y(And)28 b(summer's)g(lease)e(hath)i(all)f(to)r(o)h(short)f(a)g(date.)722 2403 y(Sometime)h(to)r(o)f(hot)h(the)g(ey)n(e)f(of)h(hea)n(v)n(en)e (shines,)722 2503 y(And)i(often)g(is)g(his)f(gold)g(complexion)g

    12. XML Tutorial, Bertram Ludäscher 12 Message in the bottle: 4 \documentclass{article} \begin{document} \title{Some Quotations from the Universal Library} ... \section{Famous Quotes} \subsection{By William I} \textbf{\cite[Sonnet XVIII]{shakespeare-sonnets-1609}} \begin{verse} Shall I compare thee to a summer's day?\\ Thou art more lovely and more temperate. \\ Rough winds do shake the darling buds of May, \\ And summer's lease hath all too short a date. \\ Sometime too hot the eye of heaven shines, \\ And often is his gold complexion dimmed. \\ … \qquad So long as men can breathe, or eyes can see,\\ \qquad So long live this, and this gives life to thee. \\ \end{verse} ... \bibliographystyle{abbrv} \bibliography{msg} \end{document}

    13. XML Tutorial, Bertram Ludäscher 13 Message in the bottle: 5 <HTML> <HEAD> <TITLE>Some Quotations from the Universal Library</TITLE> </HEAD> <BODY> <B><FONT FACE="Arial" SIZE=5><P>Some Quotations from the Universal Library</P> </FONT><I><FONT FACE="Arial"><P>1 Famous Quotes</P> </B></I><P>1.1 By William I</P> <B><P>[2, Sonnet XVIII]</P></B> <P>Shall I compare thee to a summer's day?</P> <P>Thou art more lovely and more temperate.</P> <P>Rough winds do shake the darling buds of May,</P> <P>And summer's lease hath all too short a date.</P> <P>Sometime too hot the eye of heaven shines,</P> <P>And often is his gold complexion dimmed.</P> ... <P>So long as men can breathe, or eyes can see,</P> <P>So long live this, and this gives life to thee.</P> <P>1.2 By William II</P> <B><P>[1, p.265]</P> </B><P>"The obvious mathematical breakthrough would be development of</P> <P>an easy way to factor large prime numbers."</P> <B><I><P>References</P> </B></I><P>[1] W. H. Gates. The Road Ahead. Viking Penguin, 1995.</P> <P>[2] W. Shakespeare. The Sonnets of Shakespeare. 1609.</P></FONT></BODY> </HTML>

    14. Message in the bottle: 6
        <?xml version="1.0"?>
        <universal_library>
          <books>
            <book>
              <title>Some Quotations from the Universal Library</title>
              <section>
                <title>Famous Quotes</title>
                <subsection>
                  <title>By William I</title>
                  <quote bibref="shakespeare-sonnets-1609">
                    <title>Sonnet XVIII</title>
                    <verse>
                      <line>Shall I compare thee to a summer's day?</line>
                      <line>Thou art more lovely and more temperate. </line>
                      <line>Rough winds do shake the darling buds of May, </line>
                    </verse>
                    …
                <subsection>
                  <title>By William II</title>
                  <quote bibref="gates-road-ahead-1995">
                    <title>Page 265</title>
                    <line>``The obvious mathematical breakthrough would be development of an easy way to factor large prime numbers.’’</line>
                  </quote>
                </subsection>
              </section>
            </book>
            …
          </books>
        </universal_library>

    15. XML as a Self-Describing Format
        can be “understood” using any (archaic CP/M) editor
        can be parsed easily
        contains its own structure (= parse tree) in the data
          => allows the e-archeologist to rediscover schema and content (= semantics!?)
        may also include an explicit schema description (DTD)
          => “meta-model”: definition of a language w.r.t. which it is valid
        allows separation of marked-up content from presentation (=> style sheets)
        as a self-describing format good for “archival into the past”
          => not bad for archival into the future
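The claim that an XML document "contains its own structure" can be illustrated with a small sketch. It assumes Python's standard `xml.etree.ElementTree` parser and a shortened, well-formed version of the message from slide 14; the helper name `element_paths` is hypothetical. Without any DTD, the distinct tag paths found in the data already give a rough picture of the schema.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed variant of the "message in the bottle" (an assumption:
# the elided parts of the slide are simply left out here).
MESSAGE = """<?xml version="1.0"?>
<universal_library>
  <books>
    <book>
      <title>Some Quotations from the Universal Library</title>
      <section>
        <title>Famous Quotes</title>
        <subsection>
          <title>By William I</title>
          <quote bibref="shakespeare-sonnets-1609">
            <title>Sonnet XVIII</title>
            <verse>
              <line>Shall I compare thee to a summer's day?</line>
              <line>Thou art more lovely and more temperate.</line>
            </verse>
          </quote>
        </subsection>
      </section>
    </book>
  </books>
</universal_library>"""


def element_paths(root):
    """Return the distinct root-to-element tag paths, in document order."""
    seen = []

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        if path not in seen:
            seen.append(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return seen


root = ET.fromstring(MESSAGE)
paths = element_paths(root)
for p in paths:
    print(p)
```

The printed paths are only a structural summary; what the tags mean still has to be inferred by the reader, which is the "semantics!?" caveat on the slide.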

    16. Some thoughts on how XML can help with e-record management...
        Assumption: represent e-records in XML
          => self-describing format (good for archival)
          => get a semistructured data model (flexible: encode regular tables, nested structures, objects, or even (cleaned up) HTML)
          => many tools (and many more to come -- (re)use code): parsers, validators, query languages, storage
          => standards (good for interoperation, integration, etc.):
               generic standards (XML, DTDs, XML Schema, XPath, ...)
               community/industry standards (= specific markup languages)

    17. ...thoughts continued
        “E-Record Quality Assurance”: by “subscribing” to a certain XML DTD/XML Schema/XML ???, you can make sure that “the same language is spoken”
        validation using DTDs provides a first simple quality control:
          are the right tags used?
          is the nesting of elements ok w.r.t. the DTD?
          are the order and multiplicity of elements ok?
        if you need more => use validation w.r.t. an XML Schema
          now: check also data types
          use specialization and other mechanisms from object-oriented modeling
          more integrity checking possible (cardinalities, …)
        still want more integrity checks (ICs) or even “policies”?
          => use a declarative rule language for specifying the constraints and policies at design time. Implement them at run time, e.g., by adding the ICs to the XML DTD/Schema/…
          => checking ICs and policies is similar to issuing specific queries against the data
          => use query processors (relational DBs, XML DBs, XML tools) for integrity checking when possible
          => for evolution of records, look at versioning models for databases and temporal database models and query languages
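The point that checking integrity constraints resembles issuing queries can be sketched as follows. This is only an illustration under assumptions: the record layout, the two constraints, and the function name `check_quotes` are invented for the example, and plain `ElementTree` lookups stand in for a real query processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical e-records: every <quote> should carry a bibref attribute
# and contain at least one <line> element.
RECORDS = """<records>
  <quote bibref="shakespeare-sonnets-1609"><line>Shall I compare thee...</line></quote>
  <quote><line>An unattributed line</line></quote>
  <quote bibref="gates-road-ahead-1995"/>
</records>"""


def check_quotes(root):
    """Return (position, message) pairs for quotes violating a constraint."""
    violations = []
    for i, quote in enumerate(root.iter("quote"), start=1):
        # Constraint 1: each quote names its source.
        if quote.get("bibref") is None:
            violations.append((i, "missing bibref attribute"))
        # Constraint 2: each quote has at least one line (a cardinality check).
        if quote.find("line") is None:
            violations.append((i, "no line element"))
    return violations


violations = check_quotes(ET.fromstring(RECORDS))
for position, message in violations:
    print("quote", position, "->", message)
```

Each constraint is a query whose answer should be empty; any non-empty answer lists the offending records.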

    18. Back to XML: Different Perspectives
        Document (SGML) Community
          data = linear text documents
          mark up (annotate) text pieces to describe context, structure, semantics of the marked text
        Database Community
          XML as a (most prominent) example of the semistructured data model
          => captures the whole spectrum from highly structured, regular data to unstructured data

    19. More Perspectives on XML
        "XML is the cure for your data exchange, information integration, e-commerce, [x-2-y, U name it] problems" (“snake oil/silver bullet theory”)
        "XML is nothing but (another) syntax (for Lisp, trees, …)" (“nothing new under the sun”)
          (books (book (author “Shakespeare”) (title “Sonnets”) (verse (line “Shall I compare…”) (line …) …)))
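The "just another syntax for trees" view can be made concrete with a few lines of Python. This is a simplified sketch: the function `to_sexpr` is hypothetical, and it deliberately ignores attributes and mixed content, which a faithful translation would have to handle.

```python
import xml.etree.ElementTree as ET


def to_sexpr(node):
    """Render an element as (tag "text" child...) in Lisp-like notation.

    Attributes and tail text are ignored in this simplified sketch.
    """
    parts = [node.tag]
    text = (node.text or "").strip()
    if text:
        parts.append('"' + text + '"')
    parts.extend(to_sexpr(child) for child in node)
    return "(" + " ".join(parts) + ")"


doc = ET.fromstring(
    "<books><book><author>Shakespeare</author><title>Sonnets</title>"
    "<verse><line>Shall I compare...</line></verse></book></books>"
)
sexpr = to_sexpr(doc)
print(sexpr)
```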

    20. So what is XML (all about)?
        Executive Summary: XML = HTML – idiosyncrasies (simplified syntax) + user-definable ("semantic") tags
        Separation of data and its presentation
        => simple, very flexible data exchange format: semistructured data model
        => new applications:
             Information exchange (B2B), sharing (diglib), integration ("mediation"), archival, ...
             Web site management (XML + XSL stylesheets), ...

    21. Many X-cellent(?) Acronyms...
        XML (Extensible Markup Language)
        XML Namespaces
        XML DTDs, XML Schema
        RDF (Resource Description Framework)
        XSL (Extensible Stylesheet Language)
        XPath (= XSLT ∩ XPointer), XLink
        XQL, XML-QL (XML Query Language)
        XMAS (XML Matching And Structuring language)
        eXcelon, ...
        => XML++ (i.e. += X-tensions) >> just syntax
        => a family of technologies (XML extensions, tools, ...)
        => generic standards and industry/community standards

    22. XML Applications & Industry Initiatives
        http://www.oasis-open.org/cover/xml.html#applications
        Advertising: adXML; place an ad onto an ad network or to a single vendor
        Literature: Gutenberg; convert the world’s great literature into XML
        Directories: dirXML; Novell’s Directory Services Markup Language (DSML)
        Web Servers: apacheXML; parsers, XSL, web publishing
        Travel: openTravel; information for airlines, hotels, and car rental places
        News: NewsML; creation, transfer and delivery of news
        Human Resources: XML-HR; standardization of HR/electronic recruiting XML definitions
        International Development: IDML; improve the management and exchange of information for sustainable development
        Voice: VoxML; markup language for voice applications
        Wireless: WAP (Wireless Application Protocol); wireless devices on the World Wide Web
        Weather: OMF; Weather Observation Markup Format (simulation)
        Geospatial: ANZMETA; distributed national directory for land information
        Banking: MBA; Mortgage Bankers Association of America --> credit report, loan file, underwriting…
        Healthcare: HL7; DTDs for prescriptions, policies & procedures, clinical trials
        Math: MathML (Mathematical Markup Language)
        Surveys: DDI (Data Documentation Initiative); “codebooks” in the social and behavioral sciences

    23. XML E-commerce Initiatives
        CommerceNet eCo Framework: XML specs to support interoperability among e-businesses
        Commerce One Common Business Library (CBL): set of business components and documents in DTD, XDR, SOX
        BizTalk: Microsoft spec based on XML schemas
        cXML (Commerce XML): tag sets for e-procurement into BizTalk
        Electronic Data Interchange (EDI)
        RosettaNet: common format for online ordering
        FpML (Financial products Markup Language): sharing of financial data (interest rate & foreign exchange products)
        Open Buying on the Internet (OBI): high-volume B2B purchasing transactions over the Internet (Office Depot, Lockheed, barnesandnoble, AX...
        E-commerce and XML
          VISA Invoices: the Visa Extensible Markup Language (XML) Invoice Specification provides a comprehensive list of data elements contained in most invoices, including: Buyer/Supplier, Shipping, Tax, Payment, Currency, Discount, and Line Item Detail
        B2B Integration
          code360 XML-Broker: middleware that manages XML-based transactions
          Bluestone XML Suite: enables developing and deploying e-commerce, electronic data interchange, application integration and supply chain management applications; products include XML-Server, Visual-XML, XML-Contact and XwingML
          webMethods: provides companies with integrated direct links to buyers and suppliers

    24. What’s Wrong with HTML?

    25. ...What’s Wrong with HTML...

    26. ... And Some Repercussions
        Lack of schema/semantics when querying the Web (HTML):
          "find documents (books, papers, ...) where author = Michael Jackson" (... and learn how software engineering meets the moon walker ...)
          "create a list of M. Jackson's books and (if available) their prices"
        => HTML is inappropriate for
             data exchange
             automation of information management (retrieval, manipulation, integration)

    27. XML is Based on Markup

    28. Elements and their Content

    29. Element Attributes

    30. XML = Labeled Ordered Trees

    31. In Search of the Lost Structure & Semantics

    32. Adding Structure and Semantics
        XML Document Type Definitions (DTDs):
          define the structure of "allowed" documents (i.e., valid w.r.t. a DTD)
          ≈ database schema => improve query formulation, execution, ...
        XML Schema
          defines structure and data types
          allows developers to build their own libraries of interchanged data types
        XML Namespaces
          identify your vocabulary

    33. XML DTDs as Extended CFGs

    34. Document Type Definitions (DTDs)

    35. Element Declarations

    36. Element Content Declarations

    37. Attributes

    38. Attribute Types

    39. Uses of XML Entities
        Physical partition: size, reuse, "modularity", … (both XML docs & DTDs)
        Non-XML data: unparsed entities → binary data
        Non-standard characters: character entities
        Shorthand for phrases & markup

    40. Entities & Physical Structure

    41. External Text Entities

    42. Types of Entities
        Internal (to a doc) vs. External (→ use URI)
        General (in XML doc) vs. Parameter (in DTD)
        Parsed (XML) vs. Unparsed (non-XML)

    43. Internal Text Entities

    44. Unparsed (& "Binary") Entities

    45. From Docs to Data: XML Schema
        XML DTDs (part of the XML spec.)
          flexible, semistructured data model (nesting, ANY, ?, *, |, ...)
          but document-oriented (SGML heritage)
          no support for namespaces, datatypes, inheritance (e.g., type of book.title may be different from poem.title)
        XML Schema (W3C working draft)
          schema definition language in XML
          data-oriented: data types
          extends capabilities of DTD

    46. XML Schema: Example
        <type name="Order">
          <element name="name" type="string" />
          <element name="street" type="string" />
          <element name="zip" type="integer" />
          <...>
          <attribute name="orderDate" type="date" />
        </type>
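What data-type checking adds over DTD validation can be sketched without a schema processor. The following is not XML Schema validation; it is a hand-written approximation in Python in which the mapping `ORDER_TYPES`, the function `type_errors`, and both sample orders are assumptions made for illustration, mirroring the `Order` type on the slide.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical stand-in for the declared element types of "Order".
ORDER_TYPES = {"name": str, "street": str, "zip": int}


def type_errors(order):
    """Return messages for child elements or attributes with ill-typed values."""
    errors = []
    for tag, caster in ORDER_TYPES.items():
        text = order.findtext(tag)
        if text is None:
            errors.append(tag + ": missing")
            continue
        try:
            caster(text)  # e.g. int("CA") raises ValueError
        except ValueError:
            errors.append(tag + ": not a valid " + caster.__name__)
    try:
        # The orderDate attribute is expected in ISO form, e.g. 2000-05-17.
        date.fromisoformat(order.get("orderDate", ""))
    except ValueError:
        errors.append("orderDate: not a valid date")
    return errors


good = ET.fromstring(
    '<Order orderDate="2000-05-17"><name>A</name><street>B</street><zip>92093</zip></Order>'
)
bad = ET.fromstring(
    '<Order orderDate="May 17"><name>A</name><street>B</street><zip>CA</zip></Order>'
)
print(type_errors(good))
print(type_errors(bad))
```

A DTD would accept both orders, since both have the right elements in the right order; only the typed check rejects the second one.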

    47. XML Schema: Example

    48. W3C Work on XML Schemas
        Structures: specifies complex element structure and sets constraints on the permitted values of the content of those elements
        Datatypes: sets forth a standard set of content datatypes and rules for generating new types from them

    49. Further Approaches
        RELAX (REgular LAnguage description for XML)
          Standardized by INSTAC XML SWG of Japan
          Compared with DTDs, RELAX has new features:
            RELAX grammars are represented in the XML instance syntax
            RELAX borrows the rich data types of XML Schema Part 2
            RELAX is namespace-aware
        many others: XML-Data, XML-DR, DCD, SOX, DDML, DSD, Schematron, ...

    50. Normalized Data/Metadata Representation
        Resource Description Framework (RDF)
          Metadata model: the designer can describe objects, add properties to define and describe them, and also make complicated statements about the objects (statements about relationships between resources)
          The specification comes in two sections:
            Model & Syntax (viewed as directed, labeled graphs)
            RDF Schemas (using an XML vocabulary)

    51. Resource Description Framework (RDF)
        Metadata is useful for information retrieval (esp. if no other schema info or semantics is available)
        Idea: representation-independent encoding of metadata as triples (Resource, PropertyType, Value):
          (uri1, DC:creator, uri2), (uri2, vCard:name, smith), ...
        "Semantic Net"
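The triple encoding can be tried out directly. The sketch below stores the two example triples from the slide as Python tuples and follows the edges of the resulting labeled graph; the helper names `objects` and `creator_names` are hypothetical, and no RDF library is involved.

```python
# The two example triples (Resource, PropertyType, Value) from the slide.
triples = [
    ("uri1", "DC:creator", "uri2"),
    ("uri2", "vCard:name", "smith"),
]


def objects(triples, subject, prop):
    """Return all values v such that (subject, prop, v) is a stated triple."""
    return [o for (s, p, o) in triples if s == subject and p == prop]


def creator_names(triples, resource):
    """Follow DC:creator, then vCard:name: a two-step walk through the graph."""
    names = []
    for creator in objects(triples, resource, "DC:creator"):
        names.extend(objects(triples, creator, "vCard:name"))
    return names


print(creator_names(triples, "uri1"))
```

Because every statement has the same three-part shape, the same lookup works regardless of how the metadata was originally serialized.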

    52. Identifying Vocabularies
        My element may not be your element:
          geometry context: <element>line</element>
          chemistry context: <element>oxygen</element>
          SGML/XML context: ....
        use XML namespaces to identify the vocabulary

    53. XML Namespaces
        mechanism for globally unique tag names:
          <h:html xmlns:xdc="http://www.xml.com/books"
                  xmlns:h="http://www.w3.org/HTML/1998/html4">
            <h:head><h:title>Book Review</h:title></h:head>
            ...
            <xdc:bookreview>
              <xdc:title>XML: A Primer</xdc:title>
              ...
          </h:html>
        mix of different tag vocabularies without confusion
        namespaces only identify the vocabulary; additional mechanisms required for structure and meaning of tags
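How a namespace-aware parser keeps the two `title` elements apart can be sketched with Python's `ElementTree`. The document below completes the slide's elided example with an assumed `h:body` wrapper and closing tags so that it is well-formed; those additions are not from the slide.

```python
import xml.etree.ElementTree as ET

# Well-formed completion of the slide's example (h:body and the closing
# tags of xdc:bookreview are assumptions added for this sketch).
DOC = """<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
  <h:head><h:title>Book Review</h:title></h:head>
  <h:body>
    <xdc:bookreview>
      <xdc:title>XML: A Primer</xdc:title>
    </xdc:bookreview>
  </h:body>
</h:html>"""

# Prefixes are local shorthand; the URI is what identifies the vocabulary.
NS = {
    "h": "http://www.w3.org/HTML/1998/html4",
    "xdc": "http://www.xml.com/books",
}

root = ET.fromstring(DOC)
html_title = root.find(".//h:title", NS).text
book_title = root.find(".//xdc:title", NS).text

print(root.tag)      # the parser expands the prefix to {namespace-URI}localname
print(html_title)
print(book_title)
```

The parser only tells the two vocabularies apart; it still knows nothing about what either `title` means, which matches the last bullet on the slide.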

    54. Processing XML
        Non-validating parser: checks that the XML doc is syntactically well-formed
        Validating parser: checks that the XML doc is also valid w.r.t. a given DTD
        Parsing yields a tree/object representation: Document Object Model (DOM) API
        Or a stream of events (open/close tag, data): Simple API for XML (SAX)

    55. DOM Structure Model and API
        hierarchy of Node objects: document, element, attribute, text, comment, ...
        language-independent programming DOM API:
          get... first/last child, prev/next sibling, childNodes
          insertBefore, replace
          getElementsByTagName
          ...
        alternative event-based SAX API (Simple API for XML)
          does not build a parse tree (reports events when encountering begin/end tags)
          for (partially) parsing large documents

    56. DOM Summary
        Object-oriented approach to traversing the XML node tree
        Automatic processing of XML docs
        Manipulation & updating of XML on client & server
        Database interoperability mechanism
        Memory-intensive
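The navigation and update calls named in the DOM slides can be tried with Python's `xml.dom.minidom`, one of many DOM implementations. The tiny document is an assumption made for the example; it is written without whitespace between tags so that no whitespace-only text nodes appear among the children.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<book><title>Sonnets</title><author>Shakespeare</author></book>"
)
book = doc.documentElement

# Navigation: first child and next sibling.
first = book.firstChild        # the <title> element
sibling = first.nextSibling    # the <author> element

# Lookup by tag name anywhere in the document.
authors = doc.getElementsByTagName("author")

# Update in memory: build a <year> element and insert it before <author>.
year = doc.createElement("year")
year.appendChild(doc.createTextNode("1609"))
book.insertBefore(year, sibling)

child_names = [node.nodeName for node in book.childNodes]
print(child_names)
print(book.toxml())
```

The whole tree lives in memory while this runs, which is the "memory-intensive" trade-off noted in the summary.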

    57. SAX Event-Based API
        Pros:
          The whole file doesn’t need to be loaded into memory
          XML stream processing
          Simple and fast
          Allows you to ignore less interesting data
        Cons:
          limited expressive power (query/update) when working on streams
          => application needs to build (some) parse tree when necessary
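A minimal SAX handler shows the event-based style: the parser calls back on each start and end tag, and the application keeps only the state it needs. The sketch uses Python's `xml.sax`; the class name `LineCounter` and the sample input are assumptions made for the example.

```python
import xml.sax


class LineCounter(xml.sax.ContentHandler):
    """Count <line> elements and track nesting depth without building a tree."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0
        self.lines = 0

    def startElement(self, name, attrs):
        # Called once for every opening tag, in document order.
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        if name == "line":
            self.lines += 1

    def endElement(self, name):
        # Called once for every closing tag.
        self.depth -= 1


handler = LineCounter()
xml.sax.parseString(
    b"<verse><line>Shall I compare thee...</line><line>Thou art...</line></verse>",
    handler,
)
print(handler.lines, handler.max_depth)
```

Only three integers are kept regardless of document size; anything that needs to look back at earlier elements would have to store them itself, which is the limitation listed under "Cons".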

    58. Querying XML
        What can be done to XML so far:
          generation: from HTML, DBs, manually, …
          parsing: with/without DTD (valid/well-formed XML)
          accessing: APIs for XML applications: DOM (in memory, tree-based), SAX (event-based)
        Now: Query languages for XML
          XML-QL, XMAS, XPath, XSL(T), XQL, ...

    59. Querying XML
        Why not just query XML with SAX or DOM?
          SAX: very simple “event-based” queries: ok
          DOM: simple navigational queries (getChildNodes, getNextSibling, getElementsByTagName, …): ok
        But:
          these are “low-level” APIs ≈ iterator/cursor API for RDBs (but more powerful!)
          used to write XML applications
          “high-level” querying, restructuring and transformation (and updates??) is tedious
          => analogue to high-level relational query languages (SQL, QBE, Logic (Datalog), …)
          => Query languages for XML

    60. Querying XML
        No "official" W3C XML QL yet (but bits and pieces)
        numerous quite different XML QLs are popping up
        some XML QL overviews, comparisons, and resources:
          XML Query Languages: Experiences and Exemplars (co-authored by several XML QL gurus)
          XML and Query Languages (Oasis Cover Pages)
          Comparative Analysis of Five XML Query Languages (A. Bonifati, S. Ceri)
          A Data Model and Algebra for XML Query (Philip Wadler et al.; “functional (Haskell) perspective”)
          XML-QL vs XSLT queries (Geert Jan Bex and Frank Neven; for (future) XSLT experts only ;-)
          Introduction to XMAS (the XML QL of the MIX project)
        children of the “(semistructured) database(s) crowd”: XML-QL, YaTL, Lorel, …
        … from the “functional crowd”: …
        … from the “document processing folks”: XQL, XSL(T), XPath, ...
        XPath: W3C Recommendation
          Powerful pattern language for selecting parts of XML docs
          Used by XSL(T), XPointer, and XQL
        XQL: based on XPath; Browser: IE5; XML DBs: Excelon, Tamino, Perl, …

    61. Querying XML
        Different XML QL paradigms depending on the community:
          (relational, oo, semistructured) database perspective: Lorel, YaTL, XML-QL, XMAS, FLORA/FLORID, ...
          document processing perspective: XQL, XSL(T), XPath, ...
          functional programming perspective: QLs with structural recursion, …

    62. Important QL Features (DB Perspective)
        typical parts of a query:
          (match) pattern (selects parts of the source XML tree without looking at data)
          filter condition (selects further, now looking at the data)
          answer construction (putting the results together, possibly reordered, grouped, etc.)
        reordering based on nested queries, grouping, sorting, or Skolem functions
        tag variables, path expressions for defining the patterns without requiring knowledge of the DTD
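The three typical parts of a query (pattern, filter, answer construction) can be mimicked by hand to see what a query language would express declaratively. This sketch uses Python's `ElementTree`; the bookstore data, the function `cheap_books`, and the output tag names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

SOURCE = ET.fromstring("""<bookstore>
  <book><title>Sonnets</title><price>8</price></book>
  <book><title>The Road Ahead</title><price>25</price></book>
  <magazine><title>Weekly</title><price>3</price></magazine>
</bookstore>""")


def cheap_books(source, limit):
    """Return a new tree listing titles of books cheaper than `limit`."""
    answer = ET.Element("cheap_books")
    for book in source.findall("book"):              # 1. match pattern (structure only)
        if float(book.findtext("price")) < limit:    # 2. filter condition (looks at data)
            item = ET.SubElement(answer, "item")     # 3. answer construction
            item.text = book.findtext("title")
    return answer


result = cheap_books(SOURCE, 10)
serialized = ET.tostring(result, encoding="unicode")
print(serialized)
```

In a declarative XML query language the loop, the test, and the construction would be stated as one expression, leaving the evaluation order to the query processor.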

    63. Selection Queries with XQL/XPath
        Find the root element (bookstore) of this document:
          /bookstore
        Find all author elements anywhere within the current document:
          //author
        Find all books where the value of the style attribute on the book is equal to the value of the specialty attribute of the bookstore element at the root of the document:
          //book[/bookstore/@specialty = @style]

    64. Sample Queries with XQL/XPath
        Find the root element (bookstore) of this document:
          /bookstore
        Find all author elements anywhere within the current document:
          //author
        Find all books where the value of the style attribute on the book is equal to the value of the specialty attribute of the bookstore element at the root of the document:
          //book[/bookstore/@specialty = @style]
        Find all books with author/first-name equal to 'Bob' and all magazines with price less than 10:
          //(book[author/first-name = 'Bob'] $union$ magazine[price $lt$ 10])
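The sample queries can be approximated in Python. `ElementTree` supports only a limited subset of XPath, so the simple path queries use it directly while the attribute comparison and the union are emulated in plain Python. The bookstore document is an assumption made up for this sketch, and this is not an XQL engine.

```python
import xml.etree.ElementTree as ET

STORE = ET.fromstring("""<bookstore specialty="novel">
  <book style="novel"><author><first-name>Bob</first-name></author><price>12</price></book>
  <book style="textbook"><author><first-name>Mary</first-name></author><price>55</price></book>
  <magazine style="glossy"><price>4</price></magazine>
  <magazine style="glossy"><price>15</price></magazine>
</bookstore>""")

# /bookstore : the root element
root_tag = STORE.tag

# //author : all author elements anywhere in the document
authors = STORE.findall(".//author")

# //book[/bookstore/@specialty = @style] : emulated, since comparing two
# attributes inside a predicate is outside ElementTree's XPath subset
specialty = STORE.get("specialty")
special_books = [b for b in STORE.iter("book") if b.get("style") == specialty]

# //(book[author/first-name = 'Bob'] $union$ magazine[price $lt$ 10]) : emulated
bobs = [b for b in STORE.iter("book") if b.findtext("author/first-name") == "Bob"]
cheap_mags = [m for m in STORE.iter("magazine") if float(m.findtext("price")) < 10]
union = bobs + cheap_mags   # simple concatenation, not re-sorted into document order

print(root_tag, len(authors), len(special_books), [e.tag for e in union])
```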

    65. Presenting XML: Extensible Stylesheet Language (XSL)
        Why Stylesheets?
          separation of content (XML) from presentation (XSL)
        Why not just CSS for XML?
          XSL is far more powerful:
            selecting elements
            transforming the XML tree
            content-based display (result may depend on data)
        separation => XML for apps/the machine, XSL to produce human-readable output
        selections: regular path expressions, select n-th child, string operations
        transformations: filter, reorder, restructure the tree (= query capabilities), e.g. TOC, strip details, inline results of PIs, e.g. sorted table results of a query
        content-based: if value < 0 then red else black

    66. XSL Overview
        XSL stylesheets are denoted in XML syntax
        XSL components:
          1. a language for transforming XML documents (XSLT: integral part of the XSL specification)
          2. an XML formatting vocabulary (Formatting Objects: >90% of the formatting properties inherited from CSS)
        1. will be the focus
        2. FOs denote typographic abstractions such as page, paragraph; finer-level control (= properties): word- and letter-spacing; and widow, orphan, and hyphenation control

    67. XSLT Processing Model

    68. XSLT Processing Model
        XSL stylesheet: collection of template rules
        template rule: (pattern → template)
        main steps:
          match pattern against source tree
          instantiate template (replace current node “.” by the template in the result tree)
          select further nodes for processing
        control can be a mix of
          recursive processing ("push": <xsl:apply-templates> ...)
          program-driven ("pull": <xsl:for-each> ...)
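The match/instantiate/select cycle can be imitated in a few lines of Python to show the control flow of "push" processing. This is a toy model, not an XSLT processor: patterns are plain tag names, rule order stands in for conflict resolution, output is built as a string without escaping, and the names `apply_templates`, `children`, and `RULES` are all hypothetical.

```python
import xml.etree.ElementTree as ET


def apply_templates(node, rules):
    """Match: find the first rule whose pattern fits the node, then instantiate it."""
    for pattern, template in rules:
        if pattern == node.tag or pattern == "*":
            return template(node, rules)
    return ""


def children(node, rules):
    """Select further nodes: recursively process all child elements ("push")."""
    return "".join(apply_templates(child, rules) for child in node)


# Template rules as (pattern, template) pairs; the first matching rule wins.
RULES = [
    ("verse", lambda n, r: "<ul>" + children(n, r) + "</ul>"),
    ("line", lambda n, r: "<li>" + (n.text or "") + "</li>"),
    ("*", lambda n, r: children(n, r)),  # default rule: just descend
]

source = ET.fromstring(
    "<quote><verse><line>a</line><line>b</line></verse></quote>"
)
output = apply_templates(source, RULES)
print(output)
```

A "pull" style would instead have the `verse` rule loop over its `line` children itself and format them inline, rather than handing them back to the rule set.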

    69. But first: some syntactic sugar, PLEASE...
        instead of something complicated like
          y=f(x)
        in the brave new XSLT world you can “simply” write this as:
          <xsl:variable name="y">
            <xsl:call-template name="f">
              <xsl:with-param name="x"/>
            </xsl:call-template>
          </xsl:variable>

    70. XML Tutorial, Bertram Ludäscher 70 Template Rule: Example

    71. XML Tutorial, Bertram Ludäscher 71 Match/Select Patterns match patterns ⊆ select patterns = defined in http://w3.org/TR/xpath Examples: /mybook/chapter[2]/section/* chapter|appendix chapter//para div[@class="appendix" and position() mod 2 = 1]//para ../@lang
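
A short sketch of where such expressions appear in a stylesheet. It reuses the slide's chapter, appendix, para, and lang names; the HTML output elements are an assumption for the example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Match pattern: a union, so the rule applies to chapter and appendix. -->
  <xsl:template match="chapter|appendix">
    <div>
      <!-- Select expression: every para descendant of the current node. -->
      <xsl:apply-templates select=".//para"/>
    </div>
  </xsl:template>

  <xsl:template match="para">
    <!-- ../@lang uses the parent axis: fine as a select expression,
         but not permitted as a step of a match pattern. -->
    <p lang="{../@lang}"><xsl:value-of select="."/></p>
  </xsl:template>

</xsl:stylesheet>
```

In XSLT 1.0, match patterns may only step along the child and attribute axes (plus the // operator), whereas select attributes accept any XPath expression. That is why ../@lang can only be used on the select side.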

    72. XML Tutorial, Bertram Ludäscher 72 XSLT Processing Flavors: Recursive Descent Processing

    73. XML Tutorial, Bertram Ludäscher 73 Creating the Result Tree... Literal result elements: non-XSL elements (e.g., HTML) appear “literally” in the result tree Constructing elements: (similar for xsl:attribute, xsl:text, xsl:comment,…) Generating text:
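
The three ways of building the result tree named on this slide might look as follows. This is a sketch only: the item element and its url and date attributes are hypothetical and do not come from the original slides.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="item">
    <!-- Literal result element: li is copied to the result tree as is. -->
    <li>
      <!-- Constructed element and attribute: names are given as strings. -->
      <xsl:element name="a">
        <xsl:attribute name="href">
          <xsl:value-of select="@url"/>
        </xsl:attribute>
        <xsl:value-of select="title"/>
      </xsl:element>
      <!-- Generated text: xsl:text preserves the whitespace exactly. -->
      <xsl:text> (</xsl:text>
      <xsl:value-of select="@date"/>
      <xsl:text>)</xsl:text>
      <xsl:comment>generated by XSLT</xsl:comment>
    </li>
  </xsl:template>

</xsl:stylesheet>
```

Literal result elements are usually the more readable choice; xsl:element is mainly needed when the element name has to be computed at run time.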

    74. XML Tutorial, Bertram Ludäscher 74 Creating the Result Tree... Further XSL elements for ... Numbering <xsl:number value="position()" format="1 "/> Conditions <xsl:if test="position() mod 2 = 0"> ... </xsl:if> Repetition...

    75. XML Tutorial, Bertram Ludäscher 75 Creating the Result Tree: Repetition
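
Numbering, conditions, and repetition can be combined in a single template. The sketch below assumes an employees/employee/name/last structure like the one used in the sorting example; the table layout and the class name even are made up for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="employees">
    <table>
      <!-- Repetition: one table row per employee child. -->
      <xsl:for-each select="employee">
        <tr>
          <!-- Condition: mark every second row. The attribute must be
               added before the row receives any child nodes. -->
          <xsl:if test="position() mod 2 = 0">
            <xsl:attribute name="class">even</xsl:attribute>
          </xsl:if>
          <!-- Numbering: 1. 2. 3. ... based on the position in the loop. -->
          <td><xsl:number value="position()" format="1. "/></td>
          <td><xsl:value-of select="name/last"/></td>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>

</xsl:stylesheet>
```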

    76. XML Tutorial, Bertram Ludäscher 76 Creating the Result Tree: Sorting <xsl:template match="employees"> <ul> <xsl:apply-templates select="employee"> <xsl:sort select="name/last"/> <xsl:sort select="name/first"/> </xsl:apply-templates> </ul> </xsl:template> <xsl:template match="employee"> <li> <xsl:value-of select="name/first"/> <xsl:text> </xsl:text> <xsl:value-of select="name/last"/> </li> </xsl:template> sort by last name, then by first name

    77. XML Tutorial, Bertram Ludäscher 77 More on XSL XSL(T): Conflict resolution for multiple applicable rules. Modularization: <xsl:include> <xsl:import> … XSL Formatting Objects à la CSS. XPath (navigation syntax + functions) = XSLT ∩ XPointer ... FO: pagination and layout, block formatting, tables, lists. Properties: background, border, font. XPointer: provides for specific reference to elements, character strings, selections, and other parts of XML documents, whether or not they bear an explicit ID attribute, using traversals of a document's structure and choice of parts based on their properties such as element types, attribute values, character content, and relative position, containment, and order. XPointer defines the meaning of the "selector" or "fragment identifier" portion of URIs that locate resources of MIME media types "text/xml" and "application/xml".
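
Conflict resolution among several applicable rules can be illustrated with template priorities. The element names para and note and the role attribute below are assumptions for the example. The default priority values in the comments are those defined by XSLT 1.0.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Default priority 0: a plain element name. -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Default priority 0.5: a more specific path, so this rule wins
       over the one above for para elements inside note. -->
  <xsl:template match="note/para">
    <p class="note"><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Explicit priority: beats both defaults, including the 0.5 that
       a pattern with a predicate would otherwise get. -->
  <xsl:template match="para[@role='warning']" priority="2">
    <p class="warning"><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

Rules brought in with <xsl:import> have lower import precedence than the rules of the importing stylesheet, and precedence is checked before priority. Rules brought in with <xsl:include> are treated as if they had been written in place.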

    78. XML Tutorial, Bertram Ludäscher 78 The MIX Project: Mediation of Information using XML Joint effort between SDSC and the UCSD CSE Department

    79. XML Tutorial, Bertram Ludäscher 79 Mediation of Information using XML (MIX)

    80. XML Tutorial, Bertram Ludäscher 80 Integrated / Mediated views

    81. XML Tutorial, Bertram Ludäscher 81 A Typical Mediation Scenario

    82. XML Tutorial, Bertram Ludäscher 82 MIX Components MIXm Mediator tool-kit allows definition of views across multiple resources views are expressed in a declarative query language query engine to execute queries on views XML Matching And Structuring (XMAS) query language operates on a given set of XML documents to produce a new XML document, using XMAS algebra

    83. XML Tutorial, Bertram Ludäscher 83 An XML Query (XMAS)

    84. XML Tutorial, Bertram Ludäscher 84 MIX components... DOM-VXD: DOM Virtual XML Document extension, a “lazy” implementation of DOM. Supports browsing/navigation of XML documents with a server-side, “compute as you go” model. Blended Browsing and Querying (BBQ) interface: supports navigation and querying of XML documents; generates XMAS queries on mediator views; generates XMAS queries modified by DOM-VXD operations to incrementally evaluate the result set, to support navigation of XML documents.

    85. XML Tutorial, Bertram Ludäscher 85 Navigation driven evaluation In general: A lazy mediator computes the result of each client navigation of the virtual result by issuing navigations to the sources and processing the results of source navigations. Thus the mediator acts as a translator of navigational commands (DOM).

    86. XML Tutorial, Bertram Ludäscher 86 Blended Browsing and Querying UI (BBQ)

    87. XML Tutorial, Bertram Ludäscher 87 Another MIX Example: CDL/AMICO Mediator Prototype

    88. XML Tutorial, Bertram Ludäscher 88 XSL Stylesheet for AMICO Answer Docs

    89. XML Tutorial, Bertram Ludäscher 89 ... and the Result (+BBQ)

    90. XML Tutorial, Bertram Ludäscher 90 Projects at DICE/SDSC National Archives and Records Administration, NARA Persistent Archives and Electronic Records NHPRC/NARA XML and GIS aXioMap I2T: An Information Integration Testbed for Digital Government

    91. XML Tutorial, Bertram Ludäscher 91 Projects at SDSC (… cont) AMICO In conjunction with the California Digital Library (CDL) Part of the NSF DLI-2 project ESRI Community of Science, Inc. Networked Earthquake Engineering Simulation (NEES) NSF program

    92. XML Tutorial, Bertram Ludäscher 92 Information Based Computing

    93. XML Tutorial, Bertram Ludäscher 93 Integrating Data Set Management Model-Based Information Management: rule-based ontology mapping, conceptual-level mediation (CMIX); Data Grid: data federation across multiple libraries (MIX); Digital Library: interoperable services for information discovery and presentation (SDLIP); Data Collection: tools for managing data set collections on databases (MCAT); Data Handling: systems for data retrieval from remote storage (SRB); Persistent Archives: storage of data collections for 30 years.

    94. XML Tutorial, Bertram Ludäscher 94 Model-Based Mediation Knowledge-based mediation conceptual-level integration Rule-based ontology maps map source XML to CM to FL (ontologies, views) Models for exporting rules integrity constraints query capabilities data & schema (XML/DTDs)

    95. XML Tutorial, Bertram Ludäscher 95 Federation of Brain Data

    96. XML Tutorial, Bertram Ludäscher 96 Further Information xml.com w3.org xml.org ibm.com/xml ... Mediation of Information using XML (MIX): www.npaci.edu/DICE/MIX/ www.db.ucsd.edu/Projects/MIX/
