1 / 68

XML 101: Hype or Hoax ?

XML 101: Hype or Hoax ?. IST 597 Nov. 10, 2005 (Guest lecture in James Wang’s class) Dongwon Lee, Ph.D. Overview. HTML Model XML Model RDBMS vs. XML XML Schema XML Tools …. www.w3c.org. www.oasis-open.org. History of HTML. HTML: Hyper-Text Markup Language

macy
Download Presentation

XML 101: Hype or Hoax ?

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. XML 101: Hype or Hoax ? IST 597 Nov. 10, 2005 (Guest lecture in James Wang’s class) Dongwon Lee, Ph.D.

  2. Overview • HTML Model • XML Model • RDBMS vs. XML • XML Schema • XML Tools • … Nov. 10, 2005

  3. www.w3c.org Nov. 10, 2005

  4. www.oasis-open.org Nov. 10, 2005

  5. History of HTML • HTML: Hyper-Text Markup Language • Invented by Tim Berners-Lee and Robert Caillau at CERN in 1991 • What is hyper-text? • A document that contains links to other documents (and text, sound, images...) • Invented around 1945 by Vannevar Bush • What is a markup language? • A notation for writing text with markup tags (<tag>) • Tags indicate the structure of the text • Tags have names and attributes • Tags may enclose a part of the text • Invented around 1970 by Charles F. Goldfarb (SGML) Nov. 10, 2005

  6. HTML • HTML was designed to display data and to focus on how data looks. Nov. 10, 2005

  7. XML • What does XML stand for (GRE question)? • X-rated Modular 3 Language • eXtensible Standard ML Programming Language • eXtensible UML (Unified Modeling Language) for the Web • eXtensible Meta Language for the Web • eXtensible Markup Language originating from SGML • None of the above Nov. 10, 2005

  8. XML • XML is a framework for defining markup languages: • XML was designed to describe data, not format. • There is no fixed collection of markup tags. One must define your own tags, tailored for our kind of information • Allow tailor-made markup for any imaginable application domain • XML uses a schema language (eg, DTD, XML-Schema) to formally describe the data. • XML separates syntax from semantics to provide a common framework for structuring information • Web browser rendering semantics is separately defined by stylesheets Nov. 10, 2005

  9. XML • XML is not a replacement for HTML: • HTML should ideally be just another XML language • in fact, XHTML is just that • XHTML is a (very popular) XML language for hypertext markup • HTML is about displaying information, but XML is about describing information. Nov. 10, 2005

  10. <center> <h1>SIGMOD</h1> <p><b> <u>ACM</u> <a href=“sigmod02.org”>SIGMOD Conference</a>, Madison, WI, 2002 </b></p> </center> <event eID=“sigmod02”> <acronym>SIGMOD</acronym> <society>ACM</society> <url>www.sigmod02.org</url> <loc> <city>Madison</city> <state>WI</state> </loc> <year>2002</year> </event> HTML vs. XML XML HTML Nov. 10, 2005

  11. Nov. 10, 2005

  12. HTML vs. XML • Need a stylesheet to define browser presentation semantics HTML XML CSS/XSL Browser Browser Nov. 10, 2005

  13. HTML vs. XML • Need a stylesheet to define browser presentation semantics Data/Format Data Format Browser Browser Nov. 10, 2005

  14. Database Perspective • DB must support • Capture • Storage • Retrieval • Exchange • XML originally as the language to exchange data over Web • Replacing EDI (Electronic Data Interchange) Nov. 10, 2005

  15. RDBMS vs. XML Example • AFV receives 100+ videos every week • [IST210 HW#2, 2004] Build a DB to be able to answer the following queries: • Who sent which videos? • Show me all videos about Cat category • How many videos in a database since Jan 1, 2003? • Which is the video with the best rating for the 1st week of Jan? • Where does the sender James live? Phone? Gender? • How many videos does James send so far? • Show me all the ghost videos (ones without sender information) Nov. 10, 2005

  16. ER Model Category Videos VID Date Rating Sends Phone Name Owners Gender Address Nov. 10, 2005

  17. ER => RDBMS Category Videos Videos VID Date Rating Sends Sends Owners Phone Name Owners Gender Address Nov. 10, 2005

  18. RDBMS Video Sends Owners Nov. 10, 2005

  19. Changes #1 • VHS video => VHS, CD, DVD • 100+ videos => 1 million videos Nov. 10, 2005

  20. RDBMS Video Sends Owners Nov. 10, 2005

  21. Changes #2 • Arbitrary name formats for owners • Eg, J. Doe vs. Dr. “Jonny” John Jay Doe Jr • 100+ different ways to capture owners’ information • “100 E. Foster Ave #200, State College, PA, 16801” vs. • adr1=“100 E. Foster #210”, adr2=“State College, PA, 16801” • street=“100 E. Foster, #200”, city=“State College”, state=“PA”, zip=“16801” • 100+ different video formats with varying properties => 1000+ attributes for Videos Nov. 10, 2005

  22. RDBMS: Finest Granularity Video Sends Owners Nov. 10, 2005

  23. RDBMS: Coarsest Granularity Video Sends Owners Nov. 10, 2005

  24. Violation Of 1NF RDBMS: Ideal Case Video Sends Owners Nov. 10, 2005

  25. XML Video <VideoTable> <Video vid=“100” category=“comedy” date=“2005/1/1” rating=“5” att2=“10” att1000=“T1” /> <Video vid=“200” category=“action” date=“2005/1/10” rating=“4” att1=“20” /> <Video vid=“300” category=“SF” date=“2004/12/31” rating=“5” att1000=“S20” /> </VideoTable> Nov. 10, 2005

  26. Address Adr1 Adr2 State Address Street City State Zip XML Owners <OwnerTable> <Owner phone=“564-3456” gender=“F”> <Name FN=“Jenny” /> <Address> <Street>310 N. Atherton</Street><City>State College</City> <State>PA</State><Zip>16801</Zip> </Address> </Owner> <Owner phone=“123-4567” gender=“M”> <Name Prefix=“Dr.” NN=“Jonny” FN=“John” MN=“Jay” LN=“Doe” Suffix=“Jr.” /> <Address> <Adr1>10 S, Beaver</Adr1> <Adr2><State>PA</State></Adr2> </Addres> </Owner> </OwnerTable> Nov. 10, 2005

  27. Structured model Large scale data Limited semantics Focus: how to handle large size data efficiently? Unstructured or Semi-structured model Small to Medium scale data Document vs. data Flexible and rich semantics Focus: how can one handle large number of small size data with various formats efficiently? RDBMS vs. XML RDBMS XML Nov. 10, 2005

  28. A conceptual view of XML • An XML document is an ordered, labeled tree • Character data leaf nodes contain the actual data (text strings) • Elements nodes are each labeled with • a name (often called the element type), and • a set of attributes, each consisting of a name and a value, • and can have child nodes Nov. 10, 2005

  29. A concrete view of XML • An XML document is a text with markup tags and other meta-information. • Markup tags denote elements:   ..<foo attr="val" ...>bar</foo>...     |     |                | |     |     |                | a matching element end tag     |     |               the contents “bar” of the element     |     an attribute with name attr and value val, enclosed by ' or "an element start tag with name foo • An XML document must be well-formed: • start and end tags must match • element tags must be properly nested Nov. 10, 2005

  30. Example of XML document <?xml version="1.0"?> <note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note> • The XML declaration (1st line) should always be included; root element is <note> • In XML all elements must have a closing tag: • <p>foo<p>bar => <p>foo</p><p>bar</p> • XML tags are case sensitive: • <Message>This is incorrect</message> • Must be properly nested within each other: • <b><i>This is incorrect</b></i> Nov. 10, 2005

  31. Element vs. Attribute • The same information can be captured by either Element or Attribute in XML <event eID=“sigmod02”> <acronym>SIGMOD</acronym> <society>ACM</society> <url>www.sigmod02.org</url> <loc> <city>Madison</city> <state>WI</state> </loc> <year>2002</year> </event> <event eID=“sigmod02” acronym=“SIGMOD” society=“ACM url=“www.sigmod02.org” city=“Madison” state=“WI” year=“2002”/> Nov. 10, 2005

  32. Applications of XML • XML is a meta-language to create another languages; the main application of XML is making new languages • XHTML: W3C's XMLization of HTML 4.0. <?xml version="1.0" encoding="UTF-8"?> <html xmlns=http://www.w3.org/1999/xhtml xml:lang="en"> <head><title>Hello world!</title></head> <body><p>foobar</p></body> </html> Nov. 10, 2005

  33. Applications of XML (cont.) • CML: Chemical Markup Language • There are +1000 new markup languages made by XML (eg, www.schema.net) <molecule id="METHANOL"> <atomArray> <stringArray builtin="elementType">C O H H H H</stringArray> <floatArray builtin="x3" units="pm"> -0.748 0.558 -1.293 -1.263 -0.699 0.716 </floatArray> </atomArray> </molecule> Nov. 10, 2005

  34. Why is XML NOT important? • Some people believe that XML is NOT important • The best answer that I have gotten so far is (from SIGMOD 2001 Panel Talk): XML is NOT important because … • XML is a wire language • But, world is going for wireless • What do you think? Nov. 10, 2005

  35. Why is XML important? • Technically, … Nothing; Just old simple tree model… • Non-technically, … • Hot ($$$) • The standard for representation of Web information • The real force of XML is generic languagesandtools! • By building on XML, you get a massive (standard) infrastructure for free Nov. 10, 2005

  36. XML Usages… Nov. 10, 2005

  37. XML Offers • Common extensions to the core XML specification: a namespace mechanism, document inclusion, etc. • Schemas: grammars to define classes of documents • Linking between documents: a generalization of HTML anchors and links • Transformation: conversion from one document class to another • Querying: extraction of information, generalizing relational databases • More… Nov. 10, 2005

  38. Nov. 10, 2005

  39. XML Schema Languages • A schema is a formal description of structures and constraints • Relational schema consists of table, column, constraints, etc. • A schema language is a formal language for expressing schemas. • Schema processing: Given an XML document and a schema, a schema processor checks for validity, i.e. that the document conforms to the schema requirements Nov. 10, 2005

  40. XML Schema Languages (cont) Nov. 10, 2005

  41. Why Bother? • Why bother formalizing the syntax with a schema? • A formal definition provides a precise but human-readable reference • Schema processing can be done with existing implementations • Your own tools for your language can benefit: by piping input documents through a schema processor, you can assume that the input is valid and defaults have been inserted Nov. 10, 2005

  42. Which Schema Language? • Many proposals competing for acceptance • W3C Proposals: DTD, XML Data, DCD, DDML, SOX, XML-Schema, … • Non-W3c Proposals: Assertion Grammars, Schematron, DSD, TREX, RELAX, XDuce, RELAX-NG, … • Different applications have different needs from a schema language Nov. 10, 2005

  43. Expressive Power (content model) DTD XML-Schema XDuce, RELAX-NG Nov. 10, 2005

  44. Closure (content model) DTD XML-Schema XDuce, RELAX-NG Closed under INTERSECT Closed under INTERSECT, UNION, DIFFERENCE Nov. 10, 2005

  45. DTD: Document Type Definition • XML DTD is a subset of SGML DTD • XML DTD is the standard XML Schema Language of the past (and present maybe…) • It is one of the simplest and least expressive schema languages proposed for XML model • It does not use XML tag notation, but use its own weird notation • It cannot express relatively complex constraint (eg, key) well • It is being replaced by XML-Schema of W3C and RELAX-NG of OASIS Nov. 10, 2005

  46. DTD: Elements • <!ELEMENT element-namecontent-model>associates a content model to all elements of the given name content models: • EMPTY: no content is allowed • ANY: any content is allowed • (#PCDATA|element-name|...): "mixed content", arbitrary sequence of character data and listed elements Nov. 10, 2005

  47. DTD: Elements • Regular Expression over element names: sequence of elements matching the expression • choice: (...|...|...) • sequence: (...,...,...) • optional: ...? • zero or more: ...* • one or more: ...+ Nov. 10, 2005

  48. DTD: Elements • Eg: “Name” element consists of an • optional FirstName, followed by • mandatory LastName elements, where • Both are text string <!ELEMENT Name (FirstName? , LastName) <!ELEMENT FirstName (#PCDATA)> <!ELEMENT LastName (#PCDATA) Nov. 10, 2005

  49. DTD: Attributes • <!ATTLIST element-nameattr-nameattr-typeattr-default ...> declares which attributes are allowed or required in which elements attribute types: • CDATA: any value is allowed (the default) • (value|...): enumeration of allowed values • ID, IDREF, IDREFS: ID attribute values must be unique (contain "element identity"), IDREF attribute values must match some ID (reference to an element) • ENTITY, ENTITIES, NMTOKEN, NMTOKENS, NOTATION: just forget these... (consider them obsolete) Nov. 10, 2005

  50. DTD: Attributes • Attribute defaults: • #REQUIRED: the attribute must be explicitly provided • #IMPLIED: attribute is optional, no default provided • "value": if not explicitly provided, this value inserted by default • #FIXED "value": as above, but only this value is allowed Nov. 10, 2005

More Related