
XML

XML. SNU OOPSLA Lab. October 2005. Contents. Semistructured Data Introduction History XML Application DTD & XML Schema DOM & SAX Summary Online Resources. Semistructured Data(1/3). Semistructured Data and XML Integration of heterogeneous sources Data sources with non-rigid structure


Presentation Transcript


  1. XML SNU OOPSLA Lab. October 2005

  2. Contents • Semistructured Data • Introduction • History • XML Application • DTD & XML Schema • DOM & SAX • Summary • Online Resources

  3. Semistructured Data(1/3) • Semistructured Data and XML • Integration of heterogeneous sources • Data sources with non-rigid structure • Biological data • Web data • Characteristics of Semistructured Data • Missing or additional attributes • Multiple attributes • Different types in different objects • Heterogeneous collections self-describing, irregular data, no a priori structure

  4. Semistructured Data(2/3) Data Model: Object Exchange Model (OEM) [Figure: OEM graph. The root Bib (&o1) is a complex object with paper and book edges to &o12, &o24 and &o29. Further edges are labelled author, title, year, publisher, http, page, references, firstname, lastname, first and last, and end in atomic objects such as “Serge”, “Abiteboul”, “Victor”, “Vianu”, 1997, 122 and 133.]

  5. Semistructured Data(3/3) Syntax for Semistructured Data Bib: &o1 { paper: &o12 { … }, book: &o24 { … }, paper: &o29 { author: &o52 “Abiteboul”, author: &o96 { firstname: &o243 “Victor”, lastname: &o206 “Vianu”}, title: &o93 “Regular path queries with constraints”, references: &o12, references: &o24, pages: &o25 { first: &o64 122, last: &o92 133} } }
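
The OEM syntax above can be mirrored in a small data structure. The following is only a sketch, assuming Python; the class name `OEMObject` and its `children` helper are hypothetical and not part of OEM itself. Each object has an identifier and either an atomic value or a list of labelled edges, so repeated labels (two `author` edges), mixed types (one author is a string, the other a complex object) and shared references all fit without a fixed schema.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class OEMObject:
    """Hypothetical OEM node: an oid plus an atomic value or labelled edges."""
    oid: str
    value: Union[str, int, list]  # atomic value, or list of (label, OEMObject)

    def children(self, label):
        """Return every child object reached through an edge with this label."""
        if not isinstance(self.value, list):
            return []  # atomic objects have no outgoing edges
        return [obj for (lbl, obj) in self.value if lbl == label]


o12 = OEMObject("&o12", [])  # paper; its contents are elided on the slide
o24 = OEMObject("&o24", [])  # book; its contents are elided on the slide
o29 = OEMObject("&o29", [
    ("author", OEMObject("&o52", "Abiteboul")),
    ("author", OEMObject("&o96", [
        ("firstname", OEMObject("&o243", "Victor")),
        ("lastname", OEMObject("&o206", "Vianu")),
    ])),
    ("title", OEMObject("&o93", "Regular path queries with constraints")),
    ("references", o12),  # shared reference, not a copy
    ("references", o24),
    ("pages", OEMObject("&o25", [
        ("first", OEMObject("&o64", 122)),
        ("last", OEMObject("&o92", 133)),
    ])),
])
bib = OEMObject("&o1", [("paper", o12), ("book", o24), ("paper", o29)])

authors = o29.children("author")  # two objects of different types
```

Storing edges as a list of pairs rather than a dictionary is what allows the same label to appear more than once under one object.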

  6. Introduction(1/4) • XML • An acronym for ‘eXtensible Markup Language’ • A meta-language that describes other languages • A data format for storing structured and semi-structured text for dissemination and ultimate publication, perhaps on a variety of media

  7. Introduction(2/4) • Properties • Tags enclose identifiable parts of the document • Self-describing • Physical/logical structure • Physical structure : allows components of the document, called entities, to be named and stored separately • Logical structure : allows a document to be divided into named units and sub-units, called elements

  8. Introduction(3/4) [Figure: physical structure vs. logical structure of a document. Physical structure: entities, either internal or separate. Logical structure: elements, arranged as units and sub-units.]

  9. Introduction(4/4) XML markup <warning> <para> This substance is hazardous to health </para> <para> See procedure 12A.7 for information on protective clothing required. </para> <logo …/> </warning> <transaction> <time date="19980509"/> <amount>123</amount> <currency type="pounds"/> <from id="x98765">J. Smith</from> <to id="x56565">M. Jones</to> </transaction> XML document

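  10. History(1/2) [Figure: timeline of generalized markup (GM) from 1960, through SGML (1986) and HTML (1992), to XML (1997), shown alongside the Internet and the WWW.] GM = Generalized Markup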

  11. History(2/2) • 1960’s, IBM GML (Generalized Markup Language) • 1980’s, ISO 8879, SGML (Standard Generalized Markup Language) • Early 1990’s, HTML (HyperText Markup Language) • 1996, W3C’s XML • 1998, XML 1.0 • 1999, RDF (Resource Description Framework)

  12. Application: data exchange applications [Figure: boxes for DBMS, XML, DTD, Parser, SAX events, DOM tree and DOM API, XSL Processor, HTML, Browser, and application code in ASP, Java or VB.] • DOM (Document Object Model) • SAX (Simple API for XML) • XSL (eXtensible Stylesheet Language) • ASP (Active Server Pages)

  13. An XML Document <?xml version="1.0"?> <!DOCTYPE sigmodRecord SYSTEM "sigmodRecord.dtd"> <sigmodRecord><issue> <volume>1</volume> <number>1</number> <articles><article> <title> XML Research Issues</title> <initPage> 1 </initPage> <endPage> 5 </endPage> <authors> <author AuthorPosition="00"> Tom Hanks </author> … </authors></article></articles></issue> </sigmodRecord>

  14. DTD(1/2) • DTD (Document Type Definition) • An optional but powerful feature of XML • Comprises a set of declarations that define a document structure tree • Some XML processors read the DTD and use it to build the document model in memory • Establishes formal document structure rules • It defines the elements and dictates where they may be applied in relation to each other

  15. DTD(2/2) • Declare vs. Define • Declare: “This document is a concert poster” • Define: “A concert poster must have the following features” • A DTD defines • Element types + Attributes + Entities • Valid vs. Invalid • Valid: conforms to the DTD • Invalid: fails to conform to the DTD [Figure: valid XML documents shown as a subset of well-formed XML documents.]
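
The difference between well-formed and valid can be probed from code. The sketch below, assuming Python's standard `xml.etree.ElementTree` module, checks only well-formedness: that parser does not validate a document against a DTD, so checking validity would need a validating parser, which is not shown here. The helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML; this says nothing about validity."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Well-formed: every tag is closed and every attribute value is quoted.
good = '<transaction><amount>123</amount><to id="x56565">M. Jones</to></transaction>'

# Not well-formed: the closing quote of the id attribute is missing.
bad = '<transaction><amount>123</amount><to id="x56565>M. Jones</to></transaction>'

results = (is_well_formed(good), is_well_formed(bad))
```

A document that passes this check may still be invalid, for example if it contains an element that its DTD does not declare.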

  16. XML Schema • Schema • W3C standard : specifies structure of XML documents • Data types for elements/attributes • String, int, float • Unordered set is also allowed • Derivation of types is allowed • Replaces DTDs • Schemas are themselves written in XML, which removes the syntactic distinction between DTD and XML • Richer types compared to DTD

  17. XML Schema Example <xsd:element name="article" minOccurs="0" maxOccurs="unbounded"> <xsd:complexType><xsd:sequence> <xsd:element name="title" type="xsd:string"/> <xsd:element name="initPage" type="xsd:string"/> <xsd:element name="endPage" type="xsd:string"/> <xsd:element name="author" type="xsd:string"/> </xsd:sequence></xsd:complexType> </xsd:element> DTD <!ELEMENT article (title,initPage,endPage,author)> <!ELEMENT title (#PCDATA)> <!ELEMENT initPage (#PCDATA)> <!ELEMENT endPage (#PCDATA)> <!ELEMENT author (#PCDATA)>

  18. DOM(1) • Characteristics • Hierarchical (tree) object model for XML documents • Associates a list of children with every node • Preserves the sequence of the elements in the XML document [Figure: DOM tree for the XML document, with sigmodRecord at the root, then issue, then volume, number and articles, down to title, initPage and endPage.]

  19. DOM(2) • DOM interfaces • Node : The base data type of the DOM. • Element : The vast majority of the objects you’ll deal with are Elements. • Attr : Represents an attribute of an element. • Text : The actual content of an Element or Attr. • Document : Represents the entire XML document
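
The interfaces listed above can be seen on a tiny document. This is a minimal sketch, assuming Python's standard `xml.dom.minidom` module as the DOM implementation; the one-article document is cut down from the sigmodRecord example and is only illustrative.

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    '<article><title>XML Research Issues</title>'
    '<author AuthorPosition="00">Tom Hanks</author></article>'
)

root = doc.documentElement                        # Element: <article>
author = root.getElementsByTagName("author")[0]   # Element: <author>
attr = author.getAttributeNode("AuthorPosition")  # Attr
text = author.firstChild                          # Text: "Tom Hanks"

# Every object above is a Node; nodeType tells the interfaces apart.
node_types = {
    "document": doc.nodeType == Node.DOCUMENT_NODE,
    "element": root.nodeType == Node.ELEMENT_NODE,
    "attr": attr.nodeType == Node.ATTRIBUTE_NODE,
    "text": text.nodeType == Node.TEXT_NODE,
}
```

Note that the text "Tom Hanks" is not a property of the author element but a separate Text node that is its child.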

  20. SAX(1) • DOM : expensive to materialize for a large XML collection • Characteristics • Event-driven : fires an event for every open tag/end tag • Does not require full parsing • Enables custom object model building [Figure: event-driven parsing. The application creates a parser and a document handler; while parsing, the parser calls startDocument(), startElement(), characters(), endElement() and endDocument() on the handler, which gives feedback to the application.]
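
The event sequence can be observed directly. This is a sketch assuming Python's standard `xml.sax` module, where the handler base class is called `ContentHandler`; the class name `EventLogger` and the two-element input are made up for illustration.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records each SAX callback in the order the parser fires it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def startElement(self, name, attrs):
        self.events.append(f"startElement:{name}")

    def characters(self, content):
        # Skip whitespace-only chunks so the log stays readable.
        if content.strip():
            self.events.append(f"characters:{content.strip()}")

    def endElement(self, name):
        self.events.append(f"endElement:{name}")

    def endDocument(self):
        self.events.append("endDocument")


handler = EventLogger()
xml.sax.parseString(b"<issue><volume>1</volume></issue>", handler)
events = handler.events
```

Nothing is kept in memory beyond what the handler chooses to store, which is why this style suits large inputs. A parser may deliver long text in several `characters()` calls, so real handlers usually accumulate text until the matching `endElement()`.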

  21. SAX(2) • The SAX API actually defines four interfaces for handling events • EntityResolver • DTDHandler • DocumentHandler • ErrorHandler • All of these interfaces are implemented by HandlerBase.

  22. DOM vs SAX(1/3) • Why use DOM? • Need to know a lot about the structure of a document • Need to move parts of the document around • Need to use the information in the document more than once • Why use SAX? • Only need to extract a few elements from an XML document

  23. DOM vs SAX(2/3) <book id="1"><verse> Sing, O goddess, the anger of Achilles son of Peleus, that brought countless ills upon the Achaeans. Many a brave soul did it send hurrying down to Hades, and many a hero did it yield a prey to dogs and vultures, for so were the counsels of Jove fulfilled from the day on which the son of Atreus, king of men, and great Achilles, first fell out with one another.</verse><verse> And which of the gods was it that set them on to quarrel? It was the son of Jove and Leto; for he was angry with the king and sent a pestilence upon ... • Doing this with the DOM would take a lot of memory • SAX API would be much more efficient
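
As a sketch of the SAX side of this comparison, assuming Python's standard `xml.sax` module, the handler below counts verse elements without building a tree. The class name `VerseCounter` is hypothetical and the sample input is a shortened stand-in for the full text.

```python
import xml.sax


class VerseCounter(xml.sax.ContentHandler):
    """Counts <verse> start tags and ignores everything else."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "verse":
            self.count += 1


# Shortened stand-in for the book; the verse text itself is never stored.
sample = (
    b'<book id="1">'
    b"<verse>Sing, O goddess, the anger of Achilles ...</verse>"
    b"<verse>And which of the gods was it ...</verse>"
    b"</book>"
)

counter = VerseCounter()
xml.sax.parseString(sample, counter)
verse_count = counter.count
```

The handler's state is a single integer regardless of how long the book is, whereas a DOM parser would first materialize every verse as nodes.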

  24. DOM vs SAX(3/3) ... <address><name> <first-name>Mary</first-name> <last-name>McGoon</last-name> </name><street>1401 Main Street</street> <city>Anytown</city> <state>NC</state> <zip>34829</zip> </address> <address> <name>….. <street> ….. </address> <address> <name>….. <street> ….. </address> What if we were parsing an XML document containing 10,000 addresses and wanted to sort them by last name? DOM would automatically store all of the data, and we could use DOM functions to move the nodes in the DOM tree.
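
A minimal sketch of that sort, assuming Python's standard `xml.dom.minidom` module. The `<addresses>` wrapper element and the second and third names are made-up sample data, and the addresses are cut down to their name parts.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<addresses>"
    "<address><name><first-name>Mary</first-name>"
    "<last-name>McGoon</last-name></name></address>"
    "<address><name><first-name>Zed</first-name>"
    "<last-name>Zimmer</last-name></name></address>"
    "<address><name><first-name>Ann</first-name>"
    "<last-name>Abbot</last-name></name></address>"
    "</addresses>"
)
root = doc.documentElement


def last_name(address):
    """Return the text of the address's <last-name> element."""
    return address.getElementsByTagName("last-name")[0].firstChild.data


# The whole tree is already in memory, so sorting is a matter of moving nodes.
in_order = sorted(root.getElementsByTagName("address"), key=last_name)
for address in in_order:
    # Appending a node that already has a parent moves it to the end.
    root.appendChild(address)

sorted_names = [last_name(a) for a in root.getElementsByTagName("address")]
```

With SAX the application would have to build its own in-memory structure before it could sort, which is the work DOM does automatically.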

  25. Summary • XML • eXtensible Markup Language • A data format for storing structured and semi-structured text • Physical/logical structure • DTD & XML Schema • Establish formal document structure rules • DOM & SAX API • DOM: Need to know a lot about the structure of a document • SAX: Need to extract a few elements from an XML document

  26. Online Resources • XML tutorial • http://www.xml.com • http://www.w3c.org • http://www.w3schools.com/ • http://www.xmltraining.com/course-search-xml+online+tutorials • http://xmlfiles.com/
