1 / 63

XML Introduction

XML Introduction. Introducing XML. XML stands for Extensible Markup Language. A markup language specifies the structure and content of a document. Because it is extensible, XML can be used to create a wide variety of document types. Introducing XML.

Download Presentation

XML Introduction

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. XML Introduction

  2. Introducing XML • XML stands for Extensible Markup Language. A markup language specifies the structure and content of a document. • Because it is extensible, XML can be used to create a wide variety of document types.

  3. Introducing XML • XML is a subset of a the Standard Generalized Markup Language (SGML) which was introduced in the 1980s. SGML is very complex and can be costly. • These reasons led to the creation of Hypertext Markup Language (HTML), a more easily used markup language. XML can be seen as sitting between SGML and HTML – easier to learn than SGML, but more robust than HTML.

  4. The Limits of HTML • HTML was designed for formatting text on a Web page. It was not designed for dealing with the content of a Web page. Additional features have been added to HTML, but they do not solve data description or cataloging issues in an HTML document. • Because HTML is not extensible, it cannot be modified to meet specific needs. Browser developers have added features making HTML more robust, but this has resulted in a confusing mix of different HTML standards.

  5. Introducing XML • HTML cannot be applied consistently. Different browsers require different standards making the final document appear differently on one browser compared with another.

  6. Introduction to XML Markup • XML document (intro.xml) • Marks up message as XML • Commonly stored in text files • Extension .xml

  7. 1 <?xml version = "1.0"?> Document begins with declaration that specifies XML version 1.0 2 3 <!-- Fig. 5.1 : intro.xml --> Element message is child element of root elementmyMessage 4 <!-- Simple introduction to XML markup --> Line numbers are not part of XML document. We include them for clarity. 5 6 <myMessage> 7 <message>Welcome to XML!</message> 8 </myMessage>

  8. Introduction to XML Markup (cont.) • XML documents • Must contain exactly one root element • Attempting to create more than one root element is erroneous • Elements must be nested properly • Incorrect:<x><y>hello</x></y> • Correct:<x><y>hello</y></x> • Must be well-formed

  9. XML Parsers • An XML processor (also called XML parser) evaluates the document to make sure it conforms to all XML specifications for structure and syntax. • XML parsers are strict. It is this rigidity built into XML that ensures XML code accepted by the parser will work the same everywhere.

  10. XML Architecture

  11. Structure of a Well-formed XML Document <?xml version="1.0" ?> <!DOCTYPE publication [ <!ELEMENT publications (journals, conferences, books)> ... <!ELEMENT author (#PCDATA)> <!ELEMENT issue (#PCDATA)> <!ATTLIST issue pages CDATA #REQUIRED> <!ENTITY JSI " <journal>Journal of Systems Integration</journal> <publisher>Kluwer Academic Publishers</publisher>"> ]> <publications> <journals> ... &JSI; ... </publications>

  12. XML Parsers • Microsoft’s parser is called MSXML and is built directly in IE versions 5.0 and above. • Netscape developed its own parser, called Mozilla, which is built into version 6.0 and above.

  13. Parsers and Well-formed XML Documents (cont.) • XML parsers support • Document Object Model (DOM) • Builds tree structure containing document data in memory • Simple API for XML (SAX) • Generates events when tags, comments, etc. are encountered • (Events are notifications to the application)

  14. Parsing an XML Document with MSXML • XML document • Contains data • Does not contain formatting information • Load XML document into Internet Explorer 5.0 • Document is parsed by msxml. • Places plus (+) or minus (-) signs next to container elements • Plus sign indicates that all child elements are hidden • Clicking plus sign expands container element • Displays children • Minus sign indicates that all child elements are visible • Clicking minus sign collapses container element • Hides children • Error generated, if document is not well formed

  15. XML document shown in IE6.

  16. Character Set • XML documents may contain • Carriage returns • Line feeds • Unicode characters • Enables computers to process characters for several languages

  17. Characters vs. Markup • XML must differentiate between • Markup text • Enclosed in angle brackets (< and >) • e.g,. Child elements • Character data • Text between start tag and end tag • Welcome to XML! • Elements versus Attributes

  18. White Space, Entity References and Built-in Entities • Whitespace characters • Spaces, tabs, line feeds and carriage returns • Significant (preserved by application) • Insignificant (not preserved by application) • Normalization • Whitespace collapsed into single whitespace character • Sometimes whitespace removed entirely <markup>This is character data</markup> after normalization, becomes <markup>This is character data</markup>

  19. White Space, Entity References and Built-in Entities (cont.) • XML-reserved characters • Ampersand (&) • Left-angle bracket (<) • Right-angle bracket (>) • Apostrophe (’) • Double quote (”) • Entity references • Allow to use XML-reserved characters • Begin with ampersand (&) and end with semicolon (;) • Prevents from misinterpreting character data as markup

  20. White Space, Entity References and Built-in Entities (cont.) • Build-in entities • Ampersand (&amp;) • Left-angle bracket (&lt;) • Right-angle bracket (&gt;) • Apostrophe (&apos;) • Quotation mark (&quot;) • Mark up characters “<>&” in element message <message>&lt;&gt;&amp;</message>

  21. Document Object Model (DOM) • XML Document Object Model (DOM) • Build tree structure in memory for XML documents • DOM-based parsers parse these structures • Exist in several languages (Java, C, C++, Python, Perl, C#, VB.NET, VB, etc)

  22. Document Object Model (DOM) • DOM tree • Each node represents an element, attribute, etc. <?xml version ="1.0"?><message from = "Paul" to = "Tem"> <body>Hi, Tim!</body></message> • Node created for element message • Element message has child node for body element • Element body has child node for text "Hi, Tim!" • Attributes from and to also have nodes in tree

  23. DOM Implementations • DOM-based parsers • Microsoft’s msxml • Microsoft.NET System.Xml Namspace • Sun Microsystem’s JAXP

  24. Creating Nodes • Create XML document at run time

  25. Traversing the DOM • Use DOM to traverse XML document • Output element nodes • Output attribute nodes • Output text nodes

  26. DOM Components • Manipulate XML document

  27. XPATH • XML Path Language (XPath) • Syntax for locating information in XML document • e.g., attribute values • String-based language of expressions • Not structural language like XML • Used by other XML technologies • XSLT

  28. XPATH - Nodes • XML document • Tree structure with nodes • Each node represents part of XML document • Seven types • Root • Element • Attribute • Text • Comment • Processing instruction • Namespace • Attributes and namespaces are not children of their parent node • They describe their parent node

  29. XPath node types

  30. XPath node types. (Part 2)

  31. Location Paths • Location path • Expression specifying how to navigate XPath tree • Composed of location steps • Each location step composed of • Axis • Node test • Predicate

  32. Axes • XPath searches are made relative to context node • Axis • Indicates which nodes are included in search • Relative to context node • Dictates node ordering in set • Forward axes select nodes that follow context node • Reverse axes select nodes that precede context node

  33. Node Tests • Node tests • Refine set of nodes selected by axis • Rely upon axis’ principle node type • Corresponds to type of node axis can select

  34. Node-set Operators and Functions (cont.) • Location-path expressions • Combine node-set operators and functions • Select all head and body children element nodes head | body • Select last bold element node in head element node head/title[ last() ] • Select third book element book[ position() = 3 ] • Or alternatively book[ 3 ] • Return total number of element-node children count( * ) • Select all book element nodes in document //book

  35. Sample Data for Queries <bib><book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year></book><bookprice=“55”> <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author> <title> Principles of Database and Knowledge Base Systems </title> <year> 1998 </year></book> </bib>

  36. bib Data Model for XPath The root The root element book book publisher author . . . . Addison-Wesley Serge Abiteboul

  37. XPath: Simple Expressions Result: <year> 1995 </year> <year> 1998 </year> Result: empty (there were no papers) /bib/book/year /bib/paper/year

  38. XML Document Type Definitions • Declarations Definition of element and attribute • Content Model (regular expressions) • Association of attributes with elements • Association of elements with other • Order and cardinality constraints

  39. Element Declarations • Basic form – <!ELEMENT elementname (contentmodel)> – Contentmodel determines which – Given by a regular expression • Atomic contents • Element content <!ELEMENT example ( a )> • Text content <!ELEMENT example (#PCDATA)> – Empty Element <!ELEMENT example EMPTY> – Arbitrary content <!ELEMENT example ANY>

  40. Element Declarations • Sequence <!ELEMENT example ( a, b )> • Alternative <!ELEMENT example ( a | b )> • Optional (zero or one) <!ELEMENT example ( a )?> • Optional and repeatable (zero or more) <!ELEMENT example ( a )*> • Required and repeatable (one or more) <!ELEMENT example ( a )+> • Mixed content <!ELEMENT example (#PCDATA | a)*> • Content model can be grouped by parentheses • Cyclic element containment is allowed

  41. Attribute Declarations • Each element can be associated with an arbitrary number of attributes • Basic form – <!ATTLIST Elementname Attributename Type Default Attributename Type Default ... > • Example: Document Type Definition <!ELEMENT shipTo ( #PCDATA)> <!ATTLIST shipTo country CDATA #REQUIRED "US" state CDATA #IMPLIED version CDATA #FIXED "1.0" payment (cash|creditCard) "cash"> Document <shipTo country="Switzerland" version="1.0" payment="creditCard"> … </shipTo>

  42. Attribute Declarations - Types • CDATA – String – <!ATTLIST example HREF CDATA • Enumeration – Token from given set of values, Default – <!ATTLIST example selection ( • Possible Defaults – Required attribute: #REQUIRED – Optional attribute: #IMPLIED – Fixed attribute: #FIXED – Default for enumeration: "value" • Other attribute types: IF, IDREF, ENTITY, ENTITIES, NOTATION, NAME, NAMES, NMTOKEN, NMTOKENS

  43. ID/IDREFExample: ID/IDREF • ID, IDREF – ID is a unique identifier within the document – IDREF is a reference to an ID – Referential integrity checked by the parser – ID's determined by the application – <!ATTLIST example identity ID #IMPLIED reference IDREF #IMPLIED>

  44. Inclusion of XML Document Type Definitions • External DTD Declaration <?xml version="1.0" encoding="ISO-8859-1"?> <!DOCTYPE test PUBLIC "-//Test AG//DTD test V1.0//EN" SYSTEM "http://www.test.org/test.dtd"> <test> "test" is a document element </test> • Internal DTD Declaration <!DOCTYPE test [ <!ELEMENT test EMPTY> ]> <test/> • Mixed usage <!DOCTYPE test SYSTEM "http://www.test.org/test.dtd" [ <!ENTITY hello "hello world"> ]> <test>&hello;</test>

  45. Working with Namespaces • Name collision occurs when elements from two or more documents share the same name. • Name collision isn’t a problem if you are not concerned with validation. The document content only needs to be well-formed. • However, name collision will keep a document from being validated.

  46. Name Collision This figure shows two documents each with a Name element

  47. Using Namespaces to Avoid Name Collision This figure shows how to use a namespace to avoid collision

  48. Declaring a Namespace • A namespace is a defined collection of element and attribute names. • Names that belong to the same namespace must be unique. Elements can share the same name if they reside in different namespaces. • Namespaces must be declared before they can be used.

  49. Declaring a Namespace • A namespace can be declared in the prolog or as an element attribute. The syntax to declare a namespace in the prolog is: <?xml:namespace ns=“URI” prefix=“prefix”?> • Where URI is a Uniform Resource Identifier that assigns a unique name to the namespace, and prefix is a string of letters that associates each element or attribute in the document with the declared namespace.

  50. Declaring a Namespace • For example, <?xml:namespace ns=http://uhosp/patients/ns prefix=“pat”> • Declares a namespace with the prefix “pat” and the URI http://uhosp/patients/ns. • The URI is not a Web address. A URI identifies a physical or an abstract resource.

More Related