
XML


gerry

Presentation Transcript


  1. XML: An introduction and DTD coverage

  2. XML • XML, like HTML, is derived from the Standard Generalized Markup Language (SGML).

  3. A brief introduction to XML: a simple XML document

         <?xml version = "1.0"?>
         <!-- a simple xml example...this is a comment -->
         <mymessage>
            <message>Welcome to XML!</message>
         </mymessage>

  4. In validator: file is in examples\ch05\intro.xml

  5. XML documents and format • An XML document contains data, not formatting information. As we'll learn, there are ways (XSL and FO files, for example) to provide formatting for XML, analogous to the way CSS provides formatting for HTML.

  6. XML
     • XML documents are typically stored in files with the suffix .xml, though this is not required. They can be created with any text editor (save as plain text). Many packages, such as MS Word, can save files as type .xml.
     • An XML document contains a single root element, which contains the other elements. Anything appearing before the root is called the prolog. Elements directly under the root are its children. The structure is recursive: each child may have children of its own.
     • In the example, the root's child message contains the text "Welcome to XML!".
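Walking a parsed document makes the root, prolog, and children structure visible. The following is a minimal sketch in Java, since the slides mention the parsers that ship with Java. It uses the JDK's built-in JAXP/DOM parser. TreeWalk, walk and describe are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TreeWalk {
    // Parse an XML string with the JDK's built-in DOM parser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Append one line per element, indented two spaces per nesting level.
    // Elements without element children also show their trimmed text.
    static void walk(Element e, String indent, StringBuilder out) {
        NodeList kids = e.getChildNodes();
        boolean container = false;
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.ELEMENT_NODE) {
                container = true;
            }
        }
        out.append(indent).append(e.getTagName());
        if (!container) {
            out.append(": ").append(e.getTextContent().trim());
        }
        out.append('\n');
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) kids.item(i), indent + "  ", out); // recurse into the child
            }
        }
    }

    // Start at the root; the prolog (XML declaration, comments) is not part of its tree.
    static String describe(String xml) throws Exception {
        StringBuilder out = new StringBuilder();
        walk(parse(xml).getDocumentElement(), "", out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!-- a simple xml example -->\n"
                + "<mymessage>\n"
                + "  <message>Welcome to XML!</message>\n"
                + "</mymessage>\n";
        System.out.print(describe(xml));
    }
}
```

Running it prints `mymessage` followed by an indented `message: Welcome to XML!`. The comment in the prolog does not appear because getDocumentElement() starts at the root.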

  7. The character set
     • Legal XML characters are tab, CR, LF and (most of) the Unicode character set.
     • An XML document consists of markup and character data.
     • Markup is enclosed in angle brackets (as in HTML): < >
     • Character data appears between the start and end tags.
     • An XML parser passes whitespace characters to the application. Insignificant whitespace can be collapsed in a process called normalization.
     • It is a good idea to add whitespace (indentation, line breaks) to an XML document for readability.
     • &, <, >, ' and " are reserved characters. An entity reference makes it possible to use them as characters in the character data of an XML document.
     • Entity references begin with & and end with ; (for example, &lt; stands for <). In this way, character data is not confused with markup.
     • Single and double quotes are used to delimit attribute values.
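XML predefines five entity references for the reserved characters: &lt; (<), &gt; (>), &amp; (&), &apos; (') and &quot; ("). The helper below is a minimal hand-written escaper in plain Java with no XML library. XmlEscape.escape is a hypothetical name. In practice, an XML serializer would normally do this for you.

```java
public class XmlEscape {
    // Replace each reserved character with its predefined entity reference.
    // Each input character is examined exactly once, so the '&' introduced
    // by an earlier replacement is never escaped a second time.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '\'': sb.append("&apos;"); break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "if (a < b && c > d) say \"hi\"";
        // prints <message>if (a &lt; b &amp;&amp; c &gt; d) say &quot;hi&quot;</message>
        System.out.println("<message>" + escape(text) + "</message>");
    }
}
```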

  8. More on syntax
     • There must be exactly one root element.
     • Elements must be properly nested.
     • Every start tag requires a matching end tag (or the element must be written as an empty element, <tag/>).
     • Unlike HTML, XML lets the author define her own tags.
     • Tags are case sensitive: <Message> and <message> are different tags.
     • The parser needs to distinguish markup from character data.
     • Whitespace is typically normalized, that is, reduced to a single whitespace character.
     • Entity references begin with an ampersand and end with a semicolon. They let us use the metacharacters that are part of the language syntax (<, >, & and so on) as data: for example, &lt; represents <, &gt; represents >, and &amp; represents &.
     • In character data, < and & may appear only as entity references.
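A parser checks exactly these rules when it decides whether a document is well-formed. The small Java sketch below uses the JDK's built-in DOM parser. WellFormed.isWellFormed is a hypothetical helper that treats any parse exception as "not well-formed". It also replaces the default error handler, which would otherwise print each fatal error to stderr.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class WellFormed {
    // Returns true if the string parses as well-formed XML.
    static boolean isWellFormed(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Silent handler: rethrow errors instead of printing them.
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) throws SAXException { throw e; }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String[] samples = {
            "<a><b>text</b></a>",   // well-formed
            "<a><b>text</a></b>",   // improper nesting
            "<a/><c/>",             // two root elements
            "<a>text</A>",          // tags are case sensitive
            "<a><br></a>",          // start tag without end tag
            "<a>x < y</a>",         // bare '<' in character data
            "<a>x &lt; y</a>",      // '<' written as an entity reference
        };
        for (String s : samples) {
            System.out.println(isWellFormed(s) + "  " + s);
        }
    }
}
```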

  9. XML intro continued
     • A DOM-based parser returns a tree structure. A DOM parser must process the entire document to build an in-memory (Java) object, which may be 3 or 4 times the size of the original document. This is not advisable when storage is constrained.
     • A SAX-based (Simple API for XML) parser reports events. SAX parsers have a smaller memory footprint.
     • Many parsers can be downloaded for free, and several come with Java 1.4+.
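Unlike DOM, SAX builds no tree. The minimal sketch below uses the JDK's javax.xml.parsers.SAXParser, and its handler simply records the events the parser reports. SaxEvents and events are hypothetical names. A real handler should buffer text, because characters() may deliver a single run of text in several chunks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEvents {
    // Record the start-element, end-element and (non-blank) text events in order.
    static List<String> events(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("start " + qName);
            }
            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end " + qName);
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text " + text);
                }
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        for (String e : events("<mymessage><message>Welcome to XML!</message></mymessage>")) {
            System.out.println(e);
        }
    }
}
```

The handler keeps only a flat list of events, so its memory use does not depend on the depth of the tree. This is the footprint advantage the slide describes.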

  10. A brief introduction to XML • An XML validator parses an XML document and indicates whether it is correct. • A number of free validators are available, including one from Microsoft, which I downloaded and used.

  11. Validator • Microsoft provides a validating program, free for download (in JavaScript and VBScript versions), at http://msdn.microsoft.com/archive/default.asp?url=/archive/en-us/samples/internet/xml/xml_validator/default.asp or search MSDN for "validator".

  12. Validator links in my internet programming directory
     • JavaScript validator: http://employees.oneonta.edu/higgindm/internet%20programming/validate_js.htm
     • VBScript validator: http://employees.oneonta.edu/higgindm/internet%20programming/validate_vbs.htm

  13. MS Validator: http://employees.oneonta.edu/higgindm/internet%20programming/validate_js.htm

  14. Parser continued
     • The parser indicates whether the document is well-formed.
     • In the DOM-based tree view, a '+' in the left margin indicates that a node has children, and a '-' indicates that all of its child nodes have been expanded.
     • The MS Validator uses color coding to indicate that child nodes can be expanded.
     • An element that contains other elements is called a container element.
     • If the document is well-formed, the parser makes its content available for further processing.

  15. Validator example

  16. Validator

  17. Reserved characters • <message>&lt;&gt;&amp;</message> lets the character data of message contain the characters <, > and &.

  18. DTD: document type definition
     • A DTD file may contain the definition of an XML structure.
     • XML files may refer to a DTD.
     • If an XML document has a DTD or schema, a validating parser can determine whether it is not merely well-formed but also valid.
     • Valid means conforming to a DTD or schema.

  19. Another example: Unicode • Lang.xml (next slide) uses Unicode character references to represent Arabic words. • lang.dtd (also shown in a later slide) declares entities that expand to Unicode (Arabic) characters wherever those entities are referenced in the XML file.

  20. DTD: document type definition: a DTD file may contain the definition of an XML structure.

         <?xml version = "1.0"?>
         <!-- Fig. 5.4 : lang.xml -->
         <!-- Demonstrating Unicode -->
         <!DOCTYPE welcome SYSTEM "lang.dtd">
         <welcome>
            <from>
               <!-- Deitel and Associates -->
               &#1583;&#1575;&#1610;&#1578;&#1614;&#1604; &#1571;&#1606;&#1583;
               <!-- entity -->
               &assoc;
            </from>
            <subject>
               <!-- Welcome to the world of Unicode -->
               &#1571;&#1607;&#1604;&#1575;&#1611; &#1576;&#1603;&#1605; &#1601;&#1610;&#1616; &#1593;&#1575;&#1604;&#1605;
               <!-- entity -->
               &text;
            </subject>
         </welcome>

  21. Lang.dtd

         <!-- lang.dtd -->
         <!ELEMENT welcome ( from, subject )>
         <!ELEMENT from ( #PCDATA )>
         <!ELEMENT subject ( #PCDATA )>
         <!ENTITY assoc "&#1571;&#1587;&#1617;&#1608;&#1588;&#1616;&#1610;&#1614;&#1578;&#1618;&#1587;">
         <!ENTITY text "&#1575;&#1604;&#1610;&#1608;&#1606;&#1610;&#1603;&#1608;&#1583;">

  22. Lang.xml in validator

  23. Lang.xml in IE

  24. About the example
     • The document type declaration contains DOCTYPE, the name of the root element, the SYSTEM keyword indicating that the DTD is in an external file, and the name of that file.
     • The root element welcome contains two elements: from and subject.
     • Some lines contain Unicode character references (such as &#1583;).
     • The DTD also defines the entities assoc and text, which the document references as &assoc; and &text;.
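The parser handles the Unicode conversion. By the time the application sees the text, each &#NNNN; character reference and each entity reference has already been replaced by characters. The minimal Java sketch below uses the JDK's DOM parser. To stay self-contained, it moves a shortened version of lang.dtd into an internal DTD subset rather than reading lang.dtd from disk. EntityDemo, textOf and codePoints are hypothetical names.

```java
import java.io.StringReader;
import java.util.stream.Collectors;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EntityDemo {
    // Cut-down lang.xml with its declarations in an internal subset (no external file).
    static final String XML = "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE welcome ["
            + "<!ELEMENT welcome (from, subject)>"
            + "<!ELEMENT from (#PCDATA)>"
            + "<!ELEMENT subject (#PCDATA)>"
            + "<!ENTITY text \"&#1575;&#1604;&#1610;&#1608;&#1606;&#1610;&#1603;&#1608;&#1583;\">"
            + "]>"
            + "<welcome><from>&#1583;&#1575;</from><subject>&text;</subject></welcome>";

    // Text of the first element with this tag, after the parser has replaced
    // character references and entity references with actual characters.
    static String textOf(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    // Decimal code points, for comparison with the &#NNNN; references in the source.
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> Integer.toString(cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(codePoints(textOf(XML, "from")));    // 1583 1575
        System.out.println(codePoints(textOf(XML, "subject"))); // the characters &text; expands to
    }
}
```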

  25. More about markup
     • An empty element may be closed with /> at the end of its start tag, as in <emptyelt xxxx />; otherwise an element must have a complete end tag, as in <sometag> xxxxxxxxxxx </sometag>.
     • Elements may or may not have content (child elements or character data).
     • Elements may have zero or more attributes. Attributes appear in the element's start tag: <car doors = "4"/>. Here, element car has attribute doors with value 4.
     • Attribute values must appear in single or double quotes.
     • Element and attribute names may not contain blanks.
     • Attribute values may be of any length and contain any characters (except an unescaped < or &, or the quote character that delimits them); attribute names must start with a letter or underscore.
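The sketch below shows attribute access through the DOM with the JDK's built-in parser. Attrs and root are hypothetical names. It demonstrates that the quote style, and the choice between <car .../> and <car ...></car>, make no difference to the parsed result.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class Attrs {
    // Parse a document and return its root element.
    static Element root(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        String[] forms = {
            "<car doors=\"4\"/>",        // empty-element tag, double quotes
            "<car doors='4'/>",          // single quotes are equally valid
            "<car doors = \"4\"></car>", // explicit end tag, no content
        };
        for (String xml : forms) {
            Element car = root(xml);
            // getAttribute returns "" (not null) when the attribute is absent.
            System.out.println(car.getTagName()
                    + " doors=" + car.getAttribute("doors")
                    + " color=[" + car.getAttribute("color") + "]"
                    + " children=" + car.getChildNodes().getLength());
        }
    }
}
```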

  26. Usage.xml uses a stylesheet

         <?xml version = "1.0"?>
         <!-- Fig. 5.5 : usage.xml -->
         <!-- Usage of elements and attributes -->
         <?xml-stylesheet type = "text/xsl" href = "usage.xsl"?>
         <book isbn = "999-99999-9-X">
            <title>Deitel&apos;s XML Primer</title>
            <author>
               <firstName>Paul</firstName>
               <lastName>Deitel</lastName>
            </author>
            <chapters>
               <preface num = "1" pages = "2">Welcome</preface>
               <chapter num = "1" pages = "4">Easy XML</chapter>
               <chapter num = "2" pages = "2">XML Elements?</chapter>
               <appendix num = "1" pages = "9">Entities</appendix>
            </chapters>
            <media type = "CD"/>
         </book>

  27. Usage.xsl (in notes)
     • The <?...?> construct in usage.xml is a PI (processing instruction).
     • A PI consists of a PI target (xml-stylesheet, in this example) and a PI value. Note the syntax.
     • PIs let authors embed application-specific data in an XML document. If the application processing the XML does not use a PI, the PI has no effect on the document's content.
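A DOM parser exposes a PI as a node with a target and a data string. The minimal Java sketch below uses the JDK's built-in parser and the target name xml-stylesheet, which is defined by the W3C "Associating Style Sheets with XML documents" recommendation. PiDemo and firstPi are hypothetical names. Note that the XML declaration <?xml version="1.0"?> looks like a PI but is not reported as one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

public class PiDemo {
    static final String XML = "<?xml version=\"1.0\"?>"
            + "<?xml-stylesheet type=\"text/xsl\" href=\"usage.xsl\"?>"
            + "<book isbn=\"999-99999-9-X\"><title>Deitel&apos;s XML Primer</title></book>";

    // First processing instruction among the document's top-level nodes, or null.
    static ProcessingInstruction firstPi(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList top = doc.getChildNodes();
        for (int i = 0; i < top.getLength(); i++) {
            if (top.item(i).getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
                return (ProcessingInstruction) top.item(i);
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        ProcessingInstruction pi = firstPi(XML);
        System.out.println("target: " + pi.getTarget()); // xml-stylesheet
        System.out.println("data:   " + pi.getData());   // type="text/xsl" href="usage.xsl"
    }
}
```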

  28. Usage.xml in validator

  29. Usage.XML document loaded into IE: Browser uses stylesheet to generate HTML

  30. CDATA
     • CDATA sections may contain text, reserved characters and whitespace. The character data in a CDATA section is not parsed as markup: the XML parser passes it to the application unchanged.
     • CDATA might be used for embedded JavaScript or VBScript.
     • A CDATA section starts with <![CDATA[ and ends with ]]>.
     • A CDATA section may contain reserved characters, but not the text ]]>.
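CDATA and entity references are two spellings of the same character data. One way to check this is to parse both forms of the C++ line from Fig. 5.7 and compare what the application receives. The minimal Java sketch below uses the JDK's DOM parser. CdataDemo and samples are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CdataDemo {
    // Two <sample> elements carrying the same C++ line: one escaped, one in CDATA.
    static final String XML = "<book>"
            + "<sample>if ( this-&gt;getX() &lt; 5 &amp;&amp; value[ 0 ] != 3 )</sample>"
            + "<sample><![CDATA[if ( this->getX() < 5 && value[ 0 ] != 3 )]]></sample>"
            + "</book>";

    static NodeList samples(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("sample");
    }

    public static void main(String[] args) throws Exception {
        NodeList s = samples(XML);
        String escaped = s.item(0).getTextContent();
        String cdata = s.item(1).getTextContent();
        System.out.println(escaped);
        System.out.println(escaped.equals(cdata)); // true: the same characters reach the application
        // By default the DOM keeps a CDATA section as its own node type.
        System.out.println(s.item(1).getFirstChild().getNodeType() == Node.CDATA_SECTION_NODE);
    }
}
```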

  31. Text example 5.7

         <?xml version = "1.0"?>
         <!-- Fig. 5.7 : cdata.xml -->
         <!-- CDATA section containing C++ code -->
         <book title = "C++ How to Program" edition = "3">
            <sample>
               // C++ comment
               if ( this-&gt;getX() &lt; 5 &amp;&amp; value[ 0 ] != 3 )
                  cerr &lt;&lt; this-&gt;displayError();
            </sample>
            <sample>
               <![CDATA[
               // C++ comment
               if ( this->getX() < 5 && value[ 0 ] != 3 )
                  cerr << this->displayError();
               ]]>
            </sample>
            C++ How to Program by Deitel &amp; Deitel
         </book>

  32. CData example from text 5.7

  33. Cdata.xml in MS validator (file is in examples\ch05)

  34. letter.xml - I removed blank lines to get it to fit

         <?xml version = "1.0"?>
         <letter>
            <contact type = "from">
               <name>Jane Doe</name>
               <address1>Box 12345</address1>
               <address2>15 Any Ave.</address2>
               <city>Othertown</city>
               <state>Otherstate</state>
               <zip>67890</zip>
               <phone>555-4321</phone>
               <flag gender = "F"/>
            </contact>
            <contact type = "to">
               <name>John Doe</name>
               <address1>123 Main St.</address1>
               <address2></address2>
               <city>Anytown</city>
               <state>Anystate</state>
               <zip>12345</zip>
               <phone>555-1234</phone>
               <flag gender = "M"/>
            </contact>
            <salutation>Dear Sir:</salutation>
            <paragraph>It is our privilege to inform you about our new
               database managed with <bold>XML</bold>. This new system
               allows you to reduce the load on your inventory list
               server by having the client machine perform the work of
               sorting and filtering the data.</paragraph>
            <paragraph>The data in an XML element is normalized, so
               plain-text diagrams such as
                  /---\
                  |   |
                  \---/
               will become gibberish.</paragraph>
            <closing>Sincerely</closing>
            <signature>Ms. Doe</signature>
         </letter>

  35. letter.xml in Validator

  36. Namespaces
     • Naming collisions can occur when XML authors use the same tag names.
     • Namespaces provide a mechanism for making tag references unambiguous.
     • A namespace prefix, followed by a colon, appears in the start and end tags. So <movie:character>Scrooge</movie:character> can be differentiated from <ascii:character>colon</ascii:character>.
     • Each namespace prefix is bound to a unique URI in the XML document. Almost any name can be used as a namespace prefix (names beginning with xml are reserved).
     • In this example, ascii and movie are namespace prefixes. Namespace prefixes can precede element and attribute names to avoid collisions.
     • A URL may be used as the URI. The only requirement is uniqueness: the parser does not visit the URL.

  37. Namespace example 5.8

         <?xml version = "1.0"?>
         <!-- Fig. 5.8 : namespace.xml -->
         <!-- Namespaces -->
         <text:directory xmlns:text = "urn:deitel:textInfo"
            xmlns:image = "urn:deitel:imageInfo">
            <text:file filename = "book.xml">
               <text:description>A book list</text:description>
            </text:file>
            <image:file filename = "funny.jpg">
               <image:description>A funny picture</image:description>
               <image:size width = "200" height = "100"/>
            </image:file>
         </text:directory>
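A namespace-aware parser resolves each element's prefix to the URI declared by xmlns:prefix. The minimal Java sketch below uses the JDK's DOM parser, where namespace awareness is off by default and must be switched on. NsDemo and parseNs are hypothetical names. The document is namespace.xml without its comments.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NsDemo {
    static final String XML = "<?xml version=\"1.0\"?>"
            + "<text:directory xmlns:text=\"urn:deitel:textInfo\""
            + " xmlns:image=\"urn:deitel:imageInfo\">"
            + "<text:file filename=\"book.xml\">"
            + "<text:description>A book list</text:description></text:file>"
            + "<image:file filename=\"funny.jpg\">"
            + "<image:description>A funny picture</image:description>"
            + "<image:size width=\"200\" height=\"100\"/></image:file>"
            + "</text:directory>";

    static Document parseNs(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // needed for prefix-to-URI resolution
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseNs(XML);
        // "*" matches any namespace, so both file elements are found by local name.
        NodeList files = doc.getElementsByTagNameNS("*", "file");
        for (int i = 0; i < files.getLength(); i++) {
            Element e = (Element) files.item(i);
            System.out.println(e.getPrefix() + ":" + e.getLocalName()
                    + " -> " + e.getNamespaceURI()
                    + " (" + e.getAttribute("filename") + ")");
        }
    }
}
```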

  38. Namespace.xml in validator: file is in examples\ch05

  39. Namespace.xml example 5.8 in IE

  40. Namespaces continued • Providing a prefix on every name can be tedious. Instead, a default namespace can be declared; elements in the XML document that belong to this namespace then do not need prefixes. (Unprefixed attributes are not placed in the default namespace.)

  41. Default namespaces

         <?xml version = "1.0"?>
         <!-- Fig. 5.9 : defaultnamespace.xml -->
         <!-- Using Default Namespaces -->
         <directory xmlns = "urn:deitel:textInfo"
            xmlns:image = "urn:deitel:imageInfo">
            <file filename = "book.xml">
               <description>A book list</description>
            </file>
            <image:file filename = "funny.jpg">
               <image:description>A funny picture</image:description>
               <image:size width = "200" height = "100"/>
            </image:file>
         </directory>

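  42. Default namespaces • Now directory, the first file element and its description are in the default namespace (urn:deitel:textInfo), so they need no prefix. • Compare this example to the earlier namespace example, where text and image were both explicit prefixes for distinct namespaces.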
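A namespace-aware parser resolves unprefixed elements to the default namespace URI. Unprefixed attributes such as filename, however, are in no namespace at all. The minimal Java sketch below uses the JDK's DOM parser on defaultnamespace.xml without its comments. DefaultNsDemo and parseNs are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DefaultNsDemo {
    static final String XML = "<?xml version=\"1.0\"?>"
            + "<directory xmlns=\"urn:deitel:textInfo\""
            + " xmlns:image=\"urn:deitel:imageInfo\">"
            + "<file filename=\"book.xml\"><description>A book list</description></file>"
            + "<image:file filename=\"funny.jpg\">"
            + "<image:description>A funny picture</image:description>"
            + "<image:size width=\"200\" height=\"100\"/></image:file>"
            + "</directory>";

    static Document parseNs(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseNs(XML);
        Element root = doc.getDocumentElement();
        // No prefix, yet the root resolves to the default namespace.
        System.out.println(root.getLocalName() + " -> " + root.getNamespaceURI());
        Element file = (Element) doc.getElementsByTagNameNS("urn:deitel:textInfo", "file").item(0);
        Attr name = file.getAttributeNode("filename");
        // The default namespace does not apply to attributes: this prints null.
        System.out.println(name.getName() + " -> " + name.getNamespaceURI());
    }
}
```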

  43. Defaultnamespace.xml in IE

  44. DTD: document type definition
     • A DTD is written in an EBNF (Extended Backus-Naur Form) grammar and can be used to specify the elements and attributes allowed in an XML document.
     • There is currently a move away from DTDs toward XML Schema. Schema documents use XML (not EBNF) syntax.
     • Some parsers can check an XML document against its DTD and determine whether it is valid; these are called validating parsers. A document that is syntactically correct but does not conform to its DTD is well-formed but not valid. Non-validating parsers do not check documents against their DTD, so they can only determine whether a document is well-formed.

  45. Document Type Declaration
     • A <!DOCTYPE ...> declaration in the prolog of an XML document specifies a DTD that appears inside the document (the internal subset) or outside it (the external subset).
     • <!DOCTYPE thingy [ <!ELEMENT thingy (#PCDATA)> ]> declares that the root element is thingy and defines that one element in the internal subset.
     • PCDATA stands for "parsed character data": the parser scans it, so the reserved characters < and & within PCDATA are treated as markup unless they are written as entity references.
     • The parentheses contain the content specification for the element.
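The internal-subset example can be run through a parser directly. The minimal Java sketch below uses the JDK's DOM parser. It reads the DOCTYPE name and shows that entity references inside PCDATA reach the application as plain characters. InternalSubset and parse are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class InternalSubset {
    // Root element thingy, declared in the internal subset as holding only #PCDATA.
    static final String XML = "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE thingy [ <!ELEMENT thingy (#PCDATA)> ]>"
            + "<thingy>a &lt; b &amp;&amp; c</thingy>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        // The DOCTYPE names the root element (a validating parser checks that they match).
        System.out.println(doc.getDoctype().getName());                // thingy
        // PCDATA is parsed, so the entity references arrive as plain characters.
        System.out.println(doc.getDocumentElement().getTextContent()); // a < b && c
    }
}
```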

  46. MS XML validator • We can check an XML document for adherence to an external DTD using the MS XML validator. Here's the XML:

         <?xml version = "1.0"?>
         <!-- Fig. 6.1: intro.xml -->
         <!-- Using an external subset -->
         <!DOCTYPE myMessage SYSTEM "intro.dtd">
         <myMessage>
            <message>Welcome to XML!</message>
         </myMessage>

     And here's the DTD:

         <!-- Fig. 6.2: intro.dtd -->
         <!-- External declarations -->
         <!ELEMENT myMessage ( message )>
         <!ELEMENT message ( #PCDATA )>

  47. MS Validating parser can validate against schema or dtd

  48. Invalid XML • In the next slide we use the MS XML validator to check an XML document (below) that is like intro.xml but is missing the message element:

         <?xml version = "1.0"?>
         <!-- Fig. 6.3 : intro-invalid.xml -->
         <!-- Simple introduction to XML markup -->
         <!DOCTYPE myMessage SYSTEM "intro.dtd">
         <!-- Root element missing child element message -->
         <myMessage>
         </myMessage>
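The same valid/invalid check on intro.xml and intro-invalid.xml can be reproduced outside the MS validator. The minimal Java sketch below uses the JDK's DOM parser with DTD validation enabled. It writes intro.dtd and the document to a temporary directory so that the relative SYSTEM identifier resolves. DtdValidate and its methods are hypothetical names. Validity errors are collected rather than thrown, while well-formedness errors still abort the parse.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class DtdValidate {
    static final String DTD = "<!ELEMENT myMessage ( message )>\n"
            + "<!ELEMENT message ( #PCDATA )>\n";
    static final String VALID = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE myMessage SYSTEM \"intro.dtd\">\n"
            + "<myMessage><message>Welcome to XML!</message></myMessage>\n";
    static final String INVALID = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE myMessage SYSTEM \"intro.dtd\">\n"
            + "<myMessage>\n</myMessage>\n"; // root is missing child element message

    // Parse with a validating parser and return the validity errors it reports.
    static List<String> validate(File xmlFile) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // check against the DTD named in the DOCTYPE
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        builder.parse(xmlFile); // "intro.dtd" resolves relative to the file's location
        return errors;
    }

    // Write intro.dtd and the document side by side in a temp directory, then validate.
    static List<String> validateWithDtd(String dtdText, String xmlText) throws Exception {
        Path dir = Files.createTempDirectory("dtd-demo");
        Path dtd = Files.write(dir.resolve("intro.dtd"), dtdText.getBytes(StandardCharsets.UTF_8));
        Path xml = Files.write(dir.resolve("doc.xml"), xmlText.getBytes(StandardCharsets.UTF_8));
        try {
            return validate(xml.toFile());
        } finally {
            Files.delete(xml);
            Files.delete(dtd);
            Files.delete(dir);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("intro.xml:         " + validateWithDtd(DTD, VALID));
        System.out.println("intro-invalid.xml: " + validateWithDtd(DTD, INVALID));
    }
}
```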

  49. If xml doc does not match dtd/schema

  50. Sequences, pipes and occurrences
     • The comma indicates a sequence in which elements must appear:
         <!ELEMENT class (prof, student)>
       This specifies the order and number of elements making up a class: one prof followed by one student. A sequence may list any number of elements.
     • The pipe (|) indicates a choice:
         <!ELEMENT sidedish (coleslaw|chips)>
       Exactly one of the choices must appear.
     • +, * and ? indicate how often an element may occur: + means one or more occurrences, * means zero or more, and ? means zero or one.
         <!ELEMENT class (prof, student+)>
       This might be appropriate for a class DTD: exactly one professor and one or more students.
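The class content model (prof, student+) can be tested against a few instances with a validating parser. The minimal Java sketch below uses the JDK's DOM parser with the DTD placed in an internal subset. ContentModel, isValid and the sample names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ContentModel {
    // Internal DTD subset using the class content model from the slide.
    static final String DTD = "<!DOCTYPE class ["
            + "<!ELEMENT class (prof, student+)>"
            + "<!ELEMENT prof (#PCDATA)>"
            + "<!ELEMENT student (#PCDATA)>"
            + "]>";

    // Validate DTD + body; true if the validating parser reported no errors.
    static boolean isValid(String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<SAXParseException> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) { errors.add(e); } // validity error
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(DTD + body)));
        return errors.isEmpty();
    }

    public static void main(String[] args) throws Exception {
        String[] bodies = {
            "<class><prof>Smith</prof><student>Ann</student></class>",
            "<class><prof>Smith</prof><student>Ann</student><student>Bob</student></class>",
            "<class><prof>Smith</prof></class>",                        // no student
            "<class><student>Ann</student><prof>Smith</prof></class>",  // wrong order
            "<class><prof>Smith</prof><prof>Jones</prof><student>Ann</student></class>", // two profs
        };
        for (String b : bodies) {
            System.out.println(isValid(b) + "  " + b);
        }
    }
}
```

Only the first two instances satisfy (prof, student+). The others break the sequence or the occurrence counts.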
