Introduction to XML: What is a Markup Language?

Learn about XML, a textual language where significant elements are indicated by markers. Understand its advantages over HTML and SGML and its applications in web development.

lemerson

Presentation Transcript


  1. CP3024 Lecture 6 XML: Extensible Markup Language

  2. What is a markup language? • Textual (i.e. person readable) language where significant elements are indicated by markers • <TITLE>XML</TITLE> • Examples are RTF, HTML, VRML, TEX etc. • Easy to process and can be manipulated by a variety of application programs

  3. What does the Web use? • HTML • Hypertext Markup Language • Defined as the original Web language • Based on SGML (see later) • Suited for hypertext, multimedia, small simple documents • Currently at version 4.01 (the last?)

  4. Why change? - 1 • Change in Web usage • no longer a mechanism for exchanging scientific papers • presentational aspects are now seen as of greater importance • extracting the meaning of a document using a program will be a new growth area • HTML can't grow much more!

  5. Why change? - 2 • Extensibility • HTML does not allow users to specify their own tags • Structure • HTML cannot represent database schemas or object-oriented hierarchies • Validation • HTML does not allow applications to check that the structure of data is valid

  6. What is SGML? • Standard Generalised Markup Language • ISO 8879 • Can define any document format of any complexity • Enables extensibility, structure and validation • Too many optional features for the Web

  7. What is XML? • Simplified subset of SGML designed for Web applications • Differs from HTML • Can define new tags • Structures may be nested to any level of complexity • XML documents may define a grammar which enables structural validation of that document

  8. Where has XML come from? • Emanates from the World Wide Web Consortium (W3C) • Developed by XML working group chaired by Jon Bosak (Sun Microsystems) • Group includes representatives from Microsoft, Netscape, HP, Adobe, etc. • Last bastion against proprietary markup and Web fragmentation

  9. Design Goals for XML - 1 • XML shall be straightforwardly usable over the Internet • XML shall support a wide variety of applications • XML shall be compatible with SGML • It shall be easy to write programs which process XML documents • The number of optional features is to be kept to the absolute minimum

  10. Design Goals for XML - 2 • XML documents should be human-legible • The XML design should be prepared quickly • The design of XML shall be formal and concise • XML documents shall be easy to create • Terseness in XML markup is of minimal importance

  11. The XML View of a Document Taken from an example given by Jon Bosak

  12. Structured Publishing Taken from an example given by Jon Bosak

  13. XML Example <?xml version="1.0"?> <sweepjoke> <harry>Say <quote>Bye Bye </quote>, Sweep </harry> <sweep> <quote>Bye Bye, Sweep</quote></sweep> <laughter/> </sweepjoke>
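
As a rough illustration of how a program might read the example above, the following sketch parses it with Python's standard `xml.etree.ElementTree` module. The variable names are illustrative and not part of the lecture; whitespace inside the elements has been tidied slightly.

```python
import xml.etree.ElementTree as ET

# The sweepjoke document from the slide, laid out over several lines.
SWEEPJOKE = """<?xml version="1.0"?>
<sweepjoke>
  <harry>Say <quote>Bye Bye</quote>, Sweep</harry>
  <sweep><quote>Bye Bye, Sweep</quote></sweep>
  <laughter/>
</sweepjoke>"""

root = ET.fromstring(SWEEPJOKE)

# Names of the elements directly inside <sweepjoke>, in document order.
child_tags = [child.tag for child in root]

# Text of every <quote> element, however deeply it is nested.
quotes = [quote.text for quote in root.iter("quote")]

print(root.tag, child_tags, quotes)
```

Because the tags describe what the content is rather than how it looks, a program can pull out every quotation without knowing anything about presentation.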

  14. XML Markup • Elements • Entity references • Comments • Processing Instructions • Marked sections • Document type declarations (DTD)

  15. Elements • Commonest form of markup • Delimited by angle brackets (<, >) • May be empty but normally consist of start tag and end tag • Start tag may contain attributes • <a href="www.scit.wlv.ac.uk">
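
A minimal sketch of how a start tag's attributes and an empty element appear to a program, again assuming Python's `xml.etree.ElementTree`. The link text `SCIT` is made up for the illustration.

```python
import xml.etree.ElementTree as ET

# An element with a start tag, an attribute, content and an end tag.
link = ET.fromstring('<a href="www.scit.wlv.ac.uk">SCIT</a>')

# An empty element written with the shorthand empty-element tag.
empty = ET.fromstring("<laughter/>")

print(link.tag, link.attrib, link.text)
print(empty.tag, len(empty), empty.text)
```

The attributes arrive as a dictionary on the element, while the empty element has no children and no text.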

  16. Entity References • In XML (and HTML) certain characters are reserved e.g. < • Entity references are used to insert these into documents • Entity references begin with an ampersand (&) and end with a semicolon (;) • You can define your own entities • Can be used to insert Unicode characters
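
The sketch below shows, under the assumption of Python's standard library, how predefined entity references and numeric character references are resolved on parsing, and how reserved characters can be escaped when writing. The sample sentence is invented for the illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# &lt; and &amp; are predefined entities; &#62; and &#x3A9; are numeric
# character references (decimal and hexadecimal) for '>' and Greek omega.
doc = ET.fromstring("<p>if p &lt; q &amp;&amp; q &#62; 0 then &#x3A9;</p>")
parsed_text = doc.text

# Going the other way: escape reserved characters before embedding text.
escaped = escape("a < b & c")

print(parsed_text)
print(escaped)
```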

  17. Comments • Begin with <!-- • End with --> • Can contain any data except -- • XML processors are not required to pass comments to an application
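
The last point can be seen with two parsers from Python's standard library, used here purely as an illustration: by default `xml.etree.ElementTree` drops comments, whereas `xml.dom.minidom` keeps them as nodes. The comment text is invented.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

SOURCE = "<joke><!-- rehearsal note -->Bye Bye</joke>"

# ElementTree's default parser discards the comment entirely.
et_root = ET.fromstring(SOURCE)
et_text = et_root.text
et_children = len(et_root)

# minidom keeps the comment as a node in the tree.
dom_root = minidom.parseString(SOURCE).documentElement
first = dom_root.firstChild
kept_comment = first.data if first.nodeType == first.COMMENT_NODE else None

print(repr(et_text), et_children, repr(kept_comment))
```

An application should therefore not rely on comments to carry information it needs.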

  18. Processing Instructions (PIs) • Provide information to an application • XML processors required to pass them on • Have the form <?name pidata?> • The name (PI target) identifies the PI • Data is optional and meaningful to an application that recognises the target
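
A small sketch of how a PI's target and data reach an application, assuming Python's `xml.dom.minidom`. The `xml-stylesheet` target is a real, widely used one; the file name `sweep.css` is hypothetical.

```python
from xml.dom import minidom

SOURCE = '<?xml-stylesheet type="text/css" href="sweep.css"?><sweepjoke/>'

doc = minidom.parseString(SOURCE)

# The PI sits before the root element, so it is the document's first child.
pi = doc.firstChild
pi_target = pi.target  # the name that identifies the PI
pi_data = pi.data      # everything after the target, uninterpreted

print(pi_target, pi_data)
```

The parser hands over the data as a plain string; it is up to the application that recognises the target to make sense of it.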

  19. Marked Sections • Parsers do not interpret markup inside CDATA sections; the content is treated as literal character data <![CDATA[ <head>if p < &lt;</head> ]]> • The only character string not allowed is ]]> • Data is passed on to the application
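
To illustrate, the sketch below wraps the slide's CDATA content in a hypothetical `<code>` element and parses it with Python's `xml.etree.ElementTree`. The angle brackets and the `&lt;` come through untouched as text.

```python
import xml.etree.ElementTree as ET

# <code> is an invented wrapper element; the CDATA content is from the slide.
root = ET.fromstring("<code><![CDATA[<head>if p < &lt;</head>]]></code>")

cdata_text = root.text
child_count = len(root)

print(cdata_text, child_count)
```

No `<head>` child element is created and `&lt;` is not expanded, because nothing inside the section is treated as markup.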

  20. Document Type Declarations • Optional in XML (not in SGML) • Specify constraints on the sequence and nesting of tags • Communicates meta-information to the parser about content • Sequence and nesting of tags, attribute values, external files, entities

  21. Kinds of Declaration • Element type declarations • Attribute list declarations • Entity declarations • Notation declarations

  22. Element Type Declaration <!ELEMENT sweepjoke (harry+, sweep, laughter?)> • A sweepjoke consists of a harry element followed by a sweep element and a laughter element • The harry element may be repeated (+) • + indicates one or more • The laughter element is optional (?)

  23. Sweepjoke Declaration <!ELEMENT sweepjoke (harry+, sweep, laughter?)> <!ELEMENT harry (#PCDATA | quote)*> <!ELEMENT sweep (#PCDATA | quote)*> <!ELEMENT quote (#PCDATA)*> <!ELEMENT laughter EMPTY> • PCDATA indicates parsed character data • | indicates 'or' • * indicates 'zero or more'
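
Content models read much like regular expressions. The sketch below is not a DTD validator (Python's standard library does not include one); it simply hand-translates the model `(harry+, sweep, laughter?)` into a regular expression and checks the top-level children against it. The function name is illustrative, and the declarations for `harry`, `sweep`, `quote` and `laughter` are not checked.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of (harry+, sweep, laughter?): each child name is
# followed by a space so that names cannot run into one another.
CONTENT_MODEL = re.compile(r"(harry )+sweep (laughter )?")


def matches_sweepjoke_model(xml_text):
    """Return True if the root is <sweepjoke> with children in a valid order."""
    root = ET.fromstring(xml_text)
    if root.tag != "sweepjoke":
        return False
    sequence = "".join(child.tag + " " for child in root)
    return CONTENT_MODEL.fullmatch(sequence) is not None


print(matches_sweepjoke_model("<sweepjoke><harry/><harry/><sweep/></sweepjoke>"))
print(matches_sweepjoke_model("<sweepjoke><sweep/><laughter/></sweepjoke>"))
```

The first document is accepted (harry is repeated and laughter omitted); the second is rejected because at least one harry element is required.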

  24. Attribute List Declaration • Identifies • which elements may have attributes • what attributes they may have • what values are permitted for an attribute • what value is the default <!ATTLIST sweepjoke name ID #REQUIRED label CDATA #IMPLIED status ( funny | notfunny ) 'funny'>

  25. Entity Declarations • Allow a name to be associated with some other content • Internal entities associate a name with a string of literal text (e.g. &lt;) • External entities associate a name with the content of another file • Parameter entities enable text replacement within the DTD
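
A minimal sketch of an internal entity in use, assuming Python's `xml.etree.ElementTree`, which expands entities declared in the document's internal DTD subset. The entity name `catchphrase` is invented for the illustration.

```python
import xml.etree.ElementTree as ET

# The internal subset associates the name 'catchphrase' with literal text.
DOC = """<?xml version="1.0"?>
<!DOCTYPE sweepjoke [
  <!ENTITY catchphrase "Bye Bye">
]>
<sweepjoke><harry>Say &catchphrase;, Sweep</harry></sweepjoke>"""

root = ET.fromstring(DOC)
expanded = root.find("harry").text

print(expanded)
```

The application sees only the replacement text; the entity reference itself has gone by the time parsing is complete.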

  26. Adding a DTD to an XML File • Inline • External • <?xml version="1.0"?> • <!DOCTYPE sweepjoke SYSTEM "sweep.dtd">
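
As a sketch of what an application can learn from an external document type declaration, the following reads it with Python's `xml.dom.minidom`. This parser records the declaration but does not fetch or apply `sweep.dtd`, so nothing is validated here.

```python
from xml.dom import minidom

SOURCE = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE sweepjoke SYSTEM "sweep.dtd">'
    "<sweepjoke/>"
)

doc = minidom.parseString(SOURCE)

# The declaration names the root element type and where the DTD lives.
doctype_name = doc.doctype.name
doctype_system_id = doc.doctype.systemId

print(doctype_name, doctype_system_id)
```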

  27. Links in XML • HTML anchors are a very limited form of hypertext • XML introduces • XPointers • XLinks • These standards are outside the scope of the XML standard

  28. Presentation Issues • Use of a stylesheet is implicit • Possible standards: • DSSSL Document Style Semantics and Specification Language (ISO/IEC 10179) • CSS Cascading Style Sheets • XSL Extensible Stylesheet Language (uses XML syntax)

  29. XSL • XSL is an XML stylesheet language • XSLT is a language for transforming XML documents • XSL formatting objects specify formatting semantics • A set of rules to transform a document • XML can be transformed into HTML

  30. XML Application Areas • Mediation between heterogeneous databases on the Web • Client centric web applications • Applications requiring different views of the same data • Information discovery tailored to the needs of differing individuals

  31. Languages based on XML • MathML • SMIL • RDF • XHTML • CML

  32. RDF • Resource Description Framework • Integrates a variety of web-based metadata activities • Provides interoperability between applications that exchange metadata • Allows machine readable description of Web resources

  33. RDF Example <?xml version="1.0"?> <?xml:namespace ns = "http://www.w3.org/RDF/RDF/" prefix ="RDF" ?> <?xml:namespace ns = "http://purl.oclc.org/DC/" prefix = "DC" ?> <RDF:RDF> <RDF:Description RDF:HREF = "http://uri-of-Document-1"> <DC:Creator>John Smith</DC:Creator> </RDF:Description> </RDF:RDF>

  34. XHTML • New Web languages are defined using XML • HTML 4.0 cannot be defined using XML • XHTML is XML compliant HTML

  35. Major Changes • Documents must be well-formed • Elements and attributes must have lower case names • End tags required in non-empty elements • Attribute values must be in quotes • Empty tags must be terminated • Scripts will be processed by XHTML
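
Several of these rules are simply XML well-formedness, which any XML parser enforces. The sketch below, assuming Python's `xml.etree.ElementTree` and an illustrative helper name, shows typical HTML habits being rejected. It does not check the lower-case rule, which comes from XHTML itself rather than from XML.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Unterminated empty tag, as commonly written in HTML.
print(is_well_formed("<p>line one<br>line two</p>"))
# The same content with the empty tag terminated.
print(is_well_formed("<p>line one<br/>line two</p>"))
# Attribute value without quotes.
print(is_well_formed("<p class=intro>text</p>"))
# Overlapping elements.
print(is_well_formed("<p><b>bold</p></b>"))
```

Only the second fragment is accepted; the others each break one of the rules listed above.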

  36. XHTML Compatibility • Current browsers unlikely to understand all XHTML • E.g. <br/> may cause an error • Compatibility guidelines defined in XHTML standard • See http://www.w3.org/TR/xhtml1/ Appendix C

  37. Summary • XML significantly expands what is possible on the Web • XML preserves the basic Web ideas • Using XML is an order of magnitude more difficult than writing HTML • Software is out there and more will soon follow • The opportunities are endless!
