
Introduction to XML 1. The XML Language




  1. Introduction to XML 1. The XML Language. Tim Brailsford

  2. Markup Languages • The word “Markup” is derived from the printing industry • Detailed stylistic instructions for typesetting • Usually hand-written on the copy (eg underlining some text that is to be set in italics). • Markup languages do the same job for computerised documentation systems. • Markup adds logical structure to a document, or indicates how it is to be laid out (on paper or screen). • Markup languages are a set of instructions that are amenable to automatic processing.

  3. Markup Languages (cont.) • Usually a sequence of characters in a text file that indicates the structure or behaviour of the content. • For example (in HTML) • This is <B>bold</B> and this is <I>italic</I> • <TITLE>This is the title.</TITLE> • Markup may be created by directly editing the symbols, but is more usually hidden from end-users. • Examples • HTML • RTF • HyTime

  4. Generalised Markup Languages • Proprietary markup languages are problematic. • Generalised markup languages are languages for defining markup languages. • Metalanguages • SGML

  5. SGML - History • Standard Generalised Markup Language • 1969 - GML from IBM • text editing • formatting • information retrieval • 1980 - SGML first published • 1980s - SGML adopted by US IRS & DOD • 1986 - ISO standard: ISO 8879: Information processing--Text and office systems--Standard Generalized Markup Language (SGML), ([Geneva]: ISO, 1986).

  6. SGML • SGML defines a system of tag markup <TAG>This is a pair of SGML tags</TAG> • SGML is a standard for how to specify a tag set. • Document Type Definition (DTD) • SGML documents contain structural elements that can be described without consideration of how they are displayed. • SGML application. • HTML is an SGML application.

  7. Benefits of SGML • Documents are created by thinking in terms of structure rather than appearance (which may change over time). • Documents are portable because any SGML compliant software can interpret them by reference to the DTD. • Documents originally intended for one medium can easily be re-purposed for other media, such as the computer display screen.

  8. What is XML? • XML is based upon SGML, but is substantially simplified for use on the WWW. • Like SGML, XML is a metalanguage • arbitrary definition of elements • <TITLE> <PARAGRAPH> <ChapterHeading> <PRICE> <PARTNUMBER> <MANUFACTURER> <ExamGrade> • Syntax may optionally be described by a DTD • Valid documents are well formed and also conform to a DTD • Well formed documents obey the XML syntax rules and need not have a DTD • Style and content are completely separate • XML documents contain content • Style is specified by stylesheets
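The distinction between well formed and valid documents can be sketched with Python's standard library. This is a minimal illustration, not part of the original slides: the helper name `is_well_formed` is hypothetical, and `xml.etree.ElementTree` only checks well-formedness (it does not validate against a DTD).

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised for syntax errors such as mismatched or unclosed tags.
        return False


# Arbitrary element names are fine as long as the tags pair up correctly.
good = "<PRICE>29.99</PRICE>"
# The closing tag does not match the opening tag, so this is not XML.
bad = "<PRICE>29.99</PARTNUMBER>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

Checking validity would additionally need a validating parser and the DTD itself, which the standard library does not provide.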

  9. Example XML Applications • MathML - maths • CML - chemistry • SVG - vector graphics • XHTML - WWW • SMIL - synchronised multimedia • MusicML - sheet music • FpML - financial products • RETML - real estate transactions • and many, many others

  10. XML Elements • XML documents consist of one or more elements. • Elements consist of a pair of tags and (optionally) enclosed text. <TITLE>The XML Companion</TITLE> • Elements may have attributes. <TITLE type="book">The XML Companion</TITLE> • Elements may contain other elements. <REFERENCE> <TITLE type="book">The XML Companion</TITLE> </REFERENCE> • Empty elements may be self closing. <PICTURE src="mypic.jpg"></PICTURE> <PICTURE src="mypic.jpg" />
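As a rough sketch of how a program sees these elements, the snippet below parses the slide's examples with Python's `xml.etree.ElementTree`. Combining TITLE and PICTURE under one REFERENCE element is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# REFERENCE contains other elements; TITLE has an attribute and text;
# PICTURE is an empty, self-closing element.
doc = """<REFERENCE>
  <TITLE type="book">The XML Companion</TITLE>
  <PICTURE src="mypic.jpg" />
</REFERENCE>"""

root = ET.fromstring(doc)
title = root.find("TITLE")
picture = root.find("PICTURE")

print(root.tag)            # name of the outermost element
print(title.text)          # enclosed text
print(title.get("type"))   # attribute value
print(picture.get("src"))

# Both spellings of an empty element parse to the same structure.
long_form = ET.fromstring('<PICTURE src="mypic.jpg"></PICTURE>')
short_form = ET.fromstring('<PICTURE src="mypic.jpg" />')
same = (
    long_form.tag == short_form.tag
    and long_form.attrib == short_form.attrib
    and len(long_form) == len(short_form) == 0
)
print(same)
```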

  11. Contents vs Style • XML tags contain meaning not appearance. • This allows extra information to be extracted • Consider the example of the scientific names of animals. • scientific names are in Latin • by convention they are always printed in italics The scientific name of the domestic dog is Canis familiaris, and of the domestic cat is Felis catus.

  12. Contents vs Style In HTML: <P>The <I>scientific</I> name of the domestic dog is <I>Canis familiaris</I>, and of the domestic cat is <I>Felis catus.</I></P> NB there is no distinction between scientific names and emphasis.

  13. Contents vs Style In XML: <P>The <emph>scientific</emph> name of the domestic dog is <sci>Canis familiaris</sci>, and of the domestic cat is <sci>Felis catus.</sci></P> NB emphasis and scientific names are different tags. They may both be displayed as italic, but they can be treated separately.
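Because emphasis and scientific names are different tags, software can pull them apart. A minimal sketch using Python's standard library and the slide's own markup (stripping the trailing full stop is an illustrative choice, not something the slides specify):

```python
import xml.etree.ElementTree as ET

para = ET.fromstring(
    "<P>The <emph>scientific</emph> name of the domestic dog is "
    "<sci>Canis familiaris</sci>, and of the domestic cat is "
    "<sci>Felis catus.</sci></P>"
)

# Collect only the scientific names; the emphasised word is ignored.
sci_names = [el.text.rstrip(".") for el in para.iter("sci")]
# Emphasis can be collected independently of the scientific names.
emphasised = [el.text for el in para.iter("emph")]

print(sci_names)
print(emphasised)
```

The same extraction is not possible with the HTML version, where both kinds of text are marked with the same italic tag.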

  14. Rendering of XML • XML files contain content not appearance • Stylesheets contain appearance and behaviour • XML data is rendered by being transformed into some form suitable for display • RTF (for simple printing) • PDF or PostScript (for printing or display) • HTML (for display over the web) • HTML 4.0 / DHTML (for complex interfaces) • The transformation is defined by a stylesheet • Rendering may be done by standalone software, or by a web browser, or on a web server.

  15. Standalone Rendering [Diagram: XML and XSL are combined to produce HTML.]

  16. Client Side Rendering [Diagram: a server (any) delivers the XML and XSL to a browser with an XSL engine (eg MS IE > 5.0), which produces the HTML.]

  17. Server Side Rendering [Diagram: a server with an XSL engine (eg Apache/Tomcat/Cocoon) combines the XML and XSL into HTML, which is delivered to a browser (any).]

  18. Client vs Server Stylesheets • Client side stylesheets are processed in the client • XML is delivered to the client • XSL/CSS must be supported by the client • MS IE supports CSS & XSLT (non-standard in 5.x, mostly standard in 6.x) • Netscape 7 & Mozilla support CSS and possibly XSLT via plugins. • Server side stylesheets are processed in the server • XML is not delivered to the client, it is transformed, usually to HTML or PDF • XSL/CSS must be supported by the server • Cocoon is an Open Source project, implementing XSL as a Java servlet • Any browser can then be used

  19. Cocoon on Nottingham Servers • Any file placed in a directory called public_html is accessible within the Nottingham network with the url: http://www.cs.nott.ac.uk/~username/filename • Files with the .xml extension are automatically processed by Cocoon. • Providing that they have an XSL stylesheet and the correct Cocoon processing instructions they will be “transformed” into (usually) HTML.

  20. A Simple XML Document <?xml version="1.0" ?> <booklist title="Some XML Books"> </booklist>

  21. A Simple XML Document <?xml version="1.0" ?> <booklist title="Some XML Books"> </booklist> Callouts: the first line is the XML declaration; booklist is the root element (one per document).

  22. A Simple XML Document <?xml version="1.0" ?> <!DOCTYPE booklist SYSTEM "books.dtd" > <booklist title="Some XML Books"> </booklist>

  23. A Simple XML Document <?xml version="1.0" ?> <!DOCTYPE booklist SYSTEM "books.dtd" > <booklist title="Some XML Books"> </booklist> Define root element and specify DTD.

  24. A Simple XML Document <?xml version="1.0" ?> <!DOCTYPE booklist SYSTEM "books.dtd" > <!-- This is a comment --> <booklist title="Some XML Books"> </booklist>

  25. A Simple XML Document <?xml version="1.0" ?> <!DOCTYPE booklist SYSTEM "books.dtd" > <!-- This is a comment --> <booklist title="Some XML Books"> </booklist> This is a comment (as SGML / HTML)

  26. A Simple XML Document <?xml version="1.0" ?> <!DOCTYPE booklist SYSTEM "books.dtd" > <!-- This is a comment --> <?xml-stylesheet type="text/xsl" href="iti-xml2.xsl"?> <booklist title="Some XML Books"> </booklist>

  27. A Simple XML Document <?xml version="1.0" ?> <!DOCTYPE booklist SYSTEM "books.dtd" > <!-- This is a comment --> <?xml-stylesheet type="text/xsl" href="iti-xml2.xsl"?> <booklist title="Some XML Books"> </booklist> This defines the XSL stylesheet

  28. A Simple XML Document <?xml version="1.0" ?> <!DOCTYPE booklist SYSTEM "books.dtd" > <!-- This is a comment --> <?xml-stylesheet type="text/xsl" href="books3.xsl"?> <?cocoon-process type="xslt"?> <booklist title="Some XML Books"> </booklist>

  29. A Simple XML Document <?xml version="1.0" ?> <!DOCTYPE booklist SYSTEM "books.dtd" > <!-- This is a comment --> <?xml-stylesheet type="text/xsl" href="books3.xsl"?> <?cocoon-process type="xslt"?> <booklist title="Some XML Books"> </booklist> This is a Cocoon processing directive (NB not standard XML, but required by Cocoon 1.7.4).

  30. Adding Content <booklist title="Some XML Books"> <book> <author> <name>St. Laurent</name> <initial>S</initial> </author> <date>1998</date> <title edition="Second">XML: A Primer</title> <publisher>MIS Press</publisher> <website href="http://www.simonstl.com/xmlprim/" /> <rating stars="4"/> </book> </booklist>
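To make the idea of rendering concrete, here is a rough Python sketch that turns the booklist above into HTML. It is only a stand-in for an XSL stylesheet: the Python standard library has no XSLT engine, and the function name `render_html` and the output layout are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

BOOKLIST = """<booklist title="Some XML Books">
  <book>
    <author><name>St. Laurent</name><initial>S</initial></author>
    <date>1998</date>
    <title edition="Second">XML: A Primer</title>
    <publisher>MIS Press</publisher>
    <website href="http://www.simonstl.com/xmlprim/" />
    <rating stars="4"/>
  </book>
</booklist>"""


def render_html(xml_text: str) -> str:
    """Hypothetical renderer: content comes from the XML, style is decided here."""
    root = ET.fromstring(xml_text)
    lines = [f"<h1>{escape(root.get('title', ''))}</h1>", "<ul>"]
    for book in root.findall("book"):
        title = book.findtext("title", default="")
        name = book.findtext("author/name", default="")
        date = book.findtext("date", default="")
        rating = book.find("rating")
        stars = int(rating.get("stars", "0")) if rating is not None else 0
        # The title is shown in italics and the rating as asterisks.
        lines.append(
            f"<li><i>{escape(title)}</i>, {escape(name)} ({escape(date)}) "
            f"{'*' * stars}</li>"
        )
    lines.append("</ul>")
    return "\n".join(lines)


html_out = render_html(BOOKLIST)
print(html_out)
```

Changing the presentation (for example, showing the rating as a number) only requires editing the renderer; the XML content stays untouched.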

  31. Benefits of a DTD • DTDs are optional in XML • DTD allows validation of documents • DTD defines the application • Vital for collaborative development • IPR implications • DTD allows entity definitions (ie symbols, shortcuts, “foreign” characters etc.).
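Entity definitions can be illustrated with a small Python sketch. The entity name `mis` and the use of an internal DTD subset (rather than an external file such as books.dtd) are assumptions made so the example is self-contained; `xml.etree.ElementTree` expands internal entities but does not validate.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares an entity used as a shortcut.
doc = """<?xml version="1.0"?>
<!DOCTYPE booklist [
  <!ENTITY mis "MIS Press">
]>
<booklist><publisher>&mis;</publisher></booklist>"""

root = ET.fromstring(doc)
publisher = root.findtext("publisher")
print(publisher)  # the entity reference is replaced by its definition

# Without the declaration, the same reference is a parse error.
try:
    ET.fromstring("<booklist><publisher>&mis;</publisher></booklist>")
    undefined_ok = True
except ET.ParseError:
    undefined_ok = False
print(undefined_ok)
```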

  32. XML Namespaces • Namespaces are mechanisms to ensure that elements are unique • Namespaces in XML are optional • Consider the following: <title>The Title</title> <title text="The Title" /> <title> <text>The Title</text> </title>

  33. Ensuring uniqueness • Unique element names • Unique attribute content <title-one>The Title</title-one> <title-two text="The Title" /> <title-three> <text>The Title</text> </title-three> <title ns="one">The Title</title> <title ns="two" text="The Title" /> <title ns="3"> <text>The Title</text> </title>

  34. xmlns attribute • The xmlns attribute is used to declare namespaces • This must be a URI <title xmlns="http://www.cs.nott.ac.uk/~tjb/NSdemo-one"> The Title </title> <title xmlns="http://www.cs.nott.ac.uk/~tjb/NSdemo-two" text="The Title" /> <title xmlns="http://www.cs.nott.ac.uk/~tjb/NSdemo-three"> <text>The Title</text> </title>

  35. Namespace Abbreviations • If an element doesn’t have a namespace defined it inherits that of its parent. • Where multiple namespaces are used together aliases may be declared <demo xmlns="http://www.cs.nott.ac.uk/~tjb/NSdemo-two"> <title text="The Title" /> </demo> <demo xmlns:first="http://www.cs.nott.ac.uk/~tjb/NSdemo-one" xmlns:second="http://www.cs.nott.ac.uk/~tjb/NSdemo-two"> <first:title>The Title</first:title> <second:title text="The Title" /> </demo>
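A short Python sketch shows how a namespace-aware parser treats these two documents. It uses the slide's own namespace URIs with the standard library's `xml.etree.ElementTree`, which reports each element name as `{namespace-uri}localname`; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

NS_ONE = "http://www.cs.nott.ac.uk/~tjb/NSdemo-one"
NS_TWO = "http://www.cs.nott.ac.uk/~tjb/NSdemo-two"

# Two aliases (prefixes) keep the two kinds of title apart.
aliased = ET.fromstring(
    f'<demo xmlns:first="{NS_ONE}" xmlns:second="{NS_TWO}">'
    "<first:title>The Title</first:title>"
    '<second:title text="The Title" />'
    "</demo>"
)
tags = [child.tag for child in aliased]
print(tags)  # same local name, different namespaces

# Lookups use a prefix-to-URI mapping; the prefixes themselves are arbitrary.
prefixes = {"first": NS_ONE, "second": NS_TWO}
first_title = aliased.find("first:title", prefixes)
second_title = aliased.find("second:title", prefixes)

# A default namespace declared on the parent is inherited by the child.
inherited = ET.fromstring(
    f'<demo xmlns="{NS_TWO}"><title text="The Title" /></demo>'
)
inherited_tags = [inherited.tag] + [child.tag for child in inherited]
print(inherited_tags)
```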

  36. XML Namespaces • Namespaces in XML are optional • Namespaces ensure that elements are unique • In different contexts a given tag might mean different things - eg consider <BOOK> • To me it might mean a book in a bibliography • To a bookshop it might contain stock details • To a travel agent it might contain information about flight bookings! • Namespaces attach unique labels to a given tag set. • URLs are usually used as namespace labels.
