XML Craig Stewart Dr. Alexandra I. Cristea (http://www.dcs.warwick.ac.uk/~acristea/)
XML history • Inception: circa 1996 • The eXtensible Markup Language (XML) v1.0 became a W3C Recommendation on 10 February 1998. • Currently v1.0 is in its fifth edition • v1.1 published 2004, addressing: • End-of-line issues • Character sets beyond Unicode v2.0 • Other non-Unicode special characters
What is XML? • XML stands for eXtensible Markup Language • XML is a markup language much like HTML • XML was designed to describe data • XML is a standard and supporting structure for data, not a programming language
How does XML work? • XML tags are not predefined. You must define your own tags • XML uses a Document Type Definition (DTD) or an XML Schema to describe the data • Also Relax NG (ISO DSDL) • XML with a DTD or XML Schema is designed to be self-descriptive
Main Difference between XML and HTML • XML was designed to carry data. • XML is not a replacement for HTML. XML and HTML were designed with different goals: • XML was designed to describe data and to focus on what data is. • HTML was designed to display data and to focus on how data looks. • HTML is about displaying information, while XML is about describing information. • Syntax: XML documents must be well formed, just like XHTML documents
XML does not DO anything • XML was created to structure, store and to send information <note> <to>John</to> <from>Jane</from><heading>Reminder</heading><body>Don't forget the book!</body> </note>
XML is Free and Extensible • XML tags are not predefined. You must "invent" your own tags. • The tags used to mark up HTML documents and the structure of HTML documents are predefined. The author of HTML documents can only use tags that are defined in the HTML standard (like <p>, <h1>, etc.). • XHTML is an application of XML but not vice-versa.
Benefits of XML • The extensibility and structured nature of XML allow it to be used for communication between different systems • Information from a single XML-based source can be formatted and distributed via a multitude of different channels • XSL files act as templates, allowing a single stylesheet to be used to format multiple pages or the same content for multiple distribution channels
XML is a Complement to HTML • XML is not a replacement for HTML. • In future Web development it is most likely that XML will be used to describe the data, while HTML will be used to format and display the same data. • XML is a cross-platform, software and hardware independent tool for transmitting information.
XML in Future Web Development • XML is going to be everywhere. • the XML standard has been developed quickly and a large number of software vendors have adopted it. • XML might be the most common tool for all data manipulation and data transmission.
XML Can be Used to Create New Languages • XML is the basis of WML, the markup language of WAP. • The Wireless Markup Language (WML), used to mark up Internet applications for handheld devices like mobile phones, is written in XML. • And many others …
Viewing XML • to view XML documents hierarchically or view their output, you need an XML parser and processor. • there are a number of these tools available: • See examples at: • http://www.stylusstudio.com/xml_download.html • http://www.w3schools.com/xml/xml_parser.asp • Please note, however: XML was not designed to display data.
XML Rules • Every start-tag must have a matching end-tag. • Tags cannot overlap. Proper nesting is required. • XML documents can only have one root element. • Element names must obey the following XML naming conventions: • Names must start with letters or the "_" character. Names cannot start with numbers or punctuation characters. • After the first character, numbers and the punctuation characters "-", "." and "_" are allowed.
XML Rules (cont.) • Names cannot contain spaces. • Names should not contain the ":" character as it is a "reserved" character. • Names cannot start with the letters "xml" in any combination of case. • The element name must come directly after the "<" without any spaces between them. • XML is case sensitive. • XML preserves white space within text. • Elements may contain attributes. If an attribute is present, it must have a value, even if it is an empty string "".
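The rules above can be tried out with a parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are illustrative assumptions, not part of the original slides. A non-validating parser like this one checks well-formedness only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, one per rule from the slides.
samples = {
    "matching tags": "<note><to>Tove</to></note>",
    "missing end-tag": "<note><to>Tove</note>",
    "overlapping tags": "<note><to><from>Tove</to></from></note>",
    "two root elements": "<note/><heading/>",
    "name starts with a digit": "<1note/>",
    "space after '<'": "< note/>",
    "case mismatch": "<Note></note>",
    "unquoted attribute value": "<note date=12/11/2002/>",
    "empty attribute value": '<note date=""/>',
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first and last samples should be reported as well formed; every other sample breaks exactly one of the rules listed.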
Spot the error! <?xml version="1.0" encoding="ISO-8859-1"?> <note date=12/11/2002> <to>Tove</to> <from>Jani</from> </note>
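Spot the error! (corrected: the attribute value is now quoted) <?xml version="1.0" encoding="ISO-8859-1"?> <note date="12/11/2002"> <to>Tove</to> <from>Jani</from> </note>
Line endings in XML • An XML parser converts CR + LF, and a lone CR, to a single LF • Windows: CR + LF • Unix: LF • Macintosh (classic Mac OS): CR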
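The line-ending conversion can be observed directly. This is a small sketch, assuming Python's standard xml.etree.ElementTree parser (which is built on expat); the function name body_text and the sample text are made up for illustration.

```python
import xml.etree.ElementTree as ET


def body_text(line_ending: str) -> str:
    """Parse a two-line element that uses the given line ending; return its text."""
    doc = f"<body>line one{line_ending}line two</body>"
    return ET.fromstring(doc).text


# The three platform conventions named on the slide.
for platform, ending in [("Windows", "\r\n"), ("Unix", "\n"), ("classic Mac", "\r")]:
    print(platform, repr(body_text(ending)))
```

Whatever the platform convention in the source file, the application sees the same text, so code that processes parsed XML only needs to handle LF.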
There is Nothing Special About XML • plain text with XML tags • Software that can handle plain text can also handle XML. • In an XML-aware application, the XML tags can be handled specially: • Visibility, • Functional meaning, etc.
Is this an error? <note> <to>Tove</to> <from>Jani</from> <body>Don't forget me this weekend!</body> </note> <heading>Reminder</heading>
XML Elements have Relationships • Elements are related as parents and children. • Root element / Parents • Children / Siblings
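As a sketch of these relationships, the note document from the earlier slide can be loaded with Python's standard xml.etree.ElementTree module and walked as a tree. The variable names are illustrative assumptions. ElementTree does not store parent pointers, so the sketch builds a child-to-parent map itself.

```python
import xml.etree.ElementTree as ET

doc = """<note>
  <to>John</to>
  <from>Jane</from>
  <heading>Reminder</heading>
  <body>Don't forget the book!</body>
</note>"""

root = ET.fromstring(doc)              # the root element
child_tags = [c.tag for c in root]     # its children, in document order

# Map every element to its parent (ElementTree keeps no parent links).
parent_of = {child: parent for parent in root.iter() for child in parent}

to = root.find("to")
# Siblings are the other children of the same parent.
siblings = [c.tag for c in parent_of[to] if c is not to]

print("root:", root.tag)
print("children:", child_tags)
print("siblings of <to>:", siblings)
```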
Elements • An element consists of all the information from the beginning of a start-tag to the end of an end-tag including everything in between. • E.g. from (X)HTML, all of the following would be the equivalent of one element, named h1: <h1>This is a heading.</h1> • Where, <h1> is the start tag, </h1> is the end tag, and the content is in between. • Each XML document has a root element within which all other elements are nested.
Examples • See at: • http://www.intranetjournal.com/articles/200402/ij_02_10_04a.html • http://prolearn.dcs.warwick.ac.uk/caf/gipfBegIntAdv.xml • Search more by yourself and familiarize yourself with the syntax!
XML Attributes • XML elements can have attributes. • From HTML you will remember this: <IMG SRC="computer.gif"> • The SRC attribute provides additional information about the IMG element.
Attributes versus Elements • <person sex="female"> <firstname>Anna</firstname> <lastname>Smith</lastname> </person> • <person><sex>female</sex> <firstname>Anna</firstname> <lastname>Smith</lastname></person>
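Both forms carry the same information; what changes is how a program reads it. A minimal sketch with Python's standard xml.etree.ElementTree module (variable names are assumptions for illustration):

```python
import xml.etree.ElementTree as ET

# Form 1: sex stored as an attribute of <person>.
as_attribute = ET.fromstring(
    '<person sex="female">'
    "<firstname>Anna</firstname><lastname>Smith</lastname>"
    "</person>"
)

# Form 2: sex stored as a child element of <person>.
as_element = ET.fromstring(
    "<person><sex>female</sex>"
    "<firstname>Anna</firstname><lastname>Smith</lastname>"
    "</person>"
)

sex_from_attribute = as_attribute.get("sex")    # attribute lookup
sex_from_element = as_element.findtext("sex")   # child-element lookup

print(sex_from_attribute, sex_from_element)
```

An attribute holds a single string and cannot be nested or repeated, whereas a child element can later grow structure of its own, which is one reason to weigh the two designs.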
Comments • As in other languages, comments are lines whose sole purpose is to give the developer, and anyone reading the code in the future, information about the code. <!-- all the comments go in here -->
XML declaration • An XML document should begin with a declaration (not mandatory, but good practice) <?xml version="1.0"?> • Or, using optional attributes: <?xml version="1.0" encoding="UTF-16" standalone="yes"?>
Document Type Definition (DTD) • A DTD defines: • which tags and attributes are allowed, • where they can be placed, • whether or not they can be nested within a given document and • what additional entity definitions are required
Document Type Declaration (DOCTYPE) • <!DOCTYPE MovieCatalog SYSTEM "movie_catalog.dtd"> • MovieCatalog: the root element of the document • "movie_catalog.dtd": URL of the DTD (external subset via a system identifier)
Internal vs External DTD declaration • Internal: <!DOCTYPE foo [ <!ENTITY greeting "hello"> ]> • External, public: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
Valid XML Documents • A "Valid" XML document is a "Well Formed" XML document, which also conforms to the rules of a Document Type Definition (DTD): <?xml version="1.0" encoding="ISO-8859-1"?> <!DOCTYPE note SYSTEM "InternalNote.dtd"> <note> <to>Tom</to> <from>Jane</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
Validator • syntax-check any XML file • Also at: http://www.validome.org/xml/validate/ http://www.w3.org/2001/03/webdata/xsv
Internal DTD <?xml version="1.0"?> <!DOCTYPE note [ <!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)> <!ENTITY cs "Craig Stewart"> ]> <note> <to>Tove</to> <from>&cs;</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
External DTD <?xml version="1.0"?> <!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)> <!ENTITY cs "Craig Stewart"> (saved as the file note.dtd)
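To see the internal entity at work, the document can be parsed with Python's standard xml.etree.ElementTree module. This is a sketch under one important assumption: ElementTree is a non-validating parser, so it expands the internal entity cs but does not check the document against the ELEMENT declarations.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ENTITY cs "Craig Stewart">
]>
<note>
  <to>Tove</to>
  <from>&cs;</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(doc)

# The parser replaces &cs; with the text declared in the internal DTD subset.
sender = root.findtext("from")
print(sender)
```

Checking the element structure against the DTD would need a validating parser, which the Python standard library does not provide.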
Character Entities • What are they? • How would you write an XML element called ‘summary’ for the following data: • The result is <17% of the original • <summary> The result is <17% of the original</summary> • ??
Character Entities • Character entities are a way to solve this problem and get around the limitations of computer character sets (old ones) and keyboards. • &lt; < • &gt; > • &apos; ' • &quot; " • &amp; & • These are all standard XML entities and can be used without fear of compatibility issues.
Numeric Character Reference • “A numeric character reference (NCR) is a common markup construct used in SGML and other SGML-based markup languages such as HTML and XML. It consists of a short sequence of characters that, in turn, represent a single character from the Universal Character Set (UCS) of Unicode” (Wikipedia) • E.g.: • &#931; &#0931; &#x3A3; &#x03A3; • All represent "Σ"
Defined Entities in DTDs • Three types: • Internal <!ENTITY cs "Craig Stewart"> • External <!ENTITY mypicture SYSTEM "pic01.gif" NDATA GIF> • Parameter • For parameterizing the DTD • Start with a % not a & • Entirely different from other entities • See also: http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
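The summary element from the previous slide can be written safely by escaping the text before it is placed in the markup. A minimal sketch using Python's standard library follows; escape comes from xml.sax.saxutils and replaces &, < and >, and the variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "The result is <17% of the original"

# Escape markup-significant characters before embedding the text.
escaped = escape(raw)
doc = f"<summary>{escaped}</summary>"

# The parser turns the entity reference back into the original character.
parsed = ET.fromstring(doc).text

# All five predefined XML entities in one element.
predefined = ET.fromstring("<t>&lt; &gt; &apos; &quot; &amp;</t>").text

print(escaped)
print(parsed)
print(predefined)
```

Without the escaping step, the literal "<" would be read as the start of a tag and the document would not be well formed.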
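Decimal and hexadecimal references to the same code point decode to the same character. The sketch below checks this for Σ (U+03A3, decimal 931) with Python's standard xml.etree.ElementTree module; the list refs is an illustrative assumption.

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal references, with and without a leading zero.
refs = ["&#931;", "&#0931;", "&#x3A3;", "&#x03A3;"]

# Parse each reference inside a throwaway element and read back the character.
decoded = [ET.fromstring(f"<c>{ref}</c>").text for ref in refs]

# Build the decimal reference for a character from its code point.
sigma_ref = f"&#{ord('Σ')};"

print(decoded)
print(sigma_ref)
```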
XML Schema (XSD) • XML Schema is an XML-based alternative to DTD, supported by the W3C: http://www.w3.org/XML/Schema
Displaying your XML Files with CSS? • It is possible to use CSS to format an XML document. • Example: • XML file: The CD catalog • style sheet: The CSS file • product: The CD catalog formatted with the CSS file • Below is a fraction of the XML file. The second line, <?xml-stylesheet type="text/css" href="cd_catalog.css"?>, links the XML file to the CSS file
Displaying XML with XSL • XSL is the preferred style sheet language of XML. • XSL (the eXtensible Stylesheet Language) is far more sophisticated than CSS. • examples: • View the XML file, the XSL style sheet, and View the result. <?xml version="1.0" encoding="ISO-8859-1"?> <?xml-stylesheet type="text/xsl" href="simple.xsl"?>
XML Conclusions • We have learned: • XML history • What it is • How it works • Differences to (X)HTML • XML flow • XML Rules • XML Elements, Relationships, Attributes, Comments • Well-formed-ness concept • XML supporting frame: XML Schema or DTD • Generics on displaying XML
Next we are looking into more specific information about how to display XML and more, with …
XSL • XSL is an XML-based language used for stylesheets that can be used to transform XML documents into other document types and formats. • XSL is a family of recommendations for defining XML document transformation and presentation. • It consists of three parts.
XSL parts: • XSL Transformations (XSLT) • a language for transforming XML • XML Path Language (XPath) • an expression language used by XSLT to access or refer to parts of an XML document. (XPath is also used by the XML Linking specification) • XSL Formatting Objects (XSL-FO) • an XML vocabulary for specifying formatting semantics
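To give a feel for XPath as an expression language, here is a small sketch using Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath. The catalog document, its titles and its year attribute are hypothetical data invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog, made up for illustration.
catalog = ET.fromstring("""<MovieCatalog>
  <movie year="1999"><title>First Film</title></movie>
  <movie year="2001"><title>Second Film</title></movie>
</MovieCatalog>""")

# Path step by step: every <title> child of every <movie> child of the root.
titles = [t.text for t in catalog.findall("./movie/title")]

# Predicate on an attribute: only movies whose year attribute is 1999.
from_1999 = [m.findtext("title") for m in catalog.findall("./movie[@year='1999']")]

# Descendant search: every <title> anywhere below the root.
all_titles = [t.text for t in catalog.findall(".//title")]

print(titles)
print(from_1999)
print(all_titles)
```

XSLT uses expressions of the same kind in its match and select attributes, with the full XPath language rather than this subset.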
Conclusion XSL • We have learned: • What is XSL • What are its parts • Next: • Not all parts are equally important • We look at the most important one …
XSLT • XSLT became a W3C Recommendation on 16 November 1999. • The most important part of XSL • Transforms an input document (source tree) into an output document (result tree) according to the rules in the stylesheet. • Built on a structure known as an XSL template: <xsl:template> • E.g., <xsl:template match="/movie/title"> <xsl:value-of select="."/></xsl:template> Here the match pattern is an XPath expression that matches every title child of the root movie element, and xsl:value-of outputs its text. • Multiple templates: first the root template; if that doesn’t match, the next, etc.
XSLT Browsers • Nearly all major browsers support XML and XSLT. • Mozilla Firefox • From v1.0.2, Firefox supports XML and XSLT (and CSS). • Mozilla • XML + CSS, Namespaces. Available with an XSLT implementation. • Netscape • From v8, uses the Mozilla engine. • Opera • From v9: XML, XSLT (and CSS). v8 supports only XML + CSS. • Internet Explorer • From v6: XML, Namespaces, CSS, XSLT, and XPath. v5 is NOT compatible.