
ITR3 lecture 2: XML

This lecture covers the concept of Uniform Resource Identifiers (URIs) and their importance in identifying abstract or physical resources. It also introduces XML, its benefits, and its relationship with HTML and SGML.



Presentation Transcript


  1. ITR3 lecture 2: XML Thomas Krichel 2002-10-16

  2. Structure • URIs (we will come back to them in lecture 3) • XML • Sofix XML example

  3. Literature • Castro, Elizabeth (2001) “XML for the World Wide Web” Peachpit Press • RFC 2396 • http://openlib.org/home/krichel/lis900gp02i

  4. Uniform Resource Identifiers URI • A Uniform Resource Identifier (URI) is a compact string of characters for identifying an abstract or physical resource. • They provide a simple and extensible means for identifying a resource.

  5. Universal concept of “resource” • A resource can be anything that has identity. Not all resources are network “retrievable”. • The resource identifier identifies a resource, not necessarily the state the resource is in at a particular point in time.

  6. Benefits of uniformity • it allows different types of resource identifiers to be used in the same context, even when the mechanisms used to access those resources may differ • it allows uniform semantic interpretation of common syntactic conventions across different types of resource identifiers

  7. Benefits of extensibility • allows introduction of new types of resource identifiers without interfering with the way that existing identifiers are used • it allows the identifiers to be reused in many different contexts, thus permitting new applications or protocols to leverage a pre-existing, large, and widely-used set of resource identifiers.

  8. Transcribability • The URI syntax was designed with global transcribability as one of its main concerns. • A URI is a sequence of characters, not a sequence of bytes • A URI may be transcribed from a non-network source, and thus should consist of characters that are most likely to be able to be typed into a computer • A URI often needs to be remembered by people, and it is easier for people to remember a URI when it consists of meaningful components. Therefore it has a restricted set of characters, only US ASCII.
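To make the points about "meaningful components" and the US-ASCII restriction concrete, here is a minimal Python sketch. It uses the standard library's `urllib.parse.urlsplit` to break the URI from the literature slide into its parts. The helper name `is_ascii_uri` is invented for this illustration and is not part of any standard.

```python
from urllib.parse import urlsplit


def is_ascii_uri(uri: str) -> bool:
    """Return True if every character of the URI is in US-ASCII (hypothetical helper)."""
    return all(ord(ch) < 128 for ch in uri)


uri = "http://openlib.org/home/krichel/lis900gp02i"
parts = urlsplit(uri)

print(parts.scheme)       # the scheme component
print(parts.netloc)       # the authority (host) component
print(parts.path)         # the path component
print(is_ascii_uri(uri))  # whether the URI stays within US-ASCII
```

A string containing a non-ASCII character, such as an accented letter, would fail the `is_ascii_uri` check and would have to be escaped before it could be used as a URI.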

  9. XML • Stands for eXtensible Markup Language • It is a recommendation by the World Wide Web Consortium (W3C). It is a new (1998) markup language that will transport a lot of content over the Internet in the future. • In terms of complexity, it sits between HTML and SGML.

  10. Importance of XML • XML will be, for the information industry, what the container is for international shipping. • A uniform syntactic convention for the encoding of any piece of information expressed as textual data (i.e. as characters) • Default character set is the UTF-8 encoding of Unicode.

  11. HTML and XML • HTML comes with predefined tags such as HTML, HEAD, TITLE, BODY, H1, H2, P, UL, LI, IMG, A, EM, B etc. • XML allows you to use any tags. • XML has not yet replaced HTML. It lacks native support for images and links.

  12. XML and SGML • SGML is the Standard Generalized Markup Language, an ISO standard (ISO 8879) • Very complicated, to the extent that no full software implementation has ever been written • The XML specs were written by SGML aficionados who were aware of its problems

  13. Original design goals • XML shall be straightforwardly usable over the Internet. • XML shall support a wide variety of applications. • XML shall be compatible with SGML. • It shall be easy to write programs which process XML documents. • The number of optional features in XML is to be kept to the absolute minimum, ideally zero. • XML documents should be human-legible and reasonably clear. • The XML design should be prepared quickly. • The design of XML shall be formal and concise. • XML documents shall be easy to create. • Terseness in XML markup is of minimal importance

  14. Well-formed & valid XML • Every piece of data that wants to be XML has to obey a set of rules. Otherwise it is just not XML. • These rules ensure that the document is “well-formed”. • In addition, the XML document may obey further rules. In that case it is called “valid”.
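A quick way to see the well-formedness rules in action is to hand a string to an XML parser and check whether it is accepted. The sketch below uses Python's standard `xml.etree.ElementTree` module. The function name `is_well_formed` is made up for this example. Note that this parser checks well-formedness only. It does not check validity against any further rules.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<sex>F</sex>"))     # opening and closing tags match
print(is_well_formed("<sex>F</gender>"))  # closing tag does not match the opening tag
print(is_well_formed("just some text"))   # no element at all
```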

  15. XML element • Syntax <name>contents</name> • Where name is the name of the element and contents is the contents of the element. • <name> is called the opening tag • </name> is called the closing tag • Examples • <sex>F</sex> • <story>Once upon a time there was…. </story> • Element names are case-sensitive. They must start with a letter or “_”. • Element names must not start with “xml” in any capitalization.

  16. Attributes of XML elements • Are name/value pairs that further qualify element contents • Syntax <name attribute_name="attribute_value">contents</name> • Example • <temperature unit="F">64</temperature> • <swearword language="fr">con</swearword> • Attribute names have to obey the same rules as element names. • Attribute values must be surrounded by single or double quotes.
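The element and attribute syntax can be explored with a parser. The following sketch, again assuming Python's standard `xml.etree.ElementTree`, reads the temperature example and shows where the element name, the attribute and the contents end up. It also shows that a tag pair differing only in case is rejected. The variable names are chosen for illustration only.

```python
import xml.etree.ElementTree as ET

element = ET.fromstring('<temperature unit="F">64</temperature>')

print(element.tag)     # the element name
print(element.attrib)  # the attributes as a dictionary of name/value pairs
print(element.text)    # the contents, always delivered as a string

# Element names are case-sensitive, so <Sex> is not closed by </sex>.
try:
    ET.fromstring("<Sex>F</sex>")
    mixed_case_accepted = True
except ET.ParseError:
    mixed_case_accepted = False

print(mixed_case_accepted)
```

Note that the contents arrive as the string "64". Converting it to a number, and interpreting the unit, is left to the application.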

  17. Empty elements • Elements that are empty may be written as <name/>. This is a shorthand for <name></name>. • Empty elements may have attributes. • Example: • <grade value='A'/>
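That the short and the long form of an empty element mean the same thing can be checked with a parser. This is a small sketch using Python's standard `xml.etree.ElementTree`. The variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<grade value='A'/>")
long_form = ET.fromstring("<grade value='A'></grade>")

for element in (short_form, long_form):
    # Same name, same attribute, no text and no children in both cases.
    print(element.tag, element.attrib, element.text, len(element))
```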

  18. Processing instructions • They are instructions to the software reading the XML. • General syntax is <?name attribute_name1="attribute_value1" attribute_name2="attribute_value2" …?>

  19. Comments • Start with <!-- • End with --> • May not contain a double hyphen • Comments may not be nested, i.e. no comments inside other comments.

  20. Nesting elements • Elements are allowed to contain other elements. • Elements that contain other elements are called parent elements. • Elements that are contained in another element are children of that element. • Elements must be properly nested, i.e. child element closing tag must appear before parent element closing tag.
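The parent/child relationship and the proper-nesting rule can be illustrated as follows. This is a sketch with Python's standard `xml.etree.ElementTree`. The element names `parent` and `child` are placeholders, not part of any format.

```python
import xml.etree.ElementTree as ET

# Properly nested: each child closes before its parent does.
parent = ET.fromstring("<parent><child>one</child><child>two</child></parent>")
children = [child.text for child in parent]  # iterating an element yields its children
print(children)

# Improperly nested: the parent closes while the child is still open.
try:
    ET.fromstring("<parent><child>one</parent></child>")
    crossed_accepted = True
except ET.ParseError as error:
    crossed_accepted = False
    print("rejected:", error)
```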

  21. Root and prolog • There must be one root element that contains all other elements in the document. • The prolog is what appears before the root element. • The prolog may contain the XML declaration.

  22. XML declaration • The XML declaration is a special case of a processing instruction. It is written as <?xml version="1.0"?> • If the XML declaration is present, it must come first, at the very start of the document. • You can declare your character encoding in the XML declaration, like <?xml version="1.0" encoding="ucs-2"?>
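The rules about the root element and the XML declaration can be tried out with a parser. The sketch below uses Python's standard `xml.etree.ElementTree`. It assumes ISO-8859-1 as the declared encoding, rather than the ucs-2 of the slide, because that encoding is easy to produce in a short example. The sample name and the helper `parses` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A document stored as ISO-8859-1 bytes, with a declaration that says so.
document = '<?xml version="1.0" encoding="iso-8859-1"?><name>Zoë</name>'.encode("iso-8859-1")
root = ET.fromstring(document)
print(root.text)  # the parser decodes the bytes using the declared encoding


def parses(data: bytes) -> bool:
    """Return True if the parser accepts the data (hypothetical helper)."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


print(parses(b'<?xml version="1.0"?><a/>'))   # declaration at the very start
print(parses(b' <?xml version="1.0"?><a/>'))  # a space before the declaration
print(parses(b"<a/><b/>"))                    # two root elements
```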

  23. Quote special symbols • & is written as &amp; • < is written as &lt; • > is written as &gt; • " is written as &quot; • ' is written as &apos; • Example <story content="she pronounced the &quot;l-word&quot;"/>
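In practice the quoting is usually done by a library function rather than by hand. As a sketch, Python's standard `xml.sax.saxutils.escape` replaces &, < and > by default, and accepts a dictionary of extra replacements for the two quote characters. The round trip through the parser shows that the original text comes back unchanged. The variable names are chosen for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

text = 'she pronounced the "l-word"'

# Escape &, < and > (the defaults) plus both quote characters.
escaped = escape(text, {'"': "&quot;", "'": "&apos;"})
print(escaped)

# Put the escaped text into an attribute and read it back.
document = f'<story content="{escaped}"/>'
story = ET.fromstring(document)
print(story.get("content"))

print(escape("a < b & c > d"))
```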

  24. Document Type Definition DTD • DTDs are a legacy SGML tool to further define and refine the contents of an XML document. XML can be defined by an SGML • Still in use by the technologically retarded. • Not covered here, because there are more powerful replacements.

  25. Example application: sofix • Sofix is an XML based cataloging format for classical music CDs. • It is named after Sophie C. Rigny. • It is a creation of Thomas Krichel. • Used for teaching purposes only.

  26. Key concepts in Sofix • Item: an individual CD or a collection of CDs kept physically together (i.e. sold together) • Work: a piece of music as recorded on a CD. For simplicity, we do not distinguish between composition and recording of that composition. • Track: semantics associated with physical separation of tracks on the disk

  27. Sofix in XML <item> <work> <track> </track> </work> </item>

  28. Sofix general rules • Record all titles in English. If no English title provided, use a translation if it is obvious. If the translation is not obvious, use original language. • All personal names as Lastname, Firstname • Translatable names in English.

  29. Contents of <item> <labelname>name of label</labelname> <number>number of the CD</number> (followed by the works on the CD)

  30. Contents of <work> • <title>title of the work</title> • <compositionyear>year when work was composed</compositionyear> • <recordingyear>year when the recording was made</recordingyear> • <contributor role="contributor role">name of contributor</contributor> • Possibly many contributors, followed by a series of tracks

  31. Contributor roles alto, alto_sax, bariton, bass, bassoon, chamber orchestra, cello, choir, choir_master, clarinett, composer, conductor, flute, french_horn, horn, oboe, orchestra, organ, piano, piano_trio, prepared_piano, recorder, soprano, speaker, string_orchestra, string_quartett, viola, violin, xylophone

  32. Contents of <track> • <title>full title as given on CD</title> • <time>minutes:seconds</time> where minutes and seconds are numbers.
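Putting the Sofix pieces together, a complete record might look like the one embedded in the sketch below. The element names follow the slides. The label, catalogue number, recording year, performer and track times are invented sample values, and the helper `to_seconds` is a hypothetical name. The code uses Python's standard `xml.etree.ElementTree` to read the record and add up the track times of each work.

```python
import xml.etree.ElementTree as ET

# Invented sample data. Only the element names come from the Sofix slides.
SAMPLE = """\
<item>
  <labelname>Example Records</labelname>
  <number>EX 0001</number>
  <work>
    <title>Piano Sonata No. 14</title>
    <compositionyear>1801</compositionyear>
    <recordingyear>1995</recordingyear>
    <contributor role="composer">Beethoven, Ludwig van</contributor>
    <contributor role="piano">Doe, Jane</contributor>
    <track>
      <title>Adagio sostenuto</title>
      <time>6:02</time>
    </track>
    <track>
      <title>Allegretto</title>
      <time>2:15</time>
    </track>
  </work>
</item>
"""


def to_seconds(time_text: str) -> int:
    """Convert a minutes:seconds string into a number of seconds."""
    minutes, seconds = time_text.strip().split(":")
    return int(minutes) * 60 + int(seconds)


item = ET.fromstring(SAMPLE)
for work in item.findall("work"):
    # Map each role to a name. Two contributors with the same role would collide here.
    roles = {c.get("role"): c.text for c in work.findall("contributor")}
    total = sum(to_seconds(t.findtext("time")) for t in work.findall("track"))
    print(work.findtext("title"), roles, total)
```

Because the personal names are stored as Lastname, Firstname, they can be sorted without further processing.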

  33. http://openlib.org/home/krichel Thank you for your attention!
