XML Lecture 1
XML Motivation & Syntax
Monica Farrow
email: M.Farrow@hw.ac.uk

XML Topics
• This lecture
  • Motivation
  • Storing XML
  • Programming and XML
  • Syntax
• Describing the document
  • DTD, XML Schema
• Accessing the elements using XPath
• Transforming XML using XSLT
XML - Motivation & Syntax

XML in One Slide
• Basically, XML is an annotated text file. Each piece of data is surrounded by a descriptive start tag and end tag; together they form an element. Elements can have attributes listed in the start tag.
• Example:
    <person>
      <name id="42">Lisa Simpson</name>
      <tel>0131-828-1234</tel>
      <tel>078-4701-7775</tel>
      <email>lisa@macs.hw.ac.uk</email>
    </person>

Motivation
• XML allows us to create machine-readable text files.
• In the file with Lisa’s data, without XML tags, how can we easily specify a semi-structured format? E.g.
  • Compulsory name
  • Between 0 and 4 telephone numbers
  • Optional email
• Using XML, the data is labelled with tags, so it can be easily identified.
• The next few slides show some uses of XML.

Application data
• Applications can use XML to store, transmit, and display data.
• E.g. to keep track of the updates which have been downloaded
  • Version number, file names, installation time etc.
• E.g. to specify start-up settings or parameters
  • These can be very extensive, can be generated by ‘wizards’ and modified by humans
• E.g. to send data between the server and the client in web applications (jQuery and JavaScript)
  • More about this later

Web services
• "A Web service is a software function provided at a network address over the web or the cloud, it is a service that is 'always on'" (Wikipedia)
• It is not used by a person through a GUI
• A software developer could use a web service within an application.
• Web services use XML to tag the data.
• Protocols based on XML are used to:
  • Transfer the data (SOAP)
  • Describe the service (WSDL)
  • List available services (UDDI)

Web services - SOAP
• SOAP: Simple Object Access Protocol
• For exchanging data between any web applications
    <?xml version="1.0" encoding="UTF-8"?>
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Header> SOAP Example </soap:Header>
      <soap:Body>
        <desks:NumberInStock> 200 </desks:NumberInStock>
      </soap:Body>
    </soap:Envelope>

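A client application could pull the stock figure out of such a message roughly as follows. This is only a sketch using Python's standard xml.etree.ElementTree module. The slide does not declare the desks prefix, so the namespace URI http://example.com/desks is a made-up placeholder, and the function name number_in_stock is likewise invented for the illustration.

```python
import xml.etree.ElementTree as ET

# The SOAP message from the slide. The xmlns:desks declaration is an
# assumption added here: a namespace-aware parser rejects an undeclared prefix.
SOAP_MESSAGE = """<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
               xmlns:desks="http://example.com/desks">
  <soap:Header> SOAP Example </soap:Header>
  <soap:Body>
    <desks:NumberInStock> 200 </desks:NumberInStock>
  </soap:Body>
</soap:Envelope>"""

# Prefix-to-URI map used when searching; the prefixes here are local to
# this script and only the URIs have to match the document.
NAMESPACES = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "desks": "http://example.com/desks",  # hypothetical URI
}


def number_in_stock(message: str) -> int:
    """Return the integer held in soap:Body/desks:NumberInStock."""
    envelope = ET.fromstring(message.encode("utf-8"))
    element = envelope.find("soap:Body/desks:NumberInStock", NAMESPACES)
    if element is None or element.text is None:
        raise ValueError("message has no NumberInStock element")
    return int(element.text.strip())


stock = number_in_stock(SOAP_MESSAGE)
```

The point of the sketch is that the receiving program finds the value by its tag name, not by its position in the text.
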
Write Once Use Everywhere
• Separation of content from presentation
• “Write once read anywhere”
• The same document can be transformed using XSL (eXtensible Stylesheet Language) into different formats
[Figure: one XML document is transformed by three XSL stylesheets into XHTML (browser for mobile), text (Excel) and XHTML (web browser on PC)]

Some existing XML-based languages
• XHTML
  • XML-compatible version of HTML
• DocBook
  • For any documentation. Tags such as title, chapter, para etc.
• ODF (OpenDocument Format)
  • For office documents such as word processing or spreadsheets. Used by OpenOffice.
• MathML
  • To describe mathematical formulae

XML data file storage: 3 options
• As a text file: simple, used in this course
• In a ‘native’ XML database (NXD)
  • Designed especially for XML, holds a collection of XML documents
  • Many different ones on the market, non-standard
  • Extract data with XPath, XSLT (introduced in 3rd XML lecture) or the XML query language XQuery with its FLWOR expressions (not covered in course)
• Using a relational DBMS (now SQL has XML functions too)
  • EITHER store the XML document as the value of some field within a row
  • OR store the XML in a shredded form across a number of fields and tables
XML In and Out

XML and Programming
• To read an XML document in a programming language, the processing steps are:
  • Reading the raw data as a stream of characters
  • Parsing the raw data
    • Recognising tags, content, attribute pairs
  • Passing the result to a client class or function for application-specific processing
• Many programming languages have a library of functions using the Document Object Model [DOM], a tree-based interface
  • The programmer can navigate up and down the tree.
  • Details not covered in the course

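As a rough illustration of the DOM approach, the sketch below uses Python's standard xml.dom.minidom module on the Lisa Simpson example from the "XML in One Slide" slide. Other languages offer similar DOM libraries; the variable names are chosen only for this example.

```python
from xml.dom import minidom

PERSON = """<person>
  <name id="42">Lisa Simpson</name>
  <tel>0131-828-1234</tel>
  <tel>078-4701-7775</tel>
  <email>lisa@macs.hw.ac.uk</email>
</person>"""

# Reading and parsing: the library recognises tags, content and attributes
# and builds a tree of nodes in memory.
document = minidom.parseString(PERSON)
root = document.documentElement            # the <person> element

# Navigating down the tree: elements, their text nodes and attributes.
name = root.getElementsByTagName("name")[0]
name_text = name.firstChild.data           # text node inside <name>
name_id = name.getAttribute("id")
tels = [tel.firstChild.data for tel in root.getElementsByTagName("tel")]

# Navigating up the tree: every node knows its parent.
parent_tag = name.parentNode.tagName
```

After parsing, the application-specific code works only with the tree and never with the raw characters.
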
XML Syntax
XML Overview
• XML is a ‘human-legible’ simplified subset of the Standard Generalized Markup Language (SGML), on which HTML is also based
• Data is divided into elements and attributes. Each element is surrounded by a start tag and an end tag. The end tag resembles the start tag but includes a forward slash before the tagname.
    <tel>0131-444 7777</tel>
• Tagnames are chosen to reflect the meaning of the element content
  • (In HTML, tagnames are chosen to indicate page structure)
[Figure: relationship between SGML, XML and HTML]

Elements
• The segment of an XML document between an opening and a corresponding closing tag is called an element
• Elements may contain text or other elements
    <person>
      <name>Bart Simpson</name>
      <tel>0131-444 7777</tel>
      <tel>078-4011 6022</tel>
      <email>bart@ed.ac.uk</email>
    </person>
• Slide annotations: <person> is an element that contains other elements; an element such as <name> contains text; there can be more than one element with the same tagname (here <tel>)

XML Document is a Tree
    person
      name: Bart Simpson
      tel: 0131-444 7777
      tel: 078-4011 6022
      email: bart@ed.ac.uk
• XML documents are abstractly modelled as trees, as reflected by their nesting
• Sometimes, XML documents are graphs (by using IDs and IDREFs to link elements)

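The tree view can be made concrete with a short recursive walk. The sketch below assumes Python's standard xml.etree.ElementTree module; the function name outline is invented for the example, and it indents each element by its depth in the tree.

```python
import xml.etree.ElementTree as ET

PERSON = """<person>
  <name>Bart Simpson</name>
  <tel>0131-444 7777</tel>
  <tel>078-4011 6022</tel>
  <email>bart@ed.ac.uk</email>
</person>"""


def outline(element, depth=0):
    """Return one line per element, indented by its depth in the tree."""
    text = (element.text or "").strip()
    line = "  " * depth + element.tag
    if text:
        line += ": " + text
    lines = [line]
    for child in element:                  # children, in document order
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(PERSON))
print("\n".join(tree_lines))
```

The recursion follows the nesting of the tags, so a document nested more deeply simply gives more levels of indentation.
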
Elements Can Be Nested
    <addresses>
      <person>
        <name>Donald Duck</name>
        <tel>0131-8281345</tel>
        <tel>0131-8281374</tel>
        <email>donald@macs.hw.ac.uk</email>
      </person>
      <person>
        <name>Mickey Mouse</name>
        <tel>0141-4261142</tel>
      </person>
    </addresses>

Semi-structured data
• XML is ideal for semi-structured data
  • If there is an extra telephone number, add it in
  • If there is no email at all, leave it out
• No need for empty fields or multiple tables.
• In a corresponding database for up to 4 telephone numbers, the database design would include spaces for 4 numbers, or a separate phone number table.

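For comparison, the "separate phone number table" design might look like the sketch below. It assumes Python's standard sqlite3 and xml.etree.ElementTree modules and reuses the Donald Duck and Mickey Mouse address list; the table and column names (person, tel, person_id, number) and the function name shred are invented for the illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

ADDRESSES = """<addresses>
  <person>
    <name>Donald Duck</name>
    <tel>0131-8281345</tel>
    <tel>0131-8281374</tel>
    <email>donald@macs.hw.ac.uk</email>
  </person>
  <person>
    <name>Mickey Mouse</name>
    <tel>0141-4261142</tel>
  </person>
</addresses>"""


def shred(xml_text, conn):
    """Spread each <person> over a person row and zero or more tel rows."""
    conn.execute(
        "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT)"
    )
    conn.execute(
        "CREATE TABLE tel (person_id INTEGER REFERENCES person(id), number TEXT NOT NULL)"
    )
    for person in ET.fromstring(xml_text).findall("person"):
        name = person.findtext("name").strip()
        email = person.findtext("email")   # None when the element is absent
        cursor = conn.execute(
            "INSERT INTO person (name, email) VALUES (?, ?)",
            (name, email.strip() if email else None),
        )
        for tel in person.findall("tel"):  # repeated element -> one row each
            conn.execute(
                "INSERT INTO tel (person_id, number) VALUES (?, ?)",
                (cursor.lastrowid, tel.text.strip()),
            )


conn = sqlite3.connect(":memory:")
shred(ADDRESSES, conn)
rows = conn.execute(
    "SELECT p.name, COUNT(t.number) FROM person p "
    "LEFT JOIN tel t ON t.person_id = p.id GROUP BY p.id ORDER BY p.id"
).fetchall()
```

The optional email becomes a NULL column and the repeated tel element needs a second table and a join, which is the extra design work the XML version avoids.
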
Attributes
• An opening tag may contain attributes
• These are typically used to describe the contents of an element
    <entry>
      <word language="en">cheese</word>
      <word language="fr">fromage</word>
      <word language="ro">branza</word>
      <meaning>A food made …</meaning>
    </entry>

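A program reads an attribute separately from the element's text. The sketch below assumes Python's standard xml.etree.ElementTree module and uses the dictionary entry above; the variable names are invented for the example.

```python
import xml.etree.ElementTree as ET

ENTRY = """<entry>
  <word language="en">cheese</word>
  <word language="fr">fromage</word>
  <word language="ro">branza</word>
  <meaning>A food made ...</meaning>
</entry>"""

entry = ET.fromstring(ENTRY)

# element.get(name) returns the attribute value; element.text is the content.
words_by_language = {
    word.get("language"): word.text for word in entry.findall("word")
}

french_word = words_by_language["fr"]
```

Here the attribute describes how to interpret the content (which language the word is in), while the content itself stays in the element.
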
When to Use Attributes
• It’s not always clear when to use attributes.
• How should ssno (social security number, American) be stored?
  As an attribute:
    <person ssno="123 4589">
      <name>L. Simpson</name>
      <email>lisa@macs.hw.ac.uk</email>
      ...
    </person>
  As an element:
    <person>
      <name>L. Simpson</name>
      <ssno>123 4567</ssno>
      <email>lisa@macs.hw.ac.uk</email>
      ...
    </person>

When to Use Attributes
• Using an attribute rather than elements might make the structure more difficult to alter in the future. In attributes:
  • Multiple values are not permitted
  • Tree structures are not permitted
• General rule: avoid using attributes unless there is a good reason for using them
  • Use an attribute to describe how the data should be interpreted (e.g. language, currency)
  • Use an attribute for “IDs”, i.e., identifying data (covered later)

A Complete XML Document
    <?xml version="1.0" encoding="UTF-8" ?>
    <addresses>
      <person ssno="113">
        <name>Lisa Simpson</name>
        <tel>0131-828 1234</tel>
        <tel>078-4701 7775</tel>
        <email>lisa@macs.hw.ac.uk</email>
      </person>
    </addresses>
[Slide annotation: "Required"]

Empty element, and case
• There is a special shortcut for tags that have only attributes, with no text or sub-elements in between them (empty element, bachelor tag)
    <img src="myPic.jpg" />
  instead of
    <img src="myPic.jpg"></img>
• XML is case-sensitive, i.e., the following are different: <person>, <Person>, <PERSON>

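Both points can be checked with a parser. The sketch below assumes Python's standard xml.etree.ElementTree module; it compares the two forms of the empty element and then tries a start tag and end tag that differ only in case.

```python
import xml.etree.ElementTree as ET

# The shortcut and the long form describe the same empty element.
short_form = ET.fromstring('<img src="myPic.jpg" />')
long_form = ET.fromstring('<img src="myPic.jpg"></img>')
same_element = (
    short_form.tag == long_form.tag
    and short_form.attrib == long_form.attrib
    and len(short_form) == 0
    and len(long_form) == 0
)

# Tag names are case-sensitive, so <person> is not closed by </Person>.
try:
    ET.fromstring("<person></Person>")
    case_mismatch_accepted = True
except ET.ParseError:
    case_mismatch_accepted = False
```

Under these assumptions same_element is True and case_mismatch_accepted is False.
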
Well Formed Documents
• A document is well-formed if:
  • It has one top-level element (root element)
  • Tags come in properly nested, case-sensitive pairs
  • Empty elements may use the accepted shortcut />
  • Attribute values are enclosed in quotes
  • Attribute names are not repeated within a tag

Are these well-formed XML files?
• File 1:
    <?xml version="1.0"?>
    <Question> Here is a question</Question>
• File 2:
    <?xml version="1.0"?>
    <Question> Here is a question</Question>
    <Answer> Here is an answer</Answer>

Why is this not well-formed?
    <?xml version ="1.0" encoding="UTF-8" ?>
    <person phone= 0131-828 1234 phone=078-4701 7775 >
      <Name>
        <first>Homer <second>Simpson </first></second>
      </name>
    <person phone= 0131-828 1235 >
      <Name>
        <first>Lisa <second>Simpson </first></second>
      </name>

XML Authoring
• There are many authoring tools available to facilitate the creation of XML documents.
  • Visual Studio for Windows is in the lab
• However, you may as well start off using a simple text editor (not Word) which allows access to line numbers, ideally XML-aware
  • XML is after all just a text file.
  • E.g. Notepad++ for Windows
  • Most Linux text editors are OK
• You are then responsible for checking that the XML is correct!

Viewing and checking XML
• If well-formed XML is loaded into your browser, it will be displayed as a tree structure
  • This is perhaps the simplest way to check that XML is well-formed
• If incorrect XML is loaded into your browser, error messages will be displayed

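The same check can be scripted instead of using a browser. The sketch below assumes Python's standard xml.etree.ElementTree module; the function name check_well_formed and the two small test documents are invented for the example. Like a browser, it reports where the parser stopped.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return 'well-formed' or a message giving the parser's error position."""
    try:
        ET.fromstring(text)
    except ET.ParseError as error:
        line, column = error.position      # where the parser gave up
        return f"not well-formed (line {line}, column {column}): {error}"
    return "well-formed"


good = check_well_formed("<note><to>Lisa</to></note>")
bad = check_well_formed("<note><to>Lisa</note>")   # <to> is never closed
print(good)
print(bad)
```

This only checks well-formedness. It says nothing about whether the document contains the elements an application expects.
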
Exercise 1
• An XML file holds information about holiday homes for rent. Write an example of such an XML file containing 2 or 3 records. Invent appropriate element and attribute names.
  • Each home has an id, a name, a location and an optional url
  • Additionally, each home has one or more sets of contact details. Contact details consist of a name and a phone number, and optionally an email address.
  • People do not own more than one holiday home.
  • In your example, demonstrate optional or repeated elements.
• How would you hold this information in a relational database?

Referencing other elements
• Unique elements (identified here by an attribute) can be referred to from other elements
• In this way, relationships between elements can be shown without repetition
• E.g. books and authors can be listed. But each book may have >1 author, and each author might write >1 book. So the book can contain a reference to the author. See books.xml

Extract from books.xml
    <book bookID="222KK" year="2000">      ** an id
      <title>Data on the Web</title>
      <Author>4</Author>                    **** element references an id
      <Author>2</Author>
      <publisher>Morgan Kaufmann Publishers</publisher>
      <price>39.95</price>
    </book>
    .....
    <author authID="4">                    **** an id
      <firstName>Mary</firstName>
      <lastName>Thomson</lastName>
      <Book>222KK</Book>                    ** element references an id
    </author>
Asterisks show links between the data (in the same file)

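Following such a reference in a program amounts to a lookup by id. The sketch below assumes Python's standard xml.etree.ElementTree module. The full books.xml is not shown on the slide, so the root element name library is a guess and the function name authors_of is invented. Only author 4 appears in the extract, so the reference to author 2 is reported as unresolved instead of being filled in.

```python
import xml.etree.ElementTree as ET

# Built from the extract above; <library> is a hypothetical root element.
BOOKS = """<library>
  <book bookID="222KK" year="2000">
    <title>Data on the Web</title>
    <Author>4</Author>
    <Author>2</Author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>
  <author authID="4">
    <firstName>Mary</firstName>
    <lastName>Thomson</lastName>
    <Book>222KK</Book>
  </author>
</library>"""


def authors_of(root, book_id):
    """Return (resolved author names, unresolved ids) for one book."""
    # Index the <author> elements by their id attribute.
    authors_by_id = {a.get("authID"): a for a in root.findall("author")}
    book = next(b for b in root.findall("book") if b.get("bookID") == book_id)
    resolved, unresolved = [], []
    for reference in book.findall("Author"):
        author_id = reference.text.strip()
        author = authors_by_id.get(author_id)
        if author is None:
            unresolved.append(author_id)
        else:
            resolved.append(
                f"{author.findtext('firstName')} {author.findtext('lastName')}"
            )
    return resolved, unresolved


names, missing = authors_of(ET.fromstring(BOOKS), "222KK")
```

Nothing in a plain well-formed file guarantees that a referenced id exists, which is why the sketch keeps track of references it cannot resolve.
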
Exercise 2 (using ids)
• An XML file holds information about holiday homes for rent. Write an example of such an XML file containing 2 or 3 records. Invent appropriate element and attribute names. Use books.xml as an example.
  • Each home has an id, a name, a location and an optional url
  • Each contact has a name, a phone number and an optional email address
  • Each person can own many homes
  • Each home can be owned by more than one person
• How would you hold this information in a relational database?

Defining the structure of an XML file
• We can check if an XML file is well-formed
  • By looking at it, maybe
  • By loading it into a browser
    • If well-formed, it will be displayed
• However, how can we check that the well-formed file contains the correct elements in the correct quantities? E.g.
  • Mustn’t contain tagnames that aren’t expected
  • Must contain tagnames that are expected
  • Must contain the correct number of tags with the same tagname
• We need to write a specification for the XML file
  • See the next lecture

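To show what such a specification has to express, here is a hand-written check for the person format from the Motivation slide (compulsory name, between 0 and 4 telephone numbers, optional email). It is only a sketch assuming Python's standard xml.etree.ElementTree module; the names RULES and structure_problems are invented, and it is a stand-in for, not an example of, the DTD and XML Schema approach named in the topics list.

```python
import xml.etree.ElementTree as ET

# Allowed child elements of <person> with (minimum, maximum) occurrences.
RULES = {"name": (1, 1), "tel": (0, 4), "email": (0, 1)}


def structure_problems(person):
    """Return a list of problems; an empty list means the structure is OK."""
    counts = {}
    for child in person:
        counts[child.tag] = counts.get(child.tag, 0) + 1

    problems = []
    for tag in counts:                      # tagnames that aren't expected
        if tag not in RULES:
            problems.append(f"unexpected <{tag}>")
    for tag, (low, high) in RULES.items():  # missing or too many
        found = counts.get(tag, 0)
        if not low <= found <= high:
            problems.append(f"<{tag}> occurs {found} times, expected {low} to {high}")
    return problems


ok = structure_problems(ET.fromstring(
    "<person><name>Lisa Simpson</name><tel>0131-828-1234</tel></person>"
))
faulty = structure_problems(ET.fromstring(
    "<person><tel>0131-828-1234</tel><fax>0131-828-0000</fax></person>"
))
```

Both test documents are well-formed; only the second breaks the rules. Writing such checks by hand for every document type quickly becomes tedious, which motivates a declarative specification.
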