
XML: It’s a Good Thing

Richard N. Taylor & Eric M. Dashofy, ICS 123, S2002


  1. XML: It’s a Good Thing
     Richard N. Taylor & Eric M. Dashofy
     ICS 123 S2002

  2. Motivation
     • “I’ll never go hungry again!” – Scarlett O’Hara
     • “I’ll never write a parser again!” – Anonymous XML User
     • Data encoding is a perpetual problem in computer applications
     • Lots of time is wasted writing parsers, lexers, marshalers, unmarshalers, data bindings, even meta-languages!

  3. Existing Problems: File Exchange
     [Diagram: App1, App2, and App3 with File Format 1, File Format 2, and File Format 3, connected by an import converter, an export converter, and a 3rd-party converter]

  4. Why is this a problem?
     • Everybody has a proprietary format
     • Converters must be maintained by various parties
     • This is an n² problem!
     • Something is usually lost in the translation
     • Note: Same problems with data exchange across networked apps
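
The n² claim can be checked with a little arithmetic. The sketch below assumes one converter per ordered pair of formats for direct exchange, and one importer plus one exporter per format when everything goes through a single common format. The function names are made up for illustration.

```python
def direct_converters(n: int) -> int:
    """Converters needed when every format converts directly to every other one."""
    return n * (n - 1)


def hub_converters(n: int) -> int:
    """Converters needed when each format only imports from and exports to one common format."""
    return 2 * n


if __name__ == "__main__":
    # Print the two counts side by side for a few values of n.
    for n in (3, 10, 50):
        print(n, direct_converters(n), hub_converters(n))
```

With three formats the two approaches cost the same; the gap opens quickly as more formats are added.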

  5. Another Problem: Defining a File or Data Format
     [Diagram labels: Meta-Language (helps to generate), Parser, Serializer, Data Bindings (edits), In-memory Representation, Disk, Net]

  6. Why is this a problem?
     • Parsers, serializers, data bindings all have to be developed
     • This development takes time
     • Conflicting tools for assistance
     • How do you evolve the file format?

  7. Potential Solution
     • For too many file formats:
       • Intermediate format
       • Even better: a common format
         • An agreed-upon meta-language
         • Ability to extend the language and ignore unknown constructs
     • For tool building:
       • Choose a suitable meta-language
       • Build tools surrounding that meta-language
       • Port those tools to different environments, but keep the APIs semi-standard

  8. What is XML?
     • Stolen from xml-computing.com:
       • eXtensible Markup Language
       • A way to represent structured data
       • A World Wide Web Consortium (W3C) standard
       • Platform-independent
       • A way to create your own custom languages
       • License-free and well-supported
       • The future of computing?
       • Buzzword-compliant!

  9. Origins of XML
     • From SGML
       • Standard Generalized Markup Language
       • cf. HTML
     • A document markup language
       • For annotating documents with metadata to make them easier to interpret

     Hi! My name is <NAME><FIRST>Eric</FIRST> <LAST>Dashofy</LAST></NAME>.
     You can email me at <EMAIL>edashofy@ics.uci.edu</EMAIL>.

  10. The Times, They are a Changin’
      • XML is arguably more useful to simply encode data, outside the strict context of a document

      <PERSON>
        <NAME>
          <FIRST>Eric</FIRST>
          <LAST>Dashofy</LAST>
          <DEPARTMENT>Information and Computer Science</DEPARTMENT>
          <EMAIL>edashofy@ics.uci.edu</EMAIL>
        </NAME>
      </PERSON>

  11. Terminology
      • Tag
        • The markup of the document, enclosed in angle brackets
        • <foo> is the start tag
        • </foo> is the end tag
      • Tags may be nested, but may not cross
        • <A>foo<B>bar</B>baz</A> --OK!
        • <A>foo<B>bar</A>baz</B> --NO!
      • Hierarchical data structure
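
The nesting rule is easy to see by handing both snippets from this slide to a parser. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` is an assumption for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


nested = "<A>foo<B>bar</B>baz</A>"   # tags nest properly
crossed = "<A>foo<B>bar</A>baz</B>"  # tags cross each other

print(is_well_formed(nested))
print(is_well_formed(crossed))
```

The parser rejects the crossed version outright, which is what makes the hierarchical (tree) view of a document possible.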

  12. Terminology
      • Element
        • Stuff in between a start and end tag
        • Includes the tags
        • May contain nested elements
      • Ex:
        • <a>foo</a>
        • <a>foo<b>bar</b></a> (nested)

  13. Terminology
      • Attribute
        • A way of annotating tags with additional info
        • Simple name-value pairs
      • Ex:
        • <name lang="English">Henry</name>
        • <name lang="Spanish">Enrique</name>
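
To make elements and attributes concrete, here is a small sketch that reads the two `<name>` examples with Python's standard library. The `<names>` wrapper element is an assumption, added only so that both examples fit in a single document.

```python
import xml.etree.ElementTree as ET

# Hypothetical <names> wrapper; the two <name> elements are from the slide.
doc = ET.fromstring(
    "<names>"
    '<name lang="English">Henry</name>'
    '<name lang="Spanish">Enrique</name>'
    "</names>"
)

# Map each element's "lang" attribute value to the element's text content.
names_by_lang = {el.get("lang"): el.text for el in doc.findall("name")}
print(names_by_lang)
```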

  14. Document
      • A collection of elements, usually in a file
      • One top-level element
        • Called the “root” element or “document” element
      • Some header stuff

      <?xml version="1.0"?>
      <person>
        <name>
          <first>Eric</first>
          <last>Dashofy</last>
        </name>
        <department>Information and Computer Science</department>
        <email>edashofy@ics.uci.edu</email>
      </person>
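
As a rough illustration of "one top-level element", the document above can be loaded with Python's standard `xml.etree.ElementTree` and inspected from the root down. This is a sketch; the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0"?>
<person>
  <name>
    <first>Eric</first>
    <last>Dashofy</last>
  </name>
  <department>Information and Computer Science</department>
  <email>edashofy@ics.uci.edu</email>
</person>"""

root = ET.fromstring(DOCUMENT)

print(root.tag)                       # the root (document) element
print([child.tag for child in root])  # its direct children, in document order
print(root.findtext("name/first"))    # text of a nested element
```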

  15. Side-note:
      • “If you don’t understand it, ignore it.”
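
One way to read this rule: a consumer pulls out the elements it knows and silently skips the rest, so a format can grow without breaking old readers. The sketch below assumes a hypothetical `<homepage>` extension element and a hypothetical `read_person` helper; neither comes from the slides.

```python
import xml.etree.ElementTree as ET

# The <homepage> element is a hypothetical extension this reader does not know about.
EXTENDED = """<person>
  <name><first>Eric</first><last>Dashofy</last></name>
  <homepage>http://example.org/</homepage>
  <email>edashofy@ics.uci.edu</email>
</person>"""

# Paths of the only fields this reader understands.
KNOWN_FIELDS = ("name/first", "name/last", "email")


def read_person(text: str) -> dict:
    """Extract only the fields this reader understands; everything else is skipped."""
    root = ET.fromstring(text)
    return {path: root.findtext(path) for path in KNOWN_FIELDS}


person = read_person(EXTENDED)
print(person)
```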

  16. Kinds of Documents
      • “Well Formed”
        • Syntactically correct
        • All the start tags have end tags
        • All the start-quotes have end-quotes
        • etc.
      • “Valid”
        • Well-formed, and conforms to some language specification

  17. Why a meta-language?
      • To define what elements, sub-elements, attributes are allowed
        • And in what order
      • So different organizations can agree on a real data format
      • Well-formedness alone doesn’t restrict how you encode the data, so on its own it’s not very valuable

  18. DTDs
      • Document Type Definition
      • Part of XML 1.0
      • The original XML meta-language
      • Doesn’t look like XML
      • Like production rules

      <!DOCTYPE Foo [
        <!ELEMENT Foo (Bar*,Baz?,Booyah+)>
        <!ELEMENT Bar (#PCDATA)>
        <!ELEMENT Baz (#PCDATA)>
        <!ELEMENT Booyah (#PCDATA)>
      ]>
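
The content model `(Bar*,Baz?,Booyah+)` reads much like a regular expression over child element names: any number of `Bar`, an optional `Baz`, then one or more `Booyah`. The sketch below mimics just that one rule in Python. It is not a DTD validator (the standard library does not include one), and the name `foo_children_ok` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# (Bar*,Baz?,Booyah+) rewritten as a regular expression over space-terminated tag names.
FOO_CONTENT_MODEL = re.compile(r"(Bar )*(Baz )?(Booyah )+")


def foo_children_ok(text: str) -> bool:
    """Check the order and count of <Foo>'s children against the content model."""
    root = ET.fromstring(text)
    if root.tag != "Foo":
        return False
    child_tags = "".join(child.tag + " " for child in root)
    return FOO_CONTENT_MODEL.fullmatch(child_tags) is not None


print(foo_children_ok("<Foo><Bar>a</Bar><Bar>b</Bar><Booyah>c</Booyah></Foo>"))
print(foo_children_ok("<Foo><Baz>x</Baz></Foo>"))  # no Booyah, so the model is not satisfied
```

A real validating parser also checks the `#PCDATA` declarations and much more; this only shows how the production-rule notation constrains child order.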

  19. Namespaces
      • “You keep on using that word, I do not think it means what you think it means.” – Inigo Montoya
      • How can you make a document that draws elements from multiple DTDs?

      <usa:address xmlns:usa="http://www.dtds.com/usaddress.dtd">
        <usa:street>1600 Pennsylvania Ave</usa:street>
        <usa:city>Washington</usa:city>
        <usa:state>DC</usa:state>
        <usa:zip>20509</usa:zip>
      </usa:address>
      <uk:address xmlns:uk="http://www.dtds.com/ukaddress.dtd">
        <uk:street>23B Baker Street</uk:street>
        <uk:city>London, England</uk:city>
        <uk:postcode>N22</uk:postcode>
      </uk:address>
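
A namespace-aware parser keeps the two `address` vocabularies apart by their URIs, not by their prefixes. The following sketch uses Python's standard `xml.etree.ElementTree`. The `<addresses>` wrapper element is an assumption, added so that both addresses sit under one root, and the examples are trimmed to two fields each.

```python
import xml.etree.ElementTree as ET

# Hypothetical <addresses> wrapper so both addresses live in one document.
DOC = """<addresses xmlns:usa="http://www.dtds.com/usaddress.dtd"
           xmlns:uk="http://www.dtds.com/ukaddress.dtd">
  <usa:address>
    <usa:street>1600 Pennsylvania Ave</usa:street>
    <usa:city>Washington</usa:city>
  </usa:address>
  <uk:address>
    <uk:street>23B Baker Street</uk:street>
    <uk:city>London, England</uk:city>
  </uk:address>
</addresses>"""

# Prefix-to-URI map used for lookups; these prefixes are local to this script.
NS = {
    "usa": "http://www.dtds.com/usaddress.dtd",
    "uk": "http://www.dtds.com/ukaddress.dtd",
}

root = ET.fromstring(DOC)
us_street = root.findtext("usa:address/usa:street", namespaces=NS)
uk_street = root.findtext("uk:address/uk:street", namespaces=NS)
print(us_street)
print(uk_street)

# Internally each tag is qualified by its namespace URI rather than its prefix.
print(root[0].tag)
```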

  20. Why not DTDs?
      • “Uhm, DTDs are bad, mmkay?” – Mr. Mackey
      • DTDs are lacking in some areas
        • Don’t look like XML
        • Can’t specify at a level below elements
          • i.e. can’t specify regular expressions on content
        • Difficult to extend/add things to existing element definitions
        • Difficult to implement modular languages

  21. XML Schemas
      • A DTD replacement from W3C
      • Look like XML / Easier to read
      • Contribute a type system to XML
        • Element, attribute definitions become types
        • Single-inheritance model in the type system
      • Better namespace management

  22. Example

      <complexType name="Address">
        <sequence>
          <element name="name" type="string"/>
          <element name="street" type="string"/>
          <element name="city" type="string"/>
        </sequence>
      </complexType>
      <complexType name="USAddress">
        <complexContent>
          <extension base="Address">
            <sequence>
              <element name="state" type="USState"/>
              <element name="zip" type="positiveInteger"/>
            </sequence>
          </extension>
        </complexContent>
      </complexType>

  23. Example, cont.

      <complexType name="UKAddress">
        <complexContent>
          <extension base="Address">
            <sequence>
              <element name="postcode" type="UKPostcode"/>
            </sequence>
            <attribute name="exportCode" type="positiveInteger" fixed="1"/>
          </extension>
        </complexContent>
      </complexType>
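
The single-inheritance type model in these schema fragments has a close analogue in ordinary class inheritance. The sketch below mirrors the three types as Python dataclasses. It is an analogy only, not something an XML Schema tool produces: the field names are adapted to Python style, the `USState` and `UKPostcode` types are simplified to plain strings, and the sample values (including the resident's name) are invented or borrowed from the namespace slide.

```python
from dataclasses import dataclass, fields


@dataclass
class Address:
    name: str
    street: str
    city: str


@dataclass
class USAddress(Address):
    # Extension: new elements are appended after the base type's sequence.
    state: str
    zip_code: int


@dataclass
class UKAddress(Address):
    postcode: str
    export_code: int = 1  # stands in for the attribute declared fixed="1"


home = USAddress(
    name="Example Resident",  # hypothetical value
    street="1600 Pennsylvania Ave",
    city="Washington",
    state="DC",
    zip_code=20509,
)

print(isinstance(home, Address))
print([f.name for f in fields(USAddress)])
```

As with `extension` in the schema, the derived type keeps the base fields first and adds its own after them.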

  24. What do you get?
      • Lots of tools for free
        • Parsers
          • DOM and SAX
        • Serializers
        • Transformation
          • XSL(T)
      • A meta-language (two, actually)
      • Data Bindings
      • Syntax-directed editors

  25. Spotlight: DOM & SAX
      • APIs for accessing XML documents
      • SAX: Lightweight, callback based
        • “I saw an element! Ooh, I saw an attribute!”
      • DOM: Parses entire document into an object tree in memory
      [Diagram: XML Document → DOM Parser → In-memory Representation]
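
The difference between the two APIs shows up clearly when the same small document goes through both. This is a minimal sketch using Python's standard `xml.sax` and `xml.dom.minidom`. The sample document, its `id` attribute, and the `TagLogger` class are assumptions for illustration.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document; the id attribute is invented for the demo.
XML = '<person id="42"><first>Eric</first><last>Dashofy</last></person>'


class TagLogger(xml.sax.ContentHandler):
    """SAX: the parser calls back as it encounters each piece of the document."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # "I saw an element! Ooh, I saw an attribute!"
        self.events.append(("element", name))
        for attr_name in attrs.getNames():
            self.events.append(("attribute", attr_name))


handler = TagLogger()
xml.sax.parseString(XML.encode("utf-8"), handler)
print(handler.events)

# DOM: the whole document becomes an object tree that can be navigated freely.
dom = minidom.parseString(XML)
first = dom.getElementsByTagName("first")[0]
print(first.firstChild.data)
print(dom.documentElement.getAttribute("id"))
```

SAX keeps nothing unless the handler stores it, which is why it stays lightweight. DOM holds the entire tree in memory, which costs more but allows random access.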

  26. Spotlight: Data Bindings
      • DOM API is very, very generic
        • Example functions:
          • appendChild(Element n)
          • setAttribute(String name, String value)
        • No namespace management
      • Data bindings are APIs guided by the language definition
        • Example functions:
          • addComponent(Component c);
          • setIdentifier(String id);
      • Data bindings can be generated automatically
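
To show what "guided by the language definition" might look like, here is a hand-written sketch of a data binding layered over Python's `xml.dom.minidom`. The element names `architecture` and `component`, the `id` attribute, and both classes are hypothetical. They only echo the `addComponent` / `setIdentifier` examples on the slide, and a real binding would normally be generated from a schema.

```python
from xml.dom import minidom


class Component:
    """Hypothetical binding for a <component> element."""

    def __init__(self, doc: minidom.Document):
        self.node = doc.createElement("component")

    def set_identifier(self, identifier: str) -> None:
        # Domain-specific wrapper around the generic setAttribute call.
        self.node.setAttribute("id", identifier)


class Architecture:
    """Hypothetical binding for an <architecture> root element."""

    def __init__(self):
        self.doc = minidom.Document()
        self.node = self.doc.createElement("architecture")
        self.doc.appendChild(self.node)

    def new_component(self) -> Component:
        return Component(self.doc)

    def add_component(self, component: Component) -> None:
        # Domain-specific wrapper around the generic appendChild call.
        self.node.appendChild(component.node)

    def to_xml(self) -> str:
        return self.node.toxml()


arch = Architecture()
comp = arch.new_component()
comp.set_identifier("c1")
arch.add_component(comp)
print(arch.to_xml())
```

Callers work with components and identifiers rather than nodes and attributes, and the binding decides how those concepts map onto XML.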
