XML Grammars - PowerPoint PPT Presentation

Presentation Transcript

  1. XML Grammars 95-733 Internet Technologies Internet Technologies

  2. XML Grammars: Three Major Uses
     1. Validation
     2. Code Generation
     3. Communication

  3. XML Validation
     Sources for this lecture:
     • “Data on the Web”, Abiteboul, Buneman and Suciu
     • “XML in a Nutshell”, Harold and Means
     • “The XML Companion”, Bradley
     The validation examples were originally tested with an older parser, so the specific outputs may differ from those shown.

  4. XML Validation
     Batch validation compares a complete document instance against the DTD and produces a report of any errors or warnings. It is analogous to program compilation, and it detects similar kinds of errors.
     Interactive validation checks a document against the DTD continuously while the document is being created.

  5. XML Validation
     The benefits of validating documents against a DTD include:
     • Programmers can write extraction and manipulation filters without fear of their software ever processing unexpected input.
     • Using an XML-aware word processor, authors and editors can be guided and constrained to produce conforming documents. Consider how NetBeans allows you to edit web.xml files.

  6. XML Validation Examples
     XML elements may contain further, embedded elements, and the entire document must be enclosed by a single document element. These are recursive hierarchical structures.
     A Document Type Definition (DTD) contains rules for each element allowed within a specific class of documents.

  7. Things the DTD does not do:
     • Specify the document root.
     • Specify the number of instances of each kind of element. (Or, it’s rather hard to do.)
     • Describe the character data inside an element (the precise syntax).
     • DTDs don’t naturally handle namespaces.
     • The XML Schema language is much more recent and improves on DTDs. With it we have “programmer level” type specifications.
     • To see a real DTD, view source on http://www.silmaril.ie/software/rss2.dtd

  8. We’ll run this program against several XML files with DTDs. We’ll study the code soon.

         // Validate.java using Xerces
         import java.io.*;
         import org.xml.sax.ErrorHandler;
         import org.xml.sax.SAXException;
         import org.xml.sax.SAXParseException;
         import org.xml.sax.XMLReader;
         import org.xml.sax.InputSource;
         import org.xml.sax.helpers.XMLReaderFactory;
         import org.xml.sax.helpers.DefaultHandler;

     This slide shows the imported classes.

  9. Here we check if the command line is correct.

         public class Validate {

             public static boolean valid = true;

             public static void main(String argv[]) {

                 if (argv.length != 1) {
                     System.err.println("Usage: java Validate filename.xml");
                     System.exit(1);
                 }

  10. (Validate.java, continued)

                  try {
                      // get a parser
                      XMLReader reader = XMLReaderFactory.createXMLReader(
                              "org.apache.xerces.parsers.SAXParser");

                      // request validation
                      reader.setFeature("http://xml.org/sax/features/validation", true);

                      // associate an InputSource object with the file name
                      InputSource inputSource = new InputSource(argv[0]);

                      // go ahead and parse
                      reader.parse(inputSource);
                  }

  11. (Validate.java, continued)

                  // Catch any errors or fatal errors here.
                  // The parser will handle simple warnings.
                  catch (org.xml.sax.SAXException e) {
                      System.out.println("Error in parsing " + e);
                      valid = false;
                  }
                  catch (java.io.IOException e) {
                      System.out.println("Error in I/O " + e);
                      System.exit(1);   // nonzero status: an I/O failure is an error
                  }
                  System.out.println("Valid document is " + valid);
              }
          }
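The listing above registers no ErrorHandler. Under SAX, validity errors are delivered to ErrorHandler.error() rather than thrown, so without a handler only fatal (well-formedness) errors reach the catch block. The following is a minimal sketch of one way to collect validity errors. It is not the lecture's code: it assumes the JDK's built-in JAXP parser (SAXParserFactory) instead of a separately installed Xerces, and the class DtdCheck and its members are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the lecture's Validate.java.
class DtdCheck {

    // Internal-subset DTD copied from the FixedFloatSwap slides.
    static final String DTD =
          "<!DOCTYPE FixedFloatSwap [\n"
        + "<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>\n"
        + "<!ELEMENT Notional (#PCDATA)>\n"
        + "<!ELEMENT Fixed_Rate (#PCDATA)>\n"
        + "<!ELEMENT NumYears (#PCDATA)>\n"
        + "<!ELEMENT NumPayments (#PCDATA)>\n"
        + "]>\n";

    static final String GOOD = DTD
        + "<FixedFloatSwap><Notional>100</Notional><Fixed_Rate>5</Fixed_Rate>"
        + "<NumYears>3</NumYears><NumPayments>6</NumPayments></FixedFloatSwap>";

    // Same document with NumYears left out, which violates the content model.
    static final String BAD = DTD
        + "<FixedFloatSwap><Notional>100</Notional><Fixed_Rate>5</Fixed_Rate>"
        + "<NumPayments>6</NumPayments></FixedFloatSwap>";

    /** Parses with DTD validation on and returns every error message reported. */
    static List<String> validityErrors(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // request DTD validation
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void error(SAXParseException e) {
                        errors.add(e.getMessage()); // validity error: record, keep parsing
                    }
                });
        } catch (SAXException e) {
            errors.add("fatal: " + e.getMessage()); // document is not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println("Valid document is " + validityErrors(GOOD).isEmpty());
        System.out.println("Valid document is " + validityErrors(BAD).isEmpty());
    }
}
```

Collecting messages in a list instead of throwing on the first error lets one run report every violation, which matches the multi-line outputs shown on later slides.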

  12. XML Document

          <?xml version="1.0" encoding="UTF-8"?>
          <!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
          <FixedFloatSwap>
              <Notional>100</Notional>
              <Fixed_Rate>5</Fixed_Rate>
              <NumYears>3</NumYears>
              <NumPayments>6</NumPayments>
          </FixedFloatSwap>

      DTD

          <?xml version="1.0" encoding="utf-8"?>
          <!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments) >
          <!ELEMENT Notional (#PCDATA) >
          <!ELEMENT Fixed_Rate (#PCDATA) >
          <!ELEMENT NumYears (#PCDATA) >
          <!ELEMENT NumPayments (#PCDATA) >

      Valid document is true

  13. XML Document (DTD on the Web? VERY NICE)

          <?xml version="1.0" encoding="UTF-8"?>
          <!DOCTYPE FixedFloatSwap SYSTEM "http://localhost:8001/dtd/FixedFloatSwap.dtd">
          <FixedFloatSwap>
              <Notional>100</Notional>
              <Fixed_Rate>5</Fixed_Rate>
              <NumYears>3</NumYears>
              <NumPayments>6</NumPayments>
          </FixedFloatSwap>

      DTD

          <?xml version="1.0" encoding="utf-8"?>
          <!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments) >
          <!ELEMENT Notional (#PCDATA) >
          <!ELEMENT Fixed_Rate (#PCDATA) >
          <!ELEMENT NumYears (#PCDATA) >
          <!ELEMENT NumPayments (#PCDATA) >

      Valid document is true

  14. XML Document with an internal subset

          <?xml version="1.0" encoding="UTF-8"?>
          <!DOCTYPE FixedFloatSwap [
              <!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments) >
              <!ELEMENT Notional (#PCDATA) >
              <!ELEMENT Fixed_Rate (#PCDATA) >
              <!ELEMENT NumYears (#PCDATA) >
              <!ELEMENT NumPayments (#PCDATA) >
          ]>
          <FixedFloatSwap>
              <Notional>100</Notional>
              <Fixed_Rate>5</Fixed_Rate>
              <NumYears>3</NumYears>
              <NumPayments>6</NumPayments>
          </FixedFloatSwap>

      Valid document is true

  15. XML Document

          <?xml version="1.0" encoding="UTF-8"?>
          <!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
          <FixedFloatSwap>
              <Notional>100</Notional>
              <Fixed_Rate>5</Fixed_Rate>
              <NumYears>3</NumYears>
              <NumPayments>6</NumPayments>
          </FixedFloatSwap>

      DTD

          <?xml version="1.0" encoding="utf-8"?>
          <!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumPayments) >
          <!ELEMENT Notional (#PCDATA) >
          <!ELEMENT Fixed_Rate (#PCDATA) >
          <!ELEMENT NumPayments (#PCDATA) >

      Valid document is false

  16. XML Document

          <?xml version="1.0" encoding="UTF-8"?>
          <!DOCTYPE Swaps SYSTEM "FixedFloatSwap.dtd">
          <Swaps>
              <FixedFloatSwap>
                  <Notional>100</Notional>
                  <Fixed_Rate>5</Fixed_Rate>
                  <NumYears>3</NumYears>
                  <NumPayments>6</NumPayments>
              </FixedFloatSwap>
              <FixedFloatSwap>
                  <Notional>100</Notional>
                  <Fixed_Rate>5</Fixed_Rate>
                  <NumYears>3</NumYears>
                  <NumPayments>6</NumPayments>
              </FixedFloatSwap>
          </Swaps>

  17. DTD

          <?xml version="1.0" encoding="utf-8"?>
          <!ELEMENT Swaps (FixedFloatSwap+) >
          <!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments) >
          <!ELEMENT Notional (#PCDATA) >
          <!ELEMENT Fixed_Rate (#PCDATA) >
          <!ELEMENT NumYears (#PCDATA) >
          <!ELEMENT NumPayments (#PCDATA) >

      C:\McCarthy\www\examples\sax>java Validate FixedFloatSwap.xml
      Valid document is true

      Quantity Indicators
      ?   0 or 1 time
      +   1 or more times
      *   0 or more times

  18. Is this a valid document?

          <?xml version="1.0"?>
          <!DOCTYPE person [
              <!ELEMENT person (name+, profession*)>
              <!ELEMENT profession (#PCDATA)>
              <!ELEMENT name (#PCDATA)>
          ]>
          <person>
              <name>Alan Turing</name>
              <profession>computer scientist</profession>
              <profession>cryptographer</profession>
          </person>

      Sure!
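The quantity indicators can be exercised with the person DTD above. This is a minimal sketch, assuming the JDK's built-in validating SAX parser; the class QuantityDemo and its helper methods are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: name+ means one or more, profession* means zero or more.
class QuantityDemo {

    /** Wraps the given child markup in a person element governed by the slide's DTD. */
    static String person(String children) {
        return "<!DOCTYPE person [\n"
             + "<!ELEMENT person (name+, profession*)>\n"
             + "<!ELEMENT profession (#PCDATA)>\n"
             + "<!ELEMENT name (#PCDATA)>\n"
             + "]>\n<person>" + children + "</person>";
    }

    /** Returns true when the parser reports no validity or well-formedness error. */
    static boolean isValid(String xml) {
        boolean[] ok = {true};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void error(SAXParseException e) { ok[0] = false; }
                });
        } catch (SAXException e) {
            return false; // fatal error: not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return ok[0];
    }

    public static void main(String[] args) {
        // One name and no profession satisfies (name+, profession*).
        System.out.println(isValid(person("<name>Alan Turing</name>")));
        // No name at all violates name+.
        System.out.println(isValid(person("<profession>cryptographer</profession>")));
        // A profession before the name violates the sequence order.
        System.out.println(isValid(person(
            "<profession>cryptographer</profession><name>Alan Turing</name>")));
    }
}
```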

  19. The locations where document text data is allowed are indicated by the keyword ‘PCDATA’ (Parsed Character Data).

      XML Document

          <?xml version="1.0" encoding="UTF-8"?>
          <!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
          <FixedFloatSwap>
              <Notional>100</Notional>
              <Fixed_Rate>5</Fixed_Rate>
              <NumYears>
                  <StartYear>2000</StartYear>
                  <EndYear>2002</EndYear>
              </NumYears>
              <NumPayments>6</NumPayments>
          </FixedFloatSwap>

  20. DTD

          <?xml version="1.0" encoding="utf-8"?>
          <!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments) >
          <!ELEMENT Notional (#PCDATA) >
          <!ELEMENT Fixed_Rate (#PCDATA) >
          <!ELEMENT NumYears (#PCDATA) >
          <!ELEMENT NumPayments (#PCDATA) >

      Output

          C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml
          org.xml.sax.SAXParseException: Element "NumYears" does not allow "StartYear" -- (#PCDATA)
          org.xml.sax.SAXParseException: Element type "StartYear" is not declared.
          org.xml.sax.SAXParseException: Element "NumYears" does not allow "EndYear" -- (#PCDATA)
          org.xml.sax.SAXParseException: Element type "EndYear" is not declared.
          Valid document is false

  21. Mixed Content
      Strict rules apply when an element is allowed to contain both text and child elements. The PCDATA keyword must be the first token in the group, and the group must be a choice group (using “|” not “,”). The group must be optional and repeatable (that is, followed by “*”). This is known as a mixed content model.
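The mixed content rules can be checked mechanically. This is a minimal sketch, assuming the JDK's built-in validating SAX parser; the class MixedDemo and its helpers are hypothetical names, and the emph content model is the one used in the lecture's Mixed.dtd example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class for mixed content models.
class MixedDemo {

    static final String GOOD_MODEL = "(#PCDATA | sub | super)*";
    // PCDATA is not the first token here, so the declaration itself is rejected.
    static final String BAD_MODEL = "(sub | #PCDATA)*";

    /** Builds a document whose emph element uses the given content model. */
    static String doc(String emphModel, String body) {
        return "<!DOCTYPE Mixed [\n"
             + "<!ELEMENT Mixed (emph)>\n"
             + "<!ELEMENT emph " + emphModel + ">\n"
             + "<!ELEMENT sub (#PCDATA)>\n"
             + "<!ELEMENT super (#PCDATA)>\n"
             + "]>\n<Mixed>" + body + "</Mixed>";
    }

    /** Returns true when the parser reports no validity or well-formedness error. */
    static boolean isValid(String xml) {
        boolean[] ok = {true};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void error(SAXParseException e) { ok[0] = false; }
                });
        } catch (SAXException e) {
            return false; // fatal error, e.g. a malformed declaration
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return ok[0];
    }

    public static void main(String[] args) {
        String water = "<emph>H<sub>2</sub>O is water.</emph>";
        System.out.println(isValid(doc(GOOD_MODEL, water)));            // text and children mixed
        System.out.println(isValid(doc(GOOD_MODEL, "<emph></emph>")));  // the group is optional
        System.out.println(isValid(doc(GOOD_MODEL, "text" + water)));   // text directly in Mixed
        System.out.println(isValid(doc(BAD_MODEL, water)));             // PCDATA not first
    }
}
```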

  22. DTD

          <?xml version="1.0" encoding="utf-8"?>
          <!ELEMENT Mixed (emph) >
          <!ELEMENT emph (#PCDATA | sub | super)* >
          <!ELEMENT sub (#PCDATA)>
          <!ELEMENT super (#PCDATA)>

      XML Document

          <?xml version="1.0" encoding="UTF-8"?>
          <!DOCTYPE Mixed SYSTEM "Mixed.dtd">
          <Mixed>
              <emph>H<sub>2</sub>O is water.</emph>
          </Mixed>

      Valid document is true

  23. Is this a valid document?

          <?xml version="1.0"?>
          <!DOCTYPE page [
              <!ELEMENT page (paragraph+)>
              <!ELEMENT paragraph ( #PCDATA | profession | bold)*>
              <!ELEMENT profession (#PCDATA)>
              <!ELEMENT bold (#PCDATA)>
          ]>
          <page>
              <paragraph>
                  Alan Turing broke codes during <bold>World War II</bold>.
                  He very precisely defined the notion of "algorithm".
                  And so he had several professions:
                  <profession>computer scientist</profession>
                  <profession>cryptographer</profession>
                  And <profession>mathematician</profession>
              </paragraph>
          </page>

      Sure!

  24. How about this one?

          <?xml version="1.0"?>
          <!DOCTYPE page [
              <!ELEMENT page (paragraph+)>
              <!ELEMENT paragraph ( #PCDATA | profession | bold)*>
              <!ELEMENT profession (#PCDATA)>
              <!ELEMENT bold (#PCDATA)>
          ]>
          <page>
              The following is a paragraph marked up in XML.
              <paragraph>
                  Alan Turing broke codes during <bold>World War II</bold>.
                  He very precisely defined the notion of "algorithm".
                  And so he had several professions:
                  <profession>computer scientist</profession>
                  <profession>cryptographer</profession>
                  And <profession>mathematician</profession>
              </paragraph>
          </page>

      Output

          java Validate mixed.xml
          org.xml.sax.SAXParseException: The content of element type "page" must match "(paragraph)+".
          Valid document is false

  25. CDATA Section

      XML Document

          <?xml version="1.0" encoding="UTF-8"?>
          <!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
          <FixedFloatSwap>
              <Notional>100</Notional>
              <Fixed_Rate>5</Fixed_Rate>
              <NumYears>3</NumYears>
              <NumPayments>6</NumPayments>
              <Note>
                  <![CDATA[This is text that <b>will not be parsed for markup]]>
              </Note>
          </FixedFloatSwap>

      DTD

          <?xml version="1.0" encoding="utf-8"?>
          <!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments, Note) >
          <!ELEMENT Notional (#PCDATA) >
          <!ELEMENT Fixed_Rate (#PCDATA) >
          <!ELEMENT NumYears (#PCDATA) >
          <!ELEMENT NumPayments (#PCDATA) >
          <!ELEMENT Note (#PCDATA) >

  26. Recursion

          <?xml version="1.0"?>
          <!DOCTYPE tree [
              <!ELEMENT tree (node)>
              <!ELEMENT node (leaf | (node,node))>
              <!ELEMENT leaf (#PCDATA)>
          ]>
          <tree>
              <node>
                  <leaf>A DTD is a context-free grammar</leaf>
              </node>
          </tree>

      Output

          java Validate recursive1.xml
          Valid document is true

  27. How about this one?

          <?xml version="1.0"?>
          <!DOCTYPE tree [
              <!ELEMENT tree (node)>
              <!ELEMENT node (leaf | (node,node))>
              <!ELEMENT leaf (#PCDATA)>
          ]>
          <tree>
              <node>
                  <leaf>Alan Turing would like this</leaf>
              </node>
              <node>
                  <leaf>Alan Turing would like this</leaf>
              </node>
          </tree>

      Output

          java Validate recursive1.xml
          org.xml.sax.SAXParseException: The content of element type "tree" must match "(node)".
          Valid document is false
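Because node is defined in terms of itself, the tree DTD accepts arbitrarily deep binary trees. This is a minimal sketch that generates such trees and validates them, assuming the JDK's built-in validating SAX parser; the class TreeDemo and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class for the recursive tree DTD.
class TreeDemo {

    static final String DTD =
          "<!DOCTYPE tree [\n"
        + "<!ELEMENT tree (node)>\n"
        + "<!ELEMENT node (leaf | (node,node))>\n"
        + "<!ELEMENT leaf (#PCDATA)>\n"
        + "]>\n";

    /** A node holding a leaf at depth 0, otherwise exactly two child nodes. */
    static String node(int depth) {
        if (depth == 0) {
            return "<node><leaf>x</leaf></node>";
        }
        return "<node>" + node(depth - 1) + node(depth - 1) + "</node>";
    }

    static String tree(String content) {
        return DTD + "<tree>" + content + "</tree>";
    }

    /** Returns true when the parser reports no validity or well-formedness error. */
    static boolean isValid(String xml) {
        boolean[] ok = {true};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void error(SAXParseException e) { ok[0] = false; }
                });
        } catch (SAXException e) {
            return false; // fatal error: not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return ok[0];
    }

    public static void main(String[] args) {
        System.out.println(isValid(tree(node(0))));            // a single leaf
        System.out.println(isValid(tree(node(3))));            // a full tree with 8 leaves
        System.out.println(isValid(tree(node(0) + node(0))));  // two top-level nodes, as on the slide
        // A node with only one child node matches neither alternative.
        System.out.println(isValid(tree("<node>" + node(0) + "</node>")));
    }
}
```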

  28. Relational Databases and XML
      Consider the relational database r1(a,b,c), r2(c,d)

          r1:  a   b   c        r2:  c   d
               a1  b1  c1            c2  d2
               a2  b2  c2            c3  d3
                                     c4  d4

      How can we represent this database with an XML DTD?

  29. Relations

          <?xml version="1.0"?>
          <!DOCTYPE db [
              <!ELEMENT db (r1*, r2*)>
              <!ELEMENT r1 (a,b,c)>
              <!ELEMENT r2 (c,d)>
              <!ELEMENT a (#PCDATA)>
              <!ELEMENT b (#PCDATA)>
              <!ELEMENT c (#PCDATA)>
              <!ELEMENT d (#PCDATA)>
          ]>
          <db>
              <r1><a> a1 </a> <b> b1 </b> <c> c1 </c> </r1>
              <r1><a> a2 </a> <b> b2 </b> <c> c2 </c> </r1>
              <r2><c> c2 </c> <d> d2 </d> </r2>
              <r2><c> c3 </c> <d> d3 </d> </r2>
              <r2><c> c4 </c> <d> d4 </d> </r2>
          </db>

      Output

          java Validate Db.xml
          Valid document is true

      There is a small problem…

  30. Relations
      The order of the relations should not count and neither should the order of columns within rows.

          <?xml version="1.0"?>
          <!DOCTYPE db [
              <!ELEMENT db (r1|r2)* >
              <!ELEMENT r1 ((a,b,c) | (a,c,b) | (b,a,c) | (b,c,a) | (c,a,b) | (c,b,a))>
              <!ELEMENT r2 ((c,d) | (d,c))>
              <!ELEMENT a (#PCDATA)>
              <!ELEMENT b (#PCDATA)>
              <!ELEMENT c (#PCDATA)>
              <!ELEMENT d (#PCDATA)>
          ]>
          <db>
              <r1><a> a1 </a> <b> b1 </b> <c> c1 </c> </r1>
              <r1><a> a2 </a> <b> b2 </b> <c> c2 </c> </r1>
              <r2><c> c2 </c> <d> d2 </d> </r2>
              <r2><c> c3 </c> <d> d3 </d> </r2>
              <r2><c> c4 </c> <d> d4 </d> </r2>
          </db>
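The difference between the ordered and the order-insensitive DTD can be seen by validating the same reordered rows against both. This is a minimal sketch, assuming the JDK's built-in validating SAX parser; the class RelationDemo and its helpers are hypothetical names. To keep it short, only the db and r2 content models vary and r1 stays in its fixed (a,b,c) form, so this is a simplification of the DTD above rather than a copy of it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class comparing ordered and order-insensitive content models.
class RelationDemo {

    /** Builds a db document with the given content models for db and r2. */
    static String doc(String dbModel, String r2Model, String rows) {
        return "<!DOCTYPE db [\n"
             + "<!ELEMENT db " + dbModel + ">\n"
             + "<!ELEMENT r1 (a,b,c)>\n"
             + "<!ELEMENT r2 " + r2Model + ">\n"
             + "<!ELEMENT a (#PCDATA)>\n"
             + "<!ELEMENT b (#PCDATA)>\n"
             + "<!ELEMENT c (#PCDATA)>\n"
             + "<!ELEMENT d (#PCDATA)>\n"
             + "]>\n<db>" + rows + "</db>";
    }

    static String ordered(String rows)   { return doc("(r1*, r2*)", "(c,d)", rows); }
    static String unordered(String rows) { return doc("(r1|r2)*", "((c,d) | (d,c))", rows); }

    /** Returns true when the parser reports no validity or well-formedness error. */
    static boolean isValid(String xml) {
        boolean[] ok = {true};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void error(SAXParseException e) { ok[0] = false; }
                });
        } catch (SAXException e) {
            return false; // fatal error: not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return ok[0];
    }

    public static void main(String[] args) {
        String r1 = "<r1><a>a1</a><b>b1</b><c>c1</c></r1>";
        String r2 = "<r2><c>c2</c><d>d2</d></r2>";
        String r2Swapped = "<r2><d>d2</d><c>c2</c></r2>";

        System.out.println(isValid(ordered(r1 + r2)));           // rows in declared order
        System.out.println(isValid(ordered(r2Swapped + r1)));    // reordered rows and columns
        System.out.println(isValid(unordered(r2Swapped + r1)));  // same rows, relaxed DTD
    }
}
```

Listing every permutation grows factorially with the number of columns, which is part of why this approach is described as a small problem rather than a solution.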

  31. Attributes
      An attribute is associated with a particular element by the DTD and is assigned an attribute type. The attribute type can restrict the range of values it can hold. Example attribute types include:
      • CDATA indicates a simple string of characters
      • NMTOKEN indicates a word or token
      • A named token group such as (left | center | right)
      • ID is an element id that holds a unique value (among other element IDs in the document)
      • IDREF attributes refer to an ID

  32. DTD

          <?xml version="1.0" encoding="utf-8"?>
          <!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments) >
          <!ELEMENT Notional (#PCDATA) >
          <!ELEMENT Fixed_Rate (#PCDATA) >
          <!ELEMENT NumYears (#PCDATA) >
          <!ELEMENT NumPayments (#PCDATA) >
          <!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>

      XML Document

          <?xml version="1.0" encoding="UTF-8"?>
          <!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
          <FixedFloatSwap>
              <Notional>100</Notional>
              <Fixed_Rate>5</Fixed_Rate>
              <NumYears>3</NumYears>
              <NumPayments>6</NumPayments>
          </FixedFloatSwap>

      Output

          C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml
          org.xml.sax.SAXParseException: Attribute value for "currency" is #REQUIRED.
          Valid document is false

  33. DTD

          <?xml version="1.0" encoding="utf-8"?>
          <!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments) >
          <!ELEMENT Notional (#PCDATA) >
          <!ELEMENT Fixed_Rate (#PCDATA) >
          <!ELEMENT NumYears (#PCDATA) >
          <!ELEMENT NumPayments (#PCDATA) >
          <!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>

      XML Document

          <?xml version="1.0" encoding="UTF-8"?>
          <!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
          <FixedFloatSwap>
              <Notional currency = "Pounds">100</Notional>
              <Fixed_Rate>5</Fixed_Rate>
              <NumYears>3</NumYears>
              <NumPayments>6</NumPayments>
          </FixedFloatSwap>

      Valid document is true

  34. DTD (#IMPLIED means optional)

          <?xml version="1.0" encoding="utf-8"?>
          <!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments) >
          <!ELEMENT Notional (#PCDATA) >
          <!ELEMENT Fixed_Rate (#PCDATA) >
          <!ELEMENT NumYears (#PCDATA) >
          <!ELEMENT NumPayments (#PCDATA) >
          <!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>
          <!ATTLIST FixedFloatSwap note CDATA #IMPLIED>

      XML Document

          <?xml version="1.0" encoding="UTF-8"?>
          <!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
          <FixedFloatSwap>
              <Notional currency = "Pounds">100</Notional>
              <Fixed_Rate>5</Fixed_Rate>
              <NumYears>3</NumYears>
              <NumPayments>6</NumPayments>
          </FixedFloatSwap>

      Valid document is true

  35. DTD

          <?xml version="1.0" encoding="utf-8"?>
          <!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments) >
          <!ELEMENT Notional (#PCDATA) >
          <!ELEMENT Fixed_Rate (#PCDATA) >
          <!ELEMENT NumYears (#PCDATA) >
          <!ELEMENT NumPayments (#PCDATA) >
          <!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>
          <!ATTLIST FixedFloatSwap note CDATA #IMPLIED>

      XML Document

          <?xml version="1.0" encoding="UTF-8"?>
          <!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
          <FixedFloatSwap note = "For your eyes only">
              <Notional currency = "Pounds">100</Notional>
              <Fixed_Rate>5</Fixed_Rate>
              <NumYears>3</NumYears>
              <NumPayments>6</NumPayments>
          </FixedFloatSwap>

      Valid document is true
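The attribute slides cover a missing required attribute and a present one. A further case worth checking is a value outside the token group, which the enumerated type should reject. This is a minimal sketch, assuming the JDK's built-in validating SAX parser; the class AttributeDemo, its helpers, and the value "Euros" are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class for #REQUIRED, #IMPLIED and enumerated attributes.
class AttributeDemo {

    static final String DTD =
          "<!DOCTYPE FixedFloatSwap [\n"
        + "<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>\n"
        + "<!ELEMENT Notional (#PCDATA)>\n"
        + "<!ELEMENT Fixed_Rate (#PCDATA)>\n"
        + "<!ELEMENT NumYears (#PCDATA)>\n"
        + "<!ELEMENT NumPayments (#PCDATA)>\n"
        + "<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>\n"
        + "<!ATTLIST FixedFloatSwap note CDATA #IMPLIED>\n"
        + "]>\n";

    /** Builds a swap document; each argument is raw attribute text such as " note=\"x\"". */
    static String swap(String swapAttrs, String notionalAttrs) {
        return DTD + "<FixedFloatSwap" + swapAttrs + ">"
             + "<Notional" + notionalAttrs + ">100</Notional>"
             + "<Fixed_Rate>5</Fixed_Rate><NumYears>3</NumYears>"
             + "<NumPayments>6</NumPayments></FixedFloatSwap>";
    }

    /** Returns true when the parser reports no validity or well-formedness error. */
    static boolean isValid(String xml) {
        boolean[] ok = {true};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void error(SAXParseException e) { ok[0] = false; }
                });
        } catch (SAXException e) {
            return false; // fatal error: not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return ok[0];
    }

    public static void main(String[] args) {
        System.out.println(isValid(swap("", " currency=\"Pounds\"")));  // required value present
        System.out.println(isValid(swap("", "")));                      // required value missing
        System.out.println(isValid(swap("", " currency=\"Euros\"")));   // not in the token group
        System.out.println(isValid(
            swap(" note=\"For your eyes only\"", " currency=\"Dollars\""))); // optional note given
    }
}
```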

  36. ID and IDREF Attributes
      We can represent complex relationships within an XML document using ID and IDREF attributes.

  37. An Undirected Graph
      [Figure: an undirected graph on the vertices u, v, w, x, y, z, with one edge and one vertex labeled]

  38. A Directed Graph
      [Figure: a directed graph on the vertices u, v, w, x, y]

  39. [Figure: a prerequisite graph over the courses Geom100, Math100, Calc100, Calc200, Calc300, CS1, CS2, Philo45]
      This is called a DAG (Directed Acyclic Graph).

  40. This course has an ID but no prerequisites.

          <?xml version="1.0"?>
          <!DOCTYPE Course_Descriptions SYSTEM "course_descriptions.dtd">
          <Course_Descriptions>
              <Course>
                  <Course-ID id = "Math100" />
                  <Title>Algebra I</Title>
                  <Description>
                      Students in this course study introductory algebra.
                  </Description>
                  <Prerequisites/>
              </Course>

  41. The DTD will force this id to be unique.

              <Course>
                  <Course-ID id = "Geom100" />
                  <Title>Geometry I</Title>
                  <Description>
                      Students in this course study how to prove several theorems in geometry.
                  </Description>
                  <Prerequisites/>
              </Course>

  42. The values of pre are references to IDs (IDREFS).

              <Course>
                  <Course-ID id="Calc100" />
                  <Title>Calculus I</Title>
                  <Description>
                      Students in this course study the derivative.
                  </Description>
                  <Prerequisites pre="Math100 Geom100" />
              </Course>
              <Course>

  43. The DTD requires that this name be a unique id defined within this document. Otherwise, the document is invalid.

                  <Course-ID id = "Calc200" />
                  <Title>Calculus II</Title>
                  <Description>
                      Students in this course study the integral.
                  </Description>
                  <Prerequisites pre="Calc100" />
              </Course>

  44. Prerequisites is an EMPTY element. It’s used only for its attributes.

              <Course>
                  <Course-ID id = "Calc300" />
                  <Title>Calculus III</Title>
                  <Description>
                      Students in this course study the derivative and the integral (in 3-space).
                  </Description>
                  <Prerequisites pre="Calc200" />
              </Course>

  45. IDREF to ID: a one-to-one link.

              <Course>
                  <Course-ID id = "CS1" />
                  <Title>Introduction to Computer Science I</Title>
                  <Description>
                      In this course we study Turing machines.
                  </Description>
                  <Prerequisites pre="Calc100" />
              </Course>
              <Course>

  46. IDREFS to several IDs: one-to-many links.

                  <Course-ID id = "CS2" />
                  <Title>Introduction to Computer Science II</Title>
                  <Description>
                      In this course we study basic data structures.
                  </Description>
                  <Prerequisites pre="Calc200 CS1"/>
              </Course>
              <Course>

  47. (Course descriptions, continued)

                  <Course-ID id = "Philo45" />
                  <Title>Ethical Implications of Information Technology</Title>
                  <Description>
                      TBA
                  </Description>
                  <Prerequisites/>
              </Course>
          </Course_Descriptions>

  48. The Course_Descriptions.dtd

          <?xml version="1.0" encoding="utf-8"?>
          <!-- Course Description DTD -->
          <!ELEMENT Course_Descriptions (Course)+>
          <!ELEMENT Course (Course-ID,Title,Description,Prerequisites)>
          <!ELEMENT Course-ID EMPTY>
          <!ELEMENT Title (#PCDATA)>
          <!ELEMENT Description (#PCDATA)>
          <!ELEMENT Prerequisites EMPTY>
          <!ATTLIST Course-ID id ID #REQUIRED>
          <!ATTLIST Prerequisites pre IDREFS #IMPLIED>
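The ID and IDREFS constraints can be tested directly: a prerequisite that names no existing course, or two courses sharing an id, should each make the document invalid. This is a minimal sketch, assuming the JDK's built-in validating SAX parser. The class IdrefDemo and its helpers are hypothetical names, and the DTD is a trimmed version of the one above (Title and Description are left out to keep the sketch short).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class for ID and IDREFS validity constraints.
class IdrefDemo {

    // Trimmed course DTD: only the elements that carry ID/IDREFS attributes.
    static final String DTD =
          "<!DOCTYPE Course_Descriptions [\n"
        + "<!ELEMENT Course_Descriptions (Course)+>\n"
        + "<!ELEMENT Course (Course-ID, Prerequisites)>\n"
        + "<!ELEMENT Course-ID EMPTY>\n"
        + "<!ELEMENT Prerequisites EMPTY>\n"
        + "<!ATTLIST Course-ID id ID #REQUIRED>\n"
        + "<!ATTLIST Prerequisites pre IDREFS #IMPLIED>\n"
        + "]>\n";

    /** One Course element; pre may be null for a course without prerequisites. */
    static String course(String id, String pre) {
        String prereq = (pre == null)
            ? "<Prerequisites/>"
            : "<Prerequisites pre=\"" + pre + "\"/>";
        return "<Course><Course-ID id=\"" + id + "\"/>" + prereq + "</Course>";
    }

    static String catalog(String... courses) {
        return DTD + "<Course_Descriptions>" + String.join("", courses)
             + "</Course_Descriptions>";
    }

    /** Returns true when the parser reports no validity or well-formedness error. */
    static boolean isValid(String xml) {
        boolean[] ok = {true};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void error(SAXParseException e) { ok[0] = false; }
                });
        } catch (SAXException e) {
            return false; // fatal error: not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return ok[0];
    }

    public static void main(String[] args) {
        // Every reference resolves to an ID in the same document.
        System.out.println(isValid(catalog(
            course("Math100", null), course("Calc100", "Math100"))));
        // Geom100 is referenced but never defined.
        System.out.println(isValid(catalog(
            course("Math100", null), course("Calc100", "Math100 Geom100"))));
        // The same id appears twice.
        System.out.println(isValid(catalog(
            course("Math100", null), course("Math100", null))));
    }
}
```

Note that ID and IDREFS only check that references resolve within the document. They do not check that the resulting prerequisite graph is acyclic.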

  49. General Entities
      General entities are used to place text into the XML document. They may be declared in the DTD and then referenced in the document (as &name;). They may also be declared in the DTD as residing in a file, and then referenced in the document in the same way.

  50. Document using a General Entity

          <?xml version="1.0" encoding="UTF-8"?>
          <!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [
              <!ENTITY bankname "Mellon National Bank and Trust" >
          ] >
          <FixedFloatSwap>
              <Bank>&bankname;</Bank>
              <Notional>100</Notional>
              <Fixed_Rate>5</Fixed_Rate>
              <NumYears>3</NumYears>
              <NumPayments>6</NumPayments>
          </FixedFloatSwap>

      DTD

          <?xml version="1.0" encoding="utf-8"?>
          <!ELEMENT FixedFloatSwap (Bank, Notional, Fixed_Rate, NumYears, NumPayments) >
          <!ELEMENT Bank (#PCDATA) >
          <!ELEMENT Notional (#PCDATA) >
          <!ELEMENT Fixed_Rate (#PCDATA) >
          <!ELEMENT NumYears (#PCDATA) >
          <!ELEMENT NumPayments (#PCDATA) >

      Valid document is true
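To see what the application receives when a general entity is referenced, the text delivered by the parser can be collected. This is a minimal sketch, assuming the JDK's built-in SAX parser; the class EntityDemo and its method textOf are hypothetical names. The entity is declared in an internal subset on a one-element document so that the example needs no external DTD file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class showing general entity expansion.
class EntityDemo {

    static final String DOC =
          "<!DOCTYPE Bank [\n"
        + "<!ELEMENT Bank (#PCDATA)>\n"
        + "<!ENTITY bankname \"Mellon National Bank and Trust\">\n"
        + "]>\n"
        + "<Bank>&bankname;</Bank>";

    /** Returns all character data the parser reports for the document. */
    static String textOf(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void characters(char[] ch, int start, int length) {
                        // May be called several times per element, so accumulate.
                        text.append(ch, start, length);
                    }
                });
        } catch (SAXException | ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // The handler sees the replacement text, not the reference &bankname;
        System.out.println(textOf(DOC));
    }
}
```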