
Object Orientation in XML DTD & Schema

Object Orientation in XML DTD & Schema. Dunam Kim / Jongdae Han IDB Lab. / SE Lab. SNU CSE April 25, 2007. Contents. Background Text Processing & Storage Markup Languages XML DTD & Schema for XML SOAP : an Application for XML Schema Demonstration Conclusion. Background.


Presentation Transcript


  1. Object Orientation in XML DTD & Schema Dunam Kim / Jongdae Han IDB Lab. / SE Lab. SNU CSE April 25, 2007

  2. Contents • Background • Text Processing & Storage • Markup Languages • XML • DTD & Schema for XML • SOAP : an Application for XML Schema • Demonstration • Conclusion

  3. Background • A need arose to process large, complex text data • Some kind of ‘language’ was required to describe such complicated text

  4. Contents • Background • Text Processing & Storage • Markup Languages • XML • DTD & Schema for XML • SOAP : an Application for XML Schema • Demonstration • Conclusion

  5. Text Processing & Storage • Primitive: a long, simple sequence of characters • DO primitive • Lacks semantic information • Not machine-friendly • Advanced: an organized structure • Example (primitive): Title<cr>This text is a sample.<eof> • Example (advanced): the entity Title is the title of the article; it should be a plain string with a maximum length of 10

  6. Serialization(1) • Ruby, Smalltalk, Python, ObjC, Java, .NET • The process of saving an object onto a storage medium or transmitting it over a network • Also called deflating or marshalling • Example in ObjC • [Figure: the Sender writes the value 1089356705 as a GNU TypedStream (opaque binary bytes); the Receiver reads back 1089356705]

  7. Serialization(2) • Simple, non-structured • Focuses on efficiency • Not suited to long text documents • How can we find a certain phrase in a 5MB document?
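
The serialization slides above can be illustrated with a minimal sketch. The slide's own example is ObjC with GNU TypedStream; Python's standard `pickle` module is swapped in here, and the record contents are assumptions made up for illustration.

```python
import pickle

# Hypothetical record standing in for the object the Sender transmits.
record = {"sender": "Sender", "value": 1089356705}

# Serialize: the object becomes an opaque, efficiency-oriented byte string.
payload = pickle.dumps(record)

# Deserialize: the Receiver rebuilds an equal object from the bytes.
restored = pickle.loads(payload)

print(type(payload).__name__, len(payload), restored == record)
```

The byte string carries no structure that a text search could use, which is the limitation the slide raises for long documents.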

  8. Indexed Text(1) • Inspired by RDB • Increased search speed • Syntactic resolution rather than semantic • Sample text: His clothing collapsed in a heap. She did not see it, seeing only the naked man who stood before the chair in which the new President had sat. [indexed heading] Chapter 1. the beginning. So tall was he that his head nearly brushed the ceiling; and so glorious was he that one felt that the ceiling had risen so that his head would not brush it. • Sample query: Go “chapter 1”

  9. Indexed Text(2)

  10. Indexed Text(3) • No semantic information, again! • Example: a picture captioned “A gazebo, 15th century, China. Picture taken by Kim, 2006-05-02.” • How can we index this?
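
The indexed-text idea above can be sketched as a word-to-offset index. This is a minimal sketch, not the indexing scheme the slides had in mind; the function name `build_index` and the shortened sample text are assumptions.

```python
import re
from collections import defaultdict

# Shortened sample text based on the slide's example.
text = ("Chapter 1. the beginning. His clothing collapsed in a heap. "
        "She did not see it, seeing only the naked man.")


def build_index(document):
    """Map each lowercase word to the character offsets where it starts."""
    index = defaultdict(list)
    for match in re.finditer(r"[a-z0-9]+", document.lower()):
        index[match.group()].append(match.start())
    return index


index = build_index(text)

# A query such as Go "chapter 1" becomes a dictionary lookup, not a scan.
offset = index["chapter"][0]
print(text[offset:offset + 9])
```

The lookup is purely syntactic: it finds the string "chapter" but knows nothing about what a chapter, a title, or a picture caption is.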

  11. Contents • Background • Text Processing & Storage • Markup Languages • XML • DTD & Schema for XML • SOAP : an Application for XML Schema • Demonstration • Conclusion

  12. Markup language • Text: The telephone in the study rang ten minutes before the news came on. The new President picked it up and said hello. "Mister President?" • Title of the text: Copperhead • Author of the text: Gene Wolfe

  13. Markup language • Plain text: Copperhead By Gene Wolfe The telephone in the stud • Presentational markup: <b>Copperhead</b> <I>Gene Wolfe</I> <br> The telephone in the stud • Descriptive markup: <Title>Copperhead <Author>Gene Wolfe <Contents> The telephone in the stud

  14. Early History of Markup language • GenCode • William W. Tunnicliffe, 1967 • Gave a rough sketch of the “Markup Language” idea • GML • Charles Goldfarb and colleagues at IBM, 1969 • troff/nroff • Typesetting tools for Unix, early 1970s • TeX • Donald Knuth's typesetting system, which became a publishing standard, 1978 • Scribe • Brian Reid, 1980

  15. SGML • Standard Generalized Markup Language • Separates structure from presentation • Has a separate syntax for describing which tags are allowed, and where • Ancestor of HTML • Invented by Charles Goldfarb, 1970s • Example: <QUOTE TYPE="example"> typically something like <ITALICS>this</ITALICS> </QUOTE>

  16. Cons of SGML • Standardized too late • ISO 8879, 1986 • Very complex, hard to learn • Cumbersome, as a side effect of flexibility • e.g. the start-tag (or end-tag, or both) is sometimes optional • Why? To save keystrokes

  17. HTML • HyperText Markup Language • Tim Berners-Lee, early 1990s (first Internet Draft in 1993) • Both procedural and descriptive • An application of SGML • Simple, restricted format

  18. Contents • Background • Text Processing & Storage • Markup Languages • XML • DTD & Schema for XML • SOAP : an Application for XML Schema • Demonstration • Conclusion

  19. XML • Developed by the World Wide Web Consortium (1998) • Focuses on a particular problem, “Internet documents”, by simplifying SGML • DTDs were carried over with XML 1.0 (1998) • Slightly different from SGML DTDs • XML Schema introduced (2001) • W3C Recommendation

  20. Characteristics of XML • Derived from SGML • Every XML document is also an SGML document • Grammar-based validation is available (DTDs) • Separates content from additional information about the content (elements and attributes) • Improvements in XML • Eliminates complexity • Improves internationalization • Can be parsed into a hierarchical structure <?xml version="1.0" encoding="UTF-8"?> <俄语>Данные</俄语>

  21. Structured use of XML(1) • XML documents can be parsed into a hierarchical, tree-based structure • Parsers follow the DOM (in-memory tree) or SAX (event stream) model <?xml version="1.0" ?> <Address> <city>Seoul</city> <street>Sejongro</street> <number>145</number> </Address>
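
The two parsing styles named above can be sketched with Python's standard library. This is a minimal sketch: `xml.etree.ElementTree` stands in for a tree-based (DOM-like) parser and `xml.sax` for the event-based one; the `TagCollector` class is a hypothetical helper, and the document is the slide's Address sample with its closing tag added.

```python
import xml.etree.ElementTree as ET
import xml.sax

DOC = ("<Address><city>Seoul</city><street>Sejongro</street>"
       "<number>145</number></Address>")

# Tree style: the whole document becomes an in-memory hierarchy.
root = ET.fromstring(DOC)
tree_view = {child.tag: child.text for child in root}


class TagCollector(xml.sax.ContentHandler):
    """Event style: record each start tag as the parser streams past it."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


collector = TagCollector()
xml.sax.parseString(DOC.encode("utf-8"), collector)

print(root.tag, tree_view)
print(collector.tags)
```

The tree style keeps the whole hierarchy available for navigation, while the event style sees one tag at a time and holds no document in memory.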

  22. Structured use of XML(2)

  23. Structured use of XML(3)

  24. Contents • Background • Text Processing & Storage • Markup Languages • XML • DTD & Schema for XML • SOAP : an Application for XML Schema • Demonstration • Conclusion

  25. Reason why schema is required • It is impossible to recognize the structure of XML without metadata • A single XML instance cannot cover every possible form • [Figure: three sample book trees with differing title and author children] • Candidate content models: book (title, author*) book (title, author) book (title) book (title, author+)

  26. Concept of XML Schema, DTD • XML Schema and DTD describe the structure of an XML document • Their main purpose is to validate XML • Analogy: class : object = DB schema : DB instance = XML Schema, DTD : XML instance

  27. DTD and XML Schema (1/6) • DTD (Document Type Definition) • Adopted by the W3C as part of the XML 1.0 Recommendation • Unable to satisfy the requirements of data transfer • XML Schema • Created by the W3C as an alternative schema language • Requirements published in February 1999 • Adopted as a W3C Recommendation in May 2001

  28. DTD and XML Schema (2/6) • DTD • A DTD constrains the structure of XML data • Which elements can occur • Which attributes an element can or must have • Which subelements can or must occur inside each element, and how many times • A DTD does not constrain data types • DTD syntax • <!ELEMENT element (subelements-specification) > • <!ATTLIST element (attributes) >

  29. DTD and XML Schema (3/6) • DTD (Cont.) • Subelements can be specified as • names of elements, or • #PCDATA (parsed character data), i.e., character strings • EMPTY (no subelements) or ANY (anything can be a subelement) • A subelement specification may use regular-expression operators • <!ELEMENT library ( ( book | magazine | newspaper)+)> • Notation: • “,” - sequence • “|” - alternatives • “+” - 1 or more occurrences • “*” - 0 or more occurrences • “?” - 0 or 1 occurrence
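
Because DTD content models are regular expressions over child element names, the notation above can be sketched by translating a model into a Python regex. This is a minimal sketch, not a DTD validator: the helper names `model_to_regex` and `conforms` are hypothetical, and #PCDATA, EMPTY, ANY, and attributes are not handled.

```python
import re
import xml.etree.ElementTree as ET


def model_to_regex(model):
    """Turn a content model like '(title, author*)' into a regex that
    matches a string of child tag names, each terminated by ';'."""
    # Wrap every element name in a group that also consumes its ';'.
    pattern = re.sub(r"[A-Za-z_][\w.\-]*",
                     lambda m: f"(?:{re.escape(m.group())};)", model)
    # ',' means plain sequence, so commas and whitespace can be dropped;
    # '|', '+', '*', '?' and parentheses already mean the same in a regex.
    return re.sub(r"[\s,]+", "", pattern)


def conforms(element, model):
    """Check the direct children of element against the content model."""
    children = "".join(child.tag + ";" for child in element)
    return re.fullmatch(model_to_regex(model), children) is not None


book = ET.fromstring(
    "<book><title>T</title><author>A</author><author>B</author></book>")
library = ET.fromstring("<library><book/><newspaper/></library>")

print(conforms(book, "(title, author*)"))
print(conforms(book, "(title, author)"))
print(conforms(library, "((book | magazine | newspaper)+)"))
```

The same book instance satisfies some models and violates others, which is why an instance alone cannot reveal the intended structure.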

  30. DTD and XML Schema (4/6) • XML sample <?xml version = "1.0"?> <address> <!--(street , city , zip)--> <street>Jongro</street> <city>Seoul</city> <zip>123456</zip> </address> • DTD sample <?xml version='1.0' encoding='UTF-8' ?> <!ELEMENT address (street , city , zip)> <!ELEMENT street (#PCDATA)> <!ELEMENT city (#PCDATA)> <!ELEMENT zip (#PCDATA)> • [Figure: tree of address (street, city, zip) with #PCDATA leaves Jongro, Seoul, 123456]

  31. DTD and XML Schema (5/6) • XML Schema is a more sophisticated schema language that addresses the drawbacks of DTDs • Typing of values • E.g. integer, string, etc. • Also, constraints on min/max values • User-defined, complex types • Many more features, including • uniqueness and foreign key constraints, inheritance • XML Schema is itself specified in XML syntax, unlike DTDs • XML Schema is integrated with namespaces

  32. DTD and XML Schema (6/6) • XML Schema sample <?xml version='1.0' encoding='UTF-8' ?> <xs:schema xmlns:xs = "http://www.w3.org/2001/XMLSchema"> <xs:element name="address"> <xs:complexType> <xs:sequence> <xs:element name="street" type="xs:string"/> <xs:element name="city" type="xs:string"/> <xs:element name="zip" type="xs:int"/> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>
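
The value typing that XML Schema adds over DTD can be sketched by reading the declared types out of the sample schema and checking an instance against them. This is a minimal sketch under strong assumptions: it is not a schema validator, it understands only `xs:string` and `xs:int`, and the helper names are hypothetical. The Python standard library has no XML Schema validator, so the check is written by hand.

```python
import re
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="street" type="xs:string"/>
        <xs:element name="city" type="xs:string"/>
        <xs:element name="zip" type="xs:int"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def is_xs_int(text):
    """Accept an optionally signed decimal that fits in 32 bits."""
    text = text.strip()
    if re.fullmatch(r"[+-]?[0-9]+", text) is None:
        return False
    return -2**31 <= int(text) <= 2**31 - 1


CHECKS = {"xs:string": lambda text: True, "xs:int": is_xs_int}


def declared_types(schema_text):
    """Collect element name -> declared type for typed element declarations."""
    schema = ET.fromstring(schema_text)
    return {el.get("name"): el.get("type")
            for el in schema.iter(XS + "element") if el.get("type")}


def type_errors(instance_text, schema_text):
    """Return the tags of children whose text violates the declared type."""
    types = declared_types(schema_text)
    root = ET.fromstring(instance_text)
    return [child.tag for child in root
            if child.tag in types
            and not CHECKS.get(types[child.tag], lambda t: True)(child.text or "")]


good = "<address><street>Jongro</street><city>Seoul</city><zip>123456</zip></address>"
bad = "<address><street>Jongro</street><city>Seoul</city><zip>12A456</zip></address>"
print(type_errors(good, SCHEMA), type_errors(bad, SCHEMA))
```

Under the DTD sample, both instances would be accepted, since zip is only #PCDATA there.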

  33. OO Concepts in DTD, Schema (1/8) • Complex Data in DTD • An element can have elements as child nodes • Child elements can also have elements as child nodes • [Figure: contact (name, address); name (first, last); address (street, city, zip); the leaves are #PCDATA]

  34. OO Concepts in DTD, Schema (2/8) • Complex Data in DTD (Cont.) <?xml version='1.0' encoding='UTF-8' ?> <!ELEMENT contact (name, address)> <!ELEMENT name (first, last)> <!ELEMENT first (#PCDATA)> <!ELEMENT last (#PCDATA)> <!ELEMENT address (street , city , zip)> <!ELEMENT street (#PCDATA)> <!ELEMENT city (#PCDATA)> <!ELEMENT zip (#PCDATA)>

  35. OO Concepts in DTD, Schema (3/8) • Complex Data in XML Schema • Separation of element and complex type • One type can be shared by several elements • [Figure: named complexType vs. unnamed complexType]

  36. OO Concepts in DTD, Schema (4/8) • Complex Data in XML Schema (Cont.) <?xml version='1.0' encoding='UTF-8' ?> <xs:schema xmlns:xs = "http://www.w3.org/2001/XMLSchema"> <xs:element name="contact"> <xs:complexType> <xs:sequence> <xs:element name="name" type="nameType" /> <xs:element name="address" type="addressType" /> </xs:sequence> </xs:complexType> </xs:element> <xs:complexType name="addressType"> <xs:sequence> <xs:element name="street" type="xs:string" /> <xs:element name="city" type="xs:string" /> <xs:element name="zip" type="xs:int" /> </xs:sequence> </xs:complexType> <xs:complexType name="nameType"> <xs:sequence> <xs:element name="first" type="xs:string" /> <xs:element name="last" type="xs:string" /> </xs:sequence> </xs:complexType> </xs:schema>

  37. OO Concepts in DTD, Schema (5/8) • Inheritance in DTD • DTD emulates inheritance using parameter entities • A parameter entity is similar to a ‘#define’ statement in C/C++ • The first declaration of an entity is the binding one, so an override must be declared before the default • Polymorphism is unavailable • [Figure: address (street, city) extended to address (street, city, zip)] <!-- base DTD: Address.extra defaults to the empty string --> <!ENTITY % Address.extra ""> <!-- Address's content = "city, street" + Address.extra --> <!ELEMENT Address (city, street %Address.extra; )> <!-- derived DTD: declare this BEFORE the base DTD is read; Address's content becomes city, street, zip --> <!ENTITY % Address.extra ", zip">
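
The '#define'-like behaviour of parameter entities can be sketched as plain text substitution in which the first declaration of a name is the binding one, as the XML specification prescribes. This is a minimal sketch, not a DTD parser: it assumes one declaration per line, and the function name `expand_parameter_entities` is hypothetical.

```python
import re

ENTITY_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.]+)\s+"([^"]*)"\s*>')


def expand_parameter_entities(dtd_text):
    """Expand %name; references; earlier declarations win over later ones."""
    entities = {}
    output = []
    for line in dtd_text.splitlines():
        declaration = ENTITY_DECL.fullmatch(line.strip())
        if declaration:
            # setdefault keeps the first value and ignores redeclarations.
            entities.setdefault(declaration.group(1), declaration.group(2))
            continue
        output.append(re.sub(r"%([\w.]+);",
                             lambda ref: entities[ref.group(1)], line))
    return "\n".join(output)


BASE = ('<!ENTITY % Address.extra "">\n'
        '<!ELEMENT Address (city, street %Address.extra;)>')

# The derived DTD puts its override in front of the base declarations.
DERIVED = '<!ENTITY % Address.extra ", zip">\n' + BASE

print(expand_parameter_entities(BASE))
print(expand_parameter_entities(DERIVED))
```

Placing the override after the base declarations would leave the content model unchanged, which is why the order of declarations matters for this kind of emulated inheritance.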

  38. OO Concepts in DTD, Schema (6/8) • Inheritance in XML Schema • XML Schema supports inheritance natively • Polymorphism is available with the ‘substitution group’ feature • Extension and restriction options are available <xs:complexType name="USA_addressType"> <xs:complexContent> <xs:extension base="addressType"> <xs:sequence> <xs:element name="zip" type="xs:int" /> </xs:sequence> </xs:extension> </xs:complexContent> </xs:complexType>
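
Derivation by extension can be sketched by walking the `base` attribute and concatenating the inherited and locally declared elements. This is a minimal sketch, not a schema processor: the function name `effective_elements` is hypothetical, only `xs:sequence` and `xs:extension` are handled, and the base `addressType` is assumed to hold only street and city so that the extension adds zip.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = ET.fromstring("""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="addressType">
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="USA_addressType">
    <xs:complexContent>
      <xs:extension base="addressType">
        <xs:sequence>
          <xs:element name="zip" type="xs:int"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>""")


def effective_elements(schema, type_name):
    """List child element names of a type, inherited ones first."""
    type_def = schema.find(f"{XS}complexType[@name='{type_name}']")
    extension = type_def.find(f"{XS}complexContent/{XS}extension")
    if extension is None:
        inherited = []
        sequence = type_def.find(f"{XS}sequence")
    else:
        # Recurse into the base type, then append the extension's own part.
        inherited = effective_elements(schema, extension.get("base"))
        sequence = extension.find(f"{XS}sequence")
    own = [] if sequence is None else [
        el.get("name") for el in sequence.findall(f"{XS}element")]
    return inherited + own


print(effective_elements(SCHEMA, "addressType"))
print(effective_elements(SCHEMA, "USA_addressType"))
```

Unlike the parameter-entity technique, both the base type and the derived type remain available under their own names.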

  39. OO Concepts in DTD, Schema (7/8) • Object identity in DTD • DTD implements object identity using ID, IDREF • All ID values in an XML document share one document-wide index • One big unique index can perform poorly <?xml version="1.0" ?> <books> <book id="b1" authorref="a1" > <title>Database Concepts</title> </book> <book id="b2" authorref="a2" > <title>Operating Systems</title> </book> <author id="a1">Korth</author> <author id="a2">Ullman</author> </books> • [Figure: book (title) with attributes @id and @authorref; author (#PCDATA) with attribute @id]
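
The single document-wide ID index can be sketched by hand. This is a minimal sketch: `xml.etree.ElementTree` does not read DTD attribute types, so the code simply assumes that every `id` attribute is an ID and that `authorref` is an IDREF.

```python
import xml.etree.ElementTree as ET

DOC = """<books>
  <book id="b1" authorref="a1"><title>Database Concepts</title></book>
  <book id="b2" authorref="a2"><title>Operating Systems</title></book>
  <author id="a1">Korth</author>
  <author id="a2">Ullman</author>
</books>"""

root = ET.fromstring(DOC)

# One index for the whole document: books and authors share it.
id_index = {}
for element in root.iter():
    identifier = element.get("id")
    if identifier is None:
        continue
    if identifier in id_index:
        raise ValueError(f"duplicate ID {identifier!r}")
    id_index[identifier] = element

# Resolve each IDREF by looking it up in the shared index.
authors_by_title = {
    book.findtext("title"): id_index[book.get("authorref")].text
    for book in root.iter("book")
}
print(authors_by_title)
```

Because the index is shared, a book and an author could not both use the identifier "1", and nothing stops `authorref` from pointing at another book.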

  40. OO Concepts in DTD, Schema (8/8) • Object identity in XML Schema • Keys in XML Schema are modeled on keys in an RDB • One XML document can contain several keys with different scopes • A key may be built from several fields <?xml version="1.0" ?> <tables> <table1> <row id="1" field1="value1" field2="value2" /> <row id="2" field1="value1" field2="value2" /> </table1> <table2> <row id="1" field1="value1" field2="value2" /> </table2> </tables>
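
Scoped keys can be sketched as a uniqueness check that runs once per table rather than once per document. This is a minimal sketch of the idea, not of the `xs:key` machinery itself; the function name `duplicate_keys` and its `fields` parameter are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<tables>
  <table1>
    <row id="1" field1="value1" field2="value2"/>
    <row id="2" field1="value1" field2="value2"/>
  </table1>
  <table2>
    <row id="1" field1="value1" field2="value2"/>
  </table2>
</tables>"""


def duplicate_keys(scope, fields=("id",)):
    """Return key tuples that occur more than once among the scope's rows."""
    seen, duplicates = set(), []
    for row in scope.findall("row"):
        key = tuple(row.get(field) for field in fields)
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates


root = ET.fromstring(DOC)

# Scope = one table: id "1" may appear in table1 and again in table2.
per_table = {table.tag: duplicate_keys(table) for table in root}

# Document-wide view, as with DTD IDs: the same data now has a clash.
all_ids = [row.get("id") for row in root.iter("row")]

print(per_table, all_ids.count("1"))
```

Passing several attribute names in `fields` gives a composite key, which corresponds to the slide's point that one key can be built from several parts.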

  41. DTD vs XML Schema

  42. Contents • Background • Text Processing & Storage • Markup Languages • XML • DTD & Schema for XML • SOAP : an Application for XML Schema • Demonstration • Conclusion

  43. SOAP : Application of XML Schema • SOA (service-oriented architecture) • SOA has many advantages compared to CBD (component-based development) • The benefits come from XML and XML Schema • [Figure: CBD vs. SOA]

  44. SOAP: Application of XML Schema • SOAP (Simple Object Access Protocol) • XML-based messaging protocol, used here for remote procedure calls that exchange objects • UDDI (registry of Web services) • WSDL (Web Services Description Language) • Workflow among web service provider, UDDI registry, and web service consumer: 1. Build web service 2. Register web service 3. Discover web service 4. Get WSDL 5. Build proxy and client 6. Call Web service (SOAP)
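
Step 6 of the workflow, the SOAP call itself, can be sketched by building the request envelope by hand. This is a minimal sketch: the envelope namespace is the SOAP 1.1 one, while the service namespace `http://example.com/addressbook`, the operation `GetAddress`, and its parameters are hypothetical. Nothing is sent over the network.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SERVICE_NS = "http://example.com/addressbook"           # hypothetical service


def build_request(operation, parameters):
    """Wrap an operation call and its parameters in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in parameters.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


request = build_request("GetAddress", {"first": "Gildong", "last": "Hong"})
print(request)
```

In practice the proxy generated in step 5 produces such envelopes from the WSDL, so client code rarely builds them manually.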

  45. SOAP: Application of XML Schema • WSDL (Web Services Description Language) • WSDL specifies • names of methods • names and data types of parameters • data types of return values • exceptions which can be thrown • URL of the Web service • Data types are defined using XML Schema • platform-independent • machine-understandable
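
How a client reads method names, parameter types, and return types from a WSDL can be sketched against an abbreviated WSDL 1.1 fragment. This is a minimal sketch: the service, operation, and message names are hypothetical, the fragment omits the binding and service sections that a complete WSDL needs, and `describe_operations` is a made-up helper.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "{http://schemas.xmlsoap.org/wsdl/}"

WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:tns="http://example.com/addressbook"
    targetNamespace="http://example.com/addressbook">
  <message name="GetZipRequest">
    <part name="city" type="xs:string"/>
  </message>
  <message name="GetZipResponse">
    <part name="zip" type="xs:int"/>
  </message>
  <portType name="AddressBook">
    <operation name="GetZip">
      <input message="tns:GetZipRequest"/>
      <output message="tns:GetZipResponse"/>
    </operation>
  </portType>
</definitions>"""


def describe_operations(wsdl_text):
    """Map each operation to its parameter and return parts (name, type)."""
    root = ET.fromstring(wsdl_text)
    parts = {
        message.get("name"): [(part.get("name"), part.get("type"))
                              for part in message.findall(WSDL_NS + "part")]
        for message in root.findall(WSDL_NS + "message")
    }
    operations = {}
    for operation in root.iter(WSDL_NS + "operation"):
        # Message references are prefixed names such as tns:GetZipRequest.
        request = operation.find(WSDL_NS + "input").get("message").split(":")[-1]
        response = operation.find(WSDL_NS + "output").get("message").split(":")[-1]
        operations[operation.get("name")] = {
            "parameters": parts[request],
            "returns": parts[response],
        }
    return operations


print(describe_operations(WSDL))
```

The part types are XML Schema built-ins, which is what makes the description independent of the platform on either side.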

  46. SOAP: Application of XML Schema • Sample of WSDL

  47. Conclusion • Recent advances in computing require applications to process large amounts of text • XML overcomes the drawbacks of earlier object description languages • DTD was introduced with XML to describe the structure of XML documents • XML Schema extended XML with the OO paradigm

