
XML and CHAIMS Dorothea Beringer



Presentation Transcript


  1. XML and CHAIMS
     Dorothea Beringer
     The Extensible Markup Language and its Use for CHAIMS
     Main reference: W3C Recommendations for XML 1.0

  2. Element Tags and Attributes
     • Start and end element tags with PCDATA or other elements in between:
       <DES repname="lastname">Rochelle</DES>
     • Empty element tags: <H:Goal text="So what?" />
     • Attributes (name-value pairs; the value is always text):
       <INVOKE_request clientid="09870987sdf" methodname="makeHoroscope">
     For documents: tags for structure, attributes for additional information about the structure, all text is PCDATA:
       <p indent="true"><b>My text:</b> this is an explanation for my text.</p>
     For protocols: a large amount of freedom in putting information into the tag name, an attribute, or the character data:
       <CPAMprimitive>
         <type>INVOKE_request</type>
         <clientid>09870987sdf</clientid>
         <methodname>makeHoroscope</methodname>
       </CPAMprimitive>
       name in the tag instead of an attribute: <lastname>Rochelle</lastname>
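To make the attribute-style versus tag-style choice concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The helpers `name_value` and `is_well_formed` are hypothetical and not part of CHAIMS: the first reads a parameter name from a `repname` attribute when present and otherwise from the tag name; the second shows that a mismatched end tag such as `</endtype>` makes a document not well-formed.

```python
import xml.etree.ElementTree as ET

# Attribute style: the parameter name is in an attribute, the value is element content.
ATTR_STYLE = '<DES repname="lastname">Rochelle</DES>'
# Tag style: the parameter name is the element type itself.
TAG_STYLE = '<lastname>Rochelle</lastname>'


def name_value(xml_text):
    """Hypothetical helper: return (name, value) for either style above."""
    elem = ET.fromstring(xml_text)
    name = elem.get("repname", elem.tag)  # fall back to the tag name
    return name, elem.text


def is_well_formed(xml_text):
    """True if the text parses; a mismatched end tag raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(name_value(ATTR_STYLE))  # ('lastname', 'Rochelle')
print(name_value(TAG_STYLE))   # ('lastname', 'Rochelle')
print(is_well_formed("<type>INVOKE_request</endtype>"))  # False
```

Both styles carry the same information; the attribute style keeps the element types fixed (DES, DEC, ...) so that one DTD can describe any parameter.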

  3. Comments, CDATA, Characters
     • Comments: <!-- this text is a comment -->
     • CDATA sections: the enclosed text is treated as character data, not as markup (e.g. in a document about XML):
       <![CDATA[ text to be escaped ]]>
       <![CDATA[ <name>Michelle</name> ]]>
     • Characters:
       • only characters from the chosen character set are allowed
       • use character references for uncommon characters
       • predefined entity references: &amp; for &, &lt; for <, &gt; for >, &quot; for ", &apos; for '
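The difference between a CDATA section and escaped text can be seen with Python's standard library, in a short sketch: `ElementTree` returns the CDATA content as plain text (and drops the comment by default), while `xml.sax.saxutils.escape` produces the predefined entity references needed when the same text is written as ordinary content. The sample strings are illustrative only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Inside a CDATA section, markup is plain character data; the comment is dropped.
doc = ET.fromstring(
    "<example><!-- this text is a comment --><![CDATA[ <name>Michelle</name> ]]></example>"
)
print(repr(doc.text))  # ' <name>Michelle</name> '
print(len(doc))        # 0: no child elements were created

# Outside CDATA the same text must use the predefined entities.
print(escape("<name>Michelle</name>"))           # &lt;name&gt;Michelle&lt;/name&gt;
print(escape('Tom & "Jerry"', {'"': "&quot;"}))  # Tom &amp; &quot;Jerry&quot;
```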

  4. Prolog
     Prolog:
     • XML declaration: <?xml version="1.0" encoding="UTF-8" ?>
     • ISO 10646 UTF-8: default encoding
     • other processing instructions: <? …… ?>
     • document type declaration: <!DOCTYPE CCIS:Message SYSTEM "CCISMessage.dtd" [ ….. ]> or <!DOCTYPE mydoc>
     • internal DTD as part of the document type declaration
     Additional DTDs:
     • external DTD in the document type declaration
     • use a parameter entity definition as part of the internal DTD:
       <!ENTITY % HoroscopeDTD SYSTEM "http://www.horoscopecomp.com/DTDs/magichoroscope.dtd" >
       %HoroscopeDTD;
     • the internal DTD is read first, so its entity and attribute-list declarations override those of the external DTD
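The override rule rests on XML 1.0: when an entity is declared more than once, the first declaration is binding, and the internal subset counts as occurring before the external one. The minimal sketch below illustrates that first-declaration-wins rule with Python's `xml.dom.minidom`, using two declarations inside one internal subset (an assumption made so the example needs no external DTD file); the document and entity names are illustrative only.

```python
from xml.dom import minidom

# No leading whitespace: the XML declaration must be the very first thing.
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE mydoc [
  <!ENTITY greeting "declared first">
  <!ENTITY greeting "declared second">
]>
<mydoc>&greeting;</mydoc>"""

doc = minidom.parseString(DOC.encode("utf-8"))
# Join all text children in case the parser delivers the text in pieces.
text = "".join(node.data for node in doc.documentElement.childNodes)
print(doc.doctype.name)  # mydoc
print(text)              # declared first
```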

  5. Example of an XML Document (1)
     <?xml version="1.0" encoding="UTF-8" ?>
     <!DOCTYPE CCIS:Message SYSTEM "CCISMessage.dtd" [
       <!ENTITY % HoroscopeDTD PUBLIC "-//HoroscopeCompany//TEXT Standard MagicHoroscope//EN"
                  "http://www.horoscopecomp.com/DTDs/magichoroscope.dtd" >
       %HoroscopeDTD;
     ]>
     <CCIS:Message version="0.1" xmlns="CCIS" xmlns:H="MagicHoroscope">
       <INVOKE_request clientid="09870987sdf" methodname="makeHoroscope">
         <Parameters>
           <DEC repname="persdat" type="list" fullname="Personal Data for Horoscope">
             <DEC repname="name" fullname="All names of person">
               <DES repname="firstname" type="string">Michelle</DES>
               <DES repname="middlename">Andr&eacute;e</DES>
               <DES repname="lastname">Rochelle</DES>
               <DES repname="addname" fullname="additional name">Judit</DES>
               <DES repname="addname" fullname="additional name">Monique</DES>
               <DES repname="addname" fullname="additional name">Claire</DES>
             </DEC>

  6. Example of an XML Document (2)
     continued from previous slide:
             <DES repname="birthdate" fullname="Date of Birth" type="UTCdatetime"
                  description="exact date and time of birth">1974-03-23T22:10:35Z</DES>
           </DEC>
           <DES repname="fav" fullname="Favorite Sentence">Hello, Hello!!</DES>
           <DEO repname="magic_addr" type="unknown" fullname="Magic Address"
                description="magic address as returned by magic address makers">
             <H:Address>
               <H:Homepage href="http://www-db.stanford.edu/~meier"/>
               <H:Picture href="ftp://ftp.pictures.com/random-picture"/>
               <H:Name>Adonso Alerta</H:Name>
               <H:MoreInfo>href="http://www-db.stanford.edu/~meier"</H:MoreInfo>
               <H:Goal text="So what?" />
             </H:Address>
           </DEO>
         </Parameters>
       </INVOKE_request>
     </CCIS:Message>
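A receiver of this message has to walk the nested DEC/DES structure. The following Python sketch converts a simplified copy of the example into nested dictionaries, with repeated names such as addname collected into lists. It assumes the namespace declarations, the prefixes and the opaque DEO element are omitted, so the snippet parses stand-alone; `to_python` is a hypothetical helper, not part of CHAIMS.

```python
import xml.etree.ElementTree as ET

# Simplified excerpt of the INVOKE_request above (namespaces and DEO omitted).
SAMPLE = """
<INVOKE_request clientid="09870987sdf" methodname="makeHoroscope">
  <Parameters>
    <DEC repname="persdat" type="list">
      <DEC repname="name">
        <DES repname="firstname" type="string">Michelle</DES>
        <DES repname="lastname">Rochelle</DES>
        <DES repname="addname">Judit</DES>
        <DES repname="addname">Monique</DES>
      </DEC>
      <DES repname="birthdate" type="UTCdatetime">1974-03-23T22:10:35Z</DES>
    </DEC>
  </Parameters>
</INVOKE_request>
"""


def to_python(elem):
    """Convert a DES to its text and a DEC to a dict; repeated names become lists."""
    if elem.tag == "DES":
        return elem.text
    result = {}
    for child in elem:
        name, value = child.get("repname"), to_python(child)
        if name not in result:
            result[name] = value
        elif isinstance(result[name], list):
            result[name].append(value)
        else:
            result[name] = [result[name], value]
    return result


request = ET.fromstring(SAMPLE.strip())
params = {p.get("repname"): to_python(p) for p in request.find("Parameters")}
print(request.get("methodname"))             # makeHoroscope
print(params["persdat"]["name"]["addname"])  # ['Judit', 'Monique']
```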

  7. The DTD
     DTDs restrict the structure of a document and provide default values:
     • Well-formed document: adheres to the XML specification
     • Valid document: also adheres to its DTD ==> validating parsers
     • Element type declarations specify the element content model:
       <!ELEMENT EXTRACT_response (Parameters, Accuracies?, Error*) >
       <!ELEMENT Parameters (DEC | DES | DEO)* >
       <!ELEMENT DES (#PCDATA) >
       <!ELEMENT DEO ANY >
       <!ELEMENT MagicHoroscope:Goal EMPTY >
       The content model has to be deterministic.
     • Attribute list declaration:
       <!ATTLIST CCIS:DOS
                 CCIS:repname    NMTOKEN #REQUIRED
                 CCIS:fullname   CDATA   #IMPLIED
                 CCIS:type       (string | integer | real | UTCdatetime) "string"
                 CCIS:compliancy CDATA   #FIXED "compliant to CHAIMS">
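A content model such as `(Parameters, Accuracies?, Error*)` is essentially a regular expression over the sequence of child element types. Python's standard XML parsers do not validate against a DTD, so the sketch below hand-translates that one declaration into a Python regular expression to show what a validating parser checks; `matches_model` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

# <!ELEMENT EXTRACT_response (Parameters, Accuracies?, Error*) >
# hand-translated into a regular expression over the child element names.
EXTRACT_RESPONSE = re.compile(r"Parameters (Accuracies )?(Error )*")


def matches_model(elem, model):
    """Check the sequence of child element types against a content model."""
    names = "".join(child.tag + " " for child in elem)
    return model.fullmatch(names) is not None


ok = ET.fromstring("<EXTRACT_response><Parameters/><Error/><Error/></EXTRACT_response>")
bad = ET.fromstring("<EXTRACT_response><Error/><Parameters/></EXTRACT_response>")
print(matches_model(ok, EXTRACT_RESPONSE))   # True
print(matches_model(bad, EXTRACT_RESPONSE))  # False
```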

  8. Example: CCISMessage.dtd
     <!ELEMENT CCIS:Message ( SETUP_request | SETUP_response | INVOKE_request | INVOKE_response)> <!-- not complete -->
     <!ATTLIST CCIS:Message
               version   CDATA "0.1"
               requestnr CDATA "" >
     <!ELEMENT INVOKE_request (Parameters)?>
     <!ATTLIST INVOKE_request
               clientid   NMTOKEN #REQUIRED
               methodname CDATA   #REQUIRED>
     <!ELEMENT CCIS:Parameters (CCIS:DES | CCIS:DEC | CCIS:DEO)*>
     <!ELEMENT CCIS:DEC (CCIS:DES | CCIS:DEC | CCIS:DEO)* >
     <!ATTLIST CCIS:DEC
               CCIS:repname     NMTOKEN #REQUIRED
               CCIS:type        NMTOKEN "list"
               CCIS:fullname    CDATA   #IMPLIED
               CCIS:description CDATA   #IMPLIED>
     <!ELEMENT CCIS:DES (#PCDATA) >
     …
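What a validating parser does with these ATTLIST declarations (reporting missing #REQUIRED attributes, filling in defaults such as type="list") can be sketched by hand in Python. The `ATTLISTS` table below is a hand-copied, hypothetical representation of two declarations above, with the CCIS: prefixes dropped for simplicity; it is not a DTD parser.

```python
import xml.etree.ElementTree as ET

# Hand-copied from CCISMessage.dtd (CCIS: prefixes dropped for this sketch).
# Each attribute maps to "#REQUIRED", "#IMPLIED", or its default value.
ATTLISTS = {
    "INVOKE_request": {"clientid": "#REQUIRED", "methodname": "#REQUIRED"},
    "DEC": {"repname": "#REQUIRED", "type": "list",
            "fullname": "#IMPLIED", "description": "#IMPLIED"},
}


def apply_attlist(elem):
    """Report missing #REQUIRED attributes and fill in DTD defaults."""
    errors = []
    for name, spec in ATTLISTS.get(elem.tag, {}).items():
        if name in elem.attrib:
            continue
        if spec == "#REQUIRED":
            errors.append(f"{elem.tag}: missing required attribute {name}")
        elif spec != "#IMPLIED":
            elem.set(name, spec)  # default value supplied by the DTD
    return errors


dec = ET.fromstring('<DEC repname="name"/>')
print(apply_attlist(dec), dec.get("type"))  # [] list
request = ET.fromstring('<INVOKE_request clientid="09870987sdf"/>')
print(apply_attlist(request))  # ['INVOKE_request: missing required attribute methodname']
```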

  9. IDs
     Attributes of type ID and IDREF can be used for links within the same document:
     <?xml version="1.0"?>
     <!DOCTYPE document [
       <!ELEMENT document (#PCDATA | ref | paper)*>
       <!ELEMENT ref (#PCDATA)>
       <!ATTLIST ref ref IDREF #REQUIRED>
       <!ELEMENT paper (#PCDATA)>
       <!ATTLIST paper id ID #REQUIRED>
     ]>
     <document>
       This is text of my document that describes the paper <ref ref="Wie99">[Wiederhold99]</ref> into all possible details.
       <paper id="Wie99">Gio Wiederhold; "The advantage of CLAM"; not yet published</paper>
     </document>
     (Mixed content can only be declared as (#PCDATA | …)*, so the DTD cannot require the paper elements to come last.)
     or by using XPointers:
     <document>
       This is text of my document that describes the paper <pointer href="#Wie99">[Wiederhold99]</pointer> into all possible details.
     IDs must be unique within a document.
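Without a DTD a non-validating parser does not know which attributes are of type ID or IDREF. The sketch below therefore passes the attribute names in by hand (an assumption), then checks ID uniqueness and resolves each IDREF to its target element, roughly what a validating parser plus application code would do. `resolve_refs` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

DOC = """<document>This is text of my document that describes the paper
<ref ref="Wie99">[Wiederhold99]</ref> into all possible details.
<paper id="Wie99">Gio Wiederhold; "The advantage of CLAM"; not yet published</paper>
</document>"""


def resolve_refs(root, id_attr="id", idref_attr="ref"):
    """Pair each IDREF element with its target; attribute names are given by hand."""
    targets = {}
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is None:
            continue
        if value in targets:
            raise ValueError(f"duplicate ID {value!r}")  # IDs must be unique
        targets[value] = elem
    links = []
    for elem in root.iter():
        value = elem.get(idref_attr)
        if value is None:
            continue
        if value not in targets:
            raise KeyError(f"dangling IDREF {value!r}")
        links.append((elem.text, targets[value].text))
    return links


print(resolve_refs(ET.fromstring(DOC)))
```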

  10. Entity References
     • Parameter entities, only to be used within DTDs:
       <!ENTITY % doctype "Proposal" >   in the internal DTD
       <!ELEMENT %doctype; ANY>          in the external DTD, expands to <!ELEMENT Proposal ANY>
     • General entities, to be used everywhere:
       <!ENTITY su "Stanford University" >   in the DTD
       <!ENTITY eacute "&#xE9;" >
       Palo Alto is famous for nearby &su;.   in the document body
       <DES repname="middlename">Andr&eacute;e</DES>
     • External entity references:
       <!ENTITY % HoroscopeDTD PUBLIC "-//HoroscopeCompany//TEXT Standard MagicHoroscope//EN"
                  "http://www.horoscopecomp.com/DTDs/magichoroscope.dtd" >
       <!ENTITY logo SYSTEM "ftp://ftp.epfl.ch/pub/logos/epfl.gif" NDATA gif>
     • Notation declaration:
       <!NOTATION gif SYSTEM "/u/bin/gifviewer">
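Internal general entities are expanded by the parser before the application sees the text. The minimal Python sketch below declares the two entities from the slide in an internal subset and shows `ElementTree` returning the expanded text, and that an undeclared entity is a parse error. The wrapper element `body` is illustrative only.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE body [
  <!ENTITY su "Stanford University">
  <!ENTITY eacute "&#xE9;">
]>
<body>
  <p>Palo Alto is famous for nearby &su;.</p>
  <DES repname="middlename">Andr&eacute;e</DES>
</body>"""

root = ET.fromstring(DOC)
print(root.find("p").text)    # Palo Alto is famous for nearby Stanford University.
print(root.find("DES").text)  # Andrée

# Without a declaration the same reference is an error.
try:
    ET.fromstring("<p>&su;</p>")
except ET.ParseError as err:
    print("undefined entity:", err)
```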

  11. Namespaces, Binary Data
     Defining a namespace for this and all enclosed elements:
       <Message xmlns="CCIS" xmlns:U="UTCStandard">
         <INVOKE_request clientid="09870987sdf" U:date="1999-05-03" U:time="23:01:00" methodname="makeHoroscope">
     expands to:
       <CCIS:Message>
         <CCIS:INVOKE_request clientid="09870987sdf" UTCStandard:date="1999-05-03" UTCStandard:time="23:01:00" methodname="makeHoroscope">
     (the default namespace applies to unprefixed element names, not to unprefixed attribute names)
     Binary data:
     • external reference: <BIN XML-LINK="simple" HREF="www.my.com/myfile" />
     • internal, yet encoded as BinHex or uuencode:
       <DES repname="bindata" type="uuencoded">begin 644 tmp )37D@5&amp;5X="$* end</DES>
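Both halves of this slide can be tried with Python's standard library in a short sketch. `ElementTree` reports expanded names in `{namespace}local` form, which makes visible that unprefixed attributes stay in no namespace; `binascii.b2a_uu` produces one uuencoded line (at most 45 input bytes) that can travel as the text of a DES element, with the serializer escaping any `&`, `<` or `>` it contains. The payload bytes are illustrative.

```python
import binascii
import xml.etree.ElementTree as ET

MSG = """<Message xmlns="CCIS" xmlns:U="UTCStandard">
  <INVOKE_request clientid="09870987sdf" U:date="1999-05-03"
                  U:time="23:01:00" methodname="makeHoroscope"/>
</Message>"""

root = ET.fromstring(MSG)
request = root[0]
print(root.tag)     # {CCIS}Message
print(request.tag)  # {CCIS}INVOKE_request
print(sorted(request.attrib))
# ['clientid', 'methodname', '{UTCStandard}date', '{UTCStandard}time']

# Binary data carried inline: uuencode one line into a DES element and back.
payload = b"My text\n"
des = ET.Element("DES", repname="bindata", type="uuencoded")
des.text = binascii.b2a_uu(payload).decode("ascii")
wire = ET.tostring(des, encoding="unicode")  # special characters get escaped
print(binascii.a2b_uu(ET.fromstring(wire).text) == payload)  # True
```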

  12. XLink (1)
     An XLink element links together several resources:
     • an element is a linking element if it has the attribute xml:link
     • locators identify the resources participating in the link
     • one-directional, bi-directional and multi-directional links
     Reserved attributes used for making linking elements:
     • xml:link: "simple", "extended", "locator", "group", "document"
     • href: defines a remote locator participating in the link; consists of a URI to the remote resource and/or a connector (# or |) with an XPointer to the desired fragment of the resource
     • inline: "false", "true" (default); inline means that one of the resources of the link is the local resource given by the content of the link element
     • show: "embed", "replace", "new"
     • actuate: "auto", "user"
     • behavior: additional information on how the link should behave

  13. XLink (2)
     • title: human-readable text describing the linked resource
     • role: role of the linked resource in the context of the originating resource
     • content-role: role of the local resource in the context of the remote resource
     • content-title: title of the local resource
     Naming conflicts: use attribute remapping:
       <mylink xml:link="simple" xml:attributes="role xlinkrole" xlinkrole="lrole"...
     Simplifying the XML document: specify linking attribute values in the DTD
     Different kinds of links:
       - simple links
       - extended links
       - extended link groups (using xml:link="group" and xml:link="document", attribute steps)
     XLink and XPointer (not part of XML) are still working drafts (April 1999)!

  14. XLink (3)
     Simple links:
     • just one remote resource, all in one element (xml:link="simple")
     • normally an inline link
       <Picture xml:link="simple" href="mypicture.gif" />
       <BroaderTerm xml:link="simple" role="bt" content-role="nt" href="file:/u/dict/terms.dict|=building"
                    actuate="user" show="embed" title="Broader Term">broader term</BroaderTerm>
     Extended links:
     • the link element (xml:link="extended") contains attributes for the whole link
     • one locator element (xml:link="locator") for each remote resource
     • normally an out-of-line link ==> detached from the resources that are linked together
       <dictionary xml:link="extended" inline="false" role="all synonyms">
         <word xml:link="locator" role="synonym" href="#big_id"/>
         <word xml:link="locator" role="synonym" href="#id(large_id)"/>
         <word xml:link="locator" role="synonym" href="#origin().great_id"/>
       </dictionary>
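An application processing these draft-style links has to recognize the linking elements and collect their remote resources. The Python sketch below does this for the two kinds shown above; it relies on the `xml:` prefix being predefined, so `ElementTree` reports `xml:link` under the XML namespace URI. `describe_link` is hypothetical and covers only simple and extended links, not groups.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predefined, so ElementTree reports xml:link under this URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def describe_link(elem):
    """Return (kind, inline, hrefs) for a simple or extended linking element."""
    kind = elem.get(XML_NS + "link")
    inline = elem.get("inline", "true") == "true"  # inline defaults to "true"
    if kind == "simple":
        return kind, inline, [elem.get("href")]
    if kind == "extended":
        hrefs = [c.get("href") for c in elem if c.get(XML_NS + "link") == "locator"]
        return kind, inline, hrefs
    return None, inline, []


DICTIONARY = ET.fromstring("""<dictionary xml:link="extended" inline="false" role="all synonyms">
  <word xml:link="locator" role="synonym" href="#big_id"/>
  <word xml:link="locator" role="synonym" href="#id(large_id)"/>
  <word xml:link="locator" role="synonym" href="#origin().great_id"/>
</dictionary>""")
PICTURE = ET.fromstring('<Picture xml:link="simple" href="mypicture.gif" />')

print(describe_link(DICTIONARY))
print(describe_link(PICTURE))  # ('simple', True, ['mypicture.gif'])
```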

  15. XPointer (1)
     Usage: often part of a locator; specifies fragments of an XML document:
       <mylink href="myCCISmessage.xml#root().child(1,INVOKE_request).child(1,Parameters).child(1,DEC,repname,persdat).child(1,DEC,repname,name)" >
     An XPointer starts with a node given by one of:
     • Root: root() (default); the start is the root element of the document given by the URI part of the locator, or of the local document
     • Origin: origin(); the start is the element containing the locator; no URI allowed in the locator
     • ID: id(myID), shortcut: myID; looks for an element with an ID attribute (declared as ID in the DTD) with value myID
     • HTML: html(target); looks for an element <A name="target" ….>
     From that node, the target fragment is found by:
     • child, descendant, ancestor, preceding, following, psibling, fsibling
     • plus all (all candidate elements are selected) or a number
     • optionally plus the name of the element type, or #element, #pi, #comment, #text, #cdata, #all, giving the candidate nodes
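To show how a chain of child steps walks down a CCIS message, here is a hypothetical Python evaluator for a tiny subset of the draft syntax: only root() as the start, only child steps, only numeric positions and exact attribute values. It is not an implementation of the XPointer draft; `all`, `*`, `#IMPLIED`, span() and string() are deliberately left out.

```python
import re
import xml.etree.ElementTree as ET

STEP = re.compile(r"\.child\(([^)]*)\)")


def eval_child_steps(root, xpointer):
    """Evaluate root().child(n, type[, attr, value]...)... against an element tree."""
    if not xpointer.startswith("root()"):
        raise ValueError("only root() is supported as a starting point")
    node = root
    for args in STEP.findall(xpointer):
        parts = [a.strip() for a in args.split(",")]
        position = int(parts[0])
        etype = parts[1] if len(parts) > 1 else "#element"
        pairs = list(zip(parts[2::2], parts[3::2]))  # (attribute, value) pairs
        candidates = [
            child for child in node
            if etype in ("#element", child.tag)
            and all(child.get(name) == value for name, value in pairs)
        ]
        if len(candidates) < position:
            return None  # no such fragment
        node = candidates[position - 1]
    return node


MSG = ET.fromstring(
    "<Message><INVOKE_request><Parameters><DEC repname='persdat'>"
    "<DEC repname='other'/><DEC repname='name'><DES repname='firstname'>Michelle</DES></DEC>"
    "</DEC></Parameters></INVOKE_request></Message>"
)
XPTR = ("root().child(1,INVOKE_request).child(1,Parameters)"
        ".child(1,DEC,repname,persdat).child(1,DEC,repname,name)")
target = eval_child_steps(MSG, XPTR)
print(target.get("repname"), target[0].text)  # name Michelle
```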

  16. XPointer (2)
     • optionally plus pairs of attribute name and value:
       * any attribute name or any value
       #IMPLIED no value is given for this attribute
       or an exact string or substring
     • or span(XPointer, XPointer): returns everything in between
     • or attr(attributename): returns just this attribute value
     • or root().child(1).string(5,"hello", 1, 3): selects the first three characters of the fifth occurrence of the substring hello in the first child of the root element
     [Diagram: the location terms child, descendant, ancestor, psibling, fsibling, preceding and following, and the node types pi, comment and cdata]

  17. Styles
     HTML
     • tags have structural as well as some representational semantics, e.g. <a>, <h1>
     • additional formatting with CSS1 (Cascading Style Sheets)
     XML
     • no representational semantics at all ==> all representational information is in additional documents:
       • CSS1
       • XSL (Extensible Style Sheet Language), draft, based on DSSSL, contains construction rules for elements
     • linked to the XML document by the XML processor, either by external rules (e.g. a user-defined style sheet) and/or by processing instructions in the XML document (draft!):
       <?xml-stylesheet href="mystyle.css" title="Compact" type="text/css"?>
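The xml-stylesheet processing instruction carries its information as pseudo-attributes in the PI data, so an XML processor has to parse that data itself. The Python sketch below finds such PIs in the prolog with `xml.dom.minidom` and splits the pseudo-attributes with a simple regular expression; `stylesheet_links` is hypothetical and handles only double-quoted values.

```python
import re
from xml.dom import minidom

DOC = ('<?xml version="1.0"?>'
       '<?xml-stylesheet href="mystyle.css" title="Compact" type="text/css"?>'
       '<doc/>')

# name="value" pairs inside the PI data (double quotes only, in this sketch).
PSEUDO_ATTR = re.compile(r'(\w[\w-]*)\s*=\s*"([^"]*)"')


def stylesheet_links(document):
    """Return the pseudo-attributes of every xml-stylesheet PI before the root element."""
    links = []
    for node in document.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            links.append(dict(PSEUDO_ATTR.findall(node.data)))
    return links


print(stylesheet_links(minidom.parseString(DOC)))
```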

  18. XML Styles in CHAIMS
     I/O megamodules in CHAIMS:
     • parameter in XML without a style document: rendering (e.g. into RTF or HTML) according to default rules, based on datatypes
     • parameter in XML with an additional style document: rendering according to the style rules in the style document
     Origin of style documents?
     • http link in the parameter to a style document on the site of a megamodule provider
     • …..?

  19. Query Languages for XML
     Query languages: extract certain parts of an XML document based on a filter/query
     • XQL
       • http://metalab.unc.edu/xql/
       • example: /novel//author[@gender='male' and @size='5.4']
     • XML-QL
       • http://www.w3.org/TR/NOTE-xml-ql
       • different syntax from XQL
       • more complex, allows joins…
     In CHAIMS:
     • allowing queries in extract primitives (enhancing partial extraction)
     • allowing queries in assignment primitives to variables and in input parameters of invocation primitives
     • only on structures exposed in the repository, or also on opaque structures?
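Python's `ElementTree` supports a limited XPath subset that covers simple filters of this kind. The sketch below runs an approximation of the XQL example on an illustrative document and then a partial extraction on a CHAIMS-style parameter; since `ElementTree` paths have no `and`, the second condition is applied in Python (a design choice of this sketch).

```python
import xml.etree.ElementTree as ET

NOVEL = ET.fromstring("""<novel>
  <author gender="male" size="5.4">A</author>
  <chapter><author gender="female" size="5.4">B</author></chapter>
  <chapter><author gender="male" size="6.0">C</author></chapter>
</novel>""")

# XQL: /novel//author[@gender='male' and @size='5.4']
males = NOVEL.findall(".//author[@gender='male']")
hits = [a.text for a in males if a.get("size") == "5.4"]
print(hits)  # ['A']

# Partial extraction from a parameter: all DES elements named addname.
PARAMS = ET.fromstring(
    '<Parameters><DEC repname="name"><DES repname="addname">Judit</DES>'
    '<DES repname="addname">Monique</DES></DEC></Parameters>'
)
print([d.text for d in PARAMS.findall(".//DES[@repname='addname']")])  # ['Judit', 'Monique']
```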

  20. DOM and Parsers
     DOM (Document Object Model) is a programming API for XML documents ==> closely linked to parsers!
     • DOM
       • http://www.w3.org/TR/REC-DOM-Level-1
       • http://www.w3.org/TR/WD-DOM-Level-2
       • general specification of the API
       • representation of an XML document as a tree in a programming language
     • Parsers
       • e.g. IBM XML4J Parser for Java, and others
       • http://www.software.ibm.com/xml/resources/
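Python's `xml.dom.minidom` is a lightweight implementation of the DOM interfaces, so it can illustrate the "document as a tree" view in a short sketch: an INVOKE_request is built node by node with DOM Level 1 style calls, serialized, and read back by a parser. The element and attribute values are taken from the slides; the rest is illustrative.

```python
from xml.dom import minidom

impl = minidom.getDOMImplementation()
doc = impl.createDocument(None, "INVOKE_request", None)

# Build the tree with DOM calls.
request = doc.documentElement
request.setAttribute("clientid", "09870987sdf")
request.setAttribute("methodname", "makeHoroscope")
parameters = doc.createElement("Parameters")
des = doc.createElement("DES")
des.setAttribute("repname", "firstname")
des.appendChild(doc.createTextNode("Michelle"))
parameters.appendChild(des)
request.appendChild(parameters)

xml_text = doc.toxml()  # serialize, including an XML declaration
print(xml_text)

# Parse it back and navigate with the same API.
parsed = minidom.parseString(xml_text)
first = parsed.getElementsByTagName("DES")[0]
print(first.getAttribute("repname"), first.firstChild.data)  # firstname Michelle
```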

  21. XML and Other Standards
     • ASN.1: binary; semantics defined per application
     • TeX: readable program; semantics for typesetting; directives for sections, pages, etc.
     • PostScript: programming language; semantics for typesetting; directives for sections, pages, etc.; page-oriented
     • RTF: unreadable text; semantics for presentation and structure
     • HTML: readable text; presentational and declarative; semantics and limited presentation predefined
     • SGML: readable text; declarative; semantics and DTD defined per application; richer than XML
     • XML: readable text; declarative; extensible; style described externally in CSS and XSL; semantics and DTD defined per application
     • RDF: XML text; declarative; metadata schema

  22. Limitations of XML
     [Diagram: formats arranged by text vs. binary and specific vs. general, with MIF, dump, XML and ASN.1 as examples]
     XML is just a syntax
     • it can be used for many different things in many different ways…
     • compare it to an alphabet: though I can read any language that uses the Latin alphabet, I do not understand most of them
     XML is a general syntax for marked-up text / information
     • specific text formats are still useful
     • binary formats are still useful
     XML has no type system
     • specify types of PCDATA with a DTD and validate them with parsers?
     • work-around: see example

  23. Why Use XML for CPAM?
     XML is one common way to mark up text and to define the structure of the markup, so why not use XML instead of defining our own?
     Advantages of readable text with declarative markup:
     • human readable (in contrast to ASN.1) ==> monitoring of CPAM is straightforward
     • text-based (no marshalling problems)
     • extensible! no more problems when extending CPAM, as long as the old version is a subset of the new version
     • parsers and DTDs (which can be much more error- and extension-tolerant) instead of method signatures: message paradigm!
     • combinable with XSL and XQL
     Advantages of markup that supports attributes:
     • traditional RPCs: type, name and other attributes are defined in separate documents, e.g. IDL files, header files
     • CHAIMS: all data elements carry their type, name and even descriptive name information with them ==> use attributes in XML for this

  24. Why Use XML for the CHAIMS Repository?
     • The repository is already plain text with markup
     • Precise and explicit DTD as part of the repository
     • Using (other) off-the-shelf parsers
     • Combining the repository with a style sheet for representation (does that make sense? the repository wizard is more helpful)
     • Yet the text will get longer
     Will it really make a difference?

  25. CPAM in XML
     Part of the message or all of it in XML?
     1 The whole message is one XML file
     2 Only input and result parameters for methods are in XML
       2a One XML file for each parameter
       2b All parameters are in one file
     Information in element types, element values, or attributes?
     • which primitive:
       <EXTRACTprimitive> …. </EXTRACTprimitive>
       <primitive><primitivetype>EXTRACT</primitivetype>....</primitive>
       <primitive primitivetype="EXTRACT">….</primitive>
     • parameter names and types
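The three encodings of the primitive carry the same information, which a reader can recover from any of them. The Python sketch below is a hypothetical reader (`primitive_type` is not part of CHAIMS) that accepts all three forms from the slide, with the elided content written as plain dots.

```python
import xml.etree.ElementTree as ET

FORMS = [
    "<EXTRACTprimitive>....</EXTRACTprimitive>",
    "<primitive><primitivetype>EXTRACT</primitivetype>....</primitive>",
    '<primitive primitivetype="EXTRACT">....</primitive>',
]


def primitive_type(elem):
    """Hypothetical reader accepting the three encodings from the slide."""
    if elem.tag != "primitive" and elem.tag.endswith("primitive"):
        return elem.tag[: -len("primitive")]   # information in the element type
    if "primitivetype" in elem.attrib:
        return elem.get("primitivetype")       # information in an attribute
    child = elem.find("primitivetype")         # information in an element value
    return child.text if child is not None else None


print([primitive_type(ET.fromstring(f)) for f in FORMS])  # ['EXTRACT', 'EXTRACT', 'EXTRACT']
```

The trade-off shows up in the DTD: the element-type form lets each primitive get its own content model, while the attribute form can restrict the allowed primitives with an enumerated attribute type.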

  26. CPAM in XML: Parameters (1)
     <CCIS:Message version="0.1" xmlns="CCIS" xmlns:H="MagicHoroscope">
       <INVOKE_request clientid="09870987sdf" methodname="makeHoroscope">
         <Parameter repname="persdat" type="list" fullname="Personal Data">
           <DE repname="name" type="list" fullname="All names of person">
             <DE repname="firstname" type="simple" datatype="string">Michelle</DE>
             <DE repname="lastname" type="simple" datatype="string">Smith</DE>
           </DE>
           <DE repname="address" type="opaqueXML" fullname="Address of Person">
             <Street>234234 El Camino Real</Street>
             <City>Palo Alto, CA 94305</City>
             <Other what="picture" xml:link="simple" href="http://www.a.b/pict" />
           </DE>
         </Parameter>
         <Parameter>
           <DE repname="sdt" type="dateTime.iso8601tz" name="Date and Time of Submission">19990528T08:24:45+08</DE>
         </Parameter>

  27. CPAM in XML: Parameters (2)
         <Parameter>
           <DE repname="grade" type="number">3.24</DE>
         </Parameter>
       </INVOKE_request>
     </CCIS:Message>
     • datatype: according to XML-Data (or a subset of it)
     • type: simple, list, opaque, opaqueXML, link (to a file that contains one or more parameters in the above syntax)
     • each top-level parameter has a <Parameter> markup
     • <DE> according to the repository; the data structure can be exposed down to a certain level
     • at any level the opaque part of the parameter can start, if there is an opaque structure at all
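Generating this parameter syntax from native values in a wrapper could look like the following Python sketch. It covers only the simple and list types; the mapping of Python `str` to datatype "string" and of numbers to "number" is an assumption, since the slides only say that datatypes follow XML-Data. `to_de` and `to_parameter` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET


def to_de(repname, value):
    """Hypothetical encoder: dict -> type="list"; str/int/float -> type="simple"."""
    de = ET.Element("DE", repname=repname)
    if isinstance(value, dict):
        de.set("type", "list")
        for name, item in value.items():
            de.append(to_de(name, item))
    elif isinstance(value, (str, int, float)) and not isinstance(value, bool):
        de.set("type", "simple")
        # Assumed datatype names; the slides only say "according to XML-Data".
        de.set("datatype", "string" if isinstance(value, str) else "number")
        de.text = str(value)
    else:
        raise TypeError(f"no DE mapping for {type(value).__name__}")
    return de


def to_parameter(repname, value):
    """Wrap one top-level parameter in its <Parameter> markup."""
    parameter = ET.Element("Parameter")
    parameter.append(to_de(repname, value))
    return parameter


persdat = to_parameter("persdat", {"name": {"firstname": "Michelle", "lastname": "Smith"}})
print(ET.tostring(persdat, encoding="unicode"))
print(ET.tostring(to_parameter("grade", 3.24), encoding="unicode"))
```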

  28. Parameter Table in Wrapper
     Each parameter is either:
     • native
     • XML in string form
     • XML in DOM
     For long data:
     • empty data value plus an XLink into a file; when sending data, the XLink has to be replaced by the actual data???
     Direct dataflow:
     • Keeping the link in the XML message? Referencing a unique URL? Additional expiration flag? Until then no expiration allowed, and another megamodule can get the data?
     • Adding to the extract request: "links allowed"
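One way a wrapper could hold a parameter in any of the three forms is sketched below in Python: an entry keeps whichever forms it has and converts between the string and DOM forms lazily, only when a caller asks. `ParameterEntry` is hypothetical; conversion from the native form is left out because it needs a datatype-specific encoder.

```python
from dataclasses import dataclass
from typing import Any, Optional
from xml.dom import minidom


@dataclass
class ParameterEntry:
    """Hypothetical wrapper-table entry holding one parameter in up to three forms."""
    native: Any = None
    xml_text: Optional[str] = None
    dom: Optional[minidom.Document] = None

    def as_dom(self):
        # Parse the string form lazily, only when DOM access is needed.
        if self.dom is None:
            if self.xml_text is None:
                raise ValueError("no XML form available")
            self.dom = minidom.parseString(self.xml_text)
        return self.dom

    def as_xml_text(self):
        # Serialize the DOM form lazily (root element only, no XML declaration).
        if self.xml_text is None:
            if self.dom is None:
                raise ValueError("no XML form available")
            self.xml_text = self.dom.documentElement.toxml()
        return self.xml_text


entry = ParameterEntry(xml_text='<DE repname="grade" type="simple">3.24</DE>')
print(entry.as_dom().documentElement.getAttribute("repname"))  # grade
```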

  29. Incremental Extract...
     Presetting of parameters:
     • only for highest-level parameters
     Extract/Examine:
     • only highest-level parameters?: easy
     • or also individual DEs?: tricky, maybe using XPointers or XQL
