
XML for Information Management

XML for Information Management. 26.4.-30.4.2010. University of Erlangen-Nuremberg, Computational Linguistics. Instructor: Professor Airi Salminen, http://users.jyu.fi/~airi/. Outline: 3. Entity types; 4. Entity declarations and references; 5. XML processor treatment of entity references.


Presentation Transcript


  1. XML for Information Management
     26.4.-30.4.2010, University of Erlangen-Nuremberg, Computational Linguistics
     Instructor: Professor Airi Salminen, http://users.jyu.fi/~airi/

  2. Outline
     3. Entity types
     4. Entity declarations and references
     5. XML processor treatment of entity references
     6. Motivations for the use of entities
     7. XML family of languages

  3. 3. Entity types
     The physical structure of an XML document consists of entities. An entity is a unit recognized by the XML processor; its content is text or some other kind of data.

  4. 3. Entity types
     Entities are categorized along three dimensions:
     • parsed entities vs. unparsed entities
     • internal entities vs. external entities
     • general entities vs. parameter entities

  5. 3. Entity types
     • parsed entity: intended to be parsed by the XML processor; its content consists of marked-up text
     • unparsed entity: not intended to be parsed by the XML processor; its content can be any kind of data

  6. 3. Entity types
     • internal entity: name and value are given in an entity declaration; always a parsed entity
     • external entity: any entity that is not internal; parsed or unparsed

  7. 3. Entity types
     • general entity: used in elements and attributes; parsed or unparsed; internal or external
     • parameter entity: used in the document type definition; always parsed; internal or external

  8. 3. Entity types
     Alternatives

  9. 3. Entity types
     [Diagram: INPUT FILES → XML processor → application]
     INPUT FILES for XML processing:
     • root entity, external subset of DTD
     • other files intended for XML processing
     UNPARSED ENTITIES:
     • files not intended for XML processing but referred to by entity references in the INPUT FILES
     INTERNAL ENTITIES:
     • name and textual content given in the DTD
     The XML processor gives the application information about:
     • elements and attributes
     • comments
     • processing instructions
     • character data
     • namespaces
     • notations and locations of unparsed entities

  10. 4. Entity declarations and references
      EntityDecl ::= GEDecl | PEDecl
      GEDecl     ::= '<!ENTITY' S Name S EntityDef S? '>'
      PEDecl     ::= '<!ENTITY' S '%' S Name S PEDef S? '>'
      EntityDef  ::= EntityValue | (ExternalID NDataDecl?)
      PEDef      ::= EntityValue | ExternalID
      EntityValue is the entity definition for an internal entity; ExternalID (with an optional NDataDecl for general entities) is the entity definition for an external entity.

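  11. 4. Entity declarations and references
      Internal entity: the name and the value (= literal value) are given in the declaration.
      <!ENTITY % Shape "(rect | circle | poly | default )">
      <!ENTITY JY "Jyväskylän yliopisto">
      In each declaration the name (Shape, JY) comes first and the quoted string is the literal value.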

  12. 4. Entity declarations and references
      External entity: the name and a system identifier (possibly together with a public identifier) are given; for an unparsed entity, a notation is given as well.
      <!ENTITY % HTMLsymbol PUBLIC "-//W3C//ENTITIES Symbols for XHTML//EN" "xhtml-symbol.ent">
      <!ENTITY % HTMLspecial PUBLIC "-//W3C//ENTITIES Special for XHTML//EN" "xhtml-special.ent">
      Declarations from the XHTML specification: http://www.w3.org/TR/2002/REC-xhtml1-20020801/dtds.html
      <!ENTITY virtuaaliyliopistouutiset SYSTEM "http://virtuaaliyliopisto.jyu.fi/kotisivut/sisalto/etusivu/newsfeed.xml">

  13. 4. Entity declarations and references
      Unparsed entity:
      <!ENTITY image1 SYSTEM "../images/birdnest.gif" NDATA gif>
      Here gif is the notation name. The notation must have been declared, for example:
      <!NOTATION gif PUBLIC "-//ISBN 0-7923-9432-1::Graphic Notation//NOTATION CompuServe Graphic Interchange Format//EN">
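
The three dimensions from slides 4 to 7 can be read directly off a declaration. The following is a minimal sketch, assuming Python's standard `xml.parsers.expat` module (a non-validating processor); the root element name `poem`, the entity name `news`, the `image/gif` notation identifier and the function name `classify_entities` are placeholders chosen for this illustration, not part of the lecture material.

```python
import xml.parsers.expat

# Declarations adapted from slides 11-13. The root element "poem", the
# entity "news" and the notation identifier are assumptions for this sketch.
DOCUMENT = """<!DOCTYPE poem [
  <!ENTITY % Shape "(rect | circle | poly | default)">
  <!ENTITY JY "Jyväskylän yliopisto">
  <!ENTITY % HTMLsymbol PUBLIC "-//W3C//ENTITIES Symbols for XHTML//EN" "xhtml-symbol.ent">
  <!ENTITY news SYSTEM "newsfeed.xml">
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY image1 SYSTEM "../images/birdnest.gif" NDATA gif>
]>
<poem/>
"""


def classify_entities(document):
    """Map each declared entity name to its position on the three dimensions."""
    kinds = {}

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation_name):
        kinds[name] = (
            "parameter" if is_parameter else "general",
            # Only internal entities carry a value in the declaration itself.
            "internal" if value is not None else "external",
            # Only unparsed entities name a notation (NDATA).
            "unparsed" if notation_name is not None else "parsed",
        )

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(document, True)
    return kinds


for name, kind in classify_entities(DOCUMENT).items():
    print(name, *kind)
```

The external entities are only declared here, never referenced, so the sketch does not need the files they point to.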

  14. 4. Entity declarations and references
      References to parameter entities: %Shape; %HTMLsymbol;
      References to parsed general entities: &JY; &virtuaaliyliopistouutiset;
      Reference to an unparsed general entity: <poem image="image1">
      The type of the attribute has to be ENTITY or ENTITIES.
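
A reference to a parsed general entity such as &JY; can be tried out with Python's standard `xml.etree.ElementTree`, which expands internal entities declared in the internal DTD subset. This is only a small sketch; the element name `note` and the attribute name `org` are made up for the example.

```python
import xml.etree.ElementTree as ET

# The JY declaration is from slide 11; "note" and "org" are hypothetical names.
DOCUMENT = """<!DOCTYPE note [
  <!ENTITY JY "Jyväskylän yliopisto">
]>
<note org="&JY;">University: &JY;</note>
"""

root = ET.fromstring(DOCUMENT)

# The application never sees the reference, only the replacement text,
# both in element content and in the attribute value.
print(root.text)
print(root.get("org"))
```

A reference to an entity that has not been declared would make `ET.fromstring` raise `ET.ParseError` instead.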

  15. 4. Entity declarations and references
      In addition to entity references, XML documents may contain character references. A character reference:
      • refers to a specific Unicode character
      • gives the character’s Unicode code point in decimal or hexadecimal form
      Example: &#34;
      A one-character entity defined with a character reference: <!ENTITY quot "&#34;">
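
The decimal and hexadecimal forms of a character reference denote the same character, and the predefined entity quot resolves to it as well. A short sketch with Python's standard `xml.etree.ElementTree` (the element name `sample` is arbitrary):

```python
import xml.etree.ElementTree as ET

# Decimal reference, hexadecimal reference and the predefined entity quot.
root = ET.fromstring("<sample>&#34; &#x22; &quot;</sample>")

chars = root.text.split()
print(chars)
print([ord(c) for c in chars])
```

All three items in `chars` are the quotation mark, code point 34 (hexadecimal 22).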

  16. 4. Entity declarations and references
      Where can an entity or character reference occur?
      [Table with the columns "reference to" and "can occur in"; the rows are not preserved in the transcript]

  17. 5. XML processor treatment of entity references
      References to unparsed entities: a validating processor makes the identifiers for the entities and associated notations available to the application.
      <poem image="figure1">
        <!-- From a poem of Aale Tynni -->
        <line>Seisoin ikkunassa ja nauroin. Ihana puu.</line>
        <line>Ihana pesä.</line>
      </poem>
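
How an application might receive these identifiers can be sketched with Python's standard `xml.parsers.expat`. Expat is a non-validating processor, but it does report notation, entity and attribute-list declarations through handlers. The sketch reuses the image1 and gif declarations from slide 13; the ATTLIST declaration, the function name and the result layout are assumptions made for the example.

```python
import xml.parsers.expat

DOCUMENT = """<!DOCTYPE poem [
  <!NOTATION gif PUBLIC "-//ISBN 0-7923-9432-1::Graphic Notation//NOTATION CompuServe Graphic Interchange Format//EN">
  <!ENTITY image1 SYSTEM "../images/birdnest.gif" NDATA gif>
  <!ATTLIST poem image ENTITY #IMPLIED>
]>
<poem image="image1">
  <line>Ihana puu.</line>
  <line>Ihana pesä.</line>
</poem>
"""


def unparsed_entity_references(document):
    """Resolve ENTITY-typed attribute values to system identifier and notation."""
    unparsed = {}         # entity name -> (system identifier, notation name)
    notations = {}        # notation name -> public identifier
    entity_attrs = set()  # (element, attribute) pairs declared with type ENTITY
    found = []

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation_name):
        if notation_name is not None:
            unparsed[name] = (system_id, notation_name)

    def on_notation_decl(name, base, system_id, public_id):
        notations[name] = public_id

    def on_attlist_decl(element, attribute, attr_type, default, required):
        if attr_type == "ENTITY":
            entity_attrs.add((element, attribute))

    def on_start_element(element, attributes):
        for attribute, value in attributes.items():
            if (element, attribute) in entity_attrs and value in unparsed:
                system_id, notation_name = unparsed[value]
                found.append({
                    "entity": value,
                    "system_id": system_id,
                    "notation": notation_name,
                    "notation_public_id": notations.get(notation_name),
                })

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.NotationDeclHandler = on_notation_decl
    parser.AttlistDeclHandler = on_attlist_decl
    parser.StartElementHandler = on_start_element
    parser.Parse(document, True)
    return found


for reference in unparsed_entity_references(DOCUMENT):
    print(reference)
```

The processor never opens birdnest.gif; what to do with the system identifier and the notation is left to the application.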

  18. 5. XML processor treatment of entity references
      References to parsed entities involve two kinds of entity values:
      • literal value: the character string written between quotes in the entity definition
      • replacement text: derived from the literal value by replacing character references with the characters they denote and parameter entity references with their replacement texts
      The XML processor replaces the entity reference with the entity's replacement text.

  19. 5. XML processor treatment of entity references
      Entity declaration:
      <!ENTITY rhyme1 "<rhyme xml:lang="fi">
        <line>Ole aina iloinen</line>
        <line>niin kuin pikku varpunen</line>
      </rhyme>">
      The XML processor cannot parse this declaration: the quotes around fi sit inside the quoted literal value, so the first of them ends the literal prematurely.

  20. 5. XML processor treatment of entity references
      Entity declaration:
      <!ENTITY rhyme1 "<rhyme>
        <line>Ole aina iloinen</line>
        <line>niin kuin pikku varpunen</line>
      </rhyme>">
      Entity reference:
      <rhymecollection> &rhyme1; </rhymecollection>
      Replacement text (= literal value):
      <rhyme>
        <line>Ole aina iloinen</line>
        <line>niin kuin pikku varpunen</line>
      </rhyme>

  21. 5. XML processor treatment of entity references
      Entity declaration with character references:
      <!ENTITY rhyme1 "<rhyme xml:lang=&#34;fi&#34;>
        <line>Ole aina iloinen</line>
        <line>niin kuin pikku varpunen</line>
      </rhyme>">
      Entity reference:
      <rhymecollection> &rhyme1; </rhymecollection>
      Literal value:
      <rhyme xml:lang=&#34;fi&#34;>
        <line>Ole aina iloinen</line>
        <line>niin kuin pikku varpunen</line>
      </rhyme>
      Replacement text:
      <rhyme xml:lang="fi">
        <line>Ole aina iloinen</line>
        <line>niin kuin pikku varpunen</line>
      </rhyme>
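
Both versions of the rhyme1 declaration can be checked with Python's standard `xml.etree.ElementTree`. The sketch below is illustrative only: the helper `try_parse` and the constant names are made up here, and the fixed document is produced by swapping the inner quotes for the character reference &#34; as on slide 21.

```python
import xml.etree.ElementTree as ET

# Namespace that the reserved prefix "xml" is bound to.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Slide 19: quotes inside the quoted literal value.
BROKEN = """<!DOCTYPE rhymecollection [
  <!ENTITY rhyme1 "<rhyme xml:lang="fi">
    <line>Ole aina iloinen</line>
    <line>niin kuin pikku varpunen</line>
  </rhyme>">
]>
<rhymecollection>&rhyme1;</rhymecollection>
"""

# Slide 21: the inner quotes written as character references.
FIXED = BROKEN.replace('xml:lang="fi"', "xml:lang=&#34;fi&#34;")


def try_parse(document):
    """Return the root element, or None if the document is not well-formed."""
    try:
        return ET.fromstring(document)
    except ET.ParseError:
        return None


print(try_parse(BROKEN))  # None: the literal value ends at the first inner quote

root = try_parse(FIXED)
rhyme = root.find("rhyme")
print(rhyme.get(XML_NS + "lang"))
print([line.text for line in rhyme.findall("line")])
```

In the fixed version the character references become real quotation marks in the replacement text, so the included markup carries a normal xml:lang attribute.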

  22. 5. XML processor treatment of entity references
      <!ENTITY % StyleSheet "CDATA"> <!-- style sheet data -->
      <!ENTITY % Text "CDATA"> <!-- used for titles etc. -->
      <!ENTITY % coreattrs
        "id ID #IMPLIED
         class CDATA #IMPLIED
         style %StyleSheet; #IMPLIED
         title %Text; #IMPLIED">
      Declarations from the XHTML specification: http://www.w3.org/TR/2002/REC-xhtml1-20020801/dtds.html
      Literal value of coreattrs: id ID #IMPLIED class CDATA #IMPLIED style %StyleSheet; #IMPLIED title %Text; #IMPLIED
      Replacement text of coreattrs: id ID #IMPLIED class CDATA #IMPLIED style CDATA #IMPLIED title CDATA #IMPLIED

  23. 5. XML processor treatment of entity references
      Exercise. Entity declaration from the XHTML Strict DTD:
      <!ENTITY % Block " (%block; | form | %misc; )*">
      What are (a) the literal value and (b) the replacement text of the entity Block?
      (a) Literal value: (%block; | form | %misc; )*

  24. 5. XML processor treatment of entity references
      Other entity declarations needed from the DTD:
      <!ENTITY % heading "h1| h2| h3| h4| h5| h6">
      <!ENTITY % lists "ul | ol | dl">
      <!ENTITY % blocktext "pre | hr | blockquote | address">
      <!ENTITY % block "p | %heading; | div | %lists; | %blocktext; | fieldset | table">
      <!ENTITY % misc.inline "ins | del | script">
      <!ENTITY % misc "noscript | %misc.inline;">
      Declarations from the XHTML specification: http://www.w3.org/TR/2002/REC-xhtml1-20020801/dtds.html

  25. 5. XML processor treatment of entity references
      Deriving the replacement text of Block: the references to parameter entities in the literal value (%block; | form | %misc; )* are replaced by their replacement texts.
      Literal value of block: p | %heading; | div | %lists; | %blocktext; | fieldset | table
      Replacement text of block: p | h1| h2| h3| h4| h5| h6 | div | ul | ol | dl | pre | hr | blockquote | address | fieldset | table
      Literal value of misc: noscript | %misc.inline;
      Replacement text of misc: noscript | ins | del | script
      (b) Replacement text of Block: (p | h1| h2| h3| h4| h5| h6 | div | ul | ol | dl | pre | hr | blockquote | address | fieldset | table | form | noscript | ins | del | script )*
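
The derivation in this exercise can be reproduced mechanically. The sketch below is not an XML processor: it is a plain string substitution over a table of literal values, written only to illustrate the rule from slide 18. The names `LITERALS`, `REFERENCE` and `replacement_text` are hypothetical, and general entity references (&name;) are deliberately left untouched, since they are not expanded when the replacement text is built.

```python
import re

# Literal values copied from slides 23 and 24.
LITERALS = {
    "heading": "h1| h2| h3| h4| h5| h6",
    "lists": "ul | ol | dl",
    "blocktext": "pre | hr | blockquote | address",
    "block": "p | %heading; | div | %lists; | %blocktext; | fieldset | table",
    "misc.inline": "ins | del | script",
    "misc": "noscript | %misc.inline;",
    "Block": " (%block; | form | %misc; )*",
}

# One pattern for parameter entity references and character references,
# so that each reference in the literal value is handled exactly once.
REFERENCE = re.compile(
    r"%(?P<pe>[A-Za-z_:][\w:.\-]*);"
    r"|&#x(?P<hex>[0-9A-Fa-f]+);"
    r"|&#(?P<dec>[0-9]+);"
)


def replacement_text(name, literals, active=()):
    """Derive the replacement text of a parameter entity from literal values."""
    if name in active:
        raise ValueError(f"entity {name!r} refers to itself")

    def expand(match):
        if match.group("pe"):
            # A parameter entity reference is replaced by that entity's
            # own replacement text, derived recursively.
            return replacement_text(match.group("pe"), literals, active + (name,))
        if match.group("hex"):
            return chr(int(match.group("hex"), 16))
        return chr(int(match.group("dec")))

    return REFERENCE.sub(expand, literals[name])


print(replacement_text("block", LITERALS))
print(replacement_text("misc", LITERALS))
print(replacement_text("Block", LITERALS))
```

Apart from the leading space that is part of the literal value of Block, the last line printed matches answer (b) on slide 25.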

  26. 6. Motivations for the use of entities
      The use of entities supports:
      • use of non-textual data (audio, graphics, etc.) in XML documents (such data can also be added through stylesheets)
      • modularization of documents
      • consistency
      • reuse of definitions
      • adding semantic information through informative entity names and comments attached to entity declarations

  27. 7. XML family of languages
      The specification of XML 1.0 was just the first step in the development of languages for the management of data on the Web.
      • W3C (World Wide Web Consortium) develops specifications to support the use of the Web; the specifications are publicly available at http://www.w3.org/TR/
      • Development is systematic
      • The development process is specified and published

  28. 7. XML family of languages
      Phases of the W3C development process:
      • Working Draft: represents work in progress.
      • Candidate Recommendation: has received significant review from its immediate technical community; carries an explicit call for implementation and technical feedback.
      • Proposed Recommendation: represents consensus in the development group; proposed to the Advisory Committee for review.
      • Recommendation: represents consensus within W3C; widespread implementation is encouraged.

  29. 7. XML family of languages
      XML family = XML + XML-related languages
      A. Salminen, XML Family of Languages. Overview and Classification. http://users.jyu.fi/~airi/xmlfamily.html

  30. 7. XML family of languages
      XML-related languages fall into the following categories:
      • XML accessory: intended for wide use to extend the capabilities of XML
      • XML transducer: intended for transducing some input XML data into some output form
      • XML application: intended for a particular application domain; defines constraints for XML data in that domain

  31. 7. XML family of languages
      XML Accessory
      • additional rules extending the capabilities specified in XML
      • intended for wide use
      • development primarily at W3C
      • realizes the modularization principle of W3C: keep XML itself small and as stable as possible
      • most important: XML Names, XML Schema, XPath, XLink

  32. 7. XML family of languages
      W3C Recommendations for XML Accessories:

  33. 7. XML family of languages
      XML Transducer
      • converts XML input data (a document, part of a document, a set of documents) into output
      • associated with a processing model
      • active development at W3C
      • most important: CSS, XSL, XSLT, XQuery

  34. 7. XML family of languages
      W3C Recommendations for XML Transducers:

  35. 7. XML family of languages
      XML Application
      • defines constraints for a class of XML data in a particular application domain
      • usually defined by a DTD or in some other schema language
      • development work both at W3C and outside
      • examples from W3C: SMIL, RDF, XHTML

  36. 7. XML family of languages
      XML Applications developed at W3C for:
      • Non-textual Data
      • Web Publishing
      • Metadata and Semantic Web
      • Web Communication and Services

  37. 7. XML family of languages
      W3C Recommendations for non-textual data:

  38. 7. XML family of languages
      W3C Recommendations for Web publishing:

  39. 7. XML family of languages
      W3C Recommendations for Semantic Web:

  40. 7. XML family of languages
      W3C Recommendations for Web communication and services:

  41. 7. XML family of languages
      For more information: A. Salminen, XML Family of Languages. Overview and Classification. http://users.jyu.fi/~airi/xmlfamily.html
