Lecture 14 metadata and markup

Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2003 http://www.sims.berkeley.edu/academics/courses/is202/f03/. Lecture 14: Metadata and Markup. SIMS 202: Information Organization and Retrieval. Lecture Overview. Review


Prof. Ray Larson & Prof. Marc Davis

UC Berkeley SIMS

Tuesday and Thursday 10:30 am - 12:00 pm

Fall 2003

http://www.sims.berkeley.edu/academics/courses/is202/f03/

Lecture 14: Metadata and Markup

SIMS 202:

Information Organization

and Retrieval


Lecture Overview

  • Review

    • XML and Document Engineering

  • Metadata And Markup

    • XML As A Metadata Lingua Franca

      • METS

    • SGML vs. XML DTD Construction

    • XML Schemas

    • XML For Protocols And Metadata Languages

  • Readings/Discussion



XML as a common syntax

  • XML (and SGML) provide a way of expressing the structure of documents that can be verified and validated by document processing systems

  • “Documents” can be metadata structures

    • Such as the description of a particular photograph in our Phone project

  • XML thus provides a way of representing metadata descriptions as well as the content that they describe


XML as a common syntax

  • All XML documents follow some simple rules that make them interchangeable and usable across different systems

    • All data and markup is in Unicode

    • All elements are marked by begin and end tags

    • All markup is case-sensitive

    • XML DTDs and/or Schemas define the valid structure (and sometimes content) of the documents
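
The begin/end tag and case-sensitivity rules can be observed with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the is_well_formed helper and the sample memo strings are illustrative assumptions, not part of the lecture.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Matching begin and end tags: well-formed.
good = "<memo><to>Class</to></memo>"
# Tag names differ only in case, so <Memo> is not closed by </memo>.
bad_case = "<Memo><to>Class</to></memo>"
# The end tag for <to> is missing.
bad_nesting = "<memo><to>Class</memo>"

for sample in (good, bad_case, bad_nesting):
    print(sample, "->", is_well_formed(sample))
```

Well-formedness is only the first level of checking; validity against a DTD or Schema is a separate step.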


Example – METS

  • METS – the Metadata Encoding and Transmission Standard is a new Schema intended to provide:

    • “a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium”

  • METS can be used to “wrap” complex sets of data (the actual data, with rules for encoding binary forms), the metadata describing the parts of that data, and the sequence and conditions under which the data can or should be presented or displayed
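
As a rough illustration of the "wrapper" idea, the sketch below builds an empty METS skeleton in Python. It assumes the METS namespace http://www.loc.gov/METS/ and the section names dmdSec, amdSec, fileSec and structMap from the METS schema; the ID values and the build_mets_skeleton helper are hypothetical, and the result is not validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(local_name):
    """Qualify a local name with the METS namespace."""
    return f"{{{METS_NS}}}{local_name}"


def build_mets_skeleton():
    root = ET.Element(q("mets"))
    # Descriptive and administrative metadata sections (left empty here).
    ET.SubElement(root, q("dmdSec"), ID="DMD1")
    ET.SubElement(root, q("amdSec"), ID="AMD1")
    # File inventory: the data being wrapped (hypothetical file ID).
    file_sec = ET.SubElement(root, q("fileSec"))
    file_grp = ET.SubElement(file_sec, q("fileGrp"))
    ET.SubElement(file_grp, q("file"), ID="FILE1")
    # Structural map: ties the files and metadata together for presentation.
    struct_map = ET.SubElement(root, q("structMap"))
    div = ET.SubElement(struct_map, q("div"), DMDID="DMD1")
    ET.SubElement(div, q("fptr"), FILEID="FILE1")
    return root


mets = build_mets_skeleton()
print(ET.tostring(mets, encoding="unicode"))
```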



SGML/XML Structure

  • An SGML document consists of three parts:

    • The SGML Declaration

    • The Document Type Definition (DTD)

    • The Document Instance

  • An XML document REQUIRES only the document instance, but for effective processing a DTD is very important

  • XML Schema (later) provides an alternative to DTDs for XML applications


Document Type Definitions

  • The DTD describes the structural elements and "shorthand" markup for a particular document type and defines:

    • Names of "legal" elements

    • How many times elements can appear

    • The order of elements in a document

    • Whether markup can be omitted (SGML only)

    • Contents of elements (i.e., nested structures)

    • Attributes associated with elements

    • Names of "entities"

    • Short-hand conventions for element tags (SGML only)


DTD Components

  • The major components of a DTD are:

    • Entity Declarations

    • Element Declarations

    • Attribute Declarations


Document Type Definitions

  • Entity Declarations are a "macro" definition facility for both DTD and Document instance parts

    • General Internal Entity Definitions

      <!ENTITY name "substitute string">

      referenced by &name;

    • General External Entity Definitions

      <!ENTITY name SYSTEM "file path">

      referenced by &name;

    • Parameter Entity Definitions (used only inside DTDs)

      <!ENTITY % name "substitute string">

      or

      <!ENTITY % name SYSTEM "file path">

      referenced by %name; or %name
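
A general internal entity behaves like a macro that the parser expands in the document instance. The sketch below shows this with Python's standard xml.etree.ElementTree module; the entity name school and the memo text are made-up examples.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares one general internal entity.
DOC = """<!DOCTYPE memo [
  <!ENTITY school "UC Berkeley SIMS">
]>
<memo>From &school; staff</memo>"""

memo = ET.fromstring(DOC)
# The parser replaces &school; with the substitute string.
expanded = memo.text
print(expanded)
```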


Document Type Definitions

  • SGML Element Declarations define the structural elements of a document and its associated markup

    <!ELEMENT name - - content_model or declared_content +(include_list) -(exclude_list) >

    • The two omitted tag minimization indicators after the name say whether the start-tag and the end-tag can be omitted in the markup (O) or must be present (-); they are required in SGML but can NOT be used in XML


Document Type Definitions

  • Content model provides a nested structural description of the elements that make up this element, e.g.:

    <!ELEMENT memo - - ((to & from), body, close?)>

    <!ELEMENT body - O (p)* >

    <!ELEMENT p - O (#PCDATA | q)*>

    <!ELEMENT q - - (#PCDATA)>...

    • ANY (in SGML) may be used to indicate a content model of any elements in the DTD, in any order


Document Type Definitions

  • Same content model in XML

    <?xml version="1.0"?>

    <!DOCTYPE memo [

    <!ELEMENT memo ((to | from)+, body, close?)>

    <!ELEMENT body (p)* >

    <!ELEMENT p (#PCDATA | q)* >

    <!ELEMENT q (#PCDATA)>

    …

    ]>

    • Note the XML declaration that opens the “Prolog”

    • Note that the & connector on the previous page is not legal XML, so (to & from) is loosened here to (to | from)+
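
The XML content model is looser than the SGML one, and this can be made concrete by treating each model as a regular expression over the sequence of child element names. This is only a sketch of the idea, not how a validating parser is implemented; the two regular expressions and the helper names are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# XML model ((to | from)+, body, close?) as a regex over child names.
XML_MODEL = re.compile(r"(?:(?:to|from) )+body (?:close )?")
# SGML model ((to & from), body, close?): both elements, once each, any order.
SGML_MODEL = re.compile(r"(?:to from |from to )body (?:close )?")


def child_sequence(xml_text):
    """Return the child element names of the root, each followed by a space."""
    root = ET.fromstring(xml_text)
    return "".join(child.tag + " " for child in root)


def matches(model, xml_text):
    return model.fullmatch(child_sequence(xml_text)) is not None


BOTH = "<memo><from>A</from><to>B</to><body/><close/></memo>"
TWO_TO = "<memo><to>A</to><to>B</to><body/></memo>"

print(matches(XML_MODEL, BOTH), matches(SGML_MODEL, BOTH))
print(matches(XML_MODEL, TWO_TO), matches(SGML_MODEL, TWO_TO))
```

A memo with two to elements and no from satisfies the XML model but not the SGML one, which is the price of dropping the & connector.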


Document Type Definitions

  • Declared content can be: PCDATA, CDATA, RCDATA, EMPTY

  • Inclusion and Exclusion lists can be used to indicate elements that can occur or are forbidden to occur in any sub-elements of the content model (NOT in XML), e.g.:

    <!ELEMENT memo - - ((to & from), body, close?) +(fn)>

    • Says that element fn can appear anywhere in the memo


Document Type Definitions

  • Attribute Declarations define attributes associated with (potentially) each element of a document and provide the acceptable values for those attributes


Attributes Example

  • <!ATTLIST associated_element attribute_name declared_value default_value >

  • <!ATTLIST memo status (PUBLIC | CONFIDENTIAL) PUBLIC>

    • In markup of a document: <memo status="CONFIDENTIAL">

    • Also, because of the default set, <memo> would be the same as <memo status="PUBLIC">

  • There are a variety of special defaults and data types that can be given in attribute definitions
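
The effect of an attribute default can be checked with a parser that reads the internal DTD subset. The sketch below uses Python's standard xml.parsers.expat module and assumes the XML form of the declaration, where the default value is quoted; the memo_status helper and the sample documents are illustrative only.

```python
from xml.parsers import expat

# XML form of the ATTLIST from the slide (default value quoted).
DTD = """<!DOCTYPE memo [
  <!ELEMENT memo (#PCDATA)>
  <!ATTLIST memo status (PUBLIC | CONFIDENTIAL) "PUBLIC">
]>
"""


def memo_status(document):
    """Return the status attribute the parser reports for <memo>."""
    found = {}

    def on_start(name, attrs):
        if name == "memo":
            found.update(attrs)

    parser = expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return found.get("status")


print(memo_status(DTD + "<memo>Budget</memo>"))
print(memo_status(DTD + '<memo status="CONFIDENTIAL">Budget</memo>'))
```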


Sample SGML DTD

<!doctype ELIB-TEXTS [

<!-- This is a DTD for bibliographic records extracted from the

elib/rfc1357 simple bibliographic format. -->

<!ELEMENT ELIB-TEXTS o o (ELIB-BIB*)>

<!-- We allow most elements to occur any number of times in any order -->

<!-- this is because there is little consistency in the actual usage. -->

<!ELEMENT ELIB-BIB - - (BIB-VERSION, ID, ENTRY?, DATE?, TITLE*, ORGANIZATION*,

(SERIES | TYPE | REVISION | REVISION-DATE |

AUTHOR-PERSONAL | AUTHOR-INSTITUTIONAL | AUTHOR-CONTRIBUTING-PERSONAL |

AUTHOR-CONTRIBUTING-PERSONAL | AUTHOR-CONTRIBUTING-INSTITUTIONAL | CONTACT |

AUTHOR | PROJECT | PAGES | BIOREGION | CERES-BIOREGION | TEXTSOUP | LOCATION |

ULTIMATE-CLIENT | URL |

KEYWORDS | NOTES | ABSTRACT)*, (TEXT-REF | PAGED-REF)* )>

<!-- We won't make any assumptions about content... all PCDATA -->

<!ELEMENT ID - o (#PCDATA)>

<!ELEMENT ABSTRACT - o (#PCDATA)>

<!ELEMENT AUTHOR-CONTRIBUTING-INSTITUTIONAL - o (#PCDATA)>

<!ELEMENT AUTHOR-CONTRIBUTING-PERSONAL - o (#PCDATA)>

<!ELEMENT AUTHOR-PERSONAL-CONTRIBUTING - o (#PCDATA)>

… etc…

]>


XML Version

<!DOCTYPE ELIB-TEXTS [

<!-- This is a DTD for bibliographic records extracted from the

elib/rfc1357 simple bibliographic format. -->

<!ELEMENT ELIB-TEXTS (ELIB-BIB*)>

<!-- We allow most elements to occur any number of times in any order -->

<!-- this is because there is little consistency in the actual usage. -->

<!ELEMENT ELIB-BIB (BIB-VERSION, ID, ENTRY?, DATE?, TITLE*, ORGANIZATION*,

(SERIES | TYPE | REVISION | REVISION-DATE |

AUTHOR-PERSONAL | AUTHOR-INSTITUTIONAL | AUTHOR-CONTRIBUTING-PERSONAL |

AUTHOR-CONTRIBUTING-PERSONAL | AUTHOR-CONTRIBUTING-INSTITUTIONAL | CONTACT |

AUTHOR | PROJECT | PAGES | BIOREGION | CERES-BIOREGION | TEXTSOUP | LOCATION |

ULTIMATE-CLIENT | URL |

KEYWORDS | NOTES | ABSTRACT)*, (TEXT-REF | PAGED-REF)* )>

<!-- We won't make any assumptions about content... all PCDATA -->

<!ELEMENT ID (#PCDATA)>

<!ELEMENT ABSTRACT (#PCDATA)>

<!ELEMENT AUTHOR-CONTRIBUTING-INSTITUTIONAL (#PCDATA)>

<!ELEMENT AUTHOR-CONTRIBUTING-PERSONAL (#PCDATA)>

<!ELEMENT AUTHOR-PERSONAL-CONTRIBUTING (#PCDATA)>

… etc…

]>


Document Using That DTD

<ELIB-BIB>

<BIB-VERSION>ELIB-v1.0 </BIB-VERSION>

<ID>6</ID>

<ENTRY>February 13 1995</ENTRY>

<DATE>March 1, 1993</DATE>

<TITLE>Water Conditions in California Report 2</TITLE>

<ORGANIZATION>California Department of Water Resources</ORGANIZATION>

<SERIES>120-93</SERIES>

<TYPE>bulletin</TYPE>

<AUTHOR-INSTITUTIONAL>California Department of Water Resources

</AUTHOR-INSTITUTIONAL>

<PAGES>17</PAGES>

<TEXT-REF>/elib/data/disk/disk5/documents/6/HYPEROCR/hyperocr.html

</TEXT-REF>

<PAGED-REF>/elib/data/disk/disk5/documents/6/OCR-ASCII-NOZONE

</PAGED-REF>

</ELIB-BIB>


Dublin Core

  • Review…

  • Simple metadata for describing internet resources

  • For “Document-Like Objects”

  • 15 Elements


Dublin Core Elements

  • Title

  • Creator

  • Subject

  • Description

  • Publisher

  • Other Contributors

  • Date

  • Resource Type

  • Format

  • Resource Identifier

  • Source

  • Language

  • Relation

  • Coverage

  • Rights Management


DC XML DTD Implementation

  • There have been various versions

  • This one is the one recommended (required) by the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)

  • Uses XML Namespaces

  • Available at http://dublincore.org/documents/2001/09/20/dcmes-xml/


DC Element and Attribute Definitions

<!-- The elements from DCMES 1.1 -->

<!-- The name given to the resource. -->

<!ELEMENT dc:title (#PCDATA)>

<!ATTLIST dc:title xml:lang CDATA #IMPLIED>

<!-- An entity primarily responsible for making the content of the

resource. -->

<!ELEMENT dc:creator (#PCDATA)>

<!ATTLIST dc:creator xml:lang CDATA #IMPLIED>

<!-- The topic of the content of the resource. -->

<!ELEMENT dc:subject (#PCDATA)>

<!ATTLIST dc:subject xml:lang CDATA #IMPLIED>

<!-- An account of the content of the resource. -->

<!ELEMENT dc:description (#PCDATA)>

<!ATTLIST dc:description xml:lang CDATA #IMPLIED>

<!-- The entity responsible for making the resource available. -->

<!ELEMENT dc:publisher (#PCDATA)>

<!ATTLIST dc:publisher xml:lang CDATA #IMPLIED>

<!-- An entity responsible for making contributions to the content of

the resource. -->

<!ELEMENT dc:contributor (#PCDATA)>

<!ATTLIST dc:contributor xml:lang CDATA #IMPLIED>

<!-- A date associated with an event in the life cycle of the resource. -->

<!ELEMENT dc:date (#PCDATA)>

<!ATTLIST dc:date xml:lang CDATA #IMPLIED>


DC Element Definitions (cont.)

<!-- The nature or genre of the content of the resource. -->

<!ELEMENT dc:type (#PCDATA)>

<!ATTLIST dc:type xml:lang CDATA #IMPLIED>

<!-- The physical or digital manifestation of the resource. -->

<!ELEMENT dc:format (#PCDATA)>

<!ATTLIST dc:format xml:lang CDATA #IMPLIED>

<!-- An unambiguous reference to the resource within a given context. -->

<!ELEMENT dc:identifier (#PCDATA)>

<!ATTLIST dc:identifier xml:lang CDATA #IMPLIED>

<!ATTLIST dc:identifier rdf:resource CDATA #IMPLIED>

<!-- A Reference to a resource from which the present resource is derived. -->

<!ELEMENT dc:source (#PCDATA)>

<!ATTLIST dc:source xml:lang CDATA #IMPLIED>

<!ATTLIST dc:source rdf:resource CDATA #IMPLIED>

<!-- A language of the intellectual content of the resource. -->

<!ELEMENT dc:language (#PCDATA)>

<!ATTLIST dc:language xml:lang CDATA #IMPLIED>

<!-- A reference to a related resource. -->

<!ELEMENT dc:relation (#PCDATA)>

<!ATTLIST dc:relation xml:lang CDATA #IMPLIED>

<!ATTLIST dc:relation rdf:resource CDATA #IMPLIED>

<!-- The extent or scope of the content of the resource. -->

<!ELEMENT dc:coverage (#PCDATA)>

<!ATTLIST dc:coverage xml:lang CDATA #IMPLIED>

<!-- Information about rights held in and over the resource. -->

<!ELEMENT dc:rights (#PCDATA)>

<!ATTLIST dc:rights xml:lang CDATA #IMPLIED>
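
To show how the dc: element names are used in an instance, here is a minimal sketch that re-expresses part of the ELIB-BIB record shown earlier as Dublin Core elements. The crosswalk in ELIB_TO_DC and the metadata wrapper element are assumptions made for illustration; only the dc: element names and the namespace http://purl.org/dc/elements/1.1/ come from Dublin Core itself.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical crosswalk from ELIB-BIB element names to Dublin Core elements.
ELIB_TO_DC = {
    "TITLE": "title",
    "AUTHOR-INSTITUTIONAL": "creator",
    "ORGANIZATION": "publisher",
    "DATE": "date",
    "TYPE": "type",
    "ID": "identifier",
}

ELIB_RECORD = """<ELIB-BIB>
<BIB-VERSION>ELIB-v1.0 </BIB-VERSION>
<ID>6</ID>
<DATE>March 1, 1993</DATE>
<TITLE>Water Conditions in California Report 2</TITLE>
<ORGANIZATION>California Department of Water Resources</ORGANIZATION>
<TYPE>bulletin</TYPE>
</ELIB-BIB>"""


def to_dublin_core(elib_xml):
    record = ET.fromstring(elib_xml)
    metadata = ET.Element("metadata")  # hypothetical wrapper element
    for field in record:
        dc_name = ELIB_TO_DC.get(field.tag)
        if dc_name is not None:
            dc_element = ET.SubElement(metadata, f"{{{DC_NS}}}{dc_name}")
            dc_element.text = (field.text or "").strip()
    return metadata


dc_xml = ET.tostring(to_dublin_core(ELIB_RECORD), encoding="unicode")
print(dc_xml)
```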


A More Complex SGML DTD

<!DOCTYPE USMARC [

<!-- USMARC DTD. UCB-SLIS v.0.08 -->

<!-- By Jerome P. McDonough, April 1, 1994 -->

<!ELEMENT USMARC - - (Leader, Directry, VarFlds)>

<!ATTLIST USMARC Material (BK|AM|CF|MP|MU|VM|SE) "BK"

id CDATA #IMPLIED>

<!-- Author's Note: the id attribute for the USMARC element is

intended to hold a unique record number for

each MARC record in the local database. That

is to say, it is intended ONLY as an aid in

maintaining the local database of MARC records -->

<!ELEMENT Leader - O (LRL, RecStat, RecType, BibLevel, UCP, IndCount, SFCount,

BaseAddr, EncLevel, DscCatFm, LinkRec, EntryMap)>

<!ELEMENT Directry - O (#PCDATA)>

<!ELEMENT VarFlds - O (VarCFlds, VarDFlds)>

<!-- Component parts of Leader -->

<!-- Logical Record Length -->

<!ELEMENT LRL - O (#PCDATA)>

…etc…


More Complex DTD (cont.)

<!-- Variable Data Fields -->

<!ELEMENT VarDFlds - O (NumbCode, MainEnty?, Titles, EdImprnt?, PhysDesc?,

Series?, Notes?, SubjAccs?, AddEnty?, LinkEnty?,

SAddEnty?, HoldAltG?, Fld9XX?)>

<!-- Component Parts of Variable Data Fields -->

<!-- Numbers & Codes -->

<!ELEMENT NumbCode - O (Fld010?, Fld011?, Fld015?, Fld017*, Fld018?,

Fld019*, Fld020*,

Fld022*, Fld023*, Fld024*, Fld025*, Fld027*,

Fld028*, Fld029*,

Fld030*, Fld032*, Fld033*, Fld034*, Fld035*, Fld036?,

Fld037*, Fld039*, Fld040?, Fld041?, Fld042?,

Fld043?, Fld044?,

Fld045?, Fld046?, Fld047?, Fld048*, Fld050*, Fld051*,

Fld052*, Fld055*, Fld060*, Fld061*, Fld066?,

Fld069*, Fld070*,

Fld071*, Fld072*, Fld074*, Fld080?, Fld082*,

Fld084*, Fld086*, Fld088*, Fld090*, Fld096*)>

<!-- Main Entries -->

<!ELEMENT MainEnty - O (Fld100?, Fld110?, Fld111?, Fld130?)>

<!-- Titles -->

<!ELEMENT Titles - O (Fld210?, Fld211*, Fld212*, Fld214*, Fld222*,

Fld240?, Fld242*, Fld243?, Fld245, Fld246*, Fld247*)>

<!-- Edition, Imprint, etc. -->

<!ELEMENT EdImprnt - O (Fld250?, Fld254?, Fld255*, Fld256?, Fld257?, Fld260?,

Fld261?, Fld262?, Fld263?, Fld265?)>

<!-- Physical Description, etc. -->

<!ELEMENT PhysDesc - O (Fld300*, Fld305*, Fld306?, Fld310?, Fld315?,

Fld321*, Fld340*, Fld350?, Fld351*,

Fld355*, Fld357*, Fld362*)>

…etc…


Complex DTD (cont.)

<!-- Title Statement -->

<!ELEMENT Fld245 - O (Six?, (a|b|c|f|g|h|k|n|p|s)+)>

<!ATTLIST Fld245 AddEnty (No|Yes|Blank) #IMPLIED

NFChars (0|1|2|3|4|5|6|7|8|9|Blnk) #IMPLIED>

…etc…

<!-- Subfield Element Declarations -->

<!ELEMENT a - O (#PCDATA)>

<!ELEMENT b - O (#PCDATA)>

<!ELEMENT c - O (#PCDATA)>

<!ELEMENT d - O (#PCDATA)>

<!ELEMENT e - O (#PCDATA)>


Document Markup

  • All document markup is derived from the DTD for the particular document type

  • In SGML the DTD should be referenced in the document using the DOCTYPE declaration:

    <!DOCTYPE name SYSTEM "file_path" >

    or

    <!DOCTYPE name SYSTEM "file_path" [doctype_declaration_subset]>

    or

    <!DOCTYPE name [doctype_declaration_subset]>

  • The doctype_declaration_subset can be any combination of element, entity, and attribute declarations


HTML

  • HTML was not originally "real" SGML; the DTD was invented after the language

  • It is often more concerned with the form of the output on the screen than with the structural contents of the HTML docs

  • Relies on the application (such as Netscape) to implement interesting actions like hypertext linking

  • XHTML is now a W3C “recommendation” that applies XML conventions to HTML, and provides a growing set of capabilities within an XML framework (our phones use XHTML)



What are XML Schemas?

  • An XML vocabulary for expressing your data's structure AND content types, and even the business rules involved in processing the data

  • Written in XML themselves

  • Support namespaces for combining multiple schemas in the same documents

    • The slides in this section are based on an XML tutorial by Roger L. Costello


Example

<location>

<latitude>32.904237</latitude>

<longitude>73.620290</longitude>

<uncertainty units="meters">2</uncertainty>

</location>

Is this data valid?

To be valid, it must meet these constraints (data business rules):

1. The location must be comprised of a latitude, followed

by a longitude, followed by an indication of the uncertainty

of the lat/lon measurements.

2. The latitude must be a decimal with a value between -90 and +90

3. The longitude must be a decimal with a value between -180 and +180

4. For both latitude and longitude the number of digits to the right

of the decimal point must be exactly six digits.

5. The value of uncertainty must be a non-negative integer

6. The uncertainty units must be either meters or feet.

We can express all these data constraints using XML Schemas
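
Before turning to the schema language itself, the six constraints above can be written out as ordinary code, which makes clear what a schema validator has to check. This is a hand-written sketch in Python, not an XML Schema; the function names and error messages are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# A decimal with exactly six digits to the right of the decimal point.
SIX_FRACTION_DIGITS = re.compile(r"[+-]?[0-9]+\.[0-9]{6}")

SAMPLE = """<location>
<latitude>32.904237</latitude>
<longitude>73.620290</longitude>
<uncertainty units="meters">2</uncertainty>
</location>"""


def check_coordinate(name, text, limit, errors):
    """Rules 2-4: decimal, six fraction digits, within -limit..+limit."""
    value = (text or "").strip()
    if not SIX_FRACTION_DIGITS.fullmatch(value):
        errors.append(f"{name} must be a decimal with exactly six fraction digits")
    elif abs(float(value)) > limit:
        errors.append(f"{name} must be between -{limit} and +{limit}")


def validate_location(xml_text):
    """Return a list of rule violations; an empty list means the data is ok."""
    location = ET.fromstring(xml_text)
    # Rule 1: latitude, then longitude, then uncertainty.
    if location.tag != "location" or [c.tag for c in location] != [
        "latitude", "longitude", "uncertainty"
    ]:
        return ["location must contain latitude, longitude, uncertainty in order"]
    errors = []
    latitude, longitude, uncertainty = location
    check_coordinate("latitude", latitude.text, 90, errors)
    check_coordinate("longitude", longitude.text, 180, errors)
    # Rule 5: non-negative integer.
    if not re.fullmatch(r"[0-9]+", (uncertainty.text or "").strip()):
        errors.append("uncertainty must be a non-negative integer")
    # Rule 6: units are meters or feet.
    if uncertainty.get("units") not in ("meters", "feet"):
        errors.append("uncertainty units must be meters or feet")
    return errors


print(validate_location(SAMPLE))
```

Writing the rules by hand like this is exactly the per-application effort that a declarative schema is meant to replace.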


Validating your data

  • Diagram: the <location> instance document and its XML Schema are both given to an XML Schema validator, which reports "Data is ok!" after it has checked that:

    • the latitude is between -90 and +90

    • the longitude is between -180 and +180

    • the number of fraction digits is 6 for lat and lon

    • ...


Purpose of XML Schemas

  • Specify:

    • the structure of instance documents

      • "this element contains these elements, which contains these other elements, etc"

    • the datatype of each element/attribute

      • "this element shall hold an integer with the range 0 to 12,000" (DTDs don't do too well with specifying datatypes like this)


Why Schemas?

Motivation for XML Schemas

  • People are dissatisfied with DTDs

    • It's a different syntax

      • You write your XML (instance) document using one syntax and the DTD using another syntax --> bad, inconsistent

    • Limited datatype capability

      • DTDs support a very limited capability for specifying datatypes. You can't, for example, express "I want the <elevation> element to hold an integer with a range of 0 to 12,000"

    • Desire a set of datatypes compatible with those found in databases

      • DTD supports 10 datatypes; XML Schemas supports 44+ datatypes


Highlights of XML Schemas

  • XML Schemas are a tremendous advancement over DTDs:

    • Enhanced datatypes

      • 44+ versus 10

      • Can create your own datatypes

        • Example: "This is a new type based on the string type and elements of this type must follow this pattern: ddd-dddd, where 'd' represents a digit".

    • Written in the same syntax as instance documents

      • less syntax to remember

    • Object-oriented'ish

      • Can extend or restrict a type (derive new type definitions on the basis of old ones)

    • Can express sets, i.e., can define the child elements to occur in any order
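
The "create your own datatypes" example (a string restricted to the pattern ddd-dddd) might look like the simpleType below. The sketch reads the pattern facet out of that definition and applies it with Python's re module; the type name PhoneType and the helper functions are assumptions, and a real schema validator does considerably more than this.

```python
import re
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# A user-defined type derived from xsd:string by restriction (hypothetical name).
PHONE_TYPE = r"""
<xsd:simpleType xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="PhoneType">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{3}-\d{4}"/>
  </xsd:restriction>
</xsd:simpleType>
"""


def pattern_of(simple_type_xml):
    """Return the value of the pattern facet in a simpleType definition."""
    root = ET.fromstring(simple_type_xml)
    facet = root.find(f"{{{XSD_NS}}}restriction/{{{XSD_NS}}}pattern")
    return facet.get("value")


def conforms(value, simple_type_xml):
    # XML Schema patterns must match the whole value, hence fullmatch.
    return re.fullmatch(pattern_of(simple_type_xml), value) is not None


print(conforms("555-1212", PHONE_TYPE))
print(conforms("5551212", PHONE_TYPE))
```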


Highlights of XML Schemas

  • Can specify element content as being unique (keys on content) and uniqueness within a region

  • Can define multiple elements with the same name but different content

  • Can define elements with nil content

  • Can define substitutable elements - e.g., the "Book" element is substitutable for the "Publication" element.


BookStore.dtd

<!ELEMENT BookStore (Book)+>

<!ELEMENT Book (Title, Author, Date, ISBN, Publisher)>

<!ELEMENT Title (#PCDATA)>

<!ELEMENT Author (#PCDATA)>

<!ELEMENT Date (#PCDATA)>

<!ELEMENT ISBN (#PCDATA)>

<!ELEMENT Publisher (#PCDATA)>


DTD Vocabulary

  • This is the vocabulary that DTDs provide to define your new vocabulary: ELEMENT, ATTLIST, ENTITY, #PCDATA, CDATA, ID, NMTOKEN

  • The new vocabulary defined in BookStore.dtd: BookStore, Book, Title, Author, Date, ISBN, Publisher


XML Schema Vocabulary

  • This is the vocabulary that XML Schemas provide to define your new vocabulary, in the namespace http://www.w3.org/2001/XMLSchema: schema, element, complexType, sequence, string, boolean, integer

  • The new vocabulary, in the namespace http://www.books.org (targetNamespace): BookStore, Book, Title, Author, Date, ISBN, Publisher

  • One difference between XML Schemas and DTDs is that the XML Schema vocabulary is associated with a name (namespace). Likewise, the new vocabulary that you define must be associated with a name (namespace). With DTDs neither set of vocabulary is associated with a name (namespace) [DTDs pre-dated namespaces].


<?xml version="1.0"?>

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"

targetNamespace="http://www.books.org"

xmlns="http://www.books.org"

elementFormDefault="qualified">

<xsd:element name="BookStore">

<xsd:complexType>

<xsd:sequence>

<xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/>

</xsd:sequence>

</xsd:complexType>

</xsd:element>

<xsd:element name="Book">

<xsd:complexType>

<xsd:sequence>

<xsd:element ref="Title" minOccurs="1" maxOccurs="1"/>

<xsd:element ref="Author" minOccurs="1" maxOccurs="1"/>

<xsd:element ref="Date" minOccurs="1" maxOccurs="1"/>

<xsd:element ref="ISBN" minOccurs="1" maxOccurs="1"/>

<xsd:element ref="Publisher" minOccurs="1" maxOccurs="1"/>

</xsd:sequence>

</xsd:complexType>

</xsd:element>

<xsd:element name="Title" type="xsd:string"/>

<xsd:element name="Author" type="xsd:string"/>

<xsd:element name="Date" type="xsd:string"/>

<xsd:element name="ISBN" type="xsd:string"/>

<xsd:element name="Publisher" type="xsd:string"/>

</xsd:schema>

BookStore.xsd

xsd = Xml-Schema Definition


BookStore.xsd compared with BookStore.dtd

  • BookStore.xsd (above) declares the same structure as BookStore.dtd

  • <!ELEMENT BookStore (Book)+> corresponds to the xsd:element named "BookStore", whose xsd:sequence holds ref="Book" with minOccurs="1" and maxOccurs="unbounded"

  • <!ELEMENT Book (Title, Author, Date, ISBN, Publisher)> corresponds to the xsd:element named "Book", whose xsd:sequence lists the five child elements in order

  • Each <!ELEMENT name (#PCDATA)> corresponds to an xsd:element with type="xsd:string"

BookStore.xsd: root element

  • All XML Schemas have "schema" as the root element (written xsd:schema in BookStore.xsd)


BookStore.xsd: xsd prefix

  • The elements and datatypes that are used to construct schemas (schema, element, complexType, sequence, string) come from the http://…/XMLSchema namespace, which xmlns:xsd binds to the xsd prefix


XMLSchema Namespace

http://www.w3.org/2001/XMLSchema

complexType

element

sequence

schema

boolean

string

integer


BookStore.xsd: targetNamespace

  • targetNamespace="http://www.books.org" says that the elements defined by this schema (BookStore, Book, Title, Author, Date, ISBN, Publisher) are to go in this namespace


Book Namespace (targetNamespace)

http://www.books.org (targetNamespace)

BookStore

Author

Book

Title

ISBN

Publisher

Date


BookStore.xsd: default namespace

  • xmlns="http://www.books.org" makes http://www.books.org the default namespace, which is the targetNamespace!

  • <xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/> is referencing a Book element declaration. The Book in what namespace? Since there is no namespace qualifier, it is referencing the Book element in the default namespace, which is the targetNamespace! Thus, this is a reference to the Book element declaration in this schema.
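
The way ref="Book" and type="xsd:string" are resolved against the namespace declarations on the schema element can be sketched in Python. The schema text here is an abbreviated copy of BookStore.xsd (Book is simplified to a string so the fragment stays short), and the helper names are assumptions.

```python
import io
import xml.etree.ElementTree as ET

# Abbreviated BookStore.xsd: same namespace declarations, fewer elements.
SCHEMA = b"""<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.books.org"
            xmlns="http://www.books.org"
            elementFormDefault="qualified">
  <xsd:element name="BookStore">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Book" type="xsd:string"/>
</xsd:schema>
"""


def namespace_bindings(schema_bytes):
    """Collect prefix -> namespace URI; the default namespace has prefix ''."""
    bindings = {}
    for _event, (prefix, uri) in ET.iterparse(
        io.BytesIO(schema_bytes), events=("start-ns",)
    ):
        bindings[prefix] = uri
    return bindings


def resolve_qname(qname, bindings):
    """Split 'prefix:local' (or 'local') and look the prefix up."""
    prefix, _, local = qname.rpartition(":")
    return bindings[prefix], local


bindings = namespace_bindings(SCHEMA)
print(resolve_qname("Book", bindings))
print(resolve_qname("xsd:string", bindings))
```

The unprefixed Book resolves to the default namespace, which is the targetNamespace, while xsd:string resolves to the XMLSchema namespace.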


BookStore.xsd: elementFormDefault

  • elementFormDefault="qualified" is a directive to any instance documents which conform to this schema: any elements used by the instance document which were declared in this schema must be namespace qualified


Referencing a schema in an XML instance document

<?xml version="1.0"?>

<BookStore xmlns ="http://www.books.org"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:schemaLocation="http://www.books.org

BookStore.xsd">

<Book>

<Title>My Life and Times</Title>

<Author>Paul McCartney</Author>

<Date>July, 1998</Date>

<ISBN>94303-12021-43892</ISBN>

<Publisher>McMillin Publishing</Publisher>

</Book>

...

</BookStore>

1. First, using a default namespace declaration, tell the schema-validator that all of the elements used in this instance document come from the http://www.books.org namespace.

2. Second, with schemaLocation, tell the schema-validator that the http://www.books.org namespace is defined by BookStore.xsd (i.e., schemaLocation contains a pair of values: a namespace and the location of its schema).

3. Third, tell the schema-validator that the schemaLocation attribute we are using is the one in the XMLSchema-instance namespace.
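
The "pair of values" in schemaLocation can be read programmatically. The sketch below uses Python's standard xml.etree.ElementTree. The function name schema_locations is an assumption for illustration. It splits the attribute value on whitespace and pairs each namespace with the schema document that follows it.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

INSTANCE = """<?xml version="1.0"?>
<BookStore xmlns="http://www.books.org"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.books.org
                               BookStore.xsd">
</BookStore>"""


def schema_locations(xml_text: str) -> dict:
    """Map each namespace named in xsi:schemaLocation to its schema document."""
    root = ET.fromstring(xml_text)
    # The attribute is looked up by its qualified name, not by the "xsi" prefix.
    raw = root.get("{" + XSI_NS + "}schemaLocation", "")
    tokens = raw.split()
    if len(tokens) % 2 != 0:
        raise ValueError("schemaLocation must hold namespace/location pairs")
    return dict(zip(tokens[0::2], tokens[1::2]))


print(schema_locations(INSTANCE))
# {'http://www.books.org': 'BookStore.xsd'}
```

Looking the attribute up by its full namespace rather than by the literal prefix mirrors step 3: the prefix is arbitrary, the namespace is what identifies the attribute.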


XMLSchema-instance Namespace

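The http://www.w3.org/2001/XMLSchema-instance namespace (conventionally bound to the prefix xsi) defines four attributes for use in instance documents:

  • schemaLocation

  • noNamespaceSchemaLocation

  • type

  • nil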


Referencing a schema in an XML instance document

BookStore.xsd

  • targetNamespace="http://www.books.org"

  • defines elements in namespace http://www.books.org

BookStore.xml

  • schemaLocation="http://www.books.org BookStore.xsd"

  • uses elements from namespace http://www.books.org

A schema defines a new vocabulary. Instance documents use that new vocabulary.
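
The define/use relationship can be sketched in a few lines of Python using the standard xml.etree.ElementTree module. The cut-down schema, the helper names, and the undeclared Subtitle element are all assumptions for illustration. The check only compares element names against the schema's global declarations; a real schema-validator also checks structure and data types.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# A cut-down schema that defines a two-element vocabulary.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.books.org"
            xmlns="http://www.books.org"
            elementFormDefault="qualified">
  <xsd:element name="Title" type="xsd:string"/>
  <xsd:element name="Author" type="xsd:string"/>
</xsd:schema>"""

GOOD = '<Title xmlns="http://www.books.org">My Life and Times</Title>'
# "Subtitle" is a hypothetical element that the schema does not declare.
BAD = '<Subtitle xmlns="http://www.books.org">A Memoir</Subtitle>'


def defined_names(schema_text: str) -> set:
    """Qualified names of the global elements the schema defines."""
    root = ET.fromstring(schema_text)
    target = root.get("targetNamespace")
    prefix = "{" + target + "}" if target else ""
    decls = root.findall("{" + XSD_NS + "}element")
    return {prefix + decl.get("name") for decl in decls}


def uses_only_defined_names(instance_text: str, schema_text: str) -> bool:
    """True if every element in the instance is in the schema's vocabulary."""
    used = {elem.tag for elem in ET.fromstring(instance_text).iter()}
    return used <= defined_names(schema_text)


print(uses_only_defined_names(GOOD, SCHEMA))  # True
print(uses_only_defined_names(BAD, SCHEMA))   # False
```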


Note multiple levels of checking

  • BookStore.xml is checked against BookStore.xsd: validate that the XML document conforms to the rules described in BookStore.xsd.

  • BookStore.xsd is checked against XMLSchema.xsd (the schema-for-schemas): validate that BookStore.xsd is a valid schema document, i.e., that it conforms to the rules described in the schema-for-schemas.


Default Value for minOccurs and maxOccurs

  • The default value for minOccurs is "1"

  • The default value for maxOccurs is "1"

<xsd:element ref="Title" minOccurs="1" maxOccurs="1"/>

Equivalent!

<xsd:element ref="Title"/>
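
The equivalence can be checked mechanically. In this sketch, using Python's standard xml.etree.ElementTree, a missing minOccurs or maxOccurs attribute is read as "1". The function name occurrence_bounds is an assumption for illustration, and the declarations are parsed as standalone fragments purely for demonstration.

```python
import xml.etree.ElementTree as ET

XSD = 'xmlns:xsd="http://www.w3.org/2001/XMLSchema"'

EXPLICIT = f'<xsd:element {XSD} ref="Title" minOccurs="1" maxOccurs="1"/>'
IMPLICIT = f'<xsd:element {XSD} ref="Title"/>'
UNBOUNDED = f'<xsd:element {XSD} ref="Book" minOccurs="1" maxOccurs="unbounded"/>'


def occurrence_bounds(decl_xml: str) -> tuple:
    """Return (min, max) occurrences; max is None when unbounded."""
    decl = ET.fromstring(decl_xml)
    # Both attributes default to "1" when absent.
    min_occurs = int(decl.get("minOccurs", "1"))
    raw_max = decl.get("maxOccurs", "1")
    max_occurs = None if raw_max == "unbounded" else int(raw_max)
    return min_occurs, max_occurs


print(occurrence_bounds(EXPLICIT))   # (1, 1)
print(occurrence_bounds(IMPLICIT))   # (1, 1)
print(occurrence_bounds(UNBOUNDED))  # (1, None)
```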


Much More to XMLSchema!

  • This was an overview of some basics

  • There are many other features, such as:

    • The ability to import other schemas or parts of schemas

    • Ability to specify many data types

    • Etc.

  • XMLSchema definitions are at W3C

    • http://www.w3.org/TR/xmlschema-0/ is a good place to start


Other Protocols and Metadata Systems Using XML

  • SOAP (Simple Object Access Protocol)

  • DAV/DASL (Distributed Authoring and Versioning / DAV Searching and Locating)

  • SDLIP (Simple Digital Library Interoperability Protocol)

  • RDF (Resource Description Framework)

  • ADL Gazetteer Protocol

  • OAI-PMH (already discussed)

  • MPEG-7 (more next time)

  • METS

  • Also versions of MARC and other formats in XML


SGML and XML Sources and Resources

  • Books:

    • van Herwijnen, Eric. Practical SGML. (2nd Ed.) Boston: Kluwer Academic Publishers, 1994.

    • Goldfarb, Charles F. The SGML Handbook. Oxford: Clarendon Press, 1990. (and MANY XML books)

  • Web Sites:

    • The W3C web site (all XML standards documents)

      • http://www.w3.org

    • Robin Cover’s SGML/XML Site

      • http://www.oasis-open.org/cover/sgml-xml.html


Discussion – Vam Makam

  • Kirk covers examples of DTDs for books and newspapers. Many individuals and corporations have created DTDs, both for their own use and for general purposes. What are some innovative areas where designing a DTD might be useful? For ideas that have already been tried, how could the DTDs be improved or extended?


Discussion – Vam Makam

  • Although XML DTDs emerged only recently, newer approaches such as XML Schemas have presented themselves as a better option. Given the thought and work that have gone into designing existing DTDs, at what point is it worth converting an existing DTD to an XML Schema?

  • Now that you have learned how to design a DTD and have basic knowledge of XML, what are some existing technologies that become more useful when combined with XML?


Discussion – Annie Yeh

  • Kirk addresses the advantages of using external DTDs, the reusability of public DTDs, the ability to focus on content rather than structure, easier management of multiple documents, and easier data error checking. What are some of the existing repositories in which we can store these DTDs? What are some of the ways in which we can facilitate this process? What are their pros and cons? What would be better interfaces for facilitating this?


Discussion – Annie Yeh

  • What are the differences between DTDs and Schemas, and what are the pros and cons of each?


Next Time

  • Metadata for Motion Pictures: MPEG-7

  • Readings/Discussion

    • MPEG-7 (Part 1) (J. M. Martinez, R. Koenen, F. Pereira)

    • MPEG-7 (Part 2) (J. Martinez)

