Tutorial: OAI and OAI-PMH for Beginners
An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting

Uwe Müller

Humboldt University Berlin, Germany


Andy Powell

UKOLN, University of Bath


Agenda

  • Part I

    • History and overview

  • Part II

    • Technical introduction

  • Coffee/tea break

  • Part III

    • Implementation issues – data provider and service provider

  • Part IV

    • Implementation issues – XML schema and supporting multiple record formats

2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners



Acknowledgements

  • Some of the slides presented here are our own!

  • Many of them have been kindly donated by (taken from!):

    • Herbert Van de Sompel

    • Carl Lagoze

    • Michael Nelson

    • Simeon Warner

    • (and others probably!)


Part I: History and overview

Andy Powell

UKOLN, University of Bath


OAI roots…

  • the roots of OAI lie in the development of eprint archives…

    • arXiv, CogPrints, NACA (NASA), RePEc, NDLTD, NCSTRL

  • each offered a Web interface for deposit of articles and for end-user searches

  • difficult for end-users to work across archives without having to learn multiple different interfaces

  • recognised need for a single search interface to all archives

    • Universal Pre-print Service (UPS)


Searching vs. harvesting

  • two possible approaches to building the UPS…

  • cross-searching multiple archives based on protocol like Z39.50

  • harvesting metadata into one or more ‘central’ services – bulk move data to the user-interface

  • US digital library experience in this area (e.g. NCSTRL) indicated that cross-searching is not the preferred approach: distributed searching of N nodes is viable, but only for small values of N

    • NCSTRL: N > 100; bad


Problems of cross-searching

  • collection description

    • how do you know which targets to search?

  • query-language problem

    • syntax varies and drifts over time between the various nodes

  • rank-merging problem

    • how do you meaningfully merge multiple result sets?

  • performance

    • tends to be limited by slowest target

  • difficult to build browse interface


Universal Preprint Service

  • a cross-archive DL that provides services on a collection of metadata harvested from multiple archives

    • based on NCSTRL+, a modified version of Dienst

  • demonstrated at Santa Fe, NM, October 21-22, 1999

    • http://ups.cs.odu.edu/

    • D-Lib Magazine, 6(2) 2000 (2 articles)

      • http://www.dlib.org/dlib/february00/02contents.html

  • UPS was soon renamed the Open Archives Initiative (OAI) http://www.openarchives.org/


RDN experience

  • similar experience within the UK Resource Discovery Network (RDN)

  • cross-searching of only 5 subject gateways

  • problems with cross-searching approach

    • performance

    • central browse interface

  • looking for metadata harvesting solution


Data and service providers

  • UPS identified two logical groups of services…

  • data providers

    • handle deposit/publishing of resources in archive

    • expose metadata about resources in archive

  • service providers

    • harvest metadata from data providers

    • use it to offer single user-interface across all harvested metadata

  • note:

    • data provider may also be responsible for human-oriented (i.e. Web) interface to archive

    • both functions may be offered by same ‘service’


Metadata harvesting requirements

  • for the harvesting approach to work, agreements are needed about…

  • transport protocols – HTTP vs. FTP vs. …

  • metadata formats – DC vs. MARC vs. …

  • quality assurance – mandatory elements, mechanisms for naming of people, subjects, etc., handling duplicated records, best-practice

  • intellectual property and usage rights – who can do what with the records

  • work in this area resulted in the “Santa Fe Convention”


OAI-PMH v 1.0 [01/2001]

  • goal: optimise discovery of document-like objects

  • inputs…

    • Santa Fe Convention

    • various DLF meetings on metadata harvesting

    • deliberations at Cornell

    • alpha-testers of OAI-PMH v 1.0

    • recognition of DC as ‘best’ core metadata format for interoperability across multiple archives


OAI-PMH v 1.0 [01/2001]

  • low-barrier interoperability specification

  • metadata harvesting model: data provider / service provider

  • focus on document-like objects

  • autonomous protocol

  • HTTP based

  • XML responses

  • unqualified Dublin Core

  • experimental: 12-18 months


What’s in a name? Open Archives Initiative

  • Open: the protocol is openly documented, and metadata is “exposed” to at least some peer group (note: rights management can still apply!)

  • Archives: archive defined as a “collection of stuff”, not the archivist’s definition of “archive”; “Repository” is used in most OAI documents

  • Initiative: OAI is happening at break-neck speed...


OAI timeline before v. 2.0

  • October 21-22, 1999 - initial UPS meeting

  • February 15, 2000 - Santa Fe Convention published in D-Lib Magazine

    • precursor to the OAI metadata harvesting protocol

  • June 3, 2000 - workshop at ACM DL 2000 (Texas)

  • August 25, 2000 - OAI steering committee formed, DLF/CNI support

  • September 7-8, 2000 - technical meeting at Cornell University

    • defined the core of the current OAI metadata harvesting protocol

  • September 21, 2000 - workshop at ECDL 2000 (Portugal)

  • November 1, 2000 - Alpha test group announced (~15 organizations)

  • January 23, 2001 - OAI protocol 1.0 announced, OAI Open Day in the U.S. (Washington DC)

    • purpose: freeze protocol for 12-16 months, generate critical mass

  • February 26, 2001 - OAI Open Day in Europe (Berlin)

  • July 3, 2001 - OAI protocol 1.1 announced

    • to reflect changes in the W3C’s latest XML Schema recommendation

  • September 8, 2001 - workshop at ECDL 2001 (Darmstadt)


OAI-PMH v.2.0 [06/2002]

  • goal: recurrent exchange of metadata about resources between systems

  • inputs:

    • OAI-PMH v.1.0

    • feedback from OAI implementers

    • deliberations by OAI-tech [09/01 - 06/02]

    • alpha test group of OAI-PMH v.2.0 [03/02 - 06/02]

  • officially released June 14, 2002


OAI-PMH v.2.0 [06/2002]

  • low-barrier interoperability specification

  • metadata harvesting model: data provider / service provider

  • metadata about resources

  • autonomous protocol

  • HTTP based

  • XML responses

  • unqualified Dublin Core

  • stable


          | Santa Fe convention | OAI-PMH v.1.0/1.1       | OAI-PMH v.2.0
----------+---------------------+-------------------------+------------------------
nature    | experimental        | experimental            | stable
verbs     | Dienst              | OAI-PMH                 | OAI-PMH
requests  | HTTP GET/POST       | HTTP GET/POST           | HTTP GET/POST
responses | XML                 | XML                     | XML
transport | HTTP                | HTTP                    | HTTP
metadata  | OAMS                | unqualified Dublin Core | unqualified Dublin Core
about     | eprints             | document-like objects   | resources
model     | metadata harvesting | metadata harvesting     | metadata harvesting


Flexible deployment

  • simple protocol based on HTTP and XML allows for rapid deployment

  • a number of toolkits available – see part III

  • systems can be deployed in variety of configurations

  • multiple service providers can harvest from multiple data providers

  • aggregators can sit between data and service providers

  • harvesting approach can be complemented with searching based on Z39.50 or SRW


Multiple data and service providers

[Diagram: multiple service providers harvest metadata from multiple data providers; harvesting is based on OAI-PMH.]


Aggregators

[Diagram: an aggregator sits between the data providers and the service providers.]


Can be mixed with x-searching

[Diagram: data providers and service providers linked both by harvesting based on OAI-PMH and by searching based on Z39.50 or SRW.]


Summary

  • OAI-PMH – OAI Protocol for Metadata Harvesting

  • low-cost mechanism for harvesting metadata records from one system to another

    • from ‘data providers’ to ‘service providers’

  • development over the last 2-3 years has seen a move from the specific (discovery of e-prints) to the generic (sharing descriptions of any resources)

  • based on HTTP and XML – Web-friendly

  • allows client to say ‘give me some or all of your records’ where ‘some’ is based on

    • date-stamps, sets, metadata formats


Summary (2)

  • mandates simple DC as record format but extensible to any format encoded in XML

  • OAI-PMH is not a search protocol

    • but it can underpin search-based services built on Z39.50 or SRW or …

  • metadata and full-text typically made freely available – but not a requirement

    • OAI-PMH can be used between closed groups

  • access-control and compression mechanisms based on underlying HTTP protocol

  • simple protocol allows easy deployment

    • systems can be combined in variety of ways


Important resources

  • OAI Web site:

    • http://www.openarchives.org/

  • OAI-PMH specification:

    • http://www.openarchives.org/OAI/openarchivesprotocol.html

  • Implementation guidelines:

    • http://www.openarchives.org/OAI/2.0/guidelines.htm

  • Discussion lists:

    • http://www.openarchives.org/mailman/listinfo/oai-general

    • http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers

  • Repository explorer:

    • http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai

  • Tools: http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai


Part II: Technical Introduction

Uwe Müller

Humboldt University Berlin, Germany


Agenda

  • Protocol Basics

  • Protocol Details

  • Request Types

  • Examples


The Open Archives Initiative (OAI)

  • Main ideas

    • world-wide consolidation of scholarly archives

    • free access to the archives (at least to the metadata)

    • consistent interfaces for archives and service providers

    • low barrier protocol / effortless implementation

    • based on existing standards (e.g. HTTP, XML, DC)

  • Basic functioning

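[Diagram: the service provider’s harvester sends requests (based on HTTP) to the data provider’s repository, which holds metadata (and documents) and returns metadata encoded in XML; the service provider builds its “Service” on the harvested metadata.]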


OAI: General Assumptions

  • two groups of ‘participants’

  • Data Providers (Open Archives, Repositories)

    • free access to metadata

    • not necessarily: free access to full texts / resources

    • easy to implement, low barriers

  • Service Providers

    • use OAI interfaces of the Data Providers

    • harvest and store metadata (no live requests!)

    • may select certain subsets from Data Providers (set hierarchy, date stamp)

    • may enrich metadata

    • offer (value-added) service on the basis of the metadata


OAI-PMH: Structure Model

[Diagram: a service provider’s harvester sends requests to several data providers, each running an OAI-PMH repository (e-prints, images, OPAC, museum, archive).]

  • Requests: Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, GetRecord

  • Responses: general information, metadata formats, set structure, record identifier, metadata


OAI-PMH: Protocol Overview

  • protocol based on HTTP

  • request arguments as GET or POST parameters

  • six request types

  • e.g. http://archive.org?verb=ListRecords&metadataPrefix=oai_dc&from=2002-11-01

  • responses are encoded in XML syntax

  • supports any metadata format (at least: Dublin Core)

  • logical set hierarchy (defined by data providers)

  • date stamps (date of last change of the metadata)

  • error messages

  • flow control


Agenda

  • Protocol Basics

  • Protocol Details

  • Request Types

  • Examples


Protocol Details: Definitions

Harvester

  • client application issuing OAI-PMH requests

Repository

  • network-accessible server able to process OAI-PMH requests correctly

Resource

  • object the metadata is “about”; the nature of resources is not defined in the OAI-PMH

Item

  • component of a repository from which metadata about a resource can be disseminated

  • has a unique identifier

Record

  • metadata in a specific metadata format

Identifier

  • unique key for an item in a repository

Set

  • optional construct for grouping items in a repository


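Protocol Details: Definitions (2)

[Diagram: the resource is David; the item holds all available metadata about David and is addressed by its identifier (item = identifier); from the item, records can be disseminated as Dublin Core metadata, MARC metadata or SPECTRUM metadata.]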


Protocol Details: Records

  • metadata of a resource in a specific format

  • three parts

    • header (mandatory)

      • identifier (1)

      • datestamp (1)

      • setSpec elements (*)

      • status attribute for deleted item (?)

    • metadata (mandatory)

      • XML encoded metadata with root tag, namespace

      • repositories must support Dublin Core

    • about (optional)

      • rights statements

      • provenance statements
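As a minimal Python sketch of the header part of a record, the helper below reads the identifier (exactly one), the datestamp (exactly one), the setSpec elements (zero or more) and the optional status attribute. It assumes the OAI-PMH 2.0 namespace http://www.openarchives.org/OAI/2.0/ used in the example responses of this tutorial; `RecordHeader` and `parse_header` are hypothetical names, and the sample is a made-up deleted variant of an HU Berlin record (a deleted record carries a header but no metadata part).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
NS = {"oai": OAI_NS}

@dataclass
class RecordHeader:
    identifier: str                                     # exactly one
    datestamp: str                                      # exactly one
    set_specs: List[str] = field(default_factory=list)  # zero or more
    deleted: bool = False                               # optional status="deleted"

def parse_header(header: ET.Element) -> RecordHeader:
    """Turn an <header> element in the OAI-PMH namespace into a RecordHeader."""
    return RecordHeader(
        identifier=header.findtext("oai:identifier", namespaces=NS),
        datestamp=header.findtext("oai:datestamp", namespaces=NS),
        set_specs=[s.text for s in header.findall("oai:setSpec", NS)],
        deleted=header.get("status") == "deleted",
    )

# Hypothetical sample: a deleted record, so there is no <metadata> part.
SAMPLE = f"""
<record xmlns="{OAI_NS}">
  <header status="deleted">
    <identifier>oai:HUBerlin.de:3000819</identifier>
    <datestamp>2002-01-08</datestamp>
    <setSpec>doctypes</setSpec>
    <setSpec>doctypes:dissertations</setSpec>
  </header>
</record>
""".strip()

record = ET.fromstring(SAMPLE)
h = parse_header(record.find("oai:header", NS))
print(h)
```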


Protocol Details: Datestamps

  • date of last modification of a record

  • mandatory characteristic of every item

  • two possible granularities: YYYY-MM-DD, YYYY-MM-DDThh:mm:ssZ

  • function: information on metadata, selective harvesting (from and until arguments)

  • applications: incremental update mechanisms

  • modification, creation, deletion

  • deletion: three support levels

    • no, persistent, transient
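A sketch of how a repository might check the from and until arguments against the two datestamp granularities. It assumes the OAI-PMH 2.0 rules that both arguments must be valid datestamps of the same granularity and that from must not be later than until, each violation yielding badArgument; `granularity` and `check_range` are hypothetical helper names. The sketch does not check whether the repository itself supports the finer granularity.

```python
import re
from datetime import datetime
from typing import Optional

# The two datestamp granularities: (exact shape, strptime format).
PATTERNS = {
    "YYYY-MM-DD": (re.compile(r"\d{4}-\d{2}-\d{2}"), "%Y-%m-%d"),
    "YYYY-MM-DDThh:mm:ssZ": (
        re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z"),
        "%Y-%m-%dT%H:%M:%SZ",
    ),
}

def granularity(value: str) -> Optional[str]:
    """Return the granularity name of a datestamp, or None if it is invalid."""
    for name, (pattern, fmt) in PATTERNS.items():
        if pattern.fullmatch(value):
            try:
                datetime.strptime(value, fmt)  # rejects e.g. month 13
            except ValueError:
                return None
            return name
    return None

def check_range(from_: Optional[str], until: Optional[str]) -> Optional[str]:
    """Return 'badArgument' if from/until are malformed or inconsistent, else None."""
    given = [v for v in (from_, until) if v is not None]
    grans = [granularity(v) for v in given]
    if None in grans:
        return "badArgument"  # not a valid datestamp at all
    if len(set(grans)) > 1:
        return "badArgument"  # from and until use different granularities
    if from_ is not None and until is not None and from_ > until:
        return "badArgument"  # empty range
    return None

print(check_range("2002-12-01-13:45:00", None))
```

Comparing the strings directly is safe here because both values are zero-padded and share one format.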


Protocol Details: Metadata Schema

  • OAI-PMH supports dissemination of multiple metadata formats from a repository

  • properties of metadata formats

    • id string to specify the format (metadataPrefix)

    • metadata schema URL (XML schema to test validity)

    • XML namespace URI (global identifier for metadata format)

  • repositories must be able to disseminate unqualified Dublin Core

  • arbitrary metadata formats can be defined and transported via the OAI-PMH

  • returned metadata must comply with XML namespace specification
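The three properties of a metadata format are what a repository reports in a ListMetadataFormats response. A minimal parsing sketch, assuming the OAI-PMH 2.0 element names metadataPrefix, schema and metadataNamespace; `MetadataFormat` and `parse_formats` are hypothetical names, and the sample response is trimmed (no responseDate or request elements) to keep it short. The oai_dc schema and namespace URLs are the ones used in the GetRecord example later in this part.

```python
import xml.etree.ElementTree as ET
from typing import Dict, NamedTuple

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

class MetadataFormat(NamedTuple):
    prefix: str     # metadataPrefix, used as a request argument
    schema: str     # URL of the XML Schema used to test validity
    namespace: str  # namespace URI that globally identifies the format

def parse_formats(xml_text: str) -> Dict[str, MetadataFormat]:
    """Map each metadataPrefix in a ListMetadataFormats response to its properties."""
    root = ET.fromstring(xml_text)
    formats = {}
    for fmt in root.iter("{%s}metadataFormat" % NS["oai"]):
        f = MetadataFormat(
            fmt.findtext("oai:metadataPrefix", namespaces=NS),
            fmt.findtext("oai:schema", namespaces=NS),
            fmt.findtext("oai:metadataNamespace", namespaces=NS),
        )
        formats[f.prefix] = f
    return formats

# Trimmed, illustrative response body.
RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""

formats = parse_formats(RESPONSE)
print(formats["oai_dc"].schema)
```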


Protocol Details: Metadata Schema (2)

  • minimum standard: unqualified Dublin Core

    • http://dublincore.org/

    • Dublin Core Metadata Element Set contains 15 elements

    • elements are optional

    • elements may be repeated

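      The Dublin Core Metadata Element Set: title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, rights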
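The sketch below, a hypothetical `build_oai_dc` helper, assembles an unqualified Dublin Core record with Python's ElementTree, rejecting names outside the 15 elements while allowing any element to be omitted or repeated. The namespace URIs are the ones in the GetRecord example later in this part; the element values are illustrative, and the xsi:schemaLocation attribute a real response would carry is left out for brevity.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}

# Serialise with the familiar prefixes instead of ns0/ns1.
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

def build_oai_dc(values: Dict[str, List[str]]) -> ET.Element:
    """Build an <oai_dc:dc> element from {element name: [values]}."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, items in values.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not an unqualified DC element: {name}")
        for text in items:  # elements may be repeated
            ET.SubElement(root, f"{{{DC}}}{name}").text = text
    return root  # elements are optional, so an empty dict is allowed

record = build_oai_dc({
    "creator": ["Schüttlöffel, Antje"],
    "type": ["Text", "Dissertation"],  # illustrative values
})
print(ET.tostring(record, encoding="unicode"))
```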


Protocol Details: Sets

  • logical partitioning of repositories

  • optional – archives do not have to define sets

  • no recommendations on how to define sets

  • not necessarily exhaustive

  • not necessarily strictly hierarchical

  • function: selective harvesting (set parameter)

  • applications: subject gateways, dissertation search engine, …

  • examples (Germany, see http://www.dini.de)

    • publication types (thesis, article, …)

    • document types (text, audio, image, …)

    • content sets, according to DNB (medicine, biology, …)
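In OAI-PMH 2.0 a setSpec is a colon-separated path from the top of the set hierarchy, as in doctypes:dissertations. The hypothetical helpers below expand such a path into its ancestors and test whether an item falls under a requested set. They assume that an item in a sub-set also counts as a member of its parent sets, which matches the example headers later in this part, where an item is listed under both dnb and dnb:dnb33.

```python
from typing import Iterable, List

def ancestors(set_spec: str) -> List[str]:
    """Expand a setSpec into itself and all of its parent sets.

    Each colon adds one level of the hierarchy: 'dnb:dnb33' -> ['dnb', 'dnb:dnb33'].
    """
    parts = set_spec.split(":")
    return [":".join(parts[: i + 1]) for i in range(len(parts))]

def in_set(item_specs: Iterable[str], requested: str) -> bool:
    """True if an item with these setSpecs falls under the requested set."""
    return any(requested in ancestors(spec) for spec in item_specs)

# setSpecs of record oai:HUBerlin.de:3000819 in the ListIdentifiers example
item = ["doctypes", "doctypes:dissertations", "dnb", "dnb:dnb33"]
print(in_set(item, "doctypes:dissertations"), in_set(item, "dnb:dnb27"))
```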


Protocol Details: Request Format

  • requests must be submitted using the GET or POST methods of HTTP

  • repositories must support both methods

  • at least one key=value pair: verb=[RequestType]

  • additional key=value pairs depend on request type

  • example for GET request: http://archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc

  • special characters in argument values must be URL-encoded, e.g. “:” becomes “%3A”
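A minimal sketch of building such a GET request in Python; `oai_url` is a hypothetical helper. The standard library's `urlencode` does the percent-encoding, which is why the colons in an OAI identifier come out as %3A. Arguments are passed in a dict rather than as keyword arguments because `from` is a reserved word in Python.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlencode, urlsplit

def oai_url(base_url: str, verb: str, args: Optional[Dict[str, str]] = None) -> str:
    """Build an OAI-PMH GET request URL: verb first, then the other key=value pairs."""
    query = urlencode({"verb": verb, **(args or {})})  # percent-encodes ':' etc.
    return f"{base_url}?{query}"

url = oai_url("http://archive.org/oai", "GetRecord",
              {"identifier": "oai:HUBerlin.de:3000218", "metadataPrefix": "oai_dc"})
print(url)

# Decoding the query string gives back the original argument values.
params = parse_qs(urlsplit(url).query)
```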


Protocol Details: Response

  • formatted as HTTP responses

  • content type must be text/xml

  • HTTP status codes (distinguished from OAI-PMH errors), e.g. 302 (redirect), 503 (service not available)

  • compression: optional in OAI-PMH; only identity encoding is mandatory

  • response format: well formed XML with markup:

    • XML declaration (<?xml version="1.0" encoding="UTF-8" ?>)

    • root element named OAI-PMH with three attributes (xmlns, xmlns:xsi, xsi:schemaLocation)

    • three child elements

      • responseDate (UTC datetime)

      • request (request that generated this response)

      • either error (in case of an error or exception condition) or an element with the name of the OAI-PMH request
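To make the response layout concrete, here is a hedged sketch of a harvester-side check that an XML body has the OAI-PMH root element, starts with responseDate and request, and then carries either error elements or a single element named after one of the six verbs. `response_kind` is a hypothetical helper and the sample body is invented; note that for a badArgument error the request element echoes only the base URL, without attributes.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}

def response_kind(xml_text: str) -> str:
    """Return 'error' or the verb name; raise ValueError for a malformed envelope."""
    root = ET.fromstring(xml_text)
    if root.tag != OAI + "OAI-PMH":
        raise ValueError("root element must be OAI-PMH in the OAI 2.0 namespace")
    children = [child.tag for child in root]
    if children[:2] != [OAI + "responseDate", OAI + "request"]:
        raise ValueError("expected responseDate and request as first children")
    # Names of the remaining OAI elements; several <error> elements collapse to one.
    rest = {tag[len(OAI):] for tag in children[2:] if tag.startswith(OAI)}
    if rest == {"error"}:
        return "error"
    if len(rest) == 1 and rest <= VERBS:
        return rest.pop()
    raise ValueError(f"unexpected payload: {sorted(rest)}")

SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-11-27T13:57:01Z</responseDate>
  <request>http://archive.org/oai-script</request>
  <error code="badArgument">illegal argument: set</error>
</OAI-PMH>"""

print(response_kind(SAMPLE))
```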


Protocol Details: Flow Control

  • four of the request types return a list of entries

  • three of them may return ‘large’ lists

  • OAI-PMH supports partitioning

  • decision on partitioning: repository

  • response to a request includes

    • incomplete list

    • resumption token + expiration date, size of complete list, cursor (optional)

  • new request with same request type

    • resumption token as parameter

    • all other parameters omitted!

  • response includes

    • next (maybe last) section of the list

    • resumption token (empty if last section of list enclosed)


Protocol Details: Flow Control (2)

Example: Harvester (Service Provider) and Repository (Data Provider)

  • Harvester: “want to have all your records” → archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc

  • Repository: “have 267, but give you only 100” → 100 records + resumptionToken “anyID1”

  • Harvester: “want more of this” → archive.org/oai?verb=ListRecords&resumptionToken=anyID1

  • Repository: “have 267, give you another 100” → 100 records + resumptionToken “anyID2”

  • Harvester: “want more of this” → archive.org/oai?verb=ListRecords&resumptionToken=anyID2

  • Repository: “have 267, give you my last 67” → 67 records + resumptionToken “”
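The exchange above can be simulated without a network. In this sketch `fake_repository` is a hypothetical in-memory stand-in that pages 267 records 100 at a time, and `harvest` follows resumption tokens until an empty one arrives, repeating only the verb and the token on follow-up requests. The token format "anyIDn" is taken from the slide; a real harvester must treat tokens as opaque strings.

```python
from typing import Callable, Dict, List, Tuple

# One response page: (records, resumptionToken); "" marks the last page.
Page = Tuple[List[str], str]
Handler = Callable[[Dict[str, str]], Page]

def fake_repository(total: int = 267, page_size: int = 100) -> Handler:
    """Hypothetical in-memory stand-in for a repository's ListRecords handler."""
    records = [f"record-{i}" for i in range(1, total + 1)]

    def handle(params: Dict[str, str]) -> Page:
        token = params.get("resumptionToken")
        # These fake tokens encode the page number ("anyID2" -> page 2);
        # real tokens are opaque to the harvester.
        page = int(token[len("anyID"):]) if token else 0
        start, end = page * page_size, (page + 1) * page_size
        next_token = f"anyID{page + 1}" if end < total else ""
        return records[start:end], next_token

    return handle

def harvest(request: Handler) -> Tuple[List[str], int]:
    """Issue ListRecords, then follow resumption tokens until one comes back empty."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    harvested: List[str] = []
    calls = 0
    while True:
        records, token = request(params)
        calls += 1
        harvested.extend(records)
        if not token:
            return harvested, calls
        # Same verb, resumptionToken as the only other argument.
        params = {"verb": "ListRecords", "resumptionToken": token}

records, calls = harvest(fake_repository())
print(len(records), calls)
```

With the defaults this reproduces the slide: three requests returning 100, 100 and 67 records.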


Protocol Details: Errors and Exceptions

  • repositories must indicate OAI-PMH errors

  • inclusion of one or more error elements

  • defined error identifiers

    • badArgument

    • badResumptionToken

    • badVerb

    • cannotDisseminateFormat

    • idDoesNotExist

    • noRecordsMatch

    • noMetadataFormats

    • noSetHierarchy
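A sketch of how a harvester might collect the error elements from a response, assuming the OAI-PMH 2.0 namespace and the error codes listed above; `oai_errors` is a hypothetical helper and the sample response is invented. A harvester doing incremental updates would typically treat noRecordsMatch as "nothing new since the last harvest" rather than as a failure.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
ERROR_CODES = {
    "badArgument", "badResumptionToken", "badVerb", "cannotDisseminateFormat",
    "idDoesNotExist", "noRecordsMatch", "noMetadataFormats", "noSetHierarchy",
}

def oai_errors(xml_text: str) -> List[Tuple[str, str]]:
    """Return (code, message) for every <error> element; [] means no OAI-PMH error."""
    root = ET.fromstring(xml_text)
    errors = []
    for err in root.findall("oai:error", NS):
        code = err.get("code", "")
        if code not in ERROR_CODES:
            raise ValueError(f"undefined error code: {code!r}")
        errors.append((code, (err.text or "").strip()))
    return errors

# Invented response carrying two error elements.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-12-05T10:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="marc" set="biology">http://archive.org/oai</request>
  <error code="cannotDisseminateFormat">marc is not supported</error>
  <error code="noSetHierarchy">this repository does not support sets</error>
</OAI-PMH>"""

print(oai_errors(SAMPLE))
```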


Agenda

  • Protocol Basics

  • Protocol Details

  • Request Types

  • Examples


Request Types

  • six different request types

    • Identify

    • ListMetadataFormats

    • ListSets

    • ListIdentifiers

    • ListRecords

    • GetRecord

  • a harvester does not have to use all types

  • a repository must implement all types

  • required and optional arguments depend on the request type


Request Type: Identify

function: description of an archive

example: archive.org/oai-script?verb=Identify

parameters: none

errors / exceptions: badArgument, e.g. archive.org/oai-script?verb=Identify&set=biology
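A data provider answers Identify with a small block of repository-level facts. As a sketch, assuming the OAI-PMH 2.0 element names and order repositoryName, baseURL, protocolVersion, adminEmail, earliestDatestamp, deletedRecord, granularity (the values below are made up), the payload could be built like this; `identify_element` is a hypothetical helper and the surrounding OAI-PMH envelope is left out. The `default_namespace` argument of `tostring` needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

def identify_element(name: str, base_url: str, admin_email: str,
                     earliest: str, deleted: str = "no",
                     granularity: str = "YYYY-MM-DD") -> ET.Element:
    """Build the <Identify> payload (without the OAI-PMH envelope)."""
    if deleted not in ("no", "persistent", "transient"):
        raise ValueError("deletedRecord must be no, persistent or transient")
    if granularity not in ("YYYY-MM-DD", "YYYY-MM-DDThh:mm:ssZ"):
        raise ValueError("unsupported granularity")
    ident = ET.Element(f"{{{OAI_NS}}}Identify")
    # Element order follows the OAI-PMH 2.0 schema as assumed above.
    for tag, text in [
        ("repositoryName", name),
        ("baseURL", base_url),
        ("protocolVersion", "2.0"),
        ("adminEmail", admin_email),
        ("earliestDatestamp", earliest),
        ("deletedRecord", deleted),
        ("granularity", granularity),
    ]:
        ET.SubElement(ident, f"{{{OAI_NS}}}{tag}").text = text
    return ident

ident = identify_element("Example Archive", "http://archive.org/oai-script",
                         "admin@example.org", "2002-01-01")
# default_namespace writes the OAI tags without a prefix.
xml_text = ET.tostring(ident, encoding="unicode", default_namespace=OAI_NS)
print(xml_text)
```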


Request Type: Identify (2)

response format


Request Type: ListMetadataFormats

function: retrieve available metadata formats from archive

example: archive.org/oai-script?verb=ListMetadataFormats&identifier=oai:HUBerlin.de:3000218

parameters: identifier (optional)

errors / exceptions: badArgument; idDoesNotExist, e.g. archive.org/oai-script?verb=ListMetadataFormats&identifier=really-wrong-identifier; noMetadataFormats


Request Type: ListSets

function: retrieve set structure of a repository

example: archive.org/oai-script?verb=ListSets

parameters: resumptionToken (exclusive)

errors / exceptions: badArgument; badResumptionToken, e.g. archive.org/oai-script?verb=ListSets&resumptionToken=any-wrong-token; noSetHierarchy


Request Type: ListIdentifiers

function: abbreviated form of ListRecords, retrieving only headers

example: archive.org/oai-script?verb=ListIdentifiers&metadataPrefix=oai_dc&from=2002-12-01

parameters: from (optional), until (optional), metadataPrefix (required), set (optional), resumptionToken (exclusive)

errors / exceptions: badArgument, e.g. …&from=2002-12-01-13:45:00; badResumptionToken; cannotDisseminateFormat; noRecordsMatch; noSetHierarchy


Request Type: ListRecords

function: harvest records from a repository

example: archive.org/oai-script?verb=ListRecords&metadataPrefix=oai_dc&set=biology

parameters: from (optional), until (optional), metadataPrefix (required), set (optional), resumptionToken (exclusive)

errors / exceptions: badArgument; badResumptionToken; cannotDisseminateFormat; noRecordsMatch; noSetHierarchy


Request Type: GetRecord

function: retrieve individual metadata record from a repository

example: archive.org/oai-script?verb=GetRecord&identifier=oai:HUBerlin.de:3000218&metadataPrefix=oai_dc

parameters: identifier (required), metadataPrefix (required)

errors / exceptions: badArgument; cannotDisseminateFormat; idDoesNotExist
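The argument rules of the six request types fit in one table. The sketch below encodes them as listed on these slides (required, optional, exclusive) and returns the error a repository would report; `ARGS` and `check_request` are hypothetical names. Because the arguments are held in a dict, it cannot detect repeated arguments, which a real repository would also reject as badArgument.

```python
from typing import Dict, Optional, Set, Tuple

# verb -> (required, optional, exclusive) argument names, as listed on the slides.
ARGS: Dict[str, Tuple[Set[str], Set[str], Set[str]]] = {
    "Identify": (set(), set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}, set()),
    "ListSets": (set(), set(), {"resumptionToken"}),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}, {"resumptionToken"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}, {"resumptionToken"}),
    "GetRecord": ({"identifier", "metadataPrefix"}, set(), set()),
}

def check_request(params: Dict[str, str]) -> Optional[str]:
    """Return 'badVerb' or 'badArgument' for an invalid request, else None."""
    verb = params.get("verb")
    if verb not in ARGS:
        return "badVerb"  # missing or unknown verb
    required, optional, exclusive = ARGS[verb]
    given = set(params) - {"verb"}
    if given & exclusive:
        # An exclusive argument must be the only one besides verb.
        return None if len(given) == 1 else "badArgument"
    if not required <= given or not given <= required | optional:
        return "badArgument"  # something missing, or something not allowed
    return None

print(check_request({"verb": "Identify", "set": "biology"}))
```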


Agenda

  • Protocol Basics

  • Protocol Details

  • Request Types

  • Examples


Example: http://edoc.hu-berlin.de/OAI-2.0?verb=ListIdentifiers&from=2002-01-06&until=2002-01-08&metadataPrefix=oai_dc&set=doctypes:dissertations

<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2002-10-22T17:49:49+01:00</responseDate>
  <request verb="ListIdentifiers" from="2002-01-03" until="2002-01-08" metadataPrefix="oai_dc" set="doctypes:dissertations">http://edoc.hu-berlin.de/OAI-2.0</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:HUBerlin.de:3000819</identifier>
      <datestamp>2002-01-08</datestamp>
      <setSpec>doctypes</setSpec>
      <setSpec>doctypes:dissertations</setSpec>
      <setSpec>dnb</setSpec>
      <setSpec>dnb:dnb33</setSpec>
    </header>
    <header>
      <identifier>oai:HUBerlin.de:3000831</identifier>
      <datestamp>2002-01-07</datestamp>
      <setSpec>doctypes</setSpec>
      <setSpec>doctypes:dissertations</setSpec>
      <setSpec>dnb</setSpec>
      <setSpec>dnb:dnb27</setSpec>
    </header>
  </ListIdentifiers>
</OAI-PMH>


Example: http://edoc.hu-berlin.de/OAI-2.0?verb=GetRecord&identifier=oai:HUBerlin.de:3000819&metadataPrefix=oai_dc

<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
    http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2002-11-27T14:57:01+01:00</responseDate>
  <request verb="GetRecord" metadataPrefix="oai_dc"
      identifier="oai:HUBerlin.de:3000819">http://edoc.hu-berlin.de/OAI-2.0</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:HUBerlin.de:3000819</identifier>
        […]
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
            http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title>Einfluß genetischer Variationen im Tumor Nekrose […]</dc:title>
          <dc:creator>Schüttlöffel, Antje</dc:creator>
          […]
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>


Technical Introduction: Questions?

OAI – official site

http://www.openarchives.org/

protocol specification: http://www.openarchives.org/OAI/openarchivesprotocol.html

general mailing list: http://www.openarchives.org/mailman/listinfo/OAI-general/

implementers mailing list: http://www.openarchives.org/mailman/listinfo/OAI-implementers/


Part III: Implementation Issues

Data Provider and Service Provider

Uwe Müller

Humboldt University Berlin, Germany


Agenda

  • General Considerations

  • Data Provider

  • Service Provider


General: First Questions

Data Provider

  • Which data do I want to deliver?

  • Which service providers do I want to supply with data?

Service Provider

  • Which service do I want to provide?

  • From which data providers do I get the metadata?

  • How does the metadata have to be processed?

Data Provider & Service Provider

  • Which aspects do we have to agree upon?


General: Metadata Formats / Sets

  • required: unqualified Dublin Core

  • special subjects / communities: other metadata specifications may be required

    • describe resources in a specialised way

    • definition of an XML schema (publicly available for validation)

  • define set hierarchy

    • sensible partitioning for selective harvesting

    • agreement between data providers and between data and service providers
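Set hierarchies are expressed through colon-separated setSpec values (e.g. `math:algebra` as a subset of `math`), and the protocol treats a selective harvest of a set as also covering its descendant sets. The Python sketch below shows that membership test; the function name `in_requested_set` and the example identifiers and set names are hypothetical.

```python
def in_requested_set(item_set_specs, requested):
    """Return True if an item belongs to the requested set or to one of its
    descendant sets (setSpec components are separated by ':')."""
    for spec in item_set_specs:
        # exact match, or a descendant such as 'math:algebra' for 'math';
        # the trailing ':' prevents 'mathematics' from matching 'math'
        if spec == requested or spec.startswith(requested + ":"):
            return True
    return False


# hypothetical items and the sets they belong to
items = {
    "oai:example.org:1": ["math:algebra"],
    "oai:example.org:2": ["physics"],
    "oai:example.org:3": ["math"],
}
harvested = [ident for ident, specs in items.items() if in_requested_set(specs, "math")]
print(harvested)  # ['oai:example.org:1', 'oai:example.org:3']
```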


General: Organisational Structure

  • aggregated data providers

    • if an aggregating data provider is harvested by a service provider, its “sub data providers” should not also be harvested by the same SP (to avoid duplicate records)

  • subject gateways

    • selective harvesting if corresponding sets have been defined and implemented


Agenda

  • General Considerations

  • Data Provider

  • Service Provider


Data Provider: Prerequisites

  • metadata on resources (“items”)

    • should be stored in an (SQL) database

    • if necessary, a file system is also possible …

    • unique identifier for each item

  • web server, accessible via the internet

    • e.g. Apache, IIS

  • programming interface / API

    • e.g. Perl, PHP, Java Servlets

    • web server extension

    • access to database (or filesystem)

    • not needed: session management


Data Provider: Prerequisites (2)

  • archive identifier / base URL

  • unique identifier for items

  • metadata format (at least: unqualified Dublin Core)

  • datestamps for metadata (created / last modified)

  • logical set hierarchy (optional)

    • agreement within (subject) communities

  • flow control / implementation of resumption token (optional, but ‘larger’ archives should implement it)
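Identifiers such as `oai:HUBerlin.de:3000819` in the GetRecord example follow the optional oai-identifier convention `oai:<repository identifier>:<local identifier>`, and datestamps are given in UTC, e.g. `2002-12-05T15:00:00Z` at seconds granularity. A minimal Python sketch of both, assuming the local identifier is whatever unique key the database already holds; the helper names are hypothetical.

```python
from datetime import datetime, timezone


def make_oai_identifier(repository_id, local_id):
    """Build an identifier of the form oai:<repository identifier>:<local id>."""
    return f"oai:{repository_id}:{local_id}"


def parse_oai_identifier(identifier):
    """Split an oai-identifier into (repository identifier, local id)."""
    # maxsplit=2 keeps any ':' inside the local id intact
    scheme, repository_id, local_id = identifier.split(":", 2)
    if scheme != "oai":
        raise ValueError(f"not an oai-identifier: {identifier}")
    return repository_id, local_id


def format_datestamp(moment):
    """Format a timezone-aware datetime as a UTC datestamp (seconds granularity)."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(make_oai_identifier("HUBerlin.de", "3000819"))  # oai:HUBerlin.de:3000819
print(format_datestamp(datetime(2002, 12, 5, 15, 0, tzinfo=timezone.utc)))  # 2002-12-05T15:00:00Z
```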


Data Provider: Architecture

(Diagram: OAI Data Provider)

  • OAI request (HTTP request) → web server (e.g. Apache, IIS)

  • programming extension (e.g. PHP, Perl, Java Servlets)

  • script / programme

    • parsing arguments

    • creating error messages

    • creating SQL statements

    • creating XML output

  • SQL request → SQL database → DB response

  • OAI response (XML instance)


Data Provider: General Structure

Argument Parser

  • validates OAI requests

Error Generator

  • creates XML responses with encoded error messages

Database Query / Local Metadata Extraction

  • retrieves metadata from the repository

  • in the requested metadata format

XML Generator / Response Creation

  • creates XML responses with encoded metadata information

Flow Control

  • delivers incomplete lists (list sequences) for ‘larger’ repositories

  • uses the resumption token mechanism
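The Argument Parser and Error Generator can be sketched as a table of allowed arguments per verb, following the OAI-PMH 2.0 argument rules. This is a minimal Python illustration, not a complete implementation: it assumes the request has already been decoded into a dict (so repeated arguments are not detected), and whether a resumptionToken is actually known is checked later (badResumptionToken). The name `check_request` is hypothetical.

```python
# Required and optional arguments per verb (besides 'verb' itself), OAI-PMH 2.0.
VERBS = {
    "Identify": {"required": set(), "optional": set()},
    "ListMetadataFormats": {"required": set(), "optional": {"identifier"}},
    "ListSets": {"required": set(), "optional": set()},
    "ListIdentifiers": {"required": {"metadataPrefix"}, "optional": {"from", "until", "set"}},
    "ListRecords": {"required": {"metadataPrefix"}, "optional": {"from", "until", "set"}},
    "GetRecord": {"required": {"identifier", "metadataPrefix"}, "optional": set()},
}
RESUMABLE = {"ListSets", "ListIdentifiers", "ListRecords"}


def check_request(params):
    """Return None for a syntactically valid request, otherwise an error code."""
    verb = params.get("verb")
    if verb not in VERBS:          # missing or unknown verb
        return "badVerb"
    args = set(params) - {"verb"}
    if "resumptionToken" in args:
        # resumptionToken is exclusive: no other argument may accompany it
        ok = verb in RESUMABLE and args == {"resumptionToken"}
        return None if ok else "badArgument"
    rules = VERBS[verb]
    allowed = rules["required"] | rules["optional"]
    if not rules["required"] <= args or not args <= allowed:
        return "badArgument"
    return None


print(check_request({"verb": "ListRecords"}))  # badArgument (metadataPrefix missing)
```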


Data Provider: Flow Chart

(Flow chart: processing an HTTP request, shown in detail for the ListIdentifiers verb)

  • verb: Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, GetRecord; else → error: badVerb

  • ListIdentifiers: check resumptionToken

    • unknown → error: badResumptionToken

    • valid → read parameters from local system

    • empty → check metadataPrefix

      • empty → error: badArgument

      • else → error: cannotDisseminateFormat

      • oai_dc → parse the other parameters

  • send SQL request to database

  • rows > 100?

    • yes → store parameters, store and deliver resumptionToken, deliver min (rows, 100) record headers

    • no → deliver record headers

  • XML response

  • verb, metadataPrefix, resumptionToken … OAI arguments

  • rows … size of the result list

  • 100 … here: maximal list size for responses
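The “rows > 100” branch of the flow chart amounts to cutting the SQL result into slices of at most 100 records and issuing a resumptionToken while records remain. A small Python sketch, assuming the whole result is available as a list and that the stored parameters include how many records were already delivered; `page` is a hypothetical name.

```python
MAX_LIST_SIZE = 100  # the flow chart's example limit


def page(rows, delivered, max_size=MAX_LIST_SIZE):
    """Return (records for this response, whether a resumptionToken is needed).

    rows: the full result of the SQL request;
    delivered: number of records returned by earlier responses."""
    remaining = rows[delivered:]
    chunk = remaining[:max_size]            # deliver min(remaining, max_size)
    more_to_come = len(remaining) > max_size
    return chunk, more_to_come


rows = list(range(267))                     # stand-in for 267 result rows
sizes, delivered, more = [], 0, True
while more:
    chunk, more = page(rows, delivered)
    delivered += len(chunk)
    sizes.append(len(chunk))
print(sizes)  # [100, 100, 67]
```

With 267 rows this yields the 100 / 100 / 67 sequence used in the resumption token examples of this part.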


Data Provider: Resumption Token

  • should be implemented for “large” lists

  • initiated by data provider

  • store parameters (set, from, …) and number of already delivered records

  • properties

    • expiration: expirationDate (optional)

    • completeListSize (optional)

    • already delivered records: cursor (optional)

    • recovery from network errors (possibility to re-issue most recent resumption token)

  • problem

    • database changes

    • two possible solutions

      • duplicate data in a “request table”

      • store the date of the first request with the other parameters and use it like an additional until argument
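One way to “store parameters … and number of already delivered records” is a server-side table keyed by an opaque token. The Python sketch below keeps it in memory purely for illustration (a real repository would more likely use a database table); the 24-hour lifetime and all names are assumptions. Because resuming does not delete the entry, a harvester can re-issue its most recent token after a network error.

```python
import uuid
from datetime import datetime, timedelta, timezone

_tokens = {}  # hypothetical in-memory store: token id -> saved state


def issue_token(params, delivered, first_request, lifetime=timedelta(hours=24)):
    """Store the request parameters and progress; return an opaque token id."""
    token = uuid.uuid4().hex
    _tokens[token] = {
        "params": dict(params),          # set, from, until, metadataPrefix ...
        "delivered": delivered,          # records already returned
        "first_request": first_request,  # later used like an extra 'until'
        "expires": datetime.now(timezone.utc) + lifetime,
    }
    return token


def resume(token):
    """Return the saved state, or None (answer with badResumptionToken).

    The entry is kept, so re-issuing the same token returns the same state."""
    state = _tokens.get(token)
    if state is None or state["expires"] < datetime.now(timezone.utc):
        return None
    return state
```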


Data Provider: Resumption Token (2)

Example: harvester (service provider) and repository (data provider)

  • harvester: “want to have all your records”

    • archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc

  • repository: “have 267, but give you only 100”

    • 100 records + resumptionToken “anyID1”

  • harvester: “want more of this”

    • archive.org/oai?verb=ListRecords&resumptionToken=anyID1

  • repository: “have 267, give you another 100”

    • 100 records + resumptionToken “anyID2”

  • harvester: “want more of this”

    • archive.org/oai?verb=ListRecords&resumptionToken=anyID2

  • repository: “have 267, give you my last 67”

    • 67 records + empty resumptionToken “”
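In the XML response, each part of the list ends with a `<resumptionToken>` element; the optional `completeListSize` and `cursor` attributes can describe the 267-record example, with `cursor` counting the records returned before the current response (starting at 0). A Python sketch using the standard `xml.etree.ElementTree` module; the OAI-PMH namespace is left out for brevity and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def resumption_token_element(token, complete_list_size, cursor):
    """Build the <resumptionToken> element that closes an incomplete list.

    An empty token marks the last part of the list."""
    elem = ET.Element("resumptionToken",
                      completeListSize=str(complete_list_size),
                      cursor=str(cursor))
    elem.text = token
    return elem


for token, cursor in [("anyID1", 0), ("anyID2", 100), ("", 200)]:
    print(ET.tostring(resumption_token_element(token, 267, cursor), encoding="unicode"))
```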


Data Provider: Resumption Token (3)

Example (2): the database changes between two requests

  • harvester: “want to have all your records”

    • archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc

  • repository: select dc-data from metadata-table → 267 records

    • “have 267, but give you only 100”: 100 records + resumptionToken “anyID1”

    • stored: anyID1 = { from=empty, until=empty, set=empty, mdP=oai_dc, date= 2002-12-05T15:00:00Z, delivered=100}

  • meanwhile: insert, update, delete in the database

  • harvester: “want more of this”

    • archive.org/oai?verb=ListRecords&resumptionToken=anyID1

  • repository: select dc-data from metadata-table → now 268 records

    • “have 268, give you another 100”: 100 records + resumptionToken “anyID2”
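The second solution from the resumption token slide (reusing the stored date of the first request like an additional `until` argument) keeps the list stable in this scenario. A Python sketch with a hypothetical in-memory table standing in for the metadata table:

```python
from datetime import datetime, timezone

# Hypothetical metadata table: (identifier, datestamp of last change)
table = [(f"oai:archive.org:{n}", datetime(2002, 12, 1, tzinfo=timezone.utc))
         for n in range(267)]
first_request = datetime(2002, 12, 5, 15, 0, tzinfo=timezone.utc)  # stored in anyID1

# meanwhile: a record is inserted after the first request
table.append(("oai:archive.org:267", datetime(2002, 12, 5, 15, 30, tzinfo=timezone.utc)))


def select(rows, until):
    """Emulate 'select dc-data from metadata-table where datestamp <= until'."""
    return [ident for ident, stamp in rows if stamp <= until]


print(len(table))                         # 268 rows in the table now
print(len(select(table, first_request)))  # 267: the harvester's list stays stable
```

Records changed after the first request get a newer datestamp and drop out of this list, but an incremental harvest with a later `from` date picks them up; physically deleted rows still shift the list unless the repository keeps deleted records.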


Data Provider: Data Representation

  • use recommended data representations

    • dates

      • use: 2002-12-05 (ISO 8601)

      • avoid: 2002-xx-xx, 2002, 05.12.2002

    • language codes

      • use: eng, ger, ... (ISO 639-2)

      • avoid: en, de, english, german

  • multiple values: use a separate XML element for each entity

    • authors

      • use: <dc:creator>Smith, Adam</dc:creator><dc:creator>Nash, John</dc:creator>

      • avoid: <dc:creator>Smith, Adam; Nash, John</dc:creator>
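If a local database stores several authors in one field, the data provider can split them before generating the response. A Python sketch, assuming “;” is the local separator (an assumption about the source data, not something the protocol defines); `creator_elements` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"


def creator_elements(value, separator=";"):
    """Split a combined author string into one dc:creator element per person."""
    names = [name.strip() for name in value.split(separator) if name.strip()]
    elements = []
    for name in names:
        elem = ET.Element(f"{{{DC_NS}}}creator")  # namespace-qualified tag
        elem.text = name
        elements.append(elem)
    return elements


for elem in creator_elements("Smith, Adam; Nash, John"):
    print(elem.text)
# Smith, Adam
# Nash, John
```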


Data Provider: Compression

  • method to reduce traffic and enhance performance

  • optional for both sides: data and service providers

  • handled on HTTP level

  • harvesters may include an Accept-Encoding header in their requests, specifying their preferences

  • harvesters without an Accept-Encoding header always receive uncompressed data

  • repositories must support the HTTP identity encoding (i.e. no compression)

  • repositories should specify the encodings they support by including compression elements in the Identify response
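On the harvester side, compression only needs an extra request header and a decoding step. A Python sketch with the standard library, assuming the repository answers either with `gzip` or without compression (`urllib` does not decompress responses by itself); the function names are hypothetical.

```python
import gzip
import urllib.request


def build_request(url):
    """Harvester side: state that gzip is preferred, identity also accepted."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip, identity"})


def decode_body(body, content_encoding):
    """Undo the Content-Encoding named in the response header (if any)."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "identity":
        return body
    raise ValueError(f"unsupported Content-Encoding: {content_encoding}")
```

Typical use: `with urllib.request.urlopen(build_request(url)) as resp: xml = decode_body(resp.read(), resp.headers.get("Content-Encoding"))`.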


Data Provider: Test and Registration

  • create your own OAI-PMH requests, send them to the OAI interface and check the results

  • use the Repository Explorer (Virginia Tech)

    • http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai/

    • provide arguments via HTML forms

    • responses are validated

    • ‘browse’ from one response to further requests

    • automatic conformance tester

  • official registration site

    • http://www.openarchives.org/data/registerasprovider.html

    • provide base URL

    • extensive conformance test (incl. error conditions …)

    • report on incorrect behaviour

    • if conformant: added to the official list

    • regular checks
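Before using the Repository Explorer, a quick self-check of your own requests can confirm that a response is an OAI-PMH 2.0 document and report any error codes. A Python sketch; it only inspects the root element and the `<error>` elements, does not replace schema validation, and `response_errors` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def response_errors(xml_text):
    """Return the OAI-PMH error codes in a response ([] if there are none).

    Raises ValueError if the document is not an OAI-PMH 2.0 response."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{OAI_NS}}}OAI-PMH":
        raise ValueError(f"unexpected root element: {root.tag}")
    return [err.get("code") for err in root.findall(f"{{{OAI_NS}}}error")]


sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-12-05T15:00:00Z</responseDate>
  <request>http://archive.org/oai</request>
  <error code="badVerb">Illegal OAI verb</error>
</OAI-PMH>"""
print(response_errors(sample))  # ['badVerb']
```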


Agenda

  • General Considerations

  • Data Provider

  • Service Provider


Service Provider: Examples

  • Repository Explorer:

    • http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai/

  • search engines / subject gateways

    • Cross Archive Searching Service: http://arc.cs.odu.edu/

    • MyOAI: http://www.myoai.org/

    • DINI: http://edoc.hu-berlin.de/oaisearch/

    • Physnet: http://physnet.uni-oldenburg.de/oai/query.php

  • internal communication

    • ProPrint: http://edoc.hu-berlin.de/proprint/

    • library networks


Service Provider: Prerequisites

  • internet connected server

  • database system (relational or XML)

  • programming environment

    • can issue HTTP requests to web servers

    • can issue database requests

    • XML parser


Service Provider: Structure (1)

Archive Management

  • selection of archives to be harvested

  • entries made manually, or

  • archives added / removed automatically using the official registry

Request Component

  • creates HTTP requests and sends them to OAI archives (data providers)

  • requests metadata using the verbs defined by the OAI-PMH

  • optionally selective harvesting (set parameter)
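The Request Component essentially turns a verb plus arguments into an HTTP GET URL. A Python sketch with a hypothetical base URL and set name; since `from` is a Python keyword, the sketch accepts it as `from_`, and `oai_url` is a hypothetical name.

```python
from urllib.parse import urlencode


def oai_url(base_url, verb, **arguments):
    """Build an OAI-PMH GET request URL; arguments set to None are left out."""
    if "from_" in arguments:
        arguments["from"] = arguments.pop("from_")  # 'from' is reserved in Python
    query = {"verb": verb, **{k: v for k, v in arguments.items() if v is not None}}
    return f"{base_url}?{urlencode(query)}"


print(oai_url("http://archive.org/oai", "ListRecords",
              metadataPrefix="oai_dc", set="math", from_="2002-12-01"))
# http://archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc&set=math&from=2002-12-01
```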


Service Provider: Structure (2)

Scheduler

  • performs timed, regular harvesting of the associated archives

  • simplest case: jobs started manually

  • otherwise: e.g. a cron job …

Flow Control

  • resumption token: the result list is split into incomplete sections, and a new request retrieves each further section

  • HTTP status 503 (Service Unavailable): analyse the response to extract the “Retry-After” period
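For the 503 case, the `Retry-After` header may contain either a number of seconds or an HTTP date. A Python sketch of the waiting-time calculation; the 60-second default for a missing header is an assumption, not part of the protocol, and `retry_delay` is a hypothetical name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay(status, headers, default=60, now=None):
    """Seconds to wait before retrying, or None if no retry is needed."""
    if status != 503:
        return None
    value = headers.get("Retry-After")
    if value is None:
        return default                    # assumed fallback
    if value.strip().isdigit():           # delta-seconds form, e.g. "120"
        return int(value)
    when = parsedate_to_datetime(value)   # HTTP-date form
    now = now or datetime.now(timezone.utc)
    return max(0, int((when - now).total_seconds()))
```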


Service Provider: Structure (3)

Update Mechanism

  • consolidates metadata harvested earlier with newly harvested metadata (merge old and new data)

  • easiest case: always delete all ‘old’ metadata of an archive before harvesting it again

  • more reasonable: incremental update (from parameter): insert new metadata, overwrite changed metadata and remove deleted records (matched by their unique identifiers)

XML Parser

  • analyses the responses received from the archives

  • validation: using the XML schema

  • transforms the metadata encoded in XML into the internal data structure
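The XML Parser and the incremental Update Mechanism can be combined: parse each ListRecords response, overwrite records by identifier and drop those whose header has `status="deleted"`. A Python sketch with `xml.etree.ElementTree`, which parses but does not validate against the XML schema; the dict-based local store, the function name and the second identifier are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def apply_list_records(xml_text, store):
    """Merge one ListRecords response into store = {identifier: {element: [values]}}."""
    root = ET.fromstring(xml_text)
    for record in root.iterfind("oai:ListRecords/oai:record", NS):
        header = record.find("oai:header", NS)
        identifier = header.findtext("oai:identifier", namespaces=NS)
        if header.get("status") == "deleted":
            store.pop(identifier, None)   # remove deleted records
            continue
        fields = {}
        # metadata -> format container (e.g. oai_dc:dc) -> individual elements
        for elem in record.iterfind("oai:metadata/*/*", NS):
            local_name = elem.tag.split("}")[-1]  # e.g. 'title', 'creator'
            fields.setdefault(local_name, []).append(elem.text)
        store[identifier] = fields        # insert new / overwrite changed
    return store


sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>
<record><header><identifier>oai:HUBerlin.de:3000819</identifier>
<datestamp>2002-12-05</datestamp></header>
<metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
 xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:creator>Schüttlöffel, Antje</dc:creator></oai_dc:dc></metadata></record>
<record><header status="deleted"><identifier>oai:HUBerlin.de:1</identifier>
<datestamp>2002-12-05</datestamp></header></record>
</ListRecords></OAI-PMH>"""

store = {"oai:HUBerlin.de:1": {"title": ["an old record"]}}  # earlier harvest
print(apply_list_records(sample, store))
```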


Service Provider: Structure (4)

Normaliser

  • transforms data from different metadata formats into a homogeneous structure

  • harmonises representation (e.g. date, author, language code)

  • maps / translates different languages

Database

  • maps the XML structure of the metadata onto a relational database (multiple values …)

  • or: use an XML database
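A Normaliser for dates might map the spellings listed under Data Provider: Data Representation onto ISO 8601. A Python sketch covering only three patterns; reading `05.12.2002` as day.month.year (the German convention) is an assumption, and `normalise_date` is a hypothetical name.

```python
import re

_PATTERNS = [
    # already ISO 8601: 2002-12-05
    (re.compile(r"^(\d{4})-(\d{2})-(\d{2})$"), lambda m: m.group(0)),
    # day.month.year: 05.12.2002 (assumed day-first)
    (re.compile(r"^(\d{1,2})\.(\d{1,2})\.(\d{4})$"),
     lambda m: f"{m.group(3)}-{int(m.group(2)):02d}-{int(m.group(1)):02d}"),
    # year only: 2002
    (re.compile(r"^(\d{4})$"), lambda m: m.group(1)),
]


def normalise_date(value):
    """Return an ISO 8601 date (YYYY-MM-DD or YYYY), or None if unrecognised."""
    value = value.strip()
    for pattern, convert in _PATTERNS:
        match = pattern.match(value)
        if match:
            return convert(match)
    return None  # e.g. '2002-xx-xx' needs manual treatment
```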


Service Provider: Structure (5)

Duplication Checker

  • detects and merges identical records from different data providers

  • one approach: a unique identifier for the item (e.g. URN, …)

  • but: often hard to put into practice and not free of risks / errors

Service Module

  • provides the actual service to the ‘public’

  • basis: the harvested and stored records of the associated archives

  • uses only the local database to answer requests etc.
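A Duplication Checker based on unique item identifiers could group harvested records by the URNs found in their `dc:identifier` values. A Python sketch; the record layout, the function name and the example URN are hypothetical, and comparing URNs as plain strings is a simplification.

```python
from collections import defaultdict


def find_duplicates(records):
    """Return {urn: [oai identifiers]} for URNs shared by more than one record.

    records: {oai identifier: {"identifier": [dc:identifier values], ...}}"""
    by_urn = defaultdict(set)
    for oai_id, fields in records.items():
        for value in fields.get("identifier", []):
            value = value.strip()
            if value.lower().startswith("urn:"):   # only URNs are compared
                by_urn[value].add(oai_id)
    return {urn: sorted(ids) for urn, ids in by_urn.items() if len(ids) > 1}


records = {
    "oai:a.example.org:1": {"identifier": ["urn:nbn:de:example-123"]},
    "oai:b.example.org:9": {"identifier": ["urn:nbn:de:example-123", "http://b.example.org/9"]},
    "oai:c.example.org:2": {"identifier": ["http://c.example.org/2"]},
}
print(find_duplicates(records))
```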


Service Provider: Architecture

(Diagram: OAI Service Provider harvesting from several Data Providers)

  • harvester components: scheduler, flow control, XML parser, update mechanism, normaliser, duplication checker

  • database

  • service module

  • actors: users and an administrator


Service Provider: Resumption Token

  • optional from the data provider’s point of view

  • but: mandatory for service providers

  • needed to obtain complete lists: follow the sequence of incomplete lists

    • ‘recognise’ that a response contains an incomplete list

    • issue a new OAI request with the resumptionToken to the data provider in order to get the next part of the list
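The harvester loop that follows resumption tokens can be sketched in Python as a generator. The caller supplies `fetch` (any function returning the response body for a URL), so the sketch makes no assumption about the HTTP client; error responses such as badResumptionToken are not handled, and `harvest` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(fetch, base_url, metadata_prefix="oai_dc"):
    """Yield every <record> of a ListRecords list, following resumption tokens."""
    query = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(query)}"))
        yield from root.iter(f"{OAI}record")
        token = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
        # no element, or an empty one, means the list is complete
        if token is None or not (token.text or "").strip():
            return
        # resumptionToken is exclusive: only verb and the token are sent
        query = {"verb": "ListRecords", "resumptionToken": token.text}
```

For example, `harvest(lambda url: urllib.request.urlopen(url).read(), base_url)` would use the standard library HTTP client.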


Service Provider: Test and Registration

  • harvest registered ( OAI complient!) data providers

  • test behaviour of service provider

  • official registration site

    • http://www.openarchives.org/service/registerasprovider.html

    • provide institutional information

    • web site, email address, ...


Data & Service Provider: Questions?


Tutorial

OAI and OAI-PMH for Beginners

An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting

Part IV: Implementation issues – XML schemas and support for multiple record formats

Andy Powell

UKOLN, University of Bath

[email protected]



Agenda

  • basics

  • XML schema details

  • extending oai_dc for your application

  • using IMS metadata as new record format


Basics

  • OAI-PMH uses XML Schemas to define record formats

  • you can exchange any data you like using OAI-PMH as long as you can encode it as XML and define an XML-Schema for it!

  • OAI-PMH mandates the ‘oai_dc’ XML schema

  • OAI-PMH documentation also describes the use of XML schemas to exchange

    • rfc1807: a schema for rfc1807 format metadata

    • marc21: a recommended schema for MARC21 metadata, provided by the Library of Congress

    • oai_marc: a schema for MARC format metadata


A closer look at oai_dc

  • the simple DC schema used as mandatory record format in OAI-PMH defines a container schema

  • container schema is OAI-specific

  • container schema is hosted on the OAI Web site

  • imports a generic DCMES schema

  • generic DCMES schema is hosted on the DCMI Web site

  • same model likely to be used for ‘qualified’ DC schema – container schema hosted by OAI, generic schema hosted by DCMI


An oai_dc record…

  • an example oai_dc record (viewed via the repository explorer)

  • here’s the full GetRecord response

  • three important things to notice…

  • namespace for the oai_dc format

    • xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"

  • namespace for DCMES elements

    • xmlns:dc="http://purl.org/dc/elements/1.1/"

  • container schema associated with the oai_dc namespace

    • xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"
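The same three points (the oai_dc namespace, the DCMES namespace and the schemaLocation) can be reproduced when generating records. A Python sketch that builds an `oai_dc:dc` container with `xml.etree.ElementTree`; the title value and the function name are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# prefixes used when serialising
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)
ET.register_namespace("xsi", XSI)


def oai_dc_record(fields):
    """Build an <oai_dc:dc> container holding one DCMES element per value."""
    root = ET.Element(f"{{{OAI_DC}}}dc", {
        f"{{{XSI}}}schemaLocation":
            f"{OAI_DC} http://www.openarchives.org/OAI/2.0/oai_dc.xsd"})
    for name, values in fields.items():
        for value in values:
            ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return root


record = oai_dc_record({"title": ["An example title"],          # hypothetical
                        "creator": ["Schüttlöffel, Antje"]})
print(ET.tostring(record, encoding="unicode"))
```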


The XML schemas

  • The oai_dc container schema

    • http://www.openarchives.org/OAI/2.0/oai_dc.xsd

  • imports DCMES schema from

    • http://dublincore.org/schemas/xmls/simpledc20020312.xsd

  • defines a container element called ‘dc’

  • lists the allowed elements within the ‘dc’ container (from the DCMES namespace/schema above)


When oai_dc isn’t enough

  • when the 15 DCMES elements are too limited – e.g. adding extra metadata elements

  • when you need greater precision in your metadata records – e.g. adding ‘encoding schemes’ to existing elements

  • when you want to exchange other metadata formats

    • IMS/IEEE LOM – eLearning metadata

    • ODRL – Open Digital Rights Language


Extending the oai_dc schema

  • simple scenario…

  • RDN currently uses oai_dc schema to exchange records but wants to add one additional element called

    • accessControl

  • note: this is not a real scenario…

    • RDN really wants to use qualified DC records, but qualified DC is too complicated for this tutorial!

    • we hope to write up the RDN work on exchanging qualified DC in a future issue of Ariadne


Step 1 – metadata format name

  • the new metadata format needs a name

  • in this case, we’ve chosen

    • rdn_dc

  • following OAI’s naming of ‘oai_dc’

  • alternative possibilities

    • rdndc

    • rdn

    • etc.


Step 2 – create namespaces

  • two namespaces are required…

  • namespace for the rdn_dc format

    • http://www.rdn.ac.uk/oai/rdn_dc/

  • namespace for the new metadata elements (properties) that we are going to use in this format

    • http://purl.org/rdn/terms/

  • note:

    • use of PURL for the elements namespace follows DCMI usage but is not mandatory

    • however, both these namespace URIs should be under your control to ensure uniqueness and prevent re-use in the future

    • URIs do not need to resolve to anything


Step 3 – local copy of DC schema

  • make local copy of the DCMES schema

  • in this case the copy is at

    • http://www.rdn.ac.uk/oai/rdn_dc/20021204/dc.xsd

  • this step isn’t strictly necessary

  • in fact – it is probably bad practice to do this

  • but there are currently some minor problems with the DCMI-hosted copy of the schema

  • …so working with a local copy is easier


Step 4 – schema for new terms

  • create an XML schema for the new ‘rdnterms’

  • in this case the schema is available at

    • http://www.rdn.ac.uk/oai/rdn_dc/20021204/rdnterms.xsd

  • the schema defines the new element/property

    • accessControl

  • and adds it to the dc:any group

  • also creates a new container type

    • rdnterms:elementContainer

  • note:

    • schema URI contains a date-stamp

    • this should make future enhancements to the schema easier to implement


Step 5 – container schema

  • create a container schema for the new record format

  • in this case the schema is available at

    • http://www.rdn.ac.uk/oai/rdn_dc/20021204/rdn_dc.xsd

  • this simply imports the rdnterms schema

  • then defines a container element called ‘rdndc’ of type

    • rdnterms:elementContainer

  • again, the schema URI contains a date-stamp


Step 6 – validate, validate, val…

  • create some test records using your new schemas

    • http://www.rdn.ac.uk/oai/rdn_dc/20021204/test.xml

    • http://www.rdn.ac.uk/oai/rdn_dc/20021204/oai-test.xml

  • use the XML schema validator at

    • http://www.w3.org/2001/03/webdata/xsv
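A test record for Step 6 can also be generated programmatically and then submitted to the validator. The Python sketch below assumes that the ‘rdndc’ container element belongs to the rdn_dc namespace from Step 2; the namespace prefixes and the element values (title, accessControl) are hypothetical test data.

```python
import xml.etree.ElementTree as ET

RDN_DC = "http://www.rdn.ac.uk/oai/rdn_dc/"   # namespace for the rdn_dc format
RDNTERMS = "http://purl.org/rdn/terms/"        # namespace for the new element
DC = "http://purl.org/dc/elements/1.1/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

for prefix, uri in [("rdn_dc", RDN_DC), ("rdnterms", RDNTERMS), ("dc", DC), ("xsi", XSI)]:
    ET.register_namespace(prefix, uri)

# container element 'rdndc' (assumed to be in the rdn_dc namespace)
root = ET.Element(f"{{{RDN_DC}}}rdndc", {
    f"{{{XSI}}}schemaLocation":
        f"{RDN_DC} http://www.rdn.ac.uk/oai/rdn_dc/20021204/rdn_dc.xsd"})
ET.SubElement(root, f"{{{DC}}}title").text = "A test record"         # hypothetical
ET.SubElement(root, f"{{{RDNTERMS}}}accessControl").text = "free"    # hypothetical

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)  # save as a file and submit it to the schema validator
```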


Step 7 – ListMetadataFormats

  • add information about the new format to your repository’s response to the ‘ListMetadataFormats’ request…

<metadataFormat>

<metadataPrefix>rdn_dc</metadataPrefix>

<schema>http://www.rdn.ac.uk/oai/rdn_dc/20021204/rdn_dc.xsd</schema>

<metadataNamespace>http://www.rdn.ac.uk/oai/rdn_dc/</metadataNamespace>

</metadataFormat>
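If a repository keeps its supported formats in a configuration list, the ListMetadataFormats response can be generated from it. A Python sketch; the schema location uses the 20021204 version named in Step 5, the OAI-PMH namespace is omitted for brevity, and `list_metadata_formats` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Supported formats: (metadataPrefix, schema, metadataNamespace)
FORMATS = [
    ("oai_dc", "http://www.openarchives.org/OAI/2.0/oai_dc.xsd",
     "http://www.openarchives.org/OAI/2.0/oai_dc/"),
    ("rdn_dc", "http://www.rdn.ac.uk/oai/rdn_dc/20021204/rdn_dc.xsd",
     "http://www.rdn.ac.uk/oai/rdn_dc/"),
]


def list_metadata_formats(formats=FORMATS):
    """Build a <ListMetadataFormats> element with one <metadataFormat> per entry."""
    parent = ET.Element("ListMetadataFormats")
    for prefix, schema, namespace in formats:
        fmt = ET.SubElement(parent, "metadataFormat")
        ET.SubElement(fmt, "metadataPrefix").text = prefix
        ET.SubElement(fmt, "schema").text = schema
        ET.SubElement(fmt, "metadataNamespace").text = namespace
    return parent


print(ET.tostring(list_metadata_formats(), encoding="unicode"))
```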


Step 8 – other verbs

  • modify your repository’s responses to the ‘ListIdentifiers’, ‘ListRecords’ and ‘GetRecord’ requests (‘ListSets’ takes no metadataPrefix argument)

  • accept ‘metadataPrefix’ set to the new format name ‘rdn_dc’

  • return records formatted according to the new schema(s)
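Supporting the new metadataPrefix usually means one serialiser per format plus a lookup that answers unknown prefixes with cannotDisseminateFormat. A Python sketch; the serialisers return dicts as stand-ins for XML, and all names are hypothetical.

```python
# Hypothetical serialisers: one function per supported metadataPrefix.
def to_oai_dc(item):
    return {"format": "oai_dc", "title": item["title"]}


def to_rdn_dc(item):
    return {"format": "rdn_dc", "title": item["title"],
            "accessControl": item.get("accessControl")}


SERIALISERS = {"oai_dc": to_oai_dc, "rdn_dc": to_rdn_dc}


class OAIError(Exception):
    """Carries an OAI-PMH error code such as cannotDisseminateFormat."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def disseminate(item, metadata_prefix):
    """Return the item in the requested format, as GetRecord/ListRecords would."""
    try:
        serialise = SERIALISERS[metadata_prefix]
    except KeyError:
        raise OAIError("cannotDisseminateFormat") from None
    return serialise(item)


print(disseminate({"title": "A test record", "accessControl": "free"}, "rdn_dc"))
```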


Step 9 – validate again

  • use the Repository Explorer to check that:

  • all requests work with new ‘metadataPrefix’

  • oai_dc format still works!

  • appropriate records are returned for each format

  • responses validate correctly


Summary

  • decide on name for your new metadata format and appropriate namespaces

  • develop XML schemas for container and new elements if appropriate

  • create test records and validate

  • modify your repository (source code and/or configuration files) to support the new format

  • validate and test repository


Other record formats

  • can take similar approach with other metadata record formats

    • IMS/IEEE LOM

    • ODRL

  • in these cases, XML schemas and namespaces have already been agreed

  • deployment of these formats should be easier because you don’t need to define your own schemas…

    • BUT… XML schema specifications are currently undergoing continual revision, so it is sometimes hard for applications like IMS to keep up!


Adding support for IMS

  • modify the ‘ListMetadataFormats’ response to include

<metadataFormat>

<metadataPrefix>ims</metadataPrefix>

<schema>http://www.imsglobal.org/xsd/imsmd_v1p2p2.xsd</schema>

<metadataNamespace>http://www.imsglobal.org/xsd/imsmd_v1p2</metadataNamespace>

</metadataFormat>

  • extend the responses to ‘ListIdentifiers’, ‘ListRecords’ and ‘GetRecord’ requests (‘ListSets’ takes no metadataPrefix argument)

    • accept ‘metadataPrefix’ set to ‘ims’ and return records formatted appropriately


Tutorial

OAI and OAI-PMH for Beginners

An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting



Summary

  • during today’s tutorial we hope that you have

  • gained an overview of the history behind the OAI-PMH and an overview of its key features

  • been given a deeper technical insight into how the protocol works

  • learned something about some of the main implementation issues

  • found some useful starting points and hints that will help you as implementors


Questions

  • now…

  • feel free to tell us what you didn’t understand

  • and ask general questions (of course!)

Uwe Müller

Humboldt University Berlin, Germany

[email protected]

Andy Powell

UKOLN, University of Bath

[email protected]
