The OAI Protocol for Metadata Harvesting

Andy Powell

UKOLN, University of Bath

IVOA Registry Meeting, London

March 2003

Contents
  • a brief history of OAI
  • 10 technical things you should know about the OAI-PMH
OAI roots
  • the roots of OAI lie in the development of eprint archives…
    • arXiv, CogPrints, NACA (NASA), RePEc, NDLTD, NCSTRL
  • each offered Web interface for deposit of articles and for end-user searches
  • difficult for end-users to work across archives without having to learn multiple different interfaces
  • recognised need for single search interface to all archives
    • Universal Pre-print Service (UPS)
Searching vs. harvesting
  • two possible approaches to building a single search interface to multiple eprint archives…
    • cross-searching multiple archives based on protocol like Z39.50
    • harvesting metadata into one or more ‘central’ services – bulk move data to the user-interface
  • US digital library experience in this area indicated that cross-searching not preferred approach
    • distributed searching of N nodes viable, but only for small values of N
Harvesting requirements
  • in order that harvesting approach can work there need to be agreements about…
    • transport protocols – HTTP vs. FTP vs. …
    • metadata formats – DC vs. MARC vs. …
    • quality assurance – mandatory elements, mechanisms for naming of people, subjects, etc., handling duplicated records, best-practice
    • intellectual property and usage rights – who can do what with the records
  • work in this area resulted in the “Santa Fe Convention”
Development of OAI-PMH
  • two-year metamorphosis through various names
    • Santa Fe Convention, OAI-PMH versions 1.0, 1.1…
    • OAI Protocol for Metadata Harvesting 2.0
  • development steered by international technical committee
  • inter-version stability helped developer confidence
  • move from focus on eprints to more generic protocol
    • move from OAI-specific metadata schema to mandatory support for DC
Bluffer’s guide to OAI

  • OAI-PMH is a low-cost mechanism for harvesting metadata records
    • from ‘data providers’ to ‘service providers’
  • allows ‘service provider’ to say ‘give me some or all of your metadata records’
    • where ‘some’ is based on date-stamps, sets, metadata formats
  • not limited to repositories of eprints
    • images, museum artefacts, learning objects, …
  • based on HTTP and XML
    • simple, Web-friendly, autonomous
    • fast, flexible deployment
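
As an illustration of 'give me some of your metadata records', the sketch below builds a selective ListRecords request over HTTP. The argument names (verb, metadataPrefix, from, until, set) are those of OAI-PMH 2.0; the base URL, the set name 'physics' and the helper list_records_url are made up for this example.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, until_date=None, set_spec=None):
    """Build a ListRecords URL; 'some' records are chosen by date-stamp and set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date      # lower bound on record date-stamps
    if until_date:
        params["until"] = until_date    # upper bound on record date-stamps
    if set_spec:
        params["set"] = set_spec        # restrict the harvest to one set
    return base_url + "?" + urlencode(params)


# Hypothetical repository address and set name, for illustration only.
url = list_records_url("http://repository.example.org/oai",
                       from_date="2003-01-01", set_spec="physics")
print(url)
```

Leaving out from, until and set asks for all records in the chosen metadata format.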
Bluffer’s guide to OAI
  • OAI-PMH is not a search protocol
    • but its use can underpin search-based services built on Z39.50, SRW, SOAP, …
  • OAI-PMH carries only metadata
    • content (e.g. full-text or image) made available separately – typically at URL in metadata
  • mandates simple DC as record format
    • but extensible to any XML format – IMS, ONIX, MARC, METS, etc.
  • extensible framework for metadata about
    • repository, resources, ‘items’, sets
    • can include rights metadata
Bluffer’s guide to OAI
  • metadata and ‘content’ often made freely available – but not a requirement
    • OAI-PMH can be used between closed groups
    • or, can make metadata available but restrict access to content in some way
  • underlying HTTP protocol provides
    • access control – e.g. HTTP BASIC
    • compression mechanisms (for improving performance of harvesters)
    • could, in theory, also provide encryption if required
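
A minimal sketch of how a harvester might lean on these HTTP features, using only the Python standard library. It prepares a request that carries HTTP BASIC credentials and asks for a gzip-compressed response. The URL, user name, password and both helper functions are assumptions for illustration, not part of OAI-PMH itself.

```python
import base64
import gzip
import urllib.request


def build_request(url, username, password):
    """Prepare a request with HTTP BASIC credentials that accepts gzip."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(url, headers={
        "Authorization": "Basic " + token,   # access control
        "Accept-Encoding": "gzip",           # ask for a compressed response
    })


def decode_body(body, content_encoding):
    """Undo gzip compression if the server applied it."""
    if content_encoding == "gzip":
        return gzip.decompress(body)
    return body


# Hypothetical repository and credentials; nothing is sent until urlopen() is called.
request = build_request("http://repository.example.org/oai?verb=Identify",
                        "harvester", "secret")
```

A harvester would pass the request to urllib.request.urlopen() and then call decode_body() with the response body and its Content-Encoding header.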
Resources, items and records

[Figure: diagram relating a resource, its item and its records. Surviving labels: “all available metadata about David”, “item = identifier”, “Dublin Core”.]
Protocol requests
  • six different request types
    • Identify
    • ListMetadataFormats
    • ListSets
    • ListIdentifiers
    • ListRecords
    • GetRecord
  • harvester need not use all types
  • repository must implement all types
  • required and optional arguments
    • on request types
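
The six request types and their arguments can be sketched as a small lookup table. The argument lists below follow OAI-PMH 2.0 as best understood here; the flow-control argument (resumptionToken) is deliberately left out to keep the sketch short, and check_request is a hypothetical helper, not part of any toolkit.

```python
# Required and optional arguments per request type (resumptionToken omitted).
REQUEST_ARGUMENTS = {
    "Identify": {"required": set(), "optional": set()},
    "ListMetadataFormats": {"required": set(), "optional": {"identifier"}},
    "ListSets": {"required": set(), "optional": set()},
    "ListIdentifiers": {"required": {"metadataPrefix"},
                        "optional": {"from", "until", "set"}},
    "ListRecords": {"required": {"metadataPrefix"},
                    "optional": {"from", "until", "set"}},
    "GetRecord": {"required": {"identifier", "metadataPrefix"},
                  "optional": set()},
}


def check_request(verb, arguments):
    """Return a list of problems with a request; an empty list means it looks valid."""
    if verb not in REQUEST_ARGUMENTS:
        return [f"unknown verb: {verb}"]
    spec = REQUEST_ARGUMENTS[verb]
    supplied = set(arguments)
    allowed = spec["required"] | spec["optional"]
    problems = [f"missing required argument: {name}"
                for name in sorted(spec["required"] - supplied)]
    problems += [f"argument not allowed: {name}"
                 for name in sorted(supplied - allowed)]
    return problems


# The identifier here is a made-up example value.
print(check_request("GetRecord", {"identifier": "oai:repository.example.org:1234"}))
```

A repository could run a check of this kind before answering, and a harvester before sending.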
Record structure
  • metadata about a resource in a particular XML format
    • header (mandatory)
      • identifier (1)
      • datestamp (1)
      • setSpec elements (*)
      • status attribute for deleted item (?)
    • metadata (mandatory)
      • XML encoded metadata within root tag which provides namespace and schema
      • repositories must support Dublin Core
    • about (optional)
      • rights statements
      • provenance statements
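
To make the record structure concrete, here is a hedged sketch that parses one record with Python's standard ElementTree module and pulls out the header fields. The sample record, its identifier and the helper summarise_record are invented for illustration; the element names and namespaces are those of OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Made-up sample record: header, metadata, and no 'about' containers.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:repository.example.org:1234</identifier>
    <datestamp>2003-03-01</datestamp>
    <setSpec>physics</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An example eprint</dc:title>
    </oai_dc:dc>
  </metadata>
</record>"""


def summarise_record(xml_text):
    """Extract the header fields and note what the other containers hold."""
    record = ET.fromstring(xml_text)
    header = record.find(OAI + "header")
    metadata = record.find(OAI + "metadata")
    has_metadata = metadata is not None and len(metadata) > 0
    return {
        "identifier": header.findtext(OAI + "identifier"),            # exactly one
        "datestamp": header.findtext(OAI + "datestamp"),              # exactly one
        "sets": [e.text for e in header.findall(OAI + "setSpec")],    # zero or more
        "deleted": header.get("status") == "deleted",                 # optional attribute
        "metadata_root": metadata[0].tag if has_metadata else None,   # namespaced root tag
        "about_count": len(record.findall(OAI + "about")),            # optional containers
    }


summary = summarise_record(SAMPLE_RECORD)
print(summary)
```

The root tag inside the metadata container carries the namespace that tells a harvester which metadata format it has received.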
Dublin Core

  • OAI-PMH mandates use of simple DC as lowest common denominator
  • agreed XML schema – ‘oai_dc’
    • simple DC – 15 metadata properties
    • all DC properties optional and repeatable
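
Because every simple DC property is optional and repeatable, a harvester cannot assume any property is present or occurs only once. The sketch below reads an oai_dc fragment into a dictionary of lists; the sample values, the URL and the helper read_simple_dc are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

# The 15 properties of simple Dublin Core.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]

# Made-up oai_dc fragment: two creators, most properties absent.
SAMPLE_DC = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example eprint</dc:title>
  <dc:creator>First Author</dc:creator>
  <dc:creator>Second Author</dc:creator>
  <dc:identifier>http://repository.example.org/eprints/1234</dc:identifier>
</oai_dc:dc>"""


def read_simple_dc(xml_text):
    """Map each DC property that occurs to the list of its values."""
    root = ET.fromstring(xml_text)
    values = {}
    for name in DC_ELEMENTS:
        found = [element.text for element in root.findall(DC + name)]
        if found:                 # absent properties are simply skipped
            values[name] = found  # repeated properties keep every value
    return values


dc_values = read_simple_dc(SAMPLE_DC)
print(dc_values)
```

In this sample the identifier property holds the URL at which the content itself would be found.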
OAI demonstration
  • repository explorer demo
OAI and Google

[Figure: the DP9 gateway. Surviving labels: “DP9 gateway”, “OAI gateway makes harvested … available to …”.]
Implementing OAI
  • OAI protocol is relatively simple
  • implementation and deployment tends to be very fast
  • lots of available toolkits
    • Java, Perl, PHP, etc.
  • complete tools also available
    • e.g. tools that sit in front of existing databases
  • see ‘tools’ area on the OAI Web site…
Creative Commons

  • CC is “devoted to expanding the range of creative work available for others to build upon and share”
  • provides ‘standard’ licences for content
    • attribution
    • noncommercial
    • no derivative works
    • share alike
  • mechanisms for indicating licence on Web pages
  • need similar mechanism in OAI