The OAI Protocol for Metadata Harvesting

Andy Powell (a.powell@ukoln.ac.uk), UKOLN, University of Bath. IVOA Registry Meeting, London, March 2003.


Presentation Transcript


  1. The OAI Protocol for Metadata Harvesting
     Andy Powell, a.powell@ukoln.ac.uk
     UKOLN, University of Bath
     IVOA Registry Meeting, London, March 2003

  2. Contents
     • a brief history of OAI
     • 10 technical things you should know about the OAI-PMH

  3. OAI roots
     • the roots of OAI lie in the development of eprint archives…
     • arXiv, CogPrints, NACA (NASA), RePEc, NDLTD, NCSTRL
     • each offered a Web interface for deposit of articles and for end-user searches
     • difficult for end-users to work across archives without having to learn multiple different interfaces
     • recognised need for a single search interface to all archives
     • Universal Pre-print Service (UPS)

  4. Searching vs. harvesting
     • two possible approaches to building a single search interface to multiple eprint archives…
     • cross-searching multiple archives using a protocol like Z39.50
     • harvesting metadata into one or more ‘central’ services – bulk-moving the data to the user interface
     • US digital library experience in this area indicated that cross-searching was not the preferred approach
     • distributed searching of N nodes is viable, but only for small values of N

  5. Searching vs. harvesting
     [diagram: two alternative architectures, each with a search service: cross-searching the archives …or… searching harvested metadata]

  6. Harvesting requirements
     • for the harvesting approach to work, there need to be agreements about…
     • transport protocols – HTTP vs. FTP vs. …
     • metadata formats – DC vs. MARC vs. …
     • quality assurance – mandatory elements, mechanisms for naming people, subjects, etc., handling duplicated records, best practice
     • intellectual property and usage rights – who can do what with the records
     • work in this area resulted in the “Santa Fe Convention”

  7. Development of OAI-PMH
     • two-year metamorphosis through various names
     • Santa Fe Convention, OAI-PMH versions 1.0, 1.1…
     • OAI Protocol for Metadata Harvesting 2.0
     • development steered by an international technical committee
     • inter-version stability helped developer confidence
     • move from a focus on eprints to a more generic protocol
     • move from an OAI-specific metadata schema to mandatory support for DC

  8. Bluffer’s guide to OAI (http://www.openarchives.org/)
     • OAI-PMH is a low-cost mechanism for harvesting metadata records
     • from ‘data providers’ to ‘service providers’
     • allows a ‘service provider’ to say ‘give me some or all of your metadata records’
     • where ‘some’ is based on datestamps, sets, metadata formats
     • not limited to repositories of eprints
     • images, museum artefacts, learning objects, …
     • based on HTTP and XML
     • simple, Web-friendly, autonomous
     • fast, flexible deployment
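To make ‘some or all’ concrete: an OAI-PMH request is an HTTP request to a repository's base URL carrying a `verb` plus arguments, and selective harvesting uses the `metadataPrefix`, `from`, `until` and `set` arguments. The Python sketch below only builds such a GET URL; the base URL, the set name `physics` and the helper `build_request` are hypothetical, chosen for illustration.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Hypothetical repository base URL, for illustration only.
BASE_URL = "http://example.org/oai"

def build_request(base_url: str, verb: str,
                  args: Optional[Dict[str, Optional[str]]] = None) -> str:
    """Build an OAI-PMH HTTP GET URL from a verb and its arguments."""
    params = {"verb": verb}
    # Arguments left as None are simply not sent (e.g. no 'until' bound).
    for name, value in (args or {}).items():
        if value is not None:
            params[name] = value
    return f"{base_url}?{urlencode(params)}"

# 'Some' of the records: Dublin Core only, datestamped on or after
# 1 January 2003, and only from a hypothetical set called "physics".
url = build_request(BASE_URL, "ListRecords", {
    "metadataPrefix": "oai_dc",
    "from": "2003-01-01",
    "until": None,
    "set": "physics",
})
print(url)
# prints http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2003-01-01&set=physics
```

Passing the arguments as a dict rather than keyword arguments is deliberate: `from` is a reserved word in Python.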

  9. Bluffer’s guide to OAI
     • OAI-PMH is not a search protocol
     • but it can underpin search-based services built on Z39.50 or SRW or SOAP or…
     • OAI-PMH carries only metadata
     • content (e.g. full text or image) is made available separately – typically at a URL in the metadata
     • mandates simple DC as record format
     • but extensible to any XML format – IMS, ONIX, MARC, METS, etc.
     • extensible framework for metadata about repository, resources, ‘items’, sets
     • can include rights metadata

  10. Bluffer’s guide to OAI
     • metadata and ‘content’ often made freely available – but this is not a requirement
     • OAI-PMH can be used between closed groups
     • or can make metadata available but restrict access to content in some way
     • underlying HTTP protocol provides
         • access control – e.g. HTTP Basic authentication
         • compression mechanisms (for improving performance of harvesters)
         • could, in theory, also provide encryption if required
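Because access control and compression come from HTTP rather than from OAI-PMH itself, a harvester only needs ordinary HTTP headers to use them. A minimal Python sketch, assuming a repository protected by HTTP Basic authentication that may gzip its responses; the helper names, endpoint and credentials are hypothetical.

```python
import base64
import gzip
import urllib.request
from typing import Optional

def make_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Prepare an OAI-PMH GET request using HTTP Basic auth and asking for gzip."""
    # HTTP Basic: base64-encode "user:password" into the Authorization header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={
        "Authorization": f"Basic {token}",
        "Accept-Encoding": "gzip",  # the server may compress, but need not
    })

def decode_body(body: bytes, content_encoding: Optional[str]) -> str:
    """urllib does not decompress automatically, so undo gzip if it was applied."""
    if content_encoding == "gzip":
        body = gzip.decompress(body)
    return body.decode("utf-8")

# Usage against a hypothetical endpoint (not executed here):
# req = make_request("http://example.org/oai?verb=Identify", "harvester", "secret")
# with urllib.request.urlopen(req) as resp:
#     xml_text = decode_body(resp.read(), resp.headers.get("Content-Encoding"))
```

Basic authentication sends credentials only base64-encoded, not encrypted, so in practice it would be paired with the encryption the slide mentions (HTTPS).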

  11. Resources, items and records
     [diagram: a resource (David); the item, i.e. all available metadata about David, named by an identifier; and records, each the item's metadata in one format: Dublin Core metadata, MARC metadata, SPECTRUM metadata]

  12. Protocol requests
     • six different request types
         • Identify
         • ListMetadataFormats
         • ListSets
         • ListIdentifiers
         • ListRecords
         • GetRecord
     • harvester need not use all types
     • repository must implement all types
     • required and optional arguments on request types
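On the repository side, the ‘required and optional arguments’ point means each incoming request must be checked against its verb. The Python sketch below encodes the argument lists of OAI-PMH 2.0 as we understand them, including `resumptionToken` (used to continue long lists), which is an exclusive argument that must appear alone. `badVerb` and `badArgument` are error codes defined by the protocol; the table and `check_request` are hypothetical names, and the sketch does not validate token contents or detect repeated arguments.

```python
from typing import Dict, Optional, Set, Tuple

# verb -> (required arguments, optional arguments, exclusive argument or None)
ARGS: Dict[str, Tuple[Set[str], Set[str], Optional[str]]] = {
    "Identify":            (set(), set(), None),
    "ListMetadataFormats": (set(), {"identifier"}, None),
    "ListSets":            (set(), set(), "resumptionToken"),
    "ListIdentifiers":     ({"metadataPrefix"}, {"from", "until", "set"}, "resumptionToken"),
    "ListRecords":         ({"metadataPrefix"}, {"from", "until", "set"}, "resumptionToken"),
    "GetRecord":           ({"identifier", "metadataPrefix"}, set(), None),
}

def check_request(params: Dict[str, str]) -> Optional[str]:
    """Return an OAI-PMH error code for a malformed request, or None if it is valid."""
    verb = params.get("verb")
    if verb not in ARGS:
        return "badVerb"
    required, optional, exclusive = ARGS[verb]
    args = set(params) - {"verb"}
    # An exclusive argument (resumptionToken) must be the only argument.
    if exclusive is not None and exclusive in args:
        return None if args == {exclusive} else "badArgument"
    # Every required argument present, and nothing beyond required + optional.
    if not required <= args or not args <= required | optional:
        return "badArgument"
    return None
```

For example, `check_request({"verb": "GetRecord", "identifier": "oai:example.org:1"})` returns `"badArgument"` because `metadataPrefix` is missing.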

  13. Record structure
     • metadata about a resource in a particular XML format
     • header (mandatory)
         • identifier (1)
         • datestamp (1)
         • setSpec elements (*)
         • status attribute for deleted item (?)
     • metadata (mandatory)
         • XML-encoded metadata within a root tag which provides namespace and schema
         • repositories must support Dublin Core
     • about (optional)
         • rights statements
         • provenance statements
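The header cardinalities above (exactly one identifier and datestamp, any number of setSpec elements, an optional status attribute) map directly onto a small parser. A minimal Python sketch, assuming records in the OAI-PMH 2.0 namespace; the sample identifier and set name are hypothetical. The sample is a deleted record, which in OAI-PMH 2.0 carries a header but no metadata part.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple

OAI = "{http://www.openarchives.org/OAI/2.0/}"

class Header(NamedTuple):
    identifier: str       # exactly one
    datestamp: str        # exactly one
    set_specs: List[str]  # zero or more
    deleted: bool         # from the optional status attribute

def parse_header(record_xml: str) -> Header:
    """Extract the mandatory header fields from one <record> element."""
    header = ET.fromstring(record_xml).find(f"{OAI}header")
    return Header(
        identifier=header.findtext(f"{OAI}identifier"),
        datestamp=header.findtext(f"{OAI}datestamp"),
        set_specs=[s.text for s in header.findall(f"{OAI}setSpec")],
        deleted=header.get("status") == "deleted",
    )

# Hypothetical deleted record: header only, no <metadata> element.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header status="deleted">
    <identifier>oai:example.org:42</identifier>
    <datestamp>2003-03-01</datestamp>
    <setSpec>physics</setSpec>
  </header>
</record>"""

print(parse_header(SAMPLE_RECORD))
```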

  14. Dublin Core (http://dublincore.org/)
     • OAI-PMH mandates use of simple DC as lowest common denominator
     • agreed XML schema – ‘oai_dc’
     • simple DC – 15 metadata properties
     • all DC properties optional and repeatable
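Since every simple DC property is optional and repeatable, a natural in-memory shape for an `oai_dc` record is a map from property name to a list of values, with absent properties simply missing. A minimal Python sketch using the `oai_dc` and DC element namespaces; the sample record and its URL are hypothetical (real records usually also carry an `xsi:schemaLocation`, omitted here). The `identifier` value illustrates the earlier point that content is typically reached via a URL in the metadata.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List

DC_NS = "http://purl.org/dc/elements/1.1/"

def parse_oai_dc(xml_text: str) -> Dict[str, List[str]]:
    """Map each DC property present to all of its values, in document order."""
    root = ET.fromstring(xml_text)  # the <oai_dc:dc> element
    values: Dict[str, List[str]] = defaultdict(list)
    prefix = "{" + DC_NS + "}"
    for el in root:
        if el.tag.startswith(prefix):
            values[el.tag[len(prefix):]].append((el.text or "").strip())
    return dict(values)

# Hypothetical record: 'subject' repeats, most properties are absent.
SAMPLE_DC = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                          xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>The OAI Protocol for Metadata Harvesting</dc:title>
  <dc:creator>Powell, Andy</dc:creator>
  <dc:subject>metadata</dc:subject>
  <dc:subject>harvesting</dc:subject>
  <dc:identifier>http://example.org/slides.ppt</dc:identifier>
</oai_dc:dc>"""

print(parse_oai_dc(SAMPLE_DC))
```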

  15. OAI demonstration
     • repository explorer demo

  16. OAI and Google
     [diagram: eprint archive(s), Web site(s) and multimedia database(s) exposed via OAI to the DP9 gateway; the gateway makes harvested metadata available to Google…]

  17. Implementing OAI
     • OAI protocol is relatively simple
     • implementation and deployment tend to be very fast
     • lots of available toolkits
         • Java, Perl, PHP, etc.
     • complete tools also available
         • e.g. tools that sit in front of existing databases
     • see ‘tools’ area on the OAI Web site…
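As a rough indication of how little code a basic harvester needs, here is a ListRecords loop in Python. It relies on the `resumptionToken` element that OAI-PMH 2.0 uses to split long lists across several responses; an empty or missing token marks the last page. `harvest` and the injected `fetch` callable are hypothetical names, and a production harvester would also need HTTP error handling and retries.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def harvest(base_url: str, fetch: Callable[[str], str],
            metadata_prefix: str = "oai_dc") -> Iterator[ET.Element]:
    """Yield every <record> from ListRecords, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        list_records = root.find(f"{OAI}ListRecords")
        if list_records is None:  # e.g. an <error code="noRecordsMatch"/> response
            return
        yield from list_records.findall(f"{OAI}record")
        token = list_records.findtext(f"{OAI}resumptionToken")
        if not token:  # missing or empty token: the list is complete
            return
        # resumptionToken is exclusive, so it replaces all other arguments.
        params = {"verb": "ListRecords", "resumptionToken": token}

# A real fetch could be, for a hypothetical endpoint:
#   fetch = lambda url: urllib.request.urlopen(url).read().decode("utf-8")
#   for record in harvest("http://example.org/oai", fetch): ...
```

Injecting `fetch` keeps the loop independent of the HTTP layer, so the same code can run over plain requests, authenticated ones, or canned responses in a test.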

  18. Creative Commons (http://www.creativecommons.org/)
     • CC is “devoted to expanding the range of creative work available for others to build upon and share”
     • provides ‘standard’ licences for content
         • attribution
         • noncommercial
         • no derivative works
         • share alike
     • mechanisms for indicating licence on Web pages
     • need similar mechanism in OAI

  19. Questions…
