The OAI Protocol for Metadata Harvesting

the OAI Protocol for Metadata Harvesting: an update. Herbert Van de Sompel, Los Alamos National Laboratory – Research Library.


the OAI Protocol for Metadata Harvesting

an update

Herbert Van de Sompel

Los Alamos National Laboratory – Research Library


The Open Archives Initiative has been set up to create a forum to discuss and solve matters of interoperability between preprint solutions, as a way to promote their global acceptance.

Paul Ginsparg, Rick Luce & Herbert Van de Sompel



  • 2 core motivations

    • as a systems librarian: change the system

    • as a researcher: find (technical) ways to facilitate the change



as a systems librarian

  • optimizing the output

  • the input is far from optimal


eprint systems

  • xxx e-print archive

    • (Physics - 1991 - Los Alamos - Ginsparg)

  • RePEc

    • (Economics - Surrey U - Krichel)

  • NCSTRL

    • (Computer Science - Cornell U - Lagoze)

  • NDLTD

    • (Theses - Virginia Tech - Fox)

  • CogPrints

    • (Cognitive Sciences - Southampton U - Harnad)


  • as a researcher

    • e-prints are an attractive building block in the ongoing transformation of scholarly communication

    • but interoperability could increase the impact of e-prints:

      • amongst e-print solutions

      • with building blocks that implement other functions of scholarly communication

      • with the established communication system


    Universal Preprint Service (UPS) prototype: e-print discovery

    • 1999: Van de Sompel, Krichel, Nelson

    • results:

      • insight into how poorly the systems interoperated

      • a cross-repository searching and linking service

      • recommendations to the Santa Fe meeting:

        • data provider / service provider model

        • metadata harvesting

        • simplicity


    evolution towards OAI-PMH v.2.0

    • Santa Fe Convention [02/2000]

    • OAI-PMH 1.0 [01/2001]

    • OAI-PMH 2.0 [06/2002]


    |           | Santa Fe convention | OAI-PMH v.1.0/1.1       | OAI-PMH v.2.0           |
    |-----------|---------------------|-------------------------|-------------------------|
    | nature    | experimental        | experimental            | stable                  |
    | verbs     | Dienst              | OAI-PMH                 | OAI-PMH                 |
    | requests  | HTTP GET/POST       | HTTP GET/POST           | HTTP GET/POST           |
    | responses | XML                 | XML                     | XML                     |
    | transport | HTTP                | HTTP                    | HTTP                    |
    | metadata  | OAMS                | unqualified Dublin Core | unqualified Dublin Core |
    | about     | eprints             | document-like objects   | resources               |
    | model     | metadata harvesting | metadata harvesting     | metadata harvesting     |


    OAI-PMH model

    • service provider: a harvester that sends OAI-PMH requests

    • data provider: a repository that returns replies

    • 6 OAI-PMH requests


    OAI-PMH model: harvester (service provider) and repository (data provider)

    • Supporting protocol requests:

      • Identify

      • ListMetadataFormats

      • ListSets

    • Harvesting protocol requests:

      • ListRecords

      • ListIdentifiers

      • GetRecord
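Each of the six requests travels as an HTTP GET (or POST) whose verb parameter names the request. A minimal sketch of building GET URLs, assuming a hypothetical base URL http://example.org/oai; the argument rules follow OAI-PMH v.2.0, and the exclusive resumptionToken argument is left out to keep it short:

```python
from urllib.parse import urlencode

# (required, optional) arguments per verb in OAI-PMH v.2.0; the exclusive
# resumptionToken argument is deliberately left out of this sketch.
VERB_ARGS = {
    "Identify": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListSets": (set(), set()),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
}


def build_request(base_url: str, verb: str, **args: str) -> str:
    """Return the HTTP GET URL for a single OAI-PMH request."""
    if verb not in VERB_ARGS:
        raise ValueError(f"unknown verb: {verb}")
    # 'from' is a Python keyword, so callers spell it from_
    args = {("from" if key == "from_" else key): value for key, value in args.items()}
    required, optional = VERB_ARGS[verb]
    missing = required - set(args)
    if missing:
        raise ValueError(f"{verb} requires {sorted(missing)}")
    unexpected = set(args) - required - optional
    if unexpected:
        raise ValueError(f"{verb} does not accept {sorted(unexpected)}")
    return f"{base_url}?{urlencode({'verb': verb, **args})}"
```

For example, build_request("http://example.org/oai", "ListRecords", metadataPrefix="oai_dc", from_="2002-06-01") asks for records changed since that date; checking arguments client-side catches a missing metadataPrefix before the repository would answer with a badArgument error.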


    OAI-PMH model: harvester (service provider) and repository (data provider)

    • Records, each characterized by an Identifier, a Datestamp, and Set membership
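In an OAI-PMH v.2.0 response, the identifier, datestamp and set membership of each record sit in its header element. A minimal sketch of reading them with Python's standard ElementTree, assuming a trimmed, invented ListRecords response (the example.org identifiers and the physics set are hypothetical, and the metadata parts of the records are omitted):

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical, trimmed ListRecords response: only record headers are shown.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-06-15T10:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2002-06-01</datestamp>
        <setSpec>physics</setSpec>
      </header>
    </record>
    <record>
      <header>
        <identifier>oai:example.org:2</identifier>
        <datestamp>2002-06-02</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""


def record_headers(xml_text: str) -> list[dict]:
    """Extract identifier, datestamp and set membership from each record header."""
    root = ET.fromstring(xml_text)
    headers = []
    for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
        headers.append({
            "identifier": header.findtext("oai:identifier", namespaces=OAI_NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=OAI_NS),
            # a record may belong to zero, one or several sets
            "sets": [s.text for s in header.findall("oai:setSpec", OAI_NS)],
        })
    return headers
```

The datestamp is what makes selective harvesting work: a harvester remembers the latest datestamp it has seen and passes it as the from argument on its next visit.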


    federated services

    [Diagram: A&I, image, full-text (FTXT), OPAC, and e-print collections]


    metadata harvesting via OAI-PMH

    [Diagram: a harvester collects metadata from A&I, image, OPAC, e-print, and full-text (FTXT) collections]
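A harvester like the one on this slide rarely gets everything in one reply: in OAI-PMH v.2.0 a data provider may split a long ListIdentifiers or ListRecords answer into pages, handing back a resumptionToken that the harvester sends to get the next page. A minimal sketch of that loop, assuming a caller-supplied fetch function (hypothetical; it takes a URL and returns the response body) so the logic can be tried without a live repository:

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest_identifiers(base_url: str, fetch: Callable[[str], str],
                        metadata_prefix: str = "oai_dc") -> Iterator[str]:
    """Yield every identifier from a ListIdentifiers harvest, page by page."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        # fetch is injected so the loop can be exercised without a network
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        for ident in root.iterfind(".//oai:header/oai:identifier", NS):
            yield ident.text
        token = root.find(".//oai:resumptionToken", NS)
        # no token, or an empty one, means the list is complete
        if token is None or not (token.text or "").strip():
            return
        # follow-up requests carry only the verb and the resumptionToken
        params = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}
```

Against a real repository, fetch could be lambda url: urllib.request.urlopen(url).read().decode("utf-8"). Error replies such as noRecordsMatch are not handled in this sketch.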


    metadata harvesting via OAI-PMH

    [Diagram: metadata (Author, Title, Abstract, Identifier) harvested from A&I, image, full-text (FTXT), e-print, and OPAC collections]
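Fields like Author, Title, Abstract and Identifier travel as unqualified Dublin Core, the oai_dc format that OAI-PMH v.1.0/1.1 and v.2.0 require every repository to support. A minimal sketch of serializing one record, assuming a hypothetical mapping from the slide's field names to Dublin Core elements (Author to creator, Abstract to description); a conforming record would also carry an xsi:schemaLocation attribute, omitted here:

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
# Serialize with the conventional prefixes instead of ns0/ns1
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

# Hypothetical mapping from the slide's field names to Dublin Core elements
FIELD_TO_DC = {
    "Author": "creator",
    "Title": "title",
    "Abstract": "description",
    "Identifier": "identifier",
}


def to_oai_dc(fields: dict[str, list[str]]) -> str:
    """Serialize one record's fields as an oai_dc metadata element."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for field, values in fields.items():
        # every unqualified Dublin Core element is optional and repeatable
        for value in values:
            ET.SubElement(root, f"{{{DC}}}{FIELD_TO_DC[field]}").text = value
    return ET.tostring(root, encoding="unicode")
```

Keeping values as lists reflects the repeatability of Dublin Core elements: a paper with three authors yields three dc:creator elements rather than one concatenated string.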


    issue solved?

    • no: harvesting addresses only a tiny part of the technical challenges of supporting discovery

    • many more technical issues

    • even more non-technical issues


    issue solved? technical

    [Diagram: interoperable grid linking registration, certification, awareness, archiving, and rewarding]


    issue solved? non-technical

    • I am happy to leave those to you

    • but even for non-technical issues, part of the answer might be found in applying technology


    indicators of adoption of OAI-PMH

    • data providers

    • service providers

    • tools

    • structural support


    data providers

    • 49 registered repositories [11/2001]

    • 65 registered repositories [03/2002]

      • 5+ million records

    • many unregistered repositories


    service providers

    • Arc: cross-searching of registered repositories [Old Dominion U]

      • [ http://arc.cs.odu.edu ]

    • OLAC: cross-searching of Language Archive Community repositories

      • [ http://www.language-archives.org/index.html ]


    service providers

    • Scirus scientific search engine [Elsevier]

      • [ http://www.scirus.com ]

    • my.OAI: user-tailorable cross-searching of registered repositories [FS Consulting, Inc.]

      • [ http://www.myoai.com ]

    • growing interest from web search engines


    OAI-PMH tools

    • Repository Explorer: interactive exploration of repositories [Virginia Tech]

      • [ http://www.purl.org/NET/oai_explorer ]

    • eprints.org: generic OAI-PMH compliant repository software [U of Southampton]

      • [ http://www.eprints.org ]

    • ALCME repository and harvester software [OCLC]

      • [ http://alcme.oclc.org/index.html ]


    OAI-PMH flies: structural support

    • Metadata Harvesting Initiative of the Mellon Foundation

    • NSDL (NSF funded)

    • UK FAIR call for proposals to support disclosure of institutional assets (papers, learning materials, etc.)

    • Institute of Museum and Library Services

    • several EC projects exploring/supporting usage of OAI-PMH: TEL, Leaf, Cyclades, OA Forum


    OAI-PMH flies: and also …

    • Australian Museums Online & CIMI: OAI conference

    • NIMH white paper on data archiving for Animal Cognition Research

    • Library of Congress

    • National Library of Canada

    • OCLC thesis database

    • Illinois State Library Catalogue


    future

    • OAI

    • OAI-PMH

    • communities

    • adoption


    the OAI-PMH

    • release of OAI-PMH v.2.0 [06/2002]

      • not backwards compatible with v.1.0/1.1

      • stable

      • migration process for registered repositories

    • ? formal standardization ?

    • ? SOAP version ~ web services framework [SOAP, WSDL, UDDI] ?


    communities

    • proliferation of community-specific add-ons for:

      • collection & set level metadata

      • expressive metadata formats (e.g. qualified DC XML Schema)

      • shared set-structures

      • machine readable rights (about the metadata)


    adoption

    • evolution

      • from talking about OAI-PMH

      • to talking about projects that use OAI-PMH

      • to talking about projects and failing to mention they use OAI-PMH

      • => OAI-PMH becomes part of the infrastructure


    I just wanted to report what I consider an OAI success. I discovered that RLG had harvested records for two of the American Memory collections I had made available and integrated them into their Cultural Materials Initiative service without the need for a single e-mail or phone call. They reported that it was working very well for them.

    [Caroline Arms, Library of Congress]


    http://www.openarchives.org


    the OAI: not really an organization

    • Executive: Carl Lagoze & Herbert Van de Sompel

    • 2000 – 2002 funding from CNI and DLF

    • Steering Committee

    • Technical Committee:

      • protocol revision & stabilization

    • Alpha testers


    OAI-tech

    US representatives

    Thomas Krichel (Long Island U) - Jeff Young (OCLC) - Tim Cole (U of Illinois at Urbana-Champaign) - Hussein Suleman (Virginia Tech) - Simeon Warner (Cornell U) - Michael Nelson (NASA) - Caroline Arms (LoC) - Muhammad Zubair (Old Dominion U) - Steven Bird (U Penn.)

    European representatives

    Andy Powell (Bath U. & UKOLN) - Mogens Sandfaer (DTV) - Thomas Baron (CERN) - Les Carr (U of Southampton)


    OAI-PMH 2.0 alpha testers (1/2)

    • The British Library

    • Cornell U. -- NSDL project & e-print arXiv

    • Ex Libris

    • FS Consulting Inc -- harvester for my.OAI

    • Humboldt-Universität zu Berlin

    • InQuirion Pty Ltd, RMIT University

    • Library of Congress

    • NASA

    • OCLC


    OAI-PMH 2.0 alpha testers (2/2)

    • Old Dominion U. -- ARC , DP9

    • U. of Illinois at Urbana-Champaign

    • U. of Southampton -- OAIA, CiteBase, eprints.org

    • UCLA, Johns Hopkins U., Indiana U., NYU -- sheet music collection

    • UKOLN, U. of Bath -- RDN

    • Virginia Tech -- repository explorer

