Harvesting and resolution methods for building oai based services
Download
1 / 26

Harvesting and Resolution Methods for Building OAI-based Services - PowerPoint PPT Presentation


  • 119 Views
  • Uploaded on

Harvesting and Resolution Methods for Building OAI-based Services. Jeffrey A. Young jyoung@oclc.org CERN OAI3 Workshop# 4 Geneva, Switzerland 14 February 2004. Introductions. Name Affiliation Plans Needs Technical experience. Review OAI-PMH Protocol. Identify ListSets

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Harvesting and Resolution Methods for Building OAI-based Services' - aquarius


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Harvesting and resolution methods for building oai based services

Harvesting and Resolution Methodsfor Building OAI-based Services

Jeffrey A. Youngjyoung@oclc.org

CERN OAI3 Workshop# 4Geneva, Switzerland14 February 2004


Introductions
Introductions

  • Name

  • Affiliation

  • Plans

  • Needs

  • Technical experience


Review oai pmh protocol
Review OAI-PMH Protocol

  • Identify

  • ListSets

  • ListMetadataFormats

  • ListRecords

  • ListIdentifiers

  • GetRecord


Find repositories to harvest
Find Repositories to Harvest

  • http://www.openarchives.org/Register/BrowseSites.pl

  • http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai

  • http://oai.grainger.uiuc.edu/registry/

  • Friends lists

  • Communities (e.g. www.ndltd.org)


Exercise getting started
Exercise: Getting Started

  • What are your data sources?

  • How will you add value?

  • Who will design the system?

  • Who will create/operate the software?

  • Who will create/maintain the data?

  • Who will advocate for it politically?

  • Who will benefit?

  • Who will pay?


Metadata
Metadata

  • Metadata is data about data

  • Metadata formats: two extremes

    • Dublin Core

    • MARC

  • Metadata can be relative

    • Who created this document?

    • Who created the metadata about this document?

  • Keep in mind, though, that OAI works just as well for sharing XML content


Xml dtd xsd xsl
XML/DTD/XSD/XSL

  • XML - eXtensible Markup Language

  • DTD - Document Type Definition

  • XSD - XML Schema Definition

  • XSL - eXtensible Stylesheet Language


Extensible markup language
eXtensible Markup Language

  • Meta-markup language

  • HTML – Hypertext markup language

  • XHTML – eXtensible hypertext markup language


Xml overview
XML Overview

  • Well-formed XML

  • XML Namespaces

  • Valid XML

    • DTDs

    • XML Schemas

  • OAI Items vs. Records

    • Item identifiers

    • Multiple metadata record representations


Xml namespaces
XML Namespaces

  • Ambiguous XML Elements

    • <wind>NNE</wind>

    • <wind>Clockwise</wind>

  • Prefixes help identify and differentiate elements

    • <weather:wind>SE</weather:wind>

    • <toy:wind>Widdershins</toy:wind>

  • But, prefixes are arbitrary and potentially ambiguous, so what we really need is a URI (ie. prefixes are a local shorthand for the URI)

    • <weather:wind xmlns:weather=“someURI”>NW</weather:wind>


Xml schema definition
XML Schema Definition

  • Defines what an XML document contains

    • XHTML

    • oai_dc

    • MARC21 XML


What is our item
What is our “item”?

  • Work – a distinct intellectual or artistic creation

    • J.S. Bach’s The art of the fugue

  • Expression – the intellectual or artistic realization of a work

    • The composer’s score for organ

    • An arrangement for chamber orchestra by Anthony Lewis

  • Manifestation – The physical embodiment of an expression of a work

    • CD, printed score, multimedia kit, etc.

  • Item – A single exemplar of a manifestation


Exercise data definition
Exercise: Data Definition

  • Design a metadata format for items in your project

    • List the elements you need

    • Consider the encoding rules

    • Consider using controlled vocabularies

  • Assign an XML namespace

  • Map a crosswalk to Dublin Core

  • Create a sample item with both formats

    • Consider assigning OAI sets

  • Report issues, problems, and concerns


Exercise a simple harvester
Exercise: A Simple Harvester

  • XOAIHarvester – a simple harvester written in XSLT

  • http://errol.oclc.org/oai:xmlregistry.oclc.org:xoai/xoaiharvester.xsl

  • The purpose of the Perl script is to manage incremental harvesting

  • Caveat! OAI is merely the first step. Once data is harvested, OAI provides absolutely no guidance for doing something useful with it.


Concerns
Concerns

  • Data quality

  • Duplicates

  • Intellectual Property Rights (IPR)

  • The appropriate copy problem

  • Persistence


Repository variables
Repository Variables

  • MetadataPrefix

    • oai_dc – the lowest common denominator

  • Set

    • Hierarchical

    • Allows selective harvesting

    • Work best with community agreement

    • Client warrant


Exercise select create tools
Exercise: Select/Create Tools

  • http://www.oaforum.org/oaf_db/list_db/list_software.php

  • http://www.openarchives.org/tools/tools.html

  • http://www.cs.cornell.edu/people/simeon/software/utf8conditioner/

  • http://harvest.physik.uni-oldenburg.de/dc/index.html


An alternative service model
An Alternative Service Model

  • ERRoLs are URLs to content and services related to repositories in the OAI Registry at UIUC

  • http://errol.oclc.org/


Discussion
Discussion

  • Issues, Problems, Concerns?


Music services
Music Services

  • Organizational issues

  • Cultural issues

  • Collection policies

  • Best practices

  • Consensus-building

  • Controlled vocabularies

    • http://alcme.oclc.org/gsafd/

  • Do items represent digital and/or physical entities?

  • Authority control


Repository descriptors
Repository Descriptors

  • Repository-level “description” elements

    • oai-identifier description – identifier layout

    • eprints description – content & policies

    • friends description – discover repositories

    • branding description – branding information

    • olac-archive description – archive info

  • Record-level “about” elements

    • Rights statements

    • Provenance statements


Xslt overview
XSLT Overview

XML

Document

XSLT

Processor

XML

Document

XSLT

Stylesheet


Validate repositories
Validate Repositories

  • http://www.openarchives.org/data/registerasprovider.html

  • http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai

  • http://www.w3.org/2001/03/webdata/xsv


Example service providers
Example Service Providers

  • ARC - A Cross Archive Search Service (experimental research service)http://arc.cs.odu.edu/

  • Dokumenten- und Publikationsserver der Humboldt-Universität zu Berlin (search service, German language user interface)http://edoc.hu-berlin.de/oaisearch/

  • iCite (citation index)http://icite.sissa.it/

  • NCSTRL—Networked Computer Science Technical Reference Library (search engine)http://www.ncstrl.org/

  • my.OAI (value-added search interface to a selected list of metadata databases)http://www.myoai.com/

  • Physnet (simple search interface to an experimental OAI harvester)http://physnet.uni-oldenburg.de/oai/query.php

  • ProPrint (printing-on-demand service, German and English language user interfaces offered)http://www.proprint-service.de/


Resources
Resources

  • http://www.openarchives.org/

  • http://www.oaforum.org/


Everything you need to know
Everything you need to know

  • http://www.oaforum.org/otherfiles/oaf_d23_technical2.pdf