

The Basics of OAI

An Introduction to the Protocol for Metadata Harvesting

Sarah Shreeves

University of Illinois at Urbana-Champaign

Basics and Beyond

July 27, 2004


Outline

  • What the OAI protocol is & what it is not

  • Place in digital library infrastructure

  • How it works (basically)

  • Challenges for data / service providers



OAI-PMH is a tool

  • Moves metadata (not content) from a data provider to a service provider (or harvester)

  • A set of rules that defines the communication between two systems (like FTP and HTTP)

  • Build once, use for many applications – a building block for digital library services

    • Facilitates the federation of metadata


OAI-PMH is not…

  • Metadata

  • A search tool

  • A database

  • Open Access


Who uses OAI?

  • Approximately 400 data providers

  • Basic building block of the National Science Digital Library (NSDL); OAIster

  • Incorporated into DSpace and Eprints.org

  • Part of CONTENTdm, Michigan’s DLXS, and other products

  • International use


Basic OAI-PMH Concepts

  • “Aggregated search” rather than “Federated search”

  • Data providers – support OAI-PMH as a means to expose metadata

  • Service providers – ‘harvest’ metadata from data providers via OAI-PMH

  • OAI-PMH based upon HTTP and XML

  • OAI-PMH requires use of simple Dublin Core

    • BUT supports and encourages use of other metadata schemas

  • Unique and Persistent Identifiers and a Datestamp for each OAI record



[Diagram. Labels: digital management system, database, and XML files (data sources); OAI data providers; OAI harvester; aggregated metadata; services. Arrows: OAI requests go from the harvester to each data provider, and OAI responses come back.]


Examples of OAI Service Providers

  • OAIster: http://oaister.umdl.umich.edu/o/oaister/

  • Engineering, Computer Science, and Physics: http://g118.grainger.uiuc.edu/engroai/

  • Open Language Archives Community: http://www.language-archives.org/


How OAI Works (Technically)

Service Provider Data Provider

  • 6 distinct ‘verbs’ or requests

  • OAI requests are sent via HTTP

  • Responses are sent in valid XML

[Diagram. The OAI harvester (service provider, holding aggregated metadata) sends an HTTP request (OAI verb) to the OAI data provider (in front of a digital management system and its metadata), which returns an HTTP response (valid XML).]


An OAI Record

<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:docsouth.unc.edu:12</identifier>
    <datestamp>2003-04-24T13:15:52Z</datestamp>
    <setSpec>4</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"
               xmlns="http://purl.org/dc/elements/1.1/"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
      <title>Advice to Soldiers</title>
      <creator>William Royal</creator>
      <subject>United States -- History -- Civil War, 1861-1865 -- Religious aspects.</subject>
      <subject>Confederate States of America -- Religion.</subject>
      <subject>Soldiers -- Religious life -- Confederate States of America.</subject>
      <subject>Soldiers -- Confederate States of America -- Conduct of life.</subject>
      <subject>Confederate States of America -- Church history.</subject>
      <subject>Sin.</subject>
      <publisher>[Raleigh, N. C.: s. n., between 1861 and 1865]</publisher>
      <date>2003-04-24T13:15:52Z</date>
      <type>Text</type>
      <format>text/html</format>
      <identifier>http://docsouth.unc.edu/royal/royal.html</identifier>
      <language>en-us</language>
    </oai_dc:dc>
  </metadata>
</record>
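As a rough illustration of how a harvester might read such a record, the sketch below parses an abbreviated copy of it with Python's standard library. The function name `parse_record` and the shape of the returned dictionary are assumptions made for this example and are not part of the protocol.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the record itself.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
}

# Abbreviated copy of the record from the slide (two of the six subjects kept).
RECORD_XML = """
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:docsouth.unc.edu:12</identifier>
    <datestamp>2003-04-24T13:15:52Z</datestamp>
    <setSpec>4</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns="http://purl.org/dc/elements/1.1/"
               xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
      <title>Advice to Soldiers</title>
      <creator>William Royal</creator>
      <subject>Confederate States of America -- Religion.</subject>
      <subject>Sin.</subject>
      <identifier>http://docsouth.unc.edu/royal/royal.html</identifier>
    </oai_dc:dc>
  </metadata>
</record>
"""


def parse_record(xml_text):
    """Split one OAI record into header fields and repeatable DC fields."""
    record = ET.fromstring(xml_text.strip())
    header = record.find("oai:header", NS)
    dc = record.find("oai:metadata/oai_dc:dc", NS)
    fields = {}
    for element in dc:
        # Tags look like "{namespace}title"; keep only the local name.
        name = element.tag.split("}", 1)[1]
        fields.setdefault(name, []).append(element.text)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "sets": [s.text for s in header.findall("oai:setSpec", NS)],
        "metadata": fields,
    }


record = parse_record(RECORD_XML)
print(record["identifier"], record["metadata"]["title"])
```

Every Dublin Core element is kept as a list because simple DC allows any element to repeat, as the several subject fields in the record show.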



OAI “VERBS”

Identify

ListMetadataFormats

ListSets

ListIdentifiers

ListRecords

GetRecord
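Each verb is sent as an ordinary HTTP query string. The sketch below shows one way a harvester might assemble such a request URL; `build_request_url` is a hypothetical helper written for this example, and the base URL is the one used in the sample URLs of this presentation.

```python
from urllib.parse import urlencode

# The six OAI-PMH verbs listed on the slide.
OAI_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_request_url(base_url, verb, **arguments):
    """Return the request URL for one verb plus its arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb!r}")
    query = {"verb": verb}
    # Skip arguments that were passed as None so callers can leave them out.
    query.update({k: v for k, v in arguments.items() if v is not None})
    return f"{base_url}?{urlencode(query)}"


base = "http://aerialphotos.grainger.uiuc.edu/oai.asp"
print(build_request_url(base, "ListRecords", metadataPrefix="oai_dc"))
```

`urlencode` percent-encodes reserved characters, so an identifier containing colons is escaped in the resulting URL; servers decode it back to the original value.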



Identify

  • Purpose

    • Return general information about the archive and its policies (e.g., datestamp granularity)

  • Parameters

    • None

  • Sample URL

    • http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=Identify
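To give a sense of what comes back, the sketch below reads an Identify response with Python's standard library. The sample response is hypothetical: its element names follow the OAI-PMH 2.0 schema, but every value is invented for illustration and does not describe the archive in the sample URL. The function name `describe_repository` is likewise an assumption.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical Identify response; all values are invented for illustration.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>admin@example.org</adminEmail>
    <earliestDatestamp>2001-01-01</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DD</granularity>
  </Identify>
</OAI-PMH>"""


def describe_repository(xml_text):
    """Map each child of <Identify> to its text, keyed by local element name."""
    identify = ET.fromstring(xml_text).find(OAI + "Identify")
    return {child.tag.replace(OAI, ""): child.text for child in identify}


info = describe_repository(SAMPLE_IDENTIFY)
print(info["repositoryName"], info["granularity"])
```

A harvester would typically check the granularity value before sending date-based requests, since it tells the harvester whether the archive accepts day-level or second-level datestamps.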



ListSets

  • Purpose

    • Provide a listing of sets in which records may be organized (may be hierarchical, overlapping, or flat)

  • Parameters

    • None

  • Sample URL

    • http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListSets


ListMetadataFormats

  • Purpose

    • List metadata formats supported by the archive as well as their schema locations and namespaces

  • Parameters

    • identifier – for a specific record (O)

  • Sample URL

    • http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListMetadataFormats



ListIdentifiers

  • Purpose

    • List headers for all items corresponding to the specified parameters

  • Parameters

    • from – start date (O) and/or until – end date (O)

    • set – set to harvest from (O)

    • metadataPrefix – metadata format to list identifiers for (R)

    • resumptionToken – flow control mechanism (X)

  • Sample URL

    • http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListIdentifiers&metadataPrefix=oai_dc



GetRecord

  • Purpose

    • Returns the metadata for a single item in the form of an OAI record

  • Parameters

    • identifier – unique id for item (R)

    • metadataPrefix – metadata format for the record (R)

  • Sample URL

    • http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=GetRecord&identifier=oai:aerialphotos.grainger.uiuc.edu:AP-1A-1-1940&metadataPrefix=oai_dc



ListRecords

  • Purpose

    • Retrieves metadata records for multiple items

  • Parameters

    • from – start date (O)

    • until – end date (O)

    • set – set to harvest from (O)

    • resumptionToken – flow control mechanism (X)

    • metadataPrefix – metadata format (R)

  • Sample URL

    • http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListRecords&metadataPrefix=oai_dc
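The (R), (O), and (X) markers on the verb slides (required, optional, exclusive) can be written down as a small lookup table. The sketch below is one possible way to check a request before sending it; the names `VERB_ARGUMENTS` and `check_arguments` are assumptions for this example. One detail goes beyond the slides: the OAI-PMH 2.0 specification also accepts resumptionToken on ListSets, so the table allows it there.

```python
# verb -> (required arguments, optional arguments, accepts resumptionToken)
VERB_ARGUMENTS = {
    "Identify": (set(), set(), False),
    # Not shown on the slide, but allowed by the OAI-PMH 2.0 specification.
    "ListSets": (set(), set(), True),
    "ListMetadataFormats": (set(), {"identifier"}, False),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}, True),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}, True),
    "GetRecord": ({"identifier", "metadataPrefix"}, set(), False),
}


def check_arguments(verb, arguments):
    """Return a list of problems; an empty list means the request looks valid."""
    if verb not in VERB_ARGUMENTS:
        return [f"unknown verb: {verb}"]
    required, optional, resumable = VERB_ARGUMENTS[verb]
    names = set(arguments)
    if resumable and "resumptionToken" in names:
        # Exclusive (X): the token must travel alone.
        extra = sorted(names - {"resumptionToken"})
        if extra:
            return [f"resumptionToken must be the only argument, also got: {extra}"]
        return []
    problems = [f"missing required argument: {n}" for n in sorted(required - names)]
    problems += [
        f"argument not allowed: {n}" for n in sorted(names - required - optional)
    ]
    return problems


print(check_arguments("GetRecord", {"identifier": "oai:docsouth.unc.edu:12"}))
```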



Other Pieces of OAI

  • Flow Control

  • Sets

  • Multiple metadata schemas
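Flow control is the piece that most shapes harvester code: a large list response arrives in chunks, and each chunk carries a resumptionToken that the harvester sends back to get the next one. The sketch below is a minimal loop under stated assumptions: `fetch` is a hypothetical caller-supplied function that takes a dictionary of request arguments and returns the response XML as a string, so the loop can be tried without a network connection.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_identifiers(fetch, metadata_prefix="oai_dc"):
    """Yield every record identifier, following resumption tokens.

    `fetch` is supplied by the caller (hypothetical in this sketch): it
    receives a dict of request arguments and returns the response XML.
    """
    arguments = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(arguments))
        for header in root.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier")
        token = root.find(f".//{OAI}resumptionToken")
        # A missing or empty token means the list is complete.
        if token is None or not (token.text or "").strip():
            break
        # The token is exclusive: it replaces all other arguments.
        arguments = {
            "verb": "ListIdentifiers",
            "resumptionToken": token.text.strip(),
        }
```

In a real harvester, `fetch` would issue an HTTP GET against the data provider's base URL and return the body; keeping it as a parameter also makes it easy to add retries or delays between requests.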



Challenges for the OAI Community

  • Relatively recent protocol but no best practices (yet)

  • ‘Shareability of metadata’

    • Heterogeneity of items described

    • Loss of Context / Information loss

    • Knowledge structures differ so….

      • Native metadata schemas differ

      • Controlled vocabularies differ

      • Use and presentation of items differ



Metadata for different communities

http://digital.lib.umn.edu/IMAGES/reference/mswp/MPW00476.jpg



Metadata for different communities

http://images.library.uiuc.edu:8081/cgi-bin/viewer.exe?CISOROOT=/tdc&CISOPTR=746



Loss of Context: Record in OAI aggregation



Context: Record in native database



Loss of context / data



Loss of context / data



Sense / Completeness of Metadata

  • identifier: http://images.umdl.umich.edu/cgi/i/image/image-idx?view=entry;subview=detail;cc=fish3ic;entryid=X-0802;viewid=1004_112

  • publisher: UMMZ Fish Division

  • format: jpeg

  • type: image

  • subject: 1926-05-18

  • subject: 1926;0812;18;Trib. to Sixteen Cr. Trib. Pine River, Manistee R.;R10W;S26; S27;JAM26-460;05;T21N;1926/05/18

  • language: UND

  • description: Flora and Fauna of the Great Lakes Region;



Granularity of Description: Excerpt of Metadata Record Describing "Cotton Coverlet with Embroidered Butterfly Design"

Digital Image of "Cotton Coverlet with Embroidered Butterfly Design"

Description: Digital image of a single-sized cotton coverlet for a bed with embroidered butterfly design. Handmade by Anna F. Ginsberg Hayutin.

Source: Materials: cotton and embroidery floss. Dimensions: 71 in. x 86 in. Markings: top right hand corner has 1 1/2 in. x 1/2 in. label cut outs at upper left and right hand side for head board; fabric is woven in a variation of a rib weave; color each of yellow and gray; hand-embroidered cotton butterflies and flowers from two shades of each color of embroidery floss - blue, pink, green and purple and single top 20 in. bordered with blue and black cotton embroidery thread; stitches used for embroidery: running stitch, chain stitch, French knot and back stitches; selvage edges left unfinished; lower edges turned under and finished with large gray running stitches made with embroidery floss.

Format: Epson Expression 836 XL Scanner with Adobe Photoshop version 5.5; 300 dpi; 21-53K bytes. Available via the World Wide Web.

Coverage: (none)

Date Created: 2001-09-19 09:45:18; Updated: 20011107162451; Created: 2001-04-05; Created: 1912-1920?

Type: Image


Granularity of Description: Excerpt of Metadata Record Describing “American Woven Coverlet”

Digital Image of "American Woven Coverlet"

Description: Materials: Textile--Multi, Pigment--Dye; Manufacturing Process: Weaving--Hand, Spinning, Dyeing, Hand-loomed blue wool and white linen coverlet, worked in overshot weave in plain geometric variant of a checkerboard pattern. Coverlet is constructed from finely spun, indigo-dyed wool and undyed linen, woven with considerable skill. Although the pattern is simpler, the overall craftsmanship is higher than 1934.01.0094A. - D. Schrishuhn, 11/19/99 This coverlet is an example of early "overshot" weaving construction, probably dating to the 1820's and is not attributable to any particular weaver. -- Georgette Meredith, 10/9/1973

Source: (none)

Format: 228 x 169 x 1.2 cm (1,629 g)

Coverage: Euro-American; America, North; United States; Indiana? Illinois?

Date: Early 19th c. CE

Type: cultural; physical object; original


Range of vocabularies in use



Data providers can:

  • Create metadata for interoperability

    • Reusable metadata - think beyond your local users and environment

    • Use well structured and defined schemas; move beyond simple DC

    • Use and identify controlled vocabularies



Service Providers can…

  • Analyze metadata and cluster and normalize some aspects

  • Communicate with data providers about their metadata

  • Custom interfaces and selective views for target audiences / domains
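As one small example of normalization, the date values in the coverlet record shown earlier arrive in several layouts. The sketch below, which is only an illustration and not a method described in the presentation, tries a short list of known layouts and leaves anything else for human review. The format list and the function name `normalize_date` are assumptions.

```python
from datetime import datetime

# Layouts seen in the sample record; a real aggregation would need more.
KNOWN_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y%m%d%H%M%S", "%Y-%m-%d")


def normalize_date(value):
    """Return an ISO 8601 date (YYYY-MM-DD), or None if not recognised."""
    text = value.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


for raw in ("2001-09-19 09:45:18", "20011107162451", "2001-04-05", "1912-1920?"):
    print(raw, "->", normalize_date(raw))
```

Returning None for uncertain values such as "1912-1920?" keeps the original string available instead of guessing a date that the data provider never asserted.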



Resources

  • OAI for beginners tutorial: http://www.oaforum.org/tutorial/

  • OAI Frequently Asked Questions: http://www.openarchives.org/documents/FAQ.html

  • IMLS Digital Collections and Content Project: http://imlsdcc.grainger.uiuc.edu/


Recap

  • OAI protocol is a tool

  • OAI is easy - metadata is hard

  • Better metadata = better interoperability



Contact Information

Sarah Shreeves

Project Coordinator

IMLS Digital Collections and Content

University of Illinois Library at Urbana-Champaign

Email: [email protected]

Phone: 217-244-7809

Website: http://imlsdcc.grainger.uiuc.edu/
