The Basics of OAI

An Introduction to the Protocol for Metadata Harvesting

Sarah Shreeves

University of Illinois at Urbana-Champaign

Basics and Beyond

July 27, 2004


Outline

  • What the OAI protocol is & what it is not

  • Place in digital library infrastructure

  • How it works (basically)

  • Challenges for data / service providers

OAI-PMH is a tool

  • Moves metadata (not content) from a data provider to a service provider (or harvester)

  • A set of rules that defines the communication between two systems (like FTP and HTTP)

  • Build once, use for many applications – a building block for digital library services

    Facilitates the federation of metadata

OAI-PMH is not…

  • Metadata

  • A search tool

  • A database

  • Open Access

Who uses OAI?

  • Approximately 400 data providers

  • Basic building block of the National Science Digital Library (NSDL); OAIster

  • Incorporated into DSpace and Eprints.org

  • Part of CONTENTdm, Michigan’s DLXS, and other products

  • International use

Basic OAI-PMH Concepts

  • “Aggregated search” rather than “Federated search”

  • Data providers – support OAI-PMH as a means to expose metadata

  • Service providers – ‘harvest’ metadata from data providers via the OAI-PMH

  • OAI-PMH based upon HTTP and XML

  • OAI-PMH requires use of simple Dublin Core

    • BUT supports and encourages use of other metadata schemas

  • Unique and Persistent Identifiers and a Datestamp for each OAI record

[Figure: several OAI data providers, backed by sources such as a digital management system, a database, or XML files, answer OAI requests from an OAI harvester with OAI responses. The harvester aggregates the metadata, which in turn feeds services.]

Examples of OAI Service Providers

  • OAIster: http://oaister.umdl.umich.edu/o/oaister/

  • Engineering, Computer Science, and Physics: http://g118.grainger.uiuc.edu/engroai/

  • Open Language Archives Community: http://www.language-archives.org/

How OAI Works (Technically)

  • 6 distinct ‘verbs’ or requests

  • OAI requests are sent via HTTP

  • Responses are sent in valid XML

[Figure: the service provider (an OAI harvester holding aggregated metadata) sends an HTTP request containing an OAI verb to the data provider (an OAI data provider exposing metadata from a digital management system), which answers with an HTTP response in valid XML.]

An OAI Record

<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:docsouth.unc.edu:12</identifier>
    <datestamp>2003-04-24T13:15:52Z</datestamp>
    <setSpec>4</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd" xmlns="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
      <title>Advice to Soldiers</title>
      <creator>William Royal</creator>
      <subject>United States -- History -- Civil War, 1861-1865 -- Religious aspects.</subject>
      <subject>Confederate States of America -- Religion.</subject>
      <subject>Soldiers -- Religious life -- Confederate States of America.</subject>
      <subject>Soldiers -- Confederate States of America -- Conduct of life.</subject>
      <subject>Confederate States of America -- Church history.</subject>
      <subject>Sin.</subject>
      <publisher>[Raleigh, N. C.: s. n., between 1861 and 1865]</publisher>
      <date>2003-04-24T13:15:52Z</date>
      <type>Text</type>
      <format>text/html</format>
      <identifier>http://docsouth.unc.edu/royal/royal.html</identifier>
      <language>en-us</language>
    </oai_dc:dc>
  </metadata>
</record>
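A harvester has to split a record like this into its header (identifier, datestamp, set membership) and its Dublin Core metadata. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `parse_record` and the shortened copy of the record are assumptions made for illustration. Note that the header elements live in the OAI namespace while the metadata elements live in the Dublin Core namespace.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Shortened copy of the record shown on this slide.
RECORD_XML = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:docsouth.unc.edu:12</identifier>
    <datestamp>2003-04-24T13:15:52Z</datestamp>
    <setSpec>4</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns="http://purl.org/dc/elements/1.1/">
      <title>Advice to Soldiers</title>
      <creator>William Royal</creator>
      <subject>Confederate States of America -- Religion.</subject>
      <subject>Sin.</subject>
      <identifier>http://docsouth.unc.edu/royal/royal.html</identifier>
    </oai_dc:dc>
  </metadata>
</record>"""


def parse_record(xml_text):
    """Split one OAI record into header fields and repeatable DC fields."""
    ns = {"oai": OAI_NS}
    record = ET.fromstring(xml_text)
    header = record.find("oai:header", ns)
    container = record.find("oai:metadata", ns)[0]  # the oai_dc:dc wrapper

    fields = {}
    for child in container:
        if child.tag.startswith("{" + DC_NS + "}"):
            name = child.tag.split("}", 1)[1]
            # Every DC element may repeat, so collect values in lists.
            fields.setdefault(name, []).append((child.text or "").strip())

    return {
        "identifier": header.findtext("oai:identifier", namespaces=ns),
        "datestamp": header.findtext("oai:datestamp", namespaces=ns),
        "sets": [s.text for s in header.findall("oai:setSpec", ns)],
        "metadata": fields,
    }


parsed = parse_record(RECORD_XML)
print(parsed["identifier"], parsed["metadata"]["title"])
```

The record carries two different identifiers: the OAI identifier in the header names the metadata record, while the Dublin Core `identifier` points at the resource itself.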

OAI “VERBS”

Identify

ListMetadataFormats

ListSets

ListIdentifiers

ListRecords

GetRecord
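Each verb accepts its own set of arguments. One way a harvester might check a request before sending it is sketched here; the table layout and the name `check_arguments` are assumptions made for this example, and the argument rules are my reading of the OAI-PMH 2.0 specification rather than something stated on this slide.

```python
# verb -> (required arguments, optional arguments, accepts resumptionToken?)
# Rules as understood from the OAI-PMH 2.0 specification.
VERB_ARGUMENTS = {
    "Identify": (set(), set(), False),
    "ListMetadataFormats": (set(), {"identifier"}, False),
    "ListSets": (set(), set(), True),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}, True),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}, True),
    "GetRecord": ({"identifier", "metadataPrefix"}, set(), False),
}


def check_arguments(verb, arguments):
    """Return a list of problems; an empty list means the request looks valid."""
    if verb not in VERB_ARGUMENTS:
        return ["unknown verb: " + verb]
    required, optional, resumable = VERB_ARGUMENTS[verb]
    names = set(arguments)

    # resumptionToken is exclusive: it replaces every other argument.
    if "resumptionToken" in names:
        if not resumable:
            return ["resumptionToken is not allowed with " + verb]
        if names != {"resumptionToken"}:
            return ["resumptionToken must be the only argument"]
        return []

    problems = ["missing required argument: " + n for n in sorted(required - names)]
    problems += ["unexpected argument: " + n
                 for n in sorted(names - required - optional)]
    return problems


print(check_arguments("ListRecords", {}))
print(check_arguments("GetRecord", {"identifier": "oai:docsouth.unc.edu:12",
                                    "metadataPrefix": "oai_dc"}))
```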

Identify

  • Purpose

    • Return general information about the archive and its policies (e.g., datestamp granularity)

  • Parameters

    • None

  • Sample URL

    • http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=Identify
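A harvester typically reads the Identify response once to learn how the repository behaves, for example which datestamp granularity it supports. The sketch below parses an invented response for a fictional repository at example.org; every value in `SAMPLE_IDENTIFY` is a placeholder, and `parse_identify` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Invented Identify response; all values are placeholders.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2004-07-27T12:00:00Z</responseDate>
  <request verb="Identify">http://example.org/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>admin@example.org</adminEmail>
    <earliestDatestamp>2001-01-01</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DD</granularity>
  </Identify>
</OAI-PMH>"""


def parse_identify(xml_text):
    """Pull out the repository policies a harvester usually cares about."""
    ns = {"oai": OAI_NS}
    identify = ET.fromstring(xml_text).find("oai:Identify", ns)
    wanted = ("repositoryName", "protocolVersion", "earliestDatestamp",
              "deletedRecord", "granularity")
    return {name: identify.findtext("oai:" + name, namespaces=ns)
            for name in wanted}


info = parse_identify(SAMPLE_IDENTIFY)
print(info["granularity"])
```

With day-level granularity like this, `from` and `until` values in later requests would be sent as dates without a time part.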

ListSets

  • Purpose

    • Provide a listing of sets in which records may be organized (may be hierarchical, overlapping, or flat)

  • Parameters

    • resumptionToken – flow control mechanism (X)

  • Sample URL

    • http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListSets
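To show what a hierarchical set listing looks like to a harvester, here is a small sketch that parses a ListSets response. The response, the set names, and the helper names `parse_sets` and `parent_of` are invented for illustration. It relies on the protocol's convention that a colon in a `setSpec` separates levels of the hierarchy.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Invented ListSets response with one top-level set and one child set.
SAMPLE_SETS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>maps</setSpec><setName>Maps</setName></set>
    <set><setSpec>maps:aerial</setSpec><setName>Aerial Photographs</setName></set>
  </ListSets>
</OAI-PMH>"""


def parse_sets(xml_text):
    """Return a mapping of setSpec to setName."""
    ns = {"oai": OAI_NS}
    root = ET.fromstring(xml_text)
    return {
        node.findtext("oai:setSpec", namespaces=ns):
            node.findtext("oai:setName", namespaces=ns)
        for node in root.iterfind("oai:ListSets/oai:set", ns)
    }


def parent_of(set_spec):
    """Return the parent setSpec, or None for a top-level set."""
    return set_spec.rpartition(":")[0] or None


sets = parse_sets(SAMPLE_SETS)
for spec, name in sets.items():
    print(spec, "->", name, "| parent:", parent_of(spec))
```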

ListMetadataFormats

  • Purpose

    • List metadata formats supported by the archive as well as their schema locations and namespaces

  • Parameters

    • identifier – for a specific record (O = optional)

  • Sample URL

    • http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListMetadataFormats

ListIdentifiers

  • Purpose

    • List headers for all items corresponding to the specified parameters

  • Parameters

    • from – start date (O) and/or until – end date (O)

    • set – set to harvest from (O)

    • metadataPrefix – metadata format to list identifiers for (R = required)

    • resumptionToken – flow control mechanism (X = exclusive; cannot be combined with the other parameters)

  • Sample URL

    • http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListIdentifiers&metadataPrefix=oai_dc
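Because ListIdentifiers returns only headers, it is a cheap way to find out what changed since the last harvest. The sketch below parses an invented response; the second identifier and both datestamps are placeholders, and the function names are hypothetical. The `status="deleted"` attribute is how a repository that tracks deletions marks a withdrawn record.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Invented response; the second identifier and both datestamps are made up.
SAMPLE_HEADERS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:aerialphotos.grainger.uiuc.edu:AP-1A-1-1940</identifier>
      <datestamp>2004-03-01</datestamp>
    </header>
    <header status="deleted">
      <identifier>oai:aerialphotos.grainger.uiuc.edu:EXAMPLE-2</identifier>
      <datestamp>2004-05-17</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""


def parse_headers(xml_text):
    """Return one dict per header, flagging records marked as deleted."""
    ns = {"oai": OAI_NS}
    root = ET.fromstring(xml_text)
    return [
        {
            "identifier": h.findtext("oai:identifier", namespaces=ns),
            "datestamp": h.findtext("oai:datestamp", namespaces=ns),
            "deleted": h.get("status") == "deleted",
        }
        for h in root.iterfind("oai:ListIdentifiers/oai:header", ns)
    ]


def next_from(headers):
    """One simple choice for the next `from` value: the latest datestamp seen.

    Datestamps of equal granularity sort correctly as plain strings.
    """
    return max(h["datestamp"] for h in headers)


headers = parse_headers(SAMPLE_HEADERS)
print([h["identifier"] for h in headers if h["deleted"]])
print(next_from(headers))
```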

GetRecord

  • Purpose

    • Returns the metadata for a single item in the form of an OAI record

  • Parameters

    • identifier – unique id for item (R)

    • metadataPrefix – metadata format for the record (R)

  • Sample URL

    • http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=GetRecord&identifier=oai:aerialphotos.grainger.uiuc.edu:AP-1A-1-1940&metadataPrefix=oai_dc
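When a program builds a GetRecord URL, the colons in the identifier are normally percent-encoded, so the generated URL looks slightly different from the hand-written sample. The sketch below shows this, and also splits the identifier into its parts, assuming the repository follows the common `oai:namespace:local-id` naming convention (the protocol itself only requires a URI). Both function names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def split_oai_identifier(identifier):
    """Split 'oai:namespace:local-id' into (namespace, local_id).

    Assumes the optional oai-identifier naming convention is in use.
    """
    scheme, namespace, local_id = identifier.split(":", 2)
    if scheme != "oai":
        raise ValueError("not an oai: identifier: " + identifier)
    return namespace, local_id


def get_record_url(base_url, identifier, metadata_prefix="oai_dc"):
    """Build a GetRecord request; urlencode percent-encodes the colons."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return base_url + "?" + query


IDENTIFIER = "oai:aerialphotos.grainger.uiuc.edu:AP-1A-1-1940"
url = get_record_url("http://aerialphotos.grainger.uiuc.edu/oai.asp", IDENTIFIER)
print(url)

# Decoding the query string gives back the original identifier.
decoded = parse_qs(urlsplit(url).query)["identifier"][0]
print(split_oai_identifier(decoded))
```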

ListRecords

  • Purpose

    • Retrieves metadata records for multiple items

  • Parameters

    • from – start date (O)

    • until – end date (O)

    • set – set to harvest from (O)

    • resumptionToken – flow control mechanism (X)

    • metadataPrefix – metadata format (R)

  • Sample URL

    • http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListRecords&metadataPrefix=oai_dc
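Large result lists arrive in chunks: each incomplete response carries a `resumptionToken`, and the harvester repeats the request with only that token until an empty token (or none) comes back. A minimal sketch of that loop follows. The `fetch` parameter is an assumption of this example: any callable that takes a URL and returns the response XML as text. Here it is replaced by a fake two-page repository at example.org so the loop can run without a network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
NS = {"oai": OAI_NS}


def harvest(base_url, fetch, metadata_prefix="oai_dc", selective=None):
    """Yield every <record> element, following resumptionTokens.

    `selective` may hold optional from/until/set arguments as a dict.
    """
    arguments = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    arguments.update(selective or {})
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(arguments)))
        yield from root.iterfind("oai:ListRecords/oai:record", NS)
        token = root.findtext("oai:ListRecords/oai:resumptionToken",
                              default="", namespaces=NS).strip()
        if not token:
            return  # no token, or an empty one: the list is complete
        # The token is exclusive, so it replaces all other arguments.
        arguments = {"verb": "ListRecords", "resumptionToken": token}


def _page(identifier, token):
    """Build one fake response page (test data only)."""
    return ('<OAI-PMH xmlns="%s"><ListRecords><record><header>'
            '<identifier>%s</identifier></header></record>'
            '<resumptionToken>%s</resumptionToken></ListRecords></OAI-PMH>'
            % (OAI_NS, identifier, token))


def fake_fetch(url):
    """Stand-in for an HTTP client: serves two pages."""
    if "resumptionToken=page2" in url:
        return _page("oai:example.org:2", "")
    return _page("oai:example.org:1", "page2")


ids = [r.findtext("oai:header/oai:identifier", namespaces=NS)
       for r in harvest("http://example.org/oai", fake_fetch)]
print(ids)
```

Passing the fetcher in as a parameter keeps the paging logic separate from networking concerns such as retries and timeouts, which a real harvester would need to add.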

Other Pieces of OAI

  • Flow Control

  • Sets

  • Multiple metadata schemas

Challenges for the OAI Community

  • Relatively recent protocol with no established best practices (yet)

  • ‘Shareability of metadata’

    • Heterogeneity of items described

    • Loss of Context / Information loss

    • Knowledge structures differ so….

      • Native metadata schemas differ

      • Controlled vocabularies differ

      • Use and presentation of items differ

Metadata for different communities

http://digital.lib.umn.edu/IMAGES/reference/mswp/MPW00476.jpg

Metadata for different communities

http://images.library.uiuc.edu:8081/cgi-bin/viewer.exe?CISOROOT=/tdc&CISOPTR=746

Loss of context / data

Sense / Completeness of Metadata

  • identifier:http://images.umdl.umich.edu/cgi/i/image/image-idx?view=entry;subview=detail;cc=fish3ic;entryid=X-0802;viewid=1004_112

  • publisher: UMMZ Fish Division

  • format: jpeg

  • type: image

  • subject: 1926-05-18

  • subject: 1926;0812;18;Trib. to Sixteen Cr. Trib. Pine River, Manistee R.;R10W;S26; S27;JAM26-460;05;T21N;1926/05/18

  • language: UND

  • description: Flora and Fauna of the Great Lakes Region;
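A service provider could screen harvested records for this kind of problem automatically. The sketch below is one possible check, not an established practice: it lists which of the fifteen simple Dublin Core elements are absent and flags subject values that look like dates. The record is transcribed from this slide; the function names are hypothetical. Since every element is optional in simple Dublin Core, a missing element is a hint about usability rather than an error.

```python
import re

# The fifteen elements of simple (unqualified) Dublin Core.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# The record shown on this slide, with repeatable elements as lists.
FISH_RECORD = {
    "identifier": ["http://images.umdl.umich.edu/cgi/i/image/image-idx?"
                   "view=entry;subview=detail;cc=fish3ic;entryid=X-0802;"
                   "viewid=1004_112"],
    "publisher": ["UMMZ Fish Division"],
    "format": ["jpeg"],
    "type": ["image"],
    "subject": ["1926-05-18",
                "1926;0812;18;Trib. to Sixteen Cr. Trib. Pine River, "
                "Manistee R.;R10W;S26; S27;JAM26-460;05;T21N;1926/05/18"],
    "language": ["UND"],
    "description": ["Flora and Fauna of the Great Lakes Region;"],
}


def missing_elements(record):
    """Return the DC elements that have no value in this record."""
    return [name for name in DC_ELEMENTS if not record.get(name)]


def date_like_subjects(record):
    """Return subject values that are really ISO dates."""
    return [value for value in record.get("subject", [])
            if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value)]


print(missing_elements(FISH_RECORD))
print(date_like_subjects(FISH_RECORD))
```

For this record the check reports no title, creator, or date, and finds a date sitting in the subject field, which matches what a human reader notices on the slide.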


Digital Image of "Cotton Coverlet with Embroidered Butterfly Design"

Description:Digital image of a single-sized cotton coverlet for a bed with embroidered butterfly design. Handmade by Anna F. Ginsberg Hayutin.

Source:Materials: cotton and embroidery floss. Dimensions: 71 in. x 86 in. Markings: top right hand corner has 1 1/2 in. x 1/2 in. label cut outs at upper left and right hand side for head board; fabric is woven in a variation of a rib weave; color each of yellow and gray; hand-embroidered cotton butterflies and flowers from two shades of each color of embroidery floss - blue, pink, green and purple and single top 20 in. bordered with blue and black cotton embroidery thread; stitches used for embroidery: running stitch, chain stitch, French knot and back stitches; selvage edges left unfinished; lower edges turned under and finished with large gray running stitches made with embroidery floss.

Format:Epson Expression 836 XL Scanner with Adobe Photoshop version 5.5; 300 dpi; 21-53K bytes. Available via the World Wide Web.

Coverage:—

Date Created: 2001-09-19 09:45:18; Updated: 20011107162451; Created: 2001-04-05; Created: 1912-1920?

Type:Image

Granularity of Description: Excerpt of Metadata Record Describing "Cotton coverlet with embroidered butterfly design"

Granularity of Description: Excerpt of Metadata Record Describing “American Woven Coverlet”

Digital Image of "American Woven Coverlet"

Description:Materials: Textile--Multi, Pigment—Dye; Manufacturing Process: Weaving--Hand, Spinning, Dyeing, Hand-loomed blue wool and white linen coverlet, worked in overshot weave in plain geometric variant of a checkerboard pattern.Coverlet is constructed from finely spun, indigo-dyed wool and undyed linen, woven with considerable skill. Although the pattern is simpler, the overall craftsmanship is higher than 1934.01.0094A. - D. Schrishuhn, 11/19/99 This coverlet is an example of early "overshot" weaving construction, probably dating to the 1820's and is not attributable to any particular weaver. -- Georgette Meredith, 10/9/1973

Source:—

Format:228 x 169 x 1.2 cm (1,629 g)

Coverage:Euro-American; America, North; United States; Indiana? Illinois?

Date:Early 19th c. CE

Type:cultural; physical object; original

Range of vocabularies in use

Data providers can:

  • Create metadata for interoperability

    • Reusable metadata - think beyond your local users and environment

    • Use well structured and defined schemas; move beyond simple DC

    • Use and identify controlled vocabularies

Service Providers can…

  • Analyze metadata and cluster and normalize some aspects

  • Communicate with data providers about their metadata

  • Custom interfaces and selective views for target audiences / domains
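Date values are a typical target for normalization, because harvested records express them in many ways. The sketch below tries a handful of fixed formats and gives up on anything else; the format list and the name `normalize_date` are assumptions chosen to cover the date strings that appear in the coverlet records and the OAI datestamp format, not a complete solution.

```python
from datetime import datetime

# Formats seen in this presentation's sample records, plus the OAI
# datestamp format. A real service would need a longer list.
KNOWN_FORMATS = (
    "%Y-%m-%dT%H:%M:%SZ",   # 2003-04-24T13:15:52Z
    "%Y-%m-%d %H:%M:%S",    # 2001-09-19 09:45:18
    "%Y%m%d%H%M%S",         # 20011107162451
    "%Y-%m-%d",             # 2001-04-05
)


def normalize_date(value):
    """Return the date as YYYY-MM-DD, or None if it cannot be parsed.

    Uncertain or descriptive dates are left for human review rather
    than guessed at.
    """
    text = value.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


for raw in ("2001-09-19 09:45:18", "20011107162451", "2001-04-05",
            "1912-1920?", "Early 19th c. CE"):
    print(repr(raw), "->", normalize_date(raw))
```

Returning `None` instead of a best guess keeps values such as "1912-1920?" visible as cases that need a data provider's input.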

Resources

  • OAI for beginners tutorial: http://www.oaforum.org/tutorial/

  • OAI Frequently Asked Questions: http://www.openarchives.org/documents/FAQ.html

  • IMLS Digital Collections and Content Project: http://imlsdcc.grainger.uiuc.edu/

Recap

  • OAI protocol is a tool

  • OAI is easy - metadata is hard

  • Better metadata = better interoperability

Contact Information

Sarah Shreeves

Project Coordinator

IMLS Digital Collections and Content

University of Illinois Library at Urbana-Champaign

Email: sshreeve@uiuc.edu

Phone: 217-244-7809

Website: http://imlsdcc.grainger.uiuc.edu/
