Metadata Interoperability and CONTENTdm. Midwest CONTENTdm Users Group, April 30, 2008, IUPUI, Indianapolis, IN. Amy Jackson, amyjacks@uiuc.edu; Myung-ja Han, mhan3@uiuc.edu. University of Illinois at Urbana Champaign.


Metadata Interoperability and CONTENTdm

Midwest CONTENTdm Users Group

April 30, 2008

IUPUI

Indianapolis, IN

Amy Jackson, amyjacks@uiuc.edu

Myung-ja Han, mhan3@uiuc.edu

University of Illinois at Urbana Champaign

University of Illinois at Urbana Champaign
  • Producer/consumer of metadata in CONTENTdm
  • Currently use CONTENTdm to provide public access to 12 collections
  • Various projects harvest metadata from 11 CONTENTdm repositories around the nation
Metadata Interoperability and CONTENTdm
  • Metadata and CONTENTdm
    • Service provider
    • Data provider
  • Longitudinal analysis of harvested metadata
    • Qualitative results
    • Quantitative results
  • Service provider view (Amy)
  • Data provider view (MJ)
IMLS Digital Collections and Content
  • Project began December 2002 as an IMLS National Leadership Grant
    • Carole Palmer, Principal Investigator, 2007-2010
    • Tim Cole, Principal Investigator, 2002-2007
    • Amy Jackson, Project Coordinator
  • Collaboration between UIUC Library and Graduate School of Library and Information Science
  • http://imlsdcc.grainger.uiuc.edu/
IMLS Digital Collections and Content
  • Project Objectives:
    • Implement a collection registry of digital collections created or developed with funding from IMLS NLG program
    • Use OAI-PMH to implement an item-level metadata repository for items contained in NLG collections
    • Carry out associated research related to:
      • Utility and usability of Registry & Repository
      • Current metadata practices of IMLS NLG grantees
      • Implications for interoperability (Framework of Guidance for Building Good Digital Collections)
Item-level repository
  • Item-level Repository
    • Harvesting 71 of 195 Collections (36%)
    • 37 Repositories (some multiple institutions)
    • 10 CONTENTdm repositories
    • 310,448 records
  • Item Records (self identified types)
    • 86% images
    • 14% text
Item-level repository

Number of harvested collections using each DC field

Item-level repository

Top Item-level subjects
  • Archaeology
  • Buildings
  • Photographers
  • Mountains
  • Men
  • Archaeological site
  • Insect
  • Bodies of water

OAI-PMH
  • All 37 repositories export metadata in simple Dublin Core
  • Five export in schemas other than simple or Qualified Dublin Core
    • MARC21
    • MODS
    • OLAC
    • ETDMS
Metadata harvesting
  • OAI-PMH
    • Harvested approach rather than federated approach
    • Data providers – create and expose metadata
    • Service providers – harvest and aggregate metadata
    • Based on HTTP and XML
    • Requires support for simple (unqualified) Dublin Core (oai_dc)
      • Encourages and supports other formats
How OAI Works (Technically)

Six distinct ‘verbs’ (request types)

OAI requests are sent via HTTP

Responses are sent in valid XML

[Diagram: a Service Provider (OAI harvester, aggregated metadata) sends an HTTP Request (OAI Verb) to a Data Provider (OAI data provider in front of a "Digi. Mana. Sys." and its metadata), which returns an HTTP Response (Valid XML).]


OAI-PMH in CONTENTdm
  • Enable oai.txt file
  • CONTENTdm base URL followed by /cgi-bin/oai.exe
    • http://images.library.uiuc.edu:8081/cgi-bin/oai.exe
  • OAI “verbs”
    • ?verb=Identify
    • Return general information about the archive and its policies (e.g., datestamp granularity)
    • http://images.library.uiuc.edu:8081/cgi-bin/oai.exe?verb=Identify
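
The URL pattern on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `oai_request_url` is hypothetical, and the sketch assumes the server uses the `/cgi-bin/oai.exe` path shown above.

```python
from urllib.parse import urlencode

OAI_PATH = "/cgi-bin/oai.exe"  # path as given on the slide


def oai_request_url(site_base, verb, **params):
    """Return the OAI-PMH request URL for `verb` on a CONTENTdm server."""
    query = urlencode({"verb": verb, **params})
    return f"{site_base.rstrip('/')}{OAI_PATH}?{query}"


print(oai_request_url("http://images.library.uiuc.edu:8081", "Identify"))
# http://images.library.uiuc.edu:8081/cgi-bin/oai.exe?verb=Identify
```

Further arguments are passed as keywords, for example `oai_request_url(base, "ListRecords", metadataPrefix="oai_dc")`.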
OAI-PMH verbs
  • Identify
  • ListMetadataFormats
  • ListSets
  • ListIdentifiers
  • ListRecords
  • GetRecord
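
Each of the six verbs accepts a different set of arguments. The sketch below encodes the required and optional arguments as defined in the OAI-PMH 2.0 specification and checks a request against them before it is sent. The names `VERB_ARGS` and `check_request` are hypothetical, and the check is deliberately minimal (it does not validate date formats or token contents).

```python
# (required, optional) arguments per verb, following OAI-PMH 2.0.
VERB_ARGS = {
    "Identify": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListSets": (set(), {"resumptionToken"}),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set", "resumptionToken"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set", "resumptionToken"}),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
}


def check_request(verb, args):
    """Return a list of problems with a request; an empty list means it looks valid."""
    if verb not in VERB_ARGS:
        return [f"unknown verb: {verb}"]
    required, optional = VERB_ARGS[verb]
    given = set(args)
    # resumptionToken is an exclusive argument: it replaces all other arguments.
    if "resumptionToken" in given and "resumptionToken" in optional:
        extra = sorted(given - {"resumptionToken"})
        return [f"resumptionToken must be the only argument, also got: {extra}"] if extra else []
    problems = [f"missing required argument: {a}" for a in sorted(required - given)]
    problems += [f"unexpected argument: {a}" for a in sorted(given - required - optional)]
    return problems


print(check_request("ListRecords", {}))
# ['missing required argument: metadataPrefix']
```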
OAI-PMH
  • ListSets
    • Purpose
      • Provide a listing of sets in which records may be organized (may be hierarchical, overlapping, or flat)
    • http://images.library.uiuc.edu:8081/cgi-bin/oai.exe?verb=ListSets
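
A service provider typically reads the set list out of the XML response. The following sketch parses a ListSets response with the Python standard library. The sample response is hypothetical and abbreviated (real responses also carry `responseDate` and `request` elements), and the set names are invented for illustration.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical, abbreviated ListSets response.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>maps</setSpec><setName>Historical Maps</setName></set>
    <set><setSpec>sheetmusic</setSpec><setName>Sheet Music</setName></set>
  </ListSets>
</OAI-PMH>"""


def parse_sets(xml_text):
    """Return (setSpec, setName) pairs from a ListSets response."""
    root = ET.fromstring(xml_text)
    return [
        (
            s.findtext("oai:setSpec", namespaces=OAI_NS),
            s.findtext("oai:setName", namespaces=OAI_NS),
        )
        for s in root.findall("oai:ListSets/oai:set", OAI_NS)
    ]


print(parse_sets(SAMPLE_RESPONSE))
# [('maps', 'Historical Maps'), ('sheetmusic', 'Sheet Music')]
```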
OAI-PMH
  • ListRecords
    • Purpose
      • Retrieves metadata records for multiple items
    • Parameters
      • from – start date
      • until – end date
      • set – set to harvest from
      • resumptionToken – flow control mechanism
      • metadataPrefix – metadata format
    • http://images.library.uiuc.edu:8081/cgi-bin/oai.exe?verb=ListRecords&metadataPrefix=oai_dc
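
The resumptionToken flow control listed above can be illustrated with a small harvesting loop. This is a minimal sketch, not a production harvester: it ignores OAI error responses, retries, and the `from`/`until` arguments. The function names and the injectable `fetch` argument are assumptions made so the loop can be exercised without a network connection.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(url):
    """Fetch a URL and return the raw response body."""
    with urlopen(url) as response:
        return response.read()


def harvest(base_url, metadata_prefix="oai_dc", set_spec=None, fetch=http_fetch):
    """Yield every <record> element, following resumptionTokens until the list is complete."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        yield from root.iter(f"{OAI}record")
        token = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # no token, or an empty token, marks the last page
        # A follow-up request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Usage would look like `for record in harvest("http://images.library.uiuc.edu:8081/cgi-bin/oai.exe"): ...`, with each `record` an ElementTree element holding the header and metadata.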
OAI-PMH
  • Barriers to sharing metadata through OAI-PMH
    • Technical Infrastructure
    • Metadata
    • Institution/Project
  • CONTENTdm
    • Compliant with OAI-PMH
    • Metadata is mapped to DC
Harvested Metadata
  • How has use of Dublin Core changed over time?

Records harvested between January 1, 2001 and December 31, 2006.

    • Quantitative analysis
      • What measurable changes can we see in the metadata?
    • Qualitative analysis
      • How has use of fields changed over time?
Quantitative analysis
  • Quantitative analysis
    • Repetition of elements
    • Length of fields
    • Use of core fields (Shreeves et al., 2005)
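
The three measures above could be computed along the following lines. This is a sketch under stated assumptions, not the project's actual analysis code: records are modeled as dictionaries mapping a DC element name to a list of values, and the list of core fields is passed in by the caller (the example uses an illustrative subset, not the eight core fields from Shreeves et al.).

```python
def element_stats(records, element):
    """Return (mean repetitions per record, mean value length) for one DC element."""
    counts = [len(r.get(element, [])) for r in records]
    values = [v for r in records for v in r.get(element, [])]
    mean_repetition = sum(counts) / len(records) if records else 0.0
    mean_length = sum(len(v) for v in values) / len(values) if values else 0.0
    return mean_repetition, mean_length


def share_with_all(records, core_fields):
    """Return the fraction of records with at least one value for every core field."""
    if not records:
        return 0.0
    complete = sum(all(r.get(f) for f in core_fields) for r in records)
    return complete / len(records)


# Hypothetical records for illustration only.
records = [
    {"title": ["Frankie"], "creator": ["Neil Sedaka", "Howard Greenfield"]},
    {"title": ["Untitled"]},
]
print(element_stats(records, "creator"))             # (1.0, 14.0)
print(share_with_all(records, ("title", "creator")))  # 0.5
```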
Quantitative analysis
  • Repetition of fields
    • Stable
  • Length of fields
    • Stable
  • Use of all 8 core fields
    • Declining
Quantitative Analysis

[Figure: bar chart, "Percent of records containing all core DC fields", plotted against age of record in months (34 or more; 21 to 33; 11 to 20; 10 or less) for two series, IMLS and IMLS & CIC. Data labels shown: 93.95%, 71.65%, 70.27%, 52.41%, 22.97%, 18.55%, 11.04%, 7.99%.]

Quantitative Analysis
  • Of these eight elements, the two elements most often missing are creator (used in 39% of records) and rights (52%).
  • Identifier, title, and subject were each used in over 96% of all records.
  • Format and description fields have shown the most significant decline in use since 2003.
  • The description field shows decreased repetition and length, and use of the relation field has increased overall.
Conclusions
  • Recommendations
    • Publish local metadata practices
    • Publish crosswalking information
    • Expose native metadata in addition to Dublin Core
Amy Jackson

Project Coordinator

IMLS Digital Collections and Content

University of Illinois at Urbana Champaign

amyjacks@uiuc.edu

What does exporting mean?

Qualitative analysis

- Changes over time

- Unpacking MARC

- Incorrect mapping

- Misuse and confusion of DC elements

- What to export and what not

- Lost in harvesting

What we have learned

Recommendations

What does exporting mean?

Exporting

Makes collection metadata available for service providers to harvest.

CONTENTdm has a turnkey option to make this possible.

Has DC mapping to provide Dublin Core records to service providers.

Why export metadata?

Increases exposure of collections

Broadens user base

We can no longer assume that users will come through the front door; sharing metadata gets us ‘in the flow’ (Lorcan Dempsey)

- Metadata for you & me

Qualitative analysis

225 records from 6 repositories (time increments)

- Document changes in practice over time

- Compare original record vs. harvested record in service provider’s environment

600 randomly selected records

95 records from 11 repositories and 19 collections harvested from CONTENTdm

Any Changes over time?

Only one change was observed over time:

Early records: <title>Frankie / Music by Neil Sedaka; words by Howard Greenfield </title>

Later records: <title>Frankie</title><creator>Music by Neil Sedaka; words by Howard Greenfield</creator>
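
The change from the early to the later practice amounts to splitting the title from the statement of responsibility. A minimal sketch, assuming the two parts are always separated by " / " as in the example above (the function name is hypothetical):

```python
def split_title_statement(value):
    """Split 'Title / statement of responsibility' at the first ' / ' separator."""
    title, separator, responsibility = value.partition(" / ")
    record = {"title": [title.strip()]}
    if separator and responsibility.strip():
        record["creator"] = [responsibility.strip()]
    return record


early = "Frankie / Music by Neil Sedaka; words by Howard Greenfield "
print(split_title_statement(early))
# {'title': ['Frankie'], 'creator': ['Music by Neil Sedaka; words by Howard Greenfield']}
```

Titles that themselves contain " / " would be split too early, so real data would need review rather than a blind rewrite.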

Other findings…

Unpacking MARC

Incorrect mapping

Misuse and confusion of Dublin Core elements

What to export and what not…

And

Lost in harvesting…

Unpacking MARC

Object Description: Photograph: b&w; 6 1/8x8 in.

<type>Photograph: b&amp;w; 6 1/8 x 8 in.</type>

Publication Information: [Lancaster, Pa.? : Johann Albrecht und Comp.?, 1790?]

<publisher>[Lancaster, Pa.? : Johann Albrecht und Comp.?, 1790?]</publisher>

Unpacking MARC

a. MARC 245 could be mapped to:

Subfield 'a' => <title>

Subfield 'b' => <title> or <alternative>

Subfield 'c' => <creator> or <contributor>

Subfield 'f' => <date>

Subfield 'g' => <date>

Subfield 'h' => <format>

Subfield 'k' => <type>

Subfield 'n' => <description> or <title>

Subfield 'p' => <description> or <title>
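
The subfield-level mapping above can be written as a lookup table. The sketch below is one possible reading, not a recommended crosswalk: where the slide offers two targets, the first is chosen arbitrarily, and the way the sample 245 field is split into subfields is assumed for illustration.

```python
# MARC 245 subfield code -> DC element (first option from the slide where two are given).
CROSSWALK_245 = {
    "a": "title",
    "b": "title",        # slide: <title> or <alternative>
    "c": "creator",      # slide: <creator> or <contributor>
    "f": "date",
    "g": "date",
    "h": "format",
    "k": "type",
    "n": "description",  # slide: <description> or <title>
    "p": "description",  # slide: <description> or <title>
}

ISBD_PUNCTUATION = " :;,/[]"  # separators that trail subfield values in MARC


def unpack_245(subfields):
    """Map (code, value) pairs from a MARC 245 field to DC elements."""
    record = {}
    for code, value in subfields:
        element = CROSSWALK_245.get(code)
        if element is None:
            continue  # subfields outside the table are dropped in this sketch
        record.setdefault(element, []).append(value.strip(ISBD_PUNCTUATION))
    return record


# Assumed subfield split for the sheet-music example used earlier.
field_245 = [("a", "Frankie /"), ("c", "Music by Neil Sedaka; words by Howard Greenfield")]
print(unpack_245(field_245))
# {'title': ['Frankie'], 'creator': ['Music by Neil Sedaka; words by Howard Greenfield']}
```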

Unpacking MARC

b. MARC 260:

<publisher>

<date>

c. MARC 6xx:

<subject>

<coverage-temporal>

<coverage-spatial>

<type>

Incorrect Mapping

a. Digital Reproduction Information: Scanned as a 3000 pixel TIFF image in 8-bit grayscale, resized to 640 pixels in the longest dimension and compressed into JPEG format using Photoshop 6.0 and its JPEG quality measurement 3.

Where do you map this?

<format> Scanned as a 3000 pixel TIFF image in 8-bit grayscale, resized to 640 pixels in the longest dimension and compressed into JPEG format using Photoshop 6.0 and its JPEG quality measurement 3.

Incorrect Mapping

b. Repository: University of Prominent Libraries. Special Collections Division.

Repository Collection: Prominent Photograph Collection. PH Coll 282

Where do you map these?

<source> University of Prominent Libraries. Special Collections Division.

<source> Prominent Photograph Collection. PH Coll 282

Incorrect Mapping

c. Physical description: 9 in. x 6 in.

Where do you map this?

<description> 9 in. x 6 in.

Misuse of Dublin Core elements

a. <date> and <coverage>

- Item about the nineteenth century, published in 2007.

Metadata should be?

<date>1800-1899

OR

<date>2007

<coverage>1800-1899

Misuse of Dublin Core elements

b. <source> and <relation>

Repository: PSMHS Collection is located at the Museum of History & Industry, Seattle

Repository Collection: Joe Williamson Collection

Both of them mapped to <source>

<source>:

A related resource from which the described resource is derived.

<relation>:

A related resource. - Dublin Core Metadata Element Set, Version 1.1

Misuse of Dublin Core elements

c. <type>, <format>, and <description>

<type>Photograph: b&amp;w; 6 1/8 x 8 in.</type>

<format>1 tool : wood</format>

<description>9 in. x 6 in.</description>

<description>Material: Whale Bone</description>
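
One way to spot this kind of confusion in bulk is a simple pattern check. The sketch below flags size statements that appear outside <format>, on the assumption (from the DCMES 1.1 definition of format as file format, physical medium, or dimensions) that dimensions belong there. The regular expression is a rough heuristic covering only patterns like the examples above, and the function name is hypothetical.

```python
import re

# Matches statements such as "9 in. x 6 in." or "6 1/8 x 8 in." (heuristic only).
DIMENSIONS = re.compile(
    r"\d+(?:\s+\d+/\d+)?\s*(?:in\.|cm|mm)?\s*x\s*\d+(?:\s+\d+/\d+)?\s*(?:in\.|cm|mm)",
    re.IGNORECASE,
)


def misplaced_dimensions(record):
    """Return (element, value) pairs where a size statement sits outside <format>."""
    return [
        (element, value)
        for element, values in record.items()
        if element != "format"
        for value in values
        if DIMENSIONS.search(value)
    ]


record = {
    "type": ["Photograph: b&w; 6 1/8 x 8 in."],
    "format": ["1 tool : wood"],
    "description": ["9 in. x 6 in.", "Material: Whale Bone"],
}
print(misplaced_dimensions(record))
# [('type', 'Photograph: b&w; 6 1/8 x 8 in.'), ('description', '9 in. x 6 in.')]
```

A check like this only produces candidates for review; it cannot decide whether "Material: Whale Bone" belongs in <format> as a physical medium.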

After re-mapping the records…

DC element usage (118 records)

After re-mapping the records…

Number of records with 8 DC fields

What to export and what not…

a. Information about scanning?

<format>Three-dimensional objects, oversized prints and posters photographed with a Nikon D1X digital camera at resolution of 1312 x 2000 pixels, eight bits per RGB channel in TIF format. Images downloaded onto CD-R's, then copied using a Dell Optiplex GX150 and stored in Network Area Storage for non-display archival purposes. Additional copy created for further processing. If necessary, color correction performed using Levels in Photoshop. Resized at 720 dpi vertical, then compressed using Photoshop setting of 80 into JPG format for Web display.</format>

What to export and what not…

b. Information about shelf, box, and folder number of item?

<dc:source>99</dc:source>

<dc:source>1</dc:source>

<dc:source>14</dc:source>

<dc:source>5</dc:source>
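
Bare numbers like these carry no meaning once the local field labels are gone. As a sketch of one possible export filter, the function below drops values that consist only of digits from a chosen element. The function name and the digits-only rule are assumptions for illustration; a real policy might instead prefix such values with their local label.

```python
def exportable(record, element="source"):
    """Return a copy of the record without context-free numeric values in `element`."""
    cleaned = {name: list(values) for name, values in record.items()}
    kept = [v for v in cleaned.get(element, []) if not v.strip().isdigit()]
    if kept:
        cleaned[element] = kept
    else:
        cleaned.pop(element, None)  # nothing meaningful left to export
    return cleaned


# Hypothetical record: the title is invented, the source values are from the slide.
record = {"title": ["Letter"], "source": ["99", "1", "14", "5"]}
print(exportable(record))
# {'title': ['Letter']}
```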

What to export and what not…

c. Two publishers, which to export?

Digital Publisher: Electronically reproduced by the Digital Services unit of the University of Central Florida Libraries, Orlando, 2005.

Publisher: Students of Rollins College.

<publisher>Students of Rollins College.</publisher>

The Digital Publisher information is not mapped to export.
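
The export decision above can be expressed as a field map in which a local field is either assigned a DC element or marked as not exported. This sketch is not CONTENTdm's configuration format: `FIELD_MAP`, `to_dc`, and the plain `record` wrapper element are hypothetical, and only the two field names come from the slide.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Local field name -> DC element; None means the field is deliberately not exported.
FIELD_MAP = {
    "Publisher": "publisher",
    "Digital Publisher": None,
}


def to_dc(local_record, field_map=FIELD_MAP):
    """Build a <record> element containing one dc:* child per exported local field."""
    root = ET.Element("record")
    for field, value in local_record.items():
        dc_name = field_map.get(field)
        if dc_name is None:
            continue  # unmapped or suppressed local field
        ET.SubElement(root, f"{{{DC_NS}}}{dc_name}").text = value
    return root


local = {
    "Digital Publisher": "Electronically reproduced by the Digital Services unit "
                         "of the University of Central Florida Libraries, Orlando, 2005.",
    "Publisher": "Students of Rollins College.",
}
for child in to_dc(local):
    print(child.tag, "->", child.text)
# {http://purl.org/dc/elements/1.1/}publisher -> Students of Rollins College.
```

Keeping the suppressed field in the map with an explicit `None` documents the decision, which fits the recommendation to publish crosswalking information.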

What we have learned…

Native metadata records are rich in meaning in their own environment, but lose richness in the aggregated environment due to mapping errors and misunderstanding and misuse of Dublin Core elements.

Mapping is often based on semantic meanings of metadata fields rather than value strings.

Correct mapping could improve metadata quality significantly.

CONTENTdm Collections

Could be exposed via service providers in DC format

Could be exposed via WorldCat in MARC format

How can we provide good records to users in service providers’ environments?

Recommendations
  • Create project-based best practices and a content standard
  • Consider using field names that can be useful globally
  • Ensure that metadata creators receive proper training

But first of all,

Questions and comments

Myung-ja (mj) Han

Metadata Librarian

University of Illinois at Urbana-Champaign

mhan3@uiuc.edu