Metadata for Digital Repositories

Mark Jordan

Repository Redux

University of Prince Edward Island

September 19, 2007

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Canada License


Schedule

  • 9:00 - 10:30

    • Background; types of metadata; major standards; choosing metadata schemes

  • 10:45 - 12:00

    • Metadata life cycle; strategies for creation and management; automated creation; supplementation strategies

  • 1:00 - 2:30

    • SFU theses workflow case study; native vs. derived; crosswalks

  • 2:45 - 4:30

    • Application Profiles; OAI; CARLCore AP case study


What is Metadata?

  • Different meanings in different communities

  • Information about information

  • Can describe information at any level

    • Collection

    • Item

    • Item within item

  • Can be embedded within an object or separate from it


Types of Metadata

  • Descriptive

  • Terms and conditions

  • Administrative data

  • Content ratings

  • Provenance

  • Linking or relationship data

  • Structural data

Carl Lagoze, Clifford A. Lynch, and Ron Daniel, Jr. “The Warwick Framework: A Container Architecture for Aggregating Sets of Metadata”. 1996. http://hdl.handle.net/1813/7248


Metadata and Cataloguing

  • Perception that cataloguing is old and metadata is new

  • Traditional cataloguing focuses on descriptions of analogue materials

  • Metadata focuses on management of networked resources

  • For locally created or managed networked resources (such as repositories), cataloguing is insufficient


Metadata Schemes

  • Defines a collection of elements for supporting a specific function

  • Defines structures for element values

  • Defines formal aspects of the element set, such as name, definition, data type, etc.

  • Some schemes are expressed as XML schemas


Containers vs. Rules of Description

  • Containers dictate structure

  • Rules of description dictate content

  • Common rules of description

    • AACR2

    • RDA

    • RAD


Vs. Glue Standards

  • OpenURL

    • Syntax for encoding bib data in URLs

    • http://resolver.example.edu/cgi?genre=book&isbn=0836218310&title=The+Far+Side+Gallery+3

  • COinS

    • OpenURLs embedded in HTML <span> tags

  • unAPI

    • Identifiers embedded in HTML <abbr> tags for autodiscovery and “copy and paste”

  • Microformats

    • For example, <a href="http://creativecommons.org/licenses/by/2.0/" rel="license">cc by 2.0</a>
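The OpenURL bullet can be illustrated with a short sketch that encodes bibliographic fields into a query string. The resolver address is the placeholder used on the slide; the function name `build_openurl` is hypothetical and only stands in for whatever a real link resolver client would do.

```python
from urllib.parse import urlencode


def build_openurl(base_url, **fields):
    """Encode bibliographic key/value pairs as an OpenURL-style query string."""
    # urlencode percent-encodes the values and turns spaces into '+'.
    return f"{base_url}?{urlencode(fields)}"


url = build_openurl(
    "http://resolver.example.edu/cgi",  # placeholder resolver from the slide
    genre="book",
    isbn="0836218310",
    title="The Far Side Gallery 3",
)
print(url)
```

The printed URL should match the example on the slide, since keyword arguments keep their order.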


Selected Major Standards

  • Dublin Core

  • MODS

  • Collection Description

  • RDA

  • EAD

  • PREMIS

  • METS


Dublin Core

  • Standard metadata set for describing resources

  • It is flexible

    • Qualified vs. unqualified

    • Can be expressed in HTML, XML, or RDF

  • Dumbing down is a good thing


Dublin Core Element Set

  • Title

  • Creator

  • Subject

  • Description

  • Publisher

  • Contributor

  • Date

  • Type

  • Format

  • Identifier

  • Source

  • Language

  • Relation

  • Coverage

  • Rights


Dublin Core Qualifiers

  • Types

    • Element refinements

    • Encoding schemes

  • Examples

    • Description

      • Table of contents, abstract

    • Date

      • Created, valid, available, issued, modified

    • Subject

      • LCSH, MeSH, DDC, LCC, UDC
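Element refinements narrow an element without changing what it is, which is why qualified Dublin Core can be "dumbed down" to the unqualified set. A minimal sketch, assuming qualified statements are keyed as `element.qualifier` (a common but not universal notation); the record values are illustrative only.

```python
def dumb_down(qualified):
    """Collapse qualified DC statements to unqualified ones by dropping the qualifier."""
    simple = {}
    for name, values in qualified.items():
        element = name.split(".", 1)[0]  # "date.issued" -> "date"
        simple.setdefault(element, []).extend(values)
    return simple


# Illustrative record; the keys use the assumed element.qualifier notation.
record = {
    "title": ["Travels in Iceland"],
    "description.abstract": ["(abstract text)"],
    "date.created": ["2003-07-12"],
    "date.issued": ["2003-12-07"],
}
simple = dumb_down(record)
print(simple["date"])
```

Both refined dates survive as plain `date` values, so a harvester that only understands the fifteen core elements still gets usable data.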



MODS

  • A “bibliographic element set that may be used for a variety of purposes, and particularly for library applications.”

  • Richer than DC, simpler than MARC

  • Does not assume the use of any specific cataloging code

  • Elements: titleInfo, title, name, namePart, originInfo, etc.



<?xml version="1.0" encoding="UTF-8"?>

<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">

<mods:titleInfo>

<mods:title>A Jewel of Honesty</mods:title>

</mods:titleInfo>

<mods:genre>Article</mods:genre>

<mods:abstract>clashing oppositions</mods:abstract>

<mods:subject>

<mods:geographic>N/A</mods:geographic>

</mods:subject>

<mods:subject authority="none">

<mods:topic>General interest</mods:topic>

</mods:subject>

<mods:relatedItem type="host">

<mods:titleInfo>

<mods:title>Carnegie Newsletter</mods:title>

<mods:title>Celebration a Spectacle of Hope</mods:title>

</mods:titleInfo>

<mods:name>

<mods:namePart>Pra'N'Ava</mods:namePart>

<mods:role>

<mods:roleTerm authority="marcrelator" type="text">author</mods:roleTerm>

</mods:role>

</mods:name>

<mods:name>

<mods:namePart>N/A</mods:namePart>

<mods:role>

<mods:roleTerm authority="chodarr" type="text">recipient</mods:roleTerm>

</mods:role>

</mods:name>

<mods:part>

<mods:extent unit="pages">

<mods:start>9</mods:start>

<mods:list>9,14</mods:list>

</mods:extent>

</mods:part>

<mods:originInfo>

<mods:dateIssued encoding="iso8601">19870101</mods:dateIssued>

</mods:originInfo>

</mods:relatedItem>

</mods:mods>
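As a rough illustration of how a record like this can be read by software, the sketch below pulls a few values out of a trimmed copy of the sample with Python's standard XML parser. The `summarize` helper and the choice of fields are assumptions for illustration, not part of MODS itself.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Trimmed copy of the sample record.
SAMPLE = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:titleInfo><mods:title>A Jewel of Honesty</mods:title></mods:titleInfo>
  <mods:genre>Article</mods:genre>
  <mods:relatedItem type="host">
    <mods:titleInfo><mods:title>Carnegie Newsletter</mods:title></mods:titleInfo>
  </mods:relatedItem>
</mods:mods>"""


def summarize(mods_xml):
    """Return the item title, genre, and host title from a MODS record."""
    root = ET.fromstring(mods_xml)
    # The path starts at direct children, so this is the article's own title.
    title = root.findtext("mods:titleInfo/mods:title", namespaces=MODS_NS)
    genre = root.findtext("mods:genre", namespaces=MODS_NS)
    host = root.findtext(
        "mods:relatedItem[@type='host']/mods:titleInfo/mods:title",
        namespaces=MODS_NS,
    )
    return {"title": title, "genre": genre, "host": host}


print(summarize(SAMPLE))
```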




DCMI Collection Description

  • Formal description of aggregation or collection of items

  • Can apply to collections where item-level metadata is not available or appropriate, or to collections where it is

  • Sample elements:

    • accrualMethod, accrualPeriodicity

  • Developed as NISO Z39.91

Dublin Core Collection Description Application Profile,

http://www.ukoln.ac.uk/metadata/dcmi/collection-application-profile/2004-02-01/


RDA

  • Resource Description and Access, the successor to AACR2

  • Diane Hillmann’s critique

    • Reliance on transcription and specified sources of information

    • Reliance on unstructured notes

    • Multiple versions in one record

    • Full review at http://dublincore.org/usage/meetings/2006/04/seattle/rda-review/RDA_for_who.htm


EAD

  • XML schema for encoding archival finding aids

  • Contains elements for all aspects of archival description, from <repository> to <daoloc>

  • <archdesc> is the standard tag for describing hierarchies of fonds, series, subseries, etc.



<did>

<head>Summary Description of the Tom Stoppard Papers</head>

<repository>

<corpname>The University of Texas at Austin

<subarea>Harry Ransom Humanities Research Center</subarea>

</corpname>

</repository>

<origination>

<persname source="lcnaf" encodinganalog="100">Stoppard, Tom</persname>

</origination>

<unittitle encodinganalog="245">Tom Stoppard Papers, </unittitle>

<unitdate type="inclusive">1944-1995</unitdate>

<physdesc encodinganalog="300">

<extent>68 boxes (28 linear feet)</extent>

</physdesc>

<unitid type="accession">R4635</unitid>

<physloc audience="internal">14E:SW:6-8</physloc>

<abstract>The papers of British playwright Tom Stoppard (b. 1937) encompass

his entire career and consist of multiple drafts of his plays, from the well-known

<title render="italic">Rosencrantz and Guildenstern Are Dead</title> to several

that were never produced, correspondence, photographs, and posters, as

well as materials from stage, screen, and radio productions from around the

world.</abstract>

</did>


PREMIS

  • Data model

    • Digital objects

    • Intellectual entities

    • Agents

    • Events

    • Rights

    • Relationships

  • Data Dictionary contains examples and sections on compliance and implementation

  • Can be encoded in METS



METSRights

  • Endorsed by METS Board but useful outside of METS documents

  • XML Elements

    • RightsDeclaration

      • RightsHolder

      • Context

        • Permissions

        • Constraints


METS

  • METS: Metadata Encoding & Transmission Standard

  • Encodes descriptive, administrative, and structural metadata in one XML file

  • Preferred data structure for digital library initiatives

  • Goals

    • Manage different types of metadata

    • Migrate resources between repositories


METS Community

  • Maintenance agency is Library of Congress

  • Website

    • http://www.loc.gov/standards/mets/

  • Implementation registry

    • Lists 33 projects at 24 institutions


METS Components

  • METS header

  • Descriptive metadata section

  • Administrative metadata section

  • File section

  • Structural map section

  • Structural link section

  • Behavior section


fileSec

  • Lists all files making up the resource

  • <FLocat> points to files

  • IDs of <file> elements link to pertinent administrative metadata in <amdSec> using the ADMID attribute



<mets:fileSec>

<mets:fileGrp USE="archive image">

<mets:file ID="epi01m" MIMETYPE="image/tiff">

<mets:FLocat xlink:href="http://www.loc.gov/standards/mets/docgroup/full/01.tif" LOCTYPE="URL"/>

</mets:file>

<mets:file> … </mets:file>

</mets:fileGrp>

<mets:fileGrp USE="reference image">

<mets:file ID="epi01r" MIMETYPE="image/jpeg">

<mets:FLocat

xlink:href="http://www.loc.gov/standards/mets/docgroup/jpg/01.jpg"

LOCTYPE="URL"/>

</mets:file>

</mets:fileGrp>

<mets:fileGrp USE="thumbnail image">

<mets:file ID="epi01t" MIMETYPE="image/gif">

<mets:FLocat

xlink:href="http://www.loc.gov/standards/mets/docgroup/gif/01.gif"

LOCTYPE="URL"/>

</mets:file>

</mets:fileGrp>

</mets:fileSec>
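A file section like this is easy to walk programmatically. The sketch below lists each file with its group, using a shortened copy of the sample; the helper name `list_files` is hypothetical, and the namespace URIs are the standard METS and XLink ones.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
NS = {"mets": "http://www.loc.gov/METS/", "xlink": XLINK}

# Shortened copy of the sample fileSec, with namespace declarations added.
SAMPLE = """<mets:fileSec xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileGrp USE="archive image">
    <mets:file ID="epi01m" MIMETYPE="image/tiff">
      <mets:FLocat xlink:href="http://www.loc.gov/standards/mets/docgroup/full/01.tif" LOCTYPE="URL"/>
    </mets:file>
  </mets:fileGrp>
  <mets:fileGrp USE="thumbnail image">
    <mets:file ID="epi01t" MIMETYPE="image/gif">
      <mets:FLocat xlink:href="http://www.loc.gov/standards/mets/docgroup/gif/01.gif" LOCTYPE="URL"/>
    </mets:file>
  </mets:fileGrp>
</mets:fileSec>"""


def list_files(file_sec_xml):
    """Return (group USE, file ID, MIME type, location) for every file."""
    root = ET.fromstring(file_sec_xml)
    rows = []
    for group in root.findall("mets:fileGrp", NS):
        for file_el in group.findall("mets:file", NS):
            locat = file_el.find("mets:FLocat", NS)
            # Namespaced attributes are looked up with the {uri}name form.
            href = locat.get(f"{{{XLINK}}}href") if locat is not None else None
            rows.append((group.get("USE"), file_el.get("ID"), file_el.get("MIMETYPE"), href))
    return rows


for row in list_files(SAMPLE):
    print(row)
```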


structMap

  • The only required section

  • Defines the hierarchical structure of the resource

  • Can be physical or logical

    • Physical structMaps simply list files in order

      • Pages that make up a book

    • Logical structMaps list files in order but in the context of the intellectual structure of the resource

      • Chapters that make up a book



<mets:structMap TYPE="physical">

<mets:div TYPE="book" LABEL="Martial Epigrams II">

<mets:div TYPE="page" LABEL="Blank page">

</mets:div>

<mets:div TYPE="page" LABEL="Page ii: Blank page">

</mets:div>

<mets:div TYPE="page" LABEL="Page iii: Title page">

</mets:div>

<mets:div TYPE="page" LABEL="Page iv: Publication info">

</mets:div>

<mets:div TYPE="page" LABEL="Page v: Table of contents">

</mets:div>

<mets:div TYPE="page" LABEL="Page vi: Blank page">

</mets:div>

<mets:div TYPE="page" LABEL="Page 1: Half title page">

</mets:div>

<mets:div TYPE="page" LABEL="Page 2 (Latin)">

</mets:div>

<mets:div TYPE="page" LABEL="Page 3 (English)">

</mets:div>

</mets:div>

</mets:structMap>
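Because a physical structMap is essentially an ordered list of pages, it lends itself to generation. A minimal sketch, assuming the page labels are already known in reading order; the function name is hypothetical and the ORDER attribute is added here as a convenience rather than copied from the sample.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS)


def physical_struct_map(book_label, page_labels):
    """Build a physical structMap: one book div containing one div per page."""
    struct_map = ET.Element(f"{{{METS}}}structMap", {"TYPE": "physical"})
    book = ET.SubElement(struct_map, f"{{{METS}}}div", {"TYPE": "book", "LABEL": book_label})
    for order, label in enumerate(page_labels, start=1):
        ET.SubElement(
            book,
            f"{{{METS}}}div",
            {"TYPE": "page", "LABEL": label, "ORDER": str(order)},
        )
    return struct_map


struct_map = physical_struct_map(
    "Martial Epigrams II",
    ["Blank page", "Page ii: Blank page", "Page iii: Title page"],
)
print(ET.tostring(struct_map, encoding="unicode"))
```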


dmdSec

  • Contains descriptive metadata

  • Descriptive metadata can be included or linked externally

  • Descriptive metadata can be in any scheme

  • Can accommodate XML (e.g., MODS) or binary (e.g., MARC) representations of descriptive metadata



<mets:dmdSec ID="DMD1">

<mets:mdWrap MIMETYPE="text/xml" MDTYPE="MODS">

<mets:xmlData>

<mods:mods version="3.1">

<mods:titleInfo>

<mods:title>Epigrams</mods:title>

</mods:titleInfo>

<mods:name type="personal">

<mods:namePart>Martial</mods:namePart>

</mods:name>

<mods:name type="personal">

<mods:namePart>Ker, Walter C. A. (Walter Charles Alan),

1853-1929

</mods:namePart>

</mods:name>

<mods:typeOfResource>text</mods:typeOfResource>

</mods:mods>

</mets:xmlData>

</mets:mdWrap>

</mets:dmdSec>


amdSec

  • Contains info on digital resource, files in the resource, or original analogue source

  • Type of info

    • Technical

    • Intellectual property

    • Provenance



<mets:techMD ID="AMD001">

<mets:mdWrap MIMETYPE="text/xml" MDTYPE="NISOIMG"

LABEL="NISO Img.Data">

<mets:xmlData>

<niso:MIMEtype>image/tiff</niso:MIMEtype>

<niso:Compression>LZW</niso:Compression>

<niso:PhotometricInterpretation>

8

</niso:PhotometricInterpretation>

<niso:Orientation>

1

</niso:Orientation>

<niso:ScanningAgency>

NYU Press

</niso:ScanningAgency>

</mets:xmlData>

</mets:mdWrap>

</mets:techMD>


metsHdr

  • Contains info about the METS document

  • Sample

<mets:metsHdr CREATEDATE="2006-05-09T15:00:00" LASTMODDATE="2006-05-09T21:00:00">
  <mets:agent ROLE="CREATOR" TYPE="INDIVIDUAL">
    <mets:name>Rick Beaubien</mets:name>
  </mets:agent>
  <mets:altRecordID TYPE="LCCN">20022838</mets:altRecordID>
</mets:metsHdr>


structLink

  • Adds hyperlinks between elements in a Structural Map

  • Sample

<mets:structLink>

<mets:smLink xlink:from="LINK7" xlink:to="page1145"

xlink:title="projects">

</mets:smLink>

<mets:smLink xlink:from="LINK13" xlink:to="page1145"

xlink:title="projects">

</mets:smLink>

<mets:smLink xlink:from="LINK36" xlink:to="page113"

xlink:title="officers">

</mets:smLink>

<mets:smLink xlink:from="LINK37" xlink:to="page120"

xlink:title="calender">

</mets:smLink>

</mets:structLink>


behaviorSec

  • Associates executable behaviors (i.e., computer code) with parts of a document/object

  • Sample

<mets:behaviorSec>
  <mets:behavior ID="disp1" STRUCTID="top" BTYPE="display" LABEL="Display Behavior">
    <mets:interfaceDef LABEL="EAD Display Definition" LOCTYPE="URL"
      xlink:href="http://texts.cdlib.org/dynaxml/profiles/display/oacDisplayDef.txt"/>
    <mets:mechanism LABEL="EAD Display Mechanism" LOCTYPE="URN"
      xlink:href="http://texts.cdlib.org/dynaxml/profiles/display/oacDisplayMech.xml"/>
  </mets:behavior>
</mets:behaviorSec>


Linking Between Sections

  • Elements that can point to <dmdSec> (via DMDID)

    • <file>, <stream>, <div>

  • Elements that can point to <techMD>, <rightsMD>, <sourceMD>, <digiprovMD> (via ADMID)

    • <dmdSec>, <file>, <fileGrp>, <stream>

  • Elements that can point to <file> (via FILEID)

    • <fptr>, <area>

  • Elements that can point to <div> (via STRUCTID)

    • <behavior>
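Since these links are plain ID references, a simple consistency check can catch pointers that lead nowhere. The sketch below is one possible approach, assuming the whole METS document fits in memory; the sample document and the `dangling_refs` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up METS fragment: the file points at AMD001, which is never defined.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:dmdSec ID="DMD1"/>
  <mets:fileSec><mets:fileGrp>
    <mets:file ID="epi01m" ADMID="AMD001"/>
  </mets:fileGrp></mets:fileSec>
  <mets:structMap>
    <mets:div DMDID="DMD1"><mets:fptr FILEID="epi01m"/></mets:div>
  </mets:structMap>
</mets:mets>"""


def dangling_refs(mets_xml):
    """Return (attribute, value) pairs that reference an ID not present in the document."""
    root = ET.fromstring(mets_xml)
    known_ids = {el.get("ID") for el in root.iter() if el.get("ID")}
    missing = []
    for el in root.iter():
        for attr in ("DMDID", "ADMID", "FILEID"):
            # DMDID and ADMID may hold several space-separated IDs.
            for ref in (el.get(attr) or "").split():
                if ref not in known_ids:
                    missing.append((attr, ref))
    return missing


print(dangling_refs(SAMPLE))
```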


METS Profiles

  • METS is so flexible, it needs to be documented for each particular application or use

  • Components

    • URI

    • Date

    • Abstract

    • Extension schemas

    • Rules of description

    • Vocabularies

    • Structural rules for resources

    • Technical metadata


What is the point of all this?

  • Management of digital resources requires many types of metadata

  • Managing all this metadata can be difficult

  • METS can do it all, but is complex


Functional Requirements

  • What do you expect your metadata to do?

    • The nature of the resources you are putting in your digital collection

    • The nature of the intended audience(s) for your collection

    • The level of description

    • The size of your collection

    • Importance of interoperability

    • The resources your library has for creation and long-term maintenance of the metadata


Nature of Resources

  • Is there full text?

  • Are they “simple” or “complex”?

  • Do you supply multiple versions of the same resource?

  • Are all resources available to all users?


Nature of Users

  • Is your audience general or specialized?

  • How information/network literate are they?

  • How much information will they need to choose appropriate resources?

  • What other assumptions can you safely make about your users, and how do those assumptions impact your metadata planning activities?


Level of Description

  • How much detail do you want to include in your metadata?

  • Related to resources available for creation of metadata, and balance of quantity vs. quality

  • Expensive (e.g., subject) vs. cheap (e.g., file size) descriptive elements


Size of Collection

  • Small collections rely less on metadata than large collections do

  • Browsing, faceting, and differentiating functions are more important in large collections

  • In general, the bigger the collection, the more granular the values in your metadata need to be

    • E.g., subject vocabularies


Importance of Interoperability

  • Metadata in local schemes is more difficult to share than metadata in standard schemes

  • Always assume your metadata will be used in contexts different from the original

  • Plan metadata with crosswalks in mind


Resources for Managing Metadata

  • How will metadata of various types be created and managed?

  • Does your institution have a DAM strategy?

  • Will preservation metadata (e.g., PREMIS) be managed?


FRBR’s User Tasks

  • Functional requirements can be expressed in terms of the FRBR data model

    • Find entities which correspond to user’s search criteria

    • Identify an entity

    • Select an entity

    • Acquire or obtain access to the desired entity


Analyzing Domains

  • Environmental

  • Object class

  • Object format

Jane Greenberg, “Understanding Metadata and Metadata Schemas.” In Metadata: A Cataloguer’s Primer. Ed. Richard P. Smiraglia. New York: Haworth. 2005.


Metadata Quality

  • Completeness

  • Accuracy

  • Provenance

  • Conformance to expectations

  • Logical consistency and coherence

  • Timeliness

  • Accessibility

Thomas R. Bruce and Diane I. Hillmann, “The Continuum of Metadata Quality: Defining, Expressing, Exploiting.” In Metadata in Practice. Ed. Diane I. Hillmann and Elaine L. Westbrooks. Chicago: American Library Association, 2004.


First Intermission


Before Lunch

  • Metadata life cycle

  • Strategies for creation and management

  • Automated metadata creation

  • Supplementation strategies


Metadata Management Life Cycle

  • Creation

  • Storage

  • Supplementation

  • Sharing

  • Repurposing


Strategies for Creation and Management

  • Depend on complexity and completeness of metadata

  • Common strategies

    • Create and manage simple (single type) metadata in one application

    • Create simple metadata in one app and manage in another

    • Create and manage different types of metadata in multiple apps, and combine for use


PREMIS survey

  • Most common tool was relational databases

  • XML databases or XML files stored with digital objects

  • Flat files or object-relational databases

  • Most respondents were using two or more of these methods


Creation

  • Avoid recreating metadata

  • Metadata can be created

    • At time of resource creation

      • Born digital

      • Digitized

    • After resource creation

  • Primarily a manual task

  • Variety of tools


Raw XML

  • Advantages

    • Provides high level of control

    • Requires simple tools

  • Disadvantages

    • XML makes humans’ heads ache

    • Extremely unforgiving of errors


Greenstone

  • Open source repository platform from University of Waikato

  • Provides support for several types of metadata and can export METS

  • Provides Java client (Greenstone Librarians’ Interface, a.k.a. GLI) for metadata production

  • Also provides “plugins” for extracting metadata


CONTENTdm

  • Commercial repository platform from OCLC

  • Provides support for several types of metadata and can export XML that can be converted into METS

  • Provides Windows client for production (Acquisition Station)

  • Also provides a web interface for creating metadata and ingesting content


CWIS

  • Open source “collection manager”

  • Product of the National Science Digital Library

  • Features rich metadata management tools


AlouetteCanada

  • Metadata Toolkit will provide local management and access

  • Portal will provide centralized access

  • Best practices documents will support creation and management of metadata and content


AlouetteCanada Metadata Toolkit

  • A content management system for library, archives, and museum collections

  • Will allow staff to create metadata and manage content

  • Scheme support

    • MODS

    • EAD

    • METS

  • Will allow basic digital asset management


DAM in the Toolkit

  • Tools for managing master and derivative versions of files

  • Tools for creating checksums and managing technical metadata

  • Tools for managing rights tracking

  • Tools for managing administrative metadata
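Checksum creation is the most mechanical of these tasks. The sketch below shows one common way to fingerprint a master file with Python's standard library; it is not the Toolkit's own code, and the function name and default algorithm are assumptions.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Storing the digest alongside the technical metadata lets a later fixity check recompute it and compare, which is how silent corruption of a master file would typically be noticed.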


AlouetteCanada Portal

  • Aggregates metadata from participating institutions for centralized searching

  • Points back to Toolkit or whatever else is hosting items

  • Based on the OurOntario Portal


Automated Metadata Creation

  • Technical

    • JHOVE, DROID, digitization hardware

  • Descriptive

    • Born-digital document metadata

  • Subject

    • INFOMINE iVia tools

  • Structural

    • Sequential filename generation


Chinese Times Processing Workflow

  • Line up TIFF image in thumbnail view

  • Create directory with date as name

  • Copy that day’s files into directory

  • Run renamer/metadata creation script

    • Get all files in input dir, create full paths

    • Walk through inputfile list

    • Rename 1st file -01.tif, 2nd file -03.tif, 3rd file -02.tif, etc.

    • Output directory name and metadata file for CONTENTdm

  • Quality control
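The renamer step might look roughly like the sketch below. It is not the script used in the project: the function name, the use of the directory name as the filename prefix, and the `page_order` parameter are all assumptions. The slide gives the page numbers only for the first three files (1, 3, 2) and elides the rest, so the caller has to supply the full sequence.

```python
import os


def rename_issue_pages(issue_dir, page_order):
    """Rename the scans in issue_dir to <directory name>-NN.tif.

    page_order[i] is the page number given to the i-th file in sorted order.
    """
    # Assumption: the directory is named after the issue date.
    date = os.path.basename(os.path.normpath(issue_dir))
    files = sorted(
        name for name in os.listdir(issue_dir)
        if name.lower().endswith((".tif", ".tiff"))
    )
    if len(files) != len(page_order):
        raise ValueError("page_order must list one page number per file")
    renamed = []
    for old_name, page in zip(files, page_order):
        new_name = f"{date}-{page:02d}.tif"
        os.rename(os.path.join(issue_dir, old_name), os.path.join(issue_dir, new_name))
        renamed.append(new_name)
    return renamed
```

The sketch assumes the scanner's original filenames never collide with the new names; a production script would need to guard against that before renaming.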


    [Figure: import directory structure]

    Issue-level metadata file for import into CONTENTdm

    Columns: Title, Date, Publisher, Rights, Description, Type, Format, Language, Filename

    Sample row (remaining columns not shown):

      • Title: Chinese Times, April 1st, 1920

      • Date: 04/01/1920

      • Publisher: The Chinese Freemasons Society of Canada

      • Rights: Copyright the Chinese Freemasons Society of Canada


    Storage

    • Some file formats enable internal storage of metadata

    • For external storage, relational databases offer most flexibility

      • Complex metadata can be stored in simple structures

      • Can handle hierarchical data

    • Are agnostic to other phases in metadata life cycle

    • Not highly scalable for text retrieval

      • External indexers eliminate this problem

    • Can export and import XML, MARC, etc.
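To make the point about simple structures concrete, the sketch below stores metadata statements in one narrow table and expresses hierarchy with a parent pointer. The table and column names are hypothetical, and "Page 1" is an illustrative value; real repository platforms use their own schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES item(id)  -- NULL for top-level items
);
CREATE TABLE statement (
    item_id INTEGER NOT NULL REFERENCES item(id),
    element TEXT NOT NULL,
    qualifier TEXT,
    value TEXT NOT NULL
);
""")
conn.execute("INSERT INTO item (id, parent_id) VALUES (1, NULL)")  # a newspaper issue
conn.execute("INSERT INTO item (id, parent_id) VALUES (2, 1)")     # a page within it
conn.executemany(
    "INSERT INTO statement VALUES (?, ?, ?, ?)",
    [
        (1, "title", None, "Chinese Times, April 1st, 1920"),
        (1, "publisher", None, "The Chinese Freemasons Society of Canada"),
        (2, "title", None, "Page 1"),
    ],
)


def statements_for(item_id):
    """Return (element, qualifier, value) rows for one item, in insertion order."""
    cursor = conn.execute(
        "SELECT element, qualifier, value FROM statement WHERE item_id = ? ORDER BY rowid",
        (item_id,),
    )
    return cursor.fetchall()


def children_of(item_id):
    """Return the ids of items directly below item_id in the hierarchy."""
    cursor = conn.execute("SELECT id FROM item WHERE parent_id = ? ORDER BY id", (item_id,))
    return [row[0] for row in cursor]


print(statements_for(1), children_of(1))
```

New elements or qualifiers need no schema change with this layout, which is one reading of the flexibility claim above.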


    Repurposing

    • Different use of metadata than originally intended

    • Often migrated to or imported into an external system

    • Examples

      • Dumping new items lists from ILS for use in external portal

      • Creating MARC records from vendor spreadsheet (demo)


    Sharing

    • All metadata should be created to be shared

    • May require exporting, crosswalking, supplementation

    • Basic approaches to sharing: metasearching and harvesting

    • Syntaxes for sharing are easy; semantics for sharing are more difficult


    PKP Metadata Harvester

    • Open source

    • PHP/MySQL

    • Product of the Public Knowledge Project

    • Features

      • Can harvest any metadata format via OAI

      • Flexible plugin and customization features

      • Defines crosswalks between different schemas
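Harvesting via OAI starts with an ordinary HTTP request. The sketch below builds a ListRecords request URL; it is not taken from the PKP harvester, the repository address is a placeholder, and the helper name is hypothetical. The `verb`, `metadataPrefix`, `set`, and `from` parameters are defined by the OAI-PMH protocol.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date  # harvest only records changed since this date
    return f"{base_url}?{urlencode(params)}"


# Placeholder base URL, not a real repository.
print(list_records_url("http://repository.example.edu/oai", from_date="2007-01-01"))
```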


    Supplementation Strategies

    • Manually add or update elements

    • Programmatically supplement

    • Add namespaces

    • Virtual supplementation


    Supplementation Examples

    • "on the horse" @ Harvard

    • adding namespaces into DC

    • PKP Metadata Harvester

    • CUFTS

      • cufts2marc

      • Subjects and other fields in MARC records in CUFTS

    • Georgia Tech’s Umlaut link resolver


    Example: Programmatic Supplementation

    • get_subjects.pl titles.txt (demo)

    • Possible enhancements

      • Harvest complete record and pick out wanted fields

      • Write local MARC record

      • Add heuristics to dedupe and reduce false hits


    Example: Add Namespaces

    Creator: Jane Doe

    Title: Travels in Iceland

    Date: 12/07/2003

    Becomes in OAI-PMH

    <oai_dc:dc>

    <dc:creator>Jane Doe</dc:creator>

    <dc:title>Travels in Iceland</dc:title>

    <dc:date>12/07/2003</dc:date>

    </oai_dc:dc>
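The transformation above can be sketched with Python's standard XML library. The namespace URIs are the standard oai_dc and Dublin Core ones; the `to_oai_dc` helper is hypothetical, and the date is passed through unchanged, as on the slide.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def to_oai_dc(fields):
    """Wrap (label, value) pairs in namespaced oai_dc/dc elements."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for label, value in fields:
        child = ET.SubElement(root, f"{{{DC}}}{label.lower()}")
        child.text = value
    return root


record = to_oai_dc([
    ("Creator", "Jane Doe"),
    ("Title", "Travels in Iceland"),
    ("Date", "12/07/2003"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The serialized output also carries the `xmlns:` declarations on the root element, which the slide omits for brevity.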


    Example: Virtual Supplementation

    • Georgia Tech’s Umlaut link resolver

      • SFX ERM data

      • ILS Oracle database for holdings info

      • OCLC's xISBN service for related ISBNs

      • Google and Yahoo APIs for open access material

      • OCLC's Resolver Registry to determine additional link resolver for user’s IP address

    Ross Singer, posting to NGC4LIB list thread “Link resolvers as loosely

    coupled systems for holdings?” September 10, 2007



    Before the Afternoon Break

    • SFU thesis workflow case study

    • Native vs. derived metadata

    • Crosswalks


    Workflow Case Study: SFU Electronic Theses

    • Prototyped several ETD services

    • Was developing an institutional repository program

    • Contacted vendors for retro conversion and discovered we could do it ourselves

    • Saw increasing need to process print theses more efficiently


    Goals

    • Digitize and provide access to over 4500 SFU theses described in our catalogue

    • Develop efficient current ETD service

    • Add content to SFU’s institutional repository

    • Provide access through both the catalogue and the IR

    • No intent to stop supporting print theses


    Specifications

    • Digital versions would be for access only; no need seen to create high-quality masters

    • Theses would be available to all users

    • Metadata should be as rich as possible while remaining efficient to create


    Issues

    • Rights Management of retro theses

      • “Fair dealing”

      • Use of PDF’s security features

    • Developing efficient workflows for processing current theses

    • Standardization of descriptive metadata

    • Technical issues

      • Dirty OCR and specialized symbols

      • Challenging source documents


    Workflows

    • Current (December 2004 - )

      • Digitization

      • Metadata

    • Retrospective (1967 – 1997)

      • Digitization

      • Metadata


    Workflow for Current Theses

    • Thesis Assistant provides master list in MS Excel when previous semester’s submissions “closed”

    • Digitization staff scan unbound copies directly into Adobe Acrobat

      • Filenaming scheme: Unique ID assigned manually

    • Systems staff convert metadata

    • Systems staff import into DSpace

    • Systems staff create MARC in batch

    • Tech Services load into library catalogue
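The "convert metadata" step could be sketched as follows. This is not SFU's Perl script: it assumes the master list has been exported from Excel to CSV, the column names are hypothetical, and the mapping covers only three fields. The `<dublin_core>`/`<dcvalue>` layout is the one DSpace's batch import expects.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical spreadsheet column -> (DC element, qualifier).
COLUMN_MAP = {
    "author": ("contributor", "author"),
    "title": ("title", "none"),
    "year": ("date", "issued"),
}

# Stand-in for the exported master list; values come from the sample record in the slides.
SHEET = '''thesis_id,author,title,year
thesisID1,"Henderson, Brian Charles",Operational effectiveness in cellulose fibers business of Weyerhaeuser Company: can the cost trends of 2005 be reversed?,2006
'''


def row_to_dublin_core(row):
    """Turn one spreadsheet row into a DSpace <dublin_core> element."""
    root = ET.Element("dublin_core")
    for column, (element, qualifier) in COLUMN_MAP.items():
        value = (row.get(column) or "").strip()
        if value:  # skip empty cells rather than emit empty dcvalues
            dcvalue = ET.SubElement(root, "dcvalue", {"element": element, "qualifier": qualifier})
            dcvalue.text = value
    return root


for row in csv.DictReader(io.StringIO(SHEET)):
    print(row["thesis_id"], ET.tostring(row_to_dublin_core(row), encoding="unicode"))
```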


    Metadata Workflow for Current (Dec 2004 - ) Theses

    [Workflow diagram; recoverable steps and labels:]

    • Inputs: Thesis Assistant’s spreadsheet with temporary thesis ID added; scanned theses PDFs (filenames correspond to temp. theses IDs)

    • theses2dspace.pl (Perl script) produces DSpace import metadata and packages, shown as <dspace_import> records with <author>, <title>, <year>, and <dept>

    • DSpace import utility loads the packages into DSpace and writes a DSpace map file:

      thesisID1 1892/99
      thesisID2 1892/100
      thesisID3 1892/101

    • dspace2marc.pl (Perl script) creates brief MARC records, with MARC 856: http://ir.lib.sfu.ca/handle/1892/99

      LDR 00747nas 2200157za 4500
      005 20040903164118.1
      006 m d d |
      007 cr u||||||||||
      008 040903||||||||||||||||||||d|||||||||||||
      100 00 _aSmith, Student P.
      245 00 _aThe title: _bcontaining some catchy words
      856 04 _uhttp://ir.lib.sfu.ca/handle/1892/99

    • Brief MARC records are loaded into III



    <dublin_core>

    <dcvalue element="contributor" qualifier="author">

    Henderson, Brian Charles</dcvalue>

    <dcvalue element="title" qualifier="none">

    Operational effectiveness in cellulose fibers business

    of Weyerhaeuser Company: can the cost trends of 2005

    be reversed?</dcvalue>

    <dcvalue element="date" qualifier="issued">2006</dcvalue>

    <dcvalue element="language" qualifier="iso">en</dcvalue>

    <dcvalue element="rights" qualifier="none">Copyright remains

    with the author</dcvalue>

    <dcvalue element="type" qualifier="none">text</dcvalue>

    <dcvalue element="type" qualifier="none">thesis</dcvalue>

    <dcvalue element="description" qualifier="none">Research

    Project (M.B.A.) - Faculty of Business Administration –

    Simon Fraser University</dcvalue>

    <dcvalue element="description" qualifier="abstract">

    The Cellulose Fibers Business of Weyerhaeuser

    Company [...] </dcvalue>

    <dcvalue element="relation"

    qualifier="isformatof">http://troy.lib.sfu.ca/search/t?

    SEARCH=Operational+effectiveness+in+cellulose+fibers+

    business+of+Weyerhaeuser+Company+can+the+cost+trends+



    LDR 00000nam 2200000Ia 4500

    006 m||||||||d||||||||

    007 cr||n||||||d||

    008 070823s2006||||bcc||||||m||||||||||eng||

    035 _fgb

    040 _aCaBVas

    _beng

    100 1 _aBuckham, Catherine Anne

    245 10 _aPublic participation in land use planning:

    _bWhat is the role of social capital? /

    _cby Catherine Anne Buckham

    300 _a leaves

    260 _aBurnaby B.C. :

    _bSimon Fraser University,

    _c2006

    500 _aTheses (Urban Studies Program) / Simon Fraser University

    502 _aResearch Project (M.U.S.) - Simon Fraser University, 2006

    520 3 _aThis study examines […]

    810 2 _aSimon Fraser University.

    _tTheses (Urban Studies Program)

    856 41 _uhttp://ir.lib.sfu.ca/handle/1892/3730

    966 _c2

    _linprc

    _s-

    _i3

    967 _c0


    Workflow for retro theses l.jpg
    Workflow for Retro Theses

    • Master production list derived from MARC records in catalogue

      • Filenaming scheme based on ILS bib record number

    • Digitization staff

      • Scan from microfiche and print copies

      • Remove signatures from approval pages manually

      • Create PDFs from page images


    Slide102 l.jpg

    Pre-scanning

    Preparation

    Check hard drive space

    Create working directory

    Poor

    quality

    Test

    scan

    Scan printed theses

    Good

    quality

    Perform batch scanning

    Please refer to flatbed scanning instructions

    Image processing

    Poor quality

    Quality

    check

    Good quality

    PDF conversion

    Retrospective

    Digitization

    Workflow

    Courtesy of Ian Song,

    Digital Initiatives

    Coordinator,

    SFU


    Slide103 l.jpg

    Metadata Workflow: Retrospective (1966 - 1997) Theses

    • MARC records from III

      LDR 00747nas 2200157za 4500
      005 20040903164118.1094254879.1
      006 m d d |
      007 cr u||||||||||
      008 040903||||||||||||||||||||d|||||||||||||
      100 00 _aSmith, Student P.
      245 00 _aThe title: _bcontaining some catchy words

    • Scanned theses PDFs (filenames correspond to III .bnumbers)

    • marc2dspace.pl

      #!/usr/local/bin/perl
      ##################
      ### Main program ###
      ##################
      &OpenInputFile;
      &OpenOutputFiles;

    • DSpace import metadata and packages

      <dspace_import>
      <author>….</author>
      <title>…</title>
      <year>…</year>
      <dept>…</dept>
      </dspace_import>

    • DSpace import utility

    • DSpace

    • DSpace map file

      b18721102 1892/204
      b18762105 1892/205
      b14731140 1892/1206

    • updatethesesmarc.pl

    • Brief MARC records containing .bnumber and 856 field for overlaying on existing records

      035 .b18721102
      856 04 _uhttp://ir.lib.sfu.ca/handle/1892/99

    • III

      MARC 856: http://ir.lib.sfu.ca/handle/1892/99
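The last step of this workflow, turning the DSpace map file into brief MARC records, can be sketched as follows. This is an illustration only, not the actual updatethesesmarc.pl: the function names are hypothetical, the records are rendered as plain text lines in the slide's notation, and the 856 indicators "41" are an assumption copied from the full MARC record shown earlier in the deck.

```python
HANDLE_BASE = "http://ir.lib.sfu.ca/handle/"  # prefix taken from the 856 example on the slide


def parse_map_file(text):
    """Return {bnumber: handle} from lines such as 'b18721102 1892/204'."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        bnumber, handle = parts
        mapping[bnumber] = handle
    return mapping


def brief_marc(bnumber, handle):
    """Render a brief overlay record as two text lines: 035 (.bnumber) and 856 (handle URL)."""
    # Indicators "41" are assumed here, following the full record shown earlier.
    return [f"035 .{bnumber}", f"856 41 _u{HANDLE_BASE}{handle}"]


if __name__ == "__main__":
    map_text = "b18721102 1892/204\nb18762105 1892/205\nb14731140 1892/1206\n"
    for bnumber, handle in parse_map_file(map_text).items():
        print("\n".join(brief_marc(bnumber, handle)))
        print()
```

Keying everything on the .bnumber is what lets the brief records overlay the existing catalogue records without any title or author matching.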


    Interoperability l.jpg
    Interoperability

    • The ability of one system to communicate with another

    • Can exist on various levels

      • Low-level protocols like TCP/IP

      • High-level like metadata

    • Examples relevant to digital repositories

      • Dublin Core within METS document

      • OAI-PMH

    • Syntactic and semantic interoperability


    How much interoperability l.jpg
    How Much Interoperability?

    • Will your collection be integrated into / linked to a larger one?

    • How important is internal consistency within your collections?

    • Best practices encourage interoperability

    • (Qualified) Dublin Core is safe choice


    Crosswalks l.jpg
    Crosswalks

    • Mappings for converting one schema to another

    • DC to MARC, DC to MODS, MARC to MODS, etc

    • Promote reuse, interoperability

    • Sample list


    Lossy and lossless crosswalks l.jpg
    Lossy and Lossless Crosswalks

    • Lossy: crosswalk removes granularity

    • Lossless: no loss of granularity

    • Dumb down vs. smarten up

    • Acid test: round trip a data set
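The round-trip test can be sketched in a few lines. The crosswalk table below is hypothetical (a handful of qualified DC keys mapped to unqualified DC elements), and the reverse direction uses an arbitrary "first mapping wins" rule purely for illustration.

```python
# Hypothetical crosswalk: qualified DC keys -> unqualified (simple) DC elements.
QDC_TO_DC = {
    "contributor.author": "creator",
    "date.issued": "date",
    "description.abstract": "description",
    "description": "description",
    "title": "title",
}


def dumb_down(record):
    """Lossy direction: drop qualifiers, so distinct source fields may merge."""
    simple = {}
    for key, values in record.items():
        simple.setdefault(QDC_TO_DC[key], []).extend(values)
    return simple


def smarten_up(simple):
    """Best-effort reverse: each simple element maps back to one qualified key."""
    reverse = {}
    for qualified, element in QDC_TO_DC.items():
        reverse.setdefault(element, qualified)  # first mapping wins (an arbitrary choice)
    return {reverse[element]: list(values) for element, values in simple.items()}


def survives_round_trip(record):
    """True when converting out and back reproduces the original record."""
    return smarten_up(dumb_down(record)) == record


if __name__ == "__main__":
    lossless = {"title": ["A thesis"], "date.issued": ["2006"]}
    lossy = {"description": ["Research Project"], "description.abstract": ["This study examines"]}
    print(survives_round_trip(lossless))
    print(survives_round_trip(lossy))
```

The second record fails because two source fields collapse into one `description` element, and no reverse mapping can tell them apart afterwards.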


    Native vs derived metadata l.jpg
    Native vs. Derived Metadata

    • Moving metadata from one container to another

    • Crosswalks document correspondences

    • Deriving metadata is part of sharing and reuse



    Example alouette metadata toolkit l.jpg
    Example: Alouette Metadata Toolkit

    • Metadata is stored internally in a relational database and as raw XML files

      • element ID, element ID relation, info object ID, culture, element, schema, value

      • Attributes are also rows in the same table

    • It is exported as METS and EAD files


    Slide112 l.jpg

    Second Intermission


    Before end of day l.jpg
    Before End of Day

    • Application Profiles

    • OAI-PMH


    Application profiles l.jpg
    Application Profiles

    • A set of metadata elements, policies, and guidelines defined for a particular community or implementation

    • Obligation, legal qualifiers and values, best practice

    • CEN (European Committee For Standardization) CWA 14855

    • Examples

      • CanCore

      • DCMI Library Application Profile

      • DCMI Education Application Profile

      • OhioLINK Digital Media Center (DMC) Metadata Application Profile


    Why are profiles necessary l.jpg
    Why are Profiles Necessary?

    • Among 82 OAI data providers, 71% used only 5 elements (creator, identifier, title, date, and type)

    • 54% of providers used only creator and identifier for over half their records

    Jewel Ward, “Unqualified Dublin Core Usage in OAI-PMH Data Providers” OCLC

    Systems And Services 20.1 (2004), 40-47.
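A similar tally can be run on your own harvested records to see which elements are actually populated. This is a minimal sketch, not the method used in the study cited above; it assumes each record is a dictionary mapping unqualified DC element names to lists of values, and `element_usage` is an illustrative name.

```python
from collections import Counter

# The fifteen elements of unqualified Dublin Core.
DC_ELEMENTS = [
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
]


def element_usage(records):
    """Return {element: share of records that use it at least once}."""
    counts = Counter()
    for record in records:
        # Count each element once per record, ignoring empty value lists.
        counts.update({name for name, values in record.items() if values})
    total = len(records)
    if not total:
        return {}
    return {name: counts[name] / total for name in DC_ELEMENTS}


if __name__ == "__main__":
    sample = [
        {"creator": ["Smith, Student P."], "title": ["The title"]},
        {"creator": ["Buckham, Catherine Anne"], "subject": []},
    ]
    for name, share in element_usage(sample).items():
        if share:
            print(f"{name}: {share:.0%}")
```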


    Invent or borrow l.jpg
    Invent or Borrow?

    • Avoid inventing; borrow instead

    • Overhead of maintaining your own schema

    • Is your material so special?

    • Borrow properties (fields, elements), put effort into values

    • Document and give back your application profile


    Slide118 l.jpg
    OAI

    • OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting

    • Harvesting, not resource discovery

    • Uses standard Web protocols


    Oai pmh model l.jpg

    OAI-PMH Model

    • Data providers expose metadata

    • Service providers harvest metadata and do something useful with it

    • Requests carry verbs; responses are <OAI-PMH>… XML documents
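On the service-provider side, "doing something useful" starts with reading the XML response. The sketch below parses a hand-written ListRecords response; the sample record, its identifier, and the function name `harvest_titles` are invented for illustration, while the two namespace URIs are the standard OAI-PMH 2.0 and Dublin Core ones.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response, trimmed to one record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2006-01-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample title</dc:title>
          <dc:type>thesis</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def harvest_titles(response_xml):
    """Return (identifier, title) pairs for every record in a ListRecords response."""
    root = ET.fromstring(response_xml)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        results.append((identifier, title))
    return results


if __name__ == "__main__":
    print(harvest_titles(SAMPLE_RESPONSE))
```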


    Examples of verbs l.jpg
    Examples of Verbs

    http://oai.lib.sfu.ca/oai2.php?verb=Identify

    • verb=ListSets

    • verb=ListRecords&set=cartoons

      &metadataPrefix=oai_dc

    • verb=ListRecords

      &from=2002-06-01T02:00:00Z

      &set=cartoons

      &metadataPrefix=oai_dc
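The requests above are ordinary HTTP GET URLs, so they can be assembled with a standard URL encoder. A minimal sketch, using the base URL from the slide; `oai_request` is a hypothetical helper, and the trailing underscore in `from_` is only a workaround for `from` being a Python keyword.

```python
from urllib.parse import urlencode

BASE_URL = "http://oai.lib.sfu.ca/oai2.php"  # base URL from the slide


def oai_request(verb, **arguments):
    """Return a request URL; arguments set to None are omitted."""
    params = {"verb": verb}
    for name, value in arguments.items():
        if value is not None:
            # Callers pass from_ because 'from' is reserved in Python.
            params[name.rstrip("_")] = value
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(oai_request("Identify"))
    print(oai_request("ListSets"))
    print(oai_request("ListRecords", set="cartoons", metadataPrefix="oai_dc"))
    print(oai_request("ListRecords", from_="2002-06-01T02:00:00Z",
                      set="cartoons", metadataPrefix="oai_dc"))
```

The encoder percent-escapes the colons in the timestamp, which is the safe form for a query string.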


    Selective harvesting l.jpg
    Selective harvesting

    • Sets

      • Used for grouping items

      • May be flat or hierarchical

        • province:british+columbia

        • Type:Reports

    • Datestamps

      • Uses Coordinated Universal Time

      • “from” and “until” arguments

        • verb=…&from=2003-01-15
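OAI-PMH datestamps come in two granularities: day (YYYY-MM-DD) and seconds (YYYY-MM-DDThh:mm:ssZ), both expressed in UTC. The sketch below formats a local time for use as a "from" or "until" value; `oai_datestamp` is an illustrative name, and which granularity a given repository supports has to be checked with its Identify response.

```python
from datetime import datetime, timedelta, timezone


def oai_datestamp(moment, granularity="day"):
    """Format a timezone-aware datetime as a UTC datestamp at day or seconds granularity."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime so the UTC conversion is unambiguous")
    utc = moment.astimezone(timezone.utc)
    if granularity == "day":
        return utc.strftime("%Y-%m-%d")
    if granularity == "seconds":
        return utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError("granularity must be 'day' or 'seconds'")


if __name__ == "__main__":
    # 8 p.m. on January 14 at UTC-8 is already January 15 in UTC.
    local = datetime(2003, 1, 14, 20, 0, 0, tzinfo=timezone(timedelta(hours=-8)))
    print(oai_datestamp(local))
    print(oai_datestamp(local, "seconds"))
```

Converting to UTC before formatting matters most near midnight, where the local date and the UTC date differ.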


    Harvest store repurpose l.jpg

    Harvest, Store, Repurpose

    • OAI Rep (four repositories shown)

    • Harvester / Aggregator / Data store

    • Search

    • New this week

    • Some other harvester


    Metadata sharing case study carl harvester and carlcore ap l.jpg
    Metadata Sharing Case Study: CARL Harvester and CARLCore AP

    • “Canadian Association of Research Libraries / Association des bibliothèques de recherche du Canada's Institutional Repository Metadata Harvester”

    • http://carl-abrc-oai.lib.sfu.ca/

    • Launched June 2004

    • Now contains 35,000+ records

    • Primarily a search engine for the harvested metadata

    • Uses the PKP Metadata Harvester software


    Repositories l.jpg

    Repositories

    Archimede Université Laval

    Collection mémoires et thèses de l'Université Laval

    [email protected]

    eCommons::Research (University of Winnipeg)

    Mspace (University of Manitoba)

    Ozone (Ontario Scholars Portal)

    Papyrus - Dépôt institutionnel numérique (Université de Montréal)

    Simon Fraser University Institutional Repository

    T-Space (University of Toronto)

    University of Saskatchewan Electronic Theses & Dissertations

    University of Waterloo Electronic Theses

    UVicDSpace


    The problem l.jpg
    The Problem

    • Increased dissatisfaction with search capabilities

    The Solutions

    • Improvements to the software

    • Development of an application profile


    Goals130 l.jpg
    Goals

    • Develop a profile that

      • Improves quality of aggregated metadata

      • Is practical

      • Is voluntary

    • Benefits include

      • Better centralized services

      • Streamlined local practices

      • Guidance for new repositories


    Working group l.jpg
    Working Group

    • Mark Jordan (SFU), Chair

    • Sam Kalb (Queen’s)

    • Lynne McAvoy (CISTI)

    • Lisa O’Hara (Manitoba)

    • Sharon Rankin (McGill)

    • Kathleen Shearer (CARL)

    • Nancy Stuart (Victoria)


    Process l.jpg
    Process

    • Analyze the metadata (from June 2005)

    • Develop use cases and functional requirements

    • Survey other application profiles

      • ePrints UK “Using Simple Dublin Core to Describe Eprints”

      • “ARROW Discovery Service Harvesting Guide”


    Timeline past l.jpg
    Timeline (past)

    • October 2004: Proposal to develop AP

    • April 2005: Formation of mailing list

    • September 2005: Meeting in Ottawa

    • March 2006: Formation of AP working group

    • June 2006: Meeting in Québec

    • October 2006: CARLCore Level 1 available for comment


    Timeline future l.jpg
    Timeline (future)

    • November 10, 2006: Deadline for comments

    • January 31, 2007: Final release

      • IR platform-specific implementation guidelines

      • French translation

    • Ongoing: CARLCore Level 2


    Carlcore ap l.jpg
    CARLCore AP

    • Document is a standard application profile

    • Containing…

      • Rationale

      • General principles and recommendations

      • Entries for each uDC element

      • Appendices

        • Implementation guidelines

        • Sample records

        • CARLCore and the CARL Harvester


    Carlcore level 1 l.jpg
    CARLCore Level 1

    • Uses only unqualified Dublin Core

    • Goal is to make use of the DC elements in OAI as consistent as possible

    • From the “Principles”:

      CARLCore Level 1 parallels the Dublin Core Metadata Element Set in order to supply the richest and most consistent metadata possible within the minimum requirements of the Open Archives Initiative Protocol for Metadata Harvesting.


    Sample elements l.jpg
    Sample Elements

    • Identifier

    • Source

    • Type



    Handling local variations l.jpg
    Handling local variations

    • Top-down approach

      • Dictate shared vocabulary

    • Bottom up approach

      • Provide solution for accommodating both local and centralized needs


    Type map solution l.jpg
    “Type map” solution

    • Harvester uses a “map file” to convert local type values into shared vocabulary

    • Simple XML format

    • Each repository administrator maintains the map file

    • End result is that metadata is processed while being harvested


    Slide145 l.jpg

    [Diagram: the harvester sends verb=ListRecords to a local repository; type values shown are "dissertation", "picture", "thesis", and "image"]


    Slide146 l.jpg

    <mappings>

    <mapping from=" " to="Actes de conférence / Conference Proceedings" />

    <mapping from=" " to="Article" />

    <mapping from=" " to="Audio" />

    <mapping from=" " to="Carte, plan / Map, plan" />

    <mapping from=" " to="Chapitre de livre / Book chapter" />

    <mapping from=" " to="Communication, présentation / Paper, Presentation" />

    <mapping from=" " to="Ensemble de données / Dataset" />

    <mapping from=" " to="Image" />

    <mapping from=" " to="Livre / Book" />

    <mapping from=" " to="Logiciel / Software" />

    <mapping from=" " to="Mémoire de maîtrise / Master's thesis" />

    <mapping from=" " to="Objet d'apprentissage / Learning Object" />

    <mapping from=" " to="Partition musicale / Musical Score" />

    <mapping from=" " to="Pré-publication / Preprint" />

    <mapping from=" " to="Rapport / Report" />

    <mapping from=" " to="Thèse de doctorat / Doctoral dissertation" />

    <mapping from=" " to="Vidéo / Video" />

    <mapping from=" " to="Autre / Other" />

    </mappings>
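How a harvester might apply such a map file while harvesting can be sketched as follows. This is not the PKP Harvester's implementation: the "from" values in the sample are invented local terms, case-insensitive matching is an assumption, and falling back to "Autre / Other" for unmapped values is an assumption as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical map file: the 'from' values are invented; the 'to' values come from the slide.
SAMPLE_MAP = """<mappings>
  <mapping from="dissertation" to="Thèse de doctorat / Doctoral dissertation" />
  <mapping from="picture" to="Image" />
  <mapping from=" " to="Article" />
</mappings>"""

FALLBACK = "Autre / Other"  # assumed default for values with no mapping


def load_type_map(xml_text):
    """Return {local value (lowercased): shared vocabulary term}."""
    type_map = {}
    for node in ET.fromstring(xml_text).iter("mapping"):
        local = (node.get("from") or "").strip().lower()
        if local:  # ignore entries whose 'from' is still blank
            type_map[local] = node.get("to")
    return type_map


def normalize_type(local_value, type_map):
    """Convert one local dc:type value into the shared vocabulary."""
    return type_map.get(local_value.strip().lower(), FALLBACK)


if __name__ == "__main__":
    type_map = load_type_map(SAMPLE_MAP)
    for value in ["Dissertation", "picture", "poster"]:
        print(value, "->", normalize_type(value, type_map))
```

Because the mapping lives in a file each repository administrator maintains, local records stay untouched and only the harvested copy is normalized.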


    Carlcore level 2 l.jpg
    CARLCore Level 2

    • Will add elements to CARLCore Level 1

    • One existing goal is to provide faceted discipline browsing

      • Using OAI sets?

      • Using one or more non-uDC elements?

    • May focus on disciplinary archives

    • Other features leading to “added value” for users


    Implementation issues l.jpg
    Implementation Issues

    • Legacy metadata

    • Conflicts with local IR metadata practice

    • Inflexible OAI gateways in IR platforms

    • Lack of tools to test compliance

    • Yes, using CARLCore is optional… but there is strength in numbers


    Carlcore to do list l.jpg
    CARLCore To Do List

    • Take advantage of PKP Harvester’s data normalization features

    • CARLCore Level 2

    • Stay current with (and collaborate with) IR platforms


    Summary l.jpg
    Summary

    • Metadata requirements for repositories drive decisions

    • Do not reinvent the wheel — instead, adopt or develop an application profile

    • Metadata must be managed

    • Tools should not define your ability to manage your metadata

    • Metadata can be shared


    Recommended online reading l.jpg
    Recommended Online Reading

    • METS Primer and Reference Manual. http://www.loc.gov/standards/mets/METS%20Documentation%20draft%20070310p.pdf

    • DCMI Proceedings. http://www.dcmipubs.org/ojs/index.php/pubs

    • Understanding Metadata. NISO, 2004. http://www.niso.org/standards/resources/UnderstandingMetadata.pdf


    Recommended print reading l.jpg
    Recommended Print Reading

    • Library Technology Reports: Metadata and Its Applications. Ed. Brad Eden. 41.6: November-December 2005.

    • Metadata: A Cataloguer’s Primer. Ed. Richard P. Smiraglia. New York: Haworth, 2005.

    • Metadata in Practice. Ed. Diane I. Hillmann and Elaine L. Westbrooks. Chicago: American Library Association, 2004.

