Metadata for digital repositories


Metadata for Digital Repositories

Mark Jordan

Repository Redux

University of Prince Edward Island

September 19, 2007

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Canada License.


Schedule

  • 9:00 - 10:30

    • Background; types of metadata; major standards; choosing metadata schemes

  • 10:45 - 12:00

    • Metadata life cycle; strategies for creation and management; automated creation; supplementation strategies

  • 1:00 - 2:30

    • SFU theses workflow case study; native vs. derived; crosswalks

  • 2:45 - 4:30

    • Application Profiles; OAI; CARLCore AP case study

What is Metadata?

  • Different meanings in different communities

  • Information about information

  • Can describe information at any level

    • Collection

    • Item

    • Item within item

  • Can be embedded within an object or separate from it

Types of Metadata

  • Descriptive

  • Terms and conditions

  • Administrative data

  • Content ratings

  • Provenance

  • Linking or relationship data

  • Structural data

Carl Lagoze, Clifford A. Lynch, and Ron Daniel, Jr. “The Warwick Framework: A Container Architecture for Aggregating Sets of Metadata”. 1996.

Metadata and Cataloguing

  • Perception that cataloguing is old and metadata is new

  • Traditional cataloguing focuses on descriptions of analogue materials

  • Metadata focuses on management of networked resources

  • For locally created or managed networked resources (such as repositories), cataloguing is insufficient

Metadata Schemes

  • Defines a collection of elements for supporting a specific function

  • Defines structures for element values

  • Defines formal aspects of the element set, such as name, definition, data type, etc.

  • Some schemes are expressed as XML schemas

Containers vs. Rules of Description

  • Containers dictate structure

  • Rules of description dictate content

  • Common rules of description

    • AACR2

    • RDA

    • RAD

Vs. Glue Standards

  • OpenURL

    • Syntax for encoding bib data in URLs


  • COinS

    • OpenURLs embedded in HTML <span> tags

  • unAPI

    • Identifiers embedded in HTML <abbr> tags for autodiscovery and “copy and paste”

  • Microformats

    • For example, <a href="" rel="license">cc by 2.0</a>
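As a rough illustration of COinS, the Python sketch below URL-encodes a few bibliographic fields as an OpenURL 1.0 key/encoded-value (KEV) ContextObject and places it in the `title` attribute of a `<span class="Z3988">`. The helper name `coins_span` and the sample book data are hypothetical; only the `ctx_ver`, `rft_val_fmt`, and `rft.*` keys follow the OpenURL/COinS conventions.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(fields: dict[str, str]) -> str:
    """Build a COinS <span> for a book (hypothetical helper)."""
    params = {
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:book",
    }
    # Each bibliographic field becomes an "rft." key in the KEV book format
    params.update({f"rft.{key}": value for key, value in fields.items()})
    query = urlencode(params)
    # The OpenURL sits inside an HTML attribute, so "&" must be escaped
    return f'<span class="Z3988" title="{escape(query)}"></span>'


print(coins_span({"btitle": "Travels in Iceland", "au": "Doe, Jane", "date": "2003"}))
```

A COinS-aware browser tool or link resolver can then find the span by its class name and rebuild the OpenURL from the title attribute.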

Selected Major Standards

  • Dublin Core

  • MODS

  • Collection Description

  • RDA

  • EAD


  • METS

Dublin Core

  • Standard metadata set for describing resources

  • It is flexible

    • Qualified vs. unqualified

    • Can be expressed in HTML, XML, or using RDF

  • Dummying down is a good thing
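To make the "HTML, XML, or RDF" point concrete, this sketch serializes one unqualified Dublin Core record two ways: as HTML `<meta>` tags using the common `DC.` prefix convention, and as an `oai_dc` XML record. The namespace URIs are the published DCMI and OAI ones; the function names and the record values are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)

record = [("title", "Travels in Iceland"), ("creator", "Jane Doe"), ("date", "2003")]


def to_html_meta(pairs):
    # One <meta> tag per element, prefixed with "DC." and tied to the schema link
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for name, value in pairs:
        lines.append(f'<meta name="DC.{name}" content="{escape(value)}">')
    return "\n".join(lines)


def to_oai_dc(pairs):
    # Same elements, now as namespaced XML inside an oai_dc:dc wrapper
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, value in pairs:
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(to_html_meta(record))
print(to_oai_dc(record))
```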

Dublin Core Element Set

Dublin Core Qualifiers

  • Types

    • Element refinements

    • Encoding schemes

  • Examples

    • Description

      • Table of contents, abstract

    • Date

      • Created, valid, available, issued, modified

    • Subject


MODS

  • A “bibliographic element set that may be used for a variety of purposes, and particularly for library applications.”

  • Richer than DC, simpler than MARC

  • Does not assume the use of any specific cataloging code

  • Elements: titleInfo, title, name, namePart, originInfo, etc.
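A minimal sketch of how the MODS elements listed above nest: `title` inside `titleInfo`, `namePart` and `role` inside `name`, `dateIssued` inside `originInfo`. It assumes the MODS v3 namespace `http://www.loc.gov/mods/v3`; the helper names are hypothetical, the title and date come from the example record in this presentation, and the author name is a placeholder.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def m(tag):
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def build_mods(title, author, date_issued):
    root = ET.Element(m("mods"), {"version": "3.1"})
    # <title> must sit inside <titleInfo>
    title_info = ET.SubElement(root, m("titleInfo"))
    ET.SubElement(title_info, m("title")).text = title
    # <name> groups the name parts and the role of the person
    name = ET.SubElement(root, m("name"), {"type": "personal"})
    ET.SubElement(name, m("namePart")).text = author
    role = ET.SubElement(name, m("role"))
    ET.SubElement(role, m("roleTerm"),
                  {"authority": "marcrelator", "type": "text"}).text = "author"
    # Publication dates live in <originInfo>
    origin = ET.SubElement(root, m("originInfo"))
    ET.SubElement(origin, m("dateIssued"), {"encoding": "iso8601"}).text = date_issued
    return root


record = build_mods("A Jewel of Honesty", "Doe, Jane", "19870101")  # placeholder author
print(ET.tostring(record, encoding="unicode"))
```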

MODS Example

<?xml version="1.0" encoding="UTF-8"?>
<mods:mods xmlns:mods="">

<mods:title>A Jewel of Honesty</mods:title>

<mods:abstract>clashing oppositions</mods:abstract>

<mods:subject authority="none">
<mods:topic>General interest</mods:topic>

<mods:relatedItem type="host">

<mods:title>Carnegie Newsletter</mods:title>
<mods:title>Celebration a Spectacle of Hope</mods:title>

<mods:roleTerm authority="marcrelator" type="text">author</mods:roleTerm>

<mods:roleTerm authority="chodarr" type="text">recipient</mods:roleTerm>

<mods:extent unit="pages">

<mods:dateIssued encoding="iso8601">19870101</mods:dateIssued>





DCMI Collection Description

  • Formal description of aggregation or collection of items

  • Can apply to collections where item-level metadata is not available or appropriate, or to collections where it is

  • Sample elements:

    • accrualMethod, accrualPeriodicity

  • Developed as NISO Z39.91

Dublin Core Collection Description Application Profile,

RDA

  • Resource Description and Access, the successor to AACR2

  • Diane Hillmann’s critique

    • Reliance on transcription and specified sources of information

    • Reliance on unstructured notes

    • Multiple versions in one record

    • Full review at

EAD

  • XML schema for encoding archival finding aids

  • Contains elements for all aspects of archival description, from <repository> to <daoloc>

  • <archdesc> is the standard element for describing the hierarchy of fonds, series, subseries, etc.

EAD Example

<head>Summary Description of the Tom Stoppard Papers</head>

<corpname>The University of Texas at Austin
<subarea>Harry Ransom Humanities Research Center</subarea>

<persname source="lcnaf" encodinganalog="100">Stoppard,Tom</persname>

<unittitle encodinganalog="245">Tom Stoppard Papers, </unittitle>
<unitdate type="inclusive">1944-1995</unitdate>
<physdesc encodinganalog="300">
<extent>68 boxes (28 linear feet)</extent>

<unitid type="accession">R4635</unitid>
<physloc audience="internal">14E:SW:6-8</physloc>
<abstract>The papers of British playwright Tom Stoppard (b. 1937) encompass
his entire career and consist of multiple drafts of his plays, from the well-known
<title render="italic">Rosencrantz and Guildenstern Are Dead</title> to several
that were never produced, correspondence, photographs, and posters, as
well as materials from stage, screen, and radio productions from around the



PREMIS

  • Data model

    • Digital objects

    • Intellectual entities

    • Agents

    • Events

    • Rights

    • Relationships

  • Data Dictionary contains examples and sections on compliance and implementation

  • Can be encoded in METS
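The PREMIS entities listed above can be pictured as linked records. The dataclasses below are a deliberately simplified, hypothetical model (class and field names are not the Data Dictionary's semantic units): events and rights attach to a digital object, agents attach to events, and relationships point from one object to another.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    identifier: str
    name: str


@dataclass
class Event:
    identifier: str
    event_type: str                  # e.g. "capture" or "migration"
    date_time: str
    agents: list[Agent] = field(default_factory=list)


@dataclass
class Rights:
    identifier: str
    basis: str                       # e.g. "copyright" or "license"


@dataclass
class DigitalObject:
    identifier: str
    intellectual_entity: str         # the work this object is a copy of
    events: list[Event] = field(default_factory=list)
    rights: list[Rights] = field(default_factory=list)
    related_objects: list[str] = field(default_factory=list)  # relationships


staff = Agent("agent-1", "Digitization staff")
scan = Event("event-1", "capture", "2007-09-19T09:00:00", [staff])
tiff = DigitalObject("obj-1", "thesis-42", events=[scan],
                     rights=[Rights("rights-1", "copyright")],
                     related_objects=["obj-2"])  # e.g. a derived PDF
print(tiff.events[0].agents[0].name)
```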

METSRights

  • Endorsed by METS Board but useful outside of METS documents

  • XML Elements

    • RightsDeclaration

      • RightsHolder

      • Context

        • Permissions

        • Constraint

METS

  • METS: Metadata Encoding & Transmission Standard

  • Encodes descriptive, administrative, and structural metadata in one XML file

  • Preferred data structure for digital library initiatives

  • Goals

    • Manage different types of metadata

    • Migrate resources between repositories

METS Community

  • Maintenance agency is Library of Congress

  • Website


  • Implementation registry

    • Lists 33 projects at 24 institutions

METS Components

  • METS header

  • Descriptive metadata section

  • Administrative metadata section

  • File section

  • Structural map section

  • Structural link section

  • Behavior section

fileSec

  • Lists all files making up the resource

  • <FLocat> points to files

  • <file> elements link to pertinent administrative metadata in <amdSec> through the ADMID attribute, which holds the IDs of the relevant <techMD>, <rightsMD>, <sourceMD>, or <digiprovMD> sections
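Following an ADMID link is a simple ID lookup. This sketch parses a small, hypothetical METS fragment (the IDs `epi01m` and `AMD001` echo the examples in this presentation, but the fragment as a whole is invented) and resolves a file's space-separated ADMID list to the amdSec subsections it names. It assumes the METS namespace `http://www.loc.gov/METS/`.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

METS_XML = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:amdSec>
    <mets:techMD ID="AMD001"/>
    <mets:rightsMD ID="RMD001"/>
  </mets:amdSec>
  <mets:fileSec>
    <mets:fileGrp USE="archive image">
      <mets:file ID="epi01m" MIMETYPE="image/tiff" ADMID="AMD001 RMD001"/>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def admin_sections_for(root, file_id):
    """Return the local tag names of the sections a <file> cites via ADMID."""
    # Index every element that carries an ID attribute
    by_id = {el.get("ID"): el for el in root.iter() if el.get("ID")}
    file_el = root.find(f".//mets:file[@ID='{file_id}']", NS)
    if file_el is None:
        raise KeyError(file_id)
    # ADMID is an IDREFS attribute: a whitespace-separated list of IDs
    refs = file_el.get("ADMID", "").split()
    return [by_id[ref].tag.split("}")[1] for ref in refs]


root = ET.fromstring(METS_XML)
print(admin_sections_for(root, "epi01m"))
```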

fileSec Example

<mets:fileGrp USE="archive image">
<mets:file ID="epi01m" MIMETYPE="image/tiff">
<mets:FLocat xlink:href="
full/01.tif" LOCTYPE="URL"/>

<mets:file> … </mets:file>

<mets:fileGrp USE="reference image">
<mets:file ID="epi01r" MIMETYPE="image/jpeg">

<mets:fileGrp USE="thumbnail image">
<mets:file ID="epi01t" MIMETYPE="image/gif">







structMap

  • The only required section

  • Defines the hierarchical structure of the resource

  • Can be physical or logical

    • Physical structMaps simply list files in order

      • Pages that make up a book

    • Logical structMaps list files in order but in the context of the intellectual structure of the resource

      • Chapters that make up a book

structMap Example

<mets:structMap TYPE="physical">
<mets:div TYPE="book" LABEL="Martial Epigrams II">
<mets:div TYPE="page" LABEL="Blank page">

<mets:div TYPE="page" LABEL="Page ii: Blank page">

<mets:div TYPE="page" LABEL="Page iii: Title page">

<mets:div TYPE="page" LABEL="Page iv: Publication info">

<mets:div TYPE="page" LABEL="Page v: Table of contents">

<mets:div TYPE="page" LABEL="Page vi: Blank page">

<mets:div TYPE="page" LABEL="Page 1: Half title page">

<mets:div TYPE="page" LABEL="Page 2 (Latin)">

<mets:div TYPE="page" LABEL="Page 3 (English)">
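Because a physical structMap simply lists files in order, it can be generated from an ordered list of page labels and file IDs. In this sketch the page labels come from the example above, while the file IDs after `epi01m`, the ORDER attribute usage, and the helper names are illustrative assumptions; `<fptr FILEID=…>` is what ties each page division back to a `<file>` in the fileSec.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def physical_struct_map(book_label, pages):
    """pages: ordered list of (label, file_id) pairs."""
    smap = ET.Element(q("structMap"), {"TYPE": "physical"})
    book = ET.SubElement(smap, q("div"), {"TYPE": "book", "LABEL": book_label})
    for order, (label, file_id) in enumerate(pages, start=1):
        # ORDER records the sequence; <fptr> points at the file in the fileSec
        page = ET.SubElement(book, q("div"),
                             {"TYPE": "page", "LABEL": label, "ORDER": str(order)})
        ET.SubElement(page, q("fptr"), {"FILEID": file_id})
    return smap


smap = physical_struct_map("Martial Epigrams II", [
    ("Blank page", "epi01m"),
    ("Page ii: Blank page", "epi02m"),    # hypothetical file ID
    ("Page iii: Title page", "epi03m"),   # hypothetical file ID
])
print(ET.tostring(smap, encoding="unicode"))
```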





dmdSec

  • Contains descriptive metadata

  • Descriptive metadata can be included or linked externally

  • Descriptive metadata can be in any scheme

  • Can accommodate XML (ex., MODS) or binary (ex., MARC) representations of descriptive metadata

dmdSec Example

<mets:dmdSec ID="DMD1">
<mets:mdWrap MIMETYPE="text/xml" MDTYPE="MODS">

<mods:mods version="3.1">

<mods:name type="personal">

<mods:name type="personal">
<mods:namePart>Ker, Walter C. A. (Walter Charles Alan),









amdSec

  • Contains info on digital resource, files in the resource, or original analogue source

  • Types of info

    • Technical

    • Intellectual property

    • Provenance

amdSec Example

<mets:techMD ID="AMD001">
<mets:mdWrap MIMETYPE="text/xml" MDTYPE="NISOIMG"
LABEL="NISO Img.Data">

NYU Press





METS Header

  • Contains info about the METS document

  • Sample

<metsHdr CREATEDATE="2006-05-09T15:00:00"

<mets:name>Rick Beaubien</mets:name>

<mets:altRecordID TYPE="LCCN">20022838</mets:altRecordID>


structLink

  • Adds hyperlinks between elements in a Structural Map

  • Sample


<mets:smLink xlink:from="LINK7" xlink:to="page1145"

<mets:smLink xlink:from="LINK13" xlink:to="page1145"

<mets:smLink xlink:from="LINK36" xlink:to="page113"

<mets:smLink xlink:from="LINK37" xlink:to="page120"




behaviorSec

  • Associates executable behaviors (i.e., computer code) with parts of a document/object

  • Sample


<mets:behavior ID="disp1" STRUCTID="top" BTYPE="display"

LABEL="Display Behavior">

<mets:interfaceDef LABEL="EAD Display Definition"

LOCTYPE="URL" xlink:href=


<mets:mechanism LABEL="EAD Display Mechanism"

LOCTYPE="URN" xlink:href=




Linking Between Sections

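  • Elements that can point to <dmdSec> (DMDID attribute)

    • <file>, <stream>, <div>

  • Elements that can point to <techMD>, <rightsMD>, <sourceMD>, <digiprovMD> (ADMID attribute)

    • <dmdSec>, <file>, <fileGrp>, <stream>

  • Elements that can point to <file> (FILEID attribute)

    • <fptr>, <area>

  • Elements that can point to <div> (STRUCTID attribute)

    • <behavior>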
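Since all of these links are ID/IDREF pointers, one cheap consistency check is to confirm that every DMDID, ADMID, FILEID, and STRUCTID value resolves to an ID somewhere in the same document. The sample document below is hypothetical and deliberately contains one dangling FILEID.

```python
import xml.etree.ElementTree as ET

REF_ATTRS = ("DMDID", "ADMID", "FILEID", "STRUCTID")


def dangling_refs(root):
    """Return (attribute, id) pairs whose target ID is missing from the document."""
    ids = {el.get("ID") for el in root.iter() if el.get("ID")}
    missing = []
    for el in root.iter():
        for attr in REF_ATTRS:
            # These attributes may hold several space-separated IDs
            for ref in el.get(attr, "").split():
                if ref not in ids:
                    missing.append((attr, ref))
    return missing


doc = ET.fromstring("""<mets xmlns="http://www.loc.gov/METS/">
  <dmdSec ID="DMD1"/>
  <fileSec><fileGrp><file ID="F1"/></fileGrp></fileSec>
  <structMap><div ID="top" DMDID="DMD1"><fptr FILEID="F1"/><fptr FILEID="F2"/></div></structMap>
</mets>""")
print(dangling_refs(doc))
```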

METS Profiles

  • METS is so flexible that it needs to be documented for each particular application or use

  • Components

    • URI

    • Date

    • Abstract

    • Extension schemas

    • Rules of description

    • Vocabularies

    • Structural rules for resources

    • Technical metadata

What is the point of all this?

  • Management of digital resources requires many types of metadata

  • Managing all this metadata can be difficult

  • METS can do it all, but is complex

Functional Requirements

  • What do you expect your metadata to do?

    • The nature of the resources you are putting in your digital collection

    • The nature of the intended audience(s) for your collection

    • The level of description

    • The size of your collection

    • Importance of interoperability

    • The resources your library has for creation and long-term maintenance of the metadata

Nature of Resources

  • Is there full text?

  • Are they “simple” or “complex”?

  • Do you supply multiple versions of the same resource?

  • Are all resources available to all users?

Nature of Users

  • Is your audience general or specialized?

  • How information/network literate are they?

  • How much information will they need to choose appropriate resources?

  • What other assumptions can you safely make about your users, and how do those assumptions impact your metadata planning activities?

Level of Description

  • How much detail do you want to include in your metadata?

  • Related to resources available for creation of metadata, and balance of quantity vs. quality

  • Expensive (e.g., subject) vs. cheap (e.g., file size) descriptive elements

Size of Collection

  • Small collections rely less on metadata than large collections do

  • Browsing, faceting, and differentiating functions are more important in large collections

  • In general, the bigger the collection, the more granular the values in your metadata need to be

    • E.g., subject vocabularies

Importance of Interoperability

  • Metadata in local schemes is more difficult to share than metadata in standard schemes

  • Always assume your metadata will be used in contexts different from the original

  • Plan metadata with crosswalks in mind

Resources for Managing Metadata

  • How will metadata of various types be created and managed?

  • Does your institution have a digital asset management (DAM) strategy?

  • Will preservation metadata (e.g., PREMIS) be managed?

FRBR’s User Tasks

  • Functional requirements can be expressed in terms of the FRBR data model

    • Find entities which correspond to user’s search criteria

    • Identify an entity

    • Select an entity

    • Acquire or obtain access to the desired entity

Analyzing Domains

  • Environmental

  • Object class

  • Object format

Jane Greenberg, “Understanding Metadata and Metadata Schemas.” In Metadata: A Cataloguer’s Primer. Ed. Richard P. Smiraglia. New York: Haworth, 2005.

Metadata Quality

  • Completeness

  • Accuracy

  • Provenance

  • Conformance to expectations

  • Logical consistency and coherence

  • Timeliness

  • Accessibility

Thomas R. Bruce and Diane I. Hillmann, “The Continuum of Metadata Quality: Defining, Expressing, Exploiting.” In Metadata in Practice. Ed. Diane I. Hillmann and Elaine L. Westbrooks. Chicago: American Library Association, 2004.
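Of these criteria, completeness is the easiest to measure mechanically: for each element, what share of records has at least one non-blank value? This sketch assumes records are dictionaries mapping element names to lists of values; the function name and the sample records (including the subject terms) are hypothetical.

```python
from collections import Counter


def element_coverage(records, elements):
    """Percentage of records with at least one non-blank value per element."""
    if not records:
        return {name: 0.0 for name in elements}
    counts = Counter()
    for rec in records:
        for name in elements:
            if any(v.strip() for v in rec.get(name, [])):
                counts[name] += 1
    return {name: round(100 * counts[name] / len(records), 1) for name in elements}


records = [
    {"title": ["Travels in Iceland"], "creator": ["Jane Doe"], "subject": []},
    {"title": ["Chinese Times, April 1st, 1920"], "creator": [], "subject": ["Newspapers"]},
    {"title": ["A Jewel of Honesty"], "creator": ["  "]},   # whitespace-only value
]
print(element_coverage(records, ["title", "creator", "subject"]))
```

Whitespace-only values are counted as missing, since they pass a naive "field is present" check without telling users anything.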


Before Lunch

  • Metadata life cycle

  • Strategies for creation and management

  • Automated metadata creation

  • Supplementation strategies

Metadata Management Life Cycle

Strategies for Creation and Management

  • Depend on complexity and completeness of metadata

  • Common strategies

    • Create and manage simple (single type) metadata in one application

    • Create simple metadata in one app and manage in another

    • Create and manage different types of metadata in multiple apps, and combine for use

PREMIS survey

  • Most common tool was relational databases

  • XML databases or XML files stored with digital objects

  • Flat files or object-relational databases

  • Most respondents were using two or more of these methods

Creation

  • Avoid recreating metadata

  • Metadata can be created

    • At time of resource creation

      • Born digital

      • Digitized

    • After resource creation

  • Primarily a manual task

  • Variety of tools

Raw XML

  • Advantages

    • Provides high level of control

    • Requires simple tools

  • Disadvantages

    • XML makes humans’ heads ache

    • Extremely unforgiving of errors
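The "unforgiving" point is easy to demonstrate: a conforming XML parser must reject a document with a single mismatched tag rather than guess what was meant. This sketch uses Python's standard ElementTree parser; the helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        # err.position holds the (line, column) where parsing stopped
        return False, f"{err} (line {err.position[0]})"


good = "<dc><title>Travels in Iceland</title></dc>"
bad = "<dc><title>Travels in Iceland</Title></dc>"   # tag names are case-sensitive
print(is_well_formed(good))
print(is_well_formed(bad))
```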

Greenstone

  • Open source repository platform from University of Waikato

  • Provides support for several types of metadata and can export METS

  • Provides Java client (Greenstone Librarians’ Interface, a.k.a. GLI) for metadata production

  • Also provides “plugins” for extracting metadata

CONTENTdm

  • Commercial repository platform from OCLC

  • Provides support for several types of metadata and can export XML that can be converted into METS

  • Provides Windows client for production (Acquisition Station)

  • Also provides a web interface for creating metadata and ingesting content

Slide63 l.jpg

  • Open source “collection manager”

  • Product of the National Science Digital Library

  • Features rich metadata management tools

AlouetteCanada

  • Metadata Toolkit will provide local management and access

  • Portal will provide centralized access

  • Best practices documents will support creation and management of metadata and content

AlouetteCanada Metadata Toolkit

  • A content management system for library, archives, and museum collections

  • Will allow staff to create metadata and manage content

  • Scheme support

    • MODS

    • EAD

    • METS

  • Will allow basic digital asset management

DAM in the Toolkit

  • Tools for managing master and derivative versions of files

  • Tools for creating checksums and managing technical metadata

  • Tools for tracking rights

  • Tools for managing administrative metadata
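Checksums and basic technical metadata are cheap to generate automatically at ingest. This sketch uses only the standard library; the dictionary keys are hypothetical (not PREMIS semantic units), and the MIME type is guessed from the file extension alone, whereas tools such as JHOVE or DROID inspect the file contents.

```python
import hashlib
import mimetypes
import os
import tempfile


def technical_metadata(path, algorithm="sha256", chunk_size=1 << 20):
    """Compute a checksum plus size and extension-based MIME type for one file."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # Read in chunks so large TIFF masters need not fit in memory
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    mime, _ = mimetypes.guess_type(path)
    return {
        "filename": os.path.basename(path),
        "size_bytes": os.path.getsize(path),
        "mime_type": mime or "application/octet-stream",
        "checksum_algorithm": algorithm,
        "checksum": digest.hexdigest(),
    }


# Usage with a throwaway file
with tempfile.NamedTemporaryFile(suffix=".tif", delete=False) as tmp:
    tmp.write(b"not really a TIFF")
print(technical_metadata(tmp.name))
os.remove(tmp.name)
```

Storing the algorithm name next to the checksum makes it possible to switch algorithms later without ambiguity.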

AlouetteCanada Portal

  • Aggregates metadata from participating institutions for centralized searching

  • Points back to the Toolkit or whatever else is hosting items

  • Based on the OurOntario Portal

Automated Metadata Creation

  • Technical

    • JHOVE, DROID, digitization hardware

  • Descriptive

    • Born-digital document metadata

  • Subject

    • INFOMINE iVia tools

  • Structural

    • Sequential filename generation

Chinese Times Processing Workflow

  • Line up TIFF images in thumbnail view

  • Create directory with date as name

  • Copy that day’s files into directory

  • Run renamer/metadata creation script

    • Get all files in input dir, create full paths

    • Walk through input file list

    • Rename 1st file -01.tif, 2nd file -03.tif, 3rd file -02.tif, etc.

    • Output directory name and metadata file for CONTENTdm

  • Quality control

    Import directory structure

    Issue-level metadata file for import into CONTENTdm

    Columns: Title, Date, Publisher, Rights, Description, Type, Format, Language, Filename

    Sample row:
      Title: Chinese Times, April 1st, 1920
      Date: 04/01/1920
      Publisher: The Chinese Freemasons Society of Canada
      Rights: Copyright the Chinese Freemasons Society of Canada
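The renamer/metadata step can be sketched as a dry run that plans new filenames and builds one issue-level metadata row. The original script is not shown, so the naming pattern (issue name plus two-digit page number), the `page_order` parameter, the tab-delimited output, and the function names are all assumptions; passing `page_order=[1, 3, 2]` reproduces the slide's example order (1st file -01, 2nd file -03, 3rd file -02).

```python
import os
import tempfile

# Columns of the issue-level CONTENTdm metadata file listed above
COLUMNS = ["Title", "Date", "Publisher", "Rights", "Description",
           "Type", "Format", "Language", "Filename"]


def plan_renames(input_dir, issue_name, page_order=None):
    """Dry run: pair each TIFF's full path with its planned new path.

    page_order[i] is the page number given to the i-th file in sorted order;
    by default pages are numbered sequentially.
    """
    files = sorted(f for f in os.listdir(input_dir) if f.lower().endswith(".tif"))
    paths = [os.path.join(input_dir, f) for f in files]   # full paths
    order = page_order or list(range(1, len(paths) + 1))
    if len(order) != len(paths):
        raise ValueError("page_order needs one page number per file")
    return [(old, os.path.join(input_dir, f"{issue_name}-{page:02d}.tif"))
            for old, page in zip(paths, order)]


def metadata_row(values):
    """One tab-delimited line; columns missing from values are left empty."""
    return "\t".join(values.get(column, "") for column in COLUMNS)


# Usage with three empty stand-in files
with tempfile.TemporaryDirectory() as issue_dir:
    for name in ["scan001.tif", "scan002.tif", "scan003.tif"]:
        open(os.path.join(issue_dir, name), "wb").close()
    for old, new in plan_renames(issue_dir, "19200401", page_order=[1, 3, 2]):
        print(os.path.basename(old), "->", os.path.basename(new))
print(metadata_row({"Title": "Chinese Times, April 1st, 1920", "Date": "04/01/1920"}))
```

Keeping the planning step separate from the actual `os.rename` calls makes the quality-control step easier: the plan can be reviewed before any file is touched.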

    Storage

    • Some file formats enable internal storage of metadata

    • For external storage, relational databases offer most flexibility

      • Complex metadata can be stored in simple structures

      • Can handle hierarchical data

    • Are agnostic to the other phases of the metadata life cycle

    • Not highly scalable for text retrieval

      • External indexers eliminate this problem

    • Can export and import XML, MARC, etc.

    Repurposing

    • Different use of metadata than originally intended

    • Often migrated to or imported into an external system

    • Examples

      • Dumping new items lists from ILS for use in external portal

      • Creating MARC records from vendor spreadsheet (demo)
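The spreadsheet demo is not reproduced here, but the idea can be sketched: read the vendor's CSV and emit brief records in the same "tag indicators _subfield" display style used for the MARC examples in this presentation. The column names (`title`, `author`, `isbn`, `url`), the choice of tags, and the sample row are assumptions; a production workflow would write real MARC (ISO 2709) records with a MARC library instead of display text.

```python
import csv
import io


def field(tag, indicators, *subfields):
    """Format one field in display style, e.g. '245 10 _aTitle'."""
    return f"{tag} {indicators} " + "".join(f"_{code}{value}" for code, value in subfields)


def brief_marc(row):
    """Return display lines for one spreadsheet row (hypothetical column names)."""
    lines = []
    if row.get("isbn"):
        lines.append(field("020", "  ", ("a", row["isbn"])))
    has_author = bool(row.get("author"))
    if has_author:
        lines.append(field("100", "1 ", ("a", row["author"])))
    # 245 first indicator: 1 when a 1XX main entry is present, else 0
    lines.append(field("245", ("1" if has_author else "0") + "0", ("a", row["title"])))
    if row.get("url"):
        # 856 indicators 4 (HTTP) and 0 (the resource itself)
        lines.append(field("856", "40", ("u", row["url"])))
    return lines


vendor_csv = """title,author,isbn,url
Travels in Iceland,"Doe, Jane",,https://example.org/ebooks/1
"""
for row in csv.DictReader(io.StringIO(vendor_csv)):
    print("\n".join(brief_marc(row)))
```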

    Sharing

    • All metadata should be created to be shared

    • May require exporting, crosswalking, supplementation

    • Basic approaches to sharing: metasearching and harvesting

    • Syntax for sharing is easy; semantics for sharing are more difficult

    PKP Metadata Harvester

    • Open source

    • PHP/MySQL

    • Product of the Public Knowledge Project

    • Features

      • Can harvest any metadata format via OAI

      • Flexible plugin and customization features

      • Defines crosswalks between different schemas

    Supplementation Strategies

    • Manually add or update elements

    • Programmatically supplement

    • Add namespaces

    • Virtual supplementation

    Supplementation Examples

    • "on the horse" @ Harvard

    • adding namespaces into DC

    • PKP Metadata Harvester

    • CUFTS

      • cufts2marc

      • Subjects and other fields in MARC records in CUFTS

    • Georgia Tech’s Umlaut link resolver

    Example: Programmatic Supplementation

    • titles.txt (demo)

    • Possible enhancements

      • Harvest complete record and pick out wanted fields

      • Write local MARC record

      • Add heuristics to dedupe and reduce false hits

    Example: Add Namespaces

    Creator: Jane Doe

    Title: Travels in Iceland

    Date: 12/07/2003

    Becomes in OAI-PMH


    <dc:creator>Jane Doe</dc:creator>

    <dc:title>Travels in Iceland</dc:title>
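The "add namespaces" step can be sketched as a small converter from local "Label: value" lines to namespaced Dublin Core inside an `oai_dc:dc` wrapper. The label-to-element map is a hypothetical local scheme; the date is passed through unchanged because the example does not show how (or whether) it is normalized. The namespace URIs are the published DCMI and OAI ones.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)

# Local labels mapped to Dublin Core element names (hypothetical local scheme)
LABEL_TO_DC = {"Creator": "creator", "Title": "title", "Date": "date"}


def local_to_oai_dc(text):
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for line in text.splitlines():
        if ":" not in line:
            continue
        label, value = (part.strip() for part in line.split(":", 1))
        element = LABEL_TO_DC.get(label)
        if element:                      # unmapped labels are skipped
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


local_record = """Creator: Jane Doe
Title: Travels in Iceland
Date: 12/07/2003"""
print(local_to_oai_dc(local_record))
```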



    Example: Virtual Supplementation

    • Georgia Tech’s Umlaut link resolver

      • SFX ERM data

      • ILS Oracle database for holdings info

      • OCLC's xISBN service for related ISBNs

      • Google and Yahoo APIs for open access material

      • OCLC's Resolver Registry to determine additional link resolvers for the user’s IP address

    Ross Singer, posting to NGC4LIB list thread “Link resolvers as loosely coupled systems for holdings?” September 10, 2007

    Before the Afternoon Break

    • SFU thesis workflow case study

    • Native vs. derived metadata

    • Crosswalks

    Workflow Case Study: SFU Electronic Theses

    • Prototyped several ETD services

    • Was developing an institutional repository program

    • Contacted vendors for retro conversion and discovered we could do it ourselves

    • Saw increasing need to process print theses more efficiently

    Goals

    • Digitize and provide access to over 4500 SFU theses described in our catalogue

    • Develop efficient current ETD service

    • Add content to SFU’s institutional repository

    • Provide access through both the catalogue and the IR

    • No intent to stop supporting print theses

    Specifications

    • Digital versions would be for access only; no need seen to create high-quality masters

    • Theses would be available to all users

    • Metadata should be as rich as possible while remaining efficient to create

    Issues

    • Rights Management of retro theses

      • “Fair dealing”

      • Use of PDF’s security features

    • Developing efficient workflows for processing current theses

    • Standardization of descriptive metadata

    • Technical issues

      • Dirty OCR and specialized symbols

      • Challenging source documents

    Workflows

    • Current (December 2004 - )

      • Digitization

      • Metadata

    • Retrospective (1967 – 1997)

      • Digitization

      • Metadata

    Workflow for Current Theses

    • Thesis Assistant provides master list in MS Excel when previous semester’s submissions “closed”

    • Digitization staff scan unbound copies directly into Adobe Acrobat

      • Filenaming scheme: Unique ID assigned manually

    • Systems staff convert metadata

    • Systems staff import into DSpace

    • Systems staff create MARC in batch

    • Tech Services load into library catalogue

    Metadata Workflow for Current (Dec 2004 - ) Theses

    [Diagram] Thesis Assistant’s spreadsheet with temporary thesis ID added (filenames correspond to temp. thesis IDs) → script ("### Main program ###") → DSpace import metadata and packages → DSpace import utility → DSpace map file → script ("### Main program ###") → brief MARC records with an MARC 856 field

    Sample DSpace map file:
      thesisID1 1892/99
      thesisID2 1892/100
      thesisID3 1892/101

    Sample brief MARC record:
      LDR 00747nas 2200157za 4500
      005 20040903164118.1
      006 m d d |
      007 cr u||||||||||
      008 040903||||||||||||||||||||d|||||||||||||
      100 00 _aSmith, Student P.
      245 00 _aThe title: _bcontaining some catchy words
      856 04 _u

    DSpace Import Metadata Example

    <dcvalue element="contributor" qualifier="author">
    Henderson, Brian Charles</dcvalue>
    <dcvalue element="title" qualifier="none">
    Operational effectiveness in cellulose fibers business
    of Weyerhaeuser Company: can the cost trends of 2005
    be reversed?</dcvalue>
    <dcvalue element="date" qualifier="issued">2006</dcvalue>
    <dcvalue element="language" qualifier="iso">en</dcvalue>
    <dcvalue element="rights" qualifier="none">Copyright remains
    with the author</dcvalue>
    <dcvalue element="type" qualifier="none">text</dcvalue>
    <dcvalue element="type" qualifier="none">thesis</dcvalue>
    <dcvalue element="description" qualifier="none">Research
    Project (M.B.A.) - Faculty of Business Administration –
    Simon Fraser University</dcvalue>
    <dcvalue element="description" qualifier="abstract">
    The Cellulose Fibers Business of Weyerhaeuser
    Company [...] </dcvalue>
    <dcvalue element="relation"
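The `<dcvalue>` elements above are the metadata half of a DSpace Simple Archive Format package (a `dublin_core.xml` file inside each item directory). This sketch generates such a file from one spreadsheet row; the column names, the mapping, and the assumption that language, rights, and type values are the same for every thesis are illustrative, not SFU's actual script.

```python
import xml.etree.ElementTree as ET

# (spreadsheet column, DC element, qualifier); hypothetical column names
MAPPING = [
    ("author", "contributor", "author"),
    ("title", "title", "none"),
    ("year", "date", "issued"),
    ("abstract", "description", "abstract"),
]
# Values assumed to be identical for every thesis in the batch
CONSTANTS = [
    ("language", "iso", "en"),
    ("rights", "none", "Copyright remains with the author"),
    ("type", "none", "text"),
    ("type", "none", "thesis"),
]


def dublin_core_xml(row):
    root = ET.Element("dublin_core")
    for column, element, qualifier in MAPPING:
        if row.get(column):              # skip empty spreadsheet cells
            ET.SubElement(root, "dcvalue", {"element": element,
                                            "qualifier": qualifier}).text = row[column]
    for element, qualifier, value in CONSTANTS:
        ET.SubElement(root, "dcvalue", {"element": element,
                                        "qualifier": qualifier}).text = value
    return ET.tostring(root, encoding="unicode")


row = {"author": "Henderson, Brian Charles", "year": "2006",
       "title": "Operational effectiveness in cellulose fibers business of "
                "Weyerhaeuser Company: can the cost trends of 2005 be reversed?"}
print(dublin_core_xml(row))
```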




    MARC Record Example

    LDR 00000nam 2200000Ia 4500
    006 m||||||||d||||||||
    007 cr||n||||||d||
    008 070823s2006||||bcc||||||m||||||||||eng||
    035 _fgb
    040 _aCaBVas

    100 1 _aBuckham, Catherine Anne
    245 10 _aPublic participation in land use planning:
        _bWhat is the role of social capital? /
        _cby Catherine Anne Buckham
    300 _a leaves
    260 _aBurnaby B.C. :
        _bSimon Fraser University,

    500 _aTheses (Urban Studies Program) / Simon Fraser University
    502 _aResearch Project (M.U.S.) - Simon Fraser University, 2006
    520 3 _aThis study examines […]
    810 2 _aSimon Fraser University.
        _tTheses (Urban Studies Program)
    856 41 _u
    966 _c2

    967 _c0

    Workflow for Retro Theses

    • Master production list derived from MARC records in catalogue

      • Filenaming scheme based on ILS bib record number

    • Digitization staff

      • Scan from microfiche and print copies

      • Remove signatures from approval pages manually

      • Create PDFs from page images

    [Flowchart: digitization steps for retrospective theses, courtesy of Ian Song, Digital Initiatives]

    • Check hard drive space

    • Create working directory

    • Scan printed theses: perform batch scanning (please refer to flatbed scanning instructions)

    • Image processing: poor quality vs. good quality

    • PDF conversion

    Metadata Workflow for Retrospective (1966 - 1997) Theses

    [Diagram] MARC records from III (filenames correspond to III .bnumbers) → script ("### Main program ###") → DSpace import metadata and packages → DSpace import utility → DSpace map file → script ("### Main program ###") → brief MARC records containing the .bnumber and an 856 field for overlaying on existing records

    Sample DSpace map file:
      b18721102 1892/204
      b18762105 1892/205
      b14731140 1892/1206

    Sample brief MARC fields:
      035 .b18721102
      856 04 _u
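The last step of the retrospective workflow turns the DSpace map file (one "item directory, handle" pair per line, where the directory name is the III .bnumber) into brief overlay records carrying the .bnumber and an 856 link. This sketch follows the display style of the sample fields; the handle resolver prefix `https://hdl.handle.net/` is an assumption, since the presentation's URL is not shown, and the 856 indicators follow MARC 21 (4 = HTTP, 0 = the resource itself).

```python
HANDLE_PREFIX = "https://hdl.handle.net/"   # assumption: actual URL base not shown


def overlay_records(map_file_text, url_prefix=HANDLE_PREFIX):
    """Yield brief overlay records from 'bnumber handle' lines of a DSpace map file."""
    for line in map_file_text.splitlines():
        if not line.strip():
            continue
        bnumber, handle = line.split()      # raises ValueError on malformed lines
        yield [f"035 .{bnumber}",
               f"856 40 _u{url_prefix}{handle}"]


for record in overlay_records("b18721102 1892/204\nb18762105 1892/205\n"):
    print("\n".join(record))
```

Because the overlay match point is the .bnumber, the ILS load can attach the 856 to the existing bib record rather than creating a duplicate.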


    Interoperability

    • The ability of one system to communicate with another

    • Can exist on various levels

      • Low-level protocols like TCP/IP

      • High-level like metadata

    • Examples relevant to digital repositories

      • Dublin Core within METS document

      • OAI-PMH

    • Syntactic and semantic interoperability

    How Much Interoperability?

    • Will your collection be integrated into / linked to a larger one?

    • How important is internal consistency within your collections?

    • Best practices encourage interoperability

    • (Qualified) Dublin Core is a safe choice

    Crosswalks

    • Mappings for converting one schema to another

    • DC to MARC, DC to MODS, MARC to MODS, etc.

    • Promote reuse and interoperability

    • Sample list

    Lossy and Lossless Crosswalks

    • Lossy: crosswalk removes granularity

    • Lossless: no loss of granularity

    • Dummy down vs. smarten up

    • Acid test: round trip a data set
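The round-trip acid test can be sketched with a "dumb-down" crosswalk from qualified to unqualified Dublin Core. Refinements such as `created` or `abstract` are dropped on the way out, so the reverse mapping can only restore the element names and the round trip fails for any record that used a refinement. The record and function names are hypothetical; refinements are modeled as `(element, refinement)` pairs with `None` meaning unqualified.

```python
# Qualified DC (element, refinement) -> unqualified element: a lossy crosswalk
def dumb_down(qualified):
    return [(element, value) for (element, _refinement), value in qualified]


# The reverse can only guess: every value comes back without a refinement
def smarten_up(unqualified):
    return [((element, None), value) for element, value in unqualified]


def round_trip_is_lossless(records, forward, backward):
    return all(backward(forward(rec)) == rec for rec in records)


record = [
    (("date", "created"), "2003"),
    (("date", "modified"), "2004"),
    (("description", "abstract"), "An account of a journey."),
    (("title", None), "Travels in Iceland"),
]
print(dumb_down(record))
print(round_trip_is_lossless([record], dumb_down, smarten_up))
```

Running the same check over a whole data set shows exactly which records lose granularity, which is more useful than a yes/no answer for the crosswalk as a whole.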

    Native vs. Derived Metadata

    • Moving metadata from one container to another

    • Crosswalks document correspondences

    • Deriving metadata is part of sharing and reuse

    Example: Alouette Metadata Toolkit

    • Metadata is stored internally in a relational database and as raw XML files

      • Columns: element ID, element ID relation, info object ID, culture, element, schema, value

      • Attributes are also stored as rows in the same table

    • It is exported as METS and EAD files
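The one-row-per-value design can be sketched with SQLite: each element value is a row, and an attribute is another row that points at its element through the "element ID relation" column. The column names below follow the slide's list, but the exact names and types, the leading "@" marking attribute rows, the object ID 42, and the sample subject values are all assumptions; the export function rebuilds simple XML from those rows.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE metadata (
    element_id INTEGER PRIMARY KEY,
    element_id_relation INTEGER REFERENCES metadata(element_id),  -- parent row
    info_object_id INTEGER NOT NULL,
    culture TEXT,            -- language of the value, e.g. 'en' or 'fr'
    element TEXT NOT NULL,   -- attribute rows start with '@' (assumption)
    metadata_schema TEXT NOT NULL,
    value TEXT)""")
conn.executemany("INSERT INTO metadata VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, None, 42, "en", "subject", "dc", "Newspapers"),
    (2, 1, 42, "en", "@scheme", "dc", "LCSH"),
    (3, None, 42, "fr", "subject", "dc", "Journaux"),
])


def export_xml(conn, object_id):
    """Rebuild XML elements for one object from element and attribute rows."""
    root = ET.Element("record")
    top_level = conn.execute(
        "SELECT element_id, culture, element, value FROM metadata "
        "WHERE info_object_id = ? AND element_id_relation IS NULL "
        "ORDER BY element_id", (object_id,)).fetchall()
    for element_id, culture, name, value in top_level:
        el = ET.SubElement(root, name, {"xml:lang": culture} if culture else {})
        # Attribute rows point at their element through element_id_relation
        for attr, attr_value in conn.execute(
                "SELECT element, value FROM metadata "
                "WHERE element_id_relation = ? ORDER BY element_id", (element_id,)):
            el.set(attr.lstrip("@"), attr_value)
        el.text = value
    return ET.tostring(root, encoding="unicode")


print(export_xml(conn, 42))
```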



    Before End of Day

    • Application Profiles

    • OAI-PMH

    Application Profiles

    • A set of metadata elements, policies, and guidelines defined for a particular community or implementation

    • Obligation, legal qualifiers and values, best practice

    • CEN (European Committee for Standardization) CWA 14855

    • Examples

      • CanCore

      • DCMI Library Application Profile

      • DCMI Education Application Profile

      • OhioLINK Digital Media Center (DMC) Metadata Application Profile
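Obligations and legal values in an application profile can be checked mechanically. The profile below is a hypothetical example (not CanCore or any DCMI profile) that records, per element, an obligation, whether the element is repeatable, and an optional controlled vocabulary; records map element names to lists of values.

```python
PROFILE = {
    # element: (obligation, repeatable, allowed values or None)
    "title": ("mandatory", False, None),
    "creator": ("optional", True, None),
    "type": ("mandatory", True, {"text", "image", "sound"}),
    "language": ("recommended", False, None),
}


def validate(record, profile=PROFILE):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for element in record:
        if element not in profile:
            problems.append(f"{element}: not in profile")
    for element, (obligation, repeatable, vocab) in profile.items():
        values = record.get(element, [])
        # "recommended" elements are reported nowhere; only mandatory ones are enforced
        if obligation == "mandatory" and not values:
            problems.append(f"{element}: missing mandatory element")
        if not repeatable and len(values) > 1:
            problems.append(f"{element}: not repeatable")
        if vocab is not None:
            problems += [f"{element}: '{v}' not in vocabulary"
                         for v in values if v not in vocab]
    return problems


print(validate({"title": ["A", "B"], "type": ["thesis"], "format": ["pdf"]}))
```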

    Why Are Profiles Necessary?

    • Among 82 OAI data providers, 71% used only 5 elements (creator, identifier, title, date, and type)

    • 54% of providers used only creator and identifier for over half their records

    Jewel Ward, “Unqualified Dublin Core Usage in OAI-PMH Data Providers.” OCLC Systems & Services 20.1 (2004): 40-47.

    Invent or Borrow?

    • Avoid inventing; borrow instead

    • Overhead of maintaining your own schema

    • Is your material so special?

    • Borrow properties (fields, elements), put effort into values

    • Document and give back your application profile

    OAI

    • OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting

    • Harvesting, not resource discovery

    • Uses standard Web technologies: HTTP requests and XML responses

    OAI-PMH Model

    • Data providers expose metadata

    • Service providers harvest metadata and do something useful with it

    • Requests from service providers to data providers are expressed as verbs

    Examples of Verbs

    • verb=ListSets

    • verb=ListRecords&metadataPrefix=oai_dc&set=cartoons

    • verb=ListRecords&metadataPrefix=oai_dc
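The verbs are sent as ordinary HTTP query strings. The sketch below builds request URLs and rejects missing or unexpected arguments, following our reading of the OAI-PMH 2.0 argument rules (for example, `ListRecords` requires `metadataPrefix` unless a `resumptionToken` is given). The base URL is hypothetical, and arguments are passed as a dict because `from` is a reserved word in Python.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# verb -> (required arguments, optional arguments), per our reading of OAI-PMH 2.0.
VERBS = {
    "Identify": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListSets": (set(), set()),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
}
# Verbs whose responses can be continued with an (exclusive) resumptionToken.
RESUMABLE = {"ListSets", "ListIdentifiers", "ListRecords"}


def oai_request(base_url: str, verb: str, args: Optional[Dict[str, str]] = None) -> str:
    """Build an OAI-PMH GET request URL, checking the arguments for the verb."""
    args = dict(args or {})
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    given = set(args)
    if "resumptionToken" in given:
        if verb not in RESUMABLE or len(given) > 1:
            raise ValueError("resumptionToken must be the only argument")
    else:
        required, optional = VERBS[verb]
        if required - given:
            raise ValueError(f"{verb} requires {sorted(required - given)}")
        if given - required - optional:
            raise ValueError(f"{verb} does not accept {sorted(given - required - optional)}")
    query = urlencode({"verb": verb, **args})
    return f"{base_url}?{query}"


BASE = "http://repository.example.org/oai"  # hypothetical data provider
print(oai_request(BASE, "ListSets"))
# http://repository.example.org/oai?verb=ListSets
print(oai_request(BASE, "ListRecords", {"metadataPrefix": "oai_dc", "set": "cartoons"}))
# http://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&set=cartoons
```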




    Selective harvesting

    • Sets

      • Used for grouping items

      • May be flat or hierarchical

        • province:british+columbia

        • Type:Reports

    • Datestamps

      • Uses Coordinated Universal Time

      • “from” and “until” arguments

        • verb=…from=2003-01-15T00:00:00Z
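Both kinds of selective harvesting can be sketched as a filter over a repository's records. This assumes day-granularity datestamps (YYYY-MM-DD, UTC), inclusive `from`/`until` bounds, and, as we read the OAI-PMH 2.0 specification, that requesting a set also returns items in its subsets (so `province` matches `province:british+columbia`). The record structure and identifiers are hypothetical, and `from_` carries an underscore because `from` is a Python keyword.

```python
from datetime import date
from typing import Iterable, List, Optional


def in_set(record_sets: Iterable[str], requested: str) -> bool:
    """True if the record is in `requested` or in one of its subsets (colon = hierarchy)."""
    return any(s == requested or s.startswith(requested + ":") for s in record_sets)


def selective_harvest(records: List[dict], set_spec: Optional[str] = None,
                      from_: Optional[str] = None, until: Optional[str] = None) -> List[str]:
    """Identifiers of records matching an optional set and an inclusive date range."""
    lo = date.fromisoformat(from_) if from_ else date.min
    hi = date.fromisoformat(until) if until else date.max
    selected = []
    for record in records:
        stamp = date.fromisoformat(record["datestamp"])  # UTC, day granularity
        if not lo <= stamp <= hi:
            continue
        if set_spec is not None and not in_set(record["sets"], set_spec):
            continue
        selected.append(record["identifier"])
    return selected


records = [  # hypothetical repository contents
    {"identifier": "oai:example:1", "datestamp": "2003-01-10", "sets": ["province:british+columbia"]},
    {"identifier": "oai:example:2", "datestamp": "2003-01-15", "sets": ["province:ontario", "Type:Reports"]},
    {"identifier": "oai:example:3", "datestamp": "2003-02-01", "sets": ["province:british+columbia"]},
]
print(selective_harvest(records, from_="2003-01-15"))
# ['oai:example:2', 'oai:example:3']
print(selective_harvest(records, set_spec="province", until="2003-01-31"))
# ['oai:example:1', 'oai:example:2']
```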

    Harvest, Store, Repurpose

    [Diagram: several OAI repositories (“OAI Rep”) feed a Harvester / Aggregator / Data store, which repurposes the metadata into outputs such as “New this week” and “Some other”]
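The harvest, store, repurpose pattern can be sketched as an in-memory aggregator. Everything here is hypothetical: records are keyed by (repository, OAI identifier) so that re-harvested records replace older copies, and the "New this week" view treats the OAI datestamp as the date a record appeared. That is only an approximation, because datestamps also change when a record is modified.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


class Aggregator:
    """Minimal in-memory data store for harvested records (hypothetical design)."""

    def __init__(self) -> None:
        self.store: Dict[Tuple[str, str], dict] = {}

    def harvest(self, repository: str, records: List[dict]) -> None:
        # Key by (repository, OAI identifier): a re-harvested record replaces the old copy.
        for record in records:
            self.store[(repository, record["identifier"])] = record

    def new_this_week(self, today: date) -> List[Tuple[str, str]]:
        # Repurpose: records whose datestamp falls within the last seven days.
        cutoff = today - timedelta(days=7)
        return sorted(key for key, record in self.store.items()
                      if date.fromisoformat(record["datestamp"]) > cutoff)


agg = Aggregator()
agg.harvest("repo-a", [{"identifier": "oai:a:1", "datestamp": "2007-09-17"},
                       {"identifier": "oai:a:2", "datestamp": "2007-08-01"}])
agg.harvest("repo-b", [{"identifier": "oai:b:9", "datestamp": "2007-09-14"}])
print(agg.new_this_week(date(2007, 9, 19)))
# [('repo-a', 'oai:a:1'), ('repo-b', 'oai:b:9')]
```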

    Metadata Sharing Case Study: CARL Harvester and CARLCore AP

    • “Canadian Association of Research Libraries / Association des bibliothèques de recherche du Canada's Institutional Repository Metadata Harvester”


    • Launched June 2004

    • Now contains 35,000+ records

    • Primarily a search engine for the harvested metadata

    • Uses the PKP Metadata Harvester software

    Repositories

    • Archimede (Université Laval)

    • Collection mémoires et thèses de l'Université Laval

    • [email protected]

    • eCommons::Research (University of Winnipeg)

    • Mspace (University of Manitoba)

    • Ozone (Ontario Scholars Portal)

    • Papyrus - Dépôt institutionnel numérique (Université de Montréal)

    • Simon Fraser University Institutional Repository

    • T-Space (University of Toronto)

    • University of Saskatchewan Electronic Theses & Dissertations

    • University of Waterloo Electronic Theses



    The Problem

    • Increased dissatisfaction with search capabilities

    The Solutions

    • Improvements to the software

    • Development of an application profile

    Goals

    • Develop a profile that

      • Improves quality of aggregated metadata

      • Is practical

      • Is voluntary

    • Benefits include

      • Better centralized services

      • Streamlined local practices

      • Guidance for new repositories

    Working Group

    • Mark Jordan (SFU), Chair

    • Sam Kalb (Queen’s)

    • Lynne McAvoy (CISTI)

    • Lisa O’Hara (Manitoba)

    • Sharon Rankin (McGill)

    • Kathleen Shearer (CARL)

    • Nancy Stuart (Victoria)

    Process

    • Analyze the metadata (from June 2005)

    • Develop use cases and functional requirements

    • Survey other application profiles

      • ePrints UK “Using Simple Dublin Core to Describe Eprints”

      • “ARROW Discovery Service Harvesting Guide”

    Timeline (past)

    • October 2004: Proposal to develop AP

    • April 2005: Formation of mailing list

    • September 2005: Meeting in Ottawa

    • March 2006: Formation of AP working group

    • June 2006: Meeting in Québec

    • October 2006: CARLCore Level 1 available for comment

    Timeline (future)

    • November 10, 2006: Deadline for comments

    • January 31, 2007: Final release

      • IR platform-specific implementation guidelines

      • French translation

    • Ongoing: CARLCore Level 2

    CARLCore AP

    • The document is a standard application profile, containing:

      • Rationale

      • General principles and recommendations

      • Entries for each unqualified Dublin Core (uDC) element

      • Appendices

        • Implementation guidelines

        • Sample records

        • CARLCore and the CARL Harvester

    CARLCore Level 1

    • Uses only unqualified Dublin Core

    • Goal is to make use of the DC elements in OAI as consistent as possible

    • From the “Principles”:

      CARLCore Level 1 parallels the Dublin Core Metadata Element Set in order to supply the richest and most consistent metadata possible within the minimum requirements of the Open Archives Initiative Protocol for Metadata Harvesting.

    Sample Elements

    • Identifier

    • Source

    • Type

    Handling local variations

    • Top-down approach

      • Dictate shared vocabulary

    • Bottom-up approach

      • Provide a solution that accommodates both local and centralized needs

    “Type map” solution

    • Harvester uses a “map file” to convert local type values into shared vocabulary

    • Simple XML format

    • Each repository administrator maintains the map file

    • The end result is that local type values are normalized as the metadata is harvested

    [Diagram: a local repository using the type value “dissertation”]


    <mappings>
      <mapping from=" " to="Actes de conférence / Conference Proceedings" />
      <mapping from=" " to="Article" />
      <mapping from=" " to="Audio" />
      <mapping from=" " to="Carte, plan / Map, plan" />
      <mapping from=" " to="Chapitre de livre / Book chapter" />
      <mapping from=" " to="Communication, présentation / Paper, Presentation" />
      <mapping from=" " to="Ensemble de données / Dataset" />
      <mapping from=" " to="Image" />
      <mapping from=" " to="Livre / Book" />
      <mapping from=" " to="Logiciel / Software" />
      <mapping from=" " to="Mémoire de maîtrise / Master's thesis" />
      <mapping from=" " to="Objet d'apprentissage / Learning Object" />
      <mapping from=" " to="Partition musicale / Musical Score" />
      <mapping from=" " to="Pré-publication / Preprint" />
      <mapping from=" " to="Rapport / Report" />
      <mapping from=" " to="Thèse de doctorat / Doctoral dissertation" />
      <mapping from=" " to="Vidéo / Video" />
      <mapping from=" " to="Autre / Other" />
    </mappings>
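A harvester applying such a map file might look like the sketch below. The filled-in `from` values are invented local labels, entries with a blank `from` (as in the template above) are skipped, and unmapped values are passed through unchanged. How the PKP Harvester actually matches and applies the file may differ.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical filled-in map file; the "from" values are invented local labels.
MAP_XML = """<mappings>
  <mapping from="dissertation" to="Thèse de doctorat / Doctoral dissertation" />
  <mapping from="masters thesis" to="Mémoire de maîtrise / Master's thesis" />
  <mapping from="tech report" to="Rapport / Report" />
  <mapping from=" " to="Article" />
</mappings>"""


def load_type_map(xml_text: str) -> Dict[str, str]:
    """Read <mapping from=... to=...> pairs, skipping entries with a blank "from"."""
    root = ET.fromstring(xml_text)
    return {m.get("from").strip(): m.get("to")
            for m in root.iter("mapping") if m.get("from", "").strip()}


def normalize_types(local_types: List[str], type_map: Dict[str, str]) -> List[str]:
    """Replace mapped local dc:type values; pass unmapped ones through unchanged."""
    return [type_map.get(value.strip(), value) for value in local_types]


type_map = load_type_map(MAP_XML)
print(normalize_types(["dissertation", "poster"], type_map))
# ['Thèse de doctorat / Doctoral dissertation', 'poster']
```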


    CARLCore Level 2

    • Will add elements to CARLCore Level 1

    • One current goal is to provide faceted browsing by discipline

      • Using OAI sets?

      • Using one or more non-uDC elements?

    • May focus on disciplinary archives

    • Other features leading to “added value” for users

    Implementation Issues

    • Legacy metadata

    • Conflicts with local IR metadata practice

    • Inflexible OAI gateways in IR platforms

    • Lack of tools to test compliance

    • Yes, using CARLCore is optional… but there is strength in numbers

    CARLCore To Do List

    • Take advantage of PKP Harvester’s data normalization features

    • CARLCore Level 2

    • Stay current with (and collaborate with) IR platforms

    Summary

    • Metadata requirements for repositories drive decisions

    • Do not reinvent the wheel — instead, adopt or develop an application profile

    • Metadata must be managed

    • Tools should not define your ability to manage your metadata

    • Metadata can be shared

    Recommended Online Reading

    • METS Primer and Reference Manual.

    • DCMI Proceedings.

    • Understanding Metadata. NISO, 2004.

    Recommended Print Reading

    • Library Technology Reports: Metadata and Its Applications. Ed. Brad Eden. 41.6: November-December 2005.

    • Metadata: A Cataloger’s Primer. Ed. Richard P. Smiraglia. New York: Haworth. 2005.

    • Metadata in Practice. Ed. Diane I. Hillmann and Elaine L. Westbrooks. Chicago: American Library Association, 2004.