Best Practices in Developing a Digital Library Repository Using Fedora at the UVa Library

1. Best Practices in Developing a Digital Library Repository Using Fedora at the UVa Library Leslie Johnston Head of Digital Access Services, University of Virginia Library Fedora Users� Meeting, Open Repositories 2007

2. Underlying Object Architecture Must be First The object architecture guides all repository development steps. Based on core architectural and service assumptions, such as these at UVa: We have complex objects with many relationships. We have simple objects that users just want to find and use. Any given resource can be associated with and presented in any number of contexts. The Repository will have a public interface to support discovery and use of the collections by the UVa community, accompanied by tools for their use in instruction and research.

3. Underlying UVa Object Architecture At UVA, an atomic or granular approach: Media Objects (art and architecture images, page images, video, audio) are the most granular, and are always children of work objects. The datastreams are metadata and media files Work Objects represent a logical object or concept (a visual resources concept of site or work, a text volume, an episode of a television news show, an oral history interview). Work Objects provide the principal mode of discovery for Media Objects. Work Objects are not required to have Media Object children. The datastreams are metadata and inline content, such as TEI transcriptions and EAD finding aids A Work Object can be a child of an Aggregation Object, representing a real or virtual collection, a serial publication, a television news series, etc. The datastreams are metadata for the aggregation set and rules specific to the set used to present all or subsets of its members.

4. Underlying UVa Object Architecture Many relationships are possible: Media Objects can be the children of multiple Work Objects. A drawing of a building in a letter can be part of a visual resources object representing the structure, as well as the child of a text object for the letter). Work Objects may be part of many Aggregation Objects. A painting may be part of a collection of work by that one artist, and part of a virtual collection representing a particular subject theme. Aggregation Objects may be the children of other Aggregation Objects, where appropriate.

5. Disseminators in a Granular Architecture Media Objects have disseminators that primarily provide access to the datastreams. Examples: The uvaDefault disseminator includes a getDefaultContent behavior which gets whichever datastream is considered the default view for a user. The uvaImage disseminator includes the behaviors getPreview and getScreen, which simply get specific datastreams. It also includes a getImageViewer behavior which gets an image datastream and wraps it in an applet that allows manipulation of the image. The uvaDefault and uvaMeta disseminators include behaviors that return xml (get) or styled html versions (view) of descriptive or administrative metadata, or Dublin Core.

6. Disseminators in a Granular Architecture Work Objects have disseminators that provide access to the work and its media children. Examples: The default disseminator includes the getFullView behavior which returns a styled presentation of a text and its child page images, if there are any. The uvaPageBook disseminator includes the getPageTurner behavior which gets the child images and presents them for navigation in a page turner. The uvaDefault and uvaMeta disseminators include behaviors that return xml (get) or styled html versions (view) of descriptive or administrative metadata, or Dublin Core.

7. Disseminators in a Granular Architecture Aggregation Objects primarily have disseminators that provide access to a set of Work Objects. Examples: getCollection presents the default view of the collection. getMembers returns a list of members � all or only those which meet criteria expressed through an XPath statement. getPids returns a list of all PIDS for member Work Objects. viewDublin Core presents styled Dublin Core data for the aggregation.

8. Content Model Development Not surprisingly, the first step is the identification of functional requirements: Coordinated in a format-by-format basis involving research and development staff, repository operations staff, and public services staff who are specific format experts. Identify all functional requirements for end-user use, including default delivery options, interaction with the objects through the web interface, and downloading possibilities.

9. Content Model Development Inventorying of digital media: Detailed review of the legacy collections as to media formats, configurations of media files, available metadata, and metadata format.

10. Content Model Development Development of production standards: Production standards are developed out of the inventory of legacy materials and metadata and the wish list for functional requirements, where we identify the appropriate formats and configuration of files to enable the desired functionality. The standards must jibe with what we have, what we can migrate legacy collections to, and generate in a scaleable way in all future production. http://www.lib.virginia.edu/digital/reports/uvalib_production_standards.htm http://www.lib.virginia.edu/digital/metadata/

11. Content Model Development Translation of the output of all those processes into content models and disseminators: The key to developing content models is limiting the number of content models you will have for each format class, such as images or electronic texts or video. Multiple content models for each format class are often needed. Each content model for each format class corresponds to a necessary variation of the configuration of media files, which require different mechanisms for the behaviors and therefore constitute a different content model. All content models have a default disseminator, and metadata disseminator, and an asset definition disseminator, plus disseminators that are specific to the media type.

12. Content Model Development Translation of functional specifications into content models and disseminators: Each functional requirement for the end-user interface translates to one or more disseminators with behaviors that objects should be able to present, such as delivering subsets of their content or metadata, delivery of static files or on-the-fly transformations (such as raw XML delivery versus delivery of XHTML generated from the same XML file), or supporting the download of image files. Great care was needed to confirm that the desired functionality for the search and delivery in the public interface matched behaviors in the content models, and that the correct number and types of media file and metadata datastreams were present to support those behaviors.

13. UVa Content Models Media Objects: Bitonal 1 Bitonal TIFF datastream 1 static JPEG thumbnail datastream HighRes 1 MRSID datastream 1 static JPEG screen size datastream 1 static JPEG thumbnail datastream LowRes 1 JPEG screen size datastream 1 JPEG thumbnail datastream Work Objects: GDMS (image metadata for works) EAD (metadata describing archival collection) GenText (TEI transcription only) Book (TEI transcription and page images) PageBook (page images only) Aggregation Objects � Still under review for final content models http://www.lib.virginia.edu/digital/reports/content_models.htm

14. Supporting it all: Production Workflows Translation of the production standards and configuration of content models into scaleable, sustainable workflows. Coordinated by research and development staff, repository operations staff, and production staff.

15. Production Workflows Subject librarians select all content, with the additional step of a technical assessment. Assessment includes review of media formats and metadata against production standards. After the assessment, new production and legacy collection migration are prioritized. Ease of production, e.g., existing workflows in place Need for metadata enrichment or normalization Time constraints, such as instructional deadlines Funding availability for staff and infrastructure Workflows are in place for Images, Electronic Texts, and EAD Finding Aids. This includes all steps for handoffs of materials for digitization, digitization, cataloging and export of cataloging records from native systems to XML, programmatic creation of administrative and technical metadata, QA, handoff to repository operations, object building and ingest, and indexing for delivery.

18. Ongoing Process Review and revision of what we�ve done � architecture, standards, and disseminators. Don�t be afraid to assess and update. Still adding new format types � video, datasets, and printed music are under way, with audio and GIS next.

Best Practices in Developing a Digital Library Repository Using Fedora at the UVa Library

Best Practices in Developing a Digital Library Repository Using Fedora at the UVa Library

Presentation Transcript

Developing Digital Literacy at Your Library

Developing a Digital Library for the Humanities

Best Practices with Enterprise Library

A Digital Library Repository Utilizing the Open Archives Initiative

Best Practices for Library Publishing: the Library Publishing Toolkit

Digital Library Technologies at the Grainger Library

Fedora Repository

Using the Library

Fedora Commons Educational Digital Library Projects

The digital library

Adventures in Digital Asset Management: Fedora at the National Library of Wales

Fedora TM and Repository Implementation at UVa

The Fedora Digital Repository Project and the National Science Digital Library (NSDL)

Using OCMS Digital Library

UVA Digital Library Assumptions:

DEVELOPING AN INSTITUTIONAL REPOSITORY USING GREENSTONE DIGITAL LIBRARY (GSDL) SOFTWARE

Fedora at the National Library of Scotland

Digital Library Network at Developing Countries Case Study: The Indonesian Digital Library Network

Indiana Library Federation Best Practices for Supervised Visits in the Library

USING THE LIBRARY

DEVELOPING AN INSTITUTIONAL REPOSITORY USING GREENSTONE DIGITAL LIBRARY (GSDL) SOFTWARE