Digital preservation tools for repository managers

A practical course in five parts

Revision with Steve Hitchcock

By Chris Blakeley

A rapid recap of tools from the KeepIt course: what they do, what they look like, what we did with them



Tools Module 1

  • The Data Asset Framework (DAF), Sarah Jones, University of Glasgow, and Harry Gibbs, University of Southampton

  • The AIDA toolkit: Assessing Institutional Digital Assets, Ed Pinsent, University of London Computer Centre



Themes addressed in DAF surveys

  • Data: type / format, volume, description, creator, funder

  • Creation: policy, naming, versioning, metadata & documentation

  • Management: storage, backup, roles and responsibilities, planning

  • Access: restrictions, rights, security, frequency, ease of retrieval, publish

  • Sharing: collaborators, requirements to share, methods, concerns

  • Preservation: selection / retention, repository services, obsolescence

  • Gaps / needs: services, advice, support, infrastructure



The methodology

http://www.data-audit.eu/DAF_Methodology.pdf



How would you scope:

1) the range of data being created at your institution?

2) user expectations / requirements on the repository to help manage and preserve those data?

  • What would you want to find out?

    • What would your key questions be?

  • How would you go about collecting information?

  • How would you ensure participation?



Relevance to this Course

  • AIDA can…

    • Measure your ability to manage digital content effectively

    • Show how good you are at sustaining continued access

    • Be directly relevant to managing a repository (access, sharing, and usage)

    • Help you find out where you are

    • Help you decide what to do next



Exercise

  • Divide into four teams

  • Choose one element from each leg, relating to one activity

  • Agree on the scope of what you will assess: work on a single institution (real or imaginary)

  • Assess the capacity for this activity

  • Expected results:

    • A score for the element in each leg and at each level (6 scores in all)

    • Explain why you arrived at that decision

    • Roles / job titles of people consulted

    • Outline evidentiary sources that might help
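The expected results lend themselves to a simple scoring grid. The sketch below is a minimal illustration, assuming that AIDA's three legs are Organisation, Technology and Resources, that the two levels are Institution and Department, and that each score is a stage from 1 to 5; `Assessment`, `score_grid` and the constant names are hypothetical, not part of the toolkit.

```python
from dataclasses import dataclass, field

# Assumed AIDA structure: three legs, assessed at two levels,
# each scored as a stage from 1 to 5. Check against the toolkit itself.
LEGS = ("Organisation", "Technology", "Resources")
LEVELS = ("Institution", "Department")
STAGES = range(1, 6)


@dataclass
class Assessment:
    leg: str
    level: str
    score: int
    rationale: str                                       # why the team chose this score
    consulted: list[str] = field(default_factory=list)   # roles / job titles consulted
    evidence: list[str] = field(default_factory=list)    # evidentiary sources


def score_grid(assessments: list[Assessment]) -> dict[tuple[str, str], int]:
    """Check that every leg/level pair has exactly one valid score."""
    grid: dict[tuple[str, str], int] = {}
    for a in assessments:
        if a.leg not in LEGS or a.level not in LEVELS:
            raise ValueError(f"unknown leg or level: {a.leg}/{a.level}")
        if a.score not in STAGES:
            raise ValueError(f"score out of range: {a.score}")
        key = (a.leg, a.level)
        if key in grid:
            raise ValueError(f"duplicate score for {key}")
        grid[key] = a.score
    missing = [(leg, lvl) for leg in LEGS for lvl in LEVELS if (leg, lvl) not in grid]
    if missing:
        raise ValueError(f"missing scores: {missing}")
    return grid  # 3 legs x 2 levels = 6 scores


if __name__ == "__main__":
    team = [Assessment(leg, lvl, 2, "Policy drafted but not adopted",
                       ["Repository manager"], ["Draft policy"])
            for leg in LEGS for lvl in LEVELS]
    print(len(score_grid(team)))  # 6
```

Rejecting missing or duplicate entries keeps each team's sheet at exactly six scores, one per leg and level.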



Tools Module 2

  • Keeping Research Data Safe (KRDS), Costs, Policy, and Benefits in Long-term Digital Preservation, Neil Beagrie, Charles Beagrie Ltd consultancy

  • LIFE3: Predicting Long Term Preservation Costs, Brian Hole, The British Library



What was Produced?

  • A cost framework consisting of:

    • activity model in 3 parts: pre-archive, archive, support services

    • Key cost variables divided into economic adjustments and service adjustments

    • Resources template based on the Transparent Approach to Costing (TRAC)

  • 4 detailed case studies (ADS, Cambridge, KCL, Southampton)

  • Data from other services.
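To make the shape of the cost framework concrete, here is a minimal sketch that rolls activity costs up into the three parts of the activity model. It assumes each adjustment can be expressed as a simple multiplier (one economic factor for all costs, one service factor per activity); `phase_totals` and all figures are hypothetical, and the real KRDS cost variables are richer than this.

```python
from collections import defaultdict

# Three-part activity model from the KRDS cost framework.
PHASES = ("pre-archive", "archive", "support services")


def phase_totals(activities, economic_factor=1.0):
    """Roll adjusted activity costs up into the three activity-model phases.

    activities: iterable of (phase, base_cost, service_factor) tuples, where
    service_factor is a hypothetical per-activity service adjustment and
    economic_factor a hypothetical adjustment applied to every cost.
    """
    totals = defaultdict(float)
    for phase, base_cost, service_factor in activities:
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        totals[phase] += base_cost * service_factor * economic_factor
    # Report every phase, including those with no costed activities.
    return {phase: totals[phase] for phase in PHASES}


# Illustrative figures only, not KRDS data.
example = [
    ("pre-archive", 100.0, 1.0),
    ("archive", 200.0, 1.5),
    ("archive", 50.0, 1.0),
]
print(phase_totals(example, economic_factor=1.25))
# {'pre-archive': 125.0, 'archive': 437.5, 'support services': 0.0}
```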



Benefits Framework



Group Exercise

  • Agree a spokesperson and “recorder”

  • Using KRDS2 Benefits Taxonomy:

    • Q1: Identify which benefits can be costed

    • Q2: Select 3 key benefits (include costed and uncosted)

    • Q3: Identify the information you might need to measure them

  • Report back at 12.10!


LIFE3: Estimating preservation costs

[Diagram: Content Profile and Organisational Profile (Context) → Cost Estimation Tool → Predicted Lifecycle Cost]

  • The LIFE3 Project:

    • Aim: To develop the ability to estimate preservation costs across the digital lifecycle

    • The Project is developing:

      • A series of costing models for each stage and element of the digital lifecycle

      • An easy to use costing tool

      • Support to enable easy input of data

      • Integration to facilitate use of the results


LIFE3 costing tool outputs – estimated costs

Lifecycle stages and their lifecycle elements:

  • Creation or Purchase: ....

  • Acquisition: Selection, Submission Agreement, IPR & Licensing, Ordering & Invoicing, Obtaining, Check-in

  • Ingest: Quality Assurance, Metadata, Deposit, Holdings Update, Reference Linking

  • Bit-stream Preservation: Repository Admin, Storage Provision, Refreshment, Backup, Inspection

  • Content Preservation: Preservation Watch, Preservation Planning, Preservation Action, Re-ingest, Disposal

  • Access: Access Provision, Access Control, User Support
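A much-simplified sketch of how stage-level estimates might combine into a predicted lifecycle cost. It assumes that the first three stages are one-off costs and that the last three recur annually over a retention period of `years`; this is an illustration only, not the LIFE3 costing model, which works at the level of the lifecycle elements. `lifecycle_cost` is a hypothetical name.

```python
# Lifecycle stages as listed on the LIFE3 costing tool slide.
STAGES = (
    "Creation or Purchase", "Acquisition", "Ingest",
    "Bit-stream Preservation", "Content Preservation", "Access",
)
# Simplifying assumption: the first three stages are one-off costs and
# the last three recur every year of the retention period.
ONE_OFF = set(STAGES[:3])


def lifecycle_cost(stage_costs: dict[str, float], years: int) -> float:
    """Total predicted cost over `years` from per-stage cost estimates."""
    unknown = set(stage_costs) - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    total = 0.0
    for stage in STAGES:
        cost = stage_costs.get(stage, 0.0)
        total += cost if stage in ONE_OFF else cost * years
    return total


# Illustrative figures only.
estimate = {"Acquisition": 100.0, "Ingest": 50.0,
            "Bit-stream Preservation": 10.0, "Access": 5.0}
print(lifecycle_cost(estimate, years=10))  # 300.0
```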



Exercise

  • Excel model

    • The Content Profile

    • Refining the calculations

  • Feedback

    • Do you feel that this approach is sound?

    • Have we included all relevant factors?

    • Is the model suitable for the kind of content your repository deals with?

    • Are we making correct assumptions, and is it clear what these are?

    • How could we improve it?



Tools Module 3

  • Significant characteristics, Stephen Grace and Gareth Knight, King’s College London

  • PREMIS, Open Provenance Model


Preservation workflow

  • Check

    • Format identification, versioning

    • File validation

    • Virus check

    • Bit checking and checksum calculation

    • Tools: e.g. DROID, JHOVE, FITS

  • Analyse

    • Preservation planning

    • Characterisation: significant properties and technical characteristics, provenance, format, risk factors

    • Risk analysis

    • Tools: Plato (Planets), PRONOM (TNA), P2 risk registry (KeepIt), INFORM (U Illinois), KB

  • Action

    • Migration

    • Emulation

    • Storage selection
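The "bit checking and checksum calculation" step can be illustrated with Python's standard `hashlib`. A minimal sketch, assuming checksums are recorded at ingest in a simple path-to-digest manifest (the manifest layout and the function names are hypothetical); it shows the principle of fixity checking rather than how any of the listed tools do it.

```python
import hashlib


def file_checksum(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest: dict[str, str], algorithm: str = "sha256") -> list[str]:
    """Return the paths whose current checksum differs from the stored one."""
    return [path for path, expected in manifest.items()
            if file_checksum(path, algorithm) != expected.lower()]
```

Run `check_fixity` periodically against the manifest; an empty list means no bit-level change was detected since the checksums were recorded.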



A group task on format risks

Choose two formats to compare (e.g. Word vs PDF, Word vs ODF, PDF vs XML, TIFF vs JPEG)

Work through the (surviving) list of format risks and, for each risk category, select a winner (or a draw) between your chosen formats (1 point for a win)

Total the scores to find an overall winning format

Suggest one reason why the format that wins by this method may not be the one you would choose for your repository
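The scoring rule of the exercise (1 point per win, nothing for a draw) can be written down directly. The risk categories and verdicts below are hypothetical placeholders, and `tally` and `overall_winner` are illustrative names.

```python
def tally(results: dict[str, str], fmt_a: str, fmt_b: str) -> dict[str, int]:
    """results maps each risk category to the winning format, or "draw"."""
    scores = {fmt_a: 0, fmt_b: 0}
    for category, winner in results.items():
        if winner == "draw":
            continue  # a draw scores nothing for either format
        if winner not in scores:
            raise ValueError(f"{category}: unknown format {winner!r}")
        scores[winner] += 1  # 1 point for a win
    return scores


def overall_winner(scores: dict[str, int]) -> str:
    (a, score_a), (b, score_b) = scores.items()
    if score_a == score_b:
        return "draw"
    return a if score_a > score_b else b


# Hypothetical risk categories and verdicts, for illustration only.
verdicts = {
    "Ubiquity": "PDF",
    "Stability": "PDF",
    "Ease of editing": "Word",
    "Disclosure": "draw",
}
scores = tally(verdicts, "Word", "PDF")
print(scores, overall_winner(scores))  # {'Word': 1, 'PDF': 2} PDF
```

Because every category counts equally, a format can win on points while losing on the one risk that matters most to a particular repository.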



Determine expected behaviours

  • What activities would a user – any type of stakeholder – perform when using an email?

  • Draw upon the list of property descriptions produced in the previous step, formal standards and specifications, or other information sources.

    Task 2:

    Identify the types of action that a user would be able to perform using the email (groups, 15 mins).

  • E.g. Establish name of person who sent email

  • E.g. May want to confirm that email originated from stated source.
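Both example behaviours can be tried on a message using Python's standard `email` package. A minimal sketch on a made-up message: `parseaddr` recovers the sender's name, while the `Received` headers are only circumstantial evidence of origin, since headers can be forged. `sender_behaviours` is a hypothetical helper.

```python
from email import message_from_string
from email.utils import parseaddr

# A made-up message for illustration.
RAW = """\
From: Jane Example <jane@example.org>
To: archive@example.org
Subject: Meeting notes
Date: Mon, 01 Mar 2010 10:00:00 +0000
Received: from mail.example.org by mx.example.net; Mon, 01 Mar 2010 10:00:01 +0000

Notes attached.
"""


def sender_behaviours(raw: str) -> dict:
    msg = message_from_string(raw)
    name, address = parseaddr(msg["From"])
    return {
        # Behaviour: establish the name of the person who sent the email.
        "name": name,
        "address": address,
        # Behaviour: check the stated source. Received headers record the
        # relay path, but they can be forged, so this is evidence, not proof.
        "received": msg.get_all("Received", []),
    }


print(sender_behaviours(RAW)["name"])  # Jane Example
```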



Exercise overview

  • Analyse the content of an email

    • Analyse the structure of the email message

    • Determine the purpose that each technical property serves

  • Consider how email will be used by stakeholders

    • Identify a set of expected behaviours

    • Classify the set of behaviours into functions for recording
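For the first part of the exercise, the standard `email` package can list a message's structure. A minimal sketch on a made-up multipart message: `msg.walk()` visits each MIME part, and the `HEADER_PURPOSE` table is a hypothetical example of recording what each technical property is for; `structure` and `header_functions` are illustrative names.

```python
from email import message_from_string
from typing import Optional

# A made-up multipart message for illustration.
RAW = """\
From: a@example.org
To: b@example.org
Subject: Report
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="utf-8"

See attached.
--XYZ
Content-Type: application/pdf; name="report.pdf"
Content-Disposition: attachment; filename="report.pdf"
Content-Transfer-Encoding: base64

JVBERi0xLjQK
--XYZ--
"""

# Hypothetical mapping from technical properties to the purpose they serve.
HEADER_PURPOSE = {
    "From": "identify the sender",
    "To": "identify the recipients",
    "Date": "establish when the message was sent",
    "Subject": "summarise the content",
}


def structure(raw: str) -> list[tuple[str, Optional[str]]]:
    """List each MIME part's content type and attachment filename, if any."""
    msg = message_from_string(raw)
    return [(part.get_content_type(), part.get_filename()) for part in msg.walk()]


def header_functions(raw: str) -> dict[str, str]:
    """Record the purpose of each header present that the table knows about."""
    msg = message_from_string(raw)
    return {h: HEADER_PURPOSE[h] for h in msg.keys() if h in HEADER_PURPOSE}


for content_type, filename in structure(RAW):
    print(content_type, filename)
```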



JHOVE Demo



Define Sample Objects



Some revision from KeepIt Module 3

  • Preservation workflow

    • We recognised that we have digital objects with formats and other characteristics that we need to identify and record. These can change over time, or may need to be changed pre-emptively, depending on a risk assessment, by means of a preservation action. Risk is subjective.

  • Significant properties

    • We considered which characteristics might be significant using the function-behaviour-structure (FBS) framework, and classified the functions of formatted emails

    • We recognised that assessment of behaviour, and so of significance, can vary according to the viewpoint of the stakeholder – e.g. creator, user, archivist

  • Documentation

    • We looked at two ways to document these characteristics, and the changes to them over time

    • Broad and established (PREMIS)

    • Focussed, and work-in-progress (Open Provenance Model)

  • Provenance in action: transmission and recording

    • Through a simple game we learned that if we don’t recognise the necessary properties at the outset, and maintain a record through all stages of transmission, the information at the end of the chain will likely not be the same as what we started with
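The lesson of the game can be reproduced in a few lines: record a fixity value and an event at every stage, and any change along the chain becomes visible. A minimal sketch; the `Event` fields are loosely inspired by PREMIS event information (type, date/time, outcome, detail), but this is not a PREMIS implementation, and all names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class Event:
    # Loosely modelled on PREMIS event information; not a PREMIS serialisation.
    event_type: str
    event_datetime: str
    outcome: str
    detail: str = ""


@dataclass
class DigitalObject:
    identifier: str
    content: bytes
    events: list[Event] = field(default_factory=list)

    def fixity(self) -> str:
        return hashlib.sha256(self.content).hexdigest()


def transmit(obj: DigitalObject, step: Callable[[bytes], bytes], detail: str) -> DigitalObject:
    """Pass an object through one step of a chain, recording what happened."""
    before = obj.fixity()
    # Copy the event list so the earlier object's history is left untouched.
    moved = DigitalObject(obj.identifier, step(obj.content), list(obj.events))
    outcome = "unchanged" if moved.fixity() == before else "changed"
    moved.events.append(Event("transmission",
                              datetime.now(timezone.utc).isoformat(),
                              outcome, detail))
    return moved


original = DigitalObject("obj-1", b"Minutes of the meeting")
copied = transmit(original, lambda data: data, "copied to new storage")
retyped = transmit(copied, lambda data: data.upper(), "re-keyed by hand")
for event in retyped.events:
    print(event.outcome, "-", event.detail)
```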



Tools Module 4

  • EPrints preservation apps, including the storage controller, Dave Tarrant and Adam Field, University of Southampton

  • Plato, preservation planning tool from the Planets project, Andreas Rauber and Hannes Kulovits, TU Wien



Hybrid Storage Policies



EPrints Storage Manager


Risk Analysis in EPrints

Preservation - Analyse

[Screenshot: EPrints file classification + risk analysis]
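A minimal sketch of classification followed by risk analysis. It identifies formats from a file's leading bytes (the general idea behind signature-based identification tools such as DROID, much simplified) and flags files above a risk threshold, which also gives a simple list of high-risk objects. The signature list is a small subset, and the risk scores, threshold and function names are hypothetical, not EPrints' own.

```python
# Simplified magic-number table for illustration. Tools such as DROID match
# against PRONOM signatures, which are far more detailed than these prefixes.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),
    (b"MM\x00*", "TIFF"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2"),  # e.g. legacy PPT, DOC, XLS
    (b"PK\x03\x04", "ZIP"),                       # e.g. PPTX, DOCX, ODF
]

# Hypothetical risk scores (0 = low, 10 = high); a real repository would take
# these from a risk registry rather than hard-coding them.
RISK = {"PDF": 3, "PNG": 2, "GIF": 5, "JPEG": 4, "TIFF": 2, "OLE2": 8, "ZIP": 4}


def identify(header: bytes) -> str:
    """Return the first format whose signature prefixes the file's bytes."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def classify(files: dict[str, bytes], threshold: int = 7) -> dict[str, tuple[str, int, bool]]:
    """Map each file name to (format, risk score, high-risk flag)."""
    report = {}
    for name, header in files.items():
        fmt = identify(header)
        score = RISK.get(fmt, 10)  # unidentified files are treated as highest risk
        report[name] = (fmt, score, score >= threshold)
    return report


def high_risk(report: dict[str, tuple[str, int, bool]]) -> list[str]:
    """Names of the files flagged as high risk, in report order."""
    return [name for name, (_, _, flagged) in report.items() if flagged]
```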


Risk Analysis in EPrints

Preservation - Action: migration? transformation?

[Mock-up transformation interface: migration tools and their preservation level, e.g. PPT -> PPTX, PPT -> PDF]



Viewing high-risk objects



Exercise: EPrints – adding ‘at risk’ image collection



Preservation Planning



Preservation Planning with Plato

Plato

  • Assists in analysing the collection

    • Profiling, analysis of sample objects via PRONOM and other services

  • Allows creation of an objective tree

    • Within the application or via import of mind maps

  • Allows the selection of preservation action tools



Preservation Planning with Plato

Plato

  • Runs experiments and documents results

  • Allows definition of transformation rules, weightings

  • Performs evaluation and sensitivity analysis

  • Provides recommendation (ranks solutions)
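The evaluation steps can be sketched as a weighted-sum utility over an objective tree. This is a simplified illustration, not Plato's implementation: it assumes measured values have already been transformed onto a 0 to 5 scale, treats a 0 on any criterion as a knock-out, and uses a crude weight-perturbation test for sensitivity. Criteria, weights, scores and function names are hypothetical.

```python
# Hypothetical objective-tree leaves and weights (intended to sum to 1).
WEIGHTS = {"storage size": 0.2, "image quality": 0.5, "tool cost": 0.3}


def transform_numeric(value: float, thresholds: list[float]) -> int:
    """Transformation rule: map a measured value onto 0-5 using five
    ascending thresholds, where larger measured values are better."""
    return sum(1 for t in thresholds if value >= t)


def utility(scores: dict[str, int], weights: dict[str, float]) -> float:
    if any(scores[c] == 0 for c in weights):
        return 0.0  # knock-out: an unacceptable value rules the alternative out
    return sum(weights[c] * scores[c] for c in weights)


def rank(alternatives: dict[str, dict[str, int]], weights: dict[str, float]) -> list[str]:
    """Alternatives ordered from highest to lowest utility."""
    return sorted(alternatives, key=lambda a: utility(alternatives[a], weights), reverse=True)


def winner_is_stable(alternatives, weights, delta=0.1) -> bool:
    """Crude sensitivity analysis: does the top-ranked alternative survive
    raising or lowering each weight by `delta` (relative), then renormalising?"""
    best = rank(alternatives, weights)[0]
    for criterion in weights:
        for sign in (-1, 1):
            w = dict(weights)
            w[criterion] *= 1 + sign * delta
            total = sum(w.values())
            w = {k: v / total for k, v in w.items()}
            if rank(alternatives, w)[0] != best:
                return False
    return True


# Hypothetical, already-transformed scores for the GIF yearbook scenario.
alternatives = {
    "Migrate to TIFF": {"storage size": 2, "image quality": 5, "tool cost": 4},
    "Migrate to PNG": {"storage size": 4, "image quality": 5, "tool cost": 4},
    "Migrate to JPEG": {"storage size": 5, "image quality": 0, "tool cost": 5},
}
print(rank(alternatives, WEIGHTS))
print(winner_is_stable(alternatives, WEIGHTS))
```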



Exercise Time! The Scenario

  • National library

  • Scanned yearbooks archive

  • GIF images

  • The purpose of this plan is to find a strategy for preserving this collection for the future, i.e. to choose a tool to handle our collection with.

  • The tool must be compatible with our existing hardware and software infrastructure, so that it can be installed within our server and network environment.

  • The files haven't been touched for several years now and no detailed description exists. However, we have to ensure their accessibility for years to come.

  • Re-scanning is not an option because of the cost, and some of the original pages no longer exist.



Exercise: EPrints – adding ‘at risk’ image collection



Exercise: Plato-EPrints plan-migrate-review



Tools Module 5

  • TRAC, Trusted Repository Audit and Certification: criteria and checklist

  • DRAMBORA, Digital Repository Audit Method Based On Risk Assessment, Martin Donnelly, Digital Curation Centre, University of Edinburgh



Trustworthy Repositories Audit & Certification (TRAC) Criteria and Checklist

  • RLG/NARA assembled an International Task Force to address the issue of repository certification

    • TRAC is a set of criteria applicable to a range of digital repositories and archives, from academic institutional preservation repositories to large data archives and from national libraries to third-party digital archiving services

  • Provides tools for the audit, assessment, and potential certification of digital repositories

  • Establishes the documentation required for an audit

  • Delineates a process for certification

  • Establishes appropriate methodologies for determining the soundness and sustainability of digital repositories



TRAC Criteria Checklist

  • Within TRAC, there are 84 individual criteria

Only 82 criteria to go!



To certify or not to certify? That is the question

  • Take a spreadsheet with all 84 TRAC criteria.

  • Select one.

  • Decide whether you could certify your repository for this, based on where your repository is now or where you think it might be after participating in this course.

Images by Cayusa and fabiux



DRAMBORA Method

  • Discrete phases of (self-)assessment, reflecting the realities of audit

  • Preservation is fundamentally a risk management process:

    • Define Scope

    • Document Context and Classifiers

    • Formalise Organisation

    • Identify and Assess Risks

  • Builds audit into internal repository management procedures



Repository Administration



Part I – Identify a risk (30 minutes)

Each group should identify one risk (based on your own experiences wherever possible) and complete the DRAMBORA worksheet.

Groups should complete:

  • name and description of the risk;

  • example manifestations of the risk;

  • nature of the risk;

  • risk owner(s);

  • stakeholders who would be affected;

  • if possible, relationships with other risks.



Part II – Mitigate the risk (30 minutes)

Now identify what steps your archive might take to manage and mitigate the identified risk over time…

Each group should complete:

  • risk management strategy/-ies;

  • risk management activities;

  • risk management activity owner(s).
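The worksheet fields from Parts I and II map naturally onto a record type. A minimal sketch, assuming probability and impact are each scored from 1 to 6 and that risk severity is their product (check these scales against the DRAMBORA documentation); `Risk`, `register` and the example entries are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed scoring: probability and impact each on a 1-6 scale, with severity
# as their product. Check the DRAMBORA documentation for the actual scales.
SCALE = range(1, 7)


@dataclass
class Risk:
    # Part I: identify the risk
    name: str
    description: str
    manifestations: list[str]
    nature: str
    owners: list[str]
    stakeholders: list[str]
    probability: int
    impact: int
    related_risks: list[str] = field(default_factory=list)
    # Part II: mitigate the risk
    strategies: list[str] = field(default_factory=list)
    activities: list[str] = field(default_factory=list)
    activity_owners: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        for label, value in (("probability", self.probability), ("impact", self.impact)):
            if value not in SCALE:
                raise ValueError(f"{label} must be between {SCALE.start} and {SCALE.stop - 1}")

    @property
    def severity(self) -> int:
        return self.probability * self.impact


def register(risks: list[Risk]) -> list[Risk]:
    """Order a risk register so the most severe risks are addressed first."""
    return sorted(risks, key=lambda r: r.severity, reverse=True)


# Illustrative entries only.
staff = Risk("Loss of key staff", "Only one person can administer the repository",
             ["Sole administrator leaves"], "Staffing", ["Repository manager"],
             ["Depositors", "Readers"], probability=3, impact=5,
             strategies=["Document procedures"], activities=["Write admin manual"],
             activity_owners=["Repository manager"])
media = Risk("Storage media failure", "Disk failure corrupts files",
             ["Unreadable files"], "Technical", ["IT services"], ["Readers"],
             probability=2, impact=4)
print([(r.name, r.severity) for r in register([media, staff])])
# [('Loss of key staff', 15), ('Storage media failure', 8)]
```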

