BitstreamFormat Renovation: DSpace Gets Real Technical Metadata

BitstreamFormat Renovation Prototype

Benefits (Why Should I Care?)

  • The new format identifier corrected the misidentified, or resolved the previously unidentified, data formats of 858 Bitstreams in DSpace@MIT. How many are misidentified in your repository?

  • Accurate MIME-types improve delivery to Web clients

  • Quality preservation requires accurate data format knowledge

  • Interoperability with internal and external tools relies on correct technical metadata in commonly-recognized standards

  • Without automated tools, maintenance of format technical metadata is a tedious manual job for repository managers

Data Formats

A “Data Format” is defined as:

Technical Metadata that describes how abstract information is encoded and structured in a digital document.

“Abstract Information” refers to the actual intellectual content contained in the digital object.

Problems with Current Format Technical Metadata

  • Formats are identified with arbitrary names; that hinders interoperability

  • No means to collect additional format technical metadata, e.g. format specification documents.

  • Identifying formats only by filename extension is imprecise and unreliable

  • Current internal format model is inflexible
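
The point about filename extensions can be illustrated with a minimal sketch. It is not the prototype's code: the two lookup tables and the function names `identify_by_extension` and `identify_by_signature` are assumptions made for this example, and only a handful of well-known leading-byte signatures are included.

```python
import os

# Leading-byte signatures ("magic numbers") for a few well-known formats.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]

# A deliberately tiny extension table, standing in for a name-based lookup.
EXTENSIONS = {
    ".pdf": "application/pdf",
    ".png": "image/png",
    ".gif": "image/gif",
    ".doc": "application/msword",
}


def identify_by_extension(filename):
    """Guess the MIME type from the filename alone; None if unknown."""
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSIONS.get(ext)


def identify_by_signature(data):
    """Guess the MIME type from the leading bytes; None if nothing matches."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return None


# A PDF that was uploaded under a misleading name.
name, content = "thesis.doc", b"%PDF-1.4\n..."
print(identify_by_extension(name))     # application/msword
print(identify_by_signature(content))  # application/pdf
```

The extension-based guess trusts whatever the submitter named the file, while the signature-based guess inspects the content itself. Tools such as DROID work on the second principle, with a far larger signature set.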


BitstreamFormat Renovation Prototype


  • Format Registry

    • PRONOM

    • GDFR

  • Identification Plugins

    • DROID

    • JHOVE

  • Interoperable Format Identifiers

    • MIME Type

    • PUID (PRONOM Persistent Unique Identifier)

Object-Model Architecture Connected to PRONOM

Identification of Two BitstreamFormat Types

The Local “DSpace” and Provisional Registries

Interface to External Registries

  • Get Synonyms

    • Returns a list of identifiers that are bound to the same format record

  • Import

    • Turns an external format description into a new BitstreamFormat entry, initializing its metadata fields from the external registry

  • Update

    • Refreshes the metadata fields of a BitstreamFormat to keep up with changes in the external registry

  • ConformsTo

    • Tests whether the format described by one identifier “conforms to” or is a sub-type of another format
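
The four operations above can be sketched as a small interface. This is an illustration under stated assumptions, not the prototype's API: the class `InMemoryRegistry`, the fields of `BitstreamFormat`, the `NAMESPACE:value` identifier syntax, and the sample records are all hypothetical, and `import_format` is named that way only because `import` is a reserved word in Python.

```python
from dataclasses import dataclass, field


@dataclass
class BitstreamFormat:
    """Local format entry; the field names are illustrative, not DSpace's."""
    name: str
    mime_type: str
    identifiers: set = field(default_factory=set)


class InMemoryRegistry:
    """Toy stand-in for an external registry such as PRONOM or GDFR."""

    def __init__(self, records, broader):
        self.records = records  # list of dicts: name, mime_type, identifiers
        self.broader = broader  # identifier -> identifier of the broader format

    def _record_for(self, identifier):
        for record in self.records:
            if identifier in record["identifiers"]:
                return record
        raise KeyError(identifier)

    def get_synonyms(self, identifier):
        """Get Synonyms: every identifier bound to the same format record."""
        return sorted(self._record_for(identifier)["identifiers"])

    def import_format(self, identifier):
        """Import: build a new BitstreamFormat from the registry record."""
        record = self._record_for(identifier)
        return BitstreamFormat(
            record["name"], record["mime_type"], set(record["identifiers"])
        )

    def update(self, fmt):
        """Update: refresh an existing BitstreamFormat from the registry."""
        record = self._record_for(sorted(fmt.identifiers)[0])
        fmt.name = record["name"]
        fmt.mime_type = record["mime_type"]
        fmt.identifiers = set(record["identifiers"])

    def conforms_to(self, identifier, other):
        """ConformsTo: is `identifier` the same as, or a sub-type of, `other`?"""
        seen = set()
        current = identifier
        while current is not None and current not in seen:
            if current == other:
                return True
            seen.add(current)
            current = self.broader.get(current)
        return False


# Sample data for illustration only; check real PUIDs against PRONOM.
registry = InMemoryRegistry(
    records=[
        {"name": "PDF 1.4", "mime_type": "application/pdf",
         "identifiers": {"PUID:fmt/18", "LOCAL:pdf-1.4"}},
        {"name": "PDF (any version)", "mime_type": "application/pdf",
         "identifiers": {"LOCAL:pdf"}},
    ],
    broader={"PUID:fmt/18": "LOCAL:pdf", "LOCAL:pdf-1.4": "LOCAL:pdf"},
)

fmt = registry.import_format("PUID:fmt/18")
print(registry.get_synonyms("PUID:fmt/18"))
print(fmt.name, registry.conforms_to("PUID:fmt/18", "LOCAL:pdf"))
```

Keeping the sub-type relation in the registry rather than in the local format table means a specific version can still be matched by a rule written for the broader format.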

The Bitstream Format Metadata Admin Panel

Importing New Bitstream Formats

Editing BitstreamFormat Metadata

Digital Preservation Strategies

  • Pluggable architecture allows for access to external identification and technical metadata tools

  • Access and preservation rely on accurate format identification

  • Migration / Obsolescence tools are only effective with correct and precise identification, because format versions matter

  • The creation of derivatives (e.g. thumbnails or delivery versions) via MediaFilter will also rely on accurate identification
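
One way to picture the pluggable architecture is a chain of identification plugins tried in order. The sketch below is an assumption-laden illustration, not DSpace code: the plugin signature, the `identify` helper, and both stand-in plugins are hypothetical, and the first plugin recognises only PDF headers where a real tool such as DROID or JHOVE would be called.

```python
from typing import Callable, List, Optional

# A plugin takes (filename, leading bytes) and returns a format label,
# or None when it cannot decide. This signature is an assumption.
IdentifierPlugin = Callable[[str, bytes], Optional[str]]


def signature_plugin(filename: str, data: bytes) -> Optional[str]:
    """Stand-in for a signature-based tool; recognises only PDF headers."""
    if data.startswith(b"%PDF-"):
        # Report the version too, because format versions matter downstream.
        version = data[5:8].decode("ascii", errors="replace")
        return "PDF " + version
    return None


def extension_plugin(filename: str, data: bytes) -> Optional[str]:
    """Last-resort fallback that looks at the filename only."""
    if filename.lower().endswith(".txt"):
        return "Plain text"
    return None


def identify(filename: str, data: bytes,
             plugins: List[IdentifierPlugin]) -> str:
    """Return the first answer any plugin gives, else "Unknown"."""
    for plugin in plugins:
        result = plugin(filename, data)
        if result is not None:
            return result
    return "Unknown"


# Most precise plugin first, weakest fallback last.
PLUGINS = [signature_plugin, extension_plugin]

print(identify("scan.bin", b"%PDF-1.4\n...", PLUGINS))  # PDF 1.4
print(identify("notes.txt", b"hello", PLUGINS))         # Plain text
print(identify("blob.dat", b"\x00\x01", PLUGINS))       # Unknown
```

Ordering the chain from most to least precise lets a version-aware answer take priority, which is what migration and derivative tools need.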

Interoperability Benefits

  • Avoids platform lock-in

  • Reliable delivery functionality

  • Consistent object description semantics (ORE)

  • Interoperability with digital preservation services

Quantitative Results

  • Before:

    • 1,020 Unidentified (0.65%)

  • After:

    • 162 Unidentified (0.104%)

(155,000 Bitstreams)
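
The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch assumes that 155,000 is the total Bitstream count used as the denominator for both percentages; the variable names are chosen for this example.

```python
total = 155_000             # assumed denominator for both percentages
before, after = 1_020, 162  # unidentified Bitstreams before and after

newly_identified = before - after
print(newly_identified)                             # 858
print(round(100 * before / total, 3))               # 0.658
print(round(100 * after / total, 3))                # 0.105
print(round(100 * newly_identified / before, 1))    # 84.1
```

The difference of 858 matches the number quoted on the Benefits slide. The slide's 0.65% and 0.104% appear to be truncated rather than rounded values of the same ratios.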


Related Links

  • http://wiki.dspace.org/index.php/BitstreamFormat_Renovation

  • http://www.nationalarchives.gov.uk/PRONOM/

  • http://droid.sourceforge.net/wiki/index.php/Introduction

  • https://collaborate.oclc.org/wiki/gdfr/about.html

  • http://www.loc.gov/standards/premis/

  • http://pilot.apsr.edu.au/wiki/index.php/AONS_II

  • http://www.ijdc.net/ijdc/article/view/53/

  • http://hul.harvard.edu/jhove/

  • http://web.mit.edu/sands/www/bfr