FITS: The File Information Tool Set

Background

  • FITS is part of the second-generation Harvard University Library Digital Repository Service (DRS2), which supports content models and METS/PREMIS object descriptors.

  • Developed Fall 2008

  • First public release Spring 2009: http://fits.googlecode.com


Why?

  • Needed an automatic way to identify and extract metadata for a wide range of file types

  • No single file analysis tool satisfied our needs


Design Goals

  • Act as a wrapper around other open source tools

  • Extensible

  • Needs to be a standalone command line tool and also provide an API

  • Allow priority setting for tools

  • Open source
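The wrapper design in the goals above can be pictured with a minimal sketch. It is not FITS code: the `Tool` interface, the two toy tools, and `run_tools` are hypothetical names invented for illustration. It assumes only that every wrapped tool sits behind one common interface and that list order doubles as priority.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class Tool(ABC):
    """Hypothetical common interface that every wrapped tool implements."""

    name = "unnamed"

    @abstractmethod
    def examine(self, path):
        """Return a dict with whatever this tool can report about the file."""


class ExtensionGuesser(Tool):
    """Toy stand-in for a real identification tool."""

    name = "ExtensionGuesser"
    KNOWN = {".gif": "image/gif", ".pdf": "application/pdf"}

    def examine(self, path):
        mimetype = self.KNOWN.get(path.suffix.lower())
        return {"mimetype": mimetype} if mimetype else {}


class SizeReporter(Tool):
    """Toy stand-in for a tool that reports basic file information."""

    name = "SizeReporter"

    def examine(self, path):
        return {"size": path.stat().st_size} if path.exists() else {}


def run_tools(tools, path):
    """Run every tool in list order; the order doubles as the priority."""
    return [(tool.name, tool.examine(path)) for tool in tools]
```

Adding a tool under this sketch means writing one more `Tool` subclass and appending it to the list, which is the kind of extensibility the design goals describe.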


The Tools

  • Current tools:

    • Jhove 1.5

    • Exiftool

    • National Library of New Zealand Metadata Extractor (NLNZ)

    • DROID

    • FFIdent

    • File Utility

  • 3 Categories

    • File Identification (all of them)

    • Metadata Extraction (Jhove, Exiftool, NLNZ)

    • Format Validation (Jhove)


Process


Features

  • Conflict management

  • Value normalization

    • “inches” vs “2” (two tools reporting the same unit, one as a label and one as a numeric code)

  • Tool prioritization

  • Format tree for understanding more specific format identities.

    • PDF/A is a more specific version of PDF
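The features above can be sketched in a few lines. This is an illustration under stated assumptions, not the FITS consolidator: the normalization table, the format tree contents, the field name `resolution_unit`, and the `AGREED`/`CONFLICT` labels are all hypothetical and are not FITS status values.

```python
# Hypothetical normalization table: maps a tool-specific spelling to one form.
NORMALIZE = {("resolution_unit", "2"): "inches"}

# Hypothetical format tree: child format -> more general parent format.
FORMAT_PARENT = {"PDF/A": "PDF"}


def normalize(field, value):
    """Return the normalized spelling of a value, or the value unchanged."""
    return NORMALIZE.get((field, value), value)


def is_same_or_more_specific(candidate, general):
    """True if candidate equals general or descends from it in the tree."""
    while candidate is not None:
        if candidate == general:
            return True
        candidate = FORMAT_PARENT.get(candidate)
    return False


def most_specific(formats):
    """Pick the format that is at least as specific as every other one.

    Returns None when the reported formats are unrelated in the tree.
    """
    for candidate in formats:
        if all(is_same_or_more_specific(candidate, other) for other in formats):
            return candidate
    return None


def consolidate(field, reports):
    """Merge (tool_name, value) reports given in priority order, highest first.

    Returns (chosen_value, label); the labels are illustrative only.
    """
    values = [normalize(field, value) for _, value in reports]
    distinct = list(dict.fromkeys(values))  # de-duplicate, keep priority order
    label = "AGREED" if len(distinct) == 1 else "CONFLICT"
    return distinct[0], label
```

With these tables, `consolidate("resolution_unit", [("ToolA", "inches"), ("ToolB", "2")])` returns `("inches", "AGREED")` because normalization removes the apparent disagreement, and `most_specific(["PDF", "PDF/A"])` returns `"PDF/A"`. When values genuinely differ, the sketch keeps the value from the highest-priority tool and labels it a conflict.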


Example Output

<fits>
  <identification>
    <identity format="Graphics Interchange Format" mimetype="image/gif">
      <tool toolname="Jhove" toolversion="1.5" />
      ...
    </identity>
  </identification>
  <fileinfo>
    <size toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">40149</size>
    <md5checksum toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">265c9345ebf93c89d472766fda095de4</md5checksum>
    ...
  </fileinfo>
  <filestatus>
    <well-formed toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</well-formed>
    <valid toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</valid>
  </filestatus>
  <metadata>
    <image>
      <height toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">1024</height>
      ...
    </image>
  </metadata>
</fits>
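A consumer of output shaped like the example above might read it as in the following sketch. It assumes the un-namespaced element names exactly as shown on the slide, with the elided parts ("...") simply left out; real FITS output may carry an XML namespace, in which case the paths would need adjusting. The `summarize` function and its result keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's example, with the elided parts omitted.
SAMPLE = """\
<fits>
  <identification>
    <identity format="Graphics Interchange Format" mimetype="image/gif">
      <tool toolname="Jhove" toolversion="1.5" />
    </identity>
  </identification>
  <fileinfo>
    <size toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">40149</size>
  </fileinfo>
  <filestatus>
    <well-formed toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</well-formed>
    <valid toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</valid>
  </filestatus>
  <metadata>
    <image>
      <height toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">1024</height>
    </image>
  </metadata>
</fits>
"""


def summarize(fits_xml):
    """Pull the identity, file status, and a few values out of the output."""
    root = ET.fromstring(fits_xml)
    identity = root.find("identification/identity")
    return {
        "format": identity.get("format"),
        "mimetype": identity.get("mimetype"),
        "identified_by": [t.get("toolname") for t in identity.findall("tool")],
        "size": int(root.findtext("fileinfo/size")),
        "well_formed": root.findtext("filestatus/well-formed") == "true",
        "valid": root.findtext("filestatus/valid") == "true",
        "image_height": int(root.findtext("metadata/image/height")),
    }
```

Calling `summarize(SAMPLE)` yields the format name and MIME type from the `<identity>` element, the list of tools that reported that identity, and the numeric and boolean values as native Python types.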


Configuration

  • All settings are in the fits.xml config file

  • Enable/disable tools (available in the API too)

  • Prevent tools from processing files with specific file extensions

  • Set tool priority

  • Add new tools

  • Use your own consolidator code

  • Report or ignore conflicts

  • Options to display original tool output


Sample Configuration File

<fits_configuration>
  <!-- Order of the tools determines preference -->
  <tools>
    <!-- exclude-exts attribute is a comma delimited list of file extensions that the tool should not try to process -->
    <tool class="edu.harvard.hul.ois.fits.tools.jhove.Jhove" exclude-exts="dng,mbx"/>
    <tool class="edu.harvard.hul.ois.fits.tools.fileutility.FileUtility" exclude-exts="dng,wps"/>
    <tool class="edu.harvard.hul.ois.fits.tools.exiftool.Exiftool" exclude-exts="txt,wps,vsd"/>
    <tool class="edu.harvard.hul.ois.fits.tools.droid.Droid" exclude-exts="dng"/>
    <tool class="edu.harvard.hul.ois.fits.tools.nlnz.MetadataExtractor" exclude-exts="dng,zip,odb,ott,odg,otg,odp,otp,ods,ots,odc,otc,odi,oti,odf,otf,odm,oth"/>
    <tool class="edu.harvard.hul.ois.fits.tools.oisfileinfo.FileInfo"/>
    <tool class="edu.harvard.hul.ois.fits.tools.oisfileinfo.XmlMetadata"/>
    <tool class="edu.harvard.hul.ois.fits.tools.ffident.FFIdent" exclude-exts="dng,wps,vsd"/>
  </tools>
  <output>
    <dataConsolidator class="edu.harvard.hul.ois.fits.consolidation.OISConsolidator"/>
    <display-tool-output>true</display-tool-output>
    <report-conflicts>true</report-conflicts>
    <validate-tool-output>false</validate-tool-output>
    <internal-output-schema>xml/fits_output.xsd</internal-output-schema>
    <external-output-schema>http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd</external-output-schema>
    <fits-xml-namespace>http://hul.harvard.edu/ois/xml/ns/fits/fits_output</fits-xml-namespace>
  </output>
  <!-- file name of the droid signature file to use in tools/droid/-->
  <droid_sigfile>DROID_SignatureFile_V35.xml</droid_sigfile>
</fits_configuration>
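How the tool order and the `exclude-exts` attribute in the sample configuration might interact can be sketched as follows. This is not how FITS itself reads its configuration; `tools_for` is a hypothetical helper, the embedded config is a shortened copy of the sample, and case-insensitive extension matching is an assumption.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample configuration (four of the eight tools).
CONFIG = """\
<fits_configuration>
  <tools>
    <tool class="edu.harvard.hul.ois.fits.tools.jhove.Jhove" exclude-exts="dng,mbx"/>
    <tool class="edu.harvard.hul.ois.fits.tools.fileutility.FileUtility" exclude-exts="dng,wps"/>
    <tool class="edu.harvard.hul.ois.fits.tools.exiftool.Exiftool" exclude-exts="txt,wps,vsd"/>
    <tool class="edu.harvard.hul.ois.fits.tools.oisfileinfo.FileInfo"/>
  </tools>
</fits_configuration>
"""


def tools_for(config_xml, filename):
    """Return short tool names, in config (priority) order, allowed to process filename."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    selected = []
    for tool in ET.fromstring(config_xml).iterfind("tools/tool"):
        raw = tool.get("exclude-exts", "")
        excluded = [e.strip().lower() for e in raw.split(",") if e.strip()]
        if ext not in excluded:
            # Keep only the last segment of the fully qualified class name.
            selected.append(tool.get("class").rsplit(".", 1)[-1])
    return selected
```

For a file named `letter.wps`, the sketch skips FileUtility and Exiftool, which both exclude `wps`, and returns `["Jhove", "FileInfo"]` in their configured order.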



Some Limitations...

  • Speed

  • Technical metadata is only returned if the tool that reported it is listed in the first <identity> block

  • FITS considers a successful identification to be the combination of a format name and a MIME type
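The last limitation can be illustrated with a short sketch: if an identity is the pair of format name and MIME type, two tools agree only when both parts match. The `group_identities` helper and the tool reports in the usage note are hypothetical, not actual tool output.

```python
from collections import defaultdict


def group_identities(reports):
    """Group (tool_name, format_name, mimetype) reports into identities.

    Tools land in the same group only when both the format name and the
    MIME type match exactly.
    """
    groups = defaultdict(list)
    for tool, format_name, mimetype in reports:
        groups[(format_name, mimetype)].append(tool)
    return dict(groups)
```

Given the made-up reports `("ToolA", "Graphics Interchange Format", "image/gif")`, `("ToolB", "GIF", "image/gif")`, and `("ToolC", "Graphics Interchange Format", "image/gif")`, the sketch produces two identities even though all three MIME types agree, because ToolB spells the format name differently.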


Future Plans

  • More tools

    • Apache Tika (text document formats)

    • Jhove 2

    • Aduna Aperture (text, documents, email formats)

    • Mediainfo (audio and video formats)

  • Better audio and video format support as we add object support for them to DRS2


Wrap Up

  • http://fits.googlecode.com

  • http://ots-schemas.googlecode.com

    • Java library for reading and writing METS (limited support), MODS, PREMIS, MIX, TextMD, DocumentMD, and soon AES audio metadata

  • More information on DRS2: http://hul.harvard.edu/ois/systems/drs/enhancements.html

