FITS: The File Information Tool Set

Background
  • FITS is part of the second-generation Harvard University Library Digital Repository Service (DRS2), which supports content models and METS/PREMIS object descriptors.
  • Developed Fall 2008
  • First public release Spring 2009: http://fits.googlecode.com
Why?
  • Needed an automatic way to identify files and extract their metadata across a wide range of file types
  • No single file analysis tool satisfied our needs
Design Goals
  • Act as a wrapper around other open source tools
  • Extensible
  • Usable both as a standalone command-line tool and through an API
  • Allow priority setting for tools
  • Open source
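The wrapper and priority goals above can be pictured with a small sketch. This is not FITS code: the `Tool` type, `run_tools`, and the two stand-in tools are hypothetical names invented for illustration, and the stand-ins only look at the file extension.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical wrapper type: a tool is a name plus a function that returns
# a format name for a path, or None when it cannot identify the file.
Tool = Tuple[str, Callable[[str], Optional[str]]]

def run_tools(path: str, tools: List[Tool]) -> List[Tuple[str, str]]:
    """Run every tool and keep the answers in the tools' priority order."""
    results = []
    for name, identify in tools:
        answer = identify(path)
        if answer is not None:
            results.append((name, answer))
    return results

# Stand-in tools for illustration only.
def gif_only(path: str) -> Optional[str]:
    return "Graphics Interchange Format" if path.lower().endswith(".gif") else None

def by_extension(path: str) -> Optional[str]:
    return path.rsplit(".", 1)[-1].upper() if "." in path else None

# List order expresses priority: earlier tools are preferred.
tools = [("gif_only", gif_only), ("by_extension", by_extension)]
print(run_tools("photo.gif", tools))
```

Adding a tool means appending one more entry to the list, which is the sense in which such a wrapper is extensible.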
The Tools
  • Current tools:
    • Jhove 1.5
    • Exiftool
    • National Library of New Zealand Metadata Extractor (NLNZ)
    • DROID
    • FFIdent
    • File Utility
  • 3 Categories
    • File Identification (all of them)
    • Metadata Extraction (Jhove, Exiftool, NLNZ)
    • Format Validation (Jhove)
Features
  • Conflict management
  • Value normalization
    • e.g. a resolution unit reported as “inches” by one tool and as the numeric code “2” by another
  • Tool prioritization
  • Format tree for understanding more specific format identities.
    • PDF/A is a more specific version of PDF
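A minimal sketch of how normalization, conflict detection, and a format tree might fit together. The lookup tables and function names are assumptions made for illustration; the slides do not show FITS's actual mappings or consolidator logic. The unit codes follow the TIFF ResolutionUnit convention (2 = inches, 3 = centimeters).

```python
# Hypothetical lookup tables; the real FITS mappings are not shown in the slides.
UNIT_NAMES = {"2": "inches", "3": "centimeters"}  # TIFF-style ResolutionUnit codes
FORMAT_PARENT = {"PDF/A": "PDF"}                  # specific format -> more general format

def normalize_unit(value):
    """Map a numeric unit code to its name; pass other values through."""
    value = value.strip()
    return UNIT_NAMES.get(value, value)

def is_more_specific(candidate, general):
    """True if candidate is a descendant of general in the format tree."""
    parent = FORMAT_PARENT.get(candidate)
    while parent is not None:
        if parent == general:
            return True
        parent = FORMAT_PARENT.get(parent)
    return False

def consolidate(values):
    """Take a non-empty list of values in tool priority order.

    Returns (preferred value, conflict flag) after normalization.
    """
    normalized = [normalize_unit(v) for v in values]
    distinct = list(dict.fromkeys(normalized))  # drop duplicates, keep order
    return distinct[0], len(distinct) > 1

print(consolidate(["inches", "2"]))
print(consolidate(["inches", "3"]))
print(is_more_specific("PDF/A", "PDF"))
```

Normalizing before comparing is what keeps “inches” and “2” from being reported as a conflict; the highest-priority tool's value is kept either way.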
Example Output
<fits>
  <identification>
    <identity format="Graphics Interchange Format" mimetype="image/gif">
      <tool toolname="Jhove" toolversion="1.5" />
      ...
    </identity>
  </identification>
  <fileinfo>
    <size toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">40149</size>
    <md5checksum toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">265c9345ebf93c89d472766fda095de4</md5checksum>
    ...
  </fileinfo>
  <filestatus>
    <well-formed toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</well-formed>
    <valid toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</valid>
  </filestatus>
  <metadata>
    <image>
      <height toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">1024</height>
      ...
    </image>
  </metadata>
</fits>
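Output in this shape is straightforward to read programmatically. The sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed copy of the example with the elided `...` lines left out. The `summarize` function is a hypothetical helper, and the sketch assumes un-namespaced elements as shown on the slide; real FITS output may carry an XML namespace, in which case the paths would need a namespace prefix.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's example output (elided lines omitted).
SAMPLE = """<fits>
  <identification>
    <identity format="Graphics Interchange Format" mimetype="image/gif">
      <tool toolname="Jhove" toolversion="1.5" />
    </identity>
  </identification>
  <fileinfo>
    <size toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">40149</size>
  </fileinfo>
  <filestatus>
    <well-formed toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</well-formed>
    <valid toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</valid>
  </filestatus>
</fits>"""

def summarize(xml_text):
    """Pull the first identity and a few file-level facts out of the report."""
    root = ET.fromstring(xml_text)
    identity = root.find("identification/identity")
    return {
        "format": identity.get("format"),
        "mimetype": identity.get("mimetype"),
        "identified_by": [t.get("toolname") for t in identity.findall("tool")],
        "size": int(root.findtext("fileinfo/size")),
        "well_formed": root.findtext("filestatus/well-formed") == "true",
        "valid": root.findtext("filestatus/valid") == "true",
    }

print(summarize(SAMPLE))
```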
Configuration
  • All settings are in the fits.xml config file
  • Enable/disable tools (available in the API too)
  • Prevent tools from processing files with specific file extensions
  • Set tool priority
  • Add new tools
  • Use your own consolidator code
  • Report or ignore conflicts
  • Options to display original tool output
Sample Configuration File
<fits_configuration>
  <!-- Order of the tools determines preference -->
  <tools>
    <!-- exclude-exts attribute is a comma delimited list of file extensions that the tool should not try to process -->
    <tool class="edu.harvard.hul.ois.fits.tools.jhove.Jhove" exclude-exts="dng,mbx"/>
    <tool class="edu.harvard.hul.ois.fits.tools.fileutility.FileUtility" exclude-exts="dng,wps"/>
    <tool class="edu.harvard.hul.ois.fits.tools.exiftool.Exiftool" exclude-exts="txt,wps,vsd"/>
    <tool class="edu.harvard.hul.ois.fits.tools.droid.Droid" exclude-exts="dng"/>
    <tool class="edu.harvard.hul.ois.fits.tools.nlnz.MetadataExtractor" exclude-exts="dng,zip,odb,ott,odg,otg,odp,otp,ods,ots,odc,otc,odi,oti,odf,otf,odm,oth"/>
    <tool class="edu.harvard.hul.ois.fits.tools.oisfileinfo.FileInfo"/>
    <tool class="edu.harvard.hul.ois.fits.tools.oisfileinfo.XmlMetadata"/>
    <tool class="edu.harvard.hul.ois.fits.tools.ffident.FFIdent" exclude-exts="dng,wps,vsd"/>
  </tools>
  <output>
    <dataConsolidator class="edu.harvard.hul.ois.fits.consolidation.OISConsolidator"/>
    <display-tool-output>true</display-tool-output>
    <report-conflicts>true</report-conflicts>
    <validate-tool-output>false</validate-tool-output>
    <internal-output-schema>xml/fits_output.xsd</internal-output-schema>
    <external-output-schema>http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd</external-output-schema>
    <fits-xml-namespace>http://hul.harvard.edu/ois/xml/ns/fits/fits_output</fits-xml-namespace>
  </output>
  <!-- file name of the droid signature file to use in tools/droid/-->
  <droid_sigfile>DROID_SignatureFile_V35.xml</droid_sigfile>
</fits_configuration>
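To make the effect of tool order and `exclude-exts` concrete, here is a small sketch that reads a shortened copy of the configuration and works out which tools would be asked to process a given file. `load_tools` and `tools_for` are hypothetical helpers written for this illustration, not part of FITS, and the matching rule (case-insensitive comparison on the last extension) is an assumption.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample configuration (three of the eight tools).
CONFIG = """<fits_configuration>
  <tools>
    <tool class="edu.harvard.hul.ois.fits.tools.jhove.Jhove" exclude-exts="dng,mbx"/>
    <tool class="edu.harvard.hul.ois.fits.tools.exiftool.Exiftool" exclude-exts="txt,wps,vsd"/>
    <tool class="edu.harvard.hul.ois.fits.tools.oisfileinfo.FileInfo"/>
  </tools>
</fits_configuration>"""

def load_tools(config_text):
    """Return [(class name, excluded extensions)] in document (preference) order."""
    root = ET.fromstring(config_text)
    tools = []
    for tool in root.findall("tools/tool"):
        raw = tool.get("exclude-exts", "")
        excluded = {ext.strip().lower() for ext in raw.split(",") if ext.strip()}
        tools.append((tool.get("class"), excluded))
    return tools

def tools_for(filename, tools):
    """List the tool classes whose exclude list does not cover this file."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return [name for name, excluded in tools if ext not in excluded]

tools = load_tools(CONFIG)
print(tools_for("notes.txt", tools))
print(tools_for("raw.DNG", tools))
```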

Some Limitations...
  • Speed
  • Technical metadata is only returned if the tool that reported it also appears in the first <identity> block
  • FITS treats an identification as the combination of format name and MIME type, so tools must agree on both to count as the same identity
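The last two limitations interact, which a short sketch can show. It groups tool reports by the (format name, MIME type) pair and then lists the tools in the first group, which per the limitation above are the only ones whose technical metadata would be returned. The function names and the report values are invented for illustration and are not actual tool output.

```python
def group_identities(reports):
    """Group (toolname, format, mimetype) reports, given in priority order,
    by the (format, mimetype) pair. Returns [((format, mimetype), [toolnames])]."""
    groups = {}
    for toolname, fmt, mimetype in reports:
        groups.setdefault((fmt, mimetype), []).append(toolname)
    return list(groups.items())  # dicts keep insertion order in Python 3.7+

def metadata_tools(reports):
    """Tools in the first identity group."""
    identities = group_identities(reports)
    return identities[0][1] if identities else []

# Invented reports: the second tool uses a different format name for the same file.
reports = [
    ("Jhove", "Graphics Interchange Format", "image/gif"),
    ("Exiftool", "GIF", "image/gif"),
    ("DROID", "Graphics Interchange Format", "image/gif"),
]
print(group_identities(reports))
print(metadata_tools(reports))
```

In this made-up case the second tool lands in its own identity group because its format name differs, so any metadata it extracted would be left out even though all three tools agree on the MIME type.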
Future Plans
  • More tools
    • Apache Tika (text document formats)
    • Jhove 2
    • Aduna Aperture (text, documents, email formats)
    • Mediainfo (audio and video formats)
  • Better audio and video format support as we add object support for them to DRS2
Wrap Up
  • http://fits.googlecode.com
  • http://ots-schemas.googlecode.com
    • Java library for reading and writing METS (limited support), MODS, PREMIS, MIX, TextMD, DocumentMD, and soon AES audio metadata
  • More information on DRS2: http://hul.harvard.edu/ois/systems/drs/enhancements.html