Usability issues facing 21st century data archives


Usability Issues Facing 21st Century Data Archives. Joey Mukherjee and David Winningham.


Presentation Transcript


Usability Issues Facing 21st Century Data Archives

Joey Mukherjee and David Winningham


Current Archiving Goal

[Diagram. Boxes: Mission, Team, Archive, Future Scientists, Write Papers. Labels: Raw Data, Processed Data, Data Iteration, Quality Data (twice).]


Current Archiving Reality

[Diagram. Boxes: Mission, Team, Home Institution Archive, Permanent Archive, Future Scientists, Write Papers. Labels: Raw Data, Processed Data, Data Iteration, Public Data, Data Subsets, Unchecked Data.]


New Goal

[Diagram. Boxes: Mission, Team, Archive, Future Scientists, Write Papers. Labels: Raw Data, Processed Data (three times), Data Iteration.]


Standardizing HOWTO

  • Make it easy

  • Make it useful

  • Make it extensible


Make it Easy

  • Reading / writing files must be super easy (i.e. cheap!)

    • Either with tools or libraries

  • Tools can be command line or GUI
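
As a minimal sketch of what "super easy" could look like from a library user's point of view, the following Python uses only the standard library to write and read a small time series. The JSON layout and the names `write_series`, `read_series`, and `demo_round_trip` are assumptions made for illustration; the presentation does not specify them.

```python
import json
import os
import tempfile


def write_series(path, name, units, times, values):
    """Write one time series to a JSON file (hypothetical layout)."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    record = {
        "name": name,
        "units": units,
        "times": list(times),
        "values": list(values),
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f)


def read_series(path):
    """Read a time series previously written by write_series."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def demo_round_trip():
    """Write a tiny series to a temporary file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.json")
        write_series(path, "electron_flux", "counts/s", [0.0, 1.0, 2.0], [10, 12, 9])
        return read_series(path)
```

The point of the sketch is the size of the interface: one call to write, one call to read, and nothing else to learn before getting data in or out.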


Make it Useful

  • How do I look at it?

    • Plots/Analysis

  • What else can I do with it?

    • Read into IDL, Matlab, Excel, etc.

  • Must have immediate benefits
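
One inexpensive way to reach tools such as Excel, IDL, or Matlab is a plain CSV export, since all three can import CSV. The sketch below is illustrative only; the function name `series_to_csv` and the column layout are assumptions, not part of the presentation.

```python
import csv
import io


def series_to_csv(times, columns):
    """Return CSV text with a time column plus one column per named series.

    `columns` maps a column name to a list of values, one per time step.
    """
    for name, values in columns.items():
        if len(values) != len(times):
            raise ValueError(f"column {name!r} does not match the time axis")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # Header row: the time axis first, then every series in insertion order.
    writer.writerow(["time"] + list(columns))
    for i, t in enumerate(times):
        writer.writerow([t] + [columns[name][i] for name in columns])
    return buf.getvalue()
```

Calling `series_to_csv([0, 1], {"flux": [10, 12]})` produces text that can be saved with a `.csv` extension and opened directly in a spreadsheet.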


Make it Extensible

  • Must be possible for others to add value-added services

  • Must be able to hold varieties of data

  • Must agree to give up control over content


Case Studies: HTML

  • Easy to create!

  • Once done, look at it in a browser

  • Embrace / Extend


Case Studies: SPASE

  • Creation is slow and difficult

  • Once created, no real benefits yet

  • VxOs have embraced it; no one has extended it yet


Case Studies: IDFS

  • Until recently, difficult to create, complex

  • Once in, easy to look at, use, archive, etc.

  • Somewhat extensible


Things right with IDFS

  • Efficient

  • Self-documenting

  • Calibrations stored in a text file

  • Science units derived instead of stored

  • Little to no reprocessing ever needed
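
The idea of storing raw values plus a text calibration, and deriving science units on read, can be sketched as follows. This is not the real IDFS calibration format or API: the text layout (`<sensor> <c0> <c1> ...`), the polynomial form, and the function names are all hypothetical and chosen only to illustrate the principle.

```python
def parse_calibration(text):
    """Parse lines of the form '<sensor> <c0> <c1> ...' into polynomial coefficients.

    Blank lines and lines starting with '#' are ignored. This text layout is a
    hypothetical example, not the real IDFS calibration format.
    """
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sensor, *coeffs = line.split()
        table[sensor] = [float(c) for c in coeffs]
    return table


def to_science_units(raw, coeffs):
    """Evaluate c0 + c1*raw + c2*raw**2 + ... using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * raw + c
    return result


CAL_TEXT = """
# sensor  c0    c1
temp      -40.0 0.5
"""

cal = parse_calibration(CAL_TEXT)
# Science units are computed on demand; only the raw counts are stored.
celsius = [to_science_units(r, cal["temp"]) for r in (0, 80, 200)]
```

Under this arrangement, correcting a calibration means editing the text file; the stored raw data stays untouched, which is one plausible reading of why little to no reprocessing is needed.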


Other IDFS Benefits

  • Can store most types of space physics data from raw telemetry to highly processed science units

  • Reversible from science units to raw telemetry

  • Usable by data processor, scientist, and data archiver
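
Reversibility can be illustrated with a simple round trip. The sketch assumes a linear calibration with a nonzero gain and integer raw counts; these are assumptions for the example, and calibrations that are not invertible would not round-trip this way. The function names are hypothetical.

```python
def raw_to_science(raw, offset, gain):
    """Convert an integer raw count to science units with a linear calibration."""
    return offset + gain * raw


def science_to_raw(value, offset, gain):
    """Invert the linear calibration to recover the integer raw count."""
    if gain == 0:
        raise ValueError("a zero gain cannot be inverted")
    # Rounding absorbs small floating-point error in the inverse.
    return round((value - offset) / gain)


def round_trips(raw, offset, gain):
    """Return True if raw -> science -> raw recovers the original count."""
    return science_to_raw(raw_to_science(raw, offset, gain), offset, gain) == raw
```

A check such as `all(round_trips(r, -40.0, 0.5) for r in range(256))` would confirm that every 8-bit count survives the conversion in both directions.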


Things wrong with IDFS

  • Overly complex format and API

  • Not enough support in other tools - poor buy-in

  • Analysis routines merged with the file format - tried to do too much!


Implementation Plan

  • Develop a simple file format that can contain any and all types of time series space physics data

  • Develop tools that allow someone to create and inspect files in this format

  • Merge in the best parts of IDFS, CDF, netCDF, HDF, FITS, etc., without breaking the paradigm of simplicity


Simple File Format

  • Format might already exist:

    • HDF5

    • XML

    • JSON

    • Other data models?
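
To make the comparison between candidate encodings concrete, the sketch below serializes one small time series as JSON and as XML using only the Python standard library. The record layout and the element and attribute names (`series`, `sample`, `time`, `value`) are assumptions for illustration. HDF5 is left out because it needs a third-party library.

```python
import json
import xml.etree.ElementTree as ET

record = {
    "name": "electron_flux",
    "units": "counts/s",
    "times": [0.0, 1.0, 2.0],
    "values": [10, 12, 9],
}


def to_json(rec):
    """Serialize the record as a JSON string."""
    return json.dumps(rec, sort_keys=True)


def to_xml(rec):
    """Serialize the record as XML: one <sample> element per time step."""
    root = ET.Element("series", name=rec["name"], units=rec["units"])
    for t, v in zip(rec["times"], rec["values"]):
        ET.SubElement(root, "sample", time=str(t), value=str(v))
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Rebuild the record from the XML produced by to_xml."""
    root = ET.fromstring(text)
    samples = root.findall("sample")
    return {
        "name": root.get("name"),
        "units": root.get("units"),
        "times": [float(s.get("time")) for s in samples],
        "values": [int(s.get("value")) for s in samples],
    }
```

Both encodings carry the same information here; the practical differences are in verbosity, typing (XML attributes are strings and need explicit conversion), and tool support.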


Making it useful

  • Get buy-in from visualization tools (SDDAS, DataShop, VisBard, IDL DLM, etc.)

  • Get buy-in from archive sites (PDS, PSA, NSSDC, etc.)

  • Seed money is essential


Advantages

  • Providers

  • Users

  • Management


Advantages: Providers

  • Instrument teams now have something to work toward

  • Can develop expertise


Advantages: Users

  • Quick ways to create plots or access data

  • Expertise again!


Advantages: Management

  • Homogeneous archives are far easier to manage and maintain

  • Value-added services are a natural extension of quality archives


Conclusion

  • Why now? Because SPASE is gaining traction, this is the next logical step.

  • This will save money for everyone in the long run.

  • Everyone benefits from value-added services.

