Afternoon session: The archival problem and infrastructure for solutions



Prof John R Helliwell. Interactive Publications and the Record of Science, ICSTI Winter Workshop, Paris, Monday, February 8, 2010.


JRH research, publications background

  • Professor of Structural Chemistry

  • DSc Physics

  • Approx 200 research papers; 5 books (2 as monographs)

  • Editor-in-Chief of journals published by IUCr 1996-2005 (Acta Crystallographica, Journal of Applied Crystallography, Journal of Synchrotron Radiation)

  • IUCr Representative to ICSTI


What needs to be in place for interactive content to be available in the future?

  • Emulation of legacy software environments?

  • How to package, identify and interlink the independent components of a complex article?

  • Can we handle distributed articles?

  • Can we identify and retrieve slices through large archived data sets?

  • How to work with changing data sets?

  • What is worth keeping anyway?


The importance of data for publication

  • Interactive figures depend on data

  • Semantic value is added to data, or forms additional (meta)data

  • Fundamental principle of research publication: the work is reproducible

    • exact experimental conditions are given

    • data are preserved/accessible

    • in a recent case involving animal clones, ‘samples’ also had to be made available upon request

  • Increasing requirement to archive primary data


Data and publication in crystallography

  • A reasonable state of affairs ...

    • molecular models archived by journals (CIFs: interactive figures)

    • reduced diffraction data preserved by databases or some journals (data validation; retracted papers)

  • ... but with room for improvement

    • molecular dynamics for the crystalline state difficult to interpret; whole diffraction images preferable for archiving

    • scientific fraud in structural biology/chemistry: archiving of diffraction images provides better security against such frauds

    • but diffraction images from crystal diffraction experiments are uncompressed and file sizes are large; thus limited appetite (and resources) to preserve them


Crystals, diffraction spots and smears, molecules and dynamics


Some archive technical details

  • Protein Data Bank: 60,000 macromolecular structures

    • 80% derived from crystal structure analysis

    • archive doubling in size every 2 to 3 years

    • coordinate file for a typical protein ~0.25 MB; derived from core diffraction data of ~1 MB; extracted from ~1 GB of diffraction image data

    • data sets need to be archived in quintuplicate (EBI Director to JRH Jan 12 2010)

    • thus 60,000 × 1 GB × 5 = 300 terabytes of primary data for the PDB currently

    • cost estimate for PDB to be the sole primary archive provider ca GBP 200,000 per annum: unable to take on this responsibility on

  • Currently researcher agrees to hold project diffraction images for at least 5 years and release them upon request; no archiving commitment from research sponsor

  • Solution in distributed or federated archives (experimental facilities / laboratories / data repositories)?
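
The storage arithmetic on this slide can be checked with a few lines of Python. This is only a back-of-envelope sketch: the function names are made up for illustration, decimal units (1 TB = 1000 GB) are assumed because the slide does not specify them, and the projection assumes that raw image volume grows at the same rate as the archive (doubling every 2 to 3 years) with a constant ~1 GB of images per structure. None of this is a figure from the PDB itself.

```python
# Back-of-envelope check of the slide's storage estimate.
# Assumption: decimal units, 1 TB = 1000 GB (the slide does not say).
GB_PER_TB = 1000


def primary_data_tb(n_structures: int, gb_per_structure: float, copies: int) -> float:
    """Total diffraction-image storage in terabytes, counting every copy."""
    return n_structures * gb_per_structure * copies / GB_PER_TB


def projected_tb(current_tb: float, years: float, doubling_years: float) -> float:
    """Storage after `years`, assuming it doubles every `doubling_years`.

    Hypothetical projection: it treats image volume as growing in step
    with the number of archived structures.
    """
    return current_tb * 2 ** (years / doubling_years)


if __name__ == "__main__":
    # Slide figures: 60,000 structures, ~1 GB of images each, 5 copies.
    now = primary_data_tb(60_000, 1.0, 5)
    print(f"current estimate: {now:.0f} TB")
    for doubling in (2, 3):
        later = projected_tb(now, 6, doubling)
        print(f"after 6 years, doubling every {doubling} years: {later:.0f} TB")
```

With the slide's figures the first function reproduces the 300 TB estimate; the projection simply shows how quickly that number would grow if the stated doubling rate held.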

