
Afternoon session: The archival problem and infrastructure for solutions



Afternoon session: The archival problem and infrastructure for solutions. Prof John R Helliwell. Interactive Publications and the Record of Science, ICSTI Winter Workshop, Paris, Monday, February 8, 2010.


Afternoon session: The archival problem and infrastructure for solutions

Prof John R Helliwell

Interactive Publications and the Record of Science

ICSTI Winter Workshop

Paris, Monday, February 8, 2010


JRH research, publications background

  • Professor of Structural Chemistry

  • DSc Physics

  • Approx 200 research papers; 5 books (2 as monographs)

  • Editor-in-Chief of journals published by IUCr 1996-2005 (Acta Crystallographica, Journal of Applied Crystallography, Journal of Synchrotron Radiation)

  • IUCr Representative to ICSTI


What needs to be in place for interactive content to be available in the future?

  • Emulation of legacy software environments?

  • How to package, identify and interlink the independent components of a complex article?

  • Can we handle distributed articles?

  • Can we identify and retrieve slices through large archived data sets?

  • How to work with changing data sets?

  • What is worth keeping anyway?


The importance of data for publication

  • Interactive figures depend on data

  • Semantic value is added to data, or forms additional (meta)data

  • Fundamental principle of research publication: the work is reproducible

    • exact experimental conditions are given

    • data are preserved/accessible

    • in a recent case of animal clones, ‘samples’ also had to be made available upon request

  • Increasing requirement to archive primary data


Data and publication in crystallography

  • A reasonable state of affairs ...

    • molecular models archived by journals (CIFs: interactive figures)

    • reduced diffraction data preserved by databases or some journals (data validation; retracted papers)

  • ... but with room for improvement

    • molecular dynamics for the crystalline state difficult to interpret; whole diffraction images preferable for archiving

    • scientific fraud in structural biology/chemistry: archiving of diffraction images provides better security against such frauds

    • but diffraction images from crystal diffraction experiments are uncompressed and the file sizes are large; thus there is limited appetite (and resources) to preserve them



Some archive technical details

  • Protein Data Bank: 60,000 macromolecular structures

    • 80% derived from crystal structure analysis

    • archive doubling in size every 2 to 3 years

    • coordinate file for a typical protein ~0.25 MB; derived from core diffraction data of ~1 MB; extracted from ~1 GB of diffraction image data

    • data sets need to be archived in quintuplicate (EBI Director to JRH Jan 12 2010)

    • thus 60,000 × 1 GB × 5 = 300 terabytes of primary data for the PDB currently

    • cost estimate for PDB to be the sole primary archive provider ca GBP 200,000 per annum: unable to take on this responsibility on

  • Currently the researcher agrees to hold a project's diffraction images for at least 5 years and to release them upon request; there is no archiving commitment from the research sponsor

  • Solution in distributed or federated archives (experimental facilities / laboratories / data repositories)?
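
The storage estimate above can be reproduced, and extended under the stated doubling time, with a minimal sketch. The function names are hypothetical, 1 TB is taken as 1000 GB, and the figure of ~1 GB of diffraction images per structure is applied to every entry, as on the slide, although only about 80% of entries come from crystal structure analysis. The projection assumes the per-structure image volume stays constant, which may well not hold as detectors change.

```python
def primary_archive_terabytes(n_structures, gb_per_dataset=1.0, copies=5):
    """Total primary-data volume in terabytes (assumes 1 TB = 1000 GB)."""
    return n_structures * gb_per_dataset * copies / 1000.0


def projected_terabytes(current_tb, years, doubling_time_years):
    """Volume after `years` if the archive doubles every `doubling_time_years`."""
    return current_tb * 2 ** (years / doubling_time_years)


# Slide figures: 60,000 structures, ~1 GB of images each, five copies.
current = primary_archive_terabytes(60_000)
print(f"current estimate: {current:.0f} TB")

# Assumed horizon of 6 years, for both ends of the 2-3 year doubling range.
for doubling in (2.0, 3.0):
    future = projected_terabytes(current, years=6, doubling_time_years=doubling)
    print(f"doubling every {doubling:.0f} years: {future:.0f} TB after 6 years")
```

On these assumptions the spread between a 2-year and a 3-year doubling time is a factor of two after only six years, which is one reason the cost of a sole primary archive is hard to fix in advance.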

