
Publishing Data

Catherine Jones, Library Systems Development Manager, STFC Rutherford Appleton Laboratory. CLADDIER workshop, Chilworth, Southampton, UK, 15th May 2007.


Presentation Transcript


  1. Publishing Data • Catherine Jones • Library Systems Development Manager, STFC Rutherford Appleton Laboratory • CLADDIER workshop, Chilworth, Southampton, UK 15th May 2007

  2. Contents • Set the scene • Definition of publication • Complexities • Making data permanently available • Quality control • User requirements • Issues

  3. Microsoft’s Science 2020 Report • Modern scientific communication relies on both journals and databases. At present these are not integrated. • By 2020 mutual linking will be commonplace and publications just containing peer-reviewed data will become available. • http://research.microsoft.com/towards2020science/downloads.htm

  4. Publication concept • In this context “publication” is defined as the process through which data is fixed and made retrievable over the long term, and may imply that there has been some quality control process.

  5. Complexities of Data • These all show the same data at different levels of processing.

  6. Making data permanently available • Three areas: • Defining what is to be kept: encapsulation • Ensuring that it is described effectively: metadata • Identifying who is responsible for the data management: trusted repository

  7. Encapsulation • A method of identifying a fixed collection of meaningful data so that it can be preserved as a clearly defined unchanging entity. • Datasets which are still growing • Versions of datasets • Format translations
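One way to picture encapsulation is as a frozen manifest: a fixed list of files plus a version label, reduced to a single digest that changes whenever any content changes. The sketch below is only an illustration under stated assumptions. The function name `encapsulate`, the manifest layout, and the choice of SHA-256 are hypothetical and do not come from the talk.

```python
import hashlib
import json
from typing import Dict


def encapsulate(files: Dict[str, bytes], version: str) -> dict:
    """Freeze a named collection of data files into a manifest (hypothetical layout)."""
    # One fixity digest per file, in name order so the manifest is reproducible.
    entries = {
        name: hashlib.sha256(content).hexdigest()
        for name, content in sorted(files.items())
    }
    # Digest of the canonical manifest identifies this exact snapshot.
    canonical = json.dumps(
        {"version": version, "files": entries}, sort_keys=True
    ).encode("utf-8")
    return {
        "version": version,
        "files": entries,
        "snapshot_id": hashlib.sha256(canonical).hexdigest(),
    }


# Illustrative data only: two tiny "files" and an assumed version label.
snapshot = encapsulate({"a.dat": b"1", "b.dat": b"2"}, "v3")
print(snapshot["snapshot_id"])
```

Under this sketch, a dataset that is still growing would yield a new snapshot each time it is frozen, and a format translation would change the file digests, so each of the three cases listed above produces a distinct, citable entity.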

  8. Metadata • Needs to be created to ensure that the data is usable now and over the long term. • Semantic encapsulation is important as this is likely to be used in citation.

  9. Trusted repository • To ensure that the data is available over the long term, the Data Centre needs to be on a secure footing and well managed.

  10. Quality Control • Usability of the dataset. This is one of the roles of the Data Centres. • Usefulness of the dataset. This is the role of domain experts.

  11. User requirements for citation • Need for an unambiguous reference to a well-defined permanent entity • This reference/citation needs to be understandable for humans • Author and publication year, or equivalents, are important • In this area, an unambiguous data reference includes the activity or tool which produced the data • Source of the data (i.e. the repository) may be as important as the producer and needs to be unambiguous

  12. Requirements from data producers • Traceable to the data provider/producer • Usable for usage metrics • To be recognised as intellectually equivalent to academic papers • Able to be used to search for papers citing data

  13. Citation format • Template: Author, title, [medium], publisher, publication date, identifier, feature, [access date, available at] • Example: Natural Environment Research Council, Mesosphere-Stratosphere-Troposphere Radar Facility at Aberystwyth, [Internet]. British Atmospheric Data Centre (BADC), 1990- urn badc.nerc.ac.uk/data/mst/v3/upd15032006, feature 200409031205 [http://featuretype.registry/VerticalProfile] [cited 2006 Apr 25. Available from http://badc.nerc.ac.uk/data/mst.]
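The field order on this slide can be sketched as a small record type that assembles a citation string. This is a minimal illustration, not the CLADDIER project's implementation. The class name `DataCitation`, the method `format`, and the punctuation rules are assumptions, and the feature-type URL from the slide's example is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical record holding the citation fields listed on the slide."""
    author: str
    title: str
    publisher: str
    publication_date: str
    identifier: str
    feature: Optional[str] = None
    medium: Optional[str] = None
    access_date: Optional[str] = None
    available_at: Optional[str] = None

    def format(self) -> str:
        # Mandatory fields in the slide's order; bracketed fields are optional.
        parts = [self.author, self.title]
        if self.medium:
            parts.append(f"[{self.medium}]")
        parts += [self.publisher, self.publication_date, self.identifier]
        if self.feature:
            parts.append(f"feature {self.feature}")
        text = ", ".join(parts)
        # Access details are appended only when both are known.
        if self.access_date and self.available_at:
            text += f" [cited {self.access_date}. Available from {self.available_at}]"
        return text


# Values taken from the example citation on the slide.
example = DataCitation(
    author="Natural Environment Research Council",
    title="Mesosphere-Stratosphere-Troposphere Radar Facility at Aberystwyth",
    medium="Internet",
    publisher="British Atmospheric Data Centre (BADC)",
    publication_date="1990-",
    identifier="urn badc.nerc.ac.uk/data/mst/v3/upd15032006",
    feature="200409031205",
    access_date="2006 Apr 25",
    available_at="http://badc.nerc.ac.uk/data/mst",
)
print(example.format())
```

Keeping the fields separate rather than storing a single string means the same record could also feed usage metrics or a search for papers citing the data, which the data producers' requirements call for.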

  14. Issues for consideration • The ability to cite data is strongly linked to the definition of the data. • Dynamic datasets pose additional issues for long-term accessibility. • Versioning of the data and of the processing/analysing software are both big issues to resolve. • Peer review of the data is important. • Identifying datasets is a complex decision where a facility may provide data from a set of instruments.
