
Data Publication at the British Atmospheric Data Centre

This publication discusses the challenges and potential solutions for data publication and citation in atmospheric science, with a focus on the British Atmospheric Data Centre (BADC). Topics covered include the role of BADC, data sets available, the CLADDIER project, and the need for improved data citation practices. The conclusion highlights the importance of formalizing data packaging and implementing an external data review process.


Presentation Transcript


  1. Data Publication at the British Atmospheric Data Centre CLADDIER S J Pepler, B Lawrence, P Simpson, J Hey, C Jones

  2. Overview • Work Context • BADC • CLADDIER • Citing Data sets • The encapsulation problem. • The publisher problem. • Conclusions DCC 2006

  3. What is the BADC • NERC’s designated data centre for atmospheric science. • “The role of the British Atmospheric Data Centre (BADC) is to assist UK atmospheric researchers to locate, access and interpret atmospheric data and to ensure the long-term integrity of atmospheric data produced by Natural Environment Research Council (NERC) projects.” • Curation and Facilitation. • http://badc.nerc.ac.uk/ • Part of NCAS

  4. Data Sets “A collection of files with a common theme and administration” • Ground-based observation networks: Met Office surface stations • Model output: NWP, ECMWF reanalyses & climate models • Satellite data: TOMS, Envisat & MSG • NERC programmes data: UTLS, CWVC & URGENT

  5. MST radar data • One dataset • 444 GB, 322,000 files • Lots of docs • Multiple formats • Multiple versions • Multiple products • More data every hour

  6. CLADDIER • Citation, Location, And Deposition In Discipline & Institutional Repositories • Aims: to provide discovery and citation of data and documents between repositories. • Provide inter-repository communication of citation information.


  8. How do scientists want to cite data? • We asked a range of scientists and data providers: what do you want to cite? • Unambiguous and persistent. Identifiers good. • Readable. Just Identifiers bad. • Should look like a paper reference. Author, publication date, etc. • Broad scale to avoid reference bloat, but… • … refer to subsets of data by product type, version, or other specific semantics. • Probably put specifics in the text of the article. • Dataset should be defined by instruments, activities of observation platforms.

  9. Citation of BADC data set • Natural Environment Research Council, Mesosphere-Stratosphere-Troposphere Radar Facility [Thomas, L.; Vaughan, G.] Mesosphere-Stratosphere-Troposphere Radar Facility at Aberystwyth, [Internet]. Version 2, Cartesian products. British Atmospheric Data Centre (BADC), 1990- [cited 2006 Apr 25]. Available from http://badc.nerc.ac.uk/data/mst. • Slide annotations: Edition could also be “Mesosphere, Spectra Widths”. Publisher?
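A minimal Python sketch of how a citation like the one on this slide could be assembled from separate fields. The `DatasetCitation` class and its field names are hypothetical and not part of any BADC or CLADDIER tool; only the field values are taken from the slide, and the punctuation is one possible rendering.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCitation:
    """Hypothetical container for the parts of a data set citation."""
    corporate_author: str
    facility: str
    investigators: List[str]
    title: str
    edition: str       # the part the slides flag as a problem ("Problem 1: Edition")
    publisher: str     # the part the slides flag as "Publisher?"
    start_year: str
    cited: str
    url: str

    def format(self) -> str:
        """Join the fields into a single paper-style reference string."""
        people = "; ".join(self.investigators)
        return (
            f"{self.corporate_author}, {self.facility} [{people}]. "
            f"{self.title} [Internet]. {self.edition}. "
            f"{self.publisher}, {self.start_year}- [cited {self.cited}]. "
            f"Available from {self.url}."
        )


# Field values copied from the slide.
mst = DatasetCitation(
    corporate_author="Natural Environment Research Council",
    facility="Mesosphere-Stratosphere-Troposphere Radar Facility",
    investigators=["Thomas, L.", "Vaughan, G."],
    title="Mesosphere-Stratosphere-Troposphere Radar Facility at Aberystwyth",
    edition="Version 2, Cartesian products",
    publisher="British Atmospheric Data Centre (BADC)",
    start_year="1990",
    cited="2006 Apr 25",
    url="http://badc.nerc.ac.uk/data/mst",
)
print(mst.format())
```

Keeping the edition and publisher as explicit fields makes the two open questions from the slide visible: both are free text here, so nothing stops an author from inventing them.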

  10. Problem 1: Edition • Is it OK for the author to make up the edition? • No, otherwise the data referred to is not clearly defined. • What we need is a way of referencing the semantics. • The NERC DataGrid is already developing the Climate Science Modelling Language (CSML) to do this. Their aim is data manipulation.

  11. CSML Feature types • Defined on the basis of geometric and topological structure

  12. Climate Science Modelling Language • CSML feature types • Examples: ProfileSeriesFeature, ProfileFeature, GridFeature • Collections of features are allowed.

  13. Problem 2: Publisher • The publisher makes the items available and performs quality control measures, most notably peer review. • Can we do peer review of our data sets? • Option 1: The BADC becomes a proper publishing organisation and organises peer review. • This is a lot of work; who is going to pay? • Option 2: An existing publisher does the peer review for us using existing processes. • This is a lot of work; who is going to pay?

  14. Using the RMetS for Peer-review • Using the RMetS would accelerate acceptance of datasets as peers of papers. • Needs to fit in with current practices. • Needs to fit in with current software tools for managing the peer review process. • Needs a sustainable business model. • Overlay journal?

  15. Citation in Data Journal • Natural Environment Research Council, Mesosphere-Stratosphere-Troposphere Radar Facility [Thomas, L.; Vaughan, G.]. Mesosphere-Stratosphere-Troposphere Radar Facility at Aberystwyth, [Internet]. Version 2, Cartesian products. RMetS Data Publications, 1990- [cited date]. Available from http://badc.nerc.ac.uk/data/mst/v2cart200602.xml. [doi:10233/23498234]

  16. Conclusions • A formalised packaging of data needs to be put in place to clarify the boundaries of these multi-object datasets. This not only helps authors to reference the data, but also helps data creators to track the use of the data, and archive managers as data is stored, reviewed and collected. • An external data review process needs to be put in place to elevate the status of data sets. Using an existing publisher to coordinate the reviews may accelerate the acceptance of data publication by authors.


  18. Climate Science Modelling Language • Provides semantic abstraction layer • Provides ‘wrapper’ architecture for legacy data files • Composite design pattern for aggregation • Diagram labels: <CSML>, (SAX) demarshalling, instantiateNetCDF(DatasetID, FeatureID)
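The two ideas named on this slide, SAX demarshalling and a composite pattern for aggregation, can be illustrated with a short Python sketch. This is an assumption-laden toy: the XML vocabulary (`collection`, `feature` and their attributes), the class names and the file names are all hypothetical and are not the real CSML schema or the NERC DataGrid implementation. Only the feature type names come from the slides.

```python
import xml.sax
from typing import List


class Feature:
    """Leaf node: one feature that wraps a single legacy data file (hypothetical)."""

    def __init__(self, feature_id: str, kind: str, file: str):
        self.feature_id, self.kind, self.file = feature_id, kind, file

    def files(self) -> List[str]:
        return [self.file]


class FeatureCollection:
    """Composite node: holds features or nested collections behind the same interface."""

    def __init__(self, collection_id: str):
        self.collection_id = collection_id
        self.members = []

    def add(self, member) -> None:
        self.members.append(member)

    def files(self) -> List[str]:
        # Aggregate by delegating to each member, leaf or composite alike.
        out: List[str] = []
        for member in self.members:
            out.extend(member.files())
        return out


class FeatureHandler(xml.sax.ContentHandler):
    """SAX handler that builds the composite tree as elements stream past."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open collections, innermost last
        self.root = None

    def startElement(self, name, attrs):
        if name == "collection":
            node = FeatureCollection(attrs["id"])
            if self.stack:
                self.stack[-1].add(node)
            else:
                self.root = node
            self.stack.append(node)
        elif name == "feature":
            self.stack[-1].add(Feature(attrs["id"], attrs["type"], attrs["file"]))

    def endElement(self, name):
        if name == "collection":
            self.stack.pop()


# Hypothetical document; not valid CSML.
DOC = b"""<collection id="mst">
  <feature id="p1" type="ProfileSeriesFeature" file="mst_2006.nc"/>
  <collection id="model">
    <feature id="g1" type="GridFeature" file="era.nc"/>
  </collection>
</collection>"""

handler = FeatureHandler()
xml.sax.parseString(DOC, handler)
print(handler.root.files())
```

Because leaves and collections share the `files()` method, a caller can ask any node for its underlying files without knowing how deeply it is nested, which is the property that makes a collection of features citable as one unit.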
