Data quality control, Data formats and preservation, Versioning and authenticity, Data storage

benjamin

Presentation Transcript


  1. Data quality control, Data formats and preservation, Versioning and authenticity, Data storage. Managing research data well workshop: London, 30 June 2009; Manchester, 1 July 2009

  2. Good data management • good research • high quality data • needs to be planned • specific for purpose • data can be understood and used now and in future • data can then be shared and re-used

  3. Can you understand / use these data?

  4. Quality control Data quality control at various stages: • data collection • e.g. instrument calibration; expert opinion; multiple measurements; computer assisted interviews • data entry, digitisation, transcription and coding - standardised and consistent procedures • e.g. set up validation rules for data entry; use input masks; detailed variable labelling; missing value coding; use controlled vocabularies or choice lists; best structure to organise data and data files • data checking and verifying - automated and/or manual • e.g. double entry; check for out-of-range values; apply random sample validation; statistical analyses (descriptives, frequencies, means, range, clustering) to detect errors or find anomalous values; verify data completeness
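Two of the checks listed above, looking for out-of-range values and comparing a double entry, can be automated. The following is a minimal Python sketch, assuming each case is held as a dict of variable name to value. The missing-value codes (-9, -8), the variable names and the ranges are hypothetical examples, not rules from the workshop.

```python
# Hypothetical missing-value codes, e.g. -9 = "not answered", -8 = "don't know"
MISSING_CODES = {-9, -8}


def check_ranges(records, rules, missing_codes=MISSING_CODES):
    """Flag values outside their allowed range, and blanks with no missing-value code.

    records: list of dicts, one per case (variable name -> value)
    rules:   dict of variable name -> (minimum, maximum), inclusive
    Returns a list of (row index, variable, offending value or 'blank')."""
    problems = []
    for i, record in enumerate(records):
        for var, (low, high) in rules.items():
            value = record.get(var)
            if value is None:
                problems.append((i, var, "blank"))  # missing but not coded as missing
            elif value in missing_codes:
                continue  # a declared missing value is not an error
            elif not low <= value <= high:
                problems.append((i, var, value))
    return problems


def compare_double_entry(first, second):
    """Compare two independent keyings of the same records.

    Returns (row index, variable, first value, second value) for every
    disagreement, so each one can be checked against the original source."""
    if len(first) != len(second):
        raise ValueError("the two entries contain different numbers of records")
    disagreements = []
    for i, (a, b) in enumerate(zip(first, second)):
        for var in sorted(set(a) | set(b)):
            if a.get(var) != b.get(var):
                disagreements.append((i, var, a.get(var), b.get(var)))
    return disagreements


rules = {"age": (16, 99), "weight_kg": (30, 250)}
data = [{"age": 34, "weight_kg": 70}, {"age": 340, "weight_kg": -9}, {"age": 25}]
print(check_ranges(data, rules))  # [(1, 'age', 340), (2, 'weight_kg', 'blank')]

first = [{"id": 1, "age": 34}, {"id": 2, "age": 43}]
second = [{"id": 1, "age": 34}, {"id": 2, "age": 34}]
print(compare_double_entry(first, second))  # [(1, 'age', 43, 34)]
```

Flagged values are only candidates for errors; as the slide notes, they still need manual checking against the source.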

  5. Data formats • choice of software format for digital data: • planned data analyses • software availability • hardware used • discipline specific standards and customs • digital data software dependent • digital data endangered by obsolescence of software/hardware • best formats for long-term preservation - standard formats, interchangeable formats, open formats • e.g. tab-delimited; comma-delimited (CSV); ASCII; OpenDocument format; SPSS portable; XML
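Exporting a table to one of the open formats named above, tab-delimited text, can be sketched with Python's standard `csv` module. This is an illustration only; the function names and the UTF-8 encoding choice are assumptions, not part of the workshop material.

```python
import csv
import io


def to_tab_delimited(header, rows):
    """Serialise a header and data rows as tab-delimited text, one record per line."""
    buffer = io.StringIO()
    # csv quotes a value only if it contains a tab, a quote character or a newline
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def save_tab_delimited(path, header, rows):
    """Write the tab-delimited text to disk as UTF-8 plain text."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(to_tab_delimited(header, rows))
```

For example, `save_tab_delimited("food_survey.txt", ["id", "food"], [[1, "bread"], [2, "rice and beans"]])` (a hypothetical file name) produces a file that any text editor or statistics package can read without the original software.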

  6. Data format conversions • convert data for preservation or back-up, e.g. export, save as • beware of conversion errors: • loss of internal metadata • e.g. convert MS Access to tab-delimited tables • loss of editing, formatting, formulae • e.g. convert MS Word to RTF • truncation or loss of data • e.g. string variables lost in SPSS – STATA conversion • check for errors and changes after conversion Example 1: MS Excel to tab-delimited Example 2: Word to XML Example 3: Proprietary audio file (DVF) to WAV
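"Check for errors and changes after conversion" can be partly automated by reading the converted file back and comparing it cell by cell with the source. A minimal sketch, assuming the source table is available as a list of rows and the conversion target is delimited text; because text formats store no data types, every value is compared as a string.

```python
import csv
import io


def check_conversion(original_rows, converted_text, delimiter="\t"):
    """Compare source rows with the rows read back from a converted text file.

    Returns human-readable problems: changed row counts, changed column counts,
    and cells whose text differs (e.g. truncated strings)."""
    converted = list(csv.reader(io.StringIO(converted_text), delimiter=delimiter))
    problems = []
    if len(converted) != len(original_rows):
        problems.append(f"row count {len(original_rows)} -> {len(converted)}")
    for r, (src, dst) in enumerate(zip(original_rows, converted)):
        if len(src) != len(dst):
            problems.append(f"row {r}: column count {len(src)} -> {len(dst)}")
        for c, (a, b) in enumerate(zip(src, dst)):
            if str(a) != b:  # text formats hold strings only
                problems.append(f"row {r}, column {c}: {a!r} -> {b!r}")
    return problems


original = [["id", "comment"], [1, "very long free-text answer"]]
converted = "id\tcomment\n1\tvery long fr\n"  # a string variable truncated on export
print(check_conversion(original, converted))
```

This catches truncation and lost rows or columns; loss of internal metadata such as variable labels or formulae has to be checked separately.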

  7. [Slide images: Example 1 shown in MS Excel format and in tab-delimited text format]

  8. Version control • keep track of different copies or versions of data files • which methods: • single site vs. across locations • single vs. multiple users • different versions to be stored vs. files to be synchronised • single user of data files: • file naming – unique file names with date or version number (avoid spaces!) e.g. FoodInterview_1_draft; FoodInterview_1_final; HealthTests_06-04-2008; BGHSurveyProcedures_00_04 • version control table or file history within or alongside data file • version control facility within software, e.g. MS Windows software • multiple users of data files • same as above • control rights to file editing: read/write permissions, e.g. Windows Explorer • versioning/file sharing software: check files out/in, e.g. SVN, VSS, Google Docs, Amazon S3 • manual merging of multiple entries/edits • synchronise files, e.g. MS SyncToy software
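The file-naming advice for a single user (unique names with a date or version number, no spaces) can be applied consistently with a small helper. This is a hypothetical sketch: the `_v02` suffix style and the use of ISO 8601 dates (YYYY-MM-DD, which sort chronologically) are assumptions, whereas the slide's own example uses day-month-year.

```python
import re
from datetime import date


def versioned_name(base, extension, version=None, on=None):
    """Build a unique file name from a base name plus a version number or a date.

    Runs of spaces and other awkward characters in `base` become a single '_'.
    Give exactly one of `version` (int) or `on` (datetime.date)."""
    if (version is None) == (on is None):
        raise ValueError("give exactly one of version or on")
    safe = re.sub(r"[^A-Za-z0-9_-]+", "_", base).strip("_")
    # zero-padded numbers and ISO dates sort in the order they were created
    suffix = f"v{version:02d}" if version is not None else on.isoformat()
    return f"{safe}_{suffix}.{extension.lstrip('.')}"


def next_version(existing_names, base):
    """Return the next unused version number for `base` among existing file names."""
    pattern = re.compile(rf"^{re.escape(base)}_v(\d+)\.")
    numbers = [int(m.group(1)) for name in existing_names if (m := pattern.match(name))]
    return max(numbers, default=0) + 1


print(versioned_name("Food Interview", "csv", version=2))  # Food_Interview_v02.csv
print(versioned_name("HealthTests", "csv", on=date(2008, 4, 6)))  # HealthTests_2008-04-06.csv
```

This covers naming only; for multiple users the slide's other measures (permissions, check-out/check-in software) are still needed.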

  9. Authenticity of data • master files • assign responsibility for master files • record changes to master files
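Assigning responsibility for master files and recording changes to them can be combined in a simple change log. The sketch below is one hypothetical way to do it: an `owners` mapping names the person responsible for each master file, and every change is appended as one tab-delimited line (time, file, person, description). The file names and log layout are illustrative assumptions.

```python
import csv
import io
from datetime import datetime, timezone


def record_change(log_stream, owners, master_file, changed_by, description, when=None):
    """Append one tab-delimited line describing a change to a master file.

    owners: dict mapping each master file to the person responsible for it;
    only that person may record (and so make) changes to it."""
    owner = owners.get(master_file)
    if changed_by != owner:
        raise PermissionError(f"{changed_by} is not responsible for {master_file} (owner: {owner})")
    if when is None:
        when = datetime.now(timezone.utc)
    writer = csv.writer(log_stream, delimiter="\t", lineterminator="\n")
    writer.writerow([when.isoformat(timespec="seconds"), master_file, changed_by, description])


owners = {"survey_master.sav": "jsmith"}
log = io.StringIO()  # in practice: open("master_changes.txt", "a", encoding="utf-8", newline="")
record_change(log, owners, "survey_master.sav", "jsmith", "recoded blanks in q12 to -9",
              when=datetime(2009, 6, 30, 10, 0, tzinfo=timezone.utc))
```

Such a log records who changed what and when, but it does not by itself stop anyone editing the file; file permissions remain the actual control.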

  10. Data storage • digital storage media unreliable • file formats and physical storage media ultimately become obsolete • optical (CD, DVD) and magnetic media (hard drive, tapes) vulnerable and subject to physical degradation. Best practice: • use data formats with long-term readability • storage strategy with at least two different forms of storage • copy/migrate data files to new media between two and five years after they were first created • check the integrity of stored data files at regular intervals (checksum) • know your back-up strategy: institutional/personal; network server/PC/laptop • maintain original copy, external local copy and external remote copy • test file recovery • Data Protection Act and data back-up: may require keeping copies of personal data to a minimum; secure storage
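Checking the integrity of stored files with checksums, as recommended above, amounts to recording a checksum for each file once and recomputing it later. A minimal sketch using SHA-256 from Python's `hashlib` (the choice of algorithm is an assumption; the slide does not name one):

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(folder):
    """Map each file's path (relative to `folder`) to its checksum."""
    root = Path(folder)
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify_manifest(folder, manifest):
    """Return (file, problem) pairs for files that are missing or have changed."""
    root = Path(folder)
    problems = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file():
            problems.append((rel, "missing"))
        elif sha256_of(p) != expected:
            problems.append((rel, "checksum mismatch"))
    return problems
```

The manifest would be made when the files are first stored and kept with each copy; rerunning `verify_manifest` on every copy at regular intervals, and after migrating to new media, shows whether any file has degraded or been altered.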

  11. Example: data storage and preservation at UKDA • preservation copy (UKDA) • shadow copy (UKDA) • dissemination copy to reduce load on main system • near-site online copy (on campus) • off-site online copy • tape-based offline copy (UKDA) • multi-copy, multi-storage-media and multi-version resilience [diagram labels: scheduled, nightly, robotic, 3-monthly]

  12. Good data management practice • plan data management early • assign roles and responsibilities • design data management according to needs and purpose of research • data management throughout research

  13. Resources • ESDS (2008). Guide to good practice: micro data handling and security. http://www.esds.ac.uk/news/publications/microDataHandlingandSecurity.pdf • Finch, L. & Webster, J. (2008). Caring for CDs and DVDs. NPO Preservation Guidance. Preservation in Practice Series. London, National Preservation Office. Available at http://www.bl.uk/npo/pdf/cd.pdf • UK Data Archive (2009). Manage and Share Data. http://www.data-archive.ac.uk/sharing/ See: http://www.data-archive.ac.uk/sharing/furtherstorage.asp
