Data quality control, Data formats and preservation, Versioning and authenticity, Data storage

Managing research data well workshop

London, 30 June 2009

Manchester, 1 July 2009

Good data management
  • good research
  • high quality data
  • needs to be planned
  • specific for purpose
  • data can be understood and used now and in future
  • data can then be shared and re-used
Quality control

Data quality control at various stages:

  • data collection
    • e.g. instrument calibration; expert opinion; multiple measurements; computer assisted interviews
  • data entry, digitisation, transcription and coding - standardised and consistent procedures
    • e.g. set up validation rules for data entry; use input masks; detailed variable labelling; missing value coding; use controlled vocabularies or choice lists; best structure to organise data and data files
  • data checking and verifying - automated and/or manual
    • e.g. double entry; check for out-of-range values; apply random sample validation; statistical analyses (descriptives, frequencies, means, range, clustering) to detect errors or find anomalous values; verify data completeness
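The range check and double-entry check above could be automated along these lines. This is only a minimal sketch: the variable names, the allowed ranges and the missing-value code -9 are illustrative assumptions, not values from the workshop.

```python
# Hypothetical validation rules: variable names and ranges are illustrative only.
MISSING = -9  # assumed missing-value code

RULES = {
    "age": (0, 120),
    "height_cm": (50, 250),
}

def out_of_range(records, rules=RULES, missing=MISSING):
    """Return (row_index, variable, value) for every value outside its allowed range."""
    problems = []
    for i, rec in enumerate(records):
        for var, (low, high) in rules.items():
            value = rec.get(var, missing)
            if value == missing:
                continue  # coded as missing, so not a range error
            if not (low <= value <= high):
                problems.append((i, var, value))
    return problems

def double_entry_mismatches(first, second):
    """Compare two independent keyings of the same records, field by field."""
    mismatches = []
    for i, (a, b) in enumerate(zip(first, second)):
        for var in sorted(set(a) | set(b)):
            if a.get(var) != b.get(var):
                mismatches.append((i, var, a.get(var), b.get(var)))
    return mismatches

# Two keyings of the same two questionnaires; the first contains a typing error.
entry_1 = [{"age": 34, "height_cm": 172}, {"age": 430, "height_cm": -9}]
entry_2 = [{"age": 34, "height_cm": 172}, {"age": 43, "height_cm": -9}]

print(out_of_range(entry_1))
print(double_entry_mismatches(entry_1, entry_2))
```

Both checks flag the same mistyped age here. In practice they catch different errors, since a wrong value can still fall inside the allowed range.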
Data formats
  • choice of software format for digital data:
    • planned data analyses
    • software availability
    • hardware used
    • discipline specific standards and customs
  • digital data are software dependent
  • digital data are endangered by obsolescence of software and hardware
  • best formats for long-term preservation - standard formats, interchangeable formats, open formats
    • e.g. tab-delimited; comma-delimited (CSV); ASCII; OpenDocument format; SPSS portable; XML
Data format conversions
  • convert data for preservation or back-up, e.g. export, save as
  • beware of conversion errors:
    • loss of internal metadata
      • e.g. convert MS Access to tab-delimited tables
    • loss of editing, formatting, formulae
      • e.g. convert MS Word to RTF
    • truncation or loss of data
      • e.g. string variables lost in SPSS to Stata conversion
  • check for errors and changes after conversion
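One way to check for changes after a conversion is to read the converted file back and compare it with the original cell by cell. The sketch below assumes the data are already held as rows of strings and uses Python's standard csv module. The function names and sample rows are hypothetical.

```python
import csv
import io

def to_tab_delimited(rows):
    """Write rows (lists of strings) to tab-delimited text."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()

def from_tab_delimited(text):
    """Read tab-delimited text back into rows of strings."""
    return [row for row in csv.reader(io.StringIO(text), delimiter="\t")]

def conversion_differences(original, converted):
    """List (row, column, before, after) for every cell that changed."""
    diffs = []
    if len(original) != len(converted):
        diffs.append(("row count", None, len(original), len(converted)))
    for r, (before, after) in enumerate(zip(original, converted)):
        if len(before) != len(after):
            diffs.append((r, "column count", len(before), len(after)))
        for c, (x, y) in enumerate(zip(before, after)):
            if x != y:
                diffs.append((r, c, x, y))
    return diffs

rows = [["id", "comment"], ["1", "likes tea\tand coffee"], ["2", "none"]]

# A faithful round trip should report no differences.
round_trip = from_tab_delimited(to_tab_delimited(rows))
print(conversion_differences(rows, round_trip))

# Simulate a converter that truncates strings to 8 characters.
truncated = [[cell[:8] for cell in row] for row in rows]
print(conversion_differences(rows, truncated))
```

The second comparison reports the truncated comment, which is the kind of silent data loss the list above warns about.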

Example 1: MS Excel to tab-delimited

Example 2: Word to XML

Example 3: Proprietary audio file (DVF) to WAV

[Figure: the same data shown in MS Excel format and in tab-delimited text format]

Version control
  • keep track of different copies or versions of data files
  • which methods:
    • single site vs. across locations
    • single vs. multiple users
    • different versions to be stored vs. files to be synchronised
  • single user of data files:
    • file naming – unique file names with date or version number (avoid spaces!)
      • e.g. FoodInterview_1_draft; FoodInterview_1_final; HealthTests_06-04-2008; BGHSurveyProcedures_00_04
    • version control table or file history within or alongside data file
    • version control facility within software, e.g. MS Windows software
  • multiple users of data files:
    • same methods as for a single user
    • control rights to file editing: read/write permissions, e.g. Windows Explorer
    • versioning/file sharing software: check files out/in, e.g. SVN, VSS, Google Docs, Amazon S3
    • manual merging of multiple entries/edits
  • synchronise files, e.g. MS SyncToy software
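The file-naming advice above (unique names, a date or version number, no spaces) can be applied consistently by generating names with a small helper. This is a sketch of one possible convention, not the one prescribed by the workshop. The zero-padded version number and the ISO date order (year-month-day, which sorts chronologically) are assumptions.

```python
import re
from datetime import date

def versioned_name(base, version, when, extension):
    """Build a file name with a version number and date, without spaces."""
    safe_base = re.sub(r"[^A-Za-z0-9-]+", "", base)  # drop spaces and punctuation
    return f"{safe_base}_v{version:02d}_{when.isoformat()}.{extension}"

print(versioned_name("Food Interview", 2, date(2008, 4, 6), "txt"))
# prints FoodInterview_v02_2008-04-06.txt
```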
Authenticity of data
  • master files
  • assign responsibility for master files
  • record changes to master files
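Changes to a master file can be recorded in a simple change log kept alongside it. The sketch below appends one row per change to a CSV file. The column names, file names and sample entries are hypothetical, and a spreadsheet or a version control table inside the data file would serve the same purpose.

```python
import csv
import os
import tempfile
from datetime import datetime, timezone

LOG_FIELDS = ["timestamp_utc", "file", "version", "changed_by", "description"]

def record_change(log_path, file_name, version, changed_by, description):
    """Append one row to a CSV change log, writing the header if the log is new."""
    is_new = not os.path.exists(log_path) or os.path.getsize(log_path) == 0
    with open(log_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=LOG_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "timestamp_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "file": file_name,
            "version": version,
            "changed_by": changed_by,
            "description": description,
        })

# Demonstration in a temporary folder so nothing is left behind.
with tempfile.TemporaryDirectory() as folder:
    log = os.path.join(folder, "master_changes.csv")
    record_change(log, "HealthTests_master.tab", 1, "AB", "initial master file")
    record_change(log, "HealthTests_master.tab", 2, "CD", "recoded missing values")
    with open(log, newline="", encoding="utf-8") as handle:
        logged = list(csv.DictReader(handle))

for row in logged:
    print(row["version"], row["changed_by"], row["description"])
```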
Data storage
  • digital storage media unreliable
  • file formats and physical storage media ultimately become obsolete
  • optical (CD, DVD) and magnetic media (hard drive, tapes) vulnerable and subject to physical degradation

Best practice:

  • use data formats with long-term readability
  • storage strategy with at least two different forms of storage
  • copy/migrate data files to new media between two and five years after they were first created
  • check data integrity of stored data files at regular intervals (checksum)
  • know your back-up strategy: institutional/personal; network server/PC/laptop
  • maintain original copy, external local copy and external remote copy
  • test file recovery
  • Data Protection Act and data back-up: personal data may need to be kept in as few copies as possible, and in secure storage
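The checksum check mentioned above can be sketched as follows: compute a checksum for every stored file once, keep the list, and recompute at intervals to detect files that have changed or gone missing. SHA-256 is an assumption here, since the slides do not name an algorithm, and the file names are hypothetical.

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(folder):
    """Map each file's path (relative to folder) to its checksum."""
    manifest = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            manifest[os.path.relpath(path, folder)] = sha256_of(path)
    return manifest

def changed_files(folder, manifest):
    """Return relative paths that are missing or whose checksum differs."""
    current = build_manifest(folder)
    return sorted(p for p, checksum in manifest.items() if current.get(p) != checksum)

# Demonstration in a temporary folder: store a manifest, then alter the file.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "survey.tab")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("id\tage\n1\t34\n")
    stored = build_manifest(folder)
    unchanged = changed_files(folder, stored)
    with open(path, "a", encoding="utf-8") as handle:
        handle.write("2\t43\n")
    after_edit = changed_files(folder, stored)

print(unchanged, after_edit)
```

The manifest itself should be stored separately from the data it describes, so that a single media failure cannot damage both.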
Example: data storage and preservation at UKDA
  • preservation copy (UKDA)
  • shadow copy (UKDA)
  • dissemination copy to reduce load on main system
  • near-site online copy (on campus)
  • off-site online copy
  • tape-based offline copy (UKDA)

Multi-copy, multi-storage-media and multi-version resilience.

[Diagram labels: scheduled nightly; robotic; 3-monthly]

Good data management practice
  • plan data management early
  • assign roles and responsibilities
  • design data management according to needs and purpose of research
  • data management throughout research
Resources
  • ESDS (2008). Guide to good practice: micro data handling and security. http://www.esds.ac.uk/news/publications/microDataHandlingandSecurity.pdf
  • Finch, L. & Webster, J. (2008). Caring for CDs and DVDs. NPO Preservation Guidance. Preservation in Practice Series. London, National Preservation Office. Available at http://www.bl.uk/npo/pdf/cd.pdf
  • UK Data Archive (2009). Manage and Share Data. http://www.data-archive.ac.uk/sharing/

See: http://www.data-archive.ac.uk/sharing/furtherstorage.asp