The importance of data management

Paul Lambert, 31st January 2012

Talk to the seminar ‘Data management in the social sciences and the contribution of the DAMES Node’, a session organised as part of the Data Management through e-Social Science ESRC research Node

www.dames.org.uk

DAMES, 31/JAN/2012, T1

Today’s session (2V1/2V3)

‘Data Management through e-Social Science’
  • DAMES – www.dames.org.uk
  • ESRC funded research Node

Funded 2008-11, with ongoing work into 2012 with the NeISS (www.neiss.org.uk) and ‘eStat’ (www.bristol.ac.uk/cmm/research/estat/) projects

  • Aim: Useful social science provisions
      • Specialist data topics – occupations; education qualifications; ethnicity; social care; health
      • Computer science research on secure data models; metadata and linking data; workflows
      • Programme of case studies and provisions

‘Data management’ means…
  • ‘the tasks associated with linking related data resources, with coding and re-coding data in a consistent manner, and with accessing related data resources and combining them within the process of analysis’[…DAMES Node..]
    • Usually performed by social scientists themselves
    • Most overt in quantitative survey data analysis
      • ‘variable constructions’, ‘data manipulations’
      • navigating abundance of data – thousands of variables
    • Usually a substantial component of the work process
    • Here we differentiate from archiving / controlling data itself

Some components…
  • Manipulating data
    • Recoding categories / ‘operationalising’ variables
  • Linking data
    • Linking related data (e.g. longitudinal studies)
    • combining / enhancing data (e.g. linking micro- and macro-data)
  • Secure access to data
    • Linking data with different levels of access permission
    • Detailed access to micro-data cf. access restrictions
  • Harmonisation standards
    • Approaches to linking ‘concepts’ and ‘measures’ (‘indicators’)
    • Recommendations on particular ‘variable constructions’
  • Cleaning data
    • ‘missing values’; implausible responses; extreme values
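Two of these components, recoding and cleaning, can be sketched in a few lines. The example below is only an illustration: the records, the variable names, the missing-value codes (-9, -8), the plausible age range and the three-category education scheme are all invented assumptions, not DAMES conventions.

```python
# Hypothetical missing-value codes (e.g. "refused", "don't know").
MISSING_CODES = {-9, -8}

# Assumed mapping from a detailed qualification code to three broad categories.
EDUC_RECODE = {1: "low", 2: "low", 3: "medium", 4: "medium", 5: "high"}


def clean_age(value, lowest=16, highest=100):
    """Return None for missing codes and implausible ages, else the value."""
    if value in MISSING_CODES or not (lowest <= value <= highest):
        return None
    return value


def recode_education(code):
    """Collapse the detailed code; codes outside the mapping become None."""
    return EDUC_RECODE.get(code)


# Invented survey records for illustration only.
records = [
    {"id": 1, "age": 34, "educ": 5},
    {"id": 2, "age": -9, "educ": 2},
    {"id": 3, "age": 230, "educ": 7},
]

cleaned = [
    {"id": r["id"], "age": clean_age(r["age"]), "educ3": recode_education(r["educ"])}
    for r in records
]
```

Keeping the rules in named constants and small functions means the recode can be cited, reviewed and re-run, rather than being buried in ad hoc edits to the data.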

‘The significance of data management for social survey research’
  • The data manipulations described above are a major component of the social survey research workload
      • Pre-release manipulations performed by distributors / archivists
        • Coding measures into standard categories; Dealing with missing records
      • Post-release manipulations performed by researchers
        • Re-coding measures into simple categories
        • All serious researchers perform extended post-release management

(and have the scars to show for it)

  • We do have existing tools, facilities and expert experience to help us…but we don’t make a good job of using them efficiently or consistently
  • So the ‘significance’ of DM is about how much better research might be if we did things more effectively…

..being more effective probably involves..
  • Knowing about, using and citing previous standard measures/strategies
  • Effective documentation/dissemination of information on the approach used
  • Being proactive (not just relying on the most convenient measure to hand)
  • Trying a few alternatives – sensitivity analysis
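A sensitivity analysis of this kind can be as simple as re-running one summary under several plausible codings. The sketch below assumes invented data and two hypothetical ways of collapsing an occupation code into an advantaged/other split; it is meant only to show the pattern, not any particular occupational scheme.

```python
from statistics import mean

# Invented records: an occupation code and a 0/1 poor-health indicator.
people = [
    {"occ": 1, "poor_health": 0},
    {"occ": 2, "poor_health": 0},
    {"occ": 3, "poor_health": 1},
    {"occ": 4, "poor_health": 0},
    {"occ": 5, "poor_health": 0},
    {"occ": 6, "poor_health": 1},
]

# Two hypothetical operationalisations: 1 = advantaged, 0 = other.
SCHEMES = {
    "scheme_a": lambda occ: 1 if occ <= 2 else 0,
    "scheme_b": lambda occ: 1 if occ <= 4 else 0,
}


def health_gap(rows, classify):
    """Poor-health rate of the 'other' group minus that of the advantaged group."""
    advantaged = [r["poor_health"] for r in rows if classify(r["occ"]) == 1]
    other = [r["poor_health"] for r in rows if classify(r["occ"]) == 0]
    return mean(other) - mean(advantaged)


# Same summary statistic, computed once per alternative coding.
results = {name: health_gap(people, fn) for name, fn in SCHEMES.items()}
```

If the substantive conclusion changes between schemes, that is worth reporting alongside the preferred measure.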

‘Documentation’ (and its dissemination) is probably the key…
  • By documentation we mean the ‘paper trail’
    • (such as data & syntax files during secondary survey research)
  • For scientists, this is the log book / journal / laboratory notebook
  • For social sciences, there are few agreed standards

Effective documentation is possible, but requires some effort (e.g. Long, 2009)

Image of Alexander Graham Bell’s 1876 notebook, taken from: http://sandacom.wordpress.com/2010/03/11/the-face-rings-a-bell/

..good levels of documentation are not engrained in the social sciences!
  • “…Little or nothing is systematically archived from these electronic sources. How many of us routinely keep copies of our old word-processing files once they are no longer of current relevance for research or teaching activities. We have been reminded…of the insecurity and non-survival of departmental and professional files stored in broom cupboards, but how many electronic files even get into that cupboard in the first place?” (p142 of Scott, J. (2005) ‘Some principal concerns in the shaping of sociology’, in Halsey, A.H. and Runciman, W. (eds) British Sociology: Seen from without and within. London: British Academy)

...Yet, ‘documentation for replication’ is a reasonable expectation for a scientific model of research

(e.g. Steuer, Dale, Freese)…

Steuer, M. (2003). The Scientific Study of Society. Boston: Kluwer Academic.

Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research Methodology, 9(2), 143-158.

Freese, J. (2007). Replication Standards for Quantitative Social Science: Why Not Sociology? Sociological Methods & Research, 36(2), 153-71.

A bit of focus…
  • Most of the DAMES applications aim to facilitate one of two data management activities, their documentation, and the dissemination of that documentation:
  • Variable constructions
      • Coding and re-coding values
  • Linking datasets
      • Internal and external linkages
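An external linkage, for instance attaching region-level (macro) information to individual-level (micro) records, might look roughly like the sketch below. The field names, the key and the figures are invented, and the function `link_macro` is a hypothetical helper rather than a DAMES tool. The point is that unmatched cases are recorded rather than silently dropped.

```python
# Invented micro-data (individuals) and macro-data (regions).
individuals = [
    {"pid": 101, "region": "A"},
    {"pid": 102, "region": "B"},
    {"pid": 103, "region": "Z"},  # deliberately has no matching region record
]
regions = {
    "A": {"unemp_rate": 4.2},
    "B": {"unemp_rate": 7.9},
}


def link_macro(micro_rows, macro_lookup, key):
    """Attach macro-level fields to each micro record and list unmatched keys."""
    linked, unmatched = [], []
    for row in micro_rows:
        macro = macro_lookup.get(row[key])
        found = macro is not None
        if not found:
            unmatched.append(row[key])
        # Unmatched rows are kept, flagged, and carry no macro fields.
        linked.append({**row, **(macro or {}), "matched": found})
    return linked, unmatched


linked, unmatched = link_macro(individuals, regions, key="region")
```

Reporting the `unmatched` list alongside the linked file is one small piece of the documentation that makes the linkage replicable.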

‘Documentation for replication’ supports replication of..
  • Your own analysis
      • (in response to comments, revisions, requests for access)
  • Others’ analysis
      • To build upon – cumulative science
      • To critique / cross-examine
  • In secondary survey research
      • Complex data is often updated (new related records; revised and re-released; re-weighted or re-standardised; new levels of access/linkage)
      • New analysis feasible - variable operationalisations; new statistical methods
  • Most documentation requirements are achieved by effective use of software (‘syntax’ programming)
      • See our training workshops, www.dames.org.uk/workshops

Keep clear records of your DM activities!

Reproducible (for self)

Replicable (for all)

Paper trail for whole lifecycle

  • In survey research, this means using clearly annotated syntax files

(e.g. SPSS/Stata)

Syntax Examples:

www.dames.org.uk/workshops
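The workshop materials use SPSS, Stata and R syntax. As a rough analogue in Python, the sketch below shows one way a script could keep its own paper trail. The `PaperTrail` class, the records and the age threshold are all hypothetical, introduced here only to illustrate the idea of annotating and logging each step.

```python
import json


class PaperTrail:
    """Apply named data management steps and record what each one did."""

    def __init__(self):
        self.log = []

    def apply(self, description, func, rows):
        """Run func on rows, then log the description and row counts."""
        result = func(rows)
        self.log.append({
            "step": len(self.log) + 1,
            "description": description,
            "rows_in": len(rows),
            "rows_out": len(result),
        })
        return result

    def to_json(self):
        """Serialise the log so it can be stored next to the data."""
        return json.dumps(self.log, indent=2)


# Invented records for illustration only.
data = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 71},
]

trail = PaperTrail()
data = trail.apply(
    "Drop cases with missing age",
    lambda rows: [r for r in rows if r["age"] is not None],
    data,
)
data = trail.apply(
    "Derive 'older' flag (age 65 or above; threshold is an assumption)",
    lambda rows: [{**r, "older": r["age"] >= 65} for r in rows],
    data,
)
```

Saving the output of `trail.to_json()` with the derived file gives a record of every step from raw data to analysis file, in the order the steps were run.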

We’ve written a guide for researchers...
  • ‘Software Session 1: Documentation and workflows with popular software packages’ (www.dames.org.uk/workshops/stir10/docs_workflows_2010.html)
  • Dozens of sample command files in SPSS, Stata and R from DAMES Node workshops at www.dames.org.uk


For data distributors, the provision of systematic metadata is also beneficial

Example of DDI format metadata

(see also talk 5)

NESSTAR

What more is needed for good data management?
  • Good standards in the operationalisation of variables
      • See yesterday’s workshop sessions (www.dames.org.uk)
      • Most options have already been studied!
      • Using GEODE/GEMDE/GEEDE to facilitate sensitivity analysis and comparisons of alternative plausible measures
      • Collect documentation/metadata on specialist records
      • Promote more effective measurement options

e.g. effect proportional scaling; replication of measures used before; derivation of recommended standards
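As a rough illustration of effect proportional scaling, the sketch below assumes the common description of the technique, in which each category of a categorical measure is scored by the mean of a criterion variable within that category. The data, the variable names and the choice of income as the criterion are invented; the talk itself does not specify an implementation.

```python
from collections import defaultdict
from statistics import mean

# Invented records: a qualification category and a criterion variable (income).
rows = [
    {"qual": "degree", "income": 40},
    {"qual": "degree", "income": 50},
    {"qual": "school", "income": 20},
    {"qual": "school", "income": 30},
    {"qual": "none", "income": 10},
]


def category_means(data, category, criterion):
    """Mean of the criterion variable within each category."""
    groups = defaultdict(list)
    for r in data:
        groups[r[category]].append(r[criterion])
    return {cat: mean(vals) for cat, vals in groups.items()}


# Each category's score is its mean on the criterion variable.
scores = category_means(rows, "qual", "income")

# Attach the score to every record as a new metric variable.
scaled = [{**r, "qual_score": scores[r["qual"]]} for r in rows]
```

The resulting `qual_score` can then be used as a single metric predictor in place of a set of category dummies, and compared against alternative codings of the same measure.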

DAMES ‘GESDE’ tools: online services for data coordination/organisation

Tools for handling variables in social science data

Recoding measures; standardisation / harmonisation; Linking; Curating


Predictors of ‘poor health’ in Sweden

(comparison of different occupation-based measures, from DAMES, TP 2011-1)

What more is needed for good data management?
  • Incentives/disincentives
      • Arguably, good data management is penalised at present (‘Don’t get it right, get it published’)
      • Few formalised requirements of documentation or data management activity

(cf. metadata publishing standards such as DDI)

      • Citation rankings might incentivise here (citation of your do files..)
      • Prospects are probably rather bleak for good science..!!

Summary

the ‘significance’ of DM is about how much better research might be if we did things more effectively…

  • Can (try to) provide data oriented facilities supporting improved data management
  • May also need a cultural change in expectations…
