
Documentation and Workflows

Documentation and Workflows. Paul Lambert, 24-25 August 2009. Talk to the ‘Data Management for Social Survey Research’ training workshop, part of the Data Management through e-Social Science research Node of the National Centre for e-Social Science, www.dames.org.uk / www.ncess.ac.uk.



Presentation Transcript


  1. Documentation and Workflows. Paul Lambert, 24-25 August 2009. Talk to the ‘Data Management for Social Survey Research’ training workshop, part of the Data Management through e-Social Science research Node of the National Centre for e-Social Science, www.dames.org.uk / www.ncess.ac.uk

  2. Manipulating data • Operations performed on datasets by researchers and/or data distributors • At any stage of the research lifecycle • Of considerable consequence to analytical results • DAMES Node: • ‘Data Management’ = manipulation of data, and documenting/assisting the processes of manipulation • E-Social Science approach to facilitating data manipulation (metadata resources; data access facilities; ‘workflow models’)

  3. ‘Documentation for replication’ ..as a reasonable expectation for scientific research that is cumulative and based upon empirical observation… Steuer, M. (2003). The Scientific Study of Society. Boston: Kluwer Academic. Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research Methodology, 9(2), 143-158. Freese, J. (2007). Replication Standards for Quantitative Social Science: Why Not Sociology? Sociological Methods and Research, 36(2), 153-171.

  4. What needs replication? • Your own analysis (in response to comments, revisions, requests for access) • Others’ analysis • To build upon – cumulative science • To critique / cross-examine • In secondary survey research • Complex data is often updated (new related records; revised and re-released; re-weighted or re-standardised; new levels of access/linkage) • New analysis feasible – variable operationalisations; new statistical methods

  5. J. Scott Long (2009) • Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press. • Chapters 1-5: Programming in Stata • Chapter 6: Cleaning your data • Chapter 7: Analysing data and presenting results • Chapter 8: Protecting your work

  6. Treiman (2009) • Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey Bass. Good professional practice = • Suitable choice of analytical methods to test ideas • Documentation of choices and data operations

  7. How to approach Documentation for Replication in social survey research? • Made easy by secondary access to datasets and standardised software • Careful syntactical documentation • Metadata documents • Metadata standards

  8. Keep clear records of your data management (DM) activities! • Reproducible (for self) • Replicable (for all) • Paper trail for whole lifecycle (cf. Dale 2006; Freese 2007) • In survey research, this means using clearly annotated syntax files (e.g. SPSS/Stata) • Syntax examples: www.longitudinal.stir.ac.uk

  9. Stata syntax example (‘do file’)

  10. 1) Syntax documentation • Long (2009) is highly prescriptive (may not be wholly attainable) • Key issues: • Organisation of syntax files: master files and subfiles (and macros) • Setting consistent paths to source data • Reasonable level of manual annotation of files • Use a text editor!!
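
The key issues on this slide (a master file calling subfiles, macros, and consistent paths) can be sketched as a short Stata master do-file. This is only an illustration, not the layout used in the talk or prescribed by Long (2009): the folder locations, the macro names `path_source` and `path_work`, and the three subfile names are all hypothetical.

```stata
* master.do : hypothetical master file that runs each stage in order
version 10                      // record the Stata version the code was written for
clear
set more off

* Define paths once, as global macros, so that moving the project
* means editing only these two lines (both locations are assumptions)
global path_source "C:/data/bhps/"
global path_work   "C:/projects/example/"

* Keep a plain-text log of everything the subfiles do
capture log close
log using "${path_work}logs/master.log", replace text

* Subfiles (hypothetical names), run in a fixed order
do "${path_work}syntax/01_extract.do"    // read and subset the source files
do "${path_work}syntax/02_recode.do"     // derive and label variables
do "${path_work}syntax/03_analysis.do"   // models and tables

log close
```

Because every subfile refers to `${path_source}` and `${path_work}` rather than to a literal folder, another researcher would in principle only need to change the two `global` lines to re-run the whole sequence.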

  11. The idea of workflows • Workflow modelling is an exciting prospect for the future • Workflow documentation • MyExperiment [http://www.myexperiment.org/] • Social survey analysis [Dale, 2006; Freese, 2007; Long, 2009] • At present… • Waiting for tool development • Depositing workflows might impose constraints/burdens

  12. [Diagram: workflow ‘Model1’. Source data: BHPS wave A individuals; BHPS wave B individuals; wave C. Variables shown: spouse CAMSIS; spouse SOC; current job RGSC; gender; age (yrs); age bands. Outputs: analytical file; graphics. Text interface: invoked manually or in response to manipulating graphs.]

  13. ..good levels of documentation are new in the social sciences... • “…Little or nothing is systematically archived from these electronic sources. How many of us routinely keep copies of our old word-processing files once they are no longer of current relevance for research or teaching activities. We have been reminded…of the insecurity and non-survival of departmental and professional files stored in broom cupboards, but how many electronic files even get into that cupboard in the first place?” Scott (2005: 142)

  14. 2) Metadata documents.. …for documentation for replication • Metadata documents can/should be stored / distributed / disseminated • Main relevant types of metadata documents: • Annotated syntax files • Handwritten workbooks • Codebooks and data file metadata

  15. a) Annotated syntax files • Storage: • Supply authorship details, conditions of access, origins and context of data, software version • ‘Robustify’ your programme (generic locations; ‘capture drop’) • Dissemination: • Available from authors’ archives • RePEc – http://ideas.repec.org/ (Economics) • GEODE/DAMES – www.dames.org.uk (Occupations, Education) • UKDA/ESDS and related data providers (monitored) • Personal webpages – e.g. www.camsis.stir.ac.uk/downloads/data/other/casoc_isco.do
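
The storage points above (authorship details, software version, generic locations, ‘capture drop’) might look like the following in a Stata do-file. This is a minimal sketch under stated assumptions: the file name, the data file `example_file.dta`, the macro `path_source`, the variable `age` and the age bands are all hypothetical, and the bracketed header fields are placeholders to be filled in by the author.

```stata
* -------------------------------------------------------------
* File:     02_recode.do   (hypothetical file name)
* Author:   [name and contact details]
* Date:     [date last edited]
* Data:     [origin and context of the source data]
* Access:   [conditions of access for data and syntax]
* Software: written for Stata version 10
* -------------------------------------------------------------
version 10

* Generic location: the path is held in one macro rather than hard-coded
* throughout (it would normally be set once in a master file)
global path_source "C:/data/bhps/"
use "${path_source}example_file.dta", clear

* 'capture drop' removes any earlier copy of the derived variable and
* does not stop with an error if none exists, so the file can be re-run
capture drop ageband
generate ageband = .
replace ageband = 1 if age >= 16 & age <= 29
replace ageband = 2 if age >= 30 & age <= 49
replace ageband = 3 if age >= 50 & age < .

* The same idea applied to the value label
capture label drop ageband_lbl
label define ageband_lbl 1 "16-29" 2 "30-49" 3 "50 and over"
label values ageband ageband_lbl
label variable ageband "Age band (derived from age in years)"
```

The condition `age < .` keeps missing ages out of the top band, since Stata treats missing values as larger than any number.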

  16. b) Handwritten workbooks • Key here is that they must be published.. • Technical papers • Websites • …. • An emerging payoff - citation indexing! • Croxford, L. (2004). Construction of Social Class Variables. Edinburgh: Working Paper 4 of the ESRC research project on Education and Youth Transitions in England, Wales and Scotland, 1984-2002, Centre for Educational Sociology, University of Edinburgh, and http://www.ces.ed.ac.uk/eyt/EYT_papers/WP04.pdf.

  17. “Because claims in published papers that additional materials are “available from author” usually prove false, at least after a few months, the California Center for Population Research at UCLA recently implemented a mechanism by which additional materials, for example, -do- and -log- files, can be attached to papers posted in its Population Working Paper archive. Other research centers are to be encouraged to do the same” (Treiman, 2009: 404)

  18. E-Science and workflow documentation tools.. • …seek to capture the full record of the work process and all files relevant for documentation (e.g. http://www.myexperiment.org/)

  19. c) Codebooks and data file metadata • Codebook:
        log using data_file_name_codebook.log, replace text
        disp "DateTime: $S_DATE $S_TIME"
        notes
        datasignature
        codebook, compact
        codebook
        describe
        labelbook, detail
        log close
      • See UKDA: data_dictionary.rtf

  20. 3) Metadata standards • Formal standards for recording data exist (most widely used is the ‘DDI’, Data Documentation Initiative, http://www.icpsr.umich.edu/DDI/) • XML format, typewritten or software derived, can be read by software / browsers • Includes options for variable labels, recodes, text descriptions • See UKDA, study_information.htm • NESSTAR

  21. NESSTAR

  22. Summary: Documentation and workflows • Achieving good documentation is facilitated by effective workflows • File locations / stamps / transferability • Variable metadata • Structured logs of all operations – syntax programs • …Documentation - Is it worth it..?
