SHARE Data Cleaning General rules and procedures

Stephanie Stuck, MEA. Antwerp, February 6th/7th, 2008.


Presentation Transcript


  1. SHARE Data Cleaning: General rules and procedures. Stephanie Stuck, MEA. Antwerp, February 6th/7th, 2008

  2. General philosophy
     • Respondents are experts on their own lives; in general, we (still) take their answers very seriously
     • Only change data if you are sure they are wrong. If an answer seems implausible but you are not sure what to do, indicate this with a flag variable

  3. General rules
     • Please use the data files with sampid for data cleaning (don’t use the data version with sampid2)
     • Always write programs to correct data (STATA do files or SPSS sps files). Please never change data directly (e.g. no changes in editors)

  4. Program files (do or sps) should always start with:
     • Name of author & date of program
     • Data version (date) and modules
     • Short description of program
     • Sequence of programs
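A header along these lines would cover the four items listed on this slide. This is only a sketch: every value in angle brackets is a placeholder to fill in, and the `version`, `clear` and `set more off` lines are common do-file housekeeping assumed here, not something the slides ask for.

```stata
* ------------------------------------------------------------
* Author:      <name of author>
* Date:        <date of program>
* Data:        <data version (date)>, modules: <modules>
* Purpose:     <short description of what this program corrects>
* Sequence:    <run after: ...>  <run before: ...>
* ------------------------------------------------------------

* Housekeeping (an assumption, not part of the slide):
version 9          // pin the Stata version the program was written for
clear              // start from an empty data set
set more off       // do not pause output while the program runs
```

An SPSS sps file can carry the same information in comment lines at the top of the file.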

  5. In programs, always:
     • Keep original variables (“varname_original”)
       STATA: generate dn003_original = dn003_
       SPSS:  compute dn003_original = dn003_.
     • Do not change variables called “varname_original”
     • but change variables called “varname”
       STATA: replace dn003_ = 1919 if sampid == "1206211111100" & respid == 1
       SPSS:  if (sampid = "1206211111100" & respid = 1) dn003_ = 1919.

  6. In programs, always:
     • Add flag variables to indicate changes (“varname_flag”)
       STATA: generate dn003_flag = 0
              replace dn003_flag = 1 if dn003_original ~= dn003_
       SPSS:  compute dn003_flag = 0.
              if (dn003_original ~= dn003_) dn003_flag = 1.
     • Please label flag variables
     • “0” should always be used for “no changes/ok”
     • Other values can be used as needed, e.g. “1: year of birth changed”, “2: implausible”

  7. Always
     • Save corrected data files with a new name
       save "filename_corrected_1"
       save "filename_corrected_2"
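The rules on the preceding slides (keep the original, change only the working variable, flag the change, label the flag, save under a new name) can be put together in one short do-file. The sketch below reuses the dn003_ correction from the slides; the file name share_w2_dn and the label name dn003_flag_lbl are hypothetical and stand in for whatever data version and naming you actually use.

```stata
* Hypothetical input file name; replace it with the data version you received.
use "share_w2_dn", clear

* 1. Keep the original values before changing anything.
generate dn003_original = dn003_

* 2. Correct the working variable only, never the _original copy.
replace dn003_ = 1919 if sampid == "1206211111100" & respid == 1

* 3. Flag the change; 0 always means "no changes/ok".
generate dn003_flag = 0
replace dn003_flag = 1 if dn003_original ~= dn003_

* 4. Label the flag variable and its values (label name is hypothetical).
label variable dn003_flag "Flag: changes to dn003_"
label define dn003_flag_lbl 0 "no changes/ok" 1 "year of birth changed" 2 "implausible"
label values dn003_flag dn003_flag_lbl

* 5. Save under a new name; without the replace option Stata refuses to
*    overwrite an existing file, which protects earlier versions.
save "share_w2_dn_corrected_1"
```

If the storage type and value labels of the original should be carried over as well, `clonevar dn003_original = dn003_` is an alternative to `generate` in step 1.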

  8. General procedures
     • Country teams send program files to MEA
     • MEA runs the files and creates new data versions
     • MEA uploads the files to the new internal SHARE website
     • New data versions will be named with a number at the end: share_w2_`module'_1
     • Country teams download the files and can go on checking and cleaning data
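The `module' in the file name above is written like a Stata local macro. As an illustration of how such a name expands, the loop below builds the file name for a few modules; the module list and the version number are assumptions chosen for the example, not the actual list of files MEA distributes.

```stata
* Illustrative values only: module list and version number are assumptions.
local vnum 1
foreach module in dn ph ep {
    * `module' and `vnum' are replaced by their contents before the line runs
    display "share_w2_`module'_`vnum'"
}
```

Inside a do-file the same expression can be passed to `use` or `save` in place of a hard-coded file name.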

  9. Wave 1 data
     • Please don’t take wave 1 information for granted; it can be wrong, too
     • Sometimes we will have to change wave 1 data, too
     • CentERdata and MEA are currently preparing a version of the wave 1 data that includes
       • respid for all eligibles (right now respid is only included for respondents)
       • flags for changes made while cleaning the wave 1 data
     • We will have another release of wave 1 data together with the public release of wave 2

  10. What I learned
      • You need more ‘step by step’ guidelines and clear instructions:
        • Where to start: priority list
        • What exactly to do: programs, examples
        • When to do it: schedule

  11. Very next steps
      • Send the programs you have written to MEA
      • Send drop-offs and vignette forms to MEA (paper versions); also check them for country-specific deviations
      • The imputations group and MEA send around a priority list and more instructions
      • MEA and CentERdata prepare updated wave 1 and wave 2 files incl. sampid, respid for all eligibles & a new merging variable
