
SHARE Data Cleaning

SHARE Data Cleaning. Stephanie Stuck, MEA, Vienna, November 5/6th. General philosophy: respondents are experts on their own lives; in general we (still) take their answers very seriously.

Presentation Transcript


  1. SHARE Data Cleaning: Stephanie Stuck, MEA, Vienna, November 5/6th

  2. General philosophy • Respondents are experts on their own lives; in general we (still) take their answers very seriously • Only change data if you are sure it is wrong; if answers seem implausible but you are not sure what to do, indicate this with a flag variable

  3. General rules • Please use the data files with the original sampid to check and correct data (don’t use the data version with sampid2) • Always write programs to correct data (Stata do-files or SPSS sps files); never change data directly (e.g. no changes in data editors)

  4. General rules • Keep the original variables (name: “varname_original”) • Add flag variables to indicate changes (name: “varname_flag”) • Save corrected data files under a new name (e.g. “filename_corrected”)
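
The rules on slides 2 to 4 can be sketched as a small correction program. The sketch uses Python only for illustration; in practice the same logic would go into a Stata do-file or SPSS syntax. Several names are assumptions and not SHARE conventions: records are plain dicts, and the flag codes (0 = unchanged, 1 = corrected, 2 = implausible but kept) are hypothetical. The helpers apply_correction and save_corrected are also hypothetical.

```python
import copy
import csv

# Hypothetical flag codes for "varname_flag" (not an official SHARE coding).
FLAG_UNCHANGED, FLAG_CORRECTED, FLAG_IMPLAUSIBLE = 0, 1, 2


def apply_correction(records, var, sampid, new_value=None, sure=True):
    """Return a corrected copy of the records; the input list is left untouched."""
    out = copy.deepcopy(records)
    for rec in out:
        # Keep the original value and start every case as "unchanged".
        rec.setdefault(f"{var}_original", rec[var])
        rec.setdefault(f"{var}_flag", FLAG_UNCHANGED)
        if rec["sampid"] != sampid:
            continue
        if sure:
            # Only overwrite the answer when we are sure it is wrong.
            rec[var] = new_value
            rec[f"{var}_flag"] = FLAG_CORRECTED
        else:
            # Implausible but unclear: keep the answer and only set the flag.
            rec[f"{var}_flag"] = FLAG_IMPLAUSIBLE
    return out


def save_corrected(records, filename):
    """Write records to '<filename>_corrected.csv' instead of overwriting the input."""
    path = f"{filename}_corrected.csv"
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(records[0]))
        writer.writeheader()
        writer.writerows(records)
    return path
```

Each call documents one decision in the script. The whole cleaning can therefore be re-run on a new delivery of the raw data without any manual edits.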

  5. General rules • Don’t always take wave 1 information for granted; it can be wrong, too • Sometimes we will have to change wave 1 data, too • There will be another release of wave 1 data together with the public release of wave 2 • We will probably already have a minor update of release 2.0.1 early next year

  6. Very next steps • Check for country-specific deviations, especially routing errors, ep071, ep098, the hc module etc. • Send information on all country-specific deviations to MEA; please don’t forget an English translation or explanation of the deviations • Information on important deviations in central variables should be available to all FRB authors together with release 0

  7. Very next steps • Check financial amounts for implausible values, e.g. negative or very high amounts: • outliers • zero values • wrong currencies • typing errors • the “drunken interviewers” problem • Also consider the frequencies of payments etc.
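
A check for the financial problems listed on slide 7 might look like the following. It is a sketch only, and all thresholds are hypothetical. The upper limit would have to be chosen per country and currency. The reference amount "previous" is an assumption: it could be the same item reported in an earlier wave. A jump by a large factor against it may point to a typing error (an extra or missing digit) or to an amount given in the wrong currency. Amounts are only comparable once the payment frequency is taken into account, so the sketch also includes a simple annualization helper.

```python
def check_amount(value, upper=1_000_000, previous=None, ratio=10):
    """Return problem labels for one reported amount (empty list = no issue found).

    upper, previous and ratio are hypothetical parameters, not SHARE settings.
    """
    problems = []
    if value is None:  # missing: nothing to check here
        return problems
    if value < 0:
        problems.append("negative")
    if value == 0:
        problems.append("zero")
    if value > upper:
        problems.append("very high")
    # A jump by a factor of `ratio` or more against a reference amount may
    # indicate a typing error or an amount reported in the wrong currency.
    if previous is not None and previous > 0 and value > 0:
        if value / previous >= ratio or previous / value >= ratio:
            problems.append("scale jump")
    return problems


def annual_amount(value, payments_per_year):
    """Convert an amount per payment to a yearly amount (e.g. 12 for monthly)."""
    return value * payments_per_year
```

Flagged amounts are candidates for inspection, not automatic corrections. Following slide 2, a value is only changed when the error is clear.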

  8. Wrong sampid, cvid or respid • MEA already checks for mismatches within and between waves • Please ask the survey agencies and send all information you have on renamed cases, mismatches etc. to MEA • Whenever you find new information on mismatches, e.g. in the remarks, send it to MEA • Please send data files with old and new IDs for renamed cases to MEA, with the date and reason (if possible) in additional variables • Sometimes only the CV or only the individual modules (DN etc.) have to be renamed (especially, but not only, if respondents are exchanged within households). Please don’t forget to state where the changes have to be made. MEA will correct the files and send lists of hard cases to the country teams to check or to ask the survey agencies again
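
A rename file of the kind slide 8 asks for could be applied as shown below, separately to each file that needs it (for example only the CV or only the DN module). The layout is an assumption. Each rename entry is treated as a dict with the hypothetical keys old_id, new_id, date and reason, and apply_id_renames is a hypothetical helper. Entries that match no case are returned, since they belong on the list of hard cases.

```python
def apply_id_renames(records, renames, id_var="sampid"):
    """Apply old -> new IDs to one file; return the renamed copy and unmatched entries.

    renames: list of dicts with the hypothetical keys old_id, new_id, date, reason.
    """
    mapping = {entry["old_id"]: entry for entry in renames}
    out, used = [], set()
    for rec in records:
        rec = dict(rec)
        # Keep the ID as delivered, whether or not it gets renamed.
        rec.setdefault(f"{id_var}_original", rec[id_var])
        entry = mapping.get(rec[id_var])
        if entry is not None:
            rec[id_var] = entry["new_id"]
            used.add(entry["old_id"])
        out.append(rec)
    # Entries that found no case are candidates for the "hard cases" list.
    unmatched = [entry for entry in renames if entry["old_id"] not in used]
    return out, unmatched
```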

  9. General checks • Corrections based on checks of frequency distributions, e.g. outliers, values out of range • Corrections based on consistency checks within and between modules and waves
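
The frequency and range checks on slide 9 correspond to a frequency table plus a list of out-of-range cases. The following is a minimal sketch. The valid codes are assumptions for illustration. In real data the valid set would also have to include the missing-value codes used for "don't know" and "refusal".

```python
from collections import Counter


def frequencies(records, var):
    """Frequency distribution of one variable (like tab in Stata or FREQUENCIES in SPSS)."""
    return Counter(rec.get(var) for rec in records)


def out_of_range(records, var, valid, id_var="sampid"):
    """IDs of cases whose value is not in `valid` (a set of codes or a range of numbers)."""
    return [rec[id_var] for rec in records if rec.get(var) not in valid]
```

For example, a check like out_of_range(records, "yob", range(1900, 2000)) would list the cases whose year of birth lies outside an assumed plausible window. The variable name "yob" and the window are hypothetical.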

  10. More concrete • Check for empty cases • Check for duplicates • Check year of birth between the coverscreen (cv_r and cv_h) and the dn module, the drop-offs and the vignettes respectively, and possibly with the gross sample • Check gender CV/DN vs. drop-off/vignettes • Check for consistency of dates: • Check information on marital status: • Check respondent dummies • Check the ch module against the coverscreen • Check relation to coverscreen respondent
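
The first checks on slide 10 (empty cases, duplicates, year of birth or gender across files) can be sketched as three small functions. This sketch makes several assumptions. A case counts as "empty" when every non-ID variable is missing. Files are matched on sampid. The variable names passed in (for example "yob" in both the CV and DN files) are hypothetical, since the real files use their own names. Duplicates should be resolved before the cross-file comparison, because the lookup keeps only one record per ID.

```python
from collections import Counter


def empty_cases(records, id_var="sampid"):
    """IDs of cases where every variable other than the ID is None."""
    return [rec[id_var] for rec in records
            if all(v is None for k, v in rec.items() if k != id_var)]


def duplicate_ids(records, id_var="sampid"):
    """IDs that occur more than once in one file."""
    counts = Counter(rec[id_var] for rec in records)
    return sorted(i for i, n in counts.items() if n > 1)


def mismatches(left, right, var_left, var_right, id_var="sampid"):
    """Compare one variable across two files (e.g. CV vs. DN) by ID."""
    right_by_id = {rec[id_var]: rec for rec in right}  # last record wins per ID
    result = []
    for rec in left:
        other = right_by_id.get(rec[id_var])
        if other is not None and rec[var_left] != other[var_right]:
            result.append((rec[id_var], rec[var_left], other[var_right]))
    return result
```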

  11. Interviewer remarks • Go through the remarks • A lot of them are not helpful, but some are very important (e.g. exchanged respondent, amounts apply to all family members, different time horizons etc.) • Categorize problems as much as possible • Write programs to correct data if possible • Flag cases where unsure • Collect information on questions that caused a lot of problems or didn’t work, for future waves
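
A first pass at categorizing remarks (slide 11) could be keyword-based. This is a rough sketch. The categories follow the examples on the slide, but the keyword lists and the helper categorize are hypothetical. Real lists would be built from reading the remarks and would be written in the interview language. Remarks that match nothing still need to be read by hand.

```python
# Hypothetical keyword lists, one per problem category from the slide.
CATEGORIES = {
    "exchanged respondent": ["wrong person", "exchanged", "other respondent"],
    "household amount": ["all family", "whole household", "for both"],
    "time horizon": ["per month", "monthly", "per week", "weekly"],
}


def categorize(remark):
    """Return the categories whose keywords occur in the remark."""
    text = remark.lower()
    found = [cat for cat, words in CATEGORIES.items()
             if any(word in text for word in words)]
    return found or ["uncategorized"]
```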

  12. Open questions • Go through open questions and code answers into original values if possible • Priority list of variables: education, employment status
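
Coding open answers back into the original categories (slide 12) can follow the same keep-original-and-flag pattern as slide 4. This is a sketch under assumptions that are all hypothetical: the "other, specify" code 97, the variable names empstat and empstat_other, the lookup table, and the helper recode_other. A real lookup would be built from the verbatim answers in each country.

```python
OTHER = 97  # hypothetical code for "other (please specify)"

# Hypothetical lookup from cleaned verbatim answers to existing codes.
VERBATIM_TO_CODE = {
    "pensioner": 1,        # -> 1 "retired"
    "early retirement": 1,
    "self-employed": 2,    # -> 2 "employed or self-employed"
}


def recode_other(record, var, text_var):
    """Move an 'other' answer into an existing category when the text matches."""
    rec = dict(record)
    rec.setdefault(f"{var}_original", rec[var])
    rec.setdefault(f"{var}_flag", 0)
    if rec[var] == OTHER:
        text = str(rec.get(text_var) or "").strip().lower()
        code = VERBATIM_TO_CODE.get(text)
        if code is not None:
            rec[var] = code
            rec[f"{var}_flag"] = 1  # recoded from open answer
    return rec
```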

  13. How to go on • Your experience is much appreciated • Please send information on what you have done, what problems you found etc. to MEA • MEA will send out more information: results of our discussion now, ‘checking lists’, ‘common problems’, etc. • We should have another meeting/workshop, maybe in February, or we could have an extra meeting, e.g. in Mannheim
