
Part II – Introduction to SILC Data Structure and Documentation

Part II – Introduction to SILC Data Structure and Documentation. DwB Training Course on EU-SILC Longitudinal Data, Paris, 19-21 February 2014. Heike Wirth. Aims of this session: introduce the rotational design; explain the concept of the selected respondent.


Presentation Transcript


  1. Part II – Introduction to SILC Data Structure and Documentation • DwB Training Course on EU-SILC Longitudinal Data • Paris, 19-21 February 2014 • Heike Wirth

  2. Aims of this session • Introduce the rotational design • Explain the concept of the selected respondent • Explain the organisation of the data • Point out some reading: Documents of priority

  3. Illustration of the rotational design

  4. Rotational design - Illustration 2006 Initial sample

  5. Rotational design – Illustration cross-sectional 2006

  6. Rotational design – Illustration longitudinal

  7. Rotational design – Illustration longitudinal 2006 e.g. longitudinal data 2011

  8. Rotational design – empirical • Not equivalent to the number of years of participation

  9. Rotational design – empirical • tab DB075 HHYNR • HHYNR (= number of household-year) is not included in the data; it must be created • Source: UDB_l11D_ver 2011-1 from 01-08-2013.dta; own calculations

  10. Rotational design – empirical • tab HHYNR YEAR • HHYNR (= number of household-year) is not included in the data; it must be created • Source: UDB_l11D_ver 2011-1 from 01-08-2013.dta; own calculations

  11. Rotational design – empirical • tab HHYCOUNT HHYNR • HHYCOUNT (= count of household-years) is not included in the data; it must be created • Source: UDB_l11D_ver 2011-1 from 01-08-2013.dta; own calculations
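
Because HHYNR and HHYCOUNT are not delivered with the data, they have to be derived from the household register. The following is a minimal sketch in plain Python, under one plausible reading of the two definitions: HHYNR is the running number of a household's record (1 for its first year in the file, 2 for its second, and so on) and HHYCOUNT is the total number of household-years that household contributes to the file. The field names `country`, `hh_id` and `year` are illustrative stand-ins for the key variables, and the rows are invented toy data, not UDB records.

```python
from collections import defaultdict

# Toy household-register rows (invented for illustration only).
rows = [
    {"country": "AT", "hh_id": 100, "year": 2009},
    {"country": "AT", "hh_id": 100, "year": 2008},
    {"country": "AT", "hh_id": 100, "year": 2010},
    {"country": "AT", "hh_id": 200, "year": 2010},
    {"country": "AT", "hh_id": 200, "year": 2011},
    {"country": "BE", "hh_id": 100, "year": 2011},
]


def add_household_year_counters(rows):
    """Add HHYNR (running number) and HHYCOUNT (total) to each row in place."""
    groups = defaultdict(list)
    for row in rows:
        # A household is identified by country plus household ID.
        groups[(row["country"], row["hh_id"])].append(row)
    for members in groups.values():
        members.sort(key=lambda r: r["year"])
        for position, row in enumerate(members, start=1):
            row["HHYNR"] = position          # 1st, 2nd, ... household-year
            row["HHYCOUNT"] = len(members)   # household-years in the file
    return rows


add_household_year_counters(rows)
for row in sorted(rows, key=lambda r: (r["country"], r["hh_id"], r["year"])):
    print(row)
```

Both counters describe only the records present in the file at hand, which is in line with the remark on slide 8 that they are not equivalent to the number of years of participation.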

  12. Observation units: Concept of the selected respondent

  13. Selected respondent

  14. Example: PH030 – Limitation in activities because of health problems • (register countries) (mainly) not selected respondents (see PH030_F) • Source: UDB_l11P_ver 2011-1 from 01-08-2013.dta

  15. Organisation of the data

  16. Organisation of the data • EU-SILC consists of 4 separate files for the cross-sectional data • Household Register file • Household Data file • Personal Register file • Personal Data file

  17. Organisation of the data • … and of 4 separate data files for the longitudinal data • Household Register file • Household Data file • Personal Register file • Personal Data file

  18. Household Files – longitudinal
      • Household Register (D-File)
        • Includes every selected household (also those where the address could not be contacted or which could not be interviewed)
        • > 19 variables: household identifier, sampling design information, region
        • UDB_l11D_ver 2011-1 from 01-08-2013: N = 542 942 households
      • Household Data (H-File)
        • Only households which have been contacted and completed a hh interview and in which at least one hh member has complete data in the personal data file
        • > 180 variables (incl. flag variables & imputation factors): basic data, social exclusion, income, housing
        • UDB_l11H_ver 2011-1 from 01-08-2013: N = 411 189 households

  19. Personal Files – longitudinal
      • Personal Register (R-File)
        • Every person currently living in the hh or temporarily absent
        • Longitudinal file: also persons registered in the R-File of the previous year or living at least 3 months in the hh during the income reference period
        • > 50 variables (incl. flag variables): basic information, e.g. relationship between household members
        • UDB_l11R_ver 2011-1 from 01-08-2013: N = 1 079 261 persons
      • Personal Data (P-File)
        • Only the reference population (persons aged 16 and over) and only persons for whom the information could be completed by interview (personal/proxy) and/or register
        • > 190 variables (incl. flag variables & imputation factors): e.g. demographic, income, work and unemployment
        • UDB_l11P_ver 2011-1 from 01-08-2013: N = 879 720 persons

  20. Depending on the research question: use of separate datasets • Household Register • Personal Register • Personal Data • Household Data

  21. … or a combination of different datasets • Household Register • Personal Register • Personal Data • Household Data

  22. Organisation of the data • For both cross-sectional (c-s) and longitudinal data, all 4 files (Household Register, Household Data, Personal Register, Personal Data) are linkable among each other • Cross-sectional and longitudinal data are not linkable with each other

  23. Organisation of the data • … likewise, cross-sectional data are not linkable over time, from t to t+1 (HH-ID and related identification variables are randomized)

  24. Organisation of the data … combine different datasets – Key Variables • In order to link (combine) the four files D, H, R and P with each other, every observation must have a unique link to the respective three other files • This link is achieved by the following 4 key variables: (1) Year of Survey (2) Country (3) Household ID (4) Personal ID

  25. Organisation of the data … combine different datasets – Key Variables • [Diagram: the four files (Household Register, Personal Register, Personal Data, Household Data) with their key variables: Year of Survey, Country, Household ID and, where shown, Personal ID]

  26. Organisation of the data: Household ID – Personal ID
      • Household ID
        • Cross-sectional (max. 6 digits) = hh number 1-999999
        • Longitudinal (max. 8 digits) = hh number 1-999999 + split number
        • Default split number = 00
      • Personal ID
        • Cross-sectional = hh-id + personal number (max. 2 digits)
        • Longitudinal = hh number + default split number (00) + personal number
      • In the longitudinal survey the Personal ID never changes, even if the person moves to a different household
      • In the cross-sectional survey, the Household ID and Personal ID may change from year to year
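
The composition of the longitudinal identifiers can be made concrete with a small sketch. It assumes, as the digit counts on the slide suggest, that the parts are concatenated as decimal digits with a two-digit split number and a two-digit personal number; the function names are hypothetical helpers, not part of any EU-SILC tool.

```python
def make_household_id(hh_number, split_number=0):
    """Longitudinal household ID: hh number followed by a 2-digit split number."""
    return hh_number * 100 + split_number


def make_personal_id(hh_number, personal_number):
    """Longitudinal personal ID: hh number + default split number (00) + personal number."""
    default_split = 0
    return (hh_number * 100 + default_split) * 100 + personal_number


def parse_personal_id(personal_id):
    """Split a longitudinal personal ID into (hh number, split number, personal number)."""
    household_part, personal_number = divmod(personal_id, 100)
    hh_number, split_number = divmod(household_part, 100)
    return hh_number, split_number, personal_number


# Assumed example: original household 1234, second person in that household.
print(make_household_id(1234))      # original household, split number 00
print(make_household_id(1234, 1))   # first split-off household
print(make_personal_id(1234, 2))
print(parse_personal_id(make_personal_id(1234, 2)))
```

Under this reading, the personal ID keeps the number of the household in which the person was first recorded, which is why it can stay constant when the person later moves to a split-off household.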

  27. The 4 key variables – illustration (longitudinal data)

  28. Combining information from two separate files at a 1:1 level

  29. Combined data
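
A 1:1 combination can be sketched as follows, taking the personal register and the personal data file as an assumed example: each person-year appears at most once in either file, so the records are matched on year, country and personal ID. The field names and values are invented for illustration, and `merge_one_to_one` is a hypothetical helper written for this sketch.

```python
def merge_one_to_one(left_rows, right_rows, keys):
    """Left-join two lists of dicts on `keys`, insisting on unique keys in both."""
    right_index = {}
    for row in right_rows:
        key = tuple(row[k] for k in keys)
        if key in right_index:
            raise ValueError(f"duplicate key in right file: {key}")
        right_index[key] = row

    seen = set()
    merged = []
    for row in left_rows:
        key = tuple(row[k] for k in keys)
        if key in seen:
            raise ValueError(f"duplicate key in left file: {key}")
        seen.add(key)
        match = right_index.get(key)
        # Keep every left record; flag whether a partner record was found.
        merged.append({**row, **(match or {}), "matched": match is not None})
    return merged


KEYS = ("year", "country", "pers_id")

register = [  # toy personal-register rows
    {"year": 2010, "country": "AT", "pers_id": 12340001, "age": 44},
    {"year": 2010, "country": "AT", "pers_id": 12340002, "age": 9},
    {"year": 2011, "country": "AT", "pers_id": 12340001, "age": 45},
]
personal = [  # toy personal-data rows (only persons aged 16 and over)
    {"year": 2010, "country": "AT", "pers_id": 12340001, "health_limit": 1},
    {"year": 2011, "country": "AT", "pers_id": 12340001, "health_limit": 2},
]

combined = merge_one_to_one(register, personal, KEYS)
for row in combined:
    print(row)
```

Keeping unmatched register records and flagging them mirrors the fact that the register holds more persons than the personal data file, for example children below the age of 16.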

  30. Combining information from two separate files at a 1:n level

  31. Combined data
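
A 1:n combination attaches one household record to each of the n persons in that household. The sketch below assumes that the person-level rows carry the household ID alongside year and country; all field names and values are invented, and `merge_one_to_many` is a hypothetical helper.

```python
def merge_one_to_many(one_rows, many_rows, keys):
    """Attach the single matching record from `one_rows` to every row of `many_rows`."""
    one_index = {tuple(row[k] for k in keys): row for row in one_rows}
    if len(one_index) != len(one_rows):
        raise ValueError("keys are not unique on the '1' side of the merge")
    merged = []
    for row in many_rows:
        match = one_index.get(tuple(row[k] for k in keys), {})
        # Household fields first, so person-level fields win on name clashes.
        merged.append({**match, **row})
    return merged


HH_KEYS = ("year", "country", "hh_id")

household_data = [  # toy household-level rows
    {"year": 2010, "country": "AT", "hh_id": 123400, "tenure": "owner"},
    {"year": 2010, "country": "AT", "hh_id": 567800, "tenure": "tenant"},
]
person_rows = [  # toy person-level rows
    {"year": 2010, "country": "AT", "hh_id": 123400, "pers_id": 12340001},
    {"year": 2010, "country": "AT", "hh_id": 123400, "pers_id": 12340002},
    {"year": 2010, "country": "AT", "hh_id": 567800, "pers_id": 56780001},
]

persons_with_household = merge_one_to_many(household_data, person_rows, HH_KEYS)
for row in persons_with_household:
    print(row)
```

The result has one row per person, so household-level values are repeated for every member; this matters for any later analysis that counts or weights households.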

  32. Use of separate sub datasets Create household level variables from personal level data, e.g. • number of current household members • persons < 18 in household • age of the youngest child in household • Number of unemployed hh-members • Highest educational level in household • …

  33. Create new household level summary variables from person level information, e.g. household size, number of children, age of youngest child (< 18 years)
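
The household-level summaries named above (household size, number of children, age of the youngest child under 18) can be derived by grouping person-level rows by household. This is a minimal sketch with invented toy data; the field names and the output variable names `hh_size`, `n_children` and `age_youngest_child` are assumptions made for the example.

```python
from collections import defaultdict

persons = [  # toy person-level rows
    {"year": 2010, "country": "AT", "hh_id": 123400, "age": 44},
    {"year": 2010, "country": "AT", "hh_id": 123400, "age": 41},
    {"year": 2010, "country": "AT", "hh_id": 123400, "age": 9},
    {"year": 2010, "country": "AT", "hh_id": 123400, "age": 3},
    {"year": 2010, "country": "AT", "hh_id": 567800, "age": 67},
]


def summarise_households(person_rows, child_age_limit=18):
    """Return one summary dict per household-year, keyed by (year, country, hh_id)."""
    ages_by_household = defaultdict(list)
    for row in person_rows:
        key = (row["year"], row["country"], row["hh_id"])
        ages_by_household[key].append(row["age"])

    summaries = {}
    for key, ages in ages_by_household.items():
        child_ages = [age for age in ages if age < child_age_limit]
        summaries[key] = {
            "hh_size": len(ages),
            "n_children": len(child_ages),
            # None marks households without any child under the age limit.
            "age_youngest_child": min(child_ages) if child_ages else None,
        }
    return summaries


household_summaries = summarise_households(persons)
for key, summary in household_summaries.items():
    print(key, summary)
```

The resulting household-level records share the keys year, country and household ID, so they can be combined with the household files in a further step.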

  34. Some reading – Documents of priority

  35. Some reading – Documents of priority
      • Guidelines_Doc65_2011.pdf
        • General technical information on sample design, weights, etc.
        • List of all variables included in the original EU-SILC data base
        • Description of (cross-sectional and longitudinal) variables
      • DIFFERENCES BETWEEN DATA COLLECTED AND UDB.doc
        • List of variables removed from or added to the User Database (UDB)
        • Methods of anonymisation
      • SILC L-2011 UDB PROBLEMS AND MODIFICATIONS.xls
      • National and EU Quality reports
        • http://epp.eurostat.ec.europa.eu/portal/page/portal/income_social_inclusion_living_conditions/quality

  36. Some reading – Documents of priority: Guidelines_Doc65_2011.pdf • Source: Guidelines_Doc65_2011.pdf

  37. Some reading – Documents of priority Flag Variable HH020_F Source: Guidelines_Doc65_2011.pdf

  38. Some reading – Documents of priority Flag Variable HH021_F Source: Guidelines_Doc65_2011.pdf

  39. Some reading – Documents of priority: Cross-sectional data 2011 • Source: UDB_c11H_ver 2011-2 from 01-08-13.dta

  40. Some reading – Documents of priority: Longitudinal data 2011 • New (HH021) • Old (HH020) • Source: UDB_l11H_ver 2011-1 from 01-08-2013.dta

  41. Some reading – Documents of priority: Example: variable included in the cross-sectional and longitudinal data • Source: Guidelines_Doc65_2011.pdf

  42. Some reading – Documents of priority: Example: variable included in the cross-sectional data only • Source: Guidelines_Doc65_2011.pdf

  43. Some reading – Documents of priority: Example: variable included in the longitudinal data only • Source: Guidelines_Doc65_2011.pdf

  44. Some reading – Documents of priority: Example: selected respondent • Source: Guidelines_Doc65_2011.pdf

  45. Some reading – Documents of priority: Differences between data collected and User Database (cross-sectional file)

  46. Some reading – Documents of priority: Differences between data collected and User Database (longitudinal file) • Source: L2011 DIFFERENCES BETWEEN DATA COLLECTED AND UDB.doc

  47. Some reading – Documents of priority: Differences between data collected and User Database (cross-sectional file)

  48. Some reading – Documents of priority: Differences between data collected and User Database (longitudinal file)

  49. Some reading – Documents of priority: SILC L-2011 UDB PROBLEMS AND MODIFICATIONS.xls • Source: SILC L-2011 UDB PROBLEMS AND MODIFICATIONS.xls
