This article provides an overview of the editing and imputation methods used in the Italian censuses, including the strategies for the 2011 census and likely innovations. It discusses the impact on editing and validation procedures and highlights the use of the DIESIS system and data-driven and minimum change approaches. The article also addresses the identification of respondent paths, validation of person 1 in the household, and the importance of E&I for small but important groups in the population.
An Overview of Editing and Imputation Methods for the next Italian Censuses Gianpiero Bianchi, Antonia Manzari, Alessandra Reale UNECE-Eurostat Meeting on Population and Housing Censuses Geneva, 13-15 May, 2008
Outline • Features of 2001 E&I strategy • E&I strategy for 2011 Census • Likely innovations for 2011 Census • Impact on editing and validation procedures • Conclusions
Features of 2001 E&I strategy • Main E&I purpose: provide a complete and consistent set of data by performing plausible imputations and preserving the maximum amount of collected information • E&I strategy: divide the E&I problem into simpler sub-problems and find appropriate solutions for each of them • Overall E&I process composed of several (connected) procedures, each addressing a specific problem and implementing suitable methods • Development and use of new techniques and software tools
E&I strategy for 2011 Census • Built on the useful experience of the 2001 Census, taking account of: • The innovations in the survey design • Eurostat timeliness constraints • In particular: • Census variables split into topics processed in a pre-determined order (first demographic, then socio-economic) by appropriate procedures • Adaptation of 2001 procedures to the innovations and development of new procedures by means of highly efficient algorithms • Proper planning, implementation and management of the E&I procedures
Main elements of the 2011 strategy • Use of the DIESIS* system, developed in 2001 by ISTAT and academic researchers (Department of Computer and Systems Science of the University of Roma “La Sapienza”). Based on optimization techniques, it allows: • Treatment of qualitative and quantitative variables • Between-unit and within-unit edit rules • Joint use of data-driven and minimum change approaches • DIESIS will process the 2011 demographic variables and, likely, some socio-economic variables • * Data Imputation and Edit System - Italian Software
Main elements of the 2011 strategy • Joint use of data-driven and minimum change approaches by the DIESIS system • When the pool of donors is reduced, the data-driven approach can require imputing too many values • The minimum change approach is used to minimize the number of values to be changed
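The interplay of the two approaches can be illustrated with a small sketch. It is not the DIESIS algorithm (which the slides describe only as based on optimization techniques); it is a brute-force toy in which values are taken from the closest donor (data-driven) while the number of changed fields is kept as small as possible (minimum change). The edit rules, field names and the `impute` function are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical edit rules: each returns True when the record is consistent.
EDITS = [
    lambda r: not (r["age"] < 15 and r["marital"] == "married"),
    lambda r: not (r["age"] < 15 and r["employed"]),
]

def passes(record):
    """True when the record satisfies every edit rule."""
    return all(edit(record) for edit in EDITS)

def distance(a, b):
    """Number of fields on which two records disagree."""
    return sum(a[k] != b[k] for k in a)

def impute(record, donors):
    """Copy the fewest fields from the closest usable donor so the record passes.

    Returns (imputed_record, changed_fields), or (None, ()) if no donor helps.
    """
    if passes(record):
        return dict(record), ()
    fields = sorted(record)
    # Data-driven part: try donors from most to least similar.
    for donor in sorted(donors, key=lambda d: distance(record, d)):
        # Minimum change part: try the smallest sets of fields first.
        for size in range(1, len(fields) + 1):
            for subset in combinations(fields, size):
                candidate = {**record, **{f: donor[f] for f in subset}}
                if passes(candidate):
                    return candidate, subset
    return None, ()
```

With a failing record such as a 10-year-old reported as married and employed, and a closest donor aged 40, the sketch changes only `age` rather than both `marital` and `employed`. Exhaustive subset search is exponential in the number of fields, so a production system would need an optimization formulation instead.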
Main elements of the 2011 strategy • Identification of the respondent path • Respondent paths used to: • Compute the Subset of Admissible Values (SAV) of Year of birth, a stratification variable for the imputation of demographic variables – connection between demographic and socio-economic steps • Define strata for the imputation of socio-economic variables • Missing responses or errors can make the identification of the correct respondent path uncertain • Automatic procedure for the identification of the most likely path, based on the analysis of the responses given to filter and dependent questions
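One way to picture the identification of the most likely path is a simple agreement score between the observed responses and what each path expects. The sketch below is an assumption-laden illustration, not the ISTAT procedure: the questionnaire, the question names, the `PATHS` table and the equal-weight scoring are all hypothetical.

```python
# Hypothetical questionnaire: the filter question "worked_last_week" routes
# the respondent to one of two blocks of dependent questions.
PATHS = {
    "employed": {
        "filter": {"worked_last_week": "yes"},
        "answered": {"occupation", "sector"},
        "skipped": {"job_search"},
    },
    "not_employed": {
        "filter": {"worked_last_week": "no"},
        "answered": {"job_search"},
        "skipped": {"occupation", "sector"},
    },
}

def path_score(responses, path):
    """Count how many expectations of the path the responses satisfy."""
    score = 0
    for question, value in path["filter"].items():
        score += responses.get(question) == value      # filter answer matches
    for question in path["answered"]:
        score += responses.get(question) is not None   # expected answer present
    for question in path["skipped"]:
        score += responses.get(question) is None       # expected skip respected
    return score

def most_likely_path(responses):
    """Return the name of the path with the highest agreement score."""
    return max(PATHS, key=lambda name: path_score(responses, PATHS[name]))
```

A record with a missing filter answer but filled-in `occupation` and `sector` is assigned to the `employed` path, because the dependent questions carry the evidence. Ties are resolved by dictionary order here; a real procedure would need an explicit rule.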
Main elements of the 2011 strategy • Validation of Person 1 in the household • Based on optimization techniques implemented in the DIESIS system • The minimum change algorithm assigns the role of Person 1 to the person that minimizes the number of changes needed for the record to be consistent • Identification of potential couples • Components of couples having non-unique relationship to Person 1 identified prior to editing • Score based on the responses provided to the demographic variables
Main elements of the 2011 strategy • Special care in the E&I of small but important groups in the population, e.g. validation of centenarians • 2001 procedure: • Automatic match of individuals enumerated in the 2001 Census with the same individuals enumerated in the 1991 Census • Automatic check for internal consistency of unlinked records • Manual check of some ambiguous cases for consistency with the questionnaire images • New procedure supported by the availability of local population registers
Likely innovations for 2011 • Short-long form questionnaires • Short: (mainly) demographic variables • Long: demographic and socio-economic variables • Availability of registers • Local population registers (residing individuals) • Integrative registers from auxiliary sources • Residential address lists • Use of multi-mode data collection • Enumerators, CATI, mail, web
Impact on E&I and validation • Socio-economic characteristics collected on a sample basis (by long form) • Two procedures for computing the SAV of Year of birth (one for short form, one for long form) • The reduced pool of donors for imputation of long-form variables requires careful management of the data collection and donor pool selection phases • Sampling weights required for data validation after E&I of long-form variables
Impact on E&I and validation • Availability of registers: • Improvement of the quantitative control of the forms • Imputation of missing or inconsistent census values by matching census data and register data (record linkage procedure) • Availability of unique record identifiers • Same reference time as the census data • Good quality of register data • Imputation of missing or inconsistent census values by adding register data to census data, thereby enlarging the donor pool
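Imputation through record linkage can be sketched as an exact match on a unique identifier, followed by filling missing census values from the matched register record. This is a minimal illustration under the conditions listed on the slide (unique identifiers, same reference time, good register quality); the field names and the `impute_from_register` function are hypothetical, and only missing values are treated, not inconsistent ones.

```python
def impute_from_register(census, register, key="person_id"):
    """Fill missing (None) census values from the register record with the same key.

    Returns new records; the inputs are left unchanged.
    """
    by_id = {row[key]: row for row in register}
    result = []
    for record in census:
        fixed = dict(record)
        match = by_id.get(record[key])
        if match is not None:
            for field, value in record.items():
                # Only fill gaps; reported census values are preserved.
                if value is None and match.get(field) is not None:
                    fixed[field] = match[field]
        result.append(fixed)
    return result
```

Census records without a register match are returned as they were, so they would still have to go through donor-based imputation.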
Impact on E&I and validation • Use of multi-mode data collection • Improvement of the quality of the collected data due to editing performed at data capture (CATI, web) • A procedure for detecting duplicate questionnaires is required
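A duplicate check of this kind can be sketched by grouping the returned questionnaires on a household identifier and flagging any identifier received more than once, for instance once by mail and once by web. The `household_id` and `mode` fields and the `find_duplicates` function are assumptions for illustration; the slides do not describe how the actual procedure works.

```python
from collections import defaultdict

def find_duplicates(questionnaires, key="household_id"):
    """Map each identifier received more than once to the modes it arrived by."""
    modes_by_id = defaultdict(list)
    for questionnaire in questionnaires:
        modes_by_id[questionnaire[key]].append(questionnaire["mode"])
    return {hid: modes for hid, modes in modes_by_id.items() if len(modes) > 1}
```

Flagged households would then need a rule for choosing which questionnaire to keep, which this sketch deliberately leaves open.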
Conclusions • E&I strategy for 2011 Census based on 2001 experiences • The new survey design aims to reduce the respondent burden but requires careful monitoring during production and a more complex E&I process • Highly efficient procedures need to be developed in order to meet the timeliness requirement • E&I is an achievable but hard task