Loredana Di Consiglio, Marco Fortini, Stefano Falorsi ISTAT

Sampling strategy for the dual-system correction of the under-coverage in the Register Supported 2011 Italian Population Census
Q2010 European Conference on Quality in Official Statistics, Helsinki, 4-6 May 2010

Outline
• Purpose: to plan a sampling strategy that accounts for municipal under-coverage in the next Italian Census round
• Sketch of the 2011 Italian Census
• Sources of data useful in planning the Post Enumeration Survey (PES)
• Sampling strategies considered for comparison
• Construction of a fictitious but plausible population as the sampling universe for the simulations
• Results of the simulation study

Key innovations of the 2011 Italian census
• From the traditional enumeration method…
  • Search for households and people in the field
• … to a register-supported census
  • Municipal population registers used to mail out questionnaires
  • Data collection based on web, mail-back and municipal data collection centres
  • Reduction of the number of enumerators
  • Data collection from late respondents
  • Coverage evaluation activities

Coverage evaluation program
• Required for the Eurostat quality report, and in any case crucial given the extensive innovations in processes and methods
• Over-coverage: people no longer living in the municipality who are still listed in the population registers
  • Checked by interviewers when contacting late respondents
• Under-coverage: people living in the municipality who are not yet listed in the population registers
  • Supplemental lists of people
  • Extensive search in the field
  • Statistical estimation based on capture-recapture techniques

Overview of Italian census undercount
• Gross under-coverage of population registers
  • Estimated by Fortini and Gallo (2009) at about 400,000 people (up to 560,000), using administrative data and a mixture model that accounts for under-reporting in the source
• Gross under-coverage of the 2001 Census (enumeration based)
  • The 2001 Post Enumeration Survey estimated that about 800,000 people were missed
• Both estimates rely on strong assumptions
• However, this evidence makes it reasonable to use municipal population registers as the main source for enumerating households

Capture-recapture approach
• Correction of population register undercount through a second source based on independent field enumeration
  • $x_{1+}$: people listed in the municipal register
  • $\hat{x}_{+1}$: estimate of the municipal population based on a field enumeration survey in a sample of enumeration areas (EAs)
  • $\hat{x}_{11}$: estimate of the people who would have been counted by both sources if field enumeration had been carried out over the whole municipal area
  • Petersen estimator of the total population, hidden part included (Wolter, 1986): $\hat{N} = x_{1+}\,\hat{x}_{+1} / \hat{x}_{11}$
Main goal: population count estimates for each municipality
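As a minimal numerical illustration of the Petersen estimator, the sketch below plugs hypothetical counts for a single municipality into $\hat{N} = x_{1+}\hat{x}_{+1}/\hat{x}_{11}$. In the actual strategy $\hat{x}_{+1}$ and $\hat{x}_{11}$ would be survey-weighted estimates; here they are plain numbers chosen only for illustration, and the function name is hypothetical.

```python
def petersen(x_1plus: float, x_plus1: float, x_11: float) -> float:
    """Petersen (dual-system) estimate of the total population N.

    x_1plus: people listed in the municipal register (x_{1+})
    x_plus1: (estimated) people enumerated in the field (x_{+1})
    x_11:    (estimated) people counted by both sources (x_{11})
    """
    if x_11 <= 0:
        raise ValueError("x_11 must be positive")
    return x_1plus * x_plus1 / x_11


# Hypothetical counts for one municipality
n_hat = petersen(x_1plus=950, x_plus1=900, x_11=855)
missed_by_register = n_hat - 950   # people estimated to be missing from the register
print(n_hat, missed_by_register)   # 1000.0 50.0
```

In this toy case the register covers 95% and the survey 90% of the estimated population; the estimator combines the two sources under the assumption that they are independent.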

Sampling design for the 2011 Post-Enumeration Survey
• About 1,300 municipalities and 1,200,000 people will be sampled
• Two alternative two-stage sampling designs, with municipalities and enumeration areas as primary and secondary sampling units
  • Design A: strata defined by region × class of population size (less than 5,000; 5,000-20,000; 20,000-50,000; more than 50,000)
  • Design B: strata defined by groups of provinces within region × the same 4 size classes (helps reduce the bias of small area estimation, SAE)
• In both designs, municipalities are stratified and selected according to their population size
• Sampling municipalities, rather than covering all of them, is necessary to control costs
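The size-class stratification and size-proportional selection of municipalities can be sketched as follows. The class boundaries follow Design A; treating each lower bound as inclusive, and capping probabilities at 1 by iteratively taking the largest municipalities with certainty (self-representative units), are assumptions of this sketch rather than details given on the slide. Function names are hypothetical.

```python
def size_class(pop: int) -> int:
    """Size class used for stratification (lower bounds assumed inclusive):
    0: < 5,000   1: 5,000-19,999   2: 20,000-49,999   3: >= 50,000
    """
    for k, upper in enumerate((5_000, 20_000, 50_000)):
        if pop < upper:
            return k
    return 3


def pps_inclusion_probs(sizes: list[float], n: int) -> list[float]:
    """First-order inclusion probabilities for a PPS sample of n units (sizes > 0).

    Units whose probability would reach 1 are taken with certainty
    (self-representative) and the remaining sample size is spread over the rest.
    """
    pi = [0.0] * len(sizes)
    certain: set[int] = set()
    while True:
        rest = [i for i in range(len(sizes)) if i not in certain]
        n_rest = n - len(certain)
        total = sum(sizes[i] for i in rest)
        new_certain = {i for i in rest if n_rest * sizes[i] / total >= 1}
        if not new_certain:
            for i in rest:
                pi[i] = n_rest * sizes[i] / total
            for i in certain:
                pi[i] = 1.0
            return pi
        certain |= new_certain


# Hypothetical stratum: populations of five municipalities, 3 to be selected
pi = pps_inclusion_probs([120_000, 30_000, 10_000, 8_000, 2_000], n=3)
print(pi)  # the two largest are self-representative; the others get 0.5, 0.4, 0.1
```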

Estimators
• Direct estimates of census counts are available only at the level of planned domains
• Small area estimation methods are therefore needed, at least for municipalities not included in the sample
• Possible predictors available for area-level modelling:
  • Population counts from the registers
  • Demographic indicators (e.g. dependency ratios)
  • Socio-economic indicators
• In what follows we consider:
  • Direct estimation at regional level (planned domains)
  • Synthetic estimation at municipality level
  • The assumption that municipal under-coverage rates are invariant within each planned domain

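Direct estimators (for a survey count $y$, i.e. $x_{+1}$ or $x_{11}$, over the sample $s_d$ of EAs in planned domain $d$)
• Simple expansion estimator: $\hat{Y}_d = \sum_{k \in s_d} a_k\, y_k$, where $a_k = 1/\pi_k$ is the inverse of the selection probability
• Calibrated expansion estimator: $\hat{Y}_d^{cal} = \sum_{k \in s_d} w_k\, y_k$, where $w_k$ is the final (calibrated) weight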
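The two direct estimators differ only in the weights applied to the sampled EAs. The sketch below assumes, for illustration only, a one-constraint ratio calibration in which the inverse selection probabilities are rescaled to reproduce the known register total of the domain; the slide does not state which calibration constraints were actually used, and all names and numbers are hypothetical.

```python
def expansion_estimate(values, weights):
    """Weighted total: sum of weight * value over the sampled EAs."""
    return sum(w * y for w, y in zip(weights, values))


def ratio_calibrated_weights(base_weights, x_register, register_total):
    """Scale the base weights so that they reproduce the known register total
    of the domain (a simple one-constraint ratio calibration)."""
    g = register_total / expansion_estimate(x_register, base_weights)
    return [d * g for d in base_weights]


base_weights = [10.0, 10.0, 20.0]   # 1/pi for three sampled EAs (hypothetical)
x_reg = [100, 120, 50]              # register counts x_{1+} in the sampled EAs
x_srv = [95, 110, 48]               # field counts x_{+1} in the sampled EAs
simple = expansion_estimate(x_srv, base_weights)            # 3010.0
w = ratio_calibrated_weights(base_weights, x_reg, register_total=3_520)
calibrated = expansion_estimate(x_srv, w)                   # about 3311
```

Because register counts are known for every EA, they are a natural auxiliary variable; a full implementation would more likely use a generalized calibration with several constraints.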

Synthetic estimator
• Based on the assumption that under-coverage rates are invariant across municipalities belonging to the same planned domain
• For each system of weights (simple and calibrated), the coverage ratio is computed at domain level
• From these ratios, simple and calibrated synthetic estimators are obtained for the municipalities
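A minimal sketch of the synthetic estimator, assuming the domain coverage ratio is the weighted share of survey-enumerated people who are also in the register, $\hat{R}_d = \hat{x}_{11,d}/\hat{x}_{+1,d}$, and that each municipality's register count is divided by it. This definition is an assumption consistent with the Petersen form, not a formula recovered from the slide. Passing simple or calibrated weights gives the two variants; names and counts are hypothetical.

```python
def domain_coverage_ratio(weights, x_plus1, x_11):
    """Weighted estimate of register coverage in a planned domain:
    (estimated people found by both sources) / (estimated people found by the survey)."""
    num = sum(w * b for w, b in zip(weights, x_11))
    den = sum(w * s for w, s in zip(weights, x_plus1))
    return num / den


def synthetic_estimates(register_by_muni, ratio):
    """Apply the same domain-level ratio to every municipality in the domain."""
    return {m: x / ratio for m, x in register_by_muni.items()}


weights = [10.0, 10.0, 20.0]   # simple (1/pi) or calibrated weights of sampled EAs
x_plus1 = [95, 110, 48]        # survey counts in sampled EAs (hypothetical)
x_11 = [90, 104, 46]           # counted by both sources (hypothetical)
ratio = domain_coverage_ratio(weights, x_plus1, x_11)      # 2860 / 3010
estimates = synthetic_estimates({"A": 1_900, "B": 4_750}, ratio)
```

Municipalities "A" and "B" need not be in the sample: they only need to belong to the domain, which is what makes the estimator usable for non-sampled municipalities.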

Empirical study
• Based on a simulation study
• Two pseudo-populations of 335,643 Italian EAs were considered
• Sources of information:
  • 2001 Italian Census and Post Enumeration Survey
  • Administrative data on changes of residence that occurred after the 2001 census (from November 2002 to December 2005)
• For every non-empty EA belonging to the 8,101 Italian municipalities, the following counts were generated:
  • Observed count from the population register ($X_{1+}$)
  • True population count ($N$)
  • Field enumeration count ($X_{+1}$)
  • Count of people enumerated by both sources ($X_{11}$)

Assembling the pseudo-population
• For each municipality and EA, population register counts are taken from 2001 Census counts

Assigning true population counts to municipalities
• True municipal population counts: register counts inflated by $1/r$, where $r$ is the register coverage rate estimated by the model in Fortini and Gallo (2009); two different estimates of $r$ give the two pseudo-populations
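The inflation step is a single division per municipality. In this sketch the coverage rates are hypothetical values, and rounding to an integer is an assumption added so that the count can later be split among EAs.

```python
def true_municipal_count(register_count: int, r: float) -> int:
    """Inflate a register count by 1/r, where r is the register coverage rate."""
    if not 0 < r <= 1:
        raise ValueError("coverage rate r must lie in (0, 1]")
    return round(register_count / r)


# Hypothetical coverage rates for the same municipality in the two pseudo-populations
register = 9_800
n_pop1 = true_municipal_count(register, r=0.98)   # POP1
n_pop2 = true_municipal_count(register, r=0.96)   # POP2
```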

Assigning true population counts to EAs
• The true municipal count $N$ is allocated among EAs by a hierarchical Dirichlet-multinomial model, whose parameter vector $p$ is the distribution of the register population among the EAs
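One way to read the hierarchical Dirichlet-multinomial allocation: draw EA shares from a Dirichlet distribution centred on the register shares $p$, then split $N$ among EAs with a multinomial draw. The concentration parameter is not given on the slide and is a hypothetical value here; smaller values make EA shares depart more from $p$ (the Dirichlet share variance is $p_j(1-p_j)/(c+1)$), which is one lever on the EA-level variability discussed later.

```python
import random
from collections import Counter


def dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalising independent gamma draws."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]


def allocate_to_eas(n_true, register_by_ea, concentration, rng):
    """Split a municipal true count among EAs (Dirichlet-multinomial).

    register_by_ea: register counts of the (non-empty) EAs, giving the vector p
    concentration:  hypothetical Dirichlet precision; larger means closer to p
    """
    if any(x <= 0 for x in register_by_ea):
        raise ValueError("EAs must be non-empty")
    total = sum(register_by_ea)
    p = [x / total for x in register_by_ea]
    shares = dirichlet([concentration * pj for pj in p], rng)          # EA-level shares
    draws = rng.choices(range(len(shares)), weights=shares, k=n_true)  # multinomial
    counts = Counter(draws)
    return [counts.get(j, 0) for j in range(len(shares))]


rng = random.Random(2011)
n_by_ea = allocate_to_eas(10_000, [3_000, 2_500, 4_300], concentration=200.0, rng=rng)
```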

Assigning survey counts to EAs
• Survey counts: true $N$ multiplied by a coverage rate $r_s$
• $r_s$ follows a beta-binomial distribution, with parameters $\alpha$ and $\beta$ chosen to reproduce the mean and variance of 2001 PES coverage rates (5 macro-regions × 4 classes of municipal population size)
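Matching a mean and a variance is a method-of-moments fit: for a Beta distribution with mean $m$ and variance $v$, $\alpha = m\,c$ and $\beta = (1-m)\,c$ with $c = m(1-m)/v - 1$. The sketch below fits one hypothetical stratum and then draws EA survey counts as beta-binomial variables (a Beta coverage rate per EA, then a binomial count). The moment values and names are illustrative assumptions, not the 2001 PES figures.

```python
import random


def beta_params_from_moments(mean, var):
    """Method-of-moments Beta(alpha, beta) parameters for a given mean and variance."""
    if not 0 < var < mean * (1 - mean):
        raise ValueError("need 0 < var < mean * (1 - mean)")
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


def survey_count(n_true, alpha, beta, rng):
    """Beta-binomial draw: EA coverage rate r_s ~ Beta(alpha, beta),
    then each of the n_true residents is enumerated with probability r_s."""
    r_s = rng.betavariate(alpha, beta)
    return sum(rng.random() < r_s for _ in range(n_true))


# Hypothetical moments for one stratum (macro-region x size class)
alpha, beta = beta_params_from_moments(mean=0.95, var=0.0019)   # about (22.8, 1.2)
rng = random.Random(31)
true_by_ea = (3_100, 2_450, 4_450)
x_plus1_by_ea = [survey_count(n, alpha, beta, rng) for n in true_by_ea]
```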

Assigning survey counts to municipalities
• The municipal count is obtained by summing the EA counts

Assigning the number of people enumerated by both lists
• People enumerated by both lists: drawn at EA level from a hypergeometric distribution with parameters true $N$, register count and survey count
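Under independence of the two lists, the survey finds a random subset of the $N$ residents, and the overlap with the register follows a hypergeometric distribution. Its expected value, $X_{1+}X_{+1}/N$, is exactly the relation that the Petersen estimator inverts. This stdlib-only sketch simulates the draw directly; it assumes the EA register count does not exceed the true count, and the function name is hypothetical.

```python
import random


def both_lists_count(n_true, x_register, x_survey, rng):
    """Draw x_11 from a hypergeometric distribution: among n_true people,
    x_register are in the register; the survey independently finds x_survey
    of them; count how many survey finds are also in the register."""
    if not (0 <= x_register <= n_true and 0 <= x_survey <= n_true):
        raise ValueError("counts must lie between 0 and n_true")
    found_by_survey = rng.sample(range(n_true), x_survey)
    # people 0 .. x_register-1 are, by convention, those listed in the register
    return sum(1 for person in found_by_survey if person < x_register)


rng = random.Random(35)
x_11 = both_lists_count(n_true=310, x_register=300, x_survey=290, rng=rng)
```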

Assigning the number of people enumerated by both lists (municipal level)
• The municipal count is obtained by summing the EA counts
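The aggregation steps for survey counts and for people enumerated by both lists amount to a group-by sum over EAs. A minimal sketch, with hypothetical field names and counts:

```python
from collections import defaultdict

COUNT_KEYS = ("x_1plus", "n_true", "x_plus1", "x_11")


def aggregate_to_municipality(ea_records):
    """Sum EA-level counts within each municipality."""
    totals = defaultdict(lambda: dict.fromkeys(COUNT_KEYS, 0))
    for rec in ea_records:
        muni_totals = totals[rec["municipality"]]
        for key in COUNT_KEYS:
            muni_totals[key] += rec[key]
    return dict(totals)


eas = [
    {"municipality": "A", "x_1plus": 300, "n_true": 310, "x_plus1": 290, "x_11": 282},
    {"municipality": "A", "x_1plus": 250, "n_true": 262, "x_plus1": 240, "x_11": 230},
    {"municipality": "B", "x_1plus": 430, "n_true": 445, "x_plus1": 420, "x_11": 405},
]
by_muni = aggregate_to_municipality(eas)
```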

Standard deviation of coverage rates among municipalities
• About 400,000 and 900,000 missing people were generated for the pseudo-register and the pseudo-survey, respectively
• Population register variability is larger for POP2 than for POP1
• Survey variability is larger than the corresponding population register variability (because of the survey's lower coverage rate)
• Survey variability is not very close to the PES variability, although it is of the same order of magnitude
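The two summaries on this slide, the number of missing people and the spread of coverage rates across municipalities, can be computed as below. Using the population standard deviation (`pstdev`) rather than the sample one is an assumption, and the counts are hypothetical; the sketch also assumes no over-coverage, so that each source count is at most the true count.

```python
from statistics import pstdev


def coverage_summary(true_counts, source_counts):
    """Return (missing people, st. dev. of coverage rates) across municipalities.
    Assumes no over-coverage, i.e. source count <= true count."""
    rates = [x / n for x, n in zip(source_counts, true_counts)]
    missing = sum(n - x for x, n in zip(source_counts, true_counts))
    return missing, pstdev(rates)


true_n = [1_000, 2_000, 500]         # hypothetical municipalities
register = [980, 1_950, 470]
survey = [930, 1_900, 430]
reg_missing, reg_sd = coverage_summary(true_n, register)
srv_missing, srv_sd = coverage_summary(true_n, survey)
```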

Variability of coverage rates among EAs: population registers
[Figure: pseudo-coverage of the register vs EA size (left), compared with the distribution of EA coverage rates in the 2001 Italian PES, 1,098 EAs; annotation: "too many points here"]
• Simulated EAs show too many large units with very small coverage rates, which does not seem realistic in our context

Variability of coverage rates among EAs: control survey
[Figure: pseudo-coverage of the survey vs EA size (left), compared with the distribution of EA coverage rates in the 2001 Italian PES, 1,098 EAs; annotation: "too few points here"]
• In this case, simulated EAs show too few small units with small coverage rates

Simulation of the sampling space
• Four tests: designs A and B for populations 1 and 2
• Each simulation is based on 500 sample replications
• Municipalities are sampled with probability proportional to their population size
• EAs are selected by simple random sampling within municipalities
• Simple and weighted direct estimation at domain level
• Synthetic estimation at municipality level
• Population counts from the population registers are used as the benchmark for comparisons
  • They are downward biased but available at no cost
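One replication of the two-stage design can be sketched as systematic PPS selection of municipalities followed by simple random sampling of EAs, with design weight $(1/\pi_m)\,M_m/m_m$ for $M_m$ EAs in the municipality and $m_m$ sampled. Systematic PPS is one standard way to achieve given inclusion probabilities; whether the authors used it is not stated. The number of EAs per municipality and the toy frame are hypothetical.

```python
import math
import random


def systematic_pps(pi, rng):
    """Systematic selection: unit i is taken when a point u + k (k integer)
    falls in its cumulative interval; inclusion probability equals pi[i] (<= 1)."""
    u = rng.random()
    selected, cum = [], 0.0
    for i, p in enumerate(pi):
        lo, cum = cum, cum + p
        if math.floor(cum - u) > math.floor(lo - u):
            selected.append(i)
    return selected


def two_stage_sample(municipalities, pi, eas_per_muni, rng):
    """One replication: PPS municipalities, then SRS of EAs within each.
    Returns (municipality, EA id, design weight) with weight = (1/pi_m) * M_m / m_m."""
    sample = []
    for i in systematic_pps(pi, rng):
        eas = municipalities[i]["eas"]
        m = min(eas_per_muni, len(eas))
        weight = (1.0 / pi[i]) * len(eas) / m
        for ea in rng.sample(eas, m):
            sample.append((municipalities[i]["name"], ea, weight))
    return sample


munis = [{"name": f"M{j + 1}", "eas": list(range(k))} for j, k in enumerate((40, 12, 5, 4, 2))]
pi = [1.0, 1.0, 0.5, 0.4, 0.1]   # e.g. PPS on population size, two self-representative
rng = random.Random(500)
replications = [two_stage_sample(munis, pi, eas_per_muni=3, rng=rng) for _ in range(500)]
```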

Results: bias of registers vs synthetic estimates
• Main results:
  • Direct estimates perform well in terms of bias and MSE at domain level
  • Calibrated estimates outperform the simple ones in terms of MSE, for both direct and synthetic estimators
  • The less aggregated design B does not significantly improve the estimates, so only design A is shown here
  • In terms of bias, the synthetic estimator improves on the registers. The improvement decreases for larger municipalities. These results are more evident for population 1 than for population 2
  • In terms of maximum bias, the improvement is less noticeable

Bias of the synthetic estimator vs register counts: population 1, design A, by class of municipality size
[Figure panels: less than 5,000; 5,000-19,000; 20,000-49,000; 50,000 and more. Bisectors delimit the zone where synthetic estimates are better than register counts in terms of bias]
• The synthetic estimator almost always improves on the registers in terms of bias
• However, the improvement does not seem very prominent
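Assuming the panels plot register bias on one axis and synthetic-estimator bias on the other, a municipality lies in the zone between the bisectors $y = x$ and $y = -x$ exactly when its synthetic bias is smaller in absolute value than its register bias. The share of such municipalities can be counted as below; the register bias is fixed (no sampling), while the synthetic bias uses the mean over replications. Names and values are hypothetical.

```python
def share_improved(true_counts, register_counts, synthetic_means):
    """Share of municipalities where the synthetic estimator's bias is smaller
    in absolute value than the register's bias (the zone between the bisectors)."""
    improved = sum(
        abs(synthetic_means[m] - n) < abs(register_counts[m] - n)
        for m, n in true_counts.items()
    )
    return improved / len(true_counts)


true_n = {"A": 1_000, "B": 2_000, "C": 500}               # hypothetical values
register = {"A": 980, "B": 1_950, "C": 490}
synthetic_mean = {"A": 995.0, "B": 2_030.0, "C": 515.0}   # mean over replications
share = share_improved(true_n, register, synthetic_mean)  # 2/3: C is not improved
```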

Bias of the synthetic estimator vs register counts: population 2, design A, by class of municipality size
• Same conclusion for POP2, with worse results for larger municipalities

Results: MSE of synthetic and direct estimators
• The direct estimator can be applied to self-representative municipalities
  • It is reported in the table for the two classes of largest municipalities
• On average, the synthetic estimator outperforms the direct one, which does not seem useful even in sampled municipalities
• The MSE of synthetic estimates is much larger than their bias (Table 2)
• Since this does not happen in real cases, it may be evidence of excessive variability of the pseudo-populations at EA level
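Over simulation replications, the empirical MSE decomposes exactly into variance plus squared bias (when both use divisor $R$), so an MSE much larger than the bias means the variance term dominates. A minimal sketch with hypothetical replicate estimates:

```python
def mc_bias_var_mse(estimates, true_value):
    """Monte Carlo bias, variance and MSE of an estimator over R replications.
    With divisor R, MSE = variance + bias**2 exactly."""
    r = len(estimates)
    mean = sum(estimates) / r
    bias = mean - true_value
    var = sum((e - mean) ** 2 for e in estimates) / r
    mse = sum((e - true_value) ** 2 for e in estimates) / r
    return bias, var, mse


bias, var, mse = mc_bias_var_mse([101.0, 103.0, 105.0, 107.0], true_value=100.0)
print(bias, var, mse)  # 4.0 5.0 21.0
```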

Difference between synthetic and direct estimators in terms of MSE: municipalities larger than 50,000 inh.
• Most municipalities larger than 50,000 inh. show a better synthetic MSE (negative values)
• Direct and synthetic estimates are equivalent for the largest municipalities (>250,000 inh.), but only in POP1

Concluding remarks
• The sampling strategy for the next Italian Census PES is evaluated here through pseudo-populations and simulation experiments
• Synthetic estimates give a slight improvement over census counts from registers
• Although a Census PES is required by EU regulation for evaluation purposes, our present results do not support using the PES to correct Census counts
• Although not discussed here, direct estimation with calibration achieved suitable results at domain level in terms of both bias and variance
• Further developments:
  • Better definition of pseudo-populations with respect to coverage ratios among EAs
  • Model-based estimation (EBLUP) proved promising in our previous studies, carried out in a simplified framework
