Enhancing Small Area Estimation Methods Applications to Istat’s Survey Data

Ranalli M.G. ~ Università di Perugia

D’Alo’ M., Di Consiglio L., Falorsi S., Solari F. ~ Istat

Pratesi M., Salvati N. ~ Università di Pisa

Q2008 ~ Rome, July 11th


Outline

  • Italian Labour Force Survey

  • Standard small area estimators for LFS

  • Small area estimators that incorporate spatial information

  • Model based direct estimator (MBDE)

  • Semi-parametric models (based on p-splines)

  • Experimental study

  • Analysis of results

  • Final remarks


Labour Force Survey description

  • The Labour Force Survey (LFS) is a quarterly two-stage survey with partial overlap of sampling units according to a 2-2-2 rotation scheme.

  • In each province the municipalities are classified as either Self-Representing Areas (SRAs) or Non Self-Representing Areas (NSRAs).

  • From each SRA a sample of households is selected.

  • In NSRAs the sample is based on a stratified two-stage sampling design. The municipalities are the Primary Sampling Units (PSUs), while the households are the Secondary Sampling Units (SSUs).

  • Each quarterly sample involves about 1,350 municipalities and 200,000 individuals.
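The 2-2-2 rotation scheme mentioned above can be illustrated with a small sketch. It assumes the usual reading of the pattern (a household is interviewed for two consecutive quarters, left out for two, then interviewed for two more) and rotation groups of equal size; the function names are hypothetical and quarters are simply numbered with integers.

```python
# Offsets (in quarters) at which a household is interviewed after entering
# the sample under a 2-2-2 scheme: in, in, out, out, in, in.
ROTATION_OFFSETS = (0, 1, 4, 5)

def quarters_in_sample(entry_quarter):
    """Quarters in which a household entering at `entry_quarter` is interviewed."""
    return [entry_quarter + k for k in ROTATION_OFFSETS]

def groups_in_sample(quarter):
    """Rotation groups (labelled by entry quarter) interviewed at `quarter`."""
    return {quarter - k for k in ROTATION_OFFSETS}

def overlap_share(q1, q2):
    """Share of the rotation groups at q1 that are also interviewed at q2.

    Assumes all rotation groups have the same size.
    """
    g1, g2 = groups_in_sample(q1), groups_in_sample(q2)
    return len(g1 & g2) / len(g1)

# Overlap between consecutive quarters and between the same quarter
# of two consecutive years.
print(overlap_share(10, 11), overlap_share(10, 14))
```

Under these assumptions half of the sample is shared between consecutive quarters and between the same quarter one year apart, while quarters two apart share no units.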


Small area estimation on LFS

  • Since 2000, Istat has disseminated yearly LFS estimates of employed and unemployed counts for the 784 Local Labour Market Areas (LLMAs).

  • LLMAs are unplanned domains obtained as clusters of municipalities that cut across provinces, which are the finest planned domains of the LFS.

  • The direct estimates are unstable because LLMA sample sizes are very small (more than 100 LLMAs have zero sample size), so SAE methods are necessary.

  • Until 2003, a design-based composite-type estimator was adopted.

  • Since 2004, after the redesign of the LFS sampling strategy, a unit-level EBLUP estimator with spatially autocorrelated random area effects has been used.


Standard small area estimators – design based

Direct and GREG estimator

  • The direct estimator uses only the sampling weights and the responses observed in the area.

  • The GREG estimator is based on the standard linear model and can be expressed as an adjustment of the direct estimator for differences between the sample and population area means of the covariates.
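The slide formulas are not preserved in this transcript. As a sketch only, the textbook forms of the two estimators for an area mean are given below; the notation is assumed here (s_d is the sample in area d, w_j the sampling weight, \bar{X}_d the known population mean of the covariates in area d), and the direct estimator is written in its ratio (Hájek-type) form, which may differ from the one on the original slide.

```latex
\begin{align*}
\hat{\bar{Y}}_d^{DIR} &= \frac{\sum_{j \in s_d} w_j\, y_j}{\sum_{j \in s_d} w_j}
  && \text{(direct estimator)} \\
y_j &= x_j^{\top}\beta + e_j
  && \text{(standard linear model)} \\
\hat{\bar{Y}}_d^{GREG} &= \hat{\bar{Y}}_d^{DIR}
  + \left(\bar{X}_d - \hat{\bar{X}}_d^{DIR}\right)^{\top}\hat{\beta}
  && \text{(GREG as an adjusted direct estimator)}
\end{align*}
```

Here \hat{\bar{X}}_d^{DIR} is the direct estimator applied to the covariates, so the correction term vanishes when the sample and population area means of the covariates coincide.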

Standard small area estimators – model based

Unit level Synthetic and EBLUP

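  • The Synthetic estimator is based on a unit level linear mixed model with unit-specific auxiliary variables and random area effects, and uses only the estimated fixed part of the model.

  • The EBLUP estimator assumes the same model but adds the predicted random area effect to the fixed part.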
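For reference, a sketch of the standard unit level (nested error) model and of the two predictors in textbook form. The notation is assumed, not taken from the slides: \bar{y}_d and \bar{x}_d are unweighted sample means in area d, n_d is the area sample size, and the EBLUP is written in the approximate form that holds when the area sampling fraction is negligible.

```latex
\begin{align*}
y_{dj} &= x_{dj}^{\top}\beta + u_d + e_{dj},
  \qquad u_d \sim N(0, \sigma_u^2), \quad e_{dj} \sim N(0, \sigma_e^2) \\
\hat{\bar{Y}}_d^{SYNTH} &= \bar{X}_d^{\top}\hat{\beta} \\
\hat{\bar{Y}}_d^{EBLUP} &\approx \bar{X}_d^{\top}\hat{\beta} + \hat{u}_d,
  \qquad \hat{u}_d = \hat{\gamma}_d\left(\bar{y}_d - \bar{x}_d^{\top}\hat{\beta}\right),
  \qquad \hat{\gamma}_d = \frac{\hat{\sigma}_u^2}{\hat{\sigma}_u^2 + \hat{\sigma}_e^2 / n_d}
\end{align*}
```

For a non-sampled area no residual information is available, so the predicted effect is zero and the EBLUP reduces to the synthetic estimator.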


Enhanced small area estimators

1. Unit level EBLUP with spatial correlation of area effects

  • The EBLUP-S estimator is based on a unit level linear mixed model in which the random area effects are spatially correlated.

  • The matrix A, which defines the covariance structure of the area effects, depends on the distances among the areas and on an unknown parameter connected to the spatial correlation coefficient among the areas.
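The exact form of A is not preserved in the transcript. A minimal sketch, assuming an exponential decay of correlation with Euclidean distance between area centroids (one common choice, not necessarily the one used by the authors); `alpha` is a hypothetical name for the unknown range parameter, which in practice would be estimated together with the variance components.

```python
import numpy as np

def spatial_corr_matrix(coords, alpha):
    """Exponential-decay correlation matrix between areas.

    coords: (D, 2) array of area centroid coordinates (assumed input).
    alpha:  positive range parameter; larger values mean slower decay.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise Euclidean distances
    return np.exp(-dist / alpha)

# Three areas on a line: the first two are close, the third is far away.
A = spatial_corr_matrix([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]], alpha=2.0)
print(np.round(A, 3))
```

Nearby areas get a correlation close to one and distant areas a correlation close to zero, so the predicted effect of a small or non-sampled area can borrow strength from its neighbours.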


Enhanced small area estimators

2. Model Based Direct Estimator (Chambers & Chandra, 2006)

  • The MBD estimator is based on a unit level linear mixed model and is given by a weighted mean of the sample values in the area, where the weights are such that the weighted sample sum is the (E)BLUP of the population total under the model (Royall, 1976).

  • Calibrated with respect to the total of x.

  • Reduces bias vs EBLUP

  • Does not allow estimation for non-sampled areas

  • Less efficient than EBLUP
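The weighting idea and the calibration property can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the weights are the BLUP-type weights for a population total under a general linear model with known covariance matrix V, the variance components are treated as known (in practice they are estimated, which gives the EBLUP weights), and the population, sample and all names are hypothetical.

```python
import numpy as np

def blup_weights(X_s, V_ss, V_sr, tx_pop):
    """Sample weights w such that w @ y_s is the BLUP of the population total.

    X_s: (n, p) sample covariates; V_ss: (n, n) covariance of sampled units;
    V_sr: (n, N - n) covariance between sampled and non-sampled units;
    tx_pop: (p,) population totals of the covariates.
    """
    ones_s = np.ones(X_s.shape[0])
    ones_r = np.ones(V_sr.shape[1])
    Vinv_X = np.linalg.solve(V_ss, X_s)                # V_ss^{-1} X_s
    H = np.linalg.solve(X_s.T @ Vinv_X, Vinv_X.T)      # GLS projector, (p, n)
    a = np.linalg.solve(V_ss, V_sr @ ones_r)           # V_ss^{-1} V_sr 1_r
    return ones_s + H.T @ (tx_pop - X_s.T @ ones_s) + a - H.T @ (X_s.T @ a)

def mbd_mean(y_s, w, area_s, d):
    """Weighted direct estimate of the mean of area d."""
    m = area_s == d
    return (w[m] @ y_s[m]) / w[m].sum()

# Toy population: 3 areas of 20 units, 5 units sampled per area.
rng = np.random.default_rng(0)
area = np.repeat(np.arange(3), 20)
N = area.size
X = np.column_stack([np.ones(N), rng.uniform(0, 10, N)])
Z = (area[:, None] == np.arange(3)).astype(float)      # area indicators
sigma2_u, sigma2_e = 1.0, 4.0                          # assumed known here
V = sigma2_e * np.eye(N) + sigma2_u * (Z @ Z.T)
y = X @ np.array([2.0, 0.5]) + Z @ rng.normal(0, 1, 3) + rng.normal(0, 2, N)

s_idx = np.concatenate(
    [rng.choice(np.flatnonzero(area == d), 5, replace=False) for d in range(3)]
)
r_idx = np.setdiff1d(np.arange(N), s_idx)
w = blup_weights(X[s_idx], V[np.ix_(s_idx, s_idx)], V[np.ix_(s_idx, r_idx)],
                 X.sum(axis=0))
estimates = [mbd_mean(y[s_idx], w, area[s_idx], d) for d in range(3)]
```

The weighted sample totals of the covariates reproduce the population totals, which is the calibration property listed above. Because the estimate for an area uses only the sample units of that area, nothing can be computed when the area has no sample.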


Enhanced small area estimators

3. Nonparametric EBLUP (Opsomer et al., 2008)

The literature offers many nonparametric regression methods (kernel, local polynomial, wavelets…), but they are difficult to incorporate in a small area model.

Methods based on penalized splines (Eilers and Marx, 1996; Ruppert et al., 2003) can be estimated by means of mixed models, which makes them a promising candidate for SAE methods.

  • Great flexibility in the definition of the model

  • Estimable with existing software using REML

  • Hard to estimate efficiency and to test the significance of terms (via bootstrap?)
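The link between penalized splines and mixed models can be sketched as follows. The example assumes a truncated linear basis and a fixed smoothing parameter `lam` (in the mixed model reading, the ratio of the error variance to the variance of the spline coefficients, which REML would estimate); the function names and the simulated data are hypothetical.

```python
import numpy as np

def design_matrix(x, knots):
    """Fixed part [1, x] plus truncated linear spline basis (x - knot)_+."""
    X = np.column_stack([np.ones_like(x), x])
    Z = np.maximum(x[:, None] - knots[None, :], 0.0)
    return np.hstack([X, Z])

def fit_pspline(x, y, knots, lam):
    """Penalized least squares: only the spline coefficients are shrunk.

    For a given lam this coincides with the BLUP of a mixed model in which
    the spline coefficients are i.i.d. random effects.
    """
    C = design_matrix(x, knots)
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))   # no penalty on the linear part
    return np.linalg.solve(C.T @ C + lam * D, C.T @ y)

# Simulated smooth signal observed with noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 200)
truth = np.sin(2 * np.pi * x)
y = truth + rng.normal(0.0, 0.1, x.size)

knots = np.quantile(x, np.linspace(0, 1, 22)[1:-1])   # 20 interior knots
theta = fit_pspline(x, y, knots, lam=0.01)
fitted = design_matrix(x, knots) @ theta
rmse = np.sqrt(np.mean((fitted - truth) ** 2))
```

In a small area model the spline columns are simply added to the random effects design matrix next to the area effects, which is why existing mixed model software can be used.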


LFS empirical study

A simulation study on the LFS has been carried out to estimate the unemployment rate at LLMA level.

  • 500 two-stage LFS samples have been drawn from the 2001 census data set.

  • The performance of the methods has been evaluated for the estimation of the unemployment rate in the 127 LLMAs belonging to the geographical area “Center of Italy”.

  • GREG, Synthetic and EBLUP small area estimators have been applied considering two different sets of auxiliary variables:

    Case A – LFS real covariates = sex by 14 age classes + employment indicator at previous census;

    Case B – LFS real covariates + geographic coordinates (latitude and longitude of the municipality the sampling unit belongs to).


Enhanced small area estimators

  • Spatial EBLUP: a spatial correlation in the variance matrix of the random effects has been considered (EBLUP SP), with Case A covariates.

  • MBD: model based direct estimation is performed on sampled LLMAs, while a synthetic estimator based on the unit level linear mixed model is used for non-sampled LLMAs (Case A covariates).

  • Nonparametric EBLUP: two semiparametric representations based on penalized splines have been applied (fitted as additional random effects):

    • geographical coordinates of the municipality (EBLUP-SPLINE SP): this allows for a finer representation of the spatial component vs EBLUP SP (at municipality level instead of LLMA).

    • age (EBLUP-SPLINE AGE & EBLUP SP-SPLINE AGE)


Evaluation Criteria

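For each LLMA, over the simulated samples:

  • % Relative Bias (RB)

  • % Relative Root Mean Squared Error (RRMSE)

Summarised over the LLMAs:

  • Average Absolute RB

  • Average RRMSE (ARRMSE)

  • Maximum Absolute RB

  • Maximum RRMSE (MRRMSE)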
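The formulas for these criteria are not preserved in the transcript. The sketch below uses their usual definitions in simulation studies of this kind, which is an assumption about the exact form on the slide; the function name and the keys AARB and MARB are shorthand introduced here, while ARRMSE and MRRMSE follow the abbreviations used in the analysis of results.

```python
import numpy as np

def evaluation_criteria(estimates, truth):
    """Bias and error summaries over R simulated samples and D areas.

    estimates: (R, D) array, one row per simulated sample.
    truth:     (D,) array of true area values (must be non-zero).
    """
    estimates = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    rel_err = (estimates - truth) / truth             # relative error per replicate
    rb = 100 * rel_err.mean(axis=0)                   # % relative bias, per area
    rrmse = 100 * np.sqrt((rel_err ** 2).mean(axis=0))  # % relative RMSE, per area
    return {
        "RB": rb,
        "RRMSE": rrmse,
        "AARB": np.abs(rb).mean(),    # average absolute RB over areas
        "ARRMSE": rrmse.mean(),       # average RRMSE over areas
        "MARB": np.abs(rb).max(),     # maximum absolute RB over areas
        "MRRMSE": rrmse.max(),        # maximum RRMSE over areas
    }

# Two replicates, two areas: the first area is overestimated on average,
# the second is always estimated exactly.
crit = evaluation_criteria([[1.2, 2.0], [1.0, 2.0]], truth=[1.0, 2.0])
```

The averages describe overall performance across areas, while the maxima flag the worst-performing area, which is why the two can rank estimators differently.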


Results – A: LFS covariates; B: A + geographic coordinates of the municipality


Analysis of results

  • GREG, SYNTH and EBLUP perform better in terms of bias in case B, where geographical information enters the fixed part of the model.

  • In terms of MSE, the standard estimators in case A outperform those in case B if the ARRMSE is taken as the overall evaluation criterion, while case B gives better results if the MRRMSE is considered.

  • Area level estimators (not shown here) perform slightly better in terms of bias but much worse in terms of MSE.


Analysis of results

  • EBLUP SP can be compared with the unit level EBLUP with geographical information included as covariates and the EBLUP-SPLINE SP.

    • EBLUP SP shows better performance in terms of MSE, while the unit level EBLUP outperforms the other estimators in terms of bias.

    • The EBLUP-SPLINE SP displays performances in between the other estimators.

  • EBLUP-SPLINE AGE performs similarly to the unit level EBLUP in Case A.

    • Using age in a nonparametric way is an alternative use of the auxiliary information. Compared with case A, the model is more parsimonious.

  • As expected, MBDE shows better results in terms of bias but performs worse in terms of MSE than the other SAE methods.

  • Using the autocorrelation structure together with the spline on the variable age does not improve performance.


Final remarks

  • The model group is a small portion of Italy (the Center); hence the area-specific effects are smaller than they would be if a single model were fitted to the whole country. The introduction of geographical information should be analyzed with a larger model group.

  • Sensitivity to the choice of smoothing parameters in the spline approach has to be investigated.

  • The introduction of the sampling weights should be considered, to try to achieve benchmarking with the direct estimates produced at regional level.

  • The response is a 0-1 variable: a logistic mixed model is currently being investigated.

