ITEM-NON-RESPONSE AND IMPUTATION OF LABOR INCOME IN PANEL SURVEYS: A CROSS-NATIONAL COMPARISON
Joachim R. Frick and Markus M. Grabka
DIW Berlin and IZA Bonn DIW Berlin
Presentation at the IARIW 29th General Conference,
Joensuu, Finland, 22 August 2006
Professor Ian Plewis
Centre for Longitudinal Studies
Bedford Group for Lifecourse and Statistical Studies
Institute of Education, University of London
Main features of the paper:
1. Item non-response for income (as for Hawkes and Plewis)
3. Cross-national comparisons (SOEP, Germany; HILDA, Australia; BHPS, GB)
4. Imputation as used in the three studies.
Income non-response at time t predicts income non-response
at time t+1 (supported by Hawkes and Plewis).
Income non-response at time t predicts attrition at
time t+1 (also supported by Hawkes and Plewis).
More generally, the literature suggests that the more
item non-response there is at time t in any
longitudinal study, the more likely is attrition at time t+1.
This suggests that it might be worth directing more
resources at these ‘frail’ respondents.
Predictors of income non-response (combining waves
using (?) probits or logits):
A very strong effect of being self-employed: the self-
employed are very much less likely to report their
income (supported by Hawkes and Plewis), although
less so in Germany than in GB and Australia.
Is change in employment status associated with
change in response behaviour?
Two kinds of imputation methods are used:
1. Predictive mean matching from a regression model in BHPS.
2. ‘Row and column’ imputation as set out by Little and Su (1989), in HILDA and SOEP.
The authors argue, on the basis of previous research, that the second method is the better of the two.
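To make the second method concrete, here is a minimal sketch of row-and-column imputation in the spirit of Little and Su (1989). It is an illustration under stated assumptions, not the procedure actually implemented in HILDA or SOEP: the function name `little_su_impute` and the data layout are hypothetical, column (wave) effects are computed from complete cases only, donors are restricted to complete cases, and incomes are assumed to be strictly positive.

```python
from statistics import mean


def little_su_impute(panel):
    """panel: one list per respondent with an income per wave; None marks
    item non-response. Returns a completed copy (hypothetical layout)."""
    n_waves = len(panel[0])
    complete = [row for row in panel if all(v is not None for v in row)]

    # Column (wave) effects: wave mean divided by the mean of wave means,
    # here computed over complete cases only (an assumption of this sketch).
    wave_means = [mean(row[j] for row in complete) for j in range(n_waves)]
    grand_mean = mean(wave_means)
    col = [m / grand_mean for m in wave_means]

    def row_effect(row):
        # Row (person) effect: mean of observed incomes after removing
        # the wave effect.
        observed = [v / col[j] for j, v in enumerate(row) if v is not None]
        return mean(observed) if observed else None

    donors = [(row_effect(row), row) for row in complete]

    completed = []
    for row in panel:
        r_i = row_effect(row)
        if r_i is None or all(v is not None for v in row):
            # Nothing to impute, or no observed wave to build a row effect.
            completed.append(list(row))
            continue
        # Donor: the complete case whose row effect is closest.
        r_d, donor = min(donors, key=lambda d: abs(d[0] - r_i))
        # Imputed value = row effect x column effect x donor residual,
        # which simplifies to donor income x (r_i / r_d).
        completed.append([v if v is not None else donor[j] * r_i / r_d
                          for j, v in enumerate(row)])
    return completed
```

Because the donor's residual is carried over, the imputed value inherits the donor's wave-to-wave movement scaled to the recipient's own income level, which is why the method uses longitudinal information that a purely cross-sectional regression does not.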
Both are single imputation methods, presumably
devised to fill in holes in public release datasets.
However, most of the statistical literature now
favours multiple imputation in order properly to
represent the sampling variability induced by
The authors consider the effects of the imputation methods used for three issues:
1. Cross-sectional measures of inequality.
2. Longitudinal measures of income mobility.
3. Fixed effects wage regressions – are the fixed effects individuals or sweeps?
We collect panel data to measure and model change
and so we should perhaps focus on the effects of
imputation on change and on dynamic models.
The authors show that income mobility across
quintiles is considerably higher when imputed cases
are combined with observed or complete cases than
it is when using only the observed cases.
However, this difference emerges because there is
considerable mobility for the imputed cases and
some of this must be due to measurement error
generated by the imputations.
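The comparison described here can be sketched as follows. This is an illustrative calculation only: the function names, the rank-based assignment to quintiles, and the choice to recompute quintiles within each sample are assumptions of the sketch, not the authors' definitions.

```python
def quintile(values):
    """Assign each value to a quintile 0..4 by rank (ties broken by input order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    groups = [0] * n
    for rank, i in enumerate(order):
        groups[i] = rank * 5 // n
    return groups


def transition_counts(income_t, income_t1):
    """5 x 5 table of counts: rows are quintiles at t, columns quintiles at t+1."""
    q_t, q_t1 = quintile(income_t), quintile(income_t1)
    counts = [[0] * 5 for _ in range(5)]
    for a, b in zip(q_t, q_t1):
        counts[a][b] += 1
    return counts


def share_of_movers(counts):
    """Proportion of cases that change quintile between the two sweeps."""
    total = sum(sum(row) for row in counts)
    stayers = sum(counts[k][k] for k in range(5))
    return 1 - stayers / total


def mobility_with_and_without_imputed(income_t, income_t1, imputed):
    """imputed[i] is True if either income of case i was imputed.
    Returns (share of movers for all cases, share for observed cases only)."""
    observed = [i for i, flag in enumerate(imputed) if not flag]
    all_cases = share_of_movers(transition_counts(income_t, income_t1))
    observed_only = share_of_movers(transition_counts(
        [income_t[i] for i in observed],
        [income_t1[i] for i in observed]))
    return all_cases, observed_only
```

A gap between the two shares cannot by itself distinguish genuine mobility among non-respondents from noise added by the imputations, which is the concern raised above.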
A difficulty here is that the authors use cross-sectional
imputation, i.e. imputing an income value for each sweep,
whereas the real interest is in imputing mobility or
change across sweeps.
Suppose we have a panel study with just two sweeps
with income measured in quintiles at each sweep
and with item non-response at each sweep.
We have three sets of information:
1. Cases with measured income at each sweep, located in the internal 25 cells of a five by five contingency table.
2. The marginal distribution for cases measured at sweep one but not at sweep two.
3. The marginal distribution for cases measured at sweep two but not at sweep one.
Little and Rubin (2002, Ch. 13) show how to use the
EM algorithm to estimate the contingency table for
all cases, both fully and partially classified, and this
approach (or a variant of it that accounts for the
ordering of the quintiles) might be more appropriate
for this particular question.
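A minimal sketch of this idea for the two-sweep case is given below. It is written from the general description of EM for partially classified tables rather than taken from Little and Rubin's text, so the function name `em_two_way`, the starting values and the stopping rule are assumptions. It also assumes the missingness is ignorable and does not use the ordering of the quintiles.

```python
def em_two_way(full, row_only, col_only, max_iter=500, tol=1e-10):
    """Estimate joint cell probabilities of a K x K table.

    full[j][k]  : counts classified at both sweeps
    row_only[j] : counts with sweep-one category j, sweep two missing
    col_only[k] : counts with sweep-two category k, sweep one missing
    """
    K = len(full)
    n_full = sum(sum(row) for row in full)
    n = n_full + sum(row_only) + sum(col_only)

    # Starting values from the fully classified cases; 0.5 is added to each
    # cell so that no margin starts at zero (a choice made for this sketch).
    p = [[(full[j][k] + 0.5) / (n_full + 0.5 * K * K) for k in range(K)]
         for j in range(K)]

    for _ in range(max_iter):
        row_tot = [sum(p[j]) for j in range(K)]
        col_tot = [sum(p[j][k] for j in range(K)) for k in range(K)]

        # E-step: spread each partially classified count over the cells of
        # its known row (or column) in proportion to the current estimates.
        expected = [[full[j][k]
                     + row_only[j] * p[j][k] / row_tot[j]
                     + col_only[k] * p[j][k] / col_tot[k]
                     for k in range(K)] for j in range(K)]

        # M-step: re-estimate the cell probabilities from expected counts.
        new_p = [[expected[j][k] / n for k in range(K)] for j in range(K)]

        change = max(abs(new_p[j][k] - p[j][k])
                     for j in range(K) for k in range(K))
        p = new_p
        if change < tol:
            break
    return p
```

For the quintile example, `full` would be the five by five table of cases measured at both sweeps and `row_only` and `col_only` the two supplementary margins. The estimated table then describes mobility for all cases without first imputing an income level for each sweep.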
One of the interesting findings from the estimated
wage equations is that the effect of being self-
employed on wages is, for all three studies, more
positive once the imputed cases are introduced into
1. This is a very interesting and thought-provoking paper.
2. It shows that imputation for missing income responses can alter substantive conclusions about, for example, income mobility.
3. BUT the single imputation methods currently used by these panel studies are not those most favoured in the statistical literature.
4. AND imputing levels and taking differences might not be the best way of imputing for change.
5. ALSO income non-response is just one facet of missing data and ideally needs to be considered along with unit non-response at the outset of the panel and attrition as the panel ages.
6. AS ALWAYS, SENSITIVITY ANALYSES ARE CRUCIAL.