ITEM-NON-RESPONSE AND IMPUTATION OF LABOR INCOME IN PANEL SURVEYS: A CROSS-NATIONAL COMPARISON
Joachim R. Frick and Markus M. Grabka
DIW Berlin and IZA Bonn; DIW Berlin
Presentation at the IARIW 29th General Conference,
Joensuu, Finland, 22 August 2006
Presented by:
Professor Ian Plewis
Centre for Longitudinal Studies
Bedford Group for Lifecourse and Statistical Studies
Institute of Education, University of London
Main features of the paper:
1. Item non-response for income (also the subject of Hawkes and Plewis)
2. Panel data
3. Cross-national comparisons (SOEP, Germany; HILDA, Australia; BHPS, GB)
4. Imputation as used in the three studies.
Income non-response at time t predicts income non-response at time t+1 (supported by Hawkes and Plewis).
Income non-response at time t predicts attrition at time t+1 (also supported by Hawkes and Plewis).
More generally, the literature suggests that the more item non-response there is at time t in any longitudinal study, the more likely is attrition at time t+1.
It might therefore be worth directing more resources at these ‘frail’ respondents.
Predictors of income non-response (combining waves, presumably using probits or logits):
A very strong effect of being self-employed: the self-
employed are very much less likely to report their
income (supported by Hawkes and Plewis), although
less so in Germany than in GB and Australia.
Is change in employment status associated with
change in response behaviour?
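One way to "combine waves" is a pooled logit with one row per person-sweep. The sketch below fits such a model by Newton-Raphson in NumPy; the function name `fit_pooled_logit` and the toy data are hypothetical, not taken from the paper, and the sketch assumes no perfect separation. Because the same person contributes several rows, standard errors in a real analysis should allow for clustering by person, which this sketch does not attempt.

```python
import numpy as np

def fit_pooled_logit(X, y, n_iter=25):
    """Logit of item non-response (y = 1 if income missing) on covariates X.

    Rows are person-sweeps pooled across waves. Returns (intercept, slopes...).
    Assumes no perfect separation; no standard errors are computed.
    """
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # fitted non-response probability
        grad = X.T @ (y - p)                        # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])   # information matrix
        beta = beta + np.linalg.solve(hess, grad)   # Newton-Raphson step
    return beta

# Toy data (hypothetical): 8 employee-sweeps with 2 non-responses,
# 4 self-employed sweeps with 2 non-responses.
self_employed = np.array([0] * 8 + [1] * 4)
income_missing = np.array([1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0])
beta = fit_pooled_logit(self_employed, income_missing)
# beta[1] is the log odds ratio of non-response for the self-employed.
```

With a single binary covariate the fit reproduces the log odds directly: the odds of non-response are 1/3 for employees and 1 for the self-employed, so `beta` is approximately (-ln 3, ln 3).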
Two kinds of imputation methods are used:
1. Predictive mean matching from a regression model in BHPS.
2. ‘Row and column’ imputation as set out by Little and Su (1989), in HILDA and SOEP.
The authors argue, on the basis of previous research, that the second method is the better of the two.
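Minimal sketches of the two approaches, under stated assumptions. `pmm_impute` uses one OLS regression and takes the observed value of the nearest donor on the predicted value; the BHPS implementation may differ in its covariates and donor rules. `row_column_impute` follows one reading of the Little and Su (1989) model, in which income is the product of a person (row) effect, a wave (column) effect and a residual, with the residual borrowed from the nearest complete case ranked by row effect. Here the column effects use all observed values and incomes are assumed positive; both function names and the toy data are hypothetical.

```python
import numpy as np

def pmm_impute(x, y):
    """Predictive mean matching with covariates x and a single nearest donor."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), x])
    observed = ~np.isnan(y)
    # OLS on the observed cases only.
    coef, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
    predicted = design @ coef
    donors = np.flatnonzero(observed)
    out = y.copy()
    for i in np.flatnonzero(~observed):
        # Donor: observed case whose predicted value is closest to case i's.
        j = donors[np.argmin(np.abs(predicted[donors] - predicted[i]))]
        out[i] = y[j]  # impute the donor's observed value, not the prediction
    return out

def row_column_impute(y):
    """Row-and-column imputation, one reading of Little and Su (1989).

    y: persons x waves array of positive incomes, np.nan where missing.
    Assumes each person has at least one observed wave and at least one
    person is observed at every wave (to act as a donor).
    """
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(y)
    # Column (wave) effects: wave mean relative to the grand mean.
    col = np.nanmean(y, axis=0) / np.nanmean(y)
    # Row (person) effects: mean of the person's wave-adjusted incomes.
    row = np.nanmean(y / col, axis=1)
    complete = observed.all(axis=1)
    donors = np.flatnonzero(complete)
    out = y.copy()
    for i in np.flatnonzero(~complete):
        # Nearest complete case on the row effect.
        j = donors[np.argmin(np.abs(row[donors] - row[i]))]
        miss = ~observed[i]
        # Row effect x column effect x donor's multiplicative residual.
        resid = y[j, miss] / (row[j] * col[miss])
        out[i, miss] = row[i] * col[miss] * resid
    return out

# Toy examples (hypothetical data).
out_pmm = pmm_impute([1.0, 2.0, 3.2, 4.0], [10.0, 20.0, np.nan, 40.0])
out_rc = row_column_impute([[10.0, 20.0], [20.0, 40.0], [18.0, np.nan]])
```

In the PMM example the third case's predicted value (32) is closest to that of the fourth case, so it receives 40.0. In both methods the imputed value carries donor information (an observed value in PMM, a multiplicative residual in row-and-column imputation), so neither is a pure regression prediction.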
Both are single imputation methods, presumably devised to fill in holes in public-release datasets.
However, most of the statistical literature now favours multiple imputation in order to represent properly the sampling variability induced by imputation.
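With multiple imputation the analysis is repeated on each of m completed datasets and the results are combined with Rubin's rules; the between-imputation variance B is the component that single imputation leaves out. A minimal sketch for a scalar estimate (the function name and numbers are hypothetical):

```python
import numpy as np

def combine_rubin(estimates, variances):
    """Combine m completed-data results with Rubin's rules.

    estimates: point estimates from each completed dataset.
    variances: their squared standard errors.
    Returns (pooled estimate, total variance).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()               # pooled point estimate
    w = u.mean()                   # within-imputation variance
    b = q.var(ddof=1)              # between-imputation variance
    total = w + (1 + 1 / m) * b    # total variance
    return q_bar, total

q_bar, total_var = combine_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

Treating a single imputed dataset as if it were complete amounts to using only the within-imputation variance, so standard errors are understated whenever B is non-negligible.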
The authors consider the effects of the imputation methods on three issues:
1. Cross-sectional measures of inequality.
2. Longitudinal measures of income mobility.
3. Fixed-effects wage regressions: are the fixed effects for individuals or for sweeps?
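If the fixed effects are for individuals (one of the two readings raised above), the within estimator removes each person's mean before running OLS. A sketch with a single regressor, such as a self-employment dummy; the function name and toy data are hypothetical.

```python
import numpy as np

def within_slope(person, x, y):
    """Fixed-effects (within) slope for one regressor, effects by person."""
    person = np.asarray(person)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    _, idx = np.unique(person, return_inverse=True)
    n = np.bincount(idx)
    # Subtract each person's own mean of x and y (the within transformation).
    x_dm = x - (np.bincount(idx, weights=x) / n)[idx]
    y_dm = y - (np.bincount(idx, weights=y) / n)[idx]
    # OLS through the origin on the demeaned data.
    return (x_dm @ y_dm) / (x_dm @ x_dm)

# Toy data: y = person effect + 2 * x, with person effects 5 and -3.
slope = within_slope([0, 0, 1, 1], [1.0, 2.0, 1.0, 3.0], [7.0, 9.0, -1.0, 3.0])
```

Persons whose x is the same at every sweep contribute nothing to the slope, so a self-employment effect estimated this way is identified only from people who change employment status.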
We collect panel data to measure and model change, and so we should perhaps focus on the effects of imputation on change and on dynamic models.
The authors show that income mobility across quintiles is considerably higher when imputed cases are combined with observed (complete) cases than when only the observed cases are used.
However, this difference emerges because there is
considerable mobility for the imputed cases and
some of this must be due to measurement error
generated by the imputations.
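The mobility comparison can be made concrete with a quintile transition table. The sketch assigns quintiles by rank within each sweep (ties broken by position, an assumption) and reports the share of cases changing quintile; the names are hypothetical. Running it once on observed cases only and once on observed plus imputed cases gives the kind of comparison the authors make.

```python
import numpy as np

def quintiles(x):
    """Quintile codes 0..4 from within-sweep ranks (ties broken by position)."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x, kind="stable"), kind="stable")
    return (ranks * 5) // len(x)

def mobility_table(y1, y2):
    """5x5 transition table between two sweeps and the share changing quintile."""
    q1, q2 = quintiles(y1), quintiles(y2)
    table = np.zeros((5, 5), dtype=int)
    np.add.at(table, (q1, q2), 1)
    share_moved = 1 - np.trace(table) / table.sum()
    return table, share_moved

incomes = np.arange(10.0)
table_same, moved_same = mobility_table(incomes, incomes)   # no mobility
table_rev, moved_rev = mobility_table(incomes, -incomes)    # complete reversal
```

If the second-sweep values are contaminated with independent noise, as imperfect imputations may be, the share moving will typically rise even when true incomes do not change, which is the measurement-error concern raised above.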
A difficulty here is that the authors use cross-sectional imputation, i.e. imputing an income value for each sweep, whereas the real interest is in imputing mobility or change across sweeps.
Suppose we have a panel study with just two sweeps
with income measured in quintiles at each sweep
and with item non-response at each sweep.
We have three sets of information:
1. Cases with measured income at each sweep, located in the internal 25 cells of a five by five contingency table.
2. The marginal distribution for cases measured at
sweep one but not at sweep two.
3. The marginal distribution for cases measured at
sweep two but not at sweep one.
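The three sets of information can be assembled from two sweeps of quintile codes. A sketch assuming codes 0 to 4 with `np.nan` for item non-response; cases missing at both sweeps are dropped because they carry no information about this table. The function name and toy data are hypothetical.

```python
import numpy as np

def split_partial_table(q1, q2, k=5):
    """Return (full k x k table, sweep-1-only margin, sweep-2-only margin)."""
    q1 = np.asarray(q1, dtype=float)
    q2 = np.asarray(q2, dtype=float)
    m1, m2 = ~np.isnan(q1), ~np.isnan(q2)
    both = m1 & m2
    # 1. Fully classified cases: the internal cells of the k x k table.
    full = np.zeros((k, k), dtype=int)
    np.add.at(full, (q1[both].astype(int), q2[both].astype(int)), 1)
    # 2. Measured at sweep one only.
    row_only = np.bincount(q1[m1 & ~m2].astype(int), minlength=k)
    # 3. Measured at sweep two only.
    col_only = np.bincount(q2[~m1 & m2].astype(int), minlength=k)
    return full, row_only, col_only

full, row_only, col_only = split_partial_table(
    [0, 1, np.nan, 4, 2], [0, np.nan, 3, 4, np.nan])
```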
Little and Rubin (2002, Ch. 13) show how to use the
EM algorithm to estimate the contingency table for
all cases, both fully and partially classified, and this
approach (or a variant of it that accounts for the
ordering of the quintiles) might be more appropriate
for this particular question.
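A minimal sketch of the EM algorithm for a two-way table with supplementary margins, in the spirit of Little and Rubin (2002, Ch. 13). It assumes the income non-response is missing at random and treats the quintiles as unordered, so the ordered variant mentioned above is not attempted; the function name is hypothetical. Each E-step shares out the partially classified cases across cells in proportion to the current conditional distribution, and each M-step re-estimates the cell probabilities from the completed counts.

```python
import numpy as np

def em_partial_table(full, row_only, col_only, tol=1e-10, max_iter=10_000):
    """Estimated counts for all cases, fully and partially classified."""
    full = np.asarray(full, dtype=float)
    row_only = np.asarray(row_only, dtype=float)
    col_only = np.asarray(col_only, dtype=float)
    total = full.sum() + row_only.sum() + col_only.sum()
    pi = np.full(full.shape, 1.0 / full.size)  # start from a uniform table
    for _ in range(max_iter):
        # E-step: allocate sweep-1-only cases by P(sweep-2 cell | sweep-1 cell)
        # and sweep-2-only cases by P(sweep-1 cell | sweep-2 cell).
        p_col_given_row = pi / pi.sum(axis=1, keepdims=True)
        p_row_given_col = pi / pi.sum(axis=0, keepdims=True)
        expected = (full
                    + row_only[:, None] * p_col_given_row
                    + col_only[None, :] * p_row_given_col)
        # M-step: cell probabilities are the expected counts over the total.
        new_pi = expected / total
        converged = np.max(np.abs(new_pi - pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi * total

est = em_partial_table([[10, 0], [0, 10]], [10, 0], [0, 0])
```

Mobility measures, such as the share of cases off the diagonal, can then be read from the estimated table instead of from cross-sectionally imputed levels.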
One of the interesting findings from the estimated wage equations is that the effect of being self-employed on wages is, for all three studies, more positive once the imputed cases are introduced into the regressions.
Concluding remarks
1. This is a very interesting and thought-provoking paper.
2. It shows that imputation for missing income responses can alter substantive conclusions about, for example, income mobility.
3. BUT the single imputation methods currently used by these panel studies are not those most favoured in the statistical literature.
4. AND imputing levels and taking differences might not be the best way of imputing for change.
5. ALSO income non-response is just one facet of missing data and ideally needs to be considered along with unit non-response at the outset of the panel and attrition as the panel ages.
6. AS ALWAYS, SENSITIVITY ANALYSES ARE CRUCIAL.