Multiple Imputation Methods for Imputing Earnings in the Survey of Income and Program Participation ...

María García, Chandra Erdman, and Ben Klemens




Presentation Transcript

Multiple Imputation Methods for Imputing Earnings in the Survey of Income and Program Participation (SIPP)

María García, Chandra Erdman, and Ben Klemens


Outline

  • Background on the Survey of Income and Program Participation (SIPP)

  • Methods for missing data imputation

    - Randomized Hot deck

    - SRMI

  • Simulation study

  • Evaluation

  • Concluding remarks


Background on the SIPP

  • Longitudinal survey, data collected in panels with interviews at set frequencies (2-4 years)

  • Demographic characteristics, assets, liabilities, labor force participation, earnings, etc.

  • Provide comprehensive information about income and program participation

  • Evaluate federal, state, and local programs and provide measures of economic well-being


Background on the SIPP

  • Hot deck for most missing data imputation

  • Recent major redesign

  • Research ways to improve data processing.

    • Explore alternative imputation methods

    • Focus on missing monthly job-level earnings (twelve variables)

    • Sequential Regression Multivariate Imputation (SRMI, Raghunathan et al., 2001)


Sequential Regression Multivariate Imputation (SRMI)

  • Data matrix

  • Each column

  • Imputations are based on univariate distributions

  • Instead of drawing from a joint distribution for $k$ variables, draw $k$ times from the univariate conditional distribution for each variable


SRMI

Impute missing values sequentially conditioning on observed and imputed variables

  • Regression model

  • Impute sequentially for each variable:

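    1. Draw the regression parameters $\theta$ from their posterior distribution $p(\theta \mid \text{observed data})$

    2. Draw the missing values from $p(Y^{\text{mis}} \mid \theta;\ \text{observed and previously imputed variables})$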
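
The two draws above can be illustrated with a minimal Python sketch. It is not the authors' implementation (the study used the R package mi). It assumes a normal linear regression with a single predictor and a noninformative prior, and all function names are hypothetical. Full SRMI would condition each variable on all other variables and cycle through the sequence several times.

```python
import math
import random


def draw_regression_params(x, y, rng):
    """Step 1: draw (intercept, slope, sigma) from the posterior of a simple
    linear regression under a noninformative prior (an assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope_hat = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept_hat = my - slope_hat * mx
    rss = sum((yi - intercept_hat - slope_hat * xi) ** 2 for xi, yi in zip(x, y))
    # sigma^2 | data ~ RSS / chi2(n - 2); chi2(df) equals Gamma(df / 2, scale 2).
    sigma = math.sqrt(rss / rng.gammavariate((n - 2) / 2.0, 2.0))
    # In the centered parametrization the two coefficients are independent.
    slope = rng.gauss(slope_hat, sigma / math.sqrt(sxx))
    center = rng.gauss(my, sigma / math.sqrt(n))
    return center - slope * mx, slope, sigma


def impute_once(x, y, rng):
    """Step 2: replace each None in y with a draw given the drawn parameters."""
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xo = [p[0] for p in observed]
    yo = [p[1] for p in observed]
    intercept, slope, sigma = draw_regression_params(xo, yo, rng)
    return [yi if yi is not None else rng.gauss(intercept + slope * xi, sigma)
            for xi, yi in zip(x, y)]


def srmi_pass(x, y1, y2, rng):
    """One sequential pass: y1 given x, then y2 given the completed y1."""
    y1_done = impute_once(x, y1, rng)
    y2_done = impute_once(y1_done, y2, rng)  # conditions on observed and imputed y1
    return y1_done, y2_done
```

Calling `srmi_pass` several times with the same data yields several completed datasets, which is what a multiple imputation analysis needs.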


Simulation Study

  • SRMI

    - R package mi (Su et al., 2011)

    - Job-level earnings indicator – logistic regression

    - If the monthly earnings indicator is imputed as positive, impute the corresponding missing earnings using SRMI

  • Hot deck

    - TEA’s randomized hot deck (Klemens, 2012)

  • Multiple imputation
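
For comparison, the basic idea behind a randomized hot deck can be sketched in a few lines of Python. This is a generic illustration, not TEA's implementation: the function name, the record layout, and the choice to leave a value missing when a cell has no donors are all assumptions made for the example.

```python
import random
from collections import defaultdict


def random_hot_deck(records, cell_keys, target, rng):
    """Fill missing `target` values with a random donor from the same cell.

    A cell is the combination of values of the variables in `cell_keys`.
    Records whose cell has no donors are left missing (an assumption).
    """
    donors = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            donors[tuple(rec[k] for k in cell_keys)].append(rec[target])

    completed = []
    for rec in records:
        rec = dict(rec)  # copy so the input records are not modified
        if rec[target] is None:
            pool = donors.get(tuple(rec[k] for k in cell_keys))
            if pool:
                rec[target] = rng.choice(pool)
        completed.append(rec)
    return completed
```

Adding more variables to `cell_keys` makes the cells finer and the donor pools smaller, which is the donor-pool depletion that a model-based approach avoids.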


Simulation Study

  • Simulation data

    - Complete 2004 SIPP panel data – “true”

    - Randomly select multiple sets of 10% of observations for which the job-level earnings are to be set to missing (100 repetitions)

  • Explanatory variables

    - Age, sex, race, education, occupation, industry, firm size, job-type, hours, lead, lag, etc.
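
The simulation design can be sketched as follows. This is a simplified illustration under stated assumptions: values are set to missing completely at random, the evaluation metric is the RMSE of the estimated mean across repetitions (the slides do not give the exact formula), and `mean_impute` is only a placeholder where SRMI or a hot deck would be plugged in. All names are hypothetical.

```python
import math
import random


def mask_at_random(values, frac, rng):
    """Set a random fraction of the values to None (treated as missing)."""
    n_missing = round(frac * len(values))
    missing = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing else v for i, v in enumerate(values)]


def mean_impute(masked, rng):
    """Placeholder imputer: fill missing values with the observed mean."""
    observed = [v for v in masked if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in masked]


def simulate_rmse_of_mean(truth, impute, frac=0.10, reps=100, seed=0):
    """RMSE of the imputed-data mean against the true mean over `reps` runs."""
    rng = random.Random(seed)
    true_mean = sum(truth) / len(truth)
    squared_errors = []
    for _ in range(reps):
        masked = mask_at_random(truth, frac, rng)
        completed = impute(masked, rng)
        estimate = sum(completed) / len(completed)
        squared_errors.append((estimate - true_mean) ** 2)
    return math.sqrt(sum(squared_errors) / reps)
```

Running `simulate_rmse_of_mean` once per imputation method on the same complete data gives RMSE values whose difference can be compared across methods.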


Average Difference in RMSE (SRMI – Hot Deck)

[Slide content not reproduced in the transcript.]


Between-Imputation, Within-Imputation, and Total Variance of Mean Monthly Earnings for Some Months

[Slide content not reproduced in the transcript.]
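
The three variance components named in this slide title are conventionally obtained with Rubin's combining rules for multiple imputation. The Python sketch below shows those standard rules; whether the authors computed their figures in exactly this way is an assumption, and the function name is hypothetical.

```python
def combine_imputations(estimates, variances):
    """Rubin's rules for m completed datasets.

    estimates: point estimate from each completed dataset (e.g. mean earnings)
    variances: estimated sampling variance of each of those estimates
    Returns (pooled estimate, within, between, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, within, between, total
```

For example, `combine_imputations([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])` gives a pooled estimate of 12, a within-imputation variance of 1, a between-imputation variance of 4, and a total variance of 1 + (4/3)(4), about 6.33.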


RMSE of Mean Monthly Earnings

[Slide content not reproduced in the transcript.]


Concluding Remarks

  • Results show that the model-based approach to imputation is a feasible alternative to hot deck for imputing missing values in the SIPP and should be further explored.

  • The model can incorporate more information than the hot deck without depleting the donor pool.

  • Possibility of using any available auxiliary information (e.g., administrative data).

  • Set up the model in a multiple imputation environment so we can estimate variances.

  • Disadvantage of using package mi for SRMI: computationally intensive.


Thank you!
