María García, Chandra Erdman, and Ben Klemens


Presentation Transcript

Multiple Imputation Methods for Imputing Earnings in the Survey of Income and Program Participation (SIPP)

María García, Chandra Erdman, and Ben Klemens

Outline
  • Background on the Survey of Income and Program Participation (SIPP)
  • Methods for missing data imputation

- Randomized Hot deck

- SRMI

  • Simulation study
  • Evaluation
  • Concluding remarks
Background on the SIPP
  • Longitudinal survey, data collected in panels with interviews at set frequencies (2-4 years)
  • Demographic characteristics, assets, liabilities, labor force participation, earnings, etc.
  • Provide comprehensive information about income and program participation
  • Evaluate federal, state, and local programs and provide measures of economic well-being
Background on the SIPP
  • Hot deck for most missing data imputation
  • Recent major redesign
  • Research ways to improve data processing.
    • Explore alternative imputation methods
    • Focus on missing monthly job-level earnings (twelve variables)
    • Sequential Regression Multivariate Imputation (SRMI, Raghunathan et al., 2001)
Sequential Regression Multivariate Imputation (SRMI)
  • Data matrix
  • Each column
  • Imputations are based on univariate distributions
  • Instead of drawing once from a joint distribution for all variables, draw from the univariate conditional distribution of each variable in turn, conditioning on the others
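The idea on this slide can be written as a factorization of the joint distribution into univariate conditionals, following the general form in Raghunathan et al. (2001). The symbols below are assumptions, since the slide's own notation did not survive: $Y_1, \dots, Y_k$ are the variables with missing values, $X$ holds the fully observed variables, and $\theta_j$ are the parameters of the $j$-th regression model.

```latex
f(Y_1, \dots, Y_k \mid X, \theta_1, \dots, \theta_k)
  = f_1(Y_1 \mid X, \theta_1)\,
    f_2(Y_2 \mid X, Y_1, \theta_2)
    \cdots
    f_k(Y_k \mid X, Y_1, \dots, Y_{k-1}, \theta_k)
```

Each factor is a univariate regression model, so a model type suited to each variable (for example linear or logistic) can be chosen separately.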
SRMI

Impute missing values sequentially conditioning on observed and imputed variables

  • Regression model
  • Impute sequentially for each variable:

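1. Draw the regression parameters $\theta$ from their posterior distribution

2. Draw the missing values from $p(Y_{mis} \mid Y_{obs}; \theta)$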
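The two-step draw can be sketched in Python for the simplest case, a normal linear regression with a flat prior. This is a minimal illustration under stated assumptions, not the `mi` package's implementation: the function names `draw_imputations` and `srmi`, the mean-fill starting values, and the fixed number of rounds are all hypothetical choices.

```python
import numpy as np

def draw_imputations(X_obs, y_obs, X_mis, rng):
    """One imputation step for a normal linear model (flat prior assumed)."""
    n, p = X_obs.shape
    xtx_inv = np.linalg.inv(X_obs.T @ X_obs)
    beta_hat = xtx_inv @ X_obs.T @ y_obs
    resid = y_obs - X_obs @ beta_hat
    # Step 1: draw the parameters (sigma^2, then beta) from their posterior.
    sigma2 = resid @ resid / rng.chisquare(n - p)
    beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
    # Step 2: draw the missing values given the drawn parameters.
    noise = rng.normal(0.0, np.sqrt(sigma2), size=X_mis.shape[0])
    return X_mis @ beta + noise

def srmi(Y, n_rounds=5, seed=0):
    """Y: 2-D float array with np.nan marking the missing entries."""
    rng = np.random.default_rng(seed)
    Y = Y.copy()
    missing = np.isnan(Y)
    # Starting values (an assumption): fill gaps with observed column means.
    col_means = np.nanmean(Y, axis=0)
    Y[missing] = np.take(col_means, np.nonzero(missing)[1])
    for _ in range(n_rounds):
        for j in range(Y.shape[1]):
            mis_j = missing[:, j]
            if not mis_j.any():
                continue
            # Condition on all other variables, observed or currently imputed.
            others = np.delete(Y, j, axis=1)
            X = np.column_stack([np.ones(len(Y)), others])
            Y[mis_j, j] = draw_imputations(X[~mis_j], Y[~mis_j, j], X[mis_j], rng)
    return Y
```

Running `srmi` several times with different seeds would give the multiple completed data sets that multiple imputation needs.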

Simulation Study
  • SRMI

- R package mi (Su et al., 2011)

- Job-level earnings indicator – logistic regression

- Where the monthly earnings indicator is imputed as positive, impute the corresponding missing earnings using SRMI

  • Hot deck

- TEA’s randomized hot deck (Klemens, 2012)

  • Multiple imputation
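For comparison, the general idea of a randomized hot deck can be sketched as follows: a record with a missing value receives the value of a randomly chosen donor from the same cell of categorical covariates. This is a generic illustration, not TEA's implementation; the function name, the record layout, and the choice to leave a gap when a cell has no donors are assumptions.

```python
import random
from collections import defaultdict

def randomized_hot_deck(records, cell_keys, target, seed=0):
    """Fill missing `target` values (None) from random donors in the same cell."""
    rng = random.Random(seed)
    # Build the donor pool for each cell from records with an observed value.
    donors = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            donors[tuple(rec[k] for k in cell_keys)].append(rec[target])
    completed = []
    for rec in records:
        rec = dict(rec)  # copy so the input records are not modified
        if rec[target] is None:
            pool = donors.get(tuple(rec[k] for k in cell_keys))
            if pool:  # assumption: leave the gap if the cell has no donors
                rec[target] = rng.choice(pool)
        completed.append(rec)
    return completed
```

Adding more covariates to `cell_keys` makes cells smaller, which is how a hot deck can run short of donors.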
Simulation Study
  • Simulation data

- Complete 2004 SIPP panel data – “true”

- Randomly select multiple sets of 10% of observations for which the job-level earnings are to be set to missing (100 repetitions)

  • Explanatory variables

- Age, sex, race, education, occupation, industry, firm size, job-type, hours, lead, lag, etc.
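The masking step of such a simulation can be sketched as below: for each repetition, a random 10% of observations is flagged to have its earnings set to missing. The function name and defaults are hypothetical, and the sketch assumes simple random sampling without replacement within each repetition, which the slides do not spell out.

```python
import numpy as np

def missingness_masks(n_obs, frac=0.10, n_reps=100, seed=0):
    """Return an (n_reps, n_obs) boolean array; True marks a value to blank out."""
    rng = np.random.default_rng(seed)
    n_missing = int(round(frac * n_obs))
    masks = np.zeros((n_reps, n_obs), dtype=bool)
    for r in range(n_reps):
        # Sample distinct observations to set to missing in this repetition.
        chosen = rng.choice(n_obs, size=n_missing, replace=False)
        masks[r, chosen] = True
    return masks
```

Because the complete data serve as the "true" values, each repetition's imputations can be compared directly with the values that were blanked out.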

Between-Imputation, Within-Imputation, and Total Variance of Mean Monthly Earnings for Some Months

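The three variance components named in this slide's title are conventionally obtained with Rubin's combining rules. A minimal sketch, assuming each of m completed data sets yields a point estimate of mean monthly earnings and an estimate of its variance; the function name is hypothetical and no figures from the study are reproduced here.

```python
import numpy as np

def combine_estimates(point_estimates, variances):
    """Rubin's rules: pooled estimate plus within, between, and total variance."""
    q = np.asarray(point_estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()            # pooled point estimate
    within = u.mean()           # average of the per-imputation variances
    between = q.var(ddof=1)     # sample variance of the point estimates
    total = within + (1 + 1 / m) * between
    return q_bar, within, between, total
```

The between-imputation term reflects the extra uncertainty due to the missing data, which a single imputation cannot capture.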

Concluding Remarks
  • Results show that the model-based approach to imputation is a feasible alternative to hot deck for imputing missing values in the SIPP and should be further explored.
  • The model can incorporate more information than the hot deck without depleting the donor pool.
  • Any available auxiliary information (e.g., administrative data) can be used.
  • Set up the model in a multiple imputation environment so we can estimate variances.
  • Disadvantage of using package mi for SRMI: computationally intensive