Multiple Imputation for Handling Item Nonresponse in Environmental Data

Presentation Transcript


    1. Multiple Imputation for Handling Item Nonresponse in Environmental Data. Breda Munoz, Ruben Smith, Virginia Lesser

    2. This presentation was supported under STAR Research Assistance Agreement No. CR82-9096-01 awarded by the U.S. Environmental Protection Agency to Oregon State University. It has not been formally reviewed by EPA. The views expressed in this document are solely those of the authors, and EPA does not endorse any products or commercial services mentioned in this presentation.

    3. Outline: Introduction; Multiple Imputation and Hierarchical Bayesian Mixed Model; Illustration; Summary; Future Research

    4. Introduction. Researchers using environmental data face problems of missing data: a missing observational unit (unit nonresponse); a few variables missing for an observational unit (item nonresponse). Causes of missing data: failure of the measuring instruments; inaccessibility of the site; data lost or damaged.

    5. Introduction. The impact of missing data depends on: the missing data mechanism; the fraction of complete cases; the type of parameter the researcher intends to estimate. Missing data mechanisms: missing completely at random (MCAR); missing at random (MAR); missing not at random (MNAR), also called nonignorable.

    6. Introduction. Missing at random (MAR): missingness does not depend on the unobserved response, but may depend on observed values and covariates. A model can be formulated and incorporated into the analysis techniques to explain and account for the nonresponse mechanism. Little and Rubin (2002); Lohr (2001); Lessler and Kalsbeek (2000)

    7. Introduction. Results of data analysis on singly imputed data do not reflect the missing-data uncertainty or the consequences of imputation. Schafer and Olsen (1998): analyses based on a single imputation may result in standard errors that are too small and p-values that are too small.

    8. Multiple Imputation. Multiple imputation (MI) is a well-known methodology for handling nonresponse. It incorporates the uncertainty of the missing data into the inference; replaces each missing value with several values drawn from a distribution of likely values; and generates m complete data sets, on which the same analysis procedure is performed. Final inferences combine the m individual ones (Rubin, 1987).
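The combination step cited on this slide (Rubin, 1987) can be sketched as follows. This is a minimal illustration, not the presenters' code: the function name `pool_rubin` and the five estimates and variances are hypothetical. The pooled estimate is the mean of the m completed-data estimates, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m).

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool m completed-data estimates and their variances (Rubin, 1987)."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()                   # pooled point estimate
    u_bar = u.mean()                   # within-imputation variance
    b = q.var(ddof=1)                  # between-imputation variance
    t = u_bar + (1.0 + 1.0 / m) * b    # total variance
    return q_bar, t

# Hypothetical results from m = 5 completed data sets.
est = [2.0, 2.2, 1.8, 2.1, 1.9]
var = [0.10, 0.12, 0.09, 0.11, 0.08]
q_bar, total_var = pool_rubin(est, var)
print(q_bar, total_var)
```

The square root of `total_var` is the standard error that reflects both sampling variability and missing-data uncertainty.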

    9. Multiple Imputation. Advantages of MI: different analyses can be performed with the same collection of m complete data sets while accounting for the missing data problem; high efficiency is achieved for small values of m; data sets can be analyzed using standard techniques and software available for complete data sets (Schafer and Olsen, 1998; Schafer, 1997).
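The efficiency claim can be made concrete with Rubin's (1987) result that an estimate based on m imputations has efficiency 1 / (1 + γ/m) relative to one based on infinitely many, where γ is the fraction of missing information. The sketch below assumes a hypothetical γ = 0.3; the function name is illustrative.

```python
def relative_efficiency(gamma, m):
    """Efficiency of m imputations relative to infinitely many (Rubin, 1987)."""
    return 1.0 / (1.0 + gamma / m)

# Hypothetical fraction of missing information.
gamma = 0.3
for m in (3, 5, 10, 20):
    print(m, round(relative_efficiency(gamma, m), 3))
```

With these assumed values, even m = 5 keeps the efficiency above 90 percent, which is why small m is usually considered adequate.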

    10. Multiple Imputation. Let [expression missing from the transcript] be a probabilistic sample. Missing data occurred in $n_1$ of the $n$ sites. Define the response indicator: $R(s) = 1$ if the value of $z(s)$ was observed at site $s$, and $R(s) = 0$ otherwise.
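As a small illustration of the response indicator, the sketch below assumes that missing values of z(s) are stored as NaN in a NumPy array; the six values are hypothetical.

```python
import numpy as np

# Hypothetical measurements z(s) at n = 6 sites; NaN marks item nonresponse.
z = np.array([0.8, np.nan, 1.3, np.nan, 2.1, 0.5])

R = (~np.isnan(z)).astype(int)   # R(s) = 1 if z(s) was observed, 0 otherwise
n = z.size                       # number of sampled sites
n1 = int((R == 0).sum())         # number of sites with missing data
print(R, n, n1)
```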

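    11. Multiple Imputation. Under the MAR assumption, imputations for the missing values $z_{mis}$ are obtained from the posterior predictive distribution of the missing data given the observed values $z_{obs}$: $P(z_{mis} \mid z_{obs}) = \int P(z_{mis} \mid z_{obs}, \theta)\, P(\theta \mid z_{obs})\, d\theta$. Valid inferences from MI: the imputation model should preserve the relationships in the data that will be considered at the analysis stage (Schafer, 1997; Rubin, 1996).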

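    12. Multiple Imputation. Note: the posterior of $\theta$ given the observed data $z_{obs}$ is $P(\theta \mid z_{obs}) \propto L(\theta \mid z_{obs})\, \pi(\theta)$, where $L(\theta \mid z_{obs})$ is the observed-data likelihood and $\pi(\theta)$ is some prior for $\theta$ (Schafer, 1997; Little and Rubin, 2002).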

    13. Hierarchical Bayesian Models

    14. Illustration. Data: Oregon Stream Habitat Surveys, conducted every year from June through September. Surveys are designed to assess all streams within the range of Coho salmon. Target population: all streams located in watersheds in western Oregon that drain into the Pacific Ocean south of the Columbia River.

    15. Illustration. Sites were selected using a Random Tessellation Stratified (RTS) design (Stevens, 1997). Variable: average unit gradient (represents the overall steepness of the stream channel within each habitat unit throughout the reach). Log(Gradient + 0.001) is approximately normal.

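    18. Illustration. ODFW habitat surveys, 1998-2002: $n = 647$ observed; $n_1 = 75$ missing (spawner surveys from year 2000 without habitat variables). Model: $Y(s_i) \mid Z(s_i) \sim$ independent $N(\mu, \sigma_e^2 I)$; $Z \sim \mathrm{MVN}[0, \sigma_z^2 R(?)]$, where $R(s_i, s_j) =$ [expression missing from the transcript].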
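To make the hierarchical structure concrete, the sketch below simulates data from a model of this form. Several pieces are assumptions, because the transcript lost them: the correlation function is taken to be exponential, exp(-d/phi), the conditional mean of Y(s_i) is taken to be mu + Z(s_i), and all parameter values and site coordinates are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical site coordinates on a 10 x 10 region.
n_sites = 50
coords = rng.uniform(0.0, 10.0, size=(n_sites, 2))

# Pairwise Euclidean distances between sites.
diff = coords[:, None, :] - coords[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Assumed parameter values (not from the presentation).
mu, sigma2_z, sigma2_e, phi = -3.0, 1.0, 0.25, 2.0

# Assumed exponential spatial correlation between sites.
corr = np.exp(-dist / phi)

# Spatial random effect Z ~ MVN(0, sigma2_z * corr).
z = rng.multivariate_normal(np.zeros(n_sites), sigma2_z * corr)

# Observations: independent normal errors around mu + Z(s_i) (assumed mean).
y = mu + z + rng.normal(0.0, np.sqrt(sigma2_e), size=n_sites)
```

Nearby sites share similar values of `z`, which is what allows observed sites to inform imputations at sites with missing values.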

    19. Illustration. Parameter priors: ?, ? ~ Uniform($a_i$, $b_i$), where $i$ = ?, ?; $\sigma_z^2$, $\sigma_e^2$ ~ Inverse Gamma($a_i$, $b_i$), where $i = z, e$. Joint posterior distribution: [expression missing from the transcript].

    20. Illustration. MCMC methods were used to draw samples from the posterior and marginal distributions: Gibbs sampler; Metropolis-Hastings. The MCMC simulation was run for 15,000 iterations after a 10,000-iteration burn-in period.
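The sampling scheme can be illustrated with a deliberately simplified random-walk Metropolis-Hastings sampler. This is not the presenters' sampler: it targets the posterior of a single normal mean with known unit variance and a flat prior, on hypothetical data, and only borrows the run lengths from the slide (10,000 burn-in iterations followed by 15,000 retained iterations).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 100 draws from N(-3, 1).
data = rng.normal(loc=-3.0, scale=1.0, size=100)

def log_post(mu):
    """Log posterior of mu up to a constant (flat prior, known unit variance)."""
    return -0.5 * np.sum((data - mu) ** 2)

burn_in, n_keep = 10_000, 15_000
mu_cur = 0.0
lp_cur = log_post(mu_cur)
draws = np.empty(n_keep)

for it in range(burn_in + n_keep):
    mu_prop = mu_cur + rng.normal(0.0, 0.3)        # symmetric random-walk proposal
    lp_prop = log_post(mu_prop)
    # Accept with probability min(1, posterior ratio).
    if np.log(rng.uniform()) < lp_prop - lp_cur:
        mu_cur, lp_cur = mu_prop, lp_prop
    if it >= burn_in:                              # keep only post-burn-in draws
        draws[it - burn_in] = mu_cur

print(draws.mean(), draws.std())
```

In a full hierarchical model, a Gibbs sampler would cycle through the parameters whose full conditionals have closed forms and use a Metropolis-Hastings step like this one for the rest.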

    23. Illustration. Prediction at location $s_0$: [expression not given in the original slide].
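Since the slide's expression is not available, the following is only a sketch of one standard way to predict at a new location under a Gaussian spatial model with known parameters: the conditional (simple kriging) mean and variance. It assumes an exponential correlation exp(-d/phi) and a response of the form mu + Z(s) + error; the function name, coordinates, and parameter values are hypothetical.

```python
import numpy as np

def predict_at(s0, coords, y, mu, sigma2_z, sigma2_e, phi):
    """Conditional mean and variance of Y(s0) given y, parameters known."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    cov_yy = sigma2_z * np.exp(-d / phi) + sigma2_e * np.eye(len(y))
    d0 = np.linalg.norm(coords - s0, axis=1)
    c0 = sigma2_z * np.exp(-d0 / phi)      # Cov(Y(s0), Y(s_i)) for s0 not a site
    w = np.linalg.solve(cov_yy, c0)        # kriging weights
    mean = mu + w @ (y - mu)
    var = sigma2_z + sigma2_e - w @ c0
    return mean, var

# Hypothetical observed sites and log-gradient values.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([-3.2, -2.9, -3.1])
mean, var = predict_at(np.array([0.5, 0.5]), coords, y,
                       mu=-3.0, sigma2_z=1.0, sigma2_e=0.25, phi=2.0)
print(mean, var)
```

For multiple imputation, one would draw from a normal distribution with this mean and variance, using parameter values sampled from the posterior, rather than plug in the mean itself.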

    24. Future Research. Implement MI under other distributions, such as Gamma, Poisson, and Bernoulli. Incorporate auxiliary variables into the systematic part of the model. Explore MI with other methods of geostatistical analysis. Explore imputation using the mean of the posterior predictive distribution.

    26. Thanks to Phil Larson, Steve Jacobs, Kim Jones, Jeff Rodgers and Andy Talavere for providing data, interpretation and useful comments.
