Missing Data in Research

Missing Data: Five Practical Guidelines R22013 Prakriti Sinha

Missing Data Levels
• Item-level missingness: answering only j out of J possible items on a scale
• Construct-level missingness: answering zero items on a construct (the entire scale)
• Person-level missingness: failure of a person to return the survey
• Missing data levels are nested:
  - Item-level missingness can aggregate into construct-level missingness
  - Construct-level missingness can aggregate into person-level missingness
• The choice of an appropriate missing data technique can depend on the level of missingness

Missing Data Are Partly Unavoidable, and Partly Avoidable
• Much missing data are avoidable, for example by:
  - Distributing surveys personally
  - Using identification numbers
  - Personalizing the survey invitation
  - Obtaining university sponsorship of the survey
  - Giving advance notice
• Some missing data are natural and unavoidable:
  - They are a consequence of the ethical principle of respect for persons
  - Members of the target population are allowed to opt out of the study autonomously

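3 Missing Data Mechanisms
Data can be missing:
- Randomly (MCAR)
- Systematically (MAR, MNAR)
In the definitions below, R_miss(Y) indicates whether Y is missing, and X is another, observed variable.
• MCAR (missing completely at random): R_miss(Y) is not related to X or Y
• MAR (missing at random): R_miss(Y) is related to X, but is not related to Y after controlling for X
• MNAR (missing not at random): R_miss(Y) is related to Y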
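The three mechanisms can be made concrete with a small simulation. The sketch below is illustrative only: the data, the 0.6 slope, and the missingness probabilities are all hypothetical choices, not values from the presentation. It generates X and Y, deletes Y values under each mechanism, and compares the mean of the observed Y values with the full-data mean.

```python
import random
import statistics

rng = random.Random(42)
n = 20000

# Hypothetical data: X is always observed, Y depends partly on X.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.6 * xi + rng.gauss(0, 0.8) for xi in x]


def observed_mean(y_values, is_missing):
    """Mean of Y among the cases where Y was actually observed."""
    return statistics.mean([v for v, m in zip(y_values, is_missing) if not m])


# MCAR: every case has the same 30% chance of losing Y.
mcar = [rng.random() < 0.3 for _ in range(n)]
# MAR: the chance of losing Y depends only on the observed X.
mar = [rng.random() < (0.5 if xi > 0 else 0.1) for xi in x]
# MNAR: the chance of losing Y depends on the value of Y itself.
mnar = [rng.random() < (0.5 if yi > 0 else 0.1) for yi in y]

full_mean = statistics.mean(y)
mcar_mean = observed_mean(y, mcar)
mar_mean = observed_mean(y, mar)
mnar_mean = observed_mean(y, mnar)

print(f"full data: {full_mean:.3f}")
print(f"MCAR:      {mcar_mean:.3f}")
print(f"MAR:       {mar_mean:.3f}")
print(f"MNAR:      {mnar_mean:.3f}")
```

In this setup the MCAR mean should stay close to the full-data mean, while the MAR and MNAR means drift downward because high-X or high-Y cases are lost more often.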

Missing Data Treatments
• Multiple Imputation
• Maximum Likelihood
• Sensitivity Analysis
• Listwise Deletion
• Pairwise Deletion
• Single Imputation

Missing Data Treatments: Listwise Deletion
• Delete the entire row (person level) for any person with missing data, then proceed with the analysis
• This procedure converts item-level and construct-level missingness into person-level missingness!
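A minimal sketch of what listwise deletion does to a dataset, assuming survey rows are stored as dictionaries with None marking a missing response. The variable names and values are hypothetical.

```python
# Hypothetical survey rows; None marks a missing response.
rows = [
    {"id": 1, "satisfaction": 4, "commitment": 5},
    {"id": 2, "satisfaction": None, "commitment": 3},
    {"id": 3, "satisfaction": 2, "commitment": None},
    {"id": 4, "satisfaction": 5, "commitment": 4},
]


def listwise_delete(records):
    """Keep only the records that have no missing values at all."""
    return [r for r in records if all(v is not None for v in r.values())]


complete = listwise_delete(rows)
print(len(rows), "rows before,", len(complete), "rows after")
```

Persons 2 and 3 each answered one of the two measures, yet both are dropped entirely, which is how partial missingness turns into person-level missingness.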

Guideline 1: Use All the Available Data
• Listwise deletion:
  - Compounds the problem of sample nonresponse
  - Often greatly reduces sample size and statistical power
  - Yields biased parameter estimates under systematic (MAR and MNAR) missingness
  - Implies a target population of "individuals who fill out surveys completely," which is not theoretically defensible
• Avoid it outright!

Missing Data Treatments: Single Imputation
• Single imputation techniques fill in each missing datum with a "good guess" as to what the missing datum should be.
• Mean imputation (i.e., substituting the mean across persons) underestimates variance and correlation
• Hot deck imputation (using "donors") increases error and is worse than regression imputation
• Regression imputation (using predicted values) underestimates variance and can bias the correlation
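The variance problem with mean imputation can be seen in a short simulation. This is a sketch on hypothetical data (normally distributed scores with 40% of values removed completely at random), not an analysis from the presentation.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical complete scores, then 40% removed completely at random.
complete_scores = [rng.gauss(50, 10) for _ in range(1000)]
observed = [v if rng.random() > 0.4 else None for v in complete_scores]

# Mean imputation: replace every missing value with the observed mean.
obs_values = [v for v in observed if v is not None]
fill = statistics.mean(obs_values)
imputed = [fill if v is None else v for v in observed]

var_observed = statistics.variance(obs_values)
var_imputed = statistics.variance(imputed)

print(f"variance of observed values:     {var_observed:.1f}")
print(f"variance after mean imputation:  {var_imputed:.1f}")
```

Every imputed value sits exactly at the mean, so it adds nothing to the sum of squared deviations while still increasing the sample size; the variance estimate shrinks as a result.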

Guideline 2: Do Not Use Single Imputation
• Most single imputation techniques are biased even under MCAR
• They do not allow accurate SEs to be calculated for hypothesis testing
• This creates Type I errors of inference
• Place a moratorium!

Guideline 3: Construct-Level Missingness: Use Maximum Likelihood or Multiple Imputation Missing Data Treatments Whenever 10% or More of the Respondent Sample Is Made Up of Construct-Level Partial Respondents
• Response rate = (n partial respondents + n full respondents) / n contacted
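The two quantities in this guideline can be computed as follows. This is a small sketch with hypothetical counts; the function names are illustrative, and the partial-respondent share is taken here to mean partial respondents divided by all respondents, which is one reading of the guideline.

```python
def response_rate(n_full, n_partial, n_contacted):
    """Response rate = (partial + full respondents) / persons contacted."""
    return (n_partial + n_full) / n_contacted


def partial_share(n_full, n_partial):
    """Fraction of the respondent sample made up of partial respondents."""
    return n_partial / (n_partial + n_full)


# Hypothetical counts for illustration only.
n_contacted, n_full, n_partial = 500, 180, 30

rate = response_rate(n_full, n_partial, n_contacted)
share = partial_share(n_full, n_partial)
use_ml_or_mi = share >= 0.10  # the 10% threshold from Guideline 3

print(f"response rate: {rate:.2f}")
print(f"partial share: {share:.3f}")
print("use ML or MI:", use_ml_or_mi)
```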

Missing Data Treatments: Multiple Imputation
• Each single imputation contains some inaccuracy, so the imputations are performed multiple times and then aggregated in a way that accounts for the uncertainty of each imputation.
• Multiple Imputation (MI) is a 3-step process:
  - Step 1) Impute (or fill in) missing values multiple times, to create multiple, partly imputed datasets.
  - Step 2) Run the analysis on each of these multiple, partly imputed datasets.
  - Step 3) Combine these multiple results to get parameter estimates and standard errors.
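The three steps can be sketched in a few lines. This is a deliberately simplified illustration, not a production MI routine: the scores are hypothetical, the imputation step just draws at random from the observed values (real MI software uses a proper imputation model), and the analysis is only a sample mean. The combining step uses the standard pooling formula known as Rubin's rules, in which total variance = within-imputation variance + (1 + 1/m) × between-imputation variance.

```python
import random
import statistics


def impute_once(values, rng):
    """Step 1: fill each missing value with a random draw from observed ones."""
    observed = [v for v in values if v is not None]
    return [v if v is not None else rng.choice(observed) for v in values]


def analyze(completed):
    """Step 2: estimate the mean and its sampling variance (squared SE)."""
    n = len(completed)
    return statistics.mean(completed), statistics.variance(completed) / n


def pool(results):
    """Step 3: combine the m results with Rubin's rules."""
    m = len(results)
    estimates = [est for est, _ in results]
    pooled = statistics.mean(estimates)
    within = statistics.mean([var for _, var in results])
    between = statistics.variance(estimates) if m > 1 else 0.0
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5


rng = random.Random(0)
scores = [3.0, None, 4.0, 5.0, None, 2.0, 4.0, 3.0]  # hypothetical data
results = [analyze(impute_once(scores, rng)) for _ in range(20)]
estimate, se = pool(results)
print(f"pooled mean = {estimate:.2f}, pooled SE = {se:.2f}")
```

The between-imputation term is what single imputation leaves out: it carries the uncertainty about the missing values into the standard error.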

Missing Data Treatments: Maximum Likelihood
• ML methods are acknowledged to be mathematically complex. Maximum likelihood directly estimates parameters and standard errors by choosing the estimates that maximize the probability of the observed data.
• There are two common ML missing data techniques: Full Information Maximum Likelihood (FIML) and the EM algorithm. FIML directly estimates parameters and provides accurate standard errors (SEs), while the EM algorithm calculates summary statistics for further analysis.
• Auxiliary variables (i.e., variables used for imputation only, not part of the theoretical model being tested) are easily incorporated into the EM algorithm. These variables can make an MNAR mechanism more similar to MAR.
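To make "maximize the probability of the observed data" concrete, the following is the usual casewise log-likelihood that FIML maximizes, written under the added assumption that the data are multivariate normal with mean vector \mu and covariance matrix \Sigma. The notation is introduced here for illustration and does not come from the slides: y_i holds the values person i actually provided, p_i is how many values that is, and \mu_i and \Sigma_i are the parts of \mu and \Sigma belonging to those observed variables.

```latex
\ell(\mu, \Sigma) = \sum_{i=1}^{N} \left[
  -\frac{p_i}{2} \log(2\pi)
  - \frac{1}{2} \log \lvert \Sigma_i \rvert
  - \frac{1}{2} (y_i - \mu_i)^{\top} \Sigma_i^{-1} (y_i - \mu_i)
\right]
```

Each person contributes whatever data they have, so no one is deleted and nothing is filled in.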

Guideline 4: Item-Level Missingness: One Item Is Enough!
Two approaches for handling item-level missing data:
• Listwise deletion cutoffs: This approach drops participants from the analysis if they fail to respond to at least half of the items on a scale. It is a commonly taught practice, but it is arbitrary and converts item-level missingness into construct-level missingness, which may lead to data loss.
• Mean across available items: This approach calculates an individual's scale score using only the items they responded to. It is sometimes referred to as "mean substitution across items." It avoids data loss but may introduce some reduction in reliability.
• Recommendation: Use the mean across available items for handling item-level missing data, as it typically offers greater expected statistical power than listwise deletion cutoffs, even when only one item has been answered.
• Both methods may suffer from bias under MAR and MNAR mechanisms.
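The mean-across-available-items approach is simple to compute. The sketch below assumes item responses are stored in a list with None for unanswered items; the function name and the example responses are hypothetical.

```python
def scale_score(item_responses):
    """Mean of the answered items, or None if no item was answered."""
    answered = [r for r in item_responses if r is not None]
    if not answered:
        return None  # construct-level missingness: nothing to average
    return sum(answered) / len(answered)


# Hypothetical responses to a four-item scale.
print(scale_score([4, None, 5, None]))        # two items answered -> 4.5
print(scale_score([None, 3, None, None]))     # one item is enough -> 3.0
print(scale_score([None, None, None, None]))  # no items answered -> None
```

A half-of-items cutoff would discard the second respondent, whereas this approach keeps a score for anyone who answered at least one item.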

Guideline 5: Person-Level Missingness: If the Response Rate Is Below 30%, Report Systematic Nonresponse Parameters and Consider Conducting Sensitivity Analyses
Researchers are encouraged to report response rates and systematic nonresponse parameters, and to conduct response rate sensitivity analyses to assess the potential direction and magnitude of missing data bias.
• Report the overall response rate, calculated as the ratio of full respondents plus partial respondents to the total number of individuals contacted.
• Report systematic nonresponse parameters (SNPs) if possible, which capture differences between respondents and nonrespondents on variables of interest in the study.
• Conduct response rate sensitivity analyses by estimating the response rate-corrected correlations u

Decision tree for choosing missing data treatments
• The decision tree aids the selection of appropriate missing data techniques for item-level, construct-level, and person-level missingness.

Thank You Prakriti Sinha
