CPMP/EWP/1776/99: PtC on Missing Data

CPMP/EWP/1776/99: PtC on Missing Data Ferran.Torres@uab.es

Subject disposition

Missing data (1) • What are missing data? Empty fields in the CRFs! • They violate the strict ITT principle • Possible causes include: • Loss to follow-up • Treatment failure or success • Adverse event • Subject relocation • Not all reasons for withdrawal are related to treatment

Missing data (2) • They may affect: • A single data point • Several data points within a visit • An entire visit • Several visits • An entire variable • All visits after enrolment

Missing data (3) • Why are they a problem? A potential source of bias in the analysis • The larger the proportion of data affected, the greater the bias • The less random the missingness, the greater the bias • The more closely the missingness is related to treatment, the greater the interference • They prevent a true ITT analysis

EXAMPLES

Example: Description of populations (1) Patient disposition: Patients withdrawing before treatment • Patients without baseline VA • No major protocol violation • E.g., cataract • E.g., only a baseline VA

Example 2: Incorrect use of populations (1) Design • Surgery vs medical treatment in bilateral carotid stenosis (Sackett et al., 1985) • Primary endpoint: number of patients with TIA, stroke or death • Patient disposition: • Randomised patients: 167 • Surgical treatment: 94 • Medical treatment: 73 • Patients who did not complete the study because of a stroke during the initial hospitalisation: • Surgical treatment: 15 patients • Medical treatment: 1 patient

Example 2: Incorrect use of populations (2) First analysis performed: • Per-protocol (PP) population: patients who completed the study • Analysis • Surgical treatment: 43 / (94 - 15) = 43 / 79 = 54% • Medical treatment: 53 / (73 - 1) = 53 / 72 = 74% • Risk reduction: 27%, p = 0.02

Example 2: Incorrect use of populations (3) The definitive analysis is as follows: • Intention-to-treat (ITT) population: all randomised patients • Analysis • Surgical treatment: 58 / 94 = 62% • Medical treatment: 54 / 73 = 74% • Risk reduction: 18%, p = 0.09 (PP: 27%, p = 0.02) Conclusions: • The correct analysis population is the ITT • Surgical treatment has not been shown to be significantly superior to medical treatment
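The arithmetic behind the two analyses of Example 2 can be retraced with a few lines of Python. This is only a sketch using the counts quoted on the slides; the function names are illustrative, and the p-values are not recomputed here. Computed from the unrounded counts, the relative risk reductions come out at roughly 26% (PP) and 17% (ITT), close to but not identical with the rounded figures quoted on the slides.

```python
def risk(events, n):
    """Proportion of patients with an event (TIA, stroke or death)."""
    return events / n


def relative_risk_reduction(risk_test, risk_control):
    """1 - (risk in test arm / risk in control arm)."""
    return 1 - risk_test / risk_control


# Per-protocol: patients with an early stroke are dropped from both
# the numerator and the denominator.
pp_surgical = risk(43, 94 - 15)
pp_medical = risk(53, 73 - 1)

# Intention-to-treat: every randomised patient is analysed, and the
# early strokes count as events.
itt_surgical = risk(43 + 15, 94)
itt_medical = risk(53 + 1, 73)

pp_rrr = relative_risk_reduction(pp_surgical, pp_medical)
itt_rrr = relative_risk_reduction(itt_surgical, itt_medical)

print(f"PP : {pp_surgical:.0%} vs {pp_medical:.0%}, reduction {pp_rrr:.1%}")
print(f"ITT: {itt_surgical:.0%} vs {itt_medical:.0%}, reduction {itt_rrr:.1%}")
```

Excluding the 15 early strokes removes events almost exclusively from the surgical arm, which is why the per-protocol analysis flatters surgery.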

Relationship of missing values to: 1) Treatment 2) Outcome


Types of missing data

MCAR • Missing completely at random • The probability of a value being missing is completely independent of: • Observed values: • Baseline variables, other measurements of the same variable... • Unobserved (missing) values • Example: change of geographical location

MAR • Missing at random • The probability of a value being missing depends on: • Yes: observed values • No: unobserved (missing) values • Example: subjects with a worse baseline score drop out of the study regardless of the outcome

Non-Ignorable • The probability of a value being missing depends on: • Unobserved (missing) values • Example: poor or excellent responses are associated with a higher drop-out rate
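A small simulation may help to contrast the three mechanisms. The sketch below is illustrative only: the outcome model (outcome = baseline + noise), the missingness probabilities (0.3, 0.6, 0.1) and the function name `complete_case_bias` are assumptions made for the example, not part of the guidance. It reports how far the mean of the observed outcomes lies from the mean of all outcomes.

```python
import random
from statistics import mean


def complete_case_bias(mechanism, n=20000, seed=1):
    """Return (mean of observed outcomes) - (mean of all outcomes)."""
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        baseline = rng.gauss(0, 1)            # always observed
        outcome = baseline + rng.gauss(0, 1)  # may go missing
        if mechanism == "MCAR":
            p_missing = 0.3                   # unrelated to any data
        elif mechanism == "MAR":
            # depends only on an observed value (the baseline score)
            p_missing = 0.6 if baseline > 0 else 0.1
        elif mechanism == "MNAR":
            # depends on the value that would have been recorded
            p_missing = 0.6 if outcome > 0 else 0.1
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        full.append(outcome)
        if rng.random() >= p_missing:
            observed.append(outcome)
    return mean(observed) - mean(full)


for mechanism in ("MCAR", "MAR", "MNAR"):
    print(mechanism, round(complete_case_bias(mechanism), 3))
```

Under MCAR the observed mean stays close to the full-data mean. Under MAR the unadjusted observed mean is biased, but the bias could be removed by conditioning on the baseline score. Under the non-ignorable mechanism no adjustment based on observed data alone removes it.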

Handling missing values

General Strategies • Complete-case analysis • “Weighting methods” • Imputation methods • Analysing data as incomplete • Other methods

Complete-case analysis • Analyse only subjects with complete data • Restrict the analysis to subjects with no missing data on the variables of interest • Also called ADO (Available Data Only) • Assumes incomplete cases are like complete cases • Gives unbiased estimates if the reduced sample resulting from list-wise deletion is a random subsample of the original sample (MCAR)

Complete-case analysis • Disadvantages: • Ignores possible systematic differences between complete and incomplete cases • Loss of power: standard errors will generally be larger in the reduced sample because less information is used • Estimates are biased if the reduced sample is NOT a random subsample of the original sample • Against the ITT principle
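List-wise deletion itself is mechanically simple, which is part of its appeal. The following sketch uses made-up records and hypothetical field names (`baseline`, `final`) to show how quickly the analysable sample shrinks.

```python
from math import sqrt
from statistics import mean, stdev


def complete_cases(rows, variables):
    """Keep only the rows with no missing value in any listed variable."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


# Hypothetical data: None marks an empty CRF field.
rows = [
    {"id": 1, "baseline": 24, "final": 20},
    {"id": 2, "baseline": 30, "final": None},
    {"id": 3, "baseline": 18, "final": 17},
    {"id": 4, "baseline": None, "final": 25},
    {"id": 5, "baseline": 27, "final": 22},
    {"id": 6, "baseline": 33, "final": None},
]

cc = complete_cases(rows, ["baseline", "final"])
changes = [r["final"] - r["baseline"] for r in cc]

# The standard error is computed on the reduced sample only.
standard_error = stdev(changes) / sqrt(len(changes))

print(f"{len(cc)} of {len(rows)} subjects analysed")
print(f"mean change {mean(changes):.2f}, SE {standard_error:.2f}")
```

Half of the randomised subjects drop out of this toy analysis, and nothing in the result signals whether the remaining three resemble the excluded three.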

“Weighting methods” (sometimes considered a form of imputation) • To construct weights for incomplete cases: • Each patient belongs to a subgroup in which all subjects have the same characteristics • A proportion within each subgroup are destined to complete the study • Heyting et al. • Robins et al.
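One simple way to turn the subgroup idea into numbers is a weighting-class estimator: each completer is weighted by the inverse of the completion proportion in their subgroup, so completers stand in for the drop-outs who share their characteristics. The sketch below is a generic illustration with invented data and an illustrative function name; it is not claimed to be the specific method of the authors cited on the slide.

```python
from collections import defaultdict


def weighting_class_mean(rows):
    """rows: list of (subgroup, outcome), with outcome None if missing.

    Subgroups without any completer contribute nothing to the estimate.
    """
    totals = defaultdict(int)
    completers = defaultdict(int)
    for subgroup, outcome in rows:
        totals[subgroup] += 1
        if outcome is not None:
            completers[subgroup] += 1

    weighted_sum = weight_sum = 0.0
    for subgroup, outcome in rows:
        if outcome is not None:
            # inverse of the proportion completing in this subgroup
            weight = totals[subgroup] / completers[subgroup]
            weighted_sum += weight * outcome
            weight_sum += weight
    return weighted_sum / weight_sum


# Hypothetical data: severe patients drop out far more often.
rows = [
    ("mild", 10), ("mild", 12), ("mild", 11), ("mild", None),
    ("severe", 20), ("severe", None), ("severe", None), ("severe", None),
]

observed = [y for _, y in rows if y is not None]
unweighted = sum(observed) / len(observed)
weighted = weighting_class_mean(rows)
print(f"complete-case mean {unweighted:.2f}, weighted mean {weighted:.2f}")
```

The weighted mean moves towards the severe subgroup because its single completer represents four randomised patients. The estimate is only as good as the assumption that completers and drop-outs within a subgroup are alike.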

Missing data: handling methods (2) • Subjects with missing values in the efficacy variable [Figure: timeline marked Randomisation and Start of treatment]

Missing data: handling methods (3) • The LOCF (Last Observation Carried Forward) method is applied [Figure: timeline marked Randomisation and Start of treatment]

Missing data: handling methods (4) • The BOCF (Baseline Observation Carried Forward) method is applied [Figure: timeline marked Randomisation and Start of treatment]
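LOCF and BOCF can both be written in a few lines. The sketch below assumes one list per subject, ordered by visit, with the baseline in first position and `None` for a missing visit; the function names and the scores are illustrative.

```python
def locf(visits):
    """Replace each missing value with the most recent observed one."""
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)  # stays None if nothing was observed yet
    return filled


def bocf(visits):
    """Replace each missing value with the baseline value (visits[0])."""
    baseline = visits[0]
    return [baseline if value is None else value for value in visits]


# Hypothetical subject who improves and then drops out after visit 2.
scores = [30, 26, 22, None, None]

print(locf(scores))  # [30, 26, 22, 22, 22]
print(bocf(scores))  # [30, 26, 22, 30, 30]
```

For this subject LOCF freezes the improvement reached at drop-out, whereas BOCF treats the subject as having returned to the starting point.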

Ex: LOCF & linear extrapolation [Figure: ADAS-Cog score (higher = worse, lower = better) vs. time (months, 0-18); curves: linear regression, LOCF; annotation: bias]

Ex: Early drop-out due to AE [Figure: ADAS-Cog score (higher = worse, lower = better) vs. time (months, 0-18); curves: Active, Placebo; annotation: bias favours Active]

Ex: Early drop-out due to lack of efficacy [Figure: ADAS-Cog score (higher = worse, lower = better) vs. time (months, 0-18); curves: Active, Placebo; annotation: bias favours Placebo]


Drop-outs and missing data: different frequencies [Figure: subjects in arms A and B followed from randomisation (RND) through Baseline, Visit 1, Visit 2 and Last Visit]

Drop-outs and missing data: different timing [Figure: subjects in arms A and B followed from randomisation (RND) through Baseline, Visit 1, Visit 2 and Last Visit]

Imputation methods • LOCF and variants • Bias: • Depends on the amount and timing of drop-outs • E.g., the condition under study has a worsening course • Conservative: • Drop-outs because of lack of efficacy in the control group • Anticonservative: • Drop-outs because of intolerance in the test group • Others: interpolation, extrapolation

Example: the ADAS-Cog result is missing at some of the time points • Imputation by regression [Figure: ADAS-Cog score (axis 4-36) vs. time (months, 0-18)]
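Imputation by regression over time can be sketched as fitting a least-squares line through a subject's observed scores and reading the missing time points off that line. The data and function names below are invented for illustration, and a straight line is itself an assumption about the course of the disease.

```python
def fit_line(points):
    """Ordinary least-squares line through (time, score) points."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return slope, intercept


def impute_by_regression(times, scores):
    """Fill missing scores with the value predicted by the fitted line."""
    observed = [(t, y) for t, y in zip(times, scores) if y is not None]
    slope, intercept = fit_line(observed)  # needs >= 2 distinct times
    return [
        intercept + slope * t if y is None else y
        for t, y in zip(times, scores)
    ]


# Hypothetical subject with the month-12 assessment missing.
times = [0, 6, 12, 18]
scores = [20, 23, None, 29]

completed = impute_by_regression(times, scores)
print(completed)
```

The same function extrapolates when the missing time point lies after the last observation, which is where the choice of model matters most.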

Imputation methods • Worst case analysis: • Impute: • The worst response to the test • The best response to the control • Ultraconservative. Increases the variability. • Robustness of results: • Second approach: “Sensitivity analysis” • Lower bound of efficacy
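For a binary endpoint the worst-case rule is easy to write down. In this sketch the coding 1 = response and 0 = no response, the arm labels and the data are all assumptions chosen for illustration.

```python
def worst_case_impute(responses, arm):
    """Missing values become failures in the test arm and successes
    in the control arm (1 = response, 0 = no response)."""
    fill = 0 if arm == "test" else 1
    return [fill if r is None else r for r in responses]


def response_rate(responses):
    return sum(responses) / len(responses)


# Hypothetical data; None marks a subject with no final assessment.
test_arm = [1, 1, 1, None, 0, 1, None, 1]
control_arm = [1, 0, None, 0, 1, 0, 0, None]

test_rate = response_rate(worst_case_impute(test_arm, "test"))
control_rate = response_rate(worst_case_impute(control_arm, "control"))

print(f"test {test_rate:.3f}, control {control_rate:.3f}")
print(f"lower bound of the difference: {test_rate - control_rate:.3f}")
```

If the treatment difference survives this imputation, missing data alone cannot explain it, which is what makes the approach useful as a sensitivity analysis.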

Group Means • Continuous variable: • group mean derived from a grouping variable • Categorical – ordinal variable: • Mode • If no unique mode: • Nominal: a value will be randomly selected • Ordinal: the ‘middle’ category or a value is randomly chosen from the middle two (even case)
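The group-mean and mode rules can be sketched as follows. The function names and data are illustrative, and the tie-breaking for ordinal variables is one reading of the slide (middle of the tied categories, random choice between the middle two when their number is even).

```python
import random
from collections import Counter
from statistics import mean


def impute_group_mean(groups, values):
    """Fill missing values with the mean of the subject's own group.

    Raises KeyError if a group has no observed value at all.
    """
    observed = {}
    for g, v in zip(groups, values):
        if v is not None:
            observed.setdefault(g, []).append(v)
    means = {g: mean(vs) for g, vs in observed.items()}
    return [means[g] if v is None else v for g, v in zip(groups, values)]


def modal_value(values, ordered_levels=None, rng=None):
    """Mode of the observed values, with the tie rules described above."""
    rng = rng or random.Random(0)
    counts = Counter(v for v in values if v is not None)
    top = max(counts.values())
    modes = [v for v, c in counts.items() if c == top]
    if len(modes) == 1:
        return modes[0]
    if ordered_levels is None:            # nominal: pick one at random
        return rng.choice(modes)
    modes.sort(key=ordered_levels.index)  # ordinal: take the middle
    mid = len(modes) // 2
    if len(modes) % 2 == 1:
        return modes[mid]
    return rng.choice(modes[mid - 1:mid + 1])


groups = ["A", "A", "A", "B", "B"]
values = [10, None, 14, 20, None]
print(impute_group_mean(groups, values))

severity = ["mild", "severe", "mild", None]
print(modal_value(severity))
```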

Predicted Mean • Continuous or ordinal variables: • Least-squares multiple regression algorithm to impute the most likely value • Binary or categorical variable: • a discriminant method is applied to impute the most likely value.

Imputation Class methods • Imputed values from respondents that are similar with respect to a set of auxiliary variables. • Clinical experience • Statistical methods: Hot-Decking • Respondents and non-respondents are sorted into a number of imputation subsets according to a user-specified set of covariates. • An imputation subset comprises cases with the same values as those of the user-specified covariates. • Missing values are then replaced with values taken from matching respondents. • Options: • The first respondent’s value (similar in time) • A respondent’s randomly selected value
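A minimal hot-deck sketch, assuming records stored as dictionaries and the random-donor option: subjects are grouped by the chosen covariates and each missing value is replaced by the value of a randomly drawn respondent from the same subset. The field names (`sex`, `centre`, `score`) and the function name are hypothetical.

```python
import random


def hot_deck(rows, covariates, target, rng=None):
    """Return copies of rows with missing `target` values filled from a
    random donor sharing the same covariate values. Rows without any
    matching donor are left missing."""
    rng = rng or random.Random(0)

    # Build the donor pool of each imputation subset.
    donors = {}
    for row in rows:
        if row[target] is not None:
            key = tuple(row[c] for c in covariates)
            donors.setdefault(key, []).append(row[target])

    completed = []
    for row in rows:
        row = dict(row)  # do not modify the caller's data
        if row[target] is None:
            pool = donors.get(tuple(row[c] for c in covariates))
            if pool:
                row[target] = rng.choice(pool)
        completed.append(row)
    return completed


rows = [
    {"sex": "F", "centre": 1, "score": 12},
    {"sex": "F", "centre": 1, "score": None},
    {"sex": "M", "centre": 2, "score": 20},
    {"sex": "M", "centre": 2, "score": 24},
    {"sex": "M", "centre": 2, "score": None},
    {"sex": "M", "centre": 3, "score": None},
]

completed = hot_deck(rows, ["sex", "centre"], "score")
for row in completed:
    print(row)
```

The last subject has no respondent in the same subset and therefore stays missing; in practice the covariate set would have to be coarsened until a donor exists.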

Some problems in Single Imputation • Mean Estimation • Replace missing data with the mean of non-missing values. • Standard deviation and standard errors are underestimated (no variation in the imputed values). • Hot-deck Imputation • Stratify and sort by key covariates, replace missing data from another record in the same stratum. • Underestimation of standard errors can be a problem. • Predict missing values from Regression • Impute each independent variable on the basis of other independent variables in model. • Produces biased estimates. • Disadvantage: • In general, Single Imputation results in the sample size being over-estimated with the variance and standard errors being underestimated.
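The underestimation of variability under mean imputation can be seen in a toy data set. The numbers below are invented; the point is only that the imputed values add to the sample size without adding any spread.

```python
from math import sqrt
from statistics import mean, stdev

observed = [18, 22, 25, 30, 35]  # hypothetical observed scores
n_missing = 3                    # subjects with no recorded score

# Mean imputation: every missing score is replaced by the observed mean.
imputed = observed + [mean(observed)] * n_missing

sd_observed = stdev(observed)
sd_imputed = stdev(imputed)
se_observed = sd_observed / sqrt(len(observed))
se_imputed = sd_imputed / sqrt(len(imputed))

print(f"observed only : n={len(observed)}, SD={sd_observed:.2f}, SE={se_observed:.2f}")
print(f"mean-imputed  : n={len(imputed)}, SD={sd_imputed:.2f}, SE={se_imputed:.2f}")
```

The mean is unchanged, but the standard error shrinks twice over: once because the standard deviation falls and once because the apparent sample size grows.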

Mean imputation

Simple Hot Deck

Regression methods
