Biostatistics Case Studies Session 3: Missing Data in Longitudinal Studies Peter D. Christenson Biostatistician http://gcrc.humc.edu/Biostat
Case Study Hall S et al: A comparative study of Carvedilol, slow release Nifedipine, and Atenolol in the management of essential hypertension. J of Cardiovascular Pharmacology 1991;18(4)S35-38.
Case Study Outline
Subjects randomized to one of 3 drugs for controlling hypertension:
  A: Carvedilol (new)
  B: Nifedipine (standard)
  C: Atenolol (standard)
Blood pressure and HR measured at baseline and 4 post-treatment periods.
Primary analysis is unclear, but changes over time in HR and bp are compared among the 3 groups.
Wanted: Use N=100 w/o LOCF
Combine:
  Info on true 8-week change in 83 subjects.
  Info on baseline only in 17 subjects.
  Use week 0-week 8 correlation in 83 subjects.
More generally: Suppose 9 subjects had only week 0 and 8 subjects had only week 8. Then there are really 2 experiments, 1 paired (N=83) and 1 unpaired (N1=9 and N2=8).
Combining involves weighting the Δs from the 2 experiments.
Does not impute (substitute) values for the 17 unknown values.
Generalize further to >2 time periods and >1 treatment, etc.
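The idea of weighting the Δs from a paired and an unpaired experiment can be illustrated with a simple inverse-variance weighted average. This is only a sketch of the principle, assuming the two estimates are independent. A mixed model arrives at its weights through the fitted covariance pattern rather than through this exact formula, and the function name and input values below are hypothetical, not taken from the study.

```python
import math

def combine_estimates(delta_paired, se_paired, delta_unpaired, se_unpaired):
    """Inverse-variance weighted average of two independent estimates of the same change."""
    w_paired = 1.0 / se_paired ** 2      # more precise estimate gets more weight
    w_unpaired = 1.0 / se_unpaired ** 2
    total_weight = w_paired + w_unpaired
    delta = (w_paired * delta_paired + w_unpaired * delta_unpaired) / total_weight
    se = math.sqrt(1.0 / total_weight)   # SE of the combined estimate
    return delta, se

# Hypothetical inputs for illustration only (not values from the case study).
delta, se = combine_estimates(12.0, 1.0, 14.0, 2.0)
print(round(delta, 2), round(se, 3))
```

The combined standard error is never larger than the smaller of the two input standard errors, which is why using the unpaired subjects adds information instead of discarding it.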
Mixed Models Mixed models implement our need here. “Mixed” means combination of fixed effects (e.g., drugs; want info on those particular drugs) and random effects (e.g., centers or patients; not interested in the particular ones in the study). AKA multilevel models, hierarchical models. Very flexible, incorporate unequal patient variability, correlation, pairing, repeated values at multiple levels (e.g., sitting and standing dbp in Fig 2, or if subjects were clustered, say from the same family and genetics was an issue, etc), and data missing at random. More assumptions required than typical analyses.
Data Structure for Software
Need (one row per patient per week):
  patient  week  dbp
  1        0     97
  1        2     101
  1        4     88
  1        6     89
  1        8     86
  2        0     109
  2        2     72
  etc.
Not (one row per patient):
  patient  wk0  wk2  wk4  wk6  wk8
  1        97   101  88   89   86
  2        109  72   .    .    .
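Converting the one-row-per-patient layout to the one-row-per-measurement layout can be sketched in a few lines. This example uses the two patients shown on the slide and assumes, for illustration, that the dots for patient 2 mark missing values at weeks 4, 6, and 8. The names `wide`, `weeks`, and `wide_to_long` are hypothetical.

```python
# Wide layout: one entry per patient, one value per week; None marks a missing value.
weeks = [0, 2, 4, 6, 8]
wide = {
    1: [97, 101, 88, 89, 86],
    2: [109, 72, None, None, None],  # assumption: dots on the slide are missing values
}

def wide_to_long(wide_rows, week_labels):
    """Return (patient, week, dbp) rows, skipping missing measurements."""
    long_rows = []
    for patient, values in wide_rows.items():
        for week, dbp in zip(week_labels, values):
            if dbp is not None:
                long_rows.append((patient, week, dbp))
    return long_rows

long_rows = wide_to_long(wide, weeks)
for row in long_rows:
    print(*row)
```

Missing measurements simply produce no row in the long layout, so no value has to be imputed to keep the data set rectangular.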
Software
Need to use a mixed model module. Often, options are unclear.
Use:
  SPSS: Analyze > Mixed
  SAS: proc mixed
Repeated measures modules with options for random factors do not typically handle missing data, e.g.:
  SPSS: Analyze > GLM > Repeated > … Random
  SAS: proc glm; model …; random …;
These are not in general OK, but will work with certain balanced patterns of missing data.
Mixed Models in SPSS Select Analyze > Mixed > Linear. First menu:
Mixed Models in SAS
Select Solutions > Analysis > Analyst > Statistics > ANOVA > Mixed models
Alternatively, typical code is:
    proc mixed;
      class week patient;
      model dbp=week/ddfm=satterthwaite;
      lsmeans week/cl;
      estimate 'Week Diff' week 1 -1;
      repeated week/subject=patient type=un rcorr;
      title 'Mixed Model N=100+83 Unstructured';
    run;
Model 1 Results
Estimated Means:
  Effect  week  Estimate  Standard Error
  week    0     103.04    0.7059
  week    8     90.43     0.7749
Estimated Change:
  Label      Estimate  Standard Error  DF    t Value  Pr > |t|
  Week Diff  12.6058   1.0441          95.6  12.07    <.0001
So, Δ = 12.61±1.04 incorporates 100 + 83 observations.
Group A: Baseline and Final dbp Update Is model appropriate? Depends on assumed covariance pattern.
Model 1 Covariance Pattern: Compound Symmetry
Software Output
Estimated R Correlation Matrix for patient 4
  Row  Col1      Col2
  1    1.0000    0.008760
  2    0.008760  1.0000
Covariance Parameter Estimates
  Cov Parm  Subject  Estimate
  CS        patient  0.4366
  Residual           49.3989
Output Interpretation
Estimated Covariance Pattern:
  Week  0        8
  0     (7.06)²  0.44
  8     0.44     (7.06)²
(7.06)² = 49.3989 + 0.4366
Note that this model assumes that variability among subjects is the same at each week, and that there is a correlation between the weeks (estimated at 0.00876).
But: Week 0 SD = 5.2, Week 8 SD = 8.8
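The arithmetic linking the two compound symmetry parameters to the common SD and the correlation can be checked directly. This is a small sketch using the CS and Residual estimates from the output above. The variable names are illustrative only.

```python
import math

# Covariance parameter estimates from the compound symmetry output.
cs = 0.4366          # shared covariance between the two weeks
residual = 49.3989   # residual variance

variance = cs + residual      # common variance at each week
sd = math.sqrt(variance)      # common SD at each week
correlation = cs / variance   # correlation between week 0 and week 8

print(round(sd, 2), round(correlation, 5))
```

The single SD of about 7.06 lies between the observed week 0 and week 8 SDs of 5.2 and 8.8, which is the mismatch the slide points out.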
Model 2 Covariance Pattern: Unstructured
Software Output
Estimated R Correlation Matrix for patient 4
  Row  Col1     Col2
  1    1.0000   0.01129
  2    0.01129  1.0000
Covariance Parameter Estimates
  Cov Parm  Subject  Estimate
  UN(1,1)   patient  27.1700
  UN(2,1)   patient  0.5169
  UN(2,2)   patient  77.2008
Output Interpretation
Estimated Covariance Pattern:
  Week  0        8
  0     (5.21)²  0.52
  8     0.52     (8.79)²
(5.21)² = 27.17
This model allows different variability among subjects at each week, and a correlation between the weeks (estimated at 0.011).
This better models the SDs: Week 0 SD = 5.2, Week 8 SD = 8.8
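For the unstructured pattern, the week-specific SDs and the correlation follow from the three UN estimates. A minimal check, with illustrative variable names:

```python
import math

# Unstructured covariance parameter estimates from the output.
var_week0 = 27.1700   # UN(1,1)
cov_0_8 = 0.5169      # UN(2,1)
var_week8 = 77.2008   # UN(2,2)

sd_week0 = math.sqrt(var_week0)
sd_week8 = math.sqrt(var_week8)
correlation = cov_0_8 / (sd_week0 * sd_week8)  # covariance rescaled by the two SDs

print(round(sd_week0, 2), round(sd_week8, 2), round(correlation, 5))
```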
Model 3 Covariance: Heterogeneous Uncorrelated
Software Output
Estimated R Correlation Matrix for patient 4
  Row  Col1    Col2
  1    1.0000
  2            1.0000
Covariance Parameter Estimates
  Cov Parm  Subject  Estimate
  UN(1,1)   patient  27.1701
  UN(2,1)   patient  0
  UN(2,2)   patient  77.1998
Output Interpretation
Estimated Covariance Pattern:
  Week  0        8
  0     (5.21)²  0
  8     0        (8.79)²
(5.21)² = 27.17
This model allows different variability among subjects at each week, but no correlation between the two weeks.
Matches: Week 0 SD = 5.2, Week 8 SD = 8.8
Choice of Covariance Pattern
Use a likelihood ratio test to test whether a more complex model significantly improves the fit to the data. Models must be “nested”.
Is model 2 significantly better than model 1?
Χ² = 1230.2 - 1206.0 = 24.2, which has a Χ² distribution with d.f. = difference in # of estimated parameters (here 3 - 2 = 1) if model 2 is not an improvement.
P-value = Prob(Χ² > 24.2) < 0.0001, so model 2 is needed.
Final choice: model 3, since the week 0-week 8 correlation estimated in model 2 is essentially 0 (0.011).
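The p-value for this 1 d.f. likelihood ratio test can be reproduced with the standard library alone, because a Χ² variable with 1 d.f. is the square of a standard normal. The sketch below assumes the two numbers on the slide are the fit statistics of the nested models (presumably -2 log likelihood values, which the slide does not state). The function name is hypothetical.

```python
import math

def chi2_upper_tail_1df(x):
    """P(chi-square with 1 d.f. > x), using chi2_1 = Z^2 for standard normal Z."""
    return math.erfc(math.sqrt(x / 2.0))

fit_model1 = 1230.2   # compound symmetry (2 covariance parameters)
fit_model2 = 1206.0   # unstructured (3 covariance parameters)

lr = fit_model1 - fit_model2        # likelihood ratio statistic
p_value = chi2_upper_tail_1df(lr)   # d.f. = 3 - 2 = 1

print(round(lr, 1), p_value < 0.0001)
```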
Model 3 Results
Estimated Means:
  Effect  week  Estimate  Standard Error  DF
  week    0     103.04    0.5212          99
  week    8     90.43     0.9644          82
Estimated Change:
  Label      Estimate  Standard Error  DF   t Value  Pr > |t|
  Week Diff  12.6063   1.0963          128  11.50    <.0001
Thus, use Δ = 12.61±1.10 from 100 + 83 observations.
Conclusions for Group A Week 0 to Week 8 dbp Δ
Last observation carried forward overestimates dbp at week 8.
Essentially 0 correlation between residual week 0 and week 8 dbp.
Use mixed model with heterogeneous uncorrelated covariance pattern.
This mixed model is equivalent to a 2-sample t-test with unequal variances, using Satterthwaite’s approximation for the degrees of freedom.
This equivalence would not hold if either (1) some subjects had dbp only at week 8, or (2) the correlation between weeks 0 and 8 were stronger, as is usually the case.
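The equivalence with an unequal-variance 2-sample t-test can be checked numerically from the Model 3 variance estimates (27.1701 at week 0 with N=100, 77.1998 at week 8 with N=83). The sketch below computes the standard error of the difference and the Satterthwaite degrees of freedom. The function name is hypothetical.

```python
import math

def unequal_variance_se_df(var1, n1, var2, n2):
    """SE of a difference in means and its Satterthwaite d.f., treating the groups as independent."""
    v1 = var1 / n1   # squared SE of the first mean
    v2 = var2 / n2   # squared SE of the second mean
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, df

se, df = unequal_variance_se_df(27.1701, 100, 77.1998, 83)
t_value = 12.6063 / se   # estimated change from the Model 3 output

print(round(se, 4), round(df), round(t_value, 2))
```

The values agree with the Week Diff row of the Model 3 output (standard error 1.0963, DF 128, t 11.50) up to rounding.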
Generalize: Group A with all 5 Time Periods
Since LR = 3141.4 - 3111.7 = 29.7 is large for a Χ² with 6 d.f., there is substantial unstructured correlation over weeks.
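How large the statistic is relative to a Χ² with 6 d.f. can be quantified with the closed-form upper tail that exists for even degrees of freedom. This is a sketch using only the standard library. It takes the two fit statistics quoted on the slide as inputs, and the function name is hypothetical.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with even d.f. > x) = exp(-x/2) * sum_{j < df/2} (x/2)^j / j!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0    # j = 0 term
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j   # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total

lr = 3141.4 - 3111.7   # difference of the two fit statistics on the slide
p_value = chi2_upper_tail_even_df(lr, 6)

print(p_value < 0.001)
```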
Conclusions: Repeated Measures with Mixed Models
Very useful for missing data.
Requires more than the usual assumptions.
Mild deviations from the assumed covariance pattern do not have a large influence.
Software can be intimidating because many model assumptions must be specified, since the method is so general and flexible.
May be difficult to apply without bias in clinical trials, where the primary analysis needs to be specifically detailed.