Mixed Models – Part 2

1 / 112

mixed models part 2 - PowerPoint PPT Presentation

Mixed Models – Part 2. Learning Outcomes – Part 2. The participant will learn: In what circumstances LS residuals provide direct information on how the model for the fixed effects is misspecified How and why to transform a GLS model into a LS model

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

PowerPoint Slideshow about 'mixed models part 2' - Mercy

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Learning Outcomes – Part 2
• The participant will learn:
• In what circumstances LS residuals provide direct information on how the model for the fixed effects is misspecified
• How and why to transform a GLS model into a LS model
• The different types of residuals in mixed models and their uses
• SAS diagnostics for fixed effects and covariance parameters in mixed models
• How PRISM plots provide information about the covariance structure
• General strategies for building mixed models
• How penalised spline regression can be used as a nonparametric fitting method

Department of Statistics

TEXAS A&M UNIVERSITY

Choosing the models for the mean and the covariance structure

Three approaches to modeling the covariance
• Random effects covariance structures:- random intercepts and/or random slopes
• Covariance pattern models
• Unstructured covariance
5.4.1 Model-Building Guidelines

“Fitting linear mixed models implies that an appropriate mean structure as well as a covariance structure--i.e., both for random effects and error components--needs to be specified. These steps are not independent of each other.

First, unless robust inference is used, an appropriate covariance model is essential to obtain valid inferences for the parameters in the mean structure, which is usually of primary interest.

Once an appropriate mean structure Xiβ for E(Yi) has been selected, we use the ordinary least squares (OLS) method to estimate β, and we hereby ignore the fact that not all measurements are independent. It follows from the theory of generalized estimating equations (GEE) that this OLS estimator is consistent for β (Liang and Zeger, 1986).

A helpful tool for deciding which time-varying covariates should be included in the model is a plot of the OLS residual profiles versus time. When this plot shows constant variability over time, we assume stationarity and we do not include random effects other than intercepts. In cases where the variability varies over time and where there is still some remaining systematic structure in the residual profiles (i.e., where the between-subject variability is large in comparison to the overall variation), the following guidelines can be used to select one or more random effects additional to the random intercepts.”

Dmitrienko, A., Molenberghs, G. Chuang-Stein, C. & Offen, W. (2005), Analysis of Clinical Trials Using SAS – A Practical Guide, SAS, Cary NC

Department of Statistics

TEXAS A&M UNIVERSITY

A cautionary example

Interest Rate Example

Tryfos, P. (1998), Methods for Business Analysis and Forecasting: Text & Cases, Wiley, New York

Interest Rate Example

Next Steps:

Or

?

Interest Rate Example: Model with AR(1) errors

Generalized least squares fit by maximum likelihood

Model: InterestRate ~ LoansClosed + I(LoansClosed^2)

+ VacancyIndex

Correlation Structure: AR(1)

Formula: ~Month

Parameter estimate(s):

Phi

0.9568896

Coefficients:

Value Std.Error t-value p-value

(Intercept) 7.163999 0.4505311 15.901231 0.0000

LoansClosed -0.004723 0.0044814 -1.053932 0.3086

I(LoansClosed^2) 0.000009 0.0000285 0.298965 0.7691

VacancyIndex -0.079368 0.1349623 -0.588074 0.5652

What happened to the quadratic term in LoansClosed that is evident in the LS residual plots?

Interest Rate Example: Model with AR(1) errors

Generalized least squares fit by maximum likelihood

Model: InterestRate ~ LoansClosed + VacancyIndex

Correlation Structure: AR(1)

Formula: ~Month

Parameter estimate(s):

Phi

0.9572093

Coefficients:

Value Std.Error t-value p-value

(Intercept) 7.122990 0.4182065 17.032232 0.0000

LoansClosed -0.003432 0.0011940 -2.874452 0.0110

VacancyIndex -0.076340 0.1307842 -0.583710 0.5676

Interest Rate Example after transforming a model with AR(1) errors into a model with iid errors

Call:

lm(formula = ystar ~ xstar - 1)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

xstar(Intercept) 7.122990 0.418207 17.032 1.12e-11

xstarLoansClosed -0.003432 0.001194 -2.874 0.011

xstarVacancyIndex -0.076340 0.130784 -0.584 0.568

Generalized least squares fit by maximum likelihood

Coefficients:

Value Std.Error t-value p-value

(Intercept) 7.122990 0.4182065 17.032232 0.0000

LoansClosed -0.003432 0.0011940 -2.874452 0.0110

VacancyIndex -0.076340 0.1307842 -0.583710 0.5676

Interpreting residual plots

Li, K.C. and Duan, N. (1989), Regression analysis under link violation, Annals of Statistics, 17, 1009-1052.

Interpreting residual plots

Cook, R.D. and Weisberg, S. (1999), Graphs in statistical analysis: is the medium the message? The American Statistician, 53(1), 29-37.

Department of Statistics

TEXAS A&M UNIVERSITY

Random effects covariance structures

Investigators at the University of North Carolina Dental School followed the growth of 27 children (16 males, 11 females) from age 8 top age 14. Every two years they measured the distance between the pituitary and pterygomaxillary fissure, two points that are easily identified on x-ray exposures of the side of the head. These data are reported in Pothoff and Roy (1964) and analyzed in detail in Pinheiro and Bates (2000) and elsewhere.

Pinheiro, J.C. & Bates, D.M. (2000). Mixed Effects Models in S and S-PLUS, Springer, New York.

Pothoff, R.F. & Roy, S.N. (1964). A generalized multivariate analysis of variance model especially useful for growth curve problems, Biometrika 51, 313-326.

distance age Subject Sex

1 26.0 8 M01 Male

2 25.0 10 M01 Male

3 29.0 12 M01 Male

4 31.0 14 M01 Male

.

61 22.0 8 M16 Male

62 21.5 10 M16 Male

63 23.5 12 M16 Male

64 25.0 14 M16 Male

65 21.0 8 F01 Female

66 20.0 10 F01 Female

67 21.5 12 F01 Female

68 23.0 14 F01 Female

.

105 24.5 8 F11 Female

106 25.0 10 F11 Female

107 28.0 12 F11 Female

108 28.0 14 F11 Female

Random effects covariance structures: Orthodontic Growth Data – Plots of Y= Distance

Correlations

T8 T10 T12 T14

T8 1.000 0.626 0.711 0.600

T10 0.626 1.000 0.635 0.759

T12 0.711 0.635 1.000 0.795

T14 0.600 0.759 0.795 1.000

The correlations are relatively constant across time. This suggests that we fit ______ as the covariance structure.

Random intercepts model

Compound Symmetry = Random intercepts

Example Covariance Matrices

Compound Symmetry = Random intercepts

Random intercepts covariance structure: Orthodontic Growth Data – R output

Linear mixed-effects model fit by REML

Data: Orthodont

AIC BIC logLik

445.7572 461.6236 -216.8786

Random effects:

Formula: ~1 | Subject

(Intercept) Residual

StdDev: 1.816214 1.386382

Fixed effects: distance ~ age * Sex

Value Std.Error DF t-value p-value

(Intercept) 16.340625 0.9813122 79 16.651810 0.0000

age 0.784375 0.0775011 79 10.120823 0.0000

SexFemale 1.032102 1.5374208 25 0.671321 0.5082

age:SexFemale -0.304830 0.1214209 79 -2.510520 0.0141

Random intercepts covariance structure: Orthodontic Growth Data: Fitted lines for each subject

Examples of “shrinkage” in random intercepts

Likelihood ratio tests

For models with the same fixed effects, the REML log-likelihoods can be used to compare nested models for the covariance. A formal statistical test is obtained by taking twice the difference in the respective maximized log-likelihoods and comparing it to a chi-squared distribution with degrees of freedom equal to the difference between the number of covariance parameters in the full and reduced models. This test is called the likelihood ratio test.

Fitzmaurice, G.M., Laird, N.N. & Ware, J.H. (2004). Applied Longitudinal Analysis. Wiley, New York (Chapter 4)

Random intercepts versus unstructured covariance: Orthodontic Growth Data

Conclusion:

Testing on the boundary of the parameter space

In general, when testing a null hypothesis that is on the boundary of the parameter space (e.g., the variance of a random effect equals zero), the usual null distribution for the likelihood ratio test is no longer a chi-squared distribution with degrees of freedom equal to the difference between the number of parameters in the full and reduced models; instead, the null distribution is a mixture of chi-squared distributions. For example, when comparing two nested models, one with q correlated random effects, the other with q+1 correlated random effects, the null distribution of the likelihood ratio test is a 50:50 mixture of chi-squared distributions with q and q+1 degrees of freedom.

Fitzmaurice, G.M., Laird, N.N. & Ware, J.H. (2004). Applied Longitudinal Analysis. Wiley, New York (Chapter 8)

Random intercepts covariance structure: Orthodontic Growth Data – R output

Linear mixed-effects model fit by REML

Fixed effects: distance ~ age * Sex

Value Std.Error DF t-value p-value

(Intercept) 16.340625 0.9813122 79 16.651810 0.0000

age 0.784375 0.0775011 79 10.120823 0.0000

SexFemale 1.032102 1.5374208 25 0.671321 0.5082

age:SexFemale -0.304830 0.1214209 79 -2.510520 0.0141

lm(formula = distance ~ age * Sex, data = Orthodont)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 16.3406 1.4162 11.538 < 2e-16

age 0.7844 0.1262 6.217 1.07e-08

SexFemale 1.0321 2.2188 0.465 0.643

age:SexFemale -0.3048 0.1977 -1.542 0.126

Model df AIC BIC logLik Test L.Ratio p-value

RI 1 6 445.7572 461.6236 -216.8786

lm0 2 5 493.5591 506.7811 -241.7796 1 vs 2 49.80187 <.0001

Critical value (alpha=0.001) is 9.55

Conclusion:

Comparing models using AIC & BIC

“Often it is of interest to compare non-nested models for the covariance. To compare non-nested models, an … approach is the Akaike Information Criterion (AIC). According to the AIC, given a set of competing models for the covariance, one should select the model that minimizes

AIC = -2l + 2c

where l is the maximized REML log- likelihood and c is the number of covariance parameters.

Fitzmaurice, G.M., Laird, N.M. & Ware, J.H. (2004). Applied Longitudinal Analysis. Wiley, New York (page 176)

“BIC extracts a very large penalty for the estimation of each additional covariance parameter. In general we do not recommend the use of BIC for covariance model selection as it entails a high risk of selecting a model that is too simple or parsimonious for the data at hand.”

Fitzmaurice, G.M., Laird, N.M. & Ware, J.H. (2004). Applied Longitudinal Analysis. Wiley, New York (page 177)

Random intercepts covariance structure: Orthodontic Growth Data – Marginal & conditional residuals

Conclusions:

Random intercepts covariance structure: Orthodontic Growth Data – Mixed model diagnostics in SAS

Conclusions:

Random intercepts covariance structure: Orthodontic Growth Data – Mixed model diagnostics in SAS
Random intercepts covariance structure: Orthodontic Growth Data – Mixed model diagnostics in SAS
Random intercepts covariance structure: Orthodontic Growth Data – Mixed model diagnostics in SAS

Conclusions:

Random effects covariance structures: Orthodontic Growth Data – Plots of Marginal Residuals

Correlations

T8 T10 T12 T14

T8 1.000 0.560 0.660 0.522

T10 0.560 1.000 0.560 0.718

T12 0.660 0.560 1.000 0.728

T14 0.522 0.718 0.728 1.000

The marginal residuals are positively correlated.

Random effects covariance structures: Orthodontic Growth Data – Plots of Conditional Residuals

Correlations

T8 T10 T12 T14

T8 1.000 -0.326 -0.129 -0.600

T10 -0.326 1.000 -0.560 0.005

T12 -0.129 -0.560 1.000 -0.065

T14 -0.600 0.005 -0.065 1.000

In this case, the conditional residuals are generally NEGATIVELY correlated.

Orthodontic growth data after transforming a model with random intercepts into a regression model with iid errors: diagnostic LS plots

Conclusions:

Orthodontic growth data after transforming a model with random intercepts into a regression model with iid errors

Conclusions:

Orthodontic growth data after transforming a model with random intercepts into a regression model with iid errors: plots of standardized LS residuals

T8 T10 T12 T14

T8 1.000 -0.184 0.179 -0.186

T10 -0.184 1.000 -0.073 0.383

T12 0.179 -0.073 1.000 0.172

T14 -0.186 0.383 0.172 1.000

Pearson\'s product-moment correlation

data: T10 and T14

p-value = 0.04832

alternative hypothesis: true correlation

is not equal to 0

Conclusions:

Orthodontic Growth Data – Plots of Y= Distance for females

Correlations

T8 T10 T12 T14

T8 1.000 0.626 0.711 0.600

T10 0.626 1.000 0.635 0.759

T12 0.711 0.635 1.000 0.795

T14 0.600 0.759 0.795 1.000

FT8 FT10 FT12 FT14

FT8 1.000 0.830 0.862 0.841

FT10 0.830 1.000 0.895 0.879

FT12 0.862 0.895 1.000 0.948

FT14 0.841 0.879 0.948 1.000

Conclusions:

Orthodontic Growth Data – Plots of Y= Distance for males

Correlations

T8 T10 T12 T14

T8 1.000 0.626 0.711 0.600

T10 0.626 1.000 0.635 0.759

T12 0.711 0.635 1.000 0.795

T14 0.600 0.759 0.795 1.000

MT8 MT10 MT12 MT14

MT8 1.000 0.437 0.558 0.315

MT10 0.437 1.000 0.387 0.631

MT12 0.558 0.387 1.000 0.586

MT14 0.315 0.631 0.586 1.000

Conclusions:

Random intercepts covariance structure with variances differing with sex : Orthodontic Growth Data – R output

Linear mixed-effects model fit by REML

Data: Orthodont

AIC BIC logLik

429.2205 447.7312 -207.6102

Random effects:

Formula: ~1 | Subject

(Intercept) Residual

StdDev: 1.847570 1.669823

Variance function:

Structure: Different standard deviations per stratum

Formula: ~1 | Sex

Parameter estimates:

Male Female

1.0000000 0.4678944

Fixed effects: distance ~ age * Sex

Value Std.Error DF t-value p-value

(Intercept) 16.340625 1.1450945 79 14.270111 0.0000

age 0.784375 0.0933459 79 8.402883 0.0000

SexFemale 1.032102 1.4039842 25 0.735124 0.4691

age:SexFemale -0.304830 0.1071828 79 -2.844016 0.0057

Random intercepts covariance structure with variances differing with sex: Orthodontic Growth Data – R output

Model df AIC BIC logLik Test L.Ratio p-value

RI 1 6 445.7572 461.6236 -216.8786

RIH 2 7 429.2205 447.7312 -207.6102 1 vs 2 18.53677 <.0001

Random intercepts covariance structure with variances differing with sex: Orthodontic Growth Data
Random intercepts covariance structure differing across sex: Orthodontic Growth Data – SAS output

Conclusions:

Random intercepts covariance structure with variances differing with sex: Orthodontic Growth Data – Marginal & conditional residuals

Conclusions:

Orthodontic Growth Data – Conditional residuals (RH side has the variances differing across sex)

Conclusions:

Random intercepts covariance structure differing across sex : Orthodontic Growth Data – Mixed model diagnostics in SAS

Conclusions:

Random intercepts covariance structure differing across sex: Orthodontic Growth Data – Mixed model diagnostics in SAS

Conclusions:

Orthodontic growth data after transforming a model with random intercepts covariance structure differing with sex into a regression model with iid errors: diagnostic LS plots

Conclusions:

Orthodontic growth data after transforming a model with random intercepts covariance structure differing with sex into a regression model with iid errors: diagnostic LS plots

Conclusions:

Orthodontic growth data after transforming a model with random intercepts covariance structure differing with sex into a regression model with iid errors: plots of standardized LS residuals

T8 T10 T12 T14

T8 1.000 -0.188 0.222 -0.083

T10 -0.188 1.000 -0.001 0.270

T12 0.222 -0.001 1.000 0.297

T14 -0.083 0.270 0.297 1.000

Conclusions:

Department of Statistics

TEXAS A&M UNIVERSITY

Choosing between random effects covariance structures and covariance pattern models

Three approaches to modeling the covariance
• Random effects covariance structures:- random intercepts and/or random slopes
• Covariance pattern models
• Unstructured covariance
Pig Weight Data

“… data provided by Dr Philip McCloud (Monash University Melbourne) on the weights of 48 pigs measured in nine successive weeks.”

Diggle,P.J., Heagerty, P., Liang, K-Y. & Zeger, S.L. (2002). Analysis of Longitudinal Data (2nd edn.), Oxford University Press, Oxford.

Pig Weight Data

Conclusions:

Pig Weight Data

Conclusions:

Pig Weight Data: Time plots of Y= Weight

Correlations

T1 T2 T3 T4 T5 T6 T7 T8 T9

T1 1.00 0.92 0.80 0.80 0.75 0.71 0.66 0.63 0.56

T2 0.92 1.00 0.91 0.91 0.88 0.84 0.78 0.71 0.66

T3 0.80 0.91 1.00 0.96 0.93 0.91 0.84 0.82 0.77

T4 0.80 0.91 0.96 1.00 0.96 0.93 0.87 0.83 0.79

T5 0.75 0.88 0.93 0.96 1.00 0.92 0.85 0.81 0.79

T6 0.71 0.84 0.91 0.93 0.92 1.00 0.96 0.93 0.89

T7 0.66 0.78 0.84 0.87 0.85 0.96 1.00 0.96 0.92

T8 0.63 0.71 0.82 0.83 0.81 0.93 0.96 1.00 0.97

T9 0.56 0.66 0.77 0.79 0.79 0.89 0.92 0.97 1.00

Conclusions:

Pig Weight Data: random intercept and slope model

… We therefore adopt as our working model for the weight Yij of the ith pig in the jth week:

Diggle, P.J., Heagerty, P., Liang, K-Y. & Zeger, S.L. (2002). Analysis of Longitudinal Data (2nd edn.), Oxford University Press, Oxford (page 77)

A & S (2005) fit a RIAS model with AR(2) rather than iid errors.

Alkhamisi, M.A. & Shukur, G. (2005). Bayesian analysis of a linear mixed model with AR(p) errors via MCMC, Journal of Applied Statistics, 32, 741-755.

Pig Weight Data – RI vs RIAS

Model df AIC BIC logLik Test L.Ratio p-value

mRI 1 4 2041.797 2058.052 -1016.8984

mRIAS 2 6 1752.871 1777.254 -870.4356 1 vs 2 292.9256 <.0001

Linear mixed-effects model fit by REML

AIC BIC logLik

2041.797 2058.052 -1016.898

Random effects:

Formula: ~-1 | Animal

(Intercept) Residual

StdDev: 3.891253 2.096356

Fixed effects: weight ~ Time

Value Std.Error DF t-value p-value

(Intercept) 19.355613 0.6031390 383 32.09146 0

Time 6.209896 0.0390633 383 158.97012 0

Linear mixed-effects model fit by REML

AIC BIC logLik

1752.871 1777.254 -870.4356

Random effects:

Formula: ~Time | Animal

StdDev Corr

(Intercept) 2.6431920 (Intr)

Time 0.6164379 -0.063

Residual 1.2636572

Fixed effects: weight ~ Time

Value Std.Error DF t-value p-value

(Intercept) 19.355613 0.4038676 383 47.92564 0

Time 6.209896 0.0920382 383 67.47085 0

Conclusions:

Pig Weight Data – RIAS vs Unstructured

Model df AIC BIC logLik Test L.Ratio p-value

mRIAS 1 6 1752.871 1777.254 -870.4356

mFULLHet 2 47 1635.943 1826.941 -770.9717 1 vs 2 198.9278 <.0001

Generalized least squares fit by REML

Model: weight ~ Time

Data: pigweights

AIC BIC logLik

1635.943 1826.941 -770.9717

Correlation Structure: General

Formula: ~1 | Animal

Parameter estimate(s):

Correlation:

1 2 3 4 5 6 7 8

2 0.894

3 0.722 0.895

4 0.767 0.910 0.946

5 0.742 0.879 0.889 0.956

6 0.694 0.836 0.876 0.930 0.923

7 0.650 0.774 0.806 0.863 0.857 0.963

8 0.602 0.720 0.812 0.835 0.810 0.927 0.953

9 0.546 0.669 0.753 0.789 0.788 0.891 0.917 0.968

Variance function:

Structure: Different standard deviations per stratum

Formula: ~1 | Time

Parameter estimates:

1 2 3 4 5 6 7 8 9

1.000000 1.133915 1.514700 1.524495 1.825750 1.798021 2.005047 2.212073 2.561561

Coefficients:

Value Std.Error t-value p-value

(Intercept) 19.072855 0.3292651 57.92552 0

Time 6.174367 0.0791560 78.00249 0

Conclusions:

Pig Weight Data – RIASAR2Het vs Unstructured

Conclusions & next steps:

Linear mixed-effects model fit by REML

Data: pigweights

AIC BIC logLik

1628.055 1693.076 -798.0276

Random effects:

Formula: ~Time | Animal

StdDev Corr

(Intercept) 2.0328867 (Intr)

Time 0.5644216 0.135

Residual 1.2570683

Correlation Structure: ARMA(2,0)

Formula: ~1 | Animal

Parameter estimate(s):

Phi1 Phi2

0.7987402 -0.1113745

Variance function:

Structure: Different standard deviations per stratum

Formula: ~1 | Time

Parameter estimates:

1 2 3 4 5 6 7 8 9

1.000000 1.111492 1.609214 1.155950 1.739209 1.032130 1.519083 1.470896 2.007803

Fixed effects: weight ~ Time

Value Std.Error DF t-value p-value

(Intercept) 18.651097 0.3546391 383 52.59176 0

Time 6.339774 0.0910021 383 69.66622 0

Model df AIC BIC logLik Test L.Ratio p-value

mRIAS 1 6 1752.871 1777.254 -870.4356

mRIASHet 2 14 1712.108 1769.001 -842.0539 1 vs 2 56.76340 <.0001

mRIASAR2Het.REML 3 16 1628.055 1693.076 -798.0276 2 vs 3 88.05265 <.0001

mFULLHet 4 47 1635.943 1826.941 -770.9717 3 vs 4 54.11179 0.0063

Pig Weight Data – after transforming a straight line fit with unstructured covariance into a regression model with iid errors: diagnostic LS plots

Conclusions:

Pig Weight Data – after transforming a straight line fit with unstructured covariance into a regression model with iid errors: diagnostic LS plots

Formula:

StanRes1 ~ s(Timestar, k = 7)

Parametric coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -6.793e-05 4.740e-02 -0.001 0.999

Approximate significance of smooth terms:

edf Est.rank F p-value

s(Timestar) 4.738 6 3.291 0.00355 **

---

Deviance explained = 4.22%

GCV score = 0.98375

Scale est. = 0.97068 n = 432

Conclusions & next steps:

Pig Weight Data – spline fit with unstructured covariance

Generalized least squares fit by maximum likelihood

Model: weight ~ Time + TimeM2Plus + TimeM3Plus + TimeM4Plus + TimeM5Plus + TimeM6Plus + TimeM7Plus + TimeM8Plus

Data: pigweights

AIC BIC logLik

1600.992 1820.687 -746.4958

Correlation Structure: General

Formula: ~1 | Animal

Parameter estimate(s):

Variance function:

Structure: Different standard deviations per stratum

Formula: ~1 | Time

Coefficients:

Value Std.Error t-value p-value

(Intercept) 18.260417 0.3801332 48.03689 0.0000

Time 6.760417 0.1623941 41.62968 0.0000

TimeM2Plus 0.322917 0.2294237 1.40751 0.1600

TimeM3Plus -1.552083 0.2925803 -5.30481 0.0000

TimeM4Plus 0.229167 0.2432417 0.94214 0.3467

TimeM5Plus 0.531250 0.3931523 1.35126 0.1773

TimeM6Plus -0.281250 0.2646046 -1.06291 0.2884

TimeM7Plus 0.833333 0.2974937 2.80118 0.0053

TimeM8Plus -0.927083 0.2757124 -3.36250 0.0008

Generalized least squares fit by maximum likelihood

Model df AIC BIC logLik Test L.Ratio p-value

mFULLHet.ML 1 47 1632.303 1823.520 -769.1517

mFULLSplineHet.ML 2 54 1600.992 1820.687 -746.4958 1 vs 2 45.31183 <.0001

Conclusions & next steps:

… we recommend the use of REML for estimation of the variance covariance matrix. … The REML log-likelihood should also be used to compare nested models for the covariance. However, the construction of likelihood ratio tests comparing nested models for the mean should always be based on the ML and not the REML log-likelihood.

Fitzmaurice, G.M., Laird, N.M. & Ware, J.H. (2004). Applied Longitudinal Analysis. Wiley, New York (page 102)

Model df AIC BIC logLik Test L.Ratio p-value

mFULLHet.ML 1 47 1632.303 1823.520 -769.1517

mFULLSpline378Het.ML 2 50 1600.125 1803.546 -750.0624 1 vs 2 38.17859 <.0001

mFULLSplineHet.ML 3 54 1600.992 1820.687 -746.4958 2 vs 3 7.13324 0.129

Generalized least squares fit by maximum likelihood

Model: weight ~ Time + TimeM3Plus + TimeM7Plus + TimeM8Plus

Data: pigweights

AIC BIC logLik

1600.125 1803.546 -750.0624

Correlation Structure: General

Variance function:

Structure: Different standard deviations per stratum

Formula: ~1 | Time

Coefficients:

Value Std.Error t-value p-value

(Intercept) 18.256905 0.3533667 51.66560 0.0000

Time 6.820640 0.1371208 49.74183 0.0000

TimeM3Plus -0.981702 0.1375617 -7.13645 0.0000

TimeM7Plus 0.828701 0.2096348 3.95307 0.0001

TimeM8Plus -0.767967 0.2485657 -3.08959 0.0021

Conclusions:

Generalized least squares fit by REML

Model: weight ~ Time + TimeM3Plus + TimeM7Plus + TimeM8Plus

Data: pigweights

AIC BIC logLik

1608.513 1811.352 -754.2563

Correlation Structure: General

Formula: ~1 | Animal

Parameter estimate(s):

Correlation:

1 2 3 4 5 6 7 8

2 0.916

3 0.800 0.909

4 0.796 0.909 0.955

5 0.749 0.881 0.924 0.963

6 0.703 0.832 0.906 0.929 0.918

7 0.651 0.771 0.844 0.863 0.849 0.964

8 0.620 0.706 0.816 0.821 0.802 0.927 0.959

9 0.553 0.658 0.769 0.779 0.778 0.889 0.917 0.970

Variance function:

Structure: Different standard deviations per stratum

Formula: ~1 | Time

Parameter estimates:

1 2 3 4 5 6 7 8 9

1.000000 1.130608 1.434699 1.512733 1.835017 1.803844 2.019490 2.206989 2.575075

Coefficients:

Value Std.Error t-value p-value

(Intercept) 18.256899 0.3550324 51.42319 0.0000

Time 6.820637 0.1377675 49.50832 0.0000

TimeM3Plus -0.981701 0.1382101 -7.10296 0.0000

TimeM7Plus 0.828708 0.2106256 3.93451 0.0001

TimeM8Plus -0.767975 0.2497383 -3.07512 0.0022

Conclusions & next steps:

Pig Weight Data – after transforming the spline378 fit with unstructured covariance into a regression model with iid errors: diagnostic LS plots

Conclusions:

Pig Weight Data – after transforming the spline378 fit with unstructured covariance into a regression model with iid errors: diagnostic LS plots

Formula:

StanRes1 ~ s(LSFits, k = 7)

Parametric coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.01075 0.04822 0.223 0.824

Approximate significance of smooth terms:

edf Est.rank F p-value

s(LSFits) 1 1 0.036 0.849

Deviance explained = 0.00844%

GCV score = 1.0091

Scale est. = 1.0045 n = 432

Conclusions & next steps:

Pig weight data
• Mean structure is adequately modeled by regression splines with knots at Time = 3, 7 & 8
• Can we use a random effects covariance structure and/or a covariance pattern model rather than an unstructured covariance?
Pig weight data

Model df AIC BIC logLik Test L.Ratio p-value

mRIASSpline378.REML 1 21 1663.697 1748.889 -810.8483

mRIASAR2Spline378.REML 2 23 1636.153 1729.459 -795.0768 1 vs 2 31.54306 <.0001

mRIASAR2HetSpline378.REML 3 31 1605.189 1730.949 -771.5945 2 vs 3 46.96459 <.0001

mFULLSpline378Het.REML 4 50 1608.513 1811.352 -754.2563 3 vs 4 34.67638 0.0153

Pig weight data: RIASAR2HetSpline378

Linear mixed-effects model fit by REML

Data: pigweights

AIC BIC logLik

1605.189 1730.949 -771.5945

Random effects:

Formula: ~Time + TimeM3Plus + TimeM7Plus + TimeM8Plus | Animal

StdDev Corr

(Intercept) 2.1834406 (Intr) Time TmM3Pl TmM7Pl

Time 0.8315105 -0.269

TimeM3Plus 0.4659166 0.779 -0.790

TimeM7Plus 1.2592576 -0.024 -0.214 0.267

TimeM8Plus 0.7618044 -0.424 0.603 -0.751 -0.836

Residual 1.3902067

Correlation Structure: ARMA(2,0)

Formula: ~1 | Animal

Parameter estimate(s):

Phi1 Phi2

0.6231344 0.2935163

Variance function:

Structure: Different standard deviations per stratum

Formula: ~1 | Time

Parameter estimates:

1 2 3 4 5 6 7 8 9

1.000000 1.341819 1.690955 1.763115 2.300776 1.547082 1.111084 1.334961 1.997972

Fixed effects: weight ~ Time + TimeM3Plus + TimeM7Plus + TimeM8Plus

Value Std.Error DF t-value p-value

(Intercept) 18.236529 0.3594763 380 50.73082 0.0000

Time 6.956690 0.1484038 380 46.87677 0.0000

TimeM3Plus -1.058136 0.1453124 380 -7.28180 0.0000

TimeM7Plus 0.974005 0.2287388 380 4.25815 0.0000

TimeM8Plus -0.846191 0.2610472 380 -3.24152 0.0013

Department of Statistics

TEXAS A&M UNIVERSITY

Scatter plot smoothing using regression splines: choosing the location of the knots versus penalized regression splines

Ruppert, D., Wand, M.P. & Carroll, R.J. (2003), Semiparametric Regession, Cambridge University Press, Cambridge

Figure 3.8 is based on knots at every 12.5 meters of the range. However, for these data, it appears that there is now too much flexibility; the plot is heavily “overfitted”, meaning that the fitted function is following small, apparently random fluctuations in the data as well as the main features. One could try pruning knots to overcome this problem. For example, if the knots at range = 612.5, 650, 662.5 and 687.5 are deleted then we obtain the fit in Figure 3.9.

Ruppert, D., Wand, M.P. & Carroll, R.J. (2003), Semiparametric Regession, Cambridge University Press, Cambridge

Penalised spline regression

An alternative to pruning knots is to retain all the knots but to constrain their influence. This is the idea behind penalised spline regression.

Ruppert, D., Wand, M.P. & Carroll, R.J. (2003), Semiparametric Regession, Cambridge University Press, Cambridge

Ruppert, D., Wand, M.P. & Carroll, R.J. (2003), Semiparametric Regession, Cambridge University Press, Cambridge

Penalized splines as BLUPs of a mixed model

Ruppert, D., Wand, M.P. & Carroll, R.J. (2003), Semiparametric Regession, Cambridge University Press, Cambridge

REML & CV smoothing parameter choices

Ruppert, D., Wand, M.P. & Carroll, R.J. (2003), Semiparametric Regession, Cambridge University Press, Cambridge

Simulation results re REML & CV smoothing parameter choices

“In summary, REML will produce smoother fits than GCV ... This means that in the fitted curves REML will have more bias but less variance than GCV. However, the two methods of selecting the smoothing parameter will be about equally accurate in terms of MSE; they trade of bias and variance differently but achieve nearly the same value of MSE.”

Ruppert, D., Wand, M.P. & Carroll, R.J. (2003), Semiparametric Regession, Cambridge University Press, Cambridge

Pig Weight Data – after transforming a straight line fit with unstructured covariance into a regression model with iid errors: diagnostic LS plots

Formula:

StanRes1 ~ s(Timestar, k = 7)

Parametric coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -6.793e-05 4.740e-02 -0.001 0.999

Approximate significance of smooth terms:

edf Est.rank F p-value

s(Timestar) 4.738 6 3.291 0.00355 **

---