
Lecture 3 Empirical Bayes and Proc Mixed

Ziad Taib, Biostatistics, AZ. April 24, 2009.


Presentation Transcript


  1. Lecture 3: Empirical Bayes and Proc Mixed. Ziad Taib, Biostatistics, AZ. April 24, 2009. Name, department

  2. Outline of the lecture • Reminder • Inference for the random effects • Proc Mixed

  3. 1. Reminder: Vi

  4. 2. Inference for the Random Effects - Empirical Bayes Inference

  8. Comments
      • The above EB estimate of the random effects can be obtained by solving a set of equations, known as the mixed model equations.
      • It can be shown that using the EB estimate leads to the Best Linear Unbiased Prediction (BLUP) of linear combinations of the fixed and random effects.
      • When trying to predict the response of an individual, we can use the fitted subject-specific profile, in which the observed data are shrunk towards the prior average profile.
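A sketch of the quantities these comments refer to, assuming the standard linear mixed model notation used elsewhere in the lecture (design matrices $X_i$ and $Z_i$, random-effects covariance $D$, residual covariance $\Sigma_i$ and marginal covariance $V_i$). The exact form shown on the original slides is not available here, so the symbols $\lambda$ and $\delta$ in particular are assumptions:

```latex
\begin{align*}
Y_i &= X_i\beta + Z_i b_i + \varepsilon_i,
      \qquad b_i \sim N(0, D), \quad \varepsilon_i \sim N(0, \Sigma_i), \\
V_i &= \operatorname{Cov}(Y_i) = Z_i D Z_i' + \Sigma_i, \\
\hat b_i &= \hat D Z_i' \hat V_i^{-1}\,(y_i - X_i\hat\beta)
      \qquad \text{(empirical Bayes estimate)}, \\
u &= \lambda'\beta + \delta' b_i
      \qquad \text{(linear combination predicted by } \lambda'\hat\beta + \delta'\hat b_i\text{)}, \\
\hat Y_i &= X_i\hat\beta + Z_i\hat b_i
      = \hat\Sigma_i \hat V_i^{-1} X_i\hat\beta
        + \bigl(I_{n_i} - \hat\Sigma_i \hat V_i^{-1}\bigr)\, y_i .
\end{align*}
```

The last line follows from $Z_i D Z_i' = V_i - \Sigma_i$. It writes the prediction as a weighted average of the average profile $X_i\hat\beta$ and the observed data $y_i$: the larger the residual variability $\Sigma_i$ is relative to $V_i$, the more the prediction is pulled towards the average profile.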

  9. 3. Statistical software

  10. Software (cont’d)
      • SAS, SPSS, BMDP/5V, ML3, HLM, S-Plus and R can all handle correlated data, but some are more restricted than others.
      • Most packages offer a choice between ML and REML, and optimisation is often based on Newton-Raphson, the EM algorithm or Fisher scoring.
      • The user has to specify a model for the mean response that is linear in the fixed effects, and a covariance structure. The user can select a full parameterisation of the covariance structure (unstructured) or choose among given covariance structures.
      • The covariance structure is also influenced by the inclusion of random effects and their covariance structure.

  11. Software (cont’d)
      • Output often includes:
          • history of optimisation iterations
          • estimates of fixed effects
          • covariance parameters with standard errors
          • estimates of user-specified contrasts
      • Graphics are often limited, but plots can be produced in other software.

  12. SAS PROC MIXED and Repeated Measures
      • PROC MIXED of SAS offers greater flexibility for the modelling of repeated measures data than PROC GLM. First, the procedure provides a mechanism for modelling the covariance structure associated with the repeated measures. Second, it can handle some forms of missing data without discarding all of a subject's data. Third, it has some capability to handle subjects measured at different times and time intervals.
      • In PROC GLM, repeated measures are handled in a multivariate framework, which requires a multivariate view of the data. PROC MIXED, on the other hand, requires a univariate or stacked-data view of the data: there is only a single response variable, and the repeated information, including all of the information about the subjects, is contained in other variables. The univariate tests in PROC GLM assume that the covariance matrix meets a sphericity assumption, of which compound symmetry is a special case.

  13. SAS PROC MIXED
      • PROC MIXED was designed to handle mixed models. It has a large choice of covariance structures (unstructured, random effects, autoregressive, Diggle, etc.).
      • PROC MIXED can be used to estimate not only the fixed parameters, but also the covariance parameters.
      • By default, PROC MIXED estimates the covariance parameters using the method of restricted maximum likelihood (REML).
      • PROC MIXED provides empirical Bayes estimates.
      • Separate analyses for separate groups can be run using the BY statement.
      • Approximate F tests for class variables are obtained using Wald’s test.
      • All components of the output can be saved as a SAS data set for further manipulation using other internal (SAS) or external procedures.

  14. PROC MIXED: Syntax
      PROC MIXED < options > ;
        BY variables ;
        CLASS variables ;
        ID variables ;
        MODEL dependent = < fixed-effects > < / options > ;
        RANDOM random-effects < / options > ;
        REPEATED < repeated-effect > < / options > ;
        PARMS (value-list) ... < / options > ;
        CONTRAST 'label' < fixed-effect values ... > < | random-effect values ... > , ... < / options > ;
        ESTIMATE 'label' < fixed-effect values ... > < | random-effect values ... > < / options > ;
        LSMEANS fixed-effects < / options > ;
        MAKE 'table' OUT=SAS-data-set ;

  15. Data structure of Proc Mixed • Consider the example where arm strength is measured on 8 patients at 3 different times and where patients have been randomized to one of 2 treatment groups. In the multivariate view associated with e.g. PROC GLM, each patient occupies a single row, with one column per measurement time.

  16. For analysis of this data set using PROC MIXED, the univariate or stacked-data view is required, with one row per patient and measurement time. The univariate view can be obtained with PROC TRANSPOSE.
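The two data views can be sketched as follows. This is a minimal illustration, not the lecture's own code: the data values are made up, and the names `strength_wide`, `strength_long`, `trt`, `str1`-`str3`, `occasion` and `strength` are hypothetical.

```sas
/* Multivariate (wide) view: one row per patient. Values are made up. */
data strength_wide;
  input id trt $ str1 str2 str3;
  datalines;
1 A 80 83 85
2 A 75 77 80
3 A 90 91 95
4 A 68 72 71
5 B 82 82 84
6 B 78 79 79
7 B 85 84 86
8 B 70 71 70
;
run;

/* PROC GLM works directly on the wide view */
proc glm data=strength_wide;
  class trt;
  model str1 str2 str3 = trt / nouni;
  repeated time 3;
run;
quit;

/* Univariate (stacked) view: one row per patient and occasion */
proc sort data=strength_wide;
  by id trt;
run;

proc transpose data=strength_wide
               out=strength_long(rename=(col1=strength))
               name=occasion;
  by id trt;
  var str1-str3;
run;

/* Turn 'str1', 'str2', 'str3' into a numeric time index 1, 2, 3 */
data strength_long;
  set strength_long;
  time = input(substr(occasion, 4), 8.);
run;

/* PROC MIXED works on the stacked view with a single response */
proc mixed data=strength_long;
  class id trt time;
  model strength = trt time trt*time / solution;
  repeated time / type=cs subject=id;
run;
```

Compound symmetry (`type=cs`) is used here only to keep the sketch small; any other structure supported by the REPEATED statement could be substituted.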

  24. PROC MIXED data=prostate method=REML asycov asycorr covtest ic;
        CLASS id group timeclss;
        MODEL lnpsa = group age group*time age*time group*time2 age*time2 / noint solution ddfm=satterth covb chisq;
        ID id time;
        RANDOM intercept time time2 / type=un subject=id g gcorr v vcorr solution;
        REPEATED timeclss / type=simple subject=id r rcorr;
        CONTRAST 'Final model' age*time 1, age*time2 1, group*time2 1 0 0 0, group*time2 0 1 0 0, group*time2 0 0 1 -1 / chisq;
        ESTIMATE 'Diff L/R-BPH, t=5yr' group 0 -4 4 0 group*time 0 -2 2 0 group*time2 0 0 1 0 / cl alpha=0.05 divisor=4;
        MAKE 'solutionR' out=randeff;
      RUN;

  25. Proc mixed
      • Proc mixed invokes the mixed procedure.
      • Method= specifies the estimation method (ML, REML or MIVQUE0).
      • Asycov and Asycorr print the asymptotic covariance and correlation matrices of the covariance parameter estimates in the marginal model.
      • Covtest prints the asymptotic standard errors and associated Wald tests for the variance components.
      • ic calculates some information criteria.
      • Class specifies which variables are considered as factors.

  26. Proc mixed
      • Model specifies the model (i.e. the response and the fixed effects Xi). An intercept is included by default.
      • Solution prints estimates of the fixed effects in the model together with standard errors, t-statistics and p-values for significance.
      • Covb gives the whole covariance matrix of the fixed-effects estimates.
      • ddfm= specifies the method used to compute the degrees of freedom in the t- and F-approximations. One of many options is Satterthwaite.
      • Chisq makes SAS include Wald tests next to the default t- and F-tests for all effects specified in the model.

  27. Proc mixed
      • Id: in general, SAS uses the same order as the original data, but it does not hurt to have extra columns that help identify the records and the subjects. This is nice to have e.g. when using predmeans or predicted to get predicted values.
      • Random specifies the random effects (Zi). Notice that a random intercept is not included by default.
      • Solution is needed to print the empirical Bayes estimates of the random effects.
      • Subject= identifies the subjects (here id).
      • G, gcorr, v and vcorr print the covariance and correlation matrices corresponding to D and Vi. By default Vi is printed for the first subject only, but other subjects can be specified.

  28. Proc mixed
      • Repeated is used to specify the residual covariance matrix Σi. The repeated effects must be classification variables.
      • Type= specifies the structure of Σi. Simple means independence.
      • r and rcorr print the residual covariance matrix Σi and the corresponding correlation matrix.
      • Contrast allows testing hypotheses of the form H0: Lβ = 0.
      • Several contrasts can be specified, so that several tests can be run at the same time. Each contrast needs a label in single quotes as well as the linear combinations (the rows of L). The F-test is the default, but the Wald test can be requested using the chisq option.

  29. Proc mixed
      • Estimate permits the estimation of a linear combination of the fixed effects. A label in single quotes is needed as well. It is very similar to contrast, but the output can also include confidence intervals.
      • Use the options cl alpha=0.05 to request t-type confidence limits with α = 0.05, i.e. 95% confidence intervals.
      • Make is used to convert parts of the output into a SAS data set. In later versions Make is replaced by ODS.
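In current SAS versions the role of the MAKE statement is played by ODS OUTPUT. The following is a minimal sketch on simulated data: the data set `demo`, its variables and the output data set names `fixeff`, `randeff` and `covest` are hypothetical, while `SolutionF`, `SolutionR` and `CovParms` are ODS table names produced by PROC MIXED.

```sas
/* Simulated longitudinal data: 20 subjects, 5 occasions (illustration only) */
data demo;
  call streaminit(2009);
  do id = 1 to 20;
    b0 = rand('normal', 0, 2);     /* subject-specific intercept deviation */
    b1 = rand('normal', 0, 0.5);   /* subject-specific slope deviation     */
    do t = 0 to 4;
      y = 10 + b0 + (1 + b1)*t + rand('normal', 0, 1);
      output;
    end;
  end;
  keep id t y;
run;

/* Save fixed effects, EB estimates and covariance parameters as data sets */
ods output SolutionF=fixeff SolutionR=randeff CovParms=covest;

proc mixed data=demo method=reml;
  class id;
  model y = t / solution;
  random intercept t / type=un subject=id solution;
run;

/* The empirical Bayes estimates are now an ordinary SAS data set */
proc print data=randeff(obs=10);
run;
```

The SOLUTION option is required on both the MODEL and the RANDOM statement; otherwise the corresponding tables are not produced and ODS OUTPUT has nothing to save.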

  30. Modelling the Covariance Structure Using the RANDOM and REPEATED Statements in PROC MIXED
      Measures on different individuals are independent, so covariance needs attention only for measures on the same individual. The covariance structure refers to variances at individual times and to correlation between measures at different times on the same individual. There are basically two aspects of the correlation.
      • First, two measures on the same individual are correlated simply because they share common contributions from that individual. This is due to variation between individuals.
      • Second, measures on the same individual close in time are often more highly correlated than measures far apart in time. This is covariation within individuals.
      Usually, when using PROC MIXED, the variation between individuals is specified by the RANDOM statement, and covariation within individuals is specified by the REPEATED statement.

  31. PROC MIXED fits many different covariance structures. Note also that a particular structure may be fitted using more than one TYPE= designation, and with combinations of the RANDOM and REPEATED statements.

  32. Confusing Proc mixed?
      • The simple answer to why SAS's PROC MIXED can seem so confusing is that it's so powerful, but there's more to it than that. Early on, many guides to PROC MIXED presented an example of fitting a compound symmetry model to a repeated measures study in which subjects (ID) are randomized to one of many treatments (TREAT) and then measured at multiple time points (PERIOD). The command language to analyze these data can be written as
          proc mixed;
            class id treat period;
            model y=treat period treat*period;
            repeated period / sub=id(treat) type=cs;
      • or
          proc mixed;
            class id treat period;
            model y=treat period treat*period;
            random id(treat);

  33. Because both sets of command language produce the correct analysis, this immediately raises confusion over the roles of the repeated and random statements. In order to sort this out, the underlying mathematics must be reviewed. Once the reason for the equivalence is understood, the purposes of the repeated and random statements will be clear, cf. the following: http://www.jerrydallal.com/LHSP/mixedq.htm
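The equivalence can be sketched as follows, assuming a random subject effect $b_i$ with variance $\sigma_b^2$ and independent residuals with variance $\sigma^2$ (these symbols are introduced here for illustration and are not from the slides):

```latex
\begin{align*}
y_{ij} &= \mu_{ij} + b_i + \varepsilon_{ij},
  \qquad b_i \sim N(0, \sigma_b^2), \quad
  \varepsilon_{ij} \sim N(0, \sigma^2) \ \text{independent}, \\
\operatorname{Var}(y_{ij}) &= \sigma_b^2 + \sigma^2, \qquad
\operatorname{Cov}(y_{ij}, y_{ik}) = \sigma_b^2 \quad (j \neq k), \\
V_i &= \sigma_b^2\, J_{n_i} + \sigma^2 I_{n_i}.
\end{align*}
```

Here $J_{n_i}$ is the matrix of ones. A constant variance on the diagonal and a constant covariance off the diagonal is exactly the compound symmetry structure, so the RANDOM formulation induces the same marginal covariance that `type=cs` specifies directly in the REPEATED statement. The two fits agree as long as the estimated common covariance is non-negative, since a variance component $\sigma_b^2$ cannot represent a negative covariance.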

  34. Summary: In Proc Mixed, the mixed model is specified by means of a number of statements like CLASS, MODEL, RANDOM and REPEATED.
      • The CLASS statement identifies the classification variables (for example, gender, person, age, etc.).
      • The MODEL statement specifies the model’s fixed effects equation, Xiβ. Thus, the design matrix Xi is defined, and the model’s intercept is included by default.
      • The RANDOM statement is used to specify the random effects and the form of the covariance matrix D. (Useful option: SOLUTION prints the random effects solution.)
      • The REPEATED statement models the intra-individual variation and specifies the structure of Σi = Cov(εi), where the Σi form the blocks, one per subject, of a block-diagonal residual covariance matrix. (If the REPEATED statement is not included, it is assumed that Σi = σ²I.)
      • LSMEANS calculates least squares mean estimates of specified fixed effects.

  35. The rat data
      proc mixed data=rat method=reml;
        class id group;
        model y = t group*t / solution;
        random intercept t / type=un subject=id;
      run;

  38. Results using the option nobound

  39. Non-convergence or non-positive definiteness can be indications of negative variance components. By default Proc mixed does not allow that to happen, but using the option nobound in Proc mixed will result in a new set of estimates where d22 is negative. Consider the fitted variance function, which is quadratic in time with d22 as the coefficient of t². Hence, the negative variance component suggests a negative curvature in the variance function.
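The variance function referred to here can be sketched as follows, assuming the random intercept and slope model of the rat data code (RANDOM intercept t with type=un, no REPEATED statement, so that the residual variance is a constant $\sigma^2$), with $d_{11}$, $d_{12}$ and $d_{22}$ denoting the elements of $D$:

```latex
\begin{align*}
\operatorname{Var}\bigl(Y_i(t)\bigr)
  &= \begin{pmatrix} 1 & t \end{pmatrix}
     \begin{pmatrix} d_{11} & d_{12} \\ d_{12} & d_{22} \end{pmatrix}
     \begin{pmatrix} 1 \\ t \end{pmatrix} + \sigma^2 \\
  &= d_{22}\, t^2 + 2 d_{12}\, t + d_{11} + \sigma^2 .
\end{align*}
```

The second derivative with respect to $t$ is $2 d_{22}$, so the sign of $d_{22}$ determines the curvature. Interpreted only as a parameter of the marginal variance function, a negative $d_{22}$ is meaningful even though it cannot be the variance of a random slope.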

  42. The prostate data. Age could not be matched.

  43. SAS code

  44. ML and REML estimates

  45. ML and REML estimates (cont’d)

  46. Fitted average profiles

  48. In practice, histograms and scatter plots of certain components of the estimate of bi can be used to detect model deviations or subjects with exceptional evolution over time.
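A hedged sketch of such a diagnostic: histograms of the empirical Bayes estimates, one panel per random effect. The simulated data set `sim` and the output data set `eb` are hypothetical; `Effect` and `Estimate` are columns of the `SolutionR` ODS table of PROC MIXED.

```sas
/* Simulated data with random intercepts and slopes (illustration only) */
data sim;
  call streaminit(48);
  do id = 1 to 40;
    b0 = rand('normal', 0, 2);
    b1 = rand('normal', 0, 0.5);
    do t = 0 to 4;
      y = 5 + b0 + (2 + b1)*t + rand('normal', 0, 1);
      output;
    end;
  end;
  keep id t y;
run;

/* Capture the EB estimates of the random effects */
ods output SolutionR=eb;

proc mixed data=sim method=reml;
  class id;
  model y = t / solution;
  random intercept t / type=un subject=id solution;
run;

/* One histogram per random effect (Intercept and t) */
proc univariate data=eb noprint;
  class Effect;
  var Estimate;
  histogram Estimate / normal;
run;
```

Because the EB estimates are shrunk towards zero, their histograms tend to look less dispersed than the fitted random-effects distribution, so they are better suited to spotting unusual subjects than to checking normality. A scatter plot of intercepts against slopes would additionally need the estimates rearranged to one row per subject.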

  49. Histograms and scatter plots. Correlations between components of the estimate of bi.
