Revealing the riddle of REML. Mick O’Neill Faculty of Agriculture, Food & Natural Resources, University of Sydney. Background. Biometry 1 and 2 are core units with an applied stats focus. Many students have only Maths in Society on entry to the Faculty Biometry 3 is a Third Year elective
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Revealing the riddle of REML
Mick O’Neill
Faculty of Agriculture, Food & Natural Resources, University of Sydney
The likelihood is the prior probability of obtaining the actual data in your sample
Each of these distributions involves at least one unknown parameter (probability, mean, standard deviation, …) which must be estimated from the data.
The likelihood is the prior probability of obtaining the actual data in your sample
Estimation is achieved by finding that parameter valuewhich maximises the likelihood (or equivalently the log-likelihood)
It is possible to partition the likelihood into two terms:
It is possible to partition the likelihood into two terms, in such a way that:
=
sn-1
sn
ML estimate
REML estimate
=
ANOVA:
Source of variation d.f. s.s. m.s. v.r. F pr.
Chick stratum
Diet 3 0.01847 0.00616 0.10 0.958
Residual 12 0.73230 0.06103
Total 15 0.75078
REML Variance Components Analysis:
Fixed model : Constant+Diet
Random model : Chick
Chick used as residual term
*** Residual variance model ***
Term Factor Model(order) Parameter Estimate S.e.
Chick Identity Sigma20.0610 0.02491
*** Wald tests for fixed effects ***
Fixed term Wald statistic d.f. Wald/d.f. Chi-sq prob
Diet 0.30 3 0.100.960
ANOVA:
Grand mean 3.129
Diet Diet_1 Diet_2 Diet_3 Diet_4
3.107 3.188 3.112 3.107
*** Standard errors of differences of means ***
Table Diet
rep. 4
d.f. 12
s.e.d. 0.1747
REML Variance Components Analysis:
*** Table of predicted means for Diet ***
Diet Diet_1 Diet_2 Diet_3 Diet_4
3.107 3.187 3.112 3.107
Standard error of differences: 0.1747
ANOVA:
Source of variation d.f. s.s. m.s. v.r. F pr.
Block stratum 5 5.410 1.082 0.29
Block.*Units* stratum
Spacing 4 125.661 31.415 8.50 <.001
Residual 20 73.919 3.696
REML Variance Components Analysis
(a) With Block + Spacing both fixed effects:
Term Factor Model(order) Parameter Estimate S.e.
Residual Identity Sigma2 3.696 1.169
Fixed term Wald statistic d.f. Wald/d.f. Chi-sq prob
Block 1.46 5 0.29 0.917
Spacing 34.00 4 8.50 <0.001
ANOVA:
Source of variation d.f. s.s. m.s. v.r. F pr.
Block stratum 5 5.410 1.082 0.29
Block.*Units* stratum
Spacing 4 125.661 31.415 8.50 <.001
Residual 20 73.919 3.696
REML Variance Components Analysis
(b) With Spacing fixed and Block random:
*** Estimated Variance Components ***
Random term Component S.e.
Block 0.000 BOUND
*** Residual variance model ***
Term Factor Model(order) Parameter Estimate S.e.
Residual Identity Sigma2 3.173 0.897
*** Wald tests for fixed effects ***
Fixed term Wald statistic d.f. Wald/d.f. Chi-sq prob
Spacing 39.60 4 9.90 <0.001
Estimated
Source Value
Block -0.5228
Error 3.6959
Yield of soybean = Overall mean + Block effect + Spacing effect
+ Error
Yield of soybean = Overall mean + Block effect + Spacing effect
+ Error
Y = Fixed effects + Random effects + Error term
Y = Xt + Zu + e
Y = Fixed effects + Random effects + Error term
Y = Xt + Zu + e
is a scaling factor, often set to 1
Furthermore, different choices for the variance matrices allow for :
Nested models can be compared using the change in deviance which is approximately c2 with df = change in number of parameters
For students:
For markers:
Minitab’s analysis
SourceDFAdj MS F P
Rabbit 3602.6 5.26 0.004
Error33114.5
Estimated
SourceTermSource Value
1Rabbit(2) + 9.0090 (1)Rabbit 54.18
2 Error(2) Error 114.48
Rabbit Mean
1 372.3
2 354.4
3 355.3
4 361.3these are sample means
GenStat’s Linear Mixed Models analysis
Random term Component S.e.
Rabbit 55.0 55.8
*** Residual variance model ***
Term Model(order) Parameter Estimate S.e.
Residual Identity Sigma2 114.4 28.2
Table of predicted means for Rabbit (these are BLUPs)
Rabbit 1 2 3 4
369.9 355.5 356.0 361.2
Standard error of differences: Average 4.613
Maximum 5.055
Minimum 4.133
Average variance of differences: 21.38
Deviance d.f.
215.22 34
Test H0:
Method: drop Rabbit as a random term
Deviance d.f.
221.21 35 for reduced model
215.22 34
Change in deviance = 6.0 with 1 df
P-value = 0.014
The variation in the widths of the dorsal shield of larvae of ticks found among rabbits differs significantly across rabbits (P = 0.014)
The variance among rabbits is estimated to be 55.0 ( 55.7) compared to the variance within rabbits, namely 114.4 ( 28.2)
Growth of a fungus (in cm) over time
Split plot without Greenhouse-Geisser adjustmment
(assumes equi-correlation structure among times)
Source of variation d.f. m.s. v.r. F pr.
Rep.Fungus stratum
Fungus 2 8.104 97.30 <.001
Residual 9 0.083 3.37
Rep.Fungus.Time stratum
Time 5 55.231 2233.21 <.001
Fungus.Time 10 0.933 37.71 <.001
Residual 45 0.025
Estimated stratum variances
Stratum variance d.f. variance component
Rep.Fungus 0.0833 9 0.0098
Rep.Fungus.Time 0.0247 45 0.0247
Split plot with Greenhouse-Geisser adjustmment
(tests equi-correlation structure among times)
Source of variation d.f. m.s. v.r. F pr.
Rep.Fungus stratum
Fungus 2 8.104 97.30 <.001
Residual 9 0.083 3.37
Rep.Fungus.Time stratum
Time 5 55.231 2233.21 <.001
Fungus.Time 10 0.933 37.71 <.001
Residual 45 0.025
(d.f. are multiplied by the correction factors before calculating F probabilities)
Box's tests for symmetry of the covariance matrix:
Chi-square 57.47 on 19 df: probability 0.000
F-test 2.93 on 19, 859 df: probability 0.000
Greenhouse-Geisser epsilon 0.3206
Split plot via REML – ignoring changing variances
Fixed model : Constant+Fungus+Time+Fungus.Time
Random model : Rep.Fungus+Rep.Fungus.Time
Estimated Variance Components
Random term Component S.e.
Rep.Fungus 0.00976 0.00660
Residual variance model
Term Model(order) Parameter Estimate S.e.
Rep.Fungus.Time Identity Sigma2 0.0247 0.0052
Deviance d.f.
-109.90 52
Fixed term Wald statistic d.f. Wald/d.f. Chi-sq prob
Fungus 194.60 2 97.30 <0.001
Time 11166.05 5 2233.21 <0.001
Fungus.Time 377.08 10 37.71 <0.001
Split plot via REML – accounting for changing variances (a)
Fixed model : Constant+Fungus+Time+Fungus.Time
Random model : Rep.Fungus+Rep.Fungus.Time
Estimated Variance Components
Random term Component S.e.
Rep.Fungus 0.01053 0.00539
Residual variance model
Term Model(order) Parameter Estimate S.e.
Rep.Fungus.Time Identity Sigma2 0.0082 .00428
Rep Identity - - -
Fungus Identity - - -
Time Diagonal d_1 1.000 FIXED
d_2 1.102 0.815
d_3 0.227 0.215
d_4 0.262 0.253
d_5 1.965 1.443
d_6 13.550 9.580
Split plot via REML – accounting for changing variances (b)
Fixed model : Constant+Fungus+Time+Fungus.Time
Random model : Rep.Fungus+Rep.Fungus.Time
Estimated Variance Components
Random term Component S.e.
Rep.Fungus 0.01053 0.00539
Residual variance model
Term Model(order) Parameter Estimate S.e.
Rep.Fungus.Time Identity Sigma2 1.000 FIXED
Rep Identity - - -
Fungus Identity - - -
Time Diagonal d_1 0.0082 0.0043
d_2 0.0091 0.0047
d_3 0.0019 0.0015
d_4 0.0022 0.0017
d_5 0.0162 0.0081
d_6 0.1116 0.0530
Split plot via REML – accounting for changing variances
and an AR(1) correlation structure
Fixed model : Constant+Fungus+Time+Fungus.Time
Random model : Rep.Fungus+Rep.Fungus.Time
Estimated Variance Components
Random term Component S.e.
Rep.Fungus 0.010616 0.005572
Residual variance model
Term Model(order) Parameter Estimate S.e.
Rep.Fungus.Time Identity Sigma2 0.0085 0.0045
Rep Identity - - -
Fungus Identity - - -
Time AR(1) hetphi_10.148 0.209
d_1 1.000 FIXED
d_2 1.202 0.895
d_3 0.260 0.249
d_4 0.264 0.266
d_5 1.829 1.347
d_6 13.560 9.620
Split plot via REML – accounting for changing variances
and an AR(1) correlation structure
Deviance d.f. Change d.f.
Same variance, uncorrelated -109.90 52
Different variances over time -151.03 47 41.13 5
+ AR(1) correlation structure -151.59 46 0.56 1
RCBD (fixed) fertilisers Potato yields (t/ha)
Source of variation d.f. m.s. v.r. F pr.
Block stratum 3 2.929 1.43
Block.Treatment stratum
Treatment 5 92.359 45.07 <.001
Residual 15 2.049
Treatment A B C D E F
17.55 25.60 27.67 25.94 30.51 30.63
*** Standard errors of differences of means ***
Table Treatment
rep. 4
d.f. 15
s.e.d. 1.012
REML
Random term Component S.e.
Block 0.395 0.500
Residual variance model
Term Factor Model Parameter Estimate S.e.
Y.X Sigma2 2.849 1.739
Y AR(1) phi_10.7054 0.2078
X AR(1) phi_1 -0.2508 0.3397
Deviance d.f.
36.54 14
Treatment A B C D E F
17.74 26.29 26.79 26.34 30.41 29.37
Standard error of differences: Average 0.7749
Maximum 0.8942
Minimum 0.6465
Average variance of differences: 0.6050