# Revealing the riddle of REML - PowerPoint PPT Presentation

1 / 46

Revealing the riddle of REML. Mick O’Neill Faculty of Agriculture, Food & Natural Resources, University of Sydney. Background. Biometry 1 and 2 are core units with an applied stats focus. Many students have only Maths in Society on entry to the Faculty Biometry 3 is a Third Year elective

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

Revealing the riddle of REML

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

## Revealing the riddle of REML

Mick O’Neill

Faculty of Agriculture, Food & Natural Resources, University of Sydney

### Background

• Biometry 1 and 2 are core units with an applied stats focus. Many students have only Maths in Society on entry to the Faculty

• Biometry 3 is a Third Year elective

• Biometry 4 is (still) a possible major

• All students are now expected to design and analyse their fourth year experiments with little or no help from the Biometry Unit

### Third year Biometry students can:

• Design and analyse multi-strata factorial experiments (split-plots, strip-plots)

• Perform binomial & ordinal logistic regression, Poisson regression, …

• Analyse repeated measures data using REML

### Step 1. What is Maximum Likelihood?

• The likelihood is the prior probability of obtaining the actual data in your sample

• This requires you to assume that the data follow some distribution, typically:

• Binomial or Poisson for count data

• Normal or LogNormal for continuous data

### Step 1. What is Maximum Likelihood?

The likelihood is the prior probability of obtaining the actual data in your sample

Each of these distributions involves at least one unknown parameter (probability, mean, standard deviation, …) which must be estimated from the data.

### Step 1. What is Maximum Likelihood?

The likelihood is the prior probability of obtaining the actual data in your sample

Estimation is achieved by finding that parameter valuewhich maximises the likelihood (or equivalently the log-likelihood)

### Example 1. Binomial data

• Guess p = P(seed germinates)

• Evaluate LogL

• Maximise LogL by varying p

### Example 2. Normal data

• Guess m and s

• Evaluate LogL

• Maximise LogL by varying m and s

### Step 2. What is REML?

It is possible to partition the likelihood into two terms:

• a likelihood that involves m (as well as s2)

• and

• a residual likelihood that involves only s2

### Step 2. What is REML?

It is possible to partition the likelihood into two terms, in such a way that:

• the first likelihood can be maximised to estimate m (and its solution does not depend on the value of s2)

• the residual likelihood can be maximised to estimates2 REML estimate

=

### Solutions

• ML estimate of variance is

• REML estimate is

• In each case the estimate of m isthe sample mean

sn-1

sn

ML estimate

REML estimate

=

### ANOVA vs REML

ANOVA:

Source of variation d.f. s.s. m.s. v.r. F pr.

Chick stratum

Diet 3 0.01847 0.00616 0.10 0.958

Residual 12 0.73230 0.06103

Total 15 0.75078

REML Variance Components Analysis:

Fixed model : Constant+Diet

Random model : Chick

Chick used as residual term

*** Residual variance model ***

Term Factor Model(order) Parameter Estimate S.e.

Chick Identity Sigma20.0610 0.02491

*** Wald tests for fixed effects ***

Fixed term Wald statistic d.f. Wald/d.f. Chi-sq prob

Diet 0.30 3 0.100.960

ANOVA:

• ***** Tables of means *****

Grand mean 3.129

Diet Diet_1 Diet_2 Diet_3 Diet_4

3.107 3.188 3.112 3.107

*** Standard errors of differences of means ***

Table Diet

rep. 4

d.f. 12

s.e.d. 0.1747

REML Variance Components Analysis:

*** Table of predicted means for Diet ***

Diet Diet_1 Diet_2 Diet_3 Diet_4

3.107 3.187 3.112 3.107

Standard error of differences: 0.1747

### Example 4a. One-way (in randomised blocks) – fixed treatments

ANOVA:

Source of variation d.f. s.s. m.s. v.r. F pr.

Block stratum 5 5.410 1.082 0.29

Block.*Units* stratum

Spacing 4 125.661 31.415 8.50 <.001

Residual 20 73.919 3.696

REML Variance Components Analysis

(a) With Block + Spacing both fixed effects:

Term Factor Model(order) Parameter Estimate S.e.

Residual Identity Sigma2 3.696 1.169

Fixed term Wald statistic d.f. Wald/d.f. Chi-sq prob

Block 1.46 5 0.29 0.917

Spacing 34.00 4 8.50 <0.001

### Random blocks, fixed treatments

ANOVA:

Source of variation d.f. s.s. m.s. v.r. F pr.

Block stratum 5 5.410 1.082 0.29

Block.*Units* stratum

Spacing 4 125.661 31.415 8.50 <.001

Residual 20 73.919 3.696

REML Variance Components Analysis

(b) With Spacing fixed and Block random:

*** Estimated Variance Components ***

Random term Component S.e.

Block 0.000 BOUND

*** Residual variance model ***

Term Factor Model(order) Parameter Estimate S.e.

Residual Identity Sigma2 3.173 0.897

*** Wald tests for fixed effects ***

Fixed term Wald statistic d.f. Wald/d.f. Chi-sq prob

Spacing 39.60 4 9.90 <0.001

Estimated

Source Value

Block -0.5228

Error 3.6959

### Model for RCBD

Yield of soybean = Overall mean + Block effect + Spacing effect

+ Error

• Overall mean and Spacing effects are fixedeffects

• Block effect is a random term

• Error is a random term

### General Linear Mixed Model

Yield of soybean = Overall mean + Block effect + Spacing effect

+ Error

Y = Fixed effects + Random effects + Error term

Y = Xt + Zu + e

• The random effects can be correlated

• The error term can be correlated

• The random effects are uncorrelated with the error term

### General Linear Mixed Model

Y = Fixed effects + Random effects + Error term

Y = Xt + Zu + e

is a scaling factor, often set to 1

• REML is used as the default to estimate to variance and covariance parameters of the model

• The algorithm does not depend on the data being balanced

Furthermore, different choices for the variance matrices allow for :

• an appropriate repeated measures analysis for normal data

• an appropriate spatial analysis for field trials

Nested models can be compared using the change in deviance which is approximately c2 with df = change in number of parameters

For students:

For markers:

### Example 6. Use of devianceWidths (in mm) of the dorsal shield of larvae of ticks found on 4 rabbits

Minitab’s analysis

Rabbit 3602.6 5.26 0.004

Error33114.5

Estimated

SourceTermSource Value

1Rabbit(2) + 9.0090 (1)Rabbit 54.18

2 Error(2) Error 114.48

Rabbit Mean

1 372.3

2 354.4

3 355.3

4 361.3these are sample means

GenStat’s Linear Mixed Models analysis

Random term Component S.e.

Rabbit 55.0 55.8

*** Residual variance model ***

Term Model(order) Parameter Estimate S.e.

Residual Identity Sigma2 114.4 28.2

Table of predicted means for Rabbit (these are BLUPs)

Rabbit 1 2 3 4

369.9 355.5 356.0 361.2

Standard error of differences: Average 4.613

Maximum 5.055

Minimum 4.133

Average variance of differences: 21.38

Deviance d.f.

215.22 34

Test H0:

Method: drop Rabbit as a random term

Deviance d.f.

221.21 35 for reduced model

215.22 34

Change in deviance = 6.0 with 1 df

P-value = 0.014

The variation in the widths of the dorsal shield of larvae of ticks found among rabbits differs significantly across rabbits (P = 0.014)

The variance among rabbits is estimated to be 55.0 ( 55.7) compared to the variance within rabbits, namely 114.4 ( 28.2)

### Example 7 - Repeated Measures

Growth of a fungus (in cm) over time

### Growth of fungus

(assumes equi-correlation structure among times)

Source of variation d.f. m.s. v.r. F pr.

Rep.Fungus stratum

Fungus 2 8.104 97.30 <.001

Residual 9 0.083 3.37

Rep.Fungus.Time stratum

Time 5 55.231 2233.21 <.001

Fungus.Time 10 0.933 37.71 <.001

Residual 45 0.025

Estimated stratum variances

Stratum variance d.f. variance component

Rep.Fungus 0.0833 9 0.0098

Rep.Fungus.Time 0.0247 45 0.0247

(tests equi-correlation structure among times)

Source of variation d.f. m.s. v.r. F pr.

Rep.Fungus stratum

Fungus 2 8.104 97.30 <.001

Residual 9 0.083 3.37

Rep.Fungus.Time stratum

Time 5 55.231 2233.21 <.001

Fungus.Time 10 0.933 37.71 <.001

Residual 45 0.025

(d.f. are multiplied by the correction factors before calculating F probabilities)

Box's tests for symmetry of the covariance matrix:

Chi-square 57.47 on 19 df: probability 0.000

F-test 2.93 on 19, 859 df: probability 0.000

Greenhouse-Geisser epsilon 0.3206

Split plot via REML – ignoring changing variances

Fixed model : Constant+Fungus+Time+Fungus.Time

Random model : Rep.Fungus+Rep.Fungus.Time

Estimated Variance Components

Random term Component S.e.

Rep.Fungus 0.00976 0.00660

Residual variance model

Term Model(order) Parameter Estimate S.e.

Rep.Fungus.Time Identity Sigma2 0.0247 0.0052

Deviance d.f.

-109.90 52

Fixed term Wald statistic d.f. Wald/d.f. Chi-sq prob

Fungus 194.60 2 97.30 <0.001

Time 11166.05 5 2233.21 <0.001

Fungus.Time 377.08 10 37.71 <0.001

Split plot via REML – accounting for changing variances (a)

Fixed model : Constant+Fungus+Time+Fungus.Time

Random model : Rep.Fungus+Rep.Fungus.Time

Estimated Variance Components

Random term Component S.e.

Rep.Fungus 0.01053 0.00539

Residual variance model

Term Model(order) Parameter Estimate S.e.

Rep.Fungus.Time Identity Sigma2 0.0082 .00428

Rep Identity - - -

Fungus Identity - - -

Time Diagonal d_1 1.000 FIXED

d_2 1.102 0.815

d_3 0.227 0.215

d_4 0.262 0.253

d_5 1.965 1.443

d_6 13.550 9.580

Split plot via REML – accounting for changing variances (b)

Fixed model : Constant+Fungus+Time+Fungus.Time

Random model : Rep.Fungus+Rep.Fungus.Time

Estimated Variance Components

Random term Component S.e.

Rep.Fungus 0.01053 0.00539

Residual variance model

Term Model(order) Parameter Estimate S.e.

Rep.Fungus.Time Identity Sigma2 1.000 FIXED

Rep Identity - - -

Fungus Identity - - -

Time Diagonal d_1 0.0082 0.0043

d_2 0.0091 0.0047

d_3 0.0019 0.0015

d_4 0.0022 0.0017

d_5 0.0162 0.0081

d_6 0.1116 0.0530

Split plot via REML – accounting for changing variances

and an AR(1) correlation structure

Fixed model : Constant+Fungus+Time+Fungus.Time

Random model : Rep.Fungus+Rep.Fungus.Time

Estimated Variance Components

Random term Component S.e.

Rep.Fungus 0.010616 0.005572

Residual variance model

Term Model(order) Parameter Estimate S.e.

Rep.Fungus.Time Identity Sigma2 0.0085 0.0045

Rep Identity - - -

Fungus Identity - - -

Time AR(1) hetphi_10.148 0.209

d_1 1.000 FIXED

d_2 1.202 0.895

d_3 0.260 0.249

d_4 0.264 0.266

d_5 1.829 1.347

d_6 13.560 9.620

Split plot via REML – accounting for changing variances

and an AR(1) correlation structure

Deviance d.f. Change d.f.

Same variance, uncorrelated -109.90 52

Different variances over time -151.03 47 41.13 5

+ AR(1) correlation structure -151.59 46 0.56 1

### Example 8 – Spatial analysis

RCBD (fixed) fertilisers Potato yields (t/ha)

Source of variation d.f. m.s. v.r. F pr.

Block stratum 3 2.929 1.43

Block.Treatment stratum

Treatment 5 92.359 45.07 <.001

Residual 15 2.049

Treatment A B C D E F

17.55 25.60 27.67 25.94 30.51 30.63

*** Standard errors of differences of means ***

Table Treatment

rep. 4

d.f. 15

s.e.d. 1.012

### Contour plot of residuals

REML

Random term Component S.e.

Block 0.395 0.500

Residual variance model

Term Factor Model Parameter Estimate S.e.

Y.X Sigma2 2.849 1.739

Y AR(1) phi_10.7054 0.2078

X AR(1) phi_1 -0.2508 0.3397

Deviance d.f.

36.54 14

Treatment A B C D E F

17.74 26.29 26.79 26.34 30.41 29.37

Standard error of differences: Average 0.7749

Maximum 0.8942

Minimum 0.6465

Average variance of differences: 0.6050