1 / 46

# Revealing the riddle of REML - PowerPoint PPT Presentation

Revealing the riddle of REML. Mick O’Neill Faculty of Agriculture, Food & Natural Resources, University of Sydney. Background. Biometry 1 and 2 are core units with an applied stats focus. Many students have only Maths in Society on entry to the Faculty Biometry 3 is a Third Year elective

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

## PowerPoint Slideshow about 'Revealing the riddle of REML' - shae

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### Revealing the riddle of REML

Mick O’Neill

Faculty of Agriculture, Food & Natural Resources, University of Sydney

• Biometry 1 and 2 are core units with an applied stats focus. Many students have only Maths in Society on entry to the Faculty

• Biometry 3 is a Third Year elective

• Biometry 4 is (still) a possible major

• All students are now expected to design and analyse their fourth year experiments with little or no help from the Biometry Unit

• Design and analyse multi-strata factorial experiments (split-plots, strip-plots)

• Perform binomial & ordinal logistic regression, Poisson regression, …

• Analyse repeated measures data using REML

Step 1. What is Maximum Likelihood?

• The likelihood is the prior probability of obtaining the actual data in your sample

• This requires you to assume that the data follow some distribution, typically:

• Binomial or Poisson for count data

• Normal or LogNormal for continuous data

Step 1. What is Maximum Likelihood?

The likelihood is the prior probability of obtaining the actual data in your sample

Each of these distributions involves at least one unknown parameter (probability, mean, standard deviation, …) which must be estimated from the data.

Step 1. What is Maximum Likelihood?

The likelihood is the prior probability of obtaining the actual data in your sample

Estimation is achieved by finding that parameter valuewhich maximises the likelihood (or equivalently the log-likelihood)

• Guess p = P(seed germinates)

• Evaluate LogL

• Maximise LogL by varying p

• Guess m and s

• Evaluate LogL

• Maximise LogL by varying m and s

It is possible to partition the likelihood into two terms:

• a likelihood that involves m (as well as s2)

• and

• a residual likelihood that involves only s2

It is possible to partition the likelihood into two terms, in such a way that:

• the first likelihood can be maximised to estimate m (and its solution does not depend on the value of s2)

• the residual likelihood can be maximised to estimates2 REML estimate

How?

• ML estimate of variance is

• REML estimate is

• In each case the estimate of m isthe sample mean

sn-1

sn

ML estimate

REML estimate

Extensions

ANOVA:

Source of variation d.f. s.s. m.s. v.r. F pr.

Chick stratum

Diet 3 0.01847 0.00616 0.10 0.958

Residual 12 0.73230 0.06103

Total 15 0.75078

REML Variance Components Analysis:

Fixed model : Constant+Diet

Random model : Chick

Chick used as residual term

*** Residual variance model ***

Term Factor Model(order) Parameter Estimate S.e.

Chick Identity Sigma20.0610 0.02491

*** Wald tests for fixed effects ***

Fixed term Wald statistic d.f. Wald/d.f. Chi-sq prob

Diet 0.30 3 0.100.960

• ***** Tables of means *****

Grand mean 3.129

Diet Diet_1 Diet_2 Diet_3 Diet_4

3.107 3.188 3.112 3.107

*** Standard errors of differences of means ***

Table Diet

rep. 4

d.f. 12

s.e.d. 0.1747

REML Variance Components Analysis:

*** Table of predicted means for Diet ***

Diet Diet_1 Diet_2 Diet_3 Diet_4

3.107 3.187 3.112 3.107

Standard error of differences: 0.1747

ANOVA:

Source of variation d.f. s.s. m.s. v.r. F pr.

Block stratum 5 5.410 1.082 0.29

Block.*Units* stratum

Spacing 4 125.661 31.415 8.50 <.001

Residual 20 73.919 3.696

REML Variance Components Analysis

(a) With Block + Spacing both fixed effects:

Term Factor Model(order) Parameter Estimate S.e.

Residual Identity Sigma2 3.696 1.169

Fixed term Wald statistic d.f. Wald/d.f. Chi-sq prob

Block 1.46 5 0.29 0.917

Spacing 34.00 4 8.50 <0.001

Random blocks, fixed treatments treatments

ANOVA:

Source of variation d.f. s.s. m.s. v.r. F pr.

Block stratum 5 5.410 1.082 0.29

Block.*Units* stratum

Spacing 4 125.661 31.415 8.50 <.001

Residual 20 73.919 3.696

REML Variance Components Analysis

(b) With Spacing fixed and Block random:

*** Estimated Variance Components ***

Random term Component S.e.

Block 0.000 BOUND

*** Residual variance model ***

Term Factor Model(order) Parameter Estimate S.e.

Residual Identity Sigma2 3.173 0.897

*** Wald tests for fixed effects ***

Fixed term Wald statistic d.f. Wald/d.f. Chi-sq prob

Spacing 39.60 4 9.90 <0.001

Estimated

Source Value

Block -0.5228

Error 3.6959

Model for RCBD treatments

Yield of soybean = Overall mean + Block effect + Spacing effect

+ Error

• Overall mean and Spacing effects are fixedeffects

• Block effect is a random term

• Error is a random term

General Linear Mixed Model treatments

Yield of soybean = Overall mean + Block effect + Spacing effect

+ Error

Y = Fixed effects + Random effects + Error term

Y = Xt + Zu + e

• The random effects can be correlated

• The error term can be correlated

• The random effects are uncorrelated with the error term

General Linear Mixed Model treatments

Y = Fixed effects + Random effects + Error term

Y = Xt + Zu + e

is a scaling factor, often set to 1

• an appropriate repeated measures analysis for normal data

• an appropriate spatial analysis for field trials

Nested models can be compared using the change in deviance which is approximately c2 with df = change in number of parameters

For students: allow for :

For markers:

Example 6. Use of deviance allow for :Widths (in mm) of the dorsal shield of larvae of ticks found on 4 rabbits

Minitab’s analysis allow for :

Source DF Adj MS F P

Rabbit 3 602.6 5.26 0.004

Error 33 114.5

Estimated

Source Term Source Value

1 Rabbit (2) + 9.0090 (1) Rabbit 54.18

2 Error (2) Error 114.48

Rabbit Mean

1 372.3

2 354.4

3 355.3

4 361.3 these are sample means

GenStat’s Linear Mixed Models analysis allow for :

Random term Component S.e.

Rabbit 55.0 55.8

*** Residual variance model ***

Term Model(order) Parameter Estimate S.e.

Residual Identity Sigma2 114.4 28.2

Table of predicted means for Rabbit (these are BLUPs)

Rabbit 1 2 3 4

369.9 355.5 356.0 361.2

Standard error of differences: Average 4.613

Maximum 5.055

Minimum 4.133

Average variance of differences: 21.38

Deviance d.f.

215.22 34

Test H allow for :0:

Method: drop Rabbit as a random term

Deviance d.f.

221.21 35 for reduced model

215.22 34

Change in deviance = 6.0 with 1 df

P-value = 0.014

The variation in the widths of the dorsal shield of larvae of ticks found among rabbits differs significantly across rabbits (P = 0.014)

The variance among rabbits is estimated to be 55.0 ( 55.7) compared to the variance within rabbits, namely 114.4 ( 28.2)

Example 7 - Repeated Measures allow for :

Growth of a fungus (in cm) over time

Growth of fungus allow for :

Split plot allow for :without Greenhouse-Geisser adjustmment

(assumes equi-correlation structure among times)

Source of variation d.f. m.s. v.r. F pr.

Rep.Fungus stratum

Fungus 2 8.104 97.30 <.001

Residual 9 0.083 3.37

Rep.Fungus.Time stratum

Time 5 55.231 2233.21 <.001

Fungus.Time 10 0.933 37.71 <.001

Residual 45 0.025

Estimated stratum variances

Stratum variance d.f. variance component

Rep.Fungus 0.0833 9 0.0098

Rep.Fungus.Time 0.0247 45 0.0247

Split plot allow for :with Greenhouse-Geisser adjustmment

(tests equi-correlation structure among times)

Source of variation d.f. m.s. v.r. F pr.

Rep.Fungus stratum

Fungus 2 8.104 97.30 <.001

Residual 9 0.083 3.37

Rep.Fungus.Time stratum

Time 5 55.231 2233.21 <.001

Fungus.Time 10 0.933 37.71 <.001

Residual 45 0.025

(d.f. are multiplied by the correction factors before calculating F probabilities)

Box's tests for symmetry of the covariance matrix:

Chi-square 57.47 on 19 df: probability 0.000

F-test 2.93 on 19, 859 df: probability 0.000

Greenhouse-Geisser epsilon 0.3206

Fixed model : Constant+Fungus+Time+Fungus.Time

Random model : Rep.Fungus+Rep.Fungus.Time

Estimated Variance Components

Random term Component S.e.

Rep.Fungus 0.00976 0.00660

Residual variance model

Term Model(order) Parameter Estimate S.e.

Rep.Fungus.Time Identity Sigma2 0.0247 0.0052

Deviance d.f.

-109.90 52

Fixed term Wald statistic d.f. Wald/d.f. Chi-sq prob

Fungus 194.60 2 97.30 <0.001

Time 11166.05 5 2233.21 <0.001

Fungus.Time 377.08 10 37.71 <0.001

Fixed model : Constant+Fungus+Time+Fungus.Time

Random model : Rep.Fungus+Rep.Fungus.Time

Estimated Variance Components

Random term Component S.e.

Rep.Fungus 0.01053 0.00539

Residual variance model

Term Model(order) Parameter Estimate S.e.

Rep.Fungus.Time Identity Sigma2 0.0082 .00428

Rep Identity - - -

Fungus Identity - - -

Time Diagonal d_1 1.000 FIXED

d_2 1.102 0.815

d_3 0.227 0.215

d_4 0.262 0.253

d_5 1.965 1.443

d_6 13.550 9.580

Fixed model : Constant+Fungus+Time+Fungus.Time

Random model : Rep.Fungus+Rep.Fungus.Time

Estimated Variance Components

Random term Component S.e.

Rep.Fungus 0.01053 0.00539

Residual variance model

Term Model(order) Parameter Estimate S.e.

Rep.Fungus.Time Identity Sigma2 1.000 FIXED

Rep Identity - - -

Fungus Identity - - -

Time Diagonal d_1 0.0082 0.0043

d_2 0.0091 0.0047

d_3 0.0019 0.0015

d_4 0.0022 0.0017

d_5 0.0162 0.0081

d_6 0.1116 0.0530

and an AR(1) correlation structure

Fixed model : Constant+Fungus+Time+Fungus.Time

Random model : Rep.Fungus+Rep.Fungus.Time

Estimated Variance Components

Random term Component S.e.

Rep.Fungus 0.010616 0.005572

Residual variance model

Term Model(order) Parameter Estimate S.e.

Rep.Fungus.Time Identity Sigma2 0.0085 0.0045

Rep Identity - - -

Fungus Identity - - -

Time AR(1) hetphi_10.148 0.209

d_1 1.000 FIXED

d_2 1.202 0.895

d_3 0.260 0.249

d_4 0.264 0.266

d_5 1.829 1.347

d_6 13.560 9.620

and an AR(1) correlation structure

Deviance d.f. Change d.f.

Same variance, uncorrelated -109.90 52

Different variances over time -151.03 47 41.13 5

+ AR(1) correlation structure -151.59 46 0.56 1

RCBD (fixed) fertilisers Potato yields (t/ha)

Block stratum 3 2.929 1.43

Block.Treatment stratum

Treatment 5 92.359 45.07 <.001

Residual 15 2.049

Treatment A B C D E F

17.55 25.60 27.67 25.94 30.51 30.63

*** Standard errors of differences of means ***

Table Treatment

rep. 4

d.f. 15

s.e.d. 1.012

REML (b)

Random term Component S.e.

Block 0.395 0.500

Residual variance model

Term Factor Model Parameter Estimate S.e.

Y.X Sigma2 2.849 1.739

Y AR(1) phi_10.7054 0.2078

X AR(1) phi_1 -0.2508 0.3397

Deviance d.f.

36.54 14

Treatment A B C D E F

17.74 26.29 26.79 26.34 30.41 29.37

Standard error of differences: Average 0.7749

Maximum 0.8942

Minimum 0.6465

Average variance of differences: 0.6050