Revealing the riddle of reml
This presentation is the property of its rightful owner.
Sponsored Links
1 / 46

Revealing the riddle of REML PowerPoint PPT Presentation


  • 36 Views
  • Uploaded on
  • Presentation posted in: General

Revealing the riddle of REML. Mick O’Neill Faculty of Agriculture, Food & Natural Resources, University of Sydney. Background. Biometry 1 and 2 are core units with an applied stats focus. Many students have only Maths in Society on entry to the Faculty Biometry 3 is a Third Year elective

Download Presentation

Revealing the riddle of REML

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Revealing the riddle of reml

Revealing the riddle of REML

Mick O’Neill

Faculty of Agriculture, Food & Natural Resources, University of Sydney


Background

Background

  • Biometry 1 and 2 are core units with an applied stats focus. Many students have only Maths in Society on entry to the Faculty

  • Biometry 3 is a Third Year elective

  • Biometry 4 is (still) a possible major

  • All students are now expected to design and analyse their fourth year experiments with little or no help from the Biometry Unit


Third year biometry students can

Third year Biometry students can:

  • Design and analyse multi-strata factorial experiments (split-plots, strip-plots)

  • Perform binomial & ordinal logistic regression, Poisson regression, …

  • Analyse repeated measures data using REML


Step 1 what is m aximum l ikelihood

Step 1. What is Maximum Likelihood?

  • The likelihood is the prior probability of obtaining the actual data in your sample

  • This requires you to assume that the data follow some distribution, typically:

  • Binomial or Poisson for count data

  • Normal or LogNormal for continuous data


Step 1 what is m aximum l ikelihood1

Step 1. What is Maximum Likelihood?

The likelihood is the prior probability of obtaining the actual data in your sample

Each of these distributions involves at least one unknown parameter (probability, mean, standard deviation, …) which must be estimated from the data.


Step 1 what is m aximum l ikelihood2

Step 1. What is Maximum Likelihood?

The likelihood is the prior probability of obtaining the actual data in your sample

Estimation is achieved by finding that parameter valuewhich maximises the likelihood (or equivalently the log-likelihood)


Example 1 binomial data

Example 1. Binomial data

  • Guess p = P(seed germinates)

  • Evaluate LogL

  • Maximise LogL by varying p


Example 2 normal data

Example 2. Normal data

  • Guess m and s

  • Evaluate LogL

  • Maximise LogL by varying m and s


Step 2 what is reml

Step 2. What is REML?

It is possible to partition the likelihood into two terms:

  • a likelihood that involves m (as well as s2)

  • and

  • a residual likelihood that involves only s2


Step 2 what is reml1

Step 2. What is REML?

It is possible to partition the likelihood into two terms, in such a way that:

  • the first likelihood can be maximised to estimate m (and its solution does not depend on the value of s2)

  • the residual likelihood can be maximised to estimates2 REML estimate


Revealing the riddle of reml

=

How?


Solutions

Solutions

  • ML estimate of variance is

  • REML estimate is

  • In each case the estimate of m isthe sample mean


Revealing the riddle of reml

sn-1

sn

ML estimate

REML estimate


Extensions

=

Extensions


Example 3 one way no blocking fixed effects

Example 3. One-way (no blocking)Fixed effects


Anova vs reml

ANOVA vs REML

ANOVA:

Source of variation d.f. s.s. m.s. v.r. F pr.

Chick stratum

Diet 3 0.01847 0.00616 0.10 0.958

Residual 12 0.73230 0.06103

Total 15 0.75078

REML Variance Components Analysis:

Fixed model : Constant+Diet

Random model : Chick

Chick used as residual term

*** Residual variance model ***

Term Factor Model(order) Parameter Estimate S.e.

Chick Identity Sigma20.0610 0.02491

*** Wald tests for fixed effects ***

Fixed term Wald statistic d.f. Wald/d.f. Chi-sq prob

Diet 0.30 3 0.100.960


Revealing the riddle of reml

ANOVA:

  • ***** Tables of means *****

    Grand mean 3.129

    Diet Diet_1 Diet_2 Diet_3 Diet_4

    3.107 3.188 3.112 3.107

    *** Standard errors of differences of means ***

    Table Diet

    rep. 4

    d.f. 12

    s.e.d. 0.1747

    REML Variance Components Analysis:

    *** Table of predicted means for Diet ***

    Diet Diet_1 Diet_2 Diet_3 Diet_4

    3.107 3.187 3.112 3.107

    Standard error of differences: 0.1747


Example 4a one way in randomised blocks fixed treatments

Example 4a. One-way (in randomised blocks) – fixed treatments

ANOVA:

Source of variation d.f. s.s. m.s. v.r. F pr.

Block stratum 5 5.410 1.082 0.29

Block.*Units* stratum

Spacing 4 125.661 31.415 8.50 <.001

Residual 20 73.919 3.696

REML Variance Components Analysis

(a) With Block + Spacing both fixed effects:

Term Factor Model(order) Parameter Estimate S.e.

Residual Identity Sigma2 3.696 1.169

Fixed term Wald statistic d.f. Wald/d.f. Chi-sq prob

Block 1.46 5 0.29 0.917

Spacing 34.00 4 8.50 <0.001


Random blocks fixed treatments

Random blocks, fixed treatments

ANOVA:

Source of variation d.f. s.s. m.s. v.r. F pr.

Block stratum 5 5.410 1.082 0.29

Block.*Units* stratum

Spacing 4 125.661 31.415 8.50 <.001

Residual 20 73.919 3.696

REML Variance Components Analysis

(b) With Spacing fixed and Block random:

*** Estimated Variance Components ***

Random term Component S.e.

Block 0.000 BOUND

*** Residual variance model ***

Term Factor Model(order) Parameter Estimate S.e.

Residual Identity Sigma2 3.173 0.897

*** Wald tests for fixed effects ***

Fixed term Wald statistic d.f. Wald/d.f. Chi-sq prob

Spacing 39.60 4 9.90 <0.001

Estimated

Source Value

Block -0.5228

Error 3.6959


Model for rcbd

Model for RCBD

Yield of soybean = Overall mean + Block effect + Spacing effect

+ Error

  • Overall mean and Spacing effects are fixedeffects

  • Block effect is a random term

  • Error is a random term


General linear mixed model

General Linear Mixed Model

Yield of soybean = Overall mean + Block effect + Spacing effect

+ Error

Y = Fixed effects + Random effects + Error term

Y = Xt + Zu + e

  • The random effects can be correlated

  • The error term can be correlated

  • The random effects are uncorrelated with the error term


General linear mixed model1

General Linear Mixed Model

Y = Fixed effects + Random effects + Error term

Y = Xt + Zu + e

is a scaling factor, often set to 1


Revealing the riddle of reml

  • REML is used as the default to estimate to variance and covariance parameters of the model

  • The algorithm does not depend on the data being balanced


Revealing the riddle of reml

Furthermore, different choices for the variance matrices allow for :

  • an appropriate repeated measures analysis for normal data

  • an appropriate spatial analysis for field trials

Nested models can be compared using the change in deviance which is approximately c2 with df = change in number of parameters


Example 5 adjusting thesis marks for random markers

Example 5. Adjusting thesis marks for random markers


Revealing the riddle of reml

For students:

For markers:


Example 6 use of deviance widths in m m of the dorsal shield of larvae of ticks found on 4 rabbits

Example 6. Use of devianceWidths (in mm) of the dorsal shield of larvae of ticks found on 4 rabbits


Revealing the riddle of reml

Minitab’s analysis

SourceDFAdj MS F P

Rabbit 3602.6 5.26 0.004

Error33114.5

Estimated

SourceTermSource Value

1Rabbit(2) + 9.0090 (1)Rabbit 54.18

2 Error(2) Error 114.48

Rabbit Mean

1 372.3

2 354.4

3 355.3

4 361.3these are sample means


Revealing the riddle of reml

GenStat’s Linear Mixed Models analysis

Random term Component S.e.

Rabbit 55.0 55.8

*** Residual variance model ***

Term Model(order) Parameter Estimate S.e.

Residual Identity Sigma2 114.4 28.2

Table of predicted means for Rabbit (these are BLUPs)

Rabbit 1 2 3 4

369.9 355.5 356.0 361.2

Standard error of differences: Average 4.613

Maximum 5.055

Minimum 4.133

Average variance of differences: 21.38

Deviance d.f.

215.22 34


Revealing the riddle of reml

Test H0:

Method: drop Rabbit as a random term

Deviance d.f.

221.21 35 for reduced model

215.22 34

Change in deviance = 6.0 with 1 df

P-value = 0.014

The variation in the widths of the dorsal shield of larvae of ticks found among rabbits differs significantly across rabbits (P = 0.014)

The variance among rabbits is estimated to be 55.0 ( 55.7) compared to the variance within rabbits, namely 114.4 ( 28.2)


Example 7 repeated measures

Example 7 - Repeated Measures

Growth of a fungus (in cm) over time


Growth of fungus

Growth of fungus


Revealing the riddle of reml

Split plot without Greenhouse-Geisser adjustmment

(assumes equi-correlation structure among times)

Source of variation d.f. m.s. v.r. F pr.

Rep.Fungus stratum

Fungus 2 8.104 97.30 <.001

Residual 9 0.083 3.37

Rep.Fungus.Time stratum

Time 5 55.231 2233.21 <.001

Fungus.Time 10 0.933 37.71 <.001

Residual 45 0.025

Estimated stratum variances

Stratum variance d.f. variance component

Rep.Fungus 0.0833 9 0.0098

Rep.Fungus.Time 0.0247 45 0.0247


Revealing the riddle of reml

Split plot with Greenhouse-Geisser adjustmment

(tests equi-correlation structure among times)

Source of variation d.f. m.s. v.r. F pr.

Rep.Fungus stratum

Fungus 2 8.104 97.30 <.001

Residual 9 0.083 3.37

Rep.Fungus.Time stratum

Time 5 55.231 2233.21 <.001

Fungus.Time 10 0.933 37.71 <.001

Residual 45 0.025

(d.f. are multiplied by the correction factors before calculating F probabilities)

Box's tests for symmetry of the covariance matrix:

Chi-square 57.47 on 19 df: probability 0.000

F-test 2.93 on 19, 859 df: probability 0.000

Greenhouse-Geisser epsilon 0.3206


Revealing the riddle of reml

Split plot via REML – ignoring changing variances

Fixed model : Constant+Fungus+Time+Fungus.Time

Random model : Rep.Fungus+Rep.Fungus.Time

Estimated Variance Components

Random term Component S.e.

Rep.Fungus 0.00976 0.00660

Residual variance model

Term Model(order) Parameter Estimate S.e.

Rep.Fungus.Time Identity Sigma2 0.0247 0.0052

Deviance d.f.

-109.90 52

Fixed term Wald statistic d.f. Wald/d.f. Chi-sq prob

Fungus 194.60 2 97.30 <0.001

Time 11166.05 5 2233.21 <0.001

Fungus.Time 377.08 10 37.71 <0.001


Revealing the riddle of reml

Split plot via REML – accounting for changing variances (a)

Fixed model : Constant+Fungus+Time+Fungus.Time

Random model : Rep.Fungus+Rep.Fungus.Time

Estimated Variance Components

Random term Component S.e.

Rep.Fungus 0.01053 0.00539

Residual variance model

Term Model(order) Parameter Estimate S.e.

Rep.Fungus.Time Identity Sigma2 0.0082 .00428

Rep Identity - - -

Fungus Identity - - -

Time Diagonal d_1 1.000 FIXED

d_2 1.102 0.815

d_3 0.227 0.215

d_4 0.262 0.253

d_5 1.965 1.443

d_6 13.550 9.580


Revealing the riddle of reml

Split plot via REML – accounting for changing variances (b)

Fixed model : Constant+Fungus+Time+Fungus.Time

Random model : Rep.Fungus+Rep.Fungus.Time

Estimated Variance Components

Random term Component S.e.

Rep.Fungus 0.01053 0.00539

Residual variance model

Term Model(order) Parameter Estimate S.e.

Rep.Fungus.Time Identity Sigma2 1.000 FIXED

Rep Identity - - -

Fungus Identity - - -

Time Diagonal d_1 0.0082 0.0043

d_2 0.0091 0.0047

d_3 0.0019 0.0015

d_4 0.0022 0.0017

d_5 0.0162 0.0081

d_6 0.1116 0.0530


Revealing the riddle of reml

Split plot via REML – accounting for changing variances

and an AR(1) correlation structure

Fixed model : Constant+Fungus+Time+Fungus.Time

Random model : Rep.Fungus+Rep.Fungus.Time

Estimated Variance Components

Random term Component S.e.

Rep.Fungus 0.010616 0.005572

Residual variance model

Term Model(order) Parameter Estimate S.e.

Rep.Fungus.Time Identity Sigma2 0.0085 0.0045

Rep Identity - - -

Fungus Identity - - -

Time AR(1) hetphi_10.148 0.209

d_1 1.000 FIXED

d_2 1.202 0.895

d_3 0.260 0.249

d_4 0.264 0.266

d_5 1.829 1.347

d_6 13.560 9.620


Revealing the riddle of reml

Split plot via REML – accounting for changing variances

and an AR(1) correlation structure

Deviance d.f. Change d.f.

Same variance, uncorrelated -109.90 52

Different variances over time -151.03 47 41.13 5

+ AR(1) correlation structure -151.59 46 0.56 1


Example 8 spatial analysis

Example 8 – Spatial analysis

RCBD (fixed) fertilisers Potato yields (t/ha)


Revealing the riddle of reml

Source of variation d.f. m.s. v.r. F pr.

Block stratum 3 2.929 1.43

Block.Treatment stratum

Treatment 5 92.359 45.07 <.001

Residual 15 2.049

Treatment A B C D E F

17.55 25.60 27.67 25.94 30.51 30.63

*** Standard errors of differences of means ***

Table Treatment

rep. 4

d.f. 15

s.e.d. 1.012


Contour plot of residuals

Contour plot of residuals


Revealing the riddle of reml

REML

Random term Component S.e.

Block 0.395 0.500

Residual variance model

Term Factor Model Parameter Estimate S.e.

Y.X Sigma2 2.849 1.739

Y AR(1) phi_10.7054 0.2078

X AR(1) phi_1 -0.2508 0.3397

Deviance d.f.

36.54 14

Treatment A B C D E F

17.74 26.29 26.79 26.34 30.41 29.37

Standard error of differences: Average 0.7749

Maximum 0.8942

Minimum 0.6465

Average variance of differences: 0.6050


  • Login