1. Bare-bones pragmatic highlights of Baayen, Davidson, & Bates (2008) by Scottevil
For MLM reading group
2/12/09
2. Outline The basic arguments for mixed-effects models (continuous dependent variables, with random effects)
What you get from a model
How to compare models with different random effects
How to get p-values
How to do it all in R
Remaining issues/questions:
Contrast tests?
Comparing models with different fixed effects?
Multi-level/hierarchical versions
3. The pros of mixed-effects More powerful (i.e., less likely to make Type II errors)
Very robust against missing data
No need for averaging, etc.
Can correct for heteroscedasticity in a more principled way
by explicitly modeling (e.g.) subject effects on slope, as well as intercept
Can explicitly test which random effects should be included in the model
Likelihood ratio tests
with some limits (we'll get to that)
4. Modeling error terms Three basic options:
Random effect on the intercept
Random effect on the slope
Nesting of random effects
Combinations of those
A little terminology:
Nesting: multi-level or hierarchical models
Non-nesting: crossed random factors
5. Comparing models Likelihood ratio test: compares twice the difference in log-likelihood between two models against a chi-square distribution.
Explicit test to determine which error terms should be in a model
RESTRICTIONS:
Not for testing different models of fixed effects (only appropriate for determining which error terms to use)
Can only compare nested models, where one model contains a proper subset of the other model's error terms
Only valid for comparing models of same set of data
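As a concrete sketch of what the likelihood ratio test computes, here is the arithmetic done by hand (hypothetical data; ordinary lm() fits stand in for lmer() fits, but the test statistic is formed the same way):

```r
# Likelihood ratio test between two nested models, by hand.
# Hypothetical data: y depends on x; g is a grouping factor.
set.seed(1)
dat <- data.frame(x = rnorm(40), g = gl(4, 10))
dat$y <- 2 * dat$x + rnorm(40)

m0 <- lm(y ~ x, data = dat)        # simpler model
m1 <- lm(y ~ x + g, data = dat)    # model with extra terms (nests m0)

ll0 <- logLik(m0)
ll1 <- logLik(m1)
lrt  <- as.numeric(2 * (ll1 - ll0))              # test statistic
df   <- attr(ll1, "df") - attr(ll0, "df")        # difference in parameter count
pval <- pchisq(lrt, df = df, lower.tail = FALSE) # chi-square p-value
```

This is the same comparison you get from anova(m0, m1), and (for random-effects terms) from anova() on two fitted lmer models.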
6. Getting p-values with mixed-effects Fitted model gives t-value, but picking degrees of freedom is very unclear.
In experiments with large numbers of observations (hundreds), picking Df is less of a problem, can more or less assume that t > 2 is significant.
Alternative: using Markov chain Monte Carlo (MCMC) sampling to estimate a p-value.
BDB use data from Raaijmakers et al. (1999) and simulations to show that p-values estimated from MCMC method match up very well (in terms of Type I & II errors) with alternatives such as quasi-F, with fewer drawbacks.
7. Examples in R Packages needed:
languageR (by Baayen) requires:
zipfR
coda
lme4 (by Bates) requires:
Matrix
lattice
How to install these from a basic installation of R:
install.packages(c("Matrix", "lattice", "lme4", "coda", "zipfR", "languageR"), repos = "http://cran.r-project.org")
After the packages are installed, you need to load them:
library(languageR)
8. Getting data into R Make an ASCII table of your data, with a header row (e.g., from Excel, save as a tab-delimited .txt file)
Each row is an observation, with columns for all the various fixed & random factors.
No need for averaging by-subject or by-item.
No need for data replacement.
NB: R does NOT like empty cells, however. Fill missing cells with NA (R's missing-data code), or with some other code that you can use to get a subset of the data out in R that does not include those rows.
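A quick base-R illustration of filtering out incomplete rows before fitting (the data frame and its values are hypothetical):

```r
# Hypothetical data table with a missing RT on one trial.
priming <- data.frame(
  Subj = c("s1", "s1", "s2", "s2"),
  SOA  = c("short", "long", "short", "long"),
  RT   = c(512, 547, NA, 601)
)

# Keep only rows with no missing values in any column.
complete <- priming[complete.cases(priming), ]
nrow(complete)  # one row dropped
```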
9. Getting data into R Use the read.table() function:
priming = read.table("baayen2008.txt", header = TRUE)
Type the name of your data set to have R show it to you.
A few other useful things:
nrow() : tells you how many rows there are in your data set
colnames() : tells you what the column names are
data[1:10, ] : displays just the first 10 rows of your data
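A self-contained sketch of these steps, using textConnection() in place of a file on disk (the file contents and column names are illustrative):

```r
# Simulate a tab-delimited file with a header row.
txt <- "Subj\tItem\tSOA\tRT
s1\ti1\tshort\t512
s1\ti2\tlong\t547
s2\ti1\tshort\t498"

priming <- read.table(textConnection(txt), header = TRUE, sep = "\t")

nrow(priming)      # number of observations
colnames(priming)  # "Subj" "Item" "SOA" "RT"
priming[1:2, ]     # just the first two rows
```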
10. Running lmer Form:
lmer.object = lmer(formula, data = name of data table)
lmer.object : the name of the output object
name of data table : the name of the data as you imported it
formula : where all the action is.
11. Formulas in R The basic form is: X ~ Y, where X is your dependent variable, and Y is your list of factors.
On the Y side:
+ is used to add a factor
| is used to specify error terms
: is used to specify interactions
* is used for a fully-crossed design. E.g. X*Y is the same as X + Y + X:Y
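The equivalence of * to the expanded form can be checked directly with terms(), which works on a formula symbolically (RT, SOA, and the hypothetical second predictor Freq need not exist as data):

```r
# Term labels R derives from the fully-crossed formula:
crossed  <- attr(terms(RT ~ SOA * Freq), "term.labels")
# ...and from the expanded form:
expanded <- attr(terms(RT ~ SOA + Freq + SOA:Freq), "term.labels")
# Both give "SOA" "Freq" "SOA:Freq".
```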
12. Looking at the data Type priming
Look at names of factors, etc.
NB: in R, columns are of basically three types
Characters
Numbers
Factors
In order to use a column as a factor in an analysis in R, it must be of factor type.
By default, read.table() makes character columns into factors.
To show this, type levels(priming$SOA).
$ notation very useful: the thing after the $ is the name of the column, so priming$SOA selects the column named SOA from the data table priming
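A small illustration with hypothetical columns (note that since R 4.0 the default is stringsAsFactors = FALSE, so the character-to-factor conversion the slide describes must now be requested explicitly):

```r
priming <- data.frame(SOA = c("short", "long", "short", "long"),
                      RT  = c(512, 547, 498, 601),
                      stringsAsFactors = TRUE)  # character -> factor

class(priming$SOA)   # "factor"
levels(priming$SOA)  # "long" "short" (alphabetical by default)
mean(priming$RT)     # $ selects the RT column
```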
13. Example lmer priming.lmer = lmer(RT ~ SOA + (1|Item) + (1+SOA|Subj), data = priming)
SOA is the fixed effect
(1|Item) specifies Item as a random effect on just the intercept (i.e., Item is expected to have an effect on overall how fast or slow a trial is responded to)
(1+SOA|Subj) specifies Subj as a random effect on both intercept and slope.
This is an example of modeling heteroscedasticity! (1+SOA|Subj) basically says that subjects vary not only in overall speed but also in how big the effect of SOA is
14. Examining the model summary(priming.lmer)
In random effects, high correlation (-1!) for the subject effects suggests that they are redundant (we'll test this later).
In fixed effects, t-value given. This can be helpful for a quick look at whether something is likely to be significant, but not directly interpretable with a p-value.
Estimate in fixed effects tells you the effect size & means.
15. Trying some other models Different options for random effects:
priming.lmer2 = lmer(RT ~ SOA + (1|Item) + (1|Subj), data = priming)
Comparing the models with likelihood ratio test:
anova(priming.lmer, priming.lmer2)
Results (p = 0.2288) tell us that the simpler model (priming.lmer2) fits no worse, so it should be preferred over the more complex one.
16. Lookin' for p in all the right places Testing for significance, using languageR's pvals.fnc()
Uses Markov chain Monte Carlo method
mcmc = pvals.fnc(priming.lmer2, nsim = 10000)
10,000 is the default number of simulations
Generates plots of posterior values. The less skewed these are, the better.
Type mcmc$fixed
Two p-values: pMCMC and Pr(>|t|)
First is p-value based on MCMC
Second is a two-tailed t-test, with Df at the upper bound (number of observations minus number of fixed-effects parameters), anticonservative for small samples!
The more observations in your data, the more similar these two values.
The more different these two values are, the more skewed the posterior distributions generated by the MCMC
NOTE: pvals.fnc() does not work with random slope effects (yet)!
Must resort to some other method, like using estimated Dfs for t-test
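The idea behind the pMCMC value is simple: it reflects the (two-sided) proportion of posterior samples that fall on the far side of zero. A base-R sketch with made-up samples standing in for real MCMC output:

```r
set.seed(1)
# Pretend these are 10,000 posterior samples of a fixed-effect
# coefficient (hypothetical values, not real pvals.fnc() output).
samples <- rnorm(10000, mean = 30, sd = 12)

# Two-sided MCMC-style p-value: twice the smaller tail proportion.
pMCMC <- 2 * min(mean(samples > 0), mean(samples < 0))
```

If the posterior mass sits well away from zero (as here), pMCMC comes out small; a posterior straddling zero yields a large pMCMC.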
17. Summary of procedure Get data in
Fit model using lmer
Check lmer and run some model comparisons (using anova function) to find the right array of random effects
Use pvals.fnc to evaluate fixed effects for significance, using MCMC method (and t-test with upper-bound Df)
18. Good things Can deal with Items & Subjects simultaneously
High power & low Type I error compared to alternatives
No need for averaging, data-replacement, etc.
Can help figure out which random effects should be in the model
Pretty quick & easy!
19. Remaining issues & questions Contrast tests?
I personally don't know how to do them (yet)
Quené & van den Bergh (2008) mention that it's complicated
Method for distinguishing models with different fixed effects?
Nested/hierarchical random factors?
20. Nested (multi-level) random effects How to do it in R:
RT ~ SOA + (1|Subj) + (1|Item:Subj)
Or, equivalently:
RT ~ SOA + (1|Subj/Item)
Means: an effect of Subj on the intercept, plus an effect of Item *within* Subj
(Compare the crossed version: RT ~ SOA + (1|Subj) + (1|Item))
Baayen et al. (2008) suggest that nesting items within subjects may not always be the best model.
It assumes that the effects of an item can be completely different for each subject. This may not be appropriate for items with properties that should have similar effects across subjects (e.g., length)
Other problem: likelihood ratio test not valid when models are not nested wrt random effects (Pinheiro & Bates 2000)
Can't use this method to distinguish between models with nesting vs. models without nesting (aka crossed random effects models)
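In a random-effects term, Item:Subj just groups observations by each Item-within-Subj cell; interaction() in base R shows the grouping factor this induces (hypothetical mini design):

```r
# Hypothetical design: 2 items crossed with 2 subjects.
Item <- factor(c("i1", "i1", "i2", "i2"))
Subj <- factor(c("s1", "s2", "s1", "s2"))

# The combined grouping factor behind an Item:Subj random-effects term:
cell <- interaction(Item, Subj)
nlevels(cell)  # one level per Item-Subj combination
```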
21. The end
(?!?) Another example follows if there is time/interest
22. Another example splitplot is a data set in languageR that has a traditional counterbalanced-type design (items are in both conditions, but only across subjects, in two counterbalanced lists)
lmer can be run with no additional need for specification of this design.
23. Working through example 2 Type splitplot to make sure you have it loaded. If not, make sure you have languageR loaded (with library(languageR))
Make a new data file called dat that is identical to splitplot, just to make this example parallel with the one in Baayen et al. 2008
dat = splitplot
Fit the following models with languageR, using different arrangements of random effects.
dat.lmer1 = lmer(RT ~ list * priming + (1+priming|subjects) + (1+list|items), data = dat)
dat.lmer2 = lmer(RT ~ list * priming + (1+priming|subjects) + (1|items), data = dat)
dat.lmer3 = lmer(RT ~ list * priming + (1|subjects) + (1|items), data = dat)
dat.lmer4 = lmer(RT ~ list * priming + (1|subjects) , data = dat)
24. Working through example 2 Next, run the likelihood ratio test:
anova(dat.lmer1, dat.lmer2, dat.lmer3, dat.lmer4)
The results of this test tell us that dat.lmer3 is a significant improvement over dat.lmer4, but that dat.lmer1 and dat.lmer2 are not worth the additional complexity.
This tells us that having subjects and items as random effects is important for this data, but that we probably don't need to worry about heteroscedasticity (i.e., the random slope effects we tried add nothing)
25. Working through example 2 Finally, we run the tests for significance on the model with the best random effects model, dat.lmer3 (it should take a few seconds)
dat.lmer3.pvals = pvals.fnc(dat.lmer3, nsim = 10000)
dat.lmer3.pvals$fixed
We can see that the effect of priming is significant by both pMCMC and t-test, and the effects involving list are not.
Notice the high similarity between the pMCMC and the t-test p-values. This is because the number of observations is relatively high (20 items in both conditions for 20 subjects).
This also means we could have just eye-balled the t-values in the lmer model itself, looking for t > 2.
26. The end (I think for reals, this time)