Multilevel Models

1 / 62

# Multilevel Models - PowerPoint PPT Presentation

Multilevel Models. Other names for the same basic thing. hierarchical linear models multilevel models mixed-effects models mixed models variance-components models random-effects regression models random-coefficients regression models . Multilevel models. Common situations:

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about ' Multilevel Models' - arvid

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### Multilevel Models

Other names for the same basic thing
• hierarchical linear models
• multilevel models
• mixed-effects models
• mixed models
• variance-components models
• random-effects regression models
• random-coefficients regression models
Multilevel models
• Common situations:
• Individuals nested within groups
• Random assignment done at the group level rather than at the individual level
• Timepoints nested within subjects
The traditional (wrong) way to analyze this
• Linear or logistic regression analysis
• IV  DV
• Ignore the clustering
• People did this for years because there weren’t good computer programs to do it any other way.
What’s wrong with this?
• It violates one of the main assumptions of the regression model!
• Observations are supposed to be independent
• Each person’s residual is independent of everyone else’s residual
Why the observations are not independent
• Kids in the same school are similar.
• My residual is likely to be similar to the residuals of the other kids in my school.
• (residual=difference between predicted value and actual value)
Students in the same school are similar
• Similar backgrounds (SES, urban/rural, ethnic mix of neighborhoods, community resources)
• Similar experiences (same teachers, same school climate, shared events in school)
Here, each individual represents an independent observation

…and traditional data analytic techniques are appropriate

Hierarchical Data Structures in a Group Randomized Trial

In many research studies, we start by drawing a sample of individuals…

and randomly assign them to either treatment or control

However, we are not always able to separate people from their contexts.

Students learn in schools

Children grow up in neighborhoods

Patients are treated in hospitals

When the cluster is a necessary part of a research design, the resultant data will be nested, or hierarchically structured.

Hierarchical Data Structures in a Group Randomized Trial

Different groups are allocated to each condition.

The units of observation are members of the groups.

The number of groups allocated to each condition is usually limited.

Characteristics of Hierarchical Data Structures in a Group Randomized Trial

### What do the data look like?

Scatterplot of number of nutrition lessons completed vs. number of days student brought fruit for lunch

Draw an overall regression line

Are the points evenly scattered around the line???

How these lines differ
• Each has its own Y-intercept.
• Each has its own slope.
• (Each could have its own amount of scatter, but in this example they’re all the same.)
• (Each could be a different polynomial curve, but in this example they’re all lines.)

### You can improve the predictive power of the model if you add in information about the school.

The points are closer to the regression line if you have a separate regression line for each school.

### School-level information helps you predict the individual-level scores better

• Level 1: The individual-level equation

Y = β0 + β1 X + ε

• Level 2: The school-level equation
• Within each school….

β0j = ψ00 + μ0j

β1j = ψ10 + μ1j

• (j represents the school)

### Multilevel models let you use all these equations at the same time!

How to do it
• Programs specifically designed for multilevel modeling
• HLM
• M+
• MLwiN
• Other programs
• SAS PROC MIXED
What if you just ignore the multilevel structure and use PROC REG or GLM?
• OLS regression model assumes that you have N unique pieces of information to estimate the regression line.
• If my responses are partially explained by the responses of everyone else in my school, then there aren’t really N unique pieces of information.
• (We’re cheating)
The accuracy of the betas (and our confidence in them) is shown by their standard errors.
• OLS model assumes there are N independent pieces of information when it computes the standard errors (minus the d.f.)
• If observations are correlated, there really aren’t N independent pieces of information.
• Estimated standard errors will be too small.
Each individual’s score consists of two components:
• Variance due to the group
• E.g., overall SES of the school
• Variance due to the individual within the group
• E.g., each kid’s personality
If you ignore the grouping, you’re attributing all the variance to the between-individuals component.
• You’re saying that all the causes of variation exist across individuals, and you’re ignoring the effect of the group.

### What happens if the standard errors of the betas are too small?

Type I errors

(Conclude that an effect is significant when it’s really not)

Intraclass Correlation (ICC)
• Proportion of the total variance that is due to the group membership
Equation for ICC

σ2g(variance due to the group)

-----------------

σ2m + σ2g (total variance: member + group)

English: The proportion of the total variance that’s due to the grouping variable

ICC in school-based studies is usually small
• Typically around .02 for substance use, kids within schools
• (David Murray is the expert on this)
• Varies across DVs, samples, etc.
How to calculate ICC
• There is a macro on the SAS website
• Paste the macro into your SAS program, and replace their variable names with your variable names.
Another way to calculate ICC
• Use PROC MIXED to calculate the unconditional means model

PROC MIXED METHOD = ML COVTEST ;

CLASS school ;

MODEL dv= / SOLUTION ;

RANDOM INT / TYPE=UN SUBJECT=school ;

RUN ;

Covariance Parameter Estimates (MLE)

Cov Parm Subject Estimate Std Error Z Pr > |Z|

UN(1,1) School 129.19 25.48 5.07 0.0001

Residual 321.56 10.63 30.25 0.0001

Variance component for school is 129.19

Variance component left over after variance due to school has been explained is 321.56.

ICC = variance due to clustering variable /

(variance due to clustering variable + variance remaining)

129 / (129 + 321) = .29

Small ICCs can have big effects!
• Variance inflation factor (VIF)
• Also known as design effect
• 1 + (m-1) ICC
• m=number of members per group
• So with an ICC of .02 and 100 kids per group, VIF=2.98
DEFT
• Square root of VIF
• In the example,
• DEFT= √2.98 = 1.73
• The standard error of beta is really 1.73 times higher than what you get if you run a traditional OLS regression model!
If you don’t account for the ICC
• Beta will be about the same
• But its standard error will be too small
• So it will look more significant than it really is
• So you conclude that there’s a significant effect, when maybe there really isn’t!
Underestimating your ICC can undermine statistical power

Real power of cluster randomized trials according to discrepancy between a priori postulated and a posteriori estimated intraclass correlation coefficients

Effect size=.25

Power=80%

g=number of clusters

M=average cluster size

N=total number of subjects

Source: Guittet, L., Giraudeau, B., & Ravaud, P. (2005). A priori postulated and real power in cluster randomized trials: mind the gap. BMC Medical Research Methodology 2005, 5:25

A study design issue
• For the same total N, you’ll have more power if you have a large number of clusters (schools) with few individuals (students) per cluster.
• Example: It’s better to have 100 schools with 10 students per school than 10 schools with 100 students per school.
• But that’s more difficult logistically!
So how do we fix it?
• Need to account for the clustering in the regression model
• Can reduce the ICC somewhat by including covariates that explain part of the group effect (e.g., proxy measures of SES)
• But that doesn’t completely eliminate the problem
PROC MIXED
• Lets you include fixed effects (your regular IVs) and random effects (group effects) in the model
• Proc mixed;
• Class school;
• Model dv = var1 var2;
• Random intercept / subject=school;
• Run;
/Solution option
• Gives you the stats you usually want to report:
• Parameter estimate
• Standard error
• Degrees of freedom
• T-value
• P-value
• Model dv = fixedIV1 fixedIV2 / solution;
Example
• Association between SES and smoking
• Hypothesis: smoking is inversely associated with SES.
• In this example….
• SES is the median income in the adolescent’s zip code
• Smoking is a standardized average of ever tried smoking, ever smoked a whole cigarette, days in past month smoked, cigarettes per day
• IRP Year 3 data
PROC REG ignores the clustering

procreg;

model smkscale3=income1000/stb;

run;

REG output

Analysis of Variance

Sum of Mean

Source DF Squares Square F Value Pr > F

Model 1 12.53389 12.53389 30.12 <.0001

Error 1902 791.60096 0.41619

Corrected Total 1903 804.13485

Root MSE 0.64513 R-Square 0.0156

Dependent Mean 0.20623 Adj R-Sq 0.0151

Coeff Var 312.82168

REG output

Parameter Estimates

Parameter Standard Standardized

Variable DF Estimate Error t Value Pr > |t| Estimate

Intercept 1 0.21036 0.01480 14.21 <.0001 0

income1000 1 -0.08135 0.01482 -5.49 <.0001 -0.12485

Use PROC MIXED to take into account students clustered within schools.

procmixed;

class sch3;

model smkscale3=income1000/solution;

random intercept/sub=sch3 solution;

run;

Covariance Parameter Estimates

Cov Parm Subject Estimate

Intercept sch3 0.003753

Residual 0.4131

(If you had run the unconditional means model, this is where you would get the numbers to calculate the ICC.)

The unconditional means model

procmixed method=ml covtest;

class sch3;

model smkscale3= /solution;

random intercept/type=un sub=sch3;

run;

Output from the unconditional means model

Covariance Parameter Estimates

Standard Z

Cov Parm Subject Estimate Error Value Pr Z

UN(1,1) sch3 0.0078 0.0037 2.11 0.0174

Residual 0.4276 0.0132 32.52 <.0001

ICC= .0078 / (.0078+.4276) = .0179

David Murray was right—it is around .02!

### Back to our model….

Solution for Fixed Effects

Standard

Effect Estimate Error DF t Value Pr > |t|

Intercept 0.1891 0.02277 23 8.30 <.0001

income1000 -0.05732 0.02129 1879 -2.69 0.0072

Type 3 Tests of Fixed Effects

Num Den

Effect DF DF F Value Pr > F

income1000 1 1879 7.25 0.0072

OLS model would have overestimated the effect of income on smoking. Part of the effect is due to the fact that low-income schools have high smoking, and high-income schools have low smoking.

You can make this way more complicated!
• Intercepts vary across schools
• Slopes vary across schools
• Effect of income varies according to the slope or intercept
• Cross-level interaction
• E.g., Smoking is most prevalent among low-income students in high-income schools
• Smoking is most prevalent among low-income students, but only in schools where there is a strong association between income and smoking.
The unit of analysis issue
• Typical school-based prevention trial
• 40 schools
• 100 kids per school
• Total N=4000
• 20 schools randomized to intervention group
• 20 schools randomized to control group
• (“randomization at the school level”)
Why do researchers randomize at the school level?
• It’s easier to give the same intervention to a whole school, instead of randomly assigning students and sending them to different rooms to get different interventions.
• When kids talk to each other, the interventions would leak across the groups.
Traditional way to analyze program effects was at the individual level:
• Intervention  DV
• N=4000
• However, this ignored the clustering of kids within schools!
Most conservative approach:“Unit of assignment should be the unit of analysis”
• Analyze at the school level
• Intervention  school mean of DV
• N=40 schools
• But N=40 has low power!
• Not really making use of all the available information, because you’re aggregating to the school level.
A compromise: Multilevel models
• Use data from individuals, but include a random effect for school to partial out the effects of the clustering.
Can I delete the grouping variable from the model if it’s not significant?
• Even a small ICC, if ignored, can inflate the Type I error rate if the number of members per group is moderate to large.
• Safest course is to include all random effects associated with the study design and sampling plan.
• If those random effects really don’t matter, the analyses will come out the same anyway.
• All you’re losing is a little convenience.
Multilevel models for dichotomous outcomes
• PROC MIXED assumes a DV with a continuous distribution.
• PROC GLIMMIX lets you use a binary outcome variable.
• Syntax is similar to MIXED.