## PowerPoint Slideshow about ' Multilevel Models' - arvid


### If each school has its own regression line, is it appropriate to draw just one overall line to represent all the schools?

### You can improve the predictive power of the model if you add in information about the school.

### School-level information helps you predict the individual-level scores better

### What happens if the standard errors of the betas are too small?

Other names for the same basic thing

- hierarchical linear models
- multilevel models
- mixed-effects models
- mixed models
- variance-components models
- random-effects regression models
- random-coefficients regression models

Multilevel models

- Common situations:
- Individuals nested within groups
- Random assignment done at the group level rather than at the individual level
- Timepoints nested within subjects

The traditional (wrong) way to analyze this

- Linear or logistic regression analysis
- IV → DV
- Ignore the clustering
- People did this for years because there weren’t good computer programs to do it any other way.

What’s wrong with this?

- It violates one of the main assumptions of the regression model!
- Observations are supposed to be independent
- Each person’s residual is independent of everyone else’s residual

Why the observations are not independent

- Kids in the same school are similar.
- My residual is likely to be similar to the residuals of the other kids in my school.
- (residual=difference between predicted value and actual value)

Students in the same school are similar

- Similar backgrounds (SES, urban/rural, ethnic mix of neighborhoods, community resources)
- Similar experiences (same teachers, same school climate, shared events in school)

Here, each individual represents an independent observation

…and traditional data analytic techniques are appropriate

Hierarchical Data Structures in a Group Randomized Trial

In many research studies, we start by drawing a sample of individuals and randomly assign them to either treatment or control.

However, we are not always able to separate people from their contexts.

Students learn in schools

Children grow up in neighborhoods

Patients are treated in hospitals

When the cluster is a necessary part of a research design, the resultant data will be nested, or hierarchically structured.

Hierarchical Data Structures in a Group Randomized Trial

The unit of assignment is an identifiable group (e.g., cluster).

Different groups are allocated to each condition.

The units of observation are members of the groups.

The number of groups allocated to each condition is usually limited.

Characteristics of Hierarchical Data Structures in a Group Randomized Trial

Scatterplot of number of nutrition lessons completed vs. number of days student brought fruit for lunch

Source: I made this up.

Draw an overall regression line

Are the points evenly scattered around the line???

How these lines differ

- Each has its own Y-intercept.
- Each has its own slope.
- (Each could have its own amount of scatter, but in this example they’re all the same.)
- (Each could be a different polynomial curve, but in this example they’re all lines.)

The points are closer to the regression line if you have a separate regression line for each school.

Two different equations that help predict an individual’s score:

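- Level 1: The individual-level equation

$$Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + \varepsilon_{ij}$$

- Level 2: The school-level equation
- Within each school….

$$\beta_{0j} = \psi_{00} + \mu_{0j}$$

$$\beta_{1j} = \psi_{10} + \mu_{1j}$$

- (i represents the student; j represents the school)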
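Substituting the school-level equations into the individual-level equation gives a single combined equation. This is a sketch of the standard substitution, using the slide's symbols with the student index i added as an assumption:

$$Y_{ij} = \underbrace{\psi_{00} + \psi_{10} X_{ij}}_{\text{fixed part}} + \underbrace{\mu_{0j} + \mu_{1j} X_{ij} + \varepsilon_{ij}}_{\text{random part}}$$

The fixed part is the overall regression line. The random part lets each school shift its intercept ($\mu_{0j}$) and its slope ($\mu_{1j}$) away from that line, with $\varepsilon_{ij}$ as the leftover individual-level residual.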

How to do it

- Programs specifically designed for multilevel modeling
- HLM
- M+
- MLwiN
- Other programs
- SAS PROC MIXED

What if you just ignore the multilevel structure and use PROC REG or GLM?

- OLS regression model assumes that you have N unique pieces of information to estimate the regression line.
- If my responses are partially explained by the responses of everyone else in my school, then there aren’t really N unique pieces of information.
- (We’re cheating)

The accuracy of the betas (and our confidence in them) is shown by their standard errors.

- The OLS model assumes there are N independent pieces of information (less the degrees of freedom used by the model) when it computes the standard errors.
- If observations are correlated, there really aren’t N independent pieces of information.
- Estimated standard errors will be too small.

Each individual’s score consists of two components:

- Variance due to the group
- E.g., overall SES of the school
- Variance due to the individual within the group
- E.g., each kid’s personality

If you ignore the grouping, you’re attributing all the variance to the between-individuals component.

- You’re saying that all the causes of variation exist across individuals, and you’re ignoring the effect of the group.

Type I errors

(Conclude that an effect is significant when it’s really not)

Intraclass Correlation (ICC)

- Proportion of the total variance that is due to the group membership

Equation for ICC

$$\text{ICC} = \frac{\sigma^2_g}{\sigma^2_m + \sigma^2_g}$$

where $\sigma^2_g$ is the variance due to the group and $\sigma^2_m + \sigma^2_g$ is the total variance (member + group).

English: The proportion of the total variance that’s due to the grouping variable

ICC in school-based studies is usually small

- Typically around .02 for substance use, kids within schools
- (David Murray is the expert on this)
- Varies across DVs, samples, etc.

How to calculate ICC

- There is a macro on the SAS website
- http://ftp.sas.com/techsup/download/stat/intracc.html
- Paste the macro into your SAS program, and replace their variable names with your variable names.

Another way to calculate ICC

- Use PROC MIXED to calculate the unconditional means model

PROC MIXED METHOD = ML COVTEST ;

CLASS school ;

MODEL dv= / SOLUTION ;

RANDOM INT / TYPE=UN SUBJECT=school ;

RUN ;

Covariance Parameter Estimates (MLE)

| Cov Parm | Subject | Estimate | Std Error | Z | Pr > \|Z\| |
|---|---|---|---|---|---|
| UN(1,1) | School | 129.19 | 25.48 | 5.07 | 0.0001 |
| Residual | | 321.56 | 10.63 | 30.25 | 0.0001 |

Variance component for school is 129.19

Variance component left over after variance due to school has been explained is 321.56.

ICC = variance due to clustering variable / (variance due to clustering variable + variance remaining)

ICC = 129 / (129 + 321) = .29

Source: http://www.utexas.edu/its/rc/answers/sas/sas97.html
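The same arithmetic can be checked outside SAS. The snippet below is a minimal sketch in Python (chosen only for illustration; the function name `icc` is hypothetical), using the two variance components from the PROC MIXED output above.

```python
def icc(group_variance: float, residual_variance: float) -> float:
    """Proportion of the total variance that is due to group membership."""
    total = group_variance + residual_variance
    if total <= 0:
        raise ValueError("total variance must be positive")
    return group_variance / total


# Variance components copied from the covariance parameter estimates above
school_variance = 129.19
residual_variance = 321.56

print(round(icc(school_variance, residual_variance), 2))  # 0.29
```

Using the unrounded estimates gives the same answer as the rounded 129 / (129 + 321) calculation.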

Small ICCs can have big effects!

- Variance inflation factor (VIF)
- Also known as design effect
- 1 + (m-1) ICC
- m=number of members per group
- So with an ICC of .02 and 100 kids per group, VIF=2.98

DEFT

- Square root of VIF
- In the example,
- DEFT= √2.98 = 1.73
- The standard error of beta is really 1.73 times higher than what you get if you run a traditional OLS regression model!
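The VIF and DEFT numbers above can be reproduced with a few lines of Python. This is a sketch for illustration only; the function name `design_effect` is an assumption, not something from the slides.

```python
import math


def design_effect(members_per_group: float, icc: float) -> float:
    """Variance inflation factor (design effect): 1 + (m - 1) * ICC."""
    return 1 + (members_per_group - 1) * icc


# Values from the slide: ICC of .02 and 100 kids per group
vif = design_effect(100, 0.02)
deft = math.sqrt(vif)  # multiplier for the standard error

print(round(vif, 2), round(deft, 2))  # 2.98 1.73
```

Because the group size multiplies the ICC, even an ICC of .02 nearly triples the variance when each school contributes 100 students.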

If you don’t account for the ICC

- Beta will be about the same
- But its standard error will be too small
- So it will look more significant than it really is
- So you conclude that there’s a significant effect, when maybe there really isn’t!

Underestimating your ICC can undermine statistical power

Figure: Real power of cluster randomized trials according to the discrepancy between a priori postulated and a posteriori estimated intraclass correlation coefficients (effect size = .25; power = 80%; g = number of clusters; M = average cluster size; N = total number of subjects).

Source: Guittet, L., Giraudeau, B., & Ravaud, P. (2005). A priori postulated and real power in cluster randomized trials: mind the gap. BMC Medical Research Methodology, 5:25.

A study design issue

- For the same total N, you’ll have more power if you have a large number of clusters (schools) with few individuals (students) per cluster.
- Example: It’s better to have 100 schools with 10 students per school than 10 schools with 100 students per school.
- But that’s more difficult logistically!
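One way to see why more clusters help is to divide the total N by the design effect, which gives a rough "effective" sample size. The Python sketch below assumes an ICC of .02 (the typical school value quoted earlier) and equal cluster sizes; the function name `effective_n` is hypothetical.

```python
def effective_n(n_clusters: int, cluster_size: int, icc: float) -> float:
    """Total N divided by the design effect 1 + (m - 1) * ICC."""
    total_n = n_clusters * cluster_size
    deff = 1 + (cluster_size - 1) * icc
    return total_n / deff


# Same total N of 1000, split two different ways (ICC = .02 is an assumption)
print(round(effective_n(100, 10, 0.02)))  # 847
print(round(effective_n(10, 100, 0.02)))  # 336
```

Under these assumptions, 100 schools of 10 students carry roughly the information of 847 independent students, while 10 schools of 100 carry roughly that of 336.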

So how do we fix it?

- Need to account for the clustering in the regression model
- Can reduce the ICC somewhat by including covariates that explain part of the group effect (e.g., proxy measures of SES)
- But that doesn’t completely eliminate the problem

PROC MIXED

- Lets you include fixed effects (your regular IVs) and random effects (group effects) in the model
- Proc mixed;
- Class school;
- Model dv = var1 var2;
- Random intercept / subject=school;
- Run;

/Solution option

- Gives you the stats you usually want to report:
- Parameter estimate
- Standard error
- Degrees of freedom
- T-value
- P-value
- Model dv = fixedIV1 fixedIV2 / solution;

Example

- Association between SES and smoking
- Hypothesis: smoking is inversely associated with SES.
- In this example….
- SES is the median income in the adolescent’s zip code
- Smoking is a standardized average of ever tried smoking, ever smoked a whole cigarette, days in past month smoked, cigarettes per day
- IRP Year 3 data

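REG output

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 1 | 12.53389 | 12.53389 | 30.12 | <.0001 |
| Error | 1902 | 791.60096 | 0.41619 | | |
| Corrected Total | 1903 | 804.13485 | | | |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Root MSE | 0.64513 | R-Square | 0.0156 |
| Dependent Mean | 0.20623 | Adj R-Sq | 0.0151 |
| Coeff Var | 312.82168 | | |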

REG output

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| | Standardized Estimate |
|---|---|---|---|---|---|---|
| Intercept | 1 | 0.21036 | 0.01480 | 14.21 | <.0001 | 0 |
| income1000 | 1 | -0.08135 | 0.01482 | -5.49 | <.0001 | -0.12485 |

Use PROC MIXED to take into account students clustered within schools.

proc mixed;

class sch3;

model smkscale3=income1000/solution;

random intercept/sub=sch3 solution;

run;

Covariance Parameter Estimates

| Cov Parm | Subject | Estimate |
|---|---|---|
| Intercept | sch3 | 0.003753 |
| Residual | | 0.4131 |

(If you had run the unconditional means model, this is where you would get the numbers to calculate the ICC.)

The unconditional means model

proc mixed method=ml covtest;

class sch3;

model smkscale3= /solution;

random intercept/type=un sub=sch3;

run;

Output from the unconditional means model

Covariance Parameter Estimates

| Cov Parm | Subject | Estimate | Standard Error | Z Value | Pr Z |
|---|---|---|---|---|---|
| UN(1,1) | sch3 | 0.0078 | 0.0037 | 2.11 | 0.0174 |
| Residual | | 0.4276 | 0.0132 | 32.52 | <.0001 |

ICC= .0078 / (.0078+.4276) = .0179

David Murray was right—it is around .02!

Solution for Fixed Effects

| Effect | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | 0.1891 | 0.02277 | 23 | 8.30 | <.0001 |
| income1000 | -0.05732 | 0.02129 | 1879 | -2.69 | 0.0072 |

Type 3 Tests of Fixed Effects

| Effect | Num DF | Den DF | F Value | Pr > F |
|---|---|---|---|---|
| income1000 | 1 | 1879 | 7.25 | 0.0072 |

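The OLS model would have overestimated the effect of income on smoking (-0.081 vs. -0.057 in the multilevel model) and understated its standard error (0.015 vs. 0.021). Part of the OLS effect is due to the fact that low-income schools have high smoking, and high-income schools have low smoking.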
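To put the two analyses side by side, the short Python sketch below recomputes the t-values for income1000 from the estimates and standard errors in the REG and PROC MIXED output above. The dictionary layout and variable names are illustrative assumptions; only the four numbers come from the output.

```python
# Estimates and standard errors for income1000, copied from the output above
ols = {"estimate": -0.08135, "se": 0.01482}    # PROC REG
mixed = {"estimate": -0.05732, "se": 0.02129}  # PROC MIXED, random school intercept

t_ols = ols["estimate"] / ols["se"]
t_mixed = mixed["estimate"] / mixed["se"]
se_ratio = mixed["se"] / ols["se"]  # how much larger the multilevel SE is

print(round(t_ols, 2), round(t_mixed, 2), round(se_ratio, 2))  # -5.49 -2.69 1.44
```

The effect is still significant once school clustering is modeled, but the t-value is about half as large as the OLS analysis suggested.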

You can make this way more complicated!

- Intercepts vary across schools
- Slopes vary across schools
- Effect of income varies according to the slope or intercept
- Cross-level interaction
- E.g., Smoking is most prevalent among low-income students in high-income schools
- Smoking is most prevalent among low-income students, but only in schools where there is a strong association between income and smoking.

The unit of analysis issue

- Typical school-based prevention trial
- 40 schools
- 100 kids per school
- Total N=4000
- 20 schools randomized to intervention group
- 20 schools randomized to control group
- (“randomization at the school level”)

Why do researchers randomize at the school level?

- It’s easier to give the same intervention to a whole school, instead of randomly assigning students and sending them to different rooms to get different interventions.
- When kids talk to each other, the interventions would leak across the groups.

Traditional way to analyze program effects was at the individual level:

- Intervention → DV
- N=4000
- However, this ignored the clustering of kids within schools!

Most conservative approach: “Unit of assignment should be the unit of analysis”

- Analyze at the school level
- Intervention → school mean of DV
- N=40 schools
- But N=40 has low power!
- Not really making use of all the available information, because you’re aggregating to the school level.

A compromise: Multilevel models

- Use data from individuals, but include a random effect for school to partial out the effects of the clustering.

Can I delete the grouping variable from the model if it’s not significant?

- Even a small ICC, if ignored, can inflate the Type I error rate if the number of members per group is moderate to large.
- Safest course is to include all random effects associated with the study design and sampling plan.
- If those random effects really don’t matter, the analyses will come out the same anyway.
- All you’re losing is a little convenience.

Multilevel models for dichotomous outcomes

- PROC MIXED assumes a DV with a continuous distribution.
- PROC GLIMMIX lets you use a binary outcome variable.
- Syntax is similar to MIXED.
