Other names for the same basic thing • hierarchical linear models • multilevel models • mixed-effects models • mixed models • variance-components models • random-effects regression models • random-coefficients regression models
Multilevel models • Common situations: • Individuals nested within groups • Random assignment done at the group level rather than at the individual level • Timepoints nested within subjects
The traditional (wrong) way to analyze this • Linear or logistic regression analysis • IV DV • Ignore the clustering • People did this for years because there weren’t good computer programs to do it any other way.
What’s wrong with this? • It violates one of the main assumptions of the regression model! • Observations are supposed to be independent • Each person’s residual is independent of everyone else’s residual
Why the observations are not independent • Kids in the same school are similar. • My residual is likely to be similar to the residuals of the other kids in my school. • (residual=difference between predicted value and actual value)
Why are kids in the same school similar? • Durkheim: social facts • Facts, concepts, expectations that come from the social community • Exist before and outside the individual • Constrain the members’ behavior • People in groups share the same social context, social norms, experiences.
Here, each individual represents an independent observation …and traditional data analytic techniques are appropriate Hierarchical Data Structures in a Group Randomized Trial In many research studies, we start by drawing a sample of individuals… and randomly assign them to either treatment or control
However, we are not always able to separate people from their contexts. Students learn in schools Children grow up in neighborhoods Patients are treated in hospitals When the cluster is a necessary part of a research design, the resultant data will be nested, or hierarchically structured. Hierarchical Data Structures in a Group Randomized Trial
The unit of assignment is an identifiable group (e.g., cluster). Different groups are allocated to each condition. The units of observation are members of the groups. The number of groups allocated to each condition is usually limited. Characteristics of Hierarchical Data Structures in a Group Randomized Trial
Scatterplot of number of nutrition lessons completed vs. number of days student brought fruit for lunch Source: I made this up.
Draw an overall regression line Are the points evenly scattered around the line???
How these lines differ • Each has its own Y-intercept. • Each has its own slope. • (Each could have its own amount of scatter, but in this example they’re all the same.) • (Each could be a different polynomial curve, but in this example they’re all lines.)
If each school has its own regression line, is it appropriate to draw just one overall line to represent all the schools?
Students in the same school are similar • Similar backgrounds (SES, urban/rural, ethnic mix of neighborhoods, community resources) • Similar experiences (same teachers, same school climate, shared events in school)
You can improve the predictive power of the model if you add in information about the school. The points are closer to the regression line if you have a separate regression line for each school.
School-level information helps you predict the individual-level scores better
Two different equations that help predict an individual’s score: • Level 1: The individual-level equation Y = β0 + β1 X + ε • Level 2: The school-level equation • Within each school…. β0j = ψ00 + μ0j β1j = ψ10 + μ1j • (j represents the school)
How to do it • Programs specifically designed for multilevel modeling • HLM • M+ • MLwiN • Other programs • SAS PROC MIXED
What if you just ignore the multilevel structure and use PROC REG or GLM? • OLS regression model assumes that you have N unique pieces of information to estimate the regression line. • If my responses are partially explained by the responses of everyone else in my school, then there aren’t really N unique pieces of information. • (We’re cheating)
The accuracy of the betas (and our confidence in them) is shown by their standard errors. • OLS model assumes there are N independent pieces of information when it computes the standard errors (minus the d.f.) • If observations are correlated, there really aren’t N independent pieces of information. • Estimated standard errors will be too small.
Each individual’s score consists of two components: • Variance due to the group • E.g., overall SES of the school • Variance due to the individual within the group • E.g., each kid’s personality
If you ignore the grouping, you’re attributing all the variance to the between-individuals component. • You’re saying that all the causes of variation exist across individuals, and you’re ignoring the effect of the group.
What happens if the standard errors of the betas are too small? Type I errors (Conclude that an effect is significant when it’s really not)
Intraclass Correlation (ICC) • Proportion of the total variance that is due to the group membership
Equation for ICC σ2g(variance due to the group) ----------------- σ2m + σ2g (total variance: member + group) English: The proportion of the total variance that’s due to the grouping variable
ICC in school-based studies is usually small • Typically around .02 for substance use, kids within schools • (David Murray is the expert on this) • Varies across DVs, samples, etc.
How to calculate ICC • There is a macro on the SAS website • http://ftp.sas.com/techsup/download/stat/intracc.html • Paste the macro into your SAS program, and replace their variable names with your variable names.
Another way to calculate ICC • Use PROC MIXED to calculate the unconditional means model PROC MIXED METHOD = ML COVTEST ; CLASS school ; MODEL dv= / SOLUTION ; RANDOM INT / TYPE=UN SUBJECT=school ; RUN ;
Covariance Parameter Estimates (MLE) Cov Parm Subject Estimate Std Error Z Pr > |Z| UN(1,1) School 129.19 25.48 5.07 0.0001 Residual 321.56 10.63 30.25 0.0001 Variance component for school is 129.19 Variance component left over after variance due to school has been explained is 321.56. ICC = variance due to clustering variable / (variance due to clustering variable + variance remaining) 129 / (129 + 321) = .29 Source: http://www.utexas.edu/its/rc/answers/sas/sas97.html
Small ICCs can have big effects! • Variance inflation factor (VIF) • Also known as design effect • 1 + (m-1) ICC • m=number of members per group • So with an ICC of .02 and 100 kids per group, VIF=2.98
Variance Inflation Factor • VIF is an index of the relative increase in variation found in a HLM analysis as compared to the standard OLS • VIF increases both as m and as ICC increase
DEFT • Square root of VIF • In the example, • DEFT= √2.98 = 1.73 • The standard error of beta is really 1.73 times higher than what you get if you run a traditional OLS regression model!
If you don’t account for the ICC • Beta will be about the same • But its standard error will be too small • So it will look more significant than it really is • So you conclude that there’s a significant effect, when maybe there really isn’t!
Underestimating your ICC can undermine statistical power Real power of cluster randomized trials according to discrepancy between a priori postulated and a posteriori estimated intraclass correlation coefficients Effect size=.25 Power=80% g=number of clusters M=average cluster size N=total number of subjects Source: Guittet, L., Giraudeau, B., & Ravaud, P. (2005). A priori postulated and real power in cluster randomized trials: mind the gap. BMC Medical Research Methodology 2005, 5:25
A study design issue • For the same total N, you’ll have more power if you have a large number of clusters (schools) with few individuals (students) per cluster. • Example: It’s better to have 100 schools with 10 students per school than 10 schools with 100 students per school. • But that’s more difficult logistically!
So how do we fix it? • Need to account for the clustering in the regression model • Can reduce the ICC somewhat by including covariates that explain part of the group effect (e.g., proxy measures of SES) • But that doesn’t completely eliminate the problem
PROC MIXED • Lets you include fixed effects (your regular IVs) and random effects (group effects) in the model • Proc mixed; • Class school; • Model dv = var1 var2; • Random intercept / subject=school; • Run;
/Solution option • Gives you the stats you usually want to report: • Parameter estimate • Standard error • Degrees of freedom • T-value • P-value • Model dv = fixedIV1 fixedIV2 / solution;
Example • Association between SES and smoking • Hypothesis: smoking is inversely associated with SES. • In this example…. • SES is the median income in the adolescent’s zip code • Smoking is a standardized average of ever tried smoking, ever smoked a whole cigarette, days in past month smoked, cigarettes per day • IRP Year 3 data
PROC REG ignores the clustering procreg; model smkscale3=income1000/stb; run;
REG output Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model 1 12.53389 12.53389 30.12 <.0001 Error 1902 791.60096 0.41619 Corrected Total 1903 804.13485 Root MSE 0.64513 R-Square 0.0156 Dependent Mean 0.20623 Adj R-Sq 0.0151 Coeff Var 312.82168
REG output Parameter Estimates Parameter Standard Standardized Variable DF Estimate Error t Value Pr > |t| Estimate Intercept 1 0.21036 0.01480 14.21 <.0001 0 income1000 1 -0.08135 0.01482 -5.49 <.0001 -0.12485
Use PROC MIXED to take into account students clustered within schools. procmixed; class sch3; model smkscale3=income1000/solution; random intercept/sub=sch3 solution; run;
PROC MIXED output The Mixed Procedure Model Information Data Set WORK.COMPLETE3 Dependent Variable smkscale3 Covariance Structure Variance Components Subject Effect sch3 Estimation Method REML Residual Variance Method Profile Fixed Effects SE Method Model-Based Degrees of Freedom Method Containment
Class Level Information Class Levels Values sch3 24 2 5 6 18 21 22 30 39 45 49 50 51 52 55 57 58 59 61 62 63 64 66 67 68 (the numbers are the IDs of the schools)