Multilevel Regression Models

1 / 147

# Multilevel Regression Models - PowerPoint PPT Presentation

Multilevel Regression Models . sean f. reardon 17 june, 2004. Outline. What/Why Multilevel Regression? The Basic Multilevel Regression Model Growth Models A Taste of Advanced Topics. Part I.A. What are multilevel data and multilevel analysis?.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## Multilevel Regression Models

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Multilevel Regression Models

sean f. reardon

17 june, 2004

Outline
• What/Why Multilevel Regression?
• The Basic Multilevel Regression Model
• Growth Models
• A Taste of Advanced Topics
Part I.A.

What are multilevel data and multilevel analysis?

Observations within the same unit may be more similar than observations in separate units, on average

What effect does this have on estimation and statistical inference?

What are multilevel data?
Examples of multilevel data with contextual clustering
• Observations of students, clustered within schools
• Observations of siblings, clustered within families
• Observations of individuals, clustered within countries, states, or neighborhoods
Examples of multilevel data with intra-person clustering
• Repeated test scores, clustered within students
• Multiple measures of a latent construct, clustered within persons
Other examples of multilevel data
• Patients, clustered within doctors
• Coefficient estimates, clustered within studies (meta-analysis)
• Widget sizes, clustered within factories
• And so on…
What is multilevel regression analysis?
• Also called
• Hierarchical Linear Models
• Mixed Models
• Multilevel Models
• Growth Models
• Slopes-as-Outcomes Models
Multilevel Regression Models
• A form of regression models
• Used to answer questions about the relationship of context to individual outcomes
• Used to estimate both within-unit and between-unit relationships (and cross-level interactions)
• e.g., within- vs. between-school relationships between SES and achievement
Assumptions of OLS
• Linearity
• Errors are normally distributed
• Errors are homoskedastic
• Errors are uncorrelated/independent
• Knowing the error term for one observation is not informative of the error term of any other observation
Some Example Data
• Data from Early Childhood Longitudinal Study-Kindergarten Cohort (NCES, 1998-2004)
• Longitudinal study of 21,000 kindergarten students in K class of 1998-99
• Followed through fifth grade (2003-04)
ECLS-K data
• Subsample
• 399 kindergarten students
• sampled from 17 schools
• Math Score:
• Fall kindergarten math test scores
• Administered 2-3 months into school year
• Age
• Age in months at time of math assessment
• Ranges from 60-79 months
What is the relationship between age and math scores?
• Note: this is NOT a growth model
• It is a cross-sectional model
• A growth model requires repeated measures, so we can observe intra-individual growth
OLS Regression:Math on Age

. reg math age

Source | SS df MS Number of obs = 422

-------------+------------------------------ F( 1, 420) = 32.38

Model | 1765.41947 1 1765.41947 Prob > F = 0.0000

Residual | 22896.5737 420 54.5156517 R-squared = 0.0716

Total | 24661.9932 421 58.5795563 Root MSE = 7.3835

------------------------------------------------------------------------------

math | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

age | .4956666 .0871016 5.69 0.000 .3244572 .666876

_cons | -10.91381 6.049008 -1.80 0.072 -22.80391 .9762943

------------------------------------------------------------------------------

OLS Regression:Math on Age

Next look at the residuals from this model.

Are they homoscedastic? Normally distributed? Independent?

Residuals look correlated with each other within schools
• Formal test of this dependence – ANOVA
Random Effects ANOVA

One-way Analysis of Variance for resid: Residuals

Number of obs = 422

R-squared = 0.1638

Source SS df MS F Prob > F

-------------------------------------------------------------------------

Between s_id1 3750.169 17 220.59817 4.65 0.0000

Within s_id1 19146.405 404 47.392091

-------------------------------------------------------------------------

Total 22896.574 421 54.386161

Intraclass Asy.

correlation S.E. [95% Conf. Interval]

------------------------------------------------

0.13487 0.05204 0.03287 0.23687

Estimated SD of s_id1 effect 2.718128

Estimated SD within s_id1 6.884191

Est. reliability of a s_id1 mean 0.78517

(evaluated at n=23.44)

Consequences of Non-Independence of Residuals
• The computation of standard errors in OLS depends on the assumption of independence of errors
• If errors are not independent, then standard errors will, in general, be too small (so the probability of Type I errors is larger than it should be)
Two extreme examples
• n individuals observed from each of K schools (total of nK observations)
• if Yik= Yjk for all i and j in school k, then knowing k completely determines Y, so there are really only K unique observations
• In this case, we can just treat each school as a single observation (with outcome Y.k), and use OLS on the sample of K schools
Two extreme examples
• n individuals observed from each of K schools (total of nK observations)
• if YikYjk for all i and j and k, then knowing k tells us nothing about Y, so there are really nK unique observations
• In this case, there is no dependence of the errors, so we can use OLS on the sample of nK students.
When do we need multilevel regression?
• In the intermediate case, where knowing the school gives us some, but not complete information about Y.
• e.g., test scores vary both within and between schools
• e.g., individuals vary within and between neighborhoods
• e.g., mood varies both within individuals (over time) and between individuals

### Intermission I

Part II.A.

Farewell OLS

What we know so far
• Two observations within the same unit may be more similar than two observations chosen at random
• If the regression model does not explain all of the between-unit differences (and it is unlikely that they will), we will have correlated errors within units
• This is a violation of the independence of residuals assumption in OLS
• At a minimum, this results in incorrect standard errors (too small)
How do we allow dependence in the regression model?
• We want a model that explicitly allows the level of the outcome variable to vary across level-two units
• For example, we want to let the mean reading score differ across schools
• So let’s write a model that allows this
Some notation
• i indexes level-one units (people within schools, observations within persons)
• j indexes level-two units (e.g., schools, if we have students nested within schools)
• We will use r to denote a level-one residual, and u to denote a level-two residual
Farewell OLS: Our first multilevel model

Outcome for observation i in unit j

Farewell OLS: Our first multilevel model

Outcome for observation i in unit j

Intercept

Farewell OLS: Our first multilevel model

Outcome for observation i in unit j

Value of X for observation i in unit j

Intercept

Coefficient

Farewell OLS: Our first multilevel model

Outcome for observation i in unit j

Residual term specific to unit j

Value of X for observation i in unit j

Intercept

Coefficient

Farewell OLS: Our first multilevel model

Residual term specific to observation i in unit j

Outcome for observation i in unit j

Residual term specific to unit j

Value of X for observation i in unit j

Intercept

Coefficient

Farewell OLS: Our first multilevel model

Residual term specific to observation i in unit j

Outcome for observation i in unit j

Residual term specific to unit j

Value of X for observation i in unit j

Intercept

Coefficient

What is uj?
• A residual term
• Specific to unit j
• Common to all observations in unit j
• Subscript j, no subscript i
• Interpretation: the difference between the overall intercept and the intercept in unit j
What is rij?
• A residual term
• Specific to observation i in unit j
• Has a mean of 0, so any part of ij that is common to all observations within j has been removed
• So the rij’s may be independent
• Not guaranteed to be independent
Features of this model
• Note that: ij = uj + rij
• We also have:

Var(ij) = Var(uj + rij)

= Var(uj)+ Var(rij) + 2*Cov(uj,rij)

= Var(uj)+ Var(rij)

• We will come back to variance decomposition later
Features of this model
• The level of Yij – after adjusting for Xij – may vary across the units
• We have made no assumptions yet about the distribution of the uj’s or the rij’s.
• The relationship between X and Y does not depend on j (1 does not depend on j)
So how do we estimate this model?
• We want an estimate of 1 , the relationship between Xij and Yij.
• Two approaches:
• Fixed Effects estimator
• Random Effects estimator
The fixed effects estimator
• We have ‘absorbed’ the level-two error terms (the uj’s) into the intercept
• Now each aggregate unit has its own intercept; so between-unit variation is accounted for in the intercepts
• This solves the dependence problem with the rij’s (they may still not be independent, but not because of unexplained variation between-level-two units)
The fixed effects estimator
• Three methods of obtaining the fixed effects estimator 1 from this model:
• Dummy variables for each unit
• Change or difference scores
• Deviations from mean unit values
• All three are mathematically equivalent
• All can be estimated via OLS, with some adjustment of the degrees of freedom
The random effects estimator
• We treat the variance between units as consisting of parameter variance (true variance between units) and error variance (extra variance produced because of sampling)
• We treat units with larger samples as having more reliable estimated unit means
The random effects estimator
• So our estimate of the unit mean for a particular unit is a weighted average of the unit mean estimated in a fixed effects model and the overall mean—our estimates are shrunken toward the grand mean
• Let’s see a picture of this:
Part II.B.

The basic multilevel model

The basic multilevel model
• Random effects ANOVA is the simplest random effects model
• The random effects model is a very simple kind of multilevel model
• So we are building up here to the multilevel model
The multilevel model as a random effects model
• We write the random effects model as:

Yij = 0 + uj + rij

• We can also write it as:

Yij = 0j+ rij

.

.

The multilevel model as a random effects model
• We write the random effects model as:

Yij = 0 + uj + rij

• We can also write it as:

Yij = 0j+ rij

.

.

Level-1 model

The multilevel model as a random effects model
• We write the random effects model as:

Yij = 0 + uj + rij

• We can also write it as:

Yij = 0j+ rij

0j= 00 + uj

.

Level-1 model

The multilevel model as a random effects model
• We write the random effects model as:

Yij = 0 + uj + rij

• We can also write it as:

Yij = 0j+ rij

0j= 00 + uj

(here we’re using the 00 notation where before we used 0; this is the notation of HLM)

Level-1 model

Level-2 model

HLM Notation (Null Model)
• Level-1 model:

Yij = 0j+ rij

• Level-2 model

0j= 00 + uj

• Mixed model:

Yij= 00 + uj+ rij

HLM Notation
• Level-1 model:

Yij = 0j + 1jXij+ rij

• Level-2 model:

0j= 00 + uj

1j= 10

• Mixed model:

Yij= 00 + 10Xij +uj+ rij

HLM Notation
• Mixed model:

Yij= 00 + 10Xij +uj+ rij

Structural part

of the model

HLM Notation
• Mixed model:

Yij= 00 + 10Xij +uj+ rij

Structural part

of the model

Stochastic (random) part of the model

HLM Notation
• Mixed model:

Yij= 00+10Xij +uj+ rij

Fixed Effects

HLM Notation
• Mixed model:

Yij= 00 + 10Xij+uj+ rij

Random Effect

Fixed Effects

Part II.C.

Variance Decomposition

Reminder: The unconditional (null) random effects model
• The one-way random effects ANOVA model:

Yij = 00+ uj + rij

• We can also write it as:

Yij = 0j+ rij

0j= 00 + uj

• Useful as a baseline model
• Allows us to decompose the variance

Mixed (composite) model

Level-1 model

Level-2 model

Variance decomposition
• Var(Yij) = Var(uj) + Var(rij)

= 00 + 2

• Intraclass Correlation (): the proportion of the total variance in Yij that is between level-2 units

 = 00 /(00 + 2)

Multilevel Analyses
• Analytic Problems:
• Explain variation in means across units
• Estimate within- and between-unit relationships
• Distinguish contextual from compositional variation in means across units
• Explain how and why within-unit relationships differ across units
Explaining variation in means across units
• Why do some schools have higher mean achievement levels?
• Why do some hospitals have lower mortality rates?
• Why do some countries have higher infant mortality rates?
Explaining variation in means across units
• Means-as-outcomes regression (MLM)

Yij = 0j + rij

0j = 00 + 01Wj + uj

where Wj is a variable indicating some characteristic of unit j (no i subscript)

• Wj may be inherent to level-2
• School curriculum, doctor/patient ratio, regime type
• Wj may be a compositional property of unit j
• School racial composition, patient diagnosis composition, average maternal education level

Note: why don’t we just compute the means of Y in each unit and use OLS at level 2?

Explaining variation in means across units
• Called “means-as-outcomes” because the Wj’s can only explain mean differences in Yij across units (Wj only predicts the intercept, not the slope)
• uj is now a level-2 residual
• We can compute an R2 at both levels of the model:
• 00 from the null model is the total level-2 variance that can be explained
• R2between = [00 (null) - 00 (model)]/[00 (null)]
• R2within = [2 (null) - 2(model)]/[2 (null)]
Example (null model)

The outcome variable is MATH1

Final estimation of fixed effects:

----------------------------------------------------------------------------

Standard Approx.

Fixed Effect Coefficient Error T-ratio d.f. P-value

----------------------------------------------------------------------------

For INTRCPT1, B0

INTRCPT2, G00 20.369682 0.129474 157.327 867 0.000

----------------------------------------------------------------------------

Final estimation of variance components:

-----------------------------------------------------------------------------

Random Effect Standard Variance df Chi-square P-value

Deviation Component

-----------------------------------------------------------------------------

INTRCPT1, U0 3.25337 10.58439 867 3733.93573 0.000

level-1, R 6.46230 41.76128

-----------------------------------------------------------------------------

Intraclass correlation:  = 10.58439 / (10.58439 + 41.76128) = 0.202

Example: Means-as-outcomes model

The outcome variable is MATH1

Final estimation of fixed effects:

----------------------------------------------------------------------------

Standard Approx.

Fixed Effect Coefficient Error T-ratio d.f. P-value

----------------------------------------------------------------------------

For INTRCPT1, B0

INTRCPT2, G00 19.585257 0.133633 146.560 866 0.000

S_PRIVAT, G01 3.691165 0.286361 12.890 866 0.000

----------------------------------------------------------------------------

Final estimation of variance components:

-----------------------------------------------------------------------------

Random Effect Standard Variance df Chi-square P-value

Deviation Component

-----------------------------------------------------------------------------

INTRCPT1, U0 2.87151 8.24560 866 3070.97445 0.000

level-1, R 6.46291 41.76917

-----------------------------------------------------------------------------

Level-2 variance explained: R2 = (10.58439 - 8.24560) / (10.58439) = 0.221

Part II.D.

Variable Centering

Variable Centering
• An important topic with major implications for fitting and interpreting multilevel models…
Variable Centering
• An important topic with major implications for fitting and interpreting multilevel models…
• …which we will not have time to cover today.
Part II.E.

Random Coefficients and

the Full 2-Level Model

Individual-Level Model

Yij = 0j + 1jX1ij + 2jX2ij + … + KjXKij + rij

Slope on X1 for unit j

Slope on XK for unit j

Slope on X2 for unit j

Intercept for unit j

Outcome for person i in unit j

Contextual Model

Yij = 0j + 1jX1ij + 2jX2ij + … + KjXKij + rij

0j = 00

1j = 10

2j = 20

Kj = K0

In OLS, the intercept and slopes are fixed – they are the same in all units

Contextual Questions
• Does the intercept vary across units?
• Can we predict the intercepts using level-2 covariates (Z’s)?
• Do the slopes vary across units?
• Can we predict the slopes using level-2 covariates (Z’s)?
Does the intercept vary across units?

Yij = 0j + 1jX1ij + 2jX2ij + … + KjXKij + rij

0j = 00+ u0j

1j = 10

2j = 20

Kj = K0

In the random effects model, the intercept varies around some grand mean intercept (00), and the slopes are fixed – they are the same in all units

Test H0: Var(u0j) = 0

Can we predict the intercepts?

Yij = 0j + 1jX1ij + 2jX2ij + … + KjXKij + rij

0j = 00+ 01Z1 + 02Z2 + … + 0MZM + u0j

1j = 10

2j = 20

Kj = K0

Here, the Zm’s predict the intercept.

Test H0:0m = 0

Do the slopes vary across units?

Yij = 0j + 1jX1ij + 2jX2ij + … + KjXKij + rij

0j = 00 + u0j

1j = 10+ u1j

2j = 20+ u2j

Kj = K0 + uKj

The intercept and each of the slopes varies around thei grand means (the k0’s)

Test H0: Var(ukj) = 0

Can we predict the slopes?

Yij = 0j + 1jX1ij + 2jX2ij + … + KjXKij + rij

0j = 00 + 01Z1 + 02Z2 + … + 0MZM + u0j

1j = 10+ 11Z1 + 12Z2 + … + 1MZM + u1j

2j = 20+ 21Z1 + 22Z2 + … + 2MZM + u2j

Kj = K0+ K1Z1 + K2Z2 + … + KMZM + uKj

Here, the Zm’s predict the slopes.

Test H0:km = 0

Example
• ECLS-K Fall Kindergarten data
• 8,799 white and black students in 807 schools (618 public, 189 private schools)
• SES measured by standarized SES variable
• Outcome is Fall K math score
Research Questions
• What is the within-school relationship between race and SES and math scores?
• Do average math scores vary across schools?
• Are math scores higher in private schools?
• Does the relationship between SES and math scores vary across schools?
• Is the relationship between SES and math scores lower in private schools?
Example (cont.)
• See HLM command files lecture8[a-g].hlm and corresponding HLM output files lecture8[a-g].txt
• We will meet in the lab 2/11/04.

### Intermission II

Part III.A.

Growth Modeling

Growth Models
• Allow us to model development over time and to investigate correlates of between-person variation in growth trajectories over time.
• The fitted model describes an expected growth trajectory for each person, rather than a single expected value on an outcome measure
Examples
• Modeling inter-individual changes in some outcome
• School achievement, income, attitudes
• Modeling growth in national characteristics
• GNP, population, etc
• Modeling change in organizational outcomes
• Business profits, hospital mortality rates
The Growth Model
• Made up of a within-unit model of change and a between-unit model of inter-individual variation in change
• Requires repeated measures of outcome within each unit
• Requires multilevel error structure since errors are likely not independent
Within-unit model of change
• Outcome varies as a function of time

Yit = fi(time) + rit

• Simple case: f is linear:

Yit = 0i + 1i(timeit) + rit

Within-unit model of change
• Outcome varies as a function of time

Yit = fi(time) + rit

• Simple case: f is linear:

Yit = 0i + 1i(timeit) + rit

Intercept for unit i

Within-unit model of change
• Outcome varies as a function of time

Yit = fi(time) + rit

• Simple case: f is linear:

Yit = 0i + 1i(timeit) + rit

Growth slope for unit i

Intercept for unit i

Between-unit model
• Model between-unit differences in growth trajectories

Yit = 0i + 1i(timeit) + eit

0i = 00 + 01(Xi) + r0i

1i = 10 + 11(Xi) + r1i

Between-unit model of intercept

Between-unit model of slope

Within-unit model for change
• Simple case: f is linear:

Yit = 0i + 1i(timeit) + rit

• Need to specify zero-point for time
• Pick a point that is interpretable and substantively meaningful for your study
• e.g., age; time in school, time since institution opened, calendar time, etc…
• Affects estimation and interpretation of the intercept
Defining time

TOLERANCEit = 0i + 1i(AGEit) + eit

0i = 00 + r0i

1i = 10 + r1i

Final estimation of fixed effects:

----------------------------------------------------------------------------

Standard Approx.

Fixed Effect Coefficient Error T-ratio d.f. P-value

----------------------------------------------------------------------------

For INTRCPT1, B0

INTRCPT2, G00 -0.081187 0.511521 -0.159 15 0.876

For TIME slope, B1

INTRCPT2, G10 0.130812 0.043074 3.037 15 0.009

----------------------------------------------------------------------------

Defining time

TOLERANCEit = 0i + 1i(AGEit-11) + eit

0i = 00 + r0i

1i = 10 + r1i

Final estimation of fixed effects:

----------------------------------------------------------------------------

Standard Approx.

Fixed Effect Coefficient Error T-ratio d.f. P-value

----------------------------------------------------------------------------

For INTRCPT1, B0

INTRCPT2, G00 1.357750 0.074445 18.238 15 0.000

For TIME slope, B1

INTRCPT2, G10 0.130812 0.043074 3.037 15 0.009

----------------------------------------------------------------------------

Modeling Inter-personal variation in growth trajectories

TOLERANCEit = 0i + 1i(AGEit-11) + eit

0i = 00 + 01(MALEi) + r0i

1i = 10 + 11(MALEi) + r1i

Final estimation of fixed effects:

----------------------------------------------------------------------------

Standard Approx.

Fixed Effect Coefficient Error T-ratio d.f. P-value

----------------------------------------------------------------------------

For INTRCPT1, B0

INTRCPT2, G00 1.355556 0.102740 13.194 14 0.000

MALE, G01 0.005016 0.155328 0.032 14 0.975

For TIME slope, B1

INTRCPT2, G10 0.102333 0.058323 1.755 14 0.101

MALE, G11 0.065095 0.088177 0.738 14 0.473

----------------------------------------------------------------------------

Parameters of the growth model

Yit = 0i + 1i(timeit) + eit

0i = 00 + 01(Xi) + r0i

1i = 10 + 11(Xi) + r1i

Structural parameters of the growth model

0i: true intercept for individual i

1i: true slope for individual i

00: population average intercept (for individuals with X=0)

01: population average difference in level-one intercept for individuals with one unit difference in X

10: population average slope (for individuals with X=0)

01: population average difference in level-one slope for individuals with one unit difference in X

Stochastic (random) parameters of the growth model

Var(eit) = e2: level–1 residual variance

Var(r0i) = 02: level–2 residual variance in true intercept (0i)

Var(r1i) = 12: level–2 residual variance in true slope (1i)

Cov(r0i, r1i) = 01: level–2 residual covariance in true intercept (0i) and true slope (1i)

Part III.B.

Growth Modeling

Growth Modeling Issues
• Timing of observations
• Centering the time variable
• Variable numbers of observations
• Missing observations
• Time-varying covariates
• Slope-intercept covariances
• Non-linear growth curves
Timing of observations
• If each person (unit) has the same number of observations, and if the timing of observations is the same for all units,the data (and design) are said to be balanced.
• In this case, the growth model is equivalent to a repeated measures ANOVA
• But if not, the growth model is mode flexible than the repeated measures ANOVA model
Centering time in growth models
• Group-mean centering time results in unbiased estimate of average within-person growth rate
• Any other centering of time results in biased estimate of average within-person growth rate if individuals’ mean times are correlated with their mean outcomes
• In balanced design, mean time is the same for all persons, so centering does not affect slope estimate
• In unbalanced design, it may be necessary to center time
• Centering time affects interpretation of the intercept
Variable numbers of observations
• Number and timing of observations may differ by design or because of missingness
• Types of missingness (see S&W p. 157-159):
• MCAR: missing completely at random
• CDD: covariate dependent dropout
• MAR: missing at random
• The growth model estimates are unbiased under any of these types of missingness
Time-varying covariates
• So far, we have considered only the relationship between stable person-level covariates and the growth trajectory
• A time-varying covariate is a covariate whose value changes over time
• Examples from S&W chapter 5
Examining the covariance matrix in growth models

Yit = 0i + 1i(timeit) + eit

0i = 00 + 01(Xi) + r0i

1i = 10 + 11(Xi) + r1i

Var(r0i) = 02: level–2 residual variance in true intercept (0i)

Var(r1i) = 12: level–2 residual variance in true slope (1i)

Cov(r0i, r1i) = 01: level–2 residual covariance in true intercept (0i) and true slope (1i)

The slope-intercept covariance
• Do individuals with high initial values of Y have faster growth rates of Y?
• Is r0i correlated with r1i?
• It depends on how we center the time variable
• It is possible to observe any correlation (-1 to +1) between r0i and r1i depending where we define the intercept.
Non-linear growth curves
• So far we have assumed that each individual’s growth trajectory is linear(constant growth rate over time)
• Now we consider cases where growth may be non-linear
• Polynomial curve
• Piecewise linear
• Discontinuous
Polynomial growth curves

Yit = 0i + 1i(timeit) + 2i(time2it) + eit

0i = 00 + r0i

1i = 10 + r1i

2i = 20 + r2i

Piecewise linear growth curves
• Piecewisegrowth trajectory:

Yit = 0i + 1i(time1it) + 2i(time2it) + eit

0i = 00 + r0i

1i = 10 + r1i

2i = 20 + r2i

Discontinuous growth curves
• Discontinuousgrowth trajectory:

Yit = 0i + 1i(timeit) + 2i(eventit) + eit

0i = 00 + r0i

1i = 10 + r1i

2i = 20 + r2i

Discontinuous growth curves
• Discontinuousgrowth trajectory (with time-varying growth rate):

Yit = 0i + 1i(timeit) + 2i(eventit) + 3i(timeit*eventit) + eit

0i = 00 + r0i

1i = 10 + r1i

2i = 20 + r2i

3i = 30 + r3i

### Intermission III

Part IV.A.

3+ Level Models

Examples of 3-level data
• Students > classrooms > schools
• Students > schools > districts
• Patients > Doctors > Hospitals
• Children > Families > Neighborhoods
• Repeated observations > individuals > contexts
• For example:

Repeated obs > students > classrooms > schools > districts > countries > planets …

Data may have more than 3 levels—but the more levels, the more data needed to model relationships.

Level-1 error/residual: deviation of observation i from cluster j mean

True grand mean intercept

Level-3 error/residual term: deviation of cluster k mean from grand mean

Level-2 error/residual term: deviation of cluster j mean from cluster k mean

The 3-level null model

Level 1 model

Level 2 model

Level 3 model

3-level variance decomposition
• 02 = Var(eijk): true within level-1 variance (variance within j, between i)
• 002 = Var(r0jk): true level-2 variance (variance within k, between j)
• 0002 = Var(u00k): true between k level-3 (variance within k, between j)
• Var(Yijk)= 02 + 002 + 0002
3-level variance decomposition
• Remember the ICC from the 2-level model: Proportion of true variance in Y that lies between clusters
• ICC = 002/(02 + 002)
• We apply the same logic to the 3-level model:
• Proportion of total variance that lies between level-3 units:

0002 /(02 + 002 + 0002)

• Proportion of level-1 + level –2 variance that lies between level-2 units:

002 /(02 + 002)

Part IV.B.

Meta-Analysis

Meta-Analysis
• The problem: we often have a lot of studies, each trying to estimate the same parameter
• effect of small classes on learning rates
• effect of welfare receipt on income, maternal depression, child welfare, etc.
Multiple studies
• Suppose we conducted a number of similar studies to estimate the effect of treatment T on outcome Y.
• Each study gives us an estimate of d, the standardized effect of T on Y.
• The estimates of d may vary across studies – Why?
• We would like to estimate the true average effect of T in the population
Possible reasons for varying estimates across studies
• The d’s may vary because of sampling variance (each study is conducted with a different sample)
• The d’s may vary because of differences in the populations of each study sample
• The d’s may vary because of differences in the study design (e.g., different instruments)
• The d’s may vary because of differences in the treatment studied (differences in implementation, duration, etc.)
Approach 1
• In each study, we fit a regression model to estimate the treatment effect 1:
• But the treatment effect may vary across studies:
• Here 10 is the true mean effect of T across studies, and u1j is the deviation of the effect in study j from this mean.
Approach 1
• If we had access to all the data from each study, we could fit a multilevel model:
• But what if we don’t have access to all the original data?
Approach 2
• Recall, that in each study, we have a regression model like this:
• Typically, each study will report the estimate of 1 and some measure of its sampling variance (its standard error)
• Remember (lecture 4) that we estimate the grand mean by weighting individual estimates by their precision (the inverse of the variance of the estimate)
• We use the standard errors of the study-specific effect estimates to construct these weights, so we don’t need the original data.
Key Points
• We need from each study an estimate of the treatment effect and its sampling variance (standard error)
• The treatment effects must be measured in the same metric across all studies
Part IV.C.

Cross-Classified Data and Models

Cross-Classified Data
• Observations nested in multiple, non-hierarchical units
• e.g. persons nested in schools and neighborhoods
• patients nested in multiple doctors/clinics
• students nested in multiple classrooms
• repeated observations on relationship dynamics nested in partners (where partner changes are common)
Cross-Classified Data
• Yijk is outcome Y for person i in neighborhood j and school k
Cross-Classified Data
• In 3-level hierarchical data, each observation can be decomposed into:
• a grand mean (common to all observations)
• a row-specific (e.g. school) deviation from the grand mean
• a column-specific (e.g. nbhd) deviation from the grand mean
• a row*column-specific deviation
• and individual deviation from the row*column mean
Resources:
• Textbooks
• Raudenbush & Bryk (2002) Hierarchical Linear Models. Sage.
• Singer & Willett (2003) Applied Longitudinal Data Analysis
• Multilevel Listserv
• http://www.nursing.teaching.man.ac.uk/staff/mcampbell/multilevel.html
• http://www.jiscmail.ac.uk/lists/multilevel.html
Resources:
• Software
• HLM
• MLWin
• SAS (PROC MIXED)
• SPSS
• Stata (-gllamm-)