Longitudinal Data Analysis: Why and How to Do it With Multi-Level Modeling (MLM)?

Oi-man Kwok, Texas A&M University

Road Map
• Why do we want to analyze longitudinal data under a multilevel modeling (MLM) framework?
• Dependency issue
• Advantages of using MLM over traditional methods (e.g., univariate ANOVA, multivariate ANOVA)
• Review of important parameters in MLM
• How can we do it in SPSS?

Regression Model
e.g., DV: Test scores in 1st-year grad-level statistics
IV: GRE_M (GRE Math test score)
150 students (i = 1, …, 150)
One of the important assumptions of OLS regression? Observations are independent of each other.
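The regression equation on this slide did not survive extraction. A standard way to write the model it describes is sketched below; the symbols $Y_i$, $\beta_0$, $\beta_1$, and $e_i$ are assumed notation, not taken from the slide.

```latex
% Simple OLS regression of statistics test score on GRE Math score (assumed notation)
Y_i = \beta_0 + \beta_1\,\mathrm{GRE\_M}_i + e_i, \qquad i = 1,\dots,150
% Independence assumption: errors of different students are uncorrelated
\operatorname{Cov}(e_i, e_{i'}) = 0 \ \text{for } i \neq i', \qquad V(e_i) = \sigma^2
```

With repeated measures on the same student, the zero-covariance condition is the part that typically fails.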

Ignoring the clustered structure (or dependency between observations) in the analyses can result in:
• Bias in the standard errors
• Bias in the tests of significance and the confidence intervals (Type I error: inflated alpha level; e.g., nominal α = .05 but actual α = .10)
• As a consequence, non-replicable results

Advantages of MLM over the traditional methods for analyzing longitudinal data
• Univariate ANOVA: restricts the error structure to compound symmetry (CS), which gives higher statistical power but is unlikely to hold in longitudinal data
• Multivariate ANOVA: places no restriction on the error structure (unstructured, UN), which is often too conservative (lower statistical power); can only handle completely balanced data (listwise deletion)
• More…
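To make the contrast between the two error structures concrete, here is a sketch of both covariance matrices for the four measurement occasions used later in the example. The symbols $a$, $b$, and $\sigma_{jk}$ are assumed notation for illustration.

```latex
% Compound symmetry (CS): one common variance a and one common covariance b (2 parameters)
\Sigma_{\mathrm{CS}} =
\begin{pmatrix}
a & b & b & b \\
b & a & b & b \\
b & b & a & b \\
b & b & b & a
\end{pmatrix}
\qquad
% Unstructured (UN): every variance and covariance is free (4 \cdot 5 / 2 = 10 parameters)
\Sigma_{\mathrm{UN}} =
\begin{pmatrix}
\sigma_{11} & \sigma_{12} & \sigma_{13} & \sigma_{14} \\
\sigma_{12} & \sigma_{22} & \sigma_{23} & \sigma_{24} \\
\sigma_{13} & \sigma_{23} & \sigma_{33} & \sigma_{34} \\
\sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma_{44}
\end{pmatrix}
```

CS forces scores one year apart to correlate exactly as much as scores three years apart, which is why it rarely holds for longitudinal data; UN avoids that restriction at the cost of many more parameters.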

Analyzing Longitudinal Data: Example
(Based on actual data; variable names changed for ease of presentation)
• Compare two different teaching methods on Achievement over time
• Teaching methods: 78 students are randomly assigned to either A. Lecture (control group; 39 students) or B. Computer (treatment group; 39 students)
• 4 Achievement (Ach) scores were collected from each student after the treatment (i.e., the statistics course): right after the course, and 1 year, 2 years, and 3 years after

[Figure: Achievement (y-axis) against Time in years (x-axis: 0, 1, 2, 3) for the Computer and Lecture groups. Time = 0 is the immediate posttest measure.]

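Multi-Level Model (MLM)
A simple regression model for ONE student (Student 36):
$\mathrm{Ach}_t = \beta_0 + \beta_1\,\mathrm{Time}_t + e_t$ (t = 0, 1, 2, 3), with $V(e_t) = \sigma^2$
• $e_t$: captures the variation of the individual achievement scores around the fitted regression line WITHIN Student 36
• Note: start with a simple growth model; the treatment is introduced in the example at the end
[Figure: Ach against Time (0, 1, 2, 3) for Student 36, showing the fitted line with intercept $\beta_0$ and slope $\beta_1$ and the residuals $e_0$ to $e_3$.]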

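Micro-level model (i = 1, 2, 3, …, 78):
$\mathrm{Ach}_{ti} = \beta_{0i} + \beta_{1i}\,\mathrm{Time}_{ti} + e_{ti}$
[Figure: Ach against Time (0, 1, 2, 3) with separate regression lines for Students 27, 36, and 52, each with its own intercept $\beta_0$ and slope $\beta_1$.]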

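Macro-level model:
$\beta_{0i} = \gamma_{00} + U_{0i}$, where $\gamma_{00}$ is the grand intercept and $V(U_{0i}) = \tau_{00}$ is the variance of the intercepts
$\beta_{1i} = \gamma_{10} + U_{1i}$, where $\gamma_{10}$ is the grand slope and $V(U_{1i}) = \tau_{11}$ is the variance of the slopes
[Figure: Ach against Time (0, 1, 2, 3) with separate regression lines for Students 27, 36, and 52.]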

• $U_{0i}$: captures the deviations of the 78 intercepts from the grand intercept $\gamma_{00}$
• $U_{1i}$: captures the deviations of the 78 slopes from the grand slope $\gamma_{10}$
[Figure: Ach against Time. The lines for Students 27, 36, and 52 and the overall model all start at $\gamma_{00}$ at Time = 0: no variation among the 78 intercepts.]

[Figure: Ach against Time. The lines for Students 27, 36, and 52 and the overall model all have the same slope $\gamma_{10}$: no variation among the 78 slopes.]

[Figure: Ach against Time, showing the overall model line.]

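Summary
• Fixed effects: $\gamma_{00}$ (grand intercept) and $\gamma_{10}$ (grand slope)
• G: captures between-student differences
  $G = \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{10} & \tau_{11} \end{pmatrix}$, where $\tau_{00}$ is the variance of the intercepts, $\tau_{11}$ is the variance of the slopes, and $\tau_{01} = \tau_{10}$ is the covariance between intercepts and slopes
• R: captures within-student random errors, $V(e_{ti}) = \sigma^2$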

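MACRO vs. MICRO
• UNITS: in the example here, the MICRO units are the repeated measures and the MACRO units are the students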

MACRO vs. MICRO (Cont.)
• MODELS:
  MICRO-level model: the regression model fitted to the observations within each MACRO unit
  MACRO-level model: the model that captures the differences between the overall model and the individual regression models of the different MACRO units

Dependent variable: Math Achievement (Achieve; repeated measures, MICRO level)
Predictors:
• Repeated-measure (MICRO) level predictor: Time (and any time-varying covariates)
• Student (MACRO) level predictor: Computer (teaching method) (and any time-invariant variables, such as gender)

Data format under MANOVA approaches (SPSS data format):

Student  Treat  T0  T1  T2  T3
S1       0      5   3   2   3
S2       1      5   25  --  33
S3       1      --  19  17  26

• S1 has responses at all time points
• S2 has a missing response at time 2 (indicated by "--")
• S3 has a missing response at time 0
• MANOVA retains only S1 in the analysis

Data format for MANOVA:

Student  Treat  T0  T1  T2  T3
S1       0      5   3   2   3
S2       1      5   25  --  33
S3       1      --  19  17  26

Data format for Multilevel Model (all 3 students are included in the analyses):

Student  Treat  Time  DV
S1       0      0     5
S1       0      1     3
S1       0      2     2
S1       0      3     3
S2       1      0     5
S2       1      1     25
S2       1      3     33
S3       1      1     19
S3       1      2     17
S3       1      3     26
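The restructuring from the MANOVA layout to the multilevel layout can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper name wide_to_long is hypothetical, and None stands in for the "--" missing marker on the slide.

```python
# Wide ("multivariate") records: one row per student, one column per time point.
# None marks a missing response ("--" on the slide).
wide = [
    {"Student": "S1", "Treat": 0, "T0": 5, "T1": 3, "T2": 2, "T3": 3},
    {"Student": "S2", "Treat": 1, "T0": 5, "T1": 25, "T2": None, "T3": 33},
    {"Student": "S3", "Treat": 1, "T0": None, "T1": 19, "T2": 17, "T3": 26},
]


def wide_to_long(rows, n_times=4):
    """Stack T0..T3 into one row per observed (student, time) pair."""
    long_rows = []
    for row in rows:
        for t in range(n_times):
            score = row[f"T{t}"]
            if score is not None:  # drop the missing occasion, keep the student
                long_rows.append((row["Student"], row["Treat"], t, score))
    return long_rows


long_data = wide_to_long(wide)

# Students that survive listwise deletion (complete on all four occasions).
complete_cases = [
    r["Student"] for r in wide
    if all(r[f"T{t}"] is not None for t in range(4))
]

for record in long_data:
    print(*record)
print("rows in long format:", len(long_data))
print("students kept under listwise deletion:", complete_cases)
```

The long layout keeps every observed score from all three students, while listwise deletion on the wide layout keeps S1 only.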

Student  Treat  Time  DV
S1       0      0     5
S1       0      7     3
S1       0      12    2
S1       0      13    3
S2       1      1     5
S2       1      3     9
S2       1      4     5
S2       1      6     25
S3       1      3     18
S3       1      15    19
S3       1      28    17
S3       1      31    26

Can you transform this dataset back into the multivariate format?
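One way to see why the question is hard: count how many columns a multivariate layout would need for this dataset and how many of its cells would be empty. The sketch below uses plain Python; the variable names are illustrative.

```python
# Long-format records from the table above: (student, treat, time, dv).
long_data = [
    ("S1", 0, 0, 5), ("S1", 0, 7, 3), ("S1", 0, 12, 2), ("S1", 0, 13, 3),
    ("S2", 1, 1, 5), ("S2", 1, 3, 9), ("S2", 1, 4, 5), ("S2", 1, 6, 25),
    ("S3", 1, 3, 18), ("S3", 1, 15, 19), ("S3", 1, 28, 17), ("S3", 1, 31, 26),
]

students = sorted({s for s, _, _, _ in long_data})
times = sorted({t for _, _, t, _ in long_data})
observed = {(s, t): dv for s, _, t, dv in long_data}

# A wide layout needs one column per distinct measurement time.
n_cells = len(students) * len(times)
n_missing = n_cells - len(observed)

# Time points at which every student was measured.
shared_times = [t for t in times if all((s, t) in observed for s in students)]

print("distinct time points:", len(times))
print("cells in wide layout:", n_cells, "of which missing:", n_missing)
print("time points shared by all students:", shared_times)
```

No time point is shared by all three students, so every student would be dropped under listwise deletion, whereas the long format handles the unequal spacing directly through the Time variable.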

Questions
• 1. On average, is there any trend in math achievement over time?
• 2. Are there any differences between students in the trend of math achievement over time? (Do all students have the same trend of math achievement over time?)

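Micro Level (Level 1): $\mathrm{MA}_{ti} = \beta_{0i} + \beta_{1i}\,\mathrm{Time}_{ti} + e_{ti}$
Macro Level (Level 2):
$\beta_{0i} = \gamma_{00} + U_{0i}$ ($\gamma_{00}$: grand intercept)
$\beta_{1i} = \gamma_{10} + U_{1i}$ ($\gamma_{10}$: grand slope)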

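Combined Model (micro level and macro level together):
$\mathrm{MA}_{ti} = \gamma_{00} + \gamma_{10}\,\mathrm{Time}_{ti} + U_{0i} + U_{1i}\,\mathrm{Time}_{ti} + e_{ti}$
• $\gamma_{00}$: grand intercept; $\gamma_{10}$: grand slope
• $U_{0i} + U_{1i}\,\mathrm{Time}_{ti}$: between-student differences, with $V(U_{0i}) = \tau_{00}$ and $V(U_{1i}) = \tau_{11}$
• $e_{ti}$: within-student errors, with $V(e_{ti}) = \sigma^2$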
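The combined model implies a specific variance function for the repeated measures. The derivation below is a sketch under the usual assumptions, which the slides do not state explicitly: the random effects are independent of the errors, and the errors are uncorrelated across occasions. $\tau_{01}$ denotes the covariance between $U_{0i}$ and $U_{1i}$.

```latex
% Variance of a single score at occasion t
V(\mathrm{MA}_{ti}) = \tau_{00} + 2\,\mathrm{Time}_{ti}\,\tau_{01} + \mathrm{Time}_{ti}^{2}\,\tau_{11} + \sigma^{2}
% Covariance between two scores of the same student at occasions t \neq t'
\operatorname{Cov}(\mathrm{MA}_{ti}, \mathrm{MA}_{t'i}) = \tau_{00} + (\mathrm{Time}_{ti} + \mathrm{Time}_{t'i})\,\tau_{01} + \mathrm{Time}_{ti}\,\mathrm{Time}_{t'i}\,\tau_{11}
```

Both expressions change with Time, so the model is neither compound symmetric nor fully unstructured: four parameters describe the whole covariance pattern, whatever the number or spacing of occasions.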

[Figure legend: Red = Computer; Blue = Lecture]

$\mathrm{MA}_{ti} = \gamma_{00} + \gamma_{10}\,\mathrm{Time}_{ti} + U_{0i} + U_{1i}\,\mathrm{Time}_{ti} + e_{ti}$

SPSS MIXED Syntax:
MIXED mathach WITH Time
  /METHOD = REML
  /FIXED = intercept Time
  /RANDOM = intercept Time | SUBJECT(Subid) COVTYPE(UN)
  /PRINT = G SOLUTION TESTCOV.
EXECUTE.

Notes on the syntax:
• Line 1 (MIXED): the DV, followed by the continuous IVs after WITH and the categorical IVs after BY
• Line 2 (METHOD): default is REML (Restricted Maximum Likelihood); the other option is ML (Maximum Likelihood)
• Line 3 (FIXED): captures the overall model
• Line 4 (RANDOM): specifies the random effects, which capture the between-student differences; SUBJECT names the identification variable for the macro-level units (e.g., Subid); COVTYPE(UN) sets the structure of the G matrix (unstructured)
• Line 5 (PRINT): G prints the G matrix; SOLUTION requests the regression coefficients; TESTCOV produces asymptotic standard errors and Wald Z-tests for the covariance parameter estimates

SPSS Output: Basic Information
[SPSS output: basic model information]

[SPSS output: estimates of fixed effects, requested by the SOLUTION keyword in the PRINT subcommand (Line 5)]
• $\gamma_{00}$: average MA score at Time = 0
• $\gamma_{10}$: average trend of the MA score

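[SPSS output: estimates of covariance parameters ($\sigma^2$, $\tau_{00}$, $\tau_{01} = \tau_{10}$, $\tau_{11}$) with asymptotic standard errors and Wald Z-tests, requested by the TESTCOV keyword in the PRINT subcommand (Line 5)]
[SPSS output: G matrix ($\tau_{00}$, $\tau_{01}$, $\tau_{10}$, $\tau_{11}$), requested by the G keyword in the PRINT subcommand (Line 5)]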

Can I have a simpler G matrix (i.e., $\tau_{01} = \tau_{10} = 0$)?
• Compare the two models with a likelihood ratio test
• -2LL of the current model (unstructured G): 2509.873
• -2LL of the simpler model: ?

Syntax for fitting the simpler G
SPSS syntax:
  /RANDOM = intercept Time | SUBJECT(Subid) COVTYPE(DIAG)

Model                                  -2 Res Log Likelihood (or Deviance)
$\tau_{01} = \tau_{10} = 0$            2509.873   (choose this model)
$\tau_{01} = \tau_{10} \neq 0$         2509.873

$\chi^2(1) = .000$, $p = 1.00$

Compare to the model with $\tau_{11} = 0$
SPSS syntax:
  /RANDOM = intercept | SUBJECT(Subid) COVTYPE(DIAG)

Model                                            -2 Res Log Likelihood
$\tau_{01} = \tau_{10} = 0$, $\tau_{11} \neq 0$  2509.873   (choose this model)
$\tau_{11} = \tau_{01} = \tau_{10} = 0$          2524.387

$\chi^2(1) = 14.51$, $p < .001$ (halved p-value)
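Both deviance comparisons can be reproduced with the standard library alone. The sketch below relies on the identity that the upper tail of a chi-square with 1 df equals erfc(sqrt(x/2)); the function names are illustrative, and the boundary flag is an assumption about how the "halved p-value" was obtained (a variance cannot be negative, so its null value lies on the boundary of the parameter space). The REML deviances are comparable here because the models share the same fixed effects.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variate with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def lrt_df1(deviance_simple, deviance_complex, boundary=False):
    """Likelihood ratio test for one extra covariance parameter.

    boundary=True halves the p-value, as done when the tested
    parameter is a variance (null value on the boundary).
    """
    chi2 = deviance_simple - deviance_complex
    p = chi2_sf_df1(chi2)
    if boundary:
        p /= 2.0
    return chi2, p


# Test 1: drop the intercept-slope covariance (UN vs. DIAG).
chi2_cov, p_cov = lrt_df1(2509.873, 2509.873)

# Test 2: drop the slope variance tau11 (DIAG with vs. without random slope).
chi2_slope, p_slope = lrt_df1(2524.387, 2509.873, boundary=True)

print(f"covariance: chi2(1) = {chi2_cov:.3f}, p = {p_cov:.2f}")
print(f"slope variance: chi2(1) = {chi2_slope:.2f}, halved p < .001: {p_slope < 0.001}")
```

The first test gives no reason to keep the covariance, and the second gives strong evidence that the slopes vary between students, which matches the model choices on the slides.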

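Result of the final model
[SPSS output: fixed-effect estimates ($\gamma_{00}$, $\gamma_{10}$) and covariance parameter estimates ($\sigma^2$, $\tau_{00}$, $\tau_{11}$)]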

• Q1. On average, is there any trend in math achievement over time?
• Q2. Are there any differences between students in the trend of math achievement over time? (Or, do all students have the same trend of math achievement over time?)
  $\tau_{00} = 201.71$, $\tau_{11} = 14.56$
• Q3. If yes to Q2, what causes the differences?

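Null hypothesis: the different teaching methods have the SAME effect on achievement over time ($H_0$: $\gamma_{11} = 0$)
• Micro Level (Level 1): $\mathrm{MA}_{ti} = \beta_{0i} + \beta_{1i}\,\mathrm{Time}_{ti} + e_{ti}$ (variance of $e_{ti}$ = $\sigma^2$)
• Macro Level (Level 2):
  $\beta_{0i} = \gamma_{00} + \gamma_{01}\,\mathrm{Comp}_i + U_{0i}$
  $\beta_{1i} = \gamma_{10} + \gamma_{11}\,\mathrm{Comp}_i + U_{1i}$
  (variance of $U_{0i}$ = $\tau_{00}$; variance of $U_{1i}$ = $\tau_{11}$)
• Combined Model: $\mathrm{MA}_{ti} = \gamma_{00} + \gamma_{01}\,\mathrm{Comp}_i + \gamma_{10}\,\mathrm{Time}_{ti} + \gamma_{11}\,\mathrm{Time}_{ti} \times \mathrm{Comp}_i + U_{0i} + U_{1i}\,\mathrm{Time}_{ti} + e_{ti}$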
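The combined model follows from substituting the two macro-level equations into the micro-level equation. The steps are written out below using the same symbols as the slide.

```latex
\begin{aligned}
\mathrm{MA}_{ti}
  &= \beta_{0i} + \beta_{1i}\,\mathrm{Time}_{ti} + e_{ti} \\
  &= (\gamma_{00} + \gamma_{01}\,\mathrm{Comp}_i + U_{0i})
     + (\gamma_{10} + \gamma_{11}\,\mathrm{Comp}_i + U_{1i})\,\mathrm{Time}_{ti} + e_{ti} \\
  &= \underbrace{\gamma_{00} + \gamma_{01}\,\mathrm{Comp}_i + \gamma_{10}\,\mathrm{Time}_{ti}
     + \gamma_{11}\,\mathrm{Time}_{ti}\times\mathrm{Comp}_i}_{\text{fixed part}}
     + \underbrace{U_{0i} + U_{1i}\,\mathrm{Time}_{ti} + e_{ti}}_{\text{random part}}
\end{aligned}
```

The cross-level interaction $\gamma_{11}$ appears because Comp predicts the slope $\beta_{1i}$, which itself multiplies Time; this is the term the null hypothesis tests.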

$\mathrm{MA}_{ti} = \gamma_{00} + \gamma_{01}\,\mathrm{Comp}_i + \gamma_{10}\,\mathrm{Time}_{ti} + \gamma_{11}\,\mathrm{Time}_{ti} \times \mathrm{Comp}_i + U_{0i} + U_{1i}\,\mathrm{Time}_{ti} + e_{ti}$

SPSS MIXED Syntax:
MIXED mathach WITH Time
  /METHOD = REML
  /FIXED = intercept Comp Time Time*Comp
  /RANDOM = intercept Time | SUBJECT(Subid) COVTYPE(DIAG)
  /PRINT = G SOLUTION TESTCOV.
EXECUTE.

[SPSS output: covariance parameter estimates WITH Comp in the macro models, shown next to the estimates WITHOUT Comp in the macro models]

Proportion of variance in the intercept ($\tau_{00}$) explained by "Comp" = (201.71 - 176.16)/201.71 = .13 (or 13%)
Proportion of variance in the slope ($\tau_{11}$) explained by "Comp" = (14.56 - 9.81)/14.56 = .33 (or 33%)
(201.71 and 14.56 are the estimates WITHOUT "Comp" in the model; 176.16 and 9.81 are the estimates WITH "Comp" in the model.)
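The two proportions are a proportional reduction in variance, which is easy to check numerically. In this sketch the function name proportion_explained is illustrative; the four variance estimates are the ones quoted above.

```python
def proportion_explained(var_without, var_with):
    """Proportional reduction in a variance component after adding a predictor."""
    return (var_without - var_with) / var_without


# Intercept variance tau00 and slope variance tau11, without and with Comp.
intercept_r2 = proportion_explained(201.71, 176.16)
slope_r2 = proportion_explained(14.56, 9.81)

print(f"intercept variance explained by Comp: {intercept_r2:.2f}")
print(f"slope variance explained by Comp: {slope_r2:.2f}")
```

Comp accounts for more of the between-student variation in slopes than in intercepts, in line with the teaching method mattering more for change over time than for the immediate posttest level.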

Solution for Fixed Effects

Effect      Estimate  Standard Error  DF   t Value  Pr > |t|
Intercept   50.3769   2.4764          76   20.34    <.0001
time        0.5756    0.8445          232  0.68     0.4962
computer    7.7583    3.5021          76   2.22     0.0297
time*comp   3.6009    1.1943          232  3.02     0.0029

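Overall model for students in the Lecture method group (Comp = 0): $\mathrm{MA}_{ti} = \gamma_{00} + \gamma_{10}\,\mathrm{Time}_{ti}$
Overall model for students in the Computer method group (Comp = 1): $\mathrm{MA}_{ti} = (\gamma_{00} + \gamma_{01}) + (\gamma_{10} + \gamma_{11})\,\mathrm{Time}_{ti}$
Random effect: $V(e_{ti}) = \sigma^2 = 90.00$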
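Plugging the fixed-effect estimates from the Solution for Fixed Effects table into the combined model gives the two group trajectories. This is a sketch that assumes Comp is coded 0 for Lecture and 1 for Computer, which the slides imply but do not state; the dictionary keys and the function name are illustrative.

```python
# Fixed-effect estimates from the "Solution for Fixed Effects" table.
fixed = {
    "intercept": 50.3769,    # gamma00
    "time": 0.5756,          # gamma10
    "comp": 7.7583,          # gamma01
    "time_x_comp": 3.6009,   # gamma11
}


def predicted_mean(time, comp):
    """Model-implied mean achievement; comp is assumed 0 = Lecture, 1 = Computer."""
    intercept = fixed["intercept"] + fixed["comp"] * comp
    slope = fixed["time"] + fixed["time_x_comp"] * comp
    return intercept + slope * time


for label, comp in (("Lecture", 0), ("Computer", 1)):
    means = [round(predicted_mean(t, comp), 2) for t in range(4)]
    print(label, means)
```

Under that coding, the Lecture line has intercept 50.38 and slope 0.58, and the Computer line has intercept 58.14 and slope 4.18, so the gap between the groups widens each year.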

[Figure: Achievement (y-axis) against Time in years (x-axis) for the Computer and Lecture groups. Time = 0 is the immediate posttest measure.]

Conclusion
Advantages of using MLM over traditional ANOVA approaches for analyzing longitudinal data:
• 1. Can flexibly model the variance function
• 2. Retains the meaning of the random effects
• 3. Can explore factors that predict individual differences in change over time (e.g., a treatment effect)
• 4. Takes both unequal spacing and missing data into account

Take Home Exercise
A clinical psychologist wants to examine the impact of the stress level of each family member (STRESS) on his/her level of symptomatology (SYMPTOM). There are 100 families, and families vary in size from three to eight members. The total number of participants is 400.
a) Can you write out the model? (Hint: What is in the micro model? What is in the macro model?)
b) Can you write out the syntax (SPSS) to analyze this model?

c) In designing the study, what possible macro predictors do you think the clinical psychologist should include in her study? (e.g., family size?)
d) In designing the study, what possible micro predictors do you think the clinical psychologist should include in her study? (e.g., participant's neuroticism?)
e) Can you write out the model? (Hint: What is in the micro model? What is in the macro model?)
f) Can you write out the syntax (SPSS) to analyze this model?

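a) Micro-level model: $\mathrm{SYMPTOM}_{ij} = \beta_{0j} + \beta_{1j}\,\mathrm{STRESS}_{ij} + e_{ij}$
Macro-level model:
$\beta_{0j} = \gamma_{00} + U_{0j}$
$\beta_{1j} = \gamma_{10} + U_{1j}$
Combined model: $\mathrm{SYMPTOM}_{ij} = \gamma_{00} + \gamma_{10}\,\mathrm{STRESS}_{ij} + U_{0j} + U_{1j}\,\mathrm{STRESS}_{ij} + e_{ij}$

b) $\mathrm{SYMPTOM}_{ij} = \gamma_{00} + \gamma_{10}\,\mathrm{STRESS}_{ij} + U_{0j} + U_{1j}\,\mathrm{STRESS}_{ij} + e_{ij}$
SPSS Syntax:
MIXED Symptom WITH Stress
  /FIXED = intercept Stress
  /RANDOM = intercept Stress | SUBJECT(Family) COVTYPE(UN)
  /PRINT = G SOLUTION TESTCOV.
EXECUTE.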

THE END! THANK YOU!
