Introduction to Structural Equation Modeling (SEM) Day 2: November 15, 2012

Rob Cribbie
Quantitative Methods Program, Department of Psychology
Coordinator, Statistical Consulting Service
Course materials available at: WWW.PSYCH.YORKU.CA/CRIBBIE

What are we going to do today?
• Confirmatory factor analysis
• Full structural equation models

Review from last week
• Definitions of SEM
• SEM lingo
• SEM assumptions
• Model identification
• Fit indices (RMSEA, CFI, TLI, IFI, SRMR)

Least Squares Regression Example
• This example helps to bridge the gap between regression and SEM
• Study description:
   • 100 psychology graduate students
   • Outcome: depression
   • Predictors:
      • Hours worked per week
      • Quality of relationship with supervisor
      • Research productivity

Multiple Regression Output
depression ~ hours + rel_superv + res_prod

Coefficients:
               Estimate  Std. Error       t  Pr(>|t|)
  (Intercept)   8.81743     2.14132   4.118   8.1e-05 ***
  hours         0.13045     0.05561   2.346   0.02106 *
  rel_superv   -0.05621     0.08027  -0.700   0.48544
  res_prod     -0.35047     0.10379  -3.377   0.00106 **

Residual standard error: 3.005 on 96 degrees of freedom
Multiple R-squared: 0.1716, Adjusted R-squared: 0.1457
F-statistic: 6.628 on 3 and 96 DF, p-value: 0.0004072
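The summary statistics at the bottom of the regression output follow directly from R², the sample size (n = 100) and the number of predictors (k = 3). As a minimal sketch, the snippet below recomputes the adjusted R² and the overall F statistic from the rounded R² = 0.1716; the function names are illustrative, not part of any statistics package.

```python
# Recompute adjusted R^2 and the overall F statistic from the values reported
# in the regression output (n = 100 students, k = 3 predictors).


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def overall_f(r2: float, n: int, k: int) -> float:
    """F = (R^2 / k) / ((1 - R^2) / (n - k - 1)) on k and n - k - 1 df."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


n, k, r2 = 100, 3, 0.1716
print(f"Adjusted R^2 = {adjusted_r2(r2, n, k):.4f}")
print(f"F({k}, {n - k - 1}) = {overall_f(r2, n, k):.3f}")
```

Both values agree with the printed output (0.1457 and 6.628) up to the rounding of R² to four decimals.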

SEM Model

SEM Results: Squared Multiple Correlation (R²) = .17

Regression/SEM Example Summary
• The only difference between the regression and SEM analyses is the estimation method
   • SEM: maximum likelihood (ML)
      • An iterative search for the parameter values that make the observed data most likely
   • Regression: least squares
      • The parameter values that minimize the sum of squared residuals (i.e., observed - predicted)
• Regression and SEM produced almost identical parameter estimates and R² values for depression
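To make the contrast concrete, here is a toy sketch on made-up data (not the depression study). It computes the least squares slope and intercept in closed form, then reaches the maximum likelihood estimates for a normal-error model by iterating. Plain gradient descent is used only for simplicity; SEM programs use more sophisticated optimizers and fit the covariance matrix, but for a model with only observed variables the coefficient estimates agree in the same way. The residual variance is where the two approaches differ: ML divides by n, while least squares divides by n - k - 1. This is one reason the two sets of results are "almost" rather than exactly identical.

```python
# Toy data, made up for illustration only.
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

# Least squares: closed-form solution minimizing the sum of squared residuals.
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b_ls = sxy / sxx
a_ls = y_bar - b_ls * x_bar

# ML: iterative search on the negative normal log-likelihood. For any fixed
# sigma, its gradient in (a, b) is proportional to the gradient of the mean
# squared residual, so plain gradient descent on that quantity is used here.
a_ml, b_ml, rate = 0.0, 0.0, 0.05
for _ in range(5000):
    resid = [yi - (a_ml + b_ml * xi) for xi, yi in zip(x, y)]
    grad_a = -sum(resid) / n
    grad_b = -sum(r * xi for r, xi in zip(resid, x)) / n
    a_ml -= rate * grad_a
    b_ml -= rate * grad_b

# Residual standard deviation: LS divides by n - k - 1 (here k = 1), ML by n.
ssr = sum((yi - (a_ls + b_ls * xi)) ** 2 for xi, yi in zip(x, y))
sigma_ls = math.sqrt(ssr / (n - 2))
sigma_ml = math.sqrt(ssr / n)

print(f"slope:     LS {b_ls:.4f}  ML {b_ml:.4f}")
print(f"intercept: LS {a_ls:.4f}  ML {a_ml:.4f}")
print(f"residual SD: LS {sigma_ls:.4f}  ML {sigma_ml:.4f}")
```

Applied to the depression example, the same divisor difference means an ML residual SD of roughly 3.005 × √(96/100) ≈ 2.94 instead of 3.005, while the coefficients themselves match.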

Confirmatory Factor Analysis (CFA)
• A structural equation model often consists of two components:
   • a measurement model linking a set of observed variables to a usually smaller set of latent variables
   • a structural model linking the latent variables through a series of specified relationships
• CFA corresponds to the measurement model of SEM

Exploratory Factor Analysis (EFA) versus CFA
• With EFA, investigators are interested in exploring patterns within the data, whereas with CFA, investigators are interested in explicitly testing specific hypotheses about how the observed variables are related
• Exploratory factor analysis (EFA):
   • imposes no substantive constraints on the data
   • places no restrictions on the pattern of relationships between observed and latent variables (e.g., cross-loadings are permitted and the number of factors is generally not fixed in advance)
   • is data driven

EFA
• In EFA, each common factor is assumed to affect every observed variable, with the common factors being either all correlated or all uncorrelated (i.e., oblique or orthogonal factors)
• Can be estimated with ordinary statistical software packages (e.g., R, SPSS)
• Once the model is estimated, factor scores (proxies for the latent variables) are calculated and used in follow-up analyses
   • e.g., use factor scores to predict a different outcome in a separate analysis

CFA
• Confirmatory factor analysis (CFA), on the other hand, is theory or hypothesis driven
• With CFA it is possible to place substantively meaningful constraints on the factor model
   • For example, researchers can specify the number of factors, which observed variables load on which latent variables, which factors are correlated, etc.
• Unlike EFA, CFA produces many goodness-of-fit measures to evaluate the model
   • Recall that it is the constraints on the model (e.g., a limited number of factors, observed variables that load on only one factor) that determine how well the model fits (i.e., whether those constraints are reasonable)
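One way to see how the constraints drive fit is to compute the covariance matrix a CFA implies, Σ = ΛΦΛᵀ + Θ. The sketch below uses a hypothetical two-factor model with made-up loadings, a factor correlation of .4, and residual variances chosen so each item has variance 1. Because cross-loadings are fixed to 0, every covariance between an item on factor 1 and an item on factor 2 is forced to equal λᵢ·φ₁₂·λⱼ. If the observed cross-factor covariances do not follow that pattern, the model fits poorly.

```python
# Hypothetical two-factor CFA with simple structure: items 1-3 load only on
# factor 1, items 4-6 only on factor 2 (cross-loadings fixed to 0).
# Model-implied covariance matrix: Sigma = Lambda * Phi * Lambda^T + Theta.


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


# All numbers below are made up for illustration.
loadings = [  # Lambda: rows = items, columns = factors
    [0.8, 0.0],
    [0.7, 0.0],
    [0.6, 0.0],
    [0.0, 0.9],
    [0.0, 0.5],
    [0.0, 0.7],
]
phi = [[1.0, 0.4], [0.4, 1.0]]  # factor variances fixed to 1, correlation .4
theta = [0.36, 0.51, 0.64, 0.19, 0.75, 0.51]  # residual variances (diagonal of Theta)

common = matmul(matmul(loadings, phi), transpose(loadings))
sigma = [[common[i][j] + (theta[i] if i == j else 0.0) for j in range(6)]
         for i in range(6)]

for row in sigma:
    print(" ".join(f"{v:5.2f}" for v in row))
```

For instance, the implied covariance between item 1 and item 4 is .8 × .4 × .9 = .288; the model has no free parameter that could make it anything else.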

Path Diagrams with Latent Variables
• Measurement models
   • Generally, latent variables “cause” the observed/indicator variables (reflective indicators), as shown by single-headed arrows pointing away from the latent variable and toward the observed variables
      • E.g., latent depression with indicators representing scores on three different depression scales (latent depression “causes” the scores on the observed variables)
   • However, in some instances the observed variables ‘combine’ to determine the latent variable (formative indicators)
      • E.g., a latent socioeconomic status variable with indicators income, occupational prestige, and level of education

Sample CFA

Sample CFA with higher order factor

Model Identification: Review
• We need to scale each latent variable in order to identify the model, either by:
   1) fixing the loading (regression coefficient) of one indicator to 1
      • All other loadings are interpreted relative to this value
   OR
   2) fixing the variance of the latent variable to 1 (standardizing)
      • The most common method for CFA
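The two scaling choices are equivalent: they only change the metric of the latent variable, not the fit. As a sketch with made-up loadings for a single factor, a solution with the factor variance fixed to 1 can be converted to the marker-indicator solution by dividing every loading by the first loading and setting the factor variance to the square of that first loading. Both versions reproduce exactly the same implied covariances (λᵢ·φ·λⱼ).

```python
# Two ways to scale a single latent factor; all estimates are hypothetical.
# Method 2 (variance fixed to 1): loadings lam_std, factor variance 1.
# Method 1 (marker indicator): first loading fixed to 1, factor variance free.

lam_std = [0.8, 0.6, 0.5, 0.7]  # hypothetical loadings with factor variance = 1
phi_std = 1.0

# Rescale to the marker-indicator metric.
lam_marker = [lam / lam_std[0] for lam in lam_std]
phi_marker = lam_std[0] ** 2 * phi_std


def implied_common_cov(lam, phi):
    """Common-factor part of the implied covariance matrix: lam_i * phi * lam_j."""
    return [[li * phi * lj for lj in lam] for li in lam]


cov_std = implied_common_cov(lam_std, phi_std)
cov_marker = implied_common_cov(lam_marker, phi_marker)
max_diff = max(abs(a - b)
               for row_a, row_b in zip(cov_std, cov_marker)
               for a, b in zip(row_a, row_b))

print("marker loadings:", [round(lam, 3) for lam in lam_marker])
print("marker factor variance:", round(phi_marker, 3))
print("max difference in implied covariances:", max_diff)
```

Because the implied covariance matrix is unchanged, χ² and every fit index are identical under the two choices; only the unstandardized loadings and factor variance are reported on a different scale.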

Confirmatory Factor Analysis
• Indicators are assumed to be continuous, normally distributed variables
• What about items from a scale?
   • Likert-type items are by nature categorical: covariances are smaller than they should be, model fit tests are biased, and parameter estimates and standard errors are biased
   • However, research has found that ordered variables with more than 5 categories can often be treated as continuous
   • For categorical items (e.g., items with fewer than 5 categories), it is better to use polychoric correlations or item response theory
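A quick simulation illustrates why covariances among categorical items are "smaller than they should be". The sketch draws two standard normal variables with a true correlation of .6, then cuts each into ordered categories using made-up thresholds, mimicking Likert responses. The Pearson correlation of the categorized scores is attenuated, and more so with 3 categories than with 7. Polychoric correlations are designed to recover the underlying correlation; they are not computed here.

```python
# Simulated illustration: categorizing continuous scores attenuates correlations.
# Sample size, seed, and thresholds are arbitrary choices for this sketch.
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def categorize(z, cuts):
    """Map a continuous score to an ordered category 0, 1, ..., len(cuts)."""
    return sum(z > c for c in cuts)


random.seed(2012)
rho, n = 0.6, 20000
x, y = [], []
for _ in range(n):
    z1 = random.gauss(0, 1)
    z2 = rho * z1 + math.sqrt(1 - rho ** 2) * random.gauss(0, 1)
    x.append(z1)
    y.append(z2)

cuts_3 = [-0.5, 1.0]                          # 3 categories (hypothetical thresholds)
cuts_7 = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0]    # 7 categories

r_continuous = pearson(x, y)
r_3 = pearson([categorize(v, cuts_3) for v in x], [categorize(v, cuts_3) for v in y])
r_7 = pearson([categorize(v, cuts_7) for v in x], [categorize(v, cuts_7) for v in y])

print(f"continuous r = {r_continuous:.3f}")
print(f"7 categories r = {r_7:.3f}")
print(f"3 categories r = {r_3:.3f}")
```

The 7-category correlation stays fairly close to the continuous value, which is consistent with the guideline that items with more than 5 categories can often be treated as continuous.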

CFA Example
• Greenglass et al. were interested in the influence of a construct called “energized state”; more specifically, whether it could influence coping and stress outcomes
   • Do individuals experiencing this energized state cope better with stress?
• A measurement model was needed to examine the validity of this construct
• Several positive personality variables were measured as indicators of an energized state
   • Optimism, positive affect, tendency to perceive difficulties as a challenge, and vigor
• N = 404
• The variance of the latent variable is fixed at 1

Example CFA

CFA
• Review: Stages of Modeling
   • Ensure that the model is identified
   • Screen SEM assumptions:
      • multivariate outliers, univariate normality, multivariate normality, linearity of the relationships among your variables
   • Check overall model fit
   • Check the standardized residual covariance matrix/modification indices if model fit is poor
   • Post hoc/exploratory analyses: make theoretically appropriate changes to the model and re-fit
   • Interpret parameter estimates
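For the "multivariate outliers" screening step, a common approach is the squared Mahalanobis distance of each case from the centroid, compared against a chi-square critical value with df equal to the number of variables (often at p < .001). The sketch below handles just two variables so the covariance inverse can be written in closed form. The data are simulated with an arbitrary seed, and one injected case contradicts the strong positive correlation even though neither of its values is extreme on its own.

```python
# Screening for multivariate outliers with squared Mahalanobis distances (2 variables).
# Data are simulated for illustration; the last case is deliberately injected.
import math
import random


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) point from the sample centroid."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Inverse of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    d2 = []
    for x, y in points:
        dx, dy = x - mx, y - my
        d2.append(dx * dx * ixx + 2 * dx * dy * ixy + dy * dy * iyy)
    return d2


random.seed(15)
rho = 0.8
data = []
for _ in range(200):
    z1 = random.gauss(0, 1)
    z2 = rho * z1 + math.sqrt(1 - rho ** 2) * random.gauss(0, 1)
    data.append((z1, z2))
data.append((2.5, -2.5))  # injected case (index 200) that breaks the correlation pattern

# With 2 variables, D^2 is approximately chi-square(2) under normality; the
# chi-square(2) upper tail is exp(-x / 2), so the p < .001 cutoff is -2 ln(.001).
cutoff = -2 * math.log(0.001)
d2 = mahalanobis_sq_2d(data)
flagged = [i for i, d in enumerate(d2) if d > cutoff]
print(f"cutoff = {cutoff:.2f}; flagged cases: {flagged}")
```

With more variables, the same idea applies with the full covariance inverse and a chi-square critical value on p df.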

CFA example • Model fit was good (although RMSEA is a little high):

Example CFA • Parameter Estimates

Example CFA
• Standardized Parameter Estimates:
   • There are numerous rules of thumb, but standardized loadings are often expected to be > .5
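A standardized loading rescales the unstandardized loading by the standard deviations of the factor and of the indicator. For a one-factor model, the model-implied indicator variance is λ²φ + θ. The sketch below converts hypothetical unstandardized estimates (the item names and values are made up, not the energized-state results) and flags loadings below the .5 rule of thumb.

```python
# Convert unstandardized loadings to standardized loadings for a one-factor
# model and flag values below the .5 rule of thumb. All estimates are hypothetical.
import math


def standardized_loading(lam: float, factor_var: float, resid_var: float) -> float:
    """lam * sd(factor) / sd(indicator), using the model-implied indicator variance."""
    indicator_var = lam ** 2 * factor_var + resid_var
    return lam * math.sqrt(factor_var) / math.sqrt(indicator_var)


factor_var = 1.0  # factor variance fixed to 1, as in the example CFA
estimates = {  # hypothetical item: (unstandardized loading, residual variance)
    "item_a": (0.90, 0.40),
    "item_b": (0.75, 0.60),
    "item_c": (0.30, 0.80),
}

for name, (lam, theta) in estimates.items():
    std = standardized_loading(lam, factor_var, theta)
    flag = "below .5" if std < 0.5 else "ok"
    print(f"{name}: {std:.2f} ({flag})")
```

The .5 cutoff is only a heuristic; a small loading is a prompt to ask whether the indicator belongs in the construct, as with optimism in the example.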

Example CFA
• The model is a reasonable fit to the data
• All loadings on the general factor are statistically significant
• There is a question regarding whether optimism is an important contributor to the latent construct, since its loading is relatively small
• We could use the “energized state” variable as part of a full structural equation model

Full Structural Equation Models
• Once you have established the measurement models for your latent variables, you can evaluate the structural portion of your hypothesized model
   • i.e., the relationships among the latent variables and observed variables of interest

Full SEM Example
• A researcher was interested in whether attitudes regarding quantitative ability at the start of a statistics course predicted quantitative performance at the end of the course
• 2 latent variables
   • Quant Attitudes: 3 indicators
      • Anxiety, hindrances to doing well in a stats course, self-efficacy
   • Quant Performance: 2 indicators
      • Average homework grade, average exam grade
• One indicator for each latent variable had its loading fixed to 1

Quantitative Attitudes and Performance

Full SEM example
• N = 129
• χ²(4) = 3.23, p = .519 (Excellent!)
• PROBLEM:
   • The following variances are negative
   • e10 is the residual variance for the observed variable “exam average”

Improper Solutions
• It is tempting to look at the “problem” variables (e.g., the residual variance for exam average) and deal with the issue by “fixing” the variance to a small positive value (e.g., .01)
   • In some instances this is necessary, especially when the value is close to 0 and all other parts of the model fit well
• It is better to think carefully about the variables in the whole model
   • Is something misspecified?
   • Are important parameters missing?
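One heuristic sometimes used to judge whether a negative variance is "close to 0" is to check whether its confidence interval includes 0. If it does, the negative estimate may just be sampling error around a small true variance. If it does not, misspecification is the more likely explanation. The sketch below assumes a normal-theory 95% interval (estimate ± 1.96 SE), and the estimates and standard errors are made up; substitute the values reported by your SEM software.

```python
# Heuristic check for an improper (negative) variance estimate.
# Estimates and standard errors below are hypothetical.


def negative_variance_check(estimate: float, se: float, z: float = 1.96):
    """Return (is_negative, ci_includes_zero, (lower, upper)) for a variance estimate."""
    lower, upper = estimate - z * se, estimate + z * se
    return estimate < 0, lower <= 0 <= upper, (lower, upper)


for est, se in [(-0.04, 0.05), (-0.50, 0.10)]:
    negative, includes_zero, (lo, hi) = negative_variance_check(est, se)
    verdict = ("plausibly sampling error near 0" if includes_zero
               else "look for misspecification")
    print(f"estimate {est:+.2f}, 95% CI ({lo:+.3f}, {hi:+.3f}): {verdict}")
```

Even when the interval includes 0, the advice on the slide still applies: inspect the whole model before fixing the variance.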

Full SEM Example
• Looking through the output, we see that homework average is not a significant indicator of quantitative performance
• Further, going back to the bivariate correlations among our variables, we see that homework average is not correlated with any of the indicator variables for quantitative attitudes
• Perhaps exam average alone is a better representation of quantitative performance?

Quantitative Attitudes and Performance: No Homework Average
χ²(2) = 1.8, p = .413
CFI = 1.00
IFI = 1.00
TLI = 1.00
RMSEA = 0, 90% CI = (0, .169)
SRMR = .026
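The RMSEA point estimate can be checked by hand from χ², its df and the sample size: RMSEA = √(max(χ² − df, 0) / (df(N − 1))). Whenever χ² is at or below its df, the estimate is 0, which is why this model (χ² = 1.8 on 2 df, N = 129) reports RMSEA = 0. The sketch below also uses the fact that a chi-square with exactly 2 df has upper-tail probability exp(−χ²/2). Note that some programs use N instead of N − 1 in the RMSEA denominator.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def chi2_pvalue_df2(chi2: float) -> float:
    """Upper-tail p-value of a chi-square with exactly 2 df: exp(-chi2 / 2)."""
    return math.exp(-chi2 / 2)


# Values reported for the model without homework average (N = 129).
print("RMSEA:", rmsea(1.8, 2, 129))  # chi-square < df, so RMSEA is 0.0
print("p-value:", round(chi2_pvalue_df2(1.8), 3))
```

The p-value comes out at about .407 rather than the reported .413 only because χ² was rounded to 1.8 on the slide.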

Quantitative Attitudes and Performance: Parameter Estimates
• Note: The model on the previous slide is identical to simply including the observed ‘exam average’ variable as the outcome (instead of creating a latent ‘quantitative performance’ variable)

Quantitative Attitudes and Performance: Summary
• The model fit well with homework average present, but homework average was not contributing to the model (and in fact it was leading to other issues)
• Without homework average, the latent Quantitative Attitudes variable was a significant predictor of Quantitative Performance (now simply exam scores), explaining approximately 20% of the variability in Quantitative Performance
• The relationship was negative, as expected, with higher levels of negative attitudes predicting lower scores (and vice versa)
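Because Quantitative Attitudes is the only predictor of Quantitative Performance, the proportion of variance explained is simply the squared standardized path coefficient. The slide does not report that coefficient, so the value below is only the arithmetic consequence of "approximately 20%". The derivation assumes standardized variables and a disturbance ζ that is uncorrelated with the predictor.

```latex
\[
\text{Perf} = \beta\,\text{Att} + \zeta, \qquad
\operatorname{Var}(\text{Perf}) = \beta^{2}\operatorname{Var}(\text{Att}) + \operatorname{Var}(\zeta) = \beta^{2} + \operatorname{Var}(\zeta) = 1
\]
\[
R^{2} = \frac{\beta^{2}\operatorname{Var}(\text{Att})}{\operatorname{Var}(\text{Perf})} = \beta^{2}
\quad\Longrightarrow\quad
|\beta| \approx \sqrt{.20} \approx .45, \qquad \beta \approx -.45 \text{ (negative, as expected)}
\]
```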

SEM Example Two
• Evaluate the effects of a sixth grade intervention for reducing early sexual behaviours
• More specifically, do these sixth grade intervention strategies reduce the amount of sexual behaviour in grade 7 (time 2) and grade 8 (time 3)?
• ‘Sexual Gestalt’ is a latent variable made up of psychosocial variables related to the individual’s views toward early sexual behaviour

SEM Example Two
[Path diagram: Sexual Limits, Peer Norms, Unwanted Advances, and Parental Views (residuals e1 to e4) are indicators of the latent Sexual Gestalt factor; outcomes are Sexual Behavior (T2) and Sexual Behavior (T3), each with a residual]

Results for Model
• Chi-square test of absolute model fit
   • Chi-square = 33.42 with 9 DF, p < .001
   • Our model does not fit the data on an absolute basis (which is extremely common: sample sizes are usually large, and with a large sample even small non-zero residuals produce a significant chi-square)
• Does our model fit the data on a descriptive or approximate basis?
• Descriptive fit measures
   • CFI = .96
   • RMSEA = .062
   • SRMR = .04
• Reasonable fit, but can we do better?
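The model degrees of freedom equal the number of unique variances and covariances among the p observed variables, p(p + 1)/2, minus the number of free parameters. The sketch below counts parameters for Example Two. It relies on an assumption: our reading of the garbled path diagram is that the four indicators load on Sexual Gestalt (first loading fixed to 1), Sexual Gestalt predicts Sexual Behavior (T2), and T2 predicts T3. That structure reproduces the reported 9 df, and the modified model (one extra path) gives 8 df. The same counting also reproduces the 4 df and 2 df of the quantitative attitudes models.

```python
def sem_df(n_observed: int, n_free: int) -> int:
    """Model df = unique (co)variances p(p + 1)/2 minus free parameters."""
    return n_observed * (n_observed + 1) // 2 - n_free


# Example Two, assumed structure (our reading of the path diagram):
# 4 indicators of Sexual Gestalt (first loading fixed to 1),
# Sexual Gestalt -> Sexual Behavior (T2) -> Sexual Behavior (T3).
original = {
    "free loadings": 3,
    "indicator residual variances": 4,
    "factor variance": 1,
    "structural paths": 2,            # Gestalt -> T2, T2 -> T3
    "outcome residual variances": 2,  # T2 and T3
}
modified = {**original, "structural paths": 3}  # adds Gestalt -> T3

df_original = sem_df(6, sum(original.values()))
df_modified = sem_df(6, sum(modified.values()))
print(f"Example Two: original df = {df_original}, modified df = {df_modified}")

# Quantitative attitudes example (one loading per factor fixed to 1).
# With homework: 5 observed; 3 free loadings, 5 residual variances,
# 1 factor variance, 1 disturbance variance, 1 path = 11 free parameters.
df_with_homework = sem_df(5, 11)
# Without homework (exam as an observed outcome): 4 observed; 2 free loadings,
# 3 residual variances, 1 factor variance, 1 path, 1 exam residual = 8.
df_without_homework = sem_df(4, 8)
print(f"Quant attitudes: with homework df = {df_with_homework}, "
      f"without homework df = {df_without_homework}")
```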

SEM Example: Model Modification
• Largest standardized residual covariances:
   • Sexual Limits - Sexual Behavior (t3): -2.43
   • Peer Norms - Sexual Behavior (t3): -1.84
• Modification indices suggest that the path from Sexual Limits to Sexual Behavior (t3) is the single best path to free for estimation
   • Index value = 9.10
   • This is the (minimum) expected drop in the model chi-square fit statistic if we were to free this parameter
• Modification indices suggest that the path from Sexual Gestalt to Sexual Behavior (t3) is the next best path to free for estimation

SEM Example: Model Modification
• It (probably) makes more sense to connect Sexual Gestalt to Sexual Behavior at time 3 than to connect Sexual Limits to Sexual Behavior
• It is important to always consider which of the possible modifications makes most sense (in terms of parsimony, theory, etc.), instead of blindly making modifications
• Re-specify the model with one additional path from the Sexual Gestalt factor to Sexual Behavior at time 3

SEM Example: Modified Model
[Path diagram: the same measurement model for Sexual Gestalt (indicators Sexual Limits, Peer Norms, Unwanted Advances, Parental Views; residuals e1 to e4), now with an added direct path from Sexual Gestalt to Sexual Behavior (T3)]

SEM Example: Modified Model Results
• Chi-square = 17.91 with 8 DF, p = .02
   • A drop of about 15.5 in the chi-square value for only one DF: a good tradeoff!
• Approximate fit indices:
   • CFI = .98
   • RMSEA = .04
   • SRMR = .02
   • GFI = .98
   • AGFI = .95
• No standardized residual covariances exceed |1.50|; most are below |1.00|
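Because the original model is nested within the modified model (it is the modified model with one path fixed to 0), the drop in χ² can be tested formally as a chi-square difference test on the difference in df. A chi-square with 1 df is a squared standard normal, so its upper-tail probability is erfc(√(x/2)), which Python's standard library provides. The sketch below uses the two chi-square values reported for Example Two.

```python
import math


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square with 1 df: erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))


chi2_original, df_original = 33.42, 9
chi2_modified, df_modified = 17.91, 8

delta_chi2 = chi2_original - chi2_modified
delta_df = df_original - df_modified
p_value = chi2_sf_df1(delta_chi2)

print(f"delta chi-square = {delta_chi2:.2f} on {delta_df} df, p = {p_value:.2g}")
```

The difference test assumes both models are fit by ML to the same data; with robust (scaled) chi-square statistics, a scaled difference test would be needed instead.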

SEM Example: Modified Model Results
[Path diagram with unstandardized estimates (* marks statistically significant estimates); Chi-square (8 df) = 17.912, p = .022]

SEM Example: Modified Model Results
[Path diagram with standardized estimates]
