Stat 415a: Structural Equation Modeling

Stat 415a: Structural Equation Modeling, Fall 2013. Instructor: Fred Lorenz, Department of Statistics, Department of Psychology

Outline Part 1: Introduction Part 2: Classical path analysis Part 3: Non-recursive models for reciprocity Part 4: Measurement models Part 5: Model integration Part 6: Concluding comments

Part 1: Introduction The language of structural equations Motivation: Why structural equation modeling? Historic overview of SEM Introduction to the data sets used in class examples and homework assignments

Part 1.1: The language of SEM • Path diagrams: • Denote concepts (ellipse) and their measures (box) • Distinguish uni-directed & bi-directed arrows • Notation & subscripts • Distinguish • explanatory (independent) & response (dependent) variables • endogenous & exogenous variables

Part 1.2: Motivation Why structural equation modeling? * Explanation rather than prediction * Conceptual simplification (parsimony) * Reliability (random measurement error) * Validity (systematic measurement error)

Part 1.2, cont. . . . • Explanation • We have causal interests. . . . Does X cause Y? • We want to know “why?” • Is there a 3rd variable that mediates between X and Y? • Or is there a 3rd antecedent variable that explains both?

Part 1.2, cont. . . • Conceptual simplification * Occam’s razor: all things equal, we prefer the simplest (most parsimonious) explanation * And we avoid the “balkanization” of concepts * We distinguish “latent” variables from their observed manifestations * And we distinguish causal (formative) from effect (reflective) indicators

Part 1.2, cont. Reliability (random measurement error) • We do not measure our concepts without error. • Measurement error “attenuates” (reduces) the strength of correlations and the strength of regression coefficients. • Goal: work harder so that you measure your concepts real good! • Or: obtain auxiliary information to estimate measurement error.
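The attenuation point can be made concrete with the classical test theory formula, in which the observed correlation equals the true correlation times the square root of the product of the two reliabilities. The sketch below is illustrative only: the function names and the reliability values (0.80 and 0.70) are assumptions, not figures from the course.

```python
import math

def attenuated_correlation(true_r, rel_x, rel_y):
    """Observed correlation implied by classical test theory."""
    return true_r * math.sqrt(rel_x * rel_y)

def disattenuated_correlation(observed_r, rel_x, rel_y):
    """Correct an observed correlation for unreliability in both measures."""
    return observed_r / math.sqrt(rel_x * rel_y)

# Hypothetical values: true correlation 0.50, reliabilities 0.80 and 0.70.
observed = attenuated_correlation(0.50, 0.80, 0.70)
recovered = disattenuated_correlation(observed, 0.80, 0.70)
print(round(observed, 3), round(recovered, 3))
```

With auxiliary information about reliability, the same formula run in reverse recovers the true correlation, which is the idea behind estimating measurement error rather than ignoring it.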

Part 1.2, cont. . . Validity (systematic measurement error) • Our observed indicators do not always measure what they are intended to measure. • Maybe less than they should; sometimes more. • Often validity is threatened by “method variance,” one example of which is “glop.” • Sometimes the systematic error associated with method variance can be estimated.

Part 1.3: Historic overview Wright traced genetic inheritability Path analysis (recursive models) in sociology (Duncan 1966) Non-recursive (simultaneous equation) models in economics Measurement models from psychometrics (Lawley 1940) Integrated recursive & non-recursive measurement error models (Jöreskog, Keesling & Wiley notation).

Part 1.4: Data set used in course The Iowa Family Transition Project (FTP) 550 rural Iowa families (1989–2009) Interview: mother, father, “target child” And later, target’s romantic partner

Part 2: Classical path analysis Some themes. . . Model specification Model estimation with PROC REG and PROC CALIS Model evaluation & model comparisons The decomposition of effects

Part 2.1: Model specification • Theoretical specification • What is the order of variables in a model • And how do you decide? • On logic • Or empirical evidence? • Specifying (operationalizing) the model • How do you express a theoretical model in paths and equations?

Part 2.2: Model estimation Ordinary least squares regression (use PROC REG in SAS) Maximum likelihood estimation, or variations on it (PROC CALIS) Standardized vs. unstandardized coefficients
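To illustrate the standardized vs. unstandardized distinction outside SAS, here is a minimal Python sketch of a one-predictor OLS path. It is not PROC REG or PROC CALIS output; the data and function names are made up for illustration. The standardized coefficient is the unstandardized slope rescaled by the ratio of standard deviations, and with a single predictor it equals the Pearson correlation.

```python
import statistics

def ols_slope(x, y):
    """Unstandardized OLS slope of y on a single predictor x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def standardized_slope(b, x, y):
    """Rescale an unstandardized slope into standard-deviation units."""
    return b * statistics.stdev(x) / statistics.stdev(y)

# Hypothetical data for one path, x -> y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b = ols_slope(x, y)
beta = standardized_slope(b, x, y)
print(round(b, 3), round(beta, 3))
```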

Part 2.3: Model evaluation: Evaluate specific paths of a model Is a proposed path significant? Is a hypothesized null path really not significant? Is there evidence of mediation? Spuriousness? Evaluate the overall model How well does a model fit the data?

Part 2.3: Model comparison The two extremes: saturated (fully recursive) model null model of complete independence Where does your model fit? Implicit model comparison and the chi-square test Explicit model comparisons and the change in chi-square.
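An explicit comparison of two nested models can be sketched as a change-in-chi-square calculation. The fit values below are hypothetical, and the p-value helper covers only the 1-df case (where the chi-square tail equals a two-sided normal tail) so that the sketch needs nothing beyond the standard library; in practice the fitted chi-squares would come from PROC CALIS output.

```python
import math

def chi_square_difference(t_restricted, df_restricted, t_full, df_full):
    """Return (change in chi-square, change in df) for two nested models."""
    return t_restricted - t_full, df_restricted - df_full

def upper_tail_p_one_df(delta_t):
    """P(chi-square with 1 df > delta_t), via the normal-tail identity."""
    return math.erfc(math.sqrt(delta_t / 2.0))

# Hypothetical results: the restricted model fixes one extra path to zero.
delta_t, delta_df = chi_square_difference(12.4, 4, 5.1, 3)
p_value = upper_tail_p_one_df(delta_t)
print(delta_df, round(delta_t, 2), p_value < 0.05)
```

A significant change suggests that the constraint worsens fit, so the less restricted model would be preferred.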

Part 2.3, cont.: Chi-square goodness-of-fit What does the chi-square test do when comparing models? Compare the expected distribution under the null hypothesis to the observed distribution The greater the difference between expected and observed distributions, the larger the chi-square

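Part 2.3, cont.: The chi-square in SEM The statistic: T = (N – 1)F, where F = the minimized discrepancy between the observed and expected (model-implied) covariance matrices; df = p(p+1)/2 – t, where p = # observed variables and t = # parameters being estimated; T ~ χ²(df)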

Part 2.3, cont.: Two types of model comparison Absolute fit: Compare model to the “saturated” model, which fits the data perfectly Relative fit: Compare model to the model of complete independence (like a model with no predictors in OLS regression)

Part 2.3, cont.: Some absolute fit indices Goodness of fit (GFI) and adjusted goodness of fit (AGFI) indices Standardized root mean square residual (SRMR) Root mean square error of approximation (RMSEA) Goal: Get GFI and AGFI as close to 1.0 as possible, and SRMR and RMSEA as small (close to zero) as possible
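As one example of an absolute fit index, RMSEA can be computed directly from the chi-square statistic, its degrees of freedom, and the sample size. The sketch below uses a common textbook form with N – 1 in the denominator (some software uses N instead), and the input values are hypothetical.

```python
import math

def rmsea(t_stat, df, n):
    """Root mean square error of approximation, floored at zero."""
    return math.sqrt(max(t_stat - df, 0.0) / (df * (n - 1)))

# Hypothetical fit: T = 8.0 on 2 df with N = 401.
value = rmsea(8.0, 2, 401)
print(round(value, 4))
```

When T is no larger than its degrees of freedom, the index is zero, which is why smaller values indicate closer fit.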

Part 2.3, cont.: Some relative fit indices • Tucker-Lewis non-normed fit index (TLI; ρ2) • Normed fit index (NFI; Δ1) • Relative fit index (RFI; ρ1) • Incremental fit index (IFI; Δ2)
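These four indices all compare the chi-square of the fitted model with that of the independence (baseline) model. The following sketch uses their standard definitions; the function names and the fit values are hypothetical.

```python
def nfi(t_b, t_m):
    """Normed fit index."""
    return (t_b - t_m) / t_b

def rfi(t_b, df_b, t_m, df_m):
    """Relative fit index."""
    return 1.0 - (t_m / df_m) / (t_b / df_b)

def ifi(t_b, t_m, df_m):
    """Incremental fit index."""
    return (t_b - t_m) / (t_b - df_m)

def tli(t_b, df_b, t_m, df_m):
    """Tucker-Lewis (non-normed) fit index."""
    return (t_b / df_b - t_m / df_m) / (t_b / df_b - 1.0)

# Hypothetical chi-squares: baseline T = 200 on 6 df, model T = 8 on 2 df.
t_b, df_b, t_m, df_m = 200.0, 6, 8.0, 2
print(round(nfi(t_b, t_m), 3), round(rfi(t_b, df_b, t_m, df_m), 3))
print(round(ifi(t_b, t_m, df_m), 3), round(tli(t_b, df_b, t_m, df_m), 3))
```

The indices with subscript 2 adjust for the model's degrees of freedom in a way that the subscript-1 versions do not, so they penalize small samples less.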

Part 2.3, cont.: Some more relative fit indices • Comparative fit index (CFI) • Goal: Get the values as close to 1.0 as possible • Akaike Information Criterion (AIC): compared across models, smaller is better
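The comparative fit index can be sketched from the same two chi-squares, using the excess of each chi-square over its degrees of freedom. The formula below is the standard definition; the input values are hypothetical.

```python
def cfi(t_b, df_b, t_m, df_m):
    """Comparative fit index, bounded between 0 and 1."""
    model_excess = max(t_m - df_m, 0.0)
    baseline_excess = max(t_b - df_b, t_m - df_m, 0.0)
    if baseline_excess == 0.0:
        return 1.0  # neither model shows any excess misfit
    return 1.0 - model_excess / baseline_excess

# Hypothetical chi-squares: baseline T = 200 on 6 df, model T = 8 on 2 df.
value = cfi(200.0, 6, 8.0, 2)
print(round(value, 3))
```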

Part 2.4: Decomposition of effects The correlation between two variables can be decomposed into 4 parts: * direct effects * indirect effects * spurious effects * associational effects

Part 2.4, cont.: Decomposition of effects The total effect of one variable on another can be decomposed into 2 parts: Total effect = Direct effect + Indirect effect To calculate, use EFFPART in PROC CALIS
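For a simple three-variable recursive model (X affects M, M affects Y, and X also affects Y directly), the decomposition reduces to a product and a sum. The path coefficients below are hypothetical, and the sketch shows only the arithmetic, not what EFFPART prints.

```python
def decompose_effects(a, b, c):
    """Effects of X on Y given paths X->M (a), M->Y (b), and X->Y (c)."""
    direct = c
    indirect = a * b  # the effect transmitted through the mediator M
    return {"direct": direct, "indirect": indirect, "total": direct + indirect}

# Hypothetical standardized path coefficients.
effects = decompose_effects(a=0.5, b=0.4, c=0.2)
print({name: round(value, 3) for name, value in effects.items()})
```

With more than one mediating route, the indirect effect would be the sum of the products along every compound path from X to Y.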

Part 3: Non-recursive models Reciprocity & causal order in survey data. . . Model specification: writing the equations Model identification Interpreting model results: reciprocity and causal order Model comparison and evaluation

Part 3, cont.: What is “identification?” Identification in simple algebra: * two unknowns → at least two equations * k unknowns → at least k equations In SEM: the necessary condition is df = p(p+1)/2 – t ≥ 0, where t = number of parameters estimated and p = number of observed variables.
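The counting condition can be checked mechanically. The sketch below compares the number of free parameters with the number of distinct variances and covariances; because the condition is necessary but not sufficient, the labels are deliberately hedged. The function name and example counts are hypothetical.

```python
def t_rule(p, t):
    """Check the necessary counting condition for identification."""
    moments = p * (p + 1) // 2  # distinct variances and covariances
    df = moments - t
    if df < 0:
        status = "fails the t-rule: not identified"
    elif df == 0:
        status = "passes the t-rule: just-identified if identified"
    else:
        status = "passes the t-rule: over-identified if identified"
    return df, status

# Hypothetical models with 3 or 4 observed variables.
for p, t in [(3, 7), (3, 6), (4, 8)]:
    print(p, t, t_rule(p, t))
```

Passing this check does not guarantee identification, which is why non-recursive models also need the order and rank conditions.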

Part 3, cont.: Ensuring identification Bollen’s (1989) rules for identification for non-recursive models Line up the equations The necessary order condition The sufficient rank condition Some observations on identification

Part 4: Measurement models Measurement error vs. mistakes in measuring * A note on classical test theory Random vs systematic measurement error * The concept of method variance * Managing method variance Confirmatory vs. exploratory factor analysis * A note on notation * Write down the (restricted) equations Confirmatory factor analysis (CFA) * Model specification & identification * Model estimation & evaluation

Part 4, cont., CFA identification Rules for identification * The usual (necessary) t-rule * The three indicator rule * The two indicator rule Comments on identification

Part 4, cont.: CFA model comparison Nested vs. non-nested models Evaluating specific models using the chi-square Comparing nested models using the change in chi-square: indices and graphic displays

Part 5: Integrated model What do integrated models look like? Integrating the measurement & structural components Specifying (writing) the equations Model identification Is the measurement model identified? Is the structural portion of the model identified?

Part 5, cont.: Integrated model Model estimation using PROC CALIS Standardized vs non-standardized coefficients Model evaluation Significant and non-significant structural paths Significant and non-significant correlated errors Model comparison Chi-square goodness of fit statistic for absolute fit Relative fit indices for comparing nested models

Part 6: Concluding comments Complete homework projects
