EPSY 651: Structural Equation Modeling I

Where does SEM fit in Quantitative Methodology?

Draws on three traditions in mathematics and science:

Psychology (Spearman, Kelley, Thurstone, Cronbach, etc.)

Sociology (Wright)

Agriculture and statistics (Pearson, Fisher, Neyman, Rao, etc.)

Synthesized largely by Jöreskog in the 1960s and 1970s

Map below shows its positioning


MANIFEST MODELING

  • Classical statistics within the parametric tradition

  • Canonical correlation analysis subsumes most of these methods as special cases


LATENT MODELING

  • Psychological concept of “FACTOR” is central to latent modeling: unobserved directly but “indicated” through observed variables

  • Error is treated as individual differences and as a problem of observation (measurement), rather than through the “lack of fit” conception used in manifest modeling


STRUCTURAL EQUATION MODELING PURPOSES

  • MODEL real-world phenomena in the social sciences with respect to

    • POPULATIONS

    • ECOLOGIES

    • TIME


SEM PROCEDURE

  • FOCUS ON DECOMPOSITION OF COVARIANCE MATRIX:

    $\Sigma_{xy} = \Sigma(\lambda_x, \lambda_y, \sigma^2_x, \sigma^2_y, \sigma_{xy}) + \Sigma(e_x, e_y, e_{xy})$

    $x = \Lambda\xi + \delta$

    $y = By + \Gamma x + e$


TESTING in SEM

  • SEM tests A PRIORI (theoretically specified) MODELS

  • SEM can also be used to consider model revisions

  • SEM is not necessarily well suited to exploratory modeling


SEM COMPARISONS

  • SEM can COMPARE ecologies or populations for identical models, or

  • simultaneously compare multiple groups or ecologies, each having its own unique model

  • Statistical testing is available for all parts of all models as well as overall model fit



Karl Pearson (1857-1936). (Excerpted from E. S. Pearson, Karl Pearson: An Appreciation of some aspects of his life and works, Cambridge University Press, 1938.)


Pearson Correlation

$r_{xy} = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})/(n-1)}{s_x s_y} = \dfrac{s_{xy}}{s_x s_y}$

$= \sum_{i=1}^{n} z_{x_i} z_{y_i}/(n-1)$

= COVARIANCE / SD(x)SD(y)
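The covariance form and the z-score form of the formula can be checked against each other numerically. This is a minimal sketch using only Python's standard library; the data are made up for illustration.

```python
import statistics


def pearson_r_two_ways(x, y):
    """Compute Pearson r as s_xy / (s_x s_y) and as a sum of z-score products."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)  # n - 1 denominators

    # Covariance divided by the product of the standard deviations
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    r_cov = sxy / (sx * sy)

    # Sum of products of z-scores divided by n - 1
    r_z = sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)) / (n - 1)
    return r_cov, r_z


x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
r_cov, r_z = pearson_r_two_ways(x, y)
print(round(r_cov, 4), round(r_z, 4))
```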


COVARIANCE

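  • DEFINED AS CO-VARIATION

    $\mathrm{COV}_{xy} = s_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})/(n-1)$

  • “UNSTANDARDIZED CORRELATION”

  • Its sampling distribution is statistically tractable: under multivariate normality, $(n-1)S$ follows a Wishart distribution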

  • Basis of Structural Equation Modeling (SEM) is constructing models for covariances of variables


Correlation and Covariance

Figure 3.4: Path model representation of correlation between SAT Math scores and Calculus Grades (path SAT Math → Calc Grade: .364 (40); path error → Calc Grade: .932 (.955); error path = $\sqrt{1 - r^2}$; $s_e$ = standard deviation of errors)


Path Models

  • path coefficient: the standardized coefficient next to the arrow, with the covariance in parentheses

  • error coefficient: the correlation between the errors (the discrepancies between observed and predicted Calc Grade scores) and the observed Calc Grade scores

  • Predicted(Calc Grade) = .00364 SAT-Math + .5

  • errors are sometimes called disturbances
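The error coefficient in Figure 3.4 follows from the correlation alone: with a single predictor, the correlation between the errors and the observed outcome is $\sqrt{1 - r^2}$. The sketch below computes it for r = .364 and evaluates the slide's prediction equation; the SAT-Math score of 600 is a hypothetical input.

```python
import math

r = 0.364  # standardized path coefficient, SAT-Math -> Calc Grade (Figure 3.4)

# Correlation between the errors and the observed Calc Grade scores
error_coef = math.sqrt(1 - r ** 2)


def predicted_calc_grade(sat_math):
    """Prediction equation from the slide: .00364 * SAT-Math + .5."""
    return 0.00364 * sat_math + 0.5


print(round(error_coef, 3))
print(round(predicted_calc_grade(600), 3))  # hypothetical SAT-Math score
```

The computed error coefficient, about .931, is close to the .932 shown in the figure; the small difference presumably reflects rounding.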


Figure 3.2: Path model representations of correlation (panels a, b, c relating X and Y, with error e)


BIVARIATE DATA

  • 2 VARIABLES

  • QUESTION: DO THEY COVARY?

  • IF SO, HOW DO WE INTERPRET?

  • IF NOT, IS THERE A THIRD INTERVENING (MEDIATING) OR EXOGENOUS VARIABLE THAT SUPPRESSES OR MODERATES THE RELATIONSHIP?


IDEALIZED SCATTERPLOT

  • POSITIVE RELATIONSHIP

(Figure: Y plotted against X, with the prediction line)


IDEALIZED SCATTERPLOT

  • NEGATIVE RELATIONSHIP

(Figure: Y plotted against X, with the prediction line and a 95% confidence interval around the prediction)


IDEALIZED SCATTERPLOT

  • NO RELATIONSHIP

(Figure: Y plotted against X, with the prediction line)


SUPPRESSED SCATTERPLOT

  • NO APPARENT RELATIONSHIP

(Figure: Y plotted against X, with separate prediction lines for MALES and FEMALES)


MODERATION AND SUPPRESSION IN A SCATTERPLOT

  • NO APPARENT RELATIONSHIP

(Figure: Y plotted against X, with separate prediction lines for MALES and FEMALES)


IDEALIZED SCATTERPLOT

  • POSITIVE CURVILINEAR RELATIONSHIP

(Figure: Y plotted against X, with quadratic and linear prediction lines)



Hypotheses about Correlations

  • One sample tests for Pearson r

  • Two sample tests for Pearson r

  • Multisample test for Pearson r

  • Assumptions: normality of x and y, the variables being correlated


One Sample Test for Pearson r

  • Null hypothesis: $\rho = 0$; Alternate: $\rho \neq 0$

  • test statistic: $t = r / \sqrt{(1 - r^2)/(n-2)}$

    with degrees of freedom = n-2


One Sample Test for Pearson r

  • ex. Descriptive Statistics for Kindergarteners on a Reading Test (from SPSS)

    Descriptive Statistics
                        Mean     Std. Deviation   N
    Naming letters      .5750    .3288            76
    Overall reading     .6427    .2414            76

    Correlations
                                            Naming letters   Overall reading
    Naming letters    Pearson Correlation   1.000            .784**
                      Sig. (1-tailed)       .                .000
                      N                     76               76
    Overall reading   Pearson Correlation   .784**           1.000
                      Sig. (1-tailed)       .000             .
                      N                     76               76

    ** Correlation is significant at the 0.01 level (1-tailed).
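The SPSS significance value can be reproduced from the one-sample t formula on the previous slide. A minimal sketch for r = .784 and n = 76; it computes the statistic and degrees of freedom only, since a t-distribution tail probability is not in Python's standard library.

```python
import math


def t_for_r(r, n):
    """t statistic and degrees of freedom for H0: rho = 0."""
    t = r / math.sqrt((1 - r ** 2) / (n - 2))
    return t, n - 2


t, df = t_for_r(0.784, 76)  # Naming letters vs. Overall reading
print(round(t, 2), df)
```

With 74 degrees of freedom, a t of about 10.86 lies far beyond conventional critical values, consistent with the reported Sig. of .000.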


One Sample Test for Pearson r

  • Null hypothesis: $\rho = c$; Alternate: $\rho \neq c$

  • test statistic: $z = (Z_r - Z_c)/\sqrt{1/(n-3)}$

    where z = normal statistic, $Z_r$ = Fisher Z transform of r, $Z_c$ = Fisher Z transform of c


Fisher’s Z transform

  • $Z_r = \tanh^{-1} r = \tfrac{1}{2}\ln[(1 + r)/(1 - r)]$

  • This creates a new variable that is approximately normally distributed, with mean $Z_\rho$ (the transform of the population ρ) and SD $1/\sqrt{n-3}$


Non-null r example

  • Null: (girls) = .784

  • Alternate: (girls)  .784

    Data: r = .845, n= 35

  • Z (girls=.784) = 1.055, Zr(girls=.845)=1.238

    z = (1.238 - 1.055)/[1/(35-3)]1/2

    = .183/(1/5.65685) = 1.035, nonsig.


Two Sample Test for Difference in Pearson r’s

  • Null hypothesis: $\rho_1 = \rho_2$

  • Alternate hypothesis: $\rho_1 \neq \rho_2$

  • test statistic:

    $z = (Z_{r_1} - Z_{r_2}) / \sqrt{1/(n_1-3) + 1/(n_2-3)}$

    where z = normal statistic


Example

  • Null hypothesis: girls = boys

  • Alternate hypothesis girls  2boys

  • test statistic: rgirls = .845, rboys = .717 ngirls = 35, nboys = 41

    z = Z(.845) - Z(.717) / [1/(35-3) + 1/(41-3)]1/2

    = ( 1.238 - .901) / [1/32 + 1/38] 1/2

    = .337 / .240 = 1.405, nonsig.
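The same two-sample calculation as a function, applied to the girls/boys example; a sketch using only the standard library.

```python
import math
from statistics import NormalDist


def compare_independent_rs(r1, n1, r2, n2):
    """Two-tailed z test of H0: rho_1 = rho_2 for independent samples."""
    z1, z2 = math.atanh(r1), math.atanh(r2)   # Fisher Z transforms
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


z, p = compare_independent_rs(0.845, 35, 0.717, 41)  # girls vs. boys
print(round(z, 3), round(p, 3))
```

This gives z ≈ 1.40 with p of about .16, matching the nonsignificant result above.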


Multisample test for Pearson r

  • Three or more samples:

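  • Null hypothesis: $\rho_1 = \rho_2 = \rho_3 = \ldots$

  • Alternate hypothesis: $\rho_i \neq \rho_j$ for some i, j

  • Test statistic: $\chi^2 = \sum w_i Z_i^2 - w_{.} Z_w^2$

    which is approximately chi-square distributed with (number of groups - 1) degrees of freedom, and

    $w_i = n_i - 3$, $w_{.} = \sum w_i$, and

    $Z_w = \sum w_i Z_i / w_{.}$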



Multiple Group Models of Correlation

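  • SEM approach models several groups with either the SAME or DIFFERENT correlations:

(Figure: separate path models for boys and girls, each correlating X and y with $\rho_{xy} = a$; giving both groups the same label a constrains the correlation to be equal across groups)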


Multigroup SEM

  • SEM analysis produces a chi-square test of goodness of fit (lack of fit) for the hypothesis about ALL groups at once

  • Other indices: Comparative Fit Index (CFI), Normed Fit Index (NFI), Root Mean Square Error of Approximation (RMSEA)

  • CFI, NFI > .95 is conventionally taken to indicate good fit

  • RMSEA < .06 is conventionally taken to indicate good fit
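The indices can be computed from the chi-squares of the fitted model and of a baseline (independence) model. The sketch below uses one common set of single-group definitions; software differs in details (for example, N versus N - 1 in RMSEA, and multigroup adjustments), so treat it as illustrative. All input values are hypothetical.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA = sqrt(max(chi2 - df, 0) / (df * (n - 1))); single-group form."""
    return math.sqrt(max(chi2 - df, 0) / (df * (n - 1)))


def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative Fit Index relative to a baseline (independence) model."""
    d_m = max(chi2_m - df_m, 0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 if d_b == 0 else 1 - d_m / d_b


def nfi(chi2_m, chi2_b):
    """Normed Fit Index: proportional reduction in chi-square from the baseline."""
    return (chi2_b - chi2_m) / chi2_b


# Hypothetical results: model chi2 = 30 on 24 df, baseline chi2 = 800 on 36 df, N = 300
print(round(rmsea(30, 24, 300), 3), round(cfi(30, 24, 800, 36), 3), round(nfi(30, 800), 3))
```

These hypothetical values (RMSEA ≈ .029, CFI ≈ .992, NFI ≈ .963) would meet all the cutoffs listed above.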


Multigroup SEM

  • SEM assumes large sample sizes and multivariate normality of all variables

  • Robust as long as skewness and kurtosis are less than 3, sample size is probably > 100 per group (200 is better), or few parameters are being estimated (sample size as low as 70 per group may be OK with good distribution characteristics)



Multiple regression analysis

  • The test of the overall hypothesis that y is unrelated to all predictors, stated as

  • $H_0: \rho^2_{y \cdot 123\ldots} = 0$

  • $H_1: \rho^2_{y \cdot 123\ldots} > 0$

  • is tested by

  • $F = \dfrac{R^2_{y \cdot 123\ldots}/p}{(1 - R^2_{y \cdot 123\ldots})/(n - p - 1)}$

  • $F = \dfrac{SS_{reg}/p}{SS_e/(n - p - 1)}$
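The two F formulas agree because $R^2 = SS_{reg}/SS_y$ and $1 - R^2 = SS_e/SS_y$. A small check with hypothetical sums of squares (n = 50, p = 3):

```python
def f_from_r2(r2, n, p):
    """Overall F test from R-squared with p predictors and n cases."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


def f_from_ss(ss_reg, ss_e, n, p):
    """Overall F test from the regression and error sums of squares."""
    return (ss_reg / p) / (ss_e / (n - p - 1))


ss_y, ss_reg = 200.0, 120.0    # hypothetical sums of squares
ss_e = ss_y - ss_reg
n, p = 50, 3
print(f_from_r2(ss_reg / ss_y, n, p), f_from_ss(ss_reg, ss_e, n, p))
```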


Multiple regression analysis

SOURCE         df      Sum of Squares   Mean Square      F
x1, x2…        p       SSreg            SSreg / p        (SSreg / p) / (SSe / (n-p-1))
e (residual)   n-p-1   SSe              SSe / (n-p-1)
total          n-1     SSy              SSy / (n-1)


Multiple regression analysis predicting Depression

LOCUS OF CONTROL, SELF-ESTEEM, SELF-RELIANCE


Fig. 8.4: Venn diagram for multiple regression with two predictors and one outcome measure (regions labeled SSy, SSreg, SSe, ssx1, ssx2)


Fig. 8.5: Type I contributions (Venn diagram of SSy, SSx1, SSx2, and SSe; the x1 region is labeled Type I and the x2 region Type III)


Fig. 8.6: Type III unique contributions (Venn diagram of SSy, SSx1, SSx2, and SSe; both the x1 and x2 regions are labeled Type III)


Multiple Regression ANOVA table

SOURCE (Type I)   df    Sum of Squares   Mean Square     F
Model             2     SSreg            SSreg / 2       (SSreg / 2) / (SSe / (n-3))
x1                1     SSx1             SSx1 / 1        (SSx1 / 1) / (SSe / (n-3))
x2 | x1           1     SSx2|x1          SSx2|x1 / 1     (SSx2|x1 / 1) / (SSe / (n-3))
e                 n-3   SSe              SSe / (n-3)
total             n-1   SSy              SSy / (n-1)
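For two predictors, the Type I (sequential) entries can be computed from correlations: x1, entered first, receives $r_{y1}^2 SS_y$, and x2 receives only what it adds beyond x1, $(R^2 - r_{y1}^2) SS_y$. The sketch below builds the table under that approach; the correlations, SSy, and n are hypothetical inputs.

```python
def type_i_table(r_y1, r_y2, r_12, ss_y, n):
    """Sequential (Type I) ANOVA entries for two predictors, x1 entered first."""
    r2 = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    ss_reg = r2 * ss_y
    ss_x1 = r_y1 ** 2 * ss_y            # x1 on its own
    ss_x2_given_x1 = ss_reg - ss_x1     # increment from adding x2 after x1
    ss_e = ss_y - ss_reg
    ms_e = ss_e / (n - 3)
    # source: (df, sum of squares, F or None)
    return {
        "Model": (2, ss_reg, (ss_reg / 2) / ms_e),
        "x1": (1, ss_x1, ss_x1 / ms_e),
        "x2|x1": (1, ss_x2_given_x1, ss_x2_given_x1 / ms_e),
        "e": (n - 3, ss_e, None),
        "total": (n - 1, ss_y, None),
    }


table = type_i_table(0.74, 0.8, 0.4, ss_y=100.0, n=50)  # hypothetical inputs
for source, (df, ss, f) in table.items():
    print(source, df, round(ss, 2), None if f is None else round(f, 2))
```

Reversing the order of entry changes the x1 and x2 rows but not the Model, e, or total rows.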


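PATH DIAGRAM FOR REGRESSION

(Figure: X1 → Y with β = .5; X2 → Y with β = .6; r = .4 between X1 and X2; error e → Y with path .387)

$R^2 = \dfrac{r_{y1}^2 + r_{y2}^2 - 2 r_{y1} r_{y2} r_{12}}{1 - r_{12}^2} = \dfrac{.74^2 + .8^2 - 2(.74)(.8)(.4)}{1 - .4^2} = .85$

where .74 and .8 are the correlations of X1 and X2 with Y, and the error path is $\sqrt{1 - .85} = .387$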
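The standardized path coefficients and R² in the diagram can be recovered from the three correlations. The sketch assumes, as the R² formula implies, that .74 and .8 are the correlations of X1 and X2 with Y and that .4 is the correlation between X1 and X2.

```python
import math


def two_predictor_paths(r_y1, r_y2, r_12):
    """Standardized betas, R^2, and error path for Y regressed on X1 and X2."""
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    r2 = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / denom
    error_path = math.sqrt(1 - r2)   # path from e to Y
    return beta1, beta2, r2, error_path


b1, b2, r2, e = two_predictor_paths(0.74, 0.8, 0.4)
print(round(b1, 3), round(b2, 3), round(r2, 3), round(e, 3))
```

This reproduces the diagram's β = .5, β = .6, R² = .85, and error path .387. Equivalently, $R^2 = \beta_1 r_{y1} + \beta_2 r_{y2}$.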


Depression

(Figure: path model predicting DEPRESSION from LOC. CON., SELF-EST, and SELF-REL, with error e; R² = .60; coefficients shown in the diagram: .471, .4, -.317, -.186)


Shrinkage R2

  • Different definitions exist; ask which one is being used:

    • What is the population value for a sample R²?

    • $R^2_s = 1 - (1 - R^2)\dfrac{n-1}{n-k-1}$

    • What is the expected cross-validated R² from sample to sample?

    • $R^2_{sc} = 1 - (1 - R^2)\dfrac{n+k}{n-k}$
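A quick comparison of the two shrinkage formulas, implemented exactly as written on the slide. R² = .60 and k = 3 echo the depression example; n = 100 is a hypothetical sample size.

```python
def shrunken_r2(r2, n, k):
    """Estimate of the population R^2 (adjusted R^2)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def cross_validated_r2(r2, n, k):
    """Expected cross-validated R^2, using the slide's (n + k) / (n - k) form."""
    return 1 - (1 - r2) * (n + k) / (n - k)


r2, n, k = 0.60, 100, 3
print(round(shrunken_r2(r2, n, k), 4), round(cross_validated_r2(r2, n, k), 4))
```

The cross-validation estimate shrinks more than the population estimate, and both approach R² as n grows relative to k.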


Estimation Methods

  • Types of Estimation:

    • Ordinary Least Squares (OLS)

      • Minimize the sum of squared errors around the prediction line

    • Generalized Least Squares

      • A regression technique used when the error terms from an ordinary least squares regression display non-random patterns such as autocorrelation or heteroskedasticity

    • Maximum Likelihood


Maximum Likelihood Estimation

  • There is nothing visual about the maximum likelihood method, but it is a powerful method and, at least for large samples, very precise. Maximum likelihood estimation begins with writing a mathematical expression known as the Likelihood Function of the sample data. Loosely speaking, the likelihood of a set of data is the probability of obtaining that particular set of data, given the chosen probability distribution model. This expression contains the unknown model parameters. The values of these parameters that maximize the sample likelihood are known as the Maximum Likelihood Estimates, or MLEs. Maximum likelihood estimation is a totally analytic maximization procedure.

  • MLEs and Likelihood Functions generally have very desirable large-sample properties:

    • they become unbiased minimum variance estimators as the sample size increases

    • they have approximately normal distributions and approximate sample variances that can be calculated and used to generate confidence bounds

    • likelihood functions can be used to test hypotheses about models and parameters

  • There are only two drawbacks to MLEs, but they are important ones:

    • With small samples, MLEs may not be very precise and may even generate a line that lies above or below the data points; with small numbers of failures (fewer than 5, and sometimes fewer than 10), MLEs can be heavily biased and the large-sample optimality properties do not apply

    • Calculating MLEs often requires specialized software for solving complex non-linear equations. This is less of a problem as time goes by, as more statistical packages add MLE capability every year.
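For a normal distribution the maximum likelihood estimates have a closed form, which makes the large-sample bias property easy to see: the ML variance divides by n rather than n - 1, so it is biased downward by the factor (n - 1)/n, which approaches 1 as n grows. A minimal sketch with hypothetical data:

```python
import math
import statistics


def normal_log_likelihood(data, mu, sigma2):
    """Log-likelihood of independent normal observations with mean mu, variance sigma2."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)


data = [4.1, 5.3, 6.0, 4.8, 5.5, 6.2, 5.0]   # hypothetical sample

mu_hat = statistics.fmean(data)            # ML estimate of the mean
sigma2_hat = statistics.pvariance(data)    # ML estimate of the variance (divides by n)
s2 = statistics.variance(data)             # unbiased estimate (divides by n - 1)

print(round(mu_hat, 3), round(sigma2_hat, 3), round(s2, 3))
# The ML estimates give a higher log-likelihood than the unbiased variance does
print(normal_log_likelihood(data, mu_hat, sigma2_hat) >= normal_log_likelihood(data, mu_hat, s2))
```

For SEM models there is generally no closed form, and the likelihood is maximized numerically, which is where the specialized software mentioned above comes in.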


Outliers

  • Leverage (for a single predictor):

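  • $L_i = 1/n + (X_i - M_x)^2 / \sum x^2$ (min = 1/n, max = 1), where $\sum x^2 = \sum_j (X_j - M_x)^2$

  • Values much larger than 1/n should be of concern

  • Cook's $D_i = \sum_j (\hat{Y}_j - \hat{Y}_{j(i)})^2 / [(k+1)MS_{res}]$

    • the difference between predicted Y with and without case i, summed over all cases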
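A minimal sketch of both diagnostics for a single predictor, using only the standard library. Cook's D is computed literally, by refitting the regression without each case. The data are hypothetical, with the last case placed far from the mean of X so that it has high leverage.

```python
def simple_ols(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def leverage(x):
    """L_i = 1/n + (X_i - M_x)^2 / sum of squared deviations of X."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    return [1 / n + (xi - mx) ** 2 / sxx for xi in x]


def cooks_d(x, y):
    """Cook's D for each case, by refitting without that case."""
    n, k = len(x), 1  # one predictor
    a, b = simple_ols(x, y)
    fitted = [a + b * xi for xi in x]
    ms_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - k - 1)
    d = []
    for i in range(n):
        ai, bi = simple_ols(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        # Squared change in every predicted Y when case i is left out
        change = sum((fj - (ai + bi * xj)) ** 2 for fj, xj in zip(fitted, x))
        d.append(change / ((k + 1) * ms_res))
    return d


x = [1, 2, 3, 4, 5, 6, 7, 15]                      # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 20.0]
for xi, hi, di in zip(x, leverage(x), cooks_d(x, y)):
    print(xi, round(hi, 3), round(di, 3))
```

The closed form $D_i = e_i^2 h_i / [(k+1) MS_{res} (1 - h_i)^2]$, where $e_i$ is the residual and $h_i$ the leverage, gives the same values without refitting.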


Outliers

  • In SPSS Regression, Cook's distances and leverage values can be selected under the SAVE options

  • The result is new variables in your SPSS data set giving the values for each case

  • You can sort on either one to investigate the cases with the largest values

  • You can delete the cases with the largest values and recompute the regression to see whether the results change


t12   t13   t14   COO_1    LEV_1
63    50    44    .03855   .01520
42    50    68    .02422   .04943
41    55    46    .02065   .02010
56    55    52    .01915   .02349
56    60    57    .01696   .01056
41    39    41    .01689   .02435
77    39    65    .01525   .01520
52    65    54    .01448   .01607
39    39    65    .01425   .02289
30    45    60    .01242   .01346
53    60    68    .01133   .03147
52    55    68    .01060   .00693
55    39    41    .01047   .00512
42    80    68    .00918   .02459
59    80    68    .00907   .01098
48    65    46    .00885   .00160

