
Soc 3306a Multiple Regression




  1. Soc 3306a Multiple Regression Testing a Model and Interpreting Coefficients

  2. Assumptions for Multiple Regression • Random sample • Distribution of y is relatively normal • Check histogram for DV • Standard deviation of y is constant for each value of x • Check scatterplots (Figure 1)
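A minimal SPSS syntax sketch of these two checks, assuming a hypothetical DV named income and a hypothetical IV named educ:

  * Histogram to check that the distribution of the DV is relatively normal.
  GRAPH
    /HISTOGRAM=income.
  * Scatterplot to check that the spread of y is roughly constant across x.
  GRAPH
    /SCATTERPLOT(BIVAR)=educ WITH income.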

  3. Problems to Watch For… • Violation of assumptions, especially normality of DV and heteroscedasticity (Figure 1) • Simpson’s Paradox (Figure 3) • Multicollinearity (Figure 1 and 2)

  4. Building a Model in SPSS (Figure 2) • Should be driven by your theory • You can add your variables one at a time, checking at each step whether there is a significant improvement in the explanatory power of the model. Use Method=Enter (see the syntax sketch below). • In Block 1, enter your main IV. Under Statistics, ask for R2 change. • Click Next and enter the additional IV. • Check the Change Statistics in the Model Summary, and watch changes in R2 and coefficients (esp. partial correlations) carefully.
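The equivalent syntax for such a two-block model, using the same hypothetical variables (income as DV, educ in Block 1, spouse_educ in Block 2):

  * CHANGE requests the R2-change statistics; ZPP adds part and partial correlations.
  REGRESSION
    /STATISTICS COEFF R ANOVA CHANGE ZPP
    /DEPENDENT income
    /METHOD=ENTER educ
    /METHOD=ENTER spouse_educ.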

  5. Multiple Correlation R (Figure 2) • Measures the correlation of all IVs with the DV • It is the correlation of the observed y values with the predicted y values • Always positive (between 0 and +1)

  6. Coefficient of Determination R2 (Figure 2) • Measures the proportional reduction in error (PRE) in predicting y using the prediction equation (taking x into account) rather than the mean of y • R2 = (TSS – SSE)/TSS • This is the explained variation in y

  7. TSS, SSE and RSS • TSS = Total variability around the mean of y • SSE = Residual sum of squares or error • This is the unexplained variability • RSS = TSS – SSE • This is the regression sum of squares • The explained variability in y
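A worked example with hypothetical numbers: if TSS = 500 and SSE = 200, then RSS = 500 - 200 = 300, and R2 = (500 - 200)/500 = .60, i.e. the model explains 60% of the variation in y.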

  8. F Statistic and p-value • Look at the ANOVA table (Figure 2) • F is the ratio of the regression mean square (RSS/df) and the residual (error) mean square (SSE/df) • The larger the F, the smaller the p-value • Small p-value (<.05, .01, or .001) is strong evidence for the significance of the model
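Continuing the hypothetical numbers above with n = 100 cases and k = 2 predictors: F = (RSS/k) / (SSE/(n - (k+1))) = (300/2) / (200/97) ≈ 72.8, which would be significant at p < .001.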

  9. Slope (b), β, t-statistic and p-value (Coefficients Table in Figure 2) • The slope is measured in the actual units of the variables: the change in y for a 1-unit increase in x • In multiple regression, each slope is controlled for all other x variables • β is the standardized slope and can be used to compare strength across predictors • t = b/se with df = n - (k+1), note: k = # of predictors • A small p-value indicates a significant relationship with y, controlling for the other variables in the model • Note: in bivariate regression, t2 = F and β = r
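For example, a hypothetical slope of b = 1.2 with a standard error of 0.4 gives t = 1.2/0.4 = 3.0 with df = 100 - (2+1) = 97, which is significant at p < .01.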

  10. Multicollinearity (Figure 1 and 2) • Two independent variables in the model, e.g. x1 and x2, are each correlated with y but are also highly correlated (above roughly .600-.700) with each other • Both explain the same variation in y, so adding x2 to the model does not increase its explanatory value (R, R2) • Check the correlations between IVs in the correlation matrix • Ask for and check the partial correlations in the multiple regression (Part and Partial under Statistics; see the syntax sketch below) • If the partial correlation in the multiple model is much lower than the bivariate correlation, multicollinearity is indicated
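A syntax sketch of these checks, again with hypothetical variable names; TOL additionally requests tolerance and VIF, a further standard multicollinearity diagnostic:

  * Bivariate correlations among the IVs and the DV.
  CORRELATIONS
    /VARIABLES=income educ spouse_educ.
  * ZPP requests zero-order, part and partial correlations; TOL requests tolerance and VIF.
  REGRESSION
    /STATISTICS COEFF R ANOVA ZPP TOL
    /DEPENDENT income
    /METHOD=ENTER educ spouse_educ.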

  11. Types of Multivariate Relationships • 1. Spuriousness • 2. Causal chains (intervening variable) • 3. Multiple causes (independent effects) • 4. Suppressor variables • 5. Interaction effects • Multiple regression can test for all of these

  12. 1. Spuriousness (Figure 3) • A spurious relationship means the model is incorrectly specified • Indicated by a change in the sign of the partial correlations • Can also check the partial regression plots (ask for all partial plots under Plots; see the syntax sketch below) • The bivariate relationship between acceleration time and vehicle weight was negative (as weight went up, time to accelerate to 60 mph went down), which makes no sense! • When horsepower was added to the model, the partial relationship of Acc x Wt became positive • When a relationship changes sign (i.e. from – to +) or disappears, a spurious relationship may be present • In this case, variation in both Acceleration and Weight is caused by Horsepower • The situation in Figure 3 is called Simpson’s Paradox
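A syntax sketch for requesting all partial regression plots, using hypothetical names accel, weight and horse for the vehicle-example variables:

  * PARTIALPLOT ALL produces a partial regression plot for each IV.
  REGRESSION
    /DEPENDENT accel
    /METHOD=ENTER weight horse
    /PARTIALPLOT ALL.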

  13. 2. Causal Chains • An intervening variable changes the relationship between x and y • A relationship exists between X1 and Y at the bivariate level, but disappears with the addition of control variable(s) • Results can look the same as spuriousness • The major difference is interpretation (see Agresti Ch. 11) • Need to rely on theory • Bivariate: X1 → Y • Multivariate: X1 → X2 → Y • Although the effect of X1 on Y disappears, X1 is still part of the “causal explanation” as an “indirect” cause

  14. 2. Causal Chains (cont.) • Two possibilities for causal chains: • If the slope of X1 is no longer significant after introducing X2, X1 has an indirect causal effect: X1 → X2 → Y • Or, if the slope is weaker yet still significant, X1 has both an indirect and a direct causal effect: X1 → X2 → Y together with X1 → Y

  15. 3. Multiple Causes • Y (DV) has multiple causes • Independent variables have relatively separate effects on the dependent variable • Introduction of controls does little to change bivariate correlations and the bivariate slopes stay similar • Compare bivariate to partial correlations in multiple model and compare slopes in the bivariate and multiple models

  16. 4. Suppressor Variables • Initially the slope between X1 and Y is non-significant • When the control variable X2 is added, the slope becomes significant • X2 is associated with both X1 and Y, which hides the initial relationship: X1 ← X2 → Y

  17. 5. Interactions • Not all IV effects on Y are independent; IVs often interact with one another in their effect on Y • Usually suggested by theory • An interaction is present when you enter a control variable and the original bivariate association differs by level of the control variable • Does the slope of X1 differ by category of X2 when explaining Y? • Can test this by introducing “interaction terms” into the multiple regression model (for an example, see optional reading: Agresti Ch. 11, pp. 340-343)

  18. Interactions (cont.) • The interaction term is the cross-product of X1 and X2 and is entered into the model together with X1 and X2 (go to Transform > Compute Variable…; see the syntax sketch below) • The regression model becomes: • E(y) = a + b1x1 + b2x2 + b3x1x2 • This produces two main effects and an interaction effect • If the interaction is not significant, drop it from the model, since the effects of X1 and X2 are then independent of one another, and interpret the main effects • See Figure 4
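In syntax, the cross-product term and the interaction model might look like this (variable names hypothetical):

  * Create the cross-product (interaction) term.
  COMPUTE educ_x_spouse = educ * spouse_educ.
  EXECUTE.
  * Main effects in Block 1, interaction term in Block 2.
  REGRESSION
    /STATISTICS COEFF R ANOVA CHANGE
    /DEPENDENT income
    /METHOD=ENTER educ spouse_educ
    /METHOD=ENTER educ_x_spouse.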

  19. Interpreting Interactions • If the interaction slope is significant, the main effects should be interpreted in the context of the interaction model. See Figure 5. • E(y) = a + b1x1 + b2x2 + b3x1x2 • Income (Y) is determined by Respondent’s Education (x1), Spouse’s Education (x2) and the interaction x1x2 • By setting x2 at distinct levels (e.g. 10 and 20 years), you can calculate or graph the changing slopes for x1 (again, see Agresti)
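For example, with hypothetical estimates E(y) = 5 + 2x1 + 1x2 + .3x1x2, the slope for x1 at a given level of x2 is b1 + b3x2: at x2 = 10 it is 2 + .3(10) = 5, and at x2 = 20 it is 2 + .3(20) = 8, so respondent’s education has a stronger effect on income when spouse’s education is higher.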

  20. A Few Tips for SPSS Mini 6 • Review the relevant PowerPoint slides and accompanying handouts • Read the assignment over carefully before starting • Build your model carefully, one block at a time • Watch for spurious relationships; revise the model if needed • Drop any unnecessary variables (e.g. those showing multicollinearity, or added variables that do not appreciably increase R2) • Keep your model simple: aim for good explanatory value with the fewest variables possible
