Week 14. Chapter 16 – Partial Correlation and Multiple Regression and Correlation
In This Presentation:
Partial correlation
Multiple regression
Using the multiple regression line to predict Y
Multiple correlation coefficient (R²)
Limitations of multiple regression and correlation
Multiple Regression and Correlation allow us to:
Disentangle and examine the separate effects of the independent variables.
Use all of the independent variables to predict Y.
Assess the combined effects of the independent variables on Y.
Partial Correlation measures the correlation between X and Y controlling for Z
Comparing the bivariate (“zero-order”) correlation to the partial (“first-order”) correlation allows us to determine if the relationship between X and Y is direct, spurious, or intervening
Interaction cannot be determined with partial correlations
Note the subscripts in the symbol for a partial correlation coefficient, r_yx.z, which indicate that the correlation is between Y and X while controlling for Z
The table below lists husbands’ hours of housework per week (Y), number of children (X), and husbands’ years of education (Z) for a sample of 12 dual-career households
A correlation matrix appears below
The bivariate (zero-order) correlation between husbands’ housework and number of children is +0.50
This indicates a positive relationship
Calculating the partial (first-order) correlation between husbands’ housework and number of children controlling for husbands’ years of education yields +0.43
Comparing the bivariate correlation (+0.50) to the partial correlation (+0.43) finds little change
The relationship between number of children and husbands’ housework changes very little when husbands’ education is controlled
Therefore, we have evidence of a direct relationship
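The first-order partial can be computed directly from the three zero-order correlations. A minimal sketch in Python, assuming illustrative zero-order correlations with the control variable (the slides report only r_yx = +0.50 and the resulting partial of +0.43):

```python
import math

def partial_corr(r_yx, r_yz, r_xz):
    """First-order partial correlation of Y and X, controlling for Z."""
    return (r_yx - r_yz * r_xz) / (
        math.sqrt(1 - r_yz**2) * math.sqrt(1 - r_xz**2)
    )

# Zero-order correlation from the housework example
r_yx = 0.50
# Hypothetical zero-order correlations with Z (husbands' education);
# these are not given in the text.
r_yz = -0.30
r_xz = -0.47

print(round(partial_corr(r_yx, r_yz, r_xz), 2))
```

Note that if Z is uncorrelated with both X and Y, the partial simply equals the zero-order correlation.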
Previously, the bivariate regression equation was: Y = a + bX
In the multivariate case, the regression equation becomes:
Y = a + b1X1 + b2X2
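To make the multivariate equation concrete, the coefficients a, b1, and b2 can be estimated by ordinary least squares. A minimal sketch on made-up data (all variable values here are hypothetical, not from the text):

```python
import numpy as np

# Made-up data: Y constructed from known coefficients (a=1.0, b1=0.5, b2=2.0)
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
Y = 1.0 + 0.5 * X1 + 2.0 * X2

# Design matrix with a column of 1s for the intercept a
X = np.column_stack([np.ones_like(X1), X1, X2])
a, b1, b2 = np.linalg.lstsq(X, Y, rcond=None)[0]

# Using the multiple regression line to predict Y for a new case
y_pred = a + b1 * 6.0 + b2 * 2.0
print(round(a, 2), round(b1, 2), round(b2, 2), round(y_pred, 2))
```

Because the made-up data contain no error, least squares recovers the generating coefficients exactly.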
Above is the model summary, which reports several important statistics. It gives R and R² for the regression of Y (female life expectancy) on the two predictors. R is .905, a very high correlation. R² tells us what proportion of the variation in female life expectancy is explained by the two predictors: a very high .818. The summary also gives the standard error of the estimate, which we can use to put confidence intervals around the unstandardized regression coefficients.
Next we look at the F test of the significance of the regression equation, Y = .342 X1 + .636 X2. Is this so much better a predictor of female life expectancy (Y) than simply using the mean of Y that the difference is statistically significant? The F test is the ratio of the mean square for the regression to the mean square for the residual (the departures of the actual Y scores from what the regression equation predicts). In this case F is very large and significant at p < .001, so it is reasonable to conclude that our regression equation is a significantly better predictor than the mean of Y.
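The F ratio itself is straightforward to compute from the ANOVA-table sums of squares. A sketch with made-up values (the slide's actual table is not reproduced here):

```python
# Hypothetical ANOVA-table quantities; the actual values are not shown above.
ss_regression = 800.0   # variation explained by the regression equation
ss_residual = 180.0     # residual (unexplained) variation
k = 2                   # number of predictors
n = 50                  # sample size

ms_regression = ss_regression / k        # mean square for regression
ms_residual = ss_residual / (n - k - 1)  # mean square for residual
F = ms_regression / ms_residual
print(round(F, 2))
```

A large F relative to its critical value (df = k and n − k − 1) leads to the same conclusion as in the slide: the equation predicts better than the mean of Y.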
Finally, your output provides confidence intervals around the unstandardized regression coefficients. Thus we can say with 95% confidence that the unstandardized weight to apply to daily calorie intake to predict female life expectancy ranges between .004 and .010, and that the unstandardized weight to apply to percentage of people who read ranges between .247 and .383.
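A 95% confidence interval for an unstandardized coefficient is b ± t·SE(b). A sketch with hypothetical values for the calorie-intake coefficient (the output reports only the resulting interval, not b or its standard error):

```python
# Hypothetical values; the slide gives only the resulting interval.
b = 0.007        # unstandardized coefficient for daily calorie intake
se_b = 0.0015    # standard error of b
t_crit = 2.0     # approximate two-tailed 95% critical t for large df

lower = b - t_crit * se_b
upper = b + t_crit * se_b
print(round(lower, 3), round(upper, 3))
```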
Multiple regression and correlation are among the most powerful techniques available to researchers. But powerful techniques have high demands.
These techniques require:
Every variable is measured at the interval-ratio level
Each independent variable has a linear relationship with the dependent variable
Independent variables do not interact with each other
Independent variables are uncorrelated with each other
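One quick diagnostic for the last requirement is to inspect the correlation between the independent variables themselves. A sketch using numpy on illustrative data (the 0.8 cutoff is a rule of thumb, not a formal test):

```python
import numpy as np

# Illustrative predictor data: x2 is nearly 2 * x1, so the two are collinear
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

r = np.corrcoef(x1, x2)[0, 1]
print(round(r, 2))
if abs(r) > 0.8:  # illustrative threshold
    print("warning: predictors are highly correlated (multicollinearity)")
```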
When these requirements are violated (as they often are), these techniques will produce biased and/or inefficient estimates. There are more advanced techniques available to researchers that can correct for violations of these requirements. Such techniques are beyond the scope of this text.