Week 14

Chapter 16 – Partial Correlation and Multiple Regression and Correlation

Chapter 16

Partial Correlation and Multiple Regression and Correlation

In This Presentation

Partial correlations

Multiple regression

Using the multiple regression line to predict Y

Multiple correlation coefficient (R²)

Limitations of multiple regression and correlation

Introduction

Multiple Regression and Correlation allow us to:

Disentangle and examine the separate effects of the independent variables.

Use all of the independent variables to predict Y.

Assess the combined effects of the independent variables on Y.

Partial Correlation

Partial Correlation measures the correlation between X and Y controlling for Z

Comparing the bivariate (“zero-order”) correlation to the partial (“first-order”) correlation allows us to determine if the relationship between X and Y is direct, spurious, or intervening

Interaction cannot be determined with partial correlations

Partial Correlation

Note the subscripts in the symbol for a partial correlation coefficient:

r_xy·z

which indicates that the correlation coefficient is for X and Y controlling for Z
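
As general background (not from the slides), the standard formula for a first-order partial correlation combines the three zero-order correlations among X, Y, and Z:

```latex
r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\,r_{yz}}{\sqrt{1 - r_{xz}^{2}}\,\sqrt{1 - r_{yz}^{2}}}
```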

Partial Correlation

Example

The table below lists husbands’ hours of housework per week (Y), number of children (X), and husbands’ years of education (Z) for a sample of 12 dual-career households

Partial Correlation

Example

A correlation matrix appears below

The bivariate (zero-order) correlation between husbands’ housework and number of children is +0.50

This indicates a positive relationship

Partial Correlation

Example

Calculating the partial (first-order) correlation between husbands’ housework and number of children controlling for husbands’ years of education yields +0.43
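
A minimal sketch of this calculation in Python, assuming the three variables are available as numeric arrays (the function and argument names are hypothetical):

```python
import numpy as np

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Usage: partial_corr(children, housework, education) for the 12 households
```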

Partial Correlation

Example

Comparing the bivariate correlation (+0.50) with the partial correlation (+0.43) shows little change

The relationship between number of children and husbands' housework remains essentially the same when husbands' education is controlled

Therefore, we have evidence of a direct relationship

Multiple Regression

Previously, the bivariate regression equation was:

Y = a + bX

In the multivariate case, the regression equation becomes:

Y = a + b1X1 + b2X2

Multiple Regression

Y = a + b1X1 + b2X2

Notation

  • a is the Y intercept, where the regression line crosses the Y axis
  • b1 is the partial slope for X1 on Y
    • b1 indicates the change in Y for a one-unit change in X1, controlling for X2
  • b2 is the partial slope for X2 on Y
    • b2 indicates the change in Y for a one-unit change in X2, controlling for X1
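
A short least-squares illustration of estimating a, b1, and b2 in Python; the data are simulated stand-ins, not the values from the example:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12
X1 = rng.normal(size=n)                       # e.g., number of children (simulated)
X2 = rng.normal(size=n)                       # e.g., husbands' education (simulated)
Y = 1.0 + 0.5 * X1 - 0.3 * X2 + rng.normal(scale=0.1, size=n)

A = np.column_stack([np.ones(n), X1, X2])     # design matrix: constant, X1, X2
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)  # ordinary least squares
a, b1, b2 = coef                              # Y intercept and the two partial slopes
print(f"Y = {a:.2f} + {b1:.2f}*X1 + {b2:.2f}*X2")
```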
Multiple Regression using SPSS
  • Suppose we are interested in the link between Daily Calorie Intake and Female Life Expectancy in third world countries
  • Suppose further that we wish to look at other variables that might predict female life expectancy
    • One way to do this is to add additional variables to the equation and conduct a multiple regression analysis
    • E.g., literacy rates, on the assumption that those who can read are able to access health and medical information
Multiple Regression using SPSS: Steps to Set Up the Analysis
  • In Data Editor go to Analyze/ Regression/ Linear and click Reset
  • Put Average Female Life Expectancy into the Dependent box
  • Put Daily Calorie Intake and People who Read % into the Independents box
  • Under Statistics, select Estimates, Confidence Intervals, Model Fit, Descriptives, Part and Partial Correlation, R Square Change, Collinearity Diagnostics, and click Continue
  • Under Options, check Include Constant in the Equation, click Continue and then OK
  • Compare your output to the next several slides
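
For readers working outside SPSS, a rough Python equivalent of the same analysis using statsmodels; the data values and column names below are illustrative placeholders, not the actual SPSS file:

```python
import pandas as pd
import statsmodels.api as sm

# Illustrative data (hypothetical values and column names)
df = pd.DataFrame({
    "life_exp_f": [72, 68, 75, 60, 80, 66, 74, 63],
    "calories":   [2800, 2500, 3000, 2100, 3200, 2400, 2900, 2200],
    "literacy":   [85, 70, 90, 50, 95, 60, 88, 55],
})

X = sm.add_constant(df[["calories", "literacy"]])  # include the constant (intercept)
model = sm.OLS(df["life_exp_f"], X).fit()
print(model.summary())   # coefficients, 95% CIs, R square, F test
```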
Interpreting Your SPSS Multiple Regression Output
  • First let’s look at the zero-order (pairwise) correlations between Average Female Life Expectancy (Y), Daily Calorie Intake (X1) and People who Read (X2). Note that these are .776 for Y with X1, .869 for Y with X2, and .682 for X1 with X2

r YX1

r X1X2

r YX2
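
The same kind of zero-order matrix is easy to reproduce in Python with numpy (the arrays below are made-up stand-ins for the three variables):

```python
import numpy as np

y  = np.array([72.0, 68.0, 75.0, 60.0, 80.0, 66.0])  # female life expectancy (illustrative)
x1 = np.array([2800, 2500, 3000, 2100, 3200, 2400])  # daily calorie intake (illustrative)
x2 = np.array([85, 70, 90, 50, 95, 60])              # % who read (illustrative)

print(np.corrcoef([y, x1, x2]))  # 3 x 3 matrix of zero-order correlations
```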

Examining the Regression Weights
  • Above are the raw (unstandardized) and standardized regression weights for the regression of female life expectancy on daily calorie intake and percentage of people who read.
  • The standardized regression coefficient (beta weight) for daily caloric intake is .342.
  • The beta weight for percentage of people who read is much larger, .636.
    • What this weight means is that for every one standard deviation increase in percentage of people who read, Y (female life expectancy) is predicted to increase by .636 standard deviations, holding daily calorie intake constant
    • Note that both the beta coefficients are significant at p < .001
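
As a general aside (standard OLS algebra, not specific to this output), each beta weight is just the unstandardized slope rescaled by the ratio of standard deviations:

```python
import numpy as np

def beta_weights(b, X, y):
    """Beta weights from unstandardized slopes: beta_k = b_k * s_Xk / s_y.

    b: unstandardized partial slopes, one per predictor
    X: (n, k) array of predictor scores; y: (n,) array of outcome scores
    """
    return np.asarray(b) * X.std(axis=0, ddof=1) / y.std(ddof=1)
```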
R, R Square, and the SEE

Above is the model summary, which contains some important statistics. It gives us R and R square for the regression of Y (female life expectancy) on the two predictors. R is .905, a very high multiple correlation. R square tells us what proportion of the variation in female life expectancy is explained by the two predictors, a very high .818. The summary also gives the standard error of the estimate (SEE), which measures the typical size of the prediction errors and can be used to put confidence intervals around predicted values of Y
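
Both statistics come directly from the residuals; a minimal sketch, where k is the number of predictors:

```python
import numpy as np

def r2_and_see(y, y_hat, k):
    """R square and the standard error of the estimate for a model with k predictors."""
    ss_res = np.sum((y - y_hat) ** 2)         # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)    # total sum of squares
    r2 = 1 - ss_res / ss_tot                  # proportion of variation explained
    see = np.sqrt(ss_res / (len(y) - k - 1))  # standard error of the estimate
    return r2, see
```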

F Test for the Significance of the Regression Equation

Next we look at the F test of the significance of the regression equation (in standardized form, Y = .342 X1 + .636 X2). Is this so much better a predictor of female life expectancy (Y) than simply using the mean of Y that the difference is statistically significant? The F test is a ratio of the mean square for the regression equation to the mean square for the residual (the departures of the actual scores on Y from what the regression equation predicted). In this case we have a very large value of F, which is significant at p < .001. Thus it is reasonable to conclude that our regression equation is a significantly better predictor than the mean of Y.
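
In sum-of-squares terms (the standard definition), with k predictors and n cases:

```latex
F = \frac{MS_{reg}}{MS_{res}} = \frac{SS_{reg}/k}{SS_{res}/(n - k - 1)}, \qquad df = (k,\ n - k - 1)
```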

Confidence Intervals around the Regression Weights

Finally, your output provides confidence intervals around the unstandardized regression coefficients. Thus we can say with 95% confidence that the unstandardized weight applied to daily calorie intake to predict female life expectancy lies between .004 and .010, and that the unstandardized weight applied to percentage of people who read lies between .247 and .383
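
Each interval has the usual form: the unstandardized coefficient plus or minus a t critical value times the coefficient's standard error:

```latex
b_k \pm t_{\alpha/2,\ n-k-1} \times SE(b_k)
```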

Limitations

Multiple regression and correlation are among the most powerful techniques available to researchers. But powerful techniques make heavy demands.

These techniques require that:

Every variable is measured at the interval-ratio level

Each independent variable has a linear relationship with the dependent variable

Independent variables do not interact with each other

Independent variables are uncorrelated with each other

When these requirements are violated (as they often are), these techniques will produce biased and/or inefficient estimates. There are more advanced techniques available to researchers that can correct for violations of these requirements. Such techniques are beyond the scope of this text.