
Chapter 7 - PowerPoint PPT Presentation



PowerPoint Slideshow about ' Chapter 7' - nora-tyler



Chapter 7

Correlation, Bivariate Regression, and Multiple Regression

Pearson's Product Moment Correlation

• Correlation measures the association between two variables.

• Correlation quantifies the extent to which the mean, variation & direction of one variable are related to another variable.

• r ranges from +1 to -1.

• Correlation can be used for prediction.

• Correlation does not indicate the cause of a relationship.

• A scatter plot gives a visual description of the relationship between two variables.

• The line of best fit is the line that minimizes the sum of the squared vertical deviations (up or down) from the data points to the line.
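The definition of r can be made concrete with a short computation. Below is a minimal Python sketch of Pearson's product-moment correlation. It divides the sum of cross-products of deviations by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the toy data are hypothetical and are not taken from the slides.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0  (perfect positive)
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative)
print(pearson_r(x, [2, 1, 4, 3, 5]))   # 0.8  (strong, not perfect)
```

The sign gives the direction of the relationship, and values near +1 or -1 indicate points that lie close to a straight line.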

Will a Linear Fit Work?

[Figure: linear trend line y = 0.5246x - 2.2473, R² = 0.4259]

Linear Fit

[Figure: linear trend line y = 0.0012x - 1.0767, R² = 0.0035]
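A linear fit can miss a relationship that is strong but not straight. Below is a minimal Python sketch of a least-squares fit of Y = mX + b that also reports R². It uses two made-up datasets rather than the data behind the trend lines above, and the name `fit_line` is hypothetical. A perfectly U-shaped relationship gets a slope and an R² of zero even though Y is completely determined by X. This is why the scatter plot should be checked before relying on a linear fit.

```python
def fit_line(x, y):
    """Least-squares line y = m*x + b, returned with its R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    m = sxy / sxx               # slope minimizing the sum of squared residuals
    b = mean_y - m * mean_x     # the line passes through (mean_x, mean_y)
    ss_res = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return m, b, 1 - ss_res / ss_tot

x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * xi + 1 for xi in x]   # straight-line relationship
curved = [xi ** 2 for xi in x]      # U-shaped relationship

print(fit_line(x, linear))  # (2.0, 1.0, 1.0)
print(fit_line(x, curved))  # (0.0, 4.0, 0.0): no *linear* association
```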

Evaluating the Strength of a Correlation

• For prediction, an absolute value of r below .7 may produce unacceptably large errors, especially if the SD of X, of Y, or of both is large.

• As a general rule:

• Absolute value of r greater than or equal to .9 is good.

• Absolute value of r from .7 to .8 is moderate.

• Absolute value of r from .5 to .7 is low.

• Absolute value of r below .5 gives R² below .25 (less than 25% of the variance explained). Such correlations are poor and not useful for prediction.
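Squaring r makes these cutoffs concrete. R² is the proportion of variance in Y explained by X. The r values below are sample points only.

```latex
R^2 = r^2: \qquad 0.9^2 = 0.81, \qquad 0.7^2 = 0.49, \qquad 0.5^2 = 0.25, \qquad 0.3^2 = 0.09
```

Even a "low" correlation of .7 explains under half of the variance in Y.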

Significant Correlation?

If N is large (e.g., N = 90), even a correlation of .205 reaches statistical significance (at about p = .05).

How much variance in Y is X accounting for?

r = .205

R² = .042, so X accounts for only 4.2% of the variance in Y.

This will lead to poor predictions.

A 95% confidence interval will also show how poor the prediction is.
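The usual significance test for r is t = r·√(N - 2) / √(1 - r²), with N - 2 degrees of freedom. Below is a minimal Python sketch of that test; `t_for_r` is a hypothetical helper name. For r = .205 and N = 90 it gives t ≈ 1.965 on 88 df. Standard t tables put the two-tailed .05 critical value for 88 df at roughly 1.99, so this correlation sits right at the edge of significance while explaining only about 4% of the variance.

```python
import math

def t_for_r(r, n):
    """t statistic for testing H0: population correlation = 0 (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

r, n = 0.205, 90
t = t_for_r(r, n)
print(f"t = {t:.3f} on {n - 2} df; R^2 = {r * r:.3f}")
# t = 1.965 on 88 df; R^2 = 0.042
```

Statistical significance only says r is probably not zero. It says nothing about whether predictions will be accurate.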

The Venn diagram shows R², the amount of variance in Y that is explained by X.

R² = .64 (64%): variance in Y that is explained by X

Unexplained variance in Y: 1 - R² = .36 (36%)

The vertical distance (up or down) from a data point to the line of best fit is a RESIDUAL.

r = .845

R² = .714 (71.4%)

Y = mX + b

Y = .72 X + 13

Standard Error of Estimate (SEE)

[Figure labels: SD of Y; Prediction Errors]

The SEE is the SD of the prediction errors (residuals) when Y is predicted from X. It is used to build a confidence interval around a value predicted by the regression equation.
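Below is a minimal Python sketch of how the residuals and the SEE follow from a least-squares line. The helper name `residuals_and_see` and the toy data are hypothetical. The sketch divides the sum of squared residuals by n - 2, which is the convention regression output typically reports. Dividing by n - 2 reflects the two parameters estimated from the data (slope and intercept).

```python
import math

def residuals_and_see(x, y):
    """Fit Y = mX + b by least squares; return the residuals and the SEE."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    m = sxy / sxx
    b = mean_y - m * mean_x
    # Residual = observed Y minus predicted Y (vertical distance to the line)
    resid = [yi - (m * xi + b) for xi, yi in zip(x, y)]
    # SEE = SD of the residuals, using n - 2 degrees of freedom
    see = math.sqrt(sum(e * e for e in resid) / (n - 2))
    return resid, see

resid, see = residuals_and_see([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print([round(e, 2) for e in resid], round(see, 3))
# [0.6, -1.2, 1.0, -0.8, 0.4] 1.095
```

The residuals from a least-squares line always sum to zero, so their SD measures the typical size of a prediction error.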

Bivariate Linear Regression

Linear Regression: Statistics

Enter the variables

Click the Statistics button

Linear Regression: Statistics Settings

Linear Regression: Output

71.5% of the variance in Y is explained by X.

Correlation between X and Y: r = .845.

Regression Output

Prediction Equation

Y = .726 (X) + 12.859

95% CI

Y = .726 (X) + 12.859 ± 1.96 (6.06)

The SEE is used to compute confidence intervals for the prediction equation.
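Below is a minimal Python sketch that applies the prediction equation Y = .726(X) + 12.859 ± 1.96(6.06) from the output above. It assumes a hypothetical X of 50, which is not a value from the slides, and the function name `prediction_interval` is also hypothetical. It follows the slide's simple ±1.96·SEE rule. An exact prediction interval would use a t value and widen slightly as X moves away from its mean.

```python
def prediction_interval(x, slope=0.726, intercept=12.859, see=6.06, z=1.96):
    """Approximate 95% interval: predicted Y +/- z * SEE."""
    y_hat = slope * x + intercept
    half_width = z * see
    return y_hat - half_width, y_hat, y_hat + half_width

low, y_hat, high = prediction_interval(50)  # X = 50 is a hypothetical score
print(f"Predicted Y = {y_hat:.2f}, 95% CI {low:.2f} to {high:.2f}")
# Predicted Y = 49.16, 95% CI 37.28 to 61.04
```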

Example of a 95% Confidence Interval

Both r and the SD of Y are critical to the accuracy of prediction.

If the SD of Y is small and r is large, prediction errors will be small.

If the SD of Y is large and r is small, prediction errors will be large.

We are 95% confident that the true Y value falls between 45.1 and 67.3.
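The dependence on both r and the SD of Y can be seen from the textbook approximation SEE ≈ SD_Y·√(1 - r²), which holds for large samples. Below is a minimal Python sketch that evaluates it for a few hypothetical combinations of SD of Y and r. The name `approx_see` is also hypothetical.

```python
import math

def approx_see(sd_y, r):
    """Large-sample approximation: SEE = SD_Y * sqrt(1 - r^2)."""
    return sd_y * math.sqrt(1 - r * r)

# Hypothetical combinations of SD of Y and r
for sd_y, r in [(5, 0.9), (5, 0.3), (20, 0.9), (20, 0.3)]:
    see = approx_see(sd_y, r)
    print(f"SD_Y = {sd_y:>2}, r = {r}: SEE = {see:.2f}, "
          f"95% half-width = {1.96 * see:.2f}")
```

The smallest SEE comes from a small SD of Y combined with a large r. The largest comes from a large SD of Y combined with a small r.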

Multiple Regression

• Multiple regression is used to predict one Y (dependent) variable from two or more X (independent) variables.

• The advantages of multiple regression over bivariate regression are that it:

• Provides a lower standard error of estimate.

• Determines which variables contribute to the prediction and which do not.

Multiple Regression

Y = b1X1 + b2X2 + b3X3 + … + bnXn + C

• b1, b2, b3, … bn are coefficients that give weight to the independent variables according to their relative contribution to the prediction of Y.

• X1, X2, X3, … Xn are the predictors (independent variables).

• C is a constant, similar to the Y intercept.

• Body Fat = Abdominal + Tricep + Thigh
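Below is a minimal pure-Python sketch of estimating b1, b2, b3 and C by least squares, solving the normal equations with Gaussian elimination. The helper names `solve` and `fit_multiple` are hypothetical. The three predictor columns are made-up numbers standing in for the three predictors in the body-fat example, not real measurements. Y is generated exactly from known coefficients so the fit can be checked. In practice a statistics package or a linear-algebra library would do this step.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_multiple(xs, y):
    """Least-squares fit of Y = b1*X1 + ... + bk*Xk + C; returns ([b1..bk], C)."""
    n = len(y)
    cols = xs + [[1.0] * n]  # the column of ones estimates the constant C
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    coef = solve(xtx, xty)
    return coef[:-1], coef[-1]

# Hypothetical predictor columns (not real skinfold data)
x1 = [20, 25, 30, 35, 28, 40]
x2 = [10, 14, 12, 18, 15, 20]
x3 = [15, 12, 20, 22, 18, 25]
y = [0.4 * a + 0.3 * b + 0.2 * c + 1.5 for a, b, c in zip(x1, x2, x3)]

b, c = fit_multiple([x1, x2, x3], y)
print([round(v, 3) for v in b], round(c, 3))  # [0.4, 0.3, 0.2] 1.5
```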

List the variables and the order in which to enter them into the equation

• X2 has the biggest area (C) of overlap with Y, so it enters first.

• X1 enters next, because its area (A) is bigger than area (E). Both A and E are unique, not shared with C.

• X3 enters next, because it uniquely adds area (E).

• X4 is not related to Y so it is NOT in the equation.
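The entry order above can be sketched as forward selection. At each step, the sketch adds the predictor that raises R² the most and stops when the gain becomes negligible. This is a minimal pure-Python sketch, not necessarily the procedure a given statistics package uses. Software stepwise methods usually enter variables by an F or p-value criterion instead, and the R² threshold `min_gain` here is a hypothetical choice. The data are hypothetical: the predictors are rows of an 8×8 Hadamard matrix, so they are mutually uncorrelated and each one's share of R² is unique. Y depends most on X2, then X1, then X3, and not at all on X4.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a @ x = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[p] = m[p], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def r_squared(cols, y):
    """R^2 of the least-squares fit of y on the given columns plus a constant."""
    n = len(y)
    design = cols + [[1.0] * n]
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in design] for ci in design]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in design]
    coef = solve(xtx, xty)
    pred = [sum(b * col[i] for b, col in zip(coef, design)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def forward_select(predictors, y, min_gain=0.01):
    """Repeatedly add the predictor with the largest R^2 gain; stop below min_gain."""
    chosen, current = [], 0.0
    remaining = dict(predictors)
    while remaining:
        gains = {name: r_squared([predictors[c] for c in chosen] + [col], y) - current
                 for name, col in remaining.items()}
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break  # e.g. a predictor unrelated to Y adds (almost) nothing
        chosen.append(best)
        current += gains[best]
        del remaining[best]
    return chosen, current

# Hypothetical, mutually uncorrelated predictors (Hadamard rows)
X1 = [1, -1, 1, -1, 1, -1, 1, -1]
X2 = [1, 1, -1, -1, 1, 1, -1, -1]
X3 = [1, -1, -1, 1, 1, -1, -1, 1]
X4 = [1, 1, 1, 1, -1, -1, -1, -1]
noise = [1, -1, 1, -1, -1, 1, -1, 1]
y = [10 + 2 * a + 3 * b + 1 * c + 0.5 * e for a, b, c, e in zip(X1, X2, X3, noise)]

order, r2 = forward_select({"X1": X1, "X2": X2, "X3": X3, "X4": X4}, y)
print(order, round(r2, 3))  # ['X2', 'X1', 'X3'] 0.982
```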

Ideal Relationship Between Predictors and Y

Each variable accounts for unique variance in Y

Very little overlap of the predictors

Order to enter?

X1, X3, X4, X2, X5