Correlation and Regression 9-2 / 9.3

Presentation Transcript


  1. Correlation and Regression 9-2 / 9.3

  2. Definition • Linear Correlation Coefficient r measures the strength of the linear relationship between paired x and y values in a sample:
      r = ( nΣxy − (Σx)(Σy) ) / ( √( nΣx² − (Σx)² ) · √( nΣy² − (Σy)² ) )

  3. Formulas for b0 and b1
      b0 = ( (Σy)(Σx²) − (Σx)(Σxy) ) / ( nΣx² − (Σx)² )   (y-intercept)
      b1 = ( nΣxy − (Σx)(Σy) ) / ( nΣx² − (Σx)² )   (slope)
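As a rough sketch of how these summation formulas translate into code (plain Python, no libraries; the function and variable names are illustrative, not from the slides):

```python
from math import sqrt

def correlation_and_regression(xs, ys):
    """Compute r, b0, b1 from the summation formulas on slides 2 and 3."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    r = (n * sum_xy - sum_x * sum_y) / (
        sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2))
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)        # slope
    b0 = (sum_y * sum_x2 - sum_x * sum_xy) / (n * sum_x2 - sum_x ** 2)   # y-intercept
    return r, b0, b1
```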

  4. Review Calculations: Data from the Garbage Project
      x = Plastic (lb):     0.27  1.41  2.19  2.83  2.19  1.81  0.85  3.05
      y = Household size:      2     3     3     6     4     2     1     5
      Find the correlation and the regression equation (line of best fit).

  5. Data from the Garbage Project (review calculations, continued)
      x = Plastic (lb):     0.27  1.41  2.19  2.83  2.19  1.81  0.85  3.05
      y = Household size:      2     3     3     6     4     2     1     5
      Using a calculator: b0 = 0.549, b1 = 1.48, so ŷ = 0.549 + 1.48x, and r = 0.842
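A quick cross-check of these calculator results, assuming NumPy is available (np.corrcoef and np.polyfit are standard NumPy functions):

```python
import numpy as np

x = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]  # plastic (lb)
y = [2, 3, 3, 6, 4, 2, 1, 5]                          # household size

r = np.corrcoef(x, y)[0, 1]      # Pearson correlation coefficient
b1, b0 = np.polyfit(x, y, 1)     # least-squares slope and intercept

print(round(r, 3), round(b0, 3), round(b1, 2))   # 0.842 0.549 1.48
```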

  6. Notes on Correlation
      • r represents the linear correlation coefficient for a sample
      • ρ (rho) represents the linear correlation coefficient for a population
      • −1 ≤ r ≤ 1
      • r measures the strength of a linear relationship: −1 is perfect negative correlation and 1 is perfect positive correlation

  7. Interpreting the Linear Correlation Coefficient
      If the absolute value of r exceeds the value in Table A-6, conclude that there is a significant linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of significant linear correlation.

  8. Formal Hypothesis Test
      Two methods. Both methods use
      H0: ρ = 0 (no significant linear correlation)
      H1: ρ ≠ 0 (significant linear correlation)

  9. Method 1: Test Statistic is t (follows the format of earlier chapters)
      Test statistic: t = r / √( (1 − r²) / (n − 2) )
      Critical values: use Table A-3 with degrees of freedom = n − 2
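A minimal sketch of Method 1 applied to the Garbage Project data, assuming SciPy is available for the critical-value lookup in place of Table A-3 (scipy.stats.t.ppf gives the t critical value):

```python
from math import sqrt
from scipy.stats import t as t_dist

r, n, alpha = 0.842, 8, 0.05
t_stat = r / sqrt((1 - r ** 2) / (n - 2))      # test statistic t
t_crit = t_dist.ppf(1 - alpha / 2, n - 2)      # two-tailed critical value, df = n - 2

print(round(t_stat, 2), round(t_crit, 3))      # about 3.82 and 2.447
print("significant" if abs(t_stat) > t_crit else "not significant")
```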

  10. Method 2: Test Statistic is r (uses fewer calculations, much easier)
      Test statistic: r
      Critical values: refer to Table A-6 (no degrees of freedom)

  11. TABLE A-6 Critical Values of the Pearson Correlation Coefficient r
      n     α = .05   α = .01
      4     .950      .999
      5     .878      .959
      6     .811      .917
      7     .754      .875
      8     .707      .834
      9     .666      .798
      10    .632      .765
      11    .602      .735
      12    .576      .708
      13    .553      .684
      14    .532      .661
      15    .514      .641
      16    .497      .623
      17    .482      .606
      18    .468      .590
      19    .456      .575
      20    .444      .561
      25    .396      .505
      30    .361      .463
      35    .335      .430
      40    .312      .402
      45    .294      .378
      50    .279      .361
      60    .254      .330
      70    .236      .305
      80    .220      .286
      90    .207      .269
      100   .196      .256

  12. Data from the Garbage Project: Is there a significant linear correlation?
      x = Plastic (lb):     0.27  1.41  2.19  2.83  2.19  1.81  0.85  3.05
      y = Household size:      2     3     3     6     4     2     1     5
      n = 8,  α = 0.05
      H0: ρ = 0    H1: ρ ≠ 0
      Test statistic is r = 0.842

  13. = .01 = .05 n 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 25 30 35 40 45 50 60 70 80 90 100 .950 .878 .811 .754 .707 .666 .632 .602 .576 .553 .532 .514 .497 .482 .468 .456 .444 .396 .361 .335 .312 .294 .279 .254 .236 .220 .207 .196 .999 .959 .917 .875 .834 .798 .765 .735 .708 .684 .661 .641 .623 .606 .590 .575 .561 .505 .463 .430 .402 .378 .361 .330 .305 .286 .269 .256 Is there a significant linear correlation? n = 8  = 0.05 H0:  = 0 H1 : 0 Test statistic is r = 0.842 Critical values are r = - 0.707 and 0.707 (Table A-6 with n = 8 and = 0.05) TABLE A-6 Critical Values of the Pearson Correlation Coefficient r

  14. Is there a significant linear correlation? Since 0.842 > 0.707, the test statistic falls within the critical region (reject ρ = 0 if r < −0.707 or r > 0.707; otherwise fail to reject ρ = 0). Sample data: r = 0.842.

  15. Is there a significant linear correlation? Since 0.842 > 0.707, the test statistic falls within the critical region. Therefore, we REJECT H0: ρ = 0 (no correlation) and conclude there is a significant linear correlation between the weights of discarded plastic and household size.
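The same Method 2 decision written out in a few lines of plain Python (the critical value 0.707 is read from Table A-6 for n = 8 and α = 0.05; variable names are only for illustration):

```python
r_sample = 0.842   # test statistic from the Garbage Project data
r_crit = 0.707     # Table A-6 critical value for n = 8, alpha = 0.05

if abs(r_sample) > r_crit:
    print("Reject H0: rho = 0 -> significant linear correlation")
else:
    print("Fail to reject H0 -> no significant linear correlation")
```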

  16. Definition: Regression
      Given a collection of paired data, the regression equation algebraically describes the relationship between the two variables.
      Regression Model:    y = β0 + β1x + e
      Regression Equation: ŷ = b0 + b1x

  17. Notation for Regression Equation
                                               Population Parameter   Sample Statistic
      y-intercept of regression equation              β0                    b0
      Slope of regression equation                    β1                    b1
      Equation of the regression line           y = β0 + β1x + e        ŷ = b0 + b1x

  18. Definition: Regression
      Given a collection of paired data, the regression equation ŷ = b0 + b1x algebraically describes the relationship between the two variables.
      • Regression Line (line of best fit or least-squares line): the graph of the regression equation

  19. Assumptions & Observations
      1. We are investigating only linear relationships.
      2. For each x value, y is a random variable having a normal distribution.
      3. There are many methods for determining normality.
      4. The regression line goes through (x̄, ȳ).

  20. Guidelines for Using the Regression Equation
      1. If there is no significant linear correlation, don't use the regression equation to make predictions.
      2. Stay within the scope of the available sample data when making predictions.
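For example, using the regression equation found earlier (ŷ = 0.549 + 1.48x) to predict household size for a hypothetical 2.5 lb of discarded plastic, a value inside the observed x range of 0.27 to 3.05 lb:

```python
b0, b1 = 0.549, 1.48   # intercept and slope from the Garbage Project data

def predict(x_lb):
    """Predicted household size for x_lb pounds of discarded plastic."""
    return b0 + b1 * x_lb

print(round(predict(2.5), 2))   # about 4.25, i.e. roughly a 4-person household
```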

  21. Definitions
      • Outlier: a point lying far away from the other data points
      • Influential Points: points which strongly affect the graph of the regression line

  22. Residuals and the Least-Squares Property
      • Residual (error): for a sample of paired (x, y) data, the difference (y − ŷ) between an observed sample y-value and ŷ, the value of y predicted by using the regression equation.
      • Least-Squares Property: a straight line satisfies this property if the sum of the squares of the residuals is the smallest sum possible.

  23. Residuals and the Least-Squares Property (example)
      [Scatterplot of the four points below with the line ŷ = 5 + 4x]
      x:   1    2    4    5
      y:   4   24    8   32
      Residuals: −5, 11, −13, 7
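A short sketch that recomputes these residuals and illustrates the least-squares property: the line ŷ = 5 + 4x gives a smaller sum of squared residuals for these four points than a slightly different line does (the comparison line is an arbitrary example, not from the slides):

```python
xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]

def sse(b0, b1):
    """Sum of squared residuals for the line y-hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

residuals = [y - (5 + 4 * x) for x, y in zip(xs, ys)]
print(residuals)     # [-5, 11, -13, 7]
print(sse(5, 4))     # 364  (the least-squares line)
print(sse(4, 4))     # 368  (another line gives a larger sum)
```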

  24. Definitions
      • Total Deviation from the mean of the particular point (x, y): the vertical distance y − ȳ, which is the distance between the point (x, y) and the horizontal line passing through the sample mean ȳ.
      • Explained Deviation: the vertical distance ŷ − ȳ, which is the distance between the predicted y value and the horizontal line passing through the sample mean ȳ.
      • Unexplained Deviation: the vertical distance y − ŷ, which is the vertical distance between the point (x, y) and the regression line. (The distance y − ŷ is also called a residual, as defined in Section 9-3.)

  25. Unexplained, Explained, and Total Deviation
      [Graph of ŷ = 5 + 4x with ȳ = 17: for the point (5, 32), the total deviation y − ȳ runs from (5, 17) up to (5, 32), the explained deviation ŷ − ȳ from (5, 17) to (5, 25), and the unexplained deviation y − ŷ from (5, 25) to (5, 32).]

  26. (total deviation) = (explained deviation) + (unexplained deviation)
      (y − ȳ) = (ŷ − ȳ) + (y − ŷ)
      (total variation) = (explained variation) + (unexplained variation)
      Σ(y − ȳ)² = Σ(ŷ − ȳ)² + Σ(y − ŷ)²
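A quick numeric check of this identity using the same four points and the line ŷ = 5 + 4x from slide 23 (the identity holds exactly here because that line is the least-squares fit for these points):

```python
xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]

y_bar = sum(ys) / len(ys)            # 17
y_hat = [5 + 4 * x for x in xs]      # predictions from the regression line

total       = sum((y - y_bar) ** 2 for y in ys)                  # 524
explained   = sum((yh - y_bar) ** 2 for yh in y_hat)             # 160
unexplained = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))     # 364

print(total, explained + unexplained)   # 524 524
```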
