Presentation Transcript


  1. STATISTICS Chapter 5 Correlation/Regression MVS 250: V. Katch

  2. Overview. Paired Data: is there a relationship? If so, what is the equation? Use the equation for prediction.

  3. Correlation

  4. Definition. Correlation exists between two variables when one of them is related to the other in some way.

  5. Assumptions. 1. The sample of paired (x, y) data is a random sample. 2. The pairs of (x, y) data have a bivariate normal distribution.

  6. Definition. A scatterplot (or scatter diagram) is a graph in which the paired (x, y) sample data are plotted with a horizontal x axis and a vertical y axis. Each individual (x, y) pair is plotted as a single point.
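
A minimal sketch of how such a scatterplot might be produced in Python with matplotlib; the small (x, y) data set here is made up purely for illustration.

```python
# Minimal scatterplot sketch (illustrative data, not from the lecture).
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6, 7, 8]                   # hypothetical x values
y = [2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 8.0, 8.8]   # hypothetical y values

plt.scatter(x, y)          # each (x, y) pair becomes one point
plt.xlabel("x")
plt.ylabel("y")
plt.title("Scatter Diagram of Paired Data")
plt.show()
```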

  7. Scatter Diagram of Paired Data

  8. Scatter Diagram of Paired Data

  9. Scatter Plots: Positive Linear Correlation. [Panels: (a) positive, (b) strong positive, (c) perfect positive.]

  10. Scatter Plots: Negative Linear Correlation. [Panels: (d) negative, (e) strong negative, (f) perfect negative.]

  11. Scatter Plots: No Linear Correlation. [Panels: (g) no correlation, (h) nonlinear correlation.]

  12. Definition. The linear correlation coefficient r measures the strength of the linear relationship between the paired x and y values in a sample:

      r = [Σxy/n − (Σx/n)(Σy/n)] / (SDx · SDy)

      where Σxy/n is the mean of the cross products, Σx/n is the mean of the x variable, Σy/n is the mean of the y variable, SDx is the standard deviation of the x variable, and SDy is the standard deviation of the y variable.

  13. Notation for the Linear Correlation Coefficient. n = number of pairs of data presented; Σ denotes the addition of the items indicated; Σx/n denotes the mean of all x values; Σy/n denotes the mean of all y values; Σxy/n denotes the mean of the cross products (x times y, summed, then divided by n); r = linear correlation coefficient for a sample; ρ = linear correlation coefficient for a population.
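
A minimal Python sketch of this formula, assuming the standard deviations are computed population-style (divide by n) so they match the mean-of-cross-products form above; the data values and the helper name pearson_r are illustrative only.

```python
# Sketch of the linear correlation coefficient r from the formula above.
# Assumes population-style standard deviations (divide by n), matching the
# "mean of the cross products" form of the formula.
import math

def pearson_r(x, y):
    n = len(x)
    mean_x = sum(x) / n                                   # Σx/n
    mean_y = sum(y) / n                                   # Σy/n
    mean_xy = sum(xi * yi for xi, yi in zip(x, y)) / n    # Σxy/n
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    return (mean_xy - mean_x * mean_y) / (sd_x * sd_y)

# Illustrative data (not from the lecture):
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))   # rounded to three decimal places, per slide 14
```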

  14. Rounding the Linear Correlation Coefficient r. Round r to three decimal places. Use a calculator or computer if possible.

  15. 1. -1 r 1 2. Value of r does not change if all values of either variable are converted to a different scale. 3. The r is not affected by the choice of x and y. Interchange x and y and the value of r will not       change. 4. r measures strength of a linear relationship. Properties of the Linear Correlation Coefficient r

  16. Interpreting the Linear Correlation Coefficient. If the absolute value of r exceeds the value in the significance table, conclude that there is a significant linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of a significant linear correlation. Remember to use n − 2.

  17. Common Errors Involving Correlation. 1. Causation: it is wrong to conclude that correlation implies causality. 2. Averages: averages suppress individual variation and may inflate the correlation coefficient. 3. Linearity: there may be some relationship between x and y even when there is no significant linear correlation.

  18. Common Errors Involving Correlation. [Scatterplot: Distance (feet), 0 to 250, versus Time (seconds), 0 to 8.]

  19. Correlation is Not Causation. [Diagram relating variables A, B, and C.]

  20. Correlation Calculations: Rank Order Correlation (Rho) and Pearson's r.

  21. Rank Order Correlation

  22. Rank Order Correlation, cont. With ΣD² = 58 and N = 10: Rho = 1 − [6(ΣD²) / (N(N² − 1))] = 1 − [6(58) / (10(10² − 1))] = 1 − [348 / 990] = 1 − 0.352 = 0.648.
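
A short Python sketch of this rank-order calculation; the helper name spearman_rho is my own, and the direct computation below simply reproduces the slide's arithmetic with ΣD² = 58 and N = 10.

```python
# Sketch of the rank-order (Spearman) correlation:
# Rho = 1 - 6*sum(D^2) / (N*(N^2 - 1)), where D is the difference between paired ranks.
def spearman_rho(rank_x, rank_y):
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Reproducing the slide's arithmetic directly (sum of D^2 = 58, N = 10):
sum_d2, n = 58, 10
print(round(1 - (6 * sum_d2) / (n * (n ** 2 - 1)), 3))   # 0.648, as on the slide

# Usage with actual rank data would look like (hypothetical ranks):
# spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 5, 3])
```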

  23. Pearson's r. r = [Σxy/n − (Σx/n)(Σy/n)] / (SDx · SDy) = [35.86 − (5.5)(5.5)] / [(3.03)(3.03)] = 5.61 / 9.09 = 0.617.

  24. Pearson’s r Excel Demonstration

  25. Data from the Garbage Project. Is there a significant linear correlation?
      x, Plastic (lb):    0.27  1.41  2.19  2.83  2.19  1.81  0.85  3.05
      y, Household size:  2     3     3     6     4     2     1     5

  26. Data from the Garbage Project (same data as slide 25). Is there a significant linear correlation?

  27. Data from the Garbage Project (same data as slide 25). r = 0.842, r² = 0.71. Is there a significant linear correlation?
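
A sketch that recomputes these values from the listed data with numpy; numpy.corrcoef should reproduce the r = 0.842 quoted on the slide, and squaring it gives r² ≈ 0.71.

```python
# Recompute r for the Garbage Project data shown above.
import numpy as np

plastic = np.array([0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05])   # x, pounds of plastic
household = np.array([2, 3, 3, 6, 4, 2, 1, 5])                         # y, household size

r = np.corrcoef(plastic, household)[0, 1]
print(round(r, 3), round(r ** 2, 2))   # approximately 0.842 and 0.71
```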

  28. = .01 = .05 n 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 25 30 35 40 45 50 60 70 80 90 100 .950 .878 .811 .754 .707 .666 .632 .602 .576 .553 .532 .514 .497 .482 .468 .456 .444 .396 .361 .335 .312 .294 .279 .254 .236 .220 .207 .196 .999 .959 .917 .875 .834 .798 .765 .735 .708 .684 .661 .641 .623 .606 .590 .575 .561 .505 .463 .430 .402 .378 .361 .330 .305 .286 .269 .256 Is there a significant linear correlation? n = 8  = 0.05 H0:  = 0 H1 : 0 Test statistic is r = 0.842 Critical values are r = - 0.707 and 0.707 (Table R with n = 8 and = 0.05) TABLE R Critical Values of the Pearson Correlation Coefficient r

  29. Is there a significant linear correlation? 0.842 > 0.707; that is, the test statistic falls within the critical region. Therefore, we REJECT H0: ρ = 0 (no correlation) and conclude there is a significant linear correlation between the weights of discarded plastic and household size. [Number line: reject ρ = 0 when r < −0.707 or r > 0.707; fail to reject ρ = 0 when −0.707 ≤ r ≤ 0.707. Sample data: r = 0.842.]

  30. Method 1: Test Statistic is t (follows format of earlier chapters)

  31. Formal Hypothesis Test. To determine whether there is a significant linear correlation between two variables, there are two methods. Both methods let H0: ρ = 0 (no significant linear correlation) and H1: ρ ≠ 0 (significant linear correlation).

  32. Method 2: Test Statistic is r (uses fewer calculations). Test statistic: r. Critical values: refer to Table R (no degrees of freedom).

  33. Method 2: Test Statistic is r (uses fewer calculations). Test statistic: r. Critical values: refer to Table A-6 (no degrees of freedom). [Number line: reject ρ = 0 when r < −0.811 or r > 0.811; fail to reject ρ = 0 when −0.811 ≤ r ≤ 0.811. Sample data: r = 0.828.]

  34. Method 1: Test Statistic is t (follows format of earlier chapters). Test statistic: t = r / √[(1 − r²) / (n − 2)]. Critical values: use Table T with n − 2 degrees of freedom.
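
A sketch of Method 1 applied to the Garbage Project result (r = 0.842, n = 8); the two-tailed critical value of t with n − 2 degrees of freedom is taken from scipy.stats.t.ppf rather than a printed table.

```python
# Method 1: t test statistic for the correlation coefficient.
import math
from scipy import stats

r, n, alpha = 0.842, 8, 0.05                     # values from the Garbage Project example
t_stat = r / math.sqrt((1 - r ** 2) / (n - 2))   # t = r / sqrt((1 - r^2) / (n - 2))
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)    # two-tailed critical value, n - 2 df

print(round(t_stat, 3), round(t_crit, 3))
print("reject H0" if abs(t_stat) > t_crit else "fail to reject H0")
```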

  35. Testing for a Linear Correlation. Start: let H0: ρ = 0 and H1: ρ ≠ 0. Select a significance level α. Calculate r using Formula 9-1. METHOD 1: the test statistic is t = r / √[(1 − r²) / (n − 2)]; critical values of t are from Table A-3 with n − 2 degrees of freedom. METHOD 2: the test statistic is r; critical values of r are from Table A-6. If the absolute value of the test statistic exceeds the critical values, reject H0: ρ = 0; otherwise, fail to reject H0. If H0 is rejected, conclude that there is a significant linear correlation. If you fail to reject H0, then there is not sufficient evidence to conclude that there is a linear correlation.

  36. Why does the critical value of r increase as sample size decreases? A correlation by chance is more likely.

  37. Coefficient of Determination (Effect Size), r²: the part of the variance of one variable that can be explained by the variance of a related variable.

  38. Justification for the r Formula. r = Σ(x − x̄)(y − ȳ) / [(n − 1) Sx Sy], where (x̄, ȳ) is the centroid of the sample points. [Scatterplot: the centroid (x̄, ȳ) = (3, 11) divides the plane into Quadrants 1 through 4; for the plotted point (7, 23), x − x̄ = 7 − 3 = 4 and y − ȳ = 23 − 11 = 12.]
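
As a worked check of the deviation-product idea behind this formula (my arithmetic, using the point and centroid shown in the figure): for the point (7, 23) with centroid (3, 11), (x − x̄)(y − ȳ) = (4)(12) = 48, a positive contribution to the numerator of r. Points above and to the right of the centroid, or below and to the left of it, give positive products and so pull r toward positive values.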
