Introduction to Statistics

Introduction to Statistics Correlation Chapter 15 Apr 29-May 4, 2010 Classes #28-29

Correlation • Chapter 15: • Correlation pp. 466-485 • Not responsible for remainder of the chapter

Correlation • A statistical technique that is used to measure and describe a relationship between two variables • For example: • GPA and touchdowns (TDs) scored • Statistics exam scores and amount of time spent studying

Notation • A correlation requires two scores for each individual • One score from each of the two variables • They are normally identified as X and Y

Three characteristics of the relationship between X and Y are being measured… • The direction of the relationship • Positive or negative (the sign of the correlation) • The form of the relationship • Usually linear form • The strength or consistency of the relationship • Perfect correlation = +1.00 or -1.00; no consistency would be 0.00 • Therefore, a correlation measures the degree of relationship between two variables: its magnitude runs from 0.00 to 1.00 and its sign gives the direction, so r ranges from -1.00 to +1.00.

Assumptions • There are 3 main assumptions… • 1. The dependent and independent variables are normally distributed. We can check this by looking at the histograms for the two variables • 2. The relationship between X and Y is linear. We can check this by looking at the scattergram • 3. The relationship is homoscedastic. We can check homoscedasticity by looking at the scattergram and observing that the data points form a “roughly symmetrical, cigar-shaped pattern” about the regression line. • If the above 3 assumptions have been met, then we can use correlation and test r for significance

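Pearson r • The most commonly used correlation • Measures the degree of straight-line relationship • Computation: $r = \dfrac{SP}{\sqrt{SS_X \, SS_Y}}$, where SP is the sum of products of deviations and $SS_X$, $SS_Y$ are the sums of squares for X and Y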
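The computation above can be sketched as a small function. This is a minimal illustration, not part of the original slides: the function name `pearson_r` and the toy scores are hypothetical, and it assumes the computational formulas SS = ΣX² - (ΣX)²/n and SP = ΣXY - (ΣX)(ΣY)/n used in the worked examples.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r from paired scores, using the computational formulas for SS and SP."""
    if len(xs) != len(ys):
        raise ValueError("each individual needs one X score and one Y score")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two pairs of scores")
    sum_x, sum_y = sum(xs), sum(ys)
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n               # SSX
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n               # SSY
    sp = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n  # SP
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable has no variability")
    return sp / sqrt(ss_x * ss_y)


# Hypothetical toy scores (not from the slides): Y is exactly 2X,
# so the relationship is perfectly linear and positive.
print(pearson_r([1, 2, 3], [2, 4, 6]))
```

Note that the denominator is the square root of the product of the two sums of squares, which keeps r between -1.00 and +1.00.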

Example 1 • A researcher predicts that there is a high correlation between scores on the stats final exam (100 pts max) and scores on the university’s exit exam for graduating seniors (330 pts max)

Example 1
          X       X²       Y        Y²       XY
         30      900     160    25,600    4,800
         38    1,444     180    32,400    6,840
         52    2,704     180    32,400    9,360
         90    8,100     210    44,100   18,900
         95    9,025     240    57,600   22,800
Sum     305   22,173     970   192,100   62,700
       (ΣX)    (ΣX²)    (ΣY)     (ΣY²)    (ΣXY)

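Example 1
$SS_X = \sum X^2 - \dfrac{(\sum X)^2}{n} = 22{,}173 - \dfrac{305^2}{5} = 22{,}173 - \dfrac{93{,}025}{5} = 22{,}173 - 18{,}605 = 3{,}568$
$SS_Y = \sum Y^2 - \dfrac{(\sum Y)^2}{n} = 192{,}100 - \dfrac{970^2}{5} = 192{,}100 - \dfrac{940{,}900}{5} = 192{,}100 - 188{,}180 = 3{,}920$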

Example 1
$SP = \sum XY - \dfrac{(\sum X)(\sum Y)}{n} = 62{,}700 - \dfrac{(305)(970)}{5} = 62{,}700 - \dfrac{295{,}850}{5} = 62{,}700 - 59{,}170 = 3{,}530$

Example 1 • $r = \dfrac{SP}{\sqrt{SS_X \, SS_Y}} = \dfrac{3{,}530}{\sqrt{(3{,}568)(3{,}920)}} = \dfrac{3{,}530}{\sqrt{13{,}986{,}560}} = \dfrac{3{,}530}{3{,}739.861} = .944$
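As a check on the arithmetic, the Example 1 computational table and the resulting r can be rebuilt from the five pairs of scores. This is a sketch only; the variable names are illustrative and not from the slides.

```python
from math import sqrt

# Example 1 scores: X = stats final exam, Y = exit exam.
x = [30, 38, 52, 90, 95]
y = [160, 180, 180, 210, 240]
n = len(x)

# One row per individual: X, X², Y, Y², XY (the computational table).
rows = [(xi, xi ** 2, yi, yi ** 2, xi * yi) for xi, yi in zip(x, y)]

# Column totals: ΣX, ΣX², ΣY, ΣY², ΣXY.
sum_x, sum_x2, sum_y, sum_y2, sum_xy = (sum(col) for col in zip(*rows))

ss_x = sum_x2 - sum_x ** 2 / n   # 22,173 - 18,605
ss_y = sum_y2 - sum_y ** 2 / n   # 192,100 - 188,180
sp = sum_xy - sum_x * sum_y / n  # 62,700 - 59,170
r = sp / sqrt(ss_x * ss_y)

for row in rows:
    print(*row, sep="\t")
print(f"SSX = {ss_x:.0f}, SSY = {ss_y:.0f}, SP = {sp:.0f}, r = {r:.3f}")
```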

Pearson Correlation: “Rule of Thumb”
• r = +1.00 or -1.00: Perfect correlation
• +.70 to +.99: Very strong positive relationship
• +.40 to +.69: Strong positive relationship
• +.30 to +.39: Moderate positive relationship
• +.20 to +.29: Weak positive relationship
• +.01 to +.19: No or negligible relationship
• -.01 to -.19: No or negligible relationship
• -.20 to -.29: Weak negative relationship
• -.30 to -.39: Moderate negative relationship
• -.40 to -.69: Strong negative relationship
• -.70 to -.99: Very strong negative relationship
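The rule of thumb can be written as a small lookup. This is an illustrative sketch: the function name `describe_r` is hypothetical, and the handling of values that fall between the listed bands (for example .195) and of r = 0.00 is an assumption, since the slide does not cover them.

```python
def describe_r(r):
    """Label a correlation using the rule-of-thumb bands from the slide.

    Assumption: each band is treated as a half-open interval on |r|
    (e.g. .20 <= |r| < .30 is "weak"), and r = 0 is labelled negligible.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and +1.00")
    size = abs(r)
    direction = "positive" if r > 0 else "negative"
    if size == 1.0:
        return f"perfect {direction} correlation"
    if size >= 0.70:
        return f"very strong {direction} relationship"
    if size >= 0.40:
        return f"strong {direction} relationship"
    if size >= 0.30:
        return f"moderate {direction} relationship"
    if size >= 0.20:
        return f"weak {direction} relationship"
    return "no or negligible relationship"


for value in (0.944, 0.32, -0.12):
    print(value, "->", describe_r(value))
```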

Example 1: Interpretation • An r of 0.944 indicates an extremely strong relationship between scores on the stats final exam and scores on the exit exam. As scores on the stats final go up so too do scores on the exit exam. • But we are not finished with the interpretation • See next slide 

Interpretation (Continued): Coefficient of Determination (r²) • The value r² is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from the relationship with the other variable • For example: • A correlation of r = .944 means that r² = .891 (or 89.1%) of the variability in the Y scores can be predicted from the relationship with the X scores

Coefficient of Determination (r²) and Interpretation: The coefficient of determination is r² = .891. Scores on the stats final exam, by themselves, account for 89.1% of the variation in the exit exam scores.
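The step from r to r² is a single squaring. A minimal sketch, with a hypothetical helper name, using the r value from Example 1:

```python
def coefficient_of_determination(r):
    """Proportion of variability in one variable predictable from the other."""
    return r * r


r = 0.944  # Pearson r from Example 1
r_squared = coefficient_of_determination(r)
print(f"r = {r:.3f}, r² = {r_squared:.3f} ({r_squared:.1%} of the variation)")
```

Because r is squared, the sign is lost: r = -.944 would give the same r², so the direction of the relationship still has to be reported separately.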

Example 2 • A researcher predicts that there is a high correlation between years of education and voter turnout • She chooses Alamosa, Boston, Chicago, Detroit, and NYC to test her theory

Example 2 • The scores on each variable are displayed in table format: • Y = % Turnout • X = Years of Education

Scatterplot • The relationship between X and Y is linear.

Make a Computational Table

Example 2
$SS_X = \sum X^2 - \dfrac{(\sum X)^2}{n} = 782.15 - \dfrac{62.5^2}{5} = 782.15 - \dfrac{3906.25}{5} = 782.15 - 781.25 = 0.90$
$SS_Y = \sum Y^2 - \dfrac{(\sum Y)^2}{n} = 20374 - \dfrac{318^2}{5} = 20374 - \dfrac{101124}{5} = 20374 - 20224.80 = 149.20$

Example 2
$SP = \sum XY - \dfrac{(\sum X)(\sum Y)}{n} = 3986.40 - \dfrac{(62.5)(318)}{5} = 3986.40 - \dfrac{19875}{5} = 3986.40 - 3975.00 = 11.40$

Example 2: Find Pearson r • $r = \dfrac{SP}{\sqrt{SS_X \, SS_Y}} = \dfrac{11.4}{\sqrt{(0.9)(149.2)}} = \dfrac{11.4}{\sqrt{134.28}} = \dfrac{11.4}{11.588} = .984$
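Only the column totals for Example 2 appear in the calculations above, so this sketch computes r directly from those sums rather than from the individual city scores. The function name `pearson_r_from_sums` is hypothetical.

```python
from math import sqrt


def pearson_r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson r from the totals of a computational table."""
    ss_x = sum_x2 - sum_x ** 2 / n
    ss_y = sum_y2 - sum_y ** 2 / n
    sp = sum_xy - sum_x * sum_y / n
    return sp / sqrt(ss_x * ss_y)


# Totals quoted in the Example 2 calculations (X = years of education,
# Y = % turnout, n = 5 cities).
r = pearson_r_from_sums(
    n=5, sum_x=62.5, sum_y=318, sum_x2=782.15, sum_y2=20374, sum_xy=3986.40
)
print(f"r = {r:.3f}, r² = {r * r:.3f}")
```

Working from the totals keeps full precision in the intermediate square root, which is why the result rounds to .984 even though the hand calculation rounds the denominator.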

Example 2: Interpretation • An r of 0.984 indicates an extremely strong relationship between years of education and voter turnout for these five cities. As level of education increases, % turnout increases. • But we are not finished with the interpretation • See next slide 

Coefficient of Determination (r²) and Interpretation: The coefficient of determination is r² = .968. Education, by itself, accounts for 96.8% of the variation in voter turnout.

Pearson’s r • Had the correlation between % college educated and turnout been r = .32, the relationship would have been positive and weak to moderate. • Had the correlation between % college educated and turnout been r = -.12, the relationship would have been negative and very weak (negligible).

Hypothesis Testing with Pearson • We can have a two-tailed hypothesis: H0: ρ = 0.0; H1: ρ ≠ 0.0 • We can have a one-tailed hypothesis: H0: ρ = 0.0; H1: ρ < 0.0 (or ρ > 0.0) • Note that ρ (rho) is the population parameter, while r is the sample statistic

Find r_critical • See Table B.6 (page 537) • You need to know the alpha level • You need to know the sample size • We will always use df = n - 2

Find r_calculated • See previous slides for formulas

Make your decision… • If |r_calculated| < r_critical, then retain H0 • If |r_calculated| > r_critical, then reject H0 (for a one-tailed test, r must also have the predicted sign)
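The decision rule can be sketched as follows for a two-tailed test. The function names are hypothetical. The critical value .878 used in the example (df = 3, alpha = .05, two-tailed) is quoted from a standard table of critical values for Pearson r and is an assumption here; confirm it against Table B.6. Treating an exact tie as a rejection is also an assumption, since the slide does not address that case.

```python
def degrees_of_freedom(n):
    """df for testing a Pearson correlation: n pairs of scores minus 2."""
    return n - 2


def decide_two_tailed(r_calculated, r_critical):
    """Compare the size of r (ignoring its sign) with the critical value.

    Assumption: a tie (|r| exactly equal to r_critical) counts as reject.
    """
    if abs(r_calculated) >= r_critical:
        return "reject H0"
    return "retain H0"


n = 5                       # five individuals in Example 1
df = degrees_of_freedom(n)  # 3
r_critical = 0.878          # assumed table value for df = 3, alpha = .05, two-tailed
print(df, decide_two_tailed(0.944, r_critical))
```

Using the absolute value matters for negative correlations: without it, a strong negative r such as -.95 would never exceed a positive critical value.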

Always include a brief summary of your results: • Was it positive or negative? • Was it significant? • Explain the correlation • Explain the variation • Coefficient of Determination (r²)

Credits • http://campus.houghton.edu/orgs/psychology/stat15b.ppt#267,2,Review • http://publish.uwo.ca/~pakvis/Interval.ppt#276,17,Practical Example using Healey P. 418 Problem 15.1
