
Sociology 601 Class 17: October 28, 2009



  1. Sociology 601 Class 17: October 28, 2009
     • Review (linear regression)
        • new terms and concepts
        • assumptions
        • reading regression computer outputs
     • Correlation (Agresti and Finlay 9.4)
        • the correlation coefficient r
        • relationship to regression coefficient b
        • r-squared: the reduction in error

  2. Review: Linear Regression
     • New terms and concepts
        • slope
        • intercept
        • negative and positive slopes
        • zero slope
        • least squares regression
        • predicted value
        • residuals
        • sum of squared errors (SSE)
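
The terms on this slide can be made concrete with a minimal sketch in plain Python. The five data points below are hypothetical and chosen only for illustration. The function computes the least squares intercept and slope from their textbook definitions, then derives the predicted values, the residuals, and the sum of squared errors.

```python
from statistics import mean


def least_squares(x, y):
    """Return (a, b), the intercept and slope that minimize the sum of squared errors."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx          # slope
    a = ybar - b * xbar    # intercept: the line passes through (xbar, ybar)
    return a, b


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.0, 2.5, 4.0, 4.5, 6.0]

a, b = least_squares(x, y)
y_hat = [a + b * xi for xi in x]                    # predicted values
residuals = [yi - yh for yi, yh in zip(y, y_hat)]   # observed minus predicted
sse = sum(e ** 2 for e in residuals)                # sum of squared errors
print(f"Yhat = {a:.2f} + {b:.2f}X, SSE = {sse:.2f}")  # Yhat = 0.80 + 1.00X, SSE = 0.30
```

The residuals from a least squares line with an intercept always sum to zero. This makes a quick sanity check on a hand calculation.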

  3. Review: Linear Regression
     • Assumptions
        • random sample (errors are independent)
        • linear relationship
        • no heteroscedasticity (constant error variance)
        • no outliers
     • Linearity, heteroscedasticity, and outliers can be checked with scattergrams and crosstabs
        • before computing regressions
        • on the residuals, after computing regressions

  4. A Problem with Regression Coefficients
     Regression coefficients don’t measure the strength of an association in a way that is easily compared across different models with different variables or different scales.
     • Rescaling one or both axes changes the slope b.
     • Example: murder rate and poverty rate for 50 US states: Yhat = -.86 + .58X,
        • where Y = murder rate per 100,000 persons per year
        • and X = poverty rate per 100 persons
     • If we rescale Y, the murder rate, to murders per 100 persons per year, then Yhat = -.00086 + .00058X
        • (does this mean the association is now weaker?)
     • If we rescale X, the poverty rate, to the proportion in poverty (0.00 to 1.00), then Yhat = -.86 + 58X
        • (does this mean the association is now stronger?)

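  5. The correlation: a standardized slope
     • An accepted solution to the problem of scale is to standardize both axes (i.e., convert them into z-scores with a mean of zero and a standard deviation of 1) and then calculate the slope:
$$b = \frac{\Delta Y}{\Delta X}, \qquad r = \frac{\Delta Y / s_Y}{\Delta X / s_X} = \frac{\Delta Y}{\Delta X} \cdot \frac{s_X}{s_Y} = b \, \frac{s_X}{s_Y}$$
       where $s_X$ and $s_Y$ are the sample standard deviations of X and Y.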

  6. The Correlation Coefficient, r
     • r is called …
        • the Pearson correlation (or simply the correlation)
        • the standardized regression coefficient (or the standardized slope)
     • r = b*(sX/sY)
     • r is a sample statistic we use to estimate a population parameter, the population correlation ρ.

  7. Calculating r: an example
     • Calculating r for the murder and poverty example:
        • b = .58, sX = 4.29, sY = 3.98
        • r = b*(sX/sY) = .58*(4.29/3.98) ≈ .63 (r = .629 when b is not rounded)
     • Alternatively (if the murder rate is per 100 persons):
        • b = .00058, sX = 4.29, sY = .00398
        • r = b*(sX/sY) = .00058*(4.29/.00398) ≈ .63 (again .629 when b is not rounded)
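
The arithmetic can be reproduced from the figures on the slide. With the rounded b = .58, the product is about .625, which still rounds to .63. Rescaling the murder rate changes b and sY by the same factor of 1,000, so r is the same either way.

```python
# Figures from the slide (murder per 100,000; poverty per 100).
b, s_x, s_y = 0.58, 4.29, 3.98
r = b * (s_x / s_y)

# Murder per 100 persons: b and sY are both 1,000 times smaller.
r_rescaled = 0.00058 * (s_x / 0.00398)

print(round(r, 2), round(r_rescaled, 2))  # 0.63 0.63
```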

  8. Properties of the Correlation Coefficient r:
     • −1 ≤ r ≤ 1
     • r can be positive or negative, and has the same sign as b.
     • r = ±1 when all the points fall exactly on the prediction line.
     • The larger the absolute value of r, the stronger the linear association.
     • r = 0 when there is no linear trend in the relationship between X and Y.

  9. Properties of the Correlation Coefficient r:
     • The value of r does not depend on the units of X and Y.
     • The correlation treats X and Y symmetrically
        • (unlike the slope β)
        • this means that a correlation implies nothing about causal direction!
     • The correlation is valid only when a straight line is a reasonable model for the relationship between X and Y.

  10. Calculating a Correlation Coefficient Using Stata
     • Recall the religion and state control study, where high levels of state regulation were associated with low levels of weekly church attendance.

```
. correlate attend regul
(obs=18)

             |   attend    regul
-------------+------------------
      attend |   1.0000
       regul |  -0.6133   1.0000
```

  11. An alternative interpretation of r: proportional reduction in error
     • Old interpretation for the murder and poverty example: with r = .63, a state's murder rate is expected to be higher by 0.63 standard deviations for each 1.0 standard deviation increase in its poverty rate.
     • New interpretation: by using poverty rates to predict murder rates, we explain ?? percent of the variation in states’ murder rates.

  12. Proportional reduction in error:
     • Predicting Y without using X: $Y = \bar{Y} + e_1$, so
$$E_1 = \sum e_1^2 = \sum (\text{observed } Y - \text{predicted } Y)^2 = \sum (Y - \bar{Y})^2 = \text{Total Sum of Squares} = TSS$$
     • Predicting Y using X: $Y = \hat{Y} + e_2 = a + bX + e_2$, so
$$E_2 = \sum e_2^2 = \sum (\text{observed } Y - \text{predicted } Y)^2 = \sum (Y - \hat{Y})^2 = \text{Sum of Squared Errors} = SSE$$
     • Proportional reduction in error:
$$r^2 = PRE = \frac{E_1 - E_2}{E_1} = \frac{TSS - SSE}{TSS}$$

  13. Proportional reduction in error
     • Calculating r² for the murder and poverty example:
        • r² = .629² ≈ .396 (≈ .395 when r is not rounded)
     • Alternatively (using computer output):
        • r² = (TSS − SSE) / TSS = (777.7 − 470.4) / 777.7 = .395
     • Interpretation: 39.5% of the variation in states’ murder rates is explained by their linear relationship with states’ poverty rates.
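
The computer-output route can be reproduced from the two sums of squares on the slide. Taking the square root recovers the size of r. Its sign is not recovered this way; it follows the sign of b, which is positive here.

```python
import math

tss, sse = 777.7, 470.4                 # from the regression output on the slide
r_squared = (tss - sse) / tss
print(round(r_squared, 3))              # 0.395
print(round(math.sqrt(r_squared), 3))   # 0.629, the size of the correlation r
```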

  14. R-square
     • r² is also called the coefficient of determination.
     • Properties of r²:
        • 0 ≤ r² ≤ 1
        • r² = 1 (its maximum value) when SSE = 0.
        • r² = 0 when SSE = TSS (furthermore, b = 0).
        • The higher r² is, the stronger the linear association between X and Y.
        • r² does not depend on the units of measurement.
        • r² takes the same value when X predicts Y as when Y predicts X.
