Correlation

Correlation. Covariance. The variance of a variable X provides information on the variability of X . The covariance of two variables X and Y provides information on the related variability of X and Y together.

adanna

Presentation Transcript


  1. Correlation

  2. Covariance
     • The variance of a variable X provides information on the variability of X.
     • The covariance of two variables X and Y provides information on the related variability of X and Y together.
     • Note the similarity in the structure of the formulas, $s_X^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$ and $s_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$: instead of relating X to itself, as in the variance, the covariance relates X to the other variable, Y.
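
The two formulas can be sketched in Python as follows, assuming the sample definitions with an n - 1 divisor. The function names `sample_variance` and `sample_covariance` and the small data set are illustrative only.

```python
def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    return sum((x - mean_x) ** 2 for x in xs) / (n - 1)


def sample_covariance(xs, ys):
    """Same structure as the variance, but each X deviation is paired with a Y deviation."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


xs = [2, 4, 6, 8]
ys = [1, 3, 2, 6]
print(sample_variance(xs))        # 6.666666666666667
print(sample_covariance(xs, xs))  # 6.666666666666667 (X related to itself gives the variance)
print(sample_covariance(xs, ys))  # 4.666666666666667
```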

  3. Excel Note on Covariance
     • Excel has a function to calculate covariance: =COVAR(range1, range2)
     • Unfortunately, for most common purposes, Excel does not calculate the sample covariance; instead, it calculates what is known as the population covariance.
     • Therefore, to turn Excel’s covariance calculation into the more useful sample covariance, multiply Excel’s result by the factor n/(n-1).
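
A small Python sketch of the correction: the population covariance divides by n (what the slide says COVAR returns), and rescaling by n/(n-1) gives the sample covariance. The helper names are hypothetical. Excel 2010 and later also provide COVARIANCE.P and COVARIANCE.S, which return the population and sample versions directly.

```python
def population_covariance(xs, ys):
    """Divides by n; per the slide, this is what Excel's COVAR returns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def to_sample_covariance(pop_cov, n):
    """Rescale a population covariance by n / (n - 1) to get the sample covariance."""
    return pop_cov * n / (n - 1)


xs = [2, 4, 6, 8]
ys = [1, 3, 2, 6]
pop = population_covariance(xs, ys)
print(pop)                                 # 3.5
print(to_sample_covariance(pop, len(xs)))  # 4.666666666666667
```

The correction matters most for small samples: with n = 4 the factor is 4/3, while with n = 1000 it is barely above 1.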

  4. Covariance
     • Covariance measures how much Y and X tend to vary in the same direction.
     • A high positive covariance means that the highest values of Y tend to occur together with the highest values of X.
     • However, covariance is hard to interpret because it has no standard scale of reference: its size depends on the units of X and Y. A covariance of 300,000 could be trivial, while another of 2.1 could be fairly substantial.

  5. Correlation Coefficient
     A more useful way to express the relationship between X and Y is to divide the covariance by the product of the standard deviations of X and Y, $r = \frac{s_{XY}}{s_X s_Y}$. This ratio is known as the “standardized” covariance, or the correlation coefficient (correlation for short), and is commonly denoted by r. In Excel, the correlation formula is =CORREL(range1, range2).
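
A minimal Python sketch of the standardized covariance, assuming sample (n - 1) definitions throughout; `pearson_r` and the data are illustrative. Because the n - 1 divisors appear in both the numerator and the denominator, they cancel, so the population/sample distinction from slide 3 does not affect r.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient: sample covariance divided by s_X * s_Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)


xs = [2, 4, 6, 8]
ys = [1, 3, 2, 6]
print(round(pearson_r(xs, ys), 3))                      # 0.837
# Rescaling Y (e.g. changing its units) changes the covariance but not r.
print(round(pearson_r(xs, [1000 * y for y in ys]), 3))  # 0.837
```

The second call addresses the scale problem from slide 4: multiplying Y by 1000 multiplies the covariance by 1000, but r stays the same.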

  6. Correlation Coefficient
     • The correlation coefficient (r) measures how much Y and X tend to vary in the same direction on a standard scale. (Varying in the same direction implicitly means a linear relationship.)
     • It always lies between -1 and +1:
        r = +1 implies a perfect positive linear relationship
        r = -1 implies a perfect negative linear relationship
        r = 0 implies that no linear relationship exists

  7. How Much Is Enough?
     • Since real social data are unlikely to show either a perfect positive correlation (r = 1) or a perfect negative correlation (r = -1), how does an analyst know whether there is “enough” correlation?
     • A simple rule of thumb is that a correlation whose absolute value is below 30% (|r| < 0.3) suggests no linear relationship, whereas one above 70% (|r| > 0.7) suggests a strong linear relationship. Anything in between suggests “somewhat of a relationship”.
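
The rule of thumb can be written as a small Python function. The function name `describe_strength` is hypothetical. The function also assumes that a negative correlation is judged by its magnitude, and that values of exactly 0.3 or 0.7 fall in the middle band.

```python
def describe_strength(r):
    """Apply the slide's rule of thumb to the size of r (sign ignored)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    size = abs(r)  # assumption: negative correlations are judged by magnitude
    if size < 0.3:
        return "no linear relationship"
    if size > 0.7:
        return "strong linear relationship"
    return "somewhat of a relationship"


for r in (0.12, -0.45, 0.86):
    print(r, describe_strength(r))
```

This is only a descriptive label. The t-test on the following slides is the formal check.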

  8. Examples

  9. A T-test for Formalizing “Good Enough”
     • The hypotheses are H0: correlation = 0 versus H1: correlation ≠ 0.
     • Approximate the standard error using the formula $SE_r = \sqrt{\frac{1 - r^2}{n - 2}}$.
     • Calculate the t-statistic, with n - 2 degrees of freedom, using the formula $t = \frac{r}{SE_r}$.
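
A Python sketch of the two formulas on this slide. The function name `correlation_t_test` and the inputs r = 0.5, n = 27 are illustrative.

```python
import math


def correlation_t_test(r, n):
    """Return (standard error, t statistic, degrees of freedom) for H0: correlation = 0."""
    if n <= 2:
        raise ValueError("need more than two observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and +1")
    se = math.sqrt((1 - r ** 2) / (n - 2))
    return se, r / se, n - 2


se, t, dof = correlation_t_test(0.5, 27)
print(round(se, 4), round(t, 4), dof)  # 0.1732 2.8868 25
```

The test rejects H0 when |t| exceeds the two-tailed critical value for n - 2 degrees of freedom. You can read that value from a t table. If SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, dof)` returns it.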

  10. Example 1
     • Suppose that for a sample of size 20, the sample covariance between two variables X and Y is 87, the sample variance of X is 100, and the sample variance of Y is 400. Is there a statistically significant linear relationship?

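  11. Example 1 Calculations
     $r = \frac{87}{\sqrt{100}\,\sqrt{400}} = \frac{87}{10 \times 20} = 0.435$, $SE_r = \sqrt{\frac{1 - 0.435^2}{18}} \approx 0.212$, and $t = \frac{0.435}{0.212} \approx 2.05$ with 18 degrees of freedom.
     Since 2.05 lies between the two-tailed critical values 1.734 (10% level) and 2.101 (5% level), a linear relationship between the two variables is statistically significant at the 10% level but not at the 5% level.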
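
The Example 1 arithmetic can be checked with a few lines of Python. The inputs and the two critical values are taken from the slides. Python's standard library has no t distribution, so the critical values are hard-coded instead of computed.

```python
import math

# Example 1 inputs from the slide
n = 20
cov_xy = 87.0
var_x = 100.0
var_y = 400.0

r = cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))  # 87 / (10 * 20) = 0.435
se = math.sqrt((1 - r ** 2) / (n - 2))              # about 0.212
t = r / se                                          # about 2.05, with 18 dof

# Two-tailed critical values for 18 dof, as quoted on the slide
critical = {"10%": 1.734, "5%": 2.101}
for level, t_crit in critical.items():
    verdict = "significant" if abs(t) > t_crit else "not significant"
    print(f"{level}: |t| = {abs(t):.3f} vs {t_crit} -> {verdict}")
# 10%: |t| = 2.050 vs 1.734 -> significant
# 5%: |t| = 2.050 vs 2.101 -> not significant
```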

  12. Example 2 – Readership Data

  13. Excel’s Data Analysis Correlation
     • Correlation is most useful for quickly considering possible relationships among many different variables.
     • Suppose, for example, that the analysis is examining 10 different variables: X1 … X10.
     • Using Excel’s CORREL function would require entering 45 separate calculations, one for each pair of variables (10 × 9 / 2 = 45).
     • A better exploratory (one-time) approach is to use Excel’s built-in Data Analysis ToolPak.
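
A Python sketch of the all-pairs idea follows. It computes one correlation for every unordered pair of named columns, which is what 45 separate CORREL calls would do for 10 variables. The helper names and the three-column toy data set are hypothetical, and the code is not a reproduction of the ToolPak's output layout.

```python
import itertools
import math


def pearson_r(xs, ys):
    """Pearson correlation; the n - 1 divisors cancel, so they are omitted."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return s_xy / (s_x * s_y)


def pairwise_correlations(columns):
    """Return {(a, b): r} for every unordered pair of named columns."""
    return {
        (a, b): pearson_r(columns[a], columns[b])
        for a, b in itertools.combinations(columns, 2)
    }


print(math.comb(10, 2))  # 45 pairs for 10 variables

toy = {  # illustrative data, not from the slides
    "X1": [1, 2, 3, 4, 5],
    "X2": [2, 4, 5, 4, 5],
    "X3": [5, 3, 4, 2, 1],
}
for (a, b), r in pairwise_correlations(toy).items():
    print(f"{a} vs {b}: {r:+.3f}")
```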

  14. Correlation ToolPak Example
     • Complete the dialog box.
     • This produces the results as a correlation matrix.

  15. Measuring Nonlinear Behavior
     • The correlation coefficient measures linearity.
     • If there is a nonlinear relationship, r will underestimate the predictive power of the relationship between the two variables.
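
A small Python illustration of this point, using Y = X² as an assumed example of a nonlinear relationship. Y is perfectly predictable from X. Even so, r is exactly 0 when X is symmetric around zero, and noticeably below 1 when X is positive only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation; the n - 1 divisors cancel, so they are omitted."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return s_xy / (s_x * s_y)


# X symmetric around zero: r misses the relationship entirely.
xs = list(range(-5, 6))
ys = [x ** 2 for x in xs]
print(pearson_r(xs, ys))  # 0.0

# X positive only: r is high but still understates a perfect relationship.
xs_pos = list(range(1, 11))
print(round(pearson_r(xs_pos, [x ** 2 for x in xs_pos]), 3))  # 0.975
```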

  16. Rank Correlation
     • Rank correlation measures, in a more general way, how two variables are related.
     • A high rank correlation says that large values of X tend to occur with large values of Y, and low values with low values, whether or not the relationship is linear.
     • This type of correlation is generally applied to highly skewed data, that is, data with a substantial number of very extreme values.

  17. Rank Correlation Computation
     • Rank the X values from low to high, then do the same for the Y values.
     • Compute the difference $d_i$ between the two ranks of each observation, and square each difference.
     • The statistic is then $r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$.
     If some values are tied, give each tied value the average of the ranks it would occupy; for simplicity, the formula above can still be used. Technically, in that case, the earlier correlation calculation should be applied to the ranks instead of the formula above.
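
Below is a Python sketch of the rank-correlation steps. Tied values receive the average of the ranks they span. The function computes both the d² shortcut formula and the "technically correct" Pearson correlation of the ranks. All function names are hypothetical, and the exponential data set is an assumed example of a skewed, monotonic, nonlinear relationship.

```python
import math


def average_ranks(values):
    """Rank from low (1) to high; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over the run of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # sorted positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def spearman_formula(xs, ys):
    """r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)); exact only when there are no ties."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(average_ranks(xs), average_ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def pearson_r(xs, ys):
    """Pearson correlation; the n - 1 divisors cancel, so they are omitted."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return s_xy / (s_x * s_y)


def spearman_on_ranks(xs, ys):
    """Pearson correlation of the ranks: the technically correct version when ties exist."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5, 6]
ys = [math.exp(x) for x in xs]          # monotonic but strongly nonlinear and skewed
print(round(pearson_r(xs, ys), 3))      # noticeably below 1
print(spearman_formula(xs, ys))         # 1.0
print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```

When there are no ties, both versions give the same r_s. With ties they can differ slightly, which is why the slide recommends the rank-based Pearson calculation in that case.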

  18. Re-use the T-test for “Good Enough”
     • The hypotheses are H0: correlation = 0 versus H1: correlation ≠ 0.
     • Approximate the standard error using the formula $SE_{r_s} = \sqrt{\frac{1 - r_s^2}{n - 2}}$.
     • Calculate the t-statistic, with n - 2 degrees of freedom, using the formula $t = \frac{r_s}{SE_{r_s}}$.

  19. Example 3
