Examining Relationships. Prob. And Stat. 2.2 Correlation.
A linear relation is strong if the points lie close to a straight line.
A linear relation is weak if the points are widely scattered about a line.
The correlation measures the direction and strength of the linear relationship between two quantitative variables. Correlation is usually written as r.
Suppose that we have data on variables x and y for n individuals. The values for the first individual are $x_1$ and $y_1$, the values for the second individual are $x_2$ and $y_2$, and so on. The means and standard deviations of the two variables are $\bar{x}$ and $s_x$ for the x-values, and $\bar{y}$ and $s_y$ for the y-values. The correlation r between x and y is
$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
r is the correlation; it has no units.
The quantity $(x_i - \bar{x})/s_x$ is the standardized value of the x-variable for the i-th individual.
The correlation r is an average of the products of the standardized values of the x and y variables for the n individuals.
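The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper name `correlation` and the height and weight numbers are made up for the example, and it assumes the common textbook convention of using sample standard deviations and dividing the sum of products by n - 1.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of products of standardized values, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    # One product of standardized values per individual.
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

# Hypothetical data: heights (inches) and weights (pounds) for five individuals.
heights = [60, 62, 65, 68, 70]
weights = [110, 120, 135, 150, 165]

r = correlation(heights, weights)
print(round(r, 3))  # close to 1, because the points lie near a rising line
```

Because each term is a product of two unitless standardized values, the result is itself a plain number with no units.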
r is positive when there is a positive association between the variables.
In the example using height and weight, an individual who is above average in height tends to be above average in weight. Both standardized values are then positive, so their product is positive and r is positive.
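r is negative when the association between x and y is negative, that is, when above-average values of one variable tend to accompany below-average values of the other, so that the products of the standardized values are mostly negative.
Note that in the height and weight example, an individual who is below average in height tends to be below average in weight as well. Both standardized values are then negative, their product is positive, and this also contributes to a positive r, not a negative one.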
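The sign of r can be traced back to the signs of the individual products of standardized values. The sketch below is only an illustration: the helper `standardized_products` and both data sets are hypothetical, and it assumes sample standard deviations (n - 1 divisor). It shows that when the two variables rise together every product is non-negative, including for individuals who are below average in both variables, while for a falling pattern every product is non-positive.

```python
from statistics import mean, stdev

def standardized_products(xs, ys):
    """Return one product of standardized values per individual."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    return [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]

# Hypothetical positive association: taller individuals tend to weigh more.
heights = [60, 62, 65, 68, 70]
weights = [110, 120, 135, 150, 165]
pos_products = standardized_products(heights, weights)

# Hypothetical negative association: y falls as x rises.
x_vals = [1, 2, 3, 4, 5]
y_vals = [10, 8, 7, 4, 1]
neg_products = standardized_products(x_vals, y_vals)

print([round(p, 2) for p in pos_products])  # no negative entries
print([round(p, 2) for p in neg_products])  # no positive entries
```

The first individual in the height and weight data is below average in both variables, yet the corresponding product is positive, because two negative standardized values multiply to a positive number.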
1. Correlation makes no distinction between explanatory and response variables. It makes no difference which variable you call x and which you call y in calculating the correlation.
2. Correlation requires that both variables be quantitative, so that it makes sense to do the arithmetic indicated by the formula for r. We cannot calculate a correlation between the incomes of a group of people and what city they live in, because city is a categorical variable.
3. Because r uses the standardized values of the observations, r does not change when we change the units of measurement of x, y, or both. Measuring height in inches rather than centimeters and weight in pounds rather than kilograms does not change the correlation between height and weight. The correlation r itself has no unit of measurement; it is just a number.
4. Positive r indicates positive association between the variables, and negative r indicates negative association.
5. The correlation r is always a number between -1 and 1. Values of r near 0 indicate a very weak linear relationship. The strength of the linear relationship increases as r moves away from 0 toward either -1 or 1. Values of r close to -1 or 1 indicate that the points in a scatterplot lie close to a straight line. The extreme values r = -1 and r = 1 occur only in the case of a perfect linear relationship, when the points lie exactly along a straight line.
6. Correlation measures the strength of only a linear relationship between two variables. Correlation does not describe curved relationships between variables, no matter how strong they are.
7. Like the mean and standard deviation, the correlation is not resistant: r is strongly affected by a few outlying observations. The correlation for Figure 2.7 (page 93) is r = 0.634 when all 51 observations are included, but rises to r = 0.783 when we omit Alaska and the District of Columbia. Use r with caution when outliers appear in the scatterplot.
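Several of the facts above can be checked numerically. The following sketch is illustrative only: the `correlation` helper and all data values are hypothetical (they are not the data behind Figure 2.7), and it assumes the n - 1 convention with sample standard deviations.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of products of standardized values, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Hypothetical heights (inches) and weights (pounds).
heights_in = [60, 62, 65, 68, 70]
weights_lb = [110, 120, 135, 150, 165]
r = correlation(heights_in, weights_lb)

# Fact 1: swapping the roles of x and y leaves r unchanged.
r_swapped = correlation(weights_lb, heights_in)

# Fact 3: converting to centimeters and kilograms leaves r unchanged.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.45359237 for w in weights_lb]
r_metric = correlation(heights_cm, weights_kg)

# Fact 5: points exactly on a falling straight line give r = -1.
r_line = correlation([1, 2, 3, 4], [10, 8, 6, 4])

# Fact 6: a perfect but curved relationship (y = x^2) can give r near 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
r_curve = correlation(xs, [x * x for x in xs])

# Fact 7: one outlying individual (very tall, very light) changes r a lot.
r_outlier = correlation(heights_in + [75], weights_lb + [100])

for name, value in [("r", r), ("swapped", r_swapped), ("metric", r_metric),
                    ("line", r_line), ("curve", r_curve), ("outlier", r_outlier)]:
    print(f"{name:8s} {value: .3f}")
```

The first three values agree up to rounding, the curved example comes out near 0 even though y is completely determined by x, and adding a single outlier pulls r far below its original value.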
Values of r closer to 1 or -1 correspond to strong linear relationships.
It is not easy to guess the value of r from the appearance of a scatterplot.
Changing the plotting scale in a scatterplot may mislead the eye, but does not change the correlation.
Correlation is not a complete description of two-variable data, even when the relationship between the variables is linear.
Conclusions based on correlations alone may require rethinking in the light of a more complete description of the data.
It is better to give the means and standard deviations of both x and y along with the correlation. (Because the formula for correlation uses the means and standard deviations, these measures are the proper choice to accompany a correlation.)
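To see why r alone is an incomplete summary, consider the sketch below. It is a hypothetical illustration (the `summarize` helper and the data are made up, and sample standard deviations with the n - 1 convention are assumed): two data sets share exactly the same correlation with x, yet differ in both center and spread, which only the means and standard deviations reveal.

```python
from statistics import mean, stdev

def summarize(xs, ys):
    """Return means, sample standard deviations, and the correlation r."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return {
        "x_mean": x_bar, "x_sd": s_x,
        "y_mean": y_bar, "y_sd": s_y,
        "r": total / (len(xs) - 1),
    }

# Hypothetical data: y_b is y_a rescaled and shifted, so r is the same
# but the mean and standard deviation of y are very different.
x = [1, 2, 3, 4, 5]
y_a = [2, 4, 5, 4, 5]
y_b = [10 + 3 * y for y in y_a]

summary_a = summarize(x, y_a)
summary_b = summarize(x, y_b)

for label, s in [("A", summary_a), ("B", summary_b)]:
    print(label, {k: round(v, 3) for k, v in s.items()})
```

Reporting only r would make the two data sets look identical; the means and standard deviations show where each cloud of points sits and how widely it spreads.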