Using Correlation to Describe Relationships between Two Quantitative Variables
Pearson’s Correlation Coefficient
When we describe the association between two variables, we can use a scatterplot to help our description.
Remember that these numbers are just guidelines. Each set of data is different and the context for the data must be considered.
Notice that the formula is adding terms together (we’ll talk about what those terms are shortly) and then dividing that sum by 1 less than the number of data points we have. So, it appears that we are looking for “an average” of sorts.
Now, the terms that we are adding together are products of z-scores.
Remember that a z-score is the number of standard deviations a piece of data is from the mean of the distribution.
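In symbols, assuming the usual notation (sample means x̄ and ȳ, sample standard deviations s_x and s_y, none of which are named explicitly in the text), the z-scores of the i-th point could be written as:

```latex
z_{x_i} = \frac{x_i - \bar{x}}{s_x},
\qquad
z_{y_i} = \frac{y_i - \bar{y}}{s_y}
```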
So each term is the product of the z-scores in each direction (x and y) for each point.
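As a minimal sketch of this idea, the Python below multiplies the two z-scores for each point, adds the products, and divides by n - 1. The helper names (`sample_std`, `z_scores`, `pearson_r`) and the data are made up for illustration; they are not taken from the lesson's data set.

```python
import math

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def z_scores(values):
    """Number of standard deviations each value lies from the mean."""
    m = mean(values)
    s = sample_std(values)
    return [(v - m) / s for v in values]

def pearson_r(xs, ys):
    """Sum of the z-score products, divided by n - 1."""
    zx = z_scores(xs)
    zy = z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Hypothetical data, chosen only to keep the arithmetic easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))
```

Points that sit above the mean in both directions (or below in both) contribute positive products, which is why a rising pattern produces a positive r.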
So, how can we calculate this value?
Starting with our original formula:
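With each z-score written out in full, and assuming the standard notation (x̄, ȳ for the means and s_x, s_y for the sample standard deviations), the starting point would read:

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```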
Now, the standard deviations of our x-values and y-values are constants once our data has been collected, so they will be the same for each term in the summation.
This means that we can factor those out of the sum leaving:
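Under the same assumed notation (s_x and s_y as the sample standard deviations), pulling the constants out of the sum would give:

```latex
r = \frac{1}{(n-1)\, s_x s_y} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```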
Now, expanding the summation gives us:
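Taking "expanding" to mean writing the sum out term by term (an interpretation, with the same assumed symbols x̄, ȳ, s_x, s_y), this step would look like:

```latex
r = \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + (x_2 - \bar{x})(y_2 - \bar{y})
          + \cdots + (x_n - \bar{x})(y_n - \bar{y})}
         {(n-1)\, s_x s_y}
```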
Now, using the distributive property to multiply the binomials in each term gives:
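Multiplying out each pair of binomials, with x̄ and ȳ again standing for the assumed sample means, would plausibly give:

```latex
r = \frac{(x_1 y_1 - x_1 \bar{y} - \bar{x} y_1 + \bar{x}\bar{y})
          + \cdots
          + (x_n y_n - x_n \bar{y} - \bar{x} y_n + \bar{x}\bar{y})}
         {(n-1)\, s_x s_y}
```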
Then, collapsing the sums gives:
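Grouping like terms back into sums, and noting that the constant x̄ȳ is added n times, the result under the same assumed notation would be:

```latex
r = \frac{\sum x_i y_i \;-\; \bar{y} \sum x_i \;-\; \bar{x} \sum y_i \;+\; n\,\bar{x}\bar{y}}
         {(n-1)\, s_x s_y}
```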
Now, ∑xᵢ and ∑yᵢ can be written as n·x̄ and n·ȳ, respectively, because each mean is the corresponding sum divided by n:
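Substituting n·x̄ for the sum of the x-values and n·ȳ for the sum of the y-values (same assumed symbols as in the rest of the derivation) would give:

```latex
r = \frac{\sum x_i y_i \;-\; n\,\bar{x}\bar{y} \;-\; n\,\bar{x}\bar{y} \;+\; n\,\bar{x}\bar{y}}
         {(n-1)\, s_x s_y}
```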
But two of the last three terms cancel each other out, so we are left with:
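After one of the two negative n·x̄·ȳ terms cancels against the positive one, the computational form would be (again assuming s_x and s_y denote the sample standard deviations):

```latex
r = \frac{\sum x_i y_i \;-\; n\,\bar{x}\bar{y}}{(n-1)\, s_x s_y}
```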
Now, substituting the values for each of the variables, we find that the correlation coefficient is r = 0.96, indicating a strong, positive linear correlation: as the amount of fat in the burger increases, so does the number of calories.
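As a rough numerical check, the sketch below computes r two ways: from the sum of z-score products divided by n - 1, and from the shortcut form (∑xᵢyᵢ - n·x̄·ȳ) / ((n - 1)·s_x·s_y). The function names and the data are hypothetical; the burger data is not reproduced here, so the result will not be 0.96.

```python
from statistics import mean, stdev  # stdev is the sample standard deviation

def pearson_r_zscores(xs, ys):
    """r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

def pearson_r_shortcut(xs, ys):
    """r from the shortcut form: (sum of x*y - n*xbar*ybar) / ((n - 1)*sx*sy)."""
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy - n * mean(xs) * mean(ys)) / ((n - 1) * stdev(xs) * stdev(ys))

# Hypothetical data for illustration only (not the burger data).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_z = pearson_r_zscores(xs, ys)
r_shortcut = pearson_r_shortcut(xs, ys)
print(round(r_z, 4), round(r_shortcut, 4))
```

The two values should agree up to floating-point rounding, since the shortcut is only an algebraic rearrangement of the z-score form.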