Definition. The linear correlation coefficient r measures the strength of the linear relationship between the paired quantitative x and y values in a sample. Requirements. 1. The sample of paired ( x, y ) data is a simple random sample of quantitative data.
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Definition
The linear correlation coefficientr measures the strength of the linear relationship between the paired quantitative x and yvalues in a sample.
1. The sample of paired (x, y) data is a simple random sample of quantitative data.
2. Visual examination of the scatterplot must confirm that the points approximate a straightline pattern.
3. The outliers must be removed if they are known to be errors. The effects of any other outliers should be considered by calculating r with and without the outliers included.
n = number of pairs of sample data
denotes the addition of the items indicated.
xdenotes the sum of all xvalues.
x2indicates that each xvalue should be squared and then those squares added.
(x)2 indicates that the xvalues should be added and then the total squared.
xyindicates that each xvalue should be first multiplied by its corresponding yvalue. After obtaining all such products, find their sum.
r = linear correlation coefficient for sample data.
= linear correlation coefficient for population data.
nxy–(x)(y)
r=
n(x2)– (x)2n(y2)– (y)2
Formula
The linear correlation coefficientr measures the strength of a linear relationship between the paired values in a sample.
Computer software or calculators can compute r
1. –1 r 1
2.if all values of either variable are converted to a different scale, the value of r does not change.
3. The value of r is not affected by the choice of x and y. Interchange all x and yvaluesand the value of r will not change.
4. r measures strength of a linear relationship.
5.r is very sensitive to outliers, they can dramatically affect its value.
Example:
Using software or a calculator, r is automatically calculated:
Example:
Using the pizza subway fare costs, we have found that the linear correlation coefficient is r = 0.988. What proportion of the variation in the subway fare can be explained by the variation in the costs of a slice of pizza?
With r = 0.988, we get r2 = 0.976.
We conclude that 0.976 (or about 98%) of the variation in the cost of a subway fares can be explained by the linear relationship between the costs of pizza and subway fares. This implies that about 2% of the variation in costs of subway fares cannot be explained by the costs of pizza.
1. Causation: It is wrong to conclude that correlation implies causality.
2. Averages: Averages suppress individual variation and may inflate the correlation coefficient.
3. Linearity: There may be some relationship between x and y even when there is no linear correlation.
Basic Concepts of Regression
The regression equation expresses a relationship between x (called the explanatory variable, predictor variable or independent variable), and y (called the response variable or dependent variable).
^
The typical equation of a straight liney = mx + b is expressed in the formy = b0 + b1x, where b0 is the yintercept and b1 is the slope.
^
Regression Equation
^
y= b0 + b1x
Given a collection of paired data, the regression equation
algebraically describes the relationship between the two variables.
yintercept of regression equation
Slope of regression equation
Equation of the regression line
0b0
1b1
y = 0 + 1xy = b0 + b1x
^
Population
Parameter
Sample
Statistic
1. The sample of paired (x, y) data is a random sample of quantitative data.
2. Visual examination of the scatterplot shows that the points approximate a straightline pattern.
3. Any outliers must be removed if they are known to be errors. Consider the effects of any outliers that are not known errors.
calculators or computers can compute these values
(slope)
(yintercept)
Example:
Refer to the sample data given in Table below. Use technology to find the equation of the regression line in which the explanatory variable (or x variable) is the cost of a slice of pizza and the response variable (or y variable) is the corresponding cost of a subway fare.
Example:
Requirements are satisfied: simple random sample; scatterplot approximates a straight line; no outliers
Here are results from four different technologies technologies
All of these technologies show that the regression equation can be expressed asy = 0.0346 +0.945x, where y is the predicted cost of a subway fare and x is the cost of a slice of pizza.
^
^
Example:
Example:
Graph the regression equation(from the preceding Example) on the scatterplot of the pizza/subway fare data and examine the graph to subjectively determine how well the regression line fits the data.
Example:
For a pair of sample x and y values, the residual is the difference between the observed sample value of y and the yvalue that is predicted by using the regression equation. That is,
^
residual = observed y – predicted y = y – y
A straight line satisfies the leastsquares property if the sum of the squaresof the residuals is the smallest sum possible.
A residual plot is a scatterplot of the (x, y) values after each of theycoordinate values has been replaced by the residual value y – y (where y denotes the predicted value of y). That is, a residual plot is a graph of the points (x, y – y).
^
^
^
Residual Plot Analysis
When analyzing a residual plot, look for a pattern in the way the points are configured, and use these criteria:
The residual plot should not have an obvious pattern that is not a straightline pattern.
The residual plot should not become thicker (or thinner) when viewed from left to right.
Residual Plots
Residual Plots
Residual Plots
Coefficient of determination
is the amount of the variation in y that
is explained by the regression line.
explained variation.
r2=
total variation
The value of r2 is the proportion of the variation in y that is explained by the linear relationship between x and y.
(
)
å
2
x
å
=
SS
(
x
)
“sum of squ
ares for
x”
2
=

x
n
(
)
å
2
y
å
=
SS
(
y
)
“sum of squ
ares for
y”
2
=

y
n
å
å
x
y
å
=
SS
(
xy
)
“sum of squ
ares for
xy”
=

xy
n

SS
(
xy
)
42
.
79
=
=
=

r
0.
47
SS
(
x
)
SS
(
y
)
(
7
.
449
)(
1116
.
9
)
The equation is determined by:
b0: yintercept
b1: slope
1) Draw a scatter diagram for this data
2) Find the equation of the line of best fit
^
y
SS
(
xy
)
118
.
75
=
=
=
b
0.
5174
1
SS
(
x
)
229
.
5
(
)
å
å

×
y
b
x

133
(0.
5174
)(
234
)
1
=
=
=
b
1
.
4902
0
n
8
=
+
1
.
49
0.
517
x
Solution 1)
Equation o
f the line
of best f
it:
Job Satisfaction Survey
Solution 2)
22
21
20
19
18
Job
Satisfaction
17
16
15
14
13
12
21
23
25
27
29
31
33
35
37
Salary