4.2 Cautions about Correlation and Regression
Correlation and regression are powerful tools for
describing the relationship between two variables.
When you use these tools, you must be aware of
their limitations, beginning with the fact that
correlation and regression describe only linear
relationships.
Also remember that the correlation r and the
least-squares regression line are not resistant:
a single outlier can change them substantially.
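Both cautions can be checked with a small calculation. The sketch below is only an illustration: the helper name `pearson_r` and the data values are made up for this example and are not part of the notes. It computes r for a perfect curved relationship, and then for a perfectly linear data set before and after one outlier is added.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation r computed directly from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Caution 1: r measures only linear association.
# Here y is determined exactly by x (y = x^2), yet r is 0.
xs_curve = [-3, -2, -1, 0, 1, 2, 3]
ys_curve = [x ** 2 for x in xs_curve]
r_curve = pearson_r(xs_curve, ys_curve)

# Caution 2: r is not resistant.
# Six points on the line y = 2x give r = 1; one outlier changes that.
xs_line = [1, 2, 3, 4, 5, 6]
ys_line = [2, 4, 6, 8, 10, 12]
r_clean = pearson_r(xs_line, ys_line)
r_outlier = pearson_r(xs_line + [7], ys_line + [-20])

print(f"curved relationship: r = {r_curve:.2f}")
print(f"linear, no outlier:  r = {r_clean:.2f}")
print(f"linear, one outlier: r = {r_outlier:.2f}")
```

With these made-up values, the single added point moves r from 1 to about -0.33, so always plot the data before trusting r.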
Extrapolation is the use of a regression line for
prediction far outside the range of values of
the explanatory variable x that you used to
obtain the line or curve. Such predictions are
often not accurate.
Example: deriving an equation relating baby weight
to age from data where age only goes up to 12
months, then trying to predict a baby's weight at
16 months.
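One way to guard against extrapolation is to have the prediction step report whether the new x value lies outside the observed data. The sketch below assumes hypothetical weights in kilograms (illustrative numbers, not real growth data), and the function names `fit_line` and `predict` are invented for this example. It flags any x outside the observed range, which is stricter than "far outside".

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x_new, xs, slope, intercept):
    """Return (prediction, is_extrapolation) for a new x value."""
    is_extrapolation = x_new < min(xs) or x_new > max(xs)
    return intercept + slope * x_new, is_extrapolation

# Hypothetical data: age in months (0 to 12) and weight in kg.
age_months = [0, 2, 4, 6, 8, 10, 12]
weight_kg = [3.5, 5.5, 7.0, 8.0, 8.8, 9.4, 9.9]

slope, intercept = fit_line(age_months, weight_kg)

for age in (6, 16):
    weight, flagged = predict(age, age_months, slope, intercept)
    note = "EXTRAPOLATION, do not trust" if flagged else "within observed ages"
    print(f"age {age:2d} months: predicted {weight:.1f} kg ({note})")
```

The line happily produces a number at 16 months, but nothing in the data says the linear trend continues past 12 months, so the flag is the useful part of the output.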
A lurking variable is a variable that is not
among the explanatory or response variables in a
study and yet may influence the interpretation
of relationships among those variables.
A lurking variable can falsely suggest a strong
relationship between x and y or it can hide a
relationship that is really there.
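A small made-up data set shows how a lurking variable can mask what is really going on. In the sketch below the lurking variable is group membership (A or B); the data values and the helper `pearson_r` are hypothetical, chosen only for illustration. Within each group y rises with x, but when the groups are pooled and the grouping is ignored, the association looks negative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation r computed directly from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical (x, y) data for two groups defined by a lurking variable.
group_a = ([1, 2, 3], [10, 11, 12])
group_b = ([6, 7, 8], [5, 6, 7])

r_a = pearson_r(*group_a)   # within group A
r_b = pearson_r(*group_b)   # within group B

# Pool the groups, ignoring the lurking variable.
pooled_x = group_a[0] + group_b[0]
pooled_y = group_a[1] + group_b[1]
r_pooled = pearson_r(pooled_x, pooled_y)

print(f"group A: r = {r_a:.2f}")
print(f"group B: r = {r_b:.2f}")
print(f"pooled:  r = {r_pooled:.2f}")
```

Here r is 1 within each group and about -0.81 in the pooled data, so the positive relationship inside each group is hidden until the lurking variable is taken into account.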
The Question of Causation
In many studies of the relationship between two variables, the goal is to establish that changes in the explanatory variable cause changes in the response variable.
Even when a strong association is present, the
conclusion that this association is due to a causal link between the variables is often elusive.
Causation: changes in x cause changes in y.
Note: you will rarely find a simple, direct causal
relationship. In just about every relationship, more
than one variable contributes to the change in y.
Common Response: changes in both x and y are caused by changes in a lurking variable z.
Confounding: the effect (if any) of x on y is
confounded with the effect of a lurking variable z,
so the two effects cannot be separated.
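Common response can be illustrated with a short simulation. The sketch below makes several assumptions that are not in the notes: the data are simulated, z is actually measured, and the partial correlation of x and y given z (a standard formula, brought in here only for illustration) is used to see what association remains once z is accounted for. Neither x nor y affects the other; both depend on z.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Correlation r computed directly from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Simulated common response: z drives both x and y, plus independent noise.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_xy = pearson_r(x, y)
r_xy_given_z = partial_r(r_xy, pearson_r(x, z), pearson_r(y, z))

print(f"r between x and y:         {r_xy:.2f}")
print(f"r between x and y given z: {r_xy_given_z:.2f}")
```

The raw correlation between x and y is strong even though neither causes the other, and it nearly vanishes once z is held fixed. In practice a lurking variable is usually unmeasured, so this check is often unavailable, which is why a strong association alone does not establish causation.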