Bivariate Data Analysis
Bivariate Data analysis 4
If the relationship is linear, the residuals plotted against the original x-values will be scattered randomly above and below the zero line.
A scatter plot of residuals versus the x-values should be boring and have no interesting features, like direction or shape. It should stretch horizontally with about the same amount of scatter throughout. It should show no curves or outliers.
When examining residuals to check whether a linear model is appropriate, it is usually best to plot them. The variation in the residuals is the key to assessing how well the model fits.
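As a minimal sketch of how the residuals might be computed before plotting them, the following fits a least-squares line and subtracts the fitted values from the observed ones. The data values and the helper names `fit_line` and `residuals` are made up for illustration and do not come from the slides.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys):
    """Residual = observed y minus the y predicted by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Made-up, roughly linear data (an assumption for illustration only).
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

res = residuals(xs, ys)
for x, r in zip(xs, res):
    print(f"x = {x}: residual = {r:+.3f}")
```

For a least-squares line the residuals always sum to zero, so it is their pattern against x, not their total, that shows whether the linear model is appropriate.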
Here the pattern of residuals looks more like a parabola. This indicates that the relationship was not really linear and is more likely to be quadratic.
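To see how such a parabola-shaped pattern can arise, here is a small sketch that fits a straight line to data that are exactly quadratic and prints the residuals. The data (y = x squared) are made up for illustration and are not the data behind the slide's plot.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up quadratic data: y = x^2 for x = 0..6 (illustrative assumption).
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
curved_residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

for x, r in zip(xs, curved_residuals):
    print(f"x = {x}: residual = {r:+.1f}")
```

With these values the residuals come out as 5, 0, -3, -4, -3, 0, 5: positive at both ends and negative in the middle, which is the U shape that signals a straight line is the wrong model.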
We cannot use a linear model unless the relationship between two variables is linear.
Often re-expression can save the day, straightening bent relationships so that we can fit and use a simple linear model.
When a scatterplot shows a CURVED form that consistently increases or decreases, we can often straighten the form of the plot by re-expressing one or both of the variables.
This plot looks ‘straight’. The correlation is now 0.998, but the increase in correlation is not important. (The original value of 0.979 is already large.) What is important is that the form of the plot is now straight, so the correlation is now an appropriate measure of association.
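The effect of re-expression on the correlation can be sketched as follows. The data are made up (y = x squared) and are not the data that gave the 0.979 and 0.998 quoted above; the helper `correlation` is written out by hand so the example has no dependencies.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Made-up curved data (illustrative assumption): y grows like x squared.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [x ** 2 for x in xs]

r_before = correlation(xs, ys)                          # curved relationship
r_after = correlation(xs, [math.sqrt(y) for y in ys])   # re-expressed with a square root

print(f"r before re-expression: {r_before:.3f}")
print(f"r after re-expression:  {r_after:.3f}")
```

Note that the correlation is already high before re-expressing, even though the plot is clearly curved. That is why the form of the plot, not the size of r, is what matters.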
Don’t stray too far from the powers suggested. Taking a high power may artificially inflate R², but it won’t give a useful or meaningful model. It is better to stick with powers between 2 and -2. Even in that range you should prefer the simpler powers in the ladder to those in the cracks. A square root is easier to understand than the 0.413 power.
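One way to try out the simple powers on the ladder is sketched below. It assumes the usual convention that the power 0 stands for the logarithm, and it ranks each re-expression by the absolute value of its correlation with x. The data, the list `LADDER`, and the helper names are illustrative assumptions, and a scatter plot of the chosen re-expression should still be checked by eye.

```python
import math

# Simple powers only; 0 stands for the logarithm (an assumed convention).
LADDER = [2, 1, 0.5, 0, -0.5, -1, -2]


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def reexpress(y, power):
    """Re-express a positive value y with the given ladder power."""
    if power == 0:
        return math.log(y)
    return y ** power


def straightest_power(xs, ys, ladder=LADDER):
    """Return the ladder power with the largest |r|, plus all the scores."""
    scores = {}
    for power in ladder:
        transformed = [reexpress(y, power) for y in ys]
        scores[power] = abs(correlation(xs, transformed))
    best = max(scores, key=scores.get)
    return best, scores


# Made-up exponential-growth data (illustrative assumption); all y > 0.
xs = list(range(1, 11))
ys = [math.exp(0.3 * x) for x in xs]

best, scores = straightest_power(xs, ys)
for power in LADDER:
    print(f"power {power:>4}: |r| = {scores[power]:.4f}")
print("straightest re-expression uses power", best)
```

Because the search is restricted to the short list of simple powers, it cannot drift to something like the 0.413 power even if that happened to give a marginally larger |r|.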
Comparing histograms and scatter graphs
The data in the scatter plot below shows the progression of the fastest times for the men’s marathon since the Second World War. We may want to use this data to predict the fastest time at 1 January 2010 (i.e. 64 years after 1 January 1946).
one for, say, 0 to 23 years, and
one for, say, 23 to 60 years.
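Reading the two ranges above as two separate linear models, one per period, the idea can be sketched as below. The numbers are made up purely for illustration and are not the actual marathon records; the break point of 23 years is taken from the text, and `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Evaluate a fitted (slope, intercept) model at x."""
    slope, intercept = model
    return intercept + slope * x


# Made-up values (NOT real record times): a steep fall, then a slower one.
years = [0, 5, 10, 15, 20, 23, 30, 40, 50, 60]
values = [155.0, 150.0, 145.0, 140.0, 135.0, 132.0, 130.6, 128.6, 126.6, 124.6]
BREAK = 23  # years since 1 January 1946, as suggested in the text

early = [(x, y) for x, y in zip(years, values) if x <= BREAK]
late = [(x, y) for x, y in zip(years, values) if x >= BREAK]

early_model = fit_line([x for x, _ in early], [y for _, y in early])
late_model = fit_line([x for x, _ in late], [y for _, y in late])

# Extrapolate with the later line only, since 64 lies beyond the second period.
prediction = predict(late_model, 64)

print(f"early slope: {early_model[0]:.2f} per year")
print(f"late slope:  {late_model[0]:.2f} per year")
print(f"predicted value at 64 years: {prediction:.1f}")
```

Using the later line for the prediction reflects the more recent rate of change; a single line through all the points would be pulled by the steeper early period.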
The data in the scatter plot below comes from a random sample of 60 models of new cars taken from all models on the market in New Zealand in May 2000. We want to use the engine size to predict the weight of a car.