Stat 112: Lecture 15 Notes
Finish Chapter 6:
Review on Checking Assumptions (Sections 6.4-6.6)
Outliers and Influential Points (Section 6.7)
Homework 4 is due this Thursday.
Please let me know of any ideas you want to discuss for the final project.
Check residual by predicted plots and residual plots for each variable for pattern in the mean of the residuals.
Remedies: Transformations and Polynomials. To see if the remedy works, check the new residual plots for a pattern in the mean of the residuals.
Constant variance: the standard deviation of Y is the same for all subpopulations.
Check residual by predicted plot for pattern in the spread of the residuals.
Remedies: Transformation of Y. To see if remedy works, check residual by predicted plot for the transformed Y regression.
Normality: the distribution of Y is normally distributed for all subpopulations.
Check histogram for bell shaped distribution of residuals and normal quantile plot of residuals for approximately straight line.
Remedies: Transformation of Y. To see if the remedy works, check the histogram and normal quantile plot of the residuals from the transformed Y regression.
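The normality check above can be sketched numerically. The following is a minimal illustration, not the course's JMP workflow: it builds the coordinates of a normal quantile plot and summarizes how straight the plot is with a correlation. The data are simulated, and the helper names (`normal_quantile_coords`, `straightness`, `fit_residuals`) and the plotting positions (i - 0.5)/n are assumptions made for this example.

```python
import numpy as np
from statistics import NormalDist

def normal_quantile_coords(residuals):
    """Return (theoretical normal quantiles, sorted residuals)."""
    r = np.sort(np.asarray(residuals, dtype=float))
    n = len(r)
    probs = (np.arange(1, n + 1) - 0.5) / n  # one common choice of plotting positions
    q = np.array([NormalDist().inv_cdf(p) for p in probs])
    return q, r

def straightness(residuals):
    """Correlation between the two plot axes; near 1 means nearly a straight line."""
    q, r = normal_quantile_coords(residuals)
    return np.corrcoef(q, r)[0, 1]

# Simulated (hypothetical) data: one response with normal errors, one with skewed errors.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
X = np.column_stack([np.ones_like(x), x])  # intercept + slope

def fit_residuals(y):
    """Residuals from the least-squares regression of y on x."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

y_normal = 3 + 2 * x + rng.normal(0, 1, 100)
y_skewed = 3 + 2 * x + rng.lognormal(0, 1, 100)

r_normal = straightness(fit_residuals(y_normal))
r_skewed = straightness(fit_residuals(y_skewed))
print("normal errors:", r_normal, "skewed errors:", r_skewed)
```

A lower correlation for the skewed case corresponds to a normal quantile plot that bends away from a straight line, which is the signal to try a transformation of Y.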
Center City Philadelphia is influential; Gladwyne is not. In general,
points that have high leverage are more likely to be influential.
Center City Philadelphia has both high influence (Cook’s Distance much
greater than 1) and high leverage (hat value > 3*2/99 = 0.06). No other
observations have high influence or high leverage.
Rule of thumb for high leverage: hat value > 3*(# of coefficients in regression model)/n.
# of coefficients in regression model = 2 for simple linear regression.
n = number of observations.
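The two cutoffs used above (Cook's Distance greater than 1, hat value greater than 3*(# of coefficients)/n) can be sketched for a simple linear regression as follows. This is a minimal illustration on simulated data, not the Philadelphia data from the lecture; the function name `leverage_and_cooks` and the single extreme point placed at the end of the data are assumptions made for the example.

```python
import numpy as np

def leverage_and_cooks(x, y):
    """Hat values, Cook's distances and the 3*p/n leverage cutoff
    for a simple linear regression of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    X = np.column_stack([np.ones(n), x])   # design matrix with intercept
    p = X.shape[1]                         # number of coefficients (2 here)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Hat values are the diagonal of H = X (X'X)^{-1} X'.
    XtX_inv = np.linalg.inv(X.T @ X)
    hat = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    mse = resid @ resid / (n - p)
    cooks = resid**2 / (p * mse) * hat / (1 - hat) ** 2
    return hat, cooks, 3 * p / n

# Simulated (hypothetical) data: 98 ordinary points plus one extreme point.
rng = np.random.default_rng(0)
x = np.append(rng.uniform(0, 10, 98), 40.0)
y = 2 + 0.5 * x + rng.normal(0, 1, 99)
y[-1] -= 30.0                              # pull the extreme point far off the line

hat, cooks, cutoff = leverage_and_cooks(x, y)
print("leverage cutoff:", cutoff)
print("high leverage:", np.flatnonzero(hat > cutoff))
print("high influence:", np.flatnonzero(cooks > 1))
```

With n = 99 and two coefficients the cutoff is 3*2/99, matching the value quoted in the slide. A point far from the other x values gets a large hat value, and it becomes influential only when its y value also disagrees with the line fitted to the rest of the data.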
See flowchart attached to end of slides
Does removing the observation change the
EDUC = median number of school years completed for persons 25 and older;
NONWHITE = percentage of 1960 population that is nonwhite;
NOX = relative pollution potential of NOx (related to the tons of NOx emitted per day per square kilometer);
SO2 = relative pollution potential of SO2
2. a) From the scatter plot of MORT vs. NOX we see that the NOX values are bunched very tightly. A log transformation of NOX is needed.
b) The curvature in MORT vs. SO2 indicates that a log transformation of SO2 may be suitable.
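To illustrate why a log transformation helps when the values of an explanatory variable are bunched together with a few very large ones, here is a minimal sketch on simulated data. It does not use the pollution data set; the variable names `x` and `y`, the sample size, and all numeric constants are assumptions chosen for the example.

```python
import numpy as np

# Simulated (hypothetical) data: x is strongly right-skewed, so most values are
# bunched together, and y is linear in log(x) rather than in x.
rng = np.random.default_rng(1)
n = 60
x = rng.lognormal(mean=0.0, sigma=1.5, size=n)
y = 900 + 20 * np.log(x) + rng.normal(0, 5, n)

r_raw = np.corrcoef(x, y)[0, 1]          # linear association with x itself
r_log = np.corrcoef(np.log(x), y)[0, 1]  # linear association with log(x)

print("correlation of y with x:     ", r_raw)
print("correlation of y with log(x):", r_log)
```

When the relationship is linear on the log scale, the correlation with log(x) is the stronger one, and the scatter plot against log(x) spreads the bunched points out along the horizontal axis.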
After the two transformations we have the following correlations:
than 1 –
Residual of y (from the regression without xj) vs. residual of xj (from the regression of xj on the rest of the x’s)
(both axes are recentered)
coefficient for that variable in the multiple regression
(Use them the same way as in a simple regression to identify the effect of points on the regression coefficient of a particular variable.)
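The residual-versus-residual plot described above can be sketched in a few lines. This is a minimal illustration on simulated data with two explanatory variables; the names `ols_residuals` and `leverage_plot_coords` are made up for the example. It checks numerically that the slope of the line through the plot equals the coefficient of that variable in the multiple regression.

```python
import numpy as np

def ols_residuals(Z, y):
    """Residuals from the least-squares regression of y on the columns of Z."""
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return y - Z @ beta

def leverage_plot_coords(X_others, xj, y):
    """Plot coordinates for variable xj, plus the slope of the line through them."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X_others])  # intercept + the rest of the x's
    res_y = ols_residuals(Z, y)                  # residual y (w/o xj)
    res_xj = ols_residuals(Z, xj)                # residual xj (vs. the rest of the x's)
    slope = (res_xj @ res_y) / (res_xj @ res_xj)
    return res_xj, res_y, slope

# Simulated (hypothetical) data with two correlated explanatory variables.
rng = np.random.default_rng(2)
n = 80
x1 = rng.normal(0, 1, n)
x2 = 0.6 * x1 + rng.normal(0, 1, n)
y = 1 + 2 * x1 - 3 * x2 + rng.normal(0, 1, n)

res_x2, res_y, slope = leverage_plot_coords(x1, x2, y)

# Coefficient of x2 in the full multiple regression, for comparison.
X_full = np.column_stack([np.ones(n), x1, x2])
beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
print("slope in the plot:", slope, "coefficient of x2:", beta_full[2])
```

The sketch leaves both residuals centered at zero; shifting either axis by a constant would not change the slope. A point far out on the horizontal axis of this plot has high leverage for that particular coefficient, and a point far from the line is an outlier for it.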
The enlarged observation, New Orleans, is an outlier for estimating each coefficient and has high leverage for estimating the coefficients of interest, those on log NOx and log SO2. Since New Orleans is both an outlier and a high-leverage point, we expect it to be influential.