Polynomial Regression and Transformations

Polynomial Regressionand Transformations STA 671 Summer 2008

Review • The estimated residuals e1,…,en provide the best method for checking the assumptions. • Remember the residuals εi ~ N(0,σ). The estimated residuals should be close to that. • In a residual plot, you are looking for outliers, curvature, or changing variance. • In this lecture we will discuss polynomial regression and transformations, two separate methods. Both are possible solutions to curvature, and transformations have the added benefit they sometimes address changing variance.

Recall the Hooker data. There appears to be a small amount of curvature.

This curvature is seen more clearly in the residual plot.

Polynomial regression – one method for dealing with curvature. • To account for curvature, we can perform something called “polynomial regression”, which consists of fitting a polynomial (a quadratic or cubic typically) instead of a line. • Recall the linear model was Yi = β0 + β1 Xi + εi. The quadratic model is Yi = β0 + β1 Xi + β2 Xi2 + εi. The cubic model is Yi = β0 + β1 Xi + β2 Xi2 + β3 Xi3 + εi. • The higher the order of the polynomial, the more curvature it can account for.

Quadratic model accounts for the curvature Quadratic equation is 88.017 – 1.1295 Temp + 0.004 Temp2 If the quadratic model is better than the linear model, what about a cubic?

A cubic model produces no visual improvement equation is Pressure = 124.14 – 1.69 Temp + 0.0069 Temp2 - 0.000005 Temp3

Which to choose, quadratic or cubic? • In general, choose the LOWEST order polynomial possible (i.e. prefer linear to quadratic, quadratic to cubic, etc.). • This is aimed at 1) “Occam’s razor” meaning that simpler models are preferred, and 2) the higher the order, the more parameters to estimate. Statistically, it’s easier to estimate a few parameters than many.

P-values for selecting order • The regression output provides a formal method for selecting the order of the polynomial. This method typically agrees with looking at the residual plot. • The regression output provides p-values for each term in the regression. • The p-value for the highest order term is the ONLY one that is used.

Using p-values to select order • Begin by fitting the cubic model. If the cubic term is significant, use the cubic model (you can consider higher order models, but we do not in STA671) • If the cubic term is NOT significant, remove it and RERUN the model (p-values change depending on what terms are in the model), then look to see if the quadratic term is significant. • If the quadratic term is not significant, remove it and RERUN the model, resulting in a linear regression. • If none of these models produce a reasonable residual plot, you may need another method.

For the boiling point data • We first run the cubic model and acquire the following p-values • The p-value is not significant, so remove the cubic term and RERUN the model (do NOT just remove the quadratic terms based on the p-value above) Parameter Estimates Parameter Standard Variable Label DF Estimate Error t Value Pr > |t| Type I SS Intercept Intercept 1 124.13563 384.83452 0.32 0.7495 12434 Temperature 1 -1.68544 5.92071 -0.28 0.7781 444.16724 Temperature_2 2nd power of TEMPERATURE 1 0.00688 0.03032 0.23 0.8222 2.98566 Temperature_3 3rd power of TEMPERATURE 1 -0.00000486 0.00005171 -0.09 0.92590.00022757

Quadratic model for boiling point data • The quadratic model produces the following p-values • The quadratic term is significant AND we observe a reasonable residual plot, so we stop here. This is our final model. Parameter Estimates Parameter Standard Variable Label DF Estimate Error t Value Pr > |t| Type I SS Intercept Intercept 1 88.01662 13.93063 6.32 <.0001 12434 Temperature 1 -1.12954 0.14336 -7.88 <.0001 444.16724 Temperature_2 2nd power of TEMPERATURE 1 0.00403 0.00036820 10.95 <.0001 2.98566

What if no polynomial model produces a reasonable residual plot? • If none of our polynomial model produces a reasonable residual plot, we need another method. • Another method to try is to transform the response variable. • Transformations, like polynomial regression, can handle curvature, and in addition transformations have the potential to handle changing spread as well.

Example - the ethanol data • Data comes from an engine exhaust study. • NOx is a measure of the exhaust from the engine, while E is a measure of the fuel/air mixture (high values are almost all fuel, low values are almost all air) • A cubic model does not fit the data. A quadratic of linear model would do worse.

A cubic fit to the ethanol data Residual plot shows clear curvature. Scatterplot

Transformations • Instead of fitting Y as the response variable, we fit a function of Y as the response variable. • Thus, instead of Yi = β0 + β1 Xi + εi, you can fit log(Yi) = β0 + β1 Xi + εi, orsqrt(Yi) = β0 + β1 Xi + εi, orcbrt(Yi) = β0 + β1 Xi + εi, etc. • Thus, you greatly expand the possible models you can fit. • You can transform the X variable as well, but in the interest of time we do not discuss that in detail in STA671.

Transformations allow different errors structures. • A quadratic regression looks like Yi = β0 + β1 Xi + β2 Xi2 + εi. At any particular X, the variance is the same. • Taking the square root transformation sqrt(Yi) = β0 + β1 Xi + εi means thatYi = [β0 + β1 Xi + εi]2 = β02 + β12 Xi2 + εi2 + 2 β0β1 Xi + 2 β0εi + 2 β1 Xiεi. • There is a quadratic relationship between X and Y. • Note the multiplication between Xi and εi, this allows the variance to change for each Xi. Thus, in addition to handling curvature, transformations allow you to address changing variance.

Prototypical Data requiring transformation

After square root transformation

Which transformation? • There are no hard and fast rules on which transformation to try, no guaranteed method for finding a good transformation (in some data, you seem to never find a great fit). • Usually you have to perform trial and error, and remember you can combine polynomial regression with transformation. Thus for example, you can fit a cubic model in X to predict log(Y).

Some “typical” transformations • If you have area data, a square root transform is often useful (converts area to something proportional to the radius or length). • Similarly with volume, a cube root transformation may be appropriate. • With financial data (incomes, etc.), a log transform may be appropriate. Logs change percentage increases to constant increases, thus if a unit increase in X results in a 10% increase in Y, it also results in a 0.0953 increase in Y.

A general strategy • Fit the raw data (X and Y) with a least squares line. See if you get a good residual plot. If so, stop and be happy  • If not, try a polynomial regression (quadratic or cubic). If one of these fits, stop and be happy (remember, fit the smallest model possible). • If a polynomial regression does not work, try transforming Y to log, sqrt, and cube root (i.e. perform three more regressions). Fit a cubic polynomial regression on each of these and determine the best outcome. Choose the transformation that provides the best residual plot. • If none of those work, then regression might not be effective (there are more advanced techniques) or you may have to start transforming X as well. This becomes true trial and error. Consult your friendly local statistician.

Back to the ethanol data. • We can see from the scatterplot that E and NOX are not linearly related. • We tried a cubic regression and that didn’t work. • Now off to the transformations. We fit cubic regressions with log(Y), sqrt(Y), and cbrt(Y) as the response variables. • We may be able to get satisfactory results with something less than cubic, but if cubic doesn’t work the lower order models won’t either, thus we start with cubic models.

Square root transformation. Still clear curvature. Scatterplot Residual plot

Cube root transformation. Improved, but still some curvature.

Log transformation. Still some lack of fit, but best of the bunch.

Log transform is not perfect, but best we can do right now (I encourage you to play with the data on your own) • After we have chosen the log transformation on the basis of the best residual plot (and decided it is “ok”, if certainly not a great residual plot), we look at the p-value for the cubic term to see if we can remove it. We can. Parameter Estimates Parameter Standard Variable Label DF Estimate Error t Value Pr > |t| Type I SS Intercept Intercept 1 -11.32827 2.48931 -4.55 <.0001 19.62487 E 1 25.55212 8.69871 2.94 0.0043 0.31320 E_2 2nd power of E 1 -10.74539 9.86507 -1.09 0.2792 34.56387 E_3 3rd power of E 1 -2.43334 3.63758 -0.67 0.5054 0.02403

Quadratic model for log(NOX) • The quadratic model produces almost identical scatter and residual plots. The quadratic term is significant, so this is our final model. Parameter Estimates Parameter Standard Variable Label DF Estimate Error t Value Pr > |t| Type I SS Intercept Intercept 1 -12.95199 0.55039 -23.53 <.0001 19.62487 E 1 31.31051 1.24771 25.09 <.0001 0.31320 E_2 2nd power of E 1 -17.32873 0.68084 -25.45 <.0001 34.56387

Extras • There are more advanced ways of dealing with polynomial regression and transformations, which we do not address in STA671. • Polynomial regression can be extended to handle more general curved models, such as splines (piecewise polynomials with desirable smoothness properties) • Transformation can be selected automatically by using something called a Box-Cox transformation, which automatically determines the appropriate exponent to transform your data (with a tradeoff of some interpretability). • Consult your friendly local statistician.

Polynomial Regression and Transformations