Stat 112: Lecture 10 Notes

- Fitting Curvilinear Relationships
- Polynomial Regression (Ch. 5.2.1)
- Transformations (Ch. 5.2.2-5.2.4)

- Schedule:
- Homework 3 due on Thursday.
- Quiz 2 next

- Reconsider the simple regression problem of estimating the conditional mean of y given x, $E(y|x)$.
- For many problems, $E(y|x)$ is not linear in x.
- The linear regression model $E(y|x) = \beta_0 + \beta_1 x$ makes the restrictive assumption that the increase in the mean of y|x for a one-unit increase in x equals $\beta_1$ for every x.
- Curvilinear relationship: $E(y|x)$ is a curve, not a straight line; the increase in the mean of y|x is not the same for all x.

- Data on average corn yield and rainfall in six U.S. states (1890-1927), cornyield.JMP

- Weekly wages and age of 200 randomly chosen males between ages 18 and 70 from the 1998 March Current Population Survey

- A large chain of liquor stores would like to know how much display space in its stores to devote to a new wine. It collects sales and display space data from 47 of its stores.

- Add powers of x as additional explanatory variables in a multiple regression model: $E(y|x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_K x^K$.
- Often $x - \bar{x}$ is used in place of x. This does not affect the $\hat{y}$ that is obtained from the multiple regression model.

- Quadratic model (K=2) is often sufficient.
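
As a rough illustration of the quadratic model, the sketch below fits a K=2 polynomial regression by least squares with NumPy and checks that replacing x by $x - \bar{x}$ changes the coefficients but not the fitted values. The data and the helper name `fit_polynomial` are made up for this example; they are not taken from the lecture's data sets.

```python
import numpy as np

# Illustrative data only (not from cornyield.JMP).
x = np.array([6.0, 7.5, 9.0, 10.5, 12.0, 13.5, 15.0, 16.5])
y = np.array([24.0, 28.5, 31.0, 33.5, 34.0, 33.0, 31.5, 28.0])

def fit_polynomial(x, y, K):
    """Least-squares fit of y on 1, x, ..., x^K; returns (coefficients, fitted values)."""
    X = np.vander(x, K + 1, increasing=True)  # columns: 1, x, x^2, ..., x^K
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta

beta_raw, fitted_raw = fit_polynomial(x, y, K=2)
beta_centered, fitted_centered = fit_polynomial(x - x.mean(), y, K=2)

print("raw coefficients:     ", beta_raw)
print("centered coefficients:", beta_centered)
print("same fitted values:   ", np.allclose(fitted_raw, fitted_centered))
```

Centering changes the intercept and the linear coefficient, but the two parameterizations describe the same curve, so the predictions agree.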

- Two ways to fit model:
- Create variables $x^2, \ldots, x^K$. Use Fit Model with variables $x, x^2, \ldots, x^K$.
- Use Fit Y by X. Click on red triangle next to Bivariate Analysis … and click Fit Polynomial instead of the usual Fit Line . This method produces nicer plots.

- The usual interpretation of multiple regression coefficients doesn’t make sense in polynomial regression.
- We can’t hold x fixed and change $x^2$.
- Effect of increasing x by one unit depends on the starting x=x*
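
For the quadratic model $E(y|x) = \beta_0 + \beta_1 x + \beta_2 x^2$, the dependence on the starting value $x^*$ can be worked out directly (a short derivation, using the same $\beta$ notation as the polynomial model):

```latex
\begin{align*}
E(y \mid x = x^* + 1) - E(y \mid x = x^*)
  &= \beta_1\,[(x^* + 1) - x^*] + \beta_2\,[(x^* + 1)^2 - (x^*)^2] \\
  &= \beta_1 + \beta_2\,(2x^* + 1).
\end{align*}
```

The change equals $\beta_1$ alone only when $\beta_2 = 0$, that is, when the relationship is a straight line.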

- Is it necessary to include a kth order term $x^k$?
- Test $H_0: \beta_k = 0$ vs. $H_a: \beta_k \neq 0$.
- Choose the largest k for which the test still rejects (at the 0.05 level).
- If we use $x^k$, always keep the lower order terms $x, \ldots, x^{k-1}$ in the model.
- For corn yield data, use K=2 polynomial regression model.
- For income data, use K=2 polynomial regression model
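
The order-selection test can be sketched in code. The function below computes the t statistic for the highest-order coefficient in an order-k polynomial regression, which is then compared with the two-sided 0.05 critical value of a t distribution. The function name `highest_order_t_stat` and the synthetic data are assumptions made for illustration; JMP reports the same statistic in its Parameter Estimates table.

```python
import numpy as np

def highest_order_t_stat(x, y, k):
    """t statistic and residual df for H0: beta_k = 0 in the order-k polynomial fit."""
    xc = x - x.mean()                          # centering improves numerical conditioning
    X = np.vander(xc, k + 1, increasing=True)  # columns: 1, xc, ..., xc^k
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = len(y) - (k + 1)                      # residual degrees of freedom
    s2 = resid @ resid / df                    # estimate of the error variance
    cov = s2 * np.linalg.inv(X.T @ X)          # estimated covariance of the coefficients
    return beta[k] / np.sqrt(cov[k, k]), df

# Synthetic data: a true quadratic plus small deterministic "noise" (illustration only).
x = np.linspace(0.0, 10.0, 30)
y = 5.0 + 2.0 * x - 0.3 * x**2 + 0.5 * np.sin(7.0 * np.arange(30))

for k in (1, 2, 3, 4):
    t, df = highest_order_t_stat(x, y, k)
    # Reject H0 when |t| exceeds the two-sided 0.05 critical value for df degrees
    # of freedom (roughly 2.05 for df near 27), e.g. taken from a t table.
    print(f"k={k}: t = {t:.2f}, df = {df}")
```

Centering x does not change the t statistic of the highest-order term, so it is used here only for numerical stability.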

- Curvilinear relationship: E(Y|X) is not a straight line.
- Another approach to fitting curvilinear relationships is to transform Y or X.
- Transformations: Perhaps E(f(Y)|g(X)) is a straight line, where f(Y) and g(X) are transformations of Y and X, and a simple linear regression model holds for the response variable f(Y) and explanatory variable g(X).

Y = Life Expectancy in 1999
X = Per Capita GDP (in US Dollars) in 1999
Data in gdplife.JMP

The linearity assumption of simple linear regression is clearly violated. The increase in mean life expectancy for each additional dollar of GDP is less for large GDPs than for small GDPs: decreasing returns to increases in GDP.

The mean of Life Expectancy | Log Per Capita GDP appears to be approximately a straight line.

- Testing for association between Y and X: If the simple linear regression model holds for f(Y) and g(X), then Y and X are associated if and only if the slope in the regression of f(Y) on g(X) does not equal zero. The p-value for the test that the slope is zero is <.0001: strong evidence that per capita GDP and life expectancy are associated.
- Prediction and mean response: What would you predict the life expectancy to be for a country with a per capita GDP of $20,000?
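
One way to sketch the prediction step: if the simple linear regression model holds for life expectancy on log per capita GDP, the predicted life expectancy at $20,000 is $\hat{\beta}_0 + \hat{\beta}_1 \log(20000)$. The code below fits that regression by least squares on a handful of placeholder (GDP, life expectancy) pairs. The numbers are invented for illustration and are not the values in gdplife.JMP, the natural log is an assumption, and the helper names are hypothetical.

```python
import math

def fit_line(u, v):
    """Least-squares (intercept, slope) for the simple linear regression of v on u."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    sxy = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    sxx = sum((ui - u_bar) ** 2 for ui in u)
    slope = sxy / sxx
    return v_bar - slope * u_bar, slope

# Placeholder data for illustration only (NOT taken from gdplife.JMP).
gdp = [500, 1000, 2500, 5000, 10000, 25000, 40000]
life = [48.0, 55.0, 62.0, 68.0, 72.0, 77.0, 79.0]

# Regress life expectancy on log(GDP), i.e. f(Y) = Y and g(X) = log X.
b0, b1 = fit_line([math.log(g) for g in gdp], life)

def predict_life_expectancy(gdp_per_capita):
    """Predicted life expectancy from the fitted line on the log(GDP) scale."""
    return b0 + b1 * math.log(gdp_per_capita)

print(f"predicted life expectancy at $20,000: {predict_life_expectancy(20000):.1f}")
```

The key point is that the new x value must be transformed with the same g before it is plugged into the fitted line.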

- Tukey’s Bulging Rule.
- See Handout.
- Match curvature in data to the shape of one of the curves drawn in the four quadrants of the figure in the handout. Then use the associated transformations, selecting one for either X, Y or both.

- Use Tukey’s Bulging rule (see handout) to determine transformations which might help.
- After Fit Y by X, click red triangle next to Bivariate Fit and click Fit Special. Experiment with transformations suggested by Tukey’s Bulging rule.
- Make residual plots of the residuals for transformed model vs. the original X by clicking red triangle next to Transformed Fit to … and clicking plot residuals. Choose transformations which make the residual plot have no pattern in the mean of the residuals vs. X.
- Compare different transformations by looking for the transformation with the smallest root mean square error. If the transformation involves transforming y, use the root mean square error for the fit measured on the original y-scale.
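
The comparison by root mean square error on the original y-scale can be sketched as follows. The function fits $f(y) = b_0 + b_1 g(x)$, back-transforms the predictions when y was transformed, and measures the error in the original units of y. The function name `rmse_original_scale`, the candidate list, and the synthetic data are assumptions for illustration; the simple back-transformation used here is one plausible reading of "measured on original scale", not necessarily JMP's exact computation.

```python
import numpy as np

def rmse_original_scale(x, y, g=None, f=None, f_inv=None):
    """Fit f(y) = b0 + b1 * g(x) by least squares and return the RMSE of the
    back-transformed predictions, measured on the original y scale."""
    gx = g(x) if g else x
    fy = f(y) if f else y
    b1, b0 = np.polyfit(gx, fy, 1)       # polyfit returns the highest power first
    pred = b0 + b1 * gx
    if f_inv:
        pred = f_inv(pred)               # undo the y transformation
    n = len(y)
    return np.sqrt(np.sum((y - pred) ** 2) / (n - 2))  # n - 2: two fitted coefficients

# Synthetic data with diminishing returns in x (illustration only).
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0])
y = 2.0 + 3.0 * np.log(x) + np.array([0.1, -0.2, 0.15, -0.1, 0.05, 0.1, -0.15, 0.05])

candidates = {
    "untransformed": rmse_original_scale(x, y),
    "log x": rmse_original_scale(x, y, g=np.log),
    "sqrt x": rmse_original_scale(x, y, g=np.sqrt),
    "y squared": rmse_original_scale(x, y, f=np.square, f_inv=np.sqrt),
}
for name, rmse in sorted(candidates.items(), key=lambda item: item[1]):
    print(f"{name:>14}: RMSE = {rmse:.3f}")
```

Measuring every candidate in the units of y keeps the comparison fair; an RMSE computed on the $y^2$ scale would not be comparable with one computed on the y scale.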

By looking at the root mean square error on the original y-scale, we see that all of the transformations improve upon the untransformed model and that the transformation to log x is by far the best.

The transformation to Log X appears to have mostly removed a trend in the mean of the residuals. This means that $E(Y|X) \approx \beta_0 + \beta_1 \log X$. There is still a problem of nonconstant variance.

- In comparing two transformations, use the transformation with the lower RMSE, using the fit measured on the original y-scale if y was transformed [this is equivalent to choosing the transformation with the higher $R^2$ or adjusted $R^2$].
- In comparing transformations to polynomial regression models, compare the adjusted $R^2$ of the best transformation to that of the best polynomial regression model (selected using the criterion on slide 10).
- If the transformation’s adjusted $R^2$ is close to (e.g., within .01) but not as high as the polynomial regression’s, it is still reasonable to use the transformation on the grounds of parsimony.

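- Problem with $R^2$: it never decreases even if we add useless variables.
- Adjusted $R^2 = 1 - \frac{n-1}{n-(K+1)}(1 - R^2)$, where n is the number of observations and K is the number of explanatory variables. This can decrease if useless variables are added.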
- Useful for comparing regression models with different numbers of variables. No longer represents proportion of variation in y explained by multiple regression line.
- Found under Summary of Fit in JMP.
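
A small sketch of the standard adjusted $R^2$ formula, $1 - \frac{n-1}{n-k-1}(1 - R^2)$ for k explanatory variables and n observations, shows how the penalty works. The function name and the example numbers are made up for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k explanatory variables fit to n observations."""
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)

# Two extra variables that raise R^2 only slightly (0.600 -> 0.605)
# lower the adjusted R^2, because the penalty for k grows faster.
print(adjusted_r2(0.600, n=30, k=2))
print(adjusted_r2(0.605, n=30, k=4))
```

Here the larger model has the higher $R^2$ but the lower adjusted $R^2$, so the smaller model would be preferred under this criterion.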

Fourth order polynomial is the best polynomial regression model using the criterion on slide 10.

Fourth order polynomial is the best model: it has the highest adjusted $R^2$.

- Two methods for fitting regression models for curvilinear relationships:
- Polynomial Regression
- Transformations