Class 10: Tuesday, Oct. 12 PowerPoint PPT Presentation

Class 10: Tuesday, Oct. 12

  • Hurricane data set, review of confidence intervals and hypothesis tests

  • Confidence intervals for mean response

  • Prediction intervals

  • Transformations

  • Upcoming:

    • Thursday: Finish transformations, Example Regression Analysis

    • Tuesday: Review for midterm

    • Thursday: Midterm

    • Fall Break!


Hurricane Data

  • Is there a trend in the number of hurricanes in the Atlantic over time (possibly an increase because of global warming)?

  • hurricane.JMP contains data on the number of hurricanes in the Atlantic basin from 1950-1997.


Inferences for Hurricane Data

  • Residual plots and normal quantile plots indicate that assumptions of linearity, constant variance and normality in simple linear regression model are reasonable.

  • 95% confidence interval for slope (change in mean hurricanes between year t and year t+1): (-0.086,0.012)

  • Hypothesis test of the null hypothesis that the slope equals zero: test statistic = -1.52, p-value = 0.13. We do not reject the null hypothesis since p-value > 0.05: there is no strong evidence of a trend in hurricanes from 1950-1997.
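The slope inference on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `slope_inference` and the yearly counts are hypothetical (they are not the values in hurricane.JMP), and the interval uses the rough "estimate ± 2 standard errors" rule rather than the exact t quantile that JMP reports.

```python
import math

def slope_inference(x, y):
    """Least squares slope, its standard error, the t statistic for
    H0: slope = 0, and an approximate 95% interval (estimate +/- 2 SE)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Root mean square error: residual standard deviation on n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    rmse = math.sqrt(sse / (n - 2))
    se_slope = rmse / math.sqrt(sxx)
    t_stat = slope / se_slope
    interval = (slope - 2 * se_slope, slope + 2 * se_slope)
    return slope, se_slope, t_stat, interval

# Hypothetical yearly counts for illustration only, NOT the hurricane.JMP data.
years = [1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957]
counts = [11, 8, 6, 6, 8, 9, 4, 3]
slope, se, t_stat, interval = slope_inference(years, counts)
print(f"slope={slope:.3f}  SE={se:.3f}  t={t_stat:.2f}")
print(f"approx. 95% CI for slope: ({interval[0]:.3f}, {interval[1]:.3f})")
```

If the interval contains zero, the data are consistent with a zero slope, which matches a two-sided test that fails to reject at roughly the 5% level.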


  • Scale for interpreting p-values:

  • A large p-value is not strong evidence in favor of H0, it only shows that there is not strong evidence against H0.


Inference in Regression

  • Confidence intervals for slope

  • Hypothesis test for slope

  • Confidence intervals for mean response

  • Prediction intervals


Car Price Example

  • A used-car dealer wants to understand how odometer reading affects the selling price of used cars.

  • The dealer randomly selects 100 three-year old Ford Tauruses that were sold at auction during the past month. Each car was in top condition and equipped with automatic transmission, AM/FM cassette tape player and air conditioning.

  • carprices.JMP contains the price and number of miles on the odometer of each car.


  • The used-car dealer has an opportunity to bid on a lot of cars offered by a rental company. The rental company has 250 Ford Tauruses, all equipped with automatic transmission, air conditioning and AM/FM cassette tape players. All of the cars in this lot have about 40,000 miles on the odometer. The dealer would like an estimate of the average selling price of all cars of this type with 40,000 miles on the odometer, i.e., E(Y|X=40,000).

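  • The least squares estimate is $\hat{E}(Y|X=40{,}000) = \hat{\beta}_0 + \hat{\beta}_1 \cdot 40{,}000$, where $\hat{\beta}_0$ and $\hat{\beta}_1$ are the least squares intercept and slope.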


Confidence Interval for Mean Response

  • Confidence interval for E(Y|X=40,000): A range of plausible values for E(Y|X=40,000) based on the sample.

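  • Approximate 95% confidence interval: $\hat{E}(Y|X=X_0) \pm 2\,SE\{\hat{E}(Y|X=X_0)\}$, where $\hat{E}(Y|X=X_0) = \hat{\beta}_0 + \hat{\beta}_1 X_0$ and $SE\{\hat{E}(Y|X=X_0)\} = \text{RMSE}\sqrt{\frac{1}{n} + \frac{(X_0-\bar{X})^2}{\sum_{i=1}^{n}(X_i-\bar{X})^2}}$.

  • Notes about formula for SE: The standard error becomes smaller as the sample size n increases, and it is smaller the closer $X_0$ is to $\bar{X}$.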

  • In JMP, after Fit Line, click red triangle next to Linear Fit and click Confid Curves Fit. Use the crosshair tool by clicking Tools, Crosshair to find the exact values of the confidence interval endpoints for a given X0.
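Outside JMP, the same interval can be computed directly from the standard simple linear regression formulas. The sketch below is a hedged illustration: `fit_line` and `mean_response_ci` are hypothetical helper names, the odometer and price values are made up (they are not the carprices.JMP data), and the multiplier 2 is the approximation used on the slide rather than an exact t quantile.

```python
import math

def fit_line(x, y):
    """Return intercept, slope, RMSE, mean of x, and sum of squares of x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    rmse = math.sqrt(sse / (n - 2))
    return intercept, slope, rmse, x_bar, sxx

def mean_response_ci(x, y, x0):
    """Approximate 95% confidence interval for E(Y | X = x0)."""
    b0, b1, rmse, x_bar, sxx = fit_line(x, y)
    n = len(x)
    estimate = b0 + b1 * x0
    # SE shrinks as n grows and as x0 moves toward the mean of x.
    se = rmse * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    return estimate, se, (estimate - 2 * se, estimate + 2 * se)

# Hypothetical odometer readings (miles) and prices (dollars), NOT carprices.JMP.
odometer = [30000, 34000, 36000, 38000, 40000, 42000, 45000, 48000]
price = [15200, 14900, 14850, 14700, 14500, 14450, 14300, 14000]
estimate, se, interval = mean_response_ci(odometer, price, 40000)
print(f"estimated mean price at 40,000 miles: {estimate:.0f} (SE {se:.0f})")
print(f"approx. 95% CI: ({interval[0]:.0f}, {interval[1]:.0f})")
```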


A Prediction Problem

  • The used-car dealer is offered a particular 3-year old Ford Taurus equipped with automatic transmission, air conditioner and AM/FM cassette tape player and with 40,000 miles on the odometer. The dealer would like to predict the selling price of this particular car.

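  • Best prediction based on least squares estimate: $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 \cdot 40{,}000$, the same value as the estimated mean response at X = 40,000.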


Range of Selling Prices for Particular Car

  • The dealer is interested in the range of selling prices that this particular car with 40,000 miles on it is likely to have.

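  • Under the simple linear regression model, Y|X follows a normal distribution with mean $\beta_0 + \beta_1 X$ and standard deviation $\sigma$. A car with 40,000 miles on it will be in the interval $\beta_0 + \beta_1 \cdot 40{,}000 \pm 2\sigma$ about 95% of the time.

  • Class 5: We substituted the least squares estimates $\hat{\beta}_0$, $\hat{\beta}_1$ and RMSE for $\beta_0$, $\beta_1$ and $\sigma$, and said a car with 40,000 miles on it will be in the interval $\hat{\beta}_0 + \hat{\beta}_1 \cdot 40{,}000 \pm 2\,\text{RMSE}$ about 95% of the time. This is a good approximation, but it ignores potential error in the least squares estimates.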


Prediction Interval

  • 95% Prediction Interval: An interval that has approximately a 95% chance of containing the value of Y for a particular unit with X=X0 ,where the particular unit is not in the original sample.

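  • Approximate 95% prediction interval: $\hat{Y} \pm 2\,\text{RMSE}\sqrt{1 + \frac{1}{n} + \frac{(X_0-\bar{X})^2}{\sum_{i=1}^{n}(X_i-\bar{X})^2}}$, where $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_0$. The extra 1 under the square root accounts for the variation of an individual Y around its mean, so the prediction interval is always wider than the confidence interval for the mean response at the same $X_0$.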

  • In JMP, after Fit Line, click red triangle next to Linear Fit and click Confid Curves Indiv. Use the crosshair tool by clicking Tools, Crosshair to find the exact values of the prediction interval endpoints for a given X0.
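As a rough cross-check on the JMP output, the prediction interval can be sketched from the standard formula. The function name `prediction_interval` and the data are hypothetical (the values are not from carprices.JMP), and the multiplier 2 is the slide's approximation to the exact t quantile.

```python
import math

def prediction_interval(x, y, x0):
    """Approximate 95% prediction interval for a new Y at X = x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    rmse = math.sqrt(sse / (n - 2))
    prediction = intercept + slope * x0
    # The leading 1 is the variance of a single new observation around its mean;
    # the other two terms are the uncertainty in the estimated line.
    se_pred = rmse * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return prediction, se_pred, (prediction - 2 * se_pred, prediction + 2 * se_pred)

# Hypothetical odometer readings (miles) and prices (dollars), NOT carprices.JMP.
odometer = [30000, 34000, 36000, 38000, 40000, 42000, 45000, 48000]
price = [15200, 14900, 14850, 14700, 14500, 14450, 14300, 14000]
prediction, se_pred, interval = prediction_interval(odometer, price, 40000)
print(f"predicted price at 40,000 miles: {prediction:.0f}")
print(f"approx. 95% prediction interval: ({interval[0]:.0f}, {interval[1]:.0f})")
```

Because `se_pred` can never fall below the RMSE, collecting more data narrows the confidence interval for the mean response toward zero width but narrows the prediction interval only toward ± 2 RMSE.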


A Violation of Linearity

Y = Life Expectancy in 1999

X = Per Capita GDP (in US Dollars) in 1999

Data in gdplife.JMP

The linearity assumption of simple linear regression is clearly violated. The increase in mean life expectancy for each additional dollar of GDP is less for large GDPs than for small GDPs: decreasing returns to increases in GDP.


Transformations

  • Violation of linearity: E(Y|X) is not a straight line.

  • Transformations: Perhaps E(f(Y)|g(X)) is a straight line, where f(Y) and g(X) are transformations of Y and X, and a simple linear regression model holds for the response variable f(Y) and explanatory variable g(X).


The mean of Life Expectancy | Log Per Capita GDP appears to be approximately a straight line.


How do we use the transformation?

  • Testing for association between Y and X: If the simple linear regression model holds for f(Y) and g(X), then Y and X are associated if and only if the slope in the regression of f(Y) on g(X) does not equal zero. The p-value for the test that the slope is zero is <.0001: strong evidence that per capita GDP and life expectancy are associated.

  • Prediction and mean response: What would you predict the life expectancy to be for a country with a per capita GDP of $20,000?
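One way to answer the prediction question is to fit the line on the transformed scale and plug in log(20,000). The sketch below assumes only X is transformed, with g(X) = log X (natural log), so the fitted value is already on the original life expectancy scale. The helper name `predict_from_log_x` and the country values are hypothetical; they are not the gdplife.JMP data.

```python
import math

def predict_from_log_x(x, y, x0):
    """Regress y on log(x) by least squares and predict y at x = x0."""
    log_x = [math.log(xi) for xi in x]
    n = len(log_x)
    lx_bar = sum(log_x) / n
    y_bar = sum(y) / n
    sxx = sum((li - lx_bar) ** 2 for li in log_x)
    sxy = sum((li - lx_bar) * (yi - y_bar) for li, yi in zip(log_x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * lx_bar
    # Transform the new X the same way before applying the fitted line.
    return intercept + slope * math.log(x0)

# Hypothetical (per capita GDP in US dollars, life expectancy in years) pairs,
# NOT the gdplife.JMP data.
gdp = [500, 1000, 2000, 5000, 10000, 30000]
life = [48.0, 55.0, 60.0, 68.0, 72.0, 78.0]
print(f"predicted life expectancy at $20,000: {predict_from_log_x(gdp, life, 20000):.1f}")
```

If Y had also been transformed, the fitted value would be on the f(Y) scale and would need to be converted back, which this sketch does not cover.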


How do we choose a transformation?

  • Tukey’s Bulging Rule.

  • See Handout.

  • Match curvature in data to the shape of one of the curves drawn in the four quadrants of the figure in the handout. Then use the associated transformations, selecting one for either X, Y or both.


Transformations in JMP

  • Use Tukey’s Bulging rule (see handout) to determine transformations which might help.

  • After Fit Y by X, click red triangle next to Bivariate Fit and click Fit Special. Experiment with transformations suggested by Tukey’s Bulging rule.

  • Make residual plots of the residuals for transformed model vs. the original X by clicking red triangle next to Transformed Fit to … and clicking plot residuals. Choose transformations which make the residual plot have no pattern in the mean of the residuals vs. X.

  • Compare different transformations by looking for transformation with smallest root mean square error on original y-scale. If using a transformation that involves transforming y, look at root mean square error for fit measured on original scale.
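The comparison step can be sketched as a small loop over candidate transformations of X. This is an illustration under assumptions: only X is transformed, so each RMSE is already on the original y-scale; the candidate set, the function names, and the data are all hypothetical (not gdplife.JMP), and the code stands in for JMP's Fit Special rather than reproducing it.

```python
import math

def rmse_after_transform(x, y, g):
    """RMSE of the least squares fit of y on g(x), on the original y-scale."""
    gx = [g(xi) for xi in x]
    n = len(gx)
    g_bar = sum(gx) / n
    y_bar = sum(y) / n
    sxx = sum((gi - g_bar) ** 2 for gi in gx)
    sxy = sum((gi - g_bar) * (yi - y_bar) for gi, yi in zip(gx, y))
    slope = sxy / sxx
    intercept = y_bar - slope * g_bar
    sse = sum((yi - (intercept + slope * gi)) ** 2 for gi, yi in zip(gx, y))
    return math.sqrt(sse / (n - 2))

def compare_transformations(x, y, candidates):
    """Map each candidate name to its RMSE."""
    return {name: rmse_after_transform(x, y, g) for name, g in candidates.items()}

# Candidate transformations of X (an illustrative set, all requiring x > 0).
candidates = {
    "x": lambda v: v,
    "sqrt x": math.sqrt,
    "log x": math.log,
    "1/x": lambda v: 1.0 / v,
}

# Hypothetical (per capita GDP, life expectancy) pairs, NOT the gdplife.JMP data.
gdp = [500, 1000, 2000, 5000, 10000, 30000]
life = [48.0, 55.0, 60.0, 68.0, 72.0, 78.0]
for name, rmse in sorted(compare_transformations(gdp, life, candidates).items(),
                         key=lambda item: item[1]):
    print(f"{name:>6}: RMSE = {rmse:.2f}")
```

A small RMSE alone is not sufficient; the residual plot for the chosen transformation should still be checked for a remaining pattern.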


By looking at the root mean square error on the original y-scale, we see that all of the transformations improve upon the untransformed model and that the transformation to log x is by far the best.


The transformation to Log X appears to have mostly removed a trend in the mean of the residuals. This means that $E(Y|X) \approx \beta_0 + \beta_1 \log X$. There is still a problem of nonconstant variance.

