
Lecture 6 Notes


- Note: I will e-mail homework 2 tonight. It will be due next Thursday.
- The Multiple Linear Regression model (Chapter 4.1)
- Inferences from multiple regression analysis (Chapter 4.2)
- In multiple regression analysis, we consider more than one independent variable, $x_1, \ldots, x_K$. We are interested in the conditional mean of $y$ given $x_1, \ldots, x_K$.

- A team charged with designing a new automobile is concerned about the gas mileage that can be achieved. The design team is interested in two things:
(1) Which characteristics of the design are likely to affect mileage?

(2) A new car is planned to have the following characteristics: weight – 4000 lbs, horsepower – 200, cargo – 18 cubic feet, seating – 5 adults. Predict the new car’s gas mileage.

- The team has available information about gallons per 1000 miles and four design characteristics (weight, horsepower, cargo, seating) for a sample of cars made in 1989. Data is in car89.JMP.

- To obtain the correlation matrix and pairwise scatterplots, click Analyze, Multivariate Methods, Multivariate.
- If we use simple linear regression with each of the four independent variables, which provides the best predictions?

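- Answer: The simple linear regression with the highest $R^2$ gives the best predictions, because $R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$: SST is the same for every choice of independent variable, so the highest $R^2$ corresponds to the smallest sum of squared prediction errors (SSE).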
- Weight gives the best predictions of GP1000MHWY based on simple linear regression.
- But we can obtain better predictions by using more than one of the independent variables.
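
The comparison of simple regressions by $R^2$ can be sketched in a few lines of Python. This is only an illustration: the numbers below are made up (they are not the car89.JMP data), and the function name `simple_r2` is a hypothetical helper, not part of JMP.

```python
def simple_r2(x, y):
    """R^2 = 1 - SSE/SST for the least squares line of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # least squares slope
    b0 = ybar - b1 * xbar          # least squares intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sse / sst

# Made-up illustrative values, NOT the car89.JMP data.
y = [30.0, 35.0, 41.0, 44.0, 52.0, 55.0]          # response
predictors = {
    "weight": [2.0, 2.5, 3.0, 3.3, 3.9, 4.2],
    "seating": [4, 5, 4, 5, 5, 6],
}

# One simple regression per candidate variable; keep the highest R^2.
r2 = {name: simple_r2(x, y) for name, x in predictors.items()}
best = max(r2, key=r2.get)
print(r2, "best single predictor:", best)
```

Because SST depends only on y, ranking the candidates by $R^2$ is the same as ranking them by SSE.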

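- Assumptions about the disturbances $e_i$ in the model $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_K x_{iK} + e_i$:
- The expected value of the disturbances is zero for each $i$, i.e., $E(e_i) = 0$.
- The variance of each $e_i$ is equal to $\sigma_e^2$, i.e., $\mathrm{Var}(e_i) = \sigma_e^2$.
- The $e_i$ are normally distributed.
- The $e_i$ are independent.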

- We use the same least squares procedure as for simple linear regression.
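- Our estimates of $\beta_0, \ldots, \beta_K$ are the coefficients $b_0, \ldots, b_K$ that minimize the sum of squared prediction errors: $\sum_{i=1}^{n} (y_i - b_0 - b_1 x_{i1} - \cdots - b_K x_{iK})^2$
- Least Squares in JMP: Click Analyze, Fit Model, put the dependent variable into Y and add the independent variables to the Construct Model Effects box.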
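
Outside JMP, the same least squares idea can be sketched in plain Python by solving the normal equations $(X^T X) b = X^T y$. This is a minimal sketch, not how JMP computes its fit; the helper names `solve` and `least_squares` are hypothetical, and the data are made up and noise-free so that the known answer (1, 2, 3) can be checked.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]    # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):                    # back substitution
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def least_squares(X, y):
    """Return [b0, b1, ..., bK] minimizing the sum of squared prediction errors."""
    rows = [[1.0] + list(r) for r in X]             # prepend intercept column
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(XtX, Xty)

# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise).
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [9, 8, 19, 18, 26]
coef = least_squares(X, y)
print(coef)
```

For real data, a numerical library's least squares routine is usually preferable to forming $X^T X$ by hand, which can be poorly conditioned.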

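- Estimate of $\sigma_e$: $s_e = \sqrt{\mathrm{SSE}/(n-(K+1))}$, where SSE is the sum of squared prediction errors.
- $s_e$ = Root Mean Square Error in JMP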
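- For simple linear regression of GP1000MHWY on Weight, the estimate is the Root Mean Square Error reported in that fit's JMP output. For multiple linear regression of GP1000MHWY on weight, horsepower, cargo, seating, the Root Mean Square Error is 3.54.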

- Residual for observation i = prediction error for observation i = $y_i - \hat{y}_i$
- Root mean square error = Typical size of absolute value of prediction error
- As with the simple linear regression model, if the multiple linear regression model holds:
- About 95% of the observations will be within two RMSEs of their predicted values.

- For car data, about 95% of the time, the actual GP1000M will be within 2*3.54=7.08 GP1000M of the predicted GP1000M of the car based on the car’s weight, horsepower, cargo and seating.
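
The root mean square error and the "within two RMSEs" rule can be checked numerically. The sketch below uses made-up observed and predicted values (not the car data) and a hypothetical helper `root_mean_square_error`; only the figure 3.54 is taken from these notes.

```python
import math

def root_mean_square_error(y, y_hat, K):
    """sqrt(SSE / (n - (K + 1))) for a regression with K independent variables."""
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return math.sqrt(sse / (n - (K + 1)))

# Made-up observed and predicted values, NOT the car89.JMP data.
y     = [40, 45, 50, 38, 60, 55, 47, 52, 43, 58]
y_hat = [42, 44, 47, 40, 57, 56, 49, 50, 45, 59]
K = 4                                   # weight, horsepower, cargo, seating

s_e = root_mean_square_error(y, y_hat, K)
residuals = [yi - fi for yi, fi in zip(y, y_hat)]
share_within = sum(abs(r) <= 2 * s_e for r in residuals) / len(residuals)
print("RMSE:", s_e, "share within 2 RMSEs:", share_within)

# With the RMSE of 3.54 quoted in the notes, the rough 95% band around a
# prediction is +/- 2 * 3.54. The predicted value 45.0 is hypothetical.
half_width = 2 * 3.54
predicted = 45.0
interval = (predicted - half_width, predicted + half_width)
print("approximate 95% band:", interval)
```

In a sample this small the observed share will rarely be exactly 95%; the rule describes what to expect on average when the model assumptions hold.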

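- Confidence intervals: a $100(1-\alpha)\%$ confidence interval for $\beta_j$ is $b_j \pm t_{\alpha/2,\,n-(K+1)}\, s_{b_j}$.
Degrees of freedom for $t$ equal $n-(K+1)$. The standard error of $b_j$, $s_{b_j}$, is found on the JMP output.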
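
As a small worked sketch of the interval for a single coefficient: the estimate, standard error, and sample size below are hypothetical (they are not read from the car output), and the critical value 2.000 is the tabulated two-sided 95% t value for 60 degrees of freedom, typed in by hand because the Python standard library has no t quantile function.

```python
def coefficient_interval(b_j, s_bj, t_crit):
    """Confidence interval: estimate +/- (t critical value) * (standard error)."""
    half = t_crit * s_bj
    return b_j - half, b_j + half

# Hypothetical numbers chosen for illustration only.
n, K = 65, 4
df = n - (K + 1)          # degrees of freedom for t
t_crit = 2.000            # tabulated t value for 95% confidence, df = 60
b_j, s_bj = 0.5, 0.1      # hypothetical estimate and standard error

lower, upper = coefficient_interval(b_j, s_bj, t_crit)
print(df, (lower, upper))
```

With real output, the estimate and its standard error come from the Parameter Estimates table, and the critical value from a t table or statistical software.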

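- Hypothesis test: $H_0: \beta_j = \beta_j^*$ vs. $H_a: \beta_j \neq \beta_j^*$
Decision rule for test: Reject $H_0$ if $t > t_{\alpha/2,\,n-(K+1)}$ or $t < -t_{\alpha/2,\,n-(K+1)}$,

where $t = (b_j - \beta_j^*)/s_{b_j}$.

The p-value for testing $H_0: \beta_j = 0$ is printed in JMP output under Prob>|t|.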
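
The two-sided decision rule can be written as a short function. Again this is a sketch under stated assumptions: the estimates and standard errors are hypothetical, `t_test` is an illustrative helper name, and the critical value 2.000 (two-sided 0.05 level, 60 degrees of freedom) is taken from a t table.

```python
def t_test(b_j, s_bj, t_crit, beta_null=0.0):
    """Return the t statistic and whether H0: beta_j = beta_null is rejected."""
    t = (b_j - beta_null) / s_bj
    reject = abs(t) > t_crit           # same as t > t_crit or t < -t_crit
    return t, reject

t_crit = 2.000                         # tabulated value, df = 60, alpha = 0.05

# Hypothetical coefficient that is large relative to its standard error.
t1, reject1 = t_test(0.5, 0.1, t_crit)
# Hypothetical coefficient that is small relative to its standard error.
t2, reject2 = t_test(0.1, 0.1, t_crit)
print(t1, reject1, t2, reject2)
```

Rejecting when $|t|$ exceeds the critical value is equivalent to rejecting when the two-sided p-value is below the significance level.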

- Find a 95% confidence interval for ?
- Is seating of any help in predicting gas mileage once horsepower, weight and cargo have been taken into account? Carry out a test at the 0.05 significance level.

- Multiple Linear Regression Model: $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_K x_{iK} + e_i$
- The coefficient $\beta_j$ is a partial slope. It indicates the change in the mean of $y$ that is associated with a one-unit increase in $x_j$ while holding all other variables fixed.
- A marginal slope is obtained when we perform a simple regression with only one X, ignoring all other variables. Consequently the other variables are not held fixed.

- In order to evaluate the benefits of a proposed irrigation scheme in a certain region, suppose that the relation of yield Y to rainfall R is investigated over several years.
- Data is in rainfall.JMP.

Higher rainfall is associated with lower temperature.

Rainfall is estimated to be beneficial once temperature is held fixed.

Multiple regression provides a better picture of the benefits of an irrigation scheme because temperature would be held fixed in an irrigation scheme.
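
The difference between a marginal and a partial slope can be reproduced with a tiny made-up data set. The numbers below are NOT the rainfall.JMP data: they were constructed so that yield = 10 + 1*rain + 2*temp exactly, with higher rainfall going along with lower temperature. The helper `centered_sum` is a hypothetical name, and the two-variable formulas are the standard closed-form least squares solution.

```python
def centered_sum(u, v):
    """Sum of (u_i - mean(u)) * (v_i - mean(v))."""
    ubar = sum(u) / len(u)
    vbar = sum(v) / len(v)
    return sum((a - ubar) * (b - vbar) for a, b in zip(u, v))

# Made-up values, NOT the rainfall.JMP data.
rain   = [10, 12, 14, 16, 18, 20]
temp   = [21, 17, 16, 14, 11, 11]
yield_ = [62, 56, 56, 54, 50, 52]      # equals 10 + rain + 2 * temp

s_rr = centered_sum(rain, rain)
s_tt = centered_sum(temp, temp)
s_rt = centered_sum(rain, temp)        # negative: more rain, lower temperature
s_ry = centered_sum(rain, yield_)
s_ty = centered_sum(temp, yield_)

# Marginal slope: simple regression of yield on rainfall, ignoring temperature.
marginal_rain = s_ry / s_rr

# Partial slopes: regression of yield on rainfall and temperature together.
det = s_rr * s_tt - s_rt ** 2
partial_rain = (s_tt * s_ry - s_rt * s_ty) / det
partial_temp = (s_rr * s_ty - s_rt * s_ry) / det

print("marginal slope for rain:", marginal_rain)
print("partial slopes (rain, temp):", partial_rain, partial_temp)
```

With these numbers the marginal slope for rainfall is -1 while the partial slope is +1: rainfall looks harmful when temperature is ignored, and beneficial once temperature is held fixed.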