Lecture 22 – Thurs., Nov. 25

Lecture 22 – Thurs., Nov. 25. Nominal explanatory variables (Chapter 9.3) Inference for multiple regression (Chapter 10.1-10.2). Nominal Variables. To incorporate nominal variables in multiple regression analysis, we use indicator variables.

Presentation Transcript


  1. Lecture 22 – Thurs., Nov. 25 • Nominal explanatory variables (Chapter 9.3) • Inference for multiple regression (Chapter 10.1-10.2)

  2. Nominal Variables • To incorporate nominal variables in multiple regression analysis, we use indicator variables. • An indicator variable distinguishes between two groups. • Example: the time of onset (early vs. late) is a nominal variable. To incorporate it into the multiple regression analysis, we used the indicator variable early, which equals 1 if onset is early and 0 if it is late.

  3. Nominal Variables with More than Two Categories • To incorporate nominal variables with more than two categories, we use multiple indicator variables. If there are k categories, we need k-1 indicator variables.
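A minimal sketch of the k-1 coding in Python, assuming the categories arrive as a list of strings. The function name `indicator_columns` and the example values are illustrative only; they are not part of the lecture or of JMP.

```python
def indicator_columns(values, baseline):
    """Return k-1 indicator (0/1) columns for a nominal variable with k categories.

    `baseline` is the category that gets no column of its own; a row in the
    baseline category has 0 in every indicator column.
    """
    levels = sorted(set(values))
    if baseline not in levels:
        raise ValueError(f"baseline {baseline!r} is not among the observed categories")
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != baseline
    }


# Illustrative input: three categories, so two indicator columns are produced.
colors = ["white", "silver", "other", "white", "other"]
indicators = indicator_columns(colors, baseline="other")
print(indicators)
```

With three categories the result has two columns; rows in the baseline category are 0 in both.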

  4. Nominal Explanatory Variables Example: Auction Car Prices • A car dealer wants to predict the auction price of a car. • The dealer believes that odometer reading and the car color are variables that affect a car’s price (data from sample of cars in auctionprice.JMP) • Three color categories are considered: • White • Silver • Other colors • Note: Color is a nominal variable.

  5. Indicator Variables in Auction Car Prices • I1 = 1 if the color is white, 0 if the color is not white • I2 = 1 if the color is silver, 0 if the color is not silver • The category “Other colors” is defined by: I1 = 0; I2 = 0

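  6. Auction Car Price Model • Solution: the proposed model is Price = β0 + β1(Odometer) + β2(I1) + β3(I2) + ε • The data: [table not recovered; its example rows were labeled White car, Other color, Silver color]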

  7. Example: Auction Car Price: The Regression Equation • From JMP we get the regression equation PRICE = 16701 - .0555(Odometer) + 90.48(I1) + 295.48(I2) • The equation for an “other color” car: Price = 16701 - .0555(Odometer) + 90.48(0) + 295.48(0) = 16701 - .0555(Odometer) • The equation for a white color car: Price = 16701 - .0555(Odometer) + 90.48(1) + 295.48(0) = 16791.48 - .0555(Odometer) • The equation for a silver color car: Price = 16701 - .0555(Odometer) + 90.48(0) + 295.48(1) = 16996.48 - .0555(Odometer) • [Figure: Price vs. Odometer, showing three parallel lines, one per color category]

  8. Example: Auction Car Price: The Regression Equation • From JMP we get the regression equation PRICE = 16701 - .0555(Odometer) + 90.48(I1) + 295.48(I2) • For each additional mile on the odometer, the auction price decreases by 5.55 cents on average, for cars of the same color. • A white car sells, on average, for $90.48 more than a car of the “Other color” category with the same odometer reading. • A silver car sells, on average, for $295.48 more than a car of the “Other color” category with the same odometer reading.
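As a quick check on this interpretation, the fitted equation can be evaluated for each color at a common odometer reading. This is a sketch only: the coefficients are copied from the slide, while the function name `predicted_price` and the 40,000-mile reading are hypothetical choices made for illustration.

```python
# Coefficients copied from the fitted equation on the slide.
INTERCEPT = 16701.0
ODOMETER_SLOPE = -0.0555
WHITE_EFFECT = 90.48    # coefficient of I1
SILVER_EFFECT = 295.48  # coefficient of I2


def predicted_price(odometer, color):
    """Predicted auction price for a car with the given odometer reading and color."""
    i1 = 1 if color == "white" else 0
    i2 = 1 if color == "silver" else 0
    return INTERCEPT + ODOMETER_SLOPE * odometer + WHITE_EFFECT * i1 + SILVER_EFFECT * i2


# Hypothetical odometer reading, used only to compare the three colors.
for color in ("other", "white", "silver"):
    print(f"{color}: {predicted_price(40000, color):.2f}")
# other: 14481.00
# white: 14571.48
# silver: 14776.48
```

At any fixed odometer reading the three predictions differ by exactly the indicator coefficients, which is what makes the three fitted lines parallel.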

  9. Example: Auction Car Price: The Regression Equation (Xm18-02b) • There is insufficient evidence to infer that a white car and a car of “other color” sell for different auction prices. • There is sufficient evidence to infer that a silver car sells for a higher price than a car of the “other color” category.

  10. Shorthand Notation for Nominal Variables • In the shorthand notation for regression models, nominal variables are written in all capital letters. • Parallel regression lines model: [equation not recovered] • Separate regression lines model: [equation not recovered]

  11. Nominal Variables in JMP It is not necessary to create indicator variables yourself to represent a nominal variable. Nominal variables in JMP: • Make sure that the nominal variable’s modeling type is in fact nominal. • Include the nominal variable in the Construct Model Effects box in Fit Model • JMP will create indicator variables. The brackets indicate the category of the nominal variable for which the indicator variable is 1. • JMP will leave out the level which is highest alphabetically or numerically.

  12. Specially Constructed Explanatory Variables • Types of specially constructed explanatory variables: • Powers of variables • Products of variables (interactions) • Indicator variables to represent nominal variables • Transformations of variables (e.g., log) • Use a matrix of pairwise scatterplots to examine the data initially and to look for needed transformations or powers of variables.

  13. Inference for Multiple Regression • Chapter 10.2 • Tests for single coefficients • Confidence intervals for single coefficients • Confidence intervals for the mean response at given values of the explanatory variables • Prediction intervals for a future response at given values of the explanatory variables • Chapter 10.3 • F-test for overall significance of regression • F-test for joint significance of several terms (will not cover)
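For reference, the single-coefficient tools in this list take the standard form sketched below. The notation is assumed here rather than copied from the slide: n is the number of observations and p is the number of regression coefficients, including the intercept.

```latex
% t-test of H_0: \beta_j = 0
t = \frac{\hat\beta_j}{\mathrm{SE}(\hat\beta_j)},
\qquad \text{compared with a } t \text{ distribution on } n - p \text{ degrees of freedom}

% 95% confidence interval for \beta_j
\hat\beta_j \pm t_{n-p}(0.975)\,\mathrm{SE}(\hat\beta_j)
```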

  14. Case Study 10.1.2 • Question: Do echolocating bats expend more energy than nonecholocating bats after accounting for body size? • Data: Body mass and flight energy expenditure for 4 nonecholocating bats, 12 nonecholocating birds and 4 echolocating bats. • Strategy: Build a multiple regression model for mean energy expended as a function of type of flying vertebrate (echolocating bat, nonecholocating bat, nonecholocating bird) and body size. • Explore (resolve need for transformation) • Test for interaction • If there is no interaction, answer the question with the three parallel lines model

  15. Coded Scatterplots • To construct a coded scatterplot, create columns energy nonecholocating bat, energy nonecholocating bird and energy echolocating bat. The column energy nonecholocating bat should contain only the energies for nonecholocating bats and a blank for all other species. • Click graph, overlay plot, put energy nonecholocating bat, energy nonecholocating bird and energy echolocating bat in Y and mass in X.
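The column construction described above can be sketched in Python as follows. The helper name `split_by_group` and the numbers are made up for illustration; they are not the case-study data, and `None` stands in for a blank cell.

```python
def split_by_group(response, groups):
    """Build one column per group.

    Each column holds the response for rows in that group and None (a blank)
    for rows belonging to any other group.
    """
    return {
        g: [y if row_group == g else None for y, row_group in zip(response, groups)]
        for g in sorted(set(groups))
    }


# Made-up values for illustration only.
energy = [1.0, 2.5, 0.8, 3.1]
kind = [
    "nonecholocating bat",
    "nonecholocating bird",
    "echolocating bat",
    "nonecholocating bird",
]
columns = split_by_group(energy, kind)
for name, column in columns.items():
    print(f"energy {name}: {column}")
```

Plotting each of these columns against mass with its own marker gives the coded scatterplot.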

  16. Coded Scatterplots

  17. Separate/Parallel Regression Lines Model • Separate regression lines model: [equation not recovered] • Parallel regression lines model: [equation not recovered]
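For this case study, the two models named on this slide could be written roughly as follows. This is a sketch under assumptions, not the slide's own equations: it assumes the response is log energy (lenergy), body size enters as log mass (lmass), nonecholocating bats are the reference group, and bird and ebat are indicator variables for nonecholocating birds and echolocating bats.

```latex
% Parallel regression lines model: common slope, type-specific intercepts
\mu\{\text{lenergy} \mid \text{lmass}, \text{TYPE}\}
  = \beta_0 + \beta_1\,\text{lmass} + \beta_2\,\text{bird} + \beta_3\,\text{ebat}

% Separate regression lines model: adds type-specific slopes via interactions
\mu\{\text{lenergy} \mid \text{lmass}, \text{TYPE}\}
  = \beta_0 + \beta_1\,\text{lmass} + \beta_2\,\text{bird} + \beta_3\,\text{ebat}
    + \beta_4\,(\text{lmass} \times \text{bird}) + \beta_5\,(\text{lmass} \times \text{ebat})
```

Under this parameterization, the parallel lines model is the special case of the separate lines model in which both interaction coefficients are zero.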

  18. Inferences for Echolocating Bats • Is the parallel regression lines model appropriate? Test whether the coefficients on the interaction terms in the separate regression lines model are zero. • There is no evidence against the parallel regression lines model, so we go ahead and use it to answer the question of interest: do echolocating bats use less energy than nonecholocating bats of the same body size and nonecholocating birds of the same body size?

  19. Inferences for Echolocating Bats Cont. • No strong evidence that echolocating bats use less energy than either nonecholocating bats (p-value = 0.35) or nonecholocating birds (p-value = 0.77) of the same body size. • 95% confidence interval for the difference in mean log energy for nonecholocating bats and echolocating bats of the same body size: (-0.51, 0.35). • This means that a 95% confidence interval for the ratio of median energy for nonecholocating bats and echolocating bats of the same body size is (e^-0.51, e^0.35) = (0.60, 1.42). • Summary of findings: Although there is no strong evidence that echolocating bats use less energy than nonecholocating bats of the same body size, it is still plausible that they use quite a bit less energy (60% as much at the median). The study is inconclusive.
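The back-transformation from the log scale to a ratio of medians can be checked with a few lines of Python. This sketch simply exponentiates the interval endpoints quoted on the slide; the variable names are illustrative.

```python
import math

# Endpoints of the 95% confidence interval for the difference in mean log energy,
# as quoted on the slide.
diff_log_lower, diff_log_upper = -0.51, 0.35

# Exponentiating a difference of log means gives a ratio of medians
# on the original (untransformed) scale.
ratio_lower = math.exp(diff_log_lower)
ratio_upper = math.exp(diff_log_upper)

print(f"({ratio_lower:.2f}, {ratio_upper:.2f})")  # (0.60, 1.42)
```

The interval for the ratio contains 1, which matches the conclusion that the study is inconclusive, and its lower endpoint is the source of the "60% as much" figure.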

  20. Prediction Intervals • To find a 95% prediction interval for the log energy of a flying vertebrate of a given type and mass: • Fit the multiple regression model. • Click the red triangle next to the response log energy, click Save Columns, click Predicted Values and also click Indiv Confid Interval. This saves the predicted value and the lower and upper endpoints of the 95% prediction interval for each observation in the data set. • To get a prediction interval for X’s that are not in the data set, enter a row with those X’s and then exclude that observation from the fit.
