Chapter 12: Multiple Regression

  • Learn….

    To use Multiple Regression Analysis to predict a response variable using more than one explanatory variable.


Section 12.1

How Can We Use Several Variables to Predict a Response?


Regression Models

  • The model that contains only two variables, x and y, is called a bivariate model


Regression Models

  • The regression equation for the bivariate model is:

    µy = α + βx


Regression Models

  • Suppose there are two predictors, denoted by x1 and x2

  • This is called a multiple regression model


Regression Models

  • The regression equation for this multiple regression model with two predictors is:

    µy = α + β1x1 + β2x2


Multiple Regression Model

  • The multiple regression model relates the mean µy of a quantitative response variable y to a set of explanatory variables x1, x2,….


Multiple Regression Model

  • Example: For three explanatory variables, the multiple regression equation is:

    µy = α + β1x1 + β2x2 + β3x3


Multiple Regression Model

  • Example: The sample prediction equation with three explanatory variables is:

    ŷ = a + b1x1 + b2x2 + b3x3


Example: Predicting Selling Price Using House and Lot Size

  • The data set “house selling prices” contains observations on 100 home sales in Florida in November 2003

  • A multiple regression analysis was done with selling price as the response variable and with house size and lot size as the explanatory variables



Example: Predicting Selling Price Using House and Lot Size

  • Prediction Equation:

    ŷ = a + 53.8x1 + 2.84x2

    where y = selling price, x1 = house size, x2 = lot size, and a is the estimated intercept from the software output


Example: Predicting Selling Price Using House and Lot Size

  • One house listed in the data set had house size = 1240 square feet, lot size = 18,000 square feet and selling price = $145,000

  • Find its predicted selling price:

    Substituting x1 = 1240 and x2 = 18,000 into the prediction equation gives ŷ = $107,276


Example: Predicting Selling Price Using House and Lot Size

  • Find its residual:

    y − ŷ = $145,000 − $107,276 = $37,724

  • The residual tells us that the actual selling price was $37,724 higher than predicted
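
To make this computation concrete, here is a minimal sketch of the same kind of fit in Python with statsmodels. The "house selling prices" data set itself is not reproduced in this transcript, so randomly generated stand-in data (and therefore different fitted coefficients) are used; the prediction and residual for one listing are computed exactly as above.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
house_size = rng.uniform(1000, 4000, n)    # x1, square feet
lot_size = rng.uniform(5000, 50000, n)     # x2, square feet
price = 50 * house_size + 3 * lot_size + rng.normal(0, 20000, n)  # y

# Fit y-hat = a + b1*x1 + b2*x2
X = sm.add_constant(np.column_stack([house_size, lot_size]))
model = sm.OLS(price, X).fit()
print(model.params)                        # intercept a, slopes b1 and b2

# Predicted price and residual for one listing (1240 sq ft house, 18,000 sq ft lot)
x_new = np.array([1.0, 1240.0, 18000.0])
y_hat = x_new @ model.params
print("predicted:", y_hat, "residual:", 145000 - y_hat)
```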


The Number of Explanatory Variables

  • You should not use many explanatory variables in a multiple regression model unless you have lots of data

  • A rough guideline is that the sample size n should be at least 10 times the number of explanatory variables


Plotting Relationships

  • Always look at the data before doing a multiple regression

  • Most software can construct, on a single graph, a scatterplot for each pair of variables

    • This is called a scatterplot matrix
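
A scatterplot matrix takes only a few lines in Python with pandas; the data frame below is a synthetic stand-in for the house-price variables.

```python
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "house_size": rng.uniform(1000, 4000, 100),
    "lot_size": rng.uniform(5000, 50000, 100),
})
df["price"] = 50 * df["house_size"] + 3 * df["lot_size"] + rng.normal(0, 20000, 100)

# One scatterplot for each pair of variables, histograms on the diagonal
scatter_matrix(df, figsize=(6, 6), diagonal="hist")
plt.show()
```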



Interpretation of Multiple Regression Coefficients

  • The simplest way to interpret a multiple regression equation is to look at it in two dimensions, as a function of a single explanatory variable

  • We can look at it this way by fixing values for the other explanatory variable(s)


Interpretation of Multiple Regression Coefficients

Example using the housing data:

  • Suppose we fix x1 = house size at 2000 square feet

  • The prediction equation becomes:

    ŷ = a + 53.8(2000) + 2.84x2, a straight line in x2 with slope 2.84


Interpretation of Multiple Regression Coefficients

  • Since the slope coefficient of x2 is 2.84, the predicted selling price for 2000 square foot houses increases by $2.84 for every square foot increase in lot size

  • For a 1000 square-foot increase in lot size, the predicted selling price of 2000 sq. ft. houses increases by 1000(2.84) = $2840


Interpretation of Multiple Regression Coefficients

Example using the housing data:

  • Suppose we fix x2 = lot size at 30,000 square feet

  • The prediction equation becomes:

    ŷ = a + 53.8x1 + 2.84(30,000), a straight line in x1 with slope 53.8


Interpretation of Multiple Regression Coefficients

  • Since the slope coefficient of x1 is 53.8, the predicted selling price for houses with a lot size of 30,000 sq. ft. increases by $53.80 for every square foot increase in house size


Interpretation of Multiple Regression Coefficients

  • In summary, an increase of a square foot in house size has a larger impact on the selling price ($53.80) than an increase of a square foot in lot size ($2.84)

  • We can compare slopes for these explanatory variables because their units of measurement are the same (square feet)

  • Slopes cannot be compared when the units differ


Summarizing the Effect While Controlling for a Variable

  • The multiple regression model assumes that the slope for a particular explanatory variable is identical for all fixed values of the other explanatory variables


Summarizing the Effect While Controlling for a Variable

  • For example, the coefficient of x1 in the prediction equation ŷ = a + 53.8x1 + 2.84x2 is 53.8, regardless of whether we plug in x2 = 10,000, x2 = 30,000, or x2 = 50,000



Slopes in Multiple Regression and in Bivariate Regression

  • In multiple regression, a slope describes the effect of an explanatory variable while controlling for the effects of the other explanatory variables in the model


Slopes in Multiple Regression and in Bivariate Regression

  • Bivariate regression has only a single explanatory variable

  • A slope in bivariate regression describes the effect of that variable while ignoring all other possible explanatory variables


Importance of Multiple Regression

  • One of the main uses of multiple regression is to identify potential lurking variables and control for them by including them as explanatory variables in the model


For all students at Walden Univ., the prediction equation for y = college GPA, with x1 = H.S. GPA and x2 = study time, is ŷ = 1.13 + 0.643x1 + 0.0078x2.

  • Find the predicted college GPA of a student who has a H.S. GPA of 3.5 and who studies 3 hrs. per day.

  • 3.67

  • 3.005

  • 3.175

  • 3.4


For all students at Walden Univ., the prediction equation for y = college GPA, with x1 = H.S. GPA and x2 = study time, is ŷ = 1.13 + 0.643x1 + 0.0078x2.

  • For students with fixed study time, what is the change in predicted college GPA when H.S. GPA increases from 3.0 to 4.0?

  • 1.13

  • 0.0078

  • 0.643

  • 1.00


Section 12.2

Extending the Correlation and R-Squared for Multiple Regression


Multiple Correlation

  • To summarize how well a multiple regression model predicts y, we analyze how well the observed y values correlate with the predicted y values

  • The multiple correlation is the correlation between the observed y values and the predicted y values

    • It is denoted by R


Multiple Correlation

  • For each subject, the regression equation provides a predicted value

  • Each subject has an observed y-value and a predicted y-value


Multiple Correlation

  • The correlation computed between all pairs of observed y-values and predicted y-values is the multiple correlation, R

  • The larger the multiple correlation, the better the predictions of y by the set of explanatory variables


Multiple Correlation

  • The R-value always falls between 0 and 1

  • In this way, the multiple correlation ‘R’ differs from the bivariate correlation ‘r’ between y and a single variable x, which falls between -1 and +1


R-squared

  • For predicting y, the square of R describes the relative improvement from using the prediction equation instead of using the sample mean, ȳ


R-squared

  • The error in using the prediction equation to predict y is summarized by the residual sum of squares:

    Σ(y − ŷ)²


R-squared

  • The error in using ȳ to predict y is summarized by the total sum of squares:

    Σ(y − ȳ)²


R-squared

  • The proportional reduction in error is:

    R² = [Σ(y − ȳ)² − Σ(y − ŷ)²] / Σ(y − ȳ)²


R-squared

  • The better the predictions are using the regression equation, the larger R2 is

  • For multiple regression, R2 is the square of the multiple correlation, R
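
A short sketch (synthetic data, statsmodels assumed) showing that R is simply the ordinary correlation between observed and predicted y-values, and that its square matches the reported R2:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x1, x2 = rng.normal(size=(2, 100))
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=100)

fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
R = np.corrcoef(y, fit.fittedvalues)[0, 1]   # correlation of observed vs predicted
print(R, R**2, fit.rsquared)                 # R**2 equals the reported R-squared
```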


Example: How Well Can We Predict House Selling Prices?

  • For the 100 observations on y = selling price, x1 = house size, and x2 = lot size, an ANOVA (analysis of variance) table was created

  • The table displays the sums of squares in the SS column


Example: How Well Can We Predict House Selling Prices?

  • The R2 value can be computed from the sums of squares in the table:

    R² = (Total SS − Residual SS) / Total SS = 0.71


Example: How Well Can We Predict House Selling Prices?

  • Using house size and lot size together to predict selling price reduces the prediction error by 71%, relative to using y alone to predict selling price


Example: How Well Can We Predict House Selling Prices?

  • Find and interpret the multiple correlation:

    R = √R² = √0.71 ≈ 0.84

  • There is a strong association between the observed and the predicted selling prices

  • House size and lot size very much help us to predict selling prices


Example: How Well Can We Predict House Selling Prices?

  • If we used a bivariate regression model to predict selling price with house size as the predictor, the r2 value would be 0.58

  • If we used a bivariate regression model to predict selling price with lot size as the predictor, the r2 value would be 0.51


Example: How Well Can We Predict House Selling Prices?

  • The multiple regression model has R2 = 0.71, so it provides better predictions than either bivariate model


Properties of R-squared

  • The previous example showed that R2 for the multiple regression model was larger than r2 for a bivariate model using only one of the explanatory variables

  • A key property of R2 is that it cannot decrease when predictors are added to a model


Properties of R-squared

  • R2 falls between 0 and 1

  • The larger the value, the better the explanatory variables collectively predict y

  • R2 = 1 only when all residuals are 0, that is, when all regression predictions are perfect

  • R2 = 0 when the correlation between y and each explanatory variable equals 0


Properties of R-squared

  • R2 gets larger, or at worst stays the same, whenever an explanatory variable is added to the multiple regression model

  • The value of R2 does not depend on the units of measurement
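
A sketch of the "stays the same or goes up" property on synthetic data: even a predictor that is pure noise, unrelated to y by construction, cannot lower R2.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
noise_pred = rng.normal(size=100)            # unrelated to y by construction
y = 1 + 2 * x1 + rng.normal(size=100)

r2_one = sm.OLS(y, sm.add_constant(x1)).fit().rsquared
r2_two = sm.OLS(y, sm.add_constant(np.column_stack([x1, noise_pred]))).fit().rsquared
print(r2_one, r2_two)                        # r2_two >= r2_one always holds
```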


R-squared Values for Various Multiple Regression Models


R-squared Values for Various Multiple Regression Models

  • The single predictor in the data set that is most strongly associated with y is the house’s real estate tax assessment

    • (r2 = 0.679)

  • When we add house size as a second predictor, R2 goes up from 0.679 to 0.730

  • As other predictors are added, R2 continues to go up, but not by much


R-squared Values for Various Multiple Regression Models

  • R2 does not increase much after a few predictors are in the model

  • When there are many explanatory variables but the correlations among them are strong, once you have included a few of them in the model, R2 usually doesn’t increase much more when you add additional ones


R-squared Values for Various Multiple Regression Models

  • This does not mean that the additional variables are uncorrelated with the response variable

  • It merely means that they don’t add much new power for predicting y, given the values of the predictors already in the model


In a data set used to predict body weight (in pounds), three predictors were used: height, percent body fat and age.

Their correlations with total body weight were:

Height: 0.745 Percent Body fat: 0.390 Age: -0.187

  • Which explanatory variable gives by itself the best prediction of weight?

  • Height

  • Percent body fat

  • Age


In a data set used to predict body weight (in pounds), three predictors were used: height, percent body fat and age.

Their correlations with total body weight were:

Height: 0.745 Percent Body fat: 0.390 Age: -0.187

  • With height as the sole predictor, what is r2?

  • .745

  • .555

  • .625

  • .825


In a data set used to predict body weight (in pounds), three predictors were used: height, percent body fat and age.

Their correlations with total body weight were:

Height: 0.745 Percent Body fat: 0.390 Age: -0.187

  • If Percent Body Fat is added to the model R2 = 0.66. If Age is then added to the model R2=0.67. Once you know height and % body fat, does age seem to help in predicting weight?

  • No

  • Yes


Section 12.3

How Can We Use Multiple Regression to Make Inferences?


Inferences about the Population

  • Assumptions required when using a multiple regression model to make inferences about the population:

    • The regression equation truly holds for the population means

    • This implies that there is a straight-line relationship between the mean of y and each explanatory variable, with the same slope at each value of the other predictors


Inferences about the Population

  • Assumptions required when using a multiple regression model to make inferences about the population:

    • The data were gathered using randomization

    • The response variable y has a normal distribution at each combination of values of the explanatory variables, with the same standard deviation


Inferences about Individual Regression Parameters

  • Consider a particular parameter, β1

  • If β1= 0, the mean of y is identical for all values of x1, at fixed values of the other explanatory variables

  • So, H0: β1= 0 states that y and x1 are statistically independent, controlling for the other variables

  • This means that once the other explanatory variables are in the model, it doesn’t help to have x1 in the model


Significance Test about a Multiple Regression Parameter

  • Assumptions:

    • Each explanatory variable has a straight-line relation with µy with the same slope for all combinations of values of other predictors in the model

    • Data gathered with randomization

    • Normal distribution for y with same standard deviation at each combination of values of other predictors in model


Significance Test about a Multiple Regression Parameter

  • Hypotheses:

    • H0: β1= 0

    • Ha: β1≠ 0

    • When H0 is true, y is independent of x1, controlling for the other predictors


Significance Test about a Multiple Regression Parameter

  • Test Statistic:

    t = (b1 − 0) / se, where se is the standard error of b1


Significance Test about a Multiple Regression Parameter

  • P-value: Two-tail probability from the t-distribution of values more extreme (in absolute value) than the observed t test statistic

    The t-distribution has:

    df = n – number of parameters in the regression equation
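
A sketch of this t test on synthetic data (statsmodels and scipy assumed): the slope estimate divided by its standard error, compared to a t-distribution with df = n − (number of parameters), reproduces the software's reported P-value.

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(4)
x1, x2 = rng.normal(size=(2, 64))
y = 5 + 1.5 * x1 + 0 * x2 + rng.normal(size=64)    # x2 truly has no effect

fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
t = fit.params[2] / fit.bse[2]                     # t = b2 / se
df = 64 - 3                                        # n minus 3 parameters
p = 2 * stats.t.sf(abs(t), df)                     # two-tail P-value
print(t, p, fit.pvalues[2])                        # manual and reported values agree
```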


Significance Test about a Multiple Regression Parameter

  • Conclusion: Interpret P-value; compare to significance level if decision needed


Example: What Helps Predict a Female Athlete’s Weight?

  • The “College Athletes” data set comes from a study of 64 University of Georgia female athletes

  • The study measured several physical characteristics, including total body weight in pounds (TBW), height in inches (HGT), the percent of body fat (%BF) and age


Example: What Helps Predict a Female Athlete’s Weight?

  • The results of fitting a multiple regression model for predicting weight using the other variables:


Example: What Helps Predict a Female Athlete’s Weight?

  • Interpret the effect of age on weight in the multiple regression equation:


Example: What Helps Predict a Female Athlete’s Weight?

  • The slope coefficient of age is -0.96

  • For athletes having fixed values for x1 and x2, the predicted weight decreases by 0.96 pounds for a 1-year increase in age; note that the ages in this sample vary only between 17 and 23


Example: What Helps Predict a Female Athlete’s Weight?

  • Run a hypothesis test to determine whether age helps to predict weight, if you already know height and percent body fat


Example: What Helps Predict a Female Athlete’s Weight?

  • Assumptions:

    • The 64 female athletes were a convenience sample, not a random sample

    • Caution should be taken when making inferences about all female college athletes


Example: What Helps Predict a Female Athlete’s Weight?

  • Hypotheses:

    • H0: β3= 0

    • Ha: β3≠ 0

  • Test statistic:

    t = (b3 − 0) / se


Example: What Helps Predict a Female Athlete’s Weight?

  • P-value: This value is reported in the output as 0.14

  • Conclusion:

  • The P-value of 0.14 does not give much evidence against the null hypothesis that β3 = 0

    • Age does not significantly predict weight if we already know height and % body fat


Confidence Interval for a Multiple Regression Parameter

  • A 95% confidence interval for a β slope parameter in multiple regression equals:

    b ± t.025(se)

  • The t-score has:

    df = (n - # of parameters in the model)
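
A sketch of the interval b ± t.025(se) on synthetic data; statsmodels' conf_int() applies the same formula with the same df.

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=(64, 3))
y = 2 + x @ np.array([3.4, 1.4, -1.0]) + rng.normal(size=64)

fit = sm.OLS(y, sm.add_constant(x)).fit()
t025 = stats.t.ppf(0.975, df=64 - 4)               # 4 parameters in the model
manual = fit.params[1] + np.array([-1, 1]) * t025 * fit.bse[1]
print(manual, fit.conf_int()[1])                   # the two intervals match
```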


Example: What’s Plausible for the Effect of Age on Weight?

  • Construct and interpret a 95% CI for β3, the effect of age while controlling for height and % body fat


Example: What’s Plausible for the Effect of Age on Weight?

  • At fixed values of x1 and x2, we infer that the population mean of weight changes very little (and maybe not at all) for a 1-year increase in age

  • The confidence interval contains 0

    • Age may have no effect on weight, once we control for height and % body fat


Estimating Variability Around the Regression Equation

  • A standard deviation parameter, σ, describes variability of the observations around the regression equation

  • Its sample estimate is:

    s = √[Σ(y − ŷ)² / (n − number of parameters in the regression equation)]
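
A sketch of this estimate on synthetic data; statsmodels exposes the same quantity as the square root of mse_resid.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(size=(64, 3))
y = 2 + x @ np.array([3.4, 1.4, -1.0]) + 10 * rng.normal(size=64)

fit = sm.OLS(y, sm.add_constant(x)).fit()
sse = np.sum(fit.resid ** 2)                 # residual sum of squares
s = np.sqrt(sse / (64 - 4))                  # df = n - number of parameters
print(s, np.sqrt(fit.mse_resid))             # same value
```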


Example: Estimating Variability of Female Athletes’ Weight

  • ANOVA table for the “College Athletes” data set:


Example: Estimating Variability of Female Athletes’ Weight

  • For female athletes at particular values of height, % of body fat, and age, estimate the standard deviation of their weights

  • Begin by finding the Mean Square Error:

    MSE = (Residual SS) / df, where df = n − 4 = 64 − 4 = 60

  • Notice that this value (102.2) appears in the MS column in the ANOVA table


Example: Estimating Variability of Female Athletes’ Weight

  • The standard deviation is:

    s = √MSE = √102.2 ≈ 10.1

  • This value is also displayed in the ANOVA table

  • For athletes with certain fixed values of height, % body fat, and age, the weights vary with a standard deviation of about 10 pounds


Example: Estimating Variability of Female Athletes’ Weight

  • If the conditional distributions of weight are approximately bell-shaped, about 95% of the weight values fall within about 2s = 20 pounds of the true regression line


Do the Explanatory Variables Collectively Have an Effect?

  • Example: With 3 predictors in a model, we can check this by testing:

    H0: β1 = β2 = β3 = 0 against Ha: at least one β parameter differs from 0


Do the Explanatory Variables Collectively Have an Effect?

  • The test statistic for H0 is denoted by F


Do the Explanatory Variables Collectively Have an Effect?

  • When H0 is true, the expected value of the F test statistic is approximately 1

  • When H0 is false, F tends to be larger than 1

  • The larger the F test statistic, the stronger the evidence against H0


Summary of F Test That All β Parameters = 0

  • Assumptions: Multiple regression equation holds, data gathered randomly, normal distribution for y with same standard deviation at each combination of predictors


Summary of F Test That All β Parameters = 0

  • Test statistic:

    F = (Mean square for regression) / (Mean square error)


Summary of F Test That All β Parameters = 0

  • P-value: Right-tail probability above the observed F test statistic value, from the F-distribution with:

    • df1 = number of explanatory variables

    • df2 = n – (number of parameters in regression equation)


Summary of F Test That All β Parameters = 0

  • Conclusion: The smaller the P-value, the stronger the evidence that at least one explanatory variable has an effect on y

    • If a decision is needed, reject H0 if P-value ≤ significance level, such as 0.05
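
A sketch of the F test on synthetic data: the ratio of the regression mean square to the residual mean square, with the right-tail P-value taken from the F-distribution with df1 = 3 and df2 = n − 4.

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.normal(size=(64, 3))
y = 2 + x @ np.array([3.4, 1.4, -1.0]) + 10 * rng.normal(size=64)

fit = sm.OLS(y, sm.add_constant(x)).fit()
F = fit.mse_model / fit.mse_resid            # regression MS over residual MS
p = stats.f.sf(F, 3, 64 - 4)                 # right-tail probability
print(F, p, fit.fvalue, fit.f_pvalue)        # manual and reported values agree
```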


Example: The F-Test for Predictors of Athletes’ Weight

  • For the 64 female college athletes, the regression model for predicting y = weight using x1 = height, x2 = % body fat and x3 = age is summarized in the ANOVA table on the next page


Example: The F-Test for Predictors of Athletes’ Weight


Example: The F-Test for Predictors of Athletes’ Weight

  • Use the output in the ANOVA table to test the hypothesis:

    H0: β1 = β2 = β3 = 0


Example: The F-Test for Predictors of Athletes’ Weight

  • The observed F statistic is 40.48

  • The corresponding P-value is 0.000

  • We can reject H0 at the 0.05 significance level

  • We conclude that at least one predictor has an effect on weight


Example: The F-Test for Predictors of Athletes’ Weight

  • The F-test tells us that at least one explanatory variable has an effect

  • If the explanatory variables are chosen sensibly, at least one should have some predictive power

  • The F-test result tells us whether there is sufficient evidence to make it worthwhile to consider the individual effects, using t-tests


Example: The F-Test for Predictors of Athletes’ Weight

  • The individual t-tests identify which of the variables are significant (controlling for the other variables)


Example: The F-Test for Predictors of Athletes’ Weight

  • If a variable turns out not to be significant, it can be removed from the model

  • In this example, ‘age’ can be removed from the model


Section 12.4

Checking a Regression Model Using Residual Plots


Assumptions for Inference with a Multiple Regression Model

  • The regression equation approximates well the true relationship between the predictors and the mean of y

  • The data were gathered randomly

  • y has a normal distribution with the same standard deviation at each combination of predictors


Checking Shape and Detecting Unusual Observations

  • To test Assumption 3 (the conditional distribution of y is normal at any fixed values of the explanatory variables):

    • Construct a histogram of the standardized residuals

    • The histogram should be approximately bell-shaped

    • Nearly all the standardized residuals should fall between -3 and +3. Any residual outside these limits is a potential outlier
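
A sketch of both residual checks on synthetic data (matplotlib and statsmodels assumed): a histogram of the standardized residuals, which should look roughly bell-shaped with nearly all values between −3 and +3, plus the residual-versus-predictor plot discussed in the slides that follow.

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(8)
x1, x2 = rng.normal(size=(2, 100))
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=100)

fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
std_resid = fit.get_influence().resid_studentized_internal

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(std_resid, bins=15)                 # check shape and potential outliers
ax2.scatter(x1, fit.resid)                   # should fluctuate randomly about 0
ax2.axhline(0, color="gray")
plt.show()
```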


Example: Residuals for House Selling Price

  • For the house selling price data, a MINITAB histogram of the standardized residuals for the multiple regression model predicting selling price by the house size and the lot size was created and is displayed on the following page



Example: Residuals for House Selling Price

  • The residuals are roughly bell shaped about 0

  • They fall between about -3 and +3

  • No severe nonnormality is indicated


Plotting Residuals against Each Explanatory Variable

  • Plots of residuals against each explanatory variable help us check for potential problems with the regression model

  • Ideally, the residuals should fluctuate randomly about 0

  • There should be no obvious change in trend or in variation as the values of the explanatory variable increase



Section 12.5

How Can Regression Include Categorical Predictors?


Indicator Variables

  • Regression models specify categories of a categorical explanatory variable using artificial variables, called indicator variables

  • The indicator variable for a particular category is binary

    • It equals 1 if the observation falls into that category and it equals 0 otherwise


Indicator Variables

  • In the house selling prices data set, the city region in which a house is located is a categorical variable

  • The indicator variable x for region is

    • x = 1 if house is in NW (northwest region)

    • x = 0 if house is not in NW


Indicator Variables

  • The coefficient β of the indicator variable x is the difference between the mean selling prices for homes in the NW and for homes not in the NW
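
A sketch of an indicator variable at work, on synthetic data (the region split and the dollar amounts are made up): both groups share the house-size slope, and the indicator's coefficient is the fixed-size difference in mean price between the groups.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
house_size = rng.uniform(1000, 4000, 100)
nw = (rng.random(100) < 0.4).astype(float)   # x = 1 if house is in NW, 0 otherwise
price = 40000 + 80 * house_size + 30000 * nw + rng.normal(0, 15000, 100)

X = sm.add_constant(np.column_stack([house_size, nw]))
fit = sm.OLS(price, X).fit()
print(fit.params)   # [intercept, house-size slope, NW shift]
```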


Example: Including Region in Regression for House Selling Price

  • Output from the regression model for the selling price of a home, using house size and region


Example: Including Region in Regression for House Selling Price

  • Find and plot the lines showing how predicted selling price varies as a function of house size, for homes in the NW and for homes not in the NW


Example: Including Region in Regression for House Selling Price

  • The regression equation from the MINITAB output is:

    ŷ = a + 78x1 + 30,569x2


Example: Including Region in Regression for House Selling Price

  • For homes not in the NW, x2 = 0

  • The prediction equation then simplifies to:

    ŷ = a + 78x1


Example: Including Region in Regression for House Selling Price

  • For homes in the NW, x2 = 1

  • The prediction equation then simplifies to:

    ŷ = (a + 30,569) + 78x1



Example: Including Region in Regression for House Selling Price

  • Both lines have the same slope, 78

  • For homes in the NW and for homes not in the NW, the predicted selling price increases by $78 for each square-foot increase in house size

  • The figure portrays a separate line for each category of region (NW, not NW)


Example: Including Region in Regression for House Selling Price

  • The coefficient of the indicator variable is 30569

  • For any fixed value of house size, we predict that the selling price is $30,569 higher for homes in the NW


Example: Including Region in Regression for House Selling Price

  • The line for homes in the NW is above the line for homes not in the NW

  • The predicted selling price is higher for homes in the NW

  • The P-value of 0.000 for the test for the coefficient of the indicator variable suggests that this difference is statistically significant


Is There Interaction?

  • For two explanatory variables, interaction exists between them in their effects on the response variable when the slope of the relationship between µy and one of them changes as the value of the other changes



Section 12.6

How Can We Model a Categorical Response?


Modeling a Categorical Response Variable

  • When y is categorical, a different regression model applies, called logistic regression


Examples of Logistic Regression

  • A voter’s choice in an election (Democrat or Republican), with explanatory variables: annual income, political ideology, religious affiliation, and race

  • Whether a credit card holder pays their bill on time (yes or no), with explanatory variables: family income and the number of months in the past year that the customer paid the bill on time


The Logistic Regression Model

  • Denote the possible outcomes for y as 0 and 1

  • Use the generic terms failure (for outcome = 0) and success (for outcome =1)

  • The population mean of the scores equals the population proportion of ‘1’ outcomes (successes)

    • That is, µy = p

  • The proportion, p, also represents the probability that a randomly selected subject has a success outcome


The Logistic Regression Model

  • The straight-line model is usually inadequate

  • A more realistic model has a curved S-shape instead of a straight-line trend


The Logistic Regression Model

  • A regression equation for an S-shaped curve for the probability of success p is:

    p = e^(α + βx) / (1 + e^(α + βx))
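
A sketch of fitting this S-shaped curve with statsmodels' Logit on synthetic data; the 12–65 income range mirrors the upcoming example, but the observations and fitted values here are made up.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
income = rng.uniform(12, 65, 100)                   # x, thousands of euros
p_true = 1 / (1 + np.exp(-(-3.5 + 0.1 * income)))   # hypothetical true curve
has_card = rng.binomial(1, p_true)                  # y: 1 = success, 0 = failure

fit = sm.Logit(has_card, sm.add_constant(income)).fit()
print(fit.params)                                   # estimates of alpha and beta
print(fit.predict([[1, 12], [1, 65]]))              # p-hat at x = 12 and x = 65
```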


Example: Annual Income and Having a Travel Credit Card

  • An Italian study with 100 randomly selected Italian adults considered factors that are associated with whether a person possesses at least one travel credit card

  • The table on the next page shows results for the first 15 people on this response variable and on the person’s annual income (in thousands of euros)



Example: Annual Income and Having a Travel Credit Card

  • Let x = annual income and let y = whether the person possesses a travel credit card (1 = yes, 0 = no)


Example: Annual Income and Having a Travel Credit Card

  • Substituting the α and β estimates into the logistic regression model formula yields:

    p̂ = e^(−3.52 + 0.105x) / (1 + e^(−3.52 + 0.105x))


Example: Annual Income and Having a Travel Credit Card

  • Find the estimated probability of possessing a travel credit card at the lowest and highest annual income levels in the sample, which were x = 12 and x = 65


Example: Annual Income and Having a Travel Credit Card

  • For x = 12 thousand euros, the estimated probability of possessing a travel credit card is:

    p̂ = e^(−3.52 + 0.105(12)) / (1 + e^(−3.52 + 0.105(12))) = 0.09


Example: Annual Income and Having a Travel Credit Card

  • For x = 65 thousand euros, the estimated probability of possessing a travel credit card is:

    p̂ = e^(−3.52 + 0.105(65)) / (1 + e^(−3.52 + 0.105(65))) = 0.97


Example: Annual Income and Having a Travel Credit Card

  • Annual income has a strong positive effect on having a credit card

  • The estimated probability of having a travel credit card changes from 0.09 to 0.97 as annual income changes over its range
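
As an arithmetic check, a tiny sketch that evaluates the logistic formula at the two income extremes. The coefficients −3.52 and 0.105 shown above are reconstructed from the slide's stated results rather than taken from surviving output, so treat them as assumptions; they reproduce the 0.09 and, up to rounding, the 0.97.

```python
import math

def logistic(alpha, beta, x):
    """Estimated probability of success at x under the logistic model."""
    z = alpha + beta * x
    return math.exp(z) / (1 + math.exp(z))

# Reconstructed coefficients (assumption), evaluated at the sample extremes
print(logistic(-3.52, 0.105, 12))   # ~0.09 at the lowest income
print(logistic(-3.52, 0.105, 65))   # ~0.96, close to the slide's 0.97
```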


Example: Estimating Proportion of Students Who’ve Used Marijuana

  • A three-variable contingency table from a survey of senior high-school students is shown on the next page

  • The students were asked whether they had ever used: alcohol, cigarettes or marijuana



Example: Estimating Proportion of Students Who’ve Used Marijuana

  • Let y indicate marijuana use, coded: (1 = yes, 0 = no)

  • Let x1 be an indicator variable for alcohol use (1 = yes, 0 = no)

  • Let x2 be an indicator variable for cigarette use (1 = yes, 0 = no)



Example: Estimating Proportion of Students Who’ve Used Marijuana

  • The logistic regression prediction equation is:

    p̂ = e^(a + b1x1 + b2x2) / (1 + e^(a + b1x1 + b2x2))


Example: Estimating Proportion of Students Who’ve Used Marijuana

  • For those who have not used alcohol or cigarettes, x1 = x2 = 0 and:

    p̂ = e^a / (1 + e^a)


Example: Estimating Proportion of Students Who’ve Used Marijuana

  • For those who have used alcohol and cigarettes, x1 = x2 = 1 and:

    p̂ = e^(a + b1 + b2) / (1 + e^(a + b1 + b2))


Example: Estimating Proportion of Students Who’ve Used Marijuana

  • The probability that students have tried marijuana seems to depend greatly on whether they’ve used alcohol and cigarettes