Chapter 8 Indicator Variables

jhatfield

Presentation Transcript


  1. Chapter 8 Indicator Variables. Linear Regression Analysis 5E, Montgomery, Peck & Vining

  2. 8.1 The General Concept of Indicator Variables • Qualitative variables, also known as categorical variables, do not have a scale of measurement. • Indicator variables (also known as dummy variables) assign numerical levels to the classes of a qualitative variable.

  3. 8.1 The General Concept of Indicator Variables Example • Relate the effective life of a cutting tool ($y$) used on a lathe to the lathe speed in revolutions per minute ($x_1$) and the type of cutting tool used. • Tool type is qualitative and can be represented as $x_2 = 0$ if the observation is from tool type A and $x_2 = 1$ if it is from tool type B. • If a first-order model is appropriate: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$

  4. 8.1 The General Concept of Indicator Variables Example • For tool type A ($x_2 = 0$) this model becomes $y = \beta_0 + \beta_1 x_1 + \varepsilon$. • For tool type B ($x_2 = 1$) this model becomes $y = (\beta_0 + \beta_2) + \beta_1 x_1 + \varepsilon$. • Changing from A to B shifts the intercept by $\beta_2$; the slope $\beta_1$ is the same for both tool types. • We assume that the error variance is equal for all levels of the qualitative variable.

  5. 8.1 The General Concept of Indicator Variables

  6. 8.1 The General Concept of Indicator Variables • A qualitative variable with $a$ levels requires $a - 1$ indicator variables. For example, with three tool types A, B, and C, two indicator variables ($x_2$ and $x_3$) are needed:
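
The $a - 1$ rule can be sketched in a few lines of Python. This is only an illustration: the helper name `indicator_columns`, the choice of type A as the reference level, and the sample labels are assumptions, not part of the slides.

```python
def indicator_columns(labels, reference):
    """Build 0/1 indicator columns for every level except the reference level.

    Returns (levels, rows), where levels lists the non-reference levels in
    sorted order and rows[i] holds the indicators for labels[i].
    """
    all_levels = sorted(set(labels))
    if reference not in all_levels:
        raise ValueError("reference level does not occur in labels")
    levels = [lv for lv in all_levels if lv != reference]
    rows = [[1 if lab == lv else 0 for lv in levels] for lab in labels]
    return levels, rows


# Made-up labels for three tool types; type A is the (assumed) reference level.
tools = ["A", "B", "C", "B", "A"]
levels, rows = indicator_columns(tools, reference="A")

for tool, row in zip(tools, rows):
    print(tool, row)  # the reference level gets all zeros
```

With three levels the function returns two columns, so the reference level is identified by both indicators being zero.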

  7. Example 8.1 Tool Life Data

  8. Example 8.1 Tool Life Data

  9. Example 8.1 Tool Life Data • The model to be fit is $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$, where $x_2 = 0$ indicates tool type A and $x_2 = 1$ indicates tool type B. • The least squares fit is
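
A minimal sketch of fitting this kind of model with NumPy follows. The data are synthetic (generated from assumed coefficients) and are not the tool life data of Example 8.1; the variable names are also assumptions.

```python
import numpy as np

# Synthetic data, NOT the Example 8.1 tool life data.
rng = np.random.default_rng(0)
n = 20
x1 = rng.uniform(500.0, 1000.0, size=n)   # lathe speed (rpm)
x2 = np.repeat([0.0, 1.0], n // 2)        # 0 = tool type A, 1 = tool type B
y = 35.0 - 0.025 * x1 + 15.0 * x2 + rng.normal(0.0, 0.5, size=n)

# Design matrix with an intercept column, the speed, and the indicator.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = beta

# One fitted equation yields two parallel lines.
print(f"type A: y_hat = {b0:.3f} + ({b1:.4f}) x1")
print(f"type B: y_hat = {b0 + b2:.3f} + ({b1:.4f}) x1")
```

The estimate `b2` is the vertical distance between the two fitted lines, which is how the intercept shift on the earlier slides shows up in practice.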

  10. Example 8.1 Tool Life Data

  11. Example 8.1 Tool Life Data • Residual analysis: see the plot of residuals versus fitted values in Fig. 8.3 (slide 8) and the normal probability plot below:

  12. 8.1 The General Concept of Indicator Variables • Two separate models could have been fit to the data. • However, the single-model approach is preferred because the analyst has only one final equation to work with instead of two, a much simpler practical result. • Furthermore, since both straight lines are assumed to have the same slope, it makes sense to combine the data from both tool types to produce a single estimate of this common parameter.

  13. 8.1 The General Concept of Indicator Variables Difference in Slope • If we expect the slopes to differ, we can model this by including an interaction term between the variables. • Consider the tool life data again, and say we believe there may be different slopes for the two tools. The model we can fit to account for the change in slope is $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$ (8.4)

  14. 8.1 The General Concept of Indicator Variables Difference in Slope • If tool type A is used ($x_2 = 0$): $y = \beta_0 + \beta_1 x_1 + \varepsilon$ • If tool type B is used ($x_2 = 1$): $y = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) x_1 + \varepsilon$ • Thus, the intercept has shifted and so has the slope. • $\beta_2$ is the change in the intercept caused by changing from type A to type B. • $\beta_3$ is the change in the slope caused by changing from type A to type B.

  15.

  16. 8.1 The General Concept of Indicator Variables • What we really have are two regression equations, one for tool type A and one for tool type B. To test whether these two equations are the same, we can use the extra-sum-of-squares method to test $H_0: \beta_2 = \beta_3 = 0$ vs. $H_1: \beta_2 \neq 0$ and/or $\beta_3 \neq 0$. • The test statistic would be $F_0 = \dfrac{SS_R(\beta_2, \beta_3 \mid \beta_1, \beta_0)/2}{MS_{Res}}$
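
The extra-sum-of-squares test can be sketched numerically by fitting the full model (with $x_2$ and $x_1 x_2$) and the reduced model (without them) and comparing residual sums of squares. The data below are synthetic and the helper `residual_ss` is a hypothetical name; this is not the analysis of Example 8.2.

```python
import numpy as np


def residual_ss(X, y):
    """Residual sum of squares from an ordinary least squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


# Synthetic data with an intercept shift and a slope change for type B.
rng = np.random.default_rng(1)
n = 20
x1 = rng.uniform(500.0, 1000.0, size=n)
x2 = np.repeat([0.0, 1.0], n // 2)
y = 35.0 - 0.025 * x1 + 15.0 * x2 - 0.005 * x1 * x2 + rng.normal(0.0, 1.0, size=n)

ones = np.ones(n)
X_reduced = np.column_stack([ones, x1])               # model under H0
X_full = np.column_stack([ones, x1, x2, x1 * x2])     # interaction model

ss_res_reduced = residual_ss(X_reduced, y)
ss_res_full = residual_ss(X_full, y)

# The extra regression sum of squares due to beta_2 and beta_3 equals the
# drop in residual sum of squares when the two terms are added.
extra_ss = ss_res_reduced - ss_res_full
ms_res = ss_res_full / (n - 4)        # full model has 4 parameters
f0 = (extra_ss / 2) / ms_res          # 2 parameters are being tested

print(f"F0 = {f0:.2f} on 2 and {n - 4} degrees of freedom")
```

A large value of `f0` relative to the F distribution with 2 and $n - 4$ degrees of freedom indicates that a single line does not describe both tool types.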

  17. Example 8.2 The Tool Life Data

  18. Example 8.2 The Tool Life Data

  19. Example 8.3 An Indicator Variable with More Than Two Levels • An electric utility is investigating the effect of the size of a single-family house and the type of air conditioning used in the house on the total electricity consumption during warm weather months.

  20. Example 8.3 An Indicator Variable with More Than Two Levels

  21. Example 8.3 An Indicator Variable with More Than Two Levels • In this problem it would seem unrealistic to assume that the slope of the regression function relating mean electricity consumption to the size of the house does not depend on the type of air conditioning system. • For example, we would expect the mean electricity consumption to increase with the size of the house, but the rate of increase should be different for a central air conditioning system than for window units because central air conditioning should be more efficient than window units for larger houses.

  22. Example 8.3 An Indicator Variable with More Than Two Levels • There should be an interaction between the size of the house and the type of air conditioning system.

  23. Example 8.3 An Indicator Variable with More Than Two Levels • The four regression models corresponding to the four types of air conditioning systems are as follows:
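
One way to write a model with a four-level factor and a size-by-type interaction is sketched below. The indicator names $x_2$, $x_3$, $x_4$, the coefficient numbering, and the choice of which system is the reference level are assumptions made for illustration; $x_1$ denotes the size of the house.

```latex
\begin{align*}
y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4
     + \beta_5 x_1 x_2 + \beta_6 x_1 x_3 + \beta_7 x_1 x_4 + \varepsilon \\[4pt]
\text{reference type } (x_2 = x_3 = x_4 = 0):\quad
  y &= \beta_0 + \beta_1 x_1 + \varepsilon \\
\text{second type } (x_2 = 1):\quad
  y &= (\beta_0 + \beta_2) + (\beta_1 + \beta_5)\, x_1 + \varepsilon \\
\text{third type } (x_3 = 1):\quad
  y &= (\beta_0 + \beta_3) + (\beta_1 + \beta_6)\, x_1 + \varepsilon \\
\text{fourth type } (x_4 = 1):\quad
  y &= (\beta_0 + \beta_4) + (\beta_1 + \beta_7)\, x_1 + \varepsilon
\end{align*}
```

Each non-reference system gets its own intercept shift and its own slope shift, so the single model implies four separate straight lines.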

  24. Example 8.4 More than Two Indicator Variables • Suppose that in Example 8.1 a second qualitative factor, the type of cutting oil used, must be considered. • Assuming that this factor has two levels, we may define a second indicator variable, $x_3$, as follows:

  25. Example 8.4 More than Two Indicator Variables

  26. Example 8.4 More than Two Indicator Variables

  27. Example 8.4 More than Two Indicator Variables

  28. Example 8.5 Comparing Regression Models

  29. Example 8.5 Comparing Regression Models

  30. Example 8.5 Comparing Regression Models

  31. Example 8.5 Comparing Regression Models

  32. Example 8.5 Comparing Regression Models

  33. 8.2 Some Comments on Indicator Variables • Avoid assigning an arbitrary numerical code, such as 1, 2, 3, and 4, to the levels of a qualitative variable. Such a code imposes an ordering and a fixed spacing on the levels that the data may not support, whereas indicator variables let each level have its own effect. • Indicator variables can also be substituted for a quantitative regressor: group the data into classes or intervals and assign indicator variables to the classes. This is useful when accurate values of the regressor cannot be readily obtained. • Drawback: information is lost by not using the actual values, and quantitative information is often more useful than qualitative.
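
Grouping a quantitative regressor into classes and coding the classes with indicators might look like the sketch below. The function name `bin_to_indicators` and the cut points are hypothetical, chosen only to illustrate the idea.

```python
def bin_to_indicators(value, cutpoints):
    """Code a numeric value as 0/1 indicators for the classes defined by cutpoints.

    cutpoints must be sorted in increasing order. They define
    len(cutpoints) + 1 classes; the lowest class is the reference level,
    so len(cutpoints) indicators are returned.
    """
    # Class index: how many cut points the value has reached or passed.
    cls = sum(value >= c for c in cutpoints)
    return [1 if cls == k else 0 for k in range(1, len(cutpoints) + 1)]


# Hypothetical speed classes: below 600 rpm, 600 to below 800 rpm, 800 rpm and above.
cutpoints = [600, 800]
low = bin_to_indicators(550, cutpoints)
mid = bin_to_indicators(700, cutpoints)
high = bin_to_indicators(950, cutpoints)
print(low, mid, high)
```

The speeds 610 and 790 receive identical indicators here, which is the loss of information the slide warns about.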

  34. 8.3 Regression Approach to Analysis of Variance • The analysis of variance is a technique frequently used to analyze data from planned or designed experiments. • Essentially, any analysis-of-variance problem can be treated as a regression problem in which all of the regressors are indicator variables.
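
The equivalence can be checked numerically on a small one-way layout. The sketch below uses made-up observations for three treatments, fits a regression on an intercept plus two indicator variables, and compares its F statistic with the one from the usual between/within decomposition; the first treatment is taken as the reference level by assumption.

```python
import numpy as np

# Made-up observations for k = 3 treatments (illustration only).
data = [
    np.array([20.1, 21.3, 19.8, 20.6]),
    np.array([23.9, 24.4, 25.1, 24.0]),
    np.array([22.0, 21.5, 22.8, 22.3]),
]
k = len(data)
n = sum(len(g) for g in data)
y = np.concatenate(data)

# Regression design: intercept plus k - 1 indicators (treatment 1 is the reference).
X = np.zeros((n, k))
X[:, 0] = 1.0
start = 0
for i, g in enumerate(data):
    if i > 0:
        X[start:start + len(g), i] = 1.0
    start += len(g)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
ss_res = float(np.sum((y - fitted) ** 2))
ss_reg = float(np.sum((fitted - y.mean()) ** 2))
f_regression = (ss_reg / (k - 1)) / (ss_res / (n - k))

# Classical one-way analysis of variance on the same data.
ss_between = sum(len(g) * (g.mean() - y.mean()) ** 2 for g in data)
ss_within = sum(float(np.sum((g - g.mean()) ** 2)) for g in data)
f_anova = (ss_between / (k - 1)) / (ss_within / (n - k))

print(f"F (regression) = {f_regression:.4f}, F (ANOVA) = {f_anova:.4f}")
```

In this coding the intercept estimates the mean of the reference treatment and each indicator coefficient estimates the difference between a treatment mean and the reference mean.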

  35. 8.3 Regression Approach to Analysis of Variance

  36. 8.3 Regression Approach to Analysis of Variance • In the fixed-effects or model I case, the analysis of variance is used to test the hypothesis that all $k$ population means are equal, or equivalently,

  37. 8.3 Regression Approach to Analysis of Variance

  38. 8.3 Regression Approach to Analysis of Variance

  39. 8.3 Regression Approach to Analysis of Variance

  40. 8.3 Regression Approach to Analysis of Variance

