A (second-order) multiple regression model with interaction terms

quon-bell

Presentation Transcript


  1. A (second-order) multiple regression model with interaction terms

  2. An example in which the predictors do not interact

  3. Is baby’s birth weight related to smoking during pregnancy? • Sample of n = 32 births • Response (y): birth weight in grams of baby • Potential predictor (x1): smoking status of mother (yes or no) • Potential predictor (x2): length of gestation in weeks

  4. A first-order model with one binary predictor: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$ where: • $y_i$ is birth weight of baby i • $x_{i1}$ is length of gestation of baby i • $x_{i2} = 1$, if mother smokes and $x_{i2} = 0$, if not • and the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.

  5. Estimated first-order model with one binary predictor. The regression equation is Weight = - 2390 + 143 Gest - 245 Smoking

  6. In what way do the predictors have an “additive effect” on the response? • The effect of smoking on the mean birth weight is the same for all gestation lengths. (Exhibited by parallel lines.) • The effect of gestation length on the mean birth weight is the same for smokers and non-smokers. (Exhibited by parallel lines.)

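  7. What are “additive effects”? A regression model contains additive effects if the response function can be written as a sum of functions of the predictor variables: $\mu_Y = f_1(x_1) + f_2(x_2) + \dots + f_{p-1}(x_{p-1})$. For example, the first-order birth weight model has this form: $\mu_Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$.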
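
The additive effects described above can be checked numerically with the estimated birth weight equation from slide 5. This is a minimal sketch: the function name `predicted_weight` and the gestation lengths tried are assumptions chosen for illustration, not part of the original slides.

```python
# Estimated first-order model from slide 5:
#   Weight = -2390 + 143 Gest - 245 Smoking
def predicted_weight(gestation_weeks, smokes):
    """Predicted birth weight in grams (smokes: 1 = yes, 0 = no)."""
    return -2390 + 143 * gestation_weeks - 245 * smokes

# Illustrative gestation lengths in weeks (assumed values, not from the data).
gestations = [34, 38, 42]

# Effect of smoking at each gestation length: smoker minus non-smoker.
smoking_effects = [
    predicted_weight(g, 1) - predicted_weight(g, 0) for g in gestations
]

# Effect of one extra week of gestation within each smoking group.
gestation_effects = [
    predicted_weight(39, s) - predicted_weight(38, s) for s in (0, 1)
]

print(smoking_effects)    # [-245, -245, -245]
print(gestation_effects)  # [143, 143]
```

The smoking effect is the same at every gestation length and the gestation effect is the same in both groups, which is what the parallel lines show.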

  8. An example where including “interaction terms” is appropriate

  9. Compare three treatments (A, B, C) for severe depression • Random sample of n = 36 severely depressed individuals. • y = measure of treatment effectiveness • x1 = age (in years) • x2 = 1 if patient received A and 0, if not • x3 = 1 if patient received B and 0, if not

  10. Compare three treatments (A, B, C) for severe depression

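  11. A model with interaction terms: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_{12} x_{i1} x_{i2} + \beta_{13} x_{i1} x_{i3} + \epsilon_i$ where: • $y_i$ is treatment effectiveness for patient i • $x_{i1}$ is age of patient i • $x_{i2} = 1$, if treatment A and $x_{i2} = 0$, if not • $x_{i3} = 1$, if treatment B and $x_{i3} = 0$, if not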

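  12. In what way do the predictors have an “interaction effect” on the response? With $\beta_{12}$ and $\beta_{13}$ the coefficients of the products $x_{i1} x_{i2}$ and $x_{i1} x_{i3}$: • If patient received A ($x_{i2} = 1$, $x_{i3} = 0$): $\mu_Y = (\beta_0 + \beta_2) + (\beta_1 + \beta_{12}) x_{i1}$ • If patient received B ($x_{i2} = 0$, $x_{i3} = 1$): $\mu_Y = (\beta_0 + \beta_3) + (\beta_1 + \beta_{13}) x_{i1}$ • If patient received C ($x_{i2} = 0$, $x_{i3} = 0$): $\mu_Y = \beta_0 + \beta_1 x_{i1}$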

  13. In what way do the predictors have an “interaction effect” on the response? • The effect of treatment on the treatment’s effectiveness depends on the individual’s age. (Exhibited by non-parallel lines.) • The effect of the individual’s age on the treatment’s effectiveness depends on the treatment. (Exhibited by non-parallel lines.)

  14. What does it mean for two predictors “to interact”? • In general, two predictors interact if the effect on the response variable of one predictor depends on the value of the other. • A slope parameter can no longer be interpreted as the change in the mean response for each unit increase in the predictor, while the other predictors are held constant.

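  15. What are “interaction effects”? A regression model contains interaction effects if the response function cannot be written as a sum of functions of the predictor variables. For example, the depression treatment model contains the products $x_1 x_2$ and $x_1 x_3$: $\mu_Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_{12} x_1 x_2 + \beta_{13} x_1 x_3$.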

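  16. The estimated regression function. The regression equation is y = 6.21 + 1.03 age + 41.3 x2 + 22.7 x3 - 0.703 agex2 - 0.510 agex3 • If patient received A ($x_{i2} = 1$, $x_{i3} = 0$): y = (6.21 + 41.3) + (1.03 - 0.703) age = 47.51 + 0.327 age • If patient received B ($x_{i2} = 0$, $x_{i3} = 1$): y = (6.21 + 22.7) + (1.03 - 0.510) age = 28.91 + 0.520 age • If patient received C ($x_{i2} = 0$, $x_{i3} = 0$): y = 6.21 + 1.03 age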
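
Setting the two indicator variables in the estimated equation gives one fitted line per treatment, and the sketch below does that arithmetic. It uses the rounded coefficients printed on the slide, so the results are approximate; the helper name `line_for` is an assumption introduced here.

```python
# Rounded coefficients from the slide:
#   y = 6.21 + 1.03 age + 41.3 x2 + 22.7 x3 - 0.703 agex2 - 0.510 agex3
B0, B_AGE, B_X2, B_X3, B_AGE_X2, B_AGE_X3 = 6.21, 1.03, 41.3, 22.7, -0.703, -0.510

def line_for(x2, x3):
    """Return (intercept, slope in age) for the given treatment indicators."""
    intercept = B0 + B_X2 * x2 + B_X3 * x3
    slope = B_AGE + B_AGE_X2 * x2 + B_AGE_X3 * x3
    return intercept, slope

# Indicator coding from slide 9: A -> (1, 0), B -> (0, 1), C -> (0, 0).
lines = {"A": line_for(1, 0), "B": line_for(0, 1), "C": line_for(0, 0)}

for treatment, (intercept, slope) in lines.items():
    print(f"Treatment {treatment}: y = {intercept:.2f} + {slope:.3f} * age")
```

The three slopes differ (about 0.327, 0.520 and 1.03), so the fitted lines are not parallel: the estimated effect of age depends on the treatment.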

  17. The estimated regression function

  18. Recall the appropriate regression analysis steps • Model building • Model formulation • Model estimation • Model evaluation • Model use

  19. Residuals versus fits plot

  20. Normal probability plot

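  21. Is there a difference in the mean effectiveness for the three treatments? • If patient received A ($x_{i2} = 1$, $x_{i3} = 0$): $\mu_Y = (\beta_0 + \beta_2) + (\beta_1 + \beta_{12}) x_{i1}$ • If patient received B ($x_{i2} = 0$, $x_{i3} = 1$): $\mu_Y = (\beta_0 + \beta_3) + (\beta_1 + \beta_{13}) x_{i1}$ • If patient received C ($x_{i2} = 0$, $x_{i3} = 0$): $\mu_Y = \beta_0 + \beta_1 x_{i1}$ • The three regression functions are identical if $\beta_2 = \beta_3 = \beta_{12} = \beta_{13} = 0$.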

  22. Test for identical regression functions

      Analysis of Variance
      Source          DF       SS      MS      F      P
      Regression       5  4932.85  986.57  64.04  0.000
      Residual Error  30   462.15   15.40
      Total           35  5395.00

      Source  DF   Seq SS
      age      1  3424.43
      x2       1   803.80
      x3       1     1.19
      agex2    1   375.00
      agex3    1   328.42

      F distribution with 4 DF in numerator and 30 DF in denominator
            x  P( X <= x )
      24.4900       1.0000
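
The value x = 24.49 in the output above can be reproduced from the sequential sums of squares. The sketch below assumes the usual general linear F statistic (extra regression sum of squares per degree of freedom, divided by the full-model MSE); the function name `partial_f` is an assumption, and the terms tested must be the ones entered last in the model, as they are here.

```python
def partial_f(extra_seq_ss, mse_full):
    """General linear F statistic for a set of terms entered last.

    extra_seq_ss: sequential SS of the terms being tested
    mse_full:     mean squared error of the full model
    """
    numerator_df = len(extra_seq_ss)
    return (sum(extra_seq_ss) / numerator_df) / mse_full

# Seq SS for x2, x3, agex2, agex3 (entered after age), and MSE = 15.40.
f_identical = partial_f([803.80, 1.19, 375.00, 328.42], 15.40)

print(round(f_identical, 2))  # 24.49
```

With 4 and 30 degrees of freedom, the output reports P( X <= 24.49 ) = 1.0000, so the P-value is essentially 0.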

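  23. Does the effect of age on the treatment’s effectiveness depend on treatment? • If patient received A ($x_{i2} = 1$, $x_{i3} = 0$): $\mu_Y = (\beta_0 + \beta_2) + (\beta_1 + \beta_{12}) x_{i1}$ • If patient received B ($x_{i2} = 0$, $x_{i3} = 1$): $\mu_Y = (\beta_0 + \beta_3) + (\beta_1 + \beta_{13}) x_{i1}$ • If patient received C ($x_{i2} = 0$, $x_{i3} = 0$): $\mu_Y = \beta_0 + \beta_1 x_{i1}$ • The effect of age is the same for all three treatments if $\beta_{12} = \beta_{13} = 0$.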

  24. Test for significant interaction

      Analysis of Variance
      Source          DF       SS      MS      F      P
      Regression       5  4932.85  986.57  64.04  0.000
      Residual Error  30   462.15   15.40
      Total           35  5395.00

      Source  DF   Seq SS
      age      1  3424.43
      x2       1   803.80
      x3       1     1.19
      agex2    1   375.00
      agex3    1   328.42

      F distribution with 2 DF in numerator and 30 DF in denominator
            x  P( X <= x )
      22.8400       1.0000
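
As a worked check of x = 22.84, and assuming the usual general linear F statistic, the two interaction terms entered last contribute their sequential sums of squares to the numerator:

```latex
F^* = \frac{\left[\mathrm{SSR}(x_1 x_2 \mid x_1, x_2, x_3)
          + \mathrm{SSR}(x_1 x_3 \mid x_1, x_2, x_3, x_1 x_2)\right] / 2}{\mathrm{MSE}}
    = \frac{(375.00 + 328.42)/2}{15.40}
    = \frac{351.71}{15.40}
    \approx 22.84
```

Here $x_1$ is age, so $x_1 x_2$ and $x_1 x_3$ correspond to agex2 and agex3 in the output.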

  25. Another example A model with one qualitative predictor and two quantitative predictors

  26. Bird breathing habits in burrows? • Experiment with n = 120 nestling bank swallows and n = 120 adult bank swallows • Response (y): % increase in “minute ventilation”, Vent, i.e., total volume of air breathed per minute • Potential predictor (x1): percentage of oxygen, O2, in the air the baby birds breathe • Potential predictor (x2): percentage of carbon dioxide, CO2, in the air the baby birds breathe • Potential predictor (x3): 1 if adult, 0 if baby

  27. Primary research question • Is there any evidence that the adult birds differ from the baby birds in terms of their minute ventilation as a function of oxygen and carbon dioxide?

  28. A formulated model: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i$ where: • $y_i$ is percentage increase in minute ventilation • $x_{i1}$ is percentage of oxygen • $x_{i2}$ is percentage of carbon dioxide • $x_{i3}$ is type of bird (0, if baby and 1, if adult) • and the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.

  29. An aside An example that illustrates the impact of leaving a necessary interaction term out of the model

  30. Suggests x is related to y? Suggests there is a treatment effect?

  31. A formulated model: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$ where: • $y_i$ is the response • $x_{i1}$ is the variable you want to “adjust for” • $x_{i2}$ is treatment (0 or 1) • and the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.

  32. Is x related to y? Is there a treatment effect?

      The regression equation is y = 4.55 - 0.028 x + 1.10 group

      Predictor     Coef  SE Coef      T      P
      Constant    4.5492   0.8665   5.25  0.000
      x          -0.0276   0.1288  -0.21  0.831
      group       1.0959   0.7056   1.55  0.125
      ...

      Analysis of Variance
      Source          DF       SS      MS     F      P
      Regression       2   23.255  11.628  1.23  0.298
      Residual Error  73  690.453   9.458
      Total           75  713.709

      Source  DF  Seq SS
      x*       1   0.435
      group    1  22.820

  33. The estimated regression functions

  34. The residuals versus fits plot

  35. A more appropriately formulated model: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{12} x_{i1} x_{i2} + \epsilon_i$ where: • $y_i$ is the response • $x_{i1}$ is the variable you want to “adjust for” • $x_{i2}$ is treatment (0 or 1) • $x_{i1} x_{i2}$ is the “missing” interaction term • and the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.

  36. The estimated regression function The regression equation is y = 10.1 - 1.04 x - 10.1 group + 2.03 groupx

  37. The residuals versus fits plot

  38. Is x related to y? Is there a treatment effect?

      The regression equation is y = 10.1 - 1.04 x - 10.1 group + 2.03 groupx

      Predictor      Coef  SE Coef       T      P
      Constant    10.1401   0.4320   23.47  0.000
      x          -1.04416  0.07031  -14.85  0.000
      group      -10.0859   0.6110  -16.51  0.000
      groupx      2.03307  0.09944   20.45  0.000

      S = 1.187   R-Sq = 85.8%   R-Sq(adj) = 85.2%

      Analysis of Variance
      Source          DF      SS      MS       F      P
      Regression       3  612.26  204.09  144.84  0.000
      Residual Error  72  101.45    1.41
      Total           75  713.71
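
To see how much the missing interaction term matters, the two analysis of variance tables in this aside (slides 32 and 38) can be compared directly. This is a small sketch using only the sums of squares printed in those tables; the helper names `r_squared` and `residual_sd` are assumptions introduced here.

```python
import math

def r_squared(sse, ssto):
    """Proportion of total variation explained: 1 - SSE / SSTO."""
    return 1 - sse / ssto

def residual_sd(sse, df_error):
    """Residual standard deviation S = sqrt(MSE)."""
    return math.sqrt(sse / df_error)

# Model without the interaction term (slide 32).
r2_without = r_squared(690.453, 713.709)
s_without = residual_sd(690.453, 73)

# Model with the interaction term groupx (slide 38).
r2_with = r_squared(101.45, 713.71)
s_with = residual_sd(101.45, 72)

print(f"without interaction: R-Sq = {r2_without:.1%}, S = {s_without:.3f}")
print(f"with interaction:    R-Sq = {r2_with:.1%}, S = {s_with:.3f}")
```

Adding the single interaction term raises R-Sq from roughly 3% to the 85.8% reported in the output, and S drops to the reported 1.187.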

  39. Back to the bird example

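  40. A more appropriately formulated model: $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_{12} x_{i1} x_{i2} + \beta_{13} x_{i1} x_{i3} + \beta_{23} x_{i2} x_{i3} + \epsilon_i$ where: • $y_i$ is percentage increase in minute ventilation • $x_{i1}$ is percentage of oxygen • $x_{i2}$ is percentage of carbon dioxide • $x_{i3}$ is type of bird (0, if baby and 1, if adult) • and the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.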

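  41. The model yields two response functions, where $\beta_{12}$, $\beta_{13}$ and $\beta_{23}$ are the coefficients of the products $x_{i1} x_{i2}$, $x_{i1} x_{i3}$ and $x_{i2} x_{i3}$: • For baby birds ($x_{i3} = 0$): $\mu_Y = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{12} x_{i1} x_{i2}$ • For adult birds ($x_{i3} = 1$): $\mu_Y = (\beta_0 + \beta_3) + (\beta_1 + \beta_{13}) x_{i1} + (\beta_2 + \beta_{23}) x_{i2} + \beta_{12} x_{i1} x_{i2}$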

  42. Is there a significant interaction between type and O2? between type and CO2? between O2 and CO2?

      The regression equation is Vent = - 18 + 1.19 O2 + 54.3 CO2 + 112 Type - 7.01 TypeO2 + 2.31 TypeCO2 - 1.45 CO2O2

      Predictor    Coef  SE Coef      T      P
      Constant    -18.4    160.0  -0.11  0.909
      O2          1.189    9.854   0.12  0.904
      CO2         54.28    25.99   2.09  0.038
      Type        111.7    157.7   0.71  0.480
      TypeO2     -7.008    9.560  -0.73  0.464
      TypeCO2     2.311    7.126   0.32  0.746
      CO2O2      -1.449    1.593  -0.91  0.364

      S = 165.6   R-Sq = 27.2%   R-Sq(adj) = 25.3%

  43. Is there a significant interaction between type and O2? between type and CO2? between O2 and CO2?

      Analysis of Variance
      Source           DF       SS      MS      F      P
      Regression        6  2387540  397923  14.51  0.000
      Residual Error  233  6388603   27419
      Total           239  8776143

      Source   DF   Seq SS
      O2        1    93651
      CO2       1  2247696
      Type      1     5910
      TypeO2    1    14735
      TypeCO2   1     2884
      CO2O2     1    22664

  44. The residual versus fits plot

  45. Plot for adult birds

  46. Plot for baby birds

  47. Is there any evidence that the adult birds differ from the baby birds?

      The regression equation is Vent = 137 - 8.83 O2 + 32.3 CO2 + 9.9 Type

      Predictor    Coef  SE Coef      T      P
      Constant   136.77    79.33   1.72  0.086
      O2         -8.834    4.765  -1.85  0.065
      CO2        32.258    3.551   9.08  0.000
      Type         9.93    21.31   0.47  0.642

  48. The residuals versus fits plot

  49. The normal probability plot

  50. Cost of including unnecessary terms in the model

      For model with interaction terms:

      Analysis of Variance
      Source           DF       SS      MS      F      P
      Regression        6  2387540  397923  14.51  0.000
      Residual Error  233  6388603   27419
      Total           239  8776143

      For model with no interaction terms:

      Analysis of Variance
      Source           DF       SS      MS      F      P
      Regression        3  2347257  782419  28.72  0.000
      Residual Error  236  6428886   27241
      Total           239  8776143
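
The two tables above can be compared numerically. The sketch below is not taken from the slides: it assumes the standard general linear F test for the three interaction terms, together with the usual adjusted R-squared formula, and the function names are introduced here for illustration.

```python
def partial_f_from_models(sse_reduced, df_reduced, sse_full, df_full):
    """General linear F statistic comparing a reduced model to a full model."""
    extra_ss = sse_reduced - sse_full   # SS explained by the extra terms
    extra_df = df_reduced - df_full     # number of extra terms
    return (extra_ss / extra_df) / (sse_full / df_full)

def adjusted_r_squared(sse, df_error, ssto, df_total):
    """Adjusted R-squared: 1 - MSE / (SSTO / df_total)."""
    return 1 - (sse / df_error) / (ssto / df_total)

# Reduced = no interaction terms; full = with the three interaction terms.
f_interactions = partial_f_from_models(6428886, 236, 6388603, 233)

adj_full = adjusted_r_squared(6388603, 233, 8776143, 239)
adj_reduced = adjusted_r_squared(6428886, 236, 8776143, 239)

print(f"F for the 3 interaction terms: {f_interactions:.2f}")
print(f"R-Sq(adj), with interactions:    {adj_full:.1%}")
print(f"R-Sq(adj), without interactions: {adj_reduced:.1%}")
```

The extra sum of squares, 6428886 - 6388603 = 40283, equals the sum of the sequential sums of squares for TypeO2, TypeCO2 and CO2O2 on slide 43. An F statistic below 1 gives no evidence that the interaction terms are needed, and dropping them lowers the MSE from 27419 to 27241, so the adjusted R-squared rises slightly.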
