
Unit 6: The basics of multiple regression


  1. Unit 6: The basics of multiple regression

  2. The S-030 roadmap: Where’s this unit in the big picture?
  Building a solid foundation:
  • Unit 1: Introduction to simple linear regression
  • Unit 2: Correlation and causality
  • Unit 3: Inference for the regression model
  Mastering the subtleties:
  • Unit 4: Regression assumptions: Evaluating their tenability
  • Unit 5: Transformations to achieve linearity
  Adding additional predictors:
  • Unit 6: The basics of multiple regression
  • Unit 7: Statistical control in depth: Correlation and collinearity
  Generalizing to other types of predictors and effects:
  • Unit 8: Categorical predictors I: Dichotomies
  • Unit 9: Categorical predictors II: Polychotomies
  • Unit 10: Interaction and quadratic effects
  Pulling it all together:
  • Unit 11: Regression modeling in practice

  3. In this unit, we’re going to learn about…
  • Various representations of the multiple regression model: an algebraic representation, a three-dimensional graphic representation, and a two-dimensional graphic representation
  • Multiple regression—how it works and helps improve predictions
  • Estimating the parameters of the multiple regression model
  • Holding predictors constant—what does this really mean?
  • Plotting the fitted multiple regression model: deciding how to construct the plot, choosing prototypical values, and learning how to actually construct the plot (and interpret it correctly!)
  • R2 and the Analysis of Variance (ANOVA) in multiple regression
  • Inference in multiple regression: the omnibus F-test and individual t-tests
  • How might we summarize multiple regression results in tables/figures?
  • How do we test our regression assumptions?

  4. US News Peer Ratings of Graduate Schools of Education (GSEs)
  RQs: What predicts Peer Ratings?
  • This unit (Unit 6): doctoral student characteristics
  • Next unit (Unit 7): faculty research productivity
  Learn more about the ratings at USNews.com (education school ratings methodology page).

  5. A first look at the data: Peer Ratings, mean GREs and N Doc Grads
  RQs: What doctoral student characteristics predict variation in the peer ratings of GSEs?
  • Question predictor: Is it quality (GRE scores)?
  • Control predictor: Is it size (N doc grads)?
  Outcome: Peer Rating (n = 87). The UNIVARIATE Procedure, Variable: PeerRat
    Mean 344.8276    Std Deviation 45.13190
    Median 340.0000  Variance 2036.88853
    Mode 300.0000    Range 190.00000
                     Interquartile Range 60.00000
  Ratings of US Graduate schools of education (first 20 of 87 shown):
    ID  School         GRE    DocGrad  PeerRat  USNewsRat
     1  Harvard        6.625    60      450       100
     2  UCLA           5.780    53      410        97
     3  Stanford       6.775    38      470        95
     4  TC             6.045   193      440        92
     5  Vanderbilt     6.605    22      430        88
     6  Northwestern   6.770    10      390        83
     7  Berkeley       6.050    43      440        82
     8  Penn           6.040    61      380        82
     9  Michigan       6.090    38      430        79
    10  Madison        5.800   106      430        79
    11  NYU            5.960   112      360        77
    12  MinneTC        5.750    89      390        73
    13  Oregon         6.115    39      340        71
    14  MichiganState  5.865    52      420        70
    15  Indiana        5.960   110      390        69
    16  UTAustin       5.865   102      400        69
    17  Washington     5.930    37      370        68
    18  Urbana         6.330    50      410        67
    19  USC            5.695   119      360        67
    20  BC             5.845    42      360        66
    ...
  [Stem-and-leaf plot and boxplot of PeerRat omitted; the ratings run from the 280s to 470, with Stanford, HGSE, TC, and Berkeley flagged at the top and St Johns, Cincinnati, and USF at the bottom.]

  6. Examining the predictors: Mean GRE scores and Number of doctoral grads
  Question Predictor: Mean GRE scores. The UNIVARIATE Procedure, Variable: GRE
    Mean 5.578966    Std Deviation 0.42899
    Median 5.505000  Variance 0.18403
    Mode 5.210000    Range 2.03000
                     Interquartile Range 0.57000
  Control Predictor: Number of doctoral graduates. The UNIVARIATE Procedure, Variable: DocGrad
    Mean 45.67816    Std Deviation 33.03293
    Median 38.00000  Variance 1091.17428
    Mode 18.00000    Range 189.00000
                     Interquartile Range 40.00000
  [Stem-and-leaf plots and boxplots of GRE and DocGrad omitted. Schools flagged on the DocGrad plot include TC at the top, HGSE (60), and VA Comm, Cornell, UC Davis, and UC Irvine; schools flagged on the GRE plot include Stanford and Northwestern at the top, St John’s and Illinois State at the bottom, and TC, HGSE, Vanderbilt, Delaware, Georgia, and Auburn.]

  7. Simple linear regression of Peer Ratings on mean GRE scores
  Effect is strong: 43.1% of the variation in ratings is associated with mean GRE scores.
  Effect is large and statistically significant: Schools whose mean GRE scores are 100 points higher have peer ratings that are, on average, 69 points higher (p<0.0001).
  The REG Procedure, Dependent Variable: PeerRat
  Analysis of Variance
    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model              1        75507          75507        64.40   <.0001
    Error             85        99665       1172.53242
    Corrected Total   86       175172
    Root MSE 34.24226   R-Square 0.4310
  Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1       -40.51619           48.15953        -0.84     0.4025
    GRE          1        69.07083            8.60722         8.02     <.0001
  Tentative conclusion: Student body quality has an effect (or at least it does, not controlling for size).

  8. Simple linear regression of Peer Ratings on program size
  Effect is moderately strong: 21.5% of the variation in ratings is associated with number of doctoral graduates.
  Effect is moderately large and statistically significant: Programs that are twice as large have peer ratings that are an average of 18.9 points higher (p<0.0001).
  The REG Procedure, Dependent Variable: PeerRat
  Analysis of Variance
    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model              1        37702          37702        23.31   <.0001
    Error             85       137470       1617.29328
    Corrected Total   86       175172
    Root MSE 40.21559   R-Square 0.2152
  Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1       247.62760           20.58800        12.03    <.0001
    L2Doc        1        18.90487            3.91546         4.83    <.0001
  [Scatterplot of PeerRat vs. L2Doc omitted; TC is flagged as the extreme school in X.]
  Conclusion: We should control for size when evaluating the effects of GRE scores.
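  Because the L2Doc slope is interpreted per doubling of program size, L2Doc appears to be the base-2 log of the number of doctoral graduates. A minimal SAS sketch of that step and of the two simple regressions above, assuming the dataset one and the variable names used in the appendix (the log2 step itself is an inference, not code from the handout):

  *-----------------------------------------------------------------*
   Hypothetical sketch: create L2Doc and fit the two simple models
  *-----------------------------------------------------------------*;
  data one;
    set one;
    L2Doc = log2(DocGrad);   * one-unit difference in L2Doc = a doubling of size;
  run;

  proc reg data=one;
    model PeerRat = GRE;     * slide 7: ratings on mean GRE scores;
    model PeerRat = L2Doc;   * slide 8: ratings on (log2) program size;
  run;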

  9. From simple regression to multiple regression: Putting it all together
  More generally, let X1, X2, … Xk represent k predictors (the model is written out after this list).
  How does multiple regression help us?
  • Simultaneous consideration of many contributing factors
  • We explain more of the variation in Y
  • More accurate predictions (so the residuals are smaller)
  • Provides a separate understanding of each predictor, controlling for the effects of other predictors in the model (that is, holding all these other predictors constant)
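  A standard way to write the model the slide describes (the notation here is a rendering, not necessarily the slide’s own):

    Y = β0 + β1X1 + β2X2 + … + βkXk + ε

  and, for this unit’s example with two predictors,

    PeerRat = β0 + β1·GRE + β2·L2Doc + ε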

  10. What does the multiple regression model look like graphically?
  A subset of n = 24 schools (for display purposes). On the slide the same schools are listed twice, sorted by GRE (low to high) and sorted by L2Doc (low to high), to highlight schools with the same size but different GRE scores, schools with the same GRE scores but different sizes, and schools with the same GRE scores AND size. The L2Doc ordering, running from small schools through medium-sized schools to large schools, is:
    School      GRE    L2Doc   PeerRat
    Cornell     5.62   2.00    350
    Vcomm       5.39   2.00    300
    UCIrvine    5.20   3.00    310
    Boulder     5.91   3.58    350
    SDState     5.01   4.25    320
    GMU         5.62   4.39    320
    StJohns     4.75   4.39    280
    BU          5.61   4.52    340
    UIChicago   5.33   4.58    350
    Utah        5.22   4.86    320
    IllState    4.79   5.04    290
    Stanford    6.78   5.25    470
    BC          5.85   5.39    360
    SUNYAlb     5.65   5.39    330
    UNM         5.04   5.49    300
    Uconn       5.89   5.70    340
    MichSt      5.87   5.70    420
    Uarizona    5.34   5.88    360
    Harvard     6.62   5.91    450
    Iowa        6.01   6.02    360
    Madison     5.80   6.73    430
    NYU         5.96   6.81    360
    Georgia     5.49   7.14    380
    TC          6.05   7.59    440
  Let’s go 3D!

  11. What does the multiple regression model look like graphically?
  [3-D scatterplot of Peer Ratings against GRE and L2Doc showing the fitted regression plane, with under-predicted observations (blue, above the plane) and over-predicted observations (purple, below the plane).]

  12. Returning to Flatland, Part I: A 3D graph drawn in 2D (using perspective)
  [Perspective drawing of the fitted plane: PeerRat (225 to 475) plotted against GRE (4.5 to 7) and L2Doc (3 to 7). Note that this image has a different orientation than the one on the last slide.]
  • Ratings are higher, on average, in schools with higher GRE scores, and this holds at each level of L2Doc (i.e., holding L2Doc constant)
  • Ratings are higher, on average, in larger schools, and this holds at each level of GRE (i.e., holding GRE constant)

  13. Returning to Flatland, Part II: Projecting the 3D graph back into 2D
  [Perspective drawing of the fitted plane with fitted lines drawn at L2Doc = 3, 4, 5, 6, and 7; axes as on the previous slide.]
  • Each of these lines describes the effect of GRE at a given value of L2Doc—notice that this effect is the same at all levels of L2Doc
  • Notice that these lines are equidistant (or at least they appear to be so in perspective)

  14. Returning to Flatland, Part II: Projecting the 3D graph back into 2D
  [Side view of the 3D plot, PEER on the vertical axis with the fitted lines labeled L2Doc = 3 through 7 (note that this image has a different orientation than the one on the last slide), showing how to move from the fitted plane to a two-dimensional representation of prototypical fitted lines plotted against GRE.]

  15. Multiple regression assumptions (with more than 1 predictor)
  1. At each combination of the X’s there is a distribution of Y. These distributions have a mean µY|X1…Xk and a variance σ²Y|X1…Xk.
  2. The straight line model is correct. The means of each of these distributions, the µ’s, may be joined by a plane.
  3. Homoscedasticity. The variances of each of these distributions, the σ²’s, are identical.
  4. Independence of observations. Conditional on each combination of the X’s, the values of Y are independent of each other (we still can’t see this visually).
  5. Normality. At each combination of the X’s, the values of Y are normally distributed.
  [Figure: a 3-D sketch with axes Y, X1, and X2 illustrating these conditional distributions.]
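  Put compactly, assumptions 1 through 5 amount to the standard statement (written here in conventional notation rather than the slide’s):

    Y | X1, …, Xk ~ N( β0 + β1X1 + … + βkXk , σ² ), independently across observations, with the same σ² at every combination of the X’s.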

  16. Multiple regression results: Regressing Peer Ratings on both L2Doc and GRE
  The REG Procedure, Dependent Variable: PeerRat
  Analysis of Variance
    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model              2        99814          49907        55.63   <.0001
    Error             84        75359        897.12759
    Corrected Total   86       175172
    Root MSE 29.95209          R-Square 0.5698
    Dependent Mean 344.82759   Adj R-Sq 0.5596
    Coeff Var 8.68611
  Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1       -87.29494           43.07364        -2.03     0.0459
    GRE          1        63.31660            7.60956         8.32     <.0001
    L2Doc        1        15.34201            2.94746         5.21     <.0001
  Interpretation of intercept: the value of Y when all X’s = 0. When L2Doc = 0 and GRE = 0, the predicted Peer Rating is -87.29. Here, the intercept is not meaningful.
  Interpretation of slope coefficients: the difference in Y per 1-unit difference in X, holding all other X’s in the model constant (the fitted equation is written out below).
  • Holding L2Doc constant, schools whose doctoral students have mean GRE scores that are 100 points higher have peer ratings that are 63.32 points higher.
  • Holding mean GRE scores constant, schools with twice as many doctoral graduates have peer ratings that are an average of 15.34 points higher.
  Synonyms: “statistically controlling for,” “partialling out,” “holding constant”
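  Written out as a single fitted equation, using the estimates above (as in the appendix code):

    PeerRat-hat = -87.29 + 63.32·GRE + 15.34·L2Doc

  For example, a school with mean GRE = 6.0 and L2Doc = 5 (32 doctoral graduates) has a predicted rating of -87.29 + 63.32(6.0) + 15.34(5) ≈ 369.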

  17. Understanding the fitted multiple regression model algebraically & graphically
  Algebraically: plug different values of L2Doc (3, 4, 5, 6, 7) into the fitted equation.
  Graphically: return to the plot from before, which shows the resulting fitted lines.
  • The controlled effect of GRE can be seen in the common slope (63.32) of these lines.
  • The controlled effect of L2Doc can be seen in the common distance (15.34) between adjacent lines:
      4.75 - (-10.59) = 15.34
    -10.59 - (-25.93) = 15.34
    -25.93 - (-41.27) = 15.34
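  As a worked check (using the rounded coefficients -87.29, 63.32, and 15.34), substituting each value of L2Doc gives the partial regression equations whose intercepts appear above:

    L2Doc = 3: PeerRat-hat = -41.27 + 63.32·GRE
    L2Doc = 4: PeerRat-hat = -25.93 + 63.32·GRE
    L2Doc = 5: PeerRat-hat = -10.59 + 63.32·GRE
    L2Doc = 6: PeerRat-hat =   4.75 + 63.32·GRE
    L2Doc = 7: PeerRat-hat =  20.09 + 63.32·GRE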

  18. Conceptualizing a 2D graph that will display our findings
  In multiple regression, we use the same general approach as in simple regression, but because we have more than 1 predictor, we have to make two decisions:
  • Decide which predictor you’d like to display on the X axis—in multiple regression, you have several predictors, but in a 2D graph, you only have 1 X axis
  • For all the other predictors (note: here we have only 1 other predictor, but usually we have more), identify prototypical values you’d like to use for plotting
  Having made these 2 decisions, we then:
  • Systematically substitute in the prototypical values for those predictor(s), which yields a set of partial regression equations
  • Plot each partial regression equation as before (substitute in any 2 values for the remaining predictor, get the corresponding value of y-hat, plot the points, and connect them)
  So let’s discuss how we make these two decisions.

  19. Decision 1: Use sketches to select the predictor to display on the X axis (note: I mean sketches that don’t need to be drawn perfectly to scale)
  Two general principles when deciding:
  • It’s usually easier to see/talk about a predictor displayed on the X axis (because its effect is seen through the slope)
  • Corollary: Usually put the question predictor on the X axis. You’re typically less interested in control predictors and generally want to focus on question predictors
  With GRE on the X axis: from the multiple regression equation, we know that the fitted lines corresponding to 1-unit differences in L2Doc (Small, Medium, Large) will be 15.34 rating points apart.
  With L2Doc on the X axis: from the multiple regression equation, we know that the fitted lines corresponding to 1-unit differences in GRE (Low, Medium, Hi) will be 63.32 rating points apart (and the slopes for L2Doc will be shallower on this graph).
  We now need to define these prototypical values.

  20. Decision 2: Helpful strategies for selecting prototypical values
  Examine the distribution of the remaining predictors and consider selecting:
  • Substantively interesting values. This is easiest when the predictor has inherently appealing values (e.g., 8, 12, and 16 years of education in the US)
  • A range of percentiles. When there are no well-known values, consider using a range of percentiles (either the 25th, 50th and 75th, or the 10th, 50th, and 90th)
  • The sample mean ± 0.5 (or 1) standard deviation. Best used with predictors with a symmetric distribution
  • The sample mean (on its own). If you don’t want to display a predictor’s effect but just control for it, using only the sample mean will yield a “controlled” fitted regression equation
  Remember that exposition is easier if you select whole number values (if the scale permits) or easily communicated fractions (e.g., ¼, ½, ¾, ⅛).
  The UNIVARIATE Procedure, Variable: L2Doc
    Mean 5.141533    Std Deviation 1.10755
    Median 5.247928  Variance 1.22666
    Quantiles: 100% Max 7.59246, 95% 6.78136, 90% 6.47573, 75% Q3 5.93074, 50% Median 5.24793, 25% Q1 4.39232, 10% 3.70044, 5% 3.32193, 0% Min 2.00000
  [Stem-and-leaf plot and boxplot of L2Doc omitted.]
  In brief: mean = 5.14, sd = 1.11; 10th = 3.7, 25th = 4.4, 50th = 5.2, 75th = 5.9, 90th = 6.5.
  Decision: use L2Doc = 4, 5, and 6, i.e., 2⁴ = 16 doctoral grads (small), 2⁵ = 32 (medium), and 2⁶ = 64 (large).
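  A minimal SAS sketch of the step that would produce this summary, assuming the dataset one from the appendix (this exact call is an assumption; the appendix only shows proc univariate applied to the residuals):

  proc univariate data=one plots;
    var L2Doc GRE;    * distributions used to pick prototypical values;
  run;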

  21. Substitute in the prototypical values to graph the fitted MR equation
  [Plot of the three prototypical fitted lines (Small, Medium, Large) against GRE.]
  Three prototypical lines representing the relationship between Peer Ratings and GRE scores, holding L2Doc constant (again, notice the identical slopes but different intercepts).
  The vertical distance between the parallel lines spaced 1 unit apart in L2Doc is the slope coefficient for L2Doc, 15.34.
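  As a worked illustration of the plotting recipe from slide 18 (my numbers, rounded): to draw the Medium line (L2Doc = 5), plug in two GRE values and connect the points: at GRE = 5.0, PeerRat-hat = -10.59 + 63.32(5.0) ≈ 306; at GRE = 6.5, PeerRat-hat = -10.59 + 63.32(6.5) ≈ 401.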

  22. What would happen if we put L2Doc on the X axis?
  The UNIVARIATE Procedure, Variable: GRE
    Mean 5.578966    Std Deviation 0.428999
    Median 5.505000  Variance 0.184035
    Quantiles: 100% Max 6.775, 95% 6.540, 90% 6.050, 75% Q3 5.845, 50% Median 5.505, 25% Q1 5.275, 10% 5.040, 5% 4.945, 0% Min 4.745
  [Stem-and-leaf plot and boxplot of GRE omitted.]
  Decision: use GRE = 5 (low), 5.5 (~median), and 6 (high).
  [Plot of the three prototypical fitted lines (Lo GRE, Med GRE, Hi GRE) against L2Doc.]
  The vertical distance between the parallel lines spaced 1 unit apart in GRE (Low to High) is the slope coefficient for GRE, 63.32.
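  The corresponding partial regression equations, worked from the fitted model as a check (values rounded):

    Lo GRE (GRE = 5.0):  PeerRat-hat ≈ 229.3 + 15.34·L2Doc
    Med GRE (GRE = 5.5): PeerRat-hat ≈ 261.0 + 15.34·L2Doc
    Hi GRE (GRE = 6.0):  PeerRat-hat ≈ 292.6 + 15.34·L2Doc

  On this graph each line has the shallower common slope of 15.34, and the Lo and Hi lines sit 63.32 rating points apart at every value of L2Doc.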

  23. Understanding how the fitted MR equation provides statistical control
  • Comparison 1: Programs with identical mean GRE scores may have different ratings because of their size. Comparing fitted values on a line drawn perpendicular to the X axis is holding GRE constant (at GRE = 6.0, the Large, Medium, and Small lines give fitted ratings of 385, 369, and 354).
  • Comparison 2: Programs of equal size may have different ratings because of the quality of their student body. Comparing values “on” any fitted line is holding L2Doc constant (along the Large line, the fitted ratings are 321 at GRE = 5.0, 385 at GRE = 6.0, and 448 at GRE = 7.0).
  • Comparison 3: There is more than one way to earn a specific rating (this is an unusual view of the data because it’s reasoning backwards—we don’t really ever want to hold Y constant—but nevertheless it’s worth thinking about).
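  These plotted values are just the fitted equation evaluated at prototypical points; as an arithmetic check (rounded coefficients):

    Large (L2Doc = 6), GRE = 7.0:  -87.29 + 63.32(7.0) + 15.34(6) ≈ 448
    Large (L2Doc = 6), GRE = 6.0:  -87.29 + 63.32(6.0) + 15.34(6) ≈ 385
    Medium (L2Doc = 5), GRE = 6.0: -87.29 + 63.32(6.0) + 15.34(5) ≈ 369
    Small (L2Doc = 4), GRE = 6.0:  -87.29 + 63.32(6.0) + 15.34(4) ≈ 354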

  24. Towards understanding how and why MR can provide a better fit
  [Plots of the prototypical fitted lines with plotting symbols coded by GRE level (Hi, Med, Lo) and by school size (Large, Medium, Small).]
  Why multiple regression helps provide a better fit: If the additional predictor(s) improve(s) the quality of the fit, the observed values of Y (the yi) will be closer to the predicted values of Y (the ŷi) – i.e., the fitted values on the relevant line, the line corresponding to that specific combination of predictor values.
  Note: In actuality, there are fitted lines for every value of L2Doc, not just for large, medium, and small schools.
  Why are these lines parallel? They’re parallel because we assume that they are. This is known as a main effects assumption: we’re assuming the effect of each predictor is the same regardless of the levels of the other predictor. Might this not be a correct assumption?

  25. Might these lines NOT be parallel? Let’s imagine what else they might be
  [Two hypothetical plots, again with symbols coded by GRE level (Hi, Med, Lo) and school size (Large, Medium, Small).]
  • Hmmm…the larger the school, the larger the effect of GRE?
  • Hmmm…the better the student body, the larger the effect of program size?
  What does it mean if the lines aren’t parallel?
  • This says that the effect of one predictor (say the effect of L2Doc) differs by levels of the other predictor (here, GRE)
  • This is called a statistical interaction, and in Unit 10 we’ll learn how to test for it and modify the model if necessary
  Right now, let’s assume that the main effects assumption is correct.

  26. From simple to multiple regression: R2 & the Analysis of Variance (ANOVA)
  Analysis of Variance
    Source               Sum of Squares   df                      Mean Square
    Model (Regression)   SSR              # predictors            MSR = SSR/dfSSR
    Error (Residual)     SSE              (n-1) – # predictors    MSE = SSE/dfSSE
    Total                SST              n-1                     MST = SST/dfSST
  [Figures: reprise of R2 in simple linear regression (Y vs. X) alongside R2 in multiple linear regression.]
  The residual is now the vertical distance between the observation and the fitted regression plane.
  Note that this table and the formula for R2 apply in both simple and multiple regression—it’s only the fitted values of Y that change!
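  For reference, the standard formulas built from this table (written out here since the slide’s formulas were images):

    R2 = SSR / SST = 1 - SSE / SST
    F = MSR / MSE, with (# predictors, n - 1 - # predictors) degrees of freedom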

  27. Comparing fitted values and residuals from simple and multiple regression models
    ID  School        GRE    L2Doc  PeerRat  yhatgre  yhatdoc  yhatmr  residgre  residdoc  residmr
     1  Harvard       6.625  5.91   450      417.1    359.3    422.8     32.9      90.7     27.2
     2  UCLA          5.780  5.73   410      358.7    355.9    366.6     51.3      54.1     43.4
     3  Stanford      6.775  5.25   470      427.4    346.8    422.2     42.6     123.2     47.8
     4  TC            6.045  7.59   440      377.0    391.2    411.9     63.0      48.8     28.1
     5  Vanderbilt    6.605  4.46   430      415.7    331.9    399.3     14.3      98.1     30.7
     6  Northwestern  6.770  3.32   390      427.1    310.4    392.3    -37.1      79.6     -2.3
     7  Berkeley      6.050  5.43   440      377.4    350.2    379.0     62.6      89.8     61.0
     8  Penn          6.040  5.93   380      376.7    359.7    386.1      3.3      20.3     -6.1
     9  Michigan      6.090  5.25   430      380.1    346.8    378.8     49.9      83.2     51.2
    10  Madison       5.800  6.73   430      360.1    374.8    383.2     69.9      55.2     46.8
    ....
    81  Colorado      5.210  3.91   300      319.3    321.5    302.5    -19.3     -21.5     -2.5
    82  UWMilw        5.030  4.25   330      306.9    327.9    296.4     23.1       2.1     33.6
    83  Hofstra       5.910  3.70   290      367.7    317.6    343.7    -77.7     -27.6    -53.7
    84  IllState      4.785  5.04   290      290.0    343.0    293.1      0.0     -53.0     -3.1
    85  IndianaSt     4.955  4.17   290      301.7    326.5    290.4    -11.7     -36.5     -0.4
    86  StJohns       4.745  4.39   280      287.2    330.7    280.5     -7.2     -50.7     -0.5
    87  UVM           5.340  4.17   310      328.3    326.5    314.8    -18.3     -16.5     -4.8
  Sum of Squared Errors: 99,665 for the GRE-only model (residgre), 137,470 for the L2Doc-only model (residdoc), and 75,359 for the multiple regression model (residmr).
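  As a check on how these columns are built, using the fitted equations from slides 7 and 16 (a worked example, not part of the original table):

    Harvard: yhatmr  = -87.29494 + 63.31660(6.625) + 15.34201(5.91) ≈ 422.8, so residmr  = 450 - 422.8 = 27.2
             yhatgre = -40.51619 + 69.07083(6.625) ≈ 417.1,                  so residgre = 450 - 417.1 = 32.9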

  28. Interpreting R2 and the Analysis of Variance in multiple regression
  Analysis of Variance
    Source            DF   Sum of Squares   Mean Square
    Model              2        99814          49907
    Error             84        75359        897.12759
    Corrected Total   86       175172
    Root MSE 29.95209   R-Square 0.5698
  57.0% of the variation in Peer Ratings is associated with L2Doc and GRE.
  R-Square is also the square of the correlation between the observed and fitted Peer Ratings: r = .75, and .75² ≈ .57.
  Root MST = sqrt(SST/dfTotal) = sqrt(175172/86) = 45.13, the standard deviation of the Peer Ratings; compare it with Root MSE = 29.95, the standard deviation of the residuals around the fitted plane.

  29. Statistical inference: Two distinct types of hypotheses we can test
  • Overall/Omnibus F test: Across all my predictors, is there anything going on, or would I do just as well without them?
  • Individual t-tests: Controlling for all other predictors in the model, does each individual predictor, Xj, have an effect?
  With only 1 predictor (that is, in simple linear regression), these two tests are identical. In multiple regression, these two types of tests are decidedly different!

  30. Towards a heuristic understanding of the omnibus F-test: Comparing the regression decomposition when H0 is not true and when it is true
  [Two scatterplots of Y vs. X illustrating the decomposition in each case.]
  Regression decomposition if H0 is not true: regression deviations are large and error deviations are small, so SS Regression is large and SS Error is small, so MSR is large and MSE is small (and the F-ratio, MSR/MSE, is large).
  Regression decomposition if H0 is true: regression deviations are small and error deviations are large, so SS Regression is small and SS Error is large, so MSR is small and MSE is large (and the F-ratio is small).

  31. Conducting omnibus hypothesis tests in multiple regression
  Omnibus F test: Across all my predictors, is there anything going on, or would I do just as well without them?
    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model              2        99814          49907        55.63   <.0001
    Error             84        75359        897.12759
    Corrected Total   86       175172
  Q: Is 55.63 “large enough” to reject H0?
  Critical values of F (α = .05), by df for the numerator (MSR) and df for the denominator (MSE):
    num df \ den df:    25     50    100    inf
        1             4.24   4.03   3.94   3.84
        2             3.39   3.18   3.09   3.00
        3             2.99   2.79   2.70   2.60
        4             2.76   2.56   2.46   2.37
        5             2.60   2.40   2.31   2.21
       10             2.24   2.03   1.93   1.83
       20             2.01   1.78   1.68   1.57
      120             1.77   1.64   1.38   1.22
     1000             1.72   1.56   1.30   1.00
  When the numerator df = 1, F = t² (1.96² = 3.84) – this relationship makes sense so that the omnibus F-test and the single-parameter t-test give identical results.
  Because F(2, 84) = 55.63 (p<0.0001), we reject H0 that all the βj’s = 0 and conclude that at least one βj ≠ 0.
  Sound statistical practice: When reporting F-tests, be sure to provide not just the p-value but also both the numerator and denominator degrees of freedom.
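  Worked out from the ANOVA table above (a quick check): F = MSR/MSE = 49907/897.13 = 55.63 on 2 and 84 df; the α = .05 critical value for 2 numerator df and 84 denominator df lies between 3.18 and 3.09 (roughly 3.1), so 55.63 is far beyond it.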

  32. Conducting individual t-tests in multiple regression
  Individual t-tests: Controlling for all other predictors in the model, does each individual predictor, Xj, have an effect?
  Individual t-tests in multiple regression are analogous to those in single-variable regression; the key difference comes in our interpretation of the results.
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1       -87.29494           43.07364        -2.03     0.0459
    GRE          1        63.31660            7.60956         8.32     <.0001
    L2Doc        1        15.34201            2.94746         5.21     <.0001
  Statistically controlling for L2Doc, there is an effect of GRE.
  Statistically controlling for GRE, there is an effect of L2Doc.
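  Each t statistic is the parameter estimate divided by its standard error, compared against a t distribution with 84 (error) df; as a check:

    t(GRE)   = 63.31660 / 7.60956 = 8.32
    t(L2Doc) = 15.34201 / 2.94746 = 5.21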

  33. How might we summarize the results of these analyses?
  [Plot of the fitted prototypical lines for Large, Medium, and Small programs against GRE.]
  Definitions of program size: Small = 16 doctoral grads (2⁴), Medium = 32 doctoral grads (2⁵), Large = 64 doctoral grads (2⁶).

  34. Examining residuals to examine assumptions, as in simple regression
  Plot residuals (either raw or studentized) vs. each X.
  [Stem-and-leaf plot and boxplot of the studentized residuals omitted, along with plots of the studentized residuals vs. GRE and vs. L2Doc. Flagged schools include UC Berkeley, UC Davis, Ohio State, Delaware (very high GREs), USF, U Houston, and Hofstra.]
  Under-predicted: ratings are higher than we expected (e.g., UC Berkeley).
  Over-predicted: ratings are lower than we expected (e.g., Delaware).

  35. We should also plot residuals vs. fitted values
  [Plot of studentized residuals vs. fitted values; UC Berkeley and Delaware (very high GREs) are flagged.]
  • Possible nonlinearity?
  • Might this improve when we add other predictors in Unit 7?
  • Might this improve if we allow the effect of GRE to interact with L2Doc (in Unit 10)?

  36. What’s the big takeaway from this unit?
  • Multiple regression serves several purposes
    • We can more accurately explain the variation in the outcome Y by considering several predictors simultaneously
    • The basic principles of model fitting, data analysis, and inference remain essentially the same
  • Inference in multiple regression focuses both on the overall model and on the role of individual predictors (controlling for other predictors in the model)
    • Omnibus F-tests tell about the model as a whole
    • Individual t-tests provide information about an individual predictor when controlling for the other predictor(s) in the model
  • We have to make wise decisions about how to best present findings
    • Multidimensional graphs would be ideal, but we usually find ourselves displaying our findings in just two dimensions
    • Different plots emphasize different messages—you need to learn how to think about what the prototypical plots will look like and make educated decisions about what plots to display
    • Tables can be helpful in presenting results from several models that include different combinations of predictors

  37. Appendix: Annotated PC-SAS Code for fitting multiple regression models
  Note that the handouts include only annotations for the needed additional code. For the complete program, check program “Unit 6—EdSchools analysis” on the website.
  Proc reg allows you to fit multiple regression models by adding additional predictors to your model statement (following the equal “=” sign). The syntax for the output statement is similar, except that now you also need to ask for the predicted values (the fitted values of Y), to use in residual plots to explore assumption violations.

  *-----------------------------------------------------------------*
   Fitting multiple regression model PEERRAT on L2DOC and GRE
  *-----------------------------------------------------------------*;
  proc reg data=one;
    model PeerRat=L2Doc GRE;
    output out=resdat1 r=residual student=student predicted=yhat;

  *-----------------------------------------------------------------*
   Univariate summary information on studentized residuals from
   multiple regression model PEERRAT on L2DOC GRE
  *-----------------------------------------------------------------*;
  proc univariate data=resdat1 plots;
    var student;
    id school;

  *-----------------------------------------------------------------*
   Plotting studentized residuals vs. each predictor and YHat
  *-----------------------------------------------------------------*;
  proc gplot data=resdat1;
    plot student*(L2Doc GRE yhat);
    symbol value='dot';

  *-----------------------------------------------------------------*
   Computing fitted values and residuals from the 3 models
  *-----------------------------------------------------------------*;
  data one;
    set one;
    yhatgre  = -40.51619 + 69.07083*GRE;
    yhatdoc  = 247.62750 + 18.90487*L2Doc;
    yhatmr   = -87.29494 + 63.31660*GRE + 15.34201*L2Doc;
    residgre = peerrat - yhatgre;
    residdoc = peerrat - yhatdoc;
    residmr  = peerrat - yhatmr;

  Proc univariate can be used as usual (with the plots option) to analyze the new dataset resdat1 and to provide summary statistics for the residuals. To analyze residual assumptions for the multiple regression model, use proc gplot to produce plots of the residuals vs. the predicted value of Y. You can obtain these fitted values and residuals from the separate PROC REGs, but it’s MUCH easier to just write code in a data step, which is what I did.

  38. Glossary terms included in Unit 6 • Statistical control • Interactions • Main effects • Omnibus F-test
