Experimental design and analysis

Experimental design and analysis. Multiple linear regression. © Gerry Quinn & Mick Keough, 1998. Do not copy or distribute without permission of authors.


Presentation Transcript


  1. Experimental design and analysis • Multiple linear regression • © Gerry Quinn & Mick Keough, 1998 • Do not copy or distribute without permission of authors.

  2. Multiple regression • One response (dependent) variable: • Y • More than one predictor (independent) variable: • X1, X2, X3 etc. • Number of predictors = p • Number of observations = n

  3. Example • A sample of 51 mammal species (n = 51) • Response variable: • total sleep time in hrs/day (y) • Predictors: • body weight in kg (x1) • brain weight in g (x2) • maximum life span in years (x3) • gestation time in days (x4)

  4. Regression models • Population model (equation): • yi = β0 + β1x1 + β2x2 + ... + εi • Sample equation: • ŷi = b0 + b1x1 + b2x2 + ...
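
The sample coefficients b0, b1, b2, ... are usually obtained by least squares. A minimal sketch, assuming NumPy is available; the data values below are made up for illustration and are not from the sleep example:

```python
import numpy as np

# Hypothetical data: n = 6 observations, p = 2 predictors (values invented for illustration).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 1.5 + 2.0 * x1 - 0.5 * x2  # noise-free, so the fit should recover 1.5, 2.0 and -0.5

# Design matrix: a column of ones for the intercept b0, then one column per predictor.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares estimates b = (b0, b1, b2) of the sample equation.
b, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ b          # predicted y for each observation
residuals = y - y_hat  # observed y minus predicted y
print(np.round(b, 3))
```

With real data the residuals are not zero, and their sum of squares feeds the tests described on the following slides.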

  5. Example • Regression model: sleep = intercept + β1*bodywt + β2*brainwt + β3*lifespan + β4*gestime

  6. Multiple regression equation • [Figure: plot with axes Log body weight, Log lifespan and Total sleep]

  7. Partial regression coefficients • H0: β1 = 0 • Partial population regression coefficient (slope) for y on x1, holding all other x’s constant, equals zero • Example: • slope of regression of sleep against body weight, holding brain weight, max. life span and gestation time constant, is 0.

  8. Partial regression coefficients • H0: β2 = 0 • Partial population regression coefficient (slope) for y on x2, holding all other x’s constant, equals zero • Example: • slope of regression of sleep against brain weight, holding body weight, max. life span and gestation time constant, is 0.

  9. Testing H0: βi = 0 • Use partial t-tests: • t = bi / SE(bi) • Compare with t-distribution with n - p - 1 df (residual df for a model with p predictors and an intercept) • Separate t-test for each partial regression coefficient in model • Usual logic of t-tests: • reject H0 if P < 0.05
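
The partial t-statistics can be computed directly from the least-squares fit. The sketch below assumes NumPy and uses simulated data; the function name `partial_t_tests` is hypothetical. With p predictors plus an intercept, the residual degrees of freedom are n - p - 1.

```python
import numpy as np

def partial_t_tests(X, y):
    """Return (b, se, t, df) for an OLS fit; X must already include the intercept column."""
    n, k = X.shape                          # k = p + 1 estimated parameters
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    df = n - k                              # residual df = n - p - 1
    ms_resid = resid @ resid / df           # residual mean square
    cov_b = ms_resid * np.linalg.inv(X.T @ X)
    se = np.sqrt(np.diag(cov_b))            # standard error of each coefficient
    return b, se, b / se, df

# Simulated (hypothetical) data: x1 affects y, x2 does not.
rng = np.random.default_rng(0)
n = 30
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

b, se, t, df = partial_t_tests(X, y)
```

A two-tailed P value for each t then comes from the t-distribution with `df` degrees of freedom, for example via `scipy.stats.t.sf` if SciPy is installed.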

  10. Model comparison • To test H0: β1 = 0 • Fit full model: • y = β0 + β1x1 + β2x2 + β3x3 + … • Fit reduced model: • y = β0 + β2x2 + β3x3 + … • Calculate SSextra: • SSRegression(full) - SSRegression(reduced) • F = MSextra / MSResidual(full)
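
The full-versus-reduced comparison can be sketched as follows, assuming NumPy and simulated data; `fit_ss` and `extra_ss_f` are hypothetical helper names, not functions from any package.

```python
import numpy as np

def fit_ss(X, y):
    """Return (SS Regression, SS Residual) for an OLS fit; X includes the intercept column."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ b
    return np.sum((y_hat - y.mean()) ** 2), np.sum((y - y_hat) ** 2)

def extra_ss_f(X_full, X_reduced, y):
    """F-ratio for the predictors in X_full that are missing from X_reduced."""
    n, k_full = X_full.shape
    df_extra = k_full - X_reduced.shape[1]   # number of predictors dropped
    df_resid = n - k_full                    # residual df of the full model
    ss_reg_full, ss_res_full = fit_ss(X_full, y)
    ss_reg_reduced, _ = fit_ss(X_reduced, y)
    ms_extra = (ss_reg_full - ss_reg_reduced) / df_extra
    ms_resid = ss_res_full / df_resid
    return ms_extra / ms_resid, df_extra, df_resid

# Simulated (hypothetical) data: three predictors, of which only x1 and x3 affect y.
rng = np.random.default_rng(42)
n = 25
x1, x2, x3 = rng.normal(size=(3, n))
y = 0.5 + 1.5 * x1 + 0.8 * x3 + rng.normal(size=n)
X_full = np.column_stack([np.ones(n), x1, x2, x3])
X_reduced = np.delete(X_full, 1, axis=1)     # drop the x1 column to test H0: beta1 = 0

F, df_extra, df_resid = extra_ss_f(X_full, X_reduced, y)
```

When a single predictor is dropped, this F-ratio equals the square of that predictor's partial t-statistic, so the two approaches give the same P value.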

  11. Overall regression model • H0: β1 = β2 = ... = 0 (all population slopes equal zero). • Test of whether overall regression equation is significant. • Use ANOVA F-test: • Variation explained by regression • Unexplained (residual) variation
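
The ANOVA partition behind the overall test can be illustrated with a short sketch, assuming NumPy and simulated data; `anova_table` is a hypothetical helper name. The F-ratio is the regression mean square divided by the residual mean square.

```python
import numpy as np

def anova_table(X, y):
    """ANOVA partition for an OLS fit; X includes the intercept column."""
    n, k = X.shape
    p = k - 1                                    # number of predictors
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ b
    ss_reg = np.sum((y_hat - y.mean()) ** 2)     # variation explained by regression
    ss_res = np.sum((y - y_hat) ** 2)            # unexplained (residual) variation
    ms_reg = ss_reg / p
    ms_res = ss_res / (n - p - 1)
    return {"ss_reg": ss_reg, "ss_res": ss_res,
            "df_reg": p, "df_res": n - p - 1, "F": ms_reg / ms_res}

# Simulated (hypothetical) data with two predictors that both affect y.
rng = np.random.default_rng(1)
n = 40
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 3.0 + 1.0 * x1 - 2.0 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

table = anova_table(X, y)
```

The two sums of squares add up to the total sum of squares of y about its mean, which is what makes the partition an analysis of variance.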

  12. Regression diagnostics • Residual is still observed y - predicted y • Studentised residuals still work • Other diagnostics still apply: • residual plots • Cook’s D statistics
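
As an illustration of these diagnostics, the sketch below computes leverage, internally studentised residuals and Cook's D from the hat matrix. It assumes NumPy; the data are invented, with the last observation deliberately placed far from the others, and `regression_diagnostics` is a hypothetical helper name.

```python
import numpy as np

def regression_diagnostics(X, y):
    """Leverage, studentised residuals and Cook's D; X includes the intercept column."""
    n, k = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T               # hat matrix
    h = np.diag(H)                                     # leverage of each observation
    resid = y - H @ y                                  # observed y - predicted y
    ms_resid = resid @ resid / (n - k)
    student = resid / np.sqrt(ms_resid * (1.0 - h))    # internally studentised residuals
    cooks_d = student ** 2 * h / (k * (1.0 - h))       # Cook's D
    return h, student, cooks_d

# Invented data: ten well-behaved points plus one unusual observation at the end.
x1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30], dtype=float)
x2 = np.array([2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 6], dtype=float)
y = 2.0 + x1
y[-1] = 0.0   # last observation: extreme x1 and far from the pattern of the others
X = np.column_stack([np.ones_like(x1), x1, x2])

leverage, student, cooks_d = regression_diagnostics(X, y)
```

In this example the last observation has the largest Cook's D, flagging it as the most influential point.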

  13. Assumptions • Normality and homogeneity of variance for response variable • Independence of observations • Linearity • No collinearity

  14. Collinearity • Collinearity: • predictors correlated • Assumption of no collinearity: • predictor variables are uncorrelated with (i.e. independent of) each other • Collinearity makes estimates of βi’s and their significance tests unreliable: • low power for individual tests on βi’s

  15. Collinearity • Response (y) and 2 predictors (x1 and x2); n = 20
      1. x1 and x2 uncorrelated (r = -0.24)

                  coeff   se     tol    t       P
      intercept   -0.17   1.03          -0.16   0.873
      x1           1.13   0.14   0.95    7.86   <0.001
      x2           0.12   0.14   0.95    0.86   0.404

      R² = 0.787, F = 31.38, P < 0.001

  16. Collinearity
      2. rearrange x2 so x1 and x2 highly correlated (r = 0.99)

                  coeff   se     tol    t       P
      intercept    0.49   0.72           0.69   0.503
      x1           1.55   1.21   0.01    1.28   0.219
      x2          -0.45   1.21   0.01   -0.37   0.714

      R² = 0.780, F = 30.05, P < 0.001

  17. Checks for collinearity • Correlation matrix between predictors • Tolerance for each predictor: • 1 - R² for regression of that predictor on all others • if tolerance is low (<0.1) then collinearity is a problem • Variance inflation factor (VIF) for each predictor: • 1/tolerance • if VIF > 10 then collinearity is a problem
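
Tolerance and VIF can be computed by regressing each predictor on all the others. A minimal sketch, assuming NumPy and simulated predictors; `tolerance_and_vif` is a hypothetical helper name.

```python
import numpy as np

def tolerance_and_vif(predictors):
    """predictors: (n, p) array without an intercept column. Returns (tolerance, vif)."""
    n, p = predictors.shape
    tol = np.empty(p)
    for j in range(p):
        target = predictors[:, j]
        others = np.delete(predictors, j, axis=1)
        X = np.column_stack([np.ones(n), others])     # regress predictor j on all others
        b, *_ = np.linalg.lstsq(X, target, rcond=None)
        ss_res = np.sum((target - X @ b) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        tol[j] = ss_res / ss_tot                      # equals 1 - R^2
    return tol, 1.0 / tol

# Simulated (hypothetical) predictors: x2 is almost a copy of x1, x3 is unrelated.
rng = np.random.default_rng(7)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)

tol, vif = tolerance_and_vif(np.column_stack([x1, x2, x3]))
```

Here x1 and x2 fall well below the 0.1 tolerance threshold, while x3 does not. With only two predictors, tolerance reduces to 1 - r², where r is their correlation.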

  18. Explained variance • R² = SS Regression / SS Total • proportion of variation in y explained by linear relationship with x1, x2 etc.
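
Because SS Total = SS Regression + SS Residual, R² and the overall F-ratio carry the same information: F = (R²/p) / ((1 - R²)/(n - p - 1)). As a quick arithmetic check, the sketch below applies this to the two collinearity examples (n = 20, p = 2); `f_from_r2` is a hypothetical helper name, and small differences from the reported F values are expected because the reported R² values are rounded.

```python
def f_from_r2(r2, n, p):
    """Overall F-ratio implied by R^2 for n observations and p predictors."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))

# R^2 values reported on the two collinearity slides (n = 20, p = 2).
f_uncorrelated = f_from_r2(0.787, n=20, p=2)   # slide reports F = 31.38
f_correlated = f_from_r2(0.780, n=20, p=2)     # slide reports F = 30.05
print(round(f_uncorrelated, 2), round(f_correlated, 2))
```

Both results land within rounding error of the tabulated values (about 31.4 and 30.1).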

  19. Example

                         Sleep   Bodywt     Brainwt   Lifespan   Gestime
      African elephant    3.3    6654.000   5712.0    38.6       645
      Arctic fox         12.5       3.385     44.5    14.0        60
      etc.

  20. Boxplots of variables

  21. Predictors log transformed

      Parameter   Estimate   SE     Tol    t       P
      Intercept   18.94      3.11           6.09   <0.001
      Bodywt      -0.76      1.31   0.08   -0.58   0.565
      Brainwt     -0.84      2.03   0.05   -0.42   0.680
      Lifespan     2.60      2.05   0.33    1.27   0.211
      Gestime     -5.11      1.81   0.36   -2.82   0.007

      R² = 0.486 • Collinearity problem for body weight and brain weight • low tolerance • highly correlated

  22. Omit brain weight because body weight and brain weight are so highly correlated.

      Parameter   Estimate   SE     Tol    t       P
      Intercept   19.06      3.07           6.21   <0.001
      Bodywt      -1.25      0.59   0.36   -2.09   0.042
      Lifespan     2.19      1.78   0.43    1.23   0.225
      Gestime     -5.39      1.67   0.42   -3.23   0.002

      R² = 0.484 • No collinearity between any predictors: • all tolerances OK • reduced SE and larger slope for body weight

  23. Examples from literature

  24. Lampert (1993) • Ecology 74:1455-1466 • Response variable: • Daphnia (water flea) clutch size • Predictors: • body size (mm) • particulate organic carbon (mg/L) • temperature (°C)

  25. Lampert (1993)

      Parameter   Coeff.   SE      t       P
      Intercept   -42.34   27.52   -1.54   0.168
      Body size    14.76    7.10    2.08   0.076
      POC           0.27    0.43    0.61   0.559
      Temp          0.73    0.68    1.07   0.321

      ANOVA P = 0.052, R² = 0.684, n = 11

  26. Williams et al. (1993) • Ecology 74:904-918 • Response variable: • Zostera (seagrass) growth • Predictors: • epiphyte biomass • porewater ammonium

  27. Williams et al. (1993)

      Parameter            Coeff.   P
      Epiphyte biomass     0.340    >0.05
      Porewater ammonium   0.919    <0.05

      R² = 0.71 • Tolerance = 0.839 (so no collinearity)
