# Multiple Linear Regression

Multiple Linear Regression. Concept and uses. Model and assumptions. Intrinsically linear models. Model development and validation. Problem areas. Non-normality. Heterogeneous variance. Correlated errors. Influential points and outliers. Model inadequacies. Collinearity.

## PowerPoint Slideshow about 'Multiple Linear Regression' - robert-larson

Presentation Transcript

• Concept and uses.

• Model and assumptions.

• Intrinsically linear models.

• Model development and validation.

• Problem areas.

• Non-normality.

• Heterogeneous variance.

• Correlated errors.

• Influential points and outliers.

• Collinearity.

• Errors in X variables.

AGR206

Did you know?

ANOVA

REGRESSION

• Description restricted to data set. Did biomass increase with pH in the sample?

• Prediction of Y. How much biomass do we expect to find under given soil conditions?

• Extrapolation to new conditions. Can we predict biomass in other estuaries?

• Estimation and understanding. How much does biomass change per unit change in pH, controlling for other factors?

• Control of process: requires causality. Can we create sites with certain biomass by changing the pH?

• Three variables (X1, X2, X3) were measured to predict body fat % (Y) in people.

• Random sample of people.

• Y was measured by an expensive and very accurate method (assume it reveals true %fat).

• X1: thickness of triceps skinfold

• X2: thigh circumference

• X3: midarm circumference.

• Bodyfat.jmp

• Does thickness of triceps skinfold contribute significantly to predicting fat content?

• What is the CI for fat content for a person whose X’s have been measured?

• Do I have more or less fat than last summer?

• Do I have more fat than recommended?

• Linear, additive model to relate Y to p independent variables.

• Note: here, p is number of variables, but some authors use p for number of parameters, which is one more than variables due to the intercept.

• $Y_i = b_0 + b_1 X_{i1} + \dots + b_p X_{ip} + e_i$

• where the $e_i$ are normal and independent random variables with common variance $\sigma^2$.

• In matrix notation the model and solution are exactly the same as for SLR: $Y = Xb + e$, with least-squares solution $\hat{b} = (X'X)^{-1}(X'Y)$.

• All equations from SLR apply without change.
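The matrix solution above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the helper names `solve` and `ols` and the toy data are hypothetical and do not come from the slides, and the code assumes X'X is nonsingular. The response is built exactly as Y = 1 + 2 X1 + 3 X2 with no error term, so the normal equations should recover those coefficients.

```python
def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting.

    Assumes A is square and nonsingular (here A = X'X).
    """
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols(rows, y):
    """Return [b0, b1, ..., bp] by solving the normal equations (X'X) b = X'Y."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    XtX = [[sum(x[a] * x[b] for x in X) for b in range(k)] for a in range(k)]
    XtY = [sum(x[a] * yi for x, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, XtY)


# Hypothetical toy data: two predictors, noise-free response.
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

b_hat = ols(rows, y)
print([round(b, 6) for b in b_hat])
```

The same `ols` call works for any number of predictors, which is the sense in which the SLR matrix equations carry over unchanged.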

• Linear, and intrinsically linear models.

• Linearity refers to the parameters. The model can involve any functions of the X's, as long as those functions contain no parameters that have to be estimated.

• A linear model does not always produce a hyperplane.

• $Y_i = b_0 + b_1 f_1(X_{i1}) + \dots + b_p f_p(X_{ip}) + e_i$

• Polynomial regression is a special case where the functions are powers of X.
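To make the distinction concrete, here is a short worked illustration. The specific models are examples chosen for this note, not taken from the slides. The first is a second-order polynomial, which is linear in the parameters. The second has a multiplicative error and becomes linear after a log transform, which is the usual meaning of "intrinsically linear". The third cannot be made linear by transformation, because the parameter b_1 sits inside the function and the error is additive.

```latex
\begin{align*}
\text{Polynomial (linear in } b\text{):} \quad
  & Y_i = b_0 + b_1 X_i + b_2 X_i^2 + e_i \\
\text{Intrinsically linear:} \quad
  & Y_i = b_0 X_i^{b_1} e_i
    \;\Longrightarrow\;
    \ln Y_i = \ln b_0 + b_1 \ln X_i + \ln e_i \\
\text{Not linear in } b\text{:} \quad
  & Y_i = b_0 + \exp(b_1 X_i) + e_i
\end{align*}
```

In the second case the usual assumptions would apply to ln e_i rather than to e_i itself.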

• Effects of order of entry on SS.

• The 4 types of SS.

• Partial correlation.

[Figure: 3D plot with axes Y, X1 and X2, labeling an observation Yi and its expected value E{Yi}.]

The response surface in more than 3D is a hyperplane.

• What variables to include.

• Depends on objective:

• Description: no need to reduce the number of variables.

• Prediction and estimation of $\hat{Y}$: OK to reduce the number of variables to make the model cheaper to use.

• Estimation of b and understanding: sensitive to deletions, which may bias MSE and b. There is no real solution other than getting more data from a better experiment. (Sorry!)

• Effects of elimination of variables:

• MSE is positively biased unless true b for variables eliminated is 0.

• $\hat{b}$ and $\hat{Y}$ are biased unless the previous condition holds or the eliminated variables are orthogonal to those retained.

• Variance of estimated parameters and predictions is usually lower.

• There are conditions under which the mean squared error of the reduced model (variance plus bias$^2$) is smaller.
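The bias conditions listed above follow from a short derivation. As a sketch, partition the full model as Y = X_1 b_1 + X_2 b_2 + e, where X_1 holds the retained variables and X_2 the eliminated ones. This partition notation is an assumption introduced here for illustration and does not appear in the slides.

```latex
\begin{align*}
\hat{b}_1 &= (X_1'X_1)^{-1} X_1'Y \\
E\{\hat{b}_1\} &= (X_1'X_1)^{-1} X_1'(X_1 b_1 + X_2 b_2) \\
               &= b_1 + (X_1'X_1)^{-1} X_1'X_2 \, b_2
\end{align*}
```

The second term is the bias. It vanishes when b_2 = 0 (the eliminated variables have no true effect) or when X_1'X_2 = 0 (the eliminated variables are orthogonal to those retained).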

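• $R^2$ - Coefficient of determination.

• $R^2 = SSReg / SSTotal$

• MSE or MSRes - Mean squared residuals.

• If all relevant X's are in the model, it estimates $\sigma^2$.

• $R^2_{adj} = 1 - MSE/MSTotal = 1 - \frac{n-1}{n-p} \cdot \frac{SSRes}{SSTotal}$ (here p = number of parameters, including the intercept)

• Mallows' $C_p$

• $C_p = SSRes / MSE_{Full} + 2p - n$ (p = number of parameters)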
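As a quick check of these formulas, the sketch below plugs in the sums of squares from the SAS analysis-of-variance output reproduced later in this transcript (n = 45 observations, p = 16 parameters). The function names are hypothetical helpers written for this note. For the full model, Cp should equal p, because SSRes / MSE_Full is then n - p.

```python
def r_squared(ss_reg, ss_total):
    """Coefficient of determination: SSReg / SSTotal."""
    return ss_reg / ss_total


def adjusted_r_squared(ss_res, ss_total, n, p):
    """Adjusted R^2, with p = number of parameters (intercept included)."""
    return 1.0 - ((n - 1) / (n - p)) * (ss_res / ss_total)


def mallows_cp(ss_res, mse_full, n, p):
    """Mallows' Cp for a model with p parameters, using the full-model MSE."""
    return ss_res / mse_full + 2 * p - n


# Values copied from the SAS output for the full spartina model.
n, p = 45, 16              # C Total DF = 44, Model DF = 15 plus intercept
ss_reg = 16369583.2
ss_res = 2801379.9
ss_total = 19170963.2
mse_full = 96599.307

r2 = r_squared(ss_reg, ss_total)
r2_adj = adjusted_r_squared(ss_res, ss_total, n, p)
cp_full = mallows_cp(ss_res, mse_full, n, p)

print(round(r2, 4), round(r2_adj, 4), round(cp_full, 2))  # 0.8539 0.7783 16.0
```

The first two values match the R-square and Adj R-sq lines reported by SAS.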

• Note that although we have many X’s, errors are still in a single dimension.

• Residual analysis is performed as for SLR, sometimes repeated over different X’s.

• Normality. Use the normal option of proc univariate. Transform.

• Homogeneity of variance. Plot errors vs. each X. Transform. Weighted least squares.

• Independence of errors.

• Adequacy of model. Plot errors. Lack-of-fit (LOF) test.

• Influence and outliers. Use influence option in proc reg.

• Collinearity. Use collinoint option of proc reg.

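    /* Add a deliberately collinear predictor: a linear combination of
       ph, acid and sal plus standard normal noise (seed 23). */
    data s00.spart2;
      set s00.spartina;
      colin = 2*ph + 0.5*acid + sal + rannor(23);
    run;

    /* Full model with residual (r) and influence diagnostics, variance
       inflation factors (vif), collinearity diagnostics with the intercept
       adjusted out (collinoint), standardized estimates (stb) and partial
       regression plots (partial). */
    proc reg data=s00.spart2;
      model bmss = colin h2s sal eh7 ph acid
                   p k ca mg na mn zn cu nh4 /
                   r influence vif collinoint stb partial;
    run;
      /* Regress colin on its components to confirm the built-in collinearity. */
      model colin = ph sal acid;
    run;
    quit;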

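Model: MODEL1

Dependent Variable: BMSS

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Prob>F |
|---|---|---|---|---|---|
| Model | 15 | 16369583.2 | 1091305.552 | 11.297 | 0.0001 |
| Error | 29 | 2801379.9 | 96599.307 | | |
| C Total | 44 | 19170963.2 | | | |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Root MSE | 310.80429 | R-square | 0.8539 |
| Dep Mean | 1000.80000 | Adj R-sq | 0.7783 |
| C.V. | 31.05558 | | |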

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | T for H0: Parameter=0 | Prob > \|T\| | Standardized Estimate | Variance Inflation |
|---|---|---|---|---|---|---|---|
| INTERCEP | 1 | 3809.233562 | 3038.081 | 1.254 | 0.2199 | 0.00000000 | 0.00000000 |
| COLIN | 1 | -178.317065 | 58.718 | -3.037 | 0.0050 | -1.06227792 | 24.28364757 |
| H2S | 1 | 0.336242 | 2.656 | 0.127 | 0.9001 | 0.01563626 | 3.02785626 |
| SAL | 1 | 150.513276 | 61.960 | 2.429 | 0.0216 | 0.84818417 | 24.19556405 |
| EH7 | 1 | 2.288694 | 1.785 | 1.282 | 0.2099 | 0.12813770 | 1.98216733 |
| PH | 1 | 486.417077 | 306.756 | 1.586 | 0.1237 | 0.91891994 | 66.64921013 |
| ACID | 1 | -24.816449 | 109.856 | -0.226 | 0.8229 | -0.09422943 | 34.53131689 |
| P | 1 | 0.153015 | 2.417 | 0.063 | 0.9500 | 0.00639498 | 2.02507775 |
| K | 1 | -0.733250 | 0.439 | -1.668 | 0.1061 | -0.33059243 | 7.79660017 |
| CA | 1 | -0.137163 | 0.111 | -1.230 | 0.2286 | -0.35706572 | 16.72702792 |
| MG | 1 | -0.318586 | 0.243 | -1.308 | 0.2010 | -0.45340287 | 23.82835726 |
| NA | 1 | -0.005294 | 0.022 | -0.239 | 0.8127 | -0.05520175 | 10.57323219 |
| MN | 1 | -4.279887 | 4.836 | -0.885 | 0.3835 | -0.15872971 | 6.38589662 |
| ZN | 1 | -26.270852 | 19.452 | -1.351 | 0.1873 | -0.32953283 | 11.81574077 |
| CU | 1 | 346.606818 | 99.295 | 3.491 | 0.0016 | 0.54452366 | 4.82931410 |
| NH4 | 1 | 0.539373 | 3.061 | 0.176 | 0.8614 | 0.03862822 | 9.53842459 |
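One way to read the variance inflation column is through the identity VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-square from regressing X_j on all the other predictors. The sketch below converts the reported VIFs back to R_j^2 and lists the predictors above a cutoff of 10. That cutoff is a commonly cited rule of thumb assumed here for illustration; it is not stated in the slides, and the helper name `r2_from_vif` is hypothetical.

```python
# Variance inflation factors copied from the SAS output (intercept omitted).
vif = {
    "COLIN": 24.28364757, "H2S": 3.02785626, "SAL": 24.19556405,
    "EH7": 1.98216733, "PH": 66.64921013, "ACID": 34.53131689,
    "P": 2.02507775, "K": 7.79660017, "CA": 16.72702792,
    "MG": 23.82835726, "NA": 10.57323219, "MN": 6.38589662,
    "ZN": 11.81574077, "CU": 4.82931410, "NH4": 9.53842459,
}

THRESHOLD = 10.0  # assumed rule-of-thumb cutoff, not taken from the slides


def r2_from_vif(v):
    """Invert VIF = 1 / (1 - R^2) to recover R^2 of X_j on the other X's."""
    return 1.0 - 1.0 / v


# Predictors above the cutoff, most inflated first.
flagged = sorted(
    (name for name, v in vif.items() if v > THRESHOLD),
    key=lambda name: -vif[name],
)

for name in flagged:
    print(f"{name:6s} VIF={vif[name]:7.2f}  R2 on other X's={r2_from_vif(vif[name]):.3f}")
```

For example, the VIF of about 66.6 for PH corresponds to roughly 98.5% of its variation being explained by the other predictors, which is consistent with COLIN having been constructed from ph, acid and sal.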
