Multiple linear regression

Multiple Linear Regression. - PowerPoint PPT Presentation


PowerPoint Slideshow about ' Multiple Linear Regression.' - robert-larson


Multiple Linear Regression.

  • Concept and uses.

  • Model and assumptions.

  • Intrinsically linear models.

  • Model development and validation.

  • Problem areas.

    • Non-normality.

    • Heterogeneous variance.

    • Correlated errors.

    • Influential points and outliers.

    • Model inadequacies.

    • Collinearity.

    • Errors in X variables.

AGR206


Concept & Uses.

Did you know?

ANOVA

REGRESSION

  • Description restricted to the data set. Did biomass increase with pH in the sample?

  • Prediction of Y. How much biomass do we expect to find under given soil conditions?

  • Extrapolation to new conditions. Can we predict biomass in other estuaries?

  • Estimation and understanding. How much does biomass change per unit change in pH, controlling for other factors?

  • Control of a process: requires causality. Can we create sites with a certain biomass by changing the pH?

Body fat example in JMP.

  • Three variables (X1, X2, X3) were measured to predict body fat % (Y) in people.

  • Random sample of people.

  • Y was measured by an expensive and very accurate method (assume it reveals true %fat).

  • X1: thickness of triceps skinfold

  • X2: thigh circumference

  • X3: midarm circumference.

  • Bodyfat.jmp

Ho’s or values “of interest”

  • Does thickness of triceps skinfold contribute significantly to predicting fat content?

  • What is the CI for fat content for a person whose X’s have been measured?

  • Do I have more or less fat than last summer?

  • Do I have more fat than recommended?

Model and Assumptions.

  • Linear, additive model to relate Y to p independent variables.

    • Note: here, p is number of variables, but some authors use p for number of parameters, which is one more than variables due to the intercept.

  • $Y_i = b_0 + b_1 X_{i1} + \dots + b_p X_{ip} + e_i$

    • where the $e_i$ are normal and independent random variables with common variance $s^2$.

  • In matrix notation the model and solution are exactly the same as for SLR: $Y = Xb + e$, with least-squares solution $\hat{b} = (X'X)^{-1} X'Y$.

  • All equations from SLR apply without change.
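The matrix solution above can be sketched in a few lines. This is only an illustration, not the course's JMP or SAS workflow: the helper names `solve` and `ols` and the small noise-free data set are made up for the example, and the normal equations (X'X) b = X'Y are solved by plain Gauss-Jordan elimination using the Python standard library.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, rhs):
    """Solve a @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, a[i])) + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        # Bring the row with the largest pivot into position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(x_rows, y):
    """Least-squares estimates from (X'X) b = X'Y; a column of 1s is added for b0."""
    X = [[1.0] + list(row) for row in x_rows]
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    XtY = [sum(xv * yv for xv, yv in zip(col, y)) for col in Xt]
    return solve(XtX, XtY)


# Made-up, noise-free data generated from Y = 1 + 2*X1 - 3*X2 (hypothetical example).
x_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in x_rows]

b_hat = ols(x_rows, y)
print([round(b, 6) for b in b_hat])
```

Because the data contain no error term, the estimates should recover the generating coefficients up to floating-point rounding. With real data, a statistical package is preferable to hand-rolled elimination.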

    Linear models

    • Linear, and intrinsically linear models.

      • Linearity refers to the parameters. The model can involve any functions of the X’s, as long as those functions contain no parameters that have to be estimated.

      • A linear model does not always produce a hyperplane.

      • $Y_i = b_0 + b_1 f_1(X_{i1}) + \dots + b_p f_p(X_{ip}) + e_i$

    • Polynomial regression.

      • Is a special case where the functions are powers of X.
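Two standard textbook illustrations, not taken from the slides, may help here. The first is a quadratic polynomial, which is linear in the parameters even though the fitted curve is not a straight line. The second is a power model that becomes linear after taking logarithms, assuming the error enters multiplicatively.

```latex
% Polynomial regression: f_1(X) = X, f_2(X) = X^2
Y_i = b_0 + b_1 X_i + b_2 X_i^2 + e_i

% Power model with multiplicative error, linearized by taking logs
Y_i = b_0 X_i^{b_1} e_i
\quad\Longrightarrow\quad
\ln Y_i = \ln b_0 + b_1 \ln X_i + \ln e_i
```

In the second case the usual assumptions (normality, common variance) would have to hold for ln e, not for e itself.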

    Extra Sum of Squares

    • Effects of order of entry on SS.

    • The 4 types of SS.

    • Partial correlation.
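For two predictors, the ideas listed above can be written out as follows. This is the standard textbook definition, given as a reminder and not as a transcription of the slide. SSR denotes the regression sum of squares and SSE the error sum of squares of the model with the indicated variables.

```latex
% Extra sum of squares for X_2 given that X_1 is already in the model
SSR(X_2 \mid X_1) = SSR(X_1, X_2) - SSR(X_1) = SSE(X_1) - SSE(X_1, X_2)

% Order of entry changes the pieces but not the total
SSR(X_1, X_2) = SSR(X_1) + SSR(X_2 \mid X_1) = SSR(X_2) + SSR(X_1 \mid X_2)

% Coefficient of partial determination (squared partial correlation)
r^2_{Y2 \cdot 1} = \frac{SSR(X_2 \mid X_1)}{SSE(X_1)}
```

When X1 and X2 are correlated, SSR(X_2 | X_1) generally differs from SSR(X_2), which is why the order of entry matters.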

    Response plane and error

    [Figure: response plane with axes Y, X1 and X2; labeled points Yi and E{Yi}.]

    The response surface in more than 3D is a hyperplane.

    Model development

    • What variables to include.

      • Depends on objective:

        • Description: no need to reduce the number of variables.

        • Prediction and estimation of Yhat: OK to reduce the number of variables for economical use.

        • Estimation of b and understanding: sensitive to deletions, which may bias MSE and b. No real solution other than getting more data from a better experiment. (Sorry!)

    Variable Selection

    • Effects of elimination of variables:

      • MSE is positively biased unless the true b for the eliminated variables is 0.

      • bhat and Yhat are biased unless the previous condition holds or the eliminated variables are orthogonal to those retained.

      • Variance of estimated parameters and predictions is usually lower.

      • There are conditions under which the MSE for the reduced model (variance plus bias^2) is smaller.

    Criteria for variable selection

    • R2 - Coefficient of determination.

      • $R^2 = SSReg / SSTotal$

    • MSE or MSRes - Mean squared residuals.

      • If all the X’s are in the model, it estimates $s^2$.

    • R2adj - Adjusted R2.

      • $R^2_{adj} = 1 - MSE/MSTo = 1 - \frac{n-1}{n-p} \cdot \frac{SSE}{SSTo}$

    • Mallows’ Cp

      • $C_p = SSRes / MSE_{Full} + 2p - n$ (in these two formulas, p = number of parameters)
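As a quick arithmetic check, the criteria above can be computed from the sums of squares reported in the Spartina ANOVA output of this presentation (Model 15 df, Error 29 df, C Total 44 df). The sketch below only redoes that arithmetic; the variable names are chosen for the example, and n = 45 and p = 16 are inferred from the degrees of freedom.

```python
# Values copied from the Spartina ANOVA output in this presentation.
ss_model = 16369583.2   # SSReg, 15 df
ss_error = 2801379.9    # SSRes (SSE), 29 df
ss_total = 19170963.2   # SSTo, 44 df
n = 45                  # observations: total df + 1 (inferred)
p = 16                  # parameters: 15 slopes + intercept (inferred)

mse = ss_error / (n - p)                       # mean squared residual
r2 = 1 - ss_error / ss_total                   # coefficient of determination
r2_adj = 1 - (n - 1) / (n - p) * ss_error / ss_total
f_value = (ss_model / (p - 1)) / mse           # overall F test

# For the full model SSRes / MSE_full equals n - p, so Cp reduces to p.
cp_full = ss_error / mse + 2 * p - n

print(round(r2, 4), round(r2_adj, 4), round(f_value, 3), round(cp_full, 6))
```

The rounded R2 and adjusted R2 agree with the R-square (0.8539) and Adj R-sq (0.7783) lines of the SAS output. Cp is only informative for reduced models, where SSRes comes from the subset model while MSE still comes from the full model.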

    Example

    Checking assumptions.

    • Note that although we have many X’s, errors are still in a single dimension.

    • Residual analysis is performed as for SLR, sometimes repeated over different X’s.

      • Normality. Use the NORMAL option of PROC UNIVARIATE. Transform if needed.

      • Homogeneity of variance. Plot residuals vs. each X. Transform, or use weighted least squares.

      • Independence of errors.

      • Adequacy of model. Plot residuals. Lack-of-fit (LOF) test.

    • Influence and outliers. Use the INFLUENCE option in PROC REG.

    • Collinearity. Use the COLLINOINT option of PROC REG.
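For the weighted least squares remedy mentioned under homogeneity of variance, the usual textbook form of the estimator is shown below. It assumes the error variance of each observation, written here as s_i^2, is known at least up to a constant; the slides do not say how the weights would be obtained.

```latex
\hat{b}_w = (X'WX)^{-1} X'WY,
\qquad W = \mathrm{diag}(w_1, \dots, w_n),
\qquad w_i = 1 / s_i^2
```

With all weights equal this reduces to the ordinary least-squares solution.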

    code for PROC REG

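    /* Add COLIN, a near-linear combination of ph, acid and sal, to illustrate collinearity. */
    data s00.spart2;
      set s00.spartina;
      colin = 2*ph + 0.5*acid + sal + rannor(23);  /* rannor(23): standard normal noise, seed 23 */
    run;

    /* Full model with residual, influence and collinearity diagnostics. */
    proc reg data=s00.spart2;
      model bmss = colin h2s sal eh7 ph acid
                   p k ca mg na mn zn cu nh4 /
                   r influence vif collinoint stb partial;
    run;
      /* PROC REG stays active after RUN, so a second model can be fitted. */
      model colin = ph sal acid;
    run;
    quit;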

    Spartina ANOVA output

    Model: MODEL1

    Dependent Variable: BMSS

    Analysis of Variance

    Source     DF   Sum of Squares   Mean Square   F Value   Prob>F
    Model      15       16369583.2   1091305.552    11.297   0.0001
    Error      29        2801379.9     96599.307
    C Total    44       19170963.2

    Root MSE    310.80429    R-square   0.8539
    Dep Mean   1000.80000    Adj R-sq   0.7783
    C.V.         31.05558

    Parameters and VIF

    Parameter Estimates

    Variable  DF    Parameter  Standard    T for H0:  Prob > |T|  Standardized     Variance
                     Estimate     Error  Parameter=0                  Estimate    Inflation
    INTERCEP   1  3809.233562  3038.081        1.254      0.2199    0.00000000   0.00000000
    COLIN      1  -178.317065    58.718       -3.037      0.0050   -1.06227792  24.28364757
    H2S        1     0.336242     2.656        0.127      0.9001    0.01563626   3.02785626
    SAL        1   150.513276    61.960        2.429      0.0216    0.84818417  24.19556405
    EH7        1     2.288694     1.785        1.282      0.2099    0.12813770   1.98216733
    PH         1   486.417077   306.756        1.586      0.1237    0.91891994  66.64921013
    ACID       1   -24.816449   109.856       -0.226      0.8229   -0.09422943  34.53131689
    P          1     0.153015     2.417        0.063      0.9500    0.00639498   2.02507775
    K          1    -0.733250     0.439       -1.668      0.1061   -0.33059243   7.79660017
    CA         1    -0.137163     0.111       -1.230      0.2286   -0.35706572  16.72702792
    MG         1    -0.318586     0.243       -1.308      0.2010   -0.45340287  23.82835726
    NA         1    -0.005294     0.022       -0.239      0.8127   -0.05520175  10.57323219
    MN         1    -4.279887     4.836       -0.885      0.3835   -0.15872971   6.38589662
    ZN         1   -26.270852    19.452       -1.351      0.1873   -0.32953283  11.81574077
    CU         1   346.606818    99.295        3.491      0.0016    0.54452366   4.82931410
    NH4        1     0.539373     3.061        0.176      0.8614    0.03862822   9.53842459
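One way to read the variance inflation column is through the identity VIF_j = 1 / (1 - R2_j), where R2_j is the R2 from regressing X_j on the other predictors. The sketch below inverts that identity for the values in the output above and lists the predictors whose VIF exceeds 10. The cutoff of 10 is a common rule of thumb assumed for this example, not something stated in the slides.

```python
# Variance inflation factors copied from the PROC REG output above.
vif = {
    "COLIN": 24.28364757, "H2S": 3.02785626, "SAL": 24.19556405,
    "EH7": 1.98216733, "PH": 66.64921013, "ACID": 34.53131689,
    "P": 2.02507775, "K": 7.79660017, "CA": 16.72702792,
    "MG": 23.82835726, "NA": 10.57323219, "MN": 6.38589662,
    "ZN": 11.81574077, "CU": 4.82931410, "NH4": 9.53842459,
}

# R2 of each predictor regressed on all the others: R2_j = 1 - 1/VIF_j.
r2_j = {name: 1 - 1 / v for name, v in vif.items()}

# Rule-of-thumb cutoff (an assumption for this sketch).
CUTOFF = 10
flagged = [name for name, v in vif.items() if v > CUTOFF]

for name in flagged:
    print(f"{name:6s} VIF={vif[name]:8.2f}  R2_j={r2_j[name]:.3f}")
```

PH has the largest VIF, which corresponds to roughly 98.5% of its variation being explained by the other predictors. That is consistent with COLIN having been built as a combination of ph, acid and sal plus noise.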
