Regression: (2) Multiple Linear Regression and Path Analysis

Hal Whitehead
BIOL4062/5062
Multiple Linear Regression and Path Analysis
  • Multiple linear regression
    • assumptions
    • parameter estimation
    • hypothesis tests
    • selecting independent variables
    • collinearity
    • polynomial regression
  • Path analysis
Regression

One dependent variable: Y

Independent variables: X1, X2, X3, ...

Purposes of Regression

1. Relationship between Y and X's

2. Quantitative prediction of Y

3. Relationship between Y and X controlling for C

4. Which of the X's are most important?

5. Best mathematical model

6. Compare regression relationships: Y1 on X, Y2 on X

7. Assess interactive effects of X's

  • Simple regression: one X
  • Multiple regression: two or more X's

Y = ß0 + ß1X(1) + ß2X(2) + ß3X(3) + ... + ßkX(k) + E

Multiple linear regression: assumptions (1)
  • For any specific combination of X's, Y is a (univariate) random variable with a certain probability distribution having finite mean and variance (Existence)
  • Y values are statistically independent of one another (Independence)
  • The mean value of Y given the X's is a linear function of the X's (Linearity)
Multiple linear regression: assumptions (2)
  • The variance of Y is the same for any fixed combination of X's (Homoscedasticity)
  • For any fixed combination of X's, Y has a normal distribution (Normality)
  • There are no measurement errors in the X's (X's measured without error)
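These assumptions are usually checked with residual diagnostics after fitting. A minimal sketch (the simulated residuals and the use of numpy, scipy, and matplotlib are my illustrative assumptions, not part of the course material):

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Hypothetical fitted values and residuals from a least-squares fit
rng = np.random.default_rng(0)
fitted = rng.uniform(0, 10, size=100)
resid = rng.normal(size=100)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
ax1.scatter(fitted, resid)            # fan shapes suggest heteroscedasticity
ax1.axhline(0, color="grey")
ax1.set(xlabel="fitted value", ylabel="residual")
stats.probplot(resid, plot=ax2)       # a straight Q-Q line supports normality
plt.show()
```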
Multiple linear regression: parameter estimation

Y = ß0 + ß1X(1) + ß2X(2) + ß3X(3) + ... + ßkX(k) + E

  • Estimate the ß's in multiple regression using least squares
  • Sizes of the coefficients are not good indicators of the importance of the X variables (they depend on the scales of the X's)
  • Number of data points in multiple regression:
    • at least one more than the number of X's
    • preferably at least 5 times the number of X's
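A minimal sketch of the least-squares fit (simulated data; numpy is my tooling assumption, the slides do not specify software):

```python
import numpy as np

# Simulated data for illustration: n = 50 observations, k = 3 X's
rng = np.random.default_rng(1)
n, k = 50, 3
X = rng.normal(size=(n, k))
Y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(size=n)

# Design matrix: a leading column of ones carries the intercept ß0
X_design = np.column_stack([np.ones(n), X])

# Least-squares estimates of ß0, ß1, ..., ßk
beta_hat, *_ = np.linalg.lstsq(X_design, Y, rcond=None)
print(beta_hat)
```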
Why do Large Animals have Large Brains? (Schoenemann, Brain Behav. Evol. 2004)

Multiple regression of Y [Log(CNS)] on:

X's           ß       SE(ß)
Log(Mass)    -0.49    (0.70)
Log(Fat)     -0.07    (0.10)
Log(Muscle)   1.03    (0.54)
Log(Heart)    0.42    (0.22)
Log(Bone)    -0.07    (0.30)

N = 39

Multiple linear regression: hypothesis tests

Usually test:

H0: Y = ß0 + ß1⋅X(1) + ß2⋅X(2) + ... + ßj⋅X(j) + E

H1: Y = ß0 + ß1⋅X(1) + ß2⋅X(2) + ... + ßj⋅X(j) + ... + ßk⋅X(k) + E

F-test with k-j and n-k-1 degrees of freedom (“partial F-test”)

H0: variables X(j+1),…,X(k) do not help explain variability in Y
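A sketch of this partial F-test on simulated data (the sse helper and the numpy/scipy calls are my assumptions for illustration):

```python
import numpy as np
from scipy import stats

def sse(X, Y):
    """Residual sum of squares from a least-squares fit; X includes the intercept column."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    return resid @ resid

# Simulated data: k = 4 candidate X's, of which only the first j = 2 matter
rng = np.random.default_rng(2)
n, j, k = 60, 2, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = X[:, :j + 1] @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

sse_reduced = sse(X[:, :j + 1], Y)   # H0 model: X(1), ..., X(j) only
sse_full = sse(X, Y)                 # H1 model: X(1), ..., X(k)
F = ((sse_reduced - sse_full) / (k - j)) / (sse_full / (n - k - 1))
P = stats.f.sf(F, k - j, n - k - 1)  # small P => the extra X's improve the fit
print(F, P)
```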

Multiple linear regression: hypothesis tests

e.g. Test significance of overall multiple regression

H0: Y = ß0 + E

H1: Y = ß0 + ß1⋅X(1) + ß2⋅X(2) + ... + ßk⋅X(k) + E

  • Test significance of
    • adding independent variable
    • deleting independent variable
Why do Large Animals have Large Brains? (Schoenemann, Brain Behav. Evol. 2004)

Multiple regression of Y [Log(CNS)] on:

X's           ß       SE(ß)   P
Log(Mass)    -0.49    (0.70)  0.49
Log(Fat)     -0.07    (0.10)  0.52
Log(Muscle)   1.03    (0.54)  0.07
Log(Heart)    0.42    (0.22)  0.06
Log(Bone)    -0.07    (0.30)  0.83

Each P-value tests whether removal of that variable reduces the fit.

Multiple linear regression: selecting independent variables
  • Reasons for selecting a subset of independent variables (X’s):
    • cost (financial and other)
    • simplicity
    • improved prediction
    • improved explanation
Multiple linear regression: selecting independent variables
  • Partial F-test
    • predetermined forward selection
    • forward selection based upon improvement in fit
    • backward selection based upon improvement in fit
    • stepwise (backward/forward)
  • Mallows' C(p)
  • AIC
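As an illustration of one of these procedures, a minimal forward-selection sketch driven by AIC (simulated data; the helper function and stopping rule are my assumptions):

```python
import numpy as np

def aic(X, Y):
    """AIC = n*log(sigma2_hat) + 2p, with sigma2_hat = SSE/n and p columns in X."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    sse = np.sum((Y - X @ beta) ** 2)
    return n * np.log(sse / n) + 2 * p

rng = np.random.default_rng(3)
n, k = 80, 5
X = rng.normal(size=(n, k))
Y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(size=n)

selected, remaining = [], list(range(k))
current = np.ones((n, 1))            # start from the constant-only model
best = aic(current, Y)
while remaining:
    score, i = min((aic(np.column_stack([current, X[:, [i]]]), Y), i)
                   for i in remaining)
    if score >= best:
        break                        # no remaining variable improves AIC
    best = score
    selected.append(i)
    remaining.remove(i)
    current = np.column_stack([current, X[:, [i]]])
print(selected, best)
```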
Multiple linear regression: selecting independent variables
  • Partial F-test
    • predetermined forward selection
      • Mass, Bone, Heart, Muscle, Fat
    • forward selection based upon improvement in fit
    • backward selection based upon improvement in fit
    • stepwise (backward/forward)
Why do Large Animals have Large Brains? (Schoenemann, Brain Behav. Evol. 2004)
  • Complete model (r² = 0.97)
  • Forward stepwise (α-to-enter = 0.15; α-to-remove = 0.15):
    • 1. Constant (r² = 0.00)
    • 2. Constant + Muscle (r² = 0.97)
    • 3. Constant + Muscle + Heart (r² = 0.97)
    • 4. Constant + Muscle + Heart + Mass (r² = 0.97)

Final model: Log(CNS) = -0.18 - 0.82⋅Log(Mass) + 1.24⋅Log(Muscle) + 0.39⋅Log(Heart)

Why do Large Animals have Large Brains? (Schoenemann, Brain Behav. Evol. 2004)
  • Complete model (r² = 0.97)
  • Backward stepwise (α-to-enter = 0.15; α-to-remove = 0.15):
    • 1. All variables (r² = 0.97)
    • 2. Remove Bone (r² = 0.97)
    • 3. Remove Fat (r² = 0.97)

Final model: Log(CNS) = -0.18 - 0.82⋅Log(Mass) + 1.24⋅Log(Muscle) + 0.39⋅Log(Heart)

Comparing models
  • Mallows' C(p)
    • C(p) = (k-p)⋅F(p) + (2p-k+1)
      • k parameters in the full model; p parameters in the restricted model
      • F(p) is the F statistic comparing the fit of the restricted model with that of the full model
    • The model with the lowest C(p) is best
  • Akaike Information Criterion (AIC)
    • AIC = n⋅log(σ̂²) + 2p, where σ̂² is the estimated residual variance
    • The model with the lowest AIC is best
    • Can compare models that are not nested within one another
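A small sketch computing C(p) from the formula above, given the residual sums of squares of the two fits (the numeric inputs are hypothetical):

```python
def mallows_cp(sse_restricted, sse_full, n, k, p):
    """C(p) = (k - p)*F(p) + (2p - k + 1), following the slide's definition;
    k and p count the parameters in the full and restricted models."""
    F_p = ((sse_restricted - sse_full) / (k - p)) / (sse_full / (n - k))
    return (k - p) * F_p + (2 * p - k + 1)

# Hypothetical residual sums of squares for illustration
print(mallows_cp(sse_restricted=120.0, sse_full=100.0, n=50, k=6, p=4))
```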
Collinearity
  • If two (or more) X's are linearly related, they are collinear, and the regression problem is indeterminate, e.g.:

X(3) = 5⋅X(2) + 16, or

X(2) = 4⋅X(1) + 16⋅X(4)

  • If they are nearly linearly related (near collinearity), the estimated coefficients and their hypothesis tests are very inaccurate
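A quick numerical illustration of near collinearity (simulated data): when X(2) is almost exactly a linear function of X(1), the individual coefficients swing wildly from one sample to the next even though each fit looks fine.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
for trial in range(3):
    x1 = rng.normal(size=n)
    x2 = 5 * x1 + 16 + rng.normal(scale=0.01, size=n)  # nearly X(2) = 5*X(1) + 16
    Y = 1.0 + x1 + x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    print(beta)   # ß1 and ß2 vary a lot across trials, though ß1 + 5*ß2 stays near 6
```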
What to do about collinearity?
  • Centering (mean = 0)
  • Scaling (SD = 1)
  • Regression on first few Principal Components
  • Ridge Regression
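A minimal ridge-regression sketch on centred, scaled X's (the penalty lam = 1.0 is an arbitrary illustrative choice):

```python
import numpy as np

def ridge(X, Y, lam):
    """Ridge estimates on centred (mean 0), scaled (SD 1) predictors:
    solve (X'X + lam*I) ß = X'Y instead of ordinary least squares."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Yc = Y - Y.mean()
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ Yc)

# Nearly collinear predictors, as in the previous sketch
rng = np.random.default_rng(5)
x1 = rng.normal(size=40)
x2 = 5 * x1 + 16 + rng.normal(scale=0.01, size=40)
Y = 1.0 + x1 + x2 + rng.normal(size=40)
print(ridge(np.column_stack([x1, x2]), Y, lam=1.0))
```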
Curvilinear (Polynomial) Regression
  • Y = ß0 + ß1⋅X + ß2⋅X² + ß3⋅X³ + ... + ßk⋅Xᵏ + E
  • Used to fit fairly complex curves to data
  • ß’s estimated using least squares
  • Use sequential partial F-tests, or AIC, to find how many terms to use
    • k>3 is rare in biology
  • Better to transform data and use simple linear regression, when possible
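Polynomial terms simply enter the design matrix as extra columns, so the model is still linear in the ß's and least squares applies unchanged; a degree-3 sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(0, 10, 50)
y = 0.2 - 0.01 * x + 0.006 * x**2 + rng.normal(scale=0.1, size=x.size)

# Columns 1, X, X^2, X^3: a polynomial regression fitted by least squares
X_design = np.column_stack([x**d for d in range(4)])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta)   # estimates of ß0, ß1, ß2, ß3
```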
Curvilinear (Polynomial) Regression

Successive polynomial fits (linear, quadratic, cubic):

Y = 0.066 + 0.00727⋅X

Y = 0.117 + 0.00085⋅X + 0.00009⋅X²

Y = 0.201 - 0.01371⋅X + 0.00061⋅X² - 0.000005⋅X³

From Sokal and Rohlf

Path Analysis

[Path diagram with variables A, B, C, D, and E]
  • Models with causal structure
  • Represented by path diagram
  • All variables quantitative
  • All path relationships assumed linear
    • (transformations may help)
Path Analysis

[Path diagram with variables A, B, C, D, E and residual variable U]
  • All paths are one-way:
    • either A => C or C => A, but not both
  • No loops
  • Some variables may not be directly observed:
    • residual variables (U)
  • Some variables not observed but known to exist
    • latent variables (D)
Path Analysis

[Path diagram with variables A, B, C, D, E and residual variable U]
  • Path coefficients and other statistics calculated using multiple regressions
  • Variables are:
    • centered (mean = 0) so no constants in regressions
    • often standardized (SD = 1)
  • So: path coefficients usually between -1 and +1
  • Paths with coefficients not significantly different from zero may be eliminated
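A sketch of this computation for a hypothetical diagram (A => C, B => C, C => E; the data are simulated): standardize every variable, then run one multiple regression per downstream variable.

```python
import numpy as np

def standardize(v):
    return (v - v.mean()) / v.std()   # centre (mean 0) and scale (SD 1)

# Simulated data consistent with the hypothetical diagram A -> C <- B, C -> E
rng = np.random.default_rng(7)
A = standardize(rng.normal(size=200))
B = standardize(rng.normal(size=200))
C = standardize(0.6 * A + 0.3 * B + rng.normal(scale=0.5, size=200))
E = standardize(0.7 * C + rng.normal(scale=0.5, size=200))

# One multiple regression per endogenous variable; no intercepts are needed
# because everything is centred, and standardization keeps the path
# coefficients (roughly) between -1 and +1
path_C, *_ = np.linalg.lstsq(np.column_stack([A, B]), C, rcond=None)
path_E, *_ = np.linalg.lstsq(C.reshape(-1, 1), E, rcond=None)
print(path_C, path_E)
```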
Path Analysis: an example
  • Isaak and Hubert. 2001. “Production of stream habitat gradients by montane watersheds: hypothesis tests based on spatially explicit path analyses” Can. J. Fish. Aquat. Sci.
[Path diagram from Isaak and Hubert (2001): dashed lines show predicted negative interactions; solid lines show predicted positive interactions]