Regression: (1) Simple Linear Regression

Hal Whitehead

BIOL4062 / 5062

Regression
  • Purposes of regression
  • Simple linear regression
    • Formula
    • Assumptions
    • If assumptions hold, what can we do?
    • Testing assumptions
    • When assumptions do not hold
Regression

One Dependent Variable Y

Independent Variables X1,X2,X3,...

Purposes of Regression

1. Relationship between Y and X's

2. Quantitative prediction of Y

3. Relationship between Y and X controlling for C

4. Which of X's are most important?

5. Best mathematical model

6. Compare regression relationships: Y1 on X, Y2 on X

7. Assess interactive effects of X's

  • Simple regression: one X
  • Multiple regression: two or more X's
Simple linear regression

Y = β0 + β1X + Error
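As a sketch of what fitting this model means in practice: the least-squares estimates of β0 and β1 have a closed form. A minimal pure-Python version, with made-up data (not from the lecture):

```python
# Minimal ordinary least squares for Y = b0 + b1*X + error.
# The data below are invented for illustration only.

def fit_ols(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx        # slope
    b0 = my - b1 * mx     # intercept (the fitted line passes through the means)
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly y = 2x
b0, b1 = fit_ols(x, y)
```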

Assumptions of simple linear regression

1. Existence

2. Independence

3. Linearity

4. Homoscedasticity

5. Normality

6. X measured without error

Assumptions of simple linear regression

1. For any fixed value of X, Y is a random variable with a certain probability distribution having finite mean and variance

(Existence)

[Figure: probability distribution of Y at a fixed value of X]

Assumptions of simple linear regression

2. The Y values are statistically independent of one another

(Independence)

Assumptions of simple linear regression

3. The mean value of Y given X is a straight line function of X

(Linearity)

[Figure: means of the Y distributions lie on a straight line in X]

Assumptions of simple linear regression

4. The variance of Y is the same for all X

(Homoscedasticity)

[Figure: Y distributions with the same variance at each X]

Assumptions of simple linear regression

5. For any fixed value of X, Y has a normal distribution

(Normality)

[Figure: normal distributions of Y at each X]

Assumptions of simple linear regression

6. There are no measurement errors in X

(X measured without error)

Assumptions of simple linear regression

1. Existence

2. Independence

3. Linearity

4. Homoscedasticity

5. Normality

6. X measured without error

If assumptions hold, what can we do?

1. Estimate β0 (intercept), β1 (slope), together with measures of uncertainty

2. Describe quality of fit (variation of data around straight line) by estimate of σ² or r²

3. Tests of slope and intercept

4. Prediction and prediction bands

5. ANOVA Table

Parameters estimated using least-squares
  • Age-specific pregnancy rates of female sperm whales (from Best et al. 1984 Rep. int. Whal. Commn. Spec. Issue)

Find the line which minimizes the sum of squared residuals

1. Estimate β0 (intercept), β1 (slope), together with measures of uncertainty
  • Age-specific pregnancy rates of female sperm whales (from Best et al. 1984 Rep. int. Whal. Commn. Spec. Issue)
  • β0 = 0.230 (SE 0.028); 95% c.i.: 0.164, 0.296
  • β1 = -0.0035 (SE 0.0009); 95% c.i.: -0.0056, -0.0013
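A sketch of where such standard errors and confidence intervals come from. Pure Python; the data are illustrative (not the sperm-whale values), and the critical value 2.571 is the two-sided 95% t quantile for 5 degrees of freedom, looked up by hand since the standard library has no t distribution:

```python
import math

def ols_inference(x, y, t_crit):
    """OLS fit with standard errors and 95% c.i. for intercept and slope.
    t_crit: two-sided 95% t quantile for n-2 df (supplied by the caller)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)                                 # estimate of sigma^2
    se_b1 = math.sqrt(s2 / sxx)
    se_b0 = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))
    return {
        "b0": (b0, se_b0, (b0 - t_crit * se_b0, b0 + t_crit * se_b0)),
        "b1": (b1, se_b1, (b1 - t_crit * se_b1, b1 + t_crit * se_b1)),
    }

# Illustrative data: 7 points, so n-2 = 5 df
x = [1, 2, 3, 4, 5, 6, 7]
y = [2.2, 3.8, 6.1, 8.3, 9.7, 12.2, 13.9]
est = ols_inference(x, y, t_crit=2.571)
```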

2. Describe quality of fit by estimate of σ² or r²

σ² = 0.0195

r² = 0.679

r² (adjusted) = 0.633

(Proportion of variance accounted for by regression)
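A minimal sketch of how r² and its adjusted version are computed, as one minus the residual fraction of the total sum of squares (pure Python, toy data):

```python
def r_squared(x, y):
    """r^2 = 1 - SSE/SST, plus the adjusted version, for simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # residual SS
    sst = sum((yi - my) ** 2 for yi in y)                          # total SS
    r2 = 1 - sse / sst
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - 2)   # penalizes the fitted slope
    return r2, r2_adj

r2, r2_adj = r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
```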

3. Tests of slope and intercept

a) Slope = 0 {Equivalent to r=0}

b) Slope = Predetermined constant

c) Intercept = 0

d) Intercept = Predetermined constant

e) Compare slopes

f) Compare intercepts {Assume same slope}

(tests use t-distribution)
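These tests all share one shape: t = (estimate − hypothesized value) / SE, referred to a t distribution with n−2 degrees of freedom. A pure-Python sketch for the slope tests (a) and (b), with illustrative data; only the t statistic is computed, since the standard library has no t distribution for the P-value:

```python
import math

def slope_t_stat(x, y, b1_null=0.0):
    """t statistic for H0: slope = b1_null in simple linear regression.
    Compare against a t distribution with n-2 df."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    s2 = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se_b1 = math.sqrt(s2 / sxx)
    return (b1 - b1_null) / se_b1

# Toy data with a clear positive trend
x = [1, 2, 3, 4, 5, 6]
y = [1.2, 2.1, 2.9, 4.2, 4.8, 6.1]
t0 = slope_t_stat(x, y)   # test against slope = 0
```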

3a) Slope = 0 {Equivalent to r = 0}

Does pregnancy rate change with age?

H0: β1 = 0

H1: β1 ≠ 0

P = 0.006

Does pregnancy rate decline with age?

H0: β1 = 0

H1: β1 < 0

P = 0.003

3b) Slope = Predetermined constant

β1 = 2.868 (SE 0.058)

95% c.i.: 2.752; 2.984

Does shape change with length?

H0: β1 = 3

H1: β1 ≠ 3

P < 0.05

weight = length³

Weights and Lengths of Cetacean Species

Whitehead & Mann In Cetacean Societies 2000

3c) Intercept = 0

β0 = 0.436 (SE 0.080)

95% c.i.: 0.276; 0.596

Is birth length proportional to length?

H0: β0 = 0

H1: β0 ≠ 0

P < 0.001

3e) Compare slopes

β1 (m) = 2.528 (SE 0.409)

β1 (o) = 2.962 (SE 0.094)

Does shape change differently with length for odontocetes and mysticetes?

H0: β1 (m) = β1 (o)

H1: β1 (m) ≠ β1 (o)

P = 0.146

Weights and Lengths of Cetacean Species

Whitehead & Mann 2000

3f) Compare intercepts {Assume same slope}

β0 (m) = 2.528 (SE 0.409)

β0 (o) = 2.962 (SE 0.094)

Are odontocetes and mysticetes equally fat?

H0: β0 (m) = β0 (o)

H1: β0 (m) ≠ β0 (o)

P = 0.781

[Figure: Log(Weight) vs Log(Length) for cetacean species, by ORDER (m = mysticetes, o = odontocetes)]

4. Prediction and prediction bands

95% Confidence Bands for Regression Line

95% Prediction Bands

95% Prediction Bands

From: http://www.tufts.edu/~gdallal/slr.htm

5. ANOVA Table

Analysis of Variance

Source       Sum-of-Squares   df   Mean-Square   F-ratio    P
Regression   286.27            1   286.27        2475.07    0.00
Residual     5.32             46   0.12
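The ANOVA table partitions the total sum of squares into a regression part (1 df) and a residual part (n−2 df); the F-ratio is their mean-square ratio. A sketch in pure Python with toy data (not the values in the table above):

```python
def regression_anova(x, y):
    """Return (SSR, SSE, F) for simple linear regression, where
    SST = SSR + SSE and F = (SSR/1) / (SSE/(n-2))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    ssr = sum((b0 + b1 * xi - my) ** 2 for xi in x)                 # regression SS, df = 1
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))   # residual SS, df = n-2
    f = (ssr / 1) / (sse / (n - 2))
    return ssr, sse, f

ssr, sse, f = regression_anova([1, 2, 3, 4, 5], [2.0, 4.1, 5.9, 8.2, 9.8])
```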

If assumptions hold, what can we do?

1. Estimate β0 (intercept), β1 (slope), together with measures of uncertainty

2. Describe quality of fit (variation of data around straight line) by estimate of σ² or r²

3. Tests of slope and intercept

4. Prediction and prediction bands

5. ANOVA Table

Testing assumptions: diagnostics
  • Use residuals to look at assumptions of regression:

e(i) = Y(i) - (β0 + β1X(i))    (observed minus expected)

Residuals
  • Residual: e(i) = Y(i) - (β0 + β1X(i))
  • Standardized residuals: e(i)/S

{S is the standard deviation of the residuals

with adjusted degrees of freedom}

  • Studentized residuals: e(i) / [S √(1 - h(i))]

{h(i) is the "leverage value" of observation i:

h(i) = 1/n + (X(i) - ΣX(i)/n)² / [(n-1)S(X)²]}

  • Jackknifed residuals: e(i) / [S(-i) √(1 - h(i))]

{The residual standard deviation S(-i) is calculated separately with each observation i deleted}
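A sketch of these diagnostics in pure Python (the jackknifed version is omitted for brevity). One useful check on the leverage formula: in simple regression the leverages always sum to 2, one per fitted parameter:

```python
import math

def residual_diagnostics(x, y):
    """Raw, standardized, and studentized residuals, plus leverages,
    for simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]      # raw residuals
    s = math.sqrt(sum(ei ** 2 for ei in e) / (n - 2))      # residual SD, n-2 df
    h = [1 / n + (xi - mx) ** 2 / sxx for xi in x]         # leverage values
    standardized = [ei / s for ei in e]
    studentized = [ei / (s * math.sqrt(1 - hi)) for ei, hi in zip(e, h)]
    return e, standardized, studentized, h

e, std, stu, h = residual_diagnostics([1, 2, 3, 4, 5], [1.0, 2.3, 2.9, 4.1, 5.2])
```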

Use Residuals to:

a) look for outliers which we may wish to remove

b) examine normality

c) check for linearity

d) check for homoscedasticity

e) check for some kinds of non-independence

should outliers be removed
Yes

if “outlier” was probably not produced by the process being studied

measurement error

different species

...

No

if “outlier” was probably produced by the process being studied

extreme specimen

Should outliers be removed?
b) Using residuals to examine normality
  • Lilliefors test for normality:

P=0.62

  • Lilliefors test for normality (excluding Bowhead whale):

P=0.68

e) Use residuals to check for some kinds of non-independence
  • Durbin-Watson D Statistic: 1.48
    • low values (<2) indicate autocorrelation
  • First Order Autocorrelation: 0.26

Days spent following sperm whales
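Both statistics come straight from the residual sequence: Durbin-Watson is the sum of squared successive differences over the sum of squared residuals, and first-order autocorrelation is the lag-1 cross-product over the same denominator. A sketch with a made-up residual sequence (not the sperm-whale data):

```python
def durbin_watson(e):
    """D = sum of squared successive differences / sum of squared residuals.
    D near 2 suggests no first-order autocorrelation; D < 2 suggests
    positive autocorrelation."""
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    den = sum(ei ** 2 for ei in e)
    return num / den

def lag1_autocorrelation(e):
    """First-order autocorrelation of the residuals (mean assumed ~ 0)."""
    num = sum(e[i] * e[i - 1] for i in range(1, len(e)))
    den = sum(ei ** 2 for ei in e)
    return num / den

# Made-up, positively autocorrelated residuals (slow drift from + to -)
smooth = [0.5, 0.4, 0.45, 0.3, -0.1, -0.3, -0.4, -0.35]
```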

Use Residuals to:

a) look for outliers which we may wish to remove

b) examine normality

c) check for linearity

d) check for homoscedasticity

e) check for some kinds of non-independence

Assumptions of simple linear regression

1. Existence

2. Independence

3. Linearity

4. Homoscedasticity

5. Normality

6. X measured without error

When assumptions do not hold:

1. Existence:

Forget it!

When assumptions do not hold:

2. Independence:

  • collect data differently
  • reduce the size of the data set
  • add additional terms to the regression model
    • (e.g. autocorrelation term, species effect)

More a problem for testing than prediction

When assumptions do not hold:

3. Linearity:

  • Transform either X or Y or both variables. e.g.:

Log(Y) = β0 + β1 Log(X) + E

  • Polynomial regression:

Y = β0 + β1X + β2X² + ... + E

  • Non-linear regression. e.g.:

Y = c + EXP(β0 + β1X) + E

  • Piecewise linear regression:

Y = β0 + β1X·[X>XK] + E

where [X>XK] = 0 if X ≤ XK and [X>XK] = 1 if X > XK.
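The piecewise model above is still linear in its parameters once the indicator is formed, so it can be fit by ordinary least squares on the transformed predictor X·[X>XK]. A sketch assuming the knot XK is known in advance (made-up data):

```python
def fit_piecewise(x, y, xk):
    """Fit Y = b0 + b1 * X * [X > XK] by OLS, where [X > XK] is 0/1
    and the knot xk is assumed known."""
    z = [xi if xi > xk else 0.0 for xi in x]   # transformed predictor X*[X>XK]
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    b1 = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / szz
    b0 = my - b1 * mz
    return b0, b1

# Made-up data: flat at 1.0 below the knot, rising linearly above it
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.0, 1.0, 1.0, 6.0, 7.0, 8.0, 9.0]
b0, b1 = fit_piecewise(x, y, xk=4)
```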

[Figure: example fits for the transformed, polynomial, non-linear, and piecewise regression models]
When assumptions do not hold:

4. Homoscedasticity:

  • Transformations of the Y variable
  • Weighted regressions (if we know that some observations are more accurate than others)
When assumptions do not hold:

5. Normality:

  • Transformations of the Y variable
  • Non-normal error structures (e.g. Poisson)

Small departures from normality are not especially important, unless doing a test

When assumptions do not hold:

6. X measured without error:

  • Major axis regression
  • Reduced major axis, or geometric mean, regression
Major axis regression:
  • Minimize sum of squares of perpendicular distances from observations to regression line
  • Only if variables are in same units

{First principal component of covariance matrix}

Reduced major axis regression:
  • Each of the two variables is transformed to have a mean of zero and a standard deviation of 1
  • Then, minimize sum of squares of perpendicular distances from observations to regression line
  • Its slope cannot be sensibly tested against zero

{first principal component using the correlation matrix}
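Both alternatives can be sketched in a few lines. The reduced major axis slope works out to sign(r)·s(Y)/s(X); the major axis slope follows from the largest eigenvalue of the 2×2 covariance matrix, as the slide notes. Pure Python, made-up data:

```python
import math

def rma_slope(x, y):
    """Reduced major axis (geometric mean) slope: sign(r) * sd(Y)/sd(X)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return math.copysign(math.sqrt(syy / sxx), sxy)

def major_axis_slope(x, y):
    """Slope of the first principal component of the covariance matrix
    (only sensible when X and Y are in the same units; assumes sxy != 0)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    # largest eigenvalue of [[sxx, sxy], [sxy, syy]]
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return (lam - sxx) / sxy   # slope of the corresponding eigenvector
```

For points lying exactly on a line, OLS, major axis, and reduced major axis all recover the same slope; they differ once there is scatter in both variables.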

Regression
  • Extremely useful technique!
  • Check assumptions using residuals
  • Can be extended in several ways
    • multiple regression
    • non-linear regression
    • non-normal errors
    • piecewise regression
    • ...