Linear regression
Ecole Nationale Vétérinaire de Toulouse

Didier Concordet [email protected]

ECVPT Workshop April 2011

Can be downloaded at http://www.biostat.envt.fr/

About the straight line

Y = a + b x

a = intercept

b = slope

[Figure: three plots of Y against x showing straight lines with b > 0, b = 0 and b < 0, with the intercept a marked, including the case a = 0]


Questions

  • How to obtain the best straight line?

  • Is this straight line the best curve to use?

  • How to use this straight line?


How to obtain the best straight line?

Proceed in three main steps

  • write a (statistical) model

  • estimate the parameters

  • graphical inspection of data


Write a model

A statistical model has two parts

Mean model: functional relationship

Variance model: assumptions on the residuals


Write a model

Mean model: $Y_i = a + b x_i + \varepsilon_i$

$\varepsilon_i$ = residual (error term)


Assumptions on the residuals

  • the $x_i$'s are not random variables

    • they are known with high precision

  • the $\varepsilon_i$'s have a constant variance

    • homoscedasticity

  • the $\varepsilon_i$'s are independent

  • the $\varepsilon_i$'s are normally distributed

    • normality


Homoscedasticity

[Figure: two scatterplots, one labelled homoscedasticity and one labelled heteroscedasticity]



Estimate the parameters

A criterion is needed to estimate parameters

A statistical model

A criterion


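How to estimate the "best" a and b?

Intuitive criterion: $\sum_i \left(Y_i - (a + b x_i)\right)$ minimum

Problem: compensation (positive and negative deviations cancel out)

Reasonable criterion: $\sum_i \left(Y_i - (a + b x_i)\right)^2$ minimum

Linear model + homoscedasticity + normality: least squares criterion (L.S.)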
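
The two criteria can be compared on a small example. The sketch below is only an illustration: the function names and the five data points are hypothetical, not taken from the slides. It computes the signed sum and the sum of squares for a candidate line, and the closed-form least squares solution. The flat line through the mean of the Y's has a signed sum of zero although it fits badly, which is the compensation problem.

```python
def signed_sum(a, b, x, y):
    """Intuitive criterion: signed deviations, which can cancel out."""
    return sum(yi - (a + b * xi) for xi, yi in zip(x, y))

def sum_of_squares(a, b, x, y):
    """Least squares criterion: squared deviations from the line."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

def least_squares(x, y):
    """Closed-form minimiser of sum_of_squares for a straight line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

a_hat, b_hat = least_squares(x, y)
flat_signed = signed_sum(7.0, 0.0, x, y)    # flat line through the mean of y
flat_ss = sum_of_squares(7.0, 0.0, x, y)
best_ss = sum_of_squares(a_hat, b_hat, x, y)
print(a_hat, b_hat, flat_signed, flat_ss, best_ss)
```

The signed sum cannot tell the flat line from the fitted one, while the sum of squares clearly separates them.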



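Result of optimisation

$\hat{b} = \frac{\sum_i (x_i - \bar{x})(Y_i - \bar{Y})}{\sum_i (x_i - \bar{x})^2}$ and $\hat{a} = \bar{Y} - \hat{b}\,\bar{x}$

$\hat{a}$ and $\hat{b}$ change with samples

$\hat{a}$ and $\hat{b}$ are random variables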
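
A small simulation makes the point concrete. This is a sketch under stated assumptions: the design (6 concentration levels, 3 replicates), the true intercept, slope and residual standard deviation are all hypothetical values chosen for illustration. Each simulated sample gives a different estimated slope.

```python
import random

def least_squares(x, y):
    """Closed-form least squares estimates (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat

random.seed(0)                              # reproducible illustration
x = [0, 10, 25, 50, 100, 200] * 3           # hypothetical design
true_a, true_b, sigma = 20.0, 3.0, 8.0      # hypothetical true values

slopes = []
for _ in range(200):
    # New sample: same x's, new independent normal residuals
    y = [true_a + true_b * xi + random.gauss(0.0, sigma) for xi in x]
    a_hat, b_hat = least_squares(x, y)
    slopes.append(b_hat)

mean_slope = sum(slopes) / len(slopes)
print(min(slopes), mean_slope, max(slopes))
```

The estimates scatter around the true slope: they are realisations of a random variable.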


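Summary

True mean straight line: $E(Y) = a + b x$

Estimated straight line: $\hat{Y} = \hat{a} + \hat{b} x$

Mean predicted value for the ith observation: $\hat{Y}_i = \hat{a} + \hat{b} x_i$

ith residual: $\hat{\varepsilon}_i = Y_i - \hat{Y}_i$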


Example

Dep Var: HPLC   N: 18

Effect      Coefficient   Std Error   t        P(2 Tail)
CONSTANT    20.046        3.682        5.444   0.000
CONCENT      2.916        0.069       42.030   0.000

Intercept: 20.046

Slope: 2.916

Estimated straight line: HPLC = 20.046 + 2.916 × CONCENT




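Residual variance

The estimated residuals $\hat{\varepsilon}_i$ satisfy $\sum_i \hat{\varepsilon}_i = 0$ by construction, but the individual $\hat{\varepsilon}_i$ are not zero

The residual variance is defined by $s^2 = \frac{1}{n-2} \sum_i \hat{\varepsilon}_i^2$

Standard error of estimate: $s = \sqrt{s^2}$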
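
As a rough illustration, the residual variance can be computed directly from the residuals of a fitted line. The data points below are hypothetical; the divisor n - 2 is the usual one for a straight line with two estimated parameters.

```python
import math

def least_squares(x, y):
    """Closed-form least squares estimates (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

a_hat, b_hat = least_squares(x, y)
residuals = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]

n = len(x)
s2 = sum(e ** 2 for e in residuals) / (n - 2)   # residual variance
s = math.sqrt(s2)                               # standard error of estimate
print(sum(residuals), s2, s)
```

The residuals sum to zero up to rounding error, so only their squares carry information about the dispersion.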


Example

Dep Var: HPLC   N: 18

Multiple R: 0.996   Squared multiple R: 0.991

Adjusted squared multiple R: 0.991

Standard error of estimate: 8.282

Effect      Coefficient   Std Error   t        P(2 Tail)
CONSTANT    20.046        3.682        5.444   0.000
CONCENT      2.916        0.069       42.030   0.000


Questions

  • How to obtain the best straight line?

  • Is this straight line the best curve to use?

  • How to use this straight line?


Is this model the best one to use?

  • Tools to check the mean model :

    • scatterplot residuals vs fitted values

    • test(s)

  • Tools to check the variance model :

    • scatterplot residuals vs fitted values

    • Probability plot (Pplot)


Checking the mean model

Scatterplot of residuals vs fitted values

[Figure: two plots of residuals around 0 against fitted values]

  • Structure in the residuals: change the mean model

  • No structure in the residuals: OK
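
Structure in the residuals can be seen even without a plot. The sketch below fits a straight line to hypothetical data that actually follow a curve (Y = x², chosen only for illustration) and lists the sign of each residual in order of increasing fitted value.

```python
def least_squares(x, y):
    """Closed-form least squares estimates (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat

# Hypothetical curved data fitted (wrongly) with a straight line
x = [1, 2, 3, 4, 5, 6]
y = [xi ** 2 for xi in x]

a_hat, b_hat = least_squares(x, y)
fitted = [a_hat + b_hat * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Points to plot: (fitted value, residual), sorted by fitted value
points = sorted(zip(fitted, residuals))
signs = "".join("+" if e > 0 else "-" for _, e in points)
print(signs)
```

Residuals that are positive at both ends and negative in the middle form a systematic pattern: the mean model should be changed.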


Checking the mean model: tests

Two cases

  • No replication: try a polynomial model (quadratic first)

  • Replications: test of lack of fit


Without replication

Try another mean model and test the improvement

Example: $Y_i = a + b x_i + c x_i^2 + \varepsilon_i$

If the test on c is significant (c ≠ 0), then keep this model

Dep Var: HPLC   N: 18

Multiple R: 0.996   Squared multiple R: 0.991

Adjusted squared multiple R: 0.991

Standard error of estimate: 8.539

Effect             Coefficient   Std Error   t       P(2 Tail)
CONSTANT           21.284        6.649       3.201   0.006
CONCENT             2.842        0.335       8.486   0.000
CONCENT*CONCENT     0.001        0.003       0.227   0.824
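
The test on c can be sketched as follows: fit the quadratic model by least squares and divide the estimate of c by its standard error. This is a minimal sketch, not the software used for the output above; the helper names and the data (a clearly curved, hypothetical data set) are assumptions made for illustration.

```python
import math

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][k] * z[k] for k in range(i + 1, n))) / M[i][i]
    return z

def quadratic_fit(x, y):
    """Least squares fit of Y = a + b x + c x^2; returns (a, b, c) and t for c."""
    rows = [[1.0, xi, xi * xi] for xi in x]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    a, b, c = solve(xtx, xty)
    ss_res = sum((yi - (a + b * xi + c * xi * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = ss_res / (len(x) - 3)                      # three estimated parameters
    var_c = s2 * solve(xtx, [0.0, 0.0, 1.0])[2]     # s2 times (c, c) entry of inverse(X'X)
    return (a, b, c), c / math.sqrt(var_c)

# Hypothetical curved data: x squared plus small fixed perturbations
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05]
y = [xi ** 2 + ei for xi, ei in zip(x, noise)]

(a_hat, b_hat, c_hat), t_c = quadratic_fit(x, y)
print(c_hat, t_c)
```

The t statistic is compared with a Student quantile with n - 3 degrees of freedom. Here it is very large, so the quadratic term would be kept; in the HPLC output above, t = 0.227 (P = 0.824), so the straight line is kept.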


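With replications

Perform a test of lack of fit

[Figure: replicated observations at each x, showing the departure from linearity and the pure error]

Principle: compare the departure from linearity to the pure error

If the departure from linearity is larger than the pure error, then change the model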


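Test of lack of fit: how to do it?

Three steps

1) Linear regression: residual sum of squares $SS_{res}$ with $n - 2$ degrees of freedom

2) One-way ANOVA: pure error sum of squares $SS_{pe}$ with $n - k$ degrees of freedom ($k$ = number of distinct values of x)

3) $F = \frac{(SS_{res} - SS_{pe})/(k - 2)}{SS_{pe}/(n - k)}$

If $F$ exceeds the critical value of the Fisher distribution with $(k - 2,\, n - k)$ degrees of freedom, then change the model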


Test of lack of fit: example

Three steps

1) Linear regression

2) One-way ANOVA

Dep Var: HPLC   N: 18

Analysis of Variance

Source     Sum-of-Squares   df   Mean-Square   F-ratio   P
CONCENT    121251.776        5   24250.355     289.434   0.000
Error        1005.427       12      83.786

3) The test of lack of fit is not significant: we keep the straight line
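
The third step can be checked by hand from the numbers reported in the slides. The sketch below assumes the usual lack-of-fit F ratio; the residual sum of squares of the straight line is recovered from the rounded standard error of estimate (8.282), so the result is approximate.

```python
# Values reported in the slides
n = 18
see = 8.282                        # standard error of estimate, straight line
ss_pure, df_pure = 1005.427, 12    # "Error" line of the one-way ANOVA

# Step 1: residual sum of squares of the straight line (n - 2 df)
df_res = n - 2
ss_res = see ** 2 * df_res

# Step 3: lack of fit = residual of the line minus pure error
ss_lof = ss_res - ss_pure
df_lof = df_res - df_pure
f_ratio = (ss_lof / df_lof) / (ss_pure / df_pure)

print(round(ss_res, 1), round(ss_lof, 1), df_lof, round(f_ratio, 2))
```

The F ratio is well below 1, so the departure from linearity is smaller than the pure error and the straight line is kept.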


Checking the variance model: homoscedasticity

Scatterplot of residuals vs fitted values

[Figure: two plots of residuals around 0 against fitted values]

  • No structure in the residuals but heteroscedasticity: change the model (criterion)

  • Homoscedasticity: OK


What to do with heteroscedasticity?

Scatterplot of residuals vs fitted values: model the dispersion.

[Figure: residuals around 0 against fitted values, with increasing spread]

The standard deviation of the residuals increases with $\hat{Y}$: it increases with x


What to do with heteroscedasticity?

Estimate the slope and the intercept again, but with weights inversely proportional to the variance:

$\sum_i w_i \left(Y_i - (a + b x_i)\right)^2$ minimum, with $w_i \propto 1 / \mathrm{Var}(\varepsilon_i)$

and check that the weighted residuals $\sqrt{w_i}\,\left(Y_i - (\hat{a} + \hat{b} x_i)\right)$ are homoscedastic
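
A minimal sketch of weighted least squares for a straight line follows. The data are hypothetical, and the weights 1/x² rest on the assumption that the standard deviation of the residuals is proportional to x; another dispersion model would give other weights.

```python
import math

def weighted_least_squares(x, y, w):
    """Minimise sum of w_i * (y_i - a - b x_i)^2; returns (a, b)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted means
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat

# Hypothetical data whose spread grows with x
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [5.2, 6.8, 11.5, 18.1, 37.0]
w = [1.0 / xi ** 2 for xi in x]     # assumption: SD proportional to x

a_w, b_w = weighted_least_squares(x, y, w)

# Weighted residuals: these are the ones to inspect for homoscedasticity
weighted_residuals = [
    math.sqrt(wi) * (yi - (a_w + b_w * xi)) for wi, xi, yi in zip(w, x, y)
]
print(a_w, b_w, weighted_residuals)
```

With equal weights the function reduces to ordinary least squares, which gives a quick sanity check of the implementation.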


Checking the variance model: normality

[Figure: two probability plots of the residuals against the expected values for a normal distribution]

  • No curvature: normality

  • Curvature: non-normality. Is it so important?
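
The coordinates of a probability plot can be computed as sketched below. The plotting positions (i - 0.5)/n are one common convention among several, and the residuals are hypothetical values used only for illustration.

```python
from statistics import NormalDist

def normal_scores(n):
    """Expected standard normal quantiles at positions (i - 0.5) / n."""
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Hypothetical residuals, for illustration only
residuals = [0.04, -0.13, 0.20, -0.17, 0.06]

scores = normal_scores(len(residuals))
# Pair the i-th smallest residual with the i-th expected normal value
pplot_points = list(zip(scores, sorted(residuals)))
for expected, observed in pplot_points:
    print(round(expected, 3), observed)
```

If the points lie close to a straight line, there is no sign of non-normality; a marked curvature suggests the opposite.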


What to do with non-normality?

Try to model the distribution of the residuals

In general, this is difficult with few observations

If enough observations are available, non-normality does not affect the results too much.


An interesting index: R²

R² = squared correlation coefficient

= % of the dispersion of the $Y_i$'s explained by the straight line (the model)

0 ≤ R² ≤ 1

If R² = 1, all the $\hat{\varepsilon}_i$ = 0: the straight line explains all the variation of the $Y_i$'s

If R² = 0, the slope is 0: the straight line does not explain any variation of the $Y_i$'s
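
R² can be computed as one minus the ratio of the residual sum of squares to the total sum of squares of the Y's. The sketch below uses hypothetical data sets to illustrate the two extreme cases and an intermediate one.

```python
def r_squared(x, y):
    """R² of the least squares straight line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    ss_res = sum((yi - (a_hat + b_hat * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical data sets, for illustration only
r2_perfect = r_squared([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # points on a line
r2_none = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 1.0])      # estimated slope is 0
r2_typical = r_squared([1.0, 2.0, 3.0, 4.0, 5.0], [3.1, 4.9, 7.2, 8.8, 11.0])
print(r2_perfect, r2_none, r2_typical)
```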


An interesting index: R²

R² and R (correlation coefficient) are not designed to measure linearity!

Example:

Multiple R: 0.990

Squared multiple R: 0.980

Adjusted squared multiple R: 0.980
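
A quick numerical illustration of this warning, on hypothetical data that are exactly quadratic (Y = x² for x = 1 to 10): a straight line still gives a high R², although the relationship is not linear at all.

```python
def r_squared(x, y):
    """R² of the least squares straight line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    ss_res = sum((yi - (a_hat + b_hat * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical, purely quadratic data
x = [float(i) for i in range(1, 11)]
y = [xi ** 2 for xi in x]

r2_curve = r_squared(x, y)
print(round(r2_curve, 2))
```

R² is close to 0.95 here, so a high value alone says nothing about whether the straight line is the right mean model; the residual plot does.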


Questions

  • How to obtain the best straight line?

  • Is this straight line the best curve to use?

  • How to use this straight line?


How to use this straight line?

  • Direct use: for a given x

    • predict the mean Y

    • construct a confidence interval of the mean Y

    • construct a prediction interval of Y

  • Reverse use (calibration, approximate results): for a given Y

    • predict the mean X

    • construct a confidence interval of the mean X

    • construct a prediction interval of X



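Confidence interval of the mean Y

$\hat{a} + \hat{b} x \pm t_{1-\alpha/2,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$, where $s$ is the standard error of estimate

There is a probability $1-\alpha$ that $a + b x$ belongs to this interval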




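Prediction interval of Y

$\hat{a} + \hat{b} x \pm t_{1-\alpha/2,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$, where $s$ is the standard error of estimate

100(1-α)% of the measurements carried out for this x belong to this interval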
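
The confidence interval of the mean Y and the prediction interval of Y can be computed side by side. This sketch assumes the standard formulas for a straight line fitted by least squares; the data are hypothetical, and the Student quantile is taken from a table rather than computed.

```python
import math

def fit_line(x, y):
    """Least squares fit; returns what the intervals need."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    ss_res = sum((yi - (a_hat + b_hat * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(ss_res / (n - 2))         # standard error of estimate
    return a_hat, b_hat, s, x_bar, sxx, n

def intervals(x0, fit, t_quantile):
    """Confidence interval of the mean Y and prediction interval of Y at x0."""
    a_hat, b_hat, s, x_bar, sxx, n = fit
    centre = a_hat + b_hat * x0
    h = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    ci_half = t_quantile * s * math.sqrt(h)
    pi_half = t_quantile * s * math.sqrt(1.0 + h)
    return (centre - ci_half, centre + ci_half), (centre - pi_half, centre + pi_half)

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
T_975_DF3 = 3.182    # tabulated Student quantile: 0.975, n - 2 = 3 df

fit = fit_line(x, y)
ci, pi = intervals(3.0, fit, T_975_DF3)
print(ci, pi)
```

The prediction interval is always wider than the confidence interval, because it also accounts for the residual of a single new measurement.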




Reverse use: for a given Y = y0, predict the mean X

Example :
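
As an illustration of the reverse use, the estimated line can simply be solved for x. The coefficients below are those of the HPLC output shown earlier; the observed response y0 = 300 is a hypothetical value chosen for the example.

```python
# Coefficients from the HPLC output in the slides
a_hat = 20.046     # intercept (CONSTANT)
b_hat = 2.916      # slope (CONCENT)

def predict_x(y0, a, b):
    """Reverse use: solve y0 = a + b * x for x."""
    return (y0 - a) / b

y0 = 300.0                         # hypothetical observed HPLC response
x0_hat = predict_x(y0, a_hat, b_hat)
print(round(x0_hat, 1))
```

Plugging the predicted concentration back into the estimated line returns y0, which is a simple check of the calculation.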


For a given Y = y0, a confidence interval of the mean X

[Figure: plot against X showing the level Y0 and the limits L and U on the X axis]


Confidence interval of the mean X

There is a probability $1-\alpha$ that the mean X belongs to [L, U]

L and U are the values of X at which the confidence limits of the mean Y are equal to y0
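
One way to obtain L and U numerically is to look for the values of X at which the confidence limits of the mean Y cross y0. The sketch below does this by bisection. It assumes a positive and clearly significant slope (so that both limits increase with X), hypothetical data, a tabulated Student quantile and an arbitrary search range of 10 units on each side of the point estimate.

```python
import math

def fit_line(x, y):
    """Least squares fit; returns what the confidence band needs."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    ss_res = sum((yi - (a_hat + b_hat * xi)) ** 2 for xi, yi in zip(x, y))
    return a_hat, b_hat, math.sqrt(ss_res / (n - 2)), x_bar, sxx, n

def band(x0, fit, t_quantile, sign):
    """Upper (sign=+1) or lower (sign=-1) confidence limit of the mean Y at x0."""
    a_hat, b_hat, s, x_bar, sxx, n = fit
    half = t_quantile * s * math.sqrt(1.0 / n + (x0 - x_bar) ** 2 / sxx)
    return a_hat + b_hat * x0 + sign * half

def bisect(f, lo, hi, tol=1e-10):
    """Root of an increasing function f with f(lo) < 0 < f(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
T_975_DF3 = 3.182                      # tabulated Student quantile, 3 df
fit = fit_line(x, y)

y0 = 7.0
x0_hat = (y0 - fit[0]) / fit[1]        # point estimate of X
# The upper limit reaches y0 at L, the lower limit reaches y0 at U
L = bisect(lambda v: band(v, fit, T_975_DF3, +1) - y0, x0_hat - 10, x0_hat + 10)
U = bisect(lambda v: band(v, fit, T_975_DF3, -1) - y0, x0_hat - 10, x0_hat + 10)
print(L, x0_hat, U)
```

The interval is not symmetric around the point estimate in general, which is one reason why the reverse use gives only approximate results.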



What you should no longer believe

One can fit the straight line by inverting x and Y

If the correlation coefficient is high, the straight line is the best model

Normality of the $x_i$'s is required to perform a regression

Normality of the $\varepsilon_i$'s is essential to perform a good regression

