# Simple Linear Regression

## PowerPoint Slideshow about 'Simple Linear Regression' - teneil


Data available: (X, Y)

Goal: To predict the response Y (i.e., to obtain the fitted response function f(X)).

How to determine this regression function? (We need to estimate the parameters.)

Least Squares Fitting Method

### Least Squares Regression Function

Least Squares Estimates

Terminology

Fitted model

True model

Fitted regression function
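
For reference, the terms listed above can be written out in the standard simple linear regression setup. This is a sketch under the usual assumptions; the symbols $\beta_0, \beta_1, b_0, b_1, \varepsilon_i, e_i$ are notation chosen here and are not taken from the slides.

$$
\begin{aligned}
\text{True model:}\quad & Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, \dots, n \\
\text{Least squares criterion:}\quad & \min_{b_0,\, b_1} \sum_{i=1}^{n} \left( Y_i - b_0 - b_1 X_i \right)^2 \\
\text{Least squares estimates:}\quad & b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2}, \qquad b_0 = \bar Y - b_1 \bar X \\
\text{Fitted model:}\quad & Y_i = b_0 + b_1 X_i + e_i \\
\text{Fitted regression function:}\quad & \hat Y = b_0 + b_1 X
\end{aligned}
$$

Here $e_i = Y_i - \hat Y_i$ is the residual for observation $i$.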

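| Obs | MIDTERM | FINAL |
|---|---|---|
| 1 | 68 | 75 |
| 2 | 49 | 63 |
| 3 | 60 | 57 |
| 4 | 68 | 88 |
| 5 | 97 | 88 |
| 6 | 82 | 79 |
| 7 | 59 | 82 |
| 8 | 50 | 73 |
| 9 | 73 | 90 |
| 10 | 39 | 62 |
| 11 | 71 | 70 |
| 12 | 95 | 96 |
| 13 | 61 | 76 |
| 14 | 72 | 75 |
| 15 | 87 | 85 |
| 16 | 40 | 40 |
| 17 | 66 | 74 |
| 18 | 58 | 70 |
| 19 | 58 | 75 |
| 20 | 77 | 72 |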

Figure 1.4 SAS PROC PRINT output for the grade data problem.

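/* Read the grade data (the rows elided on the slide are left elided) */
DATA;
  INPUT MIDTERM FINAL;
  CARDS;
68 75
49 63
60 57
. .
77 72
;

/* List the data (Figure 1.4) */
PROC PRINT;

/* Regress FINAL on MIDTERM; keep predicted values and residuals */
PROC REG;
  MODEL FINAL=MIDTERM / P;
  OUTPUT PREDICTED=PRED
         RESIDUAL=RESID;

/* Overlay observed (O) and predicted (P) FINAL against MIDTERM */
PROC PLOT;
  PLOT FINAL*MIDTERM='O' PRED*MIDTERM='P' / OVERLAY;
  LABEL FINAL='FINAL';

/* Normal scores of the residuals */
PROC RANK NORMAL=VW;
  VAR RESID;
  RANKS NSCORE;

/* Normal probability plot of the residuals */
PROC PLOT;
  PLOT RESID*NSCORE='R';
  LABEL NSCORE='NORMAL SCORE';
RUN;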

Model: MODEL1

Dependent Variable: FINAL

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 1 | 1774.44117 | 1774.44117 | 24.26 | 0.0001 |
| Error | 18 | 1316.55883 | 73.14216 | | |
| Corrected Total | 19 | 3091.00000 | | | |

| Statistic | Value |
|---|---|
| Root MSE | 8.55232 |
| Dependent Mean | 74.50000 |
| Coeff Var | 11.47962 |
| R-Square | 0.5741 |
| Adj R-Sq | 0.5504 |

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | 1 | 34.56757 | 8.32984 | 4.15 | 0.0006 |
| MIDTERM | 1 | 0.60049 | 0.12192 | 4.93 | 0.0001 |

| Obs | Dep Var FINAL | Predicted Value | Residual |
|---|---|---|---|
| 1 | 75.0000 | 75.4007 | -0.4007 |
| 2 | 63.0000 | 63.9915 | -0.9915 |
| 3 | 57.0000 | 70.5968 | -13.5968 |
| 4 | 88.0000 | 75.4007 | 12.5993 |
| 5 | 88.0000 | 92.8149 | -4.8149 |
| 6 | 79.0000 | 83.8076 | -4.8076 |
| 7 | 82.0000 | 69.9963 | 12.0037 |
| 8 | 73.0000 | 64.5920 | 8.4080 |
| 9 | 90.0000 | 78.4032 | 11.5968 |
| 10 | 62.0000 | 57.9866 | 4.0134 |
| 11 | 70.0000 | 77.2022 | -7.2022 |
| 12 | 96.0000 | 91.6139 | 4.3861 |
| 13 | 76.0000 | 71.1973 | 4.8027 |
| 14 | 75.0000 | 77.8027 | -2.8027 |
| 15 | 85.0000 | 86.8100 | -1.8100 |
| 16 | 40.0000 | 58.5871 | -18.5871 |
| 17 | 74.0000 | 74.1998 | -0.1998 |
| 18 | 70.0000 | 69.3959 | 0.6041 |
| 19 | 75.0000 | 69.3959 | 5.6041 |
| 20 | 72.0000 | 80.8051 | -8.8051 |

| Statistic | Value |
|---|---|
| Sum of Residuals | 0 |
| Sum of Squared Residuals | 1316.55883 |
| Predicted Residual SS (PRESS) | 1668.47241 |
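
The PROC REG estimates above can be cross-checked by hand. The following is a minimal Python sketch, not part of the original slides, that applies the usual closed-form least squares formulas to the 20 grade pairs from Figure 1.4. The function name `fit_simple_ols` and the variable names are illustrative choices.

```python
# Grade data as listed in Figure 1.4 (MIDTERM, FINAL), observations 1..20.
midterm = [68, 49, 60, 68, 97, 82, 59, 50, 73, 39,
           71, 95, 61, 72, 87, 40, 66, 58, 58, 77]
final = [75, 63, 57, 88, 88, 79, 82, 73, 90, 62,
         70, 96, 76, 75, 85, 40, 74, 70, 75, 72]


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least squares line y = b0 + b1*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


b0, b1 = fit_simple_ols(midterm, final)

# Fitted values and residuals, as in the "Predicted Value" / "Residual" columns.
fitted = [b0 + b1 * xi for xi in midterm]
resid = [yi - fi for yi, fi in zip(final, fitted)]

n = len(final)
ybar = sum(final) / n
sse = sum(e * e for e in resid)              # Sum of Squared Residuals
sst = sum((yi - ybar) ** 2 for yi in final)  # Corrected Total sum of squares
r_square = 1 - sse / sst
root_mse = (sse / (n - 2)) ** 0.5            # error DF = n - 2 = 18

print(f"Intercept {b0:.5f}  MIDTERM {b1:.5f}")
print(f"SSE {sse:.5f}  R-Square {r_square:.4f}  Root MSE {root_mse:.5f}")
```

The printed values can be compared line by line with the Parameter Estimates and Analysis of Variance entries in the listing.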

[Line-printer plot: FINAL ('o') and predicted value ('p') against MIDTERM, with MIDTERM from 30 to 100 and FINAL from 40 to 100. NOTE: 6 obs hidden.]

Figure 1.6 Output for the first PROC PLOT step for the grade data problem.

[Line-printer plot: Residual ('R') against Predicted Value of FINAL, with a horizontal reference line at 0.]

Figure 1.7 The remainder of the output from the first PROC PLOT step.

[Line-printer plot: Residual ('R') against NORMAL SCORE, with normal scores from -2.0 to 2.0.]
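
The NSCORE variable on the horizontal axis comes from `PROC RANK NORMAL=VW`, which turns ranks into van der Waerden normal scores, $\Phi^{-1}(r_i / (n + 1))$. A minimal Python sketch of that computation follows, applied to the 20 residuals from the listing. The function name `vw_normal_scores` is illustrative, and averaging tied ranks is an assumption (the residuals here contain no ties).

```python
from statistics import NormalDist

# Residuals copied from the PROC REG listing, observations 1..20.
resid = [-0.4007, -0.9915, -13.5968, 12.5993, -4.8149, -4.8076, 12.0037,
         8.4080, 11.5968, 4.0134, -7.2022, 4.3861, 4.8027, -2.8027,
         -1.8100, -18.5871, -0.1998, 0.6041, 5.6041, -8.8051]


def vw_normal_scores(values):
    """Van der Waerden scores: inverse normal CDF of rank / (n + 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


nscore = vw_normal_scores(resid)

# Pairs (normal score, residual) sorted by score: the points of the plot.
for s, e in sorted(zip(nscore, resid)):
    print(f"{s:7.3f}  {e:9.4f}")
```

If the residuals are roughly normal, the printed pairs fall close to a straight line.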

* Pearson's Correlation Coefficient

Goal: To measure the degree of linear correlation between two variables. The range lies between -1 and 1.

* Coefficient of Determination: the fraction of the variance in y that is explained by regression on x.

Goal: May be used as an index of linearity for the relation of y to x.

Definition:
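
As a sketch of the standard definitions (the notation $r$, $R^2$, SSR, SSE and SST is assumed here, not taken from the slides):

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sqrt{\sum_{i=1}^{n} (x_i - \bar x)^2 \, \sum_{i=1}^{n} (y_i - \bar y)^2}},
\qquad
R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}
$$

In simple linear regression $R^2 = r^2$. For the grade data, the PROC REG listing gives

$$
R^2 = \frac{1774.44117}{3091.00000} \approx 0.5741,
\qquad
r = +\sqrt{0.5741} \approx 0.758,
$$

where the positive root is taken because the fitted slope (0.60049) is positive.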

[Line-printer plot: air pressure ('o') against VOLUME, with VOLUME from 10 to 50 and pressure from 20 to 120.]

Figure 3.3 A plot of the air pressure data (an example of residual analysis).

[Line-printer plot: Residual ('*') against Predicted Value of P, with a horizontal reference line at 0.]

Figure 3.4 The residual on fit plot after fitting the model P = a + bV + e to the air pressure data.

[Line-printer plot: Residual ('*') against Predicted Value of P, with a horizontal reference line at 0.]

Figure 3.5 The residual on fit plot using the model P = a + b/V + e for the air pressure data.

Weighted Regression

Problem: unequal variance

Model:

Claim: minimize

Ordinary Regression

Model:

Claim: minimize

How to determine the weights?

The optimal weights are inversely proportional to the variances of the y values.
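
The contrast between the two criteria can be sketched as follows, assuming the usual straight-line model with independent errors. The symbols $\alpha$, $\beta$, $a$, $b$, $\sigma_i^2$ and $w_i$ are notation chosen here, not recovered from the slides.

$$
\begin{aligned}
\text{Ordinary regression:}\quad & Y_i = \alpha + \beta x_i + \varepsilon_i, \quad \operatorname{Var}(\varepsilon_i) = \sigma^2,
&& \text{minimize } \sum_{i=1}^{n} (Y_i - a - b x_i)^2 \\
\text{Weighted regression:}\quad & Y_i = \alpha + \beta x_i + \varepsilon_i, \quad \operatorname{Var}(\varepsilon_i) = \sigma_i^2,
&& \text{minimize } \sum_{i=1}^{n} w_i (Y_i - a - b x_i)^2
\end{aligned}
$$

Dividing the weighted model through by $\sigma_i$ gives errors $\varepsilon_i / \sigma_i$ with constant variance 1, so ordinary least squares applies to the rescaled model. Its criterion is $\sum_i (Y_i - a - b x_i)^2 / \sigma_i^2$, which is the weighted criterion with

$$
w_i \propto \frac{1}{\sigma_i^2}.
$$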

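/* Read the air pressure data and form the reciprocal of volume */
DATA;
  INPUT V P;
  VI=1/V;
  CARDS;
48 29.1
.
.
.
12 117.6
;

/* Step 1: ordinary least squares fit of P on 1/V; keep the fitted values */
PROC REG;
  MODEL P=VI;
  OUTPUT P=LSFIT;

/* Step 2: compute the weights from the first-stage fit */
DATA;
  SET;
  W=1/LSFIT;

/* Step 3: weighted least squares fit */
PROC REG;
  MODEL P=VI;
  WEIGHT W;
  OUTPUT P=FIT R=RES;

/* Weighted residuals */
DATA;
  SET;
  WRES=SQRT(W)*RES;

/* Normal scores of the weighted residuals */
PROC RANK NORMAL=VW;
  VAR WRES;
  RANKS NSCORE;

/* Weighted residual plot and normal probability plot */
PROC PLOT;
  PLOT WRES*FIT='*' / VREF=0 VPOS=30;
  PLOT WRES*NSCORE='*' / VPOS=30;
  LABEL WRES='WEIGHTED RESIDUAL' NSCORE='NORMAL SCORE';
RUN;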
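
The same two-stage procedure (ordinary fit of P on 1/V, weights W = 1/LSFIT, weighted refit, then weighted residuals SQRT(W)*RES) can be sketched in Python. The air pressure data are elided on the slide, so the six (volume, pressure) pairs below are made-up illustrative values and not the original data. The function name `wls_fit` is also an illustrative choice.

```python
def wls_fit(x, y, w):
    """Return (a, b) minimizing sum(w_i * (y_i - a - b * x_i) ** 2)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


# Made-up illustrative values (NOT the slide's air pressure data).
volume = [48.0, 40.0, 32.0, 24.0, 16.0, 12.0]
pressure = [29.3, 35.2, 43.5, 58.6, 87.1, 117.2]
vi = [1.0 / v for v in volume]  # VI = 1/V

# Step 1: ordinary least squares is the special case of equal weights.
a_ols, b_ols = wls_fit(vi, pressure, [1.0] * len(vi))
lsfit = [a_ols + b_ols * x for x in vi]

# Step 2: weights from the first-stage fitted values, W = 1/LSFIT.
w = [1.0 / f for f in lsfit]

# Step 3: weighted fit, residuals, and weighted residuals SQRT(W)*RES.
a_wls, b_wls = wls_fit(vi, pressure, w)
fit = [a_wls + b_wls * x for x in vi]
res = [p - f for p, f in zip(pressure, fit)]
wres = [wi ** 0.5 * e for wi, e in zip(w, res)]

print(f"OLS: a = {a_ols:.4f}, b = {b_ols:.2f}")
print(f"WLS: a = {a_wls:.4f}, b = {b_wls:.2f}")
```

Plotting `wres` against `fit` corresponds to the weighted residual plot. The weighted residuals are used because, if the weights are right, they should show roughly constant spread.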

[Line-printer plot: WEIGHTED RESIDUAL ('*') against Predicted Value of P, with a horizontal reference line at 0.]

Figure 3.13 Weighted residual plot for a weighted fit of the model P = a + b/V + e to the air pressure data.

[Line-printer plot: Residual ('*') against Predicted Value of PT, with a horizontal reference line at 0.]

Figure 3.17 Residual on fit plot for the model -1/P = α + BV + e in the air pressure data.

[Line-printer plot: Residual ('*') against NORMAL SCORE, with normal scores from -2 to 2.]

Figure 3.18 Residual normal probability plot for the model -1/P = α + BV + e in the air pressure data.

[Line-printer plot: Residual ('*') against Predicted Value of PT, with a horizontal reference line at 0.]

Figure 3.19 Residual on fit plot for the model -1/P = α + BV + e in Example 3.4 after deleting the first data point.

[Line-printer plot: Residual ('*') against NORMAL SCORE, with normal scores from -2 to 2.]

Figure 3.20 Residual normal probability plot for the model -1/P = α + BV + e in Example 3.4 after deleting the first data point.

How to determine the weights of transformation T

such that

(assuming T is monotonic increasing)