Chapter 6


# Chapter 6 - PowerPoint PPT Presentation



## PowerPoint Slideshow about ' Chapter 6' - conner

Presentation Transcript

### Chapter 6

Exercise 1

X=c(5,8,9,7,14)

Y=c(3,1,6,7,19)

R function ols(x,y) returns

(Intercept) -8.477876

x (slope) 1.823009

mean(x)=8.6, mean(y)=7.2
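The fit can be checked in base R. The sketch below uses lm() as a stand-in for ols(), which is assumed here to come from the book's companion functions rather than base R. It also computes the slope and intercept from the closed-form least squares formulas and confirms that the fitted line passes through the point of means (8.6, 7.2).

```r
# Exercise 1 data
x <- c(5, 8, 9, 7, 14)
y <- c(3, 1, 6, 7, 19)

# Least squares fit with base R (stand-in for ols(x, y))
fit <- lm(y ~ x)
b <- unname(coef(fit))   # b[1] = intercept, b[2] = slope

# Closed-form least squares estimates for comparison
slope <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
intercept <- mean(y) - slope * mean(x)

print(c(intercept = intercept, slope = slope))

# The fitted line evaluated at mean(x) equals mean(y)
print(intercept + slope * mean(x))
```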

Exercise 2

X=c(5,8,9,7,14)

Y=c(3,1,6,7,19)

Exercise 3

X=c(5,8,9,7,14)

Y=c(3,1,6,7,19)

The sum of squared residuals will be larger for this line than for the LSR line, because the LSR line is, by construction, the line that minimizes the sum of squared residuals.
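A small numerical check makes the point concrete. In the sketch below, the alternative line (intercept 0, slope 1) is a hypothetical choice for illustration and is not taken from the exercise; lm() stands in for ols().

```r
x <- c(5, 8, 9, 7, 14)
y <- c(3, 1, 6, 7, 19)

# Sum of squared residuals for the line y = b0 + b1 * x
ssr <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

# Least squares line
b <- unname(coef(lm(y ~ x)))
ssr_ls <- ssr(b[1], b[2])

# Hypothetical alternative line (an assumption, not from the exercise)
ssr_alt <- ssr(0, 1)

print(c(least_squares = ssr_ls, alternative = ssr_alt))
```

Any other choice of intercept and slope gives a sum of squared residuals at least as large as the least squares value.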

Exercise 5

a=c(3,104,50,9,68,29,74,11,18,39,0,56,54,77,14,32,34,13,96,84,5,4,18,76,34,14,9,28,7,11,21,30,26,2,11,12,6,3,3,47,19,2,25,37,11,14,0)

b=c(0,5,0,0,0,6,0,1,1,2,17,0,3,6,4,2,4,2,0,0,13,9,1,4,2,0,4,0,4,6,4,4,1,6,6,13,3,1,0,3,1,6,1,0,2,11,3)

The R function ols(a,b) returns

(Intercept) 4.58061839

x (slope) -0.04051423

Exercise 6

c=c(300,280,305,340,348,357,380,397,453,456,510,535,275,270,335,342,354,394,383,450,446,513,520,520)

d=c(32.75,28,30.75,29,27,31.20,27,27,23.50,21,21.5,22.8,30.75,27.25,31,26.50,23.50,22.70,25.80,27.80,21.50,22.50,20.60,21)

ols(c,d) yields a negative slope: higher levels of solar radiation predict lower rates of cancer.

Exercise 7

a=c(500,530,590,660,610,700,570,640)

b=c(2.3,3.1,2.6,3.0,2.4,3.3,2.6,3.5)

R function ols(a,b) returns

(Intercept) 0.484615385

x (slope) 0.003942308

Exercise 8

R function ols(a,b) returns

\$coef

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.484615385 1.289275061 0.3758821 0.7199360

x 0.003942308 0.002137246 1.8445735 0.1146492

\$Ftest.p.value

value

0.1146492

This means that SAT accounts for about 36% of the variance in GPA, which gives an indication of the strength of the association.

\$R.squared

[1] 0.3618685
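For a single predictor, R squared equals the squared Pearson correlation, and also 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below checks both with base R's lm() in place of ols(). The variable names sat and gpa are chosen here for readability (the slides call them a and b), based on the interpretation given above.

```r
# Exercise 7/8 data (a and b in the slides)
sat <- c(500, 530, 590, 660, 610, 700, 570, 640)
gpa <- c(2.3, 3.1, 2.6, 3.0, 2.4, 3.3, 2.6, 3.5)

fit <- lm(gpa ~ sat)

# Three equivalent ways to obtain R squared in simple regression
r2_fit <- summary(fit)$r.squared
r2_cor <- cor(sat, gpa)^2
r2_manual <- 1 - sum(resid(fit)^2) / sum((gpa - mean(gpa))^2)

print(c(r2_fit, r2_cor, r2_manual))
```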

Exercise 9

x=c(40,41,42,43,44,45,46)

y=c(1.62,1.63,1.90,2.64,2.05,2.13,1.94)

ols(x,y)

\$coef

Estimate Std. Error t value Pr(>|t|)

(Intercept) -1.25321429 2.73157319 -0.4587885 0.6656396

x 0.07535714 0.06345636 1.1875429 0.2883482

\$Ftest.p.value

value

0.2883482

\$R.squared

[1] 0.2200002

Exercise 10

c=c(300,280,305,340,348,357,380,397,453,456,510,535,275,270,335,342,354,394,383,450,446,513,520,520)

d=c(32.75,28,30.75,29,27,31.20,27,27,23.50,21,21.5,22.8,30.75,27.25,31,26.50,23.50,22.70,25.80,27.80,21.50,22.50,20.60,21)

ols(c,d) yields

(Intercept) 39.99094634

x (slope) -0.03565283

600 exceeds the range of the observed X values, so the prediction is based on extrapolation. The relationship between the variables may change at extreme values.
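The extrapolation concern can be made explicit by comparing the new X value with the observed range before predicting. This is a minimal sketch using base R's lm() and predict() rather than ols(). The names radiation and rate are introduced here for readability (the slides use c and d), following the interpretation in Exercise 6.

```r
# Exercise 10 data (c and d in the slides)
radiation <- c(300, 280, 305, 340, 348, 357, 380, 397, 453, 456, 510, 535,
               275, 270, 335, 342, 354, 394, 383, 450, 446, 513, 520, 520)
rate <- c(32.75, 28, 30.75, 29, 27, 31.20, 27, 27, 23.50, 21, 21.5, 22.8,
          30.75, 27.25, 31, 26.50, 23.50, 22.70, 25.80, 27.80, 21.50, 22.50,
          20.60, 21)

fit <- lm(rate ~ radiation)

# Flag a prediction as extrapolation when x_new lies outside the data range
x_new <- 600
outside <- x_new < min(radiation) || x_new > max(radiation)

pred <- unname(predict(fit, newdata = data.frame(radiation = x_new)))

print(range(radiation))
print(c(prediction = pred, extrapolation = outside))
```

The model will return a number for any X, so a range check of this kind has to be done by the analyst; lm() itself does not warn.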

Exercise 11

mou=c(63.3,60.1,53.6,58.8,67.5,62.5)

time=c(241.5,249.8,246.1,232.4,237.2,238.4)

R function cor.test(mou,time) returns Pearson's product-moment correlation

t = -0.7872, df = 4, p-value = 0.4752

sample estimates: cor -0.3662634

There is insufficient evidence to conclude that the correlation is different from 0.

> qt(0.975,4): [1] 2.776445

pt(-0.7872,4): [1] 0.2375939; for a two-tailed test, 0.2376*2 = 0.475

p > 0.05

t = -0.79 does not exceed the critical value of 2.776 in absolute value.
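The test statistic reported by cor.test() can be reproduced from r alone, since t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The sketch below recomputes t, the two-sided p-value, and the critical value with base R functions only.

```r
mou <- c(63.3, 60.1, 53.6, 58.8, 67.5, 62.5)
time <- c(241.5, 249.8, 246.1, 232.4, 237.2, 238.4)

n <- length(mou)
r <- cor(mou, time)

# t statistic for testing that the correlation is 0
t_stat <- r * sqrt((n - 2) / (1 - r^2))

# Two-sided p-value and 0.05-level critical value on n - 2 df
p_two <- 2 * pt(-abs(t_stat), df = n - 2)
crit <- qt(0.975, df = n - 2)

print(c(r = r, t = t_stat, p = p_two, critical = crit))
```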

Exercise 12

x=c(1,2,3,4,5,6)

y=c(1,4,7,7,4,1)

ols(x,y) returns

(Intercept) 4.000000e+00

x (slope) -5.838669e-16 (zero up to floating-point rounding error)

The data are consistent with an inverted U shape rather than with the linear model. There might be an association here that is not detected.
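One way to see the undetected association is to add a squared term. The sketch below compares a straight-line fit with a quadratic fit using base R's lm(). The quadratic model is an illustrative choice made here, not something the exercise prescribes.

```r
x <- c(1, 2, 3, 4, 5, 6)
y <- c(1, 4, 7, 7, 4, 1)

# Straight-line fit: slope and R squared are essentially zero
lin <- lm(y ~ x)
slope_lin <- unname(coef(lin)[2])
r2_lin <- summary(lin)$r.squared

# Quadratic fit: captures the inverted U
quad <- lm(y ~ x + I(x^2))
r2_quad <- summary(quad)$r.squared

print(c(slope_linear = slope_lin, r2_linear = r2_lin, r2_quadratic = r2_quad))
```

A slope of zero therefore says only that there is no linear trend, not that X and Y are unrelated.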

Exercise 13

x=c(1,2,3,4,5,6)

y=c(4,5,6,7,8,2)

The LSR slope is again 0, even though there is a clear linear trend in the data. The trend is masked by a single outlier, the point (6, 2).
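The influence of that one point can be checked by refitting without it. This sketch uses base R's lm() in place of ols(); dropping the last observation is done here only to illustrate the sensitivity of least squares, not as a recommended analysis.

```r
x <- c(1, 2, 3, 4, 5, 6)
y <- c(4, 5, 6, 7, 8, 2)

# Least squares slope with all six points
slope_all <- unname(coef(lm(y ~ x))[2])

# Least squares slope after dropping the last point, (6, 2)
keep <- seq_along(x) != 6
slope_wo <- unname(coef(lm(y[keep] ~ x[keep]))[2])

print(c(all_points = slope_all, without_outlier = slope_wo))
```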

Exercise 14

The nature of the relationship between two variables can vary with the predictor value. In other words, the association between Y and X can change as a function of X values. Extrapolating beyond the data range, therefore, can be problematic, even when the association appears to be linear. In non-linear associations, the LSR line can be misleading.

Exercise 15

age=c(5.2,8.8,10.5,10.6,10.4,1.8,12.7,15.6,5.8,1.9,2.2,4.8,7.9,5.2,0.9,11.8,7.9,1.5,10.6,8.5,11.1,12.8,11.3,1,14.5,11.9,8.1,13.8,15.5,9.8,11.0,14.4,11.1,5.1,4.8,4.2,6.9,13.2,9.9,12.5,13.2,8.9,10.8)

cpep=c(4.8,4.1,5.2,5.5,5,3.4,3.4,4.9,5.6,3.7,3.9,4.5,4.8,4.9,3.0,4.6,4.8,5.5,4.5,5.3,4.7,6.6,5.1,3.9,5.7,5.1,5.2,3.7,4.9,4.8,4.4,5.2,5.1,4.6,3.9,5.1,5.1,6.0,4.9,4.1,4.6,4.9,5.1)

R function: cor(age,cpep) returns:

[1] 0.3906776

R function: hc4test(age,cpep) returns:

\$test

[1] 4.705966

\$p.value

[1] 0.03005811

Thus, r = 0.39, and hc4test rejects at the 0.05 level.

Exercise 16

age=c(5.2,8.8,10.5,10.6,10.4,1.8,12.7,15.6,5.8,1.9,2.2,4.8,7.9,5.2,0.9,11.8,7.9,1.5,10.6,8.5,11.1,12.8,11.3,1,14.5,11.9,8.1,13.8,15.5,9.8,11.0,14.4,11.1,5.1,4.8,4.2,6.9,13.2,9.9,12.5,13.2,8.9,10.8)

cpep=c(4.8,4.1,5.2,5.5,5,3.4,3.4,4.9,5.6,3.7,3.9,4.5,4.8,4.9,3.0,4.6,4.8,5.5,4.5,5.3,4.7,6.6,5.1,3.9,5.7,5.1,5.2,3.7,4.9,4.8,4.4,5.2,5.1,4.6,3.9,5.1,5.1,6.0,4.9,4.1,4.6,4.9,5.1)

ols(age[age<7],cpep[age<7])

\$coef

Estimate Std. Error t value Pr(>|t|)

(Intercept) 3.5148814 0.37014633 9.495924 6.244186e-07

x 0.2474008 0.08924835 2.772049 1.689761e-02

C-peptide concentrations increase up to about age 7, and the regression line plateaus beyond that age. Using a single line or correlation to describe the relationship is therefore misleading.

ols(age[age>7],cpep[age>7])

\$coef

Estimate Std. Error t value Pr(>|t|)

(Intercept) 4.7535568 0.64125948 7.4128445 5.654828e-08

x 0.0132083 0.05550626 0.2379606 8.137083e-01

Exercise 17

size=c(2359,3397,1232,2608,4870,4225,1390,2028,3700,2949,688,3147,4000,4180,3883,1937,2565,2722,4231,1488,4261,1613,2746,1550,3000,1743,2388,4522)

price=c(510,690,365,592,1125,850,363,559,860,695,182,860,1050,675,859,435,555,525,805,369,930,375,670,290,715,365,610,1290)

R function ols(size,price) returns

(Intercept) 38.1921217

x (slope) 0.2153008

The conclusion here is that a home of size 0 would cost 38.192, which makes no sense.

This illustrates how non-linear relationships can make the regression line misleading. Extrapolation beyond the range of the data can be problematic.

Exercise 18

lot=c(18200,12900,10060,14500,76670,22800,10880,10880,23090,10875,3498,42689,17790,38330,18460,17000,15710,14180,19840,9150,40511,9060,15038,5807,16000,3173,24000,16600)

price=c(510,690,365,592,1125,850,363,559,860,695,182,860,1050,675,859,435,555,525,805,369,930,375,670,290,715,365,610,1290)

R function ols(lot,price) returns

Estimate Std. Error t value Pr(>|t|)

(Intercept) 436.83367567 66.609568133 6.558122 5.927679e-07

x (slope) 0.01104288 0.002754693 4.008752 4.569549e-04

Exercise 19

This would generally be the case when the relationship is linear and homoscedastic.

Exercise 20

x=c(18,20,35,16,12)

y=c(36,29,48,64,18)

R function ols(x,y) returns:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 25.3283679 23.774217 1.0653713 0.3648449

x 0.6768135 1.096856 0.6170485 0.5808715

\$Ftest.p.value: 0.5808715

R function cor.test(x,y) returns:

t = 0.617, df = 3, p-value = 0.5809

sample estimates: cor 0.3355929

Both analyses agree: neither is significant. X and Y can still be dependent in nonlinear ways, and power is limited with such a small sample size.
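The agreement is not a coincidence: with one predictor, the t test for the least squares slope and the t test for Pearson's correlation are algebraically the same test. The sketch below confirms this with base R's lm() and cor.test(), using lm() in place of ols().

```r
x <- c(18, 20, 35, 16, 12)
y <- c(36, 29, 48, 64, 18)

# p-value for the slope from the conventional least squares t test
fit <- lm(y ~ x)
p_slope <- summary(fit)$coefficients["x", "Pr(>|t|)"]

# p-value for Pearson's correlation
ct <- cor.test(x, y)
p_cor <- ct$p.value
r <- unname(ct$estimate)

print(c(p_slope = p_slope, p_cor = p_cor, r = r))
```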

Exercise 21

x=c(12.2,41,5.4,13,22.6,35.9,7.2,5.2,55,2.4,6.8,29.6,58.7)

y=c(1.8,7.8,0.9,2.6,4.1,6.4,1.3,0.9,9.1,0.7,1.5,4.7,8.2)

R function ols(x,y) returns

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.3269323 0.248122843 1.317623 2.144131e-01

x 0.1550843 0.008413901 18.431919 1.280856e-09

The estimate of the slope is 0.155 with an SE of 0.0084. With n = 13 pairs, the 0.975 quantile of T with n - 2 = 11 df is:

> qt(0.975,11)

[1] 2.200985

The scatterplot suggests that X and Y increase together, but situations arise in which the same confidence interval is obtained even though this is not the case.
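The 0.95 confidence interval for the slope is the estimate plus or minus the t quantile times the standard error. The sketch below computes it by hand and compares it with confint() from base R, using lm() in place of ols(). It takes the degrees of freedom to be n - 2 = 11 for the 13 pairs listed in this exercise, which is the value lm() itself reports.

```r
x <- c(12.2, 41, 5.4, 13, 22.6, 35.9, 7.2, 5.2, 55, 2.4, 6.8, 29.6, 58.7)
y <- c(1.8, 7.8, 0.9, 2.6, 4.1, 6.4, 1.3, 0.9, 9.1, 0.7, 1.5, 4.7, 8.2)

fit <- lm(y ~ x)
est <- summary(fit)$coefficients["x", "Estimate"]
se <- summary(fit)$coefficients["x", "Std. Error"]

# Residual degrees of freedom: n - 2 for one predictor plus an intercept
df <- df.residual(fit)
crit <- qt(0.975, df)

# Confidence interval by hand and via confint()
ci_manual <- c(est - crit * se, est + crit * se)
ci_builtin <- unname(confint(fit, "x", level = 0.95)[1, ])

print(df)
print(rbind(manual = ci_manual, builtin = ci_builtin))
```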

Exercise 22

x=c(34,49,49,44,66,48,49,39,54,57,39,65,43,43,44,42,71,40,41,38,42,77,40,38,43,42,36,55,57,57,41,66,69,38,49,51,45,141,133,76,44,40,56,50,75,44,181,45,61,15,23,42,61,146,144,89,71,83,49,43,68,57,60,56,63,136,49,57,64,43,71,38,74,84,75,64,48)

y=c(129,107,91,110,104,101,105,125,82,92,104,134,105,95,101,104,105,122,98,104,95,93,105,132,98,112,95,102,72,103,102,102,80,125,93,105,79,125,102,91,58,104,58,129,58,90,108,95,85,84,77,85,82,82,111,58,99,77,102,82,95,95,82,72,93,114,108,95,72,95,68,119,84,75,75,122,127)

R function ols(x,y) returns

\$coef

Estimate Std. Error t value Pr(>|t|)

(Intercept) 97.95728197 4.73432147 20.6908809 9.985891e-33

x (slope) -0.02136595 0.07096758 -0.3010664 7.641969e-01

qt(0.975,df=75)

[1] 1.99

Exercise 23

khomreg(size,price)

\$test

[1,] 6.115014

\$p.value

[1,] 0.01340384

khomreg(lot,price)

\$test

[1,] 0.1683221

\$p.value

[1,] 0.6816073

We reject for house size but not for lot size. This test may not have sufficient power to detect heteroscedasticity, so when we fail to reject, it is difficult to draw conclusions.

Exercise 24

ols(x,y)

Estimate Std. Error t value Pr(>|t|)

(Intercept) 65.46175413 18.4508380 3.5479014 0.000673844

x (slope) -0.05649584 0.1876524 -0.3010664 0.764196940

(The slope is close to 0, with p = 0.764; do not reject with OLS.)

\$Ftest.p.value

value

0.7641969 (the book has a typo)

rqfit(x,y)

\$coef

(Intercept) x

95.2000000 -0.4333333

\$ci

lower bd upper bd

(Intercept) 64.4610733 105.972735

x (slope) -0.5505706 -0.1450298 (the CI for the slope does not contain 0, so reject with rqfit)

ols(y,x)

regplot(y,x,regfun=rqfit)

As is evident in the OLS scatterplot, there are several outliers between the X values of 100 and 130. To minimize the squared distances, these outliers pull the regression line upward in a manner that makes it horizontal. The rqfit estimate is based on the median of Y instead of the mean. It is thus much less sensitive to outliers among the Y values, so its regression line (in blue) goes through the middle (the 0.5 quantile of Y given X) of the bulk of the observations.

Exercise 25

The data can be accessed by library(MASS)

X=c(2300,750,4300,2600,6000, 10500, 10000, 17000, 5400, 7000, 9400, 32000, 35000, 100000, 100000, 52000, 100000, 4400, 3000, 4000, 1500, 9000, 5300, 10000, 19000, 27000, 28000, 31000, 26000, 21000, 79000, 100000,100000)

Y=c(65,156,100,134,16,108,121,4,39,143,56,26,22,1,1,5,65,56,65,17,7,16,22,3,4,2,3,8,4,3,30,4,43)

ols(X,Y)

\$coef

EstimateStd. Error t value Pr(>|t|)

(Intercept) 53.8899623928 1.027986e+01 5.242286 1.072131e-05

x -0.0004461206 2.296306e-04 -1.942775 6.117379e-02

\$Ftest.p.value 0.06117379

olshc4 rejects; it has a smaller standard error for the slope.

olshc4(X,Y)

\$ci

Coef. Estimates ci.lower ci.upper p-value Std.Error

(Intercept) 0 53.8899623928 30.5619402421 7.721798e+01 4.902827e-05 1.143803e+01

Slope 1 -0.0004461206 -0.0008776261 -1.461508e-05 4.315956e-02 2.115728e-04
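The difference between the two standard errors comes from how the covariance of the coefficients is estimated. The sketch below computes a heteroscedasticity-consistent standard error in base R using one common definition of the HC4 estimator, in which squared residuals are inflated by (1 - h_i) raised to the power min(4, n * h_i / p). Whether olshc4() uses exactly this definition is an assumption here, so the numbers should be read as a cross-check rather than a reimplementation.

```r
X <- c(2300, 750, 4300, 2600, 6000, 10500, 10000, 17000, 5400, 7000, 9400,
       32000, 35000, 100000, 100000, 52000, 100000, 4400, 3000, 4000, 1500,
       9000, 5300, 10000, 19000, 27000, 28000, 31000, 26000, 21000, 79000,
       100000, 100000)
Y <- c(65, 156, 100, 134, 16, 108, 121, 4, 39, 143, 56, 26, 22, 1, 1, 5, 65,
       56, 65, 17, 7, 16, 22, 3, 4, 2, 3, 8, 4, 3, 30, 4, 43)

fit <- lm(Y ~ X)
M <- model.matrix(fit)            # n x p design matrix (intercept, X)
e <- resid(fit)                   # least squares residuals
h <- hatvalues(fit)               # leverage of each observation
n <- nrow(M)
p <- ncol(M)

# HC4 weights (assumed definition): e_i^2 / (1 - h_i)^min(4, n * h_i / p)
delta <- pmin(4, n * h / p)
w <- e^2 / (1 - h)^delta

# Sandwich estimator: (M'M)^-1 M' diag(w) M (M'M)^-1
bread <- summary(fit)$cov.unscaled
vc_hc4 <- bread %*% crossprod(M * sqrt(w)) %*% bread
se_hc4 <- sqrt(diag(vc_hc4))

# Conventional standard errors, which assume homoscedasticity
se_ols <- summary(fit)$coefficients[, "Std. Error"]

# 0.95 confidence interval for the slope based on the HC4 standard error
ci_slope <- coef(fit)[2] + c(-1, 1) * qt(0.975, n - p) * se_hc4[2]

print(rbind(ols = se_ols, hc4 = se_hc4))
print(ci_slope)
```

If the resulting interval for the slope excludes 0 while the conventional one does not, that matches the pattern reported above.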