
Chapter 7


Presentation Transcript


  1. Chapter 7

  2. Exercise 1 Bootstrap sample means X*: 7.6, 8.1, 9.6, 10.2, 10.7, 12.3, 13.4, 13.9, 14.6, 15.2. Here the percentile bootstrap method is applied; the testing p-value is given by P = 2 ✕ min(p*, 1 − p*), where p* is the proportion of bootstrap sample means greater than the null value μ0, in this case 8.5. 80% of the bootstrap sample means are greater than 8.5, so P is estimated as 2 ✕ (1 − 0.8) = 0.4. The middle 80% of the observed bootstrap sample means ranges from 8.1 to 14.6, so this is the .80 confidence interval for the population mean.
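The percentile bootstrap computation above is simple enough to verify directly. The following Python sketch (our own code, not the R WRS functions used in these exercises) reproduces the p-value and the .80 interval from the ten bootstrap means:

```python
# Our own Python sketch of the percentile bootstrap test (not the WRS code).
def percentile_boot_p(boot_means, mu0):
    """Two-sided p-value P = 2 * min(p*, 1 - p*), where p* is the
    proportion of bootstrap sample means greater than mu0."""
    p_star = sum(m > mu0 for m in boot_means) / len(boot_means)
    return 2 * min(p_star, 1 - p_star)

boot_means = [7.6, 8.1, 9.6, 10.2, 10.7, 12.3, 13.4, 13.9, 14.6, 15.2]
print(round(percentile_boot_p(boot_means, 8.5), 3))   # 0.4

# The middle 80% of the sorted bootstrap means gives the .80 CI:
s = sorted(boot_means)
print(s[1], s[-2])   # 8.1 14.6
```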

  3. Exercise 2
  x=c(5,12,23,24,6,58,9,18,11,66,15,8)
  trimci(x)
  [1] 7.157829 22.842171
  trimci(x, tr=0)
  [1] 8.504757 33.995243
  The distribution is skewed with outliers, inflating the variance of the mean but not of the trimmed mean, which is why the tr=0 interval is longer.
  out(x)
  $out.val
  [1] 58 66
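The out() function flags outliers with a MAD-median rule. A minimal Python sketch of that rule (our own implementation; 2.24 is the customary cutoff, roughly the square root of the .975 quantile of a chi-squared distribution with one degree of freedom) recovers the same two values:

```python
# Our own sketch of the MAD-median outlier rule applied by out();
# the cutoff crit=2.24 is the customary choice, an assumption here.
import statistics

def mad_median_outliers(x, crit=2.24):
    med = statistics.median(x)
    madn = statistics.median([abs(v - med) for v in x]) / 0.6745
    return [v for v in x if abs(v - med) / madn > crit]

x = [5, 12, 23, 24, 6, 58, 9, 18, 11, 66, 15, 8]
print(mad_median_outliers(x))   # [58, 66]
```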

  4. Exercise 3
  x=c(5,12,23,24,6,58,9,18,11,66,15,8)
  > trimcibt(x,tr=0,side=F)
  [1] "Taking bootstrap samples. Please wait."
  [1] "NOTE: p.value is computed only when side=T"
  $estimate
  [1] 21.25
  $ci
  [1] 12.40284 52.46772
  $test.stat
  [1] 3.669678
  trimci(x)
  [1] 7.157829 22.842171
  trimci(x, tr=0)
  [1] 8.504757 33.995243

  5. Exercise 4 The trimmed mean will be more accurate because skewness and outliers shift the T distribution in a manner that changes its cumulative probabilities (and quantiles). Trimmed means combined with bootstrap methods (trimcibt) perform much better under skewness, and trimming removes outliers. The bootstrap-t method estimates the actual sampling distribution of T well.
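To make the bootstrap-t idea concrete, here is a minimal Python sketch of a percentile-t interval for the ordinary mean (our own simplified code, with arbitrary nboot and seed; trimcibt applies the same scheme to trimmed means, which we do not reproduce here):

```python
# A minimal percentile-t (bootstrap-t) CI sketch for the ordinary mean
# (our own simplified code; nboot and seed are arbitrary choices).
import random
import statistics

def boot_t_ci(x, nboot=2000, alpha=0.05, seed=1):
    rng = random.Random(seed)
    n = len(x)
    xbar = statistics.mean(x)
    se = statistics.stdev(x) / n ** 0.5
    tstars = []
    for _ in range(nboot):
        xs = [rng.choice(x) for _ in range(n)]
        se_s = statistics.stdev(xs) / n ** 0.5
        if se_s > 0:                       # skip degenerate resamples
            tstars.append((statistics.mean(xs) - xbar) / se_s)
    tstars.sort()
    t_lo = tstars[int(alpha / 2 * len(tstars))]
    t_hi = tstars[int((1 - alpha / 2) * len(tstars)) - 1]
    # Note the swap: the upper T* quantile sets the lower limit, which is
    # how the interval captures the asymmetry of the sampling distribution.
    return xbar - t_hi * se, xbar - t_lo * se

x = [5, 12, 23, 24, 6, 58, 9, 18, 11, 66, 15, 8]
print(boot_t_ci(x))
```

The resulting interval is asymmetric about the sample mean (21.25), mirroring the skewness the exercise describes.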

  6. Exercise 5
  y=c(7.6,8.1,9.6,10.2,10.7,12.3,13.4,13.9,14.6,15.2)
  trimcibt(y, side=F)
  $estimate
  [1] 11.68333
  $ci
  [1] 8.171324 15.754730

  7. Exercise 6
  x=c(5,12,23,24,6,58,9,18,11,66,15,8)
  > trimpb(x)
  [1] "The p-value returned by the this function is based on the"
  [1] "null value specified by the argument null.value, which defaults to 0"
  [1] "Taking bootstrap samples. Please wait."
  $ci
  [1] 9.75 31.50
  $p.value
  [1] 0

  8. Exercise 7 The percentile-t bootstrap will be more accurate in this case, based on simulation studies.

  9. Exercise 8
  a=c(2,4,6,7,8,9,7,10,12,15,8,9,13,19,5,2,100,200,300,400)
  onesampb(a)
  $ci
  [1] 7.425758 19.806385

  10. Exercise 9
  a=c(2,4,6,7,8,9,7,10,12,15,8,9,13,19,5,2,100,200,300,400)
  onesampb(a)
  $ci
  [1] 7.425758 19.806385
  trimpb(a)
  $ci
  [1] 7.25000 63.33333
  The M-estimator removes the outliers by design. In contrast, when bootstrapping the 20% trimmed mean with 4 outliers in the data, some bootstrap samples will contain more than 4 outliers (sampling is with replacement), exceeding what 20% trimming can remove. For this reason, the CI is longer for the trimmed mean.
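The claim that bootstrap samples can contain more than 4 outliers is easy to quantify: with n = 20 observations of which 4 are outliers, the number of outliers in a bootstrap sample is Binomial(20, 0.2), and 20% trimming (g = 4 per tail) fails whenever that count exceeds 4. A quick Python check:

```python
# Probability that a bootstrap sample of size 20 draws more than 4 of
# the 4 outliers, so that 20% trimming (4 per tail) cannot remove them all.
from math import comb

n, p = 20, 4 / 20   # 4 outliers among 20 observations, resampled with replacement
prob_more_than_4 = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(5, n + 1))
print(round(prob_more_than_4, 3))   # about 0.37
```

So roughly a third of bootstrap samples defeat the trimming, which is why the trimmed-mean CI stretches so far to the right.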

  11. Exercise 10
  trimpb(a,tr=.30)
  $ci
  [1] 7.125 35.500
  The CI is shorter than with 20% trimming: the higher trimming amount withstands a larger number of outliers sampled in the bootstrap procedure.

  12. Exercise 11
  trimpb(a,tr=.40)
  [1] 7.0 14.5
  The CI is now even shorter because the trimming is high enough to handle almost all of the outliers sampled in the bootstrap procedure.

  13. Exercise 12 The bootstrap method for estimating standard errors can be highly inaccurate with respect to probability coverage in the case of the median when there are tied values in the data set. Tied values can significantly alter the estimate of the median across different bootstrap samples.
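A small Python illustration of why ties are a problem (made-up data, our own code): the bootstrap medians take only a handful of distinct values, so the bootstrap estimate of the median's standard error rests on a very coarse distribution.

```python
# Bootstrap medians under heavy ties: the resampled medians land on only
# a few distinct values, so the bootstrap estimate of the median's
# standard error is based on a very coarse grid. Data are made up.
import random
import statistics

rng = random.Random(0)
x = [1, 2, 2, 2, 2, 2, 2, 2, 2, 3]   # heavy ties at 2
boot_medians = [statistics.median([rng.choice(x) for _ in x]) for _ in range(1000)]
print(sorted(set(boot_medians)))     # a handful of values, mostly 2
```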

  14. Exercise 13
  sint(a)
  [1] 7.00000 14.52953
  > onesampb(a)
  $ci
  [1] 7.425758 19.806385
  There are situations where the CI for the median is shorter than the CI based on an M-estimator, even though the M-estimator removes outliers.

  15. Exercise 14 Data are from Table 6.7, not 6.6.
  x=c(34,49,49,44,66,48,49,39,54,57,39,65,43,43,44,42,71,40,41,38,42,77,40,38,43,42,36,55,57,57,41,66,69,38,49,51,45,141,133,76,44,40,56,50,75,44,181,45,61,15,23,42,61,146,144,89,71,83,49,43,68,57,60,56,63,136,49,57,64,43,71,38,74,84,75,64,48)
  y=c(129,107,91,110,104,101,105,125,82,92,104,134,105,95,101,104,105,122,98,104,95,93,105,132,98,112,95,102,72,103,102,102,80,125,93,105,79,125,102,91,58,104,58,129,58,90,108,95,85,84,77,85,82,82,111,58,99,77,102,82,95,95,82,72,93,114,108,95,72,95,68,119,84,75,75,122,127)
  lsfitci(x,y)
  $intercept.ci
  [1] 89.80599 110.16031
  $slope.ci
  [,1] [,2]
  [1,] -0.3293914 0.1169401

  16. Exercise 15
  corb(x,y)
  $r
  [1] -0.03474317
  $ci
  [1] -0.3326596 0.2138606
  The CI contains 0, so we do not reject the hypothesis of zero correlation.
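corb computes a percentile bootstrap CI for Pearson's r. A minimal Python sketch of the same idea (our own code; pairs (x_i, y_i) are resampled as units, and nboot and seed are arbitrary choices):

```python
# Our own percentile-bootstrap CI sketch for Pearson's r (the idea behind
# corb); pairs are resampled together, nboot and seed are arbitrary.
import random

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def corb_sketch(x, y, nboot=1000, alpha=0.05, seed=2):
    rng = random.Random(seed)
    n = len(x)
    rs = []
    for _ in range(nboot):
        idx = [rng.randrange(n) for _ in range(n)]
        if len(set(idx)) > 1:              # skip degenerate resamples
            rs.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
    rs.sort()
    return rs[int(alpha / 2 * len(rs))], rs[int((1 - alpha / 2) * len(rs)) - 1]

x = [1, 2, 3, 4, 5, 6, 7, 8]               # small made-up demo data
y = [2, 1, 4, 3, 6, 5, 8, 7]
print(corb_sketch(x, y))                   # a (lower, upper) interval for r
```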

  17. Exercise 16
  indt(x,y)
  [1] "Taking bootstrap sample, please wait."
  $dstat
  [1] 26.20945
  $p.value.d
  [1] 0.016
  The test of independence rejects, so the variables are dependent.

  18. Exercise 17 A correlation of zero does not imply independence. There are many situations in which variables are associated even though r = 0; examples involve curvature, heteroscedasticity, and bad leverage points.
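A one-line demonstration of the curvature case in Python: for y = x² on symmetric x, the covariance term of r is exactly zero even though y is a deterministic function of x.

```python
# y = x^2 on symmetric x: the covariance term of Pearson's r is exactly 0
# even though y is a deterministic function of x.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
print(sxy)   # 0.0 -> r = 0 despite perfect dependence
```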

  19. Exercises 18 and 19 Data are from Table 6.7, not 6.6.
  pcorb(x,y)
  $r
  [1] -0.3868361
  $ci
  [1] -0.6403710 -0.1176394
  Now we reject, whereas previously we did not.
  lsfitci(x,y,xout=T)
  $intercept.ci
  [1] 104.4706 140.0141
  $slope.ci
  [,1] [,2]
  [1,] -0.8733226 -0.07469801
  After manually removing the outliers:
  lsfitci(x,y)
  $intercept.ci
  [1] 104.3009 137.8797
  $slope.ci
  [,1] [,2]
  [1,] -0.7979083 -0.1467203

  20. Exercise 20
  c=c(300,280,305,340,348,357,380,397,453,456,510,535,275,270,335,342,354,394,383,450,446,513,520,520)
  d=c(32.75,28,30.75,29,27,31.20,27,27,23.50,21,21.5,22.8,30.75,27.25,31,26.50,23.50,22.70,25.80,27.80,21.50,22.50,20.60,21)
  lsfitci(c,d)
  $intercept.ci
  [1] 36.11869 44.19982
  $slope.ci
  [,1] [,2]
  [1,] -0.04761471 -0.02518437
  Conventional methods can have accurate probability coverage in certain situations; there isn't always a difference.

  21. Exercise 21
  x=c(0.032,0.034,0.214,0.263,0.275,0.275,0.450,0.500,0.500,0.630,0.800,0.900,0.900,0.900,0.900,1.000,1.100,1.100,1.400,1.700,2.000,2.000,2.000,2.000)
  y=c(170,290,-130,-70,-185,-220,200,290,270,200,300,-30,650,150,500,920,450,500,500,960,500,850,800,1090)
  lsfitci(x,y)
  $intercept.ci
  [1] -191.9351 108.0508
  $slope.ci
  [,1] [,2]
  [1,] 305.8974 642.884
