Chapter 4


  1. Chapter 4

  2. Exercise 1 The 95% CI represents a range of values that contains a population parameter with a 0.95 probability. This range is determined by the values that correspond to the 0.025 and 0.975 quantiles of the sampling distribution of the sample statistic.

  3. Exercise 2 c would be the 1-α/2 quantile of the standard normal distribution. From Table 1 or the R function qnorm: For a CI of 0.8, the 0.9 quantile is 1.281. For a CI of 0.92, the 0.96 quantile is 1.750. For a CI of 0.98, the 0.99 quantile is 2.326.
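
  For reference, the same quantiles from base R (values shown as comments):

  qnorm(0.90)   # 1.281552  (for a 0.80 CI)
  qnorm(0.96)   # 1.750686  (for a 0.92 CI)
  qnorm(0.99)   # 2.326348  (for a 0.98 CI)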

  4. Exercise 3 From Table 1:

  5. Exercise 4 From Table 1:

  6. Exercise 5 μ=1200, σ=25, n=36. For a CI of 95%: The 95% CI for μ does not contain 1200, so the claim seems unreasonable.
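
  The slide's computed interval was an image and did not survive transcription. A minimal sketch of the computation, where xbar stands in for the exercise's observed sample mean (the value below is hypothetical, not the exercise's):

  xbar <- 1150                  # hypothetical sample mean; the exercise supplies the real one
  sigma <- 25; n <- 36
  xbar + c(-1, 1) * qnorm(0.975) * sigma / sqrt(n)
  # e.g. for xbar = 1150: 1141.833 1158.167, which excludes 1200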

  7. Exercise 6

  9. Exercise 7 Random sampling requires:
  • That all observations are sampled from the same distribution
  • That the sampled observations are independent, meaning that the probability of sampling a given observation does not alter the probability of sampling another. (Note: this is not the same as equal probability.)

  9. Exercise 8 The sampling distribution is centered around the population mean μ, so it will be 9. The variance of the sampling distribution (the squared standard error) is given by σ²/n.

  10. Exercise 9
  x: 1 2 3 4
  p(x): 0.2 0.1 0.5 0.2
  So μ = Σ x p(x) = 1(0.2) + 2(0.1) + 3(0.5) + 4(0.2) = 2.7, and σ² = Σ x² p(x) − μ² = 8.3 − 7.29 = 1.01.
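
  The same computation in R:

  x <- 1:4
  p <- c(0.2, 0.1, 0.5, 0.2)
  mu <- sum(x * p)                 # 2.7
  sigma2 <- sum(x^2 * p) - mu^2    # 1.01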

  11. Exercise 10 The expected value of the sample mean equals the population mean, so if you average 1000 sample means the grand average should approximately equal μ, in this case, 2.7.

  12. Exercise 11 Based on the same principle, the expected value of the sample variance equals the population variance σ², so the average of 1000 sample variances should approximately equal σ², in this case 1.01.
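
  A quick simulation illustrating both Exercise 10 and Exercise 11, drawing from the Exercise 9 distribution (the sample size of 25 is an arbitrary choice here):

  x <- 1:4; p <- c(0.2, 0.1, 0.5, 0.2)
  means <- numeric(1000); vars <- numeric(1000)
  for (i in 1:1000) {
    s <- sample(x, 25, replace = TRUE, prob = p)  # one random sample
    means[i] <- mean(s); vars[i] <- var(s)
  }
  mean(means)   # approximately mu = 2.7
  mean(vars)    # approximately sigma^2 = 1.01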

  13. Exercise 12 a=c(2,6,10,1,15,22,11,29), n=8. var(a) [1] 94.28571 The variance of the sample mean is estimated by s²/n = 94.28571/8 = 11.786, and the standard error is estimated by s/√n = √11.786 = 3.433.
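
  In R (the same steps apply to Exercises 14 and 16):

  a <- c(2, 6, 10, 1, 15, 22, 11, 29)
  n <- length(a)        # 8
  var(a) / n            # 11.78571, estimated variance of the sample mean
  sqrt(var(a) / n)      # 3.433033, estimated standard error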

  14. Exercise 13 The estimate of μ in this case would be based on a single observation, 32. With a single observation, it is not possible to estimate the standard error because there is no variance in the sample. As the sample size increases, the variance of the sampling distribution (the squared standard error) decreases; note that n is in the denominator of the standard error. Lower variance in the sampling distribution means a smaller standard error, and less error in the sample estimates.

  15. Exercise 14 b=c(450,12,52,80,600,93,43,59,1000,102,98,43), n=12. var(b) [1] 93663.52 Squared SE = s²/n = 93663.52/12 = 7805.29.

  16. Exercise 15 b=c(450,12,52,80,600,93,43,59,1000,102,98,43) > out(b) $out.val [1] 450 600 1000 These outliers substantially inflate the standard error, as they inflate the variance.

  17. Exercise 16 c=c(6,3,34,21,34,65,23,54,23), n=9. var(c) [1] 413.9444 The squared SE is s²/n = 413.9444/9 = 45.994.

  18. Exercise 17 No. An accurate estimate of the standard error requires independence among sampled observations.

  19. Exercise 18 The variance of the mixed normal is 10.9, so the squared standard error for a sample of 25 would be 10.9/25 = 0.436, compared to 1/25 = 0.04 under the standard normal. This means that under small departures from normality, the squared standard error can inflate more than tenfold. The inflation greatly increases error and the length of CIs.
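
  A sketch of this comparison, assuming the usual contaminated normal (90% N(0,1), 10% N(0,10)), whose variance is 0.9(1) + 0.1(100) = 10.9:

  # Draw from the mixed normal: each observation is N(0,10) with probability 0.1
  rmixnorm <- function(n) ifelse(runif(n) < 0.1, rnorm(n, 0, 10), rnorm(n, 0, 1))
  var(rmixnorm(100000))   # approximately 10.9
  10.9 / 25               # 0.436, squared SE of the mean for n = 25
  1 / 25                  # 0.04, the standard normal counterpart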

  20. Exercise 19 When sampling from a non-normal distribution, the sampling distribution of the mean no longer conforms to the probabilities of the normal curve. In other words, the sampling distribution is no longer normal, so the SE cannot be used to determine probabilities and CIs accurately.

  21. Exercise 20 μ=30, σ=2, n=16, so SE=2/4=0.5. Determine z and consult Table 1, or use R.
  a. pnorm(29,30,2/sqrt(16)) [1] 0.02275013
  b. pnorm(30.5,30,2/sqrt(16)) [1] 0.8413447. 1-0.841=0.159
  c. pnorm(31,30,2/sqrt(16)) [1] 0.9772499. 0.977-0.022=0.955

  22. Exercise 21 μ=5, σ=5, n=25, so SE=5/5=1. Determine z and consult Table 1, or use R.
  a. pnorm(4,5,1) [1] 0.1586553
  b. pnorm(7,5,1) [1] 0.9772499. 1-0.977=0.023
  c. pnorm(3,5,1) [1] 0.02275013. 0.977-0.022=0.955

  23. Exercise 22 μ=100000, σ=10000, n=16, so SE=10000/4=2500. From Table 1, with z=-2, P=0.0228. Using R: pnorm(95000,100000,10000/sqrt(16)) [1] 0.02275013

  24. Exercise 23 μ=100000, σ=10000, n=16, so SE=10000/4=2500. Compute z scores for each value and consult Table 1. Or use R: pnorm(97500,100000,10000/sqrt(16)) [1] 0.1586553 pnorm(102500,100000,10000/sqrt(16)) [1] 0.8413447. 0.841-0.159=0.683

  25. Exercise 24 μ=750, σ=100, n=9, so SE=100/3=33.333. Compute z scores for each value and consult Table 1. Or use R.
  > pnorm(700,750,100/sqrt(9))
  [1] 0.0668072
  > pnorm(800,750,100/sqrt(9))
  [1] 0.9331928
  0.933-0.067=0.866

  26. Exercise 25 μ=36, σ=5, n=16, so SE=5/4. Use Table 1 or R.
  a. pnorm(37,36,5/4) [1] 0.7881446
  b. pnorm(33,36,5/4) [1] 0.008197536. 1-0.008=0.992
  c. pnorm(34,36,5/4) [1] 0.05479929 (z=-1.6). 0.788-0.054=0.734

  27. Exercise 26 μ=25, σ=3, n=25, so SE=3/5.
  a. pnorm(24,25,3/5) [1] 0.04779035
  b. pnorm(26,25,3/5) [1] 0.9522096
  c. 1-0.0478=0.9522
  d. 0.9522-0.0478=0.9044

  28. Exercise 27 Heavy-tailed distributions generally yield long CIs for the mean because their large variance inflates the SE. The central limit theorem does not remedy this problem.

  29. Exercise 28 Light-tailed, symmetric distributions provide relatively accurate probability coverage for CIs, even with small sample sizes. The central limit theorem works relatively well in this case.

  30. Exercise 29 c is the 1-α/2 quantile of a T distribution with n-1 degrees of freedom. Look up c in Table 4, 0.975 quantile with 9 df. Or use R: qt(0.975,9) [1] 2.262157 a. b. c.
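
  The slide's three intervals were images and did not survive transcription. A minimal sketch of the computation they used, with illustrative numbers (xbar, s, and n below are hypothetical, not the exercise's values):

  # CI for mu with sigma unknown: xbar +/- qt(1 - alpha/2, n - 1) * s / sqrt(n)
  xbar <- 26; s <- 9; n <- 10    # hypothetical summary statistics
  xbar + c(-1, 1) * qt(0.975, n - 1) * s / sqrt(n)
  # [1] 19.56174 32.43826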

  31. Exercise 30 c is the 1-α/2 quantile of a T distribution with n-1 degrees of freedom. Look up c in Table 4, 0.99 quantile with 9 df. Or use R: qt(0.99,9) [1] 2.82 a. b. c.

  32. Exercise 31 x=c(77,87,88,114,151,210,219,246,253,262,296,299,306,376,428,515,666,1310,2611) The R function t.test(x) returns:
  One Sample t-test
  data: x
  t = 3.2848, df = 18, p-value = 0.004117
  alternative hypothesis: true mean is not equal to 0
  95 percent confidence interval: 161.5030 734.7075
  sample estimates: mean of x 448.1053

  33. Exercise 32 y=c(5,12,23,24,18,9,18,11,36,15) The R function t.test(y) returns:
  One Sample t-test
  data: y
  t = 6.042, df = 9, p-value = 0.0001924
  alternative hypothesis: true mean is not equal to 0
  95 percent confidence interval: 10.69766 23.50234
  sample estimates: mean of x 17.1

  34. Exercise 33 Heavy-tailed distributions inflate the standard error in a manner that changes the cumulative probabilities of the T distribution. In this situation, the new T quantiles correspond to values that differ from T under normality. The inflation of the SE, due to the larger frequency of extreme values in the tails, leads to very long CIs that far exceed the stated probability coverage under normality. For example, the intended 95% CI will yield a range that in reality covers over 99% of the distribution. When distributions are skewed, T becomes skewed and off-center (its mean and median are no longer 0, due to the dependency that is now created between the mean and SD), with values that do not correspond to the quantiles in Table 4. This results in highly inaccurate probability coverage for CIs.

  35. Exercise 34 When the variance is estimated from the sample in a light-tailed, skewed distribution, the distribution of T markedly departs from Student's t (becoming skewed and no longer centered around 0), so probability coverage is no longer accurate.

  36. Exercise 35 c corresponds to the 0.975 quantile of a T distribution with n-2g-1 df, where g = 0.2n rounded down.
  a. df = 24-2(4)-1 = 15. qt(0.975,15) [1] 2.13145
  b. df = 36-2(7)-1 = 21. qt(0.975,21) [1] 2.079614
  c. df = 12-2(2)-1 = 7. qt(0.975,7) [1] 2.364624
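
  The bookkeeping in R:

  df_trim <- function(n, trim = 0.2) {
    g <- floor(trim * n)   # number trimmed from each tail
    n - 2 * g - 1
  }
  sapply(c(24, 36, 12), df_trim)   # 15 21 7
  qt(0.975, c(15, 21, 7))          # 2.131450 2.079614 2.364624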

  37. Exercise 36 c corresponds to the 0.99 quantile of a T distribution with n-2g-1 df, where g = 0.2n rounded down (same df as Exercise 35).
  a. qt(0.99,15) [1] 2.60248
  b. qt(0.99,21) [1] 2.517648
  c. qt(0.99,7) [1] 2.997952

  38. Exercise 37 x=c(77,87,88,114,151,210,219,246,253,262,296,299,306,376,428,515,666,1310,2611) The R function trimci(x) returns $ci [1] 160.3913 404.9933
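
  trimci comes with the book's R functions. For readers without it, here is a sketch of the Tukey-McLaughlin interval it computes (20% trimming assumed), which reproduces the output above using the x defined on this slide:

  trimci_sketch <- function(x, trim = 0.2, alpha = 0.05) {
    n <- length(x); g <- floor(trim * n)
    xs <- sort(x)
    xs[1:g] <- xs[g + 1]                        # Winsorize the lower tail
    xs[(n - g + 1):n] <- xs[n - g]              # Winsorize the upper tail
    tm <- mean(x, trim = trim)                  # 20% trimmed mean
    se <- sd(xs) / ((1 - 2 * trim) * sqrt(n))   # Winsorized SE of the trimmed mean
    tm + c(-1, 1) * qt(1 - alpha / 2, n - 2 * g - 1) * se
  }
  trimci_sketch(x)   # 160.3913 404.9933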

  39. Exercise 38 With trimmed means, the CI is 244.6 long. With means, it is 573.2, which is 2.34 times as long. The mean has a larger standard error, resulting in a longer CI.

  40. Exercise 39 m=c(56,106,174,207,219,237,313,365,458,497,515,529,557,615,625,645,973,1065,3215) For the mean: t.test(m) gives 266.6441, 930.3033. For the trimmed mean: trimci(m) gives 293.5976, 595.9409. Checking for outliers: out(m) $out.val [1] 3215 The CI for the trimmed mean is far shorter than the CI for the mean because the outlier (3215) inflates the SE of the mean; with the trimmed mean, it is trimmed. Other values in the data set may have a similar effect.

  41. Exercise 40 Under normality, the sample mean has the smallest standard error, so it is the only candidate for being ideal. But as we have seen, other estimators have a smaller standard error than the mean in other situations, so an optimal estimator does not exist across the board.

  42. Exercise 41 No, because what often appears to be normal is not normal. In addition, there are robust estimators that compare relatively well (although not as well) to the mean under normality but perform far better in situations that mildly depart from normality. In other words, under normality the difference is small; under non-normality it can be very large.

  43. Exercise 42 c=c(250,220,281,247,230,209,240,160,370,274,210,204,243,251,190,200,130,150,177,475,221,350,224,163,272,236,200,171,98) CI for the mean: t.test(c) 95 percent confidence interval: 200.7457 257.5991 CI for the trimmed mean: trimci(c) [1] 196.6734 244.9056

  44. Exercise 43 An outlier analysis reveals 4 outliers: out(c) $out.val [1] 370 475 350 98 These increase the length of the CI for the mean; they are trimmed away in the trimmed-mean CI.

  45. Exercise 44 Even if the two measures of location are identical, outliers can greatly inflate the CI based on means, rendering the outcome less informative.

  46. Exercise 45 In this case we have 16 successes in 16 trials. The R function: binomci(16,16, alpha=0.01) $ci [1] 0.7498942 1.0000000

  47. Exercise 46 In this case we have 0 successes in 200000 trials. The R function binomci(0,200000) returns: $ci [1] 0.000000e+00 1.497855e-05
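
  binomci is one of the book's R functions. Base R's binom.test gives a Clopper-Pearson interval in the same spirit for Exercises 45 and 46 (the exact bounds differ somewhat because the methods differ):

  binom.test(16, 16, conf.level = 0.99)$conf.int   # roughly 0.72 to 1
  binom.test(0, 200000)$conf.int                   # 0 to about 1.8e-05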

  48. Exercise 47 val=0 for(i in 1:5000) val[i]=median(rbinom(25,6,0.9)) splot(val) This is an example of how the sampling distribution of the median can largely depart from the expected bell curve due to tied values. Each of the 5000 samples has many tied values because there are 25 trials in every sample and only 7 possible outcomes, so values are bound to repeat.
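
  splot comes with the book's R functions; in base R, a frequency plot of the simulated medians shows the same thing:

  val <- numeric(5000)
  for (i in 1:5000) val[i] <- median(rbinom(25, 6, 0.9))
  plot(table(val))   # spikes at the few attainable medians, not a bell curve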
