
Assignment 4 answers

Assignment 4 answers. Purpose: The purpose of this assignment is to demonstrate understanding of hypothesis testing in general and to perform hypothesis tests of one mean and of one proportion.

nikita

Presentation Transcript


  1. Assignment 4 answers

  2. Purpose: The purpose of this assignment is to demonstrate understanding of hypothesis testing in general and to perform hypothesis tests of one mean and of one proportion.

  3. 1. Chapter 9, question 4: describe at least 3 similarities/differences, as well as when you’d use the t distribution vs. the normal. • The standard normal and the t distribution are both symmetric. • The standard normal and the t distribution both have mean 0. • The standard normal and the t distribution are both bell shaped. • The t distribution has heavier tails than the standard normal. • As the df increases, the t distribution becomes more like the standard normal distribution. • Use the t distribution when you do not know the standard deviation of the population being sampled and have to estimate it from the sample.
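
The heavier tails and the convergence to the normal can be checked numerically. The following is a minimal Python sketch (the slides themselves use Stata); the helper name `t_pdf` and the choice of evaluating both densities at x = 3 are illustrative assumptions, not part of the assignment.

```python
import math
from statistics import NormalDist

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    # Work on the log scale so large df does not overflow the gamma function.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

# Compare the two densities far out in the tail (x = 3 is an arbitrary choice).
normal_at_3 = NormalDist().pdf(3.0)
for df in (2, 5, 30, 1000):
    ratio = t_pdf(3.0, df) / normal_at_3
    print(f"df={df:>4}: t density at 3 is {ratio:.2f} times the normal density")
```

The ratio should shrink toward 1 as df grows, which is the sense in which the t distribution "becomes more like" the standard normal.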

  4. 2. Chapter 10, question 3: A p-value is the probability of observing a statistic (e.g. a mean or proportion) at least as extreme as the one you observed, in the hypothetical situation that the null hypothesis is true. 3. Chapter 10, question 6: A Type I error is made when a null hypothesis is rejected although it is in fact true. A Type II error is made when a null hypothesis is not rejected although it is false. 4. Chapter 10, question 8: • Sample size • Significance level • Population variance • The difference between the null mean and the alternate mean

  5. 5. Chapter 10, question 10
     a. H0: µ ≥ 7250   HA: µ < 7250
     b. Using ttesti in Stata:

     . ttesti 15 4767 3204 7250

     One-sample t test
     ------------------------------------------------------------------------------
              |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
     ---------+--------------------------------------------------------------------
            x |      15        4767    827.2692        3204    2992.684    6541.316
     ------------------------------------------------------------------------------
         mean = mean(x)                                                t =  -3.0014
     Ho: mean = 7250                                  degrees of freedom =       14

        Ha: mean < 7250             Ha: mean != 7250               Ha: mean > 7250
      Pr(T < t) = 0.0048       Pr(|T| > |t|) = 0.0095          Pr(T > t) = 0.9952

     c. The p-value is 0.0048, which is less than 0.05, so I reject the null.

  6. 6. Hypothesis test of one mean a. Write the null and alternative hypothesis for a hypothesis test that the average hours of sleep in the population from which our sample was drawn is <6.75 hours (the alternative hypothesis). What are you setting as your significance level? H0: µ ≥ 6.75   HA: µ < 6.75. I set α = 0.05.

  7. 6b. Use the summ command to get the mean and standard deviation, and using these, perform the hypothesis test. Calculate your test statistic and the p value for the test using the ttail command. Note that if your alternative hypothesis is Ha: µ < µ0 then you should be finding P(T < tstat).

     . summ sleep_hrs

         Variable |       Obs        Mean    Std. Dev.       Min        Max
     -------------+--------------------------------------------------------
        sleep_hrs |       503    6.647217    1.016645          2         10

     . di (6.647217-6.75)/1.016645*sqrt(503)
     -2.2674408

     . di 1-ttail(502,-2.2674408)
     .01189381

     The p-value is 0.012.
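
The same hand calculation can be cross-checked outside Stata. This is a hedged Python sketch using only the standard library: because the standard library has no t distribution, the lower-tail probability is approximated by Simpson's rule on the t density. The function names `t_pdf` and `t_lower_tail` and the step count are assumptions made for illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_lower_tail(t, df, steps=2000):
    """Approximate P(T < t): Simpson's rule on [0, |t|] plus symmetry about 0."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # P(0 < T < |t|)
    return 0.5 + area if t > 0 else 0.5 - area

# Summary statistics from the summ output above; 6.75 is the null mean.
n, xbar, sd, mu0 = 503, 6.647217, 1.016645, 6.75
t_stat = (xbar - mu0) / (sd / math.sqrt(n))
p_lower = t_lower_tail(t_stat, n - 1)
print(round(t_stat, 4), round(p_lower, 4))
```

The printed values should agree with the Stata results above (t of about -2.2674 and a lower-tail p-value of about 0.0119), up to numerical integration error.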

  8. c. Run the ttesti command to check your work.

     . ttesti 503 6.647217 1.016645 6.75

     One-sample t test
     ------------------------------------------------------------------------------
              |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
     ---------+--------------------------------------------------------------------
            x |     503    6.647217      .04533    1.016645    6.558157    6.736277
     ------------------------------------------------------------------------------
         mean = mean(x)                                                t =  -2.2674
     Ho: mean = 6.75                                  degrees of freedom =      502

        Ha: mean < 6.75             Ha: mean != 6.75               Ha: mean > 6.75
      Pr(T < t) = 0.0119       Pr(|T| > |t|) = 0.0238          Pr(T > t) = 0.9881

     The leftmost p-value confirms my previous result.

  9. d. Because the data are already in Stata, you can also run the ttest command, rather than using the “immediate” function. Run ttest sleep_hrs==6.75 and compare the results to your previous results.

     . ttest sleep_hrs==6.75

     One-sample t test
     ------------------------------------------------------------------------------
     Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
     ---------+--------------------------------------------------------------------
     sleep_~s |     503    6.647217      .04533    1.016645    6.558157    6.736277
     ------------------------------------------------------------------------------
         mean = mean(sleep_hrs)                                        t =  -2.2674
     Ho: mean = 6.75                                  degrees of freedom =      502

        Ha: mean < 6.75             Ha: mean != 6.75               Ha: mean > 6.75
      Pr(T < t) = 0.0119       Pr(|T| > |t|) = 0.0238          Pr(T > t) = 0.9881

     The results are the same as before.
     e. Give the p-value and state your conclusion. Be sure to use correct terminology.
     The p-value is 0.0119, which is less than α = 0.05; therefore we reject the null hypothesis and conclude that the mean hours of sleep in the population is less than 6.75 hours.

  10. f. State the null and alternative hypothesis, the significance level, run the test, and state your conclusion if we were only worried about getting less than 6.5 hours of sleep.
      H0: µ ≥ 6.5   HA: µ < 6.5
      I set α = 0.05.

      . ttest sleep_hrs==6.5

      One-sample t test
      ------------------------------------------------------------------------------
      Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
      ---------+--------------------------------------------------------------------
      sleep_~s |     503    6.647217      .04533    1.016645    6.558157    6.736277
      ------------------------------------------------------------------------------
          mean = mean(sleep_hrs)                                        t =   3.2477
      Ho: mean = 6.5                                   degrees of freedom =      502

         Ha: mean < 6.5              Ha: mean != 6.5                Ha: mean > 6.5
       Pr(T < t) = 0.9994       Pr(|T| > |t|) = 0.0012          Pr(T > t) = 0.0006

      The p-value is 0.999, therefore we fail to reject the null hypothesis that the mean hours of sleep in the population is at least 6.5 hours.

  11. 7.a Hypothesis test of one proportion
      H0: p = 0.50   HA: p ≠ 0.50
      I will set α = 0.05.
      b. Use a Stata command to run the hypothesis test using the normal approximation.

      . tab sex

       Biological |
           sex at |
            birth |      Freq.     Percent        Cum.
      ------------+-----------------------------------
             Male |        218       41.84       41.84
           Female |        303       58.16      100.00
      ------------+-----------------------------------
            Total |        521      100.00

      . prtest sex==.5

      One-sample test of proportion                    sex: Number of obs =      521
      ------------------------------------------------------------------------------
          Variable |       Mean   Std. Err.                     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               sex |   .5815739   .0216119                      .5392153    .6239324
      ------------------------------------------------------------------------------
          p = proportion(sex)                                           z =   3.7239
      Ho: p = 0.5

           Ha: p < 0.5                 Ha: p != 0.5                   Ha: p > 0.5
       Pr(Z < z) = 0.9999         Pr(|Z| > |z|) = 0.0002          Pr(Z > z) = 0.0001
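
As a cross-check on the normal-approximation test, the z statistic can be recomputed from the counts in the tab output. This is a minimal Python sketch; the function name `one_proportion_z` is an illustrative assumption. Note that the test statistic uses the standard error under the null (based on p0 = 0.5), whereas the Std. Err. column in the Stata output is based on the observed proportion and is used for the confidence interval.

```python
import math
from statistics import NormalDist

def one_proportion_z(successes, n, p0):
    """z test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    se_null = math.sqrt(p0 * (1 - p0) / n)      # standard error assuming H0 is true
    z = (p_hat - p0) / se_null
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_hat, z, p_two_sided

# 303 females out of 521 observations, tested against p0 = 0.5.
p_hat, z, p_value = one_proportion_z(303, 521, 0.5)
print(round(p_hat, 7), round(z, 4), round(p_value, 4))
```

The z value should match the 3.7239 reported by prtest, and the two-sided p-value should round to 0.0002.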

  12. c. Use a Stata command to run the hypothesis test using the binomial distribution.

      . bitest sex==.5

          Variable |        N   Observed k   Expected k   Assumed p   Observed p
      -------------+------------------------------------------------------------
               sex |      521        303        260.5       0.50000      0.58157

        Pr(k >= 303)             = 0.000113  (one-sided test)
        Pr(k <= 303)             = 0.999920  (one-sided test)
        Pr(k <= 218 or k >= 303) = 0.000226  (two-sided test)

      OR

      . di binomialtail(521,303,.5)
      .00011321
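
The exact binomial tail can also be summed directly. The sketch below is a hedged Python illustration that only handles the null value p = 0.5 (where every outcome has probability 1/2^n, so exact integer arithmetic is possible); the function name is an assumption for illustration.

```python
from math import comb

def binomial_upper_tail_half(n, k):
    """P(X >= k) for X ~ Binomial(n, 0.5), summed with exact integers."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n

n, k = 521, 303
p_upper = binomial_upper_tail_half(n, k)
# With p = 0.5 the distribution is symmetric, so P(X <= 218) = P(X >= 303).
p_two_sided = 2 * p_upper
print(f"{p_upper:.8f} {p_two_sided:.6f}")
```

These should agree with the one-sided and two-sided probabilities reported by bitest and binomialtail above.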

  13. d. Compare your results in b and c and explain differences or similarities and your overall conclusion.
      The p-value for the 2-sided test is <0.05 in both cases and we reject the null. The results are similar because np and n(1-p) are both large, so the normal approximation to the binomial is good.
      e. Construct an exact 95% confidence interval for the proportion female. Based on the 95% confidence interval, would you have rejected or failed to reject the null hypothesis above? Why or why not?

      . ci sex, binomial

                                                               -- Binomial Exact --
          Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
      -------------+---------------------------------------------------------------
               sex |        521    .5815739    .0216119        .5378925    .6243247

      This 95% confidence interval does not include the hypothesized value of 0.5, so I would reject the null.
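
For readers curious how an exact binomial interval can be obtained, here is a hedged Python sketch of the Clopper-Pearson construction, found by bisection on the binomial tail probabilities. It assumes that this is the interval Stata labels "Binomial Exact"; the function names are illustrative, and the sketch does not handle the edge cases k = 0 or k = n.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability, computed on the log scale for numerical stability."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def upper_tail(k, n, p):
    """P(X >= k)."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))

def lower_tail(k, n, p):
    """P(X <= k)."""
    return sum(binom_pmf(j, n, p) for j in range(0, k + 1))

def bisect(f, lo, hi, iters=60):
    """Root of a monotone function f that changes sign on [lo, hi]."""
    sign_lo = f(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(k, n, level=0.95):
    """Clopper-Pearson interval for 0 < k < n."""
    alpha = 1 - level
    # Lower limit: the p at which observing k or more has probability alpha/2.
    lower = bisect(lambda p: upper_tail(k, n, p) - alpha / 2, 1e-9, k / n)
    # Upper limit: the p at which observing k or fewer has probability alpha/2.
    upper = bisect(lambda p: lower_tail(k, n, p) - alpha / 2, k / n, 1 - 1e-9)
    return lower, upper

ci_low, ci_high = exact_ci(303, 521)
print(round(ci_low, 4), round(ci_high, 4))
```

If the assumption about the method holds, the limits should be close to the .5378925 and .6243247 shown in the ci output, and the interval excludes 0.5.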

  14. Lecture review 6

  15. Overview • Sample size calculations • Type I & Type II error considerations • Power • Comparison of two means • Dependent means (paired) • Independent means • Comparison of two proportions

  16. Types of error • Type I = incorrectly reject the null = α • Type II = incorrectly fail to reject the null = β • H0 is a statement about the population and is either true or false • Using a sample, we try to determine the answer • Whether a Type I or a Type II error is possible depends on whether H0 is true or false • To minimize these errors • We set α, the chance of a Type I error • We design our study to minimize the chance of a Type II error

  17. Chance of a type II error: β, the chance of failing to reject the null if the alternative is true. [Figure: regions labeled "Fail to reject the null" and "Reject the null"]

  18. If the alternative is very different from the null, the chance of a Type II error is low: β, a low chance of failing to reject the null if the alternative is true. [Figure: regions labeled "Fail to reject the null" and "Reject the null"]

  19. If the alternative is not very different from the null, the chance of a Type II error is high: β, a high chance of failing to reject the null if the alternative is true. [Figure: regions labeled "Fail to reject the null" and "Reject the null"]

  20. Chance of a Type II error is lower if the SEM is smaller. This is relevant because the SD for the distribution of a sample mean is σ/√n, so increasing n decreases the SD of the mean.

  21. Finding , P(Type II error) • Find the critical value for your test • At what Xwill zstat be greater than 1.96 (or 1.645 for a one-sided test) ? • This depends on n, , and  • What is the probability of getting a sample mean less extreme than the critical value if the true mean is the alternate mean? This is .

  22. Power • The power of a statistical test is lower for alternative values that are closer to the null value (the chance of a Type II error is higher) and higher for more extreme alternative values. [Figure labels: "High β, hence low power (1-β)" and "Low β, hence high power (1-β)"]

  23. Sample size calculations • With n fixed • You can calculate how big the alternative has to be to reject the null with 80% probability assuming the alternative is true • The difference between this alternative and the null is called the minimum detectable difference
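
A hedged Python sketch of the minimum detectable difference for a one-sample z test with known σ follows. It uses the common approximation (z for α plus z for power) × σ/√n, which ignores the negligible chance of rejecting in the wrong tail; σ = 15 and n = 50 are hypothetical values and the function name is an assumption.

```python
import math
from statistics import NormalDist

def minimum_detectable_difference(sigma, n, alpha=0.05, power=0.80, two_sided=True):
    """Smallest |alternative - null| rejected with the requested power (z approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) * sigma / math.sqrt(n)

# Hypothetical inputs: sigma = 15, n = 50, two-sided alpha = 0.05, 80% power.
mdd = minimum_detectable_difference(sigma=15, n=50)
print(round(mdd, 2))
```

Quadrupling n halves the minimum detectable difference, because it scales with 1/√n.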

  24. Comparison of two means Dependent

  25. Comparison of two means: the paired t-test • Paired samples, numerical variables • Two determinations on the same person (before and after) • Matched samples – measurement on pairs of persons similar in some characteristics, i.e. identical twins • Matching or pairing is performed to control for extraneous factors • Each person or pair has 2 data points, and we calculate the difference for each • Then we can use our one-sample methods to test hypotheses about the value of the difference

  26. Comparison of two means: paired t-test • Step 1: The hypotheses • Two sided • Generically H0: μ1-μ2 = δ, HA: μ1-μ2 ≠ δ • Often δ=0, no difference, so H0: μ1-μ2 = 0, i.e. H0: μ1=μ2, and HA: μ1-μ2 ≠ 0, i.e. HA: μ1≠μ2 • One sided • Generically H0: μ1-μ2 ≥ δ with HA: μ1-μ2 < δ, or H0: μ1-μ2 ≤ δ with HA: μ1-μ2 > δ • Often δ=0, no difference, so H0: μ1 ≥ μ2 with HA: μ1 < μ2, or H0: μ1 ≤ μ2 with HA: μ1 > μ2

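  27. Comparison of two means: paired t-test • Step 2: Calculate the test statistic • If δ=0, the formula for tstat is $t_{stat} = \dfrac{\bar{d}}{s_d/\sqrt{n}}$, where $\bar{d}$ is the mean of the paired differences, $s_d$ is their standard deviation, and n is the number of pairs; the test has n-1 degrees of freedom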

  28. Comparison of two means: paired t-test • Step 3: Reject or fail to reject the null • Is the p-value (the probability of observing a difference as large or larger, under the null hypothesis) greater than or less than the significance level, α?

  29. Example • We think participants are reporting different amounts of alcohol use, measured by the AUDIT-C, in study 2 (vs. study 1). The null hypothesis is that they are reporting the same amount. H0: μ2-μ1 = 0, i.e. μ2 = μ1; HA: μ2-μ1 ≠ 0, i.e. μ2 ≠ μ1 • Significance level = 0.05

  30. . summ auditc_diff

          Variable |       Obs        Mean    Std. Dev.       Min        Max
      -------------+--------------------------------------------------------
       auditc_diff |        28    .5357143    .8811669          0          3

      *** calculate the t statistic
      . di 0.5357/0.8812*sqrt(28)
      3.2168157

      *** calculate the p-value
      . di 2*ttail(27,3.2168)
      .00335519

      So we reject the null.

  31. Using the ttest command

      . ttest auditc_diff==0

      One-sample t test
      ------------------------------------------------------------------------------
      Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
      ---------+--------------------------------------------------------------------
      auditc~f |      28    .5357143    .1665249    .8811669    .1940334    .8773951
      ------------------------------------------------------------------------------
          mean = mean(auditc_diff)                                      t =   3.2170
      Ho: mean = 0                                     degrees of freedom =       27

          Ha: mean < 0                 Ha: mean != 0                 Ha: mean > 0
       Pr(T < t) = 0.9983         Pr(|T| > |t|) = 0.0034          Pr(T > t) = 0.0017

      Note that "mean" here is the mean difference, so Ha: mean > 0 says that the mean difference is greater than 0.

  32. Another way without calculating the difference. The command is ttest var1==var2

      . ttest auditc_s2==auditc_s1

      Paired t test
      ------------------------------------------------------------------------------
      Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
      ---------+--------------------------------------------------------------------
      auditc~2 |      28           1    .3170632    1.677741      .34944     1.65056
      auditc~1 |      28    .4642857    .2438782    1.290482    -.036111    .9646824
      ---------+--------------------------------------------------------------------
          diff |      28    .5357143    .1665249    .8811669    .1940334    .8773951
      ------------------------------------------------------------------------------
           mean(diff) = mean(auditc_s2 - auditc_s1)                     t =   3.2170
       Ho: mean(diff) = 0                              degrees of freedom =       27

       Ha: mean(diff) < 0           Ha: mean(diff) != 0           Ha: mean(diff) > 0
       Pr(T < t) = 0.9983         Pr(|T| > |t|) = 0.0034          Pr(T > t) = 0.0017
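
To make the "difference, then one-sample test" idea concrete, here is a minimal Python sketch. The two short score lists are hypothetical (the AUDIT-C data are not reproduced in the slides), and `paired_t` is an illustrative name; the last lines recompute the t statistic from the summary statistics shown in the summ output.

```python
import math
from statistics import mean, stdev

def paired_t(first, second):
    """Paired t statistic for H0: mean(second - first) = 0."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    d_bar, s_d = mean(diffs), stdev(diffs)      # stdev uses the n-1 denominator
    return d_bar, s_d, d_bar / (s_d / math.sqrt(n)), n - 1

# Hypothetical scores for four participants in two studies.
study1 = [1, 2, 3, 4]
study2 = [2, 2, 5, 5]
d_bar, s_d, t_small, df_small = paired_t(study1, study2)
print(d_bar, round(s_d, 4), round(t_small, 4), df_small)

# Same formula applied to the summary statistics of auditc_diff (n = 28).
t_audit = 0.5357143 / (0.8811669 / math.sqrt(28))
print(round(t_audit, 4))
```

The second value should reproduce the t = 3.2170 reported by both ttest commands, which shows that the paired test is just a one-sample test on the differences.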

  33. Comparison of two means Independent samples

  34. Comparison of two means: t-test • The goal is to compare means from two independent samples • Two different populations • E.g. vaccine versus placebo group • E.g. women with adequate versus inadequate micronutrient levels

  35. Comparison of two means: t-test Step 1: State the hypothesis • Two sided hypothesis H0: μ1=μ2 HA: μ1≠μ2 • One sided hypothesis H0: μ1≥μ2 HA: μ1<μ2 • One sided hypothesis H0: μ1≤μ2 HA: μ1>μ2

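  36. Comparison of two means: t-test when σ is unknown. Step 2: calculate the t-test statistic • t-test statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$ • The pooled variance (the square of the pooled SD) is a weighted average of the individual sample variances: $s_p^2 = \dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$ • The degrees of freedom for the test are n1+n2-2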

  37. Comparison of two means: t-test • Step 3: • As in our other hypothesis tests, compare the t statistic to the t-distribution to determine the probability of obtaining a mean difference as large as or larger than the observed difference • Step 4: • Reject the null if the probability, the p-value, is less than α, the significance level • Fail to reject the null if p ≥ α

  38. Comparison of two means: Example • Study of non-pneumatic anti-shock garment (Miller et al) • Two groups – pre-intervention received usual treatment, intervention group received NASG • Comparison of hemorrhaging in the two groups • Null hypothesis: The hemorrhaging is the same in the two groups H0: μ1=μ2 HA: μ1≠μ2 • The data: • External blood loss after entry: • Pre-intervention group (n=83) mean blood loss =340.4 SD=248.2 • Intervention group (n=83) mean blood loss =73.5 SD=93.9

  39. Calculating by hand • External blood loss: • Pre-intervention group (n=83) mean=340.4 SD=248.2 • Intervention group (n=83) mean=73.5 SD=93.9 • First calculate sp²:

      sp² = (82*248.2² + 82*93.9²)/(83+83-2) = 35210.2
      tstat = (340.4-73.5)/sqrt(35210.2*(2/83)) = 9.16
      df = 83+83-2 = 164

      . di 2*ttail(164,9.16)
      2.041e-16
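
The hand calculation above can be wrapped in a small function. This is a minimal Python sketch working from summary statistics only; the name `pooled_t` is an assumption for illustration.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variance two-sample t statistic from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return sp2, se_diff, (mean1 - mean2) / se_diff, df

# Blood-loss summary statistics from the slide.
sp2, se_diff, t_stat, df = pooled_t(83, 340.4, 248.2, 83, 73.5, 93.9)
print(round(sp2, 1), round(se_diff, 5), round(t_stat, 2), df)
```

The results should match the values worked out by hand (sp² of about 35210.2, t of about 9.16, df = 164).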

  40. Comparison of two means, example

      *ttesti n1 mean1 sd1 n2 mean2 sd2
      . ttesti 83 340.4 248.2 83 73.5 93.9

      Two-sample t test with equal variances
      ------------------------------------------------------------------------------
               |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
      ---------+--------------------------------------------------------------------
             x |      83       340.4    27.24349       248.2     286.204     394.596
             y |      83        73.5    10.30686        93.9    52.99636    94.00364
      ---------+--------------------------------------------------------------------
      combined |     166      206.95    17.85377    230.0297    171.6987    242.2013
      ---------+--------------------------------------------------------------------
          diff |               266.9    29.12798                209.3858    324.4142
      ------------------------------------------------------------------------------
          diff = mean(x) - mean(y)                                      t =   9.1630
      Ho: diff = 0                                     degrees of freedom =      164

          Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
       Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000

  41. In Stata • Remember that for a one-sample t-test, use ttesti n mean sd hypothesizedmean • When testing the equality of 2 means, use ttesti n1 mean1 sd1 n2 mean2 sd2

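  42. For a 95% confidence interval for the difference between the 2 means (equal variances assumed): $(\bar{x}_1 - \bar{x}_2) \pm t_{n_1+n_2-2,\,0.975}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$ • If the confidence interval for the difference does not include 0, then you can reject the null hypothesis of no difference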

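  43. Comparison of two means: t-test • This t-test assumes equal variances in the two underlying populations • With unequal variances, the t-test statistic is $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$, and the degrees of freedom are approximated (Satterthwaite) rather than taken as n1+n2-2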

  44. Comparison of two means, example

      . ttesti 83 340.4 248.2 83 73.5 93.9, unequal

      Two-sample t test with unequal variances
      ------------------------------------------------------------------------------
               |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
      ---------+--------------------------------------------------------------------
             x |      83       340.4    27.24349       248.2     286.204     394.596
             y |      83        73.5    10.30686        93.9    52.99636    94.00364
      ---------+--------------------------------------------------------------------
      combined |     166      206.95    17.85377    230.0297    171.6987    242.2013
      ---------+--------------------------------------------------------------------
          diff |               266.9    29.12798                209.1446    324.6554
      ------------------------------------------------------------------------------
          diff = mean(x) - mean(y)                                      t =   9.1630
      Ho: diff = 0                     Satterthwaite's degrees of freedom =  105.002

          Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
       Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000
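
The unequal-variance statistic and Satterthwaite's degrees of freedom can be recomputed from the same summary statistics. This is a hedged Python sketch; `welch_t` is an illustrative name, and the degrees-of-freedom expression is the standard Satterthwaite approximation, which is assumed to be what Stata reports here.

```python
import math

def welch_t(n1, mean1, sd1, n2, mean2, sd2):
    """Unequal-variance t statistic with Satterthwaite's approximate df."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2        # squared standard error of each mean
    se_diff = math.sqrt(v1 + v2)
    t_stat = (mean1 - mean2) / se_diff
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se_diff, t_stat, df

se_diff, t_stat, df = welch_t(83, 340.4, 248.2, 83, 73.5, 93.9)
print(round(se_diff, 5), round(t_stat, 4), round(df, 3))
```

Because the two groups have the same size, the standard error and t statistic coincide with the equal-variance results; only the degrees of freedom (about 105 instead of 164) and therefore the confidence interval change.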

  45. Summary: t-test of the means of independent samples in Stata • With the different groups in different columns, use ttest var1==var2, unpaired or ttest var1==var2, unpaired unequal • With the data all in one variable and the grouping in another variable, use ttest var, by(groupvar) or ttest var, by(groupvar) unequal

  46. Confidence interval for the difference of two means from independent samples, when unequal variances are assumed

  47. Comparison of two proportions

  48. Comparison of two proportions • Similar to comparing two means Step 1: State the hypothesis • Null hypothesis about two proportions, p1 and p2: H0: p1 = p2, HA: p1 ≠ p2 • If n1 and n2 are sufficiently large, the difference between the two sample proportions approximately follows a normal distribution.

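  49. Comparison of two proportions Step 2: Calculate the z statistic: $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$ is the pooled proportion (x1 and x2 are the numbers of events in the two samples). Use z to find the probability of observing a difference as large as we do, under the null hypothesis of no difference.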

  50. Comparison of two proportions • Step 3: • Determine the probability of obtaining a difference in the two proportions as large as or larger than the observed difference • Step 4: • Reject the null if the probability, the p-value, is less than α, the significance level • Fail to reject the null if p ≥ α
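
The four steps for two proportions can be sketched as follows. This is a minimal Python illustration that assumes the usual pooled-proportion z test under H0: p1 = p2; the counts (45 of 100 versus 30 of 100) are hypothetical and `two_proportion_z` is an illustrative name.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z test of H0: p1 = p2 (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)             # common proportion assumed under H0
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Hypothetical data: 45 events in 100 subjects versus 30 events in 100 subjects.
z, p_value = two_proportion_z(45, 100, 30, 100)
alpha = 0.05
print(round(z, 3), round(p_value, 4), "reject H0" if p_value < alpha else "fail to reject H0")
```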
