
# Assignment 4 answers

Assignment 4 answers. Purpose: The purpose of this assignment is to demonstrate understanding of hypothesis testing in general and to perform hypothesis tests of one mean and of one proportion.


1. Chapter 9, question 4, describe at least 3 similarities/differences as well as when you’d use the t distribution vs. the normal

The standard normal and the t distribution are both symmetric.

The standard normal and the t distribution both have mean 0.

The standard normal and the t distribution are both bell shaped.

The t distribution has heavier tails than the standard normal.

As the df increases, the t distribution becomes more like the standard normal distribution.

It would be important to use the t distribution when you do not know the standard deviation of the population being sampled.
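The heavier tails and the convergence to the normal can be checked numerically. The following is a minimal Python sketch, not part of the original assignment: the helper names `t_pdf` and `t_upper_tail` are hypothetical, and the tail area is approximated with Simpson's rule instead of a statistics library.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the area on [0, t] (Simpson's rule)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

cutoff = 1.96
normal_tail = 1 - NormalDist().cdf(cutoff)
t_tails = {df: t_upper_tail(cutoff, df) for df in (5, 30, 500)}

print(f"standard normal: {normal_tail:.4f}")
for df, tail in t_tails.items():
    print(f"t with df={df}: {tail:.4f}")
```

The area beyond 1.96 is largest for small df and shrinks toward the normal value of about 0.025 as df grows.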

2. Chapter 10, question 3

A p-value is the probability of observing a statistic (e.g. a mean or proportion) at least as extreme as the one you observed, in the hypothetical situation that the null hypothesis is true.

3. Chapter 10, question 6

A Type I error is made when the null hypothesis is rejected although it is in fact true. A Type II error is made when the null hypothesis is not rejected although it is in fact false.

4. Chapter 10, question 8

• Sample size
• Significance level
• Population variance
• The difference between the null mean and the alternate mean

5. Chapter 10, question 10

a. H0 : µ≥7250

HA : µ<7250

b. Using ttesti in Stata:

. ttesti 15 4767 3204 7250

One-sample t test

------------------------------------------------------------------------------

| Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]

---------+--------------------------------------------------------------------

x | 15 4767 827.2692 3204 2992.684 6541.316

------------------------------------------------------------------------------

mean = mean(x) t = -3.0014

Ho: mean = 7250 degrees of freedom = 14

Ha: mean < 7250 Ha: mean != 7250 Ha: mean > 7250

Pr(T < t) = 0.0048 Pr(|T| > |t|) = 0.0095 Pr(T > t) = 0.9952

c. The p-value is 0.0048 which is less than 0.05 so I reject the null.

6. Hypothesis test of one mean

a. Write the null and alternative hypothesis for a hypothesis test that the average hours of sleep in the population from which our sample was drawn is <6.75 hours (the alternative hypothesis). What are you setting as your significance level?

H0 : µ≥6.75

HA : µ<6.75

I set α=0.05.

b. Use the summ command to get the mean and standard deviation, and using these, perform the hypothesis test. Calculate your test statistic and the p-value for the test using the ttail function. Note that if your alternative hypothesis is Ha: µ<µ0, then you should be finding P(T<tstat).

. summ sleep_hrs

Variable | Obs Mean Std. Dev. Min Max

-------------+--------------------------------------------------------

sleep_hrs | 503 6.647217 1.016645 2 10

. di (6.647217-6.75)/1.016645*sqrt(503)

-2.2674408

. di 1-ttail(502,-2.2674408)

.01189381

The p-value is 0.012.
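The same hand calculation can be reproduced outside Stata. This is a rough Python sketch under stated assumptions: `t_pdf` and `t_cdf_lower` are hypothetical helpers, and the lower-tail probability is approximated with Simpson's rule instead of Stata's ttail function.

```python
import math

# Summary statistics from the summ output above
n, mean, sd, mu0 = 503, 6.647217, 1.016645, 6.75

t_stat = (mean - mu0) / (sd / math.sqrt(n))   # test statistic
df = n - 1                                    # degrees of freedom

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf_lower(t, df, steps=2000):
    """P(T < t): Simpson's rule on [0, |t|], then symmetry around 0."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                      # P(0 < T < |t|)
    return 0.5 - area if t < 0 else 0.5 + area

p_value = t_cdf_lower(t_stat, df)             # left-tailed test, Ha: mu < mu0
print(f"t = {t_stat:.4f}, df = {df}, p = {p_value:.4f}")
```

The left-tailed probability is used because the alternative hypothesis is µ<6.75.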

c. Run the ttesti command to check your work.

. ttesti 503 6.647217 1.016645 6.75

One-sample t test

------------------------------------------------------------------------------

| Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]

---------+--------------------------------------------------------------------

x | 503 6.647217 .04533 1.016645 6.558157 6.736277

------------------------------------------------------------------------------

mean = mean(x) t = -2.2674

Ho: mean = 6.75 degrees of freedom = 502

Ha: mean < 6.75 Ha: mean != 6.75 Ha: mean > 6.75

Pr(T < t) = 0.0119 Pr(|T| > |t|) = 0.0238 Pr(T > t) = 0.9881

The leftmost p-value confirms my previous result.

d. Because the data are already in Stata, you can also run the ttest command, rather than using the “immediate” function. Run ttest sleep_hrs==6.75 and compare the results to your previous results.

. ttest sleep_hrs==6.75

One-sample t test

------------------------------------------------------------------------------

Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]

---------+--------------------------------------------------------------------

sleep_~s | 503 6.647217 .04533 1.016645 6.558157 6.736277

------------------------------------------------------------------------------

mean = mean(sleep_hrs) t = -2.2674

Ho: mean = 6.75 degrees of freedom = 502

Ha: mean < 6.75 Ha: mean != 6.75 Ha: mean > 6.75

Pr(T < t) = 0.0119 Pr(|T| > |t|) = 0.0238 Pr(T > t) = 0.9881

The results are the same as before.

e. Give the p-value and state your conclusion. Be sure to use correct terminology.

The p-value is 0.0119, which is less than 0.05, therefore we reject the null hypothesis and conclude that the mean hours of sleep in the population is less than 6.75 hours.

f. State the null and alternative hypothesis, the significance level, run the test, and state your conclusion if we were only worried about getting less than 6.5 hours of sleep.

H0 : µ≥6.5

HA : µ<6.5

I set α=0.05.

. ttest sleep_hrs==6.5

One-sample t test

------------------------------------------------------------------------------

Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]

---------+--------------------------------------------------------------------

sleep_~s | 503 6.647217 .04533 1.016645 6.558157 6.736277

------------------------------------------------------------------------------

mean = mean(sleep_hrs) t = 3.2477

Ho: mean = 6.5 degrees of freedom = 502

Ha: mean < 6.5 Ha: mean != 6.5 Ha: mean > 6.5

Pr(T < t) = 0.9994 Pr(|T| > |t|) = 0.0012 Pr(T > t) = 0.0006

The p-value for the left-tailed test is 0.9994, which is greater than 0.05, therefore we fail to reject the null hypothesis that the mean hours of sleep in the population is at least 6.5 hours.

7.a Hypothesis test of one proportion

H0 : p=0.50

HA : p≠0.50

I will set α=0.05.

b. Use a Stata command to run the hypothesis test using the normal approximation.

. tab sex

Biological |

sex at |

birth | Freq. Percent Cum.

------------+-----------------------------------

Male | 218 41.84 41.84

Female | 303 58.16 100.00

------------+-----------------------------------

Total | 521 100.00

. prtest sex==.5

One-sample test of proportion sex: Number of obs = 521

------------------------------------------------------------------------------

Variable | Mean Std. Err. [95% Conf. Interval]

-------------+----------------------------------------------------------------

sex | .5815739 .0216119 .5392153 .6239324

------------------------------------------------------------------------------

p = proportion(sex) z = 3.7239

Ho: p = 0.5

Ha: p < 0.5 Ha: p != 0.5 Ha: p > 0.5

Pr(Z < z) = 0.9999 Pr(|Z| > |z|) = 0.0002 Pr(Z > z) = 0.0001

c. Use a Stata command to run the hypothesis test using the binomial distribution.

. bitest sex==.5

Variable | N Observed k Expected k Assumed p Observed p

-------------+------------------------------------------------------------

sex | 521 303 260.5 0.50000 0.58157

Pr(k >= 303) = 0.000113 (one-sided test)

Pr(k <= 303) = 0.999920 (one-sided test)

Pr(k <= 218 or k >= 303) = 0.000226 (two-sided test)

OR

. di binomialtail(521,303,.5)

.00011321

d. Compare your results in b and c and explain differences or similarities and your overall conclusion.

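The p-value for the 2-sided test is <0.05 in both cases (0.0002 using the normal approximation, 0.000226 using the binomial distribution), so we reject the null either way. The results are similar because n is large, so np and n(1-p) are both large and the normal approximation to the binomial is accurate.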
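To make the comparison concrete, here is a minimal Python sketch that recomputes both p-values from the counts in the tab output (303 females out of 521). The helper `binom_pmf` is a hypothetical name, and mirroring the upper tail to get the lower tail relies on the hypothesized proportion being 0.5.

```python
import math
from statistics import NormalDist

n, k, p0 = 521, 303, 0.5          # sample size, number of females, null proportion

# Normal approximation (what prtest does)
p_hat = k / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_normal = 2 * (1 - NormalDist().cdf(abs(z)))

# Exact binomial test (what bitest does)
def binom_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p)."""
    return math.comb(n, i) * p**i * (1 - p)**(n - i)

upper = sum(binom_pmf(i, n, p0) for i in range(k, n + 1))      # P(X >= 303)
lower = sum(binom_pmf(i, n, p0) for i in range(0, n - k + 1))  # P(X <= 218)
p_exact = upper + lower

print(f"z = {z:.4f}, normal p = {p_normal:.6f}, exact p = {p_exact:.6f}")
```

Both p-values are far below 0.05, so the choice of method does not change the conclusion here.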

e. Construct an exact 95% confidence interval for the proportion female. Based on the 95% confidence interval, would you have rejected or failed to reject the null hypothesis above? Why or why not?

. ci sex, binomial

-- Binomial Exact --

Variable | Obs Mean Std. Err. [95% Conf. Interval]

-------------+---------------------------------------------------------------

sex | 521 .5815739 .0216119 .5378925 .6243247

This 95% confidence interval does not include the hypothesized value, so I would reject the null.
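As a cross-check, the exact interval can be approximated by bisection on the binomial tail probabilities. This Python sketch assumes that the "Binomial Exact" interval reported by Stata is the Clopper-Pearson interval; the function names are hypothetical.

```python
import math

def binom_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p), computed on the log scale."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
               + i * math.log(p) + (n - i) * math.log(1 - p))
    return math.exp(log_pmf)

def upper_tail(k, n, p):
    """P(X >= k), increasing in p."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

def lower_tail(k, n, p):
    """P(X <= k), decreasing in p."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))

def solve(f, lo, hi, target, increasing, iterations=60):
    """Bisection for the p in (lo, hi) at which f(p) equals target."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n, k, alpha = 521, 303, 0.05
p_hat = k / n
ci_low = solve(lambda p: upper_tail(k, n, p), 0.0, p_hat, alpha / 2, True)
ci_high = solve(lambda p: lower_tail(k, n, p), p_hat, 1.0, alpha / 2, False)

print(f"95% exact CI: ({ci_low:.4f}, {ci_high:.4f})")
print("contains 0.50:", ci_low <= 0.5 <= ci_high)
```

Because 0.50 lies below the lower limit, the interval leads to the same decision as the two-sided test.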

### Lecture review 6

Overview
• Sample size calculations
• Type I & Type II error considerations
• Power
• Comparison of two means
• Dependent means (paired)
• Independent means
• Comparison of two proportions
Types of error
• Type I = incorrectly reject the null = α
• Type II = incorrectly fail to reject the null = β
• H0 is a statement about the population and is either true or false
• Using a sample, we try to determine the answer
• Whether a Type I or a Type II error is possible depends on whether H0 is true or false
• To minimize these errors
• We set α, the chance of a Type I error
• We design our study to minimize the chance of a Type II error
Chance of a Type II error

[Figure: regions labeled "Fail to reject the null" and "Reject the null"; β is the chance of failing to reject the null if the alternative is true]

If the alternative is very different from the null, the chance of a Type II error is low

[Figure: same regions; low β, a low chance of failing to reject the null if the alternative is true]

If the alternative is not very different from the null, the chance of a Type II error is high

[Figure: same regions; high β, a high chance of failing to reject the null if the alternative is true]

Chance of a Type II error is lower if the SEM is smaller

This is relevant because the SD for the distribution of a sample mean (the SEM) is σ/√n

So increasing n decreases the SD of the mean

Finding β, P(Type II error)
• Find the critical value for your test
• At what X̄ will zstat be greater than 1.96 (or 1.645 for a one-sided test)?
• This depends on n, σ, and μ0
• What is the probability of getting a sample mean less extreme than the critical value if the true mean is the alternate mean? This is β.
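The two steps above can be sketched in Python for a one-sided z test with known σ. All numbers below (null mean 100, σ of 15, n of 36, alternative means 105 and 110) are hypothetical and chosen only for illustration, as is the function name `type2_error`.

```python
import math
from statistics import NormalDist

def type2_error(mu0, mu_alt, sigma, n, alpha=0.05):
    """Beta for a one-sided z test of H0: mu <= mu0 against HA: mu > mu0."""
    se = sigma / math.sqrt(n)                  # SD of the sample mean
    z_crit = NormalDist().inv_cdf(1 - alpha)   # about 1.645 when alpha = 0.05
    xbar_crit = mu0 + z_crit * se              # reject H0 when the sample mean exceeds this
    # Chance the sample mean falls below the critical value when the true mean is mu_alt
    return NormalDist(mu_alt, se).cdf(xbar_crit)

beta_near = type2_error(mu0=100, mu_alt=105, sigma=15, n=36)
beta_far = type2_error(mu0=100, mu_alt=110, sigma=15, n=36)

print(f"alternative 105: beta = {beta_near:.3f}, power = {1 - beta_near:.3f}")
print(f"alternative 110: beta = {beta_far:.3f}, power = {1 - beta_far:.3f}")
```

The alternative that is farther from the null gives the smaller β, which matches the pictures described earlier.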
Power
• The power of a statistical test is lower for alternative values that are closer to the null value (the chance of a Type II error is higher) and higher for more extreme alternative values.

High β, hence low power (1- β).

Low β, hence high power (1- β).

Sample size calculations

• With n fixed
• You can calculate how big the alternative has to be to reject the null with 80% probability assuming the alternative is true
• The difference between this alternative and the null is called the minimum detectable difference
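A rough Python sketch of the minimum detectable difference for a z test with known σ follows. It uses the usual normal approximation, difference = (z for α + z for power) × σ/√n; the function name and the example values (σ of 15, n of 36) are hypothetical.

```python
import math
from statistics import NormalDist

def minimum_detectable_difference(sigma, n, alpha=0.05, power=0.80, two_sided=True):
    """Smallest difference from the null detected with the requested power (z test)."""
    se = sigma / math.sqrt(n)
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)   # 1.96 two-sided, 1.645 one-sided
    z_power = NormalDist().inv_cdf(power)      # about 0.84 for 80% power
    return (z_alpha + z_power) * se

mdd = minimum_detectable_difference(sigma=15, n=36)
mdd_4n = minimum_detectable_difference(sigma=15, n=144)

print(f"n = 36:  minimum detectable difference = {mdd:.2f}")
print(f"n = 144: minimum detectable difference = {mdd_4n:.2f}")
```

Quadrupling n halves the minimum detectable difference, because the SD of the mean shrinks with √n.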

### Comparison of two means

Dependent

Comparison of two means: the paired t-test
• Paired samples, numerical variables
• Two determinations on the same person (before and after)
• Matched samples – measurement on pairs of persons similar in some characteristics, e.g. identical twins
• Matching or pairing is performed to control for extraneous factors
• Each person or pair has 2 data points, and we calculate the difference for each
• Then we can use our one-sample methods to test hypotheses about the value of the difference
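The idea of reducing paired data to one sample of differences can be sketched as follows. The five pairs of scores are made-up illustration values, not data from the class data set.

```python
import math
from statistics import mean, stdev

# Hypothetical scores for the same five people in two studies
study1 = [2, 1, 0, 3, 1]
study2 = [3, 1, 1, 5, 1]

# One difference per person
diffs = [b - a for a, b in zip(study1, study2)]

n = len(diffs)
d_bar = mean(diffs)                       # mean difference
s_d = stdev(diffs)                        # sample SD of the differences
t_stat = d_bar / (s_d / math.sqrt(n))     # one-sample t statistic against 0
df = n - 1

print(f"differences = {diffs}")
print(f"mean = {d_bar:.3f}, sd = {s_d:.3f}, t = {t_stat:.3f}, df = {df}")
```

From this point the test proceeds exactly like a one-sample t-test of whether the mean difference is 0.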
Comparison of two means: paired t-test
• Step 1: The hypotheses
• Two sided
• Generically H0: μ1-μ2 =δ

HA: μ1-μ2 ≠δ

• Often δ=0, no difference

So H0: μ1-μ2 =0, i.e. H0: μ1=μ2

HA: μ1-μ2 ≠0, i.e. HA: μ1≠μ2

• One sided
• Generically H0: μ1-μ2 ≥δ or H0: μ1-μ2 ≤δ

HA: μ1-μ2 <δ or HA: μ1-μ2 >δ

• Often δ=0, no difference

So H0: μ1 ≥ μ2 or H0: μ1 ≤ μ2

HA: μ1 < μ2 or HA: μ1 > μ2

Comparison of two means: paired t-test
• Step 2: Calculate the test statistic
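• If δ=0, the formula for tstat is $t_{stat} = \bar{d} / (s_d / \sqrt{n})$, where $\bar{d}$ is the mean of the paired differences, $s_d$ is their standard deviation, and n is the number of pairs; the test has n-1 degrees of freedom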
Comparison of two means: paired t-test
• Step 3: Reject or fail to reject the null
• Is the p-value (the probability of observing a difference as large or larger, under the null hypothesis) greater than or less than the significance level, α?
Example
• We think participants are reporting different amounts of alcohol use, measured by the AUDIT-C, in study 2 (vs. study 1). The null hypothesis is that they are reporting the same amount.

H0: μ2-μ1 = 0, i.e. μ2 = μ1

HA: μ2-μ1 ≠ 0, i.e. μ2 ≠ μ1

• Significance level=0.05

. summ auditc_diff

Variable | Obs Mean Std. Dev. Min Max

-------------+-------------------------------------------------

auditc_diff | 28 .5357143 .8811669 0 3

*** calculate the t statistic

. di 0.5357/0.8812*sqrt(28)

3.2168157

*** calculate the p-value

. di 2*ttail(27,3.2168)

.00335519

So we reject the null, because the p-value (0.0034) is less than 0.05

Using the ttest command

. ttest auditc_diff==0

One-sample t test

------------------------------------------------------------------------------

Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]

---------+--------------------------------------------------------------------

auditc~f | 28 .5357143 .1665249 .8811669 .1940334 .8773951

------------------------------------------------------------------------------

mean = mean(auditc_diff) t = 3.2170

Ho: mean = 0 degrees of freedom = 27

Ha: mean < 0 Ha: mean != 0 Ha: mean > 0

Pr(T < t) = 0.9983 Pr(|T| > |t|) = 0.0034 Pr(T > t) = 0.0017

Note that "mean > 0" here refers to the mean difference

Another way without calculating the difference

The command is

ttest var1==var2

. ttest auditc_s2==auditc_s1

Paired t test

------------------------------------------------------------------------------

Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]

---------+--------------------------------------------------------------------

auditc~2 | 28 1 .3170632 1.677741 .34944 1.65056

auditc~1 | 28 .4642857 .2438782 1.290482 -.036111 .9646824

---------+--------------------------------------------------------------------

diff | 28 .5357143 .1665249 .8811669 .1940334 .8773951

------------------------------------------------------------------------------

mean(diff) = mean(auditc_s2 - auditc_s1) t = 3.2170

Ho: mean(diff) = 0 degrees of freedom = 27

Ha: mean(diff) < 0 Ha: mean(diff) != 0 Ha: mean(diff) > 0

Pr(T < t) = 0.9983 Pr(|T| > |t|) = 0.0034 Pr(T > t) = 0.0017


### Comparison of two means

Independent samples

Comparison of two means: t-test
• The goal is to compare means from two independent samples
• Two different populations
• E.g. vaccine versus placebo group
Comparison of two means: t-test

Step 1: State the hypothesis

• Two sided hypothesis

H0: μ1=μ2

HA: μ1≠μ2

• One sided hypothesis

H0: μ1≥μ2

HA: μ1<μ2

• One sided hypothesis

H0: μ1≤μ2

HA: μ1>μ2

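Comparison of two means: t-test when σ is unknown

Step 2: calculate the T-test statistic

• T-test test statistic: $t_{stat} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
• The pooled variance $s_p^2$ is a weighted average of the individual sample variances: $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$
• The degrees of freedom for the test are n1+n2-2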
Comparison of two means: t-test
• Step 3:
• As in our other hypothesis tests, compare the t statistic to the t-distribution to determine the probability of obtaining a mean difference as large as or larger than the observed difference
• Step 4:
• Reject the null if the probability, the p-value, is less than α, the significance level
• Fail to reject the null if p ≥ α
Comparison of two means: Example
• Study of non-pneumatic anti-shock garment (Miller et al)
• Two groups – pre-intervention received usual treatment, intervention group received NASG
• Comparison of hemorrhaging in the two groups
• Null hypothesis: The hemorrhaging is the same in the two groups H0: μ1=μ2

HA: μ1≠μ2

• The data:
• External blood loss after entry:
• Pre-intervention group (n=83) mean blood loss =340.4 SD=248.2
• Intervention group (n=83) mean blood loss =73.5 SD=93.9
Calculating by hand
• External blood loss:
• Pre-intervention group (n=83) mean=340.4 SD=248.2
• Intervention group (n=83) mean=73.5 SD=93.9
• First calculate sp², the pooled variance

sp² = (82*248.2² + 82*93.9²)/(83+83-2)

= 35210.2

tstat = (340.4-73.5)/sqrt(35210.2*(2/83))

= 9.16

df =83+83-2=164

. di 2*ttail(164,9.16)

2.041e-16
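The hand calculation above can be written out step by step. This is a minimal Python sketch using only the summary statistics quoted on the slide; it stops at the t statistic and does not reproduce the p-value.

```python
import math

# Summary statistics: pre-intervention group (1) and intervention group (2)
n1, mean1, sd1 = 83, 340.4, 248.2
n2, mean2, sd2 = 83, 73.5, 93.9

df = n1 + n2 - 2                                      # degrees of freedom
sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df    # pooled variance
se_diff = math.sqrt(sp2 * (1 / n1 + 1 / n2))          # SE of the difference in means
t_stat = (mean1 - mean2) / se_diff

print(f"sp2 = {sp2:.1f}, SE = {se_diff:.5f}, t = {t_stat:.4f}, df = {df}")
```

With equal group sizes the pooled variance is simply the average of the two sample variances.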

Comparison of two means, example

*ttesti n1 mean1 sd1 n2 mean2 sd2

ttesti 83 340.4 248.2 83 73.5 93.9

Two-sample t test with equal variances

------------------------------------------------------------------------------

| Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]

---------+--------------------------------------------------------------------

x | 83 340.4 27.24349 248.2 286.204 394.596

y | 83 73.5 10.30686 93.9 52.99636 94.00364

---------+--------------------------------------------------------------------

combined | 166 206.95 17.85377 230.0297 171.6987 242.2013

---------+--------------------------------------------------------------------

diff | 266.9 29.12798 209.3858 324.4142

------------------------------------------------------------------------------

diff = mean(x) - mean(y) t = 9.1630

Ho: diff = 0 degrees of freedom = 164

Ha: diff < 0 Ha: diff != 0 Ha: diff > 0

Pr(T < t) = 1.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 0.0000

In Stata
• Remember that for a one-sample t-test, use

ttesti n mean sd hypothesizedmean

• When testing the equality of 2 means, use

ttesti n1 mean1 sd1 n2 mean2 sd2

• If the confidence interval for the difference does not include 0, then you can reject the null hypothesis of no difference
Comparison of two means: t-test
• This t-test assumes equal variances in the two underlying populations
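• With unequal variances, the T-test statistic becomes $t_{stat} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$, with degrees of freedom given by Satterthwaite's approximation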
Comparison of two means, example

ttesti 83 340.4 248.2 83 73.5 93.9, unequal

Two-sample t test with unequal variances

------------------------------------------------------------------------------

| Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]

---------+--------------------------------------------------------------------

x | 83 340.4 27.24349 248.2 286.204 394.596

y | 83 73.5 10.30686 93.9 52.99636 94.00364

---------+--------------------------------------------------------------------

combined | 166 206.95 17.85377 230.0297 171.6987 242.2013

---------+--------------------------------------------------------------------

diff | 266.9 29.12798 209.1446 324.6554

------------------------------------------------------------------------------

diff = mean(x) - mean(y) t = 9.1630

Ho: diff = 0 Satterthwaite's degrees of freedom = 105.002

Ha: diff < 0 Ha: diff != 0 Ha: diff > 0

Pr(T < t) = 1.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 0.0000
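The unequal-variance version can be checked by hand as well. The sketch below recomputes the t statistic and Satterthwaite's degrees of freedom from the same summary statistics; the variable names are illustrative.

```python
import math

n1, mean1, sd1 = 83, 340.4, 248.2
n2, mean2, sd2 = 83, 73.5, 93.9

# Variance of each sample mean
v1 = sd1**2 / n1
v2 = sd2**2 / n2

se_diff = math.sqrt(v1 + v2)                 # no pooling of the variances
t_stat = (mean1 - mean2) / se_diff

# Satterthwaite's approximate degrees of freedom
df_satt = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"SE = {se_diff:.5f}, t = {t_stat:.4f}, df = {df_satt:.3f}")
```

The t statistic equals the pooled one here because the two groups have the same size; only the degrees of freedom, and therefore the confidence interval, change.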

Summary: T-test of the means of independent samples in Stata
• With the different groups in different columns, use

ttest var1==var2, unpaired

or ttest var1==var2, unpaired unequal

• With the data all in one variable and the grouping in another variable, use

ttest var, by(groupvar)

or ttest var, by(groupvar) unequal

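Confidence interval for the difference of two means from independent samples, when unequal variances are assumed: $(\bar{x}_1 - \bar{x}_2) \pm t_{df,\,1-\alpha/2} \sqrt{s_1^2/n_1 + s_2^2/n_2}$, where df is Satterthwaite's approximate degrees of freedom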

### Comparison of two proportions

Comparison of two proportions
• Similar to comparing two means

Step 1: State the hypothesis

• Null hypothesis about two proportions, p1 and p2, H0: p1= p2

HA: p1≠ p2

• If n1 and n2 are sufficiently large, the difference between the two sample proportions approximately follows a normal distribution.
Comparison of two proportions

Step 2: Calculate the z statistic

$z_{stat} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

• Where $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$ is the pooled proportion, with x1 and x2 the number of events in each sample

Comparison of two proportions
• Step 3:
• Compare the z statistic to the standard normal distribution to determine the probability of obtaining a difference in the two proportions as large as or larger than the observed difference, under the null hypothesis of no difference
• Step 4:
• Reject the null if the probability, the p-value, is less than α, the significance level
• Fail to reject the null if p ≥ α
Comparison of two proportions
• Example: Having a cold in the class data set

Males:

N=214

74 (34.6%) reported having 1 or more colds

Females:

N=291

116 (39.9%) reported having 1 or more colds

Comparison of two proportions
• Null hypothesis: The rate of having a cold in males and females is the same

H0: p1= p2 (so p1 – p2 = 0)

• Z statistic is calculated:

p̂ = (74+116)/(214+291) = 0.376

zstat = (.346-.399)/sqrt( .376*(1-.376)*(1/214+1/291))

= -1.215

. di 2*normal(-1.215)

.22436609
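The same z test can be recomputed from the raw counts instead of the rounded proportions. This Python sketch is an illustration only; because it avoids rounding, it gives z of about -1.211 instead of the -1.215 obtained by hand above.

```python
import math
from statistics import NormalDist

# Number reporting 1 or more colds, and group sizes
x1, n1 = 74, 214     # males
x2, n2 = 116, 291    # females

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                            # pooled proportion under H0

se_null = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se_null
p_value = 2 * NormalDist().cdf(-abs(z))                   # two-sided p-value

print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, pooled = {p_pool:.4f}")
print(f"z = {z:.4f}, two-sided p = {p_value:.4f}")
```

Since the p-value is well above 0.05, we fail to reject the null hypothesis that the two proportions are equal.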

Comparison of two proportions

prtesti n1 p1 n2 p2

. prtesti 214 .346 291 .399

Two-sample test of proportion x: Number of obs = 214

y: Number of obs = 291

------------------------------------------------------------------------------

Variable | Mean Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

x | .346 .0325177 .2822664 .4097336

y | .399 .0287063 .3427367 .4552633

-------------+----------------------------------------------------------------

diff | -.053 .0433757 -.1380149 .0320149

| under Ho: .0436317 -1.21 0.224

------------------------------------------------------------------------------

diff = prop(x) - prop(y) z = -1.2147

Ho: diff = 0

Ha: diff < 0 Ha: diff != 0 Ha: diff > 0

Pr(Z < z) = 0.1122 Pr(|Z| < |z|) = 0.2245 Pr(Z > z) = 0.8878

Comparison of two proportions

. prtest coldany, by(sex)

Two-sample test of proportion Male: Number of obs = 214

Female: Number of obs = 291

------------------------------------------------------------------------------

Variable | Mean Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

Male | .3457944 .0325132 .2820698 .409519

Female | .3986254 .0287018 .342371 .4548798

-------------+----------------------------------------------------------------

diff | -.052831 .0433693 -.1378333 .0321712

| under Ho: .0436248 -1.21 0.226

------------------------------------------------------------------------------

diff = prop(Male) - prop(Female) z = -1.2110

Ho: diff = 0

Ha: diff < 0 Ha: diff != 0 Ha: diff > 0

Pr(Z < z) = 0.1129 Pr(|Z| < |z|) = 0.2259 Pr(Z > z) = 0.8871
