Introduction to Hypothesis Testing: Understanding Statistical Significance

Testing a Claim Chapter 11

1.1 Intro • “I can make 80% of my BBall free throws”. To test my claim, you ask me to shoot 20 free throws. I only make 8  and you say “aha! Someone who makes 80% of their free throws would NEVER only make 8/20!” But I say, ‘hey, what if I’m having a bad day, or am injured, or the ball is dead, or the hoop is bent…” Since all of these are possibilities, you say “OK, well I’ll decide how likely your claim is based on the probability that someone who genuinely makes 80% would shoot 8/20 on one trial run” • (In actuality, the probability that someone who makes 80% free throws only hits 8/20 is .0001 or 1/10,000. This is enough to convince you I’m lying!) • This is the basics of hypothesis testing: an outcome that would rarely happen if a claim were true is good evidence that the claim is not true.

Stating a hypothesis • A statistical test starts with a careful statement of the claims we want to compare. Because the reasoning of tests look for evidence against a claim, we start with the claim we seek evidence AGAINST. • This is called our NULL HYPOTHESIS. • The claim about the population we are trying to find evidence FOR is our ALTERNATIVE hypothesis

Example (one-sided) • Attitudes towards school and study habits on a national survey range from 0 to 200. The mean score for US college students is 115 with a standard deviation of 30. Assume normality in this population. A teacher suspects that older students have better attitudes towards school. She gives the survey to a SRS of 25 students. • We seek evidence AGAINST the claim that Mu = 115. • Null: H0: μ= 115 • Alternate: Ha: μ> 115 *Be sure to state the hypothesis in terms of population b/c we are making inference/claims about our pop!

One vs. Two sided hypoth • If the previous example said the teacher thought that seniors had a DIFFERENT attitude towards study habits (but didn’t specify better or worse), we would be doing a two sided hypothesis because the alternate is that μcould be > or < 115. So we state it as: • Ha: μ≠ 115 • **The alternative hypothesis should express the hopes or suspicions we have before we see the data. It is cheating to first look at the data and then frame Ha to fit what the data show!

Conditions for Significance tests • The same 3 conditions from Chapter 10 should be verified before performing a significance test about a population mean or proportion. • 1. SRS • 2. Normality: • For means (data)- population distribution Normal, or large sample size (n>30) or smaller sample with normal histogram/boxplot/probability plot • For proportions- np > 10 and n(1-p) > 10 • 3. Independence (10)(n)< population

Test Statistic • The significance test compares the value of the parameter (true pop mean, as stated in the null) with the calculated sample mean. Values of the sample far from the true parameter give evidence against H0 • To assess how far the sample statistic is from the population parameter, we have to standardize it (to make comparison) • Test statistic = (sample value = mu)/standard deviation of sampling distribution • this is either sigma/square root n or Sx / square root n, depending on if you know sigma or not

Z test • We will focus on the Z test first, which is when we know sigma, so the Z test statistic formula is: • In that last example, lets say the mean (X bar) of the 25 seniors sampled was 125. Our calculated Z would be (125 – 115)/ (30/√25) = 1.67

Where does that fall?

P-Value • The p value is the probability of getting your observed statistic, assuming the null is true. The smaller your p value, the less likely it is, and the more confident you are in rejecting your claim (the null). • The p value is the area under the curve from your calculated test statistic, to the tail end. • In the previous example, that’s a little less than .1 or 10%.

Statistical Significance • We typically compare the P-value with a fixed P value to make our decision whether or not we the probability (or P-value) is small enough to reject our claim. We set a P value before calculating our observed test statistic and we call this our significance level. • Most commonly we choose .05 as our cutoff point (meaning we need our calculated p value to be less than .05, indicating there is a less than 5% chance of obtaining our result had our original claim been true. • αis the symbol for our chosen significance level we need to beat. So we would say α=.05

Determining significance • If our calculated (or observed) p-value is less than or equal to our alpha level, we say that the data is “statistically significant at level α” • “significant” doesn’t mean important! It just means you are rejecting your null hypothesis • In the previous example, our P value was .1 which is bigger than α=.05, so we would FAIL to reject our null (meaning our sample didn’t provide enough evidence to reject the claim that seniors have the same attitudes as other students) • (or accept the claim that seniors differ significantly)

Typical alpha levels • Just like confidence intervals, our most typical are .1, .05, .01 • It is possible to have results that are significant at the .05 level, but not the .01 level (example: calculated p value = .04) • If significant at .01, significant at .05 and .1 too (example: p = .003)

Final Step: Interpreting results in context • We make our official decision to reject H0 or fail to reject H0 based on whether we “beat” our chosen alpha level (remember to ‘beat’ it means our calculated p-level was smaller) • *warning! In real life, always set alpha level BEFORE analyzing your data!

11.2 Carrying out significance tests • The process is very similar to constructing confidence intervals. Follow the 4 steps on the following slide (sometimes referred to as our “inference toolbox)

Step 1. Hypothesis: Identify the population of interest and the parameter you want to draw conclusions about (usually the mean, mu). State hypothesis (null and alternative, with appropriate symbols) • Step 2: Conditions: Choose the appropriate inference procedure (in this case/chapter, Z test). Verify the conditions for using it (these are the same 3 conditions as before: SRS, Normality, Independence

Step 3: Calculations: If conditions are met, carry out the inference procedure. • Calculate the test statistic • Find the P-value • Step 4: Interpretation: Interpret your results in the context of the problem • Interpret the P-value or make a decision about rejecting H0 using statistical significance • *3C’s Conclusion, Connection and Context!

Example Executives’ blood pressure P. 706 • Director of a company concerned about effects of stress on employees. According to national center for health, the mean blood pressure for males 35 to 44 years of age is 128 and the standard deviation in this population is 15. The director examines the medical records of 72 male executives in this age group and finds that their mean blood pressure is 129.93. Is this evidence that the company’s male execs is different from the national average?

What do we know? • Population: Males 35-44 μ= 128 σ = 15 • Sample: n = 72 = 129.93 • 1. Hypothesis (words and symbols!) • H0: μ= 128 male executives at this company have a mean blood pressure of 128 • Ha: μ≠ 128 male executives at this company have a mean blood pressure that differs from the national mean of 128

Step 2: Conditions • SRS: not told in sample, so we must assume it was an SRS to proceed • Normality: We do not know that the population distribution of blood pressure among male execs is Normally distributed, but the large sample size (n = 72) guarantees that the sampling distribution of the means will be approximately normal by the CLT. • Independence: We must assume that there at least 10x72 =720 middle aged male execs in this large company (b/c this is the population we are making inferences about…not ALL male execs in the world!)

Step 3: Calculations • Test Statistic: we know sigma, so we do a Z test. = (129.93 – 128)/(15/√72) = 1.09 • P-Value: draw a picture NormalCDF(1.09, 1000) =.1379 (this is area in one tail) multiply by 2, p = .2757

Step 4: Interpretation • More than 27% of the time, an SRS of size 72 from the general male population would have a mean blood pressure at least as far from 128 as that of our sample. The observe mean 129.93 is therefore not good evidence that middle aged male execs at this company have blood pressure that differs from the national average. • If we had originally stated we wanted an alpha level of .05, we would say “fail to reject the null at .05 alpha level. Results non-significant”

Confidence intervals and testing for significance • If you wanted to test for significance using a confidence interval, you would construct your interval (as before) and check to see if μfalls in our interval. If it does NOT, then we reject the null and say we have found significant results. • Remember to construct your interval based on the alpha you set. So if you want α = .01, you have to construct a 99% CI etc.

Example • lets do the previous one, but using a CI and a 90% α level. • Construct interval around =129.93 using our σ = 15 129.93 +/- (1.645)( 15/√72) = (127.02, 132.84) Our true population mean μ= 128 does fall in this interval, so we FAIL TO REJECT the null. Results non-significant.

11.3 Uses and abuses of significance tests • Cautions: • Statistical significance is not the same as practical importance. • A few outliers can produce highly significant results (or destroy the significance of otherwise-convincing data). • NOT ROBUST • Beware of multiple analysis (running multiple alpha levels to attain significance etc).

11.4 Using inference to make decisions • Type I vs. Type II error: when we make a decision based on a significance test (reject vs. fail to reject), we hope our decision is correct, but it may in fact be wrong (we really have no way of knowing- if we did, we wouldn’t have done the test in the first place). • Sometimes we get a rare freak sample and reject the null when it’s actually true, or we might fail to reject when it’s truly a false claim.

Example • What if, after doing the study with the male execs and blood pressure, you find out that the 72 people in your sample just came back from a week long spa vacation and were extra relaxed. Had you tested them during a typical work week, their stress levels and blood pressure would have been MUCH higher. • So the fact that we failed to reject the null, when in reality we should have rejected it (had our data been accurate), is a type II error

Which is more serious? Type I or Type II? • Depends on the study. If we’re talking about a drug for example, failing to reject the null might have disastrous consequences!

Error probabilities • While we never can know if we are making a type I or type II error, we can calculate the probability of making an error. • Probability of making a type I error is just alpha! So if your alpha is .05, the probability of rejecting the null when it is in fact true is 5%

Power and Type II error • Power is the probability of correctly rejecting the null. We want power to be high (meaning when we do a test or experiment, our goal is to reject the null and find something interesting, but in the grand scheme that is meaningless if when you reject the null your probability of it being an erroneous rejection is high). • Since making a type II error is the probability of failing to reject the null when you should have rejected, power is the opposite (or converse) probability. • 1 – probability of a type II error.

Continued • Beta βis our symbol for a type II error • power = 1 – β • High power is desirable. Along with 95% CI’s and .05 significance tests, 80% power is desirable. • Many US govt agencies that provide research funds require a the tests to be sufficient to detect important results 80% of the time using a significance test with alpha = .05.

Increasing power • Increase α. A test at the 5% significance level will have a greater chance of rejecting the null than a 1% level because you have a smaller critical value to beat (aka the strength of evidence required for rejection is less). • Increase the sample size. More data provides more info about x bar, so we will have a better chance of distinguishing values of mu. • Decrease sigma (same effect as increasing sample size…more info about x bar). • Improving the measurement process and restricting attention to a subpopulation are two common ways to decrease sigma.

Best advice? • to maximize power, choose as high an alpha level (type I error probability) as you are willing to risk AND as large a sample as you can afford. • You will not compute power or type II error in this course unless one of them is given to you (then you can calculate the other).

Introduction to Hypothesis Testing: Understanding Statistical Significance

Introduction to Hypothesis Testing: Understanding Statistical Significance

Presentation Transcript

Chapter 11 Testing a Claim

Chapter 9: Testing a Claim

Testing a Claim

Testing a Claim

Ch 9 – Testing a Claim

Chapter 9: Testing a Claim

Chapter 9: Testing a Claim

Section 8.4 Testing a claim about a mean ( σ known)

Section 8.5 Testing a claim about a mean ( σ unknown )

Testing a Claim about a Proportion

Testing a Claim

Section 8.6 Testing a claim about a standard deviation

Chapter 9 Testing a Claim

Section 8.3 Testing a claim about a Proportion

CHAPTER 9 Testing a Claim

Chapter 9 Testing a Claim

CHAPTER 9 Testing a Claim

CHAPTER 9 Testing a Claim

Section 8-5 Testing a Claim About a Mean:  Not Known