Building on the logic of hypothesis testing: T-tests

One-Sample T-test: Outline • Introduce the t-test and explain when it should be used • Define Directional Hypotheses (one-tailed t-tests) and contrast them with ‘Non-Directional Hypotheses’ (two-tailed t-tests) that were described earlier. • Learn how to find tcrit for directional hypotheses • Highlight a second method for making decisions regarding the null hypothesis: the p-value method

One-Sample T-test: Outline • Learn to calculate a p-value and review important points about using this method • Demonstrate one measure for estimating whether an experimental effect is large or small (Cohen’s d) • Advanced Topic: Calculating β and the Power associated with a hypothesis test • Outline the steps for conducting Hypothesis Tests using SPSS

What is a one-sample t-test? We use it when we have a single sample mean and are trying to decide whether it was drawn from a specific population. This is what we’ve been doing so far in hypothesis testing.

The Animal Cracker Packer Abby is the manager of an animal cracker factory. She is concerned that her aging cracker packer might need to be replaced. Each bag of crackers is supposed to weigh 454 grams. Abby would like to conduct a hypothesis test but she faces one big obstacle: she does not know the population variability (σ), which means she cannot use the formula for zobs. Q: Have we ever faced a similar problem? A: Yes; when we wanted to calculate confidence intervals and we did not know σ.

The Animal Cracker Packer Q: What did we do then? A: Substituted s for σ and substituted t for z Q: Is that what Abby should do now? A: You betcha! Why do you think this section is called t-tests?

Limitations of hypothesis testing using z-scores • Our sample must be large (n > 30) in order for the Central Limit Theorem to ‘kick in’. • σ must be known. This is the more serious limitation because it is almost impossible to know σ in the real world.

What do we do if σ is unknown? Just like we did with confidence intervals: • Use s as an estimate of σ • Use t as our test statistic instead of z

Visualizing the t-distribution • Generally not normal: flatter and more spread out than the normal distribution • Shape determined by df • Approaches the normal (z) distribution as sample size (and therefore df) gets large

Step 1: Determining tcrit
• tcrit depends on the degrees of freedom; df = n - 1
• If α = .05 (two-tailed) and n = 10: tcrit(α = .05; df = 9) = ±2.262
• If the specific df is not listed, find the t values for the listed df on either side and use the larger t value

Step 2: Calculating tobs
tobs = (M - µ0) / (s / √n)
The only difference from zobs is that s replaces σ.
NOTE: s / √n is an estimate of SE.
Step 3: Compare tobs with tcrit
Exactly the same as before.

Steps for completing a one-sample t-test
• Specify the NULL hypothesis (HO)
• Specify the ALTERNATIVE hypothesis (HA)
• Designate the rejection region by selecting α.
• Determine the critical value of your test statistic. Use appropriate degrees of freedom (n - 1).
• Use sample statistics to calculate the test statistic: tobs = (M - µ0) / (s / √n)
• Compare observed value with critical value: If the test statistic falls in the RR, we reject the null. Otherwise, we fail to reject the null.
• Interpret your decision regarding the null. What do your data imply regarding the question that motivated your experiment?

Abby’s Animal Cracker Packer
Abby samples the next 25 cracker bags packed by the machine to determine whether it is putting 454 g of crackers in each bag. The sample statistics are as follows: M = 462.4 g; s = 16 g. Is the machine properly filling cracker packages?
Step 1: Ho: µ = 454 g
Step 2: Ha: µ ≠ 454 g
Step 3: α = .05

We must figure out what the sampling distribution would look like if the null is true. • Then we can make a judgment about how likely our sample mean is.
[Figure: sampling distribution under the null, centered at µ0, with spread SE]

Abby’s Animal Cracker Packer
n = 25, M = 462.4 g; s = 16 g.
Step 1: Ho: µ = 454 g
Step 2: Ha: µ ≠ 454 g
Step 3: α = .05
Step 4: tcrit(α = .05, df = 24) = ±2.064
Step 5: tobs = (462.4 - 454) / (16 / √25) = 8.4 / 3.2 = 2.625
Step 6: Because tobs (2.625) is beyond tcrit (2.064), it falls in the rejection region, so we reject the null.
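Abby's arithmetic can be checked with a few lines of Python. This is only a minimal sketch and not part of the original slides: the function name `t_observed` is made up for illustration, and the critical value 2.064 is copied from the t table (α = .05, two-tailed, df = 24) rather than computed.

```python
import math

def t_observed(sample_mean, mu0, s, n):
    """One-sample t statistic: (M - mu0) / (s / sqrt(n))."""
    standard_error = s / math.sqrt(n)  # s / sqrt(n) estimates the SE of the mean
    return (sample_mean - mu0) / standard_error

# Abby's sample: n = 25 bags, M = 462.4 g, s = 16 g; null value mu0 = 454 g
t_obs = t_observed(462.4, 454, 16, 25)

# Assumption: critical value read from a t table (alpha = .05, two-tailed, df = 24)
t_crit = 2.064

# Two-tailed decision: reject if t_obs is beyond the critical value in either tail
reject_null = abs(t_obs) > t_crit
print(round(t_obs, 3), reject_null)
```

Taking the absolute value before comparing handles both tails at once, which is why a single positive critical value is enough for a two-tailed test.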

Comparing tobs and tcrit: The Cracker Packer Step 7: Interpret results Is Abby’s machine properly filling bags?

Proper Statistical Notation
“Old school style”: t(df) = t-observed, p < alpha OR p > alpha
t(24) = 2.625, p < .05
• Code for: “With 24 degrees of freedom, our t-observed value was 2.625, and the probability of getting this value or one more extreme if the null were true is less than 5%.”
“New APA style”: t(24) = 2.625, p = .015 (exact p can only be calculated using a computer)

The Crocodile Hunter Before his untimely death, the Crocodile Hunter, Steve Irwin, was studying the length of a new group of crocs in Perth. He wanted to know how the length of an average adult in this new group compared to other adult crocodiles. Owing to his many years of experience with crocs, Steve knows that µ = 21 feet. Because catching and measuring crocs is dangerous, even for the Crocodile Hunter, Steve is only able to capture and measure five crocs (M = 24, s = 1.58). Are the Perth crocs really different from typical crocs? Assume α = .05. Complete steps 1-7 for a hypothesis test.

Are Perth Crocs as long as regular Crocs?
Step 1: Ho:
Step 2: Ha:
Step 3: α =
Step 4: tcrit(α = .05, df = 4) =
Step 5: tobs =
Step 6:
Step 7:

Two-tailed vs. One-tailed tests
Two-tailed: observing a sample mean in either the upper or lower tail of the sampling distribution would be theoretically meaningful
• Ho: µ = some value
• Ha: µ ≠ some value
• Rejection region split between two tails: α/2 in upper tail, α/2 in lower tail.
One-tailed: observing a sample mean in only one of the two tails of the sampling distribution would be theoretically meaningful
• Ho: µ ≥ some value OR µ ≤ some value
• Ha: µ < some value OR µ > some value
• Rejection region located entirely in one tail; EITHER α in upper tail OR α in lower tail.

Directional (one-tailed) Hypothesis Tests: What happens at the drive-thru? Let’s pretend that Abby leaves the cracker packer factory for the fast-paced, high-pay, take-no-prisoners life of a fast food restaurant manager. Abby is considering whether or not to buy a new intercom system for the drive-thru, so she rents one to test it out. What if Abby wanted the new intercom to reduce errors?

What if Abby rented the new intercom to reduce errors? Decreasing errors: the only meaningful result would be if the sample mean fell at the extreme low end of the sampling distribution. In this case: • Ho: µ ≥ some value • Ha: µ < some value
[Figure: sampling distribution with the rejection region in the lower tail]
If her sample mean fell in the upper tail (a big increase in errors), we wouldn’t care; we’d still fail to reject Ho.

Directional (one-tailed) Hypothesis Tests: What happens at the drive-thru? Now suppose instead that Abby rented the new intercom to increase sales.

What if Abby rented the new intercom to increase sales? Increasing sales: the only meaningful result would be if the sample mean fell at the extreme high end of the sampling distribution. In this case: • Ho: µ ≤ some value • Ha: µ > some value
[Figure: sampling distribution with the rejection region in the upper tail]
If her sample mean fell in the lower tail (a big decrease in sales), we wouldn’t care; we’d still fail to reject Ho.

Finding the critical value for a one-tailed test
One-tailed: Entire amount of alpha in one tail; α = .05
Two-tailed: Divide alpha in half; α = .05, α/2 = .025 in each tail
Are you more likely to reject the null for a one-tailed test or a two-tailed test? How do you decide which one to do?
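One way to explore the question above is to compare the two critical values directly. The sketch below is an illustration only: it uses the standard normal (z) distribution as a stand-in, because Python's standard library has no t distribution. The same ordering holds for t at any df.

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal: mean 0, SD 1

# One-tailed: the entire alpha sits in one tail
z_crit_one_tailed = z.inv_cdf(1 - alpha)

# Two-tailed: alpha/2 sits in each tail
z_crit_two_tailed = z.inv_cdf(1 - alpha / 2)

print(round(z_crit_one_tailed, 3))  # 1.645
print(round(z_crit_two_tailed, 3))  # 1.96
```

The one-tailed cutoff lies closer to the center of the distribution, so a sample mean in the predicted direction does not need to be as extreme to land in the rejection region.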

My honest opinion on one-tailed tests…. ONE-TAILED TESTS ARE INAPPROPRIATE UNLESS YOU HAVE AN EXTREMELY GOOD REASON FOR USING THEM BEFORE YOU RUN YOUR EXPERIMENT. I HAVE NEVER SEEN A STRONG ARGUMENT FOR WHY ONE WAS APPROPRIATE. EVER!!!!

Steps for conducting t-tests: including directional tests
• Decide whether you are conducting a one- or a two-tailed test.
• Specify the NULL hypothesis (HO). 2-tailed: µ = some value; 1-tailed: µ ≤ or ≥ some value
• Specify the ALTERNATIVE hypothesis (HA). 2-tailed: µ ≠ some value; 1-tailed: µ > or < some value
• Designate the rejection region by selecting α. 2-tailed: α/2 in each tail; 1-tailed: α in one tail
• Determine the critical value of your test statistic (remember to use appropriate df)
• Use sample statistics to calculate the test statistic: tobs = (M - µ0) / (s / √n)
• Compare observed value with critical value. If the test statistic falls in the RR, we reject the null. Otherwise, we fail to reject the null.
• Interpret your decision regarding the null. What do your data imply regarding the question that motivated your experiment?

Comparing the results of one- and two-tailed t-tests You lost a lot of money at the track and were forced to become the personal statistician of notorious underworld crime boss “Big Lou”. Big Lou wants to know if his son “Moderately-Sized Lou” is stealing from his gambling operation. Before Lou Jr. took over the operation, it used to gross $3500 per night (µ). Big Lou tells you, “I don’t care if he is grossing more than $3500, I only care if he’s grossing less. Got it?!”

Comparing the results of one- and two-tailed t-tests At this point, you could give Big Lou a lecture regarding the theoretical considerations that guide the choice between one- and two-tailed tests, but I would not be so bold... You sample the gross earnings of the casino over the next 25 nights. The average of the sample is $3338; s = 450. Is “Moderately-Sized Lou” in trouble? Assume α = .05.

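[Figure: sampling distribution centered at $3,500; the lower-tail rejection region is labeled “Lou Jr. gets whacked” and the rest of the distribution is labeled “Lou Jr. is ok!”]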

Comparing the results of one- and two-tailed t-tests
Gross $3500 per night (µ). Sample 25 nights, M = $3338; s = 450.
Step 1: Big Lou has asked us to conduct a one-tailed test with the entire rejection region in the lower tail. Thus, our null and alternative hypotheses will be as follows:
Step 2: Ho: µ ≥ 3500
Step 3: Ha: µ < 3500
Step 4: α = .05
Step 5: tcrit(α = .05, df = 24; 1-tailed) = -1.711
Step 6: tobs = (3338 - 3500) / (450 / √25) = -162 / 90 = -1.8 (negative value!)

Comparing the results of one- and two-tailed t-tests
Gross $3500 per night (µ). Sample 25 nights, M = $3338; s = 450.
Step 7: Our observed t falls in the rejection region. Therefore, we would reject the null: t(24) = -1.8, p < .05
Step 8: Interpret the results

Big Bad Lou as a Two-Tailed Test Although it would be unwise for you to challenge Big Lou’s decision to run a one-tailed test, the same is not true for Mrs. Lou. She loves her baby boy and wisely asks you to conduct a two-tailed test, just to see what would happen. After all, wouldn’t Lou Jr. deserve a big raise if receipts from the gambling operation increased rather than decreased? Bear in mind that, just like with selecting a value for α, the time to decide whether to run a one- or two-tailed test is BEFORE you have seen the data.

Following mama Lou’s request
Step 1: Because we have decided to conduct a two-tailed test, our statistical hypotheses would be as follows:
Step 2: Ho: µ = 3500
Step 3: Ha: µ ≠ 3500
Step 4: α = .05
Step 5: tcrit(α = .05, df = 24, 2-tailed) = ±2.064
Step 6: The observed value of our test statistic does not change: tobs = -1.8

Following mama Lou’s request
Step 7: Our observed t DOES NOT fall in the rejection region. Therefore, we would fail to reject the null: t(24) = -1.8, p > .05
Ethics: Be judicious when making a choice. Choose before you see your data!
Step 8: Interpret results
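The two analyses of Lou Jr.'s receipts can be put side by side in a short Python sketch. It is an illustration only: the variable names are made up, and both critical values are copied from the t table (α = .05, df = 24) rather than computed.

```python
import math

# Lou Jr.'s sample: 25 nights, M = 3338, s = 450; null value mu0 = 3500
mu0, sample_mean, s, n = 3500, 3338, 450, 25
t_obs = (sample_mean - mu0) / (s / math.sqrt(n))  # -162 / 90

# Assumption: critical values read from a t table (alpha = .05, df = 24)
t_crit_one_tailed = -1.711  # entire rejection region in the lower tail
t_crit_two_tailed = 2.064   # +/- 2.064, with alpha/2 in each tail

# One-tailed (Big Lou): reject only if t_obs is below the lower cutoff
reject_one_tailed = t_obs < t_crit_one_tailed

# Two-tailed (Mrs. Lou): reject if t_obs is beyond the cutoff in either tail
reject_two_tailed = abs(t_obs) > t_crit_two_tailed

print(t_obs, reject_one_tailed, reject_two_tailed)
```

The same data and the same tobs lead to opposite decisions, which is why the choice between a one- and a two-tailed test has to be made before looking at the data.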

Vertical-Horizontal Illusion Although the two lines are exactly the same length, the vertical line appears longer. To examine the strength of this illusion a researcher prepares a version of the illusion where each line is exactly 10 inches. They tell 25 participants that the horizontal line is 10 inches and ask them to estimate the length of the vertical line. The average length estimated is 12.2 inches, s = 1.00. Conduct a one-tailed hypothesis test (α = .01)

Vertical-Horizontal Illusion
Mean = 12.2; s = 1.00; n = 25
Step 1: One-tailed test
Step 2: Ho: µ ≤ 10
Step 3: Ha: µ > 10
Step 4: α = .01
Step 5:
Step 6:
Step 7:
Step 8:

The critical value method (what we’ve been using so far) 1.) We set alpha 2.) We find the corresponding t or z critical value 3.) We see if our observed t or z is in the rejection region (beyond the critical value)

Introducing the p-value method (APA’s new recommendation) 1.) We set alpha 2.) We find the exact probability of obtaining our sample mean or one more extreme if the null is true: this is called the p-value 3.) We see if the p-value is less than alpha

p-value = the probability (assuming Ho is true) of observing a sample mean that is at least as extreme as the observed sample mean.

Using the p-value method If you know the population SD and can use z • Finding the p-value is easy! It’s just the proportion in the tail that corresponds to zobs • For a one-tailed: p = area in the tail • For a two-tailed: p= 2 x (area in the tail) • Either way • If p < α, we REJECT THE NULL. • If p > α, we FAIL TO REJECT THE NULL.

Using the p-value method
• EXAMPLE: Lou’s question as a two-tailed z-test (for illustration only: a t-test is more appropriate)
• zobs = -1.80
• Proportion in the tail: .0359
• Two-tailed: .0359 x 2 = .0718
• Alpha = .05
• .0718 > .05, so we fail to reject the null
[Figure: standard normal curve with an area of .0359 shaded in each tail, beyond -1.80 and beyond 1.80]
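The tail areas in this example can be reproduced with Python's standard library instead of a z table. This is a minimal sketch with illustrative variable names. The two-tailed value it prints can differ from .0718 in the last digit because the table rounds the tail area to .0359 before doubling.

```python
from statistics import NormalDist

z_obs = -1.80
alpha = 0.05

# Area beyond z_obs in one tail of the standard normal distribution
tail_area = NormalDist().cdf(-abs(z_obs))

p_one_tailed = tail_area      # one-tailed: area in the tail
p_two_tailed = 2 * tail_area  # two-tailed: 2 x (area in the tail)

print(round(p_one_tailed, 4), round(p_two_tailed, 4))
print("reject" if p_two_tailed < alpha else "fail to reject")
```

Using `-abs(z_obs)` means the same line gives the tail area whether the observed z is positive or negative.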

Using the p-value method • If you don’t know the population SD and are doing a t-test • We can’t find an exact p-value in the table • BUT we can find a p-value if we use RC • (RC is “smart” and knows the exact p-value for any t at any possible df) • For a two-tailed test, RC has already multiplied the tail area by 2 • So you don’t have to do that step!

Using RC to get the p-value
National average pieces of fruit eaten per day is 2.5 pieces. How do AC students compare to the national average?
Ho: µ = 2.5
Ha: µ ≠ 2.5
α = .05
p-value method: We want to see if the probability of getting our mean or one more extreme in either tail is less than alpha

One-Sample T-test: RC Output
One Sample t-test
data: Fruit
t = 1.4947, df = 29, p-value = 0.1458
alternative hypothesis: true mean is not equal to 2.5
95 percent confidence interval: 2.242207 4.157793
sample estimates: mean of x = 3.2
Summary statistics: mean = 3.2, sd = 2.565017, n = 30

Reporting the results of a t-test Amherst College students consumed an average of 3.2 pieces of fruit per day (s = 2.57). These data did not provide enough evidence to conclude that Amherst College students differed from the national average in terms of fruit consumption: t(29) = 1.49, p = .15.
Components of the report, in order: mean, SD, df, tobs, p-value.

Using RC to get the p-value
National average pieces of fruit eaten per day is 2.0 pieces. How do AC students compare to the national average?
Ho: µ = 2.0
Ha: µ ≠ 2.0
α = .05
p-value method: We want to see if the probability of getting our mean or one more extreme in either tail is less than alpha

One-Sample T-test: RC Output
One Sample t-test
data: Fruit
t = 2.5624, df = 29, p-value = 0.01585
alternative hypothesis: true mean is not equal to 2.0
95 percent confidence interval: 2.242207 4.157793
sample estimates: mean of x = 3.2
Summary statistics: mean = 3.2, sd = 2.565017, n = 30
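For readers curious where RC's p-values come from, the sketch below recomputes both fruit examples from the summary statistics. It is an illustration under stated assumptions, not RC's actual method: the helper names `t_pdf` and `two_tailed_p` are made up, and the tail area is approximated by numerically integrating the t density with Simpson's rule, using only Python's standard library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log(1 + x * x / df))

def two_tailed_p(t_obs, df, steps=2000):
    """Two-tailed p-value: 1 minus the area between -|t_obs| and +|t_obs|.

    The area from 0 to |t_obs| is approximated with Simpson's rule
    (steps must be even) and doubled, because the density is symmetric.
    """
    upper = abs(t_obs)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights: 4 for odd, 2 for even
        total += weight * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t

# Summary statistics from the RC output
mean, sd, n = 3.2, 2.565017, 30
se = sd / math.sqrt(n)  # estimated standard error of the mean

for mu0 in (2.5, 2.0):  # the two null values tested in the slides
    t_obs = (mean - mu0) / se
    p = two_tailed_p(t_obs, n - 1)
    print(mu0, round(t_obs, 4), round(p, 3))
```

The printed t values and p-values should agree with the RC output to about three decimal places. With µ0 = 2.5 the p-value is above α = .05 (fail to reject), and with µ0 = 2.0 it is below α (reject).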
