Math 3680 Lecture #7 The Sign Test and the Binomial Exact Test

The Sign Test

Example: For this data set, there are 14 pairs in which there is a difference in the two measured amounts. Let K = the number of pairs in which the first method returned a higher amount. We choose α = 0.05. We observe that there are 12 pairs where the first method returns a higher value. This gives the value of the test statistic ks = 12.

First method    Second method
 79.2            74.0
 96.8            95.8
105.8            97.8
 76.0            75.0
 99.2            98.0
 99.5            96.2
 69.5            67.5
 99.2            99.0
100.0           101.8
 23.5            21.2
 91.0           100.2
 93.8            88.0
 95.2            94.8
 72.0            67.5
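The counting step in this example can be sketched in Python. This is only an illustration: it assumes the values are listed pair by pair (first method, then second method), and the variable names are made up for this sketch.

```python
# Paired measurements (first method, second method) from the example.
# Assumption: the values appear pair by pair in the order listed.
pairs = [
    (79.2, 74.0), (96.8, 95.8), (105.8, 97.8), (76.0, 75.0),
    (99.2, 98.0), (99.5, 96.2), (69.5, 67.5), (99.2, 99.0),
    (100.0, 101.8), (23.5, 21.2), (91.0, 100.2), (93.8, 88.0),
    (95.2, 94.8), (72.0, 67.5),
]

# Tied pairs carry no sign information, so only pairs with a difference count.
untied = [(first, second) for first, second in pairs if first != second]
n = len(untied)

# Test statistic: number of pairs where the first method is higher.
k_s = sum(1 for first, second in untied if first > second)

print(n, k_s)  # 14 12
```

Under this pairing, the count agrees with the n = 14 and ks = 12 stated in the example.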

Let’s play devil’s advocate and assume the null hypothesis is correct. That is, let’s assume that p = 0.5, where p is the probability that the first method returns the higher amount in a given pair, and let’s work through the logical ramifications of this assumption. If the null hypothesis is correct, what’s the probability of obtaining a value of K at least as extreme as the observed test statistic? This probability is called the P-value, or the observed level of significance:
P-value = P(K ≥ 12) = 0.00646973
Excel: =1 - BINOMDIST(11,14,0.5,1)
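For readers without Excel, the same upper-tail probability can be sketched in Python with only the standard library. The helper names binom_pmf and upper_tail are illustrative, not part of the lecture.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(k, n, p):
    """P(K >= k), summing the exact binomial probabilities."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))

# One-sided P-value for the sign test: P(K >= 12) with n = 14 and p = 0.5.
p_value = upper_tail(12, 14, 0.5)
print(round(p_value, 8))  # 0.00646973
```

This mirrors the Excel expression: BINOMDIST(11,14,0.5,1) is P(K ≤ 11), so subtracting it from 1 leaves P(K ≥ 12).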

In other words, if we assume the null hypothesis, we also have to accept the fact that there’s less than 1 chance in 150 of obtaining a test statistic this large or larger. We then ask the question: which is more plausible? In this case, the P-value is less than the stipulated significance level of α = 0.05. It’s more plausible to reject the null hypothesis in favor of the alternative hypothesis.
Conclusion (written in plain English): We reject the null hypothesis. There is good reason to believe that the first method returns a higher amount than the second method.

Summary of algorithm for hypothesis testing:
H0: The first method does not return higher values than the second method (p = 0.5)
Ha: The first method returns higher values than the second method (p > 0.5) (one-sided test)
We choose α = 0.05
Test statistic: ks = 12 for a sample size of n = 14
P-value = 0.00646973
Conclusion: We reject the null hypothesis. There is good reason to believe that the first method returns a higher amount than the second method.

Notes.
Note 1. Notice we have not proven beyond a shadow of a doubt that the first method returns a higher value than the second method. Is it possible for 14 fair coins to land so that 12 or more are heads? Yes. In other words, we may simply have had a run of luck. However, we can reasonably justify our rejection of the null hypothesis.

Note 2. The alternative hypothesis is p > 0.5. It is not that p = 12/14. Good practice is to state the null and alternative hypotheses (and select α) before looking at the data.
Note 3. Small P-values are evidence against the null hypothesis; they indicate that something besides chance is at work.

Note 4. If P < 5%, the result is often called statistically significant. If P < 1%, the result is called highly statistically significant. These phrases are often used in media reports on scientific progress – especially breakthroughs in medical research.

After dropping for years, teen smoking in the U.S. has leveled off
Monday, June 12, 2006; Posted: 10:14 a.m. EDT (14:14 GMT)
ATLANTA, Georgia (AP) -- The long, steady decline in teen smoking in the United States since the late 1990s appears to have come to a standstill, health officials said Friday.
A survey released this week showed that smoking among high school students held steady at around one in four teenagers between 2003 and 2005. Two other surveys in the past year or so found that teen smoking has apparently plateaued since 2002.
"We were making good progress, and now it looks like we're not," said Dr. Corinne Husten, acting director of the Office on Smoking and Health at the Centers for Disease Control and Prevention.
The trend was outlined in the CDC's National Youth Risk Behavior Survey, which is conducted every other year and involves about 14,000 high school students across the country. The results of the latest survey were released last week.
The survey had been showing a steady and pronounced decline in youth smoking since 1997, when more than 36 percent of students said they had smoked in the previous 30 days. The percentages dropped to about 35 in 1999, 28.5 in 2001 and 22 in 2003. But when students were asked the question last spring, 23 percent said they had smoked. The increase from the 2003 survey was not considered statistically significant, but it was disturbing news, health advocates said.

Study shows fliers are out of breath
Wednesday, April 27, 2005 Posted: 7:23 AM EDT (1123 GMT)
LONDON, England -- Airline passengers are putting up with "significant" drops in the supply of oxygen while flying at high altitude, according to researchers.
Just over half of all fliers analyzed had oxygen levels 6 percent lower than usual when the airplane was at maximum altitude -- a level at which doctors normally administer extra oxygen for hospital patients.
"We believe that these falling oxygen levels, together with factors such as dehydration, immobility and low humidity, could contribute to illness during and after flights," said Susan Humphreys of the Royal Group of Hospitals in Belfast, whose group conducted the research. "This has become a greater problem in recent years as modern airplanes are able to cruise at much higher altitudes."
A drop in oxygen levels can be a contributing factor to deep vein thrombosis (DVT), a potentially fatal blood clot which is also called "economy class syndrome." Low oxygen levels also can lead to headaches, fatigue and impaired mental performance.

"We should be giving people with ill health more advice about things they can do, such as drinking more water when they fly, to avoid problems," researcher Rachel Deyermond told the UK's Daily Telegraph newspaper. The researchers from Belfast, Northern Ireland published their results in the May issue of Anaesthesia, a British medical journal. They recorded the blood oxygen levels and the pulse rate of 84 passengers, aged 1 to 78, at both ground level and at peak altitude during a flight. The research shows a "statistically significant" reduction in oxygen levels in all passengers traveling on both long- and short-haul flights. On average, oxygen levels in passengers dropped by 4 percent by the time the plane had reached cruising altitude. A total of 54 percent of passengers had oxygen levels below this level. Of the 84 passengers who were analyzed, 55 were on flights lasting more than two hours, while the rest were on short-haul journeys. Similar results were obtained from both groups. None of them had severe cardio-respiratory problems or required permission from their doctor to fly.

Note 5. We are NOT saying that there is 1 chance in 150 for the null hypothesis to be correct. Instead, the P-value is used as a tool to determine whether or not to reject the null hypothesis.
Note 6. The significance level α should be chosen before inspecting the data. Seeing the evidence before deciding on the value of α is called data snooping, which may bias our decision.

Note 7. When computing the P-value, we found P(K ≥ 12) and not P(K = 12). The idea is that, assuming the null hypothesis is true, we want to compute the probability of getting an observed value either this extreme or even more extreme. Why does this make sense? Suppose a fair coin is flipped 1000 times and lands heads 501 times. We should retain the null hypothesis, and the chance of getting 501 or more heads is quite large (48.7%). However, the chance of getting exactly 501 heads is very small (2.5%); using the latter figure would have led us to incorrectly reject the null hypothesis.
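The two probabilities quoted in Note 7 can be checked with a short Python sketch. It uses exact integer counts of outcomes (every sequence of 1000 fair flips is equally likely), and the variable names are illustrative.

```python
from math import comb

n = 1000
total = 2**n  # number of equally likely outcomes for n fair coin flips

# Exact integer counts keep the arithmetic precise before the final division.
p_exactly_501 = comb(n, 501) / total
p_at_least_501 = sum(comb(n, k) for k in range(501, n + 1)) / total

print(f"P(K = 501)  = {p_exactly_501:.3f}")
print(f"P(K >= 501) = {p_at_least_501:.3f}")
```

The point probability is small for every single outcome when n is large, even the most likely one, which is why the tail probability is the useful measure of extremeness.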

Example: Ten children (ages 8 to 14) with a history of severe learning and behavioral disorders were recruited for a six-week study. For three weeks, each child was given a placebo; for the other three weeks, each child was given ethosuximide, widely prescribed for epilepsy. Five of the children received the placebo first; the other five received the placebo last. After each three-week period, each child was given an IQ test. The table (P/E) shows the two verbal IQ scores for each child. Was the medication effective for increasing IQ scores?

Placebo (P)    Ethosuximide (E)
 97            113
102            111
104            106
106            113
111            122
 90            110
106            101
115            121
 96            126
 95            119

Solution.
H0: The IQ scores after ethosuximide were the same as the scores after placebo (p = 0.5)
Ha: The IQ scores after ethosuximide were different from the scores after placebo (p ≠ 0.5) (two-sided test)
We choose α = 0.05
Before continuing, why isn’t Ha written as p > 0.5?

Test statistic: ks = 9 for a sample size of n = 10.
P-value. Assuming H0, we must find the chance of obtaining a test statistic at least this extreme. For this problem, that means (why?)
P-value = P(K ≤ 1) + P(K ≥ 9)
In Excel: =BINOMDIST(1,10,0.5,1) + 1 - BINOMDIST(8,10,0.5,1)
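A minimal Python sketch of this two-sided calculation follows. It adds the lower tail P(K ≤ 1) to the upper tail P(K ≥ 9), matching the two terms of the Excel expression; the helper name binom_cdf is an assumption for this sketch.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(K <= k) for K ~ Binomial(n, p); plays the role of BINOMDIST(k, n, p, 1)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

n, alpha = 10, 0.05

lower = binom_cdf(1, n, 0.5)      # P(K <= 1): as extreme in the other direction
upper = 1 - binom_cdf(8, n, 0.5)  # P(K >= 9): the observed direction
p_value = lower + upper

print(p_value, p_value < alpha)  # 0.021484375 True
```

Because the binomial distribution with p = 0.5 is symmetric, the two tails are equal here, so the two-sided P-value is twice the one-sided value.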

Conclusion: We reject the null hypothesis. There is good reason to believe that ethosuximide does affect verbal IQ scores.

Notes.
Note 8. The form of the alternative hypothesis, which is based on the context of the problem, determines how the P-value is computed.

Secondhand smoke is classified as a known carcinogen by the Environmental Protection Agency (EPA). This classification is based on many scientific studies which investigated the question of whether secondhand smoke was associated with a higher incidence of cancer. The EPA conducted its study using a 5% significance level and a one-tailed test. A one-tailed test was used because it was already independently determined that first-hand smoke caused cancer and the preliminary studies indicated that second-hand smoke was a probable cause of cancer. However, the tobacco industry argued that a one-tailed test was inappropriate and that a two-tailed test should be used. They claimed that by using a one-tailed test at the 5% significance level, the EPA was essentially using a two-tailed test at the 10% significance level, since each tail would then have area of 5%. The tobacco industry argued that this doubled the probability of a type I error. Nevertheless, since there was good reason to think that secondhand smoke was a carcinogen, the EPA followed the usual scientific convention of using a one-tailed test. Reference: “Secondhand Smoke: Is it a Hazard?,” Consumer Reports, January 1995

Testing a Population Median

The sign test may also be used as a test for the value of a population median. Recall the definition of a median: half the data should lie below the median, while the other half lies above.

Example: A bank will open a new branch in a community only if it can be established that the median family income in the community is greater than $50,000. To obtain information, a random sample of 75 families is chosen. Of these, 44 had incomes over $50,000, while the other 31 had incomes below $50,000. Is this evidence statistically significant enough to establish that the median family income is more than $50,000?

Solution.
H0: The median income is $50,000 (or less) (m ≤ 50,000)
Ha: The median income is more than $50,000 (m > 50,000)
Alternatively, let p be the probability that a randomly selected family has an income of less than $50,000. Then we may write (why?)
H0: p ≥ 0.5
Ha: p < 0.5

We choose α = 0.05.
Test statistic: ks = 31 for a sample size of n = 75.
P-value. Assuming H0, we must find the chance of obtaining a test statistic at least this extreme. For this problem, that means (why?)
P-value = P(K ≤ 31)
In Excel: =BINOMDIST(31, 75, 0.5, 1)
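The lower-tail probability for the median example can be sketched in Python as follows. It assumes, as in the solution above, that K counts the sampled families with incomes below $50,000 and that p = 0.5 at the boundary of H0; the variable names are illustrative.

```python
from math import comb

n, k_s, alpha = 75, 31, 0.05

# P(K <= 31) when K ~ Binomial(75, 0.5), computed from exact outcome counts.
p_value = sum(comb(n, j) for j in range(k_s + 1)) / 2**n

print(f"P-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The lower tail is used because small values of K (few families below $50,000) are what would support Ha: p < 0.5.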

Conclusion: We fail to reject the null hypothesis. There is not enough evidence to think that the median family income is more than $50,000. Notice why the phrase “fail to reject” is important. With a larger sample, it’s conceivable that the null hypothesis would then be rejected.

Conceptual Questions: 1) True or False: a) The observed significance level of 8% depends on the data (i.e. sample) b) There are 92 chances out of 100 for the alternative hypothesis to be correct.

Conceptual Questions: 2) True or False: a) A “highly statistically significant” result cannot possibly be due to chance. b) If a sample difference is “highly statistically significant,” there is less than a 1% chance for the null hypothesis to be correct.

Conceptual Questions: 3) True or False: a) If P = 43%, then the null hypothesis looks plausible. b) If P = 0.43%, then the null hypothesis looks implausible.

Binomial Exact Test

Example: A die is rolled 180 times; it lands six 45 times. Is this evidence statistically significant enough to conclude that the die is not fairly balanced?
Solution.
H0:
Ha:
We choose α = 0.05.
The test statistic is ks = 45 for a sample size of n = 180.

P-value. Assuming H0, we must find the chance of obtaining a test statistic at least this extreme. For this problem, that means that (why?)
P-value = P(K ≤ 15) + P(K ≥ 45)
Excel: =BINOMDIST(15,180,1/6,1) + 1 - BINOMDIST(44,180,1/6,1)
Conclusion:
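The Excel expression for the die example can be mirrored in Python. This is a sketch under the assumption, read off the Excel formula, that K follows a Binomial(180, 1/6) distribution under H0 and that both tails at distance 15 from the expected count of 30 are included; the helper name binom_pmf is illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p, alpha = 180, 1 / 6, 0.05

# Expected number of sixes under H0 is n * p = 30; the observed 45 is 15 above it.
lower = sum(binom_pmf(k, n, p) for k in range(0, 16))       # P(K <= 15)
upper = sum(binom_pmf(k, n, p) for k in range(45, n + 1))   # P(K >= 45)
p_value = lower + upper

print(f"P-value = {p_value:.5f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Unlike the sign test, the two tails are not equal here, because a binomial distribution with p = 1/6 is not symmetric about its mean.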

Example: There is a social theory that states that people tend to postpone their deaths until after some meaningful event… birthdays, anniversaries, the World Series. In 1978, social scientists investigated obituaries that appeared in a Salt Lake City newspaper. Among the 747 obituaries examined, 60 of the deaths occurred in the three-month period preceding their birth month. However, if the day of death is independent of birthday, we would expect that 25% of these deaths (about 187) would occur in this three-month period. Does this study provide statistically significant evidence to support this theory?

Example: The following table summarizes the findings of a 1971 observational study of 5466 women who gave birth, categorized by both smoking preference and low birthweight:

              Low birthweight    Normal    Total
Smokers       185                1891      2076
Nonsmokers    193                3197      3390
Total         378                5088      5466

Does this show that smoking is associated with low birthweight? (Notice we don’t say “causes” since this is not a randomized, controlled, double-blind experiment.)

Method of Attack: Suppose that smoking and low birthweight are not associated. Then we would expect the proportion of smoking mothers among the 378 low birthweight babies to be roughly the same as the proportion of smoking mothers of all 5466.
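The comparison described in the method of attack comes down to two proportions, which can be sketched in Python. The variable names are illustrative; the counts are taken from the table in the example.

```python
# Counts from the 1971 study table.
smokers_total, mothers_total = 2076, 5466
low_bw_smokers, low_bw_total = 185, 378

# Proportion of smokers among all mothers: the value of p expected if
# smoking and low birthweight are not associated.
p0 = smokers_total / mothers_total

# Proportion of smokers actually observed among low-birthweight births.
observed = low_bw_smokers / low_bw_total

# Number of smokers expected among the 378 low-birthweight births.
expected_count = p0 * low_bw_total

print(f"p0 = {p0:.4f}, observed = {observed:.4f}, expected count = {expected_count:.1f}")
```

The gap between the observed count of 185 and the expected count is what the binomial exact test then assesses.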

Solution.
H0:
Ha:
We choose α = 0.05.
The test statistic is ks = 185 for a sample size of n = 378 (roughly 49%).

P-value. Assuming H0, we must find the chance of obtaining a test statistic at least this extreme. For this problem, that means
Conclusion:
