ESTIMATION AND TEST OF HYPOTHESES: One-sample, two-sample
Introduction to Hypothesis Testing:
Suppose you have to buy cornflakes from a salesman. The issue is not the price of the cornflakes but the amount of cornflakes in each box. The salesman appears and claims that the cornflakes he is selling are packaged at 10 oz/box. You have exactly four possible views of his claim.
If you think he is honest, you would just go ahead and order your cornflakes from him. You may, however, hold one of the other views: he is i) CONSERVATIVE (μ > 10 oz), ii) a CHEAT (μ < 10 oz), or iii) CLUELESS (μ ≠ 10 oz).
The position you hold regarding the salesman can be any one of these, but not more than one. You can’t assume he is both a liar and conservative, i.e. μ < 10 oz and μ > 10 oz, at the same time.
Proper use of scientific method will allow you to test one of these alternative positions through a sampling process. Remember you can choose only one to test.
How would you decide?
CASE I: Testing that the salesman is conservative
Suppose the salesman is remarkably shy and seems to lack self-confidence. You feel from his general conduct that he is being conservative in his claim of 10 oz/box. The situation can be summarized with a pair of hypotheses, actually a pair of predictions.
A) The salesman’s claim is the prediction we will directly test. It is usually called Ho or the null hypothesis. In this case
Ho: μ = 10 oz.
B) The second is called the alternative or research hypothesis, which is your belief or position. The alternative hypothesis in this case is Ha: μ > 10 oz. Writing the null hypothesis as Ho: μ ≤ 10 oz, the predictions take the following forms:
Ho: μ≤ 10 oz (null hypothesis)
Ha: μ > 10 oz (alternative hypothesis)
And we have generated two mutually exclusive and all-inclusive possibilities. Therefore, either Ho or Ha will be true, but not both.
In order to test the salesman’s claim (Ho) against your views (Ha), you decide to do a small experiment. You select 25 boxes of cornflakes from a consignment and carefully empty each box, weigh and record its contents. This experimental sampling is done after you have formulated the two hypotheses. If the first hypothesis were true you would expect the sample mean of the 25 boxes to be close to or less than 10 oz.
If the second hypothesis were true you would expect the sample mean to be significantly greater than 10 oz. We have to think about what significantly greater means in this context. In statistics significantly less or more or different means that the result of the experiment would be a rare result if the null hypothesis were true. In other words, the result is far enough from the prediction in the null hypothesis that we feel that we must reject the truthfulness of the hypothesis.
This idea leads to the problem of what counts as a rare result, or a result rare enough to make us sufficiently suspicious of the null hypothesis. For now we will say a result is rare if it could occur by chance less than 1 in 20 times if the null hypothesis were true. In that case we will reject the null hypothesis and consequently accept the alternative one. Let’s now look at how this decision-making criterion works in CASE I.
Ho : μ≤ 10 oz
Ha : μ > 10 oz
n = 25, and assume σ = 1.0 oz and is widely known.
Suppose the mean of your 25-box sample is 10.36 oz. Is that significantly different from (>) 10 oz, so that we should reject the claim of 10 oz stated in Ho? Clearly it is greater than 10 oz, but is this mean rare enough under the claim of μ ≤ 10 oz for us to reject the claim?
To answer this question we will use the standard normal transformation to find the probability of X̄ ≥ 10.36 oz when the mean of the sampling distribution of X̄ is 10 oz. If this probability is less than 0.05 (1 in 20), we consider the result to be too rare for acceptance of Ho.
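The calculation described above can be sketched in Python. This is only an illustration, not part of the original slides. It assumes σ = 1.0 oz, the value consistent with the test statistic of 1.8 quoted later in the step-by-step summary, and all variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Values from the cornflakes example. sigma = 1.0 oz is an ASSUMPTION:
# it is the value consistent with the Z of 1.8 quoted later in the text.
mu0 = 10.0      # mean claimed under Ho, in oz
sigma = 1.0     # assumed known population standard deviation, in oz
n = 25          # number of boxes sampled
xbar = 10.36    # observed sample mean, in oz

# Standard normal transformation of the sample mean
z = (xbar - mu0) / (sigma / sqrt(n))

# Right-tail probability P(Z >= z) when Ho is true
p_right = 1.0 - NormalDist().cdf(z)

alpha = 0.05                    # "1 in 20" criterion from the text
reject_h0 = p_right < alpha
print(f"Z = {z:.2f}, P(Z >= z) = {p_right:.4f}, reject Ho: {reject_h0}")
```

Under these assumptions Z is about 1.8 and the right-tail probability is about 0.036, which is below 0.05, so Ho would be rejected in favour of Ha: μ > 10 oz.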
CASE II: Testing that the salesman is a cheat
Suppose our salesman is a fast and smooth talker with fancy clothes and a new sports car. Your view might be that cornflakes salesmen only gain this type of affluence through unethical practices. You think this guy is a cheat. Your null hypothesis is Ho: μ ≥ 10 oz and your alternative hypothesis is Ha: μ < 10 oz. Notice that the two hypotheses are again mutually exclusive and all inclusive, and that the equal sign is always in the null hypothesis.
It is the null hypothesis (the salesman’s claim) that will be tested.
Ho : μ≥ 10 oz
Ha : μ < 10 oz.
Suppose you again sample 25 boxes to determine the average weight. The question you want to answer and the predictions (Ho, Ha) stemming from that question are again formulated before the sampling is done.
n = 25, σ = 1.0 oz and again we find X̄ = 10.36 oz. How does this result fit our predictions? If Ho is false, we expect the mean to be significantly less than 10 oz.
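A left-tailed version of the same calculation can be sketched as follows. Again this is only an illustration under the assumption σ = 1.0 oz; the names `z_crit` and `p_left` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Cornflakes example; sigma = 1.0 oz is an ASSUMPTION (see the text).
mu0, sigma, n, xbar = 10.0, 1.0, 25, 10.36

z = (xbar - mu0) / (sigma / sqrt(n))

# Left-tailed test (Ha: mu < 10 oz): reject Ho only if Z falls
# below the point that cuts off the lowest 5% of the distribution.
alpha = 0.05
z_crit = NormalDist().inv_cdf(alpha)   # lower 5% point, about -1.645
p_left = NormalDist().cdf(z)           # P(Z <= z) when mu = 10 oz

reject_h0 = z < z_crit
print(f"Z = {z:.2f}, critical value = {z_crit:.3f}, reject Ho: {reject_h0}")
```

Here Z is positive, so it cannot fall in the left-hand rejection region, and Ho: μ ≥ 10 oz is not rejected. A sample mean above 10 oz gives no support to the view that the salesman is a cheat.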
CASE III: Testing that the salesman is clueless
The last case is somewhat different from the first two in that we really don’t know whether to expect the mean of the sample to be higher or lower than the salesman’s claim. The salesman is new on the job and does not know his product very well. The claim of 10 oz per box is what he has been told, but you don’t have a sense that he is either overly conservative (CASE I) or dishonest (CASE II). Your alternative hypothesis here is less focused.
It becomes that the mean is different from 10 oz. The predictions become
Ho: μ = 10 oz
Ha: μ ≠ 10 oz. Under Ho we expect X̄ to be close to 10 oz, while under Ha we expect X̄ to be different from 10 oz in either direction, i.e. significantly smaller or significantly larger than 10 oz.
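The general steps in a test of hypothesis, illustrated with CASE III, are as follows.
1. State the problem: is the mean weight per box different from the claimed 10 oz?
2. Formulate the null and alternative hypotheses:
Ho: μ = 10 oz
Ha: μ ≠ 10 oz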
3. Choose the level of significance. This means to choose the probability of rejecting a true null hypothesis. We chose 1 in 20 in our cornflakes example, that is, 5% or 0.05. When Z was so extreme as to occur less than 1 in 20 times if Ho were true, we rejected Ho.
4. Determine the appropriate test statistic. Here we mean the index whose sampling distribution is known, so that objective criteria can be used to decide between Ho and Ha. In the cornflakes example we used a Z transformation because under the Central Limit Theorem X̄ was assumed to be normally or approximately normally distributed and the value of σ was known. Z is calculated as
Z = (X̄ − μ) / (σ/√n)
5. Calculate the appropriate test statistic. Only after the first four steps are completed can one do the sampling and generate the so-called test statistic.
6. Determine the critical values for the sampling distribution and appropriate level of significance. For the two-tailed test and a level of significance of 1 in 20 we have critical values of ± 1.960 (Tab C.3). These values or more extreme ones occur only 1 in 20 times if Ho is true. The critical values serve as cutoff points in the sampling distribution for the regions in which to reject Ho.
7. Compare the test statistic to the critical values. In a two-tailed test, the CV’s = ± 1.960 and the test statistic is 1.8, so −1.960 < 1.8 < 1.960.
8. Based on the comparison in step 7, accept or reject Ho. Since Z falls between the critical values, it is not extreme enough to reject Ho.
9. State your conclusion and answer the question posed in step 1. SO WE ACCEPT HO.
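Steps 3 to 8 can be sketched as a small Python function for the two-tailed case. This is a minimal sketch, not a definitive implementation: the function name `two_tailed_z_test` is hypothetical, and σ = 1.0 oz is assumed because it reproduces the test statistic of 1.8 given in step 7.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Test Ho: mu = mu0 against Ha: mu != mu0 with sigma known.

    Returns the test statistic, the positive critical value,
    and whether Ho is rejected.
    """
    z = (xbar - mu0) / (sigma / sqrt(n))            # step 5: test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # step 6: critical value
    reject = abs(z) > z_crit                        # steps 7-8: compare, decide
    return z, z_crit, reject

# Cornflakes example; sigma = 1.0 oz is an ASSUMPTION.
z, z_crit, reject = two_tailed_z_test(xbar=10.36, mu0=10.0, sigma=1.0, n=25)
print(f"Z = {z:.2f}, critical values = ±{z_crit:.3f}, reject Ho: {reject}")
```

With α = 0.05 the critical value returned is about 1.960, matching the tabulated value, and Z of about 1.8 lies between the critical values, so Ho is not rejected.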
Because the predictions in Ho and Ha are written so that they are mutually exclusive and all inclusive, we have a situation where one is true and the other is automatically false.
When Ho is true, then Ha is false. If we nevertheless reject Ho, we make a mistake.
This type of mistake is called a Type I error.
When Ho is false, then Ha is true. If we nevertheless accept Ho, we make a mistake.
The second type of mistake is called a Type II error.
Example 1. A forest ecologist, studying regeneration of rain forest communities in gaps caused by large trees falling during storms, read that stinging (bow) tree, Dendrocnide excelsa, seedlings will grow 1.5 m/yr in direct sunlight in each gap. In the gaps in her study plot she identified 9 specimens of this species and measured them in 2009 and again 1 yr later. Listed below are the changes in height (m) for the nine specimens.
Do her data support the published contention that seedlings of this species will average 1.5 m of growth per yr in direct sunlight?
1.9 2.5 1.6 2.0 1.5 2.7 1.9 1.0 2.0
Hypothesis : Ho: μ = 1.5 m/yr
Ha: μ ≠ 1.5 m/yr
If the sample mean for the 9 specimens is close to 1.5 m/yr we will accept Ho. If the sample mean is significantly larger or smaller than 1.5 m/yr we will accept Ha (reject Ho). Significantly different means that the result is so rare that it would occur by chance less than 5% of the time if Ho is true, i.e. α = 0.05. The test statistic will be
t = (X̄ − μ) / (s/√n)
Here, n = 9, X̄ = 1.90 m, s² = 0.260 m², s = 0.51 m and
t = (1.90 − 1.5) / (0.51/√9) = 2.35
Clearly the t-value of 2.35 is not zero, but is it far enough away from zero that we can comfortably reject Ho? With a predetermined α level of 0.05 we must get a t-value so far from zero that it would occur < 5% of the time if Ho is true.
From Tab C.4 we have the following sampling distribution for t with v = n − 1 = 8 and α = 0.05 for a two-tailed test.
[Figure: sampling distribution of t with v = 8, with rejection regions of 2.5% below −2.306 and 2.5% above 2.306]
If Ho is true and we sample hundreds or thousands of times with samples of 9 specimens, and each time we calculate the t-value for the sample, these t-values would form a distribution with the shape indicated above. 2.5% of the samples would generate t-values below −2.306 and 2.5% of the samples would generate t-values above 2.306. So values as extreme as ± 2.306 are rare if Ho is true.
The test statistic in this sample is 2.35 and since 2.35>2.306, the result would be considered rare for a true null hypothesis. We reject Ho based on this comparison and conclude that average growth of stinging trees in direct sun light is different from the published value and is, in fact, greater than 1.5 m/yr.
Rejecting Ho may lead to a Type I error.
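The arithmetic of Example 1 can be checked with a short Python sketch using only the standard library. The critical value 2.306 is taken from the text rather than computed, and the variable names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Changes in height (m) for the nine specimens, as listed in Example 1
growth = [1.9, 2.5, 1.6, 2.0, 1.5, 2.7, 1.9, 1.0, 2.0]

mu0 = 1.5        # published growth rate under Ho, in m/yr
t_crit = 2.306   # two-tailed critical value for v = 8, alpha = 0.05 (from the text)

n = len(growth)
xbar = mean(growth)     # sample mean
s = stdev(growth)       # sample standard deviation (n - 1 in the denominator)

# One-sample t statistic
t = (xbar - mu0) / (s / sqrt(n))

reject_h0 = abs(t) > t_crit
print(f"mean = {xbar:.2f}, s^2 = {s**2:.3f}, t = {t:.2f}, reject Ho: {reject_h0}")
```

The sketch gives a mean of 1.90 m, a variance of 0.260 m² and t of about 2.35, in agreement with the values used above.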
Watching an infomercial on TV you hear the claim that, without changing your eating habits, a particular herbal extract when taken daily will allow you to lose 5 lb in 5 days. You decide to test this claim by enlisting 12 of your classmates in an experiment. You weigh each subject, ask them to use the herbal extract for 5 days and then weigh them again. From the results recorded below, test the infomercial’s claim of 5 lb lost in 5 days.
Solution: Because the data are paired we are not directly interested in the values presented above, but are interested in the differences or changes within the pairs. Think of the data as in two groups, before (X1i) and after (X2i).
For the paired data here we wish to investigate the differences or di’s where
X11 − X21 = d1, X12 − X22 = d2, ..., X1n − X2n = dn
Expressing the data set in terms of these differences di’s, we have the following table. Note the importance of the sign of these differences.
The infomercial claim of a 5 lb loss in 5 days could be written
Ho: μB − μA ≥ 5 lb, but Ho: μd ≥ 5 lb is somewhat more appealing.
Ho: μd ≥ 5 lb
Ha: μd < 5 lb
Choose α = 0.05. Since the two columns of data collapse into one column of interest, we treat these data now as a one-sample experiment.
There is no preliminary F test and our only assumption is that the di’s are approximately normally distributed. The test statistic for the paired sample t test is
t = (X̄d − μd) / (sd/√n)
with v = n − 1, where n is the number of pairs of data points.
Here X̄d = 3.8 lb, sd = 4.1 lb, n = 12. We expect this statistic to be close to 0 if Ho is true, i.e. the herbal extract allows you to lose 5 lb in 5 days. We expect this statistic to be significantly less than 0 if the claim is false.
t = (3.8 − 5) / (4.1/√12) = −1.01
with v = n − 1 = 12 − 1 = 11. The critical value for this left-tailed test from Tab C.4 is t0.05(11) = −1.796. Since −1.796 < −1.01, the test statistic does not deviate enough from expectation under a true Ho for you to reject Ho. The data gathered from your classmates support the claim of an average loss of 5 lb in 5 days with the herbal extract. Because you accept Ho here, you may be making a Type II error (accepting a false Ho), but we have no way of quantifying the probability of this type of error.
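Because the table of individual weights is not reproduced here, the paired test can only be sketched from the summary statistics quoted in the text. The following minimal Python sketch does that; the variable names are hypothetical and the critical value is the tabulated one from the text.

```python
from math import sqrt

# Summary statistics of the differences d_i (before minus after), from the text
d_bar = 3.8      # mean weight loss, in lb
s_d = 4.1        # standard deviation of the differences, in lb
n = 12           # number of pairs (classmates)
mu_d0 = 5.0      # weight loss claimed by the infomercial, in lb

# Paired-sample t statistic: a one-sample t test on the differences
t = (d_bar - mu_d0) / (s_d / sqrt(n))
df = n - 1

t_crit = -1.796  # left-tailed critical value t0.05(11), from the text
reject_h0 = t < t_crit
print(f"t = {t:.2f} with {df} df, reject Ho: {reject_h0}")
```

The statistic comes out at about −1.01, which lies above −1.796, so the claim is not rejected.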
An experiment was conducted to compare the performance of two varieties of wheat, A and B. Seven farms were randomly chosen for the experiment and the yields in metric tons per hectare for each variety on each farm were as follows:
Solution: The experiment was designed to test both varieties on each farm because different farms may have significantly different yields due to differences in
i) soil characteristics
ii) microclimate
iii) cultivation practices
“Pairing” the data points accounts for most of the “between farm” variability and should make any remaining difference in yield due solely to wheat variety.
The hypotheses are
Ho: μA − μB = 0 or μd = 0
Ha : μd ≠ 0
Let α = 0.05. Then n = 7, X̄d = 0.30 ton/hectare and sd = 0.41 ton/hectare.
t = (X̄d − 0) / (sd/√n) = 0.30 / (0.41/√7) = 1.94
with v = 7 − 1 = 6. The critical values from Tab C.4 are t0.025(6) = −2.447 and t0.975(6) = 2.447. Since −2.447 < 1.94 < 2.447, the test statistic does not deviate enough from 0, the expected t value if Ho is true, to reject Ho. From the data given we cannot say that the yields of varieties A and B are significantly different.
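The two-tailed paired comparison for the wheat data can be sketched the same way. The farm-by-farm yields are not reproduced here, so the sketch works from summary statistics. Note that the mean difference of 0.30 ton/hectare is an assumption: it is the value back-calculated from the reported t of 1.94, sd of 0.41 and n of 7.

```python
from math import sqrt

# Summary statistics for the yield differences (variety A minus variety B).
# d_bar = 0.30 is an ASSUMPTION, back-calculated from the reported t = 1.94.
d_bar = 0.30     # mean difference, in ton/hectare
s_d = 0.41       # standard deviation of the differences, in ton/hectare
n = 7            # number of farms (pairs)

# Under Ho the mean difference is 0
t = (d_bar - 0.0) / (s_d / sqrt(n))
df = n - 1

t_crit = 2.447   # two-tailed critical value for 6 df, alpha = 0.05 (from the text)
reject_h0 = abs(t) > t_crit
print(f"t = {t:.2f} with {df} df, reject Ho: {reject_h0}")
```

With these inputs t is about 1.94, inside the interval from −2.447 to 2.447, so Ho is not rejected.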
Example: A geneticist interested in human populations has been studying growth patterns in US males since 1900. A monograph written in 1902 states that the mean height of adult US males is 67.0 inch with a standard deviation of 3.5 inch. Wishing to see if these values have changed over the 20th century, the geneticist measured a random sample of adult US males and found that X̄ = 69.4 inch and s = 4.0 inch. Are these values significantly different from the values published in 1902?
Solution: There are two questions here – one about the mean and the second about the standard deviation or variance. Two questions require two sets of hypotheses and two test statistics. For the question about means, the hypotheses are
Ho : μ = 67.0 inch
Ha : μ ≠ 67.0 inch
With n = 28 and α = 0.01. This is a two-tailed test with the question and hypotheses (Ho and Ha) formulated before the data were collected or analyzed. The test statistic is
t = (X̄ − μ) / (s/√n) = (69.4 − 67.0) / (4.0/√28) = 2.4/0.76 = 3.16
Using an α level of 0.01 for v = n − 1 = 27, we find the critical values to be ± 2.771 (Tab C.4).
Since 3.16 > 2.771, we reject Ho and say that the modern mean is significantly different from that reported in 1902 and, in fact, is higher than the reported value (because the t-value falls in the right-hand tail). P(Type I error) < 0.01.
For the question about variance, the hypotheses are
Ho: σ² = 12.25 inch²
Ha: σ² ≠ 12.25 inch²
Here n = 28. Then
χ² = (n − 1)s² / σ² = (27)(16.0) / 12.25 = 35.3
The question about variability is answered with a Chi-square statistic. The χ² value is expected to be close to 27 (n − 1) if Ho is true and significantly different from 27 if Ha is true.
From Table C.5, using an alpha level of 0.01 for v = 27, we find the critical values for χ² to be 11.8 and 49.6. Since 11.8 < 35.3 < 49.6 we do not reject Ho here. There is no statistical support for Ha. The P value here for a two-tailed test is between 0.500 (31.5) and 0.200 (36.7), indicating the calculated value is not a rare event under the null hypothesis.
We would conclude that the mean height of adult US males is higher now than reported in 1902, but the variability in heights is not significantly different today than in 1902.
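Both parts of this example can be checked from the summary statistics with a minimal Python sketch. The critical values are the tabulated ones quoted in the text and the variable names are hypothetical. The t statistic computed without intermediate rounding is slightly larger than the 3.16 quoted above, which comes from rounding the standard error to 0.76.

```python
from math import sqrt

n = 28
xbar, s = 69.4, 4.0          # modern sample mean and standard deviation, in inch
mu0, sigma0 = 67.0, 3.5      # values published in 1902, in inch

# Question 1: has the mean changed? (two-tailed t test, alpha = 0.01)
t = (xbar - mu0) / (s / sqrt(n))
t_crit = 2.771               # critical value for 27 df, from the text
reject_mean = abs(t) > t_crit

# Question 2: has the variance changed? (two-tailed chi-square test, alpha = 0.01)
chi2 = (n - 1) * s**2 / sigma0**2
chi2_lo, chi2_hi = 11.8, 49.6    # critical values for 27 df, from the text
reject_var = chi2 < chi2_lo or chi2 > chi2_hi

print(f"t = {t:.2f}, reject Ho for the mean: {reject_mean}")
print(f"chi-square = {chi2:.1f}, reject Ho for the variance: {reject_var}")
```

The decisions agree with the conclusions above: Ho is rejected for the mean and not rejected for the variance.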
Assumptions for the test for goodness of fit are that
The hypothesis test takes only one form
Ho : The observed frequency distribution is the same as the hypothesized frequency distribution
Ha : The observed and hypothesized frequency distributions are different
Generally speaking, this is an example of a statistical test where one wishes to confirm the null hypothesis.
Let Oi denote the observed frequency and Ei the expected frequency of the i-th category. The test statistic is based on the difference between the observed and expected frequencies, Oi − Ei.
The intuition for the test is that if the observed and expected frequencies are nearly equal for each category, then each Oi − Ei will be small and, hence,
χ² = Σ (Oi − Ei)² / Ei
will be small. Small values of χ² should lead to acceptance of Ho while large values lead to rejection. The test is always right tailed. Ho is rejected only when the test statistic exceeds a specified value.
The statistic has an approximate Chi-square distribution when Ho is true; the approximation improves as sample size increases. The values of the Chi-square distribution are tabulated in Table C.5.
The progeny of self-fertilized four-o’clocks were expected to flower red, pink and white in the ratio of 1:2:1. There were 240 progeny produced with 55 red plants, 132 pink plants, and 53 white plants. Are these data reasonably consistent with the Mendelian 1:2:1 ratio?
Solution: The hypotheses are
Ho: The data are consistent with a Mendelian model (1:2:1)
Ha: The data are inconsistent with a Mendelian model (1:2:1)
The three colours are the three categories. In order to calculate expected frequencies, no parameters need to be estimated. The Mendelian ratios are given: 25% red, 50% pink and 25% white. Using the fact that there are 240 observations, the number of expected red four-o’clocks is 0.25 × 240 = 60, i.e. Ei = 60. Similar calculations for pink and white yield the following table:
Category    Oi     Ei     (Oi − Ei)²/Ei
Red         55     60     0.42
Pink        132    120    1.20
White       53     60     0.82
Total       240    240    2.44
v = df = no. of categories − 1 = 3 − 1 = 2. Let α = 0.05.
Because the test is right tailed, the critical value occurs when P(χ² ≤ CV) = 1 − α. Thus in Table C.5 for df = 2 and P = 1 − α = 0.95, the critical value is found to be 5.99. Since 2.44 < 5.99, Ho is accepted. This supports the Mendelian 1:2:1 ratio.
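The goodness-of-fit calculation for the four-o’clocks can be sketched in Python as follows. The dictionary names are hypothetical and the critical value 5.99 is the tabulated one from the text. Computed without rounding each term, the statistic is about 2.43 rather than 2.44; the decision is the same.

```python
# Observed counts and the hypothesized Mendelian ratio from the example
observed = {"red": 55, "pink": 132, "white": 53}
ratio = {"red": 1, "pink": 2, "white": 1}

n = sum(observed.values())            # 240 progeny in total
ratio_total = sum(ratio.values())

# Expected frequency for each category under Ho
expected = {c: n * ratio[c] / ratio_total for c in observed}

# Chi-square statistic: sum of (O - E)^2 / E over the categories
chi2 = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
df = len(observed) - 1

chi2_crit = 5.99                      # right-tailed critical value, df = 2, alpha = 0.05
reject_h0 = chi2 > chi2_crit
print(f"expected = {expected}")
print(f"chi-square = {chi2:.2f} with {df} df, reject Ho: {reject_h0}")
```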