


Chapter 6

Introduction to Inference

- A market research agency has been given the task to estimate the average number of hours per week that young adults spend surfing the web.
- The agency surveys a random sample of 100 young adults and obtains a mean of 20 hours and a standard deviation of 5 hours.
- Can the agency conclude that the true mean number of hours per week spent by all young adults surfing the web is exactly 20 hours?

- Because the market research agency recognizes that the 20 hours was obtained from just one of many possible samples of the population, they are unwilling to say the population mean is exactly equal to 20 hours.
- To allow for the variation in the sample estimate, they may cautiously conjecture that the true mean is somewhere between 18 and 22 hours, or between 15 and 25 hours, and so on.

- How wide should they make the interval?
- How confident should they be that the named interval does indeed contain the true mean?
- On what basis should the choice be made?
- They can use an established fact about how sample means vary when random samples are repeatedly drawn from any population: the central limit theorem.

Based on this relative frequency idea, if only one random sample of size n is drawn, we can express 95% confidence that the interval x̄ ± 1.96 σ_x̄ will contain μ, where σ_x̄ = σ/√n is the standard error of the sample mean.

This interval is called a 95% confidence interval for μ.

The confidence level C is the probability, before the sample data are collected, that the interval will contain the true mean.

The margin of error is z* times the standard error; z* is determined from the Normal curve based on C, the confidence level.

For a given sample size, the interval width is narrower for lower levels of confidence.

For a given level of confidence the interval width is narrower for larger samples.
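The two statements above about interval width can be checked numerically. The following is a minimal sketch, assuming a known population standard deviation; the function name `margin_of_error` and the value σ = 5 are illustrative choices, not part of the original slides.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma, n, confidence):
    """Margin of error z* * sigma / sqrt(n) for a z-interval."""
    # z* leaves (1 - C)/2 in each tail of the standard Normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sigma / sqrt(n)


sigma = 5.0  # hypothetical known population standard deviation

# Fixed sample size: lower confidence gives a narrower interval.
for confidence in (0.90, 0.95, 0.99):
    print(confidence, round(margin_of_error(sigma, 100, confidence), 3))

# Fixed confidence: a larger sample gives a narrower interval.
for n in (25, 100, 400):
    print(n, round(margin_of_error(sigma, n, 0.95), 3))
```

The interval width is twice the margin of error, so both comparisons carry over directly to the width.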

Suppose you have collected a sample of 20 observations, your sample mean is 5.5, and the assumed population standard deviation is 1.7.

The 95% CI for μ is 5.5 ± 0.7451, yielding 4.755 to 6.245.
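The arithmetic behind this interval can be reproduced in a few lines. This sketch uses z* = 1.96, as the slides do; the variable names are illustrative.

```python
from math import sqrt

n, xbar, sigma = 20, 5.5, 1.7   # values given in the example
z_star = 1.96                   # z* for 95% confidence

std_error = sigma / sqrt(n)     # standard error of the sample mean
margin = z_star * std_error     # margin of error
lower, upper = xbar - margin, xbar + margin

print(f"margin of error: {margin:.4f}")
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```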

Suppose you originally thought the mean was actually 5.0. Do your data support your belief?

In the previous case, with the 95% C.I. for μ being 4.755 to 6.245, suppose, instead, you originally thought the mean was 6.5. Do your data support this belief?
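One way to read these two questions is to ask whether the believed value falls inside the 95% confidence interval. The sketch below takes that reading as an assumption (the slides pose the questions without stating the answers); the helper name `supported` is hypothetical.

```python
lower, upper = 4.755, 6.245  # 95% CI for the mean, from the example


def supported(mu0, lower, upper):
    """True if the believed mean lies inside the confidence interval."""
    return lower <= mu0 <= upper


for mu0 in (5.0, 6.5):
    if supported(mu0, lower, upper):
        verdict = "is consistent with"
    else:
        verdict = "is not supported by"
    print(f"mean = {mu0} {verdict} the data at 95% confidence")
```

Under this reading, 5.0 lies inside the interval and 6.5 lies outside it.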

- Formal way to determine whether or not the data support a belief or hypothesis.

- In our judicial system we have the following hypotheses:
The accused is innocent; The accused is guilty

- We can make two errors:
Convicting the innocent; Letting the guilty go free

- It is desirable to minimize the chance of committing either error. But guarding against one usually results in increasing the chance of committing the other.
- Society favors guarding against convicting the innocent.

- Assume accused is innocent.
- Gather evidence to prove guilt
- Convict only if evidence is strong enough

To test the belief μ > 20 (alternative hypothesis, Ha):

- Assume μ is not > 20 (null hypothesis, Ho: μ ≤ 20).
- Gather a random sample from the population and compute the sample mean, x̄.
- Conclude μ > 20 (Ha) only if the evidence is strong enough, i.e. if x̄ is so many standard errors above 20 that the probability of this occurring by pure random chance is very small.

Hypothesis Testing Definitions

- Type I error: concluding Ha when Ho is true (convicting the innocent)
- Type II error: concluding Ho when in fact it is false (letting the guilty go free)
- α = Prob(Type I error), the significance level
- β = Prob(Type II error)
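The meaning of α as a long-run error rate can be illustrated by simulation. The sketch below is not from the slides: it repeatedly draws samples from a population where Ho is true and counts how often a right-tailed z-test at α = .05 rejects anyway. The population values (mean 20, σ = 5, n = 25), the seed, and the number of trials are all hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed so the run is repeatable

mu0, sigma, n, alpha = 20.0, 5.0, 25, 0.05  # hypothetical setup
std_normal = NormalDist()
trials = 20000

false_rejections = 0
for _ in range(trials):
    # Draw a sample from a population where Ho (mean = mu0) is true.
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    p_value = 1 - std_normal.cdf(z)  # right-tailed p-value
    if p_value < alpha:
        false_rejections += 1        # a Type I error

type1_rate = false_rejections / trials
print(f"observed Type I error rate: {type1_rate:.3f}")
```

The observed rate should land close to the chosen α, which is what "falsely reject only 5% of the time" means in practice.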

- Establish the Null Hypothesis and the Alternative Hypothesis:
H0: μ = $20.00   Ha: μ ≠ $20.00 (two-tailed)

OR

H0: μ = $20.00   Ha: μ > $20.00 (right-tailed)

OR

H0: μ = $20.00   Ha: μ < $20.00 (left-tailed)

Ho must always have an equal sign and Ha must be what you want to prove.

- Select α, the probability of a Type I error.
For example, α = .05.

You set your standard for how extreme the sample results must be (in support of the alternative hypothesis) in order for you to reject the null.

Here, the sample results must be strong enough in favor of Ha that you would falsely reject the null only 5% of the time.

- Compute the test statistic (z if σ known) from the sample. This tells you how many standard errors above or below the null value the sample mean is.

- Compute the probability of obtaining such an extreme test statistic z by pure chance, if the null hypothesis were true. This is called the p-value of the test.

- Reject or fail to reject the Null Hypothesis by determining whether the p-value is less than or greater than α.
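The steps above can be sketched as one small function. This is an illustrative outline, assuming σ is known so that a z-test applies; the function name `z_test`, its `tail` parameter, and the numbers in the usage example are hypothetical and not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar, mu0, sigma, n, alpha=0.05, tail="two"):
    """Return (z, p_value, reject) for a one-sample z-test.

    tail is "two", "right", or "left", matching the form of Ha.
    """
    std_normal = NormalDist()
    # Test statistic: standard errors between xbar and the null value.
    z = (xbar - mu0) / (sigma / sqrt(n))
    # p-value: chance of a z this extreme if the null were true.
    if tail == "right":
        p_value = 1 - std_normal.cdf(z)
    elif tail == "left":
        p_value = std_normal.cdf(z)
    else:
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    # Decision: reject the null when the p-value is below alpha.
    return z, p_value, p_value < alpha


# Hypothetical usage: test Ha: mean > 20 with made-up sample results.
z, p_value, reject = z_test(xbar=21.0, mu0=20.0, sigma=5.0, n=100,
                            alpha=0.05, tail="right")
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject null: {reject}")
```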

- All statistical packages give p-values in the standard output.
- When we reject Ho we say the test is significant.
- If p-value < .01, highly significant (overwhelming evidence in support of research hypothesis)
- If p-value between .01 and .05, significant (strong evidence)
- If p-value between .05 and .10, slightly significant (weak evidence)
- If p-value > .10, not significant (no evidence)
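The wording scale above maps directly onto a lookup. A minimal sketch follows; the function name `describe_p_value` is hypothetical, and how a p-value exactly equal to .01, .05, or .10 is classified is an assumption, since the slides do not say.

```python
def describe_p_value(p_value):
    """Map a p-value to the descriptive labels used above."""
    # Boundary handling (e.g. exactly .05) is an assumption.
    if p_value < 0.01:
        return "highly significant"
    if p_value < 0.05:
        return "significant"
    if p_value < 0.10:
        return "slightly significant"
    return "not significant"


for p in (0.0013, 0.0164, 0.07, 0.5):
    print(p, "->", describe_p_value(p))
```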

If the alternative is μ ≠ 20, the test is two-tailed.

Since α is shared between both tails of the z-curve, the p-value = twice the area cut off at the tail by the computed z. This p-value is then compared with α.

[Figure: z-curve for a two-tailed test, with α/2 = .025 in each tail and ½ p-value as the tail area beyond the computed z.]

If the alternative hypothesis is μ > 20, this is a right-tailed test with all the α = .05 at the right tail. The p-value (to compare with α) is the area cut off at the right tail by the calculated z.

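[Figure: z-curve for a right-tailed test, with α = .05 in the right tail and the p-value as the area to the right of the computed z.]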

If the alternative hypothesis is μ < 20 and α = .10, this is a left-tailed test with all the .10 at the left tail. The p-value (to compare with α) is the area cut off at the left tail by the calculated z.

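[Figure: z-curve for a left-tailed test, with α = .1 in the left tail and the p-value as the area to the left of the computed z (usually negative).]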
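The three cases above can be summarized in one expression. In this sketch, z is the computed test statistic, Z is a standard Normal variable, and μ₀ is notation introduced here for the null value (20 in the examples above); the `cases` environment assumes the amsmath package.

```latex
\[
p\text{-value} =
\begin{cases}
2\,P(Z \ge |z|) & \text{two-tailed } (H_a\colon \mu \ne \mu_0) \\
P(Z \ge z)      & \text{right-tailed } (H_a\colon \mu > \mu_0) \\
P(Z \le z)      & \text{left-tailed } (H_a\colon \mu < \mu_0)
\end{cases}
\]
```

In every case the decision rule is the same: reject Ho when the p-value is less than α.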

What are we given? n = 20; σ = 30; x̄ = 135.2; α = .01

- Step 1, establish hypotheses
H0: μ = 115 vs. Ha: μ > 115

- Step 2, set significance level. α = .01 (given)
- Step 3, compute the test statistic
z = (135.2 − 115)/6.71 = 3.01, where 6.71 = 30/√20 is the standard error

- Step 4, determine the p-value. Z-table gives P(Z < 3.01) = 0.9987. So, P(Z > 3.01) = 1 − 0.9987 = .0013.
- Step 5, decision; reject Ho since p-value (.0013) < α = .01
- Step 6, conclusion within context: conclude older students appear to have better study attitude.
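The six steps of this example can be replayed in code as a check on the arithmetic. This is a sketch using the given values; it uses the Normal CDF from Python's standard library in place of the Z-table, so results are not rounded to table precision.

```python
from math import sqrt
from statistics import NormalDist

# Given values from the example.
n, sigma, xbar, alpha = 20, 30, 135.2, 0.01
mu0 = 115                            # null value (Step 1, right-tailed)

std_error = sigma / sqrt(n)          # about 6.71
z = (xbar - mu0) / std_error         # Step 3: test statistic
p_value = 1 - NormalDist().cdf(z)    # Step 4: area to the right of z
reject = p_value < alpha             # Step 5: decision

print(f"standard error = {std_error:.2f}")
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject null: {reject}")
```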

What are we given? n = 40; σ = 10; x̄ = 138.8; α = 0.01

- Step 1, establish hypotheses
H0: μ = 135 vs. Ha: μ ≠ 135

- Step 2, set significance level. α = .01 (given)
- Step 3, compute the test statistic
z = (138.8 − 135)/1.58 = 2.40, where 1.58 = 10/√40 is the standard error

- Step 4, determine the p-value. Z-table gives P(Z < 2.40) = 0.9918. So, P(Z > 2.40) = 1 − 0.9918 = .0082. Since this is a 2-tailed test, p-value = 2 × .0082 = .0164
- Step 5, decision; do not reject Ho since p-value (.0164) > α = .01
- Step 6, conclusion within context: insufficient evidence that national mean yield is not 135. But if we used α = .05 we would reject Ho, since the p-value of .0164 < .05. The conclusion would be that there IS evidence mean yield is not 135.
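The same check for the two-tailed example, including the effect of switching α from .01 to .05, can be sketched as follows. Because the code does not round z to 2.40 before looking up the area, its p-value comes out slightly below the table-based .0164; the decisions are unchanged.

```python
from math import sqrt
from statistics import NormalDist

# Given values from the example.
n, sigma, xbar = 40, 10, 138.8
mu0 = 135                                        # null value (two-tailed)

std_error = sigma / sqrt(n)                      # about 1.58
z = (xbar - mu0) / std_error                     # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # both tails

# Same data, two different significance levels.
reject_at_01 = p_value < 0.01
reject_at_05 = p_value < 0.05

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print(f"reject at alpha = .01: {reject_at_01}")
print(f"reject at alpha = .05: {reject_at_05}")
```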