Today in Class


- Last time we discussed statistical reasoning and Type I and Type II errors
- Today we’ll discuss Type I and Type II errors in more depth
- We’ll also discuss the necessity of sampling distributions and how to find the sampling distribution for a sample proportion

Hypothesis Testing Example

- I know I have 5 eggs, but I don’t know if they’re good or bad.
- I’ll make a guess that 3 are good.
- Then I can get all possible samples of 3 from that scenario.
- I note that for this hypothetical pop, it is impossible to get 3 bad eggs out of 3.

- It is also unlikely (but still possible) to get 3 good eggs out of 3.
- I’ll take a real sample. If I get either of these cases (3 bad or 3 good), I won’t believe the hypothesized pop.

- Recall that a Type I error is rejecting a true null hypothesis.
- If the null hypothesis (3/5 good eggs) is true, my decision rule will reject this hypothesis for 1 of the 10 possible samples (the one with 3 good eggs). Therefore, the probability of a Type I error is 0.10.
- Type II errors depend on what the true population is.
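
The Type I error probability for this decision rule can be checked by listing every possible sample. The following is a minimal sketch, assuming eggs are drawn without replacement and each sample of 3 is equally likely; the names `population`, `samples`, and `reject` are illustrative and do not come from the slides.

```python
from itertools import combinations

# Hypothesized population: 3 good eggs (G) and 2 bad eggs (B).
population = ["G", "G", "G", "B", "B"]

# Every possible sample of 3 eggs. Samples are built from egg positions
# so that two good eggs still count as two distinct eggs.
samples = [tuple(population[i] for i in idx)
           for idx in combinations(range(len(population)), 3)]

def reject(sample):
    """Decision rule: reject if the sample is all good or all bad."""
    return len(set(sample)) == 1

n_reject = sum(reject(s) for s in samples)
type_i_error = n_reject / len(samples)

print(len(samples))   # 10 possible samples
print(n_reject)       # 1 sample leads to rejection
print(type_i_error)   # 0.1
```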

- If there are no bad eggs in the pop of 5, then every sample of 3 will have all good eggs. I’ll reject the null hypothesis - correct decision. In this case, I can’t make a Type II error.
- If there is 1 bad egg in the pop of 5, then of the 10 possible samples, 6 samples have at least one bad egg and at least one good egg. For those 6 samples, I’ll fail to reject the false null hypothesis and make a Type II error. Thus for this case, I have a 0.6 probability of a Type II error.

- If there are really 3 bad eggs in the pop of 5, then there is one sample (of 10 possible samples) for which I reject the null hypothesis. Thus, the probability of a Type II error is 0.90.
- If there are really 4 bad eggs in the pop of 5, then there are 4 samples (of 10) for which I will reject the null hypothesis. Thus, the probability of a Type II error is 0.60.
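
The case-by-case Type II error probabilities in this example can be checked by counting samples for every possible number of bad eggs. This is a sketch under the same assumptions as the slides (a population of 5, samples of 3 drawn without replacement, rejection when the sample is all good or all bad); the function names are hypothetical.

```python
from math import comb

POP_SIZE = 5      # eggs in the population
SAMPLE_SIZE = 3   # eggs drawn without replacement
NULL_BAD = 2      # null hypothesis: 3 good eggs, so 2 bad eggs

def prob_reject(n_bad):
    """Probability that a sample of 3 is all good or all bad."""
    n_good = POP_SIZE - n_bad
    total = comb(POP_SIZE, SAMPLE_SIZE)
    all_good = comb(n_good, SAMPLE_SIZE)  # 0 when fewer than 3 good eggs
    all_bad = comb(n_bad, SAMPLE_SIZE)    # 0 when fewer than 3 bad eggs
    return (all_good + all_bad) / total

def type_ii_error(n_bad):
    """Probability of failing to reject when the null hypothesis is false."""
    if n_bad == NULL_BAD:
        raise ValueError("null hypothesis is true; no Type II error possible")
    return 1 - prob_reject(n_bad)

for n_bad in range(POP_SIZE + 1):
    if n_bad == NULL_BAD:
        print(n_bad, "bad eggs: null is true, Type I error =", prob_reject(n_bad))
    else:
        print(n_bad, "bad eggs: Type II error =", round(type_ii_error(n_bad), 2))
```

The Type II error probability is largest (0.9) for the population closest to the null hypothesis and drops to 0 at the extremes, which is the pattern the slides walk through.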

- If there are 5 bad eggs out of 5 in the pop, then every sample has 3 bad eggs and I reject the null hypothesis. Thus, the probability of a Type II error is 0 for this case.
- I’ll demonstrate this with the coin-flip challenge.

- I make the real flips my null hypothesis, because I can characterize all the possible sets of 200 flips and their probabilities for real flips
- I’ll make a decision rule to decide whether a set of 200 flips is real or not.

- Since we must rely on samples to make inference about the population, we want to consider every possible sample from a hypothetical population.
- The sampling distribution is the characterization of a sample statistic based on every possible sample from a hypothetical population.
- Finding sampling distributions is central to statistics.

Mathematical

Use of mathematics and systematic reasoning to derive sampling distribution

Results in normal, t, χ², and F distributions (which we will study later)

Simulation

Uses a computer to mimic the sampling process

Take thousands of samples

Relies on a sample of samples

Mathematical approach should be used whenever possible

- To determine the distribution of the longest run in 200 coin flips, I used a simulation
- Program to simulate flipping a fair coin 200 times
- Repeat the 200 flips 1000 times
- Note how often each longest-run length occurs.
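
The simulation steps listed here might look like the sketch below. It is not the program used in class. It assumes a "run" means a streak of identical consecutive outcomes (heads or tails), and the fixed seed is only there to make the output repeatable.

```python
import random
from collections import Counter

def longest_run(flips):
    """Length of the longest streak of identical consecutive outcomes."""
    if not flips:
        return 0
    best = current = 1
    for prev, cur in zip(flips, flips[1:]):
        current = current + 1 if cur == prev else 1
        best = max(best, current)
    return best

def simulate_longest_runs(n_flips=200, n_repeats=1000, seed=0):
    """Flip a fair coin n_flips times, n_repeats times over, and
    count how often each longest-run length occurs."""
    rng = random.Random(seed)  # seed is an assumption, for repeatability
    counts = Counter()
    for _ in range(n_repeats):
        flips = [rng.choice("HT") for _ in range(n_flips)]
        counts[longest_run(flips)] += 1
    return counts

counts = simulate_longest_runs()
total = sum(counts.values())
for length in sorted(counts):
    # Estimated probability of each longest-run length.
    print(length, counts[length] / total)
```

Because this relies on a sample of samples, the printed proportions only estimate the sampling distribution; a different seed or more repetitions will change them slightly.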

- Suppose we’re drawing from a very large population and asking each person whether they’re a Democrat
- Suppose 50% are Democrats
- If we ask just one person, then we’ll get either a “yes” or “no”
- Ask 2 people: (Y,Y), (Y,N), (N,Y), (N,N)

- Ask 3 people, you get (YYY), (YYN), (YNY), (YNN), (NYY), (NYN), (NNY), (NNN)
- Ask 4 people, continue
- Keep going and for a large enough sample you get a bell-shaped curve!
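
The enumeration above can be continued by computer. The sketch below assumes, as in the example, that 50% of the population are Democrats and that the population is large enough for answers to be treated as independent, so every Y/N sequence is equally likely. The function name `sampling_distribution` is illustrative.

```python
from itertools import product
from collections import Counter

def sampling_distribution(n):
    """Probability of each possible number of 'yes' answers among n people,
    found by listing all 2**n equally likely Y/N sequences."""
    counts = Counter(seq.count("Y") for seq in product("YN", repeat=n))
    total = 2 ** n
    return {k: counts[k] / total for k in range(n + 1)}

for n in (1, 2, 3, 4, 12):
    print(f"n = {n}")
    for k, p in sampling_distribution(n).items():
        # A crude text histogram; the bar length is proportional to p.
        print(f"  {k:2d} yes: {p:.3f} " + "#" * round(p * 60))
```

For n = 12 the bars already trace out a roughly bell-shaped outline. If the proportion of Democrats were not 50%, the sequences would no longer be equally likely and each one would need to be weighted by its own probability.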

- Symmetric and Bell-Shaped
- Total Area = 1 since it covers all possible samples
- Characterized by two quantities: the mean μ and the standard deviation σ
- Represents all possible samples for hypothetical population
- The mean μ is the center
- The sd σ is how spread out the curve is

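[Figure: normal curve, with the mean μ marking the center and the standard deviation σ marking the spread]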

Increasing the standard deviation σ makes the curve shorter and fatter

Increasing the mean μ moves the curve to the right

Areas represent probabilities of certain samples for the hypothetical population
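
These properties of the normal curve can be checked numerically. The following is a minimal sketch using only the standard library; the helper names `normal_pdf`, `normal_cdf`, and `area_between` are hypothetical, with `mu` standing for the mean and `sigma` for the standard deviation.

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def normal_cdf(x, mu, sigma):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def area_between(a, b, mu, sigma):
    """Area under the curve between a and b, i.e. a probability."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

# A larger standard deviation lowers the peak (shorter and fatter curve).
print(normal_pdf(0, mu=0, sigma=1), normal_pdf(0, mu=0, sigma=2))

# Area within one standard deviation of the mean: about 0.68.
print(area_between(-1, 1, mu=0, sigma=1))

# Moving the mean shifts the curve without changing this area.
print(area_between(2, 4, mu=3, sigma=1))

# Total area is 1; a very wide interval comes out as 1.0 to rounding.
print(area_between(-50, 50, mu=0, sigma=1))
```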