Sampling Distribution Models

Sampling Distribution Models Ch. 18

Your Projects • They are all wrong

Your Response Variable • Categorical Variable: You were trying to find the actual proportion for your population of interest (the population parameter). You actually found the proportion for your sample (the sample statistic). • Quantitative Variable: You were trying to find the actual mean for your population of interest (the population parameter). You actually found the mean for your sample (the sample statistic).

Sample vs. Population • Population: the entire group of individuals that we want information about • Sample: a part of the population that we actually examine in order to gather information

Statistic vs. Parameter • Parameter: summary values for a population or model • Statistic: summary values for a sample

Why Your Projects Are Wrong • Sampling error: random samples drawn from the same population are likely to give different summary statistics. • Better term: sampling variability

Sampling Distribution • The sampling distribution of a statistic is the distribution of the proportions (for a categorical variable) or the means (for a quantitative variable) from all possible samples of the same size drawn from the population.
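We cannot list every possible sample, but a simulation gives a rough picture of a sampling distribution. The sketch below is illustrative only: the population proportion (0.3), the sample size (100), and the number of samples (2000) are made-up values, and `simulate_sample_proportions` is a hypothetical helper name.

```python
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n from a population whose
    true proportion of successes is p; return each sample's proportion."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        # Each individual is a "success" with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

# Assumed values, chosen only for illustration.
props = simulate_sample_proportions(p=0.3, n=100, num_samples=2000)
print("smallest sample proportion:", min(props))
print("largest sample proportion: ", max(props))
print("mean of sample proportions:", round(statistics.fmean(props), 3))
print("SD of sample proportions:  ", round(statistics.pstdev(props), 3))
```

Every sample gives a slightly different proportion (sampling variability), yet the proportions pile up around the true value.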

Sampling Distribution Model • Under the right conditions, we can approximate the Sampling Distribution with a Sampling Distribution Model • We will start with proportions…

Assumptions and Conditions for Proportions • Independence Assumption: The sampled values must be independent of each other. • Randomization Condition: The sample should be a simple random sample of the population. • 10% Condition: The sample size, n, must be no larger than 10% of the population. • Success/Failure Condition: The sample size has to be big enough so that both np (the expected number of successes) and nq (the expected number of failures) are at least 10.
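The two numeric conditions above can be checked with simple arithmetic. This is a minimal sketch: `check_proportion_conditions` is a hypothetical helper, and the numbers passed to it are made up. Independence and randomization depend on how the data were collected, so no calculation can check them.

```python
def check_proportion_conditions(n, p, population_size):
    """Check the 10% Condition and the Success/Failure Condition.

    n: sample size, p: assumed population proportion,
    population_size: number of individuals in the population.
    """
    q = 1 - p
    return {
        # Sample is no larger than 10% of the population.
        "ten_percent": n <= 0.10 * population_size,
        # Expected successes (np) and failures (nq) are both at least 10.
        "success_failure": n * p >= 10 and n * q >= 10,
    }

# Assumed example: 100 people sampled from a population of 5000, p = 0.3.
result = check_proportion_conditions(n=100, p=0.3, population_size=5000)
print(result)
```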

Sampling Distribution Model for Proportions • If and only if the conditions are satisfied, we can approximate the sampling distribution with a Normal model!!! • Mean: $\mu(\hat{p}) = p$ • Standard Deviation: $SD(\hat{p}) = \sqrt{pq/n}$, where $q = 1 - p$
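Once the conditions hold, the Normal model for a sample proportion has mean p and standard deviation sqrt(pq/n), and it can be used to estimate how likely a given sample proportion is. The sketch below uses Python's standard `statistics.NormalDist`; the values p = 0.3, n = 100, and the cutoff 0.35 are assumptions for illustration, and `proportion_model` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def proportion_model(p, n):
    """Normal model for the sample proportion: mean p, SD sqrt(p*q/n)."""
    q = 1 - p
    return NormalDist(mu=p, sigma=math.sqrt(p * q / n))

# Assumed values for illustration only.
model = proportion_model(p=0.3, n=100)

# Estimated chance that a sample of 100 shows a proportion above 0.35.
prob_above = 1 - model.cdf(0.35)

print("model mean:", model.mean)
print("model SD:  ", round(model.stdev, 4))
print("P(p-hat > 0.35):", round(prob_above, 3))
```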

The Fundamental Theorem of Statistics • The sampling distribution of any mean becomes more nearly Normal as the sample size grows. • All we need is for the observations to be independent and collected with randomization. • We don’t even care about the shape of the population distribution! • The Fundamental Theorem of Statistics is called the Central Limit Theorem (CLT).

The Fundamental Theorem of Statistics (cont.) • The CLT is surprising and a bit weird: • Not only does the histogram of the sample means get closer and closer to the Normal model as the sample size grows, but this is true regardless of the shape of the population distribution. • The CLT works better (and faster) the closer the population model is to a Normal itself. It also works better for larger samples.

The Fundamental Theorem of Statistics (cont.) The Central Limit Theorem (CLT) The mean of a random sample is a random variable whose sampling distribution can be approximated by a Normal model. The larger the sample, the better the approximation will be.
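A quick simulation can make the CLT less mysterious. The sketch below draws samples from a strongly right-skewed population (an exponential distribution, chosen here only as an example) and measures how lopsided the sample means are. Skewness near 0 means a symmetric, bell-like shape. The helper names and sample counts are assumptions for illustration.

```python
import random
import statistics

def skewness(values):
    """Skewness: 0 for a symmetric shape, positive for a long right tail."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def simulate_means(n, num_samples, rng):
    """Means of num_samples samples of size n from an exponential population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]

rng = random.Random(1)  # fixed seed so the run is repeatable
skew_by_n = {n: skewness(simulate_means(n, 3000, rng)) for n in (1, 5, 30)}

for n, sk in skew_by_n.items():
    print(f"n={n:>2}: skewness of sample means = {sk:.2f}")
```

The skewness shrinks toward 0 as n grows, even though the population itself is far from Normal.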

CLT Summary • 1. The mean of the population (what we want to find) will be the same as the mean of the sample means from all your many samples. • 2. The standard deviation of the sample means will be the population standard deviation divided by the square root of your sample size. • 3. The histogram of the sample means will appear Normal (bell shaped). • 4. The larger the sample size (n), the smaller the standard deviation of the sample means will be and the narrower the graph will be.
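The summary points about center and spread can be checked by simulation. This sketch uses rolls of a fair six-sided die as the population (an assumption made for illustration) and compares the observed standard deviation of the sample means with the population standard deviation divided by the square root of n.

```python
import math
import random
import statistics

FACES = [1, 2, 3, 4, 5, 6]
MU = statistics.fmean(FACES)      # population mean, 3.5
SIGMA = statistics.pstdev(FACES)  # population SD, about 1.708

def simulate_means(n, num_samples, rng):
    """Means of num_samples samples, each made of n die rolls."""
    return [statistics.fmean(rng.choices(FACES, k=n)) for _ in range(num_samples)]

rng = random.Random(2)  # fixed seed so the run is repeatable
results = {}
for n in (4, 16, 64):
    means = simulate_means(n, 2000, rng)
    center = statistics.fmean(means)    # should sit near MU
    observed = statistics.pstdev(means)  # spread of the sample means
    predicted = SIGMA / math.sqrt(n)     # what the CLT summary predicts
    results[n] = (center, observed, predicted)
    print(f"n={n:>2}: center={center:.3f}  observed SD={observed:.3f}  predicted SD={predicted:.3f}")
```

Quadrupling the sample size roughly halves the spread of the sample means, because of the square root.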

Assumptions and Conditions for Means • Independence Assumption: The sampled values must be independent of each other. • Randomization Condition: The sample should be a simple random sample of the population. • 10% Condition: the sample size, n, must be no larger than 10% of the population. • Large Enough Sample Condition: The CLT doesn’t tell us how large a sample we need. For now, you need to think about your sample size in the context of what you know about the population.

Sampling Distribution Model for Means • If and only if the conditions are satisfied, we can approximate the sampling distribution with a Normal model!!! • Mean: $\mu(\bar{y}) = \mu$ • Standard Deviation: $SD(\bar{y}) = \sigma/\sqrt{n}$
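When the conditions hold, the Normal model for a sample mean has mean equal to the population mean and standard deviation equal to the population standard deviation divided by the square root of n. The sketch below applies it with Python's standard `statistics.NormalDist`. The population values (mean 100, standard deviation 15), the sample size (25), and the cutoff (106) are invented for illustration, and `mean_model` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def mean_model(mu, sigma, n):
    """Normal model for the sample mean: mean mu, SD sigma / sqrt(n)."""
    return NormalDist(mu=mu, sigma=sigma / math.sqrt(n))

# Assumed population values, for illustration only.
model = mean_model(mu=100, sigma=15, n=25)

# Estimated chance that a sample of 25 has a mean above 106.
prob_above = 1 - model.cdf(106)

print("model mean:", model.mean)
print("model SD:  ", model.stdev)
print("P(sample mean > 106):", round(prob_above, 4))
```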
