
STAT 111 Introductory Statistics


Presentation Transcript


  1. STAT 111 Introductory Statistics Lecture 8: More on the Binomial Distribution and Sampling Distributions June 1, 2004

  2. Today’s Topics • More on the binomial distribution • Mean and variance • Sample proportion • Normal approximation of the binomial • Continuity correction • Sampling distribution of sample means • Central Limit Theorem

  3. Recall: The Binomial Setting • There are a fixed number n of trials. • The n trials are all independent. • Each trial has one of two possible outcomes, labeled “success” and “failure.” • The probability of success, p, remains the same for each trial.

  4. Recall: The Binomial Distribution • The distribution of the count X of successes in the binomial setting is called the binomial distribution with parameters n and p, where • n is the number of trials • p is the probability of a success on any trial • The count X is a discrete random variable; we abbreviate this as X ~ B(n, p). • Possible values of X are the whole numbers from 0 to n.

  5. The Binomial Distribution • If X ~ B(n, p), then P(X = k) = C(n, k) p^k (1 – p)^(n – k) for k = 0, 1, …, n, where C(n, k) = n! / (k! (n – k)!) counts the ways to place the k successes among the n trials. • Examples: Let n = 3.
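
As a quick numerical check of this formula, here is a minimal Python sketch (my addition, not part of the original slides) that tabulates P(X = k) for n = 3 using scipy; the value p = 0.4 is only an illustrative choice.

```python
# Sketch: tabulate P(X = k) for X ~ B(3, p), with an illustrative p = 0.4.
from scipy.stats import binom

n, p = 3, 0.4
for k in range(n + 1):
    # binom.pmf(k, n, p) evaluates C(n, k) * p**k * (1 - p)**(n - k)
    print(f"P(X = {k}) = {binom.pmf(k, n, p):.4f}")
```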

  6. Developing Binomial Probabilities for n = 3 • Tracing the tree diagram of three independent trials (each branch is a success S with probability p or a failure F with probability 1 – p), the eight outcome sequences have probabilities: P(SSS) = p^3; P(SSF) = P(SFS) = P(FSS) = p^2(1 – p); P(SFF) = P(FSF) = P(FFS) = p(1 – p)^2; P(FFF) = (1 – p)^3.

  7. Binomial Probabilities for n = 3 • Let X be the number of successes in three trials. Grouping the eight outcomes by the value of X: • X = 0: FFF only, so P(X = 0) = (1 – p)^3 • X = 1: SFF, FSF, FFS, so P(X = 1) = 3p(1 – p)^2 • X = 2: SSF, SFS, FSS, so P(X = 2) = 3p^2(1 – p) • X = 3: SSS only, so P(X = 3) = p^3

  8. Example: Rolling a Die • Roll a die 4 times; let X be the number of times the number 5 appears. • “Success” = a roll of 5, so P(Success) = 1/6 and X ~ B(4, 1/6).

  9. Example: Rolling a Die • Find the probability that we get at least 2 rolls of 5: P(X ≥ 2) = 1 – P(X = 0) – P(X = 1) = 1 – (5/6)^4 – 4(1/6)(5/6)^3 ≈ 0.132. A short computational check follows.
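
As a check of that arithmetic, a minimal scipy sketch (added here; not from the slides):

```python
# Sketch: P(at least two 5s in four rolls of a die), X ~ B(4, 1/6).
from scipy.stats import binom

n, p = 4, 1 / 6
prob = 1 - binom.cdf(1, n, p)      # 1 - P(X <= 1) = P(X >= 2)
print(f"P(X >= 2) = {prob:.4f}")   # about 0.132
```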

  10. Expected Value and Variance of a Binomial Random Variable • If X ~ B(n, p), then E(X) = np and Var(X) = np(1 – p), so the standard deviation of X is sqrt(np(1 – p)).

  11. Set-up for Derivation • Let Xi indicate whether the ith trial is a success or a failure: Xi = 1 if the ith trial is a success, and Xi = 0 if the ith trial is a failure, for i = 1, 2, …, n. • X1, …, Xn are independent and identically distributed, each with P(Xi = 1) = p and P(Xi = 0) = 1 – p. • Since X = X1 + … + Xn, the mean and variance of X are obtained by summing the means and variances of the Xi.

  12. Binomial Example: Checkout Lanes • A grocery store has 10 checkout lanes. During a busy hour the probability that any given lane is occupied (has at least one customer) is 0.75. Assume that the lanes are occupied or not occupied independently of each other. • What is the probability that a customer will find at least one lane unoccupied? • What is the expected number of occupied lanes? • What is the standard deviation of the number of occupied lanes?
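
A minimal sketch (my addition, not the slide's own solution) of how these three questions can be answered, with X = the number of occupied lanes, X ~ B(10, 0.75):

```python
# Sketch: checkout-lane questions, X ~ B(10, 0.75) = number of occupied lanes.
from scipy.stats import binom

n, p = 10, 0.75
# At least one lane unoccupied means not all 10 lanes are occupied.
print("P(at least one lane unoccupied) =", 1 - binom.pmf(n, n, p))
print("Expected number of occupied lanes =", binom.mean(n, p))   # n * p = 7.5
print("SD of occupied lanes =", binom.std(n, p))                 # sqrt(n * p * (1 - p))
```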

  13. Sample Proportions • In statistical sampling we often want to estimate the proportion p of “successes” in a population. • The sample proportion is defined as p̂ = X / n, where X is the count of successes in a sample of size n. • If the count X is B(n, p), then the sample proportion has mean p and standard deviation sqrt(p(1 – p) / n).

  14. Sample Proportions • The sample proportion p̂ is an unbiased estimator of the population proportion p. • The variability of this estimator decreases as the sample size increases. • In particular, we must multiply the sample size by 4 if we want to cut the standard deviation in half.

  15. Sample Proportions • The histogram of the distribution of the sample proportion when n = 1000, p = 0.6
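
The histogram itself did not survive in this transcript; the short simulation below (my addition, not the original figure's code) reproduces the idea by repeatedly drawing counts from B(1000, 0.6) and plotting the resulting sample proportions.

```python
# Sketch: simulate the sampling distribution of the sample proportion, n = 1000, p = 0.6.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, p, reps = 1000, 0.6, 10_000
p_hat = rng.binomial(n, p, size=reps) / n   # each entry is one sample proportion

plt.hist(p_hat, bins=40)
plt.xlabel("sample proportion")
plt.title("Sampling distribution of the sample proportion (n = 1000, p = 0.6)")
plt.show()
```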

  16. Normal Approximation for Counts, Proportions • Let X be the number of successes in an SRS of size n from a large population having proportion p of successes, and let the sample proportion of successes be denoted by p̂ = X / n. • Then for large n, • X is approximately normal with mean np and variance np(1 – p). • p̂ is approximately normal with mean p and variance p(1 – p) / n.

  17. Normal Approximation: Rule of Thumb • The accuracy of the approximation generally improves as the sample size n increases. • For any fixed sample size, the approximation is most accurate when p is close to 0.5, and least accurate when p is near 0 or 1. • As a general rule of thumb, then, we use the normal approximation for values of n and p such that np ≥ 10 and n(1 – p) ≥ 10.

  18. Example • The Laurier Company’s brand has a market share of 30%. Suppose that in a survey, 1,000 consumers of the product are asked which brand they prefer. What is the probability that more than 32% of the respondents will say they prefer the Laurier brand?
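
A sketch of the normal-approximation calculation (my addition; the slide's worked solution is not in the transcript): p̂ is approximately normal with mean 0.30 and standard deviation sqrt(0.3 × 0.7 / 1000).

```python
# Sketch: P(p_hat > 0.32) when p = 0.30 and n = 1000, via the normal approximation.
from math import sqrt
from scipy.stats import norm

p, n = 0.30, 1000
sd = sqrt(p * (1 - p) / n)    # standard deviation of the sample proportion
print("P(p_hat > 0.32) =", 1 - norm.cdf(0.32, loc=p, scale=sd))   # roughly 0.08
```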

  19. Another Example • A quality engineer selects an SRS of 100 switches from a large shipment for detailed inspection. Unknown to the engineer, 10% of the switches in the shipment fail to meet the specifications. The exact binomial probability that no more than 9 of the switches in the sample fail inspection is P(X ≤ 9) = 0.4513. • How accurate is the normal approximation for this probability?

  20. Another Example (cont.) • Let X be the number of bad switches; then X ~ B(100, 0.1), so µ = np = 10 and σ = sqrt(np(1 – p)) = 3. • The normal approximation gives P(X ≤ 9) ≈ P(Z ≤ (9 – 10)/3) = P(Z ≤ –0.33) ≈ 0.37, compared with the exact value 0.4513, so it’s not that accurate. • Note that np = 10, so n and p are on the border of values for which we are willing to use the approximation.
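
For reference, a small scipy sketch (added here, not from the slides) comparing the exact binomial value with the plain normal approximation used on this slide:

```python
# Sketch: exact P(X <= 9) for X ~ B(100, 0.1) versus the normal approximation N(10, 3).
from scipy.stats import binom, norm

n, p = 100, 0.1
exact = binom.cdf(9, n, p)                  # 0.4513
approx = norm.cdf(9, loc=n * p, scale=3)    # Phi((9 - 10) / 3), about 0.37
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```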

  21. Continuity Correction • While the binomial distribution places probability exactly on X = 9 and X = 10, the normal distribution spreads probability continuously in that interval. • The bar for X = 9 in a probability histogram goes from 8.5 to 9.5, but calculating P(X ≤ 9) using the normal approximation only includes the area to the left of the center of this bar. • To improve the accuracy of our approximation, we should let X = 9 extend from 8.5 to 9.5, etc.

  22. Continuity Correction • Use the continuity correction to approximate the binomial probability P(X = 10) when n = 100, p = 0.1. • Using the normal approximation to the binomial distribution, X is approximately distributed as N(10, 3), so we approximate P(X = 10) by P(9.5 < X_normal < 10.5).

  23. Continuity Correction • The exact binomial probability is P(X_binomial = 10) = 0.13187; the continuity-corrected normal approximation gives P(9.5 < X_normal < 10.5) = 0.13237. (The slide's figure marks 9.5, 10, and 10.5 on the normal curve.)
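
The two numbers can be reproduced with this short sketch (my addition, not part of the slides):

```python
# Sketch: continuity-corrected approximation of P(X = 10) for X ~ B(100, 0.1).
from scipy.stats import binom, norm

n, p = 100, 0.1
exact = binom.pmf(10, n, p)                              # 0.13187
approx = norm.cdf(10.5, 10, 3) - norm.cdf(9.5, 10, 3)    # 0.13237
print(f"exact = {exact:.5f}, with continuity correction = {approx:.5f}")
```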

  24. Continuity Correction • Q: what about the continuity correction for P(X < 8)? Since X takes whole-number values, P(X < 8) = P(X ≤ 7), so the corrected approximation uses the area to the left of 7.5.

  25. Continuity Correction • Q: what about the continuity correction for P(X > 14)? Since P(X > 14) = P(X ≥ 15), the corrected approximation uses the area to the right of 14.5.

  26. Example Re-visited • Using the continuity correction, the probability that no more than 9 of the switches in the sample fail inspection is approximated by P(X_normal ≤ 9.5) = P(Z ≤ (9.5 – 10)/3) = P(Z ≤ –0.17) ≈ 0.43, which is much closer to the exact value of 0.4513.

  27. Example: Inspection of Switches • Find the probability that at least 5 but at most 15 switches fail the inspection. • With the continuity correction, P(5 ≤ X ≤ 15) ≈ P(4.5 < X_normal < 15.5); a computational sketch follows.
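
A minimal sketch of that computation (my addition, not the slide's own solution):

```python
# Sketch: P(5 <= X <= 15) for X ~ B(100, 0.1), exact and with continuity correction.
from scipy.stats import binom, norm

n, p = 100, 0.1
exact = binom.cdf(15, n, p) - binom.cdf(4, n, p)
approx = norm.cdf(15.5, 10, 3) - norm.cdf(4.5, 10, 3)
print(f"exact = {exact:.4f}, continuity-corrected approximation = {approx:.4f}")
```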

  28. Sampling Distributions • Counts and proportions are discrete random variables used to describe categorical data. • Statistics used to describe quantitative data are most often continuous random variables. • Examples: sample mean, percentiles, standard deviation • Sample means are among the most common statistics.

  29. Sampling Distributions • Regarding sample means, • They tend to be less variable than individual observations. • Their distribution tends to be more normal than that of individual observations. • We’ll see why later.

  30. Sampling Distributions of Sample Means • Let x̄ be the mean of an SRS of size n from a population having mean µ and standard deviation σ. • The mean and standard deviation of x̄ are µ_x̄ = µ and σ_x̄ = σ / √n. • Why?

  31. Sampling Distributions of Sample Means • The shape of the distribution of the sample mean depends on the shape of the population distribution itself. • One special case: when the population distribution is normal, the distribution of x̄ is exactly normal for every sample size. • This is because any linear combination of independent normal random variables is normally distributed.

  32. Example • The foreman of a bottling plant has observed that the amount of soda pop in each “32-ounce” bottle is actually a normally distributed random variable, with a mean of 32.2 ounces and a standard deviation of .3 ounce. • If a customer buys one bottle, what is the probability that that bottle contains more than 32 ounces? • If that same customer instead buys a carton of 4 bottles, what is the probability that the mean of those 4 bottles is greater than 32 ounces?
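
A short sketch of both calculations (my addition, not the slide's worked solution), using the fact that the mean of 4 bottles has standard deviation 0.3 / √4 = 0.15:

```python
# Sketch: soda-bottle probabilities, one bottle versus the mean of a carton of 4.
from math import sqrt
from scipy.stats import norm

mu, sigma, n = 32.2, 0.3, 4
print("P(one bottle > 32 oz)        =", 1 - norm.cdf(32, mu, sigma))             # about 0.75
print("P(mean of 4 bottles > 32 oz) =", 1 - norm.cdf(32, mu, sigma / sqrt(n)))   # about 0.91
```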

  33. Example • The starting salaries of M.B.A.s at Wilfrid Laurier University (WLU) are normally distributed with a mean of $62,000 and a standard deviation of $14,500. The starting salaries of M.B.A.s at the University of Western Ontario (UWO) are normally distributed with a mean of $60,000 and a standard deviation of $18,300. • A random sample of 50 WLU M.B.A.s and a random sample of 60 UWO M.B.A.s are selected. • What is the probability that the sample mean of the WLU graduates will exceed that of the UWO graduates?
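
A sketch of the calculation (my addition; it assumes the two samples are independent, as the example implies): the difference of sample means is normal with mean 62,000 – 60,000 and standard deviation sqrt(14500²/50 + 18300²/60), and we want the probability that this difference exceeds 0.

```python
# Sketch: P(WLU sample mean > UWO sample mean) for independent samples of 50 and 60.
from math import sqrt
from scipy.stats import norm

mu_diff = 62_000 - 60_000
sd_diff = sqrt(14_500**2 / 50 + 18_300**2 / 60)   # SD of the difference of the two sample means
print("P(x_bar_WLU > x_bar_UWO) =", 1 - norm.cdf(0, mu_diff, sd_diff))   # roughly 0.74
```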

  34. Central Limit Theorem • When the population distribution is normal, so is the sampling distribution of x̄. • What about when the population distribution is non-normal? • For large sample sizes, it turns out that the distribution of x̄ gets closer to a normal distribution. • As long as the population has a finite standard deviation, this is true regardless of the actual shape of the population distribution.

  35. Central Limit Theorem • Formally, draw an SRS of size n from any population with mean µ and finite standard deviation σ. • As n approaches infinity (gets very large), the sampling distribution of x̄ approaches N(µ, σ/√n); that is, x̄ is approximately normal for large n. • More general versions of the theorem can hold even if the observations are not independent or identically distributed. • This is why normal distributions are common models for observed data. A simulation illustrating the theorem follows.
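
To illustrate the theorem, a small simulation sketch (my addition, with an exponential population chosen only as an example of a skewed distribution): the histogram of sample means looks increasingly normal as n grows.

```python
# Sketch: CLT demo -- sample means from a skewed (exponential) population.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
for n in (1, 5, 30):
    # 10,000 samples of size n; each row's mean is one draw of x_bar
    means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    plt.hist(means, bins=50, density=True, alpha=0.5, label=f"n = {n}")
plt.legend()
plt.title("Sample means become more normal in shape as n grows")
plt.show()
```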
