
45-733: lecture 7 (chapter 6)



Presentation Transcript


  1. 45-733: lecture 7 (chapter 6) Sampling Distributions William B. Vogt, Carnegie Mellon, 45-733

  2. Samples from populations • There is some population we are interested in: • Families in the US • Products coming off our assembly line • Consumers in our product’s market segment • Employees

  3. Samples from populations • We are interested in some quantitative information (called variables) about these populations: • Income of families in the US • Defects in products coming off our assembly line • Perception of consumers of our product • Productivity of our employees

  4. Samples from populations • All the information (accessible to statistics) about a quantity in a population is contained in its distribution function • Real-world distribution functions are complicated things • In real life, we usually know little or nothing about the distribution functions of the variables we are interested in

  5. Samples from populations • Because distribution functions are complex, we only try to find out about certain aspects of them (parameters): • Average income of families in the US • Rate of defects coming off our production line • % of customers who view our product favorably • Average pieces/hour finished by a worker

  6. Samples from populations • Of course, we do not begin by knowing even these quantities • One possibility is to measure the whole population • Allows us to answer any question about the distribution or parameters, using the techniques of chapter 2 • However, this is almost always expensive and often infeasible

  7. Samples from populations • Instead, we take a sample • Taking a sample • We select only a few of the members of the population • We measure the variables of interest for those members we select • Examples • Phone survey • Take 1 out of each 10,000 units off our prod line

  8. Samples from populations • The whole of statistics is figuring out what we can learn about the population from a sample: • What can we say about the distribution of a variable from the information in a sample? • What can we say about the parameters we are interested in from our sample? • How good is the information in our sample about the population?

  9. Samples from populations • Example: • We are interested in how favorably our product is viewed by customers • We do a phone survey of our 5 good friends and ask them if they view our product favorably or unfavorably • All 5 say favorably • What can we conclude?

  10. Samples from populations • Example: • We are interested in how favorably our product is viewed by customers • We do a phone survey of 500 people who have purchased our product before and ask them if they view our product favorably or unfavorably • 466 say they view our product favorably • What can we conclude?

  11. Samples from populations • Example: • We are interested in how favorably our product is viewed by customers • We do a phone survey of 500 random adults and ask them if they view our product favorably or unfavorably • 351 say they view our product favorably • What can we conclude?

  12. Samples and statistics • As a practical matter, we are usually interested in using our sample to say something about a parameter of the distribution we care about • To get at this parameter, we construct a variable called an estimator or statistic

  13. Samples and statistics • Example: • If we want to know the average income of families in the US, we draw a sample from a random phone survey of 1000 families • We ask, among other things, for their family income • To estimate E(I), we calculate the estimator or statistic called the sample mean: $\bar{I} = \frac{1}{n}\sum_{i=1}^{n} I_i$, here with n = 1000

  14. Samples and statistics • Example: • But, what does the sample mean of income tell us about E(I)? • Answering this question is the subject of the rest of the course, and of statistics in general

  15. Random sampling • There are different ways to sample a population, that is, different sampling schemes • The simplest sampling scheme is called “simple random sampling” or just “random sampling” • If there is a population of size N from which we are to draw a sample of size n, random sampling says that on each draw every one of the N members of the population has probability 1/N of being selected, and that the draws are independent
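
The sampling scheme on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `simple_random_sample` and the numbered population are assumptions made for the example. Independent draws that each give every member probability 1/N correspond to drawing with replacement, which is what `random.choices` does.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members independently; each draw picks any member with probability 1/N.

    Independent draws mean the same member can be picked more than once
    (sampling with replacement).
    """
    rng = random.Random(seed)  # a seed makes the illustration reproducible
    return rng.choices(population, k=n)

# Hypothetical population: family ID numbers 0..9999 (made up for the example).
families = list(range(10_000))
print(simple_random_sample(families, n=5, seed=42))
```

When N is much larger than n, repeats are rare, so this behaves almost the same as drawing without replacement.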

  16. Statistic or estimator • A statistic (or estimator) is any function of a sample • It is an algorithm which tells us what we would do given a sample • Example: • Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ • Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
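
To make "a statistic is a function of a sample" concrete, here is a minimal Python sketch of the two statistics named on this slide. The function names and the income figures are made up for illustration, and the sample variance uses the usual n - 1 divisor, which is assumed here.

```python
from typing import Sequence

def sample_mean(xs: Sequence[float]) -> float:
    """X-bar: the sum of the observations divided by the sample size n."""
    return sum(xs) / len(xs)

def sample_variance(xs: Sequence[float]) -> float:
    """s^2: squared deviations from X-bar, summed and divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

# Hypothetical family incomes in thousands of dollars (not real data).
incomes = [40.0, 55.0, 62.0, 48.0, 95.0]
print(sample_mean(incomes))
print(sample_variance(incomes))
```

Feeding the functions a different sample gives different numbers, which is the sense in which a statistic is itself a random variable.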

  17. Statistic as random variable • A statistic is a random variable!! • A statistic is a random variable!! • A statistic is a random variable!! • A statistic is a random variable!! • A statistic is a random variable!! • A statistic is a random variable!! • A statistic is a random variable!! • A statistic is a random variable!!

  18. Statistic as random variable • A simple example • Consider the Bernoulli random variable X with parameter p • We are interested in p, the probability of a success • To estimate p, we will calculate the sample mean of X: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$

  19. Statistic as random variable • A simple example • First, with a sample size of n=1: $\bar{X} = X_1$, so $P(\bar{X}=0) = 1-p$ and $P(\bar{X}=1) = p$

  20. Statistic as random variable • A simple example • Next, with a sample size of n=2: $\bar{X} = (X_1 + X_2)/2$, so $P(\bar{X}=0) = (1-p)^2$, $P(\bar{X}=1/2) = 2p(1-p)$, and $P(\bar{X}=1) = p^2$

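  21. Statistic as random variable • A simple example • Next, with a sample size of n=3: $\bar{X} = (X_1 + X_2 + X_3)/3$, so $P(\bar{X}=0) = (1-p)^3$, $P(\bar{X}=1/3) = 3p(1-p)^2$, $P(\bar{X}=2/3) = 3p^2(1-p)$, and $P(\bar{X}=1) = p^3$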

  22. Statistic as random variable • The statistic is a random variable • It has a distribution • Probability function or density • Cumulative distribution function • It has an expectation • It has a variance / standard deviation

  23. Statistic as random variable • For the Bernoulli example • Expectation, variance with n=1: $E(\bar{X}) = p$, $\operatorname{Var}(\bar{X}) = p(1-p)$

  24. Statistic as random variable • For the Bernoulli example • Expectation, variance with n=2: $E(\bar{X}) = p$, $\operatorname{Var}(\bar{X}) = p(1-p)/2$

  25. Statistic as random variable • For the Bernoulli example • Expectation, variance with n=3: $E(\bar{X}) = p$, $\operatorname{Var}(\bar{X}) = p(1-p)/3$
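
The Bernoulli example for n = 1, 2, 3 can be checked by listing every possible sample. The sketch below does this with exact fractions; the value p = 3/10 and all function names are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product

def sample_mean_distribution(p, n):
    """Map each possible value of X-bar to its probability, for n Bernoulli(p) draws."""
    dist = {}
    for outcome in product((0, 1), repeat=n):  # every possible sample of size n
        prob = Fraction(1)
        for x in outcome:                      # independent draws: multiply probabilities
            prob *= p if x == 1 else 1 - p
        xbar = Fraction(sum(outcome), n)
        dist[xbar] = dist.get(xbar, Fraction(0)) + prob
    return dist

def expectation(dist):
    return sum(value * prob for value, prob in dist.items())

def variance(dist):
    mu = expectation(dist)
    return sum((value - mu) ** 2 * prob for value, prob in dist.items())

p = Fraction(3, 10)  # hypothetical success probability
for n in (1, 2, 3):
    dist = sample_mean_distribution(p, n)
    print(n, dict(sorted(dist.items())), expectation(dist), variance(dist))
```

For each n the expectation comes out equal to p, while the variance equals p(1-p)/n, so it shrinks as the sample grows.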

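  26. Statistic as random variable • For the Bernoulli example • Probability function, n=1 [Figure: probability function of the sample mean for n=1, with mass 1-p at 0 and mass p at 1; p is marked on the horizontal axis]

  27. Statistic as random variable • For the Bernoulli example • Probability function, n=2 [Figure: probability function of the sample mean for n=2; the horizontal axis shows the values 0, 1/2, 1, with p marked]

  28. Statistic as random variable • For the Bernoulli example • Probability function, n=3 [Figure: probability function of the sample mean for n=3; the horizontal axis shows the values 0, 1/3, 2/3, 1, with p marked]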

  29. Sample mean • As we have discussed before, the sample mean of a random variable X from a sample of size n is: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$

  30. Sample mean • The sample mean is a random variable!! • Sample mean is made out of n random variables; therefore, it is a random variable: $\bar{X} = \frac{1}{n}(X_1 + X_2 + \dots + X_n)$

  31. Sample mean • Let’s suppose X is a random variable with mean $\mu_X$ and standard deviation $\sigma_X$, and let’s consider the sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$

  32. Sample mean • Since the sample mean is a random variable, we can ask about its expectation: $E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i)$

  33. Sample mean • Since the sample mean is a random variable, we can ask about its expectation: $E(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\, n\,\mu_X = \mu_X$

  34. Sample mean • The expectation of the sample mean is equal to the expectation of the underlying random variable • On average, the sample mean is equal to the mean of the underlying random variable

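  35. Sample mean • We can also ask about the variance of the sample mean: $\operatorname{Var}(\bar{X}) = \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\left[\sum_{i=1}^{n}\operatorname{Var}(X_i) + \sum_{i=1}^{n}\sum_{j\neq i}\operatorname{Cov}(X_i, X_j)\right]$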

  36. Sample mean • If it is an independent, random sample then the covariances are all zero: $\operatorname{Var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{1}{n^2}\, n\,\sigma_X^2 = \frac{\sigma_X^2}{n}$

  37. Sample mean • The variance of the sample mean is less than the variance of the underlying random variable (for n > 1) • The variance of the sample mean gets smaller as the sample size increases • The variance of the sample mean goes to zero as the sample size goes to infinity

  38. Sample mean • Our two results: $E(\bar{X}) = \mu_X$ and $\operatorname{Var}(\bar{X}) = \frac{\sigma_X^2}{n}$

  39. Sample mean • Say that: • On average, the sample mean is equal to the mean of the underlying random variable, regardless of sample size • As the sample size grows, the variance of the sample mean shrinks, eventually approaching zero

  40. Sample mean • What would happen if the sample size “got to” infinity? • Then the sample mean would no longer be a random variable, it would literally equal the population mean, E(X): $\lim_{n\to\infty}\operatorname{Var}(\bar{X}) = \lim_{n\to\infty}\frac{\sigma_X^2}{n} = 0$

  41. Sample mean • Suppose X~N(1,1). [Figure: distribution of the sample mean for n=1 and n=100]

  42. Sample mean • Suppose X~N(1,1). [Figure: distribution of the sample mean for n=1, n=100, and n=1000]
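
The shrinking spread in these two figures can be reproduced roughly by simulation. The sketch below draws many samples from N(1,1) and looks at how the resulting sample means vary; the number of repetitions, the seed, and the function name are arbitrary choices for illustration.

```python
import random
import statistics

def simulate_sample_means(n, reps=500, seed=0):
    """Return `reps` sample means, each computed from n independent N(1,1) draws."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(1.0, 1.0) for _ in range(n))
        for _ in range(reps)
    ]

for n in (1, 100, 1000):
    means = simulate_sample_means(n)
    # The average of the sample means should sit near 1 for every n,
    # while their variance should be roughly 1/n.
    print(n, round(statistics.fmean(means), 3), round(statistics.pvariance(means), 5))
```

The simulated variances are themselves estimates, so they will only be close to 1/n, not exactly equal to it.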

  43. Sample mean • Finite sample correction • What has gone before has assumed either that you sample with replacement or that the population you are sampling from is very large (infinite) • Just as we needed to use hypergeometric rather than binomial when sampling from a small pop without replacement, so here:

  44. Sample mean • Finite sample correction • For a population of size N, sampled without replacement by a sample of size n: $E(\bar{X}) = \mu_X$ and $\operatorname{Var}(\bar{X}) = \frac{\sigma_X^2}{n}\cdot\frac{N-n}{N-1}$
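
The correction factor (N - n)/(N - 1) can be checked on a tiny population by listing every possible sample drawn without replacement. The population values and function names below are made up for illustration.

```python
from itertools import combinations
from statistics import fmean, pvariance

def var_of_sample_mean_fpc(population, n):
    """Corrected formula: (sigma^2 / n) * (N - n) / (N - 1)."""
    N = len(population)
    return pvariance(population) / n * (N - n) / (N - 1)

def var_of_sample_mean_exact(population, n):
    """Variance of X-bar over all equally likely samples drawn without replacement."""
    means = [fmean(sample) for sample in combinations(population, n)]
    return pvariance(means)

population = [1, 2, 3, 4, 5]  # hypothetical population of N = 5 values
print(var_of_sample_mean_fpc(population, 2))
print(var_of_sample_mean_exact(population, 2))
```

The two numbers agree, and both are smaller than the uncorrected value sigma^2 / n, because sampling without replacement from a small population leaves less room for the sample mean to vary.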

  45. Sample mean • Normal variables and $\bar{X}$ • If X is normal, then so is X-bar • If X is normal, then: $\bar{X} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right)$, so that $Z = \frac{\bar{X} - \mu_X}{\sigma_X/\sqrt{n}} \sim N(0, 1)$

  46. Sample mean • Central limit theorem and $\bar{X}$: • As long as X comes from an independent random sample, for large n: $Z = \frac{\bar{X} - \mu_X}{\sigma_X/\sqrt{n}}$ is approximately N(0, 1), even if X itself is not normal
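
A rough way to see the central limit theorem at work is to standardize sample means from a clearly non-normal distribution and count how often they land within 1.96 of zero, which a standard normal does about 95% of the time. The exponential distribution, the sample size of 50, and the function name are all assumptions chosen for this sketch.

```python
import math
import random

def share_within_1_96(n, reps=5000, seed=1):
    """Share of standardized sample means with |Z| <= 1.96, for X ~ Exponential(1)."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(1)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        z = (xbar - mu) / (sigma / math.sqrt(n))
        if abs(z) <= 1.96:
            hits += 1
    return hits / reps

print(share_within_1_96(50))
```

The printed share should be close to 0.95, though not exactly, since both the normal approximation and the simulation carry some error.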

  47. Sample proportion • Consider W a Bernoulli and an independent random sample of size n • Observe that X= W1+ W2+…+ Wn is distributed Binomial (and therefore approx normal)

  48. Sample proportion • The sample mean (i.e. sample proportion) is: $\bar{W} = \frac{1}{n}\sum_{i=1}^{n} W_i = \frac{X}{n}$ • Just a binomial divided by n • Also approx normal

  49. Sample proportion • To emphasize that we are estimating the p parameter of the Bernoulli, we may write: $\hat{p} = \frac{X}{n}$

  50. Sample proportion • Just as before, the sample mean has the same expectation as the underlying Bernoulli random variable: $E(\hat{p}) = \frac{E(X)}{n} = \frac{np}{n} = p$
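
Because the sample proportion is a binomial count divided by n, its whole distribution can be written down from the binomial probabilities. The sketch below does that and computes the mean and variance; n = 20, p = 0.3, and the function names are hypothetical choices. The variance p(1-p)/n follows from the earlier result Var(X-bar) = sigma^2/n together with the Bernoulli variance p(1-p).

```python
from math import comb

def proportion_pmf(n, p):
    """P(p-hat = k/n) for k = 0..n, taken from the Binomial(n, p) probabilities."""
    return {k / n: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

def mean_and_variance(pmf):
    mean = sum(value * prob for value, prob in pmf.items())
    var = sum((value - mean) ** 2 * prob for value, prob in pmf.items())
    return mean, var

n, p = 20, 0.3  # hypothetical sample size and true proportion
mean, var = mean_and_variance(proportion_pmf(n, p))
print(mean, p)               # expectation of p-hat next to p
print(var, p * (1 - p) / n)  # variance of p-hat next to p(1-p)/n
```

Each pair of printed numbers should match up to floating-point rounding.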
