
Sampling



Presentation Transcript


  1. Sampling One application of statistics is to determine the “readability” of various books and articles. One simple way to do this is to measure the average word length. Consider the Gettysburg Address by Abraham Lincoln, one of the most famous speeches of all time. Method 1: Use 5 words of your choice to estimate the average word length. (Then we will make a dotplot of class results.)

  2. Sampling Method 2: Now, instead of you choosing the words, we will use random chance to select them. Each word is numbered from 1 to 268.

  3. Sampling TI-83: Repeatedly use Math: Prb: RandInt(1,268) to get five random numbers (no repeats!). Find the average length of the corresponding words. (Then we will make a dotplot of class results.) Note: For new calculators, seed the random number generator first: TI-83: your phone number STO→ Math: Prb: Rand Enter
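For readers without a TI-83, the same procedure can be sketched in Python. This is only an illustration: the function name `random_word_sample` is made up here, and the text used is just the opening words of the speech, not the full 268-word address, so the average it prints is not an estimate for the whole speech.

```python
import random

def random_word_sample(words, n=5, seed=None):
    """Return n distinct 1-based word positions and the mean length of those words."""
    rng = random.Random(seed)
    # sample() never repeats a position, so no re-rolling is needed
    positions = rng.sample(range(1, len(words) + 1), n)
    lengths = [len(words[p - 1]) for p in positions]
    return positions, sum(lengths) / n

# Stand-in text: only the opening words, NOT the full 268-word speech.
text = ("Four score and seven years ago our fathers brought forth "
        "on this continent a new nation")
words = text.split()

positions, avg_length = random_word_sample(words, n=5, seed=1)
print(positions, avg_length)
```

With the full speech loaded into `words`, `range(1, len(words) + 1)` would play the role of RandInt(1,268).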

  4. Sampling How do the dotplots compare? (center, shape, spread) Which method is more accurate? Why? Second method is more accurate. (true average: 4.287) In the first method there is a tendency to pick large words.

  5. Sampling When a statistician is using a sample to estimate something about a population, there are 2 major problems. Def: _____ is the tendency for a sample to differ from the corresponding population in some systematic way. Bias can be a major problem when conducting an observational study. To eliminate bias, we need to let chance do the choosing! When we chose which words to use, we were drawn to the larger words and our samples were therefore biased.

  6. Sampling Def: The _____________ of an estimate refers to the range of values that estimate can take in repeated sampling. If there is a lot of variability it is difficult to be precise in our estimation. Find the average length of two randomly selected words and compare the variability with n = 5. (Then we will make a dotplot of class results.) What happens when we increase the sample size? Find the average length of 10 random words and compare the dotplots.
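The effect of sample size can also be explored by simulation instead of a class dotplot. The sketch below assumes a made-up population of word lengths (not the real speech) and a hypothetical helper `spread_of_sample_means`; it repeats the sampling many times for n = 2, 5, and 10 and reports how spread out the sample means are.

```python
import random
from statistics import pstdev

# Hypothetical word lengths, invented for illustration (not the real speech).
population = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7, 8, 9, 10, 11] * 5

def spread_of_sample_means(population, n, reps=2000, seed=0):
    """Standard deviation of the sample mean over many random samples of size n."""
    rng = random.Random(seed)
    means = [sum(rng.sample(population, n)) / n for _ in range(reps)]
    return pstdev(means)

spreads = {n: spread_of_sample_means(population, n) for n in (2, 5, 10)}
for n, sd in spreads.items():
    print(f"n = {n:2d}: spread of sample means = {sd:.2f}")
```

The spread should shrink as n grows, which is the pattern the three dotplots are meant to show.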

  7. Sampling Bias and Variability: Examples Suppose we are trying to estimate a population mean (μ) by taking a sample and finding the sample mean (x̄). Draw dot plots showing sample distributions with • high bias and low variability • low bias and high variability

  8. Sampling The size of the _____________ has no effect on the variability of an estimate. In the discrimination activity in the first lesson, you picked 10 beads from a bag. Do you think it would change the results if instead you picked beads from a barrel (with the same proportion of black beads to white beads)?
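The bag-versus-barrel question can be checked with the standard formula for the standard deviation of a sample proportion when drawing without replacement. The bead counts and the proportion of black beads below are assumptions chosen for illustration; the slides do not give them.

```python
from math import sqrt

def sd_of_sample_proportion(pop_size, sample_size, p):
    """Exact SD of a sample proportion when drawing without replacement."""
    finite_correction = (pop_size - sample_size) / (pop_size - 1)
    return sqrt(p * (1 - p) / sample_size * finite_correction)

P_BLACK = 0.3  # hypothetical proportion of black beads

bag = sd_of_sample_proportion(100, 10, P_BLACK)             # assumed bag of 100 beads
barrel = sd_of_sample_proportion(100_000, 10, P_BLACK)      # assumed barrel of 100,000 beads
bigger_sample = sd_of_sample_proportion(100_000, 40, P_BLACK)

print(f"10 beads from the bag:    {bag:.4f}")
print(f"10 beads from the barrel: {barrel:.4f}")
print(f"40 beads from the barrel: {bigger_sample:.4f}")
```

Under these assumptions the bag and barrel values differ only slightly, while quadrupling the sample size roughly halves the spread: the sample size, not the population size, is what matters as long as the sample is a small fraction of the population.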

  9. Bias in Sampling Def: ________ is the tendency for a sample to differ from the population of interest in some systematic way. Def: ______________ (often called undercoverage bias) is introduced when some part of the population is systematically underrepresented in the sample. Ex: • phone survey: • Gettysburg Address:

  10. Bias in Sampling In all sampling procedures, it is very important that every member of the population is given an equal chance to be chosen for the sample! For example, suppose we wanted to estimate the average GPA of students at WHS. If we took a random sample of students entering the front gate from 6:20-7:20 am, would there be a selection bias here?

  11. Bias in Sampling • What part of the population is underrepresented in such a sample? • Will our estimate be accurate?

  12. Bias in Sampling Selection bias also occurs when volunteers self-select themselves for a sample. People who voluntarily respond to surveys tend to have different and stronger opinions than the rest of the population. Ex: • phone-in surveys • magazine mail-in surveys • internet surveys • American Idol

  13. Bias in Sampling Def: ______________ (or measurement bias) occurs when our method of collecting the data tends to produce values that systematically differ from the true population value in some way. Ex: • using a faulty instrument: a miscalibrated scale, thermometer, etc.

  14. Bias in Sampling • wording of questions: changes in wording may influence the response - The Holocaust Poll Error:

  15. Bias in Sampling • wording of questions: changes in wording may influence the response - Use of “welfare” (negative connotations) vs. “assistance to the poor” (more neutral) - A survey conducted several years ago included the following questions “Do you support freedom of speech?” “Do you think that any group, no matter how extreme should be allowed to promote their agenda in a public space?” (paraphrase). The response to the first question was much more positive although they basically ask the same question.

  16. Bias in Sampling • characteristics of the interviewer: subjects may want to impress the interviewer, may not want to offend the interviewer, etc. - “How much did you exercise this past month?” A doctor asking this question might get different answers than a friend. - A company survey asking the question “How much do you surf the internet at work?” gets different results than a home phone survey.

  17. Bias in Sampling • human nature: people often lie to avoid giving embarrassing answers, to impress people around, … - “How many sex partners have you had?” - “How much money did you make last year?”

  18. Bias in Sampling Def: ___________________ occurs when responses are not actually obtained from subjects chosen for the sample. Ex: “Polling’s Dirty Little Secret”

  19. Bias in Sampling Very few surveys, if any, have a 100% response rate, but every effort should be made to make this rate as high as possible. Personal interviews have a better response rate, but are more costly than mail or phone surveys. In all three methods, it is important to follow up on subjects who do not respond the first time. Note: Increasing the sample size is usually a good idea, but if there is bias present, even a very large sample will probably be worthless.

  20. Simple Random Sampling As we discovered with the Gettysburg Address, it is very important to _____________________ members of the sample to avoid selection bias. There are many random sampling procedures, the most basic being a simple random sample.

  21. Simple Random Sampling Def: A _____________________ (SRS) of size n is a sample from the population that is selected in a way that ensures that every member of the population has an equal chance of being selected AND every sample of size n has the same chance of being chosen.

  22. Simple Random Sampling For example, to select an SRS of size 4 from this class, we could write each name on a slip of paper, mix them up, and select 4 names. In this way, each member of the population has the same chance of being chosen, as does each possible group of size 4.

  23. Simple Random Sampling Suppose that a class is half boys and half girls. To get a sample of size 4 from this class, we could write the name of each boy on a slip of paper, mix them up, and select 2. Do the same for the girls. Why is this not an SRS? Note: This procedure is called a stratified random sample. More about this later…
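One way to see the difference is to count the possible samples. The sketch below assumes a class of 10 boys and 10 girls (the slides do not give the class size). Each student has the same chance of selection under both procedures, but the boy/girl procedure can never produce, say, a group of four boys, so not every group of size 4 is equally likely.

```python
from math import comb

BOYS, GIRLS = 10, 10  # hypothetical class sizes

# Number of distinct groups of 4 each procedure can produce
srs_samples = comb(BOYS + GIRLS, 4)
stratified_samples = comb(BOYS, 2) * comb(GIRLS, 2)

# Chance that any one particular student is chosen
chance_srs = 4 / (BOYS + GIRLS)
chance_stratified = 2 / BOYS

print(srs_samples)         # 4845
print(stratified_samples)  # 2025
print(chance_srs, chance_stratified)
```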

  24. Simple Random Sampling Def: A ________________ is a list of all the objects or individuals in the population. If the sampling frame does not include every member, what kind of bias is this?

  25. Simple Random Sampling For example, if this class is the population, then I can use my roll sheet as a sampling frame. To choose an SRS, I could assign each member a number, and then use • a random number generator or • a random digit table to select the sample
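A minimal sketch of the random-number-generator route, assuming a made-up roll sheet (the names and the helper `simple_random_sample` are hypothetical). Each student gets a number, then the numbers are drawn at random without repeats.

```python
import random

# Hypothetical roll sheet serving as the sampling frame.
roll_sheet = ["Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus", "Hana", "Ivan", "Jo"]

def simple_random_sample(frame, n, seed=None):
    """Number the frame 1..N, draw n distinct numbers, return the matching members."""
    rng = random.Random(seed)
    numbered = dict(enumerate(frame, start=1))
    chosen_numbers = rng.sample(sorted(numbered), n)
    return [numbered[k] for k in chosen_numbers]

srs = simple_random_sample(roll_sheet, 4, seed=7)
print(srs)
```

Because `sample` treats every set of 4 numbers as equally likely, this satisfies both parts of the SRS definition.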

  26. Simple Random Sampling When choosing a sample in this way, occasionally the same number will be selected twice. If we allow this, it is called _________________________. In most cases, statisticians do not want to use the same person more than once. This is called ____________ _________________ because after a person is selected, he is not replaced in the sampling frame. However, when the sample size is small relative to the population size (< 10%), there is little practical difference.
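To get a feel for how often a repeat shows up, the chance of at least one repeated number can be computed directly. The sketch below uses the 268 numbered words and a sample of 5 from the earlier activity, and compares it with a hypothetical population of only 20; the function name is made up for this example.

```python
def prob_of_repeat(pop_size, sample_size):
    """Chance of at least one repeated number when drawing with replacement."""
    p_all_distinct = 1.0
    for i in range(sample_size):
        # the (i+1)-th draw must avoid the i numbers already seen
        p_all_distinct *= (pop_size - i) / pop_size
    return 1 - p_all_distinct

p_words = prob_of_repeat(268, 5)      # sample is under 2% of the population
p_small_pop = prob_of_repeat(20, 5)   # sample is 25% of a hypothetical population

print(f"268 words, n = 5: {p_words:.3f}")
print(f" 20 items, n = 5: {p_small_pop:.3f}")
```

When the sample is a small fraction of the population, repeats are rare, which is why the two sampling schemes behave almost identically in that case.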

  27. Simple Random Sampling What are some advantages to using an SRS? • each person and each group is given the same chance to be chosen • most of the inference procedures we will learn are based on an SRS

  28. Simple Random Sampling What are some disadvantages to using an SRS? • you need a sampling frame. • an SRS does not guarantee that the sample will be representative of the population. It is possible that certain groups are over- or under-represented, simply by chance. • if the population is large, you would need a lot of slips of paper and a really large hat! ;-)

  29. Stratified Random Sampling Def: ___________________________ is a method of random sampling which seeks to reduce the variability of an SRS by selecting a random sample from each subgroup of the population. This guarantees that each subgroup, or stratum, is properly represented in the overall sample.

  30. Stratified Random Sampling Note: To be most effective, the members of each stratum should be as similar as possible with regard to the question of interest and very different from the members of the other strata.

  31. Stratified Random Sampling Suppose we wanted to get a stratified random sample of WHS to answer a question about rallies. Since freshmen may have different views than seniors, sophomores have different views than juniors, etc., we want to make sure each group is properly represented in our sample.

  32. Stratified Random Sampling Suppose there are 600 freshmen, 500 sophomores, 500 juniors, and 400 seniors. If we wanted to take a stratified random sample of size 100, how many of each class should be included? Once we determine the number of subjects to select from each stratum, we take an SRS within each stratum.
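The allocation is proportional: each class gets the same share of the sample that it has of the school. A short sketch of that arithmetic follows; the function name is hypothetical, and the leftover-handling step (largest remainders first) is one common convention for cases where the shares are not whole numbers, not something the slides specify.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Split sample_size across strata in proportion to their sizes."""
    total = sum(strata_sizes.values())
    # whole-number part of each stratum's share
    alloc = {k: sample_size * v // total for k, v in strata_sizes.items()}
    # give any leftover places to the strata with the largest remainders
    leftover = sample_size - sum(alloc.values())
    by_remainder = sorted(
        strata_sizes,
        key=lambda k: (sample_size * strata_sizes[k]) % total,
        reverse=True,
    )
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc

whs = {"freshmen": 600, "sophomores": 500, "juniors": 500, "seniors": 400}
allocation = proportional_allocation(whs, 100)
print(allocation)  # {'freshmen': 30, 'sophomores': 25, 'juniors': 25, 'seniors': 20}
```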

  33. Stratified Random Sampling What are the advantages to this method? • It helps to ensure that the sample is representative of the various subgroups within the population. No group will be over- or under-represented. • If strata are chosen correctly, stratifying reduces the variability that is possible in an SRS of the same size. Thus, we can either keep the sample size the same and have more precision OR keep the same precision and reduce the sample size (and costs).

  34. Stratified Random Sampling The River Problem Suppose we wanted to estimate the yield of a corn field. The field is square and divided into 16 equally sized plots (4 rows x 4 columns). A river runs along the eastern edge of the field. We want to take a sample of 4 plots.

  35. Stratified Random Sampling Using a random number generator, pick a simple random sample of 4 plots. Place an X in the 4 plots that you choose.

  36. Stratified Random Sampling Now, randomly choose one plot from each horizontal row. This is called a stratified random sample.

  37. Stratified Random Sampling Finally, randomly choose one plot from each vertical column. This is also a stratified random sample.

  38. Stratified Random Sampling Now, it's time for the harvest! For each of your three samples above, calculate the average yield.

  39. Stratified Random Sampling Make dotplots for each of the three cases.

  40. Stratified Random Sampling What happened? When we stratified by columns, the V___________ of our estimate was greatly reduced. Also note that each plot was centered in the same place, suggesting all 3 methods are U___________________.

  41. Stratified Random Sampling Why does this work? With an SRS, it is possible that we randomly choose 4 plots near the river (giving an estimate that is way too high) or that we choose 4 plots far from the river (giving an estimate that is way too small). However, when we use each column as a stratum, we are guaranteed to get one plot close to the river (high yield), one plot far from the river (low yield), etc. This guarantees that we will have a representative sample with respect to the river.
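The comparison can also be made exactly by listing every possible sample under each design. The yields below are invented for illustration (the transcript does not include the actual yield grid); the only assumption built in is that yield rises toward the river, which is taken to run along the rightmost column.

```python
from itertools import combinations, product
from statistics import pstdev

# Hypothetical yields (rows x columns); the rightmost column borders the river.
YIELDS = [
    [5, 7, 9, 12],
    [4, 7, 10, 13],
    [5, 8, 10, 12],
    [4, 6, 9, 13],
]
PLOTS = [(r, c) for r in range(4) for c in range(4)]

def sample_mean(plots):
    """Average yield of the given (row, column) plots."""
    return sum(YIELDS[r][c] for r, c in plots) / len(plots)

# Every possible sample under each design (equally likely within a design).
srs = [sample_mean(s) for s in combinations(PLOTS, 4)]
by_row = [sample_mean(list(enumerate(cols)))
          for cols in product(range(4), repeat=4)]
by_col = [sample_mean([(r, c) for c, r in enumerate(rows)])
          for rows in product(range(4), repeat=4)]

results = {}
for name, means in [("SRS", srs), ("one per row", by_row), ("one per column", by_col)]:
    results[name] = (sum(means) / len(means), pstdev(means))
    print(f"{name:15s} center = {results[name][0]:.3f}  spread = {results[name][1]:.3f}")
```

With these made-up yields, all three designs are centered on the true field average, but choosing one plot per column gives a much smaller spread than the other two.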

  42. Stratified Random Sampling When should we stratify? If you think there are groups within the population who may be D____________ with regard to the question of interest, you should take an appropriately sized simple random sample from each group.

  43. Stratified Random Sampling In our example, we should anticipate that the river will have an effect on the yield of the plots. Thus, since the plots near the river are similar to each other (but different than the rest of the plots) stratifying by columns is the best method.

  44. Stratified Random Sampling Ex: population: United States adults question of interest: affirmative action possible strata: non-effective strata:

  45. Stratified Random Sampling Ex: population: WHS question of interest: AP Program possible strata: non-effective strata:

  46. Stratified Random Sampling Note: The reason why we stratify is to get a representative sample and reduce the variability that is possible in an SRS. The purpose is NOT to compare the results between strata, although this is a secondary benefit.

  47. Stratified Random Sampling What are the disadvantages to this method? We need a sampling frame which includes characteristics for the entire population to use when stratifying. This could be difficult when the population is large. The statistical analysis is more difficult with a stratified random sample.

  48. Other Sampling Methods Next
