1 / 76

Understanding Sampling and Surveys

Learn about the importance of sampling in statistical studies and how to avoid biased results. Explore different sampling methods and the concept of random sampling.

lwinkler
Download Presentation

Understanding Sampling and Surveys

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 4Designing Studies • 4.1 Samples and Surveys Chapter 4 2, 4, 6, 8, 12, 18, 22, 24, 26, 36, 46, 52, 56, 58, 64, 68, 72, 74, 78, 80, 84, 88, 102, 104 Go to http://bcs.whfreeman.com/tps4e And set up a student account

  2. You can hardly go a day without hearing the results of a statistical study. Here are some examples: Social Media and Teens Underage Drinking • According to a Common Sense Media study, nine out of ten 13- to 17-year-olds have used some form of social media. Three out of four teenagers currently have a profile on a social networking site, and one in five has a current Twitter account. 68% of all teens say Facebook is their main social networking site, compared to 6% for Twitter, 1% for Google Plus, and 1% for MySpace. • During the past month (30 days), 26.4% of underage persons (ages 12-20) used alcohol, and binge drinking among the same age group was 17.4%. SAMHSA Can we trust the results? That depends on how the data were produced.

  3. Sampling and Surveys • Population and Sample The distinction between population and sample is basic to statistics. To make sense of any sample result, you must know what population the sample represents Definition: The population in a statistical study is the entire group of individuals about which we want information. A sample is the part of the population from which we actually collect information. We use information from a sample to draw conclusions about the entire population. Population Collect data from a representative Sample... Sample Make an Inference about the Population.

  4. ACU#1 • Sampling Students and Soda • Problem: Identify the population and sample in each of the following settings. • The student government at a high school surveys 100 of the students at the school to get their opinions about a change to the bell schedule. • (a) Population is all students at the school, sample is the 100 students surveyed. • (b) The quality control manager at a bottling company selects a sample of 10 cans from the production line every hour to see if the volume of the soda is within acceptable limits. • (b) Population is all cans produced that hour, sample is 10 cans inspected.

  5. Sampling and Surveys • The Idea of a Sample Survey We often draw conclusions (inferences) about a whole population on the basis of a sample. Choosing a sample from a large, varied population is not that easy. Step 1:Define the population we want to describe. Step 2: Say exactly what we want to measure. Step 3: Decide how to choose a sample from the population. A “sample survey” is a study that uses an organized plan to choose a sample that represents some specific population.

  6. Sampling and Surveys • How to Sample Badly How can we choose a sample that we can trust to represent the population? There are a number of different methods to select samples. Definition: Choosing individuals who are easiest to reach results in a convenience sample. Convenience samples often produce unrepresentative data…why? Definition: The design of a statistical study shows bias if it systematically favors certain outcomes.

  7. Sampling and Surveys • How to Sample Badly • Convenience samples are almost guaranteed to show bias. So are voluntary response samples, in which people decide whether to join the sample in response to an open invitation. Definition: A voluntary response sample consists of people who choose themselves by responding to a general appeal. Voluntary response samples show bias because people with strong opinions (often in the same direction) are most likely to respond.

  8. CU#1 • For each of the following situations, identify the sampling method used. Then explain how the sampling method could lead to bias. • A farmer brings a juice company several crates of oranges each week. A company inspector looks at 10 oranges from the top of each crate before deciding whether to buy all the oranges. • The ABC program Nightline once asked whether the United Nations should continue to have its headquarters in the US. Viewers were invited to call one telephone number to respond “Yes” and another for “No”. There was a charge for calling either number. More that 186,000 callers responded and 67% said, “No”.

  9. Sampling and Surveys • How to Sample Well: Random Sampling • The statistician’s remedy is to allow impersonal chance to choose the sample. A sample chosen by chance rules out both favoritism by the sampler and self-selection by respondents. • Random sampling, the use of chance to select a sample, is the central principle of statistical sampling. Definition: A simple random sample (SRS) of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected. In practice, people use random numbers generated by a computer or calculator to choose samples. If you don’t have technology handy, you can use a table of random digits.

  10. Definition: A simple random sample (SRS) of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected. Select a sample of size 4 from our class to illustrate the definition of SRS Need two recorders Let’s force the number the sample to have 2 girls and 2 boys. Why is this not an SRS? Since samples of 4 that do not contain 2 girls and 2 boys have no chance of occurring. In an SRS, each possible sample of 4 has the same chance of occurring. In practice, people use random numbers generated by a computer or calculator to choose samples. If you don’t have technology handy, you can use a table of random digits.

  11. Definition: • A table of random digits is a long string of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with these properties: • • Each entry in the table is equally likely to be any of the 10 digits 0 - 9. • • The entries are independent of each other. That is, knowledge of one part of the table gives no information about any other part. Sampling and Surveys • How to Choose an SRS How to Choose an SRS Using Table D Step 1: Label. Give each member of the population a numerical label of the same length. Step 2: Table. Read consecutive groups of digits of the appropriate length from Table D. Your sample contains the individuals whose labels you find.

  12. Sampling and Surveys • Example: How to Choose an SRS • Problem: Use Table D at line 130 to choose an SRS of 4 hotels. 01 Aloha Kai 08 Captiva 15 Palm Tree 22 Sea Shell 02 Anchor Down 09 Casa del Mar 16 Radisson 23 Silver Beach 03 Banana Bay 10 Coconuts 17 Ramada 24 Sunset Beach 04 Banyan Tree 11 Diplomat 18 Sandpiper 25 Tradewinds 05 Beach Castle 12 Holiday Inn 19 Sea Castle 26 Tropical Breeze 06 Best Western 13 Lime Tree 20 Sea Club 27 Tropical Shores 07 Cabana 14 Outrigger 21 Sea Grape 28 Veranda 69051 64817 87174 09517 84534 06489 87201 97245 69 05 16 48 17 87 17 40 95 17 84 53 40 64 89 87 20 Our SRS of 4 hotels for the editors to contact is: 05 Beach Castle, 16 Radisson, 17 Ramada, and 20 Sea Club.

  13. ACU#2

  14. SRS using our Graphing Calculators (Tech Corner pg 214) Seed the random number generator for TI-83/84, enter in a unique number on your screen (cell phone number, student number). Then, press the STO button (just above the ON button). Finally, have them press the MATH button, scroll to the PRB menu, choice the first option, rand, and press ENTER 01 Aloha Kai 08 Captiva 15 Palm Tree 22 Sea Shell 02 Anchor Down 09 Casa del Mar 16 Radisson 23 Silver Beach 03 Banana Bay 10 Coconuts 17 Ramada 24 Sunset Beach 04 Banyan Tree 11 Diplomat 18 Sandpiper 25 Tradewinds 05 Beach Castle 12 Holiday Inn 19 Sea Castle 26 Tropical Breeze 06 Best Western 13 Lime Tree 20 Sea Club 27 Tropical Shores 07 Cabana 14 Outrigger 21 Sea Grape 28 Veranda

  15. A university’s financial aid office wants to know how much it can expect students to earn from summer employment. This information will be used to set the level of financial aid. The population contains 3478 students who have completed at least one year of study but have not yet graduated. A questionnaire will be sent to an SRS of 100 of these students drawn from an alphabetized list. • Describe how you will select the sample. • (b) Starting at line 135, use the portion of the random digits table below to select the first three students in the sample. • 135 66925 55658 39100 78458 11206 19876 87151 31260 • 136 08421 44753 77377 28744 75592 08563 79140 92454 • 137 53645 66812 61421 47836 12609 15373 98481 14592

  16. The basic idea of good sampling is straightforward: take an SRS from the population and use your sample results to gain information about the population. • SRS: • every individual has = chance of getting picked, • every sample of the size you are drawing has • = chance of getting picked Unfortunately, it’s usually very difficult to actually get an SRS from the population of interest. Sometimes, there are also statistical advantages to using more complex samplings methods.

  17. Definition: To select a stratified random sample, first classify the population into groups of similar individuals, called strata. Then choose a separate SRS in each stratum and combine these SRSs to form the full sample. If the individuals in each stratum are less varied than the population as a whole, a stratified random sample can often produce better information about the population than an SRS of the same size. Strata Same Within and Different Between EX. The population of students in a large high school might be divided into freshman, sophomore, junior, and senior strata.

  18. Activity: Sampling Sunflowers pg. 216 • Use Table D or technology to take an SRS of 10 grid squares, then using the rows as strata. Then, repeat using the columns as strata.

  19. Cell Phones Only A growing percentage of Americans are dropping their traditional land line phones and using their cell phones exclusively. This has caused a problem for polling organizations who in the past only used land lines when randomly selecting people for their polls. Since the group of people who use cell phones exclusively are different in many ways than people who do not rely exclusively on cell phones, polling organizations now use a stratified sampling design that selects a random sample of cell phone users and a random sample of land line users.

  20. Sampling and Surveys • Other Sampling Methods • Although a stratified random sample can sometimes give more precise information about a population than an SRS, both sampling methods are hard to use when populations are large and spread out over a wide area. • In that situation, we’d prefer a method that selects groups of individuals that are “near” one another. Definition: To take a cluster sample, first divide the population into smaller groups. Ideally, these clusters should mirror the characteristics of the population. Then choose an SRS of the clusters. All individuals in the chosen clusters are included in the sample. Cluster Different Within and Same Between

  21. Sampling Sunflowers • Which, rows or columns, could be used for clusters? Why?

  22. Strata (Same Within group/Different Between groups) Rows or Section? Cluster(Different Within group/Same Between groups) Rows or Section?

  23. A Hotel on the Beach The manager of a beach-front hotel wants to survey guests in the hotel to estimate overall customer satisfaction. The hotel has two towers, an older one to the south and a newer one to the north. Each tower has 10 floors of standard rooms (40 rooms per floor) and 2 floors of suites (20 suites per floor). Half of the rooms in each tower face the beach, while the other half of the rooms face the street. This means there are (2 towers)(10 floors)(40 rooms) + (2 towers)(2 floors)(20 suites) = 880 total rooms. Problem: (a) Explain how to select a simple random sample of 88 rooms. (b) Explain how to select a stratified random sample of 88 rooms. (c) Explain why selecting 2 of the 24 different floors would not be a good way to obtain a cluster sample. (b) Since customer satisfaction will vary based on room location (e.g. a suite in the north tower facing the ocean is much nicer than a room in the south tower facing the street), we should take a sample from each type of room. We will take an SRS of 20 from the 200 south tower rooms facing the beach SRS of 20 from the 200 south tower rooms facing the street SRS of 20 from the 200 north tower rooms facing the beach SRS of 20 from the 200 north tower rooms facing the street SRS of 2 from the 20 south tower suites facing the beach SRS of 2 from the 20 south tower suites facing the street SRS of 2 from the 20 north tower suites facing the beach SRS of 2 from the 20 north tower suites facing the street (c) Although it would be easy to collect the data once the floors were selected (since you only need to visit two floors), each floor is more homogenous than heterogeneous, which is a bad things for clusters. Ideally, each cluster should include all the different types of rooms. If you only selected 2 floors, it is fairly likely that you would get no suites in the sample or only get one of the towers in the sample. a) Number each of the rooms from 001 to 880. Using a random number generator, select 88 unique random integers from 001 to 880. Use the selected rooms for the sample

  24. 1. Your school will send a delegation of 35 seniors to a student life convention. 200 girls and150 boys are eligible to be chosen. If a sample of 20 girls and separate sample 15 boys are each selected randomly, it gives each senior the same chance to be chosen to attend the convention. Is it an SRS? Explain.

  25. 2.

  26. 18. Dead trees. On the west side of Rocky Mountain National Park, many mature pine trees are dying due to infestation by pine beetles. Scientists would like to use sampling to estimate the proportion of all pine trees in the area that have been infected. (a) Explain why it wouldn’t be practical for scientists to obtain an SRS in this setting. To obtain an SRS, every tree would need to have an equal chance of being included in the sample. It is not practical to even identify every tree in the park.

  27. 18. Dead trees. On the west side of Rocky Mountain National Park, many mature pine trees are dying due to infestation by pine beetles. Scientists would like to use sampling to estimate the proportion of all pine trees in the area that have been infected. (b) A possible alternative would be to use every pine tree along the park’s main road as a sample. Why is this sampling method biased? This sampling method is biased because these trees are unlikely to be representative of the population. Trees along the main road are more likely to be damaged by cars and people and may be more susceptible to infestation.

  28. 18. Dead trees. On the west side of Rocky Mountain National Park, many mature pine trees are dying due to infestation by pine beetles. Scientists would like to use sampling to estimate the proportion of all pine trees in the area that have been infected. (c) Suppose that a more complicated random sampling plan is carried out, and that 35% of the pine trees in the sample are infested by the pine beetle. Can scientists conclude that 35% of all the pine trees on the west side of the park are infested? Why or why not? The scientists can be confident that the actual percent of pine trees in the area that are infected by the pine beetle is near 35%, although there is always some error associated with using sampling to estimate population parameters. Parameter-is a number that describes some characteristic of the population

  29. “Margin of error” does not mean that a mistake has been made but rather compensates for the variability that results from taking a random sample from a population. Sampling and Surveys • Inference for Sampling • The purpose of a sample is to give us information about a larger population. • The process of drawing conclusions about a population on the basis of sample data is called inference. • Why should we rely on random sampling? • To eliminate bias in selecting samples from the list of available individuals. • The laws of probability allow trustworthy inference about the population • Results from random samples come with a margin of error that sets bounds on the size of the likely error. • Larger random samples give better information about the population than smaller samples.

  30. Sampling Variability is described by the margin of error that comes with most poll results. • Most sample surveys are affected by errors in addition to sampling variability. • These errors can introduce bias that make a survey result meaningless. • Two main sources of errors in sample surveys: • Sampling errors • Non-sampling errors

  31. Sampling errors Sampling errors are mistakes made in the process of taking a sample that could lead to inaccurate information about the population. • Random sample margin of error • bad sampling methods, such as voluntary response and convenience samples. We can control these. • Undercoverage which occurs when some members of the population are left out of the sampling frame

  32. Non-sampling errors Nonresponse occurs when an individual chosen for the sample can’t be contacted or refuses to participate. Response bias is when people intentionally answer wrong Poor wording of questions

  33. Don’t misuse the term “voluntary response” to explain why certain individuals don’t respond in a sample survey. Nonresponse can occur only after a sample has been selected. In voluntary response sample, every individual has opted to take part, so there won’t be any nonresponse.

  34. Section 4.2Experiments Learning Objectives After this section, you should be able to… • DISTINGUISH observational studies from experiments • DESCRIBE the language of experiments • APPLY the three principles of experimental design

  35. http://apcentral.collegeboard.com/apc/public/repository/ap10_statistics_form_b_q2.pdfhttp://apcentral.collegeboard.com/apc/public/repository/ap10_statistics_form_b_q2.pdf

  36. Experiments • Observational Study versus Experiment In contrast to observational studies, experiments don’t just observe individuals or ask them questions. They actively impose some treatment in order to measure the response. Definition: An observational study observes individuals and measures variables of interest butdoes not attempt to influence the responses. An experiment deliberately imposes some treatment on individuals to measure their responses. When our goal is to understand cause and effect, experiments are the only source of fully convincing data. Let me repeat… When our goal is to understand cause and effect, experiments are the only source of fully convincing data. The distinction between observational study and experiment is one of the most important in statistics.

  37. We think that car weight helps explain accident deaths and that smoking influences life expectancy. Car weight accident deaths Smoking life expectancy In these relationships, the two variables play different roles. Car weight and number of cigarettes smoked are the explanatory variables Accident death rate and life expectancy are the response variables An explanatory variable may help explain or influence changes in a response variable A response variable measures an outcome of a study .

  38. Soy Good For You? The November 2009 issue of Nutrition Action discusses what the current research tells us about the supposed benefits of soy. For a long time, scientists have believed that the soy foods in Asian diets explains the lower rates of breast cancer, prostate cancer, osteoporosis, and heart disease in places like China and Japan. However, when experiments were conducted, soy either had no effect or a very small effect on the health of the participants. For example, several different studies randomly assigned elderly women either to soy or placebo, and none of the studies showed that soy was more beneficial for preventing osteoporosis. So what explains the lower rates of osteoporosis in Asian cultures? We still don’t know. It could be due to genetics, other dietary factors, or any other difference between Asian cultures and non-Asian cultures. An explanatory variable may help explain or influence changes in a response variable. Whether the woman consumed soy. A response variable measures an outcome of a study. The overall health of the woman

  39. Weight and Height Julie wants to know if she can predict a student’s weight from his or her height. Information about height is easier to obtain than information about weight! Jim wants to know if there is a relationship between height and weight . Problem: For each student, identify the explanatory variable and response variable, if possible Solution: Julie is treating a student’s height as the explanatory variable and the student’s weight as the response variable. Jim is just interested in exploring the relationship between the two variables, so there is no clear explanatory or response variable.

  40. Experiments • Observational Study versus Experiment Observational studies of the effect of one variable on another often fail because of confounding between the explanatory variable and one or more lurking variables. Definition: A lurking variable is a variable that is not among the explanatory or response variables in a study but that may influence the response variable. Confounding occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other. Well-designed experiments take steps to avoid confounding.

  41. For example, ice cream consumption and murder rates are highly correlated. Now, does ice cream incite murder or does murder increase the demand for ice cream? Neither: they are joint effects of a common cause or lurking variable, namely, hot weather. Another look at the sample shows that it failed to account for the time of year, including the fact that both rates rise in the summertime

  42. What could be a lurking variable in these examples? • There is a strong positive correlation between the foot length of K-12 students and reading scores. • Students who use tutors have lower test scores than students who don’t. • A survey shows a strong positive correlation between the percentage of a country's inhabitants that use cell phones and the life expectancy in that country

  43. Experiments • The Language of Experiments An experiment is a statistical study in which we actually do something (a treatment) to people, animals, or objects (the experimental units) to observe the response. Here is the basic vocabulary of experiments. Definition: A specific condition applied to the individuals in an experiment is called a treatment. If an experiment has several explanatory variables, a treatment is a combination of specific values of these variables. The experimental units are the smallest collection of individuals to which treatments are applied. When the units are human beings, they often are called subjects. Sometimes, the explanatory variables in an experiment are called factors. Many experiments study the joint effects of several factors. In such an experiment, each treatment is formed by combining a specific value (often called a level) of each of the factors.

  44. Experiment • How to Experiment Badly • Experiments are the preferred method for examining the effect of one variable on another. By imposing the specific treatment of interest and controlling other influences, we can pin down cause and effect. Good designs are essential for effective experiments, just as they are for sampling. Example, page 236 A high school regularly offers a review course to prepare students for the SAT. This year, budget cuts will allow the school to offer only an online version of the course. Over the past 10 years, the average SAT score of students in the classroom course was 1620. The online group gets an average score of 1780. That’s roughly 10% higher than the long- time average for those who took the classroom review course. Is the online course more effective? Students -> Online Course -> SAT Scores

  45. Experiment • How to Experiment Badly • Many laboratory experiments use a design like the one in the online SAT course example: Experimental Units Treatment Measure Response Students ---> Online Course -----> SAT Scores In the lab environment, simple designs often work well. Field experiments and experiments with animals or people deal with more variable conditions. Outside the lab, badly designed experiments often yield worthless results because of confounding.

  46. Experiments • How to Experiment Well: The Randomized Comparative Experiment • The remedy for confounding is to perform a comparative experiment in which some units receive one treatment and similar units receive another. Most well designed experiments compare two or more treatments. • Comparison alone isn’t enough, if the treatments are given to groups that differ greatly, bias will result. The solution to the problem of bias is random assignment. Definition: In an experiment, random assignment means that experimental units are assigned to treatments at random, that is, using some sort of chance process.

  47. ACU # Does Caffeine Affect Pulse Rate? Many students regularly consume caffeine to help them stay alert. Thus, it seems plausible that taking caffeine might increase an individual’s pulse rate. Is this true? One way to investigate this is to have volunteers measure their pulse rates, drink some cola with caffeine, measure their pulses again after 10 minutes and calculate the increase in pulse rate. Unfortunately, even if every student’s pulse rate went up, we couldn’t attribute the increase to caffeine. Suppose you have a class of 30 students who volunteer to be subjects in the caffeine experiment described earlier. Problem: Explain how you would randomly assign 15 students to each of the two treatments. Solution: Using 30 identical slips of paper, write A on 15 and B on the other 15. Mix them thoroughly in a hat and have each student select one paper. Have each student who received an A drink the cola with caffeine and each student who received a B to drink the cola without caffeine.

  48. Experiments • The Randomized Comparative Experiment Definition: In a completely randomized design, the treatments are assigned to all the experimental units completely by chance. Some experiments may include a control group that receives an inactive treatment or an existing baseline treatment. Treatment 1 Group 1 Compare Results Random Assignment Experimental Units Group 2 Treatment 2

  49. ACU # Dueling Diets A health organization wants to know if a low-carb or low-fat diet is more effective for long-term weight loss. The organization decides to conduct an experiment to compare these two diet plans with a control group that is only provided with brochure about healthy eating. Ninety volunteers agree to participate in the study for one year. Problem: Outline a completely randomized design for this experiment. Write a few sentences describing how you would implement your design.

More Related