Sampling
The basic problem • You want to make a general statement about a large group of people (a population). • The population size makes studying everyone impractical. • You select part of the group (a sample) to study, and you measure the numerical facts of interest for the sample (statistics). • Use these statistics to generalize (infer) to the corresponding numerical facts about the population (parameters).
1936 Presidential Election Alf Landon (Republican) Franklin Roosevelt (Democrat)
1936 Presidential Election • To predict the winner, Literary Digest magazine mailed out 10 million questionnaires to addresses from telephone books and vehicle registrations. • 2.4 million responded: 57% said they’d vote for Landon • The election result: Roosevelt won 62%-38%. • (Literary Digest soon went bankrupt)
1936 Presidential Election • How was the LD sample different from the population of all voters? • Consider what kinds of people had phones and cars in 1936, and which party those kinds of people tended to vote for. • The LD sample systematically favored wealthier people, and wealthier people tended to vote Republican.
Bias • Selection bias: a systematic tendency of the sampling procedure to exclude part of the population • Example: randomly choosing names from a phone book, which excludes everyone without a telephone • Non-response bias: a tendency of survey respondents to differ from those who didn’t respond • Sometimes indicated by a high non-response rate
Bias • If a sampling procedure is biased, a larger sample size won’t help. • Bias can’t always be detected by looking at data. You have to ask how the sample was chosen. • So…did pollsters fix the bias issue?
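To see why a larger sample cannot cure a biased procedure, here is a minimal Python simulation. The electorate is hypothetical: the 40% phone-ownership rate and the voting rates (0.6 Republican among phone owners, 0.3 among everyone else) are assumptions chosen for illustration, not 1936 data, and the function names are our own. Drawing only from the "phone book" keeps the estimate near the phone owners' Republican share at every sample size, while a random draw from all voters settles near the true share.

```python
import random


def make_electorate(size, seed=0):
    """Build a hypothetical electorate of (has_phone, votes_republican) pairs.

    Assumed rates, for illustration only: 40% of voters own a phone; phone
    owners vote Republican with probability 0.6, everyone else with 0.3.
    """
    rng = random.Random(seed)
    voters = []
    for _ in range(size):
        has_phone = rng.random() < 0.4
        p_republican = 0.6 if has_phone else 0.3
        voters.append((has_phone, rng.random() < p_republican))
    return voters


def republican_share(voters):
    return sum(rep for _, rep in voters) / len(voters)


def phone_book_sample(voters, n, rng):
    # Biased frame: only voters listed in the "phone book" can be drawn.
    frame = [v for v in voters if v[0]]
    return rng.sample(frame, n)


def random_sample(voters, n, rng):
    # Unbiased frame: every voter can be drawn, each with the same chance.
    return rng.sample(voters, n)


if __name__ == "__main__":
    electorate = make_electorate(100_000)
    rng = random.Random(1)
    print(f"true Republican share: {republican_share(electorate):.3f}")
    for n in (100, 1_000, 10_000):
        biased = republican_share(phone_book_sample(electorate, n, rng))
        fair = republican_share(random_sample(electorate, n, rng))
        print(f"n={n:>6}: phone book {biased:.3f}, random {fair:.3f}")
```

As n grows, the phone-book figure only becomes a more precise estimate of the wrong quantity.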
1948 Presidential Election Thomas Dewey (Republican) Harry Truman (Democrat)
1948 Presidential Election • Three major polls covered the election. All used large sample sizes. • All three polls used a different sampling method from the one Literary Digest used.
Quota Sampling • Goal: Create a sample that faithfully represents the target population with respect to key characteristics. • Implementation: Define categories of interest (e.g., residence, sex, age, race, income). Set a fixed number of subjects to interview overall and in each category. Interviewers select freely within categories.
Quota Sampling • Example: A Gallup poll interviewer was required to interview exactly 13 people. • 6 from the suburbs, 7 from the city • 7 men, 6 women • Of the 7 men (with similar quotas for the women): • 3 under age 40, 4 over age 40 • 1 black, 6 white • Of the 6 white men: • 1 paid over $44 in monthly rent, 2 paid less than $18, and the other 3 paid in between
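The quota above amounts to a checklist of counts. This is a small sketch assuming a hypothetical record layout: the field names (`sex`, `area`, `age`, `race`, `rent`) and the example interviewees are invented. The 3 white men not covered by the two stated rent brackets are placed in an assumed `18to44` band by subtraction, and the women's sub-quotas are not encoded because the example only says they were similar. The check says nothing about *which* people fill each slot; that choice was left to the interviewer.

```python
from collections import Counter

FIELDS = ("sex", "area", "age", "race", "rent")

# Hypothetical interviewees for one interviewer (invented for illustration).
# The women's age/race/rent values are arbitrary: their sub-quotas are not checked.
EXAMPLE_ROWS = [
    ("man", "city", "under40", "black", "18to44"),
    ("man", "city", "under40", "white", "over44"),
    ("man", "city", "under40", "white", "18to44"),
    ("man", "suburb", "over40", "white", "18to44"),
    ("man", "suburb", "over40", "white", "18to44"),
    ("man", "suburb", "over40", "white", "under18"),
    ("man", "city", "over40", "white", "under18"),
    ("woman", "suburb", "under40", "white", "18to44"),
    ("woman", "suburb", "over40", "white", "over44"),
    ("woman", "suburb", "over40", "black", "under18"),
    ("woman", "city", "under40", "white", "18to44"),
    ("woman", "city", "over40", "white", "under18"),
    ("woman", "city", "under40", "white", "18to44"),
]
EXAMPLE = [dict(zip(FIELDS, row)) for row in EXAMPLE_ROWS]


def check_quota(people):
    """Return a list of quota violations; an empty list means the quota is met."""
    problems = []

    def expect(group, label, field, targets):
        counts = Counter(person[field] for person in group)
        for value, needed in targets.items():
            if counts[value] != needed:
                problems.append(f"{label}: {field}={value} has {counts[value]}, needs {needed}")

    if len(people) != 13:
        problems.append(f"all: needs 13 interviews, has {len(people)}")
    expect(people, "all", "area", {"suburb": 6, "city": 7})
    expect(people, "all", "sex", {"man": 7, "woman": 6})
    men = [p for p in people if p["sex"] == "man"]
    expect(men, "men", "age", {"under40": 3, "over40": 4})
    expect(men, "men", "race", {"black": 1, "white": 6})
    white_men = [p for p in men if p["race"] == "white"]
    # "18to44" is inferred: 6 white men minus the 1 + 2 in the stated brackets.
    expect(white_men, "white men", "rent", {"over44": 1, "18to44": 3, "under18": 2})
    return problems


if __name__ == "__main__":
    print(check_quota(EXAMPLE))       # [] : every count matches
    print(check_quota(EXAMPLE[:-1]))  # lists the counts broken by dropping one city woman
```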
Quota Sampling • The Gallup poll seems to guarantee that the sample will resemble the voting population in every meaningful way. What went wrong? • The interviewers were free to choose whom to interview within each category, and this introduced bias. • In 1948, Republicans (in each category) were slightly easier to reach for interviews because they tended to be wealthier and better educated, and were more likely to own telephones, have addresses, etc.
Quota Sampling • The bias in quota sampling is generally unintentional on the part of interviewers. • Before 1948, the Democratic majorities were so large that this bias did not change which candidate the polls picked. In a close race, the same bias was large enough to matter. • Can we remove this bias from an otherwise sensible approach to sampling?
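The mechanism described above can be simulated: the quota for each area is met exactly, but within each area the interviewer keeps whoever agrees first. This is a hedged sketch with invented numbers (Republican rates of 0.40 in the city and 0.50 in the suburbs, and interview success rates of 0.60 for Republicans versus 0.45 for Democrats); it illustrates the direction of the effect, not the size of the 1948 error.

```python
import random

# Hypothetical electorate split into two quota categories. Every rate here is
# an assumption chosen for illustration, not a historical figure.
REP_RATE = {"city": 0.40, "suburb": 0.50}  # share voting Republican in each area
REACH_RATE = {True: 0.60, False: 0.45}     # chance an approached voter is interviewed (True = Republican)


def make_electorate(size, rng):
    """Return a list of (area, votes_republican) pairs."""
    voters = []
    for _ in range(size):
        area = rng.choice(["city", "suburb"])
        voters.append((area, rng.random() < REP_RATE[area]))
    return voters


def quota_sample(voters, n, rng):
    """Meet a proportional quota in each area, but let the interviewer keep
    whoever agrees first, so easier-to-reach voters are over-represented."""
    sample = []
    for area in REP_RATE:
        members = [v for v in voters if v[0] == area]
        quota = round(n * len(members) / len(voters))
        rng.shuffle(members)  # the order in which the interviewer meets people
        taken = 0
        for voter in members:
            if taken == quota:
                break
            if rng.random() < REACH_RATE[voter[1]]:
                sample.append(voter)
                taken += 1
    return sample


def republican_share(voters):
    return sum(rep for _, rep in voters) / len(voters)


if __name__ == "__main__":
    rng = random.Random(1948)
    electorate = make_electorate(100_000, rng)
    quota = quota_sample(electorate, 5_000, rng)
    srs = rng.sample(electorate, 5_000)
    print(f"true share    {republican_share(electorate):.3f}")
    print(f"quota sample  {republican_share(quota):.3f}")
    print(f"random sample {republican_share(srs):.3f}")
```

Every area count in the quota sample is correct, yet the Republican share is pushed upward, because the quotas constrain the categories and not who is chosen inside them.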
Probability Methods • Interviewers have no discretion at all over whom they interview. • The sampling procedure deliberately uses chance to select individuals. • Investigators can compute the probability that any particular individual will be selected. • Quota sampling fails these tests.
Probability Methods • Simple Random Sampling: Each individual is given a number, and numbers are drawn at random without replacement. • Each person has an equal chance of being selected. • As the sample size increases, each sample proportion tends to get closer to the corresponding population proportion (law of averages). • Still impractical for very large populations, since every member of the population must be listed and numbered.
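Simple random sampling can be sketched with Python's standard `random.sample`, which draws without replacement. The population (200,000 people, exactly 30% of whom have some trait) is hypothetical, and the function names are our own. Individual runs fluctuate, but the sample proportion typically lands closer to 0.300 as n grows.

```python
import random


def simple_random_sample(population, n, rng):
    # Give each individual a number (its index) and draw n distinct numbers
    # at random; every individual has the same chance, n / N, of selection.
    numbers = rng.sample(range(len(population)), n)
    return [population[i] for i in numbers]


def proportion(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical population: the first 60,000 of 200,000 people have the trait.
    population = [i < 60_000 for i in range(200_000)]
    for n in (10, 100, 1_000, 10_000):
        estimate = proportion(simple_random_sample(population, n, rng))
        print(f"n={n:>6}: sample proportion {estimate:.3f} (population 0.300)")
```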
Probability Methods • Cluster sampling: • Divide the population into “natural” groups (clusters), such as geographic areas. • Randomly choose which groups to study. • Randomly select individuals from the chosen groups. • This can be done in stages (multistage sampling), dividing each chosen group into subgroups several times.
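A two-stage version of cluster sampling is sketched below under assumed data: the towns, their unequal sizes, and the function name are hypothetical. Besides drawing the sample, it records each selected person's probability of selection, (clusters chosen / clusters total) × (people chosen / cluster size), which is the kind of calculation probability methods make possible. Extra stages nest the same way, and their probabilities multiply.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, per_cluster, rng):
    """Stage 1: pick n_clusters clusters at random, without replacement.
    Stage 2: pick per_cluster individuals at random within each chosen cluster.

    Returns (individual, selection_probability) pairs, where the probability is
    (n_clusters / number of clusters) * (people taken / cluster size).
    """
    names = sorted(clusters)
    chosen = rng.sample(names, n_clusters)
    picked = []
    for name in chosen:
        members = clusters[name]
        k = min(per_cluster, len(members))  # take everyone if the cluster is small
        prob = (n_clusters / len(names)) * (k / len(members))
        picked.extend((person, prob) for person in rng.sample(members, k))
    return picked


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical towns of sizes 20, 30, ..., 70.
    towns = {f"town{t}": [f"t{t}-p{i}" for i in range(20 + 10 * t)] for t in range(6)}
    for person, prob in two_stage_cluster_sample(towns, 2, 5, rng):
        print(person, round(prob, 4))
```

Because the towns differ in size, people in smaller towns have a higher chance of selection; an estimate built from such a sample would need to account for that (for example by weighting), which this sketch does not do.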
Probability Methods • Post-1948 Gallup Poll sampling method
Do Probability Methods Work? • Some bias is inevitable in any survey. • Using chance to select the sample introduces chance error (also called sampling error). • Nonetheless, the improvements in accuracy are noticeable.
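Unlike bias, chance error can be quantified. For a simple random sample of size n from a large population with proportion p, the standard error of the sample proportion is approximately sqrt(p(1 - p)/n), so quadrupling the sample size halves the chance error. The sketch below compares that formula with the spread of simulated estimates; draws are made with replacement, an assumption that approximates sampling from a very large population, and the function names are our own.

```python
import math
import random
import statistics


def standard_error(p, n):
    """Approximate chance error of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def simulated_spread(p, n, trials, rng):
    """Standard deviation of `trials` simulated sample proportions.

    Each draw is an independent Bernoulli(p) trial (sampling with replacement),
    which approximates sampling from a very large population.
    """
    estimates = [sum(rng.random() < p for _ in range(n)) / n for _ in range(trials)]
    return statistics.pstdev(estimates)


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 400, 1_600):
        print(f"n={n:>5}: formula {standard_error(0.5, n):.4f}, "
              f"simulated {simulated_spread(0.5, n, 1_000, rng):.4f}")
```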