Sampling

The basic problem • You want to make a general statement about a large group of people (a population). • The population is too large for you to study everyone. • You select a part of the group (a sample) to study. • You measure numerical facts of interest for the sample (statistics) and use them to estimate the corresponding facts about the population (parameters). • You use statistical methods to generalize (infer) from the sample to the population.

1936 Presidential Election Alf Landon (Republican) Franklin Roosevelt (Democrat)

1936 Presidential Election • To predict the winner, Literary Digest magazine mailed out 10 million questionnaires to addresses from telephone books and vehicle registrations. • 2.4 million responded: 57% said they’d vote for Landon • The election result: Roosevelt won 62%-38%. • (Literary Digest soon went bankrupt)
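The figures on this slide can be checked with a few lines of arithmetic. This is a minimal sketch in Python that uses the rounded numbers quoted above; the variable names are illustrative only.

```python
# Figures as quoted on the slide (rounded).
mailed = 10_000_000      # questionnaires sent out
returned = 2_400_000     # questionnaires that came back

response_rate = returned / mailed   # fraction of mailings that were answered
predicted_landon = 0.57             # Landon's share among respondents
actual_landon = 0.38                # Landon's share in the election

# Size of the prediction error, in percentage points.
error_points = round((predicted_landon - actual_landon) * 100)

print(f"Response rate: {response_rate:.0%}")
print(f"Prediction error: {error_points} percentage points")
```

Roughly three out of four questionnaires were never returned, and the prediction missed Landon's actual share by 19 percentage points despite the 2.4 million responses.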

1936 Presidential Election • How was the LD sample different from the population of all voters? • Consider what kinds of people had phones and cars in 1936, and which party those kinds of people tended to vote for. • The LD sample systematically favored wealthier people, and wealthier people tended to vote Republican.

Bias • Selection bias: a systematic tendency of the sampling procedure to exclude a portion of the population • Example: choosing at random from a phone book excludes everyone without a listed phone • Non-response bias: a tendency of survey respondents to differ from those who did not respond • Sometimes indicated by a large non-response rate

Bias • If a sampling procedure is biased, a larger sample size won’t help. • Bias can’t always be detected by looking at data. You have to ask how the sample was chosen. • So…did pollsters fix the bias issue?
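The claim that a larger sample does not cure bias can be illustrated with a small simulation. The sketch below is not from the original slides: the population, the 30% "wealthy" share, and the support rates are made-up numbers chosen only to mimic the Literary Digest situation.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 30% "wealthy"; support for candidate A is
# 65% among the wealthy and 30% among everyone else (made-up numbers).
population = []
for _ in range(200_000):
    wealthy = rng.random() < 0.30
    votes_a = rng.random() < (0.65 if wealthy else 0.30)
    population.append((wealthy, votes_a))

def share_for_a(people):
    """Fraction of the given people who support candidate A."""
    return sum(votes_a for _, votes_a in people) / len(people)

true_share = share_for_a(population)

# Biased procedure: a very large sample, but drawn only from the wealthy.
wealthy_only = [person for person in population if person[0]]
biased_sample = rng.sample(wealthy_only, 50_000)

# Unbiased procedure: a much smaller simple random sample of everyone.
random_sample = rng.sample(population, 1_000)

print(f"True share:          {true_share:.3f}")
print(f"Biased, n = 50,000:  {share_for_a(biased_sample):.3f}")
print(f"Random, n = 1,000:   {share_for_a(random_sample):.3f}")
```

Under these assumptions the large biased sample stays far from the true share, while the small random sample lands close to it. Increasing the biased sample size only makes the wrong answer more precise.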

1948 Presidential Election Thomas Dewey (Republican) Harry Truman (Democrat)

1948 Presidential Election • Three major polls covered the election. All used large sample sizes. • These polls all used a different method of sampling than Literary Digest.

Quota Sampling • Goal: Create a sample which faithfully represents the target population with respect to key characteristics. • Implementation: Define categories of interest (e.g. residence, sex, age, race, income, etc.). Establish a fixed number of subjects to interview overall and in each category. Interviewers select freely within categories.

Quota Sampling • Example: A Gallup poll interviewer was required to interview 13 people. • 6 from suburbs, 7 from city • 7 men, 6 women • Of the men (and similarly for women) • 3 under age 40, 4 over age 40 • 1 black, 6 white • Of the white men, • 1 paid over $44 monthly rent, 2 paid less than $18

Quota Sampling • The Gallup poll seems to guarantee the sample will be like the voting population in every meaningful way. What happened? • The interviewers were free to select within categories, and this introduced bias. • In 1948, Republicans (in each category) were marginally easier to reach for interviews because they tended to be wealthier and better educated, to own telephones, and to have permanent addresses.
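The mechanism described above can be sketched in a short simulation. Everything here is an assumption made for illustration: the two categories, the 50/50 split, and the reach probabilities in `REACH_PROB` are hypothetical, and the "approach people until the quota is full" loop is only one simple model of interviewer discretion.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Assumption for illustration: Republicans are slightly easier to reach.
REACH_PROB = {"R": 0.60, "D": 0.50}

def make_category(size, share_r):
    """Build a list of 'R'/'D' voters with roughly the given Republican share."""
    return ["R" if rng.random() < share_r else "D" for _ in range(size)]

# Hypothetical electorate: two categories, about 50% Republican in each.
categories = {
    "city": make_category(50_000, 0.50),
    "suburb": make_category(50_000, 0.50),
}
quotas = {"city": 2_000, "suburb": 2_000}

def fill_quota(people, quota):
    """Approach people at random; keep those reached, until the quota is met."""
    interviewed = []
    while len(interviewed) < quota:
        person = rng.choice(people)
        if rng.random() < REACH_PROB[person]:
            interviewed.append(person)
    return interviewed

sample = []
for name, people in categories.items():
    sample.extend(fill_quota(people, quotas[name]))

everyone = [p for people in categories.values() for p in people]
true_r = everyone.count("R") / len(everyone)
quota_r = sample.count("R") / len(sample)

print(f"True Republican share:   {true_r:.3f}")
print(f"Quota-sample estimate:   {quota_r:.3f}")
```

The sample matches every quota exactly, yet under these assumptions it overstates the Republican share by a few percentage points, which is enough to call a close race wrongly.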

Quota Sampling • The bias in quota sampling is generally unintentional on the part of interviewers. • Before 1948, the Democratic majority was so large that this bias didn’t show up. In a close race, the bias was significant. • Can we remove this bias from an otherwise sensible approach to sampling?

Probability Methods • Interviewers have no discretion at all as to whom they interview • Sampling procedure intentionally involves chance variation. • Investigators can compute the probability that any particular individual will be selected. • Quota sampling fails these tests.

Probability Methods • Simple Random Sampling: Each individual is given a number. Numbers are drawn at random without replacement. • Each person has an equal chance of being selected • As the sample size increases, each sample proportion tends to get closer to the corresponding population proportion (Law of Averages) • Still impractical for very large populations
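Simple random sampling as described above can be sketched with Python's standard `random` module. The population size, the 40% attribute share, and the helper name `srs_proportion` are assumptions made for this example, not part of the original slides.

```python
import random

rng = random.Random(2)  # fixed seed so the run is repeatable

# Hypothetical population of 100,000 people, numbered 0..99,999;
# exactly 40% have the attribute of interest (say, support candidate A).
N = 100_000
has_attribute = [i < 40_000 for i in range(N)]
population_proportion = sum(has_attribute) / N

def srs_proportion(n):
    """Draw n distinct numbers at random and return the sample proportion."""
    drawn = rng.sample(range(N), n)  # sampling without replacement
    return sum(has_attribute[i] for i in drawn) / n

for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: sample proportion = {srs_proportion(n):.3f}")
```

`random.sample` draws without replacement, so no individual is selected twice and every individual has the same chance of selection. The printed proportions tend to settle near 0.4 as n grows, although any single small sample can be off by several points.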

Probability Methods • Cluster sampling: • Divide population into “natural” groups. • Randomly choose which groups to study. • Randomly select individuals from the chosen groups. • Can be done in stages, dividing each group into subgroups several times
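The two-stage procedure above can be sketched as follows. The towns, their sizes, and the function `cluster_sample` are hypothetical; the sketch assumes equal-sized clusters and equal sample sizes within each chosen cluster, so a plain average of the sample is a reasonable estimate.

```python
import random

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical population: 200 towns (clusters) of 500 residents each.
# Each resident is recorded as True/False for some attribute.
towns = {
    town_id: [rng.random() < 0.45 for _ in range(500)]
    for town_id in range(200)
}

def cluster_sample(clusters, n_clusters, n_per_cluster):
    """Two-stage sample: random clusters, then random people within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: pick clusters
    sample = []
    for cluster_id in chosen:
        # Stage 2: pick individuals at random within the chosen cluster.
        sample.extend(rng.sample(clusters[cluster_id], n_per_cluster))
    return chosen, sample

chosen_towns, sample = cluster_sample(towns, n_clusters=20, n_per_cluster=50)
estimate = sum(sample) / len(sample)

print(f"Towns visited: {len(chosen_towns)} of {len(towns)}")
print(f"Estimated proportion: {estimate:.3f}")
```

Only 20 of the 200 towns need to be visited, which is the practical advantage over a simple random sample of the whole population. With clusters of unequal size, the estimate would need to be weighted accordingly.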

Probability Methods • Post-1948 Gallup Poll sampling method

Do Probability Methods Work? • A degree of bias is inevitable in any survey. • Using probability introduces chance error (also called sampling error). • Nonetheless, improvements are noticeable.
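Chance error can be quantified. The sketch below is an addition, not part of the original slides: it simulates many surveys of the same size and compares the spread of their estimates with the standard textbook formula for the standard error of a proportion, sqrt(p(1 - p) / n). The values of `p`, `n`, and `repeats` are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(4)  # fixed seed so the run is repeatable

p = 0.5          # hypothetical population proportion
n = 1_000        # sample size of each survey
repeats = 2_000  # number of simulated surveys

# Each simulated survey draws n people with replacement, which is a good
# approximation to a simple random sample from a very large population.
estimates = [
    sum(rng.random() < p for _ in range(n)) / n
    for _ in range(repeats)
]

observed_spread = statistics.pstdev(estimates)   # spread across surveys
formula_se = math.sqrt(p * (1 - p) / n)          # textbook standard error

print(f"Observed spread of estimates: {observed_spread:.4f}")
print(f"Formula sqrt(p(1-p)/n):       {formula_se:.4f}")
```

Unlike bias, chance error shrinks as the sample size grows, and its likely size can be computed in advance. That is what makes probability methods easier to trust than quota sampling.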
