# Sampling

## Sampling

### The basic problem

• You want to make a general statement about a large group of people (a population).

• The population size makes studying everyone impractical.

• You select a part of the group (a sample) to study. The numerical facts of interest about the population are called parameters; you estimate them with statistics computed from the sample.

• Use statistics to generalize (infer) from the sample to the population.

### 1936 Presidential Election

Alf Landon (Republican)

Franklin Roosevelt (Democrat)

### 1936 Presidential Election

• To predict the winner, Literary Digest magazine mailed out 10 million questionnaires to addresses from telephone books and vehicle registrations.

• 2.4 million responded: 57% said they’d vote for Landon

• The election result: Roosevelt won 62%-38%.

• (Literary Digest soon went bankrupt)

### 1936 Presidential Election

• How was the LD sample different from the population of all voters?

• Consider what kinds of people had phones and cars in 1936, and which party those people tended to vote for.

• The LD sample systematically favored wealthier people, and wealthier people tended to vote Republican.
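The effect described above can be sketched with a small simulation. Everything numeric here is an assumption made for illustration: the share of voters who appear on phone and car lists, and the Landon support inside and outside that group, are hypothetical values picked so the totals roughly resemble the 1936 figures. The sketch polls only "listed" voters and compares the poll with the whole-population value at several sample sizes.

```python
import random

# Hypothetical figures, chosen only so the totals resemble the 1936 numbers.
P_LISTED = 0.30           # share of voters who appear on phone/car lists
P_LANDON_LISTED = 0.57    # Landon support among listed voters
P_LANDON_UNLISTED = 0.30  # Landon support among everyone else


def true_landon_share():
    """Landon support in the whole population (listed and unlisted voters)."""
    return P_LISTED * P_LANDON_LISTED + (1 - P_LISTED) * P_LANDON_UNLISTED


def poll_listed_voters(sample_size, rng):
    """Poll only listed voters, as a frame built from phone books would."""
    hits = sum(rng.random() < P_LANDON_LISTED for _ in range(sample_size))
    return hits / sample_size


rng = random.Random(1936)
for n in (1_000, 10_000, 100_000):
    estimate = poll_listed_voters(n, rng)
    print(f"n={n:>7}: poll says {estimate:.3f}, population value is {true_landon_share():.3f}")
```

Under these assumptions the poll settles near the support level among listed voters, not near the population value, no matter how many questionnaires come back.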

### Bias

• Selection bias: a systematic tendency of the sampling procedure to exclude a portion of the population

• Example: randomly choosing from a phone book

• Non-response bias: a tendency of survey respondents to be different from those who didn’t respond.

• Sometimes indicated by a large non-response rate

### Bias

• If a sampling procedure is biased, a larger sample size won’t help.

• Bias can’t always be detected by looking at data. You have to ask how the sample was chosen.

• So…did pollsters fix the bias issue?

### 1948 Presidential Election

Thomas Dewey (Republican)

Harry Truman (Democrat)

### 1948 Presidential Election

• Three major polls covered the election. All used large sample sizes.

• These polls all used a different sampling method from the one Literary Digest used. All three predicted a Dewey victory, but Truman won.

### Quota Sampling

• Goal: Create a sample which faithfully represents the target population with respect to key characteristics.

• Implementation: Define categories of interest (e.g. residence, sex, age, race, income). Establish a fixed number of subjects to interview overall and in each category. Interviewers select freely within categories.

### Quota Sampling

• Example: A Gallup poll interviewer was required to interview 13 people.

• 6 from suburbs, 7 from city

• 7 men, 6 women

• Of the men (and similarly for women)

• 3 under age 40, 4 over age 40

• 1 black, 6 white

• Of the white men,

• 1 paid over \$44 monthly rent, 2 paid less than \$18

### Quota Sampling

• The Gallup poll seems to guarantee the sample will be like the voting population in every meaningful way. What happened?

• The interviewers were free to select within categories, and this introduced bias.

• In 1948, Republicans (in each category) were marginally easier to reach for interviews because they tended to be wealthier and better educated, and were more likely to own telephones and have permanent addresses.
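A rough sketch of this mechanism follows. It is not a reconstruction of the Gallup procedure: the population, the 48% Republican share, and the "reach" probabilities (0.60 for Republicans, 0.50 for everyone else) are hypothetical. Quotas are set by area only, scaled up from the 7 city / 6 suburb split in the example, and interviewers fill each quota with whoever they manage to reach first.

```python
import random

QUOTAS = {"city": 7_000, "suburb": 6_000}  # scaled-up version of 7 city / 6 suburb


def make_population(size, rng):
    """Build a hypothetical electorate; all probabilities are illustrative."""
    people = []
    for _ in range(size):
        republican = rng.random() < 0.48
        people.append({
            "area": rng.choice(["city", "suburb"]),
            "republican": republican,
            # Assumption: Republicans are slightly easier to reach.
            "reach_prob": 0.60 if republican else 0.50,
        })
    return people


def quota_sample(people, quotas, rng):
    """Fill each area quota with the first people the interviewer reaches."""
    chosen = []
    for area, quota in quotas.items():
        candidates = [p for p in people if p["area"] == area]
        rng.shuffle(candidates)  # interviewer approaches people in random order
        reached = [p for p in candidates if rng.random() < p["reach_prob"]]
        chosen.extend(reached[:quota])
    return chosen


def republican_share(people):
    return sum(p["republican"] for p in people) / len(people)


rng = random.Random(1948)
population = make_population(100_000, rng)
sample = quota_sample(population, QUOTAS, rng)
print(f"population: {republican_share(population):.3f} Republican")
print(f"quota sample: {republican_share(sample):.3f} Republican")
```

The quotas are met exactly, yet the sample leans Republican because reachability, which the quotas do not control, is correlated with party.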

### Quota Sampling

• The bias in quota sampling is generally unintentional on the part of interviewers.

• Before 1948, the Democratic majority was so large that this bias didn’t show up. In a close race, the bias was significant.

• Can we remove this bias from an otherwise sensible approach to sampling?

### Probability Methods

• Interviewers have no discretion at all as to whom they interview

• Sampling procedure intentionally involves chance variation.

• Investigators can compute the probability that any particular individual will be selected.

• Quota sampling fails these tests.

### Probability Methods

• Simple Random Sampling: Each individual is given a number. Numbers are drawn at random without replacement.

• Each person has an equal chance of being selected

• As sample size increases, the sample proportion for each characteristic tends to get closer to the population proportion (Law of Averages)

• Still impractical for very large populations
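As a minimal sketch of simple random sampling, the example below numbers a hypothetical population of 100,000 voters, 55% of whom support candidate A, and draws numbers without replacement using `random.sample` from the Python standard library. The population size, the 55% figure, and the function name `simple_random_sample` are assumptions for illustration.

```python
import random

POPULATION_SIZE = 100_000
SUPPORTERS = 55_000  # hypothetical: individuals numbered 0..54,999 support candidate A


def supports_a(person_id):
    return person_id < SUPPORTERS


def simple_random_sample(population_size, n, rng):
    """Draw n distinct individual numbers, each equally likely to be chosen."""
    return rng.sample(range(population_size), n)


rng = random.Random(0)
for n in (100, 1_000, 10_000):
    ids = simple_random_sample(POPULATION_SIZE, n, rng)
    share = sum(supports_a(i) for i in ids) / n
    print(f"n={n:>6}: sample share {share:.3f} (population share 0.550)")
```

With larger `n` the printed share should tend to sit closer to the population share, although any single draw can still miss by a small amount.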

### Probability Methods

• Cluster sampling:

• Divide population into “natural” groups.

• Randomly choose which groups to study.

• Randomly select individuals from the chosen groups.

• Can be done in stages, dividing each group into subgroups several times
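The steps above can be sketched as a two-stage procedure. The population here (50 towns of 200 residents each) and the function name `two_stage_cluster_sample` are hypothetical; a real survey would use more stages and unequal group sizes.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: pick groups at random. Stage 2: pick individuals within each."""
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen_names:
        members = clusters[name]
        k = min(n_per_cluster, len(members))  # small groups are taken whole
        sample.extend(rng.sample(members, k))
    return chosen_names, sample


rng = random.Random(0)
# Hypothetical population: each resident is identified by (town number, resident number).
clusters = {f"town_{t:02d}": [(t, r) for r in range(200)] for t in range(50)}
towns, sample = two_stage_cluster_sample(clusters, n_clusters=5, n_per_cluster=20, rng=rng)
print("towns studied:", towns)
print("people interviewed:", len(sample))
```

Only the chosen towns need to be visited, which is what makes the approach practical for very large populations.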

### Probability Methods

• Post-1948 Gallup Poll sampling method

### Do Probability Methods Work?

• A degree of bias is inevitable in any survey.

• Using probability introduces chance error (also called sampling error).

• Nonetheless, improvements are noticeable.
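To give a sense of the size of chance error, the sketch below uses the standard textbook approximation for the standard error of a sample proportion under simple random sampling, sqrt(p(1 - p) / n). The formula is not stated in the slides, and the value p = 0.5 is an assumption chosen because it gives the largest error.

```python
import math


def standard_error(p, n):
    """Approximate standard error of a sample proportion for a sample of size n."""
    return math.sqrt(p * (1 - p) / n)


for n in (100, 1_000, 10_000):
    se_points = 100 * standard_error(0.5, n)
    print(f"n={n:>6}: chance error is roughly {se_points:.2f} percentage points")
```

Unlike bias, this kind of error shrinks as the sample grows: quadrupling the sample size halves it.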