STAT 101, Dr. Kari Lock Morgan, 8/30/12
Collecting Data: Sampling (Section 1.2)
Topics: Sample versus Population, Statistical Inference, Sampling Bias, Simple Random Sample, Other Sources of Bias
Sample versus Population
A population includes all individuals or objects of interest.
A sample is all the cases that we have collected data on (a subset of the population).
Statistical inference is the process of using data from a sample to gain information about the population.
The Big Picture
Which of the following is most important to you?
Sampling bias occurs when the method of selecting a sample causes the sample to differ from the population in some relevant way.
GOAL: Select a sample that is similar to the population, only smaller
“Four score and seven years ago our fathers brought forth, on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this. But, in a larger sense, we can not dedicate—we can not consecrate—we can not hallow—this ground. The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us—that from these honored dead we take increased devotion to that cause for which they here gave the last full measure of devotion—that we here highly resolve that these dead shall not have died in vain—that this nation, under God, shall have a new birth of freedom—and that government of the people, by the people, for the people, shall not perish from the earth.”
Take a RANDOM sample!
(Note: When choosing a real sample, you should use technology to generate random numbers. This is simply for illustrative purposes in class.)
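As a minimal sketch of what "use technology" might look like, the Python snippet below draws a simple random sample of 10 words from the opening sentence of the passage above and compares the sample mean word length with the mean over all the words. Using word length as the quantity of interest is an assumption made for illustration, as are the sample size and the seed.

```python
import random

# Opening sentence of the passage above, used here as a small stand-in population.
text = ("Four score and seven years ago our fathers brought forth, on this "
        "continent, a new nation, conceived in Liberty, and dedicated to the "
        "proposition that all men are created equal.")

# Strip punctuation so each word's length counts letters only.
words = [w.strip(".,") for w in text.split()]

# Population parameter: mean word length over every word.
population_mean = sum(len(w) for w in words) / len(words)

# Simple random sample: every set of 10 words is equally likely to be chosen.
rng = random.Random(101)  # arbitrary fixed seed so the run is repeatable
sample = rng.sample(words, 10)  # sampling without replacement
sample_mean = sum(len(w) for w in sample) / len(sample)

print(f"population: {len(words)} words, mean length {population_mean:.2f}")
print(f"sample: {sample}")
print(f"sample mean length: {sample_mean:.2f}")
```

Changing the seed gives a different sample each time. Individual sample means vary, but they are not systematically too high or too low, which is the point of sampling at random rather than picking words by eye.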
Think of tasting a bowl of soup…
Suppose you want to estimate the average number of hours that Duke students spend studying each week. Which of the following is the best method of sampling?
Source: Chesher, G., Dauncey, H., Crawford, J. and Horn, K, “The Interaction between Alcohol and Marijuana: A Dose Dependent Study on the Effects of Human Moods and Performance Skills,” Report No. C40, Federal Office of Road Safety, Federal Department of Transport, Australia, 1986.
Data Collection and Bias
Other forms of bias?
Tax Cut: 60%, Programs: 40%
Tax Cut: 22%, Programs: 78%
“If you had it to do over again, would you have children?”
If we were to run the question all by itself in the newspaper with a request for responses, could we trust the results?
This would suffer from volunteer bias. We need a random sample.
Newsday conducted a random sample of all US adults and asked them the same question, without any additional leading material.
Do you think the true proportion of US parents who are happy they had children is close to 91%?
Because this is a random sample, the population proportion should be close to the sample proportion.
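The contrast between a volunteer sample and a random sample can be illustrated with a small simulation. Everything in the sketch below is hypothetical: the population size, the assumed true proportion of 90%, and the response rates (unhappy parents are assumed to be ten times more likely to write in) are made-up values chosen only to show the mechanism, not figures from either survey.

```python
import random

rng = random.Random(2012)  # arbitrary fixed seed so the run is repeatable

# Hypothetical population of 100,000 parents.
# True = "would have children again". The 90% figure is an assumption.
population = [True] * 90_000 + [False] * 10_000

def proportion(responses):
    """Fraction of responses that are True."""
    return sum(responses) / len(responses)

# Volunteer sample: each person decides whether to respond.
# Assumed response rates: 0.5% for happy parents, 5% for unhappy parents.
volunteers = [ans for ans in population
              if rng.random() < (0.005 if ans else 0.05)]

# Simple random sample of 1,000 people: everyone is equally likely to be chosen.
random_sample = rng.sample(population, 1000)

print(f"population proportion:    {proportion(population):.3f}")
print(f"volunteer sample (n={len(volunteers)}): {proportion(volunteers):.3f}")
print(f"random sample (n=1000):   {proportion(random_sample):.3f}")
```

Under these assumptions the volunteer sample lands far below the population proportion, because who responds depends on the answer, while the random sample stays close to it.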
Svenson, O. (February 1981). "Are we all less risky and more skillful than our fellow drivers?" Acta Psychologica 47 (2): 143–148.
Always think critically about how the data were collected, and recognize that not all forms of data collection lead to valid inferences.