Sampling Representing populations
Let’s say you wanted to know whether people over 60 used the Internet for medical information • You could save a bundle on providing medical information by putting up a web page with the necessary information rather than contacting people directly or having them call their doctors for it
But how could you determine whether they would use it? • Track them all down and ask them? • Practically impossible • Prohibitively expensive • Not really necessary
So • Talk to some of them and estimate what the rest would say • But which ones should be talked to? • Sampling theory guides us both in choosing whom to measure and in estimating what the entire population would have answered
Samples and Sampling • A sample is a subgroup drawn from a larger population that is meant to represent all members • Sampling refers to the actions taken to draw a sample from a population
Examples of Sampling • Small portions of food are given away in supermarkets in order to get you to buy the product (I made it through grad school this way) • Geologists drill out deep cylinders of rock to determine whether to drill for oil • Farmers pick ears of corn from many parts of the field to check for insects • Short portions of songs are downloaded from the Internet by prospective buyers
Sampling frame • A list of the units of the population used to draw the sample • A sampling frame must closely reflect the population • (e.g., telephone books, voter registration lists)
Parameters and statistics • Parameter • A true characteristic of a population • Average age of Lexingtonians • Statistic • A numeric summary of a variable in a sample • Mean age of a sample of Lexingtonians • Sample statistics are computed in order to estimate population parameters.
Random sample • The best method for representing the entire population with a sample is to use a random sample • In a random sample, each person in the population of interest has an equal and known chance of being selected • allows researchers to calculate sampling error
Nonrandom samples • In nonrandom samples, the likelihood that any individual element of the population is included in the sample is not known • This means that many of the advantages of statistical analyses are lost
The researcher may choose a nonrandom sample for several reasons: • Purpose of the study • explore variable relationships (experiment) • exploratory research • Cost versus value • probability sample may be too expensive • Low incidence of preferred respondents • black lawyers • Willingness to participate • focus groups • Time constraints • Exploratory study
Types of nonrandom samples • Convenience sample (also called ‘haphazard’ or ‘accidental’ sample) • Volunteer sample • Purposive sample • Quota sample • Network sample
Convenience sample • Respondents are included based on availability • students in introductory courses • mall intercepts • movie studio tours
Volunteer sample • Respondents choose to participate in the study • clinical trials • consumer juries • extra-credit psych experiments
Volunteers are different: • higher educational status • higher occupational status • greater need for approval • higher IQ • lower authoritarianism • more sociable • more ‘arousal-seeking’ • less conventional • tend to be first children • younger
Purposive sample • Subjects selected on the basis of specific characteristics or qualities • users of a particular brand • young mothers with small children • doctors • members of a fan club • target market members
Quota sample • “respondents are selected nonrandomly on the basis of their known proportion in a population” (Frey et al., 2000) • Large/medium/small hospitals • Caucasian/Black/Asian • Heavy/medium/light users • Responses may be weighted according to population proportion
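One way to picture the weighting step is the sketch below. It assumes a hypothetical hospital survey in which large hospitals are over-represented in the sample. The population shares, sample counts, and yes/no responses are invented for illustration; the slides do not specify a weighting formula, so weight = population share / sample share is an assumption.

```python
def quota_weights(population_share, sample_counts):
    """Weight for each group = population share / sample share (assumed scheme)."""
    total = sum(sample_counts.values())
    return {g: population_share[g] / (sample_counts[g] / total)
            for g in sample_counts}

def weighted_mean(responses_by_group, weights):
    """Mean of all responses, with each response counted by its group weight."""
    num = sum(weights[g] * sum(vals) for g, vals in responses_by_group.items())
    den = sum(weights[g] * len(vals) for g, vals in responses_by_group.items())
    return num / den

# Hypothetical figures: share of each hospital size in the population
population_share = {"large": 0.2, "medium": 0.3, "small": 0.5}
# Hypothetical yes(1)/no(0) answers collected under the quota
responses = {
    "large": [1] * 40 + [0] * 10,   # 50 respondents
    "medium": [1] * 15 + [0] * 15,  # 30 respondents
    "small": [1] * 4 + [0] * 16,    # 20 respondents
}
sample_counts = {g: len(v) for g, v in responses.items()}

weights = quota_weights(population_share, sample_counts)
unweighted = sum(sum(v) for v in responses.values()) / sum(sample_counts.values())
print(weights)
print(unweighted, weighted_mean(responses, weights))
```

Groups that are over-represented relative to the population get weights below 1, and under-represented groups get weights above 1. Weighting corrects the group proportions, but it does not make a nonrandom sample random.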
Network sample • ‘Snowball’ sample • ask respondents to recommend additional sources/respondents • cheaper • helps identify people with certain characteristics • aids in respondent compliance • identify networks of people
Random samples • Simple random sample • Systematic random sample • Stratified random sample • Cluster sample
Simple random sample (SRS) • The simple random sample is a case where each element has an equal chance of being selected into the sample • Lottery • Random number table • Roulette wheel • Random digit dialing • Statistics often assume a “SRS”
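A minimal sketch of drawing a simple random sample in Python, assuming the sampling frame is available as a list. The frame of 600 labelled people and the helper name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements; each element has the same chance, n / len(frame)."""
    rng = random.Random(seed)      # seed only makes the illustration repeatable
    return rng.sample(frame, n)    # sampling without replacement

# Hypothetical sampling frame of 600 people
frame = [f"person_{i}" for i in range(1, 601)]
sample = simple_random_sample(frame, 100, seed=1)
print(len(sample), sample[:5])
```

`random.sample` plays the role of the lottery or random number table: no element can be picked twice, and no element is favored.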
Systematic random sampling • “A random sample that chooses every nth person/text from a complete list of a population after starting at a random point.” (Frey et al., 2000) • For example, if you have a sampling frame of 600 elements and you need a sample of 100, then you would pick every 6th name (600 / 100 = 6). You randomly choose a starting point among the first six names; it turns out to be the 4th element. You then choose the 4th, 10th, 16th, 22nd, etc.
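The 600-element example can be sketched as follows. The function name `systematic_positions` is hypothetical, and positions are 1-based to match the slide.

```python
import random

def systematic_positions(frame_size, sample_size, start=None, seed=None):
    """Return 1-based positions in the frame chosen by systematic sampling."""
    k = frame_size // sample_size              # sampling interval, e.g. 600 // 100 = 6
    if start is None:
        # random starting point within the first interval (1..k)
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(sample_size)]

# Reproduce the slide's example: the random start happened to be the 4th element
positions = systematic_positions(600, 100, start=4)
print(positions[:4], positions[-1], len(positions))
```

One caution that follows from the method: if the list has a periodic pattern with the same period as the interval, every selected element can share the same characteristic.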
Stratified random sample • A sample developed by first splitting the population based on some important characteristic and sampling randomly from within categories • e.g. age, gender, race, income • random samples are taken from within each of the subpopulations
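A minimal sketch of proportionate stratified sampling, assuming a hypothetical population tagged by age group. The group sizes, the 10% sampling fraction, and the function name are illustrative assumptions.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Split units into strata, then draw a simple random sample within each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):                 # fixed order keeps runs repeatable
        members = strata[name]
        n = round(len(members) * fraction)      # same fraction from every stratum
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical population: 300 people under 60 and 100 people aged 60 or over
population = [("under_60", i) for i in range(300)] + [("60_plus", i) for i in range(100)]
sample = stratified_sample(population, stratum_of=lambda u: u[0], fraction=0.1, seed=7)

counts = defaultdict(int)
for group, _ in sample:
    counts[group] += 1
print(dict(counts))
```

Because the same fraction is drawn from each subpopulation, the sample reproduces the population's split on the stratifying characteristic exactly, which simple random sampling only does on average.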
Cluster sampling • Larger groupings of individual sample elements are chosen, then the elements within the chosen groupings are measured • Usually geographic areas
Cluster sampling • Advantages: • Only part of the population needs to be enumerated • Costs reduced • Cluster estimates can be compared to population numbers
Cluster sampling • Disadvantages • Sampling error tends to be larger than in a simple random sample of the same size • Clusters may not be representative of the population • Number and size of clusters is important • Each subject or unit must be assigned to a specific cluster
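A sketch of one-stage cluster sampling, assuming hypothetical city blocks as the clusters and measuring every resident in each chosen block. The block names and sizes are invented.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters; every element in a chosen cluster is kept."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # sample cluster names
    return {name: clusters[name] for name in chosen}

# Hypothetical frame: 10 blocks with 20 residents each
clusters = {
    f"block_{b}": [f"block_{b}_resident_{r}" for r in range(20)]
    for b in range(10)
}
chosen = cluster_sample(clusters, 3, seed=5)
print(sorted(chosen), sum(len(v) for v in chosen.values()))
```

Only the three chosen blocks need to be enumerated, which is where the cost saving comes from. The price is that residents of the same block tend to resemble one another, so 60 people from 3 blocks usually carry less information than 60 people drawn individually.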
Multi-stage sampling • Sample large groups/clusters, then sample smaller units within the groups, and so on • metropolitan area • county • block • residence • individual
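A two-stage version can be sketched like this, assuming hypothetical counties as the first-stage units and residences within them as the second stage. Real designs may add more stages and unequal selection probabilities; this sketch uses equal probabilities at both stages.

```python
import random

def two_stage_sample(groups, n_groups, n_per_group, seed=None):
    """Stage 1: sample groups. Stage 2: sample units within each chosen group."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(groups), n_groups)
    return {
        name: rng.sample(groups[name], min(n_per_group, len(groups[name])))
        for name in chosen
    }

# Hypothetical frame: 8 counties with 50 residences each
counties = {
    f"county_{c}": [f"county_{c}_residence_{r}" for r in range(50)]
    for c in range(8)
}
picked = two_stage_sample(counties, n_groups=2, n_per_group=5, seed=11)
for county, residences in picked.items():
    print(county, residences)
```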
Sample Size • Generally speaking, the larger the better • But quality is most important • Though people find it hard to believe, you can make some pretty good estimates of very large populations from rather small samples • National polls can be pretty accurate with 600 respondents (a margin of error of roughly ±4 percentage points at 95% confidence, assuming a simple random sample)
Sample size • There is a law of diminishing returns: • additional units add less and less precision • The first respondent is the most valuable, the second is second-most, etc. • Will often be determined by time and cost considerations
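The diminishing returns can be made concrete with the usual margin-of-error formula for a proportion, z * sqrt(p(1 - p) / n). The sketch below assumes a simple random sample, the normal approximation, 95% confidence (z = 1.96), and the worst case p = 0.5. None of these assumptions is stated on the slides.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a sample proportion (SRS assumed)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 600, 2400, 9600):
    print(n, round(100 * margin_of_error(n), 1), "percentage points")
```

Because the margin shrinks with the square root of n, quadrupling the sample only halves the margin of error. The population size does not appear in the formula, which is why a few hundred respondents can describe a very large population.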
Sampling error • “A number that expresses how much the characteristics of a sample probably differ from the characteristics of its population” (Frey et al., 2000) • Sampling error can be estimated for random samples • this is nonsystematic error variance
Sampling error • Two key components of sampling error estimates are confidence levels and confidence intervals • “We express the accuracy of our sample statistics in terms of a level of confidence that the statistics fall within a specified interval from the parameter.” (Babbie) • There is a tradeoff between confidence level and confidence interval: for a given sample size, a higher confidence level requires a wider interval
Example: • Research finds that 45% of males say that they have broken the speed limit by 15 mph in the last two months. • The researcher is 99% confident that the actual percent is between 42% and 48%. • That is, if the researcher drew 100 samples in the same way and computed an interval from each, she would expect about 99 of those intervals to contain the true % of males speeding by 15 mph.
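The slide does not give the sample size behind this example. As a rough sketch, assuming a simple random sample and the normal approximation, the code below works out how many respondents an interval of ±3 points at 99% confidence would imply. It then shows how the interval width changes with the confidence level for that same sample. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def z_for(confidence):
    """Two-sided critical value of the standard normal for a confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def half_width(p, n, confidence):
    """Half-width of the interval around a sample proportion p (SRS assumed)."""
    return z_for(confidence) * math.sqrt(p * (1 - p) / n)

def required_n(p, half, confidence):
    """Smallest n whose interval half-width does not exceed `half`."""
    return math.ceil(z_for(confidence) ** 2 * p * (1 - p) / half ** 2)

n = required_n(0.45, 0.03, 0.99)     # 45% observed, +/- 3 points, 99% confidence
print("implied sample size:", n)
for level in (0.90, 0.95, 0.99):
    print(level, round(100 * half_width(0.45, n, level), 1), "points")
```

With the sample size held fixed, lowering the confidence level narrows the interval and raising it widens the interval, which is the tradeoff between the two.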
So • We use samples to estimate population parameters because our estimates can be pretty close while drastically reducing the costs of carrying out the research • Samples are either random or nonrandom • Random samples allow us to estimate the sampling error attached to statistics describing the sample
Nonrandom samples are used when random samples are too expensive or impractical • They employ methods other than randomization meant to increase their representativeness • A number of different types of random and nonrandom sampling can be used to reduce costs or improve sample quality