Sampling

Sampling SO201, Warwick University 19th October 2009

Thinking about what you’re researching: Case, Population, Sample • Case – each empirical instance of what you’re researching. • So if you’re researching celebrities who have been in trouble with the law, Michael Jackson would be a case, as would Winona Ryder, Kate Moss, Boy George, Pete Doherty and Amy Winehouse… • If you were interested in fast food companies, McDonald’s would be a case, Burger King would be a case, as would Subway… • If you were interested in users of a homeless shelter, each person who came to the shelter would be a case.

Thinking about what you’re researching: Case, Population, Sample • Population – all the theoretically relevant cases (e.g. “Tottenham supporters”). • Note: This may be different to the study population, which is all of the theoretically relevant cases which are actually available to be studied (e.g. “all Tottenham football club members or season ticket holders”).

Sometimes you can study all possible cases (the total population in which you are interested). For example, if your study population is: • Post WW2 Prime Ministers • Homeless people using a particular shelter on Christmas Day 2009 • Countries in the European Union • The US Economy over the 20th Century • Community groups in Coventry

Often you cannot research the whole population because it’s too big and doing so would be too costly, too time consuming, or impossible. For example, if your study population is: • Voters in the UK • All the homeless people in the UK on Christmas Day 2009 • The Economy of all countries in the world over the 20th Century • Community groups in the UK On these occasions you need to select some cases to study. Selecting some cases from the total population is called sampling.

How you sample depends (among other things) on some linked issues: • What you are especially interested in (what you want to find out) • The frequency with which what you are interested in occurs in the population • The size/complexity of the population • What research methods you are going to use. • How many cases you want (or have the resources/time) to study

Sample and population • Much statistical analysis is done on a sample. • BUT we are generally interested in population parameters (e.g. whether women in the UK earn more or less than men, not whether the 3,452 women in our study earn more on average than the 2,782 men in our study). • Therefore a lot of statistical analysis involves techniques for inferring from the sample to the population.

Probability and Non-Probability Sampling • Probability samples have a mathematical relationship to the total population: we can work out mathematically the likelihood (probability) that what is found within the sample is close to what would be found within the whole population (if we were able to analyze the whole population). Probability sampling allows us to make inferences about the whole population. • Non-probability samples do not formally allow us to make inferences about the whole population. However, there are often logistical reasons for their use, and (despite being statistically dodgy) inferential statistics are frequently applied to them (and published!).

Types of Non-probability Sampling: 1. Reliance on available subjects: • Only justified if less risky sampling methods are not possible. • Researchers must exercise caution in generalizing from their data when this method is used.

Types of Non-probability Sampling: 2. Purposive or judgmental sampling • Selecting a sample based on knowledge of a population, its elements, and the purpose of the study. • Used when field researchers are interested in studying cases that don’t fit into regular patterns of attitudes and behaviors (e.g. deviance). • Relies totally on the researcher’s prior ability to determine ‘suitable’ subjects.

Types of Non-probability Sampling: 3. Snowball sampling • Appropriate when members of a population are difficult to locate. • Researcher collects data on members of the target population she can locate, then asks them to help locate other members of that population. • By definition, respondents who are located by snowball sampling will be connected to one another, and so are likely to be more similar to one another than to other members of the population.

Types of Non-probability Sampling: 4. Quota sampling • Begin with a matrix of the population (e.g. that it’s 50% female, 9% minority, with a particular age structure). • Data is collected from people with the characteristics of a given cell. • Each group is assigned a weight appropriate to their portion of the population (so if you were going to sample 1,000 people you would want 500 of them to be female and, if minority status is spread evenly across the sexes, 45 to be minority women). • Data should provide a representation of the total population. • However, the data may not represent the population in terms of criteria that were not factored in to the initial matrix. • You cannot measure response rates. • And the selection may be biased.
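The quota arithmetic above can be sketched in a few lines. This is only an illustration: the function name `quota_cells` is made up here, age structure is left out, and the cell shares are obtained by assuming sex and minority status are independent in the population.

```python
# Illustrative population shares taken from the slide (50% female, 9% minority).
sample_size = 1000
sex_shares = {"female": 0.50, "male": 0.50}
minority_shares = {"minority": 0.09, "non-minority": 0.91}


def quota_cells(n, row_shares, col_shares):
    """Return the target number of respondents for each cell of the matrix.

    Assumption: the two characteristics are independent, so a cell's share
    is the product of its row share and its column share.
    """
    return {
        (row, col): round(n * row_share * col_share)
        for row, row_share in row_shares.items()
        for col, col_share in col_shares.items()
    }


quotas = quota_cells(sample_size, sex_shares, minority_shares)
for cell, target in quotas.items():
    print(cell, target)
```

Interviewers would then keep recruiting until every cell reaches its target, which is exactly why response rates cannot be measured and selection within a cell may be biased.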

The Logic of Probability Sampling • Representativeness: A sample is representative of the population from which it’s selected if it has the same aggregate characteristics (e.g. same percentage of women, of immigrants, of poor and rich…) • EPSM (Equal Probability of Selection Method): Every member of the population has the same chance of being selected for the sample.

Random Selection Each element has an equal chance of selection independent of any other event in the selection process. Tables of random numbers are often used (in print form or generated by computer). • Sampling Frame: List of every element/case from which a probability sample is selected. Sampling frames may not include every element. It is the researcher’s job to assess the extent of omissions and to correct them if possible.

Types of Probability Sampling: 1. Simple Random Sample • Feasible only with the simplest sampling frame. • Enumerate sampling frame, and randomly select people. • Despite being the ‘pure’ type of random sampling this actually rarely occurs.

A Simple Random Sample
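A minimal sketch of a simple random sample, assuming the sampling frame is already available as a numbered list. The frame of 10,000 student IDs and the function name `simple_random_sample` are hypothetical; a seeded generator stands in for a table of random numbers.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the frame, each with equal probability."""
    rng = random.Random(seed)  # seeded so the draw can be reproduced
    return rng.sample(frame, n)


# Hypothetical enumerated sampling frame.
frame = [f"student_{i:05d}" for i in range(1, 10001)]
srs = simple_random_sample(frame, 100, seed=42)
print(srs[:5])
```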

Types of Probability Sampling: 2. Systematic Random Sample • Random start and then every kth element is selected (e.g. if you wanted to select 1,000 of 10,000 people you’d select every 10th person). • Arrangement of elements in the list can result in a biased sample (e.g. if every kth flat in a list of addresses happens to be a corner apartment, the sample contains corner apartments only).
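The "random start, then every kth element" rule might look like this in code. It is a sketch under the assumption that the frame is an ordered list; `systematic_sample` is an illustrative name, and the interval k is taken as the frame size divided by the sample size, rounded down.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick a random start within the first interval, then every kth element."""
    if n <= 0 or n > len(frame):
        raise ValueError("sample size must be between 1 and the frame size")
    k = len(frame) // n          # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)     # random start in positions 0 .. k-1
    return frame[start::k][:n]


frame = list(range(1, 10001))    # hypothetical list of 10,000 people
systematic = systematic_sample(frame, 1000, seed=3)
print(systematic[:5])
```

Note that nothing in the code protects against a periodic list: if the frame has a pattern that repeats every k elements, every selected case shares it.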

Types of Probability Sampling: 3. Stratified Sampling • Rather than selecting sample from population at large, researcher draws from homogeneous subsets of the population (e.g. random sampling from a set of undergraduates, and from a set of postgraduates). • Ensures that key sub-populations are included in the sample. • Results in a greater degree of representativeness by decreasing the probable sampling error.
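A rough sketch of proportionate stratified sampling, using the undergraduate and postgraduate strata from the slide. The stratum sizes, the 5% sampling fraction and the function name `stratified_sample` are assumptions made for illustration.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        n = round(len(members) * fraction)   # stratum's share of the sample
        sample[name] = rng.sample(members, n)
    return sample


# Hypothetical strata: 8,000 undergraduates and 2,000 postgraduates.
strata = {
    "undergraduate": [f"ug_{i}" for i in range(8000)],
    "postgraduate": [f"pg_{i}" for i in range(2000)],
}
stratified = stratified_sample(strata, 0.05, seed=1)
print({name: len(members) for name, members in stratified.items()})
```

Because each stratum is sampled separately, the sample is guaranteed to contain postgraduates in their population proportion rather than leaving that to chance.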

Types of Probability Sampling: 4. Multistage Cluster Sampling • Used when it's not possible or practical to create a list of all the elements that compose the target population. • Involves repetition of two basic steps: creating lists of clusters and sampling. • Highly efficient but less accurate.

Example of Cluster Sampling: Sampling Coventry residents • Write a list of all neighbourhoods in Coventry • Randomly select (sample) 5 neighbourhoods • Write a list of all streets in each selected neighbourhood • Randomly select (sample) 2 streets in each neighbourhood • Write a list of all addresses on each selected street • Select every house/flat on each selected street • Write a list of all residents in each selected house/flat • Randomly select (sample) one person to interview.
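The list-then-sample steps above can be sketched as nested loops. The toy city below is entirely made up (12 neighbourhoods, 6 streets each, 4 addresses per street, 3 residents per address), as is the function name `multistage_sample`; only the stage structure follows the slide.

```python
import random


def multistage_sample(city, n_neighbourhoods=5, n_streets=2, seed=None):
    """List clusters, sample them, and repeat down to one resident per address."""
    rng = random.Random(seed)
    interviewees = []
    # Stage 1: list all neighbourhoods, sample some of them.
    for hood in rng.sample(sorted(city), n_neighbourhoods):
        streets = city[hood]
        # Stage 2: list the streets in that neighbourhood, sample some.
        for street in rng.sample(sorted(streets), n_streets):
            # Stage 3: take every address on the selected street.
            for address, residents in streets[street].items():
                # Stage 4: list the residents, pick one to interview.
                interviewees.append((hood, street, address, rng.choice(residents)))
    return interviewees


# Hypothetical nested frame: neighbourhood -> street -> address -> residents.
city = {
    f"hood_{h}": {
        f"street_{h}_{s}": {
            f"{a} street_{h}_{s}": [f"resident_{h}_{s}_{a}_{r}" for r in range(3)]
            for a in range(1, 5)
        }
        for s in range(6)
    }
    for h in range(12)
}
interviews = multistage_sample(city, seed=7)
print(len(interviews), interviews[0])
```

Only the selected clusters ever need to be listed in full, which is where the efficiency comes from; the price is that each stage adds its own sampling error.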

Types of Probability Sampling: 5. Probability Proportionate to Size (PPS) Sample • Sophisticated form of cluster sampling. • Used in many large scale survey sampling projects. • Like cluster sampling, but here clusters are selected with a probability proportionate to their size (e.g. a city 10 times larger than another is 10 times more likely to be selected in the first stage of clustering).
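A minimal sketch of the first-stage selection in a PPS design. The city names and sizes are hypothetical, and drawing with replacement via `random.choices` is a simplification; real survey designs usually select clusters without replacement.

```python
import random


def selection_probabilities(cluster_sizes):
    """Probability of each cluster being picked in a single PPS draw."""
    total = sum(cluster_sizes.values())
    return {name: size / total for name, size in cluster_sizes.items()}


def pps_draw(cluster_sizes, n_draws, seed=None):
    """Draw clusters with probability proportionate to size (with replacement)."""
    rng = random.Random(seed)
    names = list(cluster_sizes)
    weights = [cluster_sizes[name] for name in names]
    return rng.choices(names, weights=weights, k=n_draws)


# Hypothetical clusters: city_a is ten times the size of city_b.
cluster_sizes = {"city_a": 300_000, "city_b": 30_000, "city_c": 120_000}
probs = selection_probabilities(cluster_sizes)
draws = pps_draw(cluster_sizes, 3, seed=11)
print(probs, draws)
```

If the same number of people is then interviewed within each selected cluster, residents of large and small cities end up with roughly equal overall chances of selection.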

Note • The sampling strategy used in real projects often combines elements of cluster sampling and elements of stratification. See the example of Peter Townsend’s survey of poverty (Buckingham and Saunders, p. 120).

Weighting • Used when you have “over-sampled” a particular group. This is called “disproportionate sampling” • It assigns some cases more weight than others on the basis of the different probabilities each case had of selection • The simplest form of weighting is to give each case a weight that’s the inverse of the case’s probability of selection
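The inverse-probability rule can be illustrated with made-up numbers: a population of 9,000 majority and 1,000 minority members, with 500 sampled from each group so that the minority is over-sampled. The group sizes, smoker counts and function name are all hypothetical.

```python
def inverse_probability_weight(n_sampled, n_population):
    """Weight = 1 / probability of selection = population size / sample size."""
    return n_population / n_sampled


# Hypothetical disproportionate sample.
groups = {
    "majority": {"population": 9000, "sampled": 500, "smokers": 100},
    "minority": {"population": 1000, "sampled": 500, "smokers": 200},
}

weights = {
    name: inverse_probability_weight(g["sampled"], g["population"])
    for name, g in groups.items()
}

# Unweighted estimate treats every respondent alike and over-counts the minority.
unweighted = sum(g["smokers"] for g in groups.values()) / sum(
    g["sampled"] for g in groups.values()
)

# Weighted estimate lets each respondent stand in for `weight` population members.
weighted = sum(weights[n] * g["smokers"] for n, g in groups.items()) / sum(
    weights[n] * g["sampled"] for n, g in groups.items()
)
print(weights, unweighted, weighted)
```

Each majority respondent counts for 18 people and each minority respondent for 2, so the weighted estimate is pulled back towards the majority group's rate.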

Exercise Imagine that you are going to conduct a ‘smoking survey’ (similar to that in the textbook), and want to get as accurate as possible a sample of Warwick students. • What sampling strategy would you choose and why? • What biases might this strategy produce?

Sampling Error • A Parameter is the summary description of a given variable in a population (e.g. percent of women in the US population) • When researchers generalize from a sample they’re using sample observations to estimate population parameters • Sampling Error is the degree of error to be expected from a given sample design in making these estimations

Sampling Error The most carefully selected sample will never provide a perfect representation of the population from which it was selected. There will always be some sampling error. The expected error in a sample is expressed in terms of confidence levels (e.g. that you’re 95% confident of being right about the proportion of the population that is Catholic, based on how many people in your sample were Catholic).
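As a sketch of what "95% confident" means in practice, the standard normal-approximation interval for a proportion can be computed as below. It assumes a simple random sample that is reasonably large; the figure of 200 Catholics among 1,000 respondents is invented for illustration.

```python
import math


def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for a sample proportion.

    z = 1.96 corresponds to a 95% confidence level.
    """
    p = successes / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return p - z * standard_error, p + z * standard_error


# Hypothetical sample: 200 of 1,000 respondents are Catholic.
low, high = proportion_confidence_interval(200, 1000)
print(f"95% confidence interval: {low:.3f} to {high:.3f}")
```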

A population of ten people with $0 - $9

The Sampling Distribution of Samples of 1

The Sampling Distribution of Samples of 2

The Sampling Distribution of Samples of 3, 4, 5, and 6
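The sampling distributions named in the slides above can be reproduced by brute force. The sketch assumes the ten people hold exactly $0, $1, …, $9 and that samples are drawn without replacement; it lists every possible sample of a given size and records its mean.

```python
from itertools import combinations
from statistics import mean, pstdev

population = list(range(10))  # ten people holding $0 .. $9


def sampling_distribution(values, n):
    """Mean of every possible sample of size n drawn without replacement."""
    return [mean(sample) for sample in combinations(values, n)]


for n in range(1, 7):
    means = sampling_distribution(population, n)
    # Number of possible samples, centre of their means, and spread of their means.
    print(n, len(means), mean(means), round(pstdev(means), 3))
```

The sample means stay centred on the population mean, while their spread shrinks as the sample size grows: larger samples cluster more tightly around the true value.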

Sample Size (reducing sampling error) Sample Size Depends on: • Heterogeneity of the population – the more heterogeneous, the bigger the sample • Number of sub-groups – the more sub-groups, the bigger the sample • Size of the phenomenon you’re trying to detect – the closer to 50% (of the time) that it occurs, the bigger the sample • How accurately you want your sample statistics to reflect the population – the more accurate, the bigger the sample
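Two of these points (closeness to 50% and desired accuracy) show up directly in the usual sample-size formula for estimating a proportion, n = z² p(1 − p) / e². The sketch below assumes a simple random sample from a large population; `required_sample_size` is an illustrative name, and the margins of error are example values.

```python
import math


def required_sample_size(p, margin, z=1.96):
    """Cases needed to estimate a proportion p to within +/- margin.

    z = 1.96 corresponds to a 95% confidence level.
    """
    return math.ceil(z**2 * p * (1 - p) / margin**2)


# The closer the phenomenon is to 50%, the bigger the sample needed.
for p in (0.1, 0.3, 0.5):
    print(p, required_sample_size(p, margin=0.03))

# The more accurate you want to be, the bigger the sample needed.
for margin in (0.05, 0.03, 0.01):
    print(margin, required_sample_size(0.5, margin=margin))
```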

Other considerations when you’re thinking about Sample Size • Response Rate – when you think that a lot of people will not respond, you need to start off with a larger sample • Analysis – some forms of statistical analysis require a large number of cases. If you plan on doing these you will need to ensure you’ve got enough cases Generally (given a choice): Bigger is Better!
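The response-rate point is simple arithmetic: divide the number of completed cases you need by the response rate you expect. The 60% response rate and the target of 1,000 completed interviews below are assumptions chosen for illustration.

```python
import math


def initial_sample_size(completed_needed, expected_response_rate):
    """People to approach so that enough of them actually respond."""
    if not 0 < expected_response_rate <= 1:
        raise ValueError("response rate must be between 0 and 1")
    return math.ceil(completed_needed / expected_response_rate)


# Hypothetical planning figures.
to_approach = initial_sample_size(1000, 0.60)
print(to_approach)
```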
