
# Unit 4: Sampling approaches - PowerPoint PPT Presentation

Unit 4: Sampling approaches. After completing this unit you should be able to: outline the purpose of sampling; understand key theoretical concepts in sampling; understand the need for more complex sampling designs; and understand the main sampling issues and primary sampling options for BSS.


## PowerPoint Slideshow about 'Unit 4: Sampling approaches' - andeana-munoz

Presentation Transcript

### Unit 4: Sampling approaches

• Outline the purpose of sampling

• Understand key theoretical concepts in sampling

• Understand the need for more complex sampling designs

• Understand the main sampling issues and primary sampling options for BSS

• Understand the criteria for choosing a sampling approach

• We sample when we desire to measure characteristics of a specified population (e.g., the proportion of the general population who have unsafe sex) but lack the time and resources to obtain information from all members of the population.

• Concentrating survey time and resources on a sample may also result in better quality data than if resources were spread over the whole population.

• The target population is the population that is the ideal one for meeting a survey’s measurement objective. (For example, all commercial sex workers in a city.)

• The survey population is the target population modified to take into account practical considerations (For example, all commercial sex workers in a city over the age of 15, excluding those who are home-based.)

### What do we want from our sample?

Unbiased estimates of our indicators for the survey population

This requires a random/probability sample. Use the class as an example.

In summary:

• A probability sample is one in which each person in the survey population has a known, non-zero probability of selection.

• Statistical tests are based on the assumption that the sample is a probability sample.

• A probability sample ensures that our sample is like, or can be weighted to be like, the population from which it was drawn, and the estimates of our indicators can be generalised to the larger population.

• Probability sampling requires a sample frame, which is a list of ‘units’ from which a sample may be selected.
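As a minimal sketch of the idea above, the following Python draws a simple random sample from a sample frame. The frame of 200 labelled people and the function name `simple_random_sample` are hypothetical, chosen only for illustration. Every unit on the frame has the same known, non-zero selection probability, n/N.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement from the sample frame.

    Returns the sample and each unit's selection probability (n / N).
    """
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the size of the frame")
    rng = random.Random(seed)
    sample = rng.sample(frame, n)
    selection_probability = n / len(frame)
    return sample, selection_probability


# Hypothetical sample frame: a list of 200 people.
frame = [f"person_{i:03d}" for i in range(1, 201)]
sample, prob = simple_random_sample(frame, 20, seed=1)
print(len(sample), prob)
```

Anyone who is not on the frame has a selection probability of zero, which is why the frame has to cover the whole survey population.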

| | Probability sample | Non-probability sample |
|---|---|---|
| Prone to selection bias | No | Yes |
| Can generalise results to survey population | Yes | No |
| Can estimate precision of survey estimates (i.e., use statistical techniques) | Yes | No |
| Results considered credible | Yes | No |
| Requires sample frame | Yes | No |
| Requires following fixed procedures that are sometimes costly or unfeasible | Yes | No |
| Method replicable (important for measuring trends) | Yes | No |

A summary of probability and non-probability sampling

This requires an adequate sample size.

In summary:

• There are many possible samples that could be selected from the population. Because of chance, each sample would produce a different estimate.

• In real life we only select one sample from the population. If we use probability sampling, we can estimate how precisely the population measure is estimated by the sample estimate.

• We can increase the precision of our estimate by ensuring an adequate sample size. Standard equations are available to calculate sample size.
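The points above can be illustrated with a small simulation. This is a sketch under stated assumptions: the population of 10,000 people with a 30% indicator level is invented, and `sample_proportions` is a hypothetical helper. It draws many samples of two different sizes and compares how much the estimates vary from sample to sample.

```python
import random
import statistics


def sample_proportions(population, n, draws, seed=0):
    """Draw `draws` simple random samples of size n; return each sample's proportion."""
    rng = random.Random(seed)
    return [sum(rng.sample(population, n)) / n for _ in range(draws)]


# Hypothetical population: 10,000 people, 30% of whom report the behaviour (1 = yes).
population = [1] * 3000 + [0] * 7000

# Spread (standard deviation) of the estimates across 500 repeated samples.
spread_small = statistics.stdev(sample_proportions(population, 50, 500))
spread_large = statistics.stdev(sample_proportions(population, 500, 500))
print(f"n=50:  spread of estimates = {spread_small:.3f}")
print(f"n=500: spread of estimates = {spread_large:.3f}")
```

The estimates from the larger samples cluster more tightly around the true value, which is what "more precise" means here.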

### Problems with simple random sampling

Problem 1: Simple random sampling can require the selection of a large number of random numbers.

Solution: Use systematic sampling (i.e., sample people at regular intervals down the sample frame).
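A minimal sketch of systematic sampling, assuming a frame held as a Python list (the function name `systematic_sample` and the frame of 200 units are illustrative). Only one random number is needed, the starting point; every later selection follows at a fixed interval of N/n.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick a random start, then take every k-th unit, where k = N / n."""
    size = len(frame)
    if not 0 < n <= size:
        raise ValueError("n must be between 1 and the size of the frame")
    interval = size / n                  # sampling interval (may be fractional)
    rng = random.Random(seed)
    start = rng.random() * interval      # single random number in [0, interval)
    return [frame[int(start + i * interval)] for i in range(n)]


# Hypothetical frame of 200 units, identified by their position on the list.
frame = list(range(200))
picks = systematic_sample(frame, 20, seed=3)
print(picks)
```

With 200 units and a sample of 20, the interval is 10, so the selected positions are 10 apart.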

Problem 2: Sample frames for an entire target population rarely exist and are often impractical to construct.

Solution: Develop a sampling frame of larger units (clusters). Randomly select clusters and construct a sample frame of individuals in the selected clusters only. Randomly sample individuals within those clusters.
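The two stages can be sketched as follows. The village names, cluster sizes and the function `two_stage_cluster_sample` are hypothetical. In practice, the list of individuals would only be constructed for the clusters chosen at stage 1; here all lists exist up front purely to keep the example short.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly select clusters. Stage 2: randomly select people within them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # stage 1: frame of clusters
    sample = {}
    for name in chosen:
        members = clusters[name]                         # stage 2: frame of individuals
        take = min(n_per_cluster, len(members))
        sample[name] = rng.sample(members, take)
    return sample


# Hypothetical frame: 10 villages with 30 residents each.
clusters = {f"village_{c}": [f"v{c}_p{i}" for i in range(30)] for c in range(10)}
sample = two_stage_cluster_sample(clusters, n_clusters=4, n_per_cluster=5, seed=2)
for village, people in sample.items():
    print(village, people)
```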

Notes on cluster sampling

1. All members of the target population must be included in one of the clusters on the sample frame in order to have a chance of being selected.

2. If clusters are unequal sizes, we need to take this into account to ensure that our sample is not biased by the fact that people in smaller clusters have a higher probability of being selected than those in larger clusters. We can do this by:

• making the probability that a cluster is sampled dependent on its size

• adjusting for cluster size during the analysis.
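The first option in note 2 is commonly implemented as probability-proportional-to-size (PPS) selection. The sketch below uses a systematic PPS walk over cumulative cluster sizes; the cluster names, sizes and function name are assumptions for illustration, not taken from the slides. It also shows why the approach removes the bias: when a fixed number of people is then taken from each selected cluster, every person ends up with the same overall selection probability.

```python
import random


def pps_systematic(cluster_sizes, n_clusters, seed=None):
    """Select clusters with probability proportional to size (systematic PPS).

    A cluster larger than the sampling interval can be selected more than once.
    """
    rng = random.Random(seed)
    total = sum(cluster_sizes.values())
    interval = total / n_clusters
    start = rng.random() * interval
    targets = [start + i * interval for i in range(n_clusters)]

    chosen, cumulative, t = [], 0, 0
    for name, size in cluster_sizes.items():
        cumulative += size
        # A cluster is selected once for each target falling in its size range.
        while t < n_clusters and targets[t] < cumulative:
            chosen.append(name)
            t += 1
    return chosen


# Hypothetical clusters of unequal size.
sizes = {"A": 50, "B": 100, "C": 150, "D": 200}
n_clusters, people_per_cluster = 2, 10
chosen = pps_systematic(sizes, n_clusters, seed=5)

# Overall probability = P(cluster selected) x P(person selected within cluster).
total = sum(sizes.values())
overall = {
    name: (n_clusters * size / total) * (people_per_cluster / size)
    for name, size in sizes.items()
}
print(chosen, overall)
```

In this example, a person in the smallest cluster and a person in the largest cluster both have an overall selection probability of 0.04.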

Notes on cluster sampling, cont.

3. Cluster sampling results in less precise estimates of our indicators than simple random sampling of the same size, because respondents within a cluster tend to be similar to each other. We need to compensate for this by increasing the sample size.

Problem 3: Populations can be spread over a wide area, making logistics difficult.

Solution: Use cluster sampling, as it concentrates fieldwork in specific clusters.

Problem 4: The population consists of distinct sub-groups that we are interested in.

Solution: Make precise estimates for each sub-group ('stratum') by using stratified sampling (i.e., take a sample of adequate size from each stratum). If we want an estimate for the entire population, we can combine the estimates for the strata, provided we know the proportion of the population in each stratum.
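Combining stratum estimates amounts to a weighted average, with each stratum weighted by its share of the population. A minimal sketch, with made-up strata, estimates and shares (the function name `combine_strata` is also hypothetical):

```python
def combine_strata(estimates, population_shares):
    """Weight each stratum's estimate by that stratum's share of the population."""
    if set(estimates) != set(population_shares):
        raise ValueError("estimates and shares must cover the same strata")
    if abs(sum(population_shares.values()) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(estimates[s] * population_shares[s] for s in estimates)


# Hypothetical example: indicator level estimated separately in two strata.
estimates = {"urban": 0.40, "rural": 0.20}   # estimate within each stratum
shares = {"urban": 0.30, "rural": 0.70}      # proportion of the population in each stratum
overall = combine_strata(estimates, shares)
print(round(overall, 2))  # 0.40 * 0.30 + 0.20 * 0.70 = 0.26
```

A simple average of the two stratum estimates (0.30) would be wrong here because the strata are not the same size.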

1. Consistent sampling is required across survey rounds: If sampling changes between rounds, we cannot tell whether observed changes are real or a result of changes in methodology.

2. General population sampling frames can rarely be used to access high-risk groups: Group members may not be found in households in sufficient numbers and may not want to talk in household settings. Instead, the locations where group members congregate can be defined as clusters.

| High-risk group | Possible cluster |
|---|---|
| Brothel-based sex workers | Brothels |
| Non-brothel-based sex workers | Streets, bars, hotels, guesthouses |
| Men who have sex with men | Cruising sites |
| Intravenous drug users | Shooting galleries, injecting sites |
| Truckers | |
| Migrants | Households, workplaces |

Examples of possible clusters for high-risk groups

3. Cluster sampling is difficult when clusters are not stable.

• A measure of cluster size is needed for cluster sampling. It is difficult to estimate cluster size when we use locations like sex worker sites as clusters, because the people in each cluster are rarely fixed.

• The risk behaviour in a cluster may also vary by time of day. This makes it difficult to select a sample that is representative of the entire target population using conventional cluster sampling.

4. Members of high-risk groups may be difficult to identify and access.

5. Cluster sampling is impossible if group members do not congregate. Some groups do not congregate at all. In others, only some members of the population congregate and important sections of the group may be missed.

• Use different sampling strategies for different groups.

• Use conventional sampling methods in unconventional ways.

• Consider using experimental sampling techniques such as Respondent Driven Sampling (RDS).

Conventional cluster sampling

Appropriate for the general population, youth and a few high-risk groups, such as prisoners.

Time location sampling

• Use when high-risk groups congregate, but their clusters are not stable.

• Allows locations to be included as clusters more than once (e.g., at different times of the day or on different days of the week). Clusters are defined by both location and time.

• For example:

Cluster 1 = Site 1, weekday afternoon

Cluster 2 = Site 2, weekday evening

Cluster 3 = Site 1, weekend

Cluster 4 = Site 2, weekday afternoon

Cluster 5 = Site 1, weekday evening

Cluster 6 = Site 2, weekend

Time location sampling, cont.

• This means:

• The fact that the cluster size is not fixed is not a problem, as we only need to know the number of individuals associated with the cluster during the sampling time interval.

• The fact that the type of person in the location varies by time is not a problem, as the location is included at different times.
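A minimal sketch of how a time-location cluster frame could be built and sampled. The two sites and three time slots mirror the six example clusters; the head count of 12 attendees and the helper `sample_people_present` are hypothetical.

```python
import itertools
import random

sites = ["Site 1", "Site 2"]
time_slots = ["weekday afternoon", "weekday evening", "weekend"]

# Each (site, time slot) pair is a separate cluster, so one site appears several times.
cluster_frame = list(itertools.product(sites, time_slots))

rng = random.Random(4)
selected_clusters = rng.sample(cluster_frame, 3)   # stage 1: select time-location clusters


def sample_people_present(people_present, n, rng):
    """Stage 2: sample among the people counted at the site during the time slot."""
    take = min(n, len(people_present))
    return rng.sample(people_present, take)


# Hypothetical head count during one selected visit.
present = [f"attendee_{i}" for i in range(12)]
interviewed = sample_people_present(present, 5, rng)
print(selected_clusters)
print(interviewed)
```

The cluster size used in the analysis is the number of people counted during the visit (12 here), not a fixed membership list for the site.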

Respondent-driven sampling

• Use when high-risk groups do not congregate

• Steps:

• Start with initial contacts or ‘seeds,’ who are surveyed and then become recruiters.

• Each recruiter invites up to three people they know in the high-risk group to be interviewed.

• The new recruits become the recruiters.

• Five to six recruitment waves occur.
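The recruitment steps above can be sketched as a simulation. The social network below (60 people, each knowing four neighbours) is entirely invented, as are the function name and the choice of two seeds; real recruitment depends on who agrees to take part.

```python
import random


def simulate_rds(network, seeds, max_recruits=3, waves=6, seed=None):
    """Simulate recruitment waves; returns (person, recruiter, wave) records."""
    rng = random.Random(seed)
    recruited = set(seeds)
    records = [(s, None, 0) for s in seeds]     # seeds are wave 0 and have no recruiter
    current = list(seeds)
    for wave in range(1, waves + 1):
        next_wave = []
        for recruiter in current:
            # A recruiter can only invite people they know who are not yet in the sample.
            eligible = [p for p in network[recruiter] if p not in recruited]
            rng.shuffle(eligible)
            for recruit in eligible[:max_recruits]:
                recruited.add(recruit)
                records.append((recruit, recruiter, wave))
                next_wave.append(recruit)
        current = next_wave                     # new recruits become the recruiters
    return records


# Hypothetical network: 60 people in a ring, each knowing their 4 nearest neighbours.
n_people = 60
network = {i: [(i + d) % n_people for d in (-2, -1, 1, 2)] for i in range(n_people)}
records = simulate_rds(network, seeds=[0, 30], seed=7)
print(len(records), "people recruited over", max(w for _, _, w in records), "waves")
```

Keeping the recruiter recorded against each recruit is what later allows the links between them to be analysed.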

Theory behind respondent-driven sampling

• Given sufficiently long referral chains (five to six waves out from the seeds you started with), the final sample will be like the network from which we recruit.

• By keeping track of the links between recruiters and recruits and the size of people’s networks, we can calculate the probability of selection and estimate how precisely the population measure is estimated by the sample estimate.
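The slides do not say which estimator is used. As one illustration, a commonly cited approach (often called RDS-II or the Volz-Heckathorn estimator) treats people with larger networks as more likely to be recruited and weights each respondent by the inverse of their reported network size. The sketch below shows that weighting on made-up data; it is an assumption about how the idea might be applied, not the method prescribed by this unit.

```python
def inverse_degree_estimate(respondents):
    """Estimate a proportion, weighting each respondent by 1 / network size.

    respondents: list of (has_trait, network_size) pairs.
    """
    weights = [1 / size for _, size in respondents]
    weighted_yes = sum(w for (trait, _), w in zip(respondents, weights) if trait)
    return weighted_yes / sum(weights)


# Hypothetical respondents: (reports the behaviour, reported network size).
respondents = [(True, 10), (True, 20), (False, 5), (False, 5)]
estimate = inverse_degree_estimate(respondents)
unweighted = sum(trait for trait, _ in respondents) / len(respondents)
print(round(estimate, 3), unweighted)
```

Here the two respondents who report the behaviour have large networks, so they are down-weighted and the estimate falls below the unweighted 50%.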

Criteria for choosing a sampling approach (work through the criteria in order):

| Criterion | If yes | If no |
|---|---|---|
| 1. Is the sub-population of interest the general population or youth? | Cluster sampling | Go to criterion 2 |
| 2. Does the sub-population congregate in identifiable and accessible locations in high proportions? | Go to criterion 3 | RDS |
| 3. Is creating a list of group members associated with each site feasible? | Go to criterion 4 | TLS or RDS |
| 4. Are a high proportion of sub-population group members likely to be accessible at data collection sites on randomly chosen days/times? | Cluster sampling | TLS or RDS |
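The criteria can be read as a short sequence of yes/no checks. A sketch of that logic follows; the function and parameter names are hypothetical, and the returned strings are the approaches named in the criteria.

```python
def choose_sampling_approach(
    general_population_or_youth,
    congregates_in_accessible_locations,
    site_member_lists_feasible,
    accessible_on_random_days_and_times,
):
    """Apply the four criteria in order and return the suggested approach."""
    if general_population_or_youth:
        return "Cluster sampling"
    if not congregates_in_accessible_locations:
        return "RDS"
    if not site_member_lists_feasible:
        return "TLS or RDS"
    if not accessible_on_random_days_and_times:
        return "TLS or RDS"
    return "Cluster sampling"


# A group that congregates, but for which site membership lists are not feasible.
print(choose_sampling_approach(False, True, False, False))  # TLS or RDS
# A group that does not congregate at all.
print(choose_sampling_approach(False, False, False, False))  # RDS
```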

Sample size calculation

The sample size can be based on the number of participants needed in each round (or year) to detect a change in the proportion of an indicator from one round to the next.

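$$
n = D \, \frac{\left[ Z_{1-\alpha} \sqrt{2P(1-P)} + Z_{1-\beta} \sqrt{P_1(1-P_1) + P_2(1-P_2)} \right]^2}{(P_2 - P_1)^2}
$$

Where:

$n$ = The sample size needed in each round

$D$ = The design effect

$Z_{1-\alpha}$ = The z score for the desired confidence level

$Z_{1-\beta}$ = The z score for the desired power

$P_1$ = The proportion of the sample reporting the indicator in year 1

$P_2$ = The proportion of the sample reporting the indicator in year 2

$P = (P_1 + P_2)/2$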

Sample size calculation, cont.

• D (design effect). The design effect can be thought of as a correction factor for how much a cluster sample differs from a simple random sample. The design effect accounts for the similarities people have when they are sampled within the same cluster.

• The bigger the D, the larger the sample size needed.
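The slides do not give a formula for D. A widely used approximation for equal-sized clusters is D = 1 + (m - 1)ρ, where m is the number of respondents per cluster and ρ is the intra-cluster correlation (how alike people in the same cluster are). The sketch below applies that approximation; the values of m and ρ are invented for illustration.

```python
def approximate_design_effect(respondents_per_cluster, intracluster_correlation):
    """Approximate design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (respondents_per_cluster - 1) * intracluster_correlation


# Hypothetical values: 21 respondents per cluster.
d_similar = approximate_design_effect(21, 0.05)   # people within clusters fairly alike
d_unrelated = approximate_design_effect(21, 0.0)  # no similarity within clusters
print(d_similar, d_unrelated)
```

When people within a cluster are no more alike than people in general (ρ = 0), D is 1 and the cluster sample behaves like a simple random sample.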

Sample size calculation, cont.

• P1 and P2. P1 and P2 are the measures of interest for which you wish to see a change between survey rounds.

• The smaller the change you wish to detect, the larger the sample size you will need.

• The closer P1 and P2 are to 50%, the larger the sample size you will need.

Sample size calculation, cont.

• Z1-α. The Z1-α score is a statistic that corresponds to the level of significance desired.

• The smaller the significance level (i.e., higher confidence level), the larger the sample size you will need.

• Z1-β. The Z1-β score is a statistic that corresponds to the power desired.

• The higher the power, the larger the sample size you will need.
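The relationships described above can be checked numerically. The sketch below implements the sample size calculation with D, P1, P2, Z1-α and Z1-β as inputs. The function name is hypothetical, the default z scores (1.96 and 0.83) are the values this unit uses for 95% confidence and 80% power, and the result is rounded up to a whole person, which is an assumption about rounding.

```python
import math


def sample_size_per_round(p1, p2, design_effect, z_alpha=1.96, z_beta=0.83):
    """Participants needed in each round to detect a change from p1 to p2."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    p_bar = (p1 + p2) / 2
    bracket = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    )
    return math.ceil(design_effect * bracket ** 2 / (p2 - p1) ** 2)


baseline = sample_size_per_round(0.20, 0.30, design_effect=2.0)
smaller_change = sample_size_per_round(0.20, 0.25, design_effect=2.0)
near_fifty = sample_size_per_round(0.45, 0.55, design_effect=2.0)
higher_power = sample_size_per_round(0.20, 0.30, design_effect=2.0, z_beta=1.28)  # 90% power
bigger_d = sample_size_per_round(0.20, 0.30, design_effect=3.0)
print(baseline, smaller_change, near_fifty, higher_power, bigger_d)
```

Each variation (a smaller change to detect, proportions closer to 50%, higher power, a larger design effect) gives a larger sample size than the baseline.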

| Indicator level in wave 1 (P1) | Indicator level in wave 2 (P2) | Sample size needed each wave with a design effect of 1.25 | Sample size needed each wave with a design effect of 2.0 |
|---|---|---|---|
| .10 | .20 | 247 | 395 |
| .10 | .25 | 123 | 197 |
| .20 | .30 | 363 | 581 |
| .20 | .35 | 171 | 274 |
| .30 | .40 | 441 | 706 |
| .30 | .45 | 201 | 322 |
| .40 | .50 | 480 | 768 |
| .40 | .55 | 214 | 343 |
| .50 | .60 | 480 | 768 |
| .50 | .65 | 210 | 336 |
| .60 | .70 | 441 | 706 |
| .60 | .75 | 188 | 301 |
| .70 | .80 | 363 | 581 |
| .70 | .85 | 149 | 239 |
| .80 | .90 | 247 | 395 |
| .80 | .95 | 93 | 149 |

Table 4.5. Pre-calculated sample size estimates

Example of sample size calculation

• Suppose you are planning a survey of sex workers using a two-stage cluster design. You wish to show that condom use will increase from 20% in the baseline survey (this year) to 30% or greater in the survey wave next year.

How many sex workers do you need to include each year?

Solution:

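$D$ = 2 (moderate)

$Z_{1-\alpha}$ = 1.96 (95% confidence level)

$Z_{1-\beta}$ = 0.83 (80% power)

$P_1$ = 0.20 (20% condom use in year 1)

$P_2$ = 0.30 (30% condom use in year 2)

$P$ = (0.20 + 0.30)/2 = 0.25

$$
n = 2 \times \frac{\left[ 1.96 \sqrt{2 \times 0.25 \,(1 - 0.25)} + 0.83 \sqrt{0.20\,(1 - 0.20) + 0.30\,(1 - 0.30)} \right]^2}{(0.30 - 0.20)^2} \approx 581.5
$$

Rounding up gives 582 sex workers per survey wave.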

Small group discussion

a. What sampling strategies have you had experience with?

b. What difficulties and successes did you have with the strategy?

Case study

• For each of the following groups, decide what the best sampling strategy is.

• Why is this the best strategy?

• What are the strong and weak points of using this method for the group?

a. Group 1: Youth

b. Group 2: MSM