


Sampling: What you don’t know can hurt you

Juan Muñoz


Outline of presentation

  • Basic concepts

    • Scientific sampling

    • Simple Random Sampling

    • Sampling errors and confidence intervals

    • Sampling errors and sample size

    • Sample size and population size

    • Non-sampling errors

    • Sampling for rare events

    • Two-stage sampling and clustering

    • Stratification

    • Design effect

  • Implementation issues

    • Planning the survey

    • Sample frames

    • Excluded strata

    • Paneling

    • Nonresponse


Random Sampling

  • Random Sampling (a.k.a. Scientific Sampling) is a selection procedure that gives each element of the population a known, positive probability of being included in the sample

  • Random Sampling permits establishing Sampling Errors and Confidence Intervals

  • Other sampling procedures (purposive sampling, quota sampling, etc.) cannot do that

  • Other sampling procedures can also yield biased conclusions


Simple Random Sampling

  • In a Simple Random Sample, households are chosen

    • With the same probability

    • Independently of each other

  • In a Simple Random Sample, the selection probability of each household is p = n / N, where

    • n = sample size

    • N = size of the population

  • A Simple Random Sample is self-weighted
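
The selection probability p = n / N and the self-weighting property can be illustrated with a minimal sketch. The frame of 10,000 household identifiers and the sample size of 500 are hypothetical, and the draw is made without replacement, which is an assumption about how the sample would be taken in practice.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units; every unit has inclusion probability n / N."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sample frame: identifiers of 10,000 households.
frame = list(range(10_000))
sample = simple_random_sample(frame, n=500, seed=1)

selection_probability = len(sample) / len(frame)  # p = n / N
weight = 1 / selection_probability                # identical for every household

print(f"p = {selection_probability}, weight = {weight}")
```

Because every household carries the same weight, unweighted sample proportions estimate population proportions directly.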


Simple Random Sampling

  • A simple random sample would be hard to implement...

    • A list of all households in the country is generally not available to select the sample from

    • In other words, we don’t have a good sample frame

    • High transportation costs

    • Difficult management

  • ...but can be used to illustrate some basic facts about sampling

    • Sampling Errors and Confidence Intervals

    • The relationship between sampling error and sample size

    • The relationship between sample size and population size

    • Sampling vs. non-sampling errors


Sampling error and sample size

Standard error e when estimating a prevalence P in a sample of size n taken from an infinite population:

e = √( P (1 − P) / n )


Confidence intervals

In a sample of 1,000 households, 280 households (28 percent) have preschool children.

Standard error is 1.42 percent.

95 percent confidence interval: 28 ± 1.42 • 1.96

99 percent confidence interval: 28 ± 1.42 • 2.58

[Figure: the estimate and its confidence intervals on a scale running from 24 to 32 percent]
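
The numbers in this example can be reproduced with a short sketch. It assumes the usual binomial formula e = √(P(1 − P)/n), which does give the 1.42 percent quoted above; the function names are illustrative only.

```python
import math

def standard_error(p, n):
    """Standard error of a prevalence p estimated from n units (infinite population)."""
    return math.sqrt(p * (1 - p) / n)

def confidence_interval(p, n, z):
    """Interval p ± z * e, with z = 1.96 for 95 percent and 2.58 for 99 percent."""
    e = standard_error(p, n)
    return p - z * e, p + z * e

n = 1_000
p = 280 / n                     # 28 percent of households have preschool children
e = standard_error(p, n)
ci95 = confidence_interval(p, n, 1.96)
ci99 = confidence_interval(p, n, 2.58)

print(f"standard error: {100 * e:.2f} percent")
print(f"95 percent CI: {100 * ci95[0]:.1f} to {100 * ci95[1]:.1f} percent")
print(f"99 percent CI: {100 * ci99[0]:.1f} to {100 * ci99[1]:.1f} percent")
```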


Sampling error and sample size

To halve sampling error, sample size must be quadrupled

[Figure: standard error plotted against sample size]
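
The quadrupling rule follows from the square root in the standard error. A small sketch, again assuming e = √(P(1 − P)/n) and using the 28 percent prevalence of the earlier example:

```python
import math

def standard_error(p, n):
    """Standard error of a prevalence p estimated from n units (infinite population)."""
    return math.sqrt(p * (1 - p) / n)

p = 0.28
errors = {n: standard_error(p, n) for n in (1_000, 4_000, 16_000)}

# Each fourfold increase in n cuts the standard error in half.
for n, e in errors.items():
    print(f"n = {n:>6}: standard error = {100 * e:.2f} percent")
```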


Sample size and population size

Standard error e when estimating a prevalence P in a sample of size n taken from a population of size N (the formula includes a finite population correction)
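
The slide's exact formula did not survive in this transcript. The sketch below assumes one common form of the finite population correction, the factor (1 − n/N); some textbooks use (N − n)/(N − 1) instead, which is nearly identical for large N. The population sizes are hypothetical.

```python
import math

def standard_error_fpc(p, n, N):
    """Standard error with an ASSUMED finite population correction (1 - n/N)."""
    return math.sqrt(p * (1 - p) / n * (1 - n / N))

p, n = 0.28, 1_000
for N in (2_000, 10_000, 1_000_000, 100_000_000):
    e = standard_error_fpc(p, n, N)
    print(f"N = {N:>11}: standard error = {100 * e:.2f} percent")
```

Under this assumption the correction only matters when the sample is a sizeable fraction of the population.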


Sample size and population size

[Figure: sample size needed for a given precision, plotted against population size]
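
A hedged sketch of the relationship the figure depicts: solving the standard error formula for n, under the assumed correction factor (1 − n/N), gives n = n0 / (1 + n0/N), where n0 = P(1 − P)/e² is the sample size for an infinite population. The target precision and population sizes are hypothetical.

```python
def required_sample_size(p, e, N=None):
    """Sample size needed to estimate prevalence p with standard error e."""
    n0 = p * (1 - p) / e ** 2      # infinite-population sample size
    if N is None:
        return n0
    return n0 / (1 + n0 / N)       # follows from the ASSUMED factor (1 - n/N)

target = 0.0142                    # standard error of 1.42 percentage points
for N in (10_000, 100_000, 1_000_000, 100_000_000):
    print(f"N = {N:>11}: n = {required_sample_size(0.28, target, N):.0f}")
```

Beyond a moderately large population, the required sample size barely changes with population size.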


Sampling vs. non-sampling errors

[Figure: total error, sampling error, and non-sampling error, plotted against sample size]


Absolute and relative errors

The formula gives the absolute error e

But we are often interested in the relative error e / P

For rare events (small P), the relative error can be large, even with very big samples

This may be the case for some of the Millennium Development Goals (MDGs)

  • Infant / maternal mortality

  • HIV/AIDS prevalence

  • Extreme poverty
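
To see why rare events are hard to measure, the relative error can be computed as e / P. This sketch assumes the simple random sampling formula e = √(P(1 − P)/n); the prevalences and the sample size of 10,000 are hypothetical.

```python
import math

def relative_error(p, n):
    """Standard error divided by the prevalence itself (e / P)."""
    absolute = math.sqrt(p * (1 - p) / n)
    return absolute / p

n = 10_000
for p in (0.28, 0.05, 0.01, 0.001):
    print(f"P = {p:<5}: relative error = {100 * relative_error(p, n):.1f} percent")
```

The absolute error shrinks as P gets smaller, but the relative error grows roughly like 1 / √(nP).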


Two-stage sampling

  • The country is divided into small Primary Sampling Units (PSUs)

  • In the first stage, PSUs are selected

  • In the second stage, households are chosen within the selected PSUs


Two-stage sampling

  • Solves the problems of Simple Random Sampling

  • Provides an opportunity to link community-level factors to household behavior

  • The sample can be made self-weighted if

    • In the first stage, PSUs are selected with Probability Proportional to Size (PPS)

    • In the second stage, a fixed number of households are chosen within each of the selected PSUs

  • The price to pay is cluster effect
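
A minimal sketch of why PPS selection followed by a fixed take per PSU yields a self-weighted sample. The first stage is implemented here as systematic PPS, which is one common way to do it and an assumption, not necessarily the method the presenter had in mind. PSU sizes, k and m are hypothetical.

```python
import random

def pps_systematic(sizes, k, seed=None):
    """Select k PSU indices with probability proportional to size (systematic PPS).

    A PSU larger than the sampling interval can be selected more than once.
    """
    rng = random.Random(seed)
    step = sum(sizes) / k
    start = rng.random() * step
    targets = [start + i * step for i in range(k)]
    selected, cumulative, t = [], 0, 0
    for index, size in enumerate(sizes):
        cumulative += size
        while t < k and targets[t] < cumulative:
            selected.append(index)
            t += 1
    return selected

sizes = [120, 80, 200, 150, 90, 160, 110, 90]   # households per PSU (hypothetical)
k, m = 2, 10                                    # PSUs to select, households per PSU
total = sum(sizes)

chosen = pps_systematic(sizes, k, seed=0)

# First stage: k * size / total. Second stage: m / size. The size cancels out.
overall = [(k * size / total) * (m / size) for size in sizes]
print(chosen, overall)
```

Every household ends up with the same overall probability k • m / total, whatever the size of its PSU.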


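Cluster effect

Standard error grows when the sample of size n is drawn from k PSUs, with m households in each PSU (n = k • m)

e²TSS = e²SRS • [1 + ρ (m − 1)]

  • e TSS = standard error of the Two Stage Sample

  • e SRS = standard error of a Simple Random Sample of the same size

  • ρ = intra-cluster correlation coefficient

  • 1 + ρ (m − 1) = cluster effect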


Cluster effects

For a total sample size of 12,000 households

Number     Number of        Intra-cluster correlation coefficient
of PSUs    households
           per PSU          0.01      0.02      0.05      0.10

3000         4              1.03      1.06      1.15      1.30
2000         6              1.05      1.10      1.25      1.50
1500         8              1.07      1.14      1.35      1.70
1000        12              1.11      1.22      1.55      2.10
 800        15              1.14      1.28      1.70      2.40
 600        20              1.19      1.38      1.95      2.90
 400        30              1.29      1.58      2.45      3.90
 300        40              1.39      1.78      2.95      4.90
 200        60              1.59      2.18      3.95      6.90
 150        80              1.79      2.58      4.95      8.90
 100       120              2.19      3.38      6.95     12.90
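
The tabulated values are consistent with a cluster effect of 1 + ρ(m − 1), where ρ is the intra-cluster correlation coefficient and m the number of households per PSU. A short sketch that recomputes the table under that formula:

```python
def cluster_effect(rho, m):
    """Ratio of two-stage to simple-random-sample variance: 1 + rho * (m - 1)."""
    return 1 + rho * (m - 1)

total_households = 12_000
rhos = (0.01, 0.02, 0.05, 0.10)

table = {}
for psus in (3000, 2000, 1500, 1000, 800, 600, 400, 300, 200, 150, 100):
    m = total_households // psus          # households per PSU
    table[psus] = [round(cluster_effect(rho, m), 2) for rho in rhos]
    print(psus, m, table[psus])
```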


Stratified Sampling

  • The population is divided up into subgroups or “strata”.

  • A separate sample of households is then selected from each stratum.

  • There are two primary reasons for using a stratified sampling design:

    • To potentially reduce sampling error by gaining greater control over the composition of the sample.

    • To ensure that particular groups within a population are adequately represented in the sample.

  • These objectives are often contradictory in practice.

  • The sampling fraction generally varies across strata, so sampling weights need to be used to analyze the data.
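
A minimal sketch of why weights matter when the sampling fraction differs across strata. The two strata, their population sizes and their sample counts are hypothetical; each household's weight is taken as N_h / n_h, the inverse of its stratum's sampling fraction.

```python
# Hypothetical strata: population size N, sample size n, and the number of
# sampled households that have preschool children.
strata = {
    "urban": {"N": 600_000, "n": 300, "with_children": 60},
    "rural": {"N": 400_000, "n": 300, "with_children": 120},
}

# Weight of each sampled household = inverse of the stratum sampling fraction.
weights = {name: s["N"] / s["n"] for name, s in strata.items()}

weighted = (
    sum(weights[name] * s["with_children"] for name, s in strata.items())
    / sum(weights[name] * s["n"] for name, s in strata.items())
)
unweighted = (
    sum(s["with_children"] for s in strata.values())
    / sum(s["n"] for s in strata.values())
)

print(f"weighted: {weighted:.3f}, unweighted: {unweighted:.3f}")
```

The unweighted figure over-represents the rural stratum, which was sampled at a higher rate.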


Design effect

  • In a two-stage sample: Cluster effect = e²TSS / e²SRS

  • In a more complex sample (with two or more stages, stratification, etc.): Design effect = Deff = e²CS / e²SRS

  • It can be interpreted as an apparent shrinking of the sample size, as a result of clustering and stratification.

  • It can be estimated with specialized software (such as Stata’s svy commands)
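
The "apparent shrinking" can be made concrete. Since the variance of a simple random sample is inversely proportional to its size, a complex sample of n households with design effect Deff is about as precise as a simple random sample of n / Deff households. The figures below are hypothetical.

```python
import math

def effective_sample_size(n, deff):
    """Size of the simple random sample that would give the same precision."""
    return n / deff

def standard_error_inflation(deff):
    """Factor by which the standard error grows relative to simple random sampling."""
    return math.sqrt(deff)

n, deff = 12_000, 2.45
print(f"effective sample size: {effective_sample_size(n, deff):.0f}")
print(f"standard error inflated by a factor of {standard_error_inflation(deff):.2f}")
```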


First stage sample frame: The list of Census Enumeration Areas

  • Exhaustive

  • Unambiguous

  • Linked with cartography

  • Measure of size (for PPS selection)

  • Up to date (?)

  • Area Units of adequate size


Second stage sample frame: The household listing operation

  • What is involved? Training, organization, supervision, forms

  • How long does it take? 50-80 households per enumerator/day

  • How much does it cost? ~15% of the total cost of fieldwork

  • How much earlier than the survey? As close as possible

  • Is it always needed? Yes (almost)

  • Dwellings or households? A dwelling listing is more permanent

  • Who draws the sample? Ideally, central staff

  • Asking extra questions during listing: Not recommended

  • Can new technologies help? Yes (GPS)


Planning the survey

  • Selected PSUs should be allocated

    • Among teams

    • During the survey period


Excluded strata

  • Parts of the country may need to be excluded from the sample for security or other reasons


Panel Surveys can measure change better

It seems that Y2001 > Y2005 but…

…both measures are affected by sampling errors (e2001 and e2005)

The error of the difference Y2005 - Y2001 is…

…√(e²2001 + e²2005) if the two samples are independent

…only √(e²2001 + e²2005 – 2ρ e2001 e2005) if the sample is the same, where ρ is the correlation between Y2001 and Y2005
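
A short sketch of the gain from paneling, using the standard result that the variance of a difference is the sum of the variances minus twice the covariance, i.e. minus 2ρ times the product of the two standard errors. The standard errors and the correlation of 0.8 are hypothetical.

```python
import math

def error_of_difference(e1, e2, rho=0.0):
    """Standard error of Y2 - Y1; rho = 0 corresponds to independent samples."""
    return math.sqrt(e1 ** 2 + e2 ** 2 - 2 * rho * e1 * e2)

e2001, e2005 = 1.42, 1.42          # standard errors in percentage points

independent = error_of_difference(e2001, e2005)
panel = error_of_difference(e2001, e2005, rho=0.8)

print(f"independent samples: {independent:.2f}")
print(f"same sample, rho = 0.8: {panel:.2f}")
```

The stronger the correlation between the two rounds, the smaller the error of the estimated change.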


Advantages and disadvantages of panels

  • Analytical advantages

    • Can measure changes better

    • Permit understanding better why things changed

    • Permit correlating past and present behavior

  • Analytical disadvantages

    • Become progressively less representative of the population

  • Practical advantages

    • No sampling design needed for the second and subsequent surveys

  • Practical disadvantages

    • Sample attrition

    • Much harder to manage

    • Better to design them prospectively rather than as an afterthought


Nonresponse

  • Possible solutions…

    • Replace nonrespondents with similar households

    • Increase the sample size to compensate for it

    • Use correction formulas

    • Use imputation techniques (hot-deck, cold-deck, warm-deck, etc.) to simulate the answers of nonrespondents

    • None of the above


The best way to deal with nonresponse is to prevent it

Lohr, Sharon L. Sampling: Design & Analysis (1999)


[Figure: diagram of factors affecting total nonresponse. Labels shown: Type of survey; Data collection method; Interviewers; Training; Motivation; Work load; Qualification; Respondents; Availability; Burden; Motivation; Proxy; Socio-economic; Economic; Demographic]

Source: “Some factors affecting Non-Response” by R. Platek. 1977. Survey Methodology 3: 191-214


Case study: The IHSES

Iraq Household Socio-Economic Survey

Presenter: Ms Najla Murad - COSIT

  • Total sample size: 18,144 households

  • 56 strata = 18 governorates x 3 zones (Urban Center / Other Urban / Rural), with 5 zones in Baghdad

  • No explicitly excluded strata

  • Within each stratum: 324 households, selected in two stages:

    • 54 blocks, selected with PPS

    • In each block: 6 households (a cluster), selected with equal probability (EP)

  • The 162 clusters of each governorate were allocated

    • To fieldworkers: 3 teams x 3 interviewers x 18 clusters

    • In time: 18 waves x 9 clusters (randomly); one wave = 20 days, so the fieldwork period = 12 months
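
The design figures above fit together arithmetically, which a few lines can confirm. All numbers come from the slide; the variable names are illustrative, and the 162 clusters per governorate apply to governorates with three zones.

```python
governorates = 18
strata = (governorates - 1) * 3 + 5          # 3 zones each, 5 in Baghdad
blocks_per_stratum = 54
households_per_block = 6

households_per_stratum = blocks_per_stratum * households_per_block
total_households = strata * households_per_stratum
total_clusters = strata * blocks_per_stratum
clusters_per_governorate = 3 * blocks_per_stratum

by_fieldworkers = 3 * 3 * 18                 # teams x interviewers x clusters
by_waves = 18 * 9                            # waves x clusters per wave
fieldwork_days = 18 * 20                     # waves x days per wave

print(strata, households_per_stratum, total_households, total_clusters)
print(clusters_per_governorate, by_fieldworkers, by_waves, fieldwork_days)
```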


Case study: The IHSES

Iraq Household Socio-Economic Survey

Performance of the contingency plans

  • If a cluster could not be visited at the scheduled time, it was swapped with one of the selected clusters not yet visited, chosen at random.

  • At the end of fieldwork, 75 of the 3,024 originally selected clusters could not be visited (2.5 percent)

  • However, over 30 percent of the clusters were not visited at the scheduled time

  • In the clusters that could be visited, non-response was negligible (~1.5 percent)

