Sampling weights: an appreciation

Sampling weights: an appreciation (Session 19)

Learning Objectives
By the end of this session, you will be able to
• explain the role of sampling weights in estimating population parameters
• calculate sampling weights for very simple sampling designs
• appreciate that calculating sampling weights for complex survey designs is non-trivial and requires professional expertise

What is meant by sampling weights?
• Real surveys are generally multi-stage
• At each stage, the probabilities of selecting units at that stage are not generally equal
• When population parameters like a mean or proportion are to be estimated, results from lower levels need to be scaled up from the sample to the population
• This scaling-up factor, applied to each unit in the sample, is called its sampling weight.

A simple example
• Suppose, for example, a simple random sample of 500 HHs in a rural district (having 7349 HHs in total) showed 140 were living below the poverty line
• Hence the estimated total in the population living below the poverty line = (140/500)*7349 ≈ 2058
• Data for each HH was a 0,1 variable, 1 being allocated if the HH was below the poverty line.
• Multiplying this variable by 7349/500 ≈ 14.7 and summing over the sample would lead to the same answer.
• i.e. sampling weight for each HH ≈ 14.7
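The arithmetic on this slide can be checked with a short sketch. The figures (7349, 500, 140) come from the slide; the variable names and the way the 0/1 indicator list is built are illustrative assumptions only.

```python
# Figures from the slide: a simple random sample of n = 500 households
# drawn from N = 7349, of which 140 were below the poverty line.
N = 7349
n = 500
below_poverty_in_sample = 140

# Under simple random sampling every household has the same weight N / n.
weight = N / n

# One 0/1 indicator per sampled household (1 = below the poverty line).
# The order of the values does not matter for the total.
indicators = [1] * below_poverty_in_sample + [0] * (n - below_poverty_in_sample)

# Route 1: scale the sample proportion up to the population.
total_via_proportion = (below_poverty_in_sample / n) * N

# Route 2: multiply each indicator by the sampling weight and sum.
total_via_weights = sum(weight * y for y in indicators)

print(round(weight, 1))             # sampling weight, about 14.7
print(round(total_via_proportion))  # estimated total, about 2058
print(round(total_via_weights))     # same estimate via weights
```

Both routes give the same estimate because the weight is constant across households in this design.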

Why are weights needed?
• The above was a trivial example with equal probabilities of selection
• In general, units in the sample have very different probabilities of selection, i.e. it is rare to get a self-weighting design
• To allow for unequal probabilities of selection, each unit is weighted by the reciprocal of its probability of selection
• Thus sampling weight = 1/(probability of selection)

Weights in stratified sampling
• Consider the “To the Woods” example data set discussed in Session 10.
• The mean numbers of large trees per plot were:
  • 97.875 in region 1, based on n1=8
  • 83.500 in region 2, based on n2=6
• Hence the total number of large trees in the forest can be estimated as (96*97.875) + (72*83.5) = 15408, where 96 and 72 are the numbers of plots in regions 1 and 2
• So what are the sampling weights used for each unit (plot)?

Self-weighting again
• The sampling weights are the same for all plots, whether in region 1 or region 2. Why is this?
• What are the probabilities of selection here?
• In region 1, each unit is selected with prob=8/96
• In region 2, each unit is selected with prob=6/72
• Recall that a design where probabilities of selection are equal for all selected units is called a self-weighting design.
• So regarding the sample as a simple random sample should give us the correct mean.
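As a minimal sketch of the point above, the sampling weight of each plot can be computed as the reciprocal of its selection probability and then used to recover the estimated total of 15408. The stratum sizes (96 and 72), sample sizes and means come from the slides; the dictionary layout and names are illustrative assumptions. Plot-level data are not given here, so each stratum's sample sum is reconstructed as n times the stratum mean.

```python
# Stratum sizes, sample sizes and sample means taken from the slides.
strata = {
    "region 1": {"N": 96, "n": 8, "mean": 97.875},
    "region 2": {"N": 72, "n": 6, "mean": 83.5},
}

estimated_total = 0.0
for name, s in strata.items():
    # Selection probability is n / N, so the sampling weight
    # (its reciprocal) is N / n.
    s["prob"] = s["n"] / s["N"]
    s["weight"] = s["N"] / s["n"]

    # Sum of the sampled plot values, reconstructed from the mean.
    sample_sum = s["n"] * s["mean"]

    # Each sampled plot contributes weight * value to the total.
    estimated_total += s["weight"] * sample_sum

    print(name, "weight =", s["weight"])

print("estimated total =", estimated_total)
```

Both regions end up with the same weight of 12, which is what makes the design self-weighting.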

Results for means
• The mean number of large trees, using the formula for stratified sampling, is [(96/168)*97.875] + [(72/168)*83.5] = 91.71
• Treating the 14 observations as if they were drawn as a simple random sample also gives 91.71.
• The results for variances, however, differ
• Variance of stratified sample mean = 1.28
• Variance of mean ignoring stratification = 2.18

Results for means
• It is important to note that the weights used in computing a mean, i.e. (96/168)*(1/8) = 1/14 for plots in region 1, and (72/168)*(1/6) = 1/14 for plots in region 2, are not sampling weights
• Sampling weights refer to the multiplying factor when estimating a total.
• Essentially they represent the number of elements in the population that an individual sampling unit represents.
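The distinction between the weights used for a mean and the sampling weights can be illustrated with the same figures. This is only a sketch: the numbers are from the slides, the variable names are assumptions, and exact fractions are used so the 1/14 values are not blurred by floating-point rounding.

```python
from fractions import Fraction

# Stratum sizes, sample sizes and sample means from the slides.
N1, n1, mean1 = 96, 8, 97.875
N2, n2, mean2 = 72, 6, 83.5
N = N1 + N2  # 168 plots in the whole forest

# Stratified estimate of the mean, and the mean obtained by pooling
# all 14 plots as if they were one simple random sample.
stratified_mean = (N1 / N) * mean1 + (N2 / N) * mean2
pooled_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)

# Weight applied to each plot value when computing the mean.
mean_weight_1 = Fraction(N1, N) * Fraction(1, n1)
mean_weight_2 = Fraction(N2, N) * Fraction(1, n2)

# Sampling weight: the number of population plots each sampled plot
# represents when estimating a total.
sampling_weight_1 = Fraction(N1, n1)
sampling_weight_2 = Fraction(N2, n2)

print(round(stratified_mean, 2), round(pooled_mean, 2))
print(mean_weight_1, mean_weight_2)
print(sampling_weight_1, sampling_weight_2)
```

In this example each mean weight equals the sampling weight divided by the population size (12/168 = 1/14), which is why the two kinds of weight give consistent answers while meaning different things.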

Other uses of weights
• Weights are also used to deal with non-response and missing values
• If measurements on all units are not available for some reason, the sampling weights may be re-computed to allow for this.
• e.g. in conducting the Household Budget Survey 2000/2001 in Tanzania, not all rural areas planned in the sampling scheme were visited. As a result, sampling weights had to be re-calculated and used in the analysis.
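One simple way such a re-computation can work is a weighting-class adjustment: within a group of similar units, the base weights of the responding units are inflated by the ratio of selected to responding units. The sketch below uses entirely hypothetical numbers and names; it is not the procedure used in the Tanzanian survey, which is not described in these slides.

```python
# Hypothetical stratum: all numbers below are made up for illustration.
population_size = 1200   # households in the stratum
selected = 100           # households selected for the sample
responded = 80           # households that actually provided data

# Base sampling weight before any adjustment.
base_weight = population_size / selected

# Non-response adjustment factor: selected units per responding unit.
adjustment = selected / responded

# Adjusted weight applied to each responding household.
adjusted_weight = base_weight * adjustment

# After adjustment the respondents' weights again add up to the
# stratum population size.
sum_of_adjusted_weights = responded * adjusted_weight

print(base_weight, adjusted_weight, sum_of_adjusted_weights)
```

This adjustment assumes that non-responding households resemble the responding ones in the same stratum, which may not hold in practice.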

Computation of weights
• The general approach is to find the probability of selecting a unit at every stage of the sample selection process
• e.g. in a 3-stage design, three sets of probabilities will result
• The probability of selecting each final-stage unit is then the product of these three probabilities
• The reciprocal of the above probability is then the sampling weight
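The steps above can be sketched for a hypothetical 3-stage design. The stage names and counts are invented for illustration, and the sketch assumes equal-probability selection within each stage, which real surveys often do not use.

```python
from fractions import Fraction
from math import prod

# Hypothetical 3-stage design: (units selected, units available) at each
# stage, for the path leading to one final-stage household.
stages = {
    "districts": (10, 50),
    "villages within district": (5, 25),
    "households within village": (20, 200),
}

# Selection probability at each stage, kept as exact fractions.
stage_probs = [Fraction(taken, available) for taken, available in stages.values()]

# Overall probability of selecting the household is the product of the
# stage probabilities; the sampling weight is its reciprocal.
overall_prob = prod(stage_probs)
sampling_weight = 1 / overall_prob

print(overall_prob)
print(sampling_weight)
```

With these made-up figures each sampled household would stand for 250 households in the population. In a real design the counts, and hence the weights, typically differ from one district or village to another.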

Difficulties in computations
• Standard methods, as illustrated in textbooks on sampling, often do not apply in real surveys
• Complex sampling designs are common
• Computing correct probabilities of selection can then be very challenging
• Usually professional assistance is needed to determine the correct sampling weights and to use them correctly in the analysis

Software for dealing with weights
• When analysing data from complex survey designs, it is important to check that the software can deal with sampling weights
• Packages such as Stata, SAS and Epi Info have facilities for dealing with sampling weights
• However, you need to be careful that the approaches used are appropriate for your own survey design

References
• Brogan, D. (2004) Sampling error estimation for survey data. Chapter XII, pp. 447-490, of the UN publication An Analysis of Operating Characteristics of Household Surveys in Developing and Transition Countries: Survey Costs, Design Effects and Non-Sampling Errors. Available at http://unstats.un.org/unsd/hhsurveys/index.htm (accessed 10th September 2007)
• Lohr, S.L. (1999) Sampling: Design and Analysis. International Thomson Publishing. ISBN 0-534-35361-4
• Rao, P.S.R.S. (2000) Sampling Methodologies: with applications. Chapman and Hall, London.
