Multiple Indicator Cluster Surveys Survey Design Workshop

Sampling: Advanced Sampling

Major steps in designing MICS sample • Define objectives • Key indicators • Desired level of precision • Sub-national domains of estimation • Identify most appropriate sampling frame • Most recent census of population and housing • Master sample or sample for another survey conducted recently

Major steps in designing MICS sample • Determine sample size and allocation • Determine availability of previous MICS or DHS results to provide measures of sampling parameters

Sampling Frame • Sampling frame: • Nationally-representative • Complete coverage • Measures of size (households or population) for small area units • Generally most recent census is the most effective sampling frame

Sampling Frame • In some cases more recent pre-census listing may be available • When no census is available, identify most complete geographic frame available (e.g. list of villages/localities with estimated population)

Sampling Frame • Common problems with area frames: • Coverage issues • Census maps of poor quality • Errors and changes in area boundaries • Inappropriate type and size of area units • Lack of auxiliary information

Sample Size Determination

n is the required sample size (number of households) • 4 is a factor to achieve the 95 percent level of confidence • r is the predicted or estimated value of the indicator in the target population • deff is the design effect

RR is the response rate • pb is the proportion of the target subpopulation in total population (upon which the indicator, r, is based) • AveSize is the average household size (that is, average number of persons per household)

e is the margin of error to be tolerated at the 95% level of confidence • Currently, e = 0.12r [that is, 12% of r; the relative standard error of r is then 6%, because e = 2 × standard error of r]
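The formula itself is not reproduced in the text of these slides. The sketch below assumes the form implied by the symbol definitions, n = 4 r (1 − r) deff / (e² · pb · AveSize · RR) with e = 0.12r. The function name and the input values are hypothetical and chosen only for illustration; the MICS Excel template remains the reference tool.

```python
import math

def mics_sample_size(r, deff, rr, pb, ave_size, rme=0.12):
    """Required number of households under the ASSUMED formula
    n = 4 * r * (1 - r) * deff / (e**2 * pb * ave_size * rr), with e = rme * r."""
    e = rme * r  # margin of error at the 95% level of confidence
    n = (4 * r * (1 - r) * deff) / (e ** 2 * pb * ave_size * rr)
    return math.ceil(n)  # round up to a whole household

# Illustrative inputs only (not taken from any survey).
n = mics_sample_size(r=0.40, deff=1.5, rr=0.90, pb=0.12, ave_size=5.0)
n_low_r = mics_sample_size(r=0.20, deff=1.5, rr=0.90, pb=0.12, ave_size=5.0)
print(n, n_low_r)
```

Because e is defined relative to r, a lower predicted value of r gives a larger required sample under this formula, which is why the choice of key indicator matters.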

Previously in MICS2 • 2 different values for margin of error • Margin of error was 5 percentage points for high values of r (over 25%) • Margin of error was 3 percentage points for low values of r (25% or less) • Difficulty for users in deciding on the sample size for their surveys.

MICS template for sample size calculation - EXCEL FILE

Selection of key indicators • Choose an important indicator that will yield the largest sample size • Step 1: Select 2 or 3 target populations, each representing a small percentage of the total population (pb); typically • Children 12-23 months: 2-4% or • Children under 5 years: 7%-20%

Selection of key indicators • Step 2: Review important indicators for these target groups but ignore indicators with very low or very high prevalence (less than 10% or over 40%, respectively) • Do not choose from the desirably low coverage indicators an indicator that is already acceptably low • Do not choose childhood and maternal mortality ratios

Explicit Stratification • Explicit stratification: dividing the sampling frame into sub-groups (called strata) of homogeneous (similar) PSUs. • Advantages: • Better precision because reduced variance within stratum given similarity of units • Flexible design, sub-national estimates for smaller domains (differential sampling rates) • Example of stratification: region, urban/rural

Implicit Stratification • Sort the sampling frame according to certain characteristics such as regions, urban-rural residence, sub-regions, districts, etc., then select a systematic PPS sample. • Ensures a representative sample for each subgroup • Automatically provides proportional allocation by size of subgroup

Allocation of sample to strata/domains • Proportional allocation • Effective for precision of estimates at the national level • Equal allocation to each domain • Used when each domain requires same level of precision • Optimum allocation – takes into account differential variance and costs by stratum • For example, variability may be higher in urban areas and enumeration costs may be higher in rural areas – use higher sampling rate for urban areas
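The three allocation rules can be compared with a small sketch. The optimum rule below assumes the textbook form in which the stratum sample is proportional to N_h · S_h / sqrt(c_h); the function name, stratum sizes, standard deviations and unit costs are all hypothetical.

```python
import math

def allocate(n, sizes, sds=None, costs=None, method="proportional"):
    """Split a total sample n across strata (illustrative sketch)."""
    if method == "proportional":
        weights = [float(N) for N in sizes]
    elif method == "equal":
        weights = [1.0] * len(sizes)
    elif method == "optimum":
        # Assumed rule: n_h proportional to N_h * S_h / sqrt(c_h)
        weights = [N * s / math.sqrt(c) for N, s, c in zip(sizes, sds, costs)]
    else:
        raise ValueError("unknown method: " + method)
    total = sum(weights)
    return [n * w / total for w in weights]

# Hypothetical frame: urban (more variable), rural (costlier to enumerate).
sizes = [3000, 7000]
proportional = allocate(1000, sizes)
equal = allocate(1000, sizes, method="equal")
optimum = allocate(1000, sizes, sds=[0.5, 0.4], costs=[1.0, 1.5625],
                   method="optimum")
print(proportional, equal, [round(x) for x in optimum])
```

With these made-up inputs the optimum rule gives the urban stratum about 401 households instead of 300, that is, a higher urban sampling rate, as in the example on the slide.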

Subnational estimates • Number of separate areas (domains) for which separate, equally reliable estimates are wanted affects sample size • For example, if 10 regional estimates are wanted, theoretically the sample should be increased by factor of 10 • As a compromise, larger sampling errors accepted for subnational estimates • One proposal (by Dr. Vijay Verma) – increase national sample size by factor of D^0.65, where D is the number of domains • Results in an average increase in the sampling errors for domain estimates by a factor of about 1.5
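A short sketch of the D^0.65 compromise follows. The sampling-error calculation is an assumption added here for illustration: it supposes D domains of equal size, equal design effects, and a comparison against a design that would give every domain the full national sample size. Function names are hypothetical.

```python
def national_inflation_factor(num_domains, exponent=0.65):
    """Factor applied to the national sample size for D domains."""
    return num_domains ** exponent

def domain_se_inflation(num_domains, exponent=0.65):
    """Approximate growth in domain standard errors (ASSUMES equal domains).

    Each domain receives n * D**exponent / D households instead of n,
    and the standard error scales with 1 / sqrt(sample size)."""
    return (num_domains ** (1 - exponent)) ** 0.5

factor = national_inflation_factor(10)
se_ratio = domain_se_inflation(10)
print(round(factor, 2), round(se_ratio, 2))
```

For 10 domains the national sample grows by a factor of roughly 4.5 rather than 10, and under the stated assumptions the domain sampling errors grow by roughly 1.5.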

Sampling Stages • Ideal to have two-stage sample design, with EAs defined as PSUs • In some countries only frame of larger administrative units available • Three-stage sample design: larger area units selected as PSUs • Necessary to delineate smaller segments in each sample PSU

Number of PSUs and Cluster Size • Survey costs depend not only on number of households but their distribution among primary sampling units (PSUs) • Important to determine effective balance between number of sample PSUs and number of sample households per cluster • In general, the more PSUs the better for reliability but the greater the cost (mostly costs of travel and listing)

Number of PSUs and Cluster Size • Example: 8000 households selected in 400 PSUs of 20 sample households each is a much more reliable sample than 200 PSUs of 40 households each, but more expensive • Number of sample households per cluster should be as small as practical for reliability • A range of 15-25 households for MICS appears to be effective

Design Effect (DEFF) • Deff - ratio of the variance of an estimate based on the stratified multi-stage sample design to the corresponding variance from a simple random sample of the same size • Measure of the relative efficiency of the sample design • Effective stratification reduces the deff • Cluster sampling increases the deff

Design Effect (DEFF) • In case of cluster sampling, deff generally measures effect of clustering: deff = 1 + δ(m − 1) • δ = intraclass correlation coefficient, or measure of homogeneity within cluster • m = average number of households per cluster • Design effect increases with intraclass correlation and cluster size
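The trade-off between the number of PSUs and the cluster size can be illustrated with the usual approximation deff = 1 + δ(m − 1), where m is the average number of sample households per cluster. The value δ = 0.05 is an assumption made only for this example, and the function names are hypothetical.

```python
def design_effect(delta, m):
    """Approximate design effect from clustering: 1 + delta * (m - 1)."""
    return 1 + delta * (m - 1)

def effective_sample_size(n, delta, m):
    """Size of the simple random sample with the same variance."""
    return n / design_effect(delta, m)

delta = 0.05  # assumed intraclass correlation, for illustration only
deff_20 = design_effect(delta, 20)   # 400 PSUs x 20 households
deff_40 = design_effect(delta, 40)   # 200 PSUs x 40 households
eff_20 = effective_sample_size(8000, delta, 20)
eff_40 = effective_sample_size(8000, delta, 40)
print(round(deff_20, 2), round(deff_40, 2), round(eff_20), round(eff_40))
```

Under this assumed δ, the same 8000 households behave like about 4100 independent households when spread over 400 PSUs, but only about 2700 when concentrated in 200 PSUs.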

First Stage Selection of PSUs • Standard methodology for MICS and other household surveys – select EAs or clusters systematically with PPS • Important to sort frame before selection, in order to ensure effective implicit stratification • Traditional procedure – cumulate measures of size, determine sampling interval and random start, generate selection numbers
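The traditional procedure can be sketched as follows. This is a minimal illustration and not the MICS template: it assumes the frame has already been sorted for implicit stratification, and the function name and measures of size are hypothetical.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def systematic_pps(sizes, n_select, rng=None):
    """Systematic PPS selection from a frame that is ALREADY sorted.

    Returns the selected frame indices (a PSU may appear more than once)
    and the sampling interval."""
    rng = rng or random.Random()
    cumulated = list(accumulate(sizes))        # cumulate measures of size
    interval = cumulated[-1] / n_select        # sampling interval
    start = rng.uniform(0, interval)           # random start
    picks = []
    for k in range(n_select):
        selection_number = start + k * interval
        # First PSU whose cumulated size reaches the selection number.
        idx = bisect_left(cumulated, selection_number)
        picks.append(min(idx, len(sizes) - 1))  # guard against rounding
    return picks, interval

# Hypothetical measures of size (households) for seven EAs.
sizes = [50, 400, 50, 100, 100, 100, 300]
picks, interval = systematic_pps(sizes, 4, random.Random(2024))
print(picks, interval)
```

In this made-up frame the second EA (400 households) is larger than the sampling interval (275), so it is always selected at least once and may be hit twice.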

Large sample PSUs in PPS sampling • Sometimes a PSU may have a measure of size larger than the sampling interval • PSU may be selected more than once in the systematic PPS selection • Option 1 – if the PSU is selected two or more times, multiply the number of households to be selected by the number of “hits” • Option 2 – separate the large PSUs and include in sample with a probability of 1
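Option 2 can be sketched as a small routine that sets aside every PSU at least as large as the sampling interval. Recomputing the interval for the remaining PSUs and repeating the check is an assumption added here (a common practice, not stated on the slide); the function name and data are hypothetical.

```python
def split_certainty_psus(sizes, n_select):
    """Separate PSUs to be included with probability 1 (illustrative).

    Returns (certainty indices, indices left for systematic PPS)."""
    certain = []
    rest = list(range(len(sizes)))
    while rest and len(certain) < n_select:
        # Interval for the PSUs still to be selected with PPS.
        interval = sum(sizes[i] for i in rest) / (n_select - len(certain))
        large = [i for i in rest if sizes[i] >= interval]
        if not large:
            break
        certain.extend(large)
        rest = [i for i in rest if i not in large]
    return sorted(certain), rest

sizes = [50, 400, 50, 100, 100, 100, 300]   # hypothetical measures of size
certain, rest = split_certainty_psus(sizes, 4)
print(certain, rest)
```

Here the 400-household PSU exceeds the first interval (275), and once it is removed the 300-household PSU exceeds the recomputed interval (about 233), so both are taken with certainty and the other two selections are drawn from the remaining five PSUs.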

MICS Sampling Option 1 – new sample with household listing • Design new MICS sample • Two stages with census as frame • Use of implicit stratification, systematic selection of census EAs at first stage with pps • List households in selected EAs/segments • Select households systematically from listing • Interview selected households, no replacement will be allowed
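Systematic selection of households from a completed listing can be sketched as below. This is an illustration with a fractional interval, not the MICS household selection template; the function name, the cluster take of 20 and the listing size of 137 are hypothetical.

```python
import random

def select_households(n_listed, n_sample, rng=None):
    """Systematic selection of listing serial numbers (1-based)."""
    rng = rng or random.Random()
    if n_listed <= n_sample:
        return list(range(1, n_listed + 1))   # take every listed household
    interval = n_listed / n_sample            # fractional sampling interval
    start = rng.random() * interval           # random start in [0, interval)
    chosen = []
    for k in range(n_sample):
        serial = int(start + k * interval) + 1
        chosen.append(min(serial, n_listed))  # guard against rounding
    return chosen

selected = select_households(n_listed=137, n_sample=20, rng=random.Random(7))
print(selected)
```

Households that cannot be interviewed are not replaced from the listing; nonresponse is handled later through the weights.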

Sampling Option 1 - continued • Advantages of option 1 - simple design - probability-based - if possible self-weighting (national level) • Limitations of option 1 - expense of listing households - time necessary to list households [Example: a sample size of 5000 households may require 25000 to 50000 households to be listed]

MICS Sampling Option 2 – use an existing sample • Design MICS as a rider to another survey if timely and feasible • Use sample from a previous survey and re-interview households for MICS • Or, use old survey sample EAs and construct new listing of households to select for MICS • Old sample must be probability-based, national in scope • Possibilities – DHS, other national health survey, recent labour force survey • Important: design parameters must be known (such as selection probability, stratification, etc.)

Sampling option 2 - continued • Use of existing master sampling frame • Some countries use master sample design for intercensal national household surveys • Master samples generally sufficiently large for MICS; subsample of PSUs can be selected • Advantage – updated maps may be available for master sample of PSUs, and perhaps updated listing

Sampling option 2 - continued • Advantages of using previous sample - cost savings - maps available for interviewers - appropriate sampling plan available - simplicity • Limitations of using old sample - burden on respondents - sample design may need modification * sample size * sub-national coverage * number of PSUs or clusters • Balance between loss and gain

Listing and Selection of Households • Household listing manual is available • Importance of new listing to represent current population • Problems with using previous listing (older than 1 year) • Does not represent newer households • Distribution of sample population by age group distorted, generally with higher median age • Difficulty of finding households in old list

Listing and Selection of Households • MICS recommends a separate household listing operation • More reliable as listing staff are less likely than interviewers to bias the sample by excluding households that are difficult to reach • Allows household selection to be done in a single central location using reliable and uniform procedures

Listing and Selection of Households • Household selection in the office: • Advantages – conducted by specialized staff, possible to avoid selection bias in the field, possible to control overall sample size • Disadvantage – increased costs from having two field visits • Selection in the field: use household selection table • Advantage – cost savings of having one integrated field operation • Disadvantage - correct sampling may be difficult for field staff, selection may be biased

Listing and Selection of Households • Excel template for automatically generating the sample of households based on the number of households listed (see spreadsheet) • Common problems found in listing operations • Problem with quality of sketch maps – difficult to determine segment boundaries • Sometimes large differences found between number of households in frame (census) and number listed.

Sampling strategy for low fertility countries • In MICS 4 and 5, some low fertility countries are using second-stage stratification of listing by households with and without children under 5 • Higher sampling rate used for households with children • Increases number of households with children in MICS sample, and therefore number of sample children

Sampling strategy for low fertility countries (continued) • Improves the reliability of the child indicators without increasing the sample size to a very high level • This procedure also increases the variability in the weights and the design effects for the overall sample • Important to avoid very large variability in the weights for households with and without children • Differential weights between households with and without children generally should not exceed a factor of about 4

Implications of sampling strategy on sample size calculations • One parameter in the sample size calculation template is the proportion of the indicator subpopulation • Using a higher sampling rate for households with children increases the proportion of children under 5 in the sample • The proportion of children under 5 (or smaller age groups) should be multiplied by a factor that reflects the increase in sample households with children
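One rough way to derive the multiplying factor is sketched below. It is an approximation under stated assumptions, not the MICS template calculation: it assumes the factor equals the ratio of the sample share to the population share of households with children, and it ignores any change in average household size. The function name and input values are hypothetical.

```python
def child_proportion_factor(share_with_children, oversampling_ratio):
    """Approximate factor by which pb increases (illustrative ASSUMPTION).

    share_with_children: population share of households with children under 5
    oversampling_ratio: sampling rate for households with children divided
                        by the rate for households without children"""
    p, k = share_with_children, oversampling_ratio
    sample_share = k * p / (k * p + (1 - p))  # share in the sample
    return sample_share / p

factor = child_proportion_factor(share_with_children=0.25, oversampling_ratio=3)
adjusted_pb = 0.06 * factor   # hypothetical unadjusted pb of 6%
print(factor, adjusted_pb)
```

With a quarter of households having children and a threefold sampling rate, half of the sample households have children, so the proportion entered in the template would be doubled under this approximation.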

Implications of sampling strategy on weighting procedures • Under normal MICS sample design, weights vary by sample cluster • With second stage stratification by households with and without children, two weights need to be calculated for each cluster: for households with and without children
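The two weights per cluster can be sketched as the inverse of the overall selection probability within each second-stage stratum. The function name, the first-stage probability and the listing counts are hypothetical; the check against a ratio of about 4 follows the guideline given for low fertility countries.

```python
def cluster_design_weights(p_cluster, listed_with, selected_with,
                           listed_without, selected_without):
    """Design weights for households with and without children under 5.

    p_cluster is the first-stage selection probability of the cluster."""
    w_with = 1 / (p_cluster * selected_with / listed_with)
    w_without = 1 / (p_cluster * selected_without / listed_without)
    return w_with, w_without

# Hypothetical cluster: 30 listed households with children, 90 without.
w_with, w_without = cluster_design_weights(
    p_cluster=0.02, listed_with=30, selected_with=12,
    listed_without=90, selected_without=10)
ratio = max(w_with, w_without) / min(w_with, w_without)
print(round(w_with, 1), round(w_without, 1), round(ratio, 2))
```

In this example the weights differ by a factor of 3.6, which stays within the suggested limit of about 4; selecting only 8 of the 90 households without children would push the ratio to 4.5.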

Survey weighting procedures • Survey data collected using a complex design featuring clustering, unequal probabilities of selection and stratification: • All analyses must apply survey weights in order to prevent biased results • Formulas for calculating weights depend on the exact sample design used in each country • MICS has 4 sets of weights: households, women, men and children

Survey weighting procedures • Components of MICS survey weights: • Design weight: inverse of the final probability of selection for households • Adjustment factors for nonresponse (cluster, household, woman, child level) • Normalized weights so that the total weighted number of observations is equal to the total unweighted number (sample size)
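The three components can be combined as in the simplified sketch below. It applies one response rate per observation for brevity, whereas actual MICS adjustments are computed at specific levels (cluster, household, woman, child); the function name and values are hypothetical.

```python
def normalized_weights(selection_probs, response_rates):
    """Design weight, nonresponse adjustment, then normalization."""
    design = [1 / p for p in selection_probs]             # design weights
    adjusted = [w / rr for w, rr in zip(design, response_rates)]
    scale = len(adjusted) / sum(adjusted)                 # normalization
    return [w * scale for w in adjusted]

# Four hypothetical responding households from two clusters.
probs = [0.01, 0.01, 0.02, 0.02]
rates = [0.90, 0.90, 0.95, 0.95]
weights = normalized_weights(probs, rates)
print([round(w, 3) for w in weights])
```

After normalization the weights sum to the number of observations, so weighted and unweighted totals agree while the relative differences between households are preserved.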

Survey weighting procedures

Sampling Error Estimation • Necessary to evaluate reliability of survey estimates • Possible only when probability sampling is used • Should be done for 30-50 important indicators • Methodology is complex and design-specific • Several software packages: • SPSS Complex Samples module – used in MICS • SAS, Stata, SUDAAN, Clusters, WesVar, CENVAR, PCCarp, etc. • Standard error, confidence intervals and DEFF

Sampling Error Estimation SPSS Complex Samples module • Advantages: • Simple to use • Template syntax available for standard indicators • Supported by MICS Global and Regional staff • Steps: • Set up sampling parameter specifications file (csplan) • Define variables for stratum, PSU and weight

Sampling Error Estimation SPSS Complex Samples module • Stratum should be lowest level of explicit stratification (for example, province, urban/rural) • Necessary to have minimum of two sample PSUs per stratum
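Before setting up the sampling plan, the data file can be checked for strata with fewer than two sample PSUs. The sketch below is independent of SPSS; the function name and the stratum and PSU codes are hypothetical.

```python
from collections import defaultdict

def strata_with_too_few_psus(records, min_psus=2):
    """Return strata that contain fewer than min_psus distinct PSUs.

    records is an iterable of (stratum, psu) pairs, one per household."""
    psus_by_stratum = defaultdict(set)
    for stratum, psu in records:
        psus_by_stratum[stratum].add(psu)
    return sorted(s for s, psus in psus_by_stratum.items()
                  if len(psus) < min_psus)

# Hypothetical household records: (stratum code, PSU number).
records = [("North-urban", 101), ("North-urban", 102),
           ("North-rural", 201), ("North-rural", 201),
           ("South-urban", 301), ("South-urban", 302)]
problem_strata = strata_with_too_few_psus(records)
print(problem_strata)
```

A stratum flagged in this way would usually need to be combined with a similar neighbouring stratum before variances can be estimated.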

Reducing bias • Accuracy of survey results depends on both variance and bias (mostly from nonsampling errors) • Bias should be minimized with quality control for all survey operations • Basic data quality determined during enumeration • Important to have good training and supervision in the field • Data capture should include 100% or sample verification • Important to have quality control for editing and coding procedures • Computer consistency and range checks
