1.
Canadian Community Health Survey
A new program for collecting health information
Interuniversity Research Data Seminar
University of British Columbia
Yves Béland
Household Survey Methods Division
Statistics Canada
February 19, 2002
2. Presentation Outline
Health Information Roadmap
Origin of the CCHS
Objectives / Content
CCHS two-year plan
CCHS Cycle 1.1 - Sample Design
Allocation, frame
Selection - Oversampling
Data Collection
Imputation
Weighting, sampling error
Bootstrap Variance Estimation
Data Quality
Data Dissemination
CCHS Cycle 1.2 - Overview
Future Cycles of CCHS
3. Health Information Roadmap
Four-year action plan to strengthen Canada’s health information system
Earmarks funds for specific priorities/activities based on national vision and provincial/regional consultations
Partners: Health Canada, Canadian Institute for Health Information (CIHI) and Statistics Canada
Key elements:
fill critical data gaps in health services and address population health data gaps at a sub-provincial level
foster common data and technical standards
develop indicators and conduct special studies
4. Canadian Community Health Survey - Results of the Consultation Process
Assess variations in health measures at many levels of geography
Collect data on issues unique to a health region or province
Respond quickly to emerging issues
Explore certain key health issues in-depth
Analyse the effects of shocks including policy changes
5. Canadian Community Health Survey - Two-year Plan
Cycle 1.1 - Health region-level survey
Produce reliable estimates for sub-provincial areas
Continuous monthly collection : Sept. 2000 - Nov. 2001
Sample size : 133,300 respondents
Questionnaire content
health determinants
health status
utilization of health services
socio-demographic / socio-economic characteristics
Cycle 1.2 - Provincial-level survey
Produce reliable provincial estimates from a sample of 30,000 respondents
Monthly collection : May 2002 - Dec. 2002
In-depth focus content: 90-100 minute interviews on mental health and well-being
6. CCHS and NPHS - A More Robust Health Survey Program
CCHS
cross-sectional
sample of 160,000 respondents over two years
national, provincial and regional level estimates
customized questionnaires at regional level
built-in flexibility for buy-in sample and/or content
continuous development of in-depth health content
NPHS - Household
« goes longitudinal » only, starting in wave 4
sample of 20,000 persons
national and provincial level estimates
NPHS - Health Care Institutions
longitudinal and cross-sectional
sample of 2,500
national level estimates
7. CCHS - Cycle 1.1 Health Region-level survey
Produce timely cross-sectional estimates for 136 health regions
Target population
individuals aged 12 or over living in private occupied dwellings
Exclusions: those living on Indian Reserves and Crown Lands, residents of institutions, full-time members of the Canadian Armed Forces and residents of some remote areas
CCHS 1.1 covers ~98% of the Canadian population
8. CCHS - Questionnaire content
45-minute interview questionnaire
30 minutes of modules common to all health regions
10 minutes of optional items selected by health regions from a predefined list of modules
5 minutes of standard socio-economic items
27 different versions of the questionnaire
The complete questionnaire can be found at www.statcan.ca/health_surveys
9. CCHS - Sample Allocation to Provinces
Prov    Pop. size   # of HRs   1st step (500/HR)   2nd step (X-prop)   Total sample
NFLD       551K         6           *2,780               1,230             4,010
PEI        135K         2            1,000               1,000             2,000
NS         909K         6            3,000               2,040             5,040
NB         738K         7            3,500               1,650             5,150
QUE      7,139K        16            8,000              16,280            24,280
ONT     10,714K        37           18,500              23,760            42,260
MAN      1,114K        11            5,500               2,500             8,000
SASK       990K        11           *5,400               2,320             7,720
ALB      2,697K        17           *8,150               6,050            14,200
BC       3,725K        20           10,000               8,090            18,090
CAN     29,000K       133           65,830              64,920           130,750
* The sampling fraction in some small HRs was capped at 1 in 20 households
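The arithmetic behind the two allocation steps can be checked with a short sketch. It only uses the figures in the table above; the names ALLOCATION, CAPPED and check_allocation are illustrative and not part of the presentation, and the sketch assumes the asterisk marks provinces where the cap lowered the first-step sample below 500 per HR.

```python
# Figures transcribed from the allocation table above:
# (province, number of HRs, first-step sample, second-step sample, total)
ALLOCATION = [
    ("NFLD", 6, 2780, 1230, 4010),
    ("PEI", 2, 1000, 1000, 2000),
    ("NS", 6, 3000, 2040, 5040),
    ("NB", 7, 3500, 1650, 5150),
    ("QUE", 16, 8000, 16280, 24280),
    ("ONT", 37, 18500, 23760, 42260),
    ("MAN", 11, 5500, 2500, 8000),
    ("SASK", 11, 5400, 2320, 7720),
    ("ALB", 17, 8150, 6050, 14200),
    ("BC", 20, 10000, 8090, 18090),
]
CAPPED = {"NFLD", "SASK", "ALB"}  # provinces marked with * in the table
BASE_PER_HR = 500                 # first-step allocation per health region


def check_allocation(rows):
    """Check each row's arithmetic and return the national totals."""
    for prov, n_hr, step1, step2, total in rows:
        uncapped = BASE_PER_HR * n_hr
        if prov in CAPPED:
            # Assumption: the 1-in-20 cap can only reduce the first step.
            assert step1 < uncapped, prov
        else:
            assert step1 == uncapped, prov
        assert step1 + step2 == total, prov
    return tuple(sum(row[i] for row in rows) for i in range(1, 5))


print(check_allocation(ALLOCATION))
```

The returned tuple holds the number of HRs and the first-step, second-step and total sample sizes summed over the ten provinces, which can be compared with the CAN row.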
10. CCHS - Sample Allocation to Health Regions
Size       Pop. range             # of HRs   Mean sample size
Small      less than 75,000          41            525
Medium     75,000 - 240,000          60            900
Large      240,000 - 640,000         25          1,500
X-Large    640,000 and more           7          2,500
11. CCHS - Sample Allocation to Territories
           Population   Sample
Yukon        25,000       850
NWT          36,000       900
Nunavut      22,000       800
12. CCHS - Sample Frame
CCHS sample selected from three frames:
Area frame (Labour Force Survey structure)
RDD frame of telephone numbers (Random Digit Dialling)
List frame of telephone numbers
Three frames are needed for CCHS for the following reasons:
1. To yield the desired sample sizes in all health regions
2. To have a telephone data collection structure in place to quickly address provincial/regional requests for buy-in sample and/or content at any point in time
3. To optimize collection costs
13. Area frame - Sampling of households
83% of CCHS sampled households
Stratified multistage sample design
14. RDD frame of telephone numbers - Sampling of households
Elimination of non-working banks method
7% of CCHS sampled households
Telephone bank: area code + first 5 digits of a 7-digit phone #
1- Keep the banks with at least one valid phone #
2- Group the banks to encompass as closely as possible the health region areas - RDD strata
3- Within each RDD stratum, first select one bank at random and then generate at random one number between 00 and 99
4- Repeat the process until the required number of telephone numbers within the RDD stratum is reached
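Steps 3 and 4 can be sketched as follows. This is a minimal illustration, assuming that steps 1 and 2 have already produced a list of working banks for each RDD stratum; the function name rdd_sample, its parameters and the bank values are hypothetical, and the sketch does not remove duplicate numbers, a point the slide does not address.

```python
import random


def rdd_sample(banks_by_stratum, n_required, seed=0):
    """Generate telephone numbers per RDD stratum.

    banks_by_stratum: {stratum: [8-digit bank strings]}, where a bank is
        the area code plus the first 5 digits of the 7-digit number.
    n_required: {stratum: number of telephone numbers to generate}.
    """
    rng = random.Random(seed)
    sample = {}
    for stratum, banks in banks_by_stratum.items():
        numbers = []
        while len(numbers) < n_required[stratum]:
            bank = rng.choice(banks)        # step 3: one bank at random
            suffix = rng.randint(0, 99)     # step 3: number between 00 and 99
            numbers.append(f"{bank}{suffix:02d}")
        sample[stratum] = numbers           # step 4: repeat until enough
    return sample


# Made-up banks for two hypothetical strata.
example_banks = {"HR-A": ["55512345", "55512346"], "HR-B": ["55598765"]}
example = rdd_sample(example_banks, {"HR-A": 5, "HR-B": 3})
print(example)
```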
15. List frame of telephone numbers - Sampling of households
Simple random sample of telephone numbers
10% of CCHS sampled households
Telephone companies’ billing address files and Telephone Infobase (repository of phone directories)
1- Create a list of phone numbers
2- Stratify the phone numbers by health region using the residential postal codes
3- Select phone numbers at random within a health region
4- Repeat the process until the required number of telephone numbers is reached
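The list-frame steps amount to a simple random sample within each health region. The sketch below assumes the postal-code lookup of step 2 has already assigned a health region to every phone number; list_frame_sample and its arguments are illustrative names only.

```python
import random


def list_frame_sample(phone_to_region, n_required, seed=0):
    """Select a simple random sample of phone numbers in each health region.

    phone_to_region: {phone number: health region}, i.e. the list of
        step 1 after the stratification of step 2.
    n_required: {health region: number of phone numbers to select}.
    """
    rng = random.Random(seed)
    by_region = {}
    for phone, region in phone_to_region.items():
        by_region.setdefault(region, []).append(phone)
    # random.sample draws without replacement, so no number repeats.
    return {
        region: rng.sample(sorted(phones), n_required[region])
        for region, phones in by_region.items()
    }
```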
16. CCHS - Sampling of persons
Area frame
SRS of one person aged 12 or older (82% of households)
SRS of two persons aged 12 or older (18% of households)
RDD / List frames
SRS of one person aged 12 or older
17. CCHS - Sampling of persons
Age distribution (%)
Age group   1996 Census   LFS sample       CCHS simulated sample *
                          (all persons)    (only 1 person)
12-19          13.2           13.7               8.5
20-29          16.4           14.4              14.3
30-44          30.8           28.7              29.1
45-64          25.8           28.0              27.9
65 +           13.8           15.2              20.2
* averaged distribution over 100 repetitions using the May 99 LFS sample
18. CCHS - Representativity of sub-populations
To address users’ needs, two sub-population groups needed larger effective sample sizes:
Youths (12-19 years old)
Decision > Oversample youths by selecting a second person (12-19) in some households, depending on household composition
Seniors (65 years old and over)
Decision > Do not oversample; the general sample selection process already over-represents this group
19. Sampling strategy based on household composition
Number of persons   Number of persons aged 20 or over
aged 12-19           0    1    2    3    4    5+
0                    -    A    A    A    A    B
1                    A    A    C    C    C    B
2                    A    C    C    C    C    C
3+                   A    C    C    C    C    C
A: SRS of one person aged 12+
B: SRS of two persons aged 12+
C: SRS of one person in the age group 12-19 and SRS of one person 20+
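The grid can be read as a lookup table. The sketch below transcribes it cell by cell and applies strategies A, B and C to a list of household ages; STRATEGY, strategy_for and select_persons are illustrative names, and returning an empty selection for the "-" cell (no eligible person) is an assumption.

```python
import random

# Rows: number of persons aged 12-19 (0, 1, 2, 3+).
# Columns: number of persons aged 20 or over (0, 1, 2, 3, 4, 5+).
STRATEGY = [
    ["-", "A", "A", "A", "A", "B"],
    ["A", "A", "C", "C", "C", "B"],
    ["A", "C", "C", "C", "C", "C"],
    ["A", "C", "C", "C", "C", "C"],
]


def strategy_for(ages):
    """Return the strategy letter for a household given its members' ages."""
    youths = sum(1 for a in ages if 12 <= a <= 19)
    adults = sum(1 for a in ages if a >= 20)
    return STRATEGY[min(youths, 3)][min(adults, 5)]


def select_persons(ages, rng):
    """Select the ages of the sampled person(s) following the grid."""
    eligible = [a for a in ages if a >= 12]
    youths = [a for a in eligible if a <= 19]
    adults = [a for a in eligible if a >= 20]
    code = strategy_for(ages)
    if code == "A":   # one person aged 12+
        return [rng.choice(eligible)]
    if code == "B":   # two persons aged 12+, without replacement
        return rng.sample(eligible, 2)
    if code == "C":   # one person aged 12-19 and one person aged 20+
        return [rng.choice(youths), rng.choice(adults)]
    return []         # "-": nobody aged 12 or over in the household


print(strategy_for([35, 40, 15]), select_persons([35, 40, 15], random.Random(0)))
```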
20. CCHS - Sample Distribution after Oversampling
Age distribution (%)
Age group   1996 Census   CCHS simulated sample *   CCHS simulated sample *
                          (only 1 person)           (some 2 persons)
12-19          13.2              8.5                      14.9
20-29          16.4             14.3                      13.1
30-44          30.8             29.1                      28.1
45-64          25.8             27.9                      26.3
65 +           13.8             20.2                      17.6
* averaged distribution over 100 repetitions using the May 99 LFS sample
21. CCHS - Initial data collection plan
12 monthly samples
12 collection months + 1: (09/2000 - 08/2001) + 09/2001
Area frame
CAPI
STC field interviewers
targeted response rate: 90%
anticipated vacancy rate: 13%
RDD / List frames
CATI
STC call centres
targeted response rate: 85%
telephone hit rate: 15-60%
22. CCHS data collection - Observed situation
Field interviewers
workload exceeded field staff capacity
Call centres
new collection infrastructure
unequal allocation of work among call centres
23. CCHS - Final response rates (%)
          Field   Call centres   Total
NFLD      86.6       89.3        86.8
PEI       87.7       82.6        84.7
NS        88.8       89.3        88.8
NB        88.4       92.4        88.5
QUE       85.7       84.8        85.6
ONT       82.8       79.5        82.0
MAN       90.0       85.0        89.5
SASK      87.0       85.4        86.8
ALB       85.2       84.9        85.1
BC        83.9       86.7        84.7
YUK       79.3       95.6        82.7
NWT       89.6       85.4        89.2
NUN *     66.3       34.6        62.5
CAN       85.1       83.1        84.7
24. CCHS - Proxy interviews
Higher number of proxy interviews than expected
~ 6% instead of 2-3%
Major consequence: one third of the questionnaire is missing for these interviews, which could be problematic for small health regions
Solution : Imputation
25. CCHS - Imputation
3-step strategy
common modules / mental health related optional modules / other optional modules
more than 2,000 imputation classes (region, age, sex, questionnaire type, skip patterns, etc…)
hot-deck imputation using nearest neighbour approach according to 12-16 key characteristics
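A nearest-neighbour hot deck can be sketched in a few lines. This is only an illustration of the general technique, not the CCHS production system: the function hot_deck_impute, the record layout and the distance (number of key characteristics on which donor and recipient differ) are all assumptions.

```python
def hot_deck_impute(records, class_keys, match_keys, target):
    """Fill missing `target` values from the nearest donor in the same class.

    records: list of dicts; a missing value is represented by None.
    class_keys: fields defining the imputation class (e.g. region, sex).
    match_keys: key characteristics used to measure closeness.
    """
    out = [dict(r) for r in records]                  # leave the input intact
    donors = [r for r in out if r[target] is not None]
    for rec in out:
        if rec[target] is not None:
            continue
        pool = [d for d in donors
                if all(d[k] == rec[k] for k in class_keys)]
        if not pool:
            continue                                  # no donor in this class
        # Assumed distance: count of mismatching key characteristics.
        nearest = min(pool,
                      key=lambda d: sum(d[k] != rec[k] for k in match_keys))
        rec[target] = nearest[target]
    return out
```

Only records that were observed at the start act as donors, so an imputed value is never passed on to another recipient.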
26. CCHS - Weighting and Estimation
Three separate weighting systems:
Area frame design
RDD frame design
List frame design
Several adjustments
non-response (household and person)
seasonal factor
etc...
Integration of the two weighting systems based on design effects (Deffs)
Calibration using a one-dimensional poststratification adjustment of ten age/sex poststrata within each health region
Variance estimation : bootstrap re-sampling approach
set of 500 bootstrap weights for each individual
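The calibration step listed above is a ratio adjustment within each poststratum: every weight is multiplied by the known population count divided by the sum of the weights in that poststratum. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def poststratify(weights, poststrata, pop_counts):
    """Scale weights so they sum to the population count in each poststratum.

    weights: weights before calibration, one per respondent.
    poststrata: poststratum label of each respondent (for example an
        age/sex group within a health region).
    pop_counts: {poststratum label: known population count}.
    """
    totals = {}
    for w, ps in zip(weights, poststrata):
        totals[ps] = totals.get(ps, 0.0) + w
    return [w * pop_counts[ps] / totals[ps]
            for w, ps in zip(weights, poststrata)]


# Made-up example: two poststrata in one health region.
calibrated = poststratify(
    [100, 300, 200],
    ["F 12-19", "F 12-19", "M 12-19"],
    {"F 12-19": 500, "M 12-19": 150},
)
print(calibrated)
```

After the adjustment, the weights in each poststratum add up to its population count.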
27. CCHS Weighting Strategy
28. Weighting & Estimation
29. CCHS - Special Weights
For various reasons, many other weights are produced
Quarter 4 special weight
PEI special weight
Share weights (master, Q4 and PEI special)
Link weights (master, Q4 and PEI special)
30. Sampling Error
Difference between an estimate obtained from a sample and the value a census would give
The extent of this error depends on four factors:
sample size
variability of the characteristic of interest
sample design
estimation method
Generally, the sampling error decreases as the size of the sample increases
31. Sampling Error
Measure of precision, reliability of the estimates
Variance (standard deviation)
Coefficient of variation (CV)
CV = (standard deviation of the estimate / the estimate itself) x 100%
CV allows comparison of precision of estimates with different scales
Example:
24% of population are daily smokers, std dev. = 0.003
CV=0.003/0.24 x 100%=1.25%
32. Sampling Variability Guidelines
Type of estimate   CV (%)         Guidelines
Acceptable         0.0 - 16.5     General unrestricted release.
Marginal           16.6 - 33.3    General unrestricted release, but with a warning cautioning users of the high sampling variability. Should be identified by the letter M.
Unacceptable       > 33.3         No release. Should be flagged with the letter U.
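The CV formula and the release guidelines combine naturally into two small functions. The names coefficient_of_variation and release_category are illustrative; the thresholds come from the guidelines above, and treating a CV that falls between 16.5 and 16.6 as Marginal is an assumption.

```python
def coefficient_of_variation(estimate, std_dev):
    """Return the CV in percent: standard deviation / estimate x 100."""
    return std_dev / estimate * 100.0


def release_category(cv_percent):
    """Classify an estimate according to the sampling variability guidelines."""
    if cv_percent <= 16.5:
        return "Acceptable"
    if cv_percent <= 33.3:
        return "Marginal (M)"
    return "Unacceptable (U)"


# Daily smokers example: estimate 24%, standard deviation 0.003.
cv = coefficient_of_variation(0.24, 0.003)
print(round(cv, 2), release_category(cv))
```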
33. Sampling Error
Measuring sampling error for complex sample designs:
Simple formulas not available
Most software packages do not incorporate design effect (and weights adjustments) appropriately for calculations
Solution for CCHS: the Bootstrap method
34. Bootstrap method
Principle:
You want to know how precise your estimate of the number of smokers in Canada is
You could draw 500 totally new CCHS samples and compare the 500 estimates you would get from these samples. The variance of these 500 estimates would indicate the precision.
Problem: drawing 500 new samples is $$$
Solution: Use your sample as a population, and take many smaller subsamples from it.
35. Bootstrap method
How CCHS bootstrap weights are created (the secret is now revealed!!!)
36. Bootstrap Method
How bootstrap replicates are built (cont’d)
The “real” recipe
1- Subsampling of clusters (SRS) within strata
2- Apply (initial design) weight
3- Adjust weight for selection of n-1 among n
4- Apply all standard weight adjustments (nonresponse, share, etc.)
5- Post-stratification to population counts
The bootstrap method intends to mimic the same approach used for the sampling and weighting processes
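Steps 1 to 3 of the recipe can be sketched as below. The slides do not name the exact scheme, so this sketch assumes a Rao-Wu style rescaled bootstrap: in each stratum, n-1 clusters are drawn with replacement from the n sampled clusters, and the design weight is multiplied by n/(n-1) times the number of times the cluster was drawn. Steps 4 and 5 (standard adjustments and post-stratification) are omitted, every stratum is assumed to hold at least two clusters, and bootstrap_weights is a hypothetical name.

```python
import random
from collections import Counter


def bootstrap_weights(design, n_reps=500, seed=0):
    """Build replicate weights from (stratum, cluster, design_weight) rows.

    design: one tuple per respondent.
    Returns n_reps lists of weights, each aligned with `design`.
    """
    rng = random.Random(seed)
    clusters = {}
    for stratum, cluster, _ in design:
        clusters.setdefault(stratum, set()).add(cluster)

    replicates = []
    for _ in range(n_reps):
        factor = {}
        for stratum, ids in clusters.items():
            ids = sorted(ids)
            n = len(ids)                               # assumed n >= 2
            draws = Counter(rng.choices(ids, k=n - 1)) # step 1: subsample
            for c in ids:
                # step 3: adjust for the selection of n-1 among n
                factor[(stratum, c)] = draws[c] * n / (n - 1)
        # step 2: apply the initial design weight
        replicates.append([w * factor[(s, c)] for s, c, w in design])
    return replicates
```

Respondents in a cluster that was not drawn get a replicate weight of zero for that replicate, while the adjustment factors in a stratum always add up to n.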
37. Bootstrap Method
Sampling weight vs. bootstrap weights
Sampling weight: used to compute the estimate of a parameter (e.g. the number of smokers)
Bootstrap weights: used to compute the precision of the estimate (e.g. the CV of the estimated number of smokers)
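The division of labour between the two kinds of weights can be illustrated as follows. The sketch assumes the bootstrap variance is the mean squared deviation of the replicate estimates from the full-sample estimate, a common convention that the slides do not spell out; weighted_total and bootstrap_cv are hypothetical names and the data are made up.

```python
import math


def weighted_total(values, weights):
    """Weighted total, e.g. values are 1 for smokers and 0 otherwise."""
    return sum(v * w for v, w in zip(values, weights))


def bootstrap_cv(values, sampling_weights, replicate_weights):
    """Return (estimate, CV in percent).

    The sampling weights give the estimate; the bootstrap weights give
    one replicate estimate each, whose spread measures the precision.
    """
    estimate = weighted_total(values, sampling_weights)
    reps = [weighted_total(values, rw) for rw in replicate_weights]
    variance = sum((r - estimate) ** 2 for r in reps) / len(reps)
    return estimate, math.sqrt(variance) / estimate * 100.0


# Made-up example with three respondents and two replicates.
smoker = [1, 0, 1]
est, cv = bootstrap_cv(smoker, [100, 200, 300],
                       [[80, 200, 300], [120, 200, 300]])
print(est, round(cv, 2))
```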
38. CCHS - Data Dissemination Strategy
Wide range of users and capacity
136 health regions
13 provincial/territorial Ministries of Health
Health Canada and CIHI
Internal STC analysts
Academics
Others
Data products
Microdata
Analytical products (Health Reports, How Healthy are Canadians, etc…)
Tabular statistics (ePubs, Cansim II, community profiles, etc…)
Client support (head and regional offices, CCHS website, workshops, etc…)
39. CCHS - Access to microdata
Master file
all records, all variables
Statistics Canada
university research data centres
remote access
Share / Link files
respondents who agreed to share / link
provincial/territorial Ministries of Health
health regions (through the STC third-party share agreement)
Public Use Microdata File (PUMF)
all records, subset of variables with collapsed response categories
free for 136 health regions
cost recovery for others
40. CCHS - Overview of Cycle 1.2
Produce provincial cross-sectional estimates from a sample of 30,000 respondents
Area frame sample only / one person per household
CAPI only
90-100 minute in-depth interviews on mental health and well-being based on WMH2000 questionnaire
Scheduled to begin collection in May 2002
41. CCHS - Future Plans
Same two-year cycle approach:
health region level survey starting in January 2003
provincial level survey starting in January 2004
New consultation process with provincial and regional authorities
Flexible sample designs (adaptable to regional needs)
Development of an in-depth nutrition focus content (Cycle 2.2)
42. CCHS Web site
www.statcan.ca/health_surveys
www.statcan.ca/enquetes_santé
43. Contacts in Methodology
Yves Béland: yves.beland@statcan.ca
François Brisebois: francois.brisebois@statcan.ca