Improvements in methodology for matching the 2021 Census to the Census Coverage Survey
Sarah Cummins, Shelley Gammon, Peter Jones
What is the Census Coverage Survey?
• the census aims to count everybody, but some people will be missed
• the Census Coverage Survey (CCS) is used to facilitate the estimation of non-response
• 1% sample of postcodes
• run 6 weeks after the census
• matched to the census to estimate non-response
Census to CCS matching
• quality requirements for matching are very high, since errors in matching will impact the coverage adjustment:
  • false positive (FP), or incorrect link
  • false negative (FN), or missed match
2011 Census to CCS matching
• automatic match rate: 70% of person matches
• clerical resource: equivalent of 30 FT staff over 30 weeks
• methods:
  • exact and probabilistic matching (high threshold)
  • hierarchical approach matching households first and then individuals within households
  • batch processed by geography
• quality: FP rate <0.01%, FN rate <0.25%
Purpose of research
• Can automated matching be increased in 2021 without incurring unacceptable numbers of false positives?
• trade-off between clerical review and FP error
Automated matching
• Progress made in automated matching methods to deal with large pseudonymised datasets
  • match-keys (deterministic)
  • automated probabilistic matching (Fellegi-Sunter)
  • other experimental methods, e.g. associative matching
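
As an illustration of the probabilistic approach named on this slide, here is a minimal Python sketch of Fellegi-Sunter scoring. The field names, the m- and u-probabilities, and the two thresholds are hypothetical values chosen for the example. They are not taken from the research, and skipping missing values is just one possible design choice.

```python
import math

# Hypothetical m- and u-probabilities (illustrative only, not from the research):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
M_U = {
    "forename": (0.95, 0.01),
    "surname": (0.96, 0.005),
    "dob": (0.98, 0.001),
    "sex": (0.99, 0.5),
}


def fs_weight(rec_a, rec_b, m_u=M_U):
    """Sum the log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in m_u.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # assumption: a missing value adds no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def classify(weight, upper=15.0, lower=0.0):
    """Hypothetical thresholds: link above `upper`, non-link below `lower`."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "clerical review"
```

Raising `upper` trades false positives for more clerical review, which is the trade-off the research question is about.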
Methods
• 2011 Census and CCS matching used as ‘gold standard’ dataset
• links made in 2011 were treated as true matches due to high quality standards
• Census and CCS were re-matched using new methods
• quality of new matching determined by comparing to links made in 2011 to estimate:
  • % false positive rates
  • % false negative rates
• Research conducted in secure environment
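
The comparison against the gold standard can be sketched as set arithmetic over link pairs. The slides do not state the denominators, so this sketch assumes the FP rate is the share of new links not found in the 2011 links, and the FN rate is the share of 2011 links that the new method missed. The function name and the pair representation are hypothetical.

```python
def linkage_error_rates(new_links, gold_links):
    """Compare new links to gold-standard links.

    Each link is a (census_id, ccs_id) pair. Denominators are assumptions:
    FP rate = incorrect new links / all new links,
    FN rate = missed gold links / all gold links.
    """
    new_links, gold_links = set(new_links), set(gold_links)
    false_pos = new_links - gold_links   # links made that 2011 did not make
    false_neg = gold_links - new_links   # 2011 links the new method missed
    fp_rate = len(false_pos) / len(new_links) if new_links else 0.0
    fn_rate = len(false_neg) / len(gold_links) if gold_links else 0.0
    return fp_rate, fn_rate
```

For example, with four gold links and three new links of which one is wrong, this gives an FP rate of 1/3 and an FN rate of 2/4.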
Methods
• hierarchical approach, first looking at record pairs with agreement on postcode only
  • (1) match keys within postcode
  • + (2) probabilistic matching within postcode
  • + (3) match keys outside postcode
  • + (4) probabilistic matching outside postcode
• aim for FP <0.25%
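
A rough sketch of how the deterministic stages, (1) and (3), might be chained, with each pass working only on records left unlinked by the previous one. The match-key definitions, the field names, and the rule of accepting a key only when it is unique on both sides are assumptions for illustration. The probabilistic stages, (2) and (4), are left out.

```python
from collections import defaultdict


def match_on_key(census, ccs, fields):
    """Link records agreeing on all `fields`, only where the key is unique on both sides."""
    def index(records):
        buckets = defaultdict(list)
        for rec in records:
            key = tuple(rec.get(f) for f in fields)
            if None not in key:  # skip records with a missing key field
                buckets[key].append(rec["id"])
        return buckets

    cen_idx, ccs_idx = index(census), index(ccs)
    links = []
    for key, cen_ids in cen_idx.items():
        ccs_ids = ccs_idx.get(key, [])
        if len(cen_ids) == 1 and len(ccs_ids) == 1:
            links.append((cen_ids[0], ccs_ids[0]))
    return links


def run_passes(census, ccs, passes):
    """Apply match keys in order; linked records are removed before the next pass."""
    links = []
    for fields in passes:
        new = match_on_key(census, ccs, fields)
        links.extend(new)
        linked_cen = {c for c, _ in new}
        linked_ccs = {s for _, s in new}
        census = [r for r in census if r["id"] not in linked_cen]
        ccs = [r for r in ccs if r["id"] not in linked_ccs]
    return links, census, ccs  # links plus the residuals for later stages


# Hypothetical match keys: stage (1) requires postcode agreement, stage (3) does not.
PASSES = [
    ("forename", "surname", "dob", "postcode"),
    ("forename", "surname", "dob"),
]
```

The residuals returned by `run_passes` are what would go on to probabilistic matching or clerical review.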
Results (1) – match keys within postcode
• each match key required agreement on postcode
• overall FP rate = 0.07%
• overall FN rate = 16.5%
*cumulative
Implications – clerical resolution
• 2011 Census to CCS matching:
  • matches left ≈ 195,000
• (1) match keys at 0.07% FP error:
  • matches left ≈ 100,000
• (2) probabilistic at 0.1% FP error:
  • matches left ≈ 60,000
• (2) probabilistic at 0.25% FP error:
  • matches left ≈ 25,000
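
The approximate counts on this slide can be turned into relative reductions against the 2011 figure. The sketch below only does that arithmetic. It assumes that "matches left" means matches still needing clerical resolution, and it says nothing about clerical effort per match. The function name and labels are hypothetical.

```python
BASELINE_2011 = 195_000  # approximate matches left in 2011 (from the slide)

# Approximate matches left under each new method (from the slide).
REMAINING = {
    "(1) match keys, 0.07% FP": 100_000,
    "(2) probabilistic, 0.1% FP": 60_000,
    "(2) probabilistic, 0.25% FP": 25_000,
}


def reduction(left, baseline=BASELINE_2011):
    """Share of the 2011 leftover matches that no longer remain."""
    return 1 - left / baseline


for label, left in REMAINING.items():
    print(f"{label}: about {reduction(left):.0%} fewer matches left than in 2011")
```

On these rounded figures the reductions are roughly 49%, 69% and 87%.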
Implications – bias in linkage
• by sex and quinary age group [chart not reproduced; Z = sex missing, 999 = age missing]
Implications – bias in linkage
• by local authority [figure not reproduced]
Further work
• Error:
  • What is the tolerance for error?
  • Can we focus on adjusting for error rather than minimising it?
• Clerical resource:
  • How can we accurately estimate potential reductions in clerical resource?
  • How can we minimise clerical searching?
• Design changes in Census/CCS:
  • i.e. response channel
Future work – response channel
• Can we generalise these results if the 2021 Census will be predominantly online?
  • comparing FP and FN rates of online forms and paper forms
  • comparing responses from people who submitted both a paper and an online 2011 Census form
• Early results indicate that forename and surname in particular are of better quality when submitted online
Thank you for listening
Any questions? Feel free to contact us:
sarah.cummins@ons.gov.uk
data.linkage@ons.gov.uk