
Improvements in methodology for matching the 2021 Census to the Census Coverage Survey

Sarah Cummins, Shelley Gammon, Peter Jones


Presentation Transcript


  1. Improvements in methodology for matching the 2021 Census to the Census Coverage Survey Sarah Cummins, Shelley Gammon, Peter Jones

  2. What is the Census Coverage Survey?
     • the census aims to count everybody, but some people will be missed
     • the Census Coverage Survey (CCS) is used to facilitate the estimation of non-response
     • 1% sample of postcodes
     • run 6 weeks after the census
     • matched to the census to estimate non-response

  3. Census to CCS matching
     • quality requirements for matching are very high, since errors in matching will impact the coverage adjustment:
       • false positive (FP), or incorrect link
       • false negative (FN), or missed match

  4. 2011 Census to CCS matching
     • automatic match rate: 70% of person matches
     • clerical resource: equivalent of 30 FT staff over 30 weeks
     • methods:
       • exact and probabilistic matching (high threshold)
       • hierarchical approach, matching households first and then individuals within households
       • batch processed by geography
     • quality: FP rate <0.01%, FN rate <0.25%

  5. Purpose of research
     • Can automated matching be increased in 2021 without incurring unacceptable numbers of false positives?
     • Trade-off between clerical review and FP error

  6. Automated matching
     • Progress made in automated matching methods to deal with large pseudonymised datasets:
       • match-keys (deterministic)
       • automated probabilistic matching (Fellegi-Sunter)
       • other experimental methods, e.g. associative matching
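
To make the probabilistic option on this slide concrete, here is a minimal Python sketch of Fellegi-Sunter scoring. The field names, the m- and u-probabilities, and the two thresholds are all hypothetical values chosen for illustration; the slides do not state the parameters used in the research.

```python
import math

# Hypothetical m- and u-probabilities per field (not from the slides):
#   m = P(field agrees | records are a true match)
#   u = P(field agrees | records are not a match)
M_U = {
    "forename": (0.95, 0.01),
    "surname": (0.96, 0.005),
    "dob": (0.98, 0.001),
}


def match_weight(agreement):
    """Sum the log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, agrees in agreement.items():
        m, u = M_U[field]
        if agrees:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(weight, upper, lower):
    """Two-threshold decision rule: link, non-link, or send to clerical review."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "clerical review"
```

Raising the upper threshold lowers the false positive rate but sends more record pairs to clerical review, which is the trade-off the research question on slide 5 is about.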

  7. Methods
     • 2011 Census and CCS matching used as ‘gold standard’ dataset
     • links made in 2011 were treated as true matches due to high quality standards
     • Census and CCS were re-matched using new methods
     • quality of new matching determined by comparing to links made in 2011 to estimate:
       • % false positive rates
       • % false negative rates
     • Research conducted in secure environment
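
The comparison against the 2011 links can be sketched as below. The slides do not define the denominators, so this assumes a common convention: the FP rate is the share of new links that are not in the gold standard, and the FN rate is the share of gold-standard links that the new method missed. The function name and the pair representation are illustrative only.

```python
def linkage_error_rates(new_links, gold_links):
    """Return (FP %, FN %) for new links judged against gold-standard links.

    Each link is a (census_id, ccs_id) pair. Assumed definitions:
      FP rate = new links not in the gold standard / all new links
      FN rate = gold-standard links not found / all gold-standard links
    """
    new_links = set(new_links)
    gold_links = set(gold_links)
    false_pos = new_links - gold_links
    false_neg = gold_links - new_links
    fp_rate = 100 * len(false_pos) / len(new_links) if new_links else 0.0
    fn_rate = 100 * len(false_neg) / len(gold_links) if gold_links else 0.0
    return fp_rate, fn_rate
```

For example, if the gold standard holds four links and a new method makes three links of which one is wrong, this gives an FP rate of one in three and an FN rate of two in four.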

  8. Methods
     • hierarchical approach first looking at record pairs with agreement on postcode only
       • (1) match keys within postcode
       • + (2) probabilistic matching within postcode
       • + (3) match keys outside postcode
       • + (4) probabilistic matching outside postcode
     • aim for FP <0.25%
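
Step (1) above can be sketched as a sequence of deterministic match-key passes, each run only on the records left over from the previous pass. The two keys shown, the record fields, and the rule that a key must be unique on both sides before a link is made are assumptions for illustration; the slides do not list the actual match keys.

```python
from collections import defaultdict


# Hypothetical match keys; each includes postcode, as step (1) requires.
def key_full(rec):
    return (rec["forename"], rec["surname"], rec["dob"], rec["postcode"])


def key_initial(rec):
    return (rec["forename"][:1], rec["surname"], rec["dob"], rec["postcode"])


MATCH_KEYS = [key_full, key_initial]  # strictest key first


def match_key_pass(census, ccs, key):
    """Link records whose key value is unique in both datasets (assumed rule)."""
    census_index = defaultdict(list)
    for rec in census:
        census_index[key(rec)].append(rec["id"])
    ccs_index = defaultdict(list)
    for rec in ccs:
        ccs_index[key(rec)].append(rec["id"])
    links = []
    for value, ccs_ids in ccs_index.items():
        census_ids = census_index.get(value, [])
        if len(ccs_ids) == 1 and len(census_ids) == 1:
            links.append((census_ids[0], ccs_ids[0]))
    return links


def run_match_keys(census, ccs, keys=MATCH_KEYS):
    """Apply each key in turn, passing only unmatched residuals to the next key."""
    links = []
    for key in keys:
        new = match_key_pass(census, ccs, key)
        links.extend(new)
        matched_census = {c for c, _ in new}
        matched_ccs = {s for _, s in new}
        census = [r for r in census if r["id"] not in matched_census]
        ccs = [r for r in ccs if r["id"] not in matched_ccs]
    return links, census, ccs
```

The residual lists returned at the end are what would flow into the later steps (probabilistic matching within postcode, then matching outside postcode) and ultimately to clerical resolution.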

  9. Results overview

  10. Results (1) - match keys within postcode
      • each match key required agreement on postcode
      • overall FP rate = 0.07%
      • overall FN rate = 16.5%
      * cumulative

  11. Results (2) - probabilistic within postcode

  12. Implications – clerical resolution
      • 2011 Census to CCS matching:
        • matches left ≈ 195,000
      • (1) match keys at 0.07% FP error:
        • matches left ≈ 100,000
      • (2) probabilistic at 0.1% FP error:
        • matches left ≈ 60,000
      • (2) probabilistic at 0.25% FP error:
        • matches left ≈ 25,000
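
As a rough reading of these figures, the arithmetic below expresses each "matches left" value as a percentage reduction against the 2011 baseline. It assumes that "matches left" means the approximate number of matches remaining for clerical resolution and that the 195,000 figure is the right baseline for comparison; the variable and function names are illustrative.

```python
BASELINE_2011 = 195_000  # approximate matches left in 2011 (from the slide)

# Approximate matches left under each scenario on the slide
MATCHES_LEFT = {
    "match keys, 0.07% FP": 100_000,
    "probabilistic, 0.1% FP": 60_000,
    "probabilistic, 0.25% FP": 25_000,
}


def reduction_pct(left, baseline=BASELINE_2011):
    """Percentage fall in matches left relative to the baseline."""
    return 100 * (baseline - left) / baseline


for scenario, left in MATCHES_LEFT.items():
    print(f"{scenario}: {reduction_pct(left):.0f}% fewer matches left")
```

On these approximate figures the reductions come to roughly 49%, 69% and 87% respectively. They are not a direct measure of staff time saved, which is one of the open questions listed under further work.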

  13. Implications – bias in linkage
      • by quinary sex / age group (chart codes: Z = sex missing, 999 = age missing)

  14. Implications – bias in linkage • local authority

  15. Further work
      • Error:
        • What is the tolerance for error?
        • Can we focus on adjusting for error rather than minimising it?
      • Clerical resource:
        • How can we accurately estimate potential reductions in clerical resource?
        • How can we minimise clerical searching?
      • Design changes in Census/CCS:
        • e.g. response channel

  16. Future work – response channel
      • Can we generalise these results if the 2021 Census will be predominantly online?
        • comparing FP and FN rates of online forms and paper forms
        • comparing responses from people who submitted both a paper and an online 2011 Census form
      • Early results indicate that forename and surname in particular are of better quality when submitted online

  17. Thank you for listening
      Any questions? Feel free to contact us:
      sarah.cummins@ons.gov.uk
      data.linkage@ons.gov.uk
