
Statistical Disclosure Control for the 2011 UK Census

Jane Longhurst, Caroline Young and Caroline Miller (ONS)




  1. Statistical Disclosure Control for the 2011 UK Census Jane Longhurst, Caroline Young and Caroline Miller (ONS)

  2. Outline • Context • Workplan • Progress • Short-listing the SDC Methods • Quantitative Evaluation • Description of the Methods (Advantages and Disadvantages) • Example Evaluation (Risk-Utility Framework) • Summary

  3. Context The UK takes a census every 10 years. Next census due in 2011. This will comprise separate, simultaneous Censuses for England & Wales (ONS), Scotland (GROS) and Northern Ireland (NISRA).

  4. Context • SDC for 2011 Census outputs is a major concern for users • Different SDC methodologies were adopted for standard tabular 2001 Census outputs across UK • Late addition of small cell adjustment by ONS/NISRA resulted in high level of user confusion and dissatisfaction • Publicised commitment to aim for a common UK SDC methodology for all 2011 Census outputs

  5. Workplan • Phase 1 (March ’06 – Jan ’07) • UK agreement of key SDC policy issues • Phase 2 (Jan ’07 – Sept ’08) • Evaluation of all methods complying with agreed SDC policy position in terms of risk/utility framework and feasibility of implementation • Phase 3 (Sept ’08 – Spring/Summer ’09) • Recommendations and UK agreement of SDC methodologies for 2011 Census tabular outputs • Phase 4 (Feb ’09 onwards) • Evaluate and develop SDC methods for microdata, future work on output specification, system specification, development and testing

  6. Progress • The UK SDC Policy Position (Nov ’06) highlighted: • Key risk is attribute disclosure • Consideration of pre-tabular and post-tabular methods • Small cell counts can be included in tables provided uncertainty about the true value is created • Different access agreements for tabular outputs that are seriously compromised by SDC • Tolerable threshold not yet determined, but steer towards less conservative approach

  7. Progress • Development of SDC Strategy • UK SDC working group established to take forward methodological work • UKCDMAC subgroup set up to QA work • Initial stage of methodological research: • Review of SDC in census context (May ’07) • Qualitative evaluation of SDC methods for 2011 Census outputs • Focus on tabular outputs whilst considering impact on other outputs

  8. Progress • UK SDC working group met in August • Produced short-list of SDC methods • SDC methods assessed against criteria in line with the Registrars General's policy statement • Formal QA and sign-off of criteria and short-listed SDC methods • Short-listed methods will undergo thorough quantitative evaluation and should maximise data utility whilst minimising disclosure risk

  9. Short-listing: Criteria • Method should: • prevent new information being derived • prevent disclosure by differencing and enable flexible table generation • Could use special access arrangements if disclosure control seriously compromises some tabular outputs • Table design methods applied alongside chosen method

  10. Short-listing: Criteria • Trade-off between risk and utility needs to be evaluated quantitatively • Many potential SDC methods could be used, but it is not possible to conduct a quantitative evaluation of each method • Need to consider qualitative aspect using high-level review of advantages and disadvantages of SDC methods • Qualitative and subsequent quantitative evaluations used in combination to establish recommended SDC method(s) for 2011 Census

  11. Short-listing: Criteria • Each method assessed against a set of 7 qualitative criteria (primary and secondary): • Primary criteria • Additivity and consistency • Overall user acceptability • Protection against differencing • Feasibility of implementation • Secondary criteria • Impact on microdata releases • Simple to understand • Easy to account for in analyses

  12. Short-listing: Scoring • Following methods considered for short-listing: • Record Swapping • Over-Imputation • Data Switching • Post Randomisation Method (PRAM) • Sampling • Conventional Rounding • Random Rounding • Small Cell adjustment • Controlled Rounding • Semi-Controlled Rounding • Suppression • Barnardisation • ABS Cell Perturbation Method

  13. Short-listing: Scoring • For each criterion, each method assigned a score: • 0 = method does not meet criterion • 1 = method partly meets criterion • 2 = method fully meets criterion • Primary criteria given double weighting • Overall score and ranking assigned to each method • Methods failing on primary criteria were discounted
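The scoring rules above can be expressed as a small function. This is a minimal sketch, not the working group's actual tool. The criterion names are hypothetical labels for the seven criteria listed on slide 11. "Failing on primary criteria" is assumed here to mean scoring 0 on any primary criterion.

```python
# Hypothetical labels for the 4 primary and 3 secondary criteria.
PRIMARY = ["additivity_consistency", "user_acceptability",
           "differencing_protection", "feasibility"]
SECONDARY = ["microdata_impact", "simple_to_understand", "easy_to_account_for"]


def score_method(scores):
    """Weighted total (primary criteria x2), or None if the method is discounted.

    Assumption: a method is discounted if it scores 0 on any primary criterion.
    Scores are 0 = does not meet, 1 = partly meets, 2 = fully meets.
    """
    for c in PRIMARY + SECONDARY:
        if scores[c] not in (0, 1, 2):
            raise ValueError(f"invalid score for {c}: {scores[c]}")
    if any(scores[c] == 0 for c in PRIMARY):
        return None
    return 2 * sum(scores[c] for c in PRIMARY) + sum(scores[c] for c in SECONDARY)


def rank_methods(all_scores):
    """Rank the methods that survive the primary criteria, highest score first."""
    kept = {}
    for name, s in all_scores.items():
        total = score_method(s)
        if total is not None:
            kept[name] = total
    return sorted(kept.items(), key=lambda kv: kv[1], reverse=True)
```

With this weighting the maximum possible score is 2 × 8 + 6 = 22.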

  14. Short-listing: Scoring • Majority of SDC methods failed primary criteria and were discounted from short-list. • For example: • PRAM - difficult to implement and not proven for Census data • Sampling – low user acceptance of weighted tables • Rounding – low user acceptance of rounding methods • Suppression – extremely difficult to implement to protect against differencing

  15. Short-listed SDC Methods • Record swapping • Over-imputation • ABS Cell Perturbation method • Small cell adjustment with record swapping (to provide comparison with 2001)

  16. Quantitative Evaluation • Examine how methods protect and manage risk and how they impact on data utility • Plan to use range of 2001 Census tables, varying parameters, different geographies • Information Loss software will be used to evaluate each short-listed method • Consideration will be given to other issues, e.g. comparisons over time, communal establishments, imputation rates

  17. What do the methods do? The short-list Record Swapping ABS Cell Perturbation Over-imputation

  18. Record Swapping - Summary • 2001 Random Record Swapping method: • A set percentage of households swapped between Output Areas (OAs) • Swaps made within the same Local Authority (LA) to preserve marginal distributions at LA level • Matches found using control variables • Age • Gender • Hard to Count Index (census enumeration) • Household Size • All non-geographic fields swapped • Random / Targeted
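The following is a minimal sketch of random record swapping in the spirit of the 2001 method described above. Several things are hypothetical: the `Household` fields, the contents of the control-variable `key`, and the swap `rate`. Partners are drawn at random from households in a different OA of the same LA with an identical key. Swapping the OA codes of a pair is treated as equivalent to swapping all non-geographic fields.

```python
import random
from dataclasses import dataclass


@dataclass
class Household:
    hid: int    # household identifier
    la: str     # Local Authority code
    oa: str     # Output Area code
    key: tuple  # hypothetical control-variable profile, e.g. (age band, sex, HtC index, size)


def swap_records(households, rate, seed=None):
    """Swap the OA of roughly `rate` of households with a matched partner.

    Partners must be in the same LA (so LA margins are preserved), in a
    different OA, and have an identical control-variable key.
    Returns the list of swapped (hid, hid) pairs.
    """
    rng = random.Random(seed)
    target = round(rate * len(households))
    order = list(range(len(households)))
    rng.shuffle(order)  # random (not targeted) selection of households
    used = set()
    pairs = []
    for i in order:
        if len(used) >= target:
            break
        if i in used:
            continue
        h = households[i]
        candidates = [
            j for j, g in enumerate(households)
            if j != i and j not in used
            and g.la == h.la and g.oa != h.oa and g.key == h.key
        ]
        if not candidates:
            continue  # no matching partner; household is not swapped
        g = households[rng.choice(candidates)]
        j = households.index(g)
        # Exchanging geography codes is equivalent to swapping all
        # non-geographic fields between the two households.
        h.oa, g.oa = g.oa, h.oa
        used.update((i, j))
        pairs.append((h.hid, g.hid))
    return pairs
```

A targeted variant would replace the random `shuffle` with an ordering that puts households in small cells first.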

  19. Record Swapping - Summary

  20. ABS Cell Perturbation - Summary • Developed by the Australian Bureau of Statistics • In use for their 2006 Census data • Based on random numbers assigned to each record • Then each table is adjusted independently in two stages: • (1) Adding perturbations to each cell • (2) Restoring additivity of whole table

  21. ABS Cell Perturbation - Summary • Assign each microdata record a random number between 1 and m, called an rkey • For each cell in a particular table: • Calculate the cell key (ckey) as a function of the rkeys of the records in that cell • Using a look-up table, read off the perturbation to add, where ckeys are the columns and original cell values are the rows of the look-up table • Perturbation added to original cell value • ABS additivity module not yet evaluated
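The cell-key mechanism above can be sketched as follows. The sketch makes two assumptions that the slides do not specify. First, the ckey is taken to be the sum of the contributing rkeys modulo m. Second, the look-up table `LOOKUP` (with m = 8) is a made-up example, not the ABS table. The second, additivity-restoring stage is omitted, since it had not yet been evaluated.

```python
import random

M = 8  # hypothetical range of keys

# Hypothetical look-up table: rows are original cell values (3 stands for
# "3 or more"), columns are ckeys 0..M-1, entries are perturbations.
# Empty cells stay empty and no perturbed count goes negative.
LOOKUP = {
    0: [0] * M,
    1: [-1, 0, 0, 1, 1, -1, 0, 2],
    2: [-2, -1, 0, 0, 1, 1, 0, -1],
    3: [-2, -1, 0, 1, 2, 0, 1, -1],
}


def assign_rkeys(n_records, m=M, seed=None):
    """Give each microdata record a random rkey in 1..m."""
    rng = random.Random(seed)
    return [rng.randint(1, m) for _ in range(n_records)]


def cell_key(cell_rkeys, m=M):
    """Assumed ckey function: sum of the cell's rkeys modulo m."""
    return sum(cell_rkeys) % m


def perturb_cell(record_ids, rkeys, m=M):
    """Perturb one cell, given the ids of the records that fall in it."""
    n = len(record_ids)
    ckey = cell_key([rkeys[r] for r in record_ids], m)
    row = LOOKUP[min(n, max(LOOKUP))]  # large counts share the last row
    return n + row[ckey]


def perturb_table(cells, rkeys, m=M):
    """Stage 1 only: perturb every cell of a table independently."""
    return {label: perturb_cell(ids, rkeys, m) for label, ids in cells.items()}
```

The ckey depends only on which records fall in a cell. The same cell appearing in two different tables therefore receives the same perturbation, which is what makes the method consistent across independently produced tables.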

  22. Example Look-up Table

  23. ABS Cell Perturbation - Summary

  24. Over-imputation - Summary • Involves randomly selecting a percentage of microdata records which then have certain variables erased. • Select donors matching on control variables and the erased variables are then imputed • Various approaches to over-imputation will be considered
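One possible over-imputation approach can be sketched as a random hot-deck. It selects a percentage of records, erases the target variables, and replaces them with the values of a randomly chosen donor that matches on the control variables. The function name and the record layout are illustrative assumptions. So is the choice to leave a record unchanged when no donor matches. The slides note that several approaches will be considered.

```python
import random
from collections import defaultdict


def over_impute(records, rate, controls, targets, seed=None):
    """Randomly select `rate` of records, erase `targets` and hot-deck impute them.

    Donors are unselected records with identical values on `controls`.
    Returns (protected records, set of selected indices).
    """
    rng = random.Random(seed)
    n_selected = round(rate * len(records))
    selected = set(rng.sample(range(len(records)), n_selected))

    # Donor pool: unselected records grouped by their control-variable values.
    donors = defaultdict(list)
    for i, rec in enumerate(records):
        if i not in selected:
            donors[tuple(rec[c] for c in controls)].append(rec)

    out = [dict(rec) for rec in records]  # work on copies
    for i in selected:
        pool = donors.get(tuple(records[i][c] for c in controls))
        if not pool:
            continue  # no matching donor: record left unchanged in this sketch
        donor = rng.choice(pool)
        for t in targets:
            out[i][t] = donor[t]  # erased value replaced by the donor's value
    return out, selected
```

Control variables are never altered, so counts by the control variables are preserved exactly. Only the erased variables change.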

  25. Over-imputation - Summary

  26. Quantitative Evaluation • An example of how the quantitative evaluation will be carried out: • Preliminary study comparing swapping and ABS cell perturbation using ideas developed by Natalie Shlomo (framework for balancing risk and utility)

  27. Preliminary Evaluation: Tables used • 2001 UK Census Tables • EA: Southampton, Eastleigh, Test Valley (SJ)

  28. Measuring Disclosure Risk • Main risk • small cells in tables • small cells in differenced tables • Disclosure Risk = proportion of records in the small cells that have not been perturbed
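The risk measure above can be written directly. This sketch assumes that "small cells" means original counts of 1 or 2, and that tables are aligned lists of cell counts. `differenced` is a hypothetical helper. It shows how the same measure applies to a table obtained by subtracting one table from another, for example across nested geographies.

```python
def disclosure_risk(original, protected, small=(1, 2)):
    """Proportion of records in small original cells whose count is unchanged.

    Assumption: a cell is "small" if its original count is in `small`.
    """
    at_risk = sum(o for o in original if o in small)
    if at_risk == 0:
        return 0.0
    unperturbed = sum(o for o, p in zip(original, protected) if o in small and o == p)
    return unperturbed / at_risk


def differenced(table_a, table_b):
    """Cell-by-cell difference of two aligned tables (e.g. nested geographies)."""
    return [a - b for a, b in zip(table_a, table_b)]
```

Applying `disclosure_risk` to the differenced original and protected tables shows whether the protection actually creates uncertainty in the small cells that differencing exposes. Perturbing each table on its own does not guarantee this.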

  29. Disclosure Risk: OA and Ward

  30. Measuring Information Loss Utility (information loss) measures compare statistical quality of original and protected tables • Measure distortion to internal cell distributions • Compare variance of cell counts • Measure impact on rank correlations

  31. Distance Metrics at Output Area level

  32. Variance of Cell Counts: OA and Ward

  33. Impact on Rank Correlations: OA and Ward

  34. Summary • Ongoing progress made for 2011 Census • Thorough quantitative evaluation of short-list over next year, using 2001 method as benchmark • Important to strike balance between minimising disclosure risk and maximising data utility • Qualitative and quantitative evaluations used in combination to establish recommended approach to SDC for 2011 Census • User communication and consultation will take place throughout the work programme

  35. Contact Details • Jane.Longhurst@ons.gov.uk • Caroline.Miller@ons.gov.uk • Caroline.Young@ons.gov.uk
