
Progress on the SDC Strategy for the 2011 Census

23rd June 2008. Keith Spicer and Caroline Young.


Presentation Transcript


  1. Progress on the SDC Strategy for the 2011 Census 23rd June 2008 Keith Spicer and Caroline Young

  2. Outline • Context • Work plan • Description of the short-listed methods • Quantitative Evaluation – some results! • Conclusions and Further Work

  3. Context • SDC for 2011 Census outputs is a major concern for users • Different SDC methodologies were adopted for tabular 2001 Census outputs across the UK • Late addition of small cell adjustment by ONS/NISRA resulted in a high level of user confusion and dissatisfaction • Publicised commitment to aim for a common UK SDC methodology for all 2011 Census outputs

  4. Work plan • Phase 1 (March ’06 – Jan ’07) • UK agreement on key SDC policy issues • Phase 2 (Jan ’07 – Sept ’08) • Evaluation of all methods complying with the agreed SDC policy position in terms of a risk/utility framework and feasibility of implementation • Phase 3 (Sept ’08 – Spring/Summer ’09) • Recommendations and UK agreement on SDC methodologies for 2011 Census tabular outputs • Phase 4 (Feb ’09 onwards) • Evaluate and develop SDC methods for microdata, future work on output specification, system specification, development and testing

  5. Progress • Development of SDC Strategy • UK SDC working group, consisting of representatives from Wales, Northern Ireland and Scotland, established to take forward methodological work • UKCDMAC subgroup set up to QA the work • Methodological research: • Determine the short-list of SDC methods (Aug ’07) • Quantitative evaluation of short-list (complete Sep ’08) • Focus on tabular outputs whilst considering impact on other outputs (e.g. microdata)

  6. Quantitative Evaluation • Examine how methods protect and manage risk and how they impact on data utility • Using a range of 2001 Census tables, varying parameters, different geographies • Information Loss software used to evaluate each short-listed method

  7. Short-listed Methods being considered for 2011 Census data • Applied so that ‘safe’ tabular outputs can be released • Record Swapping • Over-imputation • ABS Cell Perturbation (developed by the Australian Bureau of Statistics) • 2001 Census SDC methods used as a baseline for comparison: Record Swapping and Small Cell Adjustment (SCA)

  8. Short-listed SDC methods • Record Swapping and Over-imputation: pre-tabular (applied directly to the microdata) • ABS Cell Perturbation: post-tabular (applied to tables) • SCA (a type of rounding) is also a post-tabular method

  9. Record Swapping • Swap the geographical location of a small number of households • Households are paired according to similar characteristics (to avoid too much data distortion) • Creates uncertainty in the data • Swapping can be restricted to unique records only (those at greater risk)

  10. Record Swapping (example) • Record A: Age: 22, Sex: Male, Marital Status: Married, No of Cars: 3, Region: Area A. Unique as the only person with 3 cars in Area A • Record B: Age: 22, Sex: Male, Marital Status: Married, No of Cars: 1, Region: Area B. Matches all variables except No of Cars • Treatment: • Find a different geographical area • Identify another individual in a different area with virtually all the same characteristics • Swap the two records
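
The swapping treatment on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used for the Census: the field names, the toy records, the choice of `cars` as the variable that defines uniqueness, and the exact-match pairing rule are all hypothetical.

```python
from collections import Counter

# Hypothetical toy records; field names and values are illustrative only.
records = [
    {"id": 1, "age": 22, "sex": "M", "marital": "Married", "cars": 3, "area": "A"},
    {"id": 2, "age": 22, "sex": "M", "marital": "Married", "cars": 1, "area": "B"},
    {"id": 3, "age": 40, "sex": "F", "marital": "Single", "cars": 1, "area": "A"},
    {"id": 4, "age": 41, "sex": "F", "marital": "Single", "cars": 1, "area": "B"},
    {"id": 5, "age": 35, "sex": "M", "marital": "Single", "cars": 1, "area": "A"},
]

# Assumed matching variables: partners must agree on all of these.
MATCH_VARS = ("age", "sex", "marital")


def find_uniques(recs, var):
    """Return records that are the only one in their area with their value of var."""
    counts = Counter((r["area"], r[var]) for r in recs)
    return [r for r in recs if counts[(r["area"], r[var])] == 1]


def find_partner(target, recs):
    """Return the first record in a different area that matches on MATCH_VARS."""
    for r in recs:
        if r["area"] != target["area"] and all(r[v] == target[v] for v in MATCH_VARS):
            return r
    return None


def swap_areas(recs, var="cars"):
    """Swap the area of each unique record with a matching partner, in place."""
    swapped = set()
    for target in find_uniques(recs, var):
        if target["id"] in swapped:
            continue
        partner = find_partner(target, recs)
        if partner is not None and partner["id"] not in swapped:
            target["area"], partner["area"] = partner["area"], target["area"]
            swapped.update({target["id"], partner["id"]})
    return recs
```

Calling `swap_areas(records)` on the toy data moves record 1 (the only person with 3 cars in Area A) to Area B and record 2 to Area A, while the number of records in each area stays the same.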

  11. Over-Imputation • Imputation is a standard procedure for census data, used to insert plausible values for those missing due to non-response • Because it is not known whether imputed values are true or false, imputation can also be used for SDC • Carried out by the Edit and Imputation team at ONS using CANCEIS • Algorithm: distance-based nearest neighbour search to find a donor, based on a set of matching variables

  12. Over-Imputation 1) Blank out values for certain records in the data 2) Replace blanked out values with ‘imputed values’ using a nearest neighbour donor • Example: blank out age from a record, then find a donor to impute age

  13. Over-Imputation • Which variables to impute? • Risky variables? Ethnicity, elderly, other minority populations • CANCEIS may re-impute the original value exactly if using a nearest neighbour donor • Impute age (all donors) and small area geography (use only donors within the same local authority): get a small margin of error
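
The two over-imputation steps (blank out, then re-impute from a nearest neighbour donor) can be sketched as follows. This is not CANCEIS and does not reproduce its distance function: the matching variables, the simple mismatch-count distance, and the random selection of records to blank are assumptions made only for illustration.

```python
import random

# Hypothetical matching variables used to find a donor.
MATCH_VARS = ("sex", "household_size")


def distance(a, b):
    """Number of matching variables on which two records differ."""
    return sum(a[v] != b[v] for v in MATCH_VARS)


def over_impute(records, target_var, fraction, rng):
    """Blank target_var for a random fraction of records and re-impute it.

    Returns a new list of records plus the indices that were re-imputed.
    Donors are drawn only from records that were not blanked.
    """
    n = max(1, round(fraction * len(records)))
    chosen = rng.sample(range(len(records)), n)
    chosen_set = set(chosen)
    donors = [r for i, r in enumerate(records) if i not in chosen_set]
    if not donors:
        raise ValueError("fraction too large: no donors left")
    out = [dict(r) for r in records]
    for i in chosen:
        out[i][target_var] = None  # step 1: blank out the value
        donor = min(donors, key=lambda d: distance(out[i], d))
        out[i][target_var] = donor[target_var]  # step 2: copy the donor's value
    return out, chosen


# Hypothetical toy data.
people = [
    {"sex": "M", "household_size": 2, "age": 22},
    {"sex": "M", "household_size": 2, "age": 25},
    {"sex": "F", "household_size": 1, "age": 71},
    {"sex": "F", "household_size": 1, "age": 68},
    {"sex": "M", "household_size": 4, "age": 9},
    {"sex": "F", "household_size": 4, "age": 11},
]
```

A call such as `over_impute(people, "age", 0.3, random.Random(0))` leaves the input untouched and returns a copy in which the chosen records carry a donor's age. As the slide notes, a close donor can return the original value exactly, in which case that record is not actually perturbed.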

  14. (ABS) Cell Perturbation • Developed by the Australian Bureau of Statistics (ABS) • Perturb each cell value in a table to create uncertainty around the true value • Two stage method: • Stage 1: Adding Perturbation • Stage 2: Restoring Additivity

  15. (ABS) Cell Perturbation • Stage 1: Each cell is always perturbed in the same way using microdata keys – CONSISTENCY • Stage 2: Restoring ADDITIVITY means consistency is lost slightly • An improved approach is being developed in collaboration with Southampton University: optimise consistency and additivity – INVARIANT cell perturbation.
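
The consistency property of Stage 1 can be illustrated with a small sketch: every microdata record carries a fixed random key, a cell's key is derived from the keys of the records it contains, and the perturbation is looked up from that cell key. The key range, the noise values and the lookup rule below are hypothetical and are not the ABS specification; Stage 2 (restoring additivity) is not shown.

```python
import random

KEY_SPACE = 2**16          # hypothetical size of the record-key range
NOISE = (-2, -1, 0, 1, 2)  # hypothetical set of perturbation values


def assign_record_keys(n_records, seed=2011):
    """Give each microdata record a fixed pseudo-random key."""
    rng = random.Random(seed)
    return [rng.randrange(KEY_SPACE) for _ in range(n_records)]


def perturb_cell(record_ids, record_keys):
    """Perturb a cell count using a key derived from the records in the cell.

    The same set of records always produces the same cell key, and so the
    same perturbation, whichever table the cell appears in.
    """
    count = len(record_ids)
    if count == 0:
        return 0  # empty cells are left at zero in this sketch
    cell_key = sum(record_keys[i] for i in record_ids) % KEY_SPACE
    noise = NOISE[cell_key % len(NOISE)]
    return max(0, count + noise)
```

Because the cell key depends only on which records fall in the cell, the same cell requested in two different tables receives the same perturbed value. Perturbing cells independently in this way generally means that perturbed cells no longer sum to perturbed totals, which is what Stage 2 addresses.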

  16. Results • What is the effect on statistical quality of the data? • Tendency to increase correlations? • Tendency to distort distance metrics? • etc. (there are many ways to measure information loss) • Impact on disclosure risk • Examine different types of data

  17. Results • Only Over-Imputation, Record Swapping and Record Swapping with SCA have been evaluated so far. • Both targeted and random approaches are being looked at. • Note there are different ways of carrying out swapping and imputation, so interpretation of the results should take this into account.

  18. Data for Analysis • SJ EA; approx. 200,000 households and 500,000 persons • Four census tables so far: (1) Country of birth by religion by sex: individuals at ward level (2) Number of persons by accommodation type: households at OA and ED level (3) Age by religion by gender: individuals at OA and ED level (4) Origin-destination table: flows between home and travel to work location

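  19. Measures of Quality • Impact on Tests for Independence: Cramér's V measure of association, $V = \sqrt{\chi^2 / (n \cdot \min(r-1, c-1))}$, where $\chi^2$ is the Pearson chi-square statistic, $n$ is the table total, and $r$ and $c$ are the numbers of rows and columns. Also, the same measure for entropy and the Pearson statistic • Variance of Cell Counts: computed for each row (formulas not preserved in the transcript)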
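
Cramér's V for a two-way table can be computed directly from the Pearson chi-square statistic, as in the sketch below. The standard definition of V is used; the percentage relative change between the original and perturbed table is an assumed way of summarising the impact and is not taken from the slides.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:  # skip cells in empty rows or columns
                stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table):
    """Cramér's V: sqrt(chi2 / (n * min(rows - 1, cols - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_square(table) / (n * k))


def relative_change(original, perturbed):
    """Percentage change in Cramér's V (assumed summary of the impact)."""
    v0 = cramers_v(original)
    return 100.0 * (cramers_v(perturbed) - v0) / v0
```

For a perfectly associated table such as `[[10, 0], [0, 10]]` the function returns V = 1, and for an independent table such as `[[5, 5], [5, 5]]` it returns 0. `relative_change` is undefined when the original table has V = 0.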

  20. Measures of Utility • Impact on Rank Correlations: sort the original cell counts and define deciles; repeat on the perturbed cell counts; the measure is built from the indicator function I and the number of rows (formula not preserved in the transcript) • Log Linear Analysis: ratio of the deviance (likelihood ratio test statistic) between the perturbed table and the original table for a given model (formula not preserved in the transcript)

  21. Impact on Disclosure Risk

  22. Quality Measures

  23. Quality Measures

  24. Changes to Totals / Subtotals • Swapping does not change the overall set of household locations • Totals and subtotals by geography preserved • Over-Imputation does change set of locations • Totals and subtotals by geography not preserved • Swapping has no impact on Origin-Destination total flows – NO PROTECTION • Over-Imputation does not preserve O/D total flows – POOR QUALITY
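
The difference between the two methods on geographic totals can be seen in a toy example: swapping exchanges two locations, so the multiset of locations is unchanged, whereas re-imputing a location copies a donor's value over it. The function names and the five-record area list below are hypothetical.

```python
from collections import Counter


def totals_by_area(areas):
    """Count of records in each area."""
    return Counter(areas)


def swap_locations(areas, i, j):
    """Return a copy with the locations of records i and j exchanged."""
    out = list(areas)
    out[i], out[j] = out[j], out[i]
    return out


def impute_location(areas, i, donor):
    """Return a copy where record i takes the donor's location."""
    out = list(areas)
    out[i] = out[donor]
    return out


# Hypothetical locations of five records.
areas = ["A", "A", "A", "B", "B"]
```

Here `swap_locations(areas, 0, 3)` still gives three records in A and two in B, while `impute_location(areas, 0, 3)` gives two in A and three in B, so totals by geography are no longer preserved.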

  25. Conclusions • Decide whether to drop over-imputation: test on another EA? • Quantitative Evaluation to be finished by September ’08 • ABS cell perturbation method currently being evaluated – results are looking good

  26. Further Work • Setting of parameter values for final method; e.g. level of perturbation • Protection of microdata samples • Communal establishments • Output specification / geography • System specification, development and testing

  27. Contact Details Keith.spicer@ons.gov.uk Caroline.young@ons.gov.uk Useful links: www.statistics.gov.uk/census/2011census/producingdata/outputconfidentiality.asp www.statistics.gov.uk/census2001/discloseprotect.asp
