Access routes to 2001 UK Census Microdata: Issues and Solutions

Jo Wathan

SARs support Unit, CCSR

University of Manchester, UK

Jo.wathan@manchester.ac.uk


UK Census context

  • Traditional 10-yearly census at present

  • Medium length form (c. 30 person questions, c. 10 household questions)

    • Ethnicity + optional religion question

    • No income question

  • Legal framework in GB is Census Act 1920

    • No statistics Act

    • Legislation only deals with confidentiality restrictions – up to 2 years' imprisonment!


1991 SARs

  • Samples of Anonymised Records (SARs) from 1991 were first to be released

  • Highly successful. c. 400 research papers used the data between 1993 & 2002. Also used in teaching.

  • SARs are a commissioned output, paid for by UK Economic and Social Research Council.

  • The SARs support unit at CCSR represents the client, and disseminates and supports the data.


Disclosure Control 1991

Released after work had been undertaken to demonstrate a low risk of disclosure, with these controls:

  • Users had to register to use them

  • Some ‘broadbanding’ or grouping of rare categories

  • Very large households (12+ residents) had individual detail suppressed

  • 2 non-overlapping files for different interest groups:

    • One for geographers

    • One for sociologists/demographers
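
Two of the controls listed above, grouping rare categories and suppressing individual detail for very large households, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `min_count` threshold, the "Other" label and the record layout are hypothetical and are not taken from the 1991 SARs specification; only the 12+ residents cut-off comes from the slide.

```python
from collections import Counter


def broadband_rare(values, min_count, other_label="Other"):
    """Replace categories seen fewer than min_count times with other_label.

    min_count and other_label are illustrative choices, not SARs rules.
    """
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


def suppress_large_households(households, max_size=11):
    """Drop person-level detail for households with more than max_size residents.

    households maps a household id to a list of person records (dicts).
    max_size=11 mirrors the "12+ residents" rule on the slide.
    """
    released = {}
    for hh_id, people in households.items():
        if len(people) > max_size:
            # Keep only the household size; individual detail is suppressed.
            released[hh_id] = {"size": len(people), "people": None}
        else:
            released[hh_id] = {"size": len(people), "people": people}
    return released


occupations = ["clerk", "clerk", "nurse", "nurse", "clerk", "falconer"]
banded = broadband_rare(occupations, min_count=2)

households = {
    "A": [{"age": 34}, {"age": 36}],
    "B": [{"age": a} for a in range(20, 32)],  # 12 residents
}
released = suppress_large_households(households)
```

In this toy input, the single "falconer" record is folded into "Other", and household "B" keeps its size but loses its person records.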


What did the 91 SARs look like?

Household SAR

  • Household hierarchy

  • 1% sample (c. 0.6M cases)

  • Regional geography

  • Individual year of age

  • 10 ethnicity categories

  • 358 categories of occupation

Individual SAR

  • Individual level file

  • 2% sample (c. 1.2M cases)

  • Geography: population threshold of 120k, giving 278 SAR areas

  • Individual year of age

  • 10 ethnicity categories

  • 73 categories of occupation


Request for 2001 SARs

  • New work on disclosure control showed that we had previously overestimated the risk of disclosure

    • Requested larger sample size

    • Slightly more geography

    • A 3rd SAR for small areas

  • However, a new, stricter interpretation of the degree of disclosure risk was required

  • The level of detail initially available would not have provided files of sufficient use for research


Why?

  • Census Office concerns:

    • Perceived increased levels of concern amongst respondents

    • Increased data processing power

    • Increased levels of storage of personal information that might be used to match to the data

  • Major strategic review of data stewardship issues at the time that Census outputs were due for release


Principles

  • Ongoing need for user consultation

  • Recognise different users require different levels of detail (and may be able to accept different conditions) – trading detail/access against each other

  • Trading different types of detail against each other: geog against socio/demographic etc.

  • Flexible approach to combining a range of access and disclosure approaches:

    • Safe Data

    • Safe Users

    • Safe Setting

  • International role models were very helpful


Where we are now

  • Have succeeded in obtaining access to:

    • End User License (Safe Data): 2 datasets which are accessible in the same way as in 1991, with less detail on some variables but enough detail for research purposes

    • Special License (Safe Users): 1 dataset available for distribution but with extra access conditions

    • Controlled Access Microdata (Safe Setting): much more detailed versions of 2 datasets, available in a safe setting


Safe Data: End User License Files

  • Standard online application procedure for those with electronic signature (otherwise equivalent paper system)

  • Not public data!

  • Available for very low risk files

  • Risk reduced by

    • Broadbanding (e.g. age, geography)

    • Perturbing data
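
The two risk-reduction steps named above can be illustrated with a short Python sketch. It is a minimal example under assumptions: the 10-year band width, the function names and the random-replacement rule are hypothetical, and the slides do not say which perturbation method the Census offices actually applied. The 16 to 74 range is taken from the banded ages in the EUL Individual SAR.

```python
import random


def band_age(age, width=10, lower=16, upper=74):
    """Return a label for the age band containing age.

    Ages below lower or above upper fall into open-ended bands.
    The band width is an illustrative assumption.
    """
    if age < lower:
        return f"under {lower}"
    if age > upper:
        return f"{upper + 1}+"
    start = lower + ((age - lower) // width) * width
    end = min(start + width - 1, upper)
    return f"{start}-{end}"


def perturb(values, categories, rate, rng):
    """With probability rate, replace each value by a randomly drawn category.

    A deliberately simple stand-in for perturbation, not the method
    used for the 2001 files.
    """
    out = []
    for v in values:
        if rng.random() < rate:
            out.append(rng.choice(categories))
        else:
            out.append(v)
    return out


ages = [15, 16, 30, 74, 80]
age_bands = [band_age(a) for a in ages]

rng = random.Random(2001)  # fixed seed so the sketch is repeatable
regions = ["North", "South", "East", "West"]
original = ["North", "North", "South", "East", "West", "South"]
perturbed = perturb(original, regions, rate=0.1, rng=rng)
```

Banding removes detail deterministically, while perturbation introduces uncertainty about whether any single released value is the true one; the `rate` parameter controls how much of the file is affected.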


EUL Files

Individual SAR

  • Individual level file

  • 3% sample (c. 1.8M cases)

  • Regional geography (13 categories)

  • Ages 16-74 banded

  • 16 categories of ethnicity

  • 81 categories of occupation

Small area microdata

  • Individual level file

  • 5% sample (c. 3M cases)

  • Local authority geography (< 90k)

  • 13 age bands (c. 10 years)

  • 13 categories of ethnicity

  • Only broad social class variable (economic activity 3 groups)


Safe Users: The 2001 S-L Household SAR

  • Additional complexity of a household SAR required a special license

  • No geography at all, and not available for Northern Ireland or Scotland

  • Age in 2-year bands of

  • 16 categories of ethnicity

  • 81 categories of occupation


Safe setting

  • To compensate for loss of detail in the end user and special license files

  • Same records as Individual and Household SARs but with MUCH more detail

  • Managed by the Census offices

  • Access currently at only a handful of census office sites

  • Virtual microdata laboratory environment, outputs manually checked prior to release to user

  • Access only permitted if this is the only available data source, for work in keeping with the aims of the Census Office


Controlled Access Microdata

Individual CAM

  • Individual level file

  • 3% sample (c. 1.8M cases)

  • Local authority geography, with context at lower level

  • Individual year of age to 90+

  • 16 ethnicity categories

  • Over 200 categories of occupation

Household CAM

  • Household hierarchy

  • 1% sample (c. 0.6M cases)

  • Local authority geography, with context at lower level

  • Individual year of age to 90+

  • 16 ethnicity categories

  • Over 200 categories of occupation


Conclusion

  • Have obtained a range of research-worthy datasets by treating different user groups differently

  • Traded off:

    • Safe data

    • Safe users

    • Safe setting

  • http://www.ccsr.ac.uk/sars

