

Access routes to 2001 UK Census Microdata: Issues and Solutions

Jo Wathan

SARs support Unit, CCSR

University of Manchester, UK

[email protected]


UK Census context

  • Traditional 10-yearly census at present

  • Medium length form (c. 30 person questions, c. 10 household questions)

    • Ethnicity + optional religion question

    • No income question

  • Legal framework in GB is Census Act 1920

    • No statistics Act

    • Legislation only deals with confidentiality restrictions – up to 2 years' imprisonment!


1991 SARs

  • Samples of Anonymised Records (SARs) from 1991 were first to be released

  • Highly successful. c. 400 research papers used the data between 1993 & 2002. Also used in teaching.

  • SARs are a commissioned output, paid for by UK Economic and Social Research Council.

  • SARs support unit at CCSR represent client, disseminate and support the data.


Disclosure Control 1991

After work had been undertaken to demonstrate a low risk of disclosure, the files were released with the following controls:

  • Users had to register to use them

  • Some ‘broadbanding’ or grouping of rare categories

  • Very large households (12+ residents) had individual detail suppressed

  • 2 non-overlapping files for different interest groups:

    • One for geographers

    • One for sociologists/demographers
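As a rough illustration of two of the controls listed above, the sketch below groups rare categories into a single band and blanks individual detail for households of 12 or more residents. The record layout, the function names and the `min_count` cut-off are assumptions made for this example; the slides do not describe how the Census Offices actually implemented either step.

```python
from collections import Counter


def broadband_rare(values, min_count, other_label="Other"):
    """Replace categories seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


def suppress_large_households(records, threshold=12):
    """Drop individual detail for households with threshold or more residents.

    Each record is a dict with a 'household_id' key plus detail fields
    (a hypothetical layout, not the real SAR record format).
    """
    sizes = Counter(r["household_id"] for r in records)
    result = []
    for r in records:
        if sizes[r["household_id"]] >= threshold:
            # Keep only the household link; blank everything else.
            result.append({"household_id": r["household_id"], "suppressed": True})
        else:
            result.append({**r, "suppressed": False})
    return result


if __name__ == "__main__":
    occupations = ["clerk"] * 5 + ["teacher"] * 4 + ["falconer"]
    print(broadband_rare(occupations, min_count=3)[-1])  # Other

    people = [{"household_id": 1, "age": 20 + i} for i in range(12)]
    people += [{"household_id": 2, "age": 40 + i} for i in range(3)]
    protected = suppress_large_households(people)
    print(sum(p["suppressed"] for p in protected))  # 12
```

Both steps trade detail for safety: the rare occupation disappears into "Other", and the 12-person household keeps its structure but loses its person-level variables.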


What did the 91 SARs look like?

Household SAR

  • Household hierarchy

  • 1% sample (c. 0.6M cases)

  • Regional geography

  • Individual year of age

  • 10 ethnicity categories

  • 358 categories of occupation

Individual SAR

  • Individual level file

  • 2% sample (c. 1.2M cases)

  • Geography population threshold 120k = 278 SAR areas

  • Individual year of age

  • 10 ethnicity categories

  • 73 categories of occupation
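The sample fractions and case counts quoted for the 1991 files can be cross-checked with simple arithmetic. The sketch below is only a sanity check on the rounded "c." figures from the slide; the helper names are made up for this example, and the results are orders of magnitude rather than official population counts.

```python
def implied_population(sample_cases, sample_percent):
    """Population size implied by a sample of sample_cases at sample_percent %."""
    return sample_cases * 100 / sample_percent


def expected_sample(population, sample_percent):
    """Expected number of sampled records from an area of the given population."""
    return population * sample_percent / 100


# Household SAR: 1% sample, c. 0.6M cases
print(implied_population(600_000, 1))    # 60000000.0
# Individual SAR: 2% sample, c. 1.2M cases
print(implied_population(1_200_000, 2))  # 60000000.0
# A SAR area right at the 120k population threshold, sampled at 2%
print(expected_sample(120_000, 2))       # 2400.0
```

Both files imply the same underlying population of roughly 60 million, and an area at the 120k threshold would contribute about 2,400 records to the Individual SAR.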


Request for 2001 SARs

  • New work on disclosure control showed that we had previously overestimated the risk of disclosure

    • Requested larger sample size

    • Slightly more geography

    • A 3rd SAR for small areas

  • However, a new, stricter interpretation of the degree of disclosure risk was required

  • The level of detail initially available would not have provided files of sufficient use for research


Why?

  • Census Office concerns:

    • Perceived increased levels of concern amongst respondents

    • Increased data processing power

    • Increased levels of storage of personal information that might be used to match to the data

  • Major strategic review of data stewardship issues at the time that Census outputs were due for release


Principles

  • Ongoing need for user consultation

  • Recognise different users require different levels of detail (and may be able to accept different conditions) – trading detail/access against each other

  • Trading different types of detail against each other: geog against socio/demographic etc.

  • Flexible approach to combining a range of access and disclosure approaches:

    • Safe Data

    • Safe Users

    • Safe Setting

  • International role models were very helpful


Where we are now

  • Have succeeded in obtaining access to:

    • End User License (Safe Data): 2 datasets which are accessible in the same way as in 1991, with less detail on some variables but enough detail for research purposes

    • Special License (Safe Users): 1 dataset available for distribution but with extra access conditions

    • Controlled Access Microdata (Safe Setting): much more detailed versions of 2 datasets, available in a safe setting


Safe Data: End User License Files

  • Standard online application procedure for those with electronic signature (otherwise equivalent paper system)

  • Not public data!

  • Available for very low risk files

  • Risk reduced by

    • Broadbanding (e.g. age, geography)

    • Perturbing data
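To make the broadbanding idea concrete, the sketch below recodes single years of age into fixed-width bands with a top code, then counts how many records remain unique on the combination of age and area. The band width, the top code, the toy records and the function names are all assumptions for illustration; the actual 2001 banding schemes are not spelled out in the slides, and the perturbation method is not described at all, so it is not sketched here.

```python
from collections import Counter


def band_age(age, width=10, top_code=90):
    """Recode a single year of age into a band label such as '30-39' or '90+'."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age >= top_code:
        return f"{top_code}+"
    lower = (age // width) * width
    upper = min(lower + width - 1, top_code - 1)
    return f"{lower}-{upper}"


def count_uniques(records):
    """Number of records whose combination of values occurs exactly once."""
    counts = Counter(records)
    return sum(1 for n in counts.values() if n == 1)


# Toy (age, area) records; purely hypothetical data.
records = [(37, "A"), (38, "A"), (52, "B"), (53, "B"), (91, "A")]
banded = [(band_age(age), area) for age, area in records]

print(count_uniques(records))  # 5
print(count_uniques(banded))   # 1
```

With single year of age every toy record is unique; after banding only the top-coded record is. Fewer unique combinations is one simple way to see why coarser variables lower the risk of a record being matched to a known person.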


EUL Files

Individual SAR

  • Individual level file

  • 3% sample (c. 1.8M cases)

  • Regional geography (13 categories)

  • Ages 16-74 banded

  • 16 categories of ethnicity

  • 81 categories of occupation

Small area microdata

  • Individual level file

  • 5% sample (c. 3M cases)

  • Local authority geography (< 90k)

  • 13 age bands (c. 10 years)

  • 13 categories of ethnicity

  • Only broad social class variable (economic activity 3 groups)


Safe Users: The 2001 S-L Household SAR

  • Additional complexity of a household SAR required a special license

  • No geography at all, and not available for Northern Ireland or Scotland

  • Age in 2-year bands of

  • 16 categories of ethnicity

  • 81 categories of occupation


Safe setting

  • To compensate for loss of detail in the end user and special license files

  • Same records as Individual and Household SARs but with MUCH more detail

  • Managed by the Census offices

  • Access currently at only a handful of census office sites

  • Virtual microdata laboratory environment; outputs are manually checked prior to release to the user

  • Access only permitted if this is the only available data source, for work in keeping with the aims of the Census Office


Controlled Access Microdata

Individual CAM

  • Individual level file

  • 3% sample (c. 1.8M cases)

  • Local authority geography, with context at lower level

  • Individual year of age to 90+

  • 16 ethnicity categories

  • Over 200 categories of occupation

Household CAM

  • Household hierarchy

  • 1% sample (c. 0.6M cases)

  • Local authority geography, with context at lower level

  • Individual year of age to 90+

  • 16 ethnicity categories

  • Over 200 categories of occupation


Conclusion

  • Have a range of research-worthy datasets by treating different user groups differently

  • Traded off:

    • Safe data

    • Safe users

    • Safe setting

  • http://www.ccsr.ac.uk/sars

