Shortcomings of Census Interaction Data


Presentation Transcript


  1. Shortcomings of Census Interaction Data Oliver Duke-Williams o.w.duke-williams@leeds.ac.uk

  2. Shortcomings • Overall data quality • Statistical Disclosure Control • Variant geographies • Lack of comparability over time

  3. Overall data quality • Generic issues • Unit non-response • Item non-response • Interaction data issues • Problems of address recall for migration data • Problems of address accuracy for workplace data • Changing concept of usual residence

  4. Non-response • Unit non-response – under-enumeration – is a problem for all Census data • It particularly affects migration data • Migrants are 2-10 times more likely to be missed from a Census than residents who have not moved – Simpson & Middleton (1997) • Item non-response refers to those people who have completed a Census form, but not answered a specific question

  5. Patterns of non-response: 2001 • Address one year ago, non-response quantiles

  6. Patterns of non-response: 2001 • Workplace postcode, non-response quantiles

  7. Patterns of non-response: 2001 • Method of travel, non-response quantiles

  8. Item non-response • Various possibilities for former residence and workplace addresses • Address correct but no postcode • Part postcode given (e.g. ‘LS1’) • No information given • The 1991 interaction data included the categories ‘address not stated’ and ‘workplace not stated’

  9. Migrant origin not stated • Migrants with origin unstated as % of total inflow, 1990-91 • Limited spatial patterns • Significant numbers for most districts

  10. Item non-response • In 2001, unknown or incomplete addresses were imputed using donor records • First, select possible donors on the basis of predictive variables • SWS: Industry, occupation, establishment size, mode of transport • SMS: Other migrants in household, country of birth, marital status • Use partial information if available • Then, select geographically nearest donor
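The two-stage donor selection described on slide 10 can be sketched as follows. This is a minimal illustration only, not the procedure used by the Census offices: the `Record` fields, the choice of just two predictive variables, and the use of straight-line distance between residences are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional


@dataclass
class Record:
    # All field names are hypothetical; the real records hold many more variables.
    industry: str
    mode: str                      # mode of transport
    x: float                       # residence easting
    y: float                       # residence northing
    workplace: Optional[str]       # full workplace postcode, or None if not stated
    partial: Optional[str] = None  # partial postcode if one was given, e.g. "LS1"


def impute_workplace(target: Record, candidates: List[Record]) -> Optional[str]:
    """Return a donor's workplace for a record whose workplace is missing."""
    # Stage 1: keep possible donors that match on the predictive variables.
    donors = [
        d for d in candidates
        if d.workplace is not None
        and d.industry == target.industry
        and d.mode == target.mode
    ]
    # Use partial information if available: restrict donors to that postcode area.
    if target.partial:
        donors = [d for d in donors if d.workplace.startswith(target.partial)]
    if not donors:
        return None
    # Stage 2: choose the geographically nearest remaining donor.
    nearest = min(donors, key=lambda d: hypot(d.x - target.x, d.y - target.y))
    return nearest.workplace
```

A record with a partial postcode is matched only against donors in that postcode area, so the nearest donor overall is not necessarily the one chosen.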

  11. Shortcomings • Overall data quality • Statistical Disclosure Control • Variant geographies • Lack of comparability over time

  12. Statistical Disclosure Control • Methods applied to interaction data • 1981 • 1991 • 2001

  13. SDC: 1981 • Workplace data – based on 10% sample, therefore no further modification required • Migration data • Set 1 • Within ward • Ward to rest of district (for flows > 25 persons) or ward to rest of county etc. • Set 2 • Ward level, total males and females only

  14. SDC: 1991 • Workplace data – based on 10% sample, therefore no further modification required • Migration data • Suppression applied to some tables

  15. SDC: 1991 – SMS • Set 1: Flows within and between wards • Set 2: Flows within and between districts

  16. SDC: 1991 – SMS Set 2

  17. Extent of suppression • [Figure: per-district totals; panels: Greater London, Metro counties, Other counties (sorted alphabetically)] • Districts are grouped by county • Shading: • Red: Total migrants >= 10 • Blue: Total migrants 0 < n < 10 • White: Total migrants = 0

  18. Effect of suppression • White migrants, 1990-91 • Published value as % of estimated correct value

  19. Effect of suppression • Black migrants, 1990-91 • Published value as % of estimated correct value

  20. Effect of suppression • Indian, Pakistani and Bangladeshi migrants, 1990-91 • Published value as % of estimated correct value

  21. Effect of suppression • Chinese and other migrants, 1990-91 • Published value as % of estimated correct value

  22. Effect of suppression • Mis-reporting of largest non-white migrant group

  23. Coping with problems - 1991 • Under-enumeration • Suppression

  24. The MIGPOP data set

  25. The MIGPOP data set • Produced by Simpson and Middleton (1999) • Available from CIDER through WICID • Allows for • ‘Missing million’ • Under-reporting of migrants • Migrants with unknown origin • Contains one age by sex table

  26. Suppression • Migration from Mid-Bedfordshire to Avon, 1990-91

  27. SMSGAPS • SMSGAPS dataset incorporates recovered and estimated data for most suppressed tables • Produced by Rees and Duke-Williams (1997) • Contains versions of all SMS Set 2 tables except 11S and 11W • Available from CIDER through WICID

  28. SDC: 2001 • Outputs of the 2001 Census were subject to Small Cell Adjustment Methodology • Initial version of cross-tabulation produced from raw data • ‘Small values’ were then modified • Sub-totals and totals for each table were then recalculated from the modified values
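The three steps on slide 28 can be sketched as below. The adjustment rule itself is an assumption: the code treats counts of 1 and 2 as the ‘small values’ and replaces each with 0 or 3 so that the expected value is unchanged. That rule is consistent with the excess of multiples of 3 noted later in the deck, but the parameters actually used for the 2001 outputs are not given in these slides. The function names are hypothetical.

```python
import random


def adjust_cell(value, rng):
    """Assumed rule: a count of 1 or 2 becomes 3 with probability value/3, else 0."""
    if value in (1, 2):
        return 3 if rng.random() < value / 3 else 0
    return value


def scam_table(interior, rng):
    """Adjust the interior cells, then recalculate all totals from the adjusted cells."""
    adjusted = [[adjust_cell(v, rng) for v in row] for row in interior]
    row_totals = [sum(row) for row in adjusted]
    col_totals = [sum(col) for col in zip(*adjusted)]
    grand_total = sum(row_totals)
    return adjusted, row_totals, col_totals, grand_total


# Example: a raw cross-tabulation with several small interior values.
raw = [[1, 2, 0],
       [5, 1, 2]]
adjusted, rows, cols, total = scam_table(raw, random.Random(2001))
```

Because totals are rebuilt from the adjusted cells, every table remains internally additive, but the same total can differ between tables that partition it in different ways.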

  29. SCAM example

  30. SCAM example

  31. SCAM example ? ? ?

  32. SCAM example

  33. SCAM example

  34. SCAM • SCAM was applied differentially across the UK • This is particularly confusing for the interaction data, as they are explicitly presented as a UK-level data set • SCAM was applied on the basis of where the data were collected • Migration data were collected at the destination • Flows with destinations in England, Wales and Northern Ireland were subject to SCAM • Workplace data were collected at the residence (origin) • Flows with origins in England, Wales and Northern Ireland were subject to SCAM • In addition, OA level workplace data with origins in Scotland were subject to SCAM • OA level workplace data were not published for Northern Ireland

  35. Effects of SCAM • Interaction data are characterised by: • Sparse matrices • Dominance of small values • 2001 data characterised by over-reporting of multiples of 3 • [Figures: Frequency of flow totals, 2001 SMS Table MG301; Frequency of flow totals, 2001 SMS Table MG301: detail; Frequency of flow totals, 2001 SWS Table W301: detail]

  36. 2001 data and multiples of 3 • It is the interior cells that are modified • Flow totals are re-calculated from these modified values

  37. Contribution of interior cells to SCAM adjustment of MG301

  38. Coping with problems: 2001 • Tactics for using SCAM affected data • Use average values? • Useful in some situations, but could lead to errors if rates are calculated • Use minimum number of cells to calculate required value
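The advice to use the minimum number of cells can be motivated with a small calculation. The sketch below assumes a rule in which counts of 1 and 2 are replaced by 0 or 3 without bias, and assumes that cells are adjusted independently; neither assumption is confirmed by the slides, and the function names are hypothetical. Under those assumptions each small cell contributes an error variance of exactly 2, so the uncertainty of a derived value grows with the number of small cells added together.

```python
from fractions import Fraction


def cell_error_variance(value):
    """Variance of (adjusted - true) for one cell under the assumed 0-or-3 rule."""
    if value not in (1, 2):
        return Fraction(0)           # larger values and zeros are left unchanged
    p_up = Fraction(value, 3)        # probability of being replaced by 3
    up_error = 3 - value             # error if replaced by 3
    down_error = 0 - value           # error if replaced by 0
    return p_up * up_error ** 2 + (1 - p_up) * down_error ** 2


def sum_error_variance(cells):
    """Variance of the error in a sum of independently adjusted cells."""
    return sum((cell_error_variance(v) for v in cells), Fraction(0))


# The same true value of 9 built from one cell or from six small cells.
one_cell = sum_error_variance([9])
six_cells = sum_error_variance([1, 2, 2, 1, 2, 1])
```

Here `one_cell` is 0 and `six_cells` is 12, which is why a value taken from a single larger cell is preferable to the same value assembled from many small ones.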

  39. Shortcomings • Overall data quality • Statistical Disclosure Control • Variant geographies • Lack of comparability over time

  40. Variant geographies • Changes between Censuses • A problem that is common across all Census outputs • Differences compared to other Census products • Problems specific to the interaction data, in particular the 2001 data

  41. Differences between Census products • The 2001 interaction data have geographies that do not always match those in the other aggregate data • Level 1: Output Areas • Interaction data are the same as other outputs • Level 2: ‘Wards’ • Interaction data are an amalgam of • CAS wards in England and Wales • ST wards in Scotland • Standard wards in Northern Ireland • Level 3: ‘Districts’ • Interaction data are an amalgam of • London boroughs, metro and other districts, Unitary authorities, Scottish Council Areas • Parliamentary constituencies in Northern Ireland

  42. Problems of different geographies • When mapping data, correct boundary sets are time consuming to assemble • When constructing rates, correct denominators are time consuming to gather • Not all area data are easily available for all of these geographies

  43. Shortcomings • Overall data quality • Statistical Disclosure Control • Variant geographies • Lack of comparability over time

  44. Lack of comparability over time • As well as changes in geography, there are significant changes in data structure over time • General issues • Changes in population base, inclusion of students etc. • Handling of unknown migrant origins or workplace locations • Migration data • Handling of overseas origins • Use of ‘no usual residence’ • Workplace data • Handling of off-shore workers • Handling of home-workers

  45. No usual residence in 2001 migration data • Mean: 6.9% • Minimum: 3.7% - Ribble Valley • Maximum: 19% - Newham • 19/20 districts with highest levels are in London • Percentage of all migrants 2000-1, by district, who had ‘no usual residence’ one year prior to the Census

  46. Home-workers • 1981 – Workplace at home is part of general ‘within ward’ flow • Home-workers can only be distinguished from others in the ‘mode of transport’ table • 1991 – Workplace at home is a distinct workplace location • All tables can be extracted separately for home-workers • 2001 – Workplace at home is part of general ‘within ward’ flow • Home-workers can only be distinguished from others in the ‘mode of transport’ table

  47. Coping with compatibility issues • Various data sets exist that attempt to bridge some of these gaps • Re-estimate for newer geographies • e.g. 1981 data on 1991 and 2001 boundaries (Boyle and Feng, 2002) • Create hybrid sets • e.g. merge home-workers into main flow for 1991 • Create best-fit geographies that span time periods • e.g. CIDS common geographies
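Re-estimating flows for a newer geography can be illustrated with simple proportional apportionment. This is a generic sketch and not the method of Boyle and Feng (2002): it assumes a hypothetical lookup giving the share of each old zone that falls in each new zone, and it allocates origins and destinations independently.

```python
from collections import defaultdict


def reallocate_flows(flows, weights):
    """Apportion origin-destination flows from old zones to new zones.

    flows:   {(old_origin, old_destination): count}
    weights: {old_zone: {new_zone: share}}, with the shares for each old zone summing to 1
    """
    out = defaultdict(float)
    for (origin, dest), count in flows.items():
        for new_origin, w_origin in weights[origin].items():
            for new_dest, w_dest in weights[dest].items():
                out[(new_origin, new_dest)] += count * w_origin * w_dest
    return dict(out)


# Old zone A is split equally between new zones X and Y; old zone B maps wholly to Z.
flows = {("A", "B"): 10, ("A", "A"): 4}
weights = {"A": {"X": 0.5, "Y": 0.5}, "B": {"Z": 1.0}}
estimated = reallocate_flows(flows, weights)
```

The total flow is preserved, but within-zone flows in a split zone are spread across all pairs of its successor zones, which is one reason purpose-built estimates are preferable to naive apportionment.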

  48. Summary • The interaction data suffer from problems related to • Disclosure control modifications • Changes over time • Awkward geographies in 2001 • These have been addressed by • Estimated and re-worked data sets • Data estimated for different boundary sets

  49. References • Boyle, P.J. and Feng, Z. (2002) A method for integrating the 1981 and 1991 GB Census interaction data, Computers, Environment and Urban Systems, 26: 241-56 • Rees, P.H. and Duke-Williams, O. (1997) Methods for estimating missing data on migrants in the 1991 British Census, International Journal of Population Geography, 3: 323-368 • Simpson, S. and Middleton, E. (1997) Who is missed by a national Census? A review of empirical results from Australia, Britain, Canada and the USA, CCSR Working Paper No 2, Centre for Census and Survey Research, University of Manchester • Simpson, S. and Middleton, E. (1999) Undercount of migration in the UK 1991 Census and its impact on counterurbanisation and population projections, International Journal of Population Geography, 5: 387-405
