
All the answers? Statistics New Zealand’s Integrated Data Infrastructure

Paper by Felibel Zabala, Rodney Jer, Jamas Enright and Allyson Seyb. Presented by Felibel Zabala, Sept 2012.


Presentation Transcript


  1. All the answers? Statistics New Zealand’s Integrated Data Infrastructure • Paper by Felibel Zabala, Rodney Jer, Jamas Enright and Allyson Seyb • Presented by Felibel Zabala • Sept 2012

  2. Statistics New Zealand’s Integrated Data Infrastructure (IDI) • Merges data from different suppliers, including Statistics NZ • Data quality varies both within and between the datasets

  3. Statistics New Zealand’s Integrated Data Infrastructure (IDI) • Linking clean datasets is not easy; linking datasets of variable quality is much more difficult • Importance of an effective and efficient editing strategy

  4. Main objective • Present some of the issues in, and solutions for, linked administrative datasets, with a focus on one of Statistics NZ’s first integrated datasets, the Linked Employer-Employee Data (LEED)

  5. LEED • Provides the backbone of the IDI prototype • Links longitudinal business data from Statistics NZ’s Business Frame to a longitudinal series of payroll tax data from Inland Revenue (IRD) • Used to produce quarterly statistics that measure labour market dynamics at various levels, eg filled jobs, worker flows, and total earnings

  6. LEED payroll data • Collected from employers for New Zealand’s taxation system through IRD’s Employer Monthly Schedule (EMS) • Information available from EMS: • employer/employee name and IRD number • taxable earnings for work performed, taxed at source • tax deductions (pay-as-you-earn or PAYE, withholding tax, child support payment, student loan indicator amount) • start and finish dates of employment

  7. LEED – additional details • Also includes payments made to beneficiaries by the government • Contains a subset of the self-employed

  8. LEED – additional details (cont’d) • Collection unit – the legal entity that files the EMS return • Statistical unit (the ‘employer’ in LEED) – the geographical or physical location of the business

  9. Methods of integration in LEED • Figure 1. Unit record links in LEED

  10. Linking employer to enterprise • Figure 1. Unit record links in LEED

  11. Linking employer longitudinally • Figure 1. Unit record links in LEED

  12. Linking enterprise and geo longitudinally • Figure 1. Unit record links in LEED

  13. Linking employee longitudinally • Figure 1. Unit record links in LEED

  14. Variables edited in LEED • IRD numbers • Gross earnings • Date of birth • Sex • Workplace of an employee • Start and end dates of employment • Editing strategy: Do not replace any IRD data unless there is strong evidence it is an error

  15. Variables edited in LEED (cont’d) • IRD numbers • Imputation of sex • Imputation of start and end dates of employment

  16. Variables edited in LEED (cont’d) • Gross earnings: presence of systematic errors; detection method: ratio edit on PAYE/gross earnings; imputation method • Date of birth: presence of systematic errors; detection method: edit rules that check an employee’s age against certain events; imputation method
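The two detection methods on this slide can be sketched as simple checks. This is only an illustration, not the LEED implementation: the ratio bounds, the age limits, the choice of job start as the "event", and all function names are assumptions introduced here.

```python
from datetime import date

# Hypothetical plausibility bounds for PAYE / gross earnings; not from the source.
RATIO_LOW = 0.05
RATIO_HIGH = 0.45


def ratio_edit(paye, gross, low=RATIO_LOW, high=RATIO_HIGH):
    """Return True if the record passes the ratio edit, False if it is flagged."""
    if gross <= 0:
        # A ratio cannot be formed, so the record is flagged for review.
        return False
    return low <= paye / gross <= high


def age_at(event_date, date_of_birth):
    """Age in completed years at event_date."""
    years = event_date.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the event year.
    if (event_date.month, event_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def dob_edit(date_of_birth, job_start, min_age=12, max_age=100):
    """Return True if the age at job start lies in the assumed plausible range."""
    return min_age <= age_at(job_start, date_of_birth) <= max_age
```

A record with PAYE of 2000 on gross earnings of 1000 fails `ratio_edit`, and a date of birth two years before a job start fails `dob_edit`. Flagged records would then go to whatever imputation method is in use, which the slide does not describe.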

  17. Variables edited in LEED (cont’d) • Imputation of workplace of an employee • Uses the transportation method, where the imputed workplace of an employee is the geo that minimises the distance from the employee’s home address to the geo, subject to the constraints that • each employee is assigned to a geo and • the total number of employees allocated to a geo equals the number of employees expected at that geo
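A toy version of this allocation is sketched below. It assumes that the objective is the total home-to-geo distance over all employees, that locations are planar coordinates, and that the data are small enough for exhaustive search. The function and parameter names are hypothetical, and a production system would use a linear-programming or min-cost-flow solver instead of brute force.

```python
from itertools import product
from math import hypot


def allocate_workplaces(homes, geos, expected):
    """Assign each employee to one geo, minimising total home-to-geo distance.

    homes:    {employee_id: (x, y)} home coordinates (hypothetical planar units)
    geos:     {geo_id: (x, y)} workplace coordinates
    expected: {geo_id: number of employees expected at that geo}
    """
    if sum(expected.values()) != len(homes):
        raise ValueError("expected head counts must sum to the number of employees")

    employees = sorted(homes)
    geo_ids = sorted(geos)
    best_cost, best_plan = None, None

    # Exhaustive search: try every way of giving each employee exactly one geo.
    for choice in product(geo_ids, repeat=len(employees)):
        # Constraint: each geo receives exactly its expected head count.
        if any(choice.count(g) != expected.get(g, 0) for g in geo_ids):
            continue
        cost = sum(
            hypot(homes[e][0] - geos[g][0], homes[e][1] - geos[g][1])
            for e, g in zip(employees, choice)
        )
        if best_cost is None or cost < best_cost:
            best_cost, best_plan = cost, dict(zip(employees, choice))
    return best_plan, best_cost
```

With homes at (0, 0), (1, 0) and (9, 0), geos A at (0, 0) and B at (10, 0), and expected counts of one for A and two for B, the second employee is allocated to B even though A is closer, because the head-count constraint must hold.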

  18. The IDI prototype • Datasets linked to LEED: • Benefit data • Tertiary education data • Administrative tertiary education data and student loans and allowances data • Statistics NZ’s Household Labour Force Survey (HLFS) and its supplementary surveys

  19. The IDI prototype (cont’d) • Other linked dataset in the IDI: the Longitudinal Business Database (LBD) prototype, which includes information on business demographics, financial data, employment, goods exports, government assistance, and management practices

  20. The IDI prototype (cont’d) • Figure 2. Linking in the IDI prototype

  21. Issues in linking in the IDI • Lack of a common identifier across datasets • Main variables in the Central Linking Concordance (CLC): IRD numbers, passport numbers, and student IDs, where available • Use of demographic variables as partial identifiers
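One way to picture linking without a common identifier is a rule that uses an identifier when both records carry one and otherwise falls back on agreement between demographic variables. The sketch below is a simplified deterministic rule under assumed field names and an assumed agreement threshold. It is not the CLC's actual method, and production linkage commonly uses probabilistic scoring instead.

```python
# Identifier fields checked in order; the field names are hypothetical.
ID_FIELDS = ("ird_number", "passport_number", "student_id")
# Demographic fields used as partial identifiers when no identifier is shared.
DEMOGRAPHIC_FIELDS = ("surname", "first_name", "date_of_birth", "sex")


def records_link(a, b, min_agreement=4):
    """Return True if two records (dicts) are judged to refer to the same person."""
    for field in ID_FIELDS:
        if a.get(field) and b.get(field):
            # The first identifier present on both records decides the link.
            return a[field] == b[field]
    # Fallback: count demographic fields that are present and agree.
    agreements = sum(
        1 for f in DEMOGRAPHIC_FIELDS
        if a.get(f) is not None and a.get(f) == b.get(f)
    )
    return agreements >= min_agreement
```

Two records that share an IRD number link even if their surnames are spelled differently. A record with only a student ID and one with only a passport number link only if all four demographic fields agree.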

  22. Issues in linking in the IDI (cont’d) • Need for standard software for automated data linkage that is robust to data changes • Timing of receipt of data

  23. Editing strategy in the IDI • Focus on ensuring high-quality linking variables are used in linking. Examples: • Validity rules were used to edit names across data sources • Sex and date of birth are reformatted to ensure common coding is used across data sources • Where inconsistencies occur in records linked from two different data sources, it is important to know which of the two data sources is more reliable
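Reformatting sex and date of birth to a common coding might look like the sketch below. The code mappings and date layouts are assumptions made for illustration, since the slide does not list the codings used by each data source.

```python
from datetime import datetime

# Hypothetical mapping of source-specific sex codes to one common coding.
SEX_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "2": "F"}
# Date layouts assumed to occur across sources, tried in order.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def standardise_sex(value):
    """Map a raw sex code to 'M' or 'F'; return None if it is not recognised."""
    if value is None:
        return None
    return SEX_CODES.get(str(value).strip().lower())


def standardise_dob(value):
    """Parse a raw date of birth into ISO format (YYYY-MM-DD), or None if invalid."""
    if value is None:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(str(value).strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Values that cannot be mapped come back as `None` rather than being guessed, so they can be treated as missing when the records are linked.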

  24. Editing strategy in the IDI (cont’d) • Process to resolve inconsistencies in personal details • Most common value present in the datasets should be kept • Prioritise the data sources to determine the order of retaining their values
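The two rules on this slide can be combined in more than one way. The sketch below assumes that the most common value wins and that source priority is used only to break ties. The source names and their ordering are hypothetical.

```python
from collections import Counter

# Hypothetical source names, ordered from most to least trusted.
SOURCE_PRIORITY = ("ird", "education", "benefit")


def resolve_value(values_by_source, priority=SOURCE_PRIORITY):
    """Pick one value for a personal detail reported by several sources.

    values_by_source: {source_name: value}; missing values are given as None.
    """
    reported = {s: v for s, v in values_by_source.items() if v is not None}
    if not reported:
        return None
    counts = Counter(reported.values())
    top = max(counts.values())
    candidates = {v for v, n in counts.items() if n == top}
    if len(candidates) == 1:
        # A single most common value exists, so keep it.
        return candidates.pop()
    # Tie: keep the candidate reported by the highest-priority source.
    for source in priority:
        if reported.get(source) in candidates:
            return reported[source]
    # No tied candidate comes from a prioritised source; leave unresolved.
    return None
```

Under this ordering, a value reported by two lower-priority sources outvotes a different value from the top-priority source. Swapping the two rules would give the opposite result, so the order is a design decision that has to be made explicitly.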

  25. Editing strategy in the IDI (cont’d) • Editing strategy should be able to: • Edit inconsistencies for the same unit across different sources • Treat erroneous and missing variables in a record • Ensure consistency in variables across a record for a time period and over time

  26. Next steps • Build the IDI, with a focus on improving the linking methodology • Determine standard quality measures for outputs produced using administrative data

  27. Next steps (cont’d) • Redevelopment of the LEED and student loans and allowances (SLA) systems • Investigate the use of geospatial information to improve the employee allocation method • Review of the editing of gross earnings • Investigate the use of Banff (Statistics Canada’s edit and imputation system)
