Record matching for census purposes in the Netherlands
Record matching for census purposes in the Netherlands. Eric Schulte Nordholt Senior researcher and project leader of the Census Statistics Netherlands Division Social and Spatial Statistics Department Support and Development Section Research and Development ESLE@CBS.NL


Record matching for census purposes in the Netherlands

Eric Schulte Nordholt

Senior researcher and project leader of the Census

Statistics Netherlands

Division Social and Spatial Statistics

Department Support and Development

Section Research and Development

ESLE@CBS.NL

Joint UNECE/Eurostat Meeting on Population and Housing Censuses in Astana

4-6 June 2007


Contents

  • History of the Dutch Census

  • Data sources

  • Micro linkage

  • Micro integration

  • Social Statistical Database

  • Estimation aspects

  • Statistical confidentiality

  • Conclusions


History of the Dutch Census

  • TRADITIONAL CENSUS

    • Ministry of Home Affairs: 1829, 1839, 1849, 1859, 1869, 1879 and 1889

    • Statistics Netherlands: 1899, 1909, 1920, 1930, 1947, 1960 and 1971

    • Unwillingness (nonresponse) and reduction of expenses → no more traditional censuses

  • ALTERNATIVE: VIRTUAL CENSUS

    • 1981 and 1991: Population Register and surveys

    • Developments in the 1990s: more registers →

    • 2001: integrated set of registers and surveys, SSD


Data sources

  • Registers:

    • Population Register (PR), 16 million records: demographic variables such as sex, age and household status

    • Jobs file: employees, 6.5 million records, and self-employed persons, 790 thousand records: dates of job, branch of economic activity

    • Fiscal administration (FIBASE): jobs, 7.2 million records, and pensions and life insurance benefits, 2.7 million records

    • Social Security administrations, 2 million records: auxiliary information for the integration process

  • Surveys:

    • Survey on Employment and Earnings (SEE), 3 million records: working hours, place of work

    • Labour Force Survey (LFS), 2 years, 230,000 records: education, occupation, (economic) activity


Matching process

  • Registers and datasets are matched to a self-constructed Central Matching File

  • Records are identified by a surrogate identifier (RIN)

  • One unique table links each RIN to a Social Security Number

  • Minimal set of identifying variables

  • Every step in the process is a deterministic match



Matching process

  • Step 1: match on social security number, with a check on date of birth and gender. The match is valid when no more than one of the variables year, month and day of birth and gender differs

  • Else, step 2: match on other variables such as postal code, house number, date of birth and gender. All keys must match

  • Else, step 3: match on social security number without any check on other variables
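The three-step cascade above can be sketched in Python as follows. This is a minimal illustration, not the production code of Statistics Netherlands: the `Person` fields, the step names and the requirement that steps 2 and 3 yield exactly one candidate are assumptions added here to keep every step deterministic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    # Hypothetical record layout; field names are assumptions.
    ssn: Optional[str]  # social security number, may be missing
    birth_year: int
    birth_month: int
    birth_day: int
    gender: str
    postal_code: str
    house_number: str


def check_fields(p: Person) -> tuple:
    """The four variables used to validate a social security number match."""
    return (p.birth_year, p.birth_month, p.birth_day, p.gender)


def address_key(p: Person) -> tuple:
    """Alternative matching key: address plus date of birth and gender."""
    return (p.postal_code, p.house_number) + check_fields(p)


def match_record(record: Person, reference: list) -> tuple:
    """Return (matched reference person or None, name of the step used)."""
    same_ssn = [r for r in reference if record.ssn and r.ssn == record.ssn]

    # Step 1: SSN match, valid if at most one check variable differs.
    for cand in same_ssn:
        differing = sum(
            a != b for a, b in zip(check_fields(record), check_fields(cand))
        )
        if differing <= 1:
            return cand, "ssn_checked"

    # Step 2: all keys must match (assumption: exactly one candidate).
    same_key = [r for r in reference if address_key(r) == address_key(record)]
    if len(same_key) == 1:
        return same_key[0], "address_keys"

    # Step 3: SSN only, no check on other variables (assumption: unique).
    if len(same_ssn) == 1:
        return same_ssn[0], "ssn_unchecked"

    return None, "unmatched"
```

Returning the step name alongside the match makes it possible to report how many records were linked at each level of the cascade, which is a common quality indicator for this kind of linkage.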


[Diagram: Micro data with surrogate identifier. A selection from the Municipal Population Register (year and month of birth, gender, municipality, civil status) and micro data files (employment income, jobs, education, social security, ..) each carry the RIN; a de-identification table relates the direct identifier to the RIN.]

[Diagram: Micro data with surrogate identifier, production environment SN. Labels: Municipal Population Register, Registers, Surveys, Micro data preparation and documentation, Direct Identifier, Surrogate Identifier (RIN), de-identified micro data, Social Statistics Database, Micro data Services.]



Micro integration (1)

The aim of micro integration is:

  • To check the linked data and correct erroneous records,

  • so that the published results are of higher quality than the original sources


Micro integration (2)

To fulfil this demand an integrated process of:

  • data editing,

  • derivation of statistical variables,

  • and imputation

    is executed


Micro integration (3)

Constraints and limitations:

  • Only variables that are to be published are micro integrated

  • Identity rules are necessary, e.g. the same variable in two sources or a relationship between two or more variables in one or more sources

  • No mass imputation
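As an illustration of the simplest kind of identity rule (the same variable observed in two sources), the sketch below compares the values per RIN and reports conflicts. The function name, the data layout and the choice to let the first source take priority are assumptions made for this example; the slides do not say how conflicts are resolved.

```python
def apply_identity_rule(primary: dict, secondary: dict) -> tuple:
    """Identity rule: the same variable should agree in both sources.

    Both arguments map RIN -> observed value. Returns the integrated
    values and the sorted list of RINs where the two sources conflict.
    """
    integrated = {}
    conflicts = []
    for rin in sorted(primary.keys() | secondary.keys()):
        a, b = primary.get(rin), secondary.get(rin)
        if a is not None and b is not None and a != b:
            conflicts.append(rin)
        # Assumption: the primary source wins; the secondary source only
        # supplies observed values where the primary has none.
        integrated[rin] = a if a is not None else b
    return integrated, conflicts


# Hypothetical example: employment status from two sources.
jobs_file = {1: "employee", 2: "self-employed"}
survey = {2: "employee", 3: "employee"}
integrated, conflicts = apply_identity_rule(jobs_file, survey)
```

No values are invented here: a record missing from both sources stays missing, in line with the "no mass imputation" constraint.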


Social Statistical Database (SSD)

  • Social Statistical Database (SSD): Set of integrated microdata files with coherent and detailed demographic and socio-economic data on persons, households, jobs and benefits

  • No remaining internal conflicting information

  • SSD set:

    • Population Register (backbone)

    • Integrated jobs file

    • Integrated file of (social and other) benefits

    • Surveys, e.g. LFS

  • Combining element: RIN-person


Core and satellites (1)

[Diagram: the SSD-core in the centre, surrounded by eight satellites.]


Core and satellites (2)

  • Core:

    • contains only integral register information

    • contains the most important demographic and socio-economic information

    • contains only information that is used in at least two satellites


Core and satellites (3)

  • Satellites are produced in two steps:

    • Copying and deriving the relevant information from the core SSD

    • Adding the unique information on a specific theme from registers and surveys


Conclusions SSD

  • The SSD diminishes the administrative burden

  • The SSD increases

    • The efficiency of statistics production

    • The accuracy of statistical outputs

    • The relevance of social statistics

    • The possibilities for social policy research


Estimation aspects

  • Surveys are samples from the population

  • If surveys are enriched with register information, estimates of the register variables based on the enriched survey will be inconsistent with the counts from the entire register

  • Statistics Netherlands developed the method of consistent and repeated weighting to solve these inconsistencies
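The inconsistency described above, and the basic idea of removing it by adjusting the survey weights, can be illustrated with a small Python sketch. Note the assumptions: the register, the sample and the equal design weights are made up, and the adjustment shown is plain post-stratification on a single register variable. It is a simplified stand-in for the idea of reweighting, not the repeated weighting method of Statistics Netherlands itself.

```python
from collections import Counter

# Hypothetical register: RIN -> gender for the whole population (50 F, 50 M).
register = {rin: ("F" if rin % 2 == 0 else "M") for rin in range(1, 101)}
register_counts = Counter(register.values())

# Hypothetical survey sample of RINs (6 F, 4 M) with equal design weights.
sample = [2, 4, 6, 8, 10, 12, 1, 3, 5, 7]
design_weights = {rin: len(register) / len(sample) for rin in sample}


def weighted_counts(weights: dict) -> dict:
    """Weighted survey estimate of the register variable (gender)."""
    totals = Counter()
    for rin, w in weights.items():
        totals[register[rin]] += w
    return dict(totals)


# Survey estimate with design weights: disagrees with the register counts.
before = weighted_counts(design_weights)

# Post-stratify: scale the weights within each gender group so that the
# weighted survey totals reproduce the register counts.
calibrated = {
    rin: w * register_counts[register[rin]] / before[register[rin]]
    for rin, w in design_weights.items()
}
after = weighted_counts(calibrated)
```

With the design weights the survey estimates 60 women and 40 men, while the register counts 50 of each; after the adjustment the weighted survey totals agree with the register (up to floating-point rounding).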


Statistical confidentiality

[Diagram: administrative sources and household surveys each consist of IDs and variables (characteristics). Identifiers (PINs, sex, date of birth, address) connect them to the persons backbone, which holds the full range of all persons as from 1995.]

IDs in sources are replaced by random Record Identification Numbers (RINs)
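A minimal sketch of this replacement step is given below. The class `RinTable`, the function `de_identify`, the field name `ssn` and the size of the RIN range are all hypothetical; the only properties taken from the slides are that RINs are random and that one table maps each direct identifier to one RIN.

```python
import secrets


class RinTable:
    """Hypothetical de-identification table: direct identifier -> random RIN."""

    def __init__(self):
        self._rin_by_id = {}
        self._used = set()

    def rin_for(self, direct_id: str) -> int:
        """Return the RIN for an identifier, drawing a new one if needed."""
        if direct_id not in self._rin_by_id:
            rin = secrets.randbelow(10**9)
            while rin in self._used:  # redraw on the rare collision
                rin = secrets.randbelow(10**9)
            self._used.add(rin)
            self._rin_by_id[direct_id] = rin
        return self._rin_by_id[direct_id]


def de_identify(records: list, table: RinTable, id_field: str = "ssn") -> list:
    """Copy the records, dropping the direct identifier and adding the RIN."""
    out = []
    for rec in records:
        clean = {k: v for k, v in rec.items() if k != id_field}
        clean["rin"] = table.rin_for(rec[id_field])
        out.append(clean)
    return out


# Hypothetical usage: two sources share one table, so the same person
# receives the same RIN in both and the files remain linkable.
table = RinTable()
jobs = de_identify([{"ssn": "111", "branch": "retail"}], table)
benefits = de_identify([{"ssn": "111", "benefit": "pension"}], table)
```

Because the RIN is random rather than derived from the identifier, it carries no information about the person; linkage across files remains possible only through the table, which can be kept separate from the micro data.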


Conclusions

  • Matching is relatively cheap

  • Matching is relatively quick (short production time)

  • Micro integration remains important

  • The SSD has found its place in the organisation

  • Repeated weighting method guarantees consistent estimates

  • Statistical confidentiality aspects have become very important


Time for questions and discussion

Thank you for your attention!
