Determination of Administrative Data Quality: Recent results and new developments

Piet J.H. Daas, Saskia J.L. Ossen, and Martijn Tennekes, Statistics Netherlands. May 6, 2010, Helsinki, Finland.


Presentation Transcript


  1. Determination of Administrative Data Quality: Recent results and new developments Piet J.H. Daas, Saskia J.L. Ossen, and Martijn Tennekes Statistics Netherlands May 6, 2010, Helsinki, Finland

  2. Overview • Introduction • View on quality • Framework developed for admin. data sources • Construction and composition • Application (first part) • Checklist and results • New developments • Ideas and future work • BLUE-ETS

  3. Introduction • Statistics Netherlands is increasing its use of data (sources) collected and maintained by others • To reduce response burden and costs • As a result, Statistics Netherlands: • Becomes more dependent on administrative data sources • Must be able to monitor the quality of those data sources • What does ‘quality’ mean in this context?

  4. View on quality • Statistics Netherlands defines the quality of administrative data sources as: “Usability for the production of statistics” • This differs from ‘quality’ as used by the data source keeper • The keeper often does not have statistical use in mind • So the keeper’s quality report (if available) cannot be used • And it concerns the quality of the input!

  5. Framework developed • No standard framework is available for the input quality of administrative data sources • The quality of administrative data is only occasionally discussed in the literature • Most studies on quality and statistics focus on: • output quality • the quality of survey data • Framework for determining the quality of administrative data sources, based on: • Statistics Netherlands’ experiences and ideas • Results published by others

  6. Framework overview (1) • Many quality indicators were identified • 57 in total! • Many dimensions were identified • 19 in total • How to combine and structure these indicators? • Distinguish different views on quality • Also called hyperdimensions • 3 hyperdimensions were required to combine all quality indicators into a single framework! • A first step towards a structured approach

  7. Framework overview (2) • Three high-level views on the input quality of administrative data sources • [Figure: the 3 hyperdimensions of a data source]

  8. Framework overview (2) • Three high-level views on the input quality of administrative data sources • [Figure: 3 different high-level views on quality; the 3 hyperdimensions of a data source]

  9. 3 different high-level views on quality • METADATA: focuses on the (availability of the) information required to understand and use the data in the data source • SOURCE: focus on the data source as a whole; delivery-related aspects; and some other things • DATA: technical checks; accuracy-related issues • [Figure: the METADATA, SOURCE and DATA hyperdimensions of a data source]

  10. Determining Source and Metadata quality • With a checklist • Used for both Source and Metadata • Tested on 8 administrative data sources • Took about 2 hours per data source on average • Results expressed at the dimension level • 5 dimensions for Source, 4 for Metadata

  11. Checklist results (1) - Source • Scores: +, good; o, reasonable; -, poor; ?, unclear • IPA: Insurance Policy records Administration; 1FigHE: coordinated register for Higher Education; SFR: Student Finance Register; 1FigSGE: coordinated register for Secondary General Education; CWI: register of Centre for Work and Income; NCP: National Car Pass register; ERR: Exam Results Register; MBA: Dutch Municipal Base Administration

  12. Checklist results (2) - Metadata • Scores: +, good; o, reasonable; -, poor; ?, unclear • IPA: Insurance Policy records Administration; 1FigHE: coordinated register for Higher Education; SFR: Student Finance Register; 1FigSGE: coordinated register for Secondary General Education; CWI: register of Centre for Work and Income; NCP: National Car Pass register; ERR: Exam Results Register; MBA: Dutch Municipal Base Administration
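The checklist reports each dimension on a four-point scale (+, o, -, ?). The Python sketch below shows one possible way to tally such scores per data source. The source names, the dimension names d1 to d5 and the rule used to call a source "negative" (more '-' than '+' scores) are hypothetical; the presentation does not state how the overall judgement was reached.

```python
from collections import Counter

# Four-point scale used in the checklist result tables.
SCALE = {"+": "good", "o": "reasonable", "-": "poor", "?": "unclear"}


def summarize(scores: dict[str, str]) -> Counter:
    """Count each score symbol over the dimensions of one data source."""
    for dimension, score in scores.items():
        if score not in SCALE:
            raise ValueError(f"unknown score {score!r} for dimension {dimension!r}")
    return Counter(scores.values())


def is_negative(scores: dict[str, str]) -> bool:
    """Hypothetical rule: a source is 'negative' if it has more '-' than '+' scores."""
    counts = summarize(scores)
    return counts["-"] > counts["+"]


# Made-up scores for two hypothetical sources on five hypothetical Source dimensions.
results = {
    "SourceA": {"d1": "+", "d2": "+", "d3": "o", "d4": "+", "d5": "?"},
    "SourceB": {"d1": "-", "d2": "o", "d3": "-", "d4": "+", "d5": "-"},
}

for name, scores in results.items():
    verdict = "negative" if is_negative(scores) else "not negative"
    print(name, dict(summarize(scores)), verdict)
```

Keeping the raw symbols per dimension, rather than a single overall grade, preserves the "?" (unclear) scores, which point to missing knowledge rather than poor quality.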

  13. Overall conclusions • Data sources • CWI is the only data source with a negative score • Tempted to recommend not using it! • Caused by delivery issues and vague definitions • However, it is the only administrative data source that contains educational data on the non-student part of the population! • So solve its weaknesses! • Other data sources • Quite OK (there is always room for improvement) • Data processing by the data source keeper needs attention • Checklist • A good and fairly fast way to assist the user • Provides quality information at a basic but essential level • Not all of this information is commonly known!

  14. What about the Data hyperdimension? • How to study data quality? • A draft list of indicators is available • 10 dimensions and 26 indicators • A structured approach needs to be developed! 1. Data inspection should be efficient 2. Assist the user with scripts/software (where possible) • A checklist?

  15. Overview of data quality approach

  16. Data: Technical checks • Very basic • For raw data • Should be easy and quick • No other information required! • Examples • File size • Number of (unique) units / records received • Metadata compliance (standard for XML files) • Visual checks (data fingerprinting) • 2 examples
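As a minimal sketch of such technical checks, assuming the raw delivery is a UTF-8 CSV file with a header row and a unit identifier column, the Python code below computes the file size, the number of records, the number of unique units and any expected columns missing from the header. The column names are hypothetical, and the header check is only a simple stand-in for metadata compliance (for XML files that would mean validating against the agreed schema).

```python
import csv
import os
import tempfile


def technical_checks(path: str, id_column: str, expected_columns: list[str]) -> dict:
    """Basic checks on a raw CSV delivery that need no information from other sources."""
    size = os.path.getsize(path)
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        if id_column not in header:
            raise ValueError(f"identifier column {id_column!r} not found")
        ids = [row[id_column] for row in reader]
    return {
        "file_size_bytes": size,
        "n_records": len(ids),
        "n_unique_units": len(set(ids)),
        # Simple stand-in for metadata compliance: expected columns absent from the header.
        "missing_columns": [c for c in expected_columns if c not in header],
    }


# Usage: a tiny hypothetical delivery with one duplicated unit.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline="", encoding="utf-8"
) as tmp:
    tmp.write("person_id,birth_year\n1,1980\n2,1975\n2,1975\n")

report = technical_checks(tmp.name, "person_id", ["person_id", "birth_year", "municipality"])
os.remove(tmp.name)
print(report)
```

A difference between the number of records and the number of unique units, as in this example, is exactly the kind of issue these quick checks are meant to surface before any linking is attempted.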

  17. Technical checks: Visualization examples • Missing data • ‘Data fingerprinting’
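The missing-data example can be approximated in text form. The sketch below, assuming records are held as Python dicts and that an empty string or an absent key counts as missing, computes the share of missing values per variable and draws it as a bar of '#' characters. It is not the visualization tool used in the presentation, it does not cover data fingerprinting, and the variable names are hypothetical.

```python
def missing_shares(rows: list[dict], columns: list[str]) -> dict[str, float]:
    """Share of records in which each column is empty or absent."""
    n = len(rows)
    if n == 0:
        return {c: 0.0 for c in columns}
    return {c: sum(1 for r in rows if r.get(c) in (None, "")) / n for c in columns}


def text_bars(shares: dict[str, float], width: int = 20) -> str:
    """Render each share as a bar of '#' characters, a crude stand-in for a plot."""
    label_width = max((len(c) for c in shares), default=0)
    lines = []
    for column, share in shares.items():
        bar = "#" * round(share * width)
        lines.append(f"{column:<{label_width}} |{bar:<{width}}| {share:.0%}")
    return "\n".join(lines)


# Usage with hypothetical records.
rows = [
    {"id": "1", "income": "", "region": "N"},
    {"id": "2", "income": "30000", "region": ""},
    {"id": "3", "income": "", "region": "S"},
    {"id": "4", "income": "41000", "region": "W"},
]
shares = missing_shares(rows, ["id", "income", "region"])
print(text_bars(shares))
```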

  18. Data: Accuracy-related indicators • The first true indicators in the process • Information from other data sources is required • Examples of indicators for units • Overcoverage indicator • Units in the source that do not belong to the NSI population • Undercoverage indicators • Missing units • NSI-population units not in the source • Selectivity • Representativity of the units in the data source compared with the NSI population (RISQ project) • Linkability indicators • Correctly and incorrectly linked units, and selectivity of the linked units
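Under the simplifying assumption that units in the source and in the NSI population share one identifier, the coverage indicators reduce to set differences. The sketch also computes the R-indicator from the RISQ project, R = 1 - 2·S(ρ), where S(ρ) is the standard deviation of the propensities of population units to be present in the source. Here the propensities are taken as given (estimating them, for example with a logistic regression on register variables, is outside the sketch), and an unweighted standard deviation is used. The denominators (source size for overcoverage, population size for undercoverage) and all identifiers are assumptions for illustration.

```python
from statistics import pstdev


def coverage_indicators(source_ids: set, population_ids: set) -> dict[str, float]:
    """Over- and undercoverage rates, assuming both sets use the same unit identifier."""
    over = source_ids - population_ids   # units in the source, not in the NSI population
    under = population_ids - source_ids  # NSI-population units missing from the source
    return {
        "overcoverage_rate": len(over) / len(source_ids) if source_ids else 0.0,
        "undercoverage_rate": len(under) / len(population_ids) if population_ids else 0.0,
    }


def r_indicator(propensities: list[float]) -> float:
    """R = 1 - 2 * S(rho), using the unweighted standard deviation of given propensities."""
    if not propensities:
        raise ValueError("need at least one propensity")
    return 1 - 2 * pstdev(propensities)


# Usage with hypothetical identifiers and propensities.
source = {"A", "B", "C", "X"}
population = {"A", "B", "C", "D", "E"}
coverage = coverage_indicators(source, population)
print(coverage)
print(r_indicator([0.5, 0.5, 0.5, 0.5]))  # equal propensities: no variation, R is maximal
print(r_indicator([0.2, 0.8]))
```

An R-indicator of 1 means all units have the same chance of being in the source; lower values signal that the source over- or underrepresents some groups.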

  19. Data: Output-related indicators • Report data quality at an aggregated level • Quality of the output! • Input quality needs to be linked to output quality • Examples of indicators: • Precision of estimates of core variables • Selectivity of core-variable totals
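As an illustration of the two output-related examples, the sketch below computes the mean of a core variable with its standard error (treating the values as a simple random sample, which is an assumption that rarely holds exactly for administrative data) and the relative difference between a source-based total and an independent reference total, one possible measure of the selectivity of a total. The function names and the figures are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_with_se(values: list[float]) -> tuple[float, float]:
    """Mean of a core variable and its standard error, treating values as a simple random sample."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    return mean(values), stdev(values) / sqrt(len(values))


def relative_total_difference(source_total: float, reference_total: float) -> float:
    """Relative deviation of a source-based total from an independent reference total."""
    if reference_total == 0:
        raise ValueError("reference total must be non-zero")
    return (source_total - reference_total) / reference_total


# Usage with made-up numbers.
m, se = mean_with_se([10, 12, 14, 16])
print(f"mean={m}, se={se:.3f}")
print(relative_total_difference(105.0, 100.0))
```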

  20. How to report data quality? • ‘Quality Report Card’ • Paper or computerized version • Place where all results are combined and presented in an orderly way • Which indicators should always be included? • Is there a basic/minimum set? • Hierarchy of quality indicators • Which indicators can be determined automatically? • Create standardized scripts • Create a software prototype
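A computerized report card could combine indicators from all three hyperdimensions in a fixed order. The Python sketch below, with a hypothetical Indicator record and made-up dimension and indicator names, groups indicators by hyperdimension and marks whether each one can be determined automatically by a standard script. It does not reflect any actual prototype from the presentation.

```python
from dataclasses import dataclass

HYPERDIMENSIONS = ["Source", "Metadata", "Data"]


@dataclass
class Indicator:
    hyperdimension: str  # one of HYPERDIMENSIONS
    dimension: str       # hypothetical dimension name
    name: str            # hypothetical indicator name
    value: str           # checklist score or formatted number
    automated: bool      # can a standard script determine it?


def report_card(indicators: list[Indicator]) -> str:
    """Group indicators by hyperdimension, then sort by dimension and name."""
    lines = ["QUALITY REPORT CARD"]
    for hyper in HYPERDIMENSIONS:
        items = [i for i in indicators if i.hyperdimension == hyper]
        if not items:
            continue
        lines.append(f"[{hyper}]")
        for i in sorted(items, key=lambda x: (x.dimension, x.name)):
            mode = "auto" if i.automated else "manual"
            lines.append(f"  {i.dimension} / {i.name}: {i.value} ({mode})")
    return "\n".join(lines)


card = report_card([
    Indicator("Data", "Technical checks", "Number of unique units", "2", True),
    Indicator("Data", "Accuracy", "Undercoverage rate", "0.40", True),
    Indicator("Source", "Delivery", "Punctuality", "o", False),
])
print(card)
```

Recording the "auto"/"manual" flag per indicator makes it easy to see which part of a basic set could be produced by standardized scripts.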

  21. Future plans • Focus fully on the Data hyperdimension • This is a lot of work! • Study it in a European context • BLUE-Enterprise and Trade Statistics project (BLUE-ETS) • 7th Framework Programme • From 1 April 2010 to 31 March 2013 • One of the topics is the study of admin. data quality • This topic is studied jointly by the NSIs of the Netherlands, Italy, Norway, Slovakia and Sweden

  22. Thank you for your attention! • More details in the Q2010 paper • The checklist can be obtained • From the Statistics Netherlands website • By emailing pjh.daas@cbs.nl to request a copy
