
Model of transformation administrative data to statistical data

Model of transformation administrative data to statistical data. Data used in Population and Housing Census 2011 – examples. Janusz Dygaszewicz and Paweł Murawski, Central Statistical Office, POLAND. Purpose of the work on administrative sources. Data quality. Extract data


Presentation Transcript


  1. Model of transformation administrative data to statistical data. Data used in Population and Housing Census 2011 – examples. Janusz Dygaszewicz and Paweł Murawski, Central Statistical Office, POLAND

  2. Outline: • Purpose of the work on administrative sources • Data quality • Extract data • Transform data • Summary

  3. Registers - data acquisition

  4. Purpose of the work on administrative data: obtaining a sufficiently complete data set (subjective and objective completeness), corresponding to classification standards, definitions and basic categories, and thus the effective use of administrative data

  5. Data quality: measures. Measuring the quality of administrative registers: • timeliness of data • methodological compatibility • completeness • identification standards used in the register • usefulness • compatibility of data in administrative sources with data obtained in the study/survey. Measuring the quality in the processing of register data: • excessive coverage error rate • incomplete coverage error rate • subjective indicator of completeness • objective indicator of completeness • imputation rate • data correction index • index of data integration from various sources
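
As an illustration of the processing measures listed above, here is a minimal Python sketch of the two coverage error rates and the imputation rate. The slide names the measures but gives no formulas, so the definitions used here are assumptions: excessive coverage is taken as the share of register units absent from a reference population, incomplete coverage as the share of reference units missing from the register, and the imputation rate as the share of values flagged as imputed. The function names are hypothetical.

```python
def coverage_rates(register_ids, reference_ids):
    """Return (excessive, incomplete) coverage error rates as fractions.

    Assumed definitions (not given on the slide):
    excessive  = register units not in the reference population / register size
    incomplete = reference units missing from the register / reference size
    """
    register_ids, reference_ids = set(register_ids), set(reference_ids)
    if not register_ids or not reference_ids:
        raise ValueError("both ID collections must be non-empty")
    excessive = len(register_ids - reference_ids) / len(register_ids)
    incomplete = len(reference_ids - register_ids) / len(reference_ids)
    return excessive, incomplete


def imputation_rate(imputed_flags):
    """Return the share of values flagged as imputed (assumed definition)."""
    flags = list(imputed_flags)
    if not flags:
        raise ValueError("no values to evaluate")
    return sum(1 for flag in flags if flag) / len(flags)
```

For example, `coverage_rates([1, 2, 3, 4], [2, 3, 4, 5, 6])` treats unit 1 as excess coverage and units 5 and 6 as missing coverage.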

  6. Extract data: • consolidation of data from various source systems with different data formats, • extraction of data into the production environment based on the SAS software, • conversion of data into one format that is suitable for processing (SAS tables), • validation of the imported data structure is an integral part of this process.
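
The extraction step described above can be sketched as follows. The original process uses SAS software and SAS tables; this sketch swaps in plain Python purely for illustration, with a list of dictionaries standing in for the single target format. The column names, the two source formats (CSV and JSON) and all function names are hypothetical.

```python
import csv
import io
import json

# Hypothetical target structure; the real structure is not given in the slides.
EXPECTED_COLUMNS = {"person_id", "birth_date", "sex"}


def read_csv_source(text, delimiter=";"):
    """Read delimited text into a list of row dictionaries."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text), delimiter=delimiter)]


def read_json_source(text):
    """Read a JSON array of objects into a list of row dictionaries."""
    return [dict(row) for row in json.loads(text)]


def validate_structure(rows, expected=EXPECTED_COLUMNS):
    """Raise ValueError if any row's columns differ from the expected set."""
    for index, row in enumerate(rows):
        difference = set(row) ^ set(expected)
        if difference:
            names = sorted(difference, key=str)
            raise ValueError(f"row {index}: unexpected or missing columns {names}")
    return rows


def extract(sources):
    """Consolidate (format_name, text) sources into one validated table."""
    readers = {"csv": read_csv_source, "json": read_json_source}
    table = []
    for format_name, text in sources:
        table.extend(validate_structure(readers[format_name](text)))
    return table
```

Validating each source before it is appended keeps structural problems tied to the source that caused them, which matches the slide's point that validation is part of extraction rather than a later step.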

  7. Extract data: examples

  8. Transform data: data processing in the production environment, consisting of: • profiling (creating a report on the data quality), • unification/standardization of data, • parsing (separation) or combining variables, • standardization with schemes, • conversion, • validation, • deduplication, • data integration.

  9. Transform data: profiling
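
The content of this slide did not survive in the transcript. As a rough illustration of profiling in the sense of creating a report on data quality, here is a minimal Python sketch that counts rows, missing values and distinct values per column. Which statistics the actual report contains is not stated in the source, so the selection here is an assumption, as are the function and field names.

```python
from collections import Counter


def profile(rows, columns):
    """Return a per-column quality report for a list of row dictionaries.

    Treats None and the empty string as missing (an assumption).
    """
    report = {}
    for column in columns:
        values = [row.get(column) for row in rows]
        present = [value for value in values if value not in (None, "")]
        report[column] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "most_common": Counter(present).most_common(1),
        }
    return report
```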

  10. Transform data: standardization and parsing, examples

  11. Transform data: schemes, examples

  12. Transform data: examples, data cleaning report

  13. Transform data: conversion of gender variables

  14. Transform data: conversion of the marital status variable • 1 – bachelor, 3 – married (M) • 1 – bachelor, 3 – married (M) • 502 – bachelor, 503 – married (M) • KWR – bachelor, ZNY – married (M)
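
The conversion shown on this slide amounts to mapping each register's own coding onto one common coding. A minimal Python sketch follows, assuming the target coding is 1 for bachelor and 3 for married (M). The slide does not say which coding is the target or which register uses which codes, so the register names "A", "B" and "C" and the function name are hypothetical.

```python
# Source codes come from the slide; register names and the choice of
# target coding ("1" = bachelor, "3" = married (M)) are assumptions.
MARITAL_STATUS_MAPS = {
    "A": {"1": "1", "3": "3"},
    "B": {"502": "1", "503": "3"},
    "C": {"KWR": "1", "ZNY": "3"},
}


def convert_marital_status(register, code):
    """Return the target code, or None when the source code is unknown."""
    normalized = str(code).strip().upper()
    return MARITAL_STATUS_MAPS[register].get(normalized)
```

Returning `None` for unknown codes leaves the decision about such records to the validation step instead of guessing a value.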

  15. Transform data: validation • checking the data, • correcting abnormal values according to the algorithms prepared by methodologists, • where necessary, excluding from further processing the records that cannot be corrected.
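
The three validation outcomes above (accept, correct, exclude) can be sketched in Python as follows. The actual correction algorithms were prepared by methodologists and are not given in the slides, so the rule used here, a `birth_year` field that must be an integer between 1900 and 2011, is purely hypothetical, as are the function names.

```python
def validate_birth_year(record, min_year=1900, max_year=2011):
    """Return (record, status) with status 'ok', 'corrected' or 'excluded'.

    Hypothetical rule: values that parse to an integer within the range are
    kept (and normalized to int); everything else cannot be corrected.
    """
    raw = record.get("birth_year")
    try:
        year = int(str(raw).strip())
    except ValueError:
        return record, "excluded"
    if not min_year <= year <= max_year:
        return record, "excluded"
    status = "ok" if raw == year else "corrected"
    return {**record, "birth_year": year}, status


def validate(records):
    """Split records into those kept for processing and those excluded."""
    kept, excluded = [], []
    for record in records:
        checked, status = validate_birth_year(record)
        (excluded if status == "excluded" else kept).append(checked)
    return kept, excluded
```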

  16. Transform data: deduplication • removal of repeated units, • requires detailed analysis, including analysis of legal acts, • individual for each register, • result of deduplication: one record with all the possible and unique information.
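
The stated result of deduplication, one record carrying all the available unique information, can be illustrated with a minimal Python sketch. As the slide notes, real deduplication rules are specific to each register; the generic rule assumed here (records sharing a key are merged, and the first non-missing value of each field wins) and the field names are hypothetical.

```python
def deduplicate(records, key="person_id"):
    """Merge records that share the same key into a single record.

    Assumed rule: for each field, keep the first non-missing value seen;
    None and the empty string count as missing.
    """
    merged = {}
    for record in records:
        target = merged.setdefault(record[key], {})
        for field, value in record.items():
            if value not in (None, "") and target.get(field) in (None, ""):
                target[field] = value
    return list(merged.values())
```

With this rule, two records for the same person, one with an empty `city` and one with a filled `city`, collapse into one record that keeps the filled value.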

  17. Transform data: example of the deduplication process

  18. Transform data: data integration • the process of selecting the best, most current and correct value from several or a dozen registers, • used to create a statistical register, which will be available for use by analysts.

  19. Transform data: integration process, scheme. [Diagram labels: A Register, B Register, C Register; register of reference; data integration; one ID linking multiple identifiers; alternative linking keys; selecting algorithms; selecting the best values; data completeness; statistical register]

  20. Transform data: data integration, example of an algorithm. [Flowchart labels: kraj_ur_kod_KEP # not null; msce_ur_kod_POBYT # not null; kraj_ur_kod_GZM # not null; select kraj_ur_kod_KEP; select kaj_ur_kod_POBYT; select kraj_ur_kod_GZM; Kraj_ur_kod; FALSE]
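
The flowchart on this slide survives only as a list of labels, but it appears to describe a chain of "not null" tests, each followed by a selection. A minimal Python sketch under that reading follows. The variable names are copied as they appear in the transcript; the order of the tests, the pairing of each test with a selected variable, and returning `None` on the FALSE branch are all assumptions.

```python
# (tested variable, selected variable) pairs; names as in the transcript,
# order and pairing are assumptions reconstructed from the label list.
STEPS = (
    ("kraj_ur_kod_KEP", "kraj_ur_kod_KEP"),
    ("msce_ur_kod_POBYT", "kaj_ur_kod_POBYT"),
    ("kraj_ur_kod_GZM", "kraj_ur_kod_GZM"),
)


def select_kraj_ur_kod(record, steps=STEPS):
    """Return the value selected by the first test that passes, else None."""
    for tested, selected in steps:
        if record.get(tested) is not None:
            return record.get(selected)
    return None  # assumed meaning of the FALSE branch: no value available
```

The result would populate the integrated variable `Kraj_ur_kod` in the statistical register.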

  21. Data integration: example of the process

  22. Summary. Common difficulties: • poor-quality data, missing values, duplicates, • conflicting data, • technical: size of the registers, time-consuming process. Benefits: • obtaining relevant, useful, accurate data, • improving the quality of the output data, • selecting the best variables from multiple registers.

  23. Thank you for your attention. www.stat.gov.pl
