1 / 28

ESSnet on Datawarehousing - the business register

ESSnet on Datawarehousing - the business register. Pieter Vlag – Statistics Netherlands. Outline of the presentation. DataWareHouse and importance population frame relationship population frame - business register - (default) target population, statistical units

saad
Download Presentation

ESSnet on Datawarehousing - the business register

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ESSnetonDatawarehousing- the business register Pieter Vlag – StatisticsNetherlands

  2. Outline of the presentation • DataWareHouse and importance population frame • relationship population frame - business register • - (default) target population, statistical units • other crucial datasources: “backbones” • - turnover + employment • datalinking : the statistical unit base • conflicting information between datasources • - when correcting in statistical DWH • - when correcting in backbones • - when feedback to business register ESSnet DWH – business register 1

  3. Definition of a statistical Datawarehouse (according to the FPA) The broad definition of a data warehouse to be used in this ESSnet is therefore: ‘A common conceptual model for managing all available data of interest, enabling the NSI to (re)use this data to create new data/new outputs, to produce the necessary information and perform reporting and analysis, regardless of the data’s source.’ ESSnet DWH – business register 2

  4. A DataWarehouse: the general idea • As stagingarea is “core business” forNSIs, • term statistical DWH is usedforstagingarea + WareHouse ESSnet DWH – business register 3

  5. The statistical DataWarehouse: architecture and layers ESSnet DWH – business register 4

  6. The statistical DataWareHouse:processing steps the GSBPM model 5.1a: link data 5.1b:integrate data seepresentationFursova input DWH / int. data process Calculateaggregates ESSnet DWH – business register 5

  7. GSBPM -step 7 Output 3 Output 1 Output 2 6 Integrated data 5.7 p.analyse p.analyse 5.2 - 5.6 Processing (integrationlayer) Linking 5.1 4 datasource 1 datasource 2 datasource 3 Titel van de presentatie

  8. A datawarehouse without population frame Datasource I: Survey 1 Datasource I: Admin data Datasource I: BIG DATA Datasource I: Survey 2 • different sourcescover different enterprises -> informationabout ? • timing of availabilitysourcesdiffers -> when complete desc. available ? ESSnet DWH – business register 7

  9. A Datawarehouse with a population frame Datasource 4: survey 2 Population. Datasource 1: admin data 1 Datasource 3: survey1 Datasource 2: BIG DATA ADVANTAGE: the coverage of DWH is known (e.g. whichenterprises are included in a DWH) ESSnet DWH – business register 8

  10. Units and target population • The population should be known for the • datawarehouse; e.g. “about which enterprises info” • its preparation phase ; e.g. when linking data sources • Challenges are: • units may differ between the data sources • - decision: which unit used for linking • what is the reference population • - decision: how is the default target population defined ESSnet DWH – business register 9

  11. Proposals • Only statistical unit (=enterprise) is used • - for data-linking • - in processing phase of the statistical - DWH • - justification: most obvious, ESSnet on Consistency, maintenance • Default target population : • all enterprises with economic activity in reference period (e.g. year) • - justification: SBS-regulation • - widest definition of enterprises from which flexible outputs for subpopulations can be derived • - term default is used: as subpopulations do have a target population, too ESSnet DWH – business register 10

  12. GSBPM -step 7 6 flexible output for different populations, and units Integrated data 5.7 Weightingto flexible pop. 5.2 - 5.6 Processing on stat. unit + default target populationonly Linked data 5.1 flexibledatasourceswith different populations and units 4 Titel van de presentatie

  13. Population frame and the Business Register • Determination of the default target population in SDWH in 2 steps: • the population frame, i.e. a list of enterprises with a certain kind of activity during a period. • confirmationwhich enterprises of the list really performed economic activities during a period • The business register provides information for the population frame. • Therefore, the statistical Business Register is an indirect datasource for the statistical-DWH ESSnet DWH – business register 12

  14. Informationneededfromstat.business register • Recommended information for the population frame : • the frame reference year • the statistical enterprises unit, including national ID and EGR ID • the name and address of the enterprise • the national identification number (ID) of the enterprise • the date in population (mm/yr) • the date out of population (mm/yr) • the NACE-code • the institutional sector code • a size class ESSnet DWH – business register 13

  15. Otherbackbones • ESSnetAdminData: VAT and socialsecurityadmin • almost complete forquarter and annual • canbeusedforhigh-qualityestimatesforturnover + employmentrespectively. • ESSnet DWH: VAT and socialsecurity data are crucial • to confirm the activity status of enterprises • implictly to determine the default target population • to integrate data suitable for flexible outputs • measurement errors are reduced of sample survey (or data about subpopulation) if weighting to pop.numbers + VAT-turnover + employment • Proposal: to include these admin data as backbones in a stat-DWH ESSnet DWH – business register 14

  16. Backbones in a statistical-DWH Access layer GSBPM 7-9: disseminate Integrationlayer Int. + Analyses layer GSBPM 6: analyse / “DATAWAREHOUSE” GSBPM 5.7-5.8: calculateaggregates GSBPM 5.2-5.6: “process” Check processing GSBPM 5.1: link & integrate data 1 data 2 Sourcelayer Pop-frame VAT empl. Backbones are crucialfordata-linking and data-integration; ->need to bechecked/cleanedbysource in the sourcelayer SBR

  17. Observed: admin data incorporated in BR GSBPM 7-9: disseminate Integrationlayer Int. + Analyses layer GSBPM 6: analyse Whenchoosingthisoption, • important part oflinkingprocessoutside the S-DWH • unless S-DWH integral part of S-DWH (maintenance ?) GSBPM 5.7-5.8: calculateaggregates GSBPM 5.2-5.6: “process” Check processing GSBPM 5.1: link & integrate data 1 data 2 Pop-frame Sourcelayer VAT SBR empl.

  18. Determiningdefault target population • Ifstatistical-DWH covers annualstatisticsonly • relatively straightforward • - derive population frame from business register at the end of reference year t • determine active or non-active as soon as VAT and/or employment data become available • If STS included in statistical-DWH more complicated: • - updating necessary ! ESSnet DWH – business register 17

  19. Updating population ESSnet DWH – business register 18

  20. The largestenterprises output 2 output 1 output 3 GSBPM 5.7-5.8: calculateaggregates “DATAWAREHOUSE” If a team within a NSI produces consistent microdata forlargestenterprises -> considerthissource as backbone GSBPM 5.2-5.6: “process” Check processing GSBPM 5.1: link & integrate Pop-frame VAT empl. L.E. data 1 data 2 SBR

  21. Units: idealsituation • enterprise has a unique ID • enterprise group has a unique ID • enterprise and enterprise group • correspond with statistical definitions • are used in all data sources • In practice more complex situations do exist • (especially when using more admin data) ESSnet DWH – business register 20

  22. GSBPM -step Keyquestion: how to manage these different in- and output units and theirrelationships to the statistical unit 7 6 Flexible output for different populations, and units Integrated data 5.7 processing onone unit + onepopulationonly 5.2 - 5.6 Linked data 5.1 4 flexibledatasourceswith different population and units Titel van de presentatie

  23. INPUT IN S-DWH processing OUTPUT Legal unit KAU “Accountìng” unit LKAU “VAT-unit” Local unit ENTERPRISE GROUP ENTERPRISE (=statistical unit) Enterprise “othertax” units Enterprisegroup enterprise other units ESSnet DWH – business register

  24. The unit base • Someremarks: • Complexity of unit base depends on • - scope of statistical-DWH • national legislation (practices) with respect to enterprise units • Unit base closely related to Business Register. Main motivation to place this base outside the Business registers • - more flexible in case of new in- and outputs • - more transparent in case of linking errors ESSnet DWH – business register 23

  25. Position of Business Register in stat -DWH output 2 output 1 output 3 GSBPM 5.7-5.8: calculateaggregates “DATAWAREHOUSE” GSBPM 5.2-5.6: “process” Check processing GSBPM 5.1: link & integrate tax Pop-frame BIG DATA L.E. survey other units SBR VAT empl.

  26. Feedback to Business Register • In case of conflictinginformationbetweendatasources and conclusion is influentialerror in backbones (and indirectly SBR) • When incorporating corrections in statistical DWH ? • When incorporating corrections in backbones ? • When incorporating corrections in SBR? ESSnet DWH – business register 25

  27. Correction of information output 2 output 1 output 3 “DATAWAREHOUSE” GSBPM 5.7-5.8: calculateaggregates In SDWH: corrections at 5.6 GSBPM 5.1-5.6: “process” Check processing GSBPM 5.1: link & integrate other survey units Pop-frame VAT empl. L.E. In backbonesthemselves: timing most important revisions SBR In SBR: after end of year (forconsistency) – exception major impact

  28. Conclusions • Requirementsforstatistical-DWH • Population well defined • Use of one unit in processing • Backbonesdesiredfor • populations, VAT-turnover, admin data employment, largeenterprises • Business Register is indirect input forstatistical DWH • population frame, unit base, survey • Timing of correctionserrors (backboneinformation) • in DWH: beforeweighting • in backbone: whenrevising • in Business Register: end of year ESSnet DWH – business register 27

More Related