StatLine 4 metadata implementation

  1. StatLine 4 metadata implementation Edwin de Jonge Statistics Netherlands

  2. What is StatLine? • StatLine is the online output database of Statistics Netherlands. • Primary output channel • Contains all published data • Current size: 1500 data cubes, 2 billion data cells, over 150 million facts • Contains much functionality, including a very good search engine

  3. StatLine in Business Architecture • StatLine in the statistical process

  4. What is StatLine 4? • Redesign of the current StatLine 3 dissemination software • Reasons for the redesign: • Improve coherence • Changing publication policy • Handle time dependence • Archiving • Many new features

  5. StatLine coherence • Ideally: StatLine coherent & consistent • Currently (StatLine 3): • 1500 independent data cubes • StatLine 4: • Data cubes share metadata: • centrally moderated, quality improvement • Data cubes share data: • Each fact stored once.
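
The idea that data cubes share data, with each fact stored once, can be illustrated with a minimal sketch. The `FactStore` class, its key layout, and the example figures are hypothetical and are not taken from the StatLine 4 implementation; they only show cubes referring to a shared fact instead of holding their own copy.

```python
class FactStore:
    """Hypothetical central store: one entry per (variable, categories) combination."""

    def __init__(self):
        self._facts = {}

    def put(self, variable, categories, value):
        # The key identifies the fact; storing it again overwrites, never duplicates.
        key = (variable, frozenset(categories))
        self._facts[key] = value
        return key

    def get(self, key):
        return self._facts[key]

    def __len__(self):
        return len(self._facts)


store = FactStore()
# Illustrative value only, not real StatLine data.
key = store.put("population", {"sex: male", "year: 2005"}, 8_000_000)

# Two cubes hold references (keys) to the same fact instead of copies.
cube_a = [key]
cube_b = [store.put("population", {"year: 2005", "sex: male"}, 8_000_000)]
```

Because the key is built from an unordered set of categories, both cubes end up pointing at the same stored fact.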

  6. StatLine 4 metadata management • Metadata management centralized: • What? Conceptual metadata: • Classifications • Variables • By whom? Two organizational units: • Coordination: Maintaining structure and meaning of classifications • Dissemination: Textual editing and translations • Data producers own the data, but not the metadata. • Result: Every fact in StatLine 4 uses central classifications.

  7. StatLine in Business Architecture • StatLine in the statistical process

  8. Classification status • In StatLine 4 each classification has a status: • (Inter)national standard • Coordinated: within Statistics Netherlands • Shared: shared but not coordinated • Private: can be used by only one data cube; only during conversion • This status is used for coordination purposes.
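
The four statuses on this slide could be represented as an enumeration. This is a sketch under assumptions: the names `ClassificationStatus` and `may_be_used_by` are hypothetical, and the only rule encoded is the one the slide states (a private classification serves a single data cube).

```python
from enum import Enum


class ClassificationStatus(Enum):
    STANDARD = "(inter)national standard"
    COORDINATED = "coordinated within Statistics Netherlands"
    SHARED = "shared but not coordinated"
    PRIVATE = "private to one data cube"


def may_be_used_by(status, cube_count):
    """Return True if a classification with this status may serve cube_count cubes."""
    if status is ClassificationStatus.PRIVATE:
        # Private classifications can only be used by one data cube.
        return cube_count <= 1
    return True
```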

  9. Cristal model • StatLine 4 uses the Cristal model • Model for classifications and variables (Van Bracht et al.) • Focus on conceptual and value domain (ISO 11179) • Model elements: • Category (value): • value of a variable, creates a subpopulation, e.g. male (gender: male) • Can be part of another category (partial order) • Level: • set of disjoint categories • Equals a “flat” classification

  10. Cristal model (2) • Hierarchy: • Sequence of levels (total order) with contained categories • Every category in a hierarchy has one parent in a higher level • Equals a “hierarchical” classification • Classification: • set of hierarchies with contained levels and categories • Equals a family of hierarchical classifications.
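
The model elements from the two slides above (category, level, hierarchy, classification) can be sketched as plain data classes. All class and field names are hypothetical, as are the region codes in the example. The sketch also assumes that a category's parent sits in the level directly above it, which the slides do not state explicitly.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Category:
    code: str
    label: str
    parent: Optional[str] = None  # code of the parent category (assumed: one level up)


@dataclass
class Level:
    name: str
    categories: list  # disjoint categories, i.e. a "flat" classification


@dataclass
class Hierarchy:
    name: str
    levels: list  # ordered from highest (coarsest) to lowest (finest)

    def validate(self):
        """Return a list of categories whose parent is missing from the level above."""
        errors = []
        for upper, lower in zip(self.levels, self.levels[1:]):
            upper_codes = {c.code for c in upper.categories}
            for cat in lower.categories:
                if cat.parent not in upper_codes:
                    errors.append(f"{cat.code} has no parent in level {upper.name}")
        return errors


@dataclass
class Classification:
    name: str
    hierarchies: list = field(default_factory=list)  # family of hierarchies


# Hypothetical example: a two-level region hierarchy.
country = Level("country", [Category("NL", "Netherlands")])
province = Level("province", [
    Category("UT", "Utrecht", parent="NL"),
    Category("ZH", "Zuid-Holland", parent="NL"),
])
region = Classification("Region", [Hierarchy("country-province", [country, province])])
```

Calling `validate()` on a hierarchy returns an empty list when every lower-level category has a parent, and one message per orphaned category otherwise.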

  11. Cristal model (3) • Classification versioning • Each metadata object has a lifetime (begin and end date) • Each metadata object can have a predecessor and a successor • Models versions of categories, levels and hierarchies.
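
Lifetimes with predecessor and successor links can be sketched as a doubly linked chain of versions. The names `VersionedObject`, `succeed` and `version_on` are hypothetical, and treating the end date as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(eq=False)  # identity comparison; avoids recursing through the links
class VersionedObject:
    code: str
    begin: date
    end: Optional[date] = None  # None means the object is still valid
    predecessor: Optional["VersionedObject"] = None
    successor: Optional["VersionedObject"] = None

    def valid_on(self, day):
        # Assumption: the end date is inclusive.
        return self.begin <= day and (self.end is None or day <= self.end)


def succeed(old, new):
    """Link two versions so that new follows old."""
    old.successor = new
    new.predecessor = old


def version_on(obj, day):
    """Walk the version chain and return the version valid on day, or None."""
    current = obj
    while current.predecessor is not None:  # rewind to the earliest version
        current = current.predecessor
    while current is not None:
        if current.valid_on(day):
            return current
        current = current.successor
    return None
```

Starting from any version, `version_on` finds the one that applies on a given date, which is what a time-dependent table would need when it resolves a category, level or hierarchy.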

  12. Cristal model (4) • Multilingual • All textual properties are multilingual • E.g. Mannelijk (Dutch) -> Male • All metadata and tables can be shown in each defined language • All textual properties have popular versions • E.g. Consumer Price Index -> Inflation • All metadata and tables can be shown in “popular” or “expert” mode • Object class: • Is stored, but not coordinated (yet)
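
A textual property that is both multilingual and available in "popular" and "expert" versions can be sketched as a lookup keyed by language and mode. The `TextProperty` class and its fallback order (requested language first, then Dutch; popular falls back to expert) are assumptions for illustration, not documented StatLine 4 behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class TextProperty:
    # texts[(language, mode)] -> string; mode is "expert" or "popular"
    texts: dict = field(default_factory=dict)

    def get(self, language, mode="expert", fallback_language="nl"):
        """Return the best available text; the fallback order is an assumption."""
        candidates = (
            (language, mode),
            (language, "expert"),
            (fallback_language, mode),
            (fallback_language, "expert"),
        )
        for key in candidates:
            if key in self.texts:
                return self.texts[key]
        raise KeyError(f"no text for {language!r} in mode {mode!r}")


# The two examples from the slide.
male = TextProperty({("nl", "expert"): "Mannelijk", ("en", "expert"): "Male"})
cpi = TextProperty({
    ("en", "expert"): "Consumer Price Index",
    ("en", "popular"): "Inflation",
})
```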

  13. StatLine 4 conversion • All content of the current StatLine must be converted • From 1500 independent cubes • To 1500 coordinated cubes • Conversion means coordination! • Total coordination -> very long conversion • No coordination -> no added value • Ergo: Partial classification coordination

  14. Conversion strategy (1) • Strategy: • Coordinate standardized metadata • Allow non-standard metadata for a two-year period • Phased conversion • Preparation, conversion, coordination

  15. Conversion strategy (2) • Preparation phase: until June 2006 • Collect and store standard classifications • E.g. Time, Region (50 versions), Age, Marital status, Sex, NACE • Including variations (disclosure control) • For each data cube • Check usage of standard classifications • Non-standard classifications are marked “private” • Define StatLine 4 structure
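
The per-cube check in the preparation phase can be sketched as a simple lookup against the collected standard classifications. The function name, the matching by exact name, and the "Shoe size" example are hypothetical; a real check would presumably compare structure and content as well.

```python
# Standard classifications named on the slide.
STANDARD_CLASSIFICATIONS = {"Time", "Region", "Age", "Marital status", "Sex", "NACE"}


def mark_classifications(cube_classifications, standard_names=STANDARD_CLASSIFICATIONS):
    """Label each classification used by a cube as 'standard' or 'private'."""
    return {
        name: ("standard" if name in standard_names else "private")
        for name in cube_classifications
    }


# Hypothetical cube that uses two standard classifications and one of its own.
marks = mark_classifications(["Time", "Region", "Shoe size"])
```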

  16. Conversion strategy (3) • Conversion phase: (June 2006) • Convert data cube • Add missing metadata to metadata server • Check conversion • Coordination phase (November 2006) • After conversion: StatLine 4 contains coordinated and private metadata • Within two years all private metadata must be replaced with coordinated metadata

  17. Benefits of StatLine 4 metadata • Coordinated classifications and variables • Uniform naming and description • Standard/coordinated metadata can be downloaded • Better comparability of data • Better search results

  18. Future improvements • StatLine 4.1 • Centralize population (object class) management: • E.g.: person, enterprise • Model populations and subpopulations • Statistical process • Centralize: • process metadata • quality metadata.