Case Studies: Statistics Canada (WP 11)

This case study discusses the use of an Integrated Metadatabase (IMDB) to support the interpretation, dissemination, and management of statistical data. It explores the role of metadata in each phase of the statistical cycle and highlights the organizational and design issues associated with implementing an IMDB. The study also outlines the relationship between the IMDB and survey planning, dissemination, aggregation and analysis, and management systems.


Presentation Transcript


  1. Case Studies: Statistics Canada (WP 11) • Alice Born, alice.born@statcan.ca, Statistics Canada • UNECE Workshop on Statistical Metadata, July 4 to 6, 2007

  2. Outline • Overview • Statistical metadata systems and the statistical cycle • Statistical metadata in each phase of the statistical cycle • Systems and design issues • Organizational and cultural issues

  3. Overview of Integrated Metadatabase (IMDB) • To support interpretation of the data – dissemination phase • Responsibility of Standards Division (metadata, classifications and standard definitions) • Adherence to Policy on Informing Users on Data Quality and Methodology, Policy on Standards and Quality Assurance Framework • In general, metadata goes back to November 2001

  4. Overview of Integrated Metadatabase (IMDB) • Contains metadata on 350 active and 250 inactive surveys and statistical programs • Purpose • Methodology used to produce the data • Measures of data accuracy • Variables, classifications for the data • Location of clean master datafile • Contacts • Survey managers cannot release data without the prescribed metadata – mandatory
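The mandatory-metadata rule on this slide can be pictured as a simple release gate. The following is a minimal sketch, not the IMDB's actual implementation: the field names are hypothetical paraphrases of the slide's list, and the functions `missing_metadata` and `can_release` are invented for illustration.

```python
from typing import Mapping

# Hypothetical field names paraphrasing the prescribed metadata on the slide.
REQUIRED_FIELDS = (
    "purpose",
    "methodology",
    "data_accuracy",
    "variables_and_classifications",
    "master_datafile_location",
    "contacts",
)


def missing_metadata(record: Mapping[str, object]) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def can_release(record: Mapping[str, object]) -> bool:
    """A survey may be released only when no prescribed metadata is missing."""
    return not missing_metadata(record)


# Example record with placeholder values; contacts are deliberately left empty.
example_survey = {name: "placeholder" for name in REQUIRED_FIELDS}
example_survey["contacts"] = ""
print(missing_metadata(example_survey))
print(can_release(example_survey))
```

Returning the list of missing fields, rather than only a yes/no answer, lets a survey manager see what still has to be documented before release.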

  5. Overview of Integrated Metadatabase (IMDB) Next priorities: • Complete documentation of variables • Complete questionnaire model • Determine metadata for archived datafiles – may require additional metadata Lessons learned: • Opportunities in collecting metadata in the first phase of the statistical cycle – not at the time of dissemination

  6. Statistical metadata systems and the statistical cycle Relationship with survey planning and design phase • IMDB expanded its role as part of the Household Survey Content Harmonization • Standardize concepts, questions, question blocks across household surveys • Variables follow the ISO/IEC 11179 standard • Questions and question blocks, and the associated response choices linked to variables and classifications, are stored in the IMDB at the beginning • Survey Specification Manager pulls metadata from the IMDB but holds its own specifications and code

  7. Statistical metadata systems and the statistical cycle Relationship to dissemination systems • Metadata for information modules on the STC website – mandatory • Information for survey respondents – requires metadata prior to release of data • Data Liberation Initiative – public-use microdata files documented in DDI • Metadata to support data exchange – SDMX, DDI, XBRL, Wiki, HTML, etc.

  8. Statistical metadata systems and the statistical cycle Relationship to aggregation and analysis phase • Analytical data warehouses use IMDB to organize their tables (variables and classifications) Relationship to archive phase • IMDB contains location of master datafile, record layout, contact information • Currently developing business rules for archived datafiles

  9. Statistical metadata systems and the statistical cycle Relationship with management systems • Software Register – registry of Agency’s software and applications organized by survey and statistical program – IMDB is the inventory • Quality management assessment and questionnaire – based on inventory of surveys in the IMDB; reuse of existing metadata

  10. IMDB in the survey life cycle [diagram] • Phases: Design, Collect, Edit, Estimate, Tabulate, Publish, Archive • IMDB Metadata • Supported functions: Data Warehouses, Operations Management, Quality Assurance, Analysis, Dissemination • Operational Data: Registers, Survey Data, Administrative Data, Operational Data Stores

  11. Statistical metadata for phases in the statistical cycle Metadata describing statistical business processes • Data dissemination for interpretation of data • IMDB serves as the corporate inventory of all surveys and statistical programs, questionnaires, master datafiles • Other metadata or paradata reside in other metainformation systems – SSM, IQMS

  12. Statistical metadata for phases in the statistical cycle Metadata for data elements • Supports: Survey planning and design; Analysis; Dissemination; Archiving • Metadata objects tracked over time for changes (versioning) and validity (registration) • Output to online data tables and STC products • For discovery – inventory of data elements (DE) on STC website and STCWiki (internal review before going public) • Links to questions, question blocks, datafiles
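Tracking metadata objects for changes (versioning) and validity (registration) can be sketched as a history of dated versions. This is an illustrative sketch only: the class `ItemVersion`, the function `version_on`, the status strings, and the sample history are all assumptions, not the IMDB's schema or content.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class ItemVersion:
    """One version of a metadata object (hypothetical structure)."""
    item_id: str
    version: int
    definition: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the version is still valid
    registration_status: str = "Recorded"  # placeholder status name


def version_on(history: Iterable[ItemVersion], item_id: str,
               when: date) -> Optional[ItemVersion]:
    """Return the version of item_id that was valid on the given date, if any."""
    candidates = [
        v for v in history
        if v.item_id == item_id
        and v.valid_from <= when
        and (v.valid_to is None or when <= v.valid_to)
    ]
    # If validity periods overlap, prefer the highest version number.
    return max(candidates, key=lambda v: v.version, default=None)


# Placeholder history: definitions and dates are invented for the example.
history = [
    ItemVersion("example_item", 1, "placeholder definition, version 1",
                date(2001, 11, 1), date(2005, 12, 31), "Retired"),
    ItemVersion("example_item", 2, "placeholder definition, version 2",
                date(2006, 1, 1)),
]
print(version_on(history, "example_item", date(2003, 6, 1)))
```

Keeping superseded versions with their validity dates, instead of overwriting them, is what allows data from an earlier reference period to be interpreted with the definition that applied at the time.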

  13. STCWiki – Type of marital status of person

  14. Statistical metadata for phases in the statistical cycle Metadata for survey planning and design • Questions, standard question blocks and standard response choices in IMDB • Mapped to value domains, data elements and surveys in the IMDB • These metadata are assembled into collection instruments in other metainformation systems outside the IMDB

  15. Systems and design issues • IMDB started in 1998 • Phase 1 Consolidation of existing metadata stores • Phase 2 Metadata describing statistical business processes • Phase 3 Metadata for data elements, etc. • MetaStat system – Statistical activity, survey, instance, frame, universe, instrument, datafiles, survey methodology, documentation, data accuracy • MetaWeb system – object class, property, data element, value domain, question, response choices, question block, value meaning manager
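The MetaWeb objects named on this slide follow the ISO/IEC 11179 pattern, in which an object class and a property are combined and then paired with a value domain to form a data element. The sketch below shows that pattern in a simplified form, assuming hypothetical class and field names. The sample name echoes the STCWiki slide title, but how Statistics Canada actually splits it into object class and property is an assumption, and the codes and value meanings are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectClass:
    name: str


@dataclass(frozen=True)
class Property:
    name: str


@dataclass(frozen=True)
class ValueDomain:
    name: str
    # Pairs of (code, value meaning); contents here are placeholders.
    permissible_values: tuple[tuple[str, str], ...]

    def meaning_of(self, code: str) -> str:
        """Look up the value meaning attached to a code."""
        return dict(self.permissible_values)[code]


@dataclass(frozen=True)
class DataElement:
    """Object class + property, represented through a value domain."""
    object_class: ObjectClass
    prop: Property
    value_domain: ValueDomain

    @property
    def name(self) -> str:
        return f"{self.prop.name} of {self.object_class.name}"


# Assumed decomposition, for illustration only.
example_domain = ValueDomain(
    "Example value domain",
    (("1", "placeholder meaning A"), ("2", "placeholder meaning B")),
)
example_element = DataElement(
    ObjectClass("person"), Property("Type of marital status"), example_domain
)
print(example_element.name)
print(example_element.value_domain.meaning_of("2"))
```

Because the value domain is a separate object, the same classification can be reused by several data elements, which is one way to read the slide's emphasis on reuse.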

  16. Phase 2 Input Screens [diagram] • Text strings related to data components • IMDB database • Directives Resource Bundle (Key, Value): SurveySDDS, Statistical Data Doc… • Labels Resource Bundle (Key, Value): SurveySDDS, SDDS
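The two resource bundles on this slide appear to map the same key to a short label and to a longer directive text. A minimal sketch of that lookup follows, assuming plain dictionaries and a hypothetical helper `screen_text`. The only entries are the ones visible on the slide, and the truncated directive value is kept exactly as it appears there.

```python
# Entries copied from the slide; the directive text is truncated in the source.
LABELS = {"SurveySDDS": "SDDS"}
DIRECTIVES = {"SurveySDDS": "Statistical Data Doc…"}


def screen_text(key: str) -> tuple[str, str]:
    """Return (label, directive) for an input-screen component.

    Falls back to the key itself for the label and to an empty directive
    when a bundle has no entry (an assumed behaviour, not a documented one).
    """
    return LABELS.get(key, key), DIRECTIVES.get(key, "")


print(screen_text("SurveySDDS"))
print(screen_text("UnknownKey"))
```

Keeping screen text in bundles keyed by component, outside the screen code, would let wording be changed or offered in another language without touching the application, although the slide does not state that this was the motivation.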

  17. Phase 2 Input Screen: Administered Item

  18. Phase 2 - Identification Tab

  19. Systems and design issues Dissemination and information discovery systems • Web publication from IMDB is through HTML, dynamically generated with Perl scripts • Conforms to government standards – CLF • Survey-centric view and developing DE-centric view • Discovery from Wiki solution – non-linear view of Phase 2 and 3 metadata • Allows users to view links among administered items in the IMDB

  20. Organizational and cultural issues • Information management • Assist in harmonization / usage of standards • Knowledge sharing • Corporate memory • Reuse of our metainformation assets

  21. IMDB Knowledge Sharing/Corporate Memory: Survey Life Cycle [diagram] • Phases: Design, Collect, Edit, Estimate, Tabulate, Publish • Concepts (Object Class, Property, Data Element Concept) • Data Elements • Questions • Question Blocks • Classifications (Conceptual Domain, Value Domain) • Survey • Universe • Frame • Instance • Collection Instrument • Methodology • Data Files • Enterprise Architecture

  22. IMDB Corporate Memory: Data Files [diagram] • Operational Data: Registers, Survey Data, Administrative Data, Operational Data Stores • Public Use Master File • Archival information • Clean Master File • Archived Data

  23. IMDB Reuse of Information Assets: Information Discovery/Dissemination [diagram] • One metadata source + many uses for the information = many output formats • Output formats shown: Wiki, SDMX, HTML, DDI, ?
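The idea of one metadata source feeding many output formats can be sketched as a single record passed to interchangeable renderers. This is an illustration under stated assumptions: the record fields, the function names, and the simple definition-list wiki markup are hypothetical. SDMX and DDI output are not attempted here, since both require their own schemas.

```python
from html import escape
from typing import Callable, Mapping


def to_html(record: Mapping[str, str]) -> str:
    """Render the record as an HTML definition list, escaping special characters."""
    rows = "".join(
        f"<dt>{escape(key)}</dt><dd>{escape(value)}</dd>"
        for key, value in record.items()
    )
    return f"<dl>{rows}</dl>"


def to_wiki(record: Mapping[str, str]) -> str:
    """Render the record in a definition-list style wiki markup."""
    return "\n".join(f"; {key} : {value}" for key, value in record.items())


# New output formats are added by registering another renderer.
RENDERERS: dict[str, Callable[[Mapping[str, str]], str]] = {
    "html": to_html,
    "wiki": to_wiki,
}


def publish(record: Mapping[str, str], fmt: str) -> str:
    return RENDERERS[fmt](record)


# Placeholder record; field names and values are invented for the example.
record = {"name": "Example survey", "status": "Active & ongoing"}
print(publish(record, "html"))
print(publish(record, "wiki"))
```

The record is stored once and each format is only a view of it, so a correction to the source appears in every output the next time it is generated.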

  24. IMDB Reuse of Information Assets: Applications Development • Classification coding • Collection instrument development • Publishing • Other applications

  25. IMDB Reuse of Information Assets: Integration with Data • Data Warehouses • CANSIM

  26. Organizational and cultural issues • STC is one of the most integrated statistical systems in the world • As part of its Enterprise Architecture strategy – moving towards centralized and generalized systems, including the IMDB • IMDB was built initially to support interpretation of disseminated data • The pressure now is to provide metadata up (and down) the statistical value chain and into management systems • Opportunities at the survey planning and design phase – reuse of existing metadata (variables, classifications, questions, etc.) registered in the IMDB – coherence
