Metadata projects and tasks at Statistics Finland

Metadata projects and tasks at Statistics Finland, METIS 2010. Saija Ylönen, saija.ylonen@stat.fi

Organizational chart

Co-operating parties of the metadata tasks: organizational units
• IT Management
  • situated in the Secretariat of the Director General
  • co-ordinates the general information architecture, of which metadata tasks form one element
• Classification and Metadata Services
  • situated in the IT and Statistical Methods department
  • operational unit
  • active role in developing metadata
• Dissemination Services
  • situated in the IT and Statistical Methods department
  • develops the metadata connected with dissemination

Metadata Co-ordination Group
• Originally a co-operation group for persons working with metadata issues in the support function departments of SF
• The present objective is to intensify co-operation between the statistics departments and the parties responsible for general metadata work
• Comprises members working on metadata and permanent members from all statistics departments
• The goal is to widen knowledge about metadata and metadata systems and to give the statistics departments an opportunity to discuss their metadata needs with metadata specialists

CoSSI Steering Group and CoSSI model
• Foundation for the metadata system
• Modular, xml-based model for describing statistical tables, classifications, concepts, variables, general information on statistical documents, quality, etc.
• Expandable
• The CoSSI Steering Group is in charge of mastering and developing the model according to user needs in a manner that will not expose its main structure to risk

Definition of metadata
• 1) Statistical metadata
  • variable and data descriptions
  • classifications, concepts
• 2) Statistical data quality
  • quality reports
  • statistical method descriptions
• 3) Metadata of statistical documents or products
  • producers
  • publication information
  • field or subject area

Definition of metadata II
• 4) Process metadata
  • a) technical metadata
    • technical metadata guides the workflow of data production, makes it possible to follow data production, and documents the working process
  • b) conceptual process metadata
    • technical information on the data and variables used in producing data, e.g. minimum or maximum values, various calculation rules, or the use of certain classification values
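As an illustration of conceptual process metadata, the sketch below shows how minimum and maximum values or permitted classification values could be used to check data during production. The `VariableRule` class, the `check_value` function and the sample variables are hypothetical and are not taken from the Statistics Finland systems.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariableRule:
    """Hypothetical conceptual process metadata for one variable."""
    name: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    allowed_codes: frozenset = frozenset()  # permitted classification values


def check_value(rule, value):
    """Return a list of rule violations for one value (empty if the value is valid)."""
    problems = []
    if rule.allowed_codes:
        # Coded variable: only membership in the classification is checked.
        if value not in rule.allowed_codes:
            problems.append(f"{rule.name}: code {value!r} not in classification")
        return problems
    if rule.minimum is not None and value < rule.minimum:
        problems.append(f"{rule.name}: {value} is below minimum {rule.minimum}")
    if rule.maximum is not None and value > rule.maximum:
        problems.append(f"{rule.name}: {value} is above maximum {rule.maximum}")
    return problems


# Illustrative rules only; the variables and limits are assumptions.
age_rule = VariableRule("age", minimum=0, maximum=120)
sex_rule = VariableRule("sex", allowed_codes=frozenset({"1", "2"}))
```

Calling `check_value(age_rule, 130)` returns one violation message, while a value inside the range returns an empty list.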

Metadata systems at Statistics Finland

Metadata systems: present situation
• We are in a transitional phase from relational databases to an xml-based environment
• Relational databases: classifications, concepts and definitions, archiving database
• XML database (eXist): publications, classifications, concepts, data descriptions

Relational databases
• Built in the 1990s
• Used in statistics production, but not in all statistical processes or for all statistics
• Classifications in the relational databases are used in SAS and Superstar
• The archiving database is in use in the archiving process
• Classifications and concepts are generated from the relational databases for the web pages

XML database
• At the moment, the xml database is used mostly in the creation of publications with the Arbortext word processor
• Classifications and concepts are copied to the xml database from the relational databases and are ready to use
• Tools for utilising metadata objects from the xml database are being constructed
• The first metadata tool linked to the xml database is the variable editor

Variable editor
• For creating and maintaining the descriptions of statistical data and variables
• In the testing phase
• Implementation begins in 2010
• Descriptions are saved as xml documents conforming to the CoSSI model in the eXist xml database
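A minimal sketch of what saving a variable description as an xml document might look like, using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`variableDescription`, `name`, `label`, `unit`, `lang`) are placeholders invented for illustration; they are not the actual CoSSI schema, which the slides do not show.

```python
import xml.etree.ElementTree as ET


def build_variable_description(name, label, unit):
    """Build an xml element describing one variable (hypothetical element names)."""
    root = ET.Element("variableDescription")
    ET.SubElement(root, "name").text = name
    ET.SubElement(root, "label", {"lang": "en"}).text = label
    ET.SubElement(root, "unit").text = unit
    return root


# Illustrative variable; the values are assumptions, not real metadata.
doc = build_variable_description("income", "Disposable income", "EUR")
xml_text = ET.tostring(doc, encoding="unicode")
```

The resulting `xml_text` string is the kind of document that could then be stored in an xml database and validated against the model's schema.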

Content and functions of the variable editor
• A data description comprises a general description of the data, a list of variables, and information about each individual variable
• The general data description includes descriptive information on the entire data document
• The variable list tab ("interleaf") allows management of the list of variables in the data description and selection of the variable whose description needs editing

Variable list interleaf

Variable metadata

Results from the variable editor project
In addition to the actual variable editor application, the project also created the preconditions for:
• the development of a consistent information architecture
• the construction of production applications in which metadata need not be separately produced or manually added to data when publishing or archiving statistics
• an information service where excessive time need not be spent on searching for metadata, or on reproducing metadata for special compilation assignments
• a system from which tabulation applications can retrieve table column and row headings in multiple languages for all statistics using the same methods
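The last point above, retrieving table headings in several languages from one store, can be sketched as follows. The `HEADINGS` store, the function names and the fallback rule are assumptions made for illustration; in the system described here the labels would come from the metadata store rather than from a dictionary in code.

```python
# Hypothetical label store: variable name -> language code -> heading text.
HEADINGS = {
    "population": {"fi": "Väestö", "sv": "Befolkning", "en": "Population"},
    "region": {"fi": "Alue", "sv": "Område", "en": "Region"},
}


def heading(variable, lang, fallback="en"):
    """Return the heading for a variable, falling back to another language if missing."""
    labels = HEADINGS[variable]
    return labels.get(lang, labels[fallback])


def table_headings(variables, lang):
    """Return headings for a list of variables, all retrieved with the same method."""
    return [heading(v, lang) for v in variables]
```

For example, `table_headings(["region", "population"], "sv")` gives the Swedish headings for both columns, and a language with no stored label falls back to English.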

Experiences gained during the variable editor project
• Various questions concerning standardisation had to be addressed in the project although they were not originally in the project's scope – they had to be resolved and they took a lot of time
• Because the variable editor project was the first leg in the revision of the metadata system, it was subjected to a diversity of expectations
• The project was a good test run for the CoSSI model – the data content of the model proved to be exhaustive

The planning and building of a classification editor
• Reasons for renewing the classification system:
  • the present way of maintaining classifications has been viewed as inflexible by the statistics departments
  • renunciation of the Sybase relational databases
  • ICT strategy: in the next few years the agency will introduce a common statistical metadata system based on the CoSSI model
• Classification editor project 2010
  • 1) definition stage
  • 2) construction stage

Goals of the classification editor project
• Analyse the service needs required from a centralised classification system
• Create maintenance tools for classifications in connection with the CoSSI/eXist metadata store so that the basic maintenance needs of classifications of individual statistics are met in a user-oriented manner which also allows further development of the classification system
• Produce the solutions with which the interoperability of the Sybase classification database and the eXist metadatabase can be ensured
• Compile user instructions for the editor
• Pilot test the editor
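One way to picture the interoperability goal is a conversion from relational classification rows to a hierarchical xml document. The sketch below is only an assumption about how such a step could look: the row layout `(code, parent_code, label)`, the element names and the sample rows are hypothetical, and neither the Sybase schema nor the CoSSI classification module is reproduced here.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(classification_id, rows):
    """Convert a list of (code, parent_code_or_None, label) rows into nested xml."""
    root = ET.Element("classification", {"id": classification_id})
    nodes = {}
    # First pass: create an element for every classification item.
    for code, _parent, label in rows:
        item = ET.Element("item", {"code": code})
        ET.SubElement(item, "label").text = label
        nodes[code] = item
    # Second pass: attach each item to its parent, or to the root if it has none.
    for code, parent, _label in rows:
        target = nodes[parent] if parent is not None else root
        target.append(nodes[code])
    return root


# Made-up sample rows standing in for a relational classification table.
sample_rows = [
    ("X", None, "Example section"),
    ("X1", "X", "Example division"),
]
tree = rows_to_xml("demo", sample_rows)
```

Building the items first and linking them second means the rows do not have to arrive in parent-before-child order.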

Benefits of the new classification system
• A classification system which serves well will encourage centralised and structured maintenance of classifications
• The documentation of classifications will improve, making them easy to find for in-house use and for the provision of information services
• The new classification system will support smooth movement between data descriptions, variable descriptions and maintenance of classifications, and thus improve the efficiency of the maintenance and use of classifications in statistics

General benefits of the common classification system
• A centralised classification system eases the workload needed to maintain classifications because classifications are only maintained in one place
• Reduces the possibility of errors because classifications are documented in the system consistently so that they are accessible to everybody and easy to find
• Improves the efficiency of time use because working hours need not be spent on looking for classifications and trying to find their background information
• Makes the classifications used in different statistics visible to everybody and thus creates possibilities for their harmonisation

In conclusion: Why do some statistics departments still have their own metadata systems instead of using the centralised system?
• Centralised metadata work progresses too slowly from the perspective of individual statistics
  – We should rethink our construction and implementation strategy
• The common attitude still regards the process of an individual set of statistics as unique, and therefore incapable of exploiting systems that are meant for all statistics
  – We have to get quick results to prove the benefits of the system
• Commitment by the Management and their support for the work are crucial
  – We have to convince them

Thank you for your attention!
