Metadata Driven Integrated Statistical Data Management System, CSB of Latvia. By Karlis Zeila, Vice President, CSB of Latvia. MEXSAI, Cancun, November 2-4.

Presentation Transcript
Metadata Driven Integrated Statistical Data Management System

CSB of Latvia

By Karlis Zeila, Vice President, CSB of Latvia

MEXSAI, Cancun, November 2-4


INTRODUCTION

The system was developed over 2.5 years (January 2000 to July 2002).

Development was carried out by an outsourced company, Microlink Latvia, in close cooperation with experts from the CSB.

600 000 euros were spent on the system development.

With the introduction of the system, the CSB of Latvia began the transition from the stove-pipe approach to a process-oriented approach to statistical data production.


META DATA DRIVEN ... ?

Any action within the system is ruled by metadata.

  • Metadata is the key element of the system.

  • All software modules of the entire system are connected with the Core Metadata module (metadata base).

  • Any change within the system starts with a change to the metadata.

  • The full data processing cycle becomes possible only once the corresponding descriptions in the metadata base have been completed.
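The idea that processing can only start once the metadata descriptions are complete can be sketched roughly as follows. This is a minimal illustration, not the ISDMS implementation: the class `SurveyMetadata`, the step names in `REQUIRED_STEPS`, and both functions are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SurveyMetadata:
    """Hypothetical metadata record for one survey (illustrative only)."""
    name: str
    described_steps: set = field(default_factory=set)


# Assumed list of steps that must be described before the full cycle can run.
REQUIRED_STEPS = ("data_entry", "validation", "aggregation", "dissemination")


def missing_descriptions(meta):
    """Return the required steps that are not yet described in the metadata."""
    return [step for step in REQUIRED_STEPS if step not in meta.described_steps]


def run_full_cycle(meta):
    """Run every step in order, but only if the metadata description is complete."""
    missing = missing_descriptions(meta)
    if missing:
        raise ValueError(f"metadata incomplete for {meta.name}: {missing}")
    return [f"{step} done for {meta.name}" for step in REQUIRED_STEPS]
```

In this sketch the processing code never branches on the survey itself; it only looks at what the metadata says, which is the sense in which a change to the system starts with a change to the metadata.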


INTEGRATED ... ?

Most of the system software modules are connected with the Registers module.

  • The Registers module is an integral part of the system.

  • All surveys are supported by appropriate classifications stored in the metadata base.

  • In all surveys, respondent data fields are connected with register data.

  • All data are stored in the corporate data warehouse.

  • Statistical data processing has been split into unified steps that are common to the different surveys.

  • Export / import procedures make it possible to work with the system's data files in different standard software packages.


Advantages and Restrictions

Advantages

1. Data entry, processing and storage procedures for the main business statistics are standardized as far as possible, which enables the move from stove-pipe data processing to process-oriented data processing.

2. Statistical data, including metadata, are processed and stored centrally, using data warehouse technologies and OLAP tools.

3. All data processing procedures are driven from a common metadata system. The procedures are described in the metadata base using a special pseudo-language and a defined group of notations. Therefore, no individual programming is required for each survey to execute the standardized procedures.

4. The system is linked with the Business Register, which allows respondent data to be retrieved and updated directly.

5. Special import and export procedures have been created for data exchange with other systems.

6. A link with PC-Axis has been created for electronic data dissemination.


Restrictions

1. The system is oriented towards processing data from surveys of different periodicity, where the data are collected using questionnaires filled in by respondents (some adaptation would be necessary to use CAPI or CATI technologies).

2. The metadata base does not provide for the description of confidentiality rules for data dissemination; they are hard-coded in the system.

3. Diagnostic tools for the metadata descriptions are not powerful enough, so the experts preparing metadata descriptions need to be highly experienced.

4. Hardware and standard software requirements: PCs with a Pentium II or better processor and at least 128 MB of RAM, running Windows 95 to Windows 2000 and MS Office 2000.

5. The metadata base does not provide for the description of an algorithm for automatically creating respondent lists for sample surveys from the Business Register frame.


ISDMS architecture

[Diagram: Integrated statistical data management system]

Corporative data warehouse and data bases shown: Metadata base, Microdata base, Macrodata base, Raw data base, Registers base, OLAP data base, User administration data base, Dissemination data base

Platform and other elements shown: Windows 2000 Server Advanced, MS Internet Information Server, SQL Server 2000, PC-Axis, CSB Web Site, FIREWALL

ISDMS business application software modules (each related with one or more of the data bases above):

  • Data entry and validation module

  • Data mass entry module

  • Data WEB entry module

  • Missed data imputation module (with data imputation software)

  • Data aggregation module

  • Data analysis module

  • Data dissemination module

  • Core metadata base module

  • Registers module

  • User administration module


Structure of microdata (observation data) [Bo Sundgren model]

  • Object characteristics: Co = O(t).V(t),

  • where: O is an object type; V is a variable; t is a time parameter. Each result of an observation is a value of a variable (data element), Co.

  • All values of each variable are attached to object (respondent) requisites, which can be called vectors or dimensions. When analysing the population of respondents, we use these dimensions to form different groupings and to aggregate data.

  • The dimensions listed below could be attached to each value of a variable in agricultural statistics:

    • Main kind of activity (ISIC classification)

    • Kind of ownership and entrepreneurship (code)

    • Regional location (code)

    • Employees group classification (code)

    • Turnover group classification (code)
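A microdata value Co = O(t).V(t) with its attached dimensions could be represented roughly as below. This is only a sketch under assumptions: the `Observation` class, its field names, and the `group_by_dimension` helper are invented for illustration and are not taken from the ISDMS data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One microdata value Co = O(t).V(t); all field names are illustrative."""
    object_id: str      # O: the respondent (object) observed
    variable: str       # V: the variable observed
    period: str         # t: time parameter, e.g. "2002"
    value: float        # the observed value (data element)
    dimensions: tuple   # (name, code) pairs taken from the object's requisites


def group_by_dimension(observations, dimension):
    """Group observed values by the code of one dimension, e.g. 'region'."""
    groups = {}
    for obs in observations:
        code = dict(obs.dimensions).get(dimension)
        groups.setdefault(code, []).append(obs.value)
    return groups
```

Grouping on a dimension such as the regional code is the first step towards the groupings and aggregations that the dimensions are meant to support.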


Structure of macrodata (statistics)

  • Macrodata are the result of estimations (aggregations) of a set of microdata.

  • Statistical characteristics: Cs = O(t).V(t).f,

  • where: O and V are the object characteristics; t is a time parameter; f is an aggregation function (sum, count, average, etc.) summarizing the true values of V(t) for the objects in O(t).

  • In the metadata base, the structure of macrodata is referred to as the box structure or “alpha-beta-gamma-tau” structure (αβγτ).

  • For data interchange, alpha refers to the selection property of objects (O), beta to the summarized values of variables (V), gamma to the cross-classifying variables, and tau to the time parameters (t).
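The macrodata formula Cs = O(t).V(t).f and the alpha-beta-gamma-tau roles can be illustrated with a small aggregation function. The row layout (dictionaries with `period`, `variable`, `value` keys) and the function name `aggregate` are assumptions made for this sketch, not part of the system described.

```python
from collections import defaultdict
from statistics import mean

# Aggregation functions f named on the slide (sum, count, average).
AGGREGATIONS = {"sum": sum, "count": len, "average": mean}


def aggregate(microdata, alpha, beta, gamma, tau, f):
    """Compute Cs = O(t).V(t).f over microdata rows (a list of dicts).

    alpha: predicate selecting the objects O
    beta:  name of the variable V whose values are summarized
    gamma: names of the cross-classifying fields
    tau:   the period t
    f:     key into AGGREGATIONS
    """
    cells = defaultdict(list)
    for row in microdata:
        if row["period"] == tau and row["variable"] == beta and alpha(row):
            key = tuple(row[g] for g in gamma)
            cells[key].append(row["value"])
    return {key: AGGREGATIONS[f](values) for key, values in cells.items()}
```

Each key of the returned dictionary corresponds to one box of the macrodata structure: a combination of cross-classifying codes (gamma) for the selected objects (alpha), variable (beta) and period (tau).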


Structure of Surveys (questionnaires)

  • A new survey should be described in the metadata base. For each survey a questionnaire version shall be created, which is valid for at least one year. If the questionnaire content and/or layout do not change, the current version and its description in the metadata base can be used for the next year.

  • Each survey contains one or more data entry tables, or chapters, each of which can be a constant table with a fixed number of rows and columns, or a table with a variable number of rows or a variable number of columns.

  • The rows and columns of each chapter are described in the metadata base with their codes and titles. This information is necessary for automatic generation of the data entry application, data validation, etc.

  • The last step in describing the questionnaire content and layout is cell formation. Cells are the smallest data unit in survey data processing. A cell is created as the combination of a row and a column from the survey version side and a variable from the indicators and attributes side.
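Cell formation for a fixed table can be sketched as follows. The `Cell` class, the `build_cells` function and the `variable_for` callback are hypothetical names for illustration; how ISDMS actually stores the link between cells and variables is not stated in the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Smallest data unit: a (row, column) position linked to a variable."""
    chapter: str
    row_code: str
    column_code: str
    variable: str


def build_cells(chapter, row_codes, column_codes, variable_for):
    """Create one cell per row/column combination of a fixed table.

    variable_for is a caller-supplied function (row_code, column_code) -> variable
    name, standing in for the link to the indicators and attributes side.
    """
    return [
        Cell(chapter, row, col, variable_for(row, col))
        for row in row_codes
        for col in column_codes
    ]
```

For a chapter with two rows and two columns this yields four cells, each carrying both its position in the questionnaire and the variable it feeds.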



Structure of agricultural statistics questionnaire (example - fixed table)

Questionnaire header: name of questionnaire, index, code; respondent's (object) code, name and address; period (year, quarter, month); name of chapter.

Metadata repository: common table of statistical indicators, table of attributes (classifications) and table of created variables.

[Diagram labels: Indicators; Attributes; INDICATOR 1 + ATTRIBUTE; VARIABLE 1; CELL [2010,1]]


1. Data matrix - Fixed number of Rows (3) and variable number of Columns (n)

(Example) Main economic indicators of economic activity

Row heading | Row's code | Total | Name 1 | Name 2 | ... | Name n-1 | Name n
A | B | 9999 | ISIC 1 code | ISIC 2 code | ... | ISIC n-1 code | ISIC n code
Number of employees | 1110 | | | | | |
Net turnover | 1120 | | | | | |
Other income | 1130 | | | | | |


2. Data matrix - Fixed number of Columns (3) and variable number of Rows (n)

(Example) Production of products

Name of production | Product code (HS or SITC) | Produced, in natural units | Sold, in natural units | Income in USD
A | B | 1 | 2 | 3
Product 1 | 1234567 | | |
Product 2 | 2345678 | | |
... | ... | | |
Product n-1 | 4567890 | | |
Product n | 5678901 | | |


Creating of variables

INDICATOR + ATTRIBUTES (CLASSIFICATIONS) = VARIABLES

ATTRIBUTES (CLASSIFICATIONS): dimensions (vectors) of indicators

Example:

Number of employees + no attribute = Number of employees, total

Number of employees + Local kind of activity (ISIC) = Number of employees in breakdown by kind of activity

Number of employees + Regional code = Number of employees in breakdown by regions
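The rule "indicator + attribute = variable" can be written down as a small sketch. The `Variable` type and the `create_variables` helper are illustrative assumptions; only the naming pattern of the resulting titles is taken from the example above.

```python
from typing import NamedTuple, Optional


class Variable(NamedTuple):
    """A variable = indicator + optional attribute (classification)."""
    indicator: str
    attribute: Optional[str] = None

    def title(self):
        """Human-readable title following the naming pattern of the example."""
        if self.attribute is None:
            return f"{self.indicator}, total"
        return f"{self.indicator} in breakdown by {self.attribute}"


def create_variables(indicator, attributes):
    """Create the total variable plus one variable per attribute."""
    return [Variable(indicator)] + [Variable(indicator, a) for a in attributes]
```

Calling `create_variables("Number of employees", ["kind of activity", "regions"])` gives three variables whose titles match the three results in the example.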


Dimensions of objects and indicators (example)

Main dimensions (vectors) of respondents (objects O(t)):

  • MAIN KIND OF ACTIVITY (ISIC)

  • REGIONS (Code)

  • OWNERSHIP AND ENTREPRENEURSHIP (Code)

  • EMPLOYEES GROUP (Code)

  • TURNOVER GROUP (Code)

INDICATOR: Number of employees in breakdown by regions

[Diagram label: Dimensions (vectors) of indicator]



Metadata base link with Microdata and Macrodata bases

[Diagram: META DATA BASE (REPOSITORY) linked with the MICRO DATABASE and the MACRO DATABASE, with IMPORT / EXPORT]

Elements shown in the metadata base:

  • General description of survey

  • Selecting indicators

  • Selecting attributes

  • Description of survey version

  • Creating of variables

  • Description of chapters (data matrix)

  • Description of rows and columns

  • Linking variables to cells

  • Generation of the data entry form (automatically)

  • Defining of data aggregation rules

  • Data aggregation function (automatically)


Data entry and validation

[Diagram: data flow for data entry and validation]

  • Data bases: META DATA BASE, BUSINESS REGISTER, RAW DATA BASE, RAW Web DATA BASE, MICRO DATA BASE

  • Descriptions: description of validation rules; description of data entry forms

  • Processes: creating list of respondents; data import from files; standard data entry and validation; mass data entry; Web data entry and validation; data validation; Web data validation; full data validation; data transfer to Microdata Base

  • Firewall
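Validation rules held as data rather than as survey-specific code might look like the sketch below. The presentation mentions a special pseudo-language for such descriptions but does not show it, so the rule format used here (a cell compared with the sum of other cells) and all names (`OPS`, `RULES`, `validate`) are purely assumptions for illustration.

```python
import operator

# Comparison operators a rule may use (assumed subset).
OPS = {"==": operator.eq, "<=": operator.le, ">=": operator.ge}

# Hypothetical rule format: one cell compared with the sum of other cells.
# An empty "right" list sums to 0, so R2 means "a must not be negative".
RULES = [
    {"id": "R1", "left": "total", "op": "==", "right": ["a", "b"]},
    {"id": "R2", "left": "a", "op": ">=", "right": []},
]


def validate(record, rules):
    """Return the ids of the rules that the record (cell code -> value) violates."""
    failed = []
    for rule in rules:
        left = record[rule["left"]]
        right = sum(record[code] for code in rule["right"])
        if not OPS[rule["op"]](left, right):
            failed.append(rule["id"])
    return failed
```

Because the rules are plain data, the same `validate` function could serve standard, mass and Web data entry alike, with only the rule list differing per survey.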


RESULTS ACHIEVED

To date, 67 different surveys have been implemented within the Metadata Driven Integrated Statistical Data Processing and Dissemination System.

The response rate of Web data collection has reached 30% for some surveys.

The system has been presented at the following conferences:

- ISIS 2002, April 2002, Geneva,

- METANET Project Meeting, Samos, Greece, May 2003,

- AMRADS Final Conference, Rome, Italy, November 2003,

- MSIS 2004, May 2004, Geneva, Switzerland,

- “Statistics - investment in the future”, Prague, September 2004,

- “Development of the State Statistical System”, Yalta, Ukraine, September 2004.


LESSONS LEARNED

  • The design of a new information system should be based on the results of a deep analysis of the statistical processes and data flows.

  • Clear objectives have to be set, discussed and approved by all parties involved:

    • Statisticians

    • IT personnel

    • Administration


LESSONS LEARNED

  • Both parties, statisticians and IT specialists, should be involved from the very beginning in the design and implementation of a metadata driven integrated statistical information system.

  • Both parties must have a clear understanding of all the statistical processes that will be covered by the system, as well as of the meaning and role of metadata within the system from the production and user sides.


LESSONS LEARNED

  • The initiative to move from the classical stove-pipe production approach to a process-oriented one has to come from the statisticians, not from IT personnel or the administration.

  • Motivating the statisticians to move from the existing data processing environment to the new one is essential.

  • Improving knowledge about metadata is one of the most important tasks throughout the design and implementation phases of the project.


LESSONS LEARNED

  • A clear division of tasks and responsibilities between statisticians and IT personnel is the key to successful implementation.

  • To achieve the best performance of the entire system, it is important to organize the execution of the statistical processes in the right sequence.

  • The design of new surveys, and of questionnaires in particular, as well as changes to existing ones, should be done in accordance with the system requirements.


LESSONS LEARNED

As a result of the feasibility study, we clearly understood that some steps of statistical data processing defy standardization across surveys: each survey may require complementary functionality (non-standard procedures) that is needed only for processing that particular survey's data.

To solve the problems with non-standard procedures, interfaces for data export from and import to the system have been developed, so that standard statistical data processing software packages and other generalized software available on the market can be used.
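An export/import interface of the kind mentioned for non-standard procedures could, in its simplest form, exchange data as delimited text that external packages can read. The presentation does not say which file formats ISDMS uses, so CSV, the column names in `FIELDS`, and both function names are assumptions for this sketch.

```python
import csv
import io

# Assumed column layout of the exchange file.
FIELDS = ["respondent", "variable", "period", "value"]


def export_csv(rows):
    """Serialize rows (dicts keyed by FIELDS) to CSV text for external software."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def import_csv(text):
    """Read CSV text back into rows, converting the value column to float."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row, value=float(row["value"])) for row in reader]
```

A survey-specific step can then be run in an external package on the exported file, and its result loaded back with `import_csv` for the remaining standardized steps.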


LESSONS LEARNED

It is necessary to establish and train a special group of statisticians who will maintain the metadata base and be responsible for the accuracy of the metadata.

For the administration and maintenance of the system, it is necessary to have well-trained IT staff familiar with MS SQL Server 2000 administration, MS Analysis Services, other MS tools, the PC-Axis family of products, and the system's data model and applications.


Thank you for your attention!

Karlis Zeila = [email protected]

http://www.csb.lv

