LoG: A Methodology for Metadata Registry-based Management of Scientific Data

July 5, 2002
Doo-Kwon Baik

Content

  • Motivation

  • Objectives

  • Related works

    • Overview on the MDR

  • The scientific data properties

  • User levels and the data property

  • Data visibility

  • The conceptual model of the LoG

  • A LoG Framework

  • An Example

  • Conclusions and Future work

CODATA/DSAO 2002


Motivation

  • The existing data integration approaches

    • focus only on technical research and system development

    • do not consider the properties of the domain knowledge


The Domain Knowledge

  • The domain knowledge property

    • is a very important factor in data integration

    • Many tasks and services depend on the domain knowledge properties

      • The quality degree and the quantity scope of data integration are determined by the domain knowledge property.

      • Other services, such as data services and application services, also depend on it.

[Figure: Domain knowledge at the center, linked to the quality degree of data integration, the quantity scope of data integration, data services (information providing), and application services.]


Objectives

  • The objectives of our research

    • to solve the problems of the existing data integration approaches

    • to analyze and define the domain knowledge properties

      • In this paper, we focus on scientific data.

    • to define the relationships among the domain knowledge properties, users and metadata

      • i.e., to define the considerations for data integration.

    • to create a new methodology that takes the results of the domain knowledge analysis into account

      • we call it LoG (Localization-based Global MDR methodology).

    • finally, to design a framework that is suitable for the methodology.


Related works: Bottom-up approach (1/2)

  • The existing data integration approaches are classified into the top-down approach and the bottom-up approach

  • Bottom-up approach

    • is the most general approach

    • The ontology-based methodology is representative

[Figure: Bottom-up approach. All factual databases (the number of databases = n) are analyzed, and a guideline such as a global view is designed and created from the specified databases. When new databases (the number of them = c) are added, the number of databases becomes n + c.]


Related works: Bottom-up approach (2/2)

  • Advantages

    • can achieve complete data integration, because the global guideline is created by analyzing all databases

  • Disadvantages

    • creating the global guideline takes considerable cost and time

    • is not suitable for very large scale data integration

    • provides only a static integration management mechanism

      • Whenever a new schema or a new database is added to the integrated database, the previous process must be repeated.

      • This makes cost and time grow geometrically.

    • does not provide a standardized guideline

      • i.e., the guideline depends on its domain.

      • each application domain defines and uses its own, different guidelines.


Related works: Top-down approach (1/2)

  • Top-down approach

    • aims to solve the problems of the bottom-up approach

    • MDR (ISO/IEC 11179) is representative

      • MDR is an international standard

[Figure: Top-down approach. All factual databases are analyzed, and a guideline such as a global view (metadata elements) is designed and created from the specified databases. The schemas of new databases are then defined according to the standardized guideline.]


Related works: Top-down approach (2/2)

  • Advantages

    • reduces cost

      • because the global guideline does not need to be rebuilt.

    • provides a standardized schema

      • all new databases can be built and managed consistently.

  • Disadvantages

    • It also has a high initial cost, like the bottom-up approach

      • because it requires creating a global view through analysis of all legacy databases.

      • This is hard work in the case of very large scale integration.
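
The cost difference between the two approaches can be illustrated with a small counting sketch. This is not a model from the presentation: the function names and the counting rule (one unit of work per database analyzed, databases added one at a time) are assumptions made only for illustration.

```python
def bottom_up_analyses(n: int, c: int) -> int:
    """Count database analyses when c databases are added one at a time.

    Assumption: every addition repeats the analysis over all databases
    present at that moment, including the new one.
    """
    total = n  # initial analysis of the n existing databases
    for added in range(1, c + 1):
        total += n + added  # full re-analysis after each addition
    return total


def top_down_analyses(n: int, c: int) -> int:
    """Count database analyses under the top-down approach.

    Assumption: the n legacy databases are analyzed once to build the
    guideline; new databases are designed to follow the guideline, so
    they add no analysis work in this simplified model.
    """
    return n


if __name__ == "__main__":
    print(bottom_up_analyses(10, 3))  # 10 + 11 + 12 + 13
    print(top_down_analyses(10, 3))
```

Both functions start from the same initial analysis of the n legacy databases, which matches the point that the top-down approach is also expensive initially.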


Overview on the MDR: Definition

  • Definition of MDR

    • Metadata Registry

    • A system for registering, storing and managing the specifications (metadata) of data elements

    • Evolution of ISO/IEC 11179

    • Metamodel of Data Registry: ANSI X3.285

  • Purpose

    • A metadata registry for data standardization

    • Support for data search and data specification

    • Support for data sharing among systems or organizations

    • A supporting system for creating, registering and managing data elements

    • Helps users understand the meaning, representation and identification of data


Overview on the MDR: Basic concepts

  • Data Element

    • The basic unit of data management

    • The unit that specifies the identification, context and value representation of data

  • Components of Data Element

    • Object Class: the data to be collected or stored

    • Property: the characteristics needed to identify and explain objects

    • Representation: the description of the representational form and value domain of each data element

[Figure: Data Element Concept (Object Class, Property) and Data Element (Object Class, Property, Representation), connected by relationships labeled 1:N and 1:1.]
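
The components listed above can be written down as a small data structure. This is a minimal sketch, not the ISO/IEC 11179 metamodel: the class names mirror the slide, while the field names and the sample values ("Sample", "melting point", the units) are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    """Representational form and value domain of a data element."""
    form: str          # illustrative, e.g. "decimal number"
    value_domain: str  # illustrative, e.g. "kelvin"


@dataclass(frozen=True)
class DataElementConcept:
    """An object class paired with one of its properties."""
    object_class: str
    prop: str


@dataclass(frozen=True)
class DataElement:
    """A data element concept plus one concrete representation."""
    concept: DataElementConcept
    representation: Representation

    def name(self) -> str:
        return f"{self.concept.object_class}.{self.concept.prop}"


# One concept can back several data elements (the 1:N relationship).
concept = DataElementConcept("Sample", "melting point")
in_celsius = DataElement(concept, Representation("decimal number", "degrees Celsius"))
in_kelvin = DataElement(concept, Representation("decimal number", "kelvin"))
```

Here `in_celsius` and `in_kelvin` share the same concept but are distinct data elements, because their representations differ.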


Overview on the MDR: Specification

  • Specification of Data Element

    • Basic attributes for specifying a data element


Overview on the MDR: An Example

  • Definition of a metadata element


The scientific data properties

  • Scientific data (knowledge) has the following properties:

    • the general data

      • most people can understand and use it easily.

      • most databases in the scientific fields have similar or identical data elements.

    • the specialized data

      • is more complicated and detailed.

      • general users cannot understand it.

      • the experts in a specific group are interested in the data and can use it.

        ※ Building a single MDR for all data as a whole is not necessary


User levels and the data property

  • Classification of users

    • The users are classified into two groups according to the scientific data property

      • The general users and the specialized users.

    • The general users

      • use the general data at a high level and across many fields.

    • The specialized users

      • are domain experts in a specific field.

      • use both the general data and the specialized data.

      • are further divided into more detailed fields.

        i.e., the specialized users are divided into several groups, and the experts in each group are independently interested in their own more specialized data.


Data visibility

  • Data visibility

    • The quantity and the degree of specialization of the data are divided into several levels according to the knowledge property,

    • and each level has an independent data set

[Figure: Data visibility. The whole data set is divided into sets 1 to 5. Labels: "used by all users" (set 1), "used by specialized users" (set 2), "used in independent expert domain group" (the remaining sets). User groups shown: general users, specialized users, detailed-specialized users 1 ... n.]
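
The idea that each user level sees its own data set can be sketched as a lookup. This is a minimal sketch under assumptions: the set names, the audience tags and the function `visible_sets` are hypothetical and do not come from the presentation.

```python
from typing import List, Optional

# Hypothetical data sets, each tagged with its audience:
#   None -> used by all users
#   "*"  -> used by any specialized user
#   "A"  -> used only inside expert domain group A, and so on
DATA_SETS = {
    "common": None,
    "specialized": "*",
    "group-A detail": "A",
    "group-B detail": "B",
}


def visible_sets(is_specialized: bool, group: Optional[str] = None) -> List[str]:
    """Return the names of the data sets a user can see."""
    visible = []
    for name, audience in DATA_SETS.items():
        if audience is None:
            visible.append(name)  # general data is visible to everyone
        elif is_specialized and audience in ("*", group):
            visible.append(name)  # specialized data needs expert status
    return visible


if __name__ == "__main__":
    print(visible_sets(False))      # a general user
    print(visible_sets(True, "A"))  # an expert in group A
```

A general user sees only the common set, while an expert in group A additionally sees the specialized set and the detail set of group A, but not that of group B.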


The conceptual relation diagram

[Figure: Conceptual relation diagram. General Users 1 ... n and Domain Experts 1 ... n are shown with a Global MDR and Local MDRs 1 ... m (one per domain, Domain 1 ... Domain m). Each domain i contains factual databases DB i1 ... DB in. The upper part of the diagram is labeled Generalization and Globalization; the lower part is labeled Specialization and Localization.]


The conceptual model of the LoG

[Figure: Four stacked layers: User Interface Layer, Global MDR Layer (Generalized Layer), Local MDR Layer (Specialized Layer), Factual Database Layer.]

  • The LoG methodology has four layers

    • Interface Layer

      • provides the user interface environments for all users.

    • Global MDR Layer

      • manages the global MDR for the most generalized and common data, which all users (general and specialized) access and use.

    • Local MDR Layer

      • manages the local MDRs for the specialized data that the experts use.

      • The local MDRs may form a hierarchical structure.

    • Factual Database Layer

      • manages the low-level, factual data.


A LoG Framework (1/2)

[Figure: LoG framework architecture.
  User Interface Layer: Global User Interface (General User Level Interface) and Local User Interface (Expert Level Interface).
  Global MDR Layer: General User Level Interface Agent, GMDR Agent (Registration, Classification), GMDR, GMeta Repository.
  Local MDR Layer: Expert Level Interface Agent, LMDR Agent (Registration, Classification, Authorization), LMDRs (LMDR 1 ... LMDR n), LMeta Repository (sets of actual metadata).
  Factual DB Layer: DB 11 ... DB 1n (Domain 1), DB 21 ... DB 2n (Domain 2), ..., DB m1 ... DB mn (Domain m).]


A LoG Framework (2/2)

  • Interface Layer

    • Global user interface and local user interface sub-layers

  • Global MDR layer

    • GMDR agent

      • manages the GMDR (global MDR) and the GMeta (global metadata repository).

    • GMDR (global MDR)

      • a standardized guideline for general users and experts.

      • the set of metadata elements used commonly in all databases.

    • GMeta (global metadata repository)

      • the set of actual metadata

  • Local MDR layer

    • LMDR agent

      • manages the LMDRs and the LMeta

    • LMDRs (local MDRs)

      • a standardized guideline for the specialized users.

      • a set of metadata elements that generalizes the data in each field or detailed field.
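
The split between one GMDR and several LMDRs can be sketched as a two-level registry. This is only a sketch under assumptions: the presentation does not describe an implementation, so the class `LoGRegistry`, its method names, the sample element names, and the rule that a domain lookup checks the local MDR before falling back to the global MDR are all hypothetical.

```python
from typing import Dict, Optional


class LoGRegistry:
    """A toy registry with one global MDR and one local MDR per domain."""

    def __init__(self) -> None:
        self.gmdr: Dict[str, str] = {}              # element name -> definition
        self.lmdrs: Dict[str, Dict[str, str]] = {}  # domain -> element name -> definition

    def register_global(self, name: str, definition: str) -> None:
        """Register an element that is common to all databases."""
        self.gmdr[name] = definition

    def register_local(self, domain: str, name: str, definition: str) -> None:
        """Register an element that only one expert domain uses."""
        self.lmdrs.setdefault(domain, {})[name] = definition

    def lookup(self, name: str, domain: Optional[str] = None) -> Optional[str]:
        """Find a definition; experts pass their domain, general users do not.

        Assumption: the local MDR of the given domain is checked first,
        then the global MDR.
        """
        if domain is not None:
            local = self.lmdrs.get(domain, {})
            if name in local:
                return local[name]
        return self.gmdr.get(name)


registry = LoGRegistry()
registry.register_global("substance name", "Common name of a substance")
registry.register_local("chemistry", "molar mass", "Mass of one mole, in g/mol")
```

With this setup, a general user calling `registry.lookup("molar mass")` finds nothing, while an expert calling `registry.lookup("molar mass", domain="chemistry")` gets the local definition; both can resolve "substance name" from the GMDR.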


An Example

[Figure: Example of a GMDR and its LMDRs.]


Conclusions and Future work

  • Conclusions

    • We considered and defined the domain knowledge property

    • The LoG methodology is proposed based on the knowledge property

      • partially provides a dynamic integration mechanism.

      • provides a standardization guideline based on ISO/IEC 11179, the international standard.

      • reduces the unnecessary cost of analyzing all databases to create a global view.

  • Future work

    • to analyze and define the domain knowledge property in detail

    • to implement a prototype based on the framework we described


Q / A

Thanks!