SC 32/WG 2 Tutorial: Metadata Registry Standards, July 16, 2007
SC 32/WG 2 Tutorial: Metadata Registry Standards, July 16, 2007. JTC1 SC32 N1649. Bruce Bargmeyer, University of California, Berkeley and Lawrence Berkeley National Laboratory. Tel: +1 510-495-2905 [email protected] Topics: Standards development: OMG, ISO (TC 37 & JTC 1/SC 32), W3C, OASIS


SC 32/WG 2 Tutorial

Metadata Registry Standards

July 16, 2007

JTC1 SC32 N1649

Bruce Bargmeyer

University of California, Berkeley

and

Lawrence Berkeley National Laboratory

Tel: +1 510-495-2905

[email protected]


Topics

  • Standards development: OMG, ISO (TC 37 & JTC 1/SC 32), W3C, OASIS

    • Align, Coordinate, Integrate: Standards, Recommendations, Specifications

  • Semantics Challenges and Future Directions


Align, Coordinate, Integrate Standards

WG 2 doing OK internally: 24707, 11179 (E3), 19763, 20944


Align, Coordinate, Integrate Standards

SC 32? WG 1, WG 2, WG 3, WG 4

Clearwater meeting: a step forward


Align, Coordinate, Integrate Standards/Recommendations/Specifications for Semantic Computing

[Figure: triangle of meaning with the labels CONCEPT, Referent, Refers To, Symbolizes, and Stands For; example symbols “Rose” and “ClipArt Rose”. Surrounding labels: Users; Metadata Registry; Terminology, Thesaurus, Taxonomy; Ontology; Data Standards; Structured Metadata; 11179 Metadata Registry.]

[Figure: four areas and their standards bodies: Semantic Web (W3C; RDF), Terminology (ISO TC 37), Object Management (OMG; MOF, ODM, CWM, IMM), and ISO/IEC 11179 Metadata Registries (ISO/IEC JTC 1/SC 32). A shared graph view is labeled Subject (Node), Predicate (Edge), Object (Node).]


Standards Development: Semantics Management and Semantics Services – Semantic Computing

Align, Co-develop, Fast Track, PAS Submission …

OMG, W3C, ISO/IEC JTC 1 SC 32, ISO TC 37


Standards Development: Semantics Management and Semantics Services – Semantic Computing

Align, integrate, co-develop, Fast Track, PAS Submission …

Can we coordinate content?

OMG, ISO/IEC JTC 1 SC 32, W3C


A Success

Some text and figures are identical in the two standards.

  • ISO/IEC JTC 1 SC 32: ISO/IEC 24707 – Common Logic

  • OMG: Ontology Definition Metamodel (ODM)


Standards Development: Semantics Management and Semantics Services – Semantic Computing

Ongoing effort

ISO/IEC JTC 1 SC 32: ISO/IEC 11179 (Edition 3)


Standards Development: Semantics Management and Semantics Services – Semantic Computing

Possible effort

OMG: RFP - MOF? IMM

11179 E3 proposals


Standards Development: Semantics Management and Semantics Services – Semantic Computing

Hopeful?

OMG IMM & ISO/IEC JTC 1 SC 32 ISO/IEC 11179 (Edition 3)


Other Possibilities

  • OASIS ebXML Registry

  • W3C Semantic Web Deployment WG

  • TC 37


The Ageless Information Problem (cf. Data, Information, Knowledge, Wisdom)

Getting the information that we need, when we need it, without afflicting the excellent minds of humans with toil and drudgery

The litany:

  • Too much or too little, irrelevant, not authoritative, out of date

  • Unknown quality, not trustable, lacks provenance, no certainty measures

  • Difficult to find, difficult to access, difficult to use

  • Meaning not clear, relationship to other information not clear

  • Data creators do not have the same understanding of the data as end users

  • Recorded data loses much real world meaning, context, relationships

  • Much of the meaning of data is buried in the processes used to manipulate the data (e.g., in computer code)

  • Need improvements in efficiency and effectiveness

    Every time we solve it, we re-create it.


New Semantics Capabilities Proposed for ISO/IEC 11179 MDR (Edition 3)

  • Improve traditional data management/data administration

    • Use stronger semantics management and semantics services capabilities

  • Enable something new

    • Semantic computing


Semantic Computing: The Nub of It

  • Processing that takes “meaning” into account

    • Makes use of concept systems, e.g., thesauri and/or ontologies

    • Moves some of the “meaning” of data from computer code to managed semantics

  • Processing that uses (e.g., reasons across) the relations between things, not just computing about the things themselves

  • Processing that helps to take people out of the computation, reducing the human toil

    • Semantics “grounding” for data, data discovery, extraction, mapping, translation, formatting, validation, inferencing, …

  • Delivering higher-level results that are more helpful for the user’s thought and action


In the Epic Information Struggle, We Have Made Heroic Progress

[Figure: Files on Cards, Tape, and Disk; Machine Processing; Computer Processing]


In the Epic Information Struggle, We Have Made Heroic Progress

In structuring data and text:

  • Structured Data

    • Columns on cards & tape (possibly comma separated)

    • Hierarchical (DBMS)

    • Network

    • Table (relational DBMS)

    • Hierarchy (XML)

    • Graph (RDF)

  • Semi-structured text

    • nroff, troff, LaTeX …

    • SGML

    • HTML

    • XML


In the Epic Information Struggle, We Have Made Heroic Progress

In documenting data and text (e.g., semantics management):

  • Data Standards

    • Code sets

  • (Meta)Data Standards

    • Data element definitions, valid values, value meanings

    • Metadata registries (MDR, ISO/IEC 11179)

    • Other standards as presented at this conference

  • Concept systems (or KOS)

    • Glossaries

    • Dictionaries

    • Thesauri

    • Taxonomies

    • Ontologies

    • Graphs


Semantic Management: Proposals for 11179 Edition 3

  • Improve data management through use of stronger semantics management

    • Databases

    • XML data

    • Other “traditional” data

  • Enable new wave of semantic computing

    • Take meaning of data into account

    • Process across relations as well as properties

    • May use reasoning engines, e.g., to draw inferences


Semantic Computing Application: Find and process non-explicit data

For example…

Patient data on drugs contains brand names (e.g., Tylenol, Anacin-3, Datril, …); however, we want to study patients taking analgesic agents.

[Figure: concept hierarchy linking brand names to drug classes]

  • Analgesic Agent

    • Non-Narcotic Analgesic

      • Analgesic and Antipyretic

        • Acetaminophen

          • Datril, Tylenol, Anacin-3

      • Nonsteroidal Antiinflammatory Drug
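The brand-name example can be sketched in a few lines of Python. This is only an illustration: the child-to-parent links are an assumption read off the figure labels on this slide (not an authoritative drug classification), and the patient records and the names `BROADER`, `is_a`, and `PATIENT_DRUGS` are made up for the sketch.

```python
# Child -> parent ("is a kind of") links. The shape of this hierarchy is an
# assumption read off the slide's labels, not an authoritative classification.
BROADER = {
    "Tylenol": "Acetaminophen",
    "Anacin-3": "Acetaminophen",
    "Datril": "Acetaminophen",
    "Acetaminophen": "Analgesic and Antipyretic",
    "Analgesic and Antipyretic": "Non-Narcotic Analgesic",
    "Nonsteroidal Antiinflammatory Drug": "Non-Narcotic Analgesic",
    "Non-Narcotic Analgesic": "Analgesic Agent",
}


def is_a(term, ancestor):
    """Return True if `term` equals `ancestor` or lies below it in BROADER."""
    seen = set()
    while term is not None and term not in seen:
        if term == ancestor:
            return True
        seen.add(term)              # guard against accidental cycles
        term = BROADER.get(term)    # walk one step up the hierarchy
    return False


# Made-up patient records: (patient id, drug name as recorded).
PATIENT_DRUGS = [("p1", "Tylenol"), ("p2", "UnlistedBrand"), ("p3", "Datril")]

# The data never says "analgesic"; the concept system supplies that meaning.
analgesic_patients = [
    pid for pid, drug in PATIENT_DRUGS if is_a(drug, "Analgesic Agent")
]
print(analgesic_patients)  # ['p1', 'p3']
```

The point of the sketch is that the selection criterion ("analgesic agent") lives in the managed concept system rather than in application code, so adding a new brand only requires a new link.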


A Semantics Application: Specify and compute across Relations, e.g., within a food web in an Arctic ecosystem

An organism is connected to another organism for which it is a source of food energy and material by an arrow representing the direction of biomass transfer.

Source: http://en.wikipedia.org/wiki/Food_web#Food_web (from SPIRE)
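Computing across relations of this kind amounts to graph traversal. The sketch below assumes a tiny, made-up food web (the organisms and links are illustrative, not taken from the SPIRE data) in which each edge points in the direction of biomass transfer, and it finds every organism that depends, directly or indirectly, on a given food source.

```python
from collections import deque

# Edges follow the direction of biomass transfer: food source -> consumers.
# The organisms and links are illustrative assumptions, not SPIRE data.
EATEN_BY = {
    "phytoplankton": ["zooplankton"],
    "zooplankton": ["arctic cod"],
    "arctic cod": ["ringed seal", "seabird"],
    "ringed seal": ["polar bear"],
}


def downstream_consumers(source):
    """Return all organisms reachable from `source` along biomass-transfer edges."""
    seen = set()
    queue = deque([source])
    while queue:
        organism = queue.popleft()
        for consumer in EATEN_BY.get(organism, []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen


print(sorted(downstream_consumers("arctic cod")))
# ['polar bear', 'ringed seal', 'seabird']
```

A breadth-first search is used here only because it is short and handles shared prey without revisiting nodes; a reasoning engine over registered relations would answer the same question.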


Semantics Application: Combine Data, Metadata & Concept Systems

Inference Search Query:

“find water bodies downstream from Fletcher Creek where chemical contamination was over 10 micrograms per liter between December 2001 and March 2003”

[Figure: three panels labeled Concept system, Data, and Metadata. The concept system shows Contamination with the kinds Biological, Radioactive, and Chemical; Chemical covers mercury, lead, and cadmium.]
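A rough sketch of how such a query draws on all three sources follows. Only Fletcher Creek, the three chemical contaminants, the 10 micrograms per liter threshold, and the date range come from the slide; the downstream water bodies, the measurements, the unit table, and all function names are hypothetical.

```python
from datetime import date

# Concept system (from the slide): kinds of chemical contamination.
CHEMICAL = {"mercury", "lead", "cadmium"}

# Relation between water bodies (hypothetical): each flows into the next.
FLOWS_INTO = {
    "Fletcher Creek": "Hypothetical River",
    "Hypothetical River": "Hypothetical Lake",
}

# Metadata (hypothetical): factor that converts a recorded unit to ug/L.
UNIT_TO_UG_PER_L = {"ug/L": 1.0, "mg/L": 1000.0}

# Data (made up): (water body, analyte, value, unit, sample date).
MEASUREMENTS = [
    ("Hypothetical River", "lead", 0.02, "mg/L", date(2002, 5, 1)),
    ("Hypothetical Lake", "mercury", 4.0, "ug/L", date(2002, 6, 1)),
    ("Hypothetical Lake", "cadmium", 12.0, "ug/L", date(2004, 1, 1)),
    ("Hypothetical Lake", "radium", 15.0, "ug/L", date(2002, 7, 1)),
    ("Other Creek", "lead", 50.0, "ug/L", date(2002, 5, 1)),
]


def downstream_of(start):
    """Follow FLOWS_INTO from `start` and list every water body below it."""
    found = []
    body = FLOWS_INTO.get(start)
    while body is not None and body not in found:
        found.append(body)
        body = FLOWS_INTO.get(body)
    return found


def chemically_contaminated(start, threshold_ug_per_l, first_day, last_day):
    """Water bodies downstream of `start` with a chemical reading over the threshold."""
    targets = set(downstream_of(start))
    hits = set()
    for body, analyte, value, unit, day in MEASUREMENTS:
        in_scope = body in targets and analyte in CHEMICAL
        in_period = first_day <= day <= last_day
        if in_scope and in_period and value * UNIT_TO_UG_PER_L[unit] > threshold_ug_per_l:
            hits.add(body)
    return sorted(hits)


result = chemically_contaminated(
    "Fletcher Creek", 10.0, date(2001, 12, 1), date(2003, 3, 31)
)
print(result)  # ['Hypothetical River']
```

Each filter relies on a different source: "downstream" on a relation, "chemical" on the concept system, and "micrograms per liter" on unit metadata, which is why the lead reading recorded in mg/L still matches.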


Challenge: Use data from systems that record the same facts with different terms

[Figure: registries and repositories that overlap in Common Content: Dublin Core Registries, Software Component Registries, Database Catalogs, ISO 11179 Registries, UDDI Registries, OASIS/ebXML Registries, CASE Tool Repositories, and Ontological Registries. The same content (e.g., Country Identifier) appears under different terms: Table Column, Data Element, Business Specification, XML Tag, Attribute, Business Object, Coverage, Term Hierarchy.]


Same Fact, Different Terms

Data Element Concept

  • Name: Country Identifiers

  • Unique ID: 5769

  • Other attributes shown: Context, Definition, Conceptual Domain, Maintenance Org., Steward, Classification, Registration Authority, Others

  • Conceptual domain values: Algeria, Belgium, China, Denmark, Egypt, France, . . ., Zimbabwe

Data Elements

  • Unique ID: 4572

  • Other attributes shown: Name, Context, Definition, Value Domain, Maintenance Org., Steward, Classification, Registration Authority, Others

  • Value domains (all ISO 3166):

    English Name   French Name   2-Alpha Code   3-Alpha Code   3-Numeric Code
    Algeria        L'Algérie     DZ             DZA            012
    Belgium        Belgique      BE             BEL            056
    China          Chine         CN             CHN            156
    Denmark        Danemark      DK             DNK            208
    Egypt          Egypte        EG             EGY            818
    France         La France     FR             FRA            250
    . . .          . . .         . . .          . . .          . . .
    Zimbabwe       Zimbabwe      ZW             ZWE            716
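Once the representations of one data element concept are registered side by side, mapping between them is a lookup. The following minimal sketch uses only the rows shown on this slide; choosing the 2-alpha code as the canonical form, and the names `ROWS`, `LOOKUP`, and `normalize_country`, are assumptions made for the illustration.

```python
# Rows from the slide: English name, French name, 2-alpha, 3-alpha, 3-numeric.
ROWS = [
    ("Algeria", "L'Algérie", "DZ", "DZA", "012"),
    ("Belgium", "Belgique", "BE", "BEL", "056"),
    ("China", "Chine", "CN", "CHN", "156"),
    ("Denmark", "Danemark", "DK", "DNK", "208"),
    ("Egypt", "Egypte", "EG", "EGY", "818"),
    ("France", "La France", "FR", "FRA", "250"),
    ("Zimbabwe", "Zimbabwe", "ZW", "ZWE", "716"),
]

# Index every representation to one canonical form (here, the 2-alpha code).
LOOKUP = {}
for row in ROWS:
    for representation in row:
        LOOKUP[representation.casefold()] = row[2]


def normalize_country(value):
    """Map any registered representation to the 2-alpha code, or None if unknown."""
    return LOOKUP.get(value.strip().casefold())


print(normalize_country("DZA"))       # DZ
print(normalize_country("Belgique"))  # BE
print(normalize_country("156"))       # CN
```

Two systems that record "FRA" and "La France" can then be joined on the shared canonical value without either system changing its stored data.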



A Semantics Application: Information Extraction and Use

[Figure: sources (studies, databases, reports, etc.); an Extraction Engine with the steps Segment, Classify, Associate, Normalize, and Deduplicate; analysis steps Discover patterns, Select models, Fit parameters, Inference, and Report results; outputs Actionable Information and Decision Support; supported by the 11179-3 (E3) XMDR.]


Metadata Registries are Useful

Registered semantics

  • For “training” extraction engines

  • The “Normalize” function can make use of standard code sets that have mappings between representation forms.

  • The “Classify” function can interact with pre-established concept systems.

Provenance

  • Precision is high for proper nouns but lower (e.g., 70%) for other concepts. This affects downstream processing, so precision needs to be tracked.
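One simple way to track precision is to carry it, together with the source, on every extracted item. The sketch below is an assumption-laden illustration: the 70% figure is the slide's example, the 95% figure for proper nouns is invented, and multiplying the values treats extraction errors as independent, which is a simplification rather than a method stated in the presentation.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class ExtractedFact:
    text: str         # the extracted string
    kind: str         # "proper_noun" or "other_concept"
    source: str       # provenance: where the fact was extracted from
    precision: float  # estimated precision of the extractor for this kind


# 0.70 is the slide's example; 0.95 is an invented placeholder.
ASSUMED_PRECISION = {"proper_noun": 0.95, "other_concept": 0.70}


def extract(text, kind, source):
    """Attach provenance and an estimated precision to an extracted item."""
    return ExtractedFact(text, kind, source, ASSUMED_PRECISION[kind])


def combined_precision(facts):
    """Precision of a result that needs every fact to be right (independence assumed)."""
    return prod(fact.precision for fact in facts)


site = extract("Fletcher Creek", "proper_noun", "report-A")
finding = extract("chemical contamination", "other_concept", "report-A")
joint = combined_precision([site, finding])
print(round(joint, 3))  # 0.665
```

Even in this toy case, a conclusion built on two extracted items is noticeably less reliable than either item alone, which is the downstream effect the slide warns about.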


Challenge: Gain Common Understanding of meaning between Data Creators and Data Users

[Figure: organizations (EEA, USGS, DoD, EPA, others . . .) each hold text and data organized under their own terms, in English (environ, agriculture, climate, human health, industry, tourism, soil, water, air) or Spanish (ambiente, agricultura, tiempo, salud humano, industria, turismo, tierra, agua, aero), with coded values such as 123, 345, 3268, and 0825. Between Data Creation, Information systems, and Users, the goal is a common interpretation of what the data represents.]


Practical Vocabulary Management

  • Vocabulary Management is essential for use of semantic technologies

    • Define concepts and relationships

    • Harmonize terminology, resolve conflicts

    • Collaborate with stakeholders

  • An approach

    • Select a domain of interest

    • Enter core concepts and relationships

    • Engage community in vocabulary review

    • Harmonize, validate and vet the vocabulary

    • Enter metadata describing enterprise data

    • Link concept system to metadata


Use eXtended MDR Capabilities

  • For vocabulary repository

    • Register, harmonize, validate, and vet definitions and relations

  • To register mappings between multiple vocabularies

  • To register mappings of concepts to data

  • To provide semantics services

  • To register and manage the provenance of data

    11179-3 (E3) is part of the infrastructure for semantics and data management.

    These capabilities are proposed for ISO/IEC 11179 Edition 3
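Registering mappings between vocabularies can be pictured as attaching each vocabulary's term to a shared concept. The sketch below uses a few of the English and Spanish terms from the earlier common-understanding slide; the concept identifiers, the vocabulary labels "en" and "es", and the `translate` function are hypothetical and do not represent the 11179 Edition 3 metamodel or any XMDR interface.

```python
# Shared concepts with the term each vocabulary uses for them.
# Concept ids and vocabulary labels are hypothetical.
CONCEPT_TERMS = {
    "c-environment": {"en": "environ", "es": "ambiente"},
    "c-agriculture": {"en": "agriculture", "es": "agricultura"},
    "c-industry": {"en": "industry", "es": "industria"},
    "c-tourism": {"en": "tourism", "es": "turismo"},
    "c-soil": {"en": "soil", "es": "tierra"},
    "c-water": {"en": "water", "es": "agua"},
}

# Reverse index: (vocabulary, term) -> concept id.
TERM_INDEX = {
    (vocabulary, term.casefold()): concept_id
    for concept_id, terms in CONCEPT_TERMS.items()
    for vocabulary, term in terms.items()
}


def translate(term, source_vocabulary, target_vocabulary):
    """Map a term to another vocabulary through the shared concept, if registered."""
    concept_id = TERM_INDEX.get((source_vocabulary, term.strip().casefold()))
    if concept_id is None:
        return None
    return CONCEPT_TERMS[concept_id].get(target_vocabulary)


print(translate("agua", "es", "en"))  # water
print(translate("Soil", "en", "es"))  # tierra
```

Mapping through a concept rather than term to term means that a third vocabulary only has to be mapped once, to the concepts, instead of once per existing vocabulary.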


11179 (E3) Use

  • Upside

    • Collaborative

      • Supports interaction with community of interest

      • Shared evolution and dissemination

      • Enables Review Cycle

    • Standards-based – don’t lock semantics into proprietary technology

    • Foundation for strategic data centric applications

    • Lays the foundation for Ontology-based Information Management

    • Content is reusable for many purposes

  • Downside

    • Managing semantics is HARD WORK, no matter how friendly the tools

    • Needs integration with other components


Some Challenges

  • Data management and metadata management must evolve to address more complex data structures (relational, object, hierarchies, graphs)

    • Query capabilities

      • More than SQL, XQuery, SPARQL

    • Discovery mechanisms

      • More than Google

    • Access, mining, extraction

      We need stronger semantics management


Metadata Registry Support for

  • Registering and mapping ontologies

  • Ontology Evolution

  • Registering Process Ontologies


Thank You

Bruce Bargmeyer

Lawrence Berkeley National Laboratory &

Berkeley Water Center

University of California, Berkeley

Tel: +1 510-495-2905

[email protected]

  • Acknowledgements

    • Karlo Berket, LBNL

    • Kevin Keck, LBNL

    • John McCarthy, LBNL

    • Harold Solbrig, Apelon

      This material is based upon work supported by the National Science Foundation under Grant No. 0637122, USEPA and USDOD. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, USEPA or USDOD.

