printed 7/5/2005 10:33 AM page 2 of 30. Warzel-McCarthy 20050906 file:XMDR_Presentation2.ppt. XMDR Outline. MDRs
1. eXtending MetaData Registries: XMDR Project & Prototype
XMDR Working Group
Presentation to SC 32/WG 2 meeting
and to the SC 32/WG 2 (11179 P2&3 (E3)), MMF, and OMG/ODM Liaison Meeting
September, 2005
Toronto, Canada
2. XMDR Outline
MDRs – Purpose and Goals (slides 3-8)
Differentiate - position with other ISO & non ISO standards (slides 9 – 11)
Present XMDR Collaboration & Project & Prototype Purposes & Goals (slides 12 – 33)
Describe metadata/technical platform/architecture for XMDR (slide 28-33)
Demonstrate XMDR-it (live demo? Screen snaps?)
Explain importance of XMDR-it three levels of constraints
XML, RDF, OWL (slide 34-38)
Outline current challenges and future plans (slide 40-41)
Contacts/URLs/Credits (slide 42-43)
3. In the Beginning…
Organizational structure and funding created a culture fostering stove-pipe, media-specific, heterogeneous systems
Lack of information sharing/integration
Lack of ability to aggregate information across systems
Inability to retrieve data to answer questions
Higher cost due to redundant/incompatible data maintenance
Lack of technology options to enable integration of systems and data
4. OLD Approaches to Semantic Integration and Interoperability
Option 1: “Thou Shalt”
Everyone adopts a single data model for a particular domain
GenBank, PDB, HL7 are examples of these sorts of models
Advantages:
Ensures interoperability
Minimal overhead
Disadvantages:
Not flexible
Does not allow data stores for particular use cases
Unrealistic, especially across different large organizations
Option 2: “Multi-party Agreements”
Several sites agree on a format for interchanging data
Sites maintain a local data dictionary, XML schema, etc. to describe information model
Advantages:
Flexible
Low overhead
Disadvantages:
Works only where existing bilateral (or multilateral) agreements exist
Each new node must arrange to be interoperable with all other nodes or node clusters
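The last disadvantage can be quantified. A minimal sketch, assuming every pair of sites needs its own agreement and, for comparison, that a shared registry (the approach on the next slide) needs only one mapping per site; both function names are illustrative and not part of XMDR:

```python
def pairwise_agreements(sites: int) -> int:
    """Agreements needed when every pair of sites negotiates separately."""
    return sites * (sites - 1) // 2


def registry_mappings(sites: int) -> int:
    """Mappings needed when each site maps once to a shared registry (assumption)."""
    return sites


if __name__ == "__main__":
    for n in (5, 20, 100):
        print(n, pairwise_agreements(n), registry_mappings(n))
```

Under these assumptions, 100 sites need 4,950 pairwise agreements but only 100 registry mappings.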
5. New Approaches for Semantic Integration and Interoperability
Option 3: Standards-based metadata descriptors
“Common Data Elements”
Common terms & concepts
Provide a complete description of all attributes in a systematic, uniform and unambiguous format
Description must be based on a common (but expandable) vocabulary.
Rely on concept codes, not concept names
Track quality and accessibility
Advantages:
Provides more ways to surface semantic matches – words and immutable codes
Allows new systems to find points of interoperability with all other data systems at once
Machine understandable
Stable immutable identifiers
Low barriers to entry
Disadvantages:
Requires a very complete description of the contents of an attribute.
Some degree of overhead associated with creating and maintaining a compatible system
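A minimal sketch of why concept codes surface matches that names alone would miss. The local attribute names and the codes (C100, C200, ...) are invented for illustration; they are not real terminology codes.

```python
from collections import defaultdict

# Hypothetical attribute descriptions: local attribute name -> concept code.
site_a = {"pt_sex": "C100", "dob": "C200", "dx_code": "C300"}
site_b = {"Gender": "C100", "BirthDate": "C200", "Zip": "C400"}


def match_by_concept_code(a, b):
    """Pair attributes from two systems that are bound to the same concept code."""
    by_code = defaultdict(list)
    for name, code in b.items():
        by_code[code].append(name)
    return sorted(
        (name_a, name_b)
        for name_a, code in a.items()
        for name_b in by_code.get(code, [])
    )


if __name__ == "__main__":
    print(match_by_concept_code(site_a, site_b))
```

None of the matched names share a spelling, yet two attribute pairs line up because both sites registered the same code.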
6. How Does Registered Metadata Promote Better Data Management?
Provides a model that is consistent/exchangeable for capturing data about data
Captures unambiguous semantic information in one place
Documents details on the custodian of the data
Links directly to online data
Reduces risk/cost of duplicating data collections
7. What is a metadata registry good for?
Data administration (design time)
Databases, DB applications
Messaging systems
Terminologies, Taxonomies, Ontologies
Data Integration (design + run time)
federated queries, data warehousing
Discovery of hidden relationships between data
Support for interactive users (run time)
Data entry forms, output explanation
Navigation of databases
Semantic Web Services (run time)
8. ISO/IEC 11179 MDR Standard
Used to record and link:
Data elements
Data element concepts
Conceptual Domains
Value Domains: e.g., enumerated value domains
Classification Schemes
…..
Goals:
To record the unambiguous meaning of data elements
Human Understandable: Current paradigm is natural language definitions
Machine Understandable: Formal definitions (and axioms) coming in Edition 3 (?)
9. ISO/IEC 11179 Metadata Registry Standard
Spans both:
Conceptual models of the real world:
Concepts, data element concepts, classification schemes
Terminologies, taxonomies, ontologies
Information Artifacts
Data elements, enumerated values, ...
UML models (e.g., in caDSR)
10. Conceptual vs. Information Centric Metadata Standards
11. Space of Metadata Standards
12. Example Users of ISO/IEC 11179 Metadata Registries
U.S. EPA: Environmental Data Registry (EDR)
System of Registries – need to register lots of “things”
National Cancer Institute (NCI): Cancer Data Standards Registry (caDSR)
Registration of data elements and domain object models
U.S. Veterans Health Administration
U.S. Census Bureau
U.S. Bureau of Labor Statistics
Data Element Concepts, Value Domains
Statistics Canada
Australian Health Administration
Data Elements
European Environment Agency …
13. Evolution of ISO/IEC 11179
Edition 2 used to register Data Elements:
Classification Schemes, Data Element Concepts, Value domains, Object Class, Property, etc…
Data element = Data element concept + Value Domain (representation)
Representation = data type or code set (enumerated list)
Concepts as optional
Supported Object Class
Not widely utilized
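The composition rule on this slide (data element = data element concept + value domain) can be sketched as a small data model. This is a simplification under stated assumptions: the class and field names are illustrative and do not reproduce the ISO/IEC 11179 metamodel attributes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DataElementConcept:
    object_class: str    # e.g., "Country"
    property_name: str   # e.g., "Identifier"


@dataclass(frozen=True)
class ValueDomain:
    datatype: str
    # Enumerated code set, or None when the domain is described by datatype only.
    permitted_values: Optional[Tuple[str, ...]] = None

    def accepts(self, value: str) -> bool:
        return self.permitted_values is None or value in self.permitted_values


@dataclass(frozen=True)
class DataElement:
    concept: DataElementConcept   # the meaning
    value_domain: ValueDomain     # the representation


# One concept, two representations (sample values only).
country = DataElementConcept("Country", "Identifier")
country_code = DataElement(country, ValueDomain("string", ("CA", "JP", "US")))
country_name = DataElement(country, ValueDomain("string"))
```

Keeping the concept separate from the representation lets two data elements be recognized as semantically equivalent even when their value domains differ.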
14. Challenges of ISO/IEC 11179 Metadata Registries
Need to easily retrieve semantically related items
Need to Support Discovery
Enable navigation within and among taxonomies
Even when producer and consumer do not share common taxonomy
Current Classification Scheme Administered Item not sufficient for registration of taxonomies, ontologies, etc.
Need for consistency and richer metamodel
Complex Relationships between items
15. Many taxonomies
Purpose of registering taxonomies and ontologies along with data elements, values, etc. in one MDR
Provide visibility to manage and consume
Reuse
Harmonization
Support Discovery
Enable navigation within and among taxonomies
Even when producer and consumer do not share common taxonomy
Ensure availability
From the perspective of the CES Metadata Working Group, the primary purpose of a taxonomy is to aid in the discovery process. In an academic paper, we list subject keywords that don’t necessarily appear in the paper to improve the odds that the paper will be discovered by someone who is not familiar with the terminology used in the paper. In that case, the keywords do not specify a context or namespace, so the terms may be ambiguous or overloaded. The Taxonomy Focus Group (of the CES Metadata Working Group) has developed a DoD Core Taxonomy and framework whereby COIs may create (and distinguish) their own taxonomies that can be used by consumers and producers as part of the discovery process.
16. Example: XML Management Challenge
XML: One Language, Many Vocabularies
XML provides a standard syntax so we can compare and mediate, even when the values are identical. Some fault XML, citing a concern that each developer is creating unique XML vocabularies, even for the same data representation. While that may be true in some circumstances, it is also true that with XML, we now have a common syntax for comparing and migrating representations. The common syntax makes it easier to see the heterogeneity and resolve it or mediate.
17. *Example of Semantic Integration for Interoperability
18. *Concept Use and Integration with 11179 Part 3, Edition 2
We talked about a Data Element being formed by a concept taking on a specific representation. In ISO 11179 terms this translates to the combination of a specific Data Element Concept and a specific Value Domain; you see this denoted in the middle of this chart by the yellow box outlined in red. caDSR administered items are backed by the use of externally defined terminologies and controlled vocabularies. With UMLS as a framework, NCI has developed vocabulary services that are accessed via APIs (application-to-application interfaces) to provide touchpoints during creation of content, resulting in administered components that are bound to immutable concept codes. These touchpoints, denoted by the EVS logos, are currently implemented at the Object Class, Property, Representation Term, Value Domain and Valid Value levels of the metadata model.
19. Semantic Framework: NCI Example
The three key components of caCORE are the Enterprise Vocabulary Services, figuratively and literally the foundation of caCORE;
the cancer bioinformatics objects, caBIO, providing biomedical data objects implemented in a robust extendable architecture;
and the cancer Data Standards Repository, caDSR, providing, among other things, a semantic bridge between the data elements in registered data objects and standard vocabularies and ontologies.
NCICB-developed caCORE components are distributed under open-source licenses that support unrestricted usage by both non-profit and commercial entities, downloadable from the NCICB web site.
The next question is, of course, how do all of these things work together?
20. Where have we been? Where are we now? … & where are we planning to go?
21. What is XMDR? eXtended MetaData Registries
A set of collaborative initiatives by groups with shared goals
extend the ISO/IEC 11179 metadata registry standard (XMDR-s)
EPA, NCI, DOD, LBNL, Mayo Clinic, USGS, Ecoterm, UNEP, GBIF
align & harmonize various related metadata standards (XMDR-h)
ISO WG2: 11179 (MDR), 19763 (MMF), 20944 (Metadata Interoperability and Bindings), 24707 (Common Logic); OMG: ODM, CWM
(several of the above groups have members on these committees)
An open source implementation & testbed (XMDR-it) to
assemble & test metadata from diverse sources & structures
e.g., terminologies, ontologies, etc. for health, environment, geography, …
explore emerging semantic technologies (e.g., RDF, OWL, CL, …)
demonstrate new capabilities
e.g., ontology lifecycle management & harmonization
22. Why do we need metadata registry extensions?
…in order to
Enhance capabilities to capture and retrieve semantics of information artifacts (e.g., data elements and value domains) in metadata registries using terminologies, taxonomies, ontologies, etc. …
Improve representation of relationships between data (e.g., objects, data elements & domains) and concept structures (ontologies, taxonomies, thesauri, terminologies, …)
Register complex semantic metadata (concept structures, terminologies) in more formal, systematic ways (e.g., description logic) to facilitate machine processing for
creating and managing names, definitions, terms, etc.
linking together data elements, etc. across multiple systems
discovering relationships among data elements & terms
23. XMDR Semantic Extensions Goals
Sharable data that can easily be identified and aggregated across organizations
Unambiguous metadata characteristics to convey semantic, syntactic and lexical meaning
Human AND Machine understandable
Registration and management of everything useful for administering and managing data, including concept systems, ontologies, etc.
Machine understanding of semantics to facilitate inference, aggregation, and agent services
24. Goals of the open source XMDR-it prototype implementation testbed
Demonstrate feasibility & utility of proposed revisions to ISO/IEC 11179
Provide open-source reference implementation with XMDR capabilities
Determine the necessary features to leverage semantic interoperability between ‘concept’ systems and ‘data elements’
e.g., for ontology lifecycle management & harmonization
Explore benefits of representing XMDR content using emerging semantic technologies (e.g., RDF, OWL, CL, …)
integrate open source tools to create, maintain, deploy XMDR standards
test capabilities and performance of candidate tools
Assemble semantic metadata with different structures from diverse sources to test various semantic technologies
terminologies, thesauri, ontologies, …
From health, environment, geography, …
Help resolve registration & harmonization issues for different metadata standards, including ODM & MMF
25. Role of terminologies and ontologies in metadata registries
Sources for concepts, concept definitions, object classes, properties, value meanings, external references
Terminologies as classification schemes (e.g., taxonomies)
Ontologies to specify semantic relationships
is-a, part-of, instance-of, …
inheritance permits more compact definitions
semantic pathways for indexing
facilitates searching subclasses & inverses
Frameworks for integration of multiple schemas …
Help connect metadata entities via shared terms
via automatic indexing of metadata words
via text values from specific metadata elements
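A minimal sketch of connecting metadata entities via shared terms through automatic word indexing. The entity identifiers are invented, and the sample texts borrow example names used elsewhere in this deck; this is not the XMDR-it indexer.

```python
from collections import defaultdict
from itertools import combinations


def shared_term_links(entities):
    """Map each pair of entity ids to the set of words their texts share."""
    index = defaultdict(set)  # word -> ids of entities whose text contains it
    for entity_id, text in entities.items():
        for word in text.lower().split():
            index[word].add(entity_id)
    links = defaultdict(set)
    for word, ids in index.items():
        for a, b in combinations(sorted(ids), 2):
            links[(a, b)].add(word)
    return dict(links)


# Hypothetical registry content.
entities = {
    "DE-1": "Country Name",
    "DEC-1": "Country Label",
    "VD-1": "countries of the world",
}

if __name__ == "__main__":
    print(shared_term_links(entities))
```

Plain word matching links "Country Name" to "Country Label" but misses "countries"; bridging such variants is where a terminology or ontology with explicit relationships would come in.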
26. What the XMDR Project IS NOT!
An attempt to turn 11179 metadata registries into a development and maintenance facility for every type of concept structure
An attempt to standardize the complete range of terminology and ontology data & services
Production implementation for one organization
[any other things we want to disavow?]
27. XMDR-it example content has been loaded from diverse sources via LexGrid & XSLT
28. Additional Metadata Content will be added to XMDR Prototype
EDR (EPA Environmental Data Registry)
caDSR (NCI Cancer Data Standards Registry)
IETF RFC 3066 Language Codes
NBII Biocomplexity Thesaurus
USGS Geographic Names Information System
Getty Thesaurus of Geographic Names
I.T.I.S. - Integrated Taxonomic Information System
Adult Mouse Anatomy
Foundational Model of Anatomy
NASA SWEET (Semantic Web Earth & Environmental Terminologies)
EPA Chemical Substance Registry
GO (Gene Ontology), …, AGROVOC
29. XMDR-it now contains an XML file for each 11179 “item”
Context for Administered Items e.g., XMDR?
Concept Systems e.g., GEMET, DTIC
Data Elements e.g., Country Name
Data Element Concepts e.g., Country Label
Conceptual Domains e.g., Countries of the World
Representation Classes e.g., Code
Value Domains e.g., countries of the world
Relationship Types e.g., ??
30. Each metadata entity (object, concept, data element) is
Logically stored as a separate XML “file/document”
Stored in Subversion code management system
provides a versioning capability
stores “files” in Berkeley DB
Berkeley DB provides transactions, backups, ...
Compliant with three complementary standards:
An XML Schema (document constraints)
An RDF Schema (graph constraints)
OWL ontology
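A minimal sketch of the "one XML document per metadata entity" idea, using Python's standard library. The element names, attribute names, and identifiers are assumptions made for illustration; they are not taken from the XMDR-it schema.

```python
import xml.etree.ElementTree as ET


def entity_to_xml(item_id, item_type, name, refs):
    """Serialize one registry item as a standalone XML document string.

    `refs` maps a role (hypothetical element name) to the id of another item,
    so links between documents are by identifier rather than by nesting.
    """
    root = ET.Element(item_type, {"id": item_id})
    ET.SubElement(root, "name").text = name
    for role, target_id in sorted(refs.items()):
        ET.SubElement(root, role, {"ref": target_id})
    return ET.tostring(root, encoding="unicode")


# Hypothetical data element pointing at its concept and value domain.
doc = entity_to_xml(
    "DE-1",
    "DataElement",
    "Country Name",
    {"dataElementConcept": "DEC-1", "valueDomain": "VD-1"},
)

if __name__ == "__main__":
    print(doc)
```

Because each item is its own document, a version-control system can track changes per item, which matches the versioning role given to Subversion on this slide.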
31. XMDR-it XML schema provides a number of important benefits…
Schema specifies what is required as well as what is legal
Divides metadata into files conforming to XML schema
Normalizes data (à la relational “one fact in one place”)
Facilitates XSLT transformations by reducing degrees of freedom to a canonical encoding within the RDF standard
Relax NG used to create and check XMDR-it schema
RNG validator enforces many OWL ontology constraints
Trang automatically translates into XML schema syntax
32. RDF provides complementary benefits on top of XML
All the advantages of XML plus …
RDF provides more explicit semantics than XML
Users can employ a growing set of RDF tools
e.g., SPARQL query language, SWRL rule language, Jena inference
More powerful retrieval capabilities
Using many different RDF graph query tools
RDF’s graph data model supports inference
e.g., inclusion of subsumed sub-classes
Results can be either
tuples (à la relational tables)
XML/RDF graphs (being developed for W3C’s SPARQL)
Facilitates integrated use and management of multiple related concepts spanning different concept systems
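A minimal pure-Python sketch of the graph-model points above: triples, inference that includes subsumed sub-classes, and results returned as tuples. It stands in for a real RDF toolkit; the class names, predicate names, and `de:` identifiers are all invented.

```python
# Triples as (subject, predicate, object); every identifier is hypothetical.
TRIPLES = {
    ("Lake", "subClassOf", "WaterBody"),
    ("Reservoir", "subClassOf", "Lake"),
    ("de:LakeDepth", "describes", "Lake"),
    ("de:ReservoirVolume", "describes", "Reservoir"),
    ("de:RoadLength", "describes", "Road"),
}


def subsumed(cls, triples):
    """Return cls plus every class reachable downward through subClassOf."""
    result, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "subClassOf" and o == current and s not in result:
                result.add(s)
                frontier.append(s)
    return result


def elements_describing(cls, triples):
    """Query: data elements describing cls or any sub-class, as sorted tuples."""
    classes = subsumed(cls, triples)
    return sorted((s, o) for s, p, o in triples if p == "describes" and o in classes)


if __name__ == "__main__":
    print(elements_describing("WaterBody", TRIPLES))
```

A query for "WaterBody" returns the elements registered against "Lake" and "Reservoir" even though neither mentions "WaterBody" directly, and the result is a list of tuples rather than a graph.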
33. OWL ontology specification adds richer semantics atop RDF & XML
All the advantages of XML & RDF plus…
RNG validator enforces many OWL ontology constraints
Classes and subclasses (is-a relationships)
Union classes
Inverses
Same-as, same-property-as, same-class-as
Restriction classes (restrict range, cardinality, etc. of property based on type of subject)
…and tools for creation, editing, visualization, and management (Protégé & plug-ins)
34. OWL, RDF & XML Schema used to specify XMDR-it as UML for 11179-X metamodel
35. XMDR-it Architecture: Initial Implemented Modules
36. XMDR-it Advanced Search Interface helps explore registry contents
37. Technical Challenges and Issues for XMDR Implementation Testbed
Complexity
Representation of Relationships
XML + RDF + OWL is a lot
Scalability & performance
Currently includes only 60,000+ objects
maybe indexing and/or distributed registries will help?
RDF Issues
RDF queries yield tuples, not RDF objects (but W3C at work)
RDF tools won’t create XMDR files (add wrapper constraints?)
User-friendly interface for RDF queries (later)
External data sources, ontologies, terminologies
Harmonization with ODM and MMF
XML/RDF objects results display & browsing
Something like EDR UI with link labels & inverse refs
38. XMDR-s 11179 extensions Challenges and Issues
Harmonize and align XMDR recommendations with
ISO 11179 Metadata Registries (MDR)
ISO 19763 Framework for Metamodel Interoperability (MMF)
ISO 24707 Common Logic (CL)
ISO 20944 Metadata Interoperability and Bindings
OMG’s ODM
Improve the current Part 3 of ISO 11179 standard
Separate registration section of the model so we can register "anything"
Simplify mechanisms for registering and using relationships between administered items, along with mechanisms for registering & using ontologies
Improve the "classification" region in Part 3, particularly with regard to concepts and relationships
39. XMDR Registry (from XMDR & MMF meeting)
40. The Requirements for XMDR (from XMDR & MMF meeting)
41. More Information
XMDR Web Site
http://xmdr.org
ISO/IEC 11179 Web site
http://www.metadata-standards.org
OMG Web Site
http://www.omg.org
Annual Open Metadata Forum
Kobe, Japan, Spring 2006
W3C RDF Access Working Group
http://www.w3.org/2001/sw/DataAccess/
Bruce Bargmeyer
XMDR Principal Investigator
Contact concerning open position
Lawrence Berkeley National Laboratory
bebargmeyer@lbl.gov
510-495-2905