
eXtending MetaData Registries: XMDR Project & Prototype


Presentation Transcript


    1. eXtending MetaData Registries: XMDR Project & Prototype
       XMDR Working Group presentation to the SC 32/WG 2 meeting and to the SC 32/WG 2 (11179 P2&3 (E3)), MMF, and OMG/ODM Liaison Meeting
       September 2005, Toronto, Canada

    2. printed 7/5/2005 10:33 AM page 2 of 30 Warzel-McCarthy 20050906 file:XMDR_Presentation2.ppt
       XMDR Outline
       - MDRs: Purpose and Goals (slides 3-8)
       - Differentiate: position with other ISO & non-ISO standards (slides 9-11)
       - Present XMDR Collaboration & Project & Prototype Purposes & Goals (slides 12-33)
       - Describe metadata/technical platform/architecture for XMDR (slides 28-33)
       - Demonstrate XMDR-it (live demo? Screen snaps?)
       - Explain importance of XMDR-it three levels of constraints: XML, RDF, OWL (slides 34-38)
       - Outline current challenges and future plans (slides 40-41)
       - Contacts/URLs/Credits (slides 42-43)

    3. In the Beginning…
       - Organizational structure and funding created a culture fostering stove-pipe, media-specific, heterogeneous systems
       - Lack of information sharing/integration
       - Lack of ability to aggregate information across systems
       - Inability to retrieve data to answer questions
       - Higher cost due to redundant/incompatible data maintenance
       - Lack of technology options to enable integration of systems and data

    4. OLD Approaches to Semantic Integration and Interoperability
       Option 1: “Thou Shalt”
       - Everyone adopts a single data model for a particular domain
       - Genbank, PDB, HL7 are examples of these sorts of models
       - Advantages: Ensures interoperability; Minimal overhead
       - Disadvantages: Not flexible; Does not allow data stores for particular use cases; Unrealistic, especially across different large organizations
       Option 2: “Multi-party Agreements”
       - Several sites agree on a format for interchanging data
       - Sites maintain a local data dictionary, XML schema, etc. to describe information model
       - Advantages: Flexible; Low overhead
       - Disadvantages: Works only where existing bilateral (or multilateral) agreements exist; Each new node must arrange to be interoperable with all other nodes or node cluster

    5. New Approaches for Semantic Integration and Interoperability
       Option 3: Standards based metadata descriptors
       - “Common Data Elements”
       - Common terms & concepts
       - Provide a complete description of all attributes in a systematic, uniform and unambiguous format
       - Description must be based on a common (but expandable) vocabulary
       - Rely on concept codes, not concept names
       - Track quality and accessibility
       Advantages:
       - Provides more ways to surface semantic matches: words and immutable codes
       - Allows new systems to find points of interoperability with all other data systems at once
       - Machine understandable
       - Stable immutable identifiers
       - Low barriers to entry
       Disadvantages:
       - Requires a very complete description of the contents of an attribute
       - Some degree of overhead associated with creating and maintaining a compatible system
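A minimal sketch of the "concept codes, not concept names" point on slide 5: two hypothetical systems name the same attribute differently, but both annotate it with the same concept code, so the match surfaces by code. The systems, attribute names, codes, and the `match_by_code` function are all invented for illustration and are not taken from any real registry.

```python
from collections import defaultdict

# Hypothetical attribute descriptors from two systems. The names differ,
# but each attribute carries an (invented) immutable concept code.
SYSTEM_A = [
    {"name": "pt_sex", "concept_code": "C0001"},
    {"name": "dx_date", "concept_code": "C0002"},
]
SYSTEM_B = [
    {"name": "Gender", "concept_code": "C0001"},
    {"name": "SpecimenType", "concept_code": "C0003"},
]


def match_by_code(left, right):
    """Pair attributes whose concept codes are equal, ignoring their names."""
    by_code = defaultdict(list)
    for attr in right:
        by_code[attr["concept_code"]].append(attr["name"])
    return [
        (attr["name"], other, attr["concept_code"])
        for attr in left
        for other in by_code.get(attr["concept_code"], [])
    ]


matches = match_by_code(SYSTEM_A, SYSTEM_B)
print(matches)
```

A name-based comparison would miss the `pt_sex` / `Gender` pair entirely; the code-based comparison finds it without either system changing its local names.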

    6. How Does Registered Metadata Promote Better Data Management?
       - Provides a model that is consistent/exchangeable for capturing data about data
       - Captures unambiguous semantic information in one place
       - Documents details on the custodian of the data
       - Links directly to online data
       - Reduces risk/cost of duplicating data collections

    7. What is a metadata registry good for?
       - Data administration (design time)
         - Databases, DB applications
         - Messaging systems
         - Terminologies, Taxonomies, Ontologies
       - Data Integration (design + run time)
         - federated queries, data warehousing
         - Discovery of hidden relationships between data
       - Support for interactive users (run time)
         - Data entry forms, output explanation
         - Navigation of databases
       - Semantic Web Services (run time)

    8. ISO/IEC 11179 MDR Standard
       Used to record and link:
       - Data elements
       - Data element concepts
       - Conceptual Domains
       - Value Domains, e.g., enumerated value domains
       - Classification Schemes
       - …
       Goals:
       - To record the unambiguous meaning of data elements
       - Human Understandable: Current paradigm is natural language definitions
       - Machine Understandable: Formal definitions (and axioms) coming in Edition 3 (?)

    9. ISO/IEC 11179 Metadata Registry Standard
       Spans both:
       - Conceptual models of the real world
         - Concepts, data element concepts, classification schemes
         - Terminologies, taxonomies, ontologies
       - Information Artifacts
         - Data elements, enumerated values, ...
         - UML models (e.g., in caDSR)

    10. Conceptual vs. Information Centric Metadata Standards

    11. Space of Metadata Standards

    12. Example Users of ISO/IEC 11179 Metadata Registries
        - U.S. EPA: Environmental Data Registry (EDR)
          - System of Registries: need to register lots of “things”
        - National Cancer Institute (NCI): Cancer Data Standards Registry (caDSR)
          - Registration of data elements and domain object models
        - U.S. Veterans Health Administration
        - U.S. Census Bureau
        - U.S. Bureau of Labor Statistics
          - Data Element Concepts, Value Domains
        - Statistics Canada
        - Australian Health Administration
          - Data Elements
        - European Environment Agency
        - …

    13. Evolution of ISO/IEC 11179
        - Edition 2 used to Register Data Elements: Classification Schemes, Data Element Concepts, Value domains, Object Class, Property, etc…
        - Data element = Data element concept + Value Domain (representation)
        - Representation = data type or code set (enumerated list)
        - Concepts as optional
          - Supported Object Class
          - Not widely utilized
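The composition "Data element = Data element concept + Value Domain" from slide 13 can be sketched as a small data model. This is a deliberately simplified illustration, not the ISO/IEC 11179 metamodel: the class names, fields, and the `allows` helper are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DataElementConcept:
    """What the data is about: an object class plus one of its properties."""
    object_class: str
    property_name: str


@dataclass(frozen=True)
class ValueDomain:
    """How the data is represented: a datatype, optionally with a code set."""
    datatype: str
    permitted_values: Optional[Tuple[str, ...]] = None  # None = not enumerated

    def allows(self, value: str) -> bool:
        # A non-enumerated domain accepts any value of its datatype.
        return self.permitted_values is None or value in self.permitted_values


@dataclass(frozen=True)
class DataElement:
    """A data element pairs one concept with one representation."""
    concept: DataElementConcept
    domain: ValueDomain


# Hypothetical example: the same concept could be paired with other domains.
country_code = DataElement(
    concept=DataElementConcept("Country", "Identifier"),
    domain=ValueDomain("string", ("CA", "JP", "US")),
)
print(country_code.domain.allows("CA"), country_code.domain.allows("ZZ"))
```

Keeping the concept and the value domain as separate objects is what lets one concept be reused with several representations, for example a code list in one data element and a free-text name in another.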

    14. Challenges of ISO/IEC 11179 Metadata Registries
        - Need to easily retrieve semantically related items
        - Need to Support Discovery
          - Enable navigation within and among taxonomies
          - Even when producer and consumer do not share common taxonomy
        - Current Classification Scheme Administered Item not sufficient for registration of taxonomies, ontologies, etc.
        - Need for consistency and richer metamodel
        - Complex Relationships between items

    15. Many taxonomies
        Purpose of registering taxonomies and ontologies along with data elements, values, etc. in one MDR:
        - Provide visibility to manage and consume
        - Reuse
        - Harmonization
        - Support Discovery
          - Enable navigation within and among taxonomies
          - Even when producer and consumer do not share common taxonomy
        - Ensure availability
        Notes: From the perspective of the CES Metadata Working Group, the primary purpose of a taxonomy is to aid in the discovery process. In an academic paper, we list subject keywords that don’t necessarily appear in the paper to improve the odds that the paper will be discovered by someone who is not familiar with the terminology used in the paper. In that case, the keywords do not specify a context or namespace, so the terms may be ambiguous or overloaded. The Taxonomy Focus Group (of the CES Metadata Working Group) has developed a DoD Core Taxonomy and framework whereby COIs may create (and distinguish) their own taxonomies that can be used by consumers and producers as part of the discovery process.

    16. Example: XML Management Challenge
        XML: One Language, Many Vocabularies
        Notes: XML provides a standard syntax so we can compare and mediate, even when the values are identical. Some fault XML, citing a concern that each developer is creating unique XML vocabularies, even for the same data representation. While that may be true in some circumstances, it is also true that with XML, we now have a common syntax for comparing and migrating representations. The common syntax makes it easier to see the heterogeneity and resolve it or mediate.

    17. *Example of Semantic Integration for Interoperability

    18. *Concept Use and Integration with 11179 Part 3, Edition 2
        Notes: We talked about a Data Element being formed by a concept taking on a specific representation. In ISO 11179 terms this translates to the combination of a specific Data Element Concept and a specific Value Domain; you see this denoted in the middle of this chart by the yellow box outlined in red. caDSR administered items are backed by the use of externally defined terminologies and controlled vocabularies. With UMLS as a framework, NCI has developed vocabulary services that are accessed via API (application-to-application interfaces) to provide touch points during creation of content, resulting in administered components that are bound to immutable concept codes. These touch points, denoted by the EVS logos, are currently implemented at the Object Class, Property, Representation Term, Value Domain and Valid Value levels of the metadata model.

    19. Semantic Framework NCI Example
        Notes: The three key components of caCORE are the Enterprise Vocabulary Services, figuratively and literally the foundation of caCORE; the cancer bioinformatics objects, caBIO, providing biomedical data objects implemented in a robust extendable architecture; and the cancer Data Standards Repository, caDSR, providing, among other things, a semantic bridge between the data elements in registered data objects and standard vocabularies and ontologies. NCICB-developed caCORE components are distributed under open-source licenses that support unrestricted usage by both non-profit and commercial entities, downloadable from the NCICB web site. The question is how does all this fit together? The next question is, of course, how do all of these things work together?

    20. Where have we been? Where are we now?…& where are we planning to go?

    21. What is XMDR? eXtended MetaData Registries
        A set of collaborative initiatives by groups with shared goals:
        - extend the ISO/IEC 11179 metadata registry standard (XMDR-s)
          - EPA, NCI, DOD, LBNL, Mayo Clinic, USGS, Ecoterm, UNEP, GBIF
        - align & harmonize various related metadata standards (XMDR-h)
          - ISO WG2: 11179, 19763, 20944, 24707; OMG: ODM, CWM; Say which is which
          - (several of the above groups have members on these committees)
        - An open source implementation & testbed (XMDR-it) to
          - assemble & test metadata from diverse sources & structures, e.g., terminologies, ontologies, etc. for health, environment, geography, …
          - explore emerging semantic technologies (e.g., RDF, OWL, CL, …)
          - demonstrate new capabilities, e.g., ontology lifecycle management & harmonization

    22. Why do we need metadata registry extensions? …in order to
        - Enhance capabilities to capture and retrieve semantics of information artifacts (e.g., data elements and value domains) in metadata registries using terminologies, taxonomies, ontologies, etc. …
        - Improve representation of relationships between data (e.g., objects, data elements & domains) and concept structures (ontologies, taxonomies, thesauri, terminologies, …)
        - Register complex semantic metadata (concept structures, terminologies) in more formal, systematic ways (e.g., description logic) to facilitate machine processing for
          - creating and managing names, definitions, terms, etc.
          - linking together data elements, etc. across multiple systems
          - discovering relationships among data elements & terms

    23. XMDR Semantic Extensions Goals
        - Sharable data that can easily be identified and aggregated across organizations
        - Unambiguous metadata characteristics to convey semantic, syntactic and lexical meaning
        - Human AND Machine understandable
        - Registration and management of everything useful for administering and managing data, including concept systems, ontologies, etc.
        - Machine understanding of semantics to facilitate inference, aggregation, and agent services

    24. Goals of the open source XMDR-it prototype implementation testbed
        - Demonstrate feasibility & utility of proposed revisions to ISO/IEC 11179
        - Provide open-source reference implementation with XMDR capabilities
        - Determine the necessary features to leverage semantic interoperability between ‘concept’ systems and ‘data elements’
          - e.g., for ontology lifecycle management & harmonization
        - Explore benefits of representing XMDR content using emerging semantic technologies (e.g., RDF, OWL, CL, …)
          - integrate open source tools to create, maintain, deploy XMDR standards
          - test capabilities and performance of candidate tools
        - Assemble semantic metadata with different structures from diverse sources to test various semantic technologies
          - terminologies, thesauri, ontologies, …
          - From health, environment, geography, …
        - Help resolve registration & harmonization issues for different metadata standards, including ODM & MMF

    25. Role of terminologies and ontologies in metadata registries
        - Sources for concepts, concept definitions, object classes, properties, value meanings, external references
        - Terminologies as classification schemes (e.g., taxonomies)
        - Ontologies to specify semantic relationships
          - is-a, part-of, instance-of, …
          - inheritance permits more compact definitions
          - semantic pathways for indexing
          - facilitates searching subclasses & inverses
        - Frameworks for integration of multiple schemas …
        - Help connect metadata entities
          - via shared terms
          - via automatic indexing of metadata words
          - via text values from specific metadata elements
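One way to picture "connect metadata entities via automatic indexing of metadata words" from slide 25 is a plain inverted index over item definitions. The sketch below is an assumption-laden illustration: the item identifiers, definitions, stopword list, and function names are invented, and XMDR-it's actual indexing is not described in these slides.

```python
import re
from collections import defaultdict

# Hypothetical registry items: identifier -> natural-language definition.
ITEMS = {
    "DE:country_name": "Name of a country of the world",
    "VD:country_codes": "Codes for countries of the world",
    "DE:river_name": "Name of a river",
}
STOPWORDS = frozenset({"a", "of", "the", "for"})


def words(text):
    """Lower-case alphabetic tokens with stopwords removed."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def build_index(items):
    """Map each word to the set of item identifiers whose text contains it."""
    index = defaultdict(set)
    for item_id, text in items.items():
        for word in words(text):
            index[word].add(item_id)
    return index


def related(index, items, item_id):
    """Return other items sharing at least one indexed word, with the shared words."""
    hits = defaultdict(set)
    for word in words(items[item_id]):
        for other in index[word]:
            if other != item_id:
                hits[other].add(word)
    return dict(hits)


index = build_index(ITEMS)
links = related(index, ITEMS, "DE:country_name")
print(links)
```

Note that "country" and "countries" do not match here. Word-level indexing finds only literal overlaps, which is part of the motivation for linking items through shared concept codes or ontology relationships instead.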

    26. What the XMDR Project IS NOT!
        - An attempt to turn 11179 metadata registries into a development and maintenance facility for every type of concept structure
        - An attempt to standardize the complete range of terminology and ontology data & services
        - Production implementation for one organization
        - [any other things we want to disavow?]

    27. XMDR-it example content has been loaded from diverse sources via LexGrid & XSLT

    28. Additional Metadata Content will be added to XMDR Prototype
        - EDR (EPA Environmental Data Registry)
        - caDSR (NCI Cancer Data Standards Registry)
        - IETF RFC 3066 Language Codes
        - NBII Biocomplexity Thesaurus
        - USGS Geographic Names Information System
        - Getty Thesaurus of Geographic Names
        - I.T.I.S. (Integrated Taxonomic Information System)
        - Adult Mouse Anatomy
        - Foundational Model of Anatomy
        - NASA SWEET (Semantic Web Earth & Environmental Terminologies)
        - EPA Chemical Substance Registry
        - GO (Gene Ontology)
        - … Agrovoc

    29. XMDR-it now contains an XML file for each 11179 “item”
        - Context for Administered Items, e.g., XMDR?
        - Concept Systems, e.g., GEMET, DTIC
        - Data Elements, e.g., Country Name
        - Data Element Concepts, e.g., Country Label
        - Conceptual Domains, e.g., Countries of the World
        - Representation Classes, e.g., Code
        - Value Domains, e.g., countries of the world
        - Relationship Types, e.g., ??

    30. Each metadata entity (object, concept, data element) is
        - Logically stored as a separate XML “file/document”
        - Stored in Subversion code management system
          - provides a versioning capability
          - stores “files” in Berkeley DB
          - Berkeley DB provides transactions, backups, ...
        - Compliant with three complementary standards:
          - An XML Schema (document constraints)
          - An RDF Schema (graph constraints)
          - OWL ontology
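The "one XML document per metadata entity" layout on slide 30 can be sketched with the Python standard library. The element names, the `id` attribute, and the file naming scheme below are hypothetical and do not reproduce the XMDR-it schema; versioning in Subversion and validation against the XML Schema, RDF Schema, and OWL ontology are left out.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_item(directory, item_id, item_type, name):
    """Serialize one registry item into its own XML file and return the path."""
    root = ET.Element(item_type, attrib={"id": item_id})
    ET.SubElement(root, "name").text = name
    path = Path(directory) / f"{item_id}.xml"
    ET.ElementTree(root).write(str(path), encoding="utf-8", xml_declaration=True)
    return path


def read_item(path):
    """Load a single item file back into a plain dictionary."""
    root = ET.parse(str(path)).getroot()
    return {"id": root.get("id"), "type": root.tag, "name": root.findtext("name")}


# A temporary directory stands in for the versioned repository.
with tempfile.TemporaryDirectory() as repo:
    item_path = write_item(repo, "de-0001", "DataElement", "Country Name")
    file_name = item_path.name
    loaded = read_item(item_path)

print(file_name, loaded)
```

Because each item lives in its own file, a change to one item touches one document, which suits a file-oriented version control system.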

    31. XMDR-it XML schema provides a number of important benefits…
        - Schema specifies what is required as well as what is legal
        - Divides metadata into files conforming to XML schema
        - Normalizes data (à la relational “one fact in one place”)
        - Facilitates XSLT transformations by reducing degrees of freedom to a canonical encoding within the RDF standard
        - Relax NG used to create and check XMDR-it schema
          - RNG validator enforces many OWL ontology constraints
          - Trang automatically translates into XML schema syntax

    32. RDF provides complementary benefits on top of XML
        - All the advantages of XML plus …
        - RDF provides more explicit semantics than XML
        - Users can employ a growing set of RDF tools
          - e.g., SPARQL query language, SWRL rule language, Jena inference
        - More powerful retrieval capabilities
          - Using many different RDF graph query tools
          - RDF’s graph data model supports inference, e.g., inclusion of subsumed sub-classes
          - Results can be either tuples (à la relational tables) or XML/RDF graphs (being developed for W3C’s SPARQL)
        - Facilitates integrated use and management of multiple related concepts spanning different concept systems
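The inference example on slide 32, "inclusion of subsumed sub-classes", can be illustrated without any RDF tooling. In this sketch plain Python tuples stand in for RDF triples and a small traversal stands in for what an RDF store with a reasoner or a SPARQL query would do; the class names, predicates, and data element identifiers are invented.

```python
# (subject, predicate, object) tuples standing in for RDF triples.
TRIPLES = {
    ("River", "subClassOf", "WaterBody"),
    ("Lake", "subClassOf", "WaterBody"),
    ("Creek", "subClassOf", "River"),
    ("de:river_length", "describes", "River"),
    ("de:creek_flow", "describes", "Creek"),
    ("de:lake_depth", "describes", "Lake"),
    ("de:city_population", "describes", "City"),
}


def subclasses(triples, cls):
    """Return cls plus every class that reaches it through subClassOf edges."""
    found, frontier = {cls}, [cls]
    while frontier:
        parent = frontier.pop()
        for s, p, o in triples:
            if p == "subClassOf" and o == parent and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def items_describing(triples, cls):
    """Find items describing cls or any of its direct or indirect subclasses."""
    classes = subclasses(triples, cls)
    return sorted(s for s, p, o in triples if p == "describes" and o in classes)


river_items = items_describing(TRIPLES, "River")
water_items = items_describing(TRIPLES, "WaterBody")
print(river_items)
print(water_items)
```

A query for "WaterBody" returns the data elements registered against rivers, creeks, and lakes even though none of them mentions "WaterBody" directly. The results come back as a list of identifiers, which corresponds to the "tuples" style of result mentioned on the slide.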

    33. OWL ontology specification adds richer semantics atop RDF & XML
        - All the advantages of XML & RDF plus…
        - RNG validator enforces many OWL ontology constraints
        - Classes and subclasses (is-a relationships)
        - Union classes
        - Inverses
        - Same-as, same-property-as, same-class-as
        - Restriction classes (restrict range, cardinality, etc. of property based on type of subject)
        - …and tools for creation, editing, visualization, and management (Protégé & plug-ins)
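Of the OWL features listed on slide 33, inverse properties are the simplest to illustrate: declaring that one property is the inverse of another lets a reasoner derive the reverse statement. The sketch below imitates that single rule over plain tuples; the property names and facts are invented, and a real deployment would rely on an OWL reasoner rather than hand-written code.

```python
# Hypothetical inverse-property declarations (property -> its inverse).
INVERSES = {"partOf": "hasPart", "broaderThan": "narrowerThan"}

FACTS = {
    ("Ontario", "partOf", "Canada"),
    ("Geography", "broaderThan", "Hydrology"),
    ("Ontario", "label", "Ontario"),
}


def with_inverses(triples, inverses):
    """Return the triples plus every statement implied by an inverse declaration."""
    both_ways = dict(inverses)
    both_ways.update({inv: prop for prop, inv in inverses.items()})
    inferred = set(triples)
    for s, p, o in triples:
        if p in both_ways:
            inferred.add((o, both_ways[p], s))
    return inferred


closure = with_inverses(FACTS, INVERSES)
print(sorted(closure - FACTS))
```

Only one direction needs to be registered; a search starting from "Canada" can still follow `hasPart` to "Ontario".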

    34. OWL, RDF & XML Schema used to specify XMDR-it as UML for 11179-X metamodel

    35. XMDR-it Architecture: Initial Implemented Modules

    36. XMDR-it Advanced Search Interface helps explore registry contents

    37. Technical Challenges and Issues for XMDR Implementation Testbed
        - Complexity
          - Representation of Relationships
          - XML + RDF + OWL is a lot
        - Scalability & performance
          - Currently includes only 60,000+ objects
          - maybe indexing and/or distributed registries will help?
        - RDF Issues
          - RDF queries yield tuples, not RDF objects (but W3C at work)
          - RDF tools won’t create XMDR files (add wrapper constraints?)
          - User-friendly interface for RDF queries (later)
        - External data sources, ontologies, terminologies
        - Harmonization with ODM and MMF
        - XML/RDF objects results display & browsing
          - Something like EDR UI with link labels & inverse refs

    38. XMDR-s 11179 extensions Challenges and Issues
        - Harmonize and align XMDR recommendations with
          - ISO 11179 Metadata Registries (MDR)
          - ISO 19763 Framework for Metamodel Interoperability (MMF)
          - ISO 24707 Common Logic (CL)
          - ISO 20944 Metadata Interoperability and Bindings
          - OMG’s ODM
        - Improve the current Part 3 of ISO 11179 standard
          - Separate registration section of the model so we can register "anything"
          - Simplify mechanisms for registering and using relationships between administered items, along with mechanisms for registering & using ontologies
          - Improve the "classification" region in Part 3, particularly with regard to concepts and relationships

    39. XMDR Registry From XMDR & MMF meeting

    40. The Requirements for XMDR (from XMDR & MMF meeting)

    41. More Information
        - XMDR Web Site: http://xmdr.org
        - ISO/IEC 11179 Web site: http://www.metadata-standards.org
        - OMG Web Site: http://www.omg.org
        - Annual Open Metadata Forum: Kobe, Japan, Spring 2006
        - W3C RDF Data Access Working Group: http://www.w3.org/2001/sw/DataAccess/
        - Bruce Bargmeyer, XMDR Principal Investigator (contact concerning open position), Lawrence Berkeley National Laboratory, bebargmeyer@lbl.gov, 510-495-2905
