
Exchanging Components In Health Metadata Registries Denise Warzel

Exchanging Components In Health Metadata Registries Denise Warzel NCI Center for Biomedical Informatics And Information Technology (CBIIT) May 21, 2008. “Metadata DownUnder” : 11th Open Forum on Metadata Registries Sydney, NSW Australia.


Presentation Transcript


  1. Exchanging Components In Health Metadata Registries • Denise Warzel, NCI Center for Biomedical Informatics And Information Technology (CBIIT) • May 21, 2008 • “Metadata DownUnder”: 11th Open Forum on Metadata Registries, Sydney, NSW, Australia • Sharing and advancing knowledge and experience about standards, technologies and implementations.

  2. ‘But I don't want to go among mad people,’ Alice remarked. ‘Oh, you can't help that,’ said the Cat: ‘we're all mad here. I'm mad. You're mad.’ ‘How do you know I'm mad?’ said Alice. ‘You must be,’ said the Cat, ‘or you wouldn't have come here.’ (Alice in Wonderland, Lewis Carroll)

  3. Outline • Background / Overview of caCORE • Leveraging 11179 metadata • Drivers behind the need for standards to enable discovery and exchange of metadata components • Disclaimer: not particularly about Health Metadata Registries, that is just the business NCI and our ‘trading partners’ are in; and not particularly about “Exchanging Components”, but about leveraging metadata to discover and reuse components

  4. NCI caCORE Program Area • National Cancer Institute (NCI) • One of the 14 US National Institutes of Health • Mission: end pain, suffering and death due to cancer • caCORE • cancer Common Ontologic Research Environment • Software, services and methodologies that underpin application development at NCI

  5. Origins of caBIG™ Community • Need: Enable independent investigators and research teams nationwide to combine and leverage findings and expertise from disparate sources. • Strategy: Create scalable, actively managed organization that will connect members of the NCI-supported cancer enterprise by building a biomedical informatics network

  6. caCORE Background • caCORE infrastructure provides the technical underpinnings supporting NCI’s award-winning caBIG™ Program • Computerworld 21st Century Award for Science for our approach to developing integrated applications • http://www.cwhonors.org/case_studies/NationalCancerInstitute.pdf • Bio-IT World’s Editor’s Choice Award 2008 for the implementation of an open source network to speed cancer research • http://cabig.cancer.gov/media/links/May_08/award.asp

  7. caCORE Principles • Model Driven (MDA) • Agile software development • Small functional releases • RUP methodology: Inception, Elaboration, Construction, Transition • Object Oriented • Open Source • Semantic Services Oriented Architecture (SOA) • Semantically aware Application Programming Interfaces, web services • Unified Modeling Language (UML) • Business process models • Use Case development • Domain Modeling: describes high-level reusable classes of data • XML Schema: structured syntax for object representation • ISO 11179 Metadata Registry: structured syntax for data semantics • Controlled vocabularies: concept binding for data semantics

  8. caCORE Components • Security • Domain Objects • Common Data Elements • Enterprise Vocabulary

  9. caCORE Infrastructure Components [Diagram labels: Biomedical Information Objects; Security; Verify Credentials; Public APIs; Domain Object Metadata; Scientific Research; Common Data Elements; Metadata Standards Repository; Clinical Trials; legacy data; Enterprise Vocabulary Services; Dictionary, thesaurus services; Vocabulary for CDE Specification]

  10. caCORE System Generation [Diagram labels: Public APIs; Domain object metadata; Common data elements (CDEs); Vocabulary for CDE specification; Dictionary, thesaurus services]

  11. Key Aspects of the caCORE Semantic Infrastructure • Use of 11179 for describing Data • a predictable information model for describing data semantics (11179 Object Class and Property) • Use of Controlled Terminology to support data descriptions (semantics) • These two characteristics of the caCORE infrastructure allow us to programmatically detect when two metadata items are semantically the same, and promote reuse, which enhances interoperability of systems using these items

  12. How? Utilizing Semantic Integration for Interoperability [Diagram: two classes, Agent and Drug, annotated with concept C1708 (Drug/Agent). Attribute labels shown: name, id, nSCNumber, NDCCode, CTEPName, FDAIndID, IUPACName, approvalDate, approver, fdaCode. Annotations shown: id = nSCNumber = C1708:C41243; fdaCode = C1708:C41243.] • Text names are inadequate for identifying semantically similar concepts or for performing cross-object ‘joins’ • Opaque, immutable concept identifiers based on controlled vocabularies are needed
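The point of this slide can be sketched in a few lines of Python. This is only an illustration, not caCORE code: the `Attribute` type, the `semantic_matches` helper and the placeholder codes `C1708:X0001` and `C1708:X0002` are hypothetical, and the assignment of `C1708:C41243` to `nSCNumber` and `id` is an assumption read off the slide's diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    owner: str         # class that declares the attribute
    name: str          # local, human-readable name
    concept_code: str  # controlled-vocabulary annotation


def semantic_matches(left, right):
    """Pair attributes that carry the same concept code, whatever their names."""
    by_code = {}
    for attr in right:
        by_code.setdefault(attr.concept_code, []).append(attr)
    return [(a, b) for a in left for b in by_code.get(a.concept_code, [])]


# Hypothetical annotations; only C1708:C41243 appears on the slide.
agent = [
    Attribute("Agent", "nSCNumber", "C1708:C41243"),
    Attribute("Agent", "name", "C1708:X0001"),
]
drug = [
    Attribute("Drug", "id", "C1708:C41243"),
    Attribute("Drug", "approver", "C1708:X0002"),
]

matches = semantic_matches(agent, drug)
matched_names = [(a.name, b.name) for a, b in matches]
print(matched_names)
```

A name-based comparison would never pair `nSCNumber` with `id`; the shared concept code does.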

  13. Object Oriented Information Systems • Start with an Information Model: a Class Diagram • Interoperability by Objects and Common Metadata Elements

  14. Object Oriented Information Systems • Annotate with Controlled Vocabulary • Interoperability by Objects and Common Metadata Elements

  15. Object Oriented Information Systems • Compare information models based on concept annotations • Interoperability by Objects and Common Metadata Elements

  16. What are we registering in the 11179 Registry? • Information Model • Primarily derived from a class diagram, but could come from a Data Entry Form, or from items classified in a Classification Scheme • Object Class, Property, Data Element Concept, Value Domain, Value Meaning, Conceptual Domain, Data Element, Concept Reference (more about this on the next slide) • NCI Extensions: • Expanded Relation Model for Object Classes – Object Class Relationships • Concept References

  17. 11179 NCI Extensions • Expanded 11179 “Relationship” model for Object Classes <–> Object Class Relationships • cardinality, directionality, source and target role names, association name, isArray • So we could register class<->class associations • Concept References • The caDSR system can be used to bind any Administered Item to Controlled Terminology concepts to anchor the 11179 Semantic components: • Concept Derivation rule: Primary Concept + 1 or more Qualifiers • Capture the Concept Unique Identifier, Preferred Name, Definition, Source • Object Class • Property • Value Meanings • Value Domains • Conceptual Domains (less formally)
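A minimal sketch of how the two extensions might be represented as data structures. It is not the caDSR schema: the class and field names are hypothetical, the codes `P0001` and `Q0001` are placeholders, and the qualifier-first ordering in `code()` is an assumed convention for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Concept:
    cui: str             # Concept Unique Identifier
    preferred_name: str
    definition: str
    source: str          # terminology the concept comes from


@dataclass(frozen=True)
class ConceptDerivation:
    """Primary Concept + 1 or more Qualifiers, as stated on the slide."""
    primary: Concept
    qualifiers: Tuple[Concept, ...]

    def __post_init__(self):
        if len(self.qualifiers) < 1:
            raise ValueError("a derivation needs at least one qualifier")

    def code(self) -> str:
        # Assumed convention: qualifiers first, primary last, joined by ':'.
        return ":".join([q.cui for q in self.qualifiers] + [self.primary.cui])


@dataclass(frozen=True)
class ObjectClassRelationship:
    """A registered class-to-class association."""
    association_name: str
    source_class: str
    target_class: str
    source_role: str
    target_role: str
    source_cardinality: str  # e.g. "1"
    target_cardinality: str  # e.g. "0..*"
    bidirectional: bool
    is_array: bool


primary = Concept("P0001", "Example primary", "placeholder definition", "example")
qualifier = Concept("Q0001", "Example qualifier", "placeholder definition", "example")
derivation = ConceptDerivation(primary, (qualifier,))
print(derivation.code())
```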

  18. 11179 Enabling characteristics: Classification Schemes used to Organize, Communicate and Discover ‘Usage’

  19. Enabling: NCI Classification Scheme “Types” • CS Type • “Project”: UML Domain Models • “Classification”: Simple SKOS category • “Analytical Service”: caGrid analytical service • “Data Service”: caGrid data service • “Container”: holds other Classification Schemes • CSI Type • “UML Package”: holds classes in a domain model • “Disease Category”: different disease areas, e.g. Lung, Breast, etc.

  20. Enabling: Organization of meta data elements Template and Generic CDEs

  21. What’s driving the need for exchange outside NCI’s systems? • The caBIG™ information technology program, which includes the registration of Data Elements in caDSR, has brought exponential growth of cancer research and health software solutions based on registered 11179 administered items. • Simultaneously, adoption of data element semantics conforming to 11179 has increased globally

  22. We want to maintain Interoperability • Semantic interoperability • Syntactic interoperability • Interoperability: ability of a system to access and use the parts or equipment of another system (based on ISO/IEEE definition for interoperability)

  23. What do people want to share? • Discover and reuse 11179 registered metadata elements • Parts of 11179 registered metadata elements, e.g. value domains • Tell others about our 11179 metadata elements • Create and reuse mappings and transformations between value domains (similar to “Shims”) • Terminology • Classification Schemes • Forms • etc.
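One way to picture a value-domain mapping (a “shim”) is a lookup derived from shared value meanings. The sketch below is an assumption-laden illustration, not a caDSR feature: `build_shim`, the two value domains and the `VM:` identifiers are all hypothetical.

```python
def build_shim(source_vd, target_vd):
    """Map source permissible values to target values that share a value meaning."""
    target_by_meaning = {meaning: value for value, meaning in target_vd.items()}
    return {
        value: target_by_meaning[meaning]
        for value, meaning in source_vd.items()
        if meaning in target_by_meaning
    }


# Each value domain maps a permissible value to a value-meaning identifier.
# The identifiers below are made up for the example.
gender_codes = {"M": "VM:MALE", "F": "VM:FEMALE", "U": "VM:UNKNOWN"}
gender_words = {"Male": "VM:MALE", "Female": "VM:FEMALE"}

shim = build_shim(gender_codes, gender_words)
print(shim)
```

Values with no counterpart in the target domain (here `"U"`) are left out of the mapping, so a caller can see which values still need a manual decision.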

  24. MDR Futures How will we support this? ‘Federation’ of Semantic Metadata Registries

  25. Semantic Metadata Registries

  26. NCI Semantic Metadata Registry (sMDR) Futures

  27. How will we compare the semantics of our Information Models so we can share? • 11179 Edition 2 • Compare DEC identifier? Names? • Object Class identifier? Name? • Value Domains? • 11179 Edition 3 • Concept References? • What about detecting similarities of other types of registered metadata objects in the sMDR?

  28. So how do application developers do this? ‘I quite agree with you,’ said the Duchess; ‘and the moral of that is: “Be what you would seem to be”, or if you'd like it put more simply: “Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise.”’ ‘I think I should understand that better,’ Alice said very politely, ‘if I had it written down: but I can't quite follow it as you say it.’

  29. “the associations I am able to discern are dependent on all the associations I have ever been able to realize in the past.” Attribution: ????

  30. What to do? • 11179 Edition 2 • Its value for NCI is that it provides a predictable information model by which to compare semantics across metadata elements: Object Class and Property, Value Meanings • Binding these items to controlled terminology provides even greater ‘computability’ • Are there other ways to bind the semantics of metadata elements to an information model that will provide similar capabilities? • Quite possibly!

  31. What to do? • Lots of things are needed to create interoperable systems • Syntax and Semantics • Syntax: Standard Schema? • Standard binding to ontology/registry items? ModelReference to Registry items • Semantic Annotations for WSDL and XML Schema • W3C Recommendation 28 August 2007 • http://www.w3.org/TR/2007/REC-sawsdl-20070828/ • Semantics: Registration of concept bindings to metadata elements via 11179 Edition 3?
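As a rough illustration of the “ModelReference to Registry items” idea, the sketch below reads `sawsdl:modelReference` attributes from a small XML Schema using Python's standard library. The `sawsdl` namespace and the attribute are defined by the SAWSDL Recommendation; the schema itself, the `model_references` helper and the `registry.example.org` URI are hypothetical.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
SAWSDL_NS = "http://www.w3.org/ns/sawsdl"

# Hypothetical schema: one element points at a (made-up) registry item.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sawsdl="http://www.w3.org/ns/sawsdl">
  <xs:element name="nSCNumber" type="xs:string"
              sawsdl:modelReference="http://registry.example.org/items/de-0001"/>
  <xs:element name="approvalDate" type="xs:date"/>
</xs:schema>"""


def model_references(schema_text):
    """Return {element name: [model reference URIs]} for annotated elements."""
    root = ET.fromstring(schema_text)
    refs = {}
    for element in root.iter(f"{{{XSD_NS}}}element"):
        value = element.get(f"{{{SAWSDL_NS}}}modelReference")
        if value:
            # The attribute value is a whitespace-separated list of URIs.
            refs[element.get("name")] = value.split()
    return refs


references = model_references(SCHEMA)
print(references)
```

Two schemas whose elements point at the same registry item could then be compared by URI instead of by element name.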

  32. Back to caCORE-Like System [Diagram labels: Verify Credentials; Public APIs; Terminology Node; Information Model; Scientific Research; Data Standards Repository; Clinical Trials; legacy data; Enterprise Vocabulary Services; Vocabulary for CDE Specification; Model Annotations]

  33. Near Term Semantic Metadata Needs/Challenges • 11179 Metadata • Standard Classification Scheme vocabulary • E.g. hasA, isA, broader, narrower, etc. • Standard Relationship metamodel and vocabulary • Clarification of Best Practices for use of 11179 • E.g. Conceptual Domains = collection of Semantically Equivalent Value Domains • Standards for registering the association to an information model • Ability to specify reuse of item registered by another metadata registry – without copying the item into your registry

  34. Near Term Semantic Metadata Needs/Challenges • Services Metadata • How do we programmatically identify services that contain content that meets our needs, such as other 11179 Registries? • E.g. myGrid – BioCatalogue; caGrid – Service Index • Form Metadata • How do we find a particular type of form or survey instrument (e.g. CAPI) for a particular purpose? • E.g. caDSR Forms Catalog; CancerGrid Clinical Trials Forms • Terminology Metadata • How do we find terminology databases to use for a specific purpose?

  35. Near Term Semantic Metadata Needs/Challenges • Standard Registry Service Specification • CTS II for metadata registries? • E.g. • Registration Services • Shopping Cart Services • Subscription and Notification Services

  36. Thanks To Collaborators • caBIG Community • University of Oxford, CancerGrid: Jim Davies (Professor, Software Engineering Programme Director), Steve Harris (Research Officer, Database Design) • Ekagra Technologies, NCI sMDR Futures: JJ Maurer (Principal Consultant), Denis Avdic (Senior Architect)

  37. Extra Slides

  38. caCORE Tooling • The NCI provides freely available enabling technology for caBIG compatibility • These technologies are distributed under a ‘non-viral’ open source license. • caCORE • Enterprise Vocabulary Services (EVS) • Cancer Data Standards Repository (caDSR) • caCORE Software Development Kit • When the complete process is followed, the outcome is a caBIG ‘Silver’ compliant data system. • Common Security Module • Common Logging Module • Schema Generation • Generate XML Schema • Generate Castor File • Generate Code • Generate POJOs • Generate Hibernate Mapping Files

  39. caDSR Tools: Access, Develop, Manage, Consume • CDE Browser to Search for, View and Download • Form Builder to Create user-specified collections of CDEs • Skip patterns, repeating groups, default values • Side-by-Side Compare • UML Model Browser to View and manage UML Model metadata • CDE Curation Tool to Create Data Elements • Admin Tool to Administer caDSR and curate content (“Power Users”) • Sentinel Tool to Generate end user ‘Alerts’ triggered by metadata changes • Semantic Integration Workbench: Semantic Integration/Annotation Tools to annotate, transform and register metadata • Batch Load to import Administered Items • Excel Loader (MS Excel) • Semantic Integration Workbench • UML Model Loader (XMI) • Case Report Form Loader (MS Excel)

  40. caCORE Software Developers Toolkit (SDK) •  Open Source Download • Programmers Guide

  41. Open Source Collaboration • NCI’s GForge site
