
Frank Hartel, PhD Enterprise Vocabulary Services National Cancer Institute

NCI Enterprise Vocabulary Services (EVS) and Semantic Integration at NCI: An Overview. Outline: terminology management and semantic integration at NCI; NCI Enterprise Vocabulary Services


Presentation Transcript


  1. NCI Enterprise Vocabulary Services (EVS) and Semantic Integration at NCI - An Overview - Frank Hartel, PhD Enterprise Vocabulary Services National Cancer Institute

  2. Outline: • Terminology management and semantic integration at NCI • NCI Enterprise Vocabulary Services • NCI Thesaurus (NCIt) • NCI Metathesaurus (NCI Meta) • Collaborations

  3. NCI biomedical informatics • Goal: A virtual web of interconnected data, individuals, and organizations redefines how research is conducted, care is provided, and patients/participants interact with the biomedical research enterprise

  4. Interoperability (courtesy: Charlie Mead) • in·ter·op·er·a·bil·i·ty: ability of a system...to use the parts or equipment of another system. Source: Merriam-Webster web site • interoperability: ability of two or more systems or components to exchange information and to use the information that has been exchanged. Source: IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries, IEEE, 1990 • Syntactic interoperability • Semantic interoperability

  5. No Controlled Terminology? No Interoperability • Systems cannot exchange or use information if they use incompatible codes or tokens to signify meaning • Terminology services provide tokens and codes • Proper use of them assures consistent meaning across the enterprise
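A minimal Python sketch of this point. The two site dictionaries and their local labels are hypothetical; C19151 is the NCI Thesaurus code for Metastasis quoted later in this transcript. Two systems that use different labels can still agree on meaning if both resolve their labels to the same controlled code.

```python
from typing import Dict

# Hypothetical local vocabularies: each site maps its own label to a shared
# concept code. C19151 is the NCI Thesaurus code for "Metastasis" cited in
# this presentation; the local labels are invented for illustration.
SITE_A_TO_SHARED: Dict[str, str] = {"mets": "C19151"}
SITE_B_TO_SHARED: Dict[str, str] = {"tumor cell migration": "C19151"}


def same_meaning(label_a: str, label_b: str) -> bool:
    """Return True when both local labels resolve to the same shared code."""
    code_a = SITE_A_TO_SHARED.get(label_a.lower())
    code_b = SITE_B_TO_SHARED.get(label_b.lower())
    # An unmapped label has no agreed meaning, so it never matches.
    return code_a is not None and code_a == code_b
```

A plain string comparison of "mets" and "tumor cell migration" fails, while `same_meaning("mets", "Tumor Cell Migration")` succeeds because both labels carry the same code.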

  6. Can it be done? caCORE - An Example [Diagram labels: Public APIs • Domain object metadata • Common data elements (CDEs) • Dictionary, thesaurus, ontology services via caBIO API • Vocabulary for CDE specification via downloads]

  7. cancer Common Ontologic Representation Environment (caCORE) • Information integration • Cross-discipline reasoning [Diagram layers: biomedical objects, common data elements, controlled vocabulary]

  8. Common Data Elements • Structured data reporting elements • Precisely defining the questions and answers • What question are you asking, exactly? • What are the possible answers, and what do they mean?
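One way to picture a common data element is as a precisely worded question paired with a closed list of answers, each with a stated meaning. The Python sketch below is illustrative only: the class names, the example question, and the Y/N/U values are assumptions, not the caDSR data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PermissibleValue:
    value: str    # what is stored in the data set
    meaning: str  # what that stored value means


@dataclass
class CommonDataElement:
    question: str
    permissible_values: List[PermissibleValue] = field(default_factory=list)

    def is_valid(self, answer: str) -> bool:
        """Check that an answer is one of the permissible values."""
        return any(pv.value == answer for pv in self.permissible_values)

    def meaning_of(self, answer: str) -> str:
        """Return the documented meaning of a permissible answer."""
        for pv in self.permissible_values:
            if pv.value == answer:
                return pv.meaning
        raise ValueError(f"{answer!r} is not a permissible value")


# Hypothetical element, invented for illustration.
metastasis_present = CommonDataElement(
    question="Is metastasis present at diagnosis?",
    permissible_values=[
        PermissibleValue("Y", "Metastasis is present"),
        PermissibleValue("N", "Metastasis is absent"),
        PermissibleValue("U", "Unknown"),
    ],
)
```

Here `metastasis_present.is_valid("maybe")` is false, which is the kind of check a structured reporting element makes possible.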

  9. Biomedical Information Objects • Data service infrastructure developed using OMG’s Model Driven Architecture approach • Object models expressed in UML represent actual biomedical research entities such as genes, sequences, chromosomes, cellular pathways, ontologies, clinical protocols, etc. • The object models form the basis for uniform APIs (Java, SOAP, HTTP-XML, Perl) that provide an abstraction layer and interfaces for developers to access information without worrying about the back-end data stores

  10. Binding Data, Metadata to Terminology - caCORE SDK • UML Modeling Tool (provided by user) • Information model that will define data classes, attributes and relationships • Semantic Connector • Annotate UML model with ontology concepts: bridges the world of databases to that of structured semantics. • UML Loader (run by NCI staff) • Loads model into the caDSR metadata registry • Model and associated semantics are available at runtime • Code Generator • Model and a code template are inputs into generator • Creates the ‘caCORE-like’ n-tier software system with Java and Web Services APIs
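The annotation step can be pictured as attaching a concept code to each attribute of the information model and flagging anything left unannotated. The Python sketch below is a loose illustration under stated assumptions: the model, the attribute names, and the dictionary layout are hypothetical and do not reflect the actual caCORE SDK or Semantic Connector file formats. C19151 is the Metastasis code quoted later in this transcript.

```python
from typing import Dict, List

# Hypothetical information model: class name -> attribute names.
MODEL: Dict[str, List[str]] = {"TumorFinding": ["metastasis", "grade"]}

# Hypothetical annotations: "Class.attribute" -> ontology concept code.
ANNOTATIONS: Dict[str, str] = {"TumorFinding.metastasis": "C19151"}


def unannotated(model: Dict[str, List[str]],
                annotations: Dict[str, str]) -> List[str]:
    """List model attributes that have no concept code attached yet."""
    missing = []
    for class_name, attributes in model.items():
        for attribute in attributes:
            key = f"{class_name}.{attribute}"
            if key not in annotations:
                missing.append(key)
    return missing
```

With the data above, `unannotated(MODEL, ANNOTATIONS)` reports `TumorFinding.grade` as the attribute still lacking a concept.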

  11. caCORE SDK

  12. Extending Interoperability Beyond the Enterprise • cancer Biomedical Informatics Grid (caBIG) • Common, widely distributed infrastructure permits cancer research community to focus on innovation • Shared vocabulary, data elements, data models facilitate information exchange • Collection of interoperable applications developed to common standard • Raw cancer research data is available for mining and integration

  13. caBIG - facilitate sharing of infrastructure, applications, and data

  14. [Diagram labels: caGrid; NCI; Cancer Centers; other caBIG service providers; other toolkits]

  15. caGRID • Data source exposed as objects • Well-defined objects using caDSR / EVS • Mobius GME for schemas • Metadata identifies services, objects exposed, relationships between objects, relationships between services • Standard Grid interfaces • Standard query language and interface • Advertisement and Discovery • Security • Invocation / Schedule • Execution / coordination [Diagram labels: caBIO; other caBIG data resource; …; caARRAY; rProteomics; other caBIG analysis tool; resource API; caBIG data resource; GRAM; OGSA-DAI; caBIG analytical service; Security; Identifiers; caDSR; EVS; Query; Invocation; Globus; Registry; Grid client API; GUI; Admin]

  16. caGrid Standard Service Metadata • Common Service Metadata describes generic information about the service-providing Cancer Center • Data Service Metadata describes the data exposed using terminology and objects from caDSR/EVS • Analytical Service Metadata describes the supported operations and their inputs and outputs using terminology and objects from caDSR/EVS

  17. Enterprise Vocabulary • NCI Metathesaurus (Cross-map standard vocabularies/ontologies, e.g. SNOMED, MedDRA, ICD) • Semantic integration, inter-vocabulary mapping • UMLS Metathesaurus extended with cancer-oriented vocabularies • 930,000 Concepts, 2,200,000 terms and phrases • Mappings among over 50 vocabularies • NCI Thesaurus • Description logic-based • 48,000 “Concepts” • Concept is the semantic unit • Terms are Concept labels – synonymy • Semantic relationships between Concepts • Other standard terminologies • MedDRA, MGED, SNOMED, GO, etc.

  18. NCI builds on EVS via caCORE Infrastructure

  19. Production EVS Servers in caCORE

  20. Enterprise Vocabulary Services • Services and resources that address NCI's needs for controlled vocabulary http://www.nci.nih.gov/EVS • A collaboration • NCI Office of Communications • Physician Data Query (PDQ), Cancer Information Service and the NCI web portal www.cancer.gov • NCI Center for Bioinformatics • Bioinformatics Core Infrastructure (caCORE), including metadata repository (caDSR) and object models built using EVS terminology for core semantics

  21. NCI EVS Goal – Integration by Meaning • Clinical, translational, and basic research terminologies have overlapping but specialized needs, so EVS helps to: • Integrate different conceptual frameworks • Create terminological and taxonomic conventions across systems • Vocabulary Products • NCI Thesaurus – an ontology-like terminology • NCI Metathesaurus – maps vocabularies • External vocabularies maintained and served: MedDRA, HL7, NDF-RT, LOINC, etc.

  22. Terminology Development Guidelines • Develop a content model • Leverage existing sources where appropriate • (VA NDF-RT, RxNorm, LOINC, etc. …) • Develop unique content where needed • (Cancer genes and diagnoses, drugs and therapies, molecular abnormalities, clinical trial standard terminology, etc.) • Link to other information sources and standards using URLs where possible • (GO, Swissprot, drug formularies, trial protocols) • Federate, merge or map with other standard terminology for semantic integration

  23. NCI Thesaurus (NCIt) • Reference Terminology for NCI, Partners • A Federal Standard Terminology • Broad coverage of the cancer research and clinical domain including prevention and treatment trials • Neoplastic and other Diseases • Findings and Abnormalities • Anatomy, Tissues, Subcellular Structures • Agents, Drugs, Chemicals • Genes, Gene Products, Biological Processes • Animal Models – Mouse, other • Research techniques and management, apparatus, clinical and lab, radiology, imagery

  24. NCI Thesaurus (2) • Published Monthly • Public domain, open content license • Available on-line and by download (OWL, Ontylog XML, flat files) • 48,000+ “Concepts” hierarchically organized • Description-logic based • “Roles” establish machine-readable semantic relationships between Concepts, e.g. “Carcinoma” Clinically_associated_with “Lytic Bone Lesions” and “TP53” Gene_associated_with_Disease “Breast Carcinoma”
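The two role examples on this slide can be read as (source concept, role, target concept) triples. The Python sketch below stores them that way and looks them up; the triple layout and function names are assumptions for illustration, not the Ontylog or OWL representation that NCI distributes.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (source concept, role, target concept)

# The two role assertions quoted on the slide.
ROLES: List[Triple] = [
    ("Carcinoma", "Clinically_associated_with", "Lytic Bone Lesions"),
    ("TP53", "Gene_associated_with_Disease", "Breast Carcinoma"),
]


def build_index(triples: Iterable[Triple]) -> DefaultDict[Tuple[str, str], List[str]]:
    """Index targets by (source concept, role) for direct lookup."""
    index: DefaultDict[Tuple[str, str], List[str]] = defaultdict(list)
    for source, role, target in triples:
        index[(source, role)].append(target)
    return index


def targets(index, source: str, role: str) -> List[str]:
    """Return every concept linked from `source` by `role` (empty if none)."""
    return list(index.get((source, role), []))


ROLE_INDEX = build_index(ROLES)
```

Because the relationships are explicit data rather than free text, a program can answer a question such as "which diseases is TP53 associated with?" via `targets(ROLE_INDEX, "TP53", "Gene_associated_with_Disease")`.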

  25. NCI Thesaurus is Deployed: http://nciterms.nci.nih.gov http://www.nci.nih.gov/EVS (full documentation) • API: caCORE public access • Fulfills NCI and collaborators’ needs for controlled vocabulary • Public domain, open content license

  26. Example Concept Details
      URI: http://nciterms.nci.nih.gov:80/NCIBrowser/ConceptReport.jsp?dictionary=NCI_Thesaurus&code=C19151
      Version: August 2005 (05.09e)
      Concept: Metastasis
      Identifiers:
      • name: Metastasis
      • code: C19151
      Relationships to other concepts:
      • Biological_Process_Has_Result_Biological_Process: Tumor Expansion
      • Biological_Process_Has_Initiator_Process: Pathologic Process
      Information about this concept:
      • Synonym: MET
      • Synonym: metastasis
      • Synonym: Tumor Cell Migration
      • Synonym with source data: Metastasis|PT|CADSR
      • Synonym with source data: MET|AB|CADSR
      • Synonym with source data: Tumor Cell Migration|SY|NCI
      • Synonym with source data: Metastasis|PT|NCI
      • Synonym with source data: metastasis|SY|NCI-GLOSS|CDR0000046710
      • NCI_META_CUI: CL001192
      • Semantic_Type: Phenomenon or Process
      • Related_Lash_Concept: metastasis
      • Preferred_Name: Metastasis
      • DEFINITION: NCI|Metastasis is the spread or migration of cancer cells from one part of the body (the organ in which it first appeared) to another. The secondary tumor contains cells that are like those in the original (primary) tumor. For example, breast cancer cells may spread (metastasize) to the lungs and cause the growth of a new tumor. When this happens, the disease is called metastatic breast cancer. (NCI)
      • Synonym: Metastasis
      • DEFINITION: NCI-GLOSS|(meh-TAS-ta-sis) The spread of cancer from one part of the body to another. A tumor formed from cells that have spread is called a secondary tumor, a metastatic tumor, or a metastasis. The secondary tumor contains cells that are like those in the original (primary) tumor. The plural form of metastasis is metastases (meh-TAS-ta-seez).
      Superconcepts: Cancer Progression
      Subconcepts: Distant Metastasis, Intravascular Metastasis
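The "Synonym with source data" values in this concept report are pipe-delimited. A minimal Python sketch for splitting them follows; the field names (`term`, `term_type`, `source`, `source_code`) are assumptions inferred from the examples on the slide, not documented NCI field names.

```python
from typing import NamedTuple, Optional


class SourcedSynonym(NamedTuple):
    term: str                   # e.g. "Metastasis"
    term_type: str              # e.g. "PT", "AB", "SY" as shown on the slide
    source: str                 # e.g. "NCI", "CADSR", "NCI-GLOSS"
    source_code: Optional[str]  # fourth field, present only on some entries


def parse_sourced_synonym(raw: str) -> SourcedSynonym:
    """Split a 'term|type|source[|code]' string into named fields."""
    parts = raw.split("|")
    if len(parts) not in (3, 4):
        raise ValueError(
            f"expected 3 or 4 pipe-delimited fields, got {len(parts)}: {raw!r}"
        )
    term, term_type, source = parts[:3]
    source_code = parts[3] if len(parts) == 4 else None
    return SourcedSynonym(term, term_type, source, source_code)
```

For instance, `parse_sourced_synonym("metastasis|SY|NCI-GLOSS|CDR0000046710")` keeps the trailing identifier in `source_code`, while the three-field entries leave it as `None`.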

  27. Other Examples: • Use URI to view details of a drug concept: http://nciterms.nci.nih.gov:80/NCIBrowser/ConceptReport.jsp?dictionary=NCI_Thesaurus&code=C620 • Use GUI to search for and view hierarchy: http://nciterms.nci.nih.gov [Screenshot label: Fluvastatin Sodium]
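The concept-report URIs on these slides follow one pattern: a fixed path plus `dictionary` and `code` query parameters. The Python sketch below builds and reads such URIs using only the pattern visible in the slides; the function names are hypothetical, and the code only manipulates strings without contacting the server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base path copied from the URIs shown in the presentation.
BASE = "http://nciterms.nci.nih.gov:80/NCIBrowser/ConceptReport.jsp"


def concept_report_url(code: str, dictionary: str = "NCI_Thesaurus") -> str:
    """Build a concept-report URI for a concept code."""
    query = urlencode({"dictionary": dictionary, "code": code})
    return f"{BASE}?{query}"


def concept_code_from_url(url: str) -> str:
    """Extract the concept code from a concept-report URI."""
    params = parse_qs(urlsplit(url).query)
    return params["code"][0]
```

Calling `concept_report_url("C620")` reproduces the drug-concept URI above, and `concept_code_from_url` recovers `C19151` from the Metastasis URI on the previous slide.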

  28. NCI Metathesaurus: • Filtered UMLS Metathesaurus extended with additional required vocabularies • 930,000+ concepts, 2,200,000 terms and phrases with definitions • Mappings among over 50 vocabularies • Extensive synonymy: Over 40,000 terms for neoplasms mapped to 7,000 concepts • Used as online dictionary and thesaurus, for mapping and document indexing

  29. NCI Metathesaurus (2) • Minor releases monthly, major releases twice a year • Provides a mapped overlap and partial inter-relation of current versions of the vocabularies required by NCI and its partners, e.g. the ICDs, MedDRA, SNOMED, MeSH (NLM Medical Subject Headings), HCPCS (procedures), LOINC (lab values), drug terminologies (VA NDF-RT, AOD, RxNorm, Multum, NCI Thesaurus drugs, etc.)

  30. EVS Products & Services Are Open • NCI Thesaurus is Open Content: ftp://ftp1.nci.nih.gov/pub/cacore/EVS/ThesaurusTermsofUse.htm • NCI Metathesaurus is Mostly Open Source; See Each Source’s License: http://ncimeta.nci.nih.gov/MetaServlet/GenerateSourcesServlet • NCI EVS Servers Are Freely Accessible • On the Web: http://nciterms.nci.nih.gov and http://ncimeta.nci.nih.gov • Via API: http://ncicb.nci.nih.gov/core/caBIO • All Software Developed by NCI EVS is Public Open Source and Free for the Asking: http://ncicb.nci.nih.gov/core

  31. EVS Collaborations • Many Active Collaborations • Federal: FDA, VA, CDC, and Various NIH Institutes such as NHLBI, NIDCR • Major Standards Organizations: HL7, CDISC, W3C, FHA • Cancer Centers and Cancer Cooperative Groups (caBIG, caGRID) • Numerous Research collaborators such as the Microarray Gene Expression Data Society (MGED Ontology, FuGO)

  32. Areas of Collaboration • FDA (Terminology for Drugs, Devices, and Clinical Trial Terminology Initiatives) • VA (Drugs, Common Clinical Trials Semantics, Terminology Operations) • CDC (Cancer Incidence and Prevention, Terminology Operations) • Cancer Centers (Clinical Trials, Experimental Organism Terminology, Micronutrients, Open Terminology Servers, other (caBIG)) • CDISC/HL7 RCRIM (Clinical Research Data Standards)

  33. Contact: Frank Hartel, PhD, NCI Center for Bioinformatics, hartel@mail.nih.gov
