Developing i2b2 ontologies for the long haul


Developing i2b2 Ontologies for the Long Haul. Lori Phillips, MS, Partners HealthCare Systems, Inc. April 25, 2012. National Centers for Biomedical Computing.

Presentation Transcript

Developing i2b2 Ontologies for the Long Haul

Lori Phillips, MS

Partners HealthCare Systems, Inc

April 25, 2012



What is i2b2?

  • Software for explicitly organizing and transforming person-oriented clinical data in a way that is optimized for research

    • Allows integration of clinical data, trials data, and genotypic data

  • A portable and extensible application framework

    • Modular software architecture allows additions without disturbing core parts

    • Available as open source at https://www.i2b2.org


Where is it used?

Academic Health Centers (does not include AHCs that are part of a CTSA):

Arizona State University

City of Hope, Los Angeles

Georgia Health Sciences University, Augusta

Hartford Hospital, CT

HealthShare Montana

Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Boston

Nemours

Phoenix Children's Hospital

Regenstrief Institute

Thomas Jefferson University

University of Connecticut Health Center

University of Missouri School of Medicine

University of Tennessee Health Sciences Center

Wake Forest University Baptist Medical Center

HMOs:

Group Health Cooperative

Kaiser Permanente 

International:

Georges Pompidou Hospital, Paris, France

Hospital of the Free University of Brussels, Belgium

Inserm U936, Rennes, France

Institute for Data Technology and Informatics (IDI), NTNU, Norway

Institute for Molecular Medicine Finland (FIMM)

Karolinska Institute, Sweden

Landspitali University Hospital, Reykjavik, Iceland

Tokyo Medical and Dental University, Japan

University of Bordeaux Segalen, France

University of Erlangen-Nuremberg, Germany

University of Goettingen, Goettingen, Germany

University of Leicester and Hospitals, England (Biomed. Res. Informatics Ctr. for Clin. Sci)

University of Pavia, Pavia, Italy

University of Seoul, Seoul, Korea

Companies:

Johnson and Johnson (tranSMART)

GE Healthcare Clinical Data Services

Where is it used?

CTSAs

  • Boston University

  • Case Western Reserve University (including Cleveland Clinic)

  • Children's National Medical Center (GWU), Washington D.C.

  • Duke University

  • Emory University (including Morehouse School of Medicine and Georgia Tech)

  • Harvard University (including Beth Israel Deaconess Medical Center, Brigham and Women's Hospital, Children's Hospital Boston, Dana-Farber Cancer Institute, Joslin Diabetes Center, Massachusetts General Hospital)

  • Medical University of South Carolina

  • Medical College of Wisconsin

  • Oregon Health & Science University

  • Penn State Milton S. Hershey Medical Center

  • Tufts University

  • University of Alabama at Birmingham

  • University of Arkansas for Medical Sciences

  • University of California Davis

  • University of California, Irvine

  • University of California, Los Angeles*

  • University of California, San Diego*

  • University of California San Francisco

  • University of Chicago

  • University of Cincinnati (including Cincinnati Children's Hospital Medical Center)

  • University of Colorado Denver (including Children's Hospital Colorado)

  • University of Florida

  • University of Kansas Medical Center

  • University of Kentucky Research Foundation

  • University of Massachusetts Medical School, Worcester

  • University of Michigan

  • University of Pennsylvania (including Children's Hospital of Philadelphia)

  • University of Pittsburgh (including their Cancer Institute)

  • University of Rochester School of Medicine and Dentistry

  • University of Texas Health Sciences Center at Houston

  • University of Texas Health Sciences Center at San Antonio

  • University of Texas Medical Branch (Galveston)

  • University of Texas Southwestern Medical Center at Dallas

  • University of Utah

  • University of Washington

  • University of Wisconsin - Madison (including Marshfield Clinic)

  • Virginia Commonwealth University

  • Weill Cornell Medical College


Why use i2b2?

  • Cohort discovery

    • Enables and simplifies research cohort discovery across an institution’s large, heterogeneous clinical datasets

  • Hypothesis generation

    • Enables and simplifies analysis of data to support a hypothesis

  • Retrospective data analysis

    • Enables the retrospective analysis of data to support/refute claims.


i2b2 Workbench


Data Model

  • FACTS

    • The quantitative or factual data being queried

  • DIMENSIONS

    • Groups of hierarchies and descriptors that define the facts.

  • STAR SCHEMA

    • A single fact table surrounded by numerous dimension tables.


i2b2 Star Schema


Observation (fact table) Primary Keys

Patient_num: Distinct number for every patient

Encounter_num: Distinct number for every visit

Concept_cd: Distinct code for every concept

Observer_cd: Distinct code for every observer

Start_date: Date-time the observation began

Modifier_cd: Code to modify concept_cd

Instance_num: Mechanism to group concept modifiers
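
To make the composite key concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the i2b2 DDL: the column names simply follow the slide, the column types are guesses, and the '@' placeholder for "no modifier" and the sample codes are assumptions made for illustration.

```python
import sqlite3

def make_fact_table():
    """Create an in-memory observation_fact table keyed as on the slide."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE observation_fact (
            patient_num   INTEGER NOT NULL,
            encounter_num INTEGER NOT NULL,
            concept_cd    TEXT    NOT NULL,
            observer_cd   TEXT    NOT NULL,
            start_date    TEXT    NOT NULL,
            modifier_cd   TEXT    NOT NULL,
            instance_num  INTEGER NOT NULL,
            PRIMARY KEY (patient_num, encounter_num, concept_cd, observer_cd,
                         start_date, modifier_cd, instance_num)
        )""")
    return conn

def add_fact(conn, fact):
    """Insert one fact; return False if the composite key already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO observation_fact VALUES (?, ?, ?, ?, ?, ?, ?)", fact)
        return True
    except sqlite3.IntegrityError:
        return False

conn = make_fact_table()
# Sample values are illustrative only; '@' stands for "no modifier" (assumption).
base = (1, 100, "ICD9:492.8", "DR:42", "2012-04-25T09:00", "@", 1)
first = add_fact(conn, base)
duplicate = add_fact(conn, base)
# Same observation with a hypothetical modifier code: a different key, so accepted.
with_modifier = add_fact(conn, base[:5] + ("MOD:PRIMARY", 1))
```

The second insert is rejected because all seven key columns match, while the row that differs only in modifier_cd is stored as a separate fact.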


i2b2 Fact Table

  • In i2b2, an atomic fact is an observation on a patient.

  • Examples of facts

    • Diagnoses

    • Procedures

    • Lab data

    • Medications

    • Genetic data


i2b2 Dimension Tables

  • Dimension tables contain descriptive information about the facts.

  • Examples

    • Concept dimension describes the concepts stored in the concept_cd field.

    • Provider dimension contains information about the observer_cd field

    • Patient dimension contains information about the patient_num field

    • Visit dimension contains information about the encounter_num field

    • Modifier dimension contains information about the modifier_cd field


How does i2b2 use Ontologies?

  • By and large, the concepts stored in the fact table come from clinical coding systems or ontologies.

  • Largely dependent on data available to institution

    • Diagnoses ICD9/ICD10/SNOMED

    • Procedures CPT/ICD9

    • Medications NDC/RXNORM

    • Lab results LOINC

    • Molecular/genomic data

    • Custom or project specific data

  • Ontologies are used to organize query terms (and concepts) hierarchically.


Metadata table

  • Query terms are stored in a separate metadata table.

  • There is a one-to-one mapping of terms in the metadata to concepts in the dimension table.

  • The structure of the metadata table is integral to both the visualization of the query terms (tree) and the query mechanism itself.



i2b2 Metadata Root Level Categories

  • Terms with c_hlevel = 1

  • Display name is c_name

  • Icon (folder or container) is determined by c_visualattributes

  • Example c_fullname:

    • \Diagnoses\


Query terms are visualized hierarchically in tree

\Diagnoses\ (level 1)
  Respiratory system\ (level 2)
    Chronic obstructive diseases\ (level 3)
      Emphysema\ (level 4)


Why are hierarchies so important for i2b2?

Hierarchies form the basis of both the visualization of the terms and the query mechanism itself.

select * from metadata where c_fullname like '\Diagnoses\Respiratory system\Chronic obstructive diseases\Emphysema\%' and c_hlevel = 5
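
The lookup of a node's children can be tried out with a small sketch in Python and sqlite3. The table below holds only the three columns the query needs, and the two level-5 terms under Emphysema are illustrative rows added for the example, not content from the presentation.

```python
import sqlite3

def hlevel(fullname):
    """Depth of a c_fullname path; the root category has depth 1."""
    return len(fullname.strip("\\").split("\\"))

EMPHYSEMA = ("\\Diagnoses\\Respiratory system\\"
             "Chronic obstructive diseases\\Emphysema\\")
PATHS = [
    "\\Diagnoses\\",
    "\\Diagnoses\\Respiratory system\\",
    "\\Diagnoses\\Respiratory system\\Chronic obstructive diseases\\",
    EMPHYSEMA,
    EMPHYSEMA + "Emphysematous bleb\\",  # illustrative level-5 term
    EMPHYSEMA + "Other emphysema\\",     # illustrative level-5 term
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata (c_hlevel INTEGER, c_fullname TEXT, c_name TEXT)")
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [(hlevel(p), p, p.strip("\\").split("\\")[-1]) for p in PATHS])

def children(conn, parent_fullname):
    """Names of the terms exactly one level below parent_fullname."""
    rows = conn.execute(
        "SELECT c_name FROM metadata "
        "WHERE c_fullname LIKE ? AND c_hlevel = ? ORDER BY c_name",
        (parent_fullname + "%", hlevel(parent_fullname) + 1))
    return [row[0] for row in rows]

emphysema_children = children(conn, EMPHYSEMA)
```

The prefix match selects the whole subtree and the c_hlevel filter narrows it to direct children, which is what a tree view needs when a folder is expanded. SQLite treats the backslash as an ordinary character in LIKE; other databases may need an ESCAPE clause.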


Structure of Metadata Table


Hierarchies in queries

select patient_num from observation_fact where concept_cd IN (select concept_cd from concept_dimension where concept_path LIKE '\Diagnoses\Respiratory system\Chronic obstructive diseases\Emphysema\%')
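
A runnable sketch of this cohort query, again with Python and sqlite3, shows how one folder path pulls in every concept beneath it. The two tables are cut down to the columns the query touches, and all patients, codes and the Asthma sibling folder are made-up sample data.

```python
import sqlite3

COPD = "\\Diagnoses\\Respiratory system\\Chronic obstructive diseases\\"
EMPHYSEMA = COPD + "Emphysema\\"
ASTHMA = COPD + "Asthma\\"  # illustrative sibling folder

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE concept_dimension (concept_path TEXT, concept_cd TEXT)")
conn.execute("CREATE TABLE observation_fact (patient_num INTEGER, concept_cd TEXT)")
conn.executemany("INSERT INTO concept_dimension VALUES (?, ?)", [
    (EMPHYSEMA, "ICD9:492"),
    (EMPHYSEMA + "Emphysematous bleb\\", "ICD9:492.0"),
    (EMPHYSEMA + "Other emphysema\\", "ICD9:492.8"),
    (ASTHMA, "ICD9:493"),
])
conn.executemany("INSERT INTO observation_fact VALUES (?, ?)", [
    (1, "ICD9:492.8"), (1, "ICD9:492.0"), (2, "ICD9:493"), (3, "ICD9:492.0"),
])

def patients_under(conn, concept_path):
    """Patients with at least one fact coded anywhere below concept_path."""
    rows = conn.execute(
        "SELECT DISTINCT patient_num FROM observation_fact "
        "WHERE concept_cd IN (SELECT concept_cd FROM concept_dimension "
        "                     WHERE concept_path LIKE ?) "
        "ORDER BY patient_num",
        (concept_path + "%",))
    return [row[0] for row in rows]

emphysema_cohort = patients_under(conn, EMPHYSEMA)
copd_cohort = patients_under(conn, COPD)
```

Querying the parent folder widens the cohort without changing the SQL, which is why the path stored in the dimension table has to mirror the hierarchy shown in the tree.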


i2b2 Ontologies for the Long Haul

  • How do I create i2b2 metadata for a known ontology?

    • ICD-10

  • What happens to my legacy clinical data when I have to move to ICD-10?

    • Merging ICD-9 with ICD-10

  • How do I handle genomic metadata?

  • How do I handle custom metadata?


NCBO BioPortal ICD-10


Building an ICD-10 Ontology with NCBO services

  • Pull data from NCBO via REST services.

  • Reorganize information into i2b2 Metadata format

bioportal/concepts/46302/all

<data>
  <pageNum>1</pageNum>
  <numPages>1832</numPages>
  <pageSize>50</pageSize>
  <numResultsPage>50</numResultsPage>
  <numResultsTotal>91590</numResultsTotal>
  <contents class="org.ncbo.stanford.bean.concept.ClassBeanResultListBean">
    <classBeanResultList>
      <classBean>
        <id>0-ICD10CM</id>
        <fullId>http://purl.bioontology.org/ontology/ICD10CM/0-ICD10CM</fullId>
        <label>ICD-10-CM TABULAR LIST of DISEASES and INJURIES</label>
        <type>class</type>
        <relations>
          <entry>
            <string>ChildCount</string>
            <int>0</int>
          </entry> ……
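
The paging fields in this response tell a client how many further requests are needed. The sketch below, in Python, only parses those fields and checks that they agree with each other; it does not call the NCBO service, and the function names are invented for the example.

```python
import math
import xml.etree.ElementTree as ET

# Paging fields copied from the response excerpt; the <contents> element is omitted.
PAGE_HEADER = """<data>
  <pageNum>1</pageNum>
  <numPages>1832</numPages>
  <pageSize>50</pageSize>
  <numResultsPage>50</numResultsPage>
  <numResultsTotal>91590</numResultsTotal>
</data>"""

def paging_info(xml_text):
    """Return the integer paging fields of a response as a dict."""
    root = ET.fromstring(xml_text)
    return {child.tag: int(child.text) for child in root}

def remaining_pages(info):
    """Page numbers still to request after the current page."""
    expected = math.ceil(info["numResultsTotal"] / info["pageSize"])
    if expected != info["numPages"]:
        raise ValueError("paging fields are inconsistent")
    return range(info["pageNum"] + 1, info["numPages"] + 1)

info = paging_info(PAGE_HEADER)
todo = remaining_pages(info)
```

With 91590 results at 50 per page, the computed page count matches the reported 1832, so 1831 pages remain after the first.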


Primary challenges

  • i2b2 Metadata depends upon hierarchical information

    • c_fullname, c_tooltip maintain the hierarchy from root to leaves

Diseases of the respiratory system \

Chronic lower respiratory diseases \

Emphysema


Challenges

  • NCBO REST service that enables pull of concepts includes immediate parent/child info only

    • Hierarchy must be computed

<data>
  <classBean>
    <id>J43</id>
    <label>Emphysema</label>
    <relations>
      <entry>
        <string>SuperClass</string>
        <list>
          <classBean>
            <id>J40-J47</id>
            <label>Chronic lower respiratory diseases</label>
          </classBean>
        </list>
      </entry>
    </relations>
  </classBean>
</data>
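
Because each response names only the immediate parent, the root-to-leaf path has to be assembled by walking parent links. The following Python sketch parses the Emphysema record and builds a c_fullname-style path. The parent of J40-J47 is entered by hand here (the chapter id J00-J99 is an assumption standing in for a second response), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

EMPHYSEMA_XML = """<data>
  <classBean>
    <id>J43</id>
    <label>Emphysema</label>
    <relations>
      <entry>
        <string>SuperClass</string>
        <list>
          <classBean>
            <id>J40-J47</id>
            <label>Chronic lower respiratory diseases</label>
          </classBean>
        </list>
      </entry>
    </relations>
  </classBean>
</data>"""

def parse_class_bean(xml_text):
    """Return (id, label, parent_id) for the classBean in one response."""
    bean = ET.fromstring(xml_text).find("classBean")
    parent_id = None
    for entry in bean.findall("relations/entry"):
        if entry.findtext("string") == "SuperClass":
            parent_id = entry.findtext("list/classBean/id")
    return bean.findtext("id"), bean.findtext("label"), parent_id

def full_path(concept_id, labels, parents):
    """Walk parent links up to the root and join the labels into a path."""
    names, seen = [], set()
    current = concept_id
    while current is not None:
        if current in seen:
            raise ValueError("cycle in parent links at " + current)
        seen.add(current)
        names.append(labels[current])
        current = parents.get(current)
    return "\\" + "\\".join(reversed(names)) + "\\"

concept_id, label, parent_id = parse_class_bean(EMPHYSEMA_XML)
labels = {
    concept_id: label,
    "J40-J47": "Chronic lower respiratory diseases",
    "J00-J99": "Diseases of the respiratory system",  # assumed chapter record
}
parents = {concept_id: parent_id, "J40-J47": "J00-J99"}
path = full_path(concept_id, labels, parents)
```

In a full extraction the labels and parents tables would be filled from every page of results before any path is computed, so that each walk can reach the root.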


NCBO Extraction workflow

[Diagram: extraction workflow. A request to extract an ontology (ICD-10) goes to NCBO REST, which returns XML; the extracted data is processed into i2b2 metadata.]


Extracted ICD-10 terms


Released deliverables

https://community.i2b2.org/wiki/display/NCBO


What about my legacy ICD-9 data?

  • Ideally we would like an i2b2 ontology that integrates ICD-9 into ICD-10.


Mapping Tool

  • Tool to verify/(re)assign ontology mappings.


Navigating the Mapping Tool Tree

  • Displays terms mapped from one ontology within hierarchy of another

  • Mapped terms are displayed adjacent to terms they are mapped to and appear in bold


Adding a new mapping

  • ICD9:269.3, Mineral deficiency should appear for ICD10:E63 Other nutritional deficiencies

  • Copy term ICD9:269.3


Adding a new mapping (continued)

  • Paste onto ICD10:E63 Other nutritional deficiencies


Move a mapping

  • Ascorbic acid deficiency (ICD9:267) can be moved down one level to Ascorbic acid deficiency (ICD10:E54)

  • Drag and drop the term down one level.


Unmap a mapping

ICD9:416.8 Other chronic pulmonary heart diseases appears in two places: the one attached to ICD10:I27.2 appears incorrect and can be unmapped.


The Unmapped Terms List

  • Free form list of terms to be mapped

  • Locate term you wish to map to in the hierarchy tree. Drag from table to term in the tree.

  • If you make a mistake you can either reassign the mapped term within the tree or unmap it from tree.

  • Unmap will cause it to reappear in the unmapped terms list if the term has no other mappings.
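
The bookkeeping behind these rules can be sketched as a small Python class. This is only a model of the behaviour described on the slide, not the tool's code: the class and method names are invented, and the second target for ICD9:416.8 is an illustrative choice because the slide does not name it.

```python
class MappingTable:
    """Tracks term mappings and the list of terms that have none."""

    def __init__(self, unmapped_terms):
        self.unmapped = set(unmapped_terms)
        self.mappings = set()  # (source_term, target_term) pairs

    def map(self, source, target):
        """Drag a term onto a target in the tree."""
        self.mappings.add((source, target))
        self.unmapped.discard(source)

    def unmap(self, source, target):
        """Remove one mapping; the term returns to the unmapped list
        only if it has no other mappings left."""
        self.mappings.discard((source, target))
        if not any(s == source for s, _ in self.mappings):
            self.unmapped.add(source)

    def reassign(self, source, old_target, new_target):
        """Correct a mistake by moving a mapping within the tree."""
        self.mappings.discard((source, old_target))
        self.mappings.add((source, new_target))

table = MappingTable(["ICD9:269.3", "ICD9:416.8"])
table.map("ICD9:269.3", "ICD10:E63")
table.map("ICD9:416.8", "ICD10:I27.2")
table.map("ICD9:416.8", "ICD10:I27.8")  # second placement is illustrative

table.unmap("ICD9:416.8", "ICD10:I27.2")   # still mapped elsewhere
still_mapped = "ICD9:416.8" not in table.unmapped
table.unmap("ICD9:269.3", "ICD10:E63")     # no mappings remain
back_in_list = "ICD9:269.3" in table.unmapped
```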


Assigning an unmapped term

  • Drag from unmapped terms list

  • Drop onto term we are mapping to


Unmapping a term

  • Drag term from tree

  • Drop onto unmapped terms list


Search Unmapped Terms by Name


Search Unmapped Terms by Code


Mapped Terms Viewer


Search Mapped Terms by Code


Search Mapped Terms by Name


Merging Ontologies

  • Mapping tool provides a visualization of what the merged ontologies would look like

  • What if we could extract a single metadata table from this?


Integration tool

[Diagram: integration workflow. A request to integrate ICD-9 into ICD-10 goes to the MapperCell; for each mapped ICD-9 term, the ICD-10 hierarchy is computed, yielding ICD-10 merged with the ICD-9 terms.]
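
One way to picture the integration step is the Python sketch below, which places each mapped ICD-9 term beneath the path of its ICD-10 target and emits rows for a single metadata table. The placement rule (ICD-9 term as a child of the target), the function name and the ICD-10 paths are all assumptions for illustration; the tool itself may arrange merged terms differently.

```python
def hlevel(fullname):
    """Depth of a c_fullname path; the root category has depth 1."""
    return len(fullname.strip("\\").split("\\"))

def integrate(icd10_paths, icd9_names, mappings):
    """Return (c_hlevel, c_fullname, c_name, c_basecode) rows for ICD-10
    terms plus each mapped ICD-9 term, sorted by path."""
    rows = [(hlevel(path), path, path.strip("\\").split("\\")[-1], code)
            for code, path in icd10_paths.items()]
    for icd9_code, icd10_code in mappings:
        # Assumption: the ICD-9 term inherits the hierarchy of its target.
        path = icd10_paths[icd10_code] + icd9_names[icd9_code] + "\\"
        rows.append((hlevel(path), path, icd9_names[icd9_code], icd9_code))
    return sorted(rows, key=lambda row: row[1])

# Hypothetical ICD-10 paths; only the codes and term names come from the slides.
ICD10_PATHS = {
    "ICD10:E54": "\\Diagnoses\\Nutritional deficiencies\\Ascorbic acid deficiency\\",
    "ICD10:E63": "\\Diagnoses\\Nutritional deficiencies\\Other nutritional deficiencies\\",
}
ICD9_NAMES = {"ICD9:269.3": "Mineral deficiency"}
MAPPINGS = [("ICD9:269.3", "ICD10:E63")]

merged = integrate(ICD10_PATHS, ICD9_NAMES, MAPPINGS)
```

Because the ICD-9 row carries its own c_basecode under an ICD-10 path, a query on the ICD-10 folder would also pick up legacy facts coded in ICD-9.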


How to handle genomic data

  • Ability to organize the variants for ease of navigation

    • Needs may differ between geneticist, physician, research scientist

  • Ability to query for the variant in the workbench

    • Genomic labs may report data differently

    • Define the variant so it may be reliably identified over time

    • Implication is that the identifier for the variant does not change over time or is maintainable.


How to (reliably) identify a genomic variant?

  • Chromosome location and nucleotide substitution?

  • HGVS name?

  • Gene name + flanking sequences?

  • RS number?

  • All of them?


RS number

Uniquely identifies a variant over time ….but….

Novel variants may not have rs number

User may not want to submit to dbSNP


Gene name + flanking sequences

Not guaranteed to identify the variant if the gene has several isoforms

Example: EGFR


HGVS Name

Uniquely identifies variant within a referenced and versioned accession and details the nucleotide substitution.

NM_005228.3:c.2155G>T

  • NM_005228.3: RefSeq accession

  • c.: Coding DNA

  • 2155: Position

  • G>T: Nucleotide substitution
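
The parts of such a name can be pulled apart with a regular expression. The Python sketch below handles only simple single-nucleotide substitutions written like the example; it is not a full HGVS parser, and the field names are chosen for the example.

```python
import re

# Matches names such as NM_005228.3:c.2155G>T (simple substitutions only).
HGVS_SUBSTITUTION = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+)\.(?P<version>\d+):"
    r"(?P<coordinate_type>[cgmnr])\.(?P<position>\d+)"
    r"(?P<reference>[ACGT])>(?P<variant>[ACGT])$")

def parse_hgvs(name):
    """Split a simple substitution name into its labelled parts."""
    match = HGVS_SUBSTITUTION.match(name)
    if match is None:
        raise ValueError("not a simple substitution: " + name)
    parts = match.groupdict()
    parts["version"] = int(parts["version"])
    parts["position"] = int(parts["position"])
    return parts

parsed = parse_hgvs("NM_005228.3:c.2155G>T")
```

Keeping the accession and its version as separate fields makes it possible to notice when two reports name the same change against different versions of the reference.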


Is there a common denominator in all of this?

Yes: all of them ultimately describe the variant's location on a chromosome.

Nucleotide substitution defines the physical manifestation of the variant.

We propose:

  • HGVS name (nucleotide substitution, positional information)

  • Flanking sequences (a way to verify positional information)

as a way to unequivocally equate two variants across domains and across versions.
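
One possible reading of this proposal, sketched in Python: treat two reports as the same variant when the gene, the nucleotide substitution and both flanking sequences agree, even if the accession version in the HGVS name differs. The comparison rule, the record layout and the second accession version are assumptions made for the example, and the sketch presumes both labs report flanks of the same length on the same strand.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    hgvs_name: str
    gene_name: str
    flanking_5: str
    flanking_3: str

def substitution(hgvs_name):
    """Return the 'G>T' part of a simple substitution name."""
    change = hgvs_name.rsplit(":", 1)[-1]
    return change[-3:]

def same_variant(a, b):
    """Equate two reports by gene, substitution and flanking sequences."""
    return (a.gene_name.upper() == b.gene_name.upper()
            and substitution(a.hgvs_name) == substitution(b.hgvs_name)
            and a.flanking_5.upper() == b.flanking_5.upper()
            and a.flanking_3.upper() == b.flanking_3.upper())

FLANK_5 = "GAATTCAAAAAGATCAAAGTGCTG"
FLANK_3 = "GCTCCGGTGCGTTCGGCACGGTGT"

lab_a = Variant("NM_005228.3:c.2155G>T", "EGFR", FLANK_5, FLANK_3)
# Hypothetical report against a later accession version, in lower case.
lab_b = Variant("NM_005228.4:c.2155G>T", "egfr", FLANK_5.lower(), FLANK_3.lower())
# Same name but different 5' flank: positional check fails.
elsewhere = Variant("NM_005228.3:c.2155G>T", "EGFR", "A" * 24, FLANK_3)

match_across_versions = same_variant(lab_a, lab_b)
match_wrong_position = same_variant(lab_a, elsewhere)
```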


Structure of Metadata Table


Genomic MetadataXML record

GenomicMetadata
  Version: 1.0
  ReferenceGenomeVersion: hg18
  SequenceVariant
    HGVSName: NM_005228.3:c.2155G>T
    SystematicName: c.2155G>T
    SystematicNameProtein: p.Gly719Cys
    AaChange: missense
    DnaChange: substitution
  SequenceVariantLocation
    GeneName: EGFR
    FlankingSeq_5: GAATTCAAAAAGATCAAAGTGCTG
    FlankingSeq_3: GCTCCGGTGCGTTCGGCACGGTGT
    RegionType: exon
    RegionName: Exon 18
  Accessions
    Accession
      Name: NM_005228
      Type: mrna (NCBI)
    Accession
      Name: NP_005219
      Type: protein (NCBI)
    Accession
      Name: NT_004487
      Type: contig (NCBI)
  ChromosomeLocation
    Chromosome: chr7
    Region: 7p12
    Orientation: +


Organizational challenges

By Disease?

By Gene?


Combining equivalent terms


How to handle custom (local) metadata

  • Edit Tool ideal for creating small, non-standard ontology for a local project.

  • Consider the case for classifying patients as smokers, non-smokers or smoking status unknown

  • The Custom Metadata folder is designed for use with the creation of local terms.
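
The rows such a local ontology would need can be sketched in Python. The column names follow the metadata table described earlier in the talk; the 'FA' and 'LA' visual attribute values (folder and leaf), the CUSTOM: code prefix and the function name are assumptions for illustration rather than output of the Edit Tool.

```python
def custom_metadata_rows(folder, terms, root="\\Custom Metadata\\"):
    """Build one folder row plus one leaf row per term."""
    folder_path = root + folder + "\\"
    level = len(folder_path.strip("\\").split("\\"))
    rows = [{
        "c_hlevel": level,
        "c_fullname": folder_path,
        "c_name": folder,
        "c_visualattributes": "FA",  # assumed code for a folder
        "c_basecode": None,
    }]
    for term in terms:
        rows.append({
            "c_hlevel": level + 1,
            "c_fullname": folder_path + term + "\\",
            "c_name": term,
            "c_visualattributes": "LA",  # assumed code for a leaf
            # Hypothetical local code scheme.
            "c_basecode": "CUSTOM:" + term.upper().replace(" ", "_").replace("-", "_"),
        })
    return rows

rows = custom_metadata_rows(
    "Smoking status", ["Smoker", "Non-smoker", "Smoking status unknown"])
```

Each leaf's c_basecode is what would appear as concept_cd on the facts, so a stable local code scheme matters more than the display names.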


Create a “Smoking status” folder


Populate folder with “Smoker”, “Non-smoker”, etc.


Smoking status custom metadata


www.i2b2.org

https://community.i2b2.org/wiki

http://bioportal.bioontology.org

