Square wheels: electronic medical records for discovery research in rheumatoid arthritis


Square wheels: electronic medical records for discovery research in rheumatoid arthritis. Robert M. Plenge, M.D., Ph.D., Harvard Medical School. October 30, 2009. NCRR sponsored "Using EHR Data for Discovery Research".



Square wheels: electronic medical records for discovery research in rheumatoid arthritis

^ genetic

Robert M. Plenge, M.D., Ph.D.

October 30, 2009

NCRR sponsored "Using EHR Data for Discovery Research"

HARVARD MEDICAL SCHOOL


Key questions

What are the regulatory obstacles impacting your work?

What are the resource needs required to replicate your work at other institutions?

What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?


Key questions

How can I implement your approach, and how much better is it?


[Diagram: genotype, phenotype, clinical care]


[Diagram: genotype, bottleneck, phenotype, clinical care]


October 2009: >30 RA risk loci

Together explain ~35% of the genetic burden of disease

[Figure: timeline of RA risk locus discovery, with year markers 1978, 1987, 2003, 2004, 2005, 2007, 2008, 2009]

Loci shown: REL, BLK, TAGAP, CD28, TRAF6, PTPRC, FCGR2A, PRDM1, CD2-CD58, CD40, CCL21, CD244, IL2RB, TNFRSF14, PRKCQ, PIP4K2C, IL2RA, AFF3, TNFAIP3, STAT4, TRAF1-C5, IL2-IL21, HLA DR4, “shared epitope” hypothesis, PADI4, PTPN22, CTLA4

Latest GWAS in 25,000 case-control samples with replication in 20,000 additional samples: >10 new loci

Raychaudhuri et al., in press, Nature Genetics


[Diagram: genotype, phenotype, bottleneck, clinical care]


Genetic predictors of response to anti-TNF therapy in RA

PTPRC/CD45 allele

n=1,283 patients

P=0.0001

Submitted to Arth & Rheum


How can we collect DNA and detailed clinical data on >20,000 RA patients?


What are the options for collecting clinical data and DNA for genetic studies?


Options for clinical + DNA


Content of EMRs

EMRs are increasingly utilized!

  • Narrative data = free-form written text

    • info about symptoms, medical history, medications, exam, impression/plan

  • Codified data = structured format

    • age, demographics, and billing codes


This is not a new idea…

Sens: 89%

PPV: 57%

Gabriel (1994) Arthritis and Rheumatism
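The two figures quoted above are the standard accuracy measures for a database definition of RA. As a minimal sketch, the snippet below shows how sensitivity and PPV are computed from a chart-review comparison. The counts are hypothetical and were chosen only so that the ratios reproduce the 89% and 57% on the slide; they are not taken from the cited study.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of true RA patients that the database definition finds."""
    return true_pos / (true_pos + false_neg)


def ppv(true_pos, false_pos):
    """Fraction of patients flagged as RA who truly have RA."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts, chosen only so the ratios match the slide.
true_pos, false_neg, false_pos = 89, 11, 67

sens = sensitivity(true_pos, false_neg)   # 89 / 100
precision = ppv(true_pos, false_pos)      # 89 / 156
print(f"sensitivity = {sens:.0%}, PPV = {precision:.0%}")
```

A definition can find most true cases (high sensitivity) and still flag many patients who do not have RA (low PPV), which is the problem the rest of the talk addresses.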


…but EMR data are “dirty”

Conclusion: The sole reliance on such databases for the diagnosis of RA can result in substantial misdiagnosis.

Gabriel (1994) Arthritis and Rheumatism


Partners HealthCare: 4 million patients


Partners HealthCare: linked by EMR


Partners HealthCare: organized by i2b2


4 million patients

  • ICD9 RA and/or CCP checked (goal = high sensitivity): 31,171 patients

  • Classification algorithm (goal = high PPV): 3,585 RA patients

  • Discarded blood for DNA

  • Clinical subsets


Our library of RA phenotypes

Qing Zeng

  • Natural language processing (NLP)

    • disease terms (e.g., RA, lupus)

    • medications (e.g., methotrexate)

    • autoantibodies (e.g., CCP, RF)

    • radiographic erosions

  • Codified data

    • ICD9 disease codes

    • prescription medications

    • laboratory autoantibodies


Our library of RA phenotypes

Shawn Murphy

  • Natural language processing (NLP)

    • disease terms (e.g., RA, lupus)

    • medications (e.g., methotrexate)

    • autoantibodies (e.g., CCP, RF)

    • radiographic erosions

  • Codified data

    • ICD9 disease codes

    • prescription medications

    • laboratory autoantibodies
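To make the NLP half of the phenotype library concrete, here is a toy sketch that turns one narrative note into counts of concept mentions. It is not the NLP system used in this work: the concept names, the regular expressions, and the sample note are all hypothetical, and the sketch ignores negation ("no erosions" would still count as a mention).

```python
import re

# Hypothetical patterns; (?i:...) makes the spelled-out terms case-insensitive
# while abbreviations such as RA or RF stay case-sensitive.
CONCEPT_PATTERNS = {
    "disease_RA": r"(?i:\brheumatoid arthritis\b)|\bRA\b",
    "disease_lupus": r"(?i:\blupus\b)|\bSLE\b",
    "med_methotrexate": r"(?i:\bmethotrexate\b)|\bMTX\b",
    "autoantibody_CCP": r"\b(?i:anti-)?CCP\b",
    "autoantibody_RF": r"(?i:\brheumatoid factor\b)|\bRF\b",
    "radiographic_erosions": r"(?i:\berosions?\b)",
}


def count_mentions(note):
    """Count how often each concept is mentioned in one narrative note."""
    return {name: len(re.findall(pattern, note))
            for name, pattern in CONCEPT_PATTERNS.items()}


# Hypothetical note text, written for this example.
note = ("Pt with rheumatoid arthritis (RA) on MTX. "
        "Anti-CCP positive, RF negative. Films show erosions.")
counts = count_mentions(note)
print(counts)
```

Counts like these can sit alongside the codified variables (ICD9 codes, prescriptions, laboratory results) as inputs to a classification model.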


‘Optimal’ algorithm to classify RA: NLP + codified data

Codified data

NLP data

Regression model with a penalty parameter (to avoid over-fitting)

Tianxi Cai, Kat Liao
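The slide does not say which penalty or which software was used, so the following is only a sketch of the general idea: a logistic regression over codified and NLP features, fitted with a penalty that shrinks the weights. The L2 (ridge) penalty, the gradient-descent fit, the three features, and the eight toy patients are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_penalized_logistic(X, y, penalty=0.1, lr=0.1, n_iter=2000):
    """Fit logistic regression by gradient descent with an L2 penalty."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(p):
            # The penalty term shrinks each weight toward zero
            # (the intercept is not penalized).
            w[j] -= lr * (grad_w[j] / n + penalty * w[j])
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of RA for one feature vector."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical toy features per patient:
# [RA ICD9 code present (codified), NLP mention of RA, NLP mention of methotrexate]
X = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1],
     [1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w_weak, b_weak = fit_penalized_logistic(X, y, penalty=0.01)
w_strong, b_strong = fit_penalized_logistic(X, y, penalty=1.0)
print("weak penalty weights:  ", [round(v, 2) for v in w_weak])
print("strong penalty weights:", [round(v, 2) for v in w_strong])
```

A larger penalty gives smaller weights, which is how the penalty guards against over-fitting when there are many candidate variables and few chart-reviewed training patients.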


High PPV with adequate sensitivity

392 out of 400 (98%) had definite or possible RA!


This means more patients!

~25% more subjects with the complete algorithm:

3,585 subjects (3,334 with true RA)

3,046 subjects (2,680 with true RA)
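The arithmetic behind these two rows can be checked directly. The sketch below assumes the first row is the complete algorithm and the second is a simpler comparator; the transcript does not preserve which comparator it is, so the name `comparator` is a placeholder.

```python
def ppv(true_cases, flagged):
    """Positive predictive value: true RA among subjects the algorithm flags."""
    return true_cases / flagged


complete = {"flagged": 3585, "true_ra": 3334}
comparator = {"flagged": 3046, "true_ra": 2680}  # algorithm label not preserved in the transcript

ppv_complete = ppv(complete["true_ra"], complete["flagged"])
ppv_comparator = ppv(comparator["true_ra"], comparator["flagged"])

# Relative gain in true RA cases and in flagged subjects.
gain_true_ra = complete["true_ra"] / comparator["true_ra"] - 1
gain_flagged = complete["flagged"] / comparator["flagged"] - 1

print(f"PPV: {ppv_complete:.1%} vs {ppv_comparator:.1%}")
print(f"more true RA: {gain_true_ra:.1%}, more flagged subjects: {gain_flagged:.1%}")
```

On these numbers the "~25% more" figure matches the gain in subjects with true RA (about 24%), while the gain in flagged subjects is about 18%.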


4 million patients

  • ICD9 RA and/or CCP checked (goal = high sensitivity): 31,171 patients

  • Classification algorithm (goal = high PPV): 3,585 RA patients

  • Discarded blood for DNA


Linking the Datamart-Crimson

NLP data

Codified data


Status of i2b2 Crimson collection

Genotyping of 384 SNPs (RA risk alleles, AIMs, other) is ongoing at the Broad Institute

  • Over 3,000 samples collected to date

    • cost = $10 per sample

  • DNA extracted on >2,400 buffy coats

    • cost = $20 per sample

    • >90% had ≥1 µg of DNA

    • >99% had ≥5 µg of DNA after WGA
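As a back-of-the-envelope sketch, the per-sample figures above can be scaled to the 20,000-patient target raised earlier in the talk. This assumes the collection and extraction costs simply add for every sample, and it leaves out genotyping, staff, and informatics costs, none of which are quoted on the slide.

```python
COLLECTION_COST_USD = 10   # per sample collected (from the slide)
EXTRACTION_COST_USD = 20   # per sample with DNA extracted (from the slide)


def collection_and_extraction_cost(n_patients):
    """Total cost if every collected sample also has DNA extracted (assumption)."""
    return n_patients * (COLLECTION_COST_USD + EXTRACTION_COST_USD)


cost_so_far = collection_and_extraction_cost(3000)
cost_at_target = collection_and_extraction_cost(20000)
print(f"3,000 samples: ${cost_so_far:,}; 20,000 samples: ${cost_at_target:,}")
```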


Status of i2b2 Crimson collection

stay tuned…more data soon!

  • Measured autoantibodies from plasma

    • 5 autoantibodies in ~380 RA patients

    • ~85% are CCP+, ~35% ANA+, ~15% TPO+

  • Question: are non-RA autoantibodies present at increased frequency in RA patients vs matched controls?


Key questions

How can I implement your approach, and how much better is it?


Key questions

What are the regulatory obstacles impacting your work?

What are the resource needs required to replicate your work at other institutions?

What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?


Key questions

What are the regulatory obstacles impacting your work?

What are the resource needs required to replicate your work at other institutions?

What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?


Regulatory obstacles

IRB approval

De-identified vs truly anonymous

Open question: sharing of genetic data


Key questions

What are the regulatory obstacles impacting your work?

What are the resource needs required to replicate your work at other institutions?

What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?


Resources required

  • Building a research DataMart

    • clinical EMR ≠ research EMR

    • multiple FTEs to build/maintain

  • NLP expertise

    • open-source software available

    • iterative process for fine-tuning

  • Clinical expertise

    • understand nature of clinical data


Resources required (cont.)

  • Statistical expertise

    • simple algorithm is not sufficient

    • prepare for the unexpected!

    • true for narrative and codified

  • Biospecimen collection, DNA extraction

    • varies by institution

    • Crimson

    • Broad Institute


Key questions

What are the regulatory obstacles impacting your work?

What are the resource needs required to replicate your work at other institutions?

What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?


4 million patients

  • ICD9 RA and/or CCP checked (goal = high sensitivity): 31,171 patients

  • Classification algorithm (goal = high PPV): 3,585 RA patients

  • Discarded blood for DNA

  • Clinical subsets


Clinical features of patients

CCP has an OR = 1.5 for predicting erosions
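For readers less familiar with odds ratios, here is a minimal sketch of how an OR of 1.5 arises from a 2x2 table. The counts are hypothetical and were picked only to give 1.5; the underlying table is not part of this transcript.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds of the outcome among the exposed divided by odds among the unexposed."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


# Hypothetical counts: CCP-positive with/without erosions, CCP-negative with/without erosions.
ccp_pos_erosions, ccp_pos_no_erosions = 60, 40
ccp_neg_erosions, ccp_neg_no_erosions = 50, 50

or_ccp = odds_ratio(ccp_pos_erosions, ccp_pos_no_erosions,
                    ccp_neg_erosions, ccp_neg_no_erosions)
print(f"OR = {or_ccp:.1f}")
```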


Subset patients in clinically meaningful ways: causes of mortality

NLP+codified data, together with statistical modeling, to define cardiovascular disease


Non-responder to anti-TNF therapy

NLP+codified data, together with statistical modeling, to define treatment response


Responder to anti-TNF therapy

NLP+codified data, together with statistical modeling, to define treatment response


Post-marketing surveillance of adverse events

pharmacovigilance

NLP+codified data, together with statistical modeling, to define treatment response


Conclusions


Options for clinical + DNA

Conclusion: NLP + codified data, together with appropriate statistical modeling, can yield accurate clinical data.


Options for clinical + DNA

Conclusion: We can collect DNA and plasma in a high-throughput manner.


Options for clinical + DNA

Conclusion: The cost is reasonable...even for >20,000 RA patients!


[Diagram: genotype, phenotype, clinical care]


Acknowledgments

Zak Kohane

Susanne Churchill

Vivian Gainer

Kat Liao

Tianxi Cai

Shawn Murphy

Qing Zeng

Soumya Raychaudhuri

Beth Karlson

Pete Szolovits

Lee-Jen Wei

Lynn Bry (Crimson)

Sergey Goryachev

Barbara Mawn

& many others !

Namaste!


Narrative data (NLP text extractions)

Codified data (ICD9 codes, etc)


Run specific queries


Visualize results in a timeline


Identifying RA patients in our i2b2 RA DataMart

1993

2008

Signs and symptoms

Diseases that mimic RA

Medications specific to RA

Notes (including whether seen by a rheumatologist)

diagnostic codes for RA

Shawn Murphy, Vivian Gainer, others


Identifying RA patients in our i2b2 RA DataMart

1993

2008

signs and symptoms c/w RA

RA without other diseases

Specific RA meds, including MTX

Seen by rheumatology

Many diagnostic codes for RA


Probability of RA: all 31K subjects

[Figure: histogram of the probability of RA (x-axis) against frequency (y-axis), separating not RA from RA (n=3,585)]


ROC curves for algorithms

[Figure: ROC curves (sensitivity vs. 1 - specificity) for the codified + NLP, NLP only, and codified only algorithms, with 97% specificity marked]


Other algorithms to classify RA

NLP only

Codified only

Portability!


Classification of RA cases (and not RA)

[Figure: probability of RA (axis from 0.00 to 1.00) for the groups labeled Yes RA, possible, and Not RA, with the threshold at 0.29 and one region marked "???"]
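The threshold turns each patient's predicted probability into a yes/no call. The sketch below applies the 0.29 threshold shown on this slide to the three probabilities quoted on the chart-review slides of this talk (0.78, 0.33, 0.11). Whether the comparison is "greater than or equal" or strictly "greater than" is an assumption, and the list of non-RA scores used for the specificity example is hypothetical.

```python
THRESHOLD = 0.29  # value shown next to "threshold" on the slide


def classify(prob_ra, threshold=THRESHOLD):
    """Call a patient RA when the predicted probability reaches the threshold."""
    return "RA" if prob_ra >= threshold else "not RA"


def specificity(non_ra_probs, threshold=THRESHOLD):
    """Fraction of true non-RA patients scored below the threshold."""
    below = sum(1 for p in non_ra_probs if p < threshold)
    return below / len(non_ra_probs)


# Probabilities quoted on the chart-review slides of this talk.
cases = {
    "ankylosing spondylitis, many RA codes": 0.78,
    "diagnosis not clear initially": 0.33,
    "diagnosed in 1992, little follow-up": 0.11,
}
labels = {name: classify(p) for name, p in cases.items()}

# Hypothetical scores for ten patients known not to have RA.
hypothetical_non_ra = [0.01, 0.03, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.25, 0.40]
spec = specificity(hypothetical_non_ra)
print(labels, f"specificity = {spec:.0%}")
```

Raising the threshold trades sensitivity for specificity, so in practice the cut-off is chosen to hit a target specificity on a chart-reviewed training set.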


Diagnosis = Ankylosing Spondylitis (but many RA codes)

Probability RA = 0.78

A few signs and symptoms c/w RA

NLP with few mentions of RA

Specific meds

Visits to BWH/MGH

diagnostic codes for RA


Diagnosis = JRA (but many RA codes)

signs and symptoms c/w RA

NLP with “RA” and “JRA”

Specific meds

Visits to the RA Center at BWH

Many diagnostic codes for RA


Diagnosis not clear initially…

Probability RA = 0.33

signs and symptoms c/w RA

NLP without much “RA”, few specific meds (MTX x 1)

…and few diagnostic codes for RA, despite multiple LMR notes, including visits to the BWH Arthritis Center


Now the false negatives…


Diagnosed in 1992, little follow-up

Probability RA = 0.11

For some reason few RA diagnostic codes


Medications: codified data vs. NLP

Enbrel (etanercept)

  • codified: 1,628

  • NLP: 3,796

  • overlap: 1,612 (99%)

Note: review of 50 NLP occurrences shows that 38 out of 50 were actively on Enbrel.
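The counts on this slide can be combined into a few simple ratios. The sketch below uses only the numbers quoted above and reads the 99% overlap as a share of the codified patients, which is what the arithmetic supports (1,612 / 1,628).

```python
codified = 1628      # patients with a codified Enbrel entry
nlp = 3796           # patients with an NLP mention of Enbrel
overlap = 1612       # patients found by both sources

overlap_of_codified = overlap / codified   # share of codified patients also found by NLP
nlp_only = nlp - overlap                   # patients found only through narrative text

reviewed, actively_on_drug = 50, 38
active_fraction = actively_on_drug / reviewed

print(f"overlap = {overlap_of_codified:.0%} of codified; "
      f"NLP-only = {nlp_only:,}; actively on Enbrel in review = {active_fraction:.0%}")
```

NLP finds more than twice as many patients with an Enbrel mention as the codified data, but the chart review shows that a mention is not the same as current use, so the extra patients still need a modeling or review step.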

