Square wheels: electronic medical records for discovery research in rheumatoid arthritis


Square wheels: electronic medical records for discovery research in rheumatoid arthritis. ^ genetic. Robert M. Plenge, M.D., Ph.D. October 30, 2009 NCRR sponsored " Using EHR Data for Discovery Research ". HARVARD MEDICAL SCHOOL. Key questions.

Presentation Transcript
Square wheels: electronic medical records for discovery research in rheumatoid arthritis

^ genetic

Robert M. Plenge, M.D., Ph.D.

October 30, 2009

NCRR sponsored "Using EHR Data for Discovery Research"

HARVARD MEDICAL SCHOOL

Key questions

What are the regulatory obstacles impacting your work?

What are the resource needs required to replicate your work at other institutions?

What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?

Key questions

How can I implement your approach, and how much better is it?

slide5

genotype

phenotype

clinical care

slide6

genotype

bottleneck

phenotype

clinical care

slide7

October 2009: >30 RA risk loci

Together explain ~35% of the genetic burden of disease

Timeline figure of RA risk loci, 1978 to 2009 (axis years: 1978, 1987, 2003, 2004, 2005, 2007, 2008, 2009)

Loci shown: REL, BLK, TAGAP, CD28, TRAF6, PTPRC, FCGR2A, PRDM1, CD2-CD58, CD40, CCL21, CD244, IL2RB, TNFRSF14, PRKCQ, PIP4K2C, IL2RA, AFF3, TNFAIP3, STAT4, TRAF1-C5, IL2-IL21, HLA DR4, “shared epitope” hypothesis, PADI4, PTPN22, CTLA4

Latest GWAS in 25,000 case-control samples with replication in 20,000 additional samples: >10 new loci

Raychaudhuri et al in press Nature Genetics

slide8

genotype

phenotype

bottleneck

clinical care

Genetic predictors of response to anti-TNF therapy in RA

PTPRC/CD45 allele

n=1,283 patients

P=0.0001

Submitted to Arth & Rheum

Content of EMRs

EMRs are increasingly utilized!

  • Narrative data = free-form written text
    • info about symptoms, medical history, medications, exam, impression/plan
  • Codified data = structured format
    • age, demographics, and billing codes
slide14

This is not a new idea…

Sens: 89%

PPV: 57%

Gabriel (1994) Arthritis and Rheumatism
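
The two figures on this slide follow the usual definitions: sensitivity is the share of true RA patients that the database diagnosis catches, and PPV is the share of database diagnoses that are truly RA. A minimal sketch in Python; the counts are hypothetical, chosen only so that the output matches 89% and 57%, and are not taken from Gabriel (1994).

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of true cases that the database flags."""
    return true_pos / (true_pos + false_neg)


def ppv(true_pos: int, false_pos: int) -> float:
    """Share of flagged patients who are true cases."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts (assumption, not from the cited study)
tp, fn, fp = 89, 11, 67

print(f"sensitivity = {sensitivity(tp, fn):.0%}")
print(f"PPV         = {ppv(tp, fp):.0%}")
```

With counts like these, most true cases are found, yet a large share of flagged patients do not have RA, which is the "dirty data" problem the next slide describes.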

slide15

…but EMR data are “dirty”

Conclusion: The sole reliance on such databases for the diagnosis of RA can result in substantial misdiagnosis.

Gabriel (1994) Arthritis and Rheumatism

slide19

4 million patients

ICD9 RA and/or CCP checked (goal = high sensitivity)

31,171 patients

Classification algorithm (goal = high PPV)

3,585 RA patients

Discarded blood for DNA

Clinical subsets
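
The funnel on this slide is a two-stage filter: a broad screen tuned for sensitivity, then a strict classifier tuned for PPV. A rough Python sketch of that shape follows. The record fields (`has_icd9_ra`, `ccp_checked`, `prob_ra`), the toy patients, and the 0.5 threshold are all hypothetical and only illustrate the flow; they are not the DataMart schema or the actual classifier.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    has_icd9_ra: bool  # at least one ICD9 code for RA (hypothetical field)
    ccp_checked: bool  # an anti-CCP test was ordered (hypothetical field)
    prob_ra: float     # classifier output (hypothetical values)


def screen(patients):
    """Stage 1: broad filter, tuned for sensitivity."""
    return [p for p in patients if p.has_icd9_ra or p.ccp_checked]


def classify(patients, threshold):
    """Stage 2: strict filter, tuned for PPV."""
    return [p for p in patients if p.prob_ra >= threshold]


cohort = [
    Patient("a", True, False, 0.91),
    Patient("b", False, True, 0.12),
    Patient("c", False, False, 0.05),
    Patient("d", True, True, 0.67),
]
datamart = screen(cohort)
cases = classify(datamart, threshold=0.5)
print([p.patient_id for p in cases])  # ['a', 'd']

# Fraction retained at each stage, using the counts on the slide
print(f"screened in: {31_171 / 4_000_000:.2%}")
print(f"classified as RA: {3_585 / 31_171:.1%}")
```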

Our library of RA phenotypes

Qing Zeng

  • Natural language processing (NLP)
    • disease terms (e.g., RA, lupus)
    • medications (e.g., methotrexate)
    • autoantibodies (e.g., CCP, RF)
    • radiographic erosions
  • Codified data
    • ICD9 disease codes
    • prescription medications
    • laboratory autoantibodies
Our library of RA phenotypes

Shawn Murphy

  • Natural language processing (NLP)
    • disease terms (e.g., RA, lupus)
    • medications (e.g., methotrexate)
    • autoantibodies (e.g., CCP, RF)
    • radiographic erosions
  • Codified data
    • ICD9 disease codes
    • prescription medications
    • laboratory autoantibodies
‘Optimal’ algorithm to classify RA: NLP + codified data

Codified data

NLP data

Regression model with a penalty parameter (to avoid over-fitting)

Tianxi Cai, Kat Liao
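
To illustrate what a regression model with a penalty parameter does, here is a minimal sketch of penalized logistic regression in plain Python. The slide does not say which penalty was used, so the L2 (ridge) penalty here is an assumption, as are the toy features and labels. The point is only that a larger penalty shrinks the coefficients, which limits over-fitting when there are many codified and NLP predictors.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_penalized_logistic(X, y, penalty=0.1, lr=0.5, steps=2000):
    """Gradient descent on mean log-loss + (penalty / 2) * ||w||^2.

    The intercept b is left unpenalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + penalty * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy features (hypothetical): [scaled count of RA codes, scaled NLP mentions]
X = [[0.9, 0.8], [0.7, 0.9], [0.8, 0.6], [0.1, 0.2], [0.2, 0.1], [0.3, 0.0]]
y = [1, 1, 1, 0, 0, 0]

w_weak, b_weak = fit_penalized_logistic(X, y, penalty=0.01)
w_strong, b_strong = fit_penalized_logistic(X, y, penalty=1.0)

print("weak penalty:  ", [round(v, 2) for v in w_weak])
print("strong penalty:", [round(v, 2) for v in w_strong])
print("P(RA) for a code-rich, mention-rich chart:",
      round(predict_proba(w_weak, b_weak, [0.9, 0.9]), 2))
```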

High PPV with adequate sensitivity

392 out of 400 (98%) had definite or possible RA!

This means more patients!

~25% more subjects with the complete algorithm:

3,585 subjects (3,334 with true RA)

3,046 subjects (2,680 with true RA)
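
The arithmetic behind these two lines can be checked directly. The sketch below assumes the larger set (3,585) is the output of the complete algorithm, which matches the 3,585 figure used elsewhere in the talk; the slide does not name the comparison algorithm, so it is labeled generically here.

```python
# Counts from the slide; which algorithm produced the smaller set is not stated
complete = {"classified": 3_585, "true_ra": 3_334}
comparison = {"classified": 3_046, "true_ra": 2_680}


def ppv(group: dict) -> float:
    """Share of classified subjects who have true RA."""
    return group["true_ra"] / group["classified"]


gain_true_ra = complete["true_ra"] / comparison["true_ra"] - 1
gain_classified = complete["classified"] / comparison["classified"] - 1

print(f"PPV, complete algorithm:   {ppv(complete):.1%}")
print(f"PPV, comparison algorithm: {ppv(comparison):.1%}")
print(f"extra true RA subjects:    {gain_true_ra:.1%}")
print(f"extra classified subjects: {gain_classified:.1%}")
```

On these numbers the "~25% more" figure corresponds to subjects with true RA (about 24%), not to the raw number of classified subjects (about 18%).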

slide25

4 million patients

ICD9 RA and/or CCP checked (goal = high sensitivity)

31,171 patients

Classification algorithm (goal = high PPV)

3,585 RA patients

Discarded blood for DNA

slide26

Linking the Datamart-Crimson

NLP data

Codified data

Status of i2b2 Crimson collection

genotyping of 384 SNPs (RA risk alleles, AIMs, other) is ongoing at Broad Institute

  • Over 3,000 samples collected to date
    • cost = $10 per sample
  • DNA extracted on >2,400 Buffy coats
    • cost = $20 per sample
    • >90% had ≥1 µg of DNA
    • >99% had ≥5 µg of DNA after WGA
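
The per-sample costs above scale linearly, so a back-of-the-envelope budget is easy to sketch. This assumes every collected sample also goes through DNA extraction and ignores genotyping, staff, and storage costs; the 20,000-sample cohort is a hypothetical scale, not a figure from this slide.

```python
COLLECTION_COST = 10  # dollars per sample collected (from the slide)
EXTRACTION_COST = 20  # dollars per sample for DNA extraction (from the slide)


def total_cost(n_samples: int) -> int:
    """Collection plus extraction cost, assuming every sample is extracted."""
    return n_samples * (COLLECTION_COST + EXTRACTION_COST)


print(total_cost(3_000))   # 90000
print(total_cost(20_000))  # 600000
```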
Status of i2b2 Crimson collection

stay tuned…more data soon!

  • Measured autoantibodies from plasma
    • 5 autoantibodies in ~380 RA patients
    • ~85% are CCP+, ~35% ANA+, ~15% TPO+
  • Question: are non-RA autoantibodies present at increased frequency in RA patients vs matched controls?
Key questions

How can I implement your approach, and how much better is it?

Key questions

What are the regulatory obstacles impacting your work?

What are the resource needs required to replicate your work at other institutions?

What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?

Key questions

What are the regulatory obstacles impacting your work?

What are the resource needs required to replicate your work at other institutions?

What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?

Regulatory obstacles

IRB approval

De-identified vs truly anonymous

Open question: sharing of genetic data

Key questions

What are the regulatory obstacles impacting your work?

What are the resource needs required to replicate your work at other institutions?

What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?

Resources required
  • Building a research DataMart
    • clinical EMR ≠ research EMR
    • multiple FTE’s to build/maintain
  • NLP expertise
    • open-source software available
    • iterative process for fine-tuning
  • Clinical expertise
    • understand nature of clinical data
Resources required (cont.)
  • Statistical expertise
    • simple algorithm is not sufficient
    • prepare for the unexpected!
    • true for narrative and codified
  • Biospecimen collection, DNA extraction
    • varies by institution
    • Crimson
    • Broad Institute
Key questions

What are the regulatory obstacles impacting your work?

What are the resource needs required to replicate your work at other institutions?

What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?

slide37

4 million patients

ICD9 RA and/or CCP checked (goal = high sensitivity)

31,171 patients

Classification algorithm (goal = high PPV)

3,585 RA patients

Discarded blood for DNA

Clinical subsets

Clinical features of patients

CCP has an OR = 1.5 for predicting erosions

Subset patients in clinically meaningful ways: causes of mortality

NLP+codified data, together with statistical modeling, to define cardiovascular disease

Non-responder to anti-TNF therapy

NLP+codified data, together with statistical modeling, to define treatment response

Responder to anti-TNF therapy

NLP+codified data, together with statistical modeling, to define treatment response

Post-marketing surveillance of adverse events

pharmacovigilance

NLP+codified data, together with statistical modeling, to define treatment response

Options for clinical + DNA

Conclusion: NLP + codified data, together with appropriate statistical modeling, can yield accurate clinical data.

Options for clinical + DNA

Conclusion: We can collect DNA and plasma in a high-throughput manner.

Options for clinical + DNA

Conclusion: The cost is reasonable...even for >20,000 RA patients!

slide47

genotype

phenotype

clinical care

Acknowledgments

Zak Kohane

Susanne Churchill

Vivian Gainer

Kat Liao

Tianxi Cai

Shawn Murphy

Qing Zeng

Soumya Raychaudhuri

Beth Karlson

Pete Szolovits

Lee-Jen Wei

Lynn Bry (Crimson)

Sergey Goryachev

Barbara Mawn

& many others !

Namaste!

slide50

Narrative data (NLP text extractions)

Codified data (ICD9 codes, etc)

Identifying RA patients in our i2b2 RA DataMart

1993

2008

Signs and symptoms

Diseases that mimic RA

Medications specific to RA

Notes (including whether seen by a rheumatologist)

diagnostic codes for RA

Shawn Murphy, Vivian Gainer, others

slide54

Identifying RA patients in our i2b2 RA DataMart

1993

2008

signs and symptoms c/w RA

RA without other diseases

Specific RA meds, including MTX

Seen by rheumatology

Many diagnostic codes for RA

Probability of RA: all 31K subjects

Figure: histogram of frequency against probability of RA, with regions labeled “not RA” and “RA (n=3,585)”.

ROC curves for algorithms

Figure: ROC curves (sensitivity vs. 1 - specificity) for three algorithms: codified + NLP, NLP only, and codified only, with 97% specificity marked.
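
Marking a fixed specificity on an ROC curve amounts to choosing a probability cutoff. The sketch below shows one simple way to pick the smallest cutoff that reaches a target specificity on a chart-reviewed training set. The scores and labels are toy values (assumptions), and the demo uses a 90% target because ten negatives cannot resolve 97%.

```python
def threshold_for_specificity(scores, labels, target=0.97):
    """Return (threshold, specificity, sensitivity) for the smallest
    threshold whose specificity on the labeled set is >= target.

    A subject is called RA when score >= threshold.
    """
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    positives = [s for s, y in zip(scores, labels) if y == 1]
    # float("inf") guarantees a result: nobody is called RA at that cutoff
    candidates = sorted(set(scores)) + [float("inf")]
    for t in candidates:
        specificity = sum(1 for s in negatives if s < t) / len(negatives)
        if specificity >= target:
            sens = sum(1 for s in positives if s >= t) / len(positives)
            return t, specificity, sens


# Toy chart-review set (hypothetical): 10 without RA, 6 with RA
neg_scores = [0.02, 0.05, 0.08, 0.10, 0.15, 0.20, 0.25, 0.28, 0.30, 0.60]
pos_scores = [0.29, 0.35, 0.55, 0.70, 0.80, 0.90]
scores = neg_scores + pos_scores
labels = [0] * len(neg_scores) + [1] * len(pos_scores)

t, spec, sens = threshold_for_specificity(scores, labels, target=0.9)
print(f"threshold={t}, specificity={spec:.0%}, sensitivity={sens:.0%}")
```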

Classification of RA cases (and not RA)

Figure: probability of RA (axis from 0.00 to 1.00) for the groups “Yes RA”, “possible”, and “Not RA”, with the threshold marked at 0.29 and one region annotated “???”.

Diagnosis = Ankylosing Spondylitis (but many RA codes)

Probability RA = 0.78

A few signs and symptoms c/w RA

NLP with few mentions of RA

Specific meds

Visits to BWH/MGH

diagnostic codes for RA

Diagnosis = JRA (but many RA codes)

signs and symptoms c/w RA

NLP with “RA” and “JRA”

Specific meds

Visits to the RA Center at BWH

Many diagnostic codes for RA

Diagnosis not clear initially…

Probability RA = 0.33

signs and symptoms c/w RA

NLP without much “RA”, few specific meds (MTX x 1)

…and few diagnostic codes for RA, despite multiple LMR notes, including visits to the BWH Arthritis Center

Diagnosed in 1992, little follow-up

Probability RA = 0.11

For some reason few RA diagnostic codes

slide64

Medications: codified data vs. NLP

Enbrel (etanercept)

codified: 1,628

NLP: 3,796

overlap: 1,612 (99%)

Note: review of 50 NLP occurrences shows that 38 out of 50 were actively on Enbrel
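
The counts on this slide can be combined with a few lines of arithmetic. The sketch below assumes the 99% figure is the overlap divided by the codified count, which is the reading that reproduces it; the variable names are illustrative.

```python
codified = 1_628  # patients with a codified Enbrel entry
nlp = 3_796       # patients with an NLP-extracted Enbrel mention
overlap = 1_612   # patients found by both

nlp_only = nlp - overlap
either = codified + nlp - overlap

print(f"codified patients also found by NLP: {overlap / codified:.0%}")
print(f"patients found by NLP only: {nlp_only}")
print(f"patients found by either source: {either}")

# Manual review of NLP occurrences reported on the slide
reviewed, active = 50, 38
print(f"NLP mentions reflecting active use: {active / reviewed:.0%}")
```

NLP recovers nearly every codified patient and adds many more, but the chart review suggests that roughly a quarter of NLP mentions do not reflect current use.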
