
Square wheels: electronic medical records for discovery research in rheumatoid arthritis



Square wheels: electronic medical records for discovery research in rheumatoid arthritis. ^ genetic. Robert M. Plenge, M.D., Ph.D. October 30, 2009 NCRR sponsored " Using EHR Data for Discovery Research ". HARVARD MEDICAL SCHOOL. Key questions.

Presentation Transcript
Square wheels: electronic medical records for discovery research in rheumatoid arthritis

^

genetic

Robert M. Plenge, M.D., Ph.D.

October 30, 2009

NCRR sponsored "Using EHR Data for Discovery Research"

HARVARD MEDICAL SCHOOL


Key questions

What are the regulatory obstacles impacting your work?

What are the resource needs required to replicate your work at other institutions?

What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?


Key questions

How can I implement your approach, and how much better is it?


genotype

phenotype

clinical care


genotype

bottleneck

phenotype

clinical care


October 2009: >30 RA risk loci

Together explain ~35% of the genetic burden of disease

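[Figure: timeline of RA risk-locus discovery. Years marked: 1978, 1987, 2003, 2004, 2005, 2007, 2008, 2009.]

Loci labeled on the figure: REL, BLK, TAGAP, CD28, TRAF6, PTPRC, FCGR2A, PRDM1, CD2-CD58, CD40, CCL21, CD244, IL2RB, TNFRSF14, PRKCQ, PIP4K2C, IL2RA, AFF3, TNFAIP3, STAT4, TRAF1-C5, IL2-IL21, HLA DR4 “shared epitope” hypothesis, PADI4, PTPN22, CTLA4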

Latest GWAS in 25,000 case-control samples with replication in 20,000 additional samples: >10 new loci

Raychaudhuri et al., in press, Nature Genetics


genotype

phenotype

bottleneck

clinical care


Genetic predictors of response to anti-TNF therapy in RA

PTPRC/CD45 allele

n=1,283 patients

P=0.0001

Submitted to Arth & Rheum



What are the options for collecting clinical data and DNA for genetic studies?



Content of EMRs

EMRs are increasingly utilized!

  • Narrative data = free-form written text

    • info about symptoms, medical history, medications, exam, impression/plan

  • Codified data = structured format

    • age, demographics, and billing codes


This is not a new idea…

Sens: 89%

PPV: 57%

Gabriel (1994) Arthritis and Rheumatism
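The two figures on this slide are standard confusion-matrix ratios. As a minimal sketch, the counts below are hypothetical and were chosen only so that the ratios reproduce the slide's 89% and 57%; they are not taken from Gabriel (1994).

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of true RA patients that the database flags as RA."""
    return true_pos / (true_pos + false_neg)


def ppv(true_pos: int, false_pos: int) -> float:
    """Fraction of database-flagged patients who truly have RA."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts (assumption, for illustration only).
tp, fn, fp = 89, 11, 67

print(f"Sensitivity: {sensitivity(tp, fn):.0%}")
print(f"PPV:         {ppv(tp, fp):.0%}")
```

A high sensitivity with a low PPV means most true cases are captured, but a large share of the flagged patients do not have RA.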


…but EMR data are “dirty”

Conclusion: The sole reliance on such databases for the diagnosis of RA can result in substantial misdiagnosis.

Gabriel (1994) Arthritis and Rheumatism


Partners HealthCare: 4 million patients


Partners HealthCare: linked by EMR


Partners HealthCare: organized by i2b2


4 million patients

ICD9 RA and/or CCP checked

(goal = high sensitivity)

31,171 patients

Classification algorithm

(goal = high PPV)

3,585

RA patients

Discarded blood

for DNA

Clinical subsets
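The funnel on this slide is a two-stage filter: a broad screen tuned for sensitivity, followed by a stricter classifier tuned for PPV. A minimal sketch of that shape follows; the `Patient` fields, the `ra_probability` score, and the threshold value are all hypothetical names and numbers introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    has_ra_icd9: bool      # at least one ICD9 code for RA (hypothetical field)
    ccp_checked: bool      # a CCP test was ordered (hypothetical field)
    ra_probability: float  # score from a classification algorithm (hypothetical)


def screen(p: Patient) -> bool:
    """Stage 1: broad inclusion, goal = high sensitivity."""
    return p.has_ra_icd9 or p.ccp_checked


def select_cohort(patients: list[Patient], threshold: float) -> list[Patient]:
    """Stage 2: keep screened patients whose score clears the threshold (goal = high PPV)."""
    return [p for p in patients if screen(p) and p.ra_probability >= threshold]


patients = [
    Patient(True, False, 0.90),   # screened in, classified as RA
    Patient(False, True, 0.10),   # screened in, rejected by the classifier
    Patient(False, False, 0.95),  # never screened in, so never classified
    Patient(True, True, 0.60),    # screened in, classified as RA
]
cohort = select_cohort(patients, threshold=0.5)
print(len(cohort), "of", len(patients), "patients selected")
```

Patients who fail the first stage are never scored, which is why the screen has to favor sensitivity.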


Our library of RA phenotypes

Qing Zeng

  • Natural language processing (NLP)

    • disease terms (e.g., RA, lupus)

    • medications (e.g., methotrexate)

    • autoantibodies (e.g., CCP, RF)

    • radiographic erosions

  • Codified data

    • ICD9 disease codes

    • prescription medications

    • laboratory autoantibodies


Our library of RA phenotypes

Shawn Murphy

  • Natural language processing (NLP)

    • disease terms (e.g., RA, lupus)

    • medications (e.g., methotrexate)

    • autoantibodies (e.g., CCP, RF)

    • radiographic erosions

  • Codified data

    • ICD9 disease codes

    • prescription medications

    • laboratory autoantibodies


‘Optimal’ algorithm to classify RA: NLP + codified data

Codified data

NLP data

Regression model with a penalty parameter (to avoid over-fitting)

Tianxi Cai, Kat Liao
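The slide does not say which regression model or penalty was used. As a rough sketch of the general idea, the example below fits a logistic regression with a ridge (L2) penalty by gradient descent on a toy data set. The penalty form, the three features, and every number are assumptions made for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_penalized_logistic(X, y, penalty=0.1, lr=0.5, epochs=2000):
    """Gradient descent on mean log-loss + (penalty / 2) * ||w||^2.

    The intercept b is left unpenalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + penalty * w[j])
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy rows (hypothetical): [scaled count of RA ICD9 codes,
#                           NLP mention of RA, NLP mention of erosions]
X = [[1.0, 1, 1], [0.8, 1, 0], [0.9, 1, 1],
     [0.1, 0, 0], [0.2, 0, 0], [0.3, 1, 0]]
y = [1, 1, 1, 0, 0, 0]  # chart-review label: 1 = RA, 0 = not RA

w, b = fit_penalized_logistic(X, y, penalty=0.1)
print("P(RA | many codes + NLP hits):", round(predict_proba(w, b, [1.0, 1, 1]), 2))
print("P(RA | few codes, no NLP hits):", round(predict_proba(w, b, [0.1, 0, 0]), 2))
```

Raising `penalty` shrinks the coefficients toward zero, which is the mechanism that limits over-fitting when there are many candidate NLP and codified features relative to the number of chart-reviewed patients.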


High PPV with adequate sensitivity

392 out of 400 (98%) had definite or possible RA!


This means more patients!

~25% more subjects with the complete algorithm:

3,585 subjects (3,334 with true RA)

3,046 subjects (2,680 with true RA)
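The "~25% more" figure can be checked directly from the two rows above. This short calculation uses only the counts on the slide; the slide does not name the comparison algorithm behind the second row, so it is labeled generically here.

```python
complete = {"selected": 3585, "true_ra": 3334}    # complete algorithm
comparison = {"selected": 3046, "true_ra": 2680}  # comparison algorithm (not named on the slide)

ppv_complete = complete["true_ra"] / complete["selected"]
ppv_comparison = comparison["true_ra"] / comparison["selected"]
gain_true_ra = complete["true_ra"] / comparison["true_ra"] - 1

print(f"PPV, complete algorithm:   {ppv_complete:.1%}")
print(f"PPV, comparison algorithm: {ppv_comparison:.1%}")
print(f"Additional true RA subjects: {gain_true_ra:.1%}")
```

The complete algorithm therefore yields both a higher PPV and roughly a quarter more true RA subjects.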


4 million patients

ICD9 RA and/or CCP checked

(goal = high sensitivity)

31,171 patients

Classification algorithm

(goal = high PPV)

3,585

RA patients

Discarded blood

for DNA


Linking the Datamart-Crimson

NLP data

Codified data


Status of i2b2 Crimson collection

genotyping of 384 SNPs (RA risk alleles, AIMs, other) is ongoing at Broad Institute

  • Over 3,000 samples collected to date

    • cost = $10 per sample

  • DNA extracted on >2,400 Buffy coats

    • cost = $20 per sample

    • >90% had ≥1 ug of DNA

    • >99% had ≥5 ug of DNA after WGA
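For a back-of-envelope view of the per-sample costs above, the sketch below assumes that the $10 collection cost and the $20 extraction cost simply add, that both stay constant at scale, and that every collected sample is also extracted. None of these assumptions is stated on the slide.

```python
COLLECTION_COST_USD = 10  # per sample, from the slide
EXTRACTION_COST_USD = 20  # per sample, from the slide


def total_cost(n_samples: int) -> int:
    """Total cost if every sample is both collected and extracted (assumption)."""
    return n_samples * (COLLECTION_COST_USD + EXTRACTION_COST_USD)


for n in (3_000, 20_000):
    print(f"{n:>6,} samples: ${total_cost(n):,}")
```

Genotyping costs are not given on the slide and are not included.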


Status of i2b2 Crimson collection

stay tuned…more data soon!

  • Measured autoantibodies from plasma

    • 5 autoantibodies in ~380 RA patients

    • ~85% are CCP+, ~35% ANA+, ~15% TPO+

  • Question: are non-RA autoantibodies present at increased frequency in RA patients vs matched controls?


Key questions

How can I implement your approach, and how much better is it?


Key questions

What are the regulatory obstacles impacting your work?

What are the resource needs required to replicate your work at other institutions?

What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?


Key questions

What are the regulatory obstacles impacting your work?

What are the resource needs required to replicate your work at other institutions?

What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?


Regulatory obstacles

IRB approval

De-identified vs truly anonymous

Open question: sharing of genetic data


Key questions

What are the regulatory obstacles impacting your work?

What are the resource needs required to replicate your work at other institutions?

What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?


Resources required

  • Building a research DataMart

    • clinical EMR ≠ research EMR

    • multiple FTEs to build/maintain

  • NLP expertise

    • open-source software available

    • iterative process for fine-tuning

  • Clinical expertise

    • understand nature of clinical data


Resources required (cont.)

  • Statistical expertise

    • simple algorithm is not sufficient

    • prepare for the unexpected!

    • true for narrative and codified

  • Biospecimen collection, DNA extraction

    • varies by institution

    • Crimson

    • Broad Institute


Key questions

What are the regulatory obstacles impacting your work?

What are the resource needs required to replicate your work at other institutions?

What are the priority short term "translational" questions in your field that would represent the most rapid payoff on investment?


4 million patients

ICD9 RA and/or CCP checked

(goal = high sensitivity)

31,171 patients

Classification algorithm

(goal = high PPV)

3,585

RA patients

Discarded blood

for DNA

Clinical subsets


Clinical features of patients

CCP has an OR = 1.5 for predicting erosions


Subset patients in clinically meaningful ways: causes of mortality

NLP+codified data, together with statistical modeling, to define cardiovascular disease


Non-responder to anti-TNF therapy

NLP+codified data, together with statistical modeling, to define treatment response


Responder to anti-TNF therapy

NLP+codified data, together with statistical modeling, to define treatment response


Post-marketing surveillance of adverse events

pharmacovigilance

NLP+codified data, together with statistical modeling, to define treatment response


Conclusions


Options for clinical + DNA

Conclusion: NLP + codified data, together with appropriate statistical modeling, can yield accurate clinical data.


Options for clinical + DNA

Conclusion: We can collect DNA and plasma in a high-throughput manner.


Options for clinical + DNA

Conclusion: The cost is reasonable...even for >20,000 RA patients!


genotype

phenotype

clinical care


Acknowledgments

Zak Kohane

Susanne Churchill

Vivian Gainer

Kat Liao

Tianxi Cai

Shawn Murphy

Qing Zeng

Soumya Raychaudhuri

Beth Karlson

Pete Szolovits

Lee-Jen Wei

Lynn Bry (Crimson)

Sergey Goryachev

Barbara Mawn

& many others !

Namaste!


Narrative data (NLP text extractions)

Codified data (ICD9 codes, etc)


Run specific queries



Identifying RA patients in our i2b2 RA DataMart

1993

2008

Signs and symptoms

Diseases that mimic RA

Medications specific to RA

Notes (including whether seen by a rheumatologist)

diagnostic codes for RA

Shawn Murphy, Vivian Gainer, others


Identifying RA patients in our i2b2 RA DataMart

1993

2008

signs and symptoms c/w RA

RA without other diseases

Specific RA meds, including MTX

Seen by rheumatology

Many diagnostic codes for RA


Probability of RA: all 31K subjects

[Figure: histogram of the probability of RA for all 31K subjects. x-axis: Probability of RA; y-axis: Frequency. Groups: not RA; RA (n=3,585).]


ROC curves for algorithms

[Figure: ROC curves (sensitivity vs. 1 - specificity) for three algorithms: codified + NLP, NLP only, codified only. Marked operating point: 97% specificity.]
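Reading an ROC curve at a fixed specificity, such as the 97% marked on this slide, amounts to finding the score cutoff that meets that specificity and reporting the sensitivity there. A minimal sketch follows; the scores and labels are invented toy data, and the demo uses a target of 0.8 because ten toy samples cannot resolve 97%.

```python
def threshold_for_specificity(scores, labels, target=0.97):
    """Lowest cutoff with specificity >= target.

    A subject is called RA when score >= cutoff.
    Returns (cutoff, specificity, sensitivity), or None if no cutoff qualifies.
    """
    negatives = [s for s, lab in zip(scores, labels) if lab == 0]
    positives = [s for s, lab in zip(scores, labels) if lab == 1]
    for cutoff in sorted(set(scores)):
        specificity = sum(s < cutoff for s in negatives) / len(negatives)
        if specificity >= target:
            sens = sum(s >= cutoff for s in positives) / len(positives)
            return cutoff, specificity, sens
    return None


# Toy data (hypothetical): predicted probability of RA and chart-review label.
scores = [0.05, 0.10, 0.20, 0.30, 0.60, 0.40, 0.70, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

result = threshold_for_specificity(scores, labels, target=0.8)
print(result)
```

Comparing algorithms at the same specificity, as the slide does, makes the sensitivity values directly comparable.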


Other algorithms to classify RA

NLP only

Codified only

Portability!


Classification of RA cases (and not RA)

[Figure: probability of RA (axis from 0.00 to 1.00) by chart-review category: possible, Yes RA, Not RA. Labels on the plot: "threshold", "0.29", and "???".]


Diagnosis = Ankylosing Spondylitis (but many RA codes)

Probability RA = 0.78

A few signs and symptoms c/w RA

NLP with few mentions of RA

Specific meds

Visits to BWH/MGH

diagnostic codes for RA


Diagnosis = JRA (but many RA codes)

signs and symptoms c/w RA

NLP with “RA” and “JRA”

Specific meds

Visits to the RA Center at BWH

Many diagnostic codes for RA


Diagnosis not clear initially…

Probability RA = 0.33

signs and symptoms c/w RA

NLP without much “RA”, few specific meds (MTX x 1)

…and few diagnostic codes for RA, despite multiple LMR notes, including visits to the BWH Arthritis Center



Diagnosed in 1992, little follow-up

Probability RA = 0.11

For some reason few RA diagnostic codes


Medications: codified data vs. NLP

Enbrel (etanercept)

codified: 1,628

NLP: 3,796

overlap: 1,612 (99% of codified)

Note: review of 50 NLP occurrences shows that 38 out of 50 were actively on Enbrel
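The percentages on this slide follow from the counts. The sketch below recomputes them using only the numbers shown; reading the 99% as overlap divided by the codified count is an interpretation, although it is the only ratio of these counts that gives 99%.

```python
codified = 1628          # patients with Enbrel in codified data
nlp = 3796               # patients with Enbrel found by NLP
overlap = 1612           # patients found by both
reviewed, active = 50, 38  # chart review of NLP occurrences

overlap_share_of_codified = overlap / codified
nlp_only = nlp - overlap
active_rate = active / reviewed

print(f"Codified patients also found by NLP: {overlap_share_of_codified:.0%}")
print(f"Patients found by NLP only: {nlp_only:,}")
print(f"Reviewed NLP occurrences actively on Enbrel: {active_rate:.0%}")
```

NLP recovers nearly every codified patient and adds many more, but about a quarter of the reviewed NLP occurrences were not active use, so NLP hits still need modeling or review before they are treated as exposure.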

