Machine Learning

Electronic Medical Record - PowerPoint PPT Presentation



Machine Learning for Healthcare. David Page, Dept. of Biostatistics & Medical Informatics and Dept. of Computer Sciences, University of Wisconsin-Madison.

Presentation Transcript

Machine Learning for Healthcare
David Page
Dept. of Biostatistics & Medical Informatics and Dept. of Computer Sciences
University of Wisconsin-Madison


Electronic Medical Record

PatientID  Date    Physician  Symptoms      Diagnosis
P1         1/1/01  Smith      palpitations  hypoglycemic
P1         2/1/03  Jones      fever, aches  influenza

PatientID  Gender  Birthdate
P1         M       3/22/63

PatientID  Date    Lab Test       Result
P1         1/1/01  blood glucose  42
P1         1/9/01  blood glucose  45

PatientID  SNP1  SNP2  …  SNP500K
P1         AA    AB    …  BB
P2         AB    BB    …  AA

PatientID  Date Prescribed  Date Filled  Physician  Medication  Dose  Duration
P1         5/17/98          5/18/98      Jones      prilosec    10mg  3 months


Predictive Personalized Medicine

[Diagram: an individual patient's genetic, clinical, and environmental data (G + C + E) feed state-of-the-art machine learning, which produces a predictive model for disease susceptibility and treatment response; the model in turn guides personalized treatment. Repeat for thousands of patients and for hundreds of diseases and treatments.]


Estimation of the Warfarin Dose with Clinical and Pharmacogenetic Data

  • International Warfarin Pharmacogenetics Consortium (IWPC)

  • NEJM, February 19, 2009, vol. 360, no. 8


Motivation

  • “In Milestone, FDA Pushes Genetic Tests Tied to Drug”

  • Where: Front-page article, Wall Street Journal, August 16, 2007

  • Why: FDA released new warfarin product labeling with pharmacogenomics dosing recommendations

  • What: New pharmacogenetics section and changes in the initial dosage section, with pharmacogenetics in the warnings section

  • http://www.fda.gov/cder/foi/label/2007/009218s105lblv2.pdf


“In Milestone, FDA Pushes Genetic Tests Tied to Drug”

Initial dosing (warfarin package insert)

“The dosing of COUMADIN must be individualized according to patient’s sensitivity to the drug as indicated by the PT/INR… It is recommended that COUMADIN therapy be initiated with a dose of 2 to 5 mg per day with dosage adjustments based on the results of PT/INR determinations. The lower initiation doses should be considered for patients with certain genetic variations in CYP2C9 and VKORC1 enzymes as well as for elderly and/or debilitated patients…”

http://www.fda.gov/cder/foi/label/2007/009218s105lblv2.pdf


Clinicians’ responses to FDA labeling change for warfarin

  • How, exactly, would I use this information?

  • Nice science, but prove to me that it’s better than what we already do

    • i.e., I have to see a randomized trial comparing genotype-guided versus usual dosing

    • Summer 2009: the NHLBI Clarification of Optimal Anticoagulation through Genetics (COAG) trial (PI: Stephen Kimmel, MD)


Current warfarin pharmacogenetics information limitations

  • Clinical utility (or a randomized trial) will require a dosing equation that incorporates both genetic and non-genetic (demographic) information.

  • Numerous such equations have been proposed, but:

    • most are highly geographically confined

    • none were developed from robust data in Asians, Caucasians, and Africans

  • Thus, an equation derived from a large, geographically and ethnically diverse population was needed to help ensure global clinical utility.


    IWPC: 21 research groups

    4 continents and 9 countries

    • Asia

      • Israel, Japan, Korea, Taiwan, Singapore

    • Europe

      • Sweden, United Kingdom

    • North America

      • USA (11 states: Alabama, California, Florida, Illinois, Missouri, North Carolina, Pennsylvania, Tennessee, Utah, Washington, Wisconsin)

    • South America

      • Brazil


    Dataset

    • 5,700 patients treated with warfarin

    • Demographic characteristics

    • Primary indication for warfarin treatment

    • Stable therapeutic dose of warfarin

    • Treatment INR

    • Target INR

      • 5,052 patients with a target INR of 2-3

    • Concomitant medications

      • Grouped by increased or decreased effect on INR

    • Presence of genotype variants

      • CYP2C9(*1, *2 and *3)

      • VKORC1 (one of seven SNPs in linkage disequilibrium)

        • blinded re-genotyping for quality control










    Modeling of VKORC1 SNPs

    • Missing values of VKORC1 -1639 G>A (rs9923231)

      • Imputed based on race and VKORC1 SNP data at 2255 C>T (rs2359612), 1173 C>T (rs9934438), or 1542 G>C (rs8050894)

      • If the VKORC1 genotype could not be imputed, it was treated as “missing” (a distinct variable) in the model.


    Data Analysis Methodology

    • Derivation Cohort

      • 4,043 patients with a stable dose of warfarin (mg/week) and a target INR of 2-3

      • Used for developing dose prediction models

    • Validation Cohort

      • 1,009 patients (20% of dataset)

      • Used for testing final selected model

    • Analysis group did not have access to validation set until after the final model was selected


    Real-valued prediction methods used

    • Included, among others

      • Support vector regression

      • Regression trees

      • Model trees

      • Multivariate adaptive regression splines

      • Least-angle regression

      • Lasso

      • Logarithmic and square-root transformations

      • Direct prediction of dose

    • Support vector regression and ordinary least-squares linear regression gave the lowest mean absolute error

      • Predicted the square root of the dose

      • Incorporated both genetic and clinical data
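The square-root modeling choice can be illustrated with a minimal sketch. It assumes a single predictor (age) and made-up doses that are not from the IWPC dataset; the helper names `fit_line` and `mean_absolute_error` are hypothetical. It fits ordinary least squares to the square root of the dose, squares the output to recover a weekly dose in mg, and reports the mean absolute error on the original scale.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (closed form, one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up illustration data (NOT from the IWPC dataset):
# age in years, stable dose in mg/week.
ages = [30, 40, 50, 60, 70, 80]
doses = [49.0, 42.0, 36.0, 30.0, 25.0, 20.0]

# Fit the model on the square root of the dose.
a, b = fit_line(ages, [math.sqrt(d) for d in doses])

# The model output is sqrt(dose); square it to get a weekly dose in mg.
predicted = [(a + b * age) ** 2 for age in ages]
mae = mean_absolute_error(doses, predicted)
```

The error is measured after squaring, so it is expressed in mg/week rather than in square-root units.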


    IWPC pharmacogenetic dosing algorithm

    • **The output of this algorithm must be squared to compute weekly dose in mg

    • ^All references to VKORC1 refer to genotype for rs9923231


    IWPC clinical dosing algorithm

    • **The output of this algorithm must be squared to compute weekly dose in mg


    Results

    Dose estimates that include genotypes for CYP2C9 and VKORC1, in addition to clinical variables, are significantly closer to the appropriate initial dose of warfarin than estimates from a clinical-only or fixed-dose approach

    The 46.2% of the population requiring ≤21 mg/wk or ≥49 mg/wk benefit the most

    • These are the patients for whom an underdose or overdose could have adverse clinical consequences.

    Patients requiring an intermediate dose are likely to obtain little benefit from including genotypes



    Warfarin doses predicted for the clinical and PGx algorithms with and without amiodarone

    Example patient: 50 yr old, White, male, 175 cm, 80 kg

    Genotypes can change the recommended dose from >45 mg/wk to <10 mg/wk when all other factors are equal!


    Warfarin doses predicted for the clinical and PGx algorithms based on race and genotype

    Example patient: 50 yr old, male, 175 cm, 80 kg

    Racial differences in the estimated dose are insignificant when genotypes are included. The clinical algorithm may substantially overestimate or underestimate the dose.


    % Patients with dose estimates within 20% of actual dose

    • Comparison of PGx, clinical, and fixed dose approaches

    • 3 dose groups shown (mg/wk)

      • low (≤21)

      • intermediate (>21 to <49)

      • high (≥49)

    • Fixed dose (35 mg/wk)

      • None of the estimates for low and high dose groups were within 20% of actual dose
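The "within 20% of actual dose" metric can be sketched as below. This is an illustration only: the function names are hypothetical, the doses are invented, and treating exactly 20% as "within" is an assumption. It also shows why a fixed 35 mg/wk dose can never land within 20% for the low and high groups: 35 is within 20% only of actual doses between roughly 29 and 44 mg/wk.

```python
def dose_group(actual_mg_per_wk):
    """Dose groups as defined on the slide (mg/wk)."""
    if actual_mg_per_wk <= 21:
        return "low"
    if actual_mg_per_wk >= 49:
        return "high"
    return "intermediate"

def pct_within_20(actual, predicted):
    """Percentage of patients, per dose group, whose estimate is within 20% of the actual dose."""
    hits, totals = {}, {}
    for a, p in zip(actual, predicted):
        group = dose_group(a)
        totals[group] = totals.get(group, 0) + 1
        if abs(p - a) <= 0.20 * a:  # inclusive bound: an assumption
            hits[group] = hits.get(group, 0) + 1
    return {g: 100.0 * hits.get(g, 0) / totals[g] for g in totals}

# Hypothetical actual doses (mg/wk), compared with the fixed 35 mg/wk approach.
actual = [14, 21, 30, 35, 42, 49, 70]
fixed = [35] * len(actual)
result = pct_within_20(actual, fixed)
```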


    Limitations of this study

    • Did not address the issue of whether a precise initial dose of warfarin translates into

      • improved clinical end points: reduction in time needed to achieve a stable therapeutic INR, fewer INRs out of range, reduced incidence of bleeding or thromboembolic events

    • Did not have sufficient data across the 21 groups to include potentially important factors such as

      • smoking status, vitamin K intake, alcohol consumption, other genetic factors (e.g., CYP4F2, ApoE, GGCX), environmental factors


    New England Journal of Medicine, Feb 2009

    Data available at PharmGKB

    • www.pharmgkb.org

      • Accession number: PA162355460


    IWPC Authors

    Writing committee: Teri E. Klein, Russ B. Altman, Niclas Eriksson, Brian F. Gage, Stephen E. Kimmel, Ming-Ta M. Lee, Nita A. Limdi, David Page, Dan M. Roden, Michael J. Wagner, Michael D. Caldwell, Julie A. Johnson

    Data Contributors:

    Academia Sinica, Taiwan, ROC: Ming-Ta M. Lee, Yuan-Tsong Chen

    Chang Gung Memorial Hospital, Chang Gung University, Taiwan, ROC: Ming-Shien Wen

    China Medical University, Graduate Institute of Chinese Medical Science, Taichung, Taiwan, ROC: Ming-Ta M. Lee

    Hadassah Medical Organization, Israel: Yoseph Caraco, Idit Achache, Simha Blotnick, Mordechai Muszkat

    Inje University, Korea: Jae-Gook Shin, Ho-Sook Kim

    Instituto Nacional de Câncer, Brazil: Guilherme Suarez-Kurtz, Jamila Alessandra Perini

    Instituto Nacional de Cardiologia Laranjeiras, Brazil: Edimilson Silva-Assunção

    Intermountain Healthcare, USA: Jeffrey L. Anderson, Benjamin D. Horne, John F. Carlquist

    Marshfield Clinic, USA: Michael D. Caldwell, Richard L. Berg, James K. Burmester

    National University Hospital, Singapore: Boon Cher Goh, Soo-Chin Lee

    Newcastle University, United Kingdom: Farhad Kamali, Elizabeth Sconce, Ann K. Daly

    University of Alabama, USA: Nita A. Limdi

    University of California, San Francisco, USA: Alan H.B. Wu

    University of Florida, USA: Julie A. Johnson, Taimour Y. Langaee, Hua Feng

    University of Illinois, Chicago, USA: Larisa Cavallari, Kathryn Momary

    University of Liverpool, United Kingdom: Munir Pirmohamed, Andrea Jorgensen, Cheng Hok Toh, Paula Williamson

    University of North Carolina, USA: Howard McLeod, James P. Evans, Karen E. Weck

    University of Pennsylvania, USA: Stephen E. Kimmel, Colleen Brensinger

    University of Tokyo and RIKEN Center for Genomic Medicine, Japan: Yusuke Nakamura, Taisei Mushiroda

    University of Washington, USA: David Veenstra, Lisa Meckley, Mark J. Rieder, Allan E. Rettie

    Uppsala University, Sweden: Mia Wadelius, Niclas Eriksson, Håkan Melhus

    Vanderbilt University, USA: C. Michael Stein, Dan M. Roden, Ute Schwartz, Daniel Kurnik

    Washington University in St. Louis, USA: Brian F. Gage, Elena Deych, Petra Lenzini, Charles Eby

    Wellcome Trust Sanger Institute, United Kingdom: Leslie Y. Chen, Panos Deloukas

    IWPC Authors

    Statistical Analysis:

    University of Alabama, USA: Nita A. Limdi

    Marshfield Clinic, USA: Michael D. Caldwell

    North Carolina State University, USA: Alison Motsinger-Reif

    Stanford University, USA: Russ B. Altman, Hersh Sagrieya, Teri E. Klein, Balaji S. Srinivasan

    Uppsala University, Uppsala Clinical Research Center, Sweden: Niclas Eriksson

    University of California, San Francisco, USA: Alan H.B. Wu

    University of North Carolina, USA: Michael J. Wagner

    University of Florida, USA: Julie A. Johnson

    University of Pennsylvania, USA: Stephen E. Kimmel

    University of Wisconsin-Madison, USA: David Page, Eric Lantz, Tim Chang

    Vanderbilt University, USA: Marylyn Ritchie

    Washington University in St. Louis, USA: Brian F. Gage, Elena Deych

    Genotyping QC of IWPC Samples:

    Academia Sinica, Taiwan, ROC: Ming-Ta M. Lee, Liang-Suei Lu

    Genotype and Phenotype QC:

    Inje University, Korea: Jae-Gook Shin

    Marshfield Clinic, USA: Michael D. Caldwell

    Stanford University, USA: Teri E. Klein, Russ B. Altman, Balaji S. Srinivasan

    University of Alabama, USA: Nita A. Limdi

    University of Florida, USA: Julie A. Johnson

    University of Pennsylvania, USA: Stephen E. Kimmel

    University of North Carolina, USA: Michael J. Wagner

    University of Wisconsin-Madison, USA: David Page

    Washington University in St. Louis, USA: Brian F. Gage

    Vanderbilt University, USA: Marylyn Ritchie

    Data Curation:

    Stanford University, USA: Teri E. Klein, Russ B. Altman, Balaji S. Srinivasan

    University of North Carolina, USA: Michael J. Wagner

    Washington University in St. Louis, USA: Elena Deych


    Application: Mammography

    • Provide decision support for radiologists

    • Variability due to differences in training and experience… to get 90% of cancers, have high false positive rate

    • Experts have higher cancer detection and fewer benign biopsies

    • Shortage of experts


    Bayes Net for Mammography

    • Kahn, Roberts, Wang, Jenks, Haddawy (1995)

    • Kahn, Roberts, Shaffer, Haddawy (1997)

    • Burnside, Rubin, Shachter (2000)

    • Note: not CAD (computer-assisted diagnosis), which circles abnormalities in an image… this is based on data entered into National Mammography Database schema by radiologists


    [Figure: Bayesian network for mammography. Nodes: Benign v. Malignant; Age; HRT; FHx; Breast Density; Mass Shape; Mass Margins; Mass Density; Mass Size; Mass Stability; Mass P/A/O; Skin Lesion; Tubular Density; Architectural Distortion; Asymmetric Density; LN; and the calcification (Ca++) findings Lucent Centered, Milk of Calcium, Dermal, Round, Dystrophic, Popcorn, Fine/Linear, Eggshell, Pleomorphic, Punctate, Amorphous, and Rod-like.]


    Mammography Database

    Patient  Abnormality  Date  Calcification Fine/Linear  …  Mass Size  Loc  Benign/Malignant
    P1       1            5/02  No                         …  0.03       RU4  B
    P1       2            5/04  Yes                        …  0.05       RU4  M
    P1       3            5/04  No                         …  0.04       LL3  B
    P2       4            6/00  No                         …  0.02       RL2  B
    …        …            …     …                          …  …          …    …


    Level 1: Parameters

    [Diagram: network over the nodes Benign v. Malignant, Calc Fine Linear, and Mass Size. Each parameter is first shown as ?? and then filled in with a learned value.]

    P(Benign) = .99

    P(Yes| Benign) = .01

    P(Yes| Malignant) = .55

    P( size > 5| Benign) = .33

    P(size > 5| Malignant) = .42


    Level 2: Structure + Parameters

    [Diagram: network over the nodes Benign v. Malignant, Calc Fine Linear, and Mass Size, annotated with the parameters below.]

    P(Benign) = .99

    P(Yes| Benign) = .01

    P(Yes| Malignant) = .55

    P(Yes) = .02

    P( size > 5 )= .1

    P(size > 5| Benign ^ Yes) = .4

    P(size > 5| Malignant ^ Yes) = .6

    P(size > 5| Benign ^ No) = .05

    P(size > 5| Malignant ^ No) = .2

    P( size > 5| Benign) = .33

    P(size > 5| Malignant) = .42
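As a worked illustration of how such parameters are used, the sketch below computes the posterior probability of malignancy from the values on the slides: P(Benign) = .99, P(Yes | class) for Calc Fine Linear, and P(size > 5 | class) for Mass Size. It assumes a naive Bayes structure, in which the two findings are conditionally independent given the class; that structure and the function name are assumptions for illustration, not something the slides state.

```python
# Parameters as shown on the slides.
p_benign = 0.99
p_calc_yes = {"benign": 0.01, "malignant": 0.55}   # P(Calc Fine Linear = Yes | class)
p_size_gt5 = {"benign": 0.33, "malignant": 0.42}   # P(Mass Size > 5 | class)

def posterior_malignant(calc_yes, size_gt5):
    """P(Malignant | findings), assuming findings are independent given the class."""
    prior = {"benign": p_benign, "malignant": 1.0 - p_benign}
    joint = {}
    for cls in prior:
        pc = p_calc_yes[cls] if calc_yes else 1.0 - p_calc_yes[cls]
        ps = p_size_gt5[cls] if size_gt5 else 1.0 - p_size_gt5[cls]
        joint[cls] = prior[cls] * pc * ps
    return joint["malignant"] / (joint["benign"] + joint["malignant"])

p = posterior_malignant(calc_yes=True, size_gt5=True)
```

With both findings present, the posterior rises from the 1% prior to about 0.41, driven almost entirely by the fine/linear calcification term.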


    Data

    • Structured data from actual practice

    • National Mammography Database

      • Standard for reporting all abnormalities

    • Our dataset contains

      • 435 malignancies

      • 65,365 benign abnormalities

    • Link to biopsy results

      • Obtain disease diagnosis – our ground truth


    Hypotheses

    • Learn relationships that are useful to radiologist

    • Improve by moving up learning hierarchy


    Results (Radiology, 2009)

    • Trained (Level 2, TAN) Bayesian network model achieved an AUC of 0.966 which was significantly better than the radiologists’ AUC of 0.940 (P = 0.005)

    • Trained BN demonstrated significantly better sensitivity than the radiologist (89.5% vs. 82.3%—P = 0.009) at a specificity of 90%

    • Trained BN demonstrated significantly better specificity than the radiologist (93.4% versus 86.5%—P = 0.007) at a sensitivity of 85%



    Precision-Recall Curves


    Mammography Database

    Patient  Abnormality  Date  Calcification Fine/Linear  …  Mass Size  Loc  Benign/Malignant
    P1       1            5/02  No                         …  0.03       RU4  B
    P1       2            5/04  Yes                        …  0.05       RU4  M
    P1       3            5/04  No                         …  0.04       LL3  B
    P2       4            6/00  No                         …  0.02       RL2  B
    …        …            …     …                          …  …          …    …


    Statistical Relational Learning

    • Learn probabilistic model, but don’t assume iid data: there may be relevant data in other rows or even other tables

    • Database schema: defines set of features


    SRL Aggregates Information from Related Rows or Tables

    • Extend probabilistic models to relational databases

    • Probabilistic Relational Models (Friedman et al. 1999, Getoor et al. 2001)

      • Tricky issue: one to many relationships

      • Approach: use aggregation

    • PRMs cannot capture all relevant concepts


    Aggregate Illustration

    Patient  Abnormality  Date  Calcification Fine/Linear  …  Mass Size  Loc  Benign/Malignant
    P1       1            5/02  No                         …  0.03       RU4  B
    P1       2            5/04  Yes                        …  0.05       RU4  M
    P1       3            5/04  No                         …  0.04       LL3  B
    P2       4            6/00  No                         …  0.02       RL2  B
    …        …            …     …                          …  …          …    …

    Aggregation Function: Min, Max, Average, etc.


    New Schema

    Patient  Abnormality  Date  Calcification Fine/Linear  …  Mass Size  Avg Size this date  Loc  Benign/Malignant
    P1       1            5/02  No                         …  0.03       0.03                RU4  B
    P1       2            5/04  Yes                        …  0.05       0.045               RU4  M
    P1       3            5/04  No                         …  0.04       0.045               LL3  B
    P2       4            6/00  No                         …  0.02       0.02                RL2  B
    …        …            …     …                          …  …          …                   …    …
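The "Avg Size this date" column can be reproduced with a small sketch. It assumes the aggregate is the mean Mass Size over all abnormalities recorded for the same patient on the same date, which matches the values in the table; the function name and the tuple layout are hypothetical.

```python
from collections import defaultdict

# Rows from the slide: (patient, abnormality, date, mass_size).
rows = [
    ("P1", 1, "5/02", 0.03),
    ("P1", 2, "5/04", 0.05),
    ("P1", 3, "5/04", 0.04),
    ("P2", 4, "6/00", 0.02),
]

def add_avg_size_this_date(rows):
    """Append, to each row, the mean mass size over that patient's rows on the same date."""
    sizes = defaultdict(list)
    for patient, _abnormality, date, size in rows:
        sizes[(patient, date)].append(size)
    extended = []
    for patient, abnormality, date, size in rows:
        group = sizes[(patient, date)]
        extended.append((patient, abnormality, date, size, sum(group) / len(group)))
    return extended

extended = add_avg_size_this_date(rows)
```

Swapping the mean for `min` or `max` gives the other aggregation functions named on the previous slide.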


    Level 3: Aggregates

    [Diagram: network over the nodes Benign v. Malignant, Calc Fine Linear, Mass Size, and the new aggregate node Avg Size this date.]

    Note: Learn parameters for each node


    Database Notion of View

    • New tables or fields defined in terms of existing tables and fields are known as views

    • A view corresponds to alteration in database schema

    • Goal: automate the learning of views


    Possible View

    Patient  Abnormality  Date  Calcification Fine/Linear  …  Mass Size  Loc  Benign/Malignant
    P1       1            5/02  No                         …  0.03       RU4  B
    P1       2            5/04  Yes                        …  0.05       RU4  M
    P1       3            5/04  No                         …  0.04       LL3  B
    P2       4            6/00  No                         …  0.02       RL2  B
    …        …            …     …                          …  …          …    …


    New Schema

    Patient  Abnormality  Date  Calcification Fine/Linear  …  Mass Size  Increase in size  Loc  Benign/Malignant
    P1       1            5/02  No                         …  0.03       No                RU4  B
    P1       2            5/04  Yes                        …  0.05       Yes               RU4  M
    P1       3            5/04  No                         …  0.04       No                LL3  B
    P2       4            6/00  No                         …  0.02       No                RL2  B
    …        …            …     …                          …  …          …                 …    …
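One way to derive the "Increase in size" field is sketched below. The definition is inferred from the example table and is an assumption: an abnormality is marked "Yes" when the same patient has an earlier abnormality at the same location with a smaller mass size. The helper names are hypothetical, and dates are read as month/two-digit-year.

```python
def parse_date(mm_yy):
    """Turn 'M/YY' into a sortable (year, month) pair; all years assumed in one century."""
    month, year = mm_yy.split("/")
    return (int(year), int(month))

def add_increase_in_size(rows):
    """Append 'Yes'/'No' depending on growth relative to an earlier abnormality at the same location."""
    out = []
    for patient, abnormality, date, size, loc in rows:
        earlier_sizes = [
            s for p, _a, d, s, l in rows
            if p == patient and l == loc and parse_date(d) < parse_date(date)
        ]
        increased = any(size > s for s in earlier_sizes)
        out.append((patient, abnormality, date, size, loc, "Yes" if increased else "No"))
    return out

# Rows from the slide: (patient, abnormality, date, mass_size, location).
rows = [
    ("P1", 1, "5/02", 0.03, "RU4"),
    ("P1", 2, "5/04", 0.05, "RU4"),
    ("P1", 3, "5/04", 0.04, "LL3"),
    ("P2", 4, "6/00", 0.02, "RL2"),
]
view = add_increase_in_size(rows)
```

Unlike the per-date average, this field relates a row to other rows at different dates, which is the kind of relationship a fixed aggregate cannot express.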


    Level 4: View Learning

    [Diagram: network over the nodes Benign v. Malignant, Calc Fine Linear, Mass Size, Avg Size this date, and the new view node Increase in Size.]

    Note: Include aggregate features. Learn parameters for each node


    Level 4: View Learning

    • Learn rules predictive of “malignant”

      • We used Aleph (Srinivasan)

    • Treat each rule as a new field

      • 1 if abnormality matches rule

      • 0 otherwise

    • New view consists of original table extended with new fields
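The "treat each rule as a new field" step can be sketched as follows. The rules here are hypothetical stand-ins written as Python predicates over a single record; rules learned by an ILP system such as Aleph are relational and may consult other rows, which this simplified sketch does not model.

```python
def extend_with_rules(records, rules):
    """Return copies of the records with one 0/1 field per rule."""
    extended = []
    for record in records:
        new_record = dict(record)
        for name, rule in rules.items():
            new_record[name] = 1 if rule(record) else 0
        extended.append(new_record)
    return extended

# Hypothetical rules for illustration only (not learned rules).
rules = {
    "rule_1": lambda r: r["calc_fine_linear"] == "Yes",
    "rule_2": lambda r: r["mass_size"] > 0.04,
}

records = [
    {"abnormality": 1, "calc_fine_linear": "No", "mass_size": 0.03},
    {"abnormality": 2, "calc_fine_linear": "Yes", "mass_size": 0.05},
]
view = extend_with_rules(records, rules)
```

The original fields are kept, so the resulting view is the original table extended with the new binary columns.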


    Experimental Methodology

    • 10-fold cross validation

    • Split at the patient level

    • Roughly 40 malignant cases and 6000 benign cases in each fold

    • Tree Augmented Naïve Bayes (TAN) as structure learner (Friedman, Geiger & Goldszmidt ’97)
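Splitting at the patient level means every abnormality of a given patient falls in the same fold, so no patient contributes to both training and test data. A minimal sketch, assuming records carry a `patient` key (a hypothetical layout) and using simple round-robin assignment; it does not balance malignant and benign cases across folds.

```python
def patient_level_folds(records, k=10):
    """Assign records to k folds so that all records of one patient share a fold."""
    patients = sorted({r["patient"] for r in records})
    fold_of = {patient: i % k for i, patient in enumerate(patients)}
    folds = [[] for _ in range(k)]
    for r in records:
        folds[fold_of[r["patient"]]].append(r)
    return folds

# Hypothetical records: one row per abnormality.
records = [
    {"patient": "P1", "abnormality": 1},
    {"patient": "P1", "abnormality": 2},
    {"patient": "P2", "abnormality": 3},
    {"patient": "P3", "abnormality": 4},
    {"patient": "P4", "abnormality": 5},
]
folds = patient_level_folds(records, k=2)
```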


    Sample View [Burnside et al. AMIA05]

    malignant(A) :-

    birads_category(A,b5),

    massPAO(A,present),

    massesDensity(A,high),

    ho_breastCA(A,hxDCorLC),

    in_same_mammogram(A,B),

    calc_pleomorphic(B,notPresent),

    calc_punctate(B,notPresent).


    View Learning: First Approach [Davis et al. IA05, Davis et al. IJCAI05]

    [Diagram: Step 1, Learn: a rule learner is given the target predicate and produces Rule 1, Rule 2, …, Rule N. Step 2, Select. Step 3, Build Model.]


    Drawback to First Approach

    • Mismatch between

      • Rule building

      • Model’s use of rules

    • Should Score As You Use (SAYU)


    SAYU [Davis et al. ECML05]

    • Build network as we learn rules [Landwehr et al. AAAI 2005]

    • Score rule on whether it improves network

    • Results in tight coupling between rule generation, selection and usage


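    SAYU-NB

    [Diagram: naïve Bayes network with a Class Value node and rule nodes (Rule 1, Rule 2, Rule 3, Rule 14, …, Rule N). Candidate rules are generated from seed 1 and seed 2, and each candidate network receives a score; example scores shown: 0.02, 0.12, 0.10, 0.15, 0.35.]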


    SAYU-View [Davis et al. Intro to SRL 06]

    [Diagram: network with a Class Value node and nodes for Feat 1 … Feat N, Agg 1 … Agg M, and Rule 1 … Rule L.]


    Parameter Settings

    • Score using AUC-PR (recall >= .5)

    • Keep a rule: 2% increase in AUC

    • Switch seeds after adding a rule

    • Train set to learn network structure and parameters

    • Tune set to score structures
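The score-as-you-use loop with these settings can be sketched as below. This is a simplified illustration, not the SAYU implementation: `score_model` is a hypothetical callback that would train the network on the train set with the given rules as features and return AUC-PR on the tune set; the 2% threshold is interpreted as a relative gain, which is an assumption; and seed switching and the rule search itself are omitted.

```python
def sayu_select(candidate_rules, score_model, min_gain=0.02):
    """Greedily keep a rule only if adding it improves the model's score enough."""
    kept = []
    best = score_model(kept)
    for rule in candidate_rules:
        trial = score_model(kept + [rule])
        if trial >= best * (1.0 + min_gain):  # relative 2% gain: an assumption
            kept.append(rule)
            best = trial
    return kept, best

# Toy scorer for illustration: each rule adds a fixed, made-up amount to the score.
toy_gain = {"r1": 0.10, "r2": 0.001, "r3": 0.05}

def toy_score(rules):
    return 0.5 + sum(toy_gain[r] for r in rules)

kept, best = sayu_select(["r1", "r2", "r3"], toy_score)
```

Because each rule is scored by its effect on the model that will actually use it, rule generation, selection, and usage are coupled in a single loop.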


    Electronic Medical Record

    PatientID  Date    Physician  Symptoms      Diagnosis
    P1         1/1/01  Smith      palpitations  hypoglycemic
    P1         2/1/03  Jones      fever, aches  influenza

    PatientID  Gender  Birthdate
    P1         M       3/22/63

    PatientID  Date    Lab Test       Result
    P1         1/1/01  blood glucose  42
    P1         1/9/01  blood glucose  45

    PatientID  SNP1  SNP2  …  SNP500K
    P1         AA    AB    …  BB
    P2         AB    BB    …  AA

    PatientID  Date Prescribed  Date Filled  Physician  Medication  Dose  Duration
    P1         5/17/98          5/18/98      Jones      prilosec    10mg  3 months


    Cox Inhibition

    • Non-steroidal anti-inflammatory drug

    • Cox-2 goal: reduce stomach trouble

    [Diagram: Cox-1 and Cox-2 pathways. Aspirin, Aleve, ibuprofen, etc. block both pathways; Vioxx, Bextra, and Celebrex block the Cox-2 pathway.]


    Cox-2 Timeline

    • Dec. 1998-May 1999: Celebrex, Vioxx approved

    • 2001: Cox-2 sales top $6 billion/year in US

    • 2002: Beginning of APPROVe Study

    • Sept 2004: Vioxx voluntarily pulled from market

    • Dec. 2004: FDA issues warning

    • April 2005: FDA removes Bextra from market


    Predicting Adverse Reaction to Cox-2 Inhibitors

    Given: A patient’s clinical history

    Do: Predict whether the patient will have a myocardial infarction (MI)

    Note: This is work in progress


    Data

    • 492 patients who took a Cox-2 inhibitor and had an MI

    • 77,077 patients who took a Cox-2 inhibitor and had no MI

      • Sub-sampled 651 patients

    • Relational tables for

      • Lab tests

      • Drugs taken

      • Diagnoses

      • Observations


    Q: What Data to Use?

    • All data for a patient? Many perfect predictors

    • Cut off data right before MI

      • Model not relevant pre-Cox2ib

      • Uniformly more data for non-MI cases

    • Our choice: cut off data for each patient at first Cox2ib prescription
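The chosen cutoff can be sketched as a filter over a patient's event history. The event layout (`date`, `type`, `name` keys) is hypothetical, the example events are invented, and excluding events dated on the prescription day itself is an assumption; the drug names are the Cox-2 inhibitors named earlier in the talk.

```python
from datetime import date

COX2_DRUGS = {"Vioxx", "Bextra", "Celebrex"}

def censor_at_first_cox2(events, cox2_drugs=COX2_DRUGS):
    """Keep only the events dated before the patient's first Cox-2 inhibitor prescription."""
    cox2_dates = [
        e["date"] for e in events
        if e["type"] == "drug" and e["name"] in cox2_drugs
    ]
    if not cox2_dates:
        return list(events)  # no Cox-2 prescription: nothing to cut off
    cutoff = min(cox2_dates)
    return [e for e in events if e["date"] < cutoff]

# Invented history for one patient.
events = [
    {"date": date(1999, 3, 1), "type": "diagnosis", "name": "hypertension"},
    {"date": date(2000, 6, 15), "type": "drug", "name": "Vioxx"},
    {"date": date(2001, 1, 10), "type": "lab", "name": "blood glucose"},
    {"date": date(2002, 2, 2), "type": "drug", "name": "Celebrex"},
]
history = censor_at_first_cox2(events)
```

Cutting every patient at the same kind of event avoids both problems listed above: no post-MI "perfect predictors" leak in, and the amount of history no longer depends on whether the patient later had an MI.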


    Approaches Tried

    • Propositional: Linear SVM, naïve Bayes, TAN, trees, boosted trees, boosted rules

    • Relational: Inductive Logic Programming (ILP) system Aleph

    • SRL: View learning with SAYU


    Experimental Methodology

    • 10-fold cross validation

    • Feature selection: pick top 50 per fold

    • ROC curves to evaluate

    • Paired t-test for significance


    Algorithms Compared

    • Best feature vector approaches

      • Naïve Bayes

      • Boosted rules (C5)

    • SAYU-TAN (w/initial feature set)

    Note: Preliminary results with Aleph were poor/slow


    Algorithm Comparison


    ROC Area


    Sample Rule

    • myocardial_infarction(A) :-

      hasdrug(A, GLUCOSE),

      diagnosis(A, ischemic heart disease).


    Sample Rule

    • myocardial_infarction(A) :-

      diagnosis(A,B, INFECTIOUS AND PARASITIC DISEASES),

      before(B,10/26/1982),

      age(A,B,C),

      younger(C, 51).


    Lingering Questions

    • Are we predicting predisposition to MI?

    • Can we do better with data we have?

    • How much will genotype data help?


    Conclusions

    • EMRs and genotyping give machine learning a new opportunity for great impact on healthcare in the next few years

      • Personalized medicine

      • Pharmacovigilance (FDA’s Sentinel, OMOP)

      • Decision support

    • Statistical relational learning helps for some tasks (but not all)


    Conclusions (Continued)

    • Fancy new algorithms are not always the best… healthcare applications raise other issues

      • Missing data (not missing at random)

      • Need simple, comprehensible models… clinicians may prefer slightly less accurate model if it makes more sense to them

      • Different evaluation metrics


    Thanks

    • Jesse Davis

    • Beth Burnside

    • Vitor Santos Costa

    • Michael Caldwell

    • Peggy Peissig

    • Eric Lantz

    • Jude Shavlik

    • IWPC

    • WGI (Wisconsin Genomics Initiative)

