Phenotyping from Electronic Health Records


Presentation Transcript


  1. Phenotyping from Electronic Health Records. Jimeng Sun, College of Computing, Georgia Tech, jsun@cc.gatech.edu. More info at sunlab.org

  2. My research focuses on health analytics. Diagram labels: Health Analytic Apps; Heart disease predictor for $5.99; User; Clinical Researchers; Visualization; Training data; Privacy engine; Analytic cloud; Clinical data; Genomic data; Social data; Behavior data; My focus. Research Challenges: • Big data analytics on the cloud • Data mining and machine learning techniques • Privacy-preserving data sharing • Visual analytic techniques

  3. Outline • Phenotyping from EHR • Other work • PARAMO: Large scale predictive modeling pipeline • Patient Similarity

  4. Phenotyping from Electronic Health Records. Diagram: EHR (Demographic, Diagnosis, Procedure, Lab Tests, Medication, Medical Images) → Phenotyping → Medical Concepts (phenotypes)

  5. Motivation: Increasing Importance of Electronic Health Records • EHRs have become accepted data sources for clinical research • EHR data can enable much more research • Explosion in interest • How to turn EHR data into phenotypes?

  6. Challenges in Phenotyping from EHR • Representation: How to represent heterogeneous EHR data and phenotypes? • Speed: How to construct diverse phenotypes in an unsupervised fashion? • Intuition: How to validate and refine the phenotypes? • Adaptation: How to adapt phenotypes from one site to another? (Slide callout: "This talk")

  7. Constructing Feature Tensor • A tensor is a generalization of a matrix • A matrix is a 2nd-order tensor • Tensors can better capture interactions among concepts • Data element types: • Binary • Count (integer) • Continuous (numeric) [Figure label: Mode]
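As a rough illustration of the count-valued case, the sketch below builds a small patient × diagnosis × medication tensor with NumPy. The event tuples, the function name `build_count_tensor`, and the choice of three modes are assumptions made for this example; the slide does not specify how the tensor was assembled.

```python
import numpy as np

# Hypothetical toy events: one (patient, diagnosis, medication) co-occurrence per tuple.
events = [
    ("p1", "hypertension", "beta_blocker"),
    ("p1", "hypertension", "beta_blocker"),
    ("p1", "diabetes", "metformin"),
    ("p2", "diabetes", "metformin"),
]


def build_count_tensor(events):
    """Return a 3rd-order count tensor plus the sorted labels of each mode."""
    patients = sorted({e[0] for e in events})
    diagnoses = sorted({e[1] for e in events})
    medications = sorted({e[2] for e in events})
    p_idx = {p: i for i, p in enumerate(patients)}
    d_idx = {d: i for i, d in enumerate(diagnoses)}
    m_idx = {m: i for i, m in enumerate(medications)}
    tensor = np.zeros(
        (len(patients), len(diagnoses), len(medications)), dtype=np.int64
    )
    # Each event increments one cell, so cells hold co-occurrence counts.
    for p, d, m in events:
        tensor[p_idx[p], d_idx[d], m_idx[m]] += 1
    return tensor, (patients, diagnoses, medications)


tensor, labels = build_count_tensor(events)
```

Each axis of `tensor` corresponds to one mode; a binary tensor could be obtained from the same data with `tensor > 0`.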

  8. Multiple Tensors Lab Results Medication Reconciliation Diagnosis-Medication Diagnostic Sources Symptoms Vital

  9. Phenotyping through Tensor Factorization. Tensor ≈ λ1 · (Phenotype 1) + … + λR · (Phenotype R). Each phenotype consists of a patients factor, a diagnosis factor, and a medication factor; λr is the phenotype importance; the elements of each factor sum to 1
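The decomposition on this slide can be sketched in NumPy as a weighted sum of rank-one terms. The sizes, the random factors, and the names `random_stochastic_factor` and `reconstruct` are illustrative assumptions; this shows only how factors and weights combine, not how they are learned.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_stochastic_factor(n_rows, rank, rng):
    """Nonnegative factor matrix whose columns each sum to 1."""
    factor = rng.random((n_rows, rank))
    return factor / factor.sum(axis=0, keepdims=True)


# Hypothetical sizes: 5 patients, 4 diagnoses, 3 medications, R = 2 phenotypes.
rank = 2
patients = random_stochastic_factor(5, rank, rng)
diagnoses = random_stochastic_factor(4, rank, rng)
medications = random_stochastic_factor(3, rank, rng)
weights = np.array([10.0, 4.0])  # lambda_1 ... lambda_R (phenotype importance)


def reconstruct(weights, patients, diagnoses, medications):
    """Sum over r of lambda_r * (patient_r outer diagnosis_r outer medication_r)."""
    return np.einsum(
        "r,ir,jr,kr->ijk", weights, patients, diagnoses, medications
    )


approx = reconstruct(weights, patients, diagnoses, medications)
```

Because every factor column sums to 1, each rank-one term sums to 1 and the weights carry all of the scale, so `approx.sum()` equals `weights.sum()`.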

  10. Example Phenotype. Figure: phenotype k with importance λk, shown as its patients factor, diagnosis factor, and medication factor

  11. Phenotyping Process using Tensor Factorization. Count data → Tensor factorization → Phenotype definitions (λ1 · Phenotype 1 + … + λR · Phenotype R). Count data for new patients → Projection → Phenotypes matrix
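The slide does not say how the projection step is computed. One possible sketch, assuming the diagnosis and medication factors are held fixed and only nonnegative per-patient phenotype scores are fitted with standard multiplicative updates for the KL objective (a swapped-in technique, not necessarily the one used in the talk):

```python
import numpy as np


def project_new_patients(counts, diagnoses, medications, n_iter=200, eps=1e-12):
    """Fit nonnegative phenotype scores for new patients with fixed phenotypes.

    counts:      (n_patients, n_diagnoses, n_medications) count tensor
    diagnoses:   (n_diagnoses, R) factor, columns sum to 1
    medications: (n_medications, R) factor, columns sum to 1
    Returns an (n_patients, R) matrix of phenotype scores.
    """
    n_patients = counts.shape[0]
    rank = diagnoses.shape[1]
    # Row r is the flattened outer product of phenotype r's two fixed factors.
    basis = np.einsum("jr,kr->rjk", diagnoses, medications).reshape(rank, -1)
    flat = counts.reshape(n_patients, -1).astype(float)
    scores = np.ones((n_patients, rank))
    for _ in range(n_iter):
        # Multiplicative update that keeps scores nonnegative.
        ratio = flat / (scores @ basis + eps)
        scores *= (ratio @ basis.T) / (basis.sum(axis=1) + eps)
    return scores


# Hypothetical example: two phenotypes with disjoint diagnosis/medication support.
diagnoses = np.array([[1.0, 0.0], [0.0, 1.0]])
medications = np.array([[1.0, 0.0], [0.0, 1.0]])
new_counts = np.zeros((1, 2, 2))
new_counts[0, 0, 0] = 3
new_counts[0, 1, 1] = 5
phenotype_matrix = project_new_patients(new_counts, diagnoses, medications)
```

Each row of `phenotype_matrix` describes one new patient in terms of the R phenotypes and could serve as a feature vector for a downstream classifier.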

  12. CP-APR Model. Equation annotations: KL divergence for count data; element index; nonnegative combinations; stochastic constraint (elements in each factor sum to 1). Reference: Chi, E.C. and Kolda, T.G. 2012. On tensors, sparsity, and nonnegative factorizations. SIAM Journal on Matrix Analysis and Applications. 33, 4 (2012), 1272–1299.
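The equation itself did not survive in the transcript. The following is a reconstruction based on the cited Chi and Kolda paper and the annotations above, not a copy of the slide; the symbols (data tensor $\mathcal{X}$, model tensor $\mathcal{M}$, factor columns $\mathbf{a}_r^{(n)}$, multi-index $\mathbf{i}$) are chosen here for illustration.

```latex
\begin{aligned}
\min_{\mathcal{M}} \quad & f(\mathcal{M}) = \sum_{\mathbf{i}}
    \left( m_{\mathbf{i}} - x_{\mathbf{i}} \log m_{\mathbf{i}} \right)
    && \text{(KL divergence for count data, up to a constant)} \\
\text{s.t.} \quad & \mathcal{M} = \sum_{r=1}^{R} \lambda_r \,
    \mathbf{a}_r^{(1)} \circ \cdots \circ \mathbf{a}_r^{(N)}
    && \text{(nonnegative combinations)} \\
& \lambda_r \ge 0, \quad \mathbf{a}_r^{(n)} \ge 0, \quad
    \lVert \mathbf{a}_r^{(n)} \rVert_1 = 1
    && \text{(stochastic constraint)}
\end{aligned}
```

Here $\mathbf{i}$ ranges over all element indices of the tensor, and $N = 3$ for a patient × diagnosis × medication tensor.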

  13. Constructing the Tensor • Medication orders from Geisinger dataset • Diagnosis codes aggregated into HCC codes • Medications are defined as pharmacy subclass • 31,816 patients x 169 diagnoses x 471 medications

  14. Evaluation of Phenotypes: Classification • Task: predict patients with heart failure • Model: logistic regression with ℓ1 regularization • 10 random even splits of the dataset (50% training) • Features: • Baseline using source independence matrix • Principal Component Analysis (PCA) • Nonnegative Matrix Factorization (NMF) • Phenotype Tensor Factorization (PTF)

  15. Predictive Performance Effect: a small number of phenotypes outperforms 640 features. [Plot; x-axis: Number of Phenotypes]

  16. NMF factors are not concise, harder to interpret

  17. PTF interpretation: Major disease phenotypes can be identified. Examples shown: Uncomplicated Diabetes; Mild Hypertension; Chronic Respiratory Inflammation/Infection

  18. PTF interpretation: Disease subtypes can be automatically identified. Examples shown: Mild Hypertension; Moderate Hypertension; Severe Hypertension. Over 80% of phenotype factors are clinically meaningful

  19. Summary: Phenotyping using Tensor Factorization • Nonnegative tensor factorization can be used to learn phenotypes without supervision • A small number of phenotypes outperforms a large number of features in a prediction task [Figure: tensor ≈ λ1 · Phenotype 1 + … + λR · Phenotype R; label: Few diagnosis]

  20. System PARAMO: Parallel Predictive Modeling Platform

  21. Predictive Modeling Pipeline • There are many different models that need to be built and evaluated • Different patient cohorts • Different targets • Different features • Different algorithms • Multiple training and testing splits in cross-validation. Callout: ~100K different pipelines
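The pipeline count grows as the product of the choices along each dimension listed above. A minimal sketch with made-up option lists (none of these specific names or counts come from the talk) shows how quickly the combinations multiply:

```python
from itertools import product

# Hypothetical options for each dimension of the modeling pipeline.
cohorts = ["small", "medium", "large"]
targets = ["heart_failure_onset", "hypertension_control"]
feature_sets = ["diagnoses", "medications", "labs"]
algorithms = ["logistic_regression", "random_forest"]
folds = range(10)  # cross-validation splits

# One pipeline per combination of cohort, target, features, algorithm, and fold.
pipelines = list(product(cohorts, targets, feature_sets, algorithms, folds))
n_pipelines = len(pipelines)  # 3 * 2 * 3 * 2 * 10
```

Since the pipelines are independent of one another apart from shared preprocessing, they are natural candidates for parallel execution.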

  22. Running Time vs. Parallelism Level. Running time: 9 days vs. 3 hours (72× speedup) • Patient sets: • Small: 5,000 patients for hypertension control prediction • Medium: 33K patients for predicting heart failure onset • Large: 319K patients for hypertension diagnosis prediction • Dependency graph: 1808 nodes and 3610 edges
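Assuming the 9-day and 3-hour figures are the running times at the lowest and highest parallelism levels, they are consistent with the reported speedup:

```latex
\text{speedup} = \frac{9 \text{ days} \times 24 \text{ h/day}}{3 \text{ h}}
               = \frac{216 \text{ h}}{3 \text{ h}} = 72
```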

  23. Algorithm Patient similarity

  24. Patient Similarity Problem Patient Doctor Supervision Similarity search EHR Database

  25. Patient Similarity Problem Patient Doctor

  26. Summary on Patient Similarity • To learn a customized distance metric for a target [1] • Extension 1: Composite distance integration (Comdi) [2] • How to combine multiple patient similarity measures? • Extension 2: Interactive metric update (iMet) [3] • How to update an existing distance measure? References: [1] Sun, J., Wang, F., Hu, J., Ebadollahi, S. 2012. Supervised patient similarity measure of heterogeneous patient records. ACM SIGKDD Explorations Newsletter 14, 16. [2] Wang, F., Sun, J., Ebadollahi, S. Integrating Distance Metrics Learned from Multiple Experts and its Application in Inter-Patient Similarity Assessment. SDM 2011: 59-70. [3] Wang, F., Sun, J., Hu, J., Ebadollahi, S. iMet: Interactive Metric Learning in Healthcare Applications. SDM 2011: 944-955.
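The slide does not state the form of the learned distance. A common choice in metric learning is a generalized Mahalanobis distance, sketched below as an assumption; the function names and the toy vectors are hypothetical, and the learning step that would produce the metric is omitted.

```python
import numpy as np


def generalized_mahalanobis(x, y, metric):
    """Distance sqrt((x - y)^T M (x - y)) for a positive semidefinite M."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ metric @ diff))


def metric_from_projection(projection):
    """Build a positive semidefinite metric M = W W^T from a projection W."""
    return projection @ projection.T


patient_a = [0.0, 0.0]
patient_b = [3.0, 4.0]

# With the identity metric this reduces to the Euclidean distance.
euclidean = generalized_mahalanobis(patient_a, patient_b, np.eye(2))

# A hypothetical learned projection that keeps only the first feature.
learned_metric = metric_from_projection(np.array([[1.0], [0.0]]))
customized = generalized_mahalanobis(patient_a, patient_b, learned_metric)
```

Supervision, for example from a doctor's judgments of which patients are similar, would determine the projection so that the distance reflects the clinical target rather than raw feature differences.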

  27. Phenotyping from Electronic Health Records. Jimeng Sun, College of Computing, Georgia Tech, jsun@cc.gatech.edu. More info at sunlab.org
