Jaideep Srivastava, University of Minnesota, firstname.lastname@example.org. US-China Workshop on Collaboration for Tobacco Control & Research, March 27-29, 2008, Beijing, PRC.
Discovering Referral Networks from Medicare Data
[Diagram labels: Patient, Medical Problems, Cardiologist, Geriatrics, Pulmonologist, Podiatrist, Rheumatologist]
Referral Networks and Cooperation
• Problem
  • In many cases people visit multiple doctors and specialists for their medical needs
  • The patients would be served better if there were better coordination between these specialists
• Classical approach
  • Offer incentives individually to specialists
• Defects in this approach
  • Each specialist may want to “optimize” his/her own incentives
  • In such settings, local optimization of services does not lead to global optimization of services
• Proposed approach
  • Identify referral networks to encourage specialists to work together to offer better services
  • Provide group incentives
Network Analysis Formulation of Problem
• Problem is reformulated as a social network analysis (SNA) problem
• Matrices can be constructed for variables of interest, e.g., Patient-Specialist matrix, Specialist-Location matrix, etc.
• Techniques like clustering, link analysis, co-clustering, and identifying cliques can be used
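As a minimal sketch of the matrix formulation above, the following builds a patient-to-specialist mapping from visit records and projects it onto specialist pairs weighted by the number of patients they share. The record layout, the function name `shared_patient_counts`, and the toy data are assumptions made for illustration; the slides do not describe the actual Medicare data format or the method used.

```python
from collections import defaultdict
from itertools import combinations


def shared_patient_counts(visits):
    """Count, for each pair of specialists, how many patients saw both.

    visits: iterable of (patient_id, specialist_id) pairs (hypothetical layout).
    Returns a dict mapping a sorted (specialist_a, specialist_b) tuple to a count.
    """
    # Rows of the patient-specialist matrix, stored sparsely as sets.
    by_patient = defaultdict(set)
    for patient, specialist in visits:
        by_patient[patient].add(specialist)

    # Each pair of specialists seen by the same patient gets one more unit of weight.
    counts = defaultdict(int)
    for specialists in by_patient.values():
        for a, b in combinations(sorted(specialists), 2):
            counts[(a, b)] += 1
    return dict(counts)


# Toy data, invented for this example.
visits = [
    ("p1", "cardiologist"), ("p1", "pulmonologist"),
    ("p2", "cardiologist"), ("p2", "pulmonologist"), ("p2", "podiatrist"),
    ("p3", "podiatrist"),
]
edges = shared_patient_counts(visits)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

The resulting weighted edge list is one possible input for the clustering or clique-finding techniques named on the slide.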
Mining Liver Cirrhosis Data
- Mining structured and unstructured patient data
- Post-analysis of results from mining to gain further disease understanding
Specifically: Is cirrhosis the result of a single cause, or of a number of liver-damaging conditions that work in tandem? Many cancers are “multi-hit” diseases. For liver cirrhosis we know about some of these hits: abuse of alcohol, presence of viral hepatitis, and possibly obesity. Are there any others, and how strong is the support for each of the known hits?
Project Goal: Predict Disease Stage
Given a set of patients who have undergone liver biopsy, i.e., for whom we know the biopsy stage as well as the hepatitis type
• Classification Problem
  • The aim is to predict the biopsy stage by evaluating only the laboratory test values obtained prior to the biopsy date
• Constructed Dataset
  • For each patient who has undergone a liver biopsy, we use only laboratory test results that precede the biopsy date
  • For each laboratory test individually, we take the average of the last n test values
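The dataset construction described above could be sketched as follows: keep only results dated before the biopsy, then average the last n values of each test. The tuple layout, the test names `test_a` and `test_b`, and the default `n=3` are placeholders chosen for this example, not details taken from the project.

```python
from collections import defaultdict
from datetime import date


def last_n_mean_features(results, biopsy_date, n=3):
    """Average the last n pre-biopsy values of each laboratory test.

    results: iterable of (test_name, result_date, value) tuples (hypothetical layout).
    Returns a dict mapping test_name to the mean of its last n values before biopsy_date.
    Tests with no result before the biopsy are simply absent from the output.
    """
    by_test = defaultdict(list)
    for test_name, when, value in results:
        if when < biopsy_date:  # discard anything on or after the biopsy date
            by_test[test_name].append((when, value))

    features = {}
    for test_name, points in by_test.items():
        points.sort(key=lambda p: p[0])          # oldest first
        recent = [v for _, v in points[-n:]]     # last n values (fewer if not available)
        features[test_name] = sum(recent) / len(recent)
    return features


# Toy patient record, invented for this example.
results = [
    ("test_a", date(2007, 1, 1), 40.0),
    ("test_a", date(2007, 2, 1), 50.0),
    ("test_a", date(2007, 3, 1), 60.0),
    ("test_a", date(2007, 5, 1), 999.0),  # after the biopsy, must be ignored
    ("test_b", date(2007, 3, 15), 10.0),
]
features = last_n_mean_features(results, biopsy_date=date(2007, 4, 1), n=2)
print(features)
```

A patient with fewer than n results for a test still gets a value here, averaged over what is available; whether that is appropriate depends on how sparsity is handled elsewhere.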
Nature of the Data
• Temporal
  • Difficult to handle discrete time points as well as the varied range of values across the lab tests
• Sparse
  • Missing values mean that imputation techniques, or other approaches suited to this domain, need to be used
• Noisy
  • Need data cleansing techniques, outlier detection
• Irregular
  • Inability to do direct comparisons between patient test values
• Domain Knowledge
  • Need expertise to understand data, evaluate results, etc., which takes years to obtain
Stages of Fibrosis in Chronic Hepatitis
[Figure panels: Stage 1 (Portal), Stage 2 (Periportal), Stage 3 (Septal), Stage 4 (Cirrhosis); additional label: Portal Tracts]
Slide provided by Dr. John Gross, Mayo Clinic
Approach: Identify significant laboratory tests
• Use the constructed data set to compute the correlation between the test values obtained before liver biopsy and the liver biopsy stage
  • Result: 11 laboratory tests identified as significant and present in many patients
• Literature survey and physician expertise suggested the use of a subset of laboratory tests
  • Result: 8 commonly used laboratory tests were present in both the hepatitis B and C data sets, of which a domain expert identified 5 as likely to be important for predicting the course of hepatitis
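One way to picture the correlation step above is to score each laboratory test by the correlation between its pre-biopsy feature value and the biopsy stage, then rank tests by absolute correlation. The slide does not say which correlation measure was used; Pearson correlation is assumed here purely for illustration, and the test names and values are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence is constant (correlation undefined).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def rank_tests_by_correlation(features, stages):
    """Rank tests by |correlation| with biopsy stage, strongest first.

    features: dict of test_name -> list of values, one per patient, aligned with stages.
    """
    scores = {name: pearson(values, stages) for name, values in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Toy data for four patients, invented for this example.
stages = [1, 2, 3, 4]
features_by_test = {
    "test_a": [10.0, 20.0, 30.0, 40.0],  # rises with stage
    "test_b": [5.0, 5.0, 5.0, 5.0],      # constant, carries no signal
    "test_c": [4.0, 3.0, 3.0, 1.0],      # falls with stage
}
ranked = rank_tests_by_correlation(features_by_test, stages)
for name, r in ranked:
    print(f"{name}: r = {r:+.3f}")
```

Because biopsy stage is ordinal, a rank-based measure may be a better fit than Pearson; the ranking function works unchanged with any such scoring function substituted.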
K-Nearest Neighbor
• A technique that looks at the k points nearest to the point being classified
• The current point is classified by a count of its neighbors' class labels
• Results using 20 nearest neighbors for classification, represented as a confusion matrix
[Confusion matrix figure; axes: PREDICTION and LABELS]
For such a medical problem it would be ideal to get the predictions as close to the diagonal as possible. Right now we fail to do so, as shown by the red marked region.
PS: Even the “best” industry classifier shows similar results
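A minimal sketch of k-nearest-neighbor classification with a confusion matrix follows. It assumes Euclidean distance and a simple majority vote, neither of which is stated on the slide, and it uses k=3 on invented toy points rather than the k=20 reported above.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k):
    """Predict a label for query by majority vote among its k nearest training points.

    train: list of (feature_tuple, label) pairs. Distance is Euclidean (an assumption).
    Vote ties are broken in favor of the label that appears first among the neighbors.
    """
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def confusion_matrix(true_labels, predicted_labels, classes):
    """Return matrix[i][j] = number of cases with true class i predicted as class j."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


# Toy two-feature data with biopsy stages 1 and 4, invented for this example.
train = [
    ((1.0, 1.0), 1), ((1.2, 0.9), 1), ((0.8, 1.1), 1),
    ((5.0, 5.0), 4), ((5.2, 4.9), 4), ((4.8, 5.1), 4),
]
queries = [(1.1, 1.0), (5.1, 5.0)]
true_labels = [1, 4]
predictions = [knn_predict(train, q, k=3) for q in queries]
matrix = confusion_matrix(true_labels, predictions, classes=[1, 4])
print(predictions)
print(matrix)
```

With ordered classes such as biopsy stages, entries far from the diagonal represent predictions that are off by several stages, which is the failure mode the slide highlights.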
Kernel Metrics
• Features used
  • Dynamic time warping as a direct similarity measure between various patients
  • Use of the average of the last n test points seen before liver biopsy
• Results (Preliminary)
  • Area under the ROC curve, used to evaluate performance, shows that there is promise in this technique, provided
    • the classes are defined well
    • features are combined together
  • The best performance value: 0.66 area under the ROC curve, which is a good result given the difficulty of the problem
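The two ingredients named above can be sketched in a few lines: a textbook dynamic time warping (DTW) distance between two lab-test series, and the area under the ROC curve computed by pairwise comparison of scores. This is an illustrative sketch only; the absolute-difference local cost, the binary 0/1 labeling, and all data are assumptions, and the slides do not describe how the DTW similarity was turned into a kernel.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    Local cost is the absolute difference (an assumption). If exactly one
    sequence is empty the result is infinity; if both are empty it is 0.0.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances, b repeats
                                    cost[i][j - 1],      # b advances, a repeats
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def roc_auc(scores, labels):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative).

    Computed as the fraction of positive/negative pairs in which the positive
    case has the higher score, counting score ties as half.
    """
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


# Toy series and scores, invented for this example.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])   # same shape, different length
shifted = dtw_distance([1, 2, 3], [2, 3, 4])
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(same_shape, shifted, auc)
```

DTW tolerates series of different lengths and sampling times, which suits the irregular lab-test data described earlier. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.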
Commercially available “Best” Laboratory-based Test for Hepatic Fibrosis: FibroTest®
Note extensive overlap between stages when 5 specific markers are used
3 of 5 markers are not commonly used tests
Overlap in prediction: similar to our results; however, they do better for the severe classes
Learned so Far… Immediate Next Steps
• Relative rates
  • Trends in the laboratory test values should be considered rather than just the absolute values. The idea would be to use other trend-capturing measures as features and move towards an improvement.
• Increasing the time span
  • Previously we restricted the time span of the test series before liver biopsy and used only a few months
• Separating hepatitis B and C
  • Hepatitis B cases may have a different temporal pattern in the laboratory tests compared to hepatitis C cases
• 5 laboratory tests
  • We identified 5 laboratory tests as core to our classification problem
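As one possible trend-capturing measure of the kind suggested above, the least-squares slope of a test's values over time gives a rate of change per unit time. The slides do not name a specific measure, so the choice of slope, the function name `trend_slope`, and the data below are assumptions for illustration.

```python
def trend_slope(times, values):
    """Least-squares slope of values against times (value units per time unit).

    times: numeric time offsets, e.g. days before biopsy expressed as day numbers.
    Raises ValueError if fewer than two points are given or all times coincide.
    """
    n = len(times)
    if n != len(values) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    spread = sum((t - mean_t) ** 2 for t in times)
    if spread == 0:
        raise ValueError("all time points are identical")
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return covariance / spread


# Toy series: one test measured every 30 days, invented for this example.
days = [0, 30, 60, 90]
values = [40.0, 46.0, 52.0, 58.0]
slope = trend_slope(days, values)
print(f"{slope:.3f} units per day")
```

Such a slope could sit alongside the last-n average as an additional feature per laboratory test, so that two patients with the same recent average but opposite trajectories are no longer indistinguishable.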
TSEEN (Contractor et al.)
• Title: Social Networking Tools to Enable Collaboration in TSEEN
[Diagram labels: TSEEN; Social Science Researchers; Information Science Researchers; Public Health Researchers; Social Networking Tools; SN Tools; TobacSIG CyberInfrastructure; Knowledge Base]
TobacSIG: Tobacco Systems integration Grid
TSEEN: Tobacco Surveillance, Epidemiology, and Evaluation Network
Smart Information Discovery for Communities (SIDCOM)
[Diagram: Relationship between SCI-COM, TSEEN and ToBIG. Labels: Social Science Researchers; Information Science Researchers; Public Health Researchers; Computer Science Researchers; TSEEN; ToBIG; Smart tools for Community networking; TobacSIG CyberInfrastructure; CI-KNOW; SIDCOM; Knowledge Base]
Focus of this proposal: community-based result ranking; query incentive networks; social network based recommendation; intelligent task routing; expert identification