
Taming EHR Data

Taming EHR Data: Using Semantic Similarity to Reduce Dimensionality. Jim Weatherall, PhD, Head, Advanced Analytics Centre, AstraZeneca; Visiting Lecturer, School of Computer Science, University of Manchester. 14th World Congress on Medical & Health Informatics, August 2013, Copenhagen.


Presentation Transcript


  1. Taming EHR Data: Using Semantic Similarity to Reduce Dimensionality
     Jim Weatherall, PhD, Head, Advanced Analytics Centre, AstraZeneca; Visiting Lecturer, School of Computer Science, University of Manchester
     14th World Congress on Medical & Health Informatics, August 2013, Copenhagen
     On behalf of the authors:
     • Leila Kalankesh, School of Computer Science, UoM
     • James Weatherall, AstraZeneca
     • Thamer Ba-Dhfari, School of Computer Science, UoM
     • Iain Buchan, Institute of Population Health, UoM
     • Andy Brass, School of Computer Science, UoM

  2. Introduction
     Problems with mining healthcare data:
     • Large collections are not easily visualised or interpreted
     • Research is not the primary purpose for collection
     • 10s of 1000s of dimensions
     • 100s of 1000s of codes
     Biometrics & Information Sciences | GMD

  3. Data: The Salford Integrated Record (SIR)
     • Population ~220,000
     • Integrated primary and secondary care information
     • Individual Read Code entries captured in primary care information systems:
        • Codes for diagnosis
        • Codes for procedures
        • All clinical transactions in primary care and some in secondary care
     • Data extract for this analysis:
        • GP data in the date range 2003-2009, containing 136M Read code entries
        • 24K selected patients with chronic conditions, containing 443K Read code entries
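To make the dimensionality problem concrete, a set of patient records can be viewed as a patient-by-code matrix with one column per distinct code (the "diagnosis space" of the analysis plan). The sketch below is illustrative only: it assumes a hypothetical list of (patient, code) pairs with plain-text labels in place of real Read codes, and `build_code_matrix` is a hypothetical helper, not part of the authors' method. With tens of thousands of distinct codes, nearly every entry of such a matrix is zero, so a sparse representation would be used in practice.

```python
# Hypothetical (patient_id, code) entries standing in for Read code records;
# real extracts also carry dates and many more fields.
ENTRIES = [
    ("p1", "Type 2 diabetes"),
    ("p1", "Angina"),
    ("p2", "Type 2 diabetes"),
    ("p3", "Asthma"),
    ("p3", "Asthma"),  # repeated entries collapse to a single 1
]


def build_code_matrix(entries):
    """Return (patients, codes, matrix) where matrix[i][j] is 1 if patient i
    has at least one entry for code j, and 0 otherwise."""
    patients = sorted({p for p, _ in entries})
    codes = sorted({c for _, c in entries})
    row = {p: i for i, p in enumerate(patients)}
    col = {c: j for j, c in enumerate(codes)}
    matrix = [[0] * len(codes) for _ in patients]
    for p, c in entries:
        matrix[row[p]][col[c]] = 1
    return patients, codes, matrix


patients, codes, matrix = build_code_matrix(ENTRIES)
```

Each distinct code adds a column, which is why the raw diagnosis space grows into tens of thousands of dimensions.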

  4. Methods: Semantic Similarity
     How alike are the meanings of two terms?
     • Should the measure take term depth in the hierarchy into account, or not?
     • Should it measure ontological distance between terms?
     (From Sanchez, J Biomed Inform, 2011)
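The depth question on this slide can be illustrated with two classic edge-based measures over a toy is-a hierarchy. The taxonomy is hypothetical (plain-text terms, not Read codes), and the two measures are shown only for contrast; the slides do not say that either was used. `path_similarity` looks only at the number of edges between two terms, while `wu_palmer` also weighs how deep their lowest common ancestor sits.

```python
# Hypothetical toy is-a hierarchy: child -> parent (None marks the root).
PARENT = {
    "Disorder": None,
    "Circulatory disorder": "Disorder",
    "Ischaemic heart disease": "Circulatory disorder",
    "Angina": "Ischaemic heart disease",
    "Myocardial infarction": "Ischaemic heart disease",
    "Endocrine disorder": "Disorder",
    "Diabetes": "Endocrine disorder",
    "Type 1 diabetes": "Diabetes",
    "Type 2 diabetes": "Diabetes",
}


def ancestors(term):
    """Return [term, parent, grandparent, ..., root]."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def depth(term):
    # The root has depth 1, so Wu-Palmer never divides by zero.
    return len(ancestors(term))


def lowest_common_ancestor(a, b):
    b_anc = set(ancestors(b))
    for t in ancestors(a):  # walking up from a, the first shared term is the deepest
        if t in b_anc:
            return t
    raise ValueError("terms share no ancestor")


def path_length(a, b):
    """Number of is-a edges between a and b via their lowest common ancestor."""
    lca = lowest_common_ancestor(a, b)
    return (depth(a) - depth(lca)) + (depth(b) - depth(lca))


def path_similarity(a, b):
    # Depth-blind: only the distance between the two terms matters.
    return 1.0 / (1.0 + path_length(a, b))


def wu_palmer(a, b):
    # Depth-aware: 2 * depth(LCA) / (depth(a) + depth(b)).
    lca = lowest_common_ancestor(a, b)
    return 2.0 * depth(lca) / (depth(a) + depth(b))
```

Angina vs myocardial infarction and circulatory vs endocrine disorder are both two edges apart, so `path_similarity` scores both pairs 1/3, whereas `wu_palmer` gives 0.75 and 0.5: the deeper pair shares a more specific ancestor.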

  5. Methods: Semantic Similarity – which method?
     An ontology of methods!

  6. Semantic similarity calculation: the Resnik measure
     (1) Term probability, based on frequency, including descendants and annotations
     (2) Log transformation gives the "Information Content" (IC)
     (3) The IC of the "Most Informative Common Ancestor" (MICA) gives the similarity measure
     P. Resnik, "Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language", J Artif Intell Res, 1999
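The three steps can be written out as p(c) = freq(c) / N, where freq(c) counts annotations to c and to all of its descendants and N is the total annotation count; IC(c) = -log p(c); and sim(c1, c2) = the largest IC among the ancestors that c1 and c2 share. The sketch below applies this to a hypothetical toy hierarchy with made-up annotation counts; the taxonomy, `COUNTS` and the helper names are illustrative assumptions, not the SIR data or the authors' code.

```python
import math

# Hypothetical toy is-a hierarchy: child -> parent (None marks the root).
PARENT = {
    "Disorder": None,
    "Circulatory disorder": "Disorder",
    "Ischaemic heart disease": "Circulatory disorder",
    "Angina": "Ischaemic heart disease",
    "Myocardial infarction": "Ischaemic heart disease",
    "Endocrine disorder": "Disorder",
    "Diabetes": "Endocrine disorder",
    "Type 1 diabetes": "Diabetes",
    "Type 2 diabetes": "Diabetes",
}

# Hypothetical counts of how often each term was coded directly.
COUNTS = {
    "Angina": 30,
    "Myocardial infarction": 10,
    "Diabetes": 10,
    "Type 1 diabetes": 5,
    "Type 2 diabetes": 45,
}


def ancestors(term):
    """The term itself plus every term above it, up to the root."""
    out = set()
    while term is not None:
        out.add(term)
        term = PARENT[term]
    return out


def information_content():
    # Step 1: freq(c) includes annotations of all descendants, so each
    # annotation is credited to its own term and to every ancestor.
    freq = {t: 0 for t in PARENT}
    for term, n in COUNTS.items():
        for a in ancestors(term):
            freq[a] += n
    total = sum(COUNTS.values())  # equals freq of the root
    # Step 2: IC(c) = -log p(c) = log(total / freq(c)); unseen terms get infinite IC.
    return {t: math.log(total / f) if f > 0 else math.inf for t, f in freq.items()}


def resnik(ic, a, b):
    # Step 3: IC of the most informative common ancestor (MICA).
    return max(ic[c] for c in ancestors(a) & ancestors(b))


ic = information_content()
```

With these counts, angina and myocardial infarction (log(100/40) ≈ 0.92) come out more similar than type 1 and type 2 diabetes (log(100/60) ≈ 0.51): the diabetes branch is coded more often, so its shared ancestor carries less information. Terms whose only shared ancestor is the root score 0.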

  7. Analysis Plan: a stepwise approach to dimensionality reduction
     (1) Map patient records from diagnosis space into a similarity space
     (2) Map patient records into a low-dimensional vector space via PCA
     (3) Project patient records onto the low-dimensional vector space and cluster patients by similarity

  8. Analysis – Step 1: Mapping from diagnosis space to similarity space
     "The Similarity Matrix": p_i denotes patient i, and entry (i, j) is sim(p_i, p_j), the similarity score between patients i and j
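The slide does not state how code-level similarities are combined into one patient-level score. One common option, assumed here purely for illustration, is the best-match average: each code in one record is paired with its most similar code in the other record, and the two directional averages are averaged. `PAIR_SIM` is a hypothetical stand-in for a normalised code-level measure (identical codes score 1), and the three records are made up.

```python
# Hypothetical normalised code-level similarities in [0, 1]; in practice these
# would come from a semantic measure over the Read code hierarchy.
PAIR_SIM = {
    frozenset({"Angina", "Myocardial infarction"}): 0.8,
    frozenset({"Type 1 diabetes", "Type 2 diabetes"}): 0.7,
}


def code_sim(a, b):
    if a == b:
        return 1.0
    return PAIR_SIM.get(frozenset({a, b}), 0.0)


def patient_sim(codes_i, codes_j):
    """Best-match average of code-level similarities (an assumed aggregation)."""
    def one_way(src, dst):
        # Match every code in src to its most similar code in dst, then average.
        return sum(max(code_sim(a, b) for b in dst) for a in src) / len(src)
    return 0.5 * (one_way(codes_i, codes_j) + one_way(codes_j, codes_i))


def similarity_matrix(records):
    """records: list of non-empty code sets; returns an n x n list of lists."""
    n = len(records)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # the measure is symmetric, so mirror the upper triangle
            S[i][j] = S[j][i] = patient_sim(records[i], records[j])
    return S


RECORDS = [
    {"Angina", "Type 2 diabetes"},
    {"Myocardial infarction"},
    {"Type 1 diabetes"},
]
S = similarity_matrix(RECORDS)
```

For these records, patients 0 and 1 score about 0.6 (angina best-matches myocardial infarction), patients 0 and 2 about 0.525, and patients 1 and 2 share nothing (0.0). Whatever the number of distinct codes, the result is an n × n matrix indexed by patients.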

  9. Analysis – Steps 2 + 3: PCA on the similarity matrix, visualisation & clustering
     Natural co-morbidity: diabetes is a risk factor for angina because it accelerates atherosclerosis
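Steps 2 and 3 can be sketched with NumPy by treating row i of the similarity matrix as patient i's feature vector, centring the columns and projecting onto the leading principal components via an SVD. The slides do not name the clustering algorithm, so plain k-means with a deterministic farthest-point start is assumed here; `pca_project`, `kmeans` and the 6 × 6 matrix (two obvious blocks of mutually similar patients) are hypothetical.

```python
import numpy as np


def pca_project(S, k=2):
    """Centre the columns of the similarity matrix and project each patient
    (row) onto the top-k principal components."""
    X = np.asarray(S, dtype=float)
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def kmeans(points, k, iters=100):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        # Pick the point farthest from all centres chosen so far.
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


# Hypothetical patient similarity matrix with two blocks of similar patients.
S = [[1.0, 0.9, 0.8, 0.1, 0.0, 0.1],
     [0.9, 1.0, 0.85, 0.0, 0.1, 0.0],
     [0.8, 0.85, 1.0, 0.1, 0.0, 0.1],
     [0.1, 0.0, 0.1, 1.0, 0.9, 0.8],
     [0.0, 0.1, 0.0, 0.9, 1.0, 0.85],
     [0.1, 0.0, 0.1, 0.8, 0.85, 1.0]]
coords = pca_project(S, k=2)
labels = kmeans(coords, k=2)
```

On this matrix the first component separates the two blocks, so k-means places patients 0-2 in one cluster and 3-5 in the other; the 2-D `coords` are also what would be plotted for visualisation.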

  10. Discussion & Conclusion: Review & Outlook
     Review:
     • Patients with similar diagnosis codes are grouped together
     • Therefore, the semantic similarity technique works, to some degree
     • Therefore, this is a viable route to dimensionality reduction in complex healthcare data sets
     Outlook:
     • Exploring co-morbidity and co-treatment effects?
     • New biomedical hypotheses?
     • Transferability of method?
     • Population-level characterisation?
     • New data mining paradigms?

  11. Thank You!

  12. Confidentiality Notice This file is private and may contain confidential and proprietary information. If you have received this file in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorized use or disclosure of the contents of this file is not permitted and may be unlawful. AstraZeneca PLC, 2 Kingdom Street, London, W2 6BD, UK, T: +44(0)20 7604 8000, F: +44 (0)20 7604 8151, www.astrazeneca.com
