
Data Integration in eHealth: A Domain/Disease Specific Roadmap

Jenny Ure, School of Informatics, Univ. of Edinburgh.


Presentation Transcript


  1. Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

  2. …there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know.

  3. Designing for e-Health: Recurring Scenarios in Developing Grid-based Medical Imaging Systems
     John Geddes (a), Clare Mackay (a), Sharon Lloyd (b), Andrew Simpson (b), David Power (b), Douglas Russell (b), Marina Jirotka (b), Mila Katzarova (b), Martin Rossor (c), Nick Fox (c), Jonathon Fletcher (c), Derek Hill (d), Kate McLeish (d), Yu Chen (d), Joseph V Hajnal (e), Stephen Lawrie (f), Dominic Job (f), Andrew McIntosh (f), Joanna Wardlaw (g), Peter Sandercock (g), Jeb Palmer (g), Dave Perry (g), Rob Procter (h), Jenny Ure (h) [1], Mark Hartswood (h), Roger Slack (h), Alex Voss (h), Kate Ho (h), Philip Bath (i), Wim Clarke (i), Graham Watson (i)
     (a) Department of Psychiatry, University of Oxford; (b) Computing Laboratory, University of Oxford; (c) Institute of Neurology, University College London; (d) Centre for Medical Image Computing (MedIC), University College London; (e) Imaging Sciences Department, Imperial College London; (f) Department of Psychiatry, University of Edinburgh; (g) Department of Clinical NeuroSciences, University of Edinburgh; (h) School of Informatics, University of Edinburgh; (i) Institute of Neuroscience, University of Nottingham
     [1] Corresponding Author: Jenny Ure, School of Informatics, University of Edinburgh, Jenny.Ure@ed.ac.uk
     The concept of the collaboratory is central to the e-Science vision, yet there has been limited concern with the generation of the community and coordination infrastructures which will coordinate and sustain it.
     Introduction
     • NeuroGRID (www.neurogrid.ac.uk) is a three-year, £2.1M project funded through the UK Medical Research Council to:
     • develop a Grid-based research environment to facilitate the sharing of MR and CT scans of the brain and clinical patient data in the diagnosis of psychoses, dementia and stroke
     • bring together clinicians, researchers and e-scientists at Oxford, Edinburgh, Nottingham and London
     • create a toolset for image registration, analysis, normalisation, anonymisation, real-time acquisition and error trapping
     • ensure rapid, reliable and secure access, authentication and data sharing
     Data Quality Issues: The Social Life of Information
     • Challenge: The large-scale aggregation of diverse datasets offers both potential benefits and risks, particularly if the outputs are to be used with patients in a clinical context. Aggregating data is thus a key issue for e-Health, yet data is not independent of the context in which it is generated. Within small communities of practice, a degree of shared and updated knowledge and experience allows judicious use of resources whose provenance is known and whose weaknesses are often already transparent. The same is not true of aggregated data from multiple sources.
     • Approach: Early use of prototypes to provide a ‘sandpit’ for promoting both technical and inter-community dialogue and engagement, and to start the process of identifying, sharing and updating knowledge of emerging issues. Early trials with known datasets aim to generate an awareness of the types of variance that can arise and the ways in which it might be minimized, harmonized, or made transparent to users.
     Socio-technical Issues: Aligning Technical and Human Systems
     • Challenge: Integrating the technical work of system building with the socio-political work of generating the governance of the new risks and opportunities these systems create.
     • Approach: The creation of real and virtual ‘shared spaces’ (e.g. via Access Grid) and the use of an early prototype for engagement in areas of shared professional concern, to help this new hybrid community develop its own rules of engagement and start making collective sense of local requirements in relation to common project goals.
     Semantic Issues: Il nome della rosa
     • Challenge: Multi-site studies raise issues such as different naming conventions for files, different coding and classification systems, different protocols, and different conceptualisations of domains.
     • Approach: The project agreed on core and node-specific metadata and will use an OWL-based ontology (logic-based domain map) to allow human and machine-readable searching and basic reasoning across the datasets. There is a trade-off here between the benefits of share-ability and automated reasoning on the one hand, and the formalisation of concepts and relationships that are still evolving on the other.
     • Challenge: Aligning and representing datasets at different levels of granularity. While NeuroGrid uses MR and CT scans, other relevant datasets such as diffusion tensor imaging, genetic and proteomic datasets also contribute to an understanding of neurological processes.
     • Approach: The project is adopting a two-pronged approach:
     • developing task-specific ontologies
     • developing a reference ontology based on the Foundational Model of Anatomy adopted by the BIRN Human Brain Project.
     • This allows a degree of alignment between datasets and ontologies in future collaborations.
     Imaging Issues: Artefact or Actuality?
     Researchers use innovative imaging techniques to detect features that can refine a diagnosis, classify cases, track normal or often subtle physiological changes over time and improve understanding of the structural correlates of clinical features. Variance is attributable to a complex variety of procedures involved in image acquisition, transfer and storage, and it is crucial, but difficult, for true disease-related effects to be separated from those which are artifacts of the process.
     • Real or artefactual differences?
     • Different scanners
     • Different populations
     • Different raters
     • Different centres
     • Different protocols
     Conclusions
     • In organic communities, the processes of structuring collaboration, coordination and control structures happen as a matter of course. NeuroGrid is employing an early prototype to generate engagement and dialogue, to enable early discussion of requirements for more complex services, compute capability and workflows, as well as data quality and configurational issues.
     • In addition to ameliorating the recurring issue of requirements ‘creep’ late in the design process, it allows disparate groups to engage with the real issues and possible solutions in a shared context.
     Acknowledgements
     The authors would like to acknowledge the support of the UK Medical Research Council (Grant Ref no: GO600623 ID number 77729), the UK e-Science programme and the NeuroGrid Consortium.
     For further information
     For information on this and related projects contact Jenny.Ure@ed.ac.uk or go to www.neurogrid.ac.uk
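
The semantic issues on this poster (different naming conventions and coding systems across sites, with agreed core and node-specific metadata) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the site names are borrowed from the project, but the field names, codes and the function `to_core_record` are hypothetical and are not NeuroGrid's actual metadata scheme.

```python
# Hypothetical per-site field names; none of these come from NeuroGrid itself.
SITE_FIELD_MAP = {
    "edinburgh": {"pt_id": "patient_id", "dx": "diagnosis", "scan_type": "modality"},
    "oxford": {"PatientID": "patient_id", "Diagnosis": "diagnosis", "Modality": "modality"},
}

# Hypothetical per-site code tables, keyed by the core field they apply to.
SITE_CODE_MAP = {
    "edinburgh": {"diagnosis": {"SCZ": "schizophrenia", "DEM": "dementia"}},
    "oxford": {"diagnosis": {"1": "schizophrenia", "2": "dementia"}},
}


def to_core_record(site, raw):
    """Translate one site-local record into the shared core vocabulary."""
    field_map = SITE_FIELD_MAP[site]
    code_map = SITE_CODE_MAP.get(site, {})
    core = {"site": site}
    node_specific = {}
    for field, value in raw.items():
        if field in field_map:
            name = field_map[field]
            # Recode the value only if this site defines a code table for the field.
            core[name] = code_map.get(name, {}).get(value, value)
        else:
            # Keep unmapped fields as node-specific metadata instead of dropping them.
            node_specific[field] = value
    core["node_specific"] = node_specific
    return core


records = [
    to_core_record("edinburgh", {"pt_id": "E001", "dx": "SCZ", "scan_type": "MR", "rater": "A"}),
    to_core_record("oxford", {"PatientID": "O17", "Diagnosis": "1", "Modality": "MR"}),
]
```

Keeping unmapped fields under `node_specific` is one way to preserve local context while still allowing searches over the core fields.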

  4. Recurring problem: solution scenarios at different stages • Stages: 1. sampling, 2. collecting, 3. coding, 4. cleaning, 5. linkage, 6. analysis, 7. use • Diagram labels: the human process; the technical process

  5. Seven HealthGrid projects integrating data across sites and scales in schizophrenia • PsyGrid • NeuroGrid • BIRN • NeuroBase • CARMEN • DGEMap • EMAGE • P3G Observatory HealthGrid Share OntoGrid, Sealife, CARO https://wikis.ac.uk/mod/Main_Page

  6. use schizophrenia as a domain-specific test case to map problem/solution scenarios in data integration • across sites • across scales • across Grids in the same domain

  7. Range of Grid projects in the same domain

  8. CARMEN (http://www.carmen.org.uk) • create a grid-enabled, real-time ‘virtual laboratory’ environment for neuro-physiological data • develop an extensible ‘toolkit’ for data extraction, analysis and modelling • provide a repository for data archiving, sharing, integration, discovery • achieve wide community and commercial engagement in development and use [Figure labels: neurone 1, neurone 2, neurone 3]

  9. Designing for e-Health: Recurring Scenarios in Developing Grid-based Medical Imaging Systems (the poster from slide 3, shown again)

  10. Data integration • across sites (horizontal) • across scales (vertical; think Google Earth)

  11. Using OWL-based ontologies

  12. The MRC, the Wellcome Trust, the eScience Centre, the JISC, the DTI and the BBSRC report on data sharing in the life sciences http://www.mrc.ac.uk/pdf-jdss_final_report.pdf • cost of unshared data - re-use • need to plan this into major projects such as Grid-enabled eScience and eHealth

  13. Recurring problem: solution scenarios at different stages • Stages: 1. sampling, 2. collecting, 3. coding, 4. cleaning, 5. linkage, 6. analysis, 7. use • Diagram labels: the human process; the technical process

  14. Sampling, collecting and coding scenarios • Different populations • Different collection protocols • Different contexts and criteria for collection • NeuroGrid • P3G Observatory

  15. Coding? Format?

  16. 30% Collection Errors? • Missing helpful data, i.e. data that was almost certainly known but was not filled in • Incomplete data, e.g. the patient ID being specified but not the issuer of the patient ID • Incorrect data, e.g. the patient's name being entered as "brain" • Incorrectly formatted data • Data in the wrong field, e.g. the series being described as "knee" • Inconsistent data within a single file, e.g. the patient's age being inconsistent with image date minus birth date.
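
Several of the error types listed on this slide can be caught by simple error-trapping checks at collection time. The following is a minimal sketch under stated assumptions: the header is a plain dictionary, the keys (`patient_id`, `issuer_of_patient_id`, `patient_name`, `birth_date`, `image_date`, `age_years`) are hypothetical names rather than real DICOM keywords, and the body-part word list is illustrative only.

```python
from datetime import date

# Words that are plausible body parts but implausible patient names (illustrative list).
BODY_PART_WORDS = {"brain", "knee", "head", "spine"}


def check_header(h):
    """Return a list of problems found in one image-header dictionary."""
    problems = []
    # Incomplete data: an ID without the organisation that issued it.
    if h.get("patient_id") and not h.get("issuer_of_patient_id"):
        problems.append("incomplete: patient ID without issuer")
    # Missing or incorrect data in the name field.
    name = (h.get("patient_name") or "").strip().lower()
    if not name:
        problems.append("missing: patient name")
    elif name in BODY_PART_WORDS:
        problems.append("incorrect: patient name looks like a body part")
    # Inconsistent data: stated age versus image date minus birth date.
    birth, scanned, age = h.get("birth_date"), h.get("image_date"), h.get("age_years")
    if birth and scanned and age is not None:
        had_birthday = (scanned.month, scanned.day) >= (birth.month, birth.day)
        expected = scanned.year - birth.year - (0 if had_birthday else 1)
        if abs(expected - age) > 1:
            problems.append(f"inconsistent: stated age {age}, dates imply {expected}")
    return problems


header = {
    "patient_id": "12345",
    "issuer_of_patient_id": "",
    "patient_name": "brain",
    "birth_date": date(1950, 6, 1),
    "image_date": date(2006, 3, 15),
    "age_years": 40,
}
problems = check_header(header)
```

Checks like these only trap errors that can be stated as rules; the slide's first category, data that was known but never filled in, still depends on people and process.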

  17. So what about data cleaning? • A 46:36 waist/hip ratio reading: is it an input error or just a typical sample from West Lothian?
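
One way to read this slide is that a cleaning rule should flag a surprising value for review against its local context, not silently correct or delete it. A rough sketch of that idea follows; the hard limits, the z-score cutoff and both sets of site readings are made-up illustrative values, not clinical thresholds or real data.

```python
from statistics import mean, stdev


def review_flag(value, site_values, hard_limits=(0.5, 1.5), z_cutoff=3.0):
    """Classify a reading as 'reject', 'review' or 'accept'.

    hard_limits and z_cutoff are illustrative assumptions, not clinical thresholds.
    """
    low, high = hard_limits
    if not (low <= value <= high):
        return "reject"  # outside any plausible range: likely an input error
    # Compare the reading with the distribution seen at the same site.
    z = (value - mean(site_values)) / stdev(site_values)
    return "review" if abs(z) > z_cutoff else "accept"


ratio = 46 / 36  # the waist/hip reading from the slide

# Hypothetical earlier readings from two different populations.
site_a = [0.85, 0.90, 0.95, 0.88, 0.92, 0.91, 0.87]
site_b = [1.05, 1.20, 1.30, 1.15, 1.25, 1.10, 1.35]

flag_a = review_flag(ratio, site_a)
flag_b = review_flag(ratio, site_b)
```

With these invented numbers the same reading is sent for review at one site and accepted at the other, which is the point of the slide: the data cannot be cleaned without knowing where it came from.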

  18. Strategies such as.. • Wireless notepads for data collection • Provenance metadata • Links to original data • Local QA/ethics/linkage committees • Error trawls and spot checks combined with error-trapping software

  19. What about shared protocols? • Trace a line around the region of interest in all subjects • Compare differences in area across control and experimental groups
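
The two protocol steps on this slide can be sketched in a few lines, assuming each traced outline is stored as a list of (x, y) points. The shoelace formula used for the area is a standard technique chosen here for illustration; the outlines are invented and the slide does not say how areas were actually computed.

```python
from statistics import mean


def polygon_area(points):
    """Area enclosed by a traced outline, computed with the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the outline
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Invented outlines (simple rectangles) standing in for traced regions of interest.
controls = [[(0, 0), (4, 0), (4, 3), (0, 3)], [(0, 0), (5, 0), (5, 2), (0, 2)]]
patients = [[(0, 0), (3, 0), (3, 3), (0, 3)], [(0, 0), (4, 0), (4, 2), (0, 2)]]

control_mean = mean(polygon_area(p) for p in controls)
patient_mean = mean(polygon_area(p) for p in patients)
difference = control_mean - patient_mean
```

Even with a shared formula, the result still depends on where each rater draws the line, which is why the protocol itself has to be shared.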

  20. Harmonising different tools and platforms • Microarray • In situ hybridisation • Scanners

  21. Different Disease Effects or Different Scanners? Harmonisation strategies? (Adapted from Keator et al (2006), presentation to the UK-BIRN workshop)

  22. Effect or Artefact? • Different equipment • Different populations • Different raters • Different contexts • Different protocols • Different coding • Different metadata
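
A toy example shows how a scanner difference can masquerade as a disease effect when groups are unevenly spread across scanners. The measurements below are invented, and comparing pooled with within-scanner group means is only one simple diagnostic, not a harmonisation method taken from the slides.

```python
from collections import defaultdict
from statistics import mean

# (scanner, group, measurement) with made-up values: scanner_B reads about 2 units higher.
scans = [
    ("scanner_A", "control", 10.0), ("scanner_A", "control", 10.2),
    ("scanner_A", "patient", 10.1),
    ("scanner_B", "control", 12.0),
    ("scanner_B", "patient", 12.1), ("scanner_B", "patient", 11.9),
]


def group_means(rows, key):
    """Mean measurement for each value of key(row)."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[key(row)].append(row[2])
    return {k: mean(v) for k, v in buckets.items()}


# Pooled comparison ignores which scanner produced each image.
pooled = group_means(scans, key=lambda r: r[1])
pooled_diff = pooled["patient"] - pooled["control"]

# Within-scanner comparison holds the equipment constant.
by_scanner = group_means(scans, key=lambda r: (r[0], r[1]))
within_diffs = {
    s: by_scanner[(s, "patient")] - by_scanner[(s, "control")]
    for s in ("scanner_A", "scanner_B")
}
```

Here the pooled patient-control difference is clearly positive while the difference within each scanner is essentially zero, so the apparent effect is an artefact of which scanner saw which group.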

  23. P3G – Harmonisation across multiple national biobanks • Different study designs • Different tools • Different populations • Different formats

  24. Different study designs and procedures

  25. Different Questionnaires

  26. …there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know.

  27. Ethical and Legal Issues in Data Linkage • New technical infrastructures can outstrip the development of social governance structures • Stages: 1. sampling, 2. collecting, 3. coding, 4. cleaning, 5. linkage, 6. analysis, 7. use • Diagram labels: the human process; the technical process

  28. Data linkage enhances the potential for knowledge discovery • in relation to disease • in relation to patients and their families • DeCODE • different technical solutions distribute cost, risk and benefit in different ways

  29. Options - Role Based Access • The de facto standard • Persistent linked datasets more likely • Getting access is easier • Monitoring misuse is hard
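
In its simplest form, role-based access is a lookup from role to permitted actions. The sketch below assumes hypothetical roles and action names and adds a basic audit trail, since the slide notes that monitoring misuse is the hard part; it is not a description of any particular Grid security stack.

```python
# Hypothetical roles and actions, for illustration only.
ROLE_PERMISSIONS = {
    "clinician": {"read_clinical", "read_images"},
    "researcher": {"read_images"},
    "curator": {"read_images", "link_datasets"},
}

audit_log = []


def is_allowed(user, role, action):
    """Check an action against the role's permissions and record the attempt."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    # Every attempt is logged, whether or not it was permitted.
    audit_log.append((user, role, action, allowed))
    return allowed


ok = is_allowed("alice", "researcher", "read_images")
denied = is_allowed("alice", "researcher", "link_datasets")
```

The check itself is easy, and the log only records what was attempted. Neither reveals whether a permitted action was appropriate, which is where the additional layers on the following slides come in.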

  30. An additional layer? • Checking for risks arising from linkage between particular datasets
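
One possible automated form of such a check is to count how many people share each combination of potentially identifying attributes before and after two datasets are linked. This is a k-anonymity-style sketch chosen for illustration, with invented records and attribute names; the slides do not specify which technique the additional layer would use.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k=2):
    """Return attribute combinations shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Two invented datasets keyed by the same pseudonymous subject ID.
clinical = {
    "p1": {"age_band": "40-49", "diagnosis": "schizophrenia"},
    "p2": {"age_band": "40-49", "diagnosis": "schizophrenia"},
    "p3": {"age_band": "40-49", "diagnosis": "schizophrenia"},
}
geographic = {
    "p1": {"postcode_area": "EH"},
    "p2": {"postcode_area": "EH"},
    "p3": {"postcode_area": "KW"},
}

# Link the datasets on subject ID.
linked = [{**clinical[pid], **geographic[pid]} for pid in clinical if pid in geographic]

before = small_groups(list(clinical.values()), ["age_band", "diagnosis"])
after = small_groups(linked, ["age_band", "diagnosis", "postcode_area"])
```

In this example no one is singled out in the clinical data alone, but one subject becomes unique once postcode area is linked in, so the risk arises from the linkage and not from either dataset by itself.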

  31. An additional human layer • Linkage assessment panel • Also combines ethical and quality roles • Existing roles and responsibilities support effective intervention to enhance security and quality

  32. Information security is about the negotiation and agreement of cost, risks and benefits, not about technology. Ross Anderson (2001) Why Information Security is hard. ACSAC23 http://www.cl.cam.ac.uk/~rja14/Papers/econ.pdf

  33. Many of the biggest risks, costs and delays are socio-political (e.g. ethical consent) and socio-technical (e.g. usability) • NHS Connect • Forum for renegotiating rights, risks and responsibilities

  34. Ethical and Legal issues - anonymising images as well as patient data • A face reconstructed from raw anatomical data might be used to identify the subject [Figure panels: Raw; Skull Stripped]

  35. Integrating Data for Human and Machine Use • Stages: 1. sampling, 2. collecting, 3. coding, 4. cleaning, 5. linkage, 6. analysis, 7. use • Diagram labels: the human process; the technical process

  36. Issues in data integration • across sites (horizontal) • across scales (vertical)

  37. Across time-scales • DGEMap www.dgemap.org • HDBR http://www.hdbr.org • EMAGE http://genex.hgu.mrc.ac.uk

  38. Shared frames of reference for imaging data • Shared anatomical ‘map’ reference points • Somewhere to hang distributed data BIRN www.fbirn.net

  39. How to agree a common spatio-temporal infrastructure for sharing data? [Figure: Sites 1, 2 and 3, each with organs, tissues and cells, across Stages 1, 2 and 3]

  40. Bridging scales and context with ontologies [Figure: Species, Genes, Protein, Function and Disease, linked as: gene in species; protein coded by gene in species; function of protein coded by gene in species; disease caused by abnormality in function of protein coded by gene in species]
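
The chain on this slide (gene in species, protein coded by gene, function of protein, disease caused by abnormality in function) can be represented as subject-relation-object triples and followed step by step. The sketch below uses plain Python tuples with placeholder entity and relation names; it is not OWL and does no logical reasoning, it only shows how composed relations bridge scales.

```python
# Placeholder entities and relation names; not taken from any real ontology.
TRIPLES = [
    ("SpeciesS", "has_gene", "GeneX"),
    ("GeneX", "codes_for", "ProteinX"),
    ("ProteinX", "has_function", "FunctionF"),
    ("FunctionF", "abnormality_causes", "DiseaseD"),
]


def follow(start, relations, triples=TRIPLES):
    """Follow a chain of relations from start and return the entities reached."""
    frontier = {start}
    for rel in relations:
        # Step across every triple whose subject is in the current frontier.
        frontier = {o for (s, p, o) in triples if p == rel and s in frontier}
    return frontier


# From the gene scale up to the disease scale in three steps.
diseases = follow("GeneX", ["codes_for", "has_function", "abnormality_causes"])
proteins = follow("SpeciesS", ["has_gene", "codes_for"])
```

An OWL-based ontology would express the same links as class and property axioms, which allows a reasoner to infer such chains instead of hard-coding the path.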

  41. Logic-based Ontologies: Conceptual Lego “Hand which is anatomically normal”

  42. So technical infrastructure needs community infrastructure to define: • Shared spaces • Shared frames of reference • Shared tools • Shared naming conventions • Shared ethical and legal conventions • Shared costs and risks

  43. Current examples of data curation communities, such as Wikipedia, can • achieve shared aims faster (re-use) • create de facto standards • cut cost and risk • achieve critical mass for funding

  44. Open Source projects in eHealth such as http://www.nbirn.net/ www.prg.org

  45. https://wikis.nesc.ac.uk/mod/Main_Page

  46. eHealth Technology interleaves.. • stable, standardised, scalable computing infrastructures • diverse, dynamic, distributed human infrastructures • synergies are possible

  47. Recurring problem scenarios • at different stages • at different interfaces

  48. Aligning human and technical processes • Alignment adds value, cuts cost, cuts risk (e.g. Napster, eBay, Wikipedia) • Misalignment adds costs and risks (e.g. Challenger, DeCODE, MOD Defence procurement Initiative, UK eUniversity)

  49. Frankly my dear..
