Data Interoperability - the vision of seamless data-sharing in Health Informatics



Jenny Ure

School of Informatics

Univ. of Edinburgh

Health Informatics Resource: Data Interoperability


Health informatics increasingly relates to data-sharing across sites, scales and formats

[Figure: data sharing across sites, scales and formats; surviving labels include "Wrapper" and "Genes". Adapted from www.fbirn.net]


Data interoperability allows:

  • pooling of data across sites and scales for knowledge discovery (think Google Earth)

  • faster turnaround times in translational research from lab bench to bedside

  • more re-use of research outcomes and less fragmentation and duplication of work in the same disease domain


This can involve huge datasets

[Figure: the range of data types involved, with counts running from hundreds of thousands through millions to billions of entries. Labelled categories: DNA sequences (alignments); proteins (sequence, 2º, 3º and 4º structure); protein-protein interactions (metabolism, pathways, receptor-ligand); ESTs (expression patterns, large-scale screens); genetics and maps (linkage, cytogenetic, clone-based); polymorphisms and variants (genetic variants, individual patients, epidemiology); physiology, cellular biology, biochemistry, neurobiology, endocrinology, etc.]


For example, linking genetic factors to clinical or scan data.

But everyone involved has to agree on how to name, code and format datasets, and on how they relate to a disease!


Bridges Project

www.brc.dcs.gla.ac.uk/projects/bridges/



Or large-scale computational analysis

http://www.clinical-escience.org/



However, combining different datasets may help create a bigger picture... but it may be the wrong one


The social life of information

  • As in the expression of genetic information, health information is also shaped by factors in the local environment

Seely Brown and Duguid (2000), ‘The Social Life of Information’, Harvard Business School Press


Socio-technical shaping factors


Recurring problem scenarios at different stages

the human process

1. sampling
2. collecting
3. coding
4. cleaning
5. linkage
6. analysis
7. use

the technical process


Harmonisation across multiple national biobanks, such as P3G, identified issues in:

  • Different populations

  • Different environments

  • Different study designs

  • Different tools

  • Different formats



Different questionnaires


Coding? Format?


30% collection errors?

  • Missing helpful data, i.e. data that was almost certainly known but was not filled in

  • Incomplete data, e.g. the patient ID being specified but not the issuer of the patient ID

  • Incorrect data, e.g. the patient's name being entered as "brain"

  • Incorrectly formatted data, e.g. a patient name being specified so that the surname is "CameronDavid"

  • Data in the wrong field, e.g. the series being described as "knee"

  • Inconsistent data within a single file, e.g. the patient's age being inconsistent with image date minus birth date
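Several of the error types listed above can be trapped automatically at the point of entry. The following is a minimal Python sketch under stated assumptions, not the software used in any of the projects mentioned: the record layout, the field names (patient_id, patient_id_issuer, patient_name, birth_date, image_date, age_years), the word list and the one-year tolerance are all hypothetical.

```python
from datetime import date

# Hypothetical list of words that describe body parts rather than people.
BODY_PART_WORDS = {"brain", "knee", "head", "spine"}


def check_record(rec):
    """Return a list of problems found in one record (a plain dict).

    All field names are assumptions made for this illustration.
    """
    problems = []

    # Incomplete data: an ID is given but its issuer is not.
    if rec.get("patient_id") and not rec.get("patient_id_issuer"):
        problems.append("patient ID has no issuer")

    name = (rec.get("patient_name") or "").strip()
    if not name:
        problems.append("patient name missing")
    elif name.lower() in BODY_PART_WORDS:
        # Incorrect data, or data entered in the wrong field.
        problems.append("patient name looks like a body part")
    elif " " not in name and "," not in name and "^" not in name:
        # Incorrectly formatted data: name parts run together.
        problems.append("patient name has no separator between parts")

    # Inconsistent data: stated age versus image date minus birth date.
    birth = rec.get("birth_date")
    image = rec.get("image_date")
    age = rec.get("age_years")
    if birth and image and age is not None:
        had_birthday = (image.month, image.day) >= (birth.month, birth.day)
        derived = image.year - birth.year - (0 if had_birthday else 1)
        if abs(derived - age) > 1:  # tolerance of one year is an assumption
            problems.append(f"stated age {age} but dates imply {derived}")

    return problems


example = {
    "patient_id": "12345",
    "patient_name": "CameronDavid",
    "birth_date": date(1960, 1, 1),
    "image_date": date(2006, 6, 1),
    "age_years": 30,
}
for problem in check_record(example):
    print(problem)
```

Checks like these only catch what someone thought to look for, so they complement rather than replace manual error trawls and spot checks.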


So what about data cleaning?

  • A 46:36 waist/hip ratio reading – is it an input error or just a sample from West Lothian?


Other strategies

  • Wireless notepads for data collection

  • Provenance metadata

  • Links to original data

  • Local QA/ethics/linkage committees

  • Error trawls and spot checks combined with error-trapping software
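One way to combine provenance metadata, links to original data and error-trapping software is to flag a suspicious value for local review instead of correcting or deleting it, since an unusual reading may be genuine. The sketch below is illustrative only: the Measurement class, its fields, the source string and the plausible range of 0.6 to 1.2 are assumptions, not clinical thresholds or any project's schema.

```python
from dataclasses import dataclass, field


@dataclass
class Measurement:
    subject: str   # hypothetical subject identifier
    name: str      # e.g. "waist_hip_ratio"
    value: float
    source: str    # link or path back to the original record
    flags: list = field(default_factory=list)


def flag_out_of_range(m, low, high, checker):
    """Flag, but never alter or delete, a value outside [low, high]."""
    if not (low <= m.value <= high):
        m.flags.append({
            "check": f"{m.name} outside {low}-{high}",
            "checked_by": checker,
            "action": "refer to local QA",
        })
    return m


# A 46:36 waist/hip reading: kept as entered, flagged for a human to judge.
reading = Measurement("S001", "waist_hip_ratio", 46 / 36, "site-A/form-17")
flag_out_of_range(reading, 0.6, 1.2, "range-check")
print(reading.flags)
```

Because the original value and its source are preserved, a local QA committee can later decide whether the reading was an input error or a real observation.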


The myth of shared protocols

  • Trace a line around the region of interest in all subjects

  • Compare differences in area across control and experimental groups


Harmonising different tools and platforms

  • Microarray

  • In situ hybridisation

  • Scanners


Different disease effects or different scanners? Harmonisation strategies?

Adapted from Keator et al. (2006), presentation to the UK-BIRN workshop


Effect or artefact?

  • Different equipment

  • Different populations

  • Different raters

  • Different contexts

  • Different protocols

  • Different coding

  • Different metadata
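A toy calculation shows how a difference in equipment can masquerade as a disease effect when groups are unevenly spread across sites. The numbers below are invented purely for illustration and are not results from any study; the field names and the two scanner labels are likewise assumptions.

```python
from collections import defaultdict
from statistics import mean


def group_difference(rows):
    """Mean patient value minus mean control value over the given rows."""
    patients = [r["value"] for r in rows if r["group"] == "patient"]
    controls = [r["value"] for r in rows if r["group"] == "control"]
    return mean(patients) - mean(controls)


def difference_by_scanner(rows):
    """The same comparison, made separately within each scanner."""
    by_scanner = defaultdict(list)
    for r in rows:
        by_scanner[r["scanner"]].append(r)
    return {s: group_difference(rs) for s, rs in by_scanner.items()}


# Invented data: scanner B reads 2 units higher than scanner A for everyone,
# and most patients happen to have been scanned on B.
rows = (
    [{"scanner": "A", "group": "control", "value": 10.0}] * 3
    + [{"scanner": "A", "group": "patient", "value": 10.0}]
    + [{"scanner": "B", "group": "control", "value": 12.0}]
    + [{"scanner": "B", "group": "patient", "value": 12.0}] * 3
)

print(group_difference(rows))       # pooled comparison
print(difference_by_scanner(rows))  # comparison within each scanner
```

In this invented case the pooled comparison reports a difference between patients and controls, while the within-scanner comparisons report none: the apparent effect is entirely an artefact of the equipment. Real data are rarely this clean, which is why metadata about scanners, raters and protocols needs to travel with the data.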


Designing for e-Health: Recurring Scenarios in Developing Grid-based Medical Imaging Systems

  • Conclusions

  • In organic communities, the structuring of collaboration, coordination and control happens as a matter of course. NeuroGrid is employing an early prototype to generate engagement and dialogue and to enable early discussion of requirements for more complex services, compute capability and workflows, as well as data quality and configurational issues.

  • In addition to ameliorating the recurring issue of requirements ‘creep’ late in the design process, this allows disparate groups to engage with the real issues and possible solutions in a shared context.

  • Introduction

  • NeuroGRID (www.neurogrid.ac.uk) is a three-year, £2.1M project funded through the UK Medical Research Council to:

  • develop a Grid-based research environment to facilitate the sharing of MR and CT scans of the brain and clinical patient data in the diagnosis of psychoses, dementia and stroke

  • bring together clinicians, researchers and e-scientists at Oxford, Edinburgh, Nottingham and London

  • create a toolset for image registration, analysis, normalisation, anonymisation, real-time acquisition and error trapping

  • ensure rapid, reliable and secure access, authentication and data sharing

Data Quality Issues:

The Social Life of Information

Challenge:

The large scale aggregation of diverse datasets offers both potential benefits and risks, particularly if the outputs are to be used with patients in a clinical context. Thus aggregating data is a key issue for e-Health, yet data is not independent of the context in which it is generated. Within small communities of practice a degree of shared and updated knowledge and experience allows judicious use of resources whose provenance is known and whose weaknesses are often already transparent. The same is not true of aggregated data from multiple sources.

Approach:

Early use of prototypes to provide a ‘sandpit’ for promoting both technical and inter-community dialogue and engagement, and start the process of identifying, sharing and updating knowledge of emerging issues.

Early trials with known datasets aim to generate an awareness of the types of variance that can arise and ways in which it might be minimized, harmonized, or made transparent to users.

Socio-technical Issues

Aligning Technical and Human Systems

Challenge: Integrating the technical work of system building with the socio-political work of developing governance for the new risks and opportunities these systems generate.

Approach: The creation of real and virtual ‘shared spaces’ (e.g. via Access Grid) and the use of an early prototype for engagement in areas of shared professional concern, to help this new hybrid community develop its own rules of engagement, and start making collective sense of local requirements in relation to common project goals.

  • Semantic Issues: Il nome della rosa (The Name of the Rose)

  • Challenge:

  • Multi-site studies raise issues such as different naming conventions for files, different coding and classification systems, different protocols, and different conceptualisations of domains.

  • Approach:

  • The project agreed on core and node-specific metadata and will use an OWL-based ontology (a logic-based domain map) to allow human- and machine-readable searching and basic reasoning across the datasets. Here there is a trade-off between the benefits of shareability and automated reasoning on the one hand, and the cost of formalising concepts and relationships that are still evolving on the other.

  • Challenge:

  • Aligning and representing datasets at different levels of granularity. While NeuroGrid uses MR and CT scans, other relevant datasets, such as diffusion tensor imaging, genetic and proteomic datasets, also contribute to an understanding of neurological processes.

  • Approach:

  • The project is adopting a two-pronged approach:

  • developing task-specific ontologies

  • developing a reference ontology based on the Foundational Model of Anatomy adopted by the BIRN Human Brain Project.

  • This allows a degree of alignment between datasets and ontologies in future collaborations
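The idea of core metadata shared by all nodes, alongside node-specific labels, can be sketched as a simple translation table. This is a deliberately simplified stand-in for the OWL-based ontology described above, not an implementation of it: plain dictionaries cannot do logical reasoning, and every site name, label and core term below is hypothetical.

```python
# Hypothetical node-specific labels mapped onto agreed core terms.
SITE_TO_CORE = {
    "site_a": {"dx_scz": "schizophrenia", "MR_T1": "mri_t1"},
    "site_b": {"Schizophrenia (F20)": "schizophrenia", "t1w": "mri_t1"},
}


def to_core(site, label):
    """Translate a site-specific label to the core term, or None if unmapped."""
    return SITE_TO_CORE.get(site, {}).get(label)


def harmonise(records):
    """Split records into those mapped to core terms and those needing review."""
    mapped, unmapped = [], []
    for rec in records:
        core = to_core(rec["site"], rec["label"])
        if core is None:
            unmapped.append(rec)  # left for the community to resolve
        else:
            mapped.append({**rec, "core_term": core})
    return mapped, unmapped


records = [
    {"site": "site_a", "label": "dx_scz"},
    {"site": "site_b", "label": "Schizophrenia (F20)"},
    {"site": "site_b", "label": "psychosis NOS"},
]
mapped, unmapped = harmonise(records)
print(len(mapped), "mapped;", len(unmapped), "for review")
```

Unmapped labels are returned rather than guessed at, which reflects the trade-off noted above: concepts that are still evolving have to be negotiated between sites before they can be formalised.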

Acknowledgements

The authors would like to acknowledge the support of the UK Medical Research Council (Grant Ref no: GO600623 ID number 77729), the UK e-Science programme and the NeuroGrid Consortium.

Imaging Issues:

Artefact or Actuality?

Researchers use innovative imaging techniques to detect features that can refine a diagnosis, classify cases, track normal or often subtle physiological changes over time and improve understanding of the structural correlates of clinical features.

Variance is attributable to the complex variety of procedures involved in image acquisition, transfer and storage, and it is crucial, but difficult, to separate true disease-related effects from those which are artefacts of the process.

For further information

For information on this and related projects contact [email protected] or go to www.neurogrid.ac.uk

Authors: John Geddes (a), Clare Mackay (a), Sharon Lloyd (b), Andrew Simpson (b), David Power (b), Douglas Russell (b), Marina Jirotka (b), Mila Katzarova (b), Martin Rossor (c), Nick Fox (c), Jonathon Fletcher (c), Derek Hill (d), Kate McLeish (d), Yu Chen (d), Joseph V Hajnal (e), Stephen Lawrie (f), Dominic Job (f), Andrew McIntosh (f), Joanna Wardlaw (g), Peter Sandercock (g), Jeb Palmer (g), Dave Perry (g), Rob Procter (h), Jenny Ure (h) [1], Mark Hartswood (h), Roger Slack (h), Alex Voss (h), Kate Ho (h), Philip Bath (i), Wim Clarke (i), Graham Watson (i)

(a) Department of Psychiatry, University of Oxford; (b) Computing Laboratory, University of Oxford; (c) Institute of Neurology, University College London; (d) Centre for Medical Image Computing (MedIC), University College London; (e) Imaging Sciences Department, Imperial College London; (f) Department of Psychiatry, University of Edinburgh; (g) Department of Clinical NeuroSciences, University of Edinburgh; (h) School of Informatics, University of Edinburgh; (i) Institute of Neuroscience, University of Nottingham

[1] Corresponding Author: Jenny Ure, School of Informatics, University of Edinburgh, [email protected]

The concept of the collaboratory is central to the e-Science vision, yet there has been limited concern with the generation of the community and coordination infrastructures which will coordinate and sustain it.

  • Real or artefactual differences?

  • Different scanners

  • Different populations

  • Different raters

  • Different centres

  • Different protocols


…there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know.


Usability issues in data integration

  • across sites (horizontal)

  • across scales (vertical; think Google Earth)


Across time-scales

DGEMap: www.dgemap.org

HDBR: http://www.hdbr.org

EMAGE: http://genex.hgu.mrc.ac.uk


How to agree a common spatio-temporal infrastructure for sharing data?

[Figure: data at three scales (organs, tissues, cells) held at three sites (Site 1 to Site 3), across three stages (Stage 1 to Stage 3).]


Shared frames of reference for imaging data

  • Shared anatomical ‘map’ reference points

  • Somewhere to hang distributed data

BIRN www.fbirn.net
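A shared frame of reference can be thought of as an agreed set of coordinates against which every site registers its data. The sketch below uses the scales and stages named in the spatio-temporal figure as those coordinates; the SharedFrame class, its methods and the dataset references are hypothetical, and a real anatomical atlas would be far richer than a pair of labels.

```python
from collections import defaultdict

SCALES = ("organs", "tissues", "cells")      # scales named in the figure
STAGES = ("Stage 1", "Stage 2", "Stage 3")   # stages named in the figure


class SharedFrame:
    """Index of dataset references keyed by an agreed (scale, stage) pair."""

    def __init__(self):
        self._index = defaultdict(list)

    def register(self, site, scale, stage, reference):
        # Reject coordinates outside the agreed frame instead of guessing.
        if scale not in SCALES or stage not in STAGES:
            raise ValueError(f"not in shared frame: {scale!r}, {stage!r}")
        self._index[(scale, stage)].append((site, reference))

    def lookup(self, scale, stage):
        """Return every (site, reference) registered at this coordinate."""
        return list(self._index.get((scale, stage), []))


frame = SharedFrame()
frame.register("Site 1", "tissues", "Stage 2", "site1/dataset-17")
frame.register("Site 2", "tissues", "Stage 2", "site2/series-4")
print(frame.lookup("tissues", "Stage 2"))
```

The point of the design is that the index holds only references, so the data stay distributed at their home sites while still being discoverable through common coordinates.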



So technical infrastructure needs community infrastructure to define:

  • Shared spaces

  • Shared frames of reference

  • Shared tools

  • Shared naming conventions

  • Shared ethical and legal conventions

  • Shared costs and risks


Current examples of data curation communities, such as Wikipedia, can:

  • achieve shared aims faster through re-use

  • create de facto standards

  • cut cost and risk

  • achieve critical mass for funding


Open Source projects in eHealth such as:

http://www.nbirn.net/

www.prg.org


Identifying risks and opportunities at each stage

the human process

1. sampling
2. collecting
3. coding
4. cleaning
5. linkage
6. analysis
7. use

the technical process


The eHealth vision of seamless data sharing?

  • Still more of a vision than a reality!


[email protected]

