
Exploring Data Integration for Mouse Models of Human Disease: Centralized vs Distributed Solutions

This presentation discusses the critical need for data interoperability and integration in the context of mouse models for human disease. It evaluates centralized and distributed solutions, highlighting advantages like improved query performance and challenges such as the complexity of data deposition. Case studies including 1000 Genomes and the International Cancer Genome Consortium illustrate practical applications. Strategy recommendations for phenotype data integration through high-throughput phenotyping centers and database interoperability are also presented, emphasizing collaborative efforts within the scientific community.

Presentation Transcript


  1. Phenotype database interoperability and integration Damian Smedley, EBI Mouse models for human disease

  2. Why do we need data integration and interoperability?

  3. Centralised vs distributed solutions
     [Diagram. Panels: Centralised warehouse v1; Centralised warehouse v2; Distributed solution. Labels: Strains portal; Genomics; MGI; JaxMice; IMSR; EMMA; Ensembl; Central database; nightly data syncs; web services; IKMC projects; Phenotype/Expression; KOMP; EUCOMM; NorCOMM; TIGM; Eurexpress/GXD etc; Europhenome]

  4. Centralised solutions
     Advantages
     • Better query performance for large datasets
     • Easier to analyse raw data in one location
     Disadvantages
     • Regular data deposition is non-trivial
     • Designing a single schema to store different types of data is not simple
     • Persuading people to “give up” their data/databases/websites
     • Will still need to be made interoperable with other data sources
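
The centralised approach can be illustrated with a minimal sketch, assuming a single generic table keyed by gene identifier and a full nightly refresh. The table layout, source names, and all identifiers and values are hypothetical placeholders, not the schema of any real warehouse.

```python
import sqlite3

# Hypothetical nightly extracts from two source databases.
# Every identifier and value here is a made-up placeholder.
SOURCE_EXTRACTS = {
    "expression_source": [
        ("MGI:0000001", "expression", "placeholder expression record"),
    ],
    "phenotype_source": [
        ("MGI:0000001", "phenotype", "placeholder phenotype record"),
        ("MGI:0000002", "phenotype", "another placeholder record"),
    ],
}


def nightly_sync(conn, extracts):
    """Rebuild the central table from every source extract (full refresh)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS annotation "
        "(gene_id TEXT, source TEXT, data_type TEXT, value TEXT)"
    )
    # Drop yesterday's copy so repeated syncs do not duplicate rows.
    conn.execute("DELETE FROM annotation")
    for source, rows in extracts.items():
        conn.executemany(
            "INSERT INTO annotation VALUES (?, ?, ?, ?)",
            [(gene_id, source, data_type, value)
             for gene_id, data_type, value in rows],
        )
    conn.commit()


def query_gene(conn, gene_id):
    """Fetch everything known about one gene from the single central store."""
    cursor = conn.execute(
        "SELECT source, data_type, value FROM annotation "
        "WHERE gene_id = ? ORDER BY source",
        (gene_id,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
nightly_sync(conn, SOURCE_EXTRACTS)
print(query_gene(conn, "MGI:0000001"))
```

The query is a single local lookup, which is where the performance advantage comes from. The cost is the sync step: every source has to be mapped into the one shared table, and the copy is only as fresh as the last refresh.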

  5. Distributed solutions
     Advantages
     • Domain expertise at production site exploited
     • Different types of data easily integrated as long as they share something in common such as a gene identifier
     • No need for nightly data flow to keep data up to date
     • No need for redundant data in each database
     • Easier to persuade people to collaborate in a distributed scenario
     Disadvantages
     • Technical knowledge required to deploy the web services
     • Potential query performance problems for large datasets (may need to provide summary level data)
     • Potential problems performing analysis over all datasets
     • Problems with services going down
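
A minimal sketch of the distributed approach, assuming each resource exposes a lookup by a shared gene identifier. The three service functions below are local stand-ins for remote web services; their names, the identifiers, and the returned records are hypothetical placeholders.

```python
from typing import Callable, Dict, List

Service = Callable[[str], List[str]]


def expression_service(gene_id: str) -> List[str]:
    """Stand-in for a remote expression resource (placeholder data)."""
    data = {"MGI:0000001": ["placeholder expression record"]}
    return data.get(gene_id, [])


def phenotype_service(gene_id: str) -> List[str]:
    """Stand-in for a remote phenotype resource (placeholder data)."""
    data = {"MGI:0000001": ["placeholder phenotype record"]}
    return data.get(gene_id, [])


def unavailable_service(gene_id: str) -> List[str]:
    """Stand-in for a resource whose web service is currently down."""
    raise ConnectionError("service is down")


def federated_query(gene_id: str, services: Dict[str, Service]) -> dict:
    """Ask every service about one gene and merge answers on the shared ID."""
    results: Dict[str, List[str]] = {}
    unavailable: List[str] = []
    for name, service in services.items():
        try:
            results[name] = service(gene_id)
        except ConnectionError:
            # One failing source should not sink the whole query.
            unavailable.append(name)
    return {"gene_id": gene_id, "results": results, "unavailable": unavailable}


answer = federated_query(
    "MGI:0000001",
    {
        "expression": expression_service,
        "phenotype": phenotype_service,
        "strains": unavailable_service,
    },
)
print(answer)
```

No data is copied or synced here: each source stays authoritative, and the only agreement required is the shared identifier. The `unavailable` list makes the "services going down" problem visible to the caller rather than hiding it.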

  6. 1000 Genomes - centralisation

  7. International Cancer Genome Consortium
     France: Liver (alcohol-related), Breast (HER2+ve); UK: Breast (several subtypes); Japan: Liver (virus related); Canada: Pancreas; China: Stomach; Spain: CLL; India: Oral Cavity; Australia: Pancreas

  8. ICGC - distributed

  9. Joint Ensembl and EurExpress query

  10. IKMC portal: knockoutmouse.org
     [Diagram labels: TIGM; GXD; EUCOMM; Eurexpress; KOMP; NorCOMM; EMMA; KOMP rep; CMMR; IMSR; Europhenome; Ensembl; CREATE]

  11. IKMC interoperability strategy
     [Diagram labels: BioMart query interface(s); MGI ID (repeated as the link label); CREATE; Ensembl; GXD; EBI, UK; JAX, USA; Sanger, UK; IKMC; EURExpress; Sanger, UK; Edinburgh, UK; ES cells + lines; MGI; EMMA (UK), KOMP (USA), CMMR (Canada); Phenotype (EuroPhenome etc); JAX, USA; Harwell, UK]

  12. www.knockoutmouse.org/martsearch

  13. Europhenome: raw and summary data

  14. Possible strategy for phenotype data
     [Diagram labels: High throughput phenotyping centres; Central database; Presentation of raw results; Analysis to assign phenotypes to genes; BioMart query interface(s); MGI ID (repeated as the link label); CREATE; Ensembl; GXD; EBI, UK; JAX, USA; Sanger, UK; IKMC; EURExpress; Sanger, UK; Edinburgh, UK; ES cells + lines; MGI; EMMA (UK), KOMP (USA), CMMR (Canada); JAX, USA; High throughput phenotyping]

  15. Linking from IKMC portal Phenotype searches Phenotyping

  16. Linking from IKMC portal

  17. Mouse models for human disease

  18. Acknowledgements
     The whole CASIMIR consortium and in particular: Paul Schofield, Michael Gruenberger, Chao-Kung Chen, George Gkoutos, Ann-Marie Mallon, John Hancock: MouseFinder tool.
     MartSearch: Vivek Iyer, Darren Oakley, Bill Skarnes
     BioMart: Arek Kasprzyk, Syed Haider, Edoardo Marcora
