Phenotype database interoperability and integration

Damian Smedley, EBI. Topics: why data integration and interoperability are needed; centralised vs distributed solutions, comparing a distributed solution with centralised warehouses v1 and v2.

Presentation Transcript


  1. Phenotype database interoperability and integration. Damian Smedley, EBI. Mouse models for human disease

  2. Why do we need data integration and interoperability?

  3. Centralised vs distributed solutions. [Diagram comparing a distributed solution (resources linked by web services) with centralised warehouse v1 and v2 (a central database populated by nightly data syncs). Resource groups: strains, portal, genomics, IKMC projects, phenotype/expression. Resources shown: MGI, JaxMice, IMSR, EMMA, Ensembl, KOMP, EUCOMM, NorCOMM, TIGM, Eurexpress/GXD etc., Europhenome.]

  4. Centralised solutions
     Advantages
     • Better query performance for large datasets
     • Easier to analyse raw data in one location
     Disadvantages
     • Regular data deposition is non-trivial
     • Designing a single schema to store different types of data is not simple
     • Persuading people to “give up” their data/databases/websites
     • Will still need to be made interoperable with other data sources
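The centralised model can be illustrated with a short Python sketch. It is not the design of any real warehouse: the source names, record fields, placeholder MGI IDs and the nightly_sync and query_gene functions are all hypothetical, and each source is assumed to export records that already carry an MGI gene ID. The sync step copies every source into one store with a single schema, after which queries run locally.

```python
from collections import defaultdict

# Hypothetical nightly exports from two source databases. The MGI IDs are
# placeholders, not real accessions, and the records are toy data.
SOURCE_EXPORTS = {
    "strains": [
        {"mgi_id": "MGI:0000001", "strain": "strain-A"},
    ],
    "phenotype": [
        {"mgi_id": "MGI:0000001", "phenotype": "increased body weight"},
        {"mgi_id": "MGI:0000002", "phenotype": "abnormal gait"},
    ],
}


def nightly_sync(exports):
    """Rebuild the warehouse from scratch: one merged record per MGI ID.

    Every data type needs its own slot in the single shared schema, so adding
    a new kind of data means changing this function and the record layout.
    """
    warehouse = defaultdict(lambda: {"strains": [], "phenotypes": []})
    for rec in exports.get("strains", []):
        warehouse[rec["mgi_id"]]["strains"].append(rec["strain"])
    for rec in exports.get("phenotype", []):
        warehouse[rec["mgi_id"]]["phenotypes"].append(rec["phenotype"])
    return dict(warehouse)


def query_gene(warehouse, mgi_id):
    """All data are held locally, so a query is one lookup with no network calls."""
    return warehouse.get(mgi_id)


warehouse = nightly_sync(SOURCE_EXPORTS)
print(query_gene(warehouse, "MGI:0000001"))
# {'strains': ['strain-A'], 'phenotypes': ['increased body weight']}
```

The hard-coded per-type slots are deliberate: they show why a single schema for different data types is a maintenance cost, and why every source has to keep depositing data for the warehouse to stay current.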

  5. Distributed solutions
     Advantages
     • Domain expertise at production site exploited
     • Different types of data easily integrated as long as they share something in common, such as a gene identifier
     • No need for nightly data flow to keep data up to date
     • No need for redundant data in each database
     • Easier to persuade people to collaborate in a distributed scenario
     Disadvantages
     • Technical knowledge required to deploy the web services
     • Potential query performance problems for large datasets (may need to provide summary level data)
     • Potential problems performing analysis over all datasets
     • Problems with services going down
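A distributed query can be sketched the same way. In this minimal Python illustration, plain functions stand in for remote web services (real services would be reached over the network), and the service names, toy data, placeholder MGI IDs and the federated_query function are hypothetical. Each source is queried at request time and the answers are merged on the shared gene ID; a service that is down is reported rather than failing the whole query.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for web services, each kept at its production site.
def strain_service(mgi_id):
    data = {"MGI:0000001": ["strain-A"]}
    return data.get(mgi_id, [])


def expression_service(mgi_id):
    data = {"MGI:0000001": ["brain", "liver"]}
    return data.get(mgi_id, [])


def phenotype_service(mgi_id):
    # Simulates a service that is currently unavailable.
    raise ConnectionError("phenotype service unavailable")


SERVICES = {
    "strains": strain_service,
    "expression": expression_service,
    "phenotypes": phenotype_service,
}


def federated_query(mgi_id, services=SERVICES):
    """Query every service in parallel for one gene and merge what comes back."""
    results, unavailable = {}, []
    with ThreadPoolExecutor(max_workers=len(services)) as pool:
        futures = {name: pool.submit(fn, mgi_id) for name, fn in services.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=5)
            except Exception:
                # Errors and timeouts degrade the answer instead of breaking it.
                unavailable.append(name)
    return results, unavailable


results, unavailable = federated_query("MGI:0000001")
print(results)      # {'strains': ['strain-A'], 'expression': ['brain', 'liver']}
print(unavailable)  # ['phenotypes']
```

No data is copied between sites here, which is the point of the distributed model; the trade-off is that every query depends on each service being up and responsive.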

  6. 1000 Genomes - centralisation

  7. International Cancer Genome Consortium
     France: Liver (alcohol-related); Breast (HER2+ve)
     UK: Breast (several subtypes)
     Japan: Liver (virus-related)
     Canada: Pancreas
     China: Stomach
     Spain: CLL
     India: Oral cavity
     Australia: Pancreas

  8. ICGC - distributed

  9. Joint Ensembl and Eurexpress query

  10. IKMC portal: knockoutmouse.org. [Diagram of linked resources: TIGM, GXD, EUCOMM, Eurexpress, KOMP, NorCOMM, EMMA, KOMP repository, CMMR, IMSR, Europhenome, Ensembl, CREATE.]

  11. IKMC interoperability strategy. [Diagram: resources at different sites (EBI, UK; JAX, USA; Sanger, UK; Edinburgh, UK; Harwell, UK), each linked by MGI ID to BioMart query interface(s): CREATE, Ensembl, GXD, IKMC, Eurexpress, MGI, ES cells + lines (EMMA (UK), KOMP (USA), CMMR (Canada)) and phenotype (EuroPhenome etc.).]
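The strategy on this slide rests on one idea: every resource carries the MGI gene ID, so records held at different sites can be combined on that key. The Python sketch below shows only that joining step. It is not BioMart or its API; the tables, their toy contents, the placeholder MGI IDs and the join_on_mgi_id function are hypothetical.

```python
# Hypothetical per-resource tables held at different sites. The only field
# they share is the MGI gene ID; IDs and values are placeholders.
ES_CELLS = {"MGI:0000001": "EUCOMM", "MGI:0000002": "KOMP"}
EXPRESSION = {"MGI:0000001": ["brain"], "MGI:0000003": ["heart"]}
PHENOTYPES = {"MGI:0000001": ["abnormal gait"], "MGI:0000002": ["decreased body weight"]}


def join_on_mgi_id(**resources):
    """Inner join: keep genes present in every resource, one field per resource."""
    if not resources:
        return {}
    shared = set.intersection(*(set(table) for table in resources.values()))
    return {
        mgi_id: {name: table[mgi_id] for name, table in resources.items()}
        for mgi_id in sorted(shared)
    }


combined = join_on_mgi_id(es_cells=ES_CELLS, expression=EXPRESSION, phenotypes=PHENOTYPES)
print(combined)
# {'MGI:0000001': {'es_cells': 'EUCOMM', 'expression': ['brain'], 'phenotypes': ['abnormal gait']}}
```

An inner join answers questions such as "which genes have ES cells, an expression pattern and a phenotype?"; genes missing from any one resource (here MGI:0000002 and MGI:0000003) drop out. No resource has to change its own schema, only agree on the identifier.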

  12. www.knockoutmouse.org/martsearch

  13. Europhenome: raw and summary data

  14. Possible strategy for phenotype data. [Diagram: resources linked by MGI ID through BioMart query interface(s): CREATE, Ensembl, GXD, IKMC, Eurexpress, MGI, ES cells + lines (EMMA (UK), KOMP (USA), CMMR (Canada)); sites: EBI, UK; JAX, USA; Sanger, UK; Edinburgh, UK. Added for phenotype data: high-throughput phenotyping centres and a central database, used for presentation of raw results and analysis to assign phenotypes to genes.]

  15. Linking from IKMC portal. [Screenshot labels: Phenotype searches; Phenotyping.]

  16. Linking from IKMC portal

  17. Mouse models for human disease

  18. Acknowledgements
     The whole CASIMIR consortium and in particular:
     Paul Schofield, Michael Gruenberger, Chao-Kung Chen, George Gkoutos, Ann-Marie Mallon, John Hancock: MouseFinder tool
     MartSearch: Vivek Iyer, Darren Oakley, Bill Skarnes
     BioMart: Arek Kasprzyk, Syed Haider, Edoardo Marcora