
Future directions in genetic epidemiology, impact on IT and Data requirements Nic Timpson






Presentation Transcript


  1. Future directions in genetic epidemiology, impact on IT and Data requirements. Nic Timpson, MRC CAiTE Centre, Department of Social Medicine

  2. “Step change” Larsen

  3. Why? • Technology • Paradigm shift • Genomic properties EUCCONET Data Management Workshop

  4. Clinical meaning ??????? Raw data

  5. Two of the driving technologies: • Chip-based genotyping • Next Generation Sequencing (NGS)

  6. Basic flat Illumina output…

  7. Derivation of flat file data from image-based intensity reads

  8. HAPMAP - Illumina - Affymetrix. Files shown: CHR16_HAPMAP.recode.ped, CHR16_HAPMAP.recode.map, red_test_run_assoc.txt, genetic_map_chr16.txt


  10. NOD2 Crohn’s association (figure; x-axis: Position (Mb))


  12. Consequent shifting budgets… (figure) Cost per genome: ~$5+ billion, ~$70 million, ~$1 million, ~$60,000 (labels: Venter & Watson, HGP, NGS). Data (bytes), based on n~5000: ~2 MB, ~10 GB, ~20 TB across 1- Candidate, 2- CHIP (designer), 3- Affy 500, 4- Intensity data, 5- NGS data (*LC)

  13. Based on the storage of re-sequence data, one can consider storage requirements for a next generation sequencing effort. Assuming a storage cost of about 1.5 bytes per bp of sequence reads for a low-coverage project: ~2000 samples (as per UK10K, for example) x 3 billion bp x 1.5 bytes ≈ 9 x 10^12 bytes, i.e. about 10 terabytes. That doesn't include any subsequent parsed data. Double this just to have the data in all formats one might be able to use meaningfully, which yields ~20 TB. “20 Tb is pretty small these days”, but if buying new storage capacity just to do this alone, one may be better off accounting for up to 50-100 TB if buying bespoke. Cost: service costs can be as high as £1500 per TB. An NGS project on some 2000 individuals can cost as much as 40-50k on computing alone.
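The arithmetic on this slide can be reproduced with a short sketch. The function name, parameter names, and the use of decimal terabytes (10^12 bytes) are assumptions made for illustration; the input figures are the ones quoted on the slide.

```python
def estimate_storage(samples=2000, genome_bp=3_000_000_000, bytes_per_bp=1.5,
                     format_multiplier=2, cost_per_tb_gbp=1500):
    """Rough storage estimate for a low-coverage sequencing project.

    Defaults are the slide's figures; a terabyte is assumed to be 10**12 bytes.
    """
    raw_bytes = samples * genome_bp * bytes_per_bp
    raw_tb = raw_bytes / 1e12
    # The slide doubles the raw figure to hold the data in all useful formats.
    total_tb = raw_tb * format_multiplier
    return {
        "raw_tb": raw_tb,
        "total_tb": total_tb,
        "cost_gbp": total_tb * cost_per_tb_gbp,
    }


estimate = estimate_storage()
print(f"raw reads:   {estimate['raw_tb']:.1f} TB")
print(f"all formats: {estimate['total_tb']:.1f} TB")
print(f"service cost at the quoted rate: £{estimate['cost_gbp']:,.0f}")
```

With the default inputs the exact product is 9 TB of raw reads and 18 TB after doubling, which the slide rounds to ~10 and ~20. At £1500 per TB that is £27,000. A figure in the 40-50k range would correspond to provisioning roughly 30 TB at the same rate, which is an inference here and not something the slide states.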

  14. Also receiving data on: • Copy number variation across the genome • Expression data (e.g. records of messenger RNA to track gene activity) • Methylome (markers of the epigenome). Not to mention phenotype data (a retrospective effort and an ever-increasing pool). Raises the issue of linkage and data USE…

  15. Not just storage…



  18. D’ vs r^2. Varying matrix properties and overlaid ribbon plots (here MAF). Male vs Female
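For readers unfamiliar with the two linkage disequilibrium measures compared on this slide, the following is a minimal sketch of the standard textbook definitions of D’ and r^2 for two biallelic markers. It is not the code behind the slide's plots; the function name and the example frequencies are hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (|D'|, r^2) for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    # D' rescales D by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the two markers' alleles.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d_prime, r_squared


# Hypothetical case: a rare allele B (10%) that only ever occurs with a common allele A (50%).
d_prime, r_squared = ld_measures(p_ab=0.1, p_a=0.5, p_b=0.1)
print(f"D' = {d_prime:.2f}, r^2 = {r_squared:.3f}")
```

In this example D’ is at its maximum while r^2 stays low, because the allele frequencies differ. This is one reason the two measures can produce visibly different matrices for the same region.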

  19. Combinations of data processing/visualisation methods, e.g. follow-up of the dissection of the TCF2 locus and the counter results for T2D and prostate cancer. Other T2D loci? CDKAL. See: Amundadottir et al., Nature Genetics 2007

  20. Not to mention iterative approaches! • Generation of empirical distributions for the purpose of comparison, e.g. expression data • Gene x gene (and possibly environment) interaction analysis, which may span the genome
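One common way to generate an empirical distribution for comparison is a permutation test, sketched below with the standard library only. This is an illustrative assumption about what an iterative approach might look like, not the method used in the talk; the function name, the test statistic (absolute difference in means), and the example values are hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Empirical p-value for a difference in means, built by label shuffling."""
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Each shuffle gives one draw from the null (no group difference) distribution.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical expression values for two groups of five samples each.
low = [1, 2, 3, 4, 5]
high = [11, 12, 13, 14, 15]
print(permutation_p_value(low, high))
```

Every permutation repeats the full calculation, so the compute cost scales with the number of permutations times the number of tests. That multiplication is what makes iterative approaches demanding when applied genome-wide.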

  21. Overall • As one would expect, data requirements are increasing • Genetic epidemiology has been boosted into a realm of real findings and exciting capability by the existence of new technology • Increases may (or may not) be more rapid than once thought • Storage and manipulation of large data sets present new challenges • A new breed of analysts is emerging: the computer scientist with a passion for biology • Perhaps Windows is dead…
