1 / 9

Vertical Data Integration for Clinical Genomics

Andrea Calabria CNR - Institute for Biomedical Technologies Università degli Studi Milano Bicocca – DISCO Thursday, September 4, 2014. Vertical Data Integration for Clinical Genomics. PhD Thesis, cycle XXIII. Project Context and Domain. Genetic Studies Genome Wide Association Studies

wilda
Download Presentation

Vertical Data Integration for Clinical Genomics

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Andrea Calabria CNR - Institute for Biomedical Technologies Università degli Studi Milano Bicocca – DISCO Thursday, September 4, 2014 Vertical Data Integration for Clinical Genomics PhD Thesis, cycle XXIII

  2. Project Context and Domain • Genetic Studies • Genome Wide Association Studies • Family Based and Population Based • Domain • Complex Diseases, focus on Brain dysfunctions and NeuroPathologies: Alzheimer (medium stage), Schizophrenia • Data Types • Personal Data • Phenotypes: clinical data, functional magnetic resonance • Genotypes: using SNPs (Single Nucleotide Polymorphism) A.Calabria PhD Thesis XXIII cycle

  3. Motivation and Objective • Data Mining and Genetics Studies (Linkage analysis, CNV, etc) on brain diseases need for Data Integration and High Performance Infrastructures with distributed environmet • Data Integration must be both Vertical and Horizontal • security and privacy policies for experimental data • Grid Environment merges security and privacy issues and distributed computing • Project’s Objective • to designVertical Integration on experimental data in Grid environment for genetics studies and data mining analyses purpose A.Calabria PhD Thesis XXIII cycle

  4. Why Grid Environment A.Calabria PhD Thesis XXIII cycle

  5. Application Layer – Genetics Studies • Algorithm domain related (population genetics studies) • linkage analysis: the problem is computational intensive, NP-hard. Limits are related to number of markers (<40). • Our problem: chip of 1M SNPs (markers), need to compute linkage analysis for population with 1M SNPs • Solution: heuristic for distributing linkage analysis • Preliminary results on Cluster: 70% average time improvement respect to single CPU • Work in progress • grid porting algorithm and comparison performance test • specific monitoring and job controlling system • Next steps • release linkage on grid as web services A.Calabria PhD Thesis XXIII cycle

  6. Data Layer – Horizontal Integration • Database of Genotypes • genotypic database design and creation • standard HL7 analysis • Work in progress • HL7 application to global database schema • database porting in EGEE grid with AMGA • Next steps • studies of grid db problems related to distribution, federation and hub and spoke paradigm adaptation for extension to biobanks approach • testing of data integration A.Calabria PhD Thesis XXIII cycle

  7. Data Layer – Vertical Integration • Objective • integrate genes’ knowledge with data fusion approach • Genes’ Knowledge quality control • predicted genes can present conflicts among main different databases (NCBI, EnsEMBL, UCSC) • conflicts could affect analyses  need for evaluating conflict impact within the genome • Work in progress and Next Steps • data extraction (API, Web Services, DB access, parsing) • data integration • data fusion: conflict analysis and evaluation A.Calabria PhD Thesis XXIII cycle

  8. Project Plan • Linkage algorithm Grid enabling (May-September) • grid porting • application testing and performance measurements • Gene-oriented Data quality (September-November) • data extraction • genes’ knowledge integration • conflicts evaluation • Database Design for Grid porting (November-March) • HL7 schema design • AMGA database creation • Query and data management issues, data import and testing A.Calabria PhD Thesis XXIII cycle

  9. References • Bibliografy • Pubblications A.Calabria PhD Thesis XXIII cycle

More Related