CSCE555 Bioinformatics
Lecture 21: Integrative Genomics
Meeting: MW 4:00PM-5:15PM SWGN2A21
Instructor: Dr. Jianjun Hu
Course page: http://www.scigen.org/csce555
University of South Carolina
Department of Computer Science and Engineering
2008 www.cse.sc.edu
Outline
Information is not knowledge - Albert Einstein
Integrative Genomics - what is it?
Acquisition, Integration, Curation, and Analysis of biological data
Integrative Genomics: the study of complex interactions between genes, organism, and environment, the triple helix of biology. Gene <-> Organism <-> Environment
It is definitely beyond the buzzword stage - Universities now have programs named 'Integrated Genomics.'
How to integrate multiple types of genome-scale data across experiments and phenotypes in order to find genes associated with diseases
Bioinformatics & the “omes”
382 “omes” so far………
and there is “UNKNOME” too - genes with no function known
With Some Data Exchange…
No. of Human Gene Records currently in NCBI: 29413 (excluding pseudogenes, mitochondrial genes and obsolete records).
Includes ~460 microRNAs
NCBI Human Genome Statistics – as of February 12, 2008
A researcher would have to scan 130 different journals and read 27 papers per day to follow a single disease, such as breast cancer (Baasiri et al., 1999 Oncogene 18: 7958-7965).
Data is downloaded, filtered, integrated and stored in a warehouse. Answers to queries are taken from the warehouse.
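The warehouse approach above can be sketched in a few lines of Python. This is only a minimal illustration, assuming an in-memory SQLite database as the warehouse; the gene and disease names (GENE_A, disease_X, ...), the table layout, and the helper names build_warehouse and genes_for_disease are all made up for the example and do not come from any real resource.

```python
import sqlite3

# Toy, made-up records standing in for two downloaded sources.
SOURCE_GENES = [
    ("GENE_A", "protein-coding"),
    ("GENE_B", "pseudogene"),
    ("GENE_C", "protein-coding"),
]
SOURCE_DISEASE_LINKS = [
    ("GENE_A", "disease_X"),
    ("GENE_B", "disease_X"),
    ("GENE_C", "disease_X"),
    ("GENE_C", "disease_Y"),
]


def build_warehouse(genes, links):
    """Filter both sources, integrate them, and store them in one database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene (symbol TEXT PRIMARY KEY, kind TEXT)")
    conn.execute("CREATE TABLE gene_disease (symbol TEXT, disease TEXT)")
    # Filter step (an assumption for this sketch): drop pseudogenes.
    kept = [g for g in genes if g[1] != "pseudogene"]
    kept_symbols = {symbol for symbol, _ in kept}
    conn.executemany("INSERT INTO gene VALUES (?, ?)", kept)
    # Integration step: keep only links whose gene survived the filter.
    conn.executemany(
        "INSERT INTO gene_disease VALUES (?, ?)",
        [link for link in links if link[0] in kept_symbols],
    )
    conn.commit()
    return conn


def genes_for_disease(conn, disease):
    """Answer a query from the warehouse, not from the original sources."""
    rows = conn.execute(
        "SELECT g.symbol FROM gene g "
        "JOIN gene_disease d ON g.symbol = d.symbol "
        "WHERE d.disease = ? ORDER BY g.symbol",
        (disease,),
    )
    return [row[0] for row in rows]


warehouse = build_warehouse(SOURCE_GENES, SOURCE_DISEASE_LINKS)
print(genes_for_disease(warehouse, "disease_X"))
```

The point of the design is that filtering and integration happen once, at load time, so every later query sees the same cleaned view of the data.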
Biomedical World
No Integrative Genomics is Complete without Ontologies
Example Study: Disease Gene Identification and Prioritization
Hypothesis: The majority of genes that impact or cause disease share membership in one or more of several functional relationships; or, put another way, functionally similar or related genes cause similar phenotypes.
Known Disease Genes
Mining human interactome
Direct Interactants of Disease Genes
Indirect Interactants of Disease Genes
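One way to picture the mining step is a breadth-first search over an interaction network, starting from the known disease genes. The sketch below is an assumption about how this could be done, not the method used in the study; the edge list, the gene names (D1, A, B, ...), the two-hop cutoff, and the helper names are all hypothetical.

```python
from collections import deque

# Toy, made-up interaction pairs; not taken from any real interactome.
TOY_INTERACTIONS = [
    ("D1", "A"),
    ("A", "B"),
    ("B", "C"),
    ("D2", "B"),
    ("E", "F"),
]
KNOWN_DISEASE_GENES = {"D1", "D2"}


def build_graph(edges):
    """Build an undirected adjacency map from interaction pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def interactant_distances(graph, seeds, max_hops=2):
    """Breadth-first search from all seed genes at once.

    Returns the hop distance from the nearest seed for every gene
    reachable within max_hops (seeds themselves have distance 0).
    """
    dist = {s: 0 for s in seeds if s in graph}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the cutoff
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


graph = build_graph(TOY_INTERACTIONS)
distances = interactant_distances(graph, KNOWN_DISEASE_GENES)
direct = sorted(g for g, d in distances.items() if d == 1)
indirect = sorted(g for g, d in distances.items() if d == 2)
print("direct interactants:", direct)
print("indirect interactants:", indirect)
```

Genes one hop from a known disease gene play the role of direct interactants, and genes two hops away play the role of indirect interactants; genes with no path to a seed (E and F here) are never reported as candidates.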
New features added
• string algorithms
• dynamic programming
• machine learning (NN, k-NN, SVM, GA, ..)
• Markov chain models
• hidden Markov models
• Markov Chain Monte Carlo (MCMC) algorithms
• stochastic context free grammars
• EM algorithms
• Gibbs sampling
• tree algorithms
• text analysis
• hybrid/combinatorial techniques and more…