Sai Moturu
Introduction • Current approaches to microarray data analysis • Analyze the experimental data first, then incorporate biological information in a separate, later step to make inferences • Integrative analysis technique in this paper • Integrate gene annotation with expression data to discover intrinsic associations between the two data sources based on co-occurrence patterns
Methods and Data • Association Rules Discovery • Gene expression data • Gene annotation: Gene ontology categories, metabolic pathways and transcriptional regulators • Applied to two previously studied experiments
Association Rules Discovery • Antecedent -> Consequent: X -> Y • Measures of Quality • Support: P(X ∪ Y), the proportion of genes whose items include both X and Y • Confidence: P(Y|X) = P(X ∪ Y)/P(X) • Improvement: Confidence/P(Y) = P(X ∪ Y)/(P(X)*P(Y))
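The three quality measures can be computed directly from a list of transactions (one per gene). This is a minimal sketch, assuming each transaction is a `frozenset` of item labels; the function names are hypothetical helpers, not part of any library. Note that P(X ∪ Y) means "the transaction contains every item of X and of Y", which in code is a subset test on the set union.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support(itemset: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(x: Itemset, y: Itemset, transactions: List[Itemset]) -> float:
    """P(Y|X) = support(X ∪ Y) / support(X)."""
    return support(x | y, transactions) / support(x, transactions)


def improvement(x: Itemset, y: Itemset, transactions: List[Itemset]) -> float:
    """Confidence divided by the consequent's support: P(X ∪ Y) / (P(X) P(Y))."""
    return confidence(x, y, transactions) / support(y, transactions)
```

An improvement above 1 means that genes carrying X show Y more often than genes in general do; an improvement of exactly 1 means X carries no information about Y.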
Association Rules Discovery • Itemsets • Each gene's itemset combines the experiments in which the gene is over- or underexpressed with its gene characteristics (annotations) • Constraint • The antecedent must consist of gene annotation items • Expression Thresholds • Genes with log expression values > 1 are overexpressed and < -1 are underexpressed (a two-fold change, for log base 2)
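Building one transaction per gene from the thresholds above might look like the sketch below. It assumes log2 expression ratios indexed by experiment (time point) and a per-gene set of annotation terms; the `ANN:`/`EXP:` item prefixes and the `t{i}_up`/`t{i}_down` labels are hypothetical conventions chosen so that the mining step can tell annotation items from expression items when enforcing the antecedent constraint.

```python
from typing import Dict, FrozenSet, List, Set


def build_transactions(
    log_ratios: Dict[str, List[float]],
    annotations: Dict[str, Set[str]],
    threshold: float = 1.0,
) -> Dict[str, FrozenSet[str]]:
    """Turn each gene into a transaction of annotation and expression items.

    A log2 ratio above `threshold` marks the gene as overexpressed in that
    experiment; below `-threshold` marks it as underexpressed. With the
    default of 1.0 this is a two-fold change in either direction.
    """
    transactions: Dict[str, FrozenSet[str]] = {}
    for gene, values in log_ratios.items():
        # Annotation items (pathways, regulators, GO terms, ...).
        items = {f"ANN:{term}" for term in annotations.get(gene, set())}
        # Expression items, one per experiment where the threshold is crossed.
        for i, value in enumerate(values, start=1):
            if value > threshold:
                items.add(f"EXP:t{i}_up")
            elif value < -threshold:
                items.add(f"EXP:t{i}_down")
        transactions[gene] = frozenset(items)
    return transactions
```

For example, a gene with ratios `[0.2, 1.5, -1.3]` and the annotation `pathway_1` becomes `{"ANN:pathway_1", "EXP:t2_up", "EXP:t3_down"}`.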
Mining Association Rules • The association rules of interest have low support and high confidence • A variant of the Apriori algorithm is used, one that has previously proven effective at mining low-support, high-confidence, biologically significant patterns
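The specific Apriori variant is not described on the slide, so the sketch below uses plain level-wise Apriori with a low absolute support count, then applies the antecedent constraint during rule generation. It assumes the hypothetical `ANN:` prefix marks annotation items and that every other item is an expression item placed in the consequent; `frequent_itemsets` and `annotation_rules` are illustrative names only.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
# (antecedent, consequent, support count, confidence, improvement)
Rule = Tuple[Itemset, Itemset, int, float, float]


def frequent_itemsets(transactions: List[Itemset], min_count: int) -> Dict[Itemset, int]:
    """Plain level-wise Apriori: every itemset found in at least min_count transactions."""
    counts: Dict[Itemset, int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)
    k = 2
    while level:
        # Join frequent (k-1)-itemsets into k-item candidates and prune any
        # candidate with an infrequent (k-1)-subset (the Apriori property).
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        level = {}
        for cand in candidates:
            n = sum(1 for t in transactions if cand <= t)
            if n >= min_count:
                level[cand] = n
        frequent.update(level)
        k += 1
    return frequent


def annotation_rules(
    transactions: List[Itemset], min_count: int, min_conf: float, min_improvement: float
) -> List[Rule]:
    """Rules whose antecedent holds only annotation items ("ANN:" prefix) and
    whose consequent holds only expression items (everything else)."""
    n = len(transactions)
    freq = frequent_itemsets(transactions, min_count)
    rules: List[Rule] = []
    for itemset, count in freq.items():
        antecedent = frozenset(i for i in itemset if i.startswith("ANN:"))
        consequent = itemset - antecedent
        if not antecedent or not consequent:
            continue  # constraint: both sides must be non-empty
        # Every subset of a frequent itemset is frequent, so both lookups succeed.
        conf = count / freq[antecedent]
        improvement = conf / (freq[consequent] / n)
        if conf >= min_conf and improvement >= min_improvement:
            rules.append((antecedent, consequent, count, conf, improvement))
    return rules
```

Keeping the support threshold as a small absolute count (rather than a percentage) is what lets rare but highly confident annotation/expression co-occurrences survive the frequent-itemset stage.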
Filtering • A major drawback of association rules is the huge number of rules generated • Many of these rules are also redundant • Two filters address this • Redundant filter • Single antecedent filter
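The slide does not define either filter precisely. The sketch below assumes a common definition of redundancy: a rule is dropped when another rule with the same consequent, a strictly smaller antecedent, and at least the same confidence exists, since the longer rule then adds nothing. The single antecedent filter is not sketched because its criterion is not given here; `redundancy_filter` is a hypothetical name.

```python
from typing import FrozenSet, List, Tuple

# (antecedent, consequent, confidence)
Rule = Tuple[FrozenSet[str], FrozenSet[str], float]


def redundancy_filter(rules: List[Rule]) -> List[Rule]:
    """Drop a rule if a more general rule (same consequent, proper-subset
    antecedent) reaches at least the same confidence (assumed definition)."""
    kept: List[Rule] = []
    for ante, cons, conf in rules:
        redundant = any(
            other_cons == cons and other_ante < ante and other_conf >= conf
            for other_ante, other_cons, other_conf in rules
        )
        if not redundant:
            kept.append((ante, cons, conf))
    return kept
```

Under this definition a longer rule survives only if its extra annotation items raise the confidence above that of every more general rule with the same consequent.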
Diauxic shift dataset • Gene expression accompanying the metabolic shift from fermentation to respiration that occurs when fermenting yeast cells exhaust the glucose in the medium • Expression levels recorded at 7 time points • External information • Metabolic pathways • Transcriptional regulators
Results • Association rules among metabolic pathways and expression patterns • 1126 out of over 6000 genes were annotated with at least one pathway • Association rules with minimum support of 5, minimum confidence of 40% and minimum improvement of 1 • Redundant and single antecedent filters applied • 21 association rules
Results • Association rules among transcriptional regulators and expression patterns • 3490 genes were annotated with at least one regulator • Association rules with minimum support of 5, minimum confidence of 80% and minimum improvement of 1 • Redundant filter applied • 28 association rules
Results • Association rules among transcriptional regulators, metabolic pathways and expression patterns • 3882 genes • Association rules with minimum support of 5, minimum confidence of 80% and minimum improvement of 1 • Redundant filter applied • 37 association rules
Serum stimulation dataset • Gene expression program of human fibroblasts after serum exposure • External information • Gene ontology terms
Results • Association rules among biological process annotations and expression patterns • 4092 of over 8000 genes • Minimum support of 4, minimum confidence of 10% and minimum improvement of 1 • Single antecedent and redundant filters applied • 12 association rules
Results • Association rules among terms from all GO categories • 4630 of over 8000 genes • Minimum support of 4, minimum confidence of 10% and minimum improvement of 1 • Redundant filter applied • 31 association rules
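The five analyses reported for the two datasets differ only in annotation source, thresholds and filters. Collecting those settings as data makes the comparison explicit. This is a sketch: the `MiningRun` class and `passes` helper are hypothetical, and it assumes the bare support values (5 and 4) are absolute gene counts and that each "minimum" is an inclusive bound.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MiningRun:
    dataset: str
    annotation: str
    min_support: int        # assumed: absolute number of genes
    min_confidence: float   # fraction, e.g. 0.40 for 40%
    min_improvement: float  # 1.0 keeps rules at or above the consequent's baseline rate
    filters: Tuple[str, ...]
    rules_reported: int


RUNS: List[MiningRun] = [
    MiningRun("diauxic shift", "metabolic pathways", 5, 0.40, 1.0,
              ("redundant", "single antecedent"), 21),
    MiningRun("diauxic shift", "transcriptional regulators", 5, 0.80, 1.0,
              ("redundant",), 28),
    MiningRun("diauxic shift", "regulators + pathways", 5, 0.80, 1.0,
              ("redundant",), 37),
    MiningRun("serum stimulation", "GO biological process", 4, 0.10, 1.0,
              ("single antecedent", "redundant"), 12),
    MiningRun("serum stimulation", "all GO categories", 4, 0.10, 1.0,
              ("redundant",), 31),
]


def passes(run: MiningRun, count: int, confidence: float, improvement: float) -> bool:
    """True if a rule's measures meet all three thresholds of `run`."""
    return (
        count >= run.min_support
        and confidence >= run.min_confidence
        and improvement >= run.min_improvement
    )
```

Laid out this way, the much lower confidence threshold used for the serum stimulation runs (10% versus 40% or 80%) stands out immediately.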
Conclusions • Some of the inferred biological associations matched previous experimental findings • The remaining rules are candidates for further investigation • Integrative data analysis helps turn gene expression data into meaningful discoveries