MATISSE - Modular Analysis for Topology of Interactions and Similarity SEts. http://acgt.cs.tau.ac.il/matisse Igor Ulitsky and Ron Shamir. Identification of Functional Modules using Network Topology and High-Throughput Data. BMC Systems Biology 1:8 (2007).
Microarray data analysis • Input: expression levels of (all) genes in several conditions • Analysis methods: • Clustering (CLICK) • Biclustering (SAMBA) • Extraction of regulatory networks
Protein interaction network analysis • Input: Network with nodes=proteins/genes edges=interactions • Analysis methods: • Global properties • Motif content analysis • Complex extraction • Cross-species comparison
Integrated analysis • Combined support for low quality data • Joint visualization • Statistics of known pathways • Detection of “hot spots”
MATISSE • Identify sets of genes (modules) that • Have highly correlated expression patterns • Induce connected subgraphs in the interaction network [Figure legend: Interaction, High Similarity]
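The two module criteria above can be illustrated with a minimal Python sketch. It is not the MATISSE implementation: the function names, the toy gene names, and the use of mean pairwise Pearson correlation as the similarity measure are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(sxx * syy)

def is_connected(genes, edges):
    """True if `genes` induce a connected subgraph of the interaction network."""
    genes = set(genes)
    if not genes:
        return False
    # Keep only edges with both endpoints inside the gene set (induced subgraph).
    adj = {g: set() for g in genes}
    for u, v in edges:
        if u in genes and v in genes:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(genes))
    seen, stack = {start}, [start]
    while stack:
        for n in adj[stack.pop()] - seen:
            seen.add(n)
            stack.append(n)
    return seen == genes

def mean_pairwise_correlation(genes, expr):
    """Average Pearson correlation over all gene pairs (needs >= 2 genes)."""
    pairs = list(combinations(sorted(genes), 2))
    return sum(pearson(expr[a], expr[b]) for a, b in pairs) / len(pairs)

# Hypothetical toy data: four genes on a path A-B-C-D, four conditions.
edges = [("A", "B"), ("B", "C"), ("C", "D")]
expr = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, 3, 2, 5], "D": [4, 3, 2, 1]}

print(is_connected({"A", "B", "C"}, edges))   # connected through B
print(is_connected({"A", "C"}, edges))        # no A-C edge once B is left out
print(mean_pairwise_correlation({"A", "B"}, expr))
```

A candidate module would need to pass both checks: high similarity alone is not enough if the genes are scattered across the network.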
MATISSE workflow • Seed generation • Greedy optimization • Significance filtering
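The three workflow steps can be sketched in Python as follows. This is a simplified stand-in under stated assumptions, not the authors' algorithm: the pairwise similarity scores (positive for co-expressed pairs, negative otherwise), the "best edge" seed rule, the add-one-neighbor greedy move, and the random-gene-set significance test are all hypothetical choices made to keep the example short.

```python
import random
from itertools import combinations

def pair_sim(sim, a, b):
    """Similarity score of a gene pair; pairs without a score count as 0."""
    return sim.get((a, b), sim.get((b, a), 0.0))

def module_score(module, sim):
    """Sum of pairwise similarity scores inside the module (assumed score)."""
    return sum(pair_sim(sim, a, b) for a, b in combinations(sorted(module), 2))

def best_seed(edges, sim):
    """Seed generation (assumed rule): the interaction with the highest score."""
    return max(edges, key=lambda e: pair_sim(sim, *e))

def grow_module(seed, adj, sim):
    """Greedy optimization: repeatedly add the network neighbor with the
    largest positive score gain. Adding a neighbor keeps the module connected."""
    module = set(seed)
    while True:
        frontier = {n for g in module for n in adj[g]} - module
        gains = {c: sum(pair_sim(sim, c, g) for g in module) for c in frontier}
        if not gains:
            break
        best = max(sorted(gains), key=gains.get)
        if gains[best] <= 0:
            break
        module.add(best)
    return module

def empirical_p(module, genes, sim, n_samples=200, seed=0):
    """Significance filtering (assumed test): fraction of random gene sets of
    the same size scoring at least as high as the module."""
    rng = random.Random(seed)
    observed = module_score(module, sim)
    hits = sum(
        module_score(rng.sample(genes, len(module)), sim) >= observed
        for _ in range(n_samples)
    )
    return (hits + 1) / (n_samples + 1)

# Hypothetical toy network: path A-B-C-D-E with made-up similarity scores.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
adj = {g: set() for e in edges for g in e}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
sim = {("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.7,
       ("C", "D"): -0.5, ("A", "D"): -0.6, ("B", "D"): -0.4, ("D", "E"): 0.2}

module = grow_module(best_seed(edges, sim), adj, sim)
p_value = empirical_p(module, sorted(adj), sim)
print(sorted(module), p_value)
```

On this toy input the seed is the A-B interaction, C joins because it is similar to both A and B, and D is rejected because adding it would lower the score.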
Advantages of MATISSE • No need for confidence estimation on individual measurements • Works even when only a fraction of the genes have expression patterns • Can handle any similarity data, not only expression • Produces connected modules • No need to specify the number of modules
Osmotic shock response of S. cerevisiae • Network of 6,246 genes and 65,990 protein-protein and protein-DNA interactions • 133 experimental conditions – response of perturbed strains to osmotic shock (O’Rourke and Herskowitz, 2004) • 2,000 genes retained after filtering by a variation criterion
Pheromone response subnetwork [Figure: panels labeled Back and Front]
Proteolysis subnetwork [Figure: panels labeled Back and Front]
Performance comparison [Figure: % of modules with category enrichment at p < 10⁻³]
Performance comparison (2) [Figure: % of annotations with enrichment at p < 10⁻³ in modules]
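The slides do not say which test produced the enrichment p-values. A common choice for this kind of annotation enrichment is the hypergeometric tail probability, sketched below in Python as an assumption; the function name and the example counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, module_size, overlap):
    """P(X >= overlap) when `module_size` genes are drawn without replacement
    from `population` genes, of which `annotated` carry the category."""
    total = comb(population, module_size)
    upper = min(annotated, module_size)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, module_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical example: 2,000 genes, 40 annotated with a category,
# a module of 50 genes containing 8 of the annotated ones.
p = hypergeom_enrichment_p(2000, 40, 50, 8)
print(p, p < 1e-3)
```

A module would count as enriched for the category when this probability falls below the 10⁻³ cutoff shown on the axes.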
Human cell cycle • Constructed a network with 6,000 nodes and 25,000 edges from: • HPRD • BIND • Y2H studies • SPIKE • HeLa cell cycle time series (Whitfield et al., 2002) • Produced subnetworks enriched with all the phases of the cell cycle
Extensions of MATISSE • CEZANNE • Utilizes confidence-based networks • Extracts subnetworks that are connected with high confidence and co-expressed • Applied to 11 studies of gene expression in the blood • Not yet implemented in the MATISSE application
Extensions of MATISSE • DEGAS • Utilizes case-control expression data • Identifies dysregulated pathways – areas in the network in which many genes are dysregulated in most of the cases • Beta version implemented in the MATISSE software • Reference: Ulitsky, Karp and Shamir, RECOMB 2008
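The "many genes dysregulated in most of the cases" criterion can be made concrete with a small Python check. This is only a sketch of the idea, not the DEGAS algorithm: the parameter names `min_genes` and `max_missed_cases`, and the per-case sets of dysregulated genes, are assumptions for illustration.

```python
def covers_most_cases(subnetwork, dysregulated_by_case, min_genes, max_missed_cases):
    """True if, in all but at most `max_missed_cases` cases, at least
    `min_genes` genes of the subnetwork are dysregulated."""
    missed = sum(
        1
        for genes in dysregulated_by_case.values()
        if len(genes & subnetwork) < min_genes
    )
    return missed <= max_missed_cases

# Hypothetical data: each patient has a different set of dysregulated genes.
dysregulated_by_case = {
    "patient1": {"A", "B", "X"},
    "patient2": {"B", "C"},
    "patient3": {"A", "C", "Y"},
    "patient4": {"Z"},
}
pathway = {"A", "B", "C"}

print(covers_most_cases(pathway, dysregulated_by_case, min_genes=2, max_missed_cases=1))
print(covers_most_cases(pathway, dysregulated_by_case, min_genes=2, max_missed_cases=0))
```

No single gene is dysregulated in every patient here, yet the pathway as a whole is hit in three of the four cases.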
Difficulties with prior approaches • In case-control data, gene pattern correlation can be due to diverse non-disease related factors • Patients are different • Genetic background • Other diseases/confounding factors • Disease grade • Current methods assume that the same genes are dysregulated in all the patients • A weaker assumption – a lot of dysregulated genes appear in the same dysregulated pathway
HD down-regulated • The pathway down-regulated in Huntington’s disease (HD) • Enriched with: • HD modifiers • HD relevant genes • Calcium signalling [Figure annotations: "Clear outlier", "Huntingtin"]
Extensions of MATISSE • Identification of modules correlated with external parameters • Numerical parameters: Age, tumor grade etc. • Logical parameters: Gender, tumor type • Identifies subnetworks with genes that are both • Correlated with the clinical parameter • Correlated with one another
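The double requirement above (correlated with the clinical parameter and with one another) can be sketched in Python. The thresholds, function names, and toy profiles are assumptions; logical parameters such as gender are encoded as 0/1 so that the same Pearson correlation applies. Network connectivity is left out of this sketch.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(sxx * syy)

def parameter_module_ok(genes, expr, param, min_param_r, min_pair_r):
    """Check both criteria for a candidate gene set."""
    # Criterion 1: every gene tracks the clinical parameter (either direction).
    with_param = all(abs(pearson(expr[g], param)) >= min_param_r for g in genes)
    # Criterion 2: every pair of genes is positively co-expressed.
    with_each_other = all(
        pearson(expr[a], expr[b]) >= min_pair_r
        for a, b in combinations(sorted(genes), 2)
    )
    return with_param and with_each_other

# Hypothetical toy data: four samples, one numerical and one logical parameter.
age = [30, 40, 50, 60]
gender = [0, 1, 0, 1]  # logical parameter encoded as 0/1
expr = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 9], "G3": [1, 3, 1, 3]}

print(parameter_module_ok({"G1", "G2"}, expr, age, 0.8, 0.8))
print(parameter_module_ok({"G1", "G3"}, expr, age, 0.8, 0.8))
print(parameter_module_ok({"G3"}, expr, gender, 0.8, 0.8))
```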
MATISSE tool capabilities • MATISSE algorithm execution • Dynamic subnetwork layout • Customized node/edge highlighting • Dynamic expression matrix viewer • Module annotation • TANGO – Gene Ontology • Annotations with custom datasets • Calculation of different coefficients based on network/expression