Detecting Network Motifs in Gene Co-expression Networks

Detecting Network Motifs in Gene Co-expression Networks Shahar Artzi

Outline • IntroductiontoNetwork Motifs • Introduction to Detecting Network Motifs • Co-expression Network • Network Motif Discovery • Scale-Free Networks • Result • Discussion

Network Motifs • What is “Network Motifs” ? Network Motifs are defined as patterns of interconnections that recur in many different parts of a network at frequencies much higher than those found in randomized networks. • Why we need them? * To help us understand how biological networks work. * Exact forecasting of operation and reaction in the network under given situations.

The concept of “network motifs” was first proposed by Uri Alon’s group :

Schematic View of Network Motif Detection

Examples for motifs • FeedForward Loop (היזון קדמי) Found in neural networks. It seems to be Used for neutralize “Biological Noise”. X Y Z • Single-Input Module(פס יצור) X Implemented in gene control networks a b c d

Examples for motifs • Parallel paths W Found in neural networks. (and not so much in gene networks) Y X Z

INTRODUCTIONto Detecting Network Motifs • Functionally related genes could be clustered together based on similar expression profiles. Problem : • General clustering algorithms produce clusters of relatively large size, making it difficult to test the cluster of interest using wet-lab experiments. • General clustering algorithms do not provide reasonably detailed information about the relationship among genes in a cluster. • This makes it even more difficult for individual researchers to verify the predictions experimentally.

INTRODUCTIONto Detecting Network Motifs • We need a way to break big clusters into smaller ones and thereby provide more detailed insights into relationships among genes within subclusters. • To achieve this goal: we need additional independent information. In this researchthey used Protein sequence information • The algorithms are able to decompose the clusters of genes into smaller ones by integrating protein domain information into the clustering algorithm.

Co-expression Network • The genes are vertices (nodes). • An unweighted, and undirected edge (connection) is put between two genes if they are co-expressed with correlation higher than a specified threshold. Correlation Matrix Adjacency Matrix Cutoff : 0.8

Network Motif Discovery • Here we extended the concept of network motif (Alon’s group proposal) to the labeled graph by also considering patterns of vertex labels.

Network Motif Discovery • Figure B shows 7 hypothetical genes in a co-expression network forming three distinct instances of the network motif as described in Figure A. II and III are overlapping I and II are non-overlapping (I and III are also non – overlapping)

Network Motif DiscoverySchematic View < G, k, f > Enumeration of k-vertex cliques Protein domain Groups of cliques f := number of non-overlapping cliques Network motifs

Network Motif DiscoveryScoring Method • First step : Starting from a labeled graph whose vertices (genes) were labeled with their corresponding annotated Pfam protein domain information. A generic clique-based clustering algorithm was applied to this labeled co-expression network to search for patterns of highly co-expressed genes or network motifs. We’ll go over the clustering algorithm briefly. The algorithm work in Scoring Method :

Network Motif DiscoveryScoring Method • Our first step is to determine which genes appear to discriminate best among sample types. • To accomplish this, a discrimination score is calculated for each gene. • Only the best genes (those with the highest scores) are retained for subsequent steps. • Subtracting the sum of the standard deviations of a gene within each group allows us to eliminate, or at least diminish, the importance of any gene whose expression levels vary excessively. • The data is obtained in an n x m matrix.

Network Motif DiscoveryScoring Method • The algorithm can be described in pidgin ALGOL as follows: From : “A COMBINATORIAL APPROACH TO THE ANALYSIS OF DIFFERENTIAL GENE EXPRESSION DATA The Use of Graph Algorithms for Disease Prediction and Screening” *Michael A. Langston1, Lan Lin1, Xinxia Peng2, Nicole E. Baldwin1, Christopher T. Symons1, Bing Zhang3 and Jay R. Snoddy3

Network Motif DiscoveryScoring Method • Alternative approach : Elimination of outliers before computing the scores - outliers might affect both the median and the standard deviation. • we modified our approach by adding a screening phase, in which we first compute the medians and the standard deviations for each gene within each group, then check the expression values corresponding to that gene, discarding those at least three standard deviations away from the group median. • We subsequently re-compute all medians and standard deviations using only the retained values.

Network Motif DiscoveryScoring Method • We describe this modified algorithm in pidgin ALGOL: continuance From : “A COMBINATORIAL APPROACH TO THE ANALYSIS OF DIFFERENTIAL GENE EXPRESSION DATA The Use of Graph Algorithms for Disease Prediction and Screening” *Michael A. Langston1, Lan Lin1, Xinxia Peng2, Nicole E. Baldwin1, Christopher T. Symons1, Bing Zhang3 and Jay R. Snoddy3

Network Motif DiscoveryScoring Method Algorithm cont’ :

Network Motif DiscoverySubsequent Steps • For a specified k, we scanned all k-vertex cliques and grouped all cliques found based on the protein domain information. • Next a parameter f specifying the minimum number of mutual non-overlapping instances in a network motif was used to trim the list of putative network motifs. • Only the putative network motifs having at least f non-overlapping instances were kept as network motifs. • We further assessed the statistical significance of each detected network motif by comparison to randomized networks.

Network Motif DiscoverySubsequent Steps • Starting from the real co-expression network, we generated a randomized network by randomly permuting the domain labels of all genes while leaving the connection structure of the graph untouched, and then ran the same network motif detection procedure on the resulting randomized network. • This process was repeated 1,000 times and the percentage of times the same network motif was found in the randomized networks was defined as the p-value for the network motif.

Network Motif Discoverydomain matching levels • Here we propose only domain matching levels: • Matching level 2 : requires that two proteins have the exact same type of domain, the same number of each type of domain and all domains in the same order in the respective protein sequences. • Matching level 4 :only requires the same type of domain. • The network motif detection procedure was run separately using different domain matching levels.

C C B B A A C A C A A Examples Domain Matching level 2 1 1’ Domain Matching Level 4 2’ 2

Scale-Free Networks • DEFINITION: Scale-free networks, including the Internet, are characterized by an uneven distribution of connectedness. Instead of the nodes of these networks having a random pattern of connections, some nodes act as "very connected" hubs, a fact that dramatically influences the way the network operates. • Barabasi and his colleagues mapped the connectedness of the Web. • Their experiment yielded a connectivity map that they christened "scale-free."

Scale Free Vs Random Random networks suffer from random failures because each node important as any other "scale free" networks are more immune to random failure due to the redundancy of paths linking nodes connectivity ensured by few highly connected nodes "scale free" networks are prone to catastrophic failure when key "hubs" are attacked

Scale Free Vs Random (a) constructed by laying down N nodes and connecting each pair with probability p. This network has N = 10 and p = 0.2. (b) A new node (red) connects to two existing nodes in the network (black) at time t + 1. This new node is much more likely to connect to highly connected nodes, a phenomenon called preferential attachment. (c) The network connectivity can be characterized by the probability P(k) that a node has k links. For random graphs P(k) is strongly peaked at k = <k> and decays exponentially for large k. (d) A scale-free network does not have a peak in P(k), and decays as a power law P(k) ~ kg at large k. (e) A random network - most nodes have approximately the same number of links. (f) The majority of nodes in a scale-free network have one or two links, but a few nodes have a large number of links; this guarantees that the system is fully connected

Scale Free Vs Random • More than 60% of nodes (green) can be reached from the • five most connected nodes (red) • compared with only 27% in the random network. • This demonstrates the key role that hubs • play in the scale-free network. • Both networks contain the 130 nodes and 430 links.

Results • To convert a correlation matrix to a corresponding co-expression network, a suitable cutoff value for the correlation coefficient must be chosen. • Based on the previous reports that biological networks, including co-expression networks, follow a scale-free distribution of connectivities, we chose a cutoff value which gave fewer vertices with higher degree (connectivity). • The correlation cutoff value of 0.95 is appropriate

Results • Using a series of values for parameters k and f, we found a number of putative network motifs under different domain matching levels

Results • As shown in Table 1, both increasing k, the size of network motifs and f, the minimum number of non overlapping instances, decrease the number of network motifs detected. • To gain further confidence in our predictions, we used yeast protein interaction data that includes rich protein complex information, and searched within this data for instances of putative malaria network motifs. • We can see (in the next slides) that more malaria network motifs were supported by yeast interaction data as the parameters became more stringent.

ResultsConfirmation of Prediction by Yeast Protein Interactions Percentage of network motifs having instance in yeast PPI network. Domain matching level 2

ResultsConfirmation of Prediction by Yeast Protein Interactions Percentage of network motifs having instance in yeast PPI network. Domain matching level 4

Results -Example 1 • This figure shows a putative network motif detected under domain matching level 2, k = 6 and f = 2. This motif consists of six highly co-expressed genes.

Results -Example 1 • This figure shows the instances formed by different combinations of 27 genes detected in the yeast protein interaction network for the malaria network motif shown in the last figure. • Protein complex 11635 contains six genes forming an exact instance of • the predicted network motif.

Results -Example 2 • We hypothesized that individual instances of a network motif could function in different locations and times, dependent upon regulation. • The malaria time series data enables us to test this hypothesis by examining the temporal expression profiles of instances of network motifs. • Next figure shows such an example network motif detected with parameter values at domain matching level 2, k = 3 and f = 2.

Results -Example 2

Results -Example 2 • Apparently these six genes all have similar expression profiles and the only major difference is the timing. • There is a phase difference between two instances while all three genes within each of two instances have the same expression profiles. • Having instances in yeast protein interaction data provide further support that these genes do interact directly • More Results • http://mouse.ornl.gov/~xpv/camda04/index.html

DISCUSSION • massive amounts of experimental data have been collected • New computational approaches are needed to analyze these data in an integrative way • provide more reliable results with finer resolution for experimental verification • Here we propose a new strategy to analyze gene expression data by integrating a diversity of additional information, such as: • primary sequence information • protein interaction data.

DISCUSSION • Our approach can easily make use of cross-species information. • Biological hypothesis Modularity of biological networks • How genes within a module are orchestrated at mRNA level is our next research question.

THE END

References • Detecting Network Motifs in Gene Co-expression Networks Xinxia Peng et al • http://www.weizmann.ac.il/mcb/UriAlon/index.html • Network Motifs: Simple Building Blocks of Complex Networks R. Milo et al

Detecting Network Motifs in Gene Co-expression Networks