Network randomization

Presentation Transcript


  1. Network randomization
     Sylvain Brohée <sylvain@bigre.ulb.ac.be>
     Université Libre de Bruxelles, Belgium
     Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe)
     http://www.bigre.ulb.ac.be/

  2. Need for graph randomization
     • Negative controls in biology
       • In bioinformatics as well as in wet lab biology, negative controls are essential to estimate the relevance of an observation.
       • The principle of a negative control is to perform the same (wet or in silico) experiment under conditions where the result has to be negative.
       • If a method is good, the negative control will return a negative answer.
     • Randomization as an in silico negative control methodology
       • Methods for studying a network and its properties (topology, network comparison, cluster structure) can be applied to randomized networks as a negative control.
       • There are many ways to randomize a network.
       • The choice of the randomization method is essential and depends on the question to be addressed.
     • Applications
       • To compare methods (e.g. methods extracting protein complexes from protein interaction networks) and estimate their respective merits from the difference between their results on real data and on a randomized network.
       • To estimate the “signal” present in a given data set: if the data set contains signal, it should behave differently from a randomized data set (considered as made of “noise”).

  3. Major ways of randomizing networks
     • Randomization with the Erdős-Rényi (ER) model
       • Keeps the same number of nodes and edges as in the original graph.
       • First create all the nodes, then draw edges between them, with equal probability for all node pairs.
       • Does not preserve the network topology (e.g. “hubs” of the original network are lost in the randomized one).
       • Example of application: assessing the impact of the presence of hubs on path finding methods.
     Figure: ER randomization
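The ER procedure described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used by any of the tools listed in the presentation: the function name `er_randomize` and the toy edge list are hypothetical, the graph is assumed to be undirected and without self-loops, and nodes are inferred from the edge list (so isolated nodes would be missed).

```python
import random
from itertools import combinations

def er_randomize(edges, seed=None):
    """Return a random undirected graph with the same nodes and edge count.

    Hypothetical helper: every unordered pair of distinct nodes is equally
    likely to be chosen as an edge, so the original degrees are not kept.
    """
    rng = random.Random(seed)
    # Nodes are inferred from the edges (assumption: no isolated nodes).
    nodes = sorted({node for edge in edges for node in edge})
    # Enumerating all pairs is quadratic in the number of nodes; this is
    # fine for a toy example but costly for large networks.
    all_pairs = list(combinations(nodes, 2))
    n_edges = len({frozenset(edge) for edge in edges})
    return rng.sample(all_pairs, n_edges)

# Toy network with one hub ("A"); the node names are purely illustrative.
original = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("E", "F")]
randomized = er_randomize(original, seed=0)
```

Because edges are drawn uniformly among all node pairs, a hub such as "A" in the toy network will generally not keep its high degree in the randomized graph.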

  4. Major ways of randomizing networks
     • Randomization with node degree (ND) conservation
       • Consists in shuffling the edges between the nodes.
       • Keeps the same number of nodes and edges as in the original graph.
       • Moreover, each node keeps the same degree.
       • A good random control for biological networks, as it keeps the global network topology: hubs stay hubs but are no longer connected to the same nodes.
       • Example of application: estimating the random expectation for clustering methods.
     Figure: ND randomization
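One common way to shuffle edges while conserving node degrees is repeated edge swapping: pick two edges a-b and c-d and rewire them into a-d and c-b, rejecting swaps that would create a self-loop or a duplicate edge. The slide does not say which algorithm is used, so the sketch below is an assumption; the function name `degree_preserving_shuffle`, its parameters, and the toy network are hypothetical.

```python
import random
from collections import Counter

def degree_preserving_shuffle(edges, n_swaps=None, seed=None):
    """Shuffle an undirected simple graph by edge swaps (hypothetical helper)."""
    rng = random.Random(seed)
    edge_list = [tuple(edge) for edge in edges]
    if len(edge_list) < 2:
        return edge_list  # nothing to swap
    edge_set = {frozenset(edge) for edge in edge_list}
    if n_swaps is None:
        n_swaps = 10 * len(edge_list)  # arbitrary default, an assumption
    done, tries = 0, 0
    # Cap the number of attempts so the loop ends even if few swaps are valid.
    while done < n_swaps and tries < 100 * n_swaps:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        if a == d or c == b:
            continue  # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set:
            continue  # would create a duplicate edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list

def degrees(edges):
    """Count how many edges touch each node."""
    return Counter(node for edge in edges for node in edge)

# Toy network; the node names are purely illustrative.
original = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
            ("E", "F"), ("B", "C"), ("D", "F")]
shuffled = degree_preserving_shuffle(original, seed=1)
```

Each accepted swap removes one edge end from a, b, c and d and gives one back to each of them, which is why every node keeps its degree.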

  6. Major ways of randomizing networks
     • Permutation of node labels
       • Consists in switching the node names.
       • Keeps the same number of nodes and edges as in the original graph.
       • Keeps the network topology completely, but the identities of the nodes are changed.
       • This mode of randomization is relevant only for applications where the node identity matters.
       • Example of application: comparison between graph clusters and functional catalogs (e.g. metabolic pathways, Gene Ontology, regulons, ...).
     Figure: Node label randomization
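Node label permutation amounts to drawing a random one-to-one mapping between node names and applying it to every edge. The following is a minimal sketch; the function name `permute_node_labels` and the toy network are hypothetical, and nodes are again inferred from the edge list.

```python
import random

def permute_node_labels(edges, seed=None):
    """Relabel nodes with a random permutation (hypothetical helper).

    Returns the relabeled edges and the mapping old label -> new label.
    """
    rng = random.Random(seed)
    nodes = sorted({node for edge in edges for node in edge})
    new_labels = nodes[:]
    rng.shuffle(new_labels)
    mapping = dict(zip(nodes, new_labels))
    relabeled = [(mapping[a], mapping[b]) for a, b in edges]
    return relabeled, mapping

def degree_sequence(edges):
    """Sorted list of node degrees, independent of node names."""
    counts = {}
    for edge in edges:
        for node in edge:
            counts[node] = counts.get(node, 0) + 1
    return sorted(counts.values())

# Toy network; the node names are purely illustrative.
original = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]
relabeled, mapping = permute_node_labels(original, seed=2)
```

The shape of the graph is unchanged (same degree sequence, same paths); only the association between a node name and its position in the network is randomized, which is what matters when clusters are compared with annotations attached to node names.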

  7. Comparison example: Uetz et al. (2000) vs Ito et al. (2001)
     • Number of edges in common: 122
     • Jaccard: 8%
     • P-value: 2.5e-228
     Figure: Comparison example, Uetz (2000) versus Ito (2001)
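The three statistics reported on this slide can be computed from two edge lists as sketched below. The slide does not state how the P-value was obtained; the sketch assumes a hypergeometric test in which the universe is every possible pair among the nodes of both graphs, which may differ from the method actually used. The function name `compare_graphs` and the toy edge lists are hypothetical and unrelated to the Uetz and Ito data.

```python
from math import comb

def compare_graphs(edges1, edges2):
    """Shared edges, Jaccard index and hypergeometric P-value (a sketch)."""
    set1 = {frozenset(edge) for edge in edges1}
    set2 = {frozenset(edge) for edge in edges2}
    common = len(set1 & set2)
    jaccard = common / len(set1 | set2)
    # Assumed universe: all unordered pairs among the nodes of both graphs.
    n_nodes = len({node for edge in set1 | set2 for node in edge})
    n_pairs = comb(n_nodes, 2)
    m1, m2 = len(set1), len(set2)
    # P(X >= common) when m2 edges are drawn among n_pairs possible pairs,
    # m1 of which belong to the first graph.
    favourable = sum(
        comb(m1, k) * comb(n_pairs - m1, m2 - k)
        for k in range(common, min(m1, m2) + 1)
    )
    p_value = favourable / comb(n_pairs, m2)
    return common, jaccard, p_value

# Toy graphs; the node names are purely illustrative.
graph1 = [("A", "B"), ("B", "C"), ("C", "D")]
graph2 = [("B", "A"), ("C", "D"), ("D", "E")]
common, jaccard, p_value = compare_graphs(graph1, graph2)
```

Edges are stored as unordered pairs, so ("A", "B") and ("B", "A") count as the same edge. Applying the same function to a randomized version of one graph gives the kind of negative control discussed in this presentation.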

  8. Comparison example: Uetz et al. (2000) vs Ito et al. (2001)
     • Randomization of the Uetz dataset with node degree conservation
     • Number of edges in common: 4
     • Jaccard: 0.24%
     • P-value: 6.8e-03
     Figure: Random control for the comparison Uetz (2000) versus Ito (2001)

  9. Tools for network randomization
     • NeAT
       • http://rsat.ulb.ac.be/neat/
     • GraphCrunch
       • http://www.ics.uci.edu/~bio-nets/graphcrunch/
     • Network Workbench
       • http://nwb.slis.indiana.edu/
