Statistical physics of complex networks

Sergei Maslov, Brookhaven National Laboratory

Short history: complex systems before & after networks • Statistical physics of complex systems was active in the 80’s-90’s (following the chaos boom of the 70’s) • Fractals (Mandelbrot and many others) • Self-Organized Criticality (Per Bak and co-authors) → sandpiles → granular systems • Complex == multiple time and length scales (e.g. avalanches) → cult of power laws • Cellular automata (mostly in real space+time) • Examples: • earthquakes • disordered moving interfaces • (co)-evolution of species • agent-based modeling (“ants”) • By the end of the 90’s: breakup of the community and specialization • Biology • Economics and finance • Internet • Social sciences

Networks in complex systems • Complex systems • Large number of components interacting with each other • All components and/or interactions are different from each other (unlike in traditional physics, where $10^{23}$ electrons are all the same!) • Paradigms: • $10^4$ types of proteins in an organism • $10^6$ routers in the Internet • $10^9$ web pages in the WWW • $10^{11}$ neurons in a human brain • The simplest property: who interacts with whom? This can be visualized as a network • Complex networks are just a backbone for complex dynamical processes

Why study the topology of complex networks? • Lots of easily available data: that’s where the state of the art information is (at least in biology) • Large networks may contain information about basic design principles and/or evolutionary history of the complex system • This is similar to paleontology: learning about an animal from its backbone

Inside single cells

A small part of a metabolic network: the citric acid cycle

Metabolic pathway chart by ExPASy

Protein binding networks: baker’s yeast S. cerevisiae (only nuclear proteins shown); nematode worm C. elegans

Transcription regulatory networks: single-celled eukaryote S. cerevisiae; bacterium E. coli

GENOME: protein-gene interactions • PROTEOME: protein-protein interactions • METABOLISM: biochemical reactions (slide after Reka Albert)

Between cells in a multi-cellular organism

Sea urchin embryonic development (endomesoderm up to 30 hours) by Davidson’s lab

C. elegans neurons

Between organisms

Freshwater food web by Neo Martinez and Richard Williams

Sexual contacts: M. E. J. Newman, The structure and function of complex networks, SIAM Review 45, 167-256 (2003).

Social

High school dating: Data drawn from Peter S. Bearman, James Moody, and Katherine Stovel visualized by Mark Newman

Network of actors co-starring in movies

Networks of scientists’ co-authorship of papers

Webpages connected by hyperlinks on the AT&T website circa 1996, visualized by Mark Newman • Citation networks are similar to the WWW but time-ordered

Technological

Internet as measured by Hal Burch and Bill Cheswick's Internet Mapping Project.

Transportation networks: airlines

Transportation networks: railway maps (Tokyo rail map)

Lecture 1: General introduction to networks • Node degree, its distribution, and correlations • Simple models • Preferential attachment and the Simon model • Growth model for protein families • Percolation transition on networks • Clustering coefficient • Lectures 2-3: Biomolecular (mostly protein) networks • Regulatory and signaling networks • How many regulators? Bureaucratic collapse • Network motifs in directed (e.g. regulatory) networks • Protein binding networks • Broad degree distributions in protein binding networks and possible explanations • Evolutionary (duplication-divergence) • Biophysical (stickiness) • Functional • Beyond degree distributions: how is it all wired together? Correlations in degrees • Randomization of networks • Law of Mass Action and propagation of perturbations • Lecture 4: Technological and information networks • Diffusion and modules in the Internet, WWW, and scientific citations • Predicting opinions of customers on products (e.g. movies) using knowledge networks

Degree (or connectivity) of a node – the # of neighbors Degree K=2 Degree K=4

Directed networks have in- and out-degrees • In-degree $K_{in}=2$ • Out-degree $K_{out}=5$
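A minimal sketch of how in- and out-degrees can be counted from a list of directed edges. The function name `in_out_degrees` and the toy edge list are hypothetical; the toy network is chosen only so that node "a" reproduces the values quoted on the slide (in-degree 2, out-degree 5).

```python
from collections import Counter

def in_out_degrees(edges):
    """Return (in_degree, out_degree) Counters for directed edges given as (source, target)."""
    k_in, k_out = Counter(), Counter()
    for source, target in edges:
        k_out[source] += 1   # the edge leaves `source`
        k_in[target] += 1    # the edge arrives at `target`
    return k_in, k_out

# Hypothetical toy network: two edges point into "a", five edges point out of it.
edges = [("x", "a"), ("y", "a"),
         ("a", "p"), ("a", "q"), ("a", "r"), ("a", "s"), ("a", "t")]
k_in, k_out = in_out_degrees(edges)
print(k_in["a"], k_out["a"])
```

For an undirected network the same loop with both counters merged gives the plain degree K.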

Degree distributions in random and real networks

Degree distribution in a random network: Poisson distribution • Randomly throw E edges among N nodes • Solomonoff, Rapoport, Bull. Math. Biophysics (1951); Erdős-Rényi (1960) • Degree distribution: binomial → Poisson • $K \sim \langle K \rangle$ with no hubs (fast decay of N(K))
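The random-graph recipe above can be sketched in a few lines. This is an illustration under stated assumptions, not the construction used for the slide's figure: the sizes (N=2000, E=4000), the seed, and the function names are hypothetical, and self-loops and duplicate edges are excluded by choice.

```python
import math
import random
from collections import Counter

def random_graph_degrees(n_nodes, n_edges, seed=0):
    """Throw n_edges distinct undirected edges at random among n_nodes nodes; return the degree list."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.sample(range(n_nodes), 2)   # two distinct nodes, so no self-loops
        edges.add((min(a, b), max(a, b)))      # store each undirected edge once
    degree = [0] * n_nodes
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

def poisson_pmf(k, mean):
    """Poisson probability of observing degree k when the mean degree is `mean`."""
    return math.exp(-mean) * mean**k / math.factorial(k)

n_nodes, n_edges = 2000, 4000
degree = random_graph_degrees(n_nodes, n_edges)
mean_k = sum(degree) / n_nodes        # equals 2E/N by construction
histogram = Counter(degree)

# Compare the measured fraction of nodes with degree k to the Poisson prediction.
for k in range(11):
    print(k, histogram[k] / n_nodes, round(poisson_pmf(k, mean_k), 4))
```

The measured fractions should track the Poisson column closely, and no node should have a degree far above the mean, which is the "no hubs" statement.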

Degree distribution in a real protein binding network • Histogram N(K) is broad: most nodes have low degree ~1, a few nodes have high degree ~100 • Can be approximately fitted with the functional form $N(K) \sim K^{-\gamma}$ with $\gamma \approx 2.5$

Many real-world networks have broad degree distributions

Basic BA-model • Very simple algorithm to implement • start with an initial set of $m_0$ fully connected nodes • e.g. $m_0 = 3$ • now add new vertices one by one, each one with exactly m edges • each new edge connects to an existing vertex in proportion to the number of edges that vertex already has → preferential attachment • easiest if you keep track of edge endpoints in one large array and select an element from this array at random • the probability of selecting any one vertex will be proportional to the number of times it appears in the array, which corresponds to its degree • example endpoint array: 1 1 2 2 2 3 3 4 5 6 6 7 8 …

Generating BA graphs (cont’d) • To start, each vertex has an equal number of edges (2); endpoint array: 1 1 2 2 3 3 • the probability of choosing any vertex is 1/3 • We add a new vertex, and it will have m edges; here take m=2 • draw 2 random elements from the array; suppose they are 2 and 3; the endpoint array becomes 1 1 2 2 2 3 3 3 4 4 • Now the probabilities of selecting 1, 2, 3, or 4 are 1/5, 3/10, 3/10, 1/5 • Add a new vertex and draw the vertices it connects to from the array (e.g. connecting vertex 5 to 3 and 4 gives 1 1 2 2 2 3 3 3 3 4 4 4 5 5) • etc.
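The endpoint-array recipe described on these two slides can be sketched as follows. Two details are assumptions that the slides do not specify: repeated draws of the same target are rejected so that a new vertex makes m distinct connections, and m must not exceed $m_0$. The function name `barabasi_albert` and the seed are likewise hypothetical.

```python
import random
from collections import Counter

def barabasi_albert(n_nodes, m=2, m0=3, seed=0):
    """Grow a BA graph with vertices labeled 1..n_nodes; return (edges, endpoints)."""
    if m > m0:
        raise ValueError("m must not exceed m0, or distinct targets may not exist")
    rng = random.Random(seed)
    # Start from m0 fully connected vertices.
    edges = [(i, j) for i in range(1, m0 + 1) for j in range(i + 1, m0 + 1)]
    # Each vertex appears in `endpoints` once per edge it belongs to (its degree).
    endpoints = [v for edge in edges for v in edge]
    for new in range(m0 + 1, n_nodes + 1):
        targets = set()
        while len(targets) < m:
            # A uniform draw from the array is a degree-proportional draw of a vertex.
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges, endpoints

edges, endpoints = barabasi_albert(2000)
degree = Counter(endpoints)
print("largest degree:", max(degree.values()))
print("mean degree:", sum(degree.values()) / len(degree))
```

Keeping one flat array makes each preferential draw O(1), at the cost of storing two entries per edge.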

The tale of linear vs exponential growth • Linear growth: the Barabasi-Albert model with $\gamma=3$ is a version of Simon’s word usage model: $\gamma = 2+\epsilon$ • $dn_k/dt = (k-1)\,n_{k-1}/(t+\epsilon t) - k\,n_k/(t+\epsilon t)$ • Exponential growth: protein duplication-deletion model: $\gamma = 2 + \nu/(r_{dup}-r_{del})$ • $dn_k/dt = r_{dup}\,(k-1)\,n_{k-1} - (r_{dup}+r_{del})\,k\,n_k + r_{del}\,(k+1)\,n_{k+1}$; $N_F=\sum_k n_k$ also grows exponentially: $dN_F/dt = \nu N_G = \nu \sum_k k\,n_k$
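As a worked check of the linear-growth case, the sketch below solves the rate equation for the Barabasi-Albert model. It assumes (this is not stated on the slide) that one vertex with one edge is added per unit time, so that the total degree is 2t, new vertices enter with k=1, and the steady state has the form $n_k(t) = t\,p_k$.

```latex
\begin{align}
\frac{dn_k}{dt} &= \frac{(k-1)\,n_{k-1} - k\,n_k}{2t} + \delta_{k,1},
  \qquad n_k(t) = t\,p_k \\
p_k\left(1 + \frac{k}{2}\right) &= \frac{k-1}{2}\,p_{k-1} + \delta_{k,1}
  \;\Rightarrow\; p_1 = \frac{2}{3}, \quad
  p_k = \frac{k-1}{k+2}\,p_{k-1} \quad (k \ge 2) \\
p_k &= \frac{4}{k(k+1)(k+2)} \sim 4\,k^{-3} \quad (k \gg 1)
\end{align}
```

The large-k tail reproduces the exponent 3 quoted for the Barabasi-Albert model.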

Preferential attachment with fitness • Bianconi-Barabasi (2001) • Attractiveness of a node to new edges is given by $f_i k_i / \sum_r f_r k_r$ • For uniform $\rho(f)$: $P_k \sim k^{-(1+C^*)}/\ln(k)$, where $C^*=1.255$ • Generally C depends on $\rho(f)$ • Some $\rho(f)$ result in “Bose-Einstein condensation” in which super-hubs emerge

Percolation transition in networks

Why should we care? • Percolation is the most important property of a network: it quantifies how broken up a network is • Below the percolation threshold: many small components • At the percolation threshold: scale-free distribution of component sizes: $P(S) \sim S^{-2.5}$ • Above the percolation threshold: a giant connected component and a few small ones • Determines the propagation of perturbations which affect neighbors with probability p (e.g. infections)

Naïve (and wrong) argument • An average node has $\langle K\rangle$ first neighbors, $\langle K\rangle\langle K-1\rangle$ second neighbors, $\langle K\rangle\langle K-1\rangle\langle K-1\rangle$ third neighbors • We neglect overlap between e.g. second and first neighbors: in random networks this is a small effect ~1/N • If $\langle K-1\rangle \ge 1$, a single node is connected to a finite fraction of all nodes in the network

Where is it wrong? • The probability to arrive at a node with K neighbors is proportional to K • All averages have to be modified: $\langle F(K)\rangle \to \langle F(K)\,K\rangle/\langle K\rangle$ • The right answer: if $\langle K(K-1)\rangle/\langle K\rangle \ge 1$, a perturbation would spread • In directed networks the condition is $\langle K_{in}K_{out}\rangle/\langle K_{in}\rangle \ge 1$ • Correlations between degrees of neighbors and an abnormally large number of triangles (clustering) would affect the answer

How many clusters? • If $\langle K(K-1)\rangle/\langle K\rangle \ll 1$ there are only small clusters • If $\langle K(K-1)\rangle/\langle K\rangle \approx 1$, cluster sizes S have a scale-free distribution: $P(S)\sim S^{-2.5}$ • If $\langle K(K-1)\rangle/\langle K\rangle \gg 1$ there is one “giant” cluster and a few small ones • A perturbation which affects neighbors with probability p propagates if $p\,\langle K(K-1)\rangle/\langle K\rangle \ge 1$ • For scale-free networks $P(K)\sim K^{-\gamma}$ with $\gamma<3$, $\langle K^2\rangle = \infty$ → a perturbation always spreads in a large enough network
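The spreading criterion can be evaluated directly from a degree sequence. The sketch below is a minimal illustration; the function names and the toy degree sequence (90 nodes of degree 1 plus 10 hubs of degree 10) are hypothetical. The toy case is chosen because the naive estimate <K-1> is below 1 while the edge-weighted ratio <K(K-1)>/<K> is well above 1.

```python
def branching_ratio(degrees):
    """<K(K-1)>/<K>: mean number of onward neighbors reached by following a random edge."""
    total = sum(degrees)
    if total == 0:
        raise ValueError("network has no edges")
    return sum(k * (k - 1) for k in degrees) / total

def critical_probability(degrees):
    """Smallest p with p * <K(K-1)>/<K> >= 1; infinite if no edge leads anywhere further."""
    ratio = branching_ratio(degrees)
    return float("inf") if ratio == 0 else 1.0 / ratio

# Hypothetical heterogeneous network: many low-degree nodes and a few hubs.
degrees = [1] * 90 + [10] * 10
naive = sum(degrees) / len(degrees) - 1          # <K-1>, the naive estimate
print("naive <K-1>:", naive)
print("<K(K-1)>/<K>:", branching_ratio(degrees))
print("critical p:", critical_probability(degrees))
```

A value of the critical p above 1 means the perturbation cannot spread for any transmission probability.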

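Diameter and mean cluster size are determined by $\langle k(k-1)\rangle/\langle k\rangle$ • Mean diameter L: $1+\langle k\rangle + \langle k\rangle\frac{\langle k(k-1)\rangle}{\langle k\rangle} + \dots + \langle k\rangle\left(\frac{\langle k(k-1)\rangle}{\langle k\rangle}\right)^{L-1} = N$ → $L \approx \log(N/\langle k\rangle)/\log(\langle k(k-1)\rangle/\langle k\rangle) + 1$ • Mean cluster size below $p_c$: $\langle S\rangle = 1 + \langle k\rangle/(1-\langle k(k-1)\rangle/\langle k\rangle)$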
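Both estimates on this slide depend only on the first two moments of the degree sequence, so they are easy to evaluate numerically. The sketch below is an illustration with hypothetical function names and toy degree sequences; it applies the diameter formula only above the threshold and the cluster-size formula only below it.

```python
import math

def degree_moments(degrees):
    """Return (<k>, <k(k-1)>/<k>) for a degree sequence."""
    mean_k = sum(degrees) / len(degrees)
    ratio = sum(k * (k - 1) for k in degrees) / sum(degrees)
    return mean_k, ratio

def estimated_diameter(degrees):
    """L ~ log(N/<k>) / log(<k(k-1)>/<k>) + 1, meaningful only if the ratio exceeds 1."""
    mean_k, ratio = degree_moments(degrees)
    if ratio <= 1:
        raise ValueError("no giant component: the diameter estimate does not apply")
    return math.log(len(degrees) / mean_k) / math.log(ratio) + 1

def mean_cluster_size(degrees):
    """<S> = 1 + <k> / (1 - <k(k-1)>/<k>), meaningful only if the ratio is below 1."""
    mean_k, ratio = degree_moments(degrees)
    if ratio >= 1:
        raise ValueError("at or above the threshold: the mean cluster size diverges")
    return 1 + mean_k / (1 - ratio)

print(estimated_diameter([3] * 1000))           # 1000 nodes, every node has degree 3
print(mean_cluster_size([1] * 80 + [2] * 20))   # sparse network below the threshold
```

A quick sanity check: for a network made only of isolated edges (all degrees equal to 1) the ratio is 0 and the formula gives a mean cluster size of 2, as it should.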

Amplification ratios • A(dir): 1.08 - E. coli, 0.58 - Yeast • A(undir): 10.5 - E. coli, 13.4 - Yeast • A(PPI): ? - E. coli, 26.3 - Yeast

Clustering coefficient C • $C = 3N_\Delta / \sum_k n_k\,k(k-1)/2$, where $N_\Delta$ is the number of triangles • Could be defined for individual nodes or as a function of k: $C(k) = 3N_\Delta(k)/[n_k\,k(k-1)/2]$ • C=1 could not be realized if k is heterogeneous • Needs to be compared to its value in randomized networks with the same degree sequence
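A minimal sketch of the global clustering coefficient, computed as three times the number of triangles divided by the number of connected triples (pairs of neighbors of a common node). The function names and the toy edge lists are hypothetical; the comparison against randomized networks mentioned on the slide is not included here.

```python
from itertools import combinations

def build_adjacency(edges):
    """Map each node to the set of its neighbors, ignoring self-loops."""
    adjacency = {}
    for a, b in edges:
        if a == b:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def clustering_coefficient(edges):
    """C = 3 * (number of triangles) / sum over nodes of k(k-1)/2."""
    adjacency = build_adjacency(edges)
    closed = 0    # neighbor pairs that are themselves linked; each triangle is counted 3 times
    triples = 0   # all neighbor pairs, i.e. k(k-1)/2 summed over nodes
    for neighbors in adjacency.values():
        k = len(neighbors)
        triples += k * (k - 1) // 2
        closed += sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return closed / triples if triples else 0.0

triangle = [(1, 2), (2, 3), (1, 3)]
print(clustering_coefficient(triangle))             # every triple is closed
print(clustering_coefficient(triangle + [(3, 4)]))  # a pendant node adds open triples
```

Counting closed neighbor pairs per node yields $3N_\Delta$ directly, so no separate division by three is needed.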

End lecture 1

Lecture 2

Protein networks

Places to learn molecular biology • Molecular Biology of the Cell. Fourth Edition. Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, Peter Walter. Garland Science. 2002. • DNA from the beginning. http://www.dnaftb.org/ • Online Biology Book. http://gened.emc.maricopa.edu/bio/bio181/BIOBK/BioBookTOC.html • Kimball’s Biology Pages. http://www.ultranet.com/~jkimball/BiologyPages/ • Gene expression. http://vlib.org/Science/Cell_Biology/gene_expression.shtml • Human Genome Project. http://www.ornl.gov/hgmis/ • Microarrays. http://www.gene-chips.com/ From Prof. Michael Hallett (McGill) online lectures

Protein networks • Nodes: proteins • Edges: interactions between proteins • Metabolic (enzymes sharing common metabolites are connected) • Physical (binding interactions) • Regulatory and signaling (transcriptional regulation, protein modifications) • Co-expression networks from microarray data (connect genes with similar expression (abundance) patterns under many conditions) • Genetic interactions, e.g. synthetic lethal protein pairs (removal of either one of the two proteins doesn’t kill the cell, but removal of both does) • Etc.
