
Statistical physics of complex networks

Statistical physics of complex networks. Sergei Maslov Brookhaven National Laboratory. Short history: complex systems before & after networks. Statistical physics of complex systems was active in 80’s-90’s (following the chaos boom of 70’s) Fractals (Mandelbrot and many others)


Presentation Transcript


  1. Statistical physics of complex networks Sergei Maslov Brookhaven National Laboratory

  21. Short history: complex systems before & after networks • Statistical physics of complex systems was active in the 80’s-90’s (following the chaos boom of the 70’s) • Fractals (Mandelbrot and many others) • Self-Organized Criticality (Per Bak and co-authors) → sandpiles → granular systems • Complex == multiple time and length scales (e.g. avalanches) → cult of power laws • Cellular automata (mostly in real space+time) • Examples: • earthquakes • disordered moving interfaces • (co)-evolution of species • agent-based modeling (“ants”) • By the end of the 90’s: breakup of the community and specialization • Biology • Economics and finance • Internet • Social sciences

  23. Networks in complex systems • Complex systems • Large number of components interacting with each other • All components and/or interactions are different from each other (unlike in traditional physics where $10^{23}$ electrons are all the same!) • Paradigms: • $10^{4}$ types of proteins in an organism • $10^{6}$ routers in the Internet • $10^{9}$ web pages in the WWW • $10^{11}$ neurons in a human brain • The simplest property, who interacts with whom, can be visualized as a network • Complex networks are just a backbone for complex dynamical processes

  4. Why study the topology of complex networks? • Lots of easily available data: that’s where the state of the art information is (at least in biology) • Large networks may contain information about basic design principles and/or evolutionary history of the complex system • This is similar to paleontology: learning about an animal from its backbone

  5. Inside single cells

  6. A small part of a metabolic network: the citric acid cycle

  7. Metabolic pathway chart by ExPASy

  8. Protein binding networks Baker’s yeast S. cerevisiae (only nuclear proteins shown) Nematode worm C. elegans

  9. Transcription regulatory networks Single-celled eukaryote: S. cerevisiae Bacterium: E. coli

  10. GENOME protein-gene interactions PROTEOME protein-protein interactions METABOLISM bio-chemical reactions slide after Reka Albert

  11. Between cells in a multi-cellular organism

  12. Sea urchin embryonic development (endomesoderm up to 30 hours) by Davidson’s lab

  13. C. elegans neurons

  14. Between organisms

  15. Freshwater food web by Neo Martinez and Richard Williams

  16. Sexual contacts: M. E. J. Newman, The structure and function of complex networks, SIAM Review 45, 167-256 (2003).

  17. Social

  18. High school dating: Data drawn from Peter S. Bearman, James Moody, and Katherine Stovel visualized by Mark Newman

  19. Network of actors co-starring in movies

  20. Networks of scientists’ co-authorship of papers

  21. Webpages connected by hyperlinks on the AT&T website circa 1996 visualized by Mark Newman • Citation networks are similar to the WWW but time-ordered

  22. Technological

  23. Internet as measured by Hal Burch and Bill Cheswick's Internet Mapping Project.

  24. transportation networks: airlines

  25. transportation networks: railway maps Tokyo rail map

  26. Lecture 1: General introduction to networks • Node degree, its distribution, and correlations • Simple models • Preferential attachment and Simon model • Growth model for protein families • Percolation transition on networks • Clustering coefficient • Lectures 2-3: Biomolecular (mostly protein) networks • Regulatory and signaling networks • How many regulators? Bureaucratic collapse • Network motifs in directed (e.g. regulatory) networks • Protein binding networks • Broad degree distributions in protein binding networks and possible explanations • Evolutionary (duplication-divergence) • Biophysical (stickiness) • Functional • Beyond degree distributions: How is it all wired together? Correlations in degrees • Randomization of networks • Law of Mass Action and propagation of perturbations • Lecture 4: Technological and information networks • Diffusion and modules in the Internet, WWW, and scientific citations • Predicting opinions of customers on products (e.g. movies) using knowledge networks

  27. Degree (or connectivity) of a node – the # of neighbors Degree K=2 Degree K=4

  28. Directed networks have in- and out-degrees In-degree $K_{in}=2$ Out-degree $K_{out}=5$

  29. Degree distributions in random and real networks

  30. Poisson distribution Degree distribution in a random network • Randomly throw E edges among N nodes • Solomonoff, Rapoport, Bull. Math. Biophysics (1951); Erdos-Renyi (1960) • Degree distribution: Binomial → Poisson • $K \sim \langle K \rangle$ with no hubs (fast decay of N(K))
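The random-graph construction on this slide can be tried numerically. The following is a minimal sketch, not the author's code: the function name `random_graph_degrees`, the sizes, and the seed are illustrative assumptions, and edges are assumed to be distinct and undirected. For a Poisson-like distribution the variance of the degree should be close to its mean, and the largest degree should stay within a few standard deviations of the mean.

```python
import random
from collections import Counter

def random_graph_degrees(n_nodes, n_edges, seed=0):
    """Throw n_edges distinct undirected edges at random among n_nodes nodes.

    Returns the list of node degrees. (Hypothetical helper for illustration.)
    """
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.sample(range(n_nodes), 2)   # two distinct nodes, no self-loops
        edges.add((min(a, b), max(a, b)))      # store each undirected edge once
    degree = [0] * n_nodes
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

degrees = random_graph_degrees(n_nodes=2000, n_edges=4000)
mean_k = sum(degrees) / len(degrees)                         # equals 2E/N
var_k = sum((k - mean_k) ** 2 for k in degrees) / len(degrees)
histogram = Counter(degrees)                                 # N(K): nodes per degree

print(f"<K> = {mean_k:.2f}, variance = {var_k:.2f}, max K = {max(degrees)}")
```

Printing `sorted(histogram.items())` gives N(K) directly, which can be compared with the broad histograms of real networks shown on the following slides.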

  31. Degree distribution in real protein binding network • Histogram N(K) is broad: most nodes have low degree ~1, few nodes have high degree ~100 • Can be approximately fitted with the functional form $N(K) \sim K^{-\gamma}$ with $\gamma \approx 2.5$

  32. Many real world networks have broad degree distributions

  33. Basic BA-model • Very simple algorithm to implement • start with an initial set of m0 fully connected nodes • e.g. m0 = 3 • now add new vertices one by one, each one with exactly m edges • each new edge connects to an existing vertex in proportion to the number of edges that vertex already has → preferential attachment • easiest if you keep track of edge endpoints in one large array and select an element from this array at random • the probability of selecting any one vertex will be proportional to the number of times it appears in the array, which corresponds to its degree • Example endpoint array: 1 1 2 2 2 3 3 4 5 6 6 7 8 ….

  34. Generating BA graphs – cont’d • To start, each vertex has an equal number of edges (2); endpoint array: 1 1 2 2 3 3 • the probability of choosing any vertex is 1/3 • We add a new vertex, and it will have m edges, here take m=2 • draw 2 random elements from the array – suppose they are 2 and 3; endpoint array: 1 1 2 2 2 3 3 3 4 4 • Now the probabilities of selecting 1, 2, 3, or 4 are 1/5, 3/10, 3/10, 1/5 • Add a new vertex, draw a vertex for it to connect from the array; endpoint array after adding vertex 5: 1 1 2 2 2 3 3 3 3 4 4 4 5 5 • etc.
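The endpoint-array procedure described on the last two slides can be sketched as follows. This is an illustrative implementation under stated assumptions, not the author's code: the name `barabasi_albert` and its parameters are hypothetical, vertices are labeled from 0 rather than from 1, and duplicate draws are redrawn so that each new vertex gets m distinct neighbors (the slides do not say how duplicates are handled).

```python
import random
from collections import Counter

def barabasi_albert(n_nodes, m=2, m0=3, seed=0):
    """Grow a graph by preferential attachment using an array of edge endpoints.

    Returns (edges, endpoints). Each vertex appears in `endpoints` once per
    edge it belongs to, so a uniform draw from the array picks a vertex with
    probability proportional to its degree.
    """
    rng = random.Random(seed)
    # Start with m0 fully connected vertices.
    edges = [(i, j) for i in range(m0) for j in range(i + 1, m0)]
    endpoints = [v for edge in edges for v in edge]
    for new in range(m0, n_nodes):
        targets = set()
        while len(targets) < m:            # assumption: redraw duplicates
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((new, t))
            endpoints.extend((new, t))     # both ends of the new edge enter the array
    return edges, endpoints

edges, endpoints = barabasi_albert(1000)
degree = Counter(endpoints)
print("largest degrees:", sorted(degree.values(), reverse=True)[:5])
```

With m0 = 3 the initial array is `[0, 1, 0, 2, 1, 2]`, which is the 0-based version of the starting array 1 1 2 2 3 3 in the worked example.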

  35. The tale of linear vs exponential growth • Linear growth: Barabasi-Albert model with $\gamma = 3$ is a version of Simon’s word usage model: $\gamma = 2 + \alpha$ • $dn_k/dt = (k-1)\,n_{k-1}/(t + \alpha t) - k\,n_k/(t + \alpha t)$ • Exponential growth: Protein duplication-deletion model: $\gamma = 2 + \omega/(r_{dup} - r_{del})$ • $dn_k/dt = r_{dup}(k-1)\,n_{k-1} - (r_{dup} + r_{del})\,k\,n_k + r_{del}(k+1)\,n_{k+1}$; $N_F = \sum_k n_k$ also grows exponentially: $dN_F/dt = \omega N_G = \omega \sum_k k\,n_k$

  36. Preferential attachment with fitness • Bianconi-Barabasi (2001) • Attractiveness of a node to new edges is given by $f_i k_i / \sum_r f_r k_r$ • For uniform $\rho(f)$: $P_k \sim k^{-(1+C^*)}/\ln(k)$, where $C^* = 1.255$ • Generally C depends on $\rho(f)$ • Some $\rho(f)$ result in “Bose-Einstein condensation” in which super-hubs emerge

  37. Percolation transition in networks

  38. Why should we care? • The most important property of a network. It quantifies how broken up a network is • Below the percolation threshold: many small components • At the percolation threshold: scale-free distribution of component sizes: $P(S) \sim S^{-2.5}$ • Above the percolation threshold: giant connected component and a few small ones • Determines the propagation of perturbations which affect neighbors with probability p (e.g. infections)

  39. Naïve (and wrong) argument • An average node has $\langle K \rangle$ first neighbors, $\langle K \rangle \langle K-1 \rangle$ second neighbors, $\langle K \rangle \langle K-1 \rangle \langle K-1 \rangle$ third neighbors • We neglect overlap between e.g. second and first neighbors: in random networks a small effect ~1/N • If $\langle K-1 \rangle \geq 1$ a single node is connected to a finite fraction of all nodes in the network

  40. Where is it wrong? • Probability to arrive at a node with K neighbors is proportional to K! • All averages have to be modified: $\langle F(K) \rangle \to \langle F(K)\,K \rangle / \langle K \rangle$ • The right answer: $\langle K(K-1) \rangle / \langle K \rangle \geq 1$ → a perturbation would spread • In directed networks it is $\langle K_{in} K_{out} \rangle / \langle K_{in} \rangle \geq 1$ • Correlations between degrees of neighbors and an abnormally large number of triangles (clustering) would affect the answer
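The difference between the naïve estimate and the degree-weighted one can be seen on a small made-up degree sequence. The sketch below is illustrative only: the helper names and the sequence (90 nodes of degree 1 and 10 hubs of degree 10) are assumptions chosen so that the two criteria disagree.

```python
def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

def naive_ratio(degrees):
    """Naive branching estimate <K - 1>: average over nodes picked uniformly."""
    return mean(degrees) - 1

def branching_ratio(degrees):
    """Degree-weighted estimate <K(K-1)>/<K>: a node reached by following an
    edge is picked with probability proportional to its degree K."""
    return mean(k * (k - 1) for k in degrees) / mean(degrees)

# Hypothetical degree sequence: many low-degree nodes plus a few hubs.
degrees = [1] * 90 + [10] * 10

naive = naive_ratio(degrees)        # 1.9 - 1 = 0.9, below 1
correct = branching_ratio(degrees)  # 9 / 1.9, well above 1

print(f"naive <K-1> = {naive:.2f}, weighted <K(K-1)>/<K> = {correct:.2f}")
```

Here the naïve argument predicts that a perturbation dies out, while the weighted average says it spreads, because almost every edge leads to one of the hubs.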

  41. How many clusters? • If $\langle K(K-1) \rangle / \langle K \rangle \ll 1$ there are only small clusters • If $\langle K(K-1) \rangle / \langle K \rangle \approx 1$ cluster sizes S have a scale-free distribution: $P(S) \sim S^{-2.5}$ • If $\langle K(K-1) \rangle / \langle K \rangle \gg 1$ there is one “giant” cluster and a few small ones • Perturbation which affects neighbors with probability p propagates if $p\,\langle K(K-1) \rangle / \langle K \rangle \geq 1$ • For scale-free networks $P(K) \sim K^{-\gamma}$ with $\gamma < 3$, $\langle K^2 \rangle = \infty$ → perturbation always spreads in a large enough network
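The last bullet can be checked numerically by evaluating the spreading threshold for a power-law degree distribution with a maximum degree. This is a sketch under stated assumptions: the function name `critical_probability` is hypothetical, degrees are assumed to run from 1 up to a hard cutoff `k_max` standing in for finite network size, and the threshold is taken as the p at which $p\,\langle K(K-1) \rangle / \langle K \rangle = 1$.

```python
def critical_probability(gamma, k_max):
    """Threshold p_c = <K> / <K(K-1)> for P(K) ~ K^(-gamma), K = 1..k_max.

    k_max is an assumed hard cutoff that stands in for finite network size.
    """
    weights = [(k, k ** -gamma) for k in range(1, k_max + 1)]
    norm = sum(w for _, w in weights)
    mean_k = sum(k * w for k, w in weights) / norm
    mean_kk1 = sum(k * (k - 1) * w for k, w in weights) / norm
    return mean_k / mean_kk1

for k_max in (100, 1000, 10000):
    print(k_max,
          round(critical_probability(2.5, k_max), 4),   # gamma < 3
          round(critical_probability(3.5, k_max), 4))   # gamma > 3
```

For $\gamma = 2.5$ the threshold keeps shrinking as the cutoff grows, so in a large enough network any p suffices; for $\gamma = 3.5$ the second moment converges and the threshold stays of order one.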

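  42. Diameter and mean cluster size are determined by $\langle k(k-1) \rangle / \langle k \rangle$ • Mean diameter L: $1 + \langle k \rangle + \langle k \rangle \frac{\langle k(k-1) \rangle}{\langle k \rangle} + \dots + \langle k \rangle \left( \frac{\langle k(k-1) \rangle}{\langle k \rangle} \right)^{L-1} = N$ → $L \approx \frac{\log(N/\langle k \rangle)}{\log(\langle k(k-1) \rangle / \langle k \rangle)} + 1$ • Mean cluster size below $p_c$: $\langle S \rangle = 1 + \frac{\langle k \rangle}{1 - \langle k(k-1) \rangle / \langle k \rangle}$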
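Both estimates on this slide are simple enough to evaluate directly. The sketch below assumes the reading L ≈ log(N/⟨k⟩)/log(⟨k(k-1)⟩/⟨k⟩) + 1 for the diameter and ⟨S⟩ = 1 + ⟨k⟩/(1 - ⟨k(k-1)⟩/⟨k⟩) for the mean cluster size; the function names and the example numbers are illustrative, and `ratio` denotes ⟨k(k-1)⟩/⟨k⟩.

```python
import math

def estimated_diameter(n_nodes, mean_k, ratio):
    """L ~ log(N/<k>) / log(ratio) + 1, valid above the threshold (ratio > 1)."""
    if ratio <= 1:
        raise ValueError("diameter estimate needs <k(k-1)>/<k> > 1")
    return math.log(n_nodes / mean_k) / math.log(ratio) + 1

def mean_cluster_size(mean_k, ratio):
    """<S> = 1 + <k> / (1 - ratio), valid below the threshold (ratio < 1)."""
    if ratio >= 1:
        raise ValueError("mean cluster size diverges at <k(k-1)>/<k> = 1")
    return 1 + mean_k / (1 - ratio)

# For a Poisson degree distribution <k(k-1)>/<k> equals <k>.
print(estimated_diameter(n_nodes=10**6, mean_k=10, ratio=10))  # about 6 steps
print(mean_cluster_size(mean_k=0.5, ratio=0.5))                # small clusters only
```

The logarithmic dependence on N is the small-world effect: a million-node random network with ten neighbors per node is only about six steps across.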

  43. Amplification ratios • A(dir): 1.08 - E. coli, 0.58 - Yeast • A(undir): 10.5 - E. coli, 13.4 - Yeast • A(PPI): ? - E. coli, 26.3 - Yeast

  44. Clustering coefficient C • $C = 3N_\triangle / \sum_k n_k\,k(k-1)/2$ • Could be defined for individual nodes or as a function of k: $C(k) = 3N_\triangle(k) / [n_k\,k(k-1)/2]$ • C=1 could not be realized if k is heterogeneous • Needs to be compared to its value in randomized networks with the same degree sequence
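As an illustration of the definition, the global clustering coefficient can be computed from an edge list by counting, for every node, how many pairs of its neighbors are themselves connected. This is a minimal sketch with hypothetical names; it assumes an undirected simple graph given as pairs of node labels. Each triangle is counted once at each of its three corners, which is where the factor 3 in the numerator comes from.

```python
from itertools import combinations

def clustering_coefficient(edges):
    """C = 3 * (number of triangles) / sum over nodes of k(k-1)/2."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    closed_pairs = 0   # neighbor pairs that are linked; equals 3 * triangles
    all_pairs = 0      # all neighbor pairs: sum of k(k-1)/2 over nodes
    for nbrs in neighbors.values():
        k = len(nbrs)
        all_pairs += k * (k - 1) // 2
        closed_pairs += sum(1 for u, v in combinations(nbrs, 2)
                            if v in neighbors[u])
    return closed_pairs / all_pairs if all_pairs else 0.0

triangle = [(0, 1), (1, 2), (0, 2)]
print(clustering_coefficient(triangle))             # every neighbor pair is linked
print(clustering_coefficient(triangle + [(2, 3)]))  # a pendant node lowers C
```

To judge whether a measured value is unusually high, the same function would be applied to randomized versions of the network with the same degree sequence, as the last bullet says.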

  45. End lecture 1

  46. Lecture 2

  47. Protein networks

  48. Places to learn molecular biology • Molecular Biology of the Cell. Fourth Edition. Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, Peter Walter. Garland Science. 2002. • DNA from the beginning. http://www.dnaftb.org/ • Online Biology Book. http://gened.emc.maricopa.edu/bio/bio181/BIOBK/BioBookTOC.html • Kimball’s Biology Pages. http://www.ultranet.com/~jkimball/BiologyPages/ • Gene expression. http://vlib.org/Science/Cell_Biology/gene_expression.shtml • Human Genome Project. http://www.ornl.gov/hgmis/ • Microarrays. http://www.gene-chips.com/ From Prof. Michael Hallett (McGill) online lectures

  49. Protein networks • Nodes – proteins • Edges – interactions between proteins • Metabolic (enzymes sharing common metabolites are connected) • Physical (binding interactions) • Regulatory and signaling (transcriptional regulation, protein modifications) • Co-expression networks from microarray data (connect genes with similar expression (abundance) patterns under many conditions) • Genetic interactions, e.g. synthetic lethal protein pairs (removal of either one of the two proteins doesn’t kill the cell, but removal of both does) • Etc, etc, etc.
