Quantifying contributions of mutations and homologous recombination to E. coli genomic diversity


Sergei Maslov

Department of Biosciences, Brookhaven National Laboratory, New York



Bacterial genome evolution happens in cooperation with phages




Variation between E. coli strains

FW Studier, P Daegelen, RE Lenski, S Maslov, JF Kim, JMB (2009)

Pan-genome of E. coli

Comparison of B vs K-12 strains of E. coli

M Touchon et al. PLoS Genetics (2009)

Copy and Insert

Copy and Replace



Usual suspects are there but do not explain heterogeneity

  • Negative correlation with protein abundance: 2.5% of variation, $P$-value $= 10^{-5}$

  • Positive correlation with distance from origin of replication: 0.4% of variation, $P$-value $= 10^{-2}$



High SNP numbers are clustered along the chromosome


[Figure: regions labeled clonal and recombined]



P. Dixit, T. Y. Pang, Studier FW, Maslov S, PNAS submitted (2013)


[Figure: clonal regions vs. recombined regions]

SNPs by recombination / SNPs by clonal mutations: $r/\mu = 6 \pm 1$

P. Dixit, T. Y. Pang, Studier FW, Maslov S, PNAS submitted (2013)
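To make the ratio concrete, here is a minimal sketch of one way $r/\mu$ could be estimated once SNPs are counted in 1-kb windows. It is not the authors' procedure: the fixed threshold for calling a window "recombined", the helper `estimate_r_over_mu`, and the toy counts are hypothetical. The clonal background is taken as the mean SNP count of clonal windows and assumed to be present in every window; the excess SNPs in recombined windows are attributed to recombination.

```python
def estimate_r_over_mu(snps_per_window, threshold):
    """Ratio of SNPs brought in by recombination to SNPs from clonal mutation.

    Hypothetical classifier: a window with more than `threshold` SNPs is
    called recombined; all other windows are treated as clonal.
    """
    clonal = [s for s in snps_per_window if s <= threshold]
    recombined = [s for s in snps_per_window if s > threshold]
    if not clonal:
        raise ValueError("no clonal windows to estimate the mutation background")
    # Clonal background: mean SNPs per window accumulated by point mutation.
    clonal_rate = sum(clonal) / len(clonal)
    # Every window carries that background, recombined windows included.
    snps_by_mutation = clonal_rate * len(snps_per_window)
    # SNPs in recombined windows beyond the background came in by recombination.
    snps_by_recombination = sum(s - clonal_rate for s in recombined)
    return snps_by_recombination / snps_by_mutation


# Toy input (not real data): six clonal-looking windows and two recombined ones.
toy_counts = [0, 1, 0, 2, 1, 0, 30, 45]
print(estimate_r_over_mu(toy_counts, threshold=5))
```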


Strains: K-12 vs. ETEC H10407, HS, and O157:H7 Sakai

P. Dixit, T. Y. Pang, Studier FW, Maslov S, PNAS submitted (2013)

Neutral model: mutations and recombinations among 70 “genes” in a population of $10^4$

C. Fraser et al. (2007) and (2009)
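A neutral model of this kind can be sketched as a Wright-Fisher simulation in which genomes are lists of allele labels, one per gene. This is a simplified illustration, not the exact model of Fraser et al.: recombination here copies a gene from a random donor regardless of sequence divergence, and the function names, rates, and the small population used in the example are assumptions chosen so it runs quickly.

```python
import random


def simulate_neutral(pop_size, n_genes, mu, rho, generations, seed=0):
    """Toy Wright-Fisher model with mutation and gene-level recombination.

    Each genome is a list of allele labels, one per gene. Every generation,
    each offspring copies a random parent; then, gene by gene, with
    probability rho it takes that gene's allele from a random donor in the
    parental generation (recombination), and with probability mu it gets a
    brand-new allele label (point mutation).
    """
    rng = random.Random(seed)
    population = [[0] * n_genes for _ in range(pop_size)]  # start clonal
    next_label = 1
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            child = list(rng.choice(population))
            for g in range(n_genes):
                if rng.random() < rho:
                    child[g] = rng.choice(population)[g]
                if rng.random() < mu:
                    child[g] = next_label
                    next_label += 1
            offspring.append(child)
        population = offspring
    return population


def alleles_per_gene(population):
    """Number of distinct alleles segregating at each gene."""
    n_genes = len(population[0])
    return [len({genome[g] for genome in population}) for g in range(n_genes)]


# The slide's setting is 70 genes and a population of 10**4; a smaller
# population keeps this pure-Python example fast.
pop = simulate_neutral(pop_size=200, n_genes=70, mu=0.01, rho=0.05, generations=30)
print(max(alleles_per_gene(pop)))
```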


Phase transition at $\Delta_c = 1.5\%$

P. Dixit, T. Y. Pang, Studier FW, Maslov S, PNAS submitted (2013)



P. Dixit, T. Y. Pang, Studier FW, Maslov S, PNAS submitted (2013)



Why exponential tail?

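  • Time to coalescence: $P(t) = \frac{1}{N_e}\left(1 - \frac{1}{N_e}\right)^{t-1} \approx \frac{1}{N_e} e^{-t/N_e}$; since two strains diverge as $2\mu t$, the exponential slope is $1/(2\mu N_e)$, or $1/\theta$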

  • Population size $N_e = (1 \pm 0.1) \times 10^9$, consistent with earlier estimates
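The link between geometric coalescence times and the exponential tail can be checked with a short Monte Carlo sketch. It assumes each pair of strains differs by exactly $2\mu t$ (ignoring Poisson noise in SNP counts), and the values of $N_e$ and $\mu$ are illustrative, far smaller than the estimates above.

```python
import math
import random


def sample_divergences(n_pairs, n_e, mu, seed=0):
    """Divergence 2*mu*t for random pairs of strains, where the coalescence
    time t is geometric: P(t) = (1/n_e) * (1 - 1/n_e)**(t - 1), t >= 1."""
    rng = random.Random(seed)
    log_stay = math.log(1.0 - 1.0 / n_e)
    divergences = []
    for _ in range(n_pairs):
        u = 1.0 - rng.random()  # uniform in (0, 1], avoids log(0)
        t = int(math.log(u) / log_stay) + 1  # inverse-CDF geometric draw
        divergences.append(2.0 * mu * t)  # both lineages accumulate mutations
    return divergences


n_e, mu = 1000, 1e-4
theta = 2 * mu * n_e
d = sample_divergences(100_000, n_e, mu)
print(sum(d) / len(d), theta)  # the sample mean should be close to theta
```

Because $t$ is geometric with mean $N_e$, the sampled divergences are (nearly) exponential with mean $\theta = 2\mu N_e$; on a semi-log histogram they fall on a line of slope $-1/\theta$.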


Why $N_e \ll N$?

  • Phages:

    • But: there are phages that cross species boundaries.

    • Also, the slope is similar for different species

  • Restriction modification system:

    • Recombined segments are not continuous [Milkman R, Bridges MM. Genetics 1990]

  • Recombination efficiency:

    • Need 20-30 identical bases to start recombination

    • Our slope predicts 60 bases, which roughly matches 30 at the beginning and 30 at the end

  • Species are defined by recombination
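The recombination-efficiency argument can be made concrete with a short calculation: if recombination needs a run of about 60 identical bases (about 30 at each end) and sites differ independently with probability $\delta$, the chance that a given 60-base stretch is identical is $(1-\delta)^{60} \approx e^{-60\delta}$, which falls off exponentially with divergence. The independence of sites and the helper `prob_identical_stretch` are simplifying assumptions for illustration.

```python
import math


def prob_identical_stretch(divergence, length=60):
    """Probability that `length` consecutive aligned bases are all identical,
    assuming each site differs independently with probability `divergence`."""
    return (1.0 - divergence) ** length


for delta in (0.001, 0.01, 0.02, 0.05):
    exact = prob_identical_stretch(delta)
    approx = math.exp(-60 * delta)  # small-divergence approximation
    print(f"{delta:.3f}  {exact:.3f}  {approx:.3f}")
```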



Are our 30+ strains a representative sample?

  • Fully sequenced genomes:

    • 1000s of genes (unbiased and complete)

    • 10s of strains (biased)

  • MLST data:

    • 10s of genes (biased)

    • 1000s of strains (unbiased, I hope)

  • Database: http://mlst.ucc.ie

    • ∼3000 E. coli strains

    • 7 short regions of ~500 base pairs each in housekeeping genes


[Figure: MLST data vs. genomes (dashed line)]



Is it really phages?

1 kb: gene length

K-12 to B comparison

Phage capacity: 20 kb; other strains: up to 40 kb


Does the neutral model explain everything?

  • At 3 standard deviations:

    • 19 1-kb regions are supervariable

    • 29 1-kb regions are superconserved
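A minimal sketch of the 3-standard-deviation test, assuming the neutral model supplies an expected mean and standard deviation of the SNP count for each 1-kb region; how those were obtained is not shown here, and `flag_outliers` is a hypothetical helper.

```python
def flag_outliers(observed, expected_mean, expected_sd, n_sd=3.0):
    """Label regions whose SNP count lies more than n_sd standard deviations
    from the neutral-model expectation."""
    labels = []
    for obs, mean, sd in zip(observed, expected_mean, expected_sd):
        z = (obs - mean) / sd
        if z > n_sd:
            labels.append("supervariable")
        elif z < -n_sd:
            labels.append("superconserved")
        else:
            labels.append("neutral")
    return labels


# Toy numbers (not real data): three 1-kb regions.
print(flag_outliers([40, 2, 11], [10, 10, 10], [3, 2, 3]))
```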


Collaborators & funding

  • Bill Studier (BNL)

  • Purushottam Dixit (BNL)

  • Tin Yau Pang (Stony Brook)

  • Rich Lenski (Michigan State)

  • Patrick Daegelen (France)

  • Jihyun Kim (Korea)

  • DOE Systems Biology Knowledgebase (KBase)

  • Adam Arkin (Berkeley)

  • Rick Stevens (Argonne)

  • Bob Cottingham (Oak Ridge)

  • Mark Gerstein (Yale)

  • Doreen Ware (Cold Spring Harbor)

  • Mike Schatz (Cold Spring Harbor)

  • Dave Weston (ORNL)

  • 60+ other collaborators



Thank you!


Genes encoded in bacterial genomes ~ packages installed on Linux computers



  • Complex systems have many components

    • Genes (Bacteria)

    • Software packages (Linux OS)

  • Components do not work alone: they need to be assembled to work

  • In individual systems only a subset of components is used

    • Genome (Bacteria) – bag of genes

    • Computer (Linux OS) – installed packages

  • Components have vastly different frequencies of use


IKEA has many components

Justin Pollard, http://www.designboom.com



They need to be assembled to work

Justin Pollard, http://www.designboom.com


Different frequencies of use

[Figure: common vs. rare components]


What determines the frequency of use?

  • Popularity, a.k.a. preferential attachment:

    • Frequency ~ self-amplifying popularity

    • Relevant for social systems: WWW links, Facebook friendships, scientific citations

  • Functional role:

    • Frequency ~ breadth or importance of the functional role

    • Relevant for biological and technological systems, where selection corrects undeserved popularity
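The "popularity" mechanism can be sketched with a Simon-type preferential attachment process, in which use breeds more use regardless of function. This is a generic textbook model chosen for illustration, not the model used in the talk; `p_new` (the probability that a brand-new component appears at each step) is an assumed parameter.

```python
import random


def simon_model(steps, p_new, seed=0):
    """Preferential attachment (Simon/Yule-type process).

    At each step, with probability p_new a brand-new component appears with
    one use; otherwise an existing component gains one use, chosen with
    probability proportional to its current number of uses.
    """
    rng = random.Random(seed)
    counts = [1]  # start with a single component used once
    uses = [0]    # one entry per use, so a uniform pick is a pick ∝ count
    for _ in range(steps):
        if rng.random() < p_new:
            counts.append(1)
            uses.append(len(counts) - 1)
        else:
            chosen = rng.choice(uses)
            counts[chosen] += 1
            uses.append(chosen)
    return counts


counts = simon_model(steps=50_000, p_new=0.1)
print(len(counts), max(counts))  # a few early components become very popular
```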



Empirical data on component frequencies

  • Bacterial genomes (eggnog.embl.de):

    • 500 sequenced prokaryotic genomes

    • 44,000 orthologous gene families

  • Linux packages (popcon.ubuntu.com):

    • 200,000 Linux packages installed on

    • 2,000,000 individual computers

  • Binary tables: each component is either present or absent in a given system
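With data stored as a binary presence/absence table, the frequency $f$ of a component is simply the fraction of systems that contain it. A minimal sketch; the dictionary-of-sets layout and the package names are illustrative assumptions, not the format of the eggNOG or popcon data.

```python
def component_frequencies(presence):
    """Frequency f of each component = fraction of systems in which it is present.

    `presence` maps system name -> set of components it contains
    (genes in a genome, or packages installed on a computer).
    """
    n_systems = len(presence)
    counts = {}
    for components in presence.values():
        for c in components:
            counts[c] = counts.get(c, 0) + 1
    return {c: k / n_systems for c, k in counts.items()}


# Toy binary table (hypothetical systems and components).
table = {
    "computer_1": {"libc6", "bash", "gimp"},
    "computer_2": {"libc6", "bash"},
    "computer_3": {"libc6", "emacs"},
    "computer_4": {"libc6", "bash"},
}
print(component_frequencies(table))
```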



Frequency distributions

[Figure: frequency distribution with regions labeled cloud, shell, core, and ORFans]

$P(f) \sim f^{-1.5}$, except the top $\sqrt{N}$ “universal” components with $f \approx 1$



How to quantify functional importance?

  • Components do not work alone

  • Breadth/importance ~ the component is needed for the proper functioning of other components

  • Dependency network

    • A → B means A depends on B for its function

    • Formalized for Linux software packages

    • For metabolic enzymes, given by upstream-downstream positions in pathways

  • Frequency ~ dependency degree, $K_{dep}$

    • $K_{dep}$ = the total number of components that directly or indirectly depend on the selected one
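Computing $K_{dep}$ amounts to counting, for each component, everything that can reach it along dependency edges. A minimal sketch using a depth-first search on the reversed graph; the package names are made up, and the quadratic worst-case cost is fine for illustration but not tuned for 200,000 packages.

```python
from collections import defaultdict


def dependency_degrees(depends_on):
    """K_dep for every component: how many components depend on it,
    directly or indirectly.

    `depends_on` maps each component to the components it needs
    (an edge A -> B means A depends on B). The graph is assumed acyclic.
    """
    # Reverse the edges: for B, collect the components that directly need B.
    needed_by = defaultdict(set)
    nodes = set(depends_on)
    for a, deps in depends_on.items():
        for b in deps:
            needed_by[b].add(a)
            nodes.add(b)

    k_dep = {}
    for node in nodes:
        # Depth-first search over reversed edges collects all dependents.
        seen, stack = set(), list(needed_by[node])
        while stack:
            x = stack.pop()
            if x not in seen:
                seen.add(x)
                stack.extend(needed_by[x])
        k_dep[node] = len(seen)
    return k_dep


# Toy dependency network (hypothetical package names).
deps = {
    "app": {"gui_lib", "libc"},
    "gui_lib": {"libc"},
    "tool": {"libc"},
    "libc": set(),
}
print(dependency_degrees(deps))
```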



Frequency is positively correlated with functional importance

Correlation coefficient ~0.4 for both Linux and genes

Could be improved by using a weighted dependency degree



Tree-like metabolic network

[Figure labels: TCA cycle; $K_{dep} = 15$; $K_{dep} = 5$]



Dependency degree distribution on a critical branching tree

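  • $P(K) \sim K^{-1.5}$ for a critical branching tree

  • Paradox: the largest $K$ among $N$ nodes satisfies $P(K > K_{max}) \sim K_{max}^{-0.5} \sim 1/N \Rightarrow K_{max} \sim N^2 > N$

  • Answer: the size of the parent tree imposes a cutoff: there will be $\sqrt{N}$ “core” nodes with $K_{max} = N$

    • present in almost all systems (ribosomal genes or core metabolic enzymes)

  • Need a new model: in a tree $D = 1$ ($D$ = number of components each component directly depends on), while in real systems $D \approx 2 > 1$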



Dependency network evolution

  • New components are added gradually over time

  • Each new component depends on $D$ randomly selected existing components

  • $K_{dep}(t) \sim (t/N)^{-D}$

  • $P(K_{dep}(t) > K) = P(t/N < K^{-1/D}) = K^{-1/D}$

  • $P(K_{dep}) \sim K_{dep}^{-(1+1/D)} = K_{dep}^{-1.5}$ for $D = 2$

  • $N_{universal} = N^{(D-1)/D} = N^{0.5}$ for $D = 2$
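The growth model can be simulated directly: add components one at a time, let each depend on $D$ randomly chosen existing ones, then count dependents. A minimal sketch; letting the first few components (which have fewer than $D$ predecessors) depend on all available earlier components is an assumption of this sketch, as are the function names.

```python
import random


def grow_dependency_network(n, d, seed=0):
    """Each new component t depends on min(d, t) distinct, randomly chosen
    earlier components; component 0 depends on nothing."""
    rng = random.Random(seed)
    depends_on = [[]]
    for t in range(1, n):
        depends_on.append(rng.sample(range(t), min(d, t)))
    return depends_on


def dependency_degrees(depends_on):
    """K_dep[i] = number of components depending on i directly or indirectly.

    Dependencies always point to earlier components, so sweeping from the
    newest component to the oldest sees each component's dependents complete.
    """
    n = len(depends_on)
    dependents = [set() for _ in range(n)]
    for t in range(n - 1, -1, -1):
        for b in depends_on[t]:
            dependents[b].add(t)
            dependents[b] |= dependents[t]
    return [len(s) for s in dependents]


k_dep = dependency_degrees(grow_dependency_network(n=2000, d=2))
print(sorted(k_dep, reverse=True)[:10])  # the oldest components are near-universal
```

By construction every later component depends, directly or indirectly, on component 0, so `k_dep[0]` equals $N - 1$; a log-log histogram of `k_dep` shows the heavy tail discussed above.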


$K_{dep}$ decreases with layer number

[Figure panels: Linux; model with $D = 2$]


Zipf plot for $K_{dep}$ distributions

[Figure panels: metabolic enzymes vs. model; Linux vs. model]





Why should we care about P(f)?


Metagenomes and pan-genomes

For $P(f) \sim f^{-1.5}$: (pan-genome size) $\sim$ (# of samples)$^{0.5}$

The Human Microbiome Project Consortium, Nature (2012)
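The $(\#\text{ of samples})^{0.5}$ scaling follows from summing, over components, the probability $1-(1-f)^n$ that a component of frequency $f$ appears in at least one of $n$ samples. A minimal numerical sketch, assuming components occur independently in each system and that $P(f) \propto f^{-1.5}$ extends down to an arbitrary cutoff `f_min`:

```python
import math


def expected_pan_genome_fraction(n_samples, f_min=1e-6, steps=20_000):
    """Expected fraction of all components seen at least once in n_samples
    independent systems, for P(f) ∝ f**-1.5 on [f_min, 1], where each
    component is present in a system independently with probability f."""
    log_lo = math.log(f_min)
    du = -log_lo / steps  # integrate over u = ln f on a uniform grid
    norm = seen = 0.0
    for k in range(steps):
        f = math.exp(log_lo + (k + 0.5) * du)  # midpoint rule
        weight = f ** -0.5 * du  # P(f) df = f**-1.5 * f du (unnormalised)
        norm += weight
        seen += weight * (1.0 - (1.0 - f) ** n_samples)
    return seen / norm


for n in (100, 400, 1600):
    print(n, expected_pan_genome_fraction(n))
```

Each fourfold increase in the number of samples roughly doubles the expected pan-genome, as long as $n$ stays well below $1/f_{min}$.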



Pan-genome of E. coli strains

M Touchon et al. PLoS Genetics (2009)


Genome evolution in E. coli:

Studier FW, Daegelen P, Lenski RE, Maslov S, Kim JF, J. Mol. Biol. (2009)

P. Dixit, T. Y. Pang, Studier FW, Maslov S, submitted (2013)


S. Maslov, TY Pang, K. Sneppen, S. Krishna, PNAS (2009)

TY Pang, S. Maslov, PLoS Comp Bio (2011)

How many transcription factors does an organism need?

Regulator genes

Worker genes


Figure adapted from S. Maslov, TY Pang, K. Sneppen, S. Krishna, PNAS (2009)

$N_R \sim N_G^2$, i.e. $N_R/N_G \sim N_G$


Cyril Northcote Parkinson (1909-1993)

"… bureaucracy grew by 5-7% per year 'irrespective of any variation in the amount of work (if any) to be done.'"

Why?

"An official wants to multiply subordinates, not rivals"

"Officials make work for each other", so that "Work expands so as to fill the time available for its completion"

Is this what happens in bacterial genomes? Probably not!


Economies of scale in bacterial evolution

  • $N_R = N_G^2/80{,}000 \Rightarrow N_G/N_R = 80{,}000/N_G$

  • Economies of scale: as the genome gets larger, new pathways get shorter
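Plugging a few genome sizes into the fitted relation shows the economies of scale numerically: each doubling of $N_G$ quadruples $N_R$ and halves the number of genes per regulator. The genome sizes below are round illustrative values, not data from the talk, and `regulators_for` is a hypothetical helper.

```python
def regulators_for(n_genes, scale=80_000):
    """Number of regulators predicted by N_R = N_G**2 / 80,000."""
    return n_genes ** 2 / scale


for n_genes in (1_000, 2_000, 4_000, 8_000):
    n_reg = regulators_for(n_genes)
    print(f"N_G={n_genes:>5}  N_R={n_reg:7.1f}  genes per regulator={n_genes / n_reg:5.1f}")
```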


Horizontal gene transfer: entire pathways could be added in one step

Redundant enzymes are removed

Central metabolic core → anabolic pathways → biomass production
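The economies-of-scale argument can be illustrated with a toy version of pathway acquisition: a new nutrient only requires the enzymes missing between it and the existing network, so pathways added to a larger network tend to be shorter. A minimal sketch; the random recursive tree standing in for the universe of KEGG reactions and the function names are assumptions chosen for simplicity, so it reproduces the qualitative trend, not the quadratic scaling in detail.

```python
import random


def random_universe(n_nodes, rng):
    """Hypothetical universal metabolic network as a random recursive tree:
    node i > 0 has a uniformly random earlier parent; node 0 stands in for
    the central metabolic core."""
    return [None] + [rng.randrange(i) for i in range(1, n_nodes)]


def grow_by_pathways(parent, n_nutrients, rng):
    """Acquire pathways one at a time, as if by horizontal transfer.

    For each new nutrient (a random node of the universe), add every enzyme
    on its path toward the core until reaching a node already in the network.
    Returns the number of new enzymes added for each nutrient.
    """
    in_network = {0}
    nutrients = list(range(1, len(parent)))
    rng.shuffle(nutrients)
    new_enzymes = []
    for node in nutrients[:n_nutrients]:
        added = 0
        while node not in in_network:
            in_network.add(node)
            added += 1
            node = parent[node]
        new_enzymes.append(added)
    return new_enzymes


rng = random.Random(0)
lengths = grow_by_pathways(random_universe(5_000, rng), 2_000, rng)
early, late = lengths[:200], lengths[-200:]
print(sum(early) / len(early), sum(late) / len(late))  # later pathways are shorter
```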


Minimal metabolic pathways from reactions in the KEGG database

[Figure: $N_R$ vs. $N_G$]

Adapted from the “scope-expansion” algorithm by R. Heinrich et al.

(# of pathways or their regulators) $\sim$ (# of enzymes)$^2$


What does it all mean for regulatory networks?

  • Scale-free regulatory networks with “hubs” due to the power-law distribution of branch sizes: $P(S) \sim S^{-3}$

  • Trends in complexity of regulation vs. genome size

  • $N_R \langle K_{out} \rangle = N_G \langle K_{in} \rangle$ = number of regulatory interactions (E. van Nimwegen, TIG (2003))

  • $N_R/N_G = \langle K_{in} \rangle / \langle K_{out} \rangle$ increases with $N_G$

    • Either $\langle K_{out} \rangle$ decreases with $N_G$: functions become more specialized

    • Or $\langle K_{in} \rangle$ grows with $N_G$: regulation gets more coordinated & interconnected

    • Most likely both trends at once
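The identity $N_R \langle K_{out} \rangle = N_G \langle K_{in} \rangle$ holds because both sides count the same regulatory edges, which is why $N_R/N_G = \langle K_{in} \rangle / \langle K_{out} \rangle$. A minimal sketch checking this on a toy edge list; the (regulator, target) edge format and the gene names are illustrative assumptions.

```python
def degree_summary(edges):
    """Summarize a regulatory network given as (regulator, target) pairs.

    N_R counts distinct regulators and N_G distinct target genes. Every edge
    has one regulator and one target, so N_R*<K_out> == N_G*<K_in> == #edges.
    """
    regulators = {r for r, _ in edges}
    targets = {g for _, g in edges}
    mean_k_out = len(edges) / len(regulators)
    mean_k_in = len(edges) / len(targets)
    return {
        "N_R": len(regulators),
        "N_G": len(targets),
        "<K_out>": mean_k_out,
        "<K_in>": mean_k_in,
        "N_R/N_G": len(regulators) / len(targets),
        "<K_in>/<K_out>": mean_k_in / mean_k_out,
    }


# Toy network (hypothetical): two TFs regulating four worker genes.
toy_edges = [("TF1", "g1"), ("TF1", "g2"), ("TF1", "g3"),
             ("TF2", "g3"), ("TF2", "g4")]
print(degree_summary(toy_edges))
```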


Regulatory templates: one worker – one boss

$\langle K_{out} \rangle$:

$\langle K_{in} \rangle = 1 = \text{const}$

[Figure labels: nutrient, TF1, TF2]


Regulatory templates: long top-to-bottom regulation

$\langle K_{out} \rangle = \text{const}$

$\langle K_{in} \rangle$:

$\langle K_{out} \rangle$:

$\langle K_{in} \rangle$:

[Figure labels: nutrient, TF1, TF2]


Regulatory templates: hierarchy & middle management

[Figure labels: nutrient, TF1, TF2, TF3]


Histogram of the # of SNPs in genes

  • 50% of genes have very few SNPs

    • 1253: 0 SNPs

    • 445: 1 SNP

    • 232: 2 SNPs

  • The remaining 50% are in an exponential tail extending to 100 SNPs (10% divergence) and beyond

Comparison of B vs K-12 strains of E. coli

FW Studier, P Daegelen, RE Lenski, S Maslov, JF Kim, JMB (2009)

