Quantifying contributions of mutations and homologous recombination to E. coli genomic diversity

Sergei Maslov

Department of Biosciences, Brookhaven National Laboratory, New York


Bacterial genome evolution happens in cooperation with phages



Variation between E. coli strains

FW Studier, P Daegelen, RE Lenski, S Maslov, JF Kim, JMB (2009)

Pan-genome of E. coli

Comparison of B vs K-12 strains of E. coli

M Touchon et al. PLoS Genetics (2009)

Copy and Insert

Copy and Replace


Usual suspects are there but do not explain heterogeneity

  • Negative correlation with protein abundance: 2.5% of variation, P-value = $10^{-5}$

  • Positive correlation with distance from origin of replication: 0.4% of variation, P-value = $10^{-2}$
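The "% of variation" figures above are presumably the squared Pearson correlation $r^2$ from a simple linear fit; that reading is an assumption, since the slide does not say. The hypothetical helper `pearson_r2` below computes $r$ and $r^2$ for paired per-gene values (the toy data are made up). Under that reading, 2.5% of variance corresponds to $|r| \approx 0.16$ and 0.4% to $|r| \approx 0.06$, i.e. both correlations are weak.

```python
import math
from typing import Sequence, Tuple

def pearson_r2(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (r, r^2): Pearson correlation and fraction of variance explained."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r, r * r

# Hypothetical per-gene data: protein abundance vs. SNP count.
abundance = [1.0, 2.0, 3.0, 4.0, 5.0]
snps = [10.0, 8.0, 9.0, 5.0, 4.0]
r, r2 = pearson_r2(abundance, snps)
print(f"r = {r:.2f}, variance explained = {100 * r2:.1f}%")

# Converting the slide's percentages back to |r| under the r^2 reading.
for pct in (2.5, 0.4):
    print(f"{pct}% of variance -> |r| = {math.sqrt(pct / 100):.3f}")
```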


High SNP numbers are clustered along the chromosome


Clonal

Recombined


P. Dixit, T. Y. Pang, Studier FW, Maslov S, PNAS submitted (2013)


Clonal regions

Recombined regions

SNPs by recombination / SNPs by clonal mutations:

$r/\mu = 6 \pm 1$

Recombined regions

P. Dixit, T. Y. Pang, Studier FW, Maslov S, PNAS submitted (2013)
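The ratio $r/\mu$ compares SNPs brought in by recombination with SNPs created by clonal point mutations. As a rough illustration only, and explicitly not the statistical method used by Dixit et al., the sketch below estimates it from per-window SNP counts: windows at or below a hypothetical threshold `clonal_max` are treated as purely clonal, their mean SNP density is assumed to apply to every window, and all excess SNPs are attributed to recombination. The function name and the toy window counts are made up.

```python
from typing import Sequence

def naive_r_over_mu(snps_per_window: Sequence[int], clonal_max: int) -> float:
    """Naive r/mu estimate from per-window SNP counts (illustrative assumptions):
      * windows with <= clonal_max SNPs are treated as purely clonal;
      * point mutations accumulate at the same mean density in every window,
        so expected mutational SNPs = clonal density * number of windows;
      * every remaining SNP is attributed to recombination.
    """
    clonal = [s for s in snps_per_window if s <= clonal_max]
    if not clonal:
        raise ValueError("no window classified as clonal")
    density = sum(clonal) / len(clonal)
    mutational = density * len(snps_per_window)
    if mutational == 0:
        raise ValueError("clonal windows carry no SNPs; ratio undefined")
    recombinational = sum(snps_per_window) - mutational
    return recombinational / mutational

# Hypothetical 1-kb windows: four clonal ones and two recombined outliers.
windows = [1, 0, 2, 1, 50, 60]
print(naive_r_over_mu(windows, clonal_max=5))
```

A real analysis has to cope with the fact that recombined windows also carry clonal mutations and that the threshold itself is uncertain; this sketch only shows what the ratio measures.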


Strains:

K-12 vs. ETEC H10407

HS

O157:H7 Sakai

P. Dixit, T. Y. Pang, Studier FW, Maslov S, PNAS submitted (2013)

Neutral model: mutations and recombinations among 70 “genes” in a population of $10^4$

C. Fraser et al. (2007) and (2009)
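To make the neutral model concrete, here is a toy Wright-Fisher sketch with point mutation and divergence-limited homologous recombination. It is not the published model of Fraser et al. or of the authors: the infinite-sites bookkeeping, the all-or-nothing homology cutoff `max_div`, and every parameter value are assumptions, and the population (100 genomes) and gene count (10) are scaled far below the slide's $10^4$ and 70 so that it runs in about a second.

```python
import random
from itertools import combinations
from typing import FrozenSet, List

def simulate_neutral_model(n_pop: int = 100, n_genes: int = 10,
                           gene_len: int = 1000, mu: float = 0.05,
                           rho: float = 0.05, max_div: float = 0.015,
                           generations: int = 200, seed: int = 1) -> float:
    """Toy Wright-Fisher model with mutation and homologous recombination.

    Each genome is a list of genes; each gene is a frozenset of mutation IDs
    (infinite-sites assumption: every new mutation hits a fresh site).
    Returns the mean pairwise divergence per site, averaged over genes.
    """
    if n_pop < 2:
        raise ValueError("need at least two genomes")
    rng = random.Random(seed)
    pop: List[List[FrozenSet[int]]] = [
        [frozenset() for _ in range(n_genes)] for _ in range(n_pop)]
    next_id = 0
    for _ in range(generations):
        # Wright-Fisher resampling: each offspring copies a random parent.
        pop = [list(pop[rng.randrange(n_pop)]) for _ in range(n_pop)]
        for genome in pop:
            for g in range(n_genes):
                if rng.random() < mu:  # at most one new mutation per gene per generation
                    genome[g] = genome[g] | {next_id}
                    next_id += 1
                if rng.random() < rho:  # try to import gene g from a random donor
                    donor_gene = pop[rng.randrange(n_pop)][g]
                    divergence = len(genome[g] ^ donor_gene) / gene_len
                    if divergence < max_div:  # homology requirement (assumed hard cutoff)
                        genome[g] = donor_gene
    pairs = list(combinations(pop, 2))
    diffs = sum(len(a[g] ^ b[g]) for a, b in pairs for g in range(n_genes))
    return diffs / (len(pairs) * n_genes * gene_len)

print(f"mean divergence per site: {simulate_neutral_model():.4f}")
```

Sweeping `max_div` or `rho` in such a toy model is one way to see how a homology barrier interacts with mutation; the numbers it produces say nothing quantitative about E. coli.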


Phase transition: $\Delta_c = 1.5\%$

P. Dixit, T. Y. Pang, Studier FW, Maslov S, PNAS submitted (2013)


P. Dixit, T. Y. Pang, Studier FW, Maslov S, PNAS submitted (2013)


Why exponential tail?

  • Time to coalescence: $\mathrm{Prob}(t) = \frac{1}{N_e}\left(1 - \frac{1}{N_e}\right)^{t-1} \approx \frac{1}{N_e} e^{-t/N_e}$; exponential slope $= 1/(2\mu N_e) = 1/\theta$

  • Population size $N_e = (1 \pm 0.1) \times 10^9$, consistent with earlier estimates
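The argument can be checked numerically. Two lineages coalesce after a geometrically distributed number of generations, which for large $N_e$ is close to an exponential; since two strains separated by $t$ generations differ by about $2\mu t$ SNPs, the SNP histogram decays as $e^{-n/(2\mu N_e)}$, and inverting the fitted slope gives $N_e = 1/(2\mu \cdot \text{slope})$. In the sketch below, the mutation rate `mu` and the `slope` value are hypothetical numbers chosen only to show the arithmetic; they are not the values used in the talk.

```python
import math

def coalescence_prob(t: int, ne: float) -> float:
    """Exact discrete-generation probability that two lineages coalesce at generation t."""
    return (1.0 / ne) * (1.0 - 1.0 / ne) ** (t - 1)

def coalescence_prob_exp(t: int, ne: float) -> float:
    """Continuous approximation (1/Ne) * exp(-t/Ne), valid for large Ne."""
    return math.exp(-t / ne) / ne

def ne_from_slope(slope: float, mu: float) -> float:
    """Invert slope = 1/(2*mu*Ne); mu is a per-region mutation rate per generation."""
    return 1.0 / (2.0 * mu * slope)

ne = 1000
worst = max(abs(coalescence_prob(t, ne) / coalescence_prob_exp(t, ne) - 1.0)
            for t in range(1, 5 * ne + 1))
print(f"max relative error of the exponential approximation: {worst:.4f}")

# Hypothetical inputs, only to illustrate the inversion:
mu = 1e-9     # assumed mutation rate per region per generation
slope = 0.5   # assumed fitted decay constant of the SNP-count histogram
print(f"Ne = {ne_from_slope(slope, mu):.2e}")
```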


Why $N_e \ll N$?

  • Phages:

    • But: there are phages that cross species boundaries.

    • Also, the slope is similar for different species

  • Restriction modification system:

    • Recombined segments are not continuous [Milkman R, Bridges MM. Genetics 1990]

  • Recombination efficiency:

    • Need 20-30 identical bases to start recombination

    • Our slope predicts 60 bases, which roughly matches 30 at the beginning and 30 at the end

  • Species are defined by recombination
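One way to read the "60 bases" argument (our interpretation; the slide does not spell it out): if recombination needs a run of $k$ identical bases and each aligned site differs independently with probability $d$ (the divergence), the chance of such a run is $(1-d)^k \approx e^{-kd}$, so recombination efficiency decays exponentially with divergence at rate $k$. The sketch tabulates this for $k = 60$; site independence and the helper name `p_identical_run` are assumptions.

```python
import math

def p_identical_run(divergence: float, k: int) -> float:
    """Probability that k consecutive aligned bases are all identical,
    assuming each site differs independently with probability `divergence`."""
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must be in [0, 1]")
    return (1.0 - divergence) ** k

K = 60  # interpretation: ~30 identical bases needed at each end of a segment
for d in (0.005, 0.01, 0.015, 0.03):
    exact = p_identical_run(d, K)
    approx = math.exp(-K * d)  # exponential form, valid for small divergence
    print(f"divergence {d:.3f}: (1-d)^{K} = {exact:.3f}, exp(-{K}d) = {approx:.3f}")
```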


Are our 30+ strains a representative sample?

  • Fully sequenced genomes:

    • 1000s of genes (unbiased and complete)

    • 10s of strains (biased)

  • MLST data:

    • 10s of genes (biased)

    • 1000s of strains (unbiased, I hope)

  • Database: http://mlst.ucc.ie

    • ∼3000 E. coli strains

    • 7 short regions of ~500 base pairs each in housekeeping genes


  • MLST

  • -- Genomes


Is it really phages?

1 kb: gene length

K-12 to B comparison

Phage capacity: 20 kb; other strains up to 40 kb


Does the neutral model explain everything?

  • At 3 standard deviations:

  • 19 1-kb regions are supervariable

  • 29 1-kb regions are superconserved
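The 3-standard-deviation criterion amounts to a z-score test of each 1-kb region against the neutral model's prediction. The sketch below assumes the model supplies a mean and standard deviation of the SNP count for every region (for example from simulations); the helper `classify_regions` and the toy numbers are hypothetical, and only the threshold $k = 3$ comes from the slide.

```python
from typing import Dict, List, Sequence

def classify_regions(observed: Sequence[float], expected_mean: Sequence[float],
                     expected_sd: Sequence[float], k: float = 3.0) -> Dict[str, List[int]]:
    """Flag regions whose observed SNP count lies more than k SDs from the model."""
    result: Dict[str, List[int]] = {"supervariable": [], "superconserved": []}
    for i, (obs, mean, sd) in enumerate(zip(observed, expected_mean, expected_sd)):
        if sd <= 0:
            continue  # no predicted spread: z-score undefined, skip this region
        z = (obs - mean) / sd
        if z > k:
            result["supervariable"].append(i)
        elif z < -k:
            result["superconserved"].append(i)
    return result

# Hypothetical 1-kb regions with model predictions:
obs = [12, 40, 0, 11, 9]
mean = [10, 10, 10, 10, 10]
sd = [3, 3, 3, 3, 3]
print(classify_regions(obs, mean, sd))
```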


Collaborators & funding

  • Bill Studier (BNL)

  • Purushottam Dixit (BNL)

  • Tin Yau Pang (Stony Brook)

  • Rich Lenski (Michigan State)

  • Patrick Daegelen (France)

  • Jihyun Kim (Korea)

  • DOE Systems Biology Knowledgebase (KBase)

  • Adam Arkin (Berkeley)

  • Rick Stevens (Argonne)

  • Bob Cottingham (Oak Ridge)

  • Mark Gerstein (Yale)

  • Doreen Ware (Cold Spring Harbor)

  • Mike Schatz (Cold Spring Harbor)

  • Dave Weston (ORNL)

  • 60+ other collaborators


Thank you!


Genes encoded in bacterial genomes ~ packages installed on Linux computers


  • Complex systems have many components

    • Genes (Bacteria)

    • Software packages (Linux OS)

  • Components do not work alone: they need to be assembled to work

  • In individual systems only a subset of components is used

    • Genome (Bacteria) – bag of genes

    • Computer (Linux OS) – installed packages

  • Components have vastly different frequencies of use


IKEA has many components

Justin Pollard, http://www.designboom.com


They need to be assembled to work

Justin Pollard, http://www.designboom.com


Different frequencies of use

[Figure: common vs. rare components]


What determines the frequency of use?

  • Popularity: AKA preferential attachment

    • Frequency ~ self-amplifying popularity

    • Relevant for social systems: WWW links, facebook friendships, scientific citations

  • Functional role:

    • Frequency ~ breadth or importance of the functional role

    • Relevant for biological and technological systems, where selection adjusts undeserved popularity
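The "popularity" mechanism can be illustrated with the classic Yule-Simon process, a standard model of preferential attachment in which the rich get richer. It is shown only as an example of the self-amplifying mechanism that the slide contrasts with functional selection; the function name, `p_new`, and the step count are arbitrary choices.

```python
import random
from collections import Counter

def simon_model(steps: int, p_new: float, seed: int = 0) -> Counter:
    """Yule-Simon 'rich-get-richer' process.

    At each step, with probability p_new a brand-new component is used once;
    otherwise an existing component is reused with probability proportional
    to how often it has already been used (preferential attachment).
    """
    rng = random.Random(seed)
    uses = [0]          # one entry per use; value = component id
    n_components = 1
    for _ in range(steps):
        if rng.random() < p_new:
            uses.append(n_components)
            n_components += 1
        else:
            # Picking a uniformly random past use is the same as picking a
            # component with probability proportional to its usage count.
            uses.append(rng.choice(uses))
    return Counter(uses)

counts = simon_model(steps=100_000, p_new=0.1)
print("components:", len(counts), "top usage counts:", counts.most_common(3))
```

The resulting usage counts are heavy-tailed even though no component is functionally more important than any other, which is exactly why popularity alone is a poor explanation for biological and technological systems.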


Empirical data on component frequencies

  • Bacterial genomes (eggnog.embl.de):

    • 500 sequenced prokaryotic genomes

    • 44,000 orthologous gene families

  • Linux packages (popcon.ubuntu.com):

    • 200,000 Linux packages installed on 2,000,000 individual computers

  • Binary tables: component is either present or not in a given system
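With a binary presence/absence table, the frequency $f$ of a component is simply the fraction of systems that contain it; the distributions $P(f)$ discussed in this talk are histograms of these values. The table layout, the helper name, and the toy genomes below are hypothetical.

```python
from typing import Dict, List

def component_frequencies(table: Dict[str, List[int]]) -> List[float]:
    """Given a binary table {system: [0/1 per component]}, return each
    component's frequency f = fraction of systems containing it."""
    rows = list(table.values())
    if not rows:
        raise ValueError("empty table")
    n_systems = len(rows)
    n_components = len(rows[0])
    if any(len(r) != n_components for r in rows):
        raise ValueError("all rows must have the same number of components")
    return [sum(r[j] for r in rows) / n_systems for j in range(n_components)]

# Hypothetical toy table: 4 genomes x 5 gene families.
table = {
    "genome_A": [1, 1, 0, 0, 1],
    "genome_B": [1, 0, 1, 0, 0],
    "genome_C": [1, 1, 0, 0, 0],
    "genome_D": [1, 0, 0, 1, 0],
}
print(component_frequencies(table))
```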


Frequency distributions

[Figure regions: Cloud, Shell, Core, ORFans]

$P(f) \sim f^{-1.5}$, except the top $\sqrt{N}$ “universal” components with $f \sim 1$
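An exponent like 1.5 is best estimated by maximum likelihood rather than by fitting a line to a log-log histogram. The sketch uses the standard continuous power-law estimator $\hat\alpha = 1 + n / \sum_i \ln(x_i/x_{min})$ and checks it on synthetic data drawn by inverse-transform sampling. It ignores the upper bound $f \le 1$ and the $\sqrt{N}$ universal components, so on real frequency data one would restrict it to the power-law range; the function names and `x_min` are assumptions, not the analysis used in the talk.

```python
import math
import random
from typing import List, Sequence

def powerlaw_alpha_mle(xs: Sequence[float], x_min: float) -> float:
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x / x_min)) over x >= x_min."""
    tail = [x for x in xs if x >= x_min]
    if not tail:
        raise ValueError("no data at or above x_min")
    s = sum(math.log(x / x_min) for x in tail)
    if s == 0:
        raise ValueError("all tail values equal x_min; exponent undefined")
    return 1.0 + len(tail) / s

def sample_powerlaw(n: int, alpha: float, x_min: float, seed: int = 0) -> List[float]:
    """Inverse-transform sampling from p(x) ~ x^-alpha on [x_min, infinity)."""
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

data = sample_powerlaw(50_000, alpha=1.5, x_min=1e-4)
print(f"estimated exponent: {powerlaw_alpha_mle(data, 1e-4):.3f}")
```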


How to quantify functional importance?

  • Components do not work alone

  • Breadth/Importance ~ Component is needed for proper functioning of other components

  • Dependency network

    • A → B means A depends on B for its function

    • Formalized for Linux software packages

    • For metabolic enzymes, given by upstream-downstream positions in pathways

  • Frequency ~ dependency degree, Kdep

    • Kdep = the total number of components that directly or indirectly depend on the selected one
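By this definition, Kdep of a component is the number of nodes that can reach it along dependency edges, so it can be computed with one breadth-first search per node on the reversed graph. The sketch below is a straightforward $O(N \cdot E)$ version with made-up package names; it is not the code used for the Linux or metabolic analyses.

```python
from collections import deque
from typing import Dict, List

def dependency_degrees(depends_on: Dict[str, List[str]]) -> Dict[str, int]:
    """Kdep for every component: the number of components that depend on it
    directly or indirectly. An edge A -> B (B in depends_on[A]) means
    A depends on B for its function."""
    # Reverse the edges: for each B, list the components that directly need it.
    needed_by: Dict[str, List[str]] = {}
    for a, deps in depends_on.items():
        needed_by.setdefault(a, [])
        for b in deps:
            needed_by.setdefault(b, []).append(a)
    kdep: Dict[str, int] = {}
    for node in needed_by:
        seen = set()
        queue = deque(needed_by[node])
        while queue:  # breadth-first search over reversed edges
            x = queue.popleft()
            if x not in seen:
                seen.add(x)
                queue.extend(needed_by[x])
        seen.discard(node)  # ignore the node itself if there are dependency cycles
        kdep[node] = len(seen)
    return kdep

# Hypothetical package graph: app depends on gui and net; both depend on libc.
graph = {"app": ["gui", "net"], "gui": ["libc"], "net": ["libc"], "libc": []}
print(dependency_degrees(graph))
```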


Frequency is positively correlated with functional importance

Correlation coefficient ~0.4 for both Linux and genes

Could be improved by using a weighted dependency degree


Tree-like metabolic network

[Figure: TCA cycle; example components with Kdep = 15 and Kdep = 5]


Dependency degree distribution on a critical branching tree

  • $P(K) \sim K^{-1.5}$ for a critical branching tree

  • Paradox: $K_{max}^{-0.5} \sim 1/N \Rightarrow K_{max} = N^2 > N$

  • Answer: the parent tree size imposes a cutoff: there will be $\sqrt{N}$ “core” nodes with $K_{max} = N$

    • present in almost all systems (ribosomal genes or core metabolic enzymes)

  • Need a new model: in a tree $D = 1$, while in real systems $D \sim 2 > 1$


Dependency network evolution

  • New components are added gradually over time

  • Each new component depends on $D$ existing components selected randomly

  • $K_{dep}(t) \sim (t/N)^{-D}$

  • $P(K_{dep}(t) > K) = P(t/N < K^{-1/D}) = K^{-1/D}$

  • $P(K_{dep}) \sim K_{dep}^{-(1+1/D)} = K_{dep}^{-1.5}$ for $D = 2$

  • $N_{universal} = N^{(D-1)/D} = N^{0.5}$ for $D = 2$
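The growth model is simple enough to simulate directly. The sketch below, with hypothetical parameters $N = 2000$ and $D = 2$, adds components one at a time, wires each to $D$ distinct random earlier components, and computes every Kdep. Because dependents always arrive later, processing components newest-first lets each Kdep be assembled from its children's sets (stored as integer bitmasks). The first component ends up needed by every other component; the count of components needed by at least half the network is printed next to $\sqrt{N}$ for comparison only. This illustrates the stated rules and is not the authors' simulation code.

```python
import random
from typing import List

def grow_dependency_network(n: int, d: int = 2, seed: int = 0) -> List[int]:
    """Grow a network one component at a time; component t depends on
    min(d, t) distinct, randomly chosen earlier components.
    Returns Kdep (number of direct or indirect dependents) for each component."""
    rng = random.Random(seed)
    needed_by: List[List[int]] = [[] for _ in range(n)]
    for t in range(1, n):
        for dep in rng.sample(range(t), min(d, t)):
            needed_by[dep].append(t)
    # Dependents always arrive later, so process components newest-first and
    # store each one's set of (direct + indirect) dependents as an int bitmask.
    dependents = [0] * n
    for node in range(n - 1, -1, -1):
        mask = 0
        for child in needed_by[node]:
            mask |= (1 << child) | dependents[child]
        dependents[node] = mask
    return [bin(m).count("1") for m in dependents]

N = 2000
kdep = grow_dependency_network(N, d=2)
universal = sum(1 for k in kdep if k >= N // 2)
print("Kdep of the first five components:", kdep[:5])
print("components needed by at least half the network:", universal,
      "(compare sqrt(N) =", round(N ** 0.5), ")")
```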


Kdep decreases with layer number

[Figure panels: Linux; model with D = 2]


Zipf plot for Kdep distributions

[Figure panels: metabolic enzymes vs. model; Linux vs. model]


Frequency distributions

[Figure regions: Cloud, Core, Shell, ORFans]

$P(f) \sim f^{-1.5}$, except the top $\sqrt{N}$ “universal” components with $f \sim 1$


Why should we care about P(f)?


Metagenomes and pan-genomes

For $P(f) \sim f^{-1.5}$: (pan-genome size) $\sim$ (# of samples)$^{0.5}$

The Human Microbiome Project Consortium, Nature (2012)
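The square-root scaling can be checked numerically. If component $i$ is present in each sampled system independently with probability $f_i$, the expected pan-genome after $n$ samples is $\sum_i [1 - (1 - f_i)^n]$. The sketch draws deterministic frequencies (quantiles) from $P(f) \sim f^{-1.5}$ and measures the growth exponent between 100 and 10,000 samples, which comes out close to 0.5. The lower cutoff `f_min`, the number of components, and the independence of components across samples are assumptions made for illustration.

```python
import math
from typing import List, Sequence

def expected_pan_genome(freqs: Sequence[float], n_samples: int) -> float:
    """Expected number of distinct components seen in n_samples independent
    systems, if component i is present in each with probability freqs[i]."""
    # 1 - (1 - f)^n, computed stably with log1p/expm1.
    return sum(-math.expm1(n_samples * math.log1p(-f)) for f in freqs)

def powerlaw_freqs(m: int, f_min: float) -> List[float]:
    """m deterministic quantiles of P(f) ~ f^-1.5 on [f_min, 1]."""
    a = f_min ** -0.5
    return [(a - (i + 0.5) / m * (a - 1.0)) ** -2.0 for i in range(m)]

freqs = powerlaw_freqs(200_000, f_min=1e-8)
n1, n2 = 100, 10_000
g1, g2 = expected_pan_genome(freqs, n1), expected_pan_genome(freqs, n2)
exponent = math.log(g2 / g1) / math.log(n2 / n1)
print(f"pan-genome growth exponent between {n1} and {n2} samples: {exponent:.2f}")
```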


Pan-genome of E. coli strains

M Touchon et al. PLoS Genetics (2009)


Genome evolution in E. coli

Studier FW, Daegelen P, Lenski RE, Maslov S, Kim JF, J. Mol. Biol. (2009)

P. Dixit, T. Y. Pang, Studier FW, Maslov S, submitted (2013)


How many transcription factors does an organism need?

S. Maslov, TY Pang, K. Sneppen, S. Krishna, PNAS (2009)

TY Pang, S. Maslov, PLoS Comp Bio (2011)

Regulator genes

Worker genes


Figure adapted from S. Maslov, TY Pang, K. Sneppen, S. Krishna, PNAS (2009)

$N_R \sim N_G^2 \Rightarrow N_R/N_G \sim N_G$


Cyril Northcote Parkinson (1909-1993)

“… bureaucracy grew by 5-7% per year ‘irrespective of any variation in the amount of work (if any) to be done.’”

Why?

“An official wants to multiply subordinates, not rivals”

“Officials make work for each other,” so that “work expands so as to fill the time available for its completion”

Is this what happens in bacterial genomes? Probably not!


Economies of scale in bacterial evolution

  • $N_R = N_G^2/80{,}000 \Rightarrow N_G/N_R = 80{,}000/N_G$

  • Economies of scale: as the genome gets larger, new pathways get shorter
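Plugging numbers into the fitted relation shows the economies of scale directly: each doubling of genome size quadruples the number of regulators and halves the number of genes per regulator. The constant 80,000 is taken from the slide; the genome sizes in the loop and the helper names are illustrative.

```python
def n_regulators(n_genes: float, scale: float = 80_000.0) -> float:
    """Quadratic scaling from the slide: N_R = N_G^2 / 80,000."""
    return n_genes ** 2 / scale

def genes_per_regulator(n_genes: float, scale: float = 80_000.0) -> float:
    """Equivalent form: N_G / N_R = 80,000 / N_G."""
    return scale / n_genes

for ng in (1_000, 2_000, 4_000, 8_000):
    print(f"N_G = {ng:>5}: N_R ~ {n_regulators(ng):6.1f}, "
          f"genes per regulator ~ {genes_per_regulator(ng):5.1f}")
```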


nutrient

Horizontal gene transfer: entire pathways could be added in one step

nutrient

Redundant enzymes are removed

Central metabolic core → anabolic pathways → biomass production


Minimal metabolic pathways from reactions in KEGG database

[Figure axes: $N_R$, $N_G$]

Adapted from the “scope-expansion” algorithm by R. Heinrich et al.

(# of pathways or their regulators) ~ (# of enzymes)$^2$
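The core of scope expansion (network expansion) fits in a few lines: starting from seed compounds, repeatedly apply every reaction whose substrates are all available and add its products, until nothing new appears. This sketch shows only that basic expansion step; how the talk derived minimal pathways and the $N_R$ vs. $N_G$ relation from KEGG is not reproduced, and the abstract compounds A-F and X are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Reaction = Tuple[FrozenSet[str], FrozenSet[str]]  # (substrates, products)

def scope(seed: Iterable[str], reactions: List[Reaction]) -> Set[str]:
    """Network expansion: repeatedly fire every reaction whose substrates are
    all available, adding its products, until no new compound appears."""
    available = set(seed)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available

# Hypothetical toy network (not KEGG data).
rxns = [
    (frozenset({"A", "B"}), frozenset({"C", "D"})),
    (frozenset({"C"}), frozenset({"E"})),
    (frozenset({"E", "X"}), frozenset({"F"})),  # blocked: X is never produced
]
print(sorted(scope({"A", "B"}, rxns)))
```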


What does it all mean for regulatory networks?

  • Scale-free regulatory networks with “hubs” due to the power-law distribution of branch sizes: $P(S) \sim S^{-3}$

  • Trends in complexity of regulation vs. genome size

  • $N_R \langle K_{out} \rangle = N_G \langle K_{in} \rangle$ = number of regulatory interactions (E. van Nimwegen, TIG 2003)

  • $N_R/N_G = \langle K_{in} \rangle / \langle K_{out} \rangle$ increases with $N_G$

    • Either $\langle K_{out} \rangle$ decreases with $N_G$: functions become more specialized

    • Or $\langle K_{in} \rangle$ grows with $N_G$: regulation gets more coordinated & interconnected

    • Most likely both trends at once
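The identity $N_R \langle K_{out} \rangle = N_G \langle K_{in} \rangle$ is edge counting: every regulator-to-target interaction adds one to some regulator's out-degree and one to some gene's in-degree. The sketch computes both sides for a made-up network, averaging $K_{out}$ over regulators and $K_{in}$ over all genes (that averaging convention is an assumption), and confirms that $\langle K_{in} \rangle / \langle K_{out} \rangle = N_R / N_G$.

```python
from typing import List, Tuple

def degree_check(edges: List[Tuple[str, str]], n_regulators: int, n_genes: int):
    """For regulator -> target edges, N_R*<K_out> and N_G*<K_in> both equal
    the number of regulatory interactions."""
    n_edges = len(edges)
    mean_kout = n_edges / n_regulators  # averaged over regulators (TFs)
    mean_kin = n_edges / n_genes        # averaged over all genes
    return n_regulators * mean_kout, n_genes * mean_kin, mean_kin / mean_kout

# Hypothetical network: 2 TFs regulating 5 genes through 6 interactions.
edges = [("TF1", "g1"), ("TF1", "g2"), ("TF1", "g3"),
         ("TF2", "g3"), ("TF2", "g4"), ("TF2", "g5")]
lhs, rhs, ratio = degree_check(edges, n_regulators=2, n_genes=5)
print(lhs, rhs, ratio)  # ratio equals N_R / N_G
```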


Regulatory templates: one worker – one boss

nutrient

<Kout>: 

<Kin>=1=const

TF1

nutrient

TF2


Regulatory templates: long top-to-bottom regulation

nutrient

<Kout>=const

<Kin>:

TF1

nutrient

<Kout>:

<Kin> :

TF2


Regulatory templates: hierarchy & middle management

nutrient

TF1

nutrient

TF2

TF3


Histogram of the # of SNPs in genes

  • 50% of genes have very few SNPs

    • 1253: 0 SNPs

    • 445: 1 SNP

    • 232: 2 SNPs

  • The remaining 50% are in an exponential tail up to 100 SNPs (10% divergence) and higher

Comparison of B vs K-12 strains of E. coli

FW Studier, P Daegelen, RE Lenski, S Maslov, JF Kim, JMB (2009)

