Genome dynamics in Bacillus megaterium
October 29, 2009
Dept. of Biological Sciences, NIU
What genomic sequencing tells us about the genetic forces that shape Bacillus genomes
The Genus Bacillus
• Gram-positive, aerobic, endospore-forming, rod-shaped bacteria
• “Normal” habitat: the soil (plus lots of other places)
• Mostly mesophilic, but some grow at temperatures as low as 0 °C and as high as 65 °C
• Pathogens: B. anthracis and B. cereus
• Industrial uses: enzyme production, Bt insecticidal corn
• Endospores are very resistant to heat and chemicals
A Bit of History
• Bacillus subtilis was originally named Vibrio subtilis by Christian Gottfried Ehrenberg in 1835. He was the first to use the name “bacteria”.
• Ferdinand Cohn (1872) renamed the species Bacillus subtilis, as part of his description of bacteria by their shape (“bacillus” = “little stick”).
  • He is also responsible for bacteria being considered plants and not animals.
• Robert Koch first showed that a specific bacterium caused a specific disease: B. anthracis and anthrax (1876).
• B. megaterium was first described by Heinrich Anton de Bary in 1884.
Bacillus’s Position in the Tree of Life
• Anything called “Bacillus” in the 1800s would now be a member of the Firmicutes (“strong skin”), a phylum that contains the low-G+C Gram-positive bacteria.
• An alternative model, based on indels in universal genes, puts the Firmicutes near the root of the tree.
Bacillus Taxonomy
• Bacillus is a very old genus name, and it has been split several times.
• Bergey’s Manual of Systematic Bacteriology, first edition (1986), lists 32 valid species, with about an equal number of synonyms.
  • Based on morphology, biochemistry, some DNA-DNA hybridization, and numerical taxonomy
• Carl Woese introduced the use of 16S rRNA sequences for phylogeny in 1977.
• Bergey’s Manual, second edition (2004), splits the Bacillus genus into 4 families, with 37 genera in the Bacillaceae. Over 200 species.
• Bacillus is still a genus, and still contains both B. subtilis and B. megaterium.
• As in other taxa, a common phenotype is well correlated with a common genotype.
Ash et al. (1991) Lett. Appl. Microbiol. 13:202-206.
Genome Sequencing
• Strain QM B1551, containing 7 plasmids
• NSF grant to Pat Vary and Jacques Ravel
• Most lab work done at TIGR/U. Maryland
• NIU’s role: annotating the 6000 genes
• Joined forces with Dieter Jahn’s group at the Technische Universität in Braunschweig, Germany, who were sequencing the plasmidless strain DSM 319
• In addition, there are about 20 other fully sequenced genomes from Bacillus and related genera
• DSM 319 has no plasmids, but at least 70 genes on the QM plasmids have good homologues on the DSM chromosome (purple ring near the middle)
Common Features, Genetic Forces
Assuming that all Bacillus species descended from a common ancestor, what is similar and different between them, and why?
Common Features
• Morphological and biochemical characteristics
• 16S rRNA genes
• A group of common protein-coding genes
• Chromosomal synteny
• rRNA operons
Genetic Forces
• Vertical descent
• Background substitution and indel mutations
• Horizontal gene transfer (about 10% different genes between QM and DSM)
• Intragenomic recombination
• Homogenization of rRNA operons, presumably by gene conversion
16S Variation, Phylogeny, and Species Identification
• B. megaterium has 11 rRNA (rrn) operons on the chromosome in both sequenced strains, in the same genomic positions.
• QM also has an rrn operon on plasmid pBM400, which is not found in DSM.
• The 16S genes in B. megaterium are 1540 bp long and very similar, but not identical.
  • Gene conversion is thought to homogenize rRNA operons
  • Recombination between rrn operons leads to deletions
• The question addressed here: what effect does 16S variation within the genome have on phylogeny and species identification?
Differences between 16S genes within B. megaterium
• Positions 461 and 474 are probably paired in a stem-loop:
  • all genes with an A at 461 have a T at 474, and all genes with a G at 461 have a C at 474.
• Seven identical 16S genes: the rrnE, rrnF, and rrnI genes in QM and the rrnA, rrnB, rrnF, and rrnK alleles in DSM.
  • Also, the rrnA and rrnB alleles in QM are identical to each other
  • Note the lack of clear vertical descent in this pattern
• Total of 20 sites with polymorphisms.
  • All but 4 are unique to a single operon
  • All but one of the shared polymorphisms are found in both QM and DSM
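The A/T versus G/C pattern at positions 461 and 474 is a covariation test: each base at one position should always travel with a single base at the other. A minimal sketch of that check follows, assuming the genes are already aligned and stored as equal-length strings. The function names and the toy sequences are hypothetical, and columns 3 and 6 of the toy data merely stand in for 461 and 474.

```python
from collections import Counter

def paired_states(aligned_genes, pos_a, pos_b):
    """Count (base at pos_a, base at pos_b) combinations; positions are 1-based."""
    return Counter((seq[pos_a - 1], seq[pos_b - 1]) for seq in aligned_genes.values())

def is_perfectly_covarying(aligned_genes, pos_a, pos_b):
    """True if every base at pos_a is tied to exactly one base at pos_b, and vice versa."""
    pairs = list(paired_states(aligned_genes, pos_a, pos_b))
    bases_a = {a for a, _ in pairs}
    bases_b = {b for _, b in pairs}
    return len(bases_a) == len(pairs) == len(bases_b)

# Toy 8-bp "genes" (hypothetical data); columns 3 and 6 stand in for 461 and 474.
toy_genes = {"rrnA": "CCAGGTCC", "rrnB": "CCGGGCCC", "rrnC": "CCAGGTCC"}
print(paired_states(toy_genes, 3, 6))
print(is_perfectly_covarying(toy_genes, 3, 6))
```

Perfect covariation is consistent with base pairing in a stem but does not prove it; a single gene carrying a mixed combination would make the check fail.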
Mismatch Differences in Completely Sequenced Genomes
num = number of 16S genes compared; max = maximum internal differences (between 16S genes within the same genome). The remaining columns give the number of differences between genomes.

Genome   num max      QM     DSM Banthra Bcereus   Bthur Bweihen   Bcyto  Bpumil    Bsub  Bamylo Blichen   Bhalo  Bclaus  Oceano   Anoxy Gkausto Gthermo
QM        12   8
DSM       11   4       0
Banthra   11   5      81      78
Bcereus   14   8      80      78       2
Bthur     14   4      74      71       2       2
Bweihen   14   5      80      78      11       9      10
Bcyto     13   9      80      81      32      32      31      38
Bpumil     7   4      84      83      87      87      86      90      84
Bsub      10  12      94      93      91      90      87      97      93      49
Bamylo     9   7      97      96      91      90      87      95      94      47      12
Blichen    7   5      98      97      88      87      84      91      93      54      30      27
Bhalo      8   5     106     105      99      99     100     101      98     100      89      85      80
Bclaus     7  16     123     121     120     119     117     125     116     119     109     110     111      89
Oceano     6  14     109     107     114     112     107     112     118     113     109     116     116     124     123
Anoxy      8   8     126     125     129     128     121     134     113     128     128     127     121     111     135     136
Gkausto    9  12     150     150     140     138     129     143     133     145     140     138     131     135     142     163     105
Gthermo   10  14     145     146     133     132     122     137     127     138     140     135     125     130     138     161      91      33
Paeni     12   8     173     171     171     169     161     173     174     167     171     175     172     163     159     172     174     194     194

Key: QM = B. megaterium QM B1551; DSM = B. megaterium DSM319; Banthra = B. anthracis; Bcereus = B. cereus; Bthur = B. thuringiensis; Bweihen = B. weihenstephanensis; Bcyto = B. cytotoxis; Bpumil = B. pumilus; Bsub = B. subtilis; Bamylo = B. amyloliquefaciens; Blichen = B. licheniformis; Bhalo = B. halodurans; Bclaus = B. clausii; Oceano = O. iheyensis; Anoxy = A. flavithermus; Gkausto = G. kaustophilus; Gthermo = G. thermodenitrificans; Paeni = Paenibacillus sp.
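One way to read this matrix is to take the largest within-genome value in the "max internal diffs" column as a threshold and list every pair of genomes that differ by no more than that. The sketch below does this with the numbers transcribed from the table above; the variable and function names are hypothetical, and the choice of "less than or equal to" for the comparison is an assumption.

```python
# (label, max internal diffs, differences to each earlier genome in the list),
# transcribed from the lower-triangular table on the slide.
ROWS = [
    ("QM", 8, []),
    ("DSM", 4, [0]),
    ("Banthra", 5, [81, 78]),
    ("Bcereus", 8, [80, 78, 2]),
    ("Bthur", 4, [74, 71, 2, 2]),
    ("Bweihen", 5, [80, 78, 11, 9, 10]),
    ("Bcyto", 9, [80, 81, 32, 32, 31, 38]),
    ("Bpumil", 4, [84, 83, 87, 87, 86, 90, 84]),
    ("Bsub", 12, [94, 93, 91, 90, 87, 97, 93, 49]),
    ("Bamylo", 7, [97, 96, 91, 90, 87, 95, 94, 47, 12]),
    ("Blichen", 5, [98, 97, 88, 87, 84, 91, 93, 54, 30, 27]),
    ("Bhalo", 5, [106, 105, 99, 99, 100, 101, 98, 100, 89, 85, 80]),
    ("Bclaus", 16, [123, 121, 120, 119, 117, 125, 116, 119, 109, 110, 111, 89]),
    ("Oceano", 14, [109, 107, 114, 112, 107, 112, 118, 113, 109, 116, 116, 124, 123]),
    ("Anoxy", 8, [126, 125, 129, 128, 121, 134, 113, 128, 128, 127, 121, 111, 135, 136]),
    ("Gkausto", 12, [150, 150, 140, 138, 129, 143, 133, 145, 140, 138, 131, 135,
                     142, 163, 105]),
    ("Gthermo", 14, [145, 146, 133, 132, 122, 137, 127, 138, 140, 135, 125, 130,
                     138, 161, 91, 33]),
    ("Paeni", 8, [173, 171, 171, 169, 161, 173, 174, 167, 171, 175, 172, 163,
                  159, 172, 174, 194, 194]),
]

def pairs_within_internal_range(rows):
    """Return (threshold, pairs) where pairs differ by no more than the
    largest within-genome difference found in any genome."""
    threshold = max(max_internal for _, max_internal, _ in rows)
    labels = [label for label, _, _ in rows]
    hits = [
        (labels[col], label, diff)
        for label, _, diffs in rows
        for col, diff in enumerate(diffs)
        if diff <= threshold
    ]
    return threshold, hits

threshold, close_pairs = pairs_within_internal_range(ROWS)
print("threshold:", threshold)
for first, second, diff in close_pairs:
    print(f"{first} vs {second}: {diff} differences")
```

The QM/DSM pair is expected to appear, since both are B. megaterium; the other pairs are the ones relevant to the species-boundary argument.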
Differences in Completely Sequenced Genomes
• Maximum differences within any genome = 16 (B. clausii)
• My basic argument: there is no point in having two different species whose 16S genes are less different than the 16S genes within a single genome.
• Among the cereus group genomes, there are fewer differences between the 16S genes of B. cereus, B. anthracis, and B. thuringiensis than there are between 16S genes within the same genome.
  • Also, B. weihenstephanensis differs from these at only a few positions
• B. subtilis and B. amyloliquefaciens are also very similar.
• Effects on phylogeny: pick a random 16S gene from each genome, align, count differences, and build a neighbor-joining tree. Repeat 1000 times.
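The resampling procedure in the last bullet can be sketched as follows, assuming the 16S genes are already aligned to a common length. This is not the code used for the talk: the function names and toy sequences are hypothetical, and the neighbor-joining step is left out, so the sketch only records how much each pairwise distance moves across replicates. Each sampled matrix would be passed to a tree-building routine in a full analysis.

```python
import random
from itertools import combinations

def count_differences(seq_a, seq_b):
    """Number of aligned positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))

def sample_distance_matrix(genomes, rng):
    """Pick one 16S gene at random from each genome and count pairwise differences."""
    chosen = {name: rng.choice(genes) for name, genes in genomes.items()}
    return {
        (g1, g2): count_differences(chosen[g1], chosen[g2])
        for g1, g2 in combinations(sorted(chosen), 2)
    }

def distance_ranges(genomes, reps=1000, seed=0):
    """Repeat the sampling and keep the (min, max) distance seen for each genome pair."""
    rng = random.Random(seed)
    ranges = {}
    for _ in range(reps):
        for pair, dist in sample_distance_matrix(genomes, rng).items():
            low, high = ranges.get(pair, (dist, dist))
            ranges[pair] = (min(low, dist), max(high, dist))
    return ranges

# Toy data (hypothetical): genome name -> list of aligned 16S gene copies.
toy_genomes = {"A": ["ACGT", "ACGA"], "B": ["ACGT"], "C": ["TTTT"]}
print(distance_ranges(toy_genomes))
```

A pair whose range is wide relative to its distance to other genomes is one where the choice of gene copy could change the tree topology.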
Neighbor-Joining Trees with Completely Sequenced Genomes
• Different choices of which 16S genes to use lead to different phylogenies, both at the species/subspecies level and at higher levels.
• The variable nodes in the cereus group and the halodurans/clausii group are independent. Thus, these three trees represent 9 variants.
Defining B. megaterium and distinguishing it from other species, using 16S
• Comparison of B. megaterium isolates from Genbank to QM_rrnA
• A total of 185 isolates that were >1390 bp (i.e. >90% of full length) and had fewer than 10 ambiguity characters were aligned with QM_rrnA, and the number of variant positions was counted.
  • 70% have 9 or fewer differences;
  • 86% have 20 or fewer differences;
  • 95% have 46 or fewer differences.
• Most isolates seem to fall into a single group, but there may be some significantly different subtypes in B. megaterium.
  • Or, new species may be defined
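The inclusion criteria and the cumulative percentages above can be expressed as a short sketch. The length and ambiguity cutoffs come from the slide; everything else is an assumption made for illustration, including the function names, the set of IUPAC ambiguity codes used, and the decision to skip gapped or ambiguous columns when counting variant positions.

```python
# IUPAC ambiguity codes (anything other than A, C, G, T/U); assumed definition.
AMBIGUITY_CODES = set("RYSWKMBDHVN")

def passes_filter(seq, min_length=1390, max_ambiguous=10):
    """Keep isolates longer than min_length with fewer than max_ambiguous ambiguity codes."""
    ambiguous = sum(base in AMBIGUITY_CODES for base in seq.upper())
    return len(seq) > min_length and ambiguous < max_ambiguous

def count_variants(aligned_isolate, aligned_reference):
    """Count aligned columns where both sequences have a plain base and the bases differ.
    Gapped and ambiguous columns are skipped (an assumption, not stated on the slide)."""
    plain = set("ACGT")
    return sum(
        a != b
        for a, b in zip(aligned_isolate.upper(), aligned_reference.upper())
        if a in plain and b in plain
    )

def fraction_at_most(diff_counts, limit):
    """Fraction of isolates with no more than `limit` differences from the reference."""
    return sum(d <= limit for d in diff_counts) / len(diff_counts)

# Toy usage with made-up difference counts.
print(fraction_at_most([0, 5, 9, 20, 46], 9))
```

Whether end gaps count as differences matters here, because the ends of the 16S gene are where missing data is most common.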
Positions of Nucleotide Variants in Genbank Isolates
• 43% of the 1540 nucleotide positions in the 16S gene have at least one variant in the B. megaterium strains from Genbank.
• Most of the variation occurs at the ends of the 16S gene. This is also the region where missing data are most common.
  • PCR primers for 16S need to be internal to the gene
• The variant positions in the middle were seen in QM and DSM: the paired 461/474 positions, and 1140. There are no major polymorphisms outside the end regions that are not seen in QM and DSM.
Closely Related Species

species                 count  average  minimum
asahii                      4     80.5       67
azotoformans                1     82         82
bataviensis                 5     67.8       62
benzoevorans                4     72         65
circulans                  17     99.5       66
cohnii                      7     68.9       58
fastidiosus                 1     74         74
firmus                     51     81.5       66
flexus                     47     29.4        0
fumarioli                  17    103.1       71
funiculus                   7     97         92
halmapalus                  6     78.3       66
horikoshii                 16     83.8       69
infernus                    2     96         96
jeotgali                    1     78         78
koreensis                   1     87         87
luciferensis                3     86.7       85
megaterium                185     10.9        0
methanolicus                4     76.5       74
niacini                    14     79.1       59
novalis                     3     72.7       71
panaciterrae                5     90.2       82
psychrosaccharolyticus      1     73         73
simplex                    39     79.7       17
soli                        7     68.4       64
sphaericus                 75    119.4      103
vireti                      6     76.3       71

• How easy is it to distinguish between B. megaterium and closely related species?
• What species are closely related to B. megaterium? Different phylogenetic trees give different answers.
• All of the species on the next slides appear to be more similar to B. megaterium than members of the cereus group on at least one phylogenetic tree.
• All are in genus Bacillus except Lysinibacillus sphaericus.
• Total of 344 strains used
Number of Differences from QM_rrnA for Different Species
• Except for B. flexus and one B. simplex isolate, all strains are well differentiated from B. megaterium, with a minimum of 58 differences.
• B. flexus overlaps the B. megaterium distribution heavily. The average B. flexus isolate had 29.4 differences from QM_rrnA, with some isolates indistinguishable. The B. flexus type strain differs at 16 positions; the B. megaterium type strain differs at 4 positions.
• The average B. simplex isolate had 79.4 differences from B. megaterium; the one exceptional strain had 17 differences (possibly a mislabeled sequence).
Conclusions about 16S genes
• Choosing different 16S genes from within genomes can affect the resulting phylogenetic trees.
• The 16S genes within B. megaterium and other completely sequenced Bacillus genomes differ from each other by up to 16 positions.
  • Some species differ from other species at fewer positions than 16S genes differ within individual genomes.
• Although most B. megaterium strains are very similar to QM and DSM, there are a few strains with very different 16S genes that may represent subtypes within B. megaterium, or which may ultimately be assigned to new species.
• Most of the polymorphisms in the 16S genes are almost unique; all of the widespread B. megaterium polymorphisms are found in QM and DSM.
• Most of the closely related species fall outside the range of variation seen within B. megaterium, but B. flexus is a major exception.
  • Some isolates of B. flexus are indistinguishable from B. megaterium, and most fall within the same range of variation seen in B. megaterium
Common Genes and Synteny
• Bacillus is a relatively well-sequenced genus, with 11 complete genomes publicly available (not including B. megaterium).
• What genes are found in all Bacillus species, the core genome?
• Where on the chromosome are the conserved genes?
Synteny Results
• The syntenic region around the origin of replication is shared throughout the Bacillaceae, including the genera Bacillus, Geobacillus, Oceanobacillus, and Anoxybacillus.
• 99% of the 2000 core genes are in the syntenic region.
• Next: rRNA operons and adjacent genes: concrete examples of conserved synteny.
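The core genome and the "99% of core genes are in the syntenic region" figure are both set operations once genes have been grouped into ortholog families. The sketch below assumes that grouping has already been done by some other tool; the function names, family identifiers, and toy genomes are hypothetical.

```python
def core_genome(genomes):
    """Ortholog families present in every genome (genome name -> set of family IDs)."""
    family_sets = list(genomes.values())
    return set.intersection(*family_sets) if family_sets else set()

def fraction_in_region(core, region_families):
    """Fraction of core families that also lie in a given chromosomal region."""
    return len(core & region_families) / len(core) if core else 0.0

# Toy data (hypothetical family IDs).
toy = {
    "genome1": {"fam1", "fam2", "fam3"},
    "genome2": {"fam1", "fam2", "fam4"},
    "genome3": {"fam1", "fam2", "fam5"},
}
core = core_genome(toy)
print(sorted(core))
print(fraction_in_region(core, {"fam1"}))
```

The result depends heavily on how orthologs are called, so the size of the core set is only as reliable as that upstream step.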
rRNA operons (rrn)
• There are 11 rRNA operons on the B. megaterium chromosomes, plus one on plasmid pBM400 in the QM strain.
  • Other Bacillus species have 8-15 rrn operons
• The rrn operons are in the conserved synteny region.
  • Only in Bacillus and relatives
• rrn operons are all on the leading DNA strand: transcribed in the same direction as the replication fork moves.
• Most Bacillus rrn operons are on the right replichore, near the origin of replication
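The leading-strand statement can be checked from annotation coordinates alone. The sketch below assumes a circular chromosome with known origin and terminus positions, that coordinates increase clockwise, and that the "right" replichore is the arc running clockwise from the origin to the terminus. Those conventions, the function names, and all positions are assumptions for illustration.

```python
def replichore(pos, ori, ter, genome_length):
    """Return 'right' if pos lies on the clockwise arc from ori to ter, else 'left'."""
    offset = (pos - ori) % genome_length
    span = (ter - ori) % genome_length
    return "right" if offset < span else "left"

def on_leading_strand(pos, strand, ori, ter, genome_length):
    """True if a gene is transcribed in the direction the replication fork moves.
    On the right replichore the fork moves with increasing coordinates ('+' genes);
    on the left replichore it moves with decreasing coordinates ('-' genes)."""
    side = replichore(pos, ori, ter, genome_length)
    return (side == "right" and strand == "+") or (side == "left" and strand == "-")

# Toy chromosome (hypothetical): 5000 units long, ori at 0, ter at 2500.
print(on_leading_strand(100, "+", ori=0, ter=2500, genome_length=5000))
print(on_leading_strand(4900, "-", ori=0, ter=2500, genome_length=5000))
```

Running this over every annotated rrn operon in a genome would test the claim that none are transcribed against the fork.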
From Stewart and Cavanaugh, 2007, J. Mol. Evol. 65:44-67
Common Sites
• Nearly all the rrn operons in the Bacillaceae can be found between sets of common flanking genes.
  • Sometimes with DNA insertions separating the rrn locus from one side
• A few unique rrn operons, including 2 in B. megaterium
• Not in Paenibacillus
Flanking genes:
A: DNA repair protein recF
B: DNA gyrase, subunit B
C: DNA gyrase, subunit A
D: inosine-5'-monophosphate dehydrogenase
E: D-alanyl-D-alanine carboxypeptidase
F: glutamine amidotransferase, synthase subunit
Variations
• Seven sites on the right replichore, plus one on the left replichore.
  • Also, a site shared within the cereus group, and two sites shared in Geobacillus and Anoxybacillus.
• Individual rrn sites can contain 0-5 rrn operons.
  • Some sites are empty: the flanking genes are adjacent, with no rrn operon between them
  • A few sites are missing: the flanking genes are not present in the genome or are dispersed to very different locations.
• Tandem duplications of rrn operons are common
• Several variations caused by apparent intragenomic recombination
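The occupied, empty, and missing categories above can be assigned mechanically from an ordered gene list. The sketch below is one possible rule set under stated assumptions: the gene order is a simple list of names along the chromosome, every rRNA operon is labeled "rrn", and flanking genes more than `max_gap` genes apart are treated as dispersed. The function name, gene names, and the `max_gap` cutoff are hypothetical, and the circular wrap-around of the chromosome is ignored.

```python
def classify_site(gene_order, left_flank, right_flank, max_gap=20):
    """Classify an rrn site as ('occupied', n), ('empty', 0), or ('missing', 0)."""
    if left_flank not in gene_order or right_flank not in gene_order:
        return ("missing", 0)  # a flanking gene is absent from the genome
    start, end = sorted((gene_order.index(left_flank), gene_order.index(right_flank)))
    between = gene_order[start + 1:end]
    if len(between) > max_gap:
        return ("missing", 0)  # flanking genes dispersed to distant locations
    operons = between.count("rrn")
    return ("occupied", operons) if operons else ("empty", 0)

# Toy gene orders (hypothetical names).
occupied = ["geneK", "geneL", "rrn", "rrn", "geneR", "geneS"]
empty = ["geneK", "geneL", "geneR", "geneS"]
print(classify_site(occupied, "geneL", "geneR"))
print(classify_site(empty, "geneL", "geneR"))
```

A count above one at a single site corresponds to the tandem duplications mentioned in the list.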
Intragenomic Recombination at rrn Sites
• rrn operons are almost identical, and are among the very few repeated sequences in bacterial genomes
  • A second example: insertion sequences (IS), which are mobile genetic elements found in many genomes (very few in B. megaterium).
• The presence of highly conserved genes and the consequences of intragenomic recombination in a circular genome constrain genome rearrangements.
• The arrangement of rrn operons and their sites can be understood as the result of three forces:
  • intragenomic recombination between rrn operons,
  • insertions/deletions of blocks of protein-coding genes,
  • recombination events within tandem arrays of rrn operons.
Symmetrical Inversion Between Replichores
Anoxybacillus flavithermus
Double Crossover Re-orders Flanking Genes
B. pumilus
Double Crossover in Flanking tRNA Regions
• B. amyloliquefaciens rrnE.
• Resulted in loss of 2/3 of the 16S gene.
• 23S and 5S are intact
• Very little obvious homology on the right side.
Tandem Duplication Events: Duplication by Unequal Crossing Over
rrnD in Oceanobacillus iheyensis
Tandem Duplications in the cereus group rrnG site
• Every deletion between adjacent rrn operons can be seen.
• Deletion of genes between rrn2 and rrn3 (preserving one gene in the middle).
• Region between rrn3 and rrn4 completely replaced.
Intragenomic Recombination Conclusions
• Most rrn operons are found in the same sites in all Bacillus genomes
• Differences in rrn operon number are mostly due to tandem duplications within these sites
• Intragenomic recombination is well documented in Bacillus genomes (CO = crossover; 2CO = double crossover):
  • Anoxybacillus: symmetric inversion across ori
  • B. pumilus: double crossover involving 3 regions
  • Oceanobacillus rrnD: CO between tandem copies
  • rrnD in other species: at least 2 events
  • cereus group rrnG: deletions between tandem copies (at least 4 different events)
  • cereus group rrnG: replacement of inter-rrn region by presumed 2CO
  • cereus group rrnG: deletion of inter-rrn region, leaving a central portion intact (2 deletions?)
  • B. amyloliquefaciens rrnE: 2CO involving 3 regions, with the central section having the COs 150 bp apart
  • B. megaterium rrnBC: 2CO involving 3 regions, with little homology at one end
  • Several other duplication/deletion events within tandem duplications
Some Events NOT Observed
• The lack of certain events supports several current ideas.
  • To the extent that lack of evidence constitutes evidence.
• Crossovers between rrn sites: none seen, despite numerous CO events within rrnG in the cereus group, plus many other CO events
  • Supports the idea that the flanking genes are necessary
• Asymmetric CO across ori: only one symmetric one observed, so evidence is not strong.
  • Supports the idea that symmetric replichores are selectively advantageous
• Inversions within a replichore: all rrn in all species are on the leading strand, in both replichores.
  • Supports the idea that replication and rrn transcription must proceed in the same direction
Some Unsolved Questions
• Replichore asymmetry:
  • most of the rrn operons are in the right replichore
  • compositional bias between replichores
• Mechanism of insertion/deletion/horizontal gene transfer
  • A big question. We are examining insertion sites for clues.
• Is there a common phylogeny for the conserved synteny region in B. megaterium?
• Finding and analyzing allegedly unique events (indels and recombinations)
Thanks!
NIU Biology Dept.: Pat Vary, Janaka Edirisinghe, Kirthi Kumar Kutumbaka, Sandhya Balasubramanian, Jenn Hintzsche, Chris Braun, Denise Tombolato, Judy Luke, Scott Grayburn
NIU Computer Science Dept.: Stephen Snow, Reva Freedman, Minmei Hou
Argonne National Labs: Ross Overbeek, Gordon Pusch, Terry Disz
TIGR/U. Maryland: Jacques Ravel, Mark Eppinger, MJ Rosovitz
Technische U. Braunschweig: Dieter Jahn, Boyke Bunk