
Genomic Firsts

Genomic Firsts. 1976: RNA virus -- Phage MS2 (3 kbp) 1977: DNA virus -- Phage Φ-X174 (6 kbp) 1995: Bacteria -- Haemophilus influenzae (1.8 Mbp) 1996: Eukarya -- Saccharomyces cerevisiae (12 Mbp) 1996: Archaea -- Methanococcus jannaschii (1.6 Mbp)


Presentation Transcript


  1. Genomic Firsts 1976: RNA virus -- Phage MS2 (3 kbp) 1977: DNA virus -- Phage Φ-X174 (6 kbp) 1995: Bacteria -- Haemophilus influenzae (1.8 Mbp) 1996: Eukarya -- Saccharomyces cerevisiae (12 Mbp) 1996: Archaea -- Methanococcus jannaschii (1.6 Mbp) 2000: draft human genome -- J. Craig Venter (3 Gbp)

  2. Genome Sequencing Explosion

  3. Genome Sequencing Explosion

  4. Three domains of life 16S rRNA sequences Woese 1987

  5. Global phylogeny of 191 organisms derived from 31 conserved protein genes. Tree is fairly well resolved and agrees mostly with rRNA tree. Ciccarelli et al (2006) Science

  6. Genomic streamlining in prokaryotes Proteobacteria (from Higgs & Attwood) ~1000 bp/gene short intergenic regions

  7. Efficiency in the Genome • Small organisms care about DNA replication time. • No wasted space • High coding density (85-90%) • 1 gene per 1000 bases in prokaryotes • Haemophilus influenzae • 1762 genes in 1.8 Mb • Human • 23000 genes in 3080 Mb • Eukaryotic genomes have lots of transposons and repetitive sequences. • The larger organelle genomes also have a greater fraction of non-coding sequence, but small animal mitochondria fit the trend of the bacteria. Hou and Lin – PLoS ONE 2009
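The gene-density contrast on this slide can be checked with a line of arithmetic. A minimal sketch, using only the gene counts and genome sizes quoted above (the dictionary layout and function name are illustrative, not from the presentation):

```python
# Gene counts and genome sizes as quoted on the slide.
slide_genomes = {
    "Haemophilus influenzae": {"genes": 1762, "size_bp": 1_800_000},
    "Human": {"genes": 23_000, "size_bp": 3_080_000_000},
}


def bases_per_gene(genes, size_bp):
    """Average genome length per gene, in base pairs."""
    return size_bp / genes


density = {name: bases_per_gene(**g) for name, g in slide_genomes.items()}
for name, bp in density.items():
    print(f"{name}: about {bp:,.0f} bp per gene")
```

H. influenzae comes out close to the "1 gene per 1000 bases" rule of thumb, while the human figure is more than a hundred times larger.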

  8. Large variation in genome size between bacteria. Sorangium cellulosum (14000 kb): 11599 coding sequences, soil bacterium. Tremblaya princeps (140 kb): 121 coding sequences, endosymbiont in insect cells.

  9. McCutcheon and Moran Nature Reviews (2012)

  10. McCutcheon and Moran Nature Reviews (2012) Reduced size genomes evolve independently in different lineages. Usually on long branches = fast sequence evolution.

  11. Subdivisions of proteobacteria identified using 16S rRNA originally • α-proteobacteria • Agrobacterium tumefaciens – genetic engineering • Rickettsia conorii – ticks – spotted fever • Rickettsia prowazekii – lice – typhus • β-proteobacteria • Neisseria meningitidis • N. gonorrhoeae • γ-proteobacteria • Escherichia coli – commensal – lab study • Yersinia pestis – plague • Haemophilus influenzae – respiratory pathogen (first bacterial genome) • Xanthomonas / Xylella – plant pathogens • ε-proteobacteria • Helicobacter pylori – stomach infections. Considerable change in GC content among related genomes. Short genomes are derived from longer genomes – lots of deletions in cases of intracellular parasites and endosymbionts.

  12. Pathogens and intracellular bacteria have low GC content – May be a result of metabolic cost of synthesis of G and C being higher (Rocha and Danchin, 2002) These genomes are also small – use it or lose it! This may explain correlation of GC content with genome size It has also been argued that there is a general mutation bias towards AT, and that selection for GC keeps this from going to very low GC in most organisms. This stabilizing selection might be weaker in smaller intracellular organisms. Therefore smaller genomes have more AT. ...However, two extremely small genomes break the trend. Maybe these have a mutation bias in the other direction (towards GC) – this is not yet measured.

  13. Circular representation of the R. conorii genome (strain Malish 7). The outermost circle indicates the nucleotide positions. The second and third circles locate the ORFs on the plus and minus strands, respectively. Function categories are color-coded [see Web fig. 1 (10)]. The fourth and fifth circles locate tRNAs. The locations of three rRNAs are indicated by black arrows. The sixth and seventh circles indicate the locations of repeats. The eighth circle shows the G-C skew, (G−C)/(G+C), with a window size of 10 kb. The region locally breaking the genome colinearity with R. prowazekii is indicated by a shaded sector. The four major genomic segments involved in this rearrangement are colored in blue, yellow, green, and red. Ogata et al – Science (2001)
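The G-C skew plotted in the eighth circle is a simple windowed statistic. A minimal sketch, assuming non-overlapping windows (the caption gives only the window size, so the stepping scheme and the function name are assumptions) and using a toy sequence rather than the R. conorii genome:

```python
def gc_skew(seq, window):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows."""
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        # A window with no G or C has an undefined skew; report 0.0 here.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


# Toy 20-base sequence: first half G-rich, second half C-rich.
toy = "GGGCATGGAC" + "CCCGATCCAG"
skews = gc_skew(toy, window=10)
print(skews)
```

For a real genome the window would be 10,000 bases, as in the figure, and the sign change of the skew is commonly used to locate the replication origin and terminus.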

  14. Illustration of the colinearity. Three distinct segments from the R. conorii genome aligned with the homologous segments from the R. prowazekii genome are shown. These segments were chosen to show three types of gene alteration: split genes in R. prowazekii (top), a split gene in R. conorii (middle), and a gene remnant in R. prowazekii (bottom).

  15. Comparison of genomes of related organisms shows synteny – but relatively rapid evolution of gene order Mycoplasma genitalium and M. pneumoniae Each dot shows a high-scoring BLAST match between a gene of one species and a gene of the other species

  16. Gene gain via Horizontal Gene Transfer (mostly prokaryotes)

  17. Gene gain via Gene Duplication (mostly eukaryotes)

  18. Genomic streamlining in symbionts and pathogens McCutcheon & Moran (2012)

  19. Host-restricted parasites and endosymbionts • Fewer essential genes because of environment provided by host • Smaller effective population size (bottlenecks) • Reduced selection against slightly deleterious mutations & reduced opportunity for homologous recombination → faster sequence evolution, reduced functionality and stability of proteins (need for high level of chaperones) • Reduced selection against the deletion of slightly beneficial genes, inherent bias toward deletions, & reduced opportunity to acquire genes horizontally → gene loss much faster than gene gain. • Free-living bacteria • Selection to maintain reasonably large set of functional genes. • Gene acquisition balances gene loss • HGT mediated by viruses and plasmids → gain of functions • Some cells are competent for DNA uptake (transformation) • Homologous recombination can eliminate some deleterious mutations

  20. Balance between selection and mutation in a large population. n_k = number of individuals with k deleterious mutations; N = total population size; U = number of deleterious mutations per genome per generation. Assume no advantageous mutations. Back-mutations are very rare. Fitness w = (1-s)^k. For a very large population, selection balances mutation. There is a stationary state:
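For this model the standard stationary distribution (Haigh 1978) is Poisson with mean U/s, that is n_k = N e^(-U/s) (U/s)^k / k!. A minimal sketch that evaluates it, with parameter values chosen purely for illustration (they are not from the presentation):

```python
import math


def stationary_class_sizes(N, U, s, k_max):
    """Expected number of individuals carrying k = 0..k_max deleterious mutations."""
    theta = U / s  # mean number of deleterious mutations per individual
    return [
        N * math.exp(-theta) * theta**k / math.factorial(k)
        for k in range(k_max + 1)
    ]


# Illustrative values only: one million individuals, U = 0.1, s = 0.01.
sizes = stationary_class_sizes(N=1_000_000, U=0.1, s=0.01, k_max=60)
n0 = sizes[0]
print(f"fittest class n0 is about {n0:.1f} individuals")
```

Even in a population of a million, the mutation-free class holds only a few dozen individuals when U/s = 10, which is why its size matters so much once N is small.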

  21. Muller’s Ratchet – Accumulation of deleterious mutations in asexual species with small populations. If N is fairly small, then the number of individuals in the fittest class, n0, can be very small. This fluctuates, and eventually goes to zero. If there are no back-mutations, the fittest class is gone forever. This is one click of the ratchet. More and more deleterious mutations accumulate with time until “mutational meltdown” kills the species.
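The clicking of the ratchet can be seen in a toy simulation. A minimal sketch, assuming a Wright-Fisher style asexual population with fitness (1-s)^k, Poisson-distributed new mutations and no back-mutation; the function names and parameter values are illustrative, not taken from the presentation:

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate using Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_ratchet(N, U, s, generations, seed=0):
    """Track the mutation count of the least-loaded class over time."""
    rng = random.Random(seed)
    pop = [0] * N  # deleterious mutations carried by each individual
    least_loaded = []
    for _ in range(generations):
        # Parents are sampled in proportion to fitness w = (1 - s)^k.
        weights = [(1 - s) ** k for k in pop]
        parents = rng.choices(pop, weights=weights, k=N)
        # Offspring inherit the parental load plus new mutations.
        pop = [k + poisson(rng, U) for k in parents]
        least_loaded.append(min(pop))
    return least_loaded


trajectory = simulate_ratchet(N=50, U=0.5, s=0.05, generations=200)
print("least-loaded class at start and end:", trajectory[0], trajectory[-1])
```

Because offspring can only add mutations, the minimum load never decreases; each upward step in `trajectory` is one click of the ratchet.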

  22. Muller’s Ratchet is stopped by recombination. After one click of the ratchet, every chromosome has at least one deleterious mutation, but they don’t all have the same one. Cross-over can recreate the fittest class. This is much more likely than back-mutation in sexual species. (Figure panels: initial population, mutation, recombination.)

  23. Muller’s Ratchet and the Evolution of Sex • Two-fold cost of males in sexual species → there must be a big benefit of sex to outweigh this cost • A few parthenogenetic species are derived from sexual ancestors. These do not do well in the long term. • The ability of recombination to stop Muller’s ratchet is one large advantage of sex, and is one possible reason for the prevalence of sexual species. • Host-parasite co-evolution is probably another important reason. • Maybe most free-living bacteria should be thought of as sexual, not asexual. • Uptake of fragments of DNA from similar cells gives the possibility of homologous recombination. This functions like sex in eukaryotes. It can remove deleterious mutations. • Uptake of DNA from distantly related organisms (Horizontal Gene Transfer) can lead to the spread of beneficial genes • When bacteria become obligate parasites or endosymbionts, they become truly asexual. • Consequences are gene loss and accumulation of deleterious mutations.

  24. Global phylogeny of 191 organisms derived from 31 conserved protein genes. Tree is fairly well resolved and agrees mostly with rRNA tree. Ciccarelli et al (2006) Science

  25. Need to consider Eukaryotes separately for 2 reasons. • Almost everyone believes there is a tree for Eukaryotes. • Origin of Eukaryotes is a later unique event that is very likely not tree-like. Do prokaryotic taxa mean anything? -Proteobacteria? Enterobacteriaceae? E. coli?

  26. Criticisms of the Prokaryotic Tree of Life (Bapteste et al. 2009) “Belief in the universal tree of life is stronger than the evidence from genomes that supports it.” Circularity of tree methods – Phylogenetic methods always produce a tree of some kind. Statistical problems – weak signals from many individual genes. Failure to reject the consensus tree is not necessarily support for it. Systematic biases in phylogenetic methods. Large-scale exclusion of conflicting data. Core genes not necessarily representative of a species tree. Closely related species may exchange genes more frequently. Unrelated species in similar niches may exchange genes more frequently. Convergent evolution? This is an interesting paper but take it with a pinch of salt

  27. Spectrum of Opinions • The tree of rRNA and translational genes is the species tree. Other genes appear to give different trees just because of noise and phylogenetic errors. HGT is unimportant. • The tree of rRNA and translational genes is the best information we have about the tree of cell divisions and speciations. Most genes follow this tree most of the time, even if most genes may have been horizontally transferred at some point in their history. • The tree of rRNA and translational genes tells us only about the history of these genes, and is therefore not particularly important. There are other essential groups of genes that follow other evolutionary paths. We need a network representation, not a single tree. • HGT is so frequent that all genes follow different histories. Therefore tree-building is a waste of time. We only get results that look like trees because our methods are designed to produce trees.

  28. Gene Content Variation among E. coli genomes. Evidence for horizontal transfer – Welch et al (2002). Core genome = intersection of sets Pangenome = union of sets
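The set definitions on this slide translate directly into code. A minimal sketch with made-up strain and gene names (none of them come from the E. coli data):

```python
# Hypothetical gene content of three strains; names are placeholders.
strain_genes = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g3", "g5"},
    "strain_C": {"g1", "g2", "g6"},
}

core_genome = set.intersection(*strain_genes.values())  # genes in every strain
pan_genome = set.union(*strain_genes.values())          # genes in any strain

print("core:", sorted(core_genome))
print("pan: ", sorted(pan_genome))
```

Adding a strain can only shrink the core genome or leave it unchanged, and can only grow the pangenome or leave it unchanged.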

  29. Core and Pan-genome of E. coli Core genome Pan-genome Rasko et al (2008) J. Bacteriol.

  30. Rapid Gain and Loss of genes among closely related genomes of Bacillus Hao and Golding (2006) Genome Research • Assumes a tree to begin with (many conserved genes) • Only two of the patterns shown require more than one character change • Does not distinguish HGT from innovation

  31. Tree of Archaea based on signature genes Gao and Gupta (2007) BMC Genomics • Signature genes are those that are shared by all members of a group and are not possessed by any other species. • Can the tree be constructed from gene content alone? • Does not show events that do not fit the hierarchical tree. • What about transfers within niches? Groups of genes confer metabolic activity

  32. Phylogeny of three domains of life based on shared gene content SHOT – Korbel et al (2002) S = fraction of genes that are orthologues between two species d = -lnS Input d to NJ method Major domains and groups of bacteria are obtained the same as for rRNA Does not work for very reduced genomes of parasites & symbionts
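The distance d = -ln S can be sketched as follows. The slide defines S only as the fraction of genes that are orthologues between two species; dividing by the size of the smaller genome is an assumption made here for illustration, and the gene identifiers are placeholders:

```python
import math


def gene_content_distance(genes_a, genes_b):
    """d = -ln(S), with S the shared fraction of the smaller gene set (assumed)."""
    shared = len(genes_a & genes_b)
    smaller = min(len(genes_a), len(genes_b))
    S = shared / smaller
    if S == 0:
        return math.inf  # no shared genes: distance is unbounded
    return -math.log(S)


# Placeholder orthologue identifiers for two hypothetical species.
species_x = {"a", "b", "c", "d"}
species_y = {"a", "b", "e", "f"}
d_xy = gene_content_distance(species_x, species_y)
print(f"d = {d_xy:.3f}")
```

A matrix of such pairwise distances is what would be passed to a neighbour-joining implementation. The normalisation also hints at why very reduced genomes are a problem: the value of S then depends heavily on which genome sets the denominator.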

  33. Always possible to explain a presence/absence pattern by either multiple deletions or by horizontal transfer. Examples from Dagan et al (2007): (a) Loss only, (b) Single origin, (c) Origin + 1 HGT, (d) Origin + 2 HGTs. The problem is, we don’t know the ratio of HGT to deletions….

  34. Reconstructing ancestral genomes using parsimony (Dagan et al 2007) If HGT is disallowed or penalized too much, then ancestral genomes must have been far larger than any current genomes. If HGT is too frequent then ancestral genomes are apparently too small. This helps to find a moderate value for the ratio of HGT to deletions.

  35. Method of Collins & Higgs (2012): Collect genomes from NCBI → All-vs-All BLASTP → Single-linkage clustering → Identification of universal single-copy clusters → Global amino acid alignment → Concatenation of alignments → Phylogenetic reconstruction using Maximum Likelihood
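The clustering and filtering steps of this pipeline can be sketched with a union-find structure. This is an illustration under stated assumptions, not the authors' code: the BLASTP hits are taken as an already-thresholded list of protein pairs, and all protein and genome names are hypothetical.

```python
def single_linkage_clusters(proteins, hits):
    """Group proteins so that any two joined by a chain of hits share a cluster."""
    parent = {p: p for p in proteins}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in hits:
        parent[find(a)] = find(b)

    clusters = {}
    for p in proteins:
        clusters.setdefault(find(p), set()).add(p)
    return list(clusters.values())


def universal_single_copy(clusters, genome_of, n_genomes):
    """Keep clusters with exactly one member from every genome."""
    kept = []
    for cluster in clusters:
        members = [genome_of[p] for p in cluster]
        if len(members) == n_genomes and len(set(members)) == n_genomes:
            kept.append(cluster)
    return kept


# Hypothetical proteins from three genomes A, B and C.
genome_of = {"A1": "A", "A2": "A", "A3": "A", "B1": "B", "B2": "B", "C1": "C"}
hits = [("A1", "B1"), ("B1", "C1"), ("A2", "B2"), ("A2", "A3")]
clusters = single_linkage_clusters(list(genome_of), hits)
usc = universal_single_copy(clusters, genome_of, n_genomes=3)
print(usc)
```

In this toy input, A1, B1 and C1 form the only universal single-copy cluster; the second cluster is rejected because it has two proteins from genome A and none from genome C.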

  36. Core and Pangenomes. Closed – means that pangenome size tends to a maximum as number of genomes increases. Open – means that pangenome keeps increasing as you add new genomes. Fitting the data suggests that the pangenome is open for most groups of bacteria and that Gpan(n) increases in proportion to ln(n). This is expected on a tree like a coalescent (a). On a star tree (b), it would increase linearly with n.
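A logarithmic growth law of this kind can be fitted by ordinary least squares on ln(n). A minimal sketch using synthetic pangenome sizes generated from a known curve, so the numbers are not real data and the function name is illustrative:

```python
import math


def fit_log_growth(ns, sizes):
    """Least-squares fit of sizes = a + b * ln(n); returns (a, b)."""
    xs = [math.log(n) for n in ns]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(sizes) / len(sizes)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, sizes)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    return a, b


# Synthetic example: pangenome sizes drawn exactly from 2000 + 500 ln(n).
ns = list(range(1, 21))
sizes = [2000 + 500 * math.log(n) for n in ns]
a, b = fit_log_growth(ns, sizes)
print(f"a = {a:.1f}, b = {b:.1f}")
```

With real genome collections, comparing the residuals of this fit against those of a linear fit in n is one simple way to distinguish the coalescent-like expectation from the star-tree expectation.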

  37. Gene Frequency Spectra. G(k) is the number of genes found in k genomes from a group of n. There is a U-shape: many genes found in only 1 or 2 genomes, a certain number of core genes in (almost) all n, and fewer genes in between. The U-shape applies at all scales from species to the full bacterial domain. Examples: 9 Prochlorococcus genomes, Baumdicker et al (2009); 293 Bacterial genomes, Lapierre and Gogarten (2009), Collins and Higgs (2012)
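Computing G(k) from gene presence data takes two counting passes. A minimal sketch with four made-up genomes (the gene names are placeholders):

```python
from collections import Counter


def gene_frequency_spectrum(genomes):
    """Return [G(1), ..., G(n)]: number of genes present in exactly k genomes."""
    occurrences = Counter(gene for genome in genomes for gene in genome)
    spectrum = Counter(occurrences.values())
    return [spectrum.get(k, 0) for k in range(1, len(genomes) + 1)]


# Toy data: three core genes, one gene shared by two genomes, four singletons.
toy_genomes = [
    {"c1", "c2", "c3", "u1", "s1"},
    {"c1", "c2", "c3", "u2", "s1"},
    {"c1", "c2", "c3", "u3"},
    {"c1", "c2", "c3", "u4"},
]
G = gene_frequency_spectrum(toy_genomes)
print(G)
```

Even this toy input shows the U-shape described on the slide: high counts at k = 1 and k = n, and few genes in between.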

  38. Core, Shell and Cloud genes (Koonin and Wolf – 2012)

  39. The role of gene duplication: Gene family size distributions Collins and Higgs (2011)

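  40. Modelling duplication and deletion of genes. (Diagram: a chain of gene family sizes 0, 1, 2, 3, etc., with transitions between neighbouring sizes; the rate labels are not recoverable.)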

  41. Origin of Mitochondria. Sequence similarity to Rickettsia – within α-proteobacteria. Also conserved gene order between Rickettsia and the mitochondrial genome of the protist Reclinomonas (one of the largest mitochondrial genomes).

  42. Gene order and phylogeny for Hodgkinia (very small endosymbiont – see assignment 3) Shows it has evolved independently of the lineage leading to Rickettsia and mitochondria Derived change in Rickettsia not shared with Hodgkinia Hodgkinia placed within Rhizobiales – raises questions of GC content bias and long branch attraction

  43. Long Branch Attraction - An artefact of phylogenetic methods that tends to put unrelated species with rapid evolution together. It can also draw long branch species closer to the root, because they are attracted to the outgroup. Rooting the tree of life using ancient gene duplications

  44. Long Branch attraction and the tree of rRNA (Gribaldo and Philippe 2002) Typical tree in older papers shows many lineages on long branches close to the roots of Bacteria and Eukarya. Are there any eukaryotes that never had mitochondria? Were ancestral organisms hyperthermophiles? Root is usually inferred from ancient gene duplications, e.g. EF-Tu and EF-G

  45. After correcting for long branch attraction... Microsporidia are now related to fungi. They have small genomes with lots of gene loss and rapid sequence evolution. Current thought says there may never have been eukaryotes without mitochondria. Eukaryotes evolved by fusion of an α-proteobacterium with an archaeon. The event that created the mitochondria also created the nucleus. Seems strange! This would make prokaryotes monophyletic after all. Phylogeny of major bacterial groups is still uncertain. Deduction of temperature at base of tree is difficult. Most papers still argue for hyperthermophiles at common ancestor of archaea and bacteria. Root is still most likely here, although this paper questions it.

  46. Or was there a mesophilic origin after all? Growth temperature mapped onto the rRNA tree

  47. Competing hypotheses for the origin of the eukaryotic host cell. T. A. Williams et al. Nature 504, 231-236 (2013) doi:10.1038/nature12779 Eocyte hypothesis: The root is (still) on the bacterial branch. Eukaryotes fall within the archaea. They have a common ancestor with Eocytes/Crenarchaeota. Only Two Domains! Standard picture: The root is on the bacterial branch. There is a common ancestor of archaea and eukaryotes

  48. Maybe Giant Viruses are a Fourth Domain? RNA polymerase sequences from Global Ocean Survey (GOS) Wu et al. (2011)
