IB404 - 14 - Other plants - March 10 1. The next plant genome was the 450 Mbp genome of rice, Oryza sativa. Several groups contributed to this effort, including two large companies, Syngenta and Monsanto, who produced WGS drafts, a WGS draft by a Chinese genome center, and detailed clone-by-clone efforts by the Japanese. Several conclusions are worth noting: A. Despite about at least 200 Myr divergence between these two major lineages of angiosperms (dicots and monocots), there is still considerable microsynteny between their genes. B. The rice genome is much larger because of many transposon insertions between genes, that is, genes are islands in a sea of transposons. C. The rice and other grass/cereal genomes are essentially colinear, being around 70 Myr old, so rice serves as a model for the others, including maize/corn at 3 Gbp and wheat at 16 Gbp.
2. An enormous effort has gone into sequencing the maize genome, but at ~2.5 Gbp it is nearly the same size as ours, except even more of it is transposons, and these transposons form long stretches of seemingly barren genomic landscape. Like rice, but even more so, genes are islands in this sea of transposons. Various strategies were employed to separate out the gene islands and sequence only them, but eventually the genome was completed, using old-fashioned clone-by-clone approach involving sequencing ~17,000 BACs, and published in 2009. It is 85% transposons, and the authors predict around 32,000 genes. The transposon numbers are simply amazing.
3. Like Arabidopsis, rice, and sorghum, the genes are mostly on the chromosome arms of the 10 metacentric chromosomes, with massive pericentromeric regions completely dominated by transposons and other repeats. The first line is exons, the rest are various transposons.
4. Since these are all reasonably young flowering plants or angiosperms, we expect to see most genes shared, especially amongst the three grains, but even across to Arabidopsis the number of shared genes is high, with only a few losses in each lineage, and of course some rapidly evolving lineage-specific genes. This Venn diagram is simplified by looking only at gene families, of which there are ~12,000 per genome (recall that animals have ~10,000 gene families).
5. The soybean (Glycine max) genome was published in 2010. It also took a long time because it is relatively large at around 1 Gbp with tons of transposon repeats, and again has a lot of duplicated genes. The genome is in 20 chromosomes, and extensive genetic and physical maps allowed assigning most scaffolds to these chromosomes. In this image of them, genes are in blue, transposons in green, yellow, and light blue, while the upper grey layer is “none of the above”. Notice that the genes are mostly at the ends of the chromosome arms, with the pericentromeric transposon repeats making up most of the genome. Centromeres are pink.
6. The ~46,000 genes reveal two cycles of polyploidization, estimated by these authors at 13 and 60 Myr ago (which is completely different from the diagram at the end of the last lecture??). In this histogram, the ~31,000 genes that are paralogous are shown as pairs, with the younger polyploidization event having a Ks mode of ~0.15 (15% of possible synonymous changes have occurred), whereas the older event has a Ks mode of ~0.5. The remaining ~15,000 genes are singletons, presumably because their paralog was lost.
7. One of the most important features of soybeans is their production of lipids, with soybean oil being one of the major products. They tried to annotate all the genes possibly involved in lipid metabolism, and came up with 1,157. In every category there are more than in Arabidopsis, presumably due to retention of gene duplicates after the ~13 MYA polyploidization. Other analyses include an extensive discussion of the genes implicated in facilitating and regulating the formation of nodules, the root structures peculiar to legumes which harbor nitrogen-fixing bacteria.
8. Just to show the complexity of these genomes, here are their pie charts of the many different kinds of transcription factors found in these two plant genomes (5,673 genes in 63 families for soybean, fully 12% of their gene catalog). Remember that the MADS family genes are the major regulators of plant form. They do have homeodomain/homeobox TFs, but they have different functions in plants. Soybean is on the left, with roughly double the number of each class of TF.
9. Several other plants have been sequenced, including sorghum, grape, and Populus, and more recently cucumber and strawberry. Our own Ray Ming in Plant Biology led sequencing of the papaya genome, starting when he was working in Hawaii generating transgenic strains resistant to viral infection. They did a fairly rough draft 3X WGS genome assembly with lots of gaps and a total size around 400 Mbp. Unlike all these other plants, it does not appear to have undergone a recent polyploidization, and hence has only ~13,000 genes. Indeed, this is close to the estimate for what the ancestral angiosperm genome looked like. 10. The Populus genome is reminiscent of the soybean genome, with ~45,000 protein-coding genes, and a relatively recent whole-genome duplication or polyploidization event contributing ~8000 paralog pairs that persist in this genome in large microsyntenic chunks. Populus is being promoted as a biofuel.
11. Many plants have large genomes, especially amongst the gymnosperms like pines (15-25 Gbp), so they are unlikely to be completely sequenced any time soon. Instead, large scale sequencing of cDNAs, plus massive scale EST projects are being undertaken, allowing comparisons across all plants. Here is a phylogenomic tree based on large numbers of proteins from both genomes and transcriptomes, rooted with mosses, ferns, and liverworts at the bottom. The authors used the tree and protein dataset to conclude that several aspects of microRNA biology only evolved in the angiosperms, and might have contributed to their diversity.