
The gene family play and the chromosomal theater




  1. The gene family play and the chromosomal theater. Todd Vision, Department of Biology, University of North Carolina at Chapel Hill

  2. Outline
     • Large-scale duplication and loss of genes in the angiosperms
     • Looking into the future of plant phylogenomics
     • A case study in gene family demography
     • Duplication and functional divergence

  3. Paul Franz, University of Amsterdam

  4. Arabidopsis as a hub for plant comparative maps. Data from Arumuganathan & Earle (1991) Plant Mol Biol Rep 9:208-218

  5. Tomato-Arabidopsis synteny Bancroft (2001) TIG 17, 89 after Ku et al (2000) PNAS 97, 9121

  6. Duplicated genes in Arabidopsis

  7. Modes of gene duplication
     • Tandem (T)
        • unequal crossing-over
        • mostly young
     • Dispersed (D)
        • transposition
        • all ages
     • Segmental (S)
        • polyploidy
        • all old

  8. Paleotetraploidy? The Arabidopsis Genome Initiative. 2000. Nature 408:796

  9. Vision et al. (2000) Science 290:2114-7.

  10. Microsynteny within blocks

  11. Distribution of dA: in blocks vs. not in blocks
      Problems
      • proteins diverge at different rates
      • high dA is difficult to estimate
      Solution
      • average dA within blocks
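The "average dA within blocks" step can be sketched as follows. This is a minimal illustration, not the procedure used in the talk: the input format (a list of `(block_id, dA)` tuples, one per duplicated gene pair) and the function name `mean_dA_per_block` are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def mean_dA_per_block(pairs):
    """Average dA over all gene pairs that belong to the same block.

    pairs: iterable of (block_id, dA) tuples (hypothetical input format).
    Returns a dict mapping block_id to the mean dA of its pairs.
    """
    by_block = defaultdict(list)
    for block_id, d_a in pairs:
        by_block[block_id].append(d_a)
    return {block_id: mean(values) for block_id, values in by_block.items()}


# Made-up values, for illustration only.
example = [("block_1", 0.2), ("block_1", 0.4), ("block_2", 1.0)]
block_means = mean_dA_per_block(example)
```

Averaging over the pairs in a block smooths out protein-to-protein rate differences, which is the motivation stated on the slide.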

  12. Discrete duplication events [Figure: timeline from 0 to 200 Mya with duplication age classes A-F; lineages shown: Rosids (Arabidopsis), Asterids (tomato), monocots (rice); divergence ranges marked at 110-160 Mya and 160-240 Mya]

  13. The 2-4 complex (one ancestral segment broken up by 4 large inversions)

  14. Coefficient of variation = 0.67; coefficient of variation = 0.53
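For reference, the coefficient of variation is the standard deviation divided by the mean. The sketch below assumes the population standard deviation; the slide does not say which estimator was used, nor which quantity the two values refer to.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return standard deviation / mean for a sequence of numbers.

    Uses the population standard deviation (an assumption; the sample
    standard deviation, statistics.stdev, is the other common choice).
    """
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for mean 0")
    return pstdev(values) / m
```

A lower value means the measurements are more tightly clustered relative to their mean.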

  15. Rice-Arabidopsis microsynteny Mayer et al. (2001) Genome Res. 11, 1167

  16. Blanc, Hokamp, Wolfe (2003) Genome Res. 13, 137-144.

  17. [Figure: four Arabidopsis and four Rice labels, with a duplication event marked]

  18. Block 37: after Asterid-Rosid split. Block 57: before monocot-dicot divergence. Raes, Vandepoele, Saeys, Simillion, Van de Peer (2003) J. Struct. Func. Genomics 3, 117-129

  19. Divergence among duplicated genes in rice Goff et al. (2002) Science 296: 92

  20. Hidden syntenies Simillion, Vandepoele, Van Montagu, Zabeau, Van de Peer (2002) PNAS 99, 13627

  21. Interspecies comparison can reveal hidden syntenies Vandepoele, Simillion, Van de Peer (2002) TIG 18, 606-608

  22. Comparative mapping in a phylogenetic context

  23. Major plant genome datasets (Family: Genus, with one X per dataset type available among genome, EST, map)
      Aizoaceae: Mesembryanthemum crystallinum X
      Brassicaceae: Arabidopsis thaliana X X X; Brassica spp. X
      Fabaceae: Glycine max X X; Medicago truncatula X X; Phaseolus spp. X
      Malvaceae: Gossypium arboreum X X
      Solanaceae: Capsicum annuum X; Lycopersicon esculentum X X; Solanum tuberosum X X
      Poaceae: Hordeum vulgare X X; Oryza sativa X X X; Sorghum bicolor/propinquum X X; Triticum aestivum X X; Zea mays X X
      Other: Beta vulgaris X; Chlamydomonas reinhardtii X X; Pinus taeda X X; Populus spp. X; Prunus spp. X

  24. Plant unigene datasets
      species         TIGR      PlantGDB
      barley          49885     74621
      beet            na        13565
      chlamydomonas   30296     na
      citrus          na        4266
      coffee          na        392
      cotton          24350     27854
      grape           49885     74621
      iceplant        8455      8945
      lettuce         21960     na
      lotus           11025     na
      maize           55063     71655
      marchantia      na        1059
      medicago        36976     43384
      oat             na        361
      onion           11726     na
      pine            26882     24668
      poplar          na        20935
      potato          24275     24839
      rice            60778     52156
      rye             5199      5384
      sorghum         33273     34363
      soybean         67826     73946
      sunflower       20520     na
      tomato          31012     35725
      wheat           109509    95949
      + Arabidopsis 27170

  25. Wikström et al (2001) Proc R Soc Lond B 268, 2211

  26. Plant phylogenomics: Phytome
      • The goal is to integrate
         • Organismal phylogeny
         • Gene family
            • sequence
            • alignment
            • phylogeny
         • Genetic and physical maps

  27. Some uses for Phytome
      • Starting with a chromosome segment
         • Identify homologous segments
         • Predict unobserved gene content (candidate QTL)
      • Starting with a gene family
         • Resolve orthology/paralogy relationships
         • Identify coevolving families
      • Starting with a species
         • Explore lineage-specific diversification
         • Guide comparative mapping wet-work

  28. Current pipeline [Flowchart labels, in extraction order: Protein sequence prediction; Homolog identification; Unigene collections; Protein family clustering; Annotations; Multiple sequence alignment; Phytome; Phylogenetic inference]

  29. Lineage-specific diversification [Figure labels, in extraction order: Arabidopsis 1033 173 436; Cotton 334 836 696; Medicago 715; Tomato 919; Rice] 152 genes are “single copy” in all four species

  30. A tale of two sisters: the ARF and the Aux/IAA gene families
      • Modulate whole plant response to auxin
      • Interact via dimerization
      • ARFs are transcription factors
      • Aux/IAAs bind and repress ARFs in the absence of auxin

  31. The chromosomal context

  32. Diversification of ARFs

  33. Diversification of the Aux/IAAs

  34. Why the different patterns of diversification?
      • 12% (ARF) vs 40% (Aux/IAA) segmental duplications
      • Presumably reflects differential retention
      • Possible explanations
         • Dosage requirements
         • Coevolution with other interacting genes
         • Regional transcriptional regulation

  35. Divergence of duplicated genes [Plot axes: divergence in expression profile; age of duplication]

  36. Duplicate pairs in yeast and human (Gu et al. 2002, Makova and Li 2003)
      • Approx. 50% of pairs diverge very rapidly
      • Proportion of divergent pairs increases with Ks and Ka
         • Plateaus at Ka ~0.3 in human
      • In humans,
         • Immune response genes are over-represented among young, divergent pairs
         • Distantly related pairs with conserved expression tend to be either ubiquitous or very tissue specific

  37. Retention of duplicated genes
      • Nonfunctionalization, or loss of one copy
         • The fate of most pairs
      • Neofunctionalization (NF)
         • Positive selection on a new mutation can maintain the pair
      • Subfunctionalization (SF)
         • Mutations that increase the specificity of duplicates can fix due to drift, provided that, combined, the two copies provide the functionality of the ancestral gene. Once SF happens, both copies are indispensable and are retained.
         • One prediction of the model is that SF is more likely for tandem than dispersed pairs (due to linkage)

  38. Digital expression profiling
      • Massively Parallel Signature Sequencing (MPSS)
         • Count occurrence of 17-20 bp mRNA signatures
         • Cloning and sequencing is done on microbeads
         • Similar to Serial Analysis of Gene Expression (SAGE)
      • “Bar-code” counting reduces concerns of
         • cross-hybridization
         • probe affinity
         • background hybridization
      • Advantages
         • Accurate counts of low expression genes
         • Can distinguish expression profiles of duplicate genes

  39. MPSS library construction (Brenner et al., PNAS 97:1665-70)
      • Extract mRNA from tissue
      • Convert to cDNA
      • Add linker
      • Cut w/ Sau3A (GATC)
      • 3’ - Add unique 32 bp tag and standard primer; 5’ - Add standard primer (added by cloning)
      • Anneal to beads coated with unique anti-tag (32 bp, complementary to tag on mRNA)
      • PCR
      • Remove 3’ primer and expose single stranded unique tag (digest, 3' → 5' exonuclease)

  40. MPSS library construction (Brenner et al., PNAS 97:1665-70)
      • Sort by FACS to remove ‘empty’ beads
      • The result of the library construction is a set of microbeads. Each bead contains many DNA molecules, all derived from the 3’ end of a single transcript.
      • Beads are loaded in a monolayer on a microscope slide for the sequencing of 17-20 bp from the 5’ end.

  41. MPSS Sequencing (Brenner et al., Nat. Biotech. 18:630-4)
      • Sequence by hybridization
      • Add adaptors: NNNX (CODEX1 RS), NNXN (CODEX2 RS), NXNN (CODEX3 RS), XNNN (CODEX4 RS)
      • 16 cycles for 4 bp
      • Digest with Type IIS enzyme to uncover next 4 bases
      • Repeat cycle
      • Steps of four bases; overhang is shifted by four bases in each round

  42. MPSS Sequencing
      Each bead provides a signature of 17-20 bp
      Tag #     Signature Sequence    # of Beads (Frequency)
      1         GATCAATCGGACTTGTC     2
      2         GATCGTGCATCAGCAGT     53
      3         GATCCGATACAGCTTTG     212
      4         GATCTATGGGTATAGTC     349
      5         GATCCATCGTTTGGTGC     417
      6         GATCCCAGCAAGATAAC     561
      7         GATCCTCCGTCTTCACA     672
      8         GATCACTTCTCTCATTA     702
      9         GATCTACCAGAACTCGG     814
      . .
      30,285    GATCGGACCGATCGACT     2,935
      Total # of tags: >1,000,000
      Two sets of signatures are generated from each sample in different reading frames staggered by two bases
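To make the link between a transcript and its signature concrete, here is a minimal sketch of how an expected 17 bp signature could be predicted from a transcript sequence and how bead reads could be tallied. It assumes the signature starts at the 3'-most Sau3A site (GATC), consistent with the library construction slides; the function names and the example sequence are hypothetical, and this is not the pipeline used to build the libraries.

```python
from collections import Counter


def predicted_signature(transcript, length=17, site="GATC"):
    """Return the signature starting at the 3'-most GATC, or None.

    None is returned when the transcript has no GATC site or when fewer
    than `length` bases remain from the last site to the transcript end.
    """
    seq = transcript.upper()
    pos = seq.rfind(site)  # 3'-most occurrence of the Sau3A site
    if pos == -1 or pos + length > len(seq):
        return None
    return seq[pos:pos + length]


def count_signatures(bead_reads):
    """Tally how many beads carry each signature sequence."""
    return Counter(read.upper() for read in bead_reads)


# Hypothetical transcript containing two GATC sites; only the 3'-most one
# defines the signature.
toy_transcript = "AAGATCTTTTGATCAATCGGACTTGTCAAAA"
toy_signature = predicted_signature(toy_transcript)
```

Because only the 3'-most site is used, two duplicate genes yield distinct signatures whenever they differ anywhere within those 17 bases.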

  43. Classifying signatures
      Signature classes (triangles refer to colors used on our web page):
      • Class 1 - in an exon, same strand as ORF.
      • Class 2 - within 500 bp after stop codon, same strand as ORF.
      • Class 3 - anti-sense of ORF (like Class 1, but on opposite strand).
      • Class 4 - in genome but NOT class 1, 2, 3, 5 or 6.
      • Class 5 - entirely within intron, same strand.
      • Class 6 - entirely within intron, anti-sense.
      • Class 0 - signatures found in the expression libraries but not the genome.
      • Grey = potential signature NOT expressed
      Figure annotations:
      • Typical signatures
      • Duplicated: expression may be from other site in genome
      • Potential alternative splicing or nested gene
      • Potential alternative termination
      • Anti-sense transcript or nested gene?
      • Potential anti-sense transcript
      • Potential un-annotated ORF
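The class definitions can be read as a small decision procedure. The sketch below is one possible reading, not the classifier behind the web page: the `Gene` record, its fields, and the rule that a signature must lie entirely inside a feature are assumptions. Exons are treated as coding exons only, and `stop` is taken to be the genomic coordinate of the last transcribed base of the stop codon.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical minimal gene model (coordinates are inclusive)."""
    strand: str   # "+" or "-"
    exons: list   # coding exons as (start, end), sorted by start
    stop: int     # last transcribed base of the stop codon


def _inside(start, end, intervals):
    """True if [start, end] lies entirely within one of the intervals."""
    return any(s <= start and end <= e for s, e in intervals)


def classify_signature(hit, gene, downstream=500):
    """Return the class (0-6) of a signature relative to one gene.

    hit: None if the signature is absent from the genome, otherwise a
    (start, end, strand) tuple for its genomic match.
    """
    if hit is None:
        return 0  # Class 0: expressed but not found in the genome
    start, end, strand = hit
    same_strand = strand == gene.strand
    introns = [(a_end + 1, b_start - 1)
               for (_, a_end), (b_start, _) in zip(gene.exons, gene.exons[1:])]
    if gene.strand == "+":
        tail = [(gene.stop + 1, gene.stop + downstream)]
    else:
        tail = [(gene.stop - downstream, gene.stop - 1)]
    if _inside(start, end, gene.exons):
        return 1 if same_strand else 3
    if same_strand and _inside(start, end, tail):
        return 2
    if _inside(start, end, introns):
        return 5 if same_strand else 6
    return 4  # in the genome, but none of the classes above
```

For a plus-strand gene with exons (100, 200) and (300, 400) and `stop=400`, a sense hit at 150-166 falls in Class 1, and a sense hit at 410-426 falls in Class 2.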

  44. Core Arabidopsis MPSS libraries, sequenced by Lynx for Blake Meyers, U. of Delaware
      Library    Signatures sequenced    Distinct signatures
      Root       3,645,414               48,102
      Shoot      2,885,229               53,396
      Flower     1,791,460               37,754
      Callus     1,963,474               40,903
      Silique    2,018,785               38,503
      TOTAL      12,304,362              133,377
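Because the libraries differ in sequencing depth, raw bead counts are not directly comparable across tissues. A common remedy, assumed here rather than stated on the slide, is to scale each count to a per-million value using the library totals listed above. The function name `per_million` is hypothetical.

```python
# Signatures sequenced per library, as listed on the slide.
LIBRARY_SIZES = {
    "Root": 3_645_414,
    "Shoot": 2_885_229,
    "Flower": 1_791_460,
    "Callus": 1_963_474,
    "Silique": 2_018_785,
}


def per_million(raw_count, library):
    """Scale a raw signature count to counts per million for one library."""
    return raw_count * 1_000_000 / LIBRARY_SIZES[library]


# A hypothetical signature seen on 814 beads in each of two libraries
# represents a larger share of the smaller library.
root_value = per_million(814, "Root")
flower_value = per_million(814, "Flower")
```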

  45. http://www.dbi.udel.edu/mpss
      • Query by
         • Sequence
         • Arabidopsis gene identifier
         • chromosomal position
         • BAC clone ID
         • MPSS signature
         • Library comparison
      • Site includes
         • Library and tissue information
         • FAQs and help pages

  46. Genome-wide MPSS profile in Arabidopsis [Figure panels: Chr. I, Chr. II, Chr. III, Chr. IV, Chr. V] Of the 29,084 gene models, 17,849 match unambiguous, expressed class 1 and/or 2 signatures

  47. Dataset of duplicate pairs
      • Gene families of size two in Arabidopsis classified as
         • Dispersed (280)
         • Segmental (149)
         • Tandem (63)
      • For each pair
         • Measure similarity/distance in expression profile
         • Estimate Ks and Ka

  48. Expression distance [Figure axes: library 1, library 2, library 3]
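The slide does not specify the distance measure. As one plausible sketch, each gene's counts across libraries can be treated as a point in library space, normalized so that the profile sums to 1, and the two members of a duplicate pair compared by Euclidean distance. The normalization, the choice of metric, and the name `expression_distance` are all assumptions.

```python
import math


def expression_distance(profile_a, profile_b):
    """Euclidean distance between two expression profiles.

    Each profile is a sequence of counts, one per library, in the same
    library order. Profiles are rescaled to sum to 1 so that the distance
    reflects the shape of the profile rather than overall abundance.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same libraries")

    def normalize(profile):
        total = sum(profile)
        if total == 0:
            raise ValueError("profile has no expression in any library")
        return [x / total for x in profile]

    a, b = normalize(profile_a), normalize(profile_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Under this definition, two duplicates expressed in the same proportions across libraries have distance 0 even if one is far more abundant, while duplicates expressed in entirely different libraries reach the maximum of the square root of 2.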
