
MRC Laboratory of Molecular Biology Cambridge

Evolution of transcription factors from selfish elements: The tale of Rcs1, a global regulator of cell size in yeast. M. Madan Babu. MRC Laboratory of Molecular Biology Cambridge. Evolution of biological systems. Evolution of networks within and across genomes.



Presentation Transcript


  1. Evolution of transcription factors from selfish elements: The tale of Rcs1, a global regulator of cell size in yeast. M. Madan Babu, MRC Laboratory of Molecular Biology, Cambridge.

  2. Overview of research. Evolution of biological systems: evolution of networks within and across genomes; evolution of transcriptional networks; evolution of transcription factors. Structure and function of biological systems: uncovering a distributed architecture in networks; structure and dynamics of transcriptional networks; methods to study network dynamics. Data integration, function prediction and classification: discovery of transcription factors in Plasmodium; discovery of novel DNA binding proteins; evolution of global regulatory hubs. Publications cited on the slide: Nuc. Acids. Res (2003), Nature Genetics (2004), J Mol Biol (2006a), J Mol Biol (2006b), J Mol Biol (2006c), Nature (2004), Nuc. Acids. Res (2005), Cell Cycle (2006).

  3. Reasons why we became interested in Rcs1. A fundamental developmental process we are interested in understanding is the regulation of cell size. Transcriptional regulatory network in yeast: sub-network of Rcs1 and Aft2 (figure labels: Aft2p, Rcs1p; number of target genes regulated: 314, 123, 41). How did Rcs1 and its paralog Aft2, which are two global regulators, evolve? Rcs1: DNA binding domain not known. What is the DNA binding domain in Rcs1?

  4. Rcs1: regulator of cell size. Micrographs and data from SCMD: S. cerevisiae wild type and S. cerevisiae Rcs1 mutant. We find that the following parameters used to define cell size were at least 2 standard deviations (2 σ) from the mean values of the wild type: mother cell-size (874, 760); contour length of mother cell (108, 100); long axis length of mother cell (36, 33); short axis length of mother cell (30, 27); roundness of mother cell (1.29, 1.20). The size of mutant cells is twice that of the parental strain. The critical size for budding in the mutant is similarly increased. Rcs1 binds specific DNA sequences.
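The "at least 2 standard deviations from the wild-type mean" criterion on this slide can be sketched as a simple z-score check. This is only an illustration: the slide does not say how the mean and standard deviation were estimated, so the use of the sample standard deviation is an assumption, and the wild-type replicate values below are hypothetical (they are not SCMD data).

```python
from statistics import mean, stdev


def z_score(wild_type_values, mutant_value):
    """Distance of the mutant value from the wild-type mean, in wild-type SDs."""
    mu = mean(wild_type_values)
    sd = stdev(wild_type_values)  # sample standard deviation (assumption)
    return (mutant_value - mu) / sd


def deviates(wild_type_values, mutant_value, n_sd=2.0):
    """True if the mutant value lies at least n_sd SDs from the wild-type mean."""
    return abs(z_score(wild_type_values, mutant_value)) >= n_sd


# Hypothetical wild-type replicate measurements of mother cell size.
wild_type_cell_size = [750, 760, 770, 755, 765]

print(deviates(wild_type_cell_size, 874))  # far from the wild-type mean
print(deviates(wild_type_cell_size, 765))  # within the wild-type spread
```

The same check would be repeated for each morphological parameter (contour length, axis lengths, roundness), each with its own wild-type distribution.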

  5. Outline: Data integration to infer function & evolution. Sequence analysis to identify members and distant homologs; structural analysis to infer function and distant homologs; cladistic analysis to group proteins into families and infer relationship; domain context analysis to infer function of individual members; comparative genomics and phylogenetic analysis to infer evolution of the family; expression data and network analysis to infer spatio-temporal behaviour.

  6. Relationship to WRKY DNA binding domain – Sequence analysis I. Rcs1 searched against the non-redundant database: Candida albicans (ascomycete), Yarrowia lipolytica (ascomycete), Ustilago maydis (basidiomycete), Cryptococcus sp (basidiomycetes), E. cuniculi (microsporidia), Giardia lamblia (diplomonad), Dictyostelium discoideum, Entamoeba histolytica. Lineage-specific expansion in several fungi; also seen in lower eukaryotes. Profiles + HMM of this region searched against the non-redundant database: WRKY domain (Arabidopsis) + FAR-1 type transposase (Medicago truncatula). Globular region maps to the WRKY DNA-binding domain.

  7. Confirmation of relationship to WRKY DBD – Sequence analysis II. Searches against the non-redundant database & PDB: Rcs1 (S. cerevisiae) + WRKY DNA-binding domain from Arabidopsis WRKY4; Gcm1 (mouse); PEB-1 (C. elegans). WRKY maps to the same globular region, Gcm1 & FLYWCH. JPRED/PHD: the sequence of secondary structure elements (S1, S2, S3, S4) is similar to that of the WRKY DNA-binding domain and the GCM1 protein seen in mouse. Multiple sequence alignment of all globular domains. Homologs of the conserved globular domain constitute a novel family of the WRKY DNA-binding domain.

  8. Characterization of the globular domain – structural analysis I. Predicted SS of Rcs1 DBD (S1, S2, S3, S4) compared with SS of WRKY4 and SS of GCM1 (S1, S2, S3, S4). Template structures: Mus musculus Glial Cell Missing - 1 (GCM-1: 1odh, X-ray structure); A. thaliana transcription factor (WRKY4: 1wj2, NMR structure). Both WRKY and GCM1 have a similar network of stabilizing interactions.

  9. Characterization of the globular domain – structural analysis II. The 4 residues involved in metal co-ordination and the 10 residues involved in key stabilizing hydrophobic interactions that determine the path of the backbone in the four strands (S1 to S4) of the GCM1-WRKY domain show a strong pattern of conservation. The core fold of the Rcs1 DBD will be similar to the WRKY-GCM1 domain and may bind DNA in a similar way.

  10. Outline: Data integration to infer function & evolution. Sequence analysis to identify members and distant homologs; structural analysis to infer function and distant homologs; cladistic analysis to group proteins into families and infer relationship; domain context analysis to infer function of individual members; comparative genomics and phylogenetic analysis to infer evolution of the family; expression data and network analysis to infer spatio-temporal behaviour.

  11. Classification of WRKY-GCM1 superfamily – Cladistic analysis I. > 4500 proteins from over 450 genomes + template structure (strands S1, S2, S3, S4; Zn2+ co-ordinated by C and H residues). Versions: HxC containing version (HxC); Classical WRKY (C); Insert containing version (I); FLYWCH domain (F); GCM domain (G). Sequence features listed on the slide: WRKY motif in S1; short loop between S2 & S3; N-terminal helix; conserved W in S4; large insert between S2 & S3; HxC instead of HxH; N-terminal helix; short insert between S2 & S3; conserved W in S2; insertion of Zn ribbon between S2 and S3. Examples: Gcm1, Far1, WRKY4, Rcs1, Mdg.

  12. Domain context for the different families – Domain network analysis I. Versions: HxC containing version (HxC); Classical WRKY (C); Insert containing version (I); FLYWCH domain (F); GCM domain (G). Examples: At2g23500, Far1, WRKY4, Rcs1, 101.t00020, Mod (mdg), Gcm1. Domain contexts shown: mobile element; OTU protease; MULE Tpase; stand alone; BED finger; Zn cluster; Zn knuckle; POZ; SMBD; tandem. WRKY is seen both in transcription factors and transposases.

  13. Phyletic distribution – Comparative genome analysis I. Versions: C, F, HxC, G, I, each marked as transcription factor (TF only), transposase, or both (TF + TP). Lineages shown: human, fly, worm, fungi, Entamoeba, slime mould, plants (labelled as higher and lower eukaryotes). GCM1 and FLYWCH versions evolved from an insert containing version that is a transposase. HxC and insert containing versions are seen as both transcription factors and as transposases only in fungi, e.g. Rcs1. The classical version of the WRKY evolved from an insert containing version that is a transposase. Domain context and phyletic analysis suggest that transcription factors could have evolved from transposases.

  14. Evolutionary relationship of the insert containing WRKY domains. Figure labels: MULE transposase, insert-WRKY, Rcs1, Aft2. A recent duplication event within Saccharomycetales has resulted in two hubs. Independent duplication in Candida. Selfish elements in Yarrowia are seen as standalone ORFs & can regulate their own expression; subsequently recruited as transcription factors by the host. TFs have evolved from TPs in multiple instances within fungi. Functional transition in evolution captured by genomic studies. Comparative genomics using >30 different fungal genomes provides convincing evidence.

  15. WRKY domain is seen in developmentally important proteins. Plants: classical type WRKY has expanded in plants and is expressed in a tissue specific manner across all developmental stages (apex, flower, seeds, floral organs, root, stem, leaf). Fungi: insert containing WRKY domains have been recruited to be regulators of cell size and morphology in yeast. Animals: GCM1 and FLYWCH type WRKY domains have been recruited in the differentiation of stem-cells. Transposases have been recruited to become developmentally important global regulatory proteins in all the three eukaryotic kingdoms of life.

  16. Conclusion. Data types integrated: sequence; structure; cladistics & phylogenetics; expression; interaction. Integration of different types of publicly available experimental data allowed us to identify that Rcs1 and several other developmentally important proteins in different lineages contain a WRKY-type DNA binding domain. Data integration allowed us to elucidate that developmentally important transcription factors in the different lineages have evolved from transposases.

  17. Acknowledgements: L Aravind, S Balaji, Lakshminarayan Iyer (National Center for Biotechnology Information, National Institutes of Health).

  18. Structural equivalences of WRKY-GCM1 domain proteins with Bed and Zn finger. Structures shown: WRKY (1wj2); GCM-type WRKY (1odh); Bed-finger (2ct5); Classical Zn-finger (1m36). Each panel marks the strands (S1 to S4; helix H1) and the C and H residues co-ordinating Zn2+.

  19. Rcs1 (381 genes): regulates genes involved in metal ion transport, specifically iron. Categories: siderophore transport; Cu ion homeostasis; vacuolar protein catabolism; intracellular transport; vesicle mediated transport; Golgi vesicle transport; membrane fusion; secretory pathway. Aft2 (171 genes): regulates genes involved in metal ion transport, again specifically iron. Categories: iron homeostasis; Cu ion homeostasis; vacuolar protein catabolism; co-factor synthesis; vitamin B6 biosynthesis; pyridoxine metabolism; thiamin biosynthesis. Common targets (41 genes) include genes involved in metal ion transport, again specifically iron. Categories: iron homeostasis; Cu ion homeostasis; vacuolar protein catabolism.

  20. Domain architectures of WRKY-GCM1 domain proteins. Proteins shown: TTR1_Atha_30694675; gcm_Dmel_17137116; WRKY41_Osat_46394336; hGCMa_Hsap_1769820; LOC411361_Amel_66547010; At2g23500_Atha_3242713; KIAA1552_Hsap_10047169; mod(mdg4)_Dmel_24648712; LOC_Os11g31760_Osat_77551147; C26E6.2_Cele_32565510; C20orf164_Hsap_13929452; CG13845_Dmel_24649011; NtEIG-D48_Ntab_10798760; mutA_Ylip_49523824; WRKY58_Atha_22330782; T24C4.2_Cele_17555262; LOC374920_Hsap_27694337; AFT2_Scer_6325054; FAR1_Atha_18414374; AT4g19990_Atha_7268794; At2g34830_Atha_27754312; ECU05_0180_Ecun_19173554; 101.t00020_Ehis_67474280; GLP_9_36401_35940_Glam_71071693; dd_03024_Ddis_28829829; GLP_79_64671_67418_Glam_71077115. Lineage labels: Drosophila melanogaster; Caenorhabditis elegans; Homo sapiens; Fungi; Plants; Animals; Encephalitozoon cuniculi; Ciliates; Entamoeba histolytica; Giardia lamblia; Dictyostelium discoideum; Apicomplexa. Domain legend: plant specific Zn-cluster; plant specific N-all-beta; WRKY domain; GCM-type WRKY; SWIM domain; DUF1723; POZ; STAND ATPase; TIR domain; MudR transposase; FLYWCH-type WRKY; BED finger; zinc knuckle; LRR.

  21. Expression profiles of WRKY-GCM1 domain proteins in Arabidopsis. WRKY proteins show tissue specific expression. WRKY proteins show light specific expression.

  22. Relationship between Rcs1p and Aft2p homologs: multiple independent evolution of TFs from transposons. [Figure: tree of homologs, each leaf labelled with gene identifier, species and GI number (e.g. RCS1 SCER 51830313, AFT2 SCER 6325054, RBF1P CALB 2498834, FAR1 ATHA 18414374). Labelled clusters: Rbf1 cluster; Rcs1 Aft2p cluster. Lineage groups: Fungi; Entamoeba; Plants; Animals. Inset: transcriptional network involving Aft2p and Rcs1p, with number of target genes regulated: 41, 314, 123.]

  23. Fig 3 (panels a, b, c): Basic Leucine Zipper family; Homeodomain family; Apses family. Transcription factors shown: YAP1, YAP7, YAP6, GCN4, YHP1, CAD1, TOS8, YAP5, PHO2, YOX1, MET4, CIN5 (227), STE12 (357), CUP9, CST6, ACA1, SKO1, HMRa2, HMLa2, HAC1, ARR1, MET28, YAP3, HMLa1, HMRa1, PHD1, MBP1, SOK2 (471), SWI4, XBP1. Second ordering shown: MET4, MET28, HMLALPHA2, GCN4, HMRA2, ARR1, SWI4, YAP3, CUP9, MBP1, YAP1, TOS8, CAD1, XBP1, CIN5, HMRA1, PHD1, YAP6, YAP5, PHO2, SOK2, YAP7, YOX1, HAC1, SKO1, YHP1, ACA1, CST6.

  24. MET4 MET28 GCN4 ARR1 YAP3 YAP1 CAD1 CIN5 YAP6 YAP5 YAP7 HAC1 SKO1 ACA1 CST6 HMLa2 HMRA2 CUP9 TOS8 HMRA1 PHO2 YOX1 YHP1 SWI4 MBP1 XBP1 PHD1 SOK2

  25. Fig 1. Each box represents a TF member with a specific DBD family, arranged according to evolutionary conservation. A red box represents a regulatory hub (a TF regulating > 150 genes), and a blue box represents a non-hub regulator. The intensity of color represents the fraction of the 14 fungal genomes in which the protein has an ortholog. Color scales (0 to 100): fraction of the 14 fungal genomes in which a non-hub transcription factor is evolutionarily conserved (i.e. an ortholog exists); fraction of the 14 fungal genomes in which a regulatory hub is evolutionarily conserved (i.e. an ortholog exists). Vertical axis (0 to 40): number of members in the family (non-hub : hub). Legend markers: fungal specific DNA-binding domain (*); DNA-binding domain family which evolved from a transposon. DBD families on the horizontal axis: Tig, Hsf, Fkh, Tea, Myb, bZip, Abf1, Gata, Ime1, Ace1, Rcs1, Mads, bHLH, Dal82, Apses, Tigger, HMG1, Homeo, AT-Hook, C2H2-Zn, C6-Fungal, LisH+CTLH, Gcr1p+Msn1p, P53-Cytochrome.
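The two quantities encoded in this figure, hub status (more than 150 regulated genes) and conservation across the 14 fungal genomes, can be sketched as follows. The threshold and the genome count come from the slide, and the target counts for Rcs1 and Aft2 are the ones quoted earlier in the transcript; the function names and the ortholog presence list are hypothetical and only illustrate the calculation.

```python
HUB_THRESHOLD = 150    # from the slide: a hub regulates more than 150 genes
N_FUNGAL_GENOMES = 14  # from the slide


def is_hub(n_target_genes):
    """A TF counts as a regulatory hub if it regulates more than 150 genes."""
    return n_target_genes > HUB_THRESHOLD


def conservation_percent(genomes_with_ortholog, n_genomes=N_FUNGAL_GENOMES):
    """Percentage of the surveyed genomes in which an ortholog exists."""
    return 100.0 * len(set(genomes_with_ortholog)) / n_genomes


# Target-gene counts as quoted in the transcript.
target_counts = {"Rcs1": 381, "Aft2": 171}

# Hypothetical ortholog presence list (genome names are placeholders).
hypothetical_orthologs = [f"genome_{i:02d}" for i in range(1, 8)]

for tf, n_targets in target_counts.items():
    kind = "hub" if is_hub(n_targets) else "non-hub"
    print(tf, kind)
print(conservation_percent(hypothetical_orthologs))
```

In the figure's terms, `is_hub` would decide the box color (red or blue) and `conservation_percent` its intensity.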

  26. Possible evolutionary trajectories of transcriptional regulators after gene duplication (panels a to d). Scenarios shown: common ancestor was not a regulatory hub and the extant proteins are not regulatory hubs; common ancestor was not a regulatory hub and one of the extant proteins is a regulatory hub; common ancestor was a regulatory hub and one of the extant proteins is a regulatory hub; common ancestor was a regulatory hub and both extant proteins are regulatory hubs. For each scenario, two outcomes are shown: extant proteins share fewer target genes than expected by chance, or extant proteins share more target genes than expected by chance. One pair is labelled Sok2 and Phd1; the remaining pairs carry placeholder labels (XXXY, ZZZW).
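One common way to judge whether two regulators "share more target genes than expected by chance" is to compare the observed overlap with the hypergeometric expectation. The slides do not say which test was used, so the sketch below is an assumption about method; the genome size of 6000 genes is also an assumed round number, while the target counts (381, 171, 41 shared) are those quoted for Rcs1 and Aft2 in the transcript.

```python
from math import comb


def expected_overlap(n_targets_a, n_targets_b, n_genes):
    """Mean number of shared targets if both target sets were drawn at random."""
    return n_targets_a * n_targets_b / n_genes


def p_at_least(k, n_genes, n_targets_a, n_targets_b):
    """Hypergeometric upper tail: P(shared targets >= k) under random draws."""
    total = comb(n_genes, n_targets_b)
    upper = min(n_targets_a, n_targets_b)
    return sum(
        comb(n_targets_a, i) * comb(n_genes - n_targets_a, n_targets_b - i) / total
        for i in range(k, upper + 1)
    )


N_GENES = 6000  # assumed approximate number of yeast genes (not from the slides)

print(expected_overlap(381, 171, N_GENES))  # shared targets expected by chance
print(p_at_least(41, N_GENES, 381, 171))    # probability of 41 or more shared
```

A small upper-tail probability corresponds to the "more than expected by chance" outcome; the lower tail, P(shared <= k), would be used for the "fewer than expected" outcome.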
