This article examines the vast complexity of mammalian transcriptomes and why early estimates of the number of genes were inaccurate. It emphasizes the prevalence of alternative splicing and other transcript variation, which produce far more distinct transcripts than protein products. Because each gene can encode multiple isoforms, the way we measure gene expression needs to evolve. It also covers mRNA processing and its regulatory elements, and how alternative processing can significantly influence cellular function and gene regulation.
Interrogating the transcriptome in all its diversity
Joel H Graber
Why were so many predictions of the number of genes in a mammalian genome wrong? • Nature Genetics, June 2000, v25, n2.
Mammalian genomes contain far more transcript variants than protein variants • Average protein products per locus = 1.7 • Average distinct transcripts per locus = 5.7 • Genome Biology (2009) 10:201.
A processed, protein-coding mRNA molecule includes distinct functional regions (figure: genomic sequence and processed mRNA) • 5’-untranslated region (5’-UTR) • Protein coding sequence • 3’-untranslated region (3’-UTR)
Pieces of a (eukaryotic) protein-coding gene (on the genome) • Genomic region: ~1-100 Mbp • Gene: ~1-1000 kbp • Promoter: ~10³ bp • Enhancers: ~10-100 bp • Other regulatory sequences: ~10-100 bp • Polyadenylation site: ~10-100 bp • Exons (CDS & UTR): ~10²-10³ bp • Introns: ~10²-10⁵ bp
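Because splicing removes introns and concatenates exons, the lengths of the mature mRNA and of its 5’-UTR, coding sequence and 3’-UTR follow directly from exon coordinates and the CDS bounds. The sketch below assumes a hypothetical plus-strand gene with invented coordinates (0-based, half-open intervals); `GeneModel` and its methods are illustrative names, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    """Hypothetical plus-strand gene; coordinates are 0-based, half-open."""
    exons: list[tuple[int, int]]  # (start, end) in genomic order
    cds_start: int                # first base of the start codon
    cds_end: int                  # one past the last base of the stop codon

    def primary_length(self) -> int:
        """Length of the unspliced transcript (first exon start to last exon end)."""
        return self.exons[-1][1] - self.exons[0][0]

    def mature_length(self) -> int:
        """Length after splicing: introns removed, exons concatenated."""
        return sum(end - start for start, end in self.exons)

    def exonic_bases_before(self, pos: int) -> int:
        """Number of exonic bases upstream of genomic coordinate pos."""
        total = 0
        for start, end in self.exons:
            if pos <= start:
                break
            total += min(end, pos) - start
        return total

    def regions(self) -> dict[str, int]:
        """Lengths of 5'-UTR, CDS and 3'-UTR in the mature mRNA."""
        utr5 = self.exonic_bases_before(self.cds_start)
        cds = self.exonic_bases_before(self.cds_end) - utr5
        return {"5'-UTR": utr5, "CDS": cds, "3'-UTR": self.mature_length() - utr5 - cds}


# Invented coordinates: three exons separated by introns of 1,000 and 3,650 bp.
gene = GeneModel(exons=[(0, 200), (1200, 1350), (5000, 5900)], cds_start=150, cds_end=5301)
print(gene.primary_length(), gene.mature_length(), gene.regions())
# 5900 1250 {"5'-UTR": 150, 'CDS': 501, "3'-UTR": 599}
```

Even in this small example, introns make up most of the primary transcript (4,650 of 5,900 nt), consistent with the much wider intron size range on the slide.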
Alternative mRNA processing can lead to multiple transcript and/or protein products (figure example: 3 transcripts, 1 protein product)
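The "3 transcripts, 1 protein product" case arises when isoforms differ only outside the coding sequence. As a toy sketch with invented sequences, three transcripts that share one CDS but use different start sites and 3’ ends translate to a single protein. The scanning rule here (translate from the first AUG) is a simplifying assumption; real start-codon choice also depends on sequence context.

```python
# Standard genetic code, built from the conventional UCAG ordering of codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_first_orf(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon.

    Simplification: real start-codon choice also depends on sequence context.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical sequences: one shared CDS, different 5' starts and 3' ends.
CDS = "AUGGCUUCUAAAGGCUAA"
transcripts = {
    "TSS-1, short 3'-UTR": "GGCACC" + CDS + "CCUCAGCC",
    "TSS-2, short 3'-UTR": "CUUCGGCACC" + CDS + "CCUCAGCC",
    "TSS-1, long 3'-UTR": "GGCACC" + CDS + "CCUCAGCCUUUGUACCAGU",
}
proteins = {name: translate_first_orf(seq) for name, seq in transcripts.items()}
print(len(set(transcripts.values())), "transcripts ->",
      len(set(proteins.values())), "protein:", set(proteins.values()))
# 3 transcripts -> 1 protein: {'MASKG'}
```

The differences between these isoforms lie entirely in the UTRs, which is where isoform-specific regulation (stability, localization, translational efficiency) can still act.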
Carolyn demonstrates gene regulation: • DNA = water in pipes • mRNA = water in hose • Protein = water in pool • Control points: transcription control, mRNA degradation, mRNA localization, translation control, protein degradation
A somewhat more formal view of regulation in the various stages of gene expression
Systematic changes to mRNA processing can significantly change the regulatory program of a cell • Changes can be in a single gene or systemic • Regulatory control during transcript generation: transcription initiation site, splicing pattern, 3’-processing (polyadenylation and cleavage) site, RNA editing • Subsequent isoform-specific regulatory control: stability, translational efficiency, localization
Implications of transcript variation for gene expression measurement • Most large-scale expression studies report one expression level per gene per sample • Microarrays: one reported expression value per probeset; duplicate probesets are either averaged or discarded • mRNA-seq: RPKM (reads per kilobase of transcript per million mapped reads) • For many genes, summarizing expression in a given cell type as a single level is inadequate
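RPKM normalizes a read count C by feature length L (in bp) and total mapped reads N: RPKM = 10⁹ × C / (N × L). The sketch below uses invented counts to show why a single gene-level value can be inadequate: two samples with identical gene-level RPKM can have very different isoform composition. It assumes reads have already been apportioned to isoforms, which in practice is itself an estimation problem because many reads are compatible with more than one isoform.

```python
def rpkm(reads: int, length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return reads * 1e9 / (length_bp * total_mapped_reads)


# All numbers below are hypothetical.
TOTAL_MAPPED = 10_000_000                          # same library size in both samples
GENE_LENGTH = 4_000                                # union of exons (bp)
ISOFORM_LENGTHS = {"long": 4_000, "short": 1_000}  # short isoform lies within long

# Reads assumed already apportioned to isoforms.
samples = {
    "A": {"long": 600, "short": 200},
    "B": {"long": 200, "short": 600},
}

for name, counts in samples.items():
    gene_level = rpkm(sum(counts.values()), GENE_LENGTH, TOTAL_MAPPED)
    per_isoform = {iso: rpkm(n, ISOFORM_LENGTHS[iso], TOTAL_MAPPED) for iso, n in counts.items()}
    print(name, gene_level, per_isoform)
# A 20.0 {'long': 15.0, 'short': 20.0}
# B 20.0 {'long': 5.0, 'short': 60.0}
```

A gene-level report would show no change between A and B, while the short isoform's abundance differs threefold.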
Every time we find a new way to measure RNA, we find previously unknown types of RNA (Mattick et al., Trends Genet 2009)
Classes of alternative transcripts • Alternative splicing • Alternative transcript initiation sites • Alternative cleavage and polyadenylation (3’-processing) • Combinations of one or more of these
The cascade of alternative mRNA processing in gene regulation: choices made during mRNA generation can have a profound effect on downstream regulation of the resulting transcript
Processing, and alternative processing in particular, is controlled by cis-elements and trans-factors • mRNA processing signals are typically constrained in both sequence content and position • The activity of a specific site is a function of the strength of its local signals and the cell- and environment-specific concentrations/activities of trans-factors
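The last point can be made concrete with a deliberately simple, hypothetical model (not one proposed on the slide): each competing site's usage is proportional to an intrinsic site efficiency times the fractional occupancy of a single limiting trans-factor, with occupancy c / (c + Kd) and Kd standing in for how well the site's local signal recruits the factor. All parameter values are invented.

```python
def site_usage(efficiencies, kds, factor_conc):
    """Hypothetical competition model for alternative processing sites.

    weight_i = efficiency_i * occupancy_i, with occupancy_i = c / (c + Kd_i)
    (simple saturable binding of one limiting trans-factor at concentration c).
    Returns each site's share of total usage.
    """
    weights = [e * factor_conc / (factor_conc + kd) for e, kd in zip(efficiencies, kds)]
    total = sum(weights)
    return [w / total for w in weights]


# Invented parameters: a proximal site whose weak signal binds the factor
# poorly (high Kd) and a distal site with a strong signal (low Kd).
EFFICIENCIES = [1.0, 1.0]
KDS = [10.0, 0.1]
for conc in (0.1, 1.0, 10.0, 100.0):
    proximal, distal = site_usage(EFFICIENCIES, KDS, conc)
    print(f"[factor] = {conc:>6}: proximal {proximal:.2f}, distal {distal:.2f}")
```

Under these assumptions, raising the factor concentration shifts usage toward the weak proximal site, because the strong site is already saturated at low concentration while the weak site is not.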
Alternative splicing can occur in several ways http://www.wormbook.org/
Cis elements required for splicing • Yeast: 5’ss GUAUGU; BP UACUAAC; 3’ss YAG • Vertebrates: 5’ss AG GUAAGU; BP CURAY; polypyrimidine tract Y(10-15); 3’ss NCAG GU; ESE • Plants: 5’ss AG GUAAGU; BP CURAY; 3’ss UGYAG GU; UA-rich intronic sequences; ESE? • 5’ss – 5’ splice site (donor site); 3’ss – 3’ splice site (acceptor site); BP – branch point (A is the branch point base); Y(10-15) – polypyrimidine tract; ESE – exonic splicing enhancer; Y – pyrimidine; R – purine; N – any base
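The consensus strings above use IUPAC ambiguity codes (Y, R, N), which map directly onto regular-expression character classes. A minimal sketch for scanning a sequence for such motifs follows; the intron fragment is invented, and exact consensus matching is only a crude predictor, since real splice sites often deviate from the consensus.

```python
import re

# IUPAC nucleotide codes (RNA alphabet) used on the slide.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "Y": "[CU]",    # pyrimidine
    "R": "[AG]",    # purine
    "N": "[ACGU]",  # any base
}


def motif_to_regex(motif: str) -> re.Pattern:
    """Compile an IUPAC motif; the lookahead lets matches overlap."""
    body = "".join(IUPAC[ch] for ch in motif)
    return re.compile(f"(?=({body}))")


def find_motif(seq: str, motif: str) -> list[int]:
    """Return 0-based start positions of every (overlapping) match."""
    return [m.start() for m in motif_to_regex(motif).finditer(seq)]


# Hypothetical intron fragment: vertebrate-style 5'ss, a CURAY branch-point
# candidate, then a pyrimidine-rich stretch ending in CAG.
intron = "GUAAGUCCGACUAACUUCUUCCCUCUCCAG"
print(find_motif(intron, "GUAAGU"), find_motif(intron, "CURAY"))
# [0] [10]
```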
Frequency of bases (%) at each position of the splice sites
Donor sequences (5’ splice site); positions -4 to -1 are exonic, +1 to +8 intronic:
Position   -4  -3  -2  -1  +1  +2  +3  +4  +5  +6  +7  +8
%A         30  40  64   9   0   0  62  68   9  17  39  24
%U         20   7  13  12   0 100   6  12   5  63  22  26
%C         30  43  12   6   0   0   2   9   2  12  21  29
%G         19   9  12  73 100   0  29  12  84   9  18  20
Consensus           A   G   G   U   A   A   G   U
Acceptor sequences (3’ splice site); positions -15 to -1 are intronic, +1 and +2 exonic:
Position  -15 -14 -13 -12 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1  +1  +2
%A         15  10  10  15   6  15  11  19  12   3  10  25   4 100   0  22  17
%U         51  44  50  53  60  49  49  45  45  57  58  29  31   0   0   8  37
%C         19  25  31  21  24  30  33  28  36  36  28  22  65   0   0  18  22
%G         15  21  10  10  10   6   7   9   7   7   5  24   1   0 100  52  25
Consensus   Y   Y   Y   Y   Y   Y   Y   Y   Y   Y   Y   N   Y   A   G   G
Polypyrimidine tract (Y = U or C; N = any nucleotide)
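Frequencies like these are commonly turned into a position weight matrix (PWM) that scores how closely a candidate site matches the observed distribution. A sketch using the 12 donor columns from the table (4 exonic, then 8 intronic) follows; the pseudocount and the uniform 25% background are assumptions chosen for illustration, and the test sites are invented.

```python
import math

# Donor (5' splice site) base frequencies (%), 4 exonic then 8 intronic positions.
DONOR_FREQ = {
    "A": [30, 40, 64, 9, 0, 0, 62, 68, 9, 17, 39, 24],
    "U": [20, 7, 13, 12, 0, 100, 6, 12, 5, 63, 22, 26],
    "C": [30, 43, 12, 6, 0, 0, 2, 9, 2, 12, 21, 29],
    "G": [19, 9, 12, 73, 100, 0, 29, 12, 84, 9, 18, 20],
}
PSEUDOCOUNT = 0.5  # assumption: avoids log(0) for bases never observed
BACKGROUND = 0.25  # assumption: uniform base composition


def log_odds_matrix(freq):
    """Convert per-position percentages into log2(observed / background) scores."""
    matrix = []
    for i in range(len(freq["A"])):
        col_total = sum(freq[b][i] for b in "ACGU") + 4 * PSEUDOCOUNT
        matrix.append({
            b: math.log2((freq[b][i] + PSEUDOCOUNT) / col_total / BACKGROUND)
            for b in "ACGU"
        })
    return matrix


DONOR_PWM = log_odds_matrix(DONOR_FREQ)


def score(site: str, pwm=DONOR_PWM) -> float:
    """Sum of per-position log-odds; higher means more donor-like."""
    if len(site) != len(pwm):
        raise ValueError(f"expected {len(pwm)} nt, got {len(site)}")
    return sum(col[base] for col, base in zip(pwm, site))


# Hypothetical candidates: a consensus-like donor, and the same site with
# the invariant intronic GU changed to GC.
print(round(score("ACAGGUAAGUAC"), 2), round(score("ACAGGCAAGUAC"), 2))
```

Because the pseudocount keeps never-observed bases finite, a GU-to-GC change is heavily penalized but not ruled out outright; a stricter scheme could treat the invariant positions as hard requirements.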
Example 1: Insulin-like growth factor 1 (Igf1) • AKA somatomedin C or mechano growth factor • Produced primarily by the liver as an endocrine hormone • Primary action is mediated by binding to IGF1R • Natural activator of the AKT pathway • A primary mediator of the effects of growth hormone • Expression has been: negatively correlated with lifespan; positively correlated with body size • Its regulatory control remains poorly understood after 30 years
IGF1 is subject to extensive alternative mRNA processing (figure scale: ~83,000 nt)
IGF1 mRNA data indicate at least 15 transcript isoforms
Salient features of IGF1 expression • Mature, circulating IGF1 protein is a cleavage product coded entirely in exons 3 and 4 • Exon 5 contains an additional peptide cleavage product with demonstrated independent functionality • Exons 1 and 2 are mutually exclusive, and are likely not the only upstream, transcript-initiating exons • Exon 5 can be skipped, included, or 3’-terminal • Exon 6’s reading frame changes depending on whether it is spliced from exon 4 or exon 5
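Taking only the exon rules listed above, a short sketch can enumerate the exon chains they allow. This is a simplification under stated assumptions: it ignores additional initiating exons, alternative poly(A) sites within exons, and reading-frame effects. It yields 6 structures, so the gap to the 15 or more isoforms reported for IGF1 presumably reflects processing choices these rules omit.

```python
from itertools import product

# Simplified rules taken from the bullet points above; anything else
# (additional initiating exons, alternative poly(A) sites, ...) is ignored.
FIRST_EXONS = ["1", "2"]   # mutually exclusive transcript-initiating exons
CORE_EXONS = ["3", "4"]    # together encode mature, circulating IGF1
EXON5_OPTIONS = {
    "exon 5 skipped": ["6"],        # exon 6 spliced from exon 4
    "exon 5 included": ["5", "6"],  # exon 6 spliced from exon 5
    "exon 5 3'-terminal": ["5"],    # transcript ends in exon 5
}


def enumerate_structures():
    """Return (exon chain, exon-5 choice) for every rule-consistent combination."""
    return [
        ("-".join([first, *CORE_EXONS, *tail]), choice)
        for first, (choice, tail) in product(FIRST_EXONS, EXON5_OPTIONS.items())
    ]


for chain, choice in enumerate_structures():
    print(f"{chain:<12} {choice}")
```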
Alternative 3’-processing can arise in several ways with varying consequences (adapted from Yan J, et al., Genome Research 2005; 15(3):369-75)
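PolyA site selection depends on sequence elements and the abundance/stoichiometry of trans-factors (figure: 3’-processing complex assembled on the pre-mRNA) • Sequence elements, 5’ to 3’: UGUA; polyadenylation signal (PAS, AAUAAA); U-rich/UG-rich downstream element (DSE); G-rich sequence • Factors shown: CPSF (160, 100, 73 and 30 kD subunits), CstF (77, 64 and 50 kD subunits), CFIm (68 and 25 kD subunits), PAPOL, symplekin, hnRNP H • Up to >80 proteins in the complex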
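A naive way to look for candidate poly(A) sites is to find PAS hexamers and check for the flanking elements named on the slide: UGUA upstream and a U-rich DSE downstream. In the sketch below, the accepted hexamer variants, the window sizes and the U-richness threshold are all illustrative assumptions rather than values from the slide, and the test fragment is invented. As the slide stresses, actual site choice also depends on trans-factor levels, which sequence scanning cannot capture.

```python
import re

# Illustrative assumptions only (not taken from the slide): which hexamer
# variants count as a PAS, the window sizes, and the U-richness threshold.
PAS_HEXAMERS = ("AAUAAA", "AUUAAA")
UGUA_WINDOW = 50       # nt upstream of the PAS searched for UGUA
DSE_WINDOW = (10, 40)  # nt after the PAS end searched for a U-rich DSE
MIN_U_FRACTION = 0.5


def candidate_pas(seq: str) -> list[dict]:
    """Report each PAS hexamer hit and whether flanking elements are present."""
    hits = []
    for hexamer in PAS_HEXAMERS:
        for match in re.finditer(f"(?={hexamer})", seq):
            pos = match.start()
            upstream = seq[max(0, pos - UGUA_WINDOW):pos]
            dse_start, dse_end = pos + 6 + DSE_WINDOW[0], pos + 6 + DSE_WINDOW[1]
            downstream = seq[dse_start:dse_end]
            u_fraction = downstream.count("U") / len(downstream) if downstream else 0.0
            hits.append({
                "pos": pos,
                "hexamer": hexamer,
                "upstream_UGUA": "UGUA" in upstream,
                "U_rich_DSE": u_fraction >= MIN_U_FRACTION,
            })
    return sorted(hits, key=lambda hit: hit["pos"])


# Hypothetical 3'-end fragment: UGUA, then AAUAAA, then a U/GU-rich stretch.
fragment = "GCUGUAGCCG" + "AAUAAA" + "GCCGCACGCA" + "UUUGUUUUGUUUGUUUUUUGUUUUGUUU" + "GCGC"
print(candidate_pas(fragment))
# [{'pos': 10, 'hexamer': 'AAUAAA', 'upstream_UGUA': True, 'U_rich_DSE': True}]
```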
NMF (non-negative matrix factorization) defines patterns of signals that control 3’-processing (cleavage and polyadenylation)
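NMF approximates a non-negative matrix V as the product of two non-negative matrices, V ≈ WH. If V were, for example, counts of a sequence signal at each position around many cleavage sites, the rows of H would be positional patterns and W each site's use of them. The sketch below uses the Lee–Seung multiplicative updates on synthetic data; the slide does not state which algorithm, matrix layout or rank the original analysis used, so all of those are assumptions here.

```python
import numpy as np


def nmf(V: np.ndarray, rank: int, n_iter: int = 1000, seed: int = 0, eps: float = 1e-9):
    """Factor non-negative V (n x m) as W (n x rank) @ H (rank x m).

    Lee-Seung multiplicative updates for the squared Frobenius error; the
    updates keep W and H non-negative because they only multiply by ratios
    of non-negative quantities.
    """
    if (V < 0).any():
        raise ValueError("NMF requires a non-negative matrix")
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic data: 30 hypothetical sites x 20 positions, built from two
# positional "signal profiles" (peaks at positions 5 and 14) mixed with
# random non-negative weights.
positions = np.arange(20)
profiles = np.vstack([
    np.exp(-0.5 * ((positions - 5) / 1.5) ** 2),
    np.exp(-0.5 * ((positions - 14) / 1.5) ** 2),
])
weights = np.random.default_rng(1).random((30, 2))
V = weights @ profiles

W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
print("peak position of each recovered pattern:", sorted(H.argmax(axis=1).tolist()))
```

Non-negativity is what makes the recovered patterns interpretable as additive "parts" (each site is a sum of patterns, never a difference), which suits signals that can only be present or absent.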
Example 2: Insulin-like growth factor 2 mRNA binding protein 1 (Igf2bp1) • Contains four K homology domains and two RNA recognition motifs • Binds to the 5’-UTR of IGF2 mRNA, regulating translation • Can act as an oncogene if misregulated • Evolutionarily conserved, with critical role in mRNA localization and translational control
Consequences: Igf2bp1 has transforming potential only when expressed as its truncated (short 3’-UTR) isoform (figure: ~50,000 nt and ~6,500 nt scale labels; alternative polyadenylated 3’ ends, AAA…) (Mayr and Bartel, Cell 2009)
Inclusion (or exclusion) of regulatory sequences in the 3’-UTR fine-tunes expression and response • Spicher et al., Mol Cell Biol 1998
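Which 3’-UTR regulatory elements a transcript carries depends on which cleavage site was used. The sketch below counts copies of a motif lying wholly upstream of each alternative cleavage site. The UTR sequence and site positions are invented, and AUUUA (the pentamer core of AU-rich elements) is used only as an example of a regulatory motif.

```python
def motif_positions(seq: str, motif: str) -> list[int]:
    """0-based start of every (possibly overlapping) occurrence of motif."""
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]


def motifs_retained(utr: str, cleavage_sites: dict[str, int], motif: str) -> dict[str, int]:
    """Count motif copies lying wholly upstream of each alternative cleavage site."""
    hits = motif_positions(utr, motif)
    return {
        name: sum(1 for start in hits if start + len(motif) <= site)
        for name, site in cleavage_sites.items()
    }


# Hypothetical 3'-UTR and cleavage positions (0-based, relative to the UTR start).
utr = "GCAUUUAGC" + "CCGCCGCCGC" + "AUUUAUUUAUUUA" + "GCGC"
sites = {"proximal": 12, "distal": len(utr)}
print(motifs_retained(utr, sites, "AUUUA"))
# {'proximal': 1, 'distal': 4}
```

In this toy case, cleavage at the proximal site drops three of the four motif copies, so the short isoform escapes most of whatever regulation those elements confer.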
Example 3: Regulated control of polyA site selection for antibodies during B-cell maturation
Alternative transcription initiation can arise in several ways with varying consequences
CAGE (cap analysis of gene expression) tags occurred at unexpectedly high frequency in 3’-UTRs
3’-UTR CAGE tags occur in evolutionarily conserved contexts with a common local sequence
The definition of a gene becomes much more fluid: Ins2-IGF2 • Two genes with a spurious connection? • One large gene with distinct, disjoint transcripts?
Cleaved 3’-UTR RNA products (uaRNAs) are often tissue-specific and can localize differentially
Next time: Details of measuring transcript differences in large-scale