Alternative splicing: A playground of evolution. Mikhail Gelfand, Research and Training Center for Bioinformatics, Institute for Information Transmission Problems RAS, Moscow, Russia. RECOMB, 20 May 2008
[Figure: % of alternatively spliced human and mouse genes, by year of publication (axis up to 100%; 2008: C. Burge). Series: human (genome / random sample), human (individual chromosomes), mouse (genome / random sample); all genes, only multiexon genes, genes with high EST coverage]
Roles of alternative splicing • Functional: • creating protein diversity • human: ~30,000 genes, >100,000 proteins • maintaining protein identity • e.g. membrane (receptor) and secreted isoforms • dominant negative isoforms • combinatorial (transcription factors, signaling domains) • regulatory • e.g. via channelling to NMD (nonsense-mediated decay) • Evolutionary
Plan • Evolution of alternative exon-intron structure • Origin of new (alternative) exons and sites • Evolutionary rates in constitutive and alternative regions
Elementary alternatives • Cassette exon • Mutually exclusive exons • Alternative donor site • Alternative acceptor site • Retained intron
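The five elementary alternatives can be told apart by comparing the exon coordinates of two isoforms. The following is a minimal sketch, not the procedure used in the studies cited here. It assumes both isoforms are sorted, non-overlapping exon lists on the plus strand. The function name `classify_alternatives` is hypothetical. Alternative first and last exons are not treated specially.

```python
from typing import Dict, List, Tuple

# An exon is (start, end) in genomic coordinates, inclusive; plus strand assumed.
Exon = Tuple[int, int]


def overlaps(a: Exon, b: Exon) -> bool:
    """True if two exons share at least one nucleotide."""
    return a[0] <= b[1] and b[0] <= a[1]


def classify_alternatives(iso_a: List[Exon], iso_b: List[Exon]) -> Dict[Exon, str]:
    """Label each exon of isoform A by how it relates to isoform B (simplified)."""
    # Exons of B overlapping nothing in A: candidate mutually exclusive partners.
    only_b = [e for e in iso_b if not any(overlaps(e, a) for a in iso_a)]
    labels: Dict[Exon, str] = {}
    for i, exon in enumerate(iso_a):
        hits = [e for e in iso_b if overlaps(exon, e)]
        if not hits:
            # Region between the neighbouring exons of A.
            left = iso_a[i - 1][1] if i > 0 else float("-inf")
            right = iso_a[i + 1][0] if i + 1 < len(iso_a) else float("inf")
            rival = [e for e in only_b if left < e[0] and e[1] < right]
            labels[exon] = "mutually exclusive exon" if rival else "cassette exon"
        elif len(hits) > 1:
            labels[exon] = "retained intron"  # one exon of A spans two exons of B
        elif hits[0] == exon:
            labels[exon] = "shared"
        elif hits[0][0] == exon[0]:
            labels[exon] = "alternative donor site"  # same 5' end, different 3' end
        elif hits[0][1] == exon[1]:
            labels[exon] = "alternative acceptor site"  # same 3' end, different 5' end
        else:
            labels[exon] = "complex"
    return labels
```

For example, `classify_alternatives([(1, 100), (200, 300), (600, 700)], [(1, 120), (400, 500), (600, 700)])` labels the first exon as an alternative donor site and the second as a mutually exclusive exon.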
Sources of data • ESTs: 1999 global, 2002-3 comparative • mapping exon-intron structure to genome • global alignment of genomes • identifying non-conserved exons and splice sites • oligonucleotide arrays (chips): 2001 global, 2004 comparative • quantitative analysis (inclusion values) • genome-specific constitutive / alternative exons • mRNA-seq (new-generation high-throughput sequencing): 2008 global, expected 2009-10 comparative
Alternative exons are often genome-specific (Modrek & Lee, 2003)
~25% of AS events in ~50% of genes are not conserved. Examples: Na/K-ATPase Fxyd2/FXYD2, p53. Nurtdinov…Gelfand, 2003
Alternative exon-intron structure in fruit flies and malarial mosquito • Same procedure (AS data from FlyBase) • cassette exons, splicing sites • also mutually exclusive exons, retained introns • Follow the fate of D. melanogaster exons in the D. pseudoobscura and Anopheles genomes • Technically more challenging: • incomplete genomes • the quality of alignment with the Anopheles genome is lower, especially for terminal exons • frequent intron insertion/loss (~4.7 introns per gene in Drosophila vs. ~3.5 introns per gene in Anopheles) Malko…Gelfand, 2006
Conservation of D. melanogaster elementary alternatives in D. pseudoobscura genes (blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved) • retained introns are the least conserved (are all of them really functional?) • mutually exclusive exons are as conserved as constitutive exons
Conservation of D. melanogaster elementary alternatives in Anopheles gambiae genes (blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved) • ~30% joined, ~10% divided exons (fewer introns in A. gambiae) • mutually exclusive exons are conserved exactly • cassette exons are the least conserved
Genome-specific AS: real or noise? Young or deteriorating? • minor isoforms, low inclusion rate • often frameshifting and/or stop-containing => NMD • regulatory role? Sorek, Shamir & Ast, 2004
Alternative exon-intron structure in the human, mouse and dog genomes • Human-mouse-dog triples of orthologous genes • We follow the fate of human alternative sites and exons in the mouse and dog genomes • Each human AS isoform is spliced-aligned to the mouse and dog genomes. Definition of conservation: • conservation of the corresponding region (the homologous exon is actually present in the considered genome); • conservation of splicing sites (GT and AG) Nurtdinov…Gelfand, 2007
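The second conservation criterion, canonical GT and AG dinucleotides, amounts to a simple check on the target-genome sequence. This is a minimal sketch. It assumes the spliced alignment has already found the homologous exon, which is the first criterion. It also assumes 0-based, end-exclusive coordinates on the plus strand. The function name and parameters are hypothetical, and non-canonical GC-AG donors are ignored.

```python
def splice_sites_conserved(genome: str, start: int, end: int,
                           check_acceptor: bool = True,
                           check_donor: bool = True) -> bool:
    """Check canonical AG...GT dinucleotides around an aligned exon.

    The exon occupies genome[start:end] (0-based, end-exclusive, plus strand).
    The acceptor AG must immediately precede the exon and the donor GT must
    immediately follow it; first/last exons can skip the outer check.
    """
    seq = genome.upper()
    if check_acceptor and (start < 2 or seq[start - 2:start] != "AG"):
        return False
    if check_donor and seq[end:end + 2] != "GT":
        return False
    return True
```

With `genome = "CCAGATGCCCGTAA"`, the exon `genome[4:10]` is flanked by AG and GT, so the check passes. Changing the G at position 10 to A makes it fail.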
Caveats • we consider only the possibility of AS in mouse and dog: we do not require actual existence of the corresponding isoforms in known transcriptomes • we do not account for situations when an alternative human exon (or site) is constitutive in mouse or dog • functionality assignments (translated / NMD-inducing) are not very reliable
Gains/losses: loss in mouse (common ancestor)
Gains/losses: gain in human (or noise) (common ancestor)
Gains/losses: loss in dog (or possible gain in human+mouse) (common ancestor)
Triple comparison • Human-specific alternatives: noise? • Lost in mouse • Lost in dog • Conserved alternatives
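The gain/loss calls above follow from parsimony on the tree ((human, mouse), dog). Every event starts in human, because human alternatives are the ones being followed. The sketch below is illustrative only. It assumes that topology and uses hypothetical function names. Each human alternative is mapped to one of the four categories from its presence in mouse and dog.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_triple(in_mouse: bool, in_dog: bool) -> str:
    """Parsimony call for a human alternative on the tree ((human, mouse), dog)."""
    if in_mouse and in_dog:
        return "conserved"
    if in_dog:
        # Present in human and in the outgroup: ancestral, lost in mouse.
        return "lost in mouse"
    if in_mouse:
        return "lost in dog (or gained in human+mouse)"
    return "human-specific (gain in human, or noise)"


def tally(events: Iterable[Tuple[bool, bool]]) -> Counter:
    """Count categories over (in_mouse, in_dog) pairs."""
    return Counter(classify_triple(m, d) for m, d in events)
```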
Translated and NMD-inducing cassette exons • Mainly included exons are highly conserved irrespective of function • Mainly skipped translated exons are more conserved than NMD-inducing ones • Numerous lineage-specific losses • more in mouse than in dog • more of NMD-inducing than of translated exons • ~40% of almost always skipped (<1% inclusion) human exons are conserved in at least one lineage (mouse or dog)
Mouse+rat vs human and dog: a possibility to distinguish between exon gain and noise Nurtdinov…Gelfand, 2009
The rate of exon gain decreases with the exon inclusion rate and increases with the sequence evolutionary rate • Caveat: spurious exons may still appear conserved in the rodent lineage because of the short divergence time
Conserved rodent-specific exons and pseudoexons Estimation of “FDR” by analysis of conservation of pseudoexons • intronic fragments with the same characteristics (length distribution etc.) • apply standard rules to estimate “conservation” • obtain the number (fraction) of rodent-specific exons that could be pseudoexons conserved by chance (brown) • obtain the number (fraction) of real rodent-specific exons (dark green): ~50%, that is, ~15% of mouse-specific exons (the rest is likely noise)
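The pseudoexon control can be turned into a rough false-discovery estimate. This is a minimal sketch under a stated simplifying assumption, not the authors' calculation. It assumes spurious exons pass the conservation rules at the same rate as matched intronic pseudoexons. It also treats every mouse-specific candidate as potential noise, so the chance count and the FDR are upper bounds. The function name and the inputs below are hypothetical.

```python
def pseudoexon_fdr(n_candidates: int, n_conserved: int, chance_rate: float):
    """Conservative FDR estimate for conserved rodent-specific exons.

    n_candidates: mouse-specific exons tested for conservation in rat
    n_conserved:  how many of them look conserved in rat
    chance_rate:  fraction of matched pseudoexons passing the same rules
    Returns (FDR upper bound, lower bound on real exons,
             that lower bound as a fraction of all candidates).
    """
    if n_conserved == 0:
        raise ValueError("no conserved exons")
    # Upper bound: assume all candidates could be noise.
    expected_by_chance = min(n_conserved, chance_rate * n_candidates)
    fdr_upper = expected_by_chance / n_conserved
    real_lower = n_conserved - expected_by_chance
    return fdr_upper, real_lower, real_lower / n_candidates
```

With hypothetical inputs of 2000 candidates, 400 conserved and a chance rate of 0.1, the estimate is FDR ≤ 0.5 and at least 200 real exons.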
Alternative donor and acceptor sites: same trends • Higher conservation of ~uniformly used sites • Internal sites are more conserved than external ones (as expected)
Evolution of (alternative) exon-intron structure in 11 Drosophila spp.: D. melanogaster (Dmel), D. sechelia (Dsec), D. yakuba (Dyak), D. erecta (Dere), D. ananassae (Dana), D. pseudoobscura (Dpse), D. persimilis (Dper), D. willistoni (Dwil), D. mojavensis (Dmoj), D. virilis (Dvir), D. grimshawi (Dgri). [Figure: species tree, D. Pollard, http://rana.lbl.gov/~dan/trees.html]
Gain and loss of alternative segments and constitutive exons: unique events per 1000 substitutions. Caveat: we cannot observe exon gain outside, or exon loss within, the D. melanogaster lineage. [Figure: gains (+) and losses (–) of alternative segments and constitutive exons on each branch of the 11-species Drosophila tree; sample size 397 / 18596; ± 57 / ± 1.0]
Gain and loss of alternative segments and constitutive exons: non-unique events per 1000 substitutions (Dollo parsimony). [Figure: gains (+) and losses (–) of alternative segments and constitutive exons on each branch of the 11-species Drosophila tree; sample size 452 / 18874; ± 81 / ± 1.3]
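Dollo parsimony allows a feature to be gained only once but lost any number of times. The gain is placed at the most recent common ancestor of all species that have the feature. Losses go on the stems of the largest clades below that point that lack it. The sketch below shows that logic for a single exon or segment. The nested-tuple tree and the melanogaster-subgroup topology in `TREE` are assumptions for illustration, and normalisation per 1000 substitutions is not shown.

```python
from typing import List, Set, Tuple, Union

# A rooted tree as nested tuples; leaves are species abbreviations.
Tree = Union[str, Tuple["Tree", ...]]

# Assumed illustrative topology (melanogaster subgroup plus D. ananassae).
TREE: Tree = ((("Dmel", "Dsec"), ("Dyak", "Dere")), "Dana")


def leaves(t: Tree) -> Set[str]:
    if isinstance(t, str):
        return {t}
    return set().union(*(leaves(c) for c in t))


def dollo(tree: Tree, present: Set[str]) -> Tuple[Tree, List[Tree]]:
    """Return (clade whose stem carries the single gain, clades with a loss)."""
    if not present or not present <= leaves(tree):
        raise ValueError("present must be a non-empty subset of the leaves")
    # Gain: descend while exactly one child still contains the feature.
    gain = tree
    while not isinstance(gain, str):
        carriers = [c for c in gain if leaves(c) & present]
        if len(carriers) != 1:
            break
        gain = carriers[0]

    def losses(t: Tree) -> List[Tree]:
        if not leaves(t) & present:
            return [t]  # whole clade lacks the feature: one loss on its stem
        if isinstance(t, str):
            return []
        return [x for c in t for x in losses(c)]

    return gain, losses(gain)
```

For a segment present in Dmel, Dsec and Dyak, the gain falls on the (Dmel, Dsec, Dyak, Dere) clade and a single loss is inferred in Dere.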
Conserved alternative splicing in nematodes • 92% of cassette exons from Caenorhabditis elegans are conserved in Caenorhabditis briggsae and/or Caenorhabditis remanei (EST-genome comparisons) • in minor isoforms as well • especially for complex events • levels of AS (exon inclusion) differ less among natural C. elegans isolates than among mutation accumulation lines (microarray analysis) => stabilizing selection on the level of AS. Irimia…Roy, 2007; Barberan-Soler & Zahler, 2008
Plants: little conservation of alternative splicing • Arabidopsis thaliana – Oryza sativa (rice) • Oryza sativa (rice) – Zea mays (maize) • Few AS events are conserved (5% of genes, compared to ~50% of genes with AS) • the level of conservation is the same for translated and NMD isoforms Severing…van Ham, 2009
Constitutive exons becoming alternative • human-mouse comparison, EST data => 612 exons constitutively spliced in one species and alternatively in the other • all are in the major isoform (predominantly included) • analysis of other species (selected cases): the ancestral exons were constitutive • characteristics of such exons (molecular evolution: Kn/Ks, conservation of intron flanks, etc.) are similar to those of constitutive exons Lev-Maor…Ast, 2007
Changes in inclusion rate • orthologous alternatively spliced (cassette) exons of human and chimpanzee • quantitative microarray profiling • inclusion rate estimated by comparing exon and exon-junction probes => 6-8% of alternative exons have significantly different inclusion levels Calarco…Blencowe, 2007
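Inclusion rate (often called PSI, "percent spliced in") compares evidence for the included form with evidence for the skipped form. The study above used microarray exon and junction probes with its own model, which is not reproduced here. The sketch below is a read-count stand-in in the mRNA-seq style listed earlier as a data source. Function names are hypothetical, and no significance test is included.

```python
from typing import Tuple


def psi(inclusion_reads: int, skipping_reads: int, inclusion_junctions: int = 2) -> float:
    """Inclusion rate of a cassette exon from junction read counts (hypothetical estimator).

    The included form has two junctions (upstream-exon, exon-downstream) and
    the skipped form has one, so inclusion counts are averaged per junction.
    """
    inc = inclusion_reads / inclusion_junctions
    total = inc + skipping_reads
    if total == 0:
        raise ValueError("no informative reads")
    return inc / total


def delta_psi(species_a: Tuple[int, int], species_b: Tuple[int, int]) -> float:
    """Difference in inclusion rate between two species for one orthologous exon."""
    return psi(*species_a) - psi(*species_b)
```

For example, 60 inclusion-junction reads and 10 skipping reads give PSI = 30 / 40 = 0.75.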
Sources of new exons • exon shuffling and duplications • mutually exclusive exons • exonisation: new exons, new sites • in repeats • constitutive exons becoming alternative
Alternative splice sites: model of random site fixation • Plots: fraction of exon-extending alternative sites as a function of exon length • The main site is defined as the one used in the protein or supported by more ESTs • Same trends for acceptor (top) and donor (bottom) sites • The distribution of alternative region lengths is consistent with fixation of random sites: alternative sites tend to extend short exons and shorten long exons
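A toy null model shows why random site placement alone gives this trend. This is a hypothetical illustration, not the model fitted in the study. In the toy, a new site is equally likely at any exonic position that still leaves a minimal exon, which shortens the exon. It is equally likely at any position in a fixed window of the adjacent intron, which extends the exon. The window size and minimal exon length below are assumed values.

```python
def fraction_extending(exon_len: int, intron_window: int = 50, min_exon: int = 10) -> float:
    """Expected fraction of exon-extending sites in a toy random-placement model.

    intron_window and min_exon are hypothetical parameters: positions in the
    intronic window extend the exon, exonic positions that leave at least
    min_exon nucleotides shorten it, and all positions are equally likely.
    """
    shortening = max(0, exon_len - min_exon)
    return intron_window / (intron_window + shortening)
```

In this toy, a site near a 20-nt exon extends it most of the time. For exons much longer than the intronic window, shortening dominates. This matches the qualitative pattern on the slide.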
A natural model: genetic diseases • Mutations in splice sites yield exon skipping or activation of cryptic sites • Whether the exon is skipped or a cryptic site is activated depends on: • density of exonic splicing enhancers (lower in skipped exons) • presence of a strong cryptic site nearby Kurmangaliev & Gelfand, 2008
Creation of sites Vorechovsky, 2006; Buratti…Vorechovsky, 2007
MAGE-A family of human CT-antigens • Retroposition of a spliced mRNA, then duplication • Numerous new (alternative) exons in individual copies, arising from point mutations • Creation of donor sites
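A new donor site can arise from a single substitution that creates a GT dinucleotide. The sketch below lists every such substitution in a sequence. It is a simplification. GT is only the minimal donor requirement, and real site strength also depends on the surrounding consensus (roughly MAG|GTRAGT). The function name is hypothetical.

```python
from typing import List, Tuple


def gt_creating_mutations(seq: str) -> List[Tuple[int, str, str]]:
    """List single-nucleotide substitutions that create a new GT dinucleotide.

    Returns (0-based position, reference base, alternative base).
    Existing GT dinucleotides are skipped.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 1):
        a, b = seq[i], seq[i + 1]
        if a == "G" and b != "T":
            hits.append((i + 1, b, "T"))  # G already present: mutate the next base to T
        elif b == "T" and a != "G":
            hits.append((i, a, "G"))  # T already present: mutate the previous base to G
    return hits
```

For example, `gt_creating_mutations("GAAT")` returns `[(1, "A", "T"), (2, "A", "G")]`.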
Exonisation of repeats • early studies: 61 alternatively spliced translated exons with hits to Alu (no constitutive exons) • 84% frameshifting or stop-containing • exonisation by point mutations in cryptic sites in the Alu consensus • studied experimentally • both donor and acceptor sites • recent study: 1824 human exons, 506 mouse exons • Alu, L1, LTR may generate completely new exons Sorek, Ast, Graur, 2002; Lev-Maor…Ast, 2003; Sorek…Ast, 2004; Sela…Ast, 2007
Evolutionary rate in constitutive and alternative regions • Human and mouse orthologous genes • D. melanogaster and D. pseudoobscura • Estimation of the dN/dS ratio: a higher fraction of non-synonymous (amino-acid-changing) substitutions => weaker stabilizing (or stronger positive) selection
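The dN/dS ratio compares substitution rates per non-synonymous site and per synonymous site, each corrected for multiple hits. The sketch below shows only the final step of a Nei-Gojobori-style calculation with the Jukes-Cantor correction. It assumes the site and difference counts have already been obtained codon by codon. The function names are hypothetical, and the cited analyses may have used a different estimator.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction for multiple hits: d = -3/4 ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(syn_sites: float, nonsyn_sites: float,
          syn_diffs: float, nonsyn_diffs: float) -> float:
    """dN/dS from precomputed site and difference counts."""
    d_s = jukes_cantor(syn_diffs / syn_sites)
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    if d_s == 0:
        raise ValueError("dS = 0: ratio undefined")
    return d_n / d_s
```

For example, with hypothetical counts of 100 synonymous and 300 non-synonymous sites and 10 and 6 differences, dN/dS comes out just under 0.19.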
Human/mouse genes: asymmetric histogram of dN/dS(const. regions) – dN/dS(alt. regions). Black: mirror image of the left half. In a larger fraction of genes dN/dS(alt) > dN/dS(const), especially for larger values
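An exact sign test is one simple way to check whether dN/dS(alt) > dN/dS(const) in more genes than chance would give. It looks only at the sign of each per-gene difference, not at the heavier right tail also visible in the histogram. The sketch is illustrative and the function name is hypothetical.

```python
from math import comb
from typing import Sequence, Tuple


def sign_test(diffs: Sequence[float]) -> Tuple[int, int, float]:
    """Two-sided exact sign test on per-gene differences.

    diffs: dN/dS(alt) - dN/dS(const) for each gene; zeros are discarded.
    Returns (number positive, number non-zero, two-sided p-value).
    """
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    k = sum(d > 0 for d in nonzero)
    # Probability of a split at least as extreme as observed under p = 0.5.
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return k, n, min(1.0, 2 * tail)
```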
Concatenated regions: alternative regions evolve faster than constitutive ones (*). (*) In some other studies dS(alt) < dS(const): fewer synonymous substitutions in alternative regions. [Figure: dN, dS and dN/dS in constitutive and alternative regions]
Weaker stabilizing selection (or positive selection) in alternative regions (insignificant in Drosophila). [Figure: dN, dS and dN/dS in constitutive and alternative regions]
Different behavior of terminal alternatives • Drosophila: synonymous substitutions prevail in terminal alternative regions, non-synonymous substitutions in internal alternative regions • Mammals: density of substitutions increases in the N-to-C direction. [Figure: dN, dS and dN/dS by region]
Many Drosophila species, different types of alternatives • dN in mutually exclusive exons is the same as in constitutive exons • dS is lower in almost all alternatives: regulation?
The McDonald-Kreitman test: evidence for positive selection in (minor isoform) alternative regions • Human and chimpanzee genome substitutions vs human SNPs • Exons conserved in mouse and/or dog • Genes with at least 60 ESTs (median number) • Fisher's exact test for significance. Minor isoform alternative regions: • More non-synonymous SNPs: Pn(alt_minor) = 0.12% >> Pn(const) = 0.06% • More non-synonymous substitutions: Kn(alt_minor) = 0.91% >> Kn(const) = 0.37% • Positive selection (as opposed to weaker stabilizing selection): α = 1 – (Pn/Ps) / (Kn/Ks) ≈ 25% (estimated fraction of non-synonymous substitutions driven by positive selection) • Similar results for all highly covered genes or all conserved exons
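The McDonald-Kreitman test uses a 2×2 table of counts: polymorphisms (Pn, Ps) and fixed differences (Kn, Ks), each split into non-synonymous and synonymous. The slide reports per-site rates. In α those site counts cancel, because polymorphism and divergence are measured on the same sites. The sketch below computes α and a two-sided Fisher's exact p-value from counts. Function names and example counts are hypothetical. `scipy.stats.fisher_exact` should give the same two-sided p-value if SciPy is available.

```python
from math import comb


def mk_alpha(pn: int, ps: int, kn: int, ks: int) -> float:
    """alpha = 1 - (Pn/Ps) / (Kn/Ks), written as 1 - (Pn*Ks) / (Ps*Kn)."""
    return 1 - (pn * ks) / (ps * kn)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums hypergeometric probabilities of all tables with the observed margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:  # x = value of the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))
```

For an MK table the call would be `fisher_exact_two_sided(pn, ps, kn, ks)`. With hypothetical counts Pn = 10, Ps = 20, Kn = 30, Ks = 30, α = 0.5.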
An attempt at integration • AS is often species-specific • young AS isoforms are often minor and tissue-specific • … but still functional • although species-specific isoforms may result from aberrant splicing • AS regions show evidence for decreased negative selection • excess non-synonymous codon substitutions • AS regions show evidence for positive selection • excess fixation of non-synonymous substitutions (compared to SNPs) • AS tends to shuffle domains and target functional sites in proteins • Thus AS may serve as a testing ground for new functions without sacrificing old ones
What next? • Changes in inclusion rates (mRNA-seq) • revisit constitutive-becoming-alternative exons • Other taxonomic groups • Evolution of regulation • donor and acceptor splicing sites • splicing enhancers and silencers • cellular context (SR-proteins etc.) • Control for: • functionality: translated / NMD-inducing (frameshifts, stop codons) • exon inclusion (or site choice) level: major / minor isoform • tissue specificity pattern (?) • type of alternative – 1: N-terminal / internal / C-terminal • type of alternative – 2: cassette and mutually exclusive exons, alternative sites, etc.
Acknowledgements • Discussions • Eugene Koonin (NCBI) • Igor Rogozin (NCBI) • Vsevolod Makeev (GosNIIGenetika) • Dmitry Petrov (Stanford) • Dmitry Frishman (GSF, TUM) • Sergei Nuzhdin (USC) • Support • Howard Hughes Medical Institute • Russian Academy of Sciences (program “Molecular and Cellular Biology”) • Russian Foundation for Basic Research
Authors • Andrei Mironov (Moscow State University) • Ramil Nurtdinov (Moscow State University) – human/mouse+rat/dog • Dmitry Malko (GosNIIGenetika, Moscow) – drosophila/mosquito • Ekaterina Ermakova (IITP) – Kn/Ks • Vasily Ramensky (Institute of Molecular Biology, Moscow) – SNPs, McDonald-Kreitman test • Irena Artamonova (Inst. of General Genetics and IITP, Moscow) – human/mouse, plots, MAGE-A