Aziz Khan, Ning Wang April 22, 2013


Presentation Transcript


  1. Landscape of transcription in human cells. S. Djebali, C. A. Davis, A. Merkel, A. Dobin, T. Lassmann, A. Mortazavi, A. Tanzer, J. Lagarde, W. Lin, F. Schlesinger, C. Xue, G. K. Marinov, J. Khatun, B. A. Williams, C. Zaleski, J. Rozowsky, M. Röder, F. Kokocinski, R. F. Abdelhamid, T. Alioto, I. Antoshechkin, M. T. Baer, N. S. Bar, P. Batut, K. Bell, I. Bell, S. Chakrabortty, X. Chen, J. Chrast, J. Curado, T. Derrien, J. Drenkow, E. Dumais, J. Dumais, R. Duttagupta, E. Falconnet, M. Fastuca, K. Fejes-Toth, P. Ferreira, S. Foissac, M. J. Fullwood, H. Gao, D. Gonzalez, A. Gordon, H. Gunawardena, C. Howald, S. Jha, R. Johnson, P. Kapranov, B. King, C. Kingswood, O. J. Luo, E. Park, K. Persaud, J. B. Preall, P. Ribeca, B. Risk, D. Robyr, M. Sammeth, L. Schaffer, L.-H. See, A. Shahab, J. Skancke, A. M. Suzuki, H. Takahashi, H. Tilgner, D. Trout, N. Walters, H. Wang, H. Wrobel, Y. Yu, X. Ruan, Y. Hayashizaki, J. Harrow, M. Gerstein, T. Hubbard, A. Reymond, S. E. Antonarakis, G. Hannon, M. C. Giddings, Y. Ruan, B. Wold, P. Carninci, R. Guigó, & T. R. Gingeras. Nature 489 (2012) 101–108. Aziz Khan, Ning Wang April 22, 2013 GROUP: 3

  2. Outline • Motivation & goal • Workflow • Data generation • Results • Long RNA expression landscape • Short RNA expression landscape • RNA editing & allele-specific expression • Repeat region transcription • Characterization of enhancer RNA • Conclusion Landscape of transcription in human cells, Djebali et al. Nature 2012

  3. Motivation and goal • ENCODE pilot phase (2003–2007) • Examined 1% of the human genome • ENCODE second phase (2007–2012) • Interrogated the complete human genome • 80% of the human genome has at least one biochemical function • Goal: • Provide a genome-wide catalogue of the produced RNAs • Identify the subcellular localization of the produced RNAs

  4. Experimental workflow (K562 cell line) • RNA-PET (paired-end tags) • sites of 5′ and 3′ transcript termini • CAGE (cap analysis of gene expression) • sites of transcription initiation

  5. 15 ENCODE cell lines

  6. RNA data processing

  7. RNA data & processing software • The mapped data were used to assemble and quantify annotated GENCODE v7 elements • Elements and quantifications were further assessed for reproducibility between replicates using a non-parametric version of IDR (npIDR)

  8. Long RNA expression landscape.

  9. Detection of annotated transcripts • GENCODE elements are detected by RNA-seq data • 70% of annotated splice junctions, transcripts and genes were cumulatively detected • ~85% of annotated exons were detected, with an average coverage of 96% (by RNA-seq) • Small variation in the proportion of detected GENCODE elements • Only a small proportion of GENCODE elements are detected exclusively in the poly-A− RNA fraction

  10. Detection of novel transcripts • The identified novel elements covered 78% of the intronic nucleotides and 34% of the intergenic sequences • Used Cufflinks to predict (over all long RNA-seq samples) the following elements in intergenic and antisense regions: • 94,800 exons (19%) • 69,052 splice junctions (22%) • 73,325 transcripts (45%) • 41,204 genes (80%)

  11. Independent validation of multi-exonic transcript models • Using overlapping targeted Roche FLX 454 paired-end reads and mass spectrometry • ~3,000 intergenic and antisense transcript models tested • Validation rates from 70% to 90% were observed • These experiments identified >22,000 novel splice sites • Almost an 8-fold increase compared to the sites originally detected with RNA-seq • Thus, most novel transcripts seem to lack protein-coding capacity

  12. The transcriptome of nuclear subcompartments (K562) • For the K562 cell line, total RNA was isolated from chromatin, nucleolus and nucleoplasm • 51.64% (18,330) of the GENCODE (v7) annotated genes detected across all 15 cell lines (35,494) were identified • Only a small fraction of annotated/novel elements was unique to any one compartment • Thus, by analysing short and long RNAs in the different subcellular compartments, they confirm that splicing predominantly occurs during transcription

  13. Co-transcriptional splicing • Short read mappings for exon-based splicing completion • The completed splicing index (coSI): a, b = reads supporting exon inclusion; c = reads supporting exon exclusion; d, e = reads showing that splicing of the exon is not completed • coSI = (formula shown as an image on the original slide)
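The coSI formula itself did not survive in the transcript. A minimal sketch of how such an index could be computed from the five read counts defined on the slide is given below. It assumes the form coSI = (a + b + 2c) / (a + b + 2c + d + e), which is how the index is recalled from the paper's figure and should be checked against the original publication; the function name `cosi` and the handling of exons with no informative reads are illustrative choices, not part of the source.

```python
def cosi(a, b, c, d, e):
    """Completed splicing index for one internal exon (hedged sketch).

    a, b: reads spanning the exon's two splice junctions (exon inclusion)
    c:    reads spanning the junction that skips the exon (exon exclusion)
    d, e: reads crossing the exon-intron boundaries (splicing not completed)
    """
    # Assumption: an exclusion read is counted twice because it shows that
    # both introns flanking the exon have already been removed.
    spliced = a + b + 2 * c
    total = spliced + d + e
    if total == 0:
        return None  # no informative reads for this exon
    return spliced / total


if __name__ == "__main__":
    print(cosi(3, 3, 1, 1, 1))
```

Under this form, a value near 1 means that almost all reads around the exon come from already spliced molecules, and a value near 0 means that introns are still present.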

  14. Co-transcriptional splicing • Distribution of coSI scores computed on GENCODE internal exons • Distribution in the cytosolic poly-A+ RNA fraction • Distribution in the total chromatin RNA fraction

  15. Gene expression across cell lines • Abundance of gene types in cellular compartments • lncRNAs contribute more to cell-line specificity than protein-coding genes • The distribution of gene expression is very similar across cell lines • Protein-coding genes have, on average, higher expression levels than lncRNAs

  16. Isoform expression within a gene • When several isoforms of a gene are detected in one cell line, the data cannot distinguish whether multiple isoforms are expressed in the same cell or different isoforms are expressed in different cells

  17. Transcription initiation Workflow of CAGE processing and elements: • Raw CAGE reads mapped to the hg19 genome (using Delve) • Delve: a probabilistic mapper using an HMM • Reads with bad mapping quality were discarded • Mapped reads were clustered using paraclu • Clusters shorter than 200 bp were selected Using a TSS predictor: • An unsupervised classifier based on modelling the sequences surrounding CAGE regions via HMMs • Captures sequence motifs of length 2–8 present at a certain distance from the middle of each cluster
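The filtering and clustering steps of this workflow can be sketched roughly as follows. This is not Delve or paraclu: the clustering here is a plain distance-based merge that stands in for paraclu's density-based method, and the `Tag` record, the `min_mapq` threshold and the `max_gap` parameter are hypothetical. Only the 200 bp width cut-off comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    chrom: str
    strand: str
    pos: int   # 5' end of a mapped CAGE read
    mapq: int  # mapping quality reported by the mapper


def cluster_tags(tags, min_mapq=20, max_gap=20, max_width=200):
    """Drop poorly mapped tags, merge nearby tags, keep narrow clusters."""
    # Step 1: discard reads with bad mapping quality (threshold is an assumption).
    kept = sorted(
        (t for t in tags if t.mapq >= min_mapq),
        key=lambda t: (t.chrom, t.strand, t.pos),
    )
    # Step 2: simple single-linkage merge (a stand-in for paraclu).
    clusters = []
    for t in kept:
        last = clusters[-1] if clusters else None
        if (last and last["chrom"] == t.chrom and last["strand"] == t.strand
                and t.pos - last["end"] <= max_gap):
            last["end"] = t.pos
            last["count"] += 1
        else:
            clusters.append({"chrom": t.chrom, "strand": t.strand,
                             "start": t.pos, "end": t.pos, "count": 1})
    # Step 3: select clusters shorter than 200 bp, as on the slide.
    return [c for c in clusters if c["end"] - c["start"] + 1 < max_width]


if __name__ == "__main__":
    demo = [Tag("chr1", "+", p, 30) for p in (100, 105, 110, 1000)]
    demo.append(Tag("chr1", "+", 500, 5))                             # low quality
    demo += [Tag("chr1", "+", 2000 + 15 * i, 30) for i in range(20)]  # too wide
    for cluster in cluster_tags(demo):
        print(cluster)
```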

  18. Transcription initiation • 82,783 non-redundant TSSs were identified • ~48% of the CAGE-identified TSSs are located within 500 bp of an annotated, RNA-seq-detected GENCODE TSS • 3% are within 500 bp of a novel TSS
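The 500 bp proximity criterion can be illustrated with a small sketch. It assumes that TSS positions are plain integer coordinates on a single chromosome and strand; the function names are hypothetical, and the paper's actual comparison procedure is not described on the slide.

```python
from bisect import bisect_left


def within_distance(query, sorted_sites, max_dist=500):
    """True if any site in sorted_sites lies within max_dist of query."""
    i = bisect_left(sorted_sites, query)
    # Only the nearest site on each side of the query can be the closest one.
    neighbours = sorted_sites[max(i - 1, 0): i + 1]
    return any(abs(query - s) <= max_dist for s in neighbours)


def fraction_near(cage_tss, annotated_tss, max_dist=500):
    """Fraction of CAGE TSSs lying within max_dist of an annotated TSS."""
    if not cage_tss:
        return 0.0
    sites = sorted(annotated_tss)
    hits = sum(within_distance(q, sites, max_dist) for q in cage_tss)
    return hits / len(cage_tss)


if __name__ == "__main__":
    print(fraction_near([600, 1500, 1501, 4500, 9000], [1000, 5000]))
```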

  19. Transcription termination • Unmapped RNA-seq reads with long terminal poly-A stretches were trimmed first and then mapped • 128,824 sites mapping within annotated GENCODE transcripts were identified • ~20% mapped proximal to annotated poly-A sites (PAS) • ~80% correspond to novel PAS • This increased the average number of PAS per gene from 1.1 to 2.5 • A cell-specific preference for proximal PAS in the cytosol compared to the nucleus
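The trimming step can be sketched as below. The minimum run length of 10 and the handling of leading poly-T (reads sequenced from the opposite strand) are assumptions made for illustration; the slide only says that reads with long terminal poly-A stretches were trimmed before mapping.

```python
def trim_poly_a(read, min_run=10):
    """Trim a long terminal poly-A (or leading poly-T) run from a read.

    Returns (sequence, trimmed_flag). The read is returned unchanged when
    no terminal run of at least min_run bases is present.
    """
    without_tail = read.rstrip("A")
    if len(read) - len(without_tail) >= min_run:
        return without_tail, True
    # Assumption: a read from the opposite strand shows the tail as leading Ts.
    without_head = read.lstrip("T")
    if len(read) - len(without_head) >= min_run:
        return without_head, True
    return read, False


if __name__ == "__main__":
    print(trim_poly_a("ACGTACGT" + "A" * 12))
    print(trim_poly_a("ACGTAAA"))
```

The trimmed remainder would then be mapped to the genome, and the position where the poly-A run started marks a candidate poly-A site.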

  20. Short RNA expression landscape

  21. Annotated small RNAs. Table footnotes: *Includes all other GENCODE small transcript biotypes except for pseudogenes. †All elements that have passed npIDR (0.1). ‡Number of detected miRNAs with an expressed annotated guide (with an annotated guide in miRBase). §Number of detected miRNAs with an expressed annotated passenger (with an annotated passenger in miRBase). ∥Short RNA-seq mapping for which the 5′ end starts 5 bp after the start and ends 5 bp before the end of a detected gene.

  22. Annotated small RNAs (K562)

  23. Unannotated short RNAs Two types: 1. Subfragments of annotated small RNAs. 2. Novel short RNAs, the second and largest source of unannotated short RNAs, which are associated with the promoter and terminator regions of annotated genes (promoter-associated and termini-associated short RNAs).

  24. Genealogy of short RNAs

  25. Genealogy of short RNAs About 6% of all annotated long transcripts overlap with small RNAs and are probably precursors to these small RNAs. Many long RNAs seem to have dual roles, as functional RNAs, and as precursors for many important classes of small RNAs.

  26. RNA editing & allele-specific expression.

  27. RNA editing • Cells can make discrete changes to specific nucleotides within an RNA molecule after it has been transcribed from DNA • Known forms: A-to-I editing (the main form of RNA editing in mammals) and C-to-U editing • They found: A-to-I editing (88%) and T-to-C (5%) • Note: their results do not support a recent report of a substantial number of non-canonical SNV edits in the RNA of human lymphoblastoid cells (ref. 30 of the paper)
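A hedged sketch of how such percentages could be tallied from a list of RNA-DNA differences is shown below. Inosine is read as guanosine during sequencing, so A-to-I editing appears as A>G in the data, and T>C is the same change seen from the opposite strand. The input format (pairs of reference and observed bases) and the function name are hypothetical.

```python
from collections import Counter


def substitution_spectrum(variants):
    """Return the fraction of each substitution type, e.g. {'A>G': 0.8}.

    variants: iterable of (reference_base, observed_base) pairs.
    """
    counts = Counter(f"{ref}>{alt}" for ref, alt in variants if ref != alt)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: n / total for kind, n in counts.items()}


if __name__ == "__main__":
    calls = [("A", "G")] * 8 + [("T", "C")] + [("C", "T")]
    print(substitution_spectrum(calls))
```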

  28. Allele-specific expression (GM12878 RNA-seq datasets) 18% of both GENCODE annotated protein-coding & non-coding genes exhibit allele-specific expression. Similar proportion of genes with allele-specific expression in whole-cell, cytoplasm, & nucleus. Used AlleleSeq pipeline [Rozowsky et al. Mol. Syst. Biol. 2011]
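The general idea behind calling allele-specific expression can be sketched with an exact binomial test on the reads covering a heterozygous SNP: under balanced expression, each read is expected to carry either allele with probability 0.5. This is an illustration of the principle only. The slide does not describe the internals of the AlleleSeq pipeline, so the test choice, the function name and the absence of multiple-testing correction are all assumptions.

```python
from math import comb


def binomial_two_sided_p(ref_reads, alt_reads, p=0.5):
    """Exact two-sided binomial p-value for allelic imbalance at one SNP."""
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_reads]
    # Sum the probabilities of all outcomes at least as unlikely as the
    # observed one; the small tolerance guards against rounding error.
    p_value = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, p_value)


if __name__ == "__main__":
    print(binomial_two_sided_p(10, 0))
    print(binomial_two_sided_p(5, 5))
```

A real analysis would apply this per SNP or per gene and then correct for the number of tests performed.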

  29. Repeat region transcription

  30. Repeat region transcription • 18% (14,828) of CAGE-defined TSS regions overlap repetitive elements • CAGE clusters mapping to repeat regions were noticeably more narrowly expressed than CAGE clusters mapping within genic regions (shown by Shannon entropy) • This cell-line specificity indicates that they may have real functions
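Shannon entropy as a measure of expression breadth can be sketched as follows: the expression values of one CAGE cluster across cell lines are normalised to fractions, and low entropy means that expression is concentrated in few cell lines. The function name and the choice of a base-2 logarithm are illustrative; the slide does not give the exact normalisation used in the paper.

```python
from math import log2


def expression_entropy(values):
    """Shannon entropy (bits) of an expression profile across cell lines."""
    total = sum(values)
    if total <= 0:
        return None  # not expressed anywhere, entropy undefined
    fractions = [v / total for v in values if v > 0]
    return -sum(f * log2(f) for f in fractions)


if __name__ == "__main__":
    print(expression_entropy([5, 0, 0, 0]))  # expressed in a single cell line
    print(expression_entropy([1, 1, 1, 1]))  # evenly expressed in four cell lines
```

With n cell lines the value ranges from 0 (one cell line only) to log2(n) (uniform expression), so narrowly expressed repeat-associated clusters would sit at the low end.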

  31. Characterization of enhancer RNA

  32. Transcription at enhancers RNA polymerase II binds some distal enhancer regions and produces enhancer-associated transcripts (eRNAs)

  33. Pattern of RNA elements around enhancer predictions

  34. Enhancer transcripts differ from promoter transcripts

  35. Chromatin state at transcribed enhancers

  36. Enhancer activity & transcription are cell-type specific

  37. Summary • Extends the current genome-wide annotated catalogue of long polyadenylated & small RNAs of GENCODE • 62.1% and 74.7% of the human genome are covered by processed and primary transcripts, respectively • Primary = contigs + introns + GENCODE genes • Processed = contigs + GENCODE exons • On average, a single cell line transcribes 39% of the genome • No cell line transcribes more than 56.7% of the transcriptomes across all cell lines • The consequent reduction in the length of intergenic regions leads to a significant overlap of neighbouring gene regions and prompts a redefinition of a gene
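The two coverage definitions on this slide amount to taking the union of several interval sets and dividing by the genome length. A minimal sketch follows, assuming half-open (start, end) coordinates on a single sequence; the function names and the toy coordinates are hypothetical and not taken from the paper.

```python
def merged_length(intervals):
    """Total number of bases covered by the union of half-open intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the current merged block.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def covered_fraction(genome_length, *element_sets):
    """Fraction of the genome covered by the union of all element sets."""
    combined = [iv for elements in element_sets for iv in elements]
    return merged_length(combined) / genome_length


if __name__ == "__main__":
    contigs = [(0, 100), (150, 200)]
    exons = [(50, 120)]
    introns = [(120, 150)]
    genes = [(50, 200)]
    print(covered_fraction(1000, contigs, exons))           # processed
    print(covered_fraction(1000, contigs, introns, genes))  # primary
```

Because primary transcripts add introns and whole gene bodies to the contigs, the primary coverage is always at least as large as the processed coverage, which matches the ordering of the two percentages on the slide.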

  38. Summary • A tendency for genes to express many isoforms simultaneously (plateau: 10–12 expressed isoforms per gene per cell line) • Cell-type-specific enhancers are promoters that are differentiable from other regulatory regions • Coding & non-coding transcripts are predominantly localized in the cytosol and nucleus, respectively • ~6% of all annotated coding and non-coding transcripts overlap with small RNAs and are probably precursors to these small RNAs

  39. Redefinition of the concept of a gene Novel genes increase the proportion of small intergenic regions

  40. Thank you!
