ENCODE, BETHESDA

Gene & Transcripts group, July 16, 3PM progress update. ENCODE, BETHESDA. Analysis Subjects: characterization of known functional domains given genome-wide transcriptional surveys (transfrags, CAGE tags and ditags); pseudogenes; protein coding genes; noncoding RNAs.

cheri

Presentation Transcript


  1. Gene & Transcripts group, July 16, 3PM progress update. ENCODE, BETHESDA

  2. Analysis Subjects
     • characterization of known functional domains given genome-wide transcriptional surveys (transfrags, CAGE tags and ditags)
     • pseudogenes
     • protein coding genes
     • noncoding RNAs
     • transcription on unannotated regions of the genome

  3. Protein Coding Genes
     • The complexity of loci: transcript vs protein
     • how often do we find different first exons (CAGE, ditags, GENCODE)
     • a list of confident first exons [france]
     • Tissue/cell line specific transcript annotation (see Strategy slide)
     • List of exons, transcripts, loci with expression levels per cell
     • Variation of exon expression within a gene
     • Association between repeats and transcription
     • correlation between density of repeats (Alus, LINEs) and expression level of transcripts
     • distance between transfrags and Alus. Can transfrags be explained by runoff transcription of Alus?
     • Characteristics of genes depending on cell line / condition
     • alternative splicing
     • number of transcripts
     • number of exons per gene / quality of splice sites
     Non coding RNAs
     • a catalogue of the known non-coding RNAs in the ENCODE regions
     Pseudogenes
     • Classification of pseudogenes: duplicated vs processed

  4. Gene Locus (defined by the boundaries of the longest isoform)
     [Figure: isoforms a), b), c) of one locus; column labels Isoforms, Cell Type, Cluster Class; cell types X, Y, Z; cluster classes 1, 2, 3; transfrag tracks]
     • Four characteristics of each isoform transcript
     • Mean number of exons per transcript
     • Range (max minus min exons per transcript)
     • Max (number of exons per transcript)
     • SD (standard deviation)
     SOM analysis
     K-Means Cluster
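The four per-locus characteristics named on slide 4 can be illustrated with a minimal sketch. The function name `locus_features` and the example exon counts are hypothetical, and the slide does not say whether SD means the population or the sample standard deviation; the population form is assumed here.

```python
from statistics import mean, pstdev


def locus_features(exon_counts):
    """Summarise one gene locus from the exon count of each isoform transcript.

    exon_counts: list of ints, one per isoform (hypothetical input format).
    Returns the four characteristics listed on the slide.
    """
    if not exon_counts:
        raise ValueError("a locus needs at least one isoform")
    return {
        "mean": mean(exon_counts),
        "range": max(exon_counts) - min(exon_counts),
        "max": max(exon_counts),
        # Assumption: population standard deviation; the slide only says "SD".
        "sd": pstdev(exon_counts),
    }


# Hypothetical locus with three isoforms of 4, 6 and 8 exons.
features = locus_features([4, 6, 8])
```

Each locus then yields a four-value feature vector, which is the kind of input that a SOM or k-means clustering step would take.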

  5. Transcription on Unannotated Regions of the Genome
     1) Classification of unannotated transcribed regions into intronic/intergenic, proximal/distal, conserved mammalian/deeper
     2) Relation to conserved secondary structure (overlap transfrags and EvoFold, RNAz, …). Compensatory SNPs
     3) Motifs for transfrags related to existing motifs, and then see if remaining TFRs cluster into new motifs
     4) Cell specificity of unannotated TFRs
        - TFRs present in all 11 samples
        - TFRs present in only one sample
        - TFRs present in more than 1 sample
     5) Number of 5' ends based on CAGE and ditags
     6) Number of sense/antisense unannotated TFRs
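The cell-specificity grouping in item 4 of slide 5 can be sketched as a small classifier. The function name, the sample identifiers, and the choice to treat the three groups as mutually exclusive (so "more than 1 sample" means 2 to 10 here) are assumptions, not something the slide specifies.

```python
def specificity_class(samples_present, n_samples=11):
    """Classify one unannotated TFR by the samples in which it is detected.

    samples_present: iterable of sample identifiers (hypothetical format).
    n_samples: total number of samples surveyed (11 on the slide).
    """
    n = len(set(samples_present))
    if n == 0:
        raise ValueError("TFR must be detected in at least one sample")
    if n > n_samples:
        raise ValueError("more distinct samples than were surveyed")
    if n == n_samples:
        return "all samples"
    if n == 1:
        return "one sample"
    # Assumption: groups are exclusive, so this covers 2 .. n_samples - 1.
    return "more than one sample"


# Hypothetical sample identifiers s0 .. s10.
all_ids = [f"s{i}" for i in range(11)]
ubiquitous = specificity_class(all_ids)
specific = specificity_class(["s3"])
shared = specificity_class(["s1", "s4", "s4"])
```

Duplicated identifiers are collapsed with `set`, so a TFR reported twice in the same sample is still counted once.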