
ENCODE Pseudogene Summary for GT call

ENCODE Pseudogene Summary for GT call. Mark Gerstein, 2005.10.28, 11:00 EDT. Summary of 6 calls: Sept. 15, 22; Oct. 6, 13, 20, 27. Developed a consensus set of 198 pseudogenes, derived from a qualified union of GIS, Havana, UCSC, & Yale with uniform criteria on boundaries.


Presentation Transcript


  1. ENCODE Pseudogene Summary for GT call
     Mark Gerstein, 2005.10.28, 11:00 EDT
     Summary of 6 calls: Sept. 15, 22; Oct. 6, 13, 20, 27

  2. Developed Consensus Set of 198 Pseudogenes
     A. Derived from a qualified union of GIS, Havana, UCSC, & Yale with uniform criteria on boundaries
        • Identify a “good” set of human proteins – HAVANA set?
        • Remove pseudogenes (from all 4 groups) overlapping with current GENCODE exons (does GENCODE have an updated version?).
        • Create a union of the remaining pseudogenes.
        • Find the “best” matching protein for each pseudogene; remove entries without a BLAST hit (e-value cutoff issue?).
        • Realign each pseudogene to its parent protein to produce a uniform alignment and to define the start and end coordinates.
        • Apply a threshold to sequence identity and coverage? (No.)
        • Classify pseudogenes into processed and non-processed (how?)
     B. Overall 222 pseudogenes; application of the above recipe gives 198 consensus. (Intersection set of the above is 81 (proc) + 49 (non-proc).)
     C. Currently on test browser + ENCODE wiki + http://pseudogene.org/ENCODE
     From Deyou Z. + Robert B.
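The exon-filtering and union steps of the recipe above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure the groups actually ran: coordinates are taken to be half-open `(chrom, start, end)` tuples, any shared base counts as an overlap, overlapping calls are simply merged into one locus, and the function names and toy coordinates are hypothetical. The BLAST, realignment, and classification steps are not covered.

```python
from collections import defaultdict

def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def drop_exon_overlaps(calls, exons):
    """Remove pseudogene calls that overlap any annotated exon."""
    return [c for c in calls if not any(overlaps(c, e) for e in exons)]

def union_loci(calls):
    """Merge overlapping calls from all groups into non-redundant loci."""
    by_chrom = defaultdict(list)
    for chrom, start, end in calls:
        by_chrom[chrom].append((start, end))
    merged = []
    for chrom in sorted(by_chrom):
        cur_start, cur_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if cur_start is None:
                cur_start, cur_end = start, end
            elif start < cur_end:          # overlaps the locus being built
                cur_end = max(cur_end, end)
            else:                          # gap: close the locus, start a new one
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged

# Toy calls from the four groups (made-up coordinates, for illustration only).
calls_by_group = {
    "GIS":    [("chr1", 100, 200)],
    "Havana": [("chr1", 150, 260), ("chr2", 10, 50)],
    "UCSC":   [("chr1", 500, 600)],
    "Yale":   [("chr1", 190, 240), ("chr1", 590, 650)],
}
gencode_exons = [("chr2", 40, 80)]  # hypothetical exon

all_calls = [c for group in calls_by_group.values() for c in group]
kept = drop_exon_overlaps(all_calls, gencode_exons)
consensus = union_loci(kept)
print(consensus)
```

In the recipe itself the final start and end coordinates come from realigning each pseudogene to its parent protein, so the merged boundaries here are only a stand-in for that step.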

  3. Interesting Complexities of Pseudogene Annotation: Insertion of One Pseudogene into Another One
     Figure labels:
     - First insertion event
     - heterogeneous nuclear ribonucleoprotein A1 (HNRPA1) pseudogene (parent on Chr12)
     - Remnant of a second, mitochondrial insertion event (has post-insertion deletions)
     - NADH dehydrogenase 2 (MTND2) pseudogene (parent mitochondrial)
     - NADH dehydrogenase 4 (MTND4) pseudogene (parent mitochondrial)
     - cytochrome b (CYTB) pseudogene (parent mitochondrial)
     - Protein evidence
     From Adam F.

  4. EST Evidence of Expression from a Pseudogene at 5’ UTR of Known Gene
     Figure labels: LILR pseudogene; frameshift; LILRA3
     Upstream pseudogene corresponds to exons 1-3 of LILR family genes; 3’ exons have been lost.
     EST evidence supports expression from the pseudogene locus extending to the known gene LILRA3.
     From Adam F.

  5. TAR/Transfrag Evidence for Transcription in 198 Consensus Pseudogenes
     - # of 198 overlapped by interrogated regions (Affy arrays): 180 (90.9%)
     - # of 198 overlapped by Yale TARs or Affy transfrags (union): 106 (53.5% of all; 58.9% of interrogated)
       => There is evidence of transcription (from TARs or transfrags) of the pseudogene or the parent gene (if cross-hybridization) for 53.5% of the consensus pseudogenes (upper bound on transcription)
     - # overlapping CAGE tags: 11 (5.5%)
     - # overlapping ditag tags: 1 (0.5%) (83 (41.9%) are overlapped by full-length ditags)
     From France D.
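The two denominators used on this slide ("of all" versus "of interrogated") can be checked with a short calculation. The counts are taken from the slide; the helper name `pct` and the rounding to one decimal place are assumptions made for this sketch.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)

TOTAL = 198          # consensus pseudogenes
INTERROGATED = 180   # overlapped by interrogated regions on the arrays
TRANSCRIBED = 106    # overlapped by TARs or transfrags (union)

print("interrogated, of all:         ", pct(INTERROGATED, TOTAL))
print("TAR/transfrag, of all:        ", pct(TRANSCRIBED, TOTAL))
print("TAR/transfrag, of interrogated:", pct(TRANSCRIBED, INTERROGATED))
```

The "of interrogated" figure is the more informative one for pseudogenes that the arrays could actually detect, since the 18 non-interrogated pseudogenes cannot show a signal either way.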

  6. Example Pseudogene Overlapped by TARs/Transfrags and Tags: ENCODE_consensus_187
     But the pseudogene is 93% similar to its parent
     From France D.

  7. Consensus Pseudogenes with ≥2 ChIP-chip Hits
     Table column: Has Transcriptional Evidence (intersects Gencode transcript)
     Look for ChIP-chip hits upstream of the pseudogenes
     From Deyou Z.
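Looking for hits upstream of a pseudogene requires a strand-aware window, which the following Python sketch illustrates. Everything beyond the "≥2 hits" criterion is an assumption: the slide does not state a window size (2000 bp is used here arbitrarily), coordinates are treated as half-open, and the function names and toy coordinates are hypothetical.

```python
def upstream_window(start, end, strand, size=2000):
    """Half-open window of `size` bp immediately upstream of a feature."""
    if strand == "+":
        return max(0, start - size), start
    return end, end + size  # on the minus strand, upstream lies past `end`

def count_upstream_hits(pgene, hits, size=2000):
    """Count ChIP-chip hits overlapping the upstream window of a pseudogene."""
    chrom, start, end, strand = pgene
    w_start, w_end = upstream_window(start, end, strand, size)
    return sum(
        1
        for h_chrom, h_start, h_end in hits
        if h_chrom == chrom and h_start < w_end and w_start < h_end
    )

# Toy data (made-up coordinates, for illustration only).
pgene_minus = ("chr1", 5000, 6000, "-")
pgene_plus = ("chr1", 5000, 6000, "+")
hits = [
    ("chr1", 6100, 6300),
    ("chr1", 7500, 7700),
    ("chr1", 4000, 4500),
    ("chr1", 9000, 9100),
]

n_minus = count_upstream_hits(pgene_minus, hits)
n_plus = count_upstream_hits(pgene_plus, hits)
print(n_minus, n_minus >= 2)  # flag pseudogenes with at least two upstream hits
print(n_plus, n_plus >= 2)
```

The same hit set gives different counts for the two strands, which is why the strand of each pseudogene has to be known before the intersection is run.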

  8. Potentially Transcribed Pseudogene (#177) with Upstream ChIP-chip Hits
     From Deyou Z.

  9. Experiments to Validate Expression of ENCODE Pseudogenes
     Stylianos Antonarakis, Robert Baertsch, Jorg Drenkow, Tom Gingeras, Charlotte Henrichsen, Philipp Kapranov, Catherine Ucla, Alexandre Reymond
     Affymetrix, UCSC, University of Geneva, University of Lausanne
     - Select ENCODE pseudogenes from the intersection part of the consensus set: 49 non-processed, 125 processed
     - Designed oligos (25mer, Tm 70°C), either specific to the pseudogene or shared between parental gene and pseudogene
     - Doing 5’RACE in 12 human tissues: brain, heart, kidney, spleen, liver, colon, sm. intestine, muscle, lung, stomach, testis, placenta
     - First 96 pseudogenes: 5’RACEs done in 12 tissues
     - Last 78 will be done next week
     - To do: pool multiple RACEs, send to Santa Clara, and hybridize to Affymetrix ENCODE 20-nucleotide-resolution arrays
     From Alex R.

  10. Extra Slides

  11. Pseudogene Group
      Core people:
      - Jennifer Harrow <jla1@sanger.ac.uk>
      - WEI Chia-Lin <weicl@gis.a-star.edu.sg>
      - Adam Frankish <af2@sanger.ac.uk>
      - Sujit Dike <Sujit_Dike@affymetrix.com>
      - Robert Baertsch <baertsch@SOE.UCSC.EDU>
      - fdenoeud@imim.es
      - Deyou Zheng <zhengdy@csb.yale.edu>
      - Yontao Lu <ytlu@SOE.UCSC.EDU>
      - Alexandre.Reymond@medecine.unige.ch
      Others:
      - Tara L Hoyem <Tara.Hoyem@pnl.gov>
      - Roderic Guigo Serra <rguigo@imim.es>
      - Tom Gingeras <Tom_Gingeras@affymetrix.com>
      - thomas.royce@yale.edu
      - Suganthi Balasubramanian <suganthi@csb.yale.edu>
      6 calls: Sept. 15, 22; Oct. 6, 13, 20, 27

  12. Refresher: many repetitions of the below “Venn analysis”
      [Venn diagram of three sets]
      - Havana-Gencode: 165 pseudogenes (167 - 2)
      - Yale: 167 pseudogenes (164 + 3)
      - UCSC retrogenes: 146 not expressed
      - Region counts as listed on the slide: 54 (2), 17 (2), 16 (0), 81 (34), 15 (1), 16 (7), 33 (1)
      First annotation: 7 Havana agrees to be added (8, 11, 40, 59, 139, 152, 169); 4 at coding loci [Yale agrees to delete]; 1 with weak sequence identity*; 5 with “non-real” proteins*. Numbers according to Adam’s note.
      Second annotation: 9 Havana agrees to be added; 2 at coding loci [Yale agrees to delete]; 1 with weak sequence identity*; 2 with “non-real” proteins*.
      * Solved by consistent protein set & threshold

  13. Rearranged Exon Order in Unprocessed Pseudogene
      Figure: dot plot, protein evidence vs genome; adaptor-related protein complex 1, beta 1 subunit (AP1B1) pseudogenes
      Figure labels: protein evidence; exon 6; exon 3; splice sites same as parent gene
      Following duplication of the AP1B1 locus, rearrangements/duplications have produced two unprocessed pseudogenes corresponding to exons 6 and 3 of the parent gene.
      From Adam F.

  14. Rearrangement of Processed Pseudogene
      Figure: mRNA dot plot and protein dot plot; pseudogene similar to part of ribosomal protein L3 (RPL3)
      Following insertion, one end of the RPL3 pseudogene has been flipped onto the opposite strand (with some loss of internal sequence).
      From Adam F.

  15. Overlaps by TAR/Transfrag Subset
      - Number overlapped by interrogated regions (Affy arrays): 180 (90.9%)
      - Number overlapped by Yale TARs or Affy transfrags (union): 106 (53.5% of all; 58.9% of interrogated)
      - Number overlapped by Yale TARs (union): 84 (42.4% of all; 46.7% of interrogated)
      - Number overlapped by Affy transfrags (union): 102 (51.5% of all; 56.7% of interrogated)
      - Number overlapped by polyA+ TARs/transfrags (union): 105 (53% of all; 58.3% of interrogated)
      - Number overlapped by total RNA TARs (union): 61 (30.8% of all; 33.9% of interrogated)
      From France D.

  16. Expression from Pseudogene Locus (1) – Putative Novel Transcript
      Figure labels: aligned proteins (column collapsed); HAVANA sialyltransferase pseudogene (RP3-477O4.5) supported by protein evidence; supporting EST (100% ID); polyA site and signal
      Putative novel transcript supported by a single EST, which has a polyA site and signal.
      There appears to be some transcription from this locus, which is supported at the 3’ end by a single EST.
      From Adam F.

  17. Intersect Consensus Pseudogenes with ChIP-chip Hits From Deyou Z.
