Properties of pseudogenes ( G) PowerPoint Presentation
Presentation Transcript

  1. Properties of pseudogenes (G) • Genomic DNA sequences • Homology to known genes • Non-functional copies • No natural selection pressure • Disablements: frameshifts & stop codons • Small Indels • Inserted repeats (LINE/Alu)

  2. Duplicated pseudogenes
     [Figure: original gene → gene duplication → mutations → pseudogene]
     • Retain intron/exon structure
     • Examples: globins, Hox cluster

  3. Processed pseudogenes (retroelements)
     [Figure: original gene (AAAAAA tail) → LINE-1-mediated retrotransposition → pseudogene (AAAAAA tail, later AACATA)]
     • Mostly dead on arrival
     • Intronless, with a poly-A tail and short flanking direct repeats
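The two sequence hallmarks on this slide, a 3′ poly-A tail and short direct repeats flanking the insertion, can be screened for with simple string checks. The sketch below is a hypothetical Python illustration, not the authors' pipeline; the function names, the 10-nt tail window, the 80% A threshold and the 7-nt minimum repeat length are all assumptions. A fraction threshold is used instead of an exact run of A's so that a tail carrying one or two point mutations still counts.

```python
def has_poly_a_tail(seq: str, tail_len: int = 10, min_a_fraction: float = 0.8) -> bool:
    """True if the last tail_len bases are mostly A (tolerates a few point mutations).
    tail_len and min_a_fraction are illustrative defaults, not published values."""
    tail = seq.upper()[-tail_len:]
    return len(tail) == tail_len and tail.count("A") / tail_len >= min_a_fraction


def flanking_direct_repeat(left_flank: str, right_flank: str, min_len: int = 7):
    """Return the longest substring (length >= min_len) shared by both flanks,
    i.e. a candidate target-site duplication, or None if there is none."""
    left, right = left_flank.upper(), right_flank.upper()
    # Try the longest possible repeat first, then shorter ones.
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        right_kmers = {right[i:i + k] for i in range(len(right) - k + 1)}
        for i in range(len(left) - k + 1):
            if left[i:i + k] in right_kmers:
                return left[i:i + k]
    return None


print(has_poly_a_tail("GGGAAAAACAAAA"))                       # True (9 of the last 10 bases are A)
print(flanking_direct_repeat("CCCTTAGCATCA", "TTAGCATCAGGG"))  # TTAGCATCA
```

Intronlessness cannot be read off the sequence alone; it comes from comparing the alignment with the parent gene's exon structure, which this sketch does not attempt.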

  4. Why study pseudogenes?
     • Contamination in sequence databases
       – Abundant: ~20K pseudogenes in human
       – ~8K processed, many of them ribosomal
       – ~80 human ribosomal protein genes established by experiments, yet a few hundred in ENSEMBL
     • Interfere with the study of functional genes
       – Cross-hybridization in microarray experiments
       – False positives in gene prediction
     • Pseudogenes are "genomic fossils"
       – Study the evolution of genes and genomes
       – Measure mutation/insertion rates

  5. Human cytochrome c gene and pseudogenes
     Cyc  MGDVEKGKKIFIMKCSQCHTVEKGGK-HKTGPNLHG-LFG-RKTGQAPGYSYTAAN
     cyc  --DVEKGKKIFIQKCVQWHTMEKEGK/HETGLNLHG/LLG/RKTGQVIGFSYTDSN

     Cyc  KNKGIIWGEDTLMEYLENPKKYIPGTKMIFVG-IKKKEERADLIAY-LKKATNE
     cyc  KNKGIT*GEDTLKEYLENLKKYIPGTK**YFL/VTKKAERADLITYL\EKATNE
     (Cyc: functional gene; cyc: pseudogene; *: stop codon; / and \: frameshifts; -: gap)
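Disablements like those in the cyc translation on this slide can be tallied mechanically. The minimal Python sketch below assumes the alignment's symbol conventions: '*' marks an in-frame stop codon, '/' and '\' mark frameshifts, and '-' is an alignment gap that is not counted. The function name is hypothetical.

```python
# Aligned translation of the cytochrome c pseudogene, copied from the slide (both rows joined).
CYC_PSEUDOGENE = (
    "--DVEKGKKIFIQKCVQWHTMEKEGK/HETGLNLHG/LLG/RKTGQVIGFSYTDSN"
    "KNKGIT*GEDTLKEYLENLKKYIPGTK**YFL/VTKKAERADLITYL\\EKATNE"
)


def count_disablements(aligned: str) -> dict[str, int]:
    """Count stop codons ('*') and frameshifts ('/' or '\\') in an aligned translation.
    Gap characters ('-') are ignored."""
    return {
        "stop_codons": aligned.count("*"),
        "frameshifts": aligned.count("/") + aligned.count("\\"),
    }


print(count_disablements(CYC_PSEUDOGENE))  # {'stop_codons': 3, 'frameshifts': 5}
```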

  6. Pipeline for pseudogene assignment
     • Search the raw genome sequence against protein databases for gene homology
     • Remove overlap with known gene annotations
     • Classify into 3 categories: processed, duplicated and fragments
     • Post to UCSC and pseudogene.org
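The two pipeline steps before posting, removing hits that overlap known gene annotations and classifying the rest, can be sketched as follows. This is a hypothetical Python illustration, not the pipeline's actual code: the Hit record, its parent_coverage and has_introns fields, and the 50% coverage cutoff for fragments are assumptions chosen to make the three categories concrete.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    chrom: str
    start: int              # 0-based, half-open [start, end)
    end: int
    parent_coverage: float  # fraction of the parent protein covered (assumed field)
    has_introns: bool       # alignment interrupted by intron-sized gaps (assumed field)


def overlaps(chrom, start, end, g_chrom, g_start, g_end):
    """True if two half-open intervals on the same chromosome share at least one base."""
    return chrom == g_chrom and start < g_end and g_start < end


def remove_known(hits, genes):
    """Drop hits that overlap any known gene annotation, given as (chrom, start, end)."""
    return [h for h in hits
            if not any(overlaps(h.chrom, h.start, h.end, *g) for g in genes)]


def classify(hit, min_coverage=0.5):
    """Hypothetical rule: partial matches are fragments; full-length matches are
    duplicated if they keep an intron structure, processed if intronless."""
    if hit.parent_coverage < min_coverage:
        return "fragment"
    return "duplicated" if hit.has_introns else "processed"


hits = [
    Hit("chr22", 100, 900, 0.95, False),
    Hit("chr22", 5_000, 6_000, 0.90, True),
    Hit("chr22", 20_000, 20_300, 0.20, False),
    Hit("chr22", 40_000, 41_000, 0.90, False),  # overlaps the known gene below
]
known_genes = [("chr22", 40_500, 45_000)]
kept = remove_known(hits, known_genes)
print([classify(h) for h in kept])  # ['processed', 'duplicated', 'fragment']
```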

  7. Schematic flowchart of pseudogene identification
     [Figure: gene (BLASTN) and protein (TBLASTN) queries → ENCODE regions masked with RepeatMasker (…GCTATTTNNNGGGCCAATTATGCG…) → BLAST hits → link hits → FASTA / GeneWise → further processing: disablements, classification, other features]
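The "link hits" step joins separate BLAST hits that probably belong to one pseudogene (for example, several exon-sized matches of a parent protein lying close together on one strand) before the region is realigned with FASTA or GeneWise. The greedy chaining below is a minimal sketch under assumed rules, not the flowchart's actual implementation: hits are linked only when they share query, chromosome and strand, lie within a hypothetical max_gap of 5 kb, and keep the query order collinear with the genome.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    query: str    # parent gene/protein ID
    chrom: str
    strand: str   # "+" or "-"
    g_start: int  # genomic start, 0-based half-open
    g_end: int
    q_start: int  # start position on the query
    q_end: int


def link_hits(hits, max_gap=5_000):
    """Greedily chain hits from the same query, chromosome and strand when the
    genomic gap is at most max_gap and query order is collinear with the genome."""
    chains = []
    for h in sorted(hits, key=lambda x: (x.query, x.chrom, x.strand, x.g_start)):
        if chains:
            last = chains[-1][-1]
            same_target = (last.query, last.chrom, last.strand) == (h.query, h.chrom, h.strand)
            # On the minus strand the query runs backwards along the genome.
            collinear = (h.q_start >= last.q_start if h.strand == "+"
                         else h.q_start <= last.q_start)
            if same_target and collinear and h.g_start - last.g_end <= max_gap:
                chains[-1].append(h)
                continue
        chains.append([h])
    return chains


hits = [
    BlastHit("P1", "chr22", "+", 1_000, 1_300, 1, 100),
    BlastHit("P1", "chr22", "+", 1_500, 1_800, 101, 200),
    BlastHit("P1", "chr22", "+", 90_000, 90_300, 1, 100),
    BlastHit("P2", "chr22", "-", 2_000, 2_300, 150, 250),
]
chains = link_hits(hits)
print([len(c) for c in chains])  # [2, 1, 1]
```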

  8. Pseudogenes we see in ENCODE regions

  9. Pseudogenes and Ensembl proteins

  10. Yale, Ensembl and Havana pseudogenes

  11. Pseudogenes, Genes and Exons in ENm004 and ENr323

  12. Overlap between 3 pseudogene annotation sets
     [Figure: Venn diagrams of Yale, Vega and ENSEMBL pseudogene annotations for 13 regions and for all 44 regions]
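Overlap figures like this one reduce to set arithmetic once each annotation set has been mapped to shared locus keys (for example, by merging pseudogene calls whose genomic coordinates overlap; that merging step is assumed here and not shown). The sketch below computes the seven regions of a three-set Venn diagram; the locus IDs are made-up placeholders, not the slide's data.

```python
def venn3(a: set, b: set, c: set) -> dict[str, int]:
    """Sizes of the seven regions of a three-set Venn diagram."""
    return {
        "A only": len(a - b - c),
        "B only": len(b - a - c),
        "C only": len(c - a - b),
        "A&B only": len((a & b) - c),
        "A&C only": len((a & c) - b),
        "B&C only": len((b & c) - a),
        "A&B&C": len(a & b & c),
    }


# Hypothetical shared locus keys for the three annotation sets.
yale = {"l1", "l2", "l3", "l4"}
vega = {"l3", "l4", "l5"}
ensembl = {"l4", "l5", "l6"}
print(venn3(yale, vega, ensembl))
```

The seven counts always sum to the size of the union, which is a quick sanity check on any merged annotation set.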

  13. Different criteria for pseudogenes: features used for assignment vs. those surveyed
     [Table legend: used in assignment; used for survey; not used]

  14. ENCODE analysis strategy
     • Identified processed pseudogenes genome-wide
     • Added duplicated pseudogenes for ENCODE
     • Performed detailed analysis on chr22
     • Cross-referenced gene and pseudogene annotations with tiling-array data
     • Will adopt a similar approach for ENCODE

  15. Integration of different transcription data for genes and pseudogenes on chr22
     • Chr22 PCR tiling array: probe expression in 3 cell lines
     • MAS oligonucleotide arrays
     • Affymetrix tiling arrays: expression in 11 cell lines
     • ESTs
     • CpG islands
     • Transcription factor binding sites from ChIP-chip (CREB, NF-κB, p53, etc.)
     • Sequence conservation in rat, mouse and chimp

  16. Distribution of chr22 pseudogenes from different sources

  17. Intersection of exons with transcribed microarray tiles and other transcription evidence
     [Figure: overlap of exons with PCR tile (+), EST, CpG and NF-κB evidence; panels A, B, C]

  18. Intersection of ψG with transcribed microarray tiles and other transcription evidence
     [Figure: overlap of ψG with PCR tile (+), EST, CpG and NF-κB evidence; panels A, B, C]

  19. A B C G PCR Tile (+), 3 cell lines Affymetrix microarray EST PCR Tile (-), 3 cell lines Intersection of yG with Transcribed Microarray Tiles and other Transcription Evidence B: Located within Cat Eye Syndrome Critical Region

  20. Gene Gene Pseudogenes in Human-Mouse Synteny Human • Less than one half of the human processed genes have a homologue gene in the mouse syntenic regions • ~ 60% of the human processed G were created after human/mouse divergence (~ 75Myr ago) Mouse

  21. The human ψG HCP9 is a relic of a primordial gene that still functions in mouse

  22. Reasons why pseudogene annotations differ
     • Pseudogene prediction is a low priority relative to gene annotation
     • Done by only a few disparate groups
     • No standard detection/assignment method
     • False negatives result from discounting repeat regions

  23. Questions for discussion
     • Keep separate annotation tracks or merge?
     • Establish an ontology and identification criteria?
     • How do pseudogenes confound gene annotation and prediction?
       – Particularly acute for duplicated pseudogenes
     • Might be worth devoting more attention to cataloging pseudogenes as a group
       – Alleviate ambiguities in transcript mapping
       – Identify prediction false positives

  24. Initial Pseudogene Discovery Pipeline

  25. Examples of functional pseudogenes
     • In the snail L. stagnalis, expression of neuronal nitric oxide synthase (nNOS) is suppressed by an antisense RNA transcribed from a NOS pseudogene [Korneev, Park, O'Shea, J. Neuroscience, 1999]
     • In mouse, a pseudogene regulates expression of the Makorin1 gene by binding to a transcriptional repressor or an RNA-digesting enzyme [Hirotsune et al., Nature 423, 2003]
     [Figure: ancestral NOS gene gives rise to the NOS gene and a NOS pseudogene; NOS RNA and pseudogene RNA form an RNA duplex, suppressing protein production from NOS; normal vs. transgenic mouse]

  26. Pseudogene.org: a comprehensive database of ψGs

  27. Pseudogene challenges: confidence values
     • Pseudogenes cannot be verified experimentally
     • How do changes in computational parameters affect our level of confidence?
     • Can we quantify confidence?
     [Figure: accept results with 90% match to an existing gene → result set 1; fragment set → filter; accept results with 70% match to an existing gene → result set 2]
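One simple way to approach the "can we quantify confidence?" question is to rerun the acceptance step at the two match thresholds on this slide and measure how stable the result sets are. The sketch below uses Jaccard similarity between the 90% and 70% result sets as that stability measure; the metric choice, the function names and the identity values are illustrative assumptions, not the authors' method or data, and the fragment-set filter is not modeled.

```python
def accept(identities: dict[str, float], min_identity: float) -> set[str]:
    """Candidate IDs whose percent identity to the parent gene is >= min_identity."""
    return {cid for cid, ident in identities.items() if ident >= min_identity}


def jaccard(a: set, b: set) -> float:
    """|A & B| / |A | B|; 1.0 means the two result sets agree completely."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy percent identities for five hypothetical candidates.
candidates = {"c1": 95.0, "c2": 91.2, "c3": 84.0, "c4": 72.5, "c5": 60.0}
result_set_1 = accept(candidates, 90.0)   # strict threshold
result_set_2 = accept(candidates, 70.0)   # relaxed threshold
print(sorted(result_set_2 - result_set_1))  # ['c3', 'c4'] appear only when relaxed
print(jaccard(result_set_1, result_set_2))  # 0.5
```

Calls that survive every threshold in a sweep could then be reported as higher confidence than those that appear only under relaxed settings.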