Properties of pseudogenes ( G)

Properties of pseudogenes (G) • Genomic DNA sequences • Homology to known genes • Non-functional copies • No natural selection pressure • Disablements: frameshifts & stop codons • Small Indels • Inserted repeats (LINE/Alu)

Duplicated pseudogenes
[Diagram: original gene → gene duplication → mutations → pseudogene]
• Retains intron/exon structure
• e.g. globins, Hox cluster

Processed pseudogenes (retroelements)
[Diagram: original gene → transcript with poly-A tail (AAAAAA) → LINE-1 mediated retrotransposition → pseudogene (AAAAAA, AACATA)]
• Mostly dead-on-arrival
• Intronless, poly-A tail, short direct repeats
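One of the processed-pseudogene features listed above, the poly-A tail, can be screened for with a simple scan. The following is a minimal sketch, not the method used in the slides: the function name `find_polya` and the `min_run` threshold of 10 are assumptions chosen for illustration.

```python
import re

def find_polya(downstream, min_run=10):
    """Return (start, end) of the first run of at least `min_run` A's, or None.

    `downstream` is the sequence just 3' of a candidate pseudogene.
    `min_run` is a hypothetical threshold, not a value from the slides.
    """
    match = re.search("A{%d,}" % min_run, downstream.upper())
    return (match.start(), match.end()) if match else None

print(find_polya("GCT" + "A" * 12 + "GG"))  # run of 12 A's starting at offset 3
print(find_polya("GCTAAAGG"))               # too short to count as a tail
```

A real screen would likely tolerate a few mismatches inside the tract, since old poly-A tails decay by mutation.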

Why study pseudogenes?
• Contamination in sequence databases
• Abundant: ~20,000 pseudogenes in human
• ~8,000 processed, many ribosomal
• 80 human ribosomal protein genes by experiments, a few hundred in ENSEMBL
• Interfere with the study of functional genes
• Cross-hybridization in microarray experiments
• Generate false positives in gene prediction
• Pseudogenes are “genomic fossils”
• Study the evolution of genes and genomes
• Measure mutation/insertion rates

Human cytochrome C gene and pseudogenes
Cyc MGDVEKGKKIFIMKCSQCHTVEKGGK-HKTGPNLHG-LFG-RKTGQAPGYSYTAAN
cyc --DVEKGKKIFIQKCVQWHTMEKEGK/HETGLNLHG/LLG/RKTGQVIGFSYTDSN

Cyc KNKGIIWGEDTLMEYLENPKKYIPGTKMIFVG-IKKKEERADLIAY-LKKATNE
cyc KNKGIT*GEDTLKEYLENLKKYIPGTK**YFL/VTKKAERADLITYL\EKATNE
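The disablements in the pseudogene row of this alignment can be tallied mechanically. This sketch assumes that `*` marks a stop codon and that `/` and `\` mark frameshifts, which is a common convention in translated alignments but is not stated on the slide; the function name is hypothetical.

```python
def count_disablements(aligned_pseudogene):
    """Count disablement marks in a translated pseudogene alignment row.

    Assumed notation (not confirmed by the slide):
    '*' = stop codon, '/' or '\\' = frameshift.
    """
    return {
        "stop_codons": aligned_pseudogene.count("*"),
        "frameshifts": aligned_pseudogene.count("/") + aligned_pseudogene.count("\\"),
    }

# The two 'cyc' rows from the alignment, joined into one sequence.
cyc_pseudogene = (
    r"--DVEKGKKIFIQKCVQWHTMEKEGK/HETGLNLHG/LLG/RKTGQVIGFSYTDSN"
    r"KNKGIT*GEDTLKEYLENLKKYIPGTK**YFL/VTKKAERADLITYL\EKATNE"
)
print(count_disablements(cyc_pseudogene))
```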

Pipeline for pseudogene assignment
• Search the raw genome sequence for gene homology (input: protein databases)
• Remove overlap with known annotations (input: gene annotation)
• Classify into 3 categories: processed, duplicated and fragments
• Post to UCSC and pseudogene.org
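The "remove overlap with known annotations" step amounts to an interval filter. Below is a minimal sketch under stated assumptions: candidates and annotations are hypothetical `(chrom, start, end)` tuples with half-open coordinates, and any overlap at all disqualifies a candidate. The actual pipeline may use a different criterion.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end

def remove_annotated(candidates, annotations):
    """Keep candidates that do not overlap any known annotation on the same chromosome."""
    kept = []
    for chrom, start, end in candidates:
        hit = any(
            chrom == g_chrom and overlaps(start, end, g_start, g_end)
            for g_chrom, g_start, g_end in annotations
        )
        if not hit:
            kept.append((chrom, start, end))
    return kept

# Illustrative coordinates only, not real annotations.
candidates = [("chr22", 100, 200), ("chr22", 500, 600), ("chr21", 150, 180)]
annotations = [("chr22", 150, 300)]
print(remove_annotated(candidates, annotations))
```

The quadratic scan is fine for a sketch; a genome-wide run would normally sort the intervals or use an interval tree.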

Schematic Flowchart of Pseudogene Identification
• Queries: gene (BLASTN), protein (TBLASTN)
• Target: ENCODE regions with RepeatMasker (…GCTATTTNNNGGGCCAATTATGCG…)
• BLAST hits
• Link hits
• FASTA / GeneWise
• Further processing: disablements, classification, other features
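The "link hits" step can be pictured as merging nearby BLAST hits that belong to the same query. This is a sketch only: the tuple layout `(chrom, strand, query_id, start, end)` and the `max_gap` of 1000 bases are assumptions, and the slides do not say how hits are actually linked.

```python
def link_hits(hits, max_gap=1000):
    """Merge hits sharing chromosome, strand and query when separated by <= max_gap.

    `hits` is a list of (chrom, strand, query_id, start, end) tuples.
    `max_gap` is a hypothetical threshold chosen for illustration.
    """
    linked = []
    # Tuple sorting groups identical (chrom, strand, query) keys, ordered by start.
    for hit in sorted(hits):
        chrom, strand, query, start, end = hit
        if linked:
            prev = linked[-1]
            if prev[:3] == (chrom, strand, query) and start - prev[4] <= max_gap:
                # Extend the previous linked hit instead of starting a new one.
                linked[-1] = prev[:3] + (prev[3], max(prev[4], end))
                continue
        linked.append(hit)
    return linked

# Illustrative hits only.
hits = [
    ("chr22", "+", "P1", 250, 400),
    ("chr22", "+", "P1", 100, 200),
    ("chr22", "+", "P1", 5000, 5100),
    ("chr22", "-", "P1", 260, 300),
]
print(link_hits(hits))
```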

Pseudogenes we see in ENCODE regions

Pseudogenes and Ensembl proteins

Yale, Ensembl and Havana pseudogenes

Pseudogenes, Genes and Exons in ENm004 and ENr323

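Overlap between 3 pseudogene annotation sets
[Figure: two Venn diagrams of the Yale, Vega and ENSEMBL sets, with panel labels "Regions" and "All 44 Regions". Extracted values in reading order: 19, 76, 29, 17, 111, 42, 6, 2, 6, 23, 2, 9, 13]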

Different criteria for pseudogenes: Features for assignment vs those surveyed
[Table legend: used in assignment; used for survey; not used]

ENCODE analysis strategy
• Identified genome-wide processed pseudogenes
• Added duplicated pseudogenes for ENCODE
• Performed detailed analysis on chr22
• Interrelated different gene & pseudogene annotation with tiling array data
• Will adopt similar approach for ENCODE

Integration of different transcription data for genes and pseudogenes on chr22
• Genes
• Pseudogenes
• Chr22 PCR tiling array, probe expression in 3 cell lines
• MAS oligonucleotide arrays
• Affymetrix tiling arrays, expression in 11 cell lines
• EST
• CpG islands
• Transcription factor binding sites from ChIP-chip (CREB, NFkB, p53 etc.)
• Sequence conservation in rat, mouse and chimp

Distribution of chr22 pseudogenes from different sources

Intersection of Exons with Transcribed Microarray Tiles and other Transcription Evidence
[Figure: panels A, B, C; tracks: Exon, PCR Tile (+), EST, CpG, NFkB]

Intersection of ψG with Transcribed Microarray Tiles and other Transcription Evidence
[Figure: panels A, B, C; tracks: ψG, PCR Tile (+), EST, CpG, NFkB]

A B C G PCR Tile (+), 3 cell lines Affymetrix microarray EST PCR Tile (-), 3 cell lines Intersection of yG with Transcribed Microarray Tiles and other Transcription Evidence B: Located within Cat Eye Syndrome Critical Region

Gene Gene Pseudogenes in Human-Mouse Synteny Human • Less than one half of the human processed genes have a homologue gene in the mouse syntenic regions • ~ 60% of the human processed G were created after human/mouse divergence (~ 75Myr ago) Mouse

The human ψG HCP9 is a relic of a primordial gene, which still functions in mouse

Reasons why pseudogene annotations differ
• Pseudogene prediction is a low priority relative to gene annotation
• Done by only a few disparate groups
• No standard detection/assignment method
• False negatives result from discounting repeat regions

Questions for discussion
• Keep separate annotation tracks or merge?
• Establish ontology & identification criteria?
• How do pseudogenes confound gene annotation and prediction?
• Particularly acute for duplicated pseudogenes
• Might be worth devoting more attention to cataloging pseudogenes as a group
• Alleviate ambiguities in transcript mapping
• Identify prediction false positives

Initial Pseudogene Discovery Pipeline

Examples of Functional Pseudogenes
• In the snail L. stagnalis, expression of nitric oxide synthase (nNOS) is suppressed by an antisense RNA transcribed from an NOS pseudogene [Korneev, Park, O’Shea, J. Neuroscience, 1999]
• In mouse, a pseudogene regulates expression of the Makorin1 gene by binding to a transcriptional repressor or an RNA-digesting enzyme [Hirotsune et al., Nature 423, 2003]
[Diagram: ancestral NOS gene; NOS gene and NOS RNA; formation of RNA duplex and suppression of protein production from NOS. Panel labels: normal mouse, transgenic mouse]

Pseudogene.org: a comprehensive database of ψGs

Pseudogene Challenges: Confidence Values
• Pseudogenes cannot be verified experimentally
• How does changing computational parameters affect our level of confidence?
• Can we quantify confidence?
[Diagram: fragment set → filter → accept results with 90% match to an existing gene → result set 1; accept results with 70% match to an existing gene → result set 2]
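The two-threshold filter in the diagram can be sketched as follows. The record layout, the `match` field (read here as the fraction of the candidate matching an existing gene) and the example values are all assumptions made for illustration; only the 90% and 70% cutoffs come from the slide.

```python
def filter_by_match(fragments, min_match):
    """Keep fragments whose match fraction to an existing gene is >= min_match."""
    return [f for f in fragments if f["match"] >= min_match]

# Hypothetical fragment set; IDs and match values are made up for the example.
fragment_set = [
    {"id": "frag1", "match": 0.95},
    {"id": "frag2", "match": 0.82},
    {"id": "frag3", "match": 0.64},
]

result_set_1 = filter_by_match(fragment_set, 0.90)  # strict cutoff
result_set_2 = filter_by_match(fragment_set, 0.70)  # relaxed cutoff

print([f["id"] for f in result_set_1])
print([f["id"] for f in result_set_2])
```

Because the strict set is always contained in the relaxed one, the candidates that appear only under the relaxed cutoff are a natural place to focus when asking how much confidence each parameter choice deserves.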

Properties of pseudogenes ( G)

Properties of pseudogenes ( G)

Presentation Transcript

Physical Properties of the Snowpack

8.5 Properties of Logarithms

Properties of Polymers

Properties of Algebra

Dissecting Self-* Properties

Chapter 2.1

Properties of Light Forensic Light

PROPERTIES OF FLUIDS

Properties of Matter: Physical Properties

Sequence from Exon 1 Deleted

Physical Properties

Physical Properties of Matter

Colligative Properties

Properties of all gases

IMPROVED TECHNIQUES FOR THE IDENTIFICATION OF PSEUDOGENES

Properties of Real Numbers

Unit 2: Properties of Matter

Lecture 5 Physical Properties of Grains

PROPERTIES OF MATTER 12.3

Properties of Logarithms

Discussion Points for 2 nd Pseudogene Call