EXPLORING DEAD GENES - PowerPoint PPT Presentation

EXPLORING DEAD GENES. Adrienne Manuel, I400. What are they? Dead genes are also called pseudogenes. Pseudogenes are nonfunctional copies of genes in DNA. They result from reverse transcription of an mRNA transcript, or from gene duplication and subsequent disablement.

Presentation Transcript
Exploring Dead Genes


Adrienne Manuel


What are they?
  • Dead Genes are also called Pseudogenes
  • Pseudogenes are nonfunctional copies of genes in DNA
  • They result from reverse transcription of an mRNA transcript
  • Or from gene duplication and subsequent disablement
Expression of Pseudogenes
  • Some pseudogenes are evidently transcribed
  • Expression of pseudogenes varies
  • The snail (Lymnaea stagnalis) is an example of an organism that still has a functioning pseudogene
Pseudogenes, Good and Bad!
  • - Raised expression in tumor cells
  • + Useful in studying molecular evolution
  • + Helpful in determining rates of genomic DNA loss for an organism
Size and Distribution of Pseudogenes


  • G is the total population of confirmed and predicted protein-encoding genes
  • ΨG is the estimated population of pseudogenes that correspond to G
  • The set of genes with at least one verifying EST match was derived and denoted GE
  • A set of genes that were deemed to be highly expressed was derived from microarray expression data and denoted GM
  • The corresponding predicted pool of pseudogenes is denoted ΨGM
Data Files
  • Sanger Centre FTP site (ftp://ftp.sanger.ac.uk): holds the complete sequences of the six worm chromosomes
  • GFF data files with annotations for genes and other genomic features that correspond to wormpep18
  • The pseudogene population was assembled with a pipeline of steps
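The GFF annotation files mentioned above are tab-separated text, one feature per line. As a rough sketch of how such a file could be read, assuming the standard nine GFF columns (seqname, source, feature, start, end, score, strand, frame, attributes): the example records, the `Feature` type, and the `parse_gff` helper below are hypothetical illustrations, not taken from the Sanger Centre files or from the presentation.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    """One annotated genomic feature (hypothetical container for this sketch)."""
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    strand: str
    attributes: str


def parse_gff(text: str) -> List[Feature]:
    """Parse tab-separated GFF text, skipping comments and blank lines."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # not a well-formed GFF record
        attributes = cols[8] if len(cols) > 8 else ""
        features.append(
            Feature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]), cols[6], attributes)
        )
    return features


# Made-up records for illustration only; not real worm annotations.
EXAMPLE_GFF = "\n".join([
    "# hypothetical records",
    "\t".join(["CHROMOSOME_I", "example", "exon", "1000", "1200", ".", "+", "0",
               'Sequence "hypothetical.1"']),
    "\t".join(["CHROMOSOME_I", "example", "inverted_repeat", "5000", "5100", ".", "-", ".", ""]),
])

features = parse_gff(EXAMPLE_GFF)
for f in features:
    print(f.seqname, f.feature, f.start, f.end, f.strand)
```

Keeping each feature as a small record makes the later overlap checks in the pipeline a matter of comparing start and end coordinates.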

Step 1: Sanger centre pseudogene annotations

  • Start with the list of 332 annotated pseudogenes
  • The pseudogene population was derived by looking for gene disablements
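One simple way to picture "looking for gene disablement" is to scan a coding sequence for signs that it can no longer encode a full protein. The sketch below is an assumption about what such a check might look like, not the Sanger Centre's actual procedure: it flags in-frame stop codons before the final codon, and treats a length that is not a multiple of three as a crude hint of a frameshift. The function names and example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_premature_stops(cds: str) -> list:
    """Return 0-based positions of in-frame stop codons before the last codon."""
    seq = cds.upper()
    n_codons = len(seq) // 3
    last_codon_start = (n_codons - 1) * 3
    positions = []
    for i in range(0, n_codons * 3, 3):
        if seq[i:i + 3] in STOP_CODONS and i < last_codon_start:
            positions.append(i)
    return positions


def looks_disabled(cds: str) -> bool:
    """Crude disablement test: broken reading frame or a premature stop."""
    return len(cds) % 3 != 0 or bool(find_premature_stops(cds))


# Toy sequences for illustration only.
intact = "ATGGCCTAA"        # ATG GCC TAA: stop only at the end
with_stop = "ATGTAAGCCTAA"  # ATG TAA GCC TAA: stop in the second codon
shifted = "ATGGCCTA"        # length 8: reading frame is broken

for name, seq in [("intact", intact), ("with_stop", with_stop), ("shifted", shifted)]:
    print(name, looks_disabled(seq), find_premature_stops(seq))
```

A real analysis would compare the candidate against its parent protein by alignment, which is what the FASTA step that follows does.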

Step 2: FASTA matching to find potential pseudogenes

Pipeline (continued)
  • Worm gene sequences were masked for low-complexity regions with the program SEG
  • TFASTX and TFASTY were then used to compare the complete wormpep18 set against the worm genome
  • After the comparison, the pseudogene matches were refined in the following steps
Pipeline (continued)

Step 3: Reduction for overlaps on the genomic DNA

  • Significant matches of protein sequences to the DNA were reduced for redundancy where homologs match the same segment of DNA
  • Matches were then sorted
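The redundancy reduction in Step 3 can be sketched as an interval problem: when several homologous proteins hit the same stretch of DNA, keep one representative match. The slides do not say which match is kept, so the rule used here (keep the match with the lowest e-value, then sort by position) is an assumption, and the `Match` type and example hits are hypothetical.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    """A protein-to-genome match (hypothetical record for this sketch)."""
    protein: str
    chrom: str
    start: int
    end: int
    evalue: float


def overlaps(a: Match, b: Match) -> bool:
    """True if two matches share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def reduce_overlaps(matches: List[Match]) -> List[Match]:
    """Greedily keep the best-scoring match for each overlapping region."""
    kept: List[Match] = []
    for m in sorted(matches, key=lambda x: x.evalue):  # best (lowest) e-value first
        if not any(overlaps(m, k) for k in kept):
            kept.append(m)
    # Sort the survivors by genomic position.
    return sorted(kept, key=lambda x: (x.chrom, x.start))


# Made-up hits for illustration only.
hits = [
    Match("A", "I", 100, 400, 1e-20),
    Match("B", "I", 150, 380, 1e-5),   # homolog hitting the same segment as A
    Match("C", "I", 900, 1200, 1e-8),
]
reduced = reduce_overlaps(hits)
print([m.protein for m in reduced])
```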

Step 4: Prevention of overcounting for adjacent matches.

  • Several initial matches may correspond to the same pseudogene
  • To avoid overcounting, such matches were realigned
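To illustrate why adjacent matches inflate the count, the sketch below merges nearby matches to the same protein into one candidate. This is a simplified stand-in, not the realignment the slides describe: it uses only distance on the chromosome, and the `max_gap` value, the `Match` type, and the example coordinates are hypothetical.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    """A protein-to-genome match (hypothetical record for this sketch)."""
    protein: str
    chrom: str
    start: int
    end: int


def merge_adjacent(matches: List[Match], max_gap: int = 500) -> List[Match]:
    """Merge matches to the same protein that lie within max_gap bases of each other."""
    merged: List[Match] = []
    for m in sorted(matches, key=lambda x: (x.chrom, x.protein, x.start)):
        if merged:
            last = merged[-1]
            same_source = last.chrom == m.chrom and last.protein == m.protein
            if same_source and m.start - last.end <= max_gap:
                # Extend the previous candidate instead of counting a new one.
                merged[-1] = last._replace(end=max(last.end, m.end))
                continue
        merged.append(m)
    return merged


# Made-up fragments for illustration only.
fragments = [
    Match("P1", "I", 100, 300),
    Match("P1", "I", 350, 600),    # close to the first P1 fragment
    Match("P1", "I", 5000, 5200),  # far away: counted separately
    Match("P2", "I", 320, 340),
]
candidates = merge_adjacent(fragments)
print(len(fragments), "fragments ->", len(candidates), "candidates")
```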

Step 5: Masking against Sanger Centre annotation and Transposon library.

  • Potential pseudogenes were filtered for overlap with any other annotations in the Sanger Centre GFF files (e.g., exons of genes, tandem or inverted repeats)

Step 6: Reduction for possible additional repeat elements

  • At this point there is a set of 3814 pseudogenic fragments
Pipeline (final step)

Step 7: Reducing the threshold for more stringency

  • The e-value match threshold was reduced from 0.01 to 0.001, which is a stricter cutoff
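A lower e-value cutoff admits fewer matches, so moving from 0.01 to 0.001 makes the filter stricter. A minimal sketch of that effect, assuming matches are kept when their e-value is at or below the cutoff (the comparison rule and the example hits are assumptions, not details from the slides):

```python
def filter_by_evalue(hits, threshold):
    """Keep (name, evalue) hits whose e-value is at or below the threshold."""
    return [(name, evalue) for name, evalue in hits if evalue <= threshold]


# Made-up hits for illustration only.
hits = [("a", 5e-3), ("b", 1e-6), ("c", 2e-2)]

loose = filter_by_evalue(hits, 0.01)
strict = filter_by_evalue(hits, 0.001)
print(len(loose), len(strict))
```

Every hit that passes the stricter cutoff also passes the looser one, so the stricter set is always a subset.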

Check the web!

  • http://bioinfo.mbb.yale.edu/genome/womr/pseudogene
  • To explore the pseudogene population, the data can be searched by protein name or viewed for a specific range on a chromosome
Size of Pseudogene Population
  • Composed of 2168 sequences, about 12% of the total gene complement
  • Factors that affect the size estimate:
    1. Dead copies of transposable elements
    2. The size of the pseudogene population is underestimated because pseudogenes with less obvious disablements aren't included
    3. Some annotated genes might be pseudogenes whose disablement is undetectable
    4. Some pseudogenes may still be part of a functioning gene
    5. Some apparent pseudogenes arise from sequencing errors
    6. Possible genomic repeats
  • Highly expressed genes have fewer dead gene copies
  • The most reliable subset of the pseudogene population is about half the total for ΨG.
  • 39% of pseudogenes are intronic; these kinds of pseudogenes aren't ailing families of proteins
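The figures above imply some rough numbers that can be checked with a few lines of arithmetic. The inputs (2168 pseudogenes, about 12%, about half) come from the slide; everything derived from them is only approximate because the percentages are rounded, and it assumes that ΨG is the 2168-sequence set.

```python
pseudogenes = 2168        # size of the pseudogene population, from the slide
fraction_of_genes = 0.12  # "about 12%" of the total gene complement

# Gene complement implied by the two figures above (approximate).
implied_genes = pseudogenes / fraction_of_genes

# Roughly how many genes there are for every pseudogene.
genes_per_pseudogene = implied_genes / pseudogenes

# "About half the total" for the most reliable subset.
reliable_subset = pseudogenes / 2

print(f"implied gene complement: about {implied_genes:.0f}")
print(f"genes per pseudogene: about {genes_per_pseudogene:.1f}")
print(f"most reliable subset: about {reliable_subset:.0f}")
```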
Chromosomal Distributions
  • Pseudogenes are more abundant near the ends of the chromosomes (the “arms”)
  • For each chromosome, there is a calculated proportion of dead genes
Figure: amino acid composition of the gene and pseudogene sets.
  • The percentage composition for each of the 20 amino acids is graphed in decreasing order of the implied amino acid composition in the pseudogene set. In the bottom part of the figure, the difference from G for each amino acid is indicated by a bar.
Listed are the largest sequence families in the worm, ranked by number of genes and by number of pseudogenes
  • Families are named for a particular representative. Four of the top 10 paralog gene families, when ranked by number of genes, are functionally uncharacterized
  • Three of the top 10 pseudogene families are among the biggest families when ranked by number of genes
  • These charts are ranked in terms of implied structural “pseudofolds”
  • Proteins encoded by the worm genome have been assigned to globular domain folds from the SCOP database
Why was this studied again?
  • To provide an initial estimate of the size, distribution, and characteristics of the pseudogene population in C. elegans, in an attempt to estimate the total number in humans.
  • Found few pseudogenes in the worm genome that are apparently due to processing
  • Found a large uncharacterized gene family that makes up 2/3 of dead genes
  • The arms of the chromosomes are unreliable regions for encoding genes but are more likely to spawn new proteins