
Annotation of Drosophila virilis

Chris Shaffer GEP workshop, 2006

Annotation of D. virilis

  • Outline of general technique and then one practical example

  • This technique may not be the best with other projects (e.g. corn, bacteria)

  • The technique optimized for projects:

    • Moderately close, well annotated neighbor species

    • No EST, mRNA or expression data available

Helpful Hints

  • The evolutionary distance between D. virilis and D. melanogaster is much greater than that between chimp and human

    • Conservation will be at the protein domain level

    • Synteny is detectable in some fosmids

    • Most genes stay on the same chromosome (3 exceptions seen in ~40 genes)

D. virilis

  • Average gene size will be smaller than mammals

  • Very low density of pseudogenes

  • Almost all genes in virilis will have the same basic structure as melanogaster orthologs; mapping exon by exon works well for most genes

How to proceed

First, identify features of interest:

  • Genscan results

    • Watch out for ends - fused or split genes

  • Regions of high similarity with D. melanogaster protein, identified by BLAST

    • Overlapping genes usually on opposite strand

    • Be vigilant for partial genes at fosmid ends

  • Regions with high similarity to known genes (i.e. BLAST to nr) not covered above

Basic Procedure

For each feature of interest:

  • Identify the likely ortholog in D. m.

  • Use D. m. database to find gene model of ortholog and identify all exons

  • Use BLASTX to identify locations and frames of each exon, one by one

  • Based on locations, frames, and gene predictions, find donor and acceptor splice sites that link frames together; identify the exact base location (start and stop) of each coding exon

  • Double-check your results by translation

Basic procedure (graphically)



BLASTX of the predicted gene against melanogaster proteins suggests this region is orthologous to a Dm gene with 5 exons:

BLASTX of each exon to locate its region of similarity:

Basic procedure (graphically)






Zoom in on the ends of the exons and find the first Met, the matching intron donor (GT) and acceptor (AG) sites, and the final stop codon.

Once these have been identified, write down the exact location of the first base and last base of each exon. Use these numbers to check your gene model.

Example Annotation

  • Open Safari and go to

  • Click on Genome Browser

Example Annotation

  • Settings are:

    Insect; D. virilis; Mar. 2005; chr10

    (chr10 is a fosmid from 2005)

  • Click submit

Example Annotation

  • Seven predicted Genscan genes

  • Each one would be investigated

Investigate 10.4

  • All putative genes will need to be analyzed; we will focus on 10.4 in this example

  • To zoom in on this gene enter:

    chr10:15000-21000 in position box

    Then click jump button

Step 1: Find Ortholog

  • If this is a real gene it will probably have at least some homology to a D. melanogaster protein

  • Step one: do a BLAST search with the predicted protein sequence of 10.4 to all proteins in D. melanogaster

Step 1: Find Ortholog

  • Click on one of the exons in gene 10.4

  • On the Genscan report page click on Predicted Protein

  • Select and copy the sequence

  • Do a blastp search of the predicted sequence to the D. melanogaster “Annotated Proteins” database at

Step 1: Find Ortholog

  • The results show a significant hit to the “A” and “B” isoforms of the gene “mav”

Step 1: Results of Ortholog search

  • The alignment looks right for virilis vs. melanogaster: regions of high similarity interspersed with regions of little or no similarity

  • We have a probable ortholog: maverick

Step 2: Gene model

  • What does mav look like?

  • Go to ENSEMBL to get exons and map them to regions:

    • Web browser: go to

Step 2: Gene model

  • Click on Drosophila

  • Search for mav (top right search box)

  • Click on “Ensembl Gene: CG1901”

  • Scroll down to map and notice two isoforms:

Step 2: Gene model

  • We now have a gene model (two exon gene, two isoforms).

  • We will annotate isoform A since it is the largest. Due to time constraints, our policy so far is to have students pick and annotate only one isoform for each feature.

  • If more than one isoform exists, pick the largest or the one with the most exons

  • Here the student should choose to annotate isoform A (the largest)

  • All isoforms should be annotated eventually

Step 3: Investigate Exons

  • To annotate isoform A we need the peptide sequences of exons 1 and 2, which we will use in BLASTX searches

  • Click on [Peptideinfo] for isoform A on right just above map

  • Scroll down to find peptide sequence with exons in different colors:


Step 3: Investigate Exons

  • Start with exon 1

  • We will use a variant of the BLAST program called bl2seq (BLAST 2 Sequences). This version compares two sequences instead of comparing a sequence to a database

  • Best to search entire fosmid DNA sequence (easier to keep track of positions) with the amino acid sequence of exon 1

Step 3: Investigate Exons

  • Create 3 tabs in Safari

  • In the first tab, go to the goose browser chr10 of virilis; click the DNA button, then click “get DNA”

  • In the second tab, go to and get the peptide sequence for the melanogaster mav gene

  • These first two tabs now have the two sequences you are going to compare

  • In the third tab go to NCBI blast page and click on “Align two sequences (bl2seq)”

Step 3: Investigate Exons

  • Copy and paste the genomic sequence from tab 1 into sequence box 1 of tab 3

  • Copy and paste the peptide sequence of exon 1 from tab 2 into sequence box 2

  • Since we are comparing a DNA sequence to a protein we need to run BLASTX

  • Turn off the filter

  • Leave other values at default for now

  • Click “align” button to run the comparison

Step 3: Investigate Exons

  • No significant homology found

  • Either the mav ortholog is not in this fosmid (unlikely given the original blastp hit) or this exon is not well conserved

  • Let's look for similarities of lower quality

  • Click the back button to go back to the bl2seq page

  • Change the expect value to 1000 and click align
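Why raising the expect value reveals weaker hits can be sketched with the standard relation between a BLAST bit score S' and its expect value, E = m · n · 2^(-S'). The snippet below is only an illustration: the lengths m and n are made-up values, and real BLAST uses effective (edge-corrected) lengths, so the numbers will not match the bl2seq report exactly.

```python
import math

def expect_value(bit_score: float, query_len: int, subject_len: int) -> float:
    """Expected number of chance alignments with at least this bit score."""
    return query_len * subject_len * 2.0 ** (-bit_score)

def min_bit_score(expect: float, query_len: int, subject_len: int) -> float:
    """Lowest bit score that still passes a given expect cutoff."""
    return math.log2(query_len * subject_len / expect)

# Hypothetical sizes for illustration only (not taken from the slides).
m, n = 330, 40000

# Difference in the score threshold between the default cutoff (10) and 1000.
gain = min_bit_score(10, m, n) - min_bit_score(1000, m, n)
print(f"Raising E from 10 to 1000 admits hits about {gain:.1f} bits weaker")
```

The gain depends only on the ratio of the two cutoffs (log2 of 100), not on the sequence lengths, so each 100-fold increase in the expect value lowers the reporting threshold by the same amount.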

Step 3: Investigate Exons

  • We have a weak alignment (50 identities and 94 similarities), but we have seen worse when comparing single exons from these two species

  • Notice the location of the hit (bases 16866 to 17504) and frame +3

Step 3: Investigate Exons

  • A similar search with exon 2 sequences gives a location of chr10:18476-19744 and frame +2

  • For larger genes, continue with each exon: search with bl2seq (adjusting the E-value cutoff if necessary) and note the location and frame of each region of similarity
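The frames reported above can be cross-checked from the coordinates alone. This is a minimal sketch assuming forward-strand hits and 1-based coordinates, as in the example; the function name is hypothetical, and minus-strand frames would need separate handling.

```python
def blastx_frame(start: int) -> int:
    """Forward-strand reading frame (1, 2 or 3) for a 1-based start coordinate."""
    return (start - 1) % 3 + 1

# BLASTX hit locations quoted in the slides for the two mav exons.
hits = {"exon 1": (16866, 17504), "exon 2": (18476, 19744)}

for name, (start, end) in hits.items():
    codons = (end - start + 1) // 3
    print(f"{name}: frame +{blastx_frame(start)}, {codons} codons aligned")
# exon 1: frame +3, 213 codons aligned
# exon 2: frame +2, 423 codons aligned
```

Any exon start you later propose should fall in the same frame as the BLASTX hit inside that exon; otherwise the boundary is off by one or two bases.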

Step 4: Create Gene Model

  • Pick the ATG (Met) at the start of the gene: the first Met in frame with the coding region of similarity (+3)

  • For each putative intron/exon boundary compare location of BLASTX result with gene finder results to locate exact first and last base of the exon and check that the intron starts with “GT” and ends with “AG”

  • Exons: 16515-17504; 18473-19744

  • Intron GT and AG present
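The intron check can also be scripted once the exon coordinates are written down. The sketch below assumes a forward-strand gene and 1-based inclusive coordinates; the function names and the toy sequence are invented for illustration, since the fosmid sequence itself is not reproduced here.

```python
def introns_from_exons(exons):
    """Return (start, end) of each intron between consecutive exons (1-based, inclusive)."""
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]

def check_splice_sites(genomic, introns, donors=("GT",), acceptor="AG"):
    """For each intron, report whether it starts with a donor and ends with the acceptor."""
    results = []
    for start, end in introns:
        donor_seq = genomic[start - 1:start + 1].upper()   # first two intron bases
        acceptor_seq = genomic[end - 2:end].upper()        # last two intron bases
        results.append((start, end, donor_seq in donors, acceptor_seq == acceptor))
    return results

# Exon coordinates from the slides; the single intron lies between them.
mav_exons = [(16515, 17504), (18473, 19744)]
print(introns_from_exons(mav_exons))  # [(17505, 18472)]

# Toy sequence (not real data): exon, GT...AG intron, exon.
toy_genomic = "ATGGCC" + "GTAAGTTTTCAG" + "GCCTAA"
toy_exons = [(1, 6), (19, 24)]
print(check_splice_sites(toy_genomic, introns_from_exons(toy_exons)))
```

To run the check on the real gene, paste the chr10 sequence obtained with "get DNA" into `genomic` and pass `mav_exons`. Passing `donors=("GT", "GC")` would also accept the rare GC donor.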

Step 4: Confirm Gene Model

As a final check we need to create the putative mRNA, translate it and make sure the protein we get out is similar to expected:

  • Enter coordinates for each exon in browser

  • Click “DNA” button at top then “get DNA”

  • Copy the sequence into a text file

  • Repeat for each exon, adding DNA to file

  • Go to

  • Enter your entire sequence and hit “Translate Sequence”; you should get one long protein

  • Compare the protein sequence to ortholog using bl2seq
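The same confirmation can be done offline. This is a rough sketch, not the workshop's tool: it assumes a forward-strand gene, the standard genetic code, and 1-based inclusive exon coordinates, and it uses a toy sequence because the fosmid DNA is not included in these notes. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def splice(genomic, exons):
    """Join the exon sequences (1-based, inclusive coordinates) into a putative CDS."""
    return "".join(genomic[start - 1:end] for start, end in exons).upper()

def translate(cds):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))

def looks_like_complete_cds(cds):
    """True if the CDS is a whole number of codons, starts with Met, and has one final stop."""
    protein = translate(cds)
    return (len(cds) % 3 == 0 and protein.startswith("M")
            and protein.endswith("*") and "*" not in protein[:-1])

# Quick length check on the coordinates from the slides.
mav_exons = [(16515, 17504), (18473, 19744)]
total = sum(end - start + 1 for start, end in mav_exons)
print(total, total % 3)  # 2262 0

# Toy sequence (not real data) with one GT...AG intron.
toy_genomic = "ATGGCCGTAAGTTTTCAGGCCTAA"
toy_cds = splice(toy_genomic, [(1, 6), (19, 24)])
print(translate(toy_cds), looks_like_complete_cds(toy_cds))  # MAA* True
```

A premature stop or a length that is not a multiple of three usually means one exon boundary is misplaced. The translated protein still needs to be compared to the melanogaster ortholog, as in the last step above.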

Step 4: confirm model (Future)

  • We have a web page under construction which will simplify confirmation

  • This web site will double-check intron-exon boundaries, translate the putative message, and create a data file suitable for uploading


  • Some exons are very hard to find (small or non-conserved); keep increasing the E value to find any hits (10,000,000 is not unheard of)

  • Donor “GC” seen on rare occasions

  • We have seen one example where the only reasonable interpretation was that an intron had moved (out of about 70 genes)

  • Without EST and expression data you may get stuck; use your best judgment

Gene Function

  • In addition to annotation of the genes we ask the students to look into the function of each gene and discuss what they found in their final paper on annotation

  • For genes in Drosophila, the best place to begin investigating gene function is the online Drosophila database FlyBase.


FlyBase gene info

  • Search for gene name

  • Will find links to info pages with many helpful references

  • Remember many genes have functions assigned based only on similarity data

  • This is especially true for anonymous genes (“CG#####”). Treat any functional assignment with a large amount of skepticism; consider it a guess at best