Genome Annotation Assessment in Drosophila melanogaster by Reese, M. G., et al.

Presentation Transcript


  1. Summary of Genome Annotation Assessment in Drosophila melanogaster by Reese, M. G., et al. Summary by: Joe Reardon, Swathi Appachi, Max Masnick

  2. Complexity of Eukaryotic Genomes
     • Complexity of genomic data:
       • Transposons
       • Both strands of DNA may code

  3. Levels of Genome Annotation Quality Assessment
     • Base Level:
         A T C G T A C C C A T G
         Y N N N Y Y Y Y Y Y Y N
     • Exon Level:
     • Whole Gene Level:
       • Whether all of a gene's exons are properly identified and assembled
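
The three assessment levels can be made concrete with a small scoring sketch. It assumes the sensitivity and specificity definitions commonly used in gene-finder evaluation (Sn = TP / (TP + FN), Sp = TP / (TP + FP)). The function names, the Y/N label strings, and the (start, end) exon tuples are illustrative assumptions, not taken from the slides.

```python
def base_level_scores(truth, pred):
    """Score per-base labels; both arguments are equal-length strings of
    'Y' (coding) and 'N' (non-coding). Returns (sensitivity, specificity)."""
    if len(truth) != len(pred):
        raise ValueError("label strings must have the same length")
    tp = sum(t == "Y" and p == "Y" for t, p in zip(truth, pred))
    fp = sum(t == "N" and p == "Y" for t, p in zip(truth, pred))
    fn = sum(t == "Y" and p == "N" for t, p in zip(truth, pred))
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp


def exon_level_scores(true_exons, pred_exons):
    """Score exons; an exon counts as correct only if both boundaries match.
    Exons are (start, end) tuples. Returns (sensitivity, specificity)."""
    true_set, pred_set = set(true_exons), set(pred_exons)
    correct = len(true_set & pred_set)
    sn = correct / len(true_set) if true_set else 0.0
    sp = correct / len(pred_set) if pred_set else 0.0
    return sn, sp


def gene_is_correct(true_exons, pred_exons):
    """Whole-gene level: every exon identified exactly and none added."""
    return sorted(true_exons) == sorted(pred_exons)


if __name__ == "__main__":
    print(base_level_scores("YNNNYYYYYYYN", "YYNNYYYYYYNN"))
    print(exon_level_scores([(1, 10), (20, 30)], [(1, 10), (20, 31)]))
    print(gene_is_correct([(1, 10), (20, 30)], [(20, 30), (1, 10)]))
```

The whole-gene check is deliberately the strictest of the three: a single exon boundary that is off by one base makes the gene count as wrong, even if the base-level score is high.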

  4. Impediments to Gene-Finder Quality Assessment
     • Underlying biology is still poorly understood
     • cDNA libraries must be very complete; generating a complete library often requires multiple passes
     *Diagram courtesy of University of Miami, http://fig.cox.miami.edu/~cmallery/150/gene/sf16x5.jpg

  5. Impediments to Gene-Finder Quality Assessment, cont’d
     • Even the most experienced experts make errors
       • Example: 4 “genes” were found to be untranslated regions
     • Genome annotation software often identifies genes that the experts missed

  6. Approaches to Locating Genomic Features
     • Comparison to cDNA libraries
       • Problem: can only compare to existing libraries; cDNA libraries for the target organism probably don’t exist
       • Highly effective, though
     • Protein homology (utilizing Swiss-Prot, BLAT, etc.)
       • Ineffective overall

  7. Approaches to Locating Genomic Features, cont’d
     • Hidden Markov Models:
       • Complex statistical analyses
       • Assign probabilities to nucleotides having certain functions (exon, intron, promoter, suppressor, etc.); compute probabilities in aggregate to determine functions of specific regions of the genome
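
As a rough illustration of the Hidden Markov Model idea, the sketch below labels each nucleotide as "exon" or "intron" with Viterbi decoding, one standard way to combine per-base probabilities into a single most probable labeling. The two-state model and every probability in it are made-up toy values chosen for illustration; they are not the parameters of Genie or of any other program discussed here.

```python
import math

# Toy model (assumed values): exons are GC-rich, introns AT-rich,
# and the label tends to stay the same from one base to the next.
STATES = ("exon", "intron")
START = {"exon": 0.5, "intron": 0.5}
TRANS = {
    "exon": {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.1, "intron": 0.9},
}
EMIT = {
    "exon": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "intron": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def viterbi(seq):
    """Return the most probable state label for each base of seq."""
    if not seq:
        return []
    # Log-probability of the best path ending in each state so far.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[k][s] = best predecessor of state s at position k + 1
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (
                score[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(EMIT[s][base])
            )
            pointers[s] = best_prev
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    print(viterbi("GCGCGCGCATATATAT"))
```

Working in log space avoids numerical underflow on long sequences, where multiplying many small probabilities directly would round to zero.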

  8. Promoters, Repeats
     • Identifying Promoters:
       • Site-specific identification (binding sites)
       • Statistical identification (similar to HMM)
       • Locate gene and then guess
     • Repeat Sequences
       • Must be able to identify even with point mutations, insertions/deletions, etc.
       • Useful for determining evolutionary significance
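
To show what tolerating point mutations and insertions/deletions can look like in practice, here is a minimal sketch of approximate matching by edit distance (a dynamic-programming search in which a match may start anywhere in the sequence). The function name and the `max_edits` threshold are illustrative assumptions; real repeat finders use more elaborate methods.

```python
def approximate_occurrences(pattern, text, max_edits):
    """Return the end positions (exclusive) in text at which some substring
    lies within max_edits substitutions, insertions, or deletions of pattern."""
    m = len(pattern)
    # prev[i] = fewest edits to turn pattern[:i] into a substring ending here.
    prev = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        cur = [0]  # an empty pattern prefix matches anywhere at no cost
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur.append(min(prev[i - 1] + cost,  # match or substitution
                           prev[i] + 1,         # extra base in text
                           cur[i - 1] + 1))     # base missing from text
        if cur[m] <= max_edits:
            ends.append(j)
        prev = cur
    return ends


if __name__ == "__main__":
    print(approximate_occurrences("ACGT", "TTACGTTT", 0))  # exact copy
    print(approximate_occurrences("ACGT", "GGACTGG", 1))   # copy missing one base
```

Raising `max_edits` finds more diverged copies of a repeat but also admits more chance matches, so the threshold trades sensitivity against false positives.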

  9. And the Winner Is…
     • Genie EST: most effective overall gene finder; relies on EST (Expressed Sequence Tag) data (somewhat like cDNA data)
     • Genie: identifies fewer genes, but has fewer false positives

  10. Best Gene Annotation Programs, continued (Table from Reese, et al.)

  11. Conclusions
     • Field is still in its infancy
     • As the amount of genome data continues to grow exponentially, genome annotation software will grow in importance.
     • Researchers will rely on programs like Genie for annotations as quality improves.
     Illustration courtesy of Genbank, http://www.ncbi.nlm.nih.gov/Genbank/index.html
