VectorBase genome annotation

VectorBase genome annotation. VectorBase-EBI, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton UK. Overview of current annotation system. Assembled genome. Sequencing centre gene predictions. VectorBase gene predictions. Merge into canonical set. Protein analysis.

Presentation Transcript


  1. VectorBase genome annotation
     VectorBase-EBI, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK
     VectorBase SWG 2006

  2. Overview of current annotation system
     • Assembled genome
     • Sequencing centre gene predictions
     • VectorBase gene predictions
     • Merge into canonical set
     • Protein analysis
     • Display on genome browser
     • Release to GenBank/EMBL/DDBJ

  3. Merging gene sets
     • Inputs: Gene set #1, Gene set #2
     • Reduce to single predictions per locus
     • Compare exon/intron structures
       • Identical structures
       • Compatible structures
       • Different structures
       • Merge/Split structures
       • Complex
       • No Map
     • Add isoform predictions based on EST/Peptide data
     • Output: Canonical gene set
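The comparison step on slide 3 can be illustrated with a small sketch. The slide names the categories but does not define them, so every rule below is an assumption: "identical" means the same exon coordinates, "compatible" means one model's introns are a subset of the other's, "different" means overlapping models with conflicting introns, "merge/split" means one model overlaps several models in the other set, and "no map" means no overlap at all. The "Complex" category is not modelled, and all function names are hypothetical.

```python
# Hypothetical sketch: classify one gene model against another gene set.
# A model is a sorted list of (start, end) exon coordinates, 1-based inclusive,
# all on the same strand. Category definitions are assumptions, not VectorBase rules.

def introns(exons):
    """Return the set of intron intervals implied by consecutive exons."""
    return {(left[1] + 1, right[0] - 1) for left, right in zip(exons, exons[1:])}

def spans_overlap(a, b):
    """True if the genomic spans (first exon start to last exon end) overlap."""
    return a[0][0] <= b[-1][1] and b[0][0] <= a[-1][1]

def compare_pair(a, b):
    """Compare two overlapping or non-overlapping models one-to-one."""
    if not spans_overlap(a, b):
        return "no map"
    if a == b:
        return "identical"
    ia, ib = introns(a), introns(b)
    if ia <= ib or ib <= ia:  # one intron set contained in the other
        return "compatible"
    return "different"

def classify(model, other_set):
    """Classify a model from gene set #1 against all models in gene set #2."""
    hits = [m for m in other_set if spans_overlap(model, m)]
    if not hits:
        return "no map"
    if len(hits) > 1:
        return "merge/split"
    return compare_pair(model, hits[0])
```

For example, `classify([(100, 200), (300, 400)], [[(100, 200), (300, 400), (500, 600)]])` falls in the "compatible" class under these assumed rules, because the shorter model's single intron also appears in the longer one.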

  4. Data types used for gene prediction/validation
     • Protein sequences
       • 'Self' (i.e. species to be predicted)
       • Taxonomic splits of UniProtKB
     • Transcript sequences
       • mRNAs
       • ESTs
     • Evidence of expression
       • Microarray
       • SAGE tags
       • Ditags
       • MPSS
       • Proteomics data
     • Sequence statistics
       • Coding potential
       • Splice site prediction

  5. VectorBase gene prediction pipeline
     Canonical predictions
     • Blessed predictions
       • Manual annotations (Apollo)
       • Community submissions (Genewise, Exonerate, Apollo)
     • Similarity predictions (Genewise)
     • Species-specific predictions (Genewise)
     • Protein family HMMs (Genewise)
     • ncRNA predictions (Rfam)
     • Transcript based predictions (Exonerate)
     • Ab initio gene predictions (SNAP)

  6. VectorBase curation database pipeline for manual/community annotation
     [Diagram labels: Community annotation (Community representatives); Manual annotation (Harvard); Curation warehouse db; Chado-XML; Apollo; Chado; Community annotation (in collaboration with Harvard); GFF3; Ensembl; Gene build db]

  7. Overview of current re-annotation system
     [Diagram labels: New gene build; Full gene build; Partial gene build; Blessed genes; Species-specific gene prediction; Current gene set; Compare; Merge; Updated gene set]

  8. Comparing new gene builds with the old one
     • Use of manual annotation for validation of automated gene build improvements
     • Simple statistics (CDS length, intron size, CDS matching TEs)
     • BRC annotation metrics
     • Supporting evidence for a gene prediction (citation, expression, orthology)
     • Attachment of Standard Operating Procedures (SOPs)
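Two of the "simple statistics" on slide 8, CDS length and intron size, can be computed directly from exon coordinates. The sketch below is a minimal illustration, assuming each transcript is given as a sorted list of coding exon intervals in 1-based inclusive coordinates; the function names are hypothetical and are not part of any VectorBase or Ensembl API.

```python
# Minimal sketch: per-transcript statistics for comparing two gene builds.
# Input: sorted list of (start, end) coding exon intervals, 1-based inclusive.

def cds_length(exons):
    """Total number of coding bases across all exons."""
    return sum(end - start + 1 for start, end in exons)

def intron_sizes(exons):
    """Length of each intron between consecutive exons."""
    return [right[0] - left[1] - 1 for left, right in zip(exons, exons[1:])]

def summarize(exons):
    """Collect the statistics into one record for side-by-side comparison."""
    sizes = intron_sizes(exons)
    return {
        "cds_length": cds_length(exons),
        "intron_count": len(sizes),
        "max_intron": max(sizes) if sizes else 0,
    }
```

Running `summarize` on the same locus in the old and new builds gives two records that can be compared field by field, for example to flag a new prediction whose longest intron is much larger than in the previous build.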

  9. Gene build schedules
     • Full gene build: 4 months
     • Partial gene build: 1 month
     • Triggers for re-annotation
       • Temporal
       • Data
         • New EST data for species
         • New genomes
         • Re-annotated genomes

  10. VectorBase annotation capacity with increased number of genomes
      Gene builds per year per genome:

      Genomes   2 full + 2 partial   1 full + 3 partial   1 full + 2 partial   1 full + 1 partial
      2         Yes                  Yes                  Yes                  Yes
      3         Yes                  Yes                  Yes                  Yes
      4         No                   Yes                  Yes                  Yes
      5         No                   Yes                  Yes                  Yes
      6         No                   No                   Yes                  Yes
      7         No                   No                   No                   Yes
      8         No                   No                   No                   No
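The Yes/No pattern on slide 10 is consistent with a simple workload budget. If a full gene build takes 4 months and a partial build 1 month (the figures that appear on slide 9), every "Yes" cell requires at most 36 build-months per year and every "No" cell at least 40. The sketch below reproduces the table under that reading. The 36-month annual capacity is an assumption chosen because it fits the table; the slides do not state a capacity figure, and all names here are illustrative.

```python
# Sketch: reproduce the slide 10 capacity table from build durations.
FULL_BUILD_MONTHS = 4          # as read from slide 9
PARTIAL_BUILD_MONTHS = 1       # as read from slide 9
ASSUMED_CAPACITY_MONTHS = 36   # assumption: not stated in the slides

# (full builds, partial builds) per genome per year, one entry per table column
SCHEDULES = [(2, 2), (1, 3), (1, 2), (1, 1)]

def months_needed(genomes, full_per_year, partial_per_year):
    """Total build-months per year for a given number of genomes."""
    per_genome = (full_per_year * FULL_BUILD_MONTHS
                  + partial_per_year * PARTIAL_BUILD_MONTHS)
    return genomes * per_genome

def is_feasible(genomes, full_per_year, partial_per_year,
                capacity=ASSUMED_CAPACITY_MONTHS):
    """True if the schedule fits within the assumed annual capacity."""
    return months_needed(genomes, full_per_year, partial_per_year) <= capacity

def capacity_table(genome_counts=range(2, 9)):
    """Map each genome count to a list of feasibility flags, one per schedule."""
    return {n: [is_feasible(n, full, partial) for full, partial in SCHEDULES]
            for n in genome_counts}

if __name__ == "__main__":
    for n, row in capacity_table().items():
        print(n, ["Yes" if ok else "No" for ok in row])
```

Any capacity from 36 up to, but not including, 40 build-months gives the same table, so the exact figure cannot be recovered from the slide alone.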

  11. Re-annotation questions
      • Triggers for re-annotation
        • Strict temporal triggers
          • Always do a full gene build every year?
        • Data triggers
          • How much new data is enough?
          • Knock-on effects of related species (re)annotation?
      • Encouraging community submissions
        • How can we get more community annotation input?
        • Outreach at conferences (Roadshow)
