
VectorBase vectorbase

New genomes, new features and future plans. Daniel Lawson (on behalf of VectorBase). VectorBase, http://www.vectorbase.org. Kolymbari Meeting, July 2011. Partner institutes: University of Notre Dame, EMBL-EBI, IMBB, Harvard University, University of New Mexico, Imperial College London.


Presentation Transcript


  1. New genomes, new features and future plans. Daniel Lawson (on behalf of VectorBase). VectorBase, http://www.vectorbase.org. Kolymbari Meeting, July 2011.

  2. VectorBase: University of Notre Dame, EMBL-EBI, IMBB, Harvard University, University of New Mexico, Imperial College London

  3. VectorBase
     • Integrated genomic resource for arthropod vectors of human pathogens.
     • Funded by NIH-NIAID as one of four Bioinformatics Resource Centers (BRCs).
     • Collaboration of 3 European and 3 US institutes.
     • VectorBase is:
         • Both service provider and content generator
         • A collator of genomic information
         • A genome annotation group (gene structure prediction)
         • A provider of tools for browsing and data mining vector genomes
         • A helpdesk for community queries
         • Responsible for data submissions to the public archival databanks
         • Committed to a regular release cycle (5-6 releases per year)

  4. Summary of current contents

  5. VectorBase website
     • The release cycle has allowed for more frequent updates to the Ensembl browser
         • Includes support for presenting local data (GFF3, BAM, BED, (big)WIG & VCF files)
         • Updates/development for specific data types (e.g. PopGen, ontologies & search)
     • The VectorBase site needs a style/technology makeover
         • Aim to remove clutter from the site and improve the user experience
         • Merging our Help wiki (FAQ, tutorials, newsletter, forum) into the main site
         • Advantages for site maintenance and flexibility for coming years
         • Now is the time to get in contact with comments and wish list items.
     Please contact VectorBase if you have comments about the current site or wish lists for the new site, or if you want to be involved in user testing of the new site.
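For anyone preparing local data in the formats listed on slide 5, the main pitfall is that GFF3 coordinates are 1-based and inclusive while BED coordinates are 0-based and half-open. The following is a minimal sketch of converting one GFF3 feature line to a six-column BED line. The function name and the example gene ID are hypothetical, and nothing here describes how the VectorBase browser itself ingests files.

```python
def gff3_to_bed(line: str) -> str:
    """Convert one GFF3 feature line to a BED6 line (illustrative sketch)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("a GFF3 feature line has exactly 9 tab-separated columns")
    seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols

    # Column 9 holds semicolon-separated key=value pairs; use ID as the BED name
    # and fall back to the feature type when no ID is present.
    attributes = dict(f.split("=", 1) for f in attrs.split(";") if "=" in f)
    name = attributes.get("ID", ftype)

    # GFF3 is 1-based inclusive, BED is 0-based half-open:
    # subtract 1 from the start and keep the end unchanged.
    bed_start = int(start) - 1
    bed_end = int(end)
    if bed_start < 0 or bed_end <= bed_start:
        raise ValueError("invalid feature coordinates")

    # The BED score column is fixed at 0 here (an assumption of this sketch).
    return "\t".join([seqid, str(bed_start), str(bed_end), name, "0", strand])


if __name__ == "__main__":
    # gene0001 is a made-up identifier used only for illustration.
    print(gff3_to_bed("2L\tVectorBase\tgene\t1001\t2000\t.\t+\t.\tID=gene0001"))
```

The score is deliberately not carried over, because GFF3 scores are free-form floats whereas BED expects an integer from 0 to 1000.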

  6. Pre-sites for upcoming genomes

  7. Pre-sites for upcoming genomes (Browse, Search)

  8. Supporting species without genomic resources (Browse, Search, Genome De-linked Annotation Viewer)

  9. Supporting species without genomic resources

  10. Updating annotation sets
     Community
     • Submissions from community (CAP)
         • Previously as .xls file
         • Soon to also accept FASTA and GFF3
         • DAS server for data presentation overhauled
         • Integration into reference gene set codified
     VectorBase
     • Manual curation at Harvard/New Mexico
         • Priority is Anopheles gambiae
         • Provides QC for new gene builds
         • Final arbiter for issues arising from CAP
         • Move to Aedes aegypti in late 2011
     The quality of the gene sets will improve faster if you, the community, play an active role in correcting gene predictions. Please contact VectorBase if you find an incorrect prediction or have data sets which can improve the gene set.

  11. Updating annotation sets: RNA-Seq
     • Aim:
         • Gene prediction using high-throughput transcriptome data, a.k.a. 'RNA-seq'
     • Overview
         • Alternative method for generating transcript-based gene predictions.
         • Uses Illumina or 454 reads as well as traditional Sanger-sequenced ESTs
         • Relatively short read lengths make intron-exon junction prediction hard, but this is countered by the very high volume of data generated (millions of reads)
         • Pipeline uses existing short-read algorithms for gene prediction: tophat, cufflinks, scripture
     • Potential problems
         • Data sets require significant filtering and pre-analysis QC
         • Mis-calling of homopolymer runs in 454 data leads to data noise and mis-prediction of splice sites
         • Large data sets include many inappropriate splicing events (intron read-through, NMD targets etc.)
     • Summary
         • Effective at finding UTR regions and validating/improving existing predictions
         • Vital for making sense of sequence-based measures of gene expression
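As an illustration of the pre-analysis QC mentioned on slide 11, the sketch below locates homopolymer runs in a read so that reads or splice junctions overlapping them can be flagged for closer inspection. The function name and the default threshold of 4 bases are assumptions made for this example. It is not the filter used in the VectorBase pipeline, which the slides do not describe.

```python
import re
from typing import List, Tuple


def homopolymer_runs(seq: str, min_len: int = 4) -> List[Tuple[int, int, str]]:
    """Return (start, end, base) for each run of one base at least min_len long.

    Coordinates are 0-based and half-open. min_len=4 is an arbitrary
    illustrative threshold, not a value taken from the talk.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # One captured base followed by at least (min_len - 1) repeats of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # The six-T run is reported; the three-A run is below the threshold.
    print(homopolymer_runs("ACGTTTTTTGAAAC"))
```

A junction whose donor or acceptor site falls inside one of the reported intervals would be a natural candidate for manual review, since a miscalled run length shifts the apparent splice position.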

  12. Updating annotation sets: projection from reference (projection build)
     • Aim:
         • Gene prediction using a 'high' quality reference set from a related species.
     • Overview
         • When annotating a species for which we have a closely related reference species, we can align the genomes and project from the 'high' quality set onto the new assembly.
         • This is more effective than a similarity build as it allows for building genes across contigs regardless of the assembly.
         • Whole-genome alignment (WGA) between reference and target using BLASTZ.
         • Custom filter to ensure that each bp in the target genome is aligned to no more than one position in the reference genome.
         • Project predictions through transformation of coordinates between reference and target assemblies.
     • Summary
         • Effective for low-coverage and poor-quality assemblies.
         • Limited to orthologous loci between reference and target, i.e. no novel gene prediction.
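The two mechanical steps on slide 12, the single-coverage filter and the coordinate transformation, can be sketched as follows. This is a simplified illustration under stated assumptions, not the VectorBase implementation: the `Block` structure and all function names are hypothetical, blocks are ungapped and on the same strand, and a lower-scoring block that overlaps a kept one on the target is dropped whole rather than trimmed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    """One ungapped, same-strand alignment block (hypothetical structure)."""
    ref_start: int  # 0-based start on the reference assembly
    tgt_start: int  # 0-based start on the target assembly
    length: int     # aligned length in bp
    score: int      # alignment score; higher is better


def single_coverage(blocks: List[Block]) -> List[Block]:
    """Keep blocks greedily by score so that no target base is aligned twice."""
    kept: List[Block] = []
    for b in sorted(blocks, key=lambda blk: blk.score, reverse=True):
        b_end = b.tgt_start + b.length
        overlaps = any(
            b.tgt_start < k.tgt_start + k.length and k.tgt_start < b_end
            for k in kept
        )
        if not overlaps:
            kept.append(b)
    return sorted(kept, key=lambda blk: blk.ref_start)


def project_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a reference position to the target; None if it is unaligned."""
    for b in blocks:
        if b.ref_start <= pos < b.ref_start + b.length:
            return b.tgt_start + (pos - b.ref_start)
    return None


def project_exon(start: int, end: int, blocks: List[Block]) -> Optional[Tuple[int, int]]:
    """Project a half-open exon [start, end); None if either end is unaligned."""
    new_start = project_position(start, blocks)
    new_last = project_position(end - 1, blocks)
    if new_start is None or new_last is None or new_last < new_start:
        return None
    return new_start, new_last + 1


if __name__ == "__main__":
    raw = [Block(100, 1000, 50, 90), Block(300, 1020, 40, 50), Block(400, 2000, 30, 70)]
    filtered = single_coverage(raw)        # the score-50 block is dropped
    print(project_exon(405, 420, filtered))
```

Exons that fail to project return `None`, which reflects the limitation stated in the summary: only loci covered by the alignment can receive a prediction. A fuller version would also carry sequence names and strand so that exons of one gene can land on different target contigs.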

  13. Anopheles gambiae reference sequence
     • Many issues with the PEST assembly as a reference
     • S molecular form is proposed as the next reference
     Metrics of success
     Hybrid assembly strategy (Sanger*, Illumina†, 454)
     • Project existing gene predictions
     • de novo prediction in novel regions
     • Re-map important datasets

  14. Anopheles gambiae reference sequence
     • Validation of the assembly by normal metrics
     • Emphasis on the concordance with large-scale restriction map (optical map)

  15. Anopheles gambiae reference sequence

  16. Upcoming genomes: Kolymbari 2013? (NHGRI white papers and others)
     • Sandflies: Lutzomyia longipalpis, Phlebotomus papatasi
     • Anopheles (AGCC): Anopheles arabiensis, Anopheles quadriannulatus, Anopheles merus, Anopheles melas, Anopheles christyi, Anopheles epiroticus, Anopheles stephensi, Anopheles maculatus, Anopheles funestus, Anopheles minimus, Anopheles culicifacies, Anopheles farauti, Anopheles dirus, Anopheles atroparvus, Anopheles albimanus
     • Simulium: Simulium vittatum, Simulium sirbanum, Simulium damnosum, Simulium ochraceum, Simulium squamosum, Simulium thyolense, Simulium santipauli, Simulium woodi, Simulium exiguum, Simulium yahense
     • Anopheles: Anopheles darlingi*, Anopheles stephensi
     • Aedes: Aedes albopictus
     • Glossina: Glossina palpalis, Glossina fuscipes, Glossina pallidipes, Glossina brevipalpis, Glossina austeni
     • Stomoxys calcitrans
     • Musca domestica
     • i5K initiative ... ?
     • Ticks & Mites: Leptotrombidium deliense, Ixodes scapularis*, Dermacentor variabilis, Ornithodoros turicata

  17. Notices
     • 2nd round of Driving Biological Projects solicitation
         • 2 years of funding at $300K per year maximum
         • 2-page letters of interest by August 1st
         • Invited full proposals by November 1st
         • http://www.vectorbase.org/Other/News/?id=140
     • Hiring an outreach position at Notre Dame
         • Details on the University of Notre Dame website
         • http://www.vectorbase.org/Other/News/?id=145

  18. Contact VectorBase at info@vectorbase.org

  19. Acknowledgements
     • EMBL-EBI: Daniel Lawson, Derek Wilson, Gautier Koscielny, Karyn Megy, Martin Hammond, Daniel Hughes, Ewan Birney, Paul Kersey
     • Imperial College: Fotis Kafatos, Bob MacCallum, George Christophides, Seth Redmond
     • Notre Dame: Frank Collins, Nora Besansky, Greg Madey, Rob Bruggner, Nate Konopinski, EO Stinson, Scott Emrich, Andrew Sheehan, Rory Carmichael, Dave Cieslak, Dave Campbell, Ryan Butler, Katie Cybulski, Neil Lobo
     • New Mexico: Maggie Werner-Washburne, Phil Baker
     • Harvard: Bill Gelbart, Susan Russo, Dave Emmert, Pinlei Zhou, Lynn Crosby, Kathy Campbell
     • IMBB: Kitsos Louis, Pantelis Topalis, Emmanuel Dialynas
     • Sequencers: TIGR/JCVI, WashU, Broad Institute
     • Ensembl
