
BARCODE SEQUENCE DATAFLOW INTO GENBANK

Ilene Mizrachi, November 30, 2011, Fourth International Barcode of Life Conference.


Presentation Transcript


  1. BARCODE SEQUENCE DATAFLOW INTO GENBANK. Ilene Mizrachi, November 30, 2011, Fourth International Barcode of Life Conference

  2. Barcode Project: 2003 and beyond
     • The Barcode of Life project was initiated in 2003
     • INSDC would be the repository for raw and assembled sequence data
     • INSDC adopts new source fields to accommodate Barcode metadata requirements
     • Barcode of Life Database (BOLD) established as a community workbench and sequencing center

  3. What is a Barcode?
     • A global reference library of DNA barcode sequences that is integrated with other systems of biodiversity information (e.g., databases of specimens, species, biogeographic information)
     • A mechanism to link DNA sequences to vouchered specimens and valid species names
     • A reserved BARCODE keyword was adopted for data that met strict barcode standards

  4. Barcode Standard
     • Formally described species, or a provisional label for an unpublished species
     • Voucher specimen identifier, preferably in a biorepository, using a structured field
     • Country code using the controlled vocabulary used by GenBank
     • Sequence from a gene region specified by the CBOL
         • COI for animals
         • matK and rbcL for plants
         • ITS for fungi
     • Contain at least 75% contiguous, high-quality bases from within the approved region
     • Electropherogram trace files for bidirectional sequencing runs
     • Sequences of all forward and reverse primers
     • Strongly recommended data elements
         • GPS coordinates
         • Name of the identifier
         • Name of the collector
         • Date of collection
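
The element list on this slide can be read as a checklist. Below is a minimal Python sketch of such a check, not GenBank's actual compliance tool: the record layout, the field names (`voucher`, `traces`, `lat_lon`, and so on), and the group labels are all hypothetical, and the 75% contiguous-bases rule is left out because the slide does not say how it is measured.

```python
# Approved gene regions per organism group, as listed on the slide.
APPROVED_MARKERS = {
    "animal": {"COI"},
    "plant": {"matK", "rbcL"},
    "fungus": {"ITS"},
}

# Hypothetical field names standing in for the required and recommended elements.
REQUIRED = ("species", "voucher", "country", "marker", "sequence", "traces", "primers")
RECOMMENDED = ("lat_lon", "identified_by", "collected_by", "collection_date")


def check_compliance(record, group):
    """Return (errors, warnings) for a record given as a plain dict."""
    errors = [
        f"missing required element: {name}"
        for name in REQUIRED
        if not record.get(name)
    ]
    marker = record.get("marker")
    if marker and marker not in APPROVED_MARKERS[group]:
        errors.append(f"marker {marker} is not approved for group {group}")
    warnings = [
        f"missing recommended element: {name}"
        for name in RECOMMENDED
        if not record.get(name)
    ]
    return errors, warnings


# Placeholder values only; none of these identify a real record.
example = {
    "species": "Genus species",
    "voucher": "INST:COLL:0000",
    "country": "Canada",
    "marker": "COI",
    "sequence": "ACGTACGT",
    "traces": ["fwd.ab1", "rev.ab1"],
    "primers": ["FWD_PRIMER", "REV_PRIMER"],
}
errors, warnings = check_compliance(example, "animal")
print(errors)
print(warnings)
```

Required elements produce errors and recommended ones only warnings, mirroring the slide's split between the standard proper and the "strongly recommended" fields.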

  5. Compliant Barcode Record

  6. Barcode records in GenBank

  7. Life of an iBOL Record

  8. Submissions from BOLD

  9. Data Sharing Works

  10. http://www.ncbi.nlm.nih.gov/WebSub/?tool=barcode

  11. QA checks at GenBank
     To ensure that the sequence data is of high quality, the following checks are run:
     • Barcode data element compliance
     • Consistency checks such as:
         • reported latitude-longitude falls within cited country
         • collection date has already occurred
     • Sequence quality checks
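
The two consistency checks named on this slide could be sketched as follows. This is an illustration under stated assumptions, not the check GenBank runs: the record fields are hypothetical, "Exampleland" and its coordinates are invented, and a rectangular bounding box is only a crude stand-in for a real country boundary.

```python
from datetime import date


def consistency_problems(record, boxes, today=None):
    """Return a list of problems found in a record (plain dict).

    boxes maps a country name to (min_lat, max_lat, min_lon, max_lon).
    """
    today = today or date.today()
    problems = []

    # Check 1: the collection date must already have occurred.
    if record["collection_date"] > today:
        problems.append("collection date is in the future")

    # Check 2: the reported latitude-longitude must fall within the cited country.
    box = boxes.get(record["country"])
    if box is None:
        problems.append("no bounding box available for cited country")
    else:
        min_lat, max_lat, min_lon, max_lon = box
        inside = (min_lat <= record["lat"] <= max_lat
                  and min_lon <= record["lon"] <= max_lon)
        if not inside:
            problems.append("latitude-longitude falls outside cited country")
    return problems


# Invented country and coordinates, for illustration only.
boxes = {"Exampleland": (10.0, 20.0, 30.0, 40.0)}
record = {
    "country": "Exampleland",
    "lat": 25.0,
    "lon": 35.0,
    "collection_date": date(2011, 12, 25),
}
print(consistency_problems(record, boxes, today=date(2011, 11, 30)))
```

Passing `today` explicitly keeps the date check reproducible; omitting it uses the current date.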

  12. Compliance tool

  13. Checking Sequence Quality
     • Trim primer sequences
     • Check congruence between forward and reverse reads
     • Align sequences to check for gaps
     • Translate sequences to check for internal stops
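
Two of these steps, primer trimming and the stop-codon check, are simple enough to sketch in plain Python. The sketch assumes exact-match primers at the 5' end (real primers are often degenerate) and a caller-supplied reading frame; the function names are hypothetical and this is not the pipeline used at GenBank. Because the set of stop codons depends on the genetic code, the stop check takes the code as a parameter.

```python
# Stop codons for three NCBI genetic codes (translation tables 1, 2 and 5).
STOP_CODONS = {
    "standard": {"TAA", "TAG", "TGA"},
    "vertebrate_mito": {"TAA", "TAG", "AGA", "AGG"},
    "invertebrate_mito": {"TAA", "TAG"},
}


def trim_primer(seq, primer):
    """Remove an exact-match primer from the 5' end of seq, if present."""
    seq, primer = seq.upper(), primer.upper()
    return seq[len(primer):] if seq.startswith(primer) else seq


def in_frame_stops(seq, frame=0, code="vertebrate_mito"):
    """Return the 0-based start positions of stop codons in the given frame."""
    stops = STOP_CODONS[code]
    seq = seq.upper()
    return [
        i
        for i in range(frame, len(seq) - 2, 3)  # complete codons only
        if seq[i:i + 3] in stops
    ]


# Toy sequence, not real barcode data.
read = trim_primer("ggtcaATGTGAGGATAA", "GGTCA")
print(read)
print(in_frame_stops(read, code="vertebrate_mito"))
print(in_frame_stops(read, code="standard"))
```

The toy read shows why the choice of code matters: TGA is a stop in the standard code but not in the vertebrate mitochondrial code, so the same sequence yields different results. For a barcode fragment taken from inside a coding region, any reported position would be worth a closer look.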

  14. Updates Are Critical
     • Primary data repository: sequence records are owned by the submitter
     • Submitter is responsible for providing additional data and metadata as it becomes available:
         • Publication
         • Sequence
         • Taxonomy
         • Voucher
     • Third-party updates are welcome!

  15. Challenges
     • If Reference Barcodes are to be used for species identification, phylogenetics, ecological forensics, conservation, and macro-analysis of biodiversity patterns, then the minimal requirements should be (a) a high-quality sequence, (b) a link to a specimen, and (c) a taxonomic identification
     • Need to support rapid data release, including preliminary taxonomic classifications, similar to the “Fort Lauderdale Principles” of the genomics community
     • Data are updated asynchronously at BOLD and in GenBank; work on the update channel needs to continue
     • Need to work with communities to devise strict QA tests for plant and fungal Barcodes

  16. Acknowledgements
     • Taxonomy Group
         • Scott Federhen
         • Conrad Schoch
         • Lu Sun
         • Carol Hotton
         • Detlef Leipe
     • GenBank Group
         • Susan Schafer
         • Michael Fetchko
     • Software Support
         • Colleen Bollin
         • Kamen Todorov
         • Vasuki Gobu
