
David E. Schindel, Executive Secretary National Museum of Natural History Smithsonian Institution

The BARCODE Data Standard: CBOL’s Partnership with the International Nucleotide Sequence Database Collaboration (INSDC). David E. Schindel, Executive Secretary, National Museum of Natural History, Smithsonian Institution. SchindelD@si.edu; http://www.barcoding.si.edu


Presentation Transcript


  1. The BARCODE Data Standard: CBOL’s Partnership with the International Nucleotide Sequence Database Collaboration (INSDC)
     David E. Schindel, Executive Secretary, National Museum of Natural History, Smithsonian Institution
     SchindelD@si.edu; http://www.barcoding.si.edu; tel. 202/633-0812; fax 202/633-2938

  2. Infrastructure of Taxonomy: Fragmented, Disconnected
     • Collections and databases of specimens
     • Seedbanks, culture/cell line collections
     • Compilations of taxonomic names
     • Floristic and faunistic surveys/inventories
     • Monographs, taxonomic revisions
     • Data repositories (gene sequences, characters, images, trees)
     • The (undigitized) taxonomic literature

  3. Linking Logical Categories (1): Specimens, Names, Opinions

  4. Linking Logical Categories (2): Naming and defining species
     • Holotype specimens

  5. Linking Logical Categories (3): Establishing species boundaries
     • Species concept beyond the holotype:
       - Paratype series
       - Typological versus population thinking
       - Genetic lineages
       - Biological species concept (BSC; hard to apply)

  6. Linking Logical Categories (4): Interpreting species boundaries
     • Other assigned specimens:
       - Species philosophy of the original author
       - Interpretation of the user

  7. Databases of Names, Specimens, Species Distributions
     • Museum databases of associated data
     • Databases of species occurrences and distribution (OBIS)
     • Authority files of taxonomic names

  8. DNA Barcodes: A Key Variable for Biodiversity Informatics
     • Museum databases of associated data
     • Databases of species occurrences and distribution (OBIS)
     • Authority files of taxonomic names

  9. CBOL’s Working Groups
     • Database: designing/constructing the Barcode Section of GenBank
     • DNA: protocols for formalin-fixed and old museum specimens; producing LIMS (laboratory information management systems) for dissemination
     • Data Analysis: beyond phenetic methods; a population genetics perspective
     • (Plants: initiated discussions of plant barcode gene region(s))

  10. BARCODE Data Standards
     • Consultations from 2004 with GenBank, ITIS, GBIF, ISIS and museum database developers
     • Consensus results of the Front Royal meeting: GBIF, ITIS, GRIN, NBII, Species2000, IPNI, ICZN, ZooRecord, OBIS
     • GenBank proposed to the International Nucleotide Sequence Database Collaboration (EMBL, DDBJ)
     • Approved by CBOL and INSDC in mid-2005

  11. Reserved Keyword “BARCODE”
     • GenBank reviews records against the standard
     • Adds keyword “BARCODE” in the annotation field
     • Keyword can be removed by CBOL

  12. Requirements
     • Species name selected from an authority
     • Sequence from COI or another barcode region approved by CBOL
     • Structured link to voucher specimen
     • Online access to metadata
     • Trace files and quality scores
     • Primer sequences and names
     • Minimum sequence length (500 bp for COI)
     • Geographic locality
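The slide 12 checklist can be read as a validation step run before the keyword is added. The Python sketch below assumes a hypothetical in-memory record (its attribute names are illustrative, not GenBank's actual fields), treats COI as the only approved region with the 500 bp minimum from the slide, and returns the unmet requirements. It is not GenBank's review procedure.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


# Hypothetical record layout for illustration; these attribute names are
# not INSDC qualifiers or GenBank field names.
@dataclass
class BarcodeRecord:
    species_name: Optional[str]
    gene_region: str
    sequence: str
    voucher_id: Optional[str] = None      # structured link to voucher specimen
    metadata_url: Optional[str] = None    # online access to metadata
    has_traces: bool = False              # trace files and quality scores
    primers: List[Tuple[str, str]] = field(default_factory=list)  # (name, sequence)
    locality: Optional[str] = None        # geographic locality


# Assumption: only COI is listed; CBOL may approve other regions.
MIN_LENGTH_BP = {"COI": 500}


def missing_requirements(rec: BarcodeRecord, authority: Set[str]) -> List[str]:
    """Return the slide 12 requirements that the record does not meet."""
    problems = []
    if rec.species_name not in authority:
        problems.append("species name not found in authority list")
    if rec.gene_region not in MIN_LENGTH_BP:
        problems.append("gene region not approved as a barcode region")
    elif len(rec.sequence) < MIN_LENGTH_BP[rec.gene_region]:
        # Counts every character, including ambiguous bases.
        problems.append(f"sequence shorter than {MIN_LENGTH_BP[rec.gene_region]} bp")
    if not rec.voucher_id:
        problems.append("no structured link to voucher specimen")
    if not rec.metadata_url:
        problems.append("no online access to metadata")
    if not rec.has_traces:
        problems.append("no trace files or quality scores")
    if not rec.primers or any(not name or not seq for name, seq in rec.primers):
        problems.append("primer names and sequences incomplete")
    if not rec.locality:
        problems.append("no geographic locality")
    return problems


def qualifies_for_barcode_keyword(rec: BarcodeRecord, authority: Set[str]) -> bool:
    """True only when no requirement is missing."""
    return not missing_requirements(rec, authority)
```

Returning the full list of problems, rather than a single yes/no, matches the idea of reporting back to submitters what a record still lacks.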

  13. Recommended fields, added to INSDC at CBOL’s request
     • Latitude and longitude
     • Name of the identifier
     • Name of the collector
     • Date of collection

  14. New Data Fields
     • Latitude/Longitude
     • Collection date
     • Collector’s name
     • Identifier’s name
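In the INSDC feature table these fields appear as source-feature qualifiers such as /lat_lon, /collection_date, /collected_by and /identified_by. As a minimal sketch, assuming the space-separated decimal-degree form of /lat_lon (for example "38.89 N 77.03 W"), the following converts between that string and signed decimal degrees. The function names are hypothetical.

```python
import re
from typing import Tuple

# Matches values such as "38.89 N 77.03 W": latitude, N/S, longitude, E/W.
_LAT_LON = re.compile(r"(\d{1,2}(?:\.\d+)?) ([NS]) (\d{1,3}(?:\.\d+)?) ([EW])")


def parse_lat_lon(value: str) -> Tuple[float, float]:
    """Convert a lat_lon string to signed decimal degrees (south and west negative)."""
    match = _LAT_LON.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unrecognized lat_lon value: {value!r}")
    lat, ns, lon, ew = match.groups()
    lat_deg, lon_deg = float(lat), float(lon)
    if lat_deg > 90 or lon_deg > 180:
        raise ValueError(f"coordinate out of range: {value!r}")
    if ns == "S":
        lat_deg = -lat_deg
    if ew == "W":
        lon_deg = -lon_deg
    return lat_deg, lon_deg


def format_lat_lon(lat: float, lon: float, places: int = 2) -> str:
    """Inverse of parse_lat_lon, rounding to a fixed number of decimal places."""
    ns = "S" if lat < 0 else "N"
    ew = "W" if lon < 0 else "E"
    return f"{abs(lat):.{places}f} {ns} {abs(lon):.{places}f} {ew}"
```

Storing signed decimal degrees internally makes the coordinates directly usable by occurrence databases and mapping tools, while the string form stays readable in the flat file.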

  15. BARCODE Keyword in GenBank

  16. BARCODE Records in INSDC
     [Diagram linking a BARCODE record to: Specimen Metadata; Voucher Specimen; Species Name; georeference, habitat, character sets, images, behavior, other genes; Indices (Catalogue of Life, GBIF/ECAT); Nomenclators (Zoo Record, IPNI, NameBank); Publication links (new species); Barcode Sequence; trace files; primers; Other Databases; Literature (link to content or citation); phylogenetic, population genetics and ecological databases; provisional sp.]

  17. Structured link to Vouchers
     [Same diagram as slide 16]

  18. What constitutes a voucher?
     • Long-term reference tied to the BARCODE record
     • Corroborates the species identification
     • Provides additional tissue
     • CBOL relies on community decisions:
       - Full specimen?
       - Parts for morphologic features (e.g., feather?)
       - Frozen tissue?
       - E-vouchers for large specimens, destructive samples, catch-and-release?

  19. Where’s the voucher?

  20. Structured Voucher IDs Linking to Vouchers

  21. Voucher Specimen ID
     • Based on Darwin Core
     • Eventually to be replaced by GUIDs (globally unique identifiers)
     • Triplet: Institution Acronym : Collection : Specimen # (e.g., NMNH : FISH : 123456)
     • CBOL, GBIF and NCBI discussing a global registry of:
       - Institutional acronyms
       - Collection codes
       - “Pre-accession” specimen IDs
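A triplet like NMNH : FISH : 123456 is straightforward to split and normalize. The sketch below assumes exactly three colon-separated parts, as on the slide, and models the proposed registry of institutional acronyms as an optional set. `parse_voucher`, `format_voucher` and the registry argument are hypothetical names, not part of Darwin Core or the INSDC specification.

```python
from typing import NamedTuple, Optional, Set


class VoucherId(NamedTuple):
    institution: str   # institution acronym, e.g. "NMNH"
    collection: str    # collection code, e.g. "FISH"
    specimen: str      # specimen (catalogue) number, e.g. "123456"


def parse_voucher(value: str, known_institutions: Optional[Set[str]] = None) -> VoucherId:
    """Split 'Institution : Collection : Specimen #' into its three parts."""
    parts = [p.strip() for p in value.split(":")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected institution:collection:specimen, got {value!r}")
    voucher = VoucherId(*parts)
    # Optional check against a registry of acronyms (hypothetical registry).
    if known_institutions is not None and voucher.institution not in known_institutions:
        raise ValueError(f"unregistered institution acronym: {voucher.institution!r}")
    return voucher


def format_voucher(voucher: VoucherId) -> str:
    """Render the triplet in compact form, without spaces around the colons."""
    return ":".join(voucher)
```

Normalizing to one compact form means the same specimen cited with or without spaces resolves to a single identifier, which is what a structured link needs.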

  22. Link to Species Names
     [Same diagram as slide 16]

  23. Species names in INSDC

  24. NCBI Taxonomy Browser: The good, the bad, and the ugly
     • Species names provided by submitters
     • Checked against compilations
     • Linkout to Catalogue of Life, other sources
     • Names not found added to Taxonomy Browser
     • Submitters informed of errors but not forced to make corrections
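The name-checking policy on slide 24 (check against compilations, add names that are not found, warn the submitter but do not force a correction) can be sketched as follows. The compilations are modeled as hypothetical in-memory sets keyed by source name; this illustrates the policy only, not NCBI's implementation or any NCBI API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class NameCheck:
    name: str
    found_in: List[str]
    warnings: List[str] = field(default_factory=list)


def check_submitted_name(
    submitted: str,
    compilations: Dict[str, Set[str]],
    taxonomy: Set[str],
) -> NameCheck:
    """Check a submitter's name; unknown names are accepted with a warning."""
    name = " ".join(submitted.split())  # collapse stray whitespace
    found_in = [source for source, names in compilations.items() if name in names]
    result = NameCheck(name, found_in)
    if not found_in:
        # The submitter is informed, but the name is not rejected.
        result.warnings.append(f"{name!r} not found in any compilation; submitter notified")
    # Every submitted name, found or not, ends up in the local taxonomy.
    taxonomy.add(name)
    return result
```

The design makes the trade-off on the slide explicit: nothing is blocked, so unverified names accumulate in the taxonomy, which is why "some names have no other source".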

  25. NCBI Taxonomy Browser

  26. NCBI Taxonomy Browser: Some names have no other source

  27. Other names linked to GBIF and Catalogue of Life…

  28. …and primary data source

  29. Authoritative Species Lists
     • Catalogue of Life
     • Species lists compiled by barcoding projects:
       - FISH-BOL from FishBase, CoF
       - MBI mosquito catalog
     • Nomenclators:
       - NameBank
     • New names in publications
     • Eventually, central registries (e.g., ZooBank)

  30. Provisional Species ID
     • Uncertain identifications
     • Species complexes
     • Newly discovered variants
     • Ecogenomic samples
     • Need general guidelines to ensure IDs are:
       - Globally unique
       - Stable, retrievable
       - Impossible to confuse with a valid species name
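One way to meet the slide 30 guidelines is a deterministic, namespaced label. The sketch below is a hypothetical scheme, not a CBOL convention: it derives a token with Python's `uuid.uuid5` from a project code and a reference voucher, so the same inputs always yield the same label (stable), different vouchers yield different labels (unique, barring a hash collision), and the "sp." form with an upper-case token cannot match a Linnaean binomial. The namespace URL and function names are placeholders.

```python
import re
import uuid

# Hypothetical namespace for illustration only; a real scheme would need an
# agreed, registered namespace.
PROVISIONAL_NS = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/provisional-species")

# A Linnaean binomial: capitalized genus plus a lowercase (optionally hyphenated) epithet.
_BINOMIAL = re.compile(r"[A-Z][a-z]+ [a-z]+(?:-[a-z]+)*")


def provisional_name(genus: str, project: str, reference_voucher: str) -> str:
    """Build a deterministic provisional label such as 'Gadus sp. PROJ-<hex>'."""
    if not re.fullmatch(r"[A-Z][a-z]+", genus):
        raise ValueError(f"genus must be a capitalized word: {genus!r}")
    if not re.fullmatch(r"[A-Z0-9]+", project):
        raise ValueError(f"project code must be upper-case letters/digits: {project!r}")
    # uuid5 is a name-based (SHA-1) UUID: identical inputs give an identical token.
    token = uuid.uuid5(PROVISIONAL_NS, f"{project}|{reference_voucher}").hex.upper()
    return f"{genus} sp. {project}-{token}"


def could_be_mistaken_for_binomial(label: str) -> bool:
    """True if the label has the shape of a valid two-part species name."""
    return _BINOMIAL.fullmatch(label) is not None
```

Retrievability is not addressed here; it would need a registry that resolves each label, similar to the one discussed for voucher IDs on slide 21.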

  31. BARCODE Records in INSDC
     [Same diagram as slide 16]

  32. Connecting taxonomic articles: Improving links to taxonomic journals

  33. Links to Taxonomic Literature
     • Library-Laboratory meeting in London, 2005, on electronic access to taxonomic literature
     • Led to formation of the Biodiversity Heritage Library initiative
     • Proactive steps with PubMed to add taxonomic journals to online abstracts
     • Aggressive negotiation with publishers of barcoding papers
     • Involvement in Encyclopedia of Life

  34. Long-term data curation of BARCODE records
     [Flowchart boxes: Data records assembled; Community feedback; Compliant with BARCODE standards?; Update records (audit trail of species names retained); Data records released on INSDC; IDs consistent with other records?; GenBank adds BARCODE flag; CBOL control of BARCODE flag; Data records published in BOLD]
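The curation rules on slides 11 and 34 (an audit trail of species names, GenBank adding the BARCODE flag, CBOL controlling it) can be sketched as a small record type. Actor names are plain strings, and the permission checks are an illustrative assumption about who may do what, not a description of GenBank's or BOLD's systems.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass
class CuratedRecord:
    accession: str
    species_name: str
    barcode_flag: bool = False
    # Audit trail of species-name changes: (UTC timestamp, old name, new name).
    name_history: List[Tuple[str, str, str]] = field(default_factory=list)

    def update_name(self, new_name: str) -> None:
        """Change the species name while keeping the previous one in the trail."""
        if new_name != self.species_name:
            stamp = datetime.now(timezone.utc).isoformat()
            self.name_history.append((stamp, self.species_name, new_name))
            self.species_name = new_name


def add_barcode_flag(record: CuratedRecord, actor: str, compliant: bool) -> None:
    # GenBank adds the flag after reviewing the record against the standard.
    if actor != "GenBank":
        raise PermissionError("only GenBank adds the BARCODE flag in this sketch")
    if not compliant:
        raise ValueError("record does not meet the BARCODE standard")
    record.barcode_flag = True


def remove_barcode_flag(record: CuratedRecord, actor: str) -> None:
    # CBOL controls the flag and can remove it.
    if actor != "CBOL":
        raise PermissionError("only CBOL removes the BARCODE flag in this sketch")
    record.barcode_flag = False
```

Because renames only append to `name_history`, a record reidentified after community feedback still shows every name it was published under.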

  35. Acknowledgements
     • Robert Hanner, University of Guelph, Chair of CBOL’s Database Working Group
     • Scott Federhen, NCBI Taxonomy Browser
     • Donald Hobern, Head of Informatics, GBIF
