Robert Hanner, PhD Database Working Group Chair, CBOL Global Campaign Coordinator, FISH-BOL Associate Director, Canadian - PowerPoint PPT Presentation

Presentation Transcript

  1. 2nd International Barcode of Life Conference 18 September 2007 The BARCODE Data Standard: Enabling Molecular Diagnostics for Biodiversity Robert Hanner, PhD Database Working Group Chair, CBOL Global Campaign Coordinator, FISH-BOL Associate Director, Canadian Barcode of Life Network Biodiversity Institute of Ontario, University of Guelph, Canada

  2. When Worlds Collide…

  3. The Infrastructure of Taxonomy • Collections and databases of specimens • Codes of Taxonomic Nomenclature • Compilations of taxonomic names • Data repositories (characters, gene sequences, images, trees) • Monographs • Floristic and faunistic surveys/inventories • Revisions • The (undigitized) Taxonomic Literature

  4. International Nucleotide Sequence Database Collaboration http://www.insdc.org/

  5. Roles of INSD: an archival database/repository for nucleotide sequence • Assignment of a unique identifier (an accession number) to a sequence • Standardization of data structure including data items and values • [Diagram: Output of Project A, Output of Project B, Output of Project C; Common access interface; Users]

  6. Open Access and Full Text Resources

  7. Growth of INSD

  8. New tools for taxonomy DNA Barcoding The ability to compare genotype information across a huge range of organisms is a powerful tool

  9. Emerging Applications

  10. Validation demonstrates that a procedure is robust, reliable and reproducible. PCR amplification and DNA sequencing: • Are robust methods that produce successful results a high percentage of the time. • Are reliable methods that produce accurate results. • Are reproducible methods that produce similar results each time a sample is tested.

  11. Manual Assembly Subjective interpretation?

  12. “Only [27%] of papers had a legitimate specimens examined section, with museum numbers for each voucher, and names of the museums where the specimens used in the study could be examined”

  13. Couplets Consisting of: “Species Name - DNA Sequence” • Basis of a “look-up table” enabling molecular diagnostic applications • However, both elements are assertions • Underlying specimens and associated raw sequence data are not typically available for secondary inspection
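
The look-up table idea on this slide can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the names `mismatches` and `identify`, the placeholder species labels, the short made-up sequences, and the use of a simple mismatch count over pre-aligned, equal-length sequences. It is not the matching method of any particular barcode database. It does show why the slide's caveat matters: the answer is only as trustworthy as the asserted names and sequences in the table.

```python
def mismatches(a, b):
    """Count differing positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def identify(query, couplets):
    """Return (species_name, mismatch_count) for the closest reference couplet.

    `couplets` is a list of (species_name, dna_sequence) pairs, i.e. the
    "Species Name - DNA Sequence" couplets acting as a look-up table.
    """
    name, seq = min(couplets, key=lambda c: mismatches(query, c[1]))
    return name, mismatches(query, seq)


# Hypothetical reference couplets (toy data, far shorter than a real barcode).
reference = [
    ("Species A", "ACGTACGT"),
    ("Species B", "ACGTTTTT"),
]

result = identify("ACGTACGA", reference)
print(result)
```

If the name attached to a reference sequence is wrong, `identify` returns that wrong name with no warning, which is the traceability gap the following slides address.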

  14. Problem Areas TRANSPARENCY AND TRACEABILITY • Genetic Data Quality • Specimen Data Quality • Taxonomy • Information Access

  15. First International Barcode of Life Conference: Feb 5-8, 2005

  16. Barcoders began calling for a Paradigm Shift

  17. Rationale for Defining “BARCODE” keyword in GenBank • Provides the community with reference records with verifiable and retrievable data: • Associated with retrievable voucher specimens (liberally defined: tissue, DNA, etc.) • Linked to on-line metadata • Meet an agreed upon standard of taxonomic identification • Provide an assured level of data completeness • On an agreed upon gene region • Recommended for use in identifying unknowns

  18. The Barcode Data Standard • Establishing a new data standard for “BARCODE” keyword records in DDBJ/EMBL/GenBank: • Minimum 500bp, <1% ambiguous base calls • Double stranded sequence • Trace files and associated quality scores • Primers used to generate sequence • Linkages to: • A morphological voucher specimen • Structured reference to collections • Geospatial reference information • Valid species name • Who performed the identification • Literature citations
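
The two sequence-level thresholds on this slide (minimum 500 bp, fewer than 1% ambiguous base calls) can be checked mechanically. The following is a minimal sketch, not the validation code used by any INSDC member or by BOLD. The function name `check_barcode_sequence` is hypothetical, and treating every character other than A, C, G, or T as an ambiguous call is an assumption.

```python
# Assumption: any symbol other than A, C, G, T counts as an ambiguous base call.
UNAMBIGUOUS = set("ACGT")


def check_barcode_sequence(seq, min_length=500, max_ambiguous_fraction=0.01):
    """Return a list of problems; an empty list means both checks passed."""
    bases = "".join(seq.split()).upper()  # drop whitespace and line breaks
    problems = []

    if len(bases) < min_length:
        problems.append(
            f"length {len(bases)} bp is below the {min_length} bp minimum"
        )

    ambiguous = sum(1 for base in bases if base not in UNAMBIGUOUS)
    # The slide requires "<1%", so a fraction equal to the limit also fails.
    if bases and ambiguous / len(bases) >= max_ambiguous_fraction:
        problems.append(
            f"{ambiguous} of {len(bases)} base calls are ambiguous "
            f"(limit: fewer than {max_ambiguous_fraction:.0%})"
        )

    return problems


clean = "ACGT" * 162 + "AC"                  # 650 bp, no ambiguous calls
noisy = "N" * 7 + "ACGT" * 160 + "ACG"       # 650 bp, 7 ambiguous calls
short = "ACGT" * 100                         # 400 bp

for label, seq in [("clean", clean), ("noisy", noisy), ("short", short)]:
    print(label, check_barcode_sequence(seq))
```

The remaining items in the standard (trace files, primers, voucher linkage, identifier of the specimen) are metadata requirements and are not covered by this sequence-only check.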

  19. Features, Qualifiers and Values The Feature table is updated based on discussions at the International Collaborators meeting of INSDC

  20. BARCODE Records (without trace files)

  21. NCBI Barcode Submission Tool in Beta Test Phase

  22. NCBI Barcode Submission Tool in Beta Test Phase Since 2005, better software, more sequences, better links to museum vouchers…

  23. BOLD HOMEPAGE- External Feeds

  24. Detailed View

  25. Detailed View

  26. Process Record

  27. Trace File Browser

  28. NCBI Trace Archive accepts BARCODE as experimental strategy

  29. Triplet structure for specimen identifiers /specimen_voucher=“<institution-code>|<collection-code>|<specimen-id>” • <institution-code> - abbreviation of the archiving institution • <collection-code> - collection within the institution (possibly null) (*) • <specimen-id> - specimen identifier within the collection • The above approach is used in DarwinCore/GBIF and is parallel to the Life Science Identifier (LSID), which is an Object Management Group (OMG) standard. (*) museums, herbaria, culture collections, stock centers, germplasm repositories (seed banks), frozen tissue banks, zoos/aquaria/botanical gardens, DNA banks, personal collections, e-voucher archives
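
A short sketch of how the triplet on this slide might be split into its three parts. The `SpecimenVoucher` type, the `parse_specimen_voucher` function, and the example values are all hypothetical. The `|` delimiter follows the slide as transcribed and is exposed as a parameter in case a deployed format uses a different separator.

```python
from typing import NamedTuple, Optional


class SpecimenVoucher(NamedTuple):
    institution_code: str
    collection_code: Optional[str]  # the slide allows this part to be null
    specimen_id: str


def parse_specimen_voucher(value: str, sep: str = "|") -> SpecimenVoucher:
    """Split '<institution-code>|<collection-code>|<specimen-id>' into fields."""
    parts = [part.strip() for part in value.strip().strip('"').split(sep)]
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields separated by {sep!r}, got {len(parts)}")

    institution, collection, specimen = parts
    if not institution or not specimen:
        raise ValueError("institution code and specimen id must be non-empty")

    # An empty middle field is read as "no collection code".
    return SpecimenVoucher(institution, collection or None, specimen)


# Made-up example values, not real registry entries.
print(parse_specimen_voucher("XYZ|Fish|12345"))
print(parse_specimen_voucher("XYZ||12345"))
```

Keeping the three parts separate is what allows the institution and collection codes to be resolved against a collection registry, as the next slides describe.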

  30. CBOL/GBIF/NCBI Collection Registry

  31. CBOL/GBIF/NCBI Collection Registry

  32. Structured Reference to Vouchers

  33. LinkOut to Collection Catalogs

  34. And the NCBI Trace Archive

  35. Summary • INSDC is an archival genetic database in the public domain • BOLD is a public/private workbench for assembling BARCODE-compliant projects and supports the organization of barcode campaigns • BOLD and GenBank continue to develop routines for synchronization and interoperability • As of this meeting, the BARCODE Data Standard is ready for full implementation!

  36. Acknowledgments: • All Participants of the CBOL Database Work Group • Scott Federhen, NCBI • Donal Hobern, GBIF • Scott Miller, Smithsonian Institution • David Schindel, CBOL • Sujeevan Ratnasingham, Biodiversity Institute of Ontario