
WP 12 CoL Taxon Placement Service (piping tools) Viktor Didziulis Kwok Yin Cheung

Species 2000 Office, University of Reading. i4Life Cambridge Meeting, 4th March 2011.


Presentation Transcript


  1. WP 12 CoL Taxon Placement Service (piping tools). Viktor Didziulis, Kwok Yin Cheung. Species 2000 Office, University of Reading. i4Life Cambridge Meeting, 4th March 2011

  2. Aim: To improve the quality of the taxonomic backbone of the i4Life partner databases, as well as of external biodiversity projects, by implementing the sharing and review of species names and their taxonomy among the Global Biodiversity Programmes (GBPs), Global Species Databases (GSDs) and the Catalogue of Life (CoL). i4Life Cambridge Meeting, 4th March 2011, The University of Reading, UK

  3. Objectives
     • Implement a placement workflow for taxa and species names by providing tools for:
       • Accepting submitted lists of species names that are NOT in the CoL database
       • Assigning them to the relevant GSDs for inclusion in their taxonomic databases
       • Providing a simple web-based user interface for uploading and downloading name lists, reporting placement status, commenting on species and taxa names, and displaying workflow statistics and progress

  4. Process of CoL piping
     [Diagram. Labels: GBPs; WP11 CoL Cross-mapping tools; WP4 CoL Download Service; UI & Reports on name inclusion; Supply side pipeline; Distribution side pipeline; CoL; CoL Assembly / QAW process; Reception buffer database; Processing; Distribution to GSDs]

  5. Users
     • Global Biodiversity Programme (GBP data manager): provides access to the list of taxa names not found in the CoL
     • Taxonomist: deals with the initial assignment of “hopeless” taxa names to GSD sectors
     • Global Species Database (GSD custodian): decides which taxa to accept into its own GSD database for subsequent incorporation into the CoL
     • Administrator: operates the whole system and supervises the workflow

  6. Main Components of CoL Pipeline Buffer Schema

  7. CoL Reception Buffer Database: taxa (master) table
     • Stores taxa names imported from GBP uploads (via the temporary taxa_import table)
     • Stores brief communications between GBPs and GSDs
     • Links taxa names to the right GSDs
     • Used to add missing taxa
     • Holds the assigned entries to be downloaded by the GSDs

  8. CoL Reception Buffer Database: gsds_taxa_assignment table
     • The information in this table is extracted from the CoL
     • It serves as a lookup table between GSDs and their taxonomic expertise, i.e. the top points of taxa attachment
     • It plays a vital role in the GSD assignment process
     • Some (multisector) GSDs may have more than one entry, as they cover more than one taxon group

  9. CoL Reception Buffer Database: history table
     • Logs the actions of GBPs, GSDs, the Administrator, the Taxonomist and cron
     • action_type: GBP/GSD login/logout, upload/download, edit/assign, etc.
     • action_log: records all errors occurring during the action
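The history table described above could be sketched as follows. This is only an illustration under stated assumptions: the slides name the `action_type` and `action_log` fields, while the `actor` and `created_at` columns, the `log_action` helper and the use of SQLite (in Python, purely for demonstration) are hypothetical and not taken from the presentation.

```python
import sqlite3

# In-memory SQLite stands in for the real reception buffer database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE history (
        id          INTEGER PRIMARY KEY,
        actor       TEXT NOT NULL,               -- hypothetical: GBP, GSD, Administrator, Taxonomist, cron
        action_type TEXT NOT NULL,               -- named on the slide: login, upload, assign, ...
        action_log  TEXT,                        -- named on the slide: errors raised during the action
        created_at  TEXT DEFAULT CURRENT_TIMESTAMP  -- hypothetical
    )
    """
)


def log_action(conn, actor, action_type, errors=()):
    """Append one history row; errors are joined into action_log (NULL if none)."""
    conn.execute(
        "INSERT INTO history (actor, action_type, action_log) VALUES (?, ?, ?)",
        (actor, action_type, "\n".join(errors) or None),
    )
    conn.commit()


log_action(conn, "GBP", "upload", ["line 3: missing genus"])
log_action(conn, "cron", "assign")
rows = conn.execute(
    "SELECT actor, action_type, action_log FROM history ORDER BY id"
).fetchall()
```

Keeping errors in the same row as the action they belong to means a single query can show both what was attempted and what went wrong.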

  10. CoL piping process
     [Flowchart. Labels: GBPs names input; Remove duplicate entries; Add / edit missing higher taxa; Taxonomist; taxa table; name in taxa table? (yes / no); report back to GBP; gsds_taxa_assignment table; Names to GSD mapping; unassigned names; find GSDs? (no / yes); GSDs]

  11. Supply side pipeline
     • Either the list of names (NOT IN the CoL) is downloaded from the GBP, or the GBP uploads the list
     • The list should contain the following fields: genus; species epithet; infraspecies; infraspecies marker; author; family; order; class; phylum; kingdom; provider hints; provided id
     • The mandatory minimum is genus and species epithet. All other fields are optional and will be used for GSD assignment purposes only
     • Proposed format for data exchange: delimited text file (CSV or tab-delimited)
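A minimal sketch of how an uploaded list might be checked against the rules above, written in Python for illustration only. The header spellings (`species_epithet`, `provided_id`, and so on), the `read_name_list` function and the key used to detect duplicate entries are assumptions; the slides fix neither the exact column names nor what counts as a duplicate.

```python
import csv
import io

# Assumed header names, derived from the field list on the slide.
FIELDS = [
    "genus", "species_epithet", "infraspecies", "infraspecies_marker",
    "author", "family", "order", "class", "phylum", "kingdom",
    "provider_hints", "provided_id",
]
MANDATORY = ("genus", "species_epithet")


def read_name_list(text, delimiter="\t"):
    """Split an uploaded list into accepted rows and (line, reason) rejections."""
    accepted, rejected = [], []
    seen = set()
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for raw in reader:
        # Absent optional columns become empty strings.
        row = {f: (raw.get(f) or "").strip() for f in FIELDS}
        missing = [f for f in MANDATORY if not row[f]]
        if missing:
            rejected.append((reader.line_num, "missing " + ", ".join(missing)))
            continue
        # Assumed duplicate key: same name parts and author, ignoring case.
        key = tuple(row[f].lower() for f in
                    ("genus", "species_epithet", "infraspecies", "author"))
        if key in seen:
            rejected.append((reader.line_num, "duplicate entry"))
            continue
        seen.add(key)
        accepted.append(row)
    return accepted, rejected


sample = (
    "genus\tspecies_epithet\tfamily\n"
    "Panthera\tleo\tFelidae\n"
    "\tleo\t\n"
    "Panthera\tleo\tFelidae\n"
)
accepted, rejected = read_name_list(sample)
```

Passing `delimiter=","` handles the CSV variant of the same file.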

  12. CoL piping (assignment of names to GSDs)
     • Use the gsds_taxa_assignment table to match each new name to a GSD by phylum, class, order or family
     • If a new name cannot be matched to any GSD this way (e.g. no taxonomy was supplied), use the genus names in the CoL to match it to a GSD
     • If matching by genus fails, assign the name to the Taxonomist for manual processing
     • If a GSD for the taxa does not exist (yet), assign the taxa to the Taxonomist for creation of a proto-GSD
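The fallback order above can be sketched as a small function. This is an illustration, not the project's implementation: the two dictionaries are hypothetical in-memory stand-ins for the assignment lookup table and for a genus index derived from the CoL, the GSD identifiers are invented, and trying the most specific rank first is an assumption (the slides list the ranks but not the order in which they are tried).

```python
TAXONOMIST = "TAXONOMIST"
# Assumption: try the most specific supplied rank first.
RANKS = ("family", "order", "class", "phylum")


def assign_gsd(name, assignment, genus_index):
    """Return (assignee, reason) for one name record given as a dict."""
    # Step 1: match on the higher taxonomy supplied with the name.
    for rank in RANKS:
        taxon = (name.get(rank) or "").strip().lower()
        gsd = assignment.get((rank, taxon))
        if gsd:
            return gsd, "matched by " + rank
    # Step 2: fall back to genus names already known to the CoL.
    genus = (name.get("genus") or "").strip().lower()
    gsd = genus_index.get(genus)
    if gsd:
        return gsd, "matched by genus"
    # Step 3: nothing matched, so hand over for manual processing.
    return TAXONOMIST, "no GSD found"


# Invented lookup data for demonstration only.
assignment = {("family", "felidae"): "GSD-A", ("class", "aves"): "GSD-B"}
genus_index = {"quercus": "GSD-C"}

by_family = assign_gsd({"genus": "Panthera", "family": "Felidae"},
                       assignment, genus_index)
by_genus = assign_gsd({"genus": "Quercus"}, assignment, genus_index)
unmatched = assign_gsd({"genus": "Aus"}, assignment, genus_index)
```

Returning the reason alongside the assignee makes it easy to report how each name was routed.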

  13. Distribution side pipeline
     • Present the list of names to the GSDs for download
     • The list should contain the following fields: genus; species epithet; infraspecies; infraspecies marker; author; family; order; class; phylum; kingdom; provider hints
     • The mandatory minimum is genus and species epithet. All other fields are optional, as the GSD will decide where in the taxa tree the name needs to be placed
     • Two more fields will be filled in by the GSD: gsd_comments (brief text explaining the reason for rejecting a name) and gsd_status (PLACED or REJECTED)
     • Proposed format for data exchange: delimited text file (CSV or tab-delimited) that can be handled with spreadsheet software or a text editor
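One way to read back a list after a GSD has filled in the two extra fields is sketched below. The field names `gsd_status` and `gsd_comments` and the two status values come from the slide; the `read_gsd_decisions` function, the other header spellings and the case-insensitive handling of the status are assumptions made for this illustration.

```python
import csv
import io

VALID_STATUS = {"PLACED", "REJECTED"}


def read_gsd_decisions(text, delimiter="\t"):
    """Return (decisions, problems) parsed from a list returned by a GSD."""
    decisions, problems = [], []
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for raw in reader:
        # Assumption: status is matched case-insensitively.
        status = (raw.get("gsd_status") or "").strip().upper()
        if status not in VALID_STATUS:
            problems.append(
                (reader.line_num, "unknown gsd_status: %r" % status)
            )
            continue
        decisions.append({
            "genus": (raw.get("genus") or "").strip(),
            "species_epithet": (raw.get("species_epithet") or "").strip(),
            "gsd_status": status,
            "gsd_comments": (raw.get("gsd_comments") or "").strip(),
        })
    return decisions, problems


# Placeholder names, not real taxa.
sample = (
    "genus\tspecies_epithet\tgsd_status\tgsd_comments\n"
    "Aus\tbus\tPLACED\t\n"
    "Cus\tdus\trejected\tname not accepted\n"
    "Eus\tfus\t\t\n"
)
decisions, problems = read_gsd_decisions(sample)
```

Rows with a missing or unrecognised status are reported separately so that they can be sent back to the GSD instead of being silently dropped.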

  14. Preliminary experiments: Prototyping and proof-of-concept experiments were carried out using the Webmin / Usermin server management utilities. Further development, however, will be done in PHP.

  15. Timetable
