
Requirements of a Taxonomy Database: Tcl-DB, a Prototype


turi

Presentation Transcript


  1. Requirements of a Taxonomy Database: Tcl-DB, a Prototype

  2. Outline • Requirements • Hierarchy • Alternative Search Terms: Synonyms and Vernaculars • Alternative Spellings • Alternative Classifications • Tcl-DB Prototype System • Tcl-DB Structure • 2NF • Extensible: Adding a new data source, e.g. NCBI • Tcl-DB: UID Tracking • Tcl-DB: Stats • Utility and Further Work

  3. 1. Hierarchy

  4. 2. Alternative Search Terms: Synonyms and Vernaculars

  5. 3. Alternative Spellings: Caenorabditis elegans, C elegans and Caenorhabditis elegans  

  6. 4. Alternative Classifications:

  7. Tcl-DB Prototype System. Proposed Architecture

  8. Tcl-DB: Logical Structure

  9. Tcl-DB Physical Database Structure

  10. Assertion: Resolving the M:M with an association entity

  11. Node: Hierarchical Queries: Nested Set, Path, and Connect By • >select count(name_id) from node start with name_id = '100891' connect by prior name_id = parent_name_id; • >select count(name_id) from node where path like '/%'; • >select count(name_id) from node where left_id between 1 and 9290;
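The three query styles on this slide can be compared on a toy tree. The following is a minimal sketch, not the Tcl-DB implementation: it uses SQLite from Python instead of Oracle, so a recursive common table expression stands in for `CONNECT BY`. The `right_id` column, the five-node sample tree, and the helper function names are assumptions made for illustration; only `name_id`, `parent_name_id`, `path`, and `left_id` appear on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE node (
        name_id        INTEGER PRIMARY KEY,
        parent_name_id INTEGER,   -- adjacency list (used by CONNECT BY)
        path           TEXT,      -- materialized path
        left_id        INTEGER,   -- nested set, left bound
        right_id       INTEGER    -- nested set, right bound (assumed column)
    )
""")

# Hypothetical toy tree: 1 is the root, 2 and 3 are its children,
# 4 and 5 are children of 2.
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, "/1", 1, 10),
        (2, 1, "/1/2", 2, 7),
        (3, 1, "/1/3", 8, 9),
        (4, 2, "/1/2/4", 3, 4),
        (5, 2, "/1/2/5", 5, 6),
    ],
)


def count_by_recursion(conn, root_id):
    """Walk parent links, as CONNECT BY PRIOR does in Oracle."""
    sql = """
        WITH RECURSIVE subtree(name_id) AS (
            SELECT name_id FROM node WHERE name_id = ?
            UNION ALL
            SELECT n.name_id
            FROM node n JOIN subtree s ON n.parent_name_id = s.name_id
        )
        SELECT count(name_id) FROM subtree
    """
    return conn.execute(sql, (root_id,)).fetchone()[0]


def count_by_path(conn, root_path):
    """Match the root itself plus every path that extends it."""
    sql = "SELECT count(name_id) FROM node WHERE path = ? OR path LIKE ?"
    return conn.execute(sql, (root_path, root_path + "/%")).fetchone()[0]


def count_by_nested_set(conn, left_id, right_id):
    """A subtree is every node whose left bound lies inside the root's bounds."""
    sql = "SELECT count(name_id) FROM node WHERE left_id BETWEEN ? AND ?"
    return conn.execute(sql, (left_id, right_id)).fetchone()[0]


# All three count the subtree rooted at node 2 (nodes 2, 4 and 5).
print(count_by_recursion(conn, 2))
print(count_by_path(conn, "/1/2"))
print(count_by_nested_set(conn, 2, 7))
```

The trade-off is the usual one: parent links are cheap to update but need recursion to read, while paths and nested-set bounds answer subtree queries with a single range or prefix scan at the cost of rewriting stored values when the tree changes.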

  12. synonym_name and vernacular: subtypes, multi-valued attributes or weak entities

  13. Tcl-DB: 2NF

  14. Tcl-DB: Procedures, Packages and Functions: Adding a new data source e.g. NCBI

  15. Step 1: Build Views, what names are already in the database

  16. Step 2: Move names from view to Tcl schema

  17. Step 3: Fill the nodes table in tcl schema

  18. Step 4: Fill synonym_name table in tcl schema • Step 5: Fill vernacular table in tcl schema

  19. Tcl-DB: UID Tracking • After the name data load: • Run two joins on name and nids_mv • NIDs: name_id when the name_text exists • Null: name_id when the name_text does not exist • Update name and give all new names a NID • Update name and give all existing names their original NID • Refresh the NID_view
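One way to read the UID-tracking steps above is sketched below. This is an interpretation under stated assumptions, not the Tcl-DB procedure: it assumes `nids_mv` maps each `name_text` to a stable NID, that reloading `name` may change `name_id` values, and that new NIDs are allocated sequentially. The column layout, the sample data, the function name `assign_nids`, and the use of a plain SQLite table in place of an Oracle materialized view are all hypothetical.

```python
import sqlite3


def assign_nids(conn):
    """Restore original NIDs by name_text, then number the new names."""
    # Names already known: the join on name_text finds their original NID.
    conn.execute("""
        UPDATE name
        SET nid = (SELECT m.nid FROM nids_mv m
                   WHERE m.name_text = name.name_text)
        WHERE nid IS NULL
    """)
    # Names with no match are still NULL: give each a fresh NID.
    next_nid = conn.execute(
        "SELECT COALESCE(MAX(nid), 0) + 1 FROM nids_mv"
    ).fetchone()[0]
    new_ids = conn.execute(
        "SELECT name_id FROM name WHERE nid IS NULL ORDER BY name_id"
    ).fetchall()
    for offset, (name_id,) in enumerate(new_ids):
        conn.execute(
            "UPDATE name SET nid = ? WHERE name_id = ?",
            (next_nid + offset, name_id),
        )
    # Stand-in for refreshing the view: record the newly assigned NIDs.
    conn.execute("""
        INSERT INTO nids_mv (name_text, nid)
        SELECT name_text, nid FROM name
        WHERE name_text NOT IN (SELECT name_text FROM nids_mv)
    """)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nids_mv (name_text TEXT PRIMARY KEY, nid INTEGER)")
conn.execute(
    "CREATE TABLE name (name_id INTEGER PRIMARY KEY, name_text TEXT UNIQUE, nid INTEGER)"
)
# NIDs recorded before the reload (hypothetical data).
conn.executemany(
    "INSERT INTO nids_mv VALUES (?, ?)",
    [("Caenorhabditis elegans", 1), ("Homo sapiens", 2)],
)
# The name table after a reload: new name_id values, NIDs not yet assigned.
conn.executemany(
    "INSERT INTO name VALUES (?, ?, NULL)",
    [(10, "Homo sapiens"), (11, "Caenorhabditis elegans"), (12, "Mus musculus")],
)

assign_nids(conn)
nids = dict(conn.execute("SELECT name_text, nid FROM name"))
print(nids)
```

The sketch restores original NIDs before numbering new names, which reverses the order of the two update bullets; the result is the same as long as new NIDs are drawn from above the current maximum.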

  20. Tcl-DB: Utility and Further Work • Computing interesting stats: • How much overlap is there between ITIS and NCBI? • How many names are unique to NCBI? • How many of these are binomials vs. 'environmental sample 256'? • How many of these names can be matched allowing for 1–3 letter mismatches? • NCBI taxonomy: data quality, integrity and usability? • Transitively closing the Synonyms table and Vernacular table • Building an interface • Spell checkers
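Matching names while allowing 1 to 3 letter mismatches can be sketched with Levenshtein edit distance. This is an illustrative assumption: the slides do not say which matching method was intended, and the function names `edit_distance` and `fuzzy_matches` are hypothetical. The example reuses the misspelling from the Alternative Spellings slide.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def fuzzy_matches(query, names, max_mismatches=3):
    """Return the names within max_mismatches edits of query, ignoring case."""
    q = query.lower()
    return [n for n in names if edit_distance(q, n.lower()) <= max_mismatches]


known_names = ["Caenorhabditis elegans", "Homo sapiens"]
print(edit_distance("Caenorabditis elegans", "Caenorhabditis elegans"))  # 1
print(fuzzy_matches("Caenorabditis elegans", known_names))
```

Comparing every query against every stored name is quadratic in practice, so a real name table would likely need an index or a pre-filter (for example on length or leading letters) before applying the distance check.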

  21. Lots of Questions? • How do we use this to build taxonomically aware databases? • How about updates to the data? • Database links, Web services, simple DB cross references? • Use GenBank model? • Open to suggestions/ideas! • Do we need to think about: PhyloCode? Type specimens?
