NIFSTD - A Comprehensive Ontology for Neuroscience
Fahim T. Imam, Sarah M. Maynard, Stephen D. Larson, Maryann E. Martone, Amarnath Gupta, Jeffery S. Grethe Neuroscience Information Framework, University of California, San Diego
INTRODUCTION
As a core component of the Neuroscience Information Framework (NIF) project (http://neuinfo.org), the NIF Standard (NIFSTD) was envisioned as a set of modular ontologies that provide a comprehensive collection of terminologies for describing neuroscience-relevant data and resources. NIFSTD is a critical constituent of the NIF project, enabling an effective concept-based search mechanism across a diverse collection of neuroscience resources. The overall ontology has been assembled in a form that promotes reuse of standard ontologies in the biomedical domain and allows easy extension and modification over the course of its evolution. We present here the structure, design principles, and current state of NIFSTD.
STRUCTURE AND DESIGN PRINCIPLES
NIFSTD is constructed according to the best practices followed by the Open Biological Ontologies (OBO) community. It was built in a modular fashion, with each module covering a distinct, orthogonal, neuroscience-relevant domain. NIFSTD avoids duplication of effort by conforming to standards that promote reuse. The modules are standardized to the same upper-level ontologies: the Basic Formal Ontology (BFO), the OBO Relations Ontology (OBO-RO), and the Ontology of Phenotypic Qualities (PATO). Through the use of these foundational and generic ontologies, each module is represented in a standardized manner. This approach not only follows the modularization ontology design pattern (http://odps.sourceforge.net/), but also makes the ontology easier to extend with highly nuanced representations that meet the needs of emerging neuroscientific research domains.
Table 2: Some of the primary annotation properties in NIFSTD
Representation of Concept Relations. NIFSTD uses the OBO-RO to specify relationships between entities that are unambiguous, distinct, and constrained. Cross-domain concepts are related through a set of object properties specified in OBO-RO, e.g., located in, contains, inheres in, participates in, etc. These relational properties mostly exist as inverse pairs, e.g., part of and has part. Use of the OBO-RO serves both to separate the representation of different types of relations (e.g., “is a” vs. “part of”) and to limit the proliferation of relation types. The former requirement is critical to enabling maximal algorithmic parseability of relations. The latter matters because each additional relation brings with it a computational burden, so the number of relations should not become overly expansive.
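The idea of relations that exist as inverse pairs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RelationStore` class and the `INVERSE_OF` table are hypothetical names, not part of NIFSTD or OBO-RO tooling, and the sketch treats the pairs as simple instance-level inverses, which is a simplification of how class-level relations behave in OWL.

```python
# Hypothetical table of inverse pairs; the relation names mirror OBO-RO
# relations, but the table itself is an assumption made for this sketch.
INVERSE_OF = {
    "part_of": "has_part",
    "has_part": "part_of",
    "located_in": "location_of",
    "location_of": "located_in",
}


class RelationStore:
    """Stores (subject, relation, object) triples and their inverses."""

    def __init__(self, inverse_of):
        self.inverse_of = dict(inverse_of)
        self.triples = set()

    def add(self, subject, relation, obj):
        # Restricting relations to a fixed table limits their proliferation.
        if relation not in self.inverse_of:
            raise ValueError(f"unknown relation: {relation}")
        self.triples.add((subject, relation, obj))
        # Record the inverse statement so both directions can be queried.
        self.triples.add((obj, self.inverse_of[relation], subject))

    def objects(self, subject, relation):
        return sorted(
            o for s, r, o in self.triples if s == subject and r == relation
        )


store = RelationStore(INVERSE_OF)
store.add("Thalamus", "part_of", "Diencephalon")
print(store.objects("Diencephalon", "has_part"))  # ['Thalamus']
```

Asserting only one direction and deriving the other keeps the asserted content small while still supporting queries from either end of the relation.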
Fig.3: NIFSTD Development/ Curation Workflow
4. Update NIFSTD (testing): This step involves updating the actual NIFSTD OWL files, or creating new OWL files, in the testing environment based on the content updates from the previous steps.
5. Testing in OntoQuest: After each significant update to the OWL files, the NIFSTD OWL implementation is tested in OntoQuest on the staging server for feedback.
6. Testing in BioPortal: After each significant update to the OWL files, the NIFSTD OWL implementation is tested in the BioPortal staging environment for feedback.
7. Keep persistent links to older versions: After positive feedback from Steps 5 and 6, we archive the links to the old OWL files and post the links to the NIF project wiki.
Tasks 8-13 involve updating the NIFSTD production version, updating the NIFSTD project wiki page with release notes describing version-specific major changes and new content in NIFSTD, and updating the OntoQuest and BioPortal production versions.
CURRENT STATE OF NIFSTD
Fig. 2: Example view of some mammalian thalamic brain regions in NIFSTD. a. Core “is a” hierarchy for “Regional part of diencephalon”; b. Partonomy of diencephalon computed using OWL ObjectProperties and restrictions that relate the regional part of thalamus to the thalamus. Only a portion of the classes covering thalamic entities is shown here.
Fig.3: Example of cross-domain relations that can be built among NIFSTD modules (NOTE: Current NIFSTD has yet to add expressed_in relations)
Table 1: Domains covered by NIFSTD, along with the vocabularies imported from external sources and the corresponding NIFSTD OWL module.
Representation Language. The NIFSTD ontology is expressed in the Web Ontology Language (OWL). Using OWL to represent the NIFSTD semantic framework makes it possible to employ current OWL and RDF tools to assemble and edit the ontology, and provides a means of supporting a rich semantic mining capability for NIF. NIFSTD adheres to the OWL Description Logic (OWL-DL) dialect to ensure computational decidability and to support automated reasoning with common DL reasoners such as Pellet and FaCT++. NIFSTD is also available in wiki format at NeuroLex.org.
Re-use of Available Distilled Knowledge Sources. Wherever possible, existing terminologies and ontologies were reused to cover the domains required by the neuroscience community (see Table 1). These community vocabularies were culled from a variety of sources, ranging from fully structured ontologies to loosely structured controlled vocabularies. Table 1 highlights the source ontologies that were either imported directly or adopted into different NIFSTD modules. Refer to Table 2 in  for a complete list of the terminology resources used to construct NIFSTD, with their URLs, abbreviations, and references for more information.
Distinct, Orthogonal Concept Domains. Each of the OWL modules in NIFSTD covers a conceptually orthogonal, distinct domain (see Table 1 and Fig. 1). Orthogonality is one of the primary OBO Foundry principles and is critical to ensuring maximal re-usability of the ontology. Modularity helps minimize dependencies and encourages re-use by enabling users to adopt only the domains they need for annotation. If an ontology contains one or more domains that overlap with an existing module, the files must be mapped extensively to specify semantic equivalencies, which creates an added dependency and curatorial burden.
Single Inheritance. Each asserted class within the NIFSTD modules follows the single inheritance principle, i.e., it has exactly one asserted parent. This keeps the classes univocal and avoids ambiguities. Classes with multiple parents can still be derived via automated classification on defined classes, i.e., classes whose logical conditions are both necessary and sufficient.
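A minimal sketch of how multiple parents can be inferred from a single-inheritance hierarchy is given below. It is not how a DL reasoner such as Pellet works internally; it only mimics the outcome for simple conjunctions of restrictions. All class names, relation names (`has_neurotransmitter`, `located_in`), and data structures are assumptions chosen for illustration and are not taken from the NIFSTD OWL files.

```python
# Asserted hierarchy: every class has at most one asserted parent.
ASSERTED_PARENT = {
    "Neuron": None,
    "Purkinje cell": "Neuron",
    "Cerebellar granule cell": "Neuron",
}

# Necessary conditions asserted on classes, as (relation, filler) pairs.
RESTRICTIONS = {
    "Purkinje cell": {
        ("has_neurotransmitter", "GABA"),
        ("located_in", "Cerebellum"),
    },
    "Cerebellar granule cell": {
        ("has_neurotransmitter", "Glutamate"),
        ("located_in", "Cerebellum"),
    },
}

# Defined classes: a genus plus necessary-and-sufficient conditions.
DEFINED = {
    "GABAergic neuron": ("Neuron", {("has_neurotransmitter", "GABA")}),
    "Cerebellum neuron": ("Neuron", {("located_in", "Cerebellum")}),
}


def ancestors(cls):
    """Walk the single asserted parent chain upward."""
    result = []
    parent = ASSERTED_PARENT.get(cls)
    while parent is not None:
        result.append(parent)
        parent = ASSERTED_PARENT.get(parent)
    return result


def inferred_parents(cls):
    """Return the defined classes whose conditions the class satisfies."""
    lineage = {cls, *ancestors(cls)}
    facts = set()
    for member in lineage:
        facts |= RESTRICTIONS.get(member, set())
    return sorted(
        name
        for name, (genus, conditions) in DEFINED.items()
        if genus in lineage and conditions <= facts
    )


print(inferred_parents("Purkinje cell"))
# ['Cerebellum neuron', 'GABAergic neuron']
```

The asserted hierarchy stays a tree, which is easy to curate, while the extra parents are computed rather than maintained by hand.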
Viewing the NIFSTD Vocabularies. The NIFSTD vocabularies are available as OWL files, which may be viewed using Protégé or similar ontology tools. However, these tools generally require a fair amount of expertise to use. To provide more human-friendly viewing environments, NIFSTD is also available through NCBO BioPortal and in a wiki format (http://neuroLex.org). Within NIF, NIFSTD is served through an ontology management system called OntoQuest. OntoQuest generates an OWL-compliant relational schema and supports operations for navigating, path finding, hierarchy exploration, and term searching in ontological graphs.
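The kinds of graph operations listed above (term searching, hierarchy exploration, path finding) can be illustrated with a small in-memory sketch. This is not the OntoQuest API or its relational schema; the function names, the `EDGES` list, and the sample brain regions are assumptions made purely for illustration.

```python
from collections import deque

# Hypothetical edge list: (subject, relation, object).
EDGES = [
    ("Thalamus", "is_a", "Regional part of diencephalon"),
    ("Hypothalamus", "is_a", "Regional part of diencephalon"),
    ("Thalamus", "part_of", "Diencephalon"),
    ("Diencephalon", "part_of", "Forebrain"),
    ("Lateral geniculate nucleus", "part_of", "Thalamus"),
]


def search_terms(query):
    """Term search: case-insensitive substring match on node labels."""
    nodes = {s for s, _, _ in EDGES} | {o for _, _, o in EDGES}
    return sorted(n for n in nodes if query.lower() in n.lower())


def descendants(node, relation):
    """Hierarchy exploration: everything below node along one relation."""
    found = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for s, r, o in EDGES:
            if r == relation and o == current and s not in found:
                found.add(s)
                stack.append(s)
    return sorted(found)


def find_path(start, goal):
    """Path finding: breadth-first search along directed edges."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for s, r, o in EDGES:
            if s == node and o not in seen:
                seen.add(o)
                queue.append((o, path + [(s, r, o)]))
    return None


print(search_terms("thalamus"))
print(descendants("Diencephalon", "part_of"))
print(find_path("Lateral geniculate nucleus", "Forebrain"))
```

A production system would push these queries down to indexed database tables rather than scanning an edge list, but the traversal logic is the same.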
NIFSTD and NeuroLex Wiki. In constructing NIFSTD, we strive to balance the involvement of the neuroscience community, for domain expertise, with that of the knowledge engineering community, for ontology expertise. The wiki version of NIFSTD, NeuroLex (http://neurolex.org), has been developed as an easy entry point for the broader community to access, annotate, edit, and enhance the core NIFSTD lexicon. Peer-reviewed contributions made in the wiki are later incorporated into the NIFSTD OWL modules on a regular basis. We envision the NeuroLex wiki as the main entry point to NIFSTD content for general users and domain experts to view, annotate, and contribute to the overall lexicon. Please refer to the poster presentation by S.D. Larson et al. on NeuroLex.org for more details on NeuroLex and its wiki environment.
Fig.1: The semantic domains (in ovals) covered in NIFSTD, with some of the subdomains (in rectangles). Each domain is covered by a separate OWL module (see Table 3).
NIFSTD DEVELOPMENT/ CURATION WORKFLOW
NIF is not charged with the development of new modules but relies on the community for new content. There are exceptions in the area of neuronal cell types, where NIF is working with groups of neuroscientists to create a comprehensive list of neurons and their properties. NIF is, however, charged with providing extensions to existing ontologies, creating restrictions within modules to describe things like partonomies, and creating bridge files where appropriate to enhance search, e.g., neuron by brain region or neuron by molecule.
The Workflow. The current NIFSTD development/curation workflow consists of the numbered tasks shown in the rectangular boxes of Fig. 3:
1. Add/Edit NeuroLex Terms/Categories: This step involves the various NIF users and groups who are interested in adding to, updating, enhancing, or annotating the current NIF vocabularies through NeuroLex. The NeuroLex wiki serves as the main entry point and collaborative interface for implementing changes in the NIFSTD ontology.
2. Bulk Upload of Terms: Depending on the number and nature of the terms (e.g., a large new sub-tree of an existing NIFSTD class, or new classes with known parents for a specific NIF module), terms can be uploaded in bulk when creating the corresponding categories/pages in the NeuroLex wiki by hand would be impractical. These requests can be made through a spreadsheet containing the terms with their known parents and annotations.
3. Identify Valid Contribution: The NIF domain experts determine which contributions from the previous steps are valid. Every contribution in NeuroLex goes through this step before it is implemented in the actual NIFSTD ontology. Valid contributions are identified based on criteria such as relevance to neuroscience, source, consistency, and appropriateness of the hierarchy. For newly added categories, this step ensures that the terms are actually new and are not synonyms or duplicates of existing NIFSTD concepts.
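The mechanical part of the bulk upload and validation steps above can be sketched as follows. This is a hypothetical illustration, not NIF's actual tooling: the column names (`term`, `parent`), the `EXISTING` dictionary, and the function names are all assumptions. Judgments such as relevance to neuroscience remain with the domain experts and are not modeled here.

```python
import csv
import io

# Hypothetical snapshot of existing concepts and their synonyms.
EXISTING = {
    "Neuron": {"synonyms": {"Nerve cell"}},
    "Purkinje cell": {"synonyms": {"Purkinje neuron"}},
}


def normalize(label):
    """Lower-case and collapse whitespace so labels compare reliably."""
    return " ".join(label.lower().split())


def known_labels(existing):
    """Map every normalized label or synonym to its preferred term."""
    labels = {}
    for term, info in existing.items():
        labels[normalize(term)] = term
        for synonym in info["synonyms"]:
            labels[normalize(synonym)] = term
    return labels


def review_rows(csv_text, existing):
    """Split spreadsheet rows into accepted and rejected contributions."""
    labels = known_labels(existing)
    accepted, rejected = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        term, parent = row["term"].strip(), row["parent"].strip()
        key = normalize(term)
        if key in labels:
            rejected.append((term, f"duplicate or synonym of {labels[key]}"))
        elif normalize(parent) not in labels:
            rejected.append((term, f"unknown parent {parent}"))
        else:
            accepted.append((term, labels[normalize(parent)]))
            # Later rows may use this new term as their parent (sub-trees).
            labels[key] = term
    return accepted, rejected


SHEET = """term,parent
Basket cell,Neuron
Nerve cell,Neuron
Golgi cell,Interneuron
Cerebellar basket cell,Basket cell
"""

accepted, rejected = review_rows(SHEET, EXISTING)
print(accepted)
print(rejected)
```

Rejected rows carry a reason so that a curator can decide whether to merge the term as a synonym, fix the parent, or discard the request.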
Table 3: Current NIFSTD OWL modules with their persistent URLs (PURLs)
Fig.4: Simplified import hierarchy for current NIFSTD
Currently covering more than 20,000 concepts (including both classes and synonyms), NIFSTD continues to evolve, incorporating new modules and content and implementing more detailed and useful cross-domain relations that follow ontology development best practices. NIFSTD can be considered an example of how OBO Foundry principles can promote the building of comprehensive ontologies in a practical and effective way.
Bug WJ, Ascoli GA, Grethe JS, Gupta A, Fennema-Notestine C, Laird AR, Larson SD, Rubin D, Shepherd GM, Turner JA, Martone ME. The NIFSTD and BIRNLex Vocabularies: Building Comprehensive Ontologies for Neuroscience. Neuroinformatics. 2008 Sep;6(3):175-94. Epub 2008 Oct 31. PMID: 18975148
Gupta A, Bug W, Marenco L, Qian X, Condit C, Rangarajan A, Müller HM, Miller PL, Sanders B, Grethe JS, Astakhov V, Shepherd G, Sternberg PW, Martone ME. Federated Access to Heterogeneous Information Resources in the Neuroscience Information Framework (NIF). Neuroinformatics. 2008 Sep;6(3):205-17. Epub 2008 Oct 29. PMID: 18958629
Supported by a contract from the NIH Neuroscience Blueprint HHSN271200800035C via NIDA.
ICBO 2009: NIFSTD Ontologies Neuroscience Information Framework http://neuinfo.org