Naming conventions for ontology engineering


  1. Naming conventions for ontology engineering Daniel Schober, PhD The European Bioinformatics Institute (EBI) NET Project – Postdoctoral Ontologist www.ebi.ac.uk/net-project Daniel Schober, EMBL-EBI

  2. Collaborative Efforts – Scenario • Metabolomics Standards Initiative (MSI) • Describe metabolomics laboratory workflows • Minimal requirements, augmenting exchange formats • Ontology working group under OBI… • Ontology for Biomedical Investigations (OBI) • Larger collaborative, multi-domain effort • Brings together various ‘omics’ and biomedical communities • Describe general laboratory workflow • Experimental design, protocols, data analysis etc. • Developed under OBO Foundry… • Open Biomedical Ontologies (OBO) Foundry • Provides best practices for ontology engineering • Creates a complete suite of orthogonal and interoperable ontologies • Over 60 ontologies and ~10 core foundry

  3. Collaborative Efforts – Challenges • Create networked orthogonal ontologies • Integrating MSI ontology with OBI • Integrating OBI with BFO and other OBO-Foundry ontologies, e.g. • PATO (qualities), ChEBI (chemicals), … • Integrate modular developments • Parallel branch development • OWL-import, referencing • Improve the communication among developers • Database developers and biologists • Semantic web and text miners -> We need common naming conventions - To harmonize the appearance and design of modules

  4. Common Naming Conventions – Why? • Representational artefacts built according to different - Engineering methodologies • MethOntology, Tove, Enterprise, … • Engineering tools • Protégé, OBO-Edit, OntoEdit, … • Representation languages and semantics • OBO, OWL and CLIPS-Frames, … - Engineering ‘schools’ and philosophies • GO, semantic web, AI (Protégé Frames), … • Manchester, Saarbruecken, Stanford, Trento, Karlsruhe, … • Realists, Conceptualists, … • The naming conventions applied are as diverse as these backgrounds! • Diverse ad hoc ways to name what is represented

  5. ID convention: uppercase prefix, underscore, number vs. lowercase prefix, colon, string, or no name, just ID string • Separator: space vs. underscore vs. nil • Case: UpperCamelCase vs. underscore • Namespace prefix • Acronyms • Synonyms • Omissions • Compound name • Administrative helper classes • Singular vs. plural, xref • Instance convention

  6. Existing Naming Conventions – Status • Semantic Web Best Practices and Deployment Working Group • Format specific: OWL • Limited visibility: information dispersed and embedded into many documents • BioPAX manual • Limited visibility: naming conventions only implicitly dealt with in general documentation • Implementation specific: naming conventions discussed at implementation level (Protégé/OWL) • Limited coverage: IDs addressed marginally (page 53, Technical Notes RDF:ID), no conventions on relations • GO developers style guide • Format specific: mainly OBO; has its own definition for namespace which differs from the one in OWL/semantic web • Limited visibility: naming conventions dispersed throughout websites, e.g. GO namespace, term names and identifiers are explained in different documents

  7. Existing Naming Conventions – Status • ISO standards • Information overload: about 40 documents that contain closely related guidelines • Limited access: commercial • ANSI/NISO Z39.19-2005 • Semantics specific: controlled vocabulary, e.g. about terms, not classes • Limited coverage: no term ID handling or versioning addressed • “Law and order: assessing and enforcing compliance with ontological modeling principles in the Foundational Model of Anatomy (FMA)”, S Zhang, O Bodenreider, Computers in Biology and Medicine 36 (2006) • Scientific domain dependent: anatomy • Hardly visible: paper access • Acceptance and visibility are ‘limited’ to specific target community • We need universally applicable conventions

  8. Our Goals • Overcome diversity and fragmentation • Collect existing naming conventions • Make them accessible via repository • Review and compare • Create a single common document • Distil universally valid aspects for OWL and OBO • Ensure visibility for target domains • Move towards a common resource for the OBO Foundry groups • Provide best practice guidelines • Provide robust names for ontology classes • Not a ‘knowledge representation language’ for names, as e.g. HUGO provides for gene symbols (awgTg(GBtslenv)832Pkw) • Engage in discussion with other groups • A two-phase approach …

  9. Towards Common Naming Conventions • Phase 1: Straw man document • “Working towards naming conventions for use in controlled vocabulary and ontology engineering” • See Bio-Ontologies SIG Proceedings, p. 29-32 • Created for MSI Ontology WG, targeting the larger OBI group • Implementation and format independent • Phase 2: Survey OBO Foundry groups • Questionnaire (work in progress) • Ontology and engineering process • Current practice in naming entities • Envisioned benefits of common conventions • In-depth questions on particular conventions • Results to be posted on the OBO Foundry wiki

  10. Naming Convention Straw Man - Examples • Explicit and concise names • Avoid omissions and ellipses • Plant Ontology (PO) used 'cell' for 'plant cell' • Avoid negative names like ‘non-separation device’ • Avoid ambiguous words • 30 meanings of ‘set’; e.g. plurality ‘protocol set’ or action ‘parameter set’ • Brand name convention: use [company name+brand name+superclass] • ‘US 2’ becomes ‘Bruker US 2 NMR magnet’ • To ensure shared understanding of intended meaning • Typographical issues • Use lowercase as in natural language • most flexible, e.g. ‘pH’, ‘DNA_hybridisation’ (no acronym border problems) • Avoid punctuation, sub/superscripts • Resolve special characters consistently, e.g. ‘α’ becomes ‘alpha’ -> To ensure readability, reduce diversity in appearance
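
The typographical rules on this slide lend themselves to a simple automated check. The following is a minimal Python sketch, not part of the straw man document: the function names, the warning strings and the `SPELLED_OUT` table are all hypothetical, and only three of the rules (negative names, punctuation, unresolved special characters) are covered.

```python
import string
from typing import List

# Hypothetical lookup table for illustration; extend it for other characters.
SPELLED_OUT = {"α": "alpha", "β": "beta", "γ": "gamma"}

# Underscore is allowed as a word separator, so it is not treated as punctuation.
PUNCTUATION = set(string.punctuation) - {"_"}


def resolve_special_characters(name: str) -> str:
    """Replace each character found in SPELLED_OUT with its spelled-out form."""
    return "".join(SPELLED_OUT.get(ch, ch) for ch in name)


def check_name(name: str) -> List[str]:
    """Return convention warnings for a candidate class name (empty if none)."""
    warnings = []
    # Rule: avoid negative names such as 'non-separation device'.
    if name.lower().startswith(("non-", "non_", "non ")):
        warnings.append("negative name")
    # Rule: avoid punctuation (underscore and space separators are fine).
    if any(ch in PUNCTUATION for ch in name):
        warnings.append("punctuation")
    # Rule: special characters should already be resolved to plain text.
    if not name.isascii():
        warnings.append("unresolved special character")
    return warnings
```

Under these assumptions, `check_name("non-separation device")` reports both a negative name and punctuation, while `check_name("Bruker US 2 NMR magnet")` returns an empty list. Ambiguity and ellipsis checks are left out because they need domain knowledge rather than string tests.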

  11. Naming Convention Straw Man - Examples • Lexical issues • Reuse words and avoid synonyms within compound names • ‘x_part_of_process’, ‘y_part_of_process’ and ‘z_part_of_process’ instead of ‘x_component_of_process’, ‘y_portion_of_process’, ‘z_part_of_process’ -> To decrease the learning and search burden on the user side, to ease text mining by reducing string variability • Use underscore or space separator (instead of CamelCase) • prevents distortions like ‘CapNMRProbe’ and ‘pHValue’, yet allows brand names like ‘SampleJet’ • To ease text mining and readability (demarcated word borders) • Use singular nominal word form • Avoid inconsistencies like ‘biphenyl’ (CHEBI:17097) under an IUPAC-required ‘biphenyls’ (CHEBI:22888) -> To harmonize appearance, to avoid redundancy, to ease ontology cross-referencing and import
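
The text-mining argument for explicit separators can be illustrated with a short Python sketch. Both function names are hypothetical, and the CamelCase splitter is deliberately naive: it stands in for the kind of heuristic a tool would need when word borders are not marked, and is not taken from any particular tool.

```python
import re
from typing import List


def split_separated(name: str) -> List[str]:
    """Split a name on underscores or spaces; word borders are explicit."""
    return [token for token in re.split(r"[_ ]+", name) if token]


def naive_camel_split(name: str) -> List[str]:
    """Guess word borders in a CamelCase name from letter case alone.

    Every uppercase letter starts a new token, so acronyms and mixed-case
    words such as 'pH' are broken apart. Digits are ignored by this sketch.
    """
    return re.findall(r"[A-Z][a-z]*|[a-z]+", name)


print(split_separated("cap_NMR_probe"))   # ['cap', 'NMR', 'probe']
print(naive_camel_split("CapNMRProbe"))   # ['Cap', 'N', 'M', 'R', 'Probe']
print(split_separated("pH value"))        # ['pH', 'value']
print(naive_camel_split("pHValue"))       # ['p', 'H', 'Value']
```

With a separator, the acronym ‘NMR’ and the mixed-case ‘pH’ survive tokenization intact, and a brand name like ‘SampleJet’ stays a single token because it contains no separator.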

  12. Common Naming Convention – Open Issues • Syntactic issues • Qualifier order: put the qualifier term before the part being qualified? • ‘NMR_instrument’ in place of ‘instrument_for_NMR’ • ‘Helper’ strings in class names: establish general ones? • E.g. ‘sensu’ postfix in GO to indicate species specificity, ‘fruiting body development (sensu Bacteria)’ (GO:0030583) • Semantic issues • Administrative ‘helper’ classes: how to name these metadata bins? • unclassified (OBI_200067), ChEBI_objects (OBI_336), toBeDiscussed, _collected_relations • Identifiers and namespace: are conventions useful? • OBI uses [group prefix+underscore+unique number], e.g. OBI_334 • BFO uses [meaningful string], e.g. IndependentContinuant
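
To make the identifier question concrete, the following Python sketch separates the two styles quoted in these slides: [prefix+separator+number] identifiers such as OBI_334 or GO:0030583, and meaningful-string identifiers such as IndependentContinuant. The pattern and the function name are assumptions made for illustration; neither OBI nor GO is claimed to define its identifiers with this exact expression.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: alphabetic prefix, then '_' or ':', then digits only.
ID_PATTERN = re.compile(r"^([A-Za-z]+)[_:](\d+)$")


def parse_identifier(identifier: str) -> Optional[Tuple[str, str]]:
    """Return (prefix, number) for prefix+number identifiers, else None.

    The number is kept as a string so that leading zeros are preserved.
    """
    match = ID_PATTERN.match(identifier)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_identifier("OBI_334"))                # ('OBI', '334')
print(parse_identifier("GO:0030583"))             # ('GO', '0030583')
print(parse_identifier("IndependentContinuant"))  # None
```

A tool that has to handle several ontologies needs one such rule per identifier style, which is the practical cost behind the question of whether a shared convention is useful.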

  13. Common Naming Convention - Benefits • Communication has improved … • In geographically distributed, collaborative efforts • Between developers from different domains and backgrounds • Appearance of what we represent has been normalized - Not just a matter of aesthetics - Manoeuvring within the hierarchy became faster • … we further envision … • Facilitated access to ontologies through meta-tools • Reducing the diversity with which ontology libraries and tools have to cope, e.g. OLS, BioPortal, PROMPT and text mining tools • Facilitating ontology integration and cross-referencing • Comparison, alignment (OWL-import) and mapping • Serving as guideline for new communities

  14. Acknowledgements and Resources • Authors and those contributing to the discussion • Susanna-Assunta Sansone, Philippe Rocca-Serra, Suzi Lewis, Waclaw Kusnierczyk, Barry Smith, Chris Mungall, Jane Lomax, Robert Stevens, Frank Gibson, Luisa Montecchi-Palazzi, Dietrich Rebholz • Members of MSI, PSI, OBI groups and OBO Foundry coordinators • http://msi-ontology.sf.net • http://psidev.sf.net • http://obi.sf.net • http://obofoundry.org • Further info • “Working towards naming conventions for use in controlled vocabulary and ontology engineering”, Bio-Ontologies SIG Proceedings, p. 29-32 • Funding sources (supporting my work) • UK BBSRC e-Science BB/D524283/1 and BB/E025080/1 • Semantic Mining NoE (visits to IFOMIS and Manchester)
