
Daniel Schober, Ilinca Tudose, Vojtech Svatek, Martin Boeker

OntoCheck – Verifying Naming Conventions in Protégé 4, OR Fulfilling the 4th commandment: “You shall not make wrongful use of names”.


Presentation Transcript


  1. OntoCheck – Verifying Naming Conventions in Protégé 4, OR Fulfilling the 4th commandment: “You shall not make wrongful use of names”. Daniel Schober, Ilinca Tudose, Vojtech Svatek, Martin Boeker

  2. Diversity in Naming Conventions

  3. Benefits of Consistent Naming • Increase consistency, accuracy & clarity of labels • Normalize appearance • Reduce the diversity that meta-tools have to cope with • Ease text mining • Ease ontology mapping & alignment • NCBO BioPortal • Lexical OWL Ontology Matcher (LOOM)

  4. OBO Foundry Naming Conventions

  5. The OntoCheck Protégé 4.1 plugin • Clean-up checks on representational units (RUs) • Checks on Naming Conventions • Lexical harmonization and labeling enforcement • Checks on Metadata • Completeness and cardinality checks on mandatory and obligatory annotation properties • Checks can be stored, shared & reused • e.g. before each ontology release • Quantifies the violations found

  6. Typographical Checks • Word Case • CamelCase, camelHump, lower Case Start, Upper case start, all lower case, ALL UPPER CASE • Word Separator • none, space, hyphen, underscore, dot • Digits • Check for numerics in labels • Look for cardinality and order indicators
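
The typographical checks listed on this slide can be sketched in a few lines. This is only an illustration, not the plugin's code (OntoCheck itself is a Protégé plugin); the regular expressions and the function names `matches_case`, `separators_used` and `has_digits` are assumptions chosen for this example, and the CamelCase pattern is deliberately strict (acronyms such as `DNA` are rejected).

```python
import re

# Assumed patterns for some of the case conventions named on the slide.
CASE_PATTERNS = {
    # Every word starts upper case and continues in lower case, e.g. QualityRegion
    "CamelCase": r"(?:[A-Z][a-z0-9]+)+",
    # Lower case start, later words capitalized, e.g. qualityRegion
    "camelHump": r"[a-z][a-z0-9]*(?:[A-Z][a-z0-9]+)*",
    "all lower case": r"[^A-Z]+",
    "ALL UPPER CASE": r"[^a-z]+",
}

# Word separators named on the slide ("none" means no separator is found).
SEPARATORS = {"space": " ", "hyphen": "-", "underscore": "_", "dot": "."}


def matches_case(name, convention):
    """Return True if the whole name follows the given case convention."""
    return re.fullmatch(CASE_PATTERNS[convention], name) is not None


def separators_used(name):
    """Return the sorted names of all separators occurring in the name."""
    return sorted(key for key, char in SEPARATORS.items() if char in name)


def has_digits(name):
    """Return True if the name contains at least one digit."""
    return any(char.isdigit() for char in name)
```

A call such as `matches_case("QualityRegion", "CamelCase")` then corresponds to a single check applied to one class name; a real check would iterate over all subclasses of a chosen root class.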

  7. Lexical Checks • Regular Expressions • Check on specified affix pattern • e.g. Role subclasses have a ‘role’ postfix • Avoid Boolean operators in labels • Check on ‘and’, ‘or’, ‘non’, ‘anti’, ‘dis’ • Check on metalevel postfixes • e.g. ‘class’, ‘type’, ‘concept’, ‘relation’ • Check for punctuation • e.g. dots hint at abbreviations • Character & word count • Check for potentially unclear names • Alert on labels <4 characters • Alert on unreadable names >50 characters
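
A rough sketch of how such lexical checks might be combined is given below. It is a heuristic illustration under stated assumptions, not the plugin's implementation: the function names, the word-splitting rule and the violation names are hypothetical, and the length limits are parameters, with the short-label limit read here as a minimum length.

```python
import re

BOOLEAN_WORDS = ("and", "or")
NEGATION_PREFIXES = ("non", "anti", "dis")
META_POSTFIXES = ("class", "type", "concept", "relation")


def split_words(label):
    """Split a label into lower-case words at separators and camelCase humps."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    return [word.lower() for word in re.split(r"[\s_\-.]+", spaced) if word]


def has_postfix(label, postfix):
    """Affix check, e.g. Role subclasses should end in 'role'."""
    return label.lower().endswith(postfix.lower())


def lexical_violations(label, min_chars=4, max_chars=50):
    """Return the names of all lexical rules that the label violates."""
    words = split_words(label)
    found = []
    if any(word in BOOLEAN_WORDS for word in words):
        found.append("boolean operator")
    # Prefix matching is crude: a word like 'distance' is flagged as well.
    if any(word.startswith(prefix) for word in words for prefix in NEGATION_PREFIXES):
        found.append("negation prefix")
    if words and words[-1] in META_POSTFIXES:
        found.append("metalevel postfix")
    if re.search(r"[.,;:!?]", label):
        found.append("punctuation")
    if len(label) < min_chars:
        found.append("too short")
    if len(label) > max_chars:
        found.append("too long")
    return found
```

Because the prefix test produces false positives, results like these are better treated as alerts for a curator to review than as hard errors.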

  8. CheckTab: MixedCaseConvention Test For all Thing subclasses check if they are CamelCase for OWLClassName RU

  9. CheckTab: PostfixInclusion Test For all QualityRegion subclasses check if they contain a ‘Region’ postfix

  10. CheckTab: Cardinality enforcement For all Thing subclasses check for presence of labels (Min Card=1) Save & Load Checks

  11. CompareTab: Label equals ClassName Test For all Thing subclasses check if ClassName equals rdfs:label (ignore separator & case)
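
The comparison described on this slide, ignoring separators and case, can be illustrated as follows. The helper names `normalize` and `label_matches_class_name` are hypothetical, and the set of separators (whitespace, underscore, hyphen, dot) is an assumption based on the separators listed for the typographical checks.

```python
import re


def normalize(name):
    """Remove word separators and lower-case the name."""
    return re.sub(r"[\s_\-.]+", "", name).lower()


def label_matches_class_name(class_name, label):
    """Compare a class name with its rdfs:label, ignoring separator and case."""
    return normalize(class_name) == normalize(label)
```

With this normalization, `QualityRegion` and `quality region` count as equal, while `QualityRegion` and `quality regions` do not.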

  12. StatisticsTab

  13. OntoCheck Test Cases: Discovered and Quantified Violations

  14. OntoCheck current Storage Format: an ugly but simple txt file written to the Protégé 4 installation directory.

      check-name:: QualityRegionContainsRegion
      panel:: checkPanel
      checkCombo:: 0
      checkRegexText:: Region
      checkRB:: 2

      check-name:: ThingDoesntContainNon
      panel:: checkPanel
      checkCombo:: 0
      checkRegexText:: Non
      checkRB:: 1

      check-name:: ThingHaveRdfs:labelValueForAllClasses
      panel:: checkPanel
      checkCombo:: 6
      checkRegexText:: your regex here
      checkRB:: 0

      check-name:: ThingCamelCaseOWLClassNameNC
      panel:: checkPanel
      checkCombo:: 0
      checkRegexText:: your regex here
      checkRB:: 4
      comboNamingType:: 1
      cbWithDigits::
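
Since the stored checks are plain `key:: value` pairs, reading them back is straightforward. The sketch below is a hypothetical reader written for illustration, not the plugin's loader: it assumes that every key is followed by a double colon, that each `check-name` key starts a new check, and it keeps all values as strings because the meaning of the numeric codes (`checkCombo`, `checkRB`) is not documented on the slide.

```python
import re

# A key is a run of word characters or hyphens followed by a double colon.
KEY = re.compile(r"([\w-]+)::")


def parse_checks(text):
    """Parse 'key:: value' pairs into one dict per stored check."""
    checks = []
    matches = list(KEY.finditer(text))
    for index, match in enumerate(matches):
        # A value runs from the end of its key to the start of the next key.
        end = matches[index + 1].start() if index + 1 < len(matches) else len(text)
        value = text[match.end():end].strip()
        if match.group(1) == "check-name":
            checks.append({})  # each check-name opens a new record
        if checks:
            checks[-1][match.group(1)] = value
    return checks
```

Splitting on keys rather than on line breaks means the reader works whether the pairs are stored one per line or run together, and a single colon inside a value (as in `ThingHaveRdfs:labelValueForAllClasses`) is not mistaken for a key.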

  15. CheckTab - Future Extensions • Check for naming clashes & redundancies • Classes with different IDs but equal labels • Check for plural word forms • Check on non-ASCII characters • α alpha • Check on redundant restrictions • Between own and inherited axiomatic class definitions
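
Two of the proposed extensions, detecting classes with different IDs but equal labels and flagging non-ASCII characters, could look roughly like the sketch below. The function names and the input shape (a plain mapping from class ID to label) are assumptions for illustration; the slide does not describe how the plugin would implement them.

```python
from collections import defaultdict


def label_clashes(labels_by_id):
    """Return labels shared by more than one class ID (case-insensitive)."""
    ids_by_label = defaultdict(list)
    for class_id, label in labels_by_id.items():
        ids_by_label[label.strip().lower()].append(class_id)
    return {
        label: sorted(ids)
        for label, ids in ids_by_label.items()
        if len(ids) > 1
    }


def non_ascii_chars(label):
    """Return the characters in the label that lie outside ASCII."""
    return [char for char in label if ord(char) > 127]
```

For example, `non_ascii_chars("α-helix")` reports the Greek letter, which a curator could then replace with the spelled-out form `alpha`.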

  16. Next Steps • Engage in collaboration with OntologyDesignPatterns.org • Formalize a ‘Naming ODP’ pattern • Correlate the OntoCheck storage format with the ‘Naming ODP’ • Enable loading of checks from the ODP • Pre-formalize sets of consistent Naming Conventions • E.g. for OBO Foundry compliance, Manchester Style, ISO … • Analyse and reuse the LiLA framework for linguistic label analysis • Dominique Ritze, Johanna Völker, Christian Meilicke and Ondrej Svab-Zamazal. Linguistic Analysis for Complex Ontology Matching. Proceedings of the ISWC workshop on Ontology Matching (OM), 2010, http://code.google.com/p/lila-project/ • Apply it to analyze naming structures & recommend a fitting NC

  17. Conclusions • Enforces syntactic and lexical normalization • Renders labels clearer to users & machines • Eases ontology cross-referencing and import • Eases ontology mapping & alignment by reducing string variability • Avoids redundancy and inconsistencies • E.g. ‘biphenyl’ (CHEBI:17097) under an IUPAC-required ‘biphenyls’ (CHEBI:22888) • Helps enforce metadata completeness • Helps in quality assurance and quantification • OntoCheck plugin is at an early stage • CheckTab already proves useful in multiple in-house efforts

  18. Resources & Acknowledgements Resources • OntoCheck plugin download • http://www.imbi.uni-freiburg.de/ontology/OntoCheck/ • OBO Foundry Naming Conventions & Questionnaire • http://obofoundry.org/wiki/index.php/Naming • Schober D. et al. (2009) Survey-based naming conventions for use in OBO Foundry ontology development. BMC Bioinformatics, Vol. 10, Issue 1. Acknowledgements • This work was initiated and supervised by Daniel Schober, and implemented and improved by Ilinca Tudose under additional guidance from Martin Boeker. Timothy Redmond helped solve Protégé API problems. • DS was supported by the Deutsche Forschungsgemeinschaft (DFG) grant JA 1904/2-1, SCHU 2515/1-1 GoodOD (Good Ontology Design) • IT was supported by the DebugIT EU Grant ICT-2007.5.2-217139
