Aligning the Parasite Experiment Ontology and the Ontology for Biomedical Investigations Using AgreementMaker • Valerie Cross, Cosmin Stroe, Xueheng Hu, Pramit Silwal, Maryam Panahiazar, Isabel F. Cruz, Priti Parikh, Amit Sheth • firstname.lastname@example.org • July 29, 2011 • ICBO @ Buffalo, NY
Outline • Task: Align PEO and OBI Ontologies • OAEI Investigation • AgreementMaker Overview • Enhancements to AgreementMaker • Experimental Results • Conclusions and Future Work
Parasite Experiment Ontology (PEO) http://wiki.knoesis.org/index.php/Parasite_Experiment_ontology • models provenance metadata associated with experiment protocols used in parasite research. • extends the upper-level Provenir ontology (http://knoesis.wright.edu/provenir/provenir.owl). • PEO (v1.0) includes Proteome, Microarray, Gene Knockout, and Strain Creation experiment terms, along with other terms used in pathways. • 110 classes and 27 properties; uses concepts from the Parasite Life Cycle ontology. • [Figure: Snapshot of PEO]
Ontology for Biomedical Investigations (OBI) http://purl.obolibrary.org/obo/obi • describes biological and clinical investigations. • includes a set of 'universal' terms applicable across various biological and technological domains, and domain-specific terms relevant only to a given domain. • supports the consistent annotation of biomedical investigations, regardless of the particular field of study. • represents the design of an investigation, the protocols and instrumentation used, the material used, the data generated, and the type of analysis performed on it. • is being built under the Basic Formal Ontology (BFO).
Ontology Alignment Evaluation Initiative (OAEI) http://oaei.ontologymatching.org • Annual international competition to evaluate ontology alignment techniques, with multiple tracks: • Benchmark tests • Biomedical track (Mouse and NCI Human Anatomies) • Conference track (15 ontologies) • A “side effect” of the competition is a collection of published test sets, each consisting of two ontologies and the correct mappings between them as determined by experts • Results are measured by: • Recall, precision, and F-measure (the harmonic mean of recall and precision) • Runtime • Other
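The recall, precision, and F-measure scores listed above can be sketched as follows. This is a minimal illustration, assuming an alignment is represented as a set of (source entity, target entity) pairs and ignoring mapping relations and confidence values; the function name `evaluate` is hypothetical and is not part of any OAEI tooling.

```python
from __future__ import annotations

Pair = tuple[str, str]  # (source entity, target entity); a simplifying assumption


def evaluate(found: set[Pair], reference: set[Pair]) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) of a found alignment vs. a reference one."""
    correct = len(found & reference)  # mappings that also appear in the reference
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure: harmonic mean of precision and recall
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

For example, if a matcher finds three mappings, two of which are among four reference mappings, precision is 2/3, recall is 1/2, and the F-measure is 4/7.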
OAEI Anatomy Track • #1 The matcher has to be applied with its standard settings. • #2 An alignment has to be generated that favors precision over recall. • #3 An alignment has to be generated that favors recall over precision. • #4 A partial reference alignment has to be used as additional input.
AgreementMaker: Ontology Alignment System (Univ. of Illinois at Chicago, ADVIS Lab, Dr. Isabel F. Cruz and Cosmin Stroe) • Motivation • Automatic methods are required to match large ontologies • Several features of the ontologies have to be considered • Users need to trust the mappings and to be directly involved in the loop • System's capabilities • Wide range of matching methods • Capability to combine multiple strategies intelligently • Multi-purpose user interface that allows evaluation of, and manual interaction with, the mappings • Extensible architecture that allows reuse and composition of the matching modules
Existing Matchers • First layer (conceptual) • BSM (Basic Similarity Matcher) • PSM (Parametric String-Based Matcher) • ASM (Advanced Similarity Matcher) • VMM (Vector-based Multi-term Matcher) • Second layer (structural) • DSI (Descendant Similarity Inheritance) • SSC (Sibling Similarity Contribution) • Third layer (aggregation) • LWC (Linear Weighted Combination)
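The third-layer idea of a linear weighted combination can be illustrated with a minimal sketch. It assumes each lower-layer matcher produces a dictionary from (source, target) pairs to similarity scores in [0, 1] and that the weights are fixed and chosen by the user; the function `linear_weighted_combination` and its weight normalization are hypothetical and are not AgreementMaker's actual LWC implementation.

```python
from __future__ import annotations

Pair = tuple[str, str]  # (source concept, target concept)


def linear_weighted_combination(
    matrices: list[dict[Pair, float]], weights: list[float]
) -> dict[Pair, float]:
    """Combine per-matcher similarity scores into one weighted score per pair.

    Hypothetical sketch: weights are user-supplied constants, and a pair missing
    from a matcher's output is treated as having similarity 0.0 for that matcher.
    """
    if len(matrices) != len(weights):
        raise ValueError("need exactly one weight per matcher")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    pairs = set().union(*matrices)  # every pair scored by at least one matcher
    combined: dict[Pair, float] = {}
    for pair in pairs:
        score = sum(w * m.get(pair, 0.0) for m, w in zip(matrices, weights))
        combined[pair] = score / total  # normalize so the result stays in [0, 1]
    return combined
```

Dividing by the weight total keeps the combined score on the same [0, 1] scale as the inputs, so a single selection threshold can still be applied afterwards.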
Lexicon Extensions to Matchers • AgreementMaker version 0.22 (OAEI 2010) extended these string-based matchers by integrating two lexicons: • the Ontology Lexicon, built from the synonym and definition annotations that exist in the ontologies themselves, and • the WordNet Lexicon, created by starting with the Ontology Lexicon and adding any non-duplicate synonyms/definitions found in WordNet • Result: BSMlex, PSMlex, and VMMlex.
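How a lexicon extends a string-based matcher can be sketched roughly as follows. This assumes each lexicon maps a concept to a set of terms, that external synonyms arrive as a plain dictionary (a hypothetical stand-in for a real WordNet lookup), and that `difflib.SequenceMatcher` stands in for the string similarity BSM, PSM, or VMM would actually compute. The helpers `normalize`, `extend_lexicon`, and `lexicon_similarity` are illustrative only.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def normalize(term: str) -> str:
    """Lower-case, turn underscores into spaces, and collapse whitespace."""
    return " ".join(term.lower().replace("_", " ").split())


def extend_lexicon(
    ontology_lexicon: dict[str, set[str]],
    wordnet_synonyms: dict[str, set[str]],  # hypothetical stand-in for WordNet
) -> dict[str, set[str]]:
    """Start from the ontology lexicon and add only non-duplicate external synonyms."""
    extended: dict[str, set[str]] = {}
    for concept, terms in ontology_lexicon.items():
        merged = set(terms)
        seen = {normalize(t) for t in terms}
        for term in terms:
            for synonym in wordnet_synonyms.get(normalize(term), set()):
                if normalize(synonym) not in seen:  # skip duplicates
                    merged.add(synonym)
                    seen.add(normalize(synonym))
        extended[concept] = merged
    return extended


def lexicon_similarity(source_terms: set[str], target_terms: set[str]) -> float:
    """Best string similarity over all pairs of terms in two concepts' lexicon entries."""
    best = 0.0
    for s in source_terms:
        for t in target_terms:
            best = max(best, SequenceMatcher(None, normalize(s), normalize(t)).ratio())
    return best
```

Taking the maximum over all term pairs means a single shared synonym is enough for two concepts to match, even when their primary names differ.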
Initial Experiments • AgreementMaker (ver. 0.22) with the OAEI 2010 anatomy configuration produced only two mappings • Found inconsistencies in the entity descriptions of PEO and OBI: • Identifiers: PEO URIs use a textual fragment identifier (e.g., http://knoesis.wright.edu/ParasiteExperiment.owl#transfection), while OBI's entities use numerical identifiers (e.g., http://purl.obolibrary.org/obo/OBI_0600060). • Labels: PEO uses the rdfs:label field on 19.1% of its classes, and this use does not follow the specification guidelines because the label contains a PLO identifier. OBI uses the rdfs:label field to hold a descriptive string on almost 100% of its classes. • Comments: PEO uses the rdfs:comment field on 99% of its classes to provide a definition. OBI uses the comment field on only about 4% of its classes. • PEO and OBI share some annotation types, BUT for each shared annotation either PEO or OBI has low coverage: • OBI has high coverage for label annotations • PEO has high coverage for comment annotations. • This heterogeneity, combined with matchers that compare only like annotations (class ID with class ID, label with label, etc.), resulted in almost no alignment.
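Coverage figures like the percentages above can be computed with a small profiling step. The sketch below assumes each class's annotations have already been extracted (for example with an RDF library, not shown) into a dictionary from property name to values; `annotation_coverage` and `local_name` are hypothetical helpers, not the procedure used to produce the reported numbers.

```python
from __future__ import annotations

from collections import Counter


def annotation_coverage(classes: dict[str, dict[str, list[str]]]) -> dict[str, float]:
    """Fraction of classes carrying at least one non-blank value per annotation property."""
    if not classes:
        return {}
    counts: Counter[str] = Counter()
    for annotations in classes.values():
        for prop, values in annotations.items():
            if any(v.strip() for v in values):  # ignore empty or blank values
                counts[prop] += 1
    return {prop: n / len(classes) for prop, n in counts.items()}


def local_name(uri: str) -> str:
    """Fragment after '#', otherwise the last path segment of the URI."""
    if "#" in uri:
        return uri.rsplit("#", 1)[-1]
    return uri.rstrip("/").rsplit("/", 1)[-1]
```

Applied to the two example URIs, `local_name` yields `transfection` for PEO but the opaque `OBI_0600060` for OBI, which illustrates why matching class IDs against class IDs finds almost nothing here.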
Annotation Profiling • allows the user to select and combine different annotations of the source or target ontology to be used in the alignment process.
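The effect of annotation profiling can be sketched as building, per class, the text a matcher compares from whichever annotations the user selects for each ontology. This assumes annotations are stored as a dictionary from property name to values and uses token-level Jaccard similarity as a simple stand-in for AgreementMaker's matchers; `profile_text` and `jaccard` are hypothetical helpers.

```python
from __future__ import annotations


def profile_text(annotations: dict[str, list[str]], selected: list[str]) -> str:
    """Concatenate the values of the user-selected annotation properties, in order."""
    parts: list[str] = []
    for prop in selected:
        parts.extend(v.strip() for v in annotations.get(prop, []) if v.strip())
    return " ".join(parts)


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (stand-in for a real matcher)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

With a source class that has only a comment and a target class that has only a label, comparing label with label scores 0.0, while a profile pairing the source comment with the target label can find the match.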
Customization of Lexicon Matchers • The lexicon builders for BSMlex, PSMlex, and VMMlex use fixed names for the synonym and definition annotations (hasSynonym and hasDefinition). • The lexicon builder was modified to exploit the synonym annotations in PEO and OBI by letting the user choose the annotation names used to create the lexicons. • OBI does not use hasSynonym but uses the IAO annotation properties IAO_0000111 (“editor preferred term”) and IAO_0000118 (“alternative term”), which serve the same function as synonyms in OBI. • PEO does not use synonyms but, in most cases, uses the comment annotation for a definition. • Result: BSMlex+, PSMlex+, and VMMlex+.
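A lexicon builder with user-chosen annotation names can be sketched as below. The two IAO property URIs are the standard OBO PURLs for the properties named above and the rdfs:comment URI is the W3C one; the `LexiconEntry` class, `build_custom_lexicon` function, and the toy class IDs in the usage note are hypothetical and do not reflect AgreementMaker's internal data structures.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Annotation property URIs a user might select (OBO PURLs and the W3C RDFS namespace).
IAO_EDITOR_PREFERRED_TERM = "http://purl.obolibrary.org/obo/IAO_0000111"
IAO_ALTERNATIVE_TERM = "http://purl.obolibrary.org/obo/IAO_0000118"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"


@dataclass
class LexiconEntry:
    synonyms: set[str] = field(default_factory=set)
    definitions: list[str] = field(default_factory=list)


def build_custom_lexicon(
    classes: dict[str, dict[str, list[str]]],
    synonym_props: list[str],
    definition_props: list[str],
) -> dict[str, LexiconEntry]:
    """Build a lexicon from user-chosen synonym and definition annotation properties."""
    lexicon: dict[str, LexiconEntry] = {}
    for cls, annotations in classes.items():
        entry = LexiconEntry()
        for prop in synonym_props:
            entry.synonyms.update(v.strip() for v in annotations.get(prop, []) if v.strip())
        for prop in definition_props:
            entry.definitions.extend(v.strip() for v in annotations.get(prop, []) if v.strip())
        lexicon[cls] = entry
    return lexicon
```

For OBI, the user would pass the two IAO properties as `synonym_props`; for PEO, an empty synonym list and `RDFS_COMMENT` as the definition property. Keeping the property names as parameters, rather than hard-coding hasSynonym and hasDefinition, is what lets one builder serve both ontologies.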
BioPortal Mappings http://bioportal.bioontology.org/mappings http://wiki.knoesis.org/index.php/Parasite_Experiment_ontology
Conclusions and Future Work • Experimental results in the biomedical domain demonstrate the problem of heterogeneous ontology annotations. • Validated the past approach of extending matching algorithms with lexicons: the best results were produced by matchers that use lexicons, notably BSMlex+ • Investigate including more lexicons, such as UMLS, to achieve better results • Heterogeneity was managed by increasing the flexibility of state-of-the-art matching algorithms, i.e., annotation profiling, mapping provenance information, and custom lexicons, which support a domain expert in this process • This approach relies on the user to select the relevant annotations to be used in the matching process. • More work is needed, specifically to automatically identify semantically compatible annotations by applying established ontology evaluation metrics • Have already added a wide variety of semantic similarity measures to AgreementMaker for future use in semantic matching, not just lexical matching, of concepts between ontologies.