
Designing clustering methods for ontology building: The Mo’K workbench



  1. Designing clustering methods for ontology building: The Mo’K workbench Authors: Gilles Bisson, Claire Nédellec and Dolores Cañamero Presenter: Ovidiu Fortu

  2. INTRODUCTION • Paper objectives: • Present a workbench for developing and evaluating methods that learn ontologies • Report experimental results illustrating the model's suitability for characterizing methods that learn semantic classes

  3. INTRODUCTION • Ontology building, general strategy: • Define a distance metric that approximates semantic distance as closely as possible • Devise or reuse a clustering algorithm that uses this distance to build the ontology

  4. Harris’ hypothesis • Formulation: studying syntactic regularities leads to the identification of syntactic schemata, built from combinations of word classes, that reflect specific domain knowledge • Consequence: similarity between words can be measured by their co-occurrence in syntactic patterns (see the sketch below)
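
As a rough illustration of this consequence, the following sketch (Python, with made-up pattern data; not from the paper) scores the similarity of two nouns by the cosine of their syntactic co-occurrence profiles:

    from collections import Counter
    from math import sqrt

    # Hypothetical parsed patterns: (attribute, object) pairs, where the
    # attribute is a <head, grammatical relation> tuple.
    patterns = [
        (("cause", "Dobj"), "decrease"),
        (("cause", "Dobj"), "increase"),
        (("observe", "Dobj"), "decrease"),
        (("observe", "Dobj"), "increase"),
        (("chop", "Dobj"), "onion"),
    ]

    def attribute_profile(obj):
        # Count the syntactic attributes an object co-occurs with.
        return Counter(attr for attr, o in patterns if o == obj)

    def cosine(p, q):
        # Cosine similarity between two attribute-count profiles.
        dot = sum(p[a] * q[a] for a in p)
        norm = sqrt(sum(v * v for v in p.values())) * \
               sqrt(sum(v * v for v in q.values()))
        return dot / norm if norm else 0.0

    # decrease/increase share all their attributes -> 1.0;
    # decrease/onion share none -> 0.0
    print(cosine(attribute_profile("decrease"), attribute_profile("increase")))
    print(cosine(attribute_profile("decrease"), attribute_profile("onion")))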

  5. Conceptual clustering • Ontologies are organized as acyclic graphs: • Nodes represent concepts • Links represent inclusion (generality relation) • The methods considered in this paper rely upon bottom-up construction of the graph

  6. The Mo’K model • Representation of examples: • Binary syntactic patterns of the form: <head – grammatical relation – modifier head>, where <modifier head> is the object, and the rest of the pattern is the attribute • Example: • This causes a decrease in […] • <cause Dobj decrease>
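
A minimal sketch of this representation (the tuple layout is an illustrative assumption, not the workbench's actual data structure): each parsed triple splits into an attribute (head plus grammatical relation) and an object (modifier head), accumulated into a contingency matrix of co-occurrence counts:

    from collections import defaultdict

    triples = [("cause", "Dobj", "decrease"),
               ("cause", "Dobj", "increase"),
               ("chop", "Dobj", "onion")]

    # Contingency matrix: object -> attribute -> co-occurrence count.
    matrix = defaultdict(lambda: defaultdict(int))
    for head, relation, modifier in triples:
        matrix[modifier][(head, relation)] += 1

    print(dict(matrix["decrease"]))  # {('cause', 'Dobj'): 1}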

  7. Clustering • Bottom up clustering by joining classes that are near: • Join classes of objects (nouns or actions – tuples <verb, relation>) that are frequently determined by the same attributes • Join attribute classes that frequently determine the same objects
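
A greedy agglomerative reading of this idea, sketched below under the assumption that the two most similar classes are merged while the best similarity stays above a threshold (the paper's exact merge criterion may differ). It composes with the `attribute_profile` and `cosine` helpers from the earlier sketch:

    from itertools import combinations

    def agglomerate(profiles, similarity, threshold=0.5):
        # Each class starts as a singleton {object}: attribute-count profile.
        classes = {frozenset([o]): p for o, p in profiles.items()}
        while len(classes) > 1:
            (a, b), best = max(
                ((pair, similarity(classes[pair[0]], classes[pair[1]]))
                 for pair in combinations(classes, 2)),
                key=lambda x: x[1])
            if best < threshold:
                break  # no remaining pair is similar enough to merge
            # Adding Counters sums the attribute counts of both classes.
            classes[a | b] = classes.pop(a) + classes.pop(b)
        return classes

    # e.g. agglomerate({o: attribute_profile(o)
    #                   for o in ("decrease", "increase", "onion")}, cosine)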

  8. Corpora • Specialized corpora are used for domain-specific ontologies • Corpora are pruned (rare examples are eliminated); the workbench allows the specification of • The minimum number of occurrences for a pattern to be considered • The minimum number of occurrences for an attribute/object to be considered (a sketch follows)
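
A sketch of such pruning over the triple representation used above (the threshold values are arbitrary placeholders, not the paper's settings):

    from collections import Counter

    def prune(triples, min_pattern=2, min_term=2):
        # Drop patterns rarer than min_pattern occurrences...
        pattern_freq = Counter(triples)
        kept = [t for t in triples if pattern_freq[t] >= min_pattern]
        # ...then drop objects and attributes rarer than min_term occurrences.
        obj_freq = Counter(o for _, _, o in kept)
        attr_freq = Counter((h, r) for h, r, _ in kept)
        return [(h, r, o) for h, r, o in kept
                if obj_freq[o] >= min_term and attr_freq[(h, r)] >= min_term]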

  9. Distance modeling • Consider only distances that: • Take syntactic analysis as input • Do not use other ontologies (like WordNet) • Are based on distributions of the attributes of an object • Identify general steps in computation of these distances to formulate a general model

  10. Distance computation • Step 1: weighting phase • Modify the frequencies of the elements in the contingency matrix using a general algorithm: • Initialize the weight W(E) of each example E • Initialize the weight W(A) of each attribute A • For each example E • For each attribute A of the example • Compute W(A) in the context of E • Update the global W(E) • For each attribute A of the example • Normalize W(A) by W(E) • Step 2: similarity computation phase (a sketch of the weighting loop follows)
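
The weighting phase above can be read as the following generic loop (a sketch; `attr_weight` is a hypothetical plug-in point standing for whatever context-dependent weighting a concrete metric defines):

    def weighting_phase(matrix, attr_weight):
        # matrix: example -> {attribute: frequency}
        weighted = {}
        for example, attrs in matrix.items():
            w_example = 0.0
            context = {}
            for attr, freq in attrs.items():
                w = attr_weight(attr, freq, example)  # W(A) in the context of E
                context[attr] = w
                w_example += w                        # update the global W(E)
            # Second pass: normalize each W(A) by W(E).
            weighted[example] = {a: (w / w_example if w_example else 0.0)
                                 for a, w in context.items()}
        return weighted

    # With the identity weighting, this reduces to relative frequencies:
    # weighting_phase(matrix, lambda attr, freq, example: freq)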

  11. Distance evaluation • The workbench provides support for the evaluation of metrics • The procedure is: • Divide the corpus into training and test sets • Perform clustering on the training set • Use the similarities computed on the training set to classify the test examples and compute precision and recall; negative examples are produced by randomly combining objects and attributes (see the sketch below)
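
A sketch of this evaluation loop (the scoring function and thresholding details are assumptions; the workbench's actual procedure may differ):

    import random

    def evaluate(score, test_pairs, threshold=0.5, seed=0):
        # Held-out (object, attribute) pairs are the positives; negatives
        # are produced by randomly recombining objects and attributes.
        rng = random.Random(seed)
        objects = [o for o, _ in test_pairs]
        attrs = [a for _, a in test_pairs]
        positives = set(test_pairs)
        negatives = set()
        while len(negatives) < len(positives):
            pair = (rng.choice(objects), rng.choice(attrs))
            if pair not in positives:
                negatives.add(pair)
        tp = sum(score(o, a) >= threshold for o, a in positives)
        fp = sum(score(o, a) >= threshold for o, a in negatives)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / len(positives)
        return precision, recall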

  12. Experiments • Purpose: evaluate Mo’K’s parameterization capabilities and the impact of the parameters on results • Corpora: two French corpora • One of cooking recipes from the Web, with nearly 50,000 examples • One of agricultural data (Agrovoc), with 168,287 examples

  13. Results (Asium’s distance, 20% test)

  14. Recall rate • X-axis: the number of disjoint classes on which recall is evaluated

  15. Class efficiency • Class efficiency: the ratio between the number of triplets learned and the number of triplets effectively used in the evaluation of recall

  16. Conclusions • Comments? • Questions?

  17. Ontology Learning and Its Application to Automated Terminology Translation Authors: Roberto Navigli, Paola Velardi and Aldo Gangemi Presenter: Ovidiu Fortu

  18. Introduction • Paper objectives: • Present OntoLearn, a system for the automated construction of ontologies by extracting relevant domain terms from text corpora • Present the use of OntoLearn for translating multiword terms from English to Italian

  19. The OntoLearn architecture • Complex system, uses external resources like WordNet and the Ariosto language processor

  20. OntoLearn • Key new feature: • Semantic interpretation of terms (word sense disambiguation) • Three main phases: • Terminology extraction • Semantic interpretation • Creation of a specialized view of WordNet

  21. Terminology extraction • Terms are selected with shallow stochastic methods • Quality improves if syntactic features are used • High frequency in a corpus is not sufficient by itself: • credit card – a term • last week – not a term

  22. Terminology extraction, continued • Comparing frequencies in texts from different domains eliminates constructs such as “last week” – the domain relevance score • Relevance of a term t in domain Dk (the formula is reconstructed below)
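
The slide's formula is not reproduced in this transcript; in the OntoLearn paper, the domain relevance of a term t in domain D_k is defined along these lines:

    DR_{t,k} = \frac{P(t \mid D_k)}{\max_{1 \le j \le n} P(t \mid D_j)},
    \qquad
    P(t \mid D_k) \approx \frac{f_{t,k}}{\sum_{t' \in D_k} f_{t',k}}

where f_{t,k} is the frequency of t in the documents of domain D_k, so a term scores high when it is comparatively more frequent in D_k than in any other domain.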

  23. Terminology extraction, continued • The domain consensus of a term t in class Dk exploits the frequency of t across the documents of the domain (see below)
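
Again the formula is missing from the transcript; in the paper, domain consensus is (roughly) the entropy of the distribution of t across the documents d of D_k:

    DC_{t,k} = \sum_{d \in D_k} P_t(d) \, \log \frac{1}{P_t(d)}

A term used evenly across many documents of the domain therefore scores higher than one concentrated in a single document.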

  24. Terminology extraction, continued • A combination of the two scores is used to detect relevant terms • Only the terms whose DW exceeds a threshold are retained (a reconstruction follows)
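
A plausible reconstruction of the combined score, consistent with the paper (α and β are tuning coefficients, and DC may be normalized before combination):

    DW_{t,k} = \alpha \, DR_{t,k} + \beta \, DC_{t,k}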

  25. Semantic interpretation • Step 1: create semantic nets for every word wk of term t and every synset of wk by following all WordNet links, but limiting the path length to 3 (after disambiguation of the words) • Step 2: intersect the networks and compute a score based on the number and type of semantic patterns connecting the networks (a sketch of Step 1 follows)
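
A sketch of the bounded expansion in Step 1, using NLTK's WordNet interface (the relation subset shown is illustrative; the paper follows all WordNet link types):

    from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

    def semantic_net(synset, max_depth=3):
        # Breadth-first expansion, limiting the path length to 3 as on the slide.
        frontier, net = {synset}, {synset}
        for _ in range(max_depth):
            nxt = set()
            for s in frontier:
                for rel in (s.hypernyms, s.hyponyms,
                            s.part_meronyms, s.member_holonyms):
                    nxt.update(rel())
            frontier = nxt - net
            net |= frontier
        return net

    # Step 2 then intersects the nets of candidate senses, e.g.:
    # semantic_net(wn.synset('archeology.n.01')) & semantic_net(wn.synset('site.n.01'))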

  26. Semantic interpretation, continued • Semantic patterns are instances of 13 predefined metapatterns • Example: • Topic, as in archeological site • Compute the score for all possible sense pairs (Sik denotes sense k of word i in the term)

  27. Semantic interpretation, continued • Use the common paths in the semantic networks to detect semantic relations (taxonomic knowledge) between concepts: • Select a set of domain specific semantic relations • Use inductive learning to learn semantic relations given ontological knowledge • Apply the model to detect semantic relations • Errors from the disambiguation phase can be corrected here

  28. Creation of a specialized view of WordNet • The last phase of the process • Construct the ontology by eliminating from the semantic networks the WordNet nodes that are not domain terms • A domain core ontology can also be used as a backbone

  29. Translating multiword terms • Classic approach: use parallel corpora • Advantage: easy to implement • Disadvantage: few such corpora exist, especially in specific domains • OntoLearn-based solution: • Use EuroWordNet and build ontologies in both languages, associating the terms with synsets

  30. Translation – the experiment • Experiment on 405 complex terms from a tourism corpus • Problem: poor coverage of Italian words in EuroWordNet (fewer terms than in the English version), reducing the data set to 113 examples • The semantic relations given by OntoLearn are used to translate, e.g. room service → servizio in camera

  31. Conclusions • Questions? • Comments?
