
Ryutaro Ichise Principles of Informatics Research Division, National Institute of Informatics

Machine Learning Approach for Ontology Mapping using Multiple Concept Similarity Measures IEEE/ACIS International Conference on Computer and Information Science. Ryutaro Ichise Principles of Informatics Research Division, National Institute of Informatics Speaker : Shun-hong Sie


Presentation Transcript


  1. Machine Learning Approach for Ontology Mapping using Multiple Concept Similarity Measures • IEEE/ACIS International Conference on Computer and Information Science • Ryutaro Ichise, Principles of Informatics Research Division, National Institute of Informatics • Speaker: Shun-hong Sie • Date: 2009/10/19

  2. Outline • Introduction • Ontology Mapping Problem • Ontology Mapping as a Machine Learning Problem • Experimental Evaluation • Conclusions

  3. Introduction • People use the internet to collect information as a decision-making tool. • To make vacation plans, users research suitable lodging, routes, and sightseeing spots on the internet. • Internet sites are operated by individual enterprises, which means that we have to check each site manually in order to collect information.

  4. Introduction • To solve this problem, the Semantic Web is expected to become a next-generation web standard capable of connecting different data resources. • Ontologies provide the semantics of the data so that the resources can interoperate.

  5. The problem • Since each ontology covers a particular domain or use, a method for mapping multiple ontologies is needed in order to increase coverage across different domains or uses.

  6. Ontology Mapping Problem • Definition • The ontology O contains a set of concepts that are organized into a hierarchy. Each concept is labeled by strings and can contain instances.

  7. Ontology Mapping Problem • How to find a concept in ontology B that corresponds to the concept in ontology A? • (Figure legend: Concept, Instance)

  8. Ontology Mapping Problem • To solve this problem, we consider combinations of concepts from the different ontologies. • The task is to define a value for each concept pair. • These values are arranged in a concept pair matrix.

  9. Ontology Mapping Problem • The value is 1 when the two concepts can be mapped and 0 when the two concepts cannot be mapped. • In this paper, we assume that the matrix value is binary.
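
The binary concept pair matrix described on slides 8 and 9 can be sketched as follows. This is a minimal illustration, not the presenter's code: the concept names and the set of known mappings are made up for the example.

```python
# Hypothetical concept labels from two ontologies (made up for illustration).
concepts_a = ["Food_Wine", "Travel", "Science"]
concepts_b = ["Wine", "Tourism"]

# Assumed set of concept pairs that can be mapped.
mapped = {("Food_Wine", "Wine"), ("Travel", "Tourism")}

# matrix[i][j] is 1 when concepts_a[i] can be mapped to concepts_b[j], else 0.
matrix = [[1 if (a, b) in mapped else 0 for b in concepts_b]
          for a in concepts_a]

for a, row in zip(concepts_a, matrix):
    print(a, row)
```

The mapping task then amounts to predicting each cell of this matrix when the set of known mappings is not available.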

  10. Ontology Mapping as a Machine Learning Problem • What type of information is available to compose the matrix? • String-matching methods, such as concept name matching, and other methods. • A single similarity measure is insufficient for determining the matrix.

  11. Ontology Mapping as a Machine Learning Problem • Diversity of ontologies.

  12. Ontology Mapping as a Machine Learning Problem • Define matrix values by using multiple similarity values of the concepts. • Therefore, we can convert the ontology mapping problem into a machine learning problem by using this framework.
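
One way to picture this conversion: each concept pair becomes a vector of similarity values, and the matrix value becomes the class label. The sketch below assumes two placeholder measures (`exact_match` and `char_jaccard`) that are not from the talk; they only stand in for the measures listed on the following slides. A classifier such as an SVM would then be trained on `training_set`.

```python
from typing import Callable, List, Tuple

Measure = Callable[[str, str], float]

def exact_match(a: str, b: str) -> float:
    """Placeholder measure: 1.0 if the labels are equal ignoring case."""
    return 1.0 if a.lower() == b.lower() else 0.0

def char_jaccard(a: str, b: str) -> float:
    """Placeholder measure: Jaccard overlap of the character sets."""
    sa, sb = set(a.lower()), set(b.lower())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def feature_vector(a: str, b: str, measures: List[Measure]) -> List[float]:
    """One similarity value per measure for the concept pair (a, b)."""
    return [m(a, b) for m in measures]

measures: List[Measure] = [exact_match, char_jaccard]

# Hypothetical labeled pairs: 1 = can be mapped, 0 = cannot be mapped.
labeled_pairs: List[Tuple[str, str, int]] = [
    ("Wine", "wine", 1),
    ("Wine", "Travel", 0),
]

training_set = [(feature_vector(a, b, measures), y) for a, b, y in labeled_pairs]
```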

  13. Concept Similarity Measures • String-based similarity • Graph-based similarity • Instance classification similarity • Knowledge-based similarity

  14. Concept Similarity Measures • In this paper • Word similarity • Word list similarity • Concept hierarchy similarity • Structure similarity

  15. Word similarity • Prefix • Eng. vs. England • Suffix • Phone vs. telephone • Edit distance • Calculates the similarity from the number of character substitutions, deletions, and insertions needed to turn one string into the other. • N-gram • Each word is divided into substrings of length n, and the similarity is calculated from the number of substrings the two words share.
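
The four string-based measures above can be sketched in plain Python. This is an illustrative version under assumptions: the talk does not say how the raw counts are normalized, so the functions return unnormalized values, and the function names are hypothetical.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the shared prefix, e.g. 'eng' vs. 'england'."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def common_suffix_len(a: str, b: str) -> int:
    """Length of the shared suffix, e.g. 'phone' vs. 'telephone'."""
    return common_prefix_len(a[::-1], b[::-1])

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, deletions, and insertions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def ngrams(word: str, n: int = 3) -> set:
    """Set of all substrings of length n."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def shared_ngrams(a: str, b: str, n: int = 3) -> int:
    """Number of distinct n-grams that both words contain."""
    return len(ngrams(a, n) & ngrams(b, n))

print(common_prefix_len("eng", "england"))      # 3
print(common_suffix_len("phone", "telephone"))  # 5
print(edit_distance("kitten", "sitting"))       # 3
print(shared_ngrams("phone", "telephone"))      # 3
```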

  16. Word similarity • Knowledge-based similarity • Use WordNet (http://wordnet.princeton.edu/) as the knowledge resource for calculating the similarity. • WordNet is organized with synsets. • Synset • Wu & Palmer • Description • Lin

  17. WordNet search result

  18. WordNet search result

  19. Knowledge-based similarity • Synset • Calculates the shortest path between the two words of a pair through the synsets. • Wu & Palmer • Uses the depth of the words and of their least common super-concept (LCS).
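
Both measures can be illustrated on a toy taxonomy. The sketch below does not use WordNet: the `parent` table is a made-up hierarchy, the root is assumed to have depth 1, and the Wu & Palmer score is computed with the usual formula 2 · depth(LCS) / (depth(a) + depth(b)).

```python
# Toy is-a hierarchy (hypothetical, standing in for WordNet synsets).
parent = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
}

def path_to_root(concept: str) -> list:
    """Concepts from the given one up to the root, inclusive."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = parent[concept]
    return path

def depth(concept: str) -> int:
    """Depth in the hierarchy, counting the root as depth 1 (assumption)."""
    return len(path_to_root(concept))

def lcs(a: str, b: str) -> str:
    """Least common super-concept: deepest ancestor shared by a and b."""
    ancestors_b = set(path_to_root(b))
    for c in path_to_root(a):
        if c in ancestors_b:
            return c
    raise ValueError("concepts share no ancestor")

def shortest_path(a: str, b: str) -> int:
    """Number of edges between a and b through their LCS."""
    d = depth(lcs(a, b))
    return (depth(a) - d) + (depth(b) - d)

def wu_palmer(a: str, b: str) -> float:
    """Wu & Palmer similarity from the depths of a, b, and their LCS."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))

print(shortest_path("dog", "cat"), wu_palmer("dog", "cat"))
print(shortest_path("dog", "plant"), wu_palmer("dog", "plant"))
```

In this toy hierarchy "dog" and "cat" share the deeper ancestor "animal", so they score higher than "dog" and "plant", which only meet at the root.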

  20. Knowledge-based similarity • Description • Uses the description of a concept in WordNet. The similarity is calculated as the square of the length of the words common to the descriptions of both words. • Lin

  21. Word List Similarity • Word similarity measures are not directly applicable to a word list such as “Food_Wine.” • We define two types of similarities: • Maximum word similarity • Word edit distance • Examples: “Pyramid” vs. “Pyramid, Theory”: edit distance = 1 • “Social, Sci.” vs. “Social, Science”: edit distance = ?
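
The two word-list measures can be sketched as below. Several details are assumptions rather than the talk's definitions: labels are split on underscores, commas, and whitespace; the word-level measure `word_sim` is a simple normalized edit distance chosen for illustration; and the word edit distance treats two words as equal only if they match exactly (ignoring case).

```python
import re

def edit_distance(a, b) -> int:
    """Levenshtein distance over any two sequences (strings or word lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def split_words(label: str) -> list:
    """Split a concept label such as 'Food_Wine' into lowercase words."""
    return [w.lower() for w in re.split(r"[_\s,]+", label) if w]

def word_sim(a: str, b: str) -> float:
    """Illustrative word similarity: 1 minus the normalized edit distance."""
    longest = max(len(a), len(b))
    return 1.0 - edit_distance(a, b) / longest if longest else 1.0

def max_word_similarity(label_a: str, label_b: str) -> float:
    """Best word similarity over all word pairs of the two lists."""
    return max((word_sim(x, y)
                for x in split_words(label_a)
                for y in split_words(label_b)), default=0.0)

def word_edit_distance(label_a: str, label_b: str) -> int:
    """Edit distance where whole words, not characters, are the units."""
    return edit_distance(split_words(label_a), split_words(label_b))

print(max_word_similarity("Food_Wine", "Wine"))
print(word_edit_distance("Pyramid", "Pyramid_Theory"))
```

With these assumptions, “Pyramid” and “Pyramid_Theory” differ by one inserted word, which agrees with the edit distance of 1 given on the slide.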

  22. Concept Hierarchy Similarity • (Figure: concept hierarchies of Ontology A and Ontology B)

  23. Experimental Evaluation • Ontology Alignment Evaluation Initiative (OAEI) 2007 data set. • Google, Yahoo, LookSmart • Includes 4639 pairs of ontologies written in OWL format: 2265 pairs are correct matches (positive examples) and 2374 pairs are incorrect matches (negative examples). • Uses 10-fold cross-validation. • Implemented a system called Malform-SVM (Machine learning framework for Ontology Matching using SVM)
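
For readers unfamiliar with the protocol, 10-fold cross-validation over the 4639 labeled pairs can be sketched as an index split. This is a generic illustration, not the evaluation code used in the talk; the function name, the shuffling, and the seed are assumptions.

```python
import random

def k_fold_indices(n: int, k: int = 10, seed: int = 0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# 4639 concept pairs, as in the data set described above.
splits = list(k_fold_indices(4639, k=10))
for train, test in splits:
    print(len(train), len(test))
```

Each pair appears in exactly one test fold, so every example is used for testing once and for training nine times.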

  24. Experimental Results

  25. Experimental Results

  26. Conclusions • Using multiple similarity measures. • Future work: investigate the trade-off between the man-hours required to create mapping examples and the resulting improvement in system performance.
