
Learning to Map between Ontologies on the Semantic Web

Learning to Map between Ontologies on the Semantic Web. AnHai Doan, Jayant Madhavan, Pedro Domingos, and Alon Halevy. Databases and Data Mining group, University of Washington. Semantic Web: mark up data on the web using ontologies; enable intelligent information processing over the web.


Presentation Transcript


  1. Learning to Map between Ontologies on the Semantic Web • AnHai Doan, Jayant Madhavan, Pedro Domingos, and Alon Halevy • Databases and Data Mining group, University of Washington

  2. Semantic Web • Mark up data on the web using ontologies • Enable intelligent information processing over the web • Personal software agents • Queries over multiple web pages …

  3. An Example • Query: find Prof. Cook, a professor in a Seattle college, earlier an assoc. professor at his alma mater in Australia • [Figure: two department taxonomies. www.cs.washington.edu: People, Staff, Faculty, Asst. Professor, Assoc. Professor, Professor. www.cs.usyd.edu.au: Staff, Academic, Technical, Lecturer, Senior Lecturer, Professor. Data Instance: Name = James Cook, Education = PhD, U Sydney. Links between the taxonomies are labeled Semantic Mapping.] • Semantic mappings allow information processing across ontologies

  4. Semantic Web: State of the Art • Languages for ontologies: RDF, DAML+OIL, … • Ontology learning and ontology design tools: [Maedche’02], Protégé, Ontolingua, … • Semantic mappings are crucial to the SW vision [Uschold’01, Berners-Lee, et al.’01] • Without semantic mappings: Tower of Babel!

  5. Semantic Mapping Challenges • Ontologies can be very different • Different vocabularies, different design principles • They overlap, but do not coincide • Information available for semantic mapping • Data instances marked up with ontologies • Concept names and taxonomic structure • Constraints on the mapping

  6. Overview • Define Similarity • Compute Similarity • Satisfy Constraints • [Figure: the two example taxonomies (People, Staff, Faculty, Asst. Professor, Assoc. Professor, Professor; Staff, Academic, Technical, Lecturer, Senior Lecturer, Professor), with candidate similarities for Faculty: Sim(Fac,Staff), Sim(Fac,Acad), Sim(Fac,Prof)]

  7. Our Contributions • An automatic solution to taxonomy matching • Handles different similarity notions • Exploits information in data instances and taxonomic structure, using multi-strategy learning • Extends the solution to handle a wide variety of constraints, using Relaxation Labeling • An implementation, our GLUE system, and experiments on real-world taxonomies • High accuracy (68-98%) on large taxonomies (100-330 concepts)

  8. A,S A, S A,S A,S Sim(Assoc. Prof., Snr. Lect.) = P(A,S) P(A  S) = [Jaccard, 1908] P(A,S) + P(A,S) + P(A,S) P(A  S) Defining Similarity Snr. Lecturer Assoc. Prof Multiple Similarity measures in terms of the JPD Hypothetical Common Marked up domain Joint Probability Distribution: P(A,S),P(A,S),P(A,S),P(A,S)

  9. S A S A No common data instances In practice, not easy to find data tagged with both ontologies ! United States Australia Solution: Use Machine Learning

  10. S A S A A S A,S A,S CLA CLS A S A,S A,S A,S A,S A,S A,S Machine Learning for computing similarities JPD estimated by counting the sizes of the partitions United States Australia

  11. Improve Predictive Accuracy – Use Multi-Strategy Learning • A single classifier cannot exploit all available information • Combine the predictions of multiple classifiers • [Figure: base classifiers CLA1 … CLAN each predict A or ¬A; a Meta-Learner combines them into one prediction] • Content Learner: frequencies of different words in the text of the data instances • Name Learner: words used in the names of concepts in the taxonomy • Others …
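One simple way to combine base-learner predictions, as slide 11 describes, is a weighted average of their confidence scores. The slide does not say how the meta-learner combines them, so the weighting scheme, the function name `combine_predictions`, and all numbers below are assumptions for illustration only.

```python
def combine_predictions(scores, weights):
    """Weighted average of per-learner confidence scores that an instance is in A.

    scores: {learner_name: confidence in [0, 1]}
    weights: {learner_name: non-negative weight}; only learners present in
    `scores` contribute.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * scores[name] for name in scores) / total_weight


# Hypothetical scores from a content learner and a name learner
scores = {"content": 0.8, "name": 0.4}
weights = {"content": 0.75, "name": 0.25}

confidence = combine_predictions(scores, weights)
predicted_in_a = confidence >= 0.5  # threshold is an assumption
print(confidence, predicted_in_a)
```

A weighted average lets a learner that is reliable for a given kind of evidence dominate the decision without discarding the others.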

  12. So far… • Define Similarity: Joint Probability Distribution • Compute Similarity: Multi-strategy Learning • Satisfy Constraints

  13. Next Step: Exploit Constraints • Constraints due to the taxonomy structure (parents, children) • Domain-specific constraints • Department-Chair can only map to a unique concept • Numerous constraints of different types • Extended Relaxation Labeling to ontology matching • [Figure: the two example taxonomies (People, Staff, Fac, Asst. Prof, Assoc. Prof, Prof; Staff, Acad, Tech, Lect., Snr. Lect., Prof)]

  14. Solution: Relaxation Labeling • Find the best label assignment given a set of constraints • Start with an initial label assignment • Iteratively improve labels, given constraints • Standard Relaxation Labeling not applicable • Extended in many ways • [Figure: nodes of one example taxonomy labeled with candidate nodes of the other (Staff, Acad, Prof, Snr. Lect., Lect.); undecided labels are marked “?”]
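The iterative idea on slide 14 can be sketched with the standard textbook relaxation-labeling update, in which each node's label probabilities are reweighted by the support they receive from neighboring nodes. This is the standard scheme only, which the slide says had to be extended; the extensions are not shown in the transcript. The compatibility function, the assumed parent link between Acad and Snr. Lect., and the starting probabilities are all hypothetical.

```python
def relaxation_labeling(probs, neighbors, compat, iterations=10):
    """Standard relaxation-labeling update.

    probs: {node: {label: probability}} initial label assignment.
    neighbors: {node: [neighbor nodes]}.
    compat(label, neighbor_label) -> compatibility in [-1, 1].
    """
    for _ in range(iterations):
        updated = {}
        for node, dist in probs.items():
            nbrs = neighbors.get(node, [])
            raw = {}
            for label, p in dist.items():
                # Average support for this label from all neighbors' current labels.
                support = 0.0
                for nbr in nbrs:
                    support += sum(compat(label, other) * q
                                   for other, q in probs[nbr].items())
                if nbrs:
                    support /= len(nbrs)
                raw[label] = p * (1.0 + support)
            norm = sum(raw.values())
            updated[node] = ({l: v / norm for l, v in raw.items()}
                             if norm > 0 else dict(dist))
        probs = updated  # synchronous update: all nodes move together
    return probs


# Assumed constraint: labels that are parent and child in the other taxonomy
# support each other.
linked = {("Acad", "Snr. Lect."), ("Snr. Lect.", "Acad")}
compat = lambda a, b: 1.0 if (a, b) in linked else 0.0

initial = {
    "Fac": {"Acad": 0.9, "Tech": 0.1},
    "Assoc. Prof": {"Snr. Lect.": 0.5, "Tech": 0.5},
}
neighbors = {"Fac": ["Assoc. Prof"], "Assoc. Prof": ["Fac"]}

result = relaxation_labeling(initial, neighbors, compat)
print(result)
```

In this toy run the confident Fac → Acad label pulls the undecided Assoc. Prof node toward Snr. Lect., which is the kind of structural constraint the slide describes.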

  15. Putting it all together: GLUE System • Input: Taxonomy O1 (structure + data instances), Taxonomy O2 (structure + data instances) • Distribution Estimator: base learners CL1 … CLN and a Meta Learner • Joint Distributions: P(A,B), P(A,¬B), … • Similarity Estimator, using a similarity function • Similarity Matrix • Relaxation Labeler, using generic & domain constraints • Output: Mappings for O1, Mappings for O2

  16. Real World Experiments • Taxonomies on the web • University classes (UW and Cornell) • Companies (Yahoo and The Standard) • For each taxonomy • Extracted data instances: course descriptions and company profiles • Trivial data cleaning • 100 – 300 concepts per taxonomy • Taxonomy depth of 3-4 • 10-90 data instances per concept on average • Evaluation against manual mappings as the gold standard

  17. Results • [Chart: results for the University I, University II, and Companies domains]

  18. Related Work • Our LSD schema matching system [Doan, Domingos, Halevy ’01] • GLUE handles taxonomies, richer models, and a much richer set of constraints • Other Ontology and Schema Matching work [Noy, Musen’01], [Melnik, et al.’02], [Ichise, et al.’01] • Mostly heuristics, or single machine learning techniques • Relaxation Labeling for constraint satisfaction [Hummel, Zucker’83], [Chakrabarti, et al.’00] • Significantly extend this approach

  19. Conclusions & Future Work • An automated solution to taxonomy matching • Handles multiple notions of similarity • Exploits data instances and taxonomy structure • Incorporates generic and domain-specific constraints • Produces high accuracy results • Future Work • More expressive models • Complex Mappings • Automated reasoning about mappings between models
