
Knowledge Discovery in Ontology Learning


vega

Presentation Transcript


  1. Knowledge Discovery in Ontology Learning A survey

  2. Outline • Introduction • OL Data Input • OL Application Fields • OL Methods • OL Tools (practical session)

  3. Introduction • Ontology Engineering is a time-consuming task • Ontology Learning (OL) is the semi-automatic process supporting ontology engineering • OL is a bottom-up, data-driven process • OL is an interdisciplinary field

  4. OL Data Input • Pure NL text • Ontologies • KB (DB) instances • Schemata • DB schemata • Web schemata • Log files

  5. OL Application Fields • OL can support Ontology Engineering (and management) in different phases. • Ontology extraction: based on some input, the ontology engineer gets an ontology proposal. • Ontology reuse: pruning existing domain ontologies for a specific application. • Ontology interoperability (multiple ontology management): mapping discovery.

  6. OL Methods (outline) • Ontology Extraction (from text) • Weak ontology notion • Document Ontology extraction • Strong ontology notion • Association rules • Conceptual clustering • Ontology Reuse • Ontology Pruning • Ontology Learning for interoperability

  7. Document Ontology extraction (1) • Extraction of concepts from a set of documents and identification of relationships between these concepts with different individual terms [3] • No extraction of semantic relations • Only extraction of concepts (aggregation of terms identified with the same concept) • Use of statistical analysis over a set of documents • Good for domain-specific applications

  8. Document Ontology extraction (2) • Input (text documents) • Pre-processing • Normalization • LSI (using SVD) • Document Ontology Construction
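The first steps of this pipeline can be sketched as building a term-by-document count matrix, the usual input to LSI. This is a minimal illustration only: the stop-word list, the tokenizer and the two sample sentences are assumptions, not taken from the slides or from [3].

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "by", "to", "and", "in", "is"}

def tokenize(text):
    """Normalize: lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def term_document_matrix(documents):
    """Return (terms, matrix) where matrix[i][j] counts term i in document j."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    terms = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in terms]
    return terms, matrix

# Hypothetical input documents.
docs = ["The father travels by car", "The neighbor travels by train"]
terms, matrix = term_document_matrix(docs)
```

Each row of `matrix` corresponds to one term and each column to one document, which matches the m x n layout used for the decomposition on the next slide.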

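  9. Document Ontology extraction (3) Singular Value Decomposition: $A = U S V^{T}$, where $A$ is the $m \times n$ term-by-document matrix, $U$ is $m \times r$ (terms by concepts), $S$ is $r \times r$, and $V^{T}$ is $r \times n$ (concepts by documents).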
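In LSI the decomposition is normally truncated to the $k$ largest singular values, so that each term and each document is described by $k$ concept dimensions. The slide does not state this step; the rank $k$ below is an assumed parameter chosen by the user.

```latex
A \approx A_k = U_k S_k V_k^{T},
\qquad U_k \in \mathbb{R}^{m \times k},\quad
S_k \in \mathbb{R}^{k \times k},\quad
V_k^{T} \in \mathbb{R}^{k \times n},\quad
k \le r
```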

  10. Association Rules (1) • Make use of shallow text processing techniques [6] • No taxonomic relation • Assumption: syntactic relations ⇒ semantic relations

  11. Association Rules (2) • Preprocess the text documents • Morphological analysis • Recognition of named entities • Retrieval of domain-specific concepts (if available) • Disambiguation using context information • Determine the set of Concept Pairs (CP) using several heuristics (either general or domain-dependent) • NP-PP heuristic • Sentence heuristic • Title heuristic

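  12. Association Rules (3) • Determine $T = \{\{a_{i,1}, \ldots, a_{i,n}\} \mid (a_{i,1}, a_{i,2}) \in CP \wedge \forall m > 2\, ((a_{i,1}, a_{i,m}) \in H \vee (a_{i,2}, a_{i,m}) \in H)\}$ • Determine support and confidence for all association rules $X_k \Rightarrow Y_k$, where $|X_k| = |Y_k| = 1$: $\mathrm{support}(X_k \Rightarrow Y_k) = \frac{|\{t_i \mid X_k \cup Y_k \subseteq t_i\}|}{n}$ and $\mathrm{confidence}(X_k \Rightarrow Y_k) = \frac{|\{t_i \mid X_k \cup Y_k \subseteq t_i\}|}{|\{t_i \mid X_k \subseteq t_i\}|}$ • Propose to the user only the rules that exceed user-defined thresholds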
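Support and confidence for single-concept rules can be sketched as follows. This is a minimal illustration, assuming transactions are plain Python sets of concept names; the sample transactions and threshold values are hypothetical and the taxonomy H is not modelled.

```python
from itertools import permutations

def support(transactions, x, y):
    """Fraction of all transactions that contain both x and y."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if x in t and y in t) / len(transactions)

def confidence(transactions, x, y):
    """Among the transactions containing x, the fraction that also contain y."""
    with_x = [t for t in transactions if x in t]
    if not with_x:
        return 0.0
    return sum(1 for t in with_x if y in t) / len(with_x)

def propose_rules(transactions, min_support, min_confidence):
    """Return the rules (x, y), read as x => y, that exceed both thresholds."""
    items = sorted(set().union(*transactions))
    return [
        (x, y)
        for x, y in permutations(items, 2)
        if support(transactions, x, y) > min_support
        and confidence(transactions, x, y) > min_confidence
    ]

# Hypothetical transactions (sets of concepts), for illustration only.
transactions = [
    {"hotel", "room"},
    {"hotel", "room", "price"},
    {"hotel", "city"},
    {"city", "museum"},
]
rules = propose_rules(transactions, min_support=0.4, min_confidence=0.6)
```

With these sample values only the rules between "hotel" and "room" pass both thresholds, in both directions; every other pair co-occurs in a single transaction and falls below the support threshold.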

  13. Conceptual Clustering (1) • Use of a conceptual clustering approach [2,5] to extract a hierarchy of concepts and to learn subcategorization frames • In our case, the examples to cluster are sets of words, each associated with the frequency of the corresponding instantiated frame in the corpora • A syntactic parser provides parsed sentences, including attachments of noun phrases to verbs and clauses, e.g. <to travel> <subject: father> <by: car>; <to travel> <subject: neighbor> <by: train>; <to drive> <subject: friend> <by: car>; <to drive> <subject: colleague> <by: motor-bike>; <to drive> <subject: friend> <by: motor-bike> • Unambiguously parsed sentences are not required; noise is taken into account • The meaning of the concepts of the ontology is characterized by the subcategorization frames they appear in

  14. Conceptual Clustering (2) E.g., the instantiated frames <to travel> <subject: father> <by: car>; <to travel> <subject: neighbor> <by: train>; <to drive> <subject: friend> <by: car>; <to drive> <subject: colleague> <by: motor-bike>; <to drive> <subject: friend> <by: motor-bike> are aggregated, with frequencies, into <to travel> <subject: [father(1), neighbor(1)]> <by: [car(1), train(1)]>; <to drive> <subject: [friend(2), colleague(1)]> <by: [car(1), motor-bike(2)]>, which are then generalized to <to travel> <subject: human> <by: motorized vehicle>; <to drive> <subject: human> <by: motorized vehicle>
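The aggregation step of this example (from individual frames to word lists with frequencies) can be sketched as below. The data structure, a tuple of verb and role dictionary, is an assumption made for illustration; the generalization to "human" and "motorized vehicle" is not covered.

```python
from collections import Counter, defaultdict

# Instantiated frames from the example, encoded as (verb, {role: head word}).
frames = [
    ("travel", {"subject": "father", "by": "car"}),
    ("travel", {"subject": "neighbor", "by": "train"}),
    ("drive", {"subject": "friend", "by": "car"}),
    ("drive", {"subject": "colleague", "by": "motor-bike"}),
    ("drive", {"subject": "friend", "by": "motor-bike"}),
]

def aggregate(frames):
    """Group head words by (verb, role) and count how often each one occurs."""
    clusters = defaultdict(Counter)
    for verb, roles in frames:
        for role, word in roles.items():
            clusters[(verb, role)][word] += 1
    return clusters

clusters = aggregate(frames)
```

Each entry such as `clusters[("drive", "by")]` is one basic cluster: a set of words with the frequency of the frame slot they fill.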

  15. Conceptual Clustering (3) Clusters that have maximum overlap (that is, clusters that contain the same words with the same frequencies) have to be merged.
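The slide gives no formula for overlap. As one possible choice, assumed here and not taken from [2,5], a weighted Jaccard score reaches 1.0 exactly when two clusters hold the same words with the same frequencies.

```python
from collections import Counter

def overlap(c1, c2):
    """Weighted Jaccard overlap of two word-frequency clusters (assumed measure)."""
    words = set(c1) | set(c2)
    shared = sum(min(c1[w], c2[w]) for w in words)
    total = sum(max(c1[w], c2[w]) for w in words)
    return shared / total if total else 0.0

def merge(c1, c2):
    """Merge two clusters by adding their word frequencies."""
    return c1 + c2

# Clusters taken from the travel/drive example.
travel_by = Counter({"car": 1, "train": 1})
drive_by = Counter({"car": 1, "motor-bike": 2})
merged = merge(travel_by, drive_by)
```

A clustering loop would repeatedly merge the pair of clusters with the highest score; that loop is left out of this sketch.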

  16. Ontology Pruning • Ontology pruning is a data-driven means to reuse existing (general) ontologies in order to tailor them to a certain domain [4] • The approach uses data-oriented techniques that are based on word/concept frequencies • The idea is to compare the frequencies of words/concepts in two different corpora, one domain-specific and one generic • Words/concepts whose frequencies in the domain-specific corpus exceed their frequencies in the generic corpus by a certain percentage are accepted; the others are rejected
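The frequency comparison can be sketched as follows. This is an illustration under stated assumptions: relative frequencies are used, the threshold is expressed as a multiplicative `ratio`, and the token lists are made up; the exact criterion used in [4] may differ.

```python
from collections import Counter

def relative_frequencies(tokens):
    """Map each token to its share of the corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def prune(concepts, domain_tokens, generic_tokens, ratio=1.5):
    """Keep concepts that are more than `ratio` times as frequent in the domain corpus."""
    dom = relative_frequencies(domain_tokens)
    gen = relative_frequencies(generic_tokens)
    kept = set()
    for concept in concepts:
        d = dom.get(concept, 0.0)
        g = gen.get(concept, 0.0)
        # Concepts that never occur in the domain corpus are always rejected.
        if d > 0 and d > ratio * g:
            kept.add(concept)
    return kept

# Hypothetical corpora, already tokenized and mapped to concept names.
domain_tokens = ["hotel", "hotel", "room", "city", "thing"]
generic_tokens = ["thing", "thing", "city", "hotel", "car"]
concepts = {"hotel", "room", "city", "thing", "car"}
kept = prune(concepts, domain_tokens, generic_tokens)
```

In this toy run "hotel" and "room" are kept, while "city", "thing" and "car" are pruned because they are not sufficiently over-represented in the domain corpus.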

  17. OL for Interoperability (1) • The key challenge here is to find semantic mappings between similar elements from two ontologies [1] • First problem: how can we define a meaningful similarity measure? • Second problem: how can we compute such a measure using the available data? • An assumption here is that instances are available that can be used to learn concepts

  18. OL for Interoperability (2) • Similarity Measure • Many definitions are possible (it is task dependent) • Many similarity measures are based on the joint probability distribution of A and B: $P(A, B)$, $P(\neg A, B)$, $P(A, \neg B)$, $P(\neg A, \neg B)$ • Jaccard coefficient: $JC(A, B) = \frac{P(A \cap B)}{P(A \cup B)} = \frac{P(A, B)}{P(A, B) + P(\neg A, B) + P(A, \neg B)}$
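The Jaccard coefficient follows directly from three entries of the joint distribution. A minimal sketch, with made-up probability values:

```python
def jaccard(p_ab, p_not_a_b, p_a_not_b):
    """JC(A, B) = P(A, B) / (P(A, B) + P(not A, B) + P(A, not B))."""
    denominator = p_ab + p_not_a_b + p_a_not_b
    # If neither concept has any probability mass, report zero similarity.
    return p_ab / denominator if denominator else 0.0

# Hypothetical joint probabilities; the remaining mass is P(not A, not B).
similarity = jaccard(p_ab=0.25, p_not_a_b=0.125, p_a_not_b=0.125)
```

The score is 1 when A and B cover exactly the same instances and 0 when they are disjoint; $P(\neg A, \neg B)$ does not enter the formula.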

  19. OL for Interoperability (3) • Distribution estimator • We assume we have a set of instances that is representative of the universe covered by the ontology • $N(U_i^{A,B})$ is the number of instances of the $i$-th ontology that belong to both A and B • $P(A, B) = \frac{N(U_1^{A,B}) + N(U_2^{A,B})}{N(U_1) + N(U_2)}$ • Problem: what if A and B do not belong to the same ontology? (This is our case!)
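The estimator is a ratio of instance counts. The sketch below assumes, for illustration only, that each instance is a set of concept labels and that membership in both A and B is already known for every instance; this is precisely what is missing when A and B come from different ontologies.

```python
def estimate_joint(u1, u2, in_a, in_b):
    """Estimate P(A, B) = [N(U1^{A,B}) + N(U2^{A,B})] / [N(U1) + N(U2)]."""
    n_ab = sum(1 for x in u1 if in_a(x) and in_b(x))
    n_ab += sum(1 for x in u2 if in_a(x) and in_b(x))
    return n_ab / (len(u1) + len(u2))

# Hypothetical instances, each represented by the set of concepts it belongs to.
u1 = [{"A", "B"}, {"A"}, {"B"}, set()]
u2 = [{"A", "B"}, {"A", "B"}, {"B"}, set()]

p_ab = estimate_joint(u1, u2, in_a=lambda x: "A" in x, in_b=lambda x: "B" in x)
```

Here three of the eight instances belong to both concepts, so the estimate is 3/8.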

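  20. OL for Interoperability (4) [Figure: the instances t1 to t7 of the first ontology are split into $U_1^{A}$ and $U_1^{\neg A}$ and used to train a learner L for concept A; the trained learner L is then applied to the instances s1 to s6 of the second ontology, already divided into $U_2^{B}$ and $U_2^{\neg B}$, yielding $U_2^{A,B}$, $U_2^{\neg A,B}$, $U_2^{A,\neg B}$ and $U_2^{\neg A,\neg B}$.]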
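The idea of this slide can be sketched in a few lines: train a classifier for concept A on the instances of the first ontology, then use it to split the instances of the second ontology, whose membership in B is already known. The word-vote learner and the sample instances below are hypothetical stand-ins; [1] uses real learners, not this toy.

```python
from collections import Counter

def train(positive, negative):
    """Train a toy word-vote learner for A from U1^A (positive) and U1^notA (negative)."""
    pos = Counter(w for inst in positive for w in inst.split())
    neg = Counter(w for inst in negative for w in inst.split())

    def is_a(instance):
        # Each word votes for A as often as it was seen in A, against otherwise.
        return sum(pos[w] - neg[w] for w in instance.split()) > 0

    return is_a

def partition(u2_b, u2_not_b, is_a):
    """Split U2 into the four sets needed for the joint distribution of A and B."""
    return {
        "A,B": [s for s in u2_b if is_a(s)],
        "notA,B": [s for s in u2_b if not is_a(s)],
        "A,notB": [s for s in u2_not_b if is_a(s)],
        "notA,notB": [s for s in u2_not_b if not is_a(s)],
    }

# Hypothetical instances (short text descriptions), for illustration only.
u1_a = ["associate professor", "assistant professor"]
u1_not_a = ["graduate student", "office staff"]
u2_b = ["full professor", "visiting professor"]
u2_not_b = ["phd student", "admin staff"]

parts = partition(u2_b, u2_not_b, train(u1_a, u1_not_a))
```

The sizes of the four lists give the counts $N(U_2^{A,B})$, $N(U_2^{\neg A,B})$ and so on. Repeating the procedure with a learner for B trained on the second ontology and applied to the first gives the corresponding counts for $U_1$.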

  21. OL Tools (KAON) • http://kaon.semanticweb.org • Open Source • Java based • Implements a modular framework • Text2Onto, module for OL from text (association rules, see Association Rules (1)) • Ontology Pruning implemented (simple filter on TF)

  22. References [1] A. Doan, J. Madhavan, P. Domingos, A. Halevy. Learning to map between ontologies on the Semantic Web. In Proc. of the 11th International World Wide Web Conference (WWW 2002), Hawaii, USA, May 2002. [2] D. Faure, C. Nedellec. A corpus-based conceptual clustering method for verb frames and ontology acquisition. In Proc. of the 1st International Conference on Language Resources and Evaluation, Workshop on Adapting Lexical and Corpus Resources to Sublanguages and Applications, Granada, Spain, pages 1-8, 1998. [3] G. R. Maddi, C. S. Velvadapu, S. Srivastava, J. Gil de Lamadrid. Ontology extraction from text documents by Singular Value Decomposition. [4] A. Maedche, R. Volz, R. Studer, B. Lauser. Pruning-based identification of a domain in ontologies. In Proc. of I-KNOW'03, Graz, Austria, July 2003. [5] A. Maedche, V. Zacharias. Ontology-based instance clustering. In Proc. of ECML/PKDD. Springer, 2002. [6] A. Maedche, S. Staab. Discovering conceptual relations from text. In Proc. of ECAI-2000.
