Solving Some Text Mining Problems with Conceptual Graphs

Tula State University, Faculty of Cybernetics, Laboratory of Information Systems. M. Bogatyrev, V. Tuhtin. 2008.


Presentation Transcript


  1. Solving Some Text Mining Problems with Conceptual Graphs
     M. Bogatyrev, V. Tuhtin
     Tula State University, Faculty of Cybernetics, Laboratory of Information Systems
     2008

  2. The Nature of Text Mining
     Data mining:
     • "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data" [1]
     • "the science of extracting useful information from large data sets or databases" [2]
     Text mining (also called text data mining or text analytics): the process of deriving high-quality information from text.
     Text mining is interdisciplinary:
     • information retrieval
     • machine learning
     • statistics
     • computational linguistics
     Sources for this slide:
     [1] W. Frawley, G. Piatetsky-Shapiro, C. Matheus (Fall 1992). Knowledge Discovery in Databases: An Overview. AI Magazine, pp. 213–228.
     [2] D. Hand, H. Mannila, P. Smyth (2001). Principles of Data Mining. MIT Press, Cambridge, MA.

  3. (Diagram relating four fields: Computational (Corpora) Linguistics, Text Mining, Natural Language Processing, Knowledge Discovery.)
     Global problems:
     • Analysis of syntax, grammar, morphology, semantics
     • Text categorization, text clustering, concept/entity extraction, sentiment analysis, document summarization
     Problems:
     • Annotation, abstraction, ontologies, semantic roles
     • Objects of tagging
     • Clusters, trends, associations, deviations
     Processing objects:
     • Knowledge models: rules; ontologies
     Metadata:
     • Corpora: large and structured text; tagging
     Data: plain text

  4. Concept Representations. Conceptual Graph Standard by J. Sowa [8]
     Example: “John is going to Boston by bus”
     (Figure: the conceptual graph of the example sentence, with its conceptual relations labelled.)
     1. Conceptual Graph Interchange Form (CGIF):
        [City*a:'Boston'][Bus*b:''][Person*c:'John'][Going*d:''](agent?d?c)(dest?d?a)(instrument?d?b)
     2. XML form:
        <graph id="35979486054" owner="0"> <type><label>Proposition</label> </type><layout> <rectangle x="0.0" y="0.0" width="1500.0" height="1500.0"/> <color foreground="0,0,175" background="0,0,175"/> </layout>…</layout></arrow></graph> </conceptualgraph>
     Applying predicate calculus (CGIF + NOTIO)
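
The CGIF string on slide 4 can be read mechanically into concepts and relations. The following is a minimal sketch that handles only the exact notation shown on the slide (`[Type*var:'referent']` and `(relation?var?var)`); it is not a full CGIF parser, and the function name `parse_cgif` is hypothetical.

```python
import re

# The example from slide 4: "John is going to Boston by bus".
CGIF = ("[City*a:'Boston'][Bus*b:''][Person*c:'John'][Going*d:'']"
        "(agent?d?c)(dest?d?a)(instrument?d?b)")

# [Type*var:'referent'] -> a concept node with a defining variable.
CONCEPT_RE = re.compile(r"\[(\w+)\*(\w+):'([^']*)'\]")
# (relation?var1?var2...) -> a relation node referring to bound variables.
RELATION_RE = re.compile(r"\((\w+)((?:\?\w+)+)\)")


def parse_cgif(text):
    """Return (concepts, relations) for the slide's CGIF notation only."""
    concepts = {}
    for type_label, var, referent in CONCEPT_RE.findall(text):
        # An empty referent ('') means a generic concept; store it as None.
        concepts[var] = (type_label, referent or None)
    relations = []
    for name, args in RELATION_RE.findall(text):
        # "?d?c" -> ("d", "c")
        relations.append((name, tuple(args.lstrip("?").split("?"))))
    return concepts, relations


concepts, relations = parse_cgif(CGIF)
for name, (source, target) in relations:
    print(name, concepts[source][0], "->", concepts[target][0])
```

Each relation is stored with its variables in the order written in the string, so `(agent?d?c)` links the `Going` concept to the `Person` concept.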

  5. Conceptual Graphs in Digital Libraries
     Supporting CGs in Digital Libraries (DLs):
     • Building and storing CGs
     • Automated building of CGs
     • Organizing access to CGs in the datastore
     Solving applied problems with CGs:
     • Automated building and developing of catalogues and rubricators of DLs
     • KDD problems

  6. Supporting Conceptual Graphs: Building and Storing CGs
     • The DL contains scientific papers.
     • Only abstracts are transformed to CGs.
     Standard way of building a CG:
     • The sentences are marked with part-of-speech tags.
     • Some titles and sentences from abstracts are filtered.
     • The selected sentences are parsed to obtain their syntactic trees.
     • Each syntactic tree is traversed, and the canonical conceptual graphs related to its nodes are joined.
     Lexical restrictions are needed. Semantic Role Labelling helps to create conceptual relations in CGs.
     Resources: http://framenet.icsi.berkeley.edu/ http://wordnet.princeton.edu/

  7. Semantic Role Labelling for CGs Building
     Example sentences:
     • “The working of a genetic algorithm is usually explained by the search for superior building blocks.”
     • “John is going to Boston by bus”
     Demo: http://l2r.cs.uiuc.edu/~cogcomp/srl-demo.php

  8. Conceptual Graphs in Some Text Mining Problems
     1. Building Association Rules
     • Initial set: a set of CGs
     • Generalization for concepts; disjoin for relations
     • Transactional set T; subsets represented in T
     • Set of generalized CGs
     • Association rule; association rule on CGs, characterized by its support and its confidence
     (The symbols and formulas of this slide are not preserved in the transcript.)
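
Slide 8 names support and confidence but its formulas are not preserved. As a minimal sketch, the textbook definitions can be applied to a transactional set whose items are generalized CGs. The labels `g1`, `g2`, `g3` and the four transactions below are made up for illustration, and treating each generalized CG as an opaque label is an assumption, not the authors' method.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(X and Y together) / support(X) for the rule X -> Y."""
    both = support(frozenset(antecedent) | frozenset(consequent), transactions)
    base = support(antecedent, transactions)
    return both / base if base else 0.0


# Hypothetical transactional set T: each transaction lists the generalized
# CGs (here just labels) found in one document abstract.
T = [
    frozenset({"g1", "g2"}),
    frozenset({"g1", "g3"}),
    frozenset({"g1", "g2", "g3"}),
    frozenset({"g3"}),
]

print(support({"g1", "g2"}, T))       # share of abstracts with both g1 and g2
print(confidence({"g1"}, {"g2"}, T))  # how often g2 accompanies g1
```

In a real setting the items would be generalized graphs compared by graph matching rather than by label equality.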

  9. Conceptual Graphs in Some Text Mining Problems
     2. Building Ontologies by Aggregation of CGs
     Supporting contexts:
     • with CGs
     • with corpora
     In analyzing the ambiguities, Wittgenstein developed his theory of language games, which allow words to have different senses in different contexts, applications, or modes of use.

  10. Solving Text Mining Problems by CGs Clustering
      • CGs hierarchy
      • CGs contexts problem
      • CGs similarity problem
      ? Clustering algorithm for specific similarity measures

  11. Conceptual Graphs Clustering: Similarity Measures
      • Conceptual similarity
      • Relational similarity
      • Some modifications of similarity measures
      • Unified similarity measure
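
The formulas behind slide 11 are not preserved in the transcript. The sketch below shows one common way to define such measures, using Dice coefficients over shared concepts and shared relations and then combining them multiplicatively. Everything here is an assumption for illustration: the dictionary representation of a CG, exact-label matching (no type hierarchy), the simplified relational denominator, the weights `a` and `b`, and the second example sentence.

```python
def dice(x, y):
    """Dice coefficient of two sets: 2|x & y| / (|x| + |y|)."""
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def conceptual_similarity(g1, g2):
    # Share of concepts the two graphs have in common.
    return dice(g1["concepts"], g2["concepts"])


def relational_similarity(g1, g2):
    # Share of (relation, source, target) triples the graphs have in common.
    return dice(g1["relations"], g2["relations"])


def unified_similarity(g1, g2, a=0.5, b=0.5):
    # Assumed combination: relations only matter when concepts overlap.
    return conceptual_similarity(g1, g2) * (a + b * relational_similarity(g1, g2))


# "John is going to Boston by bus" (from the slides).
by_bus = {
    "concepts": {"Person", "Going", "City", "Bus"},
    "relations": {("agent", "Going", "Person"), ("dest", "Going", "City"),
                  ("instrument", "Going", "Bus")},
}
# Hypothetical variant: "John is going to Boston by train".
by_train = {
    "concepts": {"Person", "Going", "City", "Train"},
    "relations": {("agent", "Going", "Person"), ("dest", "Going", "City"),
                  ("instrument", "Going", "Train")},
}

print(conceptual_similarity(by_bus, by_train))
print(relational_similarity(by_bus, by_train))
print(unified_similarity(by_bus, by_train))
```

Because the measures only need to return a number for a pair of graphs, any of them can be plugged into a clustering algorithm without changing the algorithm itself.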

  12. Genetic Algorithm: Specific Features of Solutions
      (Figure panels: Ackley test function; fitness function trajectories; initial population; final population.)
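
Slide 12 only names the Ackley test function. For reference, here is a minimal sketch of its standard form with the usual constants (a = 20, b = 0.2, c = 2π); whether the authors used exactly this variant and these constants is an assumption.

```python
import math


def ackley(x, a=20.0, b=0.2, c=2 * math.pi):
    """Standard Ackley function for a point x given as a sequence of floats.

    It has many local minima and a single global minimum of 0 at the origin,
    which makes it a common benchmark for genetic algorithms.
    """
    n = len(x)
    mean_sq = sum(v * v for v in x) / n
    mean_cos = sum(math.cos(c * v) for v in x) / n
    return (-a * math.exp(-b * math.sqrt(mean_sq))
            - math.exp(mean_cos) + a + math.e)


print(ackley([0.0, 0.0]))  # global minimum, approximately 0
print(ackley([1.0, 1.0]))  # a point on one of the surrounding ripples
```

A GA minimizing this function would typically use `-ackley(x)` or a similar transformation as its fitness.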

  13. Genetic Algorithm for Clustering
      (Figure: a chromosome of genes a1, a2, …, ai, …, an for n objects.)
      GA chromosomes representing the clustering {X1, X3, X6}, {X2, X4, X5} for various encoding schemes [5]: (a) group number; (b) matrix; (c) permutation with the separator character 7; (d) greedy permutation; (e) order-based.
      Our encoding scheme: the i-th gene holds the number of an object that is in the same cluster as the i-th object.
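
The encoding described on slide 13 can be decoded into clusters by following the links between objects. The sketch below is one possible reading of the slide: gene i stores the index of some object in the same cluster as object i, and clusters are the connected components of those links. The function name `decode_chain`, the 0-based indexing, and the union-find decoding are assumptions.

```python
def decode_chain(chromosome):
    """Turn a chain-encoded chromosome into a sorted list of clusters.

    chromosome[i] is the (0-based) index of an object that shares a
    cluster with object i; a gene pointing at itself makes a singleton
    unless another gene links to it.
    """
    n = len(chromosome)
    parent = list(range(n))

    def find(i):
        # Walk up to the representative, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in enumerate(chromosome):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# X1->X3, X2->X4, X3->X6, X4->X5, X5->X2, X6->X1, written 0-based.
print(decode_chain([2, 3, 5, 4, 1, 0]))  # [[0, 2, 5], [1, 3, 4]]
```

With 1-based object names the printed result corresponds to the clusters {X1, X3, X6} and {X2, X4, X5} from the slide. Several different chromosomes decode to the same clustering, and every chromosome with genes in range is valid, so crossover and mutation need no repair step under this reading.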

  14. Genetic Algorithm for Clustering
      (Figure: a chromosome of genes a1, a2, …, ai, …, an for n objects.)
      Chain encoding for conceptual graphs:
      • realizes the implicit parallelism of genetic algorithms;
      • makes the clustering algorithm work faster;
      • is invariant under the similarity measure on CGs.
      Idea: the fitness function of the GA can be varied by varying its parameters.

  15. EVO-LIB Project: System's Architecture
      (Architecture diagram, labels translated from Russian.)
      Components: user; agent; system core; GA module; individual evaluation module; associative link extraction module; data aggregation module; metabase; database (DB).
      Data passed between components: semantic information; associative links; characteristic queries; relations; individuals; fitness; object set; objects; DB structure; queries; results.

  16. Conceptual Graphs Clustering: Data Example for Clustering
      • We assume that the modality (i.e., number of local optima) of a fitness landscape is related to the difficulty of finding the best point on that landscape by evolutionary computation (e.g., hillclimbers and genetic algorithms (GAs)).
      • We first examine the limits of modality by constructing a unimodal function and a maximally multimodal function.
      • At such extremes our intuition breaks down.
      • A fitness landscape consisting entirely of a single hill leading to the global optimum proves to be hard for hillclimbers but apparently easy for GAs.
      • A provably maximally multimodal function, in which half the points in the search space are local optima, can be easy for both hillclimbers and GAs.
      • Exploring the more realistic intermediate range between the extremes of modality, we construct local optima with varying degrees of “attraction” to our evolutionary algorithms.
      • Most work on optima and their basins of attraction has focused on hills and hillclimbers, while some research has explored attraction for the GA's crossover operator.
      • We extend the latter results by defining and implementing maximal partial deception in problems with k arbitrarily placed global optima.
      • This allows us to create functions with multiple local optima attractive to crossover.
      • The resulting maximally deceptive function has several local optima, in addition to the global optima, each with various size basins of attraction for hillclimbers as well as attraction for GA crossover.
      • This minimum distance function seems to be a powerful new tool for generalizing deception and relating hillclimbers (and Hamming space) to GAs and crossover.
      • This paper describes an initial version of a library of sharable and reusable medical ontological theories, organized according to a proposed classification of ontologies.

  17. Conceptual Graphs Clustering: Clustering Results
      (Figure panels: applying conceptual nearness; applying relational nearness.)

  18. Summary
      • Conceptual graphs are a promising tool for modelling the semantics of texts in DLs.
      • The process of creating ontologies can be based on technologies that use conceptual graphs.
      • Clustering conceptual graphs helps in solving structural problems in DLs and in understanding their data.
      • The evolutionary approach is promising for semantic modelling with conceptual graphs.
      • Advancing CG technologies requires the joint efforts of computer specialists and linguists.

  19. References
      1. A World of Conceptual Graphs: http://conceptualgraphs.org/
      2. Boytcheva, S., Dobrev, P., Angelova, G. CGExtract: Towards Extraction of Conceptual Graphs from Controlled English. Lecture Notes in Computer Science 2120, Springer, 2001.
      3. Southey, F., Linders, J. G. Notio - A Java API for Developing CG Tools. 7th International Conference on Conceptual Structures, 1999, pp. 262-271.
      4. Hirst, G. Ontology and the Lexicon. In: Handbook on Ontologies in Information Systems. Berlin: Springer, 2003.
      5. Cole, R. M. Clustering With Genetic Algorithms. http://citeseer.ist.psu.edu/cole98clustering.html
      6. Montes-y-Gomez, Gelbukh, Lopez-Lopez, Baeza-Yates. Text Mining at Detail Level Using Conceptual Graphs. Lecture Notes in Computer Science, Vol. 2393. Springer-Verlag, 2002, pp. 122-136.
      7. Sarbo, J. Formal conceptual structure in language. In: Dubois, D. M. (ed.), Proceedings of Computing Anticipatory Systems (CASYS'98), pp. 289-300. Woodbury, New York, 1999.
      8. Sowa, J. Conceptual Graphs: Draft Proposed American National Standard. International Conference on Conceptual Structures ICCS-99, Lecture Notes in Artificial Intelligence 1640, Springer, 1999.
      9. Bogatyrev, M. Yu., Latov, V. E. Investigation of Genetic Clustering Algorithms (in Russian). Izv. TulGU, Ser. Mathematics. Mechanics. Informatics, Vol. 8, Issue 3: Informatics. Tula, 2002, pp. 101-107.
      10. Holland, J. H. Adaptation in Natural and Artificial Systems. Ann Arbor: The University of Michigan Press. Reprinted by MIT, 1992.
      11. Rastrigin, L. A. Adaptation of Complex Systems (in Russian). Riga: Zinatne, 1981. 375 pp.
      12. Emelyanov, V. V., Kureichik, V. V., Kureichik, V. M. Theory and Practice of Evolutionary Modelling (in Russian). Moscow: Fizmatlit, 2003. 432 pp.
      13. Bogatyrev, M. Yu. Genetic Algorithms: Principles of Operation, Modelling, Application (in Russian). Tula: TulGU, 2003. 152 pp.
      14. Bogatyrev, M. Modelling Systems With Symmetry. Proceedings of the 4th International IMACS Symposium of Mathematical Modelling, Vienna, Austria, February 5-7, 2003. ARGESIM-Verlag, Vienna, 2003, pp. 270-275.
      15. Bogatyrev, M., Latov, V., Avdeev, K. Symmetry Based Decomposition and its Application in Evolutionary Modelling. Applied Mathematica: Proc. of 8th International Mathematica Symposium. Avignon, France, 19-23 June 2006.
