
Cognitively Based Parsing and Extraction of Potential Knowledge from Sentence Structures

Matthew Elliott, Draque Thompson, Nicholas Davis. Director: Dr. Per Aage Brandt, Center for Cognition and Culture, Department of Cognitive Science, Case Western Reserve University.



This work aims to show how computational processes can be trained to extract knowledge from textual input in a simple and straightforward manner. For example, the sentence "Luz went back to Pordonone to open a hospital" (Hemingway, A Very Short Story) provides a vast amount of information. A system can extract the information provided by the sentence as knowledge of potentiality (P-knowledge) or as knowledge of the factual (F-knowledge). The P-knowledge extracted from this sentence's entailments is as follows:

- Luz can go back to Pordonone.
- Pordonone can be a destination, which can be traveled to.
- Hospitals can be opened by Luz.
- Hospitals can be opened by Luz in Pordonone.

The verbs "went" and "open" in the sentence are defined by the co-text as things that Luz can do. Finally, what this specific person has presently done, having gone back to Pordonone and having opened a hospital, is defined as factual knowledge. The definitions and growth of a P-knowledge network can be further refined by feeding the system additional corpora of text. As the system processes more information, it generates a wealth of worldly knowledge potentialities, and patterns of affinity begin to emerge. A semantically meaningful parsing is the evident prerequisite for achieving such transitions from F-knowledge to P-knowledge. Stemmatic sentence parsing has proven successful in retrieving correct grammatical constituency in this English pilot project. (A sketch of this extraction step appears at the end of this section.)

Potential Knowledge Network

The Potential Knowledge Network illustrates generalized semantic relationships between words within its parsed vocabulary. The network is dynamically configured based upon the current interpretational needs of the parser. After having parsed the right children's books, our program might learn that wolves can bite, that children can be bitten, and that wolves can bite children. This wolf P-knowledge may be used to interpret future texts. For example, if the system encounters the phrase "Man is a wolf," it will be marked as a metaphor, because there is no intrinsic relation between wolves and men in the P-knowledge as developed above (see the metaphor-check sketch below).

Organizing Texts

As sentences are parsed and fit to semantic trees, the trees are bound together, reflecting the emerging overall plot structure of the "story," whether it is a technical manual or the children's fable "Little Red Riding Hood." It is at this point that anaphoric resolution takes place, attacking the difficult ambiguity issue of what the pro-forms within the text are referencing (a naive baseline is sketched below). After this has been completed, the information is fed into the potential and factual knowledge networks.

Natural Language Processing

Natural Language Processing as applied to this project bridges the gap between Linguistics and Computer Science. This dual field of study focuses on the interaction between computers and natural (human) languages. It is separated into two domains. First, Natural Language Generation involves systems converting database knowledge into human-readable language. Second, Natural Language Understanding is the process by which computers are enabled to understand the meaning of natural languages.
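As a concrete illustration of the F-to-P transition described in the introduction, here is a minimal Python sketch. It assumes the parser has already reduced the Luz sentence to semantic roles; the role layout and helper names are hypothetical, not the project's actual code.

```python
# Minimal sketch of the F-to-P transition, assuming the parser has already
# reduced "Luz went back to Pordonone to open a hospital" to semantic roles.
# The role layout and helper names are illustrative, not the project's code.

PARSED = {
    "agent": "Luz",
    "actions": [
        {"verb": "go back to", "target": "Pordonone"},
        {"verb": "open", "target": "a hospital", "location": "Pordonone"},
    ],
}

def extract_f_knowledge(parse):
    """F-knowledge: what this specific agent has actually done."""
    return [(parse["agent"], act["verb"], act["target"], act.get("location"))
            for act in parse["actions"]]

def extract_p_knowledge(parse):
    """P-knowledge: potentialities entailed by the facts. Passive entailments
    ('Hospitals can be opened by Luz') would also need a morphology step,
    omitted here for brevity."""
    potentials = []
    for agent, verb, target, location in extract_f_knowledge(parse):
        potentials.append(f"{agent} can {verb} {target}")
        if location:
            potentials.append(f"{agent} can {verb} {target} in {location}")
    return potentials

for fact in extract_f_knowledge(PARSED):
    print("F:", fact)    # e.g. ('Luz', 'open', 'a hospital', 'Pordonone')
for pot in extract_p_knowledge(PARSED):
    print("P:", pot)     # e.g. 'Luz can open a hospital in Pordonone'
```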
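The metaphor test described under Potential Knowledge Network reduces to a lookup against accumulated subject-verb-object affinities. A hypothetical minimal version follows; the class and method names are ours, not the project's.

```python
from collections import defaultdict

class PotentialKnowledgeNetwork:
    """Accumulates subject-verb-object affinities from parsed sentences."""

    def __init__(self):
        self.affinities = defaultdict(set)   # subject -> {(verb, object)}

    def learn(self, subject, verb, obj):
        self.affinities[subject].add((verb, obj))

    def related(self, a, b):
        """True if the network holds any intrinsic relation between a and b."""
        return (any(obj == b for _, obj in self.affinities[a])
                or any(obj == a for _, obj in self.affinities[b]))

def classify_copula(subject, predicate, pkn):
    """'X is a Y' is read literally only if the network relates X and Y."""
    return "literal" if pkn.related(subject, predicate) else "metaphor"

pkn = PotentialKnowledgeNetwork()
pkn.learn("wolf", "bite", "child")            # learned from children's books

print(classify_copula("man", "wolf", pkn))    # -> 'metaphor'
```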
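Anaphoric resolution is a hard problem in its own right. As a stand-in for the step described under Organizing Texts, here is the classic naive baseline that binds each pro-form to the most recent mention with matching gender and number; the full system would need considerably more than this.

```python
# Naive recency baseline for anaphoric resolution: bind each pro-form to the
# most recently mentioned entity with matching gender and number. This is a
# deliberate oversimplification, not the project's actual resolver.

PRONOUN_FEATURES = {
    "he": ("masc", "sg"), "she": ("fem", "sg"),
    "it": ("neut", "sg"), "they": (None, "pl"),   # None = any gender
}

def resolve(pronoun, mentions):
    """mentions: list of (entity, gender, number) tuples, oldest first."""
    gender, number = PRONOUN_FEATURES[pronoun]
    for entity, g, n in reversed(mentions):       # most recent first
        if (gender is None or g == gender) and n == number:
            return entity
    return None

mentions = [("Little Red Riding Hood", "fem", "sg"), ("the wolf", "masc", "sg")]
print(resolve("she", mentions))   # -> 'Little Red Riding Hood'
print(resolve("he", mentions))    # -> 'the wolf'
```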
Future Applications

The primary goal of this project is to produce a component to be leveraged as a module in future systems. Because this program focuses heavily on a dynamic understanding of processed texts, the most obvious potential applications envisioned are summarization of texts based on a particular area of interest, and translation, both of which would incorporate our knowledge networks. More ambitiously, any interactive artificial intelligence hoping to pass challenging Turing tests must have a system that allows for a conceptual understanding of natural language, as our knowledge network does.

[Poster graphic: the completed semantic parsing engine feeds three envisioned applications: summarization ("Ishmael tells the story of Captain Ahab trying to catch a white whale…"), translation ("I will translate this." → "私はこれをやくします。"), and strong A.I.]

Factual Knowledge Network

In sharp contrast to the Potential Knowledge Network, the Factual Knowledge Network represents what is the case, rather than what might be possible. This network represents specific knowledge about particular entities. For example, if James tells you that his dog is sick, you know specifically that his particular dog is sick. Although you can extract the potential knowledge that dogs possess the potential to be sick, you may only assert positively that James' dog in particular is sick (see the sketch below).

The Hypothesis: Meaning is Encoded in Text

The core of our theory is uncontroversial: text has meaning encoded within it. It is our theory of extraction that is unique. Serious attempts at understanding the meaning encoded within text have been made previously, most notably an effort from the University of California, Berkeley. Their method involves the creation of a large, encyclopedic knowledge bank containing all possible interactions between words from a predetermined body of vocabulary. This method offers only a static understanding of language. Our methodology differs in that we have a system which builds knowledge by processing text directly, establishing word meanings based purely on context rather than by relying on human data entry.

Stemma Trees

Semantic sentence trees are the parser's output. These trees are diagrams of the actual semantic meaning encoded by the syntactic components within sentences, rather than a vague Chomskyan syntactic structure or a school-grammar formula. The structure of the trees is language-neutral, in the sense that it captures elementary semantic meaning in a computable node-structure format that applies to any language (so far we have encountered no exceptions, though we might). Each node of the semantic tree represents one of nine profiles, each representing a conceptual category that is active in sentence formation. The same set of nodes, in different constellations, can model all possible constructions.

The Parser

Our parser uses an algorithm that is a derivative of the Cocke-Younger-Kasami (CYK) parsing algorithm. The primary distinction between the two is that our parser has been developed to return all possible meaning interpretations, while the traditional CYK algorithm only determines the validity of a sentence against the rules of a particular language. The cost of this complete interpretive processing is an increase in processing time for any given phrase.
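The division of labor between the two networks can be made concrete in a few lines: F-knowledge attaches to a particular instance, while the P-knowledge it licenses attaches to the instance's type. The store layout and names below are hypothetical.

```python
# Sketch of the division of labor: facts attach to a particular instance,
# while the potentialities they license attach to the instance's type.
# The store layout and names here are hypothetical.

f_network = {}   # instance -> set of predicates that actually hold of it
p_network = {}   # type     -> set of predicates that can hold of its members

def assert_fact(instance, instance_type, predicate):
    f_network.setdefault(instance, set()).add(predicate)       # F-knowledge
    p_network.setdefault(instance_type, set()).add(predicate)  # licensed P-knowledge

# "James tells you that his dog is sick."
assert_fact("James' dog", "dog", "is sick")

print(f_network["James' dog"])        # {'is sick'} -- only this dog, positively
print("is sick" in p_network["dog"])  # True        -- dogs can be sick
```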
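The poster does not name the nine profiles, so the sketch below only fixes the shape such a node might take: a profile label drawn from a closed set of nine conceptual categories, plus the constituent it covers and its children. The placeholder labels are ours.

```python
from dataclasses import dataclass, field
from typing import List

# The poster specifies nine profiles but does not name them here, so these
# labels are placeholders for the closed set of conceptual categories.
PROFILES = frozenset({"P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8", "P9"})

@dataclass
class StemmaNode:
    profile: str                   # one of the nine conceptual categories
    content: str                   # the word or constituent this node covers
    children: List["StemmaNode"] = field(default_factory=list)

    def __post_init__(self):
        # Every construction is modeled from the same closed set of nodes.
        if self.profile not in PROFILES:
            raise ValueError(f"unknown profile: {self.profile!r}")
```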
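A textbook CYK recognizer fills a table with the nonterminals that can derive each span and answers yes or no; the modification described above amounts to keeping the derivations themselves and enumerating all of them. A generic sketch over a grammar in Chomsky normal form follows (a toy grammar, not the project's):

```python
from collections import defaultdict

def cyk_all_parses(words, lexical, binary, start="S"):
    """CYK over a CNF grammar, returning every parse tree for the sentence
    instead of a bare accept/reject answer.
    lexical: word -> set of nonterminals A with rule A -> word
    binary:  (B, C) -> set of nonterminals A with rule A -> B C
    Trees are nested tuples: (A, word) or (A, left_tree, right_tree)."""
    n = len(words)
    table = [[defaultdict(list) for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        for a in lexical.get(word, ()):
            table[i][i][a].append((a, word))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # every split point keeps every reading
                for b, lefts in table[i][k].items():
                    for c, rights in table[k + 1][j].items():
                        for a in binary.get((b, c), ()):
                            table[i][j][a].extend(
                                (a, lt, rt) for lt in lefts for rt in rights)
    return table[0][n - 1][start]

# Toy CNF grammar with a classic attachment ambiguity.
lexical = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"},
           "with": {"P"}, "forks": {"NP"}}
binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"},
          ("VP", "PP"): {"VP"}, ("NP", "PP"): {"NP"},
          ("P", "NP"): {"PP"}}

trees = cyk_all_parses("she eats fish with forks".split(), lexical, binary)
print(len(trees))  # 2: 'with forks' modifies the eating or the fish
```

As the poster notes, returning every reading rather than one validity bit is exactly where the extra processing time goes: the table cells hold lists of derivations instead of booleans.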
Acknowledgements

This work was supported by the Laboratory for Applied Research in Cognitive Semiotics, led by Dr. Per Aage Brandt, Center for Cognition and Culture, Department of Cognitive Science at Case Western Reserve University, and also by SOURCE.

Contact Information

Larimee Cortnik, Department Assistant: larimee.cortnik@case.edu
