
CILC2011

A framework for structured knowledge extraction and representation from natural language via deep sentence analysis. Università Degli Studi Dell’Aquila. Stefania Costantini Niva Florio Alessio Paolucci. CILC2011. Outline. Motivation Our Proposal Workflow


Presentation Transcript


  1. A framework for structured knowledge extraction and representation from natural language via deep sentence analysis. Università Degli Studi Dell’Aquila. Stefania Costantini, Niva Florio, Alessio Paolucci. CILC2011

  2. Outline • Motivation • Our Proposal • Workflow • Deep Analysis: Parsing & Dependency Structure • Context Disambiguation • Resolution • OOLOT • RDF/OWL Exporting • Example • Conclusion

  3. Motivation “overcome the knowledge acquisition bottleneck”

  4. Motivation Structured data from plain text. The most interesting application: ontology population (Semantic Web) … but there are endless possibilities!

  5. Our Proposal Our framework allows us to: • Extract knowledge from natural language sentences using a deep analysis technique based on linguistic dependencies and phrase syntactic structure. • Use OOLOT (Ontology Oriented Language of Thought), an intermediate language based on ASP (Answer Set Programming), specifically designed to represent the distinctive features of knowledge extracted from natural language. • Easily integrate our framework into the Semantic Web. OOLOT lets us exploit non-monotonic reasoning (through ASP) to deal with common-sense reasoning and other typical aspects of knowledge encoded in natural language.

  6. Workflow

  7. Parsing • Syntactic Parsing: • It determines the syntactic structure of a sentence • Chomsky’s constituent analysis • It builds up the elements in their hierarchical order • Syntactic parsers decompose a text into tokens and assign each token its grammatical function • Statistical Parsing: • It is based on a corpus of annotated training data • It gathers information about the frequency with which the elements occur in specific contexts • Statistics alone may not be enough to determine when to split a symbol into sub-symbols • Probabilistic Context Free Grammar (PCFG): • More than one production rule may apply to a sequence of words, thus resulting in a conflict • It uses the frequency of the various productions to order them
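
The PCFG idea of ordering competing productions by frequency can be sketched as follows. This is a minimal illustration, not the Stanford Parser: the production counts are hypothetical, as if read off a tiny annotated corpus, and the function names are made up for this example.

```python
from collections import Counter, defaultdict

# Hypothetical production counts (assumption, not data from the talk).
# Each key is (left-hand side, right-hand side).
counts = Counter({
    ("VP", ("V", "NP")): 6,
    ("VP", ("V", "NP", "PP")): 3,
    ("VP", ("V",)): 1,
    ("NP", ("Det", "N")): 7,
    ("NP", ("N",)): 3,
})

def rule_probabilities(counts):
    """Relative-frequency estimate: P(A -> beta) = count(A -> beta) / count(A)."""
    lhs_totals = defaultdict(int)
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

def ranked_productions(probs, lhs):
    """Competing productions for one symbol, most probable first."""
    rules = [(rhs, p) for (left, rhs), p in probs.items() if left == lhs]
    return sorted(rules, key=lambda item: item[1], reverse=True)

probs = rule_probabilities(counts)
best_vp = ranked_productions(probs, "VP")[0]  # the most frequent VP expansion
```

When several productions could expand the same symbol, a parser built on these estimates would prefer the expansion seen most often in the training data.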

  8. Parsing Stanford Parser: PCFG parser

  9. Parsing Statistical parsing is useful for problems such as ambiguity and efficiency, BUT we lose part of the semantic information. Dependency Grammar: words in a sentence are connected by means of binary, asymmetrical governor-dependent relationships.
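
A dependency structure of this kind can be represented as a list of labelled governor-dependent pairs. The sketch below is hand-written for the sentence "Many girls eat apples"; it is not parser output, and the relation labels are an illustrative assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dependency:
    relation: str   # relation label (illustrative label set, an assumption)
    governor: str   # the head word
    dependent: str  # the word that depends on the head

# Hand-written analysis of "Many girls eat apples" (assumption, not parser output).
deps = [
    Dependency("amod", "girls", "Many"),
    Dependency("nsubj", "eat", "girls"),
    Dependency("dobj", "eat", "apples"),
]

def dependents_of(deps, governor):
    """All (relation, dependent) pairs governed by the given word."""
    return [(d.relation, d.dependent) for d in deps if d.governor == governor]

def root(deps):
    """The root governs other words but is never itself a dependent."""
    governors = {d.governor for d in deps}
    dependents = {d.dependent for d in deps}
    (only,) = governors - dependents  # raises if there is not exactly one root
    return only
```

Each relation is binary (two words) and asymmetrical (one word governs the other), which is why a plain list of pairs is enough to recover the verb as the root of the sentence.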

  10. Context Disambiguation Given a (finite) set of contexts, assign each lexical item to one (or more) context(s), each with a score (the slide shows the example scores 0.7 and 0.3). We use a simple, frequency-based disambiguation algorithm.
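
The transcript does not describe the algorithm itself, so the following is only one plausible reading of "frequency-based": each lexical item gets a score per context equal to the share of its observed occurrences in that context. The count table, context names, and function name are all hypothetical.

```python
# Hypothetical counts of how often each lexical item was observed in each
# context (assumption; the talk does not give its data or contexts).
CONTEXT_COUNTS = {
    "jaguar": {"animals": 7, "cars": 3},
    "apple": {"food": 5, "technology": 5},
}

def context_scores(item, counts=CONTEXT_COUNTS):
    """Map a lexical item to {context: score}, with scores summing to 1."""
    per_context = counts.get(item.lower(), {})
    total = sum(per_context.values())
    if total == 0:
        return {}  # unknown item: no context can be assigned
    return {ctx: n / total for ctx, n in per_context.items()}

scores = context_scores("Jaguar")
```

An item can keep more than one context, each with its own score, so later stages can still pick the lower-ranked reading if the surrounding sentence calls for it.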

  11. Resolution Each lexical item (a word or a set of words) is resolved against popular ontologies, including DBpedia, YAGO, GeoNames, WordNet 3 OWL, … Example: <http://dbpedia.org/resource/Car>
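
As a rough sketch of the first step of resolution, the function below builds a candidate DBpedia resource URI from a lexical item. It assumes the usual naming pattern visible in the example URI (first letter capitalised, spaces replaced by underscores); the function name is hypothetical, and an actual resolver would still have to check the candidate against the ontology.

```python
from urllib.parse import quote

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"

def candidate_dbpedia_uri(lexical_item):
    """Build a candidate URI; it is NOT verified to exist in DBpedia."""
    words = lexical_item.strip().split()
    if not words:
        raise ValueError("empty lexical item")
    title = "_".join(words)               # multi-word items use underscores
    title = title[0].upper() + title[1:]  # capitalise only the first letter
    return DBPEDIA_RESOURCE + quote(title)

uri = candidate_dbpedia_uri("car")
```

The same item could be resolved against several ontologies in turn; this sketch covers only the URI shape for one of them.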

  12. OOLOT The language of thought is an intermediate format mainly inspired by Kowalski’s LoT. It was introduced to represent the extracted knowledge in a way that is totally independent of the original lexical items and, therefore, of the original language. Our LOT is itself a language, but its lexicon is ontology oriented, so we adopted the acronym OOLOT (Ontology Oriented Language Of Thought). OOLOT is used to represent the knowledge extracted from natural language sentences, so the building blocks of OOLOT (its lexicon) are ontological identifiers related to concepts in the ontology; they are not a translation at the lexical level.

  13. OOLOT: Lambda-based translation Example: “Many girls eat apples”

  14. OOLOT: Lambda-based translation Example: “Many girls eat apples”

  15. OOLOT: Lambda-based translation

  16. OOLOT: Lambda-based translation

  17. OOLOT: Lambda-based translation And, finally, after applying apple to the previous partial expression, we have:
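
The lambda expressions shown on these slides are not part of the transcript. The general mechanism of building an expression by successive application can still be sketched with curried Python lambdas. This is a generic illustration under stated assumptions: the term for the verb and the output format are made up, and the quantifier "many" is ignored.

```python
# Curried lambda term for a transitive verb (hypothetical shape, not the
# authors' OOLOT expression): it takes the subject first, then the object.
eat = lambda subject: lambda obj: f"eat({subject}, {obj})"

partial = eat("girl")      # partial expression, still waiting for its object
result = partial("apple")  # applying the object completes the expression
```

Applying one argument at a time mirrors how the translation proceeds step by step, with each application consuming one more constituent of the sentence.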

  18. RDF/OWL Exporting Since OOLOT is designed to have a representation very close to RDF, it is possible to export to RDF/OWL. In many cases, when it is possible to preserve the semantics, there is a 1:1 mapping; otherwise, when the original semantics cannot be preserved, we are starting to use RDF/OWL syntactic approximations through reification. Best case: OOLOT: predicate(subject, object) RDF: <subject, predicate, object>
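
The best-case 1:1 mapping can be sketched as a small conversion from a binary atom to a triple, serialised in N-Triples syntax. This is a minimal sketch, not the framework's exporter: the atom syntax accepted by the regular expression, the function names, and the `http://example.org/` namespace are all assumptions, and the reification fallback is not covered.

```python
import re

# Accepts atoms of the form predicate(subject, object), optional final dot.
ATOM = re.compile(r"^\s*(\w+)\(\s*(\w+)\s*,\s*(\w+)\s*\)\s*\.?\s*$")
BASE = "http://example.org/"  # hypothetical namespace, for illustration only

def atom_to_triple(atom):
    """predicate(subject, object) -> (subject, predicate, object)."""
    match = ATOM.match(atom)
    if match is None:
        raise ValueError(f"not a binary atom: {atom!r}")
    predicate, subject, obj = match.groups()
    return (subject, predicate, obj)

def to_ntriples(triple, base=BASE):
    """Serialise one triple as an N-Triples line with IRIs under `base`."""
    return " ".join(f"<{base}{term}>" for term in triple) + " ."

triple = atom_to_triple("eat(girl, apple)")
line = to_ntriples(triple)
```

Only the argument order changes in the best case: the predicate moves from the front of the atom to the middle of the triple.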

  19. Framework In Action “Ferrari is an Italian sports car manufacturer based in Maranello.”

  20. Framework in Action

  21. Framework in Action

  22. Framework in Action

  23. Framework in Action

  24. Framework in Action

  25. Framework in Action

  26. Conclusion & Future Work This is a fairly new framework, so many aspects need to be refined and improved. Further exploit: OOLOT language ASP to RDF/OWL Exporting
