
Ontea: Pattern based Annotation Platform

Ontea: Pattern based Annotation Platform, by Michal Laclavík. Ontea Method. Motivation: to create semantic metadata from texts or documents. Approach: even unstructured text contains patterns, and these patterns can be used to extract various objects from text. Results are key-value pairs.



Presentation Transcript


  1. Ontea: Pattern based Annotation Platform Michal Laclavík

  2. Ontea Method
     • Motivation
       • To create semantic metadata from texts or documents
     • Approach
       • Even unstructured text contains patterns
       • Patterns can be used to extract various objects from text
       • Results are key-value pairs
       • Such pairs can be transformed to ontology individuals
         • Class – individual
         • Individual – property
     http://ontea.sourceforge.net

  3. Text: Bratislava is the capital of Slovakia. Slovakia is in Europe.
     Pattern: “(in|by) + (the)? *([A-Z][a-z]+)” for Location
     Ontea discovers the key-value pair: Location – Europe
     After transformation to the ontology knowledge base, it identifies Europe as a continent by inference (Continent is a sub-class of Location): Continent – Europe
     More examples are in the table: Result Examples
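The example on this slide can be sketched in a few lines of Python. This is only an illustration, not Ontea's actual implementation: the pattern is the one from the slide with the spaces around "+" removed (an assumption about the intended regular expression), and the dictionaries `INDIVIDUAL_CLASS` and `SUBCLASS_OF` are a hypothetical stand-in for the ontology knowledge base.

```python
import re

# Pattern from the slide; spaces around "+" removed (assumed to be slide formatting).
LOCATION_PATTERN = re.compile(r"(in|by) +(the)? *([A-Z][a-z]+)")

# Hypothetical toy knowledge base: individual -> class, class -> parent class.
INDIVIDUAL_CLASS = {"Europe": "Continent"}
SUBCLASS_OF = {"Continent": "Location"}


def extract_pairs(text, key="Location"):
    """Return a (key, value) pair for every pattern match in the text."""
    return [(key, m.group(3)) for m in LOCATION_PATTERN.finditer(text)]


def is_subclass(cls, ancestor):
    """Walk SUBCLASS_OF upwards to check whether cls equals or descends from ancestor."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def to_ontology(pair):
    """Refine a pair to the individual's known class when that class fits the key."""
    key, value = pair
    known = INDIVIDUAL_CLASS.get(value)
    if known is not None and is_subclass(known, key):
        return (known, value)
    return pair


text = "Bratislava is the capital of Slovakia. Slovakia is in Europe."
pairs = extract_pairs(text)
refined = [to_ontology(p) for p in pairs]
print(pairs)
print(refined)
```

The extraction step yields the generic pair Location – Europe, and the lookup step narrows it to Continent – Europe because the toy knowledge base records Continent as a sub-class of Location.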

  4. Features
     • Identification of concept instances from the ontology
     • Automatic population of ontologies with instances
     • Identification of relevance when creating instances, using information retrieval techniques
     • Large-scale semantic annotation of documents or texts using Google’s MapReduce architecture
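How pattern-based annotation might fit the MapReduce model can be sketched as follows. This is a single-process illustration under stated assumptions, not the Ontea or Hadoop code: `map_document` and `reduce_annotations` are hypothetical names, the documents are made up, and the pattern is the Location pattern from the previous slide.

```python
import re
from collections import defaultdict

# Location pattern from the previous slide (spaces around "+" removed, an assumption).
PATTERN = re.compile(r"(in|by) +(the)? *([A-Z][a-z]+)")


def map_document(doc_id, text):
    """Map step: emit ((key, value), doc_id) for each match in one document."""
    for m in PATTERN.finditer(text):
        yield ("Location", m.group(3)), doc_id


def reduce_annotations(emitted):
    """Reduce step: group the ids of documents that produced the same pair."""
    grouped = defaultdict(set)
    for pair, doc_id in emitted:
        grouped[pair].add(doc_id)
    return {pair: sorted(ids) for pair, ids in grouped.items()}


# Made-up input documents for illustration only.
documents = {
    "doc1": "Slovakia is in Europe.",
    "doc2": "Vienna is in Europe. Bratislava is by the Danube.",
}

emitted = [kv for doc_id, text in documents.items() for kv in map_document(doc_id, text)]
result = reduce_annotations(emitted)
print(result)
```

Because each document is processed independently in the map step, the work can be split across many machines; only the grouping of identical pairs needs a shared reduce step.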

  5. Advantages
     • Simple, customizable method
     • Not tied to document structure
     • Architecture built on the detection of key-value pairs and their various transformations. For example:
       • Text: “Slovensko je v Európe” (Slovak for “Slovakia is in Europe”) =>
       • Extraction: Location – Európe =>
       • Transformation, Lemmatization: Location – Európa =>
       • Transformation, Ontology: Continent – Europe
     • Scalable method, ported to Grid and Hadoop
     • Applicable to texts in any language
     • Success rate of 60%-90%, depending on the patterns, transformers and application used
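The chain of transformations in the Slovak example can be sketched as a sequence of functions applied to each key-value pair. Everything here is a hypothetical illustration: the Slovak pattern, the one-entry lemma dictionary and the one-entry ontology mapping are assumptions made for this example and do not come from Ontea or from a real lemmatizer.

```python
import re

# Hypothetical Slovak pattern: preposition "v" or "vo" followed by a capitalized word.
SK_LOCATION = re.compile(r"\b(v|vo) +([A-Z]\w+)")

# Hypothetical lookup tables standing in for a lemmatizer and an ontology.
LEMMAS = {"Európe": "Európa"}
ONTOLOGY = {"Európa": ("Continent", "Europe")}


def extract(text):
    """Extraction: produce Location pairs from pattern matches."""
    return [("Location", m.group(2)) for m in SK_LOCATION.finditer(text)]


def lemmatize(pair):
    """Transformation: replace the value with its base form, if one is known."""
    key, value = pair
    return (key, LEMMAS.get(value, value))


def to_ontology(pair):
    """Transformation: map the pair to an ontology class and individual, if known."""
    return ONTOLOGY.get(pair[1], pair)


def run_chain(text):
    """Apply extraction and then each transformer, keeping every intermediate stage."""
    pairs = extract(text)
    stages = [pairs]
    for transform in (lemmatize, to_ontology):
        pairs = [transform(p) for p in pairs]
        stages.append(pairs)
    return stages


stages = run_chain("Slovensko je v Európe")
for stage in stages:
    print(stage)
```

Keeping each transformer as a separate function mirrors the idea on the slide: supporting another language or another ontology would mean swapping a pattern or a transformer rather than changing the whole method.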

  6. Integration with other tools
     [Diagram; labels: URL, DocConverter, Plain Text, Nalit, Language Identification, Ontea, Pattern Matching, Morphonary, Transformation: Lemmatization, Transformation: Individual Search and Creation, Transformation: Relevance Identification, Lucene, Ontology Repository]

  7. Future research & development http://ontea.sourceforge.net/
