
Ontology Based Annotation of Text Segments

Ontology Based Annotation of Text Segments. Presented by Ahmed Rafea (rafea@claes.sci.eg), Samhaa R. El-Beltagy, Maryam Hazman. Agenda: Objective • Problems related to AGROVOC • Requirements of the Proposed Annotation System • The Architecture of the Proposed Annotation System • Evaluation


Presentation Transcript


  1. Ontology Based Annotation of Text Segments • Presented by Ahmed Rafea (rafea@claes.sci.eg) • Samhaa R. El-Beltagy • Maryam Hazman

  2. Agenda • Objective • Problems related to AGROVOC • Requirements of the Proposed Annotation System • The Architecture of the Proposed Annotation System • Evaluation • Conclusion and Future Work

  3. Objective • To annotate segments of organizational electronic publications and web pages with domain metadata, using the segment headings and an ontology, so as to improve the quality and focus of the information retrieved when these publications are searched.

  4. Problems related to AGROVOC • Some Arabic terms are missing even though they are available in the English version, e.g. Mosaic viruses, Physiological disorders. • Arabic agricultural terminology differs from one country to another, e.g. wheat is حنطة in AGROVOC but قمح in Egypt. • Entries that are very specific to a country, such as country-specific crop varieties, are not covered. • Some important concepts are missing from AGROVOC altogether, e.g. all instances of viral diseases (the entry exists in AGROVOC but has no narrower terms). • Some Arabic terms are inaccurate, e.g. the term ‘cultivation’ is translated as فلاحة الأرض, which narrows the actual meaning of the term; a more accurate translation would have been ممارسات زراعية. • Some important relationships are missing, e.g. the terms ‘Viral diseases’ and ‘Bacterial diseases’ are not listed as narrower terms of ‘Plant Diseases’. • Some relationships are inaccurate, e.g. the concept نباتات ضارة (Noxious plants) is listed not as a narrower term (NT) of نباتات (Plants) but as a related term to it.
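The kinds of customization these problems call for can be illustrated with a minimal Python sketch. The `Concept` class and the `promote_related_to_narrower` helper are hypothetical names introduced here for illustration; they are not AGROVOC's data model or the authors' implementation. The example terms are the ones cited on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One thesaurus entry with per-language labels and NT/RT links (hypothetical structure)."""
    name: str
    labels: dict = field(default_factory=dict)   # language code -> set of labels
    narrower: set = field(default_factory=set)   # narrower terms (NT)
    related: set = field(default_factory=set)    # related terms (RT)

    def add_label(self, lang, label):
        """Add a country-specific or missing label without dropping existing ones."""
        self.labels.setdefault(lang, set()).add(label)


def promote_related_to_narrower(parent, child_name):
    """Re-file a concept listed as RT so that it becomes an NT of `parent`."""
    parent.related.discard(child_name)
    parent.narrower.add(child_name)


# Country-specific terminology: keep the AGROVOC label and add the Egyptian one.
wheat = Concept("Wheat", labels={"ar": {"حنطة"}})
wheat.add_label("ar", "قمح")

# Missing relationships: add the narrower terms the slide reports as absent.
plant_diseases = Concept("Plant diseases")
plant_diseases.narrower.update({"Viral diseases", "Bacterial diseases"})

# Inaccurate relationship: 'Noxious plants' is listed as RT of 'Plants' instead of NT.
plants = Concept("Plants", related={"Noxious plants"})
promote_related_to_narrower(plants, "Noxious plants")
```

Keeping both Arabic labels on one concept, rather than replacing one with the other, lets headings written in either national usage match the same entry.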

  5. Requirements of the Annotation System • It is required to build a system that is capable of: • Extending or customizing an existing ontology (automatically or semi-automatically) • Identifying multiple possible descriptors associated with any single segment • Annotating segments with concepts that are as specific as possible • Normalizing the input text and the ontology through stemming to facilitate matching
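The last three requirements can be sketched together in Python. This is an illustrative sketch only: the toy ontology in `PARENT`, the English headings, and the `stem` function (which strips a trailing "s") are assumptions standing in for the real ontology and for whatever Arabic stemmer the system uses, neither of which is described in the slides.

```python
def stem(word):
    """Toy stemmer: lowercase and strip a trailing 's' (stand-in for a real stemmer)."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def stems(text):
    """Normalize a heading or an ontology label into a set of stems."""
    return {stem(w) for w in text.split()}


# Hypothetical toy ontology: concept label -> broader concept (None for a root).
PARENT = {
    "Diseases": None,
    "Viral diseases": "Diseases",
    "Cereals": None,
    "Wheat": "Cereals",
}


def ancestors(concept):
    """Return every broader concept above `concept`."""
    found = set()
    parent = PARENT.get(concept)
    while parent is not None:
        found.add(parent)
        parent = PARENT.get(parent)
    return found


def annotate(heading):
    """Return the most specific concepts whose stemmed labels all occur in the heading."""
    heading_stems = stems(heading)
    matched = {c for c in PARENT if stems(c) <= heading_stems}
    covered = set()
    for concept in matched:
        covered |= ancestors(concept)
    # Drop any matched concept that is merely an ancestor of another match.
    return sorted(matched - covered)


print(annotate("Viral disease of wheat"))  # ['Viral diseases', 'Wheat']
```

The heading yields two descriptors, and the broader concept "Diseases" is dropped because the narrower "Viral diseases" also matches.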

  6. The Architecture of the Proposed Annotation System • [Diagram] Components shown: HTML Doc, Ontology, Segmentor, User, Annotator, Segments 1 to n, Ontology Extender, Annotated Segments, Annotated Segment Repository
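The slide names the components but the transcript does not preserve how they are connected. The Python sketch below assumes one plausible flow (HTML document, then Segmentor, then Annotator, then repository) and assumes that a segment begins at each HTML heading tag; both are assumptions, as are the names `Segmentor`, `segment`, and `run_pipeline`. The Ontology Extender and the user interaction are omitted.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class Segmentor(HTMLParser):
    """Split an HTML document into segments, one per heading (assumed rule)."""

    def __init__(self):
        super().__init__()
        self.segments = []        # each item: {"heading": str, "text": str}
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._in_heading = True
            self.segments.append({"heading": "", "text": ""})

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        if not self.segments:
            return  # text before the first heading belongs to no segment
        key = "heading" if self._in_heading else "text"
        self.segments[-1][key] += data


def segment(html):
    """Return the document's segments with whitespace collapsed."""
    parser = Segmentor()
    parser.feed(html)
    parser.close()
    return [{k: " ".join(v.split()) for k, v in s.items()} for s in parser.segments]


def run_pipeline(html, annotate):
    """Annotate each segment from its heading; `annotate` maps a heading to concept names."""
    return [dict(seg, concepts=annotate(seg["heading"])) for seg in segment(html)]


doc = "<h2>Wheat diseases</h2><p>Rust and smut.</p><h2>Irrigation</h2><p>Schedules.</p>"
# A trivial stand-in annotator; a real one would match headings against the ontology.
repository = run_pipeline(doc, lambda heading: [heading.lower()])
```

Passing the annotator in as a callable keeps segmentation independent of the ontology, which mirrors the separate Segmentor and Annotator boxes on the slide.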

  7. Evaluation • An expert was asked to annotate 4088 segments. • The implemented system was run on the headings of these segments, with the following results: • The number of terms added to the ontology was 395 (which is equivalent to 95.6% of the 412 terms added by the expert). • Precision was 97%, recall was 91%, and F-score was 94%. • The system was then run on another 359 segment headings without allowing any ontology extension, with the following results: • Precision was 96%, recall was 86%, and F-score was 91%.
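The reported F-scores are consistent with the balanced F-measure, the harmonic mean of precision and recall. The slides do not state which F-measure was used, so the formula below is an assumption; the short Python check reproduces both reported values from the reported precision and recall.

```python
def f_score(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# First run (ontology extension allowed): P = 97%, R = 91%
with_extension = round(100 * f_score(0.97, 0.91))

# Second run (no ontology extension): P = 96%, R = 86%
without_extension = round(100 * f_score(0.96, 0.86))

print(with_extension, without_extension)  # 94 91
```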

  8. Conclusion • The results of the experiments carried out to evaluate this work show that it can be used to annotate document segments with a high degree of accuracy. • The problems that reduced recall were analyzed, and we are currently working to improve the accuracy of the results. Some of these problems are due to the ontology and others to the processing of Arabic text. • We plan to investigate using the generated annotated segments to build classifiers that assign labels to segments that have no headings. • We are also exploring ontology extraction from information-rich documents, so that our approach can be applied when no initial ontology exists.
