
Toward Semantic Web Information Extraction

B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov. Presenter: Yihong Ding.


Presentation Transcript


  1. Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding

  2. Toward a Semantic Web • Fully automatic methods for semantic annotation are needed • Related topics • Information retrieval (IR) • Information extraction (IE) • Named-entity recognition (NER) • Annotation processes

  3. Semantic Annotation Diagram

  4. Named Entities • Named Entities (NE) • people, organizations, locations, and others referred to by name • May also include scalars and expressions • numbers, amounts of money, dates, etc. (NUMEX, TIMEX) • Hypothesis • Named entities (and the relations between them) mentioned in a resource constitute an important part of its semantics

  5. Semantic Annotation of NEs • Semantic Annotation of the NEs in a text includes: • Recognition of the type of the entities in the text • Identification of the entity individual • Comparison • The traditional NER approach results in: <Person>Yihong Ding</Person> • The Semantic Annotation of NEs should result in something like the following: <BYUPerson ID="http://..byu../YihongDing">Yihong Ding</BYUPerson>
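The comparison on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not KIM's output format: the `Annotation` class and `to_markup` function are hypothetical names, and the identifier string is the elided placeholder from the slide, copied as-is.

```python
from dataclasses import dataclass
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Annotation:
    """One annotated mention (hypothetical structure, for illustration only)."""
    text: str                         # surface string found in the document
    entity_class: str                 # recognized type, e.g. "Person"
    entity_uri: Optional[str] = None  # identified individual; None for type-only NER


def to_markup(a: Annotation) -> str:
    """Render the annotation as inline markup, in the style shown on the slide."""
    body = escape(a.text)
    if a.entity_uri is None:
        # Traditional NER: only the type is known.
        return f"<{a.entity_class}>{body}</{a.entity_class}>"
    # Semantic annotation: the type plus a reference to the individual.
    return f"<{a.entity_class} ID={quoteattr(a.entity_uri)}>{body}</{a.entity_class}>"


ner_only = Annotation("Yihong Ding", "Person")
semantic = Annotation("Yihong Ding", "BYUPerson", "http://..byu../YihongDing")
print(to_markup(ner_only))
print(to_markup(semantic))
```

The only structural difference is the optional identifier: recognition fills in `entity_class`, while identification additionally fills in `entity_uri`.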

  6. The KIM Platform • The Knowledge and Information Management Platform provides: • Automatic Semantic Annotation of NEs (and relations between them) • Ontology Population with NE individuals and relations • Indexing and Retrieval with respect to NEs • Query and Navigation over the Formal Knowledge

  7. KIM Constituents • KIM Ontology (KIMO) • KIM World KB • KIM Server – with API for remote access and integration • Front-ends: KIM Web UI, Plug-in for Internet Explorer, and KB Explorer

  8. KIM Bases • KIM is based on the following open-source platforms: • GATE – NLP and IE platform from the University of Sheffield • Sesame – RDF(S) repository by Aidministrator b.v. • Ontology Middleware and Custom Inference by Ontotext, as extensions of Sesame • Lucene – open-source IR engine from Apache

  9. KIM Architecture

  10. KIM Ontology (KIMO) • Light-weight upper-level ontology • 250 NE classes • 100 relations and attributes • Covers mostly NE classes, and ignores general concepts • Includes classes representing lexical resources • www.ontotext.com/KIM/kimo.rdfs

  11. KIM World KB • A projection of the world (domain ontology) • Quasi-exhaustive coverage of the most popular entities in the world • Entities of general importance, like the ones that appear in the news • At present KIM KB consists of about 200,000 entities: • 50,000 locations, 130,000 organizations, 6,000 people, etc.

  12. Entity Description • NEs are represented in KIM World KB with their Semantic Descriptions consisting of… • Aliases (Florida & FL) • Relations with other entities (Person hasPosition Position) • Attributes (latitude & longitude of geographic entities) • Proper class of the NE
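As a rough sketch of what such a semantic description might hold, the four parts listed on this slide map naturally onto a small record type. Everything below is an assumption made for illustration: the `EntityDescription` class, the `urn:example:` identifiers, the `subRegionOf` relation, the `Province` class name, and the rounded coordinates are not taken from KIM.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class EntityDescription:
    """Hypothetical record mirroring the four parts listed on the slide."""
    uri: str                                   # identifier of the individual
    entity_class: str                          # proper class of the NE
    aliases: Set[str] = field(default_factory=set)
    relations: List[Tuple[str, str]] = field(default_factory=list)  # (predicate, target URI)
    attributes: Dict[str, float] = field(default_factory=dict)

    def matches(self, mention: str) -> bool:
        """True if the mention equals one of the aliases, ignoring case."""
        return mention.casefold() in {a.casefold() for a in self.aliases}


florida = EntityDescription(
    uri="urn:example:Florida",                 # made-up identifier
    entity_class="Province",                   # made-up class name
    aliases={"Florida", "FL"},                 # aliases from the slide
    relations=[("subRegionOf", "urn:example:UnitedStates")],  # made-up relation
    attributes={"latitude": 28.0, "longitude": -82.0},        # rough, illustrative values
)

print(florida.matches("fl"))
print(florida.matches("Georgia"))
```

Keeping aliases on the description is what lets a single individual be found under several surface forms, such as "Florida" and "FL".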

  13. KIM Server • APIs for: • Semantic Annotation • Document Persistence • Indexing & Retrieval of documents with respect to NEs • Semantic Repository Access & Exploration

  14. KIM Semantic Information Extraction • Based on GATE • NLP IE platform • Rules now based on ontology classes instead of a flat set of NE types • Recognition and Identification of the NEs • IE supported by a Semantic Repository • Containing lexical and gazetteer resources • Annotations referring to Entity Descriptions • Ontology Population with the newly recognized entities & relations
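To make the role of gazetteer resources concrete, here is a minimal longest-match lookup in Python. It is a simplified stand-in, not GATE's gazetteer or KIM's pipeline: the `GAZETTEER` table, its `urn:example:` identifiers, the class names, and the `annotate` function are all assumptions. Each alias maps to an individual and an ontology class, so a hit yields both recognition and identification.

```python
import re

# Hypothetical gazetteer: lower-cased token sequence -> (entity URI, ontology class).
GAZETTEER = {
    ("florida",): ("urn:example:Florida", "Province"),
    ("fl",): ("urn:example:Florida", "Province"),
    ("university", "of", "sheffield"): ("urn:example:UniversityOfSheffield", "University"),
}
MAX_LEN = max(len(key) for key in GAZETTEER)


def annotate(text):
    """Return (start, end, class, uri) for each longest gazetteer match in text."""
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    results = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so multi-word names win over their parts.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tok[0].casefold() for tok in tokens[i:i + n])
            if key in GAZETTEER:
                uri, cls = GAZETTEER[key]
                results.append((tokens[i][1], tokens[i + n - 1][2], cls, uri))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return results


sample = "GATE comes from the University of Sheffield."
for start, end, cls, uri in annotate(sample):
    print(sample[start:end], cls, uri)
```

A pure lookup like this cannot disambiguate (for example, "FL" as an abbreviation for something else), which is where the rule-based stages and the ontology classes mentioned on the slide would come in.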

  15. KIM IE Pipeline

  16. KIM Plug-in

  17. KIM IE Performance • Evaluated over 3 human-annotated corpora of news articles: • International Business News, International Political News, and UK Political News (~500 articles) • Precision 86%, Recall 84% with respect to the standard NE types • But these metrics are not representative of semantic annotation quality
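For reference, precision and recall over typed annotations are usually computed as below. This is the standard exact-match definition, sketched on made-up toy data; it does not reproduce the evaluation reported on the slide, and the `precision_recall` function name and the span tuples are illustrative.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall over (start, end, type) annotations."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data: four gold annotations, four predictions, three of them correct.
gold = {(0, 11, "Person"), (20, 43, "Organization"), (50, 57, "Location"), (60, 64, "Date")}
predicted = {(0, 11, "Person"), (20, 43, "Organization"), (50, 57, "Location"), (70, 75, "Person")}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Because a prediction counts only if span and type both match exactly, this measure says nothing about whether the right individual was identified, which is the limitation the slide points out.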

  18. Semantic Annotation Metrics • There are no established metrics for semantic annotation: • No human-annotated corpora with precise class and instance information • No metrics for various partial matches • When a more specific class is recognized • When a more general class is recognized • When the class is correctly recognized, but the individual entity is not correctly identified.
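One way to make the partial-match cases above concrete is to classify each predicted annotation against the gold one using the class hierarchy. The sketch below is only a possible starting point, not an established metric: the `PARENT` hierarchy is a made-up toy, not KIMO, and the category labels and `urn:example:` identifiers are assumptions.

```python
# Hypothetical toy class hierarchy (child -> parent); not taken from KIMO.
PARENT = {
    "Entity": None,
    "Person": "Entity",
    "Organization": "Entity",
    "Company": "Organization",
    "University": "Organization",
}


def ancestors(cls):
    """Return all proper ancestors of cls, nearest first."""
    out = []
    cls = PARENT.get(cls)
    while cls is not None:
        out.append(cls)
        cls = PARENT.get(cls)
    return out


def classify_match(pred_class, pred_uri, gold_class, gold_uri):
    """Label a prediction with one of the match cases listed on the slide."""
    if pred_class == gold_class:
        # Right class; check whether the individual was also identified correctly.
        return "exact" if pred_uri == gold_uri else "class_only"
    if gold_class in ancestors(pred_class):
        return "more_specific"   # predicted a subclass of the gold class
    if pred_class in ancestors(gold_class):
        return "more_general"    # predicted a superclass of the gold class
    return "mismatch"


print(classify_match("Company", "urn:example:A", "Organization", "urn:example:A"))
print(classify_match("Organization", "urn:example:A", "Company", "urn:example:A"))
print(classify_match("Person", "urn:example:A", "Person", "urn:example:B"))
```

How much credit each category should earn is exactly the open question the slide raises; the sketch only separates the cases so that they can be counted.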

  19. Conclusion • It is possible to adopt traditional IE techniques for semantic annotation • It is worth using almost-exhaustive entity knowledge for IE • KIM is still under development • Proper evaluation metrics • Precise disambiguation • More advanced IE techniques • KIM ontology and KB development
