Towards a semantic extraction of named entities
Presentation Transcript

Towards a semantic extraction of named entities

Diana Maynard, Kalina Bontcheva, Hamish Cunningham

University of Sheffield, UK


Introduction

  • Challenges posed by progression from traditional IE to a more semantic representation of NEs

  • What techniques are best for the deeper level of analysis necessary?

  • Can traditional rule-based methods cope with such a transition, or does the future lie solely with machine learning?


The ACE program

“A program to develop technology to extract and characterise meaning from human language”

Aims:

  • produce structured information about entities, events and the relations that hold between them

  • promote design of more generic systems rather than those tuned to a very specific domain and text type (as with MUC)


The ACE tasks

  • Identification of entities and classification into semantic types (Person, Organisation, Location, GPE, Facility)

  • Identification and coreference of all mentions of each entity in the text (name, pronominal, nominal)

  • Identification of relations holding between such entities



<entity ID="ft-airlines-27-jul-2001-2"
        GENERIC="FALSE"
        entity_type="ORGANIZATION">
  <entity_mention ID="M003" TYPE="NAME"
                  string="National Air Traffic Services">
  </entity_mention>
  <entity_mention ID="M004" TYPE="NAME" string="NATS">
  </entity_mention>
  <entity_mention ID="M005" TYPE="PRO" string="its">
  </entity_mention>
  <entity_mention ID="M006" TYPE="NAME" string="Nats">
  </entity_mention>
</entity>
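Annotations in this format are straightforward to consume programmatically. As a quick illustration (the Python code and the inlined snippet are our own framing, not part of the ACE tooling), the entity above parses cleanly with a standard XML library:

```python
import xml.etree.ElementTree as ET

# The ACE-style entity annotation from the slide, as a well-formed snippet.
ace_xml = """
<entity ID="ft-airlines-27-jul-2001-2" GENERIC="FALSE" entity_type="ORGANIZATION">
  <entity_mention ID="M003" TYPE="NAME" string="National Air Traffic Services"/>
  <entity_mention ID="M004" TYPE="NAME" string="NATS"/>
  <entity_mention ID="M005" TYPE="PRO" string="its"/>
  <entity_mention ID="M006" TYPE="NAME" string="Nats"/>
</entity>
"""

entity = ET.fromstring(ace_xml)
# Every mention (name, pronominal, ...) corefers to one ORGANIZATION entity.
mentions = [(m.get("ID"), m.get("TYPE"), m.get("string"))
            for m in entity.findall("entity_mention")]
```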


The MACE System

  • Rule-based NE system developed within GATE, adapted from ANNIE

  • Processing resources (PRs): tokeniser, sentence splitter, POS tagger, gazetteer, semantic tagger, orthomatcher, pronominal and nominal coreferencers

  • Also: genre identification and a switching controller to select different PRs automatically
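Conceptually, a GATE pipeline is just an ordered sequence of PRs, each enriching the document's annotations before the next one runs. A minimal Python sketch of that control flow (the PR bodies below are toy stand-ins, not the real GATE components):

```python
def tokenise(doc):
    # Toy tokeniser: whitespace split (GATE's tokeniser is far richer).
    doc["tokens"] = doc["text"].split()
    return doc

def gazetteer(doc):
    # Toy gazetteer lookup against a tiny in-memory list.
    lists = {"NATS": "Organization", "Sheffield": "GPE"}
    doc["lookups"] = [(t, lists[t]) for t in doc["tokens"] if t in lists]
    return doc

def run_pipeline(doc, prs):
    # Each processing resource (PR) runs in order over the same document.
    for pr in prs:
        doc = pr(doc)
    return doc

doc = run_pipeline({"text": "NATS is based near Sheffield"},
                   [tokenise, gazetteer])
```

A switching controller, in these terms, would simply choose a different `prs` list per genre.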


Differences between ANNIE and MACE

  • Locations → Location / GPE

  • GPEs have roles (GPE, Per, Org, Loc)

  • New type Facility (subsumes some Orgs)

  • Metonymy means context is necessary for disambiguation (e.g. the England cricket team vs. England the country)

  • No Date, Time, Money, Percent, Address, Identifier
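The metonymy point can be made concrete with a toy disambiguator: the same string "England" receives different roles depending on nearby context words. This only illustrates the idea; it is not MACE's actual rule set:

```python
# Context words suggesting a sports-team (Organization-like) reading.
# The word list is invented for illustration.
SPORT_CONTEXT = {"team", "squad", "cricket", "coach", "beat", "won"}

def gpe_role(mention, context_words):
    """Assign a role to a GPE mention based on surrounding tokens."""
    if SPORT_CONTEXT & set(context_words):
        return "ORG"   # "England beat Australia" -> the cricket team
    return "GPE"       # default: the geo-political entity itself

team = gpe_role("England", ["cricket", "team", "beat"])
country = gpe_role("England", ["is", "a", "country"])
```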


What does this mean in practical terms?

  • Separation of specific from general information makes adaptation easier

  • Reclassification of gazetteers unnecessary

  • Changes mainly to the semantic grammars, to:

    - use different gazetteer lookups

    - use more contextual information

    - group rules together differently
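The "reclassification unnecessary" point follows from keeping gazetteer lists separate from the semantic class they map to: moving between annotation schemes changes only the mapping, never the lists. A hypothetical sketch (list contents and scheme names are invented):

```python
# Gazetteer lists hold surface forms only; the semantic class comes from
# a separate mapping, so a MUC-to-ACE move edits the mapping, not the lists.
CITIES = {"Sheffield", "London"}
COUNTRIES = {"England", "France"}

ANNIE_SCHEME = {"city": "Location", "country": "Location"}
MACE_SCHEME = {"city": "GPE", "country": "GPE"}

def classify(token, scheme):
    if token in CITIES:
        return scheme["city"]
    if token in COUNTRIES:
        return scheme["country"]
    return None
```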


Semantic Grammars

  • ANNIE uses 21 phases, 187 rules, 9 entity types (av. 20.8 rules per entity type)

  • MACE uses 15 phases, 180 rules, 5 entity types (av. 36 rules per entity type)

  • The important factor is the increased complexity of new rules, rather than the number

  • Rules may be hand-crafted, but an experienced JAPE user can write several rules per minute

  • Adaptation from ANNIE to MACE took 6 weeks in total
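A JAPE rule pairs a pattern over annotations with an action that creates a new annotation. As a rough Python analogue (regex over raw text rather than real JAPE syntax, and the keyword list is made up):

```python
import re

# Made-up organisation keywords; a real JAPE grammar would match over
# gazetteer Lookup annotations rather than raw text.
ORG_KEY = r"(?:Ltd|Inc|plc|Services|Authority)"
org_rule = re.compile(r"\b([A-Z]\w+(?:\s+[A-Z]\w+)*\s+" + ORG_KEY + r")\b")

def apply_rule(text):
    # "Action" part: wrap each match in an Organization annotation.
    return [{"type": "Organization", "string": m.group(1)}
            for m in org_rule.finditer(text)]

orgs = apply_rule("National Air Traffic Services said delays would ease.")
```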



Evaluation (2)

  • NEWS – 92 articles (business news)

  • ACE – 86 broadcast news texts from the September 2002 evaluation

  • Performance difference measured on the ACE task

  • MACE also scored against MUC-style annotations, in two conditions:

    • GPEs left as GPE (so they count as errors)

    • GPEs mapped to Locations


Comparison of ANNIE vs MACE

MACE achieves 72% Precision and 84% Recall if GPEs are mapped to Locations
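For reference, precision and recall derive from counts of correct, spurious, and missing entities. The counts below are invented to reproduce the headline figures, not taken from the evaluation itself:

```python
def precision_recall(correct, spurious, missing):
    # Simplified MUC-style scoring: no partial-match credit.
    precision = correct / (correct + spurious)
    recall = correct / (correct + missing)
    return precision, recall

# Invented counts that happen to yield ~72% precision / 84% recall.
p, r = precision_recall(correct=84, spurious=33, missing=16)
```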


Conclusions

  • MACE is a rule-based NE system, in contrast with most systems which use ML.

  • Advantages: it doesn’t require much training data, and it is fast to adapt because of its robust design

  • If large amounts of training data are available, HMM-based systems tend to perform slightly better

  • Rule-based systems tend to be good at recall but sometimes low on precision unless supported additionally by ML methods

