Towards a semantic extraction of named entities

Diana Maynard, Kalina Bontcheva, Hamish Cunningham

University of Sheffield, UK

Introduction
  • Challenges posed by progression from traditional IE to a more semantic representation of NEs
  • What techniques are best for the deeper level of analysis necessary?
  • Can traditional rule-based methods cope with such a transition, or does the future lie solely with machine learning?
The ACE program

“A program to develop technology to extract and characterise meaning from human language”

Aims:

  • produce structured information about entities, events and the relations that hold between them
  • promote design of more generic systems rather than those tuned to a very specific domain and text type (as with MUC)
The ACE tasks
  • Identification of entities and classification into semantic types (Person, Organisation, Location, GPE, Facility)
  • Identification and coreference of all mentions of each entity in the text (name, pronominal, nominal)
  • Identification of relations holding between such entities

<entity ID="ft-airlines-27-jul-2001-2" GENERIC="FALSE" entity_type="ORGANIZATION">
  <entity_mention ID="M003" TYPE="NAME" string="National Air Traffic Services"></entity_mention>
  <entity_mention ID="M004" TYPE="NAME" string="NATS"></entity_mention>
  <entity_mention ID="M005" TYPE="PRO" string="its"></entity_mention>
  <entity_mention ID="M006" TYPE="NAME" string="Nats"></entity_mention>
</entity>
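
An annotation of this shape can be read with any XML parser. The following is a minimal sketch using Python's standard-library `xml.etree.ElementTree`; the function name `mentions_by_type` and the grouping by mention type are illustrative assumptions, not part of the ACE tooling or of MACE.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The entity from the slide, written compactly.
ACE_ENTITY = """
<entity ID="ft-airlines-27-jul-2001-2" GENERIC="FALSE" entity_type="ORGANIZATION">
  <entity_mention ID="M003" TYPE="NAME" string="National Air Traffic Services"></entity_mention>
  <entity_mention ID="M004" TYPE="NAME" string="NATS"></entity_mention>
  <entity_mention ID="M005" TYPE="PRO" string="its"></entity_mention>
  <entity_mention ID="M006" TYPE="NAME" string="Nats"></entity_mention>
</entity>
"""


def mentions_by_type(xml_text):
    """Return the entity type and its mention strings grouped by mention TYPE.

    Hypothetical helper for illustration only.
    """
    entity = ET.fromstring(xml_text.strip())
    grouped = defaultdict(list)
    for mention in entity.findall("entity_mention"):
        grouped[mention.get("TYPE")].append(mention.get("string"))
    return entity.get("entity_type"), dict(grouped)


entity_type, grouped = mentions_by_type(ACE_ENTITY)
print(entity_type)
print(grouped)
```

All four mentions belong to one entity, which is how the coreference between the full name, the two acronym variants and the pronoun is recorded.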

The MACE System
  • Rule-based NE system developed within GATE, adapted from ANNIE
  • PRs (processing resources): tokeniser, sentence splitter, POS tagger, gazetteer, semantic tagger, orthomatcher, pronominal and nominal coreferencer
  • Also: genre ID, switching controller to select different PRs automatically
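
The idea of a switching controller can be sketched as a pipeline whose later stages depend on the detected genre. This is a toy illustration in plain Python, not the GATE API: the PR functions, the tiny gazetteer, and the lowercase-text genre heuristic are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]          # a document is a plain dict in this sketch
PR = Callable[[Doc], Doc]     # a processing resource maps a doc to a doc

# Hypothetical two-entry gazetteer, keyed by lowercased token.
GAZETTEER = {"nats": "Organization", "england": "GPE"}


def tokeniser(doc: Doc) -> Doc:
    doc["tokens"] = doc["text"].split()
    return doc


def genre_id(doc: Doc) -> Doc:
    # Assumed heuristic: all-lowercase text is treated as a broadcast transcript.
    doc["genre"] = "broadcast" if doc["text"].islower() else "newswire"
    return doc


def gazetteer_cased(doc: Doc) -> Doc:
    # Only capitalised tokens are looked up.
    doc["lookups"] = [(t, GAZETTEER[t.lower()]) for t in doc["tokens"]
                      if t[:1].isupper() and t.lower() in GAZETTEER]
    return doc


def gazetteer_uncased(doc: Doc) -> Doc:
    # Case is ignored, since transcripts carry no reliable capitalisation.
    doc["lookups"] = [(t, GAZETTEER[t.lower()]) for t in doc["tokens"]
                      if t.lower() in GAZETTEER]
    return doc


# The "switching controller": genre decides which PRs run next.
PIPELINES: Dict[str, List[PR]] = {
    "newswire": [gazetteer_cased],
    "broadcast": [gazetteer_uncased],
}


def run(text: str) -> Doc:
    doc: Doc = {"text": text}
    for pr in (tokeniser, genre_id):
        doc = pr(doc)
    for pr in PIPELINES[doc["genre"]]:
        doc = pr(doc)
    return doc


print(run("NATS said England would pay")["lookups"])
print(run("nats said england would pay")["lookups"])
```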
Differences between ANNIE and MACE
  • Locations → Location / GPE
  • GPEs have roles (GPE, Per, Org, Loc)
  • New type Facility (subsumes some Orgs)
  • Metonymy means context is necessary for disambiguation (e.g. England cricket team vs England country)
  • No Date, Time, Money, Percent, Address, Identifier
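
The metonymy point can be illustrated with a small context rule: the same gazetteer entry receives a different role depending on the words that follow it. This is only a sketch in Python. The cue words, the two-token window, and the choice of Org for "England cricket team" are assumptions made for illustration, not MACE's actual rules or the ACE guidelines.

```python
# Hypothetical cue words that suggest a non-default role for a GPE mention.
ROLE_CUES = {
    "Org": {"team", "government", "squad"},
    "Per": {"voters", "people"},
}


def gpe_role(tokens, i, window=2):
    """Guess the role of the GPE at tokens[i] from the next `window` tokens."""
    context = {t.lower() for t in tokens[i + 1:i + 1 + window]}
    for role, cues in ROLE_CUES.items():
        if context & cues:
            return role
    return "GPE"  # default: the entity is used in its geo-political sense


print(gpe_role("England cricket team won the match".split(), 0))
print(gpe_role("England is a country".split(), 0))
```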
What does this mean in practical terms?
  • Separation of specific from general information makes adaptation easier
  • Reclassification of gazetteers unnecessary
  • Changes mainly to semantic grammars, to:
    • use different gazetteer lookups
    • use more contextual information
    • group rules together differently
Semantic Grammars
  • ANNIE uses 21 phases, 187 rules, 9 entity types (av. 20.8 rules per entity type)
  • MACE uses 15 phases, 180 rules, 5 entity types (av. 36 rules per entity type)
  • The important factor is the increased complexity of new rules, rather than the number
  • Rules may be hand-crafted, but an experienced JAPE user can write several rules per minute
  • 6 weeks for adaptation
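
The per-type averages quoted above follow directly from the rule and entity-type counts. A short Python check, where the dictionary layout is just a convenient assumption:

```python
# Figures taken from the slide.
systems = {
    "ANNIE": {"phases": 21, "rules": 187, "entity_types": 9},
    "MACE": {"phases": 15, "rules": 180, "entity_types": 5},
}


def rules_per_type(stats):
    """Average number of grammar rules per entity type."""
    return stats["rules"] / stats["entity_types"]


for name, stats in systems.items():
    print(f"{name}: {rules_per_type(stats):.1f} rules per entity type")
```

MACE covers fewer entity types with almost as many rules, so each type is handled by more rules on average.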
Evaluation (2)
  • NEWS – 92 articles (business news)
  • ACE – 86 broadcast news texts from the September 2002 evaluation
  • Difference on ACE task
  • MACE on MUC-style annotations
    • GPEs are left as GPE (so count as errors)
    • GPEs are mapped to Locations
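
The two scoring conditions can be sketched as follows. This Python example assumes strict exact-match scoring over (start, end, type) triples, which is simpler than the official MUC or ACE scorers, and the two annotations are invented toy data rather than results from the evaluation.

```python
def map_gpe_to_location(annotations):
    """Relabel every GPE annotation as Location; other types are unchanged."""
    return {(start, end, "Location" if label == "GPE" else label)
            for start, end, label in annotations}


def precision_recall(response, key):
    """Exact-match precision and recall of system response against the key."""
    correct = len(response & key)
    precision = correct / len(response) if response else 0.0
    recall = correct / len(key) if key else 0.0
    return precision, recall


# Toy data (hypothetical): a MUC-style key and a MACE-style response.
key = {(0, 7, "Location"), (20, 24, "Organization")}
response = {(0, 7, "GPE"), (20, 24, "Organization")}

print(precision_recall(response, key))                       # GPEs left as GPE
print(precision_recall(map_gpe_to_location(response), key))  # GPEs mapped
```

Left unmapped, a correctly located GPE counts as both a spurious and a missing annotation, which is why the first condition understates performance against MUC-style keys.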
Comparison of ANNIE vs MACE

72% Precision, 84% Recall if GPEs mapped to Locations
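
For a single summary number, precision and recall are often combined into an F-measure. The slide does not report one, so the value below is derived here under the assumption of equal weighting (beta = 1):

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


f1 = f_measure(0.72, 0.84)
print(f"{f1:.3f}")  # 0.775
```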

Conclusions
  • MACE is a rule-based NE system, in contrast with most systems, which use ML.
  • Advantages: it doesn’t require much training data, and it is fast to adapt because of its robust design
  • If large amounts of training data are available, HMM-based systems tend to perform slightly better
  • Rule-based systems tend to be good at recall but sometimes low on precision unless supported additionally by ML methods