

BK

TP.HCM

Named Entity Disambiguation: A Hybrid Statistical and Rule-based Incremental Approach

Hien Nguyen* (Ton Duc Thang University, Vietnam)

Tru Cao (Ho Chi Minh City University of Technology, Vietnam)

Semantic Web Group (VN-KIM)

Faculty of Computer Science & Engineering

Ho Chi Minh City University of Technology

*Email: hien@tut.edu.vn

Outline
  • Introduction
  • Wikipedia
  • Algorithm
  • Experimental results
  • Concluding remarks
Introduction: Named Entities
  • Named entities (NEs) include people, organizations, locations, dates, times, monetary amounts, measures, percentages, etc.
  • Example
  • “Ms. Washington's candidacy is being championed by several powerful lawmakers including her boss, Chairman John Dingell (D., Mich.) of the House Energy and Commerce Committee.”
Introduction: Problem
  • Different NEs may have the same name.
    • “John McCarthy has been a staple of the Ultimate Fighting Championship since its second event on March 11, 1994.”

John McCarthy → John McCarthy (referee)

    • “John McCarthy, professor of computer science at Stanford University, who developed LISP.”

John McCarthy → John McCarthy (computer scientist)

    • “John McCarthy, Britain's longest-held hostage in Lebanon, has been set free after more than five years in captivity.”

John McCarthy → John McCarthy (journalist)

Introduction: Motivation
  • Web searches
    • Queries about Named Entities (NEs) constitute a significant portion of popular web queries (Bunescu et al., EACL 2006).
    • About 30% of search engine queries include person names (R. Guha et al., WWW 2004).
    • Named entity disambiguation can improve the effectiveness of web search results for popular named entities.
  • Web-based Information Extraction
    • Identifying NEs exactly in web pages can improve accuracy in IE tasks (e.g. extracting relationships between NEs).
  • Question Answering
    • Identifying NEs exactly in questions can improve the accuracy of answers.
Introduction: NE Disambiguation
  • Mapping entity names in a text to the actual entities in a KB of discourse (e.g. Wikipedia). Cases to handle:
    • An ambiguous entity name does not occur in the KB
    • An ambiguous entity name occurs in the KB, but refers to a named entity outside the KB
    • An ambiguous entity name refers to two or more named entities in the KB
Introduction: NE Disambiguation

But much like the first presidential debate held two weeks ago in Oxford, Mississippi, a draw for Obama would be considered a win.

Introduction: NE Disambiguation

Gamsakhurdia is seen as a national hero by those who mourn him.

Zviad Gamsakhurdia, Georgia's first president after independence from the USSR, has been buried in the capital Tbilisi 14 years after his death.

NE Disambiguation

John McCarthy, 'great man' of computer science, wins major award

Introduction: Approach
  • Disambiguation based on context
    • Co-occurring entity names
    • Co-occurring NE identifiers
    • Tokens in a window context centered at a name in consideration
  • Disambiguation based on a KB
    • We view instances in the KB as carrying two kinds of information:
      • Attributes
      • Relations
    • We represent those instances by their attributes and relations
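
As an illustration of the "tokens in a window context" feature above, here is a minimal Python sketch. The function name `window_tokens`, the regular-expression tokenizer, and the default window size are assumptions made for this example; the slides do not state how the window is tokenized or how wide it is.

```python
import re


def window_tokens(text: str, name: str, size: int = 5) -> list[str]:
    """Return up to `size` tokens on each side of the first occurrence of `name`.

    Hypothetical helper: tokenization is a simple lowercase word split.
    """
    tokens = re.findall(r"\w+", text.lower())
    name_tokens = re.findall(r"\w+", name.lower())
    n = len(name_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == name_tokens:
            left = tokens[max(0, i - size):i]      # tokens before the name
            right = tokens[i + n:i + n + size]     # tokens after the name
            return left + right
    return []  # the name does not occur in the text


sentence = ("John McCarthy, professor of computer science at "
            "Stanford University, who developed LISP.")
context = window_tokens(sentence, "John McCarthy", size=3)
```

Here `context` holds the three tokens that follow the name, since nothing precedes it in the sentence.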
Introduction: Approach

Text containing ambiguous names
  • All keywords in the window of text centred on the ambiguous name
  • The whole text, extended with the page titles of the previously identified NEs

Wikipedia article
  • Entity page titles
  • Redirecting page titles
  • Category labels
  • Hyperlink labels

Matched by: heuristics + TF-IDF vector similarity

Wikipedia
  • Wikipedia is a free encyclopedia written collaboratively by a global community of more than 150,000 volunteers
  • These volunteers have contributed more than 11 million articles in 265 languages
  • More than 275 million people visit the Wikipedia site every month
  • 2,697,848 articles in the English version (as of Jan 14th, 2009)
Wikipedia – Pages & Titles

Disambiguation text

Wikipedia – Redirect pages

Redirect page titles

Algorithm
  • Hybrid statistical and rule-based incremental algorithm:
    • Rule-based NE disambiguation
      • Utilizing Wikipedia disambiguation texts

E.g. in “… Rockville, Maryland …”, the disambiguation text “Maryland” helps identify that Rockville is an area in Maryland
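
The disambiguation-text rule can be sketched roughly as follows. This is an illustrative assumption, not the authors' implementation: the helper names `split_title` and `rule_based_pick`, the 30-character look-ahead, and the "exactly one match" condition are all choices made for this example. It relies only on the Wikipedia title conventions "Name, Qualifier" and "Name (qualifier)".

```python
import re
from typing import Optional


def split_title(title: str) -> tuple[str, Optional[str]]:
    """Split a Wikipedia-style title into (name, disambiguation text)."""
    m = re.fullmatch(r"(.+?)\s*\((.+)\)", title)
    if m:                                   # e.g. "John McCarthy (referee)"
        return m.group(1), m.group(2)
    if ", " in title:                       # e.g. "Rockville, Maryland"
        name, disamb = title.split(", ", 1)
        return name, disamb
    return title, None                      # no disambiguation text


def rule_based_pick(text: str, name: str, candidates: list[str],
                    span: int = 30) -> Optional[str]:
    """Pick the candidate whose disambiguation text follows `name` in `text`.

    Returns None when the name is absent or when zero or several
    candidates match, leaving the name for a later stage.
    """
    start = text.find(name)
    if start < 0:
        return None
    end = start + len(name)
    following = text[end:end + span].lower()
    hits = []
    for title in candidates:
        disamb = split_title(title)[1]
        if disamb is not None and disamb.lower() in following:
            hits.append(title)
    return hits[0] if len(hits) == 1 else None


candidates = ["Rockville, Maryland", "Rockville, Connecticut", "Rockville, Indiana"]
picked = rule_based_pick("He moved to Rockville, Maryland last year.",
                         "Rockville", candidates)
```

Returning `None` on zero or multiple matches keeps the rule conservative, so that unresolved names fall through to the statistical stage.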

Algorithm
  • Rule-based NE disambiguation (cont.)
    • Exploiting coreference relationships between referents: propagation of the identified NE, if any, along its coreference chain

E.g. “On Thursday morning, Sen. Barack Obama warned supporters not to get "cocky," while a few hours later McCain pledged to Pennsylvania voters he would erase Obama's lead by Election Day.”

    • Extension of the whole text with the Wikipedia entity page titles of the identified NEs
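
The two incremental steps above (propagation along a coreference chain, then extension of the text) might look like the sketch below. It assumes the coreference chains are already available as lists of mention strings from some upstream component; the function names `propagate` and `extend_text` and the dictionary representation are hypothetical.

```python
def propagate(chains: list[list[str]], resolved: dict[str, str]) -> dict[str, str]:
    """Copy an identified entity to every mention in the same coreference chain."""
    result = dict(resolved)
    for chain in chains:
        entities = {resolved[m] for m in chain if m in resolved}
        if len(entities) == 1:              # propagate only if the chain agrees
            entity = entities.pop()
            for mention in chain:
                result.setdefault(mention, entity)
    return result


def extend_text(text: str, resolved: dict[str, str]) -> str:
    """Append the page titles of the identified entities to the text."""
    titles = sorted(set(resolved.values()))
    return text + " " + " ".join(titles) if titles else text


chains = [["Sen. Barack Obama", "Obama"], ["McCain"]]
resolved = propagate(chains, {"Sen. Barack Obama": "Barack Obama"})
extended = extend_text("he would erase Obama's lead", resolved)
```

After propagation, the short mention "Obama" inherits the entity of "Sen. Barack Obama", and the added page title gives later similarity matching more evidence for the names that remain ambiguous, such as "McCain".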
Algorithm
  • After the rule-based stage, each remaining ambiguous name is resolved by matching the whole-text vector against its candidate Wikipedia entity pages

The extracted context surrounding ambiguous names
  • All keywords in the window of text centred on the ambiguous name
  • The whole text, extended with the page titles of the previously identified NEs

Wikipedia article
  • Entity page titles
  • Redirecting page titles
  • Category labels
  • Hyperlink labels

Matched by: TF-IDF vector similarity
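
A minimal sketch of ranking candidates by TF-IDF cosine similarity is given below. The slides do not specify the exact weighting scheme, so the smoothed IDF formula, the function names, and the toy feature lists are assumptions for illustration only.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Build sparse TF-IDF vectors (raw term frequency, smoothed IDF)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(context: list[str],
                    candidates: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Rank candidate entity pages by similarity to the context features."""
    titles = list(candidates)
    vectors = tfidf_vectors([context] + [candidates[t] for t in titles])
    scores = [(t, cosine(vectors[0], v)) for t, v in zip(titles, vectors[1:])]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy features; real ones would come from titles, redirects, categories, links.
context = ["professor", "computer", "science", "stanford", "lisp"]
pages = {
    "John McCarthy (referee)": ["ultimate", "fighting", "championship", "referee"],
    "John McCarthy (computer scientist)": ["computer", "science", "lisp", "stanford",
                                           "artificial", "intelligence"],
    "John McCarthy (journalist)": ["journalist", "hostage", "lebanon"],
}
ranked = rank_candidates(context, pages)
```

The top-ranked title in `ranked` is the candidate that shares the most distinctive terms with the context; candidates with no shared terms score zero.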

Experimental results
  • Experiments: 10 CNN news articles on Travel, Entertainment, World, World Business, and Americas
Experimental results
  • D1: obtained after running GATE
  • D2: obtained after correcting GATE’s errors
Experimental results
  • We measure accuracy as the number of correct assignments of an NE in the text to a Wikipedia NE, divided by the total number of assignments
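
The accuracy measure can be written as a short function. The mention-to-title dictionaries and the sample values below are made up for illustration and are not results from the experiments.

```python
def accuracy(predicted: dict[str, str], gold: dict[str, str]) -> float:
    """Correct NE-to-Wikipedia assignments divided by all assignments made."""
    if not predicted:
        return 0.0
    correct = sum(1 for mention, entity in predicted.items()
                  if gold.get(mention) == entity)
    return correct / len(predicted)


# Hypothetical assignments, for illustration only.
gold = {"Obama": "Barack Obama", "McCain": "John McCain", "Oxford": "Oxford, Mississippi"}
predicted = {"Obama": "Barack Obama", "McCain": "John McCain", "Oxford": "Oxford"}
score = accuracy(predicted, gold)
```

In this toy case two of the three assignments match the gold standard, so the score is two thirds.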
Concluding remarks
  • The proposed method is a hybrid and incremental process that utilizes previously identified NEs and related terms co-occurring with ambiguous names in a text for entity disambiguation
  • Work under investigation:
    • Disambiguating cases where ambiguous names occur in a KB but refer to named entities outside the KB.

Thanks for your attention

VN-KIM Group

http://www.cse.hcmut.edu.vn/vn-kim/

Contact author: hien@tut.edu.vn or nthien97@yahoo.com