
To Link or Not to Link? A Study on End-to-End Tweet Entity Linking

Stephen Guo, Ming-Wei Chang, Emre Kıcıman


Motivation

  • Microblogs are data gold mines!

    • Twitter reports that it alone captures over 340M short messages per day

  • Many applications on tweet information extraction

    • Election results (Tumasjan et al., 2010)

    • Disease spreading (Paul and Dredze, 2011)

    • Tracking product feedback and sentiment (Asur and Huberman, 2010)

    • ...

  • Existing tools (for example, NER) are often too limited

    • Stanford NER achieves only 44% F1 on a tweet data set [Ritter et al., 2011]


Entity Linking (Wikifier) in Tweets

Oh Yes!! giants vs packers game now!! Touchdown!!

  • Q1: Which phrase should be linked? (mention detection)

  • Q2: Which Wikipedia page should be linked for selected phrases? (disambiguation)


Contributions

  • Proposed a new evaluation scheme for entity linking

    • A natural evaluation scheme for microblogs

  • A system that performs significantly better on tweets than other systems

    • Learns to detect mentions and perform linking jointly

    • Outperforms TagMe [Ferragina & Scaiella 2010] and [Cucerzan 07] by 15% F1

  • What we have learned

    • Mention detection is a difficult problem

    • Entity information can help mention detection


Outline

  • Task Definition (again!)

  • Two stage versus Joint

  • Model + Features

  • Results + Analysis


What should be linked?

Oh Yes!! giants vs packers game now!! Touchdown!!

  • Comparing different Wikifiers is a tough problem [Cornolti, WWW 2013]

  • Really, there is no good definition of what should be linked


Our Scenario

What are people saying about the movie “The Town” on Twitter?

  • Assume our customers are only interested in entities of certain types

    • Movies; Video Games; Sports Teams; ...

    • Type information can be directly inferred from the corresponding Wikipedia page

  • Now, it is fair to compare different systems

    • We assume PER, LOC, ORG, BOOK, TVSHOW, MOVIE
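
The type-restricted comparison described on this slide can be sketched as a scoring function. This is a minimal sketch under stated assumptions, not the evaluation code used in the study: the `Link` tuple, the `evaluate` function, the entity-to-type table, and the example links are all hypothetical. It filters gold and predicted links down to the types of interest, then computes precision, recall, and F1 over exact (span, entity) matches.

```python
from typing import NamedTuple

class Link(NamedTuple):
    start: int    # token offset where the mention starts
    end: int      # token offset one past the mention's last token
    entity: str   # Wikipedia page title

# Types listed on the slide.
TARGET_TYPES = {"PER", "LOC", "ORG", "BOOK", "TVSHOW", "MOVIE"}

def evaluate(gold, predicted, entity_type):
    """Precision, recall, and F1 over links whose entity has a target type."""
    def keep(links):
        return {l for l in links if entity_type.get(l.entity) in TARGET_TYPES}
    gold_set, pred_set = keep(gold), keep(predicted)
    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical annotation of "Oh Yes!! giants vs packers game now!! Touchdown!!"
entity_type = {"New York Giants": "ORG", "Green Bay Packers": "ORG",
               "Touchdown": "OTHER"}  # illustrative type table
gold = [Link(2, 3, "New York Giants"), Link(4, 5, "Green Bay Packers")]
predicted = [Link(2, 3, "New York Giants"), Link(7, 8, "Touchdown")]
precision, recall, f1 = evaluate(gold, predicted, entity_type)
print(precision, recall, f1)
```

In this toy case the "Touchdown" link is ignored because its type is outside the target set, so a system is neither rewarded nor penalized for linking phrases the scenario does not care about.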


The Desired Results

  • Oh Yes!! giants vs packers game now!! Touchdown!!


Terminology

  • Oh Yes!! giants vs packers game now!! Touchdown!!

Assignment

Mention Candidates

Mentions

Entity


Related Work

  • Wikifier [Cucerzan, 2007; Milne and Witten, 2008, ...]

    • Given a document, create Wikipedia-like links

    • Very difficult to evaluate/compare

    • Mention detection and disambiguation are often treated separately

  • NER [Li et al., 2012; Ritter et al., 2011, ...]

    • No Linking

    • Limited Types

  • KBP [Ji et al., 2010; Ji et al., 2011,...]

    • Focus on disambiguation aspect


Outline

  • Task Definition (again!)

  • Two stage versus Joint

  • Model + Features

  • Results + Analysis


What approach should we use?

  • Task: Wikify entities of certain types (all named entities)

  • Approach 1:

    • Train a general named entity recognizer for those types

    • Link to entities from the output of the first stage

  • Approach 2:

    • Learn to jointly detect mentions and disambiguate entities

    • Take advantage of Wikipedia information

    • Incorporate type information into our model

Limited Types; Adaptation

Advanced model


The Necessity of the Joint Approach

The town is so so good, Don’t worry Ben, we already forgave you for Gigli

  • Q: Is “the town” a mention?

  • Deep analysis with knowledge is required

    • Gigli is a Ben Affleck movie that was poorly reviewed

    • Ben Affleck is the lead actor in the movie “The Town”


Outline

  • Task Definition (again!)

  • Two stage versus Joint

  • Model + Features

  • Results + Analysis


Features

  • Oh Yes!! giants vs packers game now!! Touchdown!!

Mention, Entity Pair Features

Second Order Features

Type Features

Mention Specific Features


Mention Specific Features

  • Mention Specific Features

    • How likely is “giants” to be used as an anchor?

    • How likely is “giants” to be capitalized in Wikipedia?

    • Is “giants” a stopword? The number of tokens, ...

  • Entity - Mention Pair Features

    • Given the string "giants", how likely is each candidate entity? Estimated from the Wikipedia link structure

    • Similarity between the context of the mention and the words on the entity’s Wikipedia page

    • View count
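
The anchor-based statistics in this list can be sketched from two count tables. This is a minimal sketch, assuming hypothetical counts: `ANCHOR_TARGETS`, `OCCURRENCES`, `link_probability`, and `entity_prior` are illustrative names, and real values would have to be extracted from a Wikipedia dump. The first function estimates how often a string is used as an anchor; the second estimates how the anchors are distributed over entities.

```python
from collections import Counter

# Hypothetical counts: how often the anchor text "giants" points to each page.
ANCHOR_TARGETS = {
    "giants": Counter({"New York Giants": 60,
                       "San Francisco Giants": 30,
                       "Giant (mythology)": 10}),
}
# Hypothetical count: how often the string appears at all, linked or not.
OCCURRENCES = {"giants": 400}

def link_probability(mention):
    """Fraction of the string's occurrences in which it is an anchor."""
    total = OCCURRENCES.get(mention, 0)
    anchors = sum(ANCHOR_TARGETS.get(mention, Counter()).values())
    return anchors / total if total else 0.0

def entity_prior(mention):
    """Share of the string's anchors that point to each entity."""
    targets = ANCHOR_TARGETS.get(mention, Counter())
    total = sum(targets.values())
    return {entity: count / total for entity, count in targets.items()} if total else {}

print(link_probability("giants"))
print(entity_prior("giants"))
```

With these made-up counts, a quarter of the occurrences of "giants" are anchors, and most of those anchors point to one page; a mention detector and a disambiguator can each use one of the two numbers.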


View Count

  • The Wikipedia statistics

    • http://dumps.wikimedia.org/other/pagecounts-raw/

    • Logs exist for every hour

    • Very valuable data

  • View count is useful

    • Sometimes the most linked entity in Wikipedia is not the most popular one

    • “jersey shore” ==> ?

    • Jersey Shore: 441 links, 509,140 views

    • Jersey Shore (TV_series): 324 links, 5,081,377 views
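
The “jersey shore” example can be replayed in a few lines. The link and view counts below are the ones on the slide; the helper names and the log scaling of the view count are assumptions made for illustration, not something the slides specify.

```python
import math

# Link and view counts copied from the slide.
CANDIDATES = {
    "Jersey Shore": {"links": 441, "views": 509140},
    "Jersey Shore (TV_series)": {"links": 324, "views": 5081377},
}

def top_by(stat):
    """Candidate page with the largest value of the given statistic."""
    return max(CANDIDATES, key=lambda title: CANDIDATES[title][stat])

def log_view_feature(title):
    """Assumed feature scaling: log of (1 + view count)."""
    return math.log1p(CANDIDATES[title]["views"])

print(top_by("links"))
print(top_by("views"))
```

Ranking by links selects "Jersey Shore", while ranking by views selects "Jersey Shore (TV_series)", which is why the view count carries information that the link structure alone misses.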


Second Order Features

  • In(e) = the set of Wikipedia pages that link to entity e

  • The Jaccard score: J(e_i, e_j) = |In(e_i) ∩ In(e_j)| / |In(e_i) ∪ In(e_j)|
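
A Jaccard score between two candidate entities can be computed from the sets of pages that link to each of them. The sketch below assumes tiny, made-up inlink sets; `INLINKS` and `jaccard` are illustrative names, and a real system would build the sets from the Wikipedia link graph.

```python
def jaccard(inlinks_a, inlinks_b):
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = inlinks_a | inlinks_b
    return len(inlinks_a & inlinks_b) / len(union) if union else 0.0

# Hypothetical inlink sets: pages that contain a link to each entity.
INLINKS = {
    "New York Giants": {"NFL", "Super Bowl", "Eli Manning", "New Jersey"},
    "Green Bay Packers": {"NFL", "Super Bowl", "Aaron Rodgers", "Wisconsin"},
    "San Francisco Giants": {"Major League Baseball", "San Francisco", "World Series"},
}

football = jaccard(INLINKS["New York Giants"], INLINKS["Green Bay Packers"])
baseball = jaccard(INLINKS["San Francisco Giants"], INLINKS["Green Bay Packers"])
print(football, baseball)
```

In this toy data the two football teams share inlinks and the baseball team shares none with the Packers, so the score favors reading "giants" as the football team when "packers" appears in the same tweet.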


Type Features

  • The information content of Wikipedia differs from that of Twitter

    • Wikipedia is informational; Tweets are actionable

    • Misspelled words: “watchin”, “watchn”, ...

  • We want to find contextual words for PER, LOC, ORG, ... in tweets

    • Step 1: train a system

    • Step 2: use it to label 10 million unlabeled tweets

    • Step 3: collect popular contextual words for each type

    • Step 4: train a new system with one new feature

      • Check if the context matches the type
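
Steps 3 and 4 can be sketched as a counting pass followed by a lookup. This is a minimal sketch under stated assumptions, not the authors' procedure: the input format, the window size, the `top_n` cutoff, and both function names are hypothetical, and the three toy tweets stand in for the output of the Step 1 system.

```python
from collections import Counter, defaultdict

def mine_context_words(labeled_tweets, window=2, top_n=3):
    """labeled_tweets: iterable of (tokens, [(start, end, type), ...])."""
    counts = defaultdict(Counter)
    for tokens, spans in labeled_tweets:
        for start, end, etype in spans:
            left = tokens[max(0, start - window):start]
            right = tokens[end:end + window]
            counts[etype].update(w.lower() for w in left + right)
    # Keep only the most frequent context words for each type.
    return {t: {w for w, _ in c.most_common(top_n)} for t, c in counts.items()}

def context_matches_type(tokens, start, end, etype, context_words, window=2):
    """The single new feature: does any nearby word match the candidate's type?"""
    nearby = tokens[max(0, start - window):start] + tokens[end:end + window]
    return any(w.lower() in context_words.get(etype, set()) for w in nearby)

# Toy stand-in for automatically labeled tweets.
tweets = [
    (["watchin", "the", "town", "tonight"], [(1, 3, "MOVIE")]),
    (["watchin", "inception", "again"], [(1, 2, "MOVIE")]),
    (["flying", "to", "boston", "tomorrow"], [(2, 3, "LOC")]),
]
context = mine_context_words(tweets)
print(context_matches_type(["watchin", "gigli", "lol"], 1, 2, "MOVIE", context))
```

Because the context words are mined from tweets rather than from Wikipedia, informal spellings such as "watchin" end up as evidence for the MOVIE type.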


Mining Contextual Words


Procedure

  • Testing: step 1

    • Given a tweet

    • Tokenize it, remove symbols, segment hashtags

  • Testing: step 2

    • For every k-gram in the tweet, do a table lookup

      • To find mention candidates and the entities they can link to

  • Testing: step 3

    • Construct features and output the assignment with the trained model

  • Learning: Structural SVM; Inference: Exact/Beam search

    • A rule-based system for categorizing Wikipedia
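
Testing steps 1 and 2 can be sketched as a tokenizer followed by a k-gram table lookup. This is an illustration under stated assumptions: `ENTITY_TABLE` is a tiny hypothetical table, the tokenizer simply lowercases and drops symbols, and hashtag segmentation is left out. The trained model of step 3 would then decide, for each candidate returned here, whether to link it and to which entity.

```python
import re

# Hypothetical lookup table from lowercased surface strings to candidate entities.
ENTITY_TABLE = {
    "giants": ["New York Giants", "San Francisco Giants"],
    "packers": ["Green Bay Packers"],
    "the town": ["The Town (2010 film)"],
}

def tokenize(tweet):
    """Lowercase the tweet and drop symbols; hashtag segmentation is omitted."""
    return re.findall(r"[a-z0-9']+", tweet.lower())

def mention_candidates(tokens, max_k=3):
    """Return (start, end, surface, entities) for every k-gram found in the table."""
    found = []
    for k in range(1, max_k + 1):
        for start in range(len(tokens) - k + 1):
            surface = " ".join(tokens[start:start + k])
            if surface in ENTITY_TABLE:
                found.append((start, start + k, surface, ENTITY_TABLE[surface]))
    return found

tokens = tokenize("Oh Yes!! giants vs packers game now!! Touchdown!!")
candidates = mention_candidates(tokens)
for start, end, surface, entities in candidates:
    print(start, end, surface, entities)
```

The lookup deliberately over-generates: "giants" comes back with more than one candidate entity, and nothing at this stage decides whether the phrase should be linked at all.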


Outline

  • Task Definition (again!)

  • Two stage versus Joint

  • Model + Features

  • Results + Analysis


Data

  • We sample two sets of tweets

    • Train, Test 1 from [Ritter 2011]

    • Test 2 from Twitter with entertainment keywords

      • “director, actress”……

  • P@1 is very high

    • Many, many algorithms focus on disambiguation

    • However, if the mentions are correctly extracted, the system is already very good


Main Results

  • TagMe[Ferragina & Scaiella 2010] and Cucerzan [Cucerzan 07]

    • Cucerzan is designed for well-written documents

    • We have a more principled way to handle mention detection than TagMe


Impact of Features

  • Entity information helps mention detection

  • Mining contextual words helps a bit

  • Capturing entity-entity relations also improves the model


Conclusion & Discussions

  • We provide an experimental study on tweets

    • Jointly detect mentions and disambiguate

    • A structured learning approach

  • What have we learned

    • Mention detection is a difficult problem

    • Entity information could potentially help mention detection

  • Future work

    • Explore the connections between the joint approaches and the two stage approaches

      • [Illinois, ACL 2011; AIDA, VLDB 2011]

    • A more principled way to handle context

