Named Entity Mining From Click-Through Data Using Weakly Supervised LDA

Gu Xu (1), Shuang-Hong Yang (1,2), Hang Li (1)

(1) Microsoft Research Asia, China

(2) College of Computing, Georgia Tech, USA

Talk Outline
  • Named Entity Mining
    • Exploiting click-through data
    • Applying Latent Dirichlet Allocation
    • Developing a weakly supervised learning approach
  • Weakly Supervised LDA
  • Experimental Results
  • Summary
Named Entity Mining
  • Named Entity Mining (NEM)
    • Mine information about the named entities of a given class from a large amount of data.
    • Example: mine movie titles from a textual data collection
    • Applications: Web search, etc.
  • Three Challenges
    • Suitable data source for NEM: click-through data
    • Ambiguity in classes of named entities: LDA (topic model)
    • Supervision from human knowledge: weakly supervised learning

Click-through Data
  • Click-through data
    • Query context
      • [movie] trailer, [game] cheats
    • Click context
      • imdb.com for movies, gamespot.com for games
  • Wisdom-of-crowds
    • Very large-scale data that keeps growing
    • Frequently updated with emerging named entities
  • New data source for NEM
    • Over 70% of queries contain named entities.
    • Rich context for determining the classes of entities.
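
The two kinds of context on this slide can be pulled out of a single click-through record with a few lines of code. The following is a minimal sketch, assuming the entity is already known and appears as a contiguous run of tokens in the query. The helper names `query_context` and `click_context` are illustrative choices, not the authors' implementation.

```python
from urllib.parse import urlparse


def query_context(query, entity):
    """Replace the entity mention in a query with '#', e.g. '# trailer'.

    Returns None when the entity does not occur as a contiguous token run.
    """
    tokens = query.lower().split()
    ent = entity.lower().split()
    if not ent:
        return None
    for i in range(len(tokens) - len(ent) + 1):
        if tokens[i:i + len(ent)] == ent:
            return " ".join(tokens[:i] + ["#"] + tokens[i + len(ent):])
    return None


def click_context(url):
    """Reduce a clicked URL to its host name, dropping a leading 'www.'."""
    if "//" not in url:
        url = "//" + url  # urlparse only fills netloc after '//'
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


record = ("harry potter trailer", "http://www.imdb.com")
print(query_context(record[0], "harry potter"), click_context(record[1]))
```

Reducing the URL to a host name is one possible granularity for the click context; keeping the full URL would be an equally valid choice under different assumptions.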

Latent Dirichlet Allocation
  • Deal with ambiguity in classes of named entities
    • Classes of named entities are ambiguous.
      • Harry Potter: Book, Movie and Game
    • Topic models (LDA)

  • Example: queries containing "Harry Potter" and the websites clicked
    • harry potter trailer → imdb.com
    • harry potter dvd → movies.yahoo.com
    • harry potter cheats → cheats.ign.com
    • harry potter game → gamespot.com
  • Classes of named entities as topics
    • Movie
      • Query context: # trailer, # dvd, # movie
      • Click context: imdb.com, movies.yahoo.com, disney.go.com
    • Game
      • Query context: # cheats, # walkthrough, # game
      • Click context: gamespot.com, cheats.ign.com, gamefaqs.com
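
To make the "classes as topics" picture concrete, the sketch below scores one observation (a query context plus a clicked site) against two topics. The probabilities are made-up toy values chosen for illustration, not numbers from the talk. `class_posterior` is a hypothetical helper that applies Bayes' rule under a uniform class prior, which is far simpler than full LDA inference.

```python
# Toy per-class word distributions. All numbers are invented for illustration.
TOPICS = {
    "movie": {
        "context": {"# trailer": 0.5, "# dvd": 0.3, "# movie": 0.2},
        "site": {"imdb.com": 0.5, "movies.yahoo.com": 0.3, "disney.go.com": 0.2},
    },
    "game": {
        "context": {"# cheats": 0.5, "# walkthrough": 0.3, "# game": 0.2},
        "site": {"gamespot.com": 0.4, "cheats.ign.com": 0.4, "gamefaqs.com": 0.2},
    },
}

SMOOTHING = 1e-6  # small probability for words a topic has never seen


def class_posterior(context, site):
    """P(class | context, site) under a uniform class prior."""
    scores = {
        name: dists["context"].get(context, SMOOTHING)
        * dists["site"].get(site, SMOOTHING)
        for name, dists in TOPICS.items()
    }
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


for context, site in [("# trailer", "imdb.com"), ("# cheats", "cheats.ign.com")]:
    posterior = class_posterior(context, site)
    print(context, site, max(posterior, key=posterior.get))
```

The same entity can therefore contribute evidence to several classes, depending on which query and click it appears with.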

Weakly Supervised Learning
  • Supervise LDA training with examples
    • LDA is an unsupervised model.
      • Topics in LDA are latent and do not align with predefined semantic classes such as book, movie, and game.
    • Human labels are inaccurate and partial.
      • Labels are binary indicators rather than proportions.
      • A label only indicates that a named entity belongs to certain classes; it does not exclude the possibility that the entity belongs to other classes.
    • Weakly-supervised LDA
      • Supervise LDA training with partial labels
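
A partial label of this kind can be stored as a binary indicator vector over the predefined classes. The sketch below assumes a four-class inventory (movie, game, book, music) and a hypothetical helper `label_vector`. A 1 means "labeled as belonging", while a 0 means only "no label given", not "does not belong".

```python
CLASSES = ["movie", "game", "book", "music"]  # assumed class inventory


def label_vector(labeled_classes, classes=CLASSES):
    """Binary indicators: 1 = labeled as belonging, 0 = no label given."""
    unknown = set(labeled_classes) - set(classes)
    if unknown:
        raise ValueError(f"unknown classes: {sorted(unknown)}")
    return [1 if c in labeled_classes else 0 for c in classes]


# A labeler who only knows the Harry Potter films marks a single class.
y = label_vector({"movie"})
print(y)  # [1, 0, 0, 0]
```

The zeros for game and book in this vector do not assert that Harry Potter is not a game or a book, which is why the labels can only supervise training weakly.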
Weakly Supervised LDA
  • Overview

    • Seeds: …, Harry Potter, …
    • Click-through data (query, clicked URL)
      • harry potter book, http://www.amazon.com
      • harry potter cheats, http://cheats.ign.com
      • harry potter trailer, http://www.imdb.com
      • …
    • Create a virtual document for each seed and train WS-LDA
      • Virtual document (query context, clicked URL)
        • # book, http://www.amazon.com
        • # cheats, http://cheats.ign.com
        • # trailer, http://www.imdb.com
        • …
      • Obtained: contexts and websites
    • Find new named entities, as well as their classes, by using the obtained query contexts and clicked websites
      • Result: newly discovered entities
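
The last step, finding new entities from learned query contexts, can be sketched as pattern matching: a context such as `# trailer` is turned into a regular expression and applied to unseen queries. This covers candidate extraction only; how WS-LDA then assigns classes to the candidates is not shown. `extract_entity` and the sample queries are hypothetical.

```python
import re


def extract_entity(query, context):
    """Return the text that fills '#' in `context`, or None if no match."""
    prefix, placeholder, suffix = context.partition("#")
    if not placeholder:
        raise ValueError("context must contain the '#' placeholder")
    pattern = re.escape(prefix) + r"(.+?)" + re.escape(suffix)
    match = re.fullmatch(pattern, query.lower().strip())
    return match.group(1).strip() if match else None


# Made-up sample queries, for illustration only.
queries = ["kung fu panda trailer", "halo 3 cheats", "weather today"]
contexts = ["# trailer", "# cheats"]
for q in queries:
    for c in contexts:
        entity = extract_entity(q, c)
        if entity:
            print(f"{entity!r} matched {c!r}")
```

A query that consists of the context word alone (for example `trailer`) yields no candidate, because the placeholder must match at least one character.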

Weakly Supervised LDA (cont.)
  • LDA with two types of virtual words
    • w1: Query context (# book, # cheats, # trailer, …)
    • w2: Click context (http://www.amazon.com, http://cheats.ign.com, http://www.imdb.com, …)
    • Each virtual document contains both types of words
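
A virtual document with two word types can be represented as a pair of bags of words, one per type. The following sketch assumes the click-through records have already been reduced to (seed, query context, clicked URL) triples. `VirtualDocument` and `build_virtual_documents` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class VirtualDocument:
    """Bag of virtual words for one seed entity."""
    w1: Counter = field(default_factory=Counter)  # query contexts
    w2: Counter = field(default_factory=Counter)  # clicked websites


def build_virtual_documents(triples):
    """Group (seed, query_context, url) triples into one document per seed."""
    docs = {}
    for seed, context, url in triples:
        doc = docs.setdefault(seed, VirtualDocument())
        doc.w1[context] += 1
        doc.w2[url] += 1
    return docs


triples = [
    ("harry potter", "# book", "http://www.amazon.com"),
    ("harry potter", "# cheats", "http://cheats.ign.com"),
    ("harry potter", "# trailer", "http://www.imdb.com"),
]
docs = build_virtual_documents(triples)
print(docs["harry potter"].w1)
print(docs["harry potter"].w2)
```

Keeping the two word types in separate counters makes it straightforward to give each type its own vocabulary when training a topic model.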

Weakly Supervised LDA (cont.)
  • Introduce Weak Supervision
    • Objective: LDA log likelihood + soft constraints
    • Soft constraints are built from two per-class quantities
      • Document probability on the i-th class
      • Document binary label on the i-th class
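
One way the objective might be written, assuming the soft constraint is a label-weighted sum of the document's class probabilities, is sketched below. The symbols $\lambda$ (a trade-off weight), $K$ (number of classes), $\Theta$ (model parameters), $y_i$, and $\bar{z}_i$ are notation introduced here for illustration and may differ from the authors' formulation.

```latex
\begin{align}
O(\mathbf{w} \mid \mathbf{y}, \Theta) &= \log p(\mathbf{w} \mid \Theta) + \lambda\, C(\mathbf{y}, \Theta), \\
C(\mathbf{y}, \Theta) &= \sum_{i=1}^{K} y_i\, \bar{z}_i .
\end{align}
```

Here $\log p(\mathbf{w} \mid \Theta)$ is the LDA log likelihood of a virtual document $\mathbf{w}$, $y_i \in \{0, 1\}$ is the document's binary label on the i-th class, and $\bar{z}_i$ is the document's probability on the i-th class. Under this form, a class with $y_i = 0$ simply drops out of the sum, so the constraint rewards probability on labeled classes without adding an explicit penalty term for any other class.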

Experimental Results
  • Dataset
    • Seed named entities
      • About 1,000 seeds for each class, and 3,767 unique named entities in total
    • Click-through data
      • 1.5 billion query-URL pairs, containing 240 million unique queries and 17 million unique URLs
Experimental Results (cont.)
  • Top Contexts and websites
    • Contexts by class: Movie, Game, Book, Music
    • Websites by class: Movie, Game, Book, Music

Experimental Results (cont.)
  • Accuracy of Mined Entities
Summary
  • Proposed to use click-through data as a new data source for NEM
  • Employed a topic model to deal with ambiguity in the classes of named entities
  • Devised weakly supervised LDA for modeling click-through data
    • Two types of virtual words
    • Introduced weakly supervised learning into LDA
  • Experiments on large-scale data verified the effectiveness of the proposed approach