Entity categorization over large document collections
Presenter: Shu-Ya Li. Authors: Venkatesh Ganti, Arnd Christian König, Rares Vernica. KDD, 2008.

Outline

  • Motivation

  • Objective

  • Methodology

  • Experiments and Results

  • Conclusion

  • Comments


Motivation

  • Going from unstructured data to structured data

  • Extracting entities (people, movies) from documents and identifying their categories (painter, writer, actor)

  • Prior approaches

    • Most prior approaches (unary relation extraction) only analyzed the local document context within which entities occur.

    • Example: the context “… Donald Knuth works in research …” yields is-a-researcher (Donald_Knuth).

  • But…

    • Contexts such as “[Entity] present results” or “[Entity] publish” also occur with companies and newspapers, so a single context leaves is-a-researcher (Entity)? undecided.


Objectives

[Figure: Multi-Feature Relation Extractor. The contexts “…[Entity]’s paper…”, “…[Entity] gave a talk…” and “…[Entity] published…” yield the features ([Entity], ‘paper’), ([Entity], ‘talk’) and ([Entity], ‘published’), which together produce ([Entity], is-a-researcher).]

  • In this paper, we improve the accuracy of entity categorization by

    • considering an entity’s context across multiple documents

    • exploiting existing large lists of related entities
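
The first point, collecting an entity's context across multiple documents, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `context_words` and `aggregate_contexts`, the fixed word window, and the sample sentences are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def context_words(tokens, start, end, window=2):
    """Return the words within `window` positions of the mention tokens[start:end]."""
    left = tokens[max(0, start - window):start]
    right = tokens[end:end + window]
    return left + right

def aggregate_contexts(mentions, window=2):
    """Merge context-word counts for each entity over all of its mentions.

    `mentions` is an iterable of (entity, tokens, start, end) tuples, one per
    occurrence of the entity in some document (hypothetical input format).
    """
    features = defaultdict(Counter)
    for entity, tokens, start, end in mentions:
        features[entity].update(context_words(tokens, start, end, window))
    return features

# Two made-up documents mentioning the same entity.
mentions = [
    ("Donald Knuth", "Donald Knuth published a paper".split(), 0, 2),
    ("Donald Knuth", "yesterday Donald Knuth gave a talk".split(), 1, 3),
]
agg = aggregate_contexts(mentions, window=3)
print(agg["Donald Knuth"])
```

A classifier would then see one combined feature vector per entity (here containing 'published', 'paper' and 'talk' together) instead of one vector per isolated mention.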


Methodology

Ex: Extraction of is-a-movie relation

  • Context: “… Julia Roberts starred in Pretty Woman in 1988 …”

  • Actor-List: Alan Alba, Richard Gere, Julia Roberts

  • Feature: Co-occurrence between entity and actor name in context. Here the entity is Pretty Woman and the actor name is Julia Roberts.

  • Result: (Pretty Woman, is-a-movie)

Example of another extracted pair: (Yao_Ming, is-a-athlete)
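
The co-occurrence feature from this slide can be illustrated with a small sketch. It assumes exact, case-insensitive matching of word n-grams against the list; the function name `cooccurrence_count` and the `max_len` parameter are hypothetical and not taken from the paper.

```python
def cooccurrence_count(context_tokens, list_members, max_len=3):
    """Count word n-grams in the context that exactly match a list member."""
    members = {m.lower() for m in list_members}
    tokens = [t.lower() for t in context_tokens]
    hits = 0
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            if " ".join(tokens[i:i + n]) in members:
                hits += 1
    return hits

# Actor list and context sentence as they appear on the slide.
actor_list = ["Alan Alba", "Richard Gere", "Julia Roberts"]
context = "Julia Roberts starred in Pretty Woman in 1988".split()

movie_hits = cooccurrence_count(context, actor_list)
other_hits = cooccurrence_count("Yao Ming plays basketball".split(), actor_list)
print(movie_hits, other_hits)
```

A non-zero count for the context around "Pretty Woman" is the kind of signal that could support is-a-movie, while a context with no actor names contributes nothing to that feature.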


Methodology - Processing Large Document Collections

[Figure: processing pipeline over Document Corpus D (3.2 million documents from Wiki) and List corpus L.]

  • List-Member Extraction: List-Member Detection against a Synopsis of L produces Entity-List Pairs; Verification deletes false positives; the verified pairs are used for Co-Occurrence.

  • Context Feature Extraction: n-gram Extraction and Rule-based Extraction produce Entity – Candidate Context Pairs, which become Entity-Feature Pairs.

  • Aggregation, followed by Classification with Classifiers C.

  • Retaining the most important list members.

Example: a known set of directors (as ε) and a list of actors (as )

  • Entities: E1: Pretty Woman, E2: Mystic Pizza, E3: Doubt, E4: Duplicity, E5: Enchanted

  • Actors list: Amy Adams, Elizabeth Reaser, Julia Roberts, Tara Reid, Judy Reyes


Methodology - Processing Large Document Collections

[Figure: the same pipeline (List-Member Extraction, Context Feature Extraction, Aggregation, Classification with Classifiers C), annotated with the example below.]

Context: “… Julia Roberts starred in Pretty Woman in 1988 …”

  • n-gram Extraction, scanning D once: {Julia, Roberts, starred, Pretty, Woman, Julia Roberts, Pretty Woman, … }

  • Problems:

    1. the large amount of data written
    2. most of it is not expected to contain an entity that is a member of a list

  • Our Approach – Bloom Filter: List-Member Detection against a Synopsis of L, followed by Verification (delete false positives).

  • Candidate contexts: {starred, Pretty, Woman, Pretty Woman, … }

  • Entity – Candidate Context Pairs: (Julia Roberts, starred), (Julia Roberts, Pretty), (Julia Roberts, Woman), (Julia Roberts, Pretty Woman)
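
The idea of testing n-grams against a compact synopsis of L and then verifying the hits can be sketched as below. This is only an illustration under stated assumptions: the `BloomFilter` class, its bit-array size, the SHA-256 based hashing and the `ngrams` helper are choices made for this example, not details given on the slides.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

def ngrams(tokens, max_len=2):
    """Yield all word n-grams of length 1..max_len."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

# Synopsis of the list corpus: here just the actors list from the slides.
actors = {"Amy Adams", "Elizabeth Reaser", "Julia Roberts", "Tara Reid", "Judy Reyes"}
synopsis = BloomFilter()
for name in actors:
    synopsis.add(name)

tokens = "Julia Roberts starred in Pretty Woman in 1988".split()
candidates = [g for g in ngrams(tokens) if g in synopsis]  # may contain false positives
verified = [g for g in candidates if g in actors]          # exact check deletes them
print(verified)
```

Only n-grams that pass the in-memory filter need to be written out, which addresses the first problem listed above; the exact lookup afterwards plays the role of the Verification step.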


Experiments


Conclusion

Studied the effect of aggregate context in relation extraction.

Proposed efficient processing techniques for large text corpora.

Both aggregate and co-occurrence features provide a significant increase in extraction accuracy compared to single-context classifiers.


Comments

  • Advantage

    • The first half of this paper is clear.

  • Drawback

    • But the second half of this paper isn’t clear.

  • Application

    • Entity categorization

