Knowledge Base Acceleration TREC 2012

Knowledge Base Acceleration TREC 2012. John R. Frank and Ian Soboroff. November 17, 2011.

Presentation Transcript


November 17, 2011

Knowledge Base Acceleration TREC 2012

John R. Frank

Ian Soboroff


Number of People Creating Representations of Knowledge

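[Figure: number of people creating representations of knowledge over time. Chart labels: WWW, Expert Systems, Machine Learning, Transistor, Telegraph, Gutenberg Bible, Library at Alexandria, maps. Time axis: 300 BC, 140 AD, 1828, 1900, 1950, 1970, 1984, 1994, 2001, now.]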


Accelerate?

rate of assimilation << rate of new info

# editors << # active entities

(definition of a “large” KB)


Random Choices

How many days must a news article wait before eventually being cited in Wikipedia?

Time lag in days between publication and eventual citation in Wikipedia of a sample of 50,000 web pages (mostly news) cited in Category:LivingPeople.

[Plot axes: days (x) vs. number of pages (y)]
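The time lag described above can be computed with plain date arithmetic. The following is a minimal sketch, not the authors' analysis code: the function names, the 30-day bin width, and the sample dates are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from datetime import date


def citation_lag_days(published: date, cited: date) -> int:
    """Days between an article's publication and its citation in the KB."""
    return (cited - published).days


def lag_histogram(lags, bin_width=30):
    """Count lags per bin; each key is the lower edge of a bin, in days."""
    return Counter((lag // bin_width) * bin_width for lag in lags)


# Hypothetical (publication date, citation date) pairs, not real data.
samples = [
    (date(2011, 8, 1), date(2011, 8, 1)),
    (date(2011, 8, 1), date(2011, 9, 15)),
    (date(2010, 11, 17), date(2011, 11, 17)),
]

lags = [citation_lag_days(published, cited) for published, cited in samples]
bins = lag_histogram(lags)
```

Plotting the bin counts against the bin edges would give a "num pages" versus "days" chart of the kind the slide shows.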


Even for entities mentioned frequently in the news, there is no correlation between mention and edit frequencies.

Human analysts follow their personal interests, hunches, hobbies, habits.

True for all large knowledge bases.

[Plot axes: mean edit interval (days) vs. mean mention interval (hours)]
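One way to check a claim like this is to compute, per entity, the mean interval between mentions and the mean interval between edits, then correlate the two. The sketch below assumes event times are given as plain numbers in a consistent unit; the function names are hypothetical, and Pearson correlation is only one possible choice of statistic, not necessarily the one behind the slide.

```python
from math import sqrt


def mean_interval(timestamps):
    """Mean gap between consecutive event times (same unit as the input)."""
    ts = sorted(timestamps)
    if len(ts) < 2:
        raise ValueError("need at least two events to measure an interval")
    # The gaps telescope, so their sum is simply last minus first.
    return (ts[-1] - ts[0]) / (len(ts) - 1)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)
```

Given one `mean_interval` value per entity for mentions and another for edits, `pearson` over the two lists returns a value near zero when the quantities are linearly unrelated.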


Methods in the Madness

[Plot axes: mean edit interval (days) vs. mean mention interval (hours)]


First Year Task: Basic CCR, “Cumulative Citation Recommendation”

  • Steps:

  • Initialize with a single KB node:

    • Freebase & Wikipedia content

    • WP edits from Aug-Nov 2011

  • Begin iterating over news stream

  • For each article, output a “pertinence” confidence score between 0 and 1.

    • Aug-Sep: train on labels

    • Oct-Mar: labels hidden

  • Your system generates labels and excerpts Oct-Mar

  • Content Stream

  • ~500,000 English de-duplicated articles per day

  • Half news, half blogs & forums

Your System
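The steps above can be sketched as a loop over the stream that emits one confidence score per article. This is only an illustration under stated assumptions: the `KBNode` and `Article` types, the `run_ccr` function, and the token-overlap scorer are hypothetical stand-ins, not a task-provided API or baseline.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class KBNode:
    node_id: str  # hypothetical identifier for the target KB node
    name: str     # display name used by the toy scorer below


@dataclass
class Article:
    doc_id: str
    text: str


def pertinence(node: KBNode, article: Article) -> float:
    """Toy score in [0, 1]: fraction of the node's name tokens found in the text."""
    name_tokens = node.name.lower().split()
    if not name_tokens:
        return 0.0
    text_tokens = set(article.text.lower().split())
    hits = sum(1 for token in name_tokens if token in text_tokens)
    return hits / len(name_tokens)


def run_ccr(node: KBNode, stream: Iterable[Article]) -> Iterator[Tuple[str, float]]:
    """Iterate over the stream in order, yielding (doc_id, confidence) pairs."""
    for article in stream:
        yield article.doc_id, pertinence(node, article)


# Hypothetical node and articles, for illustration only.
node = KBNode("n1", "Gavin Rain")
stream = [
    Article("d1", "Painter Gavin Rain will exhibit in Venice"),
    Article("d2", "Rain expected in Venice"),
    Article("d3", "Stock markets fell"),
]
results = list(run_ccr(node, stream))
```

A real system would replace `pertinence` with a model trained on the Aug-Sep labels; the generator shape keeps the scorer from looking ahead in the stream.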


Challenging Example

[Diagram with three numbered parts (1, 2, 3). Recoverable labels:]

  • Gavin Rain (South African painter) "will have an exhibit in" Venice Biennale (art show)

  • Links labeled "explicit mentions in news" and "explicit mentions"

  • Link labeled "No explicit co-occurrence … inference?"

  • Controversy about South African Pavilion at Venice Biennale


Annotations

(guidelines under development)


Future Tasks

Detect changes to infobox slot values

Detect new links between entities

Resurrection of old articles (archive mining)

Identify emerging entities (not yet in KB)

… many more ideas …


Timeline

  • December 2011

    • Call for Participation

    • Test data: three nodes, four months

  • January 2012 (tentative)

    • Eval results for baseline system

    • Data for more nodes

  • March/April 2012

    • Full data: ~50 nodes, eight months

  • Summer meet up

    • At a convenient conference?

  • Submit your runs for eval

  • TREC 2012

  • Monthly Skype calls and discussion in Google Groups

[Timeline axis: Nov Dec Jan Feb Mar Apr Jun Jul Aug Sep Oct Nov, with a "now" marker]


Optional Output Values (not judged in 2012)

  • Novelty Group ID:

    • Output is a list of docIDs:

      • An empty list means this doc has new information

      • One or more previous docIDs means that all of this document’s pertinent info was already revealed in those earlier docs

    • Would help us plan future tasks about novelty

  • Links to other nodes:

    • Output a list of other KB nodes that this content item associates to the target node

    • Would help us plan future tasks about link detection

  • Infobox slot name=value

    • Output a list of two-tuples of strings

      • [(slot name, slot value),…]

    • Would help us plan future tasks about detecting infobox changes
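The optional values above could be carried alongside the confidence score in a single per-document record. The sketch below is one possible shape, not a submission format from the task: the `CCROutput` class, its field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CCROutput:
    doc_id: str
    node_id: str
    confidence: float  # "pertinence" confidence between 0 and 1
    # Optional values; empty lists are the defaults.
    novelty_group: List[str] = field(default_factory=list)   # earlier docIDs
    linked_nodes: List[str] = field(default_factory=list)    # other KB nodes
    infobox_slots: List[Tuple[str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def has_new_information(self) -> bool:
        """An empty novelty group means the doc carries new information."""
        return not self.novelty_group


# Hypothetical records, for illustration only.
fresh = CCROutput("doc-2", "Gavin_Rain", 0.9,
                  infobox_slots=[("occupation", "Painter")])
repeat = CCROutput("doc-3", "Gavin_Rain", 0.8, novelty_group=["doc-2"])
```

Here `fresh.has_new_information()` is true because its novelty group is empty, while `repeat` points back to the earlier document that already revealed its pertinent info.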
