
Overview of the TDT-2003 Evaluation and Results

Jonathan Fiscus

NIST

Gaithersburg, Maryland

November 17-18, 2003


Outline

  • TDT Evaluation Overview

  • TDT-2003 Evaluation Result Summaries

    • New Event Detection

    • Topic Detection

    • Topic Tracking

    • Link Detection

  • Other Investigations


TDT 101: “Applications for organizing text”

The 5 TDT applications, applied to terabytes of unorganized data:

  • Story Segmentation

  • Topic Tracking

  • Topic Detection

  • New Event Detection

  • Link Detection


TDT’s Research Domain

  • Technology challenge

    • Develop applications that organize and locate relevant stories from a continuous feed of news stories

  • Research driven by evaluation tasks

  • Composite applications built from

    • Automatic Speech Recognition

    • Story Segmentation

    • Document Retrieval


Definitions

  • An event is …

    • A specific thing that happens at a specific time and place along with all necessary preconditions and unavoidable consequences.

  • A topic is …

    • an event or activity, along with all directly related events and activities

  • A broadcast news story is …

    • a section of transcribed text with substantive information content and a unified topical focus


TDT-03 Evaluation Corpus

  • TDT4 Corpus

    • TDT4 Corpus used for last year’s evaluation

    • October 1, 2000 to January 31, 2001

    • 20 sources:

      • 8 English, 5 Arabic, 7 Mandarin Chinese

    • 90,735 news stories, 7,513 non-news stories

    • 80 annotated topics

      • 40 topics from 2002

      • 40 new topics

    • See LDC’s presentation for more details


What was new in 2003

  • 40 new topics

    • Same number of “On-Topic” stories

    • 20, 10, and 10 seed stories for Arabic, English, and Mandarin, respectively

    • Many more Arabic “On-Topic” stories

    • Large influence on scores


Participants

  • Carnegie Mellon Univ. (CMU)

  • Royal Melbourne Institute of Technology (RMIT)

  • Stottler Henke Associates, Inc. (SHAI)

  • Univ. Massachusetts (UMass)


TDT Evaluation Methodology

  • Evaluation tasks are cast as detection tasks:

    • YES there is a target, or NO there is not

  • Performance is measured in terms of detection cost: “a weighted sum of missed detection and false alarm probabilities”

    • CDet = CMiss * PMiss * Ptarget + CFA * PFA * (1 − Ptarget)

    • CMiss = 1 and CFA = 0.1 are preset costs

    • Ptarget = 0.02 is the a priori probability of a target
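The cost formula above can be written out directly. A minimal Python sketch using the preset values from this slide (illustrative only, not NIST's official scoring software):

```python
# Preset TDT cost-model parameters from the evaluation specification.
C_MISS = 1.0     # cost of a missed detection
C_FA = 0.1       # cost of a false alarm
P_TARGET = 0.02  # a priori probability that a trial is a target

def detection_cost(p_miss, p_fa):
    """C_Det: weighted sum of miss and false-alarm probabilities."""
    return C_MISS * p_miss * P_TARGET + C_FA * p_fa * (1 - P_TARGET)

# A system that answers NO on every trial misses every target:
# C_Det = 1.0 * 1.0 * 0.02 = 0.02
print(detection_cost(1.0, 0.0))  # -> 0.02
```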


TDT Evaluation Methodology (cont’d)

  • Detection Cost is normalized to generally lie between 0 and 1:

    • (CDet)Norm = CDet / min{CMiss * Ptarget, CFA * (1 − Ptarget)}

    • When based on the YES/NO decisions, it is referred to as the actual decision cost

  • Detection Error Tradeoff (DET) curves graphically depict the performance tradeoff between PMiss and PFA

    • Makes use of likelihood scores attached to the YES/NO decisions

    • Minimum DET point is the best score a system could achieve with proper thresholds
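The normalization and the minimum DET point can be sketched the same way; `min_det_cost` below treats every likelihood score as a candidate threshold and keeps the best normalized cost (an illustration of the idea, not the official DET-curve software):

```python
def normalized_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """(C_Det)_Norm: detection cost divided by the cheaper trivial system."""
    c_det = c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)
    return c_det / min(c_miss * p_target, c_fa * (1 - p_target))

def min_det_cost(scores, labels):
    """Lowest normalized cost achievable with an ideal YES/NO threshold.

    scores: likelihood scores; labels: True for target trials.
    """
    n_tgt = sum(labels)
    n_non = len(labels) - n_tgt
    best = normalized_cost(1.0, 0.0)  # threshold above every score: all misses
    for t in sorted(set(scores)):
        p_miss = sum(l and s < t for s, l in zip(scores, labels)) / n_tgt
        p_fa = sum(not l and s >= t for s, l in zip(scores, labels)) / n_non
        best = min(best, normalized_cost(p_miss, p_fa))
    return best

# Perfectly separated scores can reach zero cost with the right threshold.
print(min_det_cost([0.9, 0.8, 0.2, 0.1], [True, True, False, False]))  # -> 0.0
```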


TDT: Experimental Control

  • Good research requires experimental controls

  • Conditions that affect performance in TDT

    • Newswire vs. Broadcast news

    • Manual vs. automatic transcription of Broadcast News

    • Manual vs. automatic story segmentation

    • Mono vs. multilingual language material

    • Topic training amounts and languages

    • Default, automatic English translation vs. native orthography

    • Decision deferral periods


Outline

  • TDT Evaluation Overview

  • TDT-2003 Evaluation Result Summaries

    • New Event Detection (NED)

    • Topic Detection

    • Topic Tracking

    • Link Detection

  • Other Investigations


New Event Detection Task

  • System Goal:

    • To detect each new event discussing each topic for the first time

      • Evaluating “part” of a Topic Detection system, i.e., when to start a new cluster

[Diagram: new events on two topics in a story stream; the first story on each topic is a new event, the rest are not first stories.]
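One way to see the task: flag a story as a first story when it is sufficiently dissimilar from everything seen so far. A toy sketch using cosine similarity over bag-of-words vectors (the representation and threshold are assumptions for illustration; real NED systems were considerably more sophisticated):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(cnt * b.get(term, 0) for term, cnt in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def first_stories(stories, threshold=0.2):
    """YES/NO new-event decision for each story in arrival order."""
    seen, decisions = [], []
    for text in stories:
        vec = Counter(text.lower().split())
        decisions.append(all(cosine(vec, old) < threshold for old in seen))
        seen.append(vec)
    return decisions

print(first_stories([
    "earthquake hits city",
    "earthquake rescue effort in city",
    "election results announced",
]))  # -> [True, False, True]
```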


TDT-03 Primary NED Results
SR=nwt+bnasr, TE=eng,nat, boundary, DEF=10


Primary NED Results
2002 vs. 2003 Topics


Topic Detection Task

  • System Goal:

    • To detect topics in terms of the (clusters of) stories that discuss them.

      • “Unsupervised” topic training

      • New topics must be detected as the incoming stories are processed

      • Input stories are then associated with one of the topics

[Diagram: the incoming story stream is clustered into Topic 1 and Topic 2.]


TDT-03 Topic Detection Results
Multilingual Sources, English Translations, Reference Boundaries, 10 File Deferral Period

[DET plots: Newswire+BNews manual transcription and Newswire+BNews ASR conditions; one plotted system is not a primary system.]


Topic Tracking Task

[Diagram: a story stream in which Nt on-topic training stories are followed by test data of unknown stories.]

  • System Goal:

    • To detect stories that discuss the target topic, in multiple source streams

      • Supervised Training

        • Given Nt sample stories that discuss a given target topic

      • Testing

        • Find all subsequent stories that discuss the target topic
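The supervised train-then-test loop above can be reduced, under a simple bag-of-words assumption (purely illustrative, not any participant's actual system), to comparing each test story against a centroid built from the Nt training stories:

```python
import math
from collections import Counter

def track(training_stories, test_stories, threshold=0.2):
    """YES/NO tracking decisions for each test story against one topic."""
    # Supervised training: pool the Nt on-topic stories into one centroid.
    centroid = Counter()
    for text in training_stories:
        centroid.update(text.lower().split())
    norm_c = math.sqrt(sum(v * v for v in centroid.values()))

    decisions = []
    for text in test_stories:
        vec = Counter(text.lower().split())
        dot = sum(cnt * centroid[term] for term, cnt in vec.items())
        norm_v = math.sqrt(sum(v * v for v in vec.values()))
        sim = dot / (norm_v * norm_c) if norm_v and norm_c else 0.0
        decisions.append(sim >= threshold)
    return decisions

print(track(["volcano erupts on island"],
            ["island volcano ash cloud spreads",
             "parliament passes budget"]))  # -> [True, False]
```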


TDT-03 Primary TRK Results
Newswire+BNews Human Trans., Multilingual Sources, English Translations, Reference Boundaries, 1 Training Story, 0 Negative Training Stories

[DET plots: Newswire+BNews human transcription and Newswire+BNews ASR conditions (Nt=1, Nn=0), showing UMass01, CMU1, and RMIT1.]


Primary Topic Tracking Results
2002 vs. 2003 Topics

[Chart: minimum DET cost.]


Link Detection Task

  • System Goal:

    • To detect whether a pair of stories discuss the same topic. (Can be thought of as a “primitive operator” for building a variety of applications.)

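A minimal sketch of this primitive: score one pair of stories and threshold the score (cosine over bag-of-words is an assumption here; evaluated systems used richer models and translation handling):

```python
import math
from collections import Counter

def linked(story_a, story_b, threshold=0.25):
    """YES/NO decision: do these two stories discuss the same topic?"""
    a = Counter(story_a.lower().split())
    b = Counter(story_b.lower().split())
    dot = sum(cnt * b.get(term, 0) for term, cnt in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    sim = dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
    return sim >= threshold

print(linked("flood hits coastal town", "coastal town flood damage"))   # -> True
print(linked("flood hits coastal town", "stock market rallies today"))  # -> False
```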


TDT-03 Primary LNK Results
Newswire+BNews ASR, Multilingual Sources, English or Native Translations, Reference Boundaries, 10 File Deferral Period


TDT-03 Primary LNK Results
2002 vs. 2003 Topics

[Chart: topic-weighted minimum DET cost for UMass01 and CMU1.]


Outline

  • TDT Evaluation Overview

  • TDT-2003 Evaluation Result Summaries

    • New Event Detection (NED)

    • Topic Detection

    • Topic Tracking

    • Link Detection

  • Other Investigations


Other Investigations

  • History of performance


Evaluation Performance History

* 0.1798 on 2002 Topics


Evaluation Performance History

* 0.1618 on 2002 Topics


Evaluation Performance History

* 0.3007 on 2002 Topics


Evaluation Performance History

* 0.4283 on 2002 Topics


Summary and Issues to Discuss

  • TDT Evaluation Overview

  • 2003 TDT Evaluation Results

  • 2002 vs. 2003 topic sets are very different

    • 2003 set was weighted more towards Arabic

    • Dramatic increase in error rates on the new topics for link detection, topic tracking, and new event detection

    • Need to calculate the effect of topic set on topic detection

  • TDT 2004

    • Release 2003 topics and TDT4 corpus?

    • Ensure 2004 evaluation will support Go/No Go decisions

    • What tasks will 2004 include?

