
Overview of the TDT-2003 Evaluation and Results

Jonathan Fiscus

NIST

Gaithersburg, Maryland

November 17-18, 2003

Outline
  • TDT Evaluation Overview
  • TDT-2003 Evaluation Result Summaries
    • New Event Detection
    • Topic Detection
    • Topic Tracking
    • Link Detection
  • Other Investigations
TDT 101: “Applications for organizing text”
  • Five TDT applications for organizing terabytes of unorganized news data:
    • Story Segmentation
    • Topic Tracking
    • Topic Detection
    • New Event Detection
    • Link Detection

TDT’s Research Domain
  • Technology challenge
    • Develop applications that organize and locate relevant stories from a continuous feed of news stories
  • Research driven by evaluation tasks
  • Composite applications built from
    • Automatic Speech Recognition
    • Story Segmentation
    • Document Retrieval
Definitions
  • An event is …
    • A specific thing that happens at a specific time and place along with all necessary preconditions and unavoidable consequences.
  • A topic is …
    • an event or activity, along with all directly related events and activities
  • A broadcast news story is …
    • a section of transcribed text with substantive information content and a unified topical focus
TDT-2003 Evaluation Corpus
  • TDT4 Corpus
    • TDT4 Corpus used for last year’s evaluation
    • October 1, 2000 to January 31, 2001
    • 20 sources:
      • 8 English, 5 Arabic, 7 Mandarin Chinese
    • 90,735 news stories, 7,513 non-news stories
    • 80 annotated topics
      • 40 topics from 2002
      • 40 new topics
    • See LDC’s presentation for more details
What was new in 2003
  • 40 new topics
    • Same number of “On-Topic” stories
    • 20, 10, 10 seed stories for Arabic, English and Mandarin respectively.
    • Many more Arabic “On-Topic” stories
    • Large influence on scores
Participants
  • Carnegie Mellon Univ. (CMU)
  • Royal Melbourne Institute of Technology (RMIT)
  • Stottler Henke Associates, Inc. (SHAI)
  • Univ. Massachusetts (UMass)
TDT Evaluation Methodology
  • Evaluation tasks are cast as detection tasks:
    • YES there is a target, or NO there is not
  • Performance is measured in terms of detection cost: “a weighted sum of missed detection and false alarm probabilities”
    • CDet = CMiss * PMiss * Ptarget + CFA * PFA * (1 - Ptarget)
    • CMiss = 1 and CFA = 0.1 are preset costs
    • Ptarget = 0.02 is the a priori probability of a target
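
To make the cost function concrete, here is a small worked example in Python (not from the slides; the error rates are made up, while the costs and prior follow the bullets above):

    # Detection cost with the preset TDT costs and target prior.
    C_MISS, C_FA, P_TARGET = 1.0, 0.1, 0.02

    def detection_cost(p_miss, p_fa):
        """CDet = CMiss*PMiss*Ptarget + CFA*PFA*(1 - Ptarget)."""
        return C_MISS * p_miss * P_TARGET + C_FA * p_fa * (1.0 - P_TARGET)

    # Hypothetical system with a 20% miss rate and a 1% false-alarm rate:
    print(detection_cost(p_miss=0.20, p_fa=0.01))   # 0.004 + 0.00098 = 0.00498
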
TDT Evaluation Methodology (cont’d)
  • Detection Cost is normalized to generally lie between 0 and 1:
    • (CDet)Norm = CDet / min{CMiss * Ptarget, CFA * (1 - Ptarget)}
    • When based on the YES/NO decisions, it is referred to as the actual decision cost
  • Detection Error Tradeoff (DET) curves graphically depict the performance tradeoff between PMiss and PFA
    • Makes use of likelihood scores attached to the YES/NO decisions
    • Minimum DET point is the best score a system could achieve with proper thresholds
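
As a sketch of how these quantities can be computed (an assumption about the mechanics, not NIST's scoring code), the normalization divides by the smaller of the costs of the trivial always-NO and always-YES systems, and the minimum DET point comes from sweeping a YES threshold over the likelihood scores:

    # Normalized detection cost and minimum DET point (illustrative only).
    C_MISS, C_FA, P_TARGET = 1.0, 0.1, 0.02
    NORM = min(C_MISS * P_TARGET, C_FA * (1.0 - P_TARGET))   # better of "always NO" / "always YES"

    def norm_cost(p_miss, p_fa):
        return (C_MISS * p_miss * P_TARGET + C_FA * p_fa * (1.0 - P_TARGET)) / NORM

    def min_det_cost(scores, is_target):
        """Try every likelihood score as a YES threshold and keep the lowest normalized cost.
        Assumes both target and non-target trials are present."""
        n_tgt = sum(is_target)
        n_non = len(is_target) - n_tgt
        best = norm_cost(1.0, 0.0)               # start from the trivial all-NO operating point
        for t in sorted(set(scores)):
            p_miss = sum(1 for s, y in zip(scores, is_target) if y and s < t) / n_tgt
            p_fa = sum(1 for s, y in zip(scores, is_target) if not y and s >= t) / n_non
            best = min(best, norm_cost(p_miss, p_fa))
        return best
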
TDT: Experimental Control
  • Good research requires experimental controls
  • Conditions that affect performance in TDT
    • Newswire vs. Broadcast news
    • Manual vs. automatic transcription of Broadcast News
    • Manual vs. automatic story segmentation
    • Monolingual vs. multilingual source material
    • Topic training amounts and languages
    • Automatic English translation (default) vs. native orthography
    • Decision deferral periods
Outline
  • TDT Evaluation Overview
  • TDT-2003 Evaluation Result Summaries
    • New Event Detection (NED)
    • Topic Detection
    • Topic Tracking
    • Link Detection
  • Other Investigations
New Event Detection Task
  • System Goal:
    • To detect the first story to discuss each topic
      • Evaluates “part” of a Topic Detection system, i.e., deciding when to start a new cluster (an illustrative sketch follows below)

[Slide figure: “New Event on two topics” — a story stream in which the first story on Topic 1 and the first story on Topic 2 are marked as new events; later stories on those topics are not first stories.]
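
The slide defines only the task; as a purely illustrative sketch (not any participant's system), a baseline first-story detector can answer YES when an incoming story's best word overlap with every previously seen story stays below a threshold (the 0.15 value is arbitrary):

    # Toy first-story (new event) detector: YES when the story looks unlike everything seen so far.
    def jaccard(a, b):
        """Word-set overlap between two stories."""
        return len(a & b) / len(a | b) if (a or b) else 0.0

    def new_event_decisions(stories, threshold=0.15):
        seen, decisions = [], []
        for text in stories:
            words = set(text.lower().split())
            best = max((jaccard(words, old) for old in seen), default=0.0)
            decisions.append(best < threshold)   # low overlap with the past -> new event
            seen.append(words)
        return decisions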

Topic Detection Task
  • System Goal:
    • To detect topics in terms of the (clusters of) stories that discuss them.
      • “Unsupervised” topic training
      • New topics must be detected as the incoming stories are processed
      • Input stories are then associated with one of the topics

[Slide figure: a story stream being grouped into clusters for Topic 1 and Topic 2.]
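
One way to read this operationally (a minimal sketch under assumed mechanics, not a description of the evaluated systems) is single-pass incremental clustering: each incoming story joins its most similar cluster, or opens a new cluster when nothing is similar enough, which is exactly the new-event decision from the previous task.

    # Toy single-pass topic detection: running term-count centroids, cosine similarity.
    from collections import Counter
    from math import sqrt

    def cosine(a, b):
        dot = sum(w * b.get(t, 0) for t, w in a.items())
        na = sqrt(sum(w * w for w in a.values()))
        nb = sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def detect_topics(stories, threshold=0.25):
        """Return one cluster id per story, clustering the stream as it arrives."""
        centroids, ids = [], []
        for text in stories:
            vec = Counter(text.lower().split())
            sims = [cosine(vec, c) for c in centroids]
            if sims and max(sims) >= threshold:
                k = sims.index(max(sims))
                centroids[k].update(vec)       # fold the story into its cluster
            else:
                centroids.append(vec)          # no close cluster -> start a new topic
                k = len(centroids) - 1
            ids.append(k)
        return ids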

TDT-03 Topic Detection Results: Multilingual Sources, English Translations, Reference Boundaries, 10-File Deferral Period

[DET plot: Topic Detection results for Newswire+BNews manual transcripts and Newswire+BNews ASR; one system shown is not a primary system.]


[Slide figure: the source stream split into training data (stories labeled on-topic) and test data (stories of unknown topicality).]

Topic Tracking Task
  • System Goal:
    • To detect stories that discuss the target topic, in multiple source streams
      • Supervised Training
        • Given Nt sample stories that discuss a given target topic
      • Testing
        • Find all subsequent stories that discuss the target topic
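
An illustrative (assumed) tracker along these lines: build a query vector from the Nt training stories, then answer YES for every test story whose cosine similarity to the query clears a threshold; the similarity doubles as the likelihood score for DET analysis.

    # Toy topic tracker: supervised query from Nt training stories, threshold on similarity.
    from collections import Counter
    from math import sqrt

    def track_topic(training_stories, test_stories, threshold=0.2):
        query = Counter()
        for text in training_stories:            # Nt on-topic samples (Nt=1 in the primary condition)
            query.update(text.lower().split())
        q_norm = sqrt(sum(w * w for w in query.values()))
        results = []
        for text in test_stories:
            vec = Counter(text.lower().split())
            v_norm = sqrt(sum(w * w for w in vec.values()))
            dot = sum(w * query.get(t, 0) for t, w in vec.items())
            score = dot / (q_norm * v_norm) if q_norm and v_norm else 0.0
            results.append((score >= threshold, score))   # (YES/NO decision, likelihood)
        return results
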

TDT-03 Primary TRK Results: Newswire+BNews Human Trans., Multilingual Sources, English Translations, Reference Boundaries, 1 Training Story, 0 Negative Training Stories

[DET plots: primary tracking results for UMass01, CMU1, and RMIT1 on Newswire+BNews ASR (Nt=1, Nn=0) and Newswire+BNews human transcripts (Nt=1, Nn=0).]

Link Detection Task
  • System Goal:
    • To detect whether a pair of stories discusses the same topic. (Can be thought of as a “primitive operator” for building a variety of applications; an illustrative sketch follows below.)

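A minimal sketch of this primitive (illustrative assumptions only, not the evaluated systems): score the pair with a similarity measure and threshold it, returning both the YES/NO decision and the score used as a likelihood.

    # Toy link detector: cosine similarity between the two stories' term-count vectors.
    from collections import Counter
    from math import sqrt

    def linked(story_a, story_b, threshold=0.2):
        a = Counter(story_a.lower().split())
        b = Counter(story_b.lower().split())
        dot = sum(w * b.get(t, 0) for t, w in a.items())
        na = sqrt(sum(w * w for w in a.values()))
        nb = sqrt(sum(w * w for w in b.values()))
        score = dot / (na * nb) if na and nb else 0.0
        return score >= threshold, score              # (YES/NO decision, likelihood)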


TDT-03 Primary LNK Results: Newswire+BNews ASR, Multilingual Sources, English or Native Translations, Reference Boundaries, 10-File Deferral Period

TDT-03 Primary LNK Results: 2002 vs. 2003 Topics

[Chart: topic-weighted minimum DET cost for the UMass01 and CMU1 systems on the 2002 vs. 2003 topic sets.]

Outline
  • TDT Evaluation Overview
  • TDT-2003 Evaluation Result Summaries
    • New Event Detection (NED)
    • Topic Detection
    • Topic Tracking
    • Link Detection
  • Other Investigations
Other Investigations
  • History of performance
Evaluation Performance History

* 0.1798 on 2002 Topics

Evaluation Performance History

* 0.1618 on 2002 Topics

Evaluation Performance History

* 0.3007 on 2002 Topics

Evaluation Performance History

* 0.4283 on 2002 Topics

Summary and Issues to Discuss
  • TDT Evaluation Overview
  • 2003 TDT Evaluation Results
  • 2002 vs. 2003 topic sets are very different
    • 2003 set was weighted more towards Arabic
    • Dramatic increase in error rates on the new topics for link detection, topic tracking, and new event detection
    • Need to calculate the effect of topic set on topic detection
  • TDT 2004
    • Release 2003 topics and TDT4 corpus?
    • Ensure 2004 evaluation will support Go/No Go decisions
    • What tasks will 2004 include?