
Overview of the TDT-2003 Evaluation and Results



Presentation Transcript


  1. Overview of the TDT-2003 Evaluation and Results Jonathan Fiscus, NIST, Gaithersburg, Maryland November 17-18, 2003

  2. Outline • TDT Evaluation Overview • TDT-2003 Evaluation Result Summaries • New Event Detection • Topic Detection • Topic Tracking • Link Detection • Other Investigations

  3. TDT 101: “Applications for organizing text” • Terabytes of unorganized data • 5 TDT applications: • Story Segmentation • Topic Tracking • Topic Detection • New Event Detection • Link Detection

  4. TDT’s Research Domain • Technology challenge • Develop applications that organize and locate relevant stories from a continuous feed of news stories • Research driven by evaluation tasks • Composite applications built from • Automatic Speech Recognition • Story Segmentation • Document Retrieval

  5. Definitions • An event is … • A specific thing that happens at a specific time and place along with all necessary preconditions and unavoidable consequences. • A topic is … • an event or activity, along with all directly related events and activities • A broadcast news story is … • a section of transcribed text with substantive information content and a unified topical focus

  6. TDT-2003 Evaluation Corpus • TDT4 Corpus, also used for last year’s evaluation • October 1, 2000 to January 31, 2001 • 20 sources: • 8 English, 5 Arabic, 7 Mandarin Chinese • 90,735 news stories, 7,513 non-news stories • 80 annotated topics • 40 topics from 2002 • 40 new topics • See LDC’s presentation for more details

  7. What was new in 2003 • 40 new topics • Same number of “On-Topic” stories • 20, 10, and 10 seed stories for Arabic, English, and Mandarin respectively • Many more Arabic “On-Topic” stories • Large influence on scores

  8. Participants • Carnegie Mellon Univ. (CMU) • Royal Melbourne Institute of Technology (RMIT) • Stottler Henke Associates, Inc. (SHAI) • Univ. Massachusetts (UMass)

  9. TDT Evaluation Methodology • Evaluation tasks are cast as detection tasks: • YES there is a target, or NO there is not • Performance is measured in terms of detection cost: “a weighted sum of missed detection and false alarm probabilities” • CDet = CMiss * PMiss * Ptarget + CFA * PFA * (1 - Ptarget) • CMiss = 1 and CFA = 0.1 are preset costs • Ptarget = 0.02 is the a priori probability of a target

  10. TDT Evaluation Methodology(cont’d) • Detection Cost is normalized to generally lie between 0 and 1: • (CDet)Norm = CDet/min{CMiss*Ptarget, CFA * (1-Ptarget)} • When based on the YES/NO decisions, it is referred to as the actual decision cost • Detection Error Tradeoff (DET) curves graphically depict the performance tradeoff between PMiss and PFA • Makes use of likelihood scores attached to the YES/NO decisions • Minimum DET point is the best score a system could achieve with proper thresholds
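The cost computation on these two slides can be sketched directly in code. This is a minimal illustration using the preset values CMiss = 1, CFA = 0.1, and Ptarget = 0.02; the function names are mine, not part of the NIST tooling.

```python
# Sketch of the TDT detection cost and its normalized form,
# using the preset values C_Miss = 1, C_FA = 0.1, P_target = 0.02.

def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    # C_Det = C_Miss * P_Miss * P_target + C_FA * P_FA * (1 - P_target)
    return c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)

def normalized_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    # Normalize by the cheaper of the two trivial systems:
    # "always NO" costs C_Miss * P_target; "always YES" costs C_FA * (1 - P_target).
    floor = min(c_miss * p_target, c_fa * (1 - p_target))
    return detection_cost(p_miss, p_fa, c_miss, c_fa, p_target) / floor
```

Under these presets a system that always answers NO (PMiss = 1, PFA = 0) scores a normalized cost of exactly 1.0, which is why normalized scores generally lie between 0 and 1.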

  11. TDT: Experimental Control • Good research requires experimental controls • Conditions that affect performance in TDT • Newswire vs. Broadcast news • Manual vs. automatic transcription of Broadcast News • Manual vs. automatic story segmentation • Mono vs. multilingual language material • Topic training amounts and languages • Default, automatic English translation vs. native orthography • Decision deferral periods

  12. Outline • TDT Evaluation Overview • TDT-2003 Evaluation Result Summaries • New Event Detection (NED) • Topic Detection • Topic Tracking • Link Detection • Other Investigations

  13. New Event Detection Task • System Goal: • To detect, for each topic, the first story that discusses it • Evaluating “part” of a Topic Detection system, i.e., when to start a new cluster • [Figure: a story stream in which the first stories of two topics are marked as new events; later Topic 1 and Topic 2 stories are not first stories]
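As an illustration of the “when to start a new cluster” decision, here is a toy first-story detector: a story is flagged as a new event when its best cosine similarity to every previously seen story falls below a threshold. The bag-of-words representation and the threshold value are assumptions for this sketch, not the evaluated systems’ methods.

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(n * b[t] for t, n in a.items())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def detect_new_events(stories, threshold=0.2):
    # YES (new event) when no earlier story is sufficiently similar.
    seen, decisions = [], []
    for text in stories:
        vec = Counter(text.lower().split())
        decisions.append(all(cosine(vec, old) < threshold for old in seen))
        seen.append(vec)
    return decisions
```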

  14. TDT-03 Primary NED Results (SR=nwt+bnasr, TE=eng,nat, boundary, DEF=10)

  15. Primary NED Results: 2002 vs. 2003 Topics

  16. Topic Detection Task • System Goal: • To detect topics in terms of the (clusters of) stories that discuss them • “Unsupervised” topic training • New topics must be detected as the incoming stories are processed • Input stories are then associated with one of the topics • [Figure: a story stream partitioned into Topic 1 and Topic 2 clusters]
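A toy version of this unsupervised clustering step: each incoming story joins its most similar existing cluster, or seeds a new topic when no cluster is similar enough. The threshold and the summed-term-count cluster representation are assumptions for illustration only.

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(n * b[t] for t, n in a.items())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_stream(stories, threshold=0.3):
    # Single-pass incremental clustering over the story stream.
    clusters, labels = [], []  # clusters: one summed term-count vector each
    for text in stories:
        vec = Counter(text.lower().split())
        sims = [cosine(vec, c) for c in clusters]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            clusters[best].update(vec)     # associate story with existing topic
            labels.append(best)
        else:
            clusters.append(Counter(vec))  # new topic detected
            labels.append(len(clusters) - 1)
    return labels
```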

  17. TDT-03 Topic Detection Results (Multilingual Sources, English Translations, Reference Boundaries, 10 File Deferral Period) • [Charts: Newswire+BNews Manual Trans.; Newswire+BNews ASR; one non-primary system]

  18. Topic Tracking Task • System Goal: • To detect stories that discuss the target topic, in multiple source streams • Supervised Training • Given Nt sample stories that discuss a given target topic • Testing • Find all subsequent stories that discuss the target topic • [Figure: training data (on-topic and unknown stories) followed by unknown test data]
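A toy sketch of the supervised condition under Nt=1: build a topic profile from the single training story and accept each test story whose similarity to the profile clears a threshold. The representation and threshold are illustrative assumptions, not the participants’ actual trackers.

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(n * b[t] for t, n in a.items())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def track_topic(training_story, test_stories, threshold=0.25):
    # Nt=1 supervised training: the profile is just the one on-topic story.
    profile = Counter(training_story.lower().split())
    return [cosine(Counter(s.lower().split()), profile) >= threshold
            for s in test_stories]
```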

  19. TDT-03 Primary TRK Results (Newswire+BNews Human Trans., Multilingual Sources, English Translations, Reference Boundaries, 1 Training Story, 0 Negative Training Stories) • [Charts: Newswire+BNews ASR, Nt=1, Nn=0; Newswire+BNews Human Trans., Nt=1, Nn=0; systems: UMass01, CMU1, RMIT1]

  20. Primary Topic Tracking Results: 2002 vs. 2003 Topics (Minimum DET Cost)

  21. Link Detection Task • System Goal: • To detect whether a pair of stories discuss the same topic (can be thought of as a “primitive operator” with which to build a variety of applications)
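The primitive-operator view suggests a minimal sketch: the pairwise YES/NO decision is just a thresholded similarity between the two stories. Bag-of-words cosine and the threshold value are assumptions here; the evaluated systems used richer models and cross-language translation.

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(n * b[t] for t, n in a.items())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def same_topic(story_a, story_b, threshold=0.3):
    # YES when the pair's similarity clears the decision threshold.
    vec_a = Counter(story_a.lower().split())
    vec_b = Counter(story_b.lower().split())
    return cosine(vec_a, vec_b) >= threshold
```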

  22. TDT-03 Primary LNK Results (Newswire+BNews ASR, Multilingual Sources, English Translations or Native Orthography, Reference Boundaries, 10 File Deferral Period)

  23. TDT-03 Primary LNK Results: 2002 vs. 2003 Topics (Topic-Weighted, Minimum DET Cost) • [Chart legend: UMass01, CMU1]

  24. Outline • TDT Evaluation Overview • TDT-2003 Evaluation Result Summaries • New Event Detection (NED) • Topic Detection • Topic Tracking • Link Detection • Other Investigations

  25. Other Investigations • History of performance

  26. Evaluation Performance History * 0.1798 on 2002 Topics

  27. Evaluation Performance History * 0.1618 on 2002 Topics

  28. Evaluation Performance History * 0.3007 on 2002 Topics

  29. Evaluation Performance History * 0.4283 on 2002 Topics

  30. Summary and Issues to Discuss • TDT Evaluation Overview • 2003 TDT Evaluation Results • 2002 vs. 2003 topic sets are very different • The 2003 set was weighted more heavily towards Arabic • Dramatic increase in error rates on the new topics for link detection, topic tracking, and new event detection • Need to calculate the effect of the topic set on topic detection • TDT 2004 • Release the 2003 topics and TDT4 corpus? • Ensure the 2004 evaluation will support Go/No-Go decisions • What tasks will 2004 include?
