
Overview of the TDT-2003 Evaluation and Results



Presentation Transcript


  1. Overview of the TDT-2003 Evaluation and Results Jonathan Fiscus, NIST, Gaithersburg, Maryland November 17-18, 2003

  2. Outline • TDT Evaluation Overview • TDT-2003 Evaluation Result Summaries • New Event Detection • Topic Detection • Topic Tracking • Link Detection • Other Investigations

  3. TDT 101: “Applications for organizing text” • Terabytes of unorganized data • 5 TDT applications: • Story Segmentation • Topic Tracking • Topic Detection • New Event Detection • Link Detection

  4. TDT’s Research Domain • Technology challenge • Develop applications that organize and locate relevant stories from a continuous feed of news stories • Research driven by evaluation tasks • Composite applications built from • Automatic Speech Recognition • Story Segmentation • Document Retrieval

  5. Definitions • An event is … • A specific thing that happens at a specific time and place along with all necessary preconditions and unavoidable consequences. • A topic is … • an event or activity, along with all directly related events and activities • A broadcast news story is … • a section of transcribed text with substantive information content and a unified topical focus

  6. TDT-2003 Evaluation Corpus • TDT4 Corpus, also used for last year’s evaluation • October 1, 2000 to January 31, 2001 • 20 sources: • 8 English, 5 Arabic, 7 Mandarin Chinese • 90,735 news stories, 7,513 non-news stories • 80 annotated topics • 40 topics from 2002 • 40 new topics • See LDC’s presentation for more details

  7. What was new in 2003 • 40 new topics • Same number of “On-Topic” stories • 20, 10, and 10 seed stories for Arabic, English, and Mandarin respectively • Many more Arabic “On-Topic” stories • Large influence on scores

  8. Participants • Carnegie Mellon Univ. (CMU) • Royal Melbourne Institute of Technology (RMIT) • Stottler Henke Associates, Inc. (SHAI) • Univ. Massachusetts (UMass)

  9. TDT Evaluation Methodology • Evaluation tasks are cast as detection tasks: • YES there is a target, or NO there is not • Performance is measured in terms of detection cost: “a weighted sum of missed detection and false alarm probabilities” • CDet = CMiss * PMiss * Ptarget + CFA * PFA * (1 - Ptarget) • CMiss = 1 and CFA = 0.1 are preset costs • Ptarget = 0.02 is the a priori probability of a target

  10. TDT Evaluation Methodology(cont’d) • Detection Cost is normalized to generally lie between 0 and 1: • (CDet)Norm = CDet/min{CMiss*Ptarget, CFA * (1-Ptarget)} • When based on the YES/NO decisions, it is referred to as the actual decision cost • Detection Error Tradeoff (DET) curves graphically depict the performance tradeoff between PMiss and PFA • Makes use of likelihood scores attached to the YES/NO decisions • Minimum DET point is the best score a system could achieve with proper thresholds
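The cost computation on these two slides can be sketched directly in code. This is a minimal illustration using the preset values CMiss = 1, CFA = 0.1, and Ptarget = 0.02; the function names are mine, not part of the NIST tooling.

```python
# Sketch of the TDT detection cost and its normalized form,
# using the preset values C_Miss = 1, C_FA = 0.1, P_target = 0.02.

def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    # C_Det = C_Miss * P_Miss * P_target + C_FA * P_FA * (1 - P_target)
    return c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)

def normalized_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    # Normalize by the cheaper of the two trivial systems:
    # "always NO" costs C_Miss * P_target; "always YES" costs C_FA * (1 - P_target).
    floor = min(c_miss * p_target, c_fa * (1 - p_target))
    return detection_cost(p_miss, p_fa, c_miss, c_fa, p_target) / floor
```

Under these presets a system that always answers NO (PMiss = 1, PFA = 0) scores a normalized cost of exactly 1.0, which is why normalized scores generally lie between 0 and 1.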

  11. TDT: Experimental Control • Good research requires experimental controls • Conditions that affect performance in TDT • Newswire vs. Broadcast news • Manual vs. automatic transcription of Broadcast News • Manual vs. automatic story segmentation • Mono vs. multilingual language material • Topic training amounts and languages • Default, automatic English translation vs. native orthography • Decision deferral periods

  12. Outline • TDT Evaluation Overview • TDT-2003 Evaluation Result Summaries • New Event Detection (NED) • Topic Detection • Topic Tracking • Link Detection • Other Investigations

  13. New Event Detection Task • System Goal: • To detect, for each topic, the first story that discusses it • Evaluating “part” of a Topic Detection system, i.e., when to start a new cluster • [Figure: a story stream in which the first stories of two topics are marked as new events; later Topic 1 and Topic 2 stories are not first stories]
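As an illustration of the “when to start a new cluster” decision, here is a toy first-story detector: a story is flagged as a new event when its best cosine similarity to every previously seen story falls below a threshold. The bag-of-words representation and the threshold value are assumptions for this sketch, not the evaluated systems’ methods.

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(n * b[t] for t, n in a.items())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def detect_new_events(stories, threshold=0.2):
    # YES (new event) when no earlier story is sufficiently similar.
    seen, decisions = [], []
    for text in stories:
        vec = Counter(text.lower().split())
        decisions.append(all(cosine(vec, old) < threshold for old in seen))
        seen.append(vec)
    return decisions
```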

  14. TDT-03 Primary NED Results (SR=nwt+bnasr, TE=eng,nat, boundary, DEF=10)

  15. Primary NED Results: 2002 vs. 2003 Topics

  16. Topic Detection Task • System Goal: • To detect topics in terms of the (clusters of) stories that discuss them • “Unsupervised” topic training • New topics must be detected as the incoming stories are processed • Input stories are then associated with one of the topics • [Figure: a story stream partitioned into Topic 1 and Topic 2 clusters]
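A toy version of this unsupervised clustering step: each incoming story joins its most similar existing cluster, or seeds a new topic when no cluster is similar enough. The threshold and the summed-term-count cluster representation are assumptions for illustration only.

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(n * b[t] for t, n in a.items())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_stream(stories, threshold=0.3):
    # Single-pass incremental clustering over the story stream.
    clusters, labels = [], []  # clusters: one summed term-count vector each
    for text in stories:
        vec = Counter(text.lower().split())
        sims = [cosine(vec, c) for c in clusters]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            clusters[best].update(vec)     # associate story with existing topic
            labels.append(best)
        else:
            clusters.append(Counter(vec))  # new topic detected
            labels.append(len(clusters) - 1)
    return labels
```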

  17. TDT-03 Topic Detection Results (Multilingual Sources, English Translations, Reference Boundaries, 10 File Deferral Period) • [Charts: Newswire+BNews Manual Trans.; Newswire+BNews ASR; one non-primary system]

  18. Topic Tracking Task • System Goal: • To detect stories that discuss the target topic, in multiple source streams • Supervised Training • Given Nt sample stories that discuss a given target topic • Testing • Find all subsequent stories that discuss the target topic • [Figure: training data (on-topic and unknown stories) followed by unknown test data]
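A toy sketch of the supervised condition under Nt=1: build a topic profile from the single training story and accept each test story whose similarity to the profile clears a threshold. The representation and threshold are illustrative assumptions, not the participants’ actual trackers.

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(n * b[t] for t, n in a.items())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def track_topic(training_story, test_stories, threshold=0.25):
    # Nt=1 supervised training: the profile is just the one on-topic story.
    profile = Counter(training_story.lower().split())
    return [cosine(Counter(s.lower().split()), profile) >= threshold
            for s in test_stories]
```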

  19. TDT-03 Primary TRK Results (Newswire+BNews Human Trans., Multilingual Sources, English Translations, Reference Boundaries, 1 Training Story, 0 Negative Training Stories) • [Charts: Newswire+BNews ASR, Nt=1, Nn=0; Newswire+BNews Human Trans., Nt=1, Nn=0; systems: UMass01, CMU1, RMIT1]

  20. Primary Topic Tracking Results: 2002 vs. 2003 Topics (Minimum DET Cost)

  21. Link Detection Task • System Goal: • To detect whether a pair of stories discuss the same topic (can be thought of as a “primitive operator” with which to build a variety of applications)
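The primitive-operator view suggests a minimal sketch: the pairwise YES/NO decision is just a thresholded similarity between the two stories. Bag-of-words cosine and the threshold value are assumptions here; the evaluated systems used richer models and cross-language translation.

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(n * b[t] for t, n in a.items())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def same_topic(story_a, story_b, threshold=0.3):
    # YES when the pair's similarity clears the decision threshold.
    vec_a = Counter(story_a.lower().split())
    vec_b = Counter(story_b.lower().split())
    return cosine(vec_a, vec_b) >= threshold
```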

  22. TDT-03 Primary LNK Results (Newswire+BNews ASR, Multilingual Sources, English Translations or Native Orthography, Reference Boundaries, 10 File Deferral Period)

  23. TDT-03 Primary LNK Results: 2002 vs. 2003 Topics (Topic-Weighted, Minimum DET Cost) • [Chart legend: UMass01, CMU1]

  24. Outline • TDT Evaluation Overview • TDT-2003 Evaluation Result Summaries • New Event Detection (NED) • Topic Detection • Topic Tracking • Link Detection • Other Investigations

  25. Other Investigations • History of performance

  26. Evaluation Performance History * 0.1798 on 2002 Topics

  27. Evaluation Performance History * 0.1618 on 2002 Topics

  28. Evaluation Performance History * 0.3007 on 2002 Topics

  29. Evaluation Performance History * 0.4283 on 2002 Topics

  30. Summary and Issues to Discuss • TDT Evaluation Overview • 2003 TDT Evaluation Results • 2002 vs. 2003 topic sets are very different • The 2003 set was weighted more heavily towards Arabic • Dramatic increase in error rates on the new topics for link detection, topic tracking, and new event detection • Need to calculate the effect of the topic set on topic detection • TDT 2004 • Release the 2003 topics and TDT4 corpus? • Ensure the 2004 evaluation will support Go/No-Go decisions • What tasks will 2004 include?
