
New Event Detection at UMass Amherst

Giridhar Kumaran and James Allan


Preprocessing

  • Lemur Toolkit for tokenization, stopping, k-stemming

    • http://www-2.cs.cmu.edu/~lemur/

  • BBN Identifinder™ for extracting named entities
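
The tokenization and stopping steps can be illustrated with a minimal Python sketch. This is not the Lemur Toolkit API: the `preprocess` function and the tiny `STOPWORDS` set are hypothetical stand-ins, and k-stemming and named entity extraction are left out because they depend on the tools named above.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only;
# a real system would use the list shipped with its toolkit.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "was", "for"}


def preprocess(text):
    """Lowercase the text, split it into alphanumeric tokens, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


print(preprocess("The Russian financial crisis deepened in August."))
# ['russian', 'financial', 'crisis', 'deepened', 'august']
```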

CIIR, UMass Amherst


Systems fielded

  • Submitted four systems

  • Didn’t include last year’s system

    • Classification according to LDC categories and term pruning

    • Didn’t work on a corpus consisting exclusively of newswire (NW) stories


Primary system – UMass1

  • Utility of named entities acknowledged

  • Failure analysis indicates

    • A large number of old stories have low confidence scores (false alarms)

    • Conflict with new story scores

    • Reasons

      • Stories on multiple topics

      • Diffuse topics

      • Varying document lengths


Primary system – UMass1

  • Focus

    • Identify old stories better – affects cost

  • Clue

    • Most old stories get low confidence scores because they are linked to their topic by

      • only named entities (large number)

      • only non-named entities (few)


Primary system – UMass1

  • Approach

    • Look at the set of closest matching stories

    • If the named entity or non-named entity match is consistently high, modify the confidence score
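
One way to read this approach is sketched below in Python. The slides do not give the similarity measure, the cutoff, or the definition of "consistently", so everything here is an assumption: each story is a dict with hypothetical `ne` and `non_ne` token lists, cosine similarity over raw term counts is used for both parts, and "consistently high" is taken to mean that every one of the closest stories clears a hypothetical `cutoff`.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_similarity(story, other):
    """Return (named entity match, non-named entity match) for two stories."""
    ne = cosine(Counter(story["ne"]), Counter(other["ne"]))
    non_ne = cosine(Counter(story["non_ne"]), Counter(other["non_ne"]))
    return ne, non_ne


def consistently_high(story, closest, cutoff=0.5):
    """True if all closest stories match strongly on named entities,
    or all of them match strongly on non-named entities.
    The cutoff value is an illustrative assumption."""
    sims = [split_similarity(story, other) for other in closest]
    if not sims:
        return False
    return (all(ne >= cutoff for ne, _ in sims)
            or all(non_ne >= cutoff for _, non_ne in sims))
```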


Primary system – UMass1

  • Procedure

    • Double the original confidence score if it is less than a threshold

    • Gradually reduce the score back towards the original score if the set of closest stories matches on neither named entities nor non-named entities
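
The procedure can be sketched in Python as follows. The slides do not state the actual formula, so the function name `adjust_confidence`, the per-story boolean flags, and the `decay` factor are hypothetical; the default threshold of 0.1 echoes the value shown on the TDT3 example slides, and treating it as the same threshold is also an assumption.

```python
def adjust_confidence(score, neighbor_matches, threshold=0.1, decay=0.5):
    """Illustrative score adjustment, not the authors' formula.

    score            -- original confidence score
    neighbor_matches -- one boolean per closest story: True if that story
                        matches strongly on named entities or on
                        non-named entities
    decay            -- assumed fraction of the boost kept after each
                        closest story that matches on neither
    """
    if score >= threshold:
        return score  # only scores below the threshold are modified
    boosted = 2.0 * score  # double the original score
    for matched in neighbor_matches:
        if not matched:
            # shrink the boost, moving the score back towards the original
            boosted = score + decay * (boosted - score)
    return boosted


print(adjust_confidence(0.05, [True, True]))  # 0.1
```

With this choice the score never falls below the original and never rises above twice the original, which is one simple way to satisfy both bullets.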


UMass1 – Examples from TDT3

  • Russian Financial Crisis - Old Story (Threshold = 0.1)

  • Thai Airbus Crash - New Story

UMass1 on TDT3


UMass2

  • Basic vector space model system

  • Compare with all preceding stories

  • Return highest cosine match
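
A minimal Python sketch of this baseline follows. The slides do not specify the term weighting, so raw term counts are assumed here, and the function names are hypothetical. Each story is scored by its highest cosine match against all preceding stories; a low value suggests a new event.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def highest_matches(stories):
    """For each story (a token list, in arrival order), return the highest
    cosine similarity to any preceding story; 0.0 for the first story."""
    seen, best = [], []
    for tokens in stories:
        vec = Counter(tokens)
        best.append(max((cosine(vec, old) for old in seen), default=0.0))
        seen.append(vec)
    return best


stream = [["russia", "crisis"], ["russia", "crisis", "ruble"], ["thai", "crash"]]
print(highest_matches(stream))
```

In this toy stream the second story scores high against the first, while the third shares no terms with anything earlier and scores 0.0.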


UMass3

  • Same model as UMass2

  • TDT5 – Very large collection

  • Practical system

  • Compare with a maximum of 25,000 stories with the highest coordination match

    • Faster
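
The candidate restriction can be sketched in Python with an inverted index. Coordination match is taken here to mean the number of distinct terms a new story shares with an earlier one; the `CoordinationIndex` class and its methods are hypothetical, and only the 25,000 limit comes from the slide. The returned candidates would then be passed to the full cosine comparison.

```python
import heapq
from collections import defaultdict


class CoordinationIndex:
    """Inverted index used to pick candidate stories by term overlap."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of stories containing it

    def add(self, story_id, tokens):
        """Index an earlier story under each of its distinct terms."""
        for term in set(tokens):
            self.postings[term].add(story_id)

    def candidates(self, tokens, limit=25000):
        """Return up to `limit` story ids, highest shared-term count first.
        Stories sharing no term with the new story are never returned."""
        overlap = defaultdict(int)
        for term in set(tokens):
            for story_id in self.postings.get(term, ()):
                overlap[story_id] += 1
        top = heapq.nlargest(limit, overlap.items(), key=lambda kv: kv[1])
        return [story_id for story_id, _ in top]


index = CoordinationIndex()
index.add(0, ["russia", "crisis", "ruble"])
index.add(1, ["russia", "election"])
index.add(2, ["thai", "crash"])
print(index.candidates(["russia", "crisis", "ruble", "bank"], limit=1))  # [0]
```

Counting overlaps through the postings lists touches only stories that share at least one term, which is where the speed-up over comparing with every preceding story would come from.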


UMass4

  • Similar to UMass1

  • Rationale is the same

  • Consider top five matches

  • Use different formula for modifying confidence score


Performance Summary


Summary

  • Basic vector space model did the best

  • Restricting the number of stories compared against

    • Improved system speed

    • Didn’t improve performance

  • Primary system did extremely well on training data, but failed on TDT5
