Topic Detection and Tracking


Lexical Chains for Topic Detection and Tracking British Classification Society Feb 23rd 2001 Joe Carthy & Nicola Stokes University College Dublin [email protected] [email protected] http://www.cs.ucd.ie/staff/jcarthy Tel. +353 1 706 2481 or 706 2469 Fax. +353 1 269 7262.



Topic Detection and Tracking
  • Topic Detection and Tracking (TDT)
    • DARPA funded TDT project with UMass, CMU and Dragon Systems
    • Domain is all broadcast news: written and spoken
  • TDT includes:
    • First Story Detection
    • Event Tracking
    • Segmentation
  • Applications
    • digital news editors
    • media analysts
    • equity traders
Topic Tracking and Detection
  • Tracking may be defined as
    • Take a corpus of news stories
    • Given 1 (or 2, 4, 8, 16) sample stories about an event
    • Find all subsequent stories in the corpus about that event
  • Detection: Is this a new story?
Topic Tracking and Detection
  • An event is defined by a list of stories that discuss the event, e.g. “Kobe earthquake” is defined by the first story that describes this event.
UCD TDT Architecture
[Figure: a server comprising a Lexical Chainer, an Event Tracker and an Event Detector]
Topic Detection and Tracking
[Figure: a story arriving on the data stream (DATE: 02:36; TITLE: O.J. Simpson Bought Knife, Murder Hearing Told) shown alongside previous stories: Carlos the Jackal, NYC Subway Bombings, O.J. Simpson Murder Trial]
Benchmark Systems
  • Implemented benchmark systems using conventional IR techniques:
    • Stemmed keywords (Porter)
    • Stopword removal
    • Term weighting (Robertson, Sparck Jones)
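A benchmark pipeline of this kind can be sketched roughly as follows. This is a minimal illustration, not the system described on the slide: the stopword list is a tiny hypothetical one, stemming is left out to keep the example dependency-free, and a plain inverse document frequency weight stands in for the term weighting scheme.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "of", "in", "at", "to", "and", "is", "was", "for", "on"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def idf_weights(stories):
    """Weight each term by log(N / n_t): N stories in total, n_t of them containing the term."""
    n_stories = len(stories)
    doc_freq = Counter()
    for story in stories:
        # Count each term once per story, however often it occurs there.
        doc_freq.update(set(tokenize(story)))
    return {term: math.log(n_stories / df) for term, df in doc_freq.items()}

# Made-up headlines, used only to exercise the functions.
stories = [
    "Earthquake hits Kobe",
    "Kobe earthquake damage assessed",
    "Murder trial hearing continues",
]
weights = idf_weights(stories)
```

Terms that occur in fewer stories, such as "trial" here, receive a higher weight than terms shared by several stories, such as "kobe".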
Lexical Chaining
  • Lexical chains - textual cohesion (Halliday & Hasan)
  • Cohesion: text makes sense as a whole
  • Cohesion occurs where the interpretation of one item is dependent on that of another item in the text. It is this dependency that gives rise to cohesion.
Lexical Chaining
  • Where the cohesive elements occur over a number of sentences a cohesive chain is formed.
  • For example, the sentences: John had mud pie for dessert. Mud pie is made of chocolate. John really enjoyed it.
  • give rise to the lexical chain: {mud pie, dessert, mud pie, chocolate, it}
  • Lexical cohesion is, as the name suggests, lexical: it involves the selection of a lexical item that is in some way related to one occurring previously.
Lexical Chaining
  • Reiteration is a form of lexical cohesion which involves the repetition of a lexical item. This may be simple repetition of the word, but it also includes the use of a synonym, near-synonym or superordinate. For example, in the sentences “John bought a Jag. He loves the car.”, the superordinate car refers back to the subordinate Jag. The part-whole relationship is also an example of lexical cohesion, e.g. airplane and wing.
  • A lexical chain is a sequence of related words in the text, spanning short or long distances.
Lexical Chaining
  • A chain is independent of the grammatical structure of the text and in effect it is a list of words that captures a portion of the cohesive structure of the text.
  • A lexical chain can provide a context for the resolution of an ambiguous term and enable identification of the concept the term represents, i.e. word sense disambiguation.
  • Morris and Hirst were the first researchers to suggest the use of lexical chains to determine the structure of texts.
Lexical Chaining
  • By identifying the lexical chains in a news story we hope to identify the focus of a news story. This can then be used in tracking and detection.
  • It is important to realise that determining lexical chains is not a sophisticated natural language analysis process.
  • Other Applications of Lexical Chaining
    • Hypertext links: Green
    • Summarisation: Barzilay
    • Segmentation: Okumura and Honda
    • IR: Stairmand, Ellman, Mochizuki
    • Malapropism detection: St. Onge
    • Multimedia indexing: Kazman, Al-Halimi
Chain Generation
  • In order to construct lexical chains we must be able to identify relationships between terms.
  • This is made possible by the use of WordNet.
  • WordNet is a computational lexicon which was developed at Princeton University.
  • In WordNet, synonym sets (synsets) are used to represent concepts, where a synset corresponds to a concept and consists of all those terms that may be used to refer to that concept.
Chain Generation
  • For example, the concept airplane is represented by the synset {airplane, aeroplane, plane}.
  • A WordNet synset has a numerical identifier such as 02054514.
  • Links between synsets in WordNet represent conceptual relations such as synonymy, hyponymy, meronymy (part-of) etc.
  • The synset identifier can be used to represent the concept referred to in the synset, for indexing and lexical chaining purposes.
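Word Sense Disambiguation
[Figure: the first term, CAR, is linked by relations such as "has a" and "part of" to the synsets Exhaust (32748), Automobile (057643), Railway carriage (324932) and Train (3984). A later term i, EXHAUST, has two senses: Car_exhaust (32748) and Tire_out, Fatigue (374222). Sense 32748 occurs in both sets.]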
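The disambiguation idea in this figure can be sketched as a simple membership test: a candidate sense of a later term is kept when its synset identifier also appears among the synsets related to an earlier term. The identifiers below are the ones shown in the figure; the dictionaries and the function name `disambiguate` are hypothetical, and no real WordNet lookup is performed.

```python
# Synset identifiers related to the first term, CAR (values taken from the figure).
car_neighbours = {
    "32748": "exhaust",
    "057643": "automobile",
    "324932": "railway carriage",
    "3984": "train",
}

# Candidate senses of the later term, EXHAUST (values taken from the figure).
exhaust_senses = {
    "32748": "car_exhaust",
    "374222": "tire_out, fatigue",
}

def disambiguate(neighbours, candidate_senses):
    """Keep only the candidate senses whose synset id also occurs among the neighbours."""
    return {sid: label for sid, label in candidate_senses.items() if sid in neighbours}

chosen = disambiguate(car_neighbours, exhaust_senses)
print(chosen)
```

Only the car-exhaust sense survives, because the "tire out, fatigue" sense has no identifier in common with the synsets related to CAR.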

Chain Generation
  • Chaining procedure for a story:
    • Take the ith term in the story (term_i) and generate the set Neighbour_i of its related synsets
    • For each other term, if it is a member of the set Neighbour_i, then add it to the lexical chain for term_i.
    • If the lexical chain contains 3 or more elements then store the chain in a chain index file
    • Repeat above for all terms in the story.
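The chaining procedure above can be sketched as follows. This is a toy illustration under stated assumptions: relatedness comes from a small hand-written table (`RELATED`, hypothetical) keyed by words rather than by WordNet synset identifiers, each chain is assumed to start with the term itself, and chains are returned as a list instead of being written to a chain index file.

```python
# Hypothetical relatedness table standing in for a WordNet lookup: term -> related terms.
RELATED = {
    "car": {"automobile", "exhaust", "wheel"},
    "exhaust": {"car"},
    "wheel": {"car"},
    "automobile": {"car"},
}

def build_chains(terms, related, min_length=3):
    """Build one candidate chain per term and keep those with at least min_length elements."""
    chains = []
    for i, term in enumerate(terms):
        neighbours = related.get(term, set())   # the Neighbour_i set for term_i
        chain = [term]                          # assumption: the chain starts with term_i
        for j, other in enumerate(terms):
            if j != i and other in neighbours:  # every other term that is a neighbour
                chain.append(other)
        if len(chain) >= min_length:            # store chains of 3 or more elements
            chains.append(chain)
    return chains

story_terms = ["car", "exhaust", "dessert", "wheel", "automobile"]
chains = build_chains(story_terms, RELATED)
print(chains)
```

With this table only the chain for "car" is long enough to be stored; the chains for "exhaust", "wheel" and "automobile" each contain two elements and are discarded.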
Computing Chain_Sim(Trackset_i, Story_j)
    • Chain similarity is computed using the Overlap Coefficient, which may be defined as follows for two lexical chains c1 and c2:
    • Overlap Coefficient =
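A minimal sketch, assuming the standard set-based definition of the overlap coefficient, |c1 ∩ c2| / min(|c1|, |c2|); the exact form used in the presentation is not shown here, so this may differ from it in detail. Chains are treated as sets, so repeated elements count once, and an empty chain is given a similarity of 0 (both are assumptions).

```python
def overlap_coefficient(c1, c2):
    """Size of the intersection divided by the size of the smaller chain."""
    s1, s2 = set(c1), set(c2)
    if not s1 or not s2:
        return 0.0  # assumption: an empty chain shares nothing with any other chain
    return len(s1 & s2) / min(len(s1), len(s2))

# Made-up chains, used only to exercise the function.
tracking_chain = ["car", "exhaust", "wheel"]
story_chain = ["car", "wheel", "road", "driver"]
similarity = overlap_coefficient(tracking_chain, story_chain)
```

Dividing by the smaller chain means a short chain that is fully contained in a longer one scores 1.0.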
Evaluation Metrics
  • System returns a set S of documents (S' is the set of documents not returned):
    • a = # in S discussing new events
    • b = # in S not discussing new events
    • c = # in S' discussing new events
    • d = # in S' not discussing new events
  • Recall = a / (a+c)
  • Precision = a / (a+b)
  • Miss Rate = c / (a+c) = 1 - R
  • False Alarm Rate = b / (b+d) = Fallout
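The four measures above can be computed directly from the counts a, b, c and d. The sketch below is illustrative only: the function names are hypothetical, the counts in the example are made up, and returning 0.0 for an empty denominator is an assumption.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (an assumption for this sketch)."""
    return numerator / denominator if denominator else 0.0

def evaluation_metrics(a, b, c, d):
    """Compute the measures from the contingency counts defined above."""
    return {
        "recall": safe_div(a, a + c),
        "precision": safe_div(a, a + b),
        "miss_rate": safe_div(c, a + c),         # equals 1 - recall
        "false_alarm_rate": safe_div(b, b + d),  # fallout
    }

# Made-up counts: 10 documents returned, 90 not returned.
metrics = evaluation_metrics(a=8, b=2, c=4, d=86)
```

For these counts, recall is 8/12, precision is 8/10, the miss rate is 4/12 and the false alarm rate is 2/88.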
Analysis of Results
  • Expected trade-off between precision and recall
  • Small number of stories are sufficient to construct a tracking query
  • Performance in line with other TDT researchers
  • Lexical Chains - Improvement not significant?
TDT and Lexical Chain References
  • Allan, J., Carbonell, J., Doddington, G., Yamron, J., and Yang, Y., “Topic Detection and Tracking Pilot Study: Final Report”, Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, Morgan Kaufmann, San Francisco, 1998.
  • Allan, J., Papka, R., and Lavrenko, V., “Online New Event Detection and Tracking”, Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Melbourne, Australia, August 1998.
  • Barzilay, R., “Lexical Chains for Summarization”, M.Sc. Thesis, Ben-Gurion University of the Negev, Israel, November 1997.
  • Barzilay, R., and Elhadad, M., “Using Lexical Chains for Text Summarization”, The Fifth Bar-Ilan Symposium on Foundations of Artificial Intelligence Focusing on Intelligent Agents, Bar-Ilan University, Ramat Gan, Israel, June 1997.
  • Budanitsky, A., “Lexical Semantic Relatedness and its Application in Natural Language Processing”, (PhD thesis) Technical Report CSRG-390, University of Toronto, 1999.
  • Ellman, J., “Using Roget's Thesaurus to Determine the Similarity of Texts”, PhD Thesis, University of Sunderland, 2000.
  • Fellbaum, C., (Ed.), WordNet: An Electronic Lexical Database and Some of its Applications, MIT Press, 1998.
  • Green, S.J., “Automatically Generating Hypertext by Computing Semantic Similarity”, Ph.D. Thesis, University of Toronto, 1997.

  • Halliday, M.A.K. and Hasan, R., “Cohesion in English”, Longman, 1976.
  • Hatch, P., “Lexical Chaining for the Online Detection of New Events”, M.Sc. Thesis, University College Dublin, 2000.
  • Hirst, G., and St-Onge, D., “Lexical Chains as Representations of Context for the Detection and Correction of Malapropisms”, in WordNet: An Electronic Lexical Database and Some of its Applications, Fellbaum, C., (Ed.), MIT Press, 1998.
  • Kazman, R., Al-Halimi, R., Hunt, W., and Mantei, M., “Four Paradigms for Indexing Video Conferences”, IEEE MultiMedia, 3 (1), Spring 1996.
  • Mochizuki, H., Iwayama, M., and Okumura, M., “Passage Level Document Retrieval Using Lexical Chains”, RIAO 2000, Content Based Multimedia Information Access, 491-506, 2000.
  • Morris J., and Hirst, G., “Lexical Cohesion, the Thesaurus, and the Structure of Text”, Computational Linguistics, 17 (1), 211-232, 1991.
  • Okumura, M., and Honda, T., “Word Sense Disambiguation and Text Segmentation Based on Lexical Cohesion”, In Proceedings of the Fifteenth International Conference on Computational Linguistics (COLING-94), Vol. 2, 755-761, Kyoto, Japan, August 1994.
  • Porter, M.F., “An Algorithm for Suffix Stripping”, Program, 14, 130-137, 1980.
  • Robertson, S.E. and Sparck Jones, K., “Simple Approaches to Text Retrieval”, University of Cambridge Computing Laboratory Technical Report Number 356, May 1997.
  • Stairmand, M.A., “A Computational Analysis of Lexical Cohesion with Applications in Information Retrieval”, Ph.D. Thesis, UMIST, 1996.
  • Stokes, N., and Carthy, J., “First Story Detection using a Composite Document Representation”, HLT 2001, Human Language Technology Conference, San Diego, California, March 18-21, 2001.
  • TDT2000, “The Year 2000 Topic Detection and Tracking (TDT2000) Task Definition and Evaluation Plan”, available at the following URL: http://morph.ldc.upenn.edu/TDT/Guide/manual.front.html, November 2000.
