RIAO 2004 2 video retrieval systems

RIAO 2004 2 video retrieval systems. The Físchlár-News-Stories System: Personalised Access to an Archive of TV News. Alan F. Smeaton, Cathal Gurrin, Howon Lee, Kieran McDonald, Noel Murphy, Noel E. O’Connor, David Wilson, Centre for Digital Video Processing, Dublin City University


The Físchlár-News-Stories System: Personalised Access to an Archive of TV News

Alan F. Smeaton, Cathal Gurrin, Howon Lee, Kieran McDonald, Noel Murphy, Noel E. O’Connor, David Wilson

Centre for Digital Video Processing, Dublin City University

Derry O’Sullivan, Barry Smyth

Smart Media Institute, Department of Computer Science, University College Dublin

  • Físchlár systems
    • A family of tools for capturing, analysis, indexing, browsing, searching and summarisation of digital video information
    • Físchlár-News-Stories
      • Provides access to a growing archive of broadcast TV news
      • Segmentation of news into shots and stories, calendar lookup, text search, links between related stories, personalisation and story recommendation
Shot boundary detection
  • Shot
    • A single camera motion in time
    • We can have camera movement as well as object motion
  • Shot cut
    • Hard cut
    • Gradual transition (GT)
  • Boundary detection (Browne, et al., 2000)
    • Frame-frame similarity over a window of frames
    • Evaluation: TRECVID 2001
      • Over 90% precision and recall for hard cuts
      • Somewhat less for GT
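The slide gives only the idea of comparing frames over a window, so the following is a minimal sketch under stated assumptions rather than the method of Browne et al.: frames are assumed to be represented by normalised colour histograms, similarity is assumed to be histogram intersection, and the names `detect_hard_cuts`, `window` and `threshold` are illustrative.

```python
from typing import List, Sequence

Histogram = Sequence[float]


def histogram_similarity(a: Histogram, b: Histogram) -> float:
    """Histogram intersection of two normalised histograms (1.0 = identical)."""
    return sum(min(x, y) for x, y in zip(a, b))


def detect_hard_cuts(
    frames: List[Histogram], window: int = 2, threshold: float = 0.5
) -> List[int]:
    """Return indices i where a cut is assumed between frame i-1 and frame i.

    For each candidate boundary, every frame in the `window` before it is
    compared with every frame in the `window` after it. A low average
    similarity suggests a hard cut. The threshold is an assumed value.
    """
    cuts = []
    for i in range(window, len(frames) - window + 1):
        before = frames[i - window:i]
        after = frames[i:i + window]
        sims = [histogram_similarity(p, q) for p in before for q in after]
        if sum(sims) / len(sims) < threshold:
            cuts.append(i)
    return cuts


# Four frames of one scene followed by four frames of another.
frames = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
print(detect_hard_cuts(frames))  # [4]
```

A fixed threshold like this would likely miss gradual transitions, which is consistent with the lower GT figures reported on the slide.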
Story segmentation
  • Cluster all keyframes of shots
    • Similarity: colour and edge histograms (O’Connor, et al., 2001)
  • Anchorperson shots
    • One of the clusters will have an average keyframe-keyframe similarity much higher than the others and this will most likely be a cluster of anchorperson shots
  • Beginning of news, beginning/end of advertisement
    • Apply a speech-music discrimination algorithm to the audio
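The anchorperson heuristic above can be sketched as follows. This assumes the clustering has already been done and that each keyframe is a normalised histogram; the similarity function and all names are illustrative stand-ins, not the colour and edge histogram measure of O’Connor et al.

```python
from itertools import combinations
from typing import Dict, List, Sequence

Histogram = Sequence[float]


def similarity(a: Histogram, b: Histogram) -> float:
    """Histogram intersection, used here as a stand-in similarity measure."""
    return sum(min(x, y) for x, y in zip(a, b))


def mean_pairwise_similarity(keyframes: List[Histogram]) -> float:
    """Average keyframe-keyframe similarity inside one cluster."""
    pairs = list(combinations(keyframes, 2))
    if not pairs:
        return 0.0  # a singleton cluster carries no evidence either way
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def pick_anchorperson_cluster(clusters: Dict[str, List[Histogram]]) -> str:
    """Return the name of the cluster whose keyframes are most alike."""
    return max(clusters, key=lambda name: mean_pairwise_similarity(clusters[name]))


clusters = {
    "studio": [[0.5, 0.5], [0.5, 0.5], [0.6, 0.4]],
    "field": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
}
print(pick_anchorperson_cluster(clusters))  # studio
```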
Story segmentation
  • Detect individual advertisements
    • Sadlier, et al., 2002
  • Shot length
    • Outside broadcasts and footage video tend to have shorter shot lengths than in-studio broadcasts
  • Using SVM to determine story bounds
    • Combine the output of these analyses
  • Evaluation (TRECVID 2003)
    • 31% recall and 45% precision
  • For the time being, the automatic segmentation is manually checked for accuracy every day
Search based on text
  • Closed captions
    • Typing errors, omitted phrases or sentences
    • Time lag
  • Retrieval
    • Simple IR engine
    • When a story’s detail is displayed, we use the closed caption text from that story as a query against the closed caption archive and display summaries of the 10 top-ranked stories
  • User feedback
    • Users rate a given news story on a 5-point scale
    • These ratings are used as input to a collaborative filtering system which can recommend news stories to users based on ratings from other users
      • Need to recommend on new content
      • User vs. stories ratings matrix is very sparse
    • Story-story similarity + user-story ratings
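The slide does not say how story-story similarity and user-story ratings are combined. One possible reading, sketched below under that assumption, is to predict a user's rating for a new, unrated story as a similarity-weighted mean of that user's ratings on older stories, with similarity computed from closed caption text. The bag-of-words cosine measure and every name here are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Optional


def term_vector(text: str) -> Counter:
    """Bag-of-words term counts for a closed caption text."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def predict_rating(
    new_story_text: str, rated: Dict[str, int], captions: Dict[str, str]
) -> Optional[float]:
    """Similarity-weighted mean of one user's existing 5-point ratings.

    Returns None when the new story shares no terms with any rated story.
    """
    new_vec = term_vector(new_story_text)
    weighted = total = 0.0
    for story_id, rating in rated.items():
        sim = cosine(new_vec, term_vector(captions[story_id]))
        weighted += sim * rating
        total += sim
    return weighted / total if total else None


captions = {"s1": "election vote result", "s2": "football match goal"}
rated = {"s1": 5, "s2": 1}
prediction = predict_rating("election result tonight", rated, captions)
```

Because the prediction needs no ratings from other users on the new story, a scheme of this kind addresses both problems listed above: new content and a sparse ratings matrix.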

CIMWOS: A Multimedia Retrieval System based on Combined Text, Speech and Image Processing

Harris Papageorgiou (1), Prokopis Prokopidis (1,2), Iason Demiros (1,2), Nikos Hatzigeorgiou (1), George Carayannis (1,2)

(1) Institute for Language and Speech Processing

(2) National Technical University of Athens

    • Multimedia, multimodal and multilingual
    • Content-based indexing, archiving, retrieval and on-demand delivery of audiovisual content
    • Video library
      • Sports, broadcast news and documentaries in English, French and Greek
    • Combine speech, language and image understanding technology
    • Producing XML metadata annotations following the MPEG-7 standard
Speech processing Subsystem
  • Speaker Change Detection (SCD)
  • Automatic Speech Recognition (ASR)
  • Speaker Identification (SID)
  • Speaker Clustering (SC)
Text processing Subsystem
[Diagram: Speech Transcriptions, Named Entity Detection, Term Extraction, Story Segmentation, Topic Detection]
  • Named Entity Detection (NED)
  • Term Extraction (TE)
  • Story Segmentation (SD)
  • Topic Detection (TD)
Text processing Subsystem
  • Applied on the textual data produced by the Speech Processing Subsystem
  • Named entity detection
    • sentence boundary identification
    • POS tagging
    • NED
      • Lookup modules that match lists of NEs and trigger-words against the text, hand-crafted and automatically generated pattern grammars, maximum entropy modeling, HMM models, decision-tree techniques, SVM classifier, etc.
    • Term extraction
      • Identify single or multi-word indicative keywords
      • Linguistic processing is performed through an augmented term grammar, the results of which are statistically filtered using frequency-based scores
Text processing Subsystem
  • Story detection and topic classification
    • Employ the same set of models
    • Generative, mixture-based HMM
      • One state per topic, plus one state modeling general language, that is, words not specific to any topic
      • Each state models a distribution of words given the particular topic
      • Running the resulting models on a sliding window, thereby noting the change in topic-specific words as the window moves on
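The sliding-window idea can be illustrated with a deliberate simplification: instead of a full HMM with state transitions, the sketch below scores each window independently against every topic, mixing a per-topic word distribution with a general-language distribution. The mixture weight, the probability floor, the window size and all names are assumptions made for illustration.

```python
import math
from typing import Dict, List

WordModel = Dict[str, float]


def window_log_likelihood(
    words: List[str],
    topic_model: WordModel,
    general_model: WordModel,
    mix: float = 0.5,
    floor: float = 1e-6,
) -> float:
    """Log-likelihood of a window under a topic/general-language mixture."""
    total = 0.0
    for w in words:
        p = mix * topic_model.get(w, 0.0) + (1 - mix) * general_model.get(w, floor)
        total += math.log(max(p, floor))
    return total


def label_windows(
    words: List[str], topics: Dict[str, WordModel], general: WordModel, size: int = 4
) -> List[str]:
    """Assign the most likely topic to each window of `size` words."""
    labels = []
    for start in range(len(words) - size + 1):
        window = words[start:start + size]
        labels.append(
            max(topics, key=lambda t: window_log_likelihood(window, topics[t], general))
        )
    return labels


def topic_changes(labels: List[str]) -> List[int]:
    """Window indices at which the winning topic differs from the previous one."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


topics = {
    "politics": {"vote": 0.5, "election": 0.5},
    "sport": {"goal": 0.5, "match": 0.5},
}
general = {"the": 0.5, "a": 0.5}
words = ["the", "vote", "election", "vote", "the", "goal", "match", "goal"]
labels = label_windows(words, topics, general)
print(topic_changes(labels))  # [3]
```

A change in the winning topic marks a candidate story boundary, and the winning label doubles as the topic classification, which matches the slide's point that both tasks share one set of models.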
Image processing Subsystem
  • Automatic Video Segmentation (AVS)
    • Shot cut detection and keyframe extraction
    • Measurement of differences between consecutive frames
    • Adaptive thresholding on motion and texture cues
  • Face Detection (FD) and Identification (FI)
    • Locate faces in video sequences, and associate these faces with names
    • Based on SVM
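Adaptive thresholding for shot cut detection can be sketched as below. The slide names motion and texture cues but not the rule itself, so this assumes a single generic difference score per pair of consecutive frames and a local threshold of mean plus `k` standard deviations over neighbouring scores. Both the rule and the parameter values are illustrative.

```python
from statistics import mean, pstdev
from typing import List


def adaptive_cuts(diffs: List[float], window: int = 5, k: float = 3.0) -> List[int]:
    """Return indices whose difference score exceeds a locally adapted threshold.

    diffs[i] is an assumed difference score between frame i and frame i+1.
    The threshold at i is computed from up to `window` scores on each side,
    excluding diffs[i] itself, so a spike does not inflate its own threshold.
    """
    cuts = []
    for i, d in enumerate(diffs):
        lo = max(0, i - window)
        hi = min(len(diffs), i + window + 1)
        neighbours = diffs[lo:i] + diffs[i + 1:hi]
        if len(neighbours) < 2:
            continue  # not enough context to estimate a threshold
        if d > mean(neighbours) + k * pstdev(neighbours):
            cuts.append(i)
    return cuts


diffs = [0.1, 0.12, 0.09, 0.11, 0.9, 0.1, 0.1, 0.12, 0.11]
print(adaptive_cuts(diffs))  # [4]
```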
Image processing Subsystem
  • Object Recognition (OR)
    • The object’s surface is decomposed into a large number of regions
    • The spatial and temporal relationships of these regions are acquired from several example views
  • Video Text Detection and Recognition (TDR)
    • OCR
      • Text detection
      • Text verification
      • Text segmentation
      • OCR
  • All processing modules in the corresponding three modalities converge to a textual XML metadata annotation scheme following the MPEG-7 descriptors
  • These XML metadata annotations are further processed, merged and loaded into the CIMWOS multimedia database
Indexing and retrieval
  • Weighted Boolean model
    • Weight of index term: tf*idf
    • Image processing metadata are not weighted
  • Two-step
    • 1. Boolean exact-match
      • Objects, topics and faces
    • 2. Query best-match
      • Text, terms and named entities
  • Basic retrieval unit
    • passage
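The two-step scheme above can be sketched as follows, assuming each passage carries free text plus a set of unweighted tags for objects, topics and faces. The `Passage` type, the tag strings and the whitespace tokenisation are hypothetical; the actual CIMWOS database schema is not described in the slides.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Passage:
    pid: str
    text: str
    # Unweighted image-processing metadata: objects, topics, faces.
    tags: Set[str] = field(default_factory=set)


def tfidf_scores(query_terms: List[str], passages: List[Passage]) -> Dict[str, float]:
    """Sum of tf*idf weights of the query terms for every passage."""
    n = len(passages)
    counts = {p.pid: Counter(p.text.lower().split()) for p in passages}
    df: Counter = Counter()
    for c in counts.values():
        df.update(c.keys())
    return {
        pid: sum(c[t] * math.log(n / df[t]) for t in query_terms if df[t])
        for pid, c in counts.items()
    }


def search(
    passages: List[Passage], required_tags: Set[str], query_terms: List[str]
) -> List[Tuple[str, float]]:
    # Step 1: Boolean exact match on objects, topics and faces.
    candidates = [p for p in passages if required_tags <= p.tags]
    # Step 2: best-match ranking of the survivors by tf*idf.
    scores = tfidf_scores([t.lower() for t in query_terms], passages)
    ranked = [(p.pid, scores[p.pid]) for p in candidates]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


passages = [
    Passage("p1", "minister visits athens", {"face:minister"}),
    Passage("p2", "minister minister speech athens", {"face:minister"}),
    Passage("p3", "minister football", {"object:ball"}),
]
results = search(passages, {"face:minister"}, ["speech", "minister"])
print([pid for pid, _ in results])  # ['p2', 'p1']
```

In this sketch the idf is computed over the whole collection rather than over the filtered candidates, which is a design choice made here and not something the slides specify.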
Indexing schema
[Figure: indexing schema; the only surviving label is "Named entity(ies)"]
  • Greek news broadcasts
    • 35.5 hours
      • Collection A: 15 news broadcasts, 18 hours
        • Segmentation, named entity identification, term extraction, retrieval
      • Collection B: 15 news broadcasts, 17 hours
        • retrieval
  • 3 users
    • Gold annotations on the videos
  • Segmentation
    • Precision: 89.94%
    • Recall: 86.7%
    • F-measure: 88.29%
  • Term extraction
    • Precision: 34.8%
    • Recall: 60.28%
  • Named entity identification
  • Retrieval
    • Users translate each topic to queries
    • On average, 5 queries per topic
    • Collection B
      • Segmentation is based on stories
    • 60% filter
      • Filter out results that scored less than 60% in the CIMWOS DB ranking system
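As a quick check on the figures above, the F-measure is the harmonic mean of precision and recall, and the 60% filter is a score cutoff. The sketch below assumes ranking scores normalised to the range 0 to 1, which the slides do not state; the function names are illustrative.

```python
from typing import List, Tuple


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def apply_score_filter(
    results: List[Tuple[str, float]], cutoff: float = 0.60
) -> List[Tuple[str, float]]:
    """Drop results whose ranking score is below the cutoff."""
    return [(doc, score) for doc, score in results if score >= cutoff]


# Segmentation figures from the slide: precision 89.94%, recall 86.7%.
print(round(f_measure(89.94, 86.7), 2))  # 88.29
```

The computed value agrees with the reported segmentation F-measure.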