
Video Shot Boundary Detection at RMIT University

Timo Volkmer, Saied Tahaghoghi, and Hugh E. Williams. School of Computer Science & IT, RMIT University. {tvolkmer, saied, hugh}@cs.rmit.edu.au





Presentation Transcript


  1. Video Shot Boundary Detection at RMIT University • Timo Volkmer, Saied Tahaghoghi, and Hugh E. Williams • School of Computer Science & IT, RMIT University • {tvolkmer, saied, hugh}@cs.rmit.edu.au

  2. Overview • Our general approach • The moving query window • Details of the approach • How we measure frame similarity • Improvements for 2004 cut detection • Detection of gradual transitions • Evaluation • Experimental results • Conclusions

  3. The Moving Query Window • Figure labels: Pre Frames, Current Frame, Post Frames • A moving query window consists of two equal-sized half windows surrounding a current frame • The moving query window is advanced through the video frame by frame • Cut detection and gradual transition detection are performed in separate decision stages during a single pass
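
The window described on this slide can be sketched as a generator over frame indices. This is a minimal illustration, not the authors' implementation: the function name `moving_query_windows` and the parameter `half_window` are hypothetical, and skipping frames that lack a full half window on either side is an assumption the slides do not address.

```python
from typing import Iterator, List, Tuple


def moving_query_windows(
    num_frames: int, half_window: int
) -> Iterator[Tuple[List[int], int, List[int]]]:
    """Yield (pre_frames, current_frame, post_frames) as frame indices.

    Assumption: frames too close to the start or end of the video to
    have a full half window on both sides are skipped.
    """
    for current in range(half_window, num_frames - half_window):
        pre = list(range(current - half_window, current))
        post = list(range(current + 1, current + half_window + 1))
        yield pre, current, post


if __name__ == "__main__":
    for pre, current, post in moving_query_windows(num_frames=7, half_window=2):
        print(pre, current, post)
```

Both decision stages can consume the same triples, which is what allows a single pass over the video.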

  4. Frame feature representation • We use one-dimensional, localised histograms with 4x4 regions in the HSV colour space (16 bins per colour component) • A colour histogram represents each frame region. Corresponding regions are compared • Different weights can be applied to each region during comparison
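
One way to build such a representation is sketched below using only the Python standard library. The details are assumptions: a frame is taken to be a list of rows of 8-bit RGB tuples, the three 16-bin histograms of a region are concatenated into one 48-value vector, and each region is normalised by its pixel count. The name `region_histograms` is hypothetical.

```python
import colorsys
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # 8-bit RGB, an assumed input format
GRID = 4   # 4x4 regions per frame
BINS = 16  # bins per HSV colour component


def region_histograms(frame: Sequence[Sequence[Pixel]]) -> List[List[float]]:
    """Return 16 region vectors, each holding H, S and V histograms."""
    height, width = len(frame), len(frame[0])
    hists = [[0.0] * (3 * BINS) for _ in range(GRID * GRID)]
    counts = [0] * (GRID * GRID)
    for y, row in enumerate(frame):
        for x, (r, g, b) in enumerate(row):
            region = (y * GRID // height) * GRID + (x * GRID // width)
            hsv = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            for channel, value in enumerate(hsv):
                # Clamp so that a component value of exactly 1.0
                # falls into the last bin.
                bin_index = min(int(value * BINS), BINS - 1)
                hists[region][channel * BINS + bin_index] += 1.0
            counts[region] += 1
    # Normalise each region by its pixel count (an assumption).
    for region, count in enumerate(counts):
        if count:
            hists[region] = [v / count for v in hists[region]]
    return hists
```

Keeping one vector per region makes it straightforward to compare corresponding regions and to weight them individually.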

  5. Cut detection • We disregard the four central regions of each frame to avoid the effect of rapid activity (that is, their weight = 0) • Using the remaining regions, each frame in the moving window is ranked by decreasing similarity to the current frame • Frame similarity is the sum of the inter-region similarities • The number of pre-frames that are ranked in the top half of the rankings is monitored • When a cut is passed, the number of top ranked pre-frames (usually) rises to a maximum and falls to a minimum within a few frames • We have determined an optimum window size and optimum thresholds that are effective for all our training sets • Our cut detection is (now) parameter free
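
The ranking step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the slides do not name the distance measure, so a weighted Manhattan distance between corresponding region vectors is assumed, the central regions are taken to be indices 5, 6, 9 and 10 of a row-major 4x4 grid, and the names `frame_distance` and `pre_frames_in_top_half` are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # 16 region vectors per frame

CENTRAL = {5, 6, 9, 10}  # four central regions of a row-major 4x4 grid
WEIGHTS = [0.0 if i in CENTRAL else 1.0 for i in range(16)]


def frame_distance(a: Frame, b: Frame, weights: Sequence[float] = WEIGHTS) -> float:
    """Weighted sum of per-region Manhattan distances (assumed measure)."""
    return sum(
        w * sum(abs(x - y) for x, y in zip(region_a, region_b))
        for w, region_a, region_b in zip(weights, a, b)
    )


def pre_frames_in_top_half(pre: List[Frame], current: Frame, post: List[Frame]) -> int:
    """Count pre-frames among the half of the window most similar to current."""
    scored = [(frame_distance(f, current), "pre") for f in pre]
    scored += [(frame_distance(f, current), "post") for f in post]
    # Smaller distance means higher similarity. The sort is stable, so
    # pre-frames win exact ties; real footage rarely produces exact ties.
    scored.sort(key=lambda item: item[0])
    top_half = scored[: len(scored) // 2]
    return sum(1 for _, side in top_half if side == "pre")
```

With synthetic frames from two different shots, the count is at its maximum when the current frame is the last frame before the cut and drops to zero once the current frame belongs to the new shot, which is the pattern the decision stage monitors.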

  6. Gradual transition detection • Pre-frames and post-frames are combined into two distinct sets of frames. The average distance of each set to the current frame is computed • We use all frame regions (with identical weights) • The ratio between the pre-frame set distance and the post-frame set distance, the PrePostRatio, is monitored • The end of most gradual transitions is indicated by a peak in the PrePostRatio curve • We maintain a moving average PrePostRatio for calculating a dynamic threshold to detect transitions • As a final decision step, we require a minimum difference between the last frame of the previous shot and the first frame of the new shot
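
A minimal sketch of the PrePostRatio and a moving-average threshold follows. Several details are assumptions that the slides leave open: the small `eps` added to the denominator to avoid division by zero, the rule that a frame is flagged when its ratio exceeds `factor` times the average of the last `history` ratios, and all function and parameter names. Peak detection and the final decision step are not shown.

```python
from collections import deque
from typing import List, Sequence


def pre_post_ratio(
    pre_dists: Sequence[float], post_dists: Sequence[float], eps: float = 1e-6
) -> float:
    """Average pre-frame distance divided by average post-frame distance."""
    pre_avg = sum(pre_dists) / len(pre_dists)
    post_avg = sum(post_dists) / len(post_dists)
    return pre_avg / (post_avg + eps)  # eps is an assumed safeguard


def frames_above_dynamic_threshold(
    ratios: Sequence[float], history: int = 5, factor: float = 2.0
) -> List[int]:
    """Return indices whose ratio exceeds factor * moving average."""
    recent = deque(maxlen=history)  # history buffer of past ratios
    flagged = []
    for index, ratio in enumerate(ratios):
        if len(recent) == history and ratio > factor * (sum(recent) / history):
            flagged.append(index)
        recent.append(ratio)
    return flagged


if __name__ == "__main__":
    ratios = [1.0] * 6 + [5.0] + [1.0] * 3
    print(frames_above_dynamic_threshold(ratios))
```

Here `history` stands in for the history buffer size and `factor` for the threshold level factor; the values used are placeholders, not the trained settings.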

  7. PrePostRatio in detail • Figure: a schematised dissolve between a shot A and a shot B • The PrePostRatio is usually minimal at the beginning of a gradual transition and rises to a maximum at the end of the transition
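
The behaviour described on this slide can be reproduced with a toy model. In the sketch below, each frame is reduced to a single hypothetical brightness value that is 0.0 in shot A, 1.0 in shot B, and blends linearly over ten frames in between. The scalar feature, the window size of four frames, and the `eps` safeguard are all assumptions made for illustration.

```python
from typing import List


def toy_dissolve() -> List[float]:
    """Frames 0-9: shot A (0.0); 10-19: linear blend; 20-29: shot B (1.0)."""
    shot_a = [0.0] * 10
    blend = [(i + 1) / 10 for i in range(10)]  # 0.1, 0.2, ..., 1.0
    shot_b = [1.0] * 10
    return shot_a + blend + shot_b


def pre_post_ratio_curve(
    features: List[float], half_window: int, eps: float = 1e-6
) -> List[float]:
    """PrePostRatio per frame; frames without a full window get 0.0."""
    curve = [0.0] * len(features)
    for c in range(half_window, len(features) - half_window):
        pre = [abs(features[c - k] - features[c]) for k in range(1, half_window + 1)]
        post = [abs(features[c + k] - features[c]) for k in range(1, half_window + 1)]
        curve[c] = (sum(pre) / half_window) / (sum(post) / half_window + eps)
    return curve


if __name__ == "__main__":
    curve = pre_post_ratio_curve(toy_dissolve(), half_window=4)
    peak = max(range(len(curve)), key=lambda i: curve[i])
    print("peak at frame", peak)
```

In this toy model the ratio is zero at the last frame of shot A, close to one in the middle of the dissolve, and largest at the final frame of the blend, where the post-frames are already identical to the current frame.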

  8. PrePostRatio curve example • The curve shows two short gradual transitions and two cuts within a range of 1000 frames

  9. Training and Evaluation • We have trained on the TRECVID 2003 shot boundary test set • The main parameters for gradual transition detection are the query window size, the size of the history buffer for dynamic thresholding, and a threshold level factor • Results are discussed on the next slides. (We achieve similar or better results on the 2002 and 2001 test sets in blind runs.)

  10. Results at TRECVID 2004

  11. Overall results

  12. Frame recall and precision for gradual transitions

  13. Discussion • Cut detection is highly effective • This year, recall is 94% and precision is 92%. Improvements from 2003 are due to ignoring the centre region • Gradual detection has improved significantly since 2003: • Recall is now between 68% and 85%, precision between 67% and 84% • A high detection threshold favours precision, a low one favours recall • A short detection threshold history length was found to be preferable • The final decision step reduces false positives • For television news, we are able to use a fixed moving query window size of 24 frames • Experimented with a simple ASR technique in 10 additional runs, which removed detected transitions that coincided with spoken words. Ad hoc, very unsuccessful…
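
For reference, recall and precision as quoted on this slide follow the standard definitions, sketched below. The function name and the counts in the usage example are hypothetical and are not taken from the TRECVID results.

```python
from typing import Tuple


def recall_precision(correct: int, missed: int, false_alarms: int) -> Tuple[float, float]:
    """Standard recall and precision from detection counts.

    correct:      reported transitions that match a reference transition
    missed:       reference transitions that were not reported
    false_alarms: reported transitions with no matching reference
    """
    reference_total = correct + missed
    reported_total = correct + false_alarms
    recall = correct / reference_total if reference_total else 0.0
    precision = correct / reported_total if reported_total else 0.0
    return recall, precision


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(recall_precision(correct=8, missed=2, false_alarms=2))
```

Raising the detection threshold typically reduces `false_alarms` at the cost of more `missed` transitions, which is the precision and recall trade-off noted on the slide.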

  14. Conclusions • Disregarding the focus area of frames for cut detection has improved our results by 3% in recall and 9% in precision • Our parameter-free ranking scheme is highly effective in cut detection on a wide variety of footage • Our gradual transition detection method is relatively simple and needs only a few parameters • The additional, final preprocessing step reduces false positives and improves results significantly • The use of localised histograms and more dynamic thresholding also improved results in gradual transition detection • Our approach is computationally inexpensive, simple to implement, and effective • 15,500 seconds to process the video (around 4 hours, 18 minutes)

  15. Questions?
