
Temporal Video Boundaries -Part One-

SNUEE Kim KyungMin


Presentation Transcript


  1. Temporal Video Boundaries -Part One- • SNUEE Kim KyungMin

  2. • Why do we need temporal segmentation of videos? • How do we set up boundaries between video frames? • How do we merge two separate but uniform segments?

  3. Abstract • Much work has been done in automatic video analysis. But while techniques such as local video segmentation, object detection, and genre classification have been developed, little work has been done on retrieving the overall structural properties of video content.

  4. Abstract(2) • Retrieving the overall structure of video content means splitting the video into meaningful tokens by setting boundaries within it. => Temporal Video Boundary Segmentation • We classify these boundaries into 3 categories: micro-, macro-, and mega-boundaries.

  5. Abstract(3) • Our goal is a system for automatic video analysis that should eventually work for applications where complete metadata is unavailable.

  6. Introduction • What’s going on? • A great increase in the quantity of video content. • More demand for content-aware apps. • Still, the majority of video content has insufficient metadata. => More demand for information on temporal video boundaries.

  7. BOUNDARIES : definitions • Micro-boundaries: boundaries of the shortest observable temporal segments, usually bounded within a sequence of contiguously shot video frames. (frames within the same micro-boundaries.)

  8. Micro-boundaries are associated with the smallest video units for which a given attribute is constant or slowly varying. The attribute can be visual, audio, or textual. • Depending on which attribute is used, the micro-boundaries can differ.

  9. BOUNDARIES : definitions(2) • Macro-boundaries: boundaries between different parts of the narrative, or between segments, of a video content. (frames within the same macro-boundaries.)

  10. Macro-boundaries are boundaries between groups of micro-segments that form clearly identifiable organic parts of an event, defining a structural or thematic unit.

  11. BOUNDARIES : definitions(3) • Mega-boundaries: a boundary between a program and any non-program material. (frames under different mega-boundaries.)

  12. Mega-boundaries are boundaries between macro-segments, which typically exhibit structural and feature consistency.

  13. BOUNDARIES : FORMAL Definition • A video content contains three types of modalities: visual, audio, and textual, and each modality has three levels: low-, mid-, and high-. • These levels describe the “amount of detail” in each modality in terms of granularity and abstraction.

  14. BOUNDARIES : FORMAL Definition(2) • Each modality and level has attributes. An attribute is defined as the vector below: $\vec{A}^{m}_{j}(t) = \left(A^{m}_{j,1}(t), A^{m}_{j,2}(t), \ldots, A^{m}_{j,N}(t)\right)$ (attribute vector) • $m$: denotes the modality (e.g. $m = 1, 2, 3$ mean visual, audio, and text, respectively). • $j$: denotes the index of the attribute (e.g. $m = 1$ and $j = 1$ index color). • $N$: denotes the total number of vector components. • $t$: time (can be expressed in integers or in milliseconds).

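  15. BOUNDARIES : FORMAL Definition(3) • If the time interval is defined as $[t_s, t_e]$, the average and the deviation of an attribute over that interval can be expressed as below: $\bar{A}^{m}_{j} = \frac{1}{T}\sum_{t=t_s}^{t_e} \vec{A}^{m}_{j}(t)$ (average of $\vec{A}^{m}_{j}$) • $\sigma^{m}_{j} = \sqrt{\frac{1}{T}\sum_{t=t_s}^{t_e} \left\lVert \vec{A}^{m}_{j}(t) - \bar{A}^{m}_{j} \right\rVert^{2}}$ (deviation) • where $T = t_e - t_s + 1$ is the number of time samples in the interval.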
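A minimal NumPy sketch of the two statistics above, assuming the attribute vectors are sampled at integer time steps and stored as rows of an array. The function name `attribute_stats` is hypothetical, and the deviation is taken here as the root-mean-square distance from the mean vector.

```python
import numpy as np


def attribute_stats(A: np.ndarray, t_s: int, t_e: int) -> tuple[np.ndarray, float]:
    """Mean vector and deviation of one attribute over [t_s, t_e] (inclusive).

    A has shape (num_time_steps, N): row t is the attribute vector A^m_j(t).
    Hypothetical helper for illustration only.
    """
    window = A[t_s:t_e + 1]            # T = t_e - t_s + 1 samples
    mean = window.mean(axis=0)         # average attribute vector over the interval
    # Root-mean-square Euclidean distance of each sample from the mean vector
    deviation = float(np.sqrt(np.mean(np.sum((window - mean) ** 2, axis=1))))
    return mean, deviation
```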

  16. BOUNDARIES : FORMAL Definition(4) • Using the vectors defined previously, we now have two different methods to estimate temporal boundaries:

  17. Micro-boundaries • In multimedia, the term “shot” or “take” is widely used. • A similar concept can be used to define the segment between micro-boundaries, which is often called a “family of frames.” • Each segment has a representative frame called a “keyframe.” The keyframe of a family has audio/video data that represents the segment well, but the method for picking the keyframe may vary.
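The slides leave the keyframe criterion open. One simple choice for visual data, assumed here rather than taken from the source, is the frame whose color histogram lies closest (in L1 distance) to the family's mean histogram. `pick_keyframe` is a hypothetical helper.

```python
import numpy as np


def pick_keyframe(frame_numbers: list[int], histograms: np.ndarray) -> int:
    """Return the frame number whose histogram is closest (L1) to the family mean.

    histograms has shape (num_frames_in_family, num_bins), one row per frame,
    in the same order as frame_numbers. Hypothetical criterion for illustration.
    """
    mean_hist = histograms.mean(axis=0)
    # L1 distance from each frame's histogram to the family mean
    distances = np.abs(histograms - mean_hist).sum(axis=1)
    return frame_numbers[int(np.argmin(distances))]
```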

  18. Micro-boundaries(2) • Each family has a “family histogram”; these eventually form a “superhistogram.” • A family histogram is a data structure that represents the color information of a family of frames. • A superhistogram is a data structure that contains information about non-contiguous family histograms within a larger video segment.

  19. Micro-boundaries(3) • Generation of family histograms and superhistograms may vary depending on the pre-defined dimensions below. • 1) The amount of memory - No memory means comparing only with the previous frame. • 2) Contiguity of compared families - Determining the time step. • 3) Representation of a family - How we choose the keyframe.

  20. Micro-boundaries : Family of frames • An image histogram is a vector representing the color values and the frequency of their occurrence in the image. • Computing the difference between consecutive histograms and merging similar histograms makes it possible to generate families of frames. • For each frame $i$, we compute the histogram $H_i$ and then search the previously computed family histograms $H_F$ to find the closest match.
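A sketch of the image histogram described above, assuming each frame is already converted to a color space with three channels scaled to [0, 1] (for example HSB). `color_histogram` and its `bins_per_channel` parameter are hypothetical; how the slides' 9 to 300 HSB bins were split across channels is not stated, while (8, 8, 8) would give 512 RGB bins.

```python
import numpy as np


def color_histogram(image: np.ndarray, bins_per_channel: tuple[int, int, int]) -> np.ndarray:
    """Normalized joint color histogram of one frame.

    image: float array of shape (height, width, 3) with channel values in [0, 1].
    bins_per_channel: number of quantization bins for each channel.
    Returns a vector of length prod(bins_per_channel) that sums to 1.
    """
    dims = tuple(int(x) for x in bins_per_channel)
    b = np.array(dims)
    # Quantize each channel value to a bin index; clip so that 1.0 lands in the last bin
    idx = np.minimum((image.reshape(-1, 3) * b).astype(int), b - 1)
    # Combine the three per-channel indices into a single joint-bin index
    flat = np.ravel_multi_index((idx[:, 0], idx[:, 1], idx[:, 2]), dims)
    counts = np.bincount(flat, minlength=int(np.prod(dims)))
    return counts / counts.sum()
```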

  21. Micro-boundaries : Family of frames(2) • There are several ways to compute the histogram difference: L1, L2, bin-wise histogram intersection, and histogram intersection. • Among them, L1 and bin-wise histogram intersection gave the best results.
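The four comparisons named on the experiments slide can be sketched as follows, assuming histograms normalized to sum to 1. L1, L2, and histogram intersection follow their usual definitions; the bin-wise intersection formula is not given in the slides, so the per-bin `1 - min/max` form below is an assumption.

```python
import numpy as np


def l1_distance(h1: np.ndarray, h2: np.ndarray) -> float:
    # Sum of absolute bin-by-bin differences
    return float(np.abs(np.asarray(h1) - np.asarray(h2)).sum())


def l2_distance(h1: np.ndarray, h2: np.ndarray) -> float:
    # Euclidean distance between the two histogram vectors
    return float(np.sqrt(((np.asarray(h1) - np.asarray(h2)) ** 2).sum()))


def intersection_distance(h1: np.ndarray, h2: np.ndarray) -> float:
    # 1 minus the histogram intersection (sum of per-bin minima) for normalized histograms
    return 1.0 - float(np.minimum(h1, h2).sum())


def binwise_intersection_distance(h1: np.ndarray, h2: np.ndarray) -> float:
    # ASSUMED form: per-bin (1 - min/max), skipping bins that are empty in both histograms
    hi = np.maximum(h1, h2)
    lo = np.minimum(h1, h2)
    mask = hi > 0
    return float((1.0 - lo[mask] / hi[mask]).sum())
```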

  22. Micro-boundaries : boundary detection • If the difference between the current frame histogram and the closest family histogram is less than a given threshold, the current histogram is merged into that family histogram. • Each family histogram consists of: • 1) pointers to each of the constituent histograms and their frame numbers. • 2) a merged family histogram.
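A sketch of the family bookkeeping on this slide: each family keeps its frame numbers, its constituent histograms, and a merged histogram (the mean of its constituents). The names `Family` and `assign_frame`, the use of L1 distance, and the search over all existing families are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional

import numpy as np


@dataclass
class Family:
    frame_numbers: list[int] = field(default_factory=list)       # constituent frame numbers
    histograms: list[np.ndarray] = field(default_factory=list)   # constituent frame histograms
    merged: Optional[np.ndarray] = None                          # family histogram (mean)

    def add(self, frame_number: int, hist: np.ndarray) -> None:
        self.frame_numbers.append(frame_number)
        self.histograms.append(hist)
        # Recompute the merged family histogram as the mean of all constituents
        self.merged = np.mean(self.histograms, axis=0)


def assign_frame(families: list[Family], frame_number: int,
                 hist: np.ndarray, threshold: float) -> None:
    """Merge the frame into the closest family if its L1 difference is below
    the threshold; otherwise start a new family."""
    best, best_d = None, float("inf")
    for fam in families:
        d = float(np.abs(fam.merged - hist).sum())
        if d < best_d:
            best, best_d = fam, d
    if best is not None and best_d < threshold:
        best.add(frame_number, hist)
    else:
        new_family = Family()
        new_family.add(frame_number, hist)
        families.append(new_family)
```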

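  23. Micro-boundaries : boundary detection(2) • Merging of family histograms is performed as below: $H_{F}(l) = \frac{1}{N_F}\sum_{i \in F} H_i(l)$ • where $H_i(l)$ is the value of bin $l$ in the histogram of frame $i$ and $N_F$ is the number of frames in family $F$ (basically, the mean of all histograms in the given family).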

  24. Micro-boundaries : boundary detection(3) • There are multiple ways to compare and merge families, depending on the choice of contiguity and memory: • Contiguous with zero memory • Contiguous with limited memory • Non-contiguous with unlimited memory • Hybrid: a new frame histogram is first compared using the contiguous frames, and then the generated family histograms are merged using the non-contiguous case.
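The contiguous variants can be sketched as a single pass over frame histograms. Here zero memory compares each frame with the previous frame, and limited memory compares it with the mean of the last `memory` frames of the current family; that reading of "limited memory", the L1 distance, and the name `segment_contiguous` are assumptions. The non-contiguous, unlimited-memory case would instead compare each frame against every earlier family.

```python
import numpy as np


def segment_contiguous(hists: np.ndarray, threshold: float,
                       memory: int) -> list[tuple[int, int]]:
    """Split a sequence of frame histograms (shape (n_frames, n_bins)) into
    contiguous families, returned as (start_frame, end_frame) pairs, end inclusive.

    memory == 0: compare each frame only with the previous frame.
    memory == k > 0: compare with the mean of the last k frames of the current family.
    """
    segments = []
    start = 0
    for i in range(1, len(hists)):
        if memory == 0:
            ref = hists[i - 1]
        else:
            lo = max(start, i - memory)          # never look back past the current family
            ref = np.mean(hists[lo:i], axis=0)
        if np.abs(hists[i] - ref).sum() >= threshold:
            segments.append((start, i - 1))      # close the current family
            start = i                            # frame i opens a new one
    if len(hists) > 0:
        segments.append((start, len(hists) - 1))
    return segments
```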

  25. Micro-boundaries : experiments • CNN News sample. • 27,000 frames • Tested with 9, 30, 90, and 300 bins in HSB and 512 bins in RGB • Multiple histogram comparisons: L1, L2, bin-wise intersection, and histogram intersection. • Tried 100 threshold values.

  26. Micro-boundaries : experiments(2) • Tested on a video clip, the best results were obtained with a threshold of 10, the L1 comparison, the contiguous-with-limited-memory boundary method, and HSB space quantized to 9 bins.

  27. Micro-boundaries : experiments(3)

  28. Macro-boundaries • A story is a complete narrative structure conveying a continuous thought or event. We want micro-segments belonging to the same story to be in the same macro-segment. • Textual cues (transcripts) are usually needed to set such boundaries, but this paper suggests methodologies that do the job solely with audio and visual cues. • We build on the observation that stories are characterized by multiple constant or slowly varying multimedia attributes.

  29. Macro-boundaries(2) • Two types of uniform segment detection: unimodal and multimodal. • Unimodal (within a single modality): a video segment exhibits the “same” characteristic over a period of time in a single type of modality. • Multimodal: a video segment exhibits uniform characteristics across multiple modalities.

  30. Macro-boundaries : single modality segmentation • For audio-based segmentation: • 1) Partition a continuous audio stream into non-overlapping segments. • 2) Classify the segments using low-level audio features such as bandwidth. • 3) Divide the audio signal into portions of different classes (speech, music, noise, etc.).
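The three audio steps can be sketched with NumPy, using spectral bandwidth (the spread of the magnitude spectrum around its centroid) as the single low-level feature. The window size and the `classify` callback are hypothetical stand-ins; a real classifier for speech, music, and noise would use several features.

```python
from typing import Callable

import numpy as np


def spectral_bandwidth(frame: np.ndarray, sample_rate: float) -> float:
    """Spread of the magnitude spectrum around its centroid, in Hz."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = mag.sum()
    if total == 0:
        return 0.0
    centroid = (freqs * mag).sum() / total
    return float(np.sqrt((((freqs - centroid) ** 2) * mag).sum() / total))


def segment_audio(signal: np.ndarray, sample_rate: float, window: int,
                  classify: Callable[[float], str]) -> list[tuple[str, int, int]]:
    """Return (label, start_sample, end_sample_exclusive) segments."""
    # Steps 1 and 2: non-overlapping windows, each labeled from its bandwidth
    labels = []
    for start in range(0, len(signal) - window + 1, window):
        bw = spectral_bandwidth(signal[start:start + window], sample_rate)
        labels.append(classify(bw))
    # Step 3: merge runs of consecutive windows that share a label
    segments: list[tuple[str, int, int]] = []
    for k, label in enumerate(labels):
        s, e = k * window, (k + 1) * window
        if segments and segments[-1][0] == label:
            segments[-1] = (label, segments[-1][1], e)
        else:
            segments.append((label, s, e))
    return segments
```

Keeping the classifier as a callback separates the fixed partition and merge logic (steps 1 and 3) from whatever model performs step 2.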

  31. Macro-boundaries : single modality segmentation(2) • For text-based segmentation: • 1) If no transcript exists, extract text from the audio stream using speech-to-text conversion. • 2) Segment the transcript with respect to a predefined topic list. • 3) Use a frequency-of-word-occurrence metric to compare incoming stories with the profiles of manually pre-categorized stories.
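Step 3 can be sketched as a bag-of-words comparison. The slides only say "frequency-of-word-occurrence metric"; cosine similarity between word-count vectors, the simple regex tokenizer, and the names below are assumptions.

```python
import math
import re
from collections import Counter


def word_profile(text: str) -> Counter:
    # Word-occurrence counts after lowercasing
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(p: Counter, q: Counter) -> float:
    # Cosine similarity between two word-count vectors
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def categorize(story: str, profiles: dict[str, Counter]) -> str:
    """Return the category whose pre-computed profile best matches the story."""
    sp = word_profile(story)
    return max(profiles, key=lambda cat: cosine(sp, profiles[cat]))
```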

  32. Macro-boundaries : multimodal segments • What we want to do: obtain better segmentation results by combining the results of various unimodal segmentations. • What we need to do: first the pre-merging steps, then the descent steps.

  33. Macro-boundaries : multimodal segments(2) • Pre-merging steps: detect micro-segments that exhibit uniform properties and determine attribute templates for further segmentation. • Uniform segment detection • Intra-modal segment clustering • Attribute template determination - attribute template: a combination of numbers that characterizes the attribute. • Dominant attribute determination • Template application

  34. Macro-boundaries : multimodal segments(3) • Descent methods: combine multimedia segments across multiple modalities. Each attribute, together with its segments of uniform values, is associated with a line.

  35. Macro-boundaries : multimodal segments(4) • The single descent method describes the process of generating story segments by combining these segments. • Single descent with intersecting union • Single descent with intersection • Single descent with secondary attribute • Single descent with conditional union
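The slides name four single-descent variants without defining them. Assuming each attribute's "line" is a sorted list of non-overlapping `(start, end)` intervals of uniform values, the interval intersection and union below are the kind of primitives such combinations could be built from; how each variant applies them is not specified here, and the function names are hypothetical.

```python
Interval = tuple[float, float]  # (start, end) in seconds, with start < end


def intersect(a: list[Interval], b: list[Interval]) -> list[Interval]:
    """Overlaps between two sorted, non-overlapping interval lists (two-pointer sweep)."""
    out: list[Interval] = []
    i, j = 0, 0
    while i < len(a) and j < len(b):
        s, e = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if s < e:
            out.append((s, e))
        # Advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def union(a: list[Interval], b: list[Interval]) -> list[Interval]:
    """Merge two interval lists into one sorted list of non-overlapping intervals."""
    out: list[Interval] = []
    for s, e in sorted(a + b):
        if out and s <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], e))  # extend the last interval
        else:
            out.append((s, e))
    return out
```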

  36. Macro-boundaries : experiments • Single descent process with conditional union. • The text transcript was used as the dominant attribute. • - uniform visual/audio segments • - uniform audio segments • Note the lag between the beginning of a story and the production of its transcript.

  37. Questions?
