Spatio Temporal Video Retrieval

Spatio Temporal Video Retrieval • Team 10 • Santhosh Kumar Muriki, Shanmukhipriya Ponnada, Vijaya Sree Chodavarapu

Outline • Introduction • Problem Statement • Our Contributions • Related Work • Methodology • Comparison • Evaluation • Future Work • Conclusion • References

Introduction • Problem: growing amounts of video data. • With the development of multimedia data types and the increase in available bandwidth, there is a huge demand for video retrieval systems, as users shift from text-based retrieval systems to content-based retrieval systems. • Application areas: news video, film archives, surveillance, user-generated content, distance learning, video conferencing, medical applications, sports. • Video data is dynamic. • Digital video can be stored on tapes, CD-ROMs, DVDs, or similar devices. • Goal: effective video retrieval.

Problem Statement • All the papers we studied concern the retrieval of video data, including how to perform retrieval directly on compressed video data. • In content-based video retrieval systems, the open questions are how to choose features that reflect real human interest and how feature extraction affects retrieval.

Our Contributions • We survey video retrieval approaches based on spatial and temporal analysis. We focus on content-based video retrieval systems and on video retrieval in compressed data. We classify the methods and summarize the future trends and open problems of video retrieval.

Related Work • Extraction of temporally coherent masks of physically meaningful objects in video sequences. • A novel framework for semantic retrieval from video databases: each frame of a video clip, characterized by its HSV (hue-saturation-value) color feature, is first projected onto the spatial principal components. • An efficient video retrieval method that uses user feedback on the relevance of retrieved videos and iteratively reformulates the input query feature vectors (QFV) to improve retrieval. • An interactive platform for semantic video mining and retrieval using relevance feedback (RF), a popular technique in content-based image retrieval (CBIR).

Related Work • An online video retrieval system that automatically extracts feature labels of a video clip: basic features are produced by collecting attributes of the videos, and then the key frames of the video are captured. • Dynamic extraction of features and other content descriptions (metadata) from compressed video. • A method for object detection and feature extraction in static video imagery that operates on color or grayscale frames grabbed by common digital cameras or on readily available images from external sources.

Video Retrieval Is Useful In • Historical archives • Forensic documents • Fingerprint & DNA matching • Security • Retrieval granularity is also important: • How do users want to retrieve materials? • What is the purpose of retrieval? • What is the user's expertise?

Content-Based Video Retrieval • Content-based video retrieval (CBVR) systems automatically index video material by segmenting it into clips and extracting features such as text, color, texture, and motion from each clip to support search. • As digital video collections become more widely available, content-based video retrieval tools will likely grow in importance for an even wider group of users. • A CBVR system aims to assist a human operator (user) in retrieving a target sequence within a potentially large database.

Content-Based Video Retrieval • The selection of extracted features plays an important role in content-based video retrieval, regardless of which video attributes are under consideration. • Content-Based Video Indexing and Retrieval (CBVIR) is an extension of the image retrieval problem. • "Content-based" means that the search analyzes the actual content of the video; "content" in this context might refer to colors, shapes, or textures. • These systems aim to access video by its content, namely its spatio-temporal information.

Methodology • The first step in video content analysis, content-based video browsing, and retrieval is partitioning a video sequence into shots. • Key frames are then extracted from each shot. • Once key frames are extracted, the next step is to extract features. • Hierarchical breakdown: sequence -> scene -> shot -> frame -> object
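The shot-partitioning and key-frame steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any paper surveyed here: frames are assumed to be flat lists of 8-bit grayscale pixel values, a hard cut is assumed wherever the L1 distance between consecutive normalized histograms exceeds a hypothetical threshold, and the middle frame of each shot is taken as its key frame.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[int]  # assumption: flat, non-empty list of 8-bit grayscale pixels (0..255)


def gray_histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Normalized intensity histogram of one frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // 256] += 1
    return [c / len(frame) for c in counts]


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 distance between two normalized histograms (0 = identical, 2 = disjoint)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_shots(frames: Sequence[Frame], threshold: float = 0.5,
                 bins: int = 8) -> List[Tuple[int, int]]:
    """Return shots as (start, end) frame ranges, end exclusive.

    A cut is declared between frames i-1 and i when their histogram
    distance exceeds `threshold` (a hypothetical, hand-tuned value).
    """
    if not frames:
        return []
    hists = [gray_histogram(f, bins) for f in frames]
    shots, start = [], 0
    for i in range(1, len(frames)):
        if histogram_distance(hists[i - 1], hists[i]) > threshold:
            shots.append((start, i))
            start = i
    shots.append((start, len(frames)))
    return shots


def key_frames(shots: Sequence[Tuple[int, int]]) -> List[int]:
    """Pick the middle frame of each shot as its key frame (simple heuristic)."""
    return [(start + end - 1) // 2 for start, end in shots]


# Three dark frames followed by two bright frames: one cut at frame 3.
frames = [[10] * 16] * 3 + [[200] * 16] * 2
shots = detect_shots(frames)
print(shots)              # [(0, 3), (3, 5)]
print(key_frames(shots))  # [1, 3]
```

A single global threshold only catches abrupt cuts; gradual transitions such as fades or dissolves would need a more elaborate test.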

Features • Two types: • Low-level • High-level • Low-level features such as object motion, color, shape, texture, loudness, power spectrum, bandwidth, and pitch are extracted directly from the videos in the database. • High-level features are also called semantic features. Features such as timbre, rhythm, instruments, and events involve different degrees of semantics contained in the media.
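As an illustration of a low-level color feature, the sketch below computes a histogram over a quantized HSV color space and compares two histograms by histogram intersection. The 8 x 2 x 2 quantization and the intersection measure are assumptions chosen for brevity; the surveyed systems may quantize HSV differently and use other similarity measures.

```python
import colorsys
from typing import Iterable, List, Sequence, Tuple

RGB = Tuple[int, int, int]  # assumption: 8-bit RGB pixel


def hsv_histogram(pixels: Iterable[RGB], h_bins: int = 8, s_bins: int = 2,
                  v_bins: int = 2) -> List[float]:
    """Normalized histogram over a quantized HSV color space."""
    counts = [0] * (h_bins * s_bins * v_bins)
    n = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        # min(...) keeps values of exactly 1.0 inside the last bin
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        counts[(hi * s_bins + si) * v_bins + vi] += 1
        n += 1
    if n == 0:
        return [0.0] * len(counts)
    return [c / n for c in counts]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1] for normalized histograms; 1.0 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


red = hsv_histogram([(255, 0, 0)] * 4)
mixed = hsv_histogram([(255, 0, 0), (0, 0, 255)])  # half red, half blue
print(histogram_intersection(red, mixed))  # 0.5
```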

Issues • One of the key issues in CBVR is bridging the "semantic gap", the gap between low-level features and the high-level semantic meaning of content. • Low-level features such as color and texture are easy to measure and compute. • But it is a challenge to connect low-level features to semantic meaning, especially meaning that involves intellectual and emotional aspects of the human operator (user). • Another issue is how to efficiently access the rich content of video information, which involves analyzing video content spatially and temporally.

Generalized n-ary Relation • The principal component of video data is the spatial/temporal semantics associated with it. • Generalization in both the spatial and temporal domains simplifies the description of complex spatial or temporal events. • In the spatial domain, the operands represent the physical locations of objects. • In the temporal domain, they represent the durations of temporal events.

N-ary • Spatial event: consider a player holding the ball in a basketball game. • A frame containing the event "player holding the ball" is characterized by six of the n-ary relations in both x and y coordinates: M, O, C, S, CO. • Spatial events can serve as low-level (fine-grain) indexing mechanisms for video data. • A temporal event extends the spatial event "holding a ball" to "passing a ball between two players". • B is the n-ary "before" operation, and d(Events) are the durations of the spatial events.
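The n-ary relations above can be illustrated with intervals: frame ranges for temporal events, or the x- or y-projections of bounding boxes for spatial events. The sketch below assumes that an n-ary relation holds when the corresponding binary interval relation holds between every pair of consecutive operands; this chaining, the interval encoding, the reading of B, M, O, and S as before, meets, overlaps, and starts, and the frame and pixel numbers in the example are our own assumptions. C and CO are left out, and the original model's exact definitions may differ.

```python
from typing import Callable, Tuple

Interval = Tuple[int, int]  # (start, end) with start < end: frame numbers or pixel columns


def before(a: Interval, b: Interval) -> bool:   # assumed meaning of B
    return a[1] < b[0]


def meets(a: Interval, b: Interval) -> bool:    # assumed meaning of M
    return a[1] == b[0]


def overlaps(a: Interval, b: Interval) -> bool:  # assumed meaning of O
    return a[0] < b[0] < a[1] < b[1]


def starts(a: Interval, b: Interval) -> bool:   # assumed meaning of S
    return a[0] == b[0] and a[1] < b[1]


def n_ary(rel: Callable[[Interval, Interval], bool], *operands: Interval) -> bool:
    """Hypothetical n-ary lift: `rel` must hold for each consecutive pair."""
    if len(operands) < 2:
        raise ValueError("an n-ary relation needs at least two operands")
    return all(rel(a, b) for a, b in zip(operands, operands[1:]))


# Temporal event "passing the ball": B(d(A holds ball), d(B holds ball)).
d_hold_a = (0, 10)    # frames in which player A holds the ball (made-up values)
d_hold_b = (14, 30)   # frames in which player B holds the ball
print(n_ary(before, d_hold_a, d_hold_b))  # True

# Spatial event on the x axis: the ball's box overlaps the player's box.
player_x, ball_x = (100, 160), (150, 170)
print(n_ary(overlaps, player_x, ball_x))  # True
```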

Architecture • The system is hierarchical and supports multi-level indexing and searching by modeling information at various levels of semantic granularity; hence content-based queries can be processed without processing raw image or video data.
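The idea that content-based queries can be answered without touching raw pixels can be sketched with a small hierarchical model (video -> scene -> shot, with the frame and object levels collapsed into per-shot object labels) plus an inverted index. The class names, labels, and index layout are hypothetical illustrations, not the architecture of the cited systems.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Shot:
    shot_id: str
    objects: Set[str] = field(default_factory=set)  # labels produced offline by feature extraction


@dataclass
class Scene:
    scene_id: str
    shots: List[Shot] = field(default_factory=list)


@dataclass
class Video:
    video_id: str
    scenes: List[Scene] = field(default_factory=list)


Location = Tuple[str, str, str]  # (video_id, scene_id, shot_id)


def build_object_index(videos: List[Video]) -> Dict[str, Set[Location]]:
    """Inverted index from object label to every shot that contains it."""
    index: Dict[str, Set[Location]] = defaultdict(set)
    for video in videos:
        for scene in video.scenes:
            for shot in scene.shots:
                for label in shot.objects:
                    index[label].add((video.video_id, scene.scene_id, shot.shot_id))
    return index


def shots_with(index: Dict[str, Set[Location]], *labels: str) -> List[Location]:
    """Shots containing all requested objects, answered from the index alone."""
    if not labels:
        return []
    hits = set.intersection(*(index.get(label, set()) for label in labels))
    return sorted(hits)


game = Video("game1", [
    Scene("s1", [Shot("sh1", {"player", "ball"}), Shot("sh2", {"player"})]),
    Scene("s2", [Shot("sh3", {"ball", "referee"})]),
])
index = build_object_index([game])
print(shots_with(index, "player", "ball"))  # [('game1', 's1', 'sh1')]
```

Because the index is built once at ingestion time, a query only intersects label sets; no frame is decoded at query time.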

Retrieval in Compressed Data • To avoid the processing overhead of decompressing the video stream into individual frames, it is better to detect features directly from compressed video data. • Spatio-temporal features that can be extracted from compressed video data include dominant regions, color information, and motion. • Dominant regions, which are used in video indexing and retrieval, are extracted from intensity data. • Color information computed from an HSV quantization table, together with camera motion detected for region-segmented data, is used to compute similarities between images, from which scene changes can be detected.
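In MPEG-style streams, motion is carried as per-macroblock motion vectors, which can be read from the bitstream without fully decoding the frames. The sketch below is a hypothetical heuristic, not the method of the cited compressed-domain papers: it takes one frame's motion vectors as a plain list (bitstream parsing is out of scope) and labels the global camera motion as static, a horizontal pan, a vertical tilt, or non-uniform (for example, independent object motion). Both thresholds are made-up defaults.

```python
import math
from typing import List, Tuple

MotionVector = Tuple[float, float]  # (dx, dy) for one macroblock


def classify_camera_motion(vectors: List[MotionVector], min_magnitude: float = 1.0,
                           min_agreement: float = 0.8) -> str:
    """Label the dominant (global) motion of one frame from its motion vectors."""
    if not vectors:
        return "static"
    n = len(vectors)
    mean_dx = sum(dx for dx, _ in vectors) / n
    mean_dy = sum(dy for _, dy in vectors) / n
    mean_len = math.hypot(mean_dx, mean_dy)
    if mean_len < min_magnitude:
        return "static"
    # A block "agrees" if its vector is within about 26 degrees of the mean (cosine > 0.9).
    agreeing = sum(
        1 for dx, dy in vectors
        if dx * mean_dx + dy * mean_dy > 0.9 * math.hypot(dx, dy) * mean_len
    )
    if agreeing / n < min_agreement:
        return "non-uniform"
    return "pan" if abs(mean_dx) >= abs(mean_dy) else "tilt"


print(classify_camera_motion([(4.0, 0.5)] * 20))                       # pan
print(classify_camera_motion([(4.0, 0.0)] * 10 + [(0.0, 4.0)] * 10))   # non-uniform
```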

TRECVID: A BENCHMARKING EVALUATION CAMPAIGN FOR VIDEO RETRIEVAL • In video retrieval, the largest collaborative benchmarking activity for content-based tasks is the series of TRECVID workshops. • The TRECVID evaluation meetings are an ongoing series of workshops focusing on a list of information retrieval (IR) research areas in content-based retrieval and exploitation of digital video. • This has involved worldwide participation, with over 50 research teams taking part each year in a variety of content-based "tasks" including shot boundary detection, concept or semantic feature detection, automatic summarization, and content-based video retrieval. • TRECVID 2012 comprised 57 teams from Europe, the Americas, Asia, and Australia, totaling some 400 researchers.
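Benchmarks such as TRECVID score each submitted ranked list against relevance judgments, and search-style tasks have typically used average-precision-based measures. The sketch below computes plain (non-interpolated) average precision and its mean over queries for made-up shot IDs. It assumes complete relevance judgments, whereas TRECVID has also used sampling-based estimates, so it illustrates the measure rather than reproducing the official scoring tools.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@k over the ranks k at which relevant items appear.

    Relevant items that are never retrieved contribute 0, because the sum is
    divided by the total number of relevant items.
    """
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP over (ranked list, relevant set) pairs, one pair per query."""
    scores = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Relevant shots retrieved at ranks 1 and 3, so AP = (1/1 + 2/3) / 2.
print(average_precision(["shot7", "shot2", "shot9", "shot4"], {"shot7", "shot9"}))
```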

Comparison Study Summary • Key issues we noticed in this study: • 1) Bridging the semantic gap: • To do annotation automatically or semi-automatically, we need to bridge the "semantic gap", i.e., find algorithms that infer high-level semantic concepts (sites, objects, events) from low-level image/video features that can easily be extracted from the data (color, texture, shape and structure, layout; motion; audio: pitch, energy, etc.). • One sub-problem is audio scene analysis. Researchers have worked on visual scene analysis (computer vision) for many years, but audio scene analysis is still in its infancy and remains under-explored. • Another sub-problem is multimodal fusion, especially how to combine visual and audio cues to bridge the semantic gap in video.
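One simple way to combine visual and audio cues is late fusion: score each shot separately per modality, normalize the scores, and take a weighted sum. The sketch below is a generic illustration rather than a method from the surveyed papers; min-max normalization, the 0.6 visual weight, and the shot IDs and scores are assumptions. Late fusion alone does not close the semantic gap, but it shows where audio evidence can enter the ranking.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score list carries no ranking information."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {shot: 0.0 for shot in scores}
    return {shot: (s - lo) / (hi - lo) for shot, s in scores.items()}


def late_fusion(visual: Dict[str, float], audio: Dict[str, float],
                w_visual: float = 0.6) -> Dict[str, float]:
    """Weighted sum of normalized per-shot scores; a missing modality counts as 0."""
    v, a = min_max(visual), min_max(audio)
    return {shot: w_visual * v.get(shot, 0.0) + (1.0 - w_visual) * a.get(shot, 0.0)
            for shot in set(v) | set(a)}


def rank(fused: Dict[str, float]) -> List[str]:
    """Shot IDs ordered from highest to lowest fused score."""
    return sorted(fused, key=fused.get, reverse=True)


visual = {"s1": 0.9, "s2": 0.1, "s3": 0.5}   # e.g. scores from a visual concept detector
audio = {"s1": 0.2, "s2": 0.8, "s3": 0.5}    # e.g. scores from an audio event detector
print(rank(late_fusion(visual, audio)))  # ['s1', 's3', 's2']
```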

Comparison Study Summary • 2) Human intelligence and machine intelligence • One advantage of information retrieval is that in most scenarios there is a human (or humans) in the loop. One prominent example of human-computer interaction is relevance feedback. • 3) New query paradigms • For image/video retrieval, people have tried query by keywords, by similarity, by sketching an object, by sketching a trajectory, by painting a rough image, etc. Can we think of useful new paradigms? • 4) Data mining • Searching for interesting or unusual patterns and correlations in video has many important applications, including web search engines and the analysis of intelligence data. Work on data mining to date has mainly addressed text data.
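Relevance feedback, mentioned above as a prominent form of human-in-the-loop retrieval, is often implemented by moving the query feature vector toward examples the user marks relevant and away from those marked non-relevant. The sketch below uses the classic Rocchio update, a textbook technique swapped in for illustration; it is not the SPSA-based method or the mining platform from the reference list, and the weights alpha, beta, and gamma are conventional defaults that would need tuning.

```python
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector], dim: int) -> List[float]:
    """Component-wise mean; a zero vector when no examples were marked."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio(query: Vector, relevant: List[Vector], non_relevant: List[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> List[float]:
    """q' = alpha*q + beta*mean(relevant) - gamma*mean(non_relevant)."""
    dim = len(query)
    r = centroid(relevant, dim)
    n = centroid(non_relevant, dim)
    return [alpha * q + beta * ri - gamma * ni for q, ri, ni in zip(query, r, n)]


# One feedback round on 2-D toy features (e.g. two color-histogram bins).
q = [1.0, 0.0]
q = rocchio(q, relevant=[[0.0, 1.0]], non_relevant=[[1.0, 1.0]], beta=0.5, gamma=0.5)
print(q)  # [0.5, 0.0]
```

In an iterative system, the updated query would be re-run against the database and the loop repeated with fresh feedback; some implementations also clip negative components to zero.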

Comparison Study Summary • 5) Unlabeled data • Can we use the large number of unlabeled samples in the database to help? • Also, what about active learning (choosing the best samples to return to the user so as to get the most information about the user's intention through feedback)? • Another problem related to image/video data annotation is label propagation. Can we label a small set of data and let the labels propagate to the unlabeled samples? • 6) Incremental learning • In most applications, we keep adding new data to the database. We should be able to update the parameters of the retrieval algorithms incrementally, without starting from scratch every time new data arrives.
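Label propagation, raised above as an open problem, can be sketched on a small similarity graph: a few shots are labeled by hand, and each unlabeled shot repeatedly takes the similarity-weighted average of its neighbors' class distributions while the hand labels stay fixed. This is a generic iterative scheme with a hypothetical graph, weights, and iteration count, not a method from the surveyed papers.

```python
from typing import Dict, List


def propagate_labels(weights: List[List[float]], seeds: Dict[int, int],
                     n_classes: int, iterations: int = 100) -> List[int]:
    """Return a class index for every node of a similarity graph.

    weights[i][j] is a symmetric, non-negative similarity with weights[i][i] == 0;
    seeds maps node index -> known class and is clamped on every iteration.
    """
    n = len(weights)
    dist = [[1.0 / n_classes] * n_classes for _ in range(n)]  # uniform start
    for node, label in seeds.items():
        dist[node] = [1.0 if k == label else 0.0 for k in range(n_classes)]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total = sum(weights[i])
            if i in seeds or total == 0:
                updated.append(dist[i])  # clamp seeds; isolated nodes keep their state
                continue
            updated.append([
                sum(weights[i][j] * dist[j][k] for j in range(n)) / total
                for k in range(n_classes)
            ])
        dist = updated
    return [max(range(n_classes), key=lambda k: row[k]) for row in dist]


# Four shots in a chain 0-1-2-3; shot 0 is hand-labeled class 0, shot 3 class 1.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(propagate_labels(chain, seeds={0: 0, 3: 1}, n_classes=2))  # [0, 0, 1, 1]
```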

Comparison Study Summary • 7) Using Virtual Reality Visualization To Help • Can we use 3D audio/visual visualization techniques to help a user to navigate through the data space to browse and to retrieve? • 8) Structuring Very Large Databases • Researchers in audio/visual scene analysis and those in Databases and Information Retrieval should really collaborate CLOSELY to find good ways of structuring very large video databases for efficient retrieval and search.

Comparison Study Summary • 9) Applications of video retrieval • Few real applications of video retrieval have been accepted by the general public so far. Is web video search going to be the next killer application? It remains to be seen. With no clear answer to this question, it is still a challenge to do research that is appropriate for real applications.

Conclusion & Future Work • Despite the considerable progress of academic research in video retrieval, content-based video retrieval research has had relatively little impact on commercial applications, with some niche exceptions such as video segmentation. • Choosing features that reflect real human interest remains an open issue; one promising approach is meta-learning. • To be practically useful, a robust CBVIR system must address the dynamic updating of video databases and feature spaces, as well as the dynamic matching of queries and databases. • Low-to-high-level semantic gap: visual feature-based techniques at the low level of abstraction, contributed mostly by the signal processing and computer vision communities, have been explored in the literature. • Current research efforts are more inclined towards high-level description and retrieval of visual content. • Techniques that bridge this semantic gap between pixels and predicates are a field of growing interest. • Intelligent systems are needed that take a low-level feature representation of the visual media and provide a model for the high-level object representation of the content.

References • http://research.microsoft.com/en-us/um/people/yongrui/ps/sigproc06.pdf • Day, Y.F.; Dagtas, S.; Iino, M.; Khokhar, A.; Ghafoor, A., "Spatio-temporal modeling of video data for on-line object-oriented query processing," Proceedings of the International Conference on Multimedia Computing and Systems, 1995, pp. 98-105, 15-18 May 1995. • Hang-Bong Kang, "Spatio-temporal feature extraction from compressed video data," Proceedings of the IEEE Region 10 Conference (TENCON 99), vol. 2, pp. 1339-1342, Dec. 1999. • Sze-Man Chan, S.; Li, Qing, "VideoMAP*: a Web-based architecture for a spatio-temporal video database management system," Proceedings of the First International Conference on Web Information Systems Engineering, 2000, vol. 1, pp. 393-400, 2000. • Xia, J.; Wang, Y., "A spatio-temporal video analysis system for object segmentation," Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis (ISPA 2003), vol. 2, pp. 812-815, 18-20 Sept. 2003. • Bo Geng; Hong Lu; Xiangyang Xue, "Incremental Spatio-Temporal Feature Extraction and Retrieval for Large Video Database," IEEE International Symposium on Circuits and Systems (ISCAS 2007), pp. 961-964, 27-30 May 2007. • Velusamy, S.; Bhatnagar, S.; Basavaraja, S. V.; Sridhar, V., "SPSA based feature relevance estimation for video retrieval," IEEE 10th Workshop on Multimedia Signal Processing, 2008, pp. 598-603, 8-10 Oct. 2008. • Xin Chen; Chengcui Zhang, "An Interactive Semantic Video Mining and Retrieval Platform--Application in Transportation Surveillance Video for Incident Detection," Sixth International Conference on Data Mining (ICDM '06), pp. 129-138, 18-22 Dec. 2006. • Mehmet Emin Dönderler; Özgür Ulusoy; Ugur Güdükbay, "Rule-based spatiotemporal query processing for video databases," The VLDB Journal: The International Journal on Very Large Data Bases, vol. 13, no. 1, pp. 86-103, January 2004. • Fudong Sun; Minyong Shi; Weiguo Lin, "Feature Label Extraction of Online Video," 2012 International Conference on Computer Science and Electronics Engineering (ICCSEE), vol. 3, pp. 211-214, 23-25 March 2012. • Divakaran, A.; Vetro, A.; Asai, K.; Nishikawa, H., "Video browsing system based on compressed domain feature extraction," IEEE Transactions on Consumer Electronics, vol. 46, no. 3, pp. 637-644, Aug. 2000. • Al-Salih, A.A.M.; Ahson, S.I., "Object detection and features extraction in video frames using direct thresholding," International Multimedia, Signal Processing and Communication Technologies (IMPACT '09), pp. 221-224, 14-16 March 2009. • Sifei Lu; Li, R.M.; Tjhi, W.-C.; Kee Khoon Lee; Long Wang; Xiaorong Li; Di Ma, "A Framework for Cloud-Based Large-Scale Data Analytics and Visualization: Case Study on Multiscale Climate Data," 2011 IEEE Third International Conference on Cloud Computing Technology and Science (CloudCom), pp. 618-622, Nov. 29-Dec. 1, 2011.

Thank you
