A Dynamic Probabilistic Multimedia Retrieval Model. Tzvetanka I. Ianeva, Arjen P. de Vries, Thijs Westerveld. Introduction. Video representation schemes used for retrieval: static, spatio-temporal.
A Dynamic Probabilistic Multimedia Retrieval Model • Tzvetanka I. Ianeva • Arjen P. de Vries • Thijs Westerveld • ICME 2004
Introduction • Video representation schemes used for retrieval: • Static • Spatio-temporal • Video is a temporal medium, so a 'good' model should overcome the limitations of keyframe-based shot representation
Spatio-temporal grouping • Spatial priority and tracking of regions from frame to frame • Joint spatial and temporal segmentation • Human vision finds salient structures jointly in space and time (Gepshtein and Kubovy, 2000)
Motivation • Pursue video retrieval instead of image (keyframe) retrieval • Extension of the Static Probabilistic Multimedia Retrieval model (2003) • GMM in DCT-space-time domain • Diagonal covariance
Static Model (figure labels: Docs, Models) • Indexing • Estimate Gaussian Mixture Models from images using EM • Based on feature vectors with colour, texture and position information from pixel blocks • Fixed number of components
Static Model • Indexing • Estimate a Gaussian Mixture Model from each keyframe (using EM) • Fixed number of components (C=8) • Feature vectors contain colour, texture, and position information from pixel blocks: <x,y,DCT>
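The indexing step can be sketched as a small EM loop for a diagonal-covariance Gaussian mixture. This is a minimal illustration and not the authors' implementation: the function names, the random initialisation from sampled points, the iteration count, the variance floor `var_floor`, and the toy feature vectors (one stand-in DCT coefficient per block) are all assumptions made for the example.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def fit_gmm_diag(samples, n_components=8, n_iter=20, var_floor=1e-3, seed=0):
    """Fit a diagonal GMM with EM; returns (weights, means, variances)."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Assumed initialisation: means picked from random samples, unit variances.
    means = [list(s) for s in rng.sample(samples, n_components)]
    variances = [[1.0] * dim for _ in range(n_components)]
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in samples:
            logs = [math.log(w) + log_gauss_diag(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            probs = [math.exp(l - top) for l in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weight, mean and variance of each component.
        for c in range(n_components):
            nc = sum(r[c] for r in resp) + 1e-12  # guard against empty components
            weights[c] = nc / len(samples)
            means[c] = [sum(r[c] * x[d] for r, x in zip(resp, samples)) / nc
                        for d in range(dim)]
            variances[c] = [
                max(var_floor,
                    sum(r[c] * (x[d] - means[c][d]) ** 2
                        for r, x in zip(resp, samples)) / nc)
                for d in range(dim)
            ]
    return weights, means, variances

# Toy <x, y, DCT> vectors standing in for the pixel blocks of one keyframe.
toy_rng = random.Random(1)
toy_blocks = [[toy_rng.random(), toy_rng.random(), toy_rng.gauss(0, 1)]
              for _ in range(200)]
weights, means, variances = fit_gmm_diag(toy_blocks, n_components=8)
```

Each keyframe would get its own `(weights, means, variances)` triple, which then acts as that document's model at retrieval time.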
Static Model • Retrieval • Calculate conditional probabilities of the query samples given each model in the collection: P(Q|M1), P(Q|M2), P(Q|M3), P(Q|M4)
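Ranking by P(Q|M) can be sketched as follows. The sketch assumes the query samples are treated as independent, so the log-likelihood of the query is the sum of per-sample log mixture densities. The function names and the two single-component toy models are made up for illustration.

```python
import math

def gmm_log_likelihood(samples, model):
    """Sum of log p(sample | model) over the query samples (independence assumed)."""
    weights, means, variances = model
    total = 0.0
    for x in samples:
        logs = []
        for w, mean, var in zip(weights, means, variances):
            log_density = -0.5 * sum(
                math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                for xi, m, v in zip(x, mean, var)
            )
            logs.append(math.log(w) + log_density)
        # Log-sum-exp over components for numerical stability.
        top = max(logs)
        total += top + math.log(sum(math.exp(l - top) for l in logs))
    return total

def rank_models(query_samples, models):
    """Return model names ordered by decreasing query log-likelihood."""
    scores = {name: gmm_log_likelihood(query_samples, m)
              for name, m in models.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy collection: each model is (weights, means, variances).
models = {
    "M1": ([1.0], [[0.0, 0.0]], [[1.0, 1.0]]),
    "M2": ([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
}
query = [[4.8, 5.1], [5.2, 4.9]]
ranking = rank_models(query, models)
```

Here the query samples lie near the mean of `M2`, so `M2` is ranked above `M1`.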
Dynamic Model • Selecting frames • 1-second sequence around the keyframe • Entire video shot as a sequence of frames sampled at regular intervals • Features <x,y,t,DCT>
Dynamic Model • Indexing: • GMM of multiple frames around the keyframe • Feature vectors extended with a time-stamp normalized in [0,1]: <x,y,t,DCT> (figure: time axis marked 0, .5, 1)
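Extending the static features with a normalized time-stamp can be sketched like this. The function name and input layout (one list of `[x, y, dct...]` vectors per frame, in temporal order) are assumptions for the example. Mapping a single-frame sequence to t = 0.5 is also an assumption, chosen to match the midpoint of the time axis.

```python
def add_time_feature(frames):
    """Turn per-frame [x, y, *dct] vectors into [x, y, t, *dct] with t in [0, 1]."""
    n = len(frames)
    extended = []
    for i, blocks in enumerate(frames):
        # First frame gets t = 0, last frame t = 1; a lone frame gets 0.5.
        t = 0.5 if n == 1 else i / (n - 1)
        for x, y, *dct in blocks:
            extended.append([x, y, t, *dct])
    return extended

# Three toy frames, one pixel block each, with one stand-in DCT coefficient.
toy_frames = [[[0.1, 0.2, 3.0]], [[0.1, 0.2, 3.5]], [[0.1, 0.2, 4.0]]]
toy_features = add_time_feature(toy_frames)
```

The resulting vectors from all frames are pooled and a single mixture model is estimated from them, so one shot model sees many more samples than a keyframe model does.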
Dynamic Model
Query example: a single image • Option 1: build an artificial sequence of 29 copies of the query image, with time normalized between 0 and 1 • Option 2: extend the query image's features with a fixed temporal feature value of 0.5 • Option 2 gives better results at lower computational cost
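The two ways of turning a single query image into dynamic-model samples can be contrasted in a few lines. The function names and toy blocks are hypothetical. The sketch only illustrates why the fixed-time variant is cheaper: it produces 29 times fewer samples to score against every model.

```python
def query_as_sequence(blocks, n_frames=29):
    """Repeat the image n_frames times with t running from 0 to 1."""
    return [[x, y, i / (n_frames - 1), *dct]
            for i in range(n_frames)
            for x, y, *dct in blocks]

def query_fixed_time(blocks, t=0.5):
    """Attach one fixed temporal value to every block of the image."""
    return [[x, y, t, *dct] for x, y, *dct in blocks]

# Two toy <x, y, DCT> blocks from a query image.
toy_query_blocks = [[0.0, 0.0, 1.0], [0.5, 0.5, 2.0]]
sequence_samples = query_as_sequence(toy_query_blocks)
fixed_samples = query_fixed_time(toy_query_blocks)
```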
Dynamic Model Advantages • More training data for models • Less sensitive to random initialization • Reduced dependency upon selecting appropriate keyframe • Some spatio-temporal aspects of shot are captured • (Dis-)appearance of objects
Retrieval Framework • Smoothing • Building dynamic GMMs: likelihood goes to infinity?
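The "likelihood goes to infinity" problem can be illustrated numerically. When a mixture component collapses onto very few samples, its variance shrinks towards zero and the density at its mean grows without bound. The variance floor shown here is one common remedy, included as an assumption; the slides do not state which remedy the authors used.

```python
import math

def log_density_at_mean(variance, dim=1):
    """Log density of a diagonal Gaussian evaluated at its own mean."""
    return -0.5 * dim * math.log(2 * math.pi * variance)

def floor_variance(variance, floor=1e-3):
    """Assumed remedy: never let an estimated variance drop below a floor."""
    return max(variance, floor)

# The log density at the mean keeps growing as the variance shrinks.
for var in (1.0, 1e-3, 1e-6, 1e-12):
    print(var, log_density_at_mean(var))

# With the floor applied, the collapsed component's density stays bounded.
bounded = log_density_at_mean(floor_variance(1e-12))
```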
Experimental Set-up • Build models for each shot • Static, Dynamic, Language • Build Queries from topics • Construct simple keyword text query • Select visual example • Rescale and compress example images to match video size and quality
Combining Modalities • Independence assumption textual/visual • P(Qt,Qv|Shot) = P(Qt|LM) * P(Qv|GMM) • Combination works if both runs useful [CWI:TREC:2002] • Dynamic run more useful than static run
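Under the independence assumption the product P(Qt|LM) * P(Qv|GMM) becomes a sum in log space. The sketch below ranks shots on that sum. The function name, the dictionary layout, and the toy scores are assumptions for illustration, and only shots scored by both modalities are ranked.

```python
def rank_shots(text_log_scores, visual_log_scores):
    """Rank shots by log P(Qt|LM) + log P(Qv|GMM), highest first."""
    shared = text_log_scores.keys() & visual_log_scores.keys()
    combined = {shot: text_log_scores[shot] + visual_log_scores[shot]
                for shot in shared}
    return sorted(combined, key=combined.get, reverse=True)

# Toy per-shot log-probabilities from the language model and the GMM.
toy_text = {"shot_a": -2.0, "shot_b": -5.0, "shot_c": -3.0}
toy_visual = {"shot_a": -10.0, "shot_b": -4.0, "shot_c": -3.0}
combined_ranking = rank_shots(toy_text, toy_visual)
```

In the toy data `shot_a` is best on text alone and `shot_c` is best on visual evidence, and the combined score puts `shot_c` first because it does reasonably well on both.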
Combining Modalities • Dynamic: higher initial precision
Dynamic: higher initial precision (figure panels: static run, dynamic run)
Dow Jones Topic (120)
Dow Jones Topic (120) • Text query: “Dow Jones Industrial Average rise day points” (figure: two runs shown with "+" and "=" to form the combined result)
Conclusions • Dynamic model captures visual similarity better • Spatio-temporal aspects • More training data • Appropriate keyframe less critical • Less sensitive to the random initialization • ASR + dynamic better than either alone
Future work • More data needs more computational effort: optimizations? • Avoid the singular solutions • Dynamic number of components? • Full covariance in space-time <x,y,t> • Integration of audio
Thanks!
Merging Run Results • Combining (conflicting) examples is difficult [CWI:TREC:2002] • A single example misses relevant shots • Round-robin merging (figure: two ranked runs, positions 1 to 10 each, interleaved into a combined list 1, 1, 2, 2, 3, 3, 4, 4, ...)
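Round-robin merging of per-example runs can be sketched as taking rank 1 from every run, then rank 2 from every run, and so on. Skipping shots that already appear in the merged list is an assumption added here; the slide only shows the interleaving. The function name and shot identifiers are made up for the example.

```python
from itertools import zip_longest

def round_robin_merge(runs, limit=None):
    """Interleave ranked lists rank by rank, keeping each shot's first occurrence."""
    merged, seen = [], set()
    for tier in zip_longest(*runs):
        for shot in tier:
            # None pads shorter runs; duplicates are dropped (assumed behaviour).
            if shot is not None and shot not in seen:
                seen.add(shot)
                merged.append(shot)
    return merged if limit is None else merged[:limit]

# Two toy runs, one per query example.
run_example_1 = ["s1", "s2", "s3"]
run_example_2 = ["s4", "s1", "s5"]
merged_run = round_robin_merge([run_example_1, run_example_2])
```

With these toy runs the merged list is `s1, s4, s2, s3, s5`: the top shot of each example comes first, and `s1` is not repeated when it shows up again in the second run.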
Conclusions • Visual aspects of an information need are best captured by using multiple examples • Combining results for multiple (good) examples in round-robin fashion, each ranked on both modalities, gives near-best performance for almost all topics