## Similarity Search in Visual Data


**Similarity Search in Visual Data**
Ph.D. Thesis Defense
Anoop Cherian*
Department of Computer Science and Engineering
University of Minnesota, Twin-Cities
Adviser: Prof. Nikolaos Papanikolopoulos
*Contact: cherian@cs.umn.edu

**Talk Outline**
• Introduction
• Problem Statement
• Algorithms for Similarity Search in
  • Matrix Valued Data
  • High Dimensional Vector Data
• Conclusion
• Future Work

**Thesis Related Publications**
Journals
1. A. Cherian, S. Sra, A. Banerjee, and N. Papanikolopoulos. Jensen-Bregman-LogDet-Divergence with Application to Efficient Similarity Search for Covariance Matrices. Transactions on Pattern Analysis and Machine Intelligence (TPAMI), [Accepted with minor revisions]. (Chapter 3)
2. A. Cherian, V. Morellas, and N. Papanikolopoulos. Efficient Nearest Neighbor Retrieval via Sparse Coding. Pattern Recognition Journal, [Being submitted]. (Chapters 5, 7)

Conference Publications
1. A. Cherian, S. Sra, A. Banerjee, and N. Papanikolopoulos. Efficient Similarity Search on Covariance Matrices via the Jensen-Bregman-LogDet-Divergence. Intl. Conf. on Computer Vision (ICCV), 2011. (Chapter 3)
2. A. Cherian, V. Morellas, N. Papanikolopoulos, and S. Bedros. Dirichlet Process Mixture Models on Symmetric Positive Definite Matrices for Appearance Clustering in Video Surveillance Applications. Computer Vision and Pattern Recognition (CVPR), 2011. (Chapter 4)
3. A. Cherian, J. Andersh, V. Morellas, N. Papanikolopoulos, and B. Mettler. Motion Estimation of a Miniature Helicopter using a Single Onboard Camera. American Control Conference (ACC), 2010. (Chapter 5)
4. A. Cherian, S. Sra, and N. Papanikolopoulos. Denoising Sparse Noise via Online Dictionary Learning. Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2011. (Chapter 6)
5. A. Cherian, V. Morellas, and N. Papanikolopoulos. Robust Sparse Hashing. Intl. Conf. on Image Processing (ICIP), 2012. (Chapter 6) [Best Student Paper Award]
6. A. Cherian, V. Morellas, and N. Papanikolopoulos. Approximate Nearest Neighbors via Dictionary Learning. Proceedings of SPIE, 2011. (Chapters 5, 6, 7)
7. S. Sra and A. Cherian. Generalized Dictionary Learning for Symmetric Positive Definite Matrices with Application to Nearest Neighbor Retrieval. European Conference on Machine Learning (ECML), 2011. (Chapter 8)
8. A. Cherian and N. Papanikolopoulos. Large Scale Image Search via Sparse Coding. Minnesota Supercomputing Institute (MSI) Poster Presentation, 2012. [Best Poster Award]

**Talk Outline**
• Introduction
  • Motivation
  • Problem Statement
• Algorithms for Similarity Search in
  • Matrix Valued Data
  • High Dimensional Vector Data
• Conclusion
• Future Work

**Big-Data Challenge**
How to connect the information seeker to the right content?
Solution: similarity search.
Three fundamental steps in similarity search:
1. Represent the data
2. Describe the query
3. Retrieve the data most similar to the query

**Visual Data Challenges**
• “Never express yourself more clearly than you are able to think.” (Niels Bohr)
• It is sometimes difficult to describe precisely in words what “data” is to be retrieved!
• This is especially the case in visual content retrieval, where similarity is defined by an “unconscious” process.
• Therefore, characterizing what we see is hard.
• It is even harder to teach a machine “visual similarity”.
Art courtesy of Thomas Kinkade, Pastoral House

**A Few Applications using Visual Similarity Search**
• Medical Image Analysis
• 3D Reconstruction
• Content-based image retrieval
• Visual Surveillance
• Human-Machine Interaction

**3D Scene Reconstruction: Technical Analysis**
Goal: 3D street view
Input: a set of images
Algorithm:
1. Find point correspondences between pairs of images
2. Estimate camera parameters
3. Estimate camera motion
4. Estimate 3D point locations
Courtesy: Google Street View

**3D Scene Reconstruction: Technical Analysis**
• Typically, SIFT descriptors (128D) are used as point descriptors.
• Each image produces several thousand SIFT descriptors (let us say 10K SIFTs/image).
• Several thousand images are required for a reliable reconstruction (assume 1K images).
• Thus, there are approximately 10K × 1K = $10^7$ SIFTs.
• Pair-wise computations require $10^{14}$ comparisons!
• This is for only one scene… think of millions of scenes in a Street-View application!
• Computational bottleneck: efficient similarity computation!
Courtesy: Google Street View

**Problem Statement**
Approximate Nearest Neighbor

**Problem Challenges**
• High dimensional data
  • Poses the curse of dimensionality
  • Difficult to distinguish near and far points
  • Examples: SIFT (128D), GIST (960D)
• Large scale datasets
  • Needle in the haystack!
  • Petabytes of visual data and billions of data descriptors
[Figure: unit ball inside a unit hypercube]

**Desired Similarity Search Algorithm Properties**
• High retrieval accuracy
• Fast retrieval
• Low memory footprint
• Scalability to large datasets
• Scalability to high dimensional data
• Robustness to data perturbations
• Generalizability to various data descriptors

**Thesis Contributions**
• We propose NN retrieval algorithms for two different data modalities:
  • Matrix valued data (as symmetric positive definite matrices)
    • A new similarity distance: Jensen-Bregman LogDet Divergence
    • An unsupervised clustering algorithm
  • High dimensional vector valued data
    • A novel connection between sparse coding and hashing
    • A fast and accurate hashing algorithm for NN retrieval
• Theoretical analysis of our algorithms
• Experimental validation of our algorithms:
  • against the state-of-the-art techniques in NN retrieval, and
  • on several computer vision datasets

**Matrix (Covariance) Valued Data**
[Figure: appearance silhouette → features (color + gradient + curvature) → covariance of features]

**Importance of Covariance Valued Data in Vision**
• Activity Recognition (12x12D), Guo et al. 2009
• Diffusion Imaging, DT-MRI (3x3D)
• Object Tracking (5x5D), Tuzel et al. 2006
• Emotion Recognition (30x30D), Zheng et al. 2010
• Face Recognition (40x40D), Pang et al. 2008
• 3D Object Recognition (8x8D), Fehr et al. 2012

**Geometry of Covariances**
• Covariances form a manifold in Euclidean space due to their positive definiteness property
• Distances are not straight lines, but curves!
• Incorporating curvature makes distance computation expensive
[Figure: two points X and Y on the manifold $S^p_{++}$]

**Similarity Metrics on Covariances**
• Affine Invariant Riemannian Metric (AIRM)
  • Natural metric induced by the Riemannian geometry
• Log-Euclidean Riemannian Metric (LERM)
  • Induced by approximating covariances to a flat geometry
• Kullback-Leibler Divergence Metric (KLDM)
  • Considers covariances as objects of an associated Gaussian distribution
• Matrix Frobenius Distance (FROB)
  • Considers covariances as vectors in Euclidean space

**Our Distance: Jensen-Bregman LogDet Divergence (JBLD)**
• Let f be a convex function.
• $d_f(X, Y)$ is the deviation of f(Y) from the tangent through f(X) (see figure on the right).
• The Jensen-Bregman divergence is the average deviation of f from the midpoint of X and Y.
• Our new measure is derived by substituting the $-\log|\cdot|$ function for f:

$$J_{\ell d}(X, Y) = \log\left|\frac{X + Y}{2}\right| - \frac{1}{2}\log|XY|$$

• where X, Y are covariances and $\log|\cdot|$ is the logdet function.
[Figure: convex function f with f(X) and f(Y) marked]

**Computational Speedup using JBLD**
[Figures: speedup in computing AIRM and JBLD for increasing matrix dimensionality; speedup in computing gradients of AIRM and JBLD for increasing matrix dimensionality]

**JBLD Geometry**
[Figures: AIRM surface; FROB surface; KLDM surface; JBLD surface]

**Nearest Neighbors using JBLD**
• Considering
  • NN retrieval on any metric space
  • Scalability
  • Ease of exact NN retrieval
  • Ease of approximate NN retrieval
• We decided to use a Metric Tree (MT) on JBLD for NN retrieval
  • The square root of JBLD is a metric.
  • A metric tree is basically a hierarchical k-means algorithm.
  • From the root (which is the entire dataset), it bipartitions the data recursively.

**Experiments: Evaluation Datasets**
• Weizmann Actions dataset
• ETH Tracking dataset
• Brodatz Texture dataset
• Faces in the Wild dataset

**Experimental Results using JBLD**
[Figures: metric tree creation time; NN via metric tree; ANN via metric tree]

**Unsupervised Clustering of Covariances**
• Clustering is an important step in NN retrieval
• K-means type clustering needs a known number of clusters (K)
• Finding K is non-trivial in practice
• Thus, we propose an unsupervised clustering algorithm on covariances
  • Extension to the Dirichlet Process Mixture Model (DPMM)
  • Uses the Wishart-Inverse-Wishart (WIW) conjugate pair
  • Also investigates other DPMM models, such as
    • Gaussian on log-Euclidean covariance vectors
    • Gaussian on vectorized covariances

**Experimental Results**
[Figures: Appearances, 5x5D, 758 matrices, 31 clusters; Faces, 40x40D, 900 matrices, 110 clusters; DPMM computational expense against k-means (using AIRM) and EM (using MoW); Appearances, 5x5D, 758 matrices, 31 clusters; simulation results for increasing true number of clusters]
Purity is synonymous with accuracy. Legend: le: LERM, f: FROB, l: KLDM, g: AIRM.

**Importance of Vector Valued Data in Vision**
• Fundamental data type in several applications
  • As histogram based descriptors (examples: SIFT, spin images, etc.)
  • As feature descriptors (example: image patches)
  • As filter outputs (example: GIST descriptor)
[Figures: SIFT; GIST; texture patches]

**Related Work**
• KD Trees
  • Partition space along fixed hyperplanes
• Locality Sensitive Hashing (LSH), Indyk et al. 2008
  • Generates hash codes by projecting data to random hyperplanes
• Spectral Hashing, Torralba et al. 2008
  • Projection planes derived from orthogonal subspaces of PCA
• Kernelized Hashing, Kulis et al. 2010
  • Projection planes derived from PCA over a kernel matrix learned from data
• Shift Invariant Kernel Hashing, Lazebnik et al. 2009
  • Spectral hashing with a cosine based kernel
• Product Quantization, Jegou et al. 2011
  • K-means sub-vector clustering followed by standard LSH
• FLANN, Lowe et al. 2009
  • Not a hashing algorithm, but a hybrid of hierarchical k-means and KD-tree
[Figures: KD-tree space partitioning; LSH example with hash code 11010]

**Our Approach**
• Based on Dictionary Learning (DL) and Sparse Coding (SC)
• Algorithm steps:
  • For each data vector v,
    • Represent v as a sparse vector w using a dictionary B
    • Encode w as a hash code T
    • Store w at H(T), where H is a hash table indexed by T
  • End
  • Given query vector q,
    1. Generate sparse vector $w_q$ and hash code $T_q$
    2. Find ANN(q) in H($T_q$)

**Dictionary Learning and Sparse Coding**
• Dictionary learning: an algorithm to learn “atoms” from data.
• Sparse coding: an algorithm to represent data in terms of a few “atoms” in the dictionary.
[Figure: dictionary learning analogy: data → dictionary of basic atoms; image data → dictionary of basic atoms]
[Figure: sparse coding analogy: data vector → sparse atom selection → sparse representation (lots of zeros), e.g. 0 × Na, 0 × Li, 0 × Be, …, 2 × H, …, 1 × O, …, 0 × Xe, 0 × Rn]
[Figure: sparse coding of an image: image → sparse atom selection → sparse representation (lots of zeros), e.g. coefficients 0.0, 0.0, …, 1.2, …, 0.4, …, 0.0, 0.0]

**Sparse Codes as Hash Codes**
[Figure: hashing illustration: data vector → dictionary → sparse code → Subspace Combination Tuple (SCT) “10, 33, 77, 90” (hash code) → hash table]

**Sparse Coding & NN Retrieval Connection**
[Figure labels: high probability; new data point]

**Advantages of Sparse Coding for NN Retrieval**
• Hashing efficiency
  • Large number of hash codes
  • $2^k \binom{n}{k}$ k-sparse codes against $2^k$ codes of LSH
• Storage efficiency
  • Need to store only sparse coefficients
  • Against entire data vectors as in LSH
• Query efficiency
  • Linear search on low dimensional sparse vectors
  • No curse of dimensionality
• Sparse coding complexity
  • O(ndk) for a dictionary of n atoms, each of dimension d, generating k-sparse codes
[Figures: 1-sparse; 2-sparse]

**Disadvantage: Sensitivity to Data Perturbation!**
• Sparse coding fits hyperplanes to dense regions of data
• There are $2^k \binom{n}{k}$ hyperplanes for a k-sparse code and an n-atom dictionary
• Example: assume n = 1024, k = 10
  • We have $10^{30}$ hyperplanes
• Data partitions can be too small!
• Small data perturbations can lead data points to change partitions
• Different partitions imply different hash codes, and hashing fails!

**Robust NN Retrieval**
• Align dictionary atoms compensating for data perturbations
  • Approaches
    • Let perturbations be noise. Develop a denoising model
    • Make data immune to worst case perturbation
• Hierarchical data space partitioning
  • Larger partitions subsume smaller partitions
  • Generate multiple hash codes, one for each partition

**Robust Dictionary Learning**
[Figures: denoising approach (basis learned; subtract off Gaussian noise; subtract off Laplacian noise); robust optimization (basis learned; worst case perturbation; project data to worst case perturbation)]
Denoising approach:
• Data has large and small perturbations
• Assume Gaussian noise for small perturbations
• Assume Laplacian noise for large but sparse perturbations
• Denoise for Gaussian + Laplacian noise
• Resulting denoised data should produce the same SCT!
Robust optimization:
• No assumptions on noise distribution
• Learn worst case perturbation from a training set
• Project every data point as if perturbed by the worst case noise
• Learn basis on the perturbed data
• Resulting immunized data should produce the same SCT!

**Robust Dictionary Learning: Experimental Results**
[Figures: denoising approach; robust optimization. Labels: Tree, UBC, Wall, Boat, Bark, Bike, Graf, Leu; INRIA Copydays dataset]

**Robust Sparse Coding**
• Based on the regularization path of sparse coding
• Similar data points will have similar regularization paths
[Figures: similar data points and basis activations; dissimilar data points and basis activations]
• Main idea:
  • Generate multiple SCTs, one for each of several increasing regularization values
  • Multi-Regularization Sparse Coding (MRSC) algorithm
  • Increasing regularization means bigger data partitions and more robustness

**Robust Sparse Coding: Experimental Results**
[Figures: MNIST digits; Holidays SIFT (2M); CIFAR 10 objects; SHREC spin images (2M)]

**Robust Sparse Coding: Experimental Results (SIFT)**
[Figures: timing; timing/scalability; scalability; robustness]

**Sparse Coding for Covariances: Generalized Dictionary Learning**
• Basic idea
  • Extend sparse coding framework for matrix valued data
  • Sparse vector → sparse diagonal matrix
  • Vector dictionary → non-negative rank-one dictionary
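
The correspondence between a sparse diagonal matrix and a rank-one dictionary can be illustrated with a small numerical sketch. This is only an illustration under stated assumptions, not the algorithm from the thesis: it assumes each dictionary atom is the rank-one matrix $b_i b_i^T$ built from a vector $b_i$, and that the sparse code holds a few non-negative coefficients $c_i$ (the non-zero diagonal entries), so a matrix is approximated as $\sum_i c_i\, b_i b_i^T$. The names `outer`, `reconstruct`, `quadratic_form`, `atoms`, and `code` are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


def outer(b: List[float]) -> Matrix:
    """Rank-one atom b b^T built from a single dictionary vector b."""
    return [[bi * bj for bj in b] for bi in b]


def reconstruct(atoms: List[List[float]], code: Dict[int, float]) -> Matrix:
    """Sum c_i * b_i b_i^T over the few active atoms listed in `code`.

    `code` maps atom index -> coefficient and stands in for the non-zero
    entries of a sparse diagonal matrix (assumed non-negative here).
    """
    d = len(atoms[0])
    result = [[0.0] * d for _ in range(d)]
    for index, coeff in code.items():
        if coeff < 0:
            raise ValueError("coefficients are assumed non-negative")
        atom = outer(atoms[index])
        for r in range(d):
            for s in range(d):
                result[r][s] += coeff * atom[r][s]
    return result


def quadratic_form(a: Matrix, x: List[float]) -> float:
    """x^T A x, which is >= 0 for every x when A is positive semidefinite."""
    n = len(x)
    return sum(x[r] * a[r][s] * x[s] for r in range(n) for s in range(n))


# Hypothetical 4-atom dictionary in 2-D; only atoms 0 and 2 are active.
atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
code = {0: 2.0, 2: 0.5}
approx = reconstruct(atoms, code)
```

Because every atom is of the form $b b^T$ and every coefficient is non-negative, the reconstruction is symmetric and positive semidefinite by construction, which is one plausible reason to pair rank-one atoms with non-negative codes when the data are covariance matrices.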